[jira] [Resolved] (SPARK-39310) rename `required_same_anchor`

2022-08-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-39310.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37585
[https://github.com/apache/spark/pull/37585]

> rename `required_same_anchor`
> -
>
> Key: SPARK-39310
> URL: https://issues.apache.org/jira/browse/SPARK-39310
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.4.0
>
>
> https://github.com/apache/spark/pull/36353#discussion_r882216133



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39310) rename `required_same_anchor`

2022-08-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-39310:


Assignee: Apache Spark

> rename `required_same_anchor`
> -
>
> Key: SPARK-39310
> URL: https://issues.apache.org/jira/browse/SPARK-39310
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Assignee: Apache Spark
>Priority: Major
>
> https://github.com/apache/spark/pull/36353#discussion_r882216133






[jira] [Assigned] (SPARK-39310) rename `required_same_anchor`

2022-08-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-39310:


Assignee: Yikun Jiang  (was: Apache Spark)

> rename `required_same_anchor`
> -
>
> Key: SPARK-39310
> URL: https://issues.apache.org/jira/browse/SPARK-39310
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
> Fix For: 3.4.0
>
>
> https://github.com/apache/spark/pull/36353#discussion_r882216133






[jira] [Resolved] (SPARK-39150) Remove `# doctest: +SKIP` of SPARK-38947/SPARK-39326 when infra dump pandas to 1.4+

2022-08-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-39150.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37584
[https://github.com/apache/spark/pull/37584]

> Remove `# doctest: +SKIP` of SPARK-38947/SPARK-39326 when infra dump pandas 
> to 1.4+
> ---
>
> Key: SPARK-39150
> URL: https://issues.apache.org/jira/browse/SPARK-39150
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
> Fix For: 3.4.0
>
>
> [https://github.com/apache/spark/blob/fe85d7912f86c3e337aa93b23bfa7e7e01c0a32e/python/pyspark/pandas/groupby.py#L2333]
> [https://github.com/apache/spark/blob/fe85d7912f86c3e337aa93b23bfa7e7e01c0a32e/python/pyspark/pandas/groupby.py#L2265]
> all doctests in https://github.com/apache/spark/pull/36712
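For context, a `# doctest: +SKIP` directive marks an example that the doctest runner should not execute; removing the directive re-enables it. A generic illustration (not the actual groupby.py code):

```python
# Generic illustration of the `# doctest: +SKIP` directive; the real
# examples referenced by this issue live in pyspark/pandas/groupby.py.
def double(x):
    """Return x doubled.

    >>> double(2)  # doctest: +SKIP
    4
    """
    return x * 2

# Once the infra image ships pandas 1.4+, the `+SKIP` directive can be
# dropped so the example runs under `python -m doctest`.
```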






[jira] [Assigned] (SPARK-39170) ImportError when creating pyspark.pandas document "Supported APIs" if pandas version is low.

2022-08-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-39170:


Assignee: Yikun Jiang

> ImportError when creating pyspark.pandas document "Supported APIs" if pandas 
> version is low.
> 
>
> Key: SPARK-39170
> URL: https://issues.apache.org/jira/browse/SPARK-39170
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Hyunwoo Park
>Assignee: Yikun Jiang
>Priority: Major
> Fix For: 3.4.0
>
>
> The pyspark.pandas documentation "Supported APIs" will be auto-generated 
> ([SPARK-38961|https://issues.apache.org/jira/browse/SPARK-38961]).
> At this point, we need to verify the version of pandas. The change can be 
> applied after the Docker image used in GitHub Actions is upgraded and 
> republished at 
> https://github.com/dongjoon-hyun/ApacheSparkGitHubActionImage.
> Related: https://github.com/apache/spark/pull/36509
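The pandas version check the issue calls for could look roughly like this (a hypothetical helper; `MIN_PANDAS` and the parsing are assumptions, not the actual doc-build script):

```python
# Hypothetical pandas version gate; not the actual Spark doc-generation code.
MIN_PANDAS = (1, 4, 0)

def parse_version(v: str) -> tuple:
    """Turn '1.4.2' into (1, 4, 2); ignores non-numeric suffixes."""
    parts = []
    for p in v.split(".")[:3]:
        digits = "".join(ch for ch in p if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts)

def can_generate_supported_api_docs(pandas_version: str) -> bool:
    # Skip (or fail fast in) the "Supported APIs" generation step when
    # the installed pandas is too old, avoiding the ImportError.
    return parse_version(pandas_version) >= MIN_PANDAS
```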






[jira] [Assigned] (SPARK-39150) Remove `# doctest: +SKIP` of SPARK-38947/SPARK-39326 when infra dump pandas to 1.4+

2022-08-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-39150:


Assignee: Yikun Jiang

> Remove `# doctest: +SKIP` of SPARK-38947/SPARK-39326 when infra dump pandas 
> to 1.4+
> ---
>
> Key: SPARK-39150
> URL: https://issues.apache.org/jira/browse/SPARK-39150
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
>
> [https://github.com/apache/spark/blob/fe85d7912f86c3e337aa93b23bfa7e7e01c0a32e/python/pyspark/pandas/groupby.py#L2333]
> [https://github.com/apache/spark/blob/fe85d7912f86c3e337aa93b23bfa7e7e01c0a32e/python/pyspark/pandas/groupby.py#L2265]
> all doctests in https://github.com/apache/spark/pull/36712






[jira] [Reopened] (SPARK-40142) Make pyspark.sql.functions examples self-contained

2022-08-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reopened SPARK-40142:
--

> Make pyspark.sql.functions examples self-contained
> --
>
> Key: SPARK-40142
> URL: https://issues.apache.org/jira/browse/SPARK-40142
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Updated] (SPARK-40142) Make pyspark.sql.functions examples self-contained

2022-08-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-40142:
-
Fix Version/s: (was: 3.4.0)

> Make pyspark.sql.functions examples self-contained
> --
>
> Key: SPARK-40142
> URL: https://issues.apache.org/jira/browse/SPARK-40142
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>







[jira] [Resolved] (SPARK-40142) Make pyspark.sql.functions examples self-contained

2022-08-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-40142.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37581
[https://github.com/apache/spark/pull/37581]

> Make pyspark.sql.functions examples self-contained
> --
>
> Key: SPARK-40142
> URL: https://issues.apache.org/jira/browse/SPARK-40142
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-40145) Create infra image when cut down branches

2022-08-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-40145:


Assignee: Yikun Jiang

> Create infra image when cut down branches
> -
>
> Key: SPARK-40145
> URL: https://issues.apache.org/jira/browse/SPARK-40145
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
>







[jira] [Resolved] (SPARK-40145) Create infra image when cut down branches

2022-08-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-40145.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37579
[https://github.com/apache/spark/pull/37579]

> Create infra image when cut down branches
> -
>
> Key: SPARK-40145
> URL: https://issues.apache.org/jira/browse/SPARK-40145
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Resolved] (SPARK-39170) ImportError when creating pyspark.pandas document "Supported APIs" if pandas version is low.

2022-08-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-39170.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37583
[https://github.com/apache/spark/pull/37583]

> ImportError when creating pyspark.pandas document "Supported APIs" if pandas 
> version is low.
> 
>
> Key: SPARK-39170
> URL: https://issues.apache.org/jira/browse/SPARK-39170
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Hyunwoo Park
>Priority: Major
> Fix For: 3.4.0
>
>
> The pyspark.pandas documentation "Supported APIs" will be auto-generated 
> ([SPARK-38961|https://issues.apache.org/jira/browse/SPARK-38961]).
> At this point, we need to verify the version of pandas. The change can be 
> applied after the Docker image used in GitHub Actions is upgraded and 
> republished at 
> https://github.com/dongjoon-hyun/ApacheSparkGitHubActionImage.
> Related: https://github.com/apache/spark/pull/36509






[jira] [Comment Edited] (SPARK-38648) SPIP: Simplified API for DL Inferencing

2022-08-19 Thread Xiangrui Meng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17582071#comment-17582071
 ] 

Xiangrui Meng edited comment on SPARK-38648 at 8/19/22 10:55 PM:
-

I had an offline discussion with [~leewyang]. Summary:

We might not need to introduce a new package in Spark with dependencies on DL 
frameworks. Instead, we can provide abstractions in pyspark.ml to implement the 
common data operations needed by DL inference, e.g., batching, tensor 
conversion, pipelining, etc.

For example, we can define the following API (just to illustrate the idea, not 
proposing the final API):

{code:python}
def dl_model_udf(
  predict_fn: Callable[[pd.DataFrame], pd.DataFrame],  # need to discuss the data format
  batch_size: int,
  input_tensor_shapes: Dict[str, List[int]],
  output_data_type,
  preprocess_fn,
  ...
) -> PandasUDF
{code}

Users only need to supply predict_fn, which could return a (wrapped) TensorFlow 
model, a PyTorch model, or an MLflow model. Users are responsible for package 
dependency management and model loading logic. We don't cover everything 
proposed in the original SPIP, but we do save users the boilerplate of 
creating batches over Iterator[DataFrame], converting 1d arrays to tensors, and 
overlapping preprocessing (CPU) with prediction (GPU) asynchronously.

If we go with this direction, I don't feel the change needs an SPIP because it 
doesn't introduce a new Spark package or new dependencies. It is just a 
wrapper over pandas_udf for DL inference.
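A hedged sketch of the batching boilerplate such a wrapper would absorb, re-chunking an Iterator[pd.DataFrame] before calling predict_fn (illustrative only; `dl_model_udf` here is plain Python, not the proposed Spark API, and the real wrapper would return a pandas_udf):

{code:python}
# Hypothetical sketch, not the proposed Spark API: shows only the
# batching step that dl_model_udf would hide from users.
from typing import Callable, Iterator
import pandas as pd

def _rebatch(frames: Iterator[pd.DataFrame],
             batch_size: int) -> Iterator[pd.DataFrame]:
    """Re-chunk incoming Arrow batches into fixed-size prediction batches."""
    buf = pd.DataFrame()
    for df in frames:
        buf = pd.concat([buf, df], ignore_index=True)
        while len(buf) >= batch_size:
            yield buf.iloc[:batch_size]
            buf = buf.iloc[batch_size:].reset_index(drop=True)
    if len(buf) > 0:
        yield buf

def dl_model_udf(predict_fn: Callable[[pd.DataFrame], pd.DataFrame],
                 batch_size: int):
    """Wrap predict_fn into the Iterator[pd.DataFrame] -> Iterator[pd.DataFrame]
    shape that iterator-style pandas UDFs expect."""
    def udf(frames: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
        for batch in _rebatch(frames, batch_size):
            yield predict_fn(batch)
    return udf
{code}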


was (Author: mengxr):
I had an offline discussion with [~leewyang]. Summary:

We might not need to introduce a new package in Spark with dependencies on DL 
frameworks. Instead, we can provide abstractions in pyspark.ml to implement the 
common data operations needed by DL inference, e.g., batching, tensor 
conversion, pipelining, etc.

For example, we can define the following API (just to illustrate the idea, not 
proposing the final API):

{code:scala}
def dl_model_udf(
  predict_fn: Callable[pd.DataFrame, pd.DataFrame],  # need to discuss the data 
format
  batch_size: int,
  input_tensor_shapes: Map[str, List[int]],
  output_data_type,
  preprocess_fn,
  ...
) -> PandasUDF
{code}

Users only need to supply predict_fn, which could return a (wrapped) TensorFlow 
model, a PyTorch model, or an MLflow model. Users are responsible for package 
dependency management and model loading logics. We doesn't cover everything 
proposed in the original SPIP but we do save the boilerplate code for users on 
creating batches over Iterator[DataFrame], converting 1d arrays to tensors, and 
async preprocessing (CPU) and prediction (GPU).

If we go with this direction, I don't free the change needs an SPIP because it 
doesn't introduce a new Spark package nor new dependencies.

> SPIP: Simplified API for DL Inferencing
> ---
>
> Key: SPARK-38648
> URL: https://issues.apache.org/jira/browse/SPARK-38648
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.0.0
>Reporter: Lee Yang
>Priority: Minor
>
> h1. Background and Motivation
> The deployment of deep learning (DL) models to Spark clusters can be a point 
> of friction today.  DL practitioners often aren't well-versed with Spark, and 
> Spark experts often aren't well-versed with the fast-changing DL frameworks.  
> Currently, the deployment of trained DL models is done in a fairly ad-hoc 
> manner, with each model integration usually requiring significant effort.
> To simplify this process, we propose adding an integration layer for each 
> major DL framework that can introspect their respective saved models to 
> more-easily integrate these models into Spark applications.  You can find a 
> detailed proposal here: 
> [https://docs.google.com/document/d/1n7QPHVZfmQknvebZEXxzndHPV2T71aBsDnP4COQa_v0]
> h1. Goals
>  - Simplify the deployment of pre-trained single-node DL models to Spark 
> inference applications.
>  - Follow pandas_udf for simple inference use-cases.
>  - Follow Spark ML Pipelines APIs for transfer-learning use-cases.
>  - Enable integrations with popular third-party DL frameworks like 
> TensorFlow, PyTorch, and Huggingface.
>  - Focus on PySpark, since most of the DL frameworks use Python.
>  - Take advantage of built-in Spark features like GPU scheduling and Arrow 
> integration.
>  - Enable inference on both CPU and GPU.
> h1. Non-goals
>  - DL model training.
>  - Inference w/ distributed models, i.e. "model parallel" inference.
> h1. Target Personas
>  - Data scientists who need to deploy DL models on Spark.
>  - Developers who need to deploy DL models on Spark.



[jira] [Commented] (SPARK-38648) SPIP: Simplified API for DL Inferencing

2022-08-19 Thread Xiangrui Meng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17582071#comment-17582071
 ] 

Xiangrui Meng commented on SPARK-38648:
---

I had an offline discussion with [~leewyang]. Summary:

We might not need to introduce a new package in Spark with dependencies on DL 
frameworks. Instead, we can provide abstractions in pyspark.ml to implement the 
common data operations needed by DL inference, e.g., batching, tensor 
conversion, pipelining, etc.

For example, we can define the following API (just to illustrate the idea, not 
proposing the final API):

{code:python}
def dl_model_udf(
  predict_fn: Callable[[pd.DataFrame], pd.DataFrame],  # need to discuss the data format
  batch_size: int,
  input_tensor_shapes: Dict[str, List[int]],
  output_data_type,
  preprocess_fn,
  ...
) -> PandasUDF
{code}

Users only need to supply predict_fn, which could return a (wrapped) TensorFlow 
model, a PyTorch model, or an MLflow model. Users are responsible for package 
dependency management and model loading logic. We don't cover everything 
proposed in the original SPIP, but we do save users the boilerplate of 
creating batches over Iterator[DataFrame], converting 1d arrays to tensors, and 
overlapping preprocessing (CPU) with prediction (GPU) asynchronously.

If we go with this direction, I don't feel the change needs an SPIP because it 
doesn't introduce a new Spark package or new dependencies.

> SPIP: Simplified API for DL Inferencing
> ---
>
> Key: SPARK-38648
> URL: https://issues.apache.org/jira/browse/SPARK-38648
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.0.0
>Reporter: Lee Yang
>Priority: Minor
>
> h1. Background and Motivation
> The deployment of deep learning (DL) models to Spark clusters can be a point 
> of friction today.  DL practitioners often aren't well-versed with Spark, and 
> Spark experts often aren't well-versed with the fast-changing DL frameworks.  
> Currently, the deployment of trained DL models is done in a fairly ad-hoc 
> manner, with each model integration usually requiring significant effort.
> To simplify this process, we propose adding an integration layer for each 
> major DL framework that can introspect their respective saved models to 
> more-easily integrate these models into Spark applications.  You can find a 
> detailed proposal here: 
> [https://docs.google.com/document/d/1n7QPHVZfmQknvebZEXxzndHPV2T71aBsDnP4COQa_v0]
> h1. Goals
>  - Simplify the deployment of pre-trained single-node DL models to Spark 
> inference applications.
>  - Follow pandas_udf for simple inference use-cases.
>  - Follow Spark ML Pipelines APIs for transfer-learning use-cases.
>  - Enable integrations with popular third-party DL frameworks like 
> TensorFlow, PyTorch, and Huggingface.
>  - Focus on PySpark, since most of the DL frameworks use Python.
>  - Take advantage of built-in Spark features like GPU scheduling and Arrow 
> integration.
>  - Enable inference on both CPU and GPU.
> h1. Non-goals
>  - DL model training.
>  - Inference w/ distributed models, i.e. "model parallel" inference.
> h1. Target Personas
>  - Data scientists who need to deploy DL models on Spark.
>  - Developers who need to deploy DL models on Spark.






[jira] [Assigned] (SPARK-40153) Unify the logic of resolve functions and table-valued functions

2022-08-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40153:


Assignee: (was: Apache Spark)

> Unify the logic of resolve functions and table-valued functions
> ---
>
> Key: SPARK-40153
> URL: https://issues.apache.org/jira/browse/SPARK-40153
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Allison Wang
>Priority: Major
>
> Make ResolveTableValuedFunctions similar to ResolveFunctions: first try 
> resolving the function as a built-in or temp function, then expand the 
> identifier and resolve it as a persistent function.






[jira] [Assigned] (SPARK-40153) Unify the logic of resolve functions and table-valued functions

2022-08-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40153:


Assignee: Apache Spark

> Unify the logic of resolve functions and table-valued functions
> ---
>
> Key: SPARK-40153
> URL: https://issues.apache.org/jira/browse/SPARK-40153
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Allison Wang
>Assignee: Apache Spark
>Priority: Major
>
> Make ResolveTableValuedFunctions similar to ResolveFunctions: first try 
> resolving the function as a built-in or temp function, then expand the 
> identifier and resolve it as a persistent function.






[jira] [Commented] (SPARK-40153) Unify the logic of resolve functions and table-valued functions

2022-08-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17582053#comment-17582053
 ] 

Apache Spark commented on SPARK-40153:
--

User 'allisonwang-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/37586

> Unify the logic of resolve functions and table-valued functions
> ---
>
> Key: SPARK-40153
> URL: https://issues.apache.org/jira/browse/SPARK-40153
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Allison Wang
>Priority: Major
>
> Make ResolveTableValuedFunctions similar to ResolveFunctions: first try 
> resolving the function as a built-in or temp function, then expand the 
> identifier and resolve it as a persistent function.






[jira] [Created] (SPARK-40153) Unify the logic of resolve functions and table-valued functions

2022-08-19 Thread Allison Wang (Jira)
Allison Wang created SPARK-40153:


 Summary: Unify the logic of resolve functions and table-valued 
functions
 Key: SPARK-40153
 URL: https://issues.apache.org/jira/browse/SPARK-40153
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: Allison Wang


Make ResolveTableValuedFunctions similar to ResolveFunctions: first try 
resolving the function as a built-in or temp function, then expand the 
identifier and resolve it as a persistent function.
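The two-step resolution order described above can be sketched as a small lookup chain (hypothetical names; Spark's analyzer works on Catalyst rules and FunctionIdentifiers, not plain strings):

```python
# Hypothetical sketch of the resolution order described in the issue;
# the catalogs and return values are illustrative, not Spark classes.
def resolve_function(name, builtins, temp_funcs, catalog, current_db="default"):
    # Step 1: try built-in and temporary functions by bare name.
    if name in builtins:
        return ("builtin", name)
    if name in temp_funcs:
        return ("temp", name)
    # Step 2: expand the identifier with the current database, then
    # resolve it as a persistent function in the catalog.
    qualified = name if "." in name else f"{current_db}.{name}"
    if qualified in catalog:
        return ("persistent", qualified)
    raise ValueError(f"Undefined function: {name}")
```

Under this scheme a table-valued function goes through the same chain as a scalar function, which is the unification the issue proposes.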






[jira] [Commented] (SPARK-40152) Codegen compilation error when using split_part

2022-08-19 Thread Bruce Robbins (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17582045#comment-17582045
 ] 

Bruce Robbins commented on SPARK-40152:
---

Seems to be a simple case of missing semicolons in the generated code; the fix should be straightforward.

> Codegen compilation error when using split_part
> ---
>
> Key: SPARK-40152
> URL: https://issues.apache.org/jira/browse/SPARK-40152
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Bruce Robbins
>Priority: Major
>
> The following query throws an error:
> {noformat}
> create or replace temp view v1 as
> select * from values
> ('11.12.13', '.', 3)
> as v1(col1, col2, col3);
> cache table v1;
> SELECT split_part(col1, col2, col3)
> from v1;
> {noformat}
> The error is:
> {noformat}
> 22/08/19 14:25:14 ERROR CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 42, Column 1: Expression "project_isNull_0 = false" is not a type
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 42, Column 1: Expression "project_isNull_0 = false" is not a type
>   at 
> org.codehaus.janino.Java$Atom.toTypeOrCompileException(Java.java:3934)
>   at org.codehaus.janino.Parser.parseBlockStatement(Parser.java:1887)
>   at org.codehaus.janino.Parser.parseBlockStatements(Parser.java:1811)
>   at org.codehaus.janino.Parser.parseBlock(Parser.java:1792)
>   at 
> {noformat}
> In the end, {{split_part}} does successfully execute, although in interpreted 
> mode.






[jira] [Created] (SPARK-40152) Codegen compilation error when using split_part

2022-08-19 Thread Bruce Robbins (Jira)
Bruce Robbins created SPARK-40152:
-

 Summary: Codegen compilation error when using split_part
 Key: SPARK-40152
 URL: https://issues.apache.org/jira/browse/SPARK-40152
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.0
Reporter: Bruce Robbins


The following query throws an error:
{noformat}
create or replace temp view v1 as
select * from values
('11.12.13', '.', 3)
as v1(col1, col2, col3);

cache table v1;

SELECT split_part(col1, col2, col3)
from v1;
{noformat}
The error is:
{noformat}
22/08/19 14:25:14 ERROR CodeGenerator: failed to compile: 
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 42, 
Column 1: Expression "project_isNull_0 = false" is not a type
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 42, 
Column 1: Expression "project_isNull_0 = false" is not a type
at 
org.codehaus.janino.Java$Atom.toTypeOrCompileException(Java.java:3934)
at org.codehaus.janino.Parser.parseBlockStatement(Parser.java:1887)
at org.codehaus.janino.Parser.parseBlockStatements(Parser.java:1811)
at org.codehaus.janino.Parser.parseBlock(Parser.java:1792)
at 
{noformat}
In the end, {{split_part}} does successfully execute, although in interpreted 
mode.






[jira] [Assigned] (SPARK-40065) Executor ConfigMap is not mounted if profile is not default

2022-08-19 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-40065:
-

Assignee: Nobuaki Sukegawa

> Executor ConfigMap is not mounted if profile is not default
> ---
>
> Key: SPARK-40065
> URL: https://issues.apache.org/jira/browse/SPARK-40065
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.2.0, 3.2.1, 3.3.0, 3.2.2
>Reporter: Nobuaki Sukegawa
>Assignee: Nobuaki Sukegawa
>Priority: Minor
> Fix For: 3.3.1, 3.2.3
>
>
> When the executor ConfigMap was made optional in SPARK-34316, the volume 
> mount was erroneously disabled whenever a non-default profile is used.
> When spark.kubernetes.executor.disableConfigMap is false, the expected 
> behavior is that the ConfigMap is mounted regardless of the executor's 
> resource profile. However, it is not mounted if the resource profile is 
> non-default.
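The bug reduces to the mount decision wrongly depending on the resource profile; a minimal sketch contrasting the two behaviors (hypothetical predicates mirroring the description, not the actual Kubernetes feature-step code):

```python
# Hypothetical predicates contrasting buggy vs. expected behavior;
# not Spark's actual BasicExecutorFeatureStep code.
DEFAULT_PROFILE_ID = 0

def should_mount_config_map_buggy(disable_config_map: bool, profile_id: int) -> bool:
    # Buggy: the mount is silently skipped for non-default profiles.
    return (not disable_config_map) and profile_id == DEFAULT_PROFILE_ID

def should_mount_config_map_fixed(disable_config_map: bool, profile_id: int) -> bool:
    # Expected: only the disableConfigMap setting matters; the resource
    # profile must not influence the mount decision.
    return not disable_config_map
```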






[jira] [Resolved] (SPARK-40065) Executor ConfigMap is not mounted if profile is not default

2022-08-19 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-40065.
---
Fix Version/s: 3.3.1
   3.2.3
   Resolution: Fixed

This is resolved via https://github.com/apache/spark/pull/37504

> Executor ConfigMap is not mounted if profile is not default
> ---
>
> Key: SPARK-40065
> URL: https://issues.apache.org/jira/browse/SPARK-40065
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.2.0, 3.2.1, 3.3.0, 3.2.2
>Reporter: Nobuaki Sukegawa
>Priority: Minor
> Fix For: 3.3.1, 3.2.3
>
>
> When the executor ConfigMap was made optional in SPARK-34316, the volume 
> mount was erroneously disabled whenever a non-default profile is used.
> When spark.kubernetes.executor.disableConfigMap is false, the expected 
> behavior is that the ConfigMap is mounted regardless of the executor's 
> resource profile. However, it is not mounted if the resource profile is 
> non-default.






[jira] [Assigned] (SPARK-40060) Add numberDecommissioningExecutors metric

2022-08-19 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-40060:
-

Assignee: Zhongwei Zhu

> Add numberDecommissioningExecutors metric
> -
>
> Key: SPARK-40060
> URL: https://issues.apache.org/jira/browse/SPARK-40060
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Zhongwei Zhu
>Assignee: Zhongwei Zhu
>Priority: Minor
>
> The number of decommissioning executors should be exposed as a metric.
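One way to expose such a count is a gauge derived from tracked executor states; a hedged Python sketch (Spark's actual implementation lives in its Scala metrics system, so names here are illustrative):

```python
# Hypothetical gauge over tracked executor states; not Spark's actual
# ExecutorAllocationManager metrics code.
class ExecutorStateTracker:
    def __init__(self):
        self._states = {}

    def update(self, executor_id: str, state: str) -> None:
        self._states[executor_id] = state

    def remove(self, executor_id: str) -> None:
        self._states.pop(executor_id, None)

    @property
    def number_decommissioning_executors(self) -> int:
        # The gauge value a metrics sink would poll.
        return sum(1 for s in self._states.values() if s == "DECOMMISSIONING")
```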






[jira] [Resolved] (SPARK-40060) Add numberDecommissioningExecutors metric

2022-08-19 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-40060.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37499
[https://github.com/apache/spark/pull/37499]

> Add numberDecommissioningExecutors metric
> -
>
> Key: SPARK-40060
> URL: https://issues.apache.org/jira/browse/SPARK-40060
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Zhongwei Zhu
>Assignee: Zhongwei Zhu
>Priority: Minor
> Fix For: 3.4.0
>
>
> The number of decommissioning executors should be exposed as a metric.






[jira] [Created] (SPARK-40151) Fix return type for new median(interval) function

2022-08-19 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-40151:


 Summary: Fix return type for new median(interval) function 
 Key: SPARK-40151
 URL: https://issues.apache.org/jira/browse/SPARK-40151
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Serge Rielau


median() currently returns an interval of the same type as the input.
We should instead match mean() and avg():

The result type is computed from the argument type as follows:

- year-month interval: The result is an `INTERVAL YEAR TO MONTH`.
- day-time interval: The result is an `INTERVAL DAY TO SECOND`.
- In all other cases the result is a DOUBLE.
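The proposed rule can be sketched as a simple type mapping; the string type names below are illustrative placeholders, not Spark's internal Catalyst types:

```python
# Hypothetical sketch of the proposed median() result-type rule, mirroring
# mean()/avg(). The real rule would live in Catalyst's type coercion logic.
def median_result_type(input_type: str) -> str:
    if input_type == "year-month interval":
        return "INTERVAL YEAR TO MONTH"
    if input_type == "day-time interval":
        return "INTERVAL DAY TO SECOND"
    # All other inputs widen to DOUBLE, as avg() does.
    return "DOUBLE"
```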






[jira] [Assigned] (SPARK-38582) Introduce `buildEnvVarsWithKV` and `buildEnvVarsWithFieldRef` for `KubernetesUtils` to eliminate duplicate code pattern

2022-08-19 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-38582:
-

Assignee: Qian Sun

> Introduce `buildEnvVarsWithKV` and `buildEnvVarsWithFieldRef` for 
> `KubernetesUtils` to eliminate duplicate code pattern
> ---
>
> Key: SPARK-38582
> URL: https://issues.apache.org/jira/browse/SPARK-38582
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.2.1
>Reporter: Qian Sun
>Assignee: Qian Sun
>Priority: Minor
>
> There are many duplicate code patterns in Spark Code:
> {code:java}
> new EnvVarBuilder()
>   .withName(key)
>   .withValue(value)
>   .build() {code}
> {code:java}
> new EnvVarBuilder()
>   .withName(name)
>   .withValueFrom(new EnvVarSourceBuilder()
>     .withNewFieldRef(version, field)
>     .build())
>   .build()
> {code}
>  
> [The assignment statement for executor envVar | 
> https://github.com/apache/spark/blob/branch-3.3/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala#L123-L185]
>  has 63 lines. We could introduce _buildEnvVarsWithKV_ and 
> _buildEnvVarsWithFieldRef_ functions to simplify the above code patterns.
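A rough sketch of the factored-out helpers, written here as Python functions returning Kubernetes-style env-var dicts purely for illustration; the actual helpers would be Scala methods on `KubernetesUtils` returning fabric8 `EnvVar` objects.

```python
# Sketch only: the dict shapes mirror the Kubernetes core/v1 EnvVar schema
# to show the two builder patterns being factored out.
def build_env_var_with_kv(key, value):
    # Replaces the repeated withName/withValue builder chain.
    return {"name": key, "value": value}

def build_env_var_with_field_ref(name, version, field):
    # Replaces the repeated withValueFrom/withNewFieldRef builder chain.
    return {
        "name": name,
        "valueFrom": {"fieldRef": {"apiVersion": version, "fieldPath": field}},
    }
```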






[jira] [Resolved] (SPARK-38582) Introduce `buildEnvVarsWithKV` and `buildEnvVarsWithFieldRef` for `KubernetesUtils` to eliminate duplicate code pattern

2022-08-19 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-38582.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 35886
[https://github.com/apache/spark/pull/35886]

> Introduce `buildEnvVarsWithKV` and `buildEnvVarsWithFieldRef` for 
> `KubernetesUtils` to eliminate duplicate code pattern
> ---
>
> Key: SPARK-38582
> URL: https://issues.apache.org/jira/browse/SPARK-38582
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.2.1
>Reporter: Qian Sun
>Assignee: Qian Sun
>Priority: Minor
> Fix For: 3.4.0
>
>
> There are many duplicate code patterns in Spark Code:
> {code:java}
> new EnvVarBuilder()
>   .withName(key)
>   .withValue(value)
>   .build() {code}
> {code:java}
> new EnvVarBuilder()
>   .withName(name)
>   .withValueFrom(new EnvVarSourceBuilder()
>     .withNewFieldRef(version, field)
>     .build())
>   .build()
> {code}
>  
> [The assignment statement for executor envVar | 
> https://github.com/apache/spark/blob/branch-3.3/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala#L123-L185]
>  has 63 lines. We could introduce _buildEnvVarsWithKV_ and 
> _buildEnvVarsWithFieldRef_ functions to simplify the above code patterns.






[jira] [Updated] (SPARK-38582) Add KubernetesUtils.buildEnvVars(WithFieldRef)? utility functions

2022-08-19 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-38582:
--
Affects Version/s: 3.4.0
   (was: 3.2.1)

> Add KubernetesUtils.buildEnvVars(WithFieldRef)? utility functions
> -
>
> Key: SPARK-38582
> URL: https://issues.apache.org/jira/browse/SPARK-38582
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.4.0
>Reporter: Qian Sun
>Assignee: Qian Sun
>Priority: Minor
> Fix For: 3.4.0
>
>
> There are many duplicate code patterns in Spark Code:
> {code:java}
> new EnvVarBuilder()
>   .withName(key)
>   .withValue(value)
>   .build() {code}
> {code:java}
> new EnvVarBuilder()
>   .withName(name)
>   .withValueFrom(new EnvVarSourceBuilder()
>     .withNewFieldRef(version, field)
>     .build())
>   .build()
> {code}
>  
> [The assignment statement for executor envVar | 
> https://github.com/apache/spark/blob/branch-3.3/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala#L123-L185]
>  has 63 lines. We could introduce _buildEnvVarsWithKV_ and 
> _buildEnvVarsWithFieldRef_ functions to simplify the above code patterns.






[jira] [Updated] (SPARK-38582) Add KubernetesUtils.buildEnvVars(WithFieldRef)? utility functions

2022-08-19 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-38582:
--
Summary: Add KubernetesUtils.buildEnvVars(WithFieldRef)? utility functions  
(was: Introduce `buildEnvVarsWithKV` and `buildEnvVarsWithFieldRef` for 
`KubernetesUtils` to eliminate duplicate code pattern)

> Add KubernetesUtils.buildEnvVars(WithFieldRef)? utility functions
> -
>
> Key: SPARK-38582
> URL: https://issues.apache.org/jira/browse/SPARK-38582
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.2.1
>Reporter: Qian Sun
>Assignee: Qian Sun
>Priority: Minor
> Fix For: 3.4.0
>
>
> There are many duplicate code patterns in Spark Code:
> {code:java}
> new EnvVarBuilder()
>   .withName(key)
>   .withValue(value)
>   .build() {code}
> {code:java}
> new EnvVarBuilder()
>   .withName(name)
>   .withValueFrom(new EnvVarSourceBuilder()
>     .withNewFieldRef(version, field)
>     .build())
>   .build()
> {code}
>  
> [The assignment statement for executor envVar | 
> https://github.com/apache/spark/blob/branch-3.3/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala#L123-L185]
>  has 63 lines. We could introduce _buildEnvVarsWithKV_ and 
> _buildEnvVarsWithFieldRef_ functions to simplify the above code patterns.






[jira] [Resolved] (SPARK-40000) Add config to toggle whether to automatically add default values for INSERTs without user-specified fields

2022-08-19 Thread Daniel (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel resolved SPARK-40000.

Fix Version/s: 3.4.0
   Resolution: Won't Fix

Upon further analysis, we decided not to move forward with this change as it 
added too much complexity to downstream data sources.

> Add config to toggle whether to automatically add default values for INSERTs 
> without user-specified fields
> --
>
> Key: SPARK-40000
> URL: https://issues.apache.org/jira/browse/SPARK-40000
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Daniel
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Commented] (SPARK-39310) rename `required_same_anchor`

2022-08-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581926#comment-17581926
 ] 

Apache Spark commented on SPARK-39310:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/37585

> rename `required_same_anchor`
> -
>
> Key: SPARK-39310
> URL: https://issues.apache.org/jira/browse/SPARK-39310
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Priority: Major
>
> https://github.com/apache/spark/pull/36353#discussion_r882216133






[jira] [Assigned] (SPARK-39310) rename `required_same_anchor`

2022-08-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39310:


Assignee: (was: Apache Spark)

> rename `required_same_anchor`
> -
>
> Key: SPARK-39310
> URL: https://issues.apache.org/jira/browse/SPARK-39310
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Priority: Major
>
> https://github.com/apache/spark/pull/36353#discussion_r882216133






[jira] [Assigned] (SPARK-39310) rename `required_same_anchor`

2022-08-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39310:


Assignee: Apache Spark

> rename `required_same_anchor`
> -
>
> Key: SPARK-39310
> URL: https://issues.apache.org/jira/browse/SPARK-39310
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Assignee: Apache Spark
>Priority: Major
>
> https://github.com/apache/spark/pull/36353#discussion_r882216133






[jira] [Commented] (SPARK-39310) rename `required_same_anchor`

2022-08-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581924#comment-17581924
 ] 

Apache Spark commented on SPARK-39310:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/37585

> rename `required_same_anchor`
> -
>
> Key: SPARK-39310
> URL: https://issues.apache.org/jira/browse/SPARK-39310
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Priority: Major
>
> https://github.com/apache/spark/pull/36353#discussion_r882216133






[jira] [Commented] (SPARK-39150) Remove `# doctest: +SKIP` of SPARK-38947/SPARK-39326 when infra bumps pandas to 1.4+

2022-08-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581919#comment-17581919
 ] 

Apache Spark commented on SPARK-39150:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/37584

> Remove `# doctest: +SKIP` of SPARK-38947/SPARK-39326 when infra bumps pandas 
> to 1.4+
> ---
>
> Key: SPARK-39150
> URL: https://issues.apache.org/jira/browse/SPARK-39150
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Priority: Major
>
> [https://github.com/apache/spark/blob/fe85d7912f86c3e337aa93b23bfa7e7e01c0a32e/python/pyspark/pandas/groupby.py#L2333]
> [https://github.com/apache/spark/blob/fe85d7912f86c3e337aa93b23bfa7e7e01c0a32e/python/pyspark/pandas/groupby.py#L2265]
> all doctests in https://github.com/apache/spark/pull/36712
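For context, a minimal illustration of the directive the ticket wants removed: an example marked `# doctest: +SKIP` is rendered in the docs but never executed, so its expected output is not checked.

```python
import doctest

def skipped_example():
    """
    >>> skipped_example()  # doctest: +SKIP
    0.5
    """
    return 0.5

# Run the doctest programmatically: with +SKIP present, the runner
# attempts zero examples, so a stale expected output can never fail.
test = doctest.DocTestFinder().find(skipped_example)[0]
result = doctest.DocTestRunner().run(test)
```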






[jira] [Commented] (SPARK-39150) Remove `# doctest: +SKIP` of SPARK-38947/SPARK-39326 when infra bumps pandas to 1.4+

2022-08-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581918#comment-17581918
 ] 

Apache Spark commented on SPARK-39150:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/37584

> Remove `# doctest: +SKIP` of SPARK-38947/SPARK-39326 when infra bumps pandas 
> to 1.4+
> ---
>
> Key: SPARK-39150
> URL: https://issues.apache.org/jira/browse/SPARK-39150
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Priority: Major
>
> [https://github.com/apache/spark/blob/fe85d7912f86c3e337aa93b23bfa7e7e01c0a32e/python/pyspark/pandas/groupby.py#L2333]
> [https://github.com/apache/spark/blob/fe85d7912f86c3e337aa93b23bfa7e7e01c0a32e/python/pyspark/pandas/groupby.py#L2265]
> all doctests in https://github.com/apache/spark/pull/36712






[jira] [Assigned] (SPARK-39150) Remove `# doctest: +SKIP` of SPARK-38947/SPARK-39326 when infra bumps pandas to 1.4+

2022-08-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39150:


Assignee: Apache Spark

> Remove `# doctest: +SKIP` of SPARK-38947/SPARK-39326 when infra bumps pandas 
> to 1.4+
> ---
>
> Key: SPARK-39150
> URL: https://issues.apache.org/jira/browse/SPARK-39150
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Assignee: Apache Spark
>Priority: Major
>
> [https://github.com/apache/spark/blob/fe85d7912f86c3e337aa93b23bfa7e7e01c0a32e/python/pyspark/pandas/groupby.py#L2333]
> [https://github.com/apache/spark/blob/fe85d7912f86c3e337aa93b23bfa7e7e01c0a32e/python/pyspark/pandas/groupby.py#L2265]
> all doctests in https://github.com/apache/spark/pull/36712






[jira] [Assigned] (SPARK-39150) Remove `# doctest: +SKIP` of SPARK-38947/SPARK-39326 when infra bumps pandas to 1.4+

2022-08-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39150:


Assignee: (was: Apache Spark)

> Remove `# doctest: +SKIP` of SPARK-38947/SPARK-39326 when infra bumps pandas 
> to 1.4+
> ---
>
> Key: SPARK-39150
> URL: https://issues.apache.org/jira/browse/SPARK-39150
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Priority: Major
>
> [https://github.com/apache/spark/blob/fe85d7912f86c3e337aa93b23bfa7e7e01c0a32e/python/pyspark/pandas/groupby.py#L2333]
> [https://github.com/apache/spark/blob/fe85d7912f86c3e337aa93b23bfa7e7e01c0a32e/python/pyspark/pandas/groupby.py#L2265]
> all doctests in https://github.com/apache/spark/pull/36712






[jira] [Resolved] (SPARK-40018) Output SparkThrowable to SQL golden files in JSON format

2022-08-19 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-40018.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37452
[https://github.com/apache/spark/pull/37452]

> Output SparkThrowable to SQL golden files in JSON format
> 
>
> Key: SPARK-40018
> URL: https://issues.apache.org/jira/browse/SPARK-40018
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.4.0
>
>
> Output content of SparkThrowable in the JSON format instead of plain text.
> For instance, replace:
> {code}
> [INVALID_ARRAY_INDEX_IN_ELEMENT_AT] The index 5 is out of bounds. The array 
> has 3 elements. Use `try_element_at` to tolerate accessing element at invalid 
> index and return NULL instead. If necessary set "spark.sql.ansi.enabled" to 
> "false" to bypass this error.
> == SQL(line 1, position 8) ==
> select element_at(array(1, 2, 3), 5)
>^
> {code}
> by
> {code}
> {"errorClass":"INVALID_ARRAY_INDEX_IN_ELEMENT_AT","messageParameters":["5","3","\"spark.sql.ansi.enabled\""],"queryContext":[{"objectType":"","objectName":"","startIndex":7,"stopIndex":35,"fragment":"element_at(array(1,
>  2, 3), 5"}]}
> {code}
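One practical benefit of the JSON form, sketched below: a golden-file line can be parsed and compared field by field instead of diffed as free-form text with caret markers. The payload is the example from the description.

```python
import json

# The JSON golden-file line from the example above, as a Python literal.
golden = (
    '{"errorClass":"INVALID_ARRAY_INDEX_IN_ELEMENT_AT",'
    '"messageParameters":["5","3","\\"spark.sql.ansi.enabled\\""],'
    '"queryContext":[{"objectType":"","objectName":"",'
    '"startIndex":7,"stopIndex":35,'
    '"fragment":"element_at(array(1, 2, 3), 5"}]}'
)

# A test harness can now assert on individual fields rather than raw text.
error = json.loads(golden)
```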






[jira] [Commented] (SPARK-39170) ImportError when creating pyspark.pandas document "Supported APIs" if pandas version is low.

2022-08-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581905#comment-17581905
 ] 

Apache Spark commented on SPARK-39170:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/37583

> ImportError when creating pyspark.pandas document "Supported APIs" if pandas 
> version is low.
> 
>
> Key: SPARK-39170
> URL: https://issues.apache.org/jira/browse/SPARK-39170
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Hyunwoo Park
>Priority: Major
>
> The pyspark.pandas documentation "Supported APIs" will be auto-generated. 
> ([SPARK-38961|https://issues.apache.org/jira/browse/SPARK-38961])
> At this point, we need to verify the version of pandas. It can be applied 
> after the docker image used in github action is upgraded and republished at 
> https://github.com/dongjoon-hyun/ApacheSparkGitHubActionImage.
> Related: https://github.com/apache/spark/pull/36509






[jira] [Assigned] (SPARK-39170) ImportError when creating pyspark.pandas document "Supported APIs" if pandas version is low.

2022-08-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39170:


Assignee: Apache Spark

> ImportError when creating pyspark.pandas document "Supported APIs" if pandas 
> version is low.
> 
>
> Key: SPARK-39170
> URL: https://issues.apache.org/jira/browse/SPARK-39170
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Hyunwoo Park
>Assignee: Apache Spark
>Priority: Major
>
> The pyspark.pandas documentation "Supported APIs" will be auto-generated. 
> ([SPARK-38961|https://issues.apache.org/jira/browse/SPARK-38961])
> At this point, we need to verify the version of pandas. It can be applied 
> after the docker image used in github action is upgraded and republished at 
> https://github.com/dongjoon-hyun/ApacheSparkGitHubActionImage.
> Related: https://github.com/apache/spark/pull/36509






[jira] [Commented] (SPARK-39170) ImportError when creating pyspark.pandas document "Supported APIs" if pandas version is low.

2022-08-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581903#comment-17581903
 ] 

Apache Spark commented on SPARK-39170:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/37583

> ImportError when creating pyspark.pandas document "Supported APIs" if pandas 
> version is low.
> 
>
> Key: SPARK-39170
> URL: https://issues.apache.org/jira/browse/SPARK-39170
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Hyunwoo Park
>Priority: Major
>
> The pyspark.pandas documentation "Supported APIs" will be auto-generated. 
> ([SPARK-38961|https://issues.apache.org/jira/browse/SPARK-38961])
> At this point, we need to verify the version of pandas. It can be applied 
> after the docker image used in github action is upgraded and republished at 
> https://github.com/dongjoon-hyun/ApacheSparkGitHubActionImage.
> Related: https://github.com/apache/spark/pull/36509






[jira] [Assigned] (SPARK-39170) ImportError when creating pyspark.pandas document "Supported APIs" if pandas version is low.

2022-08-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39170:


Assignee: (was: Apache Spark)

> ImportError when creating pyspark.pandas document "Supported APIs" if pandas 
> version is low.
> 
>
> Key: SPARK-39170
> URL: https://issues.apache.org/jira/browse/SPARK-39170
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Hyunwoo Park
>Priority: Major
>
> The pyspark.pandas documentation "Supported APIs" will be auto-generated. 
> ([SPARK-38961|https://issues.apache.org/jira/browse/SPARK-38961])
> At this point, we need to verify the version of pandas. It can be applied 
> after the docker image used in github action is upgraded and republished at 
> https://github.com/dongjoon-hyun/ApacheSparkGitHubActionImage.
> Related: https://github.com/apache/spark/pull/36509






[jira] [Commented] (SPARK-38961) Enhance to automatically generate the pandas API support list

2022-08-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581871#comment-17581871
 ] 

Apache Spark commented on SPARK-38961:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/37583

> Enhance to automatically generate the pandas API support list
> -
>
> Key: SPARK-38961
> URL: https://issues.apache.org/jira/browse/SPARK-38961
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Hyunwoo Park
>Priority: Major
> Fix For: 3.4.0
>
>
> Currently, the supported pandas API list is manually maintained, so it would 
> be better to make the list automatically generated to reduce the maintenance 
> cost.






[jira] [Commented] (SPARK-38961) Enhance to automatically generate the pandas API support list

2022-08-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581869#comment-17581869
 ] 

Apache Spark commented on SPARK-38961:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/37583

> Enhance to automatically generate the pandas API support list
> -
>
> Key: SPARK-38961
> URL: https://issues.apache.org/jira/browse/SPARK-38961
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Hyunwoo Park
>Priority: Major
> Fix For: 3.4.0
>
>
> Currently, the supported pandas API list is manually maintained, so it would 
> be better to make the list automatically generated to reduce the maintenance 
> cost.






[jira] [Updated] (SPARK-40050) Enhance EliminateSorts to support removing sorts via LocalLimit

2022-08-19 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-40050:

Summary: Enhance EliminateSorts to support removing sorts via LocalLimit  
(was: Eliminate the Sort if there is a LocalLimit between Join and Sort)

> Enhance EliminateSorts to support removing sorts via LocalLimit
> ---
>
> Key: SPARK-40050
> URL: https://issues.apache.org/jira/browse/SPARK-40050
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.4.0
>
>
> It seems we can remove the Sort operator:
> {code:scala}
> val projectPlan = testRelation.select($"a", $"b")
> val unnecessaryOrderByPlan = projectPlan.orderBy($"a".asc)
> val localLimitPlan = LocalLimit(Literal(2), unnecessaryOrderByPlan)
> val projectPlanB = testRelationB.select($"d")
> val joinPlan = localLimitPlan.join(projectPlanB, RightOuter).select($"a", 
> $"d")
> {code}






[jira] [Assigned] (SPARK-40050) Eliminate the Sort if there is a LocalLimit between Join and Sort

2022-08-19 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang reassigned SPARK-40050:
---

Assignee: Yuming Wang

> Eliminate the Sort if there is a LocalLimit between Join and Sort
> -
>
> Key: SPARK-40050
> URL: https://issues.apache.org/jira/browse/SPARK-40050
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>
> It seems we can remove the Sort operator:
> {code:scala}
> val projectPlan = testRelation.select($"a", $"b")
> val unnecessaryOrderByPlan = projectPlan.orderBy($"a".asc)
> val localLimitPlan = LocalLimit(Literal(2), unnecessaryOrderByPlan)
> val projectPlanB = testRelationB.select($"d")
> val joinPlan = localLimitPlan.join(projectPlanB, RightOuter).select($"a", 
> $"d")
> {code}






[jira] [Resolved] (SPARK-40050) Eliminate the Sort if there is a LocalLimit between Join and Sort

2022-08-19 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang resolved SPARK-40050.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37519
[https://github.com/apache/spark/pull/37519]

> Eliminate the Sort if there is a LocalLimit between Join and Sort
> -
>
> Key: SPARK-40050
> URL: https://issues.apache.org/jira/browse/SPARK-40050
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.4.0
>
>
> It seems we can remove the Sort operator:
> {code:scala}
> val projectPlan = testRelation.select($"a", $"b")
> val unnecessaryOrderByPlan = projectPlan.orderBy($"a".asc)
> val localLimitPlan = LocalLimit(Literal(2), unnecessaryOrderByPlan)
> val projectPlanB = testRelationB.select($"d")
> val joinPlan = localLimitPlan.join(projectPlanB, RightOuter).select($"a", 
> $"d")
> {code}






[jira] [Resolved] (SPARK-40133) Regenerate excludedTpcdsQueries's golden files if regenerateGoldenFiles is true

2022-08-19 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang resolved SPARK-40133.
-
Fix Version/s: 3.4.0
 Assignee: Yuming Wang
   Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/37562

> Regenerate excludedTpcdsQueries's golden files if regenerateGoldenFiles is 
> true
> ---
>
> Key: SPARK-40133
> URL: https://issues.apache.org/jira/browse/SPARK-40133
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Commented] (SPARK-40147) Make pyspark.sql.session examples self-contained

2022-08-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581824#comment-17581824
 ] 

Apache Spark commented on SPARK-40147:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/37582

> Make pyspark.sql.session examples self-contained
> 
>
> Key: SPARK-40147
> URL: https://issues.apache.org/jira/browse/SPARK-40147
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>







[jira] [Assigned] (SPARK-40147) Make pyspark.sql.session examples self-contained

2022-08-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40147:


Assignee: (was: Apache Spark)

> Make pyspark.sql.session examples self-contained
> 
>
> Key: SPARK-40147
> URL: https://issues.apache.org/jira/browse/SPARK-40147
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>







[jira] [Commented] (SPARK-40147) Make pyspark.sql.session examples self-contained

2022-08-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581823#comment-17581823
 ] 

Apache Spark commented on SPARK-40147:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/37582

> Make pyspark.sql.session examples self-contained
> 
>
> Key: SPARK-40147
> URL: https://issues.apache.org/jira/browse/SPARK-40147
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>







[jira] [Assigned] (SPARK-40147) Make pyspark.sql.session examples self-contained

2022-08-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40147:


Assignee: Apache Spark

> Make pyspark.sql.session examples self-contained
> 
>
> Key: SPARK-40147
> URL: https://issues.apache.org/jira/browse/SPARK-40147
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-40142) Make pyspark.sql.functions examples self-contained

2022-08-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581821#comment-17581821
 ] 

Apache Spark commented on SPARK-40142:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/37581

> Make pyspark.sql.functions examples self-contained
> --
>
> Key: SPARK-40142
> URL: https://issues.apache.org/jira/browse/SPARK-40142
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>







[jira] [Resolved] (SPARK-40146) Simply the codegen of getting map value

2022-08-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-40146.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37580
[https://github.com/apache/spark/pull/37580]

> Simply the codegen of getting map value
> ---
>
> Key: SPARK-40146
> URL: https://issues.apache.org/jira/browse/SPARK-40146
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Minor
> Fix For: 3.4.0
>
>







[jira] [Created] (SPARK-40150) Dynamically merge File Splits

2022-08-19 Thread Jackey Lee (Jira)
Jackey Lee created SPARK-40150:
--

 Summary: Dynamically merge File Splits
 Key: SPARK-40150
 URL: https://issues.apache.org/jira/browse/SPARK-40150
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: Jackey Lee


We currently use maxPartitionBytes and minPartitionNum to split files and use 
openCostInBytes to merge file splits. But these are static configurations, and 
the same configuration does not work in all scenarios.

This PR attempts to merge file splits dynamically, taking concurrency into 
account while processing more data in each task.
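The static sizing these configurations drive can be sketched as follows. This mirrors the shape of Spark's split-size calculation (the logic in `FilePartition.maxSplitBytes`), but the helper name and default values here are illustrative, not Spark's actual implementation:

```python
def max_split_bytes(total_bytes, file_count,
                    max_partition_bytes=128 * 1024 * 1024,  # spark.sql.files.maxPartitionBytes
                    open_cost_in_bytes=4 * 1024 * 1024,     # spark.sql.files.openCostInBytes
                    min_partition_num=8):                   # spark.sql.files.minPartitionNum
    # Spread the total work (data plus a fixed cost per opened file)
    # over the requested minimum number of partitions, then clamp the
    # result between the open cost and the maximum partition size.
    bytes_per_core = (total_bytes + file_count * open_cost_in_bytes) // min_partition_num
    return min(max_partition_bytes, max(open_cost_in_bytes, bytes_per_core))
```

Because the bounds are fixed up front, a scan that reads 10 MB and one that reads 1 TB are split with the same parameters, which is the mismatch the issue describes.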






[jira] [Created] (SPARK-40149) Star expansion after outer join asymmetrically includes joining key

2022-08-19 Thread Jira
Otakar Truněček created SPARK-40149:
---

 Summary: Star expansion after outer join asymmetrically includes 
joining key
 Key: SPARK-40149
 URL: https://issues.apache.org/jira/browse/SPARK-40149
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.2, 3.3.0, 3.2.1, 3.2.0
Reporter: Otakar Truněček


When star expansion is used on the left side of a join, the result includes the 
joining key, while on the right side of the join it doesn't. I would expect the 
behaviour to be symmetric (either include the key on both sides or on neither). 

Example:
{code:python}
from pyspark.sql import SparkSession
import pyspark.sql.functions as f

spark = SparkSession.builder.getOrCreate()

df_left = spark.range(5).withColumn('val', f.lit('left'))
df_right = spark.range(3, 7).withColumn('val', f.lit('right'))

df_merged = (
df_left
.alias('left')
.join(df_right.alias('right'), on='id', how='full_outer')
.withColumn('left_all', f.struct('left.*'))
.withColumn('right_all', f.struct('right.*'))
)

df_merged.show()
{code}
result:
{code:java}
+---+----+-----+------------+---------+
| id| val|  val|    left_all|right_all|
+---+----+-----+------------+---------+
|  0|left| null|   {0, left}|   {null}|
|  1|left| null|   {1, left}|   {null}|
|  2|left| null|   {2, left}|   {null}|
|  3|left|right|   {3, left}|  {right}|
|  4|left|right|   {4, left}|  {right}|
|  5|null|right|{null, null}|  {right}|
|  6|null|right|{null, null}|  {right}|
+---+----+-----+------------+---------+
{code}
This behaviour started with release 3.2.0. Previously the key was not included 
on either side. 
Result from Spark 3.1.3
{code:java}
+---+----+-----+--------+---------+
| id| val|  val|left_all|right_all|
+---+----+-----+--------+---------+
|  0|left| null|  {left}|   {null}|
|  6|null|right|  {null}|  {right}|
|  5|null|right|  {null}|  {right}|
|  1|left| null|  {left}|   {null}|
|  3|left|right|  {left}|  {right}|
|  2|left| null|  {left}|   {null}|
|  4|left|right|  {left}|  {right}|
+---+----+-----+--------+---------+
{code}
I have a gut feeling this is related to these issues:
https://issues.apache.org/jira/browse/SPARK-39376
https://issues.apache.org/jira/browse/SPARK-34527
https://issues.apache.org/jira/browse/SPARK-38603

 






[jira] [Created] (SPARK-40148) Make pyspark.sql.window examples self-contained

2022-08-19 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-40148:


 Summary: Make pyspark.sql.window examples self-contained
 Key: SPARK-40148
 URL: https://issues.apache.org/jira/browse/SPARK-40148
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark, SQL
Affects Versions: 3.4.0
Reporter: Hyukjin Kwon









[jira] [Resolved] (SPARK-40098) Format error messages in the Thrift Server

2022-08-19 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-40098.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37520
[https://github.com/apache/spark/pull/37520]

> Format error messages in the Thrift Server
> --
>
> Key: SPARK-40098
> URL: https://issues.apache.org/jira/browse/SPARK-40098
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.4.0
>
>
> # Introduce a config to control the format of error messages: plain text and 
> JSON
> # Modify the Thrift Server to output errors from Spark SQL according to the 
> config
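As a rough sketch of the two output formats such a config would switch between (the function and field names here are hypothetical, not Spark's actual API):

```python
import json

def format_error(error_class, message, fmt="PLAIN"):
    # Hypothetical formatter: JSON for machine consumption,
    # plain text for interactive clients.
    if fmt == "JSON":
        return json.dumps({"errorClass": error_class, "message": message})
    return f"[{error_class}] {message}"
```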






[jira] [Resolved] (SPARK-40138) Implement DataFrame.mode

2022-08-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-40138.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37569
[https://github.com/apache/spark/pull/37569]

> Implement DataFrame.mode
> 
>
> Key: SPARK-40138
> URL: https://issues.apache.org/jira/browse/SPARK-40138
> Project: Spark
>  Issue Type: Improvement
>  Components: ps
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-40138) Implement DataFrame.mode

2022-08-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-40138:


Assignee: Ruifeng Zheng

> Implement DataFrame.mode
> 
>
> Key: SPARK-40138
> URL: https://issues.apache.org/jira/browse/SPARK-40138
> Project: Spark
>  Issue Type: Improvement
>  Components: ps
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>







[jira] [Assigned] (SPARK-40142) Make pyspark.sql.functions examples self-contained

2022-08-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40142:


Assignee: (was: Apache Spark)

> Make pyspark.sql.functions examples self-contained
> --
>
> Key: SPARK-40142
> URL: https://issues.apache.org/jira/browse/SPARK-40142
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>







[jira] [Assigned] (SPARK-40142) Make pyspark.sql.functions examples self-contained

2022-08-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40142:


Assignee: Apache Spark

> Make pyspark.sql.functions examples self-contained
> --
>
> Key: SPARK-40142
> URL: https://issues.apache.org/jira/browse/SPARK-40142
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Updated] (SPARK-40142) Make pyspark.sql.functions examples self-contained

2022-08-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-40142:
-
Fix Version/s: (was: 3.4.0)

> Make pyspark.sql.functions examples self-contained
> --
>
> Key: SPARK-40142
> URL: https://issues.apache.org/jira/browse/SPARK-40142
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>







[jira] [Assigned] (SPARK-40142) Make pyspark.sql.functions examples self-contained

2022-08-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-40142:


Assignee: (was: Hyukjin Kwon)

> Make pyspark.sql.functions examples self-contained
> --
>
> Key: SPARK-40142
> URL: https://issues.apache.org/jira/browse/SPARK-40142
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>







[jira] [Created] (SPARK-40147) Make pyspark.sql.session examples self-contained

2022-08-19 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-40147:


 Summary: Make pyspark.sql.session examples self-contained
 Key: SPARK-40147
 URL: https://issues.apache.org/jira/browse/SPARK-40147
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark, SQL
Affects Versions: 3.4.0
Reporter: Hyukjin Kwon









[jira] [Reopened] (SPARK-40142) Make pyspark.sql.functions examples self-contained

2022-08-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reopened SPARK-40142:
--

> Make pyspark.sql.functions examples self-contained
> --
>
> Key: SPARK-40142
> URL: https://issues.apache.org/jira/browse/SPARK-40142
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-40142) Make pyspark.sql.functions examples self-contained

2022-08-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-40142:


Assignee: Hyukjin Kwon

> Make pyspark.sql.functions examples self-contained
> --
>
> Key: SPARK-40142
> URL: https://issues.apache.org/jira/browse/SPARK-40142
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>







[jira] [Resolved] (SPARK-40142) Make pyspark.sql.functions examples self-contained

2022-08-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-40142.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37575
[https://github.com/apache/spark/pull/37575]

> Make pyspark.sql.functions examples self-contained
> --
>
> Key: SPARK-40142
> URL: https://issues.apache.org/jira/browse/SPARK-40142
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Commented] (SPARK-40146) Simply the codegen of getting map value

2022-08-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581718#comment-17581718
 ] 

Apache Spark commented on SPARK-40146:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/37580

> Simply the codegen of getting map value
> ---
>
> Key: SPARK-40146
> URL: https://issues.apache.org/jira/browse/SPARK-40146
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Minor
>







[jira] [Commented] (SPARK-40146) Simply the codegen of getting map value

2022-08-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581716#comment-17581716
 ] 

Apache Spark commented on SPARK-40146:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/37580

> Simply the codegen of getting map value
> ---
>
> Key: SPARK-40146
> URL: https://issues.apache.org/jira/browse/SPARK-40146
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Minor
>







[jira] [Assigned] (SPARK-40146) Simply the codegen of getting map value

2022-08-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40146:


Assignee: Apache Spark  (was: Gengliang Wang)

> Simply the codegen of getting map value
> ---
>
> Key: SPARK-40146
> URL: https://issues.apache.org/jira/browse/SPARK-40146
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Minor
>







[jira] [Assigned] (SPARK-40146) Simply the codegen of getting map value

2022-08-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40146:


Assignee: Gengliang Wang  (was: Apache Spark)

> Simply the codegen of getting map value
> ---
>
> Key: SPARK-40146
> URL: https://issues.apache.org/jira/browse/SPARK-40146
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Minor
>







[jira] [Updated] (SPARK-40146) Simply the codegen of getting map value

2022-08-19 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-40146:
---
Issue Type: Improvement  (was: Bug)

> Simply the codegen of getting map value
> ---
>
> Key: SPARK-40146
> URL: https://issues.apache.org/jira/browse/SPARK-40146
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Minor
>







[jira] [Created] (SPARK-40146) Simply the codegen of getting map value

2022-08-19 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-40146:
--

 Summary: Simply the codegen of getting map value
 Key: SPARK-40146
 URL: https://issues.apache.org/jira/browse/SPARK-40146
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.4.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang









[jira] [Commented] (SPARK-40140) REST API for SQL level information does not show information on running queries

2022-08-19 Thread Yeachan Park (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581708#comment-17581708
 ] 

Yeachan Park commented on SPARK-40140:
--

Please feel free to pick it up :)

> REST API for SQL level information does not show information on running 
> queries
> ---
>
> Key: SPARK-40140
> URL: https://issues.apache.org/jira/browse/SPARK-40140
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yeachan Park
>Priority: Minor
>
> Hi All,
> We noticed that the SQL information REST API implemented in 
> https://issues.apache.org/jira/browse/SPARK-27142 does not return back SQL 
> queries which are currently running. We can only see queries which are 
> completed/failed.
> As far as I can see, this should be supported since one of the fields in the 
> returned JSON is "runningJobIds". 






[jira] [Commented] (SPARK-40145) Create infra image when cut down branches

2022-08-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581701#comment-17581701
 ] 

Apache Spark commented on SPARK-40145:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/37579

> Create infra image when cut down branches
> -
>
> Key: SPARK-40145
> URL: https://issues.apache.org/jira/browse/SPARK-40145
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Priority: Major
>







[jira] [Assigned] (SPARK-40145) Create infra image when cut down branches

2022-08-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40145:


Assignee: (was: Apache Spark)

> Create infra image when cut down branches
> -
>
> Key: SPARK-40145
> URL: https://issues.apache.org/jira/browse/SPARK-40145
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Priority: Major
>







[jira] [Commented] (SPARK-40145) Create infra image when cut down branches

2022-08-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581699#comment-17581699
 ] 

Apache Spark commented on SPARK-40145:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/37579

> Create infra image when cut down branches
> -
>
> Key: SPARK-40145
> URL: https://issues.apache.org/jira/browse/SPARK-40145
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Priority: Major
>







[jira] [Commented] (SPARK-39993) Spark on Kubernetes doesn't filter data by date

2022-08-19 Thread Hanna Liashchuk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581700#comment-17581700
 ] 

Hanna Liashchuk commented on SPARK-39993:
-

Could you try running it in client mode? Because that's exactly what is 
happening here: JupyterHub runs in client mode. And yes, I run df.show() first 
to ensure that the df contains data; that's in the snippet too.

> Spark on Kubernetes doesn't filter data by date
> ---
>
> Key: SPARK-39993
> URL: https://issues.apache.org/jira/browse/SPARK-39993
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.2.2
> Environment: Kubernetes v1.23.6
> Spark 3.2.2
> Java 1.8.0_312
> Python 3.9.13
> Aws dependencies:
> aws-java-sdk-bundle-1.11.901.jar and hadoop-aws-3.3.1.jar
>Reporter: Hanna Liashchuk
>Priority: Major
>  Labels: kubernetes
>
> I'm creating a Dataset with a column of type date and saving it to S3. When I 
> read it back and use a where() clause, it doesn't return data even though the 
> data is there.
> Below is the code snippet I'm running
>  
> {code:java}
> from pyspark.sql.types import Row
> from pyspark.sql.functions import *
> ds = spark.range(10).withColumn("date", lit("2022-01-01")).withColumn("date", 
> col("date").cast("date"))
> ds.where("date = '2022-01-01'").show()
> ds.write.mode("overwrite").parquet("s3a://bucket/test")
> df = spark.read.format("parquet").load("s3a://bucket/test")
> df.where("date = '2022-01-01'").show()
> {code}
> The first show() returns data, while the second one does not.
> I've noticed that it's related to the Kubernetes master, as the same code 
> snippet works fine with master "local".
> UPD: if the column is used as a partition column and has the type "date", 
> there is no filtering problem.
>  
>  






[jira] [Assigned] (SPARK-40145) Create infra image when cut down branches

2022-08-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40145:


Assignee: Apache Spark

> Create infra image when cut down branches
> -
>
> Key: SPARK-40145
> URL: https://issues.apache.org/jira/browse/SPARK-40145
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Created] (SPARK-40145) Create infra image when cut down branches

2022-08-19 Thread Yikun Jiang (Jira)
Yikun Jiang created SPARK-40145:
---

 Summary: Create infra image when cut down branches
 Key: SPARK-40145
 URL: https://issues.apache.org/jira/browse/SPARK-40145
 Project: Spark
  Issue Type: Improvement
  Components: Project Infra
Affects Versions: 3.4.0
Reporter: Yikun Jiang









[jira] [Assigned] (SPARK-40107) Pull out empty2null conversion from FileFormatWriter

2022-08-19 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-40107:
---

Assignee: Allison Wang

> Pull out empty2null conversion from FileFormatWriter
> 
>
> Key: SPARK-40107
> URL: https://issues.apache.org/jira/browse/SPARK-40107
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>
> This is a follow-up for SPARK-37287. We can also pull the physical project 
> that converts empty-string partition columns to null out of 
> `FileFormatWriter` and into logical planning.
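The conversion being moved is small; a sketch of its semantics (the function name here is illustrative):

```python
def empty2null(partition_value):
    # Empty-string partition values are rewritten to null so that they
    # land in the default partition rather than an empty directory name.
    return None if partition_value == "" else partition_value
```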






[jira] [Resolved] (SPARK-40107) Pull out empty2null conversion from FileFormatWriter

2022-08-19 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-40107.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37539
[https://github.com/apache/spark/pull/37539]

> Pull out empty2null conversion from FileFormatWriter
> 
>
> Key: SPARK-40107
> URL: https://issues.apache.org/jira/browse/SPARK-40107
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
> Fix For: 3.4.0
>
>
> This is a follow-up for SPARK-37287. We can also pull the physical project 
> that converts empty-string partition columns to null out of 
> `FileFormatWriter` and into logical planning.






[jira] [Assigned] (SPARK-39791) In Spark 3.0 standalone cluster mode, unable to customize driver JVM path

2022-08-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39791:


Assignee: (was: Apache Spark)

> In Spark 3.0 standalone cluster mode, unable to customize driver JVM path
> -
>
> Key: SPARK-39791
> URL: https://issues.apache.org/jira/browse/SPARK-39791
> Project: Spark
>  Issue Type: Question
>  Components: Spark Submit
>Affects Versions: 3.0.0
>Reporter: Obobj
>Priority: Minor
>  Labels: spark-submit, standalone
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> In Spark 3.0 standalone mode, unable to customize driver JVM path, instead 
> the JAVA_HOME of the spark-submit submission machine is used, but the JVM 
> paths of my submission machine and the cluster machine are different
> {code:java}
> launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java
> List<String> buildJavaCommand(String extraClassPath) throws IOException {
>   List<String> cmd = new ArrayList<>();
>   String firstJavaHome = firstNonEmpty(javaHome,
> childEnv.get("JAVA_HOME"),
> System.getenv("JAVA_HOME"),
> System.getProperty("java.home")); {code}
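The lookup above returns the first non-empty candidate, which is why the submitting machine's JAVA_HOME wins over anything on the cluster. A minimal sketch of that resolution order (`firstNonEmpty` here is a hypothetical stand-in for the launcher's helper, and the paths are made up for illustration):

```java
import java.util.Arrays;

public class JavaHomeResolution {
    // Returns the first candidate that is neither null nor empty,
    // mirroring how the launcher picks the JVM for the driver command.
    static String firstNonEmpty(String... candidates) {
        return Arrays.stream(candidates)
            .filter(s -> s != null && !s.isEmpty())
            .findFirst()
            .orElse(null);
    }

    public static void main(String[] args) {
        // With no explicit javaHome or child JAVA_HOME configured, the
        // submitting machine's environment decides which JVM is used.
        String resolved = firstNonEmpty(
            null,                          // javaHome (not configured)
            null,                          // childEnv.get("JAVA_HOME")
            "/usr/lib/jvm/submitter-jdk",  // System.getenv("JAVA_HOME")
            "/usr/lib/jvm/default-jdk");   // System.getProperty("java.home")
        System.out.println(resolved); // /usr/lib/jvm/submitter-jdk
    }
}
```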






[jira] [Commented] (SPARK-39791) In Spark 3.0 standalone cluster mode, unable to customize driver JVM path

2022-08-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581682#comment-17581682
 ] 

Apache Spark commented on SPARK-39791:
--

User 'obobj' has created a pull request for this issue:
https://github.com/apache/spark/pull/37578

> In Spark 3.0 standalone cluster mode, unable to customize driver JVM path
> -
>
> Key: SPARK-39791
> URL: https://issues.apache.org/jira/browse/SPARK-39791
> Project: Spark
>  Issue Type: Question
>  Components: Spark Submit
>Affects Versions: 3.0.0
>Reporter: Obobj
>Priority: Minor
>  Labels: spark-submit, standalone
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> In Spark 3.0 standalone cluster mode, the driver JVM path cannot be 
> customized; instead, the JAVA_HOME of the machine running spark-submit is 
> used, but the JVM paths on my submission machine and the cluster machines 
> differ.
> {code:java}
> launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java
> List<String> buildJavaCommand(String extraClassPath) throws IOException {
>   List<String> cmd = new ArrayList<>();
>   String firstJavaHome = firstNonEmpty(javaHome,
> childEnv.get("JAVA_HOME"),
> System.getenv("JAVA_HOME"),
> System.getProperty("java.home")); {code}






[jira] [Assigned] (SPARK-39791) In Spark 3.0 standalone cluster mode, unable to customize driver JVM path

2022-08-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39791:


Assignee: Apache Spark

> In Spark 3.0 standalone cluster mode, unable to customize driver JVM path
> -
>
> Key: SPARK-39791
> URL: https://issues.apache.org/jira/browse/SPARK-39791
> Project: Spark
>  Issue Type: Question
>  Components: Spark Submit
>Affects Versions: 3.0.0
>Reporter: Obobj
>Assignee: Apache Spark
>Priority: Minor
>  Labels: spark-submit, standalone
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> In Spark 3.0 standalone cluster mode, the driver JVM path cannot be 
> customized; instead, the JAVA_HOME of the machine running spark-submit is 
> used, but the JVM paths on my submission machine and the cluster machines 
> differ.
> {code:java}
> launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java
> List<String> buildJavaCommand(String extraClassPath) throws IOException {
>   List<String> cmd = new ArrayList<>();
>   String firstJavaHome = firstNonEmpty(javaHome,
> childEnv.get("JAVA_HOME"),
> System.getenv("JAVA_HOME"),
> System.getProperty("java.home")); {code}


