[jira] [Updated] (SPARK-42596) [YARN] OMP_NUM_THREADS not set to number of executor cores by default

2023-02-26 Thread John Zhuge (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Zhuge updated SPARK-42596:
---
Description: 
Run this PySpark script with `spark.executor.cores=1`
{code:python}
import os
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf

spark = SparkSession.builder.getOrCreate()

var_name = 'OMP_NUM_THREADS'

def get_env_var():
  return os.getenv(var_name)

udf_get_env_var = udf(get_env_var)
spark.range(1).toDF("id").withColumn(f"env_{var_name}", 
udf_get_env_var()).show(truncate=False)
{code}
Output with release `3.3.2`:
{noformat}
+---+-------------------+
|id |env_OMP_NUM_THREADS|
+---+-------------------+
|0  |null               |
+---+-------------------+
{noformat}
Output with release `3.3.0`:
{noformat}
+---+-------------------+
|id |env_OMP_NUM_THREADS|
+---+-------------------+
|0  |1                  |
+---+-------------------+
{noformat}

  was:
Run this PySpark script with `spark.executor.cores=1`
{code:python}
import os
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf

spark = SparkSession.builder.getOrCreate()

var_name = 'OMP_NUM_THREADS'

def get_env_var():
  return os.getenv(var_name)

udf_get_env_var = udf(get_env_var)
spark.range(1).toDF("id").withColumn(f"env_{var_name}", 
udf_get_env_var()).show(truncate=False)
{code}
Output with release `3.3.2`:
{noformat}
+---+-------------------+
|id |env_OMP_NUM_THREADS|
+---+-------------------+
|0  |null               |
+---+-------------------+
{noformat}
Output with release `3.3.0`:
{noformat}
+---+-------------------+
|id |env_OMP_NUM_THREADS|
+---+-------------------+
|0  |1                  |
+---+-------------------+
{noformat}


> [YARN] OMP_NUM_THREADS not set to number of executor cores by default
> -
>
> Key: SPARK-42596
> URL: https://issues.apache.org/jira/browse/SPARK-42596
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, YARN
>Affects Versions: 3.3.2
>Reporter: John Zhuge
>Priority: Major
>
> Run this PySpark script with `spark.executor.cores=1`
> {code:python}
> import os
> from pyspark.sql import SparkSession
> from pyspark.sql.functions import udf
> spark = SparkSession.builder.getOrCreate()
> var_name = 'OMP_NUM_THREADS'
> def get_env_var():
>   return os.getenv(var_name)
> udf_get_env_var = udf(get_env_var)
> spark.range(1).toDF("id").withColumn(f"env_{var_name}", 
> udf_get_env_var()).show(truncate=False)
> {code}
> Output with release `3.3.2`:
> {noformat}
> +---+-------------------+
> |id |env_OMP_NUM_THREADS|
> +---+-------------------+
> |0  |null               |
> +---+-------------------+
> {noformat}
> Output with release `3.3.0`:
> {noformat}
> +---+-------------------+
> |id |env_OMP_NUM_THREADS|
> +---+-------------------+
> |0  |1                  |
> +---+-------------------+
> {noformat}








[jira] [Commented] (SPARK-42596) [YARN] OMP_NUM_THREADS not set to number of executor cores by default

2023-02-26 Thread John Zhuge (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693837#comment-17693837
 ] 

John Zhuge commented on SPARK-42596:


Looks like a regression from SPARK-41188, which removed the code that sets the
default OMP_NUM_THREADS from PythonRunner.

Its PR assumed the code could be moved to SparkContext. Unfortunately,
`SparkContext#executorEnvs` is only consumed by StandaloneSchedulerBackend for
Spark's standalone cluster manager, so the PR broke YARN as shown in the test
case above, and probably Mesos as well, but I don't have a way to test that.
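
Until a fix lands, a possible workaround (an untested sketch on my side, using
Spark's documented `spark.executorEnv.*` mechanism; not part of the eventual
fix) is to pin the variable explicitly and keep it in sync with
`spark.executor.cores`:
{code:python}
# Workaround sketch: export OMP_NUM_THREADS to executors explicitly,
# mirroring the default PythonRunner used to apply (value = executor cores).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.executor.cores", "1")
    # spark.executorEnv.<NAME> sets an environment variable on executors;
    # it must be kept in sync with spark.executor.cores manually.
    .config("spark.executorEnv.OMP_NUM_THREADS", "1")
    .getOrCreate()
)
{code}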

> [YARN] OMP_NUM_THREADS not set to number of executor cores by default
> -
>
> Key: SPARK-42596
> URL: https://issues.apache.org/jira/browse/SPARK-42596
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, YARN
>Affects Versions: 3.3.2
>Reporter: John Zhuge
>Priority: Major
>
> Run this PySpark script with `spark.executor.cores=1`
> {code:python}
> import os
> from pyspark.sql import SparkSession
> from pyspark.sql.functions import udf
> spark = SparkSession.builder.getOrCreate()
> var_name = 'OMP_NUM_THREADS'
> def get_env_var():
>   return os.getenv(var_name)
> udf_get_env_var = udf(get_env_var)
> spark.range(1).toDF("id").withColumn(f"env_{var_name}", 
> udf_get_env_var()).show(truncate=False)
> {code}
> Output with release `3.3.2`:
> {noformat}
> +---+-------------------+
> |id |env_OMP_NUM_THREADS|
> +---+-------------------+
> |0  |null               |
> +---+-------------------+
> {noformat}
> Output with release `3.3.0`:
> {noformat}
> +---+-------------------+
> |id |env_OMP_NUM_THREADS|
> +---+-------------------+
> |0  |1                  |
> +---+-------------------+
> {noformat}






[jira] [Commented] (SPARK-42572) Logic error for StateStore.validateStateRowFormat

2023-02-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693835#comment-17693835
 ] 

Apache Spark commented on SPARK-42572:
--

User 'WweiL' has created a pull request for this issue:
https://github.com/apache/spark/pull/40187

> Logic error for StateStore.validateStateRowFormat
> -
>
> Key: SPARK-42572
> URL: https://issues.apache.org/jira/browse/SPARK-42572
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Wei Liu
>Priority: Major
>
> SPARK-42484 changed the logic that decides whether to check the state store
> format in StateStore.validateStateRowFormat. Revert it and add a unit test to
> make sure this won't happen again.






[jira] [Assigned] (SPARK-42572) Logic error for StateStore.validateStateRowFormat

2023-02-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42572:


Assignee: Apache Spark

> Logic error for StateStore.validateStateRowFormat
> -
>
> Key: SPARK-42572
> URL: https://issues.apache.org/jira/browse/SPARK-42572
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Wei Liu
>Assignee: Apache Spark
>Priority: Major
>
> SPARK-42484 changed the logic that decides whether to check the state store
> format in StateStore.validateStateRowFormat. Revert it and add a unit test to
> make sure this won't happen again.






[jira] [Assigned] (SPARK-42572) Logic error for StateStore.validateStateRowFormat

2023-02-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42572:


Assignee: (was: Apache Spark)

> Logic error for StateStore.validateStateRowFormat
> -
>
> Key: SPARK-42572
> URL: https://issues.apache.org/jira/browse/SPARK-42572
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Wei Liu
>Priority: Major
>
> SPARK-42484 changed the logic that decides whether to check the state store
> format in StateStore.validateStateRowFormat. Revert it and add a unit test to
> make sure this won't happen again.






[jira] [Commented] (SPARK-42572) Logic error for StateStore.validateStateRowFormat

2023-02-26 Thread Wei Liu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693834#comment-17693834
 ] 

Wei Liu commented on SPARK-42572:
-

I'm not sure what the correct process is here.

We should still keep some of the changes from #40073 (especially the logging part).

I've created a PR for the fix: [https://github.com/apache/spark/pull/40187]

But I could also revert it and combine the two PRs if that's the correct flow.

> Logic error for StateStore.validateStateRowFormat
> -
>
> Key: SPARK-42572
> URL: https://issues.apache.org/jira/browse/SPARK-42572
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Wei Liu
>Priority: Major
>
> SPARK-42484 changed the logic that decides whether to check the state store
> format in StateStore.validateStateRowFormat. Revert it and add a unit test to
> make sure this won't happen again.






[jira] [Created] (SPARK-42596) [YARN] OMP_NUM_THREADS not set to number of executor cores by default

2023-02-26 Thread John Zhuge (Jira)
John Zhuge created SPARK-42596:
--

 Summary: [YARN] OMP_NUM_THREADS not set to number of executor 
cores by default
 Key: SPARK-42596
 URL: https://issues.apache.org/jira/browse/SPARK-42596
 Project: Spark
  Issue Type: Bug
  Components: PySpark, YARN
Affects Versions: 3.3.2
Reporter: John Zhuge


Run this PySpark script with `spark.executor.cores=1`
{code:python}
import os
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf

spark = SparkSession.builder.getOrCreate()

var_name = 'OMP_NUM_THREADS'

def get_env_var():
  return os.getenv(var_name)

udf_get_env_var = udf(get_env_var)
spark.range(1).toDF("id").withColumn(f"env_{var_name}", 
udf_get_env_var()).show(truncate=False)
{code}
Output with release `3.3.2`:
{noformat}
+---+-------------------+
|id |env_OMP_NUM_THREADS|
+---+-------------------+
|0  |null               |
+---+-------------------+
{noformat}
Output with release `3.3.0`:
{noformat}
+---+-------------------+
|id |env_OMP_NUM_THREADS|
+---+-------------------+
|0  |1                  |
+---+-------------------+
{noformat}






[jira] [Created] (SPARK-42595) Support querying inserted partitions after inserting data into a table when hive.exec.dynamic.partition=true

2023-02-26 Thread zhang haoyan (Jira)
zhang haoyan created SPARK-42595:


 Summary: Support querying inserted partitions after inserting data into 
a table when hive.exec.dynamic.partition=true
 Key: SPARK-42595
 URL: https://issues.apache.org/jira/browse/SPARK-42595
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.5.0
Reporter: zhang haoyan


When hive.exec.dynamic.partition=true and 
hive.exec.dynamic.partition.mode=nonstrict, we can insert into a table with SQL 
like 'insert overwrite table aaa partition(dt) select ...'. Of course, we can 
tell which partitions were inserted from the SQL itself, but if we want to do 
something generic, we need a common way to get the inserted partitions, for 
example:

    spark.sql("insert overwrite table aaa partition(dt) select ...")  // insert table

    val partitions = getInsertedPartitions()  // need some way to get the inserted partitions

    monitorInsertedPartitions(partitions)  // do something generic

Since an insert statement should not return any data, this ticket proposes to 
introduce:

spark.hive.exec.dynamic.partition.savePartitions=true (default false)
spark.hive.exec.dynamic.partition.savePartitions.tableNamePrefix=hive_dynamic_inserted_partitions

When spark.hive.exec.dynamic.partition.savePartitions=true, we save the 
inserted partitions to the temporary view 
$spark.hive.exec.dynamic.partition.savePartitions.tableNamePrefix_$dbName_$tableName.

We will allow the user to do this:

scala> spark.conf.set("hive.exec.dynamic.partition", true)

scala> spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")

scala> spark.conf.set("spark.hive.exec.dynamic.partition.savePartitions", true)

scala> spark.sql("insert overwrite table db1.test_partition_table partition (dt) select 1, '2023-02-22'").show(false)
++
||
++
++

scala> spark.sql("select * from hive_dynamic_inserted_partitions_db1_test_partition_table").show(false)
+----------+
|dt        |
+----------+
|2023-02-22|
+----------+
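
For comparison, the only generic workaround available today that I know of (a
rough sketch; `inserted_partitions` is a hypothetical helper, and the table
name is the one from the example above) is to diff SHOW PARTITIONS around the
insert:
{code:python}
# Sketch: recover the partitions touched by a dynamic-partition insert by
# diffing SHOW PARTITIONS before and after the statement.
def inserted_partitions(spark, table, insert_sql):
    show = f"SHOW PARTITIONS {table}"
    before = {row.partition for row in spark.sql(show).collect()}
    spark.sql(insert_sql)
    after = {row.partition for row in spark.sql(show).collect()}
    # Caveat: partitions that already existed and were merely overwritten do
    # not show up in the diff, which is part of the motivation for this ticket.
    return after - before

new_parts = inserted_partitions(
    spark,
    "db1.test_partition_table",
    "insert overwrite table db1.test_partition_table partition (dt) "
    "select 1, '2023-02-22'",
)
{code}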






[jira] [Resolved] (SPARK-42594) Spark cannot read the latest view SQL when `create or replace view` is run by Hive

2023-02-26 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang resolved SPARK-42594.
-
Resolution: Not A Bug

> Spark cannot read the latest view SQL when `create or replace view` is run by Hive
> -
>
> Key: SPARK-42594
> URL: https://issues.apache.org/jira/browse/SPARK-42594
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: ming95
>Priority: Major
> Attachments: image-2023-02-27-13-31-20-420.png
>
>
> 1. Spark saves the view schema as a table param.
> 2. Spark uses the table param as the output schema when selecting from the view.
> 3. Hive does not update the table param when running `create or replace view`
> to update the view.
> !image-2023-02-27-13-31-20-420.png!
> So when Hive and Spark are mixed to update the view, Spark may ignore some
> columns.
> To reproduce this issue:
> 1. run in Spark:
> ```
> create table test_spark (id string);
> create view test_spark_view as select id from test_spark;
> ```
> 2. run in Hive:
> ```
> create or replace view test_spark_view as select id , "test" as new_id from
> test_spark;
> ```
> 3. We can see that Spark ignores `test_spark_view#new_id` when selecting from
> test_spark_view in Spark, but Hive can read it.
> I'm not sure if this is intended behavior in Spark.
>  






[jira] [Reopened] (SPARK-42594) Spark cannot read the latest view SQL when `create or replace view` is run by Hive

2023-02-26 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang reopened SPARK-42594:
-

> Spark cannot read the latest view SQL when `create or replace view` is run by Hive
> -
>
> Key: SPARK-42594
> URL: https://issues.apache.org/jira/browse/SPARK-42594
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: ming95
>Priority: Major
> Attachments: image-2023-02-27-13-31-20-420.png
>
>
> 1. Spark saves the view schema as a table param.
> 2. Spark uses the table param as the output schema when selecting from the view.
> 3. Hive does not update the table param when running `create or replace view`
> to update the view.
> !image-2023-02-27-13-31-20-420.png!
> So when Hive and Spark are mixed to update the view, Spark may ignore some
> columns.
> To reproduce this issue:
> 1. run in Spark:
> ```
> create table test_spark (id string);
> create view test_spark_view as select id from test_spark;
> ```
> 2. run in Hive:
> ```
> create or replace view test_spark_view as select id , "test" as new_id from
> test_spark;
> ```
> 3. We can see that Spark ignores `test_spark_view#new_id` when selecting from
> test_spark_view in Spark, but Hive can read it.
> I'm not sure if this is intended behavior in Spark.
>  






[jira] [Resolved] (SPARK-42594) Spark cannot read the latest view SQL when `create or replace view` is run by Hive

2023-02-26 Thread zzzzming95 (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ming95 resolved SPARK-42594.

Resolution: Fixed

> Spark cannot read the latest view SQL when `create or replace view` is run by Hive
> -
>
> Key: SPARK-42594
> URL: https://issues.apache.org/jira/browse/SPARK-42594
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: ming95
>Priority: Major
> Attachments: image-2023-02-27-13-31-20-420.png
>
>
> 1. Spark saves the view schema as a table param.
> 2. Spark uses the table param as the output schema when selecting from the view.
> 3. Hive does not update the table param when running `create or replace view`
> to update the view.
> !image-2023-02-27-13-31-20-420.png!
> So when Hive and Spark are mixed to update the view, Spark may ignore some
> columns.
> To reproduce this issue:
> 1. run in Spark:
> ```
> create table test_spark (id string);
> create view test_spark_view as select id from test_spark;
> ```
> 2. run in Hive:
> ```
> create or replace view test_spark_view as select id , "test" as new_id from
> test_spark;
> ```
> 3. We can see that Spark ignores `test_spark_view#new_id` when selecting from
> test_spark_view in Spark, but Hive can read it.
> I'm not sure if this is intended behavior in Spark.
>  






[jira] [Commented] (SPARK-42594) Spark cannot read the latest view SQL when `create or replace view` is run by Hive

2023-02-26 Thread zzzzming95 (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693810#comment-17693810
 ] 

ming95 commented on SPARK-42594:


OK, thanks~ [~yumwang]

> Spark cannot read the latest view SQL when `create or replace view` is run by Hive
> -
>
> Key: SPARK-42594
> URL: https://issues.apache.org/jira/browse/SPARK-42594
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: ming95
>Priority: Major
> Attachments: image-2023-02-27-13-31-20-420.png
>
>
> 1. Spark saves the view schema as a table param.
> 2. Spark uses the table param as the output schema when selecting from the view.
> 3. Hive does not update the table param when running `create or replace view`
> to update the view.
> !image-2023-02-27-13-31-20-420.png!
> So when Hive and Spark are mixed to update the view, Spark may ignore some
> columns.
> To reproduce this issue:
> 1. run in Spark:
> ```
> create table test_spark (id string);
> create view test_spark_view as select id from test_spark;
> ```
> 2. run in Hive:
> ```
> create or replace view test_spark_view as select id , "test" as new_id from
> test_spark;
> ```
> 3. We can see that Spark ignores `test_spark_view#new_id` when selecting from
> test_spark_view in Spark, but Hive can read it.
> I'm not sure if this is intended behavior in Spark.
>  






[jira] [Resolved] (SPARK-42528) Optimize PercentileHeap

2023-02-26 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-42528.
-
Fix Version/s: 3.5.0
   (was: 3.4.0)
   Resolution: Fixed

Issue resolved by pull request 40121
[https://github.com/apache/spark/pull/40121]

> Optimize PercentileHeap
> ---
>
> Key: SPARK-42528
> URL: https://issues.apache.org/jira/browse/SPARK-42528
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Alkis Evlogimenos
>Assignee: Alkis Evlogimenos
>Priority: Major
> Fix For: 3.5.0
>
>
> PercentileHeap is not fast enough when used inside the scheduler for
> estimations, which slows down the scheduling rate and, as a result, query
> execution time.
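
For context, the classic way to keep a running percentile cheap is the two-heap
technique. The sketch below is a plain-Python illustration of the idea for the
median, not Spark's actual PercentileHeap:
{code:python}
import heapq

class MedianHeap:
    """Running median via two heaps: lower half in a max-heap (negated
    values), upper half in a min-heap. Illustrative only."""

    def __init__(self):
        self.lo = []  # max-heap of the lower half (values stored negated)
        self.hi = []  # min-heap of the upper half

    def add(self, x):
        # Route through lo so every element of lo stays <= every element of hi.
        heapq.heappush(self.lo, -x)
        heapq.heappush(self.hi, -heapq.heappop(self.lo))
        # Rebalance so lo holds at least as many elements as hi.
        if len(self.lo) < len(self.hi):
            heapq.heappush(self.lo, -heapq.heappop(self.hi))

    def median(self):
        return -self.lo[0]  # O(1) read; each insert is O(log n)
{code}
Skewing the balance point between the two heaps gives percentiles other than
the 50th.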






[jira] [Commented] (SPARK-42594) Spark cannot read the latest view SQL when `create or replace view` is run by Hive

2023-02-26 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693799#comment-17693799
 ] 

Yuming Wang commented on SPARK-42594:
-

Spark saves this information in table properties, and Hive does not update it.
Please avoid updating view definitions through Hive.
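
The stale metadata is easy to observe directly; for the repro in this ticket,
something like the following (a quick check, not a fix) shows the schema Spark
recorded when the view was created:
{code:python}
# Spark stores the view's output schema in table properties at CREATE VIEW
# time; after Hive replaces the view, these properties are stale.
spark.sql("SHOW TBLPROPERTIES test_spark_view").show(truncate=False)
{code}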

> Spark cannot read the latest view SQL when `create or replace view` is run by Hive
> -
>
> Key: SPARK-42594
> URL: https://issues.apache.org/jira/browse/SPARK-42594
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: ming95
>Priority: Major
> Attachments: image-2023-02-27-13-31-20-420.png
>
>
> 1. Spark saves the view schema as a table param.
> 2. Spark uses the table param as the output schema when selecting from the view.
> 3. Hive does not update the table param when running `create or replace view`
> to update the view.
> !image-2023-02-27-13-31-20-420.png!
> So when Hive and Spark are mixed to update the view, Spark may ignore some
> columns.
> To reproduce this issue:
> 1. run in Spark:
> ```
> create table test_spark (id string);
> create view test_spark_view as select id from test_spark;
> ```
> 2. run in Hive:
> ```
> create or replace view test_spark_view as select id , "test" as new_id from
> test_spark;
> ```
> 3. We can see that Spark ignores `test_spark_view#new_id` when selecting from
> test_spark_view in Spark, but Hive can read it.
> I'm not sure if this is intended behavior in Spark.
>  






[jira] [Commented] (SPARK-42594) Spark cannot read the latest view SQL when `create or replace view` is run by Hive

2023-02-26 Thread zzzzming95 (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693795#comment-17693795
 ] 

ming95 commented on SPARK-42594:


[~yumwang] [~gurwls223] gentle ping ~

> Spark cannot read the latest view SQL when `create or replace view` is run by Hive
> -
>
> Key: SPARK-42594
> URL: https://issues.apache.org/jira/browse/SPARK-42594
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: ming95
>Priority: Major
> Attachments: image-2023-02-27-13-31-20-420.png
>
>
> 1. Spark saves the view schema as a table param.
> 2. Spark uses the table param as the output schema when selecting from the view.
> 3. Hive does not update the table param when running `create or replace view`
> to update the view.
> !image-2023-02-27-13-31-20-420.png!
> So when Hive and Spark are mixed to update the view, Spark may ignore some
> columns.
> To reproduce this issue:
> 1. run in Spark:
> ```
> create table test_spark (id string);
> create view test_spark_view as select id from test_spark;
> ```
> 2. run in Hive:
> ```
> create or replace view test_spark_view as select id , "test" as new_id from
> test_spark;
> ```
> 3. We can see that Spark ignores `test_spark_view#new_id` when selecting from
> test_spark_view in Spark, but Hive can read it.
> I'm not sure if this is intended behavior in Spark.
>  






[jira] [Updated] (SPARK-42594) Spark cannot read the latest view SQL when `create or replace view` is run by Hive

2023-02-26 Thread zzzzming95 (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ming95 updated SPARK-42594:
---
Description: 
1. Spark would save view schema as tabel param. 
2. Spark will make tabel param as output schema when select the view .
3. Hive will not update tabel param when runing `create or replace view` to 
update the view.

!image-2023-02-27-13-31-20-420.png!

So when hive and spark are mixed and update the view, spark may ignore some col.

To reproduce this issue:

1. running in spark
```
create table test_spark (id string);
create view test_spark_view as select id from test_spark;
```

2. running in hive

```
create or replace view test_spark_view as select id , "test" as new_id from 
test_spark;

```

3. We can see spark will ignore `test_spark_view#new_id` when select 
test_spark_view using spark. But hive can read it.

I'm not sure if this is a feature of spark.

 

  was:
1. Spark saves the view schema as a table param.
2. Spark uses the table param as the output schema when selecting from the view.
3. Hive does not update the table param when running `create or replace view`
to update the view.

So when Hive and Spark are mixed to update the view, Spark may ignore some
strings.

To reproduce this issue:

1. run in Spark:
```
create table test_spark (id string);
create view test_spark_view as select id from test_spark;
```

2. run in Hive:
```
create or replace view test_spark_view as select id , "test" as new_id from
test_spark;
```

3. We can see that Spark ignores `test_spark_view#new_id` when selecting from
test_spark_view in Spark, but Hive can read it.


> Spark cannot read the latest view SQL when `create or replace view` is run by Hive
> -
>
> Key: SPARK-42594
> URL: https://issues.apache.org/jira/browse/SPARK-42594
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: ming95
>Priority: Major
> Attachments: image-2023-02-27-13-31-20-420.png
>
>
> 1. Spark saves the view schema as a table param.
> 2. Spark uses the table param as the output schema when selecting from the view.
> 3. Hive does not update the table param when running `create or replace view`
> to update the view.
> !image-2023-02-27-13-31-20-420.png!
> So when Hive and Spark are mixed to update the view, Spark may ignore some
> columns.
> To reproduce this issue:
> 1. run in Spark:
> ```
> create table test_spark (id string);
> create view test_spark_view as select id from test_spark;
> ```
> 2. run in Hive:
> ```
> create or replace view test_spark_view as select id , "test" as new_id from
> test_spark;
> ```
> 3. We can see that Spark ignores `test_spark_view#new_id` when selecting from
> test_spark_view in Spark, but Hive can read it.
> I'm not sure if this is intended behavior in Spark.
>  






[jira] [Updated] (SPARK-42594) Spark cannot read the latest view SQL when `create or replace view` is run by Hive

2023-02-26 Thread zzzzming95 (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ming95 updated SPARK-42594:
---
Attachment: image-2023-02-27-13-31-20-420.png

> Spark cannot read the latest view SQL when `create or replace view` is run by Hive
> -
>
> Key: SPARK-42594
> URL: https://issues.apache.org/jira/browse/SPARK-42594
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: ming95
>Priority: Major
> Attachments: image-2023-02-27-13-31-20-420.png
>
>
> 1. Spark saves the view schema as a table param.
> 2. Spark uses the table param as the output schema when selecting from the view.
> 3. Hive does not update the table param when running `create or replace view`
> to update the view.
> So when Hive and Spark are mixed to update the view, Spark may ignore some
> strings.
> To reproduce this issue:
> 1. run in Spark:
> ```
> create table test_spark (id string);
> create view test_spark_view as select id from test_spark;
> ```
> 2. run in Hive:
> ```
> create or replace view test_spark_view as select id , "test" as new_id from
> test_spark;
> ```
> 3. We can see that Spark ignores `test_spark_view#new_id` when selecting from
> test_spark_view in Spark, but Hive can read it.






[jira] [Created] (SPARK-42594) Spark cannot read the latest view SQL when `create or replace view` is run by Hive

2023-02-26 Thread zzzzming95 (Jira)
ming95 created SPARK-42594:
--

 Summary: Spark cannot read the latest view SQL when `create or 
replace view` is run by Hive
 Key: SPARK-42594
 URL: https://issues.apache.org/jira/browse/SPARK-42594
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.2
Reporter: ming95


1. Spark saves the view schema as a table param.
2. Spark uses the table param as the output schema when selecting from the view.
3. Hive does not update the table param when running `create or replace view`
to update the view.

So when Hive and Spark are mixed to update the view, Spark may ignore some
strings.

To reproduce this issue:

1. run in Spark:
```
create table test_spark (id string);
create view test_spark_view as select id from test_spark;
```

2. run in Hive:
```
create or replace view test_spark_view as select id , "test" as new_id from
test_spark;
```

3. We can see that Spark ignores `test_spark_view#new_id` when selecting from
test_spark_view in Spark, but Hive can read it.






[jira] [Updated] (SPARK-42593) Deprecate the APIs that will be removed in pandas 2.0.

2023-02-26 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-42593:

Description: 
pandas is preparing to release 2.0, which includes a bunch of API changes. 
([https://pandas.pydata.org/pandas-docs/version/2.0/whatsnew/v2.0.0.html#removal-of-prior-version-deprecations-changes])

We should also deprecate the affected APIs so that we can remove them in the next release.

  was:
pandas is preparing to release 2.0, which includes a bunch of API changes.

We should also deprecate the affected APIs so that we can remove them in the next release.


> Deprecate the APIs that will be removed in pandas 2.0.
> --
>
> Key: SPARK-42593
> URL: https://issues.apache.org/jira/browse/SPARK-42593
> Project: Spark
>  Issue Type: New Feature
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> pandas is preparing to release 2.0, which includes a bunch of API changes. 
> ([https://pandas.pydata.org/pandas-docs/version/2.0/whatsnew/v2.0.0.html#removal-of-prior-version-deprecations-changes])
> We should also deprecate the affected APIs so that we can remove them in the 
> next release.






[jira] [Commented] (SPARK-42593) Deprecate the APIs that will be removed in pandas 2.0.

2023-02-26 Thread Haejoon Lee (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693788#comment-17693788
 ] 

Haejoon Lee commented on SPARK-42593:
-

I'm taking a look at this one.

Will submit a PR soon.

> Deprecate the APIs that will be removed in pandas 2.0.
> --
>
> Key: SPARK-42593
> URL: https://issues.apache.org/jira/browse/SPARK-42593
> Project: Spark
>  Issue Type: New Feature
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> pandas is preparing to release 2.0, which includes a bunch of API changes.
> We should also deprecate the affected APIs so that we can remove them in the 
> next release.






[jira] [Created] (SPARK-42593) Deprecate the APIs that will be removed in pandas 2.0.

2023-02-26 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-42593:
---

 Summary: Deprecate the APIs that will be removed in pandas 2.0.
 Key: SPARK-42593
 URL: https://issues.apache.org/jira/browse/SPARK-42593
 Project: Spark
  Issue Type: New Feature
  Components: Pandas API on Spark
Affects Versions: 3.4.0
Reporter: Haejoon Lee


pandas is preparing to release 2.0, which includes a bunch of API changes.

We should also deprecate the affected APIs so that we can remove them in the next release.
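
Deprecating here generally means emitting a FutureWarning ahead of the removal.
A minimal sketch of the pattern (illustrative only; `deprecated` is a
hypothetical helper, not the actual pandas-on-Spark code):
{code:python}
import functools
import warnings

def deprecated(since, removal):
    """Illustrative decorator that flags an API as deprecated."""
    def wrap(func):
        @functools.wraps(func)
        def inner(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated since {since} and will be "
                f"removed in {removal}; see the pandas 2.0 release notes.",
                FutureWarning,
                stacklevel=2,
            )
            return func(*args, **kwargs)
        return inner
    return wrap
{code}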






[jira] [Commented] (SPARK-42592) Document SS guide doc for supporting multiple stateful operators (especially chained aggregations)

2023-02-26 Thread Jungtaek Lim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693786#comment-17693786
 ] 

Jungtaek Lim commented on SPARK-42592:
--

Ideally this should be part of Spark 3.4.0. Since an RC has already happened, 
I'll try to see whether I can add the doc before RC2.

> Document SS guide doc for supporting multiple stateful operators (especially 
> chained aggregations)
> --
>
> Key: SPARK-42592
> URL: https://issues.apache.org/jira/browse/SPARK-42592
> Project: Spark
>  Issue Type: Documentation
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> We made a change to the guide doc for SPARK-40925 via SPARK-42105, but in 
> SPARK-42105 we only removed the section on the "limitation of global 
> watermark". That said, we haven't provided any example of the new 
> functionality; in particular, users need to know about the change to the SQL 
> function (window) in chained time window aggregations.
> In this ticket, we will add an example of chained time window aggregations, 
> introducing the new functionality of the SQL function.






[jira] [Created] (SPARK-42592) Document SS guide doc for supporting multiple stateful operators (especially chained aggregations)

2023-02-26 Thread Jungtaek Lim (Jira)
Jungtaek Lim created SPARK-42592:


 Summary: Document SS guide doc for supporting multiple stateful 
operators (especially chained aggregations)
 Key: SPARK-42592
 URL: https://issues.apache.org/jira/browse/SPARK-42592
 Project: Spark
  Issue Type: Documentation
  Components: Structured Streaming
Affects Versions: 3.5.0
Reporter: Jungtaek Lim


We made a change to the guide doc for SPARK-40925 via SPARK-42105, but in 
SPARK-42105 we only removed the section on the "limitation of global 
watermark". That said, we haven't provided any example of the new 
functionality; in particular, users need to know about the change to the SQL 
function (window) in chained time window aggregations.

In this ticket, we will add an example of chained time window aggregations, 
introducing the new functionality of the SQL function.
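
For reference, the example to document might look roughly like this (a sketch
assuming a streaming DataFrame `events` with an `eventTime` timestamp column,
and Spark 3.4+ where the window function also accepts a window column):
{code:python}
from pyspark.sql import functions as F

# First tumbling-window aggregation.
counts_5m = (
    events
    .withWatermark("eventTime", "10 minutes")
    .groupBy(F.window("eventTime", "5 minutes"))
    .count()
)

# Chained aggregation: re-window on the first window column directly,
# which is the change to the window SQL function mentioned above.
counts_1h = (
    counts_5m
    .groupBy(F.window(F.col("window"), "1 hour"))
    .agg(F.sum("count").alias("count"))
)
{code}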






[jira] [Created] (SPARK-42591) Document SS guide doc for introducing watermark propagation among operators

2023-02-26 Thread Jungtaek Lim (Jira)
Jungtaek Lim created SPARK-42591:


 Summary: Document SS guide doc for introducing watermark 
propagation among operators
 Key: SPARK-42591
 URL: https://issues.apache.org/jira/browse/SPARK-42591
 Project: Spark
  Issue Type: Documentation
  Components: Structured Streaming
Affects Versions: 3.5.0
Reporter: Jungtaek Lim


Once SPARK-42376 is merged, we will also want to provide an example of using a 
stream-stream time-interval join followed by a streaming aggregation. Just 
adding the feature without proper documentation may mean that no one even knows 
it is supported.
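
A sketch of the pattern the doc would cover (assuming streaming DataFrames
`impressions` and `clicks` with the named id and timestamp columns):
{code:python}
from pyspark.sql import functions as F

# Stream-stream time-interval join...
joined = (
    impressions.withWatermark("impressionTime", "2 hours")
    .join(
        clicks.withWatermark("clickTime", "3 hours"),
        F.expr("""
            clickAdId = impressionAdId AND
            clickTime >= impressionTime AND
            clickTime <= impressionTime + interval 1 hour
        """),
    )
)

# ...followed by a streaming aggregation, the combination SPARK-42376 enables.
counts = joined.groupBy(F.window("clickTime", "10 minutes")).count()
{code}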






[jira] [Commented] (SPARK-42581) Add SparkSession implicits

2023-02-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693782#comment-17693782
 ] 

Apache Spark commented on SPARK-42581:
--

User 'hvanhovell' has created a pull request for this issue:
https://github.com/apache/spark/pull/40186

> Add SparkSession implicits
> --
>
> Key: SPARK-42581
> URL: https://issues.apache.org/jira/browse/SPARK-42581
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
>







[jira] [Updated] (SPARK-42362) Upgrade kubernetes-client from 6.4.0 to 6.4.1

2023-02-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-42362:
--
Affects Version/s: 3.4.0
   (was: 3.5.0)

> Upgrade kubernetes-client from 6.4.0 to 6.4.1
> -
>
> Key: SPARK-42362
> URL: https://issues.apache.org/jira/browse/SPARK-42362
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Bjørn Jørgensen
>Assignee: Bjørn Jørgensen
>Priority: Minor
> Fix For: 3.4.0
>
>
> New version of kubernetes client 
> Release notes 
> https://github.com/fabric8io/kubernetes-client/releases/tag/v6.4.1






[jira] [Commented] (SPARK-42586) Implement RuntimeConf

2023-02-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693777#comment-17693777
 ] 

Apache Spark commented on SPARK-42586:
--

User 'hvanhovell' has created a pull request for this issue:
https://github.com/apache/spark/pull/40185

> Implement RuntimeConf
> -
>
> Key: SPARK-42586
> URL: https://issues.apache.org/jira/browse/SPARK-42586
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
>
> Implement RuntimeConf for the Scala Client






[jira] [Updated] (SPARK-42497) Support of pandas API on Spark for Spark Connect.

2023-02-26 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-42497:

Affects Version/s: 3.5.0
   (was: 3.4.0)

> Support of pandas API on Spark for Spark Connect.
> -
>
> Key: SPARK-42497
> URL: https://issues.apache.org/jira/browse/SPARK-42497
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should enable `pandas API on Spark` on Spark Connect.






[jira] [Commented] (SPARK-42569) Throw unsupported exceptions for non-supported API

2023-02-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693776#comment-17693776
 ] 

Apache Spark commented on SPARK-42569:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/40184

> Throw unsupported exceptions for non-supported API
> --
>
> Key: SPARK-42569
> URL: https://issues.apache.org/jira/browse/SPARK-42569
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Commented] (SPARK-42569) Throw unsupported exceptions for non-supported API

2023-02-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693775#comment-17693775
 ] 

Apache Spark commented on SPARK-42569:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/40184

> Throw unsupported exceptions for non-supported API
> --
>
> Key: SPARK-42569
> URL: https://issues.apache.org/jira/browse/SPARK-42569
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Resolved] (SPARK-42589) Exclude `RelationalGroupedDataset.apply` from CompatibilitySuite

2023-02-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-42589.
---
Resolution: Cannot Reproduce

> Exclude `RelationalGroupedDataset.apply` from CompatibilitySuite
> 
>
> Key: SPARK-42589
> URL: https://issues.apache.org/jira/browse/SPARK-42589
> Project: Spark
>  Issue Type: Test
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Updated] (SPARK-40821) Introduce window_time function to extract event time from the window column

2023-02-26 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim updated SPARK-40821:
-
Summary: Introduce window_time function to extract event time from the 
window column  (was: Fix late record filtering to support chaining of stateful 
operators)

> Introduce window_time function to extract event time from the window column
> ---
>
> Key: SPARK-40821
> URL: https://issues.apache.org/jira/browse/SPARK-40821
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Alex Balikov
>Assignee: Alex Balikov
>Priority: Major
> Fix For: 3.4.0
>
>
> Currently, chaining of stateful operators in Spark Structured Streaming is not 
> supported for various reasons and is blocked by the unsupported-operations 
> check (the spark.sql.streaming.unsupportedOperationCheck flag). We propose to 
> fix this, as chaining of stateful operators is a common streaming scenario, e.g.
> stream-stream join -> windowed aggregation
> window aggregation -> window aggregation
> etc.
> What is broken:
>  # every stateful operator performs late record filtering against the global 
> watermark. When chaining stateful operators (e.g. window aggregations), the 
> output produced by the first stateful operator is effectively late against 
> the watermark and thus filtered out by the next operator's late record 
> filtering (technically the next operator should not do late record filtering, 
> but it can be changed to assert for correctness detection, etc.)
>  # when chaining window aggregations, the first window aggregating operator 
> produces records with schema \{ window: { start: Timestamp, end: Timestamp }, 
> agg: Long } - there is no explicit event time in the schema to be used by 
> the next stateful operator (the correct event time should be window.end - 1)
>  # stream-stream time-interval join can produce late records by semantics, 
> e.g. if the join condition is
> left.eventTime BETWEEN right.eventTime - INTERVAL 1 HOUR AND right.eventTime 
> + INTERVAL 1 HOUR
> the produced records can be delayed by 1 hr relative to the watermark.
> Proposed fixes:
> 1. No. 1 can be fixed by performing late record filtering against the previous 
> microbatch watermark instead of the current microbatch watermark.
> 2. No. 2 can be fixed by allowing the window and session_window functions to 
> work on the window column directly and compute the correct event time 
> transparently to the user. Also, introduce a window_time SQL function to 
> compute the correct event time from the window column.
> 3. No. 3 can be fixed by adding support for per-operator watermarks instead of 
> a single global watermark. In the example of a stream-stream time-interval 
> join followed by a stateful operator, the join operator will 'delay' the 
> downstream operator watermarks by a correct value to handle the delayed 
> records. Only stream-stream time-interval joins will be delaying the 
> watermark; any other operators will not delay downstream watermarks.
>  
> *This ticket handles no. 2 of the proposal.* The others will be handled in 
> separate tickets.
>  
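
For reference, a minimal sketch of how the function introduced here is used
(assuming Spark 3.4+ and a DataFrame `df` with an `eventTime` timestamp
column):
{code:python}
from pyspark.sql import functions as F

windowed = df.groupBy(F.window("eventTime", "5 minutes")).count()

# window_time() extracts the event time of a window column
# (window.end - 1 microsecond), so the result can feed the next
# stateful operator.
with_event_time = windowed.select(
    F.window_time("window").alias("eventTime"),
    "count",
)
{code}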






[jira] [Commented] (SPARK-42587) Use wrapper versions for SBT and Maven in `connect` module tests

2023-02-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693772#comment-17693772
 ] 

Apache Spark commented on SPARK-42587:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40183

> Use wrapper versions for SBT and Maven in `connect` module tests
> 
>
> Key: SPARK-42587
> URL: https://issues.apache.org/jira/browse/SPARK-42587
> Project: Spark
>  Issue Type: Test
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 3.4.0
>
>







[jira] [Commented] (SPARK-42538) `functions#lit` support more types

2023-02-26 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693770#comment-17693770
 ] 

Yang Jie commented on SPARK-42538:
--

Got it

> `functions#lit` support more types 
> ---
>
> Key: SPARK-42538
> URL: https://issues.apache.org/jira/browse/SPARK-42538
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
> Fix For: 3.4.1
>
>







[jira] [Created] (SPARK-42590) Introduce Decimal128 as the physical type for DecimalType

2023-02-26 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-42590:
--

 Summary: Introduce Decimal128 as the physical type for DecimalType
 Key: SPARK-42590
 URL: https://issues.apache.org/jira/browse/SPARK-42590
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng









[jira] [Commented] (SPARK-42538) `functions#lit` support more types

2023-02-26 Thread Jira


[ 
https://issues.apache.org/jira/browse/SPARK-42538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693768#comment-17693768
 ] 

Herman van Hövell commented on SPARK-42538:
---

It technically could be released without it. Retargeting closed issues should 
be a part of the RC process.

> `functions#lit` support more types 
> ---
>
> Key: SPARK-42538
> URL: https://issues.apache.org/jira/browse/SPARK-42538
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
> Fix For: 3.4.1
>
>







[jira] [Commented] (SPARK-42588) collapse two adjacent windows with the equivalent partition/order expression

2023-02-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693766#comment-17693766
 ] 

Apache Spark commented on SPARK-42588:
--

User 'zml1206' has created a pull request for this issue:
https://github.com/apache/spark/pull/40182

> collapse two adjacent windows with the equivalent partition/order expression
> 
>
> Key: SPARK-42588
> URL: https://issues.apache.org/jira/browse/SPARK-42588
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.3, 3.3.2
>Reporter: zhuml
>Priority: Major
>
> Extend the CollapseWindow rule to collapse Window nodes with the equivalent 
> partition/order expressions
> {code:java}
> Seq((1, 1), (2, 2)).toDF("a", "b")
>   .withColumn("max_b", expr("max(b) OVER (PARTITION BY abs(a))"))
>   .withColumn("min_b", expr("min(b) OVER (PARTITION BY abs(a))"))
> == Optimized Logical Plan ==
> before
> Project [a#7, b#8, max_b#11, min_b#17]
> +- Window [min(b#8) windowspecdefinition(_w0#19, 
> specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) 
> AS min_b#17], [_w0#19]
>+- Project [a#7, b#8, max_b#11, abs(a#7) AS _w0#19]
>   +- Window [max(b#8) windowspecdefinition(_w0#13, 
> specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) 
> AS max_b#11], [_w0#13]
>  +- Project [_1#2 AS a#7, _2#3 AS b#8, abs(_1#2) AS _w0#13]
> +- LocalRelation [_1#2, _2#3]
> after
> Project [a#7, b#8, max_b#11, min_b#17]
> +- Window [max(b#8) windowspecdefinition(_w0#13, 
> specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) 
> AS max_b#11, min(b#8) windowspecdefinition(_w0#13, 
> specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) 
> AS min_b#17], [_w0#13]
>+- Project [_1#2 AS a#7, _2#3 AS b#8, abs(_1#2) AS _w0#13]
>   +- LocalRelation [_1#2, _2#3]
> {code}
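
A quick way to reproduce the report and check whether the two Window nodes get
collapsed (a PySpark sketch of the Scala snippet above, assuming an active
`spark` session):
{code:python}
from pyspark.sql import functions as F

df = (
    spark.createDataFrame([(1, 1), (2, 2)], ["a", "b"])
    .withColumn("max_b", F.expr("max(b) OVER (PARTITION BY abs(a))"))
    .withColumn("min_b", F.expr("min(b) OVER (PARTITION BY abs(a))"))
)

# Inspect the optimized logical plan; with the extended rule a single
# Window node should remain.
df.explain(mode="extended")
{code}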






[jira] [Assigned] (SPARK-42588) collapse two adjacent windows with the equivalent partition/order expression

2023-02-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42588:


Assignee: (was: Apache Spark)

> collapse two adjacent windows with the equivalent partition/order expression
> 
>
> Key: SPARK-42588
> URL: https://issues.apache.org/jira/browse/SPARK-42588
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.3, 3.3.2
>Reporter: zhuml
>Priority: Major
>
> Extend the CollapseWindow rule to collapse Window nodes with the equivalent 
> partition/order expressions
> {code:java}
> Seq((1, 1), (2, 2)).toDF("a", "b")
>   .withColumn("max_b", expr("max(b) OVER (PARTITION BY abs(a))"))
>   .withColumn("min_b", expr("min(b) OVER (PARTITION BY abs(a))"))
> == Optimized Logical Plan ==
> before
> Project [a#7, b#8, max_b#11, min_b#17]
> +- Window [min(b#8) windowspecdefinition(_w0#19, 
> specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) 
> AS min_b#17], [_w0#19]
>+- Project [a#7, b#8, max_b#11, abs(a#7) AS _w0#19]
>   +- Window [max(b#8) windowspecdefinition(_w0#13, 
> specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) 
> AS max_b#11], [_w0#13]
>  +- Project [_1#2 AS a#7, _2#3 AS b#8, abs(_1#2) AS _w0#13]
> +- LocalRelation [_1#2, _2#3]
> after
> Project [a#7, b#8, max_b#11, min_b#17]
> +- Window [max(b#8) windowspecdefinition(_w0#13, 
> specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) 
> AS max_b#11, min(b#8) windowspecdefinition(_w0#13, 
> specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) 
> AS min_b#17], [_w0#13]
>+- Project [_1#2 AS a#7, _2#3 AS b#8, abs(_1#2) AS _w0#13]
>   +- LocalRelation [_1#2, _2#3]
> {code}






[jira] [Assigned] (SPARK-42588) collapse two adjacent windows with the equivalent partition/order expression

2023-02-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42588:


Assignee: Apache Spark

> collapse two adjacent windows with the equivalent partition/order expression
> 
>
> Key: SPARK-42588
> URL: https://issues.apache.org/jira/browse/SPARK-42588
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.3, 3.3.2
>Reporter: zhuml
>Assignee: Apache Spark
>Priority: Major
>
> Extend the CollapseWindow rule to collapse Window nodes with the equivalent 
> partition/order expressions
> {code:java}
> Seq((1, 1), (2, 2)).toDF("a", "b")
>   .withColumn("max_b", expr("max(b) OVER (PARTITION BY abs(a))"))
>   .withColumn("min_b", expr("min(b) OVER (PARTITION BY abs(a))"))
> == Optimized Logical Plan ==
> before
> Project [a#7, b#8, max_b#11, min_b#17]
> +- Window [min(b#8) windowspecdefinition(_w0#19, 
> specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) 
> AS min_b#17], [_w0#19]
>+- Project [a#7, b#8, max_b#11, abs(a#7) AS _w0#19]
>   +- Window [max(b#8) windowspecdefinition(_w0#13, 
> specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) 
> AS max_b#11], [_w0#13]
>  +- Project [_1#2 AS a#7, _2#3 AS b#8, abs(_1#2) AS _w0#13]
> +- LocalRelation [_1#2, _2#3]
> after
> Project [a#7, b#8, max_b#11, min_b#17]
> +- Window [max(b#8) windowspecdefinition(_w0#13, 
> specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) 
> AS max_b#11, min(b#8) windowspecdefinition(_w0#13, 
> specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) 
> AS min_b#17], [_w0#13]
>+- Project [_1#2 AS a#7, _2#3 AS b#8, abs(_1#2) AS _w0#13]
>   +- LocalRelation [_1#2, _2#3]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42589) Exclude `RelationalGroupedDataset.apply` from CompatibilitySuite

2023-02-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693764#comment-17693764
 ] 

Apache Spark commented on SPARK-42589:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40181

> Exclude `RelationalGroupedDataset.apply` from CompatibilitySuite
> 
>
> Key: SPARK-42589
> URL: https://issues.apache.org/jira/browse/SPARK-42589
> Project: Spark
>  Issue Type: Test
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42589) Exclude `RelationalGroupedDataset.apply` from CompatibilitySuite

2023-02-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42589:


Assignee: (was: Apache Spark)

> Exclude `RelationalGroupedDataset.apply` from CompatibilitySuite
> 
>
> Key: SPARK-42589
> URL: https://issues.apache.org/jira/browse/SPARK-42589
> Project: Spark
>  Issue Type: Test
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42588) collapse two adjacent windows with the equivalent partition/order expression

2023-02-26 Thread zhuml (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhuml updated SPARK-42588:
--
Description: 
Extend the CollapseWindow rule to collapse Window nodes with the equivalent 
partition/order expressions

{code:java}
Seq((1, 1), (2, 2)).toDF("a", "b")
  .withColumn("max_b", expr("max(b) OVER (PARTITION BY abs(a))"))
  .withColumn("min_b", expr("min(b) OVER (PARTITION BY abs(a))"))

== Optimized Logical Plan ==
before
Project [a#7, b#8, max_b#11, min_b#17]
+- Window [min(b#8) windowspecdefinition(_w0#19, specifiedwindowframe(RowFrame, 
unboundedpreceding$(), unboundedfollowing$())) AS min_b#17], [_w0#19]
   +- Project [a#7, b#8, max_b#11, abs(a#7) AS _w0#19]
  +- Window [max(b#8) windowspecdefinition(_w0#13, 
specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) 
AS max_b#11], [_w0#13]
 +- Project [_1#2 AS a#7, _2#3 AS b#8, abs(_1#2) AS _w0#13]
+- LocalRelation [_1#2, _2#3]
after
Project [a#7, b#8, max_b#11, min_b#17]
+- Window [max(b#8) windowspecdefinition(_w0#13, specifiedwindowframe(RowFrame, 
unboundedpreceding$(), unboundedfollowing$())) AS max_b#11, min(b#8) 
windowspecdefinition(_w0#13, specifiedwindowframe(RowFrame, 
unboundedpreceding$(), unboundedfollowing$())) AS min_b#17], [_w0#13]
   +- Project [_1#2 AS a#7, _2#3 AS b#8, abs(_1#2) AS _w0#13]
  +- LocalRelation [_1#2, _2#3]
{code}
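
A quick way to check the collapse (a spark-shell sketch; assumes only the imports shown plus the shell's pre-imported implicits) is to count the Window nodes left in the optimized plan:

{code:java}
import org.apache.spark.sql.catalyst.plans.logical.Window
import org.apache.spark.sql.functions.expr

val df = Seq((1, 1), (2, 2)).toDF("a", "b")
  .withColumn("max_b", expr("max(b) OVER (PARTITION BY abs(a))"))
  .withColumn("min_b", expr("min(b) OVER (PARTITION BY abs(a))"))

// Two Window nodes before this change; exactly one once the extended
// CollapseWindow rule merges them.
val windowNodes = df.queryExecution.optimizedPlan.collect { case w: Window => w }
println(windowNodes.size)
{code}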


> collapse two adjacent windows with the equivalent partition/order expression
> 
>
> Key: SPARK-42588
> URL: https://issues.apache.org/jira/browse/SPARK-42588
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.3, 3.3.2
>Reporter: zhuml
>Priority: Major
>
> Extend the CollapseWindow rule to collapse Window nodes with the equivalent 
> partition/order expressions
> {code:java}
> Seq((1, 1), (2, 2)).toDF("a", "b")
>   .withColumn("max_b", expr("max(b) OVER (PARTITION BY abs(a))"))
>   .withColumn("min_b", expr("min(b) OVER (PARTITION BY abs(a))"))
> == Optimized Logical Plan ==
> before
> Project [a#7, b#8, max_b#11, min_b#17]
> +- Window [min(b#8) windowspecdefinition(_w0#19, 
> specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) 
> AS min_b#17], [_w0#19]
>+- Project [a#7, b#8, max_b#11, abs(a#7) AS _w0#19]
>   +- Window [max(b#8) windowspecdefinition(_w0#13, 
> specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) 
> AS max_b#11], [_w0#13]
>  +- Project [_1#2 AS a#7, _2#3 AS b#8, abs(_1#2) AS _w0#13]
> +- LocalRelation [_1#2, _2#3]
> after
> Project [a#7, b#8, max_b#11, min_b#17]
> +- Window [max(b#8) windowspecdefinition(_w0#13, 
> specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) 
> AS max_b#11, min(b#8) windowspecdefinition(_w0#13, 
> specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) 
> AS min_b#17], [_w0#13]
>+- Project [_1#2 AS a#7, _2#3 AS b#8, abs(_1#2) AS _w0#13]
>   +- LocalRelation [_1#2, _2#3]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42589) Exclude `RelationalGroupedDataset.apply` from CompatibilitySuite

2023-02-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693763#comment-17693763
 ] 

Apache Spark commented on SPARK-42589:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40181

> Exclude `RelationalGroupedDataset.apply` from CompatibilitySuite
> 
>
> Key: SPARK-42589
> URL: https://issues.apache.org/jira/browse/SPARK-42589
> Project: Spark
>  Issue Type: Test
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42589) Exclude `RelationalGroupedDataset.apply` from CompatibilitySuite

2023-02-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42589:


Assignee: Apache Spark

> Exclude `RelationalGroupedDataset.apply` from CompatibilitySuite
> 
>
> Key: SPARK-42589
> URL: https://issues.apache.org/jira/browse/SPARK-42589
> Project: Spark
>  Issue Type: Test
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42589) Exclude `RelationalGroupedDataset.apply` from CompatibilitySuite

2023-02-26 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-42589:
-

 Summary: Exclude `RelationalGroupedDataset.apply` from 
CompatibilitySuite
 Key: SPARK-42589
 URL: https://issues.apache.org/jira/browse/SPARK-42589
 Project: Spark
  Issue Type: Test
  Components: Connect, Tests
Affects Versions: 3.4.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42586) Implement RuntimeConf

2023-02-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-42586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell reassigned SPARK-42586:
-

Assignee: Herman van Hövell

> Implement RuntimeConf
> -
>
> Key: SPARK-42586
> URL: https://issues.apache.org/jira/browse/SPARK-42586
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
>
> Implement RuntimeConf for the Scala Client
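
For reference, the RuntimeConf surface to mirror is small; a sketch against a regular `spark` session (the Connect client should behave the same way):

{code:java}
// Set, read, and unset a runtime SQL configuration through spark.conf --
// the surface the Scala Connect client needs to implement.
spark.conf.set("spark.sql.shuffle.partitions", "10")
assert(spark.conf.get("spark.sql.shuffle.partitions") == "10")
spark.conf.unset("spark.sql.shuffle.partitions")
println(spark.conf.getOption("spark.sql.shuffle.partitions"))
{code}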



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42560) Implement ColumnName

2023-02-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-42560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-42560.
---
Fix Version/s: 3.4.1
   Resolution: Fixed

> Implement ColumnName
> 
>
> Key: SPARK-42560
> URL: https://issues.apache.org/jira/browse/SPARK-42560
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
> Fix For: 3.4.1
>
>
> Implement ColumnName class for connect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42587) Use wrapper versions for SBT and Maven in `connect` module tests

2023-02-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-42587:
-

Assignee: Dongjoon Hyun

> Use wrapper versions for SBT and Maven in `connect` module tests
> 
>
> Key: SPARK-42587
> URL: https://issues.apache.org/jira/browse/SPARK-42587
> Project: Spark
>  Issue Type: Test
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42587) Use wrapper versions for SBT and Maven in `connect` module tests

2023-02-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-42587.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 40180
[https://github.com/apache/spark/pull/40180]

> Use wrapper versions for SBT and Maven in `connect` module tests
> 
>
> Key: SPARK-42587
> URL: https://issues.apache.org/jira/browse/SPARK-42587
> Project: Spark
>  Issue Type: Test
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42538) `functions#lit` support more types

2023-02-26 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693757#comment-17693757
 ] 

Yang Jie commented on SPARK-42538:
--

[~hvanhovell] Should the fix version be 3.4.0? It hasn't been officially released 
yet.

> `functions#lit` support more types 
> ---
>
> Key: SPARK-42538
> URL: https://issues.apache.org/jira/browse/SPARK-42538
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
> Fix For: 3.4.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42588) collapse two adjacent windows with the equivalent partition/order expression

2023-02-26 Thread zhuml (Jira)
zhuml created SPARK-42588:
-

 Summary: collapse two adjacent windows with the equivalent 
partition/order expression
 Key: SPARK-42588
 URL: https://issues.apache.org/jira/browse/SPARK-42588
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.2, 3.2.3
Reporter: zhuml






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42577) A large stage could run indefinitely due to executor lost

2023-02-26 Thread Tengfei Huang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693753#comment-17693753
 ] 

Tengfei Huang commented on SPARK-42577:
---

I am working on this. Thanks. [~Ngone51]

> A large stage could run indefinitely due to executor lost
> -
>
> Key: SPARK-42577
> URL: https://issues.apache.org/jira/browse/SPARK-42577
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.3, 3.1.3, 3.2.3, 3.3.2
>Reporter: wuyi
>Priority: Major
>
> When a stage is extremely large and Spark runs on spot instances or on 
> problematic clusters with frequent worker/executor loss, the stage could run 
> indefinitely because tasks are rerun after each executor loss. This happens 
> when the external shuffle service is on and the large stage takes hours to 
> complete: when Spark tries to submit a child stage, it finds that the parent 
> stage - the large one - has missing partitions, so the large stage has to 
> rerun. When it completes again, it finds new missing partitions for the 
> same reason.
> We should add an attempt limit for this kind of scenario.
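
To make the proposed cap concrete, here is a self-contained sketch of the behavior (plain Scala, not Spark's scheduler code; maxAttempts is a hypothetical limit): rerun the stage when partitions go missing, but abort after a bounded number of attempts instead of looping forever.

{code:java}
// Sketch only: rerun a "stage" while partitions are missing, but give up
// after maxAttempts instead of resubmitting indefinitely.
def runStageWithCap(maxAttempts: Int)(runOnce: Int => Boolean): Boolean = {
  var attempt = 0
  while (attempt < maxAttempts) {
    if (runOnce(attempt)) return true // all partitions present, stage done
    attempt += 1                      // executor lost: missing partitions, rerun
  }
  false                               // bounded: abort instead of running forever
}

// Example: a flaky stage that only succeeds on its third attempt.
println(runStageWithCap(maxAttempts = 5)(attempt => attempt >= 2)) // true
{code}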



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42587) Use wrapper versions for SBT and Maven in `connect` module tests

2023-02-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-42587:
--
Priority: Minor  (was: Major)

> Use wrapper versions for SBT and Maven in `connect` module tests
> 
>
> Key: SPARK-42587
> URL: https://issues.apache.org/jira/browse/SPARK-42587
> Project: Spark
>  Issue Type: Test
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42587) Use wrapper versions for SBT and Maven in `connect` module tests

2023-02-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693749#comment-17693749
 ] 

Apache Spark commented on SPARK-42587:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40180

> Use wrapper versions for SBT and Maven in `connect` module tests
> 
>
> Key: SPARK-42587
> URL: https://issues.apache.org/jira/browse/SPARK-42587
> Project: Spark
>  Issue Type: Test
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42587) Use wrapper versions for SBT and Maven in `connect` module tests

2023-02-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42587:


Assignee: (was: Apache Spark)

> Use wrapper versions for SBT and Maven in `connect` module tests
> 
>
> Key: SPARK-42587
> URL: https://issues.apache.org/jira/browse/SPARK-42587
> Project: Spark
>  Issue Type: Test
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42587) Use wrapper versions for SBT and Maven in `connect` module tests

2023-02-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42587:


Assignee: Apache Spark

> Use wrapper versions for SBT and Maven in `connect` module tests
> 
>
> Key: SPARK-42587
> URL: https://issues.apache.org/jira/browse/SPARK-42587
> Project: Spark
>  Issue Type: Test
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42587) Use wrapper versions for SBT and Maven in `connect` module tests

2023-02-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693748#comment-17693748
 ] 

Apache Spark commented on SPARK-42587:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40180

> Use wrapper versions for SBT and Maven in `connect` module tests
> 
>
> Key: SPARK-42587
> URL: https://issues.apache.org/jira/browse/SPARK-42587
> Project: Spark
>  Issue Type: Test
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42587) Use wrapper versions for SBT and Maven in `connect` module tests

2023-02-26 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-42587:
-

 Summary: Use wrapper versions for SBT and Maven in `connect` 
module tests
 Key: SPARK-42587
 URL: https://issues.apache.org/jira/browse/SPARK-42587
 Project: Spark
  Issue Type: Test
  Components: Connect, Tests
Affects Versions: 3.4.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42485) SPIP: Shutting down spark structured streaming when the streaming process completed current process

2023-02-26 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693745#comment-17693745
 ] 

Hyukjin Kwon commented on SPARK-42485:
--

For a proper SPIP, you should read 
https://spark.apache.org/improvement-proposals.html and answer the questions 
posted there.
From a cursory look, I think this won't need an SPIP though.

> SPIP: Shutting down spark structured streaming when the streaming process 
> completed current process
> ---
>
> Key: SPARK-42485
> URL: https://issues.apache.org/jira/browse/SPARK-42485
> Project: Spark
>  Issue Type: New Feature
>  Components: Structured Streaming
>Affects Versions: 3.2.2
>Reporter: Mich Talebzadeh
>Priority: Major
>  Labels: SPIP
>
> Spark Structured Streaming is a very useful tool for dealing with Event Driven 
> Architecture. In an Event Driven Architecture, there is generally a main loop 
> that listens for events and triggers a callback function when one of those 
> events is detected. In a streaming application, the application waits to 
> receive the source messages at a set interval or whenever they happen, and 
> reacts accordingly.
> There are occasions when you may want to stop the Spark program gracefully, 
> meaning that the Spark application handles the last streaming message 
> completely and then terminates. This is different from invoking interrupts 
> such as CTRL-C.
> Of course, one can terminate the process with the following:
>  # query.awaitTermination() # Waits for the termination of this query, with 
> stop() or with error
>  # query.awaitTermination(timeoutMs) # Returns true if this query is 
> terminated within the timeout in milliseconds.
> The first one waits until an interrupt signal is received. The second one 
> counts down the timeout and exits when the timeout in milliseconds is reached.
> The issue is that one needs to predict how long the streaming job needs to 
> run, and any interrupt at the terminal or OS level (killing the process) may 
> leave the processing terminated without proper completion of the streaming 
> work.
> I have devised a method that allows one to terminate the Spark application 
> internally after processing the last received message: within, say, 2 seconds 
> of the confirmation of shutdown, the process invokes a graceful shutdown.
> This new feature proposes a solution that handles the message currently being 
> processed gracefully, waits for it to complete, and shuts down the streaming 
> process for a given topic without loss of data or orphaned transactions.
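
One possible shape for such a shutdown (a sketch only; shutdownRequested is a hypothetical check, e.g. for a marker file or a control-topic flag) is to poll the query status and stop only when no trigger is running and no new data is waiting:

{code:java}
import org.apache.spark.sql.streaming.StreamingQuery

// Sketch: stop a streaming query gracefully once an external shutdown signal
// arrives. shutdownRequested() is hypothetical, e.g. a marker-file check.
def stopGracefully(query: StreamingQuery, shutdownRequested: () => Boolean): Unit = {
  while (query.isActive) {
    val idle = !query.status.isTriggerActive && !query.status.isDataAvailable
    if (shutdownRequested() && idle) {
      query.stop() // no batch in flight, so the last message was fully handled
    } else {
      Thread.sleep(2000) // re-check roughly every 2 seconds, as proposed above
    }
  }
}
{code}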



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42517) Add documentation for Protobuf connector

2023-02-26 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693744#comment-17693744
 ] 

Hyukjin Kwon commented on SPARK-42517:
--

Duplicate of SPARK-40776?

> Add documentation for Protobuf connector
> 
>
> Key: SPARK-42517
> URL: https://issues.apache.org/jira/browse/SPARK-42517
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Priority: Major
>
> Similar to [https://spark.apache.org/docs/latest/sql-data-sources-avro.html], 
> we should add documentation for the Protobuf connector.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42572) Logic error for StateStore.validateStateRowFormat

2023-02-26 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693742#comment-17693742
 ] 

Hyukjin Kwon commented on SPARK-42572:
--

[~WweiL] are you saying that we should revert 
https://github.com/apache/spark/pull/40073? That won't need a new JIRA.

> Logic error for StateStore.validateStateRowFormat
> -
>
> Key: SPARK-42572
> URL: https://issues.apache.org/jira/browse/SPARK-42572
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Wei Liu
>Priority: Major
>
> SPARK-42484 changed the logic of whether to check the state store format in 
> StateStore.validateStateRowFormat. Revert it and add a unit test to make sure 
> this won't happen again.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42499) Support for Runtime SQL configuration

2023-02-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42499.
--
Resolution: Duplicate

> Support for Runtime SQL configuration
> -
>
> Key: SPARK-42499
> URL: https://issues.apache.org/jira/browse/SPARK-42499
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42559) Implement DataFrameNaFunctions

2023-02-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-42559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell reassigned SPARK-42559:
-

Assignee: BingKun Pan

> Implement DataFrameNaFunctions
> --
>
> Key: SPARK-42559
> URL: https://issues.apache.org/jira/browse/SPARK-42559
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: BingKun Pan
>Priority: Major
>
> Implement DataFrameNaFunctions for connect and hook it up to Dataset.
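
For context, the existing Dataset.na behavior the Connect client needs to reproduce (a spark-shell sketch relying on the shell's implicits):

{code:java}
// Drop rows containing nulls, or fill them with per-column replacements.
val df = Seq(("alice", Some(30)), ("bob", None)).toDF("name", "age")
df.na.drop().show()                 // keeps only the "alice" row
df.na.fill(Map("age" -> -1)).show() // replaces the null age with -1
{code}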



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42586) Implement RuntimeConf

2023-02-26 Thread Jira
Herman van Hövell created SPARK-42586:
-

 Summary: Implement RuntimeConf
 Key: SPARK-42586
 URL: https://issues.apache.org/jira/browse/SPARK-42586
 Project: Spark
  Issue Type: New Feature
  Components: Connect
Affects Versions: 3.4.0
Reporter: Herman van Hövell


Implement RuntimeConf for the Scala Client



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42585) Streaming createDataFrame implementation

2023-02-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-42585:
-
Description: createDataFrame in Spark Connect is currently a single protobuf 
message, which doesn't allow creating a large local DataFrame. We should make 
it streaming.

> Streaming createDataFrame implementation
> 
>
> Key: SPARK-42585
> URL: https://issues.apache.org/jira/browse/SPARK-42585
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> createDataFrame in Spark Connect is currently a single protobuf message, which 
> doesn't allow creating a large local DataFrame. We should make it streaming.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42585) Streaming createDataFrame implementation

2023-02-26 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-42585:


 Summary: Streaming createDataFrame implementation
 Key: SPARK-42585
 URL: https://issues.apache.org/jira/browse/SPARK-42585
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Hyukjin Kwon






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42581) Add SparkSession implicits

2023-02-26 Thread Jira


[ 
https://issues.apache.org/jira/browse/SPARK-42581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693736#comment-17693736
 ] 

Herman van Hövell commented on SPARK-42581:
---

Waiting for SPARK-42560

> Add SparkSession implicits
> --
>
> Key: SPARK-42581
> URL: https://issues.apache.org/jira/browse/SPARK-42581
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42564) Implement Dataset.version and Dataset.time

2023-02-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-42564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-42564.
---
Fix Version/s: 3.4.1
   Resolution: Fixed

> Implement Dataset.version and Dataset.time
> --
>
> Key: SPARK-42564
> URL: https://issues.apache.org/jira/browse/SPARK-42564
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: BingKun Pan
>Priority: Major
> Fix For: 3.4.1
>
>
> Implement Dataset.version and Dataset.time
> {code:java}
> /**
>  * The version of Spark on which this application is running.
>  *
>  * @since 2.0.0
>  */
> def version: String = SPARK_VERSION
> /**
>  * Executes some code block and prints to stdout the time taken to execute 
> the block. This is
>  * available in Scala only and is used primarily for interactive testing and 
> debugging.
>  *
>  * @since 2.1.0
>  */
> def time[T](f: => T): T = {
>   val start = System.nanoTime()
>   val ret = f
>   val end = System.nanoTime()
>   // scalastyle:off println
>   println(s"Time taken: ${NANOSECONDS.toMillis(end - start)} ms")
>   // scalastyle:on println
>   ret
> } {code}
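
For reference, typical spark-shell usage of the two methods looks like this (assuming a running `spark` session):

{code:java}
println(spark.version) // e.g. "3.4.0"

// Prints "Time taken: ... ms" to stdout and returns the block's result.
val total = spark.time(spark.range(1000000).selectExpr("sum(id)").first().getLong(0))
{code}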



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42419) Migrate `TypeError` into error framework for Spark Connect column API.

2023-02-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42419.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39991
[https://github.com/apache/spark/pull/39991]

> Migrate `TypeError` into error framework for Spark Connect column API.
> --
>
> Key: SPARK-42419
> URL: https://issues.apache.org/jira/browse/SPARK-42419
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.4.0
>
>
> We should migrate all errors into PySpark error framework.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42419) Migrate `TypeError` into error framework for Spark Connect column API.

2023-02-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42419:


Assignee: Haejoon Lee

> Migrate `TypeError` into error framework for Spark Connect column API.
> --
>
> Key: SPARK-42419
> URL: https://issues.apache.org/jira/browse/SPARK-42419
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>
> We should migrate all errors into PySpark error framework.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42569) Throw unsupported exceptions for non-supported API

2023-02-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42569.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 40172
[https://github.com/apache/spark/pull/40172]

> Throw unsupported exceptions for non-supported API
> --
>
> Key: SPARK-42569
> URL: https://issues.apache.org/jira/browse/SPARK-42569
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42574) DataFrame.toPandas should handle duplicated column names

2023-02-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42574.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 40170
[https://github.com/apache/spark/pull/40170]

> DataFrame.toPandas should handle duplicated column names
> 
>
> Key: SPARK-42574
> URL: https://issues.apache.org/jira/browse/SPARK-42574
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:python}
> spark.sql("select 1 v, 1 v").toPandas()
> {code}
> should work.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42574) DataFrame.toPandas should handle duplicated column names

2023-02-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42574:


Assignee: Takuya Ueshin

> DataFrame.toPandas should handle duplicated column names
> 
>
> Key: SPARK-42574
> URL: https://issues.apache.org/jira/browse/SPARK-42574
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
>
> {code:python}
> spark.sql("select 1 v, 1 v").toPandas()
> {code}
> should work.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42560) Implement ColumnName

2023-02-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693730#comment-17693730
 ] 

Apache Spark commented on SPARK-42560:
--

User 'hvanhovell' has created a pull request for this issue:
https://github.com/apache/spark/pull/40179

> Implement ColumnName
> 
>
> Key: SPARK-42560
> URL: https://issues.apache.org/jira/browse/SPARK-42560
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
>
> Implement ColumnName class for connect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42560) Implement ColumnName

2023-02-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693729#comment-17693729
 ] 

Apache Spark commented on SPARK-42560:
--

User 'hvanhovell' has created a pull request for this issue:
https://github.com/apache/spark/pull/40179

> Implement ColumnName
> 
>
> Key: SPARK-42560
> URL: https://issues.apache.org/jira/browse/SPARK-42560
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
>
> Implement ColumnName class for connect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42584) Improve output of Column.explain

2023-02-26 Thread Jira
Herman van Hövell created SPARK-42584:
-

 Summary: Improve output of Column.explain
 Key: SPARK-42584
 URL: https://issues.apache.org/jira/browse/SPARK-42584
 Project: Spark
  Issue Type: New Feature
  Components: Connect
Affects Versions: 3.4.0
Reporter: Herman van Hövell


We currently display the structure of the proto in both the regular and 
extended versions of explain. We should display a more compact SQL-like string 
for the regular version.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42407) `with as` executed again

2023-02-26 Thread Pablo Langa Blanco (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693723#comment-17693723
 ] 

Pablo Langa Blanco commented on SPARK-42407:


In my opinion, the "WITH AS" syntax is intended to simplify SQL queries, not to 
act at the execution level. To get what you want, you can use "CACHE TABLE" 
combined with "WITH AS", as in the sketch below.
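
For illustration, a sketch of that combination (the events table is hypothetical):

{code:java}
// CACHE TABLE materializes the expensive subquery once; later references in
// the WITH clause read the cached result instead of recomputing it.
spark.sql("CACHE TABLE per_user AS SELECT user_id, count(*) AS cnt FROM events GROUP BY user_id")
spark.sql(
  "WITH c AS (SELECT * FROM per_user) " +
  "SELECT a.user_id, b.user_id FROM c a JOIN c b ON a.cnt = b.cnt").show()
{code}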

> `with as` executed again
> 
>
> Key: SPARK-42407
> URL: https://issues.apache.org/jira/browse/SPARK-42407
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.3
>Reporter: yiku123
>Priority: Major
>
> When a 'with as' subquery is referenced multiple times, it is executed again 
> each time instead of its results being saved, resulting in low efficiency.
> Will you consider improving the behavior of 'with as'?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-40525) Floating-point value with an INT/BYTE/SHORT/LONG type errors out in DataFrame but evaluates to a rounded value in SparkSQL

2023-02-26 Thread Pablo Langa Blanco (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693721#comment-17693721
 ] 

Pablo Langa Blanco edited comment on SPARK-40525 at 2/26/23 10:57 PM:
--

Hi [~x/sys],

When you are working with the Spark SQL interface, you can configure this 
behavior: there are 3 policies for type coercion rules 
([https://spark.apache.org/docs/latest/sql-ref-ansi-compliance.html]). If you 
set spark.sql.storeAssignmentPolicy to "strict", you get the behavior you 
expect, but it is not the default policy.

I hope this helps.


was (Author: planga82):
Hi [~x/sys],

When you are working with the Spark SQL interface, you can configure this 
behavior: there are 3 policies for type coercion rules 
([https://spark.apache.org/docs/latest/sql-ref-ansi-compliance.html]). If you 
set spark.sql.storeAssignmentPolicy to "strict", you get the behavior you 
expected, but it is not the default policy.

I hope this helps.
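
For illustration, a minimal sketch of that setting in action (it reuses the int_floating_point_vals table from the report quoted below):

{code:java}
// With the STRICT store assignment policy, inserting 1.1 into an INT column
// is rejected at analysis time instead of being silently rounded to 1.
spark.conf.set("spark.sql.storeAssignmentPolicy", "strict")
spark.sql("INSERT INTO int_floating_point_vals SELECT 1.1") // now fails to analyze
{code}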

> Floating-point value with an INT/BYTE/SHORT/LONG type errors out in DataFrame 
> but evaluates to a rounded value in SparkSQL
> --
>
> Key: SPARK-40525
> URL: https://issues.apache.org/jira/browse/SPARK-40525
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: xsys
>Priority: Major
>
> h3. Describe the bug
> Storing an invalid INT value {{1.1}} using DataFrames via {{spark-shell}} 
> expectedly errors out. However, it is evaluated to a rounded value {{1}} if 
> the value is inserted into the table via {{{}spark-sql{}}}.
> h3. Steps to reproduce:
> On Spark 3.2.1 (commit {{{}4f25b3f712{}}}), using {{{}spark-sql{}}}:
> {code:java}
> $SPARK_HOME/bin/spark-sql {code}
> Execute the following:
> {code:java}
> spark-sql> create table int_floating_point_vals(c1 INT) stored as ORC;
> 22/09/19 16:49:11 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, 
> since hive.security.authorization.manager is set to instance of 
> HiveAuthorizerFactory.
> Time taken: 0.216 seconds
> spark-sql> insert into int_floating_point_vals select 1.1;
> Time taken: 1.747 seconds
> spark-sql> select * from int_floating_point_vals;
> 1
> Time taken: 0.518 seconds, Fetched 1 row(s){code}
> h3. Expected behavior
> We expect the two Spark interfaces ({{{}spark-sql{}}} & {{{}spark-shell{}}}) 
> to behave consistently for the same data type & input combination 
> ({{{}INT{}}} and {{{}1.1{}}}).
> h4. Here is a simplified example in {{{}spark-shell{}}}, where insertion of 
> the aforementioned value correctly raises an exception:
> On Spark 3.2.1 (commit {{{}4f25b3f712{}}}), using {{{}spark-shell{}}}:
> {code:java}
> $SPARK_HOME/bin/spark-shell{code}
> Execute the following:
> {code:java}
> import org.apache.spark.sql.{Row, SparkSession}
> import org.apache.spark.sql.types._
> val rdd = sc.parallelize(Seq(Row(1.1)))
> val schema = new StructType().add(StructField("c1", IntegerType, true))
> val df = spark.createDataFrame(rdd, schema)
> df.write.mode("overwrite").format("orc").saveAsTable("int_floating_point_vals")
>  {code}
> The following exception is raised:
> {code:java}
> java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: 
> java.lang.Double is not a valid external type for schema of int{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40525) Floating-point value with an INT/BYTE/SHORT/LONG type errors out in DataFrame but evaluates to a rounded value in SparkSQL

2023-02-26 Thread Pablo Langa Blanco (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693721#comment-17693721
 ] 

Pablo Langa Blanco commented on SPARK-40525:


Hi [~x/sys],

When you are working with the Spark SQL interface, you can configure this 
behavior: there are 3 policies for type coercion rules 
([https://spark.apache.org/docs/latest/sql-ref-ansi-compliance.html]). If you 
set spark.sql.storeAssignmentPolicy to "strict", you get the behavior you 
expect, but it is not the default policy.

I hope this helps.

> Floating-point value with an INT/BYTE/SHORT/LONG type errors out in DataFrame 
> but evaluates to a rounded value in SparkSQL
> --
>
> Key: SPARK-40525
> URL: https://issues.apache.org/jira/browse/SPARK-40525
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: xsys
>Priority: Major
>
> h3. Describe the bug
> Storing an invalid INT value {{1.1}} using DataFrames via {{spark-shell}} 
> expectedly errors out. However, it is evaluated to a rounded value {{1}} if 
> the value is inserted into the table via {{{}spark-sql{}}}.
> h3. Steps to reproduce:
> On Spark 3.2.1 (commit {{{}4f25b3f712{}}}), using {{{}spark-sql{}}}:
> {code:java}
> $SPARK_HOME/bin/spark-sql {code}
> Execute the following:
> {code:java}
> spark-sql> create table int_floating_point_vals(c1 INT) stored as ORC;
> 22/09/19 16:49:11 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, 
> since hive.security.authorization.manager is set to instance of 
> HiveAuthorizerFactory.
> Time taken: 0.216 seconds
> spark-sql> insert into int_floating_point_vals select 1.1;
> Time taken: 1.747 seconds
> spark-sql> select * from int_floating_point_vals;
> 1
> Time taken: 0.518 seconds, Fetched 1 row(s){code}
> h3. Expected behavior
> We expect the two Spark interfaces ({{{}spark-sql{}}} & {{{}spark-shell{}}}) 
> to behave consistently for the same data type & input combination 
> ({{{}INT{}}} and {{{}1.1{}}}).
> h4. Here is a simplified example in {{{}spark-shell{}}}, where insertion of 
> the aforementioned value correctly raises an exception:
> On Spark 3.2.1 (commit {{{}4f25b3f712{}}}), using {{{}spark-shell{}}}:
> {code:java}
> $SPARK_HOME/bin/spark-shell{code}
> Execute the following:
> {code:java}
> import org.apache.spark.sql.{Row, SparkSession}
> import org.apache.spark.sql.types._
> val rdd = sc.parallelize(Seq(Row(1.1)))
> val schema = new StructType().add(StructField("c1", IntegerType, true))
> val df = spark.createDataFrame(rdd, schema)
> df.write.mode("overwrite").format("orc").saveAsTable("int_floating_point_vals")
>  {code}
> The following exception is raised:
> {code:java}
> java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: 
> java.lang.Double is not a valid external type for schema of int{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42583) Remove outer join if all aggregate functions are distinct

2023-02-26 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-42583:

Description: To support more cases: 
https://github.com/pingcap/tidb/blob/master/planner/core/rule_join_elimination.go#L159
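
An example of the kind of case this would cover (t1 and t2 are hypothetical tables): every aggregate is DISTINCT, so duplicate rows introduced by the left outer join cannot change the result and the join can be dropped.

{code:java}
// Every aggregate is DISTINCT, so duplicates produced by the left outer join
// cannot affect the result and the join is removable.
val withJoin = spark.sql(
  "SELECT COUNT(DISTINCT t1.a), SUM(DISTINCT t1.b) " +
  "FROM t1 LEFT JOIN t2 ON t1.id = t2.id")
// The proposed rule would rewrite the plan to the equivalent of:
val withoutJoin = spark.sql("SELECT COUNT(DISTINCT t1.a), SUM(DISTINCT t1.b) FROM t1")
{code}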

> Remove outer join if all aggregate functions are distinct
> -
>
> Key: SPARK-42583
> URL: https://issues.apache.org/jira/browse/SPARK-42583
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Priority: Major
>
> To support more cases: 
> https://github.com/pingcap/tidb/blob/master/planner/core/rule_join_elimination.go#L159



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42583) Remove outer join if all aggregate functions are distinct

2023-02-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693649#comment-17693649
 ] 

Apache Spark commented on SPARK-42583:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/40177

> Remove outer join if all aggregate functions are distinct
> -
>
> Key: SPARK-42583
> URL: https://issues.apache.org/jira/browse/SPARK-42583
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42583) Remove outer join if all aggregate functions are distinct

2023-02-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42583:


Assignee: (was: Apache Spark)

> Remove outer join if all aggregate functions are distinct
> -
>
> Key: SPARK-42583
> URL: https://issues.apache.org/jira/browse/SPARK-42583
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42583) Remove outer join if all aggregate functions are distinct

2023-02-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693648#comment-17693648
 ] 

Apache Spark commented on SPARK-42583:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/40177

> Remove outer join if all aggregate functions are distinct
> -
>
> Key: SPARK-42583
> URL: https://issues.apache.org/jira/browse/SPARK-42583
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42583) Remove outer join if all aggregate functions are distinct

2023-02-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42583:


Assignee: Apache Spark

> Remove outer join if all aggregate functions are distinct
> -
>
> Key: SPARK-42583
> URL: https://issues.apache.org/jira/browse/SPARK-42583
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42583) Remove outer join if all aggregate functions are distinct

2023-02-26 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-42583:
---

 Summary: Remove outer join if all aggregate functions are distinct
 Key: SPARK-42583
 URL: https://issues.apache.org/jira/browse/SPARK-42583
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: Yuming Wang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42564) Implement Dataset.version and Dataset.time

2023-02-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693643#comment-17693643
 ] 

Apache Spark commented on SPARK-42564:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40176

> Implement Dataset.version and Dataset.time
> --
>
> Key: SPARK-42564
> URL: https://issues.apache.org/jira/browse/SPARK-42564
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: BingKun Pan
>Priority: Major
>
> Implement Dataset.version and Dataset.time
> {code:java}
> /**
>  * The version of Spark on which this application is running.
>  *
>  * @since 2.0.0
>  */
> def version: String = SPARK_VERSION
> /**
>  * Executes some code block and prints to stdout the time taken to execute 
> the block. This is
>  * available in Scala only and is used primarily for interactive testing and 
> debugging.
>  *
>  * @since 2.1.0
>  */
> def time[T](f: => T): T = {
>   val start = System.nanoTime()
>   val ret = f
>   val end = System.nanoTime()
>   // scalastyle:off println
>   println(s"Time taken: ${NANOSECONDS.toMillis(end - start)} ms")
>   // scalastyle:on println
>   ret
> } {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42564) Implement Dataset.version and Dataset.time

2023-02-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42564:


Assignee: BingKun Pan  (was: Apache Spark)

> Implement Dataset.version and Dataset.time
> --
>
> Key: SPARK-42564
> URL: https://issues.apache.org/jira/browse/SPARK-42564
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: BingKun Pan
>Priority: Major
>
> Implement Dataset.version and Dataset.time
> {code:java}
> /**
>  * The version of Spark on which this application is running.
>  *
>  * @since 2.0.0
>  */
> def version: String = SPARK_VERSION
> /**
>  * Executes some code block and prints to stdout the time taken to execute 
> the block. This is
>  * available in Scala only and is used primarily for interactive testing and 
> debugging.
>  *
>  * @since 2.1.0
>  */
> def time[T](f: => T): T = {
>   val start = System.nanoTime()
>   val ret = f
>   val end = System.nanoTime()
>   // scalastyle:off println
>   println(s"Time taken: ${NANOSECONDS.toMillis(end - start)} ms")
>   // scalastyle:on println
>   ret
> } {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42564) Implement Dataset.version and Dataset.time

2023-02-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42564:


Assignee: Apache Spark  (was: BingKun Pan)

> Implement Dataset.version and Dataset.time
> --
>
> Key: SPARK-42564
> URL: https://issues.apache.org/jira/browse/SPARK-42564
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Apache Spark
>Priority: Major
>
> Implement Dataset.version and Dataset.time
> {code:java}
> /**
>  * The version of Spark on which this application is running.
>  *
>  * @since 2.0.0
>  */
> def version: String = SPARK_VERSION
> /**
>  * Executes some code block and prints to stdout the time taken to execute 
> the block. This is
>  * available in Scala only and is used primarily for interactive testing and 
> debugging.
>  *
>  * @since 2.1.0
>  */
> def time[T](f: => T): T = {
>   val start = System.nanoTime()
>   val ret = f
>   val end = System.nanoTime()
>   // scalastyle:off println
>   println(s"Time taken: ${NANOSECONDS.toMillis(end - start)} ms")
>   // scalastyle:on println
>   ret
> } {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42564) Implement Dataset.version and Dataset.time

2023-02-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693642#comment-17693642
 ] 

Apache Spark commented on SPARK-42564:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40176

> Implement Dataset.version and Dataset.time
> --
>
> Key: SPARK-42564
> URL: https://issues.apache.org/jira/browse/SPARK-42564
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: BingKun Pan
>Priority: Major
>
> Implement Dataset.version and Dataset.time
> {code:java}
> /**
>  * The version of Spark on which this application is running.
>  *
>  * @since 2.0.0
>  */
> def version: String = SPARK_VERSION
> /**
>  * Executes some code block and prints to stdout the time taken to execute 
> the block. This is
>  * available in Scala only and is used primarily for interactive testing and 
> debugging.
>  *
>  * @since 2.1.0
>  */
> def time[T](f: => T): T = {
>   val start = System.nanoTime()
>   val ret = f
>   val end = System.nanoTime()
>   // scalastyle:off println
>   println(s"Time taken: ${NANOSECONDS.toMillis(end - start)} ms")
>   // scalastyle:on println
>   ret
> } {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42559) Implement DataFrameNaFunctions

2023-02-26 Thread BingKun Pan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693641#comment-17693641
 ] 

BingKun Pan commented on SPARK-42559:
-

I will work on it.

> Implement DataFrameNaFunctions
> --
>
> Key: SPARK-42559
> URL: https://issues.apache.org/jira/browse/SPARK-42559
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>
> Implement DataFrameNaFunctions for connect and hook it up to Dataset.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org