[jira] [Updated] (SPARK-48068) Fix `mypy` failure in Python 3.10 and 3.11

2024-05-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-48068:
--
Description: 
We assumed that `PYTHON_EXECUTABLE` is used by `dev/lint-python`, as in the 
following workflow. That's not true. We need to use `mypy`'s own parameter to 
make sure the intended interpreter is checked.

https://github.com/apache/spark/blob/ff401dde50343c9bbc1c49a0294272f2da7d01e2/.github/workflows/build_and_test.yml#L705
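
As a sketch of the direction (`--python-executable` and `--config-file` are 
mypy's own CLI flags; the exact wiring inside `dev/lint-python` and the paths 
below are assumed from the repo layout), mypy can be told explicitly which 
interpreter's environment to type-check instead of assuming the environment 
variable is honored:

{code}
# Hypothetical invocation: pin mypy to the intended interpreter explicitly
# rather than relying on the PYTHON_EXECUTABLE environment variable.
$ mypy --python-executable "$PYTHON_EXECUTABLE" \
    --config-file python/mypy.ini python/pyspark
{code}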

  was:
{code}
$ python3 --version
Python 3.10.13

$ dev/lint-python --mypy
starting mypy annotations test...
annotations failed mypy checks:
python/pyspark/sql/pandas/conversion.py:450: error: Unused "type: ignore" 
comment  [unused-ignore]
Found 1 error in 1 file (checked 1013 source files)
1
{code}

{code}
$ python3 --version
Python 3.11.8

$ dev/lint-python --mypy
starting mypy annotations test...
annotations failed mypy checks:
python/pyspark/sql/pandas/conversion.py:450: error: Unused "type: ignore" 
comment  [unused-ignore]
Found 1 error in 1 file (checked 1013 source files)
1
{code}
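
For context, `[unused-ignore]` fires when a `# type: ignore` comment suppresses 
nothing, i.e. the line type-checks cleanly under the interpreter in use, which 
is why the result can differ across Python versions. A minimal standalone 
illustration (not the PySpark code in question):

{code}
# check.py -- run with: mypy --enable-error-code unused-ignore check.py
x: int = 1  # type: ignore
# mypy reports: error: Unused "type: ignore" comment  [unused-ignore]
# because the assignment needs no suppression on this interpreter.
{code}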


> Fix `mypy` failure in Python 3.10 and 3.11
> --
>
> Key: SPARK-48068
> URL: https://issues.apache.org/jira/browse/SPARK-48068
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0, 4.0.0, 3.5.1, 3.3.4, 3.4.3
>    Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>
> We assumed that `PYTHON_EXECUTABLE` is used by `dev/lint-python`, as in the 
> following workflow. That's not true. We need to use `mypy`'s own parameter to 
> make sure the intended interpreter is checked.
> https://github.com/apache/spark/blob/ff401dde50343c9bbc1c49a0294272f2da7d01e2/.github/workflows/build_and_test.yml#L705






[jira] [Updated] (SPARK-48068) Fix `mypy` failure in Python 3.10 and 3.11

2024-05-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-48068:
--
Affects Version/s: 3.4.3
   3.3.4
   3.5.1
   3.3.0

> Fix `mypy` failure in Python 3.10 and 3.11
> --
>
> Key: SPARK-48068
> URL: https://issues.apache.org/jira/browse/SPARK-48068
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0, 4.0.0, 3.5.1, 3.3.4, 3.4.3
>    Reporter: Dongjoon Hyun
>    Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>
> {code}
> $ python3 --version
> Python 3.10.13
> $ dev/lint-python --mypy
> starting mypy annotations test...
> annotations failed mypy checks:
> python/pyspark/sql/pandas/conversion.py:450: error: Unused "type: ignore" 
> comment  [unused-ignore]
> Found 1 error in 1 file (checked 1013 source files)
> 1
> {code}
> {code}
> $ python3 --version
> Python 3.11.8
> $ dev/lint-python --mypy
> starting mypy annotations test...
> annotations failed mypy checks:
> python/pyspark/sql/pandas/conversion.py:450: error: Unused "type: ignore" 
> comment  [unused-ignore]
> Found 1 error in 1 file (checked 1013 source files)
> 1
> {code}






[jira] [Updated] (SPARK-48071) Use Python 3.10 in `Python Linter` step

2024-05-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-48071:
--
Issue Type: Improvement  (was: Bug)

> Use Python 3.10 in `Python Linter` step
> ---
>
> Key: SPARK-48071
> URL: https://issues.apache.org/jira/browse/SPARK-48071
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 4.0.0
>    Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Resolved] (SPARK-48069) Handle PEP-632 by checking `ModuleNotFoundError` on `setuptools` in Python 3.12

2024-05-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48069.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46315
[https://github.com/apache/spark/pull/46315]
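
For context, PEP 632 removed `distutils` from the standard library in Python 
3.12, and fresh 3.12 environments no longer ship `setuptools` by default, so 
imports that previously succeeded can now raise `ModuleNotFoundError`. A 
reproduction sketch on a stock Python 3.12 (version number illustrative):

{code}
$ python3 --version
Python 3.12.0

$ python3 -c "import distutils"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'distutils'
{code}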

> Handle PEP-632 by checking `ModuleNotFoundError` on `setuptools` in Python 
> 3.12
> ---
>
> Key: SPARK-48069
> URL: https://issues.apache.org/jira/browse/SPARK-48069
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>    Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-48069) Handle PEP-632 by checking `ModuleNotFoundError` on `setuptools` in Python 3.12

2024-05-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-48069:
--
Summary: Handle PEP-632 by checking `ModuleNotFoundError` on `setuptools` 
in Python 3.12  (was: Handle PEP-632 by checking `ModuleNotFoundError` on 
`setuptools`)

> Handle PEP-632 by checking `ModuleNotFoundError` on `setuptools` in Python 
> 3.12
> ---
>
> Key: SPARK-48069
> URL: https://issues.apache.org/jira/browse/SPARK-48069
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>    Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-48068) Fix `mypy` failure in Python 3.10 and 3.11

2024-05-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-48068:
--
Summary: Fix `mypy` failure in Python 3.10 and 3.11  (was: Fix `mypy` 
failure in Python 3.10+)

> Fix `mypy` failure in Python 3.10 and 3.11
> --
>
> Key: SPARK-48068
> URL: https://issues.apache.org/jira/browse/SPARK-48068
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>    Reporter: Dongjoon Hyun
>    Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>
> {code}
> $ python3 --version
> Python 3.10.13
> $ dev/lint-python --mypy
> starting mypy annotations test...
> annotations failed mypy checks:
> python/pyspark/sql/pandas/conversion.py:450: error: Unused "type: ignore" 
> comment  [unused-ignore]
> Found 1 error in 1 file (checked 1013 source files)
> 1
> {code}
> {code}
> $ python3 --version
> Python 3.11.8
> $ dev/lint-python --mypy
> starting mypy annotations test...
> annotations failed mypy checks:
> python/pyspark/sql/pandas/conversion.py:450: error: Unused "type: ignore" 
> comment  [unused-ignore]
> Found 1 error in 1 file (checked 1013 source files)
> 1
> {code}






[jira] [Created] (SPARK-48069) Handle PEP-632 by checking `ModuleNotFoundError` on `setuptools`

2024-05-01 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-48069:
-

 Summary: Handle PEP-632 by checking `ModuleNotFoundError` on 
`setuptools`
 Key: SPARK-48069
 URL: https://issues.apache.org/jira/browse/SPARK-48069
 Project: Spark
  Issue Type: Sub-task
  Components: Project Infra
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Assigned] (SPARK-48069) Handle PEP-632 by checking `ModuleNotFoundError` on `setuptools`

2024-05-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-48069:
-

Assignee: Dongjoon Hyun

> Handle PEP-632 by checking `ModuleNotFoundError` on `setuptools`
> 
>
> Key: SPARK-48069
> URL: https://issues.apache.org/jira/browse/SPARK-48069
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>    Reporter: Dongjoon Hyun
>    Assignee: Dongjoon Hyun
>Priority: Major
>







[jira] [Assigned] (SPARK-48068) Fix `mypy` failure in Python 3.10+

2024-05-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-48068:
-

Assignee: Dongjoon Hyun

> Fix `mypy` failure in Python 3.10+
> --
>
> Key: SPARK-48068
> URL: https://issues.apache.org/jira/browse/SPARK-48068
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>    Reporter: Dongjoon Hyun
>    Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>
> {code}
> $ python3 --version
> Python 3.10.13
> $ dev/lint-python --mypy
> starting mypy annotations test...
> annotations failed mypy checks:
> python/pyspark/sql/pandas/conversion.py:450: error: Unused "type: ignore" 
> comment  [unused-ignore]
> Found 1 error in 1 file (checked 1013 source files)
> 1
> {code}
> {code}
> $ python3 --version
> Python 3.11.8
> $ dev/lint-python --mypy
> starting mypy annotations test...
> annotations failed mypy checks:
> python/pyspark/sql/pandas/conversion.py:450: error: Unused "type: ignore" 
> comment  [unused-ignore]
> Found 1 error in 1 file (checked 1013 source files)
> 1
> {code}






[jira] [Updated] (SPARK-48068) Fix `mypy` failure in Python 3.10+

2024-05-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-48068:
--
Parent: SPARK-44111
Issue Type: Sub-task  (was: Bug)

> Fix `mypy` failure in Python 3.10+
> --
>
> Key: SPARK-48068
> URL: https://issues.apache.org/jira/browse/SPARK-48068
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>    Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>
> {code}
> $ python3 --version
> Python 3.10.13
> $ dev/lint-python --mypy
> starting mypy annotations test...
> annotations failed mypy checks:
> python/pyspark/sql/pandas/conversion.py:450: error: Unused "type: ignore" 
> comment  [unused-ignore]
> Found 1 error in 1 file (checked 1013 source files)
> 1
> {code}
> {code}
> $ python3 --version
> Python 3.11.8
> $ dev/lint-python --mypy
> starting mypy annotations test...
> annotations failed mypy checks:
> python/pyspark/sql/pandas/conversion.py:450: error: Unused "type: ignore" 
> comment  [unused-ignore]
> Found 1 error in 1 file (checked 1013 source files)
> 1
> {code}






[jira] [Updated] (SPARK-48068) Fix `mypy` failure in Python 3.10+

2024-05-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-48068:
--
Description: 
{code}
$ python3 --version
Python 3.10.13

$ dev/lint-python --mypy
starting mypy annotations test...
annotations failed mypy checks:
python/pyspark/sql/pandas/conversion.py:450: error: Unused "type: ignore" 
comment  [unused-ignore]
Found 1 error in 1 file (checked 1013 source files)
1
{code}

{code}
$ python3 --version
Python 3.11.8

$ dev/lint-python --mypy
starting mypy annotations test...
annotations failed mypy checks:
python/pyspark/sql/pandas/conversion.py:450: error: Unused "type: ignore" 
comment  [unused-ignore]
Found 1 error in 1 file (checked 1013 source files)
1
{code}

  was:
{code}
$ python3 --version
Python 3.10.13

$ dev/lint-python --mypy
starting mypy annotations test...
annotations failed mypy checks:
python/pyspark/sql/pandas/conversion.py:450: error: Unused "type: ignore" 
comment  [unused-ignore]
Found 1 error in 1 file (checked 1013 source files)
1
{code}


> Fix `mypy` failure in Python 3.10+
> --
>
> Key: SPARK-48068
> URL: https://issues.apache.org/jira/browse/SPARK-48068
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>    Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>
> {code}
> $ python3 --version
> Python 3.10.13
> $ dev/lint-python --mypy
> starting mypy annotations test...
> annotations failed mypy checks:
> python/pyspark/sql/pandas/conversion.py:450: error: Unused "type: ignore" 
> comment  [unused-ignore]
> Found 1 error in 1 file (checked 1013 source files)
> 1
> {code}
> {code}
> $ python3 --version
> Python 3.11.8
> $ dev/lint-python --mypy
> starting mypy annotations test...
> annotations failed mypy checks:
> python/pyspark/sql/pandas/conversion.py:450: error: Unused "type: ignore" 
> comment  [unused-ignore]
> Found 1 error in 1 file (checked 1013 source files)
> 1
> {code}






[jira] [Updated] (SPARK-48068) Fix `mypy` failure in Python 3.10+

2024-05-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-48068:
--
Summary: Fix `mypy` failure in Python 3.10+  (was: Fix `mypy` failure in 
Python 3.10)

> Fix `mypy` failure in Python 3.10+
> --
>
> Key: SPARK-48068
> URL: https://issues.apache.org/jira/browse/SPARK-48068
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>    Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>
> {code}
> $ python3 --version
> Python 3.10.13
> $ dev/lint-python --mypy
> starting mypy annotations test...
> annotations failed mypy checks:
> python/pyspark/sql/pandas/conversion.py:450: error: Unused "type: ignore" 
> comment  [unused-ignore]
> Found 1 error in 1 file (checked 1013 source files)
> 1
> {code}






[jira] [Updated] (SPARK-48068) Fix `mypy` failure in Python 3.10

2024-04-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-48068:
--
Description: 
{code}
$ python3 --version
Python 3.10.13

$ dev/lint-python --mypy
starting mypy annotations test...
annotations failed mypy checks:
python/pyspark/sql/pandas/conversion.py:450: error: Unused "type: ignore" 
comment  [unused-ignore]
Found 1 error in 1 file (checked 1013 source files)
1
{code}

  was:
{code}
$ dev/lint-python --mypy
starting mypy annotations test...
annotations failed mypy checks:
python/pyspark/sql/pandas/conversion.py:450: error: Unused "type: ignore" 
comment  [unused-ignore]
Found 1 error in 1 file (checked 1013 source files)
1
{code}


> Fix `mypy` failure in Python 3.10
> -
>
> Key: SPARK-48068
> URL: https://issues.apache.org/jira/browse/SPARK-48068
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> {code}
> $ python3 --version
> Python 3.10.13
> $ dev/lint-python --mypy
> starting mypy annotations test...
> annotations failed mypy checks:
> python/pyspark/sql/pandas/conversion.py:450: error: Unused "type: ignore" 
> comment  [unused-ignore]
> Found 1 error in 1 file (checked 1013 source files)
> 1
> {code}






[jira] [Created] (SPARK-48068) Fix `mypy` failure in Python 3.10

2024-04-30 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-48068:
-

 Summary: Fix `mypy` failure in Python 3.10
 Key: SPARK-48068
 URL: https://issues.apache.org/jira/browse/SPARK-48068
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun


{code}
$ dev/lint-python --mypy
starting mypy annotations test...
annotations failed mypy checks:
python/pyspark/sql/pandas/conversion.py:450: error: Unused "type: ignore" 
comment  [unused-ignore]
Found 1 error in 1 file (checked 1013 source files)
1
{code}






[jira] [Resolved] (SPARK-48028) Regenerate benchmark results after turning ANSI on

2024-04-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48028.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46266
[https://github.com/apache/spark/pull/46266]

> Regenerate benchmark results after turning ANSI on
> --
>
> Key: SPARK-48028
> URL: https://issues.apache.org/jira/browse/SPARK-48028
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-48063) Enable `spark.stage.ignoreDecommissionFetchFailure` by default

2024-04-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48063.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46308
[https://github.com/apache/spark/pull/46308]
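
Usage sketch: applications that still want the pre-4.0.0 behavior can opt out 
per job (the application file name below is a placeholder). Roughly, when the 
flag is enabled, fetch failures attributable to decommissioned executors do not 
count toward a stage's consecutive failed attempts.

{code}
$ spark-submit \
    --conf spark.stage.ignoreDecommissionFetchFailure=false \
    my_app.py
{code}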

> Enable `spark.stage.ignoreDecommissionFetchFailure` by default
> --
>
> Key: SPARK-48063
> URL: https://issues.apache.org/jira/browse/SPARK-48063
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>    Reporter: Dongjoon Hyun
>    Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-48063) Enable `spark.stage.ignoreDecommissionFetchFailure` by default

2024-04-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-48063:
-

Assignee: Dongjoon Hyun

> Enable `spark.stage.ignoreDecommissionFetchFailure` by default
> --
>
> Key: SPARK-48063
> URL: https://issues.apache.org/jira/browse/SPARK-48063
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>    Reporter: Dongjoon Hyun
>    Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-48063) Enable `spark.stage.ignoreDecommissionFetchFailure` by default

2024-04-30 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-48063:
-

 Summary: Enable `spark.stage.ignoreDecommissionFetchFailure` by 
default
 Key: SPARK-48063
 URL: https://issues.apache.org/jira/browse/SPARK-48063
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Updated] (SPARK-48060) Fix StreamingQueryHashPartitionVerifySuite to update golden files correctly

2024-04-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-48060:
--
Parent: SPARK-44111
Issue Type: Sub-task  (was: Test)

> Fix StreamingQueryHashPartitionVerifySuite to update golden files correctly
> ---
>
> Key: SPARK-48060
> URL: https://issues.apache.org/jira/browse/SPARK-48060
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming, Tests
>Affects Versions: 4.0.0
>    Reporter: Dongjoon Hyun
>    Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-48060) Fix StreamingQueryHashPartitionVerifySuite to update golden files correctly

2024-04-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48060.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46304
[https://github.com/apache/spark/pull/46304]

> Fix StreamingQueryHashPartitionVerifySuite to update golden files correctly
> ---
>
> Key: SPARK-48060
> URL: https://issues.apache.org/jira/browse/SPARK-48060
> Project: Spark
>  Issue Type: Test
>  Components: Structured Streaming, Tests
>Affects Versions: 4.0.0
>    Reporter: Dongjoon Hyun
>    Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-48057) Enable `GroupedApplyInPandasTests. test_grouped_with_empty_partition`

2024-04-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48057.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46299
[https://github.com/apache/spark/pull/46299]

> Enable `GroupedApplyInPandasTests. test_grouped_with_empty_partition`
> -
>
> Key: SPARK-48057
> URL: https://issues.apache.org/jira/browse/SPARK-48057
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-48061) Parameterize max limits of `spark.sql.test.randomDataGenerator`

2024-04-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48061.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46305
[https://github.com/apache/spark/pull/46305]

> Parameterize max limits of `spark.sql.test.randomDataGenerator`
> ---
>
> Key: SPARK-48061
> URL: https://issues.apache.org/jira/browse/SPARK-48061
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 4.0.0
>    Reporter: Dongjoon Hyun
>    Assignee: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-48061) Parameterize max limits of `spark.sql.test.randomDataGenerator`

2024-04-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-48061:
-

Assignee: Dongjoon Hyun

> Parameterize max limits of `spark.sql.test.randomDataGenerator`
> ---
>
> Key: SPARK-48061
> URL: https://issues.apache.org/jira/browse/SPARK-48061
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 4.0.0
>    Reporter: Dongjoon Hyun
>    Assignee: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-48061) Parameterize max limits of `spark.sql.test.randomDataGenerator`

2024-04-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-48061:
--
Parent: SPARK-44111
Issue Type: Sub-task  (was: Test)

> Parameterize max limits of `spark.sql.test.randomDataGenerator`
> ---
>
> Key: SPARK-48061
> URL: https://issues.apache.org/jira/browse/SPARK-48061
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 4.0.0
>    Reporter: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-48061) Parameterize max limits of `spark.sql.test.randomDataGenerator`

2024-04-30 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-48061:
-

 Summary: Parameterize max limits of 
`spark.sql.test.randomDataGenerator`
 Key: SPARK-48061
 URL: https://issues.apache.org/jira/browse/SPARK-48061
 Project: Spark
  Issue Type: Test
  Components: SQL, Tests
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Created] (SPARK-48060) Fix StreamingQueryHashPartitionVerifySuite to update golden files correctly

2024-04-30 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-48060:
-

 Summary: Fix StreamingQueryHashPartitionVerifySuite to update 
golden files correctly
 Key: SPARK-48060
 URL: https://issues.apache.org/jira/browse/SPARK-48060
 Project: Spark
  Issue Type: Test
  Components: Structured Streaming, Tests
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Resolved] (SPARK-46122) Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default

2024-04-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46122.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46207
[https://github.com/apache/spark/pull/46207]
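
The user-visible effect, sketched in a `spark-sql` session below (output 
abbreviated): with the legacy flag off, a bare CREATE TABLE without USING or 
STORED AS picks its provider from `spark.sql.sources.default` (parquet by 
default) instead of creating a Hive SerDe table.

{code}
$ spark-sql --conf spark.sql.legacy.createHiveTableByDefault=false

spark-sql> CREATE TABLE t(a INT);   -- no USING / STORED AS clause
spark-sql> DESC EXTENDED t;         -- provider now follows spark.sql.sources.default
...
Provider        parquet
...
{code}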

> Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default
> -
>
> Key: SPARK-46122
> URL: https://issues.apache.org/jira/browse/SPARK-46122
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>    Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[VOTE][RESULT] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-30 Thread Dongjoon Hyun
The vote passes with 11 +1s (6 binding +1s) and one -1.
Thanks to all who helped with the vote!

(* = binding)
+1:
- Dongjoon Hyun *
- Gengliang Wang *
- Liang-Chi Hsieh *
- Holden Karau *
- Zhou Jiang
- Cheng Pan
- Hyukjin Kwon *
- DB Tsai *
- Ye Xianjin
- XiDuo You
- Nimrod Ofek

+0: None

-1:
- Mich Talebzadeh


[jira] [Commented] (SPARK-48016) Fix a bug in try_divide function when with decimals

2024-04-29 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-48016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842200#comment-17842200
 ] 

Dongjoon Hyun commented on SPARK-48016:
---

Thank you so much!

> Fix a bug in try_divide function when with decimals
> ---
>
> Key: SPARK-48016
> URL: https://issues.apache.org/jira/browse/SPARK-48016
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.2
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>  Labels: pull-request-available
>
> Binary arithmetic operators should include the evalMode during makeCopy. 
> Otherwise, the following query will throw a DIVIDE_BY_ZERO error instead of 
> returning null:
>  
> {code:java}
> SELECT try_divide(1, decimal(0)); {code}
> This is caused by the rule DecimalPrecision:
> {code:java}
> case b @ BinaryOperator(left, right) if left.dataType != right.dataType =>
>   (left, right) match {
>  ...
> case (l: Literal, r) if r.dataType.isInstanceOf[DecimalType] &&
> l.dataType.isInstanceOf[IntegralType] &&
> literalPickMinimumPrecision =>
>   b.makeCopy(Array(Cast(l, DataTypeUtils.fromLiteral(l)), r)) {code}
>  
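
For reference, the expected semantics once evalMode survives makeCopy, per 
try_divide's contract (NULL on divide-by-zero); a sketch, noting that the 
non-decimal path was presumably unaffected since DecimalPrecision only rewrites 
decimal operands:

{code:java}
SELECT try_divide(1, decimal(0));  -- expected: NULL, not DIVIDE_BY_ZERO
SELECT try_divide(1, 0);           -- NULL
{code}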






[jira] [Commented] (SPARK-48016) Fix a bug in try_divide function when with decimals

2024-04-29 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-48016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842191#comment-17842191
 ] 

Dongjoon Hyun commented on SPARK-48016:
---

Hi, [~Gengliang.Wang].
- I updated the JIRA title according to the commit title.
- The umbrella Jira issue was completed as of Apache Spark 3.4.0. To give it 
more visibility, shall we move this to SPARK-44111, since the recent ANSI JIRA 
issues are there?

> Fix a bug in try_divide function when with decimals
> ---
>
> Key: SPARK-48016
> URL: https://issues.apache.org/jira/browse/SPARK-48016
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.2
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>  Labels: pull-request-available
>
> Binary arithmetic operators should include the evalMode during makeCopy. 
> Otherwise, the following query will throw a DIVIDE_BY_ZERO error instead of 
> returning null:
>  
> {code:java}
> SELECT try_divide(1, decimal(0)); {code}
> This is caused by the rule DecimalPrecision:
> {code:java}
> case b @ BinaryOperator(left, right) if left.dataType != right.dataType =>
>   (left, right) match {
>  ...
> case (l: Literal, r) if r.dataType.isInstanceOf[DecimalType] &&
> l.dataType.isInstanceOf[IntegralType] &&
> literalPickMinimumPrecision =>
>   b.makeCopy(Array(Cast(l, DataTypeUtils.fromLiteral(l)), r)) {code}
>  






[jira] [Updated] (SPARK-48016) Fix a bug in try_divide function when with decimals

2024-04-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-48016:
--
Summary: Fix a bug in try_divide function when with decimals  (was: Binary 
Arithmetic operators should include the evalMode when makeCopy)

> Fix a bug in try_divide function when with decimals
> ---
>
> Key: SPARK-48016
> URL: https://issues.apache.org/jira/browse/SPARK-48016
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.2
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>  Labels: pull-request-available
>
> Binary arithmetic operators should include the evalMode during makeCopy. 
> Otherwise, the following query will throw a DIVIDE_BY_ZERO error instead of 
> returning null:
>  
> {code:java}
> SELECT try_divide(1, decimal(0)); {code}
> This is caused by the rule DecimalPrecision:
> {code:java}
> case b @ BinaryOperator(left, right) if left.dataType != right.dataType =>
>   (left, right) match {
>  ...
> case (l: Literal, r) if r.dataType.isInstanceOf[DecimalType] &&
> l.dataType.isInstanceOf[IntegralType] &&
> literalPickMinimumPrecision =>
>   b.makeCopy(Array(Cast(l, DataTypeUtils.fromLiteral(l)), r)) {code}
>  






Re: [DISCUSS] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-29 Thread Dongjoon Hyun
I'm not sure why you think in that direction.

What I wrote was the following.

- You voted +1 for SPARK-4 on April 14th
  (https://lists.apache.org/thread/tp92yzf8y4yjfk6r3dkqjtlb060g82sy) 
- You voted -1 for SPARK-46122 on April 26th.
  (https://lists.apache.org/thread/2ybq1jb19j0c52rgo43zfd9br1yhtfj8)

You showed a double standard for the same kind of SQL votes within two weeks.

We always count all votes from all contributors
in order to keep a comprehensive record of all feedback.

Dongjoon.

On 2024/04/29 17:49:36 Mich Talebzadeh wrote:
> Your point
> 
> ".. t's a surprise to me to see that someone has different positions in a
> very short period of time in the community"
> 
> Well, I have been with Spark since 2015, and this is the article on
> Medium dated February 7, 2016 with regard to both Hive and Spark, also
> presented at a Hortonworks meet-up.
> 
> Hive on Spark Engine Versus Spark Using Hive Metastore
> <https://www.linkedin.com/pulse/hive-spark-engine-versus-using-metastore-mich-talebzadeh-ph-d-/>
> 
> With regard to why I cast a +1 vote for one and -1 for the other, I think
> it is my prerogative how I vote, and we can leave it at that.
> 
> Mich Talebzadeh,
> Technologist | Architect | Data Engineer  | Generative AI | FinCrime
> London
> United Kingdom
> 
> 
>view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
> 
> 
>  https://en.everybodywiki.com/Mich_Talebzadeh
> 
> 
> 
> *Disclaimer:* The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed. It is essential to note
> that, as with any advice, quote "one test result is worth one-thousand
> expert opinions" (Wernher von Braun
> <https://en.wikipedia.org/wiki/Wernher_von_Braun>).
> 
> 
> On Mon, 29 Apr 2024 at 17:32, Dongjoon Hyun  wrote:
> 
> > It's a surprise to me to see that someone has different positions
> > in a very short period of time in the community.
> >
> > Mich cast +1 for SPARK-4 and -1 for SPARK-46122.
> > - https://lists.apache.org/thread/4cbkpvc3vr3b6k0wp6lgsw37spdpnqrc
> > - https://lists.apache.org/thread/x09gynt90v3hh5sql1gt9dlcn6m6699p
> >
> > To Mich, what I'm interested in specifically is the following.
> > > 2. Compatibility: Changing the default behavior could potentially
> > >  break existing workflows or pipelines that rely on the current behavior.
> >
> > May I ask you the following questions?
> > A. What is the purpose of the migration guide in the ASF projects?
> >
> > B. Do you claim that there is incompatibility when you have
> >  spark.sql.legacy.createHiveTableByDefault=true which is described
> >  in the migration guide?
> >
> > C. Do you know that ANSI SQL has new RUNTIME exceptions
> >  which are harder than SPARK-46122?
> >
> > D. Or, did you cast +1 for SPARK-4 because
> >  you think there is no breaking change by default?
> >
> > I guess there is some misunderstanding on the proposal.
> >
> > Thanks,
> > Dongjoon.
> >
> >
> > On Fri, Apr 26, 2024 at 12:05 PM Mich Talebzadeh <
> > mich.talebza...@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> I would like to add a side note regarding the discussion process and the
> >> current title of the proposal. The title '[DISCUSS] SPARK-46122: Set
> >> spark.sql.legacy.createHiveTableByDefault to false' focuses on a specific
> >> configuration parameter, which might lead some participants to overlook its
> >> broader implications (as was raised by myself and others). I believe that a
> >> more descriptive title, encompassing the broader discussion on default
> >> behaviours for creating Hive tables in Spark SQL, could enable greater
> >> engagement within the community. This is an important topic that deserves
> >> thorough consideration.
> >>
> >> HTH
> >>
> >> Mich Talebzadeh,
> >> Technologist | Architect | Data Engineer  | Generative AI | FinCrime
> >> London
> >> United Kingdom
> >>
> >>
> >>view my Linkedin profile
> >> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
> >>
> >>
> >>  https://en.everybodywiki.com/Mich_Talebzadeh
> >>
> >>
> >>
> >> *Disclaimer:* The information provided is correct to the best of my
> >> knowledge but of course cannot be guaranteed. It is essential to note
> >> that, as with any advice, quote "one test result is worth one-thousand
> >> expert opinions" (Wernher von Braun).

[jira] [Resolved] (SPARK-48042) Don't use a copy of timestamp formatter with a new override zone for each value

2024-04-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48042.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46282
[https://github.com/apache/spark/pull/46282]

> Don't use a copy of timestamp formatter with a new override zone for each 
> value
> ---
>
> Key: SPARK-48042
> URL: https://issues.apache.org/jira/browse/SPARK-48042
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-48044) Cache `DataFrame.isStreaming`

2024-04-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48044.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46281
[https://github.com/apache/spark/pull/46281]
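
A minimal sketch of the caching pattern (illustrative only, not PySpark's 
actual implementation; the class and helper names are placeholders): whether a 
DataFrame is streaming is fixed for a given plan, so the first lookup, which 
may cost a server round trip in Spark Connect, can be memoized.

{code}
from functools import cached_property


class DataFrame:
    """Illustrative stand-in, not pyspark.sql.DataFrame."""

    def __init__(self, plan):
        self._plan = plan

    @cached_property
    def isStreaming(self) -> bool:
        # Computed once per DataFrame; later accesses hit the cache.
        return self._resolve_is_streaming()

    def _resolve_is_streaming(self) -> bool:
        # Placeholder for the expensive lookup (e.g. an analysis RPC).
        return False
{code}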

> Cache `DataFrame.isStreaming`
> -
>
> Key: SPARK-48044
> URL: https://issues.apache.org/jira/browse/SPARK-48044
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-48046) Remove `clock` parameter from `DriverServiceFeatureStep`

2024-04-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48046.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46284
[https://github.com/apache/spark/pull/46284]

> Remove `clock` parameter from `DriverServiceFeatureStep`
> 
>
> Key: SPARK-48046
> URL: https://issues.apache.org/jira/browse/SPARK-48046
> Project: Spark
>  Issue Type: Task
>  Components: Kubernetes
>Affects Versions: 4.0.0
>    Reporter: Dongjoon Hyun
>    Assignee: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-48046) Remove `clock` parameter from `DriverServiceFeatureStep`

2024-04-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-48046:
-

Assignee: Dongjoon Hyun

> Remove `clock` parameter from `DriverServiceFeatureStep`
> 
>
> Key: SPARK-48046
> URL: https://issues.apache.org/jira/browse/SPARK-48046
> Project: Spark
>  Issue Type: Task
>  Components: Kubernetes
>Affects Versions: 4.0.0
>    Reporter: Dongjoon Hyun
>    Assignee: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
>







Re: [DISCUSS] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-29 Thread Dongjoon Hyun
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, 25 Apr 2024 at 16:14, Mich Talebzadeh <
>>>>>>>>>> mich.talebza...@gmail.com>:
>>>>>>>>>>
>>>>>>>>>>> My take regarding your question is that your mileage varies so
>>>>>>>>>>> to speak.
>>>>>>>>>>>
>>>>>>>>>>> 1) Hive provides a more mature and widely adopted catalog
>>>>>>>>>>> solution that integrates well with other components in the Hadoop
>>>>>>>>>>> ecosystem, such as HDFS, HBase, and YARN. If you are Hadoop
>>>>>>>>>>> centric (say
>>>>>>>>>>> on-premise), using Hive may offer better compatibility and
>>>>>>>>>>> interoperability.
>>>>>>>>>>> 2) Hive provides a SQL-like interface that is familiar to users
>>>>>>>>>>> who are accustomed to traditional RDBMs. If your use case involves 
>>>>>>>>>>> complex
>>>>>>>>>>> SQL queries or existing SQL-based workflows, using Hive may be 
>>>>>>>>>>> advantageous.
>>>>>>>>>>> 3) If you are looking for performance, Spark's native catalog
>>>>>>>>>>> tends to offer better performance for certain workloads,
>>>>>>>>>>> particularly those
>>>>>>>>>>> that involve iterative processing or complex data
>>>>>>>>>>> transformations (my
>>>>>>>>>>> understanding). Spark's in-memory processing capabilities and
>>>>>>>>>>> optimizations
>>>>>>>>>>> make it well-suited for interactive analytics and machine learning
>>>>>>>>>>> tasks (my favourite).
>>>>>>>>>>> 4) Integration with Spark Workflows: If you primarily use Spark
>>>>>>>>>>> for data processing and analytics, using Spark's native catalog may
>>>>>>>>>>> simplify workflow management and reduce overhead; Spark's tight
>>>>>>>>>>> integration with its catalog allows for seamless interaction with 
>>>>>>>>>>> Spark
>>>>>>>>>>> applications and libraries.
>>>>>>>>>>> 5) There seems to be some similarity between the Spark catalog and
>>>>>>>>>>> Databricks Unity Catalog, so that may favour the choice.
>>>>>>>>>>>
>>>>>>>>>>> HTH
>>>>>>>>>>>
>>>>>>>>>>> Mich Talebzadeh,
>>>>>>>>>>> Technologist | Architect | Data Engineer  | Generative AI |
>>>>>>>>>>> FinCrime
>>>>>>>>>>> London
>>>>>>>>>>> United Kingdom
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>view my Linkedin profile
>>>>>>>>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> *Disclaimer:* The information provided is correct to the best
>>>>>>>>>>> of my knowledge but of course cannot be guaranteed. It is
>>>>>>>>>>> essential to note that, as with any advice, quote "one test
>>>>>>>>>>> result is worth one-thousand expert opinions" (Wernher von Braun
>>>>>>>>>>> <https://en.wikipedia.org/wiki/Wernher_von_Braun>).
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, 25 Apr 2024 at 12:30, Nimrod Ofek 
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I will also appreciate some material that describes the
>>>>>>>>>>>> differences between Spark native tables vs hive tables and why 
>>>>>>>>>>>> each should
>>>>>>>>>>>> be used...
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>> Nimrod
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, 25 Apr 2024 at 14:27, Mich Talebzadeh <
>>>>>>>>>>>> mich.talebza...@gmail.com>:
>>>>>>>>>>>>
>>>>>>>>>>>>> I see a statement made as below  and I quote
>>>>>>>>>>>>>
>>>>>>>>>>>>> "The proposal of SPARK-46122 is to switch the default value of
>>>>>>>>>>>>> this
>>>>>>>>>>>>> configuration from `true` to `false` to use Spark native
>>>>>>>>>>>>> tables because
>>>>>>>>>>>>> we support better."
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can you please elaborate on the above specifically with regard
>>>>>>>>>>>>> to the phrase ".. because
>>>>>>>>>>>>> we support better."
>>>>>>>>>>>>>
>>>>>>>>>>>>> Are you referring to the performance of Spark catalog (I
>>>>>>>>>>>>> believe it is internal) or integration with Spark?
>>>>>>>>>>>>>
>>>>>>>>>>>>> HTH
>>>>>>>>>>>>>
>>>>>>>>>>>>> Mich Talebzadeh,
>>>>>>>>>>>>> Technologist | Architect | Data Engineer  | Generative AI |
>>>>>>>>>>>>> FinCrime
>>>>>>>>>>>>> London
>>>>>>>>>>>>> United Kingdom
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>view my Linkedin profile
>>>>>>>>>>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Disclaimer:* The information provided is correct to the best
>>>>>>>>>>>>> of my knowledge but of course cannot be guaranteed. It is
>>>>>>>>>>>>> essential to note that, as with any advice, quote "one test
>>>>>>>>>>>>> result is worth one-thousand expert opinions" (Wernher von Braun
>>>>>>>>>>>>> <https://en.wikipedia.org/wiki/Wernher_von_Braun>).
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, 25 Apr 2024 at 11:17, Wenchen Fan 
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Apr 25, 2024 at 2:46 PM Kent Yao 
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Nit: the umbrella ticket is SPARK-44111, not SPARK-4.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Kent Yao
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, 25 Apr 2024 at 14:39, Dongjoon Hyun
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > Hi, All.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > It's great to see community activities to polish 4.0.0
>>>>>>>>>>>>>>> more and more.
>>>>>>>>>>>>>>> > Thank you all.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > I'd like to bring SPARK-46122 (another SQL topic) to you
>>>>>>>>>>>>>>> from the subtasks
>>>>>>>>>>>>>>> > of SPARK-4 (Prepare Apache Spark 4.0.0),
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > - https://issues.apache.org/jira/browse/SPARK-46122
>>>>>>>>>>>>>>> >Set `spark.sql.legacy.createHiveTableByDefault` to
>>>>>>>>>>>>>>> `false` by default
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > This legacy configuration is about `CREATE TABLE` SQL
>>>>>>>>>>>>>>> syntax without
>>>>>>>>>>>>>>> > `USING` and `STORED AS`, which is currently mapped to
>>>>>>>>>>>>>>> `Hive` table.
>>>>>>>>>>>>>>> > The proposal of SPARK-46122 is to switch the default value
>>>>>>>>>>>>>>> of this
>>>>>>>>>>>>>>> > configuration from `true` to `false` to use Spark native
>>>>>>>>>>>>>>> tables because
>>>>>>>>>>>>>>> > we support better.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > In other words, Spark will use the value of
>>>>>>>>>>>>>>> `spark.sql.sources.default`
>>>>>>>>>>>>>>> > as the table provider instead of `Hive` like the other
>>>>>>>>>>>>>>> Spark APIs. Of course,
>>>>>>>>>>>>>>> > the users can get all the legacy behavior by setting back
>>>>>>>>>>>>>>> to `true`.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > Historically, this behavior change was merged once at
>>>>>>>>>>>>>>> Apache Spark 3.0.0
>>>>>>>>>>>>>>> > preparation via SPARK-30098 already, but reverted during
>>>>>>>>>>>>>>> the 3.0.0 RC period.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > 2019-12-06: SPARK-30098 Use default datasource as provider
>>>>>>>>>>>>>>> for CREATE TABLE
>>>>>>>>>>>>>>> > 2020-05-16: SPARK-31707 Revert SPARK-30098 Use default
>>>>>>>>>>>>>>> datasource as
>>>>>>>>>>>>>>> > provider for CREATE TABLE command
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > At Apache Spark 3.1.0, we had another discussion about
>>>>>>>>>>>>>>> this and defined it
>>>>>>>>>>>>>>> > as one of legacy behavior via this configuration via
>>>>>>>>>>>>>>> reused ID, SPARK-30098.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > 2020-12-01:
>>>>>>>>>>>>>>> https://lists.apache.org/thread/8c8k1jk61pzlcosz3mxo4rkj5l23r204
>>>>>>>>>>>>>>> > 2020-12-03: SPARK-30098 Add a configuration to use default
>>>>>>>>>>>>>>> datasource as
>>>>>>>>>>>>>>> > provider for CREATE TABLE command
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > Last year, we received two additional requests twice to
>>>>>>>>>>>>>>> switch this because
>>>>>>>>>>>>>>> > Apache Spark 4.0.0 is a good time to make a decision for
>>>>>>>>>>>>>>> the future direction.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > 2023-02-27: SPARK-42603 as an independent idea.
>>>>>>>>>>>>>>> > 2023-11-27: SPARK-46122 as a part of Apache Spark 4.0.0
>>>>>>>>>>>>>>> idea
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > WDYT? The technical scope is defined in the following PR
>>>>>>>>>>>>>>> which is one line of main
>>>>>>>>>>>>>>> > code, one line of migration guide, and a few lines of test
>>>>>>>>>>>>>>> code.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > - https://github.com/apache/spark/pull/46207
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > Dongjoon.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -
>>>>>>>>>>>>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>


[jira] [Created] (SPARK-48046) Remove `clock` parameter from `DriverServiceFeatureStep`

2024-04-29 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-48046:
-

 Summary: Remove `clock` parameter from `DriverServiceFeatureStep`
 Key: SPARK-48046
 URL: https://issues.apache.org/jira/browse/SPARK-48046
 Project: Spark
  Issue Type: Task
  Components: Kubernetes
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Assigned] (SPARK-48038) Promote driverServiceName to KubernetesDriverConf

2024-04-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-48038:
-

Assignee: Cheng Pan

> Promote driverServiceName to KubernetesDriverConf
> -
>
> Key: SPARK-48038
> URL: https://issues.apache.org/jira/browse/SPARK-48038
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-48038) Promote driverServiceName to KubernetesDriverConf

2024-04-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48038.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46276
[https://github.com/apache/spark/pull/46276]

> Promote driverServiceName to KubernetesDriverConf
> -
>
> Key: SPARK-48038
> URL: https://issues.apache.org/jira/browse/SPARK-48038
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-48036) Update `sql-ref-ansi-compliance.md` and `sql-ref-identifier.md`

2024-04-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48036.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46271
[https://github.com/apache/spark/pull/46271]

> Update `sql-ref-ansi-compliance.md` and `sql-ref-identifier.md`
> ---
>
> Key: SPARK-48036
> URL: https://issues.apache.org/jira/browse/SPARK-48036
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 4.0.0
>    Reporter: Dongjoon Hyun
>    Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-48029) Update the packages name removed in building the spark docker image

2024-04-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48029.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46258
[https://github.com/apache/spark/pull/46258]

> Update the packages name removed in building the spark docker image
> ---
>
> Key: SPARK-48029
> URL: https://issues.apache.org/jira/browse/SPARK-48029
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-48029) Update the packages name removed in building the spark docker image

2024-04-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-48029:
-

Assignee: BingKun Pan

> Update the packages name removed in building the spark docker image
> ---
>
> Key: SPARK-48029
> URL: https://issues.apache.org/jira/browse/SPARK-48029
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-48036) Update `sql-ref-ansi-compliance.md` and `sql-ref-identifier.md`

2024-04-28 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-48036:
-

 Summary: Update `sql-ref-ansi-compliance.md` and 
`sql-ref-identifier.md`
 Key: SPARK-48036
 URL: https://issues.apache.org/jira/browse/SPARK-48036
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48032) Upgrade `commons-codec` to 1.17.0

2024-04-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48032.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46268
[https://github.com/apache/spark/pull/46268]

> Upgrade `commons-codec` to 1.17.0
> -
>
> Key: SPARK-48032
> URL: https://issues.apache.org/jira/browse/SPARK-48032
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47730) Support APP_ID and EXECUTOR_ID placeholder in labels

2024-04-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47730:
--
Parent: SPARK-44111
Issue Type: Sub-task  (was: Improvement)

> Support APP_ID and EXECUTOR_ID placeholder in labels
> 
>
> Key: SPARK-47730
> URL: https://issues.apache.org/jira/browse/SPARK-47730
> Project: Spark
>  Issue Type: Sub-task
>  Components: k8s
>Affects Versions: 3.5.1
>Reporter: Xi Chen
>Assignee: Xi Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47730) Support APP_ID and EXECUTOR_ID placeholder in labels

2024-04-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47730.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46149
[https://github.com/apache/spark/pull/46149]

> Support APP_ID and EXECUTOR_ID placeholder in labels
> 
>
> Key: SPARK-47730
> URL: https://issues.apache.org/jira/browse/SPARK-47730
> Project: Spark
>  Issue Type: Improvement
>  Components: k8s
>Affects Versions: 3.5.1
>Reporter: Xi Chen
>Assignee: Xi Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47730) Support APP_ID and EXECUTOR_ID placeholder in labels

2024-04-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47730:
-

Assignee: Xi Chen

> Support APP_ID and EXECUTOR_ID placeholder in labels
> 
>
> Key: SPARK-47730
> URL: https://issues.apache.org/jira/browse/SPARK-47730
> Project: Spark
>  Issue Type: Improvement
>  Components: k8s
>Affects Versions: 3.5.1
>Reporter: Xi Chen
>Assignee: Xi Chen
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48021) Add `--add-modules=jdk.incubator.vector` to `JavaModuleOptions`

2024-04-27 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-48021:
-

Assignee: BingKun Pan

> Add `--add-modules=jdk.incubator.vector` to `JavaModuleOptions`
> ---
>
> Key: SPARK-48021
> URL: https://issues.apache.org/jira/browse/SPARK-48021
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48021) Add `--add-modules=jdk.incubator.vector` to `JavaModuleOptions`

2024-04-27 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48021.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46246
[https://github.com/apache/spark/pull/46246]

> Add `--add-modules=jdk.incubator.vector` to `JavaModuleOptions`
> ---
>
> Key: SPARK-48021
> URL: https://issues.apache.org/jira/browse/SPARK-48021
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47408) Fix mathExpressions that use StringType

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47408.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46227
[https://github.com/apache/spark/pull/46227]

> Fix mathExpressions that use StringType
> ---
>
> Key: SPARK-47408
> URL: https://issues.apache.org/jira/browse/SPARK-47408
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47943) Add Operator CI Task for Java Build and Test

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47943:
--
Fix Version/s: kubernetes-operator-0.1.0
   (was: 4.0.0)

> Add Operator CI Task for Java Build and Test
> 
>
> Key: SPARK-47943
> URL: https://issues.apache.org/jira/browse/SPARK-47943
> Project: Spark
>  Issue Type: Sub-task
>  Components: k8s
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Zhou JIANG
>Assignee: Zhou JIANG
>Priority: Major
>  Labels: pull-request-available
> Fix For: kubernetes-operator-0.1.0
>
>
> We need to add a CI task to build and test Java code for upcoming operator 
> pull requests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48015) Update `build.gradle` to fix deprecation warnings

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-48015:
-

Assignee: Dongjoon Hyun

> Update `build.gradle` to fix deprecation warnings
> -
>
> Key: SPARK-48015
> URL: https://issues.apache.org/jira/browse/SPARK-48015
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Kubernetes
>Affects Versions: kubernetes-operator-0.1.0
>    Reporter: Dongjoon Hyun
>    Assignee: Dongjoon Hyun
>Priority: Trivial
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48015) Update `build.gradle` to fix deprecation warnings

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48015.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 9
[https://github.com/apache/spark-kubernetes-operator/pull/9]

> Update `build.gradle` to fix deprecation warnings
> -
>
> Key: SPARK-48015
> URL: https://issues.apache.org/jira/browse/SPARK-48015
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Kubernetes
>Affects Versions: kubernetes-operator-0.1.0
>    Reporter: Dongjoon Hyun
>    Assignee: Dongjoon Hyun
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47929) Setup Static Analysis for Operator

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47929:
--
Fix Version/s: kubernetes-operator-0.1.0
   (was: 4.0.0)

> Setup Static Analysis for Operator
> --
>
> Key: SPARK-47929
> URL: https://issues.apache.org/jira/browse/SPARK-47929
> Project: Spark
>  Issue Type: Sub-task
>  Components: k8s
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Zhou JIANG
>Assignee: Zhou JIANG
>Priority: Major
>  Labels: pull-request-available
> Fix For: kubernetes-operator-0.1.0
>
>
> Add common analysis tasks, including checkstyle, spotbugs, and jacoco. Also 
> include spotless for style fixes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47950) Add Java API Module for Spark Operator

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47950:
--
Fix Version/s: kubernetes-operator-0.1.0
   (was: 4.0.0)

> Add Java API Module for Spark Operator
> --
>
> Key: SPARK-47950
> URL: https://issues.apache.org/jira/browse/SPARK-47950
> Project: Spark
>  Issue Type: Sub-task
>  Components: k8s
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Zhou JIANG
>Assignee: Zhou JIANG
>Priority: Major
>  Labels: pull-request-available
> Fix For: kubernetes-operator-0.1.0
>
>
> Spark Operator API refers to the 
> [CustomResourceDefinition|https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/]
> that represents the spec for a Spark Application in k8s.
> This aims to add a Java API library for the Spark Operator, with the ability 
> to generate YAML specs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-48015) Update `build.gradle` to fix deprecation warnings

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-48015:
--
Fix Version/s: kubernetes-operator-0.1.0
   (was: 4.0.0)

> Update `build.gradle` to fix deprecation warnings
> -
>
> Key: SPARK-48015
> URL: https://issues.apache.org/jira/browse/SPARK-48015
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Kubernetes
>Affects Versions: kubernetes-operator-0.1.0
>    Reporter: Dongjoon Hyun
>    Assignee: Dongjoon Hyun
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: kubernetes-operator-0.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48015) Update `build.gradle` to fix deprecation warnings

2024-04-26 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-48015:
-

 Summary: Update `build.gradle` to fix deprecation warnings
 Key: SPARK-48015
 URL: https://issues.apache.org/jira/browse/SPARK-48015
 Project: Spark
  Issue Type: Sub-task
  Components: Build, Kubernetes
Affects Versions: kubernetes-operator-0.1.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47950) Add Java API Module for Spark Operator

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47950:
-

Assignee: Zhou JIANG

> Add Java API Module for Spark Operator
> --
>
> Key: SPARK-47950
> URL: https://issues.apache.org/jira/browse/SPARK-47950
> Project: Spark
>  Issue Type: Sub-task
>  Components: k8s
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Zhou JIANG
>Assignee: Zhou JIANG
>Priority: Major
>  Labels: pull-request-available
>
> Spark Operator API refers to the 
> [CustomResourceDefinition|https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/]
> that represents the spec for a Spark Application in k8s.
> This aims to add a Java API library for the Spark Operator, with the ability 
> to generate YAML specs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47950) Add Java API Module for Spark Operator

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47950.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 8
[https://github.com/apache/spark-kubernetes-operator/pull/8]

> Add Java API Module for Spark Operator
> --
>
> Key: SPARK-47950
> URL: https://issues.apache.org/jira/browse/SPARK-47950
> Project: Spark
>  Issue Type: Sub-task
>  Components: k8s
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Zhou JIANG
>Assignee: Zhou JIANG
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Spark Operator API refers to the 
> [CustomResourceDefinition|https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/]
> that represents the spec for a Spark Application in k8s.
> This aims to add a Java API library for the Spark Operator, with the ability 
> to generate YAML specs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48010) Avoid repeated calls to conf.resolver in resolveExpression

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48010.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46248
[https://github.com/apache/spark/pull/46248]

> Avoid repeated calls to conf.resolver in resolveExpression
> --
>
> Key: SPARK-48010
> URL: https://issues.apache.org/jira/browse/SPARK-48010
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.3
>Reporter: Nikhil Sheoran
>Assignee: Nikhil Sheoran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Consider a view with a large number of columns (~1000s). When resolving this 
> view, the flamegraph showed repeated initializations of `conf` to obtain the 
> `resolver` for each column of the view.
> This can easily be optimized by reusing the same resolver (obtained once) for 
> the various calls to `innerResolve` in `resolveExpression`.
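
To illustrate the hoisting pattern the fix applies, here is a self-contained
Scala sketch; `Conf`, `resolveSlow`, and `resolveFast` are invented stand-ins
for the real `SQLConf`/`resolveExpression` code path:

{code:java}
object ResolverHoisting {
  type Resolver = (String, String) => Boolean

  // Stand-in for SQLConf: obtaining it is assumed to be non-trivially
  // expensive, which is what the flamegraph showed.
  final class Conf(caseSensitive: Boolean) {
    val resolver: Resolver =
      if (caseSensitive) (a, b) => a == b
      else (a, b) => a.equalsIgnoreCase(b)
  }
  def conf: Conf = new Conf(caseSensitive = false) // fresh object per access

  // Before: one conf lookup per column (~1000s of columns in a wide view).
  def resolveSlow(cols: Seq[String], target: String): Seq[Boolean] =
    cols.map(c => conf.resolver(c, target))

  // After: obtain the resolver once and reuse it for every column.
  def resolveFast(cols: Seq[String], target: String): Seq[Boolean] = {
    val resolver = conf.resolver
    cols.map(c => resolver(c, target))
  }

  def main(args: Array[String]): Unit = {
    println(resolveFast(Seq("id", "Name", "score"), "name")) // List(false, true, false)
  }
}
{code}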



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46122) Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46122:
-

Assignee: Dongjoon Hyun

> Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default
> -
>
> Key: SPARK-46122
> URL: https://issues.apache.org/jira/browse/SPARK-46122
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>    Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48005) Enable `DefaultIndexParityTests. test_index_distributed_sequence_cleanup`

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48005.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46242
[https://github.com/apache/spark/pull/46242]

> Enable `DefaultIndexParityTests. test_index_distributed_sequence_cleanup`
> -
>
> Key: SPARK-48005
> URL: https://issues.apache.org/jira/browse/SPARK-48005
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-26 Thread Dongjoon Hyun
I'll start with my +1.

Dongjoon.

On 2024/04/26 16:45:51 Dongjoon Hyun wrote:
> Please vote on SPARK-46122 to set spark.sql.legacy.createHiveTableByDefault
> to `false` by default. The technical scope is defined in the following PR.
> 
> - DISCUSSION:
> https://lists.apache.org/thread/ylk96fg4lvn6klxhj6t6yh42lyqb8wmd
> - JIRA: https://issues.apache.org/jira/browse/SPARK-46122
> - PR: https://github.com/apache/spark/pull/46207
> 
> The vote is open until April 30th 1AM (PST) and passes
> if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> 
> [ ] +1 Set spark.sql.legacy.createHiveTableByDefault to false by default
> [ ] -1 Do not change spark.sql.legacy.createHiveTableByDefault because ...
> 
> Thank you in advance.
> 
> Dongjoon
> 

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



[VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-26 Thread Dongjoon Hyun
Please vote on SPARK-46122 to set spark.sql.legacy.createHiveTableByDefault
to `false` by default. The technical scope is defined in the following PR.

- DISCUSSION:
https://lists.apache.org/thread/ylk96fg4lvn6klxhj6t6yh42lyqb8wmd
- JIRA: https://issues.apache.org/jira/browse/SPARK-46122
- PR: https://github.com/apache/spark/pull/46207

The vote is open until April 30th 1AM (PST) and passes
if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Set spark.sql.legacy.createHiveTableByDefault to false by default
[ ] -1 Do not change spark.sql.legacy.createHiveTableByDefault because ...
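
For anyone who wants to try the change locally before voting, a minimal sketch
(assuming a Spark build with Hive support; the expected DESCRIBE output is
noted in a comment, not captured from a run):

{code:java}
import org.apache.spark.sql.SparkSession

// With the legacy flag true (the old default), CREATE TABLE without USING
// creates a Hive text-serde table; with false, a native data source table.
val spark = SparkSession.builder()
  .master("local[1]")
  .config("spark.sql.legacy.createHiveTableByDefault", "false")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("CREATE TABLE t (c INT)")
spark.sql("DESCRIBE FORMATTED t").show(100, truncate = false) // expect Provider: parquet
{code}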

Thank you in advance.

Dongjoon


Re: [DISCUSS] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-26 Thread Dongjoon Hyun
com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>>>>>
>>>>>>>>
>>>>>>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> *Disclaimer:* The information provided is correct to the best of
>>>>>>>> my knowledge but of course cannot be guaranteed . It is essential to 
>>>>>>>> note
>>>>>>>> that, as with any advice, quote "one test result is worth one-thousand
>>>>>>>> expert opinions (Werner
>>>>>>>> <https://en.wikipedia.org/wiki/Wernher_von_Braun>Von Braun
>>>>>>>> <https://en.wikipedia.org/wiki/Wernher_von_Braun>)".
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, 25 Apr 2024 at 14:38, Nimrod Ofek 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks for the detailed answer.
>>>>>>>>> The thing I'm missing is this: let's say that the output format I
>>>>>>>>> choose is delta lake or iceberg or whatever format that uses parquet. 
>>>>>>>>> Where
>>>>>>>>> does the catalog implementation (which holds metadata afaik, same 
>>>>>>>>> metadata
>>>>>>>>> that iceberg and delta lake save for their tables about their columns)
>>>>>>>>> comes into play and why should it affect performance?
>>>>>>>>> Another thing is that if I understand correctly, and I might be
>>>>>>>>> totally wrong here, the internal spark catalog is a local 
>>>>>>>>> installation of
>>>>>>>>> hive metastore anyway, so I'm not sure what the catalog has to do with
>>>>>>>>> anything.
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, 25 Apr 2024 at 16:14, Mich Talebzadeh <
>>>>>>>>> mich.talebza...@gmail.com>:
>>>>>>>>>
>>>>>>>>>> My take regarding your question is that your mileage varies so to
>>>>>>>>>> speak.
>>>>>>>>>>
>>>>>>>>>> 1) Hive provides a more mature and widely adopted catalog
>>>>>>>>>> solution that integrates well with other components in the Hadoop
>>>>>>>>>> ecosystem, such as HDFS, HBase, and YARN. If you are Hadoop centric (say
>>>>>>>>>> S(say
>>>>>>>>>> on-premise), using Hive may offer better compatibility and
>>>>>>>>>> interoperability.
>>>>>>>>>> 2) Hive provides a SQL-like interface that is familiar to users
>>>>>>>>>> who are accustomed to traditional RDBMs. If your use case involves 
>>>>>>>>>> complex
>>>>>>>>>> SQL queries or existing SQL-based workflows, using Hive may be 
>>>>>>>>>> advantageous.
>>>>>>>>>> 3) If you are looking for performance, spark's native catalog
>>>>>>>>>> tends to offer better performance for certain workloads, 
>>>>>>>>>> particularly those
>>>>>>>>>> that involve iterative processing or complex data transformations.(my
>>>>>>>>>> understanding). Spark's in-memory processing capabilities and 
>>>>>>>>>> optimizations
>>>>>>>>>> make it well-suited for interactive analytics and machine learning
>>>>>>>>>> tasks.(my favourite)
>>>>>>>>>> 4) Integration with Spark Workflows: If you primarily use Spark
>>>>>>>>>> for data processing and analytics, using Spark's native catalog may
>>>>>>>>>> simplify workflow management and reduce overhead, Spark's  tight
>>>>>>>>>> integration with its catalog allows for seamless interaction with 
>>>>>>>>>> Spark
>>>>>>>>>> applications and libraries.
>>>>>>>>>> 5) There seems to be some similari

[jira] [Assigned] (ORC-1705) Upgrade `zstd-jni` to 1.5.6-3

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/ORC-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned ORC-1705:
--

Assignee: dzcxzl

> Upgrade `zstd-jni` to 1.5.6-3
> -
>
> Key: ORC-1705
> URL: https://issues.apache.org/jira/browse/ORC-1705
> Project: ORC
>  Issue Type: Improvement
>  Components: Java
>Reporter: dzcxzl
>Assignee: dzcxzl
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (ORC-1705) Upgrade `zstd-jni` to 1.5.6-3

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/ORC-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved ORC-1705.

Fix Version/s: 2.0.1
   2.1.0
   Resolution: Fixed

Issue resolved by pull request 1914
[https://github.com/apache/orc/pull/1914]

> Upgrade `zstd-jni` to 1.5.6-3
> -
>
> Key: ORC-1705
> URL: https://issues.apache.org/jira/browse/ORC-1705
> Project: ORC
>  Issue Type: Improvement
>  Components: Java
>Reporter: dzcxzl
>Assignee: dzcxzl
>Priority: Major
> Fix For: 2.0.1, 2.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (SPARK-48007) MsSQLServer: upgrade mssql.jdbc.version to 12.6.1.jre11

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48007.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46244
[https://github.com/apache/spark/pull/46244]

> MsSQLServer: upgrade mssql.jdbc.version to 12.6.1.jre11
> ---
>
> Key: SPARK-48007
> URL: https://issues.apache.org/jira/browse/SPARK-48007
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Tests
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47991) Arrange the test cases for window frames and window functions.

2024-04-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47991.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46226
[https://github.com/apache/spark/pull/46226]

> Arrange the test cases for window frames and window functions.
> --
>
> Key: SPARK-47991
> URL: https://issues.apache.org/jira/browse/SPARK-47991
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Assignee: Jiaan Geng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22231) Support of map, filter, withField, dropFields in nested list of structures

2024-04-25 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-22231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840934#comment-17840934
 ] 

Dongjoon Hyun commented on SPARK-22231:
---

I removed the outdated target version from this issue.

> Support of map, filter, withField, dropFields in nested list of structures
> --
>
> Key: SPARK-22231
> URL: https://issues.apache.org/jira/browse/SPARK-22231
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: DB Tsai
>Priority: Major
>
> At Netflix's algorithm team, we work on ranking problems to find the great 
> content to fulfill the unique tastes of our members. Before building a 
> recommendation algorithm, we need to prepare the training, testing, and 
> validation datasets in Apache Spark. Due to the nature of ranking problems, 
> we have a nested list of items to be ranked in one column, and the top level 
> is the contexts describing the setting for where a model is to be used (e.g. 
> profiles, country, time, device, etc.)  Here is a blog post describing the 
> details, [Distributed Time Travel for Feature 
> Generation|https://medium.com/netflix-techblog/distributed-time-travel-for-feature-generation-389cccdd3907].
>  
> To be more concrete, for the ranks of videos for a given profile_id at a 
> given country, our data schema can look like this,
> {code:java}
> root
>  |-- profile_id: long (nullable = true)
>  |-- country_iso_code: string (nullable = true)
>  |-- items: array (nullable = false)
>  ||-- element: struct (containsNull = false)
>  |||-- title_id: integer (nullable = true)
>  |||-- scores: double (nullable = true)
> ...
> {code}
> We oftentimes need to work on the nested list of structs by applying some 
> functions on them. Sometimes, we're dropping or adding new columns in the 
> nested list of structs. Currently, there is no easy solution in open source 
> Apache Spark to perform those operations using SQL primitives; many people 
> just convert the data into RDD to work on the nested level of data, and then 
> reconstruct the new dataframe as a workaround. This is extremely inefficient 
> because all the optimizations like predicate pushdown in SQL cannot be 
> performed, we cannot leverage the columnar format, and the serialization 
> and deserialization cost becomes really huge even when we just want to add 
> a new column at the nested level.
> We built a solution internally at Netflix which we're very happy with. We 
> plan to make it open source in Spark upstream. We would like to socialize the 
> API design to see if we miss any use-cases.
> The first API we added is *mapItems* on the dataframe, which takes a function 
> from *Column* to *Column* and applies it to the nested dataframe. Here 
> is an example,
> {code:java}
> case class Data(foo: Int, bar: Double, items: Seq[Double])
> val df: Dataset[Data] = spark.createDataset(Seq(
>   Data(10, 10.0, Seq(10.1, 10.2, 10.3, 10.4)),
>   Data(20, 20.0, Seq(20.1, 20.2, 20.3, 20.4))
> ))
> val result = df.mapItems("items") {
>   item => item * 2.0
> }
> result.printSchema()
> // root
> // |-- foo: integer (nullable = false)
> // |-- bar: double (nullable = false)
> // |-- items: array (nullable = true)
> // ||-- element: double (containsNull = true)
> result.show()
> // +---+++
> // |foo| bar|   items|
> // +---+++
> // | 10|10.0|[20.2, 20.4, 20.6...|
> // | 20|20.0|[40.2, 40.4, 40.6...|
> // +---+++
> {code}
> Now, with the ability to apply a function to the nested dataframe, we can 
> add a new function, *withColumn* in *Column*, to add or replace the existing 
> column that has the same name in the nested list of structs. Here are two 
> examples demonstrating the API together with *mapItems*; the first one 
> replaces the existing column,
> {code:java}
> case class Item(a: Int, b: Double)
> case class Data(foo: Int, bar: Double, items: Seq[Item])
> val df: Dataset[Data] = spark.createDataset(Seq(
>   Data(10, 10.0, Seq(Item(10, 10.0), Item(11, 11.0))),
>   Data(20, 20.0, Seq(Item(20, 20.0), Item(21, 21.0)))
> ))
> val result = df.mapItems("items") {
>   item => item.withColumn(item("b") + 1 as "b")
> }
> result.printSchema
> root
> // |-- foo: integer (nullable = false)
> // |-- bar: double (nullable = false)
> // |-- items: array (nullable = true)
> // ||-- element: struct (

[jira] [Updated] (SPARK-22231) Support of map, filter, withField, dropFields in nested list of structures

2024-04-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-22231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-22231:
--
Target Version/s:   (was: 3.2.0)

> Support of map, filter, withField, dropFields in nested list of structures
> --
>
> Key: SPARK-22231
> URL: https://issues.apache.org/jira/browse/SPARK-22231
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: DB Tsai
>Priority: Major
>
> At Netflix's algorithm team, we work on ranking problems to find the great 
> content to fulfill the unique tastes of our members. Before building a 
> recommendation algorithm, we need to prepare the training, testing, and 
> validation datasets in Apache Spark. Due to the nature of ranking problems, 
> we have a nested list of items to be ranked in one column, and the top level 
> is the contexts describing the setting for where a model is to be used (e.g. 
> profiles, country, time, device, etc.)  Here is a blog post describing the 
> details, [Distributed Time Travel for Feature 
> Generation|https://medium.com/netflix-techblog/distributed-time-travel-for-feature-generation-389cccdd3907].
>  
> To be more concrete, for the ranks of videos for a given profile_id at a 
> given country, our data schema can look like this,
> {code:java}
> root
>  |-- profile_id: long (nullable = true)
>  |-- country_iso_code: string (nullable = true)
>  |-- items: array (nullable = false)
>  ||-- element: struct (containsNull = false)
>  |||-- title_id: integer (nullable = true)
>  |||-- scores: double (nullable = true)
> ...
> {code}
> We oftentimes need to work on the nested list of structs by applying some 
> functions on them. Sometimes, we're dropping or adding new columns in the 
> nested list of structs. Currently, there is no easy solution in open source 
> Apache Spark to perform those operations using SQL primitives; many people 
> just convert the data into RDD to work on the nested level of data, and then 
> reconstruct the new dataframe as a workaround. This is extremely inefficient 
> because all the optimizations like predicate pushdown in SQL cannot be 
> performed, we cannot leverage the columnar format, and the serialization 
> and deserialization cost becomes really huge even when we just want to add 
> a new column at the nested level.
> We built a solution internally at Netflix which we're very happy with. We 
> plan to make it open source in Spark upstream. We would like to socialize the 
> API design to see if we miss any use-cases.
> The first API we added is *mapItems* on the dataframe, which takes a function 
> from *Column* to *Column* and applies it to the nested dataframe. Here 
> is an example,
> {code:java}
> case class Data(foo: Int, bar: Double, items: Seq[Double])
> val df: Dataset[Data] = spark.createDataset(Seq(
>   Data(10, 10.0, Seq(10.1, 10.2, 10.3, 10.4)),
>   Data(20, 20.0, Seq(20.1, 20.2, 20.3, 20.4))
> ))
> val result = df.mapItems("items") {
>   item => item * 2.0
> }
> result.printSchema()
> // root
> // |-- foo: integer (nullable = false)
> // |-- bar: double (nullable = false)
> // |-- items: array (nullable = true)
> // ||-- element: double (containsNull = true)
> result.show()
> // +---+++
> // |foo| bar|   items|
> // +---+++
> // | 10|10.0|[20.2, 20.4, 20.6...|
> // | 20|20.0|[40.2, 40.4, 40.6...|
> // +---+++
> {code}
> Now, with the ability to apply a function to the nested dataframe, we can 
> add a new function, *withColumn* in *Column*, to add or replace the existing 
> column that has the same name in the nested list of structs. Here are two 
> examples demonstrating the API together with *mapItems*; the first one 
> replaces the existing column,
> {code:java}
> case class Item(a: Int, b: Double)
> case class Data(foo: Int, bar: Double, items: Seq[Item])
> val df: Dataset[Data] = spark.createDataset(Seq(
>   Data(10, 10.0, Seq(Item(10, 10.0), Item(11, 11.0))),
>   Data(20, 20.0, Seq(Item(20, 20.0), Item(21, 21.0)))
> ))
> val result = df.mapItems("items") {
>   item => item.withColumn(item("b") + 1 as "b")
> }
> result.printSchema
> root
> // |-- foo: integer (nullable = false)
> // |-- bar: double (nullable = false)
> // |-- items: array (nullable = true)
> // ||-- element: struct (containsNull = true)
> // |||-- a: integer (nullable = tr

[jira] [Updated] (SPARK-24941) Add RDDBarrier.coalesce() function

2024-04-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-24941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-24941:
--
Target Version/s:   (was: 3.2.0)

> Add RDDBarrier.coalesce() function
> --
>
> Key: SPARK-24941
> URL: https://issues.apache.org/jira/browse/SPARK-24941
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Xingbo Jiang
>Priority: Major
>
> https://github.com/apache/spark/pull/21758#discussion_r204917245
> The number of partitions from the input data can be unexpectedly large, e.g. 
> if you do
> {code}
> sc.textFile(...).barrier().mapPartitions()
> {code}
> The number of input partitions is based on the HDFS input splits. We shall 
> provide a way in RDDBarrier to enable users to specify the number of tasks in 
> a barrier stage, maybe something like RDDBarrier.coalesce(numPartitions: Int).
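
A minimal sketch around today's API; `RDD.barrier()` and
`RDDBarrier.mapPartitions` are real, while `RDDBarrier.coalesce` is the
proposal and appears only as a comment:

{code:java}
import org.apache.spark.sql.SparkSession

object BarrierCoalesceSketch {
  def main(args: Array[String]): Unit = {
    val sc = SparkSession.builder().master("local[4]").appName("barrier-sketch")
      .getOrCreate().sparkContext

    // With sc.textFile(...) the partition count follows the HDFS input splits
    // and can be huge; a barrier stage must run all of its tasks concurrently,
    // so here we keep it at 4 to match the 4 local slots.
    val rdd = sc.parallelize(1 to 1000, numSlices = 4)
    val doubled = rdd.barrier().mapPartitions(_.map(_ * 2)).collect()
    println(doubled.length) // 1000

    // Proposed API (not available when this issue was filed):
    //   rdd.barrier().coalesce(numPartitions = 4).mapPartitions(...)
    sc.stop()
  }
}
{code}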



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25383) Image data source supports sample pushdown

2024-04-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-25383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25383:
--
Target Version/s:   (was: 3.2.0)

> Image data source supports sample pushdown
> --
>
> Key: SPARK-25383
> URL: https://issues.apache.org/jira/browse/SPARK-25383
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, SQL
>Affects Versions: 3.1.0
>Reporter: Xiangrui Meng
>Priority: Major
>
> After SPARK-25349, we should update image data source to support sampling.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25752) Add trait to easily whitelist logical operators that produce named output from CleanupAliases

2024-04-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-25752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25752:
--
Target Version/s:   (was: 3.2.0)

> Add trait to easily whitelist logical operators that produce named output 
> from CleanupAliases
> -
>
> Key: SPARK-25752
> URL: https://issues.apache.org/jira/browse/SPARK-25752
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>Priority: Minor
>
> The rule `CleanupAliases` cleans up aliases from logical operators that do 
> not match a whitelist. This whitelist is hardcoded inside the rule which is 
> cumbersome. This PR is to clean that up by making a trait `HasNamedOutput` 
> that will be ignored by `CleanupAliases` and other ops that require aliases 
> to be preserved in the operator should extend it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28629) Capture the missing rules in HiveSessionStateBuilder

2024-04-25 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840928#comment-17840928
 ] 

Dongjoon Hyun commented on SPARK-28629:
---

I removed the outdated target version from this issue.

> Capture the missing rules in HiveSessionStateBuilder
> 
>
> Key: SPARK-28629
> URL: https://issues.apache.org/jira/browse/SPARK-28629
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Xiao Li
>Priority: Major
>
> A general mistake for new contributors is to forget to add the corresponding 
> rules to extendedResolutionRules, postHocResolutionRules, and 
> extendedCheckRules in HiveSessionStateBuilder. We need a way to avoid missing 
> these rules, or to capture them.
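
For orientation, here is the registration pattern the issue is about, shown
through the public extension hook rather than HiveSessionStateBuilder
internals; the rule and class names are invented, and the rule is a no-op:

{code:java}
import org.apache.spark.sql.{SparkSession, SparkSessionExtensions}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// A new analyzer rule is only effective once it is wired into the right rule
// list (resolution, post-hoc resolution, or check); forgetting that wiring in
// HiveSessionStateBuilder is the mistake described above.
case class MyResolutionRule(session: SparkSession) extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan // no-op example
}

class MyExtensions extends (SparkSessionExtensions => Unit) {
  override def apply(e: SparkSessionExtensions): Unit =
    e.injectResolutionRule(session => MyResolutionRule(session))
}

// Usage: SparkSession.builder().withExtensions(new MyExtensions()).getOrCreate()
{code}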



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27780) Shuffle server & client should be versioned to enable smoother upgrade

2024-04-25 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-27780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840930#comment-17840930
 ] 

Dongjoon Hyun commented on SPARK-27780:
---

I removed the outdated target version from this issue.

> Shuffle server & client should be versioned to enable smoother upgrade
> --
>
> Key: SPARK-27780
> URL: https://issues.apache.org/jira/browse/SPARK-27780
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Imran Rashid
>Priority: Major
>
> The external shuffle service is often upgraded at a different time than spark 
> itself.  However, this causes problems when the protocol changes between the 
> shuffle service and the spark runtime -- this forces users to upgrade 
> everything simultaneously.
> We should add versioning to the shuffle client & server, so they know what 
> messages the other will support.  This would allow better handling of mixed 
> versions, from better error msgs to allowing some mismatched versions (with 
> reduced capabilities).
> This originally came up in a discussion here: 
> https://github.com/apache/spark/pull/24565#issuecomment-493496466
> There are a few ways we could do the versioning which we still need to 
> discuss:
> 1) Version specified by config.  This allows for mixed versions across the 
> cluster and rolling upgrades.  It also will let a spark 3.0 client talk to a 
> 2.4 shuffle service.  But, may be a nuisance for users to get this right.
> 2) Auto-detection during registration with local shuffle service.  This makes 
> the versioning easy for the end user, and can even handle a 2.4 shuffle 
> service though it does not support the new versioning.  However, it will not 
> handle a rolling upgrade correctly -- if the local shuffle service has been 
> upgraded, but other nodes in the cluster have not, it will get the version 
> wrong.
> 3) Exchange versions per-connection.  When a connection is opened, the server 
> & client could first exchange messages with their versions, so they know how 
> to continue communication after that.
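
A toy sketch of option 3, the per-connection exchange; the message shape and
negotiation rule here are invented purely for illustration:

{code:java}
// Each side announces its protocol version when the connection opens, then
// both agree to speak the lower of the two (possibly with reduced
// capabilities), instead of failing on the first unknown message.
final case class VersionHandshake(protocolVersion: Int)

object ShuffleVersioning {
  def negotiate(client: VersionHandshake, server: VersionHandshake): Int =
    math.min(client.protocolVersion, server.protocolVersion)

  def main(args: Array[String]): Unit = {
    // e.g. an old client (v1) against an upgraded shuffle service (v2) -> v1.
    assert(negotiate(VersionHandshake(1), VersionHandshake(2)) == 1)
    println("negotiated v" + negotiate(VersionHandshake(1), VersionHandshake(2)))
  }
}
{code}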



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28629) Capture the missing rules in HiveSessionStateBuilder

2024-04-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28629:
--
Target Version/s:   (was: 3.2.0)

> Capture the missing rules in HiveSessionStateBuilder
> 
>
> Key: SPARK-28629
> URL: https://issues.apache.org/jira/browse/SPARK-28629
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Xiao Li
>Priority: Major
>
> A general mistake for new contributors is to forget to add the corresponding 
> rules to extendedResolutionRules, postHocResolutionRules, and 
> extendedCheckRules in HiveSessionStateBuilder. We need a way to avoid missing 
> these rules, or to capture them.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27780) Shuffle server & client should be versioned to enable smoother upgrade

2024-04-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-27780:
--
Target Version/s:   (was: 3.2.0)

> Shuffle server & client should be versioned to enable smoother upgrade
> --
>
> Key: SPARK-27780
> URL: https://issues.apache.org/jira/browse/SPARK-27780
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Imran Rashid
>Priority: Major
>
> The external shuffle service is often upgraded at a different time than spark 
> itself.  However, this causes problems when the protocol changes between the 
> shuffle service and the spark runtime -- this forces users to upgrade 
> everything simultaneously.
> We should add versioning to the shuffle client & server, so they know what 
> messages the other will support.  This would allow better handling of mixed 
> versions, from better error msgs to allowing some mismatched versions (with 
> reduced capabilities).
> This originally came up in a discussion here: 
> https://github.com/apache/spark/pull/24565#issuecomment-493496466
> There are a few ways we could do the versioning which we still need to 
> discuss:
> 1) Version specified by config.  This allows for mixed versions across the 
> cluster and rolling upgrades.  It also will let a spark 3.0 client talk to a 
> 2.4 shuffle service.  But, may be a nuisance for users to get this right.
> 2) Auto-detection during registration with local shuffle service.  This makes 
> the versioning easy for the end user, and can even handle a 2.4 shuffle 
> service though it does not support the new versioning.  However, it will not 
> handle a rolling upgrade correctly -- if the local shuffle service has been 
> upgraded, but other nodes in the cluster have not, it will get the version 
> wrong.
> 3) Exchange versions per-connection.  When a connection is opened, the server 
> & client could first exchange messages with their versions, so they know how 
> to continue communication after that.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30324) Simplify API for JSON access in DataFrames/SQL

2024-04-25 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840927#comment-17840927
 ] 

Dongjoon Hyun commented on SPARK-30324:
---

I removed the outdated target version from this issue.

> Simplify API for JSON access in DataFrames/SQL
> --
>
> Key: SPARK-30324
> URL: https://issues.apache.org/jira/browse/SPARK-30324
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Burak Yavuz
>Priority: Major
>
> get_json_object() is a UDF to parse JSON fields. It is verbose and hard to 
> use, e.g. I wasn't expecting the path to a field to have to start with "$.". 
> We can simplify all of this when a column is of StringType, and a nested 
> field is requested. This API sugar will be rewritten in the query planner as 
> get_json_object.
> This nested access can then be extended in the future to other 
> semi-structured formats.
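
For contrast, a small sketch of the existing API next to the proposed sugar;
the `$"raw.field"` form is the hypothetical simplification, not shipped
behavior:

{code:java}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.get_json_object

val spark = SparkSession.builder().master("local[1]").appName("json-sugar").getOrCreate()
import spark.implicits._

val df = Seq("""{"field":"value"}""").toDF("raw")

// Today: an explicit call with a "$."-prefixed JSONPath.
df.select(get_json_object($"raw", "$.field").as("field")).show() // value

// Proposed (hypothetical) sugar: nested access on a StringType column,
// rewritten by the planner into get_json_object:
//   df.select($"raw.field")
{code}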



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30324) Simplify API for JSON access in DataFrames/SQL

2024-04-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-30324:
--
Target Version/s:   (was: 3.2.0)

> Simplify API for JSON access in DataFrames/SQL
> --
>
> Key: SPARK-30324
> URL: https://issues.apache.org/jira/browse/SPARK-30324
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Burak Yavuz
>Priority: Major
>
> get_json_object() is a UDF to parse JSON fields. It is verbose and hard to 
> use, e.g. I wasn't expecting the path to a field to have to start with "$.". 
> We can simplify all of this when a column is of StringType, and a nested 
> field is requested. This API sugar will be rewritten in the query planner as 
> get_json_object.
> This nested access can then be extended in the future to other 
> semi-structured formats.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30334) Add metadata around semi-structured columns to Spark

2024-04-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-30334:
--
Target Version/s:   (was: 3.2.0)

> Add metadata around semi-structured columns to Spark
> 
>
> Key: SPARK-30334
> URL: https://issues.apache.org/jira/browse/SPARK-30334
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Burak Yavuz
>Priority: Major
>
> Semi-structured data is used widely in the data industry for reporting events 
> in a wide variety of formats. Click events in product analytics can be stored 
> as json. Some application logs can be in the form of delimited key=value 
> text. Some data may be in xml.
> The goal of this project is to be able to signal Spark that such a column 
> exists. This will then enable Spark to "auto-parse" these columns on the fly. 
> The proposal is to store this information as part of the column metadata, in 
> the fields:
>  - format: The format of the semi-structured column, e.g. json, xml, avro
>  - options: Options for parsing these columns
> Then imagine having the following data:
> {code:java}
> ++---++
> | ts | event |raw |
> ++---++
> | 2019-10-12 | click | {"field":"value"}  |
> ++---++ {code}
> SELECT raw.field FROM data
> will return "value"
> or the following data
> {code:java}
> ++---+--+
> | ts | event | raw  |
> ++---+--+
> | 2019-10-12 | click | field1=v1|field2=v2  |
> ++---+--+ {code}
> SELECT raw.field1 FROM data
> will return v1.
>  
> As a first step, we will introduce the function "as_json", which accomplishes 
> this for JSON columns.
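
A sketch of how the proposed column metadata might be attached; the
`format`/`options` keys are the proposal's field names, and the auto-parse
behavior in the final comment is the desired end state, not a shipped API:

{code:java}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.MetadataBuilder

val spark = SparkSession.builder().master("local[1]").appName("semi-structured").getOrCreate()
import spark.implicits._

val df = Seq(("2019-10-12", "click", """{"field":"value"}"""))
  .toDF("ts", "event", "raw")

// Attach the proposed semi-structured metadata to the StringType column.
val md = new MetadataBuilder().putString("format", "json").build()
val tagged = df.withColumn("raw", $"raw".as("raw", md))

// Desired behavior (hypothetical): SELECT raw.field FROM data would then be
// auto-parsed, e.g. rewritten by the planner to get_json_object(raw, "$.field").
{code}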



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30334) Add metadata around semi-structured columns to Spark

2024-04-25 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840926#comment-17840926
 ] 

Dongjoon Hyun commented on SPARK-30334:
---

I removed the outdated target version from this issue.

> Add metadata around semi-structured columns to Spark
> 
>
> Key: SPARK-30334
> URL: https://issues.apache.org/jira/browse/SPARK-30334
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Burak Yavuz
>Priority: Major
>
> Semi-structured data is used widely in the data industry for reporting events 
> in a wide variety of formats. Click events in product analytics can be stored 
> as json. Some application logs can be in the form of delimited key=value 
> text. Some data may be in xml.
> The goal of this project is to be able to signal Spark that such a column 
> exists. This will then enable Spark to "auto-parse" these columns on the fly. 
> The proposal is to store this information as part of the column metadata, in 
> the fields:
>  - format: The format of the semi-structured column, e.g. json, xml, avro
>  - options: Options for parsing these columns
> Then imagine having the following data:
> {code:java}
> ++---++
> | ts | event |raw |
> ++---++
> | 2019-10-12 | click | {"field":"value"}  |
> ++---++ {code}
> SELECT raw.field FROM data
> will return "value"
> or the following data
> {code:java}
> ++---+--+
> | ts | event | raw  |
> ++---+--+
> | 2019-10-12 | click | field1=v1|field2=v2  |
> ++---+--+ {code}
> SELECT raw.field1 FROM data
> will return v1.
>  
> As a first step, we will introduce the function "as_json", which accomplishes 
> this for JSON columns.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24942) Improve cluster resource management with jobs containing barrier stage

2024-04-25 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-24942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840913#comment-17840913
 ] 

Dongjoon Hyun commented on SPARK-24942:
---

I removed the outdated target version, `3.2.0`, from this Jira. For now, Apache 
Spark community has no target version for this issue.

> Improve cluster resource management with jobs containing barrier stage
> --
>
> Key: SPARK-24942
> URL: https://issues.apache.org/jira/browse/SPARK-24942
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Xingbo Jiang
>Priority: Major
>
> https://github.com/apache/spark/pull/21758#discussion_r205652317
> We shall improve cluster resource management to address the following issues:
> - With dynamic resource allocation enabled, it may happen that we acquire 
> some executors (but not enough to launch all the tasks in a barrier stage) 
> and later release them when the executor idle timeout expires, and then 
> acquire them again.
> - There can be deadlock with two concurrent applications. Each application 
> may acquire some resources, but not enough to launch all the tasks in a 
> barrier stage. And after hitting the idle timeout and releasing them, they 
> may acquire resources again, but just continually trade resources between 
> each other.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-24942) Improve cluster resource management with jobs containing barrier stage

2024-04-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-24942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-24942:
--
Target Version/s:   (was: 3.2.0)

> Improve cluster resource management with jobs containing barrier stage
> --
>
> Key: SPARK-24942
> URL: https://issues.apache.org/jira/browse/SPARK-24942
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Xingbo Jiang
>Priority: Major
>
> https://github.com/apache/spark/pull/21758#discussion_r205652317
> We shall improve cluster resource management to address the following issues:
> - With dynamic resource allocation enabled, it may happen that we acquire 
> some executors (but not enough to launch all the tasks in a barrier stage) 
> and later release them when the executor idle timeout expires, and then 
> acquire them again.
> - There can be deadlock with two concurrent applications. Each application 
> may acquire some resources, but not enough to launch all the tasks in a 
> barrier stage. And after hitting the idle timeout and releasing them, they 
> may acquire resources again, but just continually trade resources between 
> each other.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44111) Prepare Apache Spark 4.0.0

2024-04-25 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840853#comment-17840853
 ] 

Dongjoon Hyun commented on SPARK-44111:
---

Yes, we will provide `4.0.0-preview` in advance, [~fbiville]. Here is the 
discussion thread on the Apache Spark dev mailing list.
 * [https://lists.apache.org/thread/nxmvz2j7kp96otzlnl3kd277knlb6qgb]

[~cloud_fan] is the release manager who is leading the Apache Spark 4.0.0 
release (including the preview).

> Prepare Apache Spark 4.0.0
> --
>
> Key: SPARK-44111
> URL: https://issues.apache.org/jira/browse/SPARK-44111
> Project: Spark
>  Issue Type: Umbrella
>  Components: Build
>Affects Versions: 4.0.0
>    Reporter: Dongjoon Hyun
>Priority: Critical
>  Labels: pull-request-available
>
> For now, this issue aims to collect ideas for planning Apache Spark 4.0.0.
> We will add more items that are excluded from Apache Spark 3.5.0 
> (Feature Freeze: July 16th, 2023).
> {code}
> Spark 1: 2014.05 (1.0.0) ~ 2016.11 (1.6.3)
> Spark 2: 2016.07 (2.0.0) ~ 2021.05 (2.4.8)
> Spark 3: 2020.06 (3.0.0) ~ 2026.xx (3.5.x)
> Spark 4: 2024.06 (4.0.0, NEW)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[FYI] SPARK-47993: Drop Python 3.8

2024-04-25 Thread Dongjoon Hyun
FYI, there is a proposal to drop Python 3.8 because its EOL is October 2024.

https://github.com/apache/spark/pull/46228
[SPARK-47993][PYTHON] Drop Python 3.8

Since Python 3.8 is still alive and its lifecycle will overlap with that of
Apache Spark 4.0.0, please give us your feedback on the PR if you have any
concerns.

From my side, I agree with this decision.

Thanks,
Dongjoon.


[jira] [Resolved] (SPARK-47987) Enable `ArrowParityTests.test_createDataFrame_empty_partition`

2024-04-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47987.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46220
[https://github.com/apache/spark/pull/46220]

> Enable `ArrowParityTests.test_createDataFrame_empty_partition`
> --
>
> Key: SPARK-47987
> URL: https://issues.apache.org/jira/browse/SPARK-47987
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47990) Upgrade `zstd-jni` to 1.5.6-3

2024-04-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47990.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46225
[https://github.com/apache/spark/pull/46225]

> Upgrade `zstd-jni` to 1.5.6-3
> -
>
> Key: SPARK-47990
> URL: https://issues.apache.org/jira/browse/SPARK-47990
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-46122) Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default

2024-04-24 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840644#comment-17840644
 ] 

Dongjoon Hyun commented on SPARK-46122:
---

I started the discussion thread for this issue:

- [https://lists.apache.org/thread/ylk96fg4lvn6klxhj6t6yh42lyqb8wmd]

> Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default
> -
>
> Key: SPARK-46122
> URL: https://issues.apache.org/jira/browse/SPARK-46122
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (ORC-1704) Migration to Scala 2.13 of Apache Spark 3.5.1 at SparkBenchmark

2024-04-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/ORC-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved ORC-1704.

Fix Version/s: 2.0.1
   2.1.0
   Resolution: Fixed

Issue resolved by pull request 1912
[https://github.com/apache/orc/pull/1912]

> Migration to Scala 2.13 of Apache Spark 3.5.1 at SparkBenchmark
> ---
>
> Key: ORC-1704
> URL: https://issues.apache.org/jira/browse/ORC-1704
> Project: ORC
>  Issue Type: Improvement
>Reporter: dzcxzl
>Assignee: dzcxzl
>Priority: Minor
> Fix For: 2.0.1, 2.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (ORC-1704) Migration to Scala 2.13 of Apache Spark 3.5.1 at SparkBenchmark

2024-04-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/ORC-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned ORC-1704:
--

Assignee: dzcxzl

> Migration to Scala 2.13 of Apache Spark 3.5.1 at SparkBenchmark
> ---
>
> Key: ORC-1704
> URL: https://issues.apache.org/jira/browse/ORC-1704
> Project: ORC
>  Issue Type: Improvement
>Reporter: dzcxzl
>Assignee: dzcxzl
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[DISCUSS] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-24 Thread Dongjoon Hyun
Hi, All.

It's great to see community activities to polish 4.0.0 more and more.
Thank you all.

I'd like to bring SPARK-46122 (another SQL topic) to you from the subtasks
of SPARK-44111 (Prepare Apache Spark 4.0.0),

- https://issues.apache.org/jira/browse/SPARK-46122
   Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default

This legacy configuration controls the `CREATE TABLE` SQL syntax without
`USING` and `STORED AS`, which is currently mapped to a `Hive` table.
The proposal of SPARK-46122 is to switch the default value of this
configuration from `true` to `false` so that Spark native tables are used,
because Spark supports them better.

In other words, Spark will use the value of `spark.sql.sources.default`
as the table provider instead of `Hive`, like the other Spark APIs. Of
course, users can get all the legacy behavior back by setting the
configuration to `true`.
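
To make the proposal concrete, here is a small spark-shell sketch; the table
names are illustrative, and it assumes the legacy flag can be set per session:

{code}
// Proposed default: a bare CREATE TABLE creates a Spark native table whose
// provider comes from spark.sql.sources.default ("parquet" unless changed).
spark.conf.set("spark.sql.legacy.createHiveTableByDefault", "false")
spark.sql("CREATE TABLE t_native (id INT)")
spark.sql("DESCRIBE TABLE EXTENDED t_native").show(truncate = false) // Provider: parquet

// The legacy behavior stays available by flipping the flag back.
spark.conf.set("spark.sql.legacy.createHiveTableByDefault", "true")
spark.sql("CREATE TABLE t_hive (id INT)")
spark.sql("DESCRIBE TABLE EXTENDED t_hive").show(truncate = false)   // Provider: hive
{code}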

Historically, this behavior change was already merged once during Apache
Spark 3.0.0 preparation via SPARK-30098, but it was reverted during the
3.0.0 RC period.

2019-12-06: SPARK-30098 Use default datasource as provider for CREATE TABLE
2020-05-16: SPARK-31707 Revert SPARK-30098 Use default datasource as
provider for CREATE TABLE command

At Apache Spark 3.1.0, we had another discussion about this and defined it
as a legacy behavior behind this configuration, reusing the ID SPARK-30098.

2020-12-01: https://lists.apache.org/thread/8c8k1jk61pzlcosz3mxo4rkj5l23r204
2020-12-03: SPARK-30098 Add a configuration to use default datasource as
provider for CREATE TABLE command

Last year, we received two additional requests to switch this, because
Apache Spark 4.0.0 is a good time to make a decision about the future
direction.

2023-02-27: SPARK-42603 as an independent idea.
2023-11-27: SPARK-46122 as a part of Apache Spark 4.0.0 idea


WDYT? The technical scope is defined in the following PR, which amounts to
one line of main code, one line of the migration guide, and a few lines of
test code.

- https://github.com/apache/spark/pull/46207

Dongjoon.


[jira] [Updated] (SPARK-46122) Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default

2024-04-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-46122:
--
Summary: Set `spark.sql.legacy.createHiveTableByDefault` to `false` by 
default  (was: Set `spark.sql.legacy.createHiveTableByDefault` to false by 
default)

> Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default
> -
>
> Key: SPARK-46122
> URL: https://issues.apache.org/jira/browse/SPARK-46122
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46122) Set `spark.sql.legacy.createHiveTableByDefault` to false by default

2024-04-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-46122:
--
Summary: Set `spark.sql.legacy.createHiveTableByDefault` to false by 
default  (was: Disable spark.sql.legacy.createHiveTableByDefault by default)

> Set `spark.sql.legacy.createHiveTableByDefault` to false by default
> ---
>
> Key: SPARK-46122
> URL: https://issues.apache.org/jira/browse/SPARK-46122
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47979) Use Hive tables explicitly for Hive table capability tests

2024-04-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47979.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46211
[https://github.com/apache/spark/pull/46211]

> Use Hive tables explicitly for Hive table capability tests
> --
>
> Key: SPARK-47979
> URL: https://issues.apache.org/jira/browse/SPARK-47979
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 4.0.0
>    Reporter: Dongjoon Hyun
>    Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47979) Use Hive table explicitly for Hive table capability tests

2024-04-24 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-47979:
-

 Summary: Use Hive table explicitly for Hive table capability tests
 Key: SPARK-47979
 URL: https://issues.apache.org/jira/browse/SPARK-47979
 Project: Spark
  Issue Type: Test
  Components: SQL, Tests
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47979) Use Hive tables explicitly for Hive table capability tests

2024-04-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47979:
--
Summary: Use Hive tables explicitly for Hive table capability tests  (was: 
Use Hive table explicitly for Hive table capability tests)

> Use Hive tables explicitly for Hive table capability tests
> --
>
> Key: SPARK-47979
> URL: https://issues.apache.org/jira/browse/SPARK-47979
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 4.0.0
>    Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45265) Support Hive 4.0 metastore

2024-04-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45265:
-

Assignee: (was: Attila Zsolt Piros)

> Support Hive 4.0 metastore
> --
>
> Key: SPARK-45265
> URL: https://issues.apache.org/jira/browse/SPARK-45265
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Attila Zsolt Piros
>Priority: Major
>  Labels: pull-request-available
>
> Although Hive 4.0.0 is still beta, I would like to work on this because 
> Hive 4.0.0 will support the pushdown of partition column filters with 
> VARCHAR/CHAR types.
> For details, please see HIVE-26661: Support partition filter for char and 
> varchar types on Hive metastore
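
For context, a Spark session is pointed at an external Hive metastore version
roughly as sketched below; the "4.0.0" value is hypothetical until this issue
is implemented, since only released, supported versions are accepted today.

{code}
import org.apache.spark.sql.SparkSession

// Sketch only: "4.0.0" is a hypothetical value pending SPARK-45265;
// spark.sql.hive.metastore.jars=maven downloads the matching client jars.
val spark = SparkSession.builder()
  .appName("hive-metastore-sketch")
  .config("spark.sql.hive.metastore.version", "4.0.0")
  .config("spark.sql.hive.metastore.jars", "maven")
  .enableHiveSupport()
  .getOrCreate()
{code}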



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44677) Drop legacy Hive-based ORC file format

2024-04-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-44677:
--
Parent: (was: SPARK-44111)
Issue Type: Task  (was: Sub-task)

> Drop legacy Hive-based ORC file format
> --
>
> Key: SPARK-44677
> URL: https://issues.apache.org/jira/browse/SPARK-44677
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Priority: Major
>
> Currently, Spark allows using spark.sql.orc.impl=native/hive to switch the 
> ORC FileFormat implementation.
> SPARK-23456 (2.4) switched the default value of spark.sql.orc.impl from 
> "hive" to "native" and prepared to drop the "hive" implementation in the 
> future.
> > ... eventually, Apache Spark will drop old Hive-based ORC code.
> The native implementation has worked well during the whole Spark 3.x 
> period, so it's a good time to consider dropping the "hive" one in Spark 
> 4.0.
> Also, we should take care of backward compatibility during the change.
> > BTW, IIRC, there was a difference in the Hive ORC CHAR implementation 
> > before, so we couldn't remove it due to backward-compatibility issues. 
> > Since Spark implements many CHAR features, we need to re-verify that the 
> > {{native}} implementation has all legacy Hive-based ORC features
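
For illustration, a minimal spark-shell sketch of switching between the two
implementations at runtime; the output path is illustrative:

{code}
// "native" has been the default since Spark 2.4; "hive" is the legacy
// implementation this issue proposes to drop.
spark.conf.set("spark.sql.orc.impl", "native") // or "hive" for the legacy path
spark.range(10).write.mode("overwrite").orc("/tmp/orc_native_demo")
spark.read.orc("/tmp/orc_native_demo").show()
{code}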



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


