[jira] [Commented] (SPARK-39235) Make Catalog API be compatible with 3-layer-namespace

2022-06-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558713#comment-17558713
 ] 

Apache Spark commented on SPARK-39235:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/36986

> Make Catalog API be compatible with 3-layer-namespace
> -
>
> Key: SPARK-39235
> URL: https://issues.apache.org/jira/browse/SPARK-39235
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Rui Wang
>Priority: Major
>
> We can make the Catalog API support a 3-layer namespace: 
> catalog_name.database_name.table_name
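
As an illustration of the intent (a sketch, not a confirmed API surface): with 
3-layer-namespace support, the existing Catalog methods accept a fully 
qualified name. In the PySpark sketch below, `testcat`, `ns`, and `tbl` are 
placeholders, and `testcat` stands for any V2 catalog already registered via 
`spark.sql.catalog.testcat=<implementation class>`:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Today these Catalog calls only understand database.table; with
# 3-layer-namespace support they also accept catalog.database.table.
spark.catalog.tableExists("testcat.ns.tbl")  # catalog.database.table
spark.catalog.tableExists("ns.tbl")          # database.table still works
spark.catalog.getTable("testcat.ns.tbl")
{code}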



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39598) Make *cache*, *catalog* in the python side support 3-layer-namespace

2022-06-24 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-39598:


 Summary: Make *cache*, *catalog* in the python side support 
3-layer-namespace
 Key: SPARK-39598
 URL: https://issues.apache.org/jira/browse/SPARK-39598
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.4.0
Reporter: zhengruifeng






--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-39596) Run `Linters, licenses, dependencies and documentation generation ` GitHub Actions failed

2022-06-24 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558710#comment-17558710
 ] 

Yang Jie edited comment on SPARK-39596 at 6/25/22 2:33 AM:
---

also ping [~dongjoon] 


was (Author: luciferyang):
also ping [~dongjoon] **

> Run `Linters, licenses, dependencies and documentation generation ` GitHub 
> Actions failed
> -
>
> Key: SPARK-39596
> URL: https://issues.apache.org/jira/browse/SPARK-39596
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> Run / Linters, licenses, dependencies and documentation generation
>  
> {code:java}
> x Failed to parse Rd in histogram.Rd
> ℹ there is no package called ‘ggplot2’
> Caused by error in `loadNamespace()`:
> ! there is no package called ‘ggplot2’ 
> Execution halted
>                     
>       Jekyll 4.2.1   Please append `--trace` to the `build` command 
>                      for any additional information or backtrace. 
>                     
> /__w/spark/spark/docs/_plugins/copy_api_dirs.rb:147:in `<top (required)>': R 
> doc generation failed (RuntimeError)
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in
>  `require'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in
>  `block in require_with_graceful_fail'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in
>  `each'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in
>  `require_with_graceful_fail'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:89:in
>  `block in require_plugin_files'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in
>  `each'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in
>  `require_plugin_files'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:21:in
>  `conscientious_require'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:131:in
>  `setup'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:36:in
>  `initialize'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:30:in
>  `new'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:30:in
>  `process'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in
>  `block in process_with_graceful_fail'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in
>  `each'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in
>  `process_with_graceful_fail'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:18:in
>  `block (2 levels) in init_with_program'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in
>  `block in execute'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in
>  `each'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in
>  `execute'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/program.rb:44:in
>  `go'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary.rb:21:in
>  `program'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/exe/jekyll:15:in
>  `<top (required)>'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/bin/jekyll:23:in 
> `load'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/bin/jekyll:23:in 
> `<main>'
> Error: Process completed with exit code 1. {code}
>  * [https://github.com/nchammas/spark/runs/7047789435?check_suite_focus=true]
>  * 
> [https://github.com/LuciferYang/spark/runs/7050249606?check_suite_focus=true]
>  * [https://github.com/apache/spark/runs/7047294196?check_suite_focus=true]
>  * https://github.com/amaliujia/spark/runs/7048443511?check_suite_focus=true

[jira] [Commented] (SPARK-39596) Run `Linters, licenses, dependencies and documentation generation ` GitHub Actions failed

2022-06-24 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558710#comment-17558710
 ] 

Yang Jie commented on SPARK-39596:
--

also ping [~dongjoon] **

> Run `Linters, licenses, dependencies and documentation generation ` GitHub 
> Actions failed
> -
>
> Key: SPARK-39596
> URL: https://issues.apache.org/jira/browse/SPARK-39596
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> Run / Linters, licenses, dependencies and documentation generation
>  
> {code:java}
> x Failed to parse Rd in histogram.Rd
> ℹ there is no package called ‘ggplot2’
> Caused by error in `loadNamespace()`:
> ! there is no package called ‘ggplot2’ 
> Execution halted
>                     
>       Jekyll 4.2.1   Please append `--trace` to the `build` command 
>                      for any additional information or backtrace. 
>                     
> /__w/spark/spark/docs/_plugins/copy_api_dirs.rb:147:in `<top (required)>': R 
> doc generation failed (RuntimeError)
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in
>  `require'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in
>  `block in require_with_graceful_fail'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in
>  `each'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in
>  `require_with_graceful_fail'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:89:in
>  `block in require_plugin_files'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in
>  `each'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in
>  `require_plugin_files'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:21:in
>  `conscientious_require'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:131:in
>  `setup'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:36:in
>  `initialize'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:30:in
>  `new'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:30:in
>  `process'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in
>  `block in process_with_graceful_fail'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in
>  `each'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in
>  `process_with_graceful_fail'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:18:in
>  `block (2 levels) in init_with_program'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in
>  `block in execute'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in
>  `each'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in
>  `execute'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/program.rb:44:in
>  `go'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary.rb:21:in
>  `program'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/exe/jekyll:15:in
>  `<top (required)>'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/bin/jekyll:23:in 
> `load'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/bin/jekyll:23:in 
> `<main>'
> Error: Process completed with exit code 1. {code}
>  * [https://github.com/nchammas/spark/runs/7047789435?check_suite_focus=true]
>  * 
> [https://github.com/LuciferYang/spark/runs/7050249606?check_suite_focus=true]
>  * [https://github.com/apache/spark/runs/7047294196?check_suite_focus=true]
>  * https://github.com/amaliujia/spark/runs/7048443511?check_suite_focus=true
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Assigned] (SPARK-39597) Make GetTable, TableExists and DatabaseExists in the python side support 3-layer-namespace

2022-06-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39597:


Assignee: Apache Spark

> Make GetTable, TableExists and DatabaseExists in the python side support 
> 3-layer-namespace
> --
>
> Key: SPARK-39597
> URL: https://issues.apache.org/jira/browse/SPARK-39597
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: zhengruifeng
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39597) Make GetTable, TableExists and DatabaseExists in the python side support 3-layer-namespace

2022-06-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39597:


Assignee: (was: Apache Spark)

> Make GetTable, TableExists and DatabaseExists in the python side support 
> 3-layer-namespace
> --
>
> Key: SPARK-39597
> URL: https://issues.apache.org/jira/browse/SPARK-39597
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: zhengruifeng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39597) Make GetTable, TableExists and DatabaseExists in the python side support 3-layer-namespace

2022-06-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558707#comment-17558707
 ] 

Apache Spark commented on SPARK-39597:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/36985

> Make GetTable, TableExists and DatabaseExists in the python side support 
> 3-layer-namespace
> --
>
> Key: SPARK-39597
> URL: https://issues.apache.org/jira/browse/SPARK-39597
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: zhengruifeng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39596) Run `Linters, licenses, dependencies and documentation generation ` GitHub Actions failed

2022-06-24 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558706#comment-17558706
 ] 

Yang Jie commented on SPARK-39596:
--

cc [HyukjinKwon|https://github.com/HyukjinKwon]

> Run `Linters, licenses, dependencies and documentation generation ` GitHub 
> Actions failed
> -
>
> Key: SPARK-39596
> URL: https://issues.apache.org/jira/browse/SPARK-39596
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> Run / Linters, licenses, dependencies and documentation generation
>  
> {code:java}
> x Failed to parse Rd in histogram.Rd
> ℹ there is no package called ‘ggplot2’
> Caused by error in `loadNamespace()`:
> ! there is no package called ‘ggplot2’ 
> Execution halted
>                     
>       Jekyll 4.2.1   Please append `--trace` to the `build` command 
>                      for any additional information or backtrace. 
>                     
> /__w/spark/spark/docs/_plugins/copy_api_dirs.rb:147:in `<top (required)>': R 
> doc generation failed (RuntimeError)
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in
>  `require'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in
>  `block in require_with_graceful_fail'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in
>  `each'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in
>  `require_with_graceful_fail'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:89:in
>  `block in require_plugin_files'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in
>  `each'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in
>  `require_plugin_files'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:21:in
>  `conscientious_require'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:131:in
>  `setup'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:36:in
>  `initialize'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:30:in
>  `new'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:30:in
>  `process'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in
>  `block in process_with_graceful_fail'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in
>  `each'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in
>  `process_with_graceful_fail'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:18:in
>  `block (2 levels) in init_with_program'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in
>  `block in execute'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in
>  `each'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in
>  `execute'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/program.rb:44:in
>  `go'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary.rb:21:in
>  `program'
>     from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/exe/jekyll:15:in
>  `<top (required)>'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/bin/jekyll:23:in 
> `load'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/bin/jekyll:23:in 
> `<main>'
> Error: Process completed with exit code 1. {code}
>  * [https://github.com/nchammas/spark/runs/7047789435?check_suite_focus=true]
>  * 
> [https://github.com/LuciferYang/spark/runs/7050249606?check_suite_focus=true]
>  * [https://github.com/apache/spark/runs/7047294196?check_suite_focus=true]
>  * https://github.com/amaliujia/spark/runs/7048443511?check_suite_focus=true
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (SPARK-39597) Make GetTable, TableExists and DatabaseExists in the python side support 3-layer-namespace

2022-06-24 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-39597:


 Summary: Make GetTable, TableExists and DatabaseExists in the 
python side support 3-layer-namespace
 Key: SPARK-39597
 URL: https://issues.apache.org/jira/browse/SPARK-39597
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.4.0
Reporter: zhengruifeng






--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39596) Run `Linters, licenses, dependencies and documentation generation ` GitHub Actions failed

2022-06-24 Thread Yang Jie (Jira)
Yang Jie created SPARK-39596:


 Summary: Run `Linters, licenses, dependencies and documentation 
generation ` GitHub Actions failed
 Key: SPARK-39596
 URL: https://issues.apache.org/jira/browse/SPARK-39596
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 3.4.0
Reporter: Yang Jie


Run / Linters, licenses, dependencies and documentation generation

 
{code:java}
x Failed to parse Rd in histogram.Rd
ℹ there is no package called ‘ggplot2’
Caused by error in `loadNamespace()`:
! there is no package called ‘ggplot2’ 
Execution halted
                    
      Jekyll 4.2.1   Please append `--trace` to the `build` command 
                     for any additional information or backtrace. 
                    
/__w/spark/spark/docs/_plugins/copy_api_dirs.rb:147:in `<top (required)>': R 
doc generation failed (RuntimeError)
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in
 `require'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in
 `block in require_with_graceful_fail'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in
 `each'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in
 `require_with_graceful_fail'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:89:in
 `block in require_plugin_files'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in
 `each'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in
 `require_plugin_files'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:21:in
 `conscientious_require'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:131:in
 `setup'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:36:in
 `initialize'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:30:in
 `new'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:30:in
 `process'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in
 `block in process_with_graceful_fail'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in
 `each'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in
 `process_with_graceful_fail'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:18:in
 `block (2 levels) in init_with_program'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in
 `block in execute'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in
 `each'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in
 `execute'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/program.rb:44:in
 `go'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary.rb:21:in
 `program'
    from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/exe/jekyll:15:in
 `<top (required)>'
    from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/bin/jekyll:23:in 
`load'
    from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/bin/jekyll:23:in 
`<main>'
Error: Process completed with exit code 1. {code}
 * [https://github.com/nchammas/spark/runs/7047789435?check_suite_focus=true]
 * [https://github.com/LuciferYang/spark/runs/7050249606?check_suite_focus=true]
 * [https://github.com/apache/spark/runs/7047294196?check_suite_focus=true]
 * https://github.com/amaliujia/spark/runs/7048443511?check_suite_focus=true

 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39595) Upgrade rocksdbjni to 7.3.1

2022-06-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39595:


Assignee: Apache Spark

> Upgrade rocksdbjni to 7.3.1
> ---
>
> Key: SPARK-39595
> URL: https://issues.apache.org/jira/browse/SPARK-39595
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>
> https://github.com/facebook/rocksdb/releases/tag/v7.3.1



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39595) Upgrade rocksdbjni to 7.3.1

2022-06-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39595:


Assignee: (was: Apache Spark)

> Upgrade rocksdbjni to 7.3.1
> ---
>
> Key: SPARK-39595
> URL: https://issues.apache.org/jira/browse/SPARK-39595
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> https://github.com/facebook/rocksdb/releases/tag/v7.3.1



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39595) Upgrade rocksdbjni to 7.3.1

2022-06-24 Thread Yang Jie (Jira)
Yang Jie created SPARK-39595:


 Summary: Upgrade rocksdbjni to 7.3.1
 Key: SPARK-39595
 URL: https://issues.apache.org/jira/browse/SPARK-39595
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.4.0
Reporter: Yang Jie


https://github.com/facebook/rocksdb/releases/tag/v7.3.1



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39594) Improve logs to show addresses in addition to port

2022-06-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558689#comment-17558689
 ] 

Apache Spark commented on SPARK-39594:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/36984

> Improve logs to show addresses in addition to port
> --
>
> Key: SPARK-39594
> URL: https://issues.apache.org/jira/browse/SPARK-39594
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39594) Improve logs to show addresses in addition to port

2022-06-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39594:


Assignee: (was: Apache Spark)

> Improve logs to show addresses in addition to port
> --
>
> Key: SPARK-39594
> URL: https://issues.apache.org/jira/browse/SPARK-39594
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39594) Improve logs to show addresses in addition to port

2022-06-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39594:


Assignee: Apache Spark

> Improve logs to show addresses in addition to port
> --
>
> Key: SPARK-39594
> URL: https://issues.apache.org/jira/browse/SPARK-39594
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39594) Improve logs to show addresses in addition to port

2022-06-24 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-39594:
-

 Summary: Improve logs to show addresses in addition to port
 Key: SPARK-39594
 URL: https://issues.apache.org/jira/browse/SPARK-39594
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39583) Make RefreshTable be compatible with 3 layer namespace

2022-06-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39583:


Assignee: (was: Apache Spark)

> Make RefreshTable be compatible with 3 layer namespace
> --
>
> Key: SPARK-39583
> URL: https://issues.apache.org/jira/browse/SPARK-39583
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39583) Make RefreshTable be compatible with 3 layer namespace

2022-06-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558679#comment-17558679
 ] 

Apache Spark commented on SPARK-39583:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/36983

> Make RefreshTable be compatible with 3 layer namespace
> --
>
> Key: SPARK-39583
> URL: https://issues.apache.org/jira/browse/SPARK-39583
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39583) Make RefreshTable be compatible with 3 layer namespace

2022-06-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39583:


Assignee: Apache Spark

> Make RefreshTable be compatible with 3 layer namespace
> --
>
> Key: SPARK-39583
> URL: https://issues.apache.org/jira/browse/SPARK-39583
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39593) Configurable State Checkpointing Frequency in Structured Streaming

2022-06-24 Thread Boyang Jerry Peng (Jira)
Boyang Jerry Peng created SPARK-39593:
-

 Summary: Configurable State Checkpointing Frequency in Structured 
Streaming
 Key: SPARK-39593
 URL: https://issues.apache.org/jira/browse/SPARK-39593
 Project: Spark
  Issue Type: Improvement
  Components: Structured Streaming
Affects Versions: 3.3.0
Reporter: Boyang Jerry Peng


Currently, for stateful pipelines, state changes are checkpointed on every 
micro-batch. State checkpoints can contribute significantly to the latency of 
a micro-batch.  If state is checkpointed less frequently, its effect on batch 
latency can be amortized.  This can be used in conjunction with asynchronous 
state checkpointing to further reduce the latency cost that state 
checkpointing may incur.
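
A sketch of how this could surface to users; the option name below is 
entirely hypothetical (the ticket does not name a configuration) and only 
illustrates checkpointing state every N micro-batches instead of every one:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# A stateful aggregation over the built-in rate source.
counts = spark.readStream.format("rate").load().groupBy("value").count()

query = (counts.writeStream
         .outputMode("update")
         .format("console")
         .option("checkpointLocation", "/tmp/ckpt-39593")
         # Hypothetical knob, for illustration only: persist state changes
         # once every 5 micro-batches rather than on each one.
         .option("stateCheckpointInterval", "5")
         .start())
{code}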



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39592) Asynchronous State Checkpointing in Structured Streaming

2022-06-24 Thread Boyang Jerry Peng (Jira)
Boyang Jerry Peng created SPARK-39592:
-

 Summary: Asynchronous State Checkpointing in Structured Streaming
 Key: SPARK-39592
 URL: https://issues.apache.org/jira/browse/SPARK-39592
 Project: Spark
  Issue Type: Improvement
  Components: Structured Streaming
Affects Versions: 3.3.0
Reporter: Boyang Jerry Peng


We can reduce the latency of stateful pipelines in Structured Streaming by 
making state checkpoints asynchronous.  One of the major contributors to 
latency for stateful pipelines in Structured Streaming is checkpointing the 
state changes of every micro-batch.  If we make the state checkpointing 
asynchronous, we can potentially lower the latency of the pipeline 
significantly, since state checkpointing will contribute less, or not at all, 
to the batch latency.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39591) Offset Management Improvements in Structured Streaming

2022-06-24 Thread Boyang Jerry Peng (Jira)
Boyang Jerry Peng created SPARK-39591:
-

 Summary: Offset Management Improvements in Structured Streaming
 Key: SPARK-39591
 URL: https://issues.apache.org/jira/browse/SPARK-39591
 Project: Spark
  Issue Type: Improvement
  Components: Structured Streaming
Affects Versions: 3.3.0
Reporter: Boyang Jerry Peng


Currently in Structured Streaming, at the beginning of every micro-batch, the 
offset to process up to for that batch is persisted to durable storage.  At 
the end of every micro-batch, a marker indicating the completion of the 
current micro-batch is persisted to durable storage. For pipelines where 
end-to-end exactly-once is not required and latency is sensitive, such as 
ones that read from Kafka and write to Kafka, we can allow users to configure 
offset commits to be written asynchronously. This commit operation then no 
longer contributes to the batch duration, effectively lowering the overall 
latency of the pipeline.
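
For context, the two writes described above land in the `offsets/` and 
`commits/` directories under the query's checkpoint location. A minimal 
Kafka-to-Kafka sketch (broker and topic names are placeholders) showing where 
those per-batch writes happen:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

src = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "in-topic")
       .load())

query = (src.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
         .writeStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("topic", "out-topic")
         .option("checkpointLocation", "/tmp/ckpt-39591")
         .start())

# Per micro-batch, Spark writes /tmp/ckpt-39591/offsets/<batchId> at the
# start and /tmp/ckpt-39591/commits/<batchId> at the end; this ticket
# proposes making those commit writes asynchronous for latency-sensitive
# pipelines that do not need end-to-end exactly-once.
{code}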



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39590) Python API Parity in Structure Streaming

2022-06-24 Thread Boyang Jerry Peng (Jira)
Boyang Jerry Peng created SPARK-39590:
-

 Summary: Python API Parity in Structure Streaming
 Key: SPARK-39590
 URL: https://issues.apache.org/jira/browse/SPARK-39590
 Project: Spark
  Issue Type: Improvement
  Components: Structured Streaming
Affects Versions: 3.3.0
Reporter: Boyang Jerry Peng


New APIs in Structured Streaming tend to get added to Java/Scala first.  This 
creates a situation where the Python API has fallen behind.  For example, 
map/flatMapGroupsWithState is not supported in PySpark.  We need the PySpark 
API to catch up with the Java/Scala APIs and, where necessary, provide 
tighter integration with native Python data processing frameworks such as 
Pandas.
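
To illustrate the gap: arbitrary stateful processing 
(flatMapGroupsWithState) is Java/Scala-only today, and a Pandas-friendly 
Python counterpart is the kind of API this calls for. The sketch below is 
purely hypothetical; the method name and signature are invented for 
illustration:

{code:python}
import pandas as pd

def running_count(key, pdf_iter, state):
    # Hypothetical pandas-based stateful function: keep a running count
    # of rows seen per key across micro-batches.
    count = state.get()[0] if state.exists else 0
    for pdf in pdf_iter:
        count += len(pdf)
    state.update((count,))
    yield pd.DataFrame({"key": [key[0]], "count": [count]})

# Hypothetical PySpark counterpart to Scala's flatMapGroupsWithState:
# df.groupBy("key").applyInPandasWithState(running_count, ...)
{code}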



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39589) Asynchronous I/O API support in Structured Streaming

2022-06-24 Thread Boyang Jerry Peng (Jira)
Boyang Jerry Peng created SPARK-39589:
-

 Summary: Asynchronous I/O API support in Structured Streaming
 Key: SPARK-39589
 URL: https://issues.apache.org/jira/browse/SPARK-39589
 Project: Spark
  Issue Type: New Feature
  Components: Structured Streaming
Affects Versions: 3.3.0
Reporter: Boyang Jerry Peng


Streaming pipelines that support ETL use cases often make API calls to 
external systems.  To better support such use cases and improve their 
performance, an asynchronous processing API should be added to Structured 
Streaming so that external requests can be handled more efficiently.
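
Absent a first-class API, a common workaround today is to overlap the 
external calls manually, for example with a thread pool inside foreachBatch. 
A sketch of that pattern (the endpoint, column name, and pool size are 
placeholders), which a built-in asynchronous API would subsume:

{code:python}
from concurrent.futures import ThreadPoolExecutor
import urllib.request

def call_external(url):
    # Placeholder external request; returns the HTTP status code.
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.status

def process_batch(batch_df, batch_id):
    rows = batch_df.collect()  # illustrative only; fine for small batches
    with ThreadPoolExecutor(max_workers=16) as pool:
        # Issue the external requests concurrently instead of sequentially.
        statuses = list(pool.map(call_external, (r["url"] for r in rows)))
    # ... join statuses back to the rows and write them out ...

# query = df.writeStream.foreachBatch(process_batch).start()
{code}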



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39585) Multiple Stateful Operators in Structured Streaming

2022-06-24 Thread Boyang Jerry Peng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Jerry Peng updated SPARK-39585:
--
Summary: Multiple Stateful Operators in Structured Streaming  (was: 
Multiple Stateful Operators)

> Multiple Stateful Operators in Structured Streaming
> ---
>
> Key: SPARK-39585
> URL: https://issues.apache.org/jira/browse/SPARK-39585
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.3.0
>Reporter: Boyang Jerry Peng
>Priority: Major
>
> Currently, Structured Streaming only supports one stateful operator per 
> pipeline.  This constraint prevents Structured Streaming from being used 
> for many use cases, such as ones involving chained time window 
> aggregations, chained stream-stream outer equality joins, or a 
> stream-stream time interval join followed by a time window aggregation.  
> Enabling multiple stateful operators within a pipeline will open up many 
> additional use cases and put Structured Streaming on par with other stream 
> processing engines.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39586) Advanced Windowing in Structured Streaming

2022-06-24 Thread Boyang Jerry Peng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Jerry Peng updated SPARK-39586:
--
Summary: Advanced Windowing in Structured Streaming  (was: Advanced 
Windowing)

> Advanced Windowing in Structured Streaming
> --
>
> Key: SPARK-39586
> URL: https://issues.apache.org/jira/browse/SPARK-39586
> Project: Spark
>  Issue Type: New Feature
>  Components: Structured Streaming
>Affects Versions: 3.3.0
>Reporter: Boyang Jerry Peng
>Priority: Major
>
> Currently in Structured Streaming there is no user-friendly way to define 
> custom windowing strategies. Users can only choose from the existing set of 
> built-in window strategies: tumbling, sliding, and session windows.  To 
> improve the flexibility of the engine and to support more advanced use 
> cases, Structured Streaming needs to provide an intuitive API that allows 
> users to define custom window boundaries, trigger windows when they are 
> complete, and, if necessary, evict elements from windows.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39588) Inspect the State of Stateful Streaming Pipelines

2022-06-24 Thread Boyang Jerry Peng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Jerry Peng updated SPARK-39588:
--
Issue Type: New Feature  (was: Improvement)

> Inspect the State of Stateful Streaming Pipelines
> -
>
> Key: SPARK-39588
> URL: https://issues.apache.org/jira/browse/SPARK-39588
> Project: Spark
>  Issue Type: New Feature
>  Components: Structured Streaming
>Affects Versions: 3.3.0
>Reporter: Boyang Jerry Peng
>Priority: Major
>
> Currently there is no mechanism to query or inspect the state of a stateful 
> streaming pipeline externally.  A tool or API that allows users to do so 
> would be extremely helpful for debugging.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39588) Inspect the State of Stateful Streaming Pipelines

2022-06-24 Thread Boyang Jerry Peng (Jira)
Boyang Jerry Peng created SPARK-39588:
-

 Summary: Inspect the State of Stateful Streaming Pipelines
 Key: SPARK-39588
 URL: https://issues.apache.org/jira/browse/SPARK-39588
 Project: Spark
  Issue Type: Improvement
  Components: Structured Streaming
Affects Versions: 3.3.0
Reporter: Boyang Jerry Peng


Currently there is no mechanism to query or inspect the state of a stateful 
streaming pipeline externally.  A tool or API that allows users to do so 
would be extremely helpful for debugging.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39587) Schema Evolution for Stateful Pipelines in Structured Streaming

2022-06-24 Thread Boyang Jerry Peng (Jira)
Boyang Jerry Peng created SPARK-39587:
-

 Summary: Schema Evolution for Stateful Pipelines in Structured 
Streaming 
 Key: SPARK-39587
 URL: https://issues.apache.org/jira/browse/SPARK-39587
 Project: Spark
  Issue Type: Improvement
  Components: Structured Streaming
Affects Versions: 3.3.0
Reporter: Boyang Jerry Peng


Support for schema evolution of stateful operators is non-existent. There is 
no clear path to evolve the schema for the existing built-in stateful 
operators.  For user-defined stateful operators such as 
map/flatMapGroupsWithState, there is no path for evolving state either.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39584) Fix TPCDSQueryBenchmark Measuring Performance of Wrong Query Results

2022-06-24 Thread Kazuyuki Tanimura (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuyuki Tanimura updated SPARK-39584:
--
Description: 
GenTPCDSData uses the schema defined in `TPCDSSchema`, which contains 
varchar(N)/char(N) columns. When GenTPCDSData generates parquet, it pads 
strings shorter than N with spaces.

When TPCDSQueryBenchmark reads the parquet data generated by GenTPCDSData, it 
uses the schema from the parquet file and keeps the padding. Due to the extra 
spaces, the string filter queries of TPC-DS fail to match. For example, the 
q13 query results are all nulls, and the query returns too quickly because 
its string filters do not match any rows.

Therefore, TPCDSQueryBenchmark is benchmarking with wrong query results, and 
that is inflating some performance results.

I am exploring two possible solutions now:
1. Call `CREATE TABLE tableName schema USING parquet LOCATION path` before 
reading. This is what the Spark TPC-DS unit tests do (see the sketch below).
2. Change varchar to string in the schema. This is what the [databricks data 
generator|https://github.com/databricks/spark-sql-perf] does.

TPCDSQueryBenchmark was ported from databricks/spark-sql-perf in 
https://issues.apache.org/jira/browse/SPARK-35192

History of the related varchar issue: 
[https://lists.apache.org/thread/rg7pgwyto3616hb15q78n0sykls9j7rn]
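
A sketch of solution 1, assuming the generated parquet already sits under a 
local path; the table shown is abbreviated and the column list is only 
indicative (the real definitions come from `TPCDSSchema`):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Register the generated parquet under the intended CHAR/VARCHAR schema so
# that reads apply char/varchar semantics instead of reusing the padded
# plain-string schema embedded in the parquet files.
spark.sql("""
  CREATE TABLE customer (
    c_customer_sk INT,
    c_customer_id CHAR(16)
    -- ... remaining columns from TPCDSSchema ...
  )
  USING parquet
  LOCATION '/path/to/tpcds/customer'
""")
{code}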

  was:
GenTPCDSData uses the schema defined in `TPCDSSchema` that contains 
varchar(N)/char(N). When GenTPCDSData generates parquet, that pads spaces for 
strings whose lengths are < N.

When TPCDSQueryBenchmark reads data from parquet generated by GenTPCDSData, it 
uses schema from the parquet file and keeps the paddings. Due to the extra 
spaces, string filter queries of TPC-DS fail to match. For example, q13 query 
results are all nulls and returns too fast because string filter does not meet 
any rows.

Therefore, TPCDSQueryBenchmark is benchmarking with wrong query results and 
that is inflating some performance results.

I am exploring two possible solutions now
1. Call `CREATE TABLE tableName schema USING parquet LOCATION path` before 
reading. This is what Spark unit tests are doing
2. Change varchar to string in the schema. This is what [databricks data 
generator|https://github.com/databricks/spark-sql-perf] is doing

TPCDSQueryBenchmark was ported from databricks/spark-sql-perf in 
https://issues.apache.org/jira/browse/SPARK-35192

History related varchar issue 
[https://lists.apache.org/thread/rg7pgwyto3616hb15q78n0sykls9j7rn]


> Fix TPCDSQueryBenchmark Measuring Performance of Wrong Query Results
> 
>
> Key: SPARK-39584
> URL: https://issues.apache.org/jira/browse/SPARK-39584
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 3.0.3, 3.1.2, 3.2.1, 3.3.0, 3.4.0
>Reporter: Kazuyuki Tanimura
>Priority: Minor
>
> GenTPCDSData uses the schema defined in `TPCDSSchema`, which contains 
> varchar(N)/char(N) columns. When GenTPCDSData generates parquet, it pads 
> strings shorter than N with spaces.
> When TPCDSQueryBenchmark reads the parquet data generated by GenTPCDSData, 
> it uses the schema from the parquet file and keeps the padding. Due to the 
> extra spaces, the string filter queries of TPC-DS fail to match. For 
> example, the q13 query results are all nulls, and the query returns too 
> quickly because its string filters do not match any rows.
> Therefore, TPCDSQueryBenchmark is benchmarking with wrong query results, 
> and that is inflating some performance results.
> I am exploring two possible solutions now:
> 1. Call `CREATE TABLE tableName schema USING parquet LOCATION path` before 
> reading. This is what the Spark TPC-DS unit tests do.
> 2. Change varchar to string in the schema. This is what the [databricks 
> data generator|https://github.com/databricks/spark-sql-perf] does.
> TPCDSQueryBenchmark was ported from databricks/spark-sql-perf in 
> https://issues.apache.org/jira/browse/SPARK-35192
> History of the related varchar issue: 
> [https://lists.apache.org/thread/rg7pgwyto3616hb15q78n0sykls9j7rn]



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39586) Advanced Windowing

2022-06-24 Thread Boyang Jerry Peng (Jira)
Boyang Jerry Peng created SPARK-39586:
-

 Summary: Advanced Windowing
 Key: SPARK-39586
 URL: https://issues.apache.org/jira/browse/SPARK-39586
 Project: Spark
  Issue Type: New Feature
  Components: Structured Streaming
Affects Versions: 3.3.0
Reporter: Boyang Jerry Peng


Currently in Structured Streaming there is no user-friendly way to define 
custom windowing strategies. Users can only choose from the existing set of 
built-in window strategies: tumbling, sliding, and session windows.  To 
improve the flexibility of the engine and to support more advanced use cases, 
Structured Streaming needs to provide an intuitive API that allows users to 
define custom window boundaries, trigger windows when they are complete, and, 
if necessary, evict elements from windows.
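
For reference, the built-in strategies mentioned above are exposed through 
`window` and `session_window`; anything beyond them currently requires the 
kind of custom-window API proposed here. A minimal sketch using the built-in 
rate source:

{code:python}
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
events = spark.readStream.format("rate").load()  # columns: timestamp, value

tumbling = events.groupBy(F.window("timestamp", "10 minutes")).count()
sliding = events.groupBy(F.window("timestamp", "10 minutes", "5 minutes")).count()
sessions = events.groupBy(F.session_window("timestamp", "5 minutes")).count()
# Custom boundaries, completion triggers, and eviction policies are not
# expressible with these built-ins.
{code}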



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39585) Multiple Stateful Operators

2022-06-24 Thread Boyang Jerry Peng (Jira)
Boyang Jerry Peng created SPARK-39585:
-

 Summary: Multiple Stateful Operators
 Key: SPARK-39585
 URL: https://issues.apache.org/jira/browse/SPARK-39585
 Project: Spark
  Issue Type: Improvement
  Components: Structured Streaming
Affects Versions: 3.3.0
Reporter: Boyang Jerry Peng


Currently, Structured Streaming only supports one stateful operator per 
pipeline.  This constraint prevents Structured Streaming from being used for 
many use cases, such as ones involving chained time window aggregations, 
chained stream-stream outer equality joins, or a stream-stream time interval 
join followed by a time window aggregation.  Enabling multiple stateful 
operators within a pipeline will open up many additional use cases and put 
Structured Streaming on par with other stream processing engines.
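
For example, a chained time window aggregation like the sketch below is one 
of the plans this would enable; today the second stateful operator is not 
supported on a streaming DataFrame that already contains one:

{code:python}
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
events = spark.readStream.format("rate").load()  # columns: timestamp, value

# First stateful operator: per-minute counts.
per_minute = (events
              .withWatermark("timestamp", "10 minutes")
              .groupBy(F.window("timestamp", "1 minute"))
              .count())

# Second stateful operator chained on the first: hourly roll-up of the
# per-minute counts. Supporting this chain is the point of the ticket.
per_hour = (per_minute
            .groupBy(F.window(F.col("window.end"), "1 hour"))
            .agg(F.sum("count").alias("count")))
{code}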



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39584) Fix TPCDSQueryBenchmark Measuring Performance of Wrong Query Results

2022-06-24 Thread Kazuyuki Tanimura (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuyuki Tanimura updated SPARK-39584:
--
Description: 
GenTPCDSData uses the schema defined in `TPCDSSchema` that contains 
varchar(N)/char(N). When GenTPCDSData generates parquet, that pads spaces for 
strings whose lengths are < N.

When TPCDSQueryBenchmark reads data from parquet generated by GenTPCDSData, it 
uses schema from the parquet file and keeps the paddings. Due to the extra 
spaces, string filter queries of TPC-DS fail to match. For example, q13 query 
results are all nulls and returns too fast because string filter does not meet 
any rows.

Therefore, TPCDSQueryBenchmark is benchmarking with wrong query results and 
that is inflating some performance results.

I am exploring two possible solutions now
1. Call `CREATE TABLE tableName schema USING parquet LOCATION path` before 
reading. This is what Spark unit tests are doing
2. Change varchar to string in the schema. This is what [databricks data 
generator|https://github.com/databricks/spark-sql-perf] is doing

TPCDSQueryBenchmark was ported from databricks/spark-sql-perf in 
https://issues.apache.org/jira/browse/SPARK-35192

History related varchar issue 
[https://lists.apache.org/thread/rg7pgwyto3616hb15q78n0sykls9j7rn]

  was:
GenTPCDSData uses the schema defined in `TPCDSSchema` that contains 
varchar(N)/char(N). When GenTPCDSData generates parquet, that pads spaces for 
strings whose lengths are < N.

When TPCDSQueryBenchmark reads data from parquet generated by GenTPCDSData, it 
uses schema from the parquet file and keeps the paddings. Due to the extra 
spaces, string filter queries of TPC-DS fail to match. For example, q13 query 
results are all nulls and returns too fast because string filter does not meet 
any rows.

Therefore, TPCDSQueryBenchmark is benchmarking with wrong query results and 
that is inflating some performance results.

I am exploring two possible solutions now
1. Call `{{{}CREATE TABLE tableName schema USING parquet LOCATION path` 
{}}}before reading. This is what Spark unit tests are doing
2. Change varchar to string in the schema. This is what [databricks data 
generator|[https://github.com/databricks/spark-sql-perf]] is doing

TPCDSQueryBenchmark was ported from databricks/spark-sql-perf in 
https://issues.apache.org/jira/browse/SPARK-35192

History related varchar issue 
[https://lists.apache.org/thread/rg7pgwyto3616hb15q78n0sykls9j7rn]


> Fix TPCDSQueryBenchmark Measuring Performance of Wrong Query Results
> 
>
> Key: SPARK-39584
> URL: https://issues.apache.org/jira/browse/SPARK-39584
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 3.0.3, 3.1.2, 3.2.1, 3.3.0, 3.4.0
>Reporter: Kazuyuki Tanimura
>Priority: Minor
>
> GenTPCDSData uses the schema defined in `TPCDSSchema` that contains 
> varchar(N)/char(N). When GenTPCDSData generates parquet, that pads spaces for 
> strings whose lengths are < N.
> When TPCDSQueryBenchmark reads data from parquet generated by GenTPCDSData, 
> it uses schema from the parquet file and keeps the paddings. Due to the extra 
> spaces, string filter queries of TPC-DS fail to match. For example, q13 query 
> results are all nulls and returns too fast because string filter does not 
> meet any rows.
> Therefore, TPCDSQueryBenchmark is benchmarking with wrong query results and 
> that is inflating some performance results.
> I am exploring two possible solutions now
> 1. Call `CREATE TABLE tableName schema USING parquet LOCATION path` before 
> reading. This is what Spark unit tests are doing
> 2. Change varchar to string in the schema. This is what [databricks data 
> generator|https://github.com/databricks/spark-sql-perf] is doing
> TPCDSQueryBenchmark was ported from databricks/spark-sql-perf in 
> https://issues.apache.org/jira/browse/SPARK-35192
> History related varchar issue 
> [https://lists.apache.org/thread/rg7pgwyto3616hb15q78n0sykls9j7rn]



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39584) Fix TPCDSQueryBenchmark Measuring Performance of Wrong Query Results

2022-06-24 Thread Kazuyuki Tanimura (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuyuki Tanimura updated SPARK-39584:
--
Description: 
GenTPCDSData uses the schema defined in `TPCDSSchema` that contains 
varchar(N)/char(N). When GenTPCDSData generates parquet, that pads spaces for 
strings whose lengths are < N.

When TPCDSQueryBenchmark reads data from parquet generated by GenTPCDSData, it 
uses schema from the parquet file and keeps the paddings. Due to the extra 
spaces, string filter queries of TPC-DS fail to match. For example, q13 query 
results are all nulls and returns too fast because string filter does not meet 
any rows.

Therefore, TPCDSQueryBenchmark is benchmarking with wrong query results and 
that is inflating some performance results.

I am exploring two possible solutions now
1. Call `{{{}CREATE TABLE tableName schema USING parquet LOCATION path` 
{}}}before reading. This is what Spark unit tests are doing
2. Change varchar to string in the schema. This is what [databricks data 
generator|[https://github.com/databricks/spark-sql-perf]] is doing

TPCDSQueryBenchmark was ported from databricks/spark-sql-perf in 
https://issues.apache.org/jira/browse/SPARK-35192

History related varchar issue 
[https://lists.apache.org/thread/rg7pgwyto3616hb15q78n0sykls9j7rn]

  was:
GenTPCDSData uses the schema defined in `TPCDSSchema` that contains 
varchar(N)/char(N). When GenTPCDSData generates parquet, that pads spaces for 
strings whose lengths are < N.

When TPCDSQueryBenchmark reads data from parquet generated by GenTPCDSData, it 
uses schema from the parquet file and keeps the paddings. Due to the extra 
spaces, string filter queries of TPC-DS fail to match. For example, q13 query 
results are all nulls and returns too fast because string filter does not meet 
any rows.

Therefore, TPCDSQueryBenchmark is benchmarking with wrong query results and 
that is inflating some performance results.


I am exploring two possible solutions now
1. Call `{{{}CREATE TABLE tableName schema USING parquet LOCATION path` 
{}}}before reading. This is what Spark unit tests are doing
2. Change varchar to string in the schema. This is what [databricks data 
generator| [https://github.com/databricks/spark-sql-perf]] is doing

TPCDSQueryBenchmark was ported from databricks/spark-sql-perf in 
https://issues.apache.org/jira/browse/SPARK-35192

History related varchar 
https://lists.apache.org/thread/rg7pgwyto3616hb15q78n0sykls9j7rn


> Fix TPCDSQueryBenchmark Measuring Performance of Wrong Query Results
> 
>
> Key: SPARK-39584
> URL: https://issues.apache.org/jira/browse/SPARK-39584
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 3.0.3, 3.1.2, 3.2.1, 3.3.0, 3.4.0
>Reporter: Kazuyuki Tanimura
>Priority: Minor
>
> GenTPCDSData uses the schema defined in `TPCDSSchema`, which contains 
> varchar(N)/char(N) columns. When GenTPCDSData generates Parquet files, it 
> pads strings shorter than N with spaces.
> When TPCDSQueryBenchmark reads the Parquet data generated by GenTPCDSData, 
> it uses the schema from the Parquet files and keeps the padding. Because of 
> the extra spaces, the string filters in the TPC-DS queries fail to match. 
> For example, q13 returns all nulls, and returns too quickly, because its 
> string filters do not match any rows.
> Therefore, TPCDSQueryBenchmark is benchmarking wrong query results, and 
> that inflates some performance numbers.
> I am exploring two possible solutions now:
> 1. Call {{CREATE TABLE tableName schema USING parquet LOCATION path}} 
> before reading. This is what the Spark unit tests do.
> 2. Change varchar to string in the schema. This is what the [databricks 
> data generator|https://github.com/databricks/spark-sql-perf] does.
> TPCDSQueryBenchmark was ported from databricks/spark-sql-perf in 
> https://issues.apache.org/jira/browse/SPARK-35192
> History of the related varchar issue: 
> https://lists.apache.org/thread/rg7pgwyto3616hb15q78n0sykls9j7rn



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39584) Fix TPCDSQueryBenchmark Measuring Performance of Wrong Query Results

2022-06-24 Thread Kazuyuki Tanimura (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558678#comment-17558678
 ] 

Kazuyuki Tanimura commented on SPARK-39584:
---

Hi [~maropu], pinging you since it seems you are the expert in this area. I am 
wondering whether you have a preference between solutions 1 and 2?

CC [~dongjoon] 

I will add benchmark results from master for future comparison.

> Fix TPCDSQueryBenchmark Measuring Performance of Wrong Query Results
> 
>
> Key: SPARK-39584
> URL: https://issues.apache.org/jira/browse/SPARK-39584
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 3.0.3, 3.1.2, 3.2.1, 3.3.0, 3.4.0
>Reporter: Kazuyuki Tanimura
>Priority: Minor
>
> GenTPCDSData uses the schema defined in `TPCDSSchema`, which contains 
> varchar(N)/char(N) columns. When GenTPCDSData generates Parquet files, it 
> pads strings shorter than N with spaces.
> When TPCDSQueryBenchmark reads the Parquet data generated by GenTPCDSData, 
> it uses the schema from the Parquet files and keeps the padding. Because of 
> the extra spaces, the string filters in the TPC-DS queries fail to match. 
> For example, q13 returns all nulls, and returns too quickly, because its 
> string filters do not match any rows.
> Therefore, TPCDSQueryBenchmark is benchmarking wrong query results, and 
> that inflates some performance numbers.
> I am exploring two possible solutions now:
> 1. Call {{CREATE TABLE tableName schema USING parquet LOCATION path}} 
> before reading. This is what the Spark unit tests do.
> 2. Change varchar to string in the schema. This is what the [databricks 
> data generator|https://github.com/databricks/spark-sql-perf] does.
> TPCDSQueryBenchmark was ported from databricks/spark-sql-perf in 
> https://issues.apache.org/jira/browse/SPARK-35192
> History of the related varchar issue: 
> https://lists.apache.org/thread/rg7pgwyto3616hb15q78n0sykls9j7rn



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39584) Fix TPCDSQueryBenchmark Measuring Performance of Wrong Query Results

2022-06-24 Thread Kazuyuki Tanimura (Jira)
Kazuyuki Tanimura created SPARK-39584:
-

 Summary: Fix TPCDSQueryBenchmark Measuring Performance of Wrong 
Query Results
 Key: SPARK-39584
 URL: https://issues.apache.org/jira/browse/SPARK-39584
 Project: Spark
  Issue Type: Test
  Components: Tests
Affects Versions: 3.3.0, 3.2.1, 3.1.2, 3.0.3, 3.4.0
Reporter: Kazuyuki Tanimura


GenTPCDSData uses the schema defined in `TPCDSSchema`, which contains 
varchar(N)/char(N) columns. When GenTPCDSData generates Parquet files, it pads 
strings shorter than N with spaces.

When TPCDSQueryBenchmark reads the Parquet data generated by GenTPCDSData, it 
uses the schema from the Parquet files and keeps the padding. Because of the 
extra spaces, the string filters in the TPC-DS queries fail to match. For 
example, q13 returns all nulls, and returns too quickly, because its string 
filters do not match any rows.

Therefore, TPCDSQueryBenchmark is benchmarking wrong query results, and that 
inflates some performance numbers.

I am exploring two possible solutions now:
1. Call {{CREATE TABLE tableName schema USING parquet LOCATION path}} before 
reading. This is what the Spark unit tests do.
2. Change varchar to string in the schema. This is what the [databricks data 
generator|https://github.com/databricks/spark-sql-perf] does.

TPCDSQueryBenchmark was ported from databricks/spark-sql-perf in 
https://issues.apache.org/jira/browse/SPARK-35192

History of the related varchar issue: 
https://lists.apache.org/thread/rg7pgwyto3616hb15q78n0sykls9j7rn



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39583) Make RefreshTable be compatible with 3 layer namespace

2022-06-24 Thread Rui Wang (Jira)
Rui Wang created SPARK-39583:


 Summary: Make RefreshTable be compatible with 3 layer namespace
 Key: SPARK-39583
 URL: https://issues.apache.org/jira/browse/SPARK-39583
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Rui Wang
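
For context, the parent umbrella (SPARK-39235) is about accepting qualified 
names of the form catalog_name.database_name.table_name. A sketch of the 
intended call for this sub-task, with made-up names:

{code:python}
# Sketch of the target behavior: Catalog.refreshTable should also accept a
# 3-layer-namespace name (catalog.database.table).
# "my_catalog", "my_db", and "my_table" are hypothetical names.
spark.catalog.refreshTable("my_catalog.my_db.my_table")
{code}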






--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39582) "Since " docs on array_agg are incorrect

2022-06-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558674#comment-17558674
 ] 

Apache Spark commented on SPARK-39582:
--

User 'nchammas' has created a pull request for this issue:
https://github.com/apache/spark/pull/36982

> "Since " docs on array_agg are incorrect
> -
>
> Key: SPARK-39582
> URL: https://issues.apache.org/jira/browse/SPARK-39582
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Nicholas Chammas
>Priority: Minor
>
> [https://spark.apache.org/docs/latest/api/sql/#array_agg]
> The docs currently say "Since: 2.0.0", but `array_agg` was added in Spark 
> 3.3.0.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39582) "Since " docs on array_agg are incorrect

2022-06-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558675#comment-17558675
 ] 

Apache Spark commented on SPARK-39582:
--

User 'nchammas' has created a pull request for this issue:
https://github.com/apache/spark/pull/36982

> "Since " docs on array_agg are incorrect
> -
>
> Key: SPARK-39582
> URL: https://issues.apache.org/jira/browse/SPARK-39582
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Nicholas Chammas
>Assignee: Apache Spark
>Priority: Minor
>
> [https://spark.apache.org/docs/latest/api/sql/#array_agg]
> The docs currently say "Since: 2.0.0", but `array_agg` was added in Spark 
> 3.3.0.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39582) "Since " docs on array_agg are incorrect

2022-06-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39582:


Assignee: (was: Apache Spark)

> "Since " docs on array_agg are incorrect
> -
>
> Key: SPARK-39582
> URL: https://issues.apache.org/jira/browse/SPARK-39582
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Nicholas Chammas
>Priority: Minor
>
> [https://spark.apache.org/docs/latest/api/sql/#array_agg]
> The docs currently say "Since: 2.0.0", but `array_agg` was added in Spark 
> 3.3.0.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39582) "Since " docs on array_agg are incorrect

2022-06-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39582:


Assignee: Apache Spark

> "Since " docs on array_agg are incorrect
> -
>
> Key: SPARK-39582
> URL: https://issues.apache.org/jira/browse/SPARK-39582
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Nicholas Chammas
>Assignee: Apache Spark
>Priority: Minor
>
> [https://spark.apache.org/docs/latest/api/sql/#array_agg]
> The docs currently say "Since: 2.0.0", but `array_agg` was added in Spark 
> 3.3.0.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39582) "Since " docs on array_agg are incorrect

2022-06-24 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-39582:


 Summary: "Since " docs on array_agg are incorrect
 Key: SPARK-39582
 URL: https://issues.apache.org/jira/browse/SPARK-39582
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.0
Reporter: Nicholas Chammas


[https://spark.apache.org/docs/latest/api/sql/#array_agg]

The docs currently say "Since: 2.0.0", but `array_agg` was added in Spark 3.3.0.
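
For reference, a quick sanity check is to run the function itself; the 
statement below resolves on Spark 3.3.0+ and fails to resolve on earlier 
releases:

{code:python}
# array_agg collects the values of a column into an array, e.g. [1, 2, 1].
spark.sql("SELECT array_agg(col) FROM VALUES (1), (2), (1) AS tab(col)").show()
{code}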



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39581) Better error messages for Pandas API on Spark

2022-06-24 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-39581:


 Summary: Better error messages for Pandas API on Spark
 Key: SPARK-39581
 URL: https://issues.apache.org/jira/browse/SPARK-39581
 Project: Spark
  Issue Type: Improvement
  Components: Pandas API on Spark
Affects Versions: 3.4.0
Reporter: Xinrong Meng


Currently, some error messages for the pandas API on Spark are confusing.

Specifically:
 * when users assume pandas-on-Spark has implemented pandas Scalars
 * when users pass a `ps.Index` when creating a DataFrame/Series (see the 
sketch after this list)

Better error messages enhance usability and, in turn, user adoption.

We should improve the error messages for the pandas API on Spark.
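
A sketch of the second case, with made-up data; the point of the ticket is 
that the failure here should clearly name the unsupported argument:

{code:python}
import pyspark.pandas as ps

idx = ps.Index([1, 2, 3])
# Passing a pandas-on-Spark Index into the DataFrame constructor is the
# case this ticket targets (expected to fail, ideally with a clear message).
ps.DataFrame(idx)
{code}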



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39574) Better error message when `ps.Index` is used for DataFrame/Series creation

2022-06-24 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-39574:
-
Parent: SPARK-39581
Issue Type: Sub-task  (was: Improvement)

> Better error message when `ps.Index` is used for DataFrame/Series creation
> --
>
> Key: SPARK-39574
> URL: https://issues.apache.org/jira/browse/SPARK-39574
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Better error message when `ps.Index` is used for DataFrame/Series creation.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39522) Uses Docker image cache over a custom image

2022-06-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39522:


Assignee: Apache Spark

> Uses Docker image cache over a custom image
> ---
>
> Key: SPARK-39522
> URL: https://issues.apache.org/jira/browse/SPARK-39522
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> We should probably replace the base image 
> (https://github.com/apache/spark/blob/master/.github/workflows/build_and_test.yml#L302,
>  https://github.com/dongjoon-hyun/ApacheSparkGitHubActionImage) with a plain 
> ubuntu image w/ a Docker image cache. See also 
> https://github.com/docker/build-push-action/blob/master/docs/advanced/cache.md



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39522) Uses Docker image cache over a custom image

2022-06-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39522:


Assignee: (was: Apache Spark)

> Uses Docker image cache over a custom image
> ---
>
> Key: SPARK-39522
> URL: https://issues.apache.org/jira/browse/SPARK-39522
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> We should probably replace the base image 
> (https://github.com/apache/spark/blob/master/.github/workflows/build_and_test.yml#L302,
>  https://github.com/dongjoon-hyun/ApacheSparkGitHubActionImage) with a plain 
> ubuntu image w/ a Docker image cache. See also 
> https://github.com/docker/build-push-action/blob/master/docs/advanced/cache.md



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39574) Better error message when `ps.Index` is used for DataFrame/Series creation

2022-06-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39574:


Assignee: (was: Apache Spark)

> Better error message when `ps.Index` is used for DataFrame/Series creation
> --
>
> Key: SPARK-39574
> URL: https://issues.apache.org/jira/browse/SPARK-39574
> Project: Spark
>  Issue Type: Improvement
>  Components: Pandas API on Spark, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Better error message when `ps.Index` is used for DataFrame/Series creation.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39522) Uses Docker image cache over a custom image

2022-06-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558585#comment-17558585
 ] 

Apache Spark commented on SPARK-39522:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/36980

> Uses Docker image cache over a custom image
> ---
>
> Key: SPARK-39522
> URL: https://issues.apache.org/jira/browse/SPARK-39522
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> We should probably replace the base image 
> (https://github.com/apache/spark/blob/master/.github/workflows/build_and_test.yml#L302,
>  https://github.com/dongjoon-hyun/ApacheSparkGitHubActionImage) with a plain 
> ubuntu image w/ a Docker image cache. See also 
> https://github.com/docker/build-push-action/blob/master/docs/advanced/cache.md



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39574) Better error message when `ps.Index` is used for DataFrame/Series creation

2022-06-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39574:


Assignee: Apache Spark

> Better error message when `ps.Index` is used for DataFrame/Series creation
> --
>
> Key: SPARK-39574
> URL: https://issues.apache.org/jira/browse/SPARK-39574
> Project: Spark
>  Issue Type: Improvement
>  Components: Pandas API on Spark, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>
> Better error message when `ps.Index` is used for DataFrame/Series creation.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39574) Better error message when `ps.Index` is used for DataFrame/Series creation

2022-06-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558583#comment-17558583
 ] 

Apache Spark commented on SPARK-39574:
--

User 'xinrong-databricks' has created a pull request for this issue:
https://github.com/apache/spark/pull/36981

> Better error message when `ps.Index` is used for DataFrame/Series creation
> --
>
> Key: SPARK-39574
> URL: https://issues.apache.org/jira/browse/SPARK-39574
> Project: Spark
>  Issue Type: Improvement
>  Components: Pandas API on Spark, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Better error message when `ps.Index` is used for DataFrame/Series creation.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39522) Uses Docker image cache over a custom image

2022-06-24 Thread Yikun Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558570#comment-17558570
 ] 

Yikun Jiang commented on SPARK-39522:
-

Some investigation:

- The build jobs run in each user's downstream repo, so we have to use a 
"registry cache" as a bridge.

- The complete flow would be:
 # (apache repo) Build the image cache in the apache repo; this image is 
refreshed whenever merged changes touch the Dockerfile.
 # (user repo) Build the latest infra image in each PR, based on the image 
cache and the PR's Dockerfile changes, and upload it to the user's ghcr.io.
 # (user repo) Use the latest infra image from step 2 to run the pyspark, 
sparkr, and lint jobs.

 

> Uses Docker image cache over a custom image
> ---
>
> Key: SPARK-39522
> URL: https://issues.apache.org/jira/browse/SPARK-39522
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> We should probably replace the base image 
> (https://github.com/apache/spark/blob/master/.github/workflows/build_and_test.yml#L302,
>  https://github.com/dongjoon-hyun/ApacheSparkGitHubActionImage) with a plain 
> ubuntu image w/ a Docker image cache. See also 
> https://github.com/docker/build-push-action/blob/master/docs/advanced/cache.md



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-39576) Support GitHub Actions generate benchmark results using Scala 2.13

2022-06-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-39576.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36975
[https://github.com/apache/spark/pull/36975]

> Support GitHub Actions generate benchmark results using Scala 2.13
> --
>
> Key: SPARK-39576
> URL: https://issues.apache.org/jira/browse/SPARK-39576
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.4.0
>
>
>  GitHub Actions only supports generating benchmark results using Scala 2.12.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39576) Support GitHub Actions generate benchmark results using Scala 2.13

2022-06-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-39576:
-

Assignee: Yang Jie

> Support GitHub Actions generate benchmark results using Scala 2.13
> --
>
> Key: SPARK-39576
> URL: https://issues.apache.org/jira/browse/SPARK-39576
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>
>  GitHub Actions only supports generating benchmark results using Scala 2.12.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39580) Write event logs in a continuous manner

2022-06-24 Thread dimtiris kanoute (Jira)
dimtiris kanoute created SPARK-39580:


 Summary: Write event logs in a continuous manner
 Key: SPARK-39580
 URL: https://issues.apache.org/jira/browse/SPARK-39580
 Project: Spark
  Issue Type: New Feature
  Components: Spark Core
Affects Versions: 3.3.0
Reporter: dimtiris kanoute


We are using Spark Thrift Server as a service to run Spark SQL queries.

Currently, Spark writes the *event logs* only once a job is done or 
gracefully killed.

This means that for Spark Thrift Server, which runs as one long-lived job, we 
cannot access the logs after an unexpected error (i.e. OOM): because of the 
unexpected shutdown, they are never written.

We would like to suggest, as an improvement, the functionality for Spark *to 
write the event logs to a cloud storage in a continuous manner* (see the 
sketch of the existing knobs below).
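
For reference, a sketch of the closest existing knobs; these are real Spark 
configs, but rolling event logs only split the log into multiple files as the 
application runs, which is still not fully continuous. The bucket path is a 
made-up example:

{code:python}
from pyspark.sql import SparkSession

spark = (SparkSession.builder
    .config("spark.eventLog.enabled", "true")
    .config("spark.eventLog.dir", "s3a://my-bucket/spark-events")  # hypothetical bucket
    .config("spark.eventLog.rolling.enabled", "true")   # roll into multiple files
    .config("spark.eventLog.rolling.maxFileSize", "128m")
    .getOrCreate())
{code}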

 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39579) Make ListFunctions API compatible

2022-06-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558488#comment-17558488
 ] 

Apache Spark commented on SPARK-39579:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/36977

> Make ListFunctions API compatible 
> --
>
> Key: SPARK-39579
> URL: https://issues.apache.org/jira/browse/SPARK-39579
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: zhengruifeng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39579) Make ListFunctions API compatible

2022-06-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558487#comment-17558487
 ] 

Apache Spark commented on SPARK-39579:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/36977

> Make ListFunctions API compatible 
> --
>
> Key: SPARK-39579
> URL: https://issues.apache.org/jira/browse/SPARK-39579
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: zhengruifeng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39579) Make ListFunctions API compatible

2022-06-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39579:


Assignee: (was: Apache Spark)

> Make ListFunctions API compatible 
> --
>
> Key: SPARK-39579
> URL: https://issues.apache.org/jira/browse/SPARK-39579
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: zhengruifeng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39579) Make ListFunctions API compatible

2022-06-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39579:


Assignee: Apache Spark

> Make ListFunctions API compatible 
> --
>
> Key: SPARK-39579
> URL: https://issues.apache.org/jira/browse/SPARK-39579
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: zhengruifeng
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39579) Make ListFunctions API compatible

2022-06-24 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-39579:


 Summary: Make ListFunctions API compatible 
 Key: SPARK-39579
 URL: https://issues.apache.org/jira/browse/SPARK-39579
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: zhengruifeng






--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39578) The driver cannot parse the SQL statement and the job is not executed

2022-06-24 Thread miaowang (Jira)
miaowang created SPARK-39578:


 Summary: The driver cannot parse the SQL statement and the job is 
not executed
 Key: SPARK-39578
 URL: https://issues.apache.org/jira/browse/SPARK-39578
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.8
Reporter: miaowang


We use Spark's API to submit SQL execution tasks. Occasionally, the SQL is 
not executed and the task does not exit. Please give us some advice on how to 
solve this problem.

Operation execution mode:

!image-2022-06-24-18-41-50-864.png!

The SQL script is read and the jobs are submitted in sequence (a sketch of 
this pattern follows below).
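
A minimal sketch of that pattern, with a hypothetical script path; the 
"Executing>"/"Finished executing>" markers match the log below:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Read a SQL script and submit each statement in sequence on one session.
with open("job.sql") as f:  # hypothetical script path
    statements = [s.strip() for s in f.read().split(";") if s.strip()]

for stmt in statements:
    print("Executing> " + stmt)
    spark.sql(stmt)
    print("Finished executing> " + stmt)
{code}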

Abnormal operation information:

 
{code:java}
22/06/23 20:55:36 WARN HiveConf: HiveConf of name 
hive.strict.checks.type.safety does not exist
22/06/23 20:55:36 WARN HiveConf: HiveConf of name 
hive.strict.checks.cartesian.product does not exist
22/06/23 20:55:37 INFO metastore: Trying to connect to metastore with URI 
thrift://bdp-datalake-hive-metastore-01-10-8-49-114:9083
22/06/23 20:55:37 INFO metastore: Connected to metastore.
22/06/23 20:55:38 INFO SessionState: Created local directory: 
/hdfsdata/subdata10/yarn/nmcgroup/usercache/hive/appcache/application_1643023142753_3753253/container_e16_1643023142753_3753253_01_01/tmp/nobody
22/06/23 20:55:38 INFO SessionState: Created local directory: 
/hdfsdata/subdata10/yarn/nmcgroup/usercache/hive/appcache/application_1643023142753_3753253/container_e16_1643023142753_3753253_01_01/tmp/944b1102-4564-4879-8305-4df390b26669_resources
22/06/23 20:55:38 INFO SessionState: Created HDFS directory: 
/tmp/hive/hive/944b1102-4564-4879-8305-4df390b26669
22/06/23 20:55:38 INFO SessionState: Created local directory: 
/hdfsdata/subdata10/yarn/nmcgroup/usercache/hive/appcache/application_1643023142753_3753253/container_e16_1643023142753_3753253_01_01/tmp/nobody/944b1102-4564-4879-8305-4df390b26669
22/06/23 20:55:38 INFO SessionState: Created HDFS directory: 
/tmp/hive/hive/944b1102-4564-4879-8305-4df390b26669/_tmp_space.db
22/06/23 20:55:38 INFO HiveClientImpl: Warehouse location for Hive client 
(version 1.2.2) is /user/hive/warehouse
22/06/23 20:55:47 INFO ContextCleaner: Cleaned accumulator 7
22/06/23 20:55:47 INFO ContextCleaner: Cleaned accumulator 12
22/06/23 20:55:47 INFO ContextCleaner: Cleaned accumulator 14
22/06/23 20:55:47 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 
bdp-hdfs-36-10-8-33-65:39014 in memory (size: 31.1 KB, free: 3.0 GB)
22/06/23 20:55:48 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 
bdp-hdfs-16-10-8-33-45:33394 in memory (size: 31.1 KB, free: 8.4 GB)
22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 10
22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 2
22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 18
22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 24
22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 19
22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 6
22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 5
22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 17
22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 25
22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 13
22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 29
22/06/23 20:55:48 INFO BlockManagerInfo: Removed broadcast_1_piece0 on 
bdp-hdfs-36-10-8-33-65:39014 in memory (size: 4.0 KB, free: 3.0 GB)
22/06/23 20:55:48 INFO BlockManagerInfo: Removed broadcast_1_piece0 on 
bdp-hdfs-16-10-8-33-45:33394 in memory (size: 4.0 KB, free: 8.4 GB)
22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 28
22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 4
22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 16
22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 21
22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 20
22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 9
22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 23
22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 26
22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 0
22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 3
22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 11
22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 15
22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 27
22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 8
22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 1
22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 22
22/06/23 20:55:49 INFO InMemoryFileIndex: It took 66 ms to list leaf files for 
1 paths.
22/06/23 20:55:49 INFO InMemoryFileIndex: It took 8 ms to list leaf files for 1 
paths.

Executing> drop table if exists 
dc.cowell_omni_channel_jingli_day03_operator_info_org_20220622
Finished executing> drop table if exists 
dc.cowell_omni_channel_jingli_day03_operator_info_org_20220622
Executing>  

[jira] [Assigned] (SPARK-39506) CacheTable, isCached, UncacheTable, setCurrentCatalog, currentCatalog, listCatalogs

2022-06-24 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-39506:
---

Assignee: Rui Wang

> CacheTable, isCached, UncacheTable, setCurrentCatalog, currentCatalog, 
> listCatalogs
> ---
>
> Key: SPARK-39506
> URL: https://issues.apache.org/jira/browse/SPARK-39506
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-39506) CacheTable, isCached, UncacheTable, setCurrentCatalog, currentCatalog, listCatalogs

2022-06-24 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-39506.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36904
[https://github.com/apache/spark/pull/36904]

> CacheTable, isCached, UncacheTable, setCurrentCatalog, currentCatalog, 
> listCatalogs
> ---
>
> Key: SPARK-39506
> URL: https://issues.apache.org/jira/browse/SPARK-39506
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39453) DS V2 supports push down misc non-aggregate functions(non ANSI)

2022-06-24 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-39453:
---

Assignee: jiaan.geng

> DS V2 supports push down misc non-aggregate functions(non ANSI)
> ---
>
> Key: SPARK-39453
> URL: https://issues.apache.org/jira/browse/SPARK-39453
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-39453) DS V2 supports push down misc non-aggregate functions(non ANSI)

2022-06-24 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-39453.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36830
[https://github.com/apache/spark/pull/36830]

> DS V2 supports push down misc non-aggregate functions(non ANSI)
> ---
>
> Key: SPARK-39453
> URL: https://issues.apache.org/jira/browse/SPARK-39453
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39577) Add SQL reference for built-in functions

2022-06-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558424#comment-17558424
 ] 

Apache Spark commented on SPARK-39577:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/36976

> Add SQL reference for built-in functions
> 
>
> Key: SPARK-39577
> URL: https://issues.apache.org/jira/browse/SPARK-39577
> Project: Spark
>  Issue Type: Documentation
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Priority: Major
>
> Currently, the Spark SQL reference documentation is missing many functions.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39577) Add SQL reference for built-in functions

2022-06-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39577:


Assignee: Apache Spark

> Add SQL reference for built-in functions
> 
>
> Key: SPARK-39577
> URL: https://issues.apache.org/jira/browse/SPARK-39577
> Project: Spark
>  Issue Type: Documentation
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Assignee: Apache Spark
>Priority: Major
>
> Currently, the Spark SQL reference documentation is missing many functions.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39577) Add SQL reference for built-in functions

2022-06-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39577:


Assignee: (was: Apache Spark)

> Add SQL reference for built-in functions
> 
>
> Key: SPARK-39577
> URL: https://issues.apache.org/jira/browse/SPARK-39577
> Project: Spark
>  Issue Type: Documentation
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Priority: Major
>
> Currently, the Spark SQL reference documentation is missing many functions.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39577) Add SQL reference for built-in functions

2022-06-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558423#comment-17558423
 ] 

Apache Spark commented on SPARK-39577:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/36976

> Add SQL reference for built-in functions
> 
>
> Key: SPARK-39577
> URL: https://issues.apache.org/jira/browse/SPARK-39577
> Project: Spark
>  Issue Type: Documentation
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Priority: Major
>
> Currently, the Spark SQL reference documentation is missing many functions.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38978) DS V2 supports push down OFFSET operator

2022-06-24 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-38978.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36295
[https://github.com/apache/spark/pull/36295]

> DS V2 supports push down OFFSET operator
> 
>
> Key: SPARK-38978
> URL: https://issues.apache.org/jira/browse/SPARK-38978
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.4.0
>
>
> Currently, DS V2 push-down supports LIMIT but not OFFSET.
> If we can push OFFSET down to the JDBC data source, it will give better 
> performance.
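
A sketch of the kind of query that benefits, assuming Spark 3.4's SQL OFFSET 
clause and a hypothetical JDBC-backed catalog named `h2`:

{code:python}
# With OFFSET pushdown, the LIMIT/OFFSET below can be evaluated inside the
# JDBC source instead of fetching the skipped rows into Spark first.
spark.sql("""
  SELECT name, salary FROM h2.test.employee
  ORDER BY salary LIMIT 3 OFFSET 2
""").show()
{code}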



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38978) DS V2 supports push down OFFSET operator

2022-06-24 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-38978:
---

Assignee: jiaan.geng

> DS V2 supports push down OFFSET operator
> 
>
> Key: SPARK-38978
> URL: https://issues.apache.org/jira/browse/SPARK-38978
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
>
> Currently, DS V2 push-down supports LIMIT but not OFFSET.
> If we can push OFFSET down to the JDBC data source, it will give better 
> performance.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39577) Add SQL reference for built-in functions

2022-06-24 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-39577:
---
Summary: Add SQL reference for built-in functions  (was: Add document into 
SQL reference for built-in functions)

> Add SQL reference for built-in functions
> 
>
> Key: SPARK-39577
> URL: https://issues.apache.org/jira/browse/SPARK-39577
> Project: Spark
>  Issue Type: Documentation
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Priority: Major
>
> Currently, the Spark SQL reference documentation is missing many functions.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39577) Add document into SQL reference for built-in functions

2022-06-24 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-39577:
--

 Summary: Add document into SQL reference for built-in functions
 Key: SPARK-39577
 URL: https://issues.apache.org/jira/browse/SPARK-39577
 Project: Spark
  Issue Type: Documentation
  Components: SQL
Affects Versions: 3.4.0
Reporter: jiaan.geng


Currently, the Spark SQL reference documentation is missing many functions.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39576) Support GitHub Actions generate benchmark results using Scala 2.13

2022-06-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558377#comment-17558377
 ] 

Apache Spark commented on SPARK-39576:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/36975

> Support GitHub Actions generate benchmark results using Scala 2.13
> --
>
> Key: SPARK-39576
> URL: https://issues.apache.org/jira/browse/SPARK-39576
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
>  GitHub Actions only supports generating benchmark results using Scala 2.12.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39576) Support GitHub Actions generate benchmark results using Scala 2.13

2022-06-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39576:


Assignee: (was: Apache Spark)

> Support GitHub Actions generate benchmark results using Scala 2.13
> --
>
> Key: SPARK-39576
> URL: https://issues.apache.org/jira/browse/SPARK-39576
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
>  GitHub Actions only supports generating benchmark results using Scala 2.12.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39576) Support GitHub Actions generate benchmark results using Scala 2.13

2022-06-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39576:


Assignee: Apache Spark

> Support GitHub Actions generate benchmark results using Scala 2.13
> --
>
> Key: SPARK-39576
> URL: https://issues.apache.org/jira/browse/SPARK-39576
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>
>  GitHub Actions only supports generating benchmark results using Scala 2.12.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39447) Only non-broadcast query stage can propagate empty relation

2022-06-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558376#comment-17558376
 ] 

Apache Spark commented on SPARK-39447:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/36974

> Only non-broadcast query stage can propagate empty relation
> ---
>
> Key: SPARK-39447
> URL: https://issues.apache.org/jira/browse/SPARK-39447
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39576) Support GitHub Actions generate benchmark results using Scala 2.13

2022-06-24 Thread Yang Jie (Jira)
Yang Jie created SPARK-39576:


 Summary: Support GitHub Actions generate benchmark results using 
Scala 2.13
 Key: SPARK-39576
 URL: https://issues.apache.org/jira/browse/SPARK-39576
 Project: Spark
  Issue Type: Improvement
  Components: Tests
Affects Versions: 3.4.0
Reporter: Yang Jie


GitHub Actions only supports generating benchmark results using Scala 2.12.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org