[jira] [Commented] (SPARK-39235) Make Catalog API be compatible with 3-layer-namespace
[ https://issues.apache.org/jira/browse/SPARK-39235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17558713#comment-17558713 ]

Apache Spark commented on SPARK-39235:
--------------------------------------

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/36986

> Make Catalog API be compatible with 3-layer-namespace
> ------------------------------------------------------
>
>                 Key: SPARK-39235
>                 URL: https://issues.apache.org/jira/browse/SPARK-39235
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Rui Wang
>            Priority: Major
>
> We can make Catalog API support 3 layer namespace:
> catalog_name.database_name.table_name
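A minimal PySpark sketch of the qualified-name usage this umbrella issue targets; the name `spark_catalog.default.tbl` is an assumed example, not taken from the ticket, and this is not Spark's implementation:

{code:python}
# Minimal sketch: with 3-layer-namespace support, Catalog API methods accept
# "catalog.database.table" qualified names. Names below are assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.sql("CREATE TABLE IF NOT EXISTS spark_catalog.default.tbl (id INT) USING parquet")

# The same table becomes addressable through the Catalog API by its full name:
print(spark.catalog.tableExists("spark_catalog.default.tbl"))
{code}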
[jira] [Created] (SPARK-39598) Make *cache*, *catalog* in the python side support 3-layer-namespace
zhengruifeng created SPARK-39598:
------------------------------------

             Summary: Make *cache*, *catalog* in the python side support 3-layer-namespace
                 Key: SPARK-39598
                 URL: https://issues.apache.org/jira/browse/SPARK-39598
             Project: Spark
          Issue Type: Sub-task
          Components: PySpark
    Affects Versions: 3.4.0
            Reporter: zhengruifeng
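As a rough illustration of what this sub-task covers, the cache-related Catalog calls would accept the same qualified form. A sketch assuming the table from the previous example exists; not the ticket's implementation:

{code:python}
# Sketch only: cache-related Catalog calls taking a 3-layer name.
# "spark_catalog.default.tbl" is an assumed existing table.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.catalog.cacheTable("spark_catalog.default.tbl")
print(spark.catalog.isCached("spark_catalog.default.tbl"))
spark.catalog.uncacheTable("spark_catalog.default.tbl")
{code}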
[jira] [Comment Edited] (SPARK-39596) Run `Linters, licenses, dependencies and documentation generation` GitHub Actions failed
[ https://issues.apache.org/jira/browse/SPARK-39596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17558710#comment-17558710 ]

Yang Jie edited comment on SPARK-39596 at 6/25/22 2:33 AM:
-----------------------------------------------------------

also ping [~dongjoon]


was (Author: luciferyang):
also ping [~dongjoon]
 **

> Run `Linters, licenses, dependencies and documentation generation` GitHub
> Actions failed
> --------------------------------------------------------------------------
>
>                 Key: SPARK-39596
>                 URL: https://issues.apache.org/jira/browse/SPARK-39596
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 3.4.0
>            Reporter: Yang Jie
>            Priority: Major
>
> Run / Linters, licenses, dependencies and documentation generation
>
> {code:java}
> x Failed to parse Rd in histogram.Rd
> ℹ there is no package called ‘ggplot2’
> Caused by error in `loadNamespace()`:
> ! there is no package called ‘ggplot2’
> Execution halted
>
> Jekyll 4.2.1   Please append `--trace` to the `build` command
>                for any additional information or backtrace.
>
> /__w/spark/spark/docs/_plugins/copy_api_dirs.rb:147:in `<top (required)>': R doc generation failed (RuntimeError)
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in `require'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in `block in require_with_graceful_fail'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in `each'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in `require_with_graceful_fail'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:89:in `block in require_plugin_files'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in `each'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in `require_plugin_files'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:21:in `conscientious_require'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:131:in `setup'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:36:in `initialize'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:30:in `new'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:30:in `process'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in `block in process_with_graceful_fail'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in `each'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in `process_with_graceful_fail'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:18:in `block (2 levels) in init_with_program'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in `block in execute'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in `each'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in `execute'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/program.rb:44:in `go'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary.rb:21:in `program'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/exe/jekyll:15:in `<top (required)>'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/bin/jekyll:23:in `load'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/bin/jekyll:23:in `<main>'
> Error: Process completed with exit code 1. {code}
> * [https://github.com/nchammas/spark/runs/7047789435?check_suite_focus=true]
> * [https://github.com/LuciferYang/spark/runs/7050249606?check_suite_focus=true]
> * [https://github.com/apache/spark/runs/7047294196?check_suite_focus=true]
> * https://github.com/amaliujia/spark/runs/7048443511?check_suite_focus=true
[jira] [Commented] (SPARK-39596) Run `Linters, licenses, dependencies and documentation generation` GitHub Actions failed
[ https://issues.apache.org/jira/browse/SPARK-39596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17558710#comment-17558710 ]

Yang Jie commented on SPARK-39596:
----------------------------------

also ping [~dongjoon]
 **

> Run `Linters, licenses, dependencies and documentation generation` GitHub
> Actions failed
> --------------------------------------------------------------------------
>
>                 Key: SPARK-39596
>                 URL: https://issues.apache.org/jira/browse/SPARK-39596
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 3.4.0
>            Reporter: Yang Jie
>            Priority: Major
>
> Run / Linters, licenses, dependencies and documentation generation
>
> {code:java}
> x Failed to parse Rd in histogram.Rd
> ℹ there is no package called ‘ggplot2’
> Caused by error in `loadNamespace()`:
> ! there is no package called ‘ggplot2’
> Execution halted
>
> Jekyll 4.2.1   Please append `--trace` to the `build` command
>                for any additional information or backtrace.
>
> /__w/spark/spark/docs/_plugins/copy_api_dirs.rb:147:in `<top (required)>': R doc generation failed (RuntimeError)
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in `require'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in `block in require_with_graceful_fail'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in `each'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in `require_with_graceful_fail'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:89:in `block in require_plugin_files'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in `each'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in `require_plugin_files'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:21:in `conscientious_require'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:131:in `setup'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:36:in `initialize'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:30:in `new'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:30:in `process'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in `block in process_with_graceful_fail'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in `each'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in `process_with_graceful_fail'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:18:in `block (2 levels) in init_with_program'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in `block in execute'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in `each'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in `execute'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/program.rb:44:in `go'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary.rb:21:in `program'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/exe/jekyll:15:in `<top (required)>'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/bin/jekyll:23:in `load'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/bin/jekyll:23:in `<main>'
> Error: Process completed with exit code 1. {code}
> * [https://github.com/nchammas/spark/runs/7047789435?check_suite_focus=true]
> * [https://github.com/LuciferYang/spark/runs/7050249606?check_suite_focus=true]
> * [https://github.com/apache/spark/runs/7047294196?check_suite_focus=true]
> * https://github.com/amaliujia/spark/runs/7048443511?check_suite_focus=true
[jira] [Assigned] (SPARK-39597) Make GetTable, TableExists and DatabaseExists in the python side support 3-layer-namespace
[ https://issues.apache.org/jira/browse/SPARK-39597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-39597:
------------------------------------

    Assignee: Apache Spark

> Make GetTable, TableExists and DatabaseExists in the python side support
> 3-layer-namespace
> -------------------------------------------------------------------------
>
>                 Key: SPARK-39597
>                 URL: https://issues.apache.org/jira/browse/SPARK-39597
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 3.4.0
>            Reporter: zhengruifeng
>            Assignee: Apache Spark
>            Priority: Major
[jira] [Assigned] (SPARK-39597) Make GetTable, TableExists and DatabaseExists in the python side support 3-layer-namespace
[ https://issues.apache.org/jira/browse/SPARK-39597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-39597:
------------------------------------

    Assignee: (was: Apache Spark)

> Make GetTable, TableExists and DatabaseExists in the python side support
> 3-layer-namespace
> -------------------------------------------------------------------------
>
>                 Key: SPARK-39597
>                 URL: https://issues.apache.org/jira/browse/SPARK-39597
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 3.4.0
>            Reporter: zhengruifeng
>            Priority: Major
[jira] [Commented] (SPARK-39597) Make GetTable, TableExists and DatabaseExists in the python side support 3-layer-namespace
[ https://issues.apache.org/jira/browse/SPARK-39597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17558707#comment-17558707 ]

Apache Spark commented on SPARK-39597:
--------------------------------------

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/36985

> Make GetTable, TableExists and DatabaseExists in the python side support
> 3-layer-namespace
> -------------------------------------------------------------------------
>
>                 Key: SPARK-39597
>                 URL: https://issues.apache.org/jira/browse/SPARK-39597
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 3.4.0
>            Reporter: zhengruifeng
>            Priority: Major
[jira] [Commented] (SPARK-39596) Run `Linters, licenses, dependencies and documentation generation` GitHub Actions failed
[ https://issues.apache.org/jira/browse/SPARK-39596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17558706#comment-17558706 ]

Yang Jie commented on SPARK-39596:
----------------------------------

cc [HyukjinKwon|https://github.com/HyukjinKwon]

> Run `Linters, licenses, dependencies and documentation generation` GitHub
> Actions failed
> --------------------------------------------------------------------------
>
>                 Key: SPARK-39596
>                 URL: https://issues.apache.org/jira/browse/SPARK-39596
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 3.4.0
>            Reporter: Yang Jie
>            Priority: Major
>
> Run / Linters, licenses, dependencies and documentation generation
>
> {code:java}
> x Failed to parse Rd in histogram.Rd
> ℹ there is no package called ‘ggplot2’
> Caused by error in `loadNamespace()`:
> ! there is no package called ‘ggplot2’
> Execution halted
>
> Jekyll 4.2.1   Please append `--trace` to the `build` command
>                for any additional information or backtrace.
>
> /__w/spark/spark/docs/_plugins/copy_api_dirs.rb:147:in `<top (required)>': R doc generation failed (RuntimeError)
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in `require'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in `block in require_with_graceful_fail'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in `each'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in `require_with_graceful_fail'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:89:in `block in require_plugin_files'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in `each'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in `require_plugin_files'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:21:in `conscientious_require'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:131:in `setup'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:36:in `initialize'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:30:in `new'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:30:in `process'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in `block in process_with_graceful_fail'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in `each'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in `process_with_graceful_fail'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:18:in `block (2 levels) in init_with_program'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in `block in execute'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in `each'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in `execute'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/program.rb:44:in `go'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary.rb:21:in `program'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/exe/jekyll:15:in `<top (required)>'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/bin/jekyll:23:in `load'
>     from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/bin/jekyll:23:in `<main>'
> Error: Process completed with exit code 1. {code}
> * [https://github.com/nchammas/spark/runs/7047789435?check_suite_focus=true]
> * [https://github.com/LuciferYang/spark/runs/7050249606?check_suite_focus=true]
> * [https://github.com/apache/spark/runs/7047294196?check_suite_focus=true]
> * https://github.com/amaliujia/spark/runs/7048443511?check_suite_focus=true
[jira] [Created] (SPARK-39597) Make GetTable, TableExists and DatabaseExists in the python side support 3-layer-namespace
zhengruifeng created SPARK-39597:
------------------------------------

             Summary: Make GetTable, TableExists and DatabaseExists in the python side support 3-layer-namespace
                 Key: SPARK-39597
                 URL: https://issues.apache.org/jira/browse/SPARK-39597
             Project: Spark
          Issue Type: Sub-task
          Components: PySpark
    Affects Versions: 3.4.0
            Reporter: zhengruifeng
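An illustrative sketch of the three calls this sub-task extends; the methods already exist in `pyspark.sql.catalog`, the change is accepting catalog-qualified names (the names below are assumed examples):

{code:python}
# Sketch only: the three Catalog calls, each taking a qualified name once
# 3-layer-namespace support lands. "spark_catalog" names are assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
print(spark.catalog.databaseExists("spark_catalog.default"))
print(spark.catalog.tableExists("spark_catalog.default.tbl"))
print(spark.catalog.getTable("spark_catalog.default.tbl"))
{code}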
[jira] [Created] (SPARK-39596) Run `Linters, licenses, dependencies and documentation generation` GitHub Actions failed
Yang Jie created SPARK-39596:
---------------------------------

             Summary: Run `Linters, licenses, dependencies and documentation generation` GitHub Actions failed
                 Key: SPARK-39596
                 URL: https://issues.apache.org/jira/browse/SPARK-39596
             Project: Spark
          Issue Type: Bug
          Components: Build
    Affects Versions: 3.4.0
            Reporter: Yang Jie


Run / Linters, licenses, dependencies and documentation generation

{code:java}
x Failed to parse Rd in histogram.Rd
ℹ there is no package called ‘ggplot2’
Caused by error in `loadNamespace()`:
! there is no package called ‘ggplot2’
Execution halted

Jekyll 4.2.1   Please append `--trace` to the `build` command
               for any additional information or backtrace.

/__w/spark/spark/docs/_plugins/copy_api_dirs.rb:147:in `<top (required)>': R doc generation failed (RuntimeError)
    from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in `require'
    from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in `block in require_with_graceful_fail'
    from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in `each'
    from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in `require_with_graceful_fail'
    from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:89:in `block in require_plugin_files'
    from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in `each'
    from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in `require_plugin_files'
    from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:21:in `conscientious_require'
    from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:131:in `setup'
    from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:36:in `initialize'
    from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:30:in `new'
    from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:30:in `process'
    from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in `block in process_with_graceful_fail'
    from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in `each'
    from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in `process_with_graceful_fail'
    from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:18:in `block (2 levels) in init_with_program'
    from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in `block in execute'
    from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in `each'
    from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in `execute'
    from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/program.rb:44:in `go'
    from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary.rb:21:in `program'
    from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/exe/jekyll:15:in `<top (required)>'
    from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/bin/jekyll:23:in `load'
    from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/bin/jekyll:23:in `<main>'
Error: Process completed with exit code 1. {code}
* [https://github.com/nchammas/spark/runs/7047789435?check_suite_focus=true]
* [https://github.com/LuciferYang/spark/runs/7050249606?check_suite_focus=true]
* [https://github.com/apache/spark/runs/7047294196?check_suite_focus=true]
* https://github.com/amaliujia/spark/runs/7048443511?check_suite_focus=true
[jira] [Assigned] (SPARK-39595) Upgrade rocksdbjni to 7.3.1
[ https://issues.apache.org/jira/browse/SPARK-39595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-39595:
------------------------------------

    Assignee: Apache Spark

> Upgrade rocksdbjni to 7.3.1
> ---------------------------
>
>                 Key: SPARK-39595
>                 URL: https://issues.apache.org/jira/browse/SPARK-39595
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 3.4.0
>            Reporter: Yang Jie
>            Assignee: Apache Spark
>            Priority: Minor
>
> https://github.com/facebook/rocksdb/releases/tag/v7.3.1
[jira] [Assigned] (SPARK-39595) Upgrade rocksdbjni to 7.3.1
[ https://issues.apache.org/jira/browse/SPARK-39595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-39595:
------------------------------------

    Assignee: (was: Apache Spark)

> Upgrade rocksdbjni to 7.3.1
> ---------------------------
>
>                 Key: SPARK-39595
>                 URL: https://issues.apache.org/jira/browse/SPARK-39595
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 3.4.0
>            Reporter: Yang Jie
>            Priority: Minor
>
> https://github.com/facebook/rocksdb/releases/tag/v7.3.1
[jira] [Created] (SPARK-39595) Upgrade rocksdbjni to 7.3.1
Yang Jie created SPARK-39595:
---------------------------------

             Summary: Upgrade rocksdbjni to 7.3.1
                 Key: SPARK-39595
                 URL: https://issues.apache.org/jira/browse/SPARK-39595
             Project: Spark
          Issue Type: Improvement
          Components: Build
    Affects Versions: 3.4.0
            Reporter: Yang Jie


https://github.com/facebook/rocksdb/releases/tag/v7.3.1
[jira] [Commented] (SPARK-39594) Improve logs to show addresses in addition to port
[ https://issues.apache.org/jira/browse/SPARK-39594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17558689#comment-17558689 ]

Apache Spark commented on SPARK-39594:
--------------------------------------

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/36984

> Improve logs to show addresses in addition to port
> --------------------------------------------------
>
>                 Key: SPARK-39594
>                 URL: https://issues.apache.org/jira/browse/SPARK-39594
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 3.4.0
>            Reporter: Dongjoon Hyun
>            Priority: Major
[jira] [Assigned] (SPARK-39594) Improve logs to show addresses in addition to port
[ https://issues.apache.org/jira/browse/SPARK-39594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-39594:
------------------------------------

    Assignee: (was: Apache Spark)

> Improve logs to show addresses in addition to port
> --------------------------------------------------
>
>                 Key: SPARK-39594
>                 URL: https://issues.apache.org/jira/browse/SPARK-39594
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 3.4.0
>            Reporter: Dongjoon Hyun
>            Priority: Major
[jira] [Assigned] (SPARK-39594) Improve logs to show addresses in addition to port
[ https://issues.apache.org/jira/browse/SPARK-39594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-39594:
------------------------------------

    Assignee: Apache Spark

> Improve logs to show addresses in addition to port
> --------------------------------------------------
>
>                 Key: SPARK-39594
>                 URL: https://issues.apache.org/jira/browse/SPARK-39594
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 3.4.0
>            Reporter: Dongjoon Hyun
>            Assignee: Apache Spark
>            Priority: Major
[jira] [Created] (SPARK-39594) Improve logs to show addresses in addition to port
Dongjoon Hyun created SPARK-39594:
-------------------------------------

             Summary: Improve logs to show addresses in addition to port
                 Key: SPARK-39594
                 URL: https://issues.apache.org/jira/browse/SPARK-39594
             Project: Spark
          Issue Type: Sub-task
          Components: Spark Core
    Affects Versions: 3.4.0
            Reporter: Dongjoon Hyun
[jira] [Assigned] (SPARK-39583) Make RefreshTable be compatible with 3 layer namespace
[ https://issues.apache.org/jira/browse/SPARK-39583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-39583:
------------------------------------

    Assignee: (was: Apache Spark)

> Make RefreshTable be compatible with 3 layer namespace
> ------------------------------------------------------
>
>                 Key: SPARK-39583
>                 URL: https://issues.apache.org/jira/browse/SPARK-39583
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Rui Wang
>            Priority: Major
[jira] [Commented] (SPARK-39583) Make RefreshTable be compatible with 3 layer namespace
[ https://issues.apache.org/jira/browse/SPARK-39583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17558679#comment-17558679 ]

Apache Spark commented on SPARK-39583:
--------------------------------------

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/36983

> Make RefreshTable be compatible with 3 layer namespace
> ------------------------------------------------------
>
>                 Key: SPARK-39583
>                 URL: https://issues.apache.org/jira/browse/SPARK-39583
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Rui Wang
>            Priority: Major
[jira] [Assigned] (SPARK-39583) Make RefreshTable be compatible with 3 layer namespace
[ https://issues.apache.org/jira/browse/SPARK-39583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-39583:
------------------------------------

    Assignee: Apache Spark

> Make RefreshTable be compatible with 3 layer namespace
> ------------------------------------------------------
>
>                 Key: SPARK-39583
>                 URL: https://issues.apache.org/jira/browse/SPARK-39583
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Rui Wang
>            Assignee: Apache Spark
>            Priority: Major
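A sketch of the call this ticket extends; Python is shown for brevity even though the ticket targets the SQL/Scala side, and the table name is an assumed example:

{code:python}
# Sketch only: refreshTable accepting a catalog-qualified 3-layer name.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.catalog.refreshTable("spark_catalog.default.tbl")  # assumed table name
{code}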
[jira] [Created] (SPARK-39593) Configurable State Checkpointing Frequency in Structured Streaming
Boyang Jerry Peng created SPARK-39593:
-----------------------------------------

             Summary: Configurable State Checkpointing Frequency in Structured Streaming
                 Key: SPARK-39593
                 URL: https://issues.apache.org/jira/browse/SPARK-39593
             Project: Spark
          Issue Type: Improvement
          Components: Structured Streaming
    Affects Versions: 3.3.0
            Reporter: Boyang Jerry Peng


Currently, for stateful pipelines, state changes are checkpointed on every micro-batch. State checkpoints can contribute significantly to the latency of a micro-batch. If state is checkpointed less frequently, its effect on batch latency can be amortized. This can be used in conjunction with asynchronous state checkpointing to further reduce the latency cost that state checkpointing may incur. A hypothetical configuration sketch follows below.
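To make the proposal concrete, here is a hypothetical sketch of how such a knob might be surfaced. The configuration key below is invented for illustration and does not exist in Spark:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Hypothetical key, invented for illustration: checkpoint state every 5
# micro-batches instead of on every micro-batch.
spark.conf.set("spark.sql.streaming.stateStore.checkpointInterval", "5")
{code}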
[jira] [Created] (SPARK-39592) Asynchronous State Checkpointing in Structured Streaming
Boyang Jerry Peng created SPARK-39592:
-----------------------------------------

             Summary: Asynchronous State Checkpointing in Structured Streaming
                 Key: SPARK-39592
                 URL: https://issues.apache.org/jira/browse/SPARK-39592
             Project: Spark
          Issue Type: Improvement
          Components: Structured Streaming
    Affects Versions: 3.3.0
            Reporter: Boyang Jerry Peng


We can reduce the latency of stateful pipelines in Structured Streaming by making state checkpoints asynchronous. Checkpointing the state changes of every micro-batch is one of the major contributors to latency for stateful pipelines in Structured Streaming. If we make the state checkpointing asynchronous, we can potentially lower the latency of the pipeline significantly, since state checkpointing will contribute less (or not at all) to the batch latency.
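Again purely as a shape sketch, a hypothetical flag for this behavior; the key is invented for illustration and does not exist in Spark:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Hypothetical key, invented for illustration: write state snapshots in the
# background instead of blocking micro-batch completion.
spark.conf.set("spark.sql.streaming.stateStore.asyncCheckpoint.enabled", "true")
{code}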
[jira] [Created] (SPARK-39591) Offset Management Improvements in Structured Streaming
Boyang Jerry Peng created SPARK-39591:
-----------------------------------------

             Summary: Offset Management Improvements in Structured Streaming
                 Key: SPARK-39591
                 URL: https://issues.apache.org/jira/browse/SPARK-39591
             Project: Spark
          Issue Type: Improvement
          Components: Structured Streaming
    Affects Versions: 3.3.0
            Reporter: Boyang Jerry Peng


Currently in Structured Streaming, at the beginning of every micro-batch, the offset to process up to for the current batch is persisted to durable storage. At the end of every micro-batch, a marker indicating the completion of the current micro-batch is persisted to durable storage. For pipelines such as ones that read from Kafka and write to Kafka, where end-to-end exactly-once is not supported and latency is sensitive, we can allow users to configure offset commits to be written asynchronously. This commit operation then no longer contributes to the batch duration, effectively lowering the overall latency of the pipeline.
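A sketch of the kind of pipeline the description names, Kafka in and Kafka out, at-least-once and latency-sensitive. The `asyncOffsetCommit` option is hypothetical, invented to show where such a setting might plug in; the broker address, topics, and checkpoint path are also assumptions:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

query = (spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")  # assumed address
         .option("subscribe", "events-in")
         .load()
         .writeStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("topic", "events-out")
         .option("checkpointLocation", "/tmp/ckpt")
         .option("asyncOffsetCommit", "true")               # hypothetical option
         .start())
{code}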
[jira] [Created] (SPARK-39590) Python API Parity in Structured Streaming
Boyang Jerry Peng created SPARK-39590:
-----------------------------------------

             Summary: Python API Parity in Structured Streaming
                 Key: SPARK-39590
                 URL: https://issues.apache.org/jira/browse/SPARK-39590
             Project: Spark
          Issue Type: Improvement
          Components: Structured Streaming
    Affects Versions: 3.3.0
            Reporter: Boyang Jerry Peng


New APIs in Structured Streaming tend to be added to Java/Scala first. This creates a situation where the Python API has fallen behind; for example, map/flatMapGroupsWithState is not supported in PySpark. We need the PySpark API to catch up with the Java/Scala APIs and, where necessary, provide tighter integrations with native Python data processing frameworks such as Pandas.
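A purely hypothetical sketch of the missing parity, loosely mirroring Scala's flatMapGroupsWithState; none of the names below exist in PySpark at the time of this ticket, and the state-handle API is invented for illustration:

{code:python}
# Hypothetical: what a Python counterpart to flatMapGroupsWithState might
# look like. Function name, state handle, and groupBy hook are all invented.
def running_count(key, rows, state):
    count = state.get("count") if state.exists else 0  # hypothetical handle
    count += sum(1 for _ in rows)
    state.update({"count": count})
    yield (key[0], count)

# df.groupBy("user").flatMapGroupsWithState(running_count, ...)  # hypothetical
{code}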
[jira] [Created] (SPARK-39589) Asynchronous I/O API support in Structured Streaming
Boyang Jerry Peng created SPARK-39589:
-----------------------------------------

             Summary: Asynchronous I/O API support in Structured Streaming
                 Key: SPARK-39589
                 URL: https://issues.apache.org/jira/browse/SPARK-39589
             Project: Spark
          Issue Type: New Feature
          Components: Structured Streaming
    Affects Versions: 3.3.0
            Reporter: Boyang Jerry Peng


API calls to external systems are common in streaming pipelines used to support ETL use cases. To better support such use cases and improve their performance, an asynchronous processing API should be added to Structured Streaming so that external requests can be handled more efficiently.
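For context, one way users approximate asynchronous external calls today, which a first-class async API would replace: fanning out lookups with a thread pool inside `mapInPandas`. The URL, schema, and use of the third-party `requests` package are assumptions for illustration:

{code:python}
from concurrent.futures import ThreadPoolExecutor

import pandas as pd
import requests  # third-party; assumed available on executors
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1,), (2,), (3,)], ["id"])

def enrich(batches):
    # Issue the external HTTP lookups concurrently per pandas batch.
    with ThreadPoolExecutor(max_workers=8) as pool:
        for pdf in batches:
            urls = [f"https://example.com/item/{i}" for i in pdf["id"]]
            status = list(pool.map(lambda u: requests.get(u).status_code, urls))
            yield pd.DataFrame({"id": pdf["id"], "status": status})

enriched = df.mapInPandas(enrich, "id long, status long")
{code}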
[jira] [Updated] (SPARK-39585) Multiple Stateful Operators in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-39585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boyang Jerry Peng updated SPARK-39585:
--------------------------------------
    Summary: Multiple Stateful Operators in Structured Streaming  (was: Multiple Stateful Operators)

> Multiple Stateful Operators in Structured Streaming
> ----------------------------------------------------
>
>                 Key: SPARK-39585
>                 URL: https://issues.apache.org/jira/browse/SPARK-39585
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.3.0
>            Reporter: Boyang Jerry Peng
>            Priority: Major
>
> Currently, Structured Streaming only supports one stateful operator per
> pipeline. This constraint prevents Structured Streaming from being used for
> many use cases, such as ones involving chained time-window aggregations,
> chained stream-stream outer equality joins, or a stream-stream time-interval
> join followed by a time-window aggregation. Enabling multiple stateful
> operators within a pipeline will open up many additional use cases for
> Structured Streaming and bring it on par with other stream processing
> engines.
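A sketch of a pipeline this work would unlock (Spark rejects it at the time of this ticket): a 5-minute windowed count re-aggregated into hourly totals, i.e. two stateful aggregations chained. The source and column names are assumptions:

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, count, sum as sum_, window

spark = SparkSession.builder.getOrCreate()
events = spark.readStream.format("rate").load()  # assumed toy source

per_5m = (events
          .withWatermark("timestamp", "10 minutes")
          .groupBy(window(col("timestamp"), "5 minutes"))
          .agg(count("*").alias("cnt")))

hourly = (per_5m
          .groupBy(window(col("window.end"), "1 hour"))  # second stateful operator
          .agg(sum_("cnt").alias("total")))
{code}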
[jira] [Updated] (SPARK-39586) Advanced Windowing in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-39586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boyang Jerry Peng updated SPARK-39586:
--------------------------------------
    Summary: Advanced Windowing in Structured Streaming  (was: Advanced Windowing)

> Advanced Windowing in Structured Streaming
> -------------------------------------------
>
>                 Key: SPARK-39586
>                 URL: https://issues.apache.org/jira/browse/SPARK-39586
>             Project: Spark
>          Issue Type: New Feature
>          Components: Structured Streaming
>    Affects Versions: 3.3.0
>            Reporter: Boyang Jerry Peng
>            Priority: Major
>
> Currently, Structured Streaming does not offer a user-friendly way to define
> custom windowing strategies. Users can only choose from an existing set of
> built-in window strategies such as tumbling, sliding, and session windows.
> To improve the flexibility of the engine and to support more advanced use
> cases, Structured Streaming needs to provide an intuitive API that allows
> users to define custom window boundaries, to trigger windows when they are
> complete, and, if necessary, to evict elements from windows.
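Nothing like the following exists in Spark; it is only a shape sketch of the three capabilities the description names (custom boundaries, completion triggers, eviction):

{code:python}
# Hypothetical interface sketch, invented for illustration only.
class CustomWindowStrategy:
    def assign(self, element, event_time):
        """Return the window(s) this element belongs to (custom boundaries)."""

    def is_complete(self, window, watermark):
        """Trigger: decide when a window is complete and can be emitted."""

    def evict(self, window, elements):
        """Optionally drop elements from a window before it is emitted."""
{code}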
[jira] [Updated] (SPARK-39588) Inspect the State of Stateful Streaming Pipelines
[ https://issues.apache.org/jira/browse/SPARK-39588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boyang Jerry Peng updated SPARK-39588:
--------------------------------------
    Issue Type: New Feature  (was: Improvement)

> Inspect the State of Stateful Streaming Pipelines
> --------------------------------------------------
>
>                 Key: SPARK-39588
>                 URL: https://issues.apache.org/jira/browse/SPARK-39588
>             Project: Spark
>          Issue Type: New Feature
>          Components: Structured Streaming
>    Affects Versions: 3.3.0
>            Reporter: Boyang Jerry Peng
>            Priority: Major
>
> Currently there is no mechanism to query or inspect the state of a stateful
> streaming pipeline externally. A tool or API that allows a user to do that
> would be extremely helpful for debugging.
[jira] [Created] (SPARK-39588) Inspect the State of Stateful Streaming Pipelines
Boyang Jerry Peng created SPARK-39588:
-----------------------------------------

             Summary: Inspect the State of Stateful Streaming Pipelines
                 Key: SPARK-39588
                 URL: https://issues.apache.org/jira/browse/SPARK-39588
             Project: Spark
          Issue Type: Improvement
          Components: Structured Streaming
    Affects Versions: 3.3.0
            Reporter: Boyang Jerry Peng


Currently there is no mechanism to query or inspect the state of a stateful streaming pipeline externally. A tool or API that allows a user to do that would be extremely helpful for debugging.
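A hypothetical sketch of what such a debugging tool could look like; the `statestore` format name and the checkpoint path are invented for illustration and do not exist in Spark at the time of this ticket:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
state = (spark.read.format("statestore")  # hypothetical data source
         .load("/tmp/ckpt/state/0"))      # assumed checkpoint state path
state.show()
{code}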
[jira] [Created] (SPARK-39587) Schema Evolution for Stateful Pipelines in Structured Streaming
Boyang Jerry Peng created SPARK-39587:
-----------------------------------------

             Summary: Schema Evolution for Stateful Pipelines in Structured Streaming
                 Key: SPARK-39587
                 URL: https://issues.apache.org/jira/browse/SPARK-39587
             Project: Spark
          Issue Type: Improvement
          Components: Structured Streaming
    Affects Versions: 3.3.0
            Reporter: Boyang Jerry Peng


Support for schema evolution of stateful operators is non-existent. There is no clear path to evolve the schema for existing built-in stateful operators. For user-defined stateful operators such as map/flatMapGroupsWithState, there is no path for evolving state either.
[jira] [Updated] (SPARK-39584) Fix TPCDSQueryBenchmark Measuring Performance of Wrong Query Results
[ https://issues.apache.org/jira/browse/SPARK-39584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kazuyuki Tanimura updated SPARK-39584:
--------------------------------------
    Description:
GenTPCDSData uses the schema defined in `TPCDSSchema` that contains varchar(N)/char(N). When GenTPCDSData generates parquet, that pads spaces for strings whose lengths are < N.

When TPCDSQueryBenchmark reads data from parquet generated by GenTPCDSData, it uses schema from the parquet file and keeps the paddings. Due to the extra spaces, string filter queries of TPC-DS fail to match. For example, q13 query results are all nulls and returns too fast because string filter does not meet any rows.

Therefore, TPCDSQueryBenchmark is benchmarking with wrong query results and that is inflating some performance results.

I am exploring two possible solutions now
1. Call `CREATE TABLE tableName schema USING parquet LOCATION path` before reading. This is what Spark TPC-DS unit tests are doing
2. Change varchar to string in the schema. This is what [databricks data generator|https://github.com/databricks/spark-sql-perf] is doing

TPCDSQueryBenchmark was ported from databricks/spark-sql-perf in https://issues.apache.org/jira/browse/SPARK-35192

History related varchar issue
[https://lists.apache.org/thread/rg7pgwyto3616hb15q78n0sykls9j7rn]

  was:
GenTPCDSData uses the schema defined in `TPCDSSchema` that contains varchar(N)/char(N). When GenTPCDSData generates parquet, that pads spaces for strings whose lengths are < N.

When TPCDSQueryBenchmark reads data from parquet generated by GenTPCDSData, it uses schema from the parquet file and keeps the paddings. Due to the extra spaces, string filter queries of TPC-DS fail to match. For example, q13 query results are all nulls and returns too fast because string filter does not meet any rows.

Therefore, TPCDSQueryBenchmark is benchmarking with wrong query results and that is inflating some performance results.

I am exploring two possible solutions now
1. Call `CREATE TABLE tableName schema USING parquet LOCATION path` before reading. This is what Spark unit tests are doing
2. Change varchar to string in the schema. This is what [databricks data generator|https://github.com/databricks/spark-sql-perf] is doing

TPCDSQueryBenchmark was ported from databricks/spark-sql-perf in https://issues.apache.org/jira/browse/SPARK-35192

History related varchar issue
[https://lists.apache.org/thread/rg7pgwyto3616hb15q78n0sykls9j7rn]


> Fix TPCDSQueryBenchmark Measuring Performance of Wrong Query Results
> ---------------------------------------------------------------------
>
>                 Key: SPARK-39584
>                 URL: https://issues.apache.org/jira/browse/SPARK-39584
>             Project: Spark
>          Issue Type: Test
>          Components: Tests
>    Affects Versions: 3.0.3, 3.1.2, 3.2.1, 3.3.0, 3.4.0
>            Reporter: Kazuyuki Tanimura
>            Priority: Minor
>
> GenTPCDSData uses the schema defined in `TPCDSSchema` that contains
> varchar(N)/char(N). When GenTPCDSData generates parquet, that pads spaces for
> strings whose lengths are < N.
> When TPCDSQueryBenchmark reads data from parquet generated by GenTPCDSData,
> it uses schema from the parquet file and keeps the paddings. Due to the extra
> spaces, string filter queries of TPC-DS fail to match. For example, q13 query
> results are all nulls and returns too fast because string filter does not
> meet any rows.
> Therefore, TPCDSQueryBenchmark is benchmarking with wrong query results and
> that is inflating some performance results.
> I am exploring two possible solutions now
> 1. Call `CREATE TABLE tableName schema USING parquet LOCATION path` before
> reading. This is what Spark TPC-DS unit tests are doing
> 2. Change varchar to string in the schema. This is what [databricks data
> generator|https://github.com/databricks/spark-sql-perf] is doing
> TPCDSQueryBenchmark was ported from databricks/spark-sql-perf in
> https://issues.apache.org/jira/browse/SPARK-35192
> History related varchar issue
> [https://lists.apache.org/thread/rg7pgwyto3616hb15q78n0sykls9j7rn]
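A rough PySpark sketch of solution 1 from the description. `tpcds_ddl` and `data_root` are placeholders, not names from the codebase; the idea is to register each generated table with the declared varchar/char schema instead of relying on the schema stored in the parquet files, so that char/varchar semantics apply at read time and the string filters match again:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
tpcds_ddl = {"item": "i_item_sk INT, i_item_id CHAR(16)"}  # truncated example DDL
data_root = "/tmp/tpcds-data"                              # assumed GenTPCDSData output

for name, ddl in tpcds_ddl.items():
    spark.sql(f"CREATE TABLE {name} ({ddl}) USING parquet LOCATION '{data_root}/{name}'")
{code}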
[jira] [Created] (SPARK-39586) Advanced Windowing
Boyang Jerry Peng created SPARK-39586:
-----------------------------------------

             Summary: Advanced Windowing
                 Key: SPARK-39586
                 URL: https://issues.apache.org/jira/browse/SPARK-39586
             Project: Spark
          Issue Type: New Feature
          Components: Structured Streaming
    Affects Versions: 3.3.0
            Reporter: Boyang Jerry Peng


Currently, Structured Streaming does not offer a user-friendly way to define custom windowing strategies. Users can only choose from an existing set of built-in window strategies such as tumbling, sliding, and session windows. To improve the flexibility of the engine and to support more advanced use cases, Structured Streaming needs to provide an intuitive API that allows users to define custom window boundaries, to trigger windows when they are complete, and, if necessary, to evict elements from windows.
[jira] [Created] (SPARK-39585) Multiple Stateful Operators
Boyang Jerry Peng created SPARK-39585:
-----------------------------------------

             Summary: Multiple Stateful Operators
                 Key: SPARK-39585
                 URL: https://issues.apache.org/jira/browse/SPARK-39585
             Project: Spark
          Issue Type: Improvement
          Components: Structured Streaming
    Affects Versions: 3.3.0
            Reporter: Boyang Jerry Peng


Currently, Structured Streaming only supports one stateful operator per pipeline. This constraint prevents Structured Streaming from being used for many use cases, such as ones involving chained time-window aggregations, chained stream-stream outer equality joins, or a stream-stream time-interval join followed by a time-window aggregation. Enabling multiple stateful operators within a pipeline will open up many additional use cases for Structured Streaming and bring it on par with other stream processing engines.
[jira] [Updated] (SPARK-39584) Fix TPCDSQueryBenchmark Measuring Performance of Wrong Query Results
[ https://issues.apache.org/jira/browse/SPARK-39584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kazuyuki Tanimura updated SPARK-39584:
--------------------------------------
    Description:
GenTPCDSData uses the schema defined in `TPCDSSchema` that contains varchar(N)/char(N). When GenTPCDSData generates parquet, that pads spaces for strings whose lengths are < N.

When TPCDSQueryBenchmark reads data from parquet generated by GenTPCDSData, it uses schema from the parquet file and keeps the paddings. Due to the extra spaces, string filter queries of TPC-DS fail to match. For example, q13 query results are all nulls and returns too fast because string filter does not meet any rows.

Therefore, TPCDSQueryBenchmark is benchmarking with wrong query results and that is inflating some performance results.

I am exploring two possible solutions now
1. Call `CREATE TABLE tableName schema USING parquet LOCATION path` before reading. This is what Spark unit tests are doing
2. Change varchar to string in the schema. This is what [databricks data generator|https://github.com/databricks/spark-sql-perf] is doing

TPCDSQueryBenchmark was ported from databricks/spark-sql-perf in https://issues.apache.org/jira/browse/SPARK-35192

History related varchar issue
[https://lists.apache.org/thread/rg7pgwyto3616hb15q78n0sykls9j7rn]

  was:
GenTPCDSData uses the schema defined in `TPCDSSchema` that contains varchar(N)/char(N). When GenTPCDSData generates parquet, that pads spaces for strings whose lengths are < N.

When TPCDSQueryBenchmark reads data from parquet generated by GenTPCDSData, it uses schema from the parquet file and keeps the paddings. Due to the extra spaces, string filter queries of TPC-DS fail to match. For example, q13 query results are all nulls and returns too fast because string filter does not meet any rows.

Therefore, TPCDSQueryBenchmark is benchmarking with wrong query results and that is inflating some performance results.

I am exploring two possible solutions now
1. Call `{{{}CREATE TABLE tableName schema USING parquet LOCATION path` {}}}before reading. This is what Spark unit tests are doing
2. Change varchar to string in the schema. This is what [databricks data generator|[https://github.com/databricks/spark-sql-perf]] is doing

TPCDSQueryBenchmark was ported from databricks/spark-sql-perf in https://issues.apache.org/jira/browse/SPARK-35192

History related varchar issue
[https://lists.apache.org/thread/rg7pgwyto3616hb15q78n0sykls9j7rn]


> Fix TPCDSQueryBenchmark Measuring Performance of Wrong Query Results
> ---------------------------------------------------------------------
>
>                 Key: SPARK-39584
>                 URL: https://issues.apache.org/jira/browse/SPARK-39584
>             Project: Spark
>          Issue Type: Test
>          Components: Tests
>    Affects Versions: 3.0.3, 3.1.2, 3.2.1, 3.3.0, 3.4.0
>            Reporter: Kazuyuki Tanimura
>            Priority: Minor
>
> GenTPCDSData uses the schema defined in `TPCDSSchema` that contains
> varchar(N)/char(N). When GenTPCDSData generates parquet, that pads spaces for
> strings whose lengths are < N.
> When TPCDSQueryBenchmark reads data from parquet generated by GenTPCDSData,
> it uses schema from the parquet file and keeps the paddings. Due to the extra
> spaces, string filter queries of TPC-DS fail to match. For example, q13 query
> results are all nulls and returns too fast because string filter does not
> meet any rows.
> Therefore, TPCDSQueryBenchmark is benchmarking with wrong query results and
> that is inflating some performance results.
> I am exploring two possible solutions now
> 1. Call `CREATE TABLE tableName schema USING parquet LOCATION path` before
> reading. This is what Spark unit tests are doing
> 2. Change varchar to string in the schema. This is what [databricks data
> generator|https://github.com/databricks/spark-sql-perf] is doing
> TPCDSQueryBenchmark was ported from databricks/spark-sql-perf in
> https://issues.apache.org/jira/browse/SPARK-35192
> History related varchar issue
> [https://lists.apache.org/thread/rg7pgwyto3616hb15q78n0sykls9j7rn]
[jira] [Updated] (SPARK-39584) Fix TPCDSQueryBenchmark Measuring Performance of Wrong Query Results
[ https://issues.apache.org/jira/browse/SPARK-39584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kazuyuki Tanimura updated SPARK-39584:
--------------------------------------
    Description:
GenTPCDSData uses the schema defined in `TPCDSSchema` that contains varchar(N)/char(N). When GenTPCDSData generates parquet, that pads spaces for strings whose lengths are < N.

When TPCDSQueryBenchmark reads data from parquet generated by GenTPCDSData, it uses schema from the parquet file and keeps the paddings. Due to the extra spaces, string filter queries of TPC-DS fail to match. For example, q13 query results are all nulls and returns too fast because string filter does not meet any rows.

Therefore, TPCDSQueryBenchmark is benchmarking with wrong query results and that is inflating some performance results.

I am exploring two possible solutions now
1. Call `{{{}CREATE TABLE tableName schema USING parquet LOCATION path` {}}}before reading. This is what Spark unit tests are doing
2. Change varchar to string in the schema. This is what [databricks data generator|[https://github.com/databricks/spark-sql-perf]] is doing

TPCDSQueryBenchmark was ported from databricks/spark-sql-perf in https://issues.apache.org/jira/browse/SPARK-35192

History related varchar issue
[https://lists.apache.org/thread/rg7pgwyto3616hb15q78n0sykls9j7rn]

  was:
GenTPCDSData uses the schema defined in `TPCDSSchema` that contains varchar(N)/char(N). When GenTPCDSData generates parquet, that pads spaces for strings whose lengths are < N.

When TPCDSQueryBenchmark reads data from parquet generated by GenTPCDSData, it uses schema from the parquet file and keeps the paddings. Due to the extra spaces, string filter queries of TPC-DS fail to match. For example, q13 query results are all nulls and returns too fast because string filter does not meet any rows.

Therefore, TPCDSQueryBenchmark is benchmarking with wrong query results and that is inflating some performance results.

I am exploring two possible solutions now
1. Call `{{{}CREATE TABLE tableName schema USING parquet LOCATION path` {}}}before reading. This is what Spark unit tests are doing
2. Change varchar to string in the schema. This is what [databricks data generator| [https://github.com/databricks/spark-sql-perf]] is doing

TPCDSQueryBenchmark was ported from databricks/spark-sql-perf in https://issues.apache.org/jira/browse/SPARK-35192

History related varchar https://lists.apache.org/thread/rg7pgwyto3616hb15q78n0sykls9j7rn


> Fix TPCDSQueryBenchmark Measuring Performance of Wrong Query Results
> ---------------------------------------------------------------------
>
>                 Key: SPARK-39584
>                 URL: https://issues.apache.org/jira/browse/SPARK-39584
>             Project: Spark
>          Issue Type: Test
>          Components: Tests
>    Affects Versions: 3.0.3, 3.1.2, 3.2.1, 3.3.0, 3.4.0
>            Reporter: Kazuyuki Tanimura
>            Priority: Minor
>
> GenTPCDSData uses the schema defined in `TPCDSSchema` that contains
> varchar(N)/char(N). When GenTPCDSData generates parquet, that pads spaces for
> strings whose lengths are < N.
> When TPCDSQueryBenchmark reads data from parquet generated by GenTPCDSData,
> it uses schema from the parquet file and keeps the paddings. Due to the extra
> spaces, string filter queries of TPC-DS fail to match. For example, q13 query
> results are all nulls and returns too fast because string filter does not
> meet any rows.
> Therefore, TPCDSQueryBenchmark is benchmarking with wrong query results and
> that is inflating some performance results.
> I am exploring two possible solutions now
> 1. Call `{{{}CREATE TABLE tableName schema USING parquet LOCATION path`
> {}}}before reading. This is what Spark unit tests are doing
> 2. Change varchar to string in the schema. This is what [databricks data
> generator|[https://github.com/databricks/spark-sql-perf]] is doing
> TPCDSQueryBenchmark was ported from databricks/spark-sql-perf in
> https://issues.apache.org/jira/browse/SPARK-35192
> History related varchar issue
> [https://lists.apache.org/thread/rg7pgwyto3616hb15q78n0sykls9j7rn]
[jira] [Commented] (SPARK-39584) Fix TPCDSQueryBenchmark Measuring Performance of Wrong Query Results
[ https://issues.apache.org/jira/browse/SPARK-39584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17558678#comment-17558678 ]

Kazuyuki Tanimura commented on SPARK-39584:
-------------------------------------------

Hi [~maropu], pinging you since it seems you are the expert in this area. I am wondering if you have a preference between solutions 1 and 2?

CC [~dongjoon]

I will be adding benchmark results from master for future comparison purposes.

> Fix TPCDSQueryBenchmark Measuring Performance of Wrong Query Results
> ---------------------------------------------------------------------
>
>                 Key: SPARK-39584
>                 URL: https://issues.apache.org/jira/browse/SPARK-39584
>             Project: Spark
>          Issue Type: Test
>          Components: Tests
>    Affects Versions: 3.0.3, 3.1.2, 3.2.1, 3.3.0, 3.4.0
>            Reporter: Kazuyuki Tanimura
>            Priority: Minor
>
> GenTPCDSData uses the schema defined in `TPCDSSchema` that contains
> varchar(N)/char(N). When GenTPCDSData generates parquet, that pads spaces for
> strings whose lengths are < N.
> When TPCDSQueryBenchmark reads data from parquet generated by GenTPCDSData,
> it uses schema from the parquet file and keeps the paddings. Due to the extra
> spaces, string filter queries of TPC-DS fail to match. For example, q13 query
> results are all nulls and returns too fast because string filter does not
> meet any rows.
> Therefore, TPCDSQueryBenchmark is benchmarking with wrong query results and
> that is inflating some performance results.
> I am exploring two possible solutions now
> 1. Call `{{{}CREATE TABLE tableName schema USING parquet LOCATION path`
> {}}}before reading. This is what Spark unit tests are doing
> 2. Change varchar to string in the schema. This is what [databricks data
> generator| [https://github.com/databricks/spark-sql-perf]] is doing
> TPCDSQueryBenchmark was ported from databricks/spark-sql-perf in
> https://issues.apache.org/jira/browse/SPARK-35192
> History related varchar
> https://lists.apache.org/thread/rg7pgwyto3616hb15q78n0sykls9j7rn
[jira] [Created] (SPARK-39584) Fix TPCDSQueryBenchmark Measuring Performance of Wrong Query Results
Kazuyuki Tanimura created SPARK-39584: - Summary: Fix TPCDSQueryBenchmark Measuring Performance of Wrong Query Results Key: SPARK-39584 URL: https://issues.apache.org/jira/browse/SPARK-39584 Project: Spark Issue Type: Test Components: Tests Affects Versions: 3.3.0, 3.2.1, 3.1.2, 3.0.3, 3.4.0 Reporter: Kazuyuki Tanimura GenTPCDSData uses the schema defined in `TPCDSSchema`, which contains varchar(N)/char(N) columns. When GenTPCDSData generates parquet, it pads strings shorter than N with spaces. When TPCDSQueryBenchmark reads the parquet data generated by GenTPCDSData, it uses the schema from the parquet file and keeps the padding. Due to the extra spaces, the string filter queries of TPC-DS fail to match. For example, the q13 query results are all nulls and the query returns too fast because the string filters do not match any rows. Therefore, TPCDSQueryBenchmark is benchmarking wrong query results, and that is inflating some performance numbers. I am exploring two possible solutions now: 1. Call `CREATE TABLE tableName schema USING parquet LOCATION path` before reading. This is what the Spark unit tests do. 2. Change varchar to string in the schema. This is what the [databricks data generator|https://github.com/databricks/spark-sql-perf] does. TPCDSQueryBenchmark was ported from databricks/spark-sql-perf in https://issues.apache.org/jira/browse/SPARK-35192 History of a related varchar issue: https://lists.apache.org/thread/rg7pgwyto3616hb15q78n0sykls9j7rn -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39583) Make RefreshTable be compatible with 3 layer namespace
Rui Wang created SPARK-39583: Summary: Make RefreshTable be compatible with 3 layer namespace Key: SPARK-39583 URL: https://issues.apache.org/jira/browse/SPARK-39583 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4.0 Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
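The ticket carries no description, so the following is a hedged illustration only: in line with the parent 3-layer-namespace work, `refreshTable` would presumably accept a fully qualified `catalog.database.table` name after this sub-task. All identifiers below are hypothetical placeholders.

{code:python}
# Hypothetical sketch of 3-layer-namespace support for refreshTable.
# "my_catalog.my_db.my_table" is a placeholder identifier.
spark.catalog.refreshTable("my_catalog.my_db.my_table")
{code}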
[jira] [Commented] (SPARK-39582) "Since " docs on array_agg are incorrect
[ https://issues.apache.org/jira/browse/SPARK-39582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558674#comment-17558674 ] Apache Spark commented on SPARK-39582: -- User 'nchammas' has created a pull request for this issue: https://github.com/apache/spark/pull/36982 > "Since " docs on array_agg are incorrect > - > > Key: SPARK-39582 > URL: https://issues.apache.org/jira/browse/SPARK-39582 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Nicholas Chammas >Priority: Minor > > [https://spark.apache.org/docs/latest/api/sql/#array_agg] > The docs currently say "Since: 2.0.0", but `array_agg` was added in Spark > 3.3.0. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39582) "Since " docs on array_agg are incorrect
[ https://issues.apache.org/jira/browse/SPARK-39582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558675#comment-17558675 ] Apache Spark commented on SPARK-39582: -- User 'nchammas' has created a pull request for this issue: https://github.com/apache/spark/pull/36982 > "Since " docs on array_agg are incorrect > - > > Key: SPARK-39582 > URL: https://issues.apache.org/jira/browse/SPARK-39582 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Nicholas Chammas >Assignee: Apache Spark >Priority: Minor > > [https://spark.apache.org/docs/latest/api/sql/#array_agg] > The docs currently say "Since: 2.0.0", but `array_agg` was added in Spark > 3.3.0. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39582) "Since " docs on array_agg are incorrect
[ https://issues.apache.org/jira/browse/SPARK-39582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39582: Assignee: (was: Apache Spark) > "Since " docs on array_agg are incorrect > - > > Key: SPARK-39582 > URL: https://issues.apache.org/jira/browse/SPARK-39582 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Nicholas Chammas >Priority: Minor > > [https://spark.apache.org/docs/latest/api/sql/#array_agg] > The docs currently say "Since: 2.0.0", but `array_agg` was added in Spark > 3.3.0. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39582) "Since " docs on array_agg are incorrect
[ https://issues.apache.org/jira/browse/SPARK-39582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39582: Assignee: Apache Spark > "Since " docs on array_agg are incorrect > - > > Key: SPARK-39582 > URL: https://issues.apache.org/jira/browse/SPARK-39582 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Nicholas Chammas >Assignee: Apache Spark >Priority: Minor > > [https://spark.apache.org/docs/latest/api/sql/#array_agg] > The docs currently say "Since: 2.0.0", but `array_agg` was added in Spark > 3.3.0. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39582) "Since " docs on array_agg are incorrect
Nicholas Chammas created SPARK-39582: Summary: "Since " docs on array_agg are incorrect Key: SPARK-39582 URL: https://issues.apache.org/jira/browse/SPARK-39582 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.3.0 Reporter: Nicholas Chammas [https://spark.apache.org/docs/latest/api/sql/#array_agg] The docs currently say "Since: 2.0.0", but `array_agg` was added in Spark 3.3.0. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
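For reference, a minimal sketch exercising the function whose "Since" annotation is wrong (the output shape is assumed, not taken from the ticket):

{code:python}
# array_agg is available from Spark 3.3.0 (per this ticket), despite the
# docs saying "Since: 2.0.0". A minimal invocation:
spark.sql("SELECT array_agg(col) FROM VALUES (1), (2), (1) AS tab(col)").show()
# Expected shape (assumed): one row with an array collecting the input values.
{code}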
[jira] [Created] (SPARK-39581) Better error messages for Pandas API on Spark
Xinrong Meng created SPARK-39581: Summary: Better error messages for Pandas API on Spark Key: SPARK-39581 URL: https://issues.apache.org/jira/browse/SPARK-39581 Project: Spark Issue Type: Improvement Components: Pandas API on Spark Affects Versions: 3.4.0 Reporter: Xinrong Meng Currently, some error messages for the pandas API on Spark are confusing. Specifically: * when users assume pandas-on-Spark has implemented pandas Scalars * when users pass a ps.Index when creating a DataFrame/Series Better error messages enhance usability and, in turn, user adoption. We should improve the error messages for the Pandas API on Spark. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
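As a concrete illustration of the second case above, a hedged sketch; the exact exception type and message are version-dependent and are precisely what this ticket aims to improve:

{code:python}
# Passing a pandas-on-Spark Index when creating a Series currently raises
# a confusing error; this only demonstrates the scenario, not the fix.
import pyspark.pandas as ps

idx = ps.Index([1, 2, 3])
try:
    ps.Series(idx)  # ps.Index used for Series creation
except Exception as e:
    print(type(e).__name__, e)
{code}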
[jira] [Updated] (SPARK-39574) Better error message when `ps.Index` is used for DataFrame/Series creation
[ https://issues.apache.org/jira/browse/SPARK-39574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-39574: - Parent: SPARK-39581 Issue Type: Sub-task (was: Improvement) > Better error message when `ps.Index` is used for DataFrame/Series creation > -- > > Key: SPARK-39574 > URL: https://issues.apache.org/jira/browse/SPARK-39574 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Better error message when `ps.Index` is used for DataFrame/Series creation. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39522) Uses Docker image cache over a custom image
[ https://issues.apache.org/jira/browse/SPARK-39522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39522: Assignee: Apache Spark > Uses Docker image cache over a custom image > --- > > Key: SPARK-39522 > URL: https://issues.apache.org/jira/browse/SPARK-39522 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > > We should probably replace the base image > (https://github.com/apache/spark/blob/master/.github/workflows/build_and_test.yml#L302, > https://github.com/dongjoon-hyun/ApacheSparkGitHubActionImage) with a plain > Ubuntu image w/ Docker image cache. See also > https://github.com/docker/build-push-action/blob/master/docs/advanced/cache.md -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39522) Uses Docker image cache over a custom image
[ https://issues.apache.org/jira/browse/SPARK-39522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39522: Assignee: (was: Apache Spark) > Uses Docker image cache over a custom image > --- > > Key: SPARK-39522 > URL: https://issues.apache.org/jira/browse/SPARK-39522 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > > We should probably replace the base image > (https://github.com/apache/spark/blob/master/.github/workflows/build_and_test.yml#L302, > https://github.com/dongjoon-hyun/ApacheSparkGitHubActionImage) with a plain > Ubuntu image w/ Docker image cache. See also > https://github.com/docker/build-push-action/blob/master/docs/advanced/cache.md -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39574) Better error message when `ps.Index` is used for DataFrame/Series creation
[ https://issues.apache.org/jira/browse/SPARK-39574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39574: Assignee: (was: Apache Spark) > Better error message when `ps.Index` is used for DataFrame/Series creation > -- > > Key: SPARK-39574 > URL: https://issues.apache.org/jira/browse/SPARK-39574 > Project: Spark > Issue Type: Improvement > Components: Pandas API on Spark, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Better error message when `ps.Index` is used for DataFrame/Series creation. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39522) Uses Docker image cache over a custom image
[ https://issues.apache.org/jira/browse/SPARK-39522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558585#comment-17558585 ] Apache Spark commented on SPARK-39522: -- User 'Yikun' has created a pull request for this issue: https://github.com/apache/spark/pull/36980 > Uses Docker image cache over a custom image > --- > > Key: SPARK-39522 > URL: https://issues.apache.org/jira/browse/SPARK-39522 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > > We should probably replace the base image > (https://github.com/apache/spark/blob/master/.github/workflows/build_and_test.yml#L302, > https://github.com/dongjoon-hyun/ApacheSparkGitHubActionImage) with a plain > Ubuntu image w/ Docker image cache. See also > https://github.com/docker/build-push-action/blob/master/docs/advanced/cache.md -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39574) Better error message when `ps.Index` is used for DataFrame/Series creation
[ https://issues.apache.org/jira/browse/SPARK-39574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39574: Assignee: Apache Spark > Better error message when `ps.Index` is used for DataFrame/Series creation > -- > > Key: SPARK-39574 > URL: https://issues.apache.org/jira/browse/SPARK-39574 > Project: Spark > Issue Type: Improvement > Components: Pandas API on Spark, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Apache Spark >Priority: Major > > Better error message when `ps.Index` is used for DataFrame/Series creation. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39574) Better error message when `ps.Index` is used for DataFrame/Series creation
[ https://issues.apache.org/jira/browse/SPARK-39574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558583#comment-17558583 ] Apache Spark commented on SPARK-39574: -- User 'xinrong-databricks' has created a pull request for this issue: https://github.com/apache/spark/pull/36981 > Better error message when `ps.Index` is used for DataFrame/Series creation > -- > > Key: SPARK-39574 > URL: https://issues.apache.org/jira/browse/SPARK-39574 > Project: Spark > Issue Type: Improvement > Components: Pandas API on Spark, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Better error message when `ps.Index` is used for DataFrame/Series creation. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39522) Uses Docker image cache over a custom image
[ https://issues.apache.org/jira/browse/SPARK-39522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558570#comment-17558570 ] Yikun Jiang commented on SPARK-39522: - Some investigation: - The build jobs run in each user's downstream repo, so we have to use a "Registry cache" as a bridge. - The complete flow would be: # (apache repo) Build the image cache in the apache repo; this image is refreshed whenever Dockerfile changes are merged. # (user repo) In each PR, build the latest infra image based on the image cache plus the PR's Dockerfile changes, and upload it to the user's ghcr.io. # (user repo) Use the latest infra image from step 2 to run the pyspark, sparkr, and lint jobs. > Uses Docker image cache over a custom image > --- > > Key: SPARK-39522 > URL: https://issues.apache.org/jira/browse/SPARK-39522 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > > We should probably replace the base image > (https://github.com/apache/spark/blob/master/.github/workflows/build_and_test.yml#L302, > https://github.com/dongjoon-hyun/ApacheSparkGitHubActionImage) with a plain > Ubuntu image w/ Docker image cache. See also > https://github.com/docker/build-push-action/blob/master/docs/advanced/cache.md -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-39576) Support GitHub Actions generate benchmark results using Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-39576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-39576. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36975 [https://github.com/apache/spark/pull/36975] > Support GitHub Actions generate benchmark results using Scala 2.13 > -- > > Key: SPARK-39576 > URL: https://issues.apache.org/jira/browse/SPARK-39576 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.4.0 > > > GitHub Actions currently only supports generating benchmark results using Scala 2.12. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39576) Support GitHub Actions generate benchmark results using Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-39576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-39576: - Assignee: Yang Jie > Support GitHub Actions generate benchmark results using Scala 2.13 > -- > > Key: SPARK-39576 > URL: https://issues.apache.org/jira/browse/SPARK-39576 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > > GitHub Actions currently only supports generating benchmark results using Scala 2.12. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39580) Write event logs in a continuous manner
dimtiris kanoute created SPARK-39580: Summary: Write event logs in a continuous manner Key: SPARK-39580 URL: https://issues.apache.org/jira/browse/SPARK-39580 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 3.3.0 Reporter: dimtiris kanoute We are using Spark Thrift Server as a service to run Spark SQL queries. Currently, Spark writes *event logs* only once a job is done or gracefully killed. For Spark Thrift Server, which runs as a single long-lived job, this means that after an unexpected error (e.g. an OOM) we cannot access the logs at all, because they are never written due to the unexpected shutdown. We would like to suggest, as an improvement, the functionality for Spark *to write the event logs to cloud storage in a continuous manner.* -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
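For context, event logging today is controlled by standard Spark configuration keys such as the ones below; the feature request concerns when events are flushed to this location, not new knobs. The storage path is a hypothetical placeholder.

{code:python}
# Existing event-log settings; with an abrupt shutdown (e.g. OOM) the log
# written here can be lost, which is what this ticket wants to address.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.eventLog.enabled", "true")
    .config("spark.eventLog.dir", "s3a://my-bucket/spark-events")  # placeholder
    .getOrCreate()
)
{code}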
[jira] [Commented] (SPARK-39579) Make ListFunctions API compatible
[ https://issues.apache.org/jira/browse/SPARK-39579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558488#comment-17558488 ] Apache Spark commented on SPARK-39579: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/36977 > Make ListFunctions API compatible > -- > > Key: SPARK-39579 > URL: https://issues.apache.org/jira/browse/SPARK-39579 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: zhengruifeng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39579) Make ListFunctions API compatible
[ https://issues.apache.org/jira/browse/SPARK-39579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558487#comment-17558487 ] Apache Spark commented on SPARK-39579: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/36977 > Make ListFunctions API compatible > -- > > Key: SPARK-39579 > URL: https://issues.apache.org/jira/browse/SPARK-39579 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: zhengruifeng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39579) Make ListFunctions API compatible
[ https://issues.apache.org/jira/browse/SPARK-39579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39579: Assignee: (was: Apache Spark) > Make ListFunctions API compatible > -- > > Key: SPARK-39579 > URL: https://issues.apache.org/jira/browse/SPARK-39579 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: zhengruifeng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39579) Make ListFunctions API compatible
[ https://issues.apache.org/jira/browse/SPARK-39579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39579: Assignee: Apache Spark > Make ListFunctions API compatible > -- > > Key: SPARK-39579 > URL: https://issues.apache.org/jira/browse/SPARK-39579 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: zhengruifeng >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39579) Make ListFunctions API compatible
zhengruifeng created SPARK-39579: Summary: Make ListFunctions API compatible Key: SPARK-39579 URL: https://issues.apache.org/jira/browse/SPARK-39579 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4.0 Reporter: zhengruifeng -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
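The ticket has no description, so the following is a hedged sketch only: in line with the sibling 3-layer-namespace sub-tasks, the goal is presumably for `listFunctions` to accept a qualified `catalog.database` argument. The identifier below is illustrative.

{code:python}
# Hypothetical usage once listFunctions understands qualified namespaces.
# "spark_catalog.default" refers to the built-in session catalog.
for fn in spark.catalog.listFunctions("spark_catalog.default"):
    print(fn.name)
{code}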
[jira] [Created] (SPARK-39578) The driver cannot parse the SQL statement and the job is not executed
miaowang created SPARK-39578: Summary: The driver cannot parse the SQL statement and the job is not executed Key: SPARK-39578 URL: https://issues.apache.org/jira/browse/SPARK-39578 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.8 Reporter: miaowang We use Spark's API to submit SQL execution tasks. Occasionally, a SQL statement is not executed and the task does not exit. Please give us some advice on how to solve this problem. Execution mode: !image-2022-06-24-18-41-50-864.png! The SQL script is read and the jobs are submitted in sequence. Log output from the abnormal run: {code:java} 22/06/23 20:55:36 WARN HiveConf: HiveConf of name hive.strict.checks.type.safety does not exist 22/06/23 20:55:36 WARN HiveConf: HiveConf of name hive.strict.checks.cartesian.product does not exist 22/06/23 20:55:37 INFO metastore: Trying to connect to metastore with URI thrift://bdp-datalake-hive-metastore-01-10-8-49-114:9083 22/06/23 20:55:37 INFO metastore: Connected to metastore. 22/06/23 20:55:38 INFO SessionState: Created local directory: /hdfsdata/subdata10/yarn/nmcgroup/usercache/hive/appcache/application_1643023142753_3753253/container_e16_1643023142753_3753253_01_01/tmp/nobody 22/06/23 20:55:38 INFO SessionState: Created local directory: /hdfsdata/subdata10/yarn/nmcgroup/usercache/hive/appcache/application_1643023142753_3753253/container_e16_1643023142753_3753253_01_01/tmp/944b1102-4564-4879-8305-4df390b26669_resources 22/06/23 20:55:38 INFO SessionState: Created HDFS directory: /tmp/hive/hive/944b1102-4564-4879-8305-4df390b26669 22/06/23 20:55:38 INFO SessionState: Created local directory: /hdfsdata/subdata10/yarn/nmcgroup/usercache/hive/appcache/application_1643023142753_3753253/container_e16_1643023142753_3753253_01_01/tmp/nobody/944b1102-4564-4879-8305-4df390b26669 22/06/23 20:55:38 INFO SessionState: Created HDFS directory: /tmp/hive/hive/944b1102-4564-4879-8305-4df390b26669/_tmp_space.db 22/06/23 20:55:38 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.2) is /user/hive/warehouse 22/06/23 20:55:47 INFO ContextCleaner: Cleaned accumulator 7 22/06/23 20:55:47 INFO ContextCleaner: Cleaned accumulator 12 22/06/23 20:55:47 INFO ContextCleaner: Cleaned accumulator 14 22/06/23 20:55:47 INFO BlockManagerInfo: Removed broadcast_0_piece0 on bdp-hdfs-36-10-8-33-65:39014 in memory (size: 31.1 KB, free: 3.0 GB) 22/06/23 20:55:48 INFO BlockManagerInfo: Removed broadcast_0_piece0 on bdp-hdfs-16-10-8-33-45:33394 in memory (size: 31.1 KB, free: 8.4 GB) 22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 10 22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 2 22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 18 22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 24 22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 19 22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 6 22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 5 22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 17 22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 25 22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 13 22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 29 22/06/23 20:55:48 INFO BlockManagerInfo: Removed broadcast_1_piece0 on bdp-hdfs-36-10-8-33-65:39014 in memory (size: 4.0 KB, free: 3.0 GB) 22/06/23 20:55:48 INFO BlockManagerInfo: Removed broadcast_1_piece0 on bdp-hdfs-16-10-8-33-45:33394 in memory (size: 4.0 KB, free: 8.4 GB) 22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 28 22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 4 22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 16 22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 21 22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 20 22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 9 22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 23 22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 26 22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 0 22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 3 22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 11 22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 15 22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 27 22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 8 22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 1 22/06/23 20:55:48 INFO ContextCleaner: Cleaned accumulator 22 22/06/23 20:55:49 INFO InMemoryFileIndex: It took 66 ms to list leaf files for 1 paths. 22/06/23 20:55:49 INFO InMemoryFileIndex: It took 8 ms to list leaf files for 1 paths. Executing> drop table if exists dc.cowell_omni_channel_jingli_day03_operator_info_org_20220622 Execution finished> drop table if exists dc.cowell_omni_channel_jingli_day03_operator_info_org_20220622 Executing>
[jira] [Assigned] (SPARK-39506) CacheTable, isCached, UncacheTable, setCurrentCatalog, currentCatalog, listCatalogs
[ https://issues.apache.org/jira/browse/SPARK-39506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-39506: --- Assignee: Rui Wang > CacheTable, isCached, UncacheTable, setCurrentCatalog, currentCatalog, > listCatalogs > --- > > Key: SPARK-39506 > URL: https://issues.apache.org/jira/browse/SPARK-39506 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-39506) CacheTable, isCached, UncacheTable, setCurrentCatalog, currentCatalog, listCatalogs
[ https://issues.apache.org/jira/browse/SPARK-39506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-39506. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36904 [https://github.com/apache/spark/pull/36904] > CacheTable, isCached, UncacheTable, setCurrentCatalog, currentCatalog, > listCatalogs > --- > > Key: SPARK-39506 > URL: https://issues.apache.org/jira/browse/SPARK-39506 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
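For illustration, a minimal sketch of the Catalog APIs named in this sub-task used with 3-layer-qualified names; the table identifier is a hypothetical placeholder:

{code:python}
# Sketch of the APIs covered by SPARK-39506 (identifiers are placeholders).
spark.catalog.cacheTable("spark_catalog.default.my_table")
print(spark.catalog.isCached("spark_catalog.default.my_table"))
spark.catalog.uncacheTable("spark_catalog.default.my_table")

spark.catalog.setCurrentCatalog("spark_catalog")
print(spark.catalog.currentCatalog())                  # e.g. 'spark_catalog'
print([c.name for c in spark.catalog.listCatalogs()])  # available catalogs
{code}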
[jira] [Assigned] (SPARK-39453) DS V2 supports push down misc non-aggregate functions(non ANSI)
[ https://issues.apache.org/jira/browse/SPARK-39453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-39453: --- Assignee: jiaan.geng > DS V2 supports push down misc non-aggregate functions(non ANSI) > --- > > Key: SPARK-39453 > URL: https://issues.apache.org/jira/browse/SPARK-39453 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-39453) DS V2 supports push down misc non-aggregate functions(non ANSI)
[ https://issues.apache.org/jira/browse/SPARK-39453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-39453. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36830 [https://github.com/apache/spark/pull/36830] > DS V2 supports push down misc non-aggregate functions(non ANSI) > --- > > Key: SPARK-39453 > URL: https://issues.apache.org/jira/browse/SPARK-39453 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39577) Add SQL reference for built-in functions
[ https://issues.apache.org/jira/browse/SPARK-39577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558424#comment-17558424 ] Apache Spark commented on SPARK-39577: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/36976 > Add SQL reference for built-in functions > > > Key: SPARK-39577 > URL: https://issues.apache.org/jira/browse/SPARK-39577 > Project: Spark > Issue Type: Documentation > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Priority: Major > > Currently, the Spark SQL reference documentation is missing many functions. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39577) Add SQL reference for built-in functions
[ https://issues.apache.org/jira/browse/SPARK-39577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39577: Assignee: Apache Spark > Add SQL reference for built-in functions > > > Key: SPARK-39577 > URL: https://issues.apache.org/jira/browse/SPARK-39577 > Project: Spark > Issue Type: Documentation > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Assignee: Apache Spark >Priority: Major > > Currently, the Spark SQL reference documentation is missing many functions. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39577) Add SQL reference for built-in functions
[ https://issues.apache.org/jira/browse/SPARK-39577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39577: Assignee: (was: Apache Spark) > Add SQL reference for built-in functions > > > Key: SPARK-39577 > URL: https://issues.apache.org/jira/browse/SPARK-39577 > Project: Spark > Issue Type: Documentation > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Priority: Major > > Currently, the Spark SQL reference documentation is missing many functions. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39577) Add SQL reference for built-in functions
[ https://issues.apache.org/jira/browse/SPARK-39577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558423#comment-17558423 ] Apache Spark commented on SPARK-39577: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/36976 > Add SQL reference for built-in functions > > > Key: SPARK-39577 > URL: https://issues.apache.org/jira/browse/SPARK-39577 > Project: Spark > Issue Type: Documentation > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Priority: Major > > Currently, the Spark SQL reference documentation is missing many functions. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38978) DS V2 supports push down OFFSET operator
[ https://issues.apache.org/jira/browse/SPARK-38978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-38978. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36295 [https://github.com/apache/spark/pull/36295] > DS V2 supports push down OFFSET operator > > > Key: SPARK-38978 > URL: https://issues.apache.org/jira/browse/SPARK-38978 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.4.0 > > > Currently, DS V2 push-down supports LIMIT but not OFFSET. > If we can push OFFSET down to the JDBC data source, performance will be better. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38978) DS V2 supports push down OFFSET operator
[ https://issues.apache.org/jira/browse/SPARK-38978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-38978: --- Assignee: jiaan.geng > DS V2 supports push down OFFSET operator > > > Key: SPARK-38978 > URL: https://issues.apache.org/jira/browse/SPARK-38978 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > > Currently, DS V2 push-down supports LIMIT but not OFFSET. > If we can push OFFSET down to the JDBC data source, performance will be better. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
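To make the intent concrete, a hedged sketch of the query shape this pushdown targets. Connection options and column names are hypothetical placeholders; with the pushdown in place, the LIMIT/OFFSET below would be evaluated by the remote database rather than by Spark after fetching rows.

{code:python}
# Hypothetical JDBC source; url, dbtable, and order_id are placeholders.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db-host:5432/shop")
    .option("dbtable", "orders")
    .load()
)
df.createOrReplaceTempView("orders")
spark.sql("SELECT * FROM orders ORDER BY order_id LIMIT 10 OFFSET 5").show()
{code}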
[jira] [Updated] (SPARK-39577) Add SQL reference for built-in functions
[ https://issues.apache.org/jira/browse/SPARK-39577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-39577: --- Summary: Add SQL reference for built-in functions (was: Add document into SQL reference for built-in functions) > Add SQL reference for built-in functions > > > Key: SPARK-39577 > URL: https://issues.apache.org/jira/browse/SPARK-39577 > Project: Spark > Issue Type: Documentation > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Priority: Major > > Currently, the Spark SQL reference documentation is missing many functions. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39577) Add document into SQL reference for built-in functions
jiaan.geng created SPARK-39577: -- Summary: Add document into SQL reference for built-in functions Key: SPARK-39577 URL: https://issues.apache.org/jira/browse/SPARK-39577 Project: Spark Issue Type: Documentation Components: SQL Affects Versions: 3.4.0 Reporter: jiaan.geng Currently, the Spark SQL reference documentation is missing many functions. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39576) Support GitHub Actions generate benchmark results using Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-39576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558377#comment-17558377 ] Apache Spark commented on SPARK-39576: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/36975 > Support GitHub Actions generate benchmark results using Scala 2.13 > -- > > Key: SPARK-39576 > URL: https://issues.apache.org/jira/browse/SPARK-39576 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > GitHub Actions currently only supports generating benchmark results using Scala 2.12. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39576) Support GitHub Actions generate benchmark results using Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-39576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39576: Assignee: (was: Apache Spark) > Support GitHub Actions generate benchmark results using Scala 2.13 > -- > > Key: SPARK-39576 > URL: https://issues.apache.org/jira/browse/SPARK-39576 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > GitHub Actions currently only supports generating benchmark results using Scala 2.12. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39576) Support GitHub Actions generate benchmark results using Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-39576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39576: Assignee: Apache Spark > Support GitHub Actions generate benchmark results using Scala 2.13 > -- > > Key: SPARK-39576 > URL: https://issues.apache.org/jira/browse/SPARK-39576 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > > GitHub Actions currently only supports generating benchmark results using Scala 2.12. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39447) Only non-broadcast query stage can propagate empty relation
[ https://issues.apache.org/jira/browse/SPARK-39447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558376#comment-17558376 ] Apache Spark commented on SPARK-39447: -- User 'ulysses-you' has created a pull request for this issue: https://github.com/apache/spark/pull/36974 > Only non-broadcast query stage can propagate empty relation > --- > > Key: SPARK-39447 > URL: https://issues.apache.org/jira/browse/SPARK-39447 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yuming Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39576) Support GitHub Actions generate benchmark results using Scala 2.13
Yang Jie created SPARK-39576: Summary: Support GitHub Actions generate benchmark results using Scala 2.13 Key: SPARK-39576 URL: https://issues.apache.org/jira/browse/SPARK-39576 Project: Spark Issue Type: Improvement Components: Tests Affects Versions: 3.4.0 Reporter: Yang Jie GitHub Actions currently only supports generating benchmark results using Scala 2.12. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org