[jira] [Resolved] (SPARK-34355) Add log and time cost for commit job
[ https://issues.apache.org/jira/browse/SPARK-34355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-34355. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31471 [https://github.com/apache/spark/pull/31471] > Add log and time cost for commit job > > > Key: SPARK-34355 > URL: https://issues.apache.org/jira/browse/SPARK-34355 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: ulysses you >Assignee: ulysses you >Priority: Minor > Fix For: 3.2.0 > > > The commit job is a heavy operation, and we have seen Spark block at > this point many times due to slow RPC with the NameNode or other services. > > It's better to record the time that the commit job costs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
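The change resolved above wraps the commit call with elapsed-time logging. A minimal sketch of that pattern, using only the JDK; the `commitJob` stand-in and log message are illustrative placeholders, not Spark's actual `FileFormatWriter` internals:

```java
// Sketch of logging the time a commit job takes (SPARK-34355 pattern).
// commitJob() is a placeholder for the real, potentially slow, commit call.
public class CommitTiming {
    static void commitJob() throws InterruptedException {
        Thread.sleep(10); // stand-in for a slow RPC to the NameNode
    }

    public static long timedCommit() throws InterruptedException {
        long start = System.nanoTime();
        commitJob();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // In Spark this would go through the normal logging framework.
        System.out.println("Committed job, elapsed time: " + elapsedMs + " ms");
        return elapsedMs;
    }

    public static void main(String[] args) throws InterruptedException {
        timedCommit();
    }
}
```

The same wrap-and-log idiom applies to any blocking call whose latency you want visible in driver logs.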
[jira] [Assigned] (SPARK-34355) Add log and time cost for commit job
[ https://issues.apache.org/jira/browse/SPARK-34355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim reassigned SPARK-34355: Assignee: ulysses you > Add log and time cost for commit job > > > Key: SPARK-34355 > URL: https://issues.apache.org/jira/browse/SPARK-34355 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: ulysses you >Assignee: ulysses you >Priority: Minor > > > The commit job is a heavy operation, and we have seen Spark block at > this point many times due to slow RPC with the NameNode or other services. > > It's better to record the time that the commit job costs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-34394) Unify output of SHOW FUNCTIONS and pass output attributes properly
[ https://issues.apache.org/jira/browse/SPARK-34394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-34394: --- Comment: was deleted (was: I'm working on.) > Unify output of SHOW FUNCTIONS and pass output attributes properly > -- > > Key: SPARK-34394 > URL: https://issues.apache.org/jira/browse/SPARK-34394 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34394) Unify output of SHOW FUNCTIONS and pass output attributes properly
[ https://issues.apache.org/jira/browse/SPARK-34394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280803#comment-17280803 ] Apache Spark commented on SPARK-34394: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/31519 > Unify output of SHOW FUNCTIONS and pass output attributes properly > -- > > Key: SPARK-34394 > URL: https://issues.apache.org/jira/browse/SPARK-34394 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34394) Unify output of SHOW FUNCTIONS and pass output attributes properly
[ https://issues.apache.org/jira/browse/SPARK-34394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280802#comment-17280802 ] Apache Spark commented on SPARK-34394: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/31519 > Unify output of SHOW FUNCTIONS and pass output attributes properly > -- > > Key: SPARK-34394 > URL: https://issues.apache.org/jira/browse/SPARK-34394 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34394) Unify output of SHOW FUNCTIONS and pass output attributes properly
[ https://issues.apache.org/jira/browse/SPARK-34394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34394: Assignee: Apache Spark > Unify output of SHOW FUNCTIONS and pass output attributes properly > -- > > Key: SPARK-34394 > URL: https://issues.apache.org/jira/browse/SPARK-34394 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34394) Unify output of SHOW FUNCTIONS and pass output attributes properly
[ https://issues.apache.org/jira/browse/SPARK-34394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34394: Assignee: (was: Apache Spark) > Unify output of SHOW FUNCTIONS and pass output attributes properly > -- > > Key: SPARK-34394 > URL: https://issues.apache.org/jira/browse/SPARK-34394 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-34375) Replaces `Mockito.initMocks` with `Mockito.openMocks`
[ https://issues.apache.org/jira/browse/SPARK-34375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-34375. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31487 [https://github.com/apache/spark/pull/31487] > Replaces `Mockito.initMocks` with `Mockito.openMocks` > - > > Key: SPARK-34375 > URL: https://issues.apache.org/jira/browse/SPARK-34375 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Spark Core, Tests >Affects Versions: 3.2.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.2.0 > > > Mockito.initMocks is a deprecated api, should use openMocks(Object) instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34375) Replaces `Mockito.initMocks` with `Mockito.openMocks`
[ https://issues.apache.org/jira/browse/SPARK-34375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-34375: Assignee: Yang Jie > Replaces `Mockito.initMocks` with `Mockito.openMocks` > - > > Key: SPARK-34375 > URL: https://issues.apache.org/jira/browse/SPARK-34375 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Spark Core, Tests >Affects Versions: 3.2.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > > Mockito.initMocks is a deprecated api, should use openMocks(Object) instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
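The SPARK-34375 migration is mechanical: `MockitoAnnotations.initMocks(Object)` was deprecated in Mockito 3.4 in favor of `openMocks(Object)`, which returns an `AutoCloseable` that should be closed after the test. A hedged sketch, assuming Mockito 3.4+ on the classpath; `HypotheticalSuite` is an illustrative class, not Spark test code:

```java
import java.util.List;

import org.mockito.Mock;
import org.mockito.Mockito;
import org.mockito.MockitoAnnotations;

// Sketch of the initMocks -> openMocks migration (Mockito 3.4+).
public class HypotheticalSuite {
    @Mock
    private List<String> mockedList;

    public void run() throws Exception {
        // Old, deprecated style:
        //   MockitoAnnotations.initMocks(this);
        // New style: openMocks returns an AutoCloseable that releases
        // the inline mock maker's resources when closed.
        try (AutoCloseable mocks = MockitoAnnotations.openMocks(this)) {
            mockedList.add("x");                    // interaction is recorded
            Mockito.verify(mockedList).add("x");    // and can be verified
        }
    }
}
```

In JUnit-style suites the `close()` typically lands in an `@After`/`@AfterEach` method rather than a try-with-resources block.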
[jira] [Resolved] (SPARK-34391) Upgrade commons-io to 2.8.0
[ https://issues.apache.org/jira/browse/SPARK-34391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-34391. --- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31503 [https://github.com/apache/spark/pull/31503] > Upgrade commons-io to 2.8.0 > --- > > Key: SPARK-34391 > URL: https://issues.apache.org/jira/browse/SPARK-34391 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34391) Upgrade commons-io to 2.8.0
[ https://issues.apache.org/jira/browse/SPARK-34391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-34391: - Assignee: Dongjoon Hyun > Upgrade commons-io to 2.8.0 > --- > > Key: SPARK-34391 > URL: https://issues.apache.org/jira/browse/SPARK-34391 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-34309) Use Caffeine instead of Guava Cache
[ https://issues.apache.org/jira/browse/SPARK-34309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280769#comment-17280769 ] Yang Jie edited comment on SPARK-34309 at 2/8/21, 5:48 AM: --- Thx [~xkrogen] ~ I submitted a temporary pr which includes all the places where guava cache is used {quote}[~LuciferYang] can you elaborate on where Guava Cache is used in performance critical sections within Spark? Microbenchmarks are nice but it would be good to understand what kind of effect this could have on overall performance. {quote} I need to think about where to make a Microbenchmarks was (Author: luciferyang): Thx [~xkrogen] ~ I submitted a temporary pr which includes all the places where guava cache is used, {quote}[~LuciferYang] can you elaborate on where Guava Cache is used in performance critical sections within Spark? Microbenchmarks are nice but it would be good to understand what kind of effect this could have on overall performance. {quote} I need to think about where to make a Microbenchmarks > Use Caffeine instead of Guava Cache > --- > > Key: SPARK-34309 > URL: https://issues.apache.org/jira/browse/SPARK-34309 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.2.0 >Reporter: Yang Jie >Priority: Minor > Attachments: image-2021-02-05-18-08-48-852.png, screenshot-1.png > > > Caffeine is a high performance, near optimal caching library based on Java 8, > it is used in a similar way to guava cache, but with better performance. The > comparison results are as follow are on the [caffeine benchmarks > |https://github.com/ben-manes/caffeine/wiki/Benchmarks] > At the same time, caffeine has been used in some open source projects like > Cassandra, Hbase, Neo4j, Druid, Spring and so on. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34309) Use Caffeine instead of Guava Cache
[ https://issues.apache.org/jira/browse/SPARK-34309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280769#comment-17280769 ] Yang Jie commented on SPARK-34309: -- Thx [~xkrogen] ~ I submitted a temporary pr which includes all the places where guava cache is used, {quote}[~LuciferYang] can you elaborate on where Guava Cache is used in performance critical sections within Spark? Microbenchmarks are nice but it would be good to understand what kind of effect this could have on overall performance. {quote} I need to think about where to set up a microbenchmark > Use Caffeine instead of Guava Cache > --- > > Key: SPARK-34309 > URL: https://issues.apache.org/jira/browse/SPARK-34309 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.2.0 >Reporter: Yang Jie >Priority: Minor > Attachments: image-2021-02-05-18-08-48-852.png, screenshot-1.png > > > Caffeine is a high-performance, near-optimal caching library based on Java 8; > it is used in a similar way to Guava cache, but with better performance. The > comparison results are shown on the [caffeine benchmarks > |https://github.com/ben-manes/caffeine/wiki/Benchmarks] > At the same time, caffeine has been used in some open source projects like > Cassandra, Hbase, Neo4j, Druid, Spring and so on. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34309) Use Caffeine instead of Guava Cache
[ https://issues.apache.org/jira/browse/SPARK-34309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280768#comment-17280768 ] Apache Spark commented on SPARK-34309: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/31517 > Use Caffeine instead of Guava Cache > --- > > Key: SPARK-34309 > URL: https://issues.apache.org/jira/browse/SPARK-34309 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.2.0 >Reporter: Yang Jie >Priority: Minor > Attachments: image-2021-02-05-18-08-48-852.png, screenshot-1.png > > > Caffeine is a high performance, near optimal caching library based on Java 8, > it is used in a similar way to guava cache, but with better performance. The > comparison results are as follow are on the [caffeine benchmarks > |https://github.com/ben-manes/caffeine/wiki/Benchmarks] > At the same time, caffeine has been used in some open source projects like > Cassandra, Hbase, Neo4j, Druid, Spring and so on. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34239) Unify output of SHOW COLUMNS pass output attributes properly
[ https://issues.apache.org/jira/browse/SPARK-34239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280767#comment-17280767 ] Apache Spark commented on SPARK-34239: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/31518 > Unify output of SHOW COLUMNS pass output attributes properly > > > Key: SPARK-34239 > URL: https://issues.apache.org/jira/browse/SPARK-34239 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Fix For: 3.2.0 > > > Unify output of SHOW COLUMNS pass output attributes properly -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34309) Use Caffeine instead of Guava Cache
[ https://issues.apache.org/jira/browse/SPARK-34309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34309: Assignee: (was: Apache Spark) > Use Caffeine instead of Guava Cache > --- > > Key: SPARK-34309 > URL: https://issues.apache.org/jira/browse/SPARK-34309 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.2.0 >Reporter: Yang Jie >Priority: Minor > Attachments: image-2021-02-05-18-08-48-852.png, screenshot-1.png > > > Caffeine is a high performance, near optimal caching library based on Java 8, > it is used in a similar way to guava cache, but with better performance. The > comparison results are as follow are on the [caffeine benchmarks > |https://github.com/ben-manes/caffeine/wiki/Benchmarks] > At the same time, caffeine has been used in some open source projects like > Cassandra, Hbase, Neo4j, Druid, Spring and so on. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34309) Use Caffeine instead of Guava Cache
[ https://issues.apache.org/jira/browse/SPARK-34309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280764#comment-17280764 ] Apache Spark commented on SPARK-34309: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/31517 > Use Caffeine instead of Guava Cache > --- > > Key: SPARK-34309 > URL: https://issues.apache.org/jira/browse/SPARK-34309 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.2.0 >Reporter: Yang Jie >Priority: Minor > Attachments: image-2021-02-05-18-08-48-852.png, screenshot-1.png > > > Caffeine is a high performance, near optimal caching library based on Java 8, > it is used in a similar way to guava cache, but with better performance. The > comparison results are as follow are on the [caffeine benchmarks > |https://github.com/ben-manes/caffeine/wiki/Benchmarks] > At the same time, caffeine has been used in some open source projects like > Cassandra, Hbase, Neo4j, Druid, Spring and so on. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34309) Use Caffeine instead of Guava Cache
[ https://issues.apache.org/jira/browse/SPARK-34309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34309: Assignee: Apache Spark > Use Caffeine instead of Guava Cache > --- > > Key: SPARK-34309 > URL: https://issues.apache.org/jira/browse/SPARK-34309 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.2.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > Attachments: image-2021-02-05-18-08-48-852.png, screenshot-1.png > > > Caffeine is a high performance, near optimal caching library based on Java 8, > it is used in a similar way to guava cache, but with better performance. The > comparison results are as follow are on the [caffeine benchmarks > |https://github.com/ben-manes/caffeine/wiki/Benchmarks] > At the same time, caffeine has been used in some open source projects like > Cassandra, Hbase, Neo4j, Druid, Spring and so on. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
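As the SPARK-34309 description notes, Caffeine deliberately mirrors Guava's cache API, so most call sites change little beyond the builder class and package. A hedged before/after sketch, assuming Caffeine 2.x on the classpath; the key/value types are illustrative, not taken from the PR:

```java
import java.util.concurrent.TimeUnit;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

// Sketch of swapping a Guava cache for Caffeine. Only the builder changes;
// put/getIfPresent and the eviction settings carry over almost verbatim.
public class CaffeineSketch {
    public static Cache<String, String> build() {
        // Guava equivalent, for comparison:
        //   com.google.common.cache.CacheBuilder.newBuilder()
        //       .maximumSize(1000)
        //       .expireAfterWrite(10, TimeUnit.MINUTES)
        //       .build();
        return Caffeine.newBuilder()
                .maximumSize(1000)
                .expireAfterWrite(10, TimeUnit.MINUTES)
                .build();
    }

    public static void main(String[] args) {
        Cache<String, String> cache = build();
        cache.put("k", "v");
        System.out.println(cache.getIfPresent("k")); // prints "v"
    }
}
```

The near-identical surface is what makes a mostly mechanical migration PR like the one above feasible; the benchmark question raised in the comments is about whether the swap pays off, not whether it is safe.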
[jira] [Commented] (SPARK-34239) Unify output of SHOW COLUMNS pass output attributes properly
[ https://issues.apache.org/jira/browse/SPARK-34239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280765#comment-17280765 ] Apache Spark commented on SPARK-34239: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/31518 > Unify output of SHOW COLUMNS pass output attributes properly > > > Key: SPARK-34239 > URL: https://issues.apache.org/jira/browse/SPARK-34239 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Fix For: 3.2.0 > > > Unify output of SHOW COLUMNS pass output attributes properly -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34238) Unify output of SHOW PARTITIONS pass output attributes properly
[ https://issues.apache.org/jira/browse/SPARK-34238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280763#comment-17280763 ] Apache Spark commented on SPARK-34238: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/31516 > Unify output of SHOW PARTITIONS pass output attributes properly > --- > > Key: SPARK-34238 > URL: https://issues.apache.org/jira/browse/SPARK-34238 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Fix For: 3.2.0 > > > Unify output of SHOW PARTITIONS -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34238) Unify output of SHOW PARTITIONS pass output attributes properly
[ https://issues.apache.org/jira/browse/SPARK-34238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280762#comment-17280762 ] Apache Spark commented on SPARK-34238: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/31516 > Unify output of SHOW PARTITIONS pass output attributes properly > --- > > Key: SPARK-34238 > URL: https://issues.apache.org/jira/browse/SPARK-34238 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Fix For: 3.2.0 > > > Unify output of SHOW PARTITIONS -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34344) Have functionality to trace back Spark SQL queries from the application ID that got submitted on YARN
[ https://issues.apache.org/jira/browse/SPARK-34344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpan Bhandari updated SPARK-34344: --- Description: We need to have Application Id from resource manager mapped to the specific spark sql query that got executed with respect to that application Id so that back tracing is possible. For example : if i run a query using spark shell : spark.sql("select dt.d_year,item.i_brand_id brand_id,item.i_brand brand,sum(ss_ext_sales_price) sum_agg from date_dim dt,store_sales,item where dt.d_date_sk = store_sales.ss_sold_date_sk and store_sales.ss_item_sk = item.i_item_sk and item.i_manufact_id = 436 and dt.d_moy=12 group by dt.d_year,item.i_brand,item.i_brand_id order by dt.d_year,sum_agg desc,brand_id limit 100").show(); When i see the event logs or the history server i don't see the query anywhere, but the query plan is there, so it becomes difficult to trace back what query actually got submitted. (if have to map it to the specific application Id on yarn) was: We need to have Application Id from resource manager mapped to the specific spark sql query that got executed with respect to that application Id so that back tracing is possible. For example : if i run a query using spark shell : spark.sql("select dt.d_year,item.i_brand_id brand_id,item.i_brand brand,sum(ss_ext_sales_price) sum_agg from date_dim dt,store_sales,item where dt.d_date_sk = store_sales.ss_sold_date_sk and store_sales.ss_item_sk = item.i_item_sk and item.i_manufact_id = 436 and dt.d_moy=12 group by dt.d_year,item.i_brand,item.i_brand_id order by dt.d_year,sum_agg desc,brand_id limit 100").show(); When i see the event logs or the history server i don't see the query anywhere, but the query plan is there, so it becomes difficult to trace back what query actually got submitted. 
> Have functionality to trace back Spark SQL queries from the application ID > that got submitted on YARN > - > > Key: SPARK-34344 > URL: https://issues.apache.org/jira/browse/SPARK-34344 > Project: Spark > Issue Type: New Feature > Components: Spark Shell, Spark Submit >Affects Versions: 1.6.3, 2.3.0, 2.4.5 >Reporter: Arpan Bhandari >Priority: Major > > We need to have Application Id from resource manager mapped to the specific > spark sql query that got executed with respect to that application Id so that > back tracing is possible. > For example : if i run a query using spark shell : > spark.sql("select dt.d_year,item.i_brand_id brand_id,item.i_brand > brand,sum(ss_ext_sales_price) sum_agg from date_dim dt,store_sales,item where > dt.d_date_sk = store_sales.ss_sold_date_sk and store_sales.ss_item_sk = > item.i_item_sk and item.i_manufact_id = 436 and dt.d_moy=12 group by > dt.d_year,item.i_brand,item.i_brand_id order by dt.d_year,sum_agg > desc,brand_id limit 100").show(); > When i see the event logs or the history server i don't see the query > anywhere, but the query plan is there, so it becomes difficult to trace back > what query actually got submitted. (if have to map it to the specific > application Id on yarn) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34344) Have functionality to trace back Spark SQL queries from the application ID that got submitted on YARN
[ https://issues.apache.org/jira/browse/SPARK-34344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280760#comment-17280760 ] Arpan Bhandari commented on SPARK-34344: [~hyukjin.kwon] : I have updated the diescription , let me know if u need more details on this. > Have functionality to trace back Spark SQL queries from the application ID > that got submitted on YARN > - > > Key: SPARK-34344 > URL: https://issues.apache.org/jira/browse/SPARK-34344 > Project: Spark > Issue Type: New Feature > Components: Spark Shell, Spark Submit >Affects Versions: 1.6.3, 2.3.0, 2.4.5 >Reporter: Arpan Bhandari >Priority: Major > > We need to have Application Id from resource manager mapped to the specific > spark sql query that got executed with respect to that application Id so that > back tracing is possible. > For example : if i run a query using spark shell : > spark.sql("select dt.d_year,item.i_brand_id brand_id,item.i_brand > brand,sum(ss_ext_sales_price) sum_agg from date_dim dt,store_sales,item where > dt.d_date_sk = store_sales.ss_sold_date_sk and store_sales.ss_item_sk = > item.i_item_sk and item.i_manufact_id = 436 and dt.d_moy=12 group by > dt.d_year,item.i_brand,item.i_brand_id order by dt.d_year,sum_agg > desc,brand_id limit 100").show(); > When i see the event logs or the history server i don't see the query > anywhere, but the query plan is there, so it becomes difficult to trace back > what query actually got submitted. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-34344) Have functionality to trace back Spark SQL queries from the application ID that got submitted on YARN
[ https://issues.apache.org/jira/browse/SPARK-34344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280760#comment-17280760 ] Arpan Bhandari edited comment on SPARK-34344 at 2/8/21, 5:37 AM: - [~hyukjin.kwon] : I have updated the description , let me know if u need more details on this. was (Author: arpan3189): [~hyukjin.kwon] : I have updated the diescription , let me know if u need more details on this. > Have functionality to trace back Spark SQL queries from the application ID > that got submitted on YARN > - > > Key: SPARK-34344 > URL: https://issues.apache.org/jira/browse/SPARK-34344 > Project: Spark > Issue Type: New Feature > Components: Spark Shell, Spark Submit >Affects Versions: 1.6.3, 2.3.0, 2.4.5 >Reporter: Arpan Bhandari >Priority: Major > > We need to have Application Id from resource manager mapped to the specific > spark sql query that got executed with respect to that application Id so that > back tracing is possible. > For example : if i run a query using spark shell : > spark.sql("select dt.d_year,item.i_brand_id brand_id,item.i_brand > brand,sum(ss_ext_sales_price) sum_agg from date_dim dt,store_sales,item where > dt.d_date_sk = store_sales.ss_sold_date_sk and store_sales.ss_item_sk = > item.i_item_sk and item.i_manufact_id = 436 and dt.d_moy=12 group by > dt.d_year,item.i_brand,item.i_brand_id order by dt.d_year,sum_agg > desc,brand_id limit 100").show(); > When i see the event logs or the history server i don't see the query > anywhere, but the query plan is there, so it becomes difficult to trace back > what query actually got submitted. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34344) Have functionality to trace back Spark SQL queries from the application ID that got submitted on YARN
[ https://issues.apache.org/jira/browse/SPARK-34344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpan Bhandari updated SPARK-34344: --- Description: We need to have Application Id from resource manager mapped to the specific spark sql query that got executed with respect to that application Id so that back tracing is possible. For example : if i run a query using spark shell : spark.sql("select dt.d_year,item.i_brand_id brand_id,item.i_brand brand,sum(ss_ext_sales_price) sum_agg from date_dim dt,store_sales,item where dt.d_date_sk = store_sales.ss_sold_date_sk and store_sales.ss_item_sk = item.i_item_sk and item.i_manufact_id = 436 and dt.d_moy=12 group by dt.d_year,item.i_brand,item.i_brand_id order by dt.d_year,sum_agg desc,brand_id limit 100").show(); When i see the event logs or the history server i don't see the query anywhere, but the query plan is there, so it becomes difficult to trace back what query actually got submitted. was:We need to have Application Id from resource manager mapped to the specific spark sql query that got executed with respect to that application Id so that back tracing is possible. > Have functionality to trace back Spark SQL queries from the application ID > that got submitted on YARN > - > > Key: SPARK-34344 > URL: https://issues.apache.org/jira/browse/SPARK-34344 > Project: Spark > Issue Type: New Feature > Components: Spark Shell, Spark Submit >Affects Versions: 1.6.3, 2.3.0, 2.4.5 >Reporter: Arpan Bhandari >Priority: Major > > We need to have Application Id from resource manager mapped to the specific > spark sql query that got executed with respect to that application Id so that > back tracing is possible. 
> For example : if i run a query using spark shell : > spark.sql("select dt.d_year,item.i_brand_id brand_id,item.i_brand > brand,sum(ss_ext_sales_price) sum_agg from date_dim dt,store_sales,item where > dt.d_date_sk = store_sales.ss_sold_date_sk and store_sales.ss_item_sk = > item.i_item_sk and item.i_manufact_id = 436 and dt.d_moy=12 group by > dt.d_year,item.i_brand,item.i_brand_id order by dt.d_year,sum_agg > desc,brand_id limit 100").show(); > When i see the event logs or the history server i don't see the query > anywhere, but the query plan is there, so it becomes difficult to trace back > what query actually got submitted. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34158) Incorrect url of the only developer Matei in pom.xml
[ https://issues.apache.org/jira/browse/SPARK-34158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-34158: Assignee: Kevin Su > Incorrect url of the only developer Matei in pom.xml > > > Key: SPARK-34158 > URL: https://issues.apache.org/jira/browse/SPARK-34158 > Project: Spark > Issue Type: Improvement > Components: Build, Spark Core >Affects Versions: 3.1.1 >Reporter: Jacek Laskowski >Assignee: Kevin Su >Priority: Minor > Fix For: 3.1.2 > > > {{[http://www.cs.berkeley.edu/~matei]}} in > [pom.xml|https://github.com/apache/spark/blob/53fe365edb948d0e05a5ccb62f349cd9fcb4bb5d/pom.xml#L51] > gives > {quote}Resource not found > The server has encountered a problem because the resource was not found. > Your request was : > [https://people.eecs.berkeley.edu/~matei] > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34158) Incorrect url of the only developer Matei in pom.xml
[ https://issues.apache.org/jira/browse/SPARK-34158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280747#comment-17280747 ] Kevin Su commented on SPARK-34158: -- Hi [~hyukjin.kwon], my Jira ID is pingsutw > Incorrect url of the only developer Matei in pom.xml > > > Key: SPARK-34158 > URL: https://issues.apache.org/jira/browse/SPARK-34158 > Project: Spark > Issue Type: Improvement > Components: Build, Spark Core >Affects Versions: 3.1.1 >Reporter: Jacek Laskowski >Priority: Minor > Fix For: 3.1.2 > > > {{[http://www.cs.berkeley.edu/~matei]}} in > [pom.xml|https://github.com/apache/spark/blob/53fe365edb948d0e05a5ccb62f349cd9fcb4bb5d/pom.xml#L51] > gives > {quote}Resource not found > The server has encountered a problem because the resource was not found. > Your request was : > [https://people.eecs.berkeley.edu/~matei] > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34399) Add file commit time to metrics and shown in SQL Tab UI
angerszhu created SPARK-34399: - Summary: Add file commit time to metrics and shown in SQL Tab UI Key: SPARK-34399 URL: https://issues.apache.org/jira/browse/SPARK-34399 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: angerszhu Add file commit time to metrics and shown in SQL Tab UI -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34399) Add file commit time to metrics and shown in SQL Tab UI
[ https://issues.apache.org/jira/browse/SPARK-34399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280726#comment-17280726 ] angerszhu commented on SPARK-34399: --- Working on it > Add file commit time to metrics and shown in SQL Tab UI > --- > > Key: SPARK-34399 > URL: https://issues.apache.org/jira/browse/SPARK-34399 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major > > Add file commit time to metrics and shown in SQL Tab UI -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-34392) Invalid ID for offset-based ZoneId since Spark 3.0
[ https://issues.apache.org/jira/browse/SPARK-34392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280724#comment-17280724 ] Yuming Wang edited comment on SPARK-34392 at 2/8/21, 2:42 AM: -- https://docs.oracle.com/javase/8/docs/api/java/time/ZoneId.html was (Author: q79969786): https://docs.oracle.com/javase/8/docs/api/java/time/ZoneId.htmlhttps://docs.oracle.com/javase/8/docs/api/java/time/ZoneId.html > Invalid ID for offset-based ZoneId since Spark 3.0 > -- > > Key: SPARK-34392 > URL: https://issues.apache.org/jira/browse/SPARK-34392 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.0.1 >Reporter: Yuming Wang >Priority: Major > > How to reproduce this issue: > {code:sql} > select to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00"); > {code} > Spark 2.4: > {noformat} > spark-sql> select to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00"); > 2020-02-07 08:00:00 > Time taken: 0.089 seconds, Fetched 1 row(s) > {noformat} > Spark 3.x: > {noformat} > spark-sql> select to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00"); > 21/02/07 01:24:32 ERROR SparkSQLDriver: Failed in [select > to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00")] > java.time.DateTimeException: Invalid ID for offset-based ZoneId: GMT+8:00 > at java.time.ZoneId.ofWithPrefix(ZoneId.java:437) > at java.time.ZoneId.of(ZoneId.java:407) > at java.time.ZoneId.of(ZoneId.java:359) > at java.time.ZoneId.of(ZoneId.java:315) > at > org.apache.spark.sql.catalyst.util.DateTimeUtils$.getZoneId(DateTimeUtils.scala:53) > at > org.apache.spark.sql.catalyst.util.DateTimeUtils$.toUTCTime(DateTimeUtils.scala:814) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34392) Invalid ID for offset-based ZoneId since Spark 3.0
[ https://issues.apache.org/jira/browse/SPARK-34392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280724#comment-17280724 ] Yuming Wang commented on SPARK-34392: - https://docs.oracle.com/javase/8/docs/api/java/time/ZoneId.htmlhttps://docs.oracle.com/javase/8/docs/api/java/time/ZoneId.html > Invalid ID for offset-based ZoneId since Spark 3.0 > -- > > Key: SPARK-34392 > URL: https://issues.apache.org/jira/browse/SPARK-34392 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.0.1 >Reporter: Yuming Wang >Priority: Major > > How to reproduce this issue: > {code:sql} > select to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00"); > {code} > Spark 2.4: > {noformat} > spark-sql> select to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00"); > 2020-02-07 08:00:00 > Time taken: 0.089 seconds, Fetched 1 row(s) > {noformat} > Spark 3.x: > {noformat} > spark-sql> select to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00"); > 21/02/07 01:24:32 ERROR SparkSQLDriver: Failed in [select > to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00")] > java.time.DateTimeException: Invalid ID for offset-based ZoneId: GMT+8:00 > at java.time.ZoneId.ofWithPrefix(ZoneId.java:437) > at java.time.ZoneId.of(ZoneId.java:407) > at java.time.ZoneId.of(ZoneId.java:359) > at java.time.ZoneId.of(ZoneId.java:315) > at > org.apache.spark.sql.catalyst.util.DateTimeUtils$.getZoneId(DateTimeUtils.scala:53) > at > org.apache.spark.sql.catalyst.util.DateTimeUtils$.toUTCTime(DateTimeUtils.scala:814) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
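The regression above comes down to the JDK API change underneath: Spark 2.4 resolved time zone strings through the lenient legacy `java.util.TimeZone`, while Spark 3.x goes through the strict `java.time.ZoneId`, which rejects single-digit hours after a `GMT` prefix. A minimal JDK-only sketch (class name invented for illustration; this is not Spark code):

```java
import java.time.DateTimeException;
import java.time.ZoneId;
import java.util.TimeZone;

public class ZoneIdStrictness {
    public static void main(String[] args) {
        // Legacy API (the Spark 2.4 path) is lenient: custom GMT IDs may use
        // single-digit hours, so "GMT+8:00" resolves to an 8-hour offset.
        System.out.println(TimeZone.getTimeZone("GMT+8:00").getRawOffset()); // 28800000 ms

        // java.time (the Spark 3.x path) is strict: after the "GMT" prefix the
        // remainder must parse as a ZoneOffset, and "+8:00" does not.
        try {
            ZoneId.of("GMT+8:00");
        } catch (DateTimeException e) {
            System.out.println("rejected: " + e.getMessage());
        }

        // The zero-padded form is accepted by both APIs.
        System.out.println(ZoneId.of("GMT+08:00")); // GMT+08:00
    }
}
```

So queries using `"GMT+08:00"` (zero-padded) keep working across both Spark versions, which suggests the workaround for the reported query.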
[jira] [Updated] (SPARK-34395) Clean up unused code for code simplifications
[ https://issues.apache.org/jira/browse/SPARK-34395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-34395: - Priority: Trivial (was: Minor) > Clean up unused code for code simplifications > - > > Key: SPARK-34395 > URL: https://issues.apache.org/jira/browse/SPARK-34395 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0, 3.2.0 >Reporter: yikf >Priority: Trivial > > Currently, we pass the default value `EmptyRow` to method `checkEvaluation` > in the StringExpressionsSuite, but the default value of the 'checkEvaluation' > method parameter is the `emptyRow`. > We can clean the parameter for Code Simplifications. > > example: > *before:* > {code:java} > def testConcat(inputs: String*): Unit = { > val expected = if (inputs.contains(null)) null else inputs.mkString > checkEvaluation(Concat(inputs.map(Literal.create(_, StringType))), > expected, EmptyRow) > }{code} > *after:* > {code:java} > def testConcat(inputs: String*): Unit = { > val expected = if (inputs.contains(null)) null else inputs.mkString > checkEvaluation(Concat(inputs.map(Literal.create(_, StringType))), expected) > }{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34374) Use standard methods to extract keys or values from a Map.
[ https://issues.apache.org/jira/browse/SPARK-34374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-34374: - Priority: Trivial (was: Minor) > Use standard methods to extract keys or values from a Map. > -- > > Key: SPARK-34374 > URL: https://issues.apache.org/jira/browse/SPARK-34374 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yang Jie >Priority: Trivial > > For keys: > *before* > {code:scala} > map.map(_._1) > {code} > *after* > {code:java} > map.keys > {code} > For values: > {code:scala} > map.map(_._2) > {code} > *after* > {code:java} > map.values > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
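The same cleanup has a direct plain-JDK analogue: rebuilding the key set by projecting over entries when the map already exposes a `keySet()` view. A small illustrative sketch (class and variable names invented; not Spark code):

```java
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class MapExtraction {
    public static void main(String[] args) {
        Map<String, Integer> m = Map.of("a", 1, "b", 2);

        // Roundabout: map over entries to rebuild the key set
        // (the Java analogue of Scala's map.map(_._1)).
        Set<String> viaStream = m.entrySet().stream()
                .map(Map.Entry::getKey)
                .collect(Collectors.toSet());

        // Idiomatic: the collection view already exists.
        Set<String> viaView = m.keySet();

        System.out.println(viaStream.equals(viaView)); // true
    }
}
```

Besides readability, the standard views avoid materializing a new collection, which is the same argument made for `map.keys`/`map.values` in Scala.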
[jira] [Commented] (SPARK-34346) io.file.buffer.size set by spark.buffer.size will override by hive-site.xml may cause perf regression
[ https://issues.apache.org/jira/browse/SPARK-34346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280700#comment-17280700 ] Apache Spark commented on SPARK-34346: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/31515 > io.file.buffer.size set by spark.buffer.size will override by hive-site.xml > may cause perf regression > - > > Key: SPARK-34346 > URL: https://issues.apache.org/jira/browse/SPARK-34346 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.0.2, 3.1.1 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Blocker > Fix For: 3.0.2, 3.1.1 > > > In many real-world cases, when interacting with the hive catalog through Spark > SQL, users may just share the `hive-site.xml` for their hive jobs and make a > copy to `SPARK_HOME`/conf w/o modification. In Spark, when we generate Hadoop > configurations, we will use `spark.buffer.size(65536)` to reset > `io.file.buffer.size(4096)`. But when we load the hive-site.xml, we may > ignore this behavior and reset `io.file.buffer.size` again according to > `hive-site.xml`. > 1. The configuration priority for setting Hadoop and Hive config here is not > right, while literally, the order should be `spark > spark.hive > > spark.hadoop > hive > hadoop` > 2. This breaks the `spark.buffer.size` config's behavior for tuning the IO > performance w/ HDFS if there is an existing `io.file.buffer.size` in > hive-site.xml -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
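The precedence order stated in the report (`spark > spark.hive > spark.hadoop > hive > hadoop`) amounts to a first-match lookup over ordered config layers. The following is a hypothetical sketch of that resolution rule, not Spark's actual implementation:

```java
import java.util.List;
import java.util.Map;

public class ConfigPrecedence {
    // First layer that defines the key wins; layers are ordered
    // highest-priority first.
    static String resolve(String key, List<Map<String, String>> layers) {
        for (Map<String, String> layer : layers) {
            String v = layer.get(key);
            if (v != null) return v;
        }
        return null;
    }

    public static void main(String[] args) {
        // Values taken from the report: Spark sets 65536 via
        // spark.buffer.size; hive-site.xml and Hadoop default to 4096.
        Map<String, String> sparkLayer = Map.of("io.file.buffer.size", "65536");
        Map<String, String> hiveSite = Map.of("io.file.buffer.size", "4096");
        Map<String, String> hadoopDefaults = Map.of("io.file.buffer.size", "4096");

        // Intended order: spark-level settings beat hive-site.xml, which
        // beats Hadoop defaults. The bug was applying hive-site.xml last.
        String v = resolve("io.file.buffer.size",
                List.of(sparkLayer, hiveSite, hadoopDefaults));
        System.out.println(v); // 65536
    }
}
```

The reported bug is equivalent to evaluating the layers in the wrong order, so the hive-site.xml value silently wins over the tuned Spark setting.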
[jira] [Resolved] (SPARK-34158) Incorrect url of the only developer Matei in pom.xml
[ https://issues.apache.org/jira/browse/SPARK-34158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-34158. -- Fix Version/s: 3.1.2 Resolution: Fixed Issue resolved by pull request 31512 [https://github.com/apache/spark/pull/31512] > Incorrect url of the only developer Matei in pom.xml > > > Key: SPARK-34158 > URL: https://issues.apache.org/jira/browse/SPARK-34158 > Project: Spark > Issue Type: Improvement > Components: Build, Spark Core >Affects Versions: 3.1.1 >Reporter: Jacek Laskowski >Priority: Minor > Fix For: 3.1.2 > > > {{[http://www.cs.berkeley.edu/~matei]}} in > [pom.xml|https://github.com/apache/spark/blob/53fe365edb948d0e05a5ccb62f349cd9fcb4bb5d/pom.xml#L51] > gives > {quote}Resource not found > The server has encountered a problem because the resource was not found. > Your request was : > [https://people.eecs.berkeley.edu/~matei] > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34398) Missing pyspark 3.1.1 doc migration
[ https://issues.apache.org/jira/browse/SPARK-34398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-34398: Assignee: raphael auv > Missing pyspark 3.1.1 doc migration > --- > > Key: SPARK-34398 > URL: https://issues.apache.org/jira/browse/SPARK-34398 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 3.1.1 >Reporter: raphael auv >Assignee: raphael auv >Priority: Major > > The link on the page > [https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-docs/_site/pyspark-migration-guide.html] > is not working. > Also, there is nothing covering the 3.0 to 3.1.1 migration on this page > https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-docs/_site/api/python/migration_guide/index.html -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-34398) Missing pyspark 3.1.1 doc migration
[ https://issues.apache.org/jira/browse/SPARK-34398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-34398. -- Fix Version/s: 3.1.2 Resolution: Fixed Issue resolved by pull request 31514 [https://github.com/apache/spark/pull/31514] > Missing pyspark 3.1.1 doc migration > --- > > Key: SPARK-34398 > URL: https://issues.apache.org/jira/browse/SPARK-34398 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 3.1.1 >Reporter: raphael auv >Assignee: raphael auv >Priority: Major > Fix For: 3.1.2 > > > The link on the page > [https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-docs/_site/pyspark-migration-guide.html] > is not working. > Also, there is nothing covering the 3.0 to 3.1.1 migration on this page > https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-docs/_site/api/python/migration_guide/index.html -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34398) Missing pyspark 3.1.1 doc migration
[ https://issues.apache.org/jira/browse/SPARK-34398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-34398: - Priority: Major (was: Blocker) > Missing pyspark 3.1.1 doc migration > --- > > Key: SPARK-34398 > URL: https://issues.apache.org/jira/browse/SPARK-34398 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 3.1.1 >Reporter: raphael auv >Priority: Major > > The link on the page > [https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-docs/_site/pyspark-migration-guide.html] > is not working. > Also, there is nothing covering the 3.0 to 3.1.1 migration on this page > https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-docs/_site/api/python/migration_guide/index.html -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34349) No python3 in docker images
[ https://issues.apache.org/jira/browse/SPARK-34349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280694#comment-17280694 ] Hyukjin Kwon commented on SPARK-34349: -- [~chethanuk] does it work if you set {{spark.kubernetes.pyspark.pythonVersion}} to {{3}}? > No python3 in docker images > > > Key: SPARK-34349 > URL: https://issues.apache.org/jira/browse/SPARK-34349 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.0.1 >Reporter: Tim Hughes >Priority: Critical > > The spark-py container image doesn't receive the instruction to use python3 > and defaults to python 2.7 > > The worker container was built using the following commands > {code:java} > mkdir ./tmp > wget -qO- > https://www.mirrorservice.org/sites/ftp.apache.org/spark/spark-3.0.1/spark-3.0.1-bin-hadoop3.2.tgz > | tar -C ./tmp/ -xzf - > cd ../spark-3.0.1-bin-hadoop3.2/ > ./bin/docker-image-tool.sh -r docker.io/timhughes -t > spark-3.0.1-bin-hadoop3.2 -p > kubernetes/dockerfiles/spark/bindings/python/Dockerfile build > docker push docker.io/timhughes/spark-py:spark-3.0.1-bin-hadoop3.2{code} > > This is the code I am using to initialize the workers > > {code:java} > import os > from pyspark import SparkContext, SparkConf > from pyspark.sql import SparkSession > # Create Spark config for our Kubernetes > based cluster manager > sparkConf = SparkConf() > sparkConf.setMaster("k8s://https://kubernetes.default.svc.cluster.local:443") > sparkConf.setAppName("spark") > sparkConf.set("spark.kubernetes.container.image", > "docker.io/timhughes/spark-py:spark-3.0.1-bin-hadoop3.2") > sparkConf.set("spark.kubernetes.namespace", "spark") > sparkConf.set("spark.executor.instances", "2") > sparkConf.set("spark.executor.cores", "1") > sparkConf.set("spark.driver.memory", "1024m") > sparkConf.set("spark.executor.memory", "1024m") > sparkConf.set("spark.kubernetes.pyspark.pythonVersion", "3") > 
sparkConf.set("spark.kubernetes.authenticate.driver.serviceAccountName", > "spark") > sparkConf.set("spark.kubernetes.authenticate.serviceAccountName", "spark") > sparkConf.set("spark.driver.port", "29413") > sparkConf.set("spark.driver.host", > "my-notebook-deployment.spark.svc.cluster.local") > # Initialize our Spark cluster, this will actually > # generate the worker nodes. > spark = SparkSession.builder.config(conf=sparkConf).getOrCreate() > sc = spark.sparkContext > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34398) Missing pyspark 3.1.1 doc migration
[ https://issues.apache.org/jira/browse/SPARK-34398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280691#comment-17280691 ] Apache Spark commented on SPARK-34398: -- User 'raphaelauv' has created a pull request for this issue: https://github.com/apache/spark/pull/31514 > Missing pyspark 3.1.1 doc migration > --- > > Key: SPARK-34398 > URL: https://issues.apache.org/jira/browse/SPARK-34398 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 3.1.1 >Reporter: raphael auv >Priority: Blocker > > The link on the page > [https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-docs/_site/pyspark-migration-guide.html] > is not working. > Also, there is nothing covering the 3.0 to 3.1.1 migration on this page > https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-docs/_site/api/python/migration_guide/index.html -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34398) Missing pyspark 3.1.1 doc migration
[ https://issues.apache.org/jira/browse/SPARK-34398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34398: Assignee: (was: Apache Spark) > Missing pyspark 3.1.1 doc migration > --- > > Key: SPARK-34398 > URL: https://issues.apache.org/jira/browse/SPARK-34398 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 3.1.1 >Reporter: raphael auv >Priority: Blocker > > The link on the page > [https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-docs/_site/pyspark-migration-guide.html] > is not working. > Also, there is nothing covering the 3.0 to 3.1.1 migration on this page > https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-docs/_site/api/python/migration_guide/index.html -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34398) Missing pyspark 3.1.1 doc migration
[ https://issues.apache.org/jira/browse/SPARK-34398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34398: Assignee: Apache Spark > Missing pyspark 3.1.1 doc migration > --- > > Key: SPARK-34398 > URL: https://issues.apache.org/jira/browse/SPARK-34398 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 3.1.1 >Reporter: raphael auv >Assignee: Apache Spark >Priority: Blocker > > The link on the page > [https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-docs/_site/pyspark-migration-guide.html] > is not working. > Also, there is nothing covering the 3.0 to 3.1.1 migration on this page > https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-docs/_site/api/python/migration_guide/index.html -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34398) Missing pyspark 3.1.1 doc migration
raphael auv created SPARK-34398: --- Summary: Missing pyspark 3.1.1 doc migration Key: SPARK-34398 URL: https://issues.apache.org/jira/browse/SPARK-34398 Project: Spark Issue Type: Improvement Components: Documentation Affects Versions: 3.1.1 Reporter: raphael auv The link on the page [https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-docs/_site/pyspark-migration-guide.html] is not working. Also, there is nothing covering the 3.0 to 3.1.1 migration on this page https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-docs/_site/api/python/migration_guide/index.html -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34361) Dynamic allocation on K8s kills executors with running tasks
[ https://issues.apache.org/jira/browse/SPARK-34361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280636#comment-17280636 ] Apache Spark commented on SPARK-34361: -- User 'attilapiros' has created a pull request for this issue: https://github.com/apache/spark/pull/31513 > Dynamic allocation on K8s kills executors with running tasks > > > Key: SPARK-34361 > URL: https://issues.apache.org/jira/browse/SPARK-34361 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.1.0, 3.2.0, 3.1.1, 3.1.2 >Reporter: Attila Zsolt Piros >Priority: Major > > There is a race between the executor POD allocator and the cluster scheduler backend. > During downscaling (in dynamic allocation) we experienced a lot of killed new > executors with running tasks on them. > The pattern in the log is the following: > {noformat} > 21/02/01 15:12:03 INFO ExecutorMonitor: New executor 312 has registered (new > total is 138) > ... > 21/02/01 15:12:03 INFO TaskSetManager: Starting task 247.0 in stage 4.0 (TID > 2079, 100.100.18.138, executor 312, partition 247, PROCESS_LOCAL, 8777 bytes) > 21/02/01 15:12:03 INFO ExecutorPodsAllocator: Deleting 3 excess pod requests > (408,312,307). > ... > 21/02/01 15:12:04 ERROR TaskSchedulerImpl: Lost executor 312 on > 100.100.18.138: The executor with id 312 was deleted by a user or the > framework. > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34361) Dynamic allocation on K8s kills executors with running tasks
[ https://issues.apache.org/jira/browse/SPARK-34361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34361: Assignee: (was: Apache Spark) > Dynamic allocation on K8s kills executors with running tasks > > > Key: SPARK-34361 > URL: https://issues.apache.org/jira/browse/SPARK-34361 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.1.0, 3.2.0, 3.1.1, 3.1.2 >Reporter: Attila Zsolt Piros >Priority: Major > > There is a race between the executor POD allocator and the cluster scheduler backend. > During downscaling (in dynamic allocation) we experienced a lot of killed new > executors with running tasks on them. > The pattern in the log is the following: > {noformat} > 21/02/01 15:12:03 INFO ExecutorMonitor: New executor 312 has registered (new > total is 138) > ... > 21/02/01 15:12:03 INFO TaskSetManager: Starting task 247.0 in stage 4.0 (TID > 2079, 100.100.18.138, executor 312, partition 247, PROCESS_LOCAL, 8777 bytes) > 21/02/01 15:12:03 INFO ExecutorPodsAllocator: Deleting 3 excess pod requests > (408,312,307). > ... > 21/02/01 15:12:04 ERROR TaskSchedulerImpl: Lost executor 312 on > 100.100.18.138: The executor with id 312 was deleted by a user or the > framework. > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34361) Dynamic allocation on K8s kills executors with running tasks
[ https://issues.apache.org/jira/browse/SPARK-34361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34361: Assignee: Apache Spark > Dynamic allocation on K8s kills executors with running tasks > > > Key: SPARK-34361 > URL: https://issues.apache.org/jira/browse/SPARK-34361 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.1.0, 3.2.0, 3.1.1, 3.1.2 >Reporter: Attila Zsolt Piros >Assignee: Apache Spark >Priority: Major > > There is a race between the executor POD allocator and the cluster scheduler backend. > During downscaling (in dynamic allocation) we experienced a lot of killed new > executors with running tasks on them. > The pattern in the log is the following: > {noformat} > 21/02/01 15:12:03 INFO ExecutorMonitor: New executor 312 has registered (new > total is 138) > ... > 21/02/01 15:12:03 INFO TaskSetManager: Starting task 247.0 in stage 4.0 (TID > 2079, 100.100.18.138, executor 312, partition 247, PROCESS_LOCAL, 8777 bytes) > 21/02/01 15:12:03 INFO ExecutorPodsAllocator: Deleting 3 excess pod requests > (408,312,307). > ... > 21/02/01 15:12:04 ERROR TaskSchedulerImpl: Lost executor 312 on > 100.100.18.138: The executor with id 312 was deleted by a user or the > framework. > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
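The log pattern in the report can be condensed into a toy model of the race: the allocator snapshots its "excess" pod requests, but an executor from that snapshot may register with the scheduler backend (and receive a task) before the delete happens. The following is a hypothetical sketch of that timeline, not Spark's actual ExecutorPodsAllocator code:

```java
import java.util.HashSet;
import java.util.Set;

public class ExcessPodRace {
    public static void main(String[] args) {
        // Snapshot of pod requests the allocator still believes are pending
        // (IDs taken from the reported log line).
        Set<Integer> excess = new HashSet<>(Set.of(408, 312, 307));

        // Meanwhile, executor 312 registers with the scheduler backend and is
        // handed task 247.0, so the allocator's snapshot is now stale.
        Set<Integer> registered = Set.of(312);

        // Deleting the stale snapshot as-is kills a busy executor (the bug).
        // A safer allocator re-checks registration before deleting:
        excess.removeAll(registered);
        System.out.println(excess); // only 408 and 307 are deleted
    }
}
```

The fix direction implied by the report is exactly this re-check: reconcile the pending-request set against the scheduler backend's view before issuing deletions.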
[jira] [Commented] (SPARK-34349) No python3 in docker images
[ https://issues.apache.org/jira/browse/SPARK-34349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280634#comment-17280634 ] Chethan UK commented on SPARK-34349: Yes, this is fixed. I have tested with 3.1.1 > No python3 in docker images > > > Key: SPARK-34349 > URL: https://issues.apache.org/jira/browse/SPARK-34349 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.0.1 >Reporter: Tim Hughes >Priority: Critical > > The spark-py container image doesn't receive the instruction to use python3 > and defaults to python 2.7 > > The worker container was built using the following commands > {code:java} > mkdir ./tmp > wget -qO- > https://www.mirrorservice.org/sites/ftp.apache.org/spark/spark-3.0.1/spark-3.0.1-bin-hadoop3.2.tgz > | tar -C ./tmp/ -xzf - > cd ../spark-3.0.1-bin-hadoop3.2/ > ./bin/docker-image-tool.sh -r docker.io/timhughes -t > spark-3.0.1-bin-hadoop3.2 -p > kubernetes/dockerfiles/spark/bindings/python/Dockerfile build > docker push docker.io/timhughes/spark-py:spark-3.0.1-bin-hadoop3.2{code} > > This is the code I am using to initialize the workers > > {code:java} > import os > from pyspark import SparkContext, SparkConf > from pyspark.sql import SparkSession > # Create Spark config for our Kubernetes > based cluster manager > sparkConf = SparkConf() > sparkConf.setMaster("k8s://https://kubernetes.default.svc.cluster.local:443") > sparkConf.setAppName("spark") > sparkConf.set("spark.kubernetes.container.image", > "docker.io/timhughes/spark-py:spark-3.0.1-bin-hadoop3.2") > sparkConf.set("spark.kubernetes.namespace", "spark") > sparkConf.set("spark.executor.instances", "2") > sparkConf.set("spark.executor.cores", "1") > sparkConf.set("spark.driver.memory", "1024m") > sparkConf.set("spark.executor.memory", "1024m") > sparkConf.set("spark.kubernetes.pyspark.pythonVersion", "3") > sparkConf.set("spark.kubernetes.authenticate.driver.serviceAccountName", > "spark") > 
sparkConf.set("spark.kubernetes.authenticate.serviceAccountName", "spark") > sparkConf.set("spark.driver.port", "29413") > sparkConf.set("spark.driver.host", > "my-notebook-deployment.spark.svc.cluster.local") > # Initialize our Spark cluster, this will actually > # generate the worker nodes. > spark = SparkSession.builder.config(conf=sparkConf).getOrCreate() > sc = spark.sparkContext > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33603) Group exception messages in execution/command
[ https://issues.apache.org/jira/browse/SPARK-33603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280610#comment-17280610 ] Kevin Su commented on SPARK-33603: -- I'm working on it. > Group exception messages in execution/command > - > > Key: SPARK-33603 > URL: https://issues.apache.org/jira/browse/SPARK-33603 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Priority: Major > > '/core/src/main/scala/org/apache/spark/sql/execution/command' > || Filename || Count || > | AnalyzeColumnCommand.scala | 3 | > | AnalyzePartitionCommand.scala | 2 | > | AnalyzeTableCommand.scala | 1 | > | SetCommand.scala | 2 | > | createDataSourceTables.scala | 2 | > | ddl.scala | 1 | > | functions.scala | 4 | > | tables.scala | 7 | > | views.scala | 3 | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34158) Incorrect url of the only developer Matei in pom.xml
[ https://issues.apache.org/jira/browse/SPARK-34158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34158: Assignee: (was: Apache Spark) > Incorrect url of the only developer Matei in pom.xml > > > Key: SPARK-34158 > URL: https://issues.apache.org/jira/browse/SPARK-34158 > Project: Spark > Issue Type: Improvement > Components: Build, Spark Core >Affects Versions: 3.1.1 >Reporter: Jacek Laskowski >Priority: Minor > > {{[http://www.cs.berkeley.edu/~matei]}} in > [pom.xml|https://github.com/apache/spark/blob/53fe365edb948d0e05a5ccb62f349cd9fcb4bb5d/pom.xml#L51] > gives > {quote}Resource not found > The server has encountered a problem because the resource was not found. > Your request was : > [https://people.eecs.berkeley.edu/~matei] > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34158) Incorrect url of the only developer Matei in pom.xml
[ https://issues.apache.org/jira/browse/SPARK-34158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280597#comment-17280597 ] Apache Spark commented on SPARK-34158: -- User 'pingsutw' has created a pull request for this issue: https://github.com/apache/spark/pull/31512 > Incorrect url of the only developer Matei in pom.xml > > > Key: SPARK-34158 > URL: https://issues.apache.org/jira/browse/SPARK-34158 > Project: Spark > Issue Type: Improvement > Components: Build, Spark Core >Affects Versions: 3.1.1 >Reporter: Jacek Laskowski >Priority: Minor > > {{[http://www.cs.berkeley.edu/~matei]}} in > [pom.xml|https://github.com/apache/spark/blob/53fe365edb948d0e05a5ccb62f349cd9fcb4bb5d/pom.xml#L51] > gives > {quote}Resource not found > The server has encountered a problem because the resource was not found. > Your request was : > [https://people.eecs.berkeley.edu/~matei] > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34158) Incorrect url of the only developer Matei in pom.xml
[ https://issues.apache.org/jira/browse/SPARK-34158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34158: Assignee: Apache Spark > Incorrect url of the only developer Matei in pom.xml > > > Key: SPARK-34158 > URL: https://issues.apache.org/jira/browse/SPARK-34158 > Project: Spark > Issue Type: Improvement > Components: Build, Spark Core >Affects Versions: 3.1.1 >Reporter: Jacek Laskowski >Assignee: Apache Spark >Priority: Minor > > {{[http://www.cs.berkeley.edu/~matei]}} in > [pom.xml|https://github.com/apache/spark/blob/53fe365edb948d0e05a5ccb62f349cd9fcb4bb5d/pom.xml#L51] > gives > {quote}Resource not found > The server has encountered a problem because the resource was not found. > Your request was : > [https://people.eecs.berkeley.edu/~matei] > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34397) Support v2 `MSCK REPAIR TABLE`
Maxim Gekk created SPARK-34397: -- Summary: Support v2 `MSCK REPAIR TABLE` Key: SPARK-34397 URL: https://issues.apache.org/jira/browse/SPARK-34397 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Implement the `MSCK REPAIR TABLE` command for tables from v2 catalogs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28330) ANSI SQL: Top-level &lt;result offset clause&gt; in &lt;query expression&gt;
[ https://issues.apache.org/jira/browse/SPARK-28330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280504#comment-17280504 ] ShengJun Zheng commented on SPARK-28330: Any progress? > ANSI SQL: Top-level &lt;result offset clause&gt; in &lt;query expression&gt; > > > Key: SPARK-28330 > URL: https://issues.apache.org/jira/browse/SPARK-28330 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > > h2. {{LIMIT}} and {{OFFSET}} > LIMIT and OFFSET allow you to retrieve just a portion of the rows that are > generated by the rest of the query: > {noformat} > SELECT select_list > FROM table_expression > [ ORDER BY ... ] > [ LIMIT { number | ALL } ] [ OFFSET number ] > {noformat} > If a limit count is given, no more than that many rows will be returned (but > possibly fewer, if the query itself yields fewer rows). LIMIT ALL is the same > as omitting the LIMIT clause, as is LIMIT with a NULL argument. > OFFSET says to skip that many rows before beginning to return rows. OFFSET 0 > is the same as omitting the OFFSET clause, as is OFFSET with a NULL argument. > If both OFFSET and LIMIT appear, then OFFSET rows are skipped before starting > to count the LIMIT rows that are returned. > https://www.postgresql.org/docs/11/queries-limit.html > *Feature ID*: F861 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
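The LIMIT/OFFSET semantics quoted in the issue description can be checked with a small, standalone sketch. This uses Python's built-in sqlite3 (whose LIMIT/OFFSET behavior matches the rules above); the table and column names are illustrative only:

```python
import sqlite3

# In-memory database with ten rows, 0..9, to exercise LIMIT/OFFSET.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10)])

# LIMIT caps the number of returned rows (possibly fewer if the query yields fewer).
limited = conn.execute("SELECT n FROM t ORDER BY n LIMIT 3").fetchall()
print(limited)  # [(0,), (1,), (2,)]

# OFFSET skips rows *before* counting toward LIMIT: skip 4, then return 3.
paged = conn.execute("SELECT n FROM t ORDER BY n LIMIT 3 OFFSET 4").fetchall()
print(paged)  # [(4,), (5,), (6,)]

# OFFSET 0 is the same as omitting the OFFSET clause entirely.
assert conn.execute("SELECT n FROM t ORDER BY n LIMIT 3 OFFSET 0").fetchall() == limited
```

Note the ORDER BY: without it, which rows survive LIMIT/OFFSET is unspecified, which is why pagination queries should always sort.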
[jira] [Commented] (SPARK-34395) Clean up unused code for code simplifications
[ https://issues.apache.org/jira/browse/SPARK-34395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280493#comment-17280493 ] Apache Spark commented on SPARK-34395: -- User 'yikf' has created a pull request for this issue: https://github.com/apache/spark/pull/31510 > Clean up unused code for code simplifications > - > > Key: SPARK-34395 > URL: https://issues.apache.org/jira/browse/SPARK-34395 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0, 3.2.0 >Reporter: yikf >Priority: Minor > > Currently, we pass the default value `EmptyRow` to method `checkEvaluation` > in the StringExpressionsSuite, but the default value of the 'checkEvaluation' > method parameter is the `emptyRow`. > We can clean the parameter for Code Simplifications. > > example: > *before:* > {code:java} > def testConcat(inputs: String*): Unit = { > val expected = if (inputs.contains(null)) null else inputs.mkString > checkEvaluation(Concat(inputs.map(Literal.create(_, StringType))), > expected, EmptyRow) > }{code} > *after:* > {code:java} > def testConcat(inputs: String*): Unit = { > val expected = if (inputs.contains(null)) null else inputs.mkString > checkEvaluation(Concat(inputs.map(Literal.create(_, StringType))), expected) > }{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
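The cleanup described above boils down to one rule: don't pass an argument that merely restates the parameter's default. A minimal Python analogue of the pattern (the helper and names here are illustrative, not Spark's actual `checkEvaluation` API):

```python
EMPTY_ROW = ()  # stand-in for the suite's default input row

def check_evaluation(expression, expected, input_row=EMPTY_ROW):
    """Evaluate `expression` against `input_row` and compare with `expected`."""
    result = expression(input_row)
    assert result == expected, f"got {result!r}, expected {expected!r}"

# A trivial expression standing in for Concat(...) over literals.
concat = lambda row: "ab"

# Before: the default is spelled out explicitly, adding noise at every call site.
check_evaluation(concat, "ab", EMPTY_ROW)

# After: omitting the argument is equivalent, because EMPTY_ROW *is* the default.
check_evaluation(concat, "ab")
```

Both calls are behaviorally identical, which is why the explicit `EmptyRow` argument in the Scala suite can be dropped without changing what the test checks.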
[jira] [Assigned] (SPARK-34395) Clean up unused code for code simplifications
[ https://issues.apache.org/jira/browse/SPARK-34395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34395: Assignee: (was: Apache Spark) > Clean up unused code for code simplifications > - > > Key: SPARK-34395 > URL: https://issues.apache.org/jira/browse/SPARK-34395 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0, 3.2.0 >Reporter: yikf >Priority: Minor > > Currently, we pass the default value `EmptyRow` to method `checkEvaluation` > in the StringExpressionsSuite, but the default value of the 'checkEvaluation' > method parameter is the `emptyRow`. > We can clean the parameter for Code Simplifications. > > example: > *before:* > {code:java} > def testConcat(inputs: String*): Unit = { > val expected = if (inputs.contains(null)) null else inputs.mkString > checkEvaluation(Concat(inputs.map(Literal.create(_, StringType))), > expected, EmptyRow) > }{code} > *after:* > {code:java} > def testConcat(inputs: String*): Unit = { > val expected = if (inputs.contains(null)) null else inputs.mkString > checkEvaluation(Concat(inputs.map(Literal.create(_, StringType))), expected) > }{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34395) Clean up unused code for code simplifications
[ https://issues.apache.org/jira/browse/SPARK-34395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34395: Assignee: Apache Spark > Clean up unused code for code simplifications > - > > Key: SPARK-34395 > URL: https://issues.apache.org/jira/browse/SPARK-34395 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0, 3.2.0 >Reporter: yikf >Assignee: Apache Spark >Priority: Minor > > Currently, we pass the default value `EmptyRow` to method `checkEvaluation` > in the StringExpressionsSuite, but the default value of the 'checkEvaluation' > method parameter is the `emptyRow`. > We can clean the parameter for Code Simplifications. > > example: > *before:* > {code:java} > def testConcat(inputs: String*): Unit = { > val expected = if (inputs.contains(null)) null else inputs.mkString > checkEvaluation(Concat(inputs.map(Literal.create(_, StringType))), > expected, EmptyRow) > }{code} > *after:* > {code:java} > def testConcat(inputs: String*): Unit = { > val expected = if (inputs.contains(null)) null else inputs.mkString > checkEvaluation(Concat(inputs.map(Literal.create(_, StringType))), expected) > }{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34396) Add a new built-in function delegate
[ https://issues.apache.org/jira/browse/SPARK-34396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280487#comment-17280487 ] Apache Spark commented on SPARK-34396: -- User 'ulysses-you' has created a pull request for this issue: https://github.com/apache/spark/pull/31509 > Add a new build-in function delegate > > > Key: SPARK-34396 > URL: https://issues.apache.org/jira/browse/SPARK-34396 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: ulysses you >Priority: Minor > > Delegate is just like a big box that can include some other expressions. It > will execute all of children and return the last child result as its result. > > The origin idea is from debug. Debug SQL is hard since SQL is always quite > long and complex. This new function can help debug with inject some help > functions, e.g., `println, sleep, raise_error`. > > Two usage examples: > {code:java} > -- raw sql > INSERT INTO TABLE t1 > SELECT coalesce(c1, c2) as c FROM t2 > > -- print the column data > INSERT INTO TABLE t1 > SELECT delegate( > java_method('scala.Console', 'println', concat('c1: ', c1, ', c2: ', c2)), > coalesce(c1, c2) > ) as c FROM t2{code} > > {code:java} > -- raw sql > SELECT if(spark_partition_id() = 1, c1, raise_error('test error')) FROM t2 > > -- add a sleep time before throw error > SELECT if(spark_partition_id() = 1, c1, delegate( > java_method('java.lang.Thread', 'sleep', 3000l), > raise_error('test error') > )) FROM t2{code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
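The proposed semantics — evaluate every child expression in order, then return only the last child's value — can be sketched outside Spark. This is a hypothetical Python model of the behavior, not the actual implementation; children are passed as zero-argument callables so their side effects (printing, sleeping) run in sequence, mirroring the SQL examples:

```python
def delegate(*children):
    """Evaluate each child thunk in order; return the last child's result."""
    result = None
    for child in children:
        result = child()
    return result

# First child exists only for its side effect (like the java_method println
# in the SQL example); the last child supplies the actual value.
value = delegate(
    lambda: print("debug: inspecting row"),  # side effect, result discarded
    lambda: 42,                              # stand-in for coalesce(c1, c2)
)
assert value == 42
```

This also makes the debugging use case clear: wrapping an expression in `delegate` leaves the query's result unchanged while letting arbitrary helper calls run first.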
[jira] [Assigned] (SPARK-34396) Add a new built-in function delegate
[ https://issues.apache.org/jira/browse/SPARK-34396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34396: Assignee: Apache Spark > Add a new build-in function delegate > > > Key: SPARK-34396 > URL: https://issues.apache.org/jira/browse/SPARK-34396 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: ulysses you >Assignee: Apache Spark >Priority: Minor > > Delegate is just like a big box that can include some other expressions. It > will execute all of children and return the last child result as its result. > > The origin idea is from debug. Debug SQL is hard since SQL is always quite > long and complex. This new function can help debug with inject some help > functions, e.g., `println, sleep, raise_error`. > > Two usage examples: > {code:java} > -- raw sql > INSERT INTO TABLE t1 > SELECT coalesce(c1, c2) as c FROM t2 > > -- print the column data > INSERT INTO TABLE t1 > SELECT delegate( > java_method('scala.Console', 'println', concat('c1: ', c1, ', c2: ', c2)), > coalesce(c1, c2) > ) as c FROM t2{code} > > {code:java} > -- raw sql > SELECT if(spark_partition_id() = 1, c1, raise_error('test error')) FROM t2 > > -- add a sleep time before throw error > SELECT if(spark_partition_id() = 1, c1, delegate( > java_method('java.lang.Thread', 'sleep', 3000l), > raise_error('test error') > )) FROM t2{code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34396) Add a new built-in function delegate
[ https://issues.apache.org/jira/browse/SPARK-34396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280485#comment-17280485 ] Apache Spark commented on SPARK-34396: -- User 'ulysses-you' has created a pull request for this issue: https://github.com/apache/spark/pull/31509 > Add a new build-in function delegate > > > Key: SPARK-34396 > URL: https://issues.apache.org/jira/browse/SPARK-34396 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: ulysses you >Priority: Minor > > Delegate is just like a big box that can include some other expressions. It > will execute all of children and return the last child result as its result. > > The origin idea is from debug. Debug SQL is hard since SQL is always quite > long and complex. This new function can help debug with inject some help > functions, e.g., `println, sleep, raise_error`. > > Two usage examples: > {code:java} > -- raw sql > INSERT INTO TABLE t1 > SELECT coalesce(c1, c2) as c FROM t2 > > -- print the column data > INSERT INTO TABLE t1 > SELECT delegate( > java_method('scala.Console', 'println', concat('c1: ', c1, ', c2: ', c2)), > coalesce(c1, c2) > ) as c FROM t2{code} > > {code:java} > -- raw sql > SELECT if(spark_partition_id() = 1, c1, raise_error('test error')) FROM t2 > > -- add a sleep time before throw error > SELECT if(spark_partition_id() = 1, c1, delegate( > java_method('java.lang.Thread', 'sleep', 3000l), > raise_error('test error') > )) FROM t2{code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34396) Add a new built-in function delegate
[ https://issues.apache.org/jira/browse/SPARK-34396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34396: Assignee: (was: Apache Spark) > Add a new build-in function delegate > > > Key: SPARK-34396 > URL: https://issues.apache.org/jira/browse/SPARK-34396 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: ulysses you >Priority: Minor > > Delegate is just like a big box that can include some other expressions. It > will execute all of children and return the last child result as its result. > > The origin idea is from debug. Debug SQL is hard since SQL is always quite > long and complex. This new function can help debug with inject some help > functions, e.g., `println, sleep, raise_error`. > > Two usage examples: > {code:java} > -- raw sql > INSERT INTO TABLE t1 > SELECT coalesce(c1, c2) as c FROM t2 > > -- print the column data > INSERT INTO TABLE t1 > SELECT delegate( > java_method('scala.Console', 'println', concat('c1: ', c1, ', c2: ', c2)), > coalesce(c1, c2) > ) as c FROM t2{code} > > {code:java} > -- raw sql > SELECT if(spark_partition_id() = 1, c1, raise_error('test error')) FROM t2 > > -- add a sleep time before throw error > SELECT if(spark_partition_id() = 1, c1, delegate( > java_method('java.lang.Thread', 'sleep', 3000l), > raise_error('test error') > )) FROM t2{code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34396) Add a new built-in function delegate
[ https://issues.apache.org/jira/browse/SPARK-34396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ulysses you updated SPARK-34396: Description: Delegate is just like a big box that can include some other expressions. It will execute all of children and return the last child result as its result. The origin idea is from debug. Debug SQL is hard since SQL is always quite long and complex. This new function can help debug with inject some help functions, e.g., `println, sleep, raise_error`. Two usage examples: {code:java} -- raw sql INSERT INTO TABLE t1 SELECT coalesce(c1, c2) as c FROM t2 -- print the column data INSERT INTO TABLE t1 SELECT delegate( java_method('scala.Console', 'println', concat('c1: ', c1, ', c2: ', c2)), coalesce(c1, c2) ) as c FROM t2{code} {code:java} -- raw sql SELECT if(spark_partition_id() = 1, c1, raise_error('test error')) FROM t2 -- add a sleep time before throw error SELECT if(spark_partition_id() = 1, c1, delegate( java_method('java.lang.Thread', 'sleep', 3000l), raise_error('test error') )) FROM t2{code} was: Delegate is just like a big box that can include some other expressions. It will execute all of children and return the last child result as its result. The origin idea is from debug. Debug SQL is hard since SQL is always quite long and complex. This new function can help debug with inject some help functions, e.g., `println, sleep, raise_error`. 
Two usage examples: {code:java} -- raw sql INSERT INTO TABLE t1 SELECT coalesce(c1, c2) as c FROM t2 -- print the column data INSERT INTO TABLE t1 SELECT delegate( java_method('scala.Console', 'println', concat('c1: ', c1, ', c2: ', c2)), coalesce(c1, c2) ) as c FROM t2{code} {code:java} -- raw sql SELECT if(spark_partition_id() = 1, c1, raise_error('test error')) FROM t2 -- add a sleep time before throw error SELECT if(spark_partition_id() = 1, c1, delegate( java_method('java.lang.Thread', 'sleep', 3000l), raise_error('test error') )) FROM t2{code} > Add a new build-in function delegate > > > Key: SPARK-34396 > URL: https://issues.apache.org/jira/browse/SPARK-34396 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: ulysses you >Priority: Minor > > Delegate is just like a big box that can include some other expressions. It > will execute all of children and return the last child result as its result. > > The origin idea is from debug. Debug SQL is hard since SQL is always quite > long and complex. This new function can help debug with inject some help > functions, e.g., `println, sleep, raise_error`. > > Two usage examples: > {code:java} > -- raw sql > INSERT INTO TABLE t1 > SELECT coalesce(c1, c2) as c FROM t2 > > -- print the column data > INSERT INTO TABLE t1 > SELECT delegate( > java_method('scala.Console', 'println', concat('c1: ', c1, ', c2: ', c2)), > coalesce(c1, c2) > ) as c FROM t2{code} > > {code:java} > -- raw sql > SELECT if(spark_partition_id() = 1, c1, raise_error('test error')) FROM t2 > > -- add a sleep time before throw error > SELECT if(spark_partition_id() = 1, c1, delegate( > java_method('java.lang.Thread', 'sleep', 3000l), > raise_error('test error') > )) FROM t2{code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34396) Add a new built-in function delegate
[ https://issues.apache.org/jira/browse/SPARK-34396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ulysses you updated SPARK-34396: Description: Delegate is just like a big box that can include some other expressions. It will execute all of children and return the last child result as its result. The origin idea is from debug. Debug SQL is hard since SQL is always quite long and complex. This new function can help debug with inject some help functions, e.g., `println, sleep, raise_error`. Two usage examples: {code:java} -- raw sql INSERT INTO TABLE t1 SELECT coalesce(c1, c2) as c FROM t2 -- print the column data INSERT INTO TABLE t1 SELECT delegate( java_method('scala.Console', 'println', concat('c1: ', c1, ', c2: ', c2)), coalesce(c1, c2) ) as c FROM t2{code} {code:java} -- raw sql SELECT if(spark_partition_id() = 1, c1, raise_error('test error')) FROM t2 -- add a sleep time before throw error SELECT if(spark_partition_id() = 1, c1, delegate( java_method('java.lang.Thread', 'sleep', 3000l), raise_error('test error') )) FROM t2{code} was: Delegate is just like a big box that can include some other expressions. It will execute all of children and return the last child result as its result. The origin idea is from debug. Debug SQL is hard since SQL is always quite long and complex. This new function can help debug with inject some help functions, e.g., `println, sleep, raise_error`. 
Two usage examples: {code:java} -- raw sql INSERT INTO TABLE t1 SELECT coalesce(c1, c2) as c FROM t2 -- print the column data INSERT INTO TABLE t1 SELECT delegate( java_method('scala.Console', 'println', concat('c1: ', c1, ', c2: ', c2)), coalesce(c1, c2) ) as c FROM t2{code} {code:java} -- raw sql SELECT if(spark_partition_id() = 1, c1, raise_error('test error')) FROM t2 -- add a sleep time before throw error SELECT if(spark_partition_id() = 1, c1, delegate( java_method('java.lang.Thread', 'sleep', 3000l), raise_error('test error') )) FROM t2{code} > Add a new build-in function delegate > > > Key: SPARK-34396 > URL: https://issues.apache.org/jira/browse/SPARK-34396 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: ulysses you >Priority: Minor > > Delegate is just like a big box that can include some other expressions. It > will execute all of children and return the last child result as its result. > > The origin idea is from debug. Debug SQL is hard since SQL is always quite > long and complex. This new function can help debug with inject some help > functions, e.g., `println, sleep, raise_error`. > > Two usage examples: > {code:java} > -- raw sql > INSERT INTO TABLE t1 > SELECT coalesce(c1, c2) as c FROM t2 > > -- print the column data > INSERT INTO TABLE t1 > SELECT delegate( > java_method('scala.Console', 'println', concat('c1: ', c1, ', c2: ', c2)), > coalesce(c1, c2) > ) as c FROM t2{code} > > > {code:java} > -- raw sql > SELECT if(spark_partition_id() = 1, c1, raise_error('test error')) FROM t2 > > -- add a sleep time before throw error > SELECT if(spark_partition_id() = 1, c1, delegate( > java_method('java.lang.Thread', 'sleep', 3000l), > raise_error('test error') > )) FROM t2{code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34396) Add a new built-in function delegate
ulysses you created SPARK-34396: --- Summary: Add a new build-in function delegate Key: SPARK-34396 URL: https://issues.apache.org/jira/browse/SPARK-34396 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: ulysses you Delegate is just like a big box that can include some other expressions. It will execute all of children and return the last child result as its result. The origin idea is from debug. Debug SQL is hard since SQL is always quite long and complex. This new function can help debug with inject some help functions, e.g., `println, sleep, raise_error`. Two usage examples: {code:java} -- raw sql INSERT INTO TABLE t1 SELECT coalesce(c1, c2) as c FROM t2 -- print the column data INSERT INTO TABLE t1 SELECT delegate( java_method('scala.Console', 'println', concat('c1: ', c1, ', c2: ', c2)), coalesce(c1, c2) ) as c FROM t2{code} {code:java} -- raw sql SELECT if(spark_partition_id() = 1, c1, raise_error('test error')) FROM t2 -- add a sleep time before throw error SELECT if(spark_partition_id() = 1, c1, delegate( java_method('java.lang.Thread', 'sleep', 3000l), raise_error('test error') )) FROM t2{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34395) Clean up unused code for code simplifications
[ https://issues.apache.org/jira/browse/SPARK-34395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yikf updated SPARK-34395: - Description: Currently, we pass the default value `EmptyRow` to method `checkEvaluation` in the StringExpressionsSuite, but the default value of the 'checkEvaluation' method parameter is the `emptyRow`. We can clean the parameter for Code Simplifications. example: *before:* {code:java} def testConcat(inputs: String*): Unit = { val expected = if (inputs.contains(null)) null else inputs.mkString checkEvaluation(Concat(inputs.map(Literal.create(_, StringType))), expected, EmptyRow) }{code} *after:* {code:java} def testConcat(inputs: String*): Unit = { val expected = if (inputs.contains(null)) null else inputs.mkString checkEvaluation(Concat(inputs.map(Literal.create(_, StringType))), expected) }{code} was: Currently, we pass the default value `EmptyRow` to method `checkEvaluation` in the StringExpressionsSuite, but the default value of the 'checkEvaluation' method parameter is the `emptyRow`. We can clean the parameter for Code Simplifications. 
example: *before:* {code:java} def testConcat(inputs: String*): Unit = { val expected = if (inputs.contains(null)) null else inputs.mkString checkEvaluation(Concat(inputs.map(Literal.create(_, StringType))), expected, EmptyRow) }{code} *after:* {code:java} def testConcat(inputs: String*): Unit = { val expected = if (inputs.contains(null)) null else inputs.mkString checkEvaluation(Concat(inputs.map(Literal.create(_, StringType))), expected) }{code} > Clean up unused code for code simplifications > - > > Key: SPARK-34395 > URL: https://issues.apache.org/jira/browse/SPARK-34395 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0, 3.2.0 >Reporter: yikf >Priority: Minor > > Currently, we pass the default value `EmptyRow` to method `checkEvaluation` > in the StringExpressionsSuite, but the default value of the 'checkEvaluation' > method parameter is the `emptyRow`. > We can clean the parameter for Code Simplifications. > > example: > *before:* > {code:java} > def testConcat(inputs: String*): Unit = { > val expected = if (inputs.contains(null)) null else inputs.mkString > checkEvaluation(Concat(inputs.map(Literal.create(_, StringType))), > expected, EmptyRow) > }{code} > *after:* > {code:java} > def testConcat(inputs: String*): Unit = { > val expected = if (inputs.contains(null)) null else inputs.mkString > checkEvaluation(Concat(inputs.map(Literal.create(_, StringType))), expected) > }{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34395) Clean up unused code for code simplifications
[ https://issues.apache.org/jira/browse/SPARK-34395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yikf updated SPARK-34395: - Description: Currently, we pass the default value `EmptyRow` to method `checkEvaluation` in the StringExpressionsSuite, but the default value of the 'checkEvaluation' method parameter is the `emptyRow`. We can clean the parameter for Code Simplifications. example: *before:* {code:java} def testConcat(inputs: String*): Unit = { val expected = if (inputs.contains(null)) null else inputs.mkString checkEvaluation(Concat(inputs.map(Literal.create(_, StringType))), expected, EmptyRow) }{code} *after:* {code:java} def testConcat(inputs: String*): Unit = { val expected = if (inputs.contains(null)) null else inputs.mkString checkEvaluation(Concat(inputs.map(Literal.create(_, StringType))), expected) }{code} was: Currently, we pass the default value `EmptyRow` to method `checkEvaluation` in the StringExpressionsSuite, but the default value of the 'checkEvaluation' method parameter is the `emptyRow`. We can clean the parameter for Code Simplifications. example: *before:* *after:* > Clean up unused code for code simplifications > - > > Key: SPARK-34395 > URL: https://issues.apache.org/jira/browse/SPARK-34395 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0, 3.2.0 >Reporter: yikf >Priority: Minor > > Currently, we pass the default value `EmptyRow` to method `checkEvaluation` > in the StringExpressionsSuite, but the default value of the 'checkEvaluation' > method parameter is the `emptyRow`. > We can clean the parameter for Code Simplifications. 
> > example: > *before:* > {code:java} > def testConcat(inputs: String*): Unit = { > val expected = if (inputs.contains(null)) null else inputs.mkString > checkEvaluation(Concat(inputs.map(Literal.create(_, StringType))), > expected, EmptyRow) > }{code} > *after:* > > {code:java} > def testConcat(inputs: String*): Unit = { > val expected = if (inputs.contains(null)) null else inputs.mkString > checkEvaluation(Concat(inputs.map(Literal.create(_, StringType))), expected) > }{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34395) Clean up unused code for code simplifications
yikf created SPARK-34395: Summary: Clean up unused code for code simplifications Key: SPARK-34395 URL: https://issues.apache.org/jira/browse/SPARK-34395 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0, 3.2.0 Reporter: yikf Currently, we pass the default value `EmptyRow` to method `checkEvaluation` in the StringExpressionsSuite, but the default value of the 'checkEvaluation' method parameter is the `emptyRow`. We can clean the parameter for Code Simplifications. example: *before:* *after:* -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34394) Unify output of SHOW FUNCTIONS and pass output attributes properly
[ https://issues.apache.org/jira/browse/SPARK-34394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280456#comment-17280456 ] jiaan.geng commented on SPARK-34394: I'm working on it. > Unify output of SHOW FUNCTIONS and pass output attributes properly > -- > > Key: SPARK-34394 > URL: https://issues.apache.org/jira/browse/SPARK-34394 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34394) Unify output of SHOW FUNCTIONS and pass output attributes properly
jiaan.geng created SPARK-34394: -- Summary: Unify output of SHOW FUNCTIONS and pass output attributes properly Key: SPARK-34394 URL: https://issues.apache.org/jira/browse/SPARK-34394 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: jiaan.geng -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34393) Unify output of SHOW VIEWS and pass output attributes properly
[ https://issues.apache.org/jira/browse/SPARK-34393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280444#comment-17280444 ] Apache Spark commented on SPARK-34393: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/31508 > Unify output of SHOW VIEWS and pass output attributes properly > -- > > Key: SPARK-34393 > URL: https://issues.apache.org/jira/browse/SPARK-34393 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-34393) Unify output of SHOW VIEWS and pass output attributes properly
[ https://issues.apache.org/jira/browse/SPARK-34393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-34393: --- Comment: was deleted (was: I'm working on.) > Unify output of SHOW VIEWS and pass output attributes properly > -- > > Key: SPARK-34393 > URL: https://issues.apache.org/jira/browse/SPARK-34393 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34393) Unify output of SHOW VIEWS and pass output attributes properly
[ https://issues.apache.org/jira/browse/SPARK-34393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34393: Assignee: (was: Apache Spark) > Unify output of SHOW VIEWS and pass output attributes properly > -- > > Key: SPARK-34393 > URL: https://issues.apache.org/jira/browse/SPARK-34393 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34393) Unify output of SHOW VIEWS and pass output attributes properly
[ https://issues.apache.org/jira/browse/SPARK-34393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-34393:
------------------------------------

    Assignee: Apache Spark

> Unify output of SHOW VIEWS and pass output attributes properly
> --------------------------------------------------------------
>
>                 Key: SPARK-34393
>                 URL: https://issues.apache.org/jira/browse/SPARK-34393
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: jiaan.geng
>            Assignee: Apache Spark
>            Priority: Major
>
[jira] [Commented] (SPARK-34393) Unify output of SHOW VIEWS and pass output attributes properly
[ https://issues.apache.org/jira/browse/SPARK-34393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280439#comment-17280439 ]

jiaan.geng commented on SPARK-34393:
------------------------------------

I'm working on.

> Unify output of SHOW VIEWS and pass output attributes properly
> --------------------------------------------------------------
>
>                 Key: SPARK-34393
>                 URL: https://issues.apache.org/jira/browse/SPARK-34393
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: jiaan.geng
>            Priority: Major
>
[jira] [Created] (SPARK-34393) Unify output of SHOW VIEWS and pass output attributes properly
jiaan.geng created SPARK-34393:
-----------------------------------

             Summary: Unify output of SHOW VIEWS and pass output attributes properly
                 Key: SPARK-34393
                 URL: https://issues.apache.org/jira/browse/SPARK-34393
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 3.2.0
            Reporter: jiaan.geng
[jira] [Updated] (SPARK-33979) Filter predicate reorder
[ https://issues.apache.org/jira/browse/SPARK-33979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-33979:
--------------------------------
    Description: 
Reorder filter predicate to improve query performance:
{noformat}
others < In < Like < UDF/CaseWhen/If < Inset < LikeAny/LikeAll
{noformat}
[https://www.ibm.com/support/knowledgecenter/SSSHTQ_8.1.0/com.ibm.netcool_OMNIbus.doc_8.1.0/omnibus/wip/admin/reference/omn_adm_per_optimizationrules.html#omn_adm_per_optimizationrules__reorder]
[https://docs.oracle.com/en/database/oracle/oracle-database/21/addci/extensible-optimizer-interface.html#GUID-28A4EDA6-19DD-4773-B3B8-1802C3B01E21]
[https://docs.oracle.com/cd/B10501_01/server.920/a96533/hintsref.htm#13676]
https://issues.apache.org/jira/browse/HIVE-21857

  was:
Reorder filter predicate to improve query performance:
{noformat}
others < In < Like < UDF/CaseWhen/If < Inset < LikeAny/LikeAll
{noformat}
https://www.ibm.com/support/knowledgecenter/SSSHTQ_8.1.0/com.ibm.netcool_OMNIbus.doc_8.1.0/omnibus/wip/admin/reference/omn_adm_per_optimizationrules.html#omn_adm_per_optimizationrules__reorder
https://docs.oracle.com/en/database/oracle/oracle-database/21/addci/extensible-optimizer-interface.html#GUID-28A4EDA6-19DD-4773-B3B8-1802C3B01E21
https://docs.oracle.com/cd/B10501_01/server.920/a96533/hintsref.htm#13676

> Filter predicate reorder
> ------------------------
>
>                 Key: SPARK-33979
>                 URL: https://issues.apache.org/jira/browse/SPARK-33979
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> Reorder filter predicate to improve query performance:
> {noformat}
> others < In < Like < UDF/CaseWhen/If < Inset < LikeAny/LikeAll
> {noformat}
> [https://www.ibm.com/support/knowledgecenter/SSSHTQ_8.1.0/com.ibm.netcool_OMNIbus.doc_8.1.0/omnibus/wip/admin/reference/omn_adm_per_optimizationrules.html#omn_adm_per_optimizationrules__reorder]
> [https://docs.oracle.com/en/database/oracle/oracle-database/21/addci/extensible-optimizer-interface.html#GUID-28A4EDA6-19DD-4773-B3B8-1802C3B01E21]
> [https://docs.oracle.com/cd/B10501_01/server.920/a96533/hintsref.htm#13676]
> https://issues.apache.org/jira/browse/HIVE-21857
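The idea behind SPARK-33979 is that within a conjunction, predicates of cheaper kinds should be evaluated before expensive ones. A minimal Python sketch of that cost-ranked reordering, with an illustrative rank table and predicate tags (not Spark's actual optimizer code):

```python
# Cheaper predicate kinds get a lower rank and are evaluated first,
# following the ordering proposed in the ticket:
# others < In < Like < UDF/CaseWhen/If < Inset < LikeAny/LikeAll
COST_RANK = {
    "other": 0,
    "In": 1,
    "Like": 2,
    "UDF": 3, "CaseWhen": 3, "If": 3,
    "InSet": 4,
    "LikeAny": 5, "LikeAll": 5,
}

def reorder_predicates(predicates):
    """Stable-sort conjunctive predicates so cheaper kinds run first.

    Each predicate is modeled as a (kind, expression_text) pair; a stable
    sort preserves the original order among predicates of equal cost.
    """
    return sorted(predicates, key=lambda p: COST_RANK.get(p[0], 0))

preds = [
    ("LikeAny", "c LIKE ANY ('a%', 'b%')"),
    ("other", "a > 1"),
    ("UDF", "f(b) = 0"),
]
# reorder_predicates(preds) puts "a > 1" first and the LIKE ANY last.
```

A real implementation would also have to prove the reordered predicates are side-effect free (deterministic, non-failing), since short-circuit evaluation changes which expressions execute.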
[jira] [Created] (SPARK-34392) Invalid ID for offset-based ZoneId since Spark 3.0
Yuming Wang created SPARK-34392:
-----------------------------------

             Summary: Invalid ID for offset-based ZoneId since Spark 3.0
                 Key: SPARK-34392
                 URL: https://issues.apache.org/jira/browse/SPARK-34392
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.0.1, 3.0.0
            Reporter: Yuming Wang

How to reproduce this issue:
{code:sql}
select to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00");
{code}

Spark 2.4:
{noformat}
spark-sql> select to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00");
2020-02-07 08:00:00
Time taken: 0.089 seconds, Fetched 1 row(s)
{noformat}

Spark 3.x:
{noformat}
spark-sql> select to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00");
21/02/07 01:24:32 ERROR SparkSQLDriver: Failed in [select to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00")]
java.time.DateTimeException: Invalid ID for offset-based ZoneId: GMT+8:00
	at java.time.ZoneId.ofWithPrefix(ZoneId.java:437)
	at java.time.ZoneId.of(ZoneId.java:407)
	at java.time.ZoneId.of(ZoneId.java:359)
	at java.time.ZoneId.of(ZoneId.java:315)
	at org.apache.spark.sql.catalyst.util.DateTimeUtils$.getZoneId(DateTimeUtils.scala:53)
	at org.apache.spark.sql.catalyst.util.DateTimeUtils$.toUTCTime(DateTimeUtils.scala:814)
{noformat}
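The regression happens because Spark 3.x parses time zones through java.time.ZoneId, which rejects one-digit hours in offset IDs ("GMT+8:00"), while the old java.util.TimeZone accepted them. One user-side workaround (not Spark's eventual fix) is to pad such offsets before handing them to Spark; a hedged Python sketch of that normalization, where the function name and accepted prefixes are illustrative assumptions:

```python
import re

def normalize_gmt_offset(tz):
    """Pad a one-digit hour offset like 'GMT+8:00' to 'GMT+08:00'.

    java.time.ZoneId.of requires two-digit hours after a GMT/UTC/UT
    prefix; any string that does not match that shape (named zones,
    already-padded offsets) is returned unchanged.
    """
    m = re.fullmatch(r"(GMT|UTC|UT)([+-])(\d)(:\d{2})?", tz)
    if m:
        prefix, sign, hour, minutes = m.groups()
        return f"{prefix}{sign}0{hour}{minutes or ':00'}"
    return tz
```

With this applied, `normalize_gmt_offset("GMT+8:00")` yields `"GMT+08:00"`, which both Spark 2.4 and 3.x accept, while region IDs such as `"Asia/Shanghai"` pass through untouched.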