[jira] [Created] (SPARK-24170) [Spark SQL] json file format is not dropped after dropping table
ABHISHEK KUMAR GUPTA created SPARK-24170:

Summary: [Spark SQL] json file format is not dropped after dropping table
Key: SPARK-24170
URL: https://issues.apache.org/jira/browse/SPARK-24170
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.3.0
Environment: OS: SUSE 11, Spark Version: 2.3
Reporter: ABHISHEK KUMAR GUPTA

Steps:
# Launch spark-sql --master yarn
# create table json(name STRING, age int, gender string, id INT) using org.apache.spark.sql.json options(path "hdfs:///user/testdemo/");
# Execute the SQL query below:
INSERT into json SELECT 'Shaan',21,'Male',1 UNION ALL SELECT 'Xing',20,'Female',11 UNION ALL SELECT 'Mile',4,'Female',20 UNION ALL SELECT 'Malan',10,'Male',9;

The JSON part files below are created under the path:

BLR123111:/opt/Antsecure/install/hadoop/namenode/bin # ./hdfs dfs -ls /user/testdemo
Found 14 items
-rw-r--r-- 3 spark hadoop 0 2018-04-26 17:44 /user/testdemo/_SUCCESS
-rw-r--r-- 3 spark hadoop 4802 2018-04-24 18:20 /user/testdemo/customer1.csv
-rw-r--r-- 3 spark hadoop 92 2018-04-26 17:02 /user/testdemo/json1.txt
-rw-r--r-- 3 spark hadoop 49 2018-04-26 17:32 /user/testdemo/part-0-4311f66b-ba1b-4a4d-a289-1a211f27f653-c000.json
-rw-r--r-- 3 spark hadoop 49 2018-04-26 17:44 /user/testdemo/part-0-b8a8e16a-91a8-48ec-9998-2d741c52cf5a-c000.json
-rw-r--r-- 3 spark hadoop 51 2018-04-26 17:32 /user/testdemo/part-1-4311f66b-ba1b-4a4d-a289-1a211f27f653-c000.json
-rw-r--r-- 3 spark hadoop 51 2018-04-26 17:44 /user/testdemo/part-1-b8a8e16a-91a8-48ec-9998-2d741c52cf5a-c000.json
-rw-r--r-- 3 spark hadoop 50 2018-04-26 17:32 /user/testdemo/part-2-4311f66b-ba1b-4a4d-a289-1a211f27f653-c000.json
-rw-r--r-- 3 spark hadoop 50 2018-04-26 17:44 /user/testdemo/part-2-b8a8e16a-91a8-48ec-9998-2d741c52cf5a-c000.json
-rw-r--r-- 3 spark hadoop 49 2018-04-26 17:32 /user/testdemo/part-3-4311f66b-ba1b-4a4d-a289-1a211f27f653-c000.json
-rw-r--r-- 3 spark hadoop 49 2018-04-26 17:44 /user/testdemo/part-3-b8a8e16a-91a8-48ec-9998-2d741c52cf5a-c000.json

Issue: now execute the drop command below:

spark-sql> drop table json;

The table is dropped successfully, but the JSON files are still present under /user/testdemo.

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
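A quick way to triage reports like this one is to check how the catalog classifies the table before dropping it: a table created with an explicit path option is typically registered as EXTERNAL, and DROP TABLE on an external table removes only the metadata, leaving the data files in place. The sketch below is a minimal illustration under that assumption (it presumes a running SparkSession named spark and the table name json from the steps above); it is not part of the original report.

{code:scala}
// Hypothetical triage sketch (not from the report): inspect how the catalog
// classifies the table before dropping it.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("spark-24170-triage").getOrCreate()

// Public catalog API: tableType is "MANAGED", "EXTERNAL" or "VIEW".
// DROP TABLE deletes the backing files only for MANAGED tables.
val table = spark.catalog.getTable("json")
println(s"table type = ${table.tableType}")

// The same information in SQL form; look at the "Type" and "Location" rows.
spark.sql("DESCRIBE FORMATTED json").show(100, false)
{code}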
[jira] [Comment Edited] (SPARK-24152) SparkR CRAN feasibility check server problem
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461983#comment-16461983 ] Felix Cheung edited comment on SPARK-24152 at 5/3/18 6:29 AM: -- ok good. in the event this reoccurs persistently, option 1: * since we have NO_TESTS, we could remove --as-cran from this line [https://github.com/apache/spark/blob/master/R/check-cran.sh#L54] (temporarily) option 2: - we could set _R_CHECK_CRAN_INCOMING_ to "FALSE" in the environment to disable this check, check_CRAN_incoming() (see [http://mtweb.cs.ucl.ac.uk/mus/bin/install_R/R-3.1.1/src/library/tools/R/check.R] was (Author: felixcheung): ok good. in the event this reoccurs persistently, option 1: * since we have NO_TESTS, we could remove --as-cran from this line [https://github.com/apache/spark/blob/master/R/check-cran.sh#L54] (temporarily) option 2: - we could set _R_CHECK_CRAN_INCOMING_ to "FALSE" in the environment to disable this check check_CRAN_incoming() (see [http://mtweb.cs.ucl.ac.uk/mus/bin/install_R/R-3.1.1/src/library/tools/R/check.R] > SparkR CRAN feasibility check server problem > > > Key: SPARK-24152 > URL: https://issues.apache.org/jira/browse/SPARK-24152 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Dongjoon Hyun >Assignee: Liang-Chi Hsieh >Priority: Critical > > PR builder and master branch test fails with the following SparkR error with > unknown reason. The following is an error message from that. > {code} > * this is package 'SparkR' version '2.4.0' > * checking CRAN incoming feasibility ...Error in > .check_package_CRAN_incoming(pkgdir) : > dims [product 24] do not match the length of object [0] > Execution halted > {code} > *PR BUILDER* > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89998/ > *MASTER BRANCH* > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/4458/ > (Fail with no failures) > This is critical because we already start to merge the PR by ignoring this > **known unkonwn** SparkR failure. > - https://github.com/apache/spark/pull/21175 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24152) SparkR CRAN feasibility check server problem
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung updated SPARK-24152: - Summary: SparkR CRAN feasibility check server problem (was: Flaky Test: SparkR) > SparkR CRAN feasibility check server problem > > > Key: SPARK-24152 > URL: https://issues.apache.org/jira/browse/SPARK-24152 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Dongjoon Hyun >Assignee: Liang-Chi Hsieh >Priority: Critical > > PR builder and master branch test fails with the following SparkR error with > unknown reason. The following is an error message from that. > {code} > * this is package 'SparkR' version '2.4.0' > * checking CRAN incoming feasibility ...Error in > .check_package_CRAN_incoming(pkgdir) : > dims [product 24] do not match the length of object [0] > Execution halted > {code} > *PR BUILDER* > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89998/ > *MASTER BRANCH* > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/4458/ > (Fail with no failures) > This is critical because we already start to merge the PR by ignoring this > **known unkonwn** SparkR failure. > - https://github.com/apache/spark/pull/21175 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24152) SparkR CRAN feasibility check server problem
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461993#comment-16461993 ] Felix Cheung commented on SPARK-24152: -- (I updated the bug title - it's not really flaky..) > SparkR CRAN feasibility check server problem > > > Key: SPARK-24152 > URL: https://issues.apache.org/jira/browse/SPARK-24152 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Dongjoon Hyun >Assignee: Liang-Chi Hsieh >Priority: Critical > > PR builder and master branch test fails with the following SparkR error with > unknown reason. The following is an error message from that. > {code} > * this is package 'SparkR' version '2.4.0' > * checking CRAN incoming feasibility ...Error in > .check_package_CRAN_incoming(pkgdir) : > dims [product 24] do not match the length of object [0] > Execution halted > {code} > *PR BUILDER* > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89998/ > *MASTER BRANCH* > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/4458/ > (Fail with no failures) > This is critical because we already start to merge the PR by ignoring this > **known unkonwn** SparkR failure. > - https://github.com/apache/spark/pull/21175 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-24152) Flaky Test: SparkR
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461983#comment-16461983 ] Felix Cheung edited comment on SPARK-24152 at 5/3/18 6:26 AM: -- ok good. in the event this reoccurs persistently, option 1: * since we have NO_TESTS, we could remove --as-cran from this line [https://github.com/apache/spark/blob/master/R/check-cran.sh#L54] (temporarily) option 2: - we could set _R_CHECK_CRAN_INCOMING_ to "FALSE" in the environment to disable this check check_CRAN_incoming() (see [http://mtweb.cs.ucl.ac.uk/mus/bin/install_R/R-3.1.1/src/library/tools/R/check.R|http://mtweb.cs.ucl.ac.uk/mus/bin/install_R/R-3.1.1/src/library/tools/R/check.R)] was (Author: felixcheung): ok good. in the event this reoccurs persistently, option 1: * since we have NO_TESTS, we could remove --as-cran from this line [https://github.com/apache/spark/blob/master/R/check-cran.sh#L54] (temporarily) option 2: - we could set _R_CHECK_CRAN_INCOMING_ to "FALSE" in the environment to disable this check check_CRAN_incoming() (see [http://mtweb.cs.ucl.ac.uk/mus/bin/install_R/R-3.1.1/src/library/tools/R/check.R)] > Flaky Test: SparkR > -- > > Key: SPARK-24152 > URL: https://issues.apache.org/jira/browse/SPARK-24152 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Dongjoon Hyun >Assignee: Liang-Chi Hsieh >Priority: Critical > > PR builder and master branch test fails with the following SparkR error with > unknown reason. The following is an error message from that. > {code} > * this is package 'SparkR' version '2.4.0' > * checking CRAN incoming feasibility ...Error in > .check_package_CRAN_incoming(pkgdir) : > dims [product 24] do not match the length of object [0] > Execution halted > {code} > *PR BUILDER* > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89998/ > *MASTER BRANCH* > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/4458/ > (Fail with no failures) > This is critical because we already start to merge the PR by ignoring this > **known unkonwn** SparkR failure. > - https://github.com/apache/spark/pull/21175 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-24152) Flaky Test: SparkR
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461983#comment-16461983 ] Felix Cheung edited comment on SPARK-24152 at 5/3/18 6:26 AM: -- ok good. in the event this reoccurs persistently, option 1: * since we have NO_TESTS, we could remove --as-cran from this line [https://github.com/apache/spark/blob/master/R/check-cran.sh#L54] (temporarily) option 2: - we could set _R_CHECK_CRAN_INCOMING_ to "FALSE" in the environment to disable this check check_CRAN_incoming() (see [http://mtweb.cs.ucl.ac.uk/mus/bin/install_R/R-3.1.1/src/library/tools/R/check.R] was (Author: felixcheung): ok good. in the event this reoccurs persistently, option 1: * since we have NO_TESTS, we could remove --as-cran from this line [https://github.com/apache/spark/blob/master/R/check-cran.sh#L54] (temporarily) option 2: - we could set _R_CHECK_CRAN_INCOMING_ to "FALSE" in the environment to disable this check check_CRAN_incoming() (see [http://mtweb.cs.ucl.ac.uk/mus/bin/install_R/R-3.1.1/src/library/tools/R/check.R|http://mtweb.cs.ucl.ac.uk/mus/bin/install_R/R-3.1.1/src/library/tools/R/check.R)] > Flaky Test: SparkR > -- > > Key: SPARK-24152 > URL: https://issues.apache.org/jira/browse/SPARK-24152 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Dongjoon Hyun >Assignee: Liang-Chi Hsieh >Priority: Critical > > PR builder and master branch test fails with the following SparkR error with > unknown reason. The following is an error message from that. > {code} > * this is package 'SparkR' version '2.4.0' > * checking CRAN incoming feasibility ...Error in > .check_package_CRAN_incoming(pkgdir) : > dims [product 24] do not match the length of object [0] > Execution halted > {code} > *PR BUILDER* > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89998/ > *MASTER BRANCH* > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/4458/ > (Fail with no failures) > This is critical because we already start to merge the PR by ignoring this > **known unkonwn** SparkR failure. > - https://github.com/apache/spark/pull/21175 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24152) Flaky Test: SparkR
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461983#comment-16461983 ] Felix Cheung commented on SPARK-24152: -- ok good. in the event this reoccurs persistently, option 1: * since we have NO_TESTS, we could remove --as-cran from this line [https://github.com/apache/spark/blob/master/R/check-cran.sh#L54] (temporarily) option 2: - we could set _R_CHECK_CRAN_INCOMING_ to "FALSE" in the environment to disable this check check_CRAN_incoming() (see [http://mtweb.cs.ucl.ac.uk/mus/bin/install_R/R-3.1.1/src/library/tools/R/check.R)] > Flaky Test: SparkR > -- > > Key: SPARK-24152 > URL: https://issues.apache.org/jira/browse/SPARK-24152 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Dongjoon Hyun >Assignee: Liang-Chi Hsieh >Priority: Critical > > PR builder and master branch test fails with the following SparkR error with > unknown reason. The following is an error message from that. > {code} > * this is package 'SparkR' version '2.4.0' > * checking CRAN incoming feasibility ...Error in > .check_package_CRAN_incoming(pkgdir) : > dims [product 24] do not match the length of object [0] > Execution halted > {code} > *PR BUILDER* > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89998/ > *MASTER BRANCH* > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/4458/ > (Fail with no failures) > This is critical because we already start to merge the PR by ignoring this > **known unkonwn** SparkR failure. > - https://github.com/apache/spark/pull/21175 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24152) Flaky Test: SparkR
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461980#comment-16461980 ] Hyukjin Kwon commented on SPARK-24152: -- Seems fixed just now. I found one build passed - https://github.com/apache/spark/pull/21190#issuecomment-386198706 but let me check other ones before resolving this .. > Flaky Test: SparkR > -- > > Key: SPARK-24152 > URL: https://issues.apache.org/jira/browse/SPARK-24152 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Dongjoon Hyun >Assignee: Liang-Chi Hsieh >Priority: Critical > > PR builder and master branch test fails with the following SparkR error with > unknown reason. The following is an error message from that. > {code} > * this is package 'SparkR' version '2.4.0' > * checking CRAN incoming feasibility ...Error in > .check_package_CRAN_incoming(pkgdir) : > dims [product 24] do not match the length of object [0] > Execution halted > {code} > *PR BUILDER* > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89998/ > *MASTER BRANCH* > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/4458/ > (Fail with no failures) > This is critical because we already start to merge the PR by ignoring this > **known unkonwn** SparkR failure. > - https://github.com/apache/spark/pull/21175 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24152) Flaky Test: SparkR
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461974#comment-16461974 ] Felix Cheung commented on SPARK-24152: -- Is this still a problem? > Flaky Test: SparkR > -- > > Key: SPARK-24152 > URL: https://issues.apache.org/jira/browse/SPARK-24152 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Dongjoon Hyun >Assignee: Liang-Chi Hsieh >Priority: Critical > > PR builder and master branch test fails with the following SparkR error with > unknown reason. The following is an error message from that. > {code} > * this is package 'SparkR' version '2.4.0' > * checking CRAN incoming feasibility ...Error in > .check_package_CRAN_incoming(pkgdir) : > dims [product 24] do not match the length of object [0] > Execution halted > {code} > *PR BUILDER* > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89998/ > *MASTER BRANCH* > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/4458/ > (Fail with no failures) > This is critical because we already start to merge the PR by ignoring this > **known unkonwn** SparkR failure. > - https://github.com/apache/spark/pull/21175 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24152) Flaky Test: SparkR
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461971#comment-16461971 ] Hyukjin Kwon commented on SPARK-24152: -- FYI [~smilegator] and [~cloud_fan] > Flaky Test: SparkR > -- > > Key: SPARK-24152 > URL: https://issues.apache.org/jira/browse/SPARK-24152 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Dongjoon Hyun >Assignee: Liang-Chi Hsieh >Priority: Critical > > PR builder and master branch test fails with the following SparkR error with > unknown reason. The following is an error message from that. > {code} > * this is package 'SparkR' version '2.4.0' > * checking CRAN incoming feasibility ...Error in > .check_package_CRAN_incoming(pkgdir) : > dims [product 24] do not match the length of object [0] > Execution halted > {code} > *PR BUILDER* > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89998/ > *MASTER BRANCH* > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/4458/ > (Fail with no failures) > This is critical because we already start to merge the PR by ignoring this > **known unkonwn** SparkR failure. > - https://github.com/apache/spark/pull/21175 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24152) Flaky Test: SparkR
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461965#comment-16461965 ] Hyukjin Kwon commented on SPARK-24152: -- Thanks [~viirya], I retriggered one build. Will resolve this once it gets passed. > Flaky Test: SparkR > -- > > Key: SPARK-24152 > URL: https://issues.apache.org/jira/browse/SPARK-24152 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Dongjoon Hyun >Assignee: Liang-Chi Hsieh >Priority: Critical > > PR builder and master branch test fails with the following SparkR error with > unknown reason. The following is an error message from that. > {code} > * this is package 'SparkR' version '2.4.0' > * checking CRAN incoming feasibility ...Error in > .check_package_CRAN_incoming(pkgdir) : > dims [product 24] do not match the length of object [0] > Execution halted > {code} > *PR BUILDER* > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89998/ > *MASTER BRANCH* > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/4458/ > (Fail with no failures) > This is critical because we already start to merge the PR by ignoring this > **known unkonwn** SparkR failure. > - https://github.com/apache/spark/pull/21175 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24152) Flaky Test: SparkR
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-24152: Assignee: Liang-Chi Hsieh > Flaky Test: SparkR > -- > > Key: SPARK-24152 > URL: https://issues.apache.org/jira/browse/SPARK-24152 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Dongjoon Hyun >Assignee: Liang-Chi Hsieh >Priority: Critical > > PR builder and master branch test fails with the following SparkR error with > unknown reason. The following is an error message from that. > {code} > * this is package 'SparkR' version '2.4.0' > * checking CRAN incoming feasibility ...Error in > .check_package_CRAN_incoming(pkgdir) : > dims [product 24] do not match the length of object [0] > Execution halted > {code} > *PR BUILDER* > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89998/ > *MASTER BRANCH* > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/4458/ > (Fail with no failures) > This is critical because we already start to merge the PR by ignoring this > **known unkonwn** SparkR failure. > - https://github.com/apache/spark/pull/21175 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24152) Flaky Test: SparkR
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461962#comment-16461962 ] Liang-Chi Hsieh commented on SPARK-24152: - The CRAN sysadmin replied to me that it should be fixed now. I can't access my laptop, so I can't confirm it myself. Maybe someone can confirm it by checking whether the Jenkins R tests pass now. Thanks. > Flaky Test: SparkR > -- > > Key: SPARK-24152 > URL: https://issues.apache.org/jira/browse/SPARK-24152 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Dongjoon Hyun >Priority: Critical > > PR builder and master branch test fails with the following SparkR error with > unknown reason. The following is an error message from that. > {code} > * this is package 'SparkR' version '2.4.0' > * checking CRAN incoming feasibility ...Error in > .check_package_CRAN_incoming(pkgdir) : > dims [product 24] do not match the length of object [0] > Execution halted > {code} > *PR BUILDER* > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89998/ > *MASTER BRANCH* > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/4458/ > (Fail with no failures) > This is critical because we already start to merge the PR by ignoring this > **known unkonwn** SparkR failure. > - https://github.com/apache/spark/pull/21175 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24151) CURRENT_DATE, CURRENT_TIMESTAMP incorrectly resolved as column names when caseSensitive is enabled
[ https://issues.apache.org/jira/browse/SPARK-24151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461957#comment-16461957 ] Dongjoon Hyun commented on SPARK-24151: --- Thank you for reporting and fixing this, [~jamesthomp]. I also checked that this is a regression at Apache Spark 2.2.1 as you reported. > CURRENT_DATE, CURRENT_TIMESTAMP incorrectly resolved as column names when > caseSensitive is enabled > -- > > Key: SPARK-24151 > URL: https://issues.apache.org/jira/browse/SPARK-24151 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.1, 2.3.0 >Reporter: James Thompson >Priority: Major > > After this change: https://issues.apache.org/jira/browse/SPARK-22333 > Running SQL such as "CURRENT_TIMESTAMP" can fail spark.sql.caseSensitive has > been enabled: > {code:java} > org.apache.spark.sql.AnalysisException: cannot resolve '`CURRENT_TIMESTAMP`' > given input columns: [col1]{code} > This is due to the fact that the analyzer incorrectly uses a case sensitive > resolver to resolve the function. I will submit a PR with a fix + test for > this. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
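For reference, the reported behaviour can be sketched as below; this is a minimal reproduction written for this digest (assuming a local SparkSession and a single-column table named t with column col1, matching the error message quoted in the description), not code taken from the JIRA or its PR.

{code:scala}
// Minimal reproduction sketch of the described regression (illustrative only).
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").getOrCreate()
import spark.implicits._

spark.conf.set("spark.sql.caseSensitive", "true")
Seq("a").toDF("col1").createOrReplaceTempView("t")

// On the affected versions (2.2.1, 2.3.0) this is reported to fail with:
//   AnalysisException: cannot resolve '`CURRENT_TIMESTAMP`' given input columns: [col1]
spark.sql("SELECT CURRENT_TIMESTAMP FROM t").show()
{code}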
[jira] [Assigned] (SPARK-24169) JsonToStructs should not access SQLConf at executor side
[ https://issues.apache.org/jira/browse/SPARK-24169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24169: Assignee: Apache Spark (was: Wenchen Fan) > JsonToStructs should not access SQLConf at executor side > > > Key: SPARK-24169 > URL: https://issues.apache.org/jira/browse/SPARK-24169 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Wenchen Fan >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24169) JsonToStructs should not access SQLConf at executor side
[ https://issues.apache.org/jira/browse/SPARK-24169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24169: Assignee: Wenchen Fan (was: Apache Spark) > JsonToStructs should not access SQLConf at executor side > > > Key: SPARK-24169 > URL: https://issues.apache.org/jira/browse/SPARK-24169 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24169) JsonToStructs should not access SQLConf at executor side
[ https://issues.apache.org/jira/browse/SPARK-24169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461954#comment-16461954 ] Apache Spark commented on SPARK-24169: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/21226 > JsonToStructs should not access SQLConf at executor side > > > Key: SPARK-24169 > URL: https://issues.apache.org/jira/browse/SPARK-24169 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24151) CURRENT_DATE, CURRENT_TIMESTAMP incorrectly resolved as column names when caseSensitive is enabled
[ https://issues.apache.org/jira/browse/SPARK-24151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-24151: -- Affects Version/s: 2.2.1 > CURRENT_DATE, CURRENT_TIMESTAMP incorrectly resolved as column names when > caseSensitive is enabled > -- > > Key: SPARK-24151 > URL: https://issues.apache.org/jira/browse/SPARK-24151 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.1, 2.3.0 >Reporter: James Thompson >Priority: Major > > After this change: https://issues.apache.org/jira/browse/SPARK-22333 > Running SQL such as "CURRENT_TIMESTAMP" can fail spark.sql.caseSensitive has > been enabled: > {code:java} > org.apache.spark.sql.AnalysisException: cannot resolve '`CURRENT_TIMESTAMP`' > given input columns: [col1]{code} > This is due to the fact that the analyzer incorrectly uses a case sensitive > resolver to resolve the function. I will submit a PR with a fix + test for > this. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24169) JsonToStructs should not access SQLConf at executor side
Wenchen Fan created SPARK-24169: --- Summary: JsonToStructs should not access SQLConf at executor side Key: SPARK-24169 URL: https://issues.apache.org/jira/browse/SPARK-24169 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.3.0 Reporter: Wenchen Fan Assignee: Wenchen Fan -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
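This issue, together with SPARK-24168, SPARK-24167, and SPARK-24166 below, follows the same theme: SQLConf is backed by driver-side session state, so reading it from code that runs on executors is unreliable. The sketch below is a generic illustration of that pattern and of the usual remedy (capture the value on the driver so it is serialized into the task); it is not the actual patch for any of these issues, and SQLConf is an internal API.

{code:scala}
// Generic illustration only (not the fix from these JIRAs).
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.internal.SQLConf   // internal API, shown for illustration

val spark = SparkSession.builder().getOrCreate()
spark.conf.set("spark.sql.caseSensitive", "true")

val data = spark.sparkContext.parallelize(Seq("a", "B", "c"))

// Problematic pattern: SQLConf.get is resolved from driver-side session state,
// so when this closure runs on a remote executor it may see default values
// instead of the user's settings.
val risky = data.filter(_ => SQLConf.get.caseSensitiveAnalysis)

// Safer pattern: read the conf once on the driver and capture the plain value,
// so the setting travels with the serialized task.
val caseSensitive = spark.conf.get("spark.sql.caseSensitive").toBoolean
val safe = data.filter(_ => caseSensitive)
{code}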
[jira] [Commented] (SPARK-24168) WindowExec should not access SQLConf at executor side
[ https://issues.apache.org/jira/browse/SPARK-24168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461951#comment-16461951 ] Apache Spark commented on SPARK-24168: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/21225 > WindowExec should not access SQLConf at executor side > - > > Key: SPARK-24168 > URL: https://issues.apache.org/jira/browse/SPARK-24168 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24168) WindowExec should not access SQLConf at executor side
[ https://issues.apache.org/jira/browse/SPARK-24168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24168: Assignee: Wenchen Fan (was: Apache Spark) > WindowExec should not access SQLConf at executor side > - > > Key: SPARK-24168 > URL: https://issues.apache.org/jira/browse/SPARK-24168 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24168) WindowExec should not access SQLConf at executor side
[ https://issues.apache.org/jira/browse/SPARK-24168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24168: Assignee: Apache Spark (was: Wenchen Fan) > WindowExec should not access SQLConf at executor side > - > > Key: SPARK-24168 > URL: https://issues.apache.org/jira/browse/SPARK-24168 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Wenchen Fan >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24168) WindowExec should not access SQLConf at executor side
Wenchen Fan created SPARK-24168: --- Summary: WindowExec should not access SQLConf at executor side Key: SPARK-24168 URL: https://issues.apache.org/jira/browse/SPARK-24168 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.3.0 Reporter: Wenchen Fan Assignee: Wenchen Fan -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24167) ParquetFilters should not access SQLConf at executor side
[ https://issues.apache.org/jira/browse/SPARK-24167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24167: Assignee: Apache Spark (was: Wenchen Fan) > ParquetFilters should not access SQLConf at executor side > - > > Key: SPARK-24167 > URL: https://issues.apache.org/jira/browse/SPARK-24167 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Wenchen Fan >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24167) ParquetFilters should not access SQLConf at executor side
[ https://issues.apache.org/jira/browse/SPARK-24167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24167: Assignee: Wenchen Fan (was: Apache Spark) > ParquetFilters should not access SQLConf at executor side > - > > Key: SPARK-24167 > URL: https://issues.apache.org/jira/browse/SPARK-24167 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24167) ParquetFilters should not access SQLConf at executor side
[ https://issues.apache.org/jira/browse/SPARK-24167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461949#comment-16461949 ] Apache Spark commented on SPARK-24167: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/21224 > ParquetFilters should not access SQLConf at executor side > - > > Key: SPARK-24167 > URL: https://issues.apache.org/jira/browse/SPARK-24167 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24167) ParquetFilters should not access SQLConf at executor side
Wenchen Fan created SPARK-24167: --- Summary: ParquetFilters should not access SQLConf at executor side Key: SPARK-24167 URL: https://issues.apache.org/jira/browse/SPARK-24167 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.3.0 Reporter: Wenchen Fan Assignee: Wenchen Fan -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24166) InMemoryTableScanExec should not access SQLConf at executor side
[ https://issues.apache.org/jira/browse/SPARK-24166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461935#comment-16461935 ] Apache Spark commented on SPARK-24166: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/21223 > InMemoryTableScanExec should not access SQLConf at executor side > > > Key: SPARK-24166 > URL: https://issues.apache.org/jira/browse/SPARK-24166 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24166) InMemoryTableScanExec should not access SQLConf at executor side
[ https://issues.apache.org/jira/browse/SPARK-24166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24166: Assignee: Apache Spark (was: Wenchen Fan) > InMemoryTableScanExec should not access SQLConf at executor side > > > Key: SPARK-24166 > URL: https://issues.apache.org/jira/browse/SPARK-24166 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Wenchen Fan >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24166) InMemoryTableScanExec should not access SQLConf at executor side
[ https://issues.apache.org/jira/browse/SPARK-24166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24166: Assignee: Wenchen Fan (was: Apache Spark) > InMemoryTableScanExec should not access SQLConf at executor side > > > Key: SPARK-24166 > URL: https://issues.apache.org/jira/browse/SPARK-24166 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24166) InMemoryTableScanExec should not access SQLConf at executor side
Wenchen Fan created SPARK-24166: --- Summary: InMemoryTableScanExec should not access SQLConf at executor side Key: SPARK-24166 URL: https://issues.apache.org/jira/browse/SPARK-24166 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.3.0 Reporter: Wenchen Fan Assignee: Wenchen Fan -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-24152) Flaky Test: SparkR
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461932#comment-16461932 ] Hyukjin Kwon edited comment on SPARK-24152 at 5/3/18 4:44 AM: -- For the past issue, it was fixed within only a couple of hours (after his action to CRAN admin). Will take an action if it takes a longer while. was (Author: hyukjin.kwon): For the past issue, it was fixed within only a couple of hours. Will take an action if it takes a longer while. > Flaky Test: SparkR > -- > > Key: SPARK-24152 > URL: https://issues.apache.org/jira/browse/SPARK-24152 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Dongjoon Hyun >Priority: Critical > > PR builder and master branch test fails with the following SparkR error with > unknown reason. The following is an error message from that. > {code} > * this is package 'SparkR' version '2.4.0' > * checking CRAN incoming feasibility ...Error in > .check_package_CRAN_incoming(pkgdir) : > dims [product 24] do not match the length of object [0] > Execution halted > {code} > *PR BUILDER* > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89998/ > *MASTER BRANCH* > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/4458/ > (Fail with no failures) > This is critical because we already start to merge the PR by ignoring this > **known unkonwn** SparkR failure. > - https://github.com/apache/spark/pull/21175 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24152) Flaky Test: SparkR
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461932#comment-16461932 ] Hyukjin Kwon commented on SPARK-24152: -- For the past issue, it was fixed within only a couple of hours. Will take an action if it takes a longer while. > Flaky Test: SparkR > -- > > Key: SPARK-24152 > URL: https://issues.apache.org/jira/browse/SPARK-24152 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Dongjoon Hyun >Priority: Critical > > PR builder and master branch test fails with the following SparkR error with > unknown reason. The following is an error message from that. > {code} > * this is package 'SparkR' version '2.4.0' > * checking CRAN incoming feasibility ...Error in > .check_package_CRAN_incoming(pkgdir) : > dims [product 24] do not match the length of object [0] > Execution halted > {code} > *PR BUILDER* > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89998/ > *MASTER BRANCH* > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/4458/ > (Fail with no failures) > This is critical because we already start to merge the PR by ignoring this > **known unkonwn** SparkR failure. > - https://github.com/apache/spark/pull/21175 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24152) Flaky Test: SparkR
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461931#comment-16461931 ] Shivaram Venkataraman commented on SPARK-24152: --- If this is blocking all PRs I think its fine to temporarily remove the CRAN check from Jenkins – We'll just need to be extra careful while merging SparkR PRs for a short period of time. > Flaky Test: SparkR > -- > > Key: SPARK-24152 > URL: https://issues.apache.org/jira/browse/SPARK-24152 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Dongjoon Hyun >Priority: Critical > > PR builder and master branch test fails with the following SparkR error with > unknown reason. The following is an error message from that. > {code} > * this is package 'SparkR' version '2.4.0' > * checking CRAN incoming feasibility ...Error in > .check_package_CRAN_incoming(pkgdir) : > dims [product 24] do not match the length of object [0] > Execution halted > {code} > *PR BUILDER* > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89998/ > *MASTER BRANCH* > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/4458/ > (Fail with no failures) > This is critical because we already start to merge the PR by ignoring this > **known unkonwn** SparkR failure. > - https://github.com/apache/spark/pull/21175 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-24152) Flaky Test: SparkR
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461929#comment-16461929 ] Hyukjin Kwon edited comment on SPARK-24152 at 5/3/18 4:40 AM: -- >From Liang-Chi's comment and given previous discussion and resolution - >[https://github.com/apache/spark/pull/20005|https://github.com/apache/spark/pull/20005] > (SPARK-22812) seems it's a problem from R's. We (mainly he) investigated this >problem there and he solved that by asking / reporting the problem to R dev. I >think it's outside of Spark and we could wait for the response. BTW, I think this is quite critical since it blocks all other PRs. was (Author: hyukjin.kwon): >From Liang-Chi's comment and given previous discussion and resolution - >[https://github.com/apache/spark/pull/20005|https://github.com/apache/spark/pull/20005,] > (SPARK-22812) seems it's a problem from R's. We (mainly he) investigated this >problem there and he solved that by asking / reporting the problem to R dev. I >think it's outside of Spark and we could wait for the response. BTW, I think this is quite critical since it blocks all other PRs. > Flaky Test: SparkR > -- > > Key: SPARK-24152 > URL: https://issues.apache.org/jira/browse/SPARK-24152 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Dongjoon Hyun >Priority: Critical > > PR builder and master branch test fails with the following SparkR error with > unknown reason. The following is an error message from that. > {code} > * this is package 'SparkR' version '2.4.0' > * checking CRAN incoming feasibility ...Error in > .check_package_CRAN_incoming(pkgdir) : > dims [product 24] do not match the length of object [0] > Execution halted > {code} > *PR BUILDER* > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89998/ > *MASTER BRANCH* > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/4458/ > (Fail with no failures) > This is critical because we already start to merge the PR by ignoring this > **known unkonwn** SparkR failure. > - https://github.com/apache/spark/pull/21175 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-24152) Flaky Test: SparkR
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461929#comment-16461929 ] Hyukjin Kwon edited comment on SPARK-24152 at 5/3/18 4:39 AM: -- >From Liang-Chi's comment and given previous discussion and resolution - >[https://github.com/apache/spark/pull/20005|https://github.com/apache/spark/pull/20005,] > (SPARK-22812) seems it's a problem from R's. We (mainly he) investigated this >problem there and he solved that by asking / reporting the problem to R dev. I >think it's outside of Spark and we could wait for the response. BTW, I think this is quite critical since it blocks all other PRs. was (Author: hyukjin.kwon): >From Liang-Chi's comment and given previous discussion and resolution - >[https://github.com/apache/spark/pull/20005,] seems it's a problem from R's. >We (mainly he) investigated this problem there and he solved that by asking / >reporting the problem to R dev. I think it's outside of Spark and we could >wait for the response. > Flaky Test: SparkR > -- > > Key: SPARK-24152 > URL: https://issues.apache.org/jira/browse/SPARK-24152 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Dongjoon Hyun >Priority: Critical > > PR builder and master branch test fails with the following SparkR error with > unknown reason. The following is an error message from that. > {code} > * this is package 'SparkR' version '2.4.0' > * checking CRAN incoming feasibility ...Error in > .check_package_CRAN_incoming(pkgdir) : > dims [product 24] do not match the length of object [0] > Execution halted > {code} > *PR BUILDER* > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89998/ > *MASTER BRANCH* > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/4458/ > (Fail with no failures) > This is critical because we already start to merge the PR by ignoring this > **known unkonwn** SparkR failure. > - https://github.com/apache/spark/pull/21175 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24152) Flaky Test: SparkR
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461929#comment-16461929 ] Hyukjin Kwon commented on SPARK-24152: -- >From Liang-Chi's comment and given previous discussion and resolution - >[https://github.com/apache/spark/pull/20005,] seems it's a problem from R's. >We (mainly he) investigated this problem there and he solved that by asking / >reporting the problem to R dev. I think it's outside of Spark and we could >wait for the response. > Flaky Test: SparkR > -- > > Key: SPARK-24152 > URL: https://issues.apache.org/jira/browse/SPARK-24152 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Dongjoon Hyun >Priority: Critical > > PR builder and master branch test fails with the following SparkR error with > unknown reason. The following is an error message from that. > {code} > * this is package 'SparkR' version '2.4.0' > * checking CRAN incoming feasibility ...Error in > .check_package_CRAN_incoming(pkgdir) : > dims [product 24] do not match the length of object [0] > Execution halted > {code} > *PR BUILDER* > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89998/ > *MASTER BRANCH* > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/4458/ > (Fail with no failures) > This is critical because we already start to merge the PR by ignoring this > **known unkonwn** SparkR failure. > - https://github.com/apache/spark/pull/21175 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24152) Flaky Test: SparkR
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461920#comment-16461920 ] Shivaram Venkataraman commented on SPARK-24152: --- Unfortunately I don't have time to look at this until Friday. Do we know if the problem is in SparkR or comes from some other package? > Flaky Test: SparkR > -- > > Key: SPARK-24152 > URL: https://issues.apache.org/jira/browse/SPARK-24152 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Dongjoon Hyun >Priority: Critical > > PR builder and master branch test fails with the following SparkR error with > unknown reason. The following is an error message from that. > {code} > * this is package 'SparkR' version '2.4.0' > * checking CRAN incoming feasibility ...Error in > .check_package_CRAN_incoming(pkgdir) : > dims [product 24] do not match the length of object [0] > Execution halted > {code} > *PR BUILDER* > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89998/ > *MASTER BRANCH* > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/4458/ > (Fail with no failures) > This is critical because we already start to merge the PR by ignoring this > **known unkonwn** SparkR failure. > - https://github.com/apache/spark/pull/21175 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24152) Flaky Test: SparkR
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461867#comment-16461867 ] Liang-Chi Hsieh commented on SPARK-24152: - Thanks [~hyukjin.kwon] for pinging me. I found a problem in the CRAN PACKAGES.in file; it seems to be causing the R test failure again. I have already emailed the CRAN sysadmin for help. > Flaky Test: SparkR > -- > > Key: SPARK-24152 > URL: https://issues.apache.org/jira/browse/SPARK-24152 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Dongjoon Hyun >Priority: Critical > > PR builder and master branch test fails with the following SparkR error with > unknown reason. The following is an error message from that. > {code} > * this is package 'SparkR' version '2.4.0' > * checking CRAN incoming feasibility ...Error in > .check_package_CRAN_incoming(pkgdir) : > dims [product 24] do not match the length of object [0] > Execution halted > {code} > *PR BUILDER* > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89998/ > *MASTER BRANCH* > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/4458/ > (Fail with no failures) > This is critical because we already start to merge the PR by ignoring this > **known unkonwn** SparkR failure. > - https://github.com/apache/spark/pull/21175 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24161) Enable debug package feature on structured streaming
[ https://issues.apache.org/jira/browse/SPARK-24161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24161: Assignee: (was: Apache Spark) > Enable debug package feature on structured streaming > > > Key: SPARK-24161 > URL: https://issues.apache.org/jira/browse/SPARK-24161 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 2.3.0 >Reporter: Jungtaek Lim >Priority: Major > > Currently, debug package has a implicit class which matches Dataset to > provide debug features on Dataset class. It doesn't work with structured > streaming: it requires query is already started, and the information can be > retrieved from StreamingQuery, not Dataset. For the same reason, "explain" > had to be placed to StreamingQuery whereas it exists on Dataset. > This issue tracks effort to enable debug package feature on structured > streaming. Unlike batch, it may have some restrictions. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24161) Enable debug package feature on structured streaming
[ https://issues.apache.org/jira/browse/SPARK-24161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24161: Assignee: Apache Spark > Enable debug package feature on structured streaming > > > Key: SPARK-24161 > URL: https://issues.apache.org/jira/browse/SPARK-24161 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 2.3.0 >Reporter: Jungtaek Lim >Assignee: Apache Spark >Priority: Major > > Currently, debug package has a implicit class which matches Dataset to > provide debug features on Dataset class. It doesn't work with structured > streaming: it requires query is already started, and the information can be > retrieved from StreamingQuery, not Dataset. For the same reason, "explain" > had to be placed to StreamingQuery whereas it exists on Dataset. > This issue tracks effort to enable debug package feature on structured > streaming. Unlike batch, it may have some restrictions. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24161) Enable debug package feature on structured streaming
[ https://issues.apache.org/jira/browse/SPARK-24161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461858#comment-16461858 ] Apache Spark commented on SPARK-24161: -- User 'HeartSaVioR' has created a pull request for this issue: https://github.com/apache/spark/pull/21222 > Enable debug package feature on structured streaming > > > Key: SPARK-24161 > URL: https://issues.apache.org/jira/browse/SPARK-24161 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 2.3.0 >Reporter: Jungtaek Lim >Priority: Major > > Currently, debug package has a implicit class which matches Dataset to > provide debug features on Dataset class. It doesn't work with structured > streaming: it requires query is already started, and the information can be > retrieved from StreamingQuery, not Dataset. For the same reason, "explain" > had to be placed to StreamingQuery whereas it exists on Dataset. > This issue tracks effort to enable debug package feature on structured > streaming. Unlike batch, it may have some restrictions. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-24110) Avoid calling UGI loginUserFromKeytab in ThriftServer
[ https://issues.apache.org/jira/browse/SPARK-24110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved SPARK-24110. - Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 21178 [https://github.com/apache/spark/pull/21178] > Avoid calling UGI loginUserFromKeytab in ThriftServer > - > > Key: SPARK-24110 > URL: https://issues.apache.org/jira/browse/SPARK-24110 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Saisai Shao >Assignee: Saisai Shao >Priority: Major > Fix For: 2.4.0 > > > Spark ThriftServer will call UGI.loginUserFromKeytab twice in initialization. > This is unnecessary and will cause various potential problems, like Hadoop > IPC failure after 7 days, or RM failover issue and so on. > So here we need to remove all the unnecessary login logics and make sure UGI > in the context never be created again. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24110) Avoid calling UGI loginUserFromKeytab in ThriftServer
[ https://issues.apache.org/jira/browse/SPARK-24110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao reassigned SPARK-24110: --- Assignee: Saisai Shao > Avoid calling UGI loginUserFromKeytab in ThriftServer > - > > Key: SPARK-24110 > URL: https://issues.apache.org/jira/browse/SPARK-24110 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Saisai Shao >Assignee: Saisai Shao >Priority: Major > Fix For: 2.4.0 > > > Spark ThriftServer will call UGI.loginUserFromKeytab twice in initialization. > This is unnecessary and will cause various potential problems, like Hadoop > IPC failure after 7 days, or RM failover issue and so on. > So here we need to remove all the unnecessary login logics and make sure UGI > in the context never be created again. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24152) Flaky Test: SparkR
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461798#comment-16461798 ] Hyukjin Kwon commented on SPARK-24152: -- cc [~viirya] too > Flaky Test: SparkR > -- > > Key: SPARK-24152 > URL: https://issues.apache.org/jira/browse/SPARK-24152 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Dongjoon Hyun >Priority: Critical > > PR builder and master branch test fails with the following SparkR error with > unknown reason. The following is an error message from that. > {code} > * this is package 'SparkR' version '2.4.0' > * checking CRAN incoming feasibility ...Error in > .check_package_CRAN_incoming(pkgdir) : > dims [product 24] do not match the length of object [0] > Execution halted > {code} > *PR BUILDER* > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89998/ > *MASTER BRANCH* > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/4458/ > (Fail with no failures) > This is critical because we already start to merge the PR by ignoring this > **known unkonwn** SparkR failure. > - https://github.com/apache/spark/pull/21175 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-22812) Failing cran-check on master
[ https://issues.apache.org/jira/browse/SPARK-22812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-22812: Assignee: Liang-Chi Hsieh > Failing cran-check on master > - > > Key: SPARK-22812 > URL: https://issues.apache.org/jira/browse/SPARK-22812 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.3.0 >Reporter: Hossein Falaki >Assignee: Liang-Chi Hsieh >Priority: Minor > > When I run {{R/run-tests.sh}} or {{R/check-cran.sh}} I get the following > failure message: > {code} > * checking CRAN incoming feasibility ...Error in > .check_package_CRAN_incoming(pkgdir) : > dims [product 22] do not match the length of object [0] > {code} > cc [~felixcheung] have you experienced this error before? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24165) UDF within when().otherwise() raises NullPointerException
Jingxuan Wang created SPARK-24165: - Summary: UDF within when().otherwise() raises NullPointerException Key: SPARK-24165 URL: https://issues.apache.org/jira/browse/SPARK-24165 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.2.0 Reporter: Jingxuan Wang I have a UDF which takes java.sql.Timestamp and String as input column types and returns an Array of (Seq[case class], Double) as output. Since some of the values in the input columns can be null, I put the UDF inside a when($input.isNull, null).otherwise(UDF) filter. Such a function works well when I test it in spark-shell. But when run as a Scala jar via spark-submit in yarn cluster mode, it raises a NullPointerException that points to the UDF function. If I remove the when().otherwise() condition and instead put the null check inside the UDF, the function works without issue in spark-submit. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
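A minimal sketch of the reported pattern, assuming a hypothetical two-argument UDF and a DataFrame `df` with a timestamp column `ts` and a string column `s` (names are illustrative, not taken from the report):
{code:scala}
import org.apache.spark.sql.functions.{col, lit, udf, when}

// Hypothetical UDF standing in for the reporter's function.
val score = udf((ts: java.sql.Timestamp, s: String) =>
  if (s == null) 0.0 else s.length.toDouble)

// Pattern reported to fail in yarn cluster mode: null handling outside the UDF.
val failing = df.withColumn("out",
  when(col("ts").isNull || col("s").isNull, lit(null))
    .otherwise(score(col("ts"), col("s"))))

// Workaround described in the report: handle nulls inside the UDF itself.
val safeScore = udf((ts: java.sql.Timestamp, s: String) =>
  if (ts == null || s == null) 0.0 else s.length.toDouble)
val working = df.withColumn("out", safeScore(col("ts"), col("s")))
{code}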
[jira] [Commented] (SPARK-23429) Add executor memory metrics to heartbeat and expose in executors REST API
[ https://issues.apache.org/jira/browse/SPARK-23429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461752#comment-16461752 ] Apache Spark commented on SPARK-23429: -- User 'edwinalu' has created a pull request for this issue: https://github.com/apache/spark/pull/21221 > Add executor memory metrics to heartbeat and expose in executors REST API > - > > Key: SPARK-23429 > URL: https://issues.apache.org/jira/browse/SPARK-23429 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 2.2.1 >Reporter: Edwina Lu >Priority: Major > > Add new executor level memory metrics ( jvmUsedMemory, onHeapExecutionMemory, > offHeapExecutionMemory, onHeapStorageMemory, offHeapStorageMemory, > onHeapUnifiedMemory, and offHeapUnifiedMemory), and expose these via the > executors REST API. This information will help provide insight into how > executor and driver JVM memory is used, and for the different memory regions. > It can be used to help determine good values for spark.executor.memory, > spark.driver.memory, spark.memory.fraction, and spark.memory.storageFraction. > Add an ExecutorMetrics class, with jvmUsedMemory, onHeapExecutionMemory, > offHeapExecutionMemory, onHeapStorageMemory, and offHeapStorageMemory. This > will track the memory usage at the executor level. The new ExecutorMetrics > will be sent by executors to the driver as part of the Heartbeat. A heartbeat > will be added for the driver as well, to collect these metrics for the driver. > Modify the EventLoggingListener to log ExecutorMetricsUpdate events if there > is a new peak value for one of the memory metrics for an executor and stage. > Only the ExecutorMetrics will be logged, and not the TaskMetrics, to minimize > additional logging. Analysis on a set of sample applications showed an > increase of 0.25% in the size of the Spark history log, with this approach. > Modify the AppStatusListener to collect snapshots of peak values for each > memory metric. Each snapshot has the time, jvmUsedMemory, executionMemory and > storageMemory, and list of active stages. > Add the new memory metrics (snapshots of peak values for each memory metric) > to the executors REST API. > This is a subtask for SPARK-23206. Please refer to the design doc for that > ticket for more details. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
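For context, this is the REST endpoint the proposal extends; a rough way to inspect it from the driver is sketched below. The host and port are assumptions (the default driver UI), and the memory-metric fields listed above are only proposed additions, so they should not be expected in released versions:
{code:scala}
// Fetch the executors endpoint of the running application's UI (assumed to be on localhost:4040).
val appId = spark.sparkContext.applicationId
val url = s"http://localhost:4040/api/v1/applications/$appId/executors"
val json = scala.io.Source.fromURL(url).mkString
println(json) // one JSON entry per executor (plus the driver); the new peak-memory fields would appear here
{code}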
[jira] [Created] (SPARK-24164) Support column list as the pivot column in Pivot
Maryann Xue created SPARK-24164: --- Summary: Support column list as the pivot column in Pivot Key: SPARK-24164 URL: https://issues.apache.org/jira/browse/SPARK-24164 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.3.0 Reporter: Maryann Xue This is part of a functionality extension to Pivot SQL support as SPARK-24035. Currently, we only support a single column as the pivot column, while a column list as the pivot column would look like: {code:java} SELECT * FROM ( SELECT year, course, earnings FROM courseSales ) PIVOT ( sum(earnings) FOR (course, year) IN (('dotNET', 2012), ('Java', 2013)) );{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24163) Support "ANY" or sub-query for Pivot "IN" clause
Maryann Xue created SPARK-24163: --- Summary: Support "ANY" or sub-query for Pivot "IN" clause Key: SPARK-24163 URL: https://issues.apache.org/jira/browse/SPARK-24163 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.3.0 Reporter: Maryann Xue This is part of a functionality extension to Pivot SQL support as SPARK-24035. Currently, only literal values are allowed in Pivot "IN" clause. To support ANY or a sub-query in the "IN" clause (the examples of which provided below), we need to enable evaluation of a sub-query before/during query analysis time. {code:java} SELECT * FROM ( SELECT year, course, earnings FROM courseSales ) PIVOT ( sum(earnings) FOR course IN ANY );{code} {code:java} SELECT * FROM ( SELECT year, course, earnings FROM courseSales ) PIVOT ( sum(earnings) FOR course IN ( SELECT course FROM courses WHERE region = 'AZ' ) ); {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24162) Support aliased literal values for Pivot "IN" clause
Maryann Xue created SPARK-24162: --- Summary: Support aliased literal values for Pivot "IN" clause Key: SPARK-24162 URL: https://issues.apache.org/jira/browse/SPARK-24162 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.3.0 Reporter: Maryann Xue This is part of a functionality extension to Pivot SQL support as SPARK-24035. When literal values are specified in Pivot IN clause, it would be nice to allow aliases for those values so that the output column names can be customized. For example: {code:java} SELECT * FROM ( SELECT year, course, earnings FROM courseSales ) PIVOT ( sum(earnings) FOR course IN ('dotNET' as c1, 'Java' as c2) );{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-24111) Add TPCDS v2.7 (latest) queries in TPCDSQueryBenchmark
[ https://issues.apache.org/jira/browse/SPARK-24111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-24111. - Resolution: Fixed Assignee: Takeshi Yamamuro Fix Version/s: 2.4.0 > Add TPCDS v2.7 (latest) queries in TPCDSQueryBenchmark > -- > > Key: SPARK-24111 > URL: https://issues.apache.org/jira/browse/SPARK-24111 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: Takeshi Yamamuro >Assignee: Takeshi Yamamuro >Priority: Trivial > Fix For: 2.4.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24161) Enable debug package feature on structured streaming
Jungtaek Lim created SPARK-24161: Summary: Enable debug package feature on structured streaming Key: SPARK-24161 URL: https://issues.apache.org/jira/browse/SPARK-24161 Project: Spark Issue Type: Improvement Components: Structured Streaming Affects Versions: 2.3.0 Reporter: Jungtaek Lim Currently, the debug package has an implicit class which matches Dataset to provide debug features on the Dataset class. It doesn't work with structured streaming: it requires that the query is already started, and the information can only be retrieved from StreamingQuery, not Dataset. For the same reason, "explain" had to be placed on StreamingQuery even though it exists on Dataset. This issue tracks the effort to enable the debug package feature on structured streaming. Unlike batch, it may have some restrictions. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
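For reference, the existing batch-side helpers this issue wants to extend look roughly like this (a sketch against Spark 2.3; the streaming variant proposed here would hang off StreamingQuery instead):
{code:scala}
import org.apache.spark.sql.execution.debug._  // brings the implicit debug helpers into scope

val df = spark.range(10).selectExpr("id", "id * 2 AS doubled")
df.debug()        // eagerly runs the query and prints per-operator debugging info
df.debugCodegen() // prints the code generated for each whole-stage-codegen subtree
{code}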
[jira] [Commented] (SPARK-24161) Enable debug package feature on structured streaming
[ https://issues.apache.org/jira/browse/SPARK-24161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461713#comment-16461713 ] Jungtaek Lim commented on SPARK-24161: -- I have a working patch. Will raise a PR sooner. > Enable debug package feature on structured streaming > > > Key: SPARK-24161 > URL: https://issues.apache.org/jira/browse/SPARK-24161 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 2.3.0 >Reporter: Jungtaek Lim >Priority: Major > > Currently, debug package has a implicit class which matches Dataset to > provide debug features on Dataset class. It doesn't work with structured > streaming: it requires query is already started, and the information can be > retrieved from StreamingQuery, not Dataset. For the same reason, "explain" > had to be placed to StreamingQuery whereas it exists on Dataset. > This issue tracks effort to enable debug package feature on structured > streaming. Unlike batch, it may have some restrictions. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24157) Enable no-data micro batches for streaming aggregation and deduplication
[ https://issues.apache.org/jira/browse/SPARK-24157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461680#comment-16461680 ] Apache Spark commented on SPARK-24157: -- User 'tdas' has created a pull request for this issue: https://github.com/apache/spark/pull/21220 > Enable no-data micro batches for streaming aggregation and deduplication > > > Key: SPARK-24157 > URL: https://issues.apache.org/jira/browse/SPARK-24157 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 2.3.0 >Reporter: Tathagata Das >Assignee: Tathagata Das >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24157) Enable no-data micro batches for streaming aggregation and deduplication
[ https://issues.apache.org/jira/browse/SPARK-24157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24157: Assignee: Apache Spark (was: Tathagata Das) > Enable no-data micro batches for streaming aggregation and deduplication > > > Key: SPARK-24157 > URL: https://issues.apache.org/jira/browse/SPARK-24157 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 2.3.0 >Reporter: Tathagata Das >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24157) Enable no-data micro batches for streaming aggregation and deduplication
[ https://issues.apache.org/jira/browse/SPARK-24157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24157: Assignee: Tathagata Das (was: Apache Spark) > Enable no-data micro batches for streaming aggregation and deduplication > > > Key: SPARK-24157 > URL: https://issues.apache.org/jira/browse/SPARK-24157 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 2.3.0 >Reporter: Tathagata Das >Assignee: Tathagata Das >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24160) ShuffleBlockFetcherIterator should fail if it receives zero-size blocks
[ https://issues.apache.org/jira/browse/SPARK-24160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24160: Assignee: Josh Rosen (was: Apache Spark) > ShuffleBlockFetcherIterator should fail if it receives zero-size blocks > --- > > Key: SPARK-24160 > URL: https://issues.apache.org/jira/browse/SPARK-24160 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 2.3.0 >Reporter: Josh Rosen >Assignee: Josh Rosen >Priority: Major > > In the shuffle layer, we guarantee that zero-size blocks will never be > requested (a block containing zero records is always 0 bytes in size and is > marked as empty such that it will never be legitimately requested by > executors). However, we failed to take advantage of this in the shuffle-read > path: the existing code did not explicitly check whether blocks are > non-zero-size. > > We should add `buf.size != 0` checks to ShuffleBlockFetcherIterator to take > advantage of this invariant and prevent potential data loss / corruption > issues. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24160) ShuffleBlockFetcherIterator should fail if it receives zero-size blocks
[ https://issues.apache.org/jira/browse/SPARK-24160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24160: Assignee: Apache Spark (was: Josh Rosen) > ShuffleBlockFetcherIterator should fail if it receives zero-size blocks > --- > > Key: SPARK-24160 > URL: https://issues.apache.org/jira/browse/SPARK-24160 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 2.3.0 >Reporter: Josh Rosen >Assignee: Apache Spark >Priority: Major > > In the shuffle layer, we guarantee that zero-size blocks will never be > requested (a block containing zero records is always 0 bytes in size and is > marked as empty such that it will never be legitimately requested by > executors). However, we failed to take advantage of this in the shuffle-read > path: the existing code did not explicitly check whether blocks are > non-zero-size. > > We should add `buf.size != 0` checks to ShuffleBlockFetcherIterator to take > advantage of this invariant and prevent potential data loss / corruption > issues. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24160) ShuffleBlockFetcherIterator should fail if it receives zero-size blocks
[ https://issues.apache.org/jira/browse/SPARK-24160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461675#comment-16461675 ] Apache Spark commented on SPARK-24160: -- User 'JoshRosen' has created a pull request for this issue: https://github.com/apache/spark/pull/21219 > ShuffleBlockFetcherIterator should fail if it receives zero-size blocks > --- > > Key: SPARK-24160 > URL: https://issues.apache.org/jira/browse/SPARK-24160 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 2.3.0 >Reporter: Josh Rosen >Assignee: Josh Rosen >Priority: Major > > In the shuffle layer, we guarantee that zero-size blocks will never be > requested (a block containing zero records is always 0 bytes in size and is > marked as empty such that it will never be legitimately requested by > executors). However, we failed to take advantage of this in the shuffle-read > path: the existing code did not explicitly check whether blocks are > non-zero-size. > > We should add `buf.size != 0` checks to ShuffleBlockFetcherIterator to take > advantage of this invariant and prevent potential data loss / corruption > issues. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24160) ShuffleBlockFetcherIterator should fail if it receives zero-size blocks
Josh Rosen created SPARK-24160: -- Summary: ShuffleBlockFetcherIterator should fail if it receives zero-size blocks Key: SPARK-24160 URL: https://issues.apache.org/jira/browse/SPARK-24160 Project: Spark Issue Type: Improvement Components: Shuffle Affects Versions: 2.3.0 Reporter: Josh Rosen Assignee: Josh Rosen In the shuffle layer, we guarantee that zero-size blocks will never be requested (a block containing zero records is always 0 bytes in size and is marked as empty such that it will never be legitimately requested by executors). However, we failed to take advantage of this in the shuffle-read path: the existing code did not explicitly check whether blocks are non-zero-size. We should add `buf.size != 0` checks to ShuffleBlockFetcherIterator to take advantage of this invariant and prevent potential data loss / corruption issues. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
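A rough illustration of the kind of guard being proposed, not the actual ShuffleBlockFetcherIterator code (the real check would sit where fetched blocks are handed back to the reader):
{code:scala}
import java.io.IOException
import org.apache.spark.network.buffer.ManagedBuffer

// Illustrative only: fail fast on a zero-size buffer, since empty blocks are never legitimately requested.
def checkReceivedBlock(blockId: String, buf: ManagedBuffer): Unit = {
  if (buf.size == 0) {
    throw new IOException(s"Received a zero-size buffer for block $blockId from the shuffle server, " +
      "which indicates a corrupt or lost block")
  }
}
{code}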
[jira] [Updated] (SPARK-24155) Instrumentation improvement for clustering
[ https://issues.apache.org/jira/browse/SPARK-24155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lu Wang updated SPARK-24155: Summary: Instrumentation improvement for clustering (was: Instrument improvement for clustering) > Instrumentation improvement for clustering > -- > > Key: SPARK-24155 > URL: https://issues.apache.org/jira/browse/SPARK-24155 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 2.3.0 >Reporter: Lu Wang >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24132) Instrumentation improvement for classification
[ https://issues.apache.org/jira/browse/SPARK-24132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lu Wang updated SPARK-24132: Summary: Instrumentation improvement for classification (was: Instruments improvement for classification) > Instrumentation improvement for classification > -- > > Key: SPARK-24132 > URL: https://issues.apache.org/jira/browse/SPARK-24132 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 2.3.0 >Reporter: Lu Wang >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24159) Enable no-data micro batches for streaming mapGroupswithState
Tathagata Das created SPARK-24159: - Summary: Enable no-data micro batches for streaming mapGroupswithState Key: SPARK-24159 URL: https://issues.apache.org/jira/browse/SPARK-24159 Project: Spark Issue Type: Sub-task Components: Structured Streaming Affects Versions: 2.3.0 Reporter: Tathagata Das When event-time timeout is enabled, use watermark updates to decide whether to run another batch. When processing-time timeout is enabled, use the processing time to decide when to run more batches. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
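A sketch of where these timeouts come into play, assuming a streaming Dataset `events` of a hypothetical Event(user, eventTime) type and `import spark.implicits._` in scope. With EventTimeTimeout, it is the advancing watermark that should eventually fire the timed-out groups, which is why running a batch even without new data matters:
{code:scala}
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout}

case class Event(user: String, eventTime: java.sql.Timestamp)
case class UserCount(user: String, count: Long)

val counts = events
  .withWatermark("eventTime", "10 minutes")
  .groupByKey(_.user)
  .mapGroupsWithState(GroupStateTimeout.EventTimeTimeout) {
    (user: String, rows: Iterator[Event], state: GroupState[Long]) =>
      if (state.hasTimedOut) {
        // Reached only once the watermark passes the timeout timestamp; today that needs
        // another batch with data, which is what this umbrella of work relaxes.
        val finalCount = state.get
        state.remove()
        UserCount(user, finalCount)
      } else {
        val newCount = state.getOption.getOrElse(0L) + rows.size
        state.update(newCount)
        state.setTimeoutTimestamp(state.getCurrentWatermarkMs() + 10 * 60 * 1000L)
        UserCount(user, newCount)
      }
  }
{code}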
[jira] [Created] (SPARK-24158) Enable no-data micro batches for streaming joins
Tathagata Das created SPARK-24158: - Summary: Enable no-data micro batches for streaming joins Key: SPARK-24158 URL: https://issues.apache.org/jira/browse/SPARK-24158 Project: Spark Issue Type: Sub-task Components: Structured Streaming Affects Versions: 2.3.0 Reporter: Tathagata Das -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24158) Enable no-data micro batches for streaming joins
[ https://issues.apache.org/jira/browse/SPARK-24158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das reassigned SPARK-24158: - Assignee: Tathagata Das > Enable no-data micro batches for streaming joins > > > Key: SPARK-24158 > URL: https://issues.apache.org/jira/browse/SPARK-24158 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 2.3.0 >Reporter: Tathagata Das >Assignee: Tathagata Das >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24156) Enable no-data micro batches for more eager streaming state clean up
Tathagata Das created SPARK-24156: - Summary: Enable no-data micro batches for more eager streaming state clean up Key: SPARK-24156 URL: https://issues.apache.org/jira/browse/SPARK-24156 Project: Spark Issue Type: Improvement Components: Structured Streaming Affects Versions: 2.3.0 Reporter: Tathagata Das Assignee: Tathagata Das Currently, MicroBatchExecution in Structured Streaming runs batches only when there is new data to process. This is sensible in most cases as we don't want to use resources unnecessarily when there is nothing new to process. However, in some cases of stateful streaming queries, this delays state cleanup as well as cleanup-based output. For example, consider a streaming aggregation query with watermark-based state cleanup. The watermark is updated after every batch containing new data completes. The updated value is used in the next batch to clean up state and to output finalized aggregates in append mode. However, if there is no data, then the next batch does not occur, and cleanup/output gets delayed unnecessarily. This is true for all stateful streaming operators - aggregation, deduplication, joins, mapGroupsWithState. This issue tracks the work to enable no-data batches in MicroBatchExecution. The major challenge is that all the tests of the relevant stateful operations add dummy data to force another batch for testing the state cleanup, so a lot of the tests are going to be changed. My plan is to enable no-data batches for different stateful operators one at a time. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
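As a concrete example of the queries affected, here is a minimal watermarked aggregation in append mode (assuming a streaming DataFrame `events` with `user` and `eventTime` columns; names are illustrative). A finalized window can only be emitted once the watermark moves past it, which currently requires a later batch that happens to contain data:
{code:scala}
import org.apache.spark.sql.functions.{col, window}

val windowedCounts = events
  .withWatermark("eventTime", "10 minutes")
  .groupBy(window(col("eventTime"), "5 minutes"), col("user"))
  .count()

val query = windowedCounts.writeStream
  .outputMode("append")   // a window is emitted only after the watermark passes its end
  .format("console")
  .start()
{code}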
[jira] [Created] (SPARK-24157) Enable no-data micro batches for streaming aggregation and deduplication
Tathagata Das created SPARK-24157: - Summary: Enable no-data micro batches for streaming aggregation and deduplication Key: SPARK-24157 URL: https://issues.apache.org/jira/browse/SPARK-24157 Project: Spark Issue Type: Sub-task Components: Structured Streaming Affects Versions: 2.3.0 Reporter: Tathagata Das Assignee: Tathagata Das -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24155) Instrument improvement for clustering
[ https://issues.apache.org/jira/browse/SPARK-24155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24155: Assignee: Apache Spark > Instrument improvement for clustering > - > > Key: SPARK-24155 > URL: https://issues.apache.org/jira/browse/SPARK-24155 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 2.3.0 >Reporter: Lu Wang >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24155) Instrument improvement for clustering
[ https://issues.apache.org/jira/browse/SPARK-24155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461640#comment-16461640 ] Apache Spark commented on SPARK-24155: -- User 'ludatabricks' has created a pull request for this issue: https://github.com/apache/spark/pull/21218 > Instrument improvement for clustering > - > > Key: SPARK-24155 > URL: https://issues.apache.org/jira/browse/SPARK-24155 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 2.3.0 >Reporter: Lu Wang >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24155) Instrument improvement for clustering
[ https://issues.apache.org/jira/browse/SPARK-24155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24155: Assignee: (was: Apache Spark) > Instrument improvement for clustering > - > > Key: SPARK-24155 > URL: https://issues.apache.org/jira/browse/SPARK-24155 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 2.3.0 >Reporter: Lu Wang >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24155) Instrument improvement for clustering
Lu Wang created SPARK-24155: --- Summary: Instrument improvement for clustering Key: SPARK-24155 URL: https://issues.apache.org/jira/browse/SPARK-24155 Project: Spark Issue Type: Sub-task Components: ML Affects Versions: 2.3.0 Reporter: Lu Wang -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-18791) Stream-Stream Joins
[ https://issues.apache.org/jira/browse/SPARK-18791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das resolved SPARK-18791. --- Resolution: Done Fix Version/s: 2.3.0 > Stream-Stream Joins > --- > > Key: SPARK-18791 > URL: https://issues.apache.org/jira/browse/SPARK-18791 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Reporter: Michael Armbrust >Assignee: Tathagata Das >Priority: Major > Fix For: 2.3.0 > > > Stream stream join is a much requested, but missing feature in Structured > Streaming. While the join API exists in Datasets and DataFrames, it throws > UnsupportedOperationException when applied between two streaming > Datasets/DataFrames. To support this, we have to maintain the same semantics > as other Structured Streaming operations - the result of the operation after > consuming two data streams data till positions/offsets X and Y, respectively, > must be the same as a single batch join operation on all the data till > positions X and Y, respectively. To achieve this, the execution has to buffer > past data (i.e. streaming state) from each stream, so that future data can be > matched against past data. Here is the set of a few high-level requirements. > - Buffer past rows as streaming state (using StateStore), and joining with > the past rows. > - Support state cleanup using the event time watermark when possible. > - Support different types of joins (inner, left outer, right outer is in > highest demand for ETL/enrichment type use cases [kafka -> best-effort enrich > -> write to S3]) > - Support cascading join operations (i.e. joining more than 2 streams) > - Support multiple output modes (Append mode is in highest demand for > enabling ETL/enrichment type use cases) > All the work to incrementally build this is going represented by this JIRA, > with specific subtasks for each step. At this point, this is the rough > direction as follows: > - Implement stream-stream inner join in Append Mode, supporting multiple > cascaded joins. > - Extends it stream-stream left/right outer join in Append Mode -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
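A hedged sketch of the inner-join case described above, with watermarks on both sides so old buffered state can be dropped (`impressions` and `clicks` are assumed streaming DataFrames; column names are illustrative):
{code:scala}
import org.apache.spark.sql.functions.expr

val joined = impressions
  .withWatermark("impressionTime", "2 hours")
  .join(
    clicks.withWatermark("clickTime", "3 hours"),
    expr("""
      clickAdId = impressionAdId AND
      clickTime >= impressionTime AND
      clickTime <= impressionTime + interval 1 hour
    """))
{code}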
[jira] [Resolved] (SPARK-23923) High-order function: cardinality(x) → bigint
[ https://issues.apache.org/jira/browse/SPARK-23923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-23923. - Resolution: Fixed Assignee: Kazuaki Ishizaki Fix Version/s: 2.4.0 > High-order function: cardinality(x) → bigint > > > Key: SPARK-23923 > URL: https://issues.apache.org/jira/browse/SPARK-23923 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Assignee: Kazuaki Ishizaki >Priority: Major > Fix For: 2.4.0 > > > Ref: https://prestodb.io/docs/current/functions/array.html and > https://prestodb.io/docs/current/functions/map.html. > Returns the cardinality (size) of the array/map x. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
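A quick usage sketch of the new function via spark.sql, mirroring the Presto semantics referenced above:
{code:scala}
// cardinality(x) returns the number of elements in an array or entries in a map.
spark.sql(
  "SELECT cardinality(array(1, 2, 3)) AS arr_size, cardinality(map('a', 1, 'b', 2)) AS map_size"
).show()
// expected: arr_size = 3, map_size = 2
{code}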
[jira] [Resolved] (SPARK-24123) Fix a flaky test `DateTimeUtilsSuite.monthsBetween`
[ https://issues.apache.org/jira/browse/SPARK-24123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-24123. - Resolution: Fixed Assignee: Marco Gaido Fix Version/s: 2.4.0 > Fix a flaky test `DateTimeUtilsSuite.monthsBetween` > --- > > Key: SPARK-24123 > URL: https://issues.apache.org/jira/browse/SPARK-24123 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Dongjoon Hyun >Assignee: Marco Gaido >Priority: Minor > Fix For: 2.4.0 > > > **MASTER BRANCH** > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.6/4810/testReport/org.apache.spark.sql.catalyst.util/DateTimeUtilsSuite/monthsBetween/ > {code} > Error Message > 3.949596773820191 did not equal 3.9495967741935485 > Stacktrace > org.scalatest.exceptions.TestFailedException: 3.949596773820191 did not > equal 3.9495967741935485 > at > org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:528) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501) > at > org.apache.spark.sql.catalyst.util.DateTimeUtilsSuite$$anonfun$25.apply(DateTimeUtilsSuite.scala:495) > at > org.apache.spark.sql.catalyst.util.DateTimeUtilsSuite$$anonfun$25.apply(DateTimeUtilsSuite.scala:488) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24097) Instruments improvements - RandomForest and GradientBoostedTree
[ https://issues.apache.org/jira/browse/SPARK-24097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-24097: -- Shepherd: Joseph K. Bradley > Instruments improvements - RandomForest and GradientBoostedTree > --- > > Key: SPARK-24097 > URL: https://issues.apache.org/jira/browse/SPARK-24097 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 2.3.0 >Reporter: Weichen Xu >Priority: Major > > Instruments improvements - RandomForest and GradientBoostedTree -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24097) Instruments improvements - RandomForest and GradientBoostedTree
[ https://issues.apache.org/jira/browse/SPARK-24097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley reassigned SPARK-24097: - Assignee: Weichen Xu > Instruments improvements - RandomForest and GradientBoostedTree > --- > > Key: SPARK-24097 > URL: https://issues.apache.org/jira/browse/SPARK-24097 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 2.3.0 >Reporter: Weichen Xu >Assignee: Weichen Xu >Priority: Major > > Instruments improvements - RandomForest and GradientBoostedTree -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-24133) Reading Parquet files containing large strings can fail with java.lang.ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/SPARK-24133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-24133. - Resolution: Fixed Assignee: Ala Luszczak Fix Version/s: 2.4.0 > Reading Parquet files containing large strings can fail with > java.lang.ArrayIndexOutOfBoundsException > - > > Key: SPARK-24133 > URL: https://issues.apache.org/jira/browse/SPARK-24133 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Ala Luszczak >Assignee: Ala Luszczak >Priority: Major > Fix For: 2.4.0 > > > ColumnVectors store string data in one big byte array. Since the array size > is capped at just under Integer.MAX_VALUE, a single ColumnVector cannot store > more than 2GB of string data. > However, since the Parquet files commonly contain large blobs stored as > strings, and ColumnVectors by default carry 4096 values, it's entirely > possible to go past that limit. > In such cases a negative capacity is requested from > WritableColumnVector.reserve(). The call succeeds (requested capacity is > smaller than already allocated), and consequently > java.lang.ArrayIndexOutOfBoundsException is thrown when the reader actually > attempts to put the data into the array. > This behavior is hard to troubleshoot for the users. Spark should instead > check for negative requested capacity in WritableColumnVector.reserve() and > throw more informative error, instructing the user to tweak ColumnarBatch > size. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
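Illustrative only (the real change is in WritableColumnVector, which is Java): the essence of the fix is to detect the overflowed, negative requested capacity up front and raise an actionable error instead of letting an ArrayIndexOutOfBoundsException surface later. Names and the message below are assumptions, not the actual patch:
{code:scala}
def reserve(requiredCapacity: Int, currentCapacity: Int): Unit = {
  if (requiredCapacity < 0) {
    // requestedCapacity overflowed Int: a single ColumnVector cannot hold this much string data
    throw new RuntimeException(
      s"Requested capacity $requiredCapacity overflowed; consider lowering the columnar batch size")
  } else if (requiredCapacity > currentCapacity) {
    // grow the backing array (elided)
  }
}
{code}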
[jira] [Updated] (SPARK-24154) AccumulatorV2 loses type information during serialization
[ https://issues.apache.org/jira/browse/SPARK-24154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Zhemzhitsky updated SPARK-24154: --- Description: AccumulatorV2 loses type information during serialization. It happens [here|https://github.com/apache/spark/blob/4f5bad615b47d743b8932aea1071652293981604/core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala#L164] during *writeReplace* call {code:scala} final protected def writeReplace(): Any = { if (atDriverSide) { if (!isRegistered) { throw new UnsupportedOperationException( "Accumulator must be registered before send to executor") } val copyAcc = copyAndReset() assert(copyAcc.isZero, "copyAndReset must return a zero value copy") val isInternalAcc = name.isDefined && name.get.startsWith(InternalAccumulator.METRICS_PREFIX) if (isInternalAcc) { // Do not serialize the name of internal accumulator and send it to executor. copyAcc.metadata = metadata.copy(name = None) } else { // For non-internal accumulators, we still need to send the name because users may need to // access the accumulator name at executor side, or they may keep the accumulators sent from // executors and access the name when the registered accumulator is already garbage // collected(e.g. SQLMetrics). copyAcc.metadata = metadata } copyAcc } else { this } } {code} It means that it is hardly possible to create new accumulators easily by adding new behaviour to existing ones by means of mix-ins or inheritance (without overriding *copy*). For example the following snippet ... {code:scala} trait TripleCount { self: LongAccumulator => abstract override def add(v: jl.Long): Unit = { self.add(v * 3) } } val acc = new LongAccumulator with TripleCount sc.register(acc) val data = 1 to 10 val rdd = sc.makeRDD(data, 5) rdd.foreach(acc.add(_)) acc.value shouldBe 3 * data.sum {code} ... fails with {code:none} org.scalatest.exceptions.TestFailedException: 55 was not equal to 165 at org.scalatest.MatchersHelper$.indicateFailure(MatchersHelper.scala:340) at org.scalatest.Matchers$AnyShouldWrapper.shouldBe(Matchers.scala:6864) {code} Also such a behaviour seems to be error prone and confusing because an implementor gets not the same thing as he/she sees in the code. was: AccumulatorV2 loses type information during serialization. It happens [here|https://github.com/apache/spark/blob/4f5bad615b47d743b8932aea1071652293981604/core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala#L164] during *writeReplace* call {code:scala} final protected def writeReplace(): Any = { if (atDriverSide) { if (!isRegistered) { throw new UnsupportedOperationException( "Accumulator must be registered before send to executor") } val copyAcc = copyAndReset() assert(copyAcc.isZero, "copyAndReset must return a zero value copy") val isInternalAcc = name.isDefined && name.get.startsWith(InternalAccumulator.METRICS_PREFIX) if (isInternalAcc) { // Do not serialize the name of internal accumulator and send it to executor. copyAcc.metadata = metadata.copy(name = None) } else { // For non-internal accumulators, we still need to send the name because users may need to // access the accumulator name at executor side, or they may keep the accumulators sent from // executors and access the name when the registered accumulator is already garbage // collected(e.g. SQLMetrics). 
copyAcc.metadata = metadata } copyAcc } else { this } } {code} It means that it is hardly possible to create new accumulators easily by adding new behaviour to existing ones by means of mix-ins or inheritance (without overriding *copy*). For example the following snippet ... {code:scala} trait TripleCount { self: LongAccumulator => abstract override def add(v: jl.Long): Unit = { self.add(v * 3) } } val acc = new LongAccumulator with TripleCount sc.register(acc) val data = 1 to 10 val rdd = sc.makeRDD(data, 5) rdd.foreach(acc.add(_)) acc.value shouldBe 3 * data.sum {code} ... fails with {code:none} org.scalatest.exceptions.TestFailedException: 55 was not equal to 165 at org.scalatest.MatchersHelper$.indicateFailure(MatchersHelper.scala:340) at org.scalatest.Matchers$AnyShouldWrapper.shouldBe(Matchers.scala:6864) {code} > AccumulatorV2 loses type information during serialization > - > > Key: SPARK-24154 > URL: https://issues.apache.org/jira/browse/SPARK-24154 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0, 2.2.1, 2.3.0, 2.3.1 > Environment: Scala 2.11 > Spark 2.2.0 >Reporter: Sergey Zhemzhitsky >Priority: Major > > AccumulatorV2 loses type
[jira] [Created] (SPARK-24154) AccumulatorV2 loses type information during serialization
Sergey Zhemzhitsky created SPARK-24154: -- Summary: AccumulatorV2 loses type information during serialization Key: SPARK-24154 URL: https://issues.apache.org/jira/browse/SPARK-24154 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.3.0, 2.2.1, 2.2.0, 2.3.1 Environment: Scala 2.11 Spark 2.2.0 Reporter: Sergey Zhemzhitsky AccumulatorV2 loses type information during serialization. It happens [here|https://github.com/apache/spark/blob/4f5bad615b47d743b8932aea1071652293981604/core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala#L164] during *writeReplace* call {code:scala} final protected def writeReplace(): Any = { if (atDriverSide) { if (!isRegistered) { throw new UnsupportedOperationException( "Accumulator must be registered before send to executor") } val copyAcc = copyAndReset() assert(copyAcc.isZero, "copyAndReset must return a zero value copy") val isInternalAcc = name.isDefined && name.get.startsWith(InternalAccumulator.METRICS_PREFIX) if (isInternalAcc) { // Do not serialize the name of internal accumulator and send it to executor. copyAcc.metadata = metadata.copy(name = None) } else { // For non-internal accumulators, we still need to send the name because users may need to // access the accumulator name at executor side, or they may keep the accumulators sent from // executors and access the name when the registered accumulator is already garbage // collected(e.g. SQLMetrics). copyAcc.metadata = metadata } copyAcc } else { this } } {code} It means that it is hardly possible to create new accumulators easily by adding new behaviour to existing ones by means of mix-ins or inheritance (without overriding *copy*). For example the following snippet ... {code:scala} trait TripleCount { self: LongAccumulator => abstract override def add(v: jl.Long): Unit = { self.add(v * 3) } } val acc = new LongAccumulator with TripleCount sc.register(acc) val data = 1 to 10 val rdd = sc.makeRDD(data, 5) rdd.foreach(acc.add(_)) acc.value shouldBe 3 * data.sum {code} ... fails with {code:none} org.scalatest.exceptions.TestFailedException: 55 was not equal to 165 at org.scalatest.MatchersHelper$.indicateFailure(MatchersHelper.scala:340) at org.scalatest.Matchers$AnyShouldWrapper.shouldBe(Matchers.scala:6864) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4502) Spark SQL reads unneccesary nested fields from Parquet
[ https://issues.apache.org/jira/browse/SPARK-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461502#comment-16461502 ] Evan McClain commented on SPARK-4502: - The workaround I've been using is to explicitly pass in the read schema. It's an ugly workaround (typos in the field names and/or types can lead to seemingly unrelated errors), but it works. > Spark SQL reads unneccesary nested fields from Parquet > -- > > Key: SPARK-4502 > URL: https://issues.apache.org/jira/browse/SPARK-4502 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.1.0 >Reporter: Liwen Sun >Priority: Critical > > When reading a field of a nested column from Parquet, SparkSQL reads and > assemble all the fields of that nested column. This is unnecessary, as > Parquet supports fine-grained field reads out of a nested column. This may > degrades the performance significantly when a nested column has many fields. > For example, I loaded json tweets data into SparkSQL and ran the following > query: > {{SELECT User.contributors_enabled from Tweets;}} > User is a nested structure that has 38 primitive fields (for Tweets schema, > see: https://dev.twitter.com/overview/api/tweets), here is the log message: > {{14/11/19 16:36:49 INFO InternalParquetRecordReader: Assembled and processed > 385779 records from 38 columns in 3976 ms: 97.02691 rec/ms, 3687.0227 > cell/ms}} > For comparison, I also ran: > {{SELECT User FROM Tweets;}} > And here is the log message: > {{14/11/19 16:45:40 INFO InternalParquetRecordReader: Assembled and processed > 385779 records from 38 columns in 9461 ms: 40.77571 rec/ms, 1549.477 cell/ms}} > So both queries load 38 columns from Parquet, while the first query only > needs 1 column. I also measured the bytes read within Parquet. In these two > cases, the same number of bytes (99365194 bytes) were read. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
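A sketch of that workaround, assuming the tweet layout from the issue description (only the one nested field is declared in the read schema, so only it needs to be assembled):
{code:scala}
import org.apache.spark.sql.types._

// Declare only the nested field we actually need.
val readSchema = StructType(Seq(
  StructField("User", StructType(Seq(
    StructField("contributors_enabled", BooleanType))))))

val tweets = spark.read.schema(readSchema).parquet("/path/to/tweets")  // path is illustrative
tweets.select("User.contributors_enabled").show()
{code}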
[jira] [Updated] (SPARK-23971) Should not leak Spark sessions across test suites
[ https://issues.apache.org/jira/browse/SPARK-23971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-23971: Fix Version/s: 2.3.1 > Should not leak Spark sessions across test suites > - > > Key: SPARK-23971 > URL: https://issues.apache.org/jira/browse/SPARK-23971 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 2.4.0 >Reporter: Eric Liang >Assignee: Eric Liang >Priority: Major > Fix For: 2.3.1, 2.4.0 > > > Many suites currently leak Spark sessions (sometimes with stopped > SparkContexts) via the thread-local active Spark session and default Spark > session. We should attempt to clean these up and detect when this happens to > improve the reproducibility of tests. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-23971) Should not leak Spark sessions across test suites
[ https://issues.apache.org/jira/browse/SPARK-23971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-23971: Component/s: Tests > Should not leak Spark sessions across test suites > - > > Key: SPARK-23971 > URL: https://issues.apache.org/jira/browse/SPARK-23971 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 2.4.0 >Reporter: Eric Liang >Assignee: Eric Liang >Priority: Major > Fix For: 2.3.1, 2.4.0 > > > Many suites currently leak Spark sessions (sometimes with stopped > SparkContexts) via the thread-local active Spark session and default Spark > session. We should attempt to clean these up and detect when this happens to > improve the reproducibility of tests. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
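The clean-up this issue is about boils down to clearing the thread-local and default sessions between suites; a minimal sketch of the relevant API (e.g. in an afterAll/afterEach hook) might look like:
{code:scala}
import org.apache.spark.sql.SparkSession

// Stop whatever session a suite left active, then clear the thread-local and default references.
SparkSession.getActiveSession.foreach(_.stop())
SparkSession.clearActiveSession()
SparkSession.clearDefaultSession()
{code}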
[jira] [Resolved] (SPARK-24013) ApproximatePercentile grinds to a halt on sorted input.
[ https://issues.apache.org/jira/browse/SPARK-24013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-24013. - Resolution: Fixed Assignee: Marco Gaido Fix Version/s: 2.4.0 > ApproximatePercentile grinds to a halt on sorted input. > --- > > Key: SPARK-24013 > URL: https://issues.apache.org/jira/browse/SPARK-24013 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Juliusz Sompolski >Assignee: Marco Gaido >Priority: Major > Fix For: 2.4.0 > > Attachments: screenshot-1.png > > > Running > {code} > sql("select approx_percentile(rid, array(0.1)) from (select rand() as rid > from range(1000))").collect() > {code} > takes 7 seconds, while > {code} > sql("select approx_percentile(id, array(0.1)) from range(1000)").collect() > {code} > grinds to a halt - processes the first million rows quickly, and then slows > down to a few thousands rows / second (4m rows processed after 20 minutes). > Thread dumps show that it spends time in QuantileSummary.compress. > Seems it hits some edge case inefficiency when dealing with sorted data? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-23489) Flaky Test: HiveExternalCatalogVersionsSuite
[ https://issues.apache.org/jira/browse/SPARK-23489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-23489: -- Description: I saw this error in an unrelated PR. It seems to me a bad configuration in the Jenkins node where the tests are run. {code} Error Message java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "/tmp/test-spark/spark-2.2.1"): error=2, No such file or directory Stacktrace sbt.ForkMain$ForkError: java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "/tmp/test-spark/spark-2.2.1"): error=2, No such file or directory at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048) at org.apache.spark.sql.hive.SparkSubmitTestUtils$class.runSparkSubmit(SparkSubmitTestUtils.scala:73) at org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite.runSparkSubmit(HiveExternalCatalogVersionsSuite.scala:43) at org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite$$anonfun$beforeAll$1.apply(HiveExternalCatalogVersionsSuite.scala:176) at org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite$$anonfun$beforeAll$1.apply(HiveExternalCatalogVersionsSuite.scala:161) at scala.collection.immutable.List.foreach(List.scala:381) at org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite.beforeAll(HiveExternalCatalogVersionsSuite.scala:161) at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:212) at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:210) at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:52) at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:314) at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:480) at sbt.ForkMain$Run$2.call(ForkMain.java:296) at sbt.ForkMain$Run$2.call(ForkMain.java:286) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: sbt.ForkMain$ForkError: java.io.IOException: error=2, No such file or directory at java.lang.UNIXProcess.forkAndExec(Native Method) at java.lang.UNIXProcess.(UNIXProcess.java:248) at java.lang.ProcessImpl.start(ProcessImpl.java:134) at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029) ... 17 more {code} This is the link: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87615/testReport/. *MASTER BRANCH* - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7/4389 *BRANCH 2.3* - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.6/321/ *NOTE: This failure frequently looks as `Test Result (no failures)`* - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.6/4811/ was: I saw this error in an unrelated PR. It seems to me a bad configuration in the Jenkins node where the tests are run. 
{code} Error Message java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "/tmp/test-spark/spark-2.2.1"): error=2, No such file or directory Stacktrace sbt.ForkMain$ForkError: java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "/tmp/test-spark/spark-2.2.1"): error=2, No such file or directory at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048) at org.apache.spark.sql.hive.SparkSubmitTestUtils$class.runSparkSubmit(SparkSubmitTestUtils.scala:73) at org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite.runSparkSubmit(HiveExternalCatalogVersionsSuite.scala:43) at org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite$$anonfun$beforeAll$1.apply(HiveExternalCatalogVersionsSuite.scala:176) at org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite$$anonfun$beforeAll$1.apply(HiveExternalCatalogVersionsSuite.scala:161) at scala.collection.immutable.List.foreach(List.scala:381) at org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite.beforeAll(HiveExternalCatalogVersionsSuite.scala:161) at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:212) at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:210) at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:52) at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:314) at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:480) at sbt.ForkMain$Run$2.call(ForkMain.java:296) a
[jira] [Created] (SPARK-24153) Flaky Test: DirectKafkaStreamSuite
Dongjoon Hyun created SPARK-24153: - Summary: Flaky Test: DirectKafkaStreamSuite Key: SPARK-24153 URL: https://issues.apache.org/jira/browse/SPARK-24153 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 2.4.0 Reporter: Dongjoon Hyun {code} Test Result (5 failures / +5) org.apache.spark.streaming.kafka010.DirectKafkaStreamSuite.receiving from largest starting offset org.apache.spark.streaming.kafka010.DirectKafkaStreamSuite.creating stream by offset org.apache.spark.streaming.kafka010.DirectKafkaStreamSuite.offset recovery org.apache.spark.streaming.kafka010.DirectKafkaStreamSuite.offset recovery from kafka org.apache.spark.streaming.kafka010.DirectKafkaStreamSuite.Direct Kafka stream report input information {code} - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.7/348/ -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24152) Flaky Test: SparkR
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-24152: -- Description: The PR builder and master branch tests fail with the following SparkR error for an unknown reason. The error message is: {code} * this is package 'SparkR' version '2.4.0' * checking CRAN incoming feasibility ...Error in .check_package_CRAN_incoming(pkgdir) : dims [product 24] do not match the length of object [0] Execution halted {code} *PR BUILDER* - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/ - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/ - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89998/ *MASTER BRANCH* - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/4458/ (fails while reporting no test failures) This is critical because we have already started merging PRs while ignoring this **known unknown** SparkR failure. - https://github.com/apache/spark/pull/21175 was: PR builder and master branch test fails with the following SparkR error with unknown reason. The following is an error message from that. {code} * this is package 'SparkR' version '2.4.0' * checking CRAN incoming feasibility ...Error in .check_package_CRAN_incoming(pkgdir) : dims [product 24] do not match the length of object [0] Execution halted {code} *PR BUILDER* - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/ - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/ - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89998/ *MASTER BRANCH* - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/lastCompletedBuild/console This is critical because we already start to merge the PR by ignoring this **known unkonwn** SparkR failure. - https://github.com/apache/spark/pull/21175 > Flaky Test: SparkR > -- > > Key: SPARK-24152 > URL: https://issues.apache.org/jira/browse/SPARK-24152 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Dongjoon Hyun >Priority: Critical > > The PR builder and master branch tests fail with the following SparkR error for > an unknown reason. The error message is: > {code} > * this is package 'SparkR' version '2.4.0' > * checking CRAN incoming feasibility ...Error in > .check_package_CRAN_incoming(pkgdir) : > dims [product 24] do not match the length of object [0] > Execution halted > {code} > *PR BUILDER* > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89998/ > *MASTER BRANCH* > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/4458/ > (fails while reporting no test failures) > This is critical because we have already started merging PRs while ignoring this > **known unknown** SparkR failure. > - https://github.com/apache/spark/pull/21175 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24152) Flaky Test: SparkR
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-24152: -- Description: PR builder and master branch test fails with the following SparkR error with unknown reason. The following is an error message from that. {code} * this is package 'SparkR' version '2.4.0' * checking CRAN incoming feasibility ...Error in .check_package_CRAN_incoming(pkgdir) : dims [product 24] do not match the length of object [0] Execution halted {code} *PR BUILDER* - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/ - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/ - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89998/ *MASTER BRANCH* - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/lastCompletedBuild/console This is critical because we already start to merge the PR by ignoring this **known unkonwn** SparkR failure. - https://github.com/apache/spark/pull/21175 was: PR builder fails with the following SparkR error with unknown reason. The following is an error message from that. {code} * this is package 'SparkR' version '2.4.0' * checking CRAN incoming feasibility ...Error in .check_package_CRAN_incoming(pkgdir) : dims [product 24] do not match the length of object [0] Execution halted {code} - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/ - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/ - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89998/ This is critical because we already start to merge the PR by ignoring this **known unkonwn** SparkR failure. - https://github.com/apache/spark/pull/21175 > Flaky Test: SparkR > -- > > Key: SPARK-24152 > URL: https://issues.apache.org/jira/browse/SPARK-24152 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Dongjoon Hyun >Priority: Critical > > PR builder and master branch test fails with the following SparkR error with > unknown reason. The following is an error message from that. > {code} > * this is package 'SparkR' version '2.4.0' > * checking CRAN incoming feasibility ...Error in > .check_package_CRAN_incoming(pkgdir) : > dims [product 24] do not match the length of object [0] > Execution halted > {code} > *PR BUILDER* > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89998/ > *MASTER BRANCH* > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/lastCompletedBuild/console > This is critical because we already start to merge the PR by ignoring this > **known unkonwn** SparkR failure. > - https://github.com/apache/spark/pull/21175 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24152) Flaky Test: SparkR
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-24152: -- Description: PR builder fails with the following SparkR error with unknown reason. The following is an error message from that. {code} * this is package 'SparkR' version '2.4.0' * checking CRAN incoming feasibility ...Error in .check_package_CRAN_incoming(pkgdir) : dims [product 24] do not match the length of object [0] Execution halted {code} - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/ - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/ - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89998/ This is critical because we already start to merge the PR by ignoring this **known unkonwn** SparkR failure. - https://github.com/apache/spark/pull/21175 was: PR builder fails with the following SparkR error with unknown reason. The following is an error message from that. {code} * this is package 'SparkR' version '2.4.0' * checking CRAN incoming feasibility ...Error in .check_package_CRAN_incoming(pkgdir) : dims [product 24] do not match the length of object [0] Execution halted {code} - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/ - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/ - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89998/ > Flaky Test: SparkR > -- > > Key: SPARK-24152 > URL: https://issues.apache.org/jira/browse/SPARK-24152 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Dongjoon Hyun >Priority: Critical > > PR builder fails with the following SparkR error with unknown reason. The > following is an error message from that. > {code} > * this is package 'SparkR' version '2.4.0' > * checking CRAN incoming feasibility ...Error in > .check_package_CRAN_incoming(pkgdir) : > dims [product 24] do not match the length of object [0] > Execution halted > {code} > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89998/ > This is critical because we already start to merge the PR by ignoring this > **known unkonwn** SparkR failure. > - https://github.com/apache/spark/pull/21175 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24152) Flaky Test: SparkR
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461328#comment-16461328 ] Dongjoon Hyun commented on SPARK-24152: --- cc [~shivaram], [~felixcheung], [~yanboliang] > Flaky Test: SparkR > -- > > Key: SPARK-24152 > URL: https://issues.apache.org/jira/browse/SPARK-24152 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Dongjoon Hyun >Priority: Critical > > PR builder fails with the following SparkR error with unknown reason. The > following is an error message from that. > {code} > * this is package 'SparkR' version '2.4.0' > * checking CRAN incoming feasibility ...Error in > .check_package_CRAN_incoming(pkgdir) : > dims [product 24] do not match the length of object [0] > Execution halted > {code} > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89998/ -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24152) Flaky Test: SparkR
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-24152: -- Description: PR builder fails with the following SparkR error with unknown reason. The following is an error message from that. {code} * this is package 'SparkR' version '2.4.0' * checking CRAN incoming feasibility ...Error in .check_package_CRAN_incoming(pkgdir) : dims [product 24] do not match the length of object [0] Execution halted {code} - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/ - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/ - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89998/ was: PR builder fails with the following SparkR error with unknown reason. The following is an error message from that. {code} * this is package 'SparkR' version '2.4.0' * checking CRAN incoming feasibility ...Error in .check_package_CRAN_incoming(pkgdir) : dims [product 24] do not match the length of object [0] Execution halted {code} - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/ - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/ > Flaky Test: SparkR > -- > > Key: SPARK-24152 > URL: https://issues.apache.org/jira/browse/SPARK-24152 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Dongjoon Hyun >Priority: Critical > > PR builder fails with the following SparkR error with unknown reason. The > following is an error message from that. > {code} > * this is package 'SparkR' version '2.4.0' > * checking CRAN incoming feasibility ...Error in > .check_package_CRAN_incoming(pkgdir) : > dims [product 24] do not match the length of object [0] > Execution halted > {code} > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/ > - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89998/ -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24152) Flaky Test: SparkR
Dongjoon Hyun created SPARK-24152: - Summary: Flaky Test: SparkR Key: SPARK-24152 URL: https://issues.apache.org/jira/browse/SPARK-24152 Project: Spark Issue Type: Bug Components: SparkR Affects Versions: 2.4.0 Reporter: Dongjoon Hyun PR builder fails with the following SparkR error with unknown reason. The following is an error message from that. {code} * this is package 'SparkR' version '2.4.0' * checking CRAN incoming feasibility ...Error in .check_package_CRAN_incoming(pkgdir) : dims [product 24] do not match the length of object [0] Execution halted {code} - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/ - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/ -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24135) [K8s] Executors that fail to start up because of init-container errors are not retried and limit the executor pool size
[ https://issues.apache.org/jira/browse/SPARK-24135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461324#comment-16461324 ] Erik Erlandson commented on SPARK-24135: > In the case of the executor failing to start at all, this wouldn't be caught > by Spark's task failure count logic because you're never going to end up > scheduling tasks on these executors that failed to start. Aha, that argues for allowing a way to give up after repeated pod start failures. > [K8s] Executors that fail to start up because of init-container errors are > not retried and limit the executor pool size > --- > > Key: SPARK-24135 > URL: https://issues.apache.org/jira/browse/SPARK-24135 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.3.0 >Reporter: Matt Cheah >Priority: Major > > In KubernetesClusterSchedulerBackend, we detect if executors disconnect after > having been started or if executors hit the {{ERROR}} or {{DELETED}} states. > When executors fail in these ways, they are removed from the pending > executors pool and the driver should retry requesting these executors. > However, the driver does not handle a different class of error: when the pod > enters the {{Init:Error}} state. This state comes up when the executor fails > to launch because one of its init-containers fails. Spark itself doesn't > attach any init-containers to the executors. However, custom web hooks can > run on the cluster and attach init-containers to the executor pods. > Additionally, pod presets can specify init containers to run on these pods. > Therefore Spark should be handling the {{Init:Error}} cases regardless if > Spark itself is aware of init-containers or not. > This class of error is particularly bad because when we hit this state, the > failed executor will never start, but it's still seen as pending by the > executor allocator. The executor allocator won't request more rounds of > executors because its current batch hasn't been resolved to either running or > failed. Therefore we end up with being stuck with the number of executors > that successfully started before the faulty one failed to start, potentially > creating a fake resource bottleneck. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
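To make the failing state concrete, here is a hedged, standalone sketch of what detecting it looks like against the Kubernetes API, using the fabric8 client that Spark's Kubernetes backend builds on; the `spark-role=executor` label and the CrashLoopBackOff handling are assumptions for illustration, not the actual scheduler-backend code path.
{code}
import io.fabric8.kubernetes.api.model.Pod
import io.fabric8.kubernetes.client.{DefaultKubernetesClient, KubernetesClientException, Watcher}

import scala.collection.JavaConverters._

// Watches executor pods and flags the ones whose init containers failed, i.e. the
// pods kubectl would show as Init:Error (or Init:CrashLoopBackOff).
object InitErrorWatchSketch {
  def main(args: Array[String]): Unit = {
    val client = new DefaultKubernetesClient()

    client.pods()
      .withLabel("spark-role", "executor") // label is an assumption; match your submission's labels
      .watch(new Watcher[Pod] {
        override def eventReceived(action: Watcher.Action, pod: Pod): Unit = {
          val initStatuses = Option(pod.getStatus)
            .flatMap(s => Option(s.getInitContainerStatuses))
            .map(_.asScala.toList)
            .getOrElse(Nil)

          val initFailed = initStatuses.exists { cs =>
            val state = Option(cs.getState)
            // Terminated with a non-zero exit code, or stuck retrying after such a failure.
            val terminatedNonZero = state.flatMap(s => Option(s.getTerminated)).exists(_.getExitCode != 0)
            val crashLooping = state.flatMap(s => Option(s.getWaiting)).exists(_.getReason == "CrashLoopBackOff")
            terminatedNonZero || crashLooping
          }

          if (initFailed) {
            // This is the point where a scheduler backend would need to count the executor
            // as failed (freeing its slot and requesting a replacement) instead of leaving
            // it pending forever.
            println(s"Init container failed for pod ${pod.getMetadata.getName}")
          }
        }

        override def onClose(cause: KubernetesClientException): Unit = ()
      })
  }
}
{code}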
[jira] [Updated] (SPARK-24150) Race condition in FsHistoryProvider
[ https://issues.apache.org/jira/browse/SPARK-24150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] William Montaz updated SPARK-24150: --- Description: There exists a race condition in the checkLogs method between threads of replayExecutor. They use the field "applications" to synchronise, but they also update that field. The problem is that threads will eventually synchronise on different monitors (because they will synchronise on different objects whose references have been assigned to "applications"), breaking the initial synchronisation intent. This has an even greater chance of reproducing when number_new_log_files > replayExecutor_pool_size. If such a log disappears (it will not be present in the list "applications"), it will be impossible to read it from the UI (being in the list "applications" is a mandatory check to avoid getting a 404). Workaround: * use a permanent object as a monitor on which to synchronise (or synchronise on `this`) * keep the field volatile for all other read accesses was: There exist a race condition in checkLogs method between threads of replayExecutor. They use the field "applications" to synchronise, but they also update that field. The problem is that threads will eventually synchronise on different monitors (because they will synchronise on different objects which references have been assigned to "applications"), breaking the initial synchronisation intent. This has even greater chance to reproduce when number_new_log_files > replayExecutor_pool_size Workaround: * use a permanent object as a monitor on which to synchronise (or synchronise on `this`) * keep volatile field for all other read accesses > Race condition in FsHistoryProvider > --- > > Key: SPARK-24150 > URL: https://issues.apache.org/jira/browse/SPARK-24150 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: William Montaz >Priority: Major > > There exists a race condition in the checkLogs method between threads of > replayExecutor. They use the field "applications" to synchronise, but they > also update that field. > The problem is that threads will eventually synchronise on different monitors > (because they will synchronise on different objects whose references have > been assigned to "applications"), breaking the initial synchronisation > intent. This has an even greater chance of reproducing when number_new_log_files > > replayExecutor_pool_size. > If such a log disappears (it will not be present in the list "applications"), > it will be impossible to read it from the UI (being in the list > "applications" is a mandatory check to avoid getting a 404). > Workaround: > * use a permanent object as a monitor on which to synchronise (or > synchronise on `this`) > * keep the field volatile for all other read accesses -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
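For illustration only, a compact Scala sketch of the broken locking pattern and of the suggested workaround; the class, field, and method names are stand-ins chosen for the example, not the actual FsHistoryProvider code.
{code}
import java.util.concurrent.{Executors, TimeUnit}

// Illustrative stand-in for the history provider's application index.
class HistoryIndex {
  // Volatile so unsynchronised readers (e.g. the web UI) always see the latest list.
  @volatile private var applications: List[String] = Nil

  // Broken pattern: each caller locks whatever object the field points to *right now*.
  // Once one thread reassigns the field, later threads lock a different monitor, so two
  // replay tasks can interleave their read-modify-write and silently drop an entry.
  def mergeBroken(app: String): Unit = applications.synchronized {
    applications = app :: applications
  }

  // Workaround from the report: always lock the same permanent monitor (here `this`),
  // keeping the field volatile only for the unsynchronised read path.
  def mergeFixed(app: String): Unit = this.synchronized {
    applications = app :: applications
  }

  def snapshot: List[String] = applications
}

object HistoryIndexDemo extends App {
  val index = new HistoryIndex
  val pool = Executors.newFixedThreadPool(4) // stands in for replayExecutor
  (1 to 1000).foreach { i =>
    pool.execute(new Runnable {
      override def run(): Unit = index.mergeFixed(s"app-$i") // swap in mergeBroken to observe lost updates
    })
  }
  pool.shutdown()
  pool.awaitTermination(1, TimeUnit.MINUTES)
  println(s"indexed ${index.snapshot.size} applications") // expect 1000 with mergeFixed
}
{code}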
[jira] [Updated] (SPARK-24150) Race condition in FsHistoryProvider
[ https://issues.apache.org/jira/browse/SPARK-24150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] William Montaz updated SPARK-24150: --- Description: There exist a race condition in checkLogs method between threads of replayExecutor. They use the field "applications" to synchronise, but they also update that field. The problem is that threads will eventually synchronise on different monitors (because they will synchronise on different objects which references have been assigned to "applications"), breaking the initial synchronisation intent. This has even greater chance to reproduce when number_new_log_files > replayExecutor_pool_size Workaround: * use a permanent object as a monitor on which to synchronise (or synchronise on `this`) * keep volatile field for all other read accesses was: There exist a race condition in checkLogs method between threads of replayExecutor. They use the field "applications" to synchronise, but they also update that field. The problem is that threads will eventually synchronise on different monitors (because they will synchronise on different objects which references that have been assigned to "applications"), breaking the initial synchronisation intent. This has even greater chance to reproduce when number_new_log_files > replayExecutor_pool_size Workaround: * use a permanent object as a monitor on which to synchronise (or synchronise on `this`) * keep volatile field for all other read accesses > Race condition in FsHistoryProvider > --- > > Key: SPARK-24150 > URL: https://issues.apache.org/jira/browse/SPARK-24150 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: William Montaz >Priority: Major > > There exist a race condition in checkLogs method between threads of > replayExecutor. They use the field "applications" to synchronise, but they > also update that field. > The problem is that threads will eventually synchronise on different monitors > (because they will synchronise on different objects which references have > been assigned to "applications"), breaking the initial synchronisation > intent. This has even greater chance to reproduce when number_new_log_files > > replayExecutor_pool_size > Workaround: > * use a permanent object as a monitor on which to synchronise (or > synchronise on `this`) > * keep volatile field for all other read accesses -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24150) Race condition in FsHistoryProvider
[ https://issues.apache.org/jira/browse/SPARK-24150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] William Montaz updated SPARK-24150: --- Priority: Major (was: Minor) > Race condition in FsHistoryProvider > --- > > Key: SPARK-24150 > URL: https://issues.apache.org/jira/browse/SPARK-24150 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: William Montaz >Priority: Major > > There exist a race condition in checkLogs method between threads of > replayExecutor. They use the field "applications" to synchronise, but they > also update that field. > The problem is that threads will eventually synchronise on different monitors > (because they will synchronise on different objects which references that > have been assigned to "applications"), breaking the initial synchronisation > intent. This has even greater chance to reproduce when number_new_log_files > > replayExecutor_pool_size > Workaround: > * use a permanent object as a monitor on which to synchronise (or > synchronise on `this`) > * keep volatile field for all other read accesses -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24150) Race condition in FsHistoryProvider
[ https://issues.apache.org/jira/browse/SPARK-24150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] William Montaz updated SPARK-24150: --- Description: There exist a race condition in checkLogs method between threads of replayExecutor. They use the field "applications" to synchronise, but they also update that field. The problem is that threads will eventually synchronise on different monitors (because they will synchronise on different objects which references that have been assigned to "applications"), breaking the initial synchronisation intent. This has even greater chance to reproduce when number_new_log_files > replayExecutor_pool_size Workaround: * use a permanent object as a monitor on which to synchronise (or synchronise on `this`) * keep volatile field for all other read accesses was: There exist a race condition in checkLogs method between threads of replayExecutor. They use the field "applications" to synchronise, but they also update that field. The problem is that if the number of tasks (the number of new log files to replay and add to the applications list) is greater than the number of threads in the pool, threads will eventually synchronise on different monitors (because they will synchronise on different objects which references that have been assigned to "applications"), breaking the initial synchronisation intent. Workaround: * use a permanent object as a monitor on which to synchronise (or synchronise on `this`) * keep volatile field for all other read accesses > Race condition in FsHistoryProvider > --- > > Key: SPARK-24150 > URL: https://issues.apache.org/jira/browse/SPARK-24150 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: William Montaz >Priority: Minor > > There exist a race condition in checkLogs method between threads of > replayExecutor. They use the field "applications" to synchronise, but they > also update that field. > The problem is that threads will eventually synchronise on different monitors > (because they will synchronise on different objects which references that > have been assigned to "applications"), breaking the initial synchronisation > intent. This has even greater chance to reproduce when number_new_log_files > > replayExecutor_pool_size > Workaround: > * use a permanent object as a monitor on which to synchronise (or > synchronise on `this`) > * keep volatile field for all other read accesses -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24150) Race condition in FsHistoryProvider
[ https://issues.apache.org/jira/browse/SPARK-24150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] William Montaz updated SPARK-24150: --- Description: There exist a race condition in checkLogs method between threads of replayExecutor. They use the field "applications" to synchronise, but they also update that field. The problem is that if the number of tasks (the number of new log files to replay and add to the applications list) is greater than the number of threads in the pool, threads will eventually synchronise on different monitors (because they will synchronise on different objects which references that have been assigned to "applications"), breaking the initial synchronisation intent. Workaround: * use a permanent object as a monitor on which to synchronise (or synchronise on `this`) * keep volatile field for all other read accesses was: There exist a race condition in checkLogs method between threads of replayExecutor. They use the field "applications" to synchronise, but they also update that field. The problem is that if the number of tasks (the number of new log files to replay and add to the applications list) is greater than the number of threads in the pool, there is a great chance that a thread will try to synchronise on an updated version of applications (since it is volatile and updated) while some are still being synchronised on an old reference of applications. There the race condition happens. Workaround: * use a permanent object as a monitor on which to synchronise (or synchronise on `this`) * keep volatile field for all other read accesses > Race condition in FsHistoryProvider > --- > > Key: SPARK-24150 > URL: https://issues.apache.org/jira/browse/SPARK-24150 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: William Montaz >Priority: Minor > > There exist a race condition in checkLogs method between threads of > replayExecutor. They use the field "applications" to synchronise, but they > also update that field. > The problem is that if the number of tasks (the number of new log files to > replay and add to the applications list) is greater than the number of > threads in the pool, threads will eventually synchronise on different > monitors (because they will synchronise on different objects which references > that have been assigned to "applications"), breaking the initial > synchronisation intent. > Workaround: > * use a permanent object as a monitor on which to synchronise (or > synchronise on `this`) > * keep volatile field for all other read accesses -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24150) Race condition in FsHistoryProvider
[ https://issues.apache.org/jira/browse/SPARK-24150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] William Montaz updated SPARK-24150: --- Description: There exist a race condition in checkLogs method between threads of replayExecutor. They use the field "applications" to synchronise, but they also update that field. The problem is that if the number of tasks (the number of new log files to replay and add to the applications list) is greater than the number of threads in the pool, there is a great chance that a thread will try to synchronise on an updated version of applications (since it is volatile and updated) while some are still being synchronised on an old reference of applications. There the race condition happens. Workaround: * use a permanent object as a monitor on which to synchronise (or synchronise on `this`) * keep volatile field for all other read accesses was: There exist a race condition between the method checkLogs and cleanLogs. cleanLogs can read the field applications while it is concurrently processed by checkLogs. It is possible that checkLogs added new fetched logs, sets applications and this is erased by cleanLogs having an old version of applications. The problem is that the fetched log won't appear in applications anymore and it will then be impossible to display the corresponding application in the History Server, since it must be in the LinkedList applications. Workaround: * use a permanent object as a monitor on which to synchronise * keep volatile field for all other read accesses > Race condition in FsHistoryProvider > --- > > Key: SPARK-24150 > URL: https://issues.apache.org/jira/browse/SPARK-24150 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: William Montaz >Priority: Minor > > There exist a race condition in checkLogs method between threads of > replayExecutor. They use the field "applications" to synchronise, but they > also update that field. > The problem is that if the number of tasks (the number of new log files to > replay and add to the applications list) is greater than the number of > threads in the pool, there is a great chance that a thread will try to > synchronise on an updated version of applications (since it is volatile and > updated) while some are still being synchronised on an old reference of > applications. There the race condition happens. > Workaround: > * use a permanent object as a monitor on which to synchronise (or > synchronise on `this`) > * keep volatile field for all other read accesses -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-22918) sbt test (spark - local) fail after upgrading to 2.2.1 with: java.security.AccessControlException: access denied org.apache.derby.security.SystemPermission( "engine",
[ https://issues.apache.org/jira/browse/SPARK-22918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461250#comment-16461250 ] Sam Garrett commented on SPARK-22918: - +1 same issue > sbt test (spark - local) fail after upgrading to 2.2.1 with: > java.security.AccessControlException: access denied > org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" ) > > > Key: SPARK-22918 > URL: https://issues.apache.org/jira/browse/SPARK-22918 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.1 >Reporter: Damian Momot >Priority: Major > > After upgrading 2.2.0 -> 2.2.1 sbt test command in one of my projects started > to fail with following exception: > {noformat} > java.security.AccessControlException: access denied > org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" ) > at > java.security.AccessControlContext.checkPermission(AccessControlContext.java:472) > at > java.security.AccessController.checkPermission(AccessController.java:884) > at > org.apache.derby.iapi.security.SecurityUtil.checkDerbyInternalsPrivilege(Unknown > Source) > at org.apache.derby.iapi.services.monitor.Monitor.startMonitor(Unknown > Source) > at org.apache.derby.iapi.jdbc.JDBCBoot$1.run(Unknown Source) > at java.security.AccessController.doPrivileged(Native Method) > at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source) > at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source) > at org.apache.derby.jdbc.EmbeddedDriver.boot(Unknown Source) > at org.apache.derby.jdbc.EmbeddedDriver.(Unknown Source) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at java.lang.Class.newInstance(Class.java:442) > at > org.datanucleus.store.rdbms.connectionpool.AbstractConnectionPoolFactory.loadDriver(AbstractConnectionPoolFactory.java:47) > at > org.datanucleus.store.rdbms.connectionpool.BoneCPConnectionPoolFactory.createConnectionPool(BoneCPConnectionPoolFactory.java:54) > at > org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:238) > at > org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:131) > at > org.datanucleus.store.rdbms.ConnectionFactoryImpl.(ConnectionFactoryImpl.java:85) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631) > at > org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:325) > at > org.datanucleus.store.AbstractStoreManager.registerConnectionFactory(AbstractStoreManager.java:282) > at > org.datanucleus.store.AbstractStoreManager.(AbstractStoreManager.java:240) > at > org.datanucleus.store.rdbms.RDBMSStoreManager.(RDBMSStoreManager.java:286) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631) > at > org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301) > at > org.datanucleus.NucleusContext.createStoreManagerForProperties(NucleusContext.java:1187) > at org.datanucleus.NucleusContext.initialise(NucleusContext.java:356) > at > org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:775) > at > org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:333) > at