[jira] [Commented] (SPARK-29106) Add jenkins arm test for spark
[ https://issues.apache.org/jira/browse/SPARK-29106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957603#comment-16957603 ] zhao bo commented on SPARK-29106: - Hi [~shaneknapp], Sorry to disturb you. I have some questions about the following work that I want to discuss with you; I list them below. # For the pyspark tests, you mentioned we didn't install any python debs for testing. Is there a "requirements.txt" or "test-requirements.txt" in the spark repo? I failed to find them. When we tested pyspark before, we only realized that we needed to install the numpy package with pip because the failure message told us so when we executed the pyspark test scripts. So when you mentioned "pyspark testing debs" before, did you mean that we should figure them all out manually? Do you have any suggestions? # For the sparkR tests, we compiled a newer R version, 3.6.1, by fixing many library dependencies, and made it work. We then executed the R test scripts until all of them passed. So we wonder about the difficulties this test will face when it truly runs in amplab; could you please share more with us? # For the current periodic jobs, you said they will be triggered 2 times per day, and each build costs at most 11 hours. I have a thought about the next job deployment and would like to hear your opinion: we could set up 2 jobs per day, one being the current maven UT test triggered by SCM changes (11h), the other running the pyspark and sparkR tests, also triggered by SCM changes (including the spark build and tests, which may cost 5-6 hours). How about this? We can talk and discuss further if we don't yet realize how difficult these are. Thanks very much, shane. And I hope you can reply when you are free. ;) > Add jenkins arm test for spark > -- > > Key: SPARK-29106 > URL: https://issues.apache.org/jira/browse/SPARK-29106 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 3.0.0 >Reporter: huangtianhua >Priority: Minor > > Add arm test jobs to amplab jenkins for spark. > Till now we have made two arm test periodic jobs for spark in OpenLab: one is > based on master with hadoop 2.7 (similar to the QA test of amplab jenkins), the > other is based on a new branch which we made on 09-09, see > [http://status.openlabtesting.org/builds/job/spark-master-unit-test-hadoop-2.7-arm64] > and > [http://status.openlabtesting.org/builds/job/spark-unchanged-branch-unit-test-hadoop-2.7-arm64]. > We only have to care about the first one when integrating the arm test with amplab > jenkins. > About the k8s test on arm, we have tested it, see > [https://github.com/theopenlab/spark/pull/17]; maybe we can integrate it > later. > And we plan to test other stable branches too, and we can integrate them into > amplab when they are ready. > We have offered an arm instance and sent the info to shane knapp; thanks to shane > for adding the first arm job to amplab jenkins :) > The other important thing is about leveldbjni > [https://github.com/fusesource/leveldbjni] (see > [https://github.com/fusesource/leveldbjni/issues/80]): spark depends on leveldbjni-all-1.8 > [https://mvnrepository.com/artifact/org.fusesource.leveldbjni/leveldbjni-all/1.8], > and we can see there is no arm64 support. So we built an arm64-supporting > release of leveldbjni, see > [https://mvnrepository.com/artifact/org.openlabtesting.leveldbjni/leveldbjni-all/1.8], > but we can't modify the spark pom.xml directly with something like a > 'property'/'profile' to choose the correct jar package on the arm or x86 platform, > because spark depends on some hadoop packages like hadoop-hdfs, and those packages > depend on leveldbjni-all-1.8 too, unless hadoop releases a new arm-supporting > leveldbjni jar. For now we download the leveldbjni-all-1.8 of > openlabtesting and 'mvn install' it when arm testing for spark. > PS: The issues found and fixed: > SPARK-28770 > [https://github.com/apache/spark/pull/25673] > SPARK-28519 > [https://github.com/apache/spark/pull/25279] > SPARK-28433 > [https://github.com/apache/spark/pull/25186] > SPARK-28467 > [https://github.com/apache/spark/pull/25864] > SPARK-29286 > [https://github.com/apache/spark/pull/26021] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-29093) Remove automatically generated param setters in _shared_params_code_gen.py
[ https://issues.apache.org/jira/browse/SPARK-29093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-29093: Assignee: Huaxin Gao > Remove automatically generated param setters in _shared_params_code_gen.py > -- > > Key: SPARK-29093 > URL: https://issues.apache.org/jira/browse/SPARK-29093 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.0.0 >Reporter: zhengruifeng >Assignee: Huaxin Gao >Priority: Major > > The main difference between the scala and py sides comes from the automatically > generated param setters in _shared_params_code_gen.py. > To keep them in sync, we should remove those setters from _shared_.py, and add > the corresponding setters manually. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
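For context on the pattern being adopted, a minimal sketch of the explicit-setter style used on the Scala side, which the Python params would mirror once the generated setters are removed. The class ExampleParams and its scaffolding are hypothetical illustrations against the public org.apache.spark.ml.param API, not the actual change:

{code}
import org.apache.spark.ml.param.{IntParam, ParamMap, Params}
import org.apache.spark.ml.util.Identifiable

// Hypothetical params holder showing a hand-written setter, the style the
// py side would copy once the codegen'd setters are gone.
class ExampleParams(override val uid: String) extends Params {
  def this() = this(Identifiable.randomUID("example"))

  final val maxIter = new IntParam(this, "maxIter", "maximum number of iterations (>= 0)")

  // Written out manually instead of being auto-generated:
  def setMaxIter(value: Int): this.type = set(maxIter, value)

  def getMaxIter: Int = $(maxIter)

  override def copy(extra: ParamMap): ExampleParams = defaultCopy(extra)
}

// e.g. new ExampleParams().setMaxIter(10).getMaxIter returns 10
{code}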
[jira] [Commented] (SPARK-29093) Remove automatically generated param setters in _shared_params_code_gen.py
[ https://issues.apache.org/jira/browse/SPARK-29093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957601#comment-16957601 ] zhengruifeng commented on SPARK-29093: -- [~huaxingao] Thanks! > Remove automatically generated param setters in _shared_params_code_gen.py > -- > > Key: SPARK-29093 > URL: https://issues.apache.org/jira/browse/SPARK-29093 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.0.0 >Reporter: zhengruifeng >Priority: Major > > The main difference between the scala and py sides comes from the automatically > generated param setters in _shared_params_code_gen.py. > To keep them in sync, we should remove those setters from _shared_.py, and add > the corresponding setters manually. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23171) Reduce the time costs of the rule runs that do not change the plans
[ https://issues.apache.org/jira/browse/SPARK-23171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957593#comment-16957593 ] Takeshi Yamamuro commented on SPARK-23171: -- oh, nice, the performance looks much better. > Reduce the time costs of the rule runs that do not change the plans > > > Key: SPARK-23171 > URL: https://issues.apache.org/jira/browse/SPARK-23171 > Project: Spark > Issue Type: Umbrella > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Priority: Major > Labels: bulk-closed > > Below are the time stats of the Analyzer/Optimizer rules. Try to improve the rules and reduce the time costs, especially for the runs that do not change the plans.
> {noformat}
> === Metrics of Analyzer/Optimizer Rules ===
> Total number of runs = 175827
> Total time: 20.699042877 seconds
> Rule                                                                             Total Time  Effective Time  Total Runs  Effective Runs
> org.apache.spark.sql.catalyst.optimizer.ColumnPruning                            2340563794  1338268224      1875        761
> org.apache.spark.sql.catalyst.analysis.Analyzer$CTESubstitution                  1632672623  1625071881      788         37
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggregateFunctions        1395087131  347339931       1982        38
> org.apache.spark.sql.catalyst.optimizer.PruneFilters                             1177711364  21344174        1590        3
> org.apache.spark.sql.catalyst.optimizer.Optimizer$OptimizeSubqueries             1145135465  1131417128      285         39
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences                1008347217  663112062       1982        616
> org.apache.spark.sql.catalyst.optimizer.ReorderJoin                              767024424   693001699       1590        132
> org.apache.spark.sql.catalyst.analysis.Analyzer$FixNullability                   598524650   40802876        742         12
> org.apache.spark.sql.catalyst.analysis.DecimalPrecision                          595384169   436153128       1982        211
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubquery                  548178270   459695885       1982        49
> org.apache.spark.sql.catalyst.analysis.TypeCoercion$ImplicitTypeCasts            423002864   139869503       1982        86
> org.apache.spark.sql.catalyst.optimizer.BooleanSimplification                    405544962   17250184        1590        7
> org.apache.spark.sql.catalyst.optimizer.PushPredicateThroughJoin                 383837603   284174662       1590        708
> org.apache.spark.sql.catalyst.optimizer.RemoveRedundantAliases                   372901885   3362332         1590        9
> org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints              364628214   343815519       285         192
> org.apache.spark.sql.execution.datasources.FindDataSourceTable                   303293296   285344766       1982        233
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions                 233195019   92648171        1982        294
> org.apache.spark.sql.catalyst.analysis.TypeCoercion$FunctionArgumentConversion   220568919   73932736        1982        38
> org.apache.spark.sql.catalyst.optimizer.NullPropagation                          207976072   9072305
[jira] [Commented] (SPARK-23171) Reduce the time costs of the rule runs that do not change the plans
[ https://issues.apache.org/jira/browse/SPARK-23171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957591#comment-16957591 ] Yuming Wang commented on SPARK-23171: - This is a real SQL query from our production. Spark 2.3.4:
{noformat}
=== Metrics of Analyzer/Optimizer Rules ===
Total number of runs: 1602
Total time: 25.87935196 seconds
Rule                                                                             Effective Time / Total Time  Effective Runs / Total Runs
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations                 12560629829 / 12561649545    4 / 21
org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions                  0 / 10442916205              0 / 5
org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions             1655041748 / 1655084280      1 / 2
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences                217766453 / 256617622        8 / 21
org.apache.spark.sql.catalyst.analysis.DecimalPrecision                          48636897 / 68610147          4 / 21
org.apache.spark.sql.catalyst.optimizer.ColumnPruning                            16638517 / 53422588          1 / 15
org.apache.spark.sql.catalyst.analysis.TypeCoercion$FunctionArgumentConversion   26295695 / 50081268          2 / 21
org.apache.spark.sql.catalyst.analysis.TypeCoercion$ImplicitTypeCasts            0 / 49518989                 0 / 21
org.apache.spark.sql.catalyst.analysis.TypeCoercion$PromoteStrings               24587790 / 49437868          2 / 21
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions                 16488193 / 32838168          8 / 21
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator                 0 / 32290369                 0 / 21
org.apache.spark.sql.catalyst.analysis.ResolveTimeZone                           18041546 / 29396487          10 / 21
org.apache.spark.sql.catalyst.analysis.Analyzer$FixNullability                   0 / 28650276                 0 / 5
org.apache.spark.sql.catalyst.analysis.TypeCoercion$DateTimeOperations           0 / 26619605                 0 / 21
org.apache.spark.sql.catalyst.analysis.TypeCoercion$InConversion                 0 / 26206521                 0 / 21
org.apache.spark.sql.catalyst.analysis.TypeCoercion$IfCoercion                   0 / 25036412                 0 / 21
org.apache.spark.sql.catalyst.analysis.TypeCoercion$BooleanEquality              0 / 24896919                 0 / 21
org.apache.spark.sql.catalyst.analysis.TypeCoercion$Division                     0 / 23821725                 0 / 21
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowFrame               0 / 22621115                 0 / 21
org.apache.spark.sql.catalyst.analysis.ResolveCreateNamedStruct                  0 / 22397612                 0 / 21
org.apache.spark.sql.catalyst.analysis.EliminateView                             22255584 / 22286242          1 / 2
org.apache.spark.sql.catalyst.analysis.TypeCoercion$EltCoercion                  0 / 21244351                 0 / 21
org.apache.spark.sql.catalyst.analysis.TypeCoercion$StackCoercion                0 / 21032406                 0 / 21
org.apache.spark.sql.catalyst.analysis.TypeCoercion$WindowFrameCoercion          0 / 20834511                 0 / 21
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowOrder               0 / 20644371                 0 / 21
org.apache.spark.sql.catalyst.analysis.TypeCoercion$ConcatCoercion               0 / 20097683                 0 / 21
org.apache.spark.sql.catalyst.analysis.TimeWindowing                             0 / 19899978                 0 / 21
org.apache.spark.sql.catalyst.analysis.TypeCoercion$CaseWhenCoercion             0 / 19819768                 0 / 21
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubquery                  0 / 18257140                 0 / 21
org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates                 0 / 17304713                 0 / 21
org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints              11616056 / 11622509          1 / 2
org.apache.spark.sql.catalyst.optimizer.PushPredicateThroughJoin                 5286165 / 8730109            8 / 13
org.ap
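As a side note on how reports like the two above are produced: Catalyst accumulates these timings in RuleExecutor, and dumpTimeSpent() renders the table. A minimal spark-shell sketch follows; note this is an internal API, so the exact names may differ across versions (resetMetrics() in particular may not exist in older releases):

{code}
import org.apache.spark.sql.catalyst.rules.RuleExecutor

RuleExecutor.resetMetrics()                    // clear timings accumulated so far
spark.range(10).selectExpr("id * 2").collect() // run the query under investigation
println(RuleExecutor.dumpTimeSpent())          // print the per-rule metrics table
{code}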
[jira] [Updated] (SPARK-29145) Spark SQL cannot handle "NOT IN" condition when using "JOIN"
[ https://issues.apache.org/jira/browse/SPARK-29145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29145: -- Affects Version/s: 2.1.3 > Spark SQL cannot handle "NOT IN" condition when using "JOIN" > > > Key: SPARK-29145 > URL: https://issues.apache.org/jira/browse/SPARK-29145 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.3, 2.2.3, 2.3.4, 2.4.4 >Reporter: Dezhi Cai >Priority: Minor > > sample sql: > {code} > spark.range(10).createOrReplaceTempView("A") > spark.range(10).createOrReplaceTempView("B") > spark.range(10).createOrReplaceTempView("C") > sql("""select * from A inner join B on A.id=B.id and A.id NOT IN (select id > from C)""") > {code} > > {code} > org.apache.spark.sql.AnalysisException: Table or view not found: C; line 1 > pos 74; > 'Project [*] > +- 'Join Inner, ((id#0L = id#2L) AND NOT id#0L IN (list#6 [])) >: +- 'Project ['id] >: +- 'UnresolvedRelation [C] >:- SubqueryAlias `a` >: +- Range (0, 10, step=1, splits=Some(12)) >+- SubqueryAlias `b` > +- Range (0, 10, step=1, splits=Some(12)) > at > org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:94) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:89) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:155) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:154) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:154) > at scala.collection.immutable.List.foreach(List.scala:392) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:154) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:89) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:86) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:120) > ... > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29145) Spark SQL cannot handle "NOT IN" condition when using "JOIN"
[ https://issues.apache.org/jira/browse/SPARK-29145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29145: -- Affects Version/s: 2.2.3 > Spark SQL cannot handle "NOT IN" condition when using "JOIN" > > > Key: SPARK-29145 > URL: https://issues.apache.org/jira/browse/SPARK-29145 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.3, 2.3.4, 2.4.4 >Reporter: Dezhi Cai >Priority: Minor > > sample sql: > {code} > spark.range(10).createOrReplaceTempView("A") > spark.range(10).createOrReplaceTempView("B") > spark.range(10).createOrReplaceTempView("C") > sql("""select * from A inner join B on A.id=B.id and A.id NOT IN (select id > from C)""") > {code} > > {code} > org.apache.spark.sql.AnalysisException: Table or view not found: C; line 1 > pos 74; > 'Project [*] > +- 'Join Inner, ((id#0L = id#2L) AND NOT id#0L IN (list#6 [])) >: +- 'Project ['id] >: +- 'UnresolvedRelation [C] >:- SubqueryAlias `a` >: +- Range (0, 10, step=1, splits=Some(12)) >+- SubqueryAlias `b` > +- Range (0, 10, step=1, splits=Some(12)) > at > org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:94) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:89) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:155) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:154) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:154) > at scala.collection.immutable.List.foreach(List.scala:392) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:154) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:89) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:86) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:120) > ... > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29145) Spark SQL cannot handle "NOT IN" condition when using "JOIN"
[ https://issues.apache.org/jira/browse/SPARK-29145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29145: -- Affects Version/s: 2.3.4 > Spark SQL cannot handle "NOT IN" condition when using "JOIN" > > > Key: SPARK-29145 > URL: https://issues.apache.org/jira/browse/SPARK-29145 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.4, 2.4.4 >Reporter: Dezhi Cai >Priority: Minor > > sample sql: > {code} > spark.range(10).createOrReplaceTempView("A") > spark.range(10).createOrReplaceTempView("B") > spark.range(10).createOrReplaceTempView("C") > sql("""select * from A inner join B on A.id=B.id and A.id NOT IN (select id > from C)""") > {code} > > {code} > org.apache.spark.sql.AnalysisException: Table or view not found: C; line 1 > pos 74; > 'Project [*] > +- 'Join Inner, ((id#0L = id#2L) AND NOT id#0L IN (list#6 [])) >: +- 'Project ['id] >: +- 'UnresolvedRelation [C] >:- SubqueryAlias `a` >: +- Range (0, 10, step=1, splits=Some(12)) >+- SubqueryAlias `b` > +- Range (0, 10, step=1, splits=Some(12)) > at > org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:94) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:89) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:155) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:154) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:154) > at scala.collection.immutable.List.foreach(List.scala:392) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:154) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:89) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:86) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:120) > ... > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
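Until the analyzer resolves subqueries inside join conditions, one possible workaround for the inner-join case reported above (a sketch, not an official fix) is to move the NOT IN predicate from the ON clause to WHERE, which is equivalent for an inner join:

{code}
spark.range(10).createOrReplaceTempView("A")
spark.range(10).createOrReplaceTempView("B")
spark.range(10).createOrReplaceTempView("C")

// The NOT IN subquery moves from the join condition to WHERE,
// which is equivalent for an inner join and analyzes successfully.
sql("""select * from A inner join B on A.id = B.id
      |where A.id NOT IN (select id from C)""".stripMargin)
{code}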
[jira] [Updated] (SPARK-29551) There is a bug about fetch failed when an executor lost
[ https://issues.apache.org/jira/browse/SPARK-29551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-29551: - Description: There is a regression when an executor is lost and this then causes a 'fetch failed'. When an executor is lost for some reason (e.g. the external shuffle service on the executor's host is lost, or the host itself is lost) and the loss happens just as a reduce stage gets a fetch failure from it, which is really bad, the previous code only calls mapOutputTracker.unregisterMapOutput(shuffleId, mapIndex, bmAddress) to mark one map output as broken in the map stage. But the other map outputs on that executor are not available either, and they can only be resubmitted by a nested stage retry, which is the regression. As we all know, the previous code calls mapOutputTracker.removeOutputsOnHost(host) or mapOutputTracker.removeOutputsOnExecutor(execId) when a reduce stage fetch fails and the executor is active, while it does NOT do so for the problem above. So we should distinguish the failedEpoch of 'executor lost' from the fetchFailedEpoch of 'fetch failed' to solve the above problem. We can add a unit test in 'DAGSchedulerSuite.scala' to catch the above problem.
{code}
test("All shuffle files on the slave should be cleaned up when slave lost test") {
  // reset the test context with the right shuffle service config
  afterEach()
  val conf = new SparkConf()
  conf.set(config.SHUFFLE_SERVICE_ENABLED.key, "true")
  conf.set("spark.files.fetchFailure.unRegisterOutputOnHost", "true")
  init(conf)
  runEvent(ExecutorAdded("exec-hostA1", "hostA"))
  runEvent(ExecutorAdded("exec-hostA2", "hostA"))
  runEvent(ExecutorAdded("exec-hostB", "hostB"))
  val firstRDD = new MyRDD(sc, 3, Nil)
  val firstShuffleDep = new ShuffleDependency(firstRDD, new HashPartitioner(3))
  val firstShuffleId = firstShuffleDep.shuffleId
  val shuffleMapRdd = new MyRDD(sc, 3, List(firstShuffleDep))
  val shuffleDep = new ShuffleDependency(shuffleMapRdd, new HashPartitioner(3))
  val secondShuffleId = shuffleDep.shuffleId
  val reduceRdd = new MyRDD(sc, 1, List(shuffleDep))
  submit(reduceRdd, Array(0))
  // map stage1 completes successfully, with one task on each executor
  complete(taskSets(0), Seq(
    (Success, MapStatus(
      BlockManagerId("exec-hostA1", "hostA", 12345), Array.fill[Long](1)(2), mapTaskId = 5)),
    (Success, MapStatus(
      BlockManagerId("exec-hostA2", "hostA", 12345), Array.fill[Long](1)(2), mapTaskId = 6)),
    (Success, makeMapStatus("hostB", 1, mapTaskId = 7))
  ))
  // map stage2 completes successfully, with one task on each executor
  complete(taskSets(1), Seq(
    (Success, MapStatus(
      BlockManagerId("exec-hostA1", "hostA", 12345), Array.fill[Long](1)(2), mapTaskId = 8)),
    (Success, MapStatus(
      BlockManagerId("exec-hostA2", "hostA", 12345), Array.fill[Long](1)(2), mapTaskId = 9)),
    (Success, makeMapStatus("hostB", 1, mapTaskId = 10))
  ))
  // make sure our test setup is correct
  val initialMapStatus1 = mapOutputTracker.shuffleStatuses(firstShuffleId).mapStatuses
  // val initialMapStatus1 = mapOutputTracker.mapStatuses.get(0).get
  assert(initialMapStatus1.count(_ != null) === 3)
  assert(initialMapStatus1.map{_.location.executorId}.toSet ===
    Set("exec-hostA1", "exec-hostA2", "exec-hostB"))
  assert(initialMapStatus1.map{_.mapId}.toSet === Set(5, 6, 7))
  val initialMapStatus2 = mapOutputTracker.shuffleStatuses(secondShuffleId).mapStatuses
  // val initialMapStatus1 = mapOutputTracker.mapStatuses.get(0).get
  assert(initialMapStatus2.count(_ != null) === 3)
  assert(initialMapStatus2.map{_.location.executorId}.toSet ===
    Set("exec-hostA1", "exec-hostA2", "exec-hostB"))
  assert(initialMapStatus2.map{_.mapId}.toSet === Set(8, 9, 10))
  // kill exec-hostA2
  runEvent(ExecutorLost("exec-hostA2", ExecutorKilled))
  // reduce stage fails with a fetch failure from map stage from exec-hostA2
  complete(taskSets(2), Seq(
    (FetchFailed(BlockManagerId("exec-hostA2", "hostA", 12345),
      secondShuffleId, 0L, 0, 0, "ignored"), null)
  ))
  // Here is the main assertion -- make sure that we de-register
  // the map outputs for both map stages from both executors on hostA
  val mapStatus1 = mapOutputTracker.shuffleStatuses(firstShuffleId).mapStatuses
  assert(mapStatus1.count(_ != null) === 1)
  assert(mapStatus1(2).location.executorId === "exec-hostB")
  assert(mapStatus1(2).location.host === "hostB")
  val mapStatus2 = mapOutputTracker.shuffleStatuses(secondShuffleId).mapStatuses
  assert(mapStatus2.count(_ != null) === 1)
  assert(mapStatus2(2).location.executorId === "exec-hostB")
  assert(mapStatus2(2).location.host === "hostB")
}
{code}
The error output is:
{code}
3 did not equal 1
ScalaTestFailureLocation: org.apache.spark.sched
[jira] [Closed] (SPARK-28925) Update Kubernetes-client to 4.4.2 to be compatible with Kubernetes 1.13 and 1.14
[ https://issues.apache.org/jira/browse/SPARK-28925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun closed SPARK-28925. - > Update Kubernetes-client to 4.4.2 to be compatible with Kubernetes 1.13 and > 1.14 > > > Key: SPARK-28925 > URL: https://issues.apache.org/jira/browse/SPARK-28925 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.3.3, 2.4.3 >Reporter: Eric >Priority: Minor > > Hello, > If you use Spark with Kubernetes 1.13 or 1.14 you will see this error: > {code:java} > {"time": "2019-08-28T09:56:11.866Z", "lvl":"INFO", "logger": > "org.apache.spark.internal.Logging", > "thread":"kubernetes-executor-snapshots-subscribers-0","msg":"Going to > request 1 executors from Kubernetes."} > {"time": "2019-08-28T09:56:12.028Z", "lvl":"WARN", "logger": > "io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2", > "thread":"OkHttp https://kubernetes.default.svc/...","msg":"Exec Failure: > HTTP 403, Status: 403 - "} > java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden' > {code} > Apparently the bug is fixed here: > [https://github.com/fabric8io/kubernetes-client/pull/1669] > We have currently compiled Spark source code with Kubernetes-client 4.4.2 and > it's working great on our cluster. We are using Kubernetes 1.13.10. > > Could it be possible to update that dependency version? > > Thanks! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-29071) Upgrade Scala to 2.12.10
[ https://issues.apache.org/jira/browse/SPARK-29071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun closed SPARK-29071. - > Upgrade Scala to 2.12.10 > > > Key: SPARK-29071 > URL: https://issues.apache.org/jira/browse/SPARK-29071 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.0.0 >Reporter: Xiao Li >Priority: Major > > Supposed to compile another 5-10% faster than the 2.12.8 we're on now: > * [https://github.com/scala/scala/releases/tag/v2.12.9] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-29446) Upgrade netty-all to 4.1.42 and fix vulnerabilities.
[ https://issues.apache.org/jira/browse/SPARK-29446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun closed SPARK-29446. - > Upgrade netty-all to 4.1.42 and fix vulnerabilities. > > > Key: SPARK-29446 > URL: https://issues.apache.org/jira/browse/SPARK-29446 > Project: Spark > Issue Type: Dependency upgrade > Components: Build >Affects Versions: 3.0.0 >Reporter: jiaan.geng >Priority: Major > > The current code uses io.netty:netty-all:jar:4.1.17, which has a security > vulnerability. We can get the security info from > [https://www.tenable.com/cve/CVE-2019-16869]. > This reference recommends upgrading netty-all to 4.1.42 or later. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-29495) Add ability to estimate per
[ https://issues.apache.org/jira/browse/SPARK-29495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun closed SPARK-29495. - > Add ability to estimate per > --- > > Key: SPARK-29495 > URL: https://issues.apache.org/jira/browse/SPARK-29495 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 2.4.4 >Reporter: Chris Nardi >Priority: Major > > In gensim, [the LDA model|https://radimrehurek.com/gensim/models/ldamodel.html] has a parameter > eval_every that allows a user to specify that the model should be evaluated > every X iterations to determine its log perplexity. This helps to determine > convergence of the model, and whether or not the proper number of iterations > has been chosen. Spark has no similar functionality in its implementation of > LDA. This should be added, as it appears the only way to achieve this > functionality currently is to train models with varying numbers of iterations and > evaluate each one's log perplexity. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
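The manual approach the description calls the only current option looks roughly like the sketch below, using the public org.apache.spark.ml.clustering API. This is just an illustration: `dataset` is assumed to be a DataFrame with a "features" vector column, and k=10 is arbitrary:

{code}
import org.apache.spark.ml.clustering.LDA

// Train at increasing iteration counts and watch logPerplexity to judge
// convergence, since Spark ML has no eval_every-style option.
for (iters <- Seq(10, 20, 50, 100)) {
  val model = new LDA().setK(10).setMaxIter(iters).fit(dataset)
  println(s"maxIter=$iters logPerplexity=${model.logPerplexity(dataset)}")
}
{code}

Each step retrains the model from scratch, which is exactly the inefficiency the ticket complains about.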
[jira] [Updated] (SPARK-27741) Transitivity on predicate pushdown
[ https://issues.apache.org/jira/browse/SPARK-27741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-27741: -- Fix Version/s: (was: 2.4.0) > Transitivity on predicate pushdown > --- > > Key: SPARK-27741 > URL: https://issues.apache.org/jira/browse/SPARK-27741 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.1.1 >Reporter: U Shaw >Priority: Major > > When using an inner join, WHERE conditions can be propagated into the join > condition; but when using an outer join, even if the conditions are the same, > the predicate is only pushed down to the left or right side. > For example: > select * from t1 left join t2 on t1.id=t2.id where t1.id=1 > --> select * from t1 left join t2 on t1.id=t2.id and t2.id=1 where t1.id=1 > Can Catalyst support transitivity on predicate pushdown? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
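To make the requested behavior concrete, a small sketch of where to look: run the query and check whether the optimized plan contains a Filter on the t2 side as well. The temp views here are stand-ins for real tables:

{code}
spark.range(10).createOrReplaceTempView("t1")
spark.range(10).createOrReplaceTempView("t2")

// With transitive pushdown, the optimized plan would show a Filter (id = 1)
// applied to t2 before the join, not only to t1.
spark.sql("select * from t1 left join t2 on t1.id = t2.id where t1.id = 1").explain(true)
{code}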
[jira] [Updated] (SPARK-29193) Update fabric8 version to 4.3 continue docker 4 desktop support
[ https://issues.apache.org/jira/browse/SPARK-29193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29193: -- Fix Version/s: (was: 3.0.0) > Update fabric8 version to 4.3 continue docker 4 desktop support > --- > > Key: SPARK-29193 > URL: https://issues.apache.org/jira/browse/SPARK-29193 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: Holden Karau >Priority: Blocker > > The current version of the kubernetes client we are using has some issues > with not setting origin ( > [https://github.com/fabric8io/kubernetes-client/issues/1667] ) which cause > failures on new versions of Docker 4 Desktop Kubernetes. > > This is fixed in 4.3-snapshot, so we will need to wait for the 4.3 release or > backport this. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-29547) Make `docker-integration-tests` work in JDK11
[ https://issues.apache.org/jira/browse/SPARK-29547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun closed SPARK-29547. - > Make `docker-integration-tests` work in JDK11 > - > > Key: SPARK-29547 > URL: https://issues.apache.org/jira/browse/SPARK-29547 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > To support JDK11, SPARK-28737 upgraded `Jersey` to 2.29. However, it turns > out that `docker-integration-tests` is broken because > `com.spotify.docker-client` still depends on jersey-guava. > SPARK-29546 recovers the test suite in JDK8 by adding back the dependency. We > had better make this test suite work in JDK11 environment, too. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-29546) Recover jersey-guava test dependency in docker-integration-tests
[ https://issues.apache.org/jira/browse/SPARK-29546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-29546: - Assignee: Dongjoon Hyun > Recover jersey-guava test dependency in docker-integration-tests > > > Key: SPARK-29546 > URL: https://issues.apache.org/jira/browse/SPARK-29546 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > > While SPARK-28737 upgrades `Jersey` to 2.29, `docker-integration-tests` is > broken because `com.spotify.docker-client` depends on `jersey-guava`. The > latest `com.spotify.docker-client` is also still depending on that, too. > - https://mvnrepository.com/artifact/com.spotify/docker-client/5.0.2 > -> > https://mvnrepository.com/artifact/org.glassfish.jersey.core/jersey-client/2.19 > -> > https://mvnrepository.com/artifact/org.glassfish.jersey.core/jersey-common/2.19 > -> > https://mvnrepository.com/artifact/org.glassfish.jersey.bundles.repackaged/jersey-guava/2.19 > **AFTER** > {code} > build/mvn -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.12 > -Dtest=none > -DwildcardSuites=org.apache.spark.sql.jdbc.PostgresIntegrationSuite test > Tests: succeeded 6, failed 0, canceled 0, ignored 0, pending 0 > All tests passed. > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29546) Recover jersey-guava test dependency in docker-integration-tests
[ https://issues.apache.org/jira/browse/SPARK-29546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29546: -- Description: While SPARK-28737 upgrades `Jersey` to 2.29, `docker-integration-tests` is broken because `com.spotify.docker-client` depends on `jersey-guava`. The latest `com.spotify.docker-client` is also still depending on that, too. - https://mvnrepository.com/artifact/com.spotify/docker-client/5.0.2 -> https://mvnrepository.com/artifact/org.glassfish.jersey.core/jersey-client/2.19 -> https://mvnrepository.com/artifact/org.glassfish.jersey.core/jersey-common/2.19 -> https://mvnrepository.com/artifact/org.glassfish.jersey.bundles.repackaged/jersey-guava/2.19 **AFTER** {code} build/mvn -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.12 -Dtest=none -DwildcardSuites=org.apache.spark.sql.jdbc.PostgresIntegrationSuite test Tests: succeeded 6, failed 0, canceled 0, ignored 0, pending 0 All tests passed. {code} was: While SPARK-28737 upgrades `Jersey` to 2.29, `docker-integration-tests` is broken because `com.spotify.docker-client` depends on `jersey-guava`. The latest `com.spotify.docker-client` is also still depending on that, too. - https://mvnrepository.com/artifact/com.spotify/docker-client/5.0.2 -> https://mvnrepository.com/artifact/org.glassfish.jersey.core/jersey-client/2.19 -> https://mvnrepository.com/artifact/org.glassfish.jersey.core/jersey-common/2.19 -> https://mvnrepository.com/artifact/org.glassfish.jersey.bundles.repackaged/jersey-guava/2.19 **AFTER** {code} build/mvn -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.12 -Dtest=none -DwildcardSuites=org.apache.spark.sql.jdbc.PostgresIntegrationSuite test Tests: succeeded 6, failed 0, canceled 0, ignored 0, pending 0 All tests passed. {code} This is only for recovering JDBC integration test in JDK8 environment. For now, this is broken in both JDK8/11. After fixing, this will not work in JDK11. > Recover jersey-guava test dependency in docker-integration-tests > > > Key: SPARK-29546 > URL: https://issues.apache.org/jira/browse/SPARK-29546 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > While SPARK-28737 upgrades `Jersey` to 2.29, `docker-integration-tests` is > broken because `com.spotify.docker-client` depends on `jersey-guava`. The > latest `com.spotify.docker-client` is also still depending on that, too. > - https://mvnrepository.com/artifact/com.spotify/docker-client/5.0.2 > -> > https://mvnrepository.com/artifact/org.glassfish.jersey.core/jersey-client/2.19 > -> > https://mvnrepository.com/artifact/org.glassfish.jersey.core/jersey-common/2.19 > -> > https://mvnrepository.com/artifact/org.glassfish.jersey.bundles.repackaged/jersey-guava/2.19 > **AFTER** > {code} > build/mvn -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.12 > -Dtest=none > -DwildcardSuites=org.apache.spark.sql.jdbc.PostgresIntegrationSuite test > Tests: succeeded 6, failed 0, canceled 0, ignored 0, pending 0 > All tests passed. > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29547) Make `docker-integration-tests` work in JDK11
[ https://issues.apache.org/jira/browse/SPARK-29547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-29547. --- Resolution: Duplicate This is fixed by SPARK-29546 > Make `docker-integration-tests` work in JDK11 > - > > Key: SPARK-29547 > URL: https://issues.apache.org/jira/browse/SPARK-29547 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > To support JDK11, SPARK-28737 upgraded `Jersey` to 2.29. However, it turns > out that `docker-integration-tests` is broken because > `com.spotify.docker-client` still depends on jersey-guava. > SPARK-29546 recovers the test suite in JDK8 by adding back the dependency. We > had better make this test suite work in JDK11 environment, too. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29546) Recover jersey-guava test dependency in docker-integration-tests
[ https://issues.apache.org/jira/browse/SPARK-29546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29546: -- Parent: SPARK-29194 Issue Type: Sub-task (was: Bug) > Recover jersey-guava test dependency in docker-integration-tests > > > Key: SPARK-29546 > URL: https://issues.apache.org/jira/browse/SPARK-29546 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > While SPARK-28737 upgrades `Jersey` to 2.29, `docker-integration-tests` is > broken because `com.spotify.docker-client` depends on `jersey-guava`. The > latest `com.spotify.docker-client` is also still depending on that, too. > - https://mvnrepository.com/artifact/com.spotify/docker-client/5.0.2 > -> > https://mvnrepository.com/artifact/org.glassfish.jersey.core/jersey-client/2.19 > -> > https://mvnrepository.com/artifact/org.glassfish.jersey.core/jersey-common/2.19 > -> > https://mvnrepository.com/artifact/org.glassfish.jersey.bundles.repackaged/jersey-guava/2.19 > **AFTER** > {code} > build/mvn -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.12 > -Dtest=none > -DwildcardSuites=org.apache.spark.sql.jdbc.PostgresIntegrationSuite test > Tests: succeeded 6, failed 0, canceled 0, ignored 0, pending 0 > All tests passed. > {code} > This is only for recovering JDBC integration test in JDK8 environment. For > now, this is broken in both JDK8/11. After fixing, this will not work in > JDK11. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-29511) DataSourceV2: Support CREATE NAMESPACE
[ https://issues.apache.org/jira/browse/SPARK-29511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-29511: --- Assignee: Terry Kim > DataSourceV2: Support CREATE NAMESPACE > -- > > Key: SPARK-29511 > URL: https://issues.apache.org/jira/browse/SPARK-29511 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Major > > CREATE NAMESPACE needs to support v2 catalogs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29511) DataSourceV2: Support CREATE NAMESPACE
[ https://issues.apache.org/jira/browse/SPARK-29511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-29511. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26166 [https://github.com/apache/spark/pull/26166] > DataSourceV2: Support CREATE NAMESPACE > -- > > Key: SPARK-29511 > URL: https://issues.apache.org/jira/browse/SPARK-29511 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Major > Fix For: 3.0.0 > > > CREATE NAMESPACE needs to support v2 catalogs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-29563) CREATE TABLE LIKE should look up catalog/table like v2 commands
[ https://issues.apache.org/jira/browse/SPARK-29563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh reassigned SPARK-29563: --- Assignee: Dilip Biswal (was: Wenchen Fan) > CREATE TABLE LIKE should look up catalog/table like v2 commands > --- > > Key: SPARK-29563 > URL: https://issues.apache.org/jira/browse/SPARK-29563 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Dilip Biswal >Assignee: Dilip Biswal >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29106) Add jenkins arm test for spark
[ https://issues.apache.org/jira/browse/SPARK-29106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957508#comment-16957508 ] huangtianhua commented on SPARK-29106: -- [~shaneknapp], there are two small suggestions: # we don't have to download and install leveldbjni-all-1.8 in our arm test instance; we have already installed it and it is there. # maybe we can try to use 'mvn clean package' instead of 'mvn clean install'? > Add jenkins arm test for spark > -- > > Key: SPARK-29106 > URL: https://issues.apache.org/jira/browse/SPARK-29106 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 3.0.0 >Reporter: huangtianhua >Priority: Minor > > Add arm test jobs to amplab jenkins for spark. > Till now we have made two arm test periodic jobs for spark in OpenLab: one is > based on master with hadoop 2.7 (similar to the QA test of amplab jenkins), the > other is based on a new branch which we made on 09-09, see > [http://status.openlabtesting.org/builds/job/spark-master-unit-test-hadoop-2.7-arm64] > and > [http://status.openlabtesting.org/builds/job/spark-unchanged-branch-unit-test-hadoop-2.7-arm64]. > We only have to care about the first one when integrating the arm test with amplab > jenkins. > About the k8s test on arm, we have tested it, see > [https://github.com/theopenlab/spark/pull/17]; maybe we can integrate it > later. > And we plan to test other stable branches too, and we can integrate them into > amplab when they are ready. > We have offered an arm instance and sent the info to shane knapp; thanks to shane > for adding the first arm job to amplab jenkins :) > The other important thing is about leveldbjni > [https://github.com/fusesource/leveldbjni] (see > [https://github.com/fusesource/leveldbjni/issues/80]): spark depends on leveldbjni-all-1.8 > [https://mvnrepository.com/artifact/org.fusesource.leveldbjni/leveldbjni-all/1.8], > and we can see there is no arm64 support. So we built an arm64-supporting > release of leveldbjni, see > [https://mvnrepository.com/artifact/org.openlabtesting.leveldbjni/leveldbjni-all/1.8], > but we can't modify the spark pom.xml directly with something like a > 'property'/'profile' to choose the correct jar package on the arm or x86 platform, > because spark depends on some hadoop packages like hadoop-hdfs, and those packages > depend on leveldbjni-all-1.8 too, unless hadoop releases a new arm-supporting > leveldbjni jar. For now we download the leveldbjni-all-1.8 of > openlabtesting and 'mvn install' it when arm testing for spark. > PS: The issues found and fixed: > SPARK-28770 > [https://github.com/apache/spark/pull/25673] > SPARK-28519 > [https://github.com/apache/spark/pull/25279] > SPARK-28433 > [https://github.com/apache/spark/pull/25186] > SPARK-28467 > [https://github.com/apache/spark/pull/25864] > SPARK-29286 > [https://github.com/apache/spark/pull/26021] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-20880) When spark SQL is used with Avro-backed HIVE tables, NPE from org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.supportedCategories.
[ https://issues.apache.org/jira/browse/SPARK-20880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957491#comment-16957491 ] Benjamyn Ward edited comment on SPARK-20880 at 10/23/19 2:05 AM: - Gentle ping. While the description states that the issue is fixed in Hive 2.2, based on the Hive Jira, the issue was fixed in version 2.3.0. * https://issues.apache.org/jira/browse/HIVE-16175 I am also running into this issue. I am going to try to work around the issue by using the **extraClassPath** that includes Hive SerDe 2.3.x, but I'm not sure if this will work or not. A much better solution would be to upgrade Spark's library dependencies. was (Author: errorsandglitches): Gentle ping. While the description states that the issue is fixed in Hive 2.2, based on the Hive Jira, the issue was fixed in version 2.3. * https://issues.apache.org/jira/browse/HIVE-16175 I am also running into this issue. I am going to try to work around the issue by using the **extraClassPath** that includes Hive SerDe 2.3.x, but I'm not sure if this will work or not. A much better solution would be to upgrade Spark's library dependencies. > When spark SQL is used with Avro-backed HIVE tables, NPE from > org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.supportedCategories. > > > Key: SPARK-20880 > URL: https://issues.apache.org/jira/browse/SPARK-20880 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Vinod KC >Priority: Minor > > When spark SQL is used with Avro-backed HIVE tables, an NPE is intermittently > thrown from > org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.supportedCategories. > The root cause is a race condition in the hive 1.2.1 jar used in Spark SQL. > This issue has been fixed in HIVE 2.2 (HIVE JIRA: > https://issues.apache.org/jira/browse/HIVE-16175); since Spark is still > using the Hive 1.2.1 jars, we still hit the race condition. > One workaround is to run Spark with a single task per executor; however, it > will slow down the jobs.
> Exception stack trace > 13/05/07 09:18:39 WARN scheduler.TaskSetManager: Lost task 18.0 in stage 0.0 > (TID 18, aiyhyashu.dxc.com): java.lang.NullPointerException > at > org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.supportedCategories(AvroObjectInspectorGenerator.java:142) > at > org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:91) > at > org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104) > at > org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104) > at > org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:120) > at > org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspector(AvroObjectInspectorGenerator.java:83) > at > org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.<init>(AvroObjectInspectorGenerator.java:56) > at > org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:124) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$5$$anonfun$10.apply(TableReader.scala:251) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$5$$anonfun$10.apply(TableReader.scala:239) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:785) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:785) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:105) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at > org.apache
[jira] [Commented] (SPARK-20880) When spark SQL is used with Avro-backed HIVE tables, NPE from org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.supportedCategories.
[ https://issues.apache.org/jira/browse/SPARK-20880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957491#comment-16957491 ] Benjamyn Ward commented on SPARK-20880: --- Gentle ping. While the description states that the issue is fixed in Hive 2.2, based on the Hive Jira, the issue was fixed in version 2.3. * https://issues.apache.org/jira/browse/HIVE-16175 I am also running into this issue. I am going to try to work around the issue by using the **extraClassPath** that includes Hive SerDe 2.3.x, but I'm not sure if this will work or not. A much better solution would be to upgrade Spark's library dependencies. > When spark SQL is used with Avro-backed HIVE tables, NPE from > org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.supportedCategories. > > > Key: SPARK-20880 > URL: https://issues.apache.org/jira/browse/SPARK-20880 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Vinod KC >Priority: Minor > > When spark SQL is used with Avro-backed HIVE tables, an NPE is intermittently > thrown from > org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.supportedCategories. > The root cause is a race condition in the hive 1.2.1 jar used in Spark SQL. > This issue has been fixed in HIVE 2.2 (HIVE JIRA: > https://issues.apache.org/jira/browse/HIVE-16175); since Spark is still > using the Hive 1.2.1 jars, we still hit the race condition. > One workaround is to run Spark with a single task per executor; however, it > will slow down the jobs. > Exception stack trace > 13/05/07 09:18:39 WARN scheduler.TaskSetManager: Lost task 18.0 in stage 0.0 > (TID 18, aiyhyashu.dxc.com): java.lang.NullPointerException > at > org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.supportedCategories(AvroObjectInspectorGenerator.java:142) > at > org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:91) > at > org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104) > at > org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104) > at > org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:120) > at > org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspector(AvroObjectInspectorGenerator.java:83) > at > org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.<init>(AvroObjectInspectorGenerator.java:56) > at > org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:124) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$5$$anonfun$10.apply(TableReader.scala:251) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$5$$anonfun$10.apply(TableReader.scala:239) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:785) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:785) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:105) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at >
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache
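For reference, the "single task per executor" workaround mentioned in the description can be expressed as plain configuration. A minimal sketch, assuming default resource settings; the same values can also be passed to spark-submit via --conf:

{code}
import org.apache.spark.sql.SparkSession

// With spark.executor.cores equal to spark.task.cpus, each executor runs at
// most one task at a time, sidestepping the Hive 1.2.1 SerDe race at the
// cost of throughput.
val spark = SparkSession.builder()
  .appName("avro-npe-workaround")
  .config("spark.executor.cores", "1")
  .config("spark.task.cpus", "1")
  .enableHiveSupport()
  .getOrCreate()
{code}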
[jira] [Resolved] (SPARK-29107) Add window.sql - Part 1
[ https://issues.apache.org/jira/browse/SPARK-29107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-29107. -- Resolution: Fixed Issue resolved by pull request 26119 [https://github.com/apache/spark/pull/26119] > Add window.sql - Part 1 > --- > > Key: SPARK-29107 > URL: https://issues.apache.org/jira/browse/SPARK-29107 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.0.0 >Reporter: Dylan Guedes >Assignee: Dylan Guedes >Priority: Major > Fix For: 3.0.0 > > > In this ticket, we plan to add the regression test cases of > https://github.com/postgres/postgres/blob/REL_12_BETA3/src/test/regress/sql/window.sql#L1-L319 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-23160) Port window.sql
[ https://issues.apache.org/jira/browse/SPARK-23160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-23160. -- Resolution: Duplicate > Port window.sql > --- > > Key: SPARK-23160 > URL: https://issues.apache.org/jira/browse/SPARK-23160 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.0.0 >Reporter: Xingbo Jiang >Priority: Minor > > In this ticket, we plan to add the regression test cases of > https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/window.sql. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23160) Port window.sql
[ https://issues.apache.org/jira/browse/SPARK-23160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957481#comment-16957481 ] Hyukjin Kwon commented on SPARK-23160: -- This JIRA will be resolved by SPARK-29107, SPARK-29108, SPARK-29109 and SPARK-29110 > Port window.sql > --- > > Key: SPARK-23160 > URL: https://issues.apache.org/jira/browse/SPARK-23160 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.0.0 >Reporter: Xingbo Jiang >Priority: Minor > > In this ticket, we plan to add the regression test cases of > https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/window.sql. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29553) This problem is about using native BLAS to improve ML/MLLIB performance
[ https://issues.apache.org/jira/browse/SPARK-29553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] WuZeyi updated SPARK-29553: --- Description: I use native BLAS to improve ML/MLLIB performance on Yarn. The file spark-env.sh, which was modified by SPARK-21305, says that I should set OPENBLAS_NUM_THREADS=1 to disable multi-threading of OpenBLAS, but it does not take effect. I modified spark.conf to set spark.executorEnv.OPENBLAS_NUM_THREADS=1, and the performance improved. I think MKL_NUM_THREADS is the same. was: I use native BLAS to improve ML/MLLIB performance on Yarn. The file spark-env.sh, which was modified by [SPARK-21305], says that I should set OPENBLAS_NUM_THREADS=1 to disable multi-threading of OpenBLAS, but it does not take effect. I modified spark.conf to set OPENBLAS_NUM_THREADS=1, and the performance improved. I think MKL_NUM_THREADS is the same. > This problem is about using native BLAS to improve ML/MLLIB performance > -- > > Key: SPARK-29553 > URL: https://issues.apache.org/jira/browse/SPARK-29553 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib >Affects Versions: 2.3.0, 2.4.4 >Reporter: WuZeyi >Priority: Major > Labels: performance > > I use native BLAS to improve ML/MLLIB performance on Yarn. > The file spark-env.sh, which was modified by SPARK-21305, says that I should > set OPENBLAS_NUM_THREADS=1 to disable multi-threading of OpenBLAS, but it > does not take effect. > I modified spark.conf to set > spark.executorEnv.OPENBLAS_NUM_THREADS=1, and the performance improved. > > > I think MKL_NUM_THREADS is the same. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
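A minimal sketch of the setting that worked for the reporter, expressed programmatically (the same pair of keys could equally go in spark-defaults.conf; the MKL line extends the reporter's assumption to MKL builds and is not verified here):
{code:scala}
// Pin native BLAS to one thread per executor via executor env vars.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("native-blas-on-yarn")
  .config("spark.executorEnv.OPENBLAS_NUM_THREADS", "1")
  .config("spark.executorEnv.MKL_NUM_THREADS", "1") // assumed analogue for MKL
  .getOrCreate()
{code}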
[jira] [Comment Edited] (SPARK-29488) In Web UI, stage page has a js error when sorting the table.
[ https://issues.apache.org/jira/browse/SPARK-29488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957470#comment-16957470 ] jenny edited comment on SPARK-29488 at 10/23/19 1:07 AM: - Thank you [~srowen] for review. was (Author: cjn082030): Thank you [srowen|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=srowen] for review. > In Web UI, stage page has a js error when sorting the table. > --- > > Key: SPARK-29488 > URL: https://issues.apache.org/jira/browse/SPARK-29488 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.2, 2.4.4 >Reporter: jenny >Assignee: jenny >Priority: Minor > Fix For: 3.0.0 > > Attachments: image-2019-10-16-15-47-25-212.png > > > In the Web UI, following the steps below produces the js error "Uncaught TypeError: Failed > to execute 'removeChild' on 'Node': parameter 1 is not of type 'Node'.". > # Click the "Summary Metrics..." table header "Min" > # Click the "Aggregated Metrics by Executor" table header "Task Time" > # Click the "Summary Metrics..." table header "Min" (the same as step 1) > !image-2019-10-16-15-47-25-212.png! > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-29488) In Web UI, stage page has a js error when sorting the table.
[ https://issues.apache.org/jira/browse/SPARK-29488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957470#comment-16957470 ] jenny edited comment on SPARK-29488 at 10/23/19 1:06 AM: - Thank you [srowen|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=srowen] for review. was (Author: cjn082030): Thank you https://issues.apache.org/jira/secure/ViewProfile.jspa?name=srowen for review. > In Web UI, stage page has a js error when sorting the table. > --- > > Key: SPARK-29488 > URL: https://issues.apache.org/jira/browse/SPARK-29488 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.2, 2.4.4 >Reporter: jenny >Assignee: jenny >Priority: Minor > Fix For: 3.0.0 > > Attachments: image-2019-10-16-15-47-25-212.png > > > In the Web UI, following the steps below produces the js error "Uncaught TypeError: Failed > to execute 'removeChild' on 'Node': parameter 1 is not of type 'Node'.". > # Click the "Summary Metrics..." table header "Min" > # Click the "Aggregated Metrics by Executor" table header "Task Time" > # Click the "Summary Metrics..." table header "Min" (the same as step 1) > !image-2019-10-16-15-47-25-212.png! > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29488) In Web UI, stage page has a js error when sorting the table.
[ https://issues.apache.org/jira/browse/SPARK-29488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957470#comment-16957470 ] jenny commented on SPARK-29488: --- Thank you https://issues.apache.org/jira/secure/ViewProfile.jspa?name=srowen for review. > In Web UI, stage page has a js error when sorting the table. > --- > > Key: SPARK-29488 > URL: https://issues.apache.org/jira/browse/SPARK-29488 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.2, 2.4.4 >Reporter: jenny >Assignee: jenny >Priority: Minor > Fix For: 3.0.0 > > Attachments: image-2019-10-16-15-47-25-212.png > > > In the Web UI, following the steps below produces the js error "Uncaught TypeError: Failed > to execute 'removeChild' on 'Node': parameter 1 is not of type 'Node'.". > # Click the "Summary Metrics..." table header "Min" > # Click the "Aggregated Metrics by Executor" table header "Task Time" > # Click the "Summary Metrics..." table header "Min" (the same as step 1) > !image-2019-10-16-15-47-25-212.png! > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29563) CREATE TABLE LIKE should look up catalog/table like v2 commands
Dilip Biswal created SPARK-29563: Summary: CREATE TABLE LIKE should look up catalog/table like v2 commands Key: SPARK-29563 URL: https://issues.apache.org/jira/browse/SPARK-29563 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Dilip Biswal Assignee: Wenchen Fan -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29562) SQLAppStatusListener metrics aggregation is slow and memory hungry
Marcelo Masiero Vanzin created SPARK-29562: -- Summary: SQLAppStatusListener metrics aggregation is slow and memory hungry Key: SPARK-29562 URL: https://issues.apache.org/jira/browse/SPARK-29562 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.4 Reporter: Marcelo Masiero Vanzin While {{SQLAppStatusListener}} was added in 2.3, the aggregation code is very similar to what it was previously, so I'm sure this is even older. Long story short, the aggregation code ({{SQLAppStatusListener.aggregateMetrics}}) is very, very slow, and can take a non-trivial amount of time with large queries, aside from using a ton of memory. There are also cascading issues caused by that: since it's called from an event handler, it can slow down event processing, causing events to be dropped, which can cause listeners to miss important events that would tell them to free up internal state (and, thus, memory). To give an anecdotal example, one app I looked at ran into the "events being dropped" issue, which caused the listener to accumulate state for 100s of live stages, even though most of them were already finished. That led to a few GB of memory being wasted due to finished stages that were still being tracked. Here, though, I'd like to focus on {{SQLAppStatusListener.aggregateMetrics}} and making it faster. We should look at the other issues (unblocking event processing, cleaning up of stale data in listeners) separately. (I also remember someone in the past trying to fix something in this area, but couldn't find a PR or an open bug.) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29561) Large Case Statement Code Generation OOM
[ https://issues.apache.org/jira/browse/SPARK-29561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Chen updated SPARK-29561: - Description: Spark Configuration spark.driver.memory = 1g spark.master = "local" spark.deploy.mode = "client" Try to execute a case statement with 3000+ branches. Added sql statement as attachment Spark runs for a while before it OOM {noformat} java.lang.OutOfMemoryError: GC overhead limit exceeded at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:182) at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320) at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:178) at org.apache.spark.ContextCleaner$$anon$1.run(ContextCleaner.scala:73) 19/10/22 16:19:54 ERROR FileFormatWriter: Aborting job null. java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.HashMap.newNode(HashMap.java:1750) at java.util.HashMap.putVal(HashMap.java:631) at java.util.HashMap.putMapEntries(HashMap.java:515) at java.util.HashMap.putAll(HashMap.java:785) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3345) at org.codehaus.janino.UnitCompiler.access$5000(UnitCompiler.java:212) at org.codehaus.janino.UnitCompiler$8.visitLocalVariableDeclarationStatement(UnitCompiler.java:3230) at org.codehaus.janino.UnitCompiler$8.visitLocalVariableDeclarationStatement(UnitCompiler.java:3198) at org.codehaus.janino.Java$LocalVariableDeclarationStatement.accept(Java.java:3351) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3197) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3254) at org.codehaus.janino.UnitCompiler.access$3900(UnitCompiler.java:212) at org.codehaus.janino.UnitCompiler$8.visitBlock(UnitCompiler.java:3216) at org.codehaus.janino.UnitCompiler$8.visitBlock(UnitCompiler.java:3198) at org.codehaus.janino.Java$Block.accept(Java.java:2756) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3197) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3260) at org.codehaus.janino.UnitCompiler.access$4000(UnitCompiler.java:212) at org.codehaus.janino.UnitCompiler$8.visitDoStatement(UnitCompiler.java:3217) at org.codehaus.janino.UnitCompiler$8.visitDoStatement(UnitCompiler.java:3198) at org.codehaus.janino.Java$DoStatement.accept(Java.java:3304) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3197) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3186) at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:3009) at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1336) at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1309) at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:799) at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:958) at org.codehaus.janino.UnitCompiler.access$700(UnitCompiler.java:212) at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:393) at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:385) at org.codehaus.janino.Java$MemberClassDeclaration.accept(Java.java:1286) 19/10/22 16:19:54 ERROR Utils: throw uncaught fatal error in thread Spark Context Cleaner java.lang.OutOfMemoryError: GC overhead limit exceeded at 
org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:182) at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320) at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:178) at org.apache.spark.ContextCleaner$$anon$1.run(ContextCleaner.scala:73){noformat} Generated code looks like {noformat} /* 029 */ private void project_doConsume(InternalRow scan_row, UTF8String project_expr_0, boolean project_exprIsNull_0) throws java.io.IOException { /* 030 */ byte project_caseWhenResultState = -1; /* 031 */ do { /* 032 */ boolean project_isNull1 = true; /* 033 */ boolean project_value1 = false; /* 034 */ /* 035 */ boolean project_isNull2 = project_exprIsNull_0; /* 036 */ int project_value2 = -1; /* 037 */ if (!project_exprIsNull_0) { /* 038 */ UTF8String.IntWrapper project_intWrapper = new UTF8String.IntWrapper(); /* 039 */ if (project_expr_0.toInt(project_intWrapper)) { /* 040 */ project_value2 = project_intWrappe
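The attached SQL file is not inlined in these mails, but the failure shape is easy to approximate: a single projection whose CASE expression has thousands of branches, which makes the generated Java blow up during Janino compilation. A minimal, hypothetical reproduction sketch (not the reporter's exact statement):
{code:scala}
// Build a CASE WHEN with ~3000 branches and run it under a small driver heap.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local").getOrCreate()
spark.range(10).toDF("c").createOrReplaceTempView("t")

val branches = (1 to 3000).map(i => s"WHEN c = $i THEN 'v$i'").mkString(" ")
spark.sql(s"SELECT CASE $branches ELSE 'other' END FROM t").collect()
{code}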
[jira] [Updated] (SPARK-29560) Add typesafe bintray repo for sbt-mima-plugin
[ https://issues.apache.org/jira/browse/SPARK-29560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29560: -- Summary: Add typesafe bintray repo for sbt-mima-plugin (was: sbt-mima-plugin is missing) > Add typesafe bintray repo for sbt-mima-plugin > - > > Key: SPARK-29560 > URL: https://issues.apache.org/jira/browse/SPARK-29560 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.4, 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Blocker > Fix For: 2.4.5, 3.0.0 > > > GitHub Action detects the following from yesterday (Oct 21, 2019). > - `branch-2.4`: `sbt-mima-plugin:0.1.17` is missing. > - `master`: `sbt-mima-plugin:0.3.0` is missing. > These versions of `sbt-mima-plugin` seems to be removed from the old repo. We > need to change the repo location or upgrade this. > {code} > ~/A/spark-merge:branch-2.4$ rm -rf ~/.ivy2/ > ~/A/spark-merge:branch-2.4$ build/sbt scalastyle test:scalastyle > Using /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home as > default JAVA_HOME. > Note, this will be overridden by -java-home if it is set. > Attempting to fetch sbt > Launching sbt from build/sbt-launch-0.13.17.jar > [info] Loading project definition from > /Users/dongjoon/APACHE/spark-merge/project > [info] Updating > {file:/Users/dongjoon/APACHE/spark-merge/project/}spark-merge-build... > [info] Resolving com.typesafe#sbt-mima-plugin;0.1.17 ... > [warn]module not found: com.typesafe#sbt-mima-plugin;0.1.17 > [warn] typesafe-ivy-releases: tried > [warn] > https://repo.typesafe.com/typesafe/ivy-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] sbt-plugin-releases: tried > [warn] > https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] local: tried > [warn] > /Users/dongjoon/.ivy2/local/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] public: tried > [warn] > https://repo1.maven.org/maven2/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom > [warn] local-preloaded-ivy: tried > [warn] > /Users/dongjoon/.sbt/preloaded/com.typesafe/sbt-mima-plugin/0.1.17/ivys/ivy.xml > [warn] local-preloaded: tried > [warn] > file:Users/dongjoon/.sbt/preloaded/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom > ... > [warn]:: > [warn]:: UNRESOLVED DEPENDENCIES :: > [warn]:: > [warn]:: com.typesafe#sbt-mima-plugin;0.1.17: not found > [warn]:: > [warn] > [warn]Note: Some unresolved dependencies have extra attributes. > Check that these dependencies exist with the requested attributes. > [warn]com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, > sbtVersion=0.13) > [warn] > [warn]Note: Unresolved dependencies path: > [warn]com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, > sbtVersion=0.13) > (/Users/dongjoon/APACHE/spark-merge/project/plugins.sbt#L18-19) > [warn] +- default:spark-merge-build:0.1-SNAPSHOT > (scalaVersion=2.10, sbtVersion=0.13) > sbt.ResolveException: unresolved dependency: > com.typesafe#sbt-mima-plugin;0.1.17: not found > {code} > This breaks our Jenkins in `branch-2.4` now. > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.4-test-sbt-hadoop-2.6/611/console -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
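For reference, the shape of the fix named in the new summary is a resolver addition in project/plugins.sbt pointing at typesafe's bintray ivy repository. A sketch, assuming the URL below matches what PR 26217 actually adds (check the PR for the exact coordinates):
{code:scala}
// project/plugins.sbt (sbt build definitions are Scala)
resolvers += Resolver.url(
  "bintray-typesafe-sbt-plugin-releases",
  url("https://dl.bintray.com/typesafe/sbt-plugins/"))(Resolver.ivyStylePatterns)

addSbtPlugin("com.typesafe" % "sbt-mima-plugin" % "0.1.17")
{code}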
[jira] [Assigned] (SPARK-29560) sbt-mima-plugin is missing
[ https://issues.apache.org/jira/browse/SPARK-29560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-29560: - Assignee: Dongjoon Hyun > sbt-mima-plugin is missing > -- > > Key: SPARK-29560 > URL: https://issues.apache.org/jira/browse/SPARK-29560 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.4, 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Blocker > > GitHub Action detects the following from yesterday (Oct 21, 2019). > - `branch-2.4`: `sbt-mima-plugin:0.1.17` is missing. > - `master`: `sbt-mima-plugin:0.3.0` is missing. > These versions of `sbt-mima-plugin` seems to be removed from the old repo. We > need to change the repo location or upgrade this. > {code} > ~/A/spark-merge:branch-2.4$ rm -rf ~/.ivy2/ > ~/A/spark-merge:branch-2.4$ build/sbt scalastyle test:scalastyle > Using /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home as > default JAVA_HOME. > Note, this will be overridden by -java-home if it is set. > Attempting to fetch sbt > Launching sbt from build/sbt-launch-0.13.17.jar > [info] Loading project definition from > /Users/dongjoon/APACHE/spark-merge/project > [info] Updating > {file:/Users/dongjoon/APACHE/spark-merge/project/}spark-merge-build... > [info] Resolving com.typesafe#sbt-mima-plugin;0.1.17 ... > [warn]module not found: com.typesafe#sbt-mima-plugin;0.1.17 > [warn] typesafe-ivy-releases: tried > [warn] > https://repo.typesafe.com/typesafe/ivy-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] sbt-plugin-releases: tried > [warn] > https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] local: tried > [warn] > /Users/dongjoon/.ivy2/local/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] public: tried > [warn] > https://repo1.maven.org/maven2/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom > [warn] local-preloaded-ivy: tried > [warn] > /Users/dongjoon/.sbt/preloaded/com.typesafe/sbt-mima-plugin/0.1.17/ivys/ivy.xml > [warn] local-preloaded: tried > [warn] > file:Users/dongjoon/.sbt/preloaded/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom > ... > [warn]:: > [warn]:: UNRESOLVED DEPENDENCIES :: > [warn]:: > [warn]:: com.typesafe#sbt-mima-plugin;0.1.17: not found > [warn]:: > [warn] > [warn]Note: Some unresolved dependencies have extra attributes. > Check that these dependencies exist with the requested attributes. > [warn]com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, > sbtVersion=0.13) > [warn] > [warn]Note: Unresolved dependencies path: > [warn]com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, > sbtVersion=0.13) > (/Users/dongjoon/APACHE/spark-merge/project/plugins.sbt#L18-19) > [warn] +- default:spark-merge-build:0.1-SNAPSHOT > (scalaVersion=2.10, sbtVersion=0.13) > sbt.ResolveException: unresolved dependency: > com.typesafe#sbt-mima-plugin;0.1.17: not found > {code} > This breaks our Jenkins in `branch-2.4` now. > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.4-test-sbt-hadoop-2.6/611/console -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29560) sbt-mima-plugin is missing
[ https://issues.apache.org/jira/browse/SPARK-29560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-29560. --- Fix Version/s: 3.0.0 2.4.5 Resolution: Fixed Issue resolved by pull request 26217 [https://github.com/apache/spark/pull/26217] > sbt-mima-plugin is missing > -- > > Key: SPARK-29560 > URL: https://issues.apache.org/jira/browse/SPARK-29560 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.4, 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Blocker > Fix For: 2.4.5, 3.0.0 > > > GitHub Action detects the following from yesterday (Oct 21, 2019). > - `branch-2.4`: `sbt-mima-plugin:0.1.17` is missing. > - `master`: `sbt-mima-plugin:0.3.0` is missing. > These versions of `sbt-mima-plugin` seems to be removed from the old repo. We > need to change the repo location or upgrade this. > {code} > ~/A/spark-merge:branch-2.4$ rm -rf ~/.ivy2/ > ~/A/spark-merge:branch-2.4$ build/sbt scalastyle test:scalastyle > Using /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home as > default JAVA_HOME. > Note, this will be overridden by -java-home if it is set. > Attempting to fetch sbt > Launching sbt from build/sbt-launch-0.13.17.jar > [info] Loading project definition from > /Users/dongjoon/APACHE/spark-merge/project > [info] Updating > {file:/Users/dongjoon/APACHE/spark-merge/project/}spark-merge-build... > [info] Resolving com.typesafe#sbt-mima-plugin;0.1.17 ... > [warn]module not found: com.typesafe#sbt-mima-plugin;0.1.17 > [warn] typesafe-ivy-releases: tried > [warn] > https://repo.typesafe.com/typesafe/ivy-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] sbt-plugin-releases: tried > [warn] > https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] local: tried > [warn] > /Users/dongjoon/.ivy2/local/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] public: tried > [warn] > https://repo1.maven.org/maven2/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom > [warn] local-preloaded-ivy: tried > [warn] > /Users/dongjoon/.sbt/preloaded/com.typesafe/sbt-mima-plugin/0.1.17/ivys/ivy.xml > [warn] local-preloaded: tried > [warn] > file:Users/dongjoon/.sbt/preloaded/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom > ... > [warn]:: > [warn]:: UNRESOLVED DEPENDENCIES :: > [warn]:: > [warn]:: com.typesafe#sbt-mima-plugin;0.1.17: not found > [warn]:: > [warn] > [warn]Note: Some unresolved dependencies have extra attributes. > Check that these dependencies exist with the requested attributes. > [warn]com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, > sbtVersion=0.13) > [warn] > [warn]Note: Unresolved dependencies path: > [warn]com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, > sbtVersion=0.13) > (/Users/dongjoon/APACHE/spark-merge/project/plugins.sbt#L18-19) > [warn] +- default:spark-merge-build:0.1-SNAPSHOT > (scalaVersion=2.10, sbtVersion=0.13) > sbt.ResolveException: unresolved dependency: > com.typesafe#sbt-mima-plugin;0.1.17: not found > {code} > This breaks our Jenkins in `branch-2.4` now. > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.4-test-sbt-hadoop-2.6/611/console -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29561) Large Case Statement Code Generation OOM
[ https://issues.apache.org/jira/browse/SPARK-29561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Chen updated SPARK-29561: - Attachment: apacheSparkCase.sql > Large Case Statement Code Generation OOM > > > Key: SPARK-29561 > URL: https://issues.apache.org/jira/browse/SPARK-29561 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Michael Chen >Priority: Major > Attachments: apacheSparkCase.sql > > > Spark Configuration > spark.driver.memory = 1g > spark.master = "local" > spark.deploy.mode = "client" > Try to execute a case statement with 3000+ branches. > Spark runs for a while before it OOM > {noformat} > java.lang.OutOfMemoryError: GC overhead limit exceeded > at > org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:182) > at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320) > at > org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:178) > at org.apache.spark.ContextCleaner$$anon$1.run(ContextCleaner.scala:73) > 19/10/22 16:19:54 ERROR FileFormatWriter: Aborting job null. > java.lang.OutOfMemoryError: GC overhead limit exceeded > at java.util.HashMap.newNode(HashMap.java:1750) > at java.util.HashMap.putVal(HashMap.java:631) > at java.util.HashMap.putMapEntries(HashMap.java:515) > at java.util.HashMap.putAll(HashMap.java:785) > at > org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3345) > at org.codehaus.janino.UnitCompiler.access$5000(UnitCompiler.java:212) > at > org.codehaus.janino.UnitCompiler$8.visitLocalVariableDeclarationStatement(UnitCompiler.java:3230) > at > org.codehaus.janino.UnitCompiler$8.visitLocalVariableDeclarationStatement(UnitCompiler.java:3198) > at > org.codehaus.janino.Java$LocalVariableDeclarationStatement.accept(Java.java:3351) > at > org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3197) > at > org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3254) > at org.codehaus.janino.UnitCompiler.access$3900(UnitCompiler.java:212) > at org.codehaus.janino.UnitCompiler$8.visitBlock(UnitCompiler.java:3216) > at org.codehaus.janino.UnitCompiler$8.visitBlock(UnitCompiler.java:3198) > at org.codehaus.janino.Java$Block.accept(Java.java:2756) > at > org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3197) > at > org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3260) > at org.codehaus.janino.UnitCompiler.access$4000(UnitCompiler.java:212) > at > org.codehaus.janino.UnitCompiler$8.visitDoStatement(UnitCompiler.java:3217) > at > org.codehaus.janino.UnitCompiler$8.visitDoStatement(UnitCompiler.java:3198) > at org.codehaus.janino.Java$DoStatement.accept(Java.java:3304) > at > org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3197) > at > org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3186) > at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:3009) > at > org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1336) > at > org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1309) > at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:799) > at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:958) > at org.codehaus.janino.UnitCompiler.access$700(UnitCompiler.java:212) > at > 
org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:393) > at > org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:385) > at > org.codehaus.janino.Java$MemberClassDeclaration.accept(Java.java:1286) > 19/10/22 16:19:54 ERROR Utils: throw uncaught fatal error in thread Spark > Context Cleaner > java.lang.OutOfMemoryError: GC overhead limit exceeded > at > org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:182) > at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320) > at > org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:178) > at > org.apache.spark.ContextCleaner$$anon$1.run(ContextCleaner.scala:73){noformat} > Generated code looks like > {noformat} > /* 029 */ private void project_doConsume(InternalRow scan_row, UTF8String > project_expr_0, boolean project_exprIsNull_0) throws java.io.IOException { > /* 030 */ byte project_caseWhenResultState = -1; > /*
[jira] [Updated] (SPARK-29561) Large Case Statement Code Generation OOM
[ https://issues.apache.org/jira/browse/SPARK-29561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Chen updated SPARK-29561: - Description: Spark Configuration spark.driver.memory = 1g spark.master = "local" spark.deploy.mode = "client" Try to execute a case statement with 3000+ branches. Added sql statement as attachment Spark runs for a while before it OOM {noformat} java.lang.OutOfMemoryError: GC overhead limit exceeded at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:182) at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320) at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:178) at org.apache.spark.ContextCleaner$$anon$1.run(ContextCleaner.scala:73) 19/10/22 16:19:54 ERROR FileFormatWriter: Aborting job null. java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.HashMap.newNode(HashMap.java:1750) at java.util.HashMap.putVal(HashMap.java:631) at java.util.HashMap.putMapEntries(HashMap.java:515) at java.util.HashMap.putAll(HashMap.java:785) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3345) at org.codehaus.janino.UnitCompiler.access$5000(UnitCompiler.java:212) at org.codehaus.janino.UnitCompiler$8.visitLocalVariableDeclarationStatement(UnitCompiler.java:3230) at org.codehaus.janino.UnitCompiler$8.visitLocalVariableDeclarationStatement(UnitCompiler.java:3198) at org.codehaus.janino.Java$LocalVariableDeclarationStatement.accept(Java.java:3351) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3197) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3254) at org.codehaus.janino.UnitCompiler.access$3900(UnitCompiler.java:212) at org.codehaus.janino.UnitCompiler$8.visitBlock(UnitCompiler.java:3216) at org.codehaus.janino.UnitCompiler$8.visitBlock(UnitCompiler.java:3198) at org.codehaus.janino.Java$Block.accept(Java.java:2756) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3197) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3260) at org.codehaus.janino.UnitCompiler.access$4000(UnitCompiler.java:212) at org.codehaus.janino.UnitCompiler$8.visitDoStatement(UnitCompiler.java:3217) at org.codehaus.janino.UnitCompiler$8.visitDoStatement(UnitCompiler.java:3198) at org.codehaus.janino.Java$DoStatement.accept(Java.java:3304) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3197) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3186) at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:3009) at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1336) at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1309) at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:799) at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:958) at org.codehaus.janino.UnitCompiler.access$700(UnitCompiler.java:212) at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:393) at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:385) at org.codehaus.janino.Java$MemberClassDeclaration.accept(Java.java:1286) 19/10/22 16:19:54 ERROR Utils: throw uncaught fatal error in thread Spark Context Cleaner java.lang.OutOfMemoryError: GC overhead limit exceeded at 
org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:182) at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320) at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:178) at org.apache.spark.ContextCleaner$$anon$1.run(ContextCleaner.scala:73){noformat} Generated code looks like {noformat} /* 029 */ private void project_doConsume(InternalRow scan_row, UTF8String project_expr_0, boolean project_exprIsNull_0) throws java.io.IOException { /* 030 */ byte project_caseWhenResultState = -1; /* 031 */ do { /* 032 */ boolean project_isNull1 = true; /* 033 */ boolean project_value1 = false; /* 034 */ /* 035 */ boolean project_isNull2 = project_exprIsNull_0; /* 036 */ int project_value2 = -1; /* 037 */ if (!project_exprIsNull_0) { /* 038 */ UTF8String.IntWrapper project_intWrapper = new UTF8String.IntWrapper(); /* 039 */ if (project_expr_0.toInt(project_intWrapper)) { /* 040 */ project_value2 = project_intWrappe
[jira] [Commented] (SPARK-29106) Add jenkins arm test for spark
[ https://issues.apache.org/jira/browse/SPARK-29106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957421#comment-16957421 ] zhao bo commented on SPARK-29106: - Shane, thanks for rechecking the #9 build failure. Let's see whether the issue can be reproduced in #10; if it still happens, we should fix it. Thank you :) > Add jenkins arm test for spark > -- > > Key: SPARK-29106 > URL: https://issues.apache.org/jira/browse/SPARK-29106 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 3.0.0 >Reporter: huangtianhua >Priority: Minor > > Add arm test jobs to amplab jenkins for spark. > Till now we have made two periodic arm test jobs for spark in OpenLab, one is > based on master with hadoop 2.7 (similar to the QA test of amplab jenkins), > the other is based on a new branch which we made on 09-09, see > [http://status.openlabtesting.org/builds/job/spark-master-unit-test-hadoop-2.7-arm64] > and > [http://status.openlabtesting.org/builds/job/spark-unchanged-branch-unit-test-hadoop-2.7-arm64.|http://status.openlabtesting.org/builds/job/spark-unchanged-branch-unit-test-hadoop-2.7-arm64] > We only have to care about the first one when integrating the arm test with amplab > jenkins. > About the k8s test on arm, we have tested it, see > [https://github.com/theopenlab/spark/pull/17], maybe we can integrate it > later. > And we plan to test other stable branches too, and we can integrate them to > amplab when they are ready. > We have offered an arm instance and sent the info to shane knapp; thanks > shane for adding the first arm job to amplab jenkins :) > The other important thing is about the leveldbjni > [https://github.com/fusesource/leveldbjni,|https://github.com/fusesource/leveldbjni/issues/80] > spark depends on leveldbjni-all-1.8 > [https://mvnrepository.com/artifact/org.fusesource.leveldbjni/leveldbjni-all/1.8], > and we can see there is no arm64 support. So we built an arm64-supporting > release of leveldbjni, see > [https://mvnrepository.com/artifact/org.openlabtesting.leveldbjni/leveldbjni-all/1.8], > but we can't modify the spark pom.xml directly with something like a > 'property'/'profile' to choose the correct jar package on arm or x86 platforms, > because spark depends on some hadoop packages like hadoop-hdfs, and those packages > depend on leveldbjni-all-1.8 too, unless hadoop releases a new arm-supporting > leveldbjni jar. For now we download the leveldbjni-all-1.8 of > openlabtesting and 'mvn install' it when testing spark on arm. > PS: The issues found and fixed: > SPARK-28770 > [https://github.com/apache/spark/pull/25673] > > SPARK-28519 > [https://github.com/apache/spark/pull/25279] > > SPARK-28433 > [https://github.com/apache/spark/pull/25186] > > SPARK-28467 > [https://github.com/apache/spark/pull/25864] > > SPARK-29286 > [https://github.com/apache/spark/pull/26021] > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29561) Large Case Statement Code Generation OOM
Michael Chen created SPARK-29561: Summary: Large Case Statement Code Generation OOM Key: SPARK-29561 URL: https://issues.apache.org/jira/browse/SPARK-29561 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.3.0 Reporter: Michael Chen Spark Configuration spark.driver.memory = 1g spark.master = "local" spark.deploy.mode = "client" Try to execute a case statement with 3000+ branches. Spark runs for a while before it OOM {noformat} java.lang.OutOfMemoryError: GC overhead limit exceeded at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:182) at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320) at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:178) at org.apache.spark.ContextCleaner$$anon$1.run(ContextCleaner.scala:73) 19/10/22 16:19:54 ERROR FileFormatWriter: Aborting job null. java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.HashMap.newNode(HashMap.java:1750) at java.util.HashMap.putVal(HashMap.java:631) at java.util.HashMap.putMapEntries(HashMap.java:515) at java.util.HashMap.putAll(HashMap.java:785) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3345) at org.codehaus.janino.UnitCompiler.access$5000(UnitCompiler.java:212) at org.codehaus.janino.UnitCompiler$8.visitLocalVariableDeclarationStatement(UnitCompiler.java:3230) at org.codehaus.janino.UnitCompiler$8.visitLocalVariableDeclarationStatement(UnitCompiler.java:3198) at org.codehaus.janino.Java$LocalVariableDeclarationStatement.accept(Java.java:3351) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3197) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3254) at org.codehaus.janino.UnitCompiler.access$3900(UnitCompiler.java:212) at org.codehaus.janino.UnitCompiler$8.visitBlock(UnitCompiler.java:3216) at org.codehaus.janino.UnitCompiler$8.visitBlock(UnitCompiler.java:3198) at org.codehaus.janino.Java$Block.accept(Java.java:2756) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3197) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3260) at org.codehaus.janino.UnitCompiler.access$4000(UnitCompiler.java:212) at org.codehaus.janino.UnitCompiler$8.visitDoStatement(UnitCompiler.java:3217) at org.codehaus.janino.UnitCompiler$8.visitDoStatement(UnitCompiler.java:3198) at org.codehaus.janino.Java$DoStatement.accept(Java.java:3304) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3197) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3186) at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:3009) at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1336) at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1309) at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:799) at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:958) at org.codehaus.janino.UnitCompiler.access$700(UnitCompiler.java:212) at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:393) at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:385) at org.codehaus.janino.Java$MemberClassDeclaration.accept(Java.java:1286) 19/10/22 16:19:54 ERROR Utils: throw uncaught fatal error in thread Spark Context Cleaner java.lang.OutOfMemoryError: GC overhead limit exceeded at 
org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:182) at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320) at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:178) at org.apache.spark.ContextCleaner$$anon$1.run(ContextCleaner.scala:73){noformat} Generated code looks like {noformat} /* 029 */ private void project_doConsume(InternalRow scan_row, UTF8String project_expr_0, boolean project_exprIsNull_0) throws java.io.IOException { /* 030 */ byte project_caseWhenResultState = -1; /* 031 */ do { /* 032 */ boolean project_isNull1 = true; /* 033 */ boolean project_value1 = false; /* 034 */ /* 035 */ boolean project_isNull2 = project_exprIsNull_0; /* 036 */ int project_value2 = -1; /* 037 */ if (!project_exprIsNull_0) { /* 038 */ UTF8String.IntWrapper project_intWrapper = new UTF8Stri
[jira] [Updated] (SPARK-29561) Large Case Statement Code Generation OOM
[ https://issues.apache.org/jira/browse/SPARK-29561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Chen updated SPARK-29561: - Description: Spark Configuration spark.driver.memory = 1g spark.master = "local" spark.deploy.mode = "client" Try to execute a case statement with 3000+ branches. Spark runs for a while before it OOM {noformat} java.lang.OutOfMemoryError: GC overhead limit exceeded at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:182) at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320) at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:178) at org.apache.spark.ContextCleaner$$anon$1.run(ContextCleaner.scala:73) 19/10/22 16:19:54 ERROR FileFormatWriter: Aborting job null. java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.HashMap.newNode(HashMap.java:1750) at java.util.HashMap.putVal(HashMap.java:631) at java.util.HashMap.putMapEntries(HashMap.java:515) at java.util.HashMap.putAll(HashMap.java:785) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3345) at org.codehaus.janino.UnitCompiler.access$5000(UnitCompiler.java:212) at org.codehaus.janino.UnitCompiler$8.visitLocalVariableDeclarationStatement(UnitCompiler.java:3230) at org.codehaus.janino.UnitCompiler$8.visitLocalVariableDeclarationStatement(UnitCompiler.java:3198) at org.codehaus.janino.Java$LocalVariableDeclarationStatement.accept(Java.java:3351) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3197) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3254) at org.codehaus.janino.UnitCompiler.access$3900(UnitCompiler.java:212) at org.codehaus.janino.UnitCompiler$8.visitBlock(UnitCompiler.java:3216) at org.codehaus.janino.UnitCompiler$8.visitBlock(UnitCompiler.java:3198) at org.codehaus.janino.Java$Block.accept(Java.java:2756) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3197) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3260) at org.codehaus.janino.UnitCompiler.access$4000(UnitCompiler.java:212) at org.codehaus.janino.UnitCompiler$8.visitDoStatement(UnitCompiler.java:3217) at org.codehaus.janino.UnitCompiler$8.visitDoStatement(UnitCompiler.java:3198) at org.codehaus.janino.Java$DoStatement.accept(Java.java:3304) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3197) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3186) at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:3009) at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1336) at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1309) at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:799) at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:958) at org.codehaus.janino.UnitCompiler.access$700(UnitCompiler.java:212) at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:393) at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:385) at org.codehaus.janino.Java$MemberClassDeclaration.accept(Java.java:1286) 19/10/22 16:19:54 ERROR Utils: throw uncaught fatal error in thread Spark Context Cleaner java.lang.OutOfMemoryError: GC overhead limit exceeded at 
org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:182) at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320) at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:178) at org.apache.spark.ContextCleaner$$anon$1.run(ContextCleaner.scala:73){noformat} Generated code looks like {noformat} /* 029 */ private void project_doConsume(InternalRow scan_row, UTF8String project_expr_0, boolean project_exprIsNull_0) throws java.io.IOException { /* 030 */ byte project_caseWhenResultState = -1; /* 031 */ do { /* 032 */ boolean project_isNull1 = true; /* 033 */ boolean project_value1 = false; /* 034 */ /* 035 */ boolean project_isNull2 = project_exprIsNull_0; /* 036 */ int project_value2 = -1; /* 037 */ if (!project_exprIsNull_0) { /* 038 */ UTF8String.IntWrapper project_intWrapper = new UTF8String.IntWrapper(); /* 039 */ if (project_expr_0.toInt(project_intWrapper)) { /* 040 */ project_value2 = project_intWrapper.value; /* 041 */ } else {
[jira] [Updated] (SPARK-29542) [SQL][DOC] The descriptions of `spark.sql.files.*` are confusing.
[ https://issues.apache.org/jira/browse/SPARK-29542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] feiwang updated SPARK-29542: Description: Hi, the description of `spark.sql.files.maxPartitionBytes` is shown below. {code:java} The maximum number of bytes to pack into a single partition when reading files. {code} It seems to mean that each partition will process at most that many bytes in spark sql. As shown in the attachment, the value of spark.sql.files.maxPartitionBytes is 128MB. For stage 1, its input is 16.3TB, but there are only 6400 tasks. I checked the code; it is only effective for data source tables. So, its description is confusing. The same applies to all the descriptions of `spark.sql.files.*`. was: Hi, the description of `spark.sql.files.maxPartitionBytes` is shown below. {code:java} The maximum number of bytes to pack into a single partition when reading files. {code} It seems to mean that each partition will process at most that many bytes in spark sql. As shown in the attachment, the value of spark.sql.files.maxPartitionBytes is 128MB. For stage 1, its input is 16.3TB, but there are only 6400 tasks. I checked the code; it is only effective for data source tables. So, its description is confusing. > [SQL][DOC] The descriptions of `spark.sql.files.*` are confusing. > > > Key: SPARK-29542 > URL: https://issues.apache.org/jira/browse/SPARK-29542 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 2.4.4 >Reporter: feiwang >Priority: Minor > Attachments: screenshot-1.png > > > Hi, the description of `spark.sql.files.maxPartitionBytes` is shown below. > {code:java} > The maximum number of bytes to pack into a single partition when reading > files. > {code} > It seems to mean that each partition will process at most that many bytes in > spark sql. > As shown in the attachment, the value of spark.sql.files.maxPartitionBytes > is 128MB. > For stage 1, its input is 16.3TB, but there are only 6400 tasks. > I checked the code; it is only effective for data source tables. > So, its description is confusing. > The same applies to all the descriptions of `spark.sql.files.*`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
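A short sketch of the behavior the reporter describes, with a hypothetical parquet path: per the report, the setting caps split sizes only for file-based data source tables, so other table types (such as the one in the screenshot) can exceed it.
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .config("spark.sql.files.maxPartitionBytes", "134217728") // 128MB
  .getOrCreate()

// For a file-based data source table, partition count roughly tracks
// total input size / maxPartitionBytes; /data/events is hypothetical.
val df = spark.read.parquet("/data/events")
println(df.rdd.getNumPartitions)
{code}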
[jira] [Commented] (SPARK-23897) Guava version
[ https://issues.apache.org/jira/browse/SPARK-23897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957387#comment-16957387 ] Tak-Lon (Stephen) Wu commented on SPARK-23897: -- hadoop-3.2.x has included [HADOOP-16213|https://github.com/apache/hadoop/commit/e0b3cbd221c1e611660b364a64d1aec52b10bc4e], which upgraded guava to 27.0-jre; will spark include the change in a new profile, e.g. hadoop-3.2? > Guava version > - > > Key: SPARK-23897 > URL: https://issues.apache.org/jira/browse/SPARK-23897 > Project: Spark > Issue Type: Dependency upgrade > Components: Spark Core >Affects Versions: 2.3.0 >Reporter: Sercan Karaoglu >Priority: Minor > > Guava dependency version 14 is pretty old and needs to be updated to at least > 16. The google cloud storage connector uses a newer one, which causes a pretty > common error with guava: "java.lang.NoSuchMethodError: > com.google.common.base.Splitter.splitToList(Ljava/lang/CharSequence;)Ljava/util/List;", > and crashes the app. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
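The crash is easy to picture: Splitter.splitToList appeared in Guava 15, so code compiled against a newer Guava fails at runtime when Spark's Guava 14 wins on the class path. A minimal sketch of the failing call:
{code:scala}
// Throws java.lang.NoSuchMethodError when Guava 14 is on the class path,
// because splitToList only exists from Guava 15 onwards.
import com.google.common.base.Splitter

val parts = Splitter.on(',').splitToList("a,b,c")
println(parts)
{code}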
[jira] [Updated] (SPARK-29542) [SQL][DOC] The descriptions of `spark.sql.files.*` are confusing.
[ https://issues.apache.org/jira/browse/SPARK-29542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] feiwang updated SPARK-29542: Summary: [SQL][DOC] The descriptions of `spark.sql.files.*` are confusing. (was: [DOC] The description of `spark.sql.files.maxPartitionBytes` is confusing.) > [SQL][DOC] The descriptions of `spark.sql.files.*` are confusing. > > > Key: SPARK-29542 > URL: https://issues.apache.org/jira/browse/SPARK-29542 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 2.4.4 >Reporter: feiwang >Priority: Minor > Attachments: screenshot-1.png > > > Hi, the description of `spark.sql.files.maxPartitionBytes` is shown below. > {code:java} > The maximum number of bytes to pack into a single partition when reading > files. > {code} > It seems to mean that each partition will process at most that many bytes in > spark sql. > As shown in the attachment, the value of spark.sql.files.maxPartitionBytes > is 128MB. > For stage 1, its input is 16.3TB, but there are only 6400 tasks. > I checked the code; it is only effective for data source tables. > So, its description is confusing. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29539) SHOW PARTITIONS should look up catalog/table like v2 commands
[ https://issues.apache.org/jira/browse/SPARK-29539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh resolved SPARK-29539. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26198 [https://github.com/apache/spark/pull/26198] > SHOW PARTITIONS should look up catalog/table like v2 commands > - > > Key: SPARK-29539 > URL: https://issues.apache.org/jira/browse/SPARK-29539 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Major > Fix For: 3.0.0 > > > SHOW PARTITIONS should look up catalog/table like v2 commands -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-29539) SHOW PARTITIONS should look up catalog/table like v2 commands
[ https://issues.apache.org/jira/browse/SPARK-29539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh reassigned SPARK-29539: --- Assignee: Huaxin Gao > SHOW PARTITIONS should look up catalog/table like v2 commands > - > > Key: SPARK-29539 > URL: https://issues.apache.org/jira/browse/SPARK-29539 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Major > > SHOW PARTITIONS should look up catalog/table like v2 commands -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28859) Remove value check of MEMORY_OFFHEAP_SIZE in declaration section
[ https://issues.apache.org/jira/browse/SPARK-28859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957365#comment-16957365 ] Yifan Xing commented on SPARK-28859: Created a pr: [https://github.pie.apple.com/pie/apache-spark/pull/469] [~holden] it seems like this Jira is assigned to [~yifan Xu], who is Yifan Xu. (I am [~yifan_xing] :)) Sorry for the duplicated names. Would you mind reassigning? I also don't have permission to update the ticket status. Could you update it to `In Review`, or grant me permission? Thank you! > Remove value check of MEMORY_OFFHEAP_SIZE in declaration section > > > Key: SPARK-28859 > URL: https://issues.apache.org/jira/browse/SPARK-28859 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Yang Jie >Assignee: yifan >Priority: Minor > > MEMORY_OFFHEAP_SIZE currently has a default value of 0, but it should be > greater than 0 when MEMORY_OFFHEAP_ENABLED is true; should we check this > condition in code? > > SPARK-28577 added this check before requesting memory resources from Yarn > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
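A sketch of the kind of guard being discussed, written against the public SparkConf keys (the ticket concerns the internal config constants, and where to enforce the check is exactly the open question, so this is illustrative only, not the merged change):
{code:scala}
import org.apache.spark.SparkConf

// Fail fast if off-heap memory is enabled but its size was left at 0.
def validateOffHeap(conf: SparkConf): Unit = {
  val enabled = conf.getBoolean("spark.memory.offHeap.enabled", defaultValue = false)
  val size = conf.getSizeAsBytes("spark.memory.offHeap.size", "0")
  require(!enabled || size > 0,
    "spark.memory.offHeap.size must be > 0 when spark.memory.offHeap.enabled is true")
}
{code}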
[jira] [Updated] (SPARK-29560) sbt-mima-plugin is missing
[ https://issues.apache.org/jira/browse/SPARK-29560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29560: -- Description: GitHub Action detects the following from yesterday (Oct 21, 2019). - `branch-2.4`: `sbt-mima-plugin:0.1.17` is missing. - `master`: `sbt-mima-plugin:0.3.0` is missing. These versions of `sbt-mima-plugin` seems to be removed from the old repo. We need to change the repo location or upgrade this. {code} ~/A/spark-merge:branch-2.4$ rm -rf ~/.ivy2/ ~/A/spark-merge:branch-2.4$ build/sbt scalastyle test:scalastyle Using /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home as default JAVA_HOME. Note, this will be overridden by -java-home if it is set. Attempting to fetch sbt Launching sbt from build/sbt-launch-0.13.17.jar [info] Loading project definition from /Users/dongjoon/APACHE/spark-merge/project [info] Updating {file:/Users/dongjoon/APACHE/spark-merge/project/}spark-merge-build... [info] Resolving com.typesafe#sbt-mima-plugin;0.1.17 ... [warn] module not found: com.typesafe#sbt-mima-plugin;0.1.17 [warn] typesafe-ivy-releases: tried [warn] https://repo.typesafe.com/typesafe/ivy-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml [warn] sbt-plugin-releases: tried [warn] https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml [warn] local: tried [warn] /Users/dongjoon/.ivy2/local/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml [warn] public: tried [warn] https://repo1.maven.org/maven2/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom [warn] local-preloaded-ivy: tried [warn] /Users/dongjoon/.sbt/preloaded/com.typesafe/sbt-mima-plugin/0.1.17/ivys/ivy.xml [warn] local-preloaded: tried [warn] file:Users/dongjoon/.sbt/preloaded/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom ... [warn] :: [warn] :: UNRESOLVED DEPENDENCIES :: [warn] :: [warn] :: com.typesafe#sbt-mima-plugin;0.1.17: not found [warn] :: [warn] [warn] Note: Some unresolved dependencies have extra attributes. Check that these dependencies exist with the requested attributes. [warn] com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, sbtVersion=0.13) [warn] [warn] Note: Unresolved dependencies path: [warn] com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, sbtVersion=0.13) (/Users/dongjoon/APACHE/spark-merge/project/plugins.sbt#L18-19) [warn]+- default:spark-merge-build:0.1-SNAPSHOT (scalaVersion=2.10, sbtVersion=0.13) sbt.ResolveException: unresolved dependency: com.typesafe#sbt-mima-plugin;0.1.17: not found {code} This breaks our Jenkins in `branch-2.4` now. https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.4-test-sbt-hadoop-2.6/611/console was: GitHub Action detects the following from yesterday (Oct 21, 2019). - `branch-2.4`: `sbt-mima-plugin:0.1.17` is missing. - `master`: `sbt-mima-plugin:0.3.0` is missing. These versions of `sbt-mima-plugin` seems to be removed from the old repo. We need to change the repo location or upgrade this. {code} ~/A/spark-merge:branch-2.4$ rm -rf ~/.ivy2/ ~/A/spark-merge:branch-2.4$ build/sbt scalastyle test:scalastyle Using /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home as default JAVA_HOME. Note, this will be overridden by -java-home if it is set. 
Attempting to fetch sbt Launching sbt from build/sbt-launch-0.13.17.jar [info] Loading project definition from /Users/dongjoon/APACHE/spark-merge/project [info] Updating {file:/Users/dongjoon/APACHE/spark-merge/project/}spark-merge-build... [info] Resolving com.typesafe#sbt-mima-plugin;0.1.17 ... [warn] module not found: com.typesafe#sbt-mima-plugin;0.1.17 [warn] typesafe-ivy-releases: tried [warn] https://repo.typesafe.com/typesafe/ivy-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml [warn] sbt-plugin-releases: tried [warn] https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml [warn] local: tried [warn] /Users/dongjoon/.ivy2/local/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml [warn] public: tried [warn] https://repo1.maven.org/maven2/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom [warn] local-preloaded-ivy: tried [warn] /Users/dongjoon/.sbt/preloaded/com.typesafe/sbt-mima-plugin/0.1.17/ivys/ivy.xml [warn] local-preloaded: tried [warn] file:Users/dongjoon/.sbt/preloaded/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom ... [warn] :
[jira] [Commented] (SPARK-29560) sbt-mima-plugin is missing
[ https://issues.apache.org/jira/browse/SPARK-29560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957363#comment-16957363 ] Dongjoon Hyun commented on SPARK-29560: --- I raised the priority to `Blocker` because Jenkins is broken. We need to recover this as soon as possible to protect the branches from the upcoming commits. > sbt-mima-plugin is missing > -- > > Key: SPARK-29560 > URL: https://issues.apache.org/jira/browse/SPARK-29560 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.4, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Blocker > > GitHub Action detects the following from yesterday (Oct 21, 2019). > - `branch-2.4`: `sbt-mima-plugin:0.1.17` is missing. > - `master`: `sbt-mima-plugin:0.3.0` is missing. > These versions of `sbt-mima-plugin` seems to be removed from the old repo. We > need to change the repo location or upgrade this. > {code} > ~/A/spark-merge:branch-2.4$ rm -rf ~/.ivy2/ > ~/A/spark-merge:branch-2.4$ build/sbt scalastyle test:scalastyle > Using /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home as > default JAVA_HOME. > Note, this will be overridden by -java-home if it is set. > Attempting to fetch sbt > Launching sbt from build/sbt-launch-0.13.17.jar > [info] Loading project definition from > /Users/dongjoon/APACHE/spark-merge/project > [info] Updating > {file:/Users/dongjoon/APACHE/spark-merge/project/}spark-merge-build... > [info] Resolving com.typesafe#sbt-mima-plugin;0.1.17 ... > [warn]module not found: com.typesafe#sbt-mima-plugin;0.1.17 > [warn] typesafe-ivy-releases: tried > [warn] > https://repo.typesafe.com/typesafe/ivy-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] sbt-plugin-releases: tried > [warn] > https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] local: tried > [warn] > /Users/dongjoon/.ivy2/local/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] public: tried > [warn] > https://repo1.maven.org/maven2/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom > [warn] local-preloaded-ivy: tried > [warn] > /Users/dongjoon/.sbt/preloaded/com.typesafe/sbt-mima-plugin/0.1.17/ivys/ivy.xml > [warn] local-preloaded: tried > [warn] > file:Users/dongjoon/.sbt/preloaded/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom > ... > [warn]:: > [warn]:: UNRESOLVED DEPENDENCIES :: > [warn]:: > [warn]:: com.typesafe#sbt-mima-plugin;0.1.17: not found > [warn]:: > [warn] > [warn]Note: Some unresolved dependencies have extra attributes. > Check that these dependencies exist with the requested attributes. > [warn]com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, > sbtVersion=0.13) > [warn] > [warn]Note: Unresolved dependencies path: > [warn]com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, > sbtVersion=0.13) > (/Users/dongjoon/APACHE/spark-merge/project/plugins.sbt#L18-19) > [warn] +- default:spark-merge-build:0.1-SNAPSHOT > (scalaVersion=2.10, sbtVersion=0.13) > sbt.ResolveException: unresolved dependency: > com.typesafe#sbt-mima-plugin;0.1.17: not found > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29560) sbt-mima-plugin is missing
[ https://issues.apache.org/jira/browse/SPARK-29560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29560: -- Priority: Blocker (was: Major) > sbt-mima-plugin is missing > -- > > Key: SPARK-29560 > URL: https://issues.apache.org/jira/browse/SPARK-29560 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.4, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Blocker > > GitHub Action detects the following from yesterday (Oct 21, 2019). > - `branch-2.4`: `sbt-mima-plugin:0.1.17` is missing. > - `master`: `sbt-mima-plugin:0.3.0` is missing. > These versions of `sbt-mima-plugin` seems to be removed from the old repo. We > need to change the repo location or upgrade this. > {code} > ~/A/spark-merge:branch-2.4$ rm -rf ~/.ivy2/ > ~/A/spark-merge:branch-2.4$ build/sbt scalastyle test:scalastyle > Using /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home as > default JAVA_HOME. > Note, this will be overridden by -java-home if it is set. > Attempting to fetch sbt > Launching sbt from build/sbt-launch-0.13.17.jar > [info] Loading project definition from > /Users/dongjoon/APACHE/spark-merge/project > [info] Updating > {file:/Users/dongjoon/APACHE/spark-merge/project/}spark-merge-build... > [info] Resolving com.typesafe#sbt-mima-plugin;0.1.17 ... > [warn]module not found: com.typesafe#sbt-mima-plugin;0.1.17 > [warn] typesafe-ivy-releases: tried > [warn] > https://repo.typesafe.com/typesafe/ivy-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] sbt-plugin-releases: tried > [warn] > https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] local: tried > [warn] > /Users/dongjoon/.ivy2/local/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] public: tried > [warn] > https://repo1.maven.org/maven2/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom > [warn] local-preloaded-ivy: tried > [warn] > /Users/dongjoon/.sbt/preloaded/com.typesafe/sbt-mima-plugin/0.1.17/ivys/ivy.xml > [warn] local-preloaded: tried > [warn] > file:Users/dongjoon/.sbt/preloaded/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom > ... > [warn]:: > [warn]:: UNRESOLVED DEPENDENCIES :: > [warn]:: > [warn]:: com.typesafe#sbt-mima-plugin;0.1.17: not found > [warn]:: > [warn] > [warn]Note: Some unresolved dependencies have extra attributes. > Check that these dependencies exist with the requested attributes. > [warn]com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, > sbtVersion=0.13) > [warn] > [warn]Note: Unresolved dependencies path: > [warn]com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, > sbtVersion=0.13) > (/Users/dongjoon/APACHE/spark-merge/project/plugins.sbt#L18-19) > [warn] +- default:spark-merge-build:0.1-SNAPSHOT > (scalaVersion=2.10, sbtVersion=0.13) > sbt.ResolveException: unresolved dependency: > com.typesafe#sbt-mima-plugin;0.1.17: not found > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29560) sbt-mima-plugin is missing
[ https://issues.apache.org/jira/browse/SPARK-29560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957362#comment-16957362 ] Dongjoon Hyun commented on SPARK-29560: --- Yes. It does. I'm trying to fix this because this starts to break our Jenkins, too. - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.4-test-sbt-hadoop-2.6/611/console > sbt-mima-plugin is missing > -- > > Key: SPARK-29560 > URL: https://issues.apache.org/jira/browse/SPARK-29560 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.4, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > GitHub Action detects the following from yesterday (Oct 21, 2019). > - `branch-2.4`: `sbt-mima-plugin:0.1.17` is missing. > - `master`: `sbt-mima-plugin:0.3.0` is missing. > These versions of `sbt-mima-plugin` seems to be removed from the old repo. We > need to change the repo location or upgrade this. > {code} > ~/A/spark-merge:branch-2.4$ rm -rf ~/.ivy2/ > ~/A/spark-merge:branch-2.4$ build/sbt scalastyle test:scalastyle > Using /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home as > default JAVA_HOME. > Note, this will be overridden by -java-home if it is set. > Attempting to fetch sbt > Launching sbt from build/sbt-launch-0.13.17.jar > [info] Loading project definition from > /Users/dongjoon/APACHE/spark-merge/project > [info] Updating > {file:/Users/dongjoon/APACHE/spark-merge/project/}spark-merge-build... > [info] Resolving com.typesafe#sbt-mima-plugin;0.1.17 ... > [warn]module not found: com.typesafe#sbt-mima-plugin;0.1.17 > [warn] typesafe-ivy-releases: tried > [warn] > https://repo.typesafe.com/typesafe/ivy-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] sbt-plugin-releases: tried > [warn] > https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] local: tried > [warn] > /Users/dongjoon/.ivy2/local/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] public: tried > [warn] > https://repo1.maven.org/maven2/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom > [warn] local-preloaded-ivy: tried > [warn] > /Users/dongjoon/.sbt/preloaded/com.typesafe/sbt-mima-plugin/0.1.17/ivys/ivy.xml > [warn] local-preloaded: tried > [warn] > file:Users/dongjoon/.sbt/preloaded/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom > ... > [warn]:: > [warn]:: UNRESOLVED DEPENDENCIES :: > [warn]:: > [warn]:: com.typesafe#sbt-mima-plugin;0.1.17: not found > [warn]:: > [warn] > [warn]Note: Some unresolved dependencies have extra attributes. > Check that these dependencies exist with the requested attributes. > [warn]com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, > sbtVersion=0.13) > [warn] > [warn]Note: Unresolved dependencies path: > [warn]com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, > sbtVersion=0.13) > (/Users/dongjoon/APACHE/spark-merge/project/plugins.sbt#L18-19) > [warn] +- default:spark-merge-build:0.1-SNAPSHOT > (scalaVersion=2.10, sbtVersion=0.13) > sbt.ResolveException: unresolved dependency: > com.typesafe#sbt-mima-plugin;0.1.17: not found > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29557) Upgrade dropwizard metrics library to 4.1.1
[ https://issues.apache.org/jira/browse/SPARK-29557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29557: -- Component/s: (was: Spark Core) Build > Upgrade dropwizard metrics library to 4.1.1 > --- > > Key: SPARK-29557 > URL: https://issues.apache.org/jira/browse/SPARK-29557 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.0.0 >Reporter: Luca Canali >Priority: Minor > > This proposes to upgrade the dropwizard/codahale metrics library version used > by Spark to a recent version, tentatively 4.1.1. Spark is currently using > Dropwizard metrics version 3.1.5, a version that is no more actively > developed nor maintained, according to the project's Github repo README. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29556) Avoid including path in error response from REST submission server
[ https://issues.apache.org/jira/browse/SPARK-29556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29556: -- Affects Version/s: 1.6.3 > Avoid including path in error response from REST submission server > -- > > Key: SPARK-29556 > URL: https://issues.apache.org/jira/browse/SPARK-29556 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.3, 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.4, 3.0.0 >Reporter: Sean R. Owen >Assignee: Sean R. Owen >Priority: Minor > Fix For: 2.4.5, 3.0.0 > > > I'm not sure if it's possible to exploit, but, the following code in > RESTSubmissionServer's ErrorServlet.service is a little risky as it includes > user-supplied path input in the error response. We don't want to let a link > determine what's in the resulting HTML. > {code} > val path = request.getPathInfo > ... > var msg = > parts match { > ... > case _ => > // never reached > s"Malformed path $path." > } > msg += s" Please submit requests through > http://[host]:[port]/$serverVersion/submissions/..."; > val error = handleError(msg) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29556) Avoid including path in error response from REST submission server
[ https://issues.apache.org/jira/browse/SPARK-29556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29556: -- Affects Version/s: 2.0.2 2.1.3 2.2.3 2.3.4 > Avoid including path in error response from REST submission server > -- > > Key: SPARK-29556 > URL: https://issues.apache.org/jira/browse/SPARK-29556 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.4, 3.0.0 >Reporter: Sean R. Owen >Assignee: Sean R. Owen >Priority: Minor > Fix For: 2.4.5, 3.0.0 > > > I'm not sure if it's possible to exploit, but, the following code in > RESTSubmissionServer's ErrorServlet.service is a little risky as it includes > user-supplied path input in the error response. We don't want to let a link > determine what's in the resulting HTML. > {code} > val path = request.getPathInfo > ... > var msg = > parts match { > ... > case _ => > // never reached > s"Malformed path $path." > } > msg += s" Please submit requests through > http://[host]:[port]/$serverVersion/submissions/..."; > val error = handleError(msg) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29556) Avoid including path in error response from REST submission server
[ https://issues.apache.org/jira/browse/SPARK-29556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-29556. --- Fix Version/s: 3.0.0 2.4.5 Resolution: Fixed Issue resolved by pull request 26211 [https://github.com/apache/spark/pull/26211] > Avoid including path in error response from REST submission server > -- > > Key: SPARK-29556 > URL: https://issues.apache.org/jira/browse/SPARK-29556 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.4, 3.0.0 >Reporter: Sean R. Owen >Assignee: Sean R. Owen >Priority: Minor > Fix For: 2.4.5, 3.0.0 > > > I'm not sure if it's possible to exploit, but, the following code in > RESTSubmissionServer's ErrorServlet.service is a little risky as it includes > user-supplied path input in the error response. We don't want to let a link > determine what's in the resulting HTML. > {code} > val path = request.getPathInfo > ... > var msg = > parts match { > ... > case _ => > // never reached > s"Malformed path $path." > } > msg += s" Please submit requests through > http://[host]:[port]/$serverVersion/submissions/..."; > val error = handleError(msg) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29560) sbt-mima-plugin is missing
[ https://issues.apache.org/jira/browse/SPARK-29560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957340#comment-16957340 ] Sean R. Owen commented on SPARK-29560: -- Hm. I note that 0.3.0 is the last version that works with sbt 0.13, so we need to find 0.3.0. It does seem to have disappeared; I presume it was previously at https://dl.bintray.com/sbt/sbt-plugin-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/ or under https://dl.bintray.com/typesafe/ivy-releases/com.typesafe/ It looks like it is still here: https://dl.bintray.com/typesafe/sbt-plugins/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.3.0/ ... so maybe it's a question of adding a new repo for plugin resolution? I don't know how to do that off the top of my head, anyone know SBT better? :) > sbt-mima-plugin is missing > -- > > Key: SPARK-29560 > URL: https://issues.apache.org/jira/browse/SPARK-29560 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.4, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > GitHub Action detects the following from yesterday (Oct 21, 2019). > - `branch-2.4`: `sbt-mima-plugin:0.1.17` is missing. > - `master`: `sbt-mima-plugin:0.3.0` is missing. > These versions of `sbt-mima-plugin` seems to be removed from the old repo. We > need to change the repo location or upgrade this. > {code} > ~/A/spark-merge:branch-2.4$ rm -rf ~/.ivy2/ > ~/A/spark-merge:branch-2.4$ build/sbt scalastyle test:scalastyle > Using /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home as > default JAVA_HOME. > Note, this will be overridden by -java-home if it is set. > Attempting to fetch sbt > Launching sbt from build/sbt-launch-0.13.17.jar > [info] Loading project definition from > /Users/dongjoon/APACHE/spark-merge/project > [info] Updating > {file:/Users/dongjoon/APACHE/spark-merge/project/}spark-merge-build... > [info] Resolving com.typesafe#sbt-mima-plugin;0.1.17 ... > [warn]module not found: com.typesafe#sbt-mima-plugin;0.1.17 > [warn] typesafe-ivy-releases: tried > [warn] > https://repo.typesafe.com/typesafe/ivy-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] sbt-plugin-releases: tried > [warn] > https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] local: tried > [warn] > /Users/dongjoon/.ivy2/local/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] public: tried > [warn] > https://repo1.maven.org/maven2/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom > [warn] local-preloaded-ivy: tried > [warn] > /Users/dongjoon/.sbt/preloaded/com.typesafe/sbt-mima-plugin/0.1.17/ivys/ivy.xml > [warn] local-preloaded: tried > [warn] > file:Users/dongjoon/.sbt/preloaded/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom > ... > [warn]:: > [warn]:: UNRESOLVED DEPENDENCIES :: > [warn]:: > [warn]:: com.typesafe#sbt-mima-plugin;0.1.17: not found > [warn]:: > [warn] > [warn]Note: Some unresolved dependencies have extra attributes. > Check that these dependencies exist with the requested attributes. 
> [warn]com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, > sbtVersion=0.13) > [warn] > [warn]Note: Unresolved dependencies path: > [warn]com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, > sbtVersion=0.13) > (/Users/dongjoon/APACHE/spark-merge/project/plugins.sbt#L18-19) > [warn] +- default:spark-merge-build:0.1-SNAPSHOT > (scalaVersion=2.10, sbtVersion=0.13) > sbt.ResolveException: unresolved dependency: > com.typesafe#sbt-mima-plugin;0.1.17: not found > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
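If those bintray URLs are still live, one possible direction is an extra Ivy-style plugin resolver. A hedged sketch for project/plugins.sbt, assuming the dl.bintray.com path above remains valid (the resolver name is arbitrary):

{code}
// Hypothetical sketch for project/plugins.sbt: try an additional Ivy-style
// plugin repository when the default resolvers no longer host the artifact.
resolvers += Resolver.url(
  "typesafe-sbt-plugins",
  url("https://dl.bintray.com/typesafe/sbt-plugins/"))(Resolver.ivyStylePatterns)

addSbtPlugin("com.typesafe" % "sbt-mima-plugin" % "0.3.0")
{code}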
[jira] [Updated] (SPARK-29560) sbt-mima-plugin is missing
[ https://issues.apache.org/jira/browse/SPARK-29560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29560: -- Description: GitHub Action detects the following from yesterday (Oct 21, 2019). - `branch-2.4`: `sbt-mima-plugin:0.1.17` is missing. - `master`: `sbt-mima-plugin:0.3.0` is missing. These versions of `sbt-mima-plugin` seem to have been removed from the old repo. We need to change the repo location or upgrade the plugin. {code} ~/A/spark-merge:branch-2.4$ rm -rf ~/.ivy2/ ~/A/spark-merge:branch-2.4$ build/sbt scalastyle test:scalastyle Using /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home as default JAVA_HOME. Note, this will be overridden by -java-home if it is set. Attempting to fetch sbt Launching sbt from build/sbt-launch-0.13.17.jar [info] Loading project definition from /Users/dongjoon/APACHE/spark-merge/project [info] Updating {file:/Users/dongjoon/APACHE/spark-merge/project/}spark-merge-build... [info] Resolving com.typesafe#sbt-mima-plugin;0.1.17 ... [warn] module not found: com.typesafe#sbt-mima-plugin;0.1.17 [warn] typesafe-ivy-releases: tried [warn] https://repo.typesafe.com/typesafe/ivy-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml [warn] sbt-plugin-releases: tried [warn] https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml [warn] local: tried [warn] /Users/dongjoon/.ivy2/local/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml [warn] public: tried [warn] https://repo1.maven.org/maven2/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom [warn] local-preloaded-ivy: tried [warn] /Users/dongjoon/.sbt/preloaded/com.typesafe/sbt-mima-plugin/0.1.17/ivys/ivy.xml [warn] local-preloaded: tried [warn] file:Users/dongjoon/.sbt/preloaded/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom ... [warn] :: [warn] :: UNRESOLVED DEPENDENCIES :: [warn] :: [warn] :: com.typesafe#sbt-mima-plugin;0.1.17: not found [warn] :: [warn] [warn] Note: Some unresolved dependencies have extra attributes. Check that these dependencies exist with the requested attributes. [warn] com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, sbtVersion=0.13) [warn] [warn] Note: Unresolved dependencies path: [warn] com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, sbtVersion=0.13) (/Users/dongjoon/APACHE/spark-merge/project/plugins.sbt#L18-19) [warn]+- default:spark-merge-build:0.1-SNAPSHOT (scalaVersion=2.10, sbtVersion=0.13) sbt.ResolveException: unresolved dependency: com.typesafe#sbt-mima-plugin;0.1.17: not found {code} was: GitHub Action detects the following from yesterday (Oct 21, 2019). - `branch-2.4`: `sbt-mima-plugin:0.1.17` is missing. - `master`: `sbt-mima-plugin:0.3.0` is missing. These versions of `sbt-mima-plugin` seems to be removed from the old repo. We need to change the repo location or upgrade this. {code} ~/A/spark-merge:branch-2.4$ build/sbt scalastyle test:scalastyle Using /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home as default JAVA_HOME. Note, this will be overridden by -java-home if it is set. Attempting to fetch sbt Launching sbt from build/sbt-launch-0.13.17.jar [info] Loading project definition from /Users/dongjoon/APACHE/spark-merge/project [info] Updating {file:/Users/dongjoon/APACHE/spark-merge/project/}spark-merge-build... [info] Resolving com.typesafe#sbt-mima-plugin;0.1.17 ... 
[warn] module not found: com.typesafe#sbt-mima-plugin;0.1.17 [warn] typesafe-ivy-releases: tried [warn] https://repo.typesafe.com/typesafe/ivy-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml [warn] sbt-plugin-releases: tried [warn] https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml [warn] local: tried [warn] /Users/dongjoon/.ivy2/local/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml [warn] public: tried [warn] https://repo1.maven.org/maven2/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom [warn] local-preloaded-ivy: tried [warn] /Users/dongjoon/.sbt/preloaded/com.typesafe/sbt-mima-plugin/0.1.17/ivys/ivy.xml [warn] local-preloaded: tried [warn] file:Users/dongjoon/.sbt/preloaded/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom ... [warn] :: [warn] :: UNRESOLVED DEPENDENCIES :: [warn] :: [warn] :: com.typesafe#sbt-mima-plugin;0.1.17: not found [warn]
[jira] [Commented] (SPARK-29560) sbt-mima-plugin is missing
[ https://issues.apache.org/jira/browse/SPARK-29560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957260#comment-16957260 ] Dongjoon Hyun commented on SPARK-29560: --- Although this is not an Apache Spark issue, we are affected. (cc [~srowen] and [~hyukjin.kwon]) > sbt-mima-plugin is missing > -- > > Key: SPARK-29560 > URL: https://issues.apache.org/jira/browse/SPARK-29560 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.4, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > GitHub Action detects the following from yesterday (Oct 21, 2019). > - `branch-2.4`: `sbt-mima-plugin:0.1.17` is missing. > - `master`: `sbt-mima-plugin:0.3.0` is missing. > These versions of `sbt-mima-plugin` seems to be removed from the old repo. We > need to change the repo location or upgrade this. > {code} > ~/A/spark-merge:branch-2.4$ build/sbt scalastyle test:scalastyle > Using /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home as > default JAVA_HOME. > Note, this will be overridden by -java-home if it is set. > Attempting to fetch sbt > Launching sbt from build/sbt-launch-0.13.17.jar > [info] Loading project definition from > /Users/dongjoon/APACHE/spark-merge/project > [info] Updating > {file:/Users/dongjoon/APACHE/spark-merge/project/}spark-merge-build... > [info] Resolving com.typesafe#sbt-mima-plugin;0.1.17 ... > [warn]module not found: com.typesafe#sbt-mima-plugin;0.1.17 > [warn] typesafe-ivy-releases: tried > [warn] > https://repo.typesafe.com/typesafe/ivy-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] sbt-plugin-releases: tried > [warn] > https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] local: tried > [warn] > /Users/dongjoon/.ivy2/local/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] public: tried > [warn] > https://repo1.maven.org/maven2/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom > [warn] local-preloaded-ivy: tried > [warn] > /Users/dongjoon/.sbt/preloaded/com.typesafe/sbt-mima-plugin/0.1.17/ivys/ivy.xml > [warn] local-preloaded: tried > [warn] > file:Users/dongjoon/.sbt/preloaded/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom > ... > [warn]:: > [warn]:: UNRESOLVED DEPENDENCIES :: > [warn]:: > [warn]:: com.typesafe#sbt-mima-plugin;0.1.17: not found > [warn]:: > [warn] > [warn]Note: Some unresolved dependencies have extra attributes. > Check that these dependencies exist with the requested attributes. > [warn]com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, > sbtVersion=0.13) > [warn] > [warn]Note: Unresolved dependencies path: > [warn]com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, > sbtVersion=0.13) > (/Users/dongjoon/APACHE/spark-merge/project/plugins.sbt#L18-19) > [warn] +- default:spark-merge-build:0.1-SNAPSHOT > (scalaVersion=2.10, sbtVersion=0.13) > sbt.ResolveException: unresolved dependency: > com.typesafe#sbt-mima-plugin;0.1.17: not found > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29560) sbt-mima-plugin is missing
Dongjoon Hyun created SPARK-29560: - Summary: sbt-mima-plugin is missing Key: SPARK-29560 URL: https://issues.apache.org/jira/browse/SPARK-29560 Project: Spark Issue Type: Bug Components: Build Affects Versions: 2.4.4, 3.0.0 Reporter: Dongjoon Hyun GitHub Action detects the following from yesterday (Oct 21, 2019). - `branch-2.4`: `sbt-mima-plugin:0.1.17` is missing. - `master`: `sbt-mima-plugin:0.3.0` is missing. These versions of `sbt-mima-plugin` seem to have been removed from the old repo. We need to change the repo location or upgrade the plugin. {code} ~/A/spark-merge:branch-2.4$ build/sbt scalastyle test:scalastyle Using /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home as default JAVA_HOME. Note, this will be overridden by -java-home if it is set. Attempting to fetch sbt Launching sbt from build/sbt-launch-0.13.17.jar [info] Loading project definition from /Users/dongjoon/APACHE/spark-merge/project [info] Updating {file:/Users/dongjoon/APACHE/spark-merge/project/}spark-merge-build... [info] Resolving com.typesafe#sbt-mima-plugin;0.1.17 ... [warn] module not found: com.typesafe#sbt-mima-plugin;0.1.17 [warn] typesafe-ivy-releases: tried [warn] https://repo.typesafe.com/typesafe/ivy-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml [warn] sbt-plugin-releases: tried [warn] https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml [warn] local: tried [warn] /Users/dongjoon/.ivy2/local/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml [warn] public: tried [warn] https://repo1.maven.org/maven2/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom [warn] local-preloaded-ivy: tried [warn] /Users/dongjoon/.sbt/preloaded/com.typesafe/sbt-mima-plugin/0.1.17/ivys/ivy.xml [warn] local-preloaded: tried [warn] file:Users/dongjoon/.sbt/preloaded/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom ... [warn] :: [warn] :: UNRESOLVED DEPENDENCIES :: [warn] :: [warn] :: com.typesafe#sbt-mima-plugin;0.1.17: not found [warn] :: [warn] [warn] Note: Some unresolved dependencies have extra attributes. Check that these dependencies exist with the requested attributes. [warn] com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, sbtVersion=0.13) [warn] [warn] Note: Unresolved dependencies path: [warn] com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, sbtVersion=0.13) (/Users/dongjoon/APACHE/spark-merge/project/plugins.sbt#L18-19) [warn]+- default:spark-merge-build:0.1-SNAPSHOT (scalaVersion=2.10, sbtVersion=0.13) sbt.ResolveException: unresolved dependency: com.typesafe#sbt-mima-plugin;0.1.17: not found {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29559) Support pagination for JDBC/ODBC UI page
shahid created SPARK-29559: -- Summary: Support pagination for JDBC/ODBC UI page Key: SPARK-29559 URL: https://issues.apache.org/jira/browse/SPARK-29559 Project: Spark Issue Type: Improvement Components: Web UI Affects Versions: 2.4.4, 3.0.0 Reporter: shahid Support pagination for JDBC/ODBC UI page -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29558) ResolveTables and ResolveRelations should be order-insensitive
Wenchen Fan created SPARK-29558: --- Summary: ResolveTables and ResolveRelations should be order-insensitive Key: SPARK-29558 URL: https://issues.apache.org/jira/browse/SPARK-29558 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Wenchen Fan Assignee: Wenchen Fan -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29557) Upgrade dropwizard metrics library to 4.1.1
Luca Canali created SPARK-29557: --- Summary: Upgrade dropwizard metrics library to 4.1.1 Key: SPARK-29557 URL: https://issues.apache.org/jira/browse/SPARK-29557 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.0.0 Reporter: Luca Canali This proposes to upgrade the dropwizard/codahale metrics library version used by Spark to a recent version, tentatively 4.1.1. Spark currently uses Dropwizard metrics version 3.1.5, a version that is no longer actively developed or maintained, according to the project's GitHub repo README. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29555) Getting 404 while opening link for Sequence file on latest documentation page
[ https://issues.apache.org/jira/browse/SPARK-29555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Akkalkote updated SPARK-29555: - Description: Trying to open the link for Sequence file on the page ([https://spark.apache.org/docs/latest/rdd-programming-guide.html]) redirects to [http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html]; however, this returns 404 (the standard HTTP error code for File Not Found). It's actually giving 404 for all resources whose base URL is [http://hadoop.apache.org/common], e.g. [SequenceFiles|http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html], [Writable|http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/Writable.html], [IntWritable|http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/IntWritable.html] and [Text|http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/Text.html]. was: Trying to open the link for Sequence file on Page ([https://spark.apache.org/docs/latest/rdd-programming-guide.html]) which redirects to like - [http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html] however getting 404 (which is a standard http error code for File Not Found) Its actually giving 404 for all resources whose base url is – [http://hadoop.apache.org/common] > Getting 404 while opening link for Sequence file on latest documentation page > - > > Key: SPARK-29555 > URL: https://issues.apache.org/jira/browse/SPARK-29555 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 2.4.4 >Reporter: Vishal Akkalkote >Priority: Major > > Trying to open the link for Sequence file on the page > ([https://spark.apache.org/docs/latest/rdd-programming-guide.html]) redirects to > [http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html]; > however, this returns 404 (the standard HTTP error code for File Not Found) > It's actually giving 404 for all resources whose base URL is > [http://hadoop.apache.org/common], > e.g. > [SequenceFiles|http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html], > > [Writable|http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/Writable.html], > > [IntWritable|http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/IntWritable.html] > and > [Text|http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/Text.html]. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29555) Getting 404 while opening link for Sequence file on latest documentation page
[ https://issues.apache.org/jira/browse/SPARK-29555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Akkalkote updated SPARK-29555: - Description: Trying to open the link for Sequence file on the page ([https://spark.apache.org/docs/latest/rdd-programming-guide.html]) redirects to [http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html]; however, this returns 404 (the standard HTTP error code for File Not Found). It's actually giving 404 for all resources whose base URL is [http://hadoop.apache.org/common] was: Trying to open the link for Sequence file on Page ([https://spark.apache.org/docs/latest/rdd-programming-guide.html]) which redirects to like - [http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html] however getting 404 (which is a standard http error code for File Not Found) > Getting 404 while opening link for Sequence file on latest documentation page > - > > Key: SPARK-29555 > URL: https://issues.apache.org/jira/browse/SPARK-29555 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 2.4.4 >Reporter: Vishal Akkalkote >Priority: Major > > Trying to open the link for Sequence file on the page > ([https://spark.apache.org/docs/latest/rdd-programming-guide.html]) redirects to > [http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html]; > however, this returns 404 (the standard HTTP error code for File Not Found) > It's actually giving 404 for all resources whose base URL is > [http://hadoop.apache.org/common] > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29556) Avoid including path in error response from REST submission server
Sean R. Owen created SPARK-29556: Summary: Avoid including path in error response from REST submission server Key: SPARK-29556 URL: https://issues.apache.org/jira/browse/SPARK-29556 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.4.4, 3.0.0 Reporter: Sean R. Owen Assignee: Sean R. Owen I'm not sure if it's possible to exploit, but the following code in RESTSubmissionServer's ErrorServlet.service is a little risky as it includes user-supplied path input in the error response. We don't want to let a link determine what's in the resulting HTML. {code} val path = request.getPathInfo ... var msg = parts match { ... case _ => // never reached s"Malformed path $path." } msg += s" Please submit requests through http://[host]:[port]/$serverVersion/submissions/..."; val error = handleError(msg) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
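A minimal sketch of the kind of fix this implies, with the user-controlled path simply dropped from the message. This is an illustration of the idea under that assumption, not necessarily what the actual patch does:

{code}
// Hypothetical sketch: build the error message without echoing the
// user-supplied request path back into the HTML response.
def errorMessage(serverVersion: String): String = {
  // Deliberately do not interpolate request.getPathInfo here,
  // so the request cannot influence the rendered HTML.
  "Malformed path. Please submit requests through " +
    s"http://[host]:[port]/$serverVersion/submissions/..."
}
{code}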
[jira] [Updated] (SPARK-29551) There is a bug about fetch failed when an executor lost
[ https://issues.apache.org/jira/browse/SPARK-29551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-29551: - Fix Version/s: (was: 2.4.3) Target Version/s: (was: 2.4.3, 2.4.5, 3.0.0) Priority: Major (was: Blocker) Please don't set Blocker priority or target/fix versions. > There is a bug about fetch failed when an executor lost > > > Key: SPARK-29551 > URL: https://issues.apache.org/jira/browse/SPARK-29551 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.3 >Reporter: weixiuli >Priority: Major > > There will be a regression when an executor is lost and this then causes a 'fetch > failed'. > We can add a unit test in 'DAGSchedulerSuite.scala' to catch the above > problem. > {code} > test("All shuffle files on the slave should be cleaned up when slave lost > test") { > // reset the test context with the right shuffle service config > afterEach() > val conf = new SparkConf() > conf.set(config.SHUFFLE_SERVICE_ENABLED.key, "true") > conf.set("spark.files.fetchFailure.unRegisterOutputOnHost", "true") > init(conf) > runEvent(ExecutorAdded("exec-hostA1", "hostA")) > runEvent(ExecutorAdded("exec-hostA2", "hostA")) > runEvent(ExecutorAdded("exec-hostB", "hostB")) > val firstRDD = new MyRDD(sc, 3, Nil) > val firstShuffleDep = new ShuffleDependency(firstRDD, new > HashPartitioner(3)) > val firstShuffleId = firstShuffleDep.shuffleId > val shuffleMapRdd = new MyRDD(sc, 3, List(firstShuffleDep)) > val shuffleDep = new ShuffleDependency(shuffleMapRdd, new > HashPartitioner(3)) > val secondShuffleId = shuffleDep.shuffleId > val reduceRdd = new MyRDD(sc, 1, List(shuffleDep)) > submit(reduceRdd, Array(0)) > // map stage1 completes successfully, with one task on each executor > complete(taskSets(0), Seq( > (Success, > MapStatus( > BlockManagerId("exec-hostA1", "hostA", 12345), > Array.fill[Long](1)(2), mapTaskId = 5)), > (Success, > MapStatus( > BlockManagerId("exec-hostA2", "hostA", 12345), > Array.fill[Long](1)(2), mapTaskId = 6)), > (Success, makeMapStatus("hostB", 1, mapTaskId = 7)) > )) > // map stage2 completes successfully, with one task on each executor > complete(taskSets(1), Seq( > (Success, > MapStatus( > BlockManagerId("exec-hostA1", "hostA", 12345), > Array.fill[Long](1)(2), mapTaskId = 8)), > (Success, > MapStatus( > BlockManagerId("exec-hostA2", "hostA", 12345), > Array.fill[Long](1)(2), mapTaskId = 9)), > (Success, makeMapStatus("hostB", 1, mapTaskId = 10)) > )) > // make sure our test setup is correct > val initialMapStatus1 = > mapOutputTracker.shuffleStatuses(firstShuffleId).mapStatuses > // val initialMapStatus1 = mapOutputTracker.mapStatuses.get(0).get > assert(initialMapStatus1.count(_ != null) === 3) > assert(initialMapStatus1.map{_.location.executorId}.toSet === > Set("exec-hostA1", "exec-hostA2", "exec-hostB")) > assert(initialMapStatus1.map{_.mapId}.toSet === Set(5, 6, 7)) > val initialMapStatus2 = > mapOutputTracker.shuffleStatuses(secondShuffleId).mapStatuses > // val initialMapStatus1 = mapOutputTracker.mapStatuses.get(0).get > assert(initialMapStatus2.count(_ != null) === 3) > assert(initialMapStatus2.map{_.location.executorId}.toSet === > Set("exec-hostA1", "exec-hostA2", "exec-hostB")) > assert(initialMapStatus2.map{_.mapId}.toSet === Set(8, 9, 10)) > // kill exec-hostA2 > runEvent(ExecutorLost("exec-hostA2", ExecutorKilled)) > // reduce stage fails with a fetch failure from map stage from exec-hostA2 > complete(taskSets(2), Seq( > (FetchFailed(BlockManagerId("exec-hostA2", "hostA", 12345), > secondShuffleId, 0L, 0, 
0, "ignored"), > null) > )) > // Here is the main assertion -- make sure that we de-register > // the map outputs for both map stage from both executors on hostA > val mapStatus1 = > mapOutputTracker.shuffleStatuses(firstShuffleId).mapStatuses > assert(mapStatus1.count(_ != null) === 1) > assert(mapStatus1(2).location.executorId === "exec-hostB") > assert(mapStatus1(2).location.host === "hostB") > val mapStatus2 = > mapOutputTracker.shuffleStatuses(secondShuffleId).mapStatuses > assert(mapStatus2.count(_ != null) === 1) > assert(mapStatus2(2).location.executorId === "exec-hostB") > assert(mapStatus2(2).location.host === "hostB") > } > {code} > The error output is: > {code} > 3 did not equal 1 > ScalaTestFailureLocation: org.apache.spark.scheduler.DAGSchedulerSuite at > (DAGSchedulerSuite.scala:609) > Expected :1 > Actual :3 > > org.scalatest.except
[jira] [Updated] (SPARK-29488) In Web UI, stage page has js error when sort table.
[ https://issues.apache.org/jira/browse/SPARK-29488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-29488: - Priority: Minor (was: Major) > In Web UI, stage page has js error when sort table. > --- > > Key: SPARK-29488 > URL: https://issues.apache.org/jira/browse/SPARK-29488 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.2, 2.4.4 >Reporter: jenny >Assignee: jenny >Priority: Minor > Fix For: 3.0.0 > > Attachments: image-2019-10-16-15-47-25-212.png > > > In the Web UI, following the steps below produces the js error "Uncaught TypeError: Failed > to execute 'removeChild' on 'Node': parameter 1 is not of type 'Node'.": > # Click the "Summary Metrics..." table header "Min" > # Click the "Aggregated Metrics by Executor" table header "Task Time" > # Click the "Summary Metrics..." table header "Min" (the same as step 1) > !image-2019-10-16-15-47-25-212.png! > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-29488) In Web UI, stage page has js error when sort table.
[ https://issues.apache.org/jira/browse/SPARK-29488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-29488: Assignee: jenny > In Web UI, stage page has js error when sort table. > --- > > Key: SPARK-29488 > URL: https://issues.apache.org/jira/browse/SPARK-29488 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.2, 2.4.4 >Reporter: jenny >Assignee: jenny >Priority: Major > Attachments: image-2019-10-16-15-47-25-212.png > > > In the Web UI, following the steps below produces the js error "Uncaught TypeError: Failed > to execute 'removeChild' on 'Node': parameter 1 is not of type 'Node'.": > # Click the "Summary Metrics..." table header "Min" > # Click the "Aggregated Metrics by Executor" table header "Task Time" > # Click the "Summary Metrics..." table header "Min" (the same as step 1) > !image-2019-10-16-15-47-25-212.png! > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29488) In Web UI, stage page has js error when sort table.
[ https://issues.apache.org/jira/browse/SPARK-29488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-29488. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26136 [https://github.com/apache/spark/pull/26136] > In Web UI, stage page has js error when sort table. > --- > > Key: SPARK-29488 > URL: https://issues.apache.org/jira/browse/SPARK-29488 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.2, 2.4.4 >Reporter: jenny >Assignee: jenny >Priority: Major > Fix For: 3.0.0 > > Attachments: image-2019-10-16-15-47-25-212.png > > > In the Web UI, following the steps below produces the js error "Uncaught TypeError: Failed > to execute 'removeChild' on 'Node': parameter 1 is not of type 'Node'.": > # Click the "Summary Metrics..." table header "Min" > # Click the "Aggregated Metrics by Executor" table header "Task Time" > # Click the "Summary Metrics..." table header "Min" (the same as step 1) > !image-2019-10-16-15-47-25-212.png! > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28793) Document CREATE FUNCTION in SQL Reference.
[ https://issues.apache.org/jira/browse/SPARK-28793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-28793: - Priority: Minor (was: Major) > Document CREATE FUNCTION in SQL Reference. > -- > > Key: SPARK-28793 > URL: https://issues.apache.org/jira/browse/SPARK-28793 > Project: Spark > Issue Type: Sub-task > Components: Documentation, SQL >Affects Versions: 2.4.3 >Reporter: Dilip Biswal >Assignee: Dilip Biswal >Priority: Minor > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28793) Document CREATE FUNCTION in SQL Reference.
[ https://issues.apache.org/jira/browse/SPARK-28793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-28793: Assignee: Dilip Biswal > Document CREATE FUNCTION in SQL Reference. > -- > > Key: SPARK-28793 > URL: https://issues.apache.org/jira/browse/SPARK-28793 > Project: Spark > Issue Type: Sub-task > Components: Documentation, SQL >Affects Versions: 2.4.3 >Reporter: Dilip Biswal >Assignee: Dilip Biswal >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28793) Document CREATE FUNCTION in SQL Reference.
[ https://issues.apache.org/jira/browse/SPARK-28793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-28793. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 25894 [https://github.com/apache/spark/pull/25894] > Document CREATE FUNCTION in SQL Reference. > -- > > Key: SPARK-28793 > URL: https://issues.apache.org/jira/browse/SPARK-28793 > Project: Spark > Issue Type: Sub-task > Components: Documentation, SQL >Affects Versions: 2.4.3 >Reporter: Dilip Biswal >Assignee: Dilip Biswal >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28787) Document LOAD DATA statement in SQL Reference.
[ https://issues.apache.org/jira/browse/SPARK-28787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-28787: - Priority: Minor (was: Major) > Document LOAD DATA statement in SQL Reference. > -- > > Key: SPARK-28787 > URL: https://issues.apache.org/jira/browse/SPARK-28787 > Project: Spark > Issue Type: Sub-task > Components: Documentation, SQL >Affects Versions: 3.0.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Minor > Fix For: 3.0.0 > > > Document LOAD DATA statement in SQL Reference. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28787) Document LOAD DATA statement in SQL Reference.
[ https://issues.apache.org/jira/browse/SPARK-28787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-28787. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 25522 [https://github.com/apache/spark/pull/25522] > Document LOAD DATA statement in SQL Reference. > -- > > Key: SPARK-28787 > URL: https://issues.apache.org/jira/browse/SPARK-28787 > Project: Spark > Issue Type: Sub-task > Components: Documentation, SQL >Affects Versions: 3.0.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Major > Fix For: 3.0.0 > > > Document LOAD DATA statement in SQL Reference. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28787) Document LOAD DATA statement in SQL Reference.
[ https://issues.apache.org/jira/browse/SPARK-28787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-28787: Assignee: Huaxin Gao > Document LOAD DATA statement in SQL Reference. > -- > > Key: SPARK-28787 > URL: https://issues.apache.org/jira/browse/SPARK-28787 > Project: Spark > Issue Type: Sub-task > Components: Documentation, SQL >Affects Versions: 3.0.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Major > > Document LOAD DATA statement in SQL Reference. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29555) Getting 404 while opening link for Sequence file on latest documentation page
Vishal Akkalkote created SPARK-29555: Summary: Getting 404 while opening link for Sequence file on latest documentation page Key: SPARK-29555 URL: https://issues.apache.org/jira/browse/SPARK-29555 Project: Spark Issue Type: Bug Components: Documentation Affects Versions: 2.4.4 Reporter: Vishal Akkalkote Trying to open the link for Sequence file on the page ([https://spark.apache.org/docs/latest/rdd-programming-guide.html]) redirects to [http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html]; however, this returns 404 (the standard HTTP error code for File Not Found). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29517) TRUNCATE TABLE should look up catalog/table like v2 commands
[ https://issues.apache.org/jira/browse/SPARK-29517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-29517. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26174 [https://github.com/apache/spark/pull/26174] > TRUNCATE TABLE should look up catalog/table like v2 commands > > > Key: SPARK-29517 > URL: https://issues.apache.org/jira/browse/SPARK-29517 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Fix For: 3.0.0 > > > TRUNCATE TABLE should look up catalog/table like v2 commands -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29554) Add a misc function named version
Kent Yao created SPARK-29554: Summary: Add a misc function named version Key: SPARK-29554 URL: https://issues.apache.org/jira/browse/SPARK-29554 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Kent Yao |string|version()|Returns the Hive version (as of Hive 2.1.0). The string contains 2 fields, the first being a build number and the second being a build hash. Example: "select version();" might return "2.1.0.2.5.0.0-1245 r027527b9c5ce1a3d7d0b6d2e6de2378fb0c39232". Actual results will depend on your build.| [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
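As a usage sketch, assuming the function lands with Hive-compatible semantics; the output shown in the comment is illustrative only, since the actual result depends on the build:

{code}
// Hypothetical usage sketch once such a version() function exists:
spark.sql("SELECT version()").show(truncate = false)
// might print something like "3.0.0 r<build-hash>", depending on the build
{code}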
[jira] [Updated] (SPARK-29552) Fix the flaky test failed in AdaptiveQueryExecSuite # multiple joins
[ https://issues.apache.org/jira/browse/SPARK-29552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ke Jia updated SPARK-29552: --- Description: AQE optimizes the logical plan once a query stage finishes. So for an inner join, both sides may be small enough to serve as the build side. The planner that converts the logical plan to a physical plan will select BuildRight if the right side finishes first, or BuildLeft if the left side finishes first. In some cases BuildRight or BuildLeft may introduce an additional exchange to the parent node. The revert approach in the OptimizeLocalShuffleReader rule may be too conservative: when an additional exchange is introduced, it reverts all local shuffle readers instead of only the local shuffle readers that introduced the shuffle. Reverting only the shuffle-introducing local shuffle readers may be expensive. The workaround is to apply the OptimizeLocalShuffleReader rule again when creating a new query stage, to further optimize the subtree's shuffle readers into local shuffle readers. (was: AQE will optimize the logical plan once there is query stage finished. So for inner join, when two join side is all small to be the build side. The planner of converting logical plan to physical plan will select the build side as BuildRight if right side finished firstly or BuildLeft if left side finished firstly. In some case, when BuildRight or BuildLeft may introduce additioanl exchange to the parent node. The revert approach in OptimizeLocalShuffleReader rule may be too conservative, which revert all the local shuffle reader when introduce additional exchange not revert the local shuffle reader introduced shuffle. It may be expense to only revert the local shuffle reader introduced shuffle. The workaround is to apply the OptimizeLocalShuffleReader rule again when creating new query stage to further optimize the sub tree shuffle reader to local shuffle reader.) > Fix the flaky test failed in AdaptiveQueryExecSuite # multiple joins > > > Key: SPARK-29552 > URL: https://issues.apache.org/jira/browse/SPARK-29552 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Ke Jia >Priority: Major > > AQE optimizes the logical plan once a query stage finishes. So for an inner join, > both sides may be small enough to serve as the build side. The planner that > converts the logical plan to a physical plan will select BuildRight if the right > side finishes first, or BuildLeft if the left side finishes first. In some cases > BuildRight or BuildLeft may introduce an additional exchange to the parent node. > The revert approach in the OptimizeLocalShuffleReader rule may be too > conservative: when an additional exchange is introduced, it reverts all local > shuffle readers instead of only the local shuffle readers that introduced the > shuffle. Reverting only the shuffle-introducing local shuffle readers may be > expensive. The workaround is to apply the OptimizeLocalShuffleReader rule again > when creating a new query stage, to further optimize the subtree's shuffle > readers into local shuffle readers. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29553) Using native BLAS to improve ML/MLlib performance
WuZeyi created SPARK-29553: -- Summary: Using native BLAS to improve ML/MLlib performance Key: SPARK-29553 URL: https://issues.apache.org/jira/browse/SPARK-29553 Project: Spark Issue Type: Improvement Components: ML, MLlib Affects Versions: 2.4.4, 2.3.0 Reporter: WuZeyi

I use native BLAS to improve ML/MLlib performance on YARN. The file spark-env.sh, which was modified by [SPARK-21305], says that I should set OPENBLAS_NUM_THREADS=1 to disable multi-threading of OpenBLAS, but it does not take effect. I modified spark.conf to set OPENBLAS_NUM_THREADS=1, and performance improved. I think MKL_NUM_THREADS behaves the same way.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
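For reference, one way to set such environment variables for executors is SparkConf's executor-environment support; a minimal sketch, assuming the OPENBLAS_NUM_THREADS/MKL_NUM_THREADS variables are honored by the BLAS library actually loaded:

{code:scala}
import org.apache.spark.SparkConf

// Sketch: pin OpenBLAS and MKL to one thread per executor process via
// executor environment variables (the spark.executorEnv.* properties).
val conf = new SparkConf()
  .setExecutorEnv("OPENBLAS_NUM_THREADS", "1")
  .setExecutorEnv("MKL_NUM_THREADS", "1")
// Equivalent spark-defaults.conf entries:
//   spark.executorEnv.OPENBLAS_NUM_THREADS  1
//   spark.executorEnv.MKL_NUM_THREADS       1
{code}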
[jira] [Created] (SPARK-29552) Fix the flaky test failure in AdaptiveQueryExecSuite # multiple joins
Ke Jia created SPARK-29552: -- Summary: Fix the flaky test failure in AdaptiveQueryExecSuite # multiple joins Key: SPARK-29552 URL: https://issues.apache.org/jira/browse/SPARK-29552 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Ke Jia

AQE optimizes the logical plan each time a query stage finishes. For an inner join where both sides are small enough to serve as the build side, the planner that converts the logical plan into a physical plan selects BuildRight if the right side finishes first, or BuildLeft if the left side finishes first. In some cases the chosen build side introduces an additional exchange at the parent node. The revert approach in the OptimizeLocalShuffleReader rule may be too conservative: when an additional exchange is introduced, it reverts all local shuffle readers instead of only the local shuffle readers that introduced the shuffle. Reverting only those readers may be expensive. The workaround is to apply the OptimizeLocalShuffleReader rule again when creating a new query stage, to further optimize the sub-tree's shuffle readers into local shuffle readers.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
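A rough reproduction sketch of the scenario the flaky test exercises (not the actual AdaptiveQueryExecSuite code; only spark.sql.adaptive.enabled is a confirmed config name here):

{code:scala}
// Enable AQE and run multiple joins over small relations, so that either
// join side may finish first and qualify as the build side.
spark.conf.set("spark.sql.adaptive.enabled", "true")
val t1 = spark.range(0, 10).toDF("k")
val t2 = spark.range(0, 10).toDF("k")
val t3 = spark.range(0, 10).toDF("k")
t1.join(t2, "k").join(t3, "k").collect()
{code}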
[jira] [Assigned] (SPARK-21492) Memory leak in SortMergeJoin
[ https://issues.apache.org/jira/browse/SPARK-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-21492: --- Assignee: Yuanjian Li

> Memory leak in SortMergeJoin
>
> Key: SPARK-21492
> URL: https://issues.apache.org/jira/browse/SPARK-21492
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.0, 2.3.0, 2.3.1, 3.0.0
> Reporter: Zhan Zhang
> Assignee: Yuanjian Li
> Priority: Major
> Fix For: 3.0.0
>
> In SortMergeJoin, if the iterator is not exhausted, there will be a memory
> leak caused by the Sort. The memory is not released until the task ends, and
> cannot be used by other operators, causing performance degradation or OOM.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-21492) Memory leak in SortMergeJoin
[ https://issues.apache.org/jira/browse/SPARK-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-21492. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26164 [https://github.com/apache/spark/pull/26164]

> Memory leak in SortMergeJoin
>
> Key: SPARK-21492
> URL: https://issues.apache.org/jira/browse/SPARK-21492
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.0, 2.3.0, 2.3.1, 3.0.0
> Reporter: Zhan Zhang
> Priority: Major
> Fix For: 3.0.0
>
> In SortMergeJoin, if the iterator is not exhausted, there will be a memory
> leak caused by the Sort. The memory is not released until the task ends, and
> cannot be used by other operators, causing performance degradation or OOM.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29550) Enhance locking in session catalog
[ https://issues.apache.org/jira/browse/SPARK-29550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikita Gorbachevski updated SPARK-29550: Component/s: (was: Spark Core)

> Enhance locking in session catalog
> --
>
> Key: SPARK-29550
> URL: https://issues.apache.org/jira/browse/SPARK-29550
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.4
> Reporter: Nikita Gorbachevski
> Priority: Minor
>
> In my streaming application ``spark.streaming.concurrentJobs`` is set to 50,
> which is used as the size of the underlying thread pool. I create/alter
> tables/views automatically at runtime; in order to do that I invoke
> ``create ... if not exists`` operations on the driver on each batch
> invocation. At some point I noticed that most of the batch time was spent on
> the driver rather than on the executors. A thread dump showed that most of
> the threads were blocked on SessionCatalog waiting for a lock.
> The existing implementation of SessionCatalog uses a single lock which almost
> all methods use to guard the ``currentDb`` and ``tempViews`` variables.
> I propose to enhance the locking behaviour of SessionCatalog by:
> # Employing a ReadWriteLock, which allows read operations to execute concurrently.
> # Replacing synchronized with the corresponding read or write lock.
> It may also be possible to go further and stripe the locks for ``currentDb``
> and ``tempViews``, but I'm not sure whether that is feasible from the
> implementation point of view; hopefully someone can help with this.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29551) There is a bug about fetch failures when an executor is lost
weixiuli created SPARK-29551: Summary: There is a bug about fetch failures when an executor is lost Key: SPARK-29551 URL: https://issues.apache.org/jira/browse/SPARK-29551 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.4.3 Reporter: weixiuli Fix For: 2.4.3

There is a regression when an executor is lost and a fetch failure follows. We can add a unit test in 'DAGSchedulerSuite.scala' to catch the problem:

{code}
test("All shuffle files on the slave should be cleaned up when slave lost test") {
  // reset the test context with the right shuffle service config
  afterEach()
  val conf = new SparkConf()
  conf.set(config.SHUFFLE_SERVICE_ENABLED.key, "true")
  conf.set("spark.files.fetchFailure.unRegisterOutputOnHost", "true")
  init(conf)
  runEvent(ExecutorAdded("exec-hostA1", "hostA"))
  runEvent(ExecutorAdded("exec-hostA2", "hostA"))
  runEvent(ExecutorAdded("exec-hostB", "hostB"))
  val firstRDD = new MyRDD(sc, 3, Nil)
  val firstShuffleDep = new ShuffleDependency(firstRDD, new HashPartitioner(3))
  val firstShuffleId = firstShuffleDep.shuffleId
  val shuffleMapRdd = new MyRDD(sc, 3, List(firstShuffleDep))
  val shuffleDep = new ShuffleDependency(shuffleMapRdd, new HashPartitioner(3))
  val secondShuffleId = shuffleDep.shuffleId
  val reduceRdd = new MyRDD(sc, 1, List(shuffleDep))
  submit(reduceRdd, Array(0))
  // map stage1 completes successfully, with one task on each executor
  complete(taskSets(0), Seq(
    (Success, MapStatus(
      BlockManagerId("exec-hostA1", "hostA", 12345), Array.fill[Long](1)(2), mapTaskId = 5)),
    (Success, MapStatus(
      BlockManagerId("exec-hostA2", "hostA", 12345), Array.fill[Long](1)(2), mapTaskId = 6)),
    (Success, makeMapStatus("hostB", 1, mapTaskId = 7))
  ))
  // map stage2 completes successfully, with one task on each executor
  complete(taskSets(1), Seq(
    (Success, MapStatus(
      BlockManagerId("exec-hostA1", "hostA", 12345), Array.fill[Long](1)(2), mapTaskId = 8)),
    (Success, MapStatus(
      BlockManagerId("exec-hostA2", "hostA", 12345), Array.fill[Long](1)(2), mapTaskId = 9)),
    (Success, makeMapStatus("hostB", 1, mapTaskId = 10))
  ))
  // make sure our test setup is correct
  val initialMapStatus1 = mapOutputTracker.shuffleStatuses(firstShuffleId).mapStatuses
  assert(initialMapStatus1.count(_ != null) === 3)
  assert(initialMapStatus1.map { _.location.executorId }.toSet ===
    Set("exec-hostA1", "exec-hostA2", "exec-hostB"))
  assert(initialMapStatus1.map { _.mapId }.toSet === Set(5, 6, 7))
  val initialMapStatus2 = mapOutputTracker.shuffleStatuses(secondShuffleId).mapStatuses
  assert(initialMapStatus2.count(_ != null) === 3)
  assert(initialMapStatus2.map { _.location.executorId }.toSet ===
    Set("exec-hostA1", "exec-hostA2", "exec-hostB"))
  assert(initialMapStatus2.map { _.mapId }.toSet === Set(8, 9, 10))
  // kill exec-hostA2
  runEvent(ExecutorLost("exec-hostA2", ExecutorKilled))
  // reduce stage fails with a fetch failure from map stage from exec-hostA2
  complete(taskSets(2), Seq(
    (FetchFailed(BlockManagerId("exec-hostA2", "hostA", 12345),
      secondShuffleId, 0L, 0, 0, "ignored"), null)
  ))
  // Here is the main assertion -- make sure that we de-register
  // the map outputs for both map stages from both executors on hostA
  val mapStatus1 = mapOutputTracker.shuffleStatuses(firstShuffleId).mapStatuses
  assert(mapStatus1.count(_ != null) === 1)
  assert(mapStatus1(2).location.executorId === "exec-hostB")
  assert(mapStatus1(2).location.host === "hostB")
  val mapStatus2 = mapOutputTracker.shuffleStatuses(secondShuffleId).mapStatuses
  assert(mapStatus2.count(_ != null) === 1)
  assert(mapStatus2(2).location.executorId === "exec-hostB")
  assert(mapStatus2(2).location.host === "hostB")
}
{code}

The error output is:

{code}
3 did not equal 1
ScalaTestFailureLocation: org.apache.spark.scheduler.DAGSchedulerSuite at (DAGSchedulerSuite.scala:609)
Expected :1
Actual :3
org.scalatest.exceptions.TestFailedException: 3 did not equal 1
{code}

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-29516) Test ThriftServerQueryTestSuite asynchronously
[ https://issues.apache.org/jira/browse/SPARK-29516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang reassigned SPARK-29516: --- Assignee: Yuming Wang > Test ThriftServerQueryTestSuite asynchronously > -- > > Key: SPARK-29516 > URL: https://issues.apache.org/jira/browse/SPARK-29516 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > > spark.sql.hive.thriftServer.async=true -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29516) Test ThriftServerQueryTestSuite asynchronously
[ https://issues.apache.org/jira/browse/SPARK-29516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang resolved SPARK-29516. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26172 [https://github.com/apache/spark/pull/26172] > Test ThriftServerQueryTestSuite asynchronously > -- > > Key: SPARK-29516 > URL: https://issues.apache.org/jira/browse/SPARK-29516 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.0.0 > > > spark.sql.hive.thriftServer.async=true -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29550) Enhance locking in session catalog
[ https://issues.apache.org/jira/browse/SPARK-29550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956902#comment-16956902 ] Nikita Gorbachevski commented on SPARK-29550: - Working on this.

> Enhance locking in session catalog
> --
>
> Key: SPARK-29550
> URL: https://issues.apache.org/jira/browse/SPARK-29550
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, SQL
> Affects Versions: 2.4.4
> Reporter: Nikita Gorbachevski
> Priority: Minor
>
> In my streaming application ``spark.streaming.concurrentJobs`` is set to 50,
> which is used as the size of the underlying thread pool. I create/alter
> tables/views automatically at runtime; in order to do that I invoke
> ``create ... if not exists`` operations on the driver on each batch
> invocation. At some point I noticed that most of the batch time was spent on
> the driver rather than on the executors. A thread dump showed that most of
> the threads were blocked on SessionCatalog waiting for a lock.
> The existing implementation of SessionCatalog uses a single lock which almost
> all methods use to guard the ``currentDb`` and ``tempViews`` variables.
> I propose to enhance the locking behaviour of SessionCatalog by:
> # Employing a ReadWriteLock, which allows read operations to execute concurrently.
> # Replacing synchronized with the corresponding read or write lock.
> It may also be possible to go further and stripe the locks for ``currentDb``
> and ``tempViews``, but I'm not sure whether that is feasible from the
> implementation point of view; hopefully someone can help with this.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29550) Enhance locking in session catalog
Nikita Gorbachevski created SPARK-29550: --- Summary: Enhance locking in session catalog Key: SPARK-29550 URL: https://issues.apache.org/jira/browse/SPARK-29550 Project: Spark Issue Type: Bug Components: Spark Core, SQL Affects Versions: 2.4.4 Reporter: Nikita Gorbachevski

In my streaming application ``spark.streaming.concurrentJobs`` is set to 50, which is used as the size of the underlying thread pool. I create/alter tables/views automatically at runtime; in order to do that I invoke ``create ... if not exists`` operations on the driver on each batch invocation. At some point I noticed that most of the batch time was spent on the driver rather than on the executors. A thread dump showed that most of the threads were blocked on SessionCatalog waiting for a lock.

The existing implementation of SessionCatalog uses a single lock which almost all methods use to guard the ``currentDb`` and ``tempViews`` variables. I propose to enhance the locking behaviour of SessionCatalog by:
# Employing a ReadWriteLock, which allows read operations to execute concurrently.
# Replacing synchronized with the corresponding read or write lock.

It may also be possible to go further and stripe the locks for ``currentDb`` and ``tempViews``, but I'm not sure whether that is feasible from the implementation point of view; hopefully someone can help with this.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
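A minimal sketch of the proposal (field and method names are illustrative, not the actual SessionCatalog internals):

{code:scala}
import java.util.concurrent.locks.ReentrantReadWriteLock

// Guard shared catalog state with a ReadWriteLock so concurrent reads no
// longer serialize behind a single monitor; writes still take an
// exclusive lock.
class CatalogState {
  private val lock = new ReentrantReadWriteLock()
  private var currentDb: String = "default"

  def getCurrentDatabase: String = {
    lock.readLock().lock()
    try currentDb finally lock.readLock().unlock()
  }

  def setCurrentDatabase(db: String): Unit = {
    lock.writeLock().lock()
    try currentDb = db finally lock.writeLock().unlock()
  }
}
{code}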
[jira] [Created] (SPARK-29549) Union of DataSourceV2 datasources leads to duplicated results
Miguel Molina created SPARK-29549: - Summary: Union of DataSourceV2 datasources leads to duplicated results Key: SPARK-29549 URL: https://issues.apache.org/jira/browse/SPARK-29549 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.3.4, 2.3.3, 2.3.2, 2.3.1, 2.3.0 Reporter: Miguel Molina

Hello! I've discovered that when two DataSourceV2 data frames of the exact same shape are unioned in a query that contains an aggregation, only the results of the first one are used. The rest are removed by the ReuseExchange rule and reuse the results of the first data frame, leading to N copies of the first data frame's results.

I've put together a repository with an example project where this can be reproduced: [https://github.com/erizocosmico/spark-union-issue]

Basically, doing this:

{code:java}
val products = spark.sql("SELECT name, COUNT(*) as count FROM products GROUP BY name")
val users = spark.sql("SELECT name, COUNT(*) as count FROM users GROUP BY name")

products.union(users)
  .select("name")
  .show(truncate = false, numRows = 50)
{code}

Where products is:

{noformat}
+---------+---+
|name     |id |
+---------+---+
|candy    |1  |
|chocolate|2  |
|milk     |3  |
|cinnamon |4  |
|pizza    |5  |
|pineapple|6  |
+---------+---+
{noformat}

And users is:

{noformat}
+-------+---+
|name   |id |
+-------+---+
|andy   |1  |
|alice  |2  |
|mike   |3  |
|mariah |4  |
|eleanor|5  |
|matthew|6  |
+-------+---+
{noformat}

Results are incorrect:

{noformat}
+---------+
|name     |
+---------+
|candy    |
|pizza    |
|chocolate|
|cinnamon |
|pineapple|
|milk     |
|candy    |
|pizza    |
|chocolate|
|cinnamon |
|pineapple|
|milk     |
+---------+
{noformat}

This is the plan explained:

{noformat}
== Parsed Logical Plan ==
'Project [unresolvedalias('name, None)]
+- AnalysisBarrier
      +- Union
         :- Aggregate [name#0], [name#0, count(1) AS count#8L]
         :  +- SubqueryAlias products
         :     +- DataSourceV2Relation [name#0, id#1], DefaultReader(List([candy,1], [chocolate,2], [milk,3], [cinnamon,4], [pizza,5], [pineapple,6]))
         +- Aggregate [name#4], [name#4, count(1) AS count#12L]
            +- SubqueryAlias users
               +- DataSourceV2Relation [name#4, id#5], DefaultReader(List([andy,1], [alice,2], [mike,3], [mariah,4], [eleanor,5], [matthew,6]))

== Analyzed Logical Plan ==
name: string
Project [name#0]
+- Union
   :- Aggregate [name#0], [name#0, count(1) AS count#8L]
   :  +- SubqueryAlias products
   :     +- DataSourceV2Relation [name#0, id#1], DefaultReader(List([candy,1], [chocolate,2], [milk,3], [cinnamon,4], [pizza,5], [pineapple,6]))
   +- Aggregate [name#4], [name#4, count(1) AS count#12L]
      +- SubqueryAlias users
         +- DataSourceV2Relation [name#4, id#5], DefaultReader(List([andy,1], [alice,2], [mike,3], [mariah,4], [eleanor,5], [matthew,6]))

== Optimized Logical Plan ==
Union
:- Aggregate [name#0], [name#0]
:  +- Project [name#0]
:     +- DataSourceV2Relation [name#0, id#1], DefaultReader(List([candy,1], [chocolate,2], [milk,3], [cinnamon,4], [pizza,5], [pineapple,6]))
+- Aggregate [name#4], [name#4]
   +- Project [name#4]
      +- DataSourceV2Relation [name#4, id#5], DefaultReader(List([andy,1], [alice,2], [mike,3], [mariah,4], [eleanor,5], [matthew,6]))

== Physical Plan ==
Union
:- *(2) HashAggregate(keys=[name#0], functions=[], output=[name#0])
:  +- Exchange hashpartitioning(name#0, 200)
:     +- *(1) HashAggregate(keys=[name#0], functions=[], output=[name#0])
:        +- *(1) Project [name#0]
:           +- *(1) DataSourceV2Scan [name#0, id#1], DefaultReader(List([candy,1], [chocolate,2], [milk,3], [cinnamon,4], [pizza,5], [pineapple,6]))
+- *(4) HashAggregate(keys=[name#4], functions=[], output=[name#4])
   +- ReusedExchange [name#4], Exchange hashpartitioning(name#0, 200)
{noformat}

In the physical plan, the first exchange is reused, but it shouldn't be, because the two sources are not the same:

{noformat}
== Physical Plan ==
Union
:- *(2) HashAggregate(keys=[name#0], functions=[], output=[name#0])
:  +- Exchange hashpartitioning(name#0, 200)
:     +- *(1) HashAggregate(keys=[name#0], functions=[], output=[name#0])
:        +- *(1) Project [name#0]
:           +- *(1) DataSourceV2Scan [name#0, id#1], DefaultReader(List([candy,1], [chocolate,2], [milk,3], [cinnamon,4], [pizza,5], [pineapple,6]))
+- *(4) HashAggregate(keys=[name#4], functions=[], output=[name#4])
   +- ReusedExchange [name#4], Exchange hashpartitioning(name#0, 200)
{noformat}

This seems to be fixed in 2.4.x, but it affects the 2.3.x versions.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29548) Redirect system print stream to log4j and improve robustness
[ https://issues.apache.org/jira/browse/SPARK-29548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956885#comment-16956885 ] Ching Lin commented on SPARK-29548: --- How about using checkpoint instead of log4j?

> Redirect system print stream to log4j and improve robustness
>
> Key: SPARK-29548
> URL: https://issues.apache.org/jira/browse/SPARK-29548
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.0.0
> Reporter: jiaan.geng
> Priority: Major
>
> In a production environment, user behavior is highly random and uncertain.
> For example, users may use `System.out` or `System.err` to print information.
> But the system print streams can cause trouble, such as a disk file growing
> too large. In my production environment this filled the disk and caused the
> NodeManager to misbehave.
> One option is to forbid the use of `System.out` and `System.err`, but that is
> unfriendly to users. A better option is to redirect the system print streams
> to `Log4j`, so Spark can take advantage of `Log4j`'s log splitting.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-29232) RandomForestRegressionModel does not update the parameter maps of the DecisionTreeRegressionModels underneath
[ https://issues.apache.org/jira/browse/SPARK-29232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-29232: Assignee: Huaxin Gao

> RandomForestRegressionModel does not update the parameter maps of the DecisionTreeRegressionModels underneath
> -
>
> Key: SPARK-29232
> URL: https://issues.apache.org/jira/browse/SPARK-29232
> Project: Spark
> Issue Type: Bug
> Components: ML
> Affects Versions: 2.4.0
> Reporter: Jiaqi Guo
> Assignee: Huaxin Gao
> Priority: Major
>
> We trained a RandomForestRegressionModel and tried to access its trees. Even
> though each DecisionTreeRegressionModel is correctly built with the proper
> parameters from the random forest, its parameter map is not updated and still
> contains only the default values.
> For example, if a RandomForestRegressor was trained with a maxDepth of 12,
> then extractParamMap on a tree still returns the default values, with
> maxDepth=5, while calling depth on the same DecisionTreeRegressionModel
> returns the correct value of 12.
> This creates issues when we want to access each individual tree and build the
> trees back up for the random forest estimator.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
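A short sketch reproducing the report (the training DataFrame `train`, with "features"/"label" columns, is a placeholder):

{code:scala}
import org.apache.spark.ml.regression.RandomForestRegressor

// Train with a non-default maxDepth, then inspect one of the trees.
val rf = new RandomForestRegressor().setMaxDepth(12)
val model = rf.fit(train)  // `train` is a placeholder dataset
val tree = model.trees.head
println(tree.depth)             // 12, as expected
println(tree.extractParamMap()) // reportedly still shows the default maxDepth=5
{code}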
[jira] [Resolved] (SPARK-29232) RandomForestRegressionModel does not update the parameter maps of the DecisionTreeRegressionModels underneath
[ https://issues.apache.org/jira/browse/SPARK-29232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-29232. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26154 [https://github.com/apache/spark/pull/26154]

> RandomForestRegressionModel does not update the parameter maps of the DecisionTreeRegressionModels underneath
> -
>
> Key: SPARK-29232
> URL: https://issues.apache.org/jira/browse/SPARK-29232
> Project: Spark
> Issue Type: Bug
> Components: ML
> Affects Versions: 2.4.0
> Reporter: Jiaqi Guo
> Assignee: Huaxin Gao
> Priority: Major
> Fix For: 3.0.0
>
> We trained a RandomForestRegressionModel and tried to access its trees. Even
> though each DecisionTreeRegressionModel is correctly built with the proper
> parameters from the random forest, its parameter map is not updated and still
> contains only the default values.
> For example, if a RandomForestRegressor was trained with a maxDepth of 12,
> then extractParamMap on a tree still returns the default values, with
> maxDepth=5, while calling depth on the same DecisionTreeRegressionModel
> returns the correct value of 12.
> This creates issues when we want to access each individual tree and build the
> trees back up for the random forest estimator.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29540) Thrift in some cases can't parse string to date
[ https://issues.apache.org/jira/browse/SPARK-29540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956863#comment-16956863 ] angerszhu commented on SPARK-29540: --- I'll check on this.

> Thrift in some cases can't parse string to date
> ---
>
> Key: SPARK-29540
> URL: https://issues.apache.org/jira/browse/SPARK-29540
> Project: Spark
> Issue Type: Sub-task
> Components: SQL, Tests
> Affects Versions: 3.0.0
> Reporter: Dylan Guedes
> Priority: Major
>
> I'm porting tests from PostgreSQL's window.sql, but anything related to
> casting a string to a datetime seems to fail on Thrift. For instance, the
> following does not work:
> {code:sql}
> CREATE TABLE empsalary (
>   depname string,
>   empno integer,
>   salary int,
>   enroll_date date
> ) USING parquet;
> INSERT INTO empsalary VALUES ('develop', 10, 5200, '2007-08-01');
> {code}

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
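A possible workaround to try (an untested assumption, not a confirmed fix): make the cast explicit rather than relying on implicit string-to-date coercion:

{code:scala}
// Explicit cast / typed literal instead of the bare string '2007-08-01'.
spark.sql("INSERT INTO empsalary VALUES ('develop', 10, 5200, CAST('2007-08-01' AS DATE))")
// or: ... VALUES ('develop', 10, 5200, DATE '2007-08-01')
{code}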
[jira] [Updated] (SPARK-29545) Implement bitwise integer aggregates bit_xor
[ https://issues.apache.org/jira/browse/SPARK-29545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-29545: - Description: As we support bit_and, bit_or now, we'd better support the related aggregate function bit_xor ahead of PostgreSQL, because many other popular databases support it. [http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.help.sqlanywhere.12.0.1/dbreference/bit-xor-function.html] [https://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_bit-or] [https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/Aggregate/BIT_XOR.htm?TocPath=SQL%20Reference%20Manual%7CSQL%20Functions%7CAggregate%20Functions%7C_10] (was: As we support bit_and, bot_or now, we'd better support the related aggregate function bit_or ahead of postgreSQL, because many other popular databases support it. [http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.help.sqlanywhere.12.0.1/dbreference/bit-xor-function.html] [https://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_bit-or] [https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/Aggregate/BIT_XOR.htm?TocPath=SQL%20Reference%20Manual%7CSQL%20Functions%7CAggregate%20Functions%7C_10])

> Implement bitwise integer aggregates bit_xor
>
> Key: SPARK-29545
> URL: https://issues.apache.org/jira/browse/SPARK-29545
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Kent Yao
> Assignee: Kent Yao
> Priority: Major
> Fix For: 3.0.0
>
> As we support bit_and, bit_or now, we'd better support the related aggregate
> function bit_xor ahead of PostgreSQL, because many other popular databases
> support it.
>
> [http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.help.sqlanywhere.12.0.1/dbreference/bit-xor-function.html]
> [https://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_bit-or]
> [https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/Aggregate/BIT_XOR.htm?TocPath=SQL%20Reference%20Manual%7CSQL%20Functions%7CAggregate%20Functions%7C_10]

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
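A usage sketch, assuming bit_xor lands with the same shape as the existing bit_and/bit_or aggregates:

{code:scala}
// XOR-aggregate over a one-column inline relation: 1 ^ 3 ^ 5 = 7.
spark.sql("SELECT bit_xor(col) FROM VALUES (1), (3), (5) AS t(col)").show()
{code}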
[jira] [Created] (SPARK-29548) Redirect system print stream to log4j and improve robustness
jiaan.geng created SPARK-29548: -- Summary: Redirect system print stream to log4j and improve robustness Key: SPARK-29548 URL: https://issues.apache.org/jira/browse/SPARK-29548 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.0.0 Reporter: jiaan.geng

In a production environment, user behavior is highly random and uncertain. For example, users may use `System.out` or `System.err` to print information. But the system print streams can cause trouble, such as a disk file growing too large. In my production environment this filled the disk and caused the NodeManager to misbehave.

One option is to forbid the use of `System.out` and `System.err`, but that is unfriendly to users. A better option is to redirect the system print streams to `Log4j`, so Spark can take advantage of `Log4j`'s log splitting.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
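A minimal sketch of the idea (not Spark's actual implementation): buffer bytes until a newline, then forward each completed line to a log4j logger so rolling policies can bound disk usage:

{code:scala}
import java.io.{OutputStream, PrintStream}
import org.apache.log4j.Logger

// Line-buffering OutputStream that forwards completed lines to log4j.
class Log4jOutputStream(logger: Logger) extends OutputStream {
  private val buf = new StringBuilder
  override def write(b: Int): Unit = {
    if (b == '\n') { logger.info(buf.toString); buf.clear() }
    else buf.append(b.toChar)
  }
}

// Redirect the system print streams; the logger names are illustrative.
System.setOut(new PrintStream(new Log4jOutputStream(Logger.getLogger("stdout")), true))
System.setErr(new PrintStream(new Log4jOutputStream(Logger.getLogger("stderr")), true))
{code}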
[jira] [Created] (SPARK-29547) Make `docker-integration-tests` work in JDK11
Dongjoon Hyun created SPARK-29547: - Summary: Make `docker-integration-tests` work in JDK11 Key: SPARK-29547 URL: https://issues.apache.org/jira/browse/SPARK-29547 Project: Spark Issue Type: Sub-task Components: Tests Affects Versions: 3.0.0 Reporter: Dongjoon Hyun

To support JDK11, SPARK-28737 upgraded `Jersey` to 2.29. However, it turns out that `docker-integration-tests` is broken because `com.spotify.docker-client` still depends on jersey-guava. SPARK-29546 recovers the test suite on JDK8 by adding the dependency back. We had better make this test suite work in a JDK11 environment, too.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org