[jira] [Commented] (SPARK-29106) Add jenkins arm test for spark
[ https://issues.apache.org/jira/browse/SPARK-29106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957603#comment-16957603 ] zhao bo commented on SPARK-29106: - Hi [~shaneknapp], Sorry to disturb you. I have some questions about the following work that I want to discuss with you; I list them below. # For the pyspark tests, you mentioned we didn't install any python debs for testing. Is there a "requirements.txt" or "test-requirements.txt" in the spark repo? I failed to find them. When we tested pyspark before, we only realized that we needed to install the numpy package with pip because the failure message told us so when we executed the pyspark test scripts. So when you mentioned "pyspark testing debs" before, did you mean that we should figure them all out manually? Do you have any suggestions? # For the sparkR tests, we compiled a newer R version, 3.6.1, by fixing many library dependencies, and made it work. We then executed the R test scripts until all of them passed. So we wonder about the difficulties this test will face when it truly runs in amplab; could you please share more with us? # For the current periodic jobs, you said they will be triggered 2 times per day, and each build costs at most 11 hours. I have a thought about the next job deployment and would like to hear your opinion: we could set up 2 jobs per day, one being the current maven UT test triggered by SCM changes (11h), the other running the pyspark and sparkR tests, also triggered by SCM changes (including the spark build and tests, which may cost 5-6 hours). How about this? We can talk and discuss further if we don't yet realize how difficult these are. Thanks very much, shane. And I hope you can reply when you are free. ;) > Add jenkins arm test for spark > -- > > Key: SPARK-29106 > URL: https://issues.apache.org/jira/browse/SPARK-29106 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 3.0.0 >Reporter: huangtianhua >Priority: Minor > > Add arm test jobs to amplab jenkins for spark. > Till now we have made two arm test periodic jobs for spark in OpenLab: one is > based on master with hadoop 2.7 (similar to the QA test of amplab jenkins), the > other is based on a new branch which we made on 09-09, see > [http://status.openlabtesting.org/builds/job/spark-master-unit-test-hadoop-2.7-arm64] > and > [http://status.openlabtesting.org/builds/job/spark-unchanged-branch-unit-test-hadoop-2.7-arm64]. > We only have to care about the first one when integrating the arm test with amplab > jenkins. > About the k8s test on arm, we have tested it, see > [https://github.com/theopenlab/spark/pull/17]; maybe we can integrate it > later. > And we plan to test other stable branches too, and we can integrate them into > amplab when they are ready. > We have offered an arm instance and sent the info to shane knapp; thanks to shane > for adding the first arm job to amplab jenkins :) > The other important thing is about leveldbjni > [https://github.com/fusesource/leveldbjni] (see > [https://github.com/fusesource/leveldbjni/issues/80]): spark depends on leveldbjni-all-1.8 > [https://mvnrepository.com/artifact/org.fusesource.leveldbjni/leveldbjni-all/1.8], > and we can see there is no arm64 support. So we built an arm64-supporting > release of leveldbjni, see > [https://mvnrepository.com/artifact/org.openlabtesting.leveldbjni/leveldbjni-all/1.8], > but we can't modify the spark pom.xml directly with something like a > 'property'/'profile' to choose the correct jar package on the arm or x86 platform, > because spark depends on some hadoop packages like hadoop-hdfs, and those packages > depend on leveldbjni-all-1.8 too, unless hadoop releases a new arm-supporting > leveldbjni jar. For now we download the leveldbjni-all-1.8 of > openlabtesting and 'mvn install' it when arm testing for spark. > PS: The issues found and fixed: > SPARK-28770 > [https://github.com/apache/spark/pull/25673] > SPARK-28519 > [https://github.com/apache/spark/pull/25279] > SPARK-28433 > [https://github.com/apache/spark/pull/25186] > SPARK-28467 > [https://github.com/apache/spark/pull/25864] > SPARK-29286 > [https://github.com/apache/spark/pull/26021] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-29093) Remove automatically generated param setters in _shared_params_code_gen.py
[ https://issues.apache.org/jira/browse/SPARK-29093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-29093: Assignee: Huaxin Gao > Remove automatically generated param setters in _shared_params_code_gen.py > -- > > Key: SPARK-29093 > URL: https://issues.apache.org/jira/browse/SPARK-29093 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.0.0 >Reporter: zhengruifeng >Assignee: Huaxin Gao >Priority: Major > > The main difference between the scala and py sides comes from the automatically > generated param setters in _shared_params_code_gen.py. > To keep them in sync, we should remove those setters from _shared_.py, and add > the corresponding setters manually. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
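For context on the pattern being adopted, a minimal sketch of the explicit-setter style used on the Scala side, which the Python params would mirror once the generated setters are removed. The class ExampleParams and its scaffolding are hypothetical illustrations against the public org.apache.spark.ml.param API, not the actual change:

{code}
import org.apache.spark.ml.param.{IntParam, ParamMap, Params}
import org.apache.spark.ml.util.Identifiable

// Hypothetical params holder showing a hand-written setter, the style the
// py side would copy once the codegen'd setters are gone.
class ExampleParams(override val uid: String) extends Params {
  def this() = this(Identifiable.randomUID("example"))

  final val maxIter = new IntParam(this, "maxIter", "maximum number of iterations (>= 0)")

  // Written out manually instead of being auto-generated:
  def setMaxIter(value: Int): this.type = set(maxIter, value)

  def getMaxIter: Int = $(maxIter)

  override def copy(extra: ParamMap): ExampleParams = defaultCopy(extra)
}

// e.g. new ExampleParams().setMaxIter(10).getMaxIter returns 10
{code}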
[jira] [Commented] (SPARK-29093) Remove automatically generated param setters in _shared_params_code_gen.py
[ https://issues.apache.org/jira/browse/SPARK-29093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957601#comment-16957601 ] zhengruifeng commented on SPARK-29093: -- [~huaxingao] Thanks! > Remove automatically generated param setters in _shared_params_code_gen.py > -- > > Key: SPARK-29093 > URL: https://issues.apache.org/jira/browse/SPARK-29093 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.0.0 >Reporter: zhengruifeng >Priority: Major > > The main difference between the scala and py sides comes from the automatically > generated param setters in _shared_params_code_gen.py. > To keep them in sync, we should remove those setters from _shared_.py, and add > the corresponding setters manually. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23171) Reduce the time costs of the rule runs that do not change the plans
[ https://issues.apache.org/jira/browse/SPARK-23171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957593#comment-16957593 ] Takeshi Yamamuro commented on SPARK-23171: -- oh, nice, the performance looks much better. > Reduce the time costs of the rule runs that do not change the plans > > > Key: SPARK-23171 > URL: https://issues.apache.org/jira/browse/SPARK-23171 > Project: Spark > Issue Type: Umbrella > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Priority: Major > Labels: bulk-closed > > Below are the time stats of the Analyzer/Optimizer rules. Try to improve the rules and reduce the time costs, especially for the runs that do not change the plans.
> {noformat}
> === Metrics of Analyzer/Optimizer Rules ===
> Total number of runs = 175827
> Total time: 20.699042877 seconds
> Rule                                                                             Total Time  Effective Time  Total Runs  Effective Runs
> org.apache.spark.sql.catalyst.optimizer.ColumnPruning                            2340563794  1338268224      1875        761
> org.apache.spark.sql.catalyst.analysis.Analyzer$CTESubstitution                  1632672623  1625071881      788         37
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggregateFunctions        1395087131  347339931       1982        38
> org.apache.spark.sql.catalyst.optimizer.PruneFilters                             1177711364  21344174        1590        3
> org.apache.spark.sql.catalyst.optimizer.Optimizer$OptimizeSubqueries             1145135465  1131417128      285         39
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences                1008347217  663112062       1982        616
> org.apache.spark.sql.catalyst.optimizer.ReorderJoin                              767024424   693001699       1590        132
> org.apache.spark.sql.catalyst.analysis.Analyzer$FixNullability                   598524650   40802876        742         12
> org.apache.spark.sql.catalyst.analysis.DecimalPrecision                          595384169   436153128       1982        211
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubquery                  548178270   459695885       1982        49
> org.apache.spark.sql.catalyst.analysis.TypeCoercion$ImplicitTypeCasts            423002864   139869503       1982        86
> org.apache.spark.sql.catalyst.optimizer.BooleanSimplification                    405544962   17250184        1590        7
> org.apache.spark.sql.catalyst.optimizer.PushPredicateThroughJoin                 383837603   284174662       1590        708
> org.apache.spark.sql.catalyst.optimizer.RemoveRedundantAliases                   372901885   3362332         1590        9
> org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints              364628214   343815519       285         192
> org.apache.spark.sql.execution.datasources.FindDataSourceTable                   303293296   285344766       1982        233
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions                 233195019   92648171        1982        294
> org.apache.spark.sql.catalyst.analysis.TypeCoercion$FunctionArgumentConversion   220568919   73932736        1982        38
> org.apache.spark.sql.catalyst.optimizer.NullPropagation                          207976072   9072305
[jira] [Commented] (SPARK-23171) Reduce the time costs of the rule runs that do not change the plans
[ https://issues.apache.org/jira/browse/SPARK-23171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957591#comment-16957591 ] Yuming Wang commented on SPARK-23171: - This is a real SQL query from our production. Spark 2.3.4:
{noformat}
=== Metrics of Analyzer/Optimizer Rules ===
Total number of runs: 1602
Total time: 25.87935196 seconds
Rule                                                                             Effective Time / Total Time  Effective Runs / Total Runs
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations                 12560629829 / 12561649545    4 / 21
org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions                  0 / 10442916205              0 / 5
org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions             1655041748 / 1655084280      1 / 2
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences                217766453 / 256617622        8 / 21
org.apache.spark.sql.catalyst.analysis.DecimalPrecision                          48636897 / 68610147          4 / 21
org.apache.spark.sql.catalyst.optimizer.ColumnPruning                            16638517 / 53422588          1 / 15
org.apache.spark.sql.catalyst.analysis.TypeCoercion$FunctionArgumentConversion   26295695 / 50081268          2 / 21
org.apache.spark.sql.catalyst.analysis.TypeCoercion$ImplicitTypeCasts            0 / 49518989                 0 / 21
org.apache.spark.sql.catalyst.analysis.TypeCoercion$PromoteStrings               24587790 / 49437868          2 / 21
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions                 16488193 / 32838168          8 / 21
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator                 0 / 32290369                 0 / 21
org.apache.spark.sql.catalyst.analysis.ResolveTimeZone                           18041546 / 29396487          10 / 21
org.apache.spark.sql.catalyst.analysis.Analyzer$FixNullability                   0 / 28650276                 0 / 5
org.apache.spark.sql.catalyst.analysis.TypeCoercion$DateTimeOperations           0 / 26619605                 0 / 21
org.apache.spark.sql.catalyst.analysis.TypeCoercion$InConversion                 0 / 26206521                 0 / 21
org.apache.spark.sql.catalyst.analysis.TypeCoercion$IfCoercion                   0 / 25036412                 0 / 21
org.apache.spark.sql.catalyst.analysis.TypeCoercion$BooleanEquality              0 / 24896919                 0 / 21
org.apache.spark.sql.catalyst.analysis.TypeCoercion$Division                     0 / 23821725                 0 / 21
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowFrame               0 / 22621115                 0 / 21
org.apache.spark.sql.catalyst.analysis.ResolveCreateNamedStruct                  0 / 22397612                 0 / 21
org.apache.spark.sql.catalyst.analysis.EliminateView                             22255584 / 22286242          1 / 2
org.apache.spark.sql.catalyst.analysis.TypeCoercion$EltCoercion                  0 / 21244351                 0 / 21
org.apache.spark.sql.catalyst.analysis.TypeCoercion$StackCoercion                0 / 21032406                 0 / 21
org.apache.spark.sql.catalyst.analysis.TypeCoercion$WindowFrameCoercion          0 / 20834511                 0 / 21
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowOrder               0 / 20644371                 0 / 21
org.apache.spark.sql.catalyst.analysis.TypeCoercion$ConcatCoercion               0 / 20097683                 0 / 21
org.apache.spark.sql.catalyst.analysis.TimeWindowing                             0 / 19899978                 0 / 21
org.apache.spark.sql.catalyst.analysis.TypeCoercion$CaseWhenCoercion             0 / 19819768                 0 / 21
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubquery                  0 / 18257140                 0 / 21
org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates                 0 / 17304713                 0 / 21
org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints              11616056 / 11622509          1 / 2
org.apache.spark.sql.catalyst.optimizer.PushPredicateThroughJoin                 5286165 / 8730109            8 / 13
org.ap
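As a side note on how reports like the two above are produced: Catalyst accumulates these timings in RuleExecutor, and dumpTimeSpent() renders the table. A minimal spark-shell sketch follows; note this is an internal API, so the exact names may differ across versions (resetMetrics() in particular may not exist in older releases):

{code}
import org.apache.spark.sql.catalyst.rules.RuleExecutor

RuleExecutor.resetMetrics()                    // clear timings accumulated so far
spark.range(10).selectExpr("id * 2").collect() // run the query under investigation
println(RuleExecutor.dumpTimeSpent())          // print the per-rule metrics table
{code}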
[jira] [Updated] (SPARK-29145) Spark SQL cannot handle "NOT IN" condition when using "JOIN"
[ https://issues.apache.org/jira/browse/SPARK-29145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29145: -- Affects Version/s: 2.1.3 > Spark SQL cannot handle "NOT IN" condition when using "JOIN" > > > Key: SPARK-29145 > URL: https://issues.apache.org/jira/browse/SPARK-29145 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.3, 2.2.3, 2.3.4, 2.4.4 >Reporter: Dezhi Cai >Priority: Minor > > sample sql: > {code} > spark.range(10).createOrReplaceTempView("A") > spark.range(10).createOrReplaceTempView("B") > spark.range(10).createOrReplaceTempView("C") > sql("""select * from A inner join B on A.id=B.id and A.id NOT IN (select id > from C)""") > {code} > > {code} > org.apache.spark.sql.AnalysisException: Table or view not found: C; line 1 > pos 74; > 'Project [*] > +- 'Join Inner, ((id#0L = id#2L) AND NOT id#0L IN (list#6 [])) >: +- 'Project ['id] >: +- 'UnresolvedRelation [C] >:- SubqueryAlias `a` >: +- Range (0, 10, step=1, splits=Some(12)) >+- SubqueryAlias `b` > +- Range (0, 10, step=1, splits=Some(12)) > at > org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:94) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:89) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:155) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:154) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:154) > at scala.collection.immutable.List.foreach(List.scala:392) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:154) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:89) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:86) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:120) > ... > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29145) Spark SQL cannot handle "NOT IN" condition when using "JOIN"
[ https://issues.apache.org/jira/browse/SPARK-29145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29145: -- Affects Version/s: 2.2.3 > Spark SQL cannot handle "NOT IN" condition when using "JOIN" > > > Key: SPARK-29145 > URL: https://issues.apache.org/jira/browse/SPARK-29145 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.3, 2.3.4, 2.4.4 >Reporter: Dezhi Cai >Priority: Minor > > sample sql: > {code} > spark.range(10).createOrReplaceTempView("A") > spark.range(10).createOrReplaceTempView("B") > spark.range(10).createOrReplaceTempView("C") > sql("""select * from A inner join B on A.id=B.id and A.id NOT IN (select id > from C)""") > {code} > > {code} > org.apache.spark.sql.AnalysisException: Table or view not found: C; line 1 > pos 74; > 'Project [*] > +- 'Join Inner, ((id#0L = id#2L) AND NOT id#0L IN (list#6 [])) >: +- 'Project ['id] >: +- 'UnresolvedRelation [C] >:- SubqueryAlias `a` >: +- Range (0, 10, step=1, splits=Some(12)) >+- SubqueryAlias `b` > +- Range (0, 10, step=1, splits=Some(12)) > at > org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:94) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:89) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:155) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:154) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:154) > at scala.collection.immutable.List.foreach(List.scala:392) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:154) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:89) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:86) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:120) > ... > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29145) Spark SQL cannot handle "NOT IN" condition when using "JOIN"
[ https://issues.apache.org/jira/browse/SPARK-29145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29145: -- Affects Version/s: 2.3.4 > Spark SQL cannot handle "NOT IN" condition when using "JOIN" > > > Key: SPARK-29145 > URL: https://issues.apache.org/jira/browse/SPARK-29145 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.4, 2.4.4 >Reporter: Dezhi Cai >Priority: Minor > > sample sql: > {code} > spark.range(10).createOrReplaceTempView("A") > spark.range(10).createOrReplaceTempView("B") > spark.range(10).createOrReplaceTempView("C") > sql("""select * from A inner join B on A.id=B.id and A.id NOT IN (select id > from C)""") > {code} > > {code} > org.apache.spark.sql.AnalysisException: Table or view not found: C; line 1 > pos 74; > 'Project [*] > +- 'Join Inner, ((id#0L = id#2L) AND NOT id#0L IN (list#6 [])) >: +- 'Project ['id] >: +- 'UnresolvedRelation [C] >:- SubqueryAlias `a` >: +- Range (0, 10, step=1, splits=Some(12)) >+- SubqueryAlias `b` > +- Range (0, 10, step=1, splits=Some(12)) > at > org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:94) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:89) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:155) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:154) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:154) > at scala.collection.immutable.List.foreach(List.scala:392) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:154) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:89) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:86) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:120) > ... > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
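Until the analyzer resolves subqueries inside join conditions, one possible workaround for the inner-join case reported above (a sketch, not an official fix) is to move the NOT IN predicate from the ON clause to WHERE, which is equivalent for an inner join:

{code}
spark.range(10).createOrReplaceTempView("A")
spark.range(10).createOrReplaceTempView("B")
spark.range(10).createOrReplaceTempView("C")

// The NOT IN subquery moves from the join condition to WHERE,
// which is equivalent for an inner join and analyzes successfully.
sql("""select * from A inner join B on A.id = B.id
      |where A.id NOT IN (select id from C)""".stripMargin)
{code}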
[jira] [Updated] (SPARK-29551) There is a bug about fetch failed when an executor lost
[ https://issues.apache.org/jira/browse/SPARK-29551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-29551: - Description: There is a regression when an executor is lost and this then causes a 'fetch failed'. When an executor is lost for some reason (e.g. the external shuffle service on the executor's host is lost, or the host itself is lost) and the loss happens just as a reduce stage gets a fetch failure from it, which is really bad, the previous code only calls mapOutputTracker.unregisterMapOutput(shuffleId, mapIndex, bmAddress) to mark one map output as broken in the map stage. But the other map outputs on that executor are not available either, and they can only be resubmitted by a nested stage retry, which is the regression. As we all know, the previous code calls mapOutputTracker.removeOutputsOnHost(host) or mapOutputTracker.removeOutputsOnExecutor(execId) when a reduce stage fetch fails and the executor is active, while it does NOT do so for the problem above. So we should distinguish the failedEpoch of 'executor lost' from the fetchFailedEpoch of 'fetch failed' to solve the above problem. We can add a unit test in 'DAGSchedulerSuite.scala' to catch the above problem.
{code}
test("All shuffle files on the slave should be cleaned up when slave lost test") {
  // reset the test context with the right shuffle service config
  afterEach()
  val conf = new SparkConf()
  conf.set(config.SHUFFLE_SERVICE_ENABLED.key, "true")
  conf.set("spark.files.fetchFailure.unRegisterOutputOnHost", "true")
  init(conf)
  runEvent(ExecutorAdded("exec-hostA1", "hostA"))
  runEvent(ExecutorAdded("exec-hostA2", "hostA"))
  runEvent(ExecutorAdded("exec-hostB", "hostB"))
  val firstRDD = new MyRDD(sc, 3, Nil)
  val firstShuffleDep = new ShuffleDependency(firstRDD, new HashPartitioner(3))
  val firstShuffleId = firstShuffleDep.shuffleId
  val shuffleMapRdd = new MyRDD(sc, 3, List(firstShuffleDep))
  val shuffleDep = new ShuffleDependency(shuffleMapRdd, new HashPartitioner(3))
  val secondShuffleId = shuffleDep.shuffleId
  val reduceRdd = new MyRDD(sc, 1, List(shuffleDep))
  submit(reduceRdd, Array(0))
  // map stage1 completes successfully, with one task on each executor
  complete(taskSets(0), Seq(
    (Success, MapStatus(
      BlockManagerId("exec-hostA1", "hostA", 12345), Array.fill[Long](1)(2), mapTaskId = 5)),
    (Success, MapStatus(
      BlockManagerId("exec-hostA2", "hostA", 12345), Array.fill[Long](1)(2), mapTaskId = 6)),
    (Success, makeMapStatus("hostB", 1, mapTaskId = 7))
  ))
  // map stage2 completes successfully, with one task on each executor
  complete(taskSets(1), Seq(
    (Success, MapStatus(
      BlockManagerId("exec-hostA1", "hostA", 12345), Array.fill[Long](1)(2), mapTaskId = 8)),
    (Success, MapStatus(
      BlockManagerId("exec-hostA2", "hostA", 12345), Array.fill[Long](1)(2), mapTaskId = 9)),
    (Success, makeMapStatus("hostB", 1, mapTaskId = 10))
  ))
  // make sure our test setup is correct
  val initialMapStatus1 = mapOutputTracker.shuffleStatuses(firstShuffleId).mapStatuses
  // val initialMapStatus1 = mapOutputTracker.mapStatuses.get(0).get
  assert(initialMapStatus1.count(_ != null) === 3)
  assert(initialMapStatus1.map{_.location.executorId}.toSet ===
    Set("exec-hostA1", "exec-hostA2", "exec-hostB"))
  assert(initialMapStatus1.map{_.mapId}.toSet === Set(5, 6, 7))
  val initialMapStatus2 = mapOutputTracker.shuffleStatuses(secondShuffleId).mapStatuses
  // val initialMapStatus1 = mapOutputTracker.mapStatuses.get(0).get
  assert(initialMapStatus2.count(_ != null) === 3)
  assert(initialMapStatus2.map{_.location.executorId}.toSet ===
    Set("exec-hostA1", "exec-hostA2", "exec-hostB"))
  assert(initialMapStatus2.map{_.mapId}.toSet === Set(8, 9, 10))
  // kill exec-hostA2
  runEvent(ExecutorLost("exec-hostA2", ExecutorKilled))
  // reduce stage fails with a fetch failure from map stage from exec-hostA2
  complete(taskSets(2), Seq(
    (FetchFailed(BlockManagerId("exec-hostA2", "hostA", 12345),
      secondShuffleId, 0L, 0, 0, "ignored"), null)
  ))
  // Here is the main assertion -- make sure that we de-register
  // the map outputs for both map stages from both executors on hostA
  val mapStatus1 = mapOutputTracker.shuffleStatuses(firstShuffleId).mapStatuses
  assert(mapStatus1.count(_ != null) === 1)
  assert(mapStatus1(2).location.executorId === "exec-hostB")
  assert(mapStatus1(2).location.host === "hostB")
  val mapStatus2 = mapOutputTracker.shuffleStatuses(secondShuffleId).mapStatuses
  assert(mapStatus2.count(_ != null) === 1)
  assert(mapStatus2(2).location.executorId === "exec-hostB")
  assert(mapStatus2(2).location.host === "hostB")
}
{code}
The error output is:
{code}
3 did not equal 1
ScalaTestFailureLocation: org.apache.spark.sched
[jira] [Closed] (SPARK-28925) Update Kubernetes-client to 4.4.2 to be compatible with Kubernetes 1.13 and 1.14
[ https://issues.apache.org/jira/browse/SPARK-28925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun closed SPARK-28925. - > Update Kubernetes-client to 4.4.2 to be compatible with Kubernetes 1.13 and > 1.14 > > > Key: SPARK-28925 > URL: https://issues.apache.org/jira/browse/SPARK-28925 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.3.3, 2.4.3 >Reporter: Eric >Priority: Minor > > Hello, > If you use Spark with Kubernetes 1.13 or 1.14 you will see this error: > {code:java} > {"time": "2019-08-28T09:56:11.866Z", "lvl":"INFO", "logger": > "org.apache.spark.internal.Logging", > "thread":"kubernetes-executor-snapshots-subscribers-0","msg":"Going to > request 1 executors from Kubernetes."} > {"time": "2019-08-28T09:56:12.028Z", "lvl":"WARN", "logger": > "io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2", > "thread":"OkHttp https://kubernetes.default.svc/...","msg":"Exec Failure: > HTTP 403, Status: 403 - "} > java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden' > {code} > Apparently the bug is fixed here: > [https://github.com/fabric8io/kubernetes-client/pull/1669] > We have currently compiled Spark source code with Kubernetes-client 4.4.2 and > it's working great on our cluster. We are using Kubernetes 1.13.10. > > Could it be possible to update that dependency version? > > Thanks! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-29071) Upgrade Scala to 2.12.10
[ https://issues.apache.org/jira/browse/SPARK-29071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun closed SPARK-29071. - > Upgrade Scala to 2.12.10 > > > Key: SPARK-29071 > URL: https://issues.apache.org/jira/browse/SPARK-29071 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.0.0 >Reporter: Xiao Li >Priority: Major > > Supposed to compile another 5-10% faster than the 2.12.8 we're on now: > * [https://github.com/scala/scala/releases/tag/v2.12.9] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-29446) Upgrade netty-all to 4.1.42 and fix vulnerabilities.
[ https://issues.apache.org/jira/browse/SPARK-29446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun closed SPARK-29446. - > Upgrade netty-all to 4.1.42 and fix vulnerabilities. > > > Key: SPARK-29446 > URL: https://issues.apache.org/jira/browse/SPARK-29446 > Project: Spark > Issue Type: Dependency upgrade > Components: Build >Affects Versions: 3.0.0 >Reporter: jiaan.geng >Priority: Major > > The current code uses io.netty:netty-all:jar:4.1.17, which has a security > vulnerability. We can get the security info from > [https://www.tenable.com/cve/CVE-2019-16869]. > This reference recommends upgrading netty-all to 4.1.42 or later. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-29495) Add ability to estimate per
[ https://issues.apache.org/jira/browse/SPARK-29495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun closed SPARK-29495. - > Add ability to estimate per > --- > > Key: SPARK-29495 > URL: https://issues.apache.org/jira/browse/SPARK-29495 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 2.4.4 >Reporter: Chris Nardi >Priority: Major > > In gensim, [the LDA model|https://radimrehurek.com/gensim/models/ldamodel.html] has a parameter > eval_every that allows a user to specify that the model should be evaluated > every X iterations to determine its log perplexity. This helps to determine > convergence of the model, and whether or not the proper number of iterations > has been chosen. Spark has no similar functionality in its implementation of > LDA. This should be added, as it appears the only way to achieve this > functionality currently is to train models with varying numbers of iterations and > evaluate each one's log perplexity. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
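The manual approach the description calls the only current option looks roughly like the sketch below, using the public org.apache.spark.ml.clustering API. This is just an illustration: `dataset` is assumed to be a DataFrame with a "features" vector column, and k=10 is arbitrary:

{code}
import org.apache.spark.ml.clustering.LDA

// Train at increasing iteration counts and watch logPerplexity to judge
// convergence, since Spark ML has no eval_every-style option.
for (iters <- Seq(10, 20, 50, 100)) {
  val model = new LDA().setK(10).setMaxIter(iters).fit(dataset)
  println(s"maxIter=$iters logPerplexity=${model.logPerplexity(dataset)}")
}
{code}

Each step retrains the model from scratch, which is exactly the inefficiency the ticket complains about.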
[jira] [Updated] (SPARK-27741) Transitivity on predicate pushdown
[ https://issues.apache.org/jira/browse/SPARK-27741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-27741: -- Fix Version/s: (was: 2.4.0) > Transitivity on predicate pushdown > --- > > Key: SPARK-27741 > URL: https://issues.apache.org/jira/browse/SPARK-27741 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.1.1 >Reporter: U Shaw >Priority: Major > > When using an inner join, WHERE conditions can be propagated into the join > condition; but when using an outer join, even if the conditions are the same, > the predicate is only pushed down to the left or right side. > For example: > select * from t1 left join t2 on t1.id=t2.id where t1.id=1 > --> select * from t1 left join t2 on t1.id=t2.id and t2.id=1 where t1.id=1 > Can Catalyst support transitivity on predicate pushdown? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
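To make the requested behavior concrete, a small sketch of where to look: run the query and check whether the optimized plan contains a Filter on the t2 side as well. The temp views here are stand-ins for real tables:

{code}
spark.range(10).createOrReplaceTempView("t1")
spark.range(10).createOrReplaceTempView("t2")

// With transitive pushdown, the optimized plan would show a Filter (id = 1)
// applied to t2 before the join, not only to t1.
spark.sql("select * from t1 left join t2 on t1.id = t2.id where t1.id = 1").explain(true)
{code}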
[jira] [Updated] (SPARK-29193) Update fabric8 version to 4.3 continue docker 4 desktop support
[ https://issues.apache.org/jira/browse/SPARK-29193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29193: -- Fix Version/s: (was: 3.0.0) > Update fabric8 version to 4.3 continue docker 4 desktop support > --- > > Key: SPARK-29193 > URL: https://issues.apache.org/jira/browse/SPARK-29193 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: Holden Karau >Priority: Blocker > > The current version of the kubernetes client we are using has some issues > with not setting origin ( > [https://github.com/fabric8io/kubernetes-client/issues/1667] ) which cause > failures on new versions of Docker 4 Desktop Kubernetes. > > This is fixed in 4.3-snapshot, so we will need to wait for the 4.3 release or > backport this. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-29547) Make `docker-integration-tests` work in JDK11
[ https://issues.apache.org/jira/browse/SPARK-29547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun closed SPARK-29547. - > Make `docker-integration-tests` work in JDK11 > - > > Key: SPARK-29547 > URL: https://issues.apache.org/jira/browse/SPARK-29547 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > To support JDK11, SPARK-28737 upgraded `Jersey` to 2.29. However, it turns > out that `docker-integration-tests` is broken because > `com.spotify.docker-client` still depends on jersey-guava. > SPARK-29546 recovers the test suite in JDK8 by adding back the dependency. We > had better make this test suite work in JDK11 environment, too. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-29546) Recover jersey-guava test dependency in docker-integration-tests
[ https://issues.apache.org/jira/browse/SPARK-29546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-29546: - Assignee: Dongjoon Hyun > Recover jersey-guava test dependency in docker-integration-tests > > > Key: SPARK-29546 > URL: https://issues.apache.org/jira/browse/SPARK-29546 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > > While SPARK-28737 upgrades `Jersey` to 2.29, `docker-integration-tests` is > broken because `com.spotify.docker-client` depends on `jersey-guava`. The > latest `com.spotify.docker-client` is also still depending on that, too. > - https://mvnrepository.com/artifact/com.spotify/docker-client/5.0.2 > -> > https://mvnrepository.com/artifact/org.glassfish.jersey.core/jersey-client/2.19 > -> > https://mvnrepository.com/artifact/org.glassfish.jersey.core/jersey-common/2.19 > -> > https://mvnrepository.com/artifact/org.glassfish.jersey.bundles.repackaged/jersey-guava/2.19 > **AFTER** > {code} > build/mvn -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.12 > -Dtest=none > -DwildcardSuites=org.apache.spark.sql.jdbc.PostgresIntegrationSuite test > Tests: succeeded 6, failed 0, canceled 0, ignored 0, pending 0 > All tests passed. > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29546) Recover jersey-guava test dependency in docker-integration-tests
[ https://issues.apache.org/jira/browse/SPARK-29546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29546: -- Description: While SPARK-28737 upgrades `Jersey` to 2.29, `docker-integration-tests` is broken because `com.spotify.docker-client` depends on `jersey-guava`. The latest `com.spotify.docker-client` is also still depending on that, too. - https://mvnrepository.com/artifact/com.spotify/docker-client/5.0.2 -> https://mvnrepository.com/artifact/org.glassfish.jersey.core/jersey-client/2.19 -> https://mvnrepository.com/artifact/org.glassfish.jersey.core/jersey-common/2.19 -> https://mvnrepository.com/artifact/org.glassfish.jersey.bundles.repackaged/jersey-guava/2.19 **AFTER** {code} build/mvn -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.12 -Dtest=none -DwildcardSuites=org.apache.spark.sql.jdbc.PostgresIntegrationSuite test Tests: succeeded 6, failed 0, canceled 0, ignored 0, pending 0 All tests passed. {code} was: While SPARK-28737 upgrades `Jersey` to 2.29, `docker-integration-tests` is broken because `com.spotify.docker-client` depends on `jersey-guava`. The latest `com.spotify.docker-client` is also still depending on that, too. - https://mvnrepository.com/artifact/com.spotify/docker-client/5.0.2 -> https://mvnrepository.com/artifact/org.glassfish.jersey.core/jersey-client/2.19 -> https://mvnrepository.com/artifact/org.glassfish.jersey.core/jersey-common/2.19 -> https://mvnrepository.com/artifact/org.glassfish.jersey.bundles.repackaged/jersey-guava/2.19 **AFTER** {code} build/mvn -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.12 -Dtest=none -DwildcardSuites=org.apache.spark.sql.jdbc.PostgresIntegrationSuite test Tests: succeeded 6, failed 0, canceled 0, ignored 0, pending 0 All tests passed. {code} This is only for recovering JDBC integration test in JDK8 environment. For now, this is broken in both JDK8/11. After fixing, this will not work in JDK11. > Recover jersey-guava test dependency in docker-integration-tests > > > Key: SPARK-29546 > URL: https://issues.apache.org/jira/browse/SPARK-29546 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > While SPARK-28737 upgrades `Jersey` to 2.29, `docker-integration-tests` is > broken because `com.spotify.docker-client` depends on `jersey-guava`. The > latest `com.spotify.docker-client` is also still depending on that, too. > - https://mvnrepository.com/artifact/com.spotify/docker-client/5.0.2 > -> > https://mvnrepository.com/artifact/org.glassfish.jersey.core/jersey-client/2.19 > -> > https://mvnrepository.com/artifact/org.glassfish.jersey.core/jersey-common/2.19 > -> > https://mvnrepository.com/artifact/org.glassfish.jersey.bundles.repackaged/jersey-guava/2.19 > **AFTER** > {code} > build/mvn -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.12 > -Dtest=none > -DwildcardSuites=org.apache.spark.sql.jdbc.PostgresIntegrationSuite test > Tests: succeeded 6, failed 0, canceled 0, ignored 0, pending 0 > All tests passed. > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29547) Make `docker-integration-tests` work in JDK11
[ https://issues.apache.org/jira/browse/SPARK-29547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-29547. --- Resolution: Duplicate This is fixed by SPARK-29546 > Make `docker-integration-tests` work in JDK11 > - > > Key: SPARK-29547 > URL: https://issues.apache.org/jira/browse/SPARK-29547 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > To support JDK11, SPARK-28737 upgraded `Jersey` to 2.29. However, it turns > out that `docker-integration-tests` is broken because > `com.spotify.docker-client` still depends on jersey-guava. > SPARK-29546 recovers the test suite in JDK8 by adding back the dependency. We > had better make this test suite work in JDK11 environment, too. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29546) Recover jersey-guava test dependency in docker-integration-tests
[ https://issues.apache.org/jira/browse/SPARK-29546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29546: -- Parent: SPARK-29194 Issue Type: Sub-task (was: Bug) > Recover jersey-guava test dependency in docker-integration-tests > > > Key: SPARK-29546 > URL: https://issues.apache.org/jira/browse/SPARK-29546 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > While SPARK-28737 upgrades `Jersey` to 2.29, `docker-integration-tests` is > broken because `com.spotify.docker-client` depends on `jersey-guava`. The > latest `com.spotify.docker-client` is also still depending on that, too. > - https://mvnrepository.com/artifact/com.spotify/docker-client/5.0.2 > -> > https://mvnrepository.com/artifact/org.glassfish.jersey.core/jersey-client/2.19 > -> > https://mvnrepository.com/artifact/org.glassfish.jersey.core/jersey-common/2.19 > -> > https://mvnrepository.com/artifact/org.glassfish.jersey.bundles.repackaged/jersey-guava/2.19 > **AFTER** > {code} > build/mvn -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.12 > -Dtest=none > -DwildcardSuites=org.apache.spark.sql.jdbc.PostgresIntegrationSuite test > Tests: succeeded 6, failed 0, canceled 0, ignored 0, pending 0 > All tests passed. > {code} > This is only for recovering JDBC integration test in JDK8 environment. For > now, this is broken in both JDK8/11. After fixing, this will not work in > JDK11. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-29511) DataSourceV2: Support CREATE NAMESPACE
[ https://issues.apache.org/jira/browse/SPARK-29511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-29511: --- Assignee: Terry Kim > DataSourceV2: Support CREATE NAMESPACE > -- > > Key: SPARK-29511 > URL: https://issues.apache.org/jira/browse/SPARK-29511 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Major > > CREATE NAMESPACE needs to support v2 catalogs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29511) DataSourceV2: Support CREATE NAMESPACE
[ https://issues.apache.org/jira/browse/SPARK-29511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-29511. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26166 [https://github.com/apache/spark/pull/26166] > DataSourceV2: Support CREATE NAMESPACE > -- > > Key: SPARK-29511 > URL: https://issues.apache.org/jira/browse/SPARK-29511 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Major > Fix For: 3.0.0 > > > CREATE NAMESPACE needs to support v2 catalogs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-29563) CREATE TABLE LIKE should look up catalog/table like v2 commands
[ https://issues.apache.org/jira/browse/SPARK-29563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh reassigned SPARK-29563: --- Assignee: Dilip Biswal (was: Wenchen Fan) > CREATE TABLE LIKE should look up catalog/table like v2 commands > --- > > Key: SPARK-29563 > URL: https://issues.apache.org/jira/browse/SPARK-29563 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Dilip Biswal >Assignee: Dilip Biswal >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29106) Add jenkins arm test for spark
[ https://issues.apache.org/jira/browse/SPARK-29106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957508#comment-16957508 ] huangtianhua commented on SPARK-29106: -- [~shaneknapp], there are two small suggestions: # we don't have to download and install leveldbjni-all-1.8 in our arm test instance; we have already installed it and it is there. # maybe we can try to use 'mvn clean package' instead of 'mvn clean install'? > Add jenkins arm test for spark > -- > > Key: SPARK-29106 > URL: https://issues.apache.org/jira/browse/SPARK-29106 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 3.0.0 >Reporter: huangtianhua >Priority: Minor > > Add arm test jobs to amplab jenkins for spark. > Till now we have made two arm test periodic jobs for spark in OpenLab: one is > based on master with hadoop 2.7 (similar to the QA test of amplab jenkins), the > other is based on a new branch which we made on 09-09, see > [http://status.openlabtesting.org/builds/job/spark-master-unit-test-hadoop-2.7-arm64] > and > [http://status.openlabtesting.org/builds/job/spark-unchanged-branch-unit-test-hadoop-2.7-arm64]. > We only have to care about the first one when integrating the arm test with amplab > jenkins. > About the k8s test on arm, we have tested it, see > [https://github.com/theopenlab/spark/pull/17]; maybe we can integrate it > later. > And we plan to test other stable branches too, and we can integrate them into > amplab when they are ready. > We have offered an arm instance and sent the info to shane knapp; thanks to shane > for adding the first arm job to amplab jenkins :) > The other important thing is about leveldbjni > [https://github.com/fusesource/leveldbjni] (see > [https://github.com/fusesource/leveldbjni/issues/80]): spark depends on leveldbjni-all-1.8 > [https://mvnrepository.com/artifact/org.fusesource.leveldbjni/leveldbjni-all/1.8], > and we can see there is no arm64 support. So we built an arm64-supporting > release of leveldbjni, see > [https://mvnrepository.com/artifact/org.openlabtesting.leveldbjni/leveldbjni-all/1.8], > but we can't modify the spark pom.xml directly with something like a > 'property'/'profile' to choose the correct jar package on the arm or x86 platform, > because spark depends on some hadoop packages like hadoop-hdfs, and those packages > depend on leveldbjni-all-1.8 too, unless hadoop releases a new arm-supporting > leveldbjni jar. For now we download the leveldbjni-all-1.8 of > openlabtesting and 'mvn install' it when arm testing for spark. > PS: The issues found and fixed: > SPARK-28770 > [https://github.com/apache/spark/pull/25673] > SPARK-28519 > [https://github.com/apache/spark/pull/25279] > SPARK-28433 > [https://github.com/apache/spark/pull/25186] > SPARK-28467 > [https://github.com/apache/spark/pull/25864] > SPARK-29286 > [https://github.com/apache/spark/pull/26021] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-20880) When spark SQL is used with Avro-backed HIVE tables, NPE from org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.supportedCategories.
[ https://issues.apache.org/jira/browse/SPARK-20880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957491#comment-16957491 ] Benjamyn Ward edited comment on SPARK-20880 at 10/23/19 2:05 AM: - Gentle ping. While the description states that the issue is fixed in Hive 2.2, based on the Hive Jira, the issue was fixed in version 2.3.0. * https://issues.apache.org/jira/browse/HIVE-16175 I am also running into this issue. I am going to try to work around the issue by using the **extraClassPath** that includes Hive SerDe 2.3.x, but I'm not sure if this will work or not. A much better solution would be to upgrade Spark's library dependencies. was (Author: errorsandglitches): Gentle ping. While the description states that the issue is fixed in Hive 2.2, based on the Hive Jira, the issue was fixed in version 2.3. * https://issues.apache.org/jira/browse/HIVE-16175 I am also running into this issue. I am going to try to work around the issue by using the **extraClassPath** that includes Hive SerDe 2.3.x, but I'm not sure if this will work or not. A much better solution would be to upgrade Spark's library dependencies. > When spark SQL is used with Avro-backed HIVE tables, NPE from > org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.supportedCategories. > > > Key: SPARK-20880 > URL: https://issues.apache.org/jira/browse/SPARK-20880 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Vinod KC >Priority: Minor > > When spark SQL is used with Avro-backed HIVE tables, an NPE is intermittently > thrown from > org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.supportedCategories. > The root cause is a race condition in the hive 1.2.1 jar used in Spark SQL. > This issue has been fixed in HIVE 2.2 (HIVE JIRA: > https://issues.apache.org/jira/browse/HIVE-16175); since Spark is still > using the Hive 1.2.1 jars, we still hit the race condition. > One workaround is to run Spark with a single task per executor; however, it > will slow down the jobs.
> Exception stack trace > 13/05/07 09:18:39 WARN scheduler.TaskSetManager: Lost task 18.0 in stage 0.0 > (TID 18, aiyhyashu.dxc.com): java.lang.NullPointerException > at > org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.supportedCategories(AvroObjectInspectorGenerator.java:142) > at > org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:91) > at > org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104) > at > org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104) > at > org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:120) > at > org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspector(AvroObjectInspectorGenerator.java:83) > at > org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.<init>(AvroObjectInspectorGenerator.java:56) > at > org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:124) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$5$$anonfun$10.apply(TableReader.scala:251) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$5$$anonfun$10.apply(TableReader.scala:239) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:785) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:785) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:105) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at > org.apache
[jira] [Commented] (SPARK-20880) When spark SQL is used with Avro-backed HIVE tables, NPE from org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.supportedCategories.
[ https://issues.apache.org/jira/browse/SPARK-20880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957491#comment-16957491 ] Benjamyn Ward commented on SPARK-20880: --- Gentle ping. While the description states that the issue is fixed in Hive 2.2, based on the Hive Jira, the issue was fixed in version 2.3. * https://issues.apache.org/jira/browse/HIVE-16175 I am also running into this issue. I am going to try to work around the issue by using the **extraClassPath** that includes Hive SerDe 2.3.x, but I'm not sure if this will work or not. A much better solution would be to upgrade Spark's library dependencies. > When spark SQL is used with Avro-backed HIVE tables, NPE from > org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.supportedCategories. > > > Key: SPARK-20880 > URL: https://issues.apache.org/jira/browse/SPARK-20880 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Vinod KC >Priority: Minor > > When spark SQL is used with Avro-backed HIVE tables, an NPE is intermittently > thrown from > org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.supportedCategories. > The root cause is a race condition in the hive 1.2.1 jar used in Spark SQL. > This issue has been fixed in HIVE 2.2 (HIVE JIRA: > https://issues.apache.org/jira/browse/HIVE-16175); since Spark is still > using the Hive 1.2.1 jars, we still hit the race condition. > One workaround is to run Spark with a single task per executor; however, it > will slow down the jobs. > Exception stack trace > 13/05/07 09:18:39 WARN scheduler.TaskSetManager: Lost task 18.0 in stage 0.0 > (TID 18, aiyhyashu.dxc.com): java.lang.NullPointerException > at > org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.supportedCategories(AvroObjectInspectorGenerator.java:142) > at > org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:91) > at > org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104) > at > org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:104) > at > org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:120) > at > org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspector(AvroObjectInspectorGenerator.java:83) > at > org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.<init>(AvroObjectInspectorGenerator.java:56) > at > org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:124) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$5$$anonfun$10.apply(TableReader.scala:251) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$5$$anonfun$10.apply(TableReader.scala:239) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:785) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:785) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:105) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at >
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache
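For reference, the "single task per executor" workaround mentioned in the description can be expressed as plain configuration. A minimal sketch, assuming default resource settings; the same values can also be passed to spark-submit via --conf:

{code}
import org.apache.spark.sql.SparkSession

// With spark.executor.cores equal to spark.task.cpus, each executor runs at
// most one task at a time, sidestepping the Hive 1.2.1 SerDe race at the
// cost of throughput.
val spark = SparkSession.builder()
  .appName("avro-npe-workaround")
  .config("spark.executor.cores", "1")
  .config("spark.task.cpus", "1")
  .enableHiveSupport()
  .getOrCreate()
{code}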
[jira] [Resolved] (SPARK-29107) Add window.sql - Part 1
[ https://issues.apache.org/jira/browse/SPARK-29107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-29107. -- Resolution: Fixed Issue resolved by pull request 26119 [https://github.com/apache/spark/pull/26119] > Add window.sql - Part 1 > --- > > Key: SPARK-29107 > URL: https://issues.apache.org/jira/browse/SPARK-29107 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.0.0 >Reporter: Dylan Guedes >Assignee: Dylan Guedes >Priority: Major > Fix For: 3.0.0 > > > In this ticket, we plan to add the regression test cases of > https://github.com/postgres/postgres/blob/REL_12_BETA3/src/test/regress/sql/window.sql#L1-L319 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-23160) Port window.sql
[ https://issues.apache.org/jira/browse/SPARK-23160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-23160. -- Resolution: Duplicate > Port window.sql > --- > > Key: SPARK-23160 > URL: https://issues.apache.org/jira/browse/SPARK-23160 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.0.0 >Reporter: Xingbo Jiang >Priority: Minor > > In this ticket, we plan to add the regression test cases of > https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/window.sql. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23160) Port window.sql
[ https://issues.apache.org/jira/browse/SPARK-23160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957481#comment-16957481 ] Hyukjin Kwon commented on SPARK-23160: -- This JIRA will be resolved by SPARK-29107, SPARK-29108, SPARK-29109 and SPARK-29110 > Port window.sql > --- > > Key: SPARK-23160 > URL: https://issues.apache.org/jira/browse/SPARK-23160 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.0.0 >Reporter: Xingbo Jiang >Priority: Minor > > In this ticket, we plan to add the regression test cases of > https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/window.sql. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29553) This problem is about using native BLAS to improve ML/MLLIB performance
[ https://issues.apache.org/jira/browse/SPARK-29553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] WuZeyi updated SPARK-29553: --- Description: I use native BLAS to improve ML/MLLIB performance on Yarn. The file spark-env.sh, which was modified by SPARK-21305, says that I should set OPENBLAS_NUM_THREADS=1 to disable multi-threading of OpenBLAS, but it does not take effect. I modified spark.conf to set spark.executorEnv.OPENBLAS_NUM_THREADS=1, and the performance improved. I think MKL_NUM_THREADS is the same. was: I use native BLAS to improve ML/MLLIB performance on Yarn. The file spark-env.sh, which was modified by [SPARK-21305], says that I should set OPENBLAS_NUM_THREADS=1 to disable multi-threading of OpenBLAS, but it does not take effect. I modified spark.conf to set OPENBLAS_NUM_THREADS=1, and the performance improved. I think MKL_NUM_THREADS is the same. > This problem is about using native BLAS to improve ML/MLLIB performance > -- > > Key: SPARK-29553 > URL: https://issues.apache.org/jira/browse/SPARK-29553 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib >Affects Versions: 2.3.0, 2.4.4 >Reporter: WuZeyi >Priority: Major > Labels: performance > > I use native BLAS to improve ML/MLLIB performance on Yarn. > The file spark-env.sh, which was modified by SPARK-21305, says that I should > set OPENBLAS_NUM_THREADS=1 to disable multi-threading of OpenBLAS, but it > does not take effect. > I modified spark.conf to set > spark.executorEnv.OPENBLAS_NUM_THREADS=1, and the performance improved. > > > I think MKL_NUM_THREADS is the same. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
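A minimal sketch of the setting that worked for the reporter, expressed programmatically (the same pair of keys could equally go in spark-defaults.conf; the MKL line extends the reporter's assumption to MKL builds and is not verified here):
{code:scala}
// Pin native BLAS to one thread per executor via executor env vars.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("native-blas-on-yarn")
  .config("spark.executorEnv.OPENBLAS_NUM_THREADS", "1")
  .config("spark.executorEnv.MKL_NUM_THREADS", "1") // assumed analogue for MKL
  .getOrCreate()
{code}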
[jira] [Comment Edited] (SPARK-29488) In Web UI, stage page has a js error when sorting the table.
[ https://issues.apache.org/jira/browse/SPARK-29488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957470#comment-16957470 ] jenny edited comment on SPARK-29488 at 10/23/19 1:07 AM: - Thank you [~srowen] for review. was (Author: cjn082030): Thank you [srowen|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=srowen] for review. > In Web UI, stage page has a js error when sorting the table. > --- > > Key: SPARK-29488 > URL: https://issues.apache.org/jira/browse/SPARK-29488 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.2, 2.4.4 >Reporter: jenny >Assignee: jenny >Priority: Minor > Fix For: 3.0.0 > > Attachments: image-2019-10-16-15-47-25-212.png > > > In the Web UI, following the steps below produces the js error "Uncaught TypeError: Failed > to execute 'removeChild' on 'Node': parameter 1 is not of type 'Node'.". > # Click the "Summary Metrics..." table header "Min" > # Click the "Aggregated Metrics by Executor" table header "Task Time" > # Click the "Summary Metrics..." table header "Min" (the same as step 1) > !image-2019-10-16-15-47-25-212.png! > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-29488) In Web UI, stage page has a js error when sorting the table.
[ https://issues.apache.org/jira/browse/SPARK-29488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957470#comment-16957470 ] jenny edited comment on SPARK-29488 at 10/23/19 1:06 AM: - Thank you [srowen|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=srowen] for review. was (Author: cjn082030): Thank you https://issues.apache.org/jira/secure/ViewProfile.jspa?name=srowen for review. > In Web UI, stage page has a js error when sorting the table. > --- > > Key: SPARK-29488 > URL: https://issues.apache.org/jira/browse/SPARK-29488 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.2, 2.4.4 >Reporter: jenny >Assignee: jenny >Priority: Minor > Fix For: 3.0.0 > > Attachments: image-2019-10-16-15-47-25-212.png > > > In the Web UI, following the steps below produces the js error "Uncaught TypeError: Failed > to execute 'removeChild' on 'Node': parameter 1 is not of type 'Node'.". > # Click the "Summary Metrics..." table header "Min" > # Click the "Aggregated Metrics by Executor" table header "Task Time" > # Click the "Summary Metrics..." table header "Min" (the same as step 1) > !image-2019-10-16-15-47-25-212.png! > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29488) In Web UI, stage page has a js error when sorting the table.
[ https://issues.apache.org/jira/browse/SPARK-29488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957470#comment-16957470 ] jenny commented on SPARK-29488: --- Thank you https://issues.apache.org/jira/secure/ViewProfile.jspa?name=srowen for review. > In Web UI, stage page has a js error when sorting the table. > --- > > Key: SPARK-29488 > URL: https://issues.apache.org/jira/browse/SPARK-29488 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.2, 2.4.4 >Reporter: jenny >Assignee: jenny >Priority: Minor > Fix For: 3.0.0 > > Attachments: image-2019-10-16-15-47-25-212.png > > > In the Web UI, following the steps below produces the js error "Uncaught TypeError: Failed > to execute 'removeChild' on 'Node': parameter 1 is not of type 'Node'.". > # Click the "Summary Metrics..." table header "Min" > # Click the "Aggregated Metrics by Executor" table header "Task Time" > # Click the "Summary Metrics..." table header "Min" (the same as step 1) > !image-2019-10-16-15-47-25-212.png! > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29563) CREATE TABLE LIKE should look up catalog/table like v2 commands
Dilip Biswal created SPARK-29563: Summary: CREATE TABLE LIKE should look up catalog/table like v2 commands Key: SPARK-29563 URL: https://issues.apache.org/jira/browse/SPARK-29563 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Dilip Biswal Assignee: Wenchen Fan -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29562) SQLAppStatusListener metrics aggregation is slow and memory hungry
Marcelo Masiero Vanzin created SPARK-29562: -- Summary: SQLAppStatusListener metrics aggregation is slow and memory hungry Key: SPARK-29562 URL: https://issues.apache.org/jira/browse/SPARK-29562 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.4 Reporter: Marcelo Masiero Vanzin While {{SQLAppStatusListener}} was added in 2.3, the aggregation code is very similar to what it was previously, so I'm sure this is even older. Long story short, the aggregation code ({{SQLAppStatusListener.aggregateMetrics}}) is very, very slow, and can take a non-trivial amount of time with large queries, aside from using a ton of memory. There are also cascading issues caused by that: since it's called from an event handler, it can slow down event processing, causing events to be dropped, which can cause listeners to miss important events that would tell them to free up internal state (and, thus, memory). To give an anecdotal example, one app I looked at ran into the "events being dropped" issue, which caused the listener to accumulate state for 100s of live stages, even though most of them were already finished. That led to a few GB of memory being wasted due to finished stages that were still being tracked. Here, though, I'd like to focus on {{SQLAppStatusListener.aggregateMetrics}} and making it faster. We should look at the other issues (unblocking event processing, cleaning up of stale data in listeners) separately. (I also remember someone in the past trying to fix something in this area, but couldn't find a PR or an open bug.) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29561) Large Case Statement Code Generation OOM
[ https://issues.apache.org/jira/browse/SPARK-29561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Chen updated SPARK-29561: - Description: Spark Configuration spark.driver.memory = 1g spark.master = "local" spark.deploy.mode = "client" Try to execute a case statement with 3000+ branches. Added sql statement as attachment Spark runs for a while before it OOM {noformat} java.lang.OutOfMemoryError: GC overhead limit exceeded at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:182) at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320) at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:178) at org.apache.spark.ContextCleaner$$anon$1.run(ContextCleaner.scala:73) 19/10/22 16:19:54 ERROR FileFormatWriter: Aborting job null. java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.HashMap.newNode(HashMap.java:1750) at java.util.HashMap.putVal(HashMap.java:631) at java.util.HashMap.putMapEntries(HashMap.java:515) at java.util.HashMap.putAll(HashMap.java:785) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3345) at org.codehaus.janino.UnitCompiler.access$5000(UnitCompiler.java:212) at org.codehaus.janino.UnitCompiler$8.visitLocalVariableDeclarationStatement(UnitCompiler.java:3230) at org.codehaus.janino.UnitCompiler$8.visitLocalVariableDeclarationStatement(UnitCompiler.java:3198) at org.codehaus.janino.Java$LocalVariableDeclarationStatement.accept(Java.java:3351) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3197) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3254) at org.codehaus.janino.UnitCompiler.access$3900(UnitCompiler.java:212) at org.codehaus.janino.UnitCompiler$8.visitBlock(UnitCompiler.java:3216) at org.codehaus.janino.UnitCompiler$8.visitBlock(UnitCompiler.java:3198) at org.codehaus.janino.Java$Block.accept(Java.java:2756) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3197) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3260) at org.codehaus.janino.UnitCompiler.access$4000(UnitCompiler.java:212) at org.codehaus.janino.UnitCompiler$8.visitDoStatement(UnitCompiler.java:3217) at org.codehaus.janino.UnitCompiler$8.visitDoStatement(UnitCompiler.java:3198) at org.codehaus.janino.Java$DoStatement.accept(Java.java:3304) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3197) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3186) at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:3009) at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1336) at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1309) at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:799) at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:958) at org.codehaus.janino.UnitCompiler.access$700(UnitCompiler.java:212) at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:393) at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:385) at org.codehaus.janino.Java$MemberClassDeclaration.accept(Java.java:1286) 19/10/22 16:19:54 ERROR Utils: throw uncaught fatal error in thread Spark Context Cleaner java.lang.OutOfMemoryError: GC overhead limit exceeded at 
org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:182) at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320) at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:178) at org.apache.spark.ContextCleaner$$anon$1.run(ContextCleaner.scala:73){noformat} Generated code looks like {noformat} /* 029 */ private void project_doConsume(InternalRow scan_row, UTF8String project_expr_0, boolean project_exprIsNull_0) throws java.io.IOException { /* 030 */ byte project_caseWhenResultState = -1; /* 031 */ do { /* 032 */ boolean project_isNull1 = true; /* 033 */ boolean project_value1 = false; /* 034 */ /* 035 */ boolean project_isNull2 = project_exprIsNull_0; /* 036 */ int project_value2 = -1; /* 037 */ if (!project_exprIsNull_0) { /* 038 */ UTF8String.IntWrapper project_intWrapper = new UTF8String.IntWrapper(); /* 039 */ if (project_expr_0.toInt(project_intWrapper)) { /* 040 */ project_value2 = project_intWrappe
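The attached SQL file is not inlined in these mails, but the failure shape is easy to approximate: a single projection whose CASE expression has thousands of branches, which makes the generated Java blow up during Janino compilation. A minimal, hypothetical reproduction sketch (not the reporter's exact statement):
{code:scala}
// Build a CASE WHEN with ~3000 branches and run it under a small driver heap.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local").getOrCreate()
spark.range(10).toDF("c").createOrReplaceTempView("t")

val branches = (1 to 3000).map(i => s"WHEN c = $i THEN 'v$i'").mkString(" ")
spark.sql(s"SELECT CASE $branches ELSE 'other' END FROM t").collect()
{code}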
[jira] [Updated] (SPARK-29560) Add typesafe bintray repo for sbt-mima-plugin
[ https://issues.apache.org/jira/browse/SPARK-29560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29560: -- Summary: Add typesafe bintray repo for sbt-mima-plugin (was: sbt-mima-plugin is missing) > Add typesafe bintray repo for sbt-mima-plugin > - > > Key: SPARK-29560 > URL: https://issues.apache.org/jira/browse/SPARK-29560 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.4, 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Blocker > Fix For: 2.4.5, 3.0.0 > > > GitHub Action detects the following from yesterday (Oct 21, 2019). > - `branch-2.4`: `sbt-mima-plugin:0.1.17` is missing. > - `master`: `sbt-mima-plugin:0.3.0` is missing. > These versions of `sbt-mima-plugin` seems to be removed from the old repo. We > need to change the repo location or upgrade this. > {code} > ~/A/spark-merge:branch-2.4$ rm -rf ~/.ivy2/ > ~/A/spark-merge:branch-2.4$ build/sbt scalastyle test:scalastyle > Using /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home as > default JAVA_HOME. > Note, this will be overridden by -java-home if it is set. > Attempting to fetch sbt > Launching sbt from build/sbt-launch-0.13.17.jar > [info] Loading project definition from > /Users/dongjoon/APACHE/spark-merge/project > [info] Updating > {file:/Users/dongjoon/APACHE/spark-merge/project/}spark-merge-build... > [info] Resolving com.typesafe#sbt-mima-plugin;0.1.17 ... > [warn]module not found: com.typesafe#sbt-mima-plugin;0.1.17 > [warn] typesafe-ivy-releases: tried > [warn] > https://repo.typesafe.com/typesafe/ivy-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] sbt-plugin-releases: tried > [warn] > https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] local: tried > [warn] > /Users/dongjoon/.ivy2/local/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] public: tried > [warn] > https://repo1.maven.org/maven2/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom > [warn] local-preloaded-ivy: tried > [warn] > /Users/dongjoon/.sbt/preloaded/com.typesafe/sbt-mima-plugin/0.1.17/ivys/ivy.xml > [warn] local-preloaded: tried > [warn] > file:Users/dongjoon/.sbt/preloaded/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom > ... > [warn]:: > [warn]:: UNRESOLVED DEPENDENCIES :: > [warn]:: > [warn]:: com.typesafe#sbt-mima-plugin;0.1.17: not found > [warn]:: > [warn] > [warn]Note: Some unresolved dependencies have extra attributes. > Check that these dependencies exist with the requested attributes. > [warn]com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, > sbtVersion=0.13) > [warn] > [warn]Note: Unresolved dependencies path: > [warn]com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, > sbtVersion=0.13) > (/Users/dongjoon/APACHE/spark-merge/project/plugins.sbt#L18-19) > [warn] +- default:spark-merge-build:0.1-SNAPSHOT > (scalaVersion=2.10, sbtVersion=0.13) > sbt.ResolveException: unresolved dependency: > com.typesafe#sbt-mima-plugin;0.1.17: not found > {code} > This breaks our Jenkins in `branch-2.4` now. > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.4-test-sbt-hadoop-2.6/611/console -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
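For reference, the shape of the fix named in the new summary is a resolver addition in project/plugins.sbt pointing at typesafe's bintray ivy repository. A sketch, assuming the URL below matches what PR 26217 actually adds (check the PR for the exact coordinates):
{code:scala}
// project/plugins.sbt (sbt build definitions are Scala)
resolvers += Resolver.url(
  "bintray-typesafe-sbt-plugin-releases",
  url("https://dl.bintray.com/typesafe/sbt-plugins/"))(Resolver.ivyStylePatterns)

addSbtPlugin("com.typesafe" % "sbt-mima-plugin" % "0.1.17")
{code}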
[jira] [Assigned] (SPARK-29560) sbt-mima-plugin is missing
[ https://issues.apache.org/jira/browse/SPARK-29560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-29560: - Assignee: Dongjoon Hyun > sbt-mima-plugin is missing > -- > > Key: SPARK-29560 > URL: https://issues.apache.org/jira/browse/SPARK-29560 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.4, 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Blocker > > GitHub Action detects the following from yesterday (Oct 21, 2019). > - `branch-2.4`: `sbt-mima-plugin:0.1.17` is missing. > - `master`: `sbt-mima-plugin:0.3.0` is missing. > These versions of `sbt-mima-plugin` seems to be removed from the old repo. We > need to change the repo location or upgrade this. > {code} > ~/A/spark-merge:branch-2.4$ rm -rf ~/.ivy2/ > ~/A/spark-merge:branch-2.4$ build/sbt scalastyle test:scalastyle > Using /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home as > default JAVA_HOME. > Note, this will be overridden by -java-home if it is set. > Attempting to fetch sbt > Launching sbt from build/sbt-launch-0.13.17.jar > [info] Loading project definition from > /Users/dongjoon/APACHE/spark-merge/project > [info] Updating > {file:/Users/dongjoon/APACHE/spark-merge/project/}spark-merge-build... > [info] Resolving com.typesafe#sbt-mima-plugin;0.1.17 ... > [warn]module not found: com.typesafe#sbt-mima-plugin;0.1.17 > [warn] typesafe-ivy-releases: tried > [warn] > https://repo.typesafe.com/typesafe/ivy-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] sbt-plugin-releases: tried > [warn] > https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] local: tried > [warn] > /Users/dongjoon/.ivy2/local/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] public: tried > [warn] > https://repo1.maven.org/maven2/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom > [warn] local-preloaded-ivy: tried > [warn] > /Users/dongjoon/.sbt/preloaded/com.typesafe/sbt-mima-plugin/0.1.17/ivys/ivy.xml > [warn] local-preloaded: tried > [warn] > file:Users/dongjoon/.sbt/preloaded/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom > ... > [warn]:: > [warn]:: UNRESOLVED DEPENDENCIES :: > [warn]:: > [warn]:: com.typesafe#sbt-mima-plugin;0.1.17: not found > [warn]:: > [warn] > [warn]Note: Some unresolved dependencies have extra attributes. > Check that these dependencies exist with the requested attributes. > [warn]com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, > sbtVersion=0.13) > [warn] > [warn]Note: Unresolved dependencies path: > [warn]com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, > sbtVersion=0.13) > (/Users/dongjoon/APACHE/spark-merge/project/plugins.sbt#L18-19) > [warn] +- default:spark-merge-build:0.1-SNAPSHOT > (scalaVersion=2.10, sbtVersion=0.13) > sbt.ResolveException: unresolved dependency: > com.typesafe#sbt-mima-plugin;0.1.17: not found > {code} > This breaks our Jenkins in `branch-2.4` now. > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.4-test-sbt-hadoop-2.6/611/console -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29560) sbt-mima-plugin is missing
[ https://issues.apache.org/jira/browse/SPARK-29560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-29560. --- Fix Version/s: 3.0.0 2.4.5 Resolution: Fixed Issue resolved by pull request 26217 [https://github.com/apache/spark/pull/26217] > sbt-mima-plugin is missing > -- > > Key: SPARK-29560 > URL: https://issues.apache.org/jira/browse/SPARK-29560 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.4, 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Blocker > Fix For: 2.4.5, 3.0.0 > > > GitHub Action detects the following from yesterday (Oct 21, 2019). > - `branch-2.4`: `sbt-mima-plugin:0.1.17` is missing. > - `master`: `sbt-mima-plugin:0.3.0` is missing. > These versions of `sbt-mima-plugin` seems to be removed from the old repo. We > need to change the repo location or upgrade this. > {code} > ~/A/spark-merge:branch-2.4$ rm -rf ~/.ivy2/ > ~/A/spark-merge:branch-2.4$ build/sbt scalastyle test:scalastyle > Using /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home as > default JAVA_HOME. > Note, this will be overridden by -java-home if it is set. > Attempting to fetch sbt > Launching sbt from build/sbt-launch-0.13.17.jar > [info] Loading project definition from > /Users/dongjoon/APACHE/spark-merge/project > [info] Updating > {file:/Users/dongjoon/APACHE/spark-merge/project/}spark-merge-build... > [info] Resolving com.typesafe#sbt-mima-plugin;0.1.17 ... > [warn]module not found: com.typesafe#sbt-mima-plugin;0.1.17 > [warn] typesafe-ivy-releases: tried > [warn] > https://repo.typesafe.com/typesafe/ivy-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] sbt-plugin-releases: tried > [warn] > https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] local: tried > [warn] > /Users/dongjoon/.ivy2/local/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] public: tried > [warn] > https://repo1.maven.org/maven2/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom > [warn] local-preloaded-ivy: tried > [warn] > /Users/dongjoon/.sbt/preloaded/com.typesafe/sbt-mima-plugin/0.1.17/ivys/ivy.xml > [warn] local-preloaded: tried > [warn] > file:Users/dongjoon/.sbt/preloaded/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom > ... > [warn]:: > [warn]:: UNRESOLVED DEPENDENCIES :: > [warn]:: > [warn]:: com.typesafe#sbt-mima-plugin;0.1.17: not found > [warn]:: > [warn] > [warn]Note: Some unresolved dependencies have extra attributes. > Check that these dependencies exist with the requested attributes. > [warn]com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, > sbtVersion=0.13) > [warn] > [warn]Note: Unresolved dependencies path: > [warn]com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, > sbtVersion=0.13) > (/Users/dongjoon/APACHE/spark-merge/project/plugins.sbt#L18-19) > [warn] +- default:spark-merge-build:0.1-SNAPSHOT > (scalaVersion=2.10, sbtVersion=0.13) > sbt.ResolveException: unresolved dependency: > com.typesafe#sbt-mima-plugin;0.1.17: not found > {code} > This breaks our Jenkins in `branch-2.4` now. > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.4-test-sbt-hadoop-2.6/611/console -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29561) Large Case Statement Code Generation OOM
[ https://issues.apache.org/jira/browse/SPARK-29561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Chen updated SPARK-29561: - Attachment: apacheSparkCase.sql > Large Case Statement Code Generation OOM > > > Key: SPARK-29561 > URL: https://issues.apache.org/jira/browse/SPARK-29561 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Michael Chen >Priority: Major > Attachments: apacheSparkCase.sql > > > Spark Configuration > spark.driver.memory = 1g > spark.master = "local" > spark.deploy.mode = "client" > Try to execute a case statement with 3000+ branches. > Spark runs for a while before it OOM > {noformat} > java.lang.OutOfMemoryError: GC overhead limit exceeded > at > org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:182) > at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320) > at > org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:178) > at org.apache.spark.ContextCleaner$$anon$1.run(ContextCleaner.scala:73) > 19/10/22 16:19:54 ERROR FileFormatWriter: Aborting job null. > java.lang.OutOfMemoryError: GC overhead limit exceeded > at java.util.HashMap.newNode(HashMap.java:1750) > at java.util.HashMap.putVal(HashMap.java:631) > at java.util.HashMap.putMapEntries(HashMap.java:515) > at java.util.HashMap.putAll(HashMap.java:785) > at > org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3345) > at org.codehaus.janino.UnitCompiler.access$5000(UnitCompiler.java:212) > at > org.codehaus.janino.UnitCompiler$8.visitLocalVariableDeclarationStatement(UnitCompiler.java:3230) > at > org.codehaus.janino.UnitCompiler$8.visitLocalVariableDeclarationStatement(UnitCompiler.java:3198) > at > org.codehaus.janino.Java$LocalVariableDeclarationStatement.accept(Java.java:3351) > at > org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3197) > at > org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3254) > at org.codehaus.janino.UnitCompiler.access$3900(UnitCompiler.java:212) > at org.codehaus.janino.UnitCompiler$8.visitBlock(UnitCompiler.java:3216) > at org.codehaus.janino.UnitCompiler$8.visitBlock(UnitCompiler.java:3198) > at org.codehaus.janino.Java$Block.accept(Java.java:2756) > at > org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3197) > at > org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3260) > at org.codehaus.janino.UnitCompiler.access$4000(UnitCompiler.java:212) > at > org.codehaus.janino.UnitCompiler$8.visitDoStatement(UnitCompiler.java:3217) > at > org.codehaus.janino.UnitCompiler$8.visitDoStatement(UnitCompiler.java:3198) > at org.codehaus.janino.Java$DoStatement.accept(Java.java:3304) > at > org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3197) > at > org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3186) > at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:3009) > at > org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1336) > at > org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1309) > at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:799) > at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:958) > at org.codehaus.janino.UnitCompiler.access$700(UnitCompiler.java:212) > at > 
org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:393) > at > org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:385) > at > org.codehaus.janino.Java$MemberClassDeclaration.accept(Java.java:1286) > 19/10/22 16:19:54 ERROR Utils: throw uncaught fatal error in thread Spark > Context Cleaner > java.lang.OutOfMemoryError: GC overhead limit exceeded > at > org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:182) > at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320) > at > org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:178) > at > org.apache.spark.ContextCleaner$$anon$1.run(ContextCleaner.scala:73){noformat} > Generated code looks like > {noformat} > /* 029 */ private void project_doConsume(InternalRow scan_row, UTF8String > project_expr_0, boolean project_exprIsNull_0) throws java.io.IOException { > /* 030 */ byte project_caseWhenResultState = -1; > /*
[jira] [Updated] (SPARK-29561) Large Case Statement Code Generation OOM
[ https://issues.apache.org/jira/browse/SPARK-29561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Chen updated SPARK-29561: - Description: Spark Configuration spark.driver.memory = 1g spark.master = "local" spark.deploy.mode = "client" Try to execute a case statement with 3000+ branches. Added sql statement as attachment Spark runs for a while before it OOM {noformat} java.lang.OutOfMemoryError: GC overhead limit exceeded at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:182) at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320) at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:178) at org.apache.spark.ContextCleaner$$anon$1.run(ContextCleaner.scala:73) 19/10/22 16:19:54 ERROR FileFormatWriter: Aborting job null. java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.HashMap.newNode(HashMap.java:1750) at java.util.HashMap.putVal(HashMap.java:631) at java.util.HashMap.putMapEntries(HashMap.java:515) at java.util.HashMap.putAll(HashMap.java:785) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3345) at org.codehaus.janino.UnitCompiler.access$5000(UnitCompiler.java:212) at org.codehaus.janino.UnitCompiler$8.visitLocalVariableDeclarationStatement(UnitCompiler.java:3230) at org.codehaus.janino.UnitCompiler$8.visitLocalVariableDeclarationStatement(UnitCompiler.java:3198) at org.codehaus.janino.Java$LocalVariableDeclarationStatement.accept(Java.java:3351) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3197) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3254) at org.codehaus.janino.UnitCompiler.access$3900(UnitCompiler.java:212) at org.codehaus.janino.UnitCompiler$8.visitBlock(UnitCompiler.java:3216) at org.codehaus.janino.UnitCompiler$8.visitBlock(UnitCompiler.java:3198) at org.codehaus.janino.Java$Block.accept(Java.java:2756) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3197) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3260) at org.codehaus.janino.UnitCompiler.access$4000(UnitCompiler.java:212) at org.codehaus.janino.UnitCompiler$8.visitDoStatement(UnitCompiler.java:3217) at org.codehaus.janino.UnitCompiler$8.visitDoStatement(UnitCompiler.java:3198) at org.codehaus.janino.Java$DoStatement.accept(Java.java:3304) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3197) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3186) at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:3009) at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1336) at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1309) at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:799) at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:958) at org.codehaus.janino.UnitCompiler.access$700(UnitCompiler.java:212) at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:393) at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:385) at org.codehaus.janino.Java$MemberClassDeclaration.accept(Java.java:1286) 19/10/22 16:19:54 ERROR Utils: throw uncaught fatal error in thread Spark Context Cleaner java.lang.OutOfMemoryError: GC overhead limit exceeded at 
org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:182) at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320) at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:178) at org.apache.spark.ContextCleaner$$anon$1.run(ContextCleaner.scala:73){noformat} Generated code looks like {noformat} /* 029 */ private void project_doConsume(InternalRow scan_row, UTF8String project_expr_0, boolean project_exprIsNull_0) throws java.io.IOException { /* 030 */ byte project_caseWhenResultState = -1; /* 031 */ do { /* 032 */ boolean project_isNull1 = true; /* 033 */ boolean project_value1 = false; /* 034 */ /* 035 */ boolean project_isNull2 = project_exprIsNull_0; /* 036 */ int project_value2 = -1; /* 037 */ if (!project_exprIsNull_0) { /* 038 */ UTF8String.IntWrapper project_intWrapper = new UTF8String.IntWrapper(); /* 039 */ if (project_expr_0.toInt(project_intWrapper)) { /* 040 */ project_value2 = project_intWrappe
[jira] [Commented] (SPARK-29106) Add jenkins arm test for spark
[ https://issues.apache.org/jira/browse/SPARK-29106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957421#comment-16957421 ] zhao bo commented on SPARK-29106: - Shane, thanks for rechecking the #9 build failure. Let's see whether the issue can be reproduced in #10; if it still happens, we should fix it. Thank you :) > Add jenkins arm test for spark > -- > > Key: SPARK-29106 > URL: https://issues.apache.org/jira/browse/SPARK-29106 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 3.0.0 >Reporter: huangtianhua >Priority: Minor > > Add arm test jobs to amplab jenkins for spark. > Till now we have made two periodic arm test jobs for spark in OpenLab, one is > based on master with hadoop 2.7 (similar to the QA test of amplab jenkins), > the other is based on a new branch which we made on 09-09, see > [http://status.openlabtesting.org/builds/job/spark-master-unit-test-hadoop-2.7-arm64] > and > [http://status.openlabtesting.org/builds/job/spark-unchanged-branch-unit-test-hadoop-2.7-arm64.|http://status.openlabtesting.org/builds/job/spark-unchanged-branch-unit-test-hadoop-2.7-arm64] > We only have to care about the first one when integrating the arm test with amplab > jenkins. > About the k8s test on arm, we have tested it, see > [https://github.com/theopenlab/spark/pull/17], maybe we can integrate it > later. > And we plan to test other stable branches too, and we can integrate them to > amplab when they are ready. > We have offered an arm instance and sent the info to shane knapp; thanks > shane for adding the first arm job to amplab jenkins :) > The other important thing is about the leveldbjni > [https://github.com/fusesource/leveldbjni,|https://github.com/fusesource/leveldbjni/issues/80] > spark depends on leveldbjni-all-1.8 > [https://mvnrepository.com/artifact/org.fusesource.leveldbjni/leveldbjni-all/1.8], > and we can see there is no arm64 support. So we built an arm64-supporting > release of leveldbjni, see > [https://mvnrepository.com/artifact/org.openlabtesting.leveldbjni/leveldbjni-all/1.8], > but we can't modify the spark pom.xml directly with something like a > 'property'/'profile' to choose the correct jar package on arm or x86 platforms, > because spark depends on some hadoop packages like hadoop-hdfs, and those packages > depend on leveldbjni-all-1.8 too, unless hadoop releases a new arm-supporting > leveldbjni jar. For now we download the leveldbjni-all-1.8 of > openlabtesting and 'mvn install' it when testing spark on arm. > PS: The issues found and fixed: > SPARK-28770 > [https://github.com/apache/spark/pull/25673] > > SPARK-28519 > [https://github.com/apache/spark/pull/25279] > > SPARK-28433 > [https://github.com/apache/spark/pull/25186] > > SPARK-28467 > [https://github.com/apache/spark/pull/25864] > > SPARK-29286 > [https://github.com/apache/spark/pull/26021] > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29561) Large Case Statement Code Generation OOM
Michael Chen created SPARK-29561: Summary: Large Case Statement Code Generation OOM Key: SPARK-29561 URL: https://issues.apache.org/jira/browse/SPARK-29561 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.3.0 Reporter: Michael Chen Spark Configuration spark.driver.memory = 1g spark.master = "local" spark.deploy.mode = "client" Try to execute a case statement with 3000+ branches. Spark runs for a while before it OOM {noformat} java.lang.OutOfMemoryError: GC overhead limit exceeded at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:182) at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320) at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:178) at org.apache.spark.ContextCleaner$$anon$1.run(ContextCleaner.scala:73) 19/10/22 16:19:54 ERROR FileFormatWriter: Aborting job null. java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.HashMap.newNode(HashMap.java:1750) at java.util.HashMap.putVal(HashMap.java:631) at java.util.HashMap.putMapEntries(HashMap.java:515) at java.util.HashMap.putAll(HashMap.java:785) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3345) at org.codehaus.janino.UnitCompiler.access$5000(UnitCompiler.java:212) at org.codehaus.janino.UnitCompiler$8.visitLocalVariableDeclarationStatement(UnitCompiler.java:3230) at org.codehaus.janino.UnitCompiler$8.visitLocalVariableDeclarationStatement(UnitCompiler.java:3198) at org.codehaus.janino.Java$LocalVariableDeclarationStatement.accept(Java.java:3351) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3197) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3254) at org.codehaus.janino.UnitCompiler.access$3900(UnitCompiler.java:212) at org.codehaus.janino.UnitCompiler$8.visitBlock(UnitCompiler.java:3216) at org.codehaus.janino.UnitCompiler$8.visitBlock(UnitCompiler.java:3198) at org.codehaus.janino.Java$Block.accept(Java.java:2756) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3197) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3260) at org.codehaus.janino.UnitCompiler.access$4000(UnitCompiler.java:212) at org.codehaus.janino.UnitCompiler$8.visitDoStatement(UnitCompiler.java:3217) at org.codehaus.janino.UnitCompiler$8.visitDoStatement(UnitCompiler.java:3198) at org.codehaus.janino.Java$DoStatement.accept(Java.java:3304) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3197) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3186) at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:3009) at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1336) at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1309) at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:799) at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:958) at org.codehaus.janino.UnitCompiler.access$700(UnitCompiler.java:212) at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:393) at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:385) at org.codehaus.janino.Java$MemberClassDeclaration.accept(Java.java:1286) 19/10/22 16:19:54 ERROR Utils: throw uncaught fatal error in thread Spark Context Cleaner java.lang.OutOfMemoryError: GC overhead limit exceeded at 
org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:182) at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320) at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:178) at org.apache.spark.ContextCleaner$$anon$1.run(ContextCleaner.scala:73){noformat} Generated code looks like {noformat} /* 029 */ private void project_doConsume(InternalRow scan_row, UTF8String project_expr_0, boolean project_exprIsNull_0) throws java.io.IOException { /* 030 */ byte project_caseWhenResultState = -1; /* 031 */ do { /* 032 */ boolean project_isNull1 = true; /* 033 */ boolean project_value1 = false; /* 034 */ /* 035 */ boolean project_isNull2 = project_exprIsNull_0; /* 036 */ int project_value2 = -1; /* 037 */ if (!project_exprIsNull_0) { /* 038 */ UTF8String.IntWrapper project_intWrapper = new UTF8Stri
[jira] [Updated] (SPARK-29561) Large Case Statement Code Generation OOM
[ https://issues.apache.org/jira/browse/SPARK-29561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Chen updated SPARK-29561: - Description: Spark Configuration spark.driver.memory = 1g spark.master = "local" spark.deploy.mode = "client" Try to execute a case statement with 3000+ branches. Spark runs for a while before it OOM {noformat} java.lang.OutOfMemoryError: GC overhead limit exceeded at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:182) at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320) at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:178) at org.apache.spark.ContextCleaner$$anon$1.run(ContextCleaner.scala:73) 19/10/22 16:19:54 ERROR FileFormatWriter: Aborting job null. java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.HashMap.newNode(HashMap.java:1750) at java.util.HashMap.putVal(HashMap.java:631) at java.util.HashMap.putMapEntries(HashMap.java:515) at java.util.HashMap.putAll(HashMap.java:785) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3345) at org.codehaus.janino.UnitCompiler.access$5000(UnitCompiler.java:212) at org.codehaus.janino.UnitCompiler$8.visitLocalVariableDeclarationStatement(UnitCompiler.java:3230) at org.codehaus.janino.UnitCompiler$8.visitLocalVariableDeclarationStatement(UnitCompiler.java:3198) at org.codehaus.janino.Java$LocalVariableDeclarationStatement.accept(Java.java:3351) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3197) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3254) at org.codehaus.janino.UnitCompiler.access$3900(UnitCompiler.java:212) at org.codehaus.janino.UnitCompiler$8.visitBlock(UnitCompiler.java:3216) at org.codehaus.janino.UnitCompiler$8.visitBlock(UnitCompiler.java:3198) at org.codehaus.janino.Java$Block.accept(Java.java:2756) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3197) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3260) at org.codehaus.janino.UnitCompiler.access$4000(UnitCompiler.java:212) at org.codehaus.janino.UnitCompiler$8.visitDoStatement(UnitCompiler.java:3217) at org.codehaus.janino.UnitCompiler$8.visitDoStatement(UnitCompiler.java:3198) at org.codehaus.janino.Java$DoStatement.accept(Java.java:3304) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3197) at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3186) at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:3009) at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1336) at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1309) at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:799) at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:958) at org.codehaus.janino.UnitCompiler.access$700(UnitCompiler.java:212) at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:393) at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:385) at org.codehaus.janino.Java$MemberClassDeclaration.accept(Java.java:1286) 19/10/22 16:19:54 ERROR Utils: throw uncaught fatal error in thread Spark Context Cleaner java.lang.OutOfMemoryError: GC overhead limit exceeded at 
org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:182) at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320) at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:178) at org.apache.spark.ContextCleaner$$anon$1.run(ContextCleaner.scala:73){noformat} Generated code looks like {noformat} /* 029 */ private void project_doConsume(InternalRow scan_row, UTF8String project_expr_0, boolean project_exprIsNull_0) throws java.io.IOException { /* 030 */ byte project_caseWhenResultState = -1; /* 031 */ do { /* 032 */ boolean project_isNull1 = true; /* 033 */ boolean project_value1 = false; /* 034 */ /* 035 */ boolean project_isNull2 = project_exprIsNull_0; /* 036 */ int project_value2 = -1; /* 037 */ if (!project_exprIsNull_0) { /* 038 */ UTF8String.IntWrapper project_intWrapper = new UTF8String.IntWrapper(); /* 039 */ if (project_expr_0.toInt(project_intWrapper)) { /* 040 */ project_value2 = project_intWrapper.value; /* 041 */ } else {
[jira] [Updated] (SPARK-29542) [SQL][DOC] The descriptions of `spark.sql.files.*` are confusing.
[ https://issues.apache.org/jira/browse/SPARK-29542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] feiwang updated SPARK-29542: Description: Hi, the description of `spark.sql.files.maxPartitionBytes` is shown below. {code:java} The maximum number of bytes to pack into a single partition when reading files. {code} It seems to mean that each partition will process at most that many bytes in spark sql. As shown in the attachment, the value of spark.sql.files.maxPartitionBytes is 128MB. For stage 1, its input is 16.3TB, but there are only 6400 tasks. I checked the code; it is only effective for data source tables. So, its description is confusing. The same applies to all the descriptions of `spark.sql.files.*`. was: Hi, the description of `spark.sql.files.maxPartitionBytes` is shown below. {code:java} The maximum number of bytes to pack into a single partition when reading files. {code} It seems to mean that each partition will process at most that many bytes in spark sql. As shown in the attachment, the value of spark.sql.files.maxPartitionBytes is 128MB. For stage 1, its input is 16.3TB, but there are only 6400 tasks. I checked the code; it is only effective for data source tables. So, its description is confusing. > [SQL][DOC] The descriptions of `spark.sql.files.*` are confusing. > > > Key: SPARK-29542 > URL: https://issues.apache.org/jira/browse/SPARK-29542 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 2.4.4 >Reporter: feiwang >Priority: Minor > Attachments: screenshot-1.png > > > Hi, the description of `spark.sql.files.maxPartitionBytes` is shown below. > {code:java} > The maximum number of bytes to pack into a single partition when reading > files. > {code} > It seems to mean that each partition will process at most that many bytes in > spark sql. > As shown in the attachment, the value of spark.sql.files.maxPartitionBytes > is 128MB. > For stage 1, its input is 16.3TB, but there are only 6400 tasks. > I checked the code; it is only effective for data source tables. > So, its description is confusing. > The same applies to all the descriptions of `spark.sql.files.*`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
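A short sketch of the behavior the reporter describes, with a hypothetical parquet path: per the report, the setting caps split sizes only for file-based data source tables, so other table types (such as the one in the screenshot) can exceed it.
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .config("spark.sql.files.maxPartitionBytes", "134217728") // 128MB
  .getOrCreate()

// For a file-based data source table, partition count roughly tracks
// total input size / maxPartitionBytes; /data/events is hypothetical.
val df = spark.read.parquet("/data/events")
println(df.rdd.getNumPartitions)
{code}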
[jira] [Commented] (SPARK-23897) Guava version
[ https://issues.apache.org/jira/browse/SPARK-23897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957387#comment-16957387 ] Tak-Lon (Stephen) Wu commented on SPARK-23897: -- hadoop-3.2.x has included [HADOOP-16213|https://github.com/apache/hadoop/commit/e0b3cbd221c1e611660b364a64d1aec52b10bc4e], which upgraded guava to 27.0-jre; will spark include the change in a new profile, e.g. hadoop-3.2? > Guava version > - > > Key: SPARK-23897 > URL: https://issues.apache.org/jira/browse/SPARK-23897 > Project: Spark > Issue Type: Dependency upgrade > Components: Spark Core >Affects Versions: 2.3.0 >Reporter: Sercan Karaoglu >Priority: Minor > > Guava dependency version 14 is pretty old and needs to be updated to at least > 16. The google cloud storage connector uses a newer one, which causes a pretty > common error with guava: "java.lang.NoSuchMethodError: > com.google.common.base.Splitter.splitToList(Ljava/lang/CharSequence;)Ljava/util/List;", > and crashes the app. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
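The crash is easy to picture: Splitter.splitToList appeared in Guava 15, so code compiled against a newer Guava fails at runtime when Spark's Guava 14 wins on the class path. A minimal sketch of the failing call:
{code:scala}
// Throws java.lang.NoSuchMethodError when Guava 14 is on the class path,
// because splitToList only exists from Guava 15 onwards.
import com.google.common.base.Splitter

val parts = Splitter.on(',').splitToList("a,b,c")
println(parts)
{code}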
[jira] [Updated] (SPARK-29542) [SQL][DOC] The descriptions of `spark.sql.files.*` are confusing.
[ https://issues.apache.org/jira/browse/SPARK-29542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] feiwang updated SPARK-29542: Summary: [SQL][DOC] The descriptions of `spark.sql.files.*` are confusing. (was: [DOC] The description of `spark.sql.files.maxPartitionBytes` is confusing.) > [SQL][DOC] The descriptions of `spark.sql.files.*` are confusing. > > > Key: SPARK-29542 > URL: https://issues.apache.org/jira/browse/SPARK-29542 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 2.4.4 >Reporter: feiwang >Priority: Minor > Attachments: screenshot-1.png > > > Hi, the description of `spark.sql.files.maxPartitionBytes` is shown below. > {code:java} > The maximum number of bytes to pack into a single partition when reading > files. > {code} > It seems to mean that each partition will process at most that many bytes in > spark sql. > As shown in the attachment, the value of spark.sql.files.maxPartitionBytes > is 128MB. > For stage 1, its input is 16.3TB, but there are only 6400 tasks. > I checked the code; it is only effective for data source tables. > So, its description is confusing. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29539) SHOW PARTITIONS should look up catalog/table like v2 commands
[ https://issues.apache.org/jira/browse/SPARK-29539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh resolved SPARK-29539. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26198 [https://github.com/apache/spark/pull/26198] > SHOW PARTITIONS should look up catalog/table like v2 commands > - > > Key: SPARK-29539 > URL: https://issues.apache.org/jira/browse/SPARK-29539 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Major > Fix For: 3.0.0 > > > SHOW PARTITIONS should look up catalog/table like v2 commands -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-29539) SHOW PARTITIONS should look up catalog/table like v2 commands
[ https://issues.apache.org/jira/browse/SPARK-29539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh reassigned SPARK-29539: --- Assignee: Huaxin Gao > SHOW PARTITIONS should look up catalog/table like v2 commands > - > > Key: SPARK-29539 > URL: https://issues.apache.org/jira/browse/SPARK-29539 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Major > > SHOW PARTITIONS should look up catalog/table like v2 commands -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28859) Remove value check of MEMORY_OFFHEAP_SIZE in declaration section
[ https://issues.apache.org/jira/browse/SPARK-28859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957365#comment-16957365 ] Yifan Xing commented on SPARK-28859: Created a pr: [https://github.pie.apple.com/pie/apache-spark/pull/469] [~holden] it seems like this Jira is assigned to [~yifan Xu], who is Yifan Xu. (I am [~yifan_xing] :)) Sorry for the duplicated names. Would you mind reassigning? I also don't have permission to update the ticket status. Could you update it to `In Review`, or grant me permission? Thank you! > Remove value check of MEMORY_OFFHEAP_SIZE in declaration section > > > Key: SPARK-28859 > URL: https://issues.apache.org/jira/browse/SPARK-28859 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Yang Jie >Assignee: yifan >Priority: Minor > > MEMORY_OFFHEAP_SIZE currently has a default value of 0, but it should be > greater than 0 when MEMORY_OFFHEAP_ENABLED is true; should we check this > condition in code? > > SPARK-28577 added this check before requesting memory resources from Yarn > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
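A sketch of the kind of guard being discussed, written against the public SparkConf keys (the ticket concerns the internal config constants, and where to enforce the check is exactly the open question, so this is illustrative only, not the merged change):
{code:scala}
import org.apache.spark.SparkConf

// Fail fast if off-heap memory is enabled but its size was left at 0.
def validateOffHeap(conf: SparkConf): Unit = {
  val enabled = conf.getBoolean("spark.memory.offHeap.enabled", defaultValue = false)
  val size = conf.getSizeAsBytes("spark.memory.offHeap.size", "0")
  require(!enabled || size > 0,
    "spark.memory.offHeap.size must be > 0 when spark.memory.offHeap.enabled is true")
}
{code}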
[jira] [Updated] (SPARK-29560) sbt-mima-plugin is missing
[ https://issues.apache.org/jira/browse/SPARK-29560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29560: -- Description: GitHub Action detects the following from yesterday (Oct 21, 2019). - `branch-2.4`: `sbt-mima-plugin:0.1.17` is missing. - `master`: `sbt-mima-plugin:0.3.0` is missing. These versions of `sbt-mima-plugin` seems to be removed from the old repo. We need to change the repo location or upgrade this. {code} ~/A/spark-merge:branch-2.4$ rm -rf ~/.ivy2/ ~/A/spark-merge:branch-2.4$ build/sbt scalastyle test:scalastyle Using /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home as default JAVA_HOME. Note, this will be overridden by -java-home if it is set. Attempting to fetch sbt Launching sbt from build/sbt-launch-0.13.17.jar [info] Loading project definition from /Users/dongjoon/APACHE/spark-merge/project [info] Updating {file:/Users/dongjoon/APACHE/spark-merge/project/}spark-merge-build... [info] Resolving com.typesafe#sbt-mima-plugin;0.1.17 ... [warn] module not found: com.typesafe#sbt-mima-plugin;0.1.17 [warn] typesafe-ivy-releases: tried [warn] https://repo.typesafe.com/typesafe/ivy-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml [warn] sbt-plugin-releases: tried [warn] https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml [warn] local: tried [warn] /Users/dongjoon/.ivy2/local/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml [warn] public: tried [warn] https://repo1.maven.org/maven2/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom [warn] local-preloaded-ivy: tried [warn] /Users/dongjoon/.sbt/preloaded/com.typesafe/sbt-mima-plugin/0.1.17/ivys/ivy.xml [warn] local-preloaded: tried [warn] file:Users/dongjoon/.sbt/preloaded/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom ... [warn] :: [warn] :: UNRESOLVED DEPENDENCIES :: [warn] :: [warn] :: com.typesafe#sbt-mima-plugin;0.1.17: not found [warn] :: [warn] [warn] Note: Some unresolved dependencies have extra attributes. Check that these dependencies exist with the requested attributes. [warn] com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, sbtVersion=0.13) [warn] [warn] Note: Unresolved dependencies path: [warn] com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, sbtVersion=0.13) (/Users/dongjoon/APACHE/spark-merge/project/plugins.sbt#L18-19) [warn]+- default:spark-merge-build:0.1-SNAPSHOT (scalaVersion=2.10, sbtVersion=0.13) sbt.ResolveException: unresolved dependency: com.typesafe#sbt-mima-plugin;0.1.17: not found {code} This breaks our Jenkins in `branch-2.4` now. https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.4-test-sbt-hadoop-2.6/611/console was: GitHub Action detects the following from yesterday (Oct 21, 2019). - `branch-2.4`: `sbt-mima-plugin:0.1.17` is missing. - `master`: `sbt-mima-plugin:0.3.0` is missing. These versions of `sbt-mima-plugin` seems to be removed from the old repo. We need to change the repo location or upgrade this. {code} ~/A/spark-merge:branch-2.4$ rm -rf ~/.ivy2/ ~/A/spark-merge:branch-2.4$ build/sbt scalastyle test:scalastyle Using /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home as default JAVA_HOME. Note, this will be overridden by -java-home if it is set. 
Attempting to fetch sbt Launching sbt from build/sbt-launch-0.13.17.jar [info] Loading project definition from /Users/dongjoon/APACHE/spark-merge/project [info] Updating {file:/Users/dongjoon/APACHE/spark-merge/project/}spark-merge-build... [info] Resolving com.typesafe#sbt-mima-plugin;0.1.17 ... [warn] module not found: com.typesafe#sbt-mima-plugin;0.1.17 [warn] typesafe-ivy-releases: tried [warn] https://repo.typesafe.com/typesafe/ivy-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml [warn] sbt-plugin-releases: tried [warn] https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml [warn] local: tried [warn] /Users/dongjoon/.ivy2/local/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml [warn] public: tried [warn] https://repo1.maven.org/maven2/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom [warn] local-preloaded-ivy: tried [warn] /Users/dongjoon/.sbt/preloaded/com.typesafe/sbt-mima-plugin/0.1.17/ivys/ivy.xml [warn] local-preloaded: tried [warn] file:Users/dongjoon/.sbt/preloaded/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom ... [warn] :
[jira] [Commented] (SPARK-29560) sbt-mima-plugin is missing
[ https://issues.apache.org/jira/browse/SPARK-29560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957363#comment-16957363 ] Dongjoon Hyun commented on SPARK-29560: --- I raised the priority to `Blocker` because Jenkins is broken. We need to recover this as soon as possible to protect the branches from the upcoming commits. > sbt-mima-plugin is missing > -- > > Key: SPARK-29560 > URL: https://issues.apache.org/jira/browse/SPARK-29560 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.4, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Blocker > > GitHub Action detects the following from yesterday (Oct 21, 2019). > - `branch-2.4`: `sbt-mima-plugin:0.1.17` is missing. > - `master`: `sbt-mima-plugin:0.3.0` is missing. > These versions of `sbt-mima-plugin` seems to be removed from the old repo. We > need to change the repo location or upgrade this. > {code} > ~/A/spark-merge:branch-2.4$ rm -rf ~/.ivy2/ > ~/A/spark-merge:branch-2.4$ build/sbt scalastyle test:scalastyle > Using /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home as > default JAVA_HOME. > Note, this will be overridden by -java-home if it is set. > Attempting to fetch sbt > Launching sbt from build/sbt-launch-0.13.17.jar > [info] Loading project definition from > /Users/dongjoon/APACHE/spark-merge/project > [info] Updating > {file:/Users/dongjoon/APACHE/spark-merge/project/}spark-merge-build... > [info] Resolving com.typesafe#sbt-mima-plugin;0.1.17 ... > [warn]module not found: com.typesafe#sbt-mima-plugin;0.1.17 > [warn] typesafe-ivy-releases: tried > [warn] > https://repo.typesafe.com/typesafe/ivy-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] sbt-plugin-releases: tried > [warn] > https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] local: tried > [warn] > /Users/dongjoon/.ivy2/local/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] public: tried > [warn] > https://repo1.maven.org/maven2/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom > [warn] local-preloaded-ivy: tried > [warn] > /Users/dongjoon/.sbt/preloaded/com.typesafe/sbt-mima-plugin/0.1.17/ivys/ivy.xml > [warn] local-preloaded: tried > [warn] > file:Users/dongjoon/.sbt/preloaded/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom > ... > [warn]:: > [warn]:: UNRESOLVED DEPENDENCIES :: > [warn]:: > [warn]:: com.typesafe#sbt-mima-plugin;0.1.17: not found > [warn]:: > [warn] > [warn]Note: Some unresolved dependencies have extra attributes. > Check that these dependencies exist with the requested attributes. > [warn]com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, > sbtVersion=0.13) > [warn] > [warn]Note: Unresolved dependencies path: > [warn]com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, > sbtVersion=0.13) > (/Users/dongjoon/APACHE/spark-merge/project/plugins.sbt#L18-19) > [warn] +- default:spark-merge-build:0.1-SNAPSHOT > (scalaVersion=2.10, sbtVersion=0.13) > sbt.ResolveException: unresolved dependency: > com.typesafe#sbt-mima-plugin;0.1.17: not found > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29560) sbt-mima-plugin is missing
[ https://issues.apache.org/jira/browse/SPARK-29560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29560: -- Priority: Blocker (was: Major) > sbt-mima-plugin is missing > -- > > Key: SPARK-29560 > URL: https://issues.apache.org/jira/browse/SPARK-29560 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.4, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Blocker > > GitHub Action detects the following from yesterday (Oct 21, 2019). > - `branch-2.4`: `sbt-mima-plugin:0.1.17` is missing. > - `master`: `sbt-mima-plugin:0.3.0` is missing. > These versions of `sbt-mima-plugin` seems to be removed from the old repo. We > need to change the repo location or upgrade this. > {code} > ~/A/spark-merge:branch-2.4$ rm -rf ~/.ivy2/ > ~/A/spark-merge:branch-2.4$ build/sbt scalastyle test:scalastyle > Using /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home as > default JAVA_HOME. > Note, this will be overridden by -java-home if it is set. > Attempting to fetch sbt > Launching sbt from build/sbt-launch-0.13.17.jar > [info] Loading project definition from > /Users/dongjoon/APACHE/spark-merge/project > [info] Updating > {file:/Users/dongjoon/APACHE/spark-merge/project/}spark-merge-build... > [info] Resolving com.typesafe#sbt-mima-plugin;0.1.17 ... > [warn]module not found: com.typesafe#sbt-mima-plugin;0.1.17 > [warn] typesafe-ivy-releases: tried > [warn] > https://repo.typesafe.com/typesafe/ivy-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] sbt-plugin-releases: tried > [warn] > https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] local: tried > [warn] > /Users/dongjoon/.ivy2/local/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] public: tried > [warn] > https://repo1.maven.org/maven2/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom > [warn] local-preloaded-ivy: tried > [warn] > /Users/dongjoon/.sbt/preloaded/com.typesafe/sbt-mima-plugin/0.1.17/ivys/ivy.xml > [warn] local-preloaded: tried > [warn] > file:Users/dongjoon/.sbt/preloaded/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom > ... > [warn]:: > [warn]:: UNRESOLVED DEPENDENCIES :: > [warn]:: > [warn]:: com.typesafe#sbt-mima-plugin;0.1.17: not found > [warn]:: > [warn] > [warn]Note: Some unresolved dependencies have extra attributes. > Check that these dependencies exist with the requested attributes. > [warn]com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, > sbtVersion=0.13) > [warn] > [warn]Note: Unresolved dependencies path: > [warn]com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, > sbtVersion=0.13) > (/Users/dongjoon/APACHE/spark-merge/project/plugins.sbt#L18-19) > [warn] +- default:spark-merge-build:0.1-SNAPSHOT > (scalaVersion=2.10, sbtVersion=0.13) > sbt.ResolveException: unresolved dependency: > com.typesafe#sbt-mima-plugin;0.1.17: not found > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29560) sbt-mima-plugin is missing
[ https://issues.apache.org/jira/browse/SPARK-29560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957362#comment-16957362 ] Dongjoon Hyun commented on SPARK-29560: --- Yes. It does. I'm trying to fix this because this starts to break our Jenkins, too. - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.4-test-sbt-hadoop-2.6/611/console > sbt-mima-plugin is missing > -- > > Key: SPARK-29560 > URL: https://issues.apache.org/jira/browse/SPARK-29560 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.4, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > GitHub Action detects the following from yesterday (Oct 21, 2019). > - `branch-2.4`: `sbt-mima-plugin:0.1.17` is missing. > - `master`: `sbt-mima-plugin:0.3.0` is missing. > These versions of `sbt-mima-plugin` seems to be removed from the old repo. We > need to change the repo location or upgrade this. > {code} > ~/A/spark-merge:branch-2.4$ rm -rf ~/.ivy2/ > ~/A/spark-merge:branch-2.4$ build/sbt scalastyle test:scalastyle > Using /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home as > default JAVA_HOME. > Note, this will be overridden by -java-home if it is set. > Attempting to fetch sbt > Launching sbt from build/sbt-launch-0.13.17.jar > [info] Loading project definition from > /Users/dongjoon/APACHE/spark-merge/project > [info] Updating > {file:/Users/dongjoon/APACHE/spark-merge/project/}spark-merge-build... > [info] Resolving com.typesafe#sbt-mima-plugin;0.1.17 ... > [warn]module not found: com.typesafe#sbt-mima-plugin;0.1.17 > [warn] typesafe-ivy-releases: tried > [warn] > https://repo.typesafe.com/typesafe/ivy-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] sbt-plugin-releases: tried > [warn] > https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] local: tried > [warn] > /Users/dongjoon/.ivy2/local/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] public: tried > [warn] > https://repo1.maven.org/maven2/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom > [warn] local-preloaded-ivy: tried > [warn] > /Users/dongjoon/.sbt/preloaded/com.typesafe/sbt-mima-plugin/0.1.17/ivys/ivy.xml > [warn] local-preloaded: tried > [warn] > file:Users/dongjoon/.sbt/preloaded/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom > ... > [warn]:: > [warn]:: UNRESOLVED DEPENDENCIES :: > [warn]:: > [warn]:: com.typesafe#sbt-mima-plugin;0.1.17: not found > [warn]:: > [warn] > [warn]Note: Some unresolved dependencies have extra attributes. > Check that these dependencies exist with the requested attributes. > [warn]com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, > sbtVersion=0.13) > [warn] > [warn]Note: Unresolved dependencies path: > [warn]com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, > sbtVersion=0.13) > (/Users/dongjoon/APACHE/spark-merge/project/plugins.sbt#L18-19) > [warn] +- default:spark-merge-build:0.1-SNAPSHOT > (scalaVersion=2.10, sbtVersion=0.13) > sbt.ResolveException: unresolved dependency: > com.typesafe#sbt-mima-plugin;0.1.17: not found > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29557) Upgrade dropwizard metrics library to 4.1.1
[ https://issues.apache.org/jira/browse/SPARK-29557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29557: -- Component/s: (was: Spark Core) Build > Upgrade dropwizard metrics library to 4.1.1 > --- > > Key: SPARK-29557 > URL: https://issues.apache.org/jira/browse/SPARK-29557 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.0.0 >Reporter: Luca Canali >Priority: Minor > > This proposes to upgrade the dropwizard/codahale metrics library version used > by Spark to a recent version, tentatively 4.1.1. Spark is currently using > Dropwizard metrics version 3.1.5, a version that is no more actively > developed nor maintained, according to the project's Github repo README. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29556) Avoid including path in error response from REST submission server
[ https://issues.apache.org/jira/browse/SPARK-29556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29556: -- Affects Version/s: 1.6.3 > Avoid including path in error response from REST submission server > -- > > Key: SPARK-29556 > URL: https://issues.apache.org/jira/browse/SPARK-29556 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.3, 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.4, 3.0.0 >Reporter: Sean R. Owen >Assignee: Sean R. Owen >Priority: Minor > Fix For: 2.4.5, 3.0.0 > > > I'm not sure if it's possible to exploit, but, the following code in > RESTSubmissionServer's ErrorServlet.service is a little risky as it includes > user-supplied path input in the error response. We don't want to let a link > determine what's in the resulting HTML. > {code} > val path = request.getPathInfo > ... > var msg = > parts match { > ... > case _ => > // never reached > s"Malformed path $path." > } > msg += s" Please submit requests through > http://[host]:[port]/$serverVersion/submissions/..."; > val error = handleError(msg) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29556) Avoid including path in error response from REST submission server
[ https://issues.apache.org/jira/browse/SPARK-29556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29556: -- Affects Version/s: 2.0.2 2.1.3 2.2.3 2.3.4 > Avoid including path in error response from REST submission server > -- > > Key: SPARK-29556 > URL: https://issues.apache.org/jira/browse/SPARK-29556 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.4, 3.0.0 >Reporter: Sean R. Owen >Assignee: Sean R. Owen >Priority: Minor > Fix For: 2.4.5, 3.0.0 > > > I'm not sure if it's possible to exploit, but, the following code in > RESTSubmissionServer's ErrorServlet.service is a little risky as it includes > user-supplied path input in the error response. We don't want to let a link > determine what's in the resulting HTML. > {code} > val path = request.getPathInfo > ... > var msg = > parts match { > ... > case _ => > // never reached > s"Malformed path $path." > } > msg += s" Please submit requests through > http://[host]:[port]/$serverVersion/submissions/..."; > val error = handleError(msg) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29556) Avoid including path in error response from REST submission server
[ https://issues.apache.org/jira/browse/SPARK-29556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-29556. --- Fix Version/s: 3.0.0 2.4.5 Resolution: Fixed Issue resolved by pull request 26211 [https://github.com/apache/spark/pull/26211] > Avoid including path in error response from REST submission server > -- > > Key: SPARK-29556 > URL: https://issues.apache.org/jira/browse/SPARK-29556 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.4, 3.0.0 >Reporter: Sean R. Owen >Assignee: Sean R. Owen >Priority: Minor > Fix For: 2.4.5, 3.0.0 > > > I'm not sure if it's possible to exploit, but, the following code in > RESTSubmissionServer's ErrorServlet.service is a little risky as it includes > user-supplied path input in the error response. We don't want to let a link > determine what's in the resulting HTML. > {code} > val path = request.getPathInfo > ... > var msg = > parts match { > ... > case _ => > // never reached > s"Malformed path $path." > } > msg += s" Please submit requests through > http://[host]:[port]/$serverVersion/submissions/..."; > val error = handleError(msg) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29560) sbt-mima-plugin is missing
[ https://issues.apache.org/jira/browse/SPARK-29560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957340#comment-16957340 ] Sean R. Owen commented on SPARK-29560: -- Hm. I note that 0.3.0 is the last version that works with sbt 0.13, so we need to find 0.3.0. It does seem to have disappeared; I presume it was previously at https://dl.bintray.com/sbt/sbt-plugin-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/ or under https://dl.bintray.com/typesafe/ivy-releases/com.typesafe/ It looks like it is still here: https://dl.bintray.com/typesafe/sbt-plugins/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.3.0/ ... so maybe it's a question of adding a new repo for plugin resolution? I don't know how to do that off the top of my head, anyone know SBT better? :) > sbt-mima-plugin is missing > -- > > Key: SPARK-29560 > URL: https://issues.apache.org/jira/browse/SPARK-29560 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.4, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > GitHub Action detects the following from yesterday (Oct 21, 2019). > - `branch-2.4`: `sbt-mima-plugin:0.1.17` is missing. > - `master`: `sbt-mima-plugin:0.3.0` is missing. > These versions of `sbt-mima-plugin` seems to be removed from the old repo. We > need to change the repo location or upgrade this. > {code} > ~/A/spark-merge:branch-2.4$ rm -rf ~/.ivy2/ > ~/A/spark-merge:branch-2.4$ build/sbt scalastyle test:scalastyle > Using /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home as > default JAVA_HOME. > Note, this will be overridden by -java-home if it is set. > Attempting to fetch sbt > Launching sbt from build/sbt-launch-0.13.17.jar > [info] Loading project definition from > /Users/dongjoon/APACHE/spark-merge/project > [info] Updating > {file:/Users/dongjoon/APACHE/spark-merge/project/}spark-merge-build... > [info] Resolving com.typesafe#sbt-mima-plugin;0.1.17 ... > [warn]module not found: com.typesafe#sbt-mima-plugin;0.1.17 > [warn] typesafe-ivy-releases: tried > [warn] > https://repo.typesafe.com/typesafe/ivy-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] sbt-plugin-releases: tried > [warn] > https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] local: tried > [warn] > /Users/dongjoon/.ivy2/local/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] public: tried > [warn] > https://repo1.maven.org/maven2/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom > [warn] local-preloaded-ivy: tried > [warn] > /Users/dongjoon/.sbt/preloaded/com.typesafe/sbt-mima-plugin/0.1.17/ivys/ivy.xml > [warn] local-preloaded: tried > [warn] > file:Users/dongjoon/.sbt/preloaded/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom > ... > [warn]:: > [warn]:: UNRESOLVED DEPENDENCIES :: > [warn]:: > [warn]:: com.typesafe#sbt-mima-plugin;0.1.17: not found > [warn]:: > [warn] > [warn]Note: Some unresolved dependencies have extra attributes. > Check that these dependencies exist with the requested attributes. 
> [warn]com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, > sbtVersion=0.13) > [warn] > [warn]Note: Unresolved dependencies path: > [warn]com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, > sbtVersion=0.13) > (/Users/dongjoon/APACHE/spark-merge/project/plugins.sbt#L18-19) > [warn] +- default:spark-merge-build:0.1-SNAPSHOT > (scalaVersion=2.10, sbtVersion=0.13) > sbt.ResolveException: unresolved dependency: > com.typesafe#sbt-mima-plugin;0.1.17: not found > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
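If those bintray URLs are still live, one possible direction is an extra Ivy-style plugin resolver. A hedged sketch for project/plugins.sbt, assuming the dl.bintray.com path above remains valid (the resolver name is arbitrary):

{code}
// Hypothetical sketch for project/plugins.sbt: try an additional Ivy-style
// plugin repository when the default resolvers no longer host the artifact.
resolvers += Resolver.url(
  "typesafe-sbt-plugins",
  url("https://dl.bintray.com/typesafe/sbt-plugins/"))(Resolver.ivyStylePatterns)

addSbtPlugin("com.typesafe" % "sbt-mima-plugin" % "0.3.0")
{code}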
[jira] [Updated] (SPARK-29560) sbt-mima-plugin is missing
[ https://issues.apache.org/jira/browse/SPARK-29560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29560: -- Description: GitHub Action detects the following from yesterday (Oct 21, 2019). - `branch-2.4`: `sbt-mima-plugin:0.1.17` is missing. - `master`: `sbt-mima-plugin:0.3.0` is missing. These versions of `sbt-mima-plugin` seem to have been removed from the old repo. We need to change the repo location or upgrade the plugin. {code} ~/A/spark-merge:branch-2.4$ rm -rf ~/.ivy2/ ~/A/spark-merge:branch-2.4$ build/sbt scalastyle test:scalastyle Using /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home as default JAVA_HOME. Note, this will be overridden by -java-home if it is set. Attempting to fetch sbt Launching sbt from build/sbt-launch-0.13.17.jar [info] Loading project definition from /Users/dongjoon/APACHE/spark-merge/project [info] Updating {file:/Users/dongjoon/APACHE/spark-merge/project/}spark-merge-build... [info] Resolving com.typesafe#sbt-mima-plugin;0.1.17 ... [warn] module not found: com.typesafe#sbt-mima-plugin;0.1.17 [warn] typesafe-ivy-releases: tried [warn] https://repo.typesafe.com/typesafe/ivy-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml [warn] sbt-plugin-releases: tried [warn] https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml [warn] local: tried [warn] /Users/dongjoon/.ivy2/local/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml [warn] public: tried [warn] https://repo1.maven.org/maven2/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom [warn] local-preloaded-ivy: tried [warn] /Users/dongjoon/.sbt/preloaded/com.typesafe/sbt-mima-plugin/0.1.17/ivys/ivy.xml [warn] local-preloaded: tried [warn] file:Users/dongjoon/.sbt/preloaded/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom ... [warn] :: [warn] :: UNRESOLVED DEPENDENCIES :: [warn] :: [warn] :: com.typesafe#sbt-mima-plugin;0.1.17: not found [warn] :: [warn] [warn] Note: Some unresolved dependencies have extra attributes. Check that these dependencies exist with the requested attributes. [warn] com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, sbtVersion=0.13) [warn] [warn] Note: Unresolved dependencies path: [warn] com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, sbtVersion=0.13) (/Users/dongjoon/APACHE/spark-merge/project/plugins.sbt#L18-19) [warn]+- default:spark-merge-build:0.1-SNAPSHOT (scalaVersion=2.10, sbtVersion=0.13) sbt.ResolveException: unresolved dependency: com.typesafe#sbt-mima-plugin;0.1.17: not found {code} was: GitHub Action detects the following from yesterday (Oct 21, 2019). - `branch-2.4`: `sbt-mima-plugin:0.1.17` is missing. - `master`: `sbt-mima-plugin:0.3.0` is missing. These versions of `sbt-mima-plugin` seems to be removed from the old repo. We need to change the repo location or upgrade this. {code} ~/A/spark-merge:branch-2.4$ build/sbt scalastyle test:scalastyle Using /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home as default JAVA_HOME. Note, this will be overridden by -java-home if it is set. Attempting to fetch sbt Launching sbt from build/sbt-launch-0.13.17.jar [info] Loading project definition from /Users/dongjoon/APACHE/spark-merge/project [info] Updating {file:/Users/dongjoon/APACHE/spark-merge/project/}spark-merge-build... [info] Resolving com.typesafe#sbt-mima-plugin;0.1.17 ... 
[warn] module not found: com.typesafe#sbt-mima-plugin;0.1.17 [warn] typesafe-ivy-releases: tried [warn] https://repo.typesafe.com/typesafe/ivy-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml [warn] sbt-plugin-releases: tried [warn] https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml [warn] local: tried [warn] /Users/dongjoon/.ivy2/local/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml [warn] public: tried [warn] https://repo1.maven.org/maven2/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom [warn] local-preloaded-ivy: tried [warn] /Users/dongjoon/.sbt/preloaded/com.typesafe/sbt-mima-plugin/0.1.17/ivys/ivy.xml [warn] local-preloaded: tried [warn] file:Users/dongjoon/.sbt/preloaded/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom ... [warn] :: [warn] :: UNRESOLVED DEPENDENCIES :: [warn] :: [warn] :: com.typesafe#sbt-mima-plugin;0.1.17: not found [warn]
[jira] [Commented] (SPARK-29560) sbt-mima-plugin is missing
[ https://issues.apache.org/jira/browse/SPARK-29560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957260#comment-16957260 ] Dongjoon Hyun commented on SPARK-29560: --- Although this is not an Apache Spark issue, we are affected. (cc [~srowen] and [~hyukjin.kwon]) > sbt-mima-plugin is missing > -- > > Key: SPARK-29560 > URL: https://issues.apache.org/jira/browse/SPARK-29560 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.4, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > GitHub Action detects the following from yesterday (Oct 21, 2019). > - `branch-2.4`: `sbt-mima-plugin:0.1.17` is missing. > - `master`: `sbt-mima-plugin:0.3.0` is missing. > These versions of `sbt-mima-plugin` seems to be removed from the old repo. We > need to change the repo location or upgrade this. > {code} > ~/A/spark-merge:branch-2.4$ build/sbt scalastyle test:scalastyle > Using /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home as > default JAVA_HOME. > Note, this will be overridden by -java-home if it is set. > Attempting to fetch sbt > Launching sbt from build/sbt-launch-0.13.17.jar > [info] Loading project definition from > /Users/dongjoon/APACHE/spark-merge/project > [info] Updating > {file:/Users/dongjoon/APACHE/spark-merge/project/}spark-merge-build... > [info] Resolving com.typesafe#sbt-mima-plugin;0.1.17 ... > [warn]module not found: com.typesafe#sbt-mima-plugin;0.1.17 > [warn] typesafe-ivy-releases: tried > [warn] > https://repo.typesafe.com/typesafe/ivy-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] sbt-plugin-releases: tried > [warn] > https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] local: tried > [warn] > /Users/dongjoon/.ivy2/local/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml > [warn] public: tried > [warn] > https://repo1.maven.org/maven2/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom > [warn] local-preloaded-ivy: tried > [warn] > /Users/dongjoon/.sbt/preloaded/com.typesafe/sbt-mima-plugin/0.1.17/ivys/ivy.xml > [warn] local-preloaded: tried > [warn] > file:Users/dongjoon/.sbt/preloaded/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom > ... > [warn]:: > [warn]:: UNRESOLVED DEPENDENCIES :: > [warn]:: > [warn]:: com.typesafe#sbt-mima-plugin;0.1.17: not found > [warn]:: > [warn] > [warn]Note: Some unresolved dependencies have extra attributes. > Check that these dependencies exist with the requested attributes. > [warn]com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, > sbtVersion=0.13) > [warn] > [warn]Note: Unresolved dependencies path: > [warn]com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, > sbtVersion=0.13) > (/Users/dongjoon/APACHE/spark-merge/project/plugins.sbt#L18-19) > [warn] +- default:spark-merge-build:0.1-SNAPSHOT > (scalaVersion=2.10, sbtVersion=0.13) > sbt.ResolveException: unresolved dependency: > com.typesafe#sbt-mima-plugin;0.1.17: not found > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29560) sbt-mima-plugin is missing
Dongjoon Hyun created SPARK-29560: - Summary: sbt-mima-plugin is missing Key: SPARK-29560 URL: https://issues.apache.org/jira/browse/SPARK-29560 Project: Spark Issue Type: Bug Components: Build Affects Versions: 2.4.4, 3.0.0 Reporter: Dongjoon Hyun GitHub Action detects the following from yesterday (Oct 21, 2019). - `branch-2.4`: `sbt-mima-plugin:0.1.17` is missing. - `master`: `sbt-mima-plugin:0.3.0` is missing. These versions of `sbt-mima-plugin` seem to have been removed from the old repo. We need to change the repo location or upgrade the plugin. {code} ~/A/spark-merge:branch-2.4$ build/sbt scalastyle test:scalastyle Using /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home as default JAVA_HOME. Note, this will be overridden by -java-home if it is set. Attempting to fetch sbt Launching sbt from build/sbt-launch-0.13.17.jar [info] Loading project definition from /Users/dongjoon/APACHE/spark-merge/project [info] Updating {file:/Users/dongjoon/APACHE/spark-merge/project/}spark-merge-build... [info] Resolving com.typesafe#sbt-mima-plugin;0.1.17 ... [warn] module not found: com.typesafe#sbt-mima-plugin;0.1.17 [warn] typesafe-ivy-releases: tried [warn] https://repo.typesafe.com/typesafe/ivy-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml [warn] sbt-plugin-releases: tried [warn] https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml [warn] local: tried [warn] /Users/dongjoon/.ivy2/local/com.typesafe/sbt-mima-plugin/scala_2.10/sbt_0.13/0.1.17/ivys/ivy.xml [warn] public: tried [warn] https://repo1.maven.org/maven2/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom [warn] local-preloaded-ivy: tried [warn] /Users/dongjoon/.sbt/preloaded/com.typesafe/sbt-mima-plugin/0.1.17/ivys/ivy.xml [warn] local-preloaded: tried [warn] file:Users/dongjoon/.sbt/preloaded/com/typesafe/sbt-mima-plugin_2.10_0.13/0.1.17/sbt-mima-plugin-0.1.17.pom ... [warn] :: [warn] :: UNRESOLVED DEPENDENCIES :: [warn] :: [warn] :: com.typesafe#sbt-mima-plugin;0.1.17: not found [warn] :: [warn] [warn] Note: Some unresolved dependencies have extra attributes. Check that these dependencies exist with the requested attributes. [warn] com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, sbtVersion=0.13) [warn] [warn] Note: Unresolved dependencies path: [warn] com.typesafe:sbt-mima-plugin:0.1.17 (scalaVersion=2.10, sbtVersion=0.13) (/Users/dongjoon/APACHE/spark-merge/project/plugins.sbt#L18-19) [warn]+- default:spark-merge-build:0.1-SNAPSHOT (scalaVersion=2.10, sbtVersion=0.13) sbt.ResolveException: unresolved dependency: com.typesafe#sbt-mima-plugin;0.1.17: not found {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29559) Support pagination for JDBC/ODBC UI page
shahid created SPARK-29559: -- Summary: Support pagination for JDBC/ODBC UI page Key: SPARK-29559 URL: https://issues.apache.org/jira/browse/SPARK-29559 Project: Spark Issue Type: Improvement Components: Web UI Affects Versions: 2.4.4, 3.0.0 Reporter: shahid Support pagination for JDBC/ODBC UI page -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29558) ResolveTables and ResolveRelations should be order-insensitive
Wenchen Fan created SPARK-29558: --- Summary: ResolveTables and ResolveRelations should be order-insensitive Key: SPARK-29558 URL: https://issues.apache.org/jira/browse/SPARK-29558 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Wenchen Fan Assignee: Wenchen Fan -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29557) Upgrade dropwizard metrics library to 4.1.1
Luca Canali created SPARK-29557: --- Summary: Upgrade dropwizard metrics library to 4.1.1 Key: SPARK-29557 URL: https://issues.apache.org/jira/browse/SPARK-29557 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.0.0 Reporter: Luca Canali This proposes to upgrade the dropwizard/codahale metrics library version used by Spark to a recent version, tentatively 4.1.1. Spark currently uses Dropwizard metrics version 3.1.5, a version that is no longer actively developed or maintained, according to the project's GitHub repo README. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29555) Getting 404 while opening link for Sequence file on latest documentation page
[ https://issues.apache.org/jira/browse/SPARK-29555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Akkalkote updated SPARK-29555: - Description: Trying to open the link for Sequence file on the page ([https://spark.apache.org/docs/latest/rdd-programming-guide.html]) redirects to [http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html]; however, this returns 404 (the standard HTTP error code for File Not Found). It's actually giving 404 for all resources whose base URL is [http://hadoop.apache.org/common], e.g. [SequenceFiles|http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html], [Writable|http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/Writable.html], [IntWritable|http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/IntWritable.html] and [Text|http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/Text.html]. was: Trying to open the link for Sequence file on Page ([https://spark.apache.org/docs/latest/rdd-programming-guide.html]) which redirects to like - [http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html] however getting 404 (which is a standard http error code for File Not Found) Its actually giving 404 for all resources whose base url is – [http://hadoop.apache.org/common] > Getting 404 while opening link for Sequence file on latest documentation page > - > > Key: SPARK-29555 > URL: https://issues.apache.org/jira/browse/SPARK-29555 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 2.4.4 >Reporter: Vishal Akkalkote >Priority: Major > > Trying to open the link for Sequence file on the page > ([https://spark.apache.org/docs/latest/rdd-programming-guide.html]) redirects to > [http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html]; > however, this returns 404 (the standard HTTP error code for File Not Found) > It's actually giving 404 for all resources whose base URL is > [http://hadoop.apache.org/common], > e.g. > [SequenceFiles|http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html], > > [Writable|http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/Writable.html], > > [IntWritable|http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/IntWritable.html] > and > [Text|http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/Text.html]. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29555) Getting 404 while opening link for Sequence file on latest documentation page
[ https://issues.apache.org/jira/browse/SPARK-29555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Akkalkote updated SPARK-29555: - Description: Trying to open the link for Sequence file on the page ([https://spark.apache.org/docs/latest/rdd-programming-guide.html]) redirects to [http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html]; however, this returns 404 (the standard HTTP error code for File Not Found). It's actually giving 404 for all resources whose base URL is [http://hadoop.apache.org/common] was: Trying to open the link for Sequence file on Page ([https://spark.apache.org/docs/latest/rdd-programming-guide.html]) which redirects to like - [http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html] however getting 404 (which is a standard http error code for File Not Found) > Getting 404 while opening link for Sequence file on latest documentation page > - > > Key: SPARK-29555 > URL: https://issues.apache.org/jira/browse/SPARK-29555 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 2.4.4 >Reporter: Vishal Akkalkote >Priority: Major > > Trying to open the link for Sequence file on the page > ([https://spark.apache.org/docs/latest/rdd-programming-guide.html]) redirects to > [http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html]; > however, this returns 404 (the standard HTTP error code for File Not Found) > It's actually giving 404 for all resources whose base URL is > [http://hadoop.apache.org/common] > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29556) Avoid including path in error response from REST submission server
Sean R. Owen created SPARK-29556: Summary: Avoid including path in error response from REST submission server Key: SPARK-29556 URL: https://issues.apache.org/jira/browse/SPARK-29556 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.4.4, 3.0.0 Reporter: Sean R. Owen Assignee: Sean R. Owen I'm not sure if it's possible to exploit, but the following code in RESTSubmissionServer's ErrorServlet.service is a little risky as it includes user-supplied path input in the error response. We don't want to let a link determine what's in the resulting HTML. {code} val path = request.getPathInfo ... var msg = parts match { ... case _ => // never reached s"Malformed path $path." } msg += s" Please submit requests through http://[host]:[port]/$serverVersion/submissions/..."; val error = handleError(msg) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
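A minimal sketch of the kind of fix this implies, with the user-controlled path simply dropped from the message. This is an illustration of the idea under that assumption, not necessarily what the actual patch does:

{code}
// Hypothetical sketch: build the error message without echoing the
// user-supplied request path back into the HTML response.
def errorMessage(serverVersion: String): String = {
  // Deliberately do not interpolate request.getPathInfo here,
  // so the request cannot influence the rendered HTML.
  "Malformed path. Please submit requests through " +
    s"http://[host]:[port]/$serverVersion/submissions/..."
}
{code}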
[jira] [Updated] (SPARK-29551) There is a bug about fetch failed when an executor lost
[ https://issues.apache.org/jira/browse/SPARK-29551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-29551: - Fix Version/s: (was: 2.4.3) Target Version/s: (was: 2.4.3, 2.4.5, 3.0.0) Priority: Major (was: Blocker) Please don't set Blocker priority or target/fix versions. > There is a bug about fetch failed when an executor lost > > > Key: SPARK-29551 > URL: https://issues.apache.org/jira/browse/SPARK-29551 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.3 >Reporter: weixiuli >Priority: Major > > There will be a regression when an executor is lost and this then causes a 'fetch > failed'. > We can add a unit test in 'DAGSchedulerSuite.scala' to catch the above > problem. > {code} > test("All shuffle files on the slave should be cleaned up when slave lost > test") { > // reset the test context with the right shuffle service config > afterEach() > val conf = new SparkConf() > conf.set(config.SHUFFLE_SERVICE_ENABLED.key, "true") > conf.set("spark.files.fetchFailure.unRegisterOutputOnHost", "true") > init(conf) > runEvent(ExecutorAdded("exec-hostA1", "hostA")) > runEvent(ExecutorAdded("exec-hostA2", "hostA")) > runEvent(ExecutorAdded("exec-hostB", "hostB")) > val firstRDD = new MyRDD(sc, 3, Nil) > val firstShuffleDep = new ShuffleDependency(firstRDD, new > HashPartitioner(3)) > val firstShuffleId = firstShuffleDep.shuffleId > val shuffleMapRdd = new MyRDD(sc, 3, List(firstShuffleDep)) > val shuffleDep = new ShuffleDependency(shuffleMapRdd, new > HashPartitioner(3)) > val secondShuffleId = shuffleDep.shuffleId > val reduceRdd = new MyRDD(sc, 1, List(shuffleDep)) > submit(reduceRdd, Array(0)) > // map stage1 completes successfully, with one task on each executor > complete(taskSets(0), Seq( > (Success, > MapStatus( > BlockManagerId("exec-hostA1", "hostA", 12345), > Array.fill[Long](1)(2), mapTaskId = 5)), > (Success, > MapStatus( > BlockManagerId("exec-hostA2", "hostA", 12345), > Array.fill[Long](1)(2), mapTaskId = 6)), > (Success, makeMapStatus("hostB", 1, mapTaskId = 7)) > )) > // map stage2 completes successfully, with one task on each executor > complete(taskSets(1), Seq( > (Success, > MapStatus( > BlockManagerId("exec-hostA1", "hostA", 12345), > Array.fill[Long](1)(2), mapTaskId = 8)), > (Success, > MapStatus( > BlockManagerId("exec-hostA2", "hostA", 12345), > Array.fill[Long](1)(2), mapTaskId = 9)), > (Success, makeMapStatus("hostB", 1, mapTaskId = 10)) > )) > // make sure our test setup is correct > val initialMapStatus1 = > mapOutputTracker.shuffleStatuses(firstShuffleId).mapStatuses > // val initialMapStatus1 = mapOutputTracker.mapStatuses.get(0).get > assert(initialMapStatus1.count(_ != null) === 3) > assert(initialMapStatus1.map{_.location.executorId}.toSet === > Set("exec-hostA1", "exec-hostA2", "exec-hostB")) > assert(initialMapStatus1.map{_.mapId}.toSet === Set(5, 6, 7)) > val initialMapStatus2 = > mapOutputTracker.shuffleStatuses(secondShuffleId).mapStatuses > // val initialMapStatus1 = mapOutputTracker.mapStatuses.get(0).get > assert(initialMapStatus2.count(_ != null) === 3) > assert(initialMapStatus2.map{_.location.executorId}.toSet === > Set("exec-hostA1", "exec-hostA2", "exec-hostB")) > assert(initialMapStatus2.map{_.mapId}.toSet === Set(8, 9, 10)) > // kill exec-hostA2 > runEvent(ExecutorLost("exec-hostA2", ExecutorKilled)) > // reduce stage fails with a fetch failure from map stage from exec-hostA2 > complete(taskSets(2), Seq( > (FetchFailed(BlockManagerId("exec-hostA2", "hostA", 12345), > secondShuffleId, 0L, 0, 
0, "ignored"), > null) > )) > // Here is the main assertion -- make sure that we de-register > // the map outputs for both map stage from both executors on hostA > val mapStatus1 = > mapOutputTracker.shuffleStatuses(firstShuffleId).mapStatuses > assert(mapStatus1.count(_ != null) === 1) > assert(mapStatus1(2).location.executorId === "exec-hostB") > assert(mapStatus1(2).location.host === "hostB") > val mapStatus2 = > mapOutputTracker.shuffleStatuses(secondShuffleId).mapStatuses > assert(mapStatus2.count(_ != null) === 1) > assert(mapStatus2(2).location.executorId === "exec-hostB") > assert(mapStatus2(2).location.host === "hostB") > } > {code} > The error output is: > {code} > 3 did not equal 1 > ScalaTestFailureLocation: org.apache.spark.scheduler.DAGSchedulerSuite at > (DAGSchedulerSuite.scala:609) > Expected :1 > Actual :3 > > org.scalatest.except
[jira] [Updated] (SPARK-29488) In Web UI, stage page has js error when sort table.
[ https://issues.apache.org/jira/browse/SPARK-29488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-29488: - Priority: Minor (was: Major) > In Web UI, stage page has js error when sort table. > --- > > Key: SPARK-29488 > URL: https://issues.apache.org/jira/browse/SPARK-29488 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.2, 2.4.4 >Reporter: jenny >Assignee: jenny >Priority: Minor > Fix For: 3.0.0 > > Attachments: image-2019-10-16-15-47-25-212.png > > > In the Web UI, following the steps below produces the js error "Uncaught TypeError: Failed > to execute 'removeChild' on 'Node': parameter 1 is not of type 'Node'.": > # Click the "Summary Metrics..." table header "Min" > # Click the "Aggregated Metrics by Executor" table header "Task Time" > # Click the "Summary Metrics..." table header "Min" (the same as step 1) > !image-2019-10-16-15-47-25-212.png! > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-29488) In Web UI, stage page has js error when sort table.
[ https://issues.apache.org/jira/browse/SPARK-29488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-29488: Assignee: jenny > In Web UI, stage page has js error when sort table. > --- > > Key: SPARK-29488 > URL: https://issues.apache.org/jira/browse/SPARK-29488 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.2, 2.4.4 >Reporter: jenny >Assignee: jenny >Priority: Major > Attachments: image-2019-10-16-15-47-25-212.png > > > In the Web UI, following the steps below produces the js error "Uncaught TypeError: Failed > to execute 'removeChild' on 'Node': parameter 1 is not of type 'Node'.": > # Click the "Summary Metrics..." table header "Min" > # Click the "Aggregated Metrics by Executor" table header "Task Time" > # Click the "Summary Metrics..." table header "Min" (the same as step 1) > !image-2019-10-16-15-47-25-212.png! > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29488) In Web UI, stage page has js error when sort table.
[ https://issues.apache.org/jira/browse/SPARK-29488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-29488. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26136 [https://github.com/apache/spark/pull/26136] > In Web UI, stage page has js error when sort table. > --- > > Key: SPARK-29488 > URL: https://issues.apache.org/jira/browse/SPARK-29488 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.2, 2.4.4 >Reporter: jenny >Assignee: jenny >Priority: Major > Fix For: 3.0.0 > > Attachments: image-2019-10-16-15-47-25-212.png > > > In the Web UI, following the steps below produces the js error "Uncaught TypeError: Failed > to execute 'removeChild' on 'Node': parameter 1 is not of type 'Node'.": > # Click the "Summary Metrics..." table header "Min" > # Click the "Aggregated Metrics by Executor" table header "Task Time" > # Click the "Summary Metrics..." table header "Min" (the same as step 1) > !image-2019-10-16-15-47-25-212.png! > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28793) Document CREATE FUNCTION in SQL Reference.
[ https://issues.apache.org/jira/browse/SPARK-28793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-28793: - Priority: Minor (was: Major) > Document CREATE FUNCTION in SQL Reference. > -- > > Key: SPARK-28793 > URL: https://issues.apache.org/jira/browse/SPARK-28793 > Project: Spark > Issue Type: Sub-task > Components: Documentation, SQL >Affects Versions: 2.4.3 >Reporter: Dilip Biswal >Assignee: Dilip Biswal >Priority: Minor > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28793) Document CREATE FUNCTION in SQL Reference.
[ https://issues.apache.org/jira/browse/SPARK-28793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-28793: Assignee: Dilip Biswal > Document CREATE FUNCTION in SQL Reference. > -- > > Key: SPARK-28793 > URL: https://issues.apache.org/jira/browse/SPARK-28793 > Project: Spark > Issue Type: Sub-task > Components: Documentation, SQL >Affects Versions: 2.4.3 >Reporter: Dilip Biswal >Assignee: Dilip Biswal >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28793) Document CREATE FUNCTION in SQL Reference.
[ https://issues.apache.org/jira/browse/SPARK-28793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-28793. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 25894 [https://github.com/apache/spark/pull/25894] > Document CREATE FUNCTION in SQL Reference. > -- > > Key: SPARK-28793 > URL: https://issues.apache.org/jira/browse/SPARK-28793 > Project: Spark > Issue Type: Sub-task > Components: Documentation, SQL >Affects Versions: 2.4.3 >Reporter: Dilip Biswal >Assignee: Dilip Biswal >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28787) Document LOAD DATA statement in SQL Reference.
[ https://issues.apache.org/jira/browse/SPARK-28787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-28787: - Priority: Minor (was: Major) > Document LOAD DATA statement in SQL Reference. > -- > > Key: SPARK-28787 > URL: https://issues.apache.org/jira/browse/SPARK-28787 > Project: Spark > Issue Type: Sub-task > Components: Documentation, SQL >Affects Versions: 3.0.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Minor > Fix For: 3.0.0 > > > Document LOAD DATA statement in SQL Reference. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28787) Document LOAD DATA statement in SQL Reference.
[ https://issues.apache.org/jira/browse/SPARK-28787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-28787. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 25522 [https://github.com/apache/spark/pull/25522] > Document LOAD DATA statement in SQL Reference. > -- > > Key: SPARK-28787 > URL: https://issues.apache.org/jira/browse/SPARK-28787 > Project: Spark > Issue Type: Sub-task > Components: Documentation, SQL >Affects Versions: 3.0.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Major > Fix For: 3.0.0 > > > Document LOAD DATA statement in SQL Reference. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28787) Document LOAD DATA statement in SQL Reference.
[ https://issues.apache.org/jira/browse/SPARK-28787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-28787: Assignee: Huaxin Gao > Document LOAD DATA statement in SQL Reference. > -- > > Key: SPARK-28787 > URL: https://issues.apache.org/jira/browse/SPARK-28787 > Project: Spark > Issue Type: Sub-task > Components: Documentation, SQL >Affects Versions: 3.0.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Major > > Document LOAD DATA statement in SQL Reference. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29555) Getting 404 while opening link for Sequence file on latest documentation page
Vishal Akkalkote created SPARK-29555: Summary: Getting 404 while opening link for Sequence file on latest documentation page Key: SPARK-29555 URL: https://issues.apache.org/jira/browse/SPARK-29555 Project: Spark Issue Type: Bug Components: Documentation Affects Versions: 2.4.4 Reporter: Vishal Akkalkote Trying to open the link for Sequence file on the page ([https://spark.apache.org/docs/latest/rdd-programming-guide.html]) redirects to [http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html]; however, this returns 404 (the standard HTTP error code for File Not Found). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29517) TRUNCATE TABLE should look up catalog/table like v2 commands
[ https://issues.apache.org/jira/browse/SPARK-29517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-29517. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26174 [https://github.com/apache/spark/pull/26174] > TRUNCATE TABLE should look up catalog/table like v2 commands > > > Key: SPARK-29517 > URL: https://issues.apache.org/jira/browse/SPARK-29517 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Fix For: 3.0.0 > > > TRUNCATE TABLE should look up catalog/table like v2 commands -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29554) Add a misc function named version
Kent Yao created SPARK-29554: Summary: Add a misc function named version Key: SPARK-29554 URL: https://issues.apache.org/jira/browse/SPARK-29554 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Kent Yao |string|version()|Returns the Hive version (as of Hive 2.1.0). The string contains 2 fields, the first being a build number and the second being a build hash. Example: "select version();" might return "2.1.0.2.5.0.0-1245 r027527b9c5ce1a3d7d0b6d2e6de2378fb0c39232". Actual results will depend on your build.| [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
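As a usage sketch, assuming the function lands with Hive-compatible semantics; the output shown in the comment is illustrative only, since the actual result depends on the build:

{code}
// Hypothetical usage sketch once such a version() function exists:
spark.sql("SELECT version()").show(truncate = false)
// might print something like "3.0.0 r<build-hash>", depending on the build
{code}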
[jira] [Updated] (SPARK-29552) Fix the flaky test failed in AdaptiveQueryExecSuite # multiple joins
[ https://issues.apache.org/jira/browse/SPARK-29552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ke Jia updated SPARK-29552: --- Description: AQE optimizes the logical plan once a query stage finishes. So for an inner join, both sides may be small enough to serve as the build side. The planner that converts the logical plan to a physical plan will select BuildRight if the right side finishes first, or BuildLeft if the left side finishes first. In some cases BuildRight or BuildLeft may introduce an additional exchange to the parent node. The revert approach in the OptimizeLocalShuffleReader rule may be too conservative: when an additional exchange is introduced, it reverts all local shuffle readers instead of only the local shuffle readers that introduced the shuffle. Reverting only the shuffle-introducing local shuffle readers may be expensive. The workaround is to apply the OptimizeLocalShuffleReader rule again when creating a new query stage, to further optimize the subtree's shuffle readers into local shuffle readers. (was: AQE will optimize the logical plan once there is query stage finished. So for inner join, when two join side is all small to be the build side. The planner of converting logical plan to physical plan will select the build side as BuildRight if right side finished firstly or BuildLeft if left side finished firstly. In some case, when BuildRight or BuildLeft may introduce additioanl exchange to the parent node. The revert approach in OptimizeLocalShuffleReader rule may be too conservative, which revert all the local shuffle reader when introduce additional exchange not revert the local shuffle reader introduced shuffle. It may be expense to only revert the local shuffle reader introduced shuffle. The workaround is to apply the OptimizeLocalShuffleReader rule again when creating new query stage to further optimize the sub tree shuffle reader to local shuffle reader.) > Fix the flaky test failed in AdaptiveQueryExecSuite # multiple joins > > > Key: SPARK-29552 > URL: https://issues.apache.org/jira/browse/SPARK-29552 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Ke Jia >Priority: Major > > AQE optimizes the logical plan once a query stage finishes. So for an inner join, > both sides may be small enough to serve as the build side. The planner that > converts the logical plan to a physical plan will select BuildRight if the right > side finishes first, or BuildLeft if the left side finishes first. In some cases > BuildRight or BuildLeft may introduce an additional exchange to the parent node. > The revert approach in the OptimizeLocalShuffleReader rule may be too > conservative: when an additional exchange is introduced, it reverts all local > shuffle readers instead of only the local shuffle readers that introduced the > shuffle. Reverting only the shuffle-introducing local shuffle readers may be > expensive. The workaround is to apply the OptimizeLocalShuffleReader rule again > when creating a new query stage, to further optimize the subtree's shuffle > readers into local shuffle readers. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29553) Using native BLAS to improve ML/MLlib performance
WuZeyi created SPARK-29553: -- Summary: Using native BLAS to improve ML/MLlib performance Key: SPARK-29553 URL: https://issues.apache.org/jira/browse/SPARK-29553 Project: Spark Issue Type: Improvement Components: ML, MLlib Affects Versions: 2.4.4, 2.3.0 Reporter: WuZeyi

I use native BLAS to improve ML/MLlib performance on YARN. The file spark-env.sh, which was modified by [SPARK-21305], says that I should set OPENBLAS_NUM_THREADS=1 to disable multi-threading of OpenBLAS, but it does not take effect. I modified spark.conf to set OPENBLAS_NUM_THREADS=1, and performance improved. I think MKL_NUM_THREADS behaves the same way.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
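For reference, one way to set such environment variables for executors is SparkConf's executor-environment support; a minimal sketch, assuming the OPENBLAS_NUM_THREADS/MKL_NUM_THREADS variables are honored by the BLAS library actually loaded:

{code:scala}
import org.apache.spark.SparkConf

// Sketch: pin OpenBLAS and MKL to one thread per executor process via
// executor environment variables (the spark.executorEnv.* properties).
val conf = new SparkConf()
  .setExecutorEnv("OPENBLAS_NUM_THREADS", "1")
  .setExecutorEnv("MKL_NUM_THREADS", "1")
// Equivalent spark-defaults.conf entries:
//   spark.executorEnv.OPENBLAS_NUM_THREADS  1
//   spark.executorEnv.MKL_NUM_THREADS       1
{code}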
[jira] [Created] (SPARK-29552) Fix the flaky test failure in AdaptiveQueryExecSuite # multiple joins
Ke Jia created SPARK-29552: -- Summary: Fix the flaky test failure in AdaptiveQueryExecSuite # multiple joins Key: SPARK-29552 URL: https://issues.apache.org/jira/browse/SPARK-29552 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Ke Jia

AQE optimizes the logical plan each time a query stage finishes. For an inner join where both sides are small enough to serve as the build side, the planner that converts the logical plan into a physical plan selects BuildRight if the right side finishes first, or BuildLeft if the left side finishes first. In some cases the chosen build side introduces an additional exchange at the parent node. The revert approach in the OptimizeLocalShuffleReader rule may be too conservative: when an additional exchange is introduced, it reverts all local shuffle readers instead of only the local shuffle readers that introduced the shuffle. Reverting only those readers may be expensive. The workaround is to apply the OptimizeLocalShuffleReader rule again when creating a new query stage, to further optimize the sub-tree's shuffle readers into local shuffle readers.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
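A rough reproduction sketch of the scenario the flaky test exercises (not the actual AdaptiveQueryExecSuite code; only spark.sql.adaptive.enabled is a confirmed config name here):

{code:scala}
// Enable AQE and run multiple joins over small relations, so that either
// join side may finish first and qualify as the build side.
spark.conf.set("spark.sql.adaptive.enabled", "true")
val t1 = spark.range(0, 10).toDF("k")
val t2 = spark.range(0, 10).toDF("k")
val t3 = spark.range(0, 10).toDF("k")
t1.join(t2, "k").join(t3, "k").collect()
{code}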
[jira] [Assigned] (SPARK-21492) Memory leak in SortMergeJoin
[ https://issues.apache.org/jira/browse/SPARK-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-21492: --- Assignee: Yuanjian Li

> Memory leak in SortMergeJoin
>
> Key: SPARK-21492
> URL: https://issues.apache.org/jira/browse/SPARK-21492
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.0, 2.3.0, 2.3.1, 3.0.0
> Reporter: Zhan Zhang
> Assignee: Yuanjian Li
> Priority: Major
> Fix For: 3.0.0
>
> In SortMergeJoin, if the iterator is not exhausted, there will be a memory
> leak caused by the Sort. The memory is not released until the task ends, and
> cannot be used by other operators, causing performance degradation or OOM.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-21492) Memory leak in SortMergeJoin
[ https://issues.apache.org/jira/browse/SPARK-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-21492. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26164 [https://github.com/apache/spark/pull/26164]

> Memory leak in SortMergeJoin
>
> Key: SPARK-21492
> URL: https://issues.apache.org/jira/browse/SPARK-21492
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.0, 2.3.0, 2.3.1, 3.0.0
> Reporter: Zhan Zhang
> Priority: Major
> Fix For: 3.0.0
>
> In SortMergeJoin, if the iterator is not exhausted, there will be a memory
> leak caused by the Sort. The memory is not released until the task ends, and
> cannot be used by other operators, causing performance degradation or OOM.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29550) Enhance locking in session catalog
[ https://issues.apache.org/jira/browse/SPARK-29550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikita Gorbachevski updated SPARK-29550: Component/s: (was: Spark Core)

> Enhance locking in session catalog
> --
>
> Key: SPARK-29550
> URL: https://issues.apache.org/jira/browse/SPARK-29550
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.4
> Reporter: Nikita Gorbachevski
> Priority: Minor
>
> In my streaming application ``spark.streaming.concurrentJobs`` is set to 50,
> which is used as the size of the underlying thread pool. I create/alter
> tables/views automatically at runtime; in order to do that I invoke
> ``create ... if not exists`` operations on the driver on each batch
> invocation. At some point I noticed that most of the batch time was spent on
> the driver rather than on the executors. A thread dump showed that most of
> the threads were blocked on SessionCatalog waiting for a lock.
> The existing implementation of SessionCatalog uses a single lock which almost
> all methods use to guard the ``currentDb`` and ``tempViews`` variables.
> I propose to enhance the locking behaviour of SessionCatalog by:
> # Employing a ReadWriteLock, which allows read operations to execute concurrently.
> # Replacing synchronized with the corresponding read or write lock.
> It may also be possible to go further and stripe the locks for ``currentDb``
> and ``tempViews``, but I'm not sure whether that is feasible from the
> implementation point of view; hopefully someone can help with this.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29551) There is a bug about fetch failures when an executor is lost
weixiuli created SPARK-29551: Summary: There is a bug about fetch failures when an executor is lost Key: SPARK-29551 URL: https://issues.apache.org/jira/browse/SPARK-29551 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.4.3 Reporter: weixiuli Fix For: 2.4.3

There is a regression when an executor is lost and a fetch failure follows. We can add a unit test in 'DAGSchedulerSuite.scala' to catch the problem:

{code}
test("All shuffle files on the slave should be cleaned up when slave lost test") {
  // reset the test context with the right shuffle service config
  afterEach()
  val conf = new SparkConf()
  conf.set(config.SHUFFLE_SERVICE_ENABLED.key, "true")
  conf.set("spark.files.fetchFailure.unRegisterOutputOnHost", "true")
  init(conf)
  runEvent(ExecutorAdded("exec-hostA1", "hostA"))
  runEvent(ExecutorAdded("exec-hostA2", "hostA"))
  runEvent(ExecutorAdded("exec-hostB", "hostB"))
  val firstRDD = new MyRDD(sc, 3, Nil)
  val firstShuffleDep = new ShuffleDependency(firstRDD, new HashPartitioner(3))
  val firstShuffleId = firstShuffleDep.shuffleId
  val shuffleMapRdd = new MyRDD(sc, 3, List(firstShuffleDep))
  val shuffleDep = new ShuffleDependency(shuffleMapRdd, new HashPartitioner(3))
  val secondShuffleId = shuffleDep.shuffleId
  val reduceRdd = new MyRDD(sc, 1, List(shuffleDep))
  submit(reduceRdd, Array(0))
  // map stage1 completes successfully, with one task on each executor
  complete(taskSets(0), Seq(
    (Success, MapStatus(
      BlockManagerId("exec-hostA1", "hostA", 12345), Array.fill[Long](1)(2), mapTaskId = 5)),
    (Success, MapStatus(
      BlockManagerId("exec-hostA2", "hostA", 12345), Array.fill[Long](1)(2), mapTaskId = 6)),
    (Success, makeMapStatus("hostB", 1, mapTaskId = 7))
  ))
  // map stage2 completes successfully, with one task on each executor
  complete(taskSets(1), Seq(
    (Success, MapStatus(
      BlockManagerId("exec-hostA1", "hostA", 12345), Array.fill[Long](1)(2), mapTaskId = 8)),
    (Success, MapStatus(
      BlockManagerId("exec-hostA2", "hostA", 12345), Array.fill[Long](1)(2), mapTaskId = 9)),
    (Success, makeMapStatus("hostB", 1, mapTaskId = 10))
  ))
  // make sure our test setup is correct
  val initialMapStatus1 = mapOutputTracker.shuffleStatuses(firstShuffleId).mapStatuses
  assert(initialMapStatus1.count(_ != null) === 3)
  assert(initialMapStatus1.map { _.location.executorId }.toSet ===
    Set("exec-hostA1", "exec-hostA2", "exec-hostB"))
  assert(initialMapStatus1.map { _.mapId }.toSet === Set(5, 6, 7))
  val initialMapStatus2 = mapOutputTracker.shuffleStatuses(secondShuffleId).mapStatuses
  assert(initialMapStatus2.count(_ != null) === 3)
  assert(initialMapStatus2.map { _.location.executorId }.toSet ===
    Set("exec-hostA1", "exec-hostA2", "exec-hostB"))
  assert(initialMapStatus2.map { _.mapId }.toSet === Set(8, 9, 10))
  // kill exec-hostA2
  runEvent(ExecutorLost("exec-hostA2", ExecutorKilled))
  // reduce stage fails with a fetch failure from map stage from exec-hostA2
  complete(taskSets(2), Seq(
    (FetchFailed(BlockManagerId("exec-hostA2", "hostA", 12345),
      secondShuffleId, 0L, 0, 0, "ignored"), null)
  ))
  // Here is the main assertion -- make sure that we de-register
  // the map outputs for both map stages from both executors on hostA
  val mapStatus1 = mapOutputTracker.shuffleStatuses(firstShuffleId).mapStatuses
  assert(mapStatus1.count(_ != null) === 1)
  assert(mapStatus1(2).location.executorId === "exec-hostB")
  assert(mapStatus1(2).location.host === "hostB")
  val mapStatus2 = mapOutputTracker.shuffleStatuses(secondShuffleId).mapStatuses
  assert(mapStatus2.count(_ != null) === 1)
  assert(mapStatus2(2).location.executorId === "exec-hostB")
  assert(mapStatus2(2).location.host === "hostB")
}
{code}

The error output is:

{code}
3 did not equal 1
ScalaTestFailureLocation: org.apache.spark.scheduler.DAGSchedulerSuite at (DAGSchedulerSuite.scala:609)
Expected :1
Actual :3
org.scalatest.exceptions.TestFailedException: 3 did not equal 1
{code}

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-29516) Test ThriftServerQueryTestSuite asynchronously
[ https://issues.apache.org/jira/browse/SPARK-29516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang reassigned SPARK-29516: --- Assignee: Yuming Wang > Test ThriftServerQueryTestSuite asynchronously > -- > > Key: SPARK-29516 > URL: https://issues.apache.org/jira/browse/SPARK-29516 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > > spark.sql.hive.thriftServer.async=true -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29516) Test ThriftServerQueryTestSuite asynchronously
[ https://issues.apache.org/jira/browse/SPARK-29516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang resolved SPARK-29516. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26172 [https://github.com/apache/spark/pull/26172] > Test ThriftServerQueryTestSuite asynchronously > -- > > Key: SPARK-29516 > URL: https://issues.apache.org/jira/browse/SPARK-29516 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.0.0 > > > spark.sql.hive.thriftServer.async=true -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29550) Enhance locking in session catalog
[ https://issues.apache.org/jira/browse/SPARK-29550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956902#comment-16956902 ] Nikita Gorbachevski commented on SPARK-29550: - Working on this.

> Enhance locking in session catalog
> --
>
> Key: SPARK-29550
> URL: https://issues.apache.org/jira/browse/SPARK-29550
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, SQL
> Affects Versions: 2.4.4
> Reporter: Nikita Gorbachevski
> Priority: Minor
>
> In my streaming application ``spark.streaming.concurrentJobs`` is set to 50,
> which is used as the size of the underlying thread pool. I create/alter
> tables/views automatically at runtime; in order to do that I invoke
> ``create ... if not exists`` operations on the driver on each batch
> invocation. At some point I noticed that most of the batch time was spent on
> the driver rather than on the executors. A thread dump showed that most of
> the threads were blocked on SessionCatalog waiting for a lock.
> The existing implementation of SessionCatalog uses a single lock which almost
> all methods use to guard the ``currentDb`` and ``tempViews`` variables.
> I propose to enhance the locking behaviour of SessionCatalog by:
> # Employing a ReadWriteLock, which allows read operations to execute concurrently.
> # Replacing synchronized with the corresponding read or write lock.
> It may also be possible to go further and stripe the locks for ``currentDb``
> and ``tempViews``, but I'm not sure whether that is feasible from the
> implementation point of view; hopefully someone can help with this.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29550) Enhance locking in session catalog
Nikita Gorbachevski created SPARK-29550: --- Summary: Enhance locking in session catalog Key: SPARK-29550 URL: https://issues.apache.org/jira/browse/SPARK-29550 Project: Spark Issue Type: Bug Components: Spark Core, SQL Affects Versions: 2.4.4 Reporter: Nikita Gorbachevski

In my streaming application ``spark.streaming.concurrentJobs`` is set to 50, which is used as the size of the underlying thread pool. I create/alter tables/views automatically at runtime; in order to do that I invoke ``create ... if not exists`` operations on the driver on each batch invocation. At some point I noticed that most of the batch time was spent on the driver rather than on the executors. A thread dump showed that most of the threads were blocked on SessionCatalog waiting for a lock.

The existing implementation of SessionCatalog uses a single lock which almost all methods use to guard the ``currentDb`` and ``tempViews`` variables. I propose to enhance the locking behaviour of SessionCatalog by:
# Employing a ReadWriteLock, which allows read operations to execute concurrently.
# Replacing synchronized with the corresponding read or write lock.

It may also be possible to go further and stripe the locks for ``currentDb`` and ``tempViews``, but I'm not sure whether that is feasible from the implementation point of view; hopefully someone can help with this.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
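A minimal sketch of the proposal (field and method names are illustrative, not the actual SessionCatalog internals):

{code:scala}
import java.util.concurrent.locks.ReentrantReadWriteLock

// Guard shared catalog state with a ReadWriteLock so concurrent reads no
// longer serialize behind a single monitor; writes still take an
// exclusive lock.
class CatalogState {
  private val lock = new ReentrantReadWriteLock()
  private var currentDb: String = "default"

  def getCurrentDatabase: String = {
    lock.readLock().lock()
    try currentDb finally lock.readLock().unlock()
  }

  def setCurrentDatabase(db: String): Unit = {
    lock.writeLock().lock()
    try currentDb = db finally lock.writeLock().unlock()
  }
}
{code}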
[jira] [Created] (SPARK-29549) Union of DataSourceV2 datasources leads to duplicated results
Miguel Molina created SPARK-29549: - Summary: Union of DataSourceV2 datasources leads to duplicated results Key: SPARK-29549 URL: https://issues.apache.org/jira/browse/SPARK-29549 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.3.4, 2.3.3, 2.3.2, 2.3.1, 2.3.0 Reporter: Miguel Molina

Hello! I've discovered that when two DataSourceV2 data frames of the exact same shape are unioned in a query that contains an aggregation, only the results of the first one are used. The rest are removed by the ReuseExchange rule and reuse the results of the first data frame, leading to N copies of the first data frame's results.

I've put together a repository with an example project where this can be reproduced: [https://github.com/erizocosmico/spark-union-issue]

Basically, doing this:

{code:java}
val products = spark.sql("SELECT name, COUNT(*) as count FROM products GROUP BY name")
val users = spark.sql("SELECT name, COUNT(*) as count FROM users GROUP BY name")

products.union(users)
  .select("name")
  .show(truncate = false, numRows = 50)
{code}

Where products is:

{noformat}
+---------+---+
|name     |id |
+---------+---+
|candy    |1  |
|chocolate|2  |
|milk     |3  |
|cinnamon |4  |
|pizza    |5  |
|pineapple|6  |
+---------+---+
{noformat}

And users is:

{noformat}
+-------+---+
|name   |id |
+-------+---+
|andy   |1  |
|alice  |2  |
|mike   |3  |
|mariah |4  |
|eleanor|5  |
|matthew|6  |
+-------+---+
{noformat}

Results are incorrect:

{noformat}
+---------+
|name     |
+---------+
|candy    |
|pizza    |
|chocolate|
|cinnamon |
|pineapple|
|milk     |
|candy    |
|pizza    |
|chocolate|
|cinnamon |
|pineapple|
|milk     |
+---------+
{noformat}

This is the plan explained:

{noformat}
== Parsed Logical Plan ==
'Project [unresolvedalias('name, None)]
+- AnalysisBarrier
      +- Union
         :- Aggregate [name#0], [name#0, count(1) AS count#8L]
         :  +- SubqueryAlias products
         :     +- DataSourceV2Relation [name#0, id#1], DefaultReader(List([candy,1], [chocolate,2], [milk,3], [cinnamon,4], [pizza,5], [pineapple,6]))
         +- Aggregate [name#4], [name#4, count(1) AS count#12L]
            +- SubqueryAlias users
               +- DataSourceV2Relation [name#4, id#5], DefaultReader(List([andy,1], [alice,2], [mike,3], [mariah,4], [eleanor,5], [matthew,6]))

== Analyzed Logical Plan ==
name: string
Project [name#0]
+- Union
   :- Aggregate [name#0], [name#0, count(1) AS count#8L]
   :  +- SubqueryAlias products
   :     +- DataSourceV2Relation [name#0, id#1], DefaultReader(List([candy,1], [chocolate,2], [milk,3], [cinnamon,4], [pizza,5], [pineapple,6]))
   +- Aggregate [name#4], [name#4, count(1) AS count#12L]
      +- SubqueryAlias users
         +- DataSourceV2Relation [name#4, id#5], DefaultReader(List([andy,1], [alice,2], [mike,3], [mariah,4], [eleanor,5], [matthew,6]))

== Optimized Logical Plan ==
Union
:- Aggregate [name#0], [name#0]
:  +- Project [name#0]
:     +- DataSourceV2Relation [name#0, id#1], DefaultReader(List([candy,1], [chocolate,2], [milk,3], [cinnamon,4], [pizza,5], [pineapple,6]))
+- Aggregate [name#4], [name#4]
   +- Project [name#4]
      +- DataSourceV2Relation [name#4, id#5], DefaultReader(List([andy,1], [alice,2], [mike,3], [mariah,4], [eleanor,5], [matthew,6]))

== Physical Plan ==
Union
:- *(2) HashAggregate(keys=[name#0], functions=[], output=[name#0])
:  +- Exchange hashpartitioning(name#0, 200)
:     +- *(1) HashAggregate(keys=[name#0], functions=[], output=[name#0])
:        +- *(1) Project [name#0]
:           +- *(1) DataSourceV2Scan [name#0, id#1], DefaultReader(List([candy,1], [chocolate,2], [milk,3], [cinnamon,4], [pizza,5], [pineapple,6]))
+- *(4) HashAggregate(keys=[name#4], functions=[], output=[name#4])
   +- ReusedExchange [name#4], Exchange hashpartitioning(name#0, 200)
{noformat}

In the physical plan, the first exchange is reused, but it shouldn't be, because the two sources are not the same:

{noformat}
== Physical Plan ==
Union
:- *(2) HashAggregate(keys=[name#0], functions=[], output=[name#0])
:  +- Exchange hashpartitioning(name#0, 200)
:     +- *(1) HashAggregate(keys=[name#0], functions=[], output=[name#0])
:        +- *(1) Project [name#0]
:           +- *(1) DataSourceV2Scan [name#0, id#1], DefaultReader(List([candy,1], [chocolate,2], [milk,3], [cinnamon,4], [pizza,5], [pineapple,6]))
+- *(4) HashAggregate(keys=[name#4], functions=[], output=[name#4])
   +- ReusedExchange [name#4], Exchange hashpartitioning(name#0, 200)
{noformat}

This seems to be fixed in 2.4.x, but it affects the 2.3.x versions.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29548) Redirect system print stream to log4j and improve robustness
[ https://issues.apache.org/jira/browse/SPARK-29548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956885#comment-16956885 ] Ching Lin commented on SPARK-29548: --- How about using checkpoint instead of log4j?

> Redirect system print stream to log4j and improve robustness
>
> Key: SPARK-29548
> URL: https://issues.apache.org/jira/browse/SPARK-29548
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.0.0
> Reporter: jiaan.geng
> Priority: Major
>
> In a production environment, user behavior is highly random and uncertain.
> For example, users may use `System.out` or `System.err` to print information.
> But the system print streams can cause trouble, such as a disk file growing
> too large. In my production environment this filled the disk and caused the
> NodeManager to misbehave.
> One option is to forbid the use of `System.out` and `System.err`, but that is
> unfriendly to users. A better option is to redirect the system print streams
> to `Log4j`, so Spark can take advantage of `Log4j`'s log splitting.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-29232) RandomForestRegressionModel does not update the parameter maps of the DecisionTreeRegressionModels underneath
[ https://issues.apache.org/jira/browse/SPARK-29232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-29232: Assignee: Huaxin Gao

> RandomForestRegressionModel does not update the parameter maps of the DecisionTreeRegressionModels underneath
> -
>
> Key: SPARK-29232
> URL: https://issues.apache.org/jira/browse/SPARK-29232
> Project: Spark
> Issue Type: Bug
> Components: ML
> Affects Versions: 2.4.0
> Reporter: Jiaqi Guo
> Assignee: Huaxin Gao
> Priority: Major
>
> We trained a RandomForestRegressionModel and tried to access its trees. Even
> though each DecisionTreeRegressionModel is correctly built with the proper
> parameters from the random forest, its parameter map is not updated and still
> contains only the default values.
> For example, if a RandomForestRegressor was trained with a maxDepth of 12,
> then extractParamMap on a tree still returns the default values, with
> maxDepth=5, while calling depth on the same DecisionTreeRegressionModel
> returns the correct value of 12.
> This creates issues when we want to access each individual tree and build the
> trees back up for the random forest estimator.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
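A short sketch reproducing the report (the training DataFrame `train`, with "features"/"label" columns, is a placeholder):

{code:scala}
import org.apache.spark.ml.regression.RandomForestRegressor

// Train with a non-default maxDepth, then inspect one of the trees.
val rf = new RandomForestRegressor().setMaxDepth(12)
val model = rf.fit(train)  // `train` is a placeholder dataset
val tree = model.trees.head
println(tree.depth)             // 12, as expected
println(tree.extractParamMap()) // reportedly still shows the default maxDepth=5
{code}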
[jira] [Resolved] (SPARK-29232) RandomForestRegressionModel does not update the parameter maps of the DecisionTreeRegressionModels underneath
[ https://issues.apache.org/jira/browse/SPARK-29232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-29232. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26154 [https://github.com/apache/spark/pull/26154]

> RandomForestRegressionModel does not update the parameter maps of the DecisionTreeRegressionModels underneath
> -
>
> Key: SPARK-29232
> URL: https://issues.apache.org/jira/browse/SPARK-29232
> Project: Spark
> Issue Type: Bug
> Components: ML
> Affects Versions: 2.4.0
> Reporter: Jiaqi Guo
> Assignee: Huaxin Gao
> Priority: Major
> Fix For: 3.0.0
>
> We trained a RandomForestRegressionModel and tried to access its trees. Even
> though each DecisionTreeRegressionModel is correctly built with the proper
> parameters from the random forest, its parameter map is not updated and still
> contains only the default values.
> For example, if a RandomForestRegressor was trained with a maxDepth of 12,
> then extractParamMap on a tree still returns the default values, with
> maxDepth=5, while calling depth on the same DecisionTreeRegressionModel
> returns the correct value of 12.
> This creates issues when we want to access each individual tree and build the
> trees back up for the random forest estimator.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29540) Thrift in some cases can't parse string to date
[ https://issues.apache.org/jira/browse/SPARK-29540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956863#comment-16956863 ] angerszhu commented on SPARK-29540: --- I'll check on this.

> Thrift in some cases can't parse string to date
> ---
>
> Key: SPARK-29540
> URL: https://issues.apache.org/jira/browse/SPARK-29540
> Project: Spark
> Issue Type: Sub-task
> Components: SQL, Tests
> Affects Versions: 3.0.0
> Reporter: Dylan Guedes
> Priority: Major
>
> I'm porting tests from PostgreSQL's window.sql, but anything related to
> casting a string to a datetime seems to fail on Thrift. For instance, the
> following does not work:
> {code:sql}
> CREATE TABLE empsalary (
>   depname string,
>   empno integer,
>   salary int,
>   enroll_date date
> ) USING parquet;
> INSERT INTO empsalary VALUES ('develop', 10, 5200, '2007-08-01');
> {code}

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
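A possible workaround to try (an untested assumption, not a confirmed fix): make the cast explicit rather than relying on implicit string-to-date coercion:

{code:scala}
// Explicit cast / typed literal instead of the bare string '2007-08-01'.
spark.sql("INSERT INTO empsalary VALUES ('develop', 10, 5200, CAST('2007-08-01' AS DATE))")
// or: ... VALUES ('develop', 10, 5200, DATE '2007-08-01')
{code}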
[jira] [Updated] (SPARK-29545) Implement bitwise integer aggregates bit_xor
[ https://issues.apache.org/jira/browse/SPARK-29545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-29545: - Description: As we support bit_and, bit_or now, we'd better support the related aggregate function bit_xor ahead of PostgreSQL, because many other popular databases support it. [http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.help.sqlanywhere.12.0.1/dbreference/bit-xor-function.html] [https://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_bit-or] [https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/Aggregate/BIT_XOR.htm?TocPath=SQL%20Reference%20Manual%7CSQL%20Functions%7CAggregate%20Functions%7C_10] (was: As we support bit_and, bot_or now, we'd better support the related aggregate function bit_or ahead of postgreSQL, because many other popular databases support it. [http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.help.sqlanywhere.12.0.1/dbreference/bit-xor-function.html] [https://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_bit-or] [https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/Aggregate/BIT_XOR.htm?TocPath=SQL%20Reference%20Manual%7CSQL%20Functions%7CAggregate%20Functions%7C_10])

> Implement bitwise integer aggregates bit_xor
>
> Key: SPARK-29545
> URL: https://issues.apache.org/jira/browse/SPARK-29545
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Kent Yao
> Assignee: Kent Yao
> Priority: Major
> Fix For: 3.0.0
>
> As we support bit_and, bit_or now, we'd better support the related aggregate
> function bit_xor ahead of PostgreSQL, because many other popular databases
> support it.
>
> [http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.help.sqlanywhere.12.0.1/dbreference/bit-xor-function.html]
> [https://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_bit-or]
> [https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/Aggregate/BIT_XOR.htm?TocPath=SQL%20Reference%20Manual%7CSQL%20Functions%7CAggregate%20Functions%7C_10]

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
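A usage sketch, assuming bit_xor lands with the same shape as the existing bit_and/bit_or aggregates:

{code:scala}
// XOR-aggregate over a one-column inline relation: 1 ^ 3 ^ 5 = 7.
spark.sql("SELECT bit_xor(col) FROM VALUES (1), (3), (5) AS t(col)").show()
{code}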
[jira] [Created] (SPARK-29548) Redirect system print stream to log4j and improve robustness
jiaan.geng created SPARK-29548: -- Summary: Redirect system print stream to log4j and improve robustness Key: SPARK-29548 URL: https://issues.apache.org/jira/browse/SPARK-29548 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.0.0 Reporter: jiaan.geng

In a production environment, user behavior is highly random and uncertain. For example, users may use `System.out` or `System.err` to print information. But the system print streams can cause trouble, such as a disk file growing too large. In my production environment this filled the disk and caused the NodeManager to misbehave.

One option is to forbid the use of `System.out` and `System.err`, but that is unfriendly to users. A better option is to redirect the system print streams to `Log4j`, so Spark can take advantage of `Log4j`'s log splitting.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
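A minimal sketch of the idea (not Spark's actual implementation): buffer bytes until a newline, then forward each completed line to a log4j logger so rolling policies can bound disk usage:

{code:scala}
import java.io.{OutputStream, PrintStream}
import org.apache.log4j.Logger

// Line-buffering OutputStream that forwards completed lines to log4j.
class Log4jOutputStream(logger: Logger) extends OutputStream {
  private val buf = new StringBuilder
  override def write(b: Int): Unit = {
    if (b == '\n') { logger.info(buf.toString); buf.clear() }
    else buf.append(b.toChar)
  }
}

// Redirect the system print streams; the logger names are illustrative.
System.setOut(new PrintStream(new Log4jOutputStream(Logger.getLogger("stdout")), true))
System.setErr(new PrintStream(new Log4jOutputStream(Logger.getLogger("stderr")), true))
{code}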
[jira] [Created] (SPARK-29547) Make `docker-integration-tests` work in JDK11
Dongjoon Hyun created SPARK-29547: - Summary: Make `docker-integration-tests` work in JDK11 Key: SPARK-29547 URL: https://issues.apache.org/jira/browse/SPARK-29547 Project: Spark Issue Type: Sub-task Components: Tests Affects Versions: 3.0.0 Reporter: Dongjoon Hyun

To support JDK11, SPARK-28737 upgraded `Jersey` to 2.29. However, it turns out that `docker-integration-tests` is broken because `com.spotify.docker-client` still depends on jersey-guava. SPARK-29546 recovers the test suite on JDK8 by adding the dependency back. We had better make this test suite work in a JDK11 environment, too.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org