[GitHub] spark issue #18323: [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH...

2017-06-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18323
  
**[Test build #78245 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78245/testReport)** for PR 18323 at commit [`7407541`](https://github.com/apache/spark/commit/740754135db80ff0ab60952b38defae65c017065).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

2017-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18092
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78246/
Test FAILed.





[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

2017-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18092
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

2017-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18092
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78247/
Test FAILed.





[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

2017-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18092
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18323: [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH...

2017-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18323
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78245/
Test PASSed.





[GitHub] spark issue #18323: [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH...

2017-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18323
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #18140: [SPARK-20917][ML][SparkR] SparkR supports string ...

2017-06-19 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/18140#discussion_r122632556
  
--- Diff: R/pkg/R/mllib_regression.R ---
@@ -182,9 +207,13 @@ setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"),
 #' @seealso \link{spark.glm}
 setMethod("glm", signature(formula = "formula", family = "ANY", data = "SparkDataFrame"),
   function(formula, family = gaussian, data, epsilon = 1e-6, maxit = 25, weightCol = NULL,
-   var.power = 0.0, link.power = 1.0 - var.power) {
+   var.power = 0.0, link.power = 1.0 - var.power,
+   stringIndexerOrderType = c("frequencyDesc", "frequencyAsc",
+  "alphabetDesc", "alphabetAsc")) {
+stringIndexerOrderType <- match.arg(stringIndexerOrderType)
--- End diff --

Maybe we don't need this here, since we are calling spark.glm, which will do the same check.





[GitHub] spark pull request #18140: [SPARK-20917][ML][SparkR] SparkR supports string ...

2017-06-19 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/18140#discussion_r122632393
  
--- Diff: R/pkg/R/mllib_regression.R ---
@@ -145,7 +163,8 @@ setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"),
 jobj <- callJStatic("org.apache.spark.ml.r.GeneralizedLinearRegressionWrapper",
 "fit", formula, data@sdf, tolower(family$family), family$link,
 tol, as.integer(maxIter), weightCol, regParam,
-as.double(var.power), as.double(link.power))
+as.double(var.power), as.double(link.power),
+as.character(stringIndexerOrderType))
--- End diff --

nit: I think we don't need `as.character` now, since `stringIndexerOrderType` comes from `match.arg`?





[GitHub] spark issue #7075: [SPARK-8674] [MLlib] Implementation of a 2 sample Kolmogo...

2017-06-19 Thread josepablocam
Github user josepablocam commented on the issue:

https://github.com/apache/spark/pull/7075
  
I'm not involved in this anymore. If you're interested in making those
fixes, feel free. Thanks.

On Jun 19, 2017 5:44 AM, "Hyukjin Kwon"  wrote:

> @josepablocam, it looks like the conflicts were not resolved cleanly. Would you resolve them?






[GitHub] spark pull request #18140: [SPARK-20917][ML][SparkR] SparkR supports string ...

2017-06-19 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/18140#discussion_r122632631
  
--- Diff: R/pkg/R/mllib_regression.R ---
@@ -443,10 +478,14 @@ setMethod("write.ml", signature(object = "IsotonicRegressionModel", path = "char
 #' }
 #' @note spark.survreg since 2.0.0
 setMethod("spark.survreg", signature(data = "SparkDataFrame", formula = "formula"),
-  function(data, formula, aggregationDepth = 2) {
+  function(data, formula, aggregationDepth = 2,
+   stringIndexerOrderType = c("frequencyDesc", "frequencyAsc",
+  "alphabetDesc", "alphabetAsc")) {
+stringIndexerOrderType <- match.arg(stringIndexerOrderType)
 formula <- paste(deparse(formula), collapse = "")
 jobj <- callJStatic("org.apache.spark.ml.r.AFTSurvivalRegressionWrapper",
-"fit", formula, data@sdf, as.integer(aggregationDepth))
+"fit", formula, data@sdf, as.integer(aggregationDepth),
+as.character(stringIndexerOrderType))
--- End diff --

ditto





[GitHub] spark pull request #18140: [SPARK-20917][ML][SparkR] SparkR supports string ...

2017-06-19 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/18140#discussion_r122632729
  
--- Diff: R/pkg/tests/fulltests/test_mllib_regression.R ---
@@ -367,6 +367,51 @@ test_that("glm save/load", {
   unlink(modelPath)
 })
 
+test_that("spark.glm and glm with string encoding", {
+  skip_on_cran()
--- End diff --

sorry, no longer needed





[GitHub] spark pull request #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus...

2017-06-19 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/18343#discussion_r122633016
  
--- Diff: core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala ---
@@ -175,6 +175,7 @@ class KryoSerializer(conf: SparkConf)
 kryo.register(None.getClass)
 kryo.register(Nil.getClass)
 kryo.register(Utils.classForName("scala.collection.immutable.$colon$colon"))
+kryo.register(Utils.classForName("scala.collection.immutable.Map$EmptyMap$"))
--- End diff --

Because this test failed: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78224/testReport/org.apache.spark.serializer/KryoSerializerSuite/registration_of_HighlyCompressedMapStatus/
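
For context, the failing suite runs Kryo with registration required, so every class reached during serialization has to be registered up front. A minimal sketch of that setting, assuming the standard SparkConf keys (the suite's exact wiring may differ):

```
import org.apache.spark.SparkConf

// With registrationRequired on, Kryo fails fast on any unregistered class.
// Serializing a HighlyCompressedMapStatus whose hugeBlockSizes is empty reaches
// scala.collection.immutable.Map$EmptyMap$, hence the registration added above.
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrationRequired", "true")
```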





[GitHub] spark issue #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus#writeE...

2017-06-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18343
  
**[Test build #78248 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78248/testReport)** for PR 18343 at commit [`facca95`](https://github.com/apache/spark/commit/facca957f4fa21ff488a20e17af07317ddca474a).





[GitHub] spark issue #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus#writeE...

2017-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18343
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus#writeE...

2017-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18343
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78248/
Test FAILed.





[GitHub] spark issue #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus#writeE...

2017-06-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18343
  
**[Test build #78248 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78248/testReport)** for PR 18343 at commit [`facca95`](https://github.com/apache/spark/commit/facca957f4fa21ff488a20e17af07317ddca474a).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus...

2017-06-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18343#discussion_r122636117
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
@@ -141,8 +143,8 @@ private[spark] class HighlyCompressedMapStatus private (
 private[this] var numNonEmptyBlocks: Int,
 private[this] var emptyBlocks: RoaringBitmap,
 private[this] var avgSize: Long,
-@transient private var hugeBlockSizes: Map[Int, Byte])
-  extends MapStatus with Externalizable {
+@transient private[this] var hugeBlockSizes: Map[Int, Byte])
+  extends MapStatus with Externalizable with KryoSerializable {
--- End diff --

I think the previous version already worked... When we have `writeExternal` and `readExternal`, `@transient` doesn't matter for the Java serializer, so removing `@transient` to make it work with Kryo is a valid fix.
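
To make the interaction concrete, here is a minimal, self-contained sketch of the failure mode with a hypothetical `Status` class (not Spark's `HighlyCompressedMapStatus`): Kryo's default `FieldSerializer` skips `@transient` fields, while the Java serializer goes through `writeExternal`/`readExternal` and writes the field explicitly.

```
import com.esotericsoftware.kryo.Kryo
import com.esotericsoftware.kryo.io.{Input, Output}

// Hypothetical stand-in with a single @transient field.
class Status(@transient var sizes: Map[Int, Byte]) {
  def this() = this(null) // zero-arg constructor so Kryo can instantiate it
}

object KryoTransientDemo {
  def main(args: Array[String]): Unit = {
    val kryo = new Kryo()
    kryo.setRegistrationRequired(false) // keep the demo short; Spark registers classes
    val out = new Output(4096)
    kryo.writeObject(out, new Status(Map(1 -> 2.toByte)))
    val back = kryo.readObject(new Input(out.toBytes), classOf[Status])
    println(back.sizes) // null: FieldSerializer skipped the @transient field
  }
}
```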





[GitHub] spark pull request #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus...

2017-06-19 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/18343#discussion_r122637320
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
@@ -141,8 +143,8 @@ private[spark] class HighlyCompressedMapStatus private (
 private[this] var numNonEmptyBlocks: Int,
 private[this] var emptyBlocks: RoaringBitmap,
 private[this] var avgSize: Long,
-@transient private var hugeBlockSizes: Map[Int, Byte])
-  extends MapStatus with Externalizable {
+@transient private[this] var hugeBlockSizes: Map[Int, Byte])
+  extends MapStatus with Externalizable with KryoSerializable {
--- End diff --

OK, I have manually tested both `remove @transient` and `extends KryoSerializable`; both worked fine. I'm writing a unit test now.





[GitHub] spark pull request #18322: [SPARK-21115][Core]If the cores left is less than...

2017-06-19 Thread eatoncys
Github user eatoncys commented on a diff in the pull request:

https://github.com/apache/spark/pull/18322#discussion_r122638913
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -278,6 +278,14 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
 if (pyFiles != null && !isPython) {
   SparkSubmit.printErrorAndExit("--py-files given but primary resource is not a Python script")
 }
+if (totalExecutorCores != null && executorCores != null) {
--- End diff --

OK, I have moved it to SparkConf. Would you like to review it again? Thanks.
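
A hedged sketch of the shape such a check could take inside `SparkConf.validateSettings` (hypothetical condition and message; the PR has the real one):

```
// Reject a spark.cores.max smaller than spark.executor.cores up front,
// since the Master could never allocate a single executor in that case.
private def validateCoreSettings(): Unit = {
  for (total <- getOption("spark.cores.max").map(_.toInt);
       perExec <- getOption("spark.executor.cores").map(_.toInt)
       if total < perExec) {
    throw new IllegalArgumentException(
      s"spark.cores.max ($total) must not be less than spark.executor.cores ($perExec)")
  }
}
```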





[GitHub] spark issue #18322: [SPARK-21115][Core]If the cores left is less than the co...

2017-06-19 Thread eatoncys
Github user eatoncys commented on the issue:

https://github.com/apache/spark/pull/18322
  
@jerryshao, I have added a unit test in MasterSuite. Would you like to review it again? Thanks.





[GitHub] spark pull request #18347: [SPARK-20599][SS] ConsoleSink should work with (b...

2017-06-19 Thread lubozhan
Github user lubozhan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18347#discussion_r122639236
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -465,6 +465,8 @@ case class DataSource(
 providingClass.newInstance() match {
   case dataSource: CreatableRelationProvider =>
 SaveIntoDataSourceCommand(data, dataSource, caseInsensitiveOptions, mode)
+  case dataSource: ConsoleSinkProvider =>
+data.show(data.count().toInt, false)
--- End diff --

Sorry for the late reply.
Yes, it is right to use an underscore since `dataSource` is not used.
Since there is no need to create a new ConsoleSink and no access to its private variables, I will use `caseInsensitiveOptions` instead to extract `numRows` and `truncate`.
Thanks for your comment.
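
A sketch of what the revised branch could look like, assuming the `numRows` and `truncate` option names that ConsoleSink already understands (hypothetical helper, not the exact patch):

```
import org.apache.spark.sql.DataFrame

// Pull the display options out of the source options map, falling back to
// show()'s usual defaults of 20 rows with truncation.
def showForConsoleSink(data: DataFrame, options: Map[String, String]): Unit = {
  val numRows  = options.get("numRows").map(_.toInt).getOrElse(20)
  val truncate = options.get("truncate").map(_.toBoolean).getOrElse(true)
  data.show(numRows, truncate)
}
```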





[GitHub] spark pull request #18320: [SPARK-21093][R] Terminate R's worker processes i...

2017-06-19 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18320#discussion_r122639738
  
--- Diff: R/pkg/inst/worker/daemon.R ---
@@ -31,7 +31,15 @@ inputCon <- socketConnection(
 port = port, open = "rb", blocking = TRUE, timeout = connectionTimeout)
 
 while (TRUE) {
-  ready <- socketSelect(list(inputCon))
+  ready <- socketSelect(list(inputCon), timeout = 1)
+
+  # Terminate R workers in the parent process.
+  finishedChildren <- parallel:::selectChildren()
--- End diff --

Definitely, for the comments.

Maybe I missed your point. Children will only return their PID on exit, and `parallel:::selectChildren()` will only return the PIDs of children that finished their work and called `parallel:::mcexit(0L)` (a related test was done in https://github.com/apache/spark/pull/18320#discussion_r122605437), to my knowledge. So, even if connecting to the JVM is delayed in `worker.R` and `RRunner.scala`, it won't matter.





[GitHub] spark issue #18340: [SPARK-21132] [SQL] DISTINCT modifier of function argume...

2017-06-19 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18340
  
LGTM, merging to master/2.2





[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

2017-06-19 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18092
  
retest this please





[GitHub] spark pull request #18300: [SPARK-21043][SQL] Add unionByName in Dataset

2017-06-19 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18300#discussion_r122640567
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1764,6 +1765,58 @@ class Dataset[T] private[sql](
   }
 
   /**
+   * Returns a new Dataset containing union of rows in this Dataset and another Dataset.
+   *
+   * This is different from both `UNION ALL` and `UNION DISTINCT` in SQL. To do a SQL-style set
+   * union (that does deduplication of elements), use this function followed by a [[distinct]].
+   *
+   * The difference between this function and [[union]] is that this function
+   * resolves columns by name (not by position):
+   *
+   * {{{
+   *   val df1 = Seq((1, 2, 3)).toDF("col0", "col1", "col2")
+   *   val df2 = Seq((4, 5, 6)).toDF("col1", "col2", "col0")
+   *   df1.unionByName(df2).show
+   *
+   *   // output:
+   *   // +----+----+----+
+   *   // |col0|col1|col2|
+   *   // +----+----+----+
+   *   // |   1|   2|   3|
+   *   // |   6|   4|   5|
+   *   // +----+----+----+
+   * }}}
+   *
+   * @group typedrel
+   * @since 2.3.0
+   */
+  def unionByName(other: Dataset[T]): Dataset[T] = withSetOperator {
+// Creates a `Union` node and resolves it first to reorder output attributes in `other` by name
+val unionPlan = sparkSession.sessionState.executePlan(Union(logicalPlan, other.logicalPlan))

Is this always resolvable? If the columns don't have the same data types, 
the `Union` may not be resolved.
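
A runnable sketch of the kind of case being asked about, assuming a local SparkSession (`unionByName` is the API under review, so plain `union` is used to show the analysis failure):

```
import org.apache.spark.sql.SparkSession

object UnionTypeDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("demo").getOrCreate()
    import spark.implicits._
    val df1 = Seq((1, Seq(1, 2))).toDF("a", "b") // b: array<int>
    val df2 = Seq((2, "x")).toDF("a", "b")       // b: string
    // TypeCoercion cannot reconcile array<int> with string, so the Union node
    // never resolves and analysis throws an AnalysisException:
    // df1.union(df2)
    spark.stop()
  }
}
```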





[GitHub] spark pull request #18340: [SPARK-21132] [SQL] DISTINCT modifier of function...

2017-06-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18340





[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

2017-06-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18092
  
**[Test build #78249 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78249/testReport)** for PR 18092 at commit [`d31d8da`](https://github.com/apache/spark/commit/d31d8da7952e1db527fa892087b2feb85799cae4).





[GitHub] spark pull request #18300: [SPARK-21043][SQL] Add unionByName in Dataset

2017-06-19 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/18300#discussion_r122645208
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1764,6 +1765,58 @@ class Dataset[T] private[sql](
   }
 
   /**
+   * Returns a new Dataset containing union of rows in this Dataset and another Dataset.
+   *
+   * This is different from both `UNION ALL` and `UNION DISTINCT` in SQL. To do a SQL-style set
+   * union (that does deduplication of elements), use this function followed by a [[distinct]].
+   *
+   * The difference between this function and [[union]] is that this function
+   * resolves columns by name (not by position):
+   *
+   * {{{
+   *   val df1 = Seq((1, 2, 3)).toDF("col0", "col1", "col2")
+   *   val df2 = Seq((4, 5, 6)).toDF("col1", "col2", "col0")
+   *   df1.unionByName(df2).show
+   *
+   *   // output:
+   *   // +----+----+----+
+   *   // |col0|col1|col2|
+   *   // +----+----+----+
+   *   // |   1|   2|   3|
+   *   // |   6|   4|   5|
+   *   // +----+----+----+
+   * }}}
+   *
+   * @group typedrel
+   * @since 2.3.0
+   */
+  def unionByName(other: Dataset[T]): Dataset[T] = withSetOperator {
+// Creates a `Union` node and resolves it first to reorder output attributes in `other` by name
+val unionPlan = sparkSession.sessionState.executePlan(Union(logicalPlan, other.logicalPlan))
--- End diff --

In that case, I think we couldn't pass `unionPlan.assertAnalyzed()`, could we?
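
For reference, the eager resolution step being discussed looks roughly like this inside Dataset code (sketch; `logicalPlan` and `sparkSession` are the Dataset's own members):

```
// Resolve the combined plan eagerly; assertAnalyzed() rethrows the
// AnalysisException if, for example, the column types cannot be reconciled.
val qe = sparkSession.sessionState.executePlan(Union(logicalPlan, other.logicalPlan))
qe.assertAnalyzed()
```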





[GitHub] spark pull request #18300: [SPARK-21043][SQL] Add unionByName in Dataset

2017-06-19 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18300#discussion_r122645584
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1764,6 +1765,58 @@ class Dataset[T] private[sql](
   }
 
   /**
+   * Returns a new Dataset containing union of rows in this Dataset and another Dataset.
+   *
+   * This is different from both `UNION ALL` and `UNION DISTINCT` in SQL. To do a SQL-style set
+   * union (that does deduplication of elements), use this function followed by a [[distinct]].
+   *
+   * The difference between this function and [[union]] is that this function
+   * resolves columns by name (not by position):
+   *
+   * {{{
+   *   val df1 = Seq((1, 2, 3)).toDF("col0", "col1", "col2")
+   *   val df2 = Seq((4, 5, 6)).toDF("col1", "col2", "col0")
+   *   df1.unionByName(df2).show
+   *
+   *   // output:
+   *   // +----+----+----+
+   *   // |col0|col1|col2|
+   *   // +----+----+----+
+   *   // |   1|   2|   3|
+   *   // |   6|   4|   5|
+   *   // +----+----+----+
+   * }}}
+   *
+   * @group typedrel
+   * @since 2.3.0
+   */
+  def unionByName(other: Dataset[T]): Dataset[T] = withSetOperator {
+// Creates a `Union` node and resolves it first to reorder output attributes in `other` by name
+val unionPlan = sparkSession.sessionState.executePlan(Union(logicalPlan, other.logicalPlan))
--- End diff --

Yeah, so we don't plan to support it?





[GitHub] spark pull request #18300: [SPARK-21043][SQL] Add unionByName in Dataset

2017-06-19 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/18300#discussion_r122647177
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1764,6 +1765,58 @@ class Dataset[T] private[sql](
   }
 
   /**
+   * Returns a new Dataset containing union of rows in this Dataset and another Dataset.
+   *
+   * This is different from both `UNION ALL` and `UNION DISTINCT` in SQL. To do a SQL-style set
+   * union (that does deduplication of elements), use this function followed by a [[distinct]].
+   *
+   * The difference between this function and [[union]] is that this function
+   * resolves columns by name (not by position):
+   *
+   * {{{
+   *   val df1 = Seq((1, 2, 3)).toDF("col0", "col1", "col2")
+   *   val df2 = Seq((4, 5, 6)).toDF("col1", "col2", "col0")
+   *   df1.unionByName(df2).show
+   *
+   *   // output:
+   *   // +----+----+----+
+   *   // |col0|col1|col2|
+   *   // +----+----+----+
+   *   // |   1|   2|   3|
+   *   // |   6|   4|   5|
+   *   // +----+----+----+
+   * }}}
+   *
+   * @group typedrel
+   * @since 2.3.0
+   */
+  def unionByName(other: Dataset[T]): Dataset[T] = withSetOperator {
+// Creates a `Union` node and resolves it first to reorder output attributes in `other` by name
+val unionPlan = sparkSession.sessionState.executePlan(Union(logicalPlan, other.logicalPlan))
--- End diff --

I think (as you already know) `TypeCoercion` in the `Analyzer` resolves compatible types for that case, like: https://github.com/apache/spark/pull/18300/files#diff-5d2ebf4e9ca5a990136b276859769289R122. Did you have other cases in mind?





[GitHub] spark pull request #18309: [SPARK-21079] [SQL] Calculate total size of a par...

2017-06-19 Thread mbasmanova
Github user mbasmanova commented on a diff in the pull request:

https://github.com/apache/spark/pull/18309#discussion_r122649017
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala ---
@@ -126,6 +127,27 @@ private[sql] trait SQLTestUtils
   }
 
   /**
+   * Creates the requested number of temporary path (without creating the actual file/directory),
--- End diff --

I wanted to create a multi-path version of the existing withTempPath function. That function returns a *valid*, but non-existent, path. The comments and the path.delete() part of the implementation came from there.

/**
 * Generates a temporary path without creating the actual file/directory, then passes it to `f`.
 * If a file/directory is created there by `f`, it will be deleted after `f` returns.
 *
 * @todo Probably this method should be moved to a more general place
 */
protected def withTempPath(f: File => Unit): Unit = {
  val path = Utils.createTempDir()
  path.delete()
  try f(path) finally Utils.deleteRecursively(path)
}
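
Based on that description, a multi-path version could look like the following sketch (hypothetical; the PR contains the actual `withTempPaths`):

```
/**
 * Generates `numPaths` temporary paths without creating the actual files/directories,
 * then passes them to `f`; anything `f` creates there is deleted afterwards.
 */
protected def withTempPaths(numPaths: Int)(f: List[File] => Unit): Unit = {
  val paths = List.fill(numPaths) {
    val path = Utils.createTempDir()
    path.delete() // a valid but non-existent path, matching withTempPath above
    path
  }
  try f(paths) finally paths.foreach(Utils.deleteRecursively)
}
```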





[GitHub] spark issue #18290: [SPARK-20989][Core] Fail to start multiple workers on on...

2017-06-19 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/18290
  
AFAIK there is no plan to support multiple external shuffle services on one host, and we typically start only one worker on each host in standalone mode.





[GitHub] spark pull request #18300: [SPARK-21043][SQL] Add unionByName in Dataset

2017-06-19 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18300#discussion_r122649899
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1764,6 +1765,58 @@ class Dataset[T] private[sql](
   }
 
   /**
+   * Returns a new Dataset containing union of rows in this Dataset and another Dataset.
+   *
+   * This is different from both `UNION ALL` and `UNION DISTINCT` in SQL. To do a SQL-style set
+   * union (that does deduplication of elements), use this function followed by a [[distinct]].
+   *
+   * The difference between this function and [[union]] is that this function
+   * resolves columns by name (not by position):
+   *
+   * {{{
+   *   val df1 = Seq((1, 2, 3)).toDF("col0", "col1", "col2")
+   *   val df2 = Seq((4, 5, 6)).toDF("col1", "col2", "col0")
+   *   df1.unionByName(df2).show
+   *
+   *   // output:
+   *   // +----+----+----+
+   *   // |col0|col1|col2|
+   *   // +----+----+----+
+   *   // |   1|   2|   3|
+   *   // |   6|   4|   5|
+   *   // +----+----+----+
+   * }}}
+   *
+   * @group typedrel
+   * @since 2.3.0
+   */
+  def unionByName(other: Dataset[T]): Dataset[T] = withSetOperator {
+// Creates a `Union` node and resolves it first to reorder output attributes in `other` by name
+val unionPlan = sparkSession.sessionState.executePlan(Union(logicalPlan, other.logicalPlan))
--- End diff --

hmm, I mean the case looks like:

val df1 = Seq((1, "2", 3.4)).toDF("a", "b", "c")
val df2 = Seq((6.7, 4, "5")).toDF("c", "a", "b")

And the result is `Row(1, "2", 3.4) :: Row(4, "5", 6.7)`.

That's what I guess `unionByName` should do?

Forcibly widening the types looks a bit weird to me, because after the union the schema is different from the original datasets.

Or maybe I missed the purpose of this API?





[GitHub] spark pull request #18309: [SPARK-21079] [SQL] Calculate total size of a par...

2017-06-19 Thread mbasmanova
Github user mbasmanova commented on a diff in the pull request:

https://github.com/apache/spark/pull/18309#discussion_r122651779
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
@@ -128,6 +129,45 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleto
   TableIdentifier("tempTable"), ignoreIfNotExists = true, purge = false)
   }
 
+  test("SPARK-21079 - analyze table with location different than that of individual partitions") {
+def queryTotalSize(tableName: String): BigInt =
+  spark.table(tableName).queryExecution.analyzed.stats(conf).sizeInBytes
+
+val tableName = "analyzeTable_part"
+withTable(tableName) {
+  withTempPaths(4) {
+case tablePath :: partitionPaths =>
+  sql(
+s"""
+   |CREATE TABLE ${tableName} (key STRING, value STRING) PARTITIONED BY (ds STRING)
+   |LOCATION '${tablePath}'
+ """.stripMargin).collect()
+
+  val partitionDates = List("2010-01-01", "2010-01-02", "2010-01-03")
+  partitionDates.zip(partitionPaths).foreach {
+case (ds, path) =>
+  sql(
+s"""
+   |ALTER TABLE ${tableName} ADD PARTITION (ds='${ds}')
+   |LOCATION '${path.toString}'
+""".stripMargin).collect()
+  sql(
+s"""
+   |INSERT INTO TABLE ${tableName} PARTITION (ds='${ds}')
+   |SELECT * FROM src
+""".stripMargin).collect()
+  }
+
+  sql(s"ANALYZE TABLE ${tableName} COMPUTE STATISTICS noscan")
+
+  assert(queryTotalSize(tableName) === BigInt(17436))
--- End diff --

This is how I wrote the test initially. In this case all partitions are located under the same top-level directory, but the table-level location is somewhere else. I modified the test to use different paths for each partition, as well as for the table, to address some of the earlier comments.





[GitHub] spark issue #18309: [SPARK-21079] [SQL] Calculate total size of a partition ...

2017-06-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18309
  
**[Test build #78250 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78250/testReport)** for PR 18309 at commit [`09c4900`](https://github.com/apache/spark/commit/09c4900f2a4502b8e3d577798245b5cc3e29bb49).





[GitHub] spark issue #18309: [SPARK-21079] [SQL] Calculate total size of a partition ...

2017-06-19 Thread mbasmanova
Github user mbasmanova commented on the issue:

https://github.com/apache/spark/pull/18309
  
@gatorsmile, re: additional test, how can I create a table with a mix of 
visible and invisible partitions?





[GitHub] spark pull request #18300: [SPARK-21043][SQL] Add unionByName in Dataset

2017-06-19 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/18300#discussion_r122653082
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1764,6 +1765,58 @@ class Dataset[T] private[sql](
   }
 
   /**
+   * Returns a new Dataset containing union of rows in this Dataset and another Dataset.
+   *
+   * This is different from both `UNION ALL` and `UNION DISTINCT` in SQL. To do a SQL-style set
+   * union (that does deduplication of elements), use this function followed by a [[distinct]].
+   *
+   * The difference between this function and [[union]] is that this function
+   * resolves columns by name (not by position):
+   *
+   * {{{
+   *   val df1 = Seq((1, 2, 3)).toDF("col0", "col1", "col2")
+   *   val df2 = Seq((4, 5, 6)).toDF("col1", "col2", "col0")
+   *   df1.unionByName(df2).show
+   *
+   *   // output:
+   *   // +----+----+----+
+   *   // |col0|col1|col2|
+   *   // +----+----+----+
+   *   // |   1|   2|   3|
+   *   // |   6|   4|   5|
+   *   // +----+----+----+
+   * }}}
+   *
+   * @group typedrel
+   * @since 2.3.0
+   */
+  def unionByName(other: Dataset[T]): Dataset[T] = withSetOperator {
+// Creates a `Union` node and resolves it first to reorder output attributes in `other` by name
+val unionPlan = sparkSession.sessionState.executePlan(Union(logicalPlan, other.logicalPlan))
--- End diff --

Aha, I see. This is a bug, so I'll look into it. Thanks!
The goal here is just to union by name while keeping the `union` semantics.





[GitHub] spark issue #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus#writeE...

2017-06-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18343
  
**[Test build #78251 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78251/testReport)** for PR 18343 at commit [`e045bef`](https://github.com/apache/spark/commit/e045bef720402c5ecc52dd546500dbfd5f431c09).





[GitHub] spark pull request #18351: [SPARK-21135][WEB UI] On history server page,du...

2017-06-19 Thread fjh100456
GitHub user fjh100456 opened a pull request:

https://github.com/apache/spark/pull/18351

[SPARK-21135][WEB UI] On history server page, duration of incomplete applications should be hidden instead of showing up as 0

## What changes were proposed in this pull request?

Hide duration of incomplete applications.

## How was this patch tested?

manual tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/fjh100456/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18351.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18351


commit f91ed07718d4462e487237353bf82c80c5e148f7
Author: fjh100456 
Date:   2017-06-19T08:53:06Z

[SPARK-21135][WEB UI] On history server page, duration of incomplete applications should be hidden instead of showing up as 0

## What changes were proposed in this pull request?

Hide duration of incomplete applications.

## How was this patch tested?

manual tests







[GitHub] spark issue #18351: [SPARK-21135][WEB UI] On history server page,duration ...

2017-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18351
  
Can one of the admins verify this patch?





[GitHub] spark issue #18300: [SPARK-21043][SQL] Add unionByName in Dataset

2017-06-19 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/18300
  
@viirya How about this?
```
scala> val df1 = Seq((1, "2", 3.4)).toDF("a", "b", "c")
scala> val df2 = Seq((1, "3", 6.7)).toDF("a", "b", "c")
scala> df1.union(df2).printSchema
root
 |-- a: integer (nullable = false)
 |-- b: string (nullable = true)
 |-- c: double (nullable = false)

scala> df1.union(df2).show
+---+---+---+
|  a|  b|  c|
+---+---+---+
|  1|  2|3.4|
|  1|  3|6.7|
+---+---+---+

scala> val df1 = Seq((1, "2", 3.4)).toDF("a", "b", "c")
scala> val df2 = Seq((6.7, 4, "5")).toDF("c", "a", "b")
scala> df1.unionByName(df2).printSchema
root
 |-- a: integer (nullable = false)
 |-- b: string (nullable = true)
 |-- c: double (nullable = false)

scala> df1.unionByName(df2).show
+---+---+---+
|  a|  b|  c|
+---+---+---+
|  1|  2|3.4|
|  1|  3|6.7|
+---+---+---+
```





[GitHub] spark issue #15417: [SPARK-17851][SQL][TESTS] Make sure all test sqls in cat...

2017-06-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15417
  
**[Test build #78253 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78253/testReport)** for PR 15417 at commit [`c7a859c`](https://github.com/apache/spark/commit/c7a859c0ba8dbf2404ea1ee7979c4faf09000138).





[GitHub] spark issue #18300: [SPARK-21043][SQL] Add unionByName in Dataset

2017-06-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18300
  
**[Test build #78252 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78252/testReport)** for PR 18300 at commit [`ed26881`](https://github.com/apache/spark/commit/ed26881ed66b9f2d0e9d695a40de0bb8bb72a8c8).





[GitHub] spark issue #18350: [MINOR] Fix some typo of the document

2017-06-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18350
  
**[Test build #3802 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3802/testReport)** for PR 18350 at commit [`e7baf54`](https://github.com/apache/spark/commit/e7baf5489c1472c180f8ec7609ec370b0ed9dabe).





[GitHub] spark issue #18278: Branch 2.2

2017-06-19 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/18278
  
Close this @GaryLeee 





[GitHub] spark issue #18300: [SPARK-21043][SQL] Add unionByName in Dataset

2017-06-19 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/18300
  
Oh, the current one does not work well..., so I need to think about this more.





[GitHub] spark issue #18351: [SPARK-21135][WEB UI] On history server page,duration ...

2017-06-19 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/18351
  
CC @zhuoliu 
Wouldn't there be more places to apply logic like this?





[GitHub] spark issue #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus#writeE...

2017-06-19 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/18343
  
OK, I get it. Hm, I wonder why some classes in the code extend `Externalizable` instead of `Serializable`? I see a comment about controlling serialization, but `Serializable` also lets you do that.

I imagine Kryo understands what to do with `Serializable` but not `Externalizable`, because I see most classes that implement the latter also implement `KryoSerializable`.

That seems like the right fix? Because this seems to just cause default serialization to take over, and that's not desirable, apparently.

If so, then we might have a similar problem with `CompressedMapStatus`, `DirectTaskResult`, `LongHashedRelation`, `StorageLevel`, `UpdateBlockInfo`, and `BlockManagerId`. It may not make a difference there, or I might misunderstand.
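
A minimal sketch of the fix pattern being discussed: implement both interfaces so the class controls its wire format under the Java serializer and under Kryo alike (hypothetical `avgSize` field, not Spark's full `MapStatus`):

```
import java.io.{Externalizable, ObjectInput, ObjectOutput}
import com.esotericsoftware.kryo.{Kryo, KryoSerializable}
import com.esotericsoftware.kryo.io.{Input, Output}

class CompressedStatus(private var avgSize: Long)
  extends Externalizable with KryoSerializable {

  def this() = this(0L) // required by Externalizable

  // Java serialization path
  override def writeExternal(out: ObjectOutput): Unit = out.writeLong(avgSize)
  override def readExternal(in: ObjectInput): Unit = avgSize = in.readLong()

  // Kryo path, mirroring the Externalizable format
  override def write(kryo: Kryo, output: Output): Unit = output.writeLong(avgSize)
  override def read(kryo: Kryo, input: Input): Unit = avgSize = input.readLong()
}
```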


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18092: [SPARK-20640][CORE]Make rpc timeout and retry for...

2017-06-19 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/18092#discussion_r122661511
  
--- Diff: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleClient.java
 ---
@@ -49,6 +49,7 @@
   private final TransportConf conf;
   private final boolean authEnabled;
   private final SecretKeyHolder secretKeyHolder;
+  private final long registrationTimeoutMilli;
--- End diff --

"MS" or "Millis" is more consistent. Milli suggests something different. 
https://en.wikipedia.org/wiki/Milli_Vanilli



[GitHub] spark issue #18344: [MINOR][BUILD][BRANCH-2.2] Fix Java linter errors

2017-06-19 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/18344
  
OK, do we need this PR, or will https://github.com/apache/spark/pull/18345 
contain all of these changes?



[GitHub] spark pull request #18322: [SPARK-21115][Core]If the cores left is less than...

2017-06-19 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/18322#discussion_r122665706
  
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -543,6 +543,30 @@ class SparkConf(loadDefaults: Boolean) extends 
Cloneable with Logging with Seria
   }
 }
 
+if (contains("spark.cores.max")) {
--- End diff --

I think these checks for negative numbers are redundant with arg checking 
for `spark-submit`?
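
For context, the shape of the check being discussed is roughly the
following (a sketch of the proposed validation, not the exact patch):

```scala
// Inside SparkConf's settings validation (sketch):
if (contains("spark.cores.max")) {
  val cores = getInt("spark.cores.max", -1)
  require(cores > 0,
    s"spark.cores.max must be a positive number, but was set to $cores")
}
```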



[GitHub] spark issue #18327: [SPARK-21047] Add test suites for complicated cases in C...

2017-06-19 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/18327
  
Thanks, let me take a look tonight.



[GitHub] spark issue #18341: [MINOR] Bump SparkR and PySpark version to 2.3.0.

2017-06-19 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/18341
  
Merged to master



[GitHub] spark pull request #18288: [SPARK-21066][ML] LibSVM load just one input file

2017-06-19 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/18288#discussion_r122669126
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/source/libsvm/LibSVMRelation.scala ---
@@ -91,12 +91,10 @@ private[libsvm] class LibSVMFileFormat extends 
TextBasedFileFormat with DataSour
 val numFeatures: Int = libSVMOptions.numFeatures.getOrElse {
   // Infers number of features if the user doesn't specify (a valid) 
one.
   val dataFiles = files.filterNot(_.getPath.getName startsWith "_")
-  val path = if (dataFiles.length == 1) {
-dataFiles.head.getPath.toUri.toString
-  } else if (dataFiles.isEmpty) {
+  val path = if (dataFiles.isEmpty) {
 throw new IOException("No input path specified for libsvm data")
   } else {
-throw new IOException("Multiple input paths are not supported for 
libsvm data.")
+dataFiles.map(_.getPath).mkString(",")
--- End diff --

Ping @darionyaphet 
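
For context, a sketch of the use case this diff enables (the paths are made
up for illustration): `numFeatures` can now be inferred across several
input files instead of failing on multiple paths.

```scala
// Load two libsvm part files at once; without an explicit numFeatures
// option, the feature count is inferred over all listed files.
val df = spark.read.format("libsvm")
  .load("hdfs://nn/data/part-00000.libsvm", "hdfs://nn/data/part-00001.libsvm")
```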



[GitHub] spark pull request #18341: [MINOR] Bump SparkR and PySpark version to 2.3.0.

2017-06-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18341



[GitHub] spark issue #17681: [SPARK-20383][SQL] Supporting Create [temporary] Functio...

2017-06-19 Thread ouyangxiaochen
Github user ouyangxiaochen commented on the issue:

https://github.com/apache/spark/pull/17681
  
retest this please. Thanks!



[GitHub] spark issue #18341: [MINOR] Bump SparkR and PySpark version to 2.3.0.

2017-06-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18341
  
Sure, I will do that. Thank you.



[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

2017-06-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18092
  
**[Test build #78249 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78249/testReport)**
 for PR 18092 at commit 
[`d31d8da`](https://github.com/apache/spark/commit/d31d8da7952e1db527fa892087b2feb85799cae4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

2017-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18092
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78249/
Test PASSed.



[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

2017-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18092
  
Merged build finished. Test PASSed.



[GitHub] spark issue #18300: [SPARK-21043][SQL] Add unionByName in Dataset

2017-06-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18300
  
**[Test build #78252 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78252/testReport)**
 for PR 18300 at commit 
[`ed26881`](https://github.com/apache/spark/commit/ed26881ed66b9f2d0e9d695a40de0bb8bb72a8c8).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #18300: [SPARK-21043][SQL] Add unionByName in Dataset

2017-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18300
  
Merged build finished. Test FAILed.



[GitHub] spark issue #18300: [SPARK-21043][SQL] Add unionByName in Dataset

2017-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18300
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78252/
Test FAILed.



[GitHub] spark issue #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus#writeE...

2017-06-19 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18343
  
It seems `Externalizable` is kind of abused in Spark; we should benchmark 
and make sure that this "customized serialization logic" is actually faster 
than the default logic of the Java serializer.

For this patch, the Kryo serializer seems to have the same logic for 
serializing a map: https://github.com/apache/spark/pull/18343#discussion_r122621682 , so 
it's OK to just remove the `@transient`.
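
A rough sketch of the kind of micro-benchmark this suggests (the payload
and timing harness are illustrative only):

```scala
import java.io._

// Round-trip an object through default Java serialization and time it; the
// same harness can wrap an instance with a custom writeExternal to compare.
def roundTrip(obj: AnyRef): AnyRef = {
  val bytes = new ByteArrayOutputStream()
  val out = new ObjectOutputStream(bytes)
  out.writeObject(obj)
  out.close()
  new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray)).readObject()
}

val sample = new java.util.HashMap[Integer, java.lang.Long]()
(0 until 100000).foreach(i => sample.put(i, i.toLong))

val start = System.nanoTime()
roundTrip(sample)
println(s"round trip took ${(System.nanoTime() - start) / 1e6} ms")
```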



[GitHub] spark issue #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus#writeE...

2017-06-19 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/18343
  
I don't quibble with custom serialization logic, but you can do that with 
`Serializable` too, and Kryo has its own marker interface as well. I wonder 
what the purpose of `Externalizable` is, then. Actually, I've always wondered 
this in general about this JDK interface (which extends `Serializable`), so 
maybe I'm missing something.



[GitHub] spark pull request #18322: [SPARK-21115][Core]If the cores left is less than...

2017-06-19 Thread eatoncys
Github user eatoncys commented on a diff in the pull request:

https://github.com/apache/spark/pull/18322#discussion_r122676148
  
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -543,6 +543,30 @@ class SparkConf(loadDefaults: Boolean) extends 
Cloneable with Logging with Seria
   }
 }
 
+if (contains("spark.cores.max")) {
--- End diff --

@srowen Users may set these configurations via a `SparkConf` object 
programmatically. Should I move the checks from spark-submit to here, or 
remove them here directly? Which is better?
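
To make the programmatic path concrete, a small sketch (master URL and
values are illustrative): settings applied this way never pass through
spark-submit's argument checking, which is the case an in-`SparkConf` check
would cover.

```scala
import org.apache.spark.SparkConf

// This conf is built in application code, so spark-submit's arg checks
// never see it; only validation inside SparkConf itself could catch the
// bad value below.
val conf = new SparkConf()
  .setMaster("spark://host:7077")
  .setAppName("example")
  .set("spark.cores.max", "-1")
```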



[GitHub] spark issue #15417: [SPARK-17851][SQL][TESTS] Make sure all test sqls in cat...

2017-06-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15417
  
**[Test build #78253 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78253/testReport)**
 for PR 15417 at commit 
[`c7a859c`](https://github.com/apache/spark/commit/c7a859c0ba8dbf2404ea1ee7979c4faf09000138).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `abstract class SessionCatalogSuite extends AnalysisTest `



[GitHub] spark pull request #17963: [SPARK-20722][CORE] Replay newer event log that h...

2017-06-19 Thread sharkdtu
Github user sharkdtu closed the pull request at:

https://github.com/apache/spark/pull/17963



[GitHub] spark issue #15417: [SPARK-17851][SQL][TESTS] Make sure all test sqls in cat...

2017-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15417
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15417: [SPARK-17851][SQL][TESTS] Make sure all test sqls in cat...

2017-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15417
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78253/
Test FAILed.



[GitHub] spark issue #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus#writeE...

2017-06-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18343
  
**[Test build #78251 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78251/testReport)**
 for PR 18343 at commit 
[`e045bef`](https://github.com/apache/spark/commit/e045bef720402c5ecc52dd546500dbfd5f431c09).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus#writeE...

2017-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18343
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78251/
Test FAILed.



[GitHub] spark issue #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus#writeE...

2017-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18343
  
Merged build finished. Test FAILed.



[GitHub] spark issue #18309: [SPARK-21079] [SQL] Calculate total size of a partition ...

2017-06-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18309
  
**[Test build #78250 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78250/testReport)**
 for PR 18309 at commit 
[`09c4900`](https://github.com/apache/spark/commit/09c4900f2a4502b8e3d577798245b5cc3e29bb49).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #18309: [SPARK-21079] [SQL] Calculate total size of a partition ...

2017-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18309
  
Merged build finished. Test PASSed.



[GitHub] spark issue #18309: [SPARK-21079] [SQL] Calculate total size of a partition ...

2017-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18309
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78250/
Test PASSed.



[GitHub] spark pull request #18352: [SPARK-21138] [YARN] Cannot delete staging dir wh...

2017-06-19 Thread sharkdtu
GitHub user sharkdtu opened a pull request:

https://github.com/apache/spark/pull/18352

[SPARK-21138] [YARN] Cannot delete staging dir when the clusters of 
"spark.yarn.stagingDir" and "spark.hadoop.fs.defaultFS" are different

## What changes were proposed in this pull request?

When I set different clusters for "spark.hadoop.fs.defaultFS" and 
"spark.yarn.stagingDir" as follows:
```
spark.hadoop.fs.defaultFS  hdfs://tl-nn-tdw.tencent-distribute.com:54310
spark.yarn.stagingDir hdfs://ss-teg-2-v2/tmp/spark
```
The staging dir cannot be deleted; the following error is raised:
```
java.lang.IllegalArgumentException: Wrong FS: 
hdfs://ss-teg-2-v2/tmp/spark/.sparkStaging/application_1496819138021_77618, 
expected: hdfs://tl-nn-tdw.tencent-distribute.com:54310
```
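
One plausible direction for a fix (an assumption, not necessarily the exact
patch) is to resolve the `FileSystem` from the staging path itself rather
than from `fs.defaultFS`:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Path.getFileSystem honors the scheme and authority of the path, so the
// delete goes to the cluster that actually holds the staging dir, even
// when it differs from fs.defaultFS.
def cleanupStagingDir(stagingDir: Path, hadoopConf: Configuration): Unit = {
  val fs = stagingDir.getFileSystem(hadoopConf)
  fs.delete(stagingDir, true)
}
```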

## How was this patch tested?

Existing tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sharkdtu/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18352.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18352


commit b74138e31d3317b34ffb9f13cf7fdd7873edc1a6
Author: sharkdtu 
Date:   2017-06-19T11:03:01Z

Cannot delete staging dir when the clusters of spark.yarn.stagingDir and 
spark.hadoop.fs.defaultFS are different





[GitHub] spark issue #18352: [SPARK-21138] [YARN] Cannot delete staging dir when the ...

2017-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18352
  
Can one of the admins verify this patch?



[GitHub] spark issue #18352: [SPARK-21138] [YARN] Cannot delete staging dir when the ...

2017-06-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18352
  
**[Test build #3803 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3803/testReport)**
 for PR 18352 at commit 
[`b74138e`](https://github.com/apache/spark/commit/b74138e31d3317b34ffb9f13cf7fdd7873edc1a6).



[GitHub] spark issue #18351: [SPARK-21135][WEB UI] On history server page,duration ...

2017-06-19 Thread fjh100456
Github user fjh100456 commented on the issue:

https://github.com/apache/spark/pull/18351
  
I have not found a similar problem on other pages. The completed time on 
this page has been hidden, which suggests this case was considered before. 
Have you considered the other question I mentioned in the JIRA, about the 
abnormal abort? @zhuoliu @srowen



[GitHub] spark issue #18352: [SPARK-21138] [YARN] Cannot delete staging dir when the ...

2017-06-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18352
  
**[Test build #3803 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3803/testReport)**
 for PR 18352 at commit 
[`b74138e`](https://github.com/apache/spark/commit/b74138e31d3317b34ffb9f13cf7fdd7873edc1a6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus#writeE...

2017-06-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18343
  
**[Test build #78254 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78254/testReport)**
 for PR 18343 at commit 
[`7a4e6ec`](https://github.com/apache/spark/commit/7a4e6ec52ee8a198542ec2260586602186d67f4c).



[GitHub] spark pull request #18320: [SPARK-21093][R] Terminate R's worker processes i...

2017-06-19 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18320#discussion_r122687051
  
--- Diff: R/pkg/inst/worker/daemon.R ---
@@ -31,7 +31,15 @@ inputCon <- socketConnection(
 port = port, open = "rb", blocking = TRUE, timeout = connectionTimeout)
 
 while (TRUE) {
-  ready <- socketSelect(list(inputCon))
+  ready <- socketSelect(list(inputCon), timeout = 1)
+
+  # Terminate R workers in the parent process.
+  finishedChildren <- parallel:::selectChildren()
--- End diff --

@felixcheung, I tested with the change below:

```diff
 port <- as.integer(Sys.getenv("SPARKR_WORKER_PORT"))
+Sys.sleep(5L)
 inputCon <- socketConnection(
 port = port, blocking = TRUE, open = "rb", timeout = connectionTimeout)
 outputCon <- socketConnection(
```

It looks fine. Does this deal with your concern?



[GitHub] spark issue #18350: [MINOR] Fix some typo of the document

2017-06-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18350
  
**[Test build #3802 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3802/testReport)**
 for PR 18350 at commit 
[`e7baf54`](https://github.com/apache/spark/commit/e7baf5489c1472c180f8ec7609ec370b0ed9dabe).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #18320: [SPARK-21093][R] Terminate R's worker processes in the p...

2017-06-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18320
  
**[Test build #78255 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78255/testReport)**
 for PR 18320 at commit 
[`fa7a226`](https://github.com/apache/spark/commit/fa7a2261e915ae11f3c83139bcf8d9f62a5929c6).



[GitHub] spark pull request #18320: [SPARK-21093][R] Terminate R's worker processes i...

2017-06-19 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18320#discussion_r122695789
  
--- Diff: R/pkg/inst/worker/daemon.R ---
@@ -31,7 +31,30 @@ inputCon <- socketConnection(
 port = port, open = "rb", blocking = TRUE, timeout = connectionTimeout)
 
 while (TRUE) {
-  ready <- socketSelect(list(inputCon))
+  ready <- socketSelect(list(inputCon), timeout = 1)
+
+  # Note that the children should be terminated in the parent. If each 
child terminates
+  # itself, it appears that the resource is not released properly, that 
causes an unexpected
+  # termination of this daemon due to, for example, running out of file 
descriptors
+  # (see SPARK-21093). Therefore, the current implementation tries to 
retrieve children
+  # that are exited (but not terminated) and then sends a kill signal to 
terminate them properly
+  # in the parent.
+  #
+  # There are two paths that it sends a signal to terminate the children 
in the parent.
+  #
+  #   1. Every second if any socket connection is not available.
+  #   2. Right after a socket connection is available.
+  #
+  # In other words, the parent sends the signal to children every second 
or right before
+  # launching other worker children from the following new socket 
connection.
+  #
+  # Only the process IDs of exited children are returned and the 
termination is attempted below.
+  finishedChildren <- parallel:::selectChildren(timeout = 0)
--- End diff --

This is 0 by default, but I added it explicitly to prevent confusion.



[GitHub] spark issue #18320: [SPARK-21093][R] Terminate R's worker processes in the p...

2017-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18320
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78255/
Test PASSed.



[GitHub] spark issue #18320: [SPARK-21093][R] Terminate R's worker processes in the p...

2017-06-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18320
  
**[Test build #78255 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78255/testReport)**
 for PR 18320 at commit 
[`fa7a226`](https://github.com/apache/spark/commit/fa7a2261e915ae11f3c83139bcf8d9f62a5929c6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #18320: [SPARK-21093][R] Terminate R's worker processes in the p...

2017-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18320
  
Merged build finished. Test PASSed.



[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

2017-06-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18092
  
**[Test build #78256 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78256/testReport)**
 for PR 18092 at commit 
[`ca308bc`](https://github.com/apache/spark/commit/ca308bcc12243d1a301199763c58f0b2d801).



[GitHub] spark pull request #18235: [SPARK-21012][Submit] Add glob support for resour...

2017-06-19 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/18235#discussion_r122707282
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -858,19 +844,33 @@ object SparkSubmit extends CommandLineUtils {
 require(path != null, "path cannot be null.")
 val uri = Utils.resolveURI(path)
 uri.getScheme match {
-  case "file" | "local" =>
-path
-
+  case "file" | "local" => path
+  case "https" | "http" | "ftp" => path
--- End diff --

My original thinking was that the `FileSystem` API doesn't support such 
schemes, so I treated them separately without downloading. I will verify it.
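
A quick way to verify that (a throwaway sketch; the result depends on the
Hadoop version, since newer releases register more filesystem
implementations):

```scala
import java.net.URI

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

// FileSystem.get throws an IOException for a scheme with no registered
// implementation, which is what "doesn't support such schemes" would mean.
def hasFileSystemImpl(uri: String): Boolean =
  try {
    FileSystem.get(new URI(uri), new Configuration())
    true
  } catch {
    case _: java.io.IOException => false
  }
```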



[GitHub] spark issue #18159: [SPARK-20703][SQL] Associate metrics with data writes on...

2017-06-19 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18159
  
ping @cloud-fan @gatorsmile for review.



[GitHub] spark issue #18342: [Spark-21123][Docs][Structured Streaming] Options for fi...

2017-06-19 Thread lw-lin
Github user lw-lin commented on the issue:

https://github.com/apache/spark/pull/18342
  
This LGTM; @zsxwing, please also take a look.



[GitHub] spark pull request #18353: Corrected kafka dependencies

2017-06-19 Thread timvw
GitHub user timvw opened a pull request:

https://github.com/apache/spark/pull/18353

Corrected kafka dependencies


## What changes were proposed in this pull request?

Currently, spark-streaming-kafka-0-10 depends on the full Kafka 
distribution, even though it only uses and requires the kafka-clients 
library.

This PR fixes that: the library itself now depends only on kafka-clients, 
while the tests depend on the full Kafka distribution.
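
In sbt terms, the proposed dependency layout would look roughly like this
(a sketch; the actual change is in the Maven POM, and the version value is
illustrative):

```scala
// Compile scope needs only the client library; the broker (full kafka)
// is required just for the embedded-server test utilities.
val kafkaVersion = "0.10.0.1"

libraryDependencies ++= Seq(
  "org.apache.kafka" %  "kafka-clients" % kafkaVersion,
  "org.apache.kafka" %% "kafka"         % kafkaVersion % Test
)
```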

## How was this patch tested?

All existing tests still pass.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/timvw/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18353.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18353


commit 08403b009b9fc1c06ef3d18d3e50ac3f7878672c
Author: Tim Van Wassenhove 
Date:   2017-06-19T14:10:46Z

Moved KafkaTestUtils to the test folder. Corrected dependencies so that the 
streaming library only depends on kafka-clients instead of the full kafka





[GitHub] spark issue #18353: Corrected kafka dependencies

2017-06-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18353
  
Can one of the admins verify this patch?



[GitHub] spark pull request #18353: Corrected kafka dependencies

2017-06-19 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/18353#discussion_r122720352
  
--- Diff: external/kafka-0-10/pom.xml ---
@@ -49,8 +49,36 @@
 
 
   org.apache.kafka
+  kafka-clients
--- End diff --

Hm, ideally we'd unify this with the similar declaration already in the 
Spark SQL Kafka 0.10 POM, but they don't have a parent that is 
Kafka-specific. At least they should be consistent. Why are the excludes 
needed? We also probably want a refactored `kafka.version` property in each 
to avoid repeating the version.

Finally, can you explain the change a bit and file a JIRA? 
http://spark.apache.org/contributing.html



[GitHub] spark pull request #18353: Corrected kafka dependencies

2017-06-19 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/18353#discussion_r122720450
  
--- Diff: 
external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaTestUtils.scala
 ---
@@ -20,25 +20,24 @@ package org.apache.spark.streaming.kafka010
 import java.io.{File, IOException}
 import java.lang.{Integer => JInt}
 import java.net.InetSocketAddress
-import java.util.{Map => JMap, Properties}
 import java.util.concurrent.TimeoutException
-
-import scala.annotation.tailrec
-import scala.collection.JavaConverters._
-import scala.util.control.NonFatal
+import java.util.{Properties, Map => JMap}
--- End diff --

Looks like all of this file should be reverted



[GitHub] spark pull request #18353: Corrected kafka dependencies

2017-06-19 Thread timvw
Github user timvw commented on a diff in the pull request:

https://github.com/apache/spark/pull/18353#discussion_r122722841
  
--- Diff: 
external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaTestUtils.scala
 ---
@@ -20,25 +20,24 @@ package org.apache.spark.streaming.kafka010
 import java.io.{File, IOException}
 import java.lang.{Integer => JInt}
 import java.net.InetSocketAddress
-import java.util.{Map => JMap, Properties}
 import java.util.concurrent.TimeoutException
-
-import scala.annotation.tailrec
-import scala.collection.JavaConverters._
-import scala.util.control.NonFatal
+import java.util.{Properties, Map => JMap}
--- End diff --

I moved the file from src/main to src/test because KafkaTestUtils does need 
the full kafka dependency (but is not used outside the tests).



[GitHub] spark issue #18345: [MINOR][BUILD] Fix Java linter errors

2017-06-19 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/18345
  
Thank you for the review, @srowen.

