[GitHub] spark pull request: [SPARK-3731] [PySpark] fix memory leak in Pyth...

2014-10-06 Thread davies
GitHub user davies opened a pull request:

https://github.com/apache/spark/pull/2668

[SPARK-3731] [PySpark] fix memory leak in PythonRDD

The parent.getOrCompute() of PythonRDD is executed in a separate thread, 
so that thread should release the memory reserved for shuffle and unrolling when it finishes.
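
For illustration only, a minimal sketch of that pattern. The `TaskMemoryReservations` trait, names, and thread setup below are hypothetical, not the actual PythonRDD or Spark memory-manager code; the point is just that memory reserved by the writer thread is released in a `finally` block:

```scala
object WriterThreadSketch {
  // Hypothetical interface; the real Spark memory-manager API is different.
  trait TaskMemoryReservations {
    def releaseAllForCurrentThread(): Unit
  }

  // The computation runs in a separate writer thread, so whatever memory
  // that thread reserved must be released in `finally`, or it leaks when
  // the task ends or fails.
  def runWriter(mem: TaskMemoryReservations)(body: => Unit): Thread = {
    val writer = new Thread("python-writer-sketch") {
      override def run(): Unit = {
        try body finally mem.releaseAllForCurrentThread() // free shuffle + unroll reservations
      }
    }
    writer.start()
    writer
  }
}
```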

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/davies/spark leak

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2668.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2668






---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3731] [PySpark] fix memory leak in Pyth...

2014-10-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2668#issuecomment-57977968
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21319/consoleFull)
 for   PR 2668 at commit 
[`ae98be2`](https://github.com/apache/spark/commit/ae98be240b95aa6f838875c7a112b99bf748acba).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-2805] akka 2.3.4

2014-10-06 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/1685#issuecomment-57979326
  
LGTM. I have tested it locally by running the test suites (only the relevant ones).
@pwendell Can you trigger Jenkins here? It should be okay to merge.





[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics

2014-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2667#issuecomment-57979727
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21318/
Test PASSed.





[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics

2014-10-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2667#issuecomment-57979724
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21318/consoleFull)
 for   PR 2667 at commit 
[`3a5a6ff`](https://github.com/apache/spark/commit/3a5a6ffdb036f8432911184920193a4b8a007084).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class RankingMetrics(predictionAndLabels: RDD[(Array[Double], 
Array[Double])]) `
  * `case class CacheTableCommand(tableName: String, plan: 
Option[LogicalPlan], isLazy: Boolean)`
  * `case class UncacheTableCommand(tableName: String) extends Command`
  * `case class CacheTableCommand(`
  * `case class UncacheTableCommand(tableName: String) extends LeafNode 
with Command `
  * `case class DescribeCommand(child: SparkPlan, output: Seq[Attribute])(`






[GitHub] spark pull request: [SPARK-3731] [PySpark] fix memory leak in Pyth...

2014-10-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2668#issuecomment-57982093
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21319/consoleFull)
 for   PR 2668 at commit 
[`ae98be2`](https://github.com/apache/spark/commit/ae98be240b95aa6f838875c7a112b99bf748acba).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class CacheTableCommand(tableName: String, plan: 
Option[LogicalPlan], isLazy: Boolean)`
  * `case class UncacheTableCommand(tableName: String) extends Command`
  * `case class CacheTableCommand(`
  * `case class UncacheTableCommand(tableName: String) extends LeafNode 
with Command `
  * `case class DescribeCommand(child: SparkPlan, output: Seq[Attribute])(`






[GitHub] spark pull request: [SPARK-3731] [PySpark] fix memory leak in Pyth...

2014-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2668#issuecomment-57982095
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21319/
Test PASSed.





[GitHub] spark pull request: [SPARK-3808] PySpark fails to start in Windows

2014-10-06 Thread tsudukim
GitHub user tsudukim opened a pull request:

https://github.com/apache/spark/pull/2669

[SPARK-3808] PySpark fails to start in Windows

Fixed a syntax error in the *.cmd script.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tsudukim/spark feature/SPARK-3808

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2669.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2669


commit 7f804e6cb7001b1be372940eb186750e4154a83f
Author: Masayoshi TSUZUKI tsudu...@oss.nttdata.co.jp
Date:   2014-10-06T07:40:07Z

[SPARK-3808] PySpark fails to start in Windows

Modified syntax error of *.cmd script.







[GitHub] spark pull request: [SPARK-3808] PySpark fails to start in Windows

2014-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2669#issuecomment-57983457
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [Spark] RDD take() method: overestimate too mu...

2014-10-06 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/2648#issuecomment-57985017
  
Jenkins, test this please.





[GitHub] spark pull request: [Spark] RDD take() method: overestimate too mu...

2014-10-06 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/2648#issuecomment-57985280
  
Changes LGTM.





[GitHub] spark pull request: [Spark] RDD take() method: overestimate too mu...

2014-10-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2648#issuecomment-57985301
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21320/consoleFull)
 for   PR 2648 at commit 
[`a2aa36b`](https://github.com/apache/spark/commit/a2aa36b6838ff71941dab1d4af5c8e5f79fd4b4f).
 * This patch merges cleanly.





[GitHub] spark pull request: [Spark] RDD take() method: overestimate too mu...

2014-10-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2648#issuecomment-57985492
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/269/consoleFull)
 for   PR 2648 at commit 
[`a2aa36b`](https://github.com/apache/spark/commit/a2aa36b6838ff71941dab1d4af5c8e5f79fd4b4f).
 * This patch merges cleanly.





[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics

2014-10-06 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2667#discussion_r18447056
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala ---
@@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.evaluation
+
+
+import org.apache.spark.SparkContext._
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.rdd.RDD
+
+
+/**
+ * ::Experimental::
+ * Evaluator for ranking algorithms.
+ *
+ * @param predictionAndLabels an RDD of (predicted ranking, ground truth 
set) pairs.
--- End diff --

The inputs are really ranks, right? Would this not be more natural as `Int` then?
I might have expected the inputs to be predicted and ground truth scores instead, 
in which case `Double` makes sense. But then the methods would need to convert to rankings.





[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics

2014-10-06 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2667#discussion_r18447076
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala ---
@@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.evaluation
+
+
+import org.apache.spark.SparkContext._
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.rdd.RDD
+
+
+/**
+ * ::Experimental::
+ * Evaluator for ranking algorithms.
+ *
+ * @param predictionAndLabels an RDD of (predicted ranking, ground truth 
set) pairs.
+ */
+@Experimental
+class RankingMetrics(predictionAndLabels: RDD[(Array[Double], 
Array[Double])]) {
+
+  /**
+   * Returns the precsion@k for each query
--- End diff --

Might actually use `@return` here, but there is also no `k` in the code or 
docs. This is the length of (both) arguments?
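
For context, the usual definition the doc string seems to intend (my reading; the patch itself does not state it), where k indexes prefixes of the prediction array:

```latex
\mathrm{precision@}k = \frac{\left|\{\text{top-}k\ \text{predicted items}\} \cap \{\text{relevant items}\}\right|}{k}
```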





[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics

2014-10-06 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2667#discussion_r18447094
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala ---
@@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.evaluation
+
+
+import org.apache.spark.SparkContext._
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.rdd.RDD
+
+
+/**
+ * ::Experimental::
+ * Evaluator for ranking algorithms.
+ *
+ * @param predictionAndLabels an RDD of (predicted ranking, ground truth 
set) pairs.
+ */
+@Experimental
+class RankingMetrics(predictionAndLabels: RDD[(Array[Double], 
Array[Double])]) {
+
+  /**
+   * Returns the precsion@k for each query
+   */
+  lazy val precAtK: RDD[Array[Double]] = predictionAndLabels.map {case (pred, lab) =>
+    val labSet : Set[Double] = lab.toSet
+    val n = pred.length
+    val topkPrec = Array.fill[Double](n)(.0)
+    var (i, cnt) = (0, 0)
--- End diff --

`0.0` instead of `.0`? And I am not sure it is helpful to initialize two or more 
variables on one line using a tuple.





[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics

2014-10-06 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2667#discussion_r18447346
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala ---
@@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.evaluation
+
+
+import org.apache.spark.SparkContext._
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.rdd.RDD
+
+
+/**
+ * ::Experimental::
+ * Evaluator for ranking algorithms.
+ *
+ * @param predictionAndLabels an RDD of (predicted ranking, ground truth 
set) pairs.
+ */
+@Experimental
+class RankingMetrics(predictionAndLabels: RDD[(Array[Double], 
Array[Double])]) {
+
+  /**
+   * Returns the precsion@k for each query
+   */
+  lazy val precAtK: RDD[Array[Double]] = predictionAndLabels.map {case (pred, lab) =>
+    val labSet : Set[Double] = lab.toSet
--- End diff --

Given my previous comment, maybe I'm missing something, but isn't one of 
the two arguments always going to be 1 to n? Either you are ranking the 
predicted top n versus real rankings, or evaluating the predicted ranking of 
the known top n. I think I would have expected the input to be the 
predicted top n items by ID or something, and the IDs of the real top n; 
then making a set and using `contains` makes some sense.
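
For illustration, a sketch of that "items by ID" reading with a generic item type; this is only one way to read the comment, not the API in the patch (the class and method names here are made up):

```scala
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD

// Sketch: items are opaque IDs of type T; relevance is set membership.
class RankingMetricsSketch[T](predictionAndLabels: RDD[(Array[T], Array[T])]) {
  def precisionAt(k: Int): Double = predictionAndLabels.map { case (pred, lab) =>
    val labSet = lab.toSet
    pred.take(k).count(labSet.contains).toDouble / k
  }.mean()
}
```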





[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics

2014-10-06 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2667#discussion_r18447381
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/evaluation/RankingMetricsSuite.scala
 ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.evaluation
+
+import org.scalatest.FunSuite
+
+import org.apache.spark.mllib.util.LocalSparkContext
+
+class RankingMetricsSuite extends FunSuite with LocalSparkContext {
+  test("Ranking metrics: map, ndcg") {
+    val predictionAndLabels = sc.parallelize(
+      Seq(
+        (Array[Double](1, 6, 2, 7, 8, 3, 9, 10, 4, 5), Array[Double](1, 2, 3, 4, 5)),
+        (Array[Double](4, 1, 5, 6, 2, 7, 3, 8, 9, 10), Array[Double](1, 2, 3))
+      ), 2)
+    val eps: Double = 1e-5
+
+    val metrics = new RankingMetrics(predictionAndLabels)
+    val precAtK = metrics.precAtK.collect()
+    val avePrec = metrics.avePrec.collect()
+    val map = metrics.meanAvePrec
+    val ndcg = metrics.ndcg.collect()
+    val aveNdcg = metrics.meanNdcg
+
+    assert(math.abs(precAtK(0)(4) - 0.4) < eps)
--- End diff --

Check out the `~==` operator used in other tests
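
A hedged sketch of what that could look like; the import path and the `~==`/`absTol` spelling below are recalled from MLlib's test utilities and may not match this exact version:

```scala
import org.apache.spark.mllib.util.TestingUtils._

// approximate equality instead of math.abs(x - y) < eps
assert(precAtK(0)(4) ~== 0.4 absTol 1e-5)
```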





[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics

2014-10-06 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2667#discussion_r18447409
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala ---
@@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.evaluation
+
+
+import org.apache.spark.SparkContext._
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.rdd.RDD
+
+
+/**
+ * ::Experimental::
+ * Evaluator for ranking algorithms.
+ *
+ * @param predictionAndLabels an RDD of (predicted ranking, ground truth 
set) pairs.
+ */
+@Experimental
+class RankingMetrics(predictionAndLabels: RDD[(Array[Double], 
Array[Double])]) {
+
+  /**
+   * Returns the precsion@k for each query
+   */
+  lazy val precAtK: RDD[Array[Double]] = predictionAndLabels.map {case (pred, lab) =>
+    val labSet : Set[Double] = lab.toSet
+    val n = pred.length
+    val topkPrec = Array.fill[Double](n)(.0)
+    var (i, cnt) = (0, 0)
+
+    while (i < n) {
+      if (labSet.contains(pred(i))) {
+        cnt += 1
+      }
+      topkPrec(i) = cnt.toDouble / (i + 1)
+      i += 1
+    }
+    topkPrec
+  }
+
+  /**
+   * Returns the average precision for each query
+   */
+  lazy val avePrec: RDD[Double] = predictionAndLabels.map {case (pred, lab) =>
+    val labSet: Set[Double] = lab.toSet
+    var (i, cnt, precSum) = (0, 0, .0)
+    val n = pred.length
+
+    while (i < n) {
+      if (labSet.contains(pred(i))) {
+        cnt += 1
+        precSum += cnt.toDouble / (i + 1)
+      }
+      i += 1
+    }
+    precSum / labSet.size
+  }
+
+  /**
+   * Returns the mean average precision (MAP) of all the queries
+   */
+  lazy val meanAvePrec: Double = computeMean(avePrec)
+
+  /**
+   * Returns the normalized discounted cumulative gain for each query
+   */
+  lazy val ndcg: RDD[Double] = predictionAndLabels.map {case (pred, lab) =>
+    val labSet = lab.toSet
+    val n = math.min(pred.length, labSet.size)
+    var (maxDcg, dcg, i) = (.0, .0, 0)
+    while (i < n) {
+      /* Calculate 1/log2(i + 2) */
+      val gain = 1.0 / (math.log(i + 2) / math.log(2))
+      if (labSet.contains(pred(i))) {
+        dcg += gain
+      }
+      maxDcg += gain
+      i += 1
+    }
+    dcg / maxDcg
+  }
+
+  /**
+   * Returns the mean NDCG of all the queries
+   */
+  lazy val meanNdcg: Double = computeMean(ndcg)
+
+  private def computeMean(data: RDD[Double]): Double = {
--- End diff --

`RDD[Double]` already has a `mean()` method; no need to reimplement.
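
That is, roughly the following fragment, meant for the class in this diff and relying on the `SparkContext._` implicits it already imports (a sketch of the simplification, not the final code):

```scala
lazy val meanAvePrec: Double = avePrec.mean()
lazy val meanNdcg: Double = ndcg.mean()
```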





[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics

2014-10-06 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2667#discussion_r18447432
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala ---
@@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.evaluation
+
+
+import org.apache.spark.SparkContext._
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.rdd.RDD
+
+
+/**
+ * ::Experimental::
+ * Evaluator for ranking algorithms.
+ *
+ * @param predictionAndLabels an RDD of (predicted ranking, ground truth 
set) pairs.
+ */
+@Experimental
+class RankingMetrics(predictionAndLabels: RDD[(Array[Double], 
Array[Double])]) {
--- End diff --

Might check that arguments are not empty and of equal length?
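
A sketch of such a check, illustrative only (the message text and placement are mine; note that validating eagerly in the constructor forces a Spark job, so it may belong inside the per-query computation instead):

```scala
// Validate each (prediction, label) pair up front.
predictionAndLabels.foreach { case (pred, lab) =>
  require(pred.nonEmpty && lab.nonEmpty, "prediction and label arrays must not be empty")
  require(pred.length == lab.length, "prediction and label arrays must have the same length")
}
```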





[GitHub] spark pull request: [Spark] RDD take() method: overestimate too mu...

2014-10-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2648#issuecomment-57996887
  
**[Tests timed 
out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21320/consoleFull)**
 for PR 2648 at commit 
[`a2aa36b`](https://github.com/apache/spark/commit/a2aa36b6838ff71941dab1d4af5c8e5f79fd4b4f)
 after a configured wait of `120m`.





[GitHub] spark pull request: [Spark] RDD take() method: overestimate too mu...

2014-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2648#issuecomment-57996893
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21320/
Test FAILed.





[GitHub] spark pull request: [Spark] RDD take() method: overestimate too mu...

2014-10-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2648#issuecomment-57997093
  
**[Tests timed 
out](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/269/consoleFull)**
 for PR 2648 at commit 
[`a2aa36b`](https://github.com/apache/spark/commit/a2aa36b6838ff71941dab1d4af5c8e5f79fd4b4f)
 after a configured wait of `120m`.





[GitHub] spark pull request: SPARK-3811 [CORE] More robust / standard Utils...

2014-10-06 Thread srowen
GitHub user srowen opened a pull request:

https://github.com/apache/spark/pull/2670

SPARK-3811 [CORE] More robust / standard Utils.deleteRecursively, 
Utils.createTempDir

I noticed a few issues with how temp directories are created and deleted:

*Minor*

* Guava's `Files.createTempDir()` plus `File.deleteOnExit()` is used in 
many tests to make a temp dir, but `Utils.createTempDir()` seems to be the 
standard Spark mechanism
* Call to `File.deleteOnExit()` could be pushed into 
`Utils.createTempDir()` as well, along with this replacement
* _I messed up the message in an exception in `Utils` in SPARK-3794; fixed 
here_

*Bit Less Minor*

* `Utils.deleteRecursively()` fails immediately if any `IOException` 
occurs, instead of trying to delete the remaining files and subdirectories. 
I've observed this leave temp dirs behind. I suggest changing it to continue past 
an exception and, at the end, throw one of the (possibly several) exceptions that occurred.
* `Utils.createTempDir()` adds a JVM shutdown hook every time the 
method is called, even if the dir is inside another dir already registered for deletion, 
since that check is only made inside the hook. However, `Utils` already manages a set of all 
dirs to delete on shutdown, called `shutdownDeletePaths`. A single hook can be 
registered to delete all of these on exit (see the sketch after this list). This is how Tachyon 
temp paths are cleaned up in `TachyonBlockManager`.
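
A rough sketch of that single-hook idea, for illustration only: `shutdownDeletePaths` is the name mentioned above, but the object name, method names, and error handling below are made up and are not the actual `Utils` code.

```scala
import java.io.{File, IOException}
import scala.collection.mutable

object TempDirSketch {
  private val shutdownDeletePaths = mutable.HashSet[String]()

  // One hook registered lazily for the whole JVM, instead of one per createTempDir() call.
  private lazy val hook: Unit = Runtime.getRuntime.addShutdownHook(new Thread("delete temp dirs") {
    override def run(): Unit = shutdownDeletePaths.synchronized {
      shutdownDeletePaths.foreach(p => deleteRecursively(new File(p)))
    }
  })

  def registerForDeletion(dir: File): Unit = {
    hook // forces registration of the single shutdown hook on first use
    shutdownDeletePaths.synchronized { shutdownDeletePaths += dir.getAbsolutePath }
  }

  // Keep deleting siblings even if one deletion fails; rethrow one failure at the end.
  def deleteRecursively(f: File): Unit = {
    var firstError: Option[Throwable] = None
    if (f.isDirectory) {
      Option(f.listFiles()).getOrElse(Array.empty[File]).foreach { child =>
        try deleteRecursively(child)
        catch { case e: Exception => if (firstError.isEmpty) firstError = Some(e) }
      }
    }
    if (!f.delete() && f.exists() && firstError.isEmpty) {
      firstError = Some(new IOException(s"Failed to delete: $f"))
    }
    firstError.foreach(e => throw e)
  }
}
```

The lazy `hook` keeps registration idempotent, and the synchronized set mirrors how registered paths could be shared across threads.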

I noticed a few other things that might be changed but wanted to ask first:

* Shouldn't the set of dirs to delete be `File`, not just `String` paths?
* `Utils` manages the set of `TachyonFile` that have been registered for 
deletion, but the shutdown hook is managed in `TachyonBlockManager`. Should 
this logic not live together, and not in `Utils`? It is more specific to 
Tachyon, and looks slightly odd to import in such a generic place.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srowen/spark SPARK-3811

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2670.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2670


commit 3a0faa4e151cac3d9d9b4b4ee87cd024d260c9b1
Author: Sean Owen so...@cloudera.com
Date:   2014-10-06T10:19:01Z

Standardize on Utils.createTempDir instead of Files.createTempDir

commit da0146de0fd21f375843afb47441a2d9a4db146d
Author: Sean Owen so...@cloudera.com
Date:   2014-10-06T10:19:30Z

Make Utils.deleteRecursively try to delete all paths even when an exception 
occurs; use one shutdown hook instead of one per method call to delete temp dirs







[GitHub] spark pull request: [SPARK-3809][SQL]fix HiveThriftServer2Suite to...

2014-10-06 Thread scwf
GitHub user scwf opened a pull request:

https://github.com/apache/spark/pull/2671

[SPARK-3809][SQL]fix HiveThriftServer2Suite to make it work correctly

Currently HiveThriftServer2Suite is a fake test: the HiveThriftServer is not even 
started there. The issues:
1. Thriftserver is not started. Running the tests gives this error:
ERROR HiveThriftServer2Suite: Failed to start Hive Thrift server within 30 seconds
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
2. Thriftserver is not stopped. After the tests finish, the thriftserver process 
does not exit.

This patch fixes these problems as follows:
1. Since the thriftserver is started as a daemon in 
https://github.com/apache/spark/pull/2509, the output of 
```start-thriftserver.sh``` is redirected to a log file such as 
```spark-kf-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-kf.out```,
so we check whether that file contains ```ThriftBinaryCLIService listening on``` to 
assert that the server started successfully (see the sketch below).

2. Start the server in ```beforeAll```.

3. Stop the server in ```afterAll```.
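
For illustration, a sketch of that startup check under the assumptions above (log-file location and marker string as described in the patch; the helper name and polling details are mine):

```scala
import java.io.File
import scala.io.Source

// Poll the thrift server log until the startup marker appears or we time out.
def waitForThriftServerStart(logFile: File, timeoutMs: Long = 30000): Boolean = {
  val marker = "ThriftBinaryCLIService listening on"
  val deadline = System.currentTimeMillis() + timeoutMs
  while (System.currentTimeMillis() < deadline) {
    if (logFile.exists()) {
      val src = Source.fromFile(logFile)
      try {
        if (src.getLines().exists(_.contains(marker))) return true
      } finally {
        src.close()
      }
    }
    Thread.sleep(500)
  }
  false
}
```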

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/scwf/spark fix-HiveThriftServer2Suite

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2671.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2671


commit c39d0a5eabc1f505117e711c17a48b089b266483
Author: scwf wangf...@huawei.com
Date:   2014-10-06T09:45:41Z

fix HiveThriftServer2Suite

commit 0081a508f147a2b7bd7065149b8d3da308ba3d37
Author: scwf wangf...@huawei.com
Date:   2014-10-06T09:51:14Z

fix code format







[GitHub] spark pull request: SPARK-3811 [CORE] More robust / standard Utils...

2014-10-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2670#issuecomment-57998800
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21321/consoleFull)
 for   PR 2670 at commit 
[`da0146d`](https://github.com/apache/spark/commit/da0146de0fd21f375843afb47441a2d9a4db146d).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3809][SQL]fix HiveThriftServer2Suite to...

2014-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2671#issuecomment-57999047
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-3810][SQL] Makes PreInsertionCasts hand...

2014-10-06 Thread liancheng
GitHub user liancheng opened a pull request:

https://github.com/apache/spark/pull/2672

[SPARK-3810][SQL] Makes PreInsertionCasts handle partitions properly

Includes partition keys into account when applying `PreInsertionCasts` rule.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/liancheng/spark fix-pre-insert-casts

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2672.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2672


commit def1a1a316961d1209fef0046154319e9bfca260
Author: Cheng Lian lian.cs@gmail.com
Date:   2014-10-06T10:03:46Z

Makes PreInsertionCasts handle partitions properly







[GitHub] spark pull request: [SPARK-3810][SQL] Makes PreInsertionCasts hand...

2014-10-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2672#issuecomment-58002727
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21322/consoleFull)
 for   PR 2672 at commit 
[`def1a1a`](https://github.com/apache/spark/commit/def1a1a316961d1209fef0046154319e9bfca260).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3806][SQL]Minor fix for CliSuite

2014-10-06 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/2666#issuecomment-58003170
  
ok to test





[GitHub] spark pull request: [SPARK-3806][SQL]Minor fix for CliSuite

2014-10-06 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/2666#issuecomment-58003178
  
Good catch! LGTM.





[GitHub] spark pull request: Build changes to publish effective pom.

2014-10-06 Thread ScrapCodes
GitHub user ScrapCodes opened a pull request:

https://github.com/apache/spark/pull/2673

Build changes to publish effective pom.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ScrapCodes/spark-1 build-changes-effective-pom

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2673.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2673


commit 83072fb3bd08f13874373883717ca5700a468eb3
Author: Prashant Sharma prashan...@imaginea.com
Date:   2014-10-06T10:22:25Z

help plugin

commit cfe0531d3e49241b57e384c3dd98c0da7cf1c4ff
Author: Prashant Sharma prashan...@imaginea.com
Date:   2014-10-06T11:13:20Z

Switched to a custom plugin since maven-help-plugin was not much of a help.







[GitHub] spark pull request: [SPARK-3809][SQL]fix HiveThriftServer2Suite to...

2014-10-06 Thread scwf
Github user scwf commented on the pull request:

https://github.com/apache/spark/pull/2671#issuecomment-58003250
  
cc @liancheng





[GitHub] spark pull request: Build changes to publish effective pom.

2014-10-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2673#issuecomment-58003572
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21323/consoleFull)
 for   PR 2673 at commit 
[`cfe0531`](https://github.com/apache/spark/commit/cfe0531d3e49241b57e384c3dd98c0da7cf1c4ff).
 * This patch merges cleanly.





[GitHub] spark pull request: SPARK-3807: SparkSql does not work for tables ...

2014-10-06 Thread chiragaggarwal
GitHub user chiragaggarwal opened a pull request:

https://github.com/apache/spark/pull/2674

SPARK-3807: SparkSql does not work for tables created using custom serde



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chiragaggarwal/spark branch-1.1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2674.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2674


commit 5c73b72b917ad0cb16b76411f961731527022e36
Author: chirag chirag.aggar...@guavus.com
Date:   2014-10-06T11:10:30Z

SPARK-3807: SparkSql does not work for tables created using custom serde







[GitHub] spark pull request: SPARK-3807: SparkSql does not work for tables ...

2014-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2674#issuecomment-58003794
  
Can one of the admins verify this patch?





[GitHub] spark pull request: SPARK-3807: SparkSql does not work for tables ...

2014-10-06 Thread chiragaggarwal
Github user chiragaggarwal commented on the pull request:

https://github.com/apache/spark/pull/2674#issuecomment-58003913
  
SparkSql crashes when selecting from tables created using a custom serde.
The following exception is seen when running a query like 'select * from table_name limit 1':
ERROR CliDriver: org.apache.hadoop.hive.serde2.SerDeException: java.lang.NullPointerException
    at org.apache.hadoop.hive.serde2.thrift.ThriftDeserializer.initialize(ThriftDeserializer.java:68)
    at org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializer(TableDesc.java:80)
    at org.apache.spark.sql.hive.execution.HiveTableScan.addColumnMetadataToConf(HiveTableScan.scala:86)
    at org.apache.spark.sql.hive.execution.HiveTableScan.<init>(HiveTableScan.scala:100)
    at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:188)
    at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:188)
    at org.apache.spark.sql.SQLContext$SparkPlanner.pruneFilterProject(SQLContext.scala:364)
    at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:184)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
    at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:280)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
    at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:402)
    at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:400)
    at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:406)
    at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:406)
    at org.apache.spark.sql.hive.HiveContext$QueryExecution.stringResult(HiveContext.scala:406)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:59)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:291)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:226)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NullPointerException





[GitHub] spark pull request: [SPARK-3812] [BUILD] Adapt maven build to publ...

2014-10-06 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/2673#issuecomment-58004659
  
@pwendell Take a look, whenever you get time. It would be good if we can 
publish https://github.com/ScrapCodes/effective-pom-plugin.





[GitHub] spark pull request: SPARK-3811 [CORE] More robust / standard Utils...

2014-10-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2670#issuecomment-58004841
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21321/consoleFull)
 for   PR 2670 at commit 
[`da0146d`](https://github.com/apache/spark/commit/da0146de0fd21f375843afb47441a2d9a4db146d).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: SPARK-3811 [CORE] More robust / standard Utils...

2014-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2670#issuecomment-58004848
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21321/
Test PASSed.





[GitHub] spark pull request: [SPARK-3812] [BUILD] Adapt maven build to publ...

2014-10-06 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/2673#issuecomment-58005087
  
I will have to add a similar thing for 
http://maven.apache.org/plugins/maven-deploy-plugin/deploy-file-mojo.html, but 
I am not sure about the repository URL field.





[GitHub] spark pull request: [SPARK-3810][SQL] Makes PreInsertionCasts hand...

2014-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2672#issuecomment-58007097
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21322/Test 
PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3810][SQL] Makes PreInsertionCasts hand...

2014-10-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2672#issuecomment-58007089
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21322/consoleFull)
 for   PR 2672 at commit 
[`def1a1a`](https://github.com/apache/spark/commit/def1a1a316961d1209fef0046154319e9bfca260).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3812] [BUILD] Adapt maven build to publ...

2014-10-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2673#issuecomment-58009844
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21323/consoleFull)
 for   PR 2673 at commit 
[`cfe0531`](https://github.com/apache/spark/commit/cfe0531d3e49241b57e384c3dd98c0da7cf1c4ff).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class CacheTableCommand(tableName: String, plan: 
Option[LogicalPlan], isLazy: Boolean)`
  * `case class UncacheTableCommand(tableName: String) extends Command`
  * `case class CacheTableCommand(`
  * `case class UncacheTableCommand(tableName: String) extends LeafNode 
with Command `
  * `case class DescribeCommand(child: SparkPlan, output: Seq[Attribute])(`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3812] [BUILD] Adapt maven build to publ...

2014-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2673#issuecomment-58009851
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21323/Test 
PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3809][SQL]fix HiveThriftServer2Suite to...

2014-10-06 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/2671#issuecomment-58020543
  
ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3808] PySpark fails to start in Windows

2014-10-06 Thread sarutak
Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/2669#issuecomment-58020951
  
LGTM.
I think this issue is caused by #2481.
@andrewor14 Can you take a look at this change, since you looked at #2481?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3809][SQL]fix HiveThriftServer2Suite to...

2014-10-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2671#issuecomment-58021025
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/271/consoleFull)
 for   PR 2671 at commit 
[`0081a50`](https://github.com/apache/spark/commit/0081a508f147a2b7bd7065149b8d3da308ba3d37).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3806][SQL]Minor fix for CliSuite

2014-10-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2666#issuecomment-58020996
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/270/consoleFull)
 for   PR 2666 at commit 
[`11430db`](https://github.com/apache/spark/commit/11430dbb01c78b4244ab626e626153747bb1d30a).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3809][SQL]fix HiveThriftServer2Suite to...

2014-10-06 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/2671#issuecomment-58022609
  
@scwf Thanks, this is a good catch. However, I should mention that 
HiveThriftServer2Suite was known to be flaky even before the Thrift server was made a 
daemon. I had opened #2214 to try to fix this issue, but unfortunately Jenkins 
fails for an unknown reason that I couldn't reproduce locally. After 
numerous unsuccessful tries, I haven't had time to get it done yet. Sorry for 
the trouble... The essential issue fixed in #2214 is that the exception caught 
in HiveThriftServer2Suite is not re-thrown in the `catch` clause. That's why it 
always passes no matter what exception is thrown.

Back to this PR, I have several comments:

1. Personally I'd prefer not to start/stop the server process in 
`beforeAll`/`afterAll`. I'd like to make sure every test is executed against a 
Thrift server process with clean state.
2. The sleeps introduced in this PR can be eliminated by starting a `tail` 
process to watch the log file, and then monitoring the output of the `tail` 
process (see the sketch after this comment). Since an empty log file does no harm, we can create a new empty 
log file up front to ensure the file exists before executing `tail`.
3. The log file should be removed after stopping the server process.

Since the Jenkins failure in #2214 is really tricky to fix, and 
without fixing that we can't make any change to `HiveThriftServer2Suite`, 
I'm going to open a new PR to take another shot at this issue together with 
what's left in #2214.
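
For what it's worth, a minimal sketch of the `tail`-based approach in point 2 
might look like the following. It is only an illustration, not part of any patch: 
`logFile` and `serverStarted` are assumed to be the same names used in the suite, 
and the `watchServerLog` helper is hypothetical.

    import java.io.File
    import scala.concurrent.Promise
    import scala.sys.process._

    // Sketch only: follow the server log with `tail -n +0 -f` and complete the
    // promise once the readiness line appears, instead of sleeping in a loop.
    def watchServerLog(logFile: File, serverStarted: Promise[Unit]): Process = {
      logFile.createNewFile()  // an empty log file does no harm
      Seq("tail", "-n", "+0", "-f", logFile.getCanonicalPath).run(
        ProcessLogger { line =>
          if (line.contains("ThriftBinaryCLIService listening on")) {
            serverStarted.trySuccess(())
          }
        })
    }

The returned `Process` handle would then be destroyed (and the log file removed) 
when the server is stopped.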


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3787] Assembly jar name is wrong when w...

2014-10-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2647#issuecomment-58023284
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21324/consoleFull)
 for   PR 2647 at commit 
[`b2318eb`](https://github.com/apache/spark/commit/b2318eb227d59dbd61d2dd8a24592cdc2f64ac2b).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3788] [yarn] Fix compareFs to do the ri...

2014-10-06 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2649#issuecomment-58023951
  
The changes look fine, although I don't think this applies to federation.  
My understanding of federation was that the namespace isn't viewable on the client 
side. The client still picks one of the federated namenodes (using the normal 
host:port), but on the cluster side it uses the namespace.  




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3809][SQL]fix HiveThriftServer2Suite to...

2014-10-06 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/2671#discussion_r18458935
  
--- Diff: 
sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala
 ---
@@ -70,37 +73,39 @@ class HiveThriftServer2Suite extends FunSuite with Logging {
 
     val serverStarted = Promise[Unit]()
     val buffer = new ArrayBuffer[String]()
+    val startString =
+      "starting org.apache.spark.sql.hive.thriftserver.HiveThriftServer2, logging to "
+    val maxTries = 30
 
     def captureOutput(source: String)(line: String) {
       buffer += s"$source $line"
-      if (line.contains("ThriftBinaryCLIService listening on")) {
-        serverStarted.success(())
+      if (line.contains(startString)) {
+        val logFile = new File(line.substring(startString.length))
+        var tryNum = 0
+        // This is a hack to wait logFile ready
+        Thread.sleep(5000)
+        // logFile may have not finished, try every second
+        while (!logFile.exists() || (!fileToString(logFile).contains(
+          "ThriftBinaryCLIService listening on") && tryNum < maxTries)) {
--- End diff --

`tryNum` is never increased.
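
As a sketch of one possible fix (reusing the names from the diff above, such as 
`logFile`, `fileToString`, and `maxTries`; this is not the actual patch), the counter 
would need to be bumped inside the loop body:

    // Sketch: retry once per second, giving up after maxTries attempts.
    var tryNum = 0
    while (!logFile.exists() || (!fileToString(logFile).contains(
        "ThriftBinaryCLIService listening on") && tryNum < maxTries)) {
      Thread.sleep(1000)
      tryNum += 1  // without this the loop can spin forever
    }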


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3802][BUILD] Scala version is wrong in ...

2014-10-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2661#issuecomment-58024089
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21326/consoleFull)
 for   PR 2661 at commit 
[`7090e17`](https://github.com/apache/spark/commit/7090e17695c4a1a095ddf31d33012f0c323e988b).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...

2014-10-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2520#issuecomment-58024065
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21327/consoleFull)
 for   PR 2520 at commit 
[`fccdad2`](https://github.com/apache/spark/commit/fccdad2525433d693a443e6938de110fcb56afce).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3809][SQL]fix HiveThriftServer2Suite to...

2014-10-06 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/2671#discussion_r18459115
  
--- Diff: 
sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala
 ---
@@ -70,37 +73,39 @@ class HiveThriftServer2Suite extends FunSuite with Logging {
 
     val serverStarted = Promise[Unit]()
     val buffer = new ArrayBuffer[String]()
+    val startString =
+      "starting org.apache.spark.sql.hive.thriftserver.HiveThriftServer2, logging to "
+    val maxTries = 30
 
     def captureOutput(source: String)(line: String) {
       buffer += s"$source $line"
-      if (line.contains("ThriftBinaryCLIService listening on")) {
-        serverStarted.success(())
+      if (line.contains(startString)) {
+        val logFile = new File(line.substring(startString.length))
+        var tryNum = 0
+        // This is a hack to wait logFile ready
+        Thread.sleep(5000)
+        // logFile may have not finished, try every second
+        while (!logFile.exists() || (!fileToString(logFile).contains(
+          "ThriftBinaryCLIService listening on") && tryNum < maxTries)) {
+          Thread.sleep(1000)
+        }
+        if (fileToString(logFile).contains("ThriftBinaryCLIService listening on")) {
+          serverStarted.success(())
+        } else {
+          throw new TimeoutException()
+        }
       }
     }
-
     val process = Process(command).run(
       ProcessLogger(captureOutput("stdout"), captureOutput("stderr")))
 
     Future {
       val exitValue = process.exitValue()
-      logInfo(s"Spark SQL Thrift server process exit value: $exitValue")
+      logInfo(s"Start Spark SQL Thrift server process exit value: $exitValue")
--- End diff --

Why "Start" here? When this line is executed, the server process has 
already ended.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3758] [Windows] Wrong EOL character in ...

2014-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2612#issuecomment-58024460
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21325/Test 
FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-10-06 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18459132
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -383,40 +405,82 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
     }
   }
 
+  /**
+   * This system security manager applies to the entire process.
+   * It's main purpose is to handle the case if the user code does a System.exit.
+   * This allows us to catch that and properly set the YARN application status and
+   * cleanup if needed.
+   */
+  private def setupSystemSecurityManager() = {
+    try {
+      var stopped = false
+      System.setSecurityManager(new java.lang.SecurityManager() {
+        override def checkExit(paramInt: Int) {
+          if (!stopped) {
+            logInfo("In securityManager checkExit, exit code: " + paramInt)
+            if (paramInt == 0) {
+              finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
+            } else {
+              finish(FinalApplicationStatus.FAILED,
+                paramInt,
+                "User class exited with non-zero exit code")
+            }
+            stopped = true
+          }
+        }
+        // required for the checkExit to work properly
+        override def checkPermission(perm: java.security.Permission): Unit = {
+        }
+      })
+    }
+    catch {
+      case e: SecurityException =>
+        finish(FinalApplicationStatus.FAILED,
+          ApplicationMaster.EXIT_SECURITY,
+          "Error in setSecurityManager")
+        logError("Error in setSecurityManager:", e)
+    }
+  }
+
+  /**
+   * Start the user class, which contains the spark driver, in a separate Thread.
+   * If the main routine exits cleanly or exits with System.exit(0) we
+   * assume it was successful, for all other cases we assume failure.
+   *
+   * Returns the user thread that was started.
+   */
   private def startUserClass(): Thread = {
     logInfo("Starting the user JAR in a separate Thread")
     System.setProperty("spark.executor.instances", args.numExecutors.toString)
     val mainMethod = Class.forName(args.userClass, false,
       Thread.currentThread.getContextClassLoader).getMethod("main", classOf[Array[String]])
 
-    userClassThread = new Thread {
+    val userThread = new Thread {
       override def run() {
-        var status = FinalApplicationStatus.FAILED
         try {
-          // Copy
           val mainArgs = new Array[String](args.userArgs.size)
           args.userArgs.copyToArray(mainArgs, 0, args.userArgs.size)
           mainMethod.invoke(null, mainArgs)
-          // Some apps have System.exit(0) at the end.  The user thread will stop here unless
-          // it has an uncaught exception thrown out.  It needs a shutdown hook to set SUCCEEDED.
-          status = FinalApplicationStatus.SUCCEEDED
+          finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
+          logDebug("Done running users class")
         } catch {
           case e: InvocationTargetException =>
             e.getCause match {
               case _: InterruptedException =>
                 // Reporter thread can interrupt to stop user class
-
-              case e => throw e
+              case e: Throwable =>
--- End diff --

That is fine, but note you didn't comment on this one earlier; you 
commented somewhere else in the code. This one we end up re-throwing, so I 
wasn't as concerned with it. I can change it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-10-06 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/2577#discussion_r18459205
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala 
---
@@ -383,40 +405,82 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
     }
   }
 
+  /**
+   * This system security manager applies to the entire process.
+   * It's main purpose is to handle the case if the user code does a System.exit.
+   * This allows us to catch that and properly set the YARN application status and
+   * cleanup if needed.
+   */
+  private def setupSystemSecurityManager() = {
+    try {
+      var stopped = false
+      System.setSecurityManager(new java.lang.SecurityManager() {
+        override def checkExit(paramInt: Int) {
+          if (!stopped) {
+            logInfo("In securityManager checkExit, exit code: " + paramInt)
+            if (paramInt == 0) {
+              finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
+            } else {
+              finish(FinalApplicationStatus.FAILED,
+                paramInt,
+                "User class exited with non-zero exit code")
+            }
+            stopped = true
+          }
+        }
+        // required for the checkExit to work properly
+        override def checkPermission(perm: java.security.Permission): Unit = {
+        }
--- End diff --

In the future, please clarify what you want bumped up. You said this before 
and I thought you meant removing the extra space between 430 and 431. I assume 
you actually mean the `}`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3758] [Windows] Wrong EOL character in ...

2014-10-06 Thread sarutak
Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/2612#issuecomment-58024665
  
retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3809][SQL]fix HiveThriftServer2Suite to...

2014-10-06 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/2671#discussion_r18459283
  
--- Diff: 
sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala
 ---
@@ -123,14 +128,45 @@ class HiveThriftServer2Suite extends FunSuite with Logging {
           |=
         """.stripMargin, cause)
     } finally {
-      warehousePath.delete()
-      metastorePath.delete()
       process.destroy()
     }
   }
 
+  override def afterAll() {
+    warehousePath.delete()
+    metastorePath.delete()
+    stopThriftserver
+  }
+
+  def stopThriftserver: Unit = {
+    val stopScript =
+      "../../sbin/stop-thriftserver.sh".split("/").mkString(File.separator)
+    val builder = new ProcessBuilder(stopScript)
+    val process = builder.start()
+    new Thread("read stderr") {
+      override def run() {
+        for (line <- Source.fromInputStream(process.getErrorStream).getLines()) {
+          System.err.println(line)
+        }
+      }
+    }.start()
+    val output = new StringBuffer
+    val stdoutThread = new Thread("read stdout") {
+      override def run() {
+        for (line <- Source.fromInputStream(process.getInputStream).getLines()) {
+          output.append(line)
--- End diff --

`output` is never used. Maybe you intended to print it?
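
One possible way to actually use the captured output, sketched with the names from 
the diff above (`process`, `output`, `stdoutThread`); this is only an illustration, 
not part of the PR:

    // Sketch: wait for the stop script to finish, then surface its stdout if it failed.
    stdoutThread.start()
    val exitCode = process.waitFor()
    stdoutThread.join()
    if (exitCode != 0) {
      System.err.println(s"stop-thriftserver.sh exited with $exitCode, output: $output")
    }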


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [3809][SQL] Fixes test suites in hive-thriftse...

2014-10-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2675#issuecomment-58024854
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21328/consoleFull)
 for   PR 2675 at commit 
[`5094bb4`](https://github.com/apache/spark/commit/5094bb446922875b41bfaf06fc54510d6ef9b22e).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3758] [Windows] Wrong EOL character in ...

2014-10-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2612#issuecomment-58025658
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21329/consoleFull)
 for   PR 2612 at commit 
[`33376b1`](https://github.com/apache/spark/commit/33376b181a361b04fea7d6f02565fa9914c43350).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3809][SQL]fix HiveThriftServer2Suite to...

2014-10-06 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/2671#discussion_r18459638
  
--- Diff: 
sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala
 ---
@@ -123,14 +128,45 @@ class HiveThriftServer2Suite extends FunSuite with Logging {
           |=
         """.stripMargin, cause)
     } finally {
-      warehousePath.delete()
-      metastorePath.delete()
       process.destroy()
     }
   }
 
+  override def afterAll() {
+    warehousePath.delete()
+    metastorePath.delete()
+    stopThriftserver
+  }
+
+  def stopThriftserver: Unit = {
+    val stopScript =
+      "../../sbin/stop-thriftserver.sh".split("/").mkString(File.separator)
+    val builder = new ProcessBuilder(stopScript)
--- End diff --

Using Scala process API can be much simpler :)
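
For illustration, a minimal sketch of what a Scala process API version might look 
like (assuming the same `stopScript` path as in the diff; this is not the actual change):

    import java.io.File
    import scala.sys.process._

    val stopScript =
      "../../sbin/stop-thriftserver.sh".split("/").mkString(File.separator)

    // Runs the stop script, streams its output to the test log, and returns the exit code.
    val exitCode = Process(Seq(stopScript)) ! ProcessLogger(
      out => println(s"stop-thriftserver stdout: $out"),
      err => System.err.println(s"stop-thriftserver stderr: $err"))

This replaces the ProcessBuilder plus reader-thread boilerplate with a single blocking call.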


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-10-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2577#issuecomment-58026499
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21330/consoleFull)
 for   PR 2577 at commit 
[`9c2efbf`](https://github.com/apache/spark/commit/9c2efbfd8d199bf89f911e44c7b07c6afe6b15bd).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3809][SQL]fix HiveThriftServer2Suite to...

2014-10-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2671#issuecomment-58029079
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/271/consoleFull)
 for   PR 2671 at commit 
[`0081a50`](https://github.com/apache/spark/commit/0081a508f147a2b7bd7065149b8d3da308ba3d37).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3806][SQL]Minor fix for CliSuite

2014-10-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2666#issuecomment-58029017
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/270/consoleFull)
 for   PR 2666 at commit 
[`11430db`](https://github.com/apache/spark/commit/11430dbb01c78b4244ab626e626153747bb1d30a).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class CacheTableCommand(tableName: String, plan: 
Option[LogicalPlan], isLazy: Boolean)`
  * `case class UncacheTableCommand(tableName: String) extends Command`
  * `case class CacheTableCommand(`
  * `case class UncacheTableCommand(tableName: String) extends LeafNode 
with Command `
  * `case class DescribeCommand(child: SparkPlan, output: Seq[Attribute])(`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3809][SQL]fix HiveThriftServer2Suite to...

2014-10-06 Thread scwf
Github user scwf commented on the pull request:

https://github.com/apache/spark/pull/2671#issuecomment-58029661
  
Hi @liancheng, thanks for your comments, they are very useful.
1. Actually I tried starting a server for every test case, but the second 
one failed due to "org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 
running as process 82266. Stop it first." Now I get the reason from your patch: 
"A Thread.sleep has to be introduced because the kill command used in 
stop-thriftserver.sh is not synchronous." (See the sketch below for one way to 
wait without a blind sleep.)
2. Using a `tail` process is OK, I actually thought about it. Since the log file 
won't be a big file, I'd like to use ```fileToString``` here.
3. Yeah, the log file should be removed here.
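
As an aside on point 1, one way to avoid a fixed blind `Thread.sleep` after the 
asynchronous kill is to poll until nothing is listening on the server port any more 
before starting the next server. This is only a sketch under the assumption that the 
Thrift server's JDBC port is known to the test (here called `port`); it is not part of 
this PR:

    import java.net.{InetSocketAddress, Socket}

    // Returns true once nothing accepts connections on the port, or false after maxTries.
    def waitUntilPortFree(port: Int, maxTries: Int = 30): Boolean = {
      var tries = 0
      while (tries < maxTries) {
        try {
          val socket = new Socket()
          socket.connect(new InetSocketAddress("localhost", port), 1000)
          socket.close()           // still listening, keep waiting
          Thread.sleep(1000)
          tries += 1
        } catch {
          case _: java.io.IOException => return true  // connection refused: port is free
        }
      }
      false
    }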


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3778] newAPIHadoopRDD doesn't properly ...

2014-10-06 Thread tgravescs
GitHub user tgravescs opened a pull request:

https://github.com/apache/spark/pull/2676

[SPARK-3778] newAPIHadoopRDD doesn't properly pass credentials for secure 
hdfs

https://issues.apache.org/jira/browse/SPARK-3778

This affects anyone trying to access secure HDFS with something like:

    val lines = {
      val hconf = new Configuration()
      hconf.set("mapred.input.dir", "mydir")
      hconf.set("textinputformat.record.delimiter", "\003432\n")
      sc.newAPIHadoopRDD(hconf, classOf[TextInputFormat],
        classOf[LongWritable], classOf[Text])
    }

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgravescs/spark SPARK-3778

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2676.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2676


commit c3d6b83332b1ba370bff837d7be09ffd30243262
Author: Thomas Graves tgra...@apache.org
Date:   2014-10-06T14:53:29Z

newAPIHadoopRDD doesn't properly pass credentials for secure hdfs on yarn




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [3809][SQL] Fixes test suites in hive-thriftse...

2014-10-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2675#issuecomment-58030626
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21328/consoleFull)
 for   PR 2675 at commit 
[`5094bb4`](https://github.com/apache/spark/commit/5094bb446922875b41bfaf06fc54510d6ef9b22e).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3778] newAPIHadoopRDD doesn't properly ...

2014-10-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2676#issuecomment-58030636
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21331/consoleFull)
 for   PR 2676 at commit 
[`c3d6b83`](https://github.com/apache/spark/commit/c3d6b83332b1ba370bff837d7be09ffd30243262).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [3809][SQL] Fixes test suites in hive-thriftse...

2014-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2675#issuecomment-58030638
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21328/Test 
FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

2014-10-06 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2126#issuecomment-58031929
  
Thanks @jongyoul, the changes look fine to me, but I'll leave the final 
review to someone who knows the Mesos scheduler.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3809][SQL]fix HiveThriftServer2Suite to...

2014-10-06 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/2671#issuecomment-58033388
  
BTW, Jenkins passes because the exception re-throwing issue is not fixed in 
your PR :) You can check the full console output to be sure: 
https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/271/consoleFull

And mine still suffers from the mysterious timeout. Keep digging...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3809][SQL]fix HiveThriftServer2Suite to...

2014-10-06 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/2671#issuecomment-58033821
  
The key point of using `tail` is to eliminate the `sleep` calls, rather than 
to avoid `fileToString`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3787] Assembly jar name is wrong when w...

2014-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2647#issuecomment-58034384
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21324/Test 
PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3787] Assembly jar name is wrong when w...

2014-10-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2647#issuecomment-58034375
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21324/consoleFull)
 for   PR 2647 at commit 
[`b2318eb`](https://github.com/apache/spark/commit/b2318eb227d59dbd61d2dd8a24592cdc2f64ac2b).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...

2014-10-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2520#issuecomment-58035848
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21327/consoleFull)
 for   PR 2520 at commit 
[`fccdad2`](https://github.com/apache/spark/commit/fccdad2525433d693a443e6938de110fcb56afce).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...

2014-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2520#issuecomment-58035857
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21327/Test 
PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3802][BUILD] Scala version is wrong in ...

2014-10-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2661#issuecomment-58036024
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21326/consoleFull)
 for   PR 2661 at commit 
[`7090e17`](https://github.com/apache/spark/commit/7090e17695c4a1a095ddf31d33012f0c323e988b).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3809][SQL]fix HiveThriftServer2Suite to...

2014-10-06 Thread scwf
Github user scwf commented on the pull request:

https://github.com/apache/spark/pull/2671#issuecomment-58036125
  
Got it, I will check this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3802][BUILD] Scala version is wrong in ...

2014-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2661#issuecomment-58036040
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21326/Test 
PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3758] [Windows] Wrong EOL character in ...

2014-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2612#issuecomment-58037051
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21329/Test 
PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3758] [Windows] Wrong EOL character in ...

2014-10-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2612#issuecomment-58037041
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21329/consoleFull)
 for   PR 2612 at commit 
[`33376b1`](https://github.com/apache/spark/commit/33376b181a361b04fea7d6f02565fa9914c43350).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-10-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2577#issuecomment-58037924
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21330/consoleFull)
 for   PR 2577 at commit 
[`9c2efbf`](https://github.com/apache/spark/commit/9c2efbfd8d199bf89f911e44c7b07c6afe6b15bd).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class CacheTableCommand(tableName: String, plan: 
Option[LogicalPlan], isLazy: Boolean)`
  * `case class UncacheTableCommand(tableName: String) extends Command`
  * `case class CacheTableCommand(`
  * `case class UncacheTableCommand(tableName: String) extends LeafNode 
with Command `
  * `case class DescribeCommand(child: SparkPlan, output: Seq[Attribute])(`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...

2014-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2577#issuecomment-58037937
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21330/Test 
PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3816][SQL] Add configureOutputJobProper...

2014-10-06 Thread alexliu68
GitHub user alexliu68 opened a pull request:

https://github.com/apache/spark/pull/2677

[SPARK-3816][SQL] Add configureOutputJobPropertiesForStorageHandler to j...

...ob conf in SparkHadoopWriter class

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/alexliu68/spark SPARK-SQL-3816

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2677.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2677


commit 14e3c63e49fab82a5ef386bb714984eca29f3bdc
Author: Alex Liu alex_li...@yahoo.com
Date:   2014-10-06T16:03:30Z

[SPARK-3816][SQL] Add configureOutputJobPropertiesForStorageHandler to job 
conf in SparkHadoopWriter class




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3816][SQL] Add configureOutputJobProper...

2014-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2677#issuecomment-58041075
  
Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3813][SQL] Support case when conditio...

2014-10-06 Thread ravipesala
GitHub user ravipesala opened a pull request:

https://github.com/apache/spark/pull/2678

[SPARK-3813][SQL] Support "case when" conditional functions in Spark SQL.

The "case when" conditional function is already supported in Spark SQL, but 
there is no support for it in SqlParser. So parser support for it was added.
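
For illustration, the kind of query this parser change is about looks roughly like 
the following; the `sqlContext`, table, and column names are made up:

    // Hypothetical example: with SqlParser support, a plain SQLContext should accept
    // a CASE WHEN expression like this, which the SqlParser could not parse before.
    val result = sqlContext.sql(
      "SELECT name, CASE WHEN age < 18 THEN 'minor' ELSE 'adult' END FROM people")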

Author : ravipesala ravindra.pes...@huawei.com

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ravipesala/spark SPARK-3813

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2678.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2678


commit 709684f1036e1ab8595f94c2d3c5314c29a20063
Author: ravipesala ravindra.pes...@huawei.com
Date:   2014-10-06T15:42:02Z

Changed parser to support case when function.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

2014-10-06 Thread timothysc
Github user timothysc commented on the pull request:

https://github.com/apache/spark/pull/2126#issuecomment-58042192
  
@tgravescs Seems OK. May I ask how you tested/verified, @jongyoul? 




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3778] newAPIHadoopRDD doesn't properly ...

2014-10-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2676#issuecomment-58042286
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21331/consoleFull)
 for   PR 2676 at commit 
[`c3d6b83`](https://github.com/apache/spark/commit/c3d6b83332b1ba370bff837d7be09ffd30243262).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3778] newAPIHadoopRDD doesn't properly ...

2014-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2676#issuecomment-58042300
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21331/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3816][SQL] Add configureOutputJobProper...

2014-10-06 Thread alexliu68
Github user alexliu68 closed the pull request at:

https://github.com/apache/spark/pull/2677


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3813][SQL] Support case when conditio...

2014-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2678#issuecomment-58042687
  
Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3816][SQL] Add configureOutputJobProper...

2014-10-06 Thread alexliu68
GitHub user alexliu68 reopened a pull request:

https://github.com/apache/spark/pull/2677

[SPARK-3816][SQL] Add configureOutputJobPropertiesForStorageHandler to job 
conf in SparkHadoopWriter class

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/alexliu68/spark SPARK-SQL-3816

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2677.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2677


commit e62af9fecd009a396a9ea2a362170977653472bb
Author: Alex Liu alex_li...@yahoo.com
Date:   2014-10-06T16:11:37Z

[SPARK-3816][SQL] Add configureOutputJobPropertiesForStorageHandler to job 
conf in SparkHiveWriterContainer class
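
A rough sketch of the intent only, NOT the actual diff: the idea is to let the 
target table's Hive storage handler contribute its output job properties before 
the writer's JobConf is used. The Hive helpers named below (PlanUtils, 
Utilities) are assumptions inferred from the method name in the title; consult 
the PR patch for the real change in SparkHiveWriterContainer.

```scala
// Sketch under assumed Hive helper names, not the actual Spark patch.
import org.apache.hadoop.hive.ql.exec.Utilities
import org.apache.hadoop.hive.ql.plan.{PlanUtils, TableDesc}
import org.apache.hadoop.mapred.JobConf

object StorageHandlerOutputConf {
  def configureForWrite(tableDesc: TableDesc, jobConf: JobConf): Unit = {
    // Let the table's storage handler (if one is configured) populate its
    // output-side job properties on the TableDesc.
    PlanUtils.configureOutputJobPropertiesForStorageHandler(tableDesc)
    // Then copy those table-level properties into the JobConf used for writing.
    Utilities.copyTableJobPropertiesToConf(tableDesc, jobConf)
  }
}
```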




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3809][SQL]fix HiveThriftServer2Suite to...

2014-10-06 Thread scwf
Github user scwf commented on the pull request:

https://github.com/apache/spark/pull/2671#issuecomment-58043321
  
It is very strange; on my local machine it is OK. Hi @liancheng, can you get 
the log file whose path appears in stdout (the line "starting 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2, logging to 
/home/jenkins/workspace/NewSparkPullRequestBuilder/sbin/../logs/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-test02.amplab.out")?
I think there may already be an existing Thrift server process, which caused 
the server start to fail on Jenkins; we need the log file to check.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3816][SQL] Add configureOutputJobProper...

2014-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2677#issuecomment-58043529
  
Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [3809][SQL] Fixes test suites in hive-thriftse...

2014-10-06 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/2675#issuecomment-58043428
  
The console output suggests that the CLI process and the Thrift server 
process started and ran successfully, but the timeout was too tight. 
Try relaxing the timeout.
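
For illustration, a minimal sketch of the pattern being suggested (not the 
suite's actual code, and the helper below is hypothetical): poll for the 
server with a generous, configurable timeout so a slow Jenkins executor does 
not fail the test spuriously.

```scala
// Hypothetical helper sketching the "relax the timeout" suggestion.
import scala.concurrent.duration._

object ServerStartupProbe {
  def waitForServer(isUp: () => Boolean, timeout: FiniteDuration = 3.minutes): Unit = {
    val deadline = timeout.fromNow
    while (!isUp()) {
      if (deadline.isOverdue()) {
        sys.error(s"Server did not come up within $timeout")
      }
      Thread.sleep(500) // poll every half second
    }
  }
}
```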


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3809][SQL]fix HiveThriftServer2Suite to...

2014-10-06 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/2671#issuecomment-58044200
  
When trying #2214 several weeks ago, the Thrift server process simply 
couldn't start on the Jenkins server, but everything was fine on my local 
machine.

However, the pull request builder has since been refactored a lot by Josh. It 
seems that #2675 fails simply because my timeout was too tight for Jenkins; 
I'm trying to relax the timeout a bit.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3809][SQL]fix HiveThriftServer2Suite to...

2014-10-06 Thread scwf
Github user scwf commented on the pull request:

https://github.com/apache/spark/pull/2671#issuecomment-58045037
  
OK, since the process output is now redirected to a log file, we can also 
check it to see where the problem is.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


