[GitHub] spark pull request: [SPARK-3731] [PySpark] fix memory leak in Pyth...
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/2668 [SPARK-3731] [PySpark] fix memory leak in PythonRDD The parent.getOrCompute() of PythonRDD is executed in a separate thread, so it should release the memory reserved for shuffle and unrolling in a finally block. You can merge this pull request into a Git repository by running: $ git pull https://github.com/davies/spark leak Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2668.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2668 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
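The pattern behind the fix — a separate computing thread must release the memory it reserved in a `finally` block, even when the computation fails — can be sketched generically in Python. The `MemoryManager` here is a hypothetical stand-in for Spark's per-thread shuffle/unroll accounting, not the actual PythonRDD code:

```python
import threading

class MemoryManager:
    """Hypothetical stand-in for per-thread shuffle/unroll memory accounting."""
    def __init__(self):
        self._reserved = {}            # thread id -> bytes reserved
        self._lock = threading.Lock()

    def reserve(self, nbytes):
        with self._lock:
            tid = threading.get_ident()
            self._reserved[tid] = self._reserved.get(tid, 0) + nbytes

    def release_all_for_current_thread(self):
        with self._lock:
            self._reserved.pop(threading.get_ident(), None)

    def outstanding(self):
        with self._lock:
            return dict(self._reserved)

manager = MemoryManager()

def compute_partition(fail=False):
    manager.reserve(1 << 20)           # reserve 1 MiB for unrolling
    try:
        if fail:
            raise RuntimeError("task failed")
        return [1, 2, 3]
    finally:
        # The leak described above: without this `finally`, memory reserved
        # by the separate computing thread was never released when it exited.
        manager.release_all_for_current_thread()

t = threading.Thread(target=compute_partition, name="writer-thread")
t.start()
t.join()
assert manager.outstanding() == {}     # nothing left reserved after the thread exits
```

The same guarantee must hold on the failure path, which is why the release lives in `finally` rather than after the return.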
[GitHub] spark pull request: [SPARK-3731] [PySpark] fix memory leak in Pyth...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2668#issuecomment-57977968 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21319/consoleFull) for PR 2668 at commit [`ae98be2`](https://github.com/apache/spark/commit/ae98be240b95aa6f838875c7a112b99bf748acba). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2805] akka 2.3.4
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/1685#issuecomment-57979326 LGTM, I have tested it locally by running the test suites (only the relevant ones). @pwendell Can you trigger Jenkins here? It should be okay to merge.
[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2667#issuecomment-57979727 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21318/
[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2667#issuecomment-57979724 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21318/consoleFull) for PR 2667 at commit [`3a5a6ff`](https://github.com/apache/spark/commit/3a5a6ffdb036f8432911184920193a4b8a007084). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class RankingMetrics(predictionAndLabels: RDD[(Array[Double], Array[Double])]) ` * `case class CacheTableCommand(tableName: String, plan: Option[LogicalPlan], isLazy: Boolean)` * `case class UncacheTableCommand(tableName: String) extends Command` * `case class CacheTableCommand(` * `case class UncacheTableCommand(tableName: String) extends LeafNode with Command ` * `case class DescribeCommand(child: SparkPlan, output: Seq[Attribute])(`
[GitHub] spark pull request: [SPARK-3731] [PySpark] fix memory leak in Pyth...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2668#issuecomment-57982093 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21319/consoleFull) for PR 2668 at commit [`ae98be2`](https://github.com/apache/spark/commit/ae98be240b95aa6f838875c7a112b99bf748acba). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class CacheTableCommand(tableName: String, plan: Option[LogicalPlan], isLazy: Boolean)` * `case class UncacheTableCommand(tableName: String) extends Command` * `case class CacheTableCommand(` * `case class UncacheTableCommand(tableName: String) extends LeafNode with Command ` * `case class DescribeCommand(child: SparkPlan, output: Seq[Attribute])(`
[GitHub] spark pull request: [SPARK-3731] [PySpark] fix memory leak in Pyth...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2668#issuecomment-57982095 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21319/
[GitHub] spark pull request: [SPARK-3808] PySpark fails to start in Windows
GitHub user tsudukim opened a pull request: https://github.com/apache/spark/pull/2669 [SPARK-3808] PySpark fails to start in Windows Fixed a syntax error in the *.cmd scripts. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tsudukim/spark feature/SPARK-3808 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2669.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2669 commit 7f804e6cb7001b1be372940eb186750e4154a83f Author: Masayoshi TSUZUKI <tsudu...@oss.nttdata.co.jp> Date: 2014-10-06T07:40:07Z [SPARK-3808] PySpark fails to start in Windows Modified syntax error of *.cmd script.
[GitHub] spark pull request: [SPARK-3808] PySpark fails to start in Windows
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2669#issuecomment-57983457 Can one of the admins verify this patch?
[GitHub] spark pull request: [Spark] RDD take() method: overestimate too mu...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/2648#issuecomment-57985017 Jenkins, test this please.
[GitHub] spark pull request: [Spark] RDD take() method: overestimate too mu...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/2648#issuecomment-57985280 Changes LGTM.
[GitHub] spark pull request: [Spark] RDD take() method: overestimate too mu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2648#issuecomment-57985301 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21320/consoleFull) for PR 2648 at commit [`a2aa36b`](https://github.com/apache/spark/commit/a2aa36b6838ff71941dab1d4af5c8e5f79fd4b4f). * This patch merges cleanly.
[GitHub] spark pull request: [Spark] RDD take() method: overestimate too mu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2648#issuecomment-57985492 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/269/consoleFull) for PR 2648 at commit [`a2aa36b`](https://github.com/apache/spark/commit/a2aa36b6838ff71941dab1d4af5c8e5f79fd4b4f). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/2667#discussion_r18447056 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala --- @@ -0,0 +1,108 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.mllib.evaluation + + +import org.apache.spark.SparkContext._ +import org.apache.spark.annotation.Experimental +import org.apache.spark.rdd.RDD + + +/** + * ::Experimental:: + * Evaluator for ranking algorithms. + * + * @param predictionAndLabels an RDD of (predicted ranking, ground truth set) pairs. --- End diff -- The inputs are really ranks, right? Would this not be more natural as `Int` then? I might have expected the inputs to be predicted and ground-truth scores instead, in which case `Double` makes sense. But then the methods would need to convert them to rankings.
[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/2667#discussion_r18447076 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala --- @@ -0,0 +1,108 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.mllib.evaluation + + +import org.apache.spark.SparkContext._ +import org.apache.spark.annotation.Experimental +import org.apache.spark.rdd.RDD + + +/** + * ::Experimental:: + * Evaluator for ranking algorithms. + * + * @param predictionAndLabels an RDD of (predicted ranking, ground truth set) pairs. + */ +@Experimental +class RankingMetrics(predictionAndLabels: RDD[(Array[Double], Array[Double])]) { + + /** + * Returns the precsion@k for each query --- End diff -- Might actually use `@return` here, but there is also no `k` in the code or docs. Is this the length of (both) arguments?
[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/2667#discussion_r18447094 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala --- @@ -0,0 +1,108 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.mllib.evaluation + + +import org.apache.spark.SparkContext._ +import org.apache.spark.annotation.Experimental +import org.apache.spark.rdd.RDD + + +/** + * ::Experimental:: + * Evaluator for ranking algorithms. + * + * @param predictionAndLabels an RDD of (predicted ranking, ground truth set) pairs. + */ +@Experimental +class RankingMetrics(predictionAndLabels: RDD[(Array[Double], Array[Double])]) { + + /** + * Returns the precsion@k for each query + */ + lazy val precAtK: RDD[Array[Double]] = predictionAndLabels.map { case (pred, lab) => +val labSet : Set[Double] = lab.toSet +val n = pred.length +val topkPrec = Array.fill[Double](n)(.0) +var (i, cnt) = (0, 0) --- End diff -- `0.0` instead of `.0`? And I am not sure it is helpful to initialize two or more variables on one line using a tuple.
[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/2667#discussion_r18447346 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala --- @@ -0,0 +1,108 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.mllib.evaluation + + +import org.apache.spark.SparkContext._ +import org.apache.spark.annotation.Experimental +import org.apache.spark.rdd.RDD + + +/** + * ::Experimental:: + * Evaluator for ranking algorithms. + * + * @param predictionAndLabels an RDD of (predicted ranking, ground truth set) pairs. + */ +@Experimental +class RankingMetrics(predictionAndLabels: RDD[(Array[Double], Array[Double])]) { + + /** + * Returns the precsion@k for each query + */ + lazy val precAtK: RDD[Array[Double]] = predictionAndLabels.map { case (pred, lab) => +val labSet : Set[Double] = lab.toSet --- End diff -- Given my previous comment, maybe I'm missing something, but isn't one of the two arguments always going to be 1 to n? Either you are ranking the predicted top n versus real rankings, or evaluating the predicted ranking of the known top n...? I think I would have expected the input to be the predicted top n items by ID or something, and the IDs of the real top n; then making a set and using `contains` makes sense.
[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/2667#discussion_r18447381 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/RankingMetricsSuite.scala --- @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.mllib.evaluation + +import org.scalatest.FunSuite + +import org.apache.spark.mllib.util.LocalSparkContext + +class RankingMetricsSuite extends FunSuite with LocalSparkContext { + test("Ranking metrics: map, ndcg") { +val predictionAndLabels = sc.parallelize( + Seq( +(Array[Double](1, 6, 2, 7, 8, 3, 9, 10, 4, 5), Array[Double](1, 2, 3, 4, 5)), +(Array[Double](4, 1, 5, 6, 2, 7, 3, 8, 9, 10), Array[Double](1, 2, 3)) + ), 2) +val eps: Double = 1e-5 + +val metrics = new RankingMetrics(predictionAndLabels) +val precAtK = metrics.precAtK.collect() +val avePrec = metrics.avePrec.collect() +val map = metrics.meanAvePrec +val ndcg = metrics.ndcg.collect() +val aveNdcg = metrics.meanNdcg + +assert(math.abs(precAtK(0)(4) - 0.4) < eps) --- End diff -- Check out the `~==` operator used in other tests.
[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/2667#discussion_r18447409 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala --- @@ -0,0 +1,108 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.mllib.evaluation + + +import org.apache.spark.SparkContext._ +import org.apache.spark.annotation.Experimental +import org.apache.spark.rdd.RDD + + +/** + * ::Experimental:: + * Evaluator for ranking algorithms. + * + * @param predictionAndLabels an RDD of (predicted ranking, ground truth set) pairs. + */ +@Experimental +class RankingMetrics(predictionAndLabels: RDD[(Array[Double], Array[Double])]) { + + /** + * Returns the precsion@k for each query + */ + lazy val precAtK: RDD[Array[Double]] = predictionAndLabels.map { case (pred, lab) => +val labSet : Set[Double] = lab.toSet +val n = pred.length +val topkPrec = Array.fill[Double](n)(.0) +var (i, cnt) = (0, 0) + +while (i < n) { + if (labSet.contains(pred(i))) { +cnt += 1 + } + topkPrec(i) = cnt.toDouble / (i + 1) + i += 1 +} +topkPrec + } + + /** + * Returns the average precision for each query + */ + lazy val avePrec: RDD[Double] = predictionAndLabels.map { case (pred, lab) => +val labSet: Set[Double] = lab.toSet +var (i, cnt, precSum) = (0, 0, .0) +val n = pred.length + +while (i < n) { + if (labSet.contains(pred(i))) { +cnt += 1 +precSum += cnt.toDouble / (i + 1) + } + i += 1 +} +precSum / labSet.size + } + + /** + * Returns the mean average precision (MAP) of all the queries + */ + lazy val meanAvePrec: Double = computeMean(avePrec) + + /** + * Returns the normalized discounted cumulative gain for each query + */ + lazy val ndcg: RDD[Double] = predictionAndLabels.map { case (pred, lab) => +val labSet = lab.toSet +val n = math.min(pred.length, labSet.size) +var (maxDcg, dcg, i) = (.0, .0, 0) +while (i < n) { + /* Calculate 1/log2(i + 2) */ + val gain = 1.0 / (math.log(i + 2) / math.log(2)) + if (labSet.contains(pred(i))) { +dcg += gain + } + maxDcg += gain + i += 1 +} +dcg / maxDcg + } + + /** + * Returns the mean NDCG of all the queries + */ + lazy val meanNdcg: Double = computeMean(ndcg) + + private def computeMean(data: RDD[Double]): Double = { --- End diff -- `RDD[Double]` already has a `mean()` method; no need to reimplement it.
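The metrics under review — precision@k, average precision, and binary-relevance NDCG for a single query — can be cross-checked with a short Python sketch. This is a plain re-implementation for illustration, not the PR's Scala code, using the first query from the PR's test suite:

```python
import math

def prec_at_k(pred, label_set):
    """precision@k for k = 1..len(pred): fraction of the top k
    predictions that appear in the ground-truth set."""
    out, cnt = [], 0
    for i, p in enumerate(pred):
        if p in label_set:
            cnt += 1
        out.append(cnt / (i + 1))
    return out

def average_precision(pred, label_set):
    """Sum of precision@k at each position k holding a relevant item,
    normalized by the size of the ground-truth set."""
    cnt, prec_sum = 0, 0.0
    for i, p in enumerate(pred):
        if p in label_set:
            cnt += 1
            prec_sum += cnt / (i + 1)
    return prec_sum / len(label_set)

def ndcg(pred, label_set):
    """Binary-relevance NDCG truncated at min(len(pred), |labels|),
    with gain 1/log2(i + 2) at position i."""
    n = min(len(pred), len(label_set))
    dcg = max_dcg = 0.0
    for i in range(n):
        gain = 1.0 / math.log2(i + 2)
        if pred[i] in label_set:
            dcg += gain
        max_dcg += gain
    return dcg / max_dcg

# First query from the PR's test suite:
pred = [1, 6, 2, 7, 8, 3, 9, 10, 4, 5]
labels = {1, 2, 3, 4, 5}
print(prec_at_k(pred, labels)[4])   # 0.4: 2 of the top 5 predictions are relevant
```

The last line reproduces the `precAtK(0)(4) - 0.4` check in the test suite: among the top five predictions (1, 6, 2, 7, 8), only 1 and 2 are in the ground-truth set.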
[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/2667#discussion_r18447432 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala --- @@ -0,0 +1,108 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.mllib.evaluation + + +import org.apache.spark.SparkContext._ +import org.apache.spark.annotation.Experimental +import org.apache.spark.rdd.RDD + + +/** + * ::Experimental:: + * Evaluator for ranking algorithms. + * + * @param predictionAndLabels an RDD of (predicted ranking, ground truth set) pairs. + */ +@Experimental +class RankingMetrics(predictionAndLabels: RDD[(Array[Double], Array[Double])]) { --- End diff -- Might check that the arguments are not empty and of equal length?
[GitHub] spark pull request: [Spark] RDD take() method: overestimate too mu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2648#issuecomment-57996887 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21320/consoleFull)** for PR 2648 at commit [`a2aa36b`](https://github.com/apache/spark/commit/a2aa36b6838ff71941dab1d4af5c8e5f79fd4b4f) after a configured wait of `120m`.
[GitHub] spark pull request: [Spark] RDD take() method: overestimate too mu...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2648#issuecomment-57996893 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21320/
[GitHub] spark pull request: [Spark] RDD take() method: overestimate too mu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2648#issuecomment-57997093 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/269/consoleFull)** for PR 2648 at commit [`a2aa36b`](https://github.com/apache/spark/commit/a2aa36b6838ff71941dab1d4af5c8e5f79fd4b4f) after a configured wait of `120m`.
[GitHub] spark pull request: SPARK-3811 [CORE] More robust / standard Utils...
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/2670 SPARK-3811 [CORE] More robust / standard Utils.deleteRecursively, Utils.createTempDir

I noticed a few issues with how temp directories are created and deleted:

*Minor*

* Guava's `Files.createTempDir()` plus `File.deleteOnExit()` is used in many tests to make a temp dir, but `Utils.createTempDir()` seems to be the standard Spark mechanism.
* The call to `File.deleteOnExit()` could be pushed into `Utils.createTempDir()` as well, along with this replacement.
* _I messed up the message in an exception in `Utils` in SPARK-3794; fixed here._

*Bit Less Minor*

* `Utils.deleteRecursively()` fails immediately if any `IOException` occurs, instead of trying to delete the remaining files and subdirectories. I've observed this leave temp dirs around. I suggest changing it to continue in the face of an exception and, at the end, throw one of the possibly several exceptions that occurred.
* `Utils.createTempDir()` adds a JVM shutdown hook every time the method is called, even when the new dir sits inside an already-registered dir, since that check only happens inside the hook. However, `Utils` already manages a set of all dirs to delete on shutdown, called `shutdownDeletePaths`. A single hook can be registered to delete all of these on exit. This is how Tachyon temp paths are cleaned up in `TachyonBlockManager`.

I noticed a few other things that might be changed but wanted to ask first:

* Shouldn't the set of dirs to delete be `File`, not just `String` paths?
* `Utils` manages the set of `TachyonFile` that have been registered for deletion, but the shutdown hook is managed in `TachyonBlockManager`. Should this logic not live together, and not in `Utils`? It's more specific to Tachyon, and looks a bit odd to import in such a generic place.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/srowen/spark SPARK-3811 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2670.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2670

commit 3a0faa4e151cac3d9d9b4b4ee87cd024d260c9b1
Author: Sean Owen so...@cloudera.com
Date: 2014-10-06T10:19:01Z

    Standardize on Utils.createTempDir instead of Files.createTempDir

commit da0146de0fd21f375843afb47441a2d9a4db146d
Author: Sean Owen so...@cloudera.com
Date: 2014-10-06T10:19:30Z

    Make Utils.deleteRecursively try to delete all paths even when an exception occurs; use one shutdown hook instead of one per method call to delete temp dirs
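The two "bit less minor" fixes above can be illustrated outside Spark. The following is a minimal Python sketch, not Spark's actual Scala code: `delete_recursively` keeps deleting past individual failures and re-raises the last error at the end, and `create_temp_dir` registers one process-exit hook for a shared set of paths, analogous to the single hook over `shutdownDeletePaths`. All names here are illustrative.

```python
import atexit
import os
import shutil
import tempfile

_dirs_to_delete = set()   # analogue of Utils.shutdownDeletePaths
_hook_registered = False


def create_temp_dir(prefix="spark-"):
    """Create a temp dir and register it with a single exit hook,
    instead of registering one new hook per call."""
    global _hook_registered
    d = tempfile.mkdtemp(prefix=prefix)
    _dirs_to_delete.add(d)
    if not _hook_registered:
        # One hook cleans up every registered dir on interpreter exit.
        atexit.register(lambda: [shutil.rmtree(p, ignore_errors=True)
                                 for p in _dirs_to_delete])
        _hook_registered = True
    return d


def delete_recursively(path):
    """Try to delete every entry even if some fail, then re-raise the
    last error at the end, mirroring the behaviour the PR proposes."""
    last_error = None
    for root, dirs, files in os.walk(path, topdown=False):
        for name in files + dirs:
            target = os.path.join(root, name)
            try:
                # dirs are visited bottom-up, so they are empty by now
                os.rmdir(target) if os.path.isdir(target) else os.remove(target)
            except OSError as e:
                last_error = e
    try:
        os.rmdir(path)
    except OSError as e:
        last_error = e
    if last_error is not None:
        raise last_error
```

The design point is that a partial failure no longer aborts the walk, so a single undeletable file cannot strand an entire temp tree.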
[GitHub] spark pull request: [SPARK-3809][SQL]fix HiveThriftServer2Suite to...
GitHub user scwf opened a pull request: https://github.com/apache/spark/pull/2671 [SPARK-3809][SQL] Fix HiveThriftServer2Suite to make it work correctly

Currently HiveThriftServer2Suite is a fake one: the HiveThriftServer is not even started there. Issues here:

1. Thrift server not started. Testing will get this error: ERROR HiveThriftServer2Suite: Failed to start Hive Thrift server within 30 seconds java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
2. Thrift server not stopped. After the test finished, the thriftserver process did not exit.

This patch fixes these problems as follows:

1. Since the thriftserver is started as a daemon in https://github.com/apache/spark/pull/2509, the output of ```start-thriftserver.sh``` is redirected to a log file such as ```spark-kf-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-kf.out```, so check whether this file contains `ThriftBinaryCLIService listening on` to assert the server started successfully.
2. Start the server in ```beforeAll```.
3. Stop the server in ```afterAll```.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/scwf/spark fix-HiveThriftServer2Suite Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2671.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2671

commit c39d0a5eabc1f505117e711c17a48b089b266483
Author: scwf wangf...@huawei.com
Date: 2014-10-06T09:45:41Z

    fix HiveThriftServer2Suite

commit 0081a508f147a2b7bd7065149b8d3da308ba3d37
Author: scwf wangf...@huawei.com
Date: 2014-10-06T09:51:14Z

    fix code format
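The readiness check described above amounts to polling a log file until a marker line appears. A minimal Python sketch of that idea follows; the function name and parameters are illustrative, not taken from the patch (the suite itself is Scala).

```python
import os
import time


def wait_for_log_marker(log_file, marker, max_tries=30, delay=1.0):
    """Poll log_file until `marker` appears, up to max_tries attempts.

    Returns True once the marker is seen, False on timeout. Tolerates
    the file not existing yet, since a daemon creates it asynchronously.
    """
    for _ in range(max_tries):
        if os.path.exists(log_file):
            with open(log_file, errors="replace") as f:
                if marker in f.read():
                    return True
        time.sleep(delay)
    return False
```

A caller would pass the daemon's `.out` file and the string `"ThriftBinaryCLIService listening on"` as the marker.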
[GitHub] spark pull request: SPARK-3811 [CORE] More robust / standard Utils...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2670#issuecomment-57998800 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21321/consoleFull) for PR 2670 at commit [`da0146d`](https://github.com/apache/spark/commit/da0146de0fd21f375843afb47441a2d9a4db146d).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3809][SQL]fix HiveThriftServer2Suite to...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2671#issuecomment-57999047 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-3810][SQL] Makes PreInsertionCasts hand...
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/2672 [SPARK-3810][SQL] Makes PreInsertionCasts handle partitions properly

Takes partition keys into account when applying the `PreInsertionCasts` rule.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/liancheng/spark fix-pre-insert-casts Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2672.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2672

commit def1a1a316961d1209fef0046154319e9bfca260
Author: Cheng Lian lian.cs@gmail.com
Date: 2014-10-06T10:03:46Z

    Makes PreInsertionCasts handle partitions properly
[GitHub] spark pull request: [SPARK-3810][SQL] Makes PreInsertionCasts hand...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2672#issuecomment-58002727 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21322/consoleFull) for PR 2672 at commit [`def1a1a`](https://github.com/apache/spark/commit/def1a1a316961d1209fef0046154319e9bfca260).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3806][SQL]Minor fix for CliSuite
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2666#issuecomment-58003170 ok to test
[GitHub] spark pull request: [SPARK-3806][SQL]Minor fix for CliSuite
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2666#issuecomment-58003178 Good catch! LGTM.
[GitHub] spark pull request: Build changes to publish effective pom.
GitHub user ScrapCodes opened a pull request: https://github.com/apache/spark/pull/2673 Build changes to publish effective pom.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/ScrapCodes/spark-1 build-changes-effective-pom Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2673.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2673

commit 83072fb3bd08f13874373883717ca5700a468eb3
Author: Prashant Sharma prashan...@imaginea.com
Date: 2014-10-06T10:22:25Z

    help plugin

commit cfe0531d3e49241b57e384c3dd98c0da7cf1c4ff
Author: Prashant Sharma prashan...@imaginea.com
Date: 2014-10-06T11:13:20Z

    Switched to a custom plugin since maven-help-plugin was not much of a help.
[GitHub] spark pull request: [SPARK-3809][SQL]fix HiveThriftServer2Suite to...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/2671#issuecomment-58003250 cc @liancheng
[GitHub] spark pull request: Build changes to publish effective pom.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2673#issuecomment-58003572 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21323/consoleFull) for PR 2673 at commit [`cfe0531`](https://github.com/apache/spark/commit/cfe0531d3e49241b57e384c3dd98c0da7cf1c4ff).
* This patch merges cleanly.
[GitHub] spark pull request: SPARK-3807: SparkSql does not work for tables ...
GitHub user chiragaggarwal opened a pull request: https://github.com/apache/spark/pull/2674 SPARK-3807: SparkSql does not work for tables created using custom serde

You can merge this pull request into a Git repository by running: $ git pull https://github.com/chiragaggarwal/spark branch-1.1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2674.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2674

commit 5c73b72b917ad0cb16b76411f961731527022e36
Author: chirag chirag.aggar...@guavus.com
Date: 2014-10-06T11:10:30Z

    SPARK-3807: SparkSql does not work for tables created using custom serde
[GitHub] spark pull request: SPARK-3807: SparkSql does not work for tables ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2674#issuecomment-58003794 Can one of the admins verify this patch?
[GitHub] spark pull request: SPARK-3807: SparkSql does not work for tables ...
Github user chiragaggarwal commented on the pull request: https://github.com/apache/spark/pull/2674#issuecomment-58003913 SparkSql crashes on selecting tables using custom serde. The following exception is seen on running a query like 'select * from table_name limit 1':

ERROR CliDriver: org.apache.hadoop.hive.serde2.SerDeException: java.lang.NullPointerException
    at org.apache.hadoop.hive.serde2.thrift.ThriftDeserializer.initialize(ThriftDeserializer.java:68)
    at org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializer(TableDesc.java:80)
    at org.apache.spark.sql.hive.execution.HiveTableScan.addColumnMetadataToConf(HiveTableScan.scala:86)
    at org.apache.spark.sql.hive.execution.HiveTableScan.<init>(HiveTableScan.scala:100)
    at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:188)
    at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:188)
    at org.apache.spark.sql.SQLContext$SparkPlanner.pruneFilterProject(SQLContext.scala:364)
    at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:184)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
    at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:280)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
    at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:402)
    at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:400)
    at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:406)
    at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:406)
    at org.apache.spark.sql.hive.HiveContext$QueryExecution.stringResult(HiveContext.scala:406)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:59)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:291)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:226)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NullPointerException
[GitHub] spark pull request: [SPARK-3812] [BUILD] Adapt maven build to publ...
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/2673#issuecomment-58004659 @pwendell Take a look, whenever you get time. It would be good if we can publish https://github.com/ScrapCodes/effective-pom-plugin.
[GitHub] spark pull request: SPARK-3811 [CORE] More robust / standard Utils...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2670#issuecomment-58004841 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21321/consoleFull) for PR 2670 at commit [`da0146d`](https://github.com/apache/spark/commit/da0146de0fd21f375843afb47441a2d9a4db146d).
* This patch **passes** unit tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: SPARK-3811 [CORE] More robust / standard Utils...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2670#issuecomment-58004848 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21321/
Test PASSed.
[GitHub] spark pull request: [SPARK-3812] [BUILD] Adapt maven build to publ...
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/2673#issuecomment-58005087 I will have to add a similar thing for http://maven.apache.org/plugins/maven-deploy-plugin/deploy-file-mojo.html, but I am not sure about the repository URL field.
[GitHub] spark pull request: [SPARK-3810][SQL] Makes PreInsertionCasts hand...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2672#issuecomment-58007097 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21322/
Test PASSed.
[GitHub] spark pull request: [SPARK-3810][SQL] Makes PreInsertionCasts hand...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2672#issuecomment-58007089 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21322/consoleFull) for PR 2672 at commit [`def1a1a`](https://github.com/apache/spark/commit/def1a1a316961d1209fef0046154319e9bfca260).
* This patch **passes** unit tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3812] [BUILD] Adapt maven build to publ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2673#issuecomment-58009844 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21323/consoleFull) for PR 2673 at commit [`cfe0531`](https://github.com/apache/spark/commit/cfe0531d3e49241b57e384c3dd98c0da7cf1c4ff).
* This patch **passes** unit tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class CacheTableCommand(tableName: String, plan: Option[LogicalPlan], isLazy: Boolean)`
  * `case class UncacheTableCommand(tableName: String) extends Command`
  * `case class CacheTableCommand(`
  * `case class UncacheTableCommand(tableName: String) extends LeafNode with Command `
  * `case class DescribeCommand(child: SparkPlan, output: Seq[Attribute])(`
[GitHub] spark pull request: [SPARK-3812] [BUILD] Adapt maven build to publ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2673#issuecomment-58009851 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21323/
Test PASSed.
[GitHub] spark pull request: [SPARK-3809][SQL]fix HiveThriftServer2Suite to...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2671#issuecomment-58020543 ok to test
[GitHub] spark pull request: [SPARK-3808] PySpark fails to start in Windows
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2669#issuecomment-58020951 LGTM. I think this issue is caused by #2481. @andrewor14, can you take a look at this change, since you reviewed #2481?
[GitHub] spark pull request: [SPARK-3809][SQL]fix HiveThriftServer2Suite to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2671#issuecomment-58021025 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/271/consoleFull) for PR 2671 at commit [`0081a50`](https://github.com/apache/spark/commit/0081a508f147a2b7bd7065149b8d3da308ba3d37).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3806][SQL]Minor fix for CliSuite
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2666#issuecomment-58020996 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/270/consoleFull) for PR 2666 at commit [`11430db`](https://github.com/apache/spark/commit/11430dbb01c78b4244ab626e626153747bb1d30a).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3809][SQL]fix HiveThriftServer2Suite to...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2671#issuecomment-58022609 @scwf Thanks, this is a good catch. However, I should mention that HiveThriftServer2Suite was known to be flaky even before the Thrift server was made a daemon. I had opened #2214 to try to fix this issue, but unfortunately Jenkins failed for an unknown reason that I couldn't reproduce locally. After numerous unsuccessful tries, I haven't got time to get it done yet. Sorry for the trouble...

The essential issue fixed in #2214 is that the exception caught in HiveThriftServer2Suite is not re-thrown in the `catch` clause. That's why it always passes no matter what exception is thrown.

Back to this PR, I have several comments:

1. Personally I don't prefer to start/stop the server process in `beforeAll`/`afterAll`. I'd like to make sure every test is executed against a Thrift server process with clean state.
2. The sleeps introduced in this PR can be eliminated by starting a `tail` process to watch the log file and then monitoring the output of the `tail` process. Since an empty log file does no harm, we can create a new empty log file to ensure the file exists before executing `tail`.
3. The log file should be removed after stopping the server process.

Since the Jenkins failure in #2214 is really tricky to fix, and without fixing that we couldn't make any change to `HiveThriftServer2Suite`, I'm going to open a new PR to have another try at fixing this issue together with those left in #2214.
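The `tail`-based approach in comment 2 can be sketched outside the suite. This is a hypothetical Python version (the suite itself is Scala; the helper name and timeout are made up): it pre-creates an empty log file so `tail` never races with the daemon creating it, then watches `tail -F` output instead of sleeping for fixed intervals.

```python
import select
import subprocess
import time


def watch_log_for(log_path, marker, timeout=10.0):
    """Spawn `tail -F` on log_path and scan its output for `marker`.

    Creating the file empty up front means tail can start immediately,
    and an empty log file does no harm.
    """
    open(log_path, "a").close()  # ensure the file exists before tail runs
    proc = subprocess.Popen(
        ["tail", "-n", "+1", "-F", log_path],
        stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
    deadline = time.time() + timeout
    try:
        while time.time() < deadline:
            # wait up to 0.5s for tail to produce output
            ready, _, _ = select.select([proc.stdout], [], [], 0.5)
            if ready:
                line = proc.stdout.readline().decode(errors="replace")
                if marker in line:
                    return True
        return False
    finally:
        proc.kill()
```

Compared with fixed sleeps, this returns as soon as the marker line is written, and the timeout only bounds the worst case.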
[GitHub] spark pull request: [SPARK-3787] Assembly jar name is wrong when w...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2647#issuecomment-58023284 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21324/consoleFull) for PR 2647 at commit [`b2318eb`](https://github.com/apache/spark/commit/b2318eb227d59dbd61d2dd8a24592cdc2f64ac2b).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3788] [yarn] Fix compareFs to do the ri...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/2649#issuecomment-58023951 The changes look fine, although I don't think this applies to federation. My understanding of federation is that the namespace isn't viewable on the client side. The client still picks one of the federated namenodes (using the normal host:port), but on the cluster side it uses the namespace.
[GitHub] spark pull request: [SPARK-3809][SQL]fix HiveThriftServer2Suite to...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2671#discussion_r18458935

--- Diff: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala ---
@@ -70,37 +73,39 @@ class HiveThriftServer2Suite extends FunSuite with Logging {
     val serverStarted = Promise[Unit]()
     val buffer = new ArrayBuffer[String]()
+    val startString =
+      "starting org.apache.spark.sql.hive.thriftserver.HiveThriftServer2, logging to "
+    val maxTries = 30

     def captureOutput(source: String)(line: String) {
       buffer += s"$source $line"
-      if (line.contains("ThriftBinaryCLIService listening on")) {
-        serverStarted.success(())
+      if (line.contains(startString)) {
+        val logFile = new File(line.substring(startString.length))
+        var tryNum = 0
+        // This is a hack to wait for logFile to be ready
+        Thread.sleep(5000)
+        // logFile may not have finished; try every second
+        while (!logFile.exists() || (!fileToString(logFile).contains(
+          "ThriftBinaryCLIService listening on") && tryNum < maxTries)) {
--- End diff --

`tryNum` is never increased.
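The bug the reviewer points at is the classic bounded-polling mistake: the loop condition reads `tryNum` but nothing ever increments it. A minimal corrected sketch of that loop shape (with a made-up `logReady` predicate standing in for the suite's real `logFile`/`fileToString` check):

```scala
// Corrected shape of the bounded wait loop: without `tryNum += 1` the
// condition `tryNum < maxTries` never changes, so the loop can spin forever
// when the log never becomes ready.
def waitUntilReady(logReady: () => Boolean, maxTries: Int, intervalMs: Long = 1000L): Boolean = {
  var tryNum = 0
  while (!logReady() && tryNum < maxTries) {
    Thread.sleep(intervalMs)
    tryNum += 1 // the missing increment
  }
  logReady()
}
```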
[GitHub] spark pull request: [SPARK-3802][BUILD] Scala version is wrong in ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2661#issuecomment-58024089 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21326/consoleFull) for PR 2661 at commit [`7090e17`](https://github.com/apache/spark/commit/7090e17695c4a1a095ddf31d33012f0c323e988b). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-58024065 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21327/consoleFull) for PR 2520 at commit [`fccdad2`](https://github.com/apache/spark/commit/fccdad2525433d693a443e6938de110fcb56afce). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3809][SQL]fix HiveThriftServer2Suite to...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2671#discussion_r18459115

--- Diff: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala ---
@@ -70,37 +73,39 @@ class HiveThriftServer2Suite extends FunSuite with Logging {
     val serverStarted = Promise[Unit]()
     val buffer = new ArrayBuffer[String]()
+    val startString =
+      "starting org.apache.spark.sql.hive.thriftserver.HiveThriftServer2, logging to "
+    val maxTries = 30

     def captureOutput(source: String)(line: String) {
       buffer += s"$source $line"
-      if (line.contains("ThriftBinaryCLIService listening on")) {
-        serverStarted.success(())
+      if (line.contains(startString)) {
+        val logFile = new File(line.substring(startString.length))
+        var tryNum = 0
+        // This is a hack to wait for logFile to be ready
+        Thread.sleep(5000)
+        // logFile may have not finished; try every second
+        while (!logFile.exists() || (!fileToString(logFile).contains(
+          "ThriftBinaryCLIService listening on") && tryNum < maxTries)) {
+          Thread.sleep(1000)
+        }
+        if (fileToString(logFile).contains("ThriftBinaryCLIService listening on")) {
+          serverStarted.success(())
+        } else {
+          throw new TimeoutException()
+        }
       }
     }
-
     val process = Process(command).run(
       ProcessLogger(captureOutput("stdout"), captureOutput("stderr")))

     Future {
       val exitValue = process.exitValue()
-      logInfo(s"Spark SQL Thrift server process exit value: $exitValue")
+      logInfo(s"Start Spark SQL Thrift server process exit value: $exitValue")
--- End diff --

Why "Start" here? When this line is executed, the server process has already ended.
[GitHub] spark pull request: [SPARK-3758] [Windows] Wrong EOL character in ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2612#issuecomment-58024460 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21325/
[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/2577#discussion_r18459132

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -383,40 +405,82 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
     }
   }

+  /**
+   * This system security manager applies to the entire process.
+   * Its main purpose is to handle the case where the user code does a System.exit.
+   * This allows us to catch that and properly set the YARN application status and
+   * clean up if needed.
+   */
+  private def setupSystemSecurityManager() = {
+    try {
+      var stopped = false
+      System.setSecurityManager(new java.lang.SecurityManager() {
+        override def checkExit(paramInt: Int) {
+          if (!stopped) {
+            logInfo("In securityManager checkExit, exit code: " + paramInt)
+            if (paramInt == 0) {
+              finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
+            } else {
+              finish(FinalApplicationStatus.FAILED,
+                paramInt,
+                "User class exited with non-zero exit code")
+            }
+            stopped = true
+          }
+        }
+        // required for the checkExit to work properly
+        override def checkPermission(perm: java.security.Permission): Unit = {
+        }
+      })
+    }
+    catch {
+      case e: SecurityException =>
+        finish(FinalApplicationStatus.FAILED,
+          ApplicationMaster.EXIT_SECURITY,
+          "Error in setSecurityManager")
+        logError("Error in setSecurityManager:", e)
+    }
+  }
+
+  /**
+   * Start the user class, which contains the spark driver, in a separate Thread.
+   * If the main routine exits cleanly or exits with System.exit(0) we
+   * assume it was successful, for all other cases we assume failure.
+   *
+   * Returns the user thread that was started.
+   */
   private def startUserClass(): Thread = {
     logInfo("Starting the user JAR in a separate Thread")
     System.setProperty("spark.executor.instances", args.numExecutors.toString)
     val mainMethod = Class.forName(args.userClass, false,
       Thread.currentThread.getContextClassLoader).getMethod("main", classOf[Array[String]])

-    userClassThread = new Thread {
+    val userThread = new Thread {
       override def run() {
-        var status = FinalApplicationStatus.FAILED
         try {
-          // Copy
           val mainArgs = new Array[String](args.userArgs.size)
           args.userArgs.copyToArray(mainArgs, 0, args.userArgs.size)
           mainMethod.invoke(null, mainArgs)
-          // Some apps have System.exit(0) at the end. The user thread will stop here unless
-          // it has an uncaught exception thrown out. It needs a shutdown hook to set SUCCEEDED.
-          status = FinalApplicationStatus.SUCCEEDED
+          finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
+          logDebug("Done running users class")
         } catch {
           case e: InvocationTargetException =>
             e.getCause match {
              case _: InterruptedException =>
                 // Reporter thread can interrupt to stop user class
-              case e => throw e
+              case e: Throwable =>
--- End diff --

That is fine, but note you didn't comment on this one earlier; you commented somewhere else in the code. This one we end up re-throwing, so I wasn't as concerned with it. I can change it.
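The `checkExit` trick in the diff can be seen in isolation with a small self-contained sketch. Note this variant throws from `checkExit` and catches the exception so the caller can record the exit code, whereas the code under review instead records the status via `finish`; the permissive `checkPermission` override is the same in both. (`java.lang.SecurityManager` is deprecated on modern JDKs; this mirrors the JDK-7-era code being reviewed. All names here are illustrative, not Spark's.)

```scala
// Trap System.exit by throwing from checkExit; the thrown exception unwinds
// back to the caller instead of letting the JVM terminate.
class ExitTrapped(val code: Int) extends SecurityException

def captureExitCode(body: => Unit): Option[Int] = {
  val previous = System.getSecurityManager
  System.setSecurityManager(new SecurityManager {
    override def checkExit(status: Int): Unit = throw new ExitTrapped(status)
    // required for checkExit to work: permit everything else
    override def checkPermission(perm: java.security.Permission): Unit = {}
  })
  try { body; None }
  catch { case e: ExitTrapped => Some(e.code) }
  finally { System.setSecurityManager(previous) }
}
```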
[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/2577#discussion_r18459205

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -383,40 +405,82 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
     }
   }

+  /**
+   * This system security manager applies to the entire process.
+   * Its main purpose is to handle the case where the user code does a System.exit.
+   * This allows us to catch that and properly set the YARN application status and
+   * clean up if needed.
+   */
+  private def setupSystemSecurityManager() = {
+    try {
+      var stopped = false
+      System.setSecurityManager(new java.lang.SecurityManager() {
+        override def checkExit(paramInt: Int) {
+          if (!stopped) {
+            logInfo("In securityManager checkExit, exit code: " + paramInt)
+            if (paramInt == 0) {
+              finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
+            } else {
+              finish(FinalApplicationStatus.FAILED,
+                paramInt,
+                "User class exited with non-zero exit code")
+            }
+            stopped = true
+          }
+        }
+        // required for the checkExit to work properly
+        override def checkPermission(perm: java.security.Permission): Unit = {
+        }
--- End diff --

In the future, please clarify what you want bumped up: you said this before, and I thought you meant removing the extra space between lines 430 and 431. I assume you actually mean the `}`.
[GitHub] spark pull request: [SPARK-3758] [Windows] Wrong EOL character in ...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2612#issuecomment-58024665 retest this please.
[GitHub] spark pull request: [SPARK-3809][SQL]fix HiveThriftServer2Suite to...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2671#discussion_r18459283

--- Diff: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala ---
@@ -123,14 +128,45 @@ class HiveThriftServer2Suite extends FunSuite with Logging {
           |=
         """.stripMargin, cause)
     } finally {
-      warehousePath.delete()
-      metastorePath.delete()
       process.destroy()
     }
   }

+  override def afterAll() {
+    warehousePath.delete()
+    metastorePath.delete()
+    stopThriftserver
+  }
+
+  def stopThriftserver: Unit = {
+    val stopScript = "../../sbin/stop-thriftserver.sh".split("/").mkString(File.separator)
+    val builder = new ProcessBuilder(stopScript)
+    val process = builder.start()
+    new Thread("read stderr") {
+      override def run() {
+        for (line <- Source.fromInputStream(process.getErrorStream).getLines()) {
+          System.err.println(line)
+        }
+      }
+    }.start()
+    val output = new StringBuffer
+    val stdoutThread = new Thread("read stdout") {
+      override def run() {
+        for (line <- Source.fromInputStream(process.getInputStream).getLines()) {
+          output.append(line)
--- End diff --

`output` is never used. Maybe you intended to print it?
[GitHub] spark pull request: [3809][SQL] Fixes test suites in hive-thriftse...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2675#issuecomment-58024854 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21328/consoleFull) for PR 2675 at commit [`5094bb4`](https://github.com/apache/spark/commit/5094bb446922875b41bfaf06fc54510d6ef9b22e). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3758] [Windows] Wrong EOL character in ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2612#issuecomment-58025658 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21329/consoleFull) for PR 2612 at commit [`33376b1`](https://github.com/apache/spark/commit/33376b181a361b04fea7d6f02565fa9914c43350). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3809][SQL]fix HiveThriftServer2Suite to...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2671#discussion_r18459638

--- Diff: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala ---
@@ -123,14 +128,45 @@ class HiveThriftServer2Suite extends FunSuite with Logging {
           |=
         """.stripMargin, cause)
     } finally {
-      warehousePath.delete()
-      metastorePath.delete()
       process.destroy()
     }
   }

+  override def afterAll() {
+    warehousePath.delete()
+    metastorePath.delete()
+    stopThriftserver
+  }
+
+  def stopThriftserver: Unit = {
+    val stopScript = "../../sbin/stop-thriftserver.sh".split("/").mkString(File.separator)
+    val builder = new ProcessBuilder(stopScript)
--- End diff --

Using the Scala process API can be much simpler :)
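For illustration, the `scala.sys.process` version the reviewer has in mind might look roughly like this sketch: one call replaces the `ProcessBuilder` plus the two hand-rolled reader threads, and it also keeps the captured stdout from going unused. The helper name is made up; in the suite, `command` would be the `stop-thriftserver.sh` path built in the diff.

```scala
import scala.sys.process._

// Run a command, capturing stdout and echoing stderr, in one call.
// ProcessLogger wires up both output streams, so no reader threads are needed.
def runAndCapture(command: String): (Int, String) = {
  val output = new StringBuilder
  val exitCode = command.!(ProcessLogger(
    line => output.append(line).append('\n'), // stdout, kept so it is actually used
    line => System.err.println(line)))        // stderr, echoed as before
  (exitCode, output.toString)
}
```

`command.!` blocks until the process exits and returns its exit code, which also makes the result easy to assert on in a test.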
[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2577#issuecomment-58026499 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21330/consoleFull) for PR 2577 at commit [`9c2efbf`](https://github.com/apache/spark/commit/9c2efbfd8d199bf89f911e44c7b07c6afe6b15bd). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3809][SQL]fix HiveThriftServer2Suite to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2671#issuecomment-58029079 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/271/consoleFull) for PR 2671 at commit [`0081a50`](https://github.com/apache/spark/commit/0081a508f147a2b7bd7065149b8d3da308ba3d37). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3806][SQL]Minor fix for CliSuite
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2666#issuecomment-58029017 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/270/consoleFull) for PR 2666 at commit [`11430db`](https://github.com/apache/spark/commit/11430dbb01c78b4244ab626e626153747bb1d30a). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class CacheTableCommand(tableName: String, plan: Option[LogicalPlan], isLazy: Boolean)` * `case class UncacheTableCommand(tableName: String) extends Command` * `case class CacheTableCommand(` * `case class UncacheTableCommand(tableName: String) extends LeafNode with Command ` * `case class DescribeCommand(child: SparkPlan, output: Seq[Attribute])(`
[GitHub] spark pull request: [SPARK-3809][SQL]fix HiveThriftServer2Suite to...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/2671#issuecomment-58029661 Hi @liancheng, thanks for your comments, they are very useful. 1. Actually I tried starting a server for every test case, but the second one failed with "org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 running as process 82266. Stop it first." Now I see the reason from your patch: a Thread.sleep has to be introduced because the kill command used in stop-thriftserver.sh is not synchronous. 2. Using a `tail` process is OK; I actually thought about it. Since the log file won't be big, I'd rather use `fileToString` here. 3. Yeah, the log file should be removed here.
[GitHub] spark pull request: [SPARK-3778] newAPIHadoopRDD doesn't properly ...
GitHub user tgravescs opened a pull request: https://github.com/apache/spark/pull/2676 [SPARK-3778] newAPIHadoopRDD doesn't properly pass credentials for secure hdfs https://issues.apache.org/jira/browse/SPARK-3778 This affects anyone trying to access secure HDFS with something like:

    val lines = {
      val hconf = new Configuration()
      hconf.set("mapred.input.dir", "mydir")
      hconf.set("textinputformat.record.delimiter", "\003432\n")
      sc.newAPIHadoopRDD(hconf, classOf[TextInputFormat], classOf[LongWritable], classOf[Text])
    }

You can merge this pull request into a Git repository by running: $ git pull https://github.com/tgravescs/spark SPARK-3778 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2676.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2676 commit c3d6b83332b1ba370bff837d7be09ffd30243262 Author: Thomas Graves tgra...@apache.org Date: 2014-10-06T14:53:29Z newAPIHadoopRDD doesn't properly pass credentials for secure hdfs on yarn
[GitHub] spark pull request: [3809][SQL] Fixes test suites in hive-thriftse...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2675#issuecomment-58030626 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21328/consoleFull) for PR 2675 at commit [`5094bb4`](https://github.com/apache/spark/commit/5094bb446922875b41bfaf06fc54510d6ef9b22e). * This patch **fails** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3778] newAPIHadoopRDD doesn't properly ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2676#issuecomment-58030636 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21331/consoleFull) for PR 2676 at commit [`c3d6b83`](https://github.com/apache/spark/commit/c3d6b83332b1ba370bff837d7be09ffd30243262). * This patch merges cleanly.
[GitHub] spark pull request: [3809][SQL] Fixes test suites in hive-thriftse...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2675#issuecomment-58030638 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21328/
[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/2126#issuecomment-58031929 thanks @jongyoul, the changes look fine to me, but I'll leave the final review to someone who knows the mesos scheduler.
[GitHub] spark pull request: [SPARK-3809][SQL]fix HiveThriftServer2Suite to...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2671#issuecomment-58033388 BTW, Jenkins passes because the exception re-throwing issue is not fixed in your PR :) You may check the full console output to be sure: https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/271/consoleFull And mine still suffers from the mysterious timeout. Keep digging...
[GitHub] spark pull request: [SPARK-3809][SQL]fix HiveThriftServer2Suite to...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2671#issuecomment-58033821 The key point of using `tail` is to eliminate the `sleep` call, rather than to avoid `fileToString`.
[GitHub] spark pull request: [SPARK-3787] Assembly jar name is wrong when w...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2647#issuecomment-58034384 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21324/
[GitHub] spark pull request: [SPARK-3787] Assembly jar name is wrong when w...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2647#issuecomment-58034375 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21324/consoleFull) for PR 2647 at commit [`b2318eb`](https://github.com/apache/spark/commit/b2318eb227d59dbd61d2dd8a24592cdc2f64ac2b). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-58035848 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21327/consoleFull) for PR 2520 at commit [`fccdad2`](https://github.com/apache/spark/commit/fccdad2525433d693a443e6938de110fcb56afce). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-58035857 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21327/
[GitHub] spark pull request: [SPARK-3802][BUILD] Scala version is wrong in ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2661#issuecomment-58036024 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21326/consoleFull) for PR 2661 at commit [`7090e17`](https://github.com/apache/spark/commit/7090e17695c4a1a095ddf31d33012f0c323e988b).
* This patch **passes** unit tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3809][SQL]fix HiveThriftServer2Suite to...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/2671#issuecomment-58036125 Got it, I will check this.
[GitHub] spark pull request: [SPARK-3802][BUILD] Scala version is wrong in ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2661#issuecomment-58036040 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21326/
[GitHub] spark pull request: [SPARK-3758] [Windows] Wrong EOL character in ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2612#issuecomment-58037051 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21329/
[GitHub] spark pull request: [SPARK-3758] [Windows] Wrong EOL character in ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2612#issuecomment-58037041 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21329/consoleFull) for PR 2612 at commit [`33376b1`](https://github.com/apache/spark/commit/33376b181a361b04fea7d6f02565fa9914c43350).
* This patch **passes** unit tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2577#issuecomment-58037924 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21330/consoleFull) for PR 2577 at commit [`9c2efbf`](https://github.com/apache/spark/commit/9c2efbfd8d199bf89f911e44c7b07c6afe6b15bd).
* This patch **passes** unit tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class CacheTableCommand(tableName: String, plan: Option[LogicalPlan], isLazy: Boolean)`
  * `case class UncacheTableCommand(tableName: String) extends Command`
  * `case class CacheTableCommand(`
  * `case class UncacheTableCommand(tableName: String) extends LeafNode with Command`
  * `case class DescribeCommand(child: SparkPlan, output: Seq[Attribute])(`
[GitHub] spark pull request: [SPARK-3627] - [yarn] - fix exit code and fina...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2577#issuecomment-58037937 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21330/
[GitHub] spark pull request: [SPARK-3816][SQL] Add configureOutputJobProper...
GitHub user alexliu68 opened a pull request: https://github.com/apache/spark/pull/2677 [SPARK-3816][SQL] Add configureOutputJobPropertiesForStorageHandler to job conf in SparkHadoopWriter class You can merge this pull request into a Git repository by running: $ git pull https://github.com/alexliu68/spark SPARK-SQL-3816 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2677.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2677 commit 14e3c63e49fab82a5ef386bb714984eca29f3bdc Author: Alex Liu alex_li...@yahoo.com Date: 2014-10-06T16:03:30Z [SPARK-3816][SQL] Add configureOutputJobPropertiesForStorageHandler to job conf in SparkHadoopWriter class
[GitHub] spark pull request: [SPARK-3816][SQL] Add configureOutputJobProper...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2677#issuecomment-58041075 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-3813][SQL] Support case when conditio...
GitHub user ravipesala opened a pull request: https://github.com/apache/spark/pull/2678 [SPARK-3813][SQL] Support case when conditional functions in Spark SQL. The CASE WHEN conditional function is already supported in Spark SQL, but there is no support for it in SqlParser, so this adds parser support for it. Author: ravipesala ravindra.pes...@huawei.com You can merge this pull request into a Git repository by running: $ git pull https://github.com/ravipesala/spark SPARK-3813 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2678.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2678 commit 709684f1036e1ab8595f94c2d3c5314c29a20063 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-10-06T15:42:02Z Changed parser to support case when function.
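As an illustration of the syntax this parser change enables (a hypothetical query, not taken from the patch; the `people` table and its `name` and `age` columns are assumptions), a CASE WHEN expression in a Spark SQL query looks like:

```sql
-- Standard SQL conditional expression over an assumed table `people`.
SELECT name,
       CASE WHEN age < 18 THEN 'minor'
            WHEN age < 65 THEN 'adult'
            ELSE 'senior'
       END AS age_group
FROM people
```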
[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...
Github user timothysc commented on the pull request: https://github.com/apache/spark/pull/2126#issuecomment-58042192 @tgravescs Seems ok, may I ask how you tested/verified, @jongyoul?
[GitHub] spark pull request: [SPARK-3778] newAPIHadoopRDD doesn't properly ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2676#issuecomment-58042286 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21331/consoleFull) for PR 2676 at commit [`c3d6b83`](https://github.com/apache/spark/commit/c3d6b83332b1ba370bff837d7be09ffd30243262).
* This patch **passes** unit tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3778] newAPIHadoopRDD doesn't properly ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2676#issuecomment-58042300 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21331/
[GitHub] spark pull request: [SPARK-3816][SQL] Add configureOutputJobProper...
Github user alexliu68 closed the pull request at: https://github.com/apache/spark/pull/2677
[GitHub] spark pull request: [SPARK-3813][SQL] Support case when conditio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2678#issuecomment-58042687 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-3816][SQL] Add configureOutputJobProper...
GitHub user alexliu68 reopened a pull request: https://github.com/apache/spark/pull/2677 [SPARK-3816][SQL] Add configureOutputJobPropertiesForStorageHandler to job conf in SparkHadoopWriter class You can merge this pull request into a Git repository by running: $ git pull https://github.com/alexliu68/spark SPARK-SQL-3816 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2677.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2677 commit e62af9fecd009a396a9ea2a362170977653472bb Author: Alex Liu alex_li...@yahoo.com Date: 2014-10-06T16:11:37Z [SPARK-3816][SQL] Add configureOutputJobPropertiesForStorageHandler to job conf in SparkHiveWriterContainer class
[GitHub] spark pull request: [SPARK-3809][SQL]fix HiveThriftServer2Suite to...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/2671#issuecomment-58043321 It is very strange; on my local machine it is OK. Hi @liancheng, can you get the log file? The stdout says: starting org.apache.spark.sql.hive.thriftserver.HiveThriftServer2, logging to /home/jenkins/workspace/NewSparkPullRequestBuilder/sbin/../logs/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-test02.amplab.out I think there may be an already existing Thrift server process that caused the server start to fail on Jenkins; we need the log file to check.
[GitHub] spark pull request: [SPARK-3816][SQL] Add configureOutputJobProper...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2677#issuecomment-58043529 Can one of the admins verify this patch?
[GitHub] spark pull request: [3809][SQL] Fixes test suites in hive-thriftse...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2675#issuecomment-58043428 The console output suggests that the CLI process and the Thrift server process were started and executed successfully but the timeout was too tight. Try relaxing the timeout.
[GitHub] spark pull request: [SPARK-3809][SQL]fix HiveThriftServer2Suite to...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2671#issuecomment-58044200 When trying #2214 several weeks ago, the Thrift server process simply couldn't start on the Jenkins server, but everything's fine on my local machine. However, the pull request builder has been refactored a lot by Josh. It seems that #2675 fails simply because my timeout was too tight for Jenkins. I'm trying to relax the timeout a bit.
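The fix being discussed amounts to waiting longer for the server process to come up rather than failing on a fixed deadline. A minimal sketch of such a polling wait (a hypothetical helper with assumed names, not the actual suite code):

```scala
// Hypothetical helper, not from the Spark test suite: poll a condition
// until it holds or the deadline passes, instead of checking once after
// a fixed sleep. Relaxing the timeout means passing a larger timeoutMs
// for slower environments such as Jenkins.
def waitUntil(timeoutMs: Long, intervalMs: Long = 100)(cond: => Boolean): Boolean = {
  val deadline = System.currentTimeMillis() + timeoutMs
  while (System.currentTimeMillis() < deadline) {
    if (cond) return true
    Thread.sleep(intervalMs)
  }
  cond // one final check at the deadline
}
```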
[GitHub] spark pull request: [SPARK-3809][SQL]fix HiveThriftServer2Suite to...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/2671#issuecomment-58045037 OK. Since the process output is now redirected to a log file, we can also check it to see where the problem is.