[GitHub] spark pull request: [SPARK-6066] Make event log format easier to p...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4821#issuecomment-76505881 [Test build #28102 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28102/consoleFull) for PR 4821 at commit [`ef69276`](https://github.com/apache/spark/commit/ef692768db319d3159ce9522d625cede3505e161). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6066] Make event log format easier to p...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4821#issuecomment-76505885 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28102/ Test FAILed.
[GitHub] spark pull request: [SPARK-6055] [PySpark] fix incorrect __eq__ of...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4808
[GitHub] spark pull request: [SPARK-6074] [sql] Package pyspark sql binding...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4822#issuecomment-76509271 [Test build #28106 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28106/consoleFull) for PR 4822 at commit [`fb52001`](https://github.com/apache/spark/commit/fb5200118d7fbf9466d3b91936e24de268051d6e). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-6055] [PySpark] fix incorrect __eq__ of...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4808#issuecomment-76510949 LGTM, so I've merged this into `branch-1.3` (1.3.0) and `master` (1.4.0). Thanks!
[GitHub] spark pull request: [SPARK-4411][UI]Add kill link for jobs in the ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4823#issuecomment-76511755 [Test build #28112 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28112/consoleFull) for PR 4823 at commit [`7f52874`](https://github.com/apache/spark/commit/7f52874badfea314d019b0dc9097c54b8af2f654). * This patch merges cleanly.
[GitHub] spark pull request: [CORE][minor] enhance the `toArray` method in ...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/4825 [CORE][minor] enhance the `toArray` method in `SizeTrackingVector` Use array copy instead of `Iterator#toArray` to make it more efficient. You can merge this pull request into a Git repository by running: $ git pull https://github.com/cloud-fan/spark minor Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4825.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4825 commit 946a35bfef2a746a4e4fa44c62df70031677d217 Author: Wenchen Fan cloud0...@outlook.com Date: 2015-02-16T09:42:38Z minor enhance
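The optimization described in this PR can be sketched in plain Java. This is an illustrative stand-in, not Spark's actual `SizeTrackingVector` API: copying the live prefix of the backing array with one `System.arraycopy` call avoids walking an `Iterator` element by element.

```java
import java.util.Iterator;

// Hypothetical sketch of the optimization: a growable buffer whose toArray
// copies its backing array directly instead of draining an Iterator.
class GrowableBuffer {
  private Object[] data = new Object[16];
  private int size = 0;

  public void add(Object v) {
    if (size == data.length) {
      // Double the backing array when full, also via bulk copy.
      Object[] bigger = new Object[data.length * 2];
      System.arraycopy(data, 0, bigger, 0, size);
      data = bigger;
    }
    data[size++] = v;
  }

  // Slow path: materialize through an Iterator, one element at a time.
  public Object[] toArrayViaIterator() {
    Object[] out = new Object[size];
    Iterator<Object> it = iterator();
    for (int i = 0; it.hasNext(); i++) {
      out[i] = it.next();
    }
    return out;
  }

  // Fast path: one bulk arraycopy of the live prefix of the backing array.
  public Object[] toArrayViaCopy() {
    Object[] out = new Object[size];
    System.arraycopy(data, 0, out, 0, size);
    return out;
  }

  private Iterator<Object> iterator() {
    return new Iterator<Object>() {
      private int i = 0;
      public boolean hasNext() { return i < size; }
      public Object next() { return data[i++]; }
    };
  }
}
```

Both paths return the same contents; the arraycopy version skips the per-element iterator calls and boxing overhead of the iterator protocol.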
[GitHub] spark pull request: [SPARK-6079 ] Use index to speed up StatusTrac...
GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/4830 [SPARK-6079 ] Use index to speed up StatusTracker.getJobIdsForGroup() `StatusTracker.getJobIdsForGroup()` is implemented via a linear scan over a HashMap rather than using an index, which might be an expensive operation if there are many (e.g. thousands) of retained jobs. This patch adds a new map to `JobProgressListener` in order to speed up these lookups. You can merge this pull request into a Git repository by running: $ git pull https://github.com/JoshRosen/spark statustracker-job-group-indexing Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4830.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4830 commit 97275a7a472ba782c268f391876529fec8fbf2ab Author: Josh Rosen joshro...@databricks.com Date: 2015-02-28T07:29:23Z Add jobGroup to jobId index to JobProgressListener commit 2c49614cc4f92dc1a47044be362db51cfe4da77b Author: Josh Rosen joshro...@databricks.com Date: 2015-02-28T07:31:27Z getOrElse
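The idea in this PR — trade a little bookkeeping at job-start time for a constant-time group lookup — can be sketched as follows. This is an illustrative stand-in, not Spark's `JobProgressListener`; the names `jobStarted`, `jobsForGroupByScan`, and `jobsForGroupByIndex` are hypothetical.

```java
import java.util.*;

// Sketch: maintain a secondary index from job group to job ids so that
// group lookups become a single map access instead of a scan over all
// retained jobs.
class JobIndex {
  private final Map<Integer, String> jobIdToGroup = new HashMap<>();
  private final Map<String, Set<Integer>> groupToJobIds = new HashMap<>();

  public void jobStarted(int jobId, String group) {
    jobIdToGroup.put(jobId, group);
    // Update the index as jobs arrive.
    groupToJobIds.computeIfAbsent(group, g -> new HashSet<>()).add(jobId);
  }

  // O(n) over all retained jobs -- the behavior the PR replaces.
  public Set<Integer> jobsForGroupByScan(String group) {
    Set<Integer> out = new HashSet<>();
    for (Map.Entry<Integer, String> e : jobIdToGroup.entrySet()) {
      if (e.getValue().equals(group)) out.add(e.getKey());
    }
    return out;
  }

  // O(1) lookup via the index.
  public Set<Integer> jobsForGroupByIndex(String group) {
    return groupToJobIds.getOrDefault(group, Collections.emptySet());
  }
}
```

In a real listener the index must also be pruned when old jobs are evicted, otherwise it leaks entries as the scan-based map is trimmed.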
[GitHub] spark pull request: [SPARK-5950][SQL]Insert array into a metastore...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4826#issuecomment-76516087 [Test build #28119 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28119/consoleFull) for PR 4826 at commit [`0eb5578`](https://github.com/apache/spark/commit/0eb5578f8fc81c9c2186ffe7ba4b538f7a40f828). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5950][SQL]Insert array into a metastore...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4826#issuecomment-76516089 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28119/ Test FAILed.
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/3249#discussion_r25552305

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---

@@ -422,6 +424,108 @@ class Analyzer(catalog: Catalog,
       Generate(g, join = false, outer = false, None, child)
     }
   }
+
+  /**
+   * Transforms a query with subquery expressions in its WHERE clause into a left semi join:
+   * `select T1.x from T1 where T1.x in (select T2.y from T2)` is transformed to
+   * `select T1.x from T1 left semi join T2 on T1.x = T2.y`.
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+      case p: LogicalPlan if !p.childrenResolved => p
+      case filter @ Filter(conditions, child) =>
+        val subqueryExprs = conditions.collect {
+          case In(exp, Seq(SubqueryExpression(subquery))) => (exp, subquery)
+        }
+        // Replace subqueries with a dummy true literal since they are evaluated separately now.
+        val transformedConds = conditions.transform {
+          case In(_, Seq(SubqueryExpression(_))) => Literal(true)
+        }
+        subqueryExprs match {
+          case Seq() => filter // No subqueries.
+          case Seq((exp, subquery)) =>
+            createLeftSemiJoin(
+              child,
+              exp,
+              subquery,
+              transformedConds)
+          case _ =>
+            throw new TreeNodeException(filter, "Only one SubQuery expression is supported.")
+        }
+    }
+
+    /**
+     * Creates a LeftSemi join between the parent query and the subquery mentioned in the 'IN'
+     * predicate, combining the subquery conditions with the parent query conditions.
+     */
+    def createLeftSemiJoin(
+        left: LogicalPlan,
+        value: Expression,
+        subquery: LogicalPlan,
+        parentConds: Expression): LogicalPlan = {
+      val (transformedPlan, subqueryConds) = transformAndGetConditions(value, subquery)
+      // Add both parent query conditions and subquery conditions as join conditions
+      val allPredicates = And(parentConds, subqueryConds)
+      Join(left, transformedPlan, LeftSemi, Some(allPredicates))
+    }
+
+    /**
+     * Transforms the subquery LogicalPlan, adding the expressions that are used as filters to the
+     * projection, and also returns the filter conditions used in the subquery.
+     */
+    def transformAndGetConditions(
+        value: Expression,
+        subquery: LogicalPlan): (LogicalPlan, Expression) = {
+      val expr = new scala.collection.mutable.ArrayBuffer[Expression]()
+      // TODO: we only decorrelate subqueries in very specific cases like those mentioned above
+      // in the documentation. More complex queries, such as subqueries nested inside subqueries,
+      // can be supported in the future.
+      val transformedPlan = subquery transform {
+        case project @ Project(projectList, f @ Filter(condition, child)) =>
+          // Don't support more than one item in the select list of the subquery
+          if (projectList.size > 1) {
+            throw new TreeNodeException(
+              project,
+              "SubQuery can contain only one item in Select List")
+          }
+          val resolvedChild = ResolveRelations(child)
+          // Add the expressions that are used as filters in the subquery to the projections
+          val toBeAddedExprs = f.references.filter { a =>
+            resolvedChild.resolve(a.name, resolver) != None && !project.outputSet.contains(a)
+          }
+          val nameToExprMap = collection.mutable.Map[String, Alias]()
+          // Create aliases for all projection expressions.
+          val witAliases = (projectList ++ toBeAddedExprs).zipWithIndex.map {
+            case (exp, index) =>
+              nameToExprMap.put(exp.name, Alias(exp, s"ssqc$index")())
+              Alias(exp, s"ssqc$index")()
+          }
+          // Replace the condition column names with alias names.
+          val transformedConds = condition.transform {
+            case a: Attribute if resolvedChild.resolve(a.name, resolver) != None =>
+              nameToExprMap.get(a.name).get.toAttribute
+          }
+          // Join the first projection column of the subquery to the main query and add as a condition
+          // TODO: We can avoid this if the parent condition already has this condition.
+          expr += EqualTo(value, witAliases(0).toAttribute)
+          expr += transformedConds

--- End diff --

Connecting the subquery with the join condition doesn't make sense to me, as we will transform the whole logical plan as
[GitHub] spark pull request: SPARK-5984: Fix TimSort bug causes ArrayOutOfB...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4804#issuecomment-76504749 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28105/ Test FAILed.
[GitHub] spark pull request: SPARK-5984: Fix TimSort bug causes ArrayOutOfB...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4804#issuecomment-76504748 [Test build #28105 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28105/consoleFull) for PR 4804 at commit [`4d95f75`](https://github.com/apache/spark/commit/4d95f75d3bdf09bad3d8a4a32d5c2ee7486a8a23). * This patch **fails RAT tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4226] [SQL] Add Exists support for wher...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/4812#issuecomment-76505474 Sorry, I meant semantically.
[GitHub] spark pull request: SPARK-5984: Fix TimSort bug causes ArrayOutOfB...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/4804#discussion_r25553237

--- Diff: core/src/test/java/org/apache/spark/util/collection/TestTimSort.java ---

@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util.collection;
+
+import java.util.*;
+
+/**
+ * This code generates an int array which fails the standard TimSort.
+ *
+ * The blog post that reported the bug:
+ * http://www.envisage-project.eu/timsort-specification-and-verification/
+ *
+ * The algorithm to reproduce the bug was obtained from the reporter of the bug:
+ * https://github.com/abstools/java-timsort-bug
+ *
+ * Licensed under Apache License 2.0
+ * https://github.com/abstools/java-timsort-bug/blob/master/LICENSE
+ */
+public class TestTimSort {
+
+  private static final int MIN_MERGE = 32;
+
+  /**
+   * Returns an array of integers that demonstrates the bug in TimSort
+   */
+  public static int[] getTimSortBugTestSet(int length) {
+    int minRun = minRunLength(length);
+    List<Long> runs = runsJDKWorstCase(minRun, length);
+    return createArray(runs, length);
+  }
+
+  private static int minRunLength(int n) {
+    int r = 0; // Becomes 1 if any 1 bits are shifted off
+    while (n >= MIN_MERGE) {
+      r |= (n & 1);
+      n >>= 1;
+    }
+    return n + r;
+  }
+
+  private static int[] createArray(List<Long> runs, int length) {
+    int[] a = new int[length];
+    Arrays.fill(a, 0);
+    int endRun = -1;
+    for (long len : runs)
+      a[endRun += len] = 1;
+    a[length - 1] = 0;
+    return a;
+  }
+
+  /**
+   * Fills <code>runs</code> with a sequence of run lengths of the form<br>
+   * Y_n x_{n,1} x_{n,2} ... x_{n,l_n} <br>
+   * Y_{n-1} x_{n-1,1} x_{n-1,2} ... x_{n-1,l_{n-1}} <br>
+   * ... <br>
+   * Y_1 x_{1,1} x_{1,2} ... x_{1,l_1}<br>
+   * The Y_i's are chosen to satisfy the invariant throughout execution,
+   * but the x_{i,j}'s are merged (by <code>TimSort.mergeCollapse</code>)
+   * into an X_i that violates the invariant.
+   *
+   * @param length The sum of all run lengths that will be added to <code>runs</code>.
+   */
+  private static List<Long> runsJDKWorstCase(int minRun, int length) {
+    List<Long> runs = new ArrayList<Long>();
+
+    long runningTotal = 0, Y = minRun + 4, X = minRun;
+
+    while (runningTotal + Y + X <= length) {
+      runningTotal += X + Y;
+      generateJDKWrongElem(runs, minRun, X);
+      runs.add(0, Y);
+      // X_{i+1} = Y_i + x_{i,1} + 1, since runs.get(1) = x_{i,1}
+      X = Y + runs.get(1) + 1;
+      // Y_{i+1} = X_{i+1} + Y_i + 1
+      Y += X + 1;
+    }
+
+    if (runningTotal + X <= length) {
+      runningTotal += X;
+      generateJDKWrongElem(runs, minRun, X);
+    }
+
+    runs.add(length - runningTotal);
+    return runs;

--- End diff --

Actually, is there any test at all in this file? It seems like it just generates the problem test case. Maybe you can use it to generate a short test exposing the bug, and create a new, actual test that shows the sort works on it. Then this code need not exist in Spark.
[GitHub] spark pull request: SPARK-6063
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4815#issuecomment-76506361 I don't think it's necessary to wait on Jenkins. This doc change can't cause a problem. We can fix the title on merge too. I'll wait anyway for that, but figure one of us can just merge soon in any event.
[GitHub] spark pull request: [SPARK-6074] [sql] Package pyspark sql binding...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4822#issuecomment-76506844 [Test build #28106 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28106/consoleFull) for PR 4822 at commit [`fb52001`](https://github.com/apache/spark/commit/fb5200118d7fbf9466d3b91936e24de268051d6e). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4411][UI]Add kill link for jobs in the ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4823#issuecomment-76508165 [Test build #28107 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28107/consoleFull) for PR 4823 at commit [`af461cc`](https://github.com/apache/spark/commit/af461ccce44e2792ea9356ccc2db6c84609511a0). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-6073][SQL] Need to refresh metastore ca...
GitHub user yhuai opened a pull request: https://github.com/apache/spark/pull/4824 [SPARK-6073][SQL] Need to refresh metastore cache after append data in CreateMetastoreDataSourceAsSelect JIRA: https://issues.apache.org/jira/browse/SPARK-6073 @liancheng You can merge this pull request into a Git repository by running: $ git pull https://github.com/yhuai/spark refreshCache Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4824.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4824 commit b9542ef6198988736ccfea3b665d968b6e767418 Author: Yin Huai yh...@databricks.com Date: 2015-02-28T04:07:55Z Refresh metadata cache in the Catalog in CreateMetastoreDataSourceAsSelect.
[GitHub] spark pull request: [SPARK-6073][SQL] Need to refresh metastore ca...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4824#issuecomment-76509453 [Test build #28109 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28109/consoleFull) for PR 4824 at commit [`b9542ef`](https://github.com/apache/spark/commit/b9542ef6198988736ccfea3b665d968b6e767418). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4021#issuecomment-76512378 Another thought: if `register()` is somehow called twice for the same accumulator, then it looks like we'll silently overwrite the existing value in `localAccums`. We should probably throw an exception instead, since that scenario could lead to lost updates.
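The safeguard suggested above — fail loudly on double registration rather than silently overwrite — might look like the following sketch. The names (`LocalRegistry`, `register`) are hypothetical stand-ins, not Spark's actual Accumulators API.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: a registry that refuses duplicate registrations so that a second
// register() call for the same id cannot clobber (and thereby lose) the
// updates accumulated under the first registration.
class LocalRegistry {
  private final Map<Long, Object> localValues = new HashMap<>();

  public void register(long id, Object value) {
    // putIfAbsent returns the previous value if one existed, null otherwise.
    Object previous = localValues.putIfAbsent(id, value);
    if (previous != null) {
      throw new IllegalStateException("Accumulator " + id + " already registered");
    }
  }

  public Object get(long id) {
    return localValues.get(id);
  }
}
```

The exception surfaces the bug at the call site instead of letting updates vanish silently.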
[GitHub] spark pull request: [SPARK-5950][SQL]Insert array into a metastore...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4826#issuecomment-76513066 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28113/ Test FAILed.
[GitHub] spark pull request: [SPARK-5342][YARN] Allow long running Spark ap...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4688#issuecomment-76513795 [Test build #28116 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28116/consoleFull) for PR 4688 at commit [`5c11c3e`](https://github.com/apache/spark/commit/5c11c3e348fecdd070f5ab471314bce94bb4b66e). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4818#discussion_r2088

--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---

@@ -290,8 +290,19 @@ private[yarn] class YarnAllocator(
       location: String,
       containersToUse: ArrayBuffer[Container],
       remaining: ArrayBuffer[Container]): Unit = {
+    // SPARK-6050: certain Yarn configurations return a virtual core count that doesn't match the
+    // request; for example, capacity scheduler + DefaultResourceCalculator. Allow users in those
+    // situations to disable matching of the core count.
+    val matchingResource =
+      if (sparkConf.getBoolean("spark.yarn.container.disableCpuMatching", false)) {
+        Resource.newInstance(allocatedContainer.getResource().getMemory(),

--- End diff --

Nit: take out parens for consistency.
[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4818#discussion_r2091

--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---

@@ -290,8 +290,19 @@ private[yarn] class YarnAllocator(
       location: String,
       containersToUse: ArrayBuffer[Container],
       remaining: ArrayBuffer[Container]): Unit = {
+    // SPARK-6050: certain Yarn configurations return a virtual core count that doesn't match the
+    // request; for example, capacity scheduler + DefaultResourceCalculator. Allow users in those
+    // situations to disable matching of the core count.
+    val matchingResource =
+      if (sparkConf.getBoolean("spark.yarn.container.disableCpuMatching", false)) {
+        Resource.newInstance(allocatedContainer.getResource().getMemory(),

--- End diff --

Actually, why not just use `allocatedContainer.getResource`?
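The matching relaxation under discussion in SPARK-6050 — compare allocated containers to the request on memory alone when the scheduler does not honor vcore requests — can be sketched with stand-in types. The `Res` class and `matches` method below are hypothetical, not YARN's `Resource` API.

```java
// Sketch: when ignoreCpu is set (e.g. because DefaultResourceCalculator
// always reports 1 vcore regardless of the request), match allocated
// containers against the request on memory only.
class ContainerMatcher {
  static final class Res {
    final int memoryMb;
    final int vcores;
    Res(int memoryMb, int vcores) {
      this.memoryMb = memoryMb;
      this.vcores = vcores;
    }
  }

  static boolean matches(Res requested, Res allocated, boolean ignoreCpu) {
    boolean memOk = allocated.memoryMb == requested.memoryMb;
    // With CPU matching disabled, only memory is compared;
    // otherwise the vcore counts must match as well.
    return ignoreCpu ? memOk : memOk && allocated.vcores == requested.vcores;
  }
}
```

Under a scheduler that ignores vcores, a request for (1024 MB, 4 cores) may come back as (1024 MB, 1 core); memory-only matching accepts it while strict matching would reject it.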
[GitHub] spark pull request: [SPARK-5979][SPARK-6031][SPARK-6032][SPARK-604...
Github user brkyvz closed the pull request at: https://github.com/apache/spark/pull/4754
[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4818#issuecomment-76514683 My opinion is that we should make the default true, as the vanilla YARN default of `FIFOScheduler` will run into this issue (though most vendor distributions have a better default). There are no versions of YARN that will return containers smaller than were requested, except in this weird situation where the scheduler doesn't support CPU scheduling. I actually think it might be better to avoid a config at all and always just avoid matching on CPU. It's really hard to imagine any situation where it would actually benefit someone to set the config to false. The only one I can think of is debugging incorrect behavior in YARN, and, if we care about that, it would be better to just log something.
[GitHub] spark pull request: [SPARK-5342][YARN] Allow long running Spark ap...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4688#issuecomment-76515993 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28116/ Test PASSed.
[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...
Github user tnachen commented on the pull request: https://github.com/apache/spark/pull/3074#issuecomment-76515967 @mateiz @pwendell I'm hoping to also see this merged soon; what else is needed here?
[GitHub] spark pull request: [SPARK-6066] Make event log format easier to p...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/4821#discussion_r25552458

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -110,17 +117,12 @@ private[spark] class EventLoggingListener(
        hadoopDataStream = Some(fileSystem.create(path))
        hadoopDataStream.get
      }
-
-    val compressionCodec =
-      if (shouldCompress) {
-        Some(CompressionCodec.createCodec(sparkConf))
-      } else {
-        None
-      }
+    val cstream = compressionCodec.map(_.compressedOutputStream(dstream)).getOrElse(dstream)
--- End diff --

that's fine. I fixed this in my latest commit.
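The `cstream` one-liner in the diff above can be transliterated to Java's `Optional` to show the pattern it replaces the old if/else with. The names here are illustrative, not Spark's API: wrap the raw stream with the codec only when one is configured, otherwise pass the stream through unchanged.

```java
import java.util.Optional;
import java.util.function.UnaryOperator;

// Sketch of `codec.map(wrap).getOrElse(stream)` in Java: strings stand in for
// streams so the wrapping is observable.
final class StreamWrap {
  static String wrap(Optional<UnaryOperator<String>> codec, String dstream) {
    // map() applies the codec only if present; orElse() falls back to the raw stream.
    return codec.map(c -> c.apply(dstream)).orElse(dstream);
  }
}
```

This collapses the six-line `if (shouldCompress) ... else None` block from the removed code into a single expression, which is the simplification the diff makes.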
[GitHub] spark pull request: SPARK-5984: Fix TimSort bug causes ArrayOutOfB...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/4804#discussion_r25552857

--- Diff: core/src/test/java/org/apache/spark/util/collection/TestTimSort.java ---
@@ -0,0 +1,134 @@
+package org.apache.spark.util.collection;
+
+import java.util.*;
+
+/*
--- End diff --

you need to put this in the beginning of the file, i.e. before the package definition
[GitHub] spark pull request: [SPARK-5522] Accelerate the Histroty Server st...
Github user marsishandsome commented on the pull request: https://github.com/apache/spark/pull/4525#issuecomment-76505657 @andrewor14 please check
[GitHub] spark pull request: [SPARK-6074] [sql] Package pyspark sql binding...
GitHub user vanzin opened a pull request: https://github.com/apache/spark/pull/4822 [SPARK-6074] [sql] Package pyspark sql bindings. This is needed for the SQL bindings to work on Yarn. You can merge this pull request into a Git repository by running: $ git pull https://github.com/vanzin/spark SPARK-6074 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4822.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4822 commit fb5200118d7fbf9466d3b91936e24de268051d6e Author: Marcelo Vanzin van...@cloudera.com Date: 2015-02-28T02:46:03Z [SPARK-6074] [sql] Package pyspark sql bindings. This is needed for the SQL bindings to work on Yarn.
[GitHub] spark pull request: [SPARK-6055] [PySpark] fix incorrect DataType....
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4810#issuecomment-76509238 I've merged this into `branch-1.1` (1.1.2). Thanks!
[GitHub] spark pull request: SPARK-5984: Fix TimSort bug causes ArrayOutOfB...
Github user hotou commented on the pull request: https://github.com/apache/spark/pull/4804#issuecomment-76509270 @srowen Sounds good, done.
[GitHub] spark pull request: [SPARK-5938][SQL] Generate Row from JSON strin...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/4712#issuecomment-76509265 @liancheng Description is updated. Please take a look when you have time. Thanks!
[GitHub] spark pull request: [SPARK-6074] [sql] Package pyspark sql binding...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4822#issuecomment-76509272 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28106/ Test PASSed.
[GitHub] spark pull request: [SPARK-3586][streaming]Support nested director...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2765#issuecomment-76513278 [Test build #28115 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28115/consoleFull) for PR 2765 at commit [`beaed4c`](https://github.com/apache/spark/commit/beaed4c901bca8fe91361901e5ba0cb30b8a94b5). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-6070] [yarn] Remove unneeded classes fr...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4820
[GitHub] spark pull request: [SPARK-6070] [yarn] Remove unneeded classes fr...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4820#issuecomment-76514114 Thanks Marcelo, pulling this in!
[GitHub] spark pull request: [SPARK-5979][SPARK-6032] Smaller safer --packa...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4802#issuecomment-76514538 Pulling this in - thanks Burak!
[GitHub] spark pull request: [SPARK-6078][CORE] create event log dir if not...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4829#issuecomment-76516211 If we do decide to create directories, then we should only create the last missing directory, not all missing parent directories.
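JoshRosen's suggestion can be sketched with `java.nio.file`. This is a hedged illustration, not Spark's actual event-log code, and the method name is hypothetical: the parent must already exist, and only the final path component is created, so a typo'd base path fails loudly instead of silently materializing a whole directory chain.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: create only the last missing directory. Unlike
// mkdirs()/createDirectories(), this refuses to create missing parents.
final class EventLogDir {
  static Path createLastDirOnly(Path dir) throws IOException {
    Path parent = dir.toAbsolutePath().getParent();
    if (parent == null || !Files.isDirectory(parent)) {
      throw new IOException("parent does not exist: " + parent);
    }
    // Files.createDirectory creates exactly one path component.
    return Files.exists(dir) ? dir : Files.createDirectory(dir);
  }
}
```

With this shape, `createLastDirOnly(base.resolve("eventLogs"))` succeeds when `base` exists, while a request two levels deep throws instead of creating the intermediate directory.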
[GitHub] spark pull request: [SPARK-5751] [SQL] Sets SPARK_HOME as SPARK_PI...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/4758#issuecomment-76503823 Cool, thanks.
[GitHub] spark pull request: SPARK-5984: Fix TimSort bug causes ArrayOutOfB...
Github user hotou commented on the pull request: https://github.com/apache/spark/pull/4804#issuecomment-76503891 @rxin @srowen Thanks for the review, I updated the comments and license info etc.
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/3249#issuecomment-76503896 Thank you @ravipesala for implementing this; however, this PR probably involves some unnecessary join condition transformation. You probably need to understand the rules for pushing down the join filter / condition first. Sorry, please correct me if I misunderstood something.
[GitHub] spark pull request: SPARK-5984: Fix TimSort bug causes ArrayOutOfB...
Github user hotou commented on the pull request: https://github.com/apache/spark/pull/4804#issuecomment-76504910 Ah. I guess I have to have the license header in the .java file, not just link to it
[GitHub] spark pull request: SPARK-5984: Fix TimSort bug causes ArrayOutOfB...
Github user hotou commented on a diff in the pull request: https://github.com/apache/spark/pull/4804#discussion_r25552968

--- Diff: core/src/test/java/org/apache/spark/util/collection/TestTimSort.java ---
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util.collection;
+
+import java.util.*;
+
+/**
+ * This codes generates a int array which fails the standard TimSort.
+ *
+ * The blog that reported the bug
+ * http://www.envisage-project.eu/timsort-specification-and-verification/
+ *
+ * The algorithms to reproduce the bug is obtained from the reporter of the bug
+ * https://github.com/abstools/java-timsort-bug
+ *
+ * Licensed under Apache License 2.0
+ * https://github.com/abstools/java-timsort-bug/blob/master/LICENSE
--- End diff --

Well, it's not an exact copy; I made changes to the original code. Do you guys have an IntelliJ style that I can import?
[GitHub] spark pull request: SPARK-5984: Fix TimSort bug causes ArrayOutOfB...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/4804#discussion_r25553210

--- Diff: core/src/test/java/org/apache/spark/util/collection/TestTimSort.java ---
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util.collection;
+
+import java.util.*;
+
+/**
+ * This codes generates a int array which fails the standard TimSort.
+ *
+ * The blog that reported the bug
+ * http://www.envisage-project.eu/timsort-specification-and-verification/
+ *
+ * The algorithms to reproduce the bug is obtained from the reporter of the bug
+ * https://github.com/abstools/java-timsort-bug
+ *
+ * Licensed under Apache License 2.0
+ * https://github.com/abstools/java-timsort-bug/blob/master/LICENSE
+ */
+public class TestTimSort {
+
+  private static final int MIN_MERGE = 32;
+
+  /**
+   * Returns an array of integers that demonstrate the bug in TimSort
+   */
+  public static int[] getTimSortBugTestSet(int length) {
+    int minRun = minRunLength(length);
+    List<Long> runs = runsJDKWorstCase(minRun, length);
+    return createArray(runs, length);
+  }
+
+  private static int minRunLength(int n) {
+    int r = 0; // Becomes 1 if any 1 bits are shifted off
+    while (n >= MIN_MERGE) {
+      r |= (n & 1);
+      n >>= 1;
+    }
+    return n + r;
+  }
+
+  private static int[] createArray(List<Long> runs, int length) {
+    int[] a = new int[length];
+    Arrays.fill(a, 0);
+    int endRun = -1;
+    for (long len : runs)
+      a[endRun += len] = 1;
+    a[length - 1] = 0;
+    return a;
+  }
+
+  /**
+   * Fills <code>runs</code> with a sequence of run lengths of the form<br>
+   * Y_n x_{n,1} x_{n,2} ... x_{n,l_n} <br>
+   * Y_{n-1} x_{n-1,1} x_{n-1,2} ... x_{n-1,l_{n-1}} <br>
+   * ... <br>
+   * Y_1 x_{1,1} x_{1,2} ... x_{1,l_1}<br>
+   * The Y_i's are chosen to satisfy the invariant throughout execution,
+   * but the x_{i,j}'s are merged (by <code>TimSort.mergeCollapse</code>)
+   * into an X_i that violates the invariant.
+   *
+   * @param length The sum of all run lengths that will be added to <code>runs</code>.
+   */
+  private static List<Long> runsJDKWorstCase(int minRun, int length) {
--- End diff --

Most importantly, this file doesn't contain tests that JUnit will run. Have a look at how other files declare test code with `@Test` annotations.
[GitHub] spark pull request: SPARK-5984: Fix TimSort bug causes ArrayOutOfB...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/4804#discussion_r25553242

--- Diff: core/src/test/java/org/apache/spark/util/collection/TestTimSort.java ---
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util.collection;
+
+import java.util.*;
+
+/**
+ * This codes generates a int array which fails the standard TimSort.
+ *
+ * The blog that reported the bug
+ * http://www.envisage-project.eu/timsort-specification-and-verification/
+ *
+ * The algorithms to reproduce the bug is obtained from the reporter of the bug
+ * https://github.com/abstools/java-timsort-bug
+ *
+ * Licensed under Apache License 2.0
+ * https://github.com/abstools/java-timsort-bug/blob/master/LICENSE
+ */
+public class TestTimSort {
+
+  private static final int MIN_MERGE = 32;
+
+  /**
+   * Returns an array of integers that demonstrate the bug in TimSort
+   */
+  public static int[] getTimSortBugTestSet(int length) {
+    int minRun = minRunLength(length);
+    List<Long> runs = runsJDKWorstCase(minRun, length);
+    return createArray(runs, length);
+  }
+
+  private static int minRunLength(int n) {
+    int r = 0; // Becomes 1 if any 1 bits are shifted off
+    while (n >= MIN_MERGE) {
+      r |= (n & 1);
+      n >>= 1;
+    }
+    return n + r;
+  }
+
+  private static int[] createArray(List<Long> runs, int length) {
+    int[] a = new int[length];
+    Arrays.fill(a, 0);
+    int endRun = -1;
+    for (long len : runs)
+      a[endRun += len] = 1;
+    a[length - 1] = 0;
+    return a;
+  }
+
+  /**
+   * Fills <code>runs</code> with a sequence of run lengths of the form<br>
+   * Y_n x_{n,1} x_{n,2} ... x_{n,l_n} <br>
+   * Y_{n-1} x_{n-1,1} x_{n-1,2} ... x_{n-1,l_{n-1}} <br>
+   * ... <br>
+   * Y_1 x_{1,1} x_{1,2} ... x_{1,l_1}<br>
+   * The Y_i's are chosen to satisfy the invariant throughout execution,
+   * but the x_{i,j}'s are merged (by <code>TimSort.mergeCollapse</code>)
+   * into an X_i that violates the invariant.
+   *
+   * @param length The sum of all run lengths that will be added to <code>runs</code>.
+   */
+  private static List<Long> runsJDKWorstCase(int minRun, int length) {
--- End diff --

Sean - I think this is used in the scalatest.
[GitHub] spark pull request: [SPARK-6074] [sql] Package pyspark sql binding...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/4822#issuecomment-76506782

$ jar tf sql/core/target/spark-sql_2.10-1.3.0-SNAPSHOT.jar | grep pyspark
pyspark/
pyspark/sql/
pyspark/sql/functions.py
pyspark/sql/__init__.py
pyspark/sql/tests.py
pyspark/sql/types.py
pyspark/sql/dataframe.py
pyspark/sql/context.py
[GitHub] spark pull request: SPARK-5984: Fix TimSort bug causes ArrayOutOfB...
Github user hotou commented on a diff in the pull request: https://github.com/apache/spark/pull/4804#discussion_r25553389

--- Diff: core/src/test/java/org/apache/spark/util/collection/TestTimSort.java ---
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util.collection;
+
+import java.util.*;
+
+/**
+ * This codes generates a int array which fails the standard TimSort.
+ *
+ * The blog that reported the bug
+ * http://www.envisage-project.eu/timsort-specification-and-verification/
+ *
+ * The algorithms to reproduce the bug is obtained from the reporter of the bug
+ * https://github.com/abstools/java-timsort-bug
+ *
+ * Licensed under Apache License 2.0
+ * https://github.com/abstools/java-timsort-bug/blob/master/LICENSE
+ */
+public class TestTimSort {
+
+  private static final int MIN_MERGE = 32;
+
+  /**
+   * Returns an array of integers that demonstrate the bug in TimSort
+   */
+  public static int[] getTimSortBugTestSet(int length) {
+    int minRun = minRunLength(length);
+    List<Long> runs = runsJDKWorstCase(minRun, length);
+    return createArray(runs, length);
+  }
+
+  private static int minRunLength(int n) {
+    int r = 0; // Becomes 1 if any 1 bits are shifted off
+    while (n >= MIN_MERGE) {
+      r |= (n & 1);
+      n >>= 1;
+    }
+    return n + r;
+  }
+
+  private static int[] createArray(List<Long> runs, int length) {
+    int[] a = new int[length];
+    Arrays.fill(a, 0);
+    int endRun = -1;
+    for (long len : runs)
+      a[endRun += len] = 1;
+    a[length - 1] = 0;
+    return a;
+  }
+
+  /**
+   * Fills <code>runs</code> with a sequence of run lengths of the form<br>
+   * Y_n x_{n,1} x_{n,2} ... x_{n,l_n} <br>
+   * Y_{n-1} x_{n-1,1} x_{n-1,2} ... x_{n-1,l_{n-1}} <br>
+   * ... <br>
+   * Y_1 x_{1,1} x_{1,2} ... x_{1,l_1}<br>
+   * The Y_i's are chosen to satisfy the invariant throughout execution,
+   * but the x_{i,j}'s are merged (by <code>TimSort.mergeCollapse</code>)
+   * into an X_i that violates the invariant.
+   *
+   * @param length The sum of all run lengths that will be added to <code>runs</code>.
+   */
+  private static List<Long> runsJDKWorstCase(int minRun, int length) {
+    List<Long> runs = new ArrayList<Long>();
+
+    long runningTotal = 0, Y = minRun + 4, X = minRun;
+
+    while (runningTotal + Y + X <= length) {
+      runningTotal += X + Y;
+      generateJDKWrongElem(runs, minRun, X);
+      runs.add(0, Y);
+      // X_{i+1} = Y_i + x_{i,1} + 1, since runs.get(1) = x_{i,1}
+      X = Y + runs.get(1) + 1;
+      // Y_{i+1} = X_{i+1} + Y_i + 1
+      Y += X + 1;
+    }
+
+    if (runningTotal + X <= length) {
+      runningTotal += X;
+      generateJDKWrongElem(runs, minRun, X);
+    }
+
+    runs.add(length - runningTotal);
+    return runs;
--- End diff --

In SorterSuite I added a test that uses TestTimSort.java. Yes, TestTimSort just generates an int[], but the array has to be at least 67108864 long, so I thought just posting a huge int[] would not be as useful as knowing how the array is generated. The original code was written to demonstrate the bug, so it had a main() and some other stuff; I got rid of those. I am fine with fixing the license here, if you guys bear with me a bit. I am not that experienced with open source licenses.
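The `minRunLength` helper quoted in the diffs above is small enough to run on its own. The copy below is self-contained and shows the run lengths TimSort's computation produces for a few input sizes, including the 67108864-element (2^26) case hotou mentions.

```java
import java.util.*;

// Self-contained copy of TestTimSort.minRunLength: compute TimSort's minimum
// run length for an array of n elements (runs shorter than this get extended
// by binary insertion sort before merging).
final class MinRun {
  private static final int MIN_MERGE = 32;

  static int minRunLength(int n) {
    int r = 0; // becomes 1 if any 1 bits are shifted off
    while (n >= MIN_MERGE) {
      r |= (n & 1); // remember whether anything was truncated
      n >>= 1;      // halve until n drops below MIN_MERGE
    }
    return n + r;
  }
}
```

For lengths below `MIN_MERGE` the input is returned unchanged; a power of two such as 2^26 halves cleanly down to 16, while any truncated bit bumps the result by one.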
[GitHub] spark pull request: SPARK-1965 [WEBUI] Spark UI throws NPE on tryi...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4777#issuecomment-76508562 OK will make that change, and if there are no more objections, will go ahead with this change to patch up this case.
[GitHub] spark pull request: SPARK-5984: Fix TimSort bug causes ArrayOutOfB...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4804#issuecomment-76508900 OK, I apologize for belaboring this and for the hassle with the incorrect suggestion earlier. But I think we may have to do one more thing for the licensing to get it right, and we should. I believe we need an entry in our own `LICENSE` file after all, given the situation. You can see one for the copied TimSort. I'd just add it below that. Then it really does look good from a license perspective, AFAIK.
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/3249#discussion_r25552219

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -422,6 +424,108 @@ class Analyzer(catalog: Catalog,
         Generate(g, join = false, outer = false, None, child)
     }
   }
+
+  /**
+   * Transforms a query that has subquery expressions in its WHERE clause into a left semi join.
+   * For example, `select T1.x from T1 where T1.x in (select T2.y from T2)` is transformed to
+   * `select T1.x from T1 left semi join T2 on T1.x = T2.y`.
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+      case p: LogicalPlan if !p.childrenResolved => p
+      case filter @ Filter(conditions, child) =>
+        val subqueryExprs = conditions.collect {
+          case In(exp, Seq(SubqueryExpression(subquery))) => (exp, subquery)
+        }
+        // Replace subqueries with a dummy true literal since they are evaluated separately now.
+        val transformedConds = conditions.transform {
+          case In(_, Seq(SubqueryExpression(_))) => Literal(true)
+        }
+        subqueryExprs match {
+          case Seq() => filter // No subqueries.
+          case Seq((exp, subquery)) =>
+            createLeftSemiJoin(
+              child,
+              exp,
+              subquery,
+              transformedConds)
+          case _ =>
+            throw new TreeNodeException(filter, "Only one SubQuery expression is supported.")
+        }
+    }
+
+    /**
+     * Create a LeftSemi join between the parent query and the subquery mentioned in the 'IN'
+     * predicate, and combine the subquery conditions with the parent query conditions.
+     */
+    def createLeftSemiJoin(
+        left: LogicalPlan,
+        value: Expression,
+        subquery: LogicalPlan,
+        parentConds: Expression): LogicalPlan = {
+      val (transformedPlan, subqueryConds) = transformAndGetConditions(value, subquery)
+      // Add both parent query conditions and subquery conditions as join conditions
+      val allPredicates = And(parentConds, subqueryConds)
+      Join(left, transformedPlan, LeftSemi, Some(allPredicates))
+    }
+
+    /**
+     * Transform the subquery LogicalPlan, adding the expressions that are used as filters to the
+     * projection, and also return the filter conditions used in the subquery.
+     */
+    def transformAndGetConditions(
+        value: Expression,
+        subquery: LogicalPlan): (LogicalPlan, Expression) = {
+      val expr = new scala.collection.mutable.ArrayBuffer[Expression]()
+      // TODO: we only decorrelate subqueries in very specific cases like the ones mentioned in
+      // the documentation above. More complex queries, such as subqueries nested inside
+      // subqueries, can be supported in the future.
+      val transformedPlan = subquery transform {
+        case project @ Project(projectList, f @ Filter(condition, child)) =>
+          // Don't support more than one item in the select list of the subquery
+          if (projectList.size > 1) {
+            throw new TreeNodeException(
+              project,
+              "SubQuery can contain only one item in Select List")
+          }
+          val resolvedChild = ResolveRelations(child)
+          // Add the expressions which are used as filters in the subquery to the projections
+          val toBeAddedExprs = f.references.filter { a =>
+            resolvedChild.resolve(a.name, resolver) != None && !project.outputSet.contains(a)
+          }
+          val nameToExprMap = collection.mutable.Map[String, Alias]()
+          // Create aliases for all projection expressions.
+          val withAliases = (projectList ++ toBeAddedExprs).zipWithIndex.map {
+            case (exp, index) =>
+              nameToExprMap.put(exp.name, Alias(exp, s"sqc$index")())
+              Alias(exp, s"sqc$index")()
+          }
+          // Replace the condition column names with alias names.
+          val transformedConds = condition.transform {
--- End diff --

I am not sure why you care about the subquery condition, since, as the Hive wiki says:
```
As of Hive 0.13 some types of subqueries are supported in the WHERE clause. Those are queries where the result of the query can be treated as a constant for IN and NOT IN statements (called uncorrelated subqueries because the subquery does not reference columns from the parent query):
```
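The rewrite discussed above hinges on the equivalence between an uncorrelated IN subquery and a left semi join: a T1 row survives `WHERE T1.x IN (SELECT T2.y FROM T2)` exactly when it has at least one match on the right, and it is emitted at most once regardless of how many matches exist. A minimal plain-Java sketch of that equivalence over in-memory lists (table contents and names are made up for illustration; this is not Catalyst code):

```java
import java.util.*;
import java.util.stream.*;

public class SemiJoinSketch {
    // "WHERE T1.x IN (SELECT T2.y FROM T2)": evaluate the uncorrelated subquery once
    // and keep each T1 row whose x appears in the result set.
    public static List<Integer> inSubquery(List<Integer> t1x, List<Integer> t2y) {
        Set<Integer> sub = new HashSet<>(t2y);
        return t1x.stream().filter(sub::contains).collect(Collectors.toList());
    }

    // Left semi join of T1 against T2 on T1.x = T2.y: each left row is emitted
    // at most once, no matter how many right rows match it.
    public static List<Integer> leftSemiJoin(List<Integer> t1x, List<Integer> t2y) {
        List<Integer> out = new ArrayList<>();
        for (int x : t1x) {
            for (int y : t2y) {
                if (x == y) { out.add(x); break; }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> t1 = Arrays.asList(1, 2, 3, 4);
        List<Integer> t2 = Arrays.asList(2, 4, 4, 6);  // note the duplicate 4
        System.out.println(inSubquery(t1, t2));    // [2, 4]
        System.out.println(leftSemiJoin(t1, t2));  // [2, 4] -- duplicate right rows don't duplicate output
    }
}
```

The duplicate `4` on the right is the point: an inner join would emit `4` twice, while both the IN predicate and the semi join emit it once, which is why the rule can rewrite one into the other.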
[GitHub] spark pull request: [SPARK-6066] Make event log format easier to p...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4821#discussion_r25552104

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -217,53 +219,60 @@ private[spark] object EventLoggingListener extends Logging {
   /**
    * Write metadata about the event log to the given stream.
    *
-   * The header is a serialized version of a map, except it does not use Java serialization to
-   * avoid incompatibilities between different JDKs. It writes one map entry per line, in
-   * key=value format.
-   *
-   * The very last entry in the header is the `HEADER_END_MARKER` marker, so that the parsing code
-   * can know when to stop.
+   * The header is a single line of JSON in the beginning of the file. Note that this
+   * assumes all metadata necessary to parse the log is also included in the file name.
+   * The format needs to be kept in sync with the `openEventLog()` method below. Also, it
+   * cannot change in new Spark versions without some other way of detecting the change.
    *
-   * The format needs to be kept in sync with the openEventLog() method below. Also, it cannot
-   * change in new Spark versions without some other way of detecting the change (like some
-   * metadata encoded in the file name).
-   *
-   * @param logStream Raw output stream to the even log file.
+   * @param logStream Raw output stream to the event log file.
    * @param compressionCodec Optional compression codec to use.
-   * @return A stream where to write event log data. This may be a wrapper around the original
+   * @return A stream to which event log data is written. This may be a wrapper around the original
    *         stream (for example, when compression is enabled).
    */
  def initEventLog(
      logStream: OutputStream,
      compressionCodec: Option[CompressionCodec]): OutputStream = {
-    val meta = mutable.HashMap(("version" -> SPARK_VERSION))
+    val metadata = new mutable.HashMap[String, String]
+    // Some of these metadata are already encoded in the file name
+    // Here we include them again within the file itself for completeness
+    metadata += ("Event" -> Utils.getFormattedClassName(SparkListenerMetadataIdentifier))
+    metadata += (SPARK_VERSION_KEY -> SPARK_VERSION)
    compressionCodec.foreach { codec =>
-      meta += ("compressionCodec" -> codec.getClass().getName())
+      metadata += (COMPRESSION_CODEC_KEY -> codec.getClass.getCanonicalName)
    }
-
-    def write(entry: String) = {
-      val bytes = entry.getBytes(Charsets.UTF_8)
-      if (bytes.length > MAX_HEADER_LINE_LENGTH) {
-        throw new IOException(s"Header entry too long: ${entry}")
-      }
-      logStream.write(bytes, 0, bytes.length)
+    val metadataJson = compact(render(JsonProtocol.mapToJson(metadata)))
+    val metadataBytes = (metadataJson + "\n").getBytes(Charsets.UTF_8)
+    if (metadataBytes.length > MAX_HEADER_LINE_LENGTH) {
+      throw new IOException(s"Event log metadata too long: $metadataJson")
    }
-
-    meta.foreach { case (k, v) => write(s"$k=$v\n") }
-    write(s"$HEADER_END_MARKER\n")
-    compressionCodec.map(_.compressedOutputStream(logStream)).getOrElse(logStream)
+    logStream.write(metadataBytes, 0, metadataBytes.length)
+    logStream
  }

  /**
   * Return a file-system-safe path to the log file for the given application.
   *
+   * Note that because we currently only create a single log file for each application,
+   * we must encode all the information needed to parse this event log in the file name
+   * instead of within the file itself. Otherwise, if the file is compressed, for instance,
+   * we won't know which codec to use to decompress the metadata.
+   *
   * @param logBaseDir Directory where the log file will be written.
   * @param appId A unique app ID.
+   * @param compressionCodecName Name of the compression codec used to compress the contents
+   *                             of the log, or None if compression is not enabled.
   * @return A path which consists of file-system-safe characters.
   */
-  def getLogPath(logBaseDir: String, appId: String): String = {
-    val name = appId.replaceAll("[ :/]", "-").replaceAll("[${}'\"]", "_").toLowerCase
-    Utils.resolveURI(logBaseDir) + "/" + name.stripSuffix("/")
+  def getLogPath(
+      logBaseDir: String,
+      appId: String,
+      compressionCodecName: Option[String]): String = {
+    val sanitizedAppId = appId.replaceAll("[ :/]", "-").replaceAll("[${}'\"]", "_").toLowerCase
+    // e.g. EVENT_LOG_app_123_SPARK_VERSION_1.3.1
+    // e.g. EVENT_LOG_ {...}
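The key design point in this hunk is that the metadata becomes a single uncompressed line at the top of the file, so a reader can recover it (for example, which codec compresses the rest) before touching anything after the first newline. A small standalone sketch of that read/write pattern, not Spark's actual implementation (the class name and size limit here are invented for illustration):

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

public class EventLogHeaderSketch {
    static final int MAX_HEADER_LINE_LENGTH = 4096;  // illustrative limit, not Spark's constant

    // Write the metadata as a single newline-terminated line at the top of the stream.
    // The (possibly compressed) event body is appended by the caller afterwards.
    public static void writeHeader(OutputStream out, String metadataJson) throws IOException {
        byte[] bytes = (metadataJson + "\n").getBytes(StandardCharsets.UTF_8);
        if (bytes.length > MAX_HEADER_LINE_LENGTH) {
            throw new IOException("Event log metadata too long: " + metadataJson);
        }
        out.write(bytes);
    }

    // A reader only consumes bytes up to the first '\n' to recover the metadata,
    // without decompressing or even reading anything after it.
    public static String readHeader(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            line.write(b);
        }
        return line.toString("UTF-8");
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream log = new ByteArrayOutputStream();
        writeHeader(log, "{\"SPARK_VERSION\":\"1.3.0\"}");
        log.write("event body...".getBytes(StandardCharsets.UTF_8));  // stands in for the compressed body
        String header = readHeader(new ByteArrayInputStream(log.toByteArray()));
        System.out.println(header);  // {"SPARK_VERSION":"1.3.0"}
    }
}
```

This is why the PR can drop the old `HEADER_END_MARKER` protocol: "read until the first newline" is the entire framing.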
[GitHub] spark pull request: SPARK-5984: Fix TimSort bug causes ArrayOutOfB...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4804#issuecomment-76504745 [Test build #28105 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28105/consoleFull) for PR 4804 at commit [`4d95f75`](https://github.com/apache/spark/commit/4d95f75d3bdf09bad3d8a4a32d5c2ee7486a8a23). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-5984: Fix TimSort bug causes ArrayOutOfB...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/4804#discussion_r25552927

--- Diff: core/src/test/java/org/apache/spark/util/collection/TestTimSort.java ---
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util.collection;
+
+import java.util.*;
+
+/**
+ * This code generates an int array which fails the standard TimSort.
+ *
+ * The blog that reported the bug:
+ * http://www.envisage-project.eu/timsort-specification-and-verification/
+ *
+ * The algorithm to reproduce the bug was obtained from the reporter of the bug:
+ * https://github.com/abstools/java-timsort-bug
+ *
+ * Licensed under Apache License 2.0
+ * https://github.com/abstools/java-timsort-bug/blob/master/LICENSE
--- End diff --

If this test code is your own work, then this statement is redundant with the license header, and it would be removed. But it's copied from the project above, right? Then you can't write a license header here that says it was licensed to the ASF. If anything, we would reproduce the plain vanilla AL2 stanza from the plain AL2 license text in this file's license header. That, or not copy this test code.

This Java code also needs a bit more style work to match coding practices here.
[GitHub] spark pull request: SPARK-5984: Fix TimSort bug causes ArrayOutOfB...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/4804#discussion_r25553306

--- Diff: core/src/test/java/org/apache/spark/util/collection/TestTimSort.java ---
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util.collection;
+
+import java.util.*;
+
+/**
+ * This code generates an int array which fails the standard TimSort.
+ *
+ * The blog that reported the bug:
+ * http://www.envisage-project.eu/timsort-specification-and-verification/
+ *
+ * The algorithm to reproduce the bug was obtained from the reporter of the bug:
+ * https://github.com/abstools/java-timsort-bug
+ *
+ * Licensed under Apache License 2.0
+ * https://github.com/abstools/java-timsort-bug/blob/master/LICENSE
+ */
+public class TestTimSort {
+
+  private static final int MIN_MERGE = 32;
+
+  /**
+   * Returns an array of integers that demonstrate the bug in TimSort
+   */
+  public static int[] getTimSortBugTestSet(int length) {
+    int minRun = minRunLength(length);
+    List<Long> runs = runsJDKWorstCase(minRun, length);
+    return createArray(runs, length);
+  }
+
+  private static int minRunLength(int n) {
+    int r = 0; // Becomes 1 if any 1 bits are shifted off
+    while (n >= MIN_MERGE) {
+      r |= (n & 1);
+      n >>= 1;
+    }
+    return n + r;
+  }
+
+  private static int[] createArray(List<Long> runs, int length) {
+    int[] a = new int[length];
+    Arrays.fill(a, 0);
+    int endRun = -1;
+    for (long len : runs)
+      a[endRun += len] = 1;
+    a[length - 1] = 0;
+    return a;
+  }
+
+  /**
+   * Fills <code>runs</code> with a sequence of run lengths of the form<br>
+   * Y_n x_{n,1} x_{n,2} ... x_{n,l_n} <br>
+   * Y_{n-1} x_{n-1,1} x_{n-1,2} ... x_{n-1,l_{n-1}} <br>
+   * ... <br>
+   * Y_1 x_{1,1} x_{1,2} ... x_{1,l_1}<br>
+   * The Y_i's are chosen to satisfy the invariant throughout execution,
+   * but the x_{i,j}'s are merged (by <code>TimSort.mergeCollapse</code>)
+   * into an X_i that violates the invariant.
+   *
+   * @param length The sum of all run lengths that will be added to <code>runs</code>.
+   */
+  private static List<Long> runsJDKWorstCase(int minRun, int length) {
--- End diff --

Oh! I can't believe I missed the file at the end here. OK, I see that the test case that gets generated is really big, not something you can paste into the source.

Hm, OK. Well, I suggest fixing the license situation here at a minimum.
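For reference, the `minRunLength` helper quoted in the diff lost its operators in the email archive; below is a self-contained sketch with the operators restored from the standard JDK TimSort logic: take the six most significant bits of `n`, adding one if any of the shifted-off bits are set (sample outputs computed by hand):

```java
public class MinRunSketch {
    private static final int MIN_MERGE = 32;

    // TimSort's minimum run length. For n < MIN_MERGE the whole array is one run;
    // otherwise keep the top bits of n and round up if any low bit was 1.
    public static int minRunLength(int n) {
        int r = 0;  // becomes 1 if any 1 bits are shifted off
        while (n >= MIN_MERGE) {
            r |= (n & 1);
            n >>= 1;
        }
        return n + r;
    }

    public static void main(String[] args) {
        System.out.println(minRunLength(16));     // 16 (already below MIN_MERGE)
        System.out.println(minRunLength(100));    // 25
        System.out.println(minRunLength(10000));  // 20
    }
}
```

The worst-case generator in the PR needs this value because the run lengths it fabricates must all be at least `minRunLength(length)` for TimSort to accept them as natural runs.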
[GitHub] spark pull request: [SPARK-6066] Make event log format easier to p...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4821#issuecomment-76507612 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28103/ Test PASSed.
[GitHub] spark pull request: [SPARK-6066] Make event log format easier to p...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4821#issuecomment-76507606 [Test build #28103 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28103/consoleFull) for PR 4821 at commit [`519e51a`](https://github.com/apache/spark/commit/519e51a958b40d193327e85b659e1df767041f55). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-6055] [PySpark] fix incorrect DataType....
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4809#issuecomment-76509176 I've merged this into `branch-1.2` (1.2.2). Thanks!
[GitHub] spark pull request: [SPARK-4411][UI]Add kill link for jobs in the ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4823#issuecomment-76510590 [Test build #28107 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28107/consoleFull) for PR 4823 at commit [`af461cc`](https://github.com/apache/spark/commit/af461ccce44e2792ea9356ccc2db6c84609511a0). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4411][UI]Add kill link for jobs in the ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4823#issuecomment-76510593 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28107/ Test PASSed.
[GitHub] spark pull request: SPARK-1965 [WEBUI] Spark UI throws NPE on tryi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4777#issuecomment-76510899 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28108/ Test PASSed.
[GitHub] spark pull request: SPARK-1965 [WEBUI] Spark UI throws NPE on tryi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4777#issuecomment-76510896 [Test build #28108 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28108/consoleFull) for PR 4777 at commit [`7e16590`](https://github.com/apache/spark/commit/7e1659074451b03d6b4626aff382ec6ba2f53289). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5950][SQL]Insert array into a metastore...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4826#issuecomment-76513062 [Test build #28113 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28113/consoleFull) for PR 4826 at commit [`e4f397c`](https://github.com/apache/spark/commit/e4f397cea7ec0dc21a714b75a7254bb275319fc2). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4411][UI]Add kill link for jobs in the ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4823#issuecomment-76514247 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28112/ Test PASSed.
[GitHub] spark pull request: [SPARK-4411][UI]Add kill link for jobs in the ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4823#issuecomment-76514243 [Test build #28112 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28112/consoleFull) for PR 4823 at commit [`7f52874`](https://github.com/apache/spark/commit/7f52874badfea314d019b0dc9097c54b8af2f654). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-6078][CORE] create event log dir if not...
GitHub user liyezhang556520 opened a pull request: https://github.com/apache/spark/pull/4829 [SPARK-6078][CORE] create event log dir if not exists

When the event log directory does not exist, Spark just throws an IllegalArgumentException and stops the job, so users need to manually create the directory first. It would be better to create the directory automatically when it does not exist.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/liyezhang556520/spark creatLogDir Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4829.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4829

commit 0de4f18d2b306d2f73d3b6958b400a56c6154de1 Author: Zhang, Liye liye.zh...@intel.com Date: 2015-02-28T06:13:38Z create eventlog dir if eventlog dir does not exists

commit e76a1b46383a831f9c6a0daccf1d89934cbbefd2 Author: Zhang, Liye liye.zh...@intel.com Date: 2015-02-28T06:44:00Z throw exception when there is same file name
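The behavior this PR describes, creating the directory when it is missing but refusing to proceed when a plain file already occupies the name (per the second commit message), can be sketched in a few lines. This is an illustrative standalone sketch, not the PR's actual Scala code; the method and class names are invented:

```java
import java.io.IOException;
import java.nio.file.*;

public class EnsureLogDirSketch {
    // Create the event log directory if it does not exist, instead of failing the job.
    // If the path exists but is a regular file, fail loudly rather than log into it.
    public static Path ensureLogDir(Path dir) throws IOException {
        if (Files.exists(dir) && !Files.isDirectory(dir)) {
            throw new IllegalArgumentException("Event log path exists but is a file: " + dir);
        }
        // createDirectories is a no-op when the directory already exists,
        // and also creates any missing parent directories.
        return Files.createDirectories(dir);
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("spark-events");
        Path logDir = ensureLogDir(base.resolve("eventLogs"));
        System.out.println(Files.isDirectory(logDir));  // true
    }
}
```

Using `createDirectories` (rather than `createDirectory`) keeps the call idempotent, so concurrent or repeated application starts do not race on "check then create".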
[GitHub] spark pull request: [SPARK-5950][SQL]Insert array into a metastore...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4826#issuecomment-76514620 [Test build #28119 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28119/consoleFull) for PR 4826 at commit [`0eb5578`](https://github.com/apache/spark/commit/0eb5578f8fc81c9c2186ffe7ba4b538f7a40f828). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-6066] Make event log format easier to p...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/4821#discussion_r25552366

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -217,53 +219,60 @@ private[spark] object EventLoggingListener extends Logging {
   /**
    * Write metadata about the event log to the given stream.
    *
-   * The header is a serialized version of a map, except it does not use Java serialization to
-   * avoid incompatibilities between different JDKs. It writes one map entry per line, in
-   * key=value format.
-   *
-   * The very last entry in the header is the `HEADER_END_MARKER` marker, so that the parsing code
-   * can know when to stop.
+   * The header is a single line of JSON in the beginning of the file. Note that this
+   * assumes all metadata necessary to parse the log is also included in the file name.
+   * The format needs to be kept in sync with the `openEventLog()` method below. Also, it
+   * cannot change in new Spark versions without some other way of detecting the change.
    *
-   * The format needs to be kept in sync with the openEventLog() method below. Also, it cannot
-   * change in new Spark versions without some other way of detecting the change (like some
-   * metadata encoded in the file name).
-   *
-   * @param logStream Raw output stream to the even log file.
+   * @param logStream Raw output stream to the event log file.
    * @param compressionCodec Optional compression codec to use.
-   * @return A stream where to write event log data. This may be a wrapper around the original
+   * @return A stream to which event log data is written. This may be a wrapper around the original
    *         stream (for example, when compression is enabled).
    */
  def initEventLog(
      logStream: OutputStream,
      compressionCodec: Option[CompressionCodec]): OutputStream = {
-    val meta = mutable.HashMap(("version" -> SPARK_VERSION))
+    val metadata = new mutable.HashMap[String, String]
+    // Some of these metadata are already encoded in the file name
+    // Here we include them again within the file itself for completeness
+    metadata += ("Event" -> Utils.getFormattedClassName(SparkListenerMetadataIdentifier))
+    metadata += (SPARK_VERSION_KEY -> SPARK_VERSION)
    compressionCodec.foreach { codec =>
-      meta += ("compressionCodec" -> codec.getClass().getName())
+      metadata += (COMPRESSION_CODEC_KEY -> codec.getClass.getCanonicalName)
    }
-
-    def write(entry: String) = {
-      val bytes = entry.getBytes(Charsets.UTF_8)
-      if (bytes.length > MAX_HEADER_LINE_LENGTH) {
-        throw new IOException(s"Header entry too long: ${entry}")
-      }
-      logStream.write(bytes, 0, bytes.length)
+    val metadataJson = compact(render(JsonProtocol.mapToJson(metadata)))
+    val metadataBytes = (metadataJson + "\n").getBytes(Charsets.UTF_8)
+    if (metadataBytes.length > MAX_HEADER_LINE_LENGTH) {
+      throw new IOException(s"Event log metadata too long: $metadataJson")
    }
-
-    meta.foreach { case (k, v) => write(s"$k=$v\n") }
-    write(s"$HEADER_END_MARKER\n")
-    compressionCodec.map(_.compressedOutputStream(logStream)).getOrElse(logStream)
+    logStream.write(metadataBytes, 0, metadataBytes.length)
+    logStream
  }

  /**
   * Return a file-system-safe path to the log file for the given application.
   *
+   * Note that because we currently only create a single log file for each application,
+   * we must encode all the information needed to parse this event log in the file name
+   * instead of within the file itself. Otherwise, if the file is compressed, for instance,
+   * we won't know which codec to use to decompress the metadata.
+   *
   * @param logBaseDir Directory where the log file will be written.
   * @param appId A unique app ID.
+   * @param compressionCodecName Name of the compression codec used to compress the contents
+   *                             of the log, or None if compression is not enabled.
   * @return A path which consists of file-system-safe characters.
   */
-  def getLogPath(logBaseDir: String, appId: String): String = {
-    val name = appId.replaceAll("[ :/]", "-").replaceAll("[${}'\"]", "_").toLowerCase
-    Utils.resolveURI(logBaseDir) + "/" + name.stripSuffix("/")
+  def getLogPath(
+      logBaseDir: String,
+      appId: String,
+      compressionCodecName: Option[String]): String = {
+    val sanitizedAppId = appId.replaceAll("[ :/]", "-").replaceAll("[${}'\"]", "_").toLowerCase
+    // e.g. EVENT_LOG_app_123_SPARK_VERSION_1.3.1
+    // e.g. EVENT_LOG_ {...}
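As an aside, the app-ID sanitization in `getLogPath` can be exercised on its own: hostile path and URI characters are mapped to `-` or `_` before the ID is used as a file name. The regexes below are a best-effort reconstruction for illustration, since the quoted character classes lost their quoting in the email archive:

```java
public class AppIdSanitizeSketch {
    // Make an application ID file-system safe: path separators, spaces, and colons
    // become '-', shell-sensitive characters become '_', and the result is lowercased.
    // The exact character classes are assumptions, not necessarily Spark's.
    public static String sanitize(String appId) {
        return appId.replaceAll("[ :/]", "-")
                    .replaceAll("[${}'\"]", "_")
                    .toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(sanitize("My App:2015/02/28"));  // my-app-2015-02-28
        System.out.println(sanitize("app$1{x}"));           // app_1_x_
    }
}
```

The design constraint driving this (per the new doc comment above) is that the file name must be parseable on its own, so it has to survive any file system and shell untouched.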
[GitHub] spark pull request: [SPARK-6048] SparkConf should not translate de...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4799#issuecomment-76504481 [Test build #28104 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28104/consoleFull) for PR 4799 at commit [`c26a9e3`](https://github.com/apache/spark/commit/c26a9e3c3f6a17ae01782278fb1d4a1426fcbdbd). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-5984: Fix TimSort bug causes ArrayOutOfB...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4804#issuecomment-76504579 Jenkins, test this please.
[GitHub] spark pull request: SPARK-5984: Fix TimSort bug causes ArrayOutOfB...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/4804#discussion_r25553196 --- Diff: core/src/test/java/org/apache/spark/util/collection/TestTimSort.java --- @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.util.collection; + +import java.util.*; + +/** + * This codes generates a int array which fails the standard TimSort. + * + * The blog that reported the bug + * http://www.envisage-project.eu/timsort-specification-and-verification/ + * + * The algorithms to reproduce the bug is obtained from the reporter of the bug + * https://github.com/abstools/java-timsort-bug + * + * Licensed under Apache License 2.0 + * https://github.com/abstools/java-timsort-bug/blob/master/LICENSE --- End diff -- It looks like it's almost entirely the code from the third party site. The right-est thing to do is actually begin this file with the standard AL2 stanza: ``` Copyright 2015 [the author's name] Licensed under the Apache License, Version 2.0 (the License); you may not use this file except in compliance with the License. 
You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. ``` ... since that is substantially the license of the work in the file. I believe the build's check for this stuff will accept this, or should. It can be followed with a comment that the work has been modified from its original form. I don't think it's crazy to omit this either, though it's always nice to have tests. There's no standard IJ config but I'll point out some things that could be touched up.
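As a sketch, the combined header described above (the AL2 stanza followed by a note that the work has been modified; the copyright holder is a placeholder, not taken from this thread) might look like:

```java
/*
 * Copyright 2015 [the author's name]
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
// This file is derived from https://github.com/abstools/java-timsort-bug
// and has been modified from its original form.
```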
[GitHub] spark pull request: SPARK-5984: Fix TimSort bug causes ArrayOutOfB...
Github user hotou commented on the pull request: https://github.com/apache/spark/pull/4804#issuecomment-76508685 @srowen I did what you recommended here. This passed the rat test on my machine, at least.
[GitHub] spark pull request: SPARK-1965 [WEBUI] Spark UI throws NPE on tryi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4777#issuecomment-76508662 [Test build #28108 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28108/consoleFull) for PR 4777 at commit [`7e16590`](https://github.com/apache/spark/commit/7e1659074451b03d6b4626aff382ec6ba2f53289). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-6025] [MLlib] Add helper method to effi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4819#issuecomment-76511907 [Test build #28110 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28110/consoleFull) for PR 4819 at commit [`7d4ed48`](https://github.com/apache/spark/commit/7d4ed483e0a0c58669ab00421d00eecda832cfba). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-6025] [MLlib] Add helper method to effi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4819#issuecomment-76511909 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28110/ Test PASSed.
[GitHub] spark pull request: [SPARK-3586][streaming]Support nested director...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2765#issuecomment-76512835 [Test build #28114 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28114/consoleFull) for PR 2765 at commit [`348657e`](https://github.com/apache/spark/commit/348657e2069c3732d2a43bbc6ddb873eec7a3a48). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5946][Streaming] Add Python API for dir...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4723#issuecomment-76512903 [Test build #28111 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28111/consoleFull) for PR 4723 at commit [`1b6e873`](https://github.com/apache/spark/commit/1b6e873602785c5e5c78ee23d77725d2c51129fc). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class OffsetRange(object):`
[GitHub] spark pull request: [SPARK-5946][Streaming] Add Python API for dir...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4723#issuecomment-76512906 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28111/ Test PASSed.
[GitHub] spark pull request: [SPARK-5775] [SQL] BugFix: GenericRow cannot b...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4792#issuecomment-76513950 [Test build #28117 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28117/consoleFull) for PR 4792 at commit [`538f506`](https://github.com/apache/spark/commit/538f506851d7e2eba6a20d0ad4a5909486bf8516). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5775] [SQL] BugFix: GenericRow cannot b...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/4792#issuecomment-76514046 @yhuai Thanks for the review! I've addressed the comments. Will merge this to master and branch-1.3 after Jenkins approves.
[GitHub] spark pull request: [SPARK-6077] update listener for the existing ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4828#issuecomment-76514483 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-5979][SPARK-6032] Smaller safer --packa...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4802
[GitHub] spark pull request: [SPARK-5979][SPARK-6031][SPARK-6032][SPARK-604...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4754#issuecomment-76514573 @brkyvz let's close this issue for now and keep it in our back pocket. We can use it if we decide to put this in the 1.3 branch down the line.
[GitHub] spark pull request: [GraphX] initialmessage for pagerank should be...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1128
[GitHub] spark pull request: [SPARK-3629][Doc] improve spark on yarn doc
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2813
[GitHub] spark pull request: [SPARK-6079 ] Use index to speed up StatusTrac...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4830#issuecomment-76515751 [Test build #28122 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28122/consoleFull) for PR 4830 at commit [`2c49614`](https://github.com/apache/spark/commit/2c49614cc4f92dc1a47044be362db51cfe4da77b). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-6066] Make event log format easier to p...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4821#issuecomment-76503962 [Test build #28103 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28103/consoleFull) for PR 4821 at commit [`519e51a`](https://github.com/apache/spark/commit/519e51a958b40d193327e85b659e1df767041f55). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-5984: Fix TimSort bug causes ArrayOutOfB...
Github user hotou commented on the pull request: https://github.com/apache/spark/pull/4804#issuecomment-76505114 ok
[GitHub] spark pull request: [SPARK-6029] Stop excluding fastutil package
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4780#issuecomment-76507231 So, marking CA as provided but putting it in your assembly is contradictory, but in the end, the right thing is to include CA (and its dependencies) in your assembly, yes. I was asking whether you mark Spark as provided; it should be. You're effectively shading CA (and not fastutil); you should also be able to achieve that through your build rather than bother with source, though I don't know how that works in SBT. (`minimizeJar` is a function of Maven's shading plugin.) fastutil-in-Spark isn't the issue per se, since indeed Spark doesn't have it! What it does have is CA. Your result seems to confirm that the problem is really CA, in the sense that your app finds Spark's loaded copy of CA classes, but that classloader can't see your classloader, which also has CA, but also the fastutil it needs. Shading CA disambiguates this. (So, put that workaround in your pocket for now; you should be able to do this with SBT.) This is what the userClassPathFirst stuff is supposed to resolve, though. You're definitely sure CA is in your app JAR? I ask just because you mention it was marked provided above, though also in your assembly. Worth double-checking. Otherwise, I'm not sure. It almost sounds like the inverse of the unresolved (?) https://issues.apache.org/jira/browse/SPARK-1863 but may surprise me by being the same issue.
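For reference, a sketch of the shading approach discussed above, using the Maven shade plugin's class relocation plus `minimizeJar`. The `com.clearspring.analytics` pattern and the shaded prefix are assumptions for illustration, not taken from this thread:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <!-- Drop classes nothing in the app actually reaches -->
    <minimizeJar>true</minimizeJar>
    <relocations>
      <!-- Rename the app's bundled copy so it never collides with the
           copy Spark's classloader has already loaded; the renamed copy
           resolves against the fastutil classes in the same assembly -->
      <relocation>
        <pattern>com.clearspring.analytics</pattern>
        <shadedPattern>myapp.shaded.clearspring</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
    </execution>
  </executions>
</plugin>
```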
[GitHub] spark pull request: [SPARK-4411][UI]Add kill link for jobs in the ...
GitHub user lianhuiwang opened a pull request: https://github.com/apache/spark/pull/4823 [SPARK-4411][UI]Add kill link for jobs in the UI We should have a kill link for each job, similar to what we have for each stage, so it's easier for users to kill jobs in the UI. @kayousterhout can you take a look at this? thanks. You can merge this pull request into a Git repository by running: $ git pull https://github.com/lianhuiwang/spark SPARK-4411 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4823.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4823 commit af461ccce44e2792ea9356ccc2db6c84609511a0 Author: Lianhui Wang lianhuiwan...@gmail.com Date: 2015-02-28T03:24:46Z Add kill link for jobs in the UI
[GitHub] spark pull request: [SPARK-6048] SparkConf should not translate de...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4799#issuecomment-76507995 [Test build #28104 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28104/consoleFull) for PR 4799 at commit [`c26a9e3`](https://github.com/apache/spark/commit/c26a9e3c3f6a17ae01782278fb1d4a1426fcbdbd). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-6048] SparkConf should not translate de...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4799#issuecomment-76507997 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28104/ Test PASSed.
[GitHub] spark pull request: [SPARK-6025] [MLlib] Add helper method to effi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4819#issuecomment-76509618 [Test build #28110 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28110/consoleFull) for PR 4819 at commit [`7d4ed48`](https://github.com/apache/spark/commit/7d4ed483e0a0c58669ab00421d00eecda832cfba). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5946][Streaming] Add Python API for dir...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4723#issuecomment-76510453 [Test build #28111 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28111/consoleFull) for PR 4723 at commit [`1b6e873`](https://github.com/apache/spark/commit/1b6e873602785c5e5c78ee23d77725d2c51129fc). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4021#issuecomment-76511660 I think that this patch may have introduced a bug that may cause accumulator updates to be lost: https://issues.apache.org/jira/browse/SPARK-6075 I'm still trying to see if I can spot the problem, but my hunch is that maybe the `localAccums` thread-local maps should not hold weak references. When deserializing an accumulator in an executor and registering it with `localAccums`, is there ever a moment in which the accumulator has no strong references pointing to it? Does some object hold a strong reference to an accumulator while it's being deserialized? If not, this could lead to it being dropped from the `localAccums` map, causing that task's accumulator updates to be lost.
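To illustrate the suspected failure mode, here is a minimal stand-in (not Spark's actual `Accumulators` code) using a `WeakHashMap`: an entry held only weakly can vanish as soon as the last strong reference to its key is gone.

```java
import java.util.Map;
import java.util.WeakHashMap;

public class WeakAccumulatorDemo {
    public static void main(String[] args) throws InterruptedException {
        // Hypothetical stand-in for a deserialized, registered accumulator.
        Object accumulator = new Object();

        // A map that holds its keys only weakly, mimicking the suspected
        // behavior of the localAccums thread-local map.
        Map<Object, String> localAccums = new WeakHashMap<>();
        localAccums.put(accumulator, "pending update");

        // While a strong reference exists, the entry is reachable.
        System.out.println(localAccums.size()); // prints 1

        // Drop the only strong reference: the entry becomes eligible
        // for collection, and a GC pass can silently remove it.
        accumulator = null;
        System.gc();
        Thread.sleep(100); // give the collector a chance (not guaranteed)

        // Often prints 0 at this point: the "pending update" is lost.
        System.out.println(localAccums.size());
    }
}
```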
[GitHub] spark pull request: [SPARK-5950][SQL]Insert array into a metastore...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4826#issuecomment-76512669 [Test build #28113 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28113/consoleFull) for PR 4826 at commit [`e4f397c`](https://github.com/apache/spark/commit/e4f397cea7ec0dc21a714b75a7254bb275319fc2). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/4729#issuecomment-76512714 @viirya Thank you for working on it! Our discussions helped me clearly understand the problem. After discussions with @liancheng, I am proposing a different approach to address this issue in https://github.com/apache/spark/pull/4826. Please feel free to leave comments there.