[GitHub] spark pull request: [SPARK-6066] Make event log format easier to p...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4821#issuecomment-76505881 [Test build #28102 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28102/consoleFull) for PR 4821 at commit [`ef69276`](https://github.com/apache/spark/commit/ef692768db319d3159ce9522d625cede3505e161). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6066] Make event log format easier to p...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4821#issuecomment-76505885 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28102/ Test FAILed.
[GitHub] spark pull request: [SPARK-6055] [PySpark] fix incorrect __eq__ of...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4808
[GitHub] spark pull request: [SPARK-6074] [sql] Package pyspark sql binding...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4822#issuecomment-76509271 [Test build #28106 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28106/consoleFull) for PR 4822 at commit [`fb52001`](https://github.com/apache/spark/commit/fb5200118d7fbf9466d3b91936e24de268051d6e). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-6055] [PySpark] fix incorrect __eq__ of...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4808#issuecomment-76510949 LGTM, so I've merged this into `branch-1.3` (1.3.0) and `master` (1.4.0). Thanks!
[GitHub] spark pull request: [SPARK-4411][UI]Add kill link for jobs in the ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4823#issuecomment-76511755 [Test build #28112 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28112/consoleFull) for PR 4823 at commit [`7f52874`](https://github.com/apache/spark/commit/7f52874badfea314d019b0dc9097c54b8af2f654). * This patch merges cleanly.
[GitHub] spark pull request: [CORE][minor] enhance the `toArray` method in ...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/4825 [CORE][minor] enhance the `toArray` method in `SizeTrackingVector` Use array copy instead of `Iterator#toArray` to make it more efficient. You can merge this pull request into a Git repository by running: $ git pull https://github.com/cloud-fan/spark minor Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4825.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4825 commit 946a35bfef2a746a4e4fa44c62df70031677d217 Author: Wenchen Fan cloud0...@outlook.com Date: 2015-02-16T09:42:38Z minor enhance
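The optimization described in this PR can be sketched in plain Java. This is an illustrative stand-in, not Spark's actual `SizeTrackingVector` API: copying the live prefix of the backing array with one `System.arraycopy` call avoids walking an `Iterator` element by element.

```java
import java.util.Iterator;

// Hypothetical sketch of the optimization: a growable buffer whose toArray
// copies its backing array directly instead of draining an Iterator.
class GrowableBuffer {
  private Object[] data = new Object[16];
  private int size = 0;

  public void add(Object v) {
    if (size == data.length) {
      // Double the backing array when full, also via bulk copy.
      Object[] bigger = new Object[data.length * 2];
      System.arraycopy(data, 0, bigger, 0, size);
      data = bigger;
    }
    data[size++] = v;
  }

  // Slow path: materialize through an Iterator, one element at a time.
  public Object[] toArrayViaIterator() {
    Object[] out = new Object[size];
    Iterator<Object> it = iterator();
    for (int i = 0; it.hasNext(); i++) {
      out[i] = it.next();
    }
    return out;
  }

  // Fast path: one bulk arraycopy of the live prefix of the backing array.
  public Object[] toArrayViaCopy() {
    Object[] out = new Object[size];
    System.arraycopy(data, 0, out, 0, size);
    return out;
  }

  private Iterator<Object> iterator() {
    return new Iterator<Object>() {
      private int i = 0;
      public boolean hasNext() { return i < size; }
      public Object next() { return data[i++]; }
    };
  }
}
```

Both paths return the same contents; the arraycopy version skips the per-element iterator calls and boxing overhead of the iterator protocol.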
[GitHub] spark pull request: [SPARK-6079 ] Use index to speed up StatusTrac...
GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/4830 [SPARK-6079 ] Use index to speed up StatusTracker.getJobIdsForGroup() `StatusTracker.getJobIdsForGroup()` is implemented via a linear scan over a HashMap rather than using an index, which might be an expensive operation if there are many (e.g. thousands) of retained jobs. This patch adds a new map to `JobProgressListener` in order to speed up these lookups. You can merge this pull request into a Git repository by running: $ git pull https://github.com/JoshRosen/spark statustracker-job-group-indexing Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4830.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4830 commit 97275a7a472ba782c268f391876529fec8fbf2ab Author: Josh Rosen joshro...@databricks.com Date: 2015-02-28T07:29:23Z Add jobGroup to jobId index to JobProgressListener commit 2c49614cc4f92dc1a47044be362db51cfe4da77b Author: Josh Rosen joshro...@databricks.com Date: 2015-02-28T07:31:27Z getOrElse
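The idea in this PR — trade a little bookkeeping at job-start time for a constant-time group lookup — can be sketched as follows. This is an illustrative stand-in, not Spark's `JobProgressListener`; the names `jobStarted`, `jobsForGroupByScan`, and `jobsForGroupByIndex` are hypothetical.

```java
import java.util.*;

// Sketch: maintain a secondary index from job group to job ids so that
// group lookups become a single map access instead of a scan over all
// retained jobs.
class JobIndex {
  private final Map<Integer, String> jobIdToGroup = new HashMap<>();
  private final Map<String, Set<Integer>> groupToJobIds = new HashMap<>();

  public void jobStarted(int jobId, String group) {
    jobIdToGroup.put(jobId, group);
    // Update the index as jobs arrive.
    groupToJobIds.computeIfAbsent(group, g -> new HashSet<>()).add(jobId);
  }

  // O(n) over all retained jobs -- the behavior the PR replaces.
  public Set<Integer> jobsForGroupByScan(String group) {
    Set<Integer> out = new HashSet<>();
    for (Map.Entry<Integer, String> e : jobIdToGroup.entrySet()) {
      if (e.getValue().equals(group)) out.add(e.getKey());
    }
    return out;
  }

  // O(1) lookup via the index.
  public Set<Integer> jobsForGroupByIndex(String group) {
    return groupToJobIds.getOrDefault(group, Collections.emptySet());
  }
}
```

In a real listener the index must also be pruned when old jobs are evicted, otherwise it leaks entries as the scan-based map is trimmed.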
[GitHub] spark pull request: [SPARK-5950][SQL]Insert array into a metastore...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4826#issuecomment-76516087 [Test build #28119 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28119/consoleFull) for PR 4826 at commit [`0eb5578`](https://github.com/apache/spark/commit/0eb5578f8fc81c9c2186ffe7ba4b538f7a40f828). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5950][SQL]Insert array into a metastore...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4826#issuecomment-76516089 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28119/ Test FAILed.
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/3249#discussion_r25552305

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---

@@ -422,6 +424,108 @@ class Analyzer(catalog: Catalog,
       Generate(g, join = false, outer = false, None, child)
     }
   }
+
+  /**
+   * Transforms a query with subquery expressions in its WHERE clause into a left semi join:
+   * `select T1.x from T1 where T1.x in (select T2.y from T2)` is transformed to
+   * `select T1.x from T1 left semi join T2 on T1.x = T2.y`.
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+      case p: LogicalPlan if !p.childrenResolved => p
+      case filter @ Filter(conditions, child) =>
+        val subqueryExprs = conditions.collect {
+          case In(exp, Seq(SubqueryExpression(subquery))) => (exp, subquery)
+        }
+        // Replace subqueries with a dummy true literal since they are evaluated separately now.
+        val transformedConds = conditions.transform {
+          case In(_, Seq(SubqueryExpression(_))) => Literal(true)
+        }
+        subqueryExprs match {
+          case Seq() => filter // No subqueries.
+          case Seq((exp, subquery)) =>
+            createLeftSemiJoin(
+              child,
+              exp,
+              subquery,
+              transformedConds)
+          case _ =>
+            throw new TreeNodeException(filter, "Only one SubQuery expression is supported.")
+        }
+    }
+
+    /**
+     * Creates a LeftSemi join between the parent query and the subquery mentioned in the 'IN'
+     * predicate, combining the subquery conditions with the parent query conditions.
+     */
+    def createLeftSemiJoin(
+        left: LogicalPlan,
+        value: Expression,
+        subquery: LogicalPlan,
+        parentConds: Expression): LogicalPlan = {
+      val (transformedPlan, subqueryConds) = transformAndGetConditions(value, subquery)
+      // Add both parent query conditions and subquery conditions as join conditions
+      val allPredicates = And(parentConds, subqueryConds)
+      Join(left, transformedPlan, LeftSemi, Some(allPredicates))
+    }
+
+    /**
+     * Transforms the subquery LogicalPlan, adding the expressions that are used as filters to the
+     * projection, and also returns the filter conditions used in the subquery.
+     */
+    def transformAndGetConditions(
+        value: Expression,
+        subquery: LogicalPlan): (LogicalPlan, Expression) = {
+      val expr = new scala.collection.mutable.ArrayBuffer[Expression]()
+      // TODO: we only decorrelate subqueries in very specific cases like those mentioned above
+      // in the documentation. More complex queries, such as subqueries nested inside subqueries,
+      // can be supported in the future.
+      val transformedPlan = subquery transform {
+        case project @ Project(projectList, f @ Filter(condition, child)) =>
+          // Don't support more than one item in the select list of the subquery
+          if (projectList.size > 1) {
+            throw new TreeNodeException(
+              project,
+              "SubQuery can contain only one item in Select List")
+          }
+          val resolvedChild = ResolveRelations(child)
+          // Add the expressions that are used as filters in the subquery to the projections
+          val toBeAddedExprs = f.references.filter { a =>
+            resolvedChild.resolve(a.name, resolver) != None && !project.outputSet.contains(a)
+          }
+          val nameToExprMap = collection.mutable.Map[String, Alias]()
+          // Create aliases for all projection expressions.
+          val witAliases = (projectList ++ toBeAddedExprs).zipWithIndex.map {
+            case (exp, index) =>
+              nameToExprMap.put(exp.name, Alias(exp, s"ssqc$index")())
+              Alias(exp, s"ssqc$index")()
+          }
+          // Replace the condition column names with alias names.
+          val transformedConds = condition.transform {
+            case a: Attribute if resolvedChild.resolve(a.name, resolver) != None =>
+              nameToExprMap.get(a.name).get.toAttribute
+          }
+          // Join the first projection column of the subquery to the main query and add as a condition
+          // TODO: We can avoid this if the parent condition already has this condition.
+          expr += EqualTo(value, witAliases(0).toAttribute)
+          expr += transformedConds

--- End diff --

Connecting the subquery with the join condition doesn't make sense to me, as we will transform the whole logical plan as
[GitHub] spark pull request: SPARK-5984: Fix TimSort bug causes ArrayOutOfB...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4804#issuecomment-76504749 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28105/ Test FAILed.
[GitHub] spark pull request: SPARK-5984: Fix TimSort bug causes ArrayOutOfB...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4804#issuecomment-76504748 [Test build #28105 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28105/consoleFull) for PR 4804 at commit [`4d95f75`](https://github.com/apache/spark/commit/4d95f75d3bdf09bad3d8a4a32d5c2ee7486a8a23). * This patch **fails RAT tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4226] [SQL] Add Exists support for wher...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/4812#issuecomment-76505474 Sorry, I meant semantically.
[GitHub] spark pull request: SPARK-5984: Fix TimSort bug causes ArrayOutOfB...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/4804#discussion_r25553237

--- Diff: core/src/test/java/org/apache/spark/util/collection/TestTimSort.java ---

@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util.collection;
+
+import java.util.*;
+
+/**
+ * This code generates an int array which fails the standard TimSort.
+ *
+ * The blog post that reported the bug:
+ * http://www.envisage-project.eu/timsort-specification-and-verification/
+ *
+ * The algorithm to reproduce the bug was obtained from the reporter of the bug:
+ * https://github.com/abstools/java-timsort-bug
+ *
+ * Licensed under Apache License 2.0
+ * https://github.com/abstools/java-timsort-bug/blob/master/LICENSE
+ */
+public class TestTimSort {
+
+  private static final int MIN_MERGE = 32;
+
+  /**
+   * Returns an array of integers that demonstrates the bug in TimSort
+   */
+  public static int[] getTimSortBugTestSet(int length) {
+    int minRun = minRunLength(length);
+    List<Long> runs = runsJDKWorstCase(minRun, length);
+    return createArray(runs, length);
+  }
+
+  private static int minRunLength(int n) {
+    int r = 0; // Becomes 1 if any 1 bits are shifted off
+    while (n >= MIN_MERGE) {
+      r |= (n & 1);
+      n >>= 1;
+    }
+    return n + r;
+  }
+
+  private static int[] createArray(List<Long> runs, int length) {
+    int[] a = new int[length];
+    Arrays.fill(a, 0);
+    int endRun = -1;
+    for (long len : runs)
+      a[endRun += len] = 1;
+    a[length - 1] = 0;
+    return a;
+  }
+
+  /**
+   * Fills <code>runs</code> with a sequence of run lengths of the form<br>
+   * Y_n x_{n,1} x_{n,2} ... x_{n,l_n} <br>
+   * Y_{n-1} x_{n-1,1} x_{n-1,2} ... x_{n-1,l_{n-1}} <br>
+   * ... <br>
+   * Y_1 x_{1,1} x_{1,2} ... x_{1,l_1}<br>
+   * The Y_i's are chosen to satisfy the invariant throughout execution,
+   * but the x_{i,j}'s are merged (by <code>TimSort.mergeCollapse</code>)
+   * into an X_i that violates the invariant.
+   *
+   * @param length The sum of all run lengths that will be added to <code>runs</code>.
+   */
+  private static List<Long> runsJDKWorstCase(int minRun, int length) {
+    List<Long> runs = new ArrayList<Long>();
+
+    long runningTotal = 0, Y = minRun + 4, X = minRun;
+
+    while (runningTotal + Y + X <= length) {
+      runningTotal += X + Y;
+      generateJDKWrongElem(runs, minRun, X);
+      runs.add(0, Y);
+      // X_{i+1} = Y_i + x_{i,1} + 1, since runs.get(1) = x_{i,1}
+      X = Y + runs.get(1) + 1;
+      // Y_{i+1} = X_{i+1} + Y_i + 1
+      Y += X + 1;
+    }
+
+    if (runningTotal + X <= length) {
+      runningTotal += X;
+      generateJDKWrongElem(runs, minRun, X);
+    }
+
+    runs.add(length - runningTotal);
+    return runs;

--- End diff --

Actually, is there any test at all in this file? It seems like it just generates the problem test case. Maybe you can use it to generate a short test exposing the bug, and create a new, actual test that shows the sort works on it. Then this code need not exist in Spark.
[GitHub] spark pull request: SPARK-6063
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4815#issuecomment-76506361 I don't think it's necessary to wait on Jenkins. This doc change can't cause a problem. We can fix the title on merge too. I'll wait anyway for that, but figure one of us can just merge soon in any event.
[GitHub] spark pull request: [SPARK-6074] [sql] Package pyspark sql binding...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4822#issuecomment-76506844 [Test build #28106 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28106/consoleFull) for PR 4822 at commit [`fb52001`](https://github.com/apache/spark/commit/fb5200118d7fbf9466d3b91936e24de268051d6e). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4411][UI]Add kill link for jobs in the ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4823#issuecomment-76508165 [Test build #28107 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28107/consoleFull) for PR 4823 at commit [`af461cc`](https://github.com/apache/spark/commit/af461ccce44e2792ea9356ccc2db6c84609511a0). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-6073][SQL] Need to refresh metastore ca...
GitHub user yhuai opened a pull request: https://github.com/apache/spark/pull/4824 [SPARK-6073][SQL] Need to refresh metastore cache after append data in CreateMetastoreDataSourceAsSelect JIRA: https://issues.apache.org/jira/browse/SPARK-6073 @liancheng You can merge this pull request into a Git repository by running: $ git pull https://github.com/yhuai/spark refreshCache Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4824.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4824 commit b9542ef6198988736ccfea3b665d968b6e767418 Author: Yin Huai yh...@databricks.com Date: 2015-02-28T04:07:55Z Refresh metadata cache in the Catalog in CreateMetastoreDataSourceAsSelect.
[GitHub] spark pull request: [SPARK-6073][SQL] Need to refresh metastore ca...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4824#issuecomment-76509453 [Test build #28109 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28109/consoleFull) for PR 4824 at commit [`b9542ef`](https://github.com/apache/spark/commit/b9542ef6198988736ccfea3b665d968b6e767418). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4021#issuecomment-76512378 Another thought: if `register()` is somehow called twice for the same accumulator, then it looks like we'll silently overwrite the existing value in `localAccums`. We should probably throw an exception instead, since that scenario could lead to lost updates.
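The safeguard suggested above — fail loudly on double registration rather than silently overwrite — might look like the following sketch. The names (`LocalRegistry`, `register`) are hypothetical stand-ins, not Spark's actual Accumulators API.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: a registry that refuses duplicate registrations so that a second
// register() call for the same id cannot clobber (and thereby lose) the
// updates accumulated under the first registration.
class LocalRegistry {
  private final Map<Long, Object> localValues = new HashMap<>();

  public void register(long id, Object value) {
    // putIfAbsent returns the previous value if one existed, null otherwise.
    Object previous = localValues.putIfAbsent(id, value);
    if (previous != null) {
      throw new IllegalStateException("Accumulator " + id + " already registered");
    }
  }

  public Object get(long id) {
    return localValues.get(id);
  }
}
```

The exception surfaces the bug at the call site instead of letting updates vanish silently.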
[GitHub] spark pull request: [SPARK-5950][SQL]Insert array into a metastore...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4826#issuecomment-76513066 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28113/ Test FAILed.
[GitHub] spark pull request: [SPARK-5342][YARN] Allow long running Spark ap...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4688#issuecomment-76513795 [Test build #28116 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28116/consoleFull) for PR 4688 at commit [`5c11c3e`](https://github.com/apache/spark/commit/5c11c3e348fecdd070f5ab471314bce94bb4b66e). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4818#discussion_r2088

--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---

@@ -290,8 +290,19 @@ private[yarn] class YarnAllocator(
       location: String,
       containersToUse: ArrayBuffer[Container],
       remaining: ArrayBuffer[Container]): Unit = {
+    // SPARK-6050: certain Yarn configurations return a virtual core count that doesn't match the
+    // request; for example, capacity scheduler + DefaultResourceCalculator. Allow users in those
+    // situations to disable matching of the core count.
+    val matchingResource =
+      if (sparkConf.getBoolean("spark.yarn.container.disableCpuMatching", false)) {
+        Resource.newInstance(allocatedContainer.getResource().getMemory(),

--- End diff --

Nit: take out parens for consistency.
[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4818#discussion_r2091

--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---

@@ -290,8 +290,19 @@ private[yarn] class YarnAllocator(
       location: String,
       containersToUse: ArrayBuffer[Container],
       remaining: ArrayBuffer[Container]): Unit = {
+    // SPARK-6050: certain Yarn configurations return a virtual core count that doesn't match the
+    // request; for example, capacity scheduler + DefaultResourceCalculator. Allow users in those
+    // situations to disable matching of the core count.
+    val matchingResource =
+      if (sparkConf.getBoolean("spark.yarn.container.disableCpuMatching", false)) {
+        Resource.newInstance(allocatedContainer.getResource().getMemory(),

--- End diff --

Actually, why not just use `allocatedContainer.getResource`?
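The matching relaxation under discussion in SPARK-6050 — compare allocated containers to the request on memory alone when the scheduler does not honor vcore requests — can be sketched with stand-in types. The `Res` class and `matches` method below are hypothetical, not YARN's `Resource` API.

```java
// Sketch: when ignoreCpu is set (e.g. because DefaultResourceCalculator
// always reports 1 vcore regardless of the request), match allocated
// containers against the request on memory only.
class ContainerMatcher {
  static final class Res {
    final int memoryMb;
    final int vcores;
    Res(int memoryMb, int vcores) {
      this.memoryMb = memoryMb;
      this.vcores = vcores;
    }
  }

  static boolean matches(Res requested, Res allocated, boolean ignoreCpu) {
    boolean memOk = allocated.memoryMb == requested.memoryMb;
    // With CPU matching disabled, only memory is compared;
    // otherwise the vcore counts must match as well.
    return ignoreCpu ? memOk : memOk && allocated.vcores == requested.vcores;
  }
}
```

Under a scheduler that ignores vcores, a request for (1024 MB, 4 cores) may come back as (1024 MB, 1 core); memory-only matching accepts it while strict matching would reject it.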
[GitHub] spark pull request: [SPARK-5979][SPARK-6031][SPARK-6032][SPARK-604...
Github user brkyvz closed the pull request at: https://github.com/apache/spark/pull/4754
[GitHub] spark pull request: [SPARK-6050] [yarn] Add config option to do la...
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4818#issuecomment-76514683 My opinion is that we should make the default true, as the vanilla YARN default of `FIFOScheduler` will run into this issue (though most vendor distributions have a better default). There are no versions of YARN that will return containers smaller than were requested, except in this weird situation where the scheduler doesn't support CPU scheduling. I actually think it might be better to avoid a config at all and always just avoid matching on CPU. It's really hard to imagine any situation where it would actually benefit someone to set the config to false. The only one I can think of is debugging incorrect behavior in YARN, and, if we care about that, it would be better to just log something.
[GitHub] spark pull request: [SPARK-5342][YARN] Allow long running Spark ap...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4688#issuecomment-76515993 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28116/ Test PASSed.
[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...
Github user tnachen commented on the pull request: https://github.com/apache/spark/pull/3074#issuecomment-76515967 @mateiz @pwendell I'm hoping to also see this merged soon; what else is needed here?
[GitHub] spark pull request: [SPARK-6066] Make event log format easier to p...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/4821#discussion_r25552458

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -110,17 +117,12 @@ private[spark] class EventLoggingListener(
        hadoopDataStream = Some(fileSystem.create(path))
        hadoopDataStream.get
      }
-
-    val compressionCodec =
-      if (shouldCompress) {
-        Some(CompressionCodec.createCodec(sparkConf))
-      } else {
-        None
-      }
+    val cstream = compressionCodec.map(_.compressedOutputStream(dstream)).getOrElse(dstream)
--- End diff --

that's fine. I fixed this in my latest commit.
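The `cstream` one-liner in the diff above can be transliterated to Java's `Optional` to show the pattern it replaces the old if/else with. The names here are illustrative, not Spark's API: wrap the raw stream with the codec only when one is configured, otherwise pass the stream through unchanged.

```java
import java.util.Optional;
import java.util.function.UnaryOperator;

// Sketch of `codec.map(wrap).getOrElse(stream)` in Java: strings stand in for
// streams so the wrapping is observable.
final class StreamWrap {
  static String wrap(Optional<UnaryOperator<String>> codec, String dstream) {
    // map() applies the codec only if present; orElse() falls back to the raw stream.
    return codec.map(c -> c.apply(dstream)).orElse(dstream);
  }
}
```

This collapses the six-line `if (shouldCompress) ... else None` block from the removed code into a single expression, which is the simplification the diff makes.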
[GitHub] spark pull request: SPARK-5984: Fix TimSort bug causes ArrayOutOfB...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/4804#discussion_r25552857

--- Diff: core/src/test/java/org/apache/spark/util/collection/TestTimSort.java ---
@@ -0,0 +1,134 @@
+package org.apache.spark.util.collection;
+
+import java.util.*;
+
+/*
--- End diff --

you need to put this in the beginning of the file, i.e. before the package definition
[GitHub] spark pull request: [SPARK-5522] Accelerate the Histroty Server st...
Github user marsishandsome commented on the pull request: https://github.com/apache/spark/pull/4525#issuecomment-76505657 @andrewor14 please check
[GitHub] spark pull request: [SPARK-6074] [sql] Package pyspark sql binding...
GitHub user vanzin opened a pull request: https://github.com/apache/spark/pull/4822 [SPARK-6074] [sql] Package pyspark sql bindings. This is needed for the SQL bindings to work on Yarn. You can merge this pull request into a Git repository by running: $ git pull https://github.com/vanzin/spark SPARK-6074 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4822.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4822 commit fb5200118d7fbf9466d3b91936e24de268051d6e Author: Marcelo Vanzin van...@cloudera.com Date: 2015-02-28T02:46:03Z [SPARK-6074] [sql] Package pyspark sql bindings. This is needed for the SQL bindings to work on Yarn.
[GitHub] spark pull request: [SPARK-6055] [PySpark] fix incorrect DataType....
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4810#issuecomment-76509238 I've merged this into `branch-1.1` (1.1.2). Thanks!
[GitHub] spark pull request: SPARK-5984: Fix TimSort bug causes ArrayOutOfB...
Github user hotou commented on the pull request: https://github.com/apache/spark/pull/4804#issuecomment-76509270 @srowen Sounds good, done.
[GitHub] spark pull request: [SPARK-5938][SQL] Generate Row from JSON strin...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/4712#issuecomment-76509265 @liancheng Description is updated. Please take a look when you have time. Thanks!
[GitHub] spark pull request: [SPARK-6074] [sql] Package pyspark sql binding...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4822#issuecomment-76509272 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28106/ Test PASSed.
[GitHub] spark pull request: [SPARK-3586][streaming]Support nested director...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2765#issuecomment-76513278 [Test build #28115 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28115/consoleFull) for PR 2765 at commit [`beaed4c`](https://github.com/apache/spark/commit/beaed4c901bca8fe91361901e5ba0cb30b8a94b5). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-6070] [yarn] Remove unneeded classes fr...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4820
[GitHub] spark pull request: [SPARK-6070] [yarn] Remove unneeded classes fr...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4820#issuecomment-76514114 Thanks Marcelo, pulling this in!
[GitHub] spark pull request: [SPARK-5979][SPARK-6032] Smaller safer --packa...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4802#issuecomment-76514538 Pulling this in - thanks Burak!
[GitHub] spark pull request: [SPARK-6078][CORE] create event log dir if not...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4829#issuecomment-76516211 If we do decide to create directories, then we should only create the last missing directory, not all missing parent directories.
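JoshRosen's suggestion can be sketched with `java.nio.file`. This is a hedged illustration, not Spark's actual event-log code, and the method name is hypothetical: the parent must already exist, and only the final path component is created, so a typo'd base path fails loudly instead of silently materializing a whole directory chain.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: create only the last missing directory. Unlike
// mkdirs()/createDirectories(), this refuses to create missing parents.
final class EventLogDir {
  static Path createLastDirOnly(Path dir) throws IOException {
    Path parent = dir.toAbsolutePath().getParent();
    if (parent == null || !Files.isDirectory(parent)) {
      throw new IOException("parent does not exist: " + parent);
    }
    // Files.createDirectory creates exactly one path component.
    return Files.exists(dir) ? dir : Files.createDirectory(dir);
  }
}
```

With this shape, `createLastDirOnly(base.resolve("eventLogs"))` succeeds when `base` exists, while a request two levels deep throws instead of creating the intermediate directory.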
[GitHub] spark pull request: [SPARK-5751] [SQL] Sets SPARK_HOME as SPARK_PI...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/4758#issuecomment-76503823 Cool, thanks.
[GitHub] spark pull request: SPARK-5984: Fix TimSort bug causes ArrayOutOfB...
Github user hotou commented on the pull request: https://github.com/apache/spark/pull/4804#issuecomment-76503891 @rxin @srowen Thanks for the review, I updated the comments and license info etc.
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/3249#issuecomment-76503896 Thank you @ravipesala for implementing this; however, this PR probably involves some unnecessary join condition transformation. You probably need to understand the rules for pushing down the join filter / condition first. Sorry, please correct me if I misunderstood something.
[GitHub] spark pull request: SPARK-5984: Fix TimSort bug causes ArrayOutOfB...
Github user hotou commented on the pull request: https://github.com/apache/spark/pull/4804#issuecomment-76504910 Ah. I guess I have to have the license header in the .java file, not just link to it
[GitHub] spark pull request: SPARK-5984: Fix TimSort bug causes ArrayOutOfB...
Github user hotou commented on a diff in the pull request: https://github.com/apache/spark/pull/4804#discussion_r25552968

--- Diff: core/src/test/java/org/apache/spark/util/collection/TestTimSort.java ---
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util.collection;
+
+import java.util.*;
+
+/**
+ * This codes generates a int array which fails the standard TimSort.
+ *
+ * The blog that reported the bug
+ * http://www.envisage-project.eu/timsort-specification-and-verification/
+ *
+ * The algorithms to reproduce the bug is obtained from the reporter of the bug
+ * https://github.com/abstools/java-timsort-bug
+ *
+ * Licensed under Apache License 2.0
+ * https://github.com/abstools/java-timsort-bug/blob/master/LICENSE
--- End diff --

Well, it's not an exact copy; I made changes to the original code. Do you guys have an IntelliJ style that I can import?
[GitHub] spark pull request: SPARK-5984: Fix TimSort bug causes ArrayOutOfB...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/4804#discussion_r25553210

--- Diff: core/src/test/java/org/apache/spark/util/collection/TestTimSort.java ---
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util.collection;
+
+import java.util.*;
+
+/**
+ * This codes generates a int array which fails the standard TimSort.
+ *
+ * The blog that reported the bug
+ * http://www.envisage-project.eu/timsort-specification-and-verification/
+ *
+ * The algorithms to reproduce the bug is obtained from the reporter of the bug
+ * https://github.com/abstools/java-timsort-bug
+ *
+ * Licensed under Apache License 2.0
+ * https://github.com/abstools/java-timsort-bug/blob/master/LICENSE
+ */
+public class TestTimSort {
+
+  private static final int MIN_MERGE = 32;
+
+  /**
+   * Returns an array of integers that demonstrate the bug in TimSort
+   */
+  public static int[] getTimSortBugTestSet(int length) {
+    int minRun = minRunLength(length);
+    List<Long> runs = runsJDKWorstCase(minRun, length);
+    return createArray(runs, length);
+  }
+
+  private static int minRunLength(int n) {
+    int r = 0; // Becomes 1 if any 1 bits are shifted off
+    while (n >= MIN_MERGE) {
+      r |= (n & 1);
+      n >>= 1;
+    }
+    return n + r;
+  }
+
+  private static int[] createArray(List<Long> runs, int length) {
+    int[] a = new int[length];
+    Arrays.fill(a, 0);
+    int endRun = -1;
+    for (long len : runs)
+      a[endRun += len] = 1;
+    a[length - 1] = 0;
+    return a;
+  }
+
+  /**
+   * Fills <code>runs</code> with a sequence of run lengths of the form<br>
+   * Y_n x_{n,1} x_{n,2} ... x_{n,l_n} <br>
+   * Y_{n-1} x_{n-1,1} x_{n-1,2} ... x_{n-1,l_{n-1}} <br>
+   * ... <br>
+   * Y_1 x_{1,1} x_{1,2} ... x_{1,l_1}<br>
+   * The Y_i's are chosen to satisfy the invariant throughout execution,
+   * but the x_{i,j}'s are merged (by <code>TimSort.mergeCollapse</code>)
+   * into an X_i that violates the invariant.
+   *
+   * @param length The sum of all run lengths that will be added to <code>runs</code>.
+   */
+  private static List<Long> runsJDKWorstCase(int minRun, int length) {
--- End diff --

Most importantly, this file doesn't contain tests that JUnit will run. Have a look at how other files declare test code with `@Test` annotations.
[GitHub] spark pull request: SPARK-5984: Fix TimSort bug causes ArrayOutOfB...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/4804#discussion_r25553242

--- Diff: core/src/test/java/org/apache/spark/util/collection/TestTimSort.java ---
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util.collection;
+
+import java.util.*;
+
+/**
+ * This codes generates a int array which fails the standard TimSort.
+ *
+ * The blog that reported the bug
+ * http://www.envisage-project.eu/timsort-specification-and-verification/
+ *
+ * The algorithms to reproduce the bug is obtained from the reporter of the bug
+ * https://github.com/abstools/java-timsort-bug
+ *
+ * Licensed under Apache License 2.0
+ * https://github.com/abstools/java-timsort-bug/blob/master/LICENSE
+ */
+public class TestTimSort {
+
+  private static final int MIN_MERGE = 32;
+
+  /**
+   * Returns an array of integers that demonstrate the bug in TimSort
+   */
+  public static int[] getTimSortBugTestSet(int length) {
+    int minRun = minRunLength(length);
+    List<Long> runs = runsJDKWorstCase(minRun, length);
+    return createArray(runs, length);
+  }
+
+  private static int minRunLength(int n) {
+    int r = 0; // Becomes 1 if any 1 bits are shifted off
+    while (n >= MIN_MERGE) {
+      r |= (n & 1);
+      n >>= 1;
+    }
+    return n + r;
+  }
+
+  private static int[] createArray(List<Long> runs, int length) {
+    int[] a = new int[length];
+    Arrays.fill(a, 0);
+    int endRun = -1;
+    for (long len : runs)
+      a[endRun += len] = 1;
+    a[length - 1] = 0;
+    return a;
+  }
+
+  /**
+   * Fills <code>runs</code> with a sequence of run lengths of the form<br>
+   * Y_n x_{n,1} x_{n,2} ... x_{n,l_n} <br>
+   * Y_{n-1} x_{n-1,1} x_{n-1,2} ... x_{n-1,l_{n-1}} <br>
+   * ... <br>
+   * Y_1 x_{1,1} x_{1,2} ... x_{1,l_1}<br>
+   * The Y_i's are chosen to satisfy the invariant throughout execution,
+   * but the x_{i,j}'s are merged (by <code>TimSort.mergeCollapse</code>)
+   * into an X_i that violates the invariant.
+   *
+   * @param length The sum of all run lengths that will be added to <code>runs</code>.
+   */
+  private static List<Long> runsJDKWorstCase(int minRun, int length) {
--- End diff --

Sean - I think this is used in the scalatest.
[GitHub] spark pull request: [SPARK-6074] [sql] Package pyspark sql binding...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/4822#issuecomment-76506782

$ jar tf sql/core/target/spark-sql_2.10-1.3.0-SNAPSHOT.jar | grep pyspark
pyspark/
pyspark/sql/
pyspark/sql/functions.py
pyspark/sql/__init__.py
pyspark/sql/tests.py
pyspark/sql/types.py
pyspark/sql/dataframe.py
pyspark/sql/context.py
[GitHub] spark pull request: SPARK-5984: Fix TimSort bug causes ArrayOutOfB...
Github user hotou commented on a diff in the pull request: https://github.com/apache/spark/pull/4804#discussion_r25553389

--- Diff: core/src/test/java/org/apache/spark/util/collection/TestTimSort.java ---
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util.collection;
+
+import java.util.*;
+
+/**
+ * This codes generates a int array which fails the standard TimSort.
+ *
+ * The blog that reported the bug
+ * http://www.envisage-project.eu/timsort-specification-and-verification/
+ *
+ * The algorithms to reproduce the bug is obtained from the reporter of the bug
+ * https://github.com/abstools/java-timsort-bug
+ *
+ * Licensed under Apache License 2.0
+ * https://github.com/abstools/java-timsort-bug/blob/master/LICENSE
+ */
+public class TestTimSort {
+
+  private static final int MIN_MERGE = 32;
+
+  /**
+   * Returns an array of integers that demonstrate the bug in TimSort
+   */
+  public static int[] getTimSortBugTestSet(int length) {
+    int minRun = minRunLength(length);
+    List<Long> runs = runsJDKWorstCase(minRun, length);
+    return createArray(runs, length);
+  }
+
+  private static int minRunLength(int n) {
+    int r = 0; // Becomes 1 if any 1 bits are shifted off
+    while (n >= MIN_MERGE) {
+      r |= (n & 1);
+      n >>= 1;
+    }
+    return n + r;
+  }
+
+  private static int[] createArray(List<Long> runs, int length) {
+    int[] a = new int[length];
+    Arrays.fill(a, 0);
+    int endRun = -1;
+    for (long len : runs)
+      a[endRun += len] = 1;
+    a[length - 1] = 0;
+    return a;
+  }
+
+  /**
+   * Fills <code>runs</code> with a sequence of run lengths of the form<br>
+   * Y_n x_{n,1} x_{n,2} ... x_{n,l_n} <br>
+   * Y_{n-1} x_{n-1,1} x_{n-1,2} ... x_{n-1,l_{n-1}} <br>
+   * ... <br>
+   * Y_1 x_{1,1} x_{1,2} ... x_{1,l_1}<br>
+   * The Y_i's are chosen to satisfy the invariant throughout execution,
+   * but the x_{i,j}'s are merged (by <code>TimSort.mergeCollapse</code>)
+   * into an X_i that violates the invariant.
+   *
+   * @param length The sum of all run lengths that will be added to <code>runs</code>.
+   */
+  private static List<Long> runsJDKWorstCase(int minRun, int length) {
+    List<Long> runs = new ArrayList<Long>();
+
+    long runningTotal = 0, Y = minRun + 4, X = minRun;
+
+    while (runningTotal + Y + X <= length) {
+      runningTotal += X + Y;
+      generateJDKWrongElem(runs, minRun, X);
+      runs.add(0, Y);
+      // X_{i+1} = Y_i + x_{i,1} + 1, since runs.get(1) = x_{i,1}
+      X = Y + runs.get(1) + 1;
+      // Y_{i+1} = X_{i+1} + Y_i + 1
+      Y += X + 1;
+    }
+
+    if (runningTotal + X <= length) {
+      runningTotal += X;
+      generateJDKWrongElem(runs, minRun, X);
+    }
+
+    runs.add(length - runningTotal);
+    return runs;
--- End diff --

In SorterSuite I added a test that uses TestTimSort.java. Yes, TestTimSort just generates an int[], but the array has to be at least 67108864 long, so I thought just posting a huge int[] would not be as useful as knowing how the array is generated. The original code was written to demonstrate the bug, so it had a main() and some other stuff; I got rid of those. I am fine with fixing the license here, if you guys bear with me a bit. I am not that experienced with open source licenses.
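The `minRunLength` helper quoted in the diffs above is small enough to run on its own. The copy below is self-contained and shows the run lengths TimSort's computation produces for a few input sizes, including the 67108864-element (2^26) case hotou mentions.

```java
import java.util.*;

// Self-contained copy of TestTimSort.minRunLength: compute TimSort's minimum
// run length for an array of n elements (runs shorter than this get extended
// by binary insertion sort before merging).
final class MinRun {
  private static final int MIN_MERGE = 32;

  static int minRunLength(int n) {
    int r = 0; // becomes 1 if any 1 bits are shifted off
    while (n >= MIN_MERGE) {
      r |= (n & 1); // remember whether anything was truncated
      n >>= 1;      // halve until n drops below MIN_MERGE
    }
    return n + r;
  }
}
```

For lengths below `MIN_MERGE` the input is returned unchanged; a power of two such as 2^26 halves cleanly down to 16, while any truncated bit bumps the result by one.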
[GitHub] spark pull request: SPARK-1965 [WEBUI] Spark UI throws NPE on tryi...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4777#issuecomment-76508562 OK will make that change, and if there are no more objections, will go ahead with this change to patch up this case.
[GitHub] spark pull request: SPARK-5984: Fix TimSort bug causes ArrayOutOfB...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4804#issuecomment-76508900 OK, I apologize for belaboring this and for the hassle with the incorrect suggestion earlier. But I think we may have to do one more thing for the licensing to get it right, and we should. I believe we need an entry in our own `LICENSE` file after all, given the situation. You can see one for the copied TimSort. I'd just add it below that. Then it really does look good from a license perspective, AFAIK.
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/3249#discussion_r25552219

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -422,6 +424,108 @@ class Analyzer(catalog: Catalog,
         Generate(g, join = false, outer = false, None, child)
     }
   }
+
+  /**
+   * Transforms a query that has subquery expressions in its WHERE clause into a left semi join.
+   * For example, `select T1.x from T1 where T1.x in (select T2.y from T2)` is transformed to
+   * `select T1.x from T1 left semi join T2 on T1.x = T2.y`.
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+      case p: LogicalPlan if !p.childrenResolved => p
+      case filter @ Filter(conditions, child) =>
+        val subqueryExprs = conditions.collect {
+          case In(exp, Seq(SubqueryExpression(subquery))) => (exp, subquery)
+        }
+        // Replace subqueries with a dummy true literal since they are evaluated separately now.
+        val transformedConds = conditions.transform {
+          case In(_, Seq(SubqueryExpression(_))) => Literal(true)
+        }
+        subqueryExprs match {
+          case Seq() => filter // No subqueries.
+          case Seq((exp, subquery)) =>
+            createLeftSemiJoin(
+              child,
+              exp,
+              subquery,
+              transformedConds)
+          case _ =>
+            throw new TreeNodeException(filter, "Only one SubQuery expression is supported.")
+        }
+    }
+
+    /**
+     * Create a LeftSemi join between the parent query and the subquery mentioned in the 'IN'
+     * predicate, and combine the subquery conditions with the parent query conditions.
+     */
+    def createLeftSemiJoin(
+        left: LogicalPlan,
+        value: Expression,
+        subquery: LogicalPlan,
+        parentConds: Expression): LogicalPlan = {
+      val (transformedPlan, subqueryConds) = transformAndGetConditions(value, subquery)
+      // Add both parent query conditions and subquery conditions as join conditions
+      val allPredicates = And(parentConds, subqueryConds)
+      Join(left, transformedPlan, LeftSemi, Some(allPredicates))
+    }
+
+    /**
+     * Transform the subquery LogicalPlan, adding the expressions that are used as filters to the
+     * projection, and also return the filter conditions used in the subquery.
+     */
+    def transformAndGetConditions(
+        value: Expression,
+        subquery: LogicalPlan): (LogicalPlan, Expression) = {
+      val expr = new scala.collection.mutable.ArrayBuffer[Expression]()
+      // TODO: we only decorrelate subqueries in very specific cases like the ones mentioned in
+      // the documentation above. More complex queries, such as subqueries nested inside
+      // subqueries, can be supported in the future.
+      val transformedPlan = subquery transform {
+        case project @ Project(projectList, f @ Filter(condition, child)) =>
+          // Don't support more than one item in the select list of the subquery
+          if (projectList.size > 1) {
+            throw new TreeNodeException(
+              project,
+              "SubQuery can contain only one item in Select List")
+          }
+          val resolvedChild = ResolveRelations(child)
+          // Add the expressions which are used as filters in the subquery to the projections
+          val toBeAddedExprs = f.references.filter { a =>
+            resolvedChild.resolve(a.name, resolver) != None && !project.outputSet.contains(a)
+          }
+          val nameToExprMap = collection.mutable.Map[String, Alias]()
+          // Create aliases for all projection expressions.
+          val withAliases = (projectList ++ toBeAddedExprs).zipWithIndex.map {
+            case (exp, index) =>
+              nameToExprMap.put(exp.name, Alias(exp, s"sqc$index")())
+              Alias(exp, s"sqc$index")()
+          }
+          // Replace the condition column names with alias names.
+          val transformedConds = condition.transform {
--- End diff --

I am not sure why you care about the subquery condition, since, as the Hive wiki says:
```
As of Hive 0.13 some types of subqueries are supported in the WHERE clause. Those are queries where the result of the query can be treated as a constant for IN and NOT IN statements (called uncorrelated subqueries because the subquery does not reference columns from the parent query):
```
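The rewrite discussed above hinges on the equivalence between an uncorrelated IN subquery and a left semi join: a T1 row survives `WHERE T1.x IN (SELECT T2.y FROM T2)` exactly when it has at least one match on the right, and it is emitted at most once regardless of how many matches exist. A minimal plain-Java sketch of that equivalence over in-memory lists (table contents and names are made up for illustration; this is not Catalyst code):

```java
import java.util.*;
import java.util.stream.*;

public class SemiJoinSketch {
    // "WHERE T1.x IN (SELECT T2.y FROM T2)": evaluate the uncorrelated subquery once
    // and keep each T1 row whose x appears in the result set.
    public static List<Integer> inSubquery(List<Integer> t1x, List<Integer> t2y) {
        Set<Integer> sub = new HashSet<>(t2y);
        return t1x.stream().filter(sub::contains).collect(Collectors.toList());
    }

    // Left semi join of T1 against T2 on T1.x = T2.y: each left row is emitted
    // at most once, no matter how many right rows match it.
    public static List<Integer> leftSemiJoin(List<Integer> t1x, List<Integer> t2y) {
        List<Integer> out = new ArrayList<>();
        for (int x : t1x) {
            for (int y : t2y) {
                if (x == y) { out.add(x); break; }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> t1 = Arrays.asList(1, 2, 3, 4);
        List<Integer> t2 = Arrays.asList(2, 4, 4, 6);  // note the duplicate 4
        System.out.println(inSubquery(t1, t2));    // [2, 4]
        System.out.println(leftSemiJoin(t1, t2));  // [2, 4] -- duplicate right rows don't duplicate output
    }
}
```

The duplicate `4` on the right is the point: an inner join would emit `4` twice, while both the IN predicate and the semi join emit it once, which is why the rule can rewrite one into the other.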
[GitHub] spark pull request: [SPARK-6066] Make event log format easier to p...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4821#discussion_r25552104

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -217,53 +219,60 @@ private[spark] object EventLoggingListener extends Logging {
   /**
    * Write metadata about the event log to the given stream.
    *
-   * The header is a serialized version of a map, except it does not use Java serialization to
-   * avoid incompatibilities between different JDKs. It writes one map entry per line, in
-   * key=value format.
-   *
-   * The very last entry in the header is the `HEADER_END_MARKER` marker, so that the parsing code
-   * can know when to stop.
+   * The header is a single line of JSON in the beginning of the file. Note that this
+   * assumes all metadata necessary to parse the log is also included in the file name.
+   * The format needs to be kept in sync with the `openEventLog()` method below. Also, it
+   * cannot change in new Spark versions without some other way of detecting the change.
    *
-   * The format needs to be kept in sync with the openEventLog() method below. Also, it cannot
-   * change in new Spark versions without some other way of detecting the change (like some
-   * metadata encoded in the file name).
-   *
-   * @param logStream Raw output stream to the even log file.
+   * @param logStream Raw output stream to the event log file.
    * @param compressionCodec Optional compression codec to use.
-   * @return A stream where to write event log data. This may be a wrapper around the original
+   * @return A stream to which event log data is written. This may be a wrapper around the original
    *         stream (for example, when compression is enabled).
    */
  def initEventLog(
      logStream: OutputStream,
      compressionCodec: Option[CompressionCodec]): OutputStream = {
-    val meta = mutable.HashMap(("version" -> SPARK_VERSION))
+    val metadata = new mutable.HashMap[String, String]
+    // Some of these metadata are already encoded in the file name
+    // Here we include them again within the file itself for completeness
+    metadata += ("Event" -> Utils.getFormattedClassName(SparkListenerMetadataIdentifier))
+    metadata += (SPARK_VERSION_KEY -> SPARK_VERSION)
    compressionCodec.foreach { codec =>
-      meta += ("compressionCodec" -> codec.getClass().getName())
+      metadata += (COMPRESSION_CODEC_KEY -> codec.getClass.getCanonicalName)
    }
-
-    def write(entry: String) = {
-      val bytes = entry.getBytes(Charsets.UTF_8)
-      if (bytes.length > MAX_HEADER_LINE_LENGTH) {
-        throw new IOException(s"Header entry too long: ${entry}")
-      }
-      logStream.write(bytes, 0, bytes.length)
+    val metadataJson = compact(render(JsonProtocol.mapToJson(metadata)))
+    val metadataBytes = (metadataJson + "\n").getBytes(Charsets.UTF_8)
+    if (metadataBytes.length > MAX_HEADER_LINE_LENGTH) {
+      throw new IOException(s"Event log metadata too long: $metadataJson")
    }
-
-    meta.foreach { case (k, v) => write(s"$k=$v\n") }
-    write(s"$HEADER_END_MARKER\n")
-    compressionCodec.map(_.compressedOutputStream(logStream)).getOrElse(logStream)
+    logStream.write(metadataBytes, 0, metadataBytes.length)
+    logStream
  }

  /**
   * Return a file-system-safe path to the log file for the given application.
   *
+   * Note that because we currently only create a single log file for each application,
+   * we must encode all the information needed to parse this event log in the file name
+   * instead of within the file itself. Otherwise, if the file is compressed, for instance,
+   * we won't know which codec to use to decompress the metadata.
+   *
   * @param logBaseDir Directory where the log file will be written.
   * @param appId A unique app ID.
+   * @param compressionCodecName Name of the compression codec used to compress the contents
+   *                             of the log, or None if compression is not enabled.
   * @return A path which consists of file-system-safe characters.
   */
-  def getLogPath(logBaseDir: String, appId: String): String = {
-    val name = appId.replaceAll("[ :/]", "-").replaceAll("[${}'\"]", "_").toLowerCase
-    Utils.resolveURI(logBaseDir) + "/" + name.stripSuffix("/")
+  def getLogPath(
+      logBaseDir: String,
+      appId: String,
+      compressionCodecName: Option[String]): String = {
+    val sanitizedAppId = appId.replaceAll("[ :/]", "-").replaceAll("[${}'\"]", "_").toLowerCase
+    // e.g. EVENT_LOG_app_123_SPARK_VERSION_1.3.1
+    // e.g. EVENT_LOG_ {...}
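The key design point in this hunk is that the metadata becomes a single uncompressed line at the top of the file, so a reader can recover it (for example, which codec compresses the rest) before touching anything after the first newline. A small standalone sketch of that read/write pattern, not Spark's actual implementation (the class name and size limit here are invented for illustration):

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

public class EventLogHeaderSketch {
    static final int MAX_HEADER_LINE_LENGTH = 4096;  // illustrative limit, not Spark's constant

    // Write the metadata as a single newline-terminated line at the top of the stream.
    // The (possibly compressed) event body is appended by the caller afterwards.
    public static void writeHeader(OutputStream out, String metadataJson) throws IOException {
        byte[] bytes = (metadataJson + "\n").getBytes(StandardCharsets.UTF_8);
        if (bytes.length > MAX_HEADER_LINE_LENGTH) {
            throw new IOException("Event log metadata too long: " + metadataJson);
        }
        out.write(bytes);
    }

    // A reader only consumes bytes up to the first '\n' to recover the metadata,
    // without decompressing or even reading anything after it.
    public static String readHeader(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            line.write(b);
        }
        return line.toString("UTF-8");
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream log = new ByteArrayOutputStream();
        writeHeader(log, "{\"SPARK_VERSION\":\"1.3.0\"}");
        log.write("event body...".getBytes(StandardCharsets.UTF_8));  // stands in for the compressed body
        String header = readHeader(new ByteArrayInputStream(log.toByteArray()));
        System.out.println(header);  // {"SPARK_VERSION":"1.3.0"}
    }
}
```

This is why the PR can drop the old `HEADER_END_MARKER` protocol: "read until the first newline" is the entire framing.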
[GitHub] spark pull request: SPARK-5984: Fix TimSort bug causes ArrayOutOfB...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4804#issuecomment-76504745 [Test build #28105 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28105/consoleFull) for PR 4804 at commit [`4d95f75`](https://github.com/apache/spark/commit/4d95f75d3bdf09bad3d8a4a32d5c2ee7486a8a23). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-5984: Fix TimSort bug causes ArrayOutOfB...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/4804#discussion_r25552927

--- Diff: core/src/test/java/org/apache/spark/util/collection/TestTimSort.java ---
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util.collection;
+
+import java.util.*;
+
+/**
+ * This code generates an int array which fails the standard TimSort.
+ *
+ * The blog that reported the bug:
+ * http://www.envisage-project.eu/timsort-specification-and-verification/
+ *
+ * The algorithm to reproduce the bug was obtained from the reporter of the bug:
+ * https://github.com/abstools/java-timsort-bug
+ *
+ * Licensed under Apache License 2.0
+ * https://github.com/abstools/java-timsort-bug/blob/master/LICENSE
--- End diff --

If this test code is your own work, then this statement is redundant with the license header, and it would be removed. But it's copied from the project above, right? Then you can't write a license header here that says it was licensed to the ASF. If anything, we would reproduce the plain vanilla AL2 stanza from the plain AL2 license text in this file's license header. That, or not copy this test code.

This Java code also needs a bit more style work to match coding practices here.
[GitHub] spark pull request: SPARK-5984: Fix TimSort bug causes ArrayOutOfB...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/4804#discussion_r25553306

--- Diff: core/src/test/java/org/apache/spark/util/collection/TestTimSort.java ---
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util.collection;
+
+import java.util.*;
+
+/**
+ * This code generates an int array which fails the standard TimSort.
+ *
+ * The blog that reported the bug:
+ * http://www.envisage-project.eu/timsort-specification-and-verification/
+ *
+ * The algorithm to reproduce the bug was obtained from the reporter of the bug:
+ * https://github.com/abstools/java-timsort-bug
+ *
+ * Licensed under Apache License 2.0
+ * https://github.com/abstools/java-timsort-bug/blob/master/LICENSE
+ */
+public class TestTimSort {
+
+  private static final int MIN_MERGE = 32;
+
+  /**
+   * Returns an array of integers that demonstrate the bug in TimSort
+   */
+  public static int[] getTimSortBugTestSet(int length) {
+    int minRun = minRunLength(length);
+    List<Long> runs = runsJDKWorstCase(minRun, length);
+    return createArray(runs, length);
+  }
+
+  private static int minRunLength(int n) {
+    int r = 0; // Becomes 1 if any 1 bits are shifted off
+    while (n >= MIN_MERGE) {
+      r |= (n & 1);
+      n >>= 1;
+    }
+    return n + r;
+  }
+
+  private static int[] createArray(List<Long> runs, int length) {
+    int[] a = new int[length];
+    Arrays.fill(a, 0);
+    int endRun = -1;
+    for (long len : runs)
+      a[endRun += len] = 1;
+    a[length - 1] = 0;
+    return a;
+  }
+
+  /**
+   * Fills <code>runs</code> with a sequence of run lengths of the form<br>
+   * Y_n x_{n,1} x_{n,2} ... x_{n,l_n} <br>
+   * Y_{n-1} x_{n-1,1} x_{n-1,2} ... x_{n-1,l_{n-1}} <br>
+   * ... <br>
+   * Y_1 x_{1,1} x_{1,2} ... x_{1,l_1}<br>
+   * The Y_i's are chosen to satisfy the invariant throughout execution,
+   * but the x_{i,j}'s are merged (by <code>TimSort.mergeCollapse</code>)
+   * into an X_i that violates the invariant.
+   *
+   * @param length The sum of all run lengths that will be added to <code>runs</code>.
+   */
+  private static List<Long> runsJDKWorstCase(int minRun, int length) {
--- End diff --

Oh! I can't believe I missed the file at the end here. OK, I see that the test case that gets generated is really big, not something you can paste into the source.

Hm, OK. Well, I suggest fixing the license situation here at a minimum.
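For reference, the `minRunLength` helper quoted in the diff lost its operators in the email archive; below is a self-contained sketch with the operators restored from the standard JDK TimSort logic: take the six most significant bits of `n`, adding one if any of the shifted-off bits are set (sample outputs computed by hand):

```java
public class MinRunSketch {
    private static final int MIN_MERGE = 32;

    // TimSort's minimum run length. For n < MIN_MERGE the whole array is one run;
    // otherwise keep the top bits of n and round up if any low bit was 1.
    public static int minRunLength(int n) {
        int r = 0;  // becomes 1 if any 1 bits are shifted off
        while (n >= MIN_MERGE) {
            r |= (n & 1);
            n >>= 1;
        }
        return n + r;
    }

    public static void main(String[] args) {
        System.out.println(minRunLength(16));     // 16 (already below MIN_MERGE)
        System.out.println(minRunLength(100));    // 25
        System.out.println(minRunLength(10000));  // 20
    }
}
```

The worst-case generator in the PR needs this value because the run lengths it fabricates must all be at least `minRunLength(length)` for TimSort to accept them as natural runs.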
[GitHub] spark pull request: [SPARK-6066] Make event log format easier to p...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4821#issuecomment-76507612 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28103/ Test PASSed.
[GitHub] spark pull request: [SPARK-6066] Make event log format easier to p...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4821#issuecomment-76507606 [Test build #28103 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28103/consoleFull) for PR 4821 at commit [`519e51a`](https://github.com/apache/spark/commit/519e51a958b40d193327e85b659e1df767041f55). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-6055] [PySpark] fix incorrect DataType....
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4809#issuecomment-76509176 I've merged this into `branch-1.2` (1.2.2). Thanks!
[GitHub] spark pull request: [SPARK-4411][UI]Add kill link for jobs in the ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4823#issuecomment-76510590 [Test build #28107 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28107/consoleFull) for PR 4823 at commit [`af461cc`](https://github.com/apache/spark/commit/af461ccce44e2792ea9356ccc2db6c84609511a0). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4411][UI]Add kill link for jobs in the ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4823#issuecomment-76510593 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28107/ Test PASSed.
[GitHub] spark pull request: SPARK-1965 [WEBUI] Spark UI throws NPE on tryi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4777#issuecomment-76510899 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28108/ Test PASSed.
[GitHub] spark pull request: SPARK-1965 [WEBUI] Spark UI throws NPE on tryi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4777#issuecomment-76510896 [Test build #28108 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28108/consoleFull) for PR 4777 at commit [`7e16590`](https://github.com/apache/spark/commit/7e1659074451b03d6b4626aff382ec6ba2f53289). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5950][SQL]Insert array into a metastore...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4826#issuecomment-76513062 [Test build #28113 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28113/consoleFull) for PR 4826 at commit [`e4f397c`](https://github.com/apache/spark/commit/e4f397cea7ec0dc21a714b75a7254bb275319fc2). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4411][UI]Add kill link for jobs in the ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4823#issuecomment-76514247 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28112/ Test PASSed.
[GitHub] spark pull request: [SPARK-4411][UI]Add kill link for jobs in the ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4823#issuecomment-76514243 [Test build #28112 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28112/consoleFull) for PR 4823 at commit [`7f52874`](https://github.com/apache/spark/commit/7f52874badfea314d019b0dc9097c54b8af2f654). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-6078][CORE] create event log dir if not...
GitHub user liyezhang556520 opened a pull request: https://github.com/apache/spark/pull/4829 [SPARK-6078][CORE] create event log dir if not exists

When the event log directory does not exist, Spark just throws an IllegalArgumentException and stops the job, so users need to manually create the directory first. It would be better to create the directory automatically when it does not exist.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/liyezhang556520/spark creatLogDir Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4829.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4829

commit 0de4f18d2b306d2f73d3b6958b400a56c6154de1 Author: Zhang, Liye liye.zh...@intel.com Date: 2015-02-28T06:13:38Z create eventlog dir if eventlog dir does not exists

commit e76a1b46383a831f9c6a0daccf1d89934cbbefd2 Author: Zhang, Liye liye.zh...@intel.com Date: 2015-02-28T06:44:00Z throw exception when there is same file name
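The behavior this PR describes, creating the directory when it is missing but refusing to proceed when a plain file already occupies the name (per the second commit message), can be sketched in a few lines. This is an illustrative standalone sketch, not the PR's actual Scala code; the method and class names are invented:

```java
import java.io.IOException;
import java.nio.file.*;

public class EnsureLogDirSketch {
    // Create the event log directory if it does not exist, instead of failing the job.
    // If the path exists but is a regular file, fail loudly rather than log into it.
    public static Path ensureLogDir(Path dir) throws IOException {
        if (Files.exists(dir) && !Files.isDirectory(dir)) {
            throw new IllegalArgumentException("Event log path exists but is a file: " + dir);
        }
        // createDirectories is a no-op when the directory already exists,
        // and also creates any missing parent directories.
        return Files.createDirectories(dir);
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("spark-events");
        Path logDir = ensureLogDir(base.resolve("eventLogs"));
        System.out.println(Files.isDirectory(logDir));  // true
    }
}
```

Using `createDirectories` (rather than `createDirectory`) keeps the call idempotent, so concurrent or repeated application starts do not race on "check then create".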
[GitHub] spark pull request: [SPARK-5950][SQL]Insert array into a metastore...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4826#issuecomment-76514620 [Test build #28119 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28119/consoleFull) for PR 4826 at commit [`0eb5578`](https://github.com/apache/spark/commit/0eb5578f8fc81c9c2186ffe7ba4b538f7a40f828). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-6066] Make event log format easier to p...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/4821#discussion_r25552366

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -217,53 +219,60 @@ private[spark] object EventLoggingListener extends Logging {
   /**
    * Write metadata about the event log to the given stream.
    *
-   * The header is a serialized version of a map, except it does not use Java serialization to
-   * avoid incompatibilities between different JDKs. It writes one map entry per line, in
-   * key=value format.
-   *
-   * The very last entry in the header is the `HEADER_END_MARKER` marker, so that the parsing code
-   * can know when to stop.
+   * The header is a single line of JSON in the beginning of the file. Note that this
+   * assumes all metadata necessary to parse the log is also included in the file name.
+   * The format needs to be kept in sync with the `openEventLog()` method below. Also, it
+   * cannot change in new Spark versions without some other way of detecting the change.
    *
-   * The format needs to be kept in sync with the openEventLog() method below. Also, it cannot
-   * change in new Spark versions without some other way of detecting the change (like some
-   * metadata encoded in the file name).
-   *
-   * @param logStream Raw output stream to the even log file.
+   * @param logStream Raw output stream to the event log file.
    * @param compressionCodec Optional compression codec to use.
-   * @return A stream where to write event log data. This may be a wrapper around the original
+   * @return A stream to which event log data is written. This may be a wrapper around the original
    *         stream (for example, when compression is enabled).
    */
  def initEventLog(
      logStream: OutputStream,
      compressionCodec: Option[CompressionCodec]): OutputStream = {
-    val meta = mutable.HashMap(("version" -> SPARK_VERSION))
+    val metadata = new mutable.HashMap[String, String]
+    // Some of these metadata are already encoded in the file name
+    // Here we include them again within the file itself for completeness
+    metadata += ("Event" -> Utils.getFormattedClassName(SparkListenerMetadataIdentifier))
+    metadata += (SPARK_VERSION_KEY -> SPARK_VERSION)
    compressionCodec.foreach { codec =>
-      meta += ("compressionCodec" -> codec.getClass().getName())
+      metadata += (COMPRESSION_CODEC_KEY -> codec.getClass.getCanonicalName)
    }
-
-    def write(entry: String) = {
-      val bytes = entry.getBytes(Charsets.UTF_8)
-      if (bytes.length > MAX_HEADER_LINE_LENGTH) {
-        throw new IOException(s"Header entry too long: ${entry}")
-      }
-      logStream.write(bytes, 0, bytes.length)
+    val metadataJson = compact(render(JsonProtocol.mapToJson(metadata)))
+    val metadataBytes = (metadataJson + "\n").getBytes(Charsets.UTF_8)
+    if (metadataBytes.length > MAX_HEADER_LINE_LENGTH) {
+      throw new IOException(s"Event log metadata too long: $metadataJson")
    }
-
-    meta.foreach { case (k, v) => write(s"$k=$v\n") }
-    write(s"$HEADER_END_MARKER\n")
-    compressionCodec.map(_.compressedOutputStream(logStream)).getOrElse(logStream)
+    logStream.write(metadataBytes, 0, metadataBytes.length)
+    logStream
  }

  /**
   * Return a file-system-safe path to the log file for the given application.
   *
+   * Note that because we currently only create a single log file for each application,
+   * we must encode all the information needed to parse this event log in the file name
+   * instead of within the file itself. Otherwise, if the file is compressed, for instance,
+   * we won't know which codec to use to decompress the metadata.
+   *
   * @param logBaseDir Directory where the log file will be written.
   * @param appId A unique app ID.
+   * @param compressionCodecName Name of the compression codec used to compress the contents
+   *                             of the log, or None if compression is not enabled.
   * @return A path which consists of file-system-safe characters.
   */
-  def getLogPath(logBaseDir: String, appId: String): String = {
-    val name = appId.replaceAll("[ :/]", "-").replaceAll("[${}'\"]", "_").toLowerCase
-    Utils.resolveURI(logBaseDir) + "/" + name.stripSuffix("/")
+  def getLogPath(
+      logBaseDir: String,
+      appId: String,
+      compressionCodecName: Option[String]): String = {
+    val sanitizedAppId = appId.replaceAll("[ :/]", "-").replaceAll("[${}'\"]", "_").toLowerCase
+    // e.g. EVENT_LOG_app_123_SPARK_VERSION_1.3.1
+    // e.g. EVENT_LOG_ {...}
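As an aside, the app-ID sanitization in `getLogPath` can be exercised on its own: hostile path and URI characters are mapped to `-` or `_` before the ID is used as a file name. The regexes below are a best-effort reconstruction for illustration, since the quoted character classes lost their quoting in the email archive:

```java
public class AppIdSanitizeSketch {
    // Make an application ID file-system safe: path separators, spaces, and colons
    // become '-', shell-sensitive characters become '_', and the result is lowercased.
    // The exact character classes are assumptions, not necessarily Spark's.
    public static String sanitize(String appId) {
        return appId.replaceAll("[ :/]", "-")
                    .replaceAll("[${}'\"]", "_")
                    .toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(sanitize("My App:2015/02/28"));  // my-app-2015-02-28
        System.out.println(sanitize("app$1{x}"));           // app_1_x_
    }
}
```

The design constraint driving this (per the new doc comment above) is that the file name must be parseable on its own, so it has to survive any file system and shell untouched.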
[GitHub] spark pull request: [SPARK-6048] SparkConf should not translate de...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4799#issuecomment-76504481 [Test build #28104 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28104/consoleFull) for PR 4799 at commit [`c26a9e3`](https://github.com/apache/spark/commit/c26a9e3c3f6a17ae01782278fb1d4a1426fcbdbd). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-5984: Fix TimSort bug causes ArrayOutOfB...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4804#issuecomment-76504579 Jenkins, test this please.
[GitHub] spark pull request: SPARK-5984: Fix TimSort bug causes ArrayOutOfB...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/4804#discussion_r25553196 --- Diff: core/src/test/java/org/apache/spark/util/collection/TestTimSort.java --- @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.util.collection; + +import java.util.*; + +/** + * This codes generates a int array which fails the standard TimSort. + * + * The blog that reported the bug + * http://www.envisage-project.eu/timsort-specification-and-verification/ + * + * The algorithms to reproduce the bug is obtained from the reporter of the bug + * https://github.com/abstools/java-timsort-bug + * + * Licensed under Apache License 2.0 + * https://github.com/abstools/java-timsort-bug/blob/master/LICENSE --- End diff -- It looks like it's almost entirely the code from the third party site. The right-est thing to do is actually begin this file with the standard AL2 stanza: ``` Copyright 2015 [the author's name] Licensed under the Apache License, Version 2.0 (the License); you may not use this file except in compliance with the License. 
You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. ``` ... since that is substantially the license of the work in the file. I believe the build's check for this stuff will accept this, or should. It can be followed with a comment that the work has been modified from its original form. I don't think it's crazy to omit this either, though it's always nice to have tests. There's no standard IJ config but I'll point out some things that could be touched up.
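As a sketch, the combined header described above (the AL2 stanza followed by a note that the work has been modified; the copyright holder is a placeholder, not taken from this thread) might look like:

```java
/*
 * Copyright 2015 [the author's name]
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
// This file is derived from https://github.com/abstools/java-timsort-bug
// and has been modified from its original form.
```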
[GitHub] spark pull request: SPARK-5984: Fix TimSort bug causes ArrayOutOfB...
Github user hotou commented on the pull request: https://github.com/apache/spark/pull/4804#issuecomment-76508685 @srowen I did what you recommended here. This passed the rat test on my machine, at least.
[GitHub] spark pull request: SPARK-1965 [WEBUI] Spark UI throws NPE on tryi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4777#issuecomment-76508662 [Test build #28108 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28108/consoleFull) for PR 4777 at commit [`7e16590`](https://github.com/apache/spark/commit/7e1659074451b03d6b4626aff382ec6ba2f53289). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-6025] [MLlib] Add helper method to effi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4819#issuecomment-76511907 [Test build #28110 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28110/consoleFull) for PR 4819 at commit [`7d4ed48`](https://github.com/apache/spark/commit/7d4ed483e0a0c58669ab00421d00eecda832cfba). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-6025] [MLlib] Add helper method to effi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4819#issuecomment-76511909 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28110/ Test PASSed.
[GitHub] spark pull request: [SPARK-3586][streaming]Support nested director...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2765#issuecomment-76512835 [Test build #28114 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28114/consoleFull) for PR 2765 at commit [`348657e`](https://github.com/apache/spark/commit/348657e2069c3732d2a43bbc6ddb873eec7a3a48). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5946][Streaming] Add Python API for dir...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4723#issuecomment-76512903 [Test build #28111 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28111/consoleFull) for PR 4723 at commit [`1b6e873`](https://github.com/apache/spark/commit/1b6e873602785c5e5c78ee23d77725d2c51129fc). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class OffsetRange(object):`
[GitHub] spark pull request: [SPARK-5946][Streaming] Add Python API for dir...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4723#issuecomment-76512906 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28111/ Test PASSed.
[GitHub] spark pull request: [SPARK-5775] [SQL] BugFix: GenericRow cannot b...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4792#issuecomment-76513950 [Test build #28117 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28117/consoleFull) for PR 4792 at commit [`538f506`](https://github.com/apache/spark/commit/538f506851d7e2eba6a20d0ad4a5909486bf8516). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5775] [SQL] BugFix: GenericRow cannot b...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/4792#issuecomment-76514046 @yhuai Thanks for the review! I've addressed the comments. Will merge this to master and branch-1.3 after Jenkins approves.
[GitHub] spark pull request: [SPARK-6077] update listener for the existing ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4828#issuecomment-76514483 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-5979][SPARK-6032] Smaller safer --packa...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4802
[GitHub] spark pull request: [SPARK-5979][SPARK-6031][SPARK-6032][SPARK-604...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4754#issuecomment-76514573 @brkyvz let's close this issue for now and keep it in our back pocket. We can use it if we decide to put this in the 1.3 branch down the line.
[GitHub] spark pull request: [GraphX] initialmessage for pagerank should be...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1128
[GitHub] spark pull request: [SPARK-3629][Doc] improve spark on yarn doc
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2813
[GitHub] spark pull request: [SPARK-6079 ] Use index to speed up StatusTrac...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4830#issuecomment-76515751 [Test build #28122 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28122/consoleFull) for PR 4830 at commit [`2c49614`](https://github.com/apache/spark/commit/2c49614cc4f92dc1a47044be362db51cfe4da77b). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-6066] Make event log format easier to p...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4821#issuecomment-76503962 [Test build #28103 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28103/consoleFull) for PR 4821 at commit [`519e51a`](https://github.com/apache/spark/commit/519e51a958b40d193327e85b659e1df767041f55). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-5984: Fix TimSort bug causes ArrayOutOfB...
Github user hotou commented on the pull request: https://github.com/apache/spark/pull/4804#issuecomment-76505114 ok
[GitHub] spark pull request: [SPARK-6029] Stop excluding fastutil package
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4780#issuecomment-76507231 So, marking CA as provided but putting it in your assembly is contradictory, but in the end, the right thing is to include CA (and its dependencies) in your assembly, yes. I was asking whether you mark Spark as provided; it should be. You're effectively shading CA (and not fastutil); you should also be able to achieve that through your build rather than bother with source, though I don't know how that works in SBT. (`minimizeJar` is a function of Maven's shading plugin.) fastutil-in-Spark isn't the issue per se, since indeed Spark doesn't have it! What it does have is CA. Your result seems to confirm that the problem is really CA, in the sense that your app finds Spark's loaded copy of CA classes, but that classloader can't see your classloader, which also has CA, but also the fastutil it needs. Shading CA disambiguates this. (So, put that workaround in your pocket for now; you should be able to do this with SBT.) This is what the userClassPathFirst stuff is supposed to resolve, though. You're definitely sure CA is in your app JAR? I ask just because you mention it was marked provided above, though also in your assembly. Worth double-checking. Otherwise, I'm not sure. It almost sounds like the inverse of the unresolved (?) https://issues.apache.org/jira/browse/SPARK-1863 but may surprise me by being the same issue.
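For reference, a sketch of the shading approach discussed above, using the Maven shade plugin's class relocation plus `minimizeJar`. The `com.clearspring.analytics` pattern and the shaded prefix are assumptions for illustration, not taken from this thread:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <!-- Drop classes nothing in the app actually reaches -->
    <minimizeJar>true</minimizeJar>
    <relocations>
      <!-- Rename the app's bundled copy so it never collides with the
           copy Spark's classloader has already loaded; the renamed copy
           resolves against the fastutil classes in the same assembly -->
      <relocation>
        <pattern>com.clearspring.analytics</pattern>
        <shadedPattern>myapp.shaded.clearspring</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
    </execution>
  </executions>
</plugin>
```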
[GitHub] spark pull request: [SPARK-4411][UI]Add kill link for jobs in the ...
GitHub user lianhuiwang opened a pull request: https://github.com/apache/spark/pull/4823 [SPARK-4411][UI]Add kill link for jobs in the UI We should have a kill link for each job, similar to what we have for each stage, so it's easier for users to kill jobs in the UI. @kayousterhout can you take a look at this? thanks. You can merge this pull request into a Git repository by running: $ git pull https://github.com/lianhuiwang/spark SPARK-4411 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4823.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4823 commit af461ccce44e2792ea9356ccc2db6c84609511a0 Author: Lianhui Wang lianhuiwan...@gmail.com Date: 2015-02-28T03:24:46Z Add kill link for jobs in the UI
[GitHub] spark pull request: [SPARK-6048] SparkConf should not translate de...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4799#issuecomment-76507995 [Test build #28104 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28104/consoleFull) for PR 4799 at commit [`c26a9e3`](https://github.com/apache/spark/commit/c26a9e3c3f6a17ae01782278fb1d4a1426fcbdbd). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-6048] SparkConf should not translate de...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4799#issuecomment-76507997 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28104/ Test PASSed.
[GitHub] spark pull request: [SPARK-6025] [MLlib] Add helper method to effi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4819#issuecomment-76509618 [Test build #28110 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28110/consoleFull) for PR 4819 at commit [`7d4ed48`](https://github.com/apache/spark/commit/7d4ed483e0a0c58669ab00421d00eecda832cfba). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5946][Streaming] Add Python API for dir...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4723#issuecomment-76510453 [Test build #28111 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28111/consoleFull) for PR 4723 at commit [`1b6e873`](https://github.com/apache/spark/commit/1b6e873602785c5e5c78ee23d77725d2c51129fc). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4021#issuecomment-76511660 I think that this patch may have introduced a bug that may cause accumulator updates to be lost: https://issues.apache.org/jira/browse/SPARK-6075 I'm still trying to see if I can spot the problem, but my hunch is that maybe the `localAccums` thread-local maps should not hold weak references. When deserializing an accumulator in an executor and registering it with `localAccums`, is there ever a moment in which the accumulator has no strong references pointing to it? Does some object hold a strong reference to an accumulator while it's being deserialized? If not, this could lead to it being dropped from the `localAccums` map, causing that task's accumulator updates to be lost.
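To illustrate the suspected failure mode, here is a minimal stand-in (not Spark's actual `Accumulators` code) using a `WeakHashMap`: an entry held only weakly can vanish as soon as the last strong reference to its key is gone.

```java
import java.util.Map;
import java.util.WeakHashMap;

public class WeakAccumulatorDemo {
    public static void main(String[] args) throws InterruptedException {
        // Hypothetical stand-in for a deserialized, registered accumulator.
        Object accumulator = new Object();

        // A map that holds its keys only weakly, mimicking the suspected
        // behavior of the localAccums thread-local map.
        Map<Object, String> localAccums = new WeakHashMap<>();
        localAccums.put(accumulator, "pending update");

        // While a strong reference exists, the entry is reachable.
        System.out.println(localAccums.size()); // prints 1

        // Drop the only strong reference: the entry becomes eligible
        // for collection, and a GC pass can silently remove it.
        accumulator = null;
        System.gc();
        Thread.sleep(100); // give the collector a chance (not guaranteed)

        // Often prints 0 at this point: the "pending update" is lost.
        System.out.println(localAccums.size());
    }
}
```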
[GitHub] spark pull request: [SPARK-5950][SQL]Insert array into a metastore...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4826#issuecomment-76512669 [Test build #28113 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28113/consoleFull) for PR 4826 at commit [`e4f397c`](https://github.com/apache/spark/commit/e4f397cea7ec0dc21a714b75a7254bb275319fc2). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5950][SQL] Enable inserting array into ...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/4729#issuecomment-76512714 @viirya Thank you for working on it! Our discussions helped me clearly understand the problem. After discussions with @liancheng, I am proposing a different approach to address this issue in https://github.com/apache/spark/pull/4826. Please feel free to leave comments there.