[GitHub] spark pull request: SPARK-4159 [CORE] [WIP] Maven build doesn't ru...

2014-12-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3651#issuecomment-66416040
  
  [Test build #24299 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24299/consoleFull)
 for   PR 3651 at commit 
[`125b0b6`](https://github.com/apache/spark/commit/125b0b64efc22c5a573aea00bf9bfdb53393cdbe).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4772] Clear local copies of accumulator...

2014-12-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3570





[GitHub] spark pull request: [SPARK-4772] Clear local copies of accumulator...

2014-12-09 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3570#issuecomment-66415593
  
Thanks for updating the description.  This looks good to me, so I'm going 
to merge this into `master`, `branch-1.0`, and `branch-1.1` (and I'll tag it 
for a post-release backport into `branch-1.2`).  Thanks again for this fix!





[GitHub] spark pull request: [SPARK-4789] [mllib] Standardize ML Prediction...

2014-12-09 Thread Lewuathe
Github user Lewuathe commented on the pull request:

https://github.com/apache/spark/pull/3637#issuecomment-66415498
  
My initial question was based on the viewpoint of the developer API.
A simple API seems to restrict the possibilities for implementing new
algorithms. As @shivaram mentioned, developers cannot use the optimization API
because these datatypes simply do not support multi-class classification, etc. So I
think the typed interface does not necessarily need to be public in this context. If you
have a different viewpoint, such as that of an end user of this API, that opinion
should be included in this discussion.





[GitHub] spark pull request: Add hadoop-2.5 profile with upgraded jets3t

2014-12-09 Thread ZhangBanger
Github user ZhangBanger commented on the pull request:

https://github.com/apache/spark/pull/3654#issuecomment-66415382
  
Thanks for the response! I'll try out the `hadoop-2.4` profile again.
For `jets3t`, I'll take another look to see whether it's Spark or Hadoop that
needs to do some shading.





[GitHub] spark pull request: [Minor] Use tag for help icon in web UI ...

2014-12-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3659





[GitHub] spark pull request: [Minor] Use tag for help icon in web UI ...

2014-12-09 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3659#issuecomment-66415145
  
I'm merging this into `master` and `branch-1.2`.  Thanks!





[GitHub] spark pull request: [SPARK-4233] [SQL] WIP:Simplify the UDAF API (...

2014-12-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3247#issuecomment-66415137
  
  [Test build #24297 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24297/consoleFull)
 for   PR 3247 at commit 
[`a9c1544`](https://github.com/apache/spark/commit/a9c15442c0ecfea8a605c30046548249475009db).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class UnresolvedFunction(`
  * `trait AggregateFunction `
  * `trait AggregateExpression extends Expression `
  * `abstract class UnaryAggregateExpression extends UnaryExpression with 
AggregateExpression `
  * `case class Min(`
  * `case class Average(child: Expression, distinct: Boolean = false)`
  * `case class Max(child: Expression, distinct: Boolean = false)`
  * `case class Count(child: Expression, distinct: Boolean = false)`
  * `case class Sum(child: Expression, distinct: Boolean = false)`
  * `case class First(child: Expression, distinct: Boolean = false)`
  * `case class Last(child: Expression, distinct: Boolean = false)`
  * `case class MinFunction(aggr: BoundReference, base: Min) extends 
AggregateFunction `
  * `case class AverageFunction(count: BoundReference, sum: BoundReference, 
base: Average)`
  * `case class MaxFunction(aggr: BoundReference, base: Max) extends 
AggregateFunction `
  * `case class CountFunction(aggr: BoundReference, base: Count) extends 
AggregateFunction `
  * `case class SumFunction(aggr: BoundReference, base: Sum) extends 
AggregateFunction `
  * `case class FirstFunction(aggr: BoundReference, base: First) extends 
AggregateFunction `
  * `case class LastFunction(aggr: BoundReference, base: 
AggregateExpression) extends AggregateFunction `
  * `sealed case class AggregateFunctionBind(`
  * `sealed trait Aggregate `
  * `sealed trait PreShuffle extends Aggregate `
  * `sealed trait PostShuffle extends Aggregate `
  * `case class AggregatePreShuffle(`
  * `case class AggregatePostShuffle(`
  * `case class DistinctAggregate(`






[GitHub] spark pull request: [SPARK-4233] [SQL] WIP:Simplify the UDAF API (...

2014-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3247#issuecomment-66415146
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24297/
Test FAILed.





[GitHub] spark pull request: [SPARK-4812][SQL] Fix the initialization issue...

2014-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3660#issuecomment-66414901
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24296/
Test FAILed.





[GitHub] spark pull request: [SPARK-4812][SQL] Fix the initialization issue...

2014-12-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3660#issuecomment-66414896
  
  [Test build #24296 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24296/consoleFull)
 for   PR 3660 at commit 
[`fab8658`](https://github.com/apache/spark/commit/fab86585bcfbf1afaa300632faf259024facf66c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: spark-submit with accept multiple properties-f...

2014-12-09 Thread lvsoft
Github user lvsoft commented on the pull request:

https://github.com/apache/spark/pull/3490#issuecomment-66414770
  
Well, I can't understand the complexity concern with this PR. I've reviewed
SPARK-3779, which was marked as related, and didn't find anything related to this
patch.
Also, this patch is backward compatible with the current `spark-submit`
behavior.

From my point of view, let's discuss it point by point:
1. Necessity: I've given two reasons, one for the benchmark case
and one for common intuition in most systems.
2. Complexity: This patch maintains backward compatibility, and
I've described its details at the beginning; I didn't see the relationship
with SPARK-3779.
3. Elegance: I don't think this is the most elegant solution.
However, given the need to maintain compatibility with the least impact on the current system,
it is a relatively elegant one.
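For illustration only, the "multiple properties files" behavior described above could be sketched as an ordered merge, where later files override earlier ones. `mergePropertyFiles` is a hypothetical helper, not the actual spark-submit implementation:

```scala
import java.io.FileInputStream
import java.util.Properties

// Hypothetical sketch (not spark-submit's real code): load each properties
// file in the order given; Properties.load overwrites duplicate keys, so
// later files take precedence over earlier "common" ones.
def mergePropertyFiles(paths: Seq[String]): Properties = {
  val merged = new Properties()
  for (path <- paths) {
    val in = new FileInputStream(path)
    try merged.load(in) finally in.close()
  }
  merged
}
```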





[GitHub] spark pull request: [SPARK-4233] [SQL] WIP:Simplify the UDAF API (...

2014-12-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3247#issuecomment-66414361
  
  [Test build #24297 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24297/consoleFull)
 for   PR 3247 at commit 
[`a9c1544`](https://github.com/apache/spark/commit/a9c15442c0ecfea8a605c30046548249475009db).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4812][SQL] Fix the initialization issue...

2014-12-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3660#issuecomment-66414168
  
  [Test build #24296 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24296/consoleFull)
 for   PR 3660 at commit 
[`fab8658`](https://github.com/apache/spark/commit/fab86585bcfbf1afaa300632faf259024facf66c).
 * This patch merges cleanly.





[GitHub] spark pull request: Fix the initialization issue of 'codegenEnable...

2014-12-09 Thread zsxwing
GitHub user zsxwing opened a pull request:

https://github.com/apache/spark/pull/3660

Fix the initialization issue of 'codegenEnabled'

The problem is that `codegenEnabled` is a `val`, but it is initialized from
another `val`, `sqlContext`, which can be overridden by subclasses. Here is a
simple example that shows the issue.

```Scala
scala> :paste
// Entering paste mode (ctrl-D to finish)

abstract class Foo {

  protected val sqlContext = "Foo"

  val codegenEnabled: Boolean = {
    println(sqlContext) // calls the subclass's `sqlContext`, which has not yet been initialized
    if (sqlContext != null) {
      true
    } else {
      false
    }
  }
}

class Bar extends Foo {
  override val sqlContext = "Bar"
}

println(new Bar().codegenEnabled)

// Exiting paste mode, now interpreting.

null
false
defined class Foo
defined class Bar
```

To fix it, we should mark `codegenEnabled` as `lazy`.
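A minimal sketch of the proposed fix on the same toy example: marking the dependent `val` as `lazy` defers its evaluation until first access, by which time the subclass constructor has run.

```scala
abstract class Foo {
  protected val sqlContext = "Foo"

  // `lazy` defers evaluation until first access, after Bar's constructor
  // has initialized its overriding `sqlContext`.
  lazy val codegenEnabled: Boolean = sqlContext != null
}

class Bar extends Foo {
  override val sqlContext = "Bar"
}

println(new Bar().codegenEnabled) // prints true: `sqlContext` is "Bar" here
```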

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zsxwing/spark SPARK-4812

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3660.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3660


commit fab86585bcfbf1afaa300632faf259024facf66c
Author: zsxwing 
Date:   2014-12-10T07:25:54Z

Fix the initialization issue of 'codegenEnabled'







[GitHub] spark pull request: [SPARK-4644][Core] Implement skewed join

2014-12-09 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/3505#issuecomment-66411889
  
Let me explain - Spark SQL is more than SQL. It is SQL plus a DSL that will be
improved over time. I personally believe that, over time, the majority of Spark users
will interact directly with SchemaRDD instead, because it is optimized for
structured data; it is much easier in Spark SQL to optimize for data
structure. Personally, I'm worried about pushing more and more functionality into Spark
core itself, because it is very hard to maintain and optimize for arbitrarily
structured JVM objects.

Most of the code you have written here can be used directly in SchemaRDD.
If we really want to apply this in core itself, even without considering the
maintenance burden, we will need to find a way for this to be turned on and off
easily (e.g. different method names) rather than relying on immutable confs.






[GitHub] spark pull request: [SPARK-4644][Core] Implement skewed join

2014-12-09 Thread zsxwing
Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/3505#issuecomment-66411639
  
> the problem is that sparkconf is immutable once created - so in order to 
toggle this on and off, a user would have to restart Spark.

I added this configuration because it's convenient for existing code. I would
encourage people to call `skewedJoin` directly in new code if they need a
skewed join, because they are usually familiar with their data.





[GitHub] spark pull request: [SPARK-4644][Core] Implement skewed join

2014-12-09 Thread zsxwing
Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/3505#issuecomment-66411392
  
> Maybe a better place to do this is in SparkSQL?

It depends on whether this is a fundamental feature for Spark Core. IMO, it's
better to have a skewed join in Spark Core, as join is a very common
operator for Spark Core users.





[GitHub] spark pull request: [SPARK-4797] Replace breezeSquaredDistance

2014-12-09 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/3643#issuecomment-66411144
  
Thanks. I added handling for the different cases of SparseVector and
DenseVector.




[GitHub] spark pull request: [SPARK-4644][Core] Implement skewed join

2014-12-09 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/3505#issuecomment-66410973
  
the problem is that sparkconf is immutable once created - so in order to 
toggle this on and off, a user would have to restart Spark. Maybe a better 
place to do this is in SparkSQL?





[GitHub] spark pull request: [SPARK-4644][Core] Implement skewed join

2014-12-09 Thread zsxwing
Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/3505#issuecomment-66410552
  
ping @rxin





[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...

2014-12-09 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3655#discussion_r21586197
  
--- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/ReliableKafkaReceiver.scala ---
@@ -201,12 +201,31 @@ class ReliableKafkaReceiver[
     topicPartitionOffsetMap.clear()
   }
 
-  /** Store the ready-to-be-stored block and commit the related offsets to zookeeper. */
+  /**
+   * Store the ready-to-be-stored block and commit the related offsets to
+   * zookeeper. This method will try a fixed number of times to push the
+   * block. If the push fails, the receiver is stopped.
+   */
   private def storeBlockAndCommitOffset(
       blockId: StreamBlockId, arrayBuffer: mutable.ArrayBuffer[_]): Unit = {
-    store(arrayBuffer.asInstanceOf[mutable.ArrayBuffer[(K, V)]])
-    Option(blockOffsetMap.get(blockId)).foreach(commitOffset)
-    blockOffsetMap.remove(blockId)
+    var count = 0
+    var pushed = false
+    var exception: Exception = null
+    while (!pushed && count <= 3) {
--- End diff --

General question: is it likely that a store fails but then immediately
succeeds? Just wondering about the likelihood that this retry does anything.
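The pattern in the diff is a bounded retry; a generic hedged sketch of the same idea (the helper name and attempt limit are illustrative, not the PR's exact code):

```scala
// Generic bounded-retry sketch: attempt an operation up to `maxAttempts`
// times, rethrowing the last failure if no attempt succeeds.
def retry[T](maxAttempts: Int)(op: => T): T = {
  var lastException: Exception = null
  var attempt = 0
  while (attempt < maxAttempts) {
    try {
      return op
    } catch {
      case e: Exception =>
        lastException = e
        attempt += 1
    }
  }
  throw lastException
}
```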





[GitHub] spark pull request: Add hadoop-2.5 profile with upgraded jets3t

2014-12-09 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/3654#issuecomment-66409892
  
This has been discussed a few times - the `hadoop-2.4` profile covers 2.4+. 
You don't need a new profile. I think the goal is to match `jets3t` distributed 
with Hadoop. It might be fine to update from 0.9.0 to 0.9.2 but if it's not 
actually a problem, might not be worth touching it just now.





[GitHub] spark pull request: SPARK-4159 [CORE] [WIP] Maven build doesn't ru...

2014-12-09 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3651#discussion_r21585931
  
--- Diff: pom.xml ---
@@ -941,19 +950,38 @@
 true
   
 
+
 
   org.apache.maven.plugins
   maven-surefire-plugin
-  2.17
+  2.18
+  
   
-
-true
+
+  **/Test*.java
+  **/*Test.java
+  **/*TestCase.java
+  **/*Suite.java
+
+
${project.build.directory}/surefire-reports
+-Xmx3g -XX:MaxPermSize=${MaxPermGen} 
-XX:ReservedCodeCacheSize=512m
+
+  true
+  
${session.executionRootDirectory}
+  1
+  false
+  
false
+  
${test_classpath}
+  
true
+
   
 
+
--- End diff --

It probably is; I had not tried it just yet. I do think this is a precursor 
to more change, potentially using `surefire` for everything.





[GitHub] spark pull request: spark-submit with accept multiple properties-f...

2014-12-09 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the pull request:

https://github.com/apache/spark/pull/3490#issuecomment-66408350
  
As Patrick said, this will make configuration more complex rather than more
elegant.





[GitHub] spark pull request: [SPARK-4795][Core] Redesign the "primitive typ...

2014-12-09 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/3642#discussion_r21585352
  
--- Diff: 
graphx/src/test/scala/org/apache/spark/graphx/lib/ShortestPathsSuite.scala ---
@@ -40,7 +40,7 @@ class ShortestPathsSuite extends FunSuite with 
LocalSparkContext {
   val graph = Graph.fromEdgeTuples(edges, 1)
   val landmarks = Seq(1, 4).map(_.toLong)
   val results = ShortestPaths.run(graph, 
landmarks).vertices.collect.map {
-case (v, spMap) => (v, spMap.mapValues(_.get))
--- End diff --

Or we could keep `xxxToXxxWritable` implicit, for source compatibility?





[GitHub] spark pull request: [SPARK-4741] Do not destroy FileInputStream an...

2014-12-09 Thread mridulm
Github user mridulm commented on the pull request:

https://github.com/apache/spark/pull/3600#issuecomment-66407791
  
-1. This is a broken change, for multiple reasons: finalization of an out-of-scope
variable can trigger a close of the underlying fd; there are potential state issues with vars
not being null when we expect them to be; and the cost 'saved' is
fairly trivial anyway compared to the risk of changing this codepath.

Please be very careful when changing disk writers in Spark... we have spent too
much time debugging and fixing them; correctness and functionality are much more
important than a perceived performance improvement.
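Background on the first concern, as an illustration only (not the PR's code): `FileInputStream.finalize()` closes the underlying descriptor, so a stream whose only strong reference goes out of scope while its fd is still in use can have that fd closed by GC. The safe shape is to hold an explicit reference for the stream's entire lifetime and close it deterministically:

```scala
import java.io.{File, FileInputStream}

// Illustration only: keep a live reference to the stream for as long as the
// underlying fd is in use, and close it explicitly in `finally`. If the
// reference were dropped early, finalize() could close the fd out from
// under code still using it.
def readFirstByte(f: File): Int = {
  val in = new FileInputStream(f) // keep this reference until close()
  try in.read()
  finally in.close()
}
```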





[GitHub] spark pull request: spark-submit with accept multiple properties-f...

2014-12-09 Thread lvsoft
Github user lvsoft commented on the pull request:

https://github.com/apache/spark/pull/3490#issuecomment-66405387
  
Well, those are separate property files, not *common* properties.
It will be hard to adjust common properties that way, and easy to make mistakes.

Deleting tmp files is a common requirement in system design. Of course you
can ignore tmp files; as I said, I think this is a more elegant approach.





[GitHub] spark pull request: [Minor] Use tag for help icon in web UI ...

2014-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3659#issuecomment-66405196
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24295/
Test PASSed.





[GitHub] spark pull request: [Minor] Use tag for help icon in web UI ...

2014-12-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3659#issuecomment-66405191
  
  [Test build #24295 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24295/consoleFull)
 for   PR 3659 at commit 
[`bd72899`](https://github.com/apache/spark/commit/bd7289909719c0cb9d566baed2affa478f359193).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4461][YARN] pass extra java options to ...

2014-12-09 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the pull request:

https://github.com/apache/spark/pull/3409#issuecomment-66404905
  
Using spark.yarn.am.* and describing the scope of the configuration items in 
the docs is better. In cluster mode we should ignore the unused configs and 
probably also add a warning log. Throwing an exception is a bad idea.
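The warn-instead-of-throw behavior suggested above can be sketched as follows. This is an illustration only: the plain `Map` and `println` stand in for Spark's real `SparkConf` and `logWarning` plumbing, and the `spark.yarn.am.` prefix check is an assumption about which keys would be flagged.

```scala
// Sketch: in cluster mode, detect client-mode-only spark.yarn.am.* settings
// and warn rather than throw. Map/println are stand-ins for SparkConf/logWarning.
object IgnoredAmConfCheck {
  // Return the config keys that would be silently unused in cluster mode.
  def findIgnored(conf: Map[String, String]): Seq[String] =
    conf.keys.filter(_.startsWith("spark.yarn.am.")).toSeq.sorted

  def main(args: Array[String]): Unit = {
    val conf = Map(
      "spark.yarn.am.memory" -> "2g",
      "spark.executor.cores" -> "4")
    findIgnored(conf).foreach { key =>
      println(s"WARN: $key only takes effect in client mode; ignored in cluster mode.")
    }
  }
}
```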





[GitHub] spark pull request: SPARK-3779. yarn spark.yarn.applicationMaster....

2014-12-09 Thread sryza
Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/3471#issuecomment-66404718
  
@andrewor14, makes sense.





[GitHub] spark pull request: spark-submit with accept multiple properties-f...

2014-12-09 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the pull request:

https://github.com/apache/spark/pull/3490#issuecomment-66404521
  
In your case, why not just add the common properties to the private config and 
set a separate properties file for each workload?
Why would the tmp conf file need to be deleted after the job finishes?
I don't think it is reasonable to make this change.





[GitHub] spark pull request: [SPARK-4809] Rework Guava library shading.

2014-12-09 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/3658#issuecomment-66404298
  
BTW, credit where credit is due, I got this idea from @arahuja





[GitHub] spark pull request: spark-submit with accept multiple properties-f...

2014-12-09 Thread lvsoft
Github user lvsoft commented on the pull request:

https://github.com/apache/spark/pull/3490#issuecomment-66404194
  
Sorry for the late reply. I'll explain the use cases for multiple properties 
files. 

Currently I'm working on a benchmark utility for Spark, and it is natural to 
adjust properties per workload.
I'd like to set up the configuration in two parts: a global conf for common 
properties, and a private conf for each workload. Without support for 
multiple properties files, I have to merge the properties into a tmp conf file 
and remove it after spark-submit finishes. What's more, when submitting 
multiple workloads concurrently, the tmp conf file names need to be mutually 
exclusive. And if the benchmark process is interrupted, the tmp conf files are 
hard to clean up.

So I think a more elegant approach is to add support for multiple properties 
files to Spark.

Another reason for this PR: currently Spark uses `spark-defaults.conf` if no 
properties file is specified, but uses the specified properties file and 
*discards* `spark-defaults.conf` otherwise. This behavior is counter-intuitive 
for beginners. In most systems, it is a natural assumption that the values in 
`xxx-defaults.conf` take effect whenever a property is not overridden in the 
user's config.
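The layering described above (global defaults overridden by per-workload values) can be sketched with plain `java.util.Properties`. `mergeProperties` is a hypothetical helper, since spark-submit itself does not accept multiple properties files today:

```scala
import java.io.StringReader
import java.util.Properties

object PropsMerge {
  // Hypothetical helper: load properties sources in order, with later
  // sources overriding earlier ones on key collisions.
  def mergeProperties(sources: Seq[java.io.Reader]): Properties = {
    val merged = new Properties()
    sources.foreach { src =>
      val p = new Properties()
      p.load(src)
      merged.putAll(p) // later sources win
    }
    merged
  }

  def main(args: Array[String]): Unit = {
    // A global conf (spark-defaults.conf analogue) plus a per-workload conf.
    val global   = new StringReader("spark.master=yarn\nspark.executor.memory=2g")
    val workload = new StringReader("spark.executor.memory=8g")
    val conf = mergeProperties(Seq(global, workload))
    println(conf.getProperty("spark.executor.memory")) // workload value wins
    println(conf.getProperty("spark.master"))          // global value survives
  }
}
```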





[GitHub] spark pull request: [SPARK-4789] [mllib] Standardize ML Prediction...

2014-12-09 Thread shivaram
Github user shivaram commented on the pull request:

https://github.com/apache/spark/pull/3637#issuecomment-66403895
  
@jkbradley Apologies for the delay - I just read your design doc and am 
catching up on this discussion. 
Sorry if I missed something, but could you clarify the use case here? I 
can see two kinds of scenarios:

1. Cases where we just want to use an existing classifier like 
LogisticRegression in a pipeline. I guess the train() interface shouldn't 
really matter here, as we will be passing around SchemaRDDs in the pipeline 
and calling fit (thus going through the untyped API?).

2. Cases where developers want to implement a new classification or 
regression method. For these cases I think the strongly typed API would help 
in reducing the amount of cruft code and possible bugs in extracting 
features, labels, etc.

FWIW I agree with the conclusion of keeping LabeledPoint simple as (Double, 
Array[Double]). And I think `predictRaw` is also probably fine, as the 
meaning of the values returned may vary (as noted in your comment). 







[GitHub] spark pull request: [SPARK-4806] Streaming doc update for 1.2

2014-12-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3653#issuecomment-66403488
  
  [Test build #24294 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24294/consoleFull)
 for   PR 3653 at commit 
[`17b99fb`](https://github.com/apache/spark/commit/17b99fbaf699c54bf75893b98c66ec5e3fde30ba).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4806] Streaming doc update for 1.2

2014-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3653#issuecomment-66403490
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24294/
Test PASSed.





[GitHub] spark pull request: [SPARK-2261] Make event logger use a single fi...

2014-12-09 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/1222#discussion_r21583881
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -221,33 +238,87 @@ private[history] class FsHistoryProvider(conf: 
SparkConf) extends ApplicationHis
 }
   }
 
-  private def createReplayBus(logDir: FileStatus): (ReplayListenerBus, 
ApplicationEventListener) = {
-val path = logDir.getPath()
-val elogInfo = EventLoggingListener.parseLoggingInfo(path, fs)
-val replayBus = new ReplayListenerBus(elogInfo.logPaths, fs, 
elogInfo.compressionCodec)
-val appListener = new ApplicationEventListener
-replayBus.addListener(appListener)
-(replayBus, appListener)
+  /**
+   * Replays the event data in the given log, and returns the application 
information.
+   */
+  private def replay(logPath: FileStatus, bus: ReplayListenerBus): 
FsApplicationHistoryInfo = {
+val (logInput, sparkVersion) =
+  if (isLegacyLogDirectory(logPath)) {
+openOldLog(logPath.getPath())
+  } else {
+EventLoggingListener.openEventLog(logPath.getPath(), fs)
+  }
+try {
+  val appListener = new ApplicationEventListener
+  bus.addListener(appListener)
+  bus.replay(logInput, sparkVersion)
+  new FsApplicationHistoryInfo(
+logPath.getPath().getName(),
+appListener.appId.getOrElse(logPath.getPath().getName()),
+appListener.appName.getOrElse(NOT_STARTED),
+appListener.startTime.getOrElse(-1L),
+appListener.endTime.getOrElse(-1L),
+getModificationTime(logPath),
+appListener.sparkUser.getOrElse(NOT_STARTED))
+} finally {
+  logInput.close()
+}
   }
 
-  /** Return when this directory was last modified. */
-  private def getModificationTime(dir: FileStatus): Long = {
-try {
-  val logFiles = fs.listStatus(dir.getPath)
-  if (logFiles != null && !logFiles.isEmpty) {
-logFiles.map(_.getModificationTime).max
-  } else {
-dir.getModificationTime
+  /**
+   * Load the a legacy log directory. This assumes that the log directory 
contains a single event
+   * log file, which is the case for directories generated by the code in 
previous releases.
+   */
+  private[history] def openOldLog(dir: Path): (InputStream, String) = {
+val children = fs.listStatus(dir)
+var eventLogPath: Path = null
+var codecName: String = null
+var sparkVersion: String = null
+
+children.foreach { child =>
+  child.getPath().getName() match {
+case name if name.startsWith(LOG_PREFIX) =>
+  eventLogPath = child.getPath()
+
+case codec if codec.startsWith(COMPRESSION_CODEC_PREFIX) =>
+  codecName = codec.substring(COMPRESSION_CODEC_PREFIX.length())
+
+case version if version.startsWith(SPARK_VERSION_PREFIX) =>
+  sparkVersion = version.substring(SPARK_VERSION_PREFIX.length())
+
+case _ =>
   }
-} catch {
-  case t: Throwable =>
-logError("Exception in accessing modification time of %s".format(dir.getPath), t)
--1L
+}
+
+val codec = try {
+if (codecName != null) {
+  Some(CompressionCodec.createCodec(conf, codecName))
+} else None
--- End diff --

codecName.map { name => CompressionCodec.createCodec(conf, name) }
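Since `codecName` in the diff above is a `var` that may still be `null`, the suggested one-liner presumably needs an `Option` wrap first. A self-contained sketch, with `_.toUpperCase` standing in for the real `CompressionCodec.createCodec(conf, _)` call:

```scala
object CodecOption {
  // toUpperCase is a stand-in for CompressionCodec.createCodec(conf, name);
  // Option(...) converts the possibly-null String into Some/None safely.
  def resolve(codecName: String): Option[String] =
    Option(codecName).map(name => name.toUpperCase)

  def main(args: Array[String]): Unit = {
    println(resolve("snappy")) // Some(SNAPPY)
    println(resolve(null))     // None: no codec file was found
  }
}
```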





[GitHub] spark pull request: [SPARK-4759] Fix driver hanging from coalescin...

2014-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3633#issuecomment-66403290
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24292/
Test PASSed.





[GitHub] spark pull request: [SPARK-4759] Fix driver hanging from coalescin...

2014-12-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3633#issuecomment-66403289
  
  [Test build #24292 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24292/consoleFull)
 for   PR 3633 at commit 
[`e520d6b`](https://github.com/apache/spark/commit/e520d6b138f2b1bf95cf59d73ed2976eacfb87ca).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: SPARK-3779. yarn spark.yarn.applicationMaster....

2014-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3471#issuecomment-66402995
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24293/
Test PASSed.





[GitHub] spark pull request: SPARK-3779. yarn spark.yarn.applicationMaster....

2014-12-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3471#issuecomment-66402988
  
  [Test build #24293 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24293/consoleFull)
 for   PR 3471 at commit 
[`20b9887`](https://github.com/apache/spark/commit/20b9887bb9529f2792123778e6eeca6ba0e51c37).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4809] Rework Guava library shading.

2014-12-09 Thread ash211
Github user ash211 commented on the pull request:

https://github.com/apache/spark/pull/3658#issuecomment-66402583
  
cc @mccheah





[GitHub] spark pull request: [SPARK-4161]Spark shell class path is not corr...

2014-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3050#issuecomment-66402461
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24291/
Test PASSed.





[GitHub] spark pull request: [SPARK-4161]Spark shell class path is not corr...

2014-12-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3050#issuecomment-66402458
  
  [Test build #24291 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24291/consoleFull)
 for   PR 3050 at commit 
[`abb6fa4`](https://github.com/apache/spark/commit/abb6fa4186d0737ec960d628a55d367ac52fe03c).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-2261] Make event logger use a single fi...

2014-12-09 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/1222#discussion_r21583427
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -221,33 +238,87 @@ private[history] class FsHistoryProvider(conf: 
SparkConf) extends ApplicationHis
 }
   }
 
-  private def createReplayBus(logDir: FileStatus): (ReplayListenerBus, 
ApplicationEventListener) = {
-val path = logDir.getPath()
-val elogInfo = EventLoggingListener.parseLoggingInfo(path, fs)
-val replayBus = new ReplayListenerBus(elogInfo.logPaths, fs, 
elogInfo.compressionCodec)
-val appListener = new ApplicationEventListener
-replayBus.addListener(appListener)
-(replayBus, appListener)
+  /**
+   * Replays the event data in the given log, and returns the application 
information.
+   */
+  private def replay(logPath: FileStatus, bus: ReplayListenerBus): 
FsApplicationHistoryInfo = {
+val (logInput, sparkVersion) =
+  if (isLegacyLogDirectory(logPath)) {
+openOldLog(logPath.getPath())
+  } else {
+EventLoggingListener.openEventLog(logPath.getPath(), fs)
+  }
+try {
+  val appListener = new ApplicationEventListener
+  bus.addListener(appListener)
+  bus.replay(logInput, sparkVersion)
+  new FsApplicationHistoryInfo(
+logPath.getPath().getName(),
+appListener.appId.getOrElse(logPath.getPath().getName()),
+appListener.appName.getOrElse(NOT_STARTED),
+appListener.startTime.getOrElse(-1L),
+appListener.endTime.getOrElse(-1L),
+getModificationTime(logPath),
+appListener.sparkUser.getOrElse(NOT_STARTED))
+} finally {
+  logInput.close()
+}
   }
 
-  /** Return when this directory was last modified. */
-  private def getModificationTime(dir: FileStatus): Long = {
-try {
-  val logFiles = fs.listStatus(dir.getPath)
-  if (logFiles != null && !logFiles.isEmpty) {
-logFiles.map(_.getModificationTime).max
-  } else {
-dir.getModificationTime
+  /**
+   * Load the a legacy log directory. This assumes that the log directory 
contains a single event
+   * log file, which is the case for directories generated by the code in 
previous releases.
--- End diff --

you should add `single event log file, along with other files containing 
application meta data`. From this description alone it's not clear how the 
legacy directory is different from the new structure.





[GitHub] spark pull request: [SPARK-4329][WebUI] HistoryPage pagenation

2014-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3194#issuecomment-66400801
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24290/
Test PASSed.





[GitHub] spark pull request: [SPARK-4329][WebUI] HistoryPage pagenation

2014-12-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3194#issuecomment-66400794
  
  [Test build #24290 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24290/consoleFull)
 for   PR 3194 at commit 
[`15d3d2d`](https://github.com/apache/spark/commit/15d3d2d5b3e62b353af3061e454e7203331de3d2).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-2261] Make event logger use a single fi...

2014-12-09 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/1222#discussion_r21582841
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -221,33 +238,87 @@ private[history] class FsHistoryProvider(conf: 
SparkConf) extends ApplicationHis
 }
--- End diff --

I can't comment on the line above this one, but we really shouldn't catch 
`Throwable` here. We should make it an `Exception`. (not your changes)
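To illustrate the point: catching `Throwable` also swallows fatal JVM errors such as `OutOfMemoryError`, while `scala.util.control.NonFatal` (or a plain `Exception` case) lets them propagate. A small sketch with a hypothetical fallback helper:

```scala
import scala.util.control.NonFatal

object CatchScope {
  // Hypothetical helper: run a computation and fall back on non-fatal
  // failures only. Fatal errors (OutOfMemoryError etc.) still propagate.
  def withFallback(compute: () => Long, fallback: Long): Long =
    try compute() catch {
      case NonFatal(e) => fallback // matches exceptions, not fatal errors
    }

  def main(args: Array[String]): Unit = {
    println(withFallback(() => 42L, -1L))                              // 42
    println(withFallback(() => throw new RuntimeException("io"), -1L)) // -1
  }
}
```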





[GitHub] spark pull request: [SPARK-2261] Make event logger use a single fi...

2014-12-09 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/1222#discussion_r21582813
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -149,41 +162,45 @@ private[history] class FsHistoryProvider(conf: 
SparkConf) extends ApplicationHis
* Tries to reuse as much of the data already in memory as possible, by 
not reading
* applications that haven't been updated since last time the logs were 
checked.
*/
-  private def checkForLogs() = {
+  private[history] def checkForLogs() = {
 lastLogCheckTimeMs = getMonotonicTimeMs()
 logDebug("Checking for logs. Time is now 
%d.".format(lastLogCheckTimeMs))
-try {
-  val logStatus = fs.listStatus(new Path(logDir))
-  val logDirs = if (logStatus != null) logStatus.filter(_.isDir).toSeq 
else Seq[FileStatus]()
 
-  // Load all new logs from the log directory. Only directories that 
have a modification time
-  // later than the last known log directory will be loaded.
+try {
   var newLastModifiedTime = lastModifiedTime
-  val logInfos = logDirs
-.filter { dir =>
-  if (fs.isFile(new Path(dir.getPath(), 
EventLoggingListener.APPLICATION_COMPLETE))) {
-val modTime = getModificationTime(dir)
-newLastModifiedTime = math.max(newLastModifiedTime, modTime)
-modTime > lastModifiedTime
-  } else {
-false
+  val statusList = Option(fs.listStatus(new Path(logDir))).map(_.toSeq)
+.getOrElse(Seq[FileStatus]())
+  val logInfos = statusList
+.filter { entry =>
+  try {
+val isFinishedApplication =
+  if (isLegacyLogDirectory(entry)) {
+fs.exists(new Path(entry.getPath(), APPLICATION_COMPLETE))
+  } else {
+!entry.getPath().getName().endsWith(EventLoggingListener.IN_PROGRESS)
+  }
+
+if (isFinishedApplication) {
+  val modTime = getModificationTime(entry)
+  newLastModifiedTime = math.max(newLastModifiedTime, modTime)
+  modTime >= lastModifiedTime
+} else {
+  false
+}
+  } catch {
+case e: AccessControlException =>
+  // Do not use "logInfo" since these messages can get pretty noisy if printed on
+  // every poll.
+  logDebug(s"No permission to read $entry, ignoring.")
+  false
   }
 }
-.flatMap { dir =>
+.flatMap { entry =>
   try {
-val (replayBus, appListener) = createReplayBus(dir)
-replayBus.replay()
-Some(new FsApplicationHistoryInfo(
-  dir.getPath().getName(),
-  appListener.appId.getOrElse(dir.getPath().getName()),
-  appListener.appName.getOrElse(NOT_STARTED),
-  appListener.startTime.getOrElse(-1L),
-  appListener.endTime.getOrElse(-1L),
-  getModificationTime(dir),
-  appListener.sparkUser.getOrElse(NOT_STARTED)))
+Some(replay(entry, new ReplayListenerBus()))
   } catch {
 case e: Exception =>
-  logInfo(s"Failed to load application log data from $dir.", e)
+  logInfo(s"Failed to load application log data from $entry.", e)
--- End diff --

Not your code, but this should be `logError`





[GitHub] spark pull request: [SPARK-2261] Make event logger use a single fi...

2014-12-09 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/1222#discussion_r21582789
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -149,41 +162,45 @@ private[history] class FsHistoryProvider(conf: 
SparkConf) extends ApplicationHis
* Tries to reuse as much of the data already in memory as possible, by 
not reading
* applications that haven't been updated since last time the logs were 
checked.
*/
-  private def checkForLogs() = {
+  private[history] def checkForLogs() = {
 lastLogCheckTimeMs = getMonotonicTimeMs()
 logDebug("Checking for logs. Time is now 
%d.".format(lastLogCheckTimeMs))
-try {
-  val logStatus = fs.listStatus(new Path(logDir))
-  val logDirs = if (logStatus != null) logStatus.filter(_.isDir).toSeq 
else Seq[FileStatus]()
 
-  // Load all new logs from the log directory. Only directories that 
have a modification time
-  // later than the last known log directory will be loaded.
+try {
   var newLastModifiedTime = lastModifiedTime
-  val logInfos = logDirs
-.filter { dir =>
-  if (fs.isFile(new Path(dir.getPath(), 
EventLoggingListener.APPLICATION_COMPLETE))) {
-val modTime = getModificationTime(dir)
-newLastModifiedTime = math.max(newLastModifiedTime, modTime)
-modTime > lastModifiedTime
-  } else {
-false
+  val statusList = Option(fs.listStatus(new Path(logDir))).map(_.toSeq)
+.getOrElse(Seq[FileStatus]())
+  val logInfos = statusList
+.filter { entry =>
+  try {
+val isFinishedApplication =
+  if (isLegacyLogDirectory(entry)) {
+fs.exists(new Path(entry.getPath(), APPLICATION_COMPLETE))
+  } else {
+!entry.getPath().getName().endsWith(EventLoggingListener.IN_PROGRESS)
+  }
+
+if (isFinishedApplication) {
+  val modTime = getModificationTime(entry)
+  newLastModifiedTime = math.max(newLastModifiedTime, modTime)
+  modTime >= lastModifiedTime
+} else {
+  false
+}
+  } catch {
+case e: AccessControlException =>
+// Do not use "logInfo" since these messages can get pretty noisy if printed on
+  // every poll.
+  logDebug(s"No permission to read $entry, ignoring.")
--- End diff --

When do we get these?





[GitHub] spark pull request: [SPARK-1953][YARN]yarn client mode Application...

2014-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3607#issuecomment-66400109
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24288/
Test PASSed.





[GitHub] spark pull request: [Minor] Use tag for help icon in web UI ...

2014-12-09 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3659#issuecomment-66400102
  
Yeah, I got lazy while grabbing screencaps.





[GitHub] spark pull request: [SPARK-1953][YARN]yarn client mode Application...

2014-12-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3607#issuecomment-66400106
  
  [Test build #24288 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24288/consoleFull)
 for   PR 3607 at commit 
[`d619996`](https://github.com/apache/spark/commit/d61999620dde6f762d6611b721bd34f31f8c3ab3).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-2261] Make event logger use a single fi...

2014-12-09 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/1222#discussion_r21582645
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -221,33 +238,87 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis
     }
   }
 
-  private def createReplayBus(logDir: FileStatus): (ReplayListenerBus, ApplicationEventListener) = {
-    val path = logDir.getPath()
-    val elogInfo = EventLoggingListener.parseLoggingInfo(path, fs)
-    val replayBus = new ReplayListenerBus(elogInfo.logPaths, fs, elogInfo.compressionCodec)
-    val appListener = new ApplicationEventListener
-    replayBus.addListener(appListener)
-    (replayBus, appListener)
+  /**
+   * Replays the event data in the given log, and returns the application information.
+   */
+  private def replay(logPath: FileStatus, bus: ReplayListenerBus): FsApplicationHistoryInfo = {
--- End diff --

also we use `logPath.getPath` a lot. Maybe we should just define
```
val logPath = eventLog.getPath
```





[GitHub] spark pull request: [Minor] Use <sup> tag for help icon in web UI ...

2014-12-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3659#issuecomment-6633
  
  [Test build #24295 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24295/consoleFull)
 for   PR 3659 at commit 
[`bd72899`](https://github.com/apache/spark/commit/bd7289909719c0cb9d566baed2affa478f359193).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-2261] Make event logger use a single fi...

2014-12-09 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/1222#discussion_r21582609
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -221,33 +238,87 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis
     }
   }
 
-  private def createReplayBus(logDir: FileStatus): (ReplayListenerBus, ApplicationEventListener) = {
-    val path = logDir.getPath()
-    val elogInfo = EventLoggingListener.parseLoggingInfo(path, fs)
-    val replayBus = new ReplayListenerBus(elogInfo.logPaths, fs, elogInfo.compressionCodec)
-    val appListener = new ApplicationEventListener
-    replayBus.addListener(appListener)
-    (replayBus, appListener)
+  /**
+   * Replays the event data in the given log, and returns the application information.
+   */
+  private def replay(logPath: FileStatus, bus: ReplayListenerBus): FsApplicationHistoryInfo = {
--- End diff --

not really `logPath` right? This is more like `eventLog`





[GitHub] spark pull request: [SPARK-2261] Make event logger use a single fi...

2014-12-09 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/1222#discussion_r21582597
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -221,33 +238,87 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis
     }
   }
 
-  private def createReplayBus(logDir: FileStatus): (ReplayListenerBus, ApplicationEventListener) = {
-    val path = logDir.getPath()
-    val elogInfo = EventLoggingListener.parseLoggingInfo(path, fs)
-    val replayBus = new ReplayListenerBus(elogInfo.logPaths, fs, elogInfo.compressionCodec)
-    val appListener = new ApplicationEventListener
-    replayBus.addListener(appListener)
-    (replayBus, appListener)
+  /**
+   * Replays the event data in the given log, and returns the application information.
+   */
+  private def replay(logPath: FileStatus, bus: ReplayListenerBus): FsApplicationHistoryInfo = {
+    val (logInput, sparkVersion) =
+      if (isLegacyLogDirectory(logPath)) {
+        openOldLog(logPath.getPath())
--- End diff --

`openLegacyEventLog`?





[GitHub] spark pull request: [SPARK-4809] Rework Guava library shading.

2014-12-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3658#issuecomment-66399889
  
  [Test build #24285 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24285/consoleFull)
 for   PR 3658 at commit 
[`4a4ed42`](https://github.com/apache/spark/commit/4a4ed4202eac66bc288c8fcb2107b0608cc1e32f).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4494] IDFModel.transform() add support ...

2014-12-09 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/3603#issuecomment-66399885
  
@yu-iskw  Thanks for the PR!  I added some comments but left a question for 
@mengxr
Also, could you please add the [mllib] tag to the PR title?





[GitHub] spark pull request: [SPARK-4809] Rework Guava library shading.

2014-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3658#issuecomment-66399893
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24285/
Test PASSed.





[GitHub] spark pull request: [SPARK-4754] Refactor SparkContext into Execut...

2014-12-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3614#issuecomment-66399823
  
  [Test build #24286 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24286/consoleFull)
 for   PR 3614 at commit 
[`59baf6c`](https://github.com/apache/spark/commit/59baf6c4204b3a583c1d2a98e9abbee446358fda).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationClient `






[GitHub] spark pull request: [SPARK-4754] Refactor SparkContext into Execut...

2014-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3614#issuecomment-66399829
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24286/
Test PASSed.





[GitHub] spark pull request: [SPARK-4494] IDFModel.transform() add support ...

2014-12-09 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3603#discussion_r21582550
  
--- Diff: python/pyspark/mllib/feature.py ---
@@ -212,7 +212,7 @@ class IDFModel(JavaVectorTransformer):
     """
     Represents an IDF model that can transform term frequency vectors.
     """
-    def transform(self, dataset):
+    def transform(self, data):
--- End diff --

This is a public API change, so it's a bit worrisome.  On the other hand, 
it makes sense to change it if this method includes single-vector transforms 
too.  @mengxr  Opinion?





[GitHub] spark pull request: [SPARK-4494] IDFModel.transform() add support ...

2014-12-09 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3603#discussion_r21582552
  
--- Diff: python/pyspark/mllib/feature.py ---
@@ -220,12 +220,15 @@ def transform(self, dataset):
         the terms which occur in fewer than `minDocFreq`
         documents will have an entry of 0.
 
-        :param dataset: an RDD of term frequency vectors
-        :return: an RDD of TF-IDF vectors
+        :param data: an RDD of term frequency vectors or a term frequency vector
+        :return: an RDD of TF-IDF vectors or a TF-IDF vector
         """
-        if not isinstance(dataset, RDD):
+        if isinstance(data, RDD):
+            return JavaVectorTransformer.transform(self, data)
+        elif isinstance(data, Vector):
--- End diff --

It might be good to support native Python vector/array types, as in 
pyspark's LogisticRegressionModel.predict method in classification.py
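The dispatch being suggested can be sketched roughly as follows. This is a self-contained illustration, not pyspark's actual implementation: the `RDD` and `Vector` classes below are minimal stand-ins, and `_convert_to_vector` is a hypothetical helper for accepting native Python list/array input.

```python
import array

class RDD(object):            # stand-in for pyspark.RDD
    def __init__(self, rows):
        self.rows = rows

class Vector(object):         # stand-in for pyspark.mllib.linalg.Vector
    def __init__(self, values):
        self.values = list(values)

def _convert_to_vector(data):
    """Hypothetical helper: accept a Vector, list, tuple, or array.array."""
    if isinstance(data, Vector):
        return data
    if isinstance(data, (list, tuple, array.array)):
        return Vector(data)
    raise TypeError("cannot convert %r to a vector" % (data,))

def transform(idf, data):
    if isinstance(data, RDD):
        # Batch path: map the single-vector transform over the rows.
        return RDD([transform(idf, row) for row in data.rows])
    # Single-vector path, including native Python list/array input.
    vec = _convert_to_vector(data)
    return Vector(v * w for v, w in zip(vec.values, idf))
```

The same `transform` entry point then serves RDDs, Vectors, and plain Python sequences, which is the pattern `LogisticRegressionModel.predict` follows.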





[GitHub] spark pull request: [SPARK-4494] IDFModel.transform() add support ...

2014-12-09 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3603#discussion_r21582540
  
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/IDFSuite.scala ---
@@ -53,6 +53,19 @@ class IDFSuite extends FunSuite with MLlibTestSparkContext {
     val tfidf2 = tfidf(2L).asInstanceOf[SparseVector]
     assert(tfidf2.indices === Array(1))
     assert(tfidf2.values(0) ~== (1.0 * expected(1)) absTol 1e-12)
+
+    // Transforms local vectors
--- End diff --

Since this is the same set of checks as for the batch transform(), could 
they be moved to a helper method to eliminate duplicate code?
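One way to act on this suggestion, sketched here in Python rather than the suite's Scala (the helper name and fixture values are hypothetical):

```python
def check_tfidf_sparse(indices, values, expected, tol=1e-12):
    """Shared assertions for a transformed sparse vector, reusable by
    both the batch-transform test and the local-vector test."""
    assert indices == [1]
    assert abs(values[0] - 1.0 * expected[1]) <= tol

# Each test path calls the helper instead of repeating the assert block:
check_tfidf_sparse([1], [0.575], [0.0, 0.575])
```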





[GitHub] spark pull request: [SPARK-2261] Make event logger use a single fi...

2014-12-09 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/1222#discussion_r21582537
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -221,33 +238,87 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis
     }
   }
 
-  private def createReplayBus(logDir: FileStatus): (ReplayListenerBus, ApplicationEventListener) = {
-    val path = logDir.getPath()
-    val elogInfo = EventLoggingListener.parseLoggingInfo(path, fs)
-    val replayBus = new ReplayListenerBus(elogInfo.logPaths, fs, elogInfo.compressionCodec)
-    val appListener = new ApplicationEventListener
-    replayBus.addListener(appListener)
-    (replayBus, appListener)
+  /**
+   * Replays the event data in the given log, and returns the application information.
--- End diff --

```
Replays the events in the specified log file, and returns information about 
the associated application.
```





[GitHub] spark pull request: [SPARK-4494] IDFModel.transform() add support ...

2014-12-09 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3603#discussion_r21582538
  
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/IDFSuite.scala ---
@@ -17,12 +17,10 @@
 
 package org.apache.spark.mllib.feature
 
-import org.scalatest.FunSuite
-
-import org.apache.spark.SparkContext._
 import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vectors}
 import org.apache.spark.mllib.util.MLlibTestSparkContext
 import org.apache.spark.mllib.util.TestingUtils._
+import org.scalatest.FunSuite
--- End diff --

Organize imports (scala/java, then non-spark, then spark)





[GitHub] spark pull request: [SPARK-4494] IDFModel.transform() add support ...

2014-12-09 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3603#discussion_r21582546
  
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/IDFSuite.scala ---
@@ -86,6 +101,19 @@ class IDFSuite extends FunSuite with MLlibTestSparkContext {
     val tfidf2 = tfidf(2L).asInstanceOf[SparseVector]
     assert(tfidf2.indices === Array(1))
     assert(tfidf2.values(0) ~== (1.0 * expected(1)) absTol 1e-12)
+
+    // Transforms local vectors
--- End diff --

Same here (putting the checks in a helper method to eliminate duplicated 
code)





[GitHub] spark pull request: [SPARK-4494] IDFModel.transform() add support ...

2014-12-09 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3603#discussion_r21582536
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/IDF.scala ---
@@ -174,37 +174,18 @@ class IDFModel private[mllib] (val idf: Vector) extends Serializable {
    */
   def transform(dataset: RDD[Vector]): RDD[Vector] = {
     val bcIdf = dataset.context.broadcast(idf)
-    dataset.mapPartitions { iter =>
-      val thisIdf = bcIdf.value
-      iter.map { v =>
-        val n = v.size
-        v match {
-          case sv: SparseVector =>
-            val nnz = sv.indices.size
-            val newValues = new Array[Double](nnz)
-            var k = 0
-            while (k < nnz) {
-              newValues(k) = sv.values(k) * thisIdf(sv.indices(k))
-              k += 1
-            }
-            Vectors.sparse(n, sv.indices, newValues)
-          case dv: DenseVector =>
-            val newValues = new Array[Double](n)
-            var j = 0
-            while (j < n) {
-              newValues(j) = dv.values(j) * thisIdf(j)
-              j += 1
-            }
-            Vectors.dense(newValues)
-          case other =>
-            throw new UnsupportedOperationException(
-              s"Only sparse and dense vectors are supported but got ${other.getClass}.")
-        }
-      }
-    }
+    dataset.mapPartitions(iter => iter.map(v => IDFModel.transform(bcIdf.value, v)))
   }
 
   /**
+   * Transforms tern frequency (TF) vectors to a TF-IDF vector
--- End diff --

typo: "tern"
also should be: "Transforms a term frequency (TF) vector"
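The shape of the refactor shown in the diff — one shared per-vector transform that the batch path simply maps — can be sketched in plain Python (an illustrative stand-in, not the Scala code; function names here are hypothetical):

```python
def transform_one(idf, vec):
    # TF-IDF for a single term-frequency vector: elementwise TF * IDF.
    return [v * w for v, w in zip(vec, idf)]

def transform_batch(idf, dataset):
    # The batch path just maps the shared per-vector helper, mirroring
    # dataset.mapPartitions(iter => iter.map(v => IDFModel.transform(bcIdf.value, v)))
    return [transform_one(idf, vec) for vec in dataset]
```

Keeping the sparse/dense logic in one helper is what lets the PR expose a public single-vector `transform` without duplicating the loop bodies.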





[GitHub] spark pull request: [Minor] Use <sup> tag for help icon in web UI ...

2014-12-09 Thread aarondav
Github user aarondav commented on the pull request:

https://github.com/apache/spark/pull/3659#issuecomment-66399713
  
The new left-hand side gradient is a bit much, but otherwise this LGTM.





[GitHub] spark pull request: [Minor] Use <sup> tag for help icon in web UI ...

2014-12-09 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3659#issuecomment-66399624
  
/cc @aarondav, who suggested the `<sup>` tag.  I think this should go into 
`branch-1.2`, since that's where this feature was introduced.





[GitHub] spark pull request: Config updates for the new shuffle transport.

2014-12-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3657





[GitHub] spark pull request: [Minor] Use <sup> tag for help icon in web UI ...

2014-12-09 Thread JoshRosen
GitHub user JoshRosen opened a pull request:

https://github.com/apache/spark/pull/3659

[Minor] Use <sup> tag for help icon in web UI page header

This small commit makes the `(?)` web UI help link into a superscript, 
which should address feedback that the current design makes it look like an 
error occurred or like information is missing.

Before:


![image](https://cloud.githubusercontent.com/assets/50748/5370611/a3ed0034-7fd9-11e4-870f-05bd9faad5b9.png)


After:


![image](https://cloud.githubusercontent.com/assets/50748/5370602/6c5ca8d6-7fd9-11e4-8d1a-568d71290aa7.png)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/JoshRosen/spark webui-help-sup

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3659.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3659


commit bd7289909719c0cb9d566baed2affa478f359193
Author: Josh Rosen 
Date:   2014-12-10T03:25:30Z

Use <sup> tag for help icon in web UI page header.







[GitHub] spark pull request: [SPARK-4772] Clear local copies of accumulator...

2014-12-09 Thread nkronenfeld
Github user nkronenfeld commented on the pull request:

https://github.com/apache/spark/pull/3570#issuecomment-66399588
  
sorry, must have accidentally hit cancel instead of comment the first time. 
 Should be set now.





[GitHub] spark pull request: [SPARK-4771][Docs] Document standalone cluster...

2014-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3627#issuecomment-66399541
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24284/
Test PASSed.





[GitHub] spark pull request: [SPARK-4771][Docs] Document standalone cluster...

2014-12-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3627#issuecomment-66399537
  
  [Test build #24284 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24284/consoleFull)
 for   PR 3627 at commit 
[`9ca0908`](https://github.com/apache/spark/commit/9ca0908e632fc7434aea1a98511bd5edd1c744ff).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4161]Spark shell class path is not corr...

2014-12-09 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/3050#issuecomment-66399472
  
In my local test, it works.





[GitHub] spark pull request: [SPARK-2261] Make event logger use a single fi...

2014-12-09 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/1222#discussion_r21582392
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -149,41 +162,45 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis
    * Tries to reuse as much of the data already in memory as possible, by not reading
    * applications that haven't been updated since last time the logs were checked.
    */
-  private def checkForLogs() = {
+  private[history] def checkForLogs() = {
--- End diff --

can you add `: Unit`





[GitHub] spark pull request: Config updates for the new shuffle transport.

2014-12-09 Thread aarondav
Github user aarondav commented on the pull request:

https://github.com/apache/spark/pull/3657#issuecomment-66399456
  
Merging this into master and branch-1.2.





[GitHub] spark pull request: [SPARK-4730][YARN] Warn against deprecated YAR...

2014-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3590#issuecomment-66399349
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24287/
Test PASSed.





[GitHub] spark pull request: [SPARK-4730][YARN] Warn against deprecated YAR...

2014-12-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3590#issuecomment-66399346
  
  [Test build #24287 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24287/consoleFull)
 for   PR 3590 at commit 
[`36e0753`](https://github.com/apache/spark/commit/36e075348b9dcbbea74dcff3d63f94d1f29f2db7).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: Config updates for the new shuffle transport.

2014-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3657#issuecomment-66399245
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24281/
Test PASSed.





[GitHub] spark pull request: Config updates for the new shuffle transport.

2014-12-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3657#issuecomment-66399244
  
  [Test build #24281 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24281/consoleFull)
 for   PR 3657 at commit 
[`7370eab`](https://github.com/apache/spark/commit/7370eaba0fc13505671bc7c60df952bd45e20cbd).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...

2014-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3655#issuecomment-66399109
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24282/
Test PASSed.





[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...

2014-12-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3655#issuecomment-66399105
  
  [Test build #24282 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24282/consoleFull)
 for   PR 3655 at commit 
[`5e2e7ad`](https://github.com/apache/spark/commit/5e2e7ad479d2739c4f1bd62fd1d48b216b2bdce0).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: Update FsHistoryProvider.scala

2014-12-09 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/2546#issuecomment-66399032
  
@397090770, given that a lot of changes have gone in since this was opened, 
I would recommend closing this issue for now until the underlying problem is 
described in a JIRA.





[GitHub] spark pull request: [SPARK-3611] Show number of cores for each exe...

2014-12-09 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/2980#issuecomment-66398809
  
Hey @devldevelopment, given that there is no simple way to support this 
across different cluster managers, I would recommend that we close this issue 
for now. We can reopen it once we figure out a better design.





[GitHub] spark pull request: [SPARK-4771][Docs] Document standalone cluster...

2014-12-09 Thread tdas
Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/3627#issuecomment-66398746
  
LGTM.





[GitHub] spark pull request: [SPARK-4771][Docs] Document standalone cluster...

2014-12-09 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3627#discussion_r21582109
  
--- Diff: docs/spark-standalone.md ---
@@ -272,6 +272,15 @@ should specify them through the `--jars` flag using 
comma as a delimiter (e.g. `
 To control the application's configuration or execution environment, see
 [Spark Configuration](configuration.html).
 
+Additionally, standalone `cluster` mode supports restarting your 
application automatically if it
--- End diff --

nit: my point was also that it's already known we are talking about 
"standalone", so you can just say "cluster mode". But it's okay. 





[GitHub] spark pull request: [SPARK-4806] Streaming doc update for 1.2

2014-12-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3653#issuecomment-66398673
  
  [Test build #24294 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24294/consoleFull)
 for   PR 3653 at commit 
[`17b99fb`](https://github.com/apache/spark/commit/17b99fbaf699c54bf75893b98c66ec5e3fde30ba).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4772] Clear local copies of accumulator...

2014-12-09 Thread nkronenfeld
Github user nkronenfeld commented on the pull request:

https://github.com/apache/spark/pull/3570#issuecomment-66398560
  
I thought I'd done so; it looks like it lost my changes.
I'll fix that ASAP.






[GitHub] spark pull request: [SPARK-4772] Clear local copies of accumulator...

2014-12-09 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3570#issuecomment-66398459
  
LGTM.  Just in case you missed my earlier comment, are you still planning 
to update the PR description to reflect the actual changes vs. the ones you had 
planned?





[GitHub] spark pull request: SPARK-4749: Allow initializing KMeans clusters...

2014-12-09 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/3610#issuecomment-66398376
  
@nxwhite-str  Thanks for the PR!  Could you please update the title to 
start with "[SPARK-4749] [mllib]" to help with automated tagging?





[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-12-09 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/2516#issuecomment-66398367
  
Hey @tigerquoll, for large patches like this we usually require a design doc 
on the JIRA. Especially since this is not high priority, I would recommend 
that we close this issue for now, and maybe open a new one later once there 
is consensus on how we should restructure Spark submit. Thanks for your work 
so far.





[GitHub] spark pull request: SPARK-4749: Allow initializing KMeans clusters...

2014-12-09 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3610#discussion_r21581982
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -43,7 +43,8 @@ class KMeans private (
 private var runs: Int,
 private var initializationMode: String,
 private var initializationSteps: Int,
-private var epsilon: Double) extends Serializable with Logging {
+private var epsilon: Double,
+private var seed: Long = System.nanoTime()) extends Serializable with 
Logging {
--- End diff --

Could you set the default in the one public constructor instead since 
that's where other defaults are set?
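A minimal sketch of what the reviewer is suggesting: leave the private constructor without a default and set the random seed in the public constructor, alongside the other defaults. The parameter values and method names here are illustrative only, not MLlib's actual ones.

```scala
// Hypothetical sketch: the private constructor takes an explicit seed;
// the public no-arg constructor supplies all defaults, including the seed.
class KMeans private (
    private var k: Int,
    private var maxIterations: Int,
    private var seed: Long) extends Serializable {

  // All defaults live in one place: the public constructor.
  def this() = this(2, 20, System.nanoTime())

  // Builder-style setter so callers can fix the seed for reproducibility.
  def setSeed(seed: Long): KMeans = {
    this.seed = seed
    this
  }

  def getSeed: Long = seed
}
```

With this shape, a caller who wants deterministic behavior writes `new KMeans().setSeed(42L)`, while the default remains time-based.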





[GitHub] spark pull request: SPARK-4749: Allow initializing KMeans clusters...

2014-12-09 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3610#discussion_r21581986
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -353,6 +359,31 @@ object KMeans {
   }
 
   /**
+   * Trains a k-means model using the given set of parameters.
+   *
+   * @param data training points stored as `RDD[Array[Double]]`
+   * @param k number of clusters
+   * @param maxIterations max number of iterations
+   * @param runs number of parallel runs, defaults to 1. The best model is 
returned.
+   * @param initializationMode initialization model, either "random" or 
"k-means||" (default).
+   * @param seed seed value for cluster initialization
--- End diff --

In doc: Maybe say "random seed value" instead of "seed value" since I could 
imagine people mistaking "seed" to mean "initial cluster center" at first 
glance.





[GitHub] spark pull request: SPARK-4749: Allow initializing KMeans clusters...

2014-12-09 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3610#discussion_r21581990
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/clustering/KMeansSuite.scala ---
@@ -90,6 +90,27 @@ class KMeansSuite extends FunSuite with 
MLlibTestSparkContext {
 assert(model.clusterCenters.size === 3)
   }
 
+  test("deterministic initilization") {
+// Create a large-ish set of point to cluster
--- End diff --

typo: "point" --> "points"





[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...

2014-12-09 Thread jerryshao
Github user jerryshao commented on the pull request:

https://github.com/apache/spark/pull/3655#issuecomment-66398284
  
Thanks Hari, this seems like a simple solution. BTW, should we make `count = 
3` a configurable parameter? Otherwise LGTM. 

The original idea of introducing a pending queue would probably have made the 
design much more complex because of the synchronization involved.
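The suggestion to make the hard-coded retry count configurable could be sketched like this, assuming a string-keyed configuration map with a fallback default. The property name used below is invented for illustration and is not an actual Spark setting.

```scala
// Illustrative only: read the store-attempt count from configuration
// instead of hard-coding `count = 3`. The key
// "spark.streaming.kafka.storeAttempts" is a made-up name.
def storeAttempts(conf: Map[String, String]): Int =
  conf.get("spark.streaming.kafka.storeAttempts").map(_.toInt).getOrElse(3)
```

Callers who never set the key keep the current behavior (3 attempts), so the change would be backward compatible.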




