[GitHub] spark pull request: [SPARK-4759] Avoid using empty string as defau...

2014-12-08 Thread andrewor14
GitHub user andrewor14 opened a pull request:

https://github.com/apache/spark/pull/3633

[SPARK-4759] Avoid using empty string as default preferred location

See JIRA for reproduction.

Our use of empty string as default preferred location in 
`CoalescedRDDPartition` causes the `TaskSetManager` to schedule the 
corresponding task on host `` (empty string). The intended semantics here, 
however, is that the partition does not have a preferred location, and the TSM 
should schedule the corresponding task accordingly.

I tested this on master and 1.1.
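
A minimal sketch of the idea, assuming the partition's preferred location is modeled as an `Option[String]` (the class and field names below are illustrative, not the exact Spark code):

    // Illustrative only: a coalesced partition whose preferred location is
    // optional rather than defaulting to "".
    case class ExamplePartition(index: Int, preferredLocation: Option[String] = None) {
      // An empty Seq tells the scheduler "no preference", whereas Seq("") would
      // ask it to schedule on a host literally named "" (the empty string).
      def preferredLocations: Seq[String] = preferredLocation.toSeq
    }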

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andrewor14/spark coalesce-preferred-loc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3633.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3633


commit 2f7dfb603c000a204831748f1fbaa53ef52531c8
Author: Andrew Or and...@databricks.com
Date:   2014-12-08T07:53:15Z

Avoid using empty string as default preferred location

This is causing the TaskSetManager to try to schedule certain
tasks on the host  (empty string). The intended semantics here,
however, is that the partition does not have a preferred location,
and the TSM should schedule the corresponding task accordingly.







[GitHub] spark pull request: [SPARK-4759] Avoid using empty string as defau...

2014-12-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3633#issuecomment-66035505
  
  [Test build #24219 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24219/consoleFull) for PR 3633 at commit [`2f7dfb6`](https://github.com/apache/spark/commit/2f7dfb603c000a204831748f1fbaa53ef52531c8).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3154][STREAMING] Replace ConcurrentHash...

2014-12-08 Thread zsxwing
GitHub user zsxwing opened a pull request:

https://github.com/apache/spark/pull/3634

[SPARK-3154][STREAMING] Replace ConcurrentHashMap with mutable.HashMap and 
remove @volatile from 'stopped'

Since `sequenceNumberToProcessor` and `stopped` are both protected by the 
lock `sequenceNumberToProcessor`, `ConcurrentHashMap` and `volatile` are 
unnecessary, so this PR updates them accordingly.
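
A minimal sketch of the locking pattern described above, with a placeholder value type standing in for `TransactionProcessor`: both fields are only touched while synchronized on the map itself, so a plain `mutable.HashMap` and a non-volatile flag are safe.

    import scala.collection.mutable

    class CallbackHandlerSketch {
      // Both fields are protected by synchronizing on `sequenceNumberToProcessor`.
      private val sequenceNumberToProcessor = mutable.HashMap[CharSequence, String]()
      private var stopped = false

      def register(seq: CharSequence, processor: String): Unit =
        sequenceNumberToProcessor.synchronized {
          if (!stopped) sequenceNumberToProcessor(seq) = processor
        }

      def shutdown(): Unit = sequenceNumberToProcessor.synchronized {
        stopped = true
        sequenceNumberToProcessor.clear()
      }
    }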

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zsxwing/spark SPARK-3154

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3634.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3634


commit 0d087ac6ae18ed7766d08dc630aeb12279dbb4e7
Author: zsxwing zsxw...@gmail.com
Date:   2014-12-08T08:02:14Z

Replace ConcurrentHashMap with mutable.HashMap and remove @volatile from 
'stopped'







[GitHub] spark pull request: [SPARK-3154][STREAMING] Replace ConcurrentHash...

2014-12-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3634#issuecomment-66036411
  
  [Test build #24220 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24220/consoleFull) for PR 3634 at commit [`0d087ac`](https://github.com/apache/spark/commit/0d087ac6ae18ed7766d08dc630aeb12279dbb4e7).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4759] Avoid using empty string as defau...

2014-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3633#issuecomment-66040668
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24219/
Test FAILed.





[GitHub] spark pull request: [SPARK-4759] Avoid using empty string as defau...

2014-12-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3633#issuecomment-66040660
  
  [Test build #24219 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24219/consoleFull) for PR 3633 at commit [`2f7dfb6`](https://github.com/apache/spark/commit/2f7dfb603c000a204831748f1fbaa53ef52531c8).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: Add error message when making local dir unsucc...

2014-12-08 Thread XuTingjun
GitHub user XuTingjun opened a pull request:

https://github.com/apache/spark/pull/3635

Add error message when making local dir unsuccessfully



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/XuTingjun/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3635.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3635


commit 1c51a0c78c8477f4aae83ec18212c773aed57701
Author: meiyoula 1039320...@qq.com
Date:   2014-12-08T09:11:09Z

Update DiskBlockManager.scala







[GitHub] spark pull request: Add error message when making local dir unsucc...

2014-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3635#issuecomment-66041481
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-3640] [Streaming] [Kinesis] Allow users...

2014-12-08 Thread aniketbhatnagar
Github user aniketbhatnagar commented on the pull request:

https://github.com/apache/spark/pull/3092#issuecomment-66043882
  
@cfregly, unfortunately, I have been stuck with some other work and haven't 
been able to test this yet. I will find time this week. Sorry for the delay.





[GitHub] spark pull request: [SPARK-3154][STREAMING] Replace ConcurrentHash...

2014-12-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3634#issuecomment-66043962
  
  [Test build #24220 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24220/consoleFull) for PR 3634 at commit [`0d087ac`](https://github.com/apache/spark/commit/0d087ac6ae18ed7766d08dc630aeb12279dbb4e7).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3154][STREAMING] Replace ConcurrentHash...

2014-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3634#issuecomment-66043975
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24220/
Test PASSed.





[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-08 Thread akopich
Github user akopich commented on the pull request:

https://github.com/apache/spark/pull/1269#issuecomment-66109601
  
(1) Users implementing their own regularizers

OK. I'd prefer to make all the regularizer methods `private[mllib]`.

(2) Regular and Robust in the same class

I understand what dynamic polymorphism is. Unfortunately, the getNewTheta() 
methods have different parameters in the robust and non-robust classes.

What's more significant, the user has to know which class the returned instance 
belongs to -- robust or non-robust. Without this knowledge one would have to cast 
the returned parameter (e.g. from type `DocumentParameters` to type 
`RobustDocumentParametrs`) in order to access the `noise` field (see the sketch 
at the end of this message). That's why I see no way to provide the user with a 
single facade class.

And thank you for mentioning visibility -- my fault.

(3) PLSA and RobustPLSA code duplication

Thank you very much for reading the code.

(4) Float vs. Double and linear algebra operations

OK. I'll use `Array[Array[Float]]` then. But you've mentioned it'd be nice 
to extract all the linear algebra code to `mllib/linalg/`. Could you please 
point at the parts of my code implementing linear algebra operations that 
should be moved to `mllib/linalg/`? BTW I'm not sure it's possible, because 
`mllib/linalg/` relies on `trait Matrix` while my code relies on 
`Array[Array[Float]]`.

(5) You've also said `Enumerator` should be private. I can definitely make 
it private and change the `TopicModel.infer()` method so that it consumes 
`RDD[Seq[String]]` instead of `RDD[Documents]` and calls `Enumerator` inside 
the method.

But what if one wants to train ten models one after another (in order to choose 
the best parameters)? Enumeration will be performed 10 times. Isn't that a waste?
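
A minimal sketch of the casting issue described in (2), using hypothetical class shapes (the actual classes live in the PR and may differ):

    // Hypothetical class shapes, only to illustrate the downcast.
    class DocumentParameters(val theta: Array[Float])
    class RobustDocumentParameters(theta: Array[Float], val noise: Array[Float])
      extends DocumentParameters(theta)

    object CastExample {
      def noiseSize(params: DocumentParameters): Int = params match {
        // A caller holding only the base type must downcast (or pattern-match)
        // to reach the robust-only `noise` field.
        case robust: RobustDocumentParameters => robust.noise.length
        case _ => 0
      }
    }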








[GitHub] spark pull request: [SPARK-3154][STREAMING] Replace ConcurrentHash...

2014-12-08 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/3634#issuecomment-66110750
  
LGTM





[GitHub] spark pull request: Add error message when making local dir unsucc...

2014-12-08 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3635#discussion_r21449898
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala ---
@@ -67,11 +67,14 @@ private[spark] class DiskBlockManager(blockManager: 
BlockManager, conf: SparkCon
 if (subDir == null) {
   subDir = subDirs(dirId).synchronized {
 val old = subDirs(dirId)(subDirId)
-if (old != null) {
+if (old != null && old.exists()) {
   old
 } else {
  val newDir = new File(localDirs(dirId), "%02x".format(subDirId))
-  newDir.mkdir()
+  val foundLocalDir = newDir.mkdir()
+   if (!foundLocalDir) {
--- End diff --

Indent has one too many spaces. The message should probably be a warning. 
It says "ignoring this directory", but the directory doesn't seem to be ignored? 
You also changed the semantics of the condition, to replace a value that was a 
non-existent dir. That seems reasonable, but this can replace it with a 
directory that can't be created for some reason. Is this not an exception 
condition?
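
For illustration, one hedged way the failure could be surfaced as a hard error rather than a silently logged message (the method name and message wording below are assumptions, not the final patch):

    import java.io.{File, IOException}

    object LocalDirSketch {
      def getOrCreateSubDir(parent: File, subDirId: Int): File = {
        val newDir = new File(parent, "%02x".format(subDirId))
        // mkdir() returns false both when creation fails and when the directory
        // already exists, so check exists() before treating false as an error.
        if (!newDir.exists() && !newDir.mkdir()) {
          throw new IOException(s"Failed to create local dir in $newDir.")
        }
        newDir
      }
    }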





[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1269#issuecomment-66111941
  
  [Test build #24221 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24221/consoleFull) for PR 1269 at commit [`24b11a5`](https://github.com/apache/spark/commit/24b11a57bdd18bdeb0409000cb836235227e6d25).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1269#issuecomment-66112025
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24221/
Test FAILed.





[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1269#issuecomment-66112020
  
  [Test build #24221 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24221/consoleFull) for PR 1269 at commit [`24b11a5`](https://github.com/apache/spark/commit/24b11a57bdd18bdeb0409000cb836235227e6d25).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1269#issuecomment-66112914
  
  [Test build #24222 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24222/consoleFull) for PR 1269 at commit [`4a4a4f8`](https://github.com/apache/spark/commit/4a4a4f84da1954f585f2474ab3ee06c5b998c990).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1269#issuecomment-66113672
  
  [Test build #24222 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24222/consoleFull) for PR 1269 at commit [`4a4a4f8`](https://github.com/apache/spark/commit/4a4a4f84da1954f585f2474ab3ee06c5b998c990).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1269#issuecomment-66113681
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24222/
Test FAILed.





[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1269#issuecomment-66114412
  
QA tests have started for PR 1269. This patch DID NOT merge cleanly!
View progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24223/consoleFull





[GitHub] spark pull request: [SPARK-3382] GradientDescent convergence toler...

2014-12-08 Thread Lewuathe
GitHub user Lewuathe opened a pull request:

https://github.com/apache/spark/pull/3636

[SPARK-3382] GradientDescent convergence tolerance

GradientDescent can receive a convergence tolerance value. The default value is 
0.0.
When the loss value becomes less than the tolerance set by the user, the 
iteration is terminated.
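
A standalone sketch of the termination rule as described above (the `lossAt` callback stands in for one gradient-descent step; names and structure are illustrative, not MLlib's actual GradientDescent):

    object ConvergenceSketch {
      // Stops as soon as the loss drops below the user-set tolerance; with the
      // default tolerance of 0.0 the loop always runs the full maxIterations.
      def run(lossAt: Int => Double, maxIterations: Int, tolerance: Double = 0.0): Int = {
        var i = 0
        var loss = Double.MaxValue
        while (i < maxIterations && loss >= tolerance) {
          loss = lossAt(i) // stand-in for computing one step and its loss
          i += 1
        }
        i // number of iterations actually performed
      }
    }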

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Lewuathe/spark gd-convergence-tolerance

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3636.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3636


commit 5433f71a3822b0fb16b910f64dc53ede8d539ebe
Author: lewuathe lewua...@me.com
Date:   2014-12-08T13:19:21Z

[SPARK-3382] GradientDescent convergence tolerance







[GitHub] spark pull request: [SPARK-3382] GradientDescent convergence toler...

2014-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3636#issuecomment-66115272
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-4001][MLlib] adding apriori and fp-grow...

2014-12-08 Thread denmoroz
Github user denmoroz commented on the pull request:

https://github.com/apache/spark/pull/2847#issuecomment-66125004
  
Maybe it is better to use RDD[BitSet] as the transactions RDD? Then you can add 
a preprocessor trait and support any transformation from a source RDD to an RDD 
of BitSets, for example a transformation of RDD[Array[String]] to RDD[BitSet].
It seems to me that BitSet is a much better representation of transactions 
than Array[String], Array[Int], or anything else.
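
A rough sketch of the kind of preprocessing being suggested (the trait name and the item-index dictionary are made up for illustration):

    import scala.collection.immutable.BitSet
    import org.apache.spark.rdd.RDD

    // Hypothetical preprocessor: any transformation from a source RDD to
    // transactions encoded as BitSets over a global item index.
    trait TransactionPreprocessor[T] {
      def toBitSets(input: RDD[T]): RDD[BitSet]
    }

    class StringArrayPreprocessor(itemIndex: Map[String, Int])
      extends TransactionPreprocessor[Array[String]] {
      override def toBitSets(input: RDD[Array[String]]): RDD[BitSet] = {
        val index = itemIndex // local copy so the closure does not capture `this`
        // Each transaction becomes the set of indices of its known items.
        input.map(items => BitSet(items.flatMap(item => index.get(item)): _*))
      }
    }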





[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1269#issuecomment-66126236
  
QA results for PR 1269:
- This patch FAILED unit tests.

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24223/consoleFull





[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1269#issuecomment-66126247
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24223/
Test FAILed.





[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-12-08 Thread akopich
Github user akopich commented on the pull request:

https://github.com/apache/spark/pull/1269#issuecomment-66127106
  
@jkbradley, could you please have a look at the logs -- I have no idea why the 
PySpark tests failed. 





[GitHub] spark pull request: [SPARK-4001][MLlib] adding apriori and fp-grow...

2014-12-08 Thread erikerlandson
Github user erikerlandson commented on the pull request:

https://github.com/apache/spark/pull/2847#issuecomment-66129552
  
As long as itemset mining is under consideration, has anybody tried a Spark 
implementation of Logical Itemset Mining:
http://cvit.iiit.ac.in/papers/Chandrashekar2012Logical.pdf






[GitHub] spark pull request: [SPARK-4001][MLlib] adding apriori and fp-grow...

2014-12-08 Thread denmoroz
Github user denmoroz commented on the pull request:

https://github.com/apache/spark/pull/2847#issuecomment-66130537
  
Do you use the SON algorithm for the parallel Apriori implementation?
(http://importantfish.com/limited-pass-algorithms/)





[GitHub] spark pull request: [SPARK-4764] Ensure that files are fetched ato...

2014-12-08 Thread preaudc
Github user preaudc commented on the pull request:

https://github.com/apache/spark/pull/2855#issuecomment-66131633
  
Thanks for the review, @JoshRosen, I've created a new JIRA as requested.





[GitHub] spark pull request: [SPARK-4461][YARN] pass extra java options to ...

2014-12-08 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/3409#issuecomment-66134213
  
It's a matter of what's more obvious to the user who doesn't necessarily read 
the documentation. Adding in `clientmode` hopefully helps the user realize this 
config only does something in yarn-client mode. 





[GitHub] spark pull request: SPARK-4338. [YARN] Ditch yarn-alpha.

2014-12-08 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/3215#issuecomment-66135534
  
Seems like we are pretty close on the RC. I'm good with merging this. 
@andrewor14, any objections at this point?





[GitHub] spark pull request: SPARK-4770. [DOC] [YARN] spark.scheduler.minRe...

2014-12-08 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/3624#issuecomment-66139557
  
+1. Thanks Sandy!





[GitHub] spark pull request: SPARK-4770. [DOC] [YARN] spark.scheduler.minRe...

2014-12-08 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/3624#issuecomment-66140307
  
@pwendell  is it ok to pull this doc change into 1.2?





[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-12-08 Thread ilganeli
Github user ilganeli commented on the pull request:

https://github.com/apache/spark/pull/3518#issuecomment-66144913
  
Hi @JoshRosen - can I please get this run through Jenkins? Thanks!





[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-12-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3518#issuecomment-66145968
  
  [Test build #24224 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24224/consoleFull) for PR 3518 at commit [`ef3dd39`](https://github.com/apache/spark/commit/ef3dd39109aca93e899affef8716655aa7669ce0).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-12-08 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3518#issuecomment-66145560
  
Jenkins, this is ok to test.





[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-12-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3518#issuecomment-66158875
  
  [Test build #24224 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24224/consoleFull) for PR 3518 at commit [`ef3dd39`](https://github.com/apache/spark/commit/ef3dd39109aca93e899affef8716655aa7669ce0).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...

2014-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3518#issuecomment-66158890
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24224/
Test PASSed.





[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...

2014-12-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3574#discussion_r21473307
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala 
---
@@ -1010,7 +1010,10 @@ private[spark] class BlockManager(
   info.synchronized {
 // required ? As of now, this will be invoked only for blocks 
which are ready
 // But in case this changes in future, adding for consistency sake.
-if (!info.waitForReady()) {
+if (blockInfo.get(blockId).isEmpty) {
+  logWarning(s"Block $blockId was already dropped.")
+  return None
+} else if(!info.waitForReady()) {
--- End diff --

Minor style nit: this needs a space after the `if` and before the open 
paren: `if (!info...`.





[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...

2014-12-08 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3574#issuecomment-66162338
  
Jenkins, this is ok to test.





[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...

2014-12-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3574#issuecomment-66163112
  
  [Test build #24225 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24225/consoleFull) for PR 3574 at commit [`55fa4ba`](https://github.com/apache/spark/commit/55fa4ba1e41eb36b1c4f867efbdd35c9b8a4f131).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4298][Core] - The spark-submit cannot r...

2014-12-08 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/3561#issuecomment-66163505
  
@JoshRosen I'm pretty sure we can support the `hdfs://` URI 
model. I'll look and see if, given an `hdfs://` URI, Spark would already have 
some sort of Hadoop `Configuration` object representing the connection made; 
if not, we can always make one.

Also, can you help me understand why the tests failed? I'm seeing:

`[error] (streaming/test:test) sbt.TestsFailedException: Tests unsuccessful`

But that isn't really that helpful and, as with all the talk on the dev 
distro, I'm just wondering whether it's the patch that fails or a timing / 
sync issue (`./dev/run-tests` finishes without fail on my OSX machine).





[GitHub] spark pull request: [SPARK-4616][Core] - SPARK_CONF_DIR is not eff...

2014-12-08 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/3559#issuecomment-66163851
  
@JoshRosen Is there anything else needed for this patch to be pushed in? 
Any feedback / review would be great as well!





[GitHub] spark pull request: Add example that reads a local file, writes to...

2014-12-08 Thread rnowling
Github user rnowling commented on the pull request:

https://github.com/apache/spark/pull/3347#issuecomment-66167709
  
@andrewor14 Could you take a second look when you get a chance?  Thanks!





[GitHub] spark pull request: [SPARK-4298][Core] - The spark-submit cannot r...

2014-12-08 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3561#issuecomment-66168555
  
Hmm, it looks like there's already a JIRA for that particular test's 
flakiness: [SPARK-1600](https://issues.apache.org/jira/browse/SPARK-1600).





[GitHub] spark pull request: [SPARK-4483][SQL]Optimization about reduce mem...

2014-12-08 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/3375#discussion_r21477718
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashOuterJoin.scala
 ---
@@ -68,62 +68,59 @@ case class HashOuterJoin(
   @transient private[this] lazy val DUMMY_LIST = Seq[Row](null)
   @transient private[this] lazy val EMPTY_LIST = Seq.empty[Row]
 
+  @transient private[this] lazy val joinedRow = new JoinedRow()
--- End diff --

I believe that it is working now, but my objection is primarily to having 
mutable state stored inside of the task instead of local to a single execution. 
If we decide to be more clever about sharing task metadata in the future, this 
could break in very subtle ways. Also, the cost of accessing a lazy val is 
almost certainly higher than accessing a local stack variable.
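
A hedged sketch of the distinction being drawn: a `@transient lazy val` is shared by every execution of the operator instance (and each access goes through the lazy-val initialization check), while a value created inside the per-partition code is local to one execution. Class and method names here are illustrative, not the actual HashOuterJoin code.

    class JoinOperatorSketch extends Serializable {
      // Variant being objected to: state shared by every execution of this
      // operator instance on an executor, reached through a lazy-val check.
      @transient private[this] lazy val sharedRow = new StringBuilder

      def executeShared(rows: Iterator[String]): Iterator[String] =
        rows.map(row => { sharedRow.setLength(0); sharedRow.append(row).toString })

      // Preferred variant: the buffer is local to a single execution.
      def executeLocal(rows: Iterator[String]): Iterator[String] = {
        val localRow = new StringBuilder
        rows.map(row => { localRow.setLength(0); localRow.append(row).toString })
      }
    }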





[GitHub] spark pull request: [SPARK-4461][YARN] pass extra java options to ...

2014-12-08 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/3409#discussion_r21478911
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -358,6 +358,21 @@ private[spark] trait ClientBase extends Logging {
   if (libraryPaths.nonEmpty) {
 prefixEnv = Some(Utils.libraryPathEnvPrefix(libraryPaths))
   }
+} else {
+  // Validate and include yarn am specific java options in yarn-client 
mode.
+  val amOptsKey = "spark.yarn.clientmode.am.extraJavaOptions"
+  val amOpts = sparkConf.getOption(amOptsKey)
+  amOpts.map { javaOpts =>
--- End diff --

I'd just simplify this as:

    sparkConf.getOption(amOptsKey).foreach { opts =>
  // validate
  // javaOpts += opts
}

Hint: `map()` is more expensive than `foreach()` in general (because it 
returns something, unlike foreach).





[GitHub] spark pull request: [SPARK-4764] Ensure that files are fetched ato...

2014-12-08 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2855#issuecomment-66173949
  
Thanks for creating the new JIRA.  This looks good to me, so I'm going to 
merge it into `master` and `branch-1.1` for now (I've added a `backport-needed` 
label to the JIRA so that we remember to merge this into `branch-1.2` after the 
1.2.0 vote ends).  Thanks!





[GitHub] spark pull request: [SPARK-4461][YARN] pass extra java options to ...

2014-12-08 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/3409#issuecomment-66174011
  
LGTM aside from minor style issue.





[GitHub] spark pull request: [SPARK-4764] Ensure that files are fetched ato...

2014-12-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/2855





[GitHub] spark pull request: [SPARK-3154][STREAMING] Replace ConcurrentHash...

2014-12-08 Thread harishreedharan
Github user harishreedharan commented on the pull request:

https://github.com/apache/spark/pull/3634#issuecomment-66175119
  
+1. Looks good!





[GitHub] spark pull request: [SPARK-3154][STREAMING] Replace ConcurrentHash...

2014-12-08 Thread aarondav
Github user aarondav commented on a diff in the pull request:

https://github.com/apache/spark/pull/3634#discussion_r21479585
  
--- Diff: 
external/flume-sink/src/main/scala/org/apache/spark/streaming/flume/sink/SparkAvroCallbackHandler.scala
 ---
@@ -47,8 +47,8 @@ private[flume] class SparkAvroCallbackHandler(val 
threads: Int, val channel: Cha
   val transactionExecutorOpt = Option(Executors.newFixedThreadPool(threads,
 new ThreadFactoryBuilder().setDaemon(true)
  .setNameFormat("Spark Sink Processor Thread - %d").build()))
-  private val sequenceNumberToProcessor =
-new ConcurrentHashMap[CharSequence, TransactionProcessor]()
+  // Protected by `sequenceNumberToProcessor`
--- End diff --

Could use the `@GuardedBy("sequenceNumberToProcessor")` javax annotation.
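
A tiny sketch of what that might look like, assuming the JSR-305 `javax.annotation.concurrent.GuardedBy` annotation is on the classpath (the field types are placeholders):

    import javax.annotation.concurrent.GuardedBy
    import scala.collection.mutable

    class AnnotatedHandlerSketch {
      private val sequenceNumberToProcessor = mutable.HashMap[CharSequence, String]()

      // Documents (for readers and static-analysis tools) that this field must
      // only be accessed while synchronized on `sequenceNumberToProcessor`.
      @GuardedBy("sequenceNumberToProcessor")
      private var stopped = false
    }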





[GitHub] spark pull request: [SPARK-3154][STREAMING] Replace ConcurrentHash...

2014-12-08 Thread aarondav
Github user aarondav commented on the pull request:

https://github.com/apache/spark/pull/3634#issuecomment-66175687
  
LGTM too, at your discretion you could replace the comment with the 
annotation or not. Will merge when addressed.





[GitHub] spark pull request: [SPARK-4789] [mllib] Standardize ML Prediction...

2014-12-08 Thread jkbradley
GitHub user jkbradley opened a pull request:

https://github.com/apache/spark/pull/3637

[SPARK-4789] [mllib] Standardize ML Prediction APIs

This is part (1) of the updates from the WIP PR in 
[https://github.com/apache/spark/pull/3427]

Abstract classes for learning algorithms:
* Classifier
* Regressor
* Predictor

Traits for learning algorithms
* ProbabilisticClassificationModel

Concrete classes: learning algorithms
* LinearRegression
* LogisticRegression (updated to use new abstract classes)

Concrete classes: other
* LabeledPoint (adding weight to the old LabeledPoint)

Other updates:
* Modified ParamMap to sort parameters in toString

Test Suites:
* LabeledPointSuite
* LinearRegressionSuite
* LogisticRegressionSuite
* + Java versions of above suites

CC: @mengxr  @etrain  @shivaram 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jkbradley/spark ml-api-part1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3637.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3637


commit de1e3b4c39b42757e56345a6bab2bdeefaa3ca25
Author: Joseph K. Bradley jos...@databricks.com
Date:   2014-11-24T07:18:52Z

Added lots of classes for new ML API:

Abstract classes for learning algorithms:
* Classifier
* Regressor
* Predictor

Traits for learning algorithms
* HasDefaultEstimator
* IterativeEstimator
* IterativeSolver
* ProbabilisticClassificationModel
* WeakLearner

Concrete classes: learning algorithms
* AdaBoost (partly implemented)
* NaiveBayes (rough implementation)
* LinearRegression
* LogisticRegression (updated to use new abstract classes)

Concrete classes: evaluation
* ClassificationEvaluator
* RegressionEvaluator
* PredictionEvaluator

Concrete classes: other
* LabeledPoint (adding weight to the old LabeledPoint)

commit 6551244b96d8f70f1daacd0415318cf81fd5111a
Author: Joseph K. Bradley jos...@databricks.com
Date:   2014-11-24T07:30:31Z

fixed compilation issues, but have not added tests yet

commit 25b643d4b367fea5a3ba1b91564374c2b1b7a0f1
Author: Joseph K. Bradley jos...@databricks.com
Date:   2014-12-01T18:31:41Z

removing everything except for simple class hierarchy for classification

commit e61e2738dcb2494be25cec2bd798c3e6e5156b73
Author: Joseph K. Bradley jos...@databricks.com
Date:   2014-12-04T21:37:29Z

Added LinearRegression and Regressor back from ml-api branch

commit 272e62fb41fc8778f3a13f812d4262d9558a772b
Author: Joseph K. Bradley jos...@databricks.com
Date:   2014-12-05T00:11:02Z

Modified ParamMap to sort parameters in toString.  Cleaned up classes in 
class hierarchy, before implementing tests and examples.

commit cc13d61f2a277b101f7422af240afa64dfb10236
Author: Joseph K. Bradley jos...@databricks.com
Date:   2014-12-05T01:11:22Z

Fixed bug from last commit (sorting paramMap by parameter names in 
toString).  Fixed bug in persisting logreg data.  Added threshold_internal to 
logreg for faster test-time prediction (avoiding map lookup).

commit 09fb85fb7502a64a661c5f8ae4c941971ff861c8
Author: Joseph K. Bradley jos...@databricks.com
Date:   2014-12-05T18:22:10Z

Fixed issue with logreg threshold being set correctly

commit a0faf022792524c5a33a20d7cb591a91a7ac160b
Author: Joseph K. Bradley jos...@databricks.com
Date:   2014-12-05T18:43:14Z

Updated docs.  Added LabeledPointSuite to spark.ml

commit 3e961cb6616906940fd646639f818c58d29c04f6
Author: Joseph K. Bradley jos...@databricks.com
Date:   2014-12-05T23:15:48Z

* Changed semantics of Predictor.train() to merge the given paramMap with 
the embedded paramMap.
* remove threshold_internal from logreg
* Added Predictor.copy()
* Extended LogisticRegressionSuite

commit 8922966757e7b5d7588613f5dfc11cee267de1b4
Author: Joseph K. Bradley jos...@databricks.com
Date:   2014-12-06T01:32:14Z

added train() to Predictor subclasses which does not take a ParamMap.

commit 0c45756e3614c027d662d70dfa11d736690dc837
Author: Joseph K. Bradley jos...@databricks.com
Date:   2014-12-06T03:57:12Z

* fixed LinearRegression train() to use embedded paramMap
* added Predictor.predict(RDD[Vector]) method
* updated Linear/LogisticRegressionSuites

commit 6be36c16484478bdb9d847fd343d6b7319759b21
Author: Joseph K. Bradley jos...@databricks.com
Date:   2014-12-06T06:18:30Z

Added JavaLabeledPointSuite.java for spark.ml, and added constructor to 
LabeledPoint which defaults weight to 1.0

commit d8eaf7099a9be6157f90b11f82917ca5b604e1bd
Author: Joseph K. Bradley jos...@databricks.com
Date:   2014-12-08T19:09:03Z

Added methods:
* Classifier: batch predictRaw()
   

[GitHub] spark pull request: [MLLIB] [WIP] [SPARK-3702] Standardizing abstr...

2014-12-08 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/3427#issuecomment-66177125
  
I just submitted the first part of this PR: 
[https://github.com/apache/spark/pull/3637/files]





[GitHub] spark pull request: [SPARK-4789] [mllib] Standardize ML Prediction...

2014-12-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3637#issuecomment-66177658
  
  [Test build #24226 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24226/consoleFull)
 for   PR 3637 at commit 
[`1e46094`](https://github.com/apache/spark/commit/1e46094fbf2534ff022cb843a811b3fbd7fb9d64).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4789] [mllib] Standardize ML Prediction...

2014-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3637#issuecomment-66177711
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24226/
Test FAILed.





[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...

2014-12-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3574#issuecomment-66177706
  
  [Test build #24225 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24225/consoleFull)
 for   PR 3574 at commit 
[`55fa4ba`](https://github.com/apache/spark/commit/55fa4ba1e41eb36b1c4f867efbdd35c9b8a4f131).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4789] [mllib] Standardize ML Prediction...

2014-12-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3637#issuecomment-66177709
  
  [Test build #24226 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24226/consoleFull)
 for   PR 3637 at commit 
[`1e46094`](https://github.com/apache/spark/commit/1e46094fbf2534ff022cb843a811b3fbd7fb9d64).
 * This patch **fails RAT tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class LabeledPoint(label: Double, features: Vector, weight: 
Double) `






[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...

2014-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3574#issuecomment-66177717
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24225/
Test PASSed.





[GitHub] spark pull request: [SPARK-3382] GradientDescent convergence toler...

2014-12-08 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3636#discussion_r21480864
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala 
---
@@ -27,6 +27,8 @@ import org.apache.spark.rdd.RDD
 import org.apache.spark.mllib.linalg.{Vectors, Vector}
 import org.apache.spark.mllib.rdd.RDDFunctions._
 
+import scala.util.control.Breaks
--- End diff --

Please organize imports (Scala/Java, then non-Spark imports, then Spark)
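For reference, the requested grouping for this file would look roughly like this (the breeze and annotation imports are assumptions based on the `BDV` alias and annotations used elsewhere in the file; only the grouping matters):

    import scala.util.control.Breaks

    import breeze.linalg.{DenseVector => BDV}

    import org.apache.spark.annotation.DeveloperApi
    import org.apache.spark.mllib.linalg.{Vector, Vectors}
    import org.apache.spark.mllib.rdd.RDDFunctions._
    import org.apache.spark.rdd.RDD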





[GitHub] spark pull request: [SPARK-3382] GradientDescent convergence toler...

2014-12-08 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3636#discussion_r21480867
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala 
---
@@ -39,6 +41,7 @@ class GradientDescent private[mllib] (private var 
gradient: Gradient, private va
   private var numIterations: Int = 100
   private var regParam: Double = 0.0
   private var miniBatchFraction: Double = 1.0
+  private var convergenceTolerance: Double = 0.0
--- End diff --

I feel like the default should be > 0.0.  Something small like 0.001 (a 
value pulled from libsvm 
[https://github.com/cjlin1/libsvm/blob/master/python/svm.py]) might be 
reasonable.  Basically, I think that convergence tolerance is generally a 
better stopping criterion than numIterations, and having it > 0.0 will give it 
a chance of taking effect before numIterations.





[GitHub] spark pull request: [SPARK-3382] GradientDescent convergence toler...

2014-12-08 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3636#discussion_r21480907
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala 
---
@@ -182,34 +195,38 @@ object GradientDescent extends Logging {
     var regVal = updater.compute(
       weights, Vectors.dense(new Array[Double](weights.size)), 0, 1, regParam)._2
 
-    for (i <- 1 to numIterations) {
-      val bcWeights = data.context.broadcast(weights)
-      // Sample a subset (fraction miniBatchFraction) of the total data
-      // compute and sum up the subgradients on this subset (this is one map-reduce)
-      val (gradientSum, lossSum, miniBatchSize) = data.sample(false, miniBatchFraction, 42 + i)
-        .treeAggregate((BDV.zeros[Double](n), 0.0, 0L))(
-          seqOp = (c, v) => {
-            // c: (grad, loss, count), v: (label, features)
-            val l = gradient.compute(v._2, v._1, bcWeights.value, Vectors.fromBreeze(c._1))
-            (c._1, c._2 + l, c._3 + 1)
-          },
-          combOp = (c1, c2) => {
-            // c: (grad, loss, count)
-            (c1._1 += c2._1, c1._2 + c2._2, c1._3 + c2._3)
-          })
-
-      if (miniBatchSize > 0) {
-        /**
-         * NOTE(Xinghao): lossSum is computed using the weights from the previous iteration
-         * and regVal is the regularization value computed in the previous iteration as well.
-         */
-        stochasticLossHistory.append(lossSum / miniBatchSize + regVal)
-        val update = updater.compute(
-          weights, Vectors.fromBreeze(gradientSum / miniBatchSize.toDouble), stepSize, i, regParam)
-        weights = update._1
-        regVal = update._2
-      } else {
-        logWarning(s"Iteration ($i/$numIterations). The size of sampled batch is zero")
+    val b = new Breaks
+    b.breakable {
+      for (i <- 1 to numIterations) {
+        val bcWeights = data.context.broadcast(weights)
+        // Sample a subset (fraction miniBatchFraction) of the total data
+        // compute and sum up the subgradients on this subset (this is one map-reduce)
+        val (gradientSum, lossSum, miniBatchSize) = data.sample(false, miniBatchFraction, 42 + i)
+          .treeAggregate((BDV.zeros[Double](n), 0.0, 0L))(
+            seqOp = (c, v) => {
+              // c: (grad, loss, count), v: (label, features)
+              val l = gradient.compute(v._2, v._1, bcWeights.value, Vectors.fromBreeze(c._1))
+              (c._1, c._2 + l, c._3 + 1)
+            },
+            combOp = (c1, c2) => {
+              // c: (grad, loss, count)
+              (c1._1 += c2._1, c1._2 + c2._2, c1._3 + c2._3)
+            })
+
+        if (miniBatchSize > 0) {
+          /**
+           * NOTE(Xinghao): lossSum is computed using the weights from the previous iteration
+           * and regVal is the regularization value computed in the previous iteration as well.
+           */
+          stochasticLossHistory.append(lossSum / miniBatchSize + regVal)
+          val update = updater.compute(
+            weights, Vectors.fromBreeze(gradientSum / miniBatchSize.toDouble), stepSize, i, regParam)
+          weights = update._1
+          regVal = update._2
+          if (stochasticLossHistory.last < convergenceTolerance) b.break
--- End diff --

This is comparing convergenceTolerance with the objective from the last 
iteration.  It should compare with the absolute value of the difference between 
the objective from the last iteration and the objective from the iteration 
before that.
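A minimal sketch of the intended check, assuming `stochasticLossHistory` gets the objective appended on every iteration and `b` is the `Breaks` instance from the diff above:

    if (stochasticLossHistory.length > 1) {
      val currentLoss = stochasticLossHistory(stochasticLossHistory.length - 1)
      val previousLoss = stochasticLossHistory(stochasticLossHistory.length - 2)
      // Break on the change in objective between consecutive iterations,
      // not on the objective value itself.
      if (math.abs(currentLoss - previousLoss) < convergenceTolerance) b.break
    }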





[GitHub] spark pull request: [SPARK-3382] GradientDescent convergence toler...

2014-12-08 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3636#discussion_r21480898
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala 
---
@@ -77,6 +80,14 @@ class GradientDescent private[mllib] (private var 
gradient: Gradient, private va
   }
 
   /**
+   * Set the convergence tolerance. Default 0.0
--- End diff --

It would be good to note what convergence tolerance is.  In particular, can 
you please note that it is compared with the change in the objective between 
consecutive iterations?
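For example, something along these lines (wording and setter name are only a suggestion):

    /**
     * Set the convergence tolerance. Default 0.0 (disabled).
     * The optimizer stops early once the change in the objective (loss + regularization)
     * between two consecutive iterations falls below this value.
     */
    def setConvergenceTolerance(tolerance: Double): this.type = {
      this.convergenceTolerance = tolerance
      this
    }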





[GitHub] spark pull request: [SPARK-3382] GradientDescent convergence toler...

2014-12-08 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3636#discussion_r21480909
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala 
---
@@ -219,4 +236,17 @@ object GradientDescent extends Logging {
 (weights, stochasticLossHistory.toArray)
 
   }
+
+  def runMiniBatchSGD(
--- End diff --

It is odd to have an API with 2 different argument orders.  Can this please 
be fixed in 1 of these 2 ways:
(1) Keep the old argument order, and have convergenceTolerance come after 
initialWeights.
(2) Remove this old method call completely, and update the code base where 
relevant.
I vote for (1) for consistency.
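In other words, option (1) would keep the existing signature and only append the new parameter; a sketch (parameter list reconstructed from the current method, so treat the details as assumptions):

    def runMiniBatchSGD(
        data: RDD[(Double, Vector)],
        gradient: Gradient,
        updater: Updater,
        stepSize: Double,
        numIterations: Int,
        regParam: Double,
        miniBatchFraction: Double,
        initialWeights: Vector,
        convergenceTolerance: Double): (Vector, Array[Double]) = ???  // body as before, plus the new stopping criterion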





[GitHub] spark pull request: [SPARK-3382] GradientDescent convergence toler...

2014-12-08 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/3636#issuecomment-66178359
  
@Lewuathe Thanks for the PR!  I added some inline comments.  One more 
general comment: when using subsampling (miniBatchFraction < 1.0), testing 
against a convergenceTolerance can be dangerous because of the stochasticity.  
It would be good to add a check at the beginning of optimization to see whether 
miniBatchFraction < 1.0 && convergenceTolerance > 0.0.  If that is the case, 
then we should print a warning.
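Something like the following at the top of runMiniBatchSGD would be enough (a sketch, not the PR's code):

    if (miniBatchFraction < 1.0 && convergenceTolerance > 0.0) {
      logWarning("Testing against a convergence tolerance when using miniBatchFraction < 1.0 " +
        "can be unstable because the objective is only estimated on a random subsample.")
    }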

Let me know when I should make another pass over the PR.





[GitHub] spark pull request: [SPARK-4789] [mllib] Standardize ML Prediction...

2014-12-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3637#issuecomment-66179960
  
  [Test build #24227 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24227/consoleFull)
 for   PR 3637 at commit 
[`83109eb`](https://github.com/apache/spark/commit/83109ebef2fca4b6d28a83bf405c2edf1e5075db).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4737] Task set manager properly handles...

2014-12-08 Thread mccheah
GitHub user mccheah opened a pull request:

https://github.com/apache/spark/pull/3638

[SPARK-4737] Task set manager properly handles serialization errors

Dealing with [SPARK-4737], the handling of serialization errors should not 
be the DAGScheduler's responsibility. The task set manager now catches the 
error and aborts the stage.

If the TaskSetManager throws a TaskNotSerializableException, the 
TaskSchedulerImpl will return an empty list of task descriptions, because no 
tasks were started. The scheduler should abort the stage gracefully.

Note that I'm not too familiar with this part of the codebase and its place 
in the overall architecture of the Spark stack. If implementing it this way 
would have any adverse side effects, please voice that loudly.
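To make the intent concrete, the flow described above is roughly the following sketch (names such as `ser`, `task` and `taskId` are placeholders; `TaskNotSerializableException` and the stage abort are what this PR introduces):

    import scala.util.control.NonFatal

    // Inside the task set manager, while building a task description:
    val serializedTask =
      try {
        ser.serialize(task)
      } catch {
        case NonFatal(e) =>
          // Abort the stage instead of letting the exception crash the scheduler.
          abort(s"Failed to serialize task $taskId, not attempting to retry it: $e")
          throw new TaskNotSerializableException(e)
      }
    // TaskSchedulerImpl catches TaskNotSerializableException and returns an empty
    // list of task descriptions for the offer, since no tasks were launched.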

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mccheah/spark 
task-set-manager-properly-handle-ser-err

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3638.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3638


commit 097e7a21e15d3adf45687bd58ff095088f0282f7
Author: mcheah mch...@palantir.com
Date:   2014-12-06T01:45:41Z

[SPARK-4737] Catching task serialization exception in TaskSetManager

Our previous attempt at handling un-serializable tasks involved
selectively sampling a task from a task set, and attempting to serialize
it. If the serialization was successful, we assumed that all tasks in
the task set would also be serializable.

Unfortunately, this is not always the case. For example,
ParallelCollectionRDD may have both empty and non-empty partitions, and
the empty partitions would be serializable while the non-empty
partitions actually contain non-serializable objects. This is one of
many examples where sampling task serialization breaks.

When task serialization exceptions occurred in the TaskSchedulerImpl and
TaskSetManager, the result was that the exception was not caught and the
entire scheduler would crash. It would restart, but in a bad state.

There's no reason why the stage should not be aborted if any
serialization error occurs when submitting a task set. If any task in a
task set throws an exception upon serialization, the task set manager
informs the DAGScheduler that the stage failed and aborts the stage. The
TaskSchedulerImpl needs to return a set of task descriptions that were
successfully submitted, but the set will be empty in the case of a
serialization error.

commit bf5e706918d92c761fa537a88bc15ec2c4cc7838
Author: mcheah mch...@palantir.com
Date:   2014-12-08T20:39:45Z

Fixing indentation.







[GitHub] spark pull request: [SPARK-4737] Task set manager properly handles...

2014-12-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3638#issuecomment-66184682
  
  [Test build #24228 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24228/consoleFull)
 for   PR 3638 at commit 
[`bf5e706`](https://github.com/apache/spark/commit/bf5e706918d92c761fa537a88bc15ec2c4cc7838).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4737] Task set manager properly handles...

2014-12-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3638#issuecomment-66184722
  
  [Test build #24228 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24228/consoleFull)
 for   PR 3638 at commit 
[`bf5e706`](https://github.com/apache/spark/commit/bf5e706918d92c761fa537a88bc15ec2c4cc7838).
 * This patch **fails RAT tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class TaskNotSerializableException(error: Throwable) extends 
Exception(error)`






[GitHub] spark pull request: [SPARK-4737] Task set manager properly handles...

2014-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3638#issuecomment-66184723
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24228/
Test FAILed.





[GitHub] spark pull request: [SPARK-4737] Task set manager properly handles...

2014-12-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3638#issuecomment-66186961
  
  [Test build #24229 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24229/consoleFull)
 for   PR 3638 at commit 
[`5f486f4`](https://github.com/apache/spark/commit/5f486f462233ae63987aa483e6d6eab342feef96).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4737] Task set manager properly handles...

2014-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3638#issuecomment-66187144
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24229/
Test FAILed.





[GitHub] spark pull request: [SPARK-4737] Task set manager properly handles...

2014-12-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3638#issuecomment-66187140
  
  [Test build #24229 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24229/consoleFull)
 for   PR 3638 at commit 
[`5f486f4`](https://github.com/apache/spark/commit/5f486f462233ae63987aa483e6d6eab342feef96).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class TaskNotSerializableException(error: Throwable) extends 
Exception(error)`






[GitHub] spark pull request: [SPARK-4737] Task set manager properly handles...

2014-12-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3638#issuecomment-66188975
  
  [Test build #24230 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24230/consoleFull)
 for   PR 3638 at commit 
[`94844d7`](https://github.com/apache/spark/commit/94844d736ed0d8322e2e0dda762961a9170d6a1d).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4789] [mllib] Standardize ML Prediction...

2014-12-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3637#issuecomment-6615
  
  [Test build #24227 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24227/consoleFull)
 for   PR 3637 at commit 
[`83109eb`](https://github.com/apache/spark/commit/83109ebef2fca4b6d28a83bf405c2edf1e5075db).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class LabeledPoint(label: Double, features: Vector, weight: 
Double) `






[GitHub] spark pull request: [SPARK-4789] [mllib] Standardize ML Prediction...

2014-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3637#issuecomment-66188897
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24227/
Test FAILed.





[GitHub] spark pull request: SPARK-3996: Shade Jetty in Spark deliverables.

2014-12-08 Thread mccheah
Github user mccheah commented on the pull request:

https://github.com/apache/spark/pull/3130#issuecomment-66189849
  
Wanted to follow up on this - the priority of getting this done was just 
increased for us.





[GitHub] spark pull request: [SPARK-4759] Avoid using empty string as defau...

2014-12-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3633#issuecomment-66192578
  
  [Test build #24231 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24231/consoleFull)
 for   PR 3633 at commit 
[`f370a4e`](https://github.com/apache/spark/commit/f370a4e710b1ff29a5749944a1557de233223dc6).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-12-08 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/1379#issuecomment-66192930
  
@avulanov I did a couple of performance tunings in the MLOR gradient calculation 
in my company's proprietary implementation, which runs 4x faster than the 
open-source one on GitHub that you tested. I'm trying to make it open source and 
merge it into Spark soon. (P.S. Simple polynomial expansion with MLOR can increase 
the mnist8m accuracy from 86% to 94% in my experiment. See Prof. CJ Lin's talk: 
https://www.youtube.com/watch?v=GCIJP0cLSmU )





[GitHub] spark pull request: [WIP] SPARK-2450 Adds exeuctor log links to We...

2014-12-08 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/3486#discussion_r21489595
  
--- Diff: 
core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
 ---
@@ -50,10 +50,16 @@ private[spark] class CoarseGrainedExecutorBackend(
   override def preStart() {
     logInfo("Connecting to driver: " + driverUrl)
     driver = context.actorSelection(driverUrl)
-    driver ! RegisterExecutor(executorId, hostPort, cores)
+    driver ! RegisterExecutor(executorId, hostPort, cores, extractLogUrls)
     context.system.eventStream.subscribe(self, classOf[RemotingLifecycleEvent])
   }
 
+  def extractLogUrls : Map[String, String] = {
+    val prefix = "SPARK_LOG_URL_"
--- End diff --

On a related note, I added proper command line parsing to 
CoarseGrainedExecutorBackend over in #3233, which could be a nicer alternative 
to env variables.
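For what it's worth, an env-variable based `extractLogUrls` can stay very small; a sketch, assuming the log URLs are exported as `SPARK_LOG_URL_*` variables as the prefix above suggests:

    def extractLogUrls: Map[String, String] = {
      val prefix = "SPARK_LOG_URL_"
      sys.env.filterKeys(_.startsWith(prefix))
        .map { case (k, v) => (k.substring(prefix.length).toLowerCase, v) }
    }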





[GitHub] spark pull request: [WIP] SPARK-2450 Adds exeuctor log links to We...

2014-12-08 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/3486#discussion_r21490039
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala ---
@@ -183,6 +193,16 @@ trait SparkListener {
* Called when the driver receives task metrics from an executor in a 
heartbeat.
*/
   def onExecutorMetricsUpdate(executorMetricsUpdate: 
SparkListenerExecutorMetricsUpdate) { }
+
+  /**
+   * Called when the driver registers a new executor.
+   */
+  def onExecutorAdded(executorAdded: SparkListenerExecutorAdded) { }
--- End diff --

Hmmm. This is going to be one of those cases where it breaks existing code 
that extends this class. Not sure if there's a good workaround (even though it 
is marked as `@DeveloperApi`). :-/





[GitHub] spark pull request: [WIP] SPARK-2450 Adds exeuctor log links to We...

2014-12-08 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/3486#discussion_r21490459
  
--- Diff: core/src/main/scala/org/apache/spark/ui/exec/ExecutorsPage.scala 
---
@@ -79,6 +80,7 @@ private[ui] class ExecutorsPage(
           Shuffle Write
         </span>
       </th>
+      <th class="sorttable_nosort">Logs</th>
--- End diff --

Should this be conditioned on whether logs actually exist?





[GitHub] spark pull request: Cdh5

2014-12-08 Thread orenmazor
GitHub user orenmazor opened a pull request:

https://github.com/apache/spark/pull/3639

Cdh5

https://github.com/Shopify/dataops/issues/2

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Shopify/spark cdh5

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3639.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3639


commit 422de4cc2a823e16b86fd22095e35d1ebe842a12
Author: Harry Brundage harry.brund...@gmail.com
Date:   2014-01-15T01:29:43Z

Add compile script for packserv

commit 4ffa04cc6cc7bb8086a422a94d4f2e4105a69786
Author: Harry Brundage harry.brund...@gmail.com
Date:   2014-01-15T02:18:29Z

Don't compile streaming when assembling cause it doesn't build against 
CDH4.4.0

commit b7bf08171e8eb796d86408ce5712175d781e0f8d
Author: Harry Brundage harry.brund...@gmail.com
Date:   2014-01-15T02:22:10Z

Make script compile executable

commit 65033e665c75f4e82b56c8113c99308f8b419704
Author: Harry Brundage harry.brund...@gmail.com
Date:   2014-01-15T02:22:57Z

Make script compile bash

commit 9e6fc96f461864f4ffdd6c8aefaa53b6fd8c4ae0
Author: Mark Cooper mcoo...@quantcast.com
Date:   2013-11-20T22:26:42Z

Add a environment variable that allows for configuring a different path to 
Spark binaries when running Spark from a different location locally

commit fdb0ce298048832f75b24b464fdf59fb791f869f
Author: Harry Brundage harry.brund...@gmail.com
Date:   2014-01-15T19:24:05Z

Add fixed conf file with proper master and remote spark home

commit a837356d7d84641ab504522e74cedc4b5d865aa3
Author: Harry Brundage harry.brund...@gmail.com
Date:   2014-01-21T01:50:22Z

Copy in hadoop core-site.xml so local clones know where to find hdfs.

commit 4d0f3682e0931c21ba6e5b01fc42ee33a44453e1
Author: Harry Brundage harry.brund...@gmail.com
Date:   2014-01-21T02:12:36Z

Update the given spark env to actually work, and only if a custom master 
isn't provided.

commit 01cf4c51f2c3c3089ee91dd64d6cab32dd17aa70
Author: Harry Brundage harry.brund...@gmail.com
Date:   2014-01-21T04:22:11Z

Allow controlling the number of cores pyspark uses using the `-c` option, 
like spark-shell.
 
 - Turns out there isn't actually a way right now to control the number of 
cores an interactive pyspark session uses, which is annoying if more than one 
person is trying to work on a cluster interactively at once. 
 - Use the python 2.7 stdlib argparse library to pull out the -c option
 - This requires changing the bin/pyspark shell script to pass all 
arguments to the python script instead of allowing the python interpreter 
program to parse any of them.

commit 91ddfb4c43a88a4cf0082e445e2e82bcde069969
Author: Harry Brundage harry.brund...@gmail.com
Date:   2014-01-21T04:22:34Z

Merge branch 'pyspark_cores'

commit 0b44511492131b60f744527eee467fd147e4f4c0
Author: Harry Brundage harry.brund...@gmail.com
Date:   2014-01-21T14:51:05Z

Revert Merge branch 'pyspark_cores'

This reverts commit 91ddfb4c43a88a4cf0082e445e2e82bcde069969, reversing
changes made to 4d0f3682e0931c21ba6e5b01fc42ee33a44453e1.

commit b4c5ff7e7d6d550743e3aa97710fa514744b0c6e
Author: Harry Brundage harry.brund...@gmail.com
Date:   2014-01-21T15:22:09Z

Auto setup python and warn if the vpn isn't connected

commit 4c2c45eaf14197b79cf5949bb370a74c52a38ff0
Author: Harry Brundage harry.brund...@gmail.com
Date:   2014-01-21T15:34:05Z

Add an applescript to the spark conf file that autoconnects the VPN if it 
can't find the interface the VPN should create

commit 712b8856e4b14f88d34da569505c59884d8e8155
Author: Harry Brundage harry.brund...@gmail.com
Date:   2014-01-21T19:24:21Z

Check to see if Viscosity is a thing before trying to tell it to connect in 
spark env setup

commit 986a60c0b9a880ee0fd7242e53458efbebfab73e
Author: Harry Brundage harry.brund...@gmail.com
Date:   2014-01-21T21:21:18Z

Merge pull request #1 from Shopify/autoconnect_vpn

Autoconnect VPN

commit b944fc6ff5f5866b93e937f7d7629370c24944f0
Author: Dana Klassen klassen.d...@gmail.com
Date:   2014-01-23T01:36:53Z

change configuration to be set through environment variable

commit 5eace91604360da5b446a96582f141b09ab109c1
Author: Erik Selin erik.se...@jadedpixel.com
Date:   2014-01-23T04:14:17Z

apply pr 494 and 496

commit 42dc1708daec21a3ba302f61f473afa57fb5c12c
Author: Dana Klassen klassen.d...@gmail.com
Date:   2014-01-23T12:15:27Z

Merge pull request #2 from Shopify/config_hdfs

Config hdfs

commit 2b6c170b50f58ccdbe1e2faaf4ff3439bdf9e01e
Author: Erik Selin tyr...@gmail.com
Date:   2014-01-23T15:28:32Z

Merge pull request #3 from Shopify/apply_494_and_496

apply pr 494 and 496

commit 25c5a0d90c5926133b32e43f5e6a8d1a58c0685c
Author: Patrick Wendell pwend...@gmail.com

[GitHub] spark pull request: Cdh5

2014-12-08 Thread orenmazor
Github user orenmazor closed the pull request at:

https://github.com/apache/spark/pull/3639





[GitHub] spark pull request: [WIP] SPARK-2450 Adds exeuctor log links to We...

2014-12-08 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/3486#discussion_r21490919
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala ---
@@ -183,6 +193,16 @@ trait SparkListener {
* Called when the driver receives task metrics from an executor in a 
heartbeat.
*/
   def onExecutorMetricsUpdate(executorMetricsUpdate: 
SparkListenerExecutorMetricsUpdate) { }
+
+  /**
+   * Called when the driver registers a new executor.
+   */
+  def onExecutorAdded(executorAdded: SparkListenerExecutorAdded) { }
--- End diff --

BTW doesn't this break the build? There are a few listeners in Spark code 
itself (e.g. `EventLoggingListener`) which should have broken because of this.

(BTW fixing that listener means you'll probably need to touch 
`JsonProtocol` to serialize these new events to the event log... and you'll 
need to be careful not to keep the log URLs in the replayed UIs since they'll 
most probably be broken links at that point. Meaning that probably the UI 
listener should nuke the log URLs when the executor removed message is 
handled.)





[GitHub] spark pull request: [WIP] SPARK-2450 Adds exeuctor log links to We...

2014-12-08 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/3486#discussion_r21491005
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala ---
@@ -183,6 +193,16 @@ trait SparkListener {
* Called when the driver receives task metrics from an executor in a 
heartbeat.
*/
   def onExecutorMetricsUpdate(executorMetricsUpdate: 
SparkListenerExecutorMetricsUpdate) { }
+
+  /**
+   * Called when the driver registers a new executor.
+   */
+  def onExecutorAdded(executorAdded: SparkListenerExecutorAdded) { }
--- End diff --

Ah wait. I see. These methods have default implementations, so they'll only 
affect people extending `SparkListener` from Java. Still, we should probably 
save these events to the log for replay later.
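In other words, existing Scala listeners keep compiling because the new callback defaults to a no-op; only code that cares about the event needs to override it (illustrative only):

    class MyListener extends SparkListener {
      // Other callbacks fall back to the empty default implementations in the trait.
      override def onExecutorAdded(executorAdded: SparkListenerExecutorAdded): Unit = {
        println(s"Executor added: $executorAdded")
      }
    }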





[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...

2014-12-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3574#discussion_r21491375
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala 
---
@@ -1010,7 +1010,10 @@ private[spark] class BlockManager(
   info.synchronized {
 // required ? As of now, this will be invoked only for blocks 
which are ready
--- End diff --

This comment actually refers to the `!info.waitForReady()` case, so I'd 
like to either move the comment or swap the order of these checks so that we 
check for `blockInfo.get(blockId).isEmpty` in the `else if` clause instead.
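Concretely, the suggestion is something along these lines (a sketch only; the surrounding method is abbreviated):

    info.synchronized {
      if (!info.waitForReady()) {
        // required? As of now, this will be invoked only for blocks which are ready
        logWarning(s"Block $blockId was marked as failure. Nothing to drop")
      } else if (blockInfo.get(blockId).isEmpty) {
        logWarning(s"Block $blockId was already dropped.")
      } else {
        // proceed with dropping the block
      }
    }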





[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...

2014-12-08 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3574#issuecomment-66199309
  
Left one minor code organization comment; aside from that, this looks good 
to me and should be ready to merge after you fix that up (I can do it if you 
don't have time, though; just let me know).

There are a couple of edits that I'd like to make to the commit title / 
description before merging this, but I can do it myself on merge.

Thanks for the careful analysis and for catching this issue!





[GitHub] spark pull request: [SPARK-4737] Task set manager properly handles...

2014-12-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3638#issuecomment-66201371
  
  [Test build #24230 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24230/consoleFull)
 for   PR 3638 at commit 
[`94844d7`](https://github.com/apache/spark/commit/94844d736ed0d8322e2e0dda762961a9170d6a1d).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class TaskNotSerializableException(error: Throwable) extends 
Exception(error)`






[GitHub] spark pull request: [SPARK-4737] Task set manager properly handles...

2014-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3638#issuecomment-66201380
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24230/
Test PASSed.





[GitHub] spark pull request: [SPARK-4461][YARN] pass extra java options to ...

2014-12-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3409#issuecomment-66202544
  
  [Test build #24232 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24232/consoleFull)
 for   PR 3409 at commit 
[`e3f9abe`](https://github.com/apache/spark/commit/e3f9abeaa82018835cd9a7055adba0dabc451a24).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4789] [mllib] Standardize ML Prediction...

2014-12-08 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/3637#issuecomment-66203211
  
The test failure reveals an issue in Spark SQL (ScalaReflection.scala:121 
in schemaFor) where it gets confused if the case class includes multiple 
constructors.  The default behavior should probably be to take the constructor 
with the most arguments, but I'll consult others about this.  This PR may be on 
temporary hold...
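For context, the pattern that trips up `schemaFor` is a case class with an auxiliary constructor, e.g. the `LabeledPoint` in this PR (sketched; the auxiliary constructor defaults weight to 1.0 as described in the commit log):

    case class LabeledPoint(label: Double, features: Vector, weight: Double) {
      def this(label: Double, features: Vector) = this(label, features, 1.0)
    }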





[GitHub] spark pull request: [SPARK-3382] GradientDescent convergence toler...

2014-12-08 Thread Lewuathe
Github user Lewuathe commented on the pull request:

https://github.com/apache/spark/pull/3636#issuecomment-66203442
  
@jkbradley Thank you for reviewing. I'll update these points soon.





[GitHub] spark pull request: [SPARK-1953][YARN]yarn client mode Application...

2014-12-08 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/3607#discussion_r21493950
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala 
---
@@ -54,8 +46,25 @@ private[spark] class ClientArguments(args: 
Array[String], sparkConf: SparkConf)
   loadEnvironmentArgs()
   validateArgs()
 
+  // Additional memory to allocate to containers
+  // For now, use driver's memory overhead as our AM container's memory 
overhead
--- End diff --

This comment is no longer true





[GitHub] spark pull request: SPARK-4770. [DOC] [YARN] spark.scheduler.minRe...

2014-12-08 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3624#issuecomment-66204351
  
@tgravescs It should be fine to pull docs-only changes into `branch-1.2`.  
We're trying to hold off on merging code changes that aren't addressing 1.2.0 
release blockers because we don't want to risk introducing new regressions and 
having to call new votes.  If you do want to merge a code change that should 
eventually be backported into `branch-1.2`, just merge it into the other 
branches, leave its JIRA open with 1.2.1 listed in Target Version/s and not Fix 
Version/s, then add the `backport-needed` label to the issue so that we 
remember to come back to it after 1.2.0 is released.





[GitHub] spark pull request: [SPARK-4759] Avoid using empty string as defau...

2014-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3633#issuecomment-66204818
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24231/
Test PASSed.





[GitHub] spark pull request: [SPARK-4759] Avoid using empty string as defau...

2014-12-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3633#issuecomment-66204811
  
  [Test build #24231 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24231/consoleFull)
 for   PR 3633 at commit 
[`f370a4e`](https://github.com/apache/spark/commit/f370a4e710b1ff29a5749944a1557de233223dc6).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4789] [mllib] Standardize ML Prediction...

2014-12-08 Thread Lewuathe
Github user Lewuathe commented on a diff in the pull request:

https://github.com/apache/spark/pull/3637#discussion_r21494740
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/LabeledPoint.scala ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml
+
+import scala.beans.BeanInfo
+
+import org.apache.spark.annotation.AlphaComponent
+import org.apache.spark.mllib.linalg.Vector
+
+/**
+ * :: AlphaComponent ::
+ * Class that represents an instance (data point) for prediction tasks.
+ *
+ * @param label Label to predict
+ * @param features List of features describing this instance
+ * @param weight Instance weight
+ */
+@AlphaComponent
+@BeanInfo
+case class LabeledPoint(label: Double, features: Vector, weight: Double) {
--- End diff --

Why is the label of `LabeledPoint` assumed to be only a `Double`? I think there 
are some cases where the label is not a `Double`, such as one-of-k encoding. It seems 
better not to restrict it to the `Double` type. If I missed some alternatives, sorry 
for that and please let me know. 




