[GitHub] spark pull request: [SPARK-4057] Use -agentlib instead of -Xdebug ...

2014-10-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2904#issuecomment-60196011
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22063/consoleFull) for PR 2904 at commit [`26b4af8`](https://github.com/apache/spark/commit/26b4af8ffc82aca784df6c4b4fd38e9083babc54).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [WIP][SPARK-3795] Heuristics for dynamically s...

2014-10-23 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/2746#issuecomment-60196383
  
@sryza just so I understand: I tell YARN I want 10 executors to be pending. 
Then say YARN grants me two executors. Does it internally decrement the pending 
number to 8 (and can I read that state back)? Or could we just infer that it 
has decremented the counter based on receiving new executors? How would it work?





[GitHub] spark pull request: [SPARK-4055][MLlib] Inconsistent spelling 'MLl...

2014-10-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2903#issuecomment-60196521
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22064/consoleFull) for PR 2903 at commit [`b031640`](https://github.com/apache/spark/commit/b0316405074a617b1573bdd1c8285fc043835f82).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3988][SQL] add public API for date type

2014-10-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2901#issuecomment-60196729
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22058/consoleFull) for PR 2901 at commit [`444f100`](https://github.com/apache/spark/commit/444f10018326ca47676b46f5801eb7ee83b62241).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class DateType(PrimitiveType):`






[GitHub] spark pull request: [SPARK-3988][SQL] add public API for date type

2014-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2901#issuecomment-60196733
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22058/





[GitHub] spark pull request: [BUILD] Fixed resolver for scalastyle plugin a...

2014-10-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/2877





[GitHub] spark pull request: [SPARK-4058] [PySpark] Log file name is hard c...

2014-10-23 Thread sarutak
GitHub user sarutak opened a pull request:

https://github.com/apache/spark/pull/2905

[SPARK-4058] [PySpark] Log file name is hard coded even though there is a 
variable '$LOG_FILE'

In the script 'python/run-tests', the log file name is held in the variable 
'LOG_FILE' and that variable is used throughout the script. However, the log 
file name is also hard-coded in several places in the same script.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sarutak/spark SPARK-4058

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2905.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2905


commit 7710490e2c38e202c29e35445a77f1a070fbd678
Author: Kousuke Saruta saru...@oss.nttdata.co.jp
Date:   2014-10-23T06:15:04Z

Fixed python/run-tests not to use hard-coded log file name







[GitHub] spark pull request: [WIP][SPARK-3795] Heuristics for dynamically s...

2014-10-23 Thread sryza
Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/2746#issuecomment-60197171
  
So yeah, it internally decrements the pending number to 8.  The app can, and 
is expected to, infer that YARN has decremented the counter.  Maybe more detail 
than you need, but for getting a grasp on it, it might be helpful to understand 
the race conditions this approach exposes - i.e. there are situations where 
YARN can overallocate.  For example, imagine you requested 10 and then decide 
you want 11. YARN just granted you 2 and decremented its counter to 8.  You 
might tell YARN you want 11 before finding out about the 2 YARN is giving you, 
which means you would overwrite the 8 with 11.  In the brief period before you 
can go back to YARN and tell it you only want 9 now, it could conceivably give 
you 11 containers, for a total of 13, which is more than you ever asked for.  
The app is expected to handle these situations and release allocated 
containers that it doesn't need.
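The overallocation race described above can be sketched with a toy model. The `ToyYarn` class and its methods are hypothetical stand-ins, not YARN's actual AMRMClient API:

```python
# Toy model of the pending-container race. ToyYarn, set_pending, and
# grant are illustrative names only; this is not YARN's real interface.

class ToyYarn:
    def __init__(self):
        self.pending = 0   # containers YARN still intends to allocate
        self.granted = 0   # containers already handed to the app

    def set_pending(self, n):
        # The app states an absolute target, overwriting whatever
        # value YARN had internally decremented the counter to.
        self.pending = n

    def grant(self, n):
        # YARN allocates up to n containers and decrements its counter.
        n = min(n, self.pending)
        self.pending -= n
        self.granted += n
        return n

yarn = ToyYarn()
app_known_granted = 0       # grants the app has heard about so far

yarn.set_pending(10)        # app asks for 10 executors
yarn.grant(2)               # YARN grants 2; its counter drops to 8,
                            # but the app has not seen the grant yet

# The app now wants 11 total. Based on what it knows (0 granted),
# it overwrites the counter with 11 instead of 9.
yarn.set_pending(11 - app_known_granted)

# Worst case, YARN delivers everything still pending: 2 + 11 = 13,
# more than the app ever wanted, so the extras must be released.
worst_case = yarn.granted + yarn.pending
```

In the toy scenario `worst_case` comes out to 13, matching the 10-then-11 example in the comment.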





[GitHub] spark pull request: [SPARK-3988][SQL] add public API for date type

2014-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2901#issuecomment-60197221
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22059/





[GitHub] spark pull request: [SPARK-3988][SQL] add public API for date type

2014-10-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2901#issuecomment-60197217
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22059/consoleFull) for PR 2901 at commit [`f760d8e`](https://github.com/apache/spark/commit/f760d8e6344a7bbfa49dbfb9324cf5b0cdba9223).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class DateType(PrimitiveType):`






[GitHub] spark pull request: [SPARK-4058] [PySpark] Log file name is hard c...

2014-10-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2905#issuecomment-60197430
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22065/consoleFull) for PR 2905 at commit [`7710490`](https://github.com/apache/spark/commit/7710490e2c38e202c29e35445a77f1a070fbd678).
 * This patch merges cleanly.





[GitHub] spark pull request: SPARK-2621. Update task InputMetrics increment...

2014-10-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2087#issuecomment-60198186
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22061/consoleFull) for PR 2087 at commit [`23010b8`](https://github.com/apache/spark/commit/23010b850b28fccd9b33b0352c4bc2cb5f5dd45c).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: SPARK-2621. Update task InputMetrics increment...

2014-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2087#issuecomment-60198189
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22061/





[GitHub] spark pull request: [Spark-4041][SQL]attributes names in table sca...

2014-10-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2884#issuecomment-60198477
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/411/consoleFull) for PR 2884 at commit [`3ff3a80`](https://github.com/apache/spark/commit/3ff3a8094f0d5c6aa50a53ac6b08345c1c7a3f69).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4055][MLlib] Inconsistent spelling 'MLl...

2014-10-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2903#issuecomment-60198488
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22062/consoleFull) for PR 2903 at commit [`272e41e`](https://github.com/apache/spark/commit/272e41e6ce363a4c6386a9aff7c11a03df525281).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [WIP][SPARK-3795] Heuristics for dynamically s...

2014-10-23 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/2746#issuecomment-60198472
  
Yep - that's exactly what I was wondering about. If YARN doesn't expose the
internal counter, it seems like there is a race (actually, even if it does
expose it, there is still a minor race where you could read it and then
reset it while it changes in the middle). I guess we just live with it...






[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...

2014-10-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2520#issuecomment-60198448
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22066/consoleFull) for PR 2520 at commit [`f5400bd`](https://github.com/apache/spark/commit/f5400bd1d06198d9b4ad02b8974957174c9668cb).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4055][MLlib] Inconsistent spelling 'MLl...

2014-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2903#issuecomment-60198490
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22062/





[GitHub] spark pull request: [SPARK-4032] Deprecate YARN alpha support in S...

2014-10-23 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/2878#issuecomment-60198553
  
Yeah, maybe the output here is too noisy for it to be noticeable. I agree 
that having something in the Client itself is a good idea.





[GitHub] spark pull request: [SPARK-4037][SQL] Removes the SessionState ins...

2014-10-23 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/2887#discussion_r19261442
  
--- Diff: 
sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala
 ---
@@ -150,10 +150,12 @@ class HiveThriftServer2Suite extends FunSuite with Logging {
   val dataFilePath =
     Thread.currentThread().getContextClassLoader.getResource("data/files/small_kv.txt")

-  val queries = Seq(
-    "CREATE TABLE test(key INT, val STRING)",
-    s"LOAD DATA LOCAL INPATH '$dataFilePath' OVERWRITE INTO TABLE test",
-    "CACHE TABLE test")
+  val queries =
+    s"""SET spark.sql.shuffle.partitions=3;
--- End diff --

This SET command is used as a regression test of SPARK-4037.





[GitHub] spark pull request: [Spark-4041][SQL]attributes names in table sca...

2014-10-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2884#issuecomment-60198722
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/411/consoleFull) for PR 2884 at commit [`3ff3a80`](https://github.com/apache/spark/commit/3ff3a8094f0d5c6aa50a53ac6b08345c1c7a3f69).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [ SPARK-1812] Adjust build system and tests to...

2014-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2615#issuecomment-60200516
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22060/





[GitHub] spark pull request: [ SPARK-1812] Adjust build system and tests to...

2014-10-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2615#issuecomment-60200512
  
**[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22060/consoleFull)** for PR 2615 at commit [`897ec60`](https://github.com/apache/spark/commit/897ec603b3e07cb9ce4dda1fea4abdf30466493e) after a configured wait of `120m`.





[GitHub] spark pull request: [SPARK-4026][Streaming] Write ahead log manage...

2014-10-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2882#issuecomment-60201254
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22067/consoleFull) for PR 2882 at commit [`3881706`](https://github.com/apache/spark/commit/38817069e66cc8c161cc2a8033873a3342cff4e2).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4019] [SPARK-3740] Fix MapStatus compre...

2014-10-23 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/2866#issuecomment-60201281
  
@JoshRosen thanks for doing this. There is a chance that a normal hash set 
is much slower than a bitmap. Can you test that? It might make a lot more sense 
to use an uncompressed bitmap for tracking after deserialization instead.
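As a rough illustration of the trade-off being suggested, here is a sketch of both representations: a hash set of block IDs versus an uncompressed bitmap, with a Python int standing in for a real bitmap type. The tracker classes are hypothetical; none of this is Spark's MapStatus code.

```python
# Two ways to track which shuffle blocks are empty after deserialization.
# Both trackers are hypothetical sketches, not Spark's MapStatus classes.

class SetTracker:
    """Membership via a hash set: one hashed entry per tracked block."""
    def __init__(self, empty_blocks):
        self._empty = set(empty_blocks)

    def is_empty(self, block_id):
        return block_id in self._empty      # hash + probe per lookup

class BitmapTracker:
    """Membership via an uncompressed bitmap (a Python int as a bit vector)."""
    def __init__(self, empty_blocks):
        self._bits = 0
        for b in empty_blocks:
            self._bits |= 1 << b            # set bit b

    def is_empty(self, block_id):
        return (self._bits >> block_id) & 1 == 1   # shift + mask per lookup

empty = [0, 5, 17]
set_tracker = SetTracker(empty)
bitmap_tracker = BitmapTracker(empty)

# Both representations agree on every block ID.
assert all(set_tracker.is_empty(i) == bitmap_tracker.is_empty(i)
           for i in range(32))
```

The bitmap answers each query with a shift and a mask and costs one bit per block, while the hash set pays hashing and per-entry overhead on every lookup, which is the intuition behind preferring a bitmap when block IDs are dense.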





[GitHub] spark pull request: [SPARK-4055][MLlib] Inconsistent spelling 'MLl...

2014-10-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2903#issuecomment-60201398
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22064/consoleFull) for PR 2903 at commit [`b031640`](https://github.com/apache/spark/commit/b0316405074a617b1573bdd1c8285fc043835f82).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  throw new SparkException("Failed to load class to register with Kryo", e)`
  * `class RankingMetrics[T: ClassTag](predictionAndLabels: RDD[(Array[T], Array[T])])`






[GitHub] spark pull request: [SPARK-4055][MLlib] Inconsistent spelling 'MLl...

2014-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2903#issuecomment-60201403
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22064/





[GitHub] spark pull request: [Spark-4041][SQL]attributes names in table sca...

2014-10-23 Thread scwf
Github user scwf commented on the pull request:

https://github.com/apache/spark/pull/2884#issuecomment-60201488
  
The test failed due to a streaming compile error; can you retest this?





[GitHub] spark pull request: [SPARK-4058] [PySpark] Log file name is hard c...

2014-10-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2905#issuecomment-60202537
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22065/consoleFull) for PR 2905 at commit [`7710490`](https://github.com/apache/spark/commit/7710490e2c38e202c29e35445a77f1a070fbd678).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4058] [PySpark] Log file name is hard c...

2014-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2905#issuecomment-60202542
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22065/





[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...

2014-10-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2520#issuecomment-60203866
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22066/consoleFull) for PR 2520 at commit [`f5400bd`](https://github.com/apache/spark/commit/f5400bd1d06198d9b4ad02b8974957174c9668cb).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...

2014-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2520#issuecomment-60203876
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22066/





[GitHub] spark pull request: [SPARK-4026][Streaming] Write ahead log manage...

2014-10-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2882#issuecomment-60208450
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22067/consoleFull)
 for   PR 2882 at commit 
[`3881706`](https://github.com/apache/spark/commit/38817069e66cc8c161cc2a8033873a3342cff4e2).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  case class LogInfo(startTime: Long, endTime: Long, path: String)`






[GitHub] spark pull request: [SPARK-4026][Streaming] Write ahead log manage...

2014-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2882#issuecomment-60208456
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22067/
Test PASSed.





[GitHub] spark pull request: [ SPARK-1812] Adjust build system and tests to...

2014-10-23 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request:

https://github.com/apache/spark/pull/2615#discussion_r19265089
  
--- Diff: dev/change-version-to-2.10.sh ---
@@ -0,0 +1,20 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+find -name 'pom.xml' -exec sed -i 's|\(artifactId.*\)_2.11|\1_2.10|g' {} \;
--- End diff --

I tried that; unfortunately, in the effective pom(s) it stays as is (i.e. 
$scala.version is not changed to 2.10).
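One way to handle this would be a second `sed` pass over the property itself, since the artifactId rewrite alone leaves `<scala.version>` untouched. A hedged sketch (the scratch file `pom-snippet.xml` and the target version `2.10.4` are hypothetical, not from the PR):

```shell
# Create a minimal stand-in for the property as it appears in the pom.
printf '<scala.version>2.11.2</scala.version>\n' > pom-snippet.xml
# Rewrite the Scala version property, not just the artifactId suffixes.
sed -i 's|<scala.version>2\.11[0-9.]*</scala.version>|<scala.version>2.10.4</scala.version>|' pom-snippet.xml
grep -o '2\.10\.4' pom-snippet.xml
```

Note this still would not help if the effective pom inlines the property before the script runs, which is the concern raised above.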





[GitHub] spark pull request: [Spark-4041][SQL]attributes names in table sca...

2014-10-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2884#issuecomment-60209325
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/412/consoleFull)
 for   PR 2884 at commit 
[`3ff3a80`](https://github.com/apache/spark/commit/3ff3a8094f0d5c6aa50a53ac6b08345c1c7a3f69).
 * This patch merges cleanly.





[GitHub] spark pull request: [Spark-4041][SQL]attributes names in table sca...

2014-10-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2884#issuecomment-60209693
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/412/consoleFull)
 for   PR 2884 at commit 
[`3ff3a80`](https://github.com/apache/spark/commit/3ff3a8094f0d5c6aa50a53ac6b08345c1c7a3f69).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4026][Streaming] Write ahead log manage...

2014-10-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2882#issuecomment-60213859
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22068/consoleFull)
 for   PR 2882 at commit 
[`9514dc8`](https://github.com/apache/spark/commit/9514dc833c9c30be12eeb64fb4580c2e6f1adb4f).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2014-10-23 Thread yu-iskw
GitHub user yu-iskw opened a pull request:

https://github.com/apache/spark/pull/2906

[SPARK-2429] [MLlib] Hierarchical Implementation of KMeans

I want to add a divisive hierarchical clustering algorithm implementation 
to MLlib. It doesn't support distance metrics other than the Euclidean distance 
metric yet; it would be nice to add others in a separate issue.
Could you review it?

Thanks!


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yu-iskw/spark hierarchical

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2906.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2906









[GitHub] spark pull request: [SPARK-4026][Streaming] Write ahead log manage...

2014-10-23 Thread tdas
Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/2882#issuecomment-60213891
  
@JoshRosen
@harishreedharan Addressed all your comments, and also simplified the 
writer code. I did some further cleanups, and also added two new unit tests 
that test the writer and manager with corrupted writes.





[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2014-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2906#issuecomment-60214129
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-3812] [BUILD] Adapt maven build to publ...

2014-10-23 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/2673#issuecomment-60214941
  
@ScrapCodes @pwendell  
This patch will cause a `maven-assembly-plugin` error:
`./make-distribution.sh -Dhadoop.version=2.3.0-cdh5.0.1 
-Dyarn.version=2.3.0-cdh5.0.1 -Phadoop-2.3 -Pyarn -Pnetlib-lgpl`
 
`du -sh dist/lib/*`
```
4.0Kdist/lib/spark-assembly-1.2.0-SNAPSHOT-hadoop2.3.0-cdh5.0.1.jar
928Kdist/lib/spark-examples-1.2.0-SNAPSHOT-hadoop2.3.0-cdh5.0.1.jar
```





[GitHub] spark pull request: [SPARK-4057] Use -agentlib instead of -Xdebug ...

2014-10-23 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/2904#issuecomment-60215591
  
+1. I can confirm that `-Xdebug` went away in Java 5, I think, and this is 
the modern way to invoke the debugger.
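For reference, the two invocations look roughly like this (the port number and jar name are placeholders, not from the PR):

```shell
# Legacy (pre-Java 5) form, long deprecated:
#   java -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5005 -jar app.jar
# Modern JDWP agent invocation:
java -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005 -jar app.jar
```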





[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2014-10-23 Thread rnowling
Github user rnowling commented on a diff in the pull request:

https://github.com/apache/spark/pull/2906#discussion_r19267797
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClusteringModel.scala
 ---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import breeze.linalg.{DenseVector => BDV, Vector => BV, norm => breezeNorm}
+import org.apache.spark.api.java.JavaRDD
+import org.apache.spark.mllib.linalg.Vector
+import org.apache.spark.rdd.RDD
+
+/**
+ * this class is used for the model of the hierarchical clustering
+ *
+ * @param clusterTree a cluster as a tree node
+ * @param trainTime the milliseconds for executing a training
+ * @param predictTime the milliseconds for executing a prediction
+ * @param isTrained if the model has been trained, the flag is true
+ */
+class HierarchicalClusteringModel private (
+  val clusterTree: ClusterTree,
+  var trainTime: Int,
+  var predictTime: Int,
+  var isTrained: Boolean) extends Serializable {
+
+  def this(clusterTree: ClusterTree) = this(clusterTree, 0, 0, false)
+
+  def getClusters(): Array[ClusterTree] = clusterTree.getClusters().toArray
+
+  def getCenters(): Array[Vector] = getClusters().map(_.center)
+
+  /**
+   * Predicts the closest cluster of each point
+   */
+  def predict(vector: Vector): Int = {
+// TODO Support distance metrics other than the Euclidean distance metric
+val metric = (bv1: BV[Double], bv2: BV[Double]) => breezeNorm(bv1 - bv2, 2.0)
+this.clusterTree.assignClusterIndex(metric)(vector)
+  }
+
+  /**
+   * Predicts the closest cluster of each point
+   */
+  def predict(data: RDD[Vector]): RDD[(Int, Vector)] = {
+val startTime = System.currentTimeMillis() // to measure the execution 
time
+
+// TODO Support distance metrics other than the Euclidean distance metric
+val metric = (bv1: BV[Double], bv2: BV[Double]) => breezeNorm(bv1 - bv2, 2.0)
+val centers = getClusters().map(_.center.toBreeze)
+val treeRoot = this.clusterTree
+val closestClusterIndexFinder = treeRoot.assignClusterIndex(metric) _
+data.sparkContext.broadcast(closestClusterIndexFinder)
+val predicted = data.map(point => (closestClusterIndexFinder(point), point))
--- End diff --

I don't think you're using the broadcast variable correctly:


http://spark.apache.org/docs/latest/programming-guide.html#broadcast-variables
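The diff discards the result of `broadcast()`, so the closure still captures the driver-side object. The intended pattern is to keep the handle and read it through `.value` inside the task. A minimal pure-Python sketch of the pattern, using a hypothetical stand-in for Spark's Broadcast wrapper (no Spark required):

```python
class Broadcast:
    """Hypothetical stand-in for Spark's Broadcast wrapper."""
    def __init__(self, value):
        self.value = value

centers = [(0.0, 0.0), (10.0, 10.0)]

# Right: keep the handle returned by broadcast() and dereference it with
# .value inside the closure; in real Spark this would be sc.broadcast(centers).
bc = Broadcast(centers)

def closest_cluster(point):
    cs = bc.value  # read through the broadcast handle inside the task
    return min(range(len(cs)),
               key=lambda i: (point[0] - cs[i][0]) ** 2 + (point[1] - cs[i][1]) ** 2)

print(closest_cluster((9.0, 9.5)))  # nearest to (10, 10), i.e. index 1
```

In the PR's code, discarding the broadcast result means every task ships its own serialized copy of the index-finder function instead of reusing the broadcast one.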





[GitHub] spark pull request: [Spark-4041][SQL]attributes names in table sca...

2014-10-23 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/2884#issuecomment-60216081
  
Hm, the failure was caused by a known Jenkins configuration issue.





[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2014-10-23 Thread rnowling
Github user rnowling commented on a diff in the pull request:

https://github.com/apache/spark/pull/2906#discussion_r19267891
  
--- Diff: python/pyspark/mllib/clustering.py ---
@@ -91,6 +99,58 @@ def train(cls, rdd, k, maxIterations=100, runs=1, initializationMode="k-means||"
 return KMeansModel([c.toArray() for c in centers])
 
 
+class HierarchicalClusteringModel(ClusteringModel):
--- End diff --

The predict method seems to be O(kN), but you can do assignment in O(N log k) 
time with the tree, right? (N is the number of data points, k is the number of 
cluster centers.)





[GitHub] spark pull request: [SPARK-3954][Streaming] promote the speed of c...

2014-10-23 Thread surq
Github user surq commented on the pull request:

https://github.com/apache/spark/pull/2811#issuecomment-60217711
  
Could someone take a look at this patch?





[GitHub] spark pull request: MLlib, exposing special rdd functions to the p...

2014-10-23 Thread numbnut
GitHub user numbnut opened a pull request:

https://github.com/apache/spark/pull/2907

MLlib, exposing special rdd functions to the public



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/numbnut/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2907.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2907


commit b3d8945d6fa0bc28b90a8409ced29fd78b34e752
Author: Niklas Wilcke 1wil...@informatik.uni-hamburg.de
Date:   2014-10-23T09:43:27Z

expose mllib specific rdd functions to the public







[GitHub] spark pull request: [Spark-4060] [MLlib] exposing special rdd func...

2014-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2907#issuecomment-60218336
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-4061] We cannot use EOL character in th...

2014-10-23 Thread sarutak
GitHub user sarutak opened a pull request:

https://github.com/apache/spark/pull/2908

[SPARK-4061] We cannot use EOL character in the operand of LIKE predicate.

We cannot use an EOL character such as \n or \r in the operand of a LIKE 
predicate, so the following condition is never true:

-- someStr is 'hoge\nfuga'
where someStr LIKE 'hoge_fuga'
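The underlying behavior can be reproduced with plain regular expressions: if `_` compiles to `.` without DOTALL, the wildcard never matches a newline. A minimal sketch, assuming the usual LIKE-to-regex translation (not Spark's actual code):

```python
import re

def sql_like_match(s, pattern, dotall=False):
    # Translate LIKE wildcards: '%' -> '.*', '_' -> '.', everything else literal.
    regex = "".join(".*" if c == "%" else "." if c == "_" else re.escape(c)
                    for c in pattern)
    flags = re.DOTALL if dotall else 0
    return re.fullmatch(regex, s, flags) is not None

# Without DOTALL, '.' does not match '\n', so the condition is never true:
print(sql_like_match("hoge\nfuga", "hoge_fuga"))               # False
# Compiling the pattern in DOTALL mode lets the wildcard match EOL characters:
print(sql_like_match("hoge\nfuga", "hoge_fuga", dotall=True))  # True
```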



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sarutak/spark 
spark-sql-like-match-modification

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2908.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2908


commit 38f66519ae95ec5d41705fc499e2cd658de4
Author: Kousuke Saruta saru...@oss.nttdata.co.jp
Date:   2014-10-23T10:07:14Z

Fixed LIKE predicate so that we can use EOL characters in an operand







[GitHub] spark pull request: [SPARK-4061] We cannot use EOL character in th...

2014-10-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2908#issuecomment-60218997
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22069/consoleFull)
 for   PR 2908 at commit 
[`38f6651`](https://github.com/apache/spark/commit/38f66519ae95ec5d41705fc499e2cd658de4).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4026][Streaming] Write ahead log manage...

2014-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2882#issuecomment-60219275
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22068/
Test FAILed.





[GitHub] spark pull request: [SPARK-4026][Streaming] Write ahead log manage...

2014-10-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2882#issuecomment-60219269
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22068/consoleFull)
 for   PR 2882 at commit 
[`9514dc8`](https://github.com/apache/spark/commit/9514dc833c9c30be12eeb64fb4580c2e6f1adb4f).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  case class LogInfo(startTime: Long, endTime: Long, path: String)`






[GitHub] spark pull request: [SPARK-4006] In long running contexts, we enco...

2014-10-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2886#issuecomment-60220879
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22070/consoleFull)
 for   PR 2886 at commit 
[`df9d98f`](https://github.com/apache/spark/commit/df9d98fe6703f6cc37fb0187fa55d140f37bb50e).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3900][YARN] ApplicationMaster's shutdow...

2014-10-23 Thread sarutak
Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/2755#issuecomment-60220899
  
/CC @tgravescs





[GitHub] spark pull request: specify unidocGenjavadocVersion of 0.8

2014-10-23 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/2893#issuecomment-60221337
  
This is for SPARK-3359. LGTM, thank you. This gets past some errors, and 
turns up more, which I'll comment on in the JIRA. But this is a step forward.





[GitHub] spark pull request: [SPARK-4006] In long running contexts, we enco...

2014-10-23 Thread tsliwowicz
Github user tsliwowicz commented on the pull request:

https://github.com/apache/spark/pull/2886#issuecomment-60221362
  
@andrewor14  - thanks for the comments. I believe I fixed them all. Let me 
know!





[GitHub] spark pull request: [SPARK-4006] In long running contexts, we enco...

2014-10-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2886#issuecomment-60221739
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22072/consoleFull)
 for   PR 2886 at commit 
[`094d508`](https://github.com/apache/spark/commit/094d508fed9aa57beb60d7a571cbe7c1e3b334c1).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4006] In long running contexts, we enco...

2014-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2886#issuecomment-60222452
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22071/
Test FAILed.





[GitHub] spark pull request: [SPARK-4006] In long running contexts, we enco...

2014-10-23 Thread tsliwowicz
Github user tsliwowicz commented on the pull request:

https://github.com/apache/spark/pull/2886#issuecomment-60222733
  
The failure seems to be an infrastructure issue (not related to my fix), I 
think. A local Maven build works fine for me.





[GitHub] spark pull request: [SPARK-4061] We cannot use EOL character in th...

2014-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2908#issuecomment-60223794
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22069/
Test PASSed.





[GitHub] spark pull request: [SPARK-4061] We cannot use EOL character in th...

2014-10-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2908#issuecomment-60223791
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22069/consoleFull)
 for   PR 2908 at commit 
[`38f6651`](https://github.com/apache/spark/commit/38f66519ae95ec5d41705fc499e2cd658de4).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4006] In long running contexts, we enco...

2014-10-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2886#issuecomment-60227754
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22070/consoleFull)
 for   PR 2886 at commit 
[`df9d98f`](https://github.com/apache/spark/commit/df9d98fe6703f6cc37fb0187fa55d140f37bb50e).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4006] In long running contexts, we enco...

2014-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2886#issuecomment-60227762
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22070/
Test PASSed.





[GitHub] spark pull request: [SPARK-4006] In long running contexts, we enco...

2014-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2886#issuecomment-60228517
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22072/
Test PASSed.





[GitHub] spark pull request: [SPARK-4006] In long running contexts, we enco...

2014-10-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2886#issuecomment-60228510
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22072/consoleFull)
 for   PR 2886 at commit 
[`094d508`](https://github.com/apache/spark/commit/094d508fed9aa57beb60d7a571cbe7c1e3b334c1).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: Clarify docstring for Pyspark's foreachPartiti...

2014-10-23 Thread tdhopper
Github user tdhopper commented on the pull request:

https://github.com/apache/spark/pull/2895#issuecomment-60234425
  
Oh. Now that I look at master, @JoshRosen, I see that it's already been 
fixed by @davis 
[here](https://github.com/apache/spark/commit/1789cd46e38d1426deb6a4b14bddcbb8c751f585).
 The fix just isn't in 1.1. I guess we should close this?





[GitHub] spark pull request: [SPARK-732][SPARK-3628][CORE][RESUBMIT] make i...

2014-10-23 Thread CodingCat
Github user CodingCat commented on the pull request:

https://github.com/apache/spark/pull/2524#issuecomment-60237457
  
ping





[GitHub] spark pull request: [SPARK-3900][YARN] ApplicationMaster's shutdow...

2014-10-23 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2755#issuecomment-60237621
  
Jenkins, test this please





[GitHub] spark pull request: [SPARK-3900][YARN] ApplicationMaster's shutdow...

2014-10-23 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2755#issuecomment-60237859
  
Changes look good.





[GitHub] spark pull request: [SPARK-3904] [SQL] add constant objectinspecto...

2014-10-23 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/2762#issuecomment-60242669
  
Thank you @liancheng, I've updated the code accordingly.

You're right, the conversion is not very efficient; we probably need to add 
some Expression nodes for the data conversion. Let's do that in follow-ups.





[GitHub] spark pull request: [SPARK-3900][YARN] ApplicationMaster's shutdow...

2014-10-23 Thread sarutak
Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/2755#issuecomment-60250844
  
Hm... test wouldn't start...





[GitHub] spark pull request: [SPARK-4061][SQL] We cannot use EOL character ...

2014-10-23 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/2908#issuecomment-60256774
  
Good catch! Would you mind adding a unit test for this?





[GitHub] spark pull request: [SPARK-4061][SQL] We cannot use EOL character ...

2014-10-23 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/2908#discussion_r19284542
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
 ---
@@ -103,21 +103,21 @@ case class Like(left: Expression, right: Expression)
   // replace the _ with .{1} exactly match 1 time of any character
   // replace the % with .*, match 0 or more times with any character
   override def escape(v: String) = {
-    val sb = new StringBuilder()
-    var i = 0;
+    val sb = new StringBuilder("(?s)")
+    var i = 0
     while (i < v.length) {
       // Make a special case for \\_ and \\%
-      val n = v.charAt(i);
+      val n = v.charAt(i)
       if (n == '\\' && i + 1 < v.length && (v.charAt(i + 1) == '_' || v.charAt(i + 1) == '%')) {
         sb.append(v.charAt(i + 1))
         i += 1
       } else {
         if (n == '_') {
-          sb.append(".");
+          sb.append(".")
         } else if (n == '%') {
-          sb.append(".*");
+          sb.append(".*")
         } else {
-          sb.append(Pattern.quote(Character.toString(n)));
+          sb.append(Pattern.quote(Character.toString(n)))
         }
       }
--- End diff --

This feels a bit convoluted to me... This function is not on a critical 
path, so I'd like to refactor it in a more functional and readable (but less 
efficient) way, for example:

```scala
  override def escape(v: String) = "(?s)" + (' ' +: v.init).zip(v).flatMap {
    case (prefix, '_') => if (prefix == '\\') "_" else "."
    case (prefix, '%') => if (prefix == '\\') "%" else ".*"
    case (_, ch) => Character.toString(ch)
  }.mkString
```
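For context, here is a self-contained sketch of the escape logic under review; the `LikeEscape` object name is invented for illustration and this is not the actual Catalyst code. It shows why the `(?s)` (DOTALL) prefix matters: without it, `%` (translated to `.*`) would not match across newlines.

```scala
import java.util.regex.Pattern

// Hypothetical standalone version of the Like.escape logic discussed above.
// '_' becomes '.', '%' becomes '.*', escaped wildcards stay literal, and
// everything else is regex-quoted; "(?s)" enables DOTALL matching.
object LikeEscape {
  def escape(v: String): String = {
    val sb = new StringBuilder("(?s)")
    var i = 0
    while (i < v.length) {
      val n = v.charAt(i)
      if (n == '\\' && i + 1 < v.length &&
          (v.charAt(i + 1) == '_' || v.charAt(i + 1) == '%')) {
        sb.append(v.charAt(i + 1)) // escaped wildcard: keep it literal
        i += 1
      } else if (n == '_') {
        sb.append(".")
      } else if (n == '%') {
        sb.append(".*")
      } else {
        sb.append(Pattern.quote(Character.toString(n)))
      }
      i += 1
    }
    sb.toString
  }
}
```

With the DOTALL prefix, a multi-line value such as `"line1\nline2"` matches the pattern produced from `"line1%"`, which is exactly the EOL case this PR addresses.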





[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

2014-10-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1567#issuecomment-60257141
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22073/consoleFull)
 for   PR 1567 at commit 
[`76f474e`](https://github.com/apache/spark/commit/76f474e41a172d5128f99c9ae71c7b802b9114fa).
 * This patch merges cleanly.





[GitHub] spark pull request: SPARK-3359 [DOCS] sbt/sbt unidoc doesn't work ...

2014-10-23 Thread srowen
GitHub user srowen opened a pull request:

https://github.com/apache/spark/pull/2909

SPARK-3359 [DOCS] sbt/sbt unidoc doesn't work with Java 8

This follows https://github.com/apache/spark/pull/2893 , but does not 
completely fix SPARK-3359 either. This fixes minor scaladoc/javadoc issues that 
Javadoc 8 will treat as errors.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srowen/spark SPARK-3359

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2909.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2909


commit f62c347e2df9d7e63653c2bf42004e86f7a80b27
Author: Sean Owen so...@cloudera.com
Date:   2014-10-23T15:55:22Z

Fix some javadoc issues that javadoc 8 considers errors. This is not all of 
the errors turned up when javadoc 8 runs on output of genjavadoc.







[GitHub] spark pull request: SPARK-3359 [DOCS] sbt/sbt unidoc doesn't work ...

2014-10-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2909#issuecomment-60262260
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22074/consoleFull)
 for   PR 2909 at commit 
[`f62c347`](https://github.com/apache/spark/commit/f62c347e2df9d7e63653c2bf42004e86f7a80b27).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4055][MLlib] Inconsistent spelling 'MLl...

2014-10-23 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/2903#issuecomment-60265674
  
LGTM. Merged into master. Thanks!





[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2014-10-23 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2906#discussion_r19288355
  
--- Diff: docs/mllib-clustering.md ---
@@ -153,3 +157,152 @@ provided in the [Self-Contained 
Applications](quick-start.html#self-contained-ap
 section of the Spark
 Quick Start guide. Be sure to also include *spark-mllib* to your build 
file as
 a dependency.
+
+
+### Hierarchical Clustering
+
+MLlib supports
+[hierarchical 
clustering](http://en.wikipedia.org/wiki/Hierarchical_clustering), one of the 
most commonly used clustering algorithm which seeks to build a hierarchy of 
clusters.
+Strategies for hierarchical clustering generally fall into two types.
+One is the agglomerative clustering which is a bottom up approach: each 
observation starts in its own cluster, and pairs of clusters are merged as one 
moves up the hierarchy.
+The other is the divisive clustering which is a top down approach: all 
observations start in one cluster, and splits are performed recursively as one 
moves down the hierarchy.
+The MLlib implementation only includes a divisive hierarchical clustering 
algorithm.
+
+The implementation in MLlib has the following parameters:
+
+* *k* is the number of maximum desired clusters. 
+* *subIterations* is the maximum number of iterations to split a cluster 
to its 2 sub clusters.
+* *numRetries* is the maximum number of retries if a splitting doesn't 
work as expected.
+* *epsilon* determines the saturate threshold to consider the splitting to 
have converged.
+
+
+
+### Hierarchical Clustering Example
+
+<div class="codetabs">
+
+<div data-lang="scala" markdown="1">
+The following code snippets can be executed in `spark-shell`.
+
+In the following example after loading and parsing data, 
+we use the hierarchical clustering object to cluster the sample data into 
three clusters. 
+The number of desired clusters is passed to the algorithm. 
+Hoerver, even though the number of clusters is less than *k* in the middle 
of the clustering,
--- End diff --

Hoerver -> However, and 'not be splitted' -> 'not be split'





[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

2014-10-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1567#issuecomment-60266002
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22073/consoleFull)
 for   PR 1567 at commit 
[`76f474e`](https://github.com/apache/spark/commit/76f474e41a172d5128f99c9ae71c7b802b9114fa).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class GroupingSet(bitmasks: Seq[Int], `
  * `case class Cube(groupByExprs: Seq[Expression],`
  * `case class Rollup(groupByExprs: Seq[Expression],`
  * `case class VirtualColumn(name: String, dataType: DataType = 
StringType, nullable: Boolean = false)`
  * `case class GroupingSetExpansion(`
  * `case class GroupingSetExpansion(`






[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

2014-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1567#issuecomment-60266012
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22073/
Test PASSed.





[GitHub] spark pull request: [SPARK-4052][SQL] Use scala.collection.Map for...

2014-10-23 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/2899#discussion_r19288406
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
 ---
@@ -18,6 +18,7 @@
 package org.apache.spark.sql.hive.execution
 
 import scala.collection.JavaConversions._
+import scala.collection.Map
--- End diff --

I think it's better to use `scala.collection.Map` explicitly in the code 
below, and add a comment to explain why. Another reason putting this line here 
is dangerous: imports can easily be reorganized automatically by IDEs, which 
are sometimes not smart enough.
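A hedged illustration of the suggestion (hypothetical code, not from the PR): writing `scala.collection.Map` fully qualified at the use site keeps the intent explicit even if an IDE reorders or prunes the imports.

```scala
object QualifiedMapExample {
  // Fully qualified at the use site: the intent survives automatic import
  // reorganization, and no wildcard import can silently change which Map
  // this refers to.
  def lookup(m: scala.collection.Map[String, Int], key: String): Option[Int] =
    m.get(key)
}

// scala.collection.Map is the common supertype of both mutable and immutable
// maps, so either can be passed without conversion.
val mutableMap = scala.collection.mutable.Map("a" -> 1)
val immutableMap = Map("b" -> 2)
```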





[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2014-10-23 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2906#discussion_r19288604
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala
 ---
@@ -0,0 +1,549 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import breeze.linalg.{DenseVector => BDV, Vector => BV, norm => breezeNorm}
+import org.apache.spark.Logging
+import org.apache.spark.SparkContext._
+import org.apache.spark.mllib.linalg.{Vector, Vectors}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.util.random.XORShiftRandom
+
+/**
+ * the configuration for a hierarchical clustering algorithm
+ *
+ * @param numClusters the number of clusters you want
+ * @param subIterations the number of iterations at digging
+ * @param epsilon the threshold to stop the sub-iterations
+ * @param randomSeed uses in sampling data for initializing centers in 
each sub iterations
+ * @param randomRange the range coefficient to generate random points in 
each clustering step
+ */
+class HierarchicalClusteringConf(
+  private var numClusters: Int,
+  private var subIterations: Int,
+  private var numRetries: Int,
+  private var epsilon: Double,
+  private var randomSeed: Int,
+  private[mllib] var randomRange: Double) extends Serializable {
+
+  def this() = this(20, 5, 20, 10E-6, 1, 0.1)
+
+  def setNumClusters(numClusters: Int): this.type = {
--- End diff --

This may be my Scala ignorance, but if the constructor params aren't 
private, don't you get setters for free? I see you're going for a fluent style, 
and that makes sense, but I don't know if the other conf-like or algo-like 
classes do this. Pretty minor and I could be wrong, but consider whether it's 
worth the code and consistency issue.
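To make the trade-off concrete, here is a hypothetical side-by-side sketch (class names invented): public `var` constructor parameters do give field accessors for free, while explicit setters returning `this.type`, as in the reviewed code, allow chained configuration.

```scala
// Style 1: public vars. Scala synthesizes getter/setter accessors, but plain
// assignments cannot be chained into one expression.
class SimpleConf(var numClusters: Int = 20, var epsilon: Double = 1e-6)

// Style 2: fluent setters returning this.type, as in the reviewed code.
class FluentConf(private var numClusters: Int = 20,
                 private var epsilon: Double = 1e-6) {
  def setNumClusters(k: Int): this.type = { numClusters = k; this }
  def setEpsilon(eps: Double): this.type = { epsilon = eps; this }
  def getNumClusters: Int = numClusters
  def getEpsilon: Double = epsilon
}

// Fluent style: configure in a single chained expression.
val conf = new FluentConf().setNumClusters(3).setEpsilon(1e-4)
```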





[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2014-10-23 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2906#discussion_r19288634
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala
 ---
@@ -0,0 +1,549 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import breeze.linalg.{DenseVector => BDV, Vector => BV, norm => breezeNorm}
+import org.apache.spark.Logging
+import org.apache.spark.SparkContext._
+import org.apache.spark.mllib.linalg.{Vector, Vectors}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.util.random.XORShiftRandom
+
+/**
+ * the configuration for a hierarchical clustering algorithm
+ *
+ * @param numClusters the number of clusters you want
+ * @param subIterations the number of iterations at digging
+ * @param epsilon the threshold to stop the sub-iterations
+ * @param randomSeed uses in sampling data for initializing centers in 
each sub iterations
+ * @param randomRange the range coefficient to generate random points in 
each clustering step
+ */
+class HierarchicalClusteringConf(
+  private var numClusters: Int,
+  private var subIterations: Int,
+  private var numRetries: Int,
+  private var epsilon: Double,
+  private var randomSeed: Int,
+  private[mllib] var randomRange: Double) extends Serializable {
+
+  def this() = this(20, 5, 20, 10E-6, 1, 0.1)
+
+  def setNumClusters(numClusters: Int): this.type = {
+this.numClusters = numClusters
+this
+  }
+
+  def getNumClusters(): Int = this.numClusters
+
+  def setSubIterations(iterations: Int): this.type = {
+this.subIterations = iterations
+this
+  }
+
+  def setNumRetries(numRetries: Int): this.type = {
+this.numRetries = numRetries
+this
+  }
+
+  def getNumRetries(): Int = this.numRetries
+
+  def getSubIterations(): Int = this.subIterations
+
+  def setEpsilon(epsilon: Double): this.type = {
+this.epsilon = epsilon
+this
+  }
+
+  def getEpsilon(): Double = this.epsilon
+
+  def setRandomSeed(seed: Int): this.type = {
+this.randomSeed = seed
+this
+  }
+
+  def getRandomSeed(): Int = this.randomSeed
+
+  def setRandomRange(range: Double): this.type = {
+this.randomRange = range
+this
+  }
+}
+
+
+/**
+ * This is a divisive hierarchical clustering algorithm based on bi-sect 
k-means algorithm.
+ *
+ * @param conf the configuration class for the hierarchical clustering
+ */
+class HierarchicalClustering(val conf: HierarchicalClusteringConf)
+extends Serializable with Logging {
+
+  /**
+   * Constructs with the default configuration
+   */
+  def this() = this(new HierarchicalClusteringConf())
+
+  /**
+   * Trains a hierarchical clustering model with the given configuration
+   *
+   * @param data training points
+   * @return a model for hierarchical clustering
+   */
+  def run(data: RDD[Vector]): HierarchicalClusteringModel = {
+validateData(data)
+logInfo(sRun with ${conf.toString})
--- End diff --

Trivial, but can this be just `$conf`? And similarly for the other format strings.
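The point being made, illustrated with a made-up `Conf` case class: the `s` interpolator already calls `toString` on interpolated values, so `${conf.toString}` and `$conf` produce identical strings.

```scala
case class Conf(numClusters: Int, epsilon: Double)

val conf = Conf(20, 1e-6)
// The interpolator invokes toString implicitly; both forms are equivalent,
// and the shorter `$conf` is the idiomatic choice.
val verbose = s"Run with ${conf.toString}"
val concise = s"Run with $conf"
```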





[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2014-10-23 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2906#discussion_r19288713
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala
 ---
@@ -0,0 +1,549 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import breeze.linalg.{DenseVector => BDV, Vector => BV, norm => breezeNorm}
+import org.apache.spark.Logging
+import org.apache.spark.SparkContext._
+import org.apache.spark.mllib.linalg.{Vector, Vectors}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.util.random.XORShiftRandom
+
+/**
+ * the configuration for a hierarchical clustering algorithm
+ *
+ * @param numClusters the number of clusters you want
+ * @param subIterations the number of iterations at digging
+ * @param epsilon the threshold to stop the sub-iterations
+ * @param randomSeed uses in sampling data for initializing centers in 
each sub iterations
+ * @param randomRange the range coefficient to generate random points in 
each clustering step
+ */
+class HierarchicalClusteringConf(
+  private var numClusters: Int,
+  private var subIterations: Int,
+  private var numRetries: Int,
+  private var epsilon: Double,
+  private var randomSeed: Int,
+  private[mllib] var randomRange: Double) extends Serializable {
+
+  def this() = this(20, 5, 20, 10E-6, 1, 0.1)
+
+  def setNumClusters(numClusters: Int): this.type = {
+    this.numClusters = numClusters
+    this
+  }
+
+  def getNumClusters(): Int = this.numClusters
+
+  def setSubIterations(iterations: Int): this.type = {
+    this.subIterations = iterations
+    this
+  }
+
+  def setNumRetries(numRetries: Int): this.type = {
+    this.numRetries = numRetries
+    this
+  }
+
+  def getNumRetries(): Int = this.numRetries
+
+  def getSubIterations(): Int = this.subIterations
+
+  def setEpsilon(epsilon: Double): this.type = {
+    this.epsilon = epsilon
+    this
+  }
+
+  def getEpsilon(): Double = this.epsilon
+
+  def setRandomSeed(seed: Int): this.type = {
+    this.randomSeed = seed
+    this
+  }
+
+  def getRandomSeed(): Int = this.randomSeed
+
+  def setRandomRange(range: Double): this.type = {
+    this.randomRange = range
+    this
+  }
+}
+
+
+/**
+ * This is a divisive hierarchical clustering algorithm based on the bisecting k-means algorithm.
+ *
+ * @param conf the configuration class for the hierarchical clustering
+ */
+class HierarchicalClustering(val conf: HierarchicalClusteringConf)
+  extends Serializable with Logging {
+
+  /**
+   * Constructs with the default configuration
+   */
+  def this() = this(new HierarchicalClusteringConf())
+
+  /**
+   * Trains a hierarchical clustering model with the given configuration
+   *
+   * @param data training points
+   * @return a model for hierarchical clustering
+   */
+  def run(data: RDD[Vector]): HierarchicalClusteringModel = {
+    validateData(data)
+    logInfo(s"Run with ${conf.toString}")
+
+    val startTime = System.currentTimeMillis() // to measure the execution time
+    val clusterTree = ClusterTree.fromRDD(data) // make the root node
+    val model = new HierarchicalClusteringModel(clusterTree)
+    val statsUpdater = new ClusterTreeStatsUpdater()
+
+    var node: Option[ClusterTree] = Some(model.clusterTree)
+    statsUpdater(node.get)
+
+    // Stop the training when the following conditions are satisfied:
+    //   1. There is no splittable cluster
+    //   2. The number of split clusters is greater than the requested number of clusters
+    //   3. The total variance of all clusters increases when a cluster is split
+    var totalVariance = Double.MaxValue
+    var newTotalVariance = 
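The three stopping conditions in the quoted comment can be condensed into a single predicate. The sketch below is illustrative only: `shouldStop` and its parameters are stand-ins, not the PR's actual `ClusterTree` classes or fields.

```scala
// Hedged sketch of the training loop's stopping test described in the
// quoted comment. All names and the plain-Double variances are stand-ins.
object StoppingCriteriaSketch {
  def shouldStop(
      splittableClusters: Int,   // clusters that can still be divided
      numClusters: Int,          // clusters produced so far
      maxClusters: Int,          // the requested number of clusters
      totalVariance: Double,     // total variance before the last split
      newTotalVariance: Double   // total variance after the last split
  ): Boolean =
    splittableClusters == 0 ||            // 1. no splittable cluster left
    numClusters > maxClusters ||          // 2. already split into enough clusters
    newTotalVariance > totalVariance      // 3. splitting increased total variance

  def main(args: Array[String]): Unit = {
    assert(shouldStop(0, 3, 20, 5.0, 4.0))   // no splittable cluster
    assert(shouldStop(2, 21, 20, 5.0, 4.0))  // too many clusters
    assert(shouldStop(2, 3, 20, 5.0, 6.0))   // variance increased
    assert(!shouldStop(2, 3, 20, 5.0, 4.0))  // otherwise keep training
    println("ok")
  }
}
```

Condition 3 is what makes the division "hierarchical but bounded": a split that worsens total variance signals the tree has reached useful granularity.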

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2014-10-23 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2906#discussion_r19288686
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala
 ---
(quoted diff omitted: identical to the HierarchicalClustering.scala diff quoted above)

[GitHub] spark pull request: [SPARK-4055][MLlib] Inconsistent spelling 'MLl...

2014-10-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/2903


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2014-10-23 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2906#discussion_r19288793
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala
 ---
(quoted diff omitted: identical to the HierarchicalClustering.scala diff quoted above)

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2014-10-23 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2906#discussion_r19288871
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala
 ---
(quoted diff omitted: identical to the HierarchicalClustering.scala diff quoted above)

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2014-10-23 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2906#discussion_r19289138
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/HierarchicalClustering.scala
 ---
(quoted diff omitted: identical to the HierarchicalClustering.scala diff quoted above)

[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2014-10-23 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2906#discussion_r19289245
  
--- Diff: 
mllib/src/test/java/org/apache/spark/mllib/clustering/JavaHierarchicalClusteringSuite.java
 ---
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering;
+
+import com.google.common.collect.Lists;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.mllib.linalg.Vector;
+import org.apache.spark.mllib.linalg.Vectors;
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.Serializable;
+import java.util.List;
+
+import static org.junit.Assert.assertEquals;
+
+public class JavaHierarchicalClusteringSuite implements Serializable {
+    private transient JavaSparkContext sc;
--- End diff --

Looks like this is using 4-space indent but should be 2.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...

2014-10-23 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/2906#issuecomment-60268305
  
I just gave this a quick read-through, and the structure makes sense. I 
left several small comments. I see the chunks of logic I would expect, but did 
not evaluate it in detail. The existence of some tests suggests this probably 
basically works :) I am wondering about performance too as this relies on Scala 
idioms in many places; it might be worth a quick look with jprofiler if you can 
to see if there are any easy-win optimizations.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3911] [SQL] HiveSimpleUdf can not be op...

2014-10-23 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/2771#discussion_r19289432
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/QueryTest.scala ---
@@ -74,4 +76,30 @@ class QueryTest extends FunSuite {
   .stripMargin)
 }
   }
+
+  // The following copy is copied from 
org.apache.spark.sql.catalyst.plans.PlanTest
--- End diff --

How about making `QueryTest` inherit from `PlanTest` instead? Just like 
what we did in another `PlanTest` in `sql/core`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4026][Streaming] Write ahead log manage...

2014-10-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2882#issuecomment-60274484
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22075/consoleFull)
 for   PR 2882 at commit 
[`d29fddd`](https://github.com/apache/spark/commit/d29fddd880fd7efec8ed05017a12600bcb2aa829).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-3883 SSL support for HttpServer and Akka

2014-10-23 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/2739#issuecomment-60275209
  
Hi @jacek-lewandowski,

Now that I finally noticed you built this on top of branch-1.1, some of the 
choices you made make a lot more sense. (I always assume people are working on 
master, since it's generally preferable to add new features to master first.)

One huge difference in master, which lead to a lot of my comments, is 
SPARK-2098. That fix added the ability of all daemons - including Master and 
Worker - to read the spark-defaults.conf file. So, if you build on top of that, 
you need zero code dealing with loading config data, and can rely on SparkConf 
for everything. Then, you could have something like:

class SSLOptions(conf: SparkConf, module: String)

That would load options like this:

    sslEnabled = conf.getOption(s"spark.$module.ssl.enabled")
      .orElse(conf.getOption("spark.ssl.enabled"))
      .getOrElse(false)

Then you have module-specific configuration and a global fallback. What do 
you think?
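The module-specific-with-global-fallback lookup suggested above can be sketched without a Spark dependency. This is a hedged illustration: `SparkConf` is stubbed with a plain `Map`, and `sslEnabled` is an illustrative name, not the actual `SSLOptions` API.

```scala
// Hedged sketch of the per-module config lookup with a global fallback.
// A Map stands in for SparkConf; the real code would call conf.getOption.
object SSLFallbackSketch {
  type Conf = Map[String, String]

  def sslEnabled(conf: Conf, module: String): Boolean =
    conf.get(s"spark.$module.ssl.enabled")     // module-specific key first
      .orElse(conf.get("spark.ssl.enabled"))   // then the global key
      .exists(_.toBoolean)                     // default: disabled

  def main(args: Array[String]): Unit = {
    val conf = Map(
      "spark.ssl.enabled" -> "true",        // global default
      "spark.akka.ssl.enabled" -> "false")  // module override
    assert(!sslEnabled(conf, "akka"))  // module-specific value wins
    assert(sslEnabled(conf, "fs"))     // falls back to the global value
    assert(!sslEnabled(Map.empty, "fs")) // neither key set: disabled
    println("ok")
  }
}
```

The same two-step `orElse` chain generalizes to keystore paths, passwords, and protocols, so each daemon can override only what differs from the global SSL settings.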

On the subject of distributing the configuration, I think it's sort of ok 
to rely on that, for the time being, for standalone mode. Long term, it would 
be better to allow each job to be able to distribute its own configuration, so 
that it's easy for admins and users to use different certificates for the 
daemons and for the jobs, for example.

On Yarn, I still believe we should not have this requirement - since when 
using Spark-on-Yarn, Spark is kind of a client-side thing and shouldn't require 
any changes in the cluster. The needed files should be distributed 
automatically by Spark and made available to executors. That should be doable 
by disabling certificate validation (so that the hostnames don't matter) or 
using wildcard certificates (assuming everything is in the same sub-domain). If 
that's not enough to cover all user cases, we can leave other enhancements for 
later.

I'm not familiar enough with mesos to be able to suggest anything.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-3359 [DOCS] sbt/sbt unidoc doesn't work ...

2014-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2909#issuecomment-60276342
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22074/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-3359 [DOCS] sbt/sbt unidoc doesn't work ...

2014-10-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2909#issuecomment-60276331
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22074/consoleFull)
 for   PR 2909 at commit 
[`f62c347`](https://github.com/apache/spark/commit/f62c347e2df9d7e63653c2bf42004e86f7a80b27).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4065] Add check for IPython on Windows

2014-10-23 Thread msjgriffiths
GitHub user msjgriffiths opened a pull request:

https://github.com/apache/spark/pull/2910

[SPARK-4065] Add check for IPython on Windows

This issue employs logic similar to the bash launcher (pyspark) to check
if IPYTHON=1, and if so launch ipython with options in IPYTHON_OPTS.
This fix assumes that ipython is available in the system Path, and can
be invoked with a plain ipython command.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/msjgriffiths/spark pyspark-windows

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2910.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2910


commit f076d3b0c4de62001be449c5ce22cae399bf6bde
Author: Michael Griffiths msjgriffi...@gmail.com
Date:   2014-10-23T17:45:13Z

[SPARK-4065] Add check for IPython on Windows

This issue employs logic similar to the bash launcher (pyspark) to check
if IPYTHON=1, and if so launch ipython with options in IPYTHON_OPTS.
This fix assumes that ipython is available in the system Path, and can
be invoked with a plain ipython command.
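The bash-launcher logic this change mirrors can be sketched as follows. This is a hedged sketch: `driver_command` is an illustrative function, not the actual structure of bin/pyspark or the Windows .cmd script.

```shell
# Hedged sketch of the IPython check the PR ports to Windows.
# Variable names follow the pyspark script; the function is illustrative.
driver_command() {
  if [ "${IPYTHON:-0}" = "1" ]; then
    # Launch IPython, passing through any user-supplied IPYTHON_OPTS.
    echo "ipython ${IPYTHON_OPTS}"
  else
    echo "python"
  fi
}

IPYTHON=1 IPYTHON_OPTS="--pylab"
driver_command   # prints: ipython --pylab
```

As the description notes, this assumes `ipython` resolves via the system Path; no absolute path is computed.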




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4065] Add check for IPython on Windows

2014-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2910#issuecomment-60278572
  
Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4006] In long running contexts, we enco...

2014-10-23 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2886#discussion_r19294252
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala ---
@@ -325,22 +325,23 @@ class BlockManagerMasterActor(val isLocal: Boolean, 
conf: SparkConf, listenerBus
 
   private def register(id: BlockManagerId, maxMemSize: Long, slaveActor: 
ActorRef) {
 val time = System.currentTimeMillis()
+
 if (!blockManagerInfo.contains(id)) {
   blockManagerIdByExecutor.get(id.executorId) match {
 case Some(manager) =
-  // A block manager of the same executor already exists.
-  // This should never happen. Let's just quit.
-  logError(Got two different block manager registrations on  + 
id.executorId)
-  System.exit(1)
+  // A block manager of the same executor already exists so remove 
it (assumed dead).
--- End diff --

actually what I meant was to add a comma between "exists" and "so"... It's 
ok, I can fix this myself when I merge it


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


