[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-74630811
  
  [Test build #27622 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27622/consoleFull) for PR 4622 at commit [`6dbec7d`](https://github.com/apache/spark/commit/6dbec7d451511d17f39750815d4a7d11da03561b).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3340] Deprecate ADD_JARS and ADD_FILES

2015-02-17 Thread azagrebin
Github user azagrebin commented on the pull request:

https://github.com/apache/spark/pull/4616#issuecomment-74631484
  
@andrewor14, thanks for the brackets and the credit. I had actually read 
the style guide but forgot about them; sorry for that. I have also created a JIRA 
account with the same username: azagrebin.





[GitHub] spark pull request: [SPARK-5830][Core]Don't create unnecessary dir...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4620#issuecomment-74631726
  
  [Test build #27623 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27623/consoleFull) for PR 4620 at commit [`c0635f1`](https://github.com/apache/spark/commit/c0635f13ce02fbf5920d9a5fb8a312fc16061b55).
 * This patch merges cleanly.





[GitHub] spark pull request: Spark-5708: Add Slf4jSink to Spark Metrics

2015-02-17 Thread judynash
GitHub user judynash opened a pull request:

https://github.com/apache/spark/pull/4644

Spark-5708: Add Slf4jSink to Spark Metrics

Add Slf4jSink to Spark Metrics using Coda Hale's Slf4jReporter. 
This sends metrics to log4j, allowing Spark users to reuse their log4j pipeline 
for metrics collection. 

I reviewed the existing unit tests and didn't see any sink-related tests. Please 
advise on whether tests should be added. 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/judynash/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4644.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4644


commit ef837c0b7c79a21982624cb2954376abf8e6e75b
Author: Judy 
Date:   2015-02-17T08:13:57Z

Spark-5708: Add Slf4jSink to Spark Metrics







[GitHub] spark pull request: Spark-5708: Add Slf4jSink to Spark Metrics

2015-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4644#issuecomment-74632101
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-5852] [SQL] Passdown the schema for Par...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4562#issuecomment-74634826
  
  [Test build #27615 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27615/consoleFull) for PR 4562 at commit [`36978d1`](https://github.com/apache/spark/commit/36978d1835ab6e0266ad3787b33056b573fd59e8).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-5852] [SQL] Passdown the schema for Par...

2015-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4562#issuecomment-74634837
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27615/
Test PASSed.





[GitHub] spark pull request: SPARK-5856: In Maven build script, launch Zinc...

2015-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4643#issuecomment-74635275
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27616/
Test PASSed.





[GitHub] spark pull request: SPARK-5856: In Maven build script, launch Zinc...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4643#issuecomment-74635267
  
  [Test build #27616 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27616/consoleFull) for PR 4643 at commit [`717cfb0`](https://github.com/apache/spark/commit/717cfb055dcdbdf682a1a891e2413ab0d66de211).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: fix DataFrame Python API

2015-02-17 Thread davies
GitHub user davies opened a pull request:

https://github.com/apache/spark/pull/4645

fix DataFrame Python API

1. add explain()
2. add isLocal()
3. do not call show() in __repr__
4. add foreach() and foreachPartition()
5. add distinct()
6. fix functions.col()/column()/lit()
7. fix unit tests in sql/functions.py
8. fix unicode in showString()

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/davies/spark df6

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4645.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4645


commit 6b46a2ce71fd260f59fffc05ce8abcfb3495d4e0
Author: Davies Liu 
Date:   2015-02-17T08:51:16Z

fix DataFrame Python API







[GitHub] spark pull request: fix DataFrame Python API

2015-02-17 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/4645#issuecomment-74635432
  
cc @rxin 





[GitHub] spark pull request: fix DataFrame Python API

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4645#issuecomment-74635732
  
  [Test build #27624 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27624/consoleFull) for PR 4645 at commit [`6b46a2c`](https://github.com/apache/spark/commit/6b46a2ce71fd260f59fffc05ce8abcfb3495d4e0).
 * This patch merges cleanly.





[GitHub] spark pull request: [SQL] [Minor] Deferred table resolving for DF ...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4506#issuecomment-74635723
  
  [Test build #27625 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27625/consoleFull) for PR 4506 at commit [`4c58fbc`](https://github.com/apache/spark/commit/4c58fbc7813a3d6200a8029001798291437635bd).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4602#issuecomment-74636197
  
  [Test build #27617 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27617/consoleFull) for PR 4602 at commit [`f6907d2`](https://github.com/apache/spark/commit/f6907d2bb1c9aca1528e458a9a7fd9a3d58b9309).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class ShowTablesCommand(databaseName: Option[String]) extends RunnableCommand`






[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

2015-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4602#issuecomment-74636202
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27617/
Test PASSed.





[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...

2015-02-17 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/4601#discussion_r24801227
  
--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---
@@ -144,11 +144,24 @@ private[spark] class PythonRDD(
       stream.readFully(update)
       accumulator += Collections.singletonList(update)
     }
+
     // Check whether the worker is ready to be re-used.
-    if (stream.readInt() == SpecialLengths.END_OF_STREAM) {
-      if (reuse_worker) {
-        env.releasePythonWorker(pythonExec, envVars.toMap, worker)
-        released = true
+    if (reuse_worker) {
+      // It has a high possibility that the ending mark is already available,
+      // And current task should not be blocked by checking it
+
+      if (stream.available() >= 4) {
--- End diff --

@JoshRosen This does not work very well in practice: it is common to see 
workers that cannot be re-used. I will try to find a better solution, or 
should we revert this? (It does not seem to have solved the freeze problem.) 
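
To illustrate the trade-off being discussed, here is a minimal sketch (not the code from this PR) of a non-blocking end-mark check on a `DataInputStream`. The names `EndMarkCheck`, `tryReadEndMark`, and `streamOf` are hypothetical, and the value `-4` is only an assumed placeholder for `SpecialLengths.END_OF_STREAM`, which is not shown in this thread.

```scala
import java.io.{ByteArrayInputStream, DataInputStream}
import java.nio.ByteBuffer

object EndMarkCheck {
  // Assumed placeholder for SpecialLengths.END_OF_STREAM (value not shown in this thread).
  val EndOfStream: Int = -4

  // Returns Some(isEndMark) when at least 4 bytes are already buffered,
  // and None when reading an Int now could block the caller.
  def tryReadEndMark(in: DataInputStream): Option[Boolean] =
    if (in.available() >= 4) Some(in.readInt() == EndOfStream)
    else None

  // Helper for the usage example: an in-memory stream holding the given ints.
  def streamOf(values: Int*): DataInputStream = {
    val buf = ByteBuffer.allocate(4 * values.length)
    values.foreach(v => buf.putInt(v))
    new DataInputStream(new ByteArrayInputStream(buf.array()))
  }
}
```

With this shape, a caller that sees `None` cannot tell whether the end mark is merely late or will never arrive, so it has to skip releasing the worker. That may be one reason workers often fail to be re-used with an `available()`-based check.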





[GitHub] spark pull request: SPARK-5841: remove DiskBlockManager shutdown h...

2015-02-17 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/4627#issuecomment-74637135
  
@pwendell @JoshRosen @MattWhelan Let me propose a 'real' fix. I think it 
should be possible to make this correct with a few more lines of code.





[GitHub] spark pull request: [SQL] [Minor] Update the HiveContext Unittest

2015-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4584#issuecomment-74637401
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27618/
Test PASSed.





[GitHub] spark pull request: [SQL] [Minor] Update the HiveContext Unittest

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4584#issuecomment-74637398
  
  [Test build #27618 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27618/consoleFull) for PR 4584 at commit [`e5bdc3a`](https://github.com/apache/spark/commit/e5bdc3a2f1847098f3f663d6e3a336cbdaf50bce).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SQL] [Minor] Update the HiveContext Unittest

2015-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4584#issuecomment-74637498
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27619/
Test PASSed.





[GitHub] spark pull request: [SQL] [Minor] Update the HiveContext Unittest

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4584#issuecomment-74637490
  
  [Test build #27619 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27619/consoleFull) for PR 4584 at commit [`e5bdc3a`](https://github.com/apache/spark/commit/e5bdc3a2f1847098f3f663d6e3a336cbdaf50bce).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

2015-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4602#issuecomment-74637702
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27620/
Test PASSed.





[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4602#issuecomment-74637694
  
  [Test build #27620 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27620/consoleFull) for PR 4602 at commit [`f6907d2`](https://github.com/apache/spark/commit/f6907d2bb1c9aca1528e458a9a7fd9a3d58b9309).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class ShowTablesCommand(databaseName: Option[String]) extends RunnableCommand`






[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-02-17 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-74638488
  
@mengxr I updated the JIRA page. Please take a look. Thanks!





[GitHub] spark pull request: SPARK-5856: In Maven build script, launch Zinc...

2015-02-17 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/4643#issuecomment-74638937
  
LGTM





[GitHub] spark pull request: Fixed overflow on large range with high number...

2015-02-17 Thread JeroenWarmerdam
GitHub user JeroenWarmerdam opened a pull request:

https://github.com/apache/spark/pull/4646

Fixed overflow on large range with high number of partitions

The following use case causes an overflow when creating the partitions 
inside JdbcRDD:
```
val jdbcRDD = new TJdbcRDD(sc, () =>
  DriverManager.getConnection(url, username, password),
  "SELECT id FROM twitter_statuses WHERE ? <= id AND id <= ?",
  lowerBound = 1131544775L,
  upperBound = 567279358897692673L,
  numPartitions = 20,
  mapRow = r => (r.getLong("id"))
)
```

This is fixed by swapping division and multiplication in the creation of 
partitions.
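
As a rough illustration of why the order of operations matters, the sketch below compares a multiply-first bound calculation with a divide-first one using plain `Long` arithmetic. It is not the code from this patch: `PartitionBounds`, `naive`, and `divideFirst` are hypothetical names, and clamping the last partition to `upper` is an assumption of this sketch so that the integer-division remainder is not lost.

```scala
object PartitionBounds {
  // Multiply first: i * length can exceed Long.MaxValue and wrap around
  // when the range is large and there are many partitions.
  def naive(lower: Long, upper: Long, n: Int): Seq[(Long, Long)] = {
    val length = 1 + upper - lower
    (0 until n).map { i =>
      val start = lower + (i * length) / n
      val end = lower + ((i + 1) * length) / n - 1
      (start, end)
    }
  }

  // Divide first: each product stays below the range length, so it cannot wrap.
  // The last partition is clamped to `upper` to absorb the division remainder.
  def divideFirst(lower: Long, upper: Long, n: Int): Seq[(Long, Long)] = {
    val step = (1 + upper - lower) / n
    (0 until n).map { i =>
      val start = lower + i * step
      val end = if (i == n - 1) upper else start + step - 1
      (start, end)
    }
  }
}
```

With the bounds from the example above, the range length is about 5.67e17, so the multiply-first product wraps once the multiplier reaches 17. The divide-first variant trades that for slightly uneven partitions, since the last one absorbs the remainder.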

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/JeroenWarmerdam/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4646.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4646


commit 53f3a67b08c3e980c9dca0eb56fcb5094c8ff7d9
Author: Jeroen Warmerdam 
Date:   2015-02-17T09:21:55Z

Fixed overflow on large range with high number of partitions







[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-74639103
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27622/
Test PASSed.





[GitHub] spark pull request: [SPARK-5858][MLLIB] Remove unnecessary first()...

2015-02-17 Thread mengxr
GitHub user mengxr opened a pull request:

https://github.com/apache/spark/pull/4647

[SPARK-5858][MLLIB] Remove unnecessary first() call in GLM

`numFeatures` is only used by multinomial logistic regression.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mengxr/spark SPARK-5858

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4647.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4647


commit 12c5548cef5d252be0acc3e4eaf4ff75dc639814
Author: Xiangrui Meng 
Date:   2015-02-17T08:45:48Z

check numFeatures only once

commit 036dc7fdf0e346323c8a154ae4394e78b86092cd
Author: Xiangrui Meng 
Date:   2015-02-17T09:20:08Z

remove unnecessary first() call







[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-74639094
  
  [Test build #27622 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27622/consoleFull) for PR 4622 at commit [`6dbec7d`](https://github.com/apache/spark/commit/6dbec7d451511d17f39750815d4a7d11da03561b).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class AffinityPropagationModel(`






[GitHub] spark pull request: Fixed overflow on large range with high number...

2015-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4646#issuecomment-74639191
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-5858][MLLIB] Remove unnecessary first()...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4647#issuecomment-74639277
  
  [Test build #27626 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27626/consoleFull) for PR 4647 at commit [`036dc7f`](https://github.com/apache/spark/commit/036dc7fdf0e346323c8a154ae4394e78b86092cd).
 * This patch merges cleanly.





[GitHub] spark pull request: SPARK-5841 [CORE] [HOTFIX] Memory leak in Disk...

2015-02-17 Thread srowen
GitHub user srowen opened a pull request:

https://github.com/apache/spark/pull/4648

SPARK-5841 [CORE] [HOTFIX] Memory leak in DiskBlockManager  

Avoid removing the shutdown hook from within the shutdown hook itself

CC @pwendell @JoshRosen @MattWhelan
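
For context, `Runtime.removeShutdownHook` throws `IllegalStateException` once the JVM has begun shutting down, so a hook must not try to unregister itself. The following is a minimal sketch of one way to separate the two paths, not the code in this PR; `HookedResource`, `doStop`, and `isStopped` are hypothetical names.

```scala
class HookedResource {
  @volatile private var stopped = false

  // The hook calls doStop() directly, so it never tries to unregister itself
  // while the JVM is already shutting down.
  private val hook = new Thread("hooked-resource-cleanup") {
    override def run(): Unit = doStop()
  }
  Runtime.getRuntime.addShutdownHook(hook)

  // Normal shutdown path: unregister the hook so the JVM no longer holds a
  // reference to this object, then run the actual cleanup.
  def stop(): Unit = {
    Runtime.getRuntime.removeShutdownHook(hook)
    doStop()
  }

  // Cleanup shared by both paths; a real resource would delete files, etc.
  private def doStop(): Unit = { stopped = true }

  def isStopped: Boolean = stopped
}
```

Keeping the unregister step only in `stop()` means explicitly stopped instances do not accumulate as registered hooks, while the hook path stays free of the call that is illegal during shutdown.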

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srowen/spark SPARK-5841.2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4648.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4648


commit 51548dbe1d58b74aeda29086c4e75a7120dfb53d
Author: Sean Owen 
Date:   2015-02-17T09:31:18Z

Avoid call to remove shutdown hook being called from shutdown hook







[GitHub] spark pull request: [SQL] [Minor] Deferred table resolving for DF ...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4506#issuecomment-74640117
  
  [Test build #27625 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27625/consoleFull)
 for   PR 4506 at commit 
[`4c58fbc`](https://github.com/apache/spark/commit/4c58fbc7813a3d6200a8029001798291437635bd).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SQL] [Minor] Deferred table resolving for DF ...

2015-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4506#issuecomment-74640124
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27625/





[GitHub] spark pull request: [SPARK-5830][Core]Don't create unnecessary dir...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4620#issuecomment-74640219
  
  [Test build #27623 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27623/consoleFull)
 for   PR 4620 at commit 
[`c0635f1`](https://github.com/apache/spark/commit/c0635f13ce02fbf5920d9a5fb8a312fc16061b55).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-5830][Core]Don't create unnecessary dir...

2015-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4620#issuecomment-74640231
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27623/





[GitHub] spark pull request: SPARK-5841 [CORE] [HOTFIX] Memory leak in Disk...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4648#issuecomment-74640475
  
  [Test build #27627 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27627/consoleFull)
 for   PR 4648 at commit 
[`51548db`](https://github.com/apache/spark/commit/51548dbe1d58b74aeda29086c4e75a7120dfb53d).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5166][SPARK-5247][SPARK-5258][SQL] API ...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4642#issuecomment-74640566
  
  [Test build #27621 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27621/consoleFull)
 for   PR 4642 at commit 
[`d291c34`](https://github.com/apache/spark/commit/d291c347687da1576ba8fafc855d05f9da3419b1).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-5166][SPARK-5247][SPARK-5258][SQL] API ...

2015-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4642#issuecomment-74640572
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27621/





[GitHub] spark pull request: Fixed overflow on large range with high number...

2015-02-17 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/4646#issuecomment-74641902
  
No, this changes the result! For example, `(3 * 4) / 5 == 2` but 
`3 * (4 / 5) == 0`. It's a good point, though. Why not just cast `i` to `long`? 
(You will need to open a JIRA for this as a minor bug.)
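
To make the arithmetic concrete, here is a minimal sketch in Scala. The names `i`, `total` and `slices` and the three helper methods are hypothetical stand-ins, not taken from the patch. It only illustrates that reordering the operations truncates too early, while widening to `Long` before multiplying avoids the `Int` overflow and keeps the original result.

```scala
object DivisionOrder {
  // Original form: the Int product can wrap around before the division.
  def positionInt(i: Int, total: Int, slices: Int): Int =
    (i * total) / slices

  // Reordered form: cannot overflow, but the division truncates too early.
  def positionReordered(i: Int, total: Int, slices: Int): Int =
    i * (total / slices)

  // Widened form: multiply in Long arithmetic, then divide.
  def positionWidened(i: Int, total: Int, slices: Int): Long =
    (i.toLong * total) / slices

  def main(args: Array[String]): Unit = {
    println(positionInt(3, 4, 5))        // 2
    println(positionReordered(3, 4, 5))  // 0
    println(positionWidened(3, 4, 5))    // 2
    // 1000000 * 100000 does not fit in an Int, so only the widened form is safe here.
    println(positionWidened(1000000, 100000, 7))
  }
}
```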





[GitHub] spark pull request: [SPARK-5830][Core]Don't create unnecessary dir...

2015-02-17 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/4620#issuecomment-74642194
  
@Sephiroth-Lin yes I think this should be directed at SPARK-5801 then. 
SPARK-5830 is a duplicate.
CC @kayousterhout

Does this correct the many levels of extra temp dirs? It sounds like you're 
addressing a case where there is one extra, but I probably misunderstand.





[GitHub] spark pull request: [SPARK-5843] Allowing map-side combine to be s...

2015-02-17 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/4634#issuecomment-74643730
  
@sryza The logical difference is small. `aggregateByKey` is for when you 
have a single immutable 'zero' value to start from for each key. `combineByKey` 
lets the starting value be computed instead by a function of the first value 
for each key. That is useful, for example, if I were trying to combine into a 
`mutable.Set`, since I need to make a different one for each key. Whether or 
not it was worth having different methods, in retrospect, I don't know, but 
that much seems OK since they're there already.

The rest of the difference is just that `combineByKey` exposes control over 
map-side combine and the serializer. That is a little more internal. If there 
is clear evidence this should have been a developer API, then I'd say we can at 
least avoid opening it up in the Java API. But is that clear? Otherwise I'd 
say, well, let's at least shoot for consistency.
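
As a rough illustration of the logical difference, the sketch below models the two shapes on a plain local `Seq`. `aggregateLocally` and `combineLocally` are hypothetical helpers written for this note, not Spark APIs. They leave out the step that merges per-partition results, as well as the map-side combine and serializer options.

```scala
import scala.collection.mutable

object ByKeySketch {
  // aggregateByKey shape: every key starts from the same zero value.
  def aggregateLocally[K, V, U](data: Seq[(K, V)], zero: U)(seqOp: (U, V) => U): Map[K, U] =
    data.foldLeft(Map.empty[K, U]) { case (acc, (k, v)) =>
      acc.updated(k, seqOp(acc.getOrElse(k, zero), v))
    }

  // combineByKey shape: the starting accumulator is built from the first
  // value seen for each key, so each key can get its own fresh container.
  def combineLocally[K, V, C](data: Seq[(K, V)])(
      createCombiner: V => C, mergeValue: (C, V) => C): Map[K, C] =
    data.foldLeft(Map.empty[K, C]) { case (acc, (k, v)) =>
      acc.get(k) match {
        case Some(c) => acc.updated(k, mergeValue(c, v))
        case None    => acc.updated(k, createCombiner(v))
      }
    }

  def main(args: Array[String]): Unit = {
    val pairs = Seq("a" -> 1, "b" -> 2, "a" -> 3)
    // Immutable zero: summing per key.
    println(aggregateLocally(pairs, 0)(_ + _))
    // A distinct mutable.Set per key, created from the first value.
    println(combineLocally[String, Int, mutable.Set[Int]](pairs)(
      v => mutable.Set(v),
      (s, v) => { s += v; s }))
  }
}
```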





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-74644181
  
  [Test build #27628 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27628/consoleFull)
 for   PR 4622 at commit 
[`6cddeb2`](https://github.com/apache/spark/commit/6cddeb2f655fb477d23c1fbe5bf0230e2b97bdce).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5859] [PySpark] [SQL] fix DataFrame Pyt...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4645#issuecomment-74645030
  
  [Test build #27624 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27624/consoleFull)
 for   PR 4645 at commit 
[`6b46a2c`](https://github.com/apache/spark/commit/6b46a2ce71fd260f59fffc05ce8abcfb3495d4e0).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-5859] [PySpark] [SQL] fix DataFrame Pyt...

2015-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4645#issuecomment-74645037
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27624/





[GitHub] spark pull request: [SPARK-5259][CORE]Make sure mapStage.pendingta...

2015-02-17 Thread suyanNone
Github user suyanNone commented on the pull request:

https://github.com/apache/spark/pull/4055#issuecomment-74646530
  
@srowen @JoshRosen can someone verify this patch?





[GitHub] spark pull request: [SPARK-5259][CORE]Make sure mapStage.pendingta...

2015-02-17 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4055#discussion_r24806228
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -483,8 +483,9 @@ private[spark] class TaskSetManager(
   // a good proxy to task serialization time.
   // val timeTaken = clock.getTime() - startTime
   val taskName = s"task ${info.id} in stage ${taskSet.id}"
-  logInfo("Starting %s (TID %d, %s, %s, %d bytes)".format(
-  taskName, taskId, host, taskLocality, serializedTask.limit))
+  logInfo("Starting %s (TID %d, %s, %d, %s, %d bytes)".format(
--- End diff --

Why not string interpolation here?
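
For comparison, a small sketch of the two styles. The values below are hypothetical placeholders, and the message uses the five-argument form from the removed lines, since the arguments of the new call are not shown in the diff.

```scala
object LogMessageSketch {
  // Hypothetical values, for illustration only.
  val taskName = "task 0.0 in stage 1.0"
  val taskId = 42L
  val host = "localhost"
  val taskLocality = "PROCESS_LOCAL"
  val size = 1024

  // format-style: placeholders and arguments must be kept in sync by hand.
  def formatted: String =
    "Starting %s (TID %d, %s, %s, %d bytes)".format(
      taskName, taskId, host, taskLocality, size)

  // s-interpolator: each value sits where it is printed.
  def interpolated: String =
    s"Starting $taskName (TID $taskId, $host, $taskLocality, $size bytes)"

  def main(args: Array[String]): Unit = {
    println(formatted)
    println(interpolated)
  }
}
```

With interpolation, adding a field to the message cannot silently shift the remaining arguments against their placeholders.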





[GitHub] spark pull request: [SPARK-5259][CORE]Make sure mapStage.pendingta...

2015-02-17 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4055#discussion_r24806252
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala ---
@@ -65,4 +65,6 @@ private[spark] class ResultTask[T, U](
   override def preferredLocations: Seq[TaskLocation] = preferredLocs
 
   override def toString = "ResultTask(" + stageId + ", " + partitionId + ")"
+
+  override def canEqual(other: Any): Boolean = other.isInstanceOf[ResultTask[T, U]]
--- End diff --

Yes, that's very slightly better. I agree.





[GitHub] spark pull request: [SPARK-5259][CORE]Make sure mapStage.pendingta...

2015-02-17 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4055#discussion_r24806282
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala ---
@@ -65,4 +65,6 @@ private[spark] class ResultTask[T, U](
   override def preferredLocations: Seq[TaskLocation] = preferredLocs
 
   override def toString = "ResultTask(" + stageId + ", " + partitionId + ")"
+
+  override def canEqual(other: Any): Boolean = other.isInstanceOf[ResultTask[T, U]]
--- End diff --

So `equals` is not overridden in these subclasses because equality does not 
depend on their additional fields? Just checking that this is definitely 
desirable.
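
A sketch of what that arrangement might look like, using hypothetical classes `BaseTask` and `OutputTask` rather than the real `Task` hierarchy, which is not shown in this diff. One assumption of mine: because the subclass does not override `equals`, the base `equals` here checks `canEqual` in both directions, otherwise a subclass instance would equal a base instance but not the other way round.

```scala
object CanEqualSketch {
  // Hypothetical base class: equality depends only on the two ids.
  class BaseTask(val stageId: Int, val partitionId: Int) {
    def canEqual(other: Any): Boolean = other.isInstanceOf[BaseTask]

    override def equals(other: Any): Boolean = other match {
      case that: BaseTask =>
        // Both directions are checked so that equality stays symmetric even
        // though subclasses only override canEqual.
        that.canEqual(this) && this.canEqual(that) &&
          stageId == that.stageId && partitionId == that.partitionId
      case _ => false
    }

    override def hashCode: Int = 31 * stageId + partitionId
  }

  // Hypothetical subclass: its extra field is ignored by equals, but the
  // narrowed canEqual keeps it from equalling a plain BaseTask.
  class OutputTask(stageId: Int, partitionId: Int, val outputId: Int)
      extends BaseTask(stageId, partitionId) {
    override def canEqual(other: Any): Boolean = other.isInstanceOf[OutputTask]
  }

  def main(args: Array[String]): Unit = {
    println(new OutputTask(1, 2, 0) == new OutputTask(1, 2, 9)) // true
    println(new OutputTask(1, 2, 0) == new BaseTask(1, 2))      // false
    println(new BaseTask(1, 2) == new OutputTask(1, 2, 0))      // false
  }
}
```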





[GitHub] spark pull request: SPARK-4588 [MLLIB] [WIP] Add API for feature a...

2015-02-17 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/4460#issuecomment-74648170
  
So is the idea that `FeatureAttributes` becomes `AttributeGroup`, and that 
it continues to contain many `Attribute`s? I didn't realize that we intended 
the vector-valued features to be whole schemas within themselves. So they may 
be `AttributeGroup`s too and so an `AttributeGroup` is an `Attribute` too. 
Makes sense.

Rename `FeatureType`? And what's its value for `AttributeGroup`? `GROUP` or 
`null`?

You could imagine a more elaborate hierarchy of types: _discrete_ is a 
special case of _continuous_, _ordinal_ is a special case of _discrete_. It's 
nice to have that expressiveness; it adds somewhat to the complexity for the 
caller and the code. Maybe you could argue that the schema should force an 
interpretation for the algorithm. But I kind of like it. The type objects would 
have methods like `isContinuous`, `isCategorical`. Should I make a fuller 
hierarchy or stick to adding `BINARY`?
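
To make the "fuller hierarchy" option concrete, here is one possible reading as a sketch. `FeatureKind` and its members are hypothetical names, not part of the PR. Treating categorical as a separate root with binary as its special case is my assumption, since the comment above does not say where those fit.

```scala
object FeatureKindSketch {
  // Each kind optionally names the more general kind it specializes.
  sealed abstract class FeatureKind(val parent: Option[FeatureKind]) {
    // True if this kind is `kind` itself or a special case of it.
    def isA(kind: FeatureKind): Boolean =
      (this eq kind) || parent.exists(_.isA(kind))

    def isContinuous: Boolean = isA(Continuous)
    def isCategorical: Boolean = isA(Categorical)
  }

  // From the comment: discrete specializes continuous, ordinal specializes discrete.
  case object Continuous extends FeatureKind(None)
  case object Discrete extends FeatureKind(Some(Continuous))
  case object Ordinal extends FeatureKind(Some(Discrete))

  // Assumption: categorical is a separate root and binary specializes it.
  case object Categorical extends FeatureKind(None)
  case object Binary extends FeatureKind(Some(Categorical))

  def main(args: Array[String]): Unit = {
    println(Ordinal.isContinuous)  // true, via Discrete
    println(Binary.isCategorical)  // true
    println(Binary.isContinuous)   // false
  }
}
```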





[GitHub] spark pull request: [SPARK-5826][Streaming] Fix Configuration not ...

2015-02-17 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/4612#issuecomment-74648878
  
OK LGTM. I suppose a field is not generated here as it's never used outside 
the constructor and it need not be `private`. Looks like a clean fix, we've 
reviewed it, tests pass.
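
The point about field generation can be checked with a little reflection. The sketch below uses two hypothetical classes, not the class touched by this PR, and reflects my understanding of how the Scala compiler treats a plain constructor parameter.

```scala
object CtorParamSketch {
  // `conf` is only read while the constructor runs, so no field is kept for it.
  class OnlyInConstructor(conf: String) {
    val length: Int = conf.length
  }

  // `conf` is read from a method after construction, so it has to be stored.
  class UsedInMethod(conf: String) {
    def length: Int = conf.length
  }

  // Names of the fields the compiler actually emitted for a class.
  def fieldNames(c: Class[_]): Set[String] =
    c.getDeclaredFields.map(_.getName).toSet

  def main(args: Array[String]): Unit = {
    println(fieldNames(classOf[OnlyInConstructor]))
    println(fieldNames(classOf[UsedInMethod]))
  }
}
```

If no field is emitted, the parameter is not part of the object's state, so it does not travel with the instance when it is serialized.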





[GitHub] spark pull request: [SPARK-5858][MLLIB] Remove unnecessary first()...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4647#issuecomment-74648960
  
  [Test build #27626 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27626/consoleFull)
 for   PR 4647 at commit 
[`036dc7f`](https://github.com/apache/spark/commit/036dc7fdf0e346323c8a154ae4394e78b86092cd).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-5858][MLLIB] Remove unnecessary first()...

2015-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4647#issuecomment-74648972
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27626/





[GitHub] spark pull request: [SPARK-5826][Streaming] Fix Configuration not ...

2015-02-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/4612





[GitHub] spark pull request: [SPARK-5825] [Spark Submit] Remove the double ...

2015-02-17 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/4611#issuecomment-74650236
  
Gotcha. OK. Until someone thinks of a more robust check, I think we can 
resort to just checking if `ps -p $pid -o comm=` is `java`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-5841 [CORE] [HOTFIX] Memory leak in Disk...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4648#issuecomment-74650240
  
  [Test build #27627 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27627/consoleFull)
 for   PR 4648 at commit 
[`51548db`](https://github.com/apache/spark/commit/51548dbe1d58b74aeda29086c4e75a7120dfb53d).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: SPARK-5841 [CORE] [HOTFIX] Memory leak in Disk...

2015-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4648#issuecomment-74650245
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27627/





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-74653294
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27628/





[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-74653282
  
  [Test build #27628 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27628/consoleFull)
 for   PR 4622 at commit 
[`6cddeb2`](https://github.com/apache/spark/commit/6cddeb2f655fb477d23c1fbe5bf0230e2b97bdce).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class AffinityPropagationModel(`






[GitHub] spark pull request: [Minor][SQL] Use same function to check path p...

2015-02-17 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/4649

[Minor][SQL] Use same function to check path parameter in JSONRelation



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 use_checkpath

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4649.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4649


commit 0f9a1a172d7684a9eeb3898a674cce34a9d8e277
Author: Liang-Chi Hsieh 
Date:   2015-02-17T11:23:25Z

Use same function to check path parameter.







[GitHub] spark pull request: [Minor][SQL] Use same function to check path p...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4649#issuecomment-74653911
  
  [Test build #27629 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27629/consoleFull)
 for   PR 4649 at commit 
[`0f9a1a1`](https://github.com/apache/spark/commit/0f9a1a172d7684a9eeb3898a674cce34a9d8e277).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5852] [SQL] Passdown the schema for Par...

2015-02-17 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/4562#issuecomment-74662087
  
Hey @chenghao-intel @yhuai, sorry I didn't notice this PR earlier, and I 
believe this issue has been fixed in #4563 
([here](https://github.com/apache/spark/pull/4563/files#diff-c69b9e667e93b7e4693812cc72abb65fR245)).





[GitHub] spark pull request: [SPARK-5852] [SQL] Passdown the schema for Par...

2015-02-17 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/4562#discussion_r24813338
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -208,14 +208,14 @@ private[hive] class HiveMetastoreCatalog(hive: HiveContext) extends Catalog with
 ParquetRelation2(
   paths,
   Map(ParquetRelation2.METASTORE_SCHEMA -> metastoreSchema.json),
-  None,
+  Some(metastoreSchema),
   Some(partitionSpec))(hive))
 } else {
   val paths = Seq(metastoreRelation.hiveQlTable.getDataLocation.toString)
-  LogicalRelation(
-ParquetRelation2(
+  LogicalRelation(ParquetRelation2(
   paths,
-  Map(ParquetRelation2.METASTORE_SCHEMA -> metastoreSchema.json))(hive))
+  Map(ParquetRelation2.METASTORE_SCHEMA -> metastoreSchema.json),
+  Some(metastoreSchema))(hive))
--- End diff --

Yeah, evil case insensitivity...





[GitHub] spark pull request: [Minor][SQL] Use same function to check path p...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4649#issuecomment-74662835
  
  [Test build #27629 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27629/consoleFull)
 for   PR 4649 at commit 
[`0f9a1a1`](https://github.com/apache/spark/commit/0f9a1a172d7684a9eeb3898a674cce34a9d8e277).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [Minor][SQL] Use same function to check path p...

2015-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4649#issuecomment-74662840
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27629/
Test PASSed.





[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation

2015-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4650#issuecomment-74679144
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-5688][MLLIB] Randomize splits for categ...

2015-02-17 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/4475#issuecomment-74679228
  
Oops, I also missed this behavior. BTW I think you will need to close the 
PR yourself if you want to close this.





[GitHub] spark pull request: RDF Loader added + documentation

2015-02-17 Thread lukovnikov
GitHub user lukovnikov opened a pull request:

https://github.com/apache/spark/pull/4650

RDF Loader added + documentation

Have been testing it with DBpedia dumps, works well so far.
Any help with custom partitioning and optimization is welcome.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lukovnikov/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4650.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4650


commit 10436d252ad4876d28c91c77036e3d993050438a
Author: lukovnikov 
Date:   2015-02-03T19:41:58Z

fast forward from upstream

commit 595aed098fb423514b73263f96dfcaf1edbc72f5
Author: lukovnikov 
Date:   2015-02-03T21:41:00Z

dictionary builder done

commit c2399023825e804476527f7e159b182a1b5c91c8
Author: lukovnikov 
Date:   2015-02-03T21:44:07Z

[SPARK 5280]

commit f14e4835cf365fcbe5dd0979e61464b7cecb8774
Author: lukovnikov 
Date:   2015-02-03T22:50:06Z

done dictionary version

commit 43cc53ab6d99a4a96a0764cc306f38fdce3a7e00
Author: lukovnikov 
Date:   2015-02-03T23:25:07Z

[SPARK 5280] rdfloader using hashes as VertexIds

commit 2e1220d0938aee7d190439253e3b9bb1e73c77e8
Author: lukovnikov 
Date:   2015-02-04T00:04:48Z

cleaned up + fixed style
TODO: test + comment

commit 54e2c6eb24dade70753320a3ab2b3a64fef7a6d4
Author: lukovnikov 
Date:   2015-02-04T00:26:30Z

made custom 64bit hash

commit b454560508c9d50c60e067d7e67405ca1e13c165
Author: lukovnikov 
Date:   2015-02-04T00:32:57Z

proper

commit 45a9f57695e76c09c20fa99a1010168f63ef1da8
Author: lukovnikov 
Date:   2015-02-03T19:41:58Z

fast forward from upstream

commit 6ee9a2b675d06675b5b591f16e8d52e63d2dc049
Author: lukovnikov 
Date:   2015-02-03T21:41:00Z

dictionary builder done

commit 45c22160c52111066109f57a0d773aca211c2068
Author: lukovnikov 
Date:   2015-02-03T21:44:07Z

[SPARK 5280]

commit fa5c0da9ea4f6ca662406b380432901022d6de55
Author: lukovnikov 
Date:   2015-02-03T22:50:06Z

done dictionary version

commit c036f98476e96ac03124f758ed7f17c4a464cf86
Author: lukovnikov 
Date:   2015-02-03T23:25:07Z

[SPARK 5280] rdfloader using hashes as VertexIds

commit 57553797f7404e686674b0bfb39d80bb24d6520c
Author: lukovnikov 
Date:   2015-02-04T00:04:48Z

cleaned up + fixed style
TODO: test + comment

commit e00123eae4a84108af2c84cf253b1f4fb1fb69f1
Author: lukovnikov 
Date:   2015-02-04T00:26:30Z

made custom 64bit hash

commit 6af9a7ad6198174597ae7d86ec5c15fc8467a082
Author: lukovnikov 
Date:   2015-02-04T00:32:57Z

proper

commit 1ee34c9474bcf4500edecb08a848d15f3549055d
Author: lukovnikov 
Date:   2015-02-04T03:31:05Z

Merge branch 'master' of github.com:lukovnikov/spark into rdfloaderhash

commit 9000a4713d286d5078c16f62b5fadf480941bc82
Author: lukovnikov 
Date:   2015-02-04T03:31:18Z

Merge branch 'rdfloaderhash' of github.com:lukovnikov/spark into 
rdfloaderhash

commit 70eb725a102ae711a59c6d45794d191c18778c4b
Author: lukovnikov 
Date:   2015-02-04T23:02:48Z

RDF Loader with hash, tested on small RDF dumps (more tests in progress)

commit 4398d93712777442ba0f2e8920423fcdd7b67d1f
Author: Denis 
Date:   2015-02-04T23:27:01Z

added documentation for RDFLoader

commit 273a1b30dee1630333e0f7e683378b6dbb13c3a5
Author: Denis 
Date:   2015-02-04T23:29:05Z

small update to RDFLoader description

commit 202ccf86901c3d2435564e544f90d6a49cda66fb
Author: lukovnikov 
Date:   2015-02-04T23:31:10Z

sdf

commit 2d990cec1d48f62f4f1d9f9cf8082308a4eaf9e4
Author: lukovnikov 
Date:   2015-02-03T19:41:58Z

fast forward from upstream

commit 4a9b6222176749bee4a14e4b6d035b665c6ac7ea
Author: lukovnikov 
Date:   2015-02-04T23:43:31Z

Merge branch 'master' of github.com:lukovnikov/spark

commit 062996c45d0443836c1b4b2bb714d8f459ea6980
Author: lukovnikov 
Date:   2015-02-04T23:43:52Z

Merge branch 'rdfloaderhash'

commit 121bf14140573349424e7888da13ee2e8ea4f6f0
Author: lukovnikov 
Date:   2015-02-04T23:45:48Z

[SPARK 5280]

commit 67ada514b98292ff647d8354545d37cc111499ba
Author: lukovnikov 
Date:   2015-02-04T23:47:21Z

Merge branch 'rdfloaderhash' of github.com:lukovnikov/spark into 
rdfloaderhash

commit e5fcf758c0e4b54a38b2a01709681e11bbb6eae8
Author: lukovnikov 
Date:   2015-02-04T23:47:45Z

Merge branch 'rdfloaderhash'

commit c5960af7b14d65b1d290c3af11d722075a54ad2d
Author: lukovnikov 
Date:   2015-02-04T23:54:37Z

Merge remote-tracking branch 'upstream/master'

commit 91361f3f760dbc78467f8e2b87a1d77061aa59de
Author: lukovnikov 
Date:   2015-02-05T00:01:33Z

undone unnecessary changes





[GitHub] spark pull request: [SPARK-5688][MLLIB] Randomize splits for categ...

2015-02-17 Thread edenovit
Github user edenovit commented on the pull request:

https://github.com/apache/spark/pull/4475#issuecomment-74678578
  
Makes sense. I missed the fact that unordered features with 'high' 
cardinality were treated as ordered. This actually has the expected behavior. 
Feel free to go ahead and close it.





[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3074#issuecomment-74680089
  
  [Test build #27630 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27630/consoleFull)
 for   PR 3074 at commit 
[`a8929c4`](https://github.com/apache/spark/commit/a8929c498ca2a0f995f03da6e158343bac475145).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5688][MLLIB] Randomize splits for categ...

2015-02-17 Thread edenovit
Github user edenovit commented on the pull request:

https://github.com/apache/spark/pull/4475#issuecomment-74680531
  
Done :)





[GitHub] spark pull request: [SPARK-5862][SQL] Only transformUp the given p...

2015-02-17 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/4651

[SPARK-5862][SQL] Only transformUp the given plan once in 
HiveMetastoreCatalog

Currently, `ParquetConversions` in `HiveMetastoreCatalog` calls `transformUp` on
the given plan multiple times when there are many Metastore Parquet tables. Since
`transformUp` is recursive, it is better to perform it only once.
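
To illustrate the cost being described, here is a minimal sketch in plain Scala. It does not use Spark's Catalyst classes: `Plan`, `Table`, `Parquet`, `Join`, and the standalone `transformUp` helper are hypothetical stand-ins, and the "one traversal per table" shape of `convertPerTable` is an assumption about what "multiple times" means here, not a copy of the actual code in this pull request.

```scala
object TransformOnceSketch {
  // Hypothetical, simplified plan nodes (not Catalyst's LogicalPlan).
  sealed trait Plan
  case class Table(name: String) extends Plan
  case class Parquet(name: String) extends Plan
  case class Join(left: Plan, right: Plan) extends Plan

  // Counts how many nodes are visited across all traversals.
  var visited = 0

  // Bottom-up rewrite: children are rewritten first, then the node itself.
  def transformUp(plan: Plan)(rule: PartialFunction[Plan, Plan]): Plan = {
    visited += 1
    val withNewChildren: Plan = plan match {
      case Join(l, r) => Join(transformUp(l)(rule), transformUp(r)(rule))
      case leaf       => leaf
    }
    rule.applyOrElse(withNewChildren, (p: Plan) => p)
  }

  // Assumed "before" shape: one full traversal for every table to convert.
  def convertPerTable(plan: Plan, tables: Set[String]): Plan =
    tables.foldLeft(plan) { (current, t) =>
      transformUp(current) { case Table(n) if n == t => Parquet(n) }
    }

  // "After" shape: a single traversal whose rule handles every table.
  def convertOnce(plan: Plan, tables: Set[String]): Plan =
    transformUp(plan) { case Table(n) if tables.contains(n) => Parquet(n) }

  def main(args: Array[String]): Unit = {
    val plan = Join(Join(Table("a"), Table("b")), Table("c"))
    val tables = Set("a", "b", "c")

    visited = 0
    val perTable = convertPerTable(plan, tables)
    val perTableVisits = visited

    visited = 0
    val once = convertOnce(plan, tables)
    val onceVisits = visited

    println(s"per-table visits: $perTableVisits, single-pass visits: $onceVisits")
    println(s"same result: ${perTable == once}")
  }
}
```

With three tables in a five-node plan, the per-table version visits 15 nodes while the single-pass version visits 5, and both produce the same rewritten plan.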

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 parquet_atonce

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4651.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4651


commit e0f919bb68bcdd589538a6ed719e1f85e1db4800
Author: Liang-Chi Hsieh 
Date:   2015-02-17T15:13:47Z

Only transformUp the given plan once.







[GitHub] spark pull request: [SPARK-5688][MLLIB] Randomize splits for categ...

2015-02-17 Thread edenovit
Github user edenovit closed the pull request at:

https://github.com/apache/spark/pull/4475





[GitHub] spark pull request: [SPARK-5862][SQL] Only transformUp the given p...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4651#issuecomment-74684464
  
  [Test build #27631 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27631/consoleFull)
 for   PR 4651 at commit 
[`e0f919b`](https://github.com/apache/spark/commit/e0f919bb68bcdd589538a6ed719e1f85e1db4800).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...

2015-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3074#issuecomment-74686697
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27630/
Test FAILed.





[GitHub] spark pull request: [SPARK-5862][SQL] Only transformUp the given p...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4651#issuecomment-74688099
  
  [Test build #27631 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27631/consoleFull)
 for   PR 4651 at commit 
[`e0f919b`](https://github.com/apache/spark/commit/e0f919bb68bcdd589538a6ed719e1f85e1db4800).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-5862][SQL] Only transformUp the given p...

2015-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4651#issuecomment-74688109
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27631/
Test FAILed.





[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3074#issuecomment-74686687
  
  [Test build #27630 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27630/consoleFull)
 for   PR 3074 at commit 
[`a8929c4`](https://github.com/apache/spark/commit/a8929c498ca2a0f995f03da6e158343bac475145).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...

2015-02-17 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/4601#issuecomment-74690195
  
Reverted in `master` (1.4.0), `branch-1.3` (1.3.0), and `branch-1.2` 
(1.2.2).





[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...

2015-02-17 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4601#discussion_r24824713
  
--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---
@@ -144,11 +144,24 @@ private[spark] class PythonRDD(
 stream.readFully(update)
 accumulator += Collections.singletonList(update)
   }
+
   // Check whether the worker is ready to be re-used.
-  if (stream.readInt() == SpecialLengths.END_OF_STREAM) {
-if (reuse_worker) {
-  env.releasePythonWorker(pythonExec, envVars.toMap, worker)
-  released = true
+  if (reuse_worker) {
+// It has a high possibility that the ending mark is already available,
+// And current task should not be blocked by checking it
+
+if (stream.available() >= 4) {
--- End diff --

Yeah, let's revert and continue to investigate.
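
For context, the check under discussion in the diff above can be sketched in isolation. This is only an illustration of the "read the marker only if four bytes are already buffered" idea, using plain `java.io` streams; `markerIfReady` and the `EndOfStream` value are hypothetical names and placeholders, not Spark's actual code. Note that `InputStream.available()` is documented as an estimate of the bytes readable without blocking, so a check like this can be unreliable on some stream types.

```scala
import java.io.{ByteArrayInputStream, DataInputStream}
import java.nio.ByteBuffer

object NonBlockingMarkerSketch {
  // Placeholder marker value, assumed for illustration only.
  val EndOfStream: Int = -4

  // Reads a 4-byte big-endian int only when at least 4 bytes are reported
  // as available; otherwise returns None instead of blocking on readInt().
  def markerIfReady(stream: DataInputStream): Option[Int] =
    if (stream.available() >= 4) Some(stream.readInt()) else None

  // Helper that wraps a byte array in a DataInputStream.
  def streamOf(bytes: Array[Byte]): DataInputStream =
    new DataInputStream(new ByteArrayInputStream(bytes))

  def main(args: Array[String]): Unit = {
    val full = streamOf(ByteBuffer.allocate(4).putInt(EndOfStream).array())
    val partial = streamOf(Array[Byte](0, 1))
    println(markerIfReady(full))    // marker is fully buffered
    println(markerIfReady(partial)) // fewer than 4 bytes buffered
  }
}
```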





[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3074#issuecomment-74692249
  
  [Test build #27632 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27632/consoleFull)
 for   PR 3074 at commit 
[`7d67148`](https://github.com/apache/spark/commit/7d671482f7ba6e909e9f3f4c3b3dc694c95285fc).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5862][SQL] Only transformUp the given p...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4651#issuecomment-74698372
  
  [Test build #27633 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27633/consoleFull)
 for   PR 4651 at commit 
[`c1ed29d`](https://github.com/apache/spark/commit/c1ed29d80c2d285b3938fed8607d244f4377b7b8).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...

2015-02-17 Thread squito
Github user squito commented on the pull request:

https://github.com/apache/spark/pull/2851#issuecomment-74698831
  
Hi @CodingCat 

Thanks for making all the updates. Sorry, I hadn't realized the subtlety with
`Int` vs `Long` on the `RDDBlockId` and `BroadcastBlockId`. Still, I think it's
a good change for simplifying the code, and I'm glad you figured out a way to
make it work. On issue (2), the memory usage of a worker, I think what you have
is correct: it's supposed to be the memory usage of the particular object on the
worker. This is on a UI page in the context of a particular object; there is
a separate page that summarizes the memory usage of the executor overall (though
I agree the UI is a little confusing).

I'll make a few more minor comments on the code, but mostly I just want to
get another opinion on the right events to use for passing the block-added
events around, as I mentioned above.





[GitHub] spark pull request: SPARK-5856: In Maven build script, launch Zinc...

2015-02-17 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/4643#issuecomment-74698870
  
LGTM





[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...

2015-02-17 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/2851#discussion_r24829369
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/SparkListenerBus.scala ---
@@ -24,7 +24,13 @@ import org.apache.spark.util.ListenerBus
  */
 private[spark] trait SparkListenerBus extends ListenerBus[SparkListener, SparkListenerEvent] {
 
+  private[spark] var filter: DefaultSparkListenerEventFilter = new DefaultSparkListenerEventFilter
+
   override def onPostEvent(listener: SparkListener, event: SparkListenerEvent): Unit = {
+if (!filter.validate(event)) {
+  return  
--- End diff --

I think the `DefaultSparkListenerEventFilter` is probably adding an
abstraction without any really good reason. If we do stick with the new
`SparkListenerBlockUpdate` events, I think it's better if you just put the
filter into the right method on the event logging listener and remove
`DefaultSparkListenerEventFilter`.





[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...

2015-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3074#issuecomment-74699492
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27632/
Test FAILed.





[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3074#issuecomment-74699476
  
  [Test build #27632 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27632/consoleFull)
 for   PR 3074 at commit 
[`7d67148`](https://github.com/apache/spark/commit/7d671482f7ba6e909e9f3f4c3b3dc694c95285fc).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...

2015-02-17 Thread hellertime
Github user hellertime commented on the pull request:

https://github.com/apache/spark/pull/3074#issuecomment-74700554
  
Missed that EasyMock is no longer the mocking kit. Gotta fix up my tests.





[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...

2015-02-17 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/2851#discussion_r24830276
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/SparkListenerBus.scala ---
@@ -24,7 +24,13 @@ import org.apache.spark.util.ListenerBus
  */
 private[spark] trait SparkListenerBus extends ListenerBus[SparkListener, SparkListenerEvent] {
 
+  private[spark] var filter: DefaultSparkListenerEventFilter = new DefaultSparkListenerEventFilter
+
   override def onPostEvent(listener: SparkListener, event: SparkListenerEvent): Unit = {
+if (!filter.validate(event)) {
+  return  
--- End diff --

Also, I think there is still one missing piece to get the JSON into the
event logging for the history server. You need to add an implementation of
`onBlockUpdate` in `EventLoggingListener`. I am suggesting that you just put
this filter into that implementation and get rid of the `Filter` abstraction.
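
A rough sketch of the suggested shape, in plain Scala rather than Spark's real listener classes: the bus dispatches every event, and the one listener that cares about filtering does it inside its own callback. All names here (`Event`, `JobStart`, `BlockUpdate`, `Listener`, `LoggingListener`, `post`) are hypothetical, and the rule "only log broadcast blocks" is invented purely for illustration; the thread does not say what the real filter checks.

```scala
import scala.collection.mutable.ArrayBuffer

object ListenerFilterSketch {
  // Hypothetical event types standing in for SparkListenerEvent subclasses.
  sealed trait Event
  case class JobStart(id: Int) extends Event
  case class BlockUpdate(blockId: String) extends Event

  // Hypothetical listener interface with no-op defaults.
  trait Listener {
    def onJobStart(e: JobStart): Unit = {}
    def onBlockUpdate(e: BlockUpdate): Unit = {}
  }

  // The bus has no filter of its own: every event reaches every listener.
  def post(listeners: Seq[Listener], event: Event): Unit = event match {
    case e: JobStart    => listeners.foreach(_.onJobStart(e))
    case e: BlockUpdate => listeners.foreach(_.onBlockUpdate(e))
  }

  // The filtering decision lives in the listener method that needs it.
  class LoggingListener extends Listener {
    val logged: ArrayBuffer[String] = ArrayBuffer.empty[String]

    override def onJobStart(e: JobStart): Unit =
      logged += s"JobStart(${e.id})"

    override def onBlockUpdate(e: BlockUpdate): Unit =
      // Invented rule: skip everything except broadcast blocks.
      if (e.blockId.startsWith("broadcast_")) logged += s"BlockUpdate(${e.blockId})"
  }

  def main(args: Array[String]): Unit = {
    val logger = new LoggingListener
    val events = Seq(JobStart(1), BlockUpdate("rdd_0_0"), BlockUpdate("broadcast_3"))
    events.foreach(e => post(Seq(logger), e))
    logger.logged.foreach(println)
  }
}
```

Keeping the rule inside the listener means other listeners on the same bus still see every event, which is the main difference from filtering in `onPostEvent`.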





[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3074#issuecomment-74702659
  
  [Test build #27634 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27634/consoleFull)
 for   PR 3074 at commit 
[`a99e4b8`](https://github.com/apache/spark/commit/a99e4b8e3b6dad6962969dcebe1e3460a04e2c84).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...

2015-02-17 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/2851#discussion_r24831053
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala ---
@@ -522,7 +523,9 @@ private[spark] class BlockManagerInfo(
 logInfo("Removed %s on %s on tachyon (size: %s)".format(
   blockId, blockManagerId.hostPort, Utils.bytesToString(blockStatus.tachyonSize)))
   }
+  return BlockStatus(storageLevel, 0, 0, 0)
 }
+null
--- End diff --

You don't need `return` here: the last expression of each block is its
value, so this could be:

```
if (storageLevel.isValid) {
  ...
  _blocks.get(blockId)
} else if (_blocks.containsKey(blockId)) {
  ...
  BlockStatus(storageLevel, 0, 0, 0)
} else {
  null
}
```
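
As a self-contained illustration of the same point, the sketch below compares an early-`return` version with an expression-style version. `BlockStatus` here is a hypothetical simplified case class, and the two boolean parameters stand in for `storageLevel.isValid` and `_blocks.containsKey(blockId)`; none of this is the real `BlockManagerInfo` code.

```scala
object LastExpressionSketch {
  // Hypothetical, simplified stand-in for Spark's BlockStatus.
  case class BlockStatus(level: String, memSize: Long, diskSize: Long, tachyonSize: Long)

  // Early-return style, similar in shape to the diff under review.
  def updateWithReturn(isValid: Boolean, tracked: Boolean): BlockStatus = {
    if (isValid) {
      return BlockStatus("MEMORY_ONLY", 10L, 0L, 0L)
    }
    if (tracked) {
      return BlockStatus("NONE", 0L, 0L, 0L)
    }
    null
  }

  // Expression style: the if/else chain itself is the method's value.
  def updateAsExpression(isValid: Boolean, tracked: Boolean): BlockStatus =
    if (isValid) {
      BlockStatus("MEMORY_ONLY", 10L, 0L, 0L)
    } else if (tracked) {
      BlockStatus("NONE", 0L, 0L, 0L)
    } else {
      null
    }

  def main(args: Array[String]): Unit = {
    // Both versions agree on every combination of inputs.
    for (isValid <- Seq(true, false); tracked <- Seq(true, false)) {
      val same = updateWithReturn(isValid, tracked) == updateAsExpression(isValid, tracked)
      println(s"isValid=$isValid tracked=$tracked same=$same")
    }
  }
}
```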





[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...

2015-02-17 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/2851#discussion_r24831483
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala ---
@@ -522,7 +523,9 @@ private[spark] class BlockManagerInfo(
 logInfo("Removed %s on %s on tachyon (size: %s)".format(
   blockId, blockManagerId.hostPort, Utils.bytesToString(blockStatus.tachyonSize)))
   }
+  return BlockStatus(storageLevel, 0, 0, 0)
 }
+null
--- End diff --

actually, can you explain the `null` case?  how does that happen, and won't 
that result in an NPE when it gets to your code in `StorageStatusListener` and 
in `JsonProtocol`?
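
One possible way around this, assuming the callers can be changed to accept an `Option`, is to make the "nothing to report" case explicit. `BlockStat`, `updated` and `describe` below are hypothetical names, not existing Spark code:

```scala
// Hypothetical stand-in for BlockStatus.
final case class BlockStat(memSize: Long, diskSize: Long)

object OptionSketch {
  // None replaces the bare `null`, so callers must handle the missing case.
  def updated(valid: Boolean, known: Boolean, current: BlockStat): Option[BlockStat] =
    if (valid) Some(current)
    else if (known) Some(BlockStat(0L, 0L))
    else None

  // A caller in the style of a listener: no dereference of a possibly-null value.
  def describe(status: Option[BlockStat]): String =
    status
      .map(s => s"mem=${s.memSize} disk=${s.diskSize}")
      .getOrElse("no update")
}
```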





[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...

2015-02-17 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/2851#discussion_r24831712
  
--- Diff: core/src/main/scala/org/apache/spark/storage/RDDInfo.scala ---
@@ -21,13 +21,14 @@ import org.apache.spark.annotation.DeveloperApi
 import org.apache.spark.rdd.RDD
 import org.apache.spark.util.Utils
 
+trait InMemoryObjectInfo
--- End diff --

you don't really need this trait at all





[GitHub] spark pull request: [SPARK-5843] Allowing map-side combine to be s...

2015-02-17 Thread sryza
Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/4634#issuecomment-74704920
  
@srowen `aggregateByKey` will already make a copy of the object for each 
key so a mutable zero value is fine.  The `seqOp` argument to `aggregateByKey 





[GitHub] spark pull request: [SPARK-5843] Allowing map-side combine to be s...

2015-02-17 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/4634#issuecomment-74705635
  
Ah, true that. Well, there's not much difference at all eh. `combineByKey` 
is the lowest-level method and its separate utility is marginal; I suppose it 
gives access to exactly these settings like map-side combine. It's public, for 
better or worse, and doesn't do much harm other than taking up room in the API. 
In the name of consistency it seems OK to make it available equally in the Java 
API. If starting over, yeah, I'd question why both of these exist.
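
To make the overlap concrete, here is a toy single-machine model on plain Scala collections. It is only a sketch, not Spark's implementation: `CombineSketch` and both method signatures are assumptions made for illustration, and `mergeCombiners` is left out because there is only one "partition". The `zero` thunk models the per-key copy of the zero value mentioned above.

```scala
object CombineSketch {
  // Toy model of combineByKey: fold the pairs into a per-key combiner.
  def combineByKey[K, V, C](data: Seq[(K, V)])(
      createCombiner: V => C,
      mergeValue: (C, V) => C): Map[K, C] =
    data.foldLeft(Map.empty[K, C]) { case (acc, (k, v)) =>
      val next = acc.get(k) match {
        case Some(c) => mergeValue(c, v) // key seen before: merge into its combiner
        case None    => createCombiner(v) // first value for this key
      }
      acc.updated(k, next)
    }

  // aggregateByKey expressed through combineByKey: the first value for a key
  // is folded into a fresh zero obtained from `zero()`.
  def aggregateByKey[K, V, U](data: Seq[(K, V)])(zero: () => U)(
      seqOp: (U, V) => U): Map[K, U] =
    combineByKey[K, V, U](data)(v => seqOp(zero(), v), seqOp)
}
```

In this model the only thing `combineByKey` adds is a free choice of `createCombiner`; everything else `aggregateByKey` already covers.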





[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...

2015-02-17 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/2851#discussion_r24833405
  
--- Diff: core/src/main/scala/org/apache/spark/storage/RDDInfo.scala ---
@@ -49,9 +50,40 @@ class RDDInfo(
   }
 }
 
+
 private[spark] object RDDInfo {
   def fromRdd(rdd: RDD[_]): RDDInfo = {
 val rddName = Option(rdd.name).getOrElse(rdd.id.toString)
 new RDDInfo(rdd.id, rddName, rdd.partitions.size, rdd.getStorageLevel)
   }
 }
+
+@DeveloperApi
+class BroadcastInfo(
+val id: Long,
+val name: String,
+val numPartitions: Int) extends Ordered[BroadcastInfo] with 
InMemoryObjectInfo {
+
+  var memSize = 0L
+  var diskSize = 0L
+  var tachyonSize = 0L
+
+  override def toString = {
+import Utils.bytesToString
+("%s\" (%d) ; " +
+  "MemorySize: %s; TachyonSize: %s; DiskSize: %s").format(
+name, id, bytesToString(memSize), bytesToString(tachyonSize), 
bytesToString(diskSize))
+  }
+
+  override def compare(that: BroadcastInfo): Int = {
+if (this.id > that.id) {
+  1
+} else {
+  if (this.id == that.id) {
+return 0
+  }
+  -1
--- End diff --

super minor:
```
if (this.id > that.id) {
  1
} else if (this.id == that.id) {
  0
} else {
  -1
}
```
(sorry I was wrong w/ the earlier suggestion)





[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...

2015-02-17 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/2851#discussion_r24833566
  
--- Diff: core/src/main/scala/org/apache/spark/storage/StorageUtils.scala 
---
@@ -271,4 +368,19 @@ private[spark] object StorageUtils {
 blockLocations
   }
 
+
+  /**
+   * Return a mapping from block ID to its locations for each block that 
belongs to the given RDD.
--- End diff --

broadcast block, not RDD





[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...

2015-02-17 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2851#discussion_r24833573
  
--- Diff: core/src/main/scala/org/apache/spark/storage/RDDInfo.scala ---
@@ -49,9 +50,40 @@ class RDDInfo(
   }
 }
 
+
 private[spark] object RDDInfo {
   def fromRdd(rdd: RDD[_]): RDDInfo = {
 val rddName = Option(rdd.name).getOrElse(rdd.id.toString)
 new RDDInfo(rdd.id, rddName, rdd.partitions.size, rdd.getStorageLevel)
   }
 }
+
+@DeveloperApi
+class BroadcastInfo(
+val id: Long,
+val name: String,
+val numPartitions: Int) extends Ordered[BroadcastInfo] with 
InMemoryObjectInfo {
+
+  var memSize = 0L
+  var diskSize = 0L
+  var tachyonSize = 0L
+
+  override def toString = {
+import Utils.bytesToString
+("%s\" (%d) ; " +
+  "MemorySize: %s; TachyonSize: %s; DiskSize: %s").format(
+name, id, bytesToString(memSize), bytesToString(tachyonSize), 
bytesToString(diskSize))
+  }
+
+  override def compare(that: BroadcastInfo): Int = {
+if (this.id > that.id) {
+  1
+} else {
+  if (this.id == that.id) {
+return 0
+  }
+  -1
--- End diff --

If you really want a suggestion here :-):

com.google.common.primitives.Longs.compare(this.id, that.id)
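
If pulling in Guava just for this feels heavy, a JDK-only variant (swapped in here for the Guava call) is `java.lang.Long.compare`, available since Java 7. A minimal sketch, with `Info` as a hypothetical cut-down class rather than the real `BroadcastInfo`:

```scala
// Hypothetical minimal class; only the ordering logic is of interest.
final class Info(val id: Long) extends Ordered[Info] {
  // Long.compare returns a negative, zero, or positive Int and cannot
  // overflow, unlike the `(this.id - that.id).toInt` shortcut.
  override def compare(that: Info): Int =
    java.lang.Long.compare(this.id, that.id)
}
```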





[GitHub] spark pull request: [SPARK-5843] Allowing map-side combine to be s...

2015-02-17 Thread sryza
Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/4634#issuecomment-74709911
  
If there's no conceivable reason why someone would want to use 
`combineByKey`, isn't including it in the Java API just going to confuse 
developers? 





[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...

2015-02-17 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/2851#discussion_r24833690
  
--- Diff: 
core/src/main/scala/org/apache/spark/ui/storage/InMemoryObjectPage.scala ---
@@ -0,0 +1,123 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ui.storage
+
+import javax.servlet.http.HttpServletRequest
+
+import org.apache.spark.storage._
+import org.apache.spark.ui.{UIUtils, WebUIPage}
+import org.apache.spark.util.Utils
+
+import scala.xml.Node
+
+private[ui] abstract class InMemoryObjectPage(pageName: String, parent: 
StorageTab)
--- End diff --

RDDs aren't necessarily in memory ... maybe a better name would be 
`StorageDetailPage`?




