[GitHub] spark pull request: [SPARK-3481][SQL] removes the evil MINOR HACK

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2505#issuecomment-56480427
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20690/consoleFull)
 for   PR 2505 at commit 
[`4874ec8`](https://github.com/apache/spark/commit/4874ec83912e1b885b4b7e4cdbc0dfbdf5c83a45).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-22 Thread mridulm
Github user mridulm commented on the pull request:

https://github.com/apache/spark/pull/1486#issuecomment-56480392
  
Are we proposing to introduce HDFS caching tags/idioms directly into 
TaskSetManager in this PR?
That does not look right. We need to generalize this so that any RDD can 
specify process/host (maybe rack as well?) preference annotations. 
Once that is done, HadoopRDD can leverage it.

Depending on an underscore never appearing in the name is fragile.
One option would be to define our own URI scheme, with the default 
reverting to host-only locality.
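
A rough sketch of the contrast being described - parsing a locality level 
out of the name by convention versus using a small URI scheme. The helper 
names and the "hdfs-cache" scheme are hypothetical, not code from this PR:

```scala
// Left: convention-based parsing that assumes '_' never appears in the
// host name. Right: a URI scheme, where a missing scheme reverts to
// plain host-level locality.
def parseByUnderscore(pref: String): (String, String) =
  pref.split("_", 2) match {
    case Array(level, host) => (level, host)   // e.g. "cache_node1"
    case Array(host)        => ("host", host)  // bare host name
  }

def parseByUri(pref: String): (String, String) = {
  val uri = new java.net.URI(pref)
  val level = Option(uri.getScheme).getOrElse("host") // default: host only
  (level, Option(uri.getHost).getOrElse(uri.getPath))
}
```

A host name that legitimately contains an underscore silently changes the 
meaning under the convention-based form, while the URI form stays 
unambiguous.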







[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-22 Thread sryza
Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-56480298
  
This looks good to me.





[GitHub] spark pull request: [SPARK-3481][SQL] removes the evil MINOR HACK

2014-09-22 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/2505#issuecomment-56480117
  
ok to test





[GitHub] spark pull request: [SPARK-3481][SQL] removes the evil MINOR HACK

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2505#issuecomment-56480056
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-3481][SQL] removes the evil MINOR HACK

2014-09-22 Thread scwf
GitHub user scwf opened a pull request:

https://github.com/apache/spark/pull/2505

[SPARK-3481][SQL] removes the evil MINOR HACK

 a follow up of https://github.com/apache/spark/pull/2377 and 
https://github.com/apache/spark/pull/2352, see detail there.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/scwf/spark patch-6

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2505.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2505


commit 4874ec83912e1b885b4b7e4cdbc0dfbdf5c83a45
Author: wangfei 
Date:   2014-09-23T06:07:48Z

removes the evil MINOR HACK







[GitHub] spark pull request: SPARK-3172 and SPARK-3577

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2504#issuecomment-56479827
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20689/consoleFull)
 for   PR 2504 at commit 
[`c854514`](https://github.com/apache/spark/commit/c854514d81b4830ce1f1109662a713c51e6c8023).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class WriteMetrics extends Serializable `






[GitHub] spark pull request: SPARK-3172 and SPARK-3577

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2504#issuecomment-56479829
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20689/





[GitHub] spark pull request: SPARK-3172 and SPARK-3577

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2504#issuecomment-56479669
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20689/consoleFull)
 for   PR 2504 at commit 
[`c854514`](https://github.com/apache/spark/commit/c854514d81b4830ce1f1109662a713c51e6c8023).
 * This patch merges cleanly.





[GitHub] spark pull request: SPARK-3172 and SPARK-3577

2014-09-22 Thread sryza
GitHub user sryza opened a pull request:

https://github.com/apache/spark/pull/2504

SPARK-3172 and SPARK-3577

The posted patch addresses both SPARK-3172 and SPARK-3577.  It renames 
ShuffleWriteMetrics to WriteMetrics and uses it for tracking all three of 
shuffle write, spilling on the fetch side, and spilling on the write side 
(which only occurs during sort-based shuffle).

I'll fix and add tests if people think restructuring the metrics in this 
way makes sense.

I'm a little unsure about the name shuffleReadSpillMetrics, as spilling 
happens during aggregation, not read, but I had trouble coming up with 
something better.

I'm also unsure about which columns would be most useful to display in the 
UI - I remember some pushback on adding new columns.  Ultimately these 
metrics will be most helpful if they can tell users whether and by how much 
they need to increase the number of partitions or increase 
spark.shuffle.memoryFraction.  Reporting spill time tells users whether 
spilling is significantly impacting performance.  Reporting spilled memory 
size can help users understand how much needs to be done to avoid spilling.
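
The restructuring described above could look roughly like the following. 
This is a hedged sketch: the field names and the container class (everything 
other than the WriteMetrics name itself) are assumptions for illustration, 
not the actual patch:

```scala
// One generic metrics holder reused for shuffle output and both spill
// paths, replacing a shuffle-specific ShuffleWriteMetrics.
class WriteMetrics extends Serializable {
  var bytesWritten: Long = 0L
  var writeTime: Long = 0L // time spent writing, in nanoseconds

  def incBytesWritten(n: Long): Unit = { bytesWritten += n }
  def incWriteTime(ns: Long): Unit = { writeTime += ns }
}

// Hypothetical container showing the three uses discussed in this PR.
class TaskWriteMetrics extends Serializable {
  val shuffleWriteMetrics = new WriteMetrics      // ordinary shuffle output
  val shuffleReadSpillMetrics = new WriteMetrics  // spills during aggregation
  val shuffleWriteSpillMetrics = new WriteMetrics // spills in sort-based shuffle
}
```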

@pwendell any thoughts on this?


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sryza/spark sandy-spark-3172

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2504.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2504


commit c854514d81b4830ce1f1109662a713c51e6c8023
Author: Sandy Ryza 
Date:   2014-09-23T05:58:18Z

SPARK-3172 and SPARK-3577







[GitHub] spark pull request: [SPARK-3649] Remove GraphX custom serializers

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2503#issuecomment-56478252
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20688/consoleFull)
 for   PR 2503 at commit 
[`a49c2ad`](https://github.com/apache/spark/commit/a49c2ad67f2bf79ae10e9ef696605c64b0c0ed97).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3649] Remove GraphX custom serializers

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2503#issuecomment-56478260
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20688/





[GitHub] spark pull request: add a util method for changing the log level w...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2433#issuecomment-56478099
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20687/consoleFull)
 for   PR 2433 at commit 
[`cdb3bfc`](https://github.com/apache/spark/commit/cdb3bfc1ab74d3b2c3dfec38dc23118bc05ed922).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-2017] [SPARK-2016] Web UI responsivenes...

2014-09-22 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/1682#issuecomment-56478100
  
I've opened [SPARK-3644](https://issues.apache.org/jira/browse/SPARK-3644) 
as a forum for discussing the design of a REST API; sorry for the delay (got 
busy with other work / bug fixing).





[GitHub] spark pull request: add a util method for changing the log level w...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2433#issuecomment-56478104
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20687/





[GitHub] spark pull request: [SPARK-3634] [PySpark] User's module should ta...

2014-09-22 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2492#issuecomment-56477603
  
> Understood, this side-effect is bit dangerous. The third-package could 
appear in sys.path in any order

Are you worried about a user adding a Python module whose name conflicts 
with a built-in module, thereby shadowing it?  I think this is a general Python 
problem that can occur even without `sys.path` manipulation, which is why it's 
bad to have top-level modules that have the same name as built-in ones (and 
also why relative imports can be bad): 
http://www.evanjones.ca/python-name-clashes.html





[GitHub] spark pull request: [SPARK-3649] Remove GraphX custom serializers

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2503#issuecomment-56476058
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20688/consoleFull)
 for   PR 2503 at commit 
[`a49c2ad`](https://github.com/apache/spark/commit/a49c2ad67f2bf79ae10e9ef696605c64b0c0ed97).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3649] Remove GraphX custom serializers

2014-09-22 Thread ankurdave
GitHub user ankurdave opened a pull request:

https://github.com/apache/spark/pull/2503

[SPARK-3649] Remove GraphX custom serializers

As [reported][1] on the mailing list, GraphX throws

```
java.lang.ClassCastException: java.lang.Long cannot be cast to scala.Tuple2
at 
org.apache.spark.graphx.impl.RoutingTableMessageSerializer$$anon$1$$anon$2.writeObject(Serializers.scala:39)
 
at 
org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:195)
 
at 
org.apache.spark.util.collection.ExternalSorter.spillToMergeableFile(ExternalSorter.scala:329)
```

when sort-based shuffle attempts to spill to disk. This is because GraphX 
defines custom serializers for shuffling pair RDDs that assume Spark will 
always serialize the entire pair object rather than breaking it up into its 
components. However, the spill code path in sort-based shuffle [violates this 
assumption][2].

GraphX uses the custom serializers to compress vertex ID keys using 
variable-length integer encoding. However, since the serializer can no longer 
rely on the key and value being serialized and deserialized together, 
performing such encoding would require writing a tag byte.

Instead, this PR simply removes the custom serializers. This causes a 10% 
slowdown for PageRank (494 s to 543 s; 3 trials, 10 iterations per trial, 
uk-2007-05 graph, 16 r3.2xlarge nodes).

[1]: 
http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassCastException-java-lang-Long-cannot-be-cast-to-scala-Tuple2-td13926.html#a14501
[2]: 
https://github.com/apache/spark/blob/f9d6220c792b779be385f3022d146911a22c2130/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala#L329
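
The encoding trade-off mentioned above can be illustrated with a minimal 
base-128 varint, the kind of variable-length integer encoding used for 
vertex IDs (a sketch, not the removed GraphX code). Once keys and values 
may be deserialized independently, each stream element would additionally 
need a tag byte identifying what follows, cutting into the savings:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}

// Write a Long as a base-128 varint: 7 payload bits per byte, with the
// high bit set on every byte except the last.
def writeVarLong(v: Long, out: ByteArrayOutputStream): Unit = {
  var x = v
  while ((x & ~0x7FL) != 0L) {
    out.write(((x & 0x7FL) | 0x80L).toInt)
    x >>>= 7
  }
  out.write(x.toInt)
}

// Read back a varint by accumulating 7 bits at a time until a byte with
// the high bit clear is seen.
def readVarLong(in: ByteArrayInputStream): Long = {
  var result = 0L
  var shift = 0
  var b = in.read()
  while ((b & 0x80) != 0) {
    result |= (b & 0x7FL) << shift
    shift += 7
    b = in.read()
  }
  result | (b.toLong << shift)
}
```

Small vertex IDs fit in one or two bytes instead of eight, which is exactly 
the saving that a mandatory tag byte per element would erode.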

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ankurdave/spark SPARK-3649

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2503.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2503


commit a49c2ad67f2bf79ae10e9ef696605c64b0c0ed97
Author: Ankur Dave 
Date:   2014-09-22T22:05:30Z

[SPARK-3649] Remove GraphX custom serializers







[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB] topic modeling on Gra...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2388#issuecomment-56475771
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20686/





[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB] topic modeling on Gra...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2388#issuecomment-56475768
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20686/consoleFull)
 for   PR 2388 at commit 
[`bf84e7b`](https://github.com/apache/spark/commit/bf84e7b87306dbe453077727be4a94fec40da417).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class TopicModeling(@transient val docs: RDD[(TopicModeling.DocId, 
SSV)],`






[GitHub] spark pull request: add a util method for changing the log level w...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2433#issuecomment-56475233
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20687/consoleFull)
 for   PR 2433 at commit 
[`cdb3bfc`](https://github.com/apache/spark/commit/cdb3bfc1ab74d3b2c3dfec38dc23118bc05ed922).
 * This patch merges cleanly.





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-56473764
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20684/consoleFull)
 for   PR 1290 at commit 
[`a28aa4a`](https://github.com/apache/spark/commit/a28aa4a7c91b402c95f81aaad254661cdf06607d).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class OutputCanvas2D(wd: Int, ht: Int) extends Canvas `
  * `class OutputFrame2D( title: String ) extends Frame( title ) `
  * `class OutputCanvas3D(wd: Int, ht: Int, shadowFrac: Double) extends 
Canvas `
  * `class OutputFrame3D(title: String, shadowFrac: Double) extends 
Frame(title) `






[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-56473769
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20684/





[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...

2014-09-22 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/2226#issuecomment-56473456
  
thanks a lot to @liancheng :)





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-56473120
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20683/





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-56473117
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20683/consoleFull)
 for   PR 1290 at commit 
[`b3531d6`](https://github.com/apache/spark/commit/b3531d68dc36832115fd721a1a2efc0f99851661).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB] topic modeling on Gra...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2388#issuecomment-56472988
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20686/consoleFull)
 for   PR 2388 at commit 
[`bf84e7b`](https://github.com/apache/spark/commit/bf84e7b87306dbe453077727be4a94fec40da417).
 * This patch merges cleanly.





[GitHub] spark pull request: [WIP][SPARK-3212][SQL] Use logical plan matchi...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2501#issuecomment-56472896
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20685/





[GitHub] spark pull request: [WIP][SPARK-3212][SQL] Use logical plan matchi...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2501#issuecomment-56472893
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20685/consoleFull)
 for   PR 2501 at commit 
[`80f26ac`](https://github.com/apache/spark/commit/80f26acffa8e234434fb8e080c499e6cae9fe6e4).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class LogicalRDD(output: Seq[Attribute], rdd: 
RDD[Row])(sqlContext: SQLContext)`
  * `case class PhysicalRDD(output: Seq[Attribute], rdd: RDD[Row]) extends 
LeafNode `
  * `case class ExistingRdd(output: Seq[Attribute], rdd: RDD[Row]) extends 
LeafNode `
  * `case class SparkLogicalPlan(alreadyPlanned: SparkPlan)(@transient 
sqlContext: SQLContext)`






[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...

2014-09-22 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/2226#issuecomment-56472453
  
LGTM

@marmbrus This is finally good to go :)





[GitHub] spark pull request: [SPARK-3606] [yarn] Correctly configure AmIpFi...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2497#issuecomment-56472123
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20680/consoleFull)
 for   PR 2497 at commit 
[`b3b3e50`](https://github.com/apache/spark/commit/b3b3e50a0c13df08a607f036592f83e566cded39).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3606] [yarn] Correctly configure AmIpFi...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2497#issuecomment-56472127
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20680/





[GitHub] spark pull request: [WIP][SPARK-3212][SQL] Use logical plan matchi...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2501#issuecomment-56471636
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20685/consoleFull) for PR 2501 at commit [`80f26ac`](https://github.com/apache/spark/commit/80f26acffa8e234434fb8e080c499e6cae9fe6e4).
 * This patch merges cleanly.





[GitHub] spark pull request: Merge pull request #1 from apache/master

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2502#issuecomment-56471527
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-1720][SPARK-1719] Add the value of LD_L...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1031#issuecomment-56471514
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20681/consoleFull) for PR 1031 at commit [`f44c221`](https://github.com/apache/spark/commit/f44c221aceb2f246eec335f4b7a1cd6f0c2b0080).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-1720][SPARK-1719] Add the value of LD_L...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1031#issuecomment-56471518
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20681/





[GitHub] spark pull request: Merge pull request #1 from apache/master

2014-09-22 Thread ceys
GitHub user ceys opened a pull request:

https://github.com/apache/spark/pull/2502

Merge pull request #1 from apache/master

Update from original

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ceys/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2502.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2502


commit 3fc6550d41409f26e6c54bac1914ed5cbf80c879
Author: ceys 
Date:   2014-09-23T02:56:32Z

Merge pull request #1 from apache/master

Update from original







[GitHub] spark pull request: [WIP][SPARK-3212][SQL] Use logical plan matchi...

2014-09-22 Thread marmbrus
GitHub user marmbrus opened a pull request:

https://github.com/apache/spark/pull/2501

[WIP][SPARK-3212][SQL] Use logical plan matching instead of temporary tables for table caching

_Also addresses: SPARK-1379 and SPARK-3641_

This PR introduces a new trait, `CacheManager`, which replaces the previous temporary-table-based caching system.  Instead of creating a temporary table, which shadows an existing table but provides a cached representation, the cache manager maintains a separate list of cached data.  After optimization, this list is searched for any matching plan fragments.  When a matching plan fragment is found, it is replaced with the cached data.

There are several advantages to this approach:
 - Calling .cache() on a SchemaRDD now works as you would expect, and uses the more efficient columnar representation.
 - It's now possible to provide a list of temporary tables, without having to decide if a given table is actually just a cached persistent table. (To be done in a follow-up PR)
 - In some cases it is possible that cached data will be used, even if a cached table was not explicitly requested.  This is because we now look at the logical structure instead of the table name.

TODO:
 - [ ] Finish cleanup of caching specific pattern matching code
 - [ ] More test cases for `sameResult` function
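The plan-matching idea described above can be sketched in miniature. All names below (`Plan`, `Scan`, `Filter`, `InMemory`, `CacheManager`'s methods) are simplified stand-ins, not the actual Spark SQL API:

```scala
// Toy logical plans; the real Catalyst LogicalPlan is far richer.
sealed trait Plan {
  // Structural equality stands in for Catalyst's sameResult, which also
  // normalizes expression ids before comparing plans.
  def sameResult(other: Plan): Boolean = this == other
}
case class Scan(table: String) extends Plan
case class Filter(cond: String, child: Plan) extends Plan
case class InMemory(plan: Plan) extends Plan  // cached columnar data

// Instead of shadowing a table name with a temporary table, keep a list of
// cached plans and substitute any matching fragment after optimization.
final class CacheManager {
  private var cached = List.empty[(Plan, InMemory)]

  def cacheQuery(plan: Plan): Unit =
    cached = (plan, InMemory(plan)) :: cached

  // Replace any fragment of `plan` that matches a cached plan.
  def useCachedData(plan: Plan): Plan =
    cached.collectFirst { case (p, mem) if p.sameResult(plan) => mem }
      .getOrElse(plan match {
        case Filter(c, child) => Filter(c, useCachedData(child))
        case other            => other
      })
}
```

With `Scan("t")` cached, `useCachedData(Filter("x > 1", Scan("t")))` rewrites the child to the cached representation even though the query never names a cached table: the lookup matches on logical structure, not on the table name.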

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/marmbrus/spark caching

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2501.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2501


commit 80f26acffa8e234434fb8e080c499e6cae9fe6e4
Author: Michael Armbrust 
Date:   2014-09-23T02:41:57Z

First draft of improved semantics for Spark SQL caching.







[GitHub] spark pull request: Adds json api for stages, storage and executor...

2014-09-22 Thread praveenr019
Github user praveenr019 closed the pull request at:

https://github.com/apache/spark/pull/882





[GitHub] spark pull request: Adds json api for stages, storage and executor...

2014-09-22 Thread praveenr019
Github user praveenr019 commented on the pull request:

https://github.com/apache/spark/pull/882#issuecomment-56470825
  
Closing this pull request since it was committed on an old branch.

Thanks @JoshRosen, would be glad to see this feature in Spark.





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-56470743
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20684/consoleFull) for PR 1290 at commit [`a28aa4a`](https://github.com/apache/spark/commit/a28aa4a7c91b402c95f81aaad254661cdf06607d).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-22 Thread manishamde
Github user manishamde commented on a diff in the pull request:

https://github.com/apache/spark/pull/2435#discussion_r17889907
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/RandomForest.scala ---
@@ -0,0 +1,430 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.tree
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+
+import org.apache.spark.Logging
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.api.java.JavaRDD
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.mllib.tree.configuration.Algo._
+import org.apache.spark.mllib.tree.configuration.QuantileStrategy._
+import org.apache.spark.mllib.tree.configuration.Strategy
+import org.apache.spark.mllib.tree.impl.{BaggedPoint, TreePoint, DecisionTreeMetadata, TimeTracker}
+import org.apache.spark.mllib.tree.impurity.Impurities
+import org.apache.spark.mllib.tree.model._
+import org.apache.spark.rdd.RDD
+import org.apache.spark.storage.StorageLevel
+import org.apache.spark.util.Utils
+
+/**
+ * :: Experimental ::
+ * A class which implements a random forest learning algorithm for classification and regression.
+ * It supports both continuous and categorical features.
+ *
+ * @param strategy The configuration parameters for the random forest algorithm which specify
+ *                 the type of algorithm (classification, regression, etc.), feature type
+ *                 (continuous, categorical), depth of the tree, quantile calculation strategy,
+ *                 etc.
+ * @param numTrees If 1, then no bootstrapping is used.  If > 1, then bootstrapping is done.
+ * @param featureSubsetStrategy Number of features to consider for splits at each node.
+ *                              Supported: "auto" (default), "all", "sqrt", "log2", "onethird".
+ *                              If "auto" is set, this parameter is set based on numTrees:
+ *                              if numTrees == 1, then featureSubsetStrategy = "all";
+ *                              if numTrees > 1, then featureSubsetStrategy = "sqrt".
+ * @param seed  Random seed for bootstrapping and choosing feature subsets.
+ */
+@Experimental
+private class RandomForest (
+    private val strategy: Strategy,
+    private val numTrees: Int,
+    featureSubsetStrategy: String,
+    private val seed: Int)
+  extends Serializable with Logging {
+
+  strategy.assertValid()
+  require(numTrees > 0, s"RandomForest requires numTrees > 0, but was given numTrees = $numTrees.")
+  require(RandomForest.supportedFeatureSubsetStrategies.contains(featureSubsetStrategy),
+    s"RandomForest given invalid featureSubsetStrategy: $featureSubsetStrategy." +
+    s" Supported values: ${RandomForest.supportedFeatureSubsetStrategies.mkString(", ")}.")
+
+  /**
+   * Method to train a decision tree model over an RDD
+   * @param input Training data: RDD of [[org.apache.spark.mllib.regression.LabeledPoint]]
+   * @return RandomForestModel that can be used for prediction
+   */
+  def train(input: RDD[LabeledPoint]): RandomForestModel = {
+
+    val timer = new TimeTracker()
+
+    timer.start("total")
+
+    timer.start("init")
+
+    val retaggedInput = input.retag(classOf[LabeledPoint])
+    val metadata =
+      DecisionTreeMetadata.buildMetadata(retaggedInput, strategy, numTrees, featureSubsetStrategy)
+    logDebug("algo = " + strategy.algo)
+    logDebug("numTrees = " + numTrees)
+    logDebug("seed = " + seed)
+    logDebug("maxBins = " + metadata.maxBins)
+    logDebug("featureSubsetStrategy = " + featureSubsetStrategy)
+    logDebug("numFeaturesPerNode = " + metadata.numFeaturesPerNode)
+
+    // Find the splits and the corresponding bins (interval between the splits) using a sample
+    // of the input data.
+    timer.start("findSpli
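The `featureSubsetStrategy = "auto"` rule documented in the scaladoc above reduces to a small resolution step. This sketch is a hypothetical helper (not code from the PR) that makes the rule concrete:

```scala
// Resolve the "auto" featureSubsetStrategy as the scaladoc states:
// a single tree considers all features; a true forest samples sqrt of them.
// Any explicit strategy ("all", "sqrt", "log2", "onethird") passes through.
def resolveFeatureSubsetStrategy(strategy: String, numTrees: Int): String =
  strategy match {
    case "auto" => if (numTrees == 1) "all" else "sqrt"
    case other  => other
  }
```

So `resolveFeatureSubsetStrategy("auto", 1)` yields `"all"`, while any `numTrees > 1` yields `"sqrt"`.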

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1486#issuecomment-56470414
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20678/





[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1486#issuecomment-56470411
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20678/consoleFull) for PR 1486 at commit [`9c4933c`](https://github.com/apache/spark/commit/9c4933c6e18db8bf2e0cbd0deb85b46c2ca0d2b2).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: stop, start and destroy require the EC2_REGION

2014-09-22 Thread nchammas
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/2473#discussion_r17889871
  
--- Diff: docs/ec2-scripts.md ---
@@ -137,11 +146,11 @@ cost you any EC2 cycles, but ***will*** continue to cost money for EBS
 storage.
 
 - To stop one of your clusters, go into the `ec2` directory and run
-`./spark-ec2 stop <cluster-name>`.
+`./spark-ec2 --region=<ec2-region> stop <cluster-name>`.
 - To restart it later, run
-`./spark-ec2 -i <key-file> start <cluster-name>`.
+`./spark-ec2 -i <key-file> --region=<ec2-region> start <cluster-name>`.
 - To ultimately destroy the cluster and stop consuming EBS space, run
-`./spark-ec2 destroy <cluster-name>` as described in the previous
+`./spark-ec2 --region=<ec2-region> destroy <cluster-name>` as described in the previous
--- End diff --

Ah, right. It's set as the default.
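Putting the thread together, a typical non-default-region session would look something like this. The cluster name, key file, and region here are illustrative; the point under discussion is that `--region` can be omitted only when the cluster lives in the default region (us-east-1):

```shell
# Stop, restart, and destroy a cluster that was launched in eu-west-1.
# Omitting --region would make spark-ec2 look in the default region
# and fail to find the cluster.
./spark-ec2 --region=eu-west-1 stop my-cluster
./spark-ec2 -i my-key.pem --region=eu-west-1 start my-cluster
./spark-ec2 --region=eu-west-1 destroy my-cluster
```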





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-56470205
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20682/





[GitHub] spark pull request: stop, start and destroy require the EC2_REGION

2014-09-22 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2473#discussion_r17889800
  
--- Diff: docs/ec2-scripts.md ---
@@ -137,11 +146,11 @@ cost you any EC2 cycles, but ***will*** continue to cost money for EBS
 storage.
 
 - To stop one of your clusters, go into the `ec2` directory and run
-`./spark-ec2 stop <cluster-name>`.
+`./spark-ec2 --region=<ec2-region> stop <cluster-name>`.
 - To restart it later, run
-`./spark-ec2 -i <key-file> start <cluster-name>`.
+`./spark-ec2 -i <key-file> --region=<ec2-region> start <cluster-name>`.
 - To ultimately destroy the cluster and stop consuming EBS space, run
-`./spark-ec2 destroy <cluster-name>` as described in the previous
+`./spark-ec2 --region=<ec2-region> destroy <cluster-name>` as described in the previous
--- End diff --

It does require it unless the region is us-east.





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-56470085
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20683/consoleFull) for PR 1290 at commit [`b3531d6`](https://github.com/apache/spark/commit/b3531d68dc36832115fd721a1a2efc0f99851661).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2226#issuecomment-56470082
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20679/consoleFull) for PR 2226 at commit [`e69ce88`](https://github.com/apache/spark/commit/e69ce883ee9d337a81d4aae3a63943937f771e84).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2226#issuecomment-56470087
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20679/





[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-22 Thread manishamde
Github user manishamde commented on a diff in the pull request:

https://github.com/apache/spark/pull/2435#discussion_r17889650
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/RandomForest.scala ---

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-22 Thread manishamde
Github user manishamde commented on a diff in the pull request:

https://github.com/apache/spark/pull/2435#discussion_r17889641
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/tree/RandomForest.scala ---
@@ -0,0 +1,430 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.tree
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+
+import org.apache.spark.Logging
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.api.java.JavaRDD
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.mllib.tree.configuration.Algo._
+import org.apache.spark.mllib.tree.configuration.QuantileStrategy._
+import org.apache.spark.mllib.tree.configuration.Strategy
+import org.apache.spark.mllib.tree.impl.{BaggedPoint, TreePoint, 
DecisionTreeMetadata, TimeTracker}
+import org.apache.spark.mllib.tree.impurity.Impurities
+import org.apache.spark.mllib.tree.model._
+import org.apache.spark.rdd.RDD
+import org.apache.spark.storage.StorageLevel
+import org.apache.spark.util.Utils
+
+/**
+ * :: Experimental ::
+ * A class which implements a random forest learning algorithm for 
classification and regression.
+ * It supports both continuous and categorical features.
+ *
+ * @param strategy The configuration parameters for the random forest algorithm which specify
+ *                 the type of algorithm (classification, regression, etc.), feature type
+ *                 (continuous, categorical), depth of the tree, quantile calculation strategy, etc.
+ * @param numTrees If 1, then no bootstrapping is used.  If > 1, then bootstrapping is done.
+ * @param featureSubsetStrategy Number of features to consider for splits at each node.
+ *                              Supported: "auto" (default), "all", "sqrt", "log2", "onethird".
+ *                              If "auto" is set, this parameter is set based on numTrees:
+ *                              if numTrees == 1, then featureSubsetStrategy = "all";
+ *                              if numTrees > 1, then featureSubsetStrategy = "sqrt".
+ * @param seed  Random seed for bootstrapping and choosing feature subsets.
+ */
+@Experimental
+private class RandomForest (
+    private val strategy: Strategy,
+    private val numTrees: Int,
+    featureSubsetStrategy: String,
+    private val seed: Int)
+  extends Serializable with Logging {
+
+  strategy.assertValid()
+  require(numTrees > 0, s"RandomForest requires numTrees > 0, but was given numTrees = $numTrees.")
+  require(RandomForest.supportedFeatureSubsetStrategies.contains(featureSubsetStrategy),
+    s"RandomForest given invalid featureSubsetStrategy: $featureSubsetStrategy." +
+    s" Supported values: ${RandomForest.supportedFeatureSubsetStrategies.mkString(", ")}.")
+
+  /**
+   * Method to train a decision tree model over an RDD
+   * @param input Training data: RDD of [[org.apache.spark.mllib.regression.LabeledPoint]]
+   * @return RandomForestModel that can be used for prediction
+   */
+  def train(input: RDD[LabeledPoint]): RandomForestModel = {
+
+    val timer = new TimeTracker()
+
+    timer.start("total")
+
+    timer.start("init")
+
+    val retaggedInput = input.retag(classOf[LabeledPoint])
+    val metadata =
+      DecisionTreeMetadata.buildMetadata(retaggedInput, strategy, numTrees, featureSubsetStrategy)
+    logDebug("algo = " + strategy.algo)
+    logDebug("numTrees = " + numTrees)
+    logDebug("seed = " + seed)
+    logDebug("maxBins = " + metadata.maxBins)
+    logDebug("featureSubsetStrategy = " + featureSubsetStrategy)
+    logDebug("numFeaturesPerNode = " + metadata.numFeaturesPerNode)
+
+    // Find the splits and the corresponding bins (interval between the splits) using a sample
+    // of the input data.
+    timer.start("findSpli
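The `featureSubsetStrategy` resolution described in the scaladoc above can be sketched as follows. This is an illustrative re-implementation in Python, not the actual MLlib code; MLlib may round differently (e.g. ceiling instead of truncation), so treat the exact counts as an assumption.

```python
import math

SUPPORTED_STRATEGIES = ("auto", "all", "sqrt", "log2", "onethird")

def num_features_per_node(strategy, num_trees, num_features):
    """Resolve how many features each tree node considers for splits,
    following the rules in the scaladoc: "auto" becomes "all" for a
    single tree and "sqrt" for a forest."""
    if strategy not in SUPPORTED_STRATEGIES:
        raise ValueError("invalid featureSubsetStrategy: %s" % strategy)
    if strategy == "auto":
        strategy = "all" if num_trees == 1 else "sqrt"
    if strategy == "all":
        return num_features
    if strategy == "sqrt":
        return int(math.sqrt(num_features))
    if strategy == "log2":
        return max(1, int(math.log2(num_features)))
    return max(1, num_features // 3)  # "onethird"

print(num_features_per_node("auto", 1, 100))   # 100: single tree -> "all"
print(num_features_per_node("auto", 10, 100))  # 10: forest -> "sqrt"
```

The point of the "auto" default is that a lone decision tree has no bootstrap diversity to exploit, so restricting its feature set would only hurt it.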

[GitHub] spark pull request: [SPARK-3653] Respect SPARK_*_MEMORY for cluste...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2500#issuecomment-56469714
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20677/consoleFull)
 for   PR 2500 at commit 
[`6217b38`](https://github.com/apache/spark/commit/6217b38e5a71e4ef98b82a2968b8da7df5df94a1).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3653] Respect SPARK_*_MEMORY for cluste...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2500#issuecomment-56469716
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20677/





[GitHub] spark pull request: stop, start and destroy require the EC2_REGION

2014-09-22 Thread nchammas
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/2473#discussion_r17889468
  
--- Diff: docs/ec2-scripts.md ---
@@ -48,6 +48,15 @@ by looking for the "Name" tag of the instance in the 
Amazon EC2 Console.
key pair, `<num-slaves>` is the number of slave nodes to launch (try
1 at first), and `<cluster-name>` is the name to give to your
cluster.
+
+For Example:
--- End diff --

Minor nit: "For example:" (lower case "E")





[GitHub] spark pull request: stop, start and destroy require the EC2_REGION

2014-09-22 Thread nchammas
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/2473#discussion_r17889459
  
--- Diff: docs/ec2-scripts.md ---
@@ -137,11 +146,11 @@ cost you any EC2 cycles, but ***will*** continue to 
cost money for EBS
 storage.
 
 - To stop one of your clusters, go into the `ec2` directory and run
-`./spark-ec2 stop <cluster-name>`.
+`./spark-ec2 --region=<ec2-region> stop <cluster-name>`.
 - To restart it later, run
-`./spark-ec2 -i <key-file> start <cluster-name>`.
+`./spark-ec2 -i <key-file> --region=<ec2-region> start <cluster-name>`.
 - To ultimately destroy the cluster and stop consuming EBS space, run
-`./spark-ec2 destroy <cluster-name>` as described in the previous
+`./spark-ec2 --region=<ec2-region> destroy <cluster-name>` as described in the previous
--- End diff --

Hmm, are you sure `destroy` requires `ec2-region`? I've been successfully 
destroying EC2 clusters without specifying it.
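A likely explanation for the behavior described above (stated as an assumption about spark-ec2's option parsing, not verified here): `--region` has a default value, so `destroy` succeeds without it whenever the cluster happens to live in that default region. A minimal sketch of that defaulting behavior:

```python
from optparse import OptionParser

# Hypothetical mirror of spark-ec2's option handling: --region carries a
# default, so omitting it is not an error -- the command simply targets the
# default region, which only works if the cluster was launched there.
parser = OptionParser()
parser.add_option("-r", "--region", default="us-east-1",
                  help="EC2 region to target (default: %default)")

# Simulate `spark-ec2 destroy my-cluster` with no --region flag.
opts, args = parser.parse_args(["destroy", "my-cluster"])
print(opts.region)  # us-east-1
```

So the docs change may be over-strict: the flag is only *required* when the cluster is outside the default region.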





[GitHub] spark pull request: [SPARK-1720][SPARK-1719] Add the value of LD_L...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1031#issuecomment-56468425
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20681/consoleFull)
 for   PR 1031 at commit 
[`f44c221`](https://github.com/apache/spark/commit/f44c221aceb2f246eec335f4b7a1cd6f0c2b0080).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3606] [yarn] Correctly configure AmIpFi...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2497#issuecomment-56468434
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20680/consoleFull)
 for   PR 2497 at commit 
[`b3b3e50`](https://github.com/apache/spark/commit/b3b3e50a0c13df08a607f036592f83e566cded39).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3606] [yarn] Correctly configure AmIpFi...

2014-09-22 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/2497#issuecomment-56468116
  
retest this please





[GitHub] spark pull request: [SPARK-3606] [yarn] Correctly configure AmIpFi...

2014-09-22 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/2469#issuecomment-56468138
  
Thanks for fixing this @vanzin. I will look at it shortly.





[GitHub] spark pull request: [SPARK-3652] [SQL] upgrade spark sql hive vers...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2499#issuecomment-56468081
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/142/consoleFull)
 for   PR 2499 at commit 
[`6d5d071`](https://github.com/apache/spark/commit/6d5d0710eb2ab1c14208deb158c2f4b018ddbf33).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3652] [SQL] upgrade spark sql hive vers...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2499#issuecomment-56468023
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/142/consoleFull)
 for   PR 2499 at commit 
[`6d5d071`](https://github.com/apache/spark/commit/6d5d0710eb2ab1c14208deb158c2f4b018ddbf33).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3614][MLLIB] Add minimumOccurence filte...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2494#issuecomment-56468001
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20676/





[GitHub] spark pull request: [SPARK-3614][MLLIB] Add minimumOccurence filte...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2494#issuecomment-56467997
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20676/consoleFull)
 for   PR 2494 at commit 
[`1801fd2`](https://github.com/apache/spark/commit/1801fd2e9518c610b3657c6a9cb9239fedd43847).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class IDF(val minimumOccurence: Long) `
  * `  class DocumentFrequencyAggregator(val minimumOccurence: Long) 
extends Serializable `






[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2226#issuecomment-56467600
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20679/consoleFull)
 for   PR 2226 at commit 
[`e69ce88`](https://github.com/apache/spark/commit/e69ce883ee9d337a81d4aae3a63943937f771e84).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3606] [yarn] Correctly configure AmIpFi...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2497#issuecomment-56467574
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20673/consoleFull)
 for   PR 2497 at commit 
[`b3b3e50`](https://github.com/apache/spark/commit/b3b3e50a0c13df08a607f036592f83e566cded39).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3606] [yarn] Correctly configure AmIpFi...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2497#issuecomment-56467583
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20673/





[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...

2014-09-22 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/2226#discussion_r1720
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala
 ---
@@ -522,6 +523,52 @@ class HiveQuerySuite extends HiveComparisonTest {
   case class LogEntry(filename: String, message: String)
   case class LogFile(name: String)
 
+  createQueryTest("dynamic_partition",
+"""
+  |DROP TABLE IF EXISTS dynamic_part_table;
+  |CREATE TABLE dynamic_part_table(intcol INT) PARTITIONED BY 
(partcol1 INT, partcol2 INT);
+  |
+  |SET hive.exec.dynamic.partition.mode=nonstrict;
+  |
+  |INSERT INTO TABLE dynamic_part_table PARTITION(partcol1, partcol2)
+  |SELECT 1, 1, 1 FROM src WHERE key=150;
+  |
+  |INSERT INTO TABLE dynamic_part_table PARTITION(partcol1, partcol2)
+  |SELECT 1, NULL, 1 FROM src WHERE key=150;
+  |
+  |INSERT INTO TABLE dynamic_part_table PARTITION(partcol1, partcol2)
+  |SELECT 1, 1, NULL FROM src WHERE key=150;
+  |
+  |INSERT INTO TABLe dynamic_part_table PARTITION(partcol1, partcol2)
+  |SELECT 1, NULL, NULL FROM src WHERE key=150;
+  |
+  |DROP TABLE IF EXISTS dynamic_part_table;
+""".stripMargin)
--- End diff --

Added a test that validates the dynamic partitioning folder layout by loading each partition from its specific partition folder and checking the contents.
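The folder layout being validated can be sketched as follows. This is an illustrative helper, not the PR's code; it assumes Hive's convention of `col=value` path segments, with NULL partition values mapped to Hive's default partition name (`__HIVE_DEFAULT_PARTITION__`).

```python
HIVE_DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__"

def partition_path(table_dir, partition_cols, values):
    """Build the on-disk folder path for one dynamic partition.
    NULL (None) partition values map to Hive's default partition name."""
    segments = []
    for col, value in zip(partition_cols, values):
        name = HIVE_DEFAULT_PARTITION if value is None else str(value)
        segments.append("%s=%s" % (col, name))
    return "/".join([table_dir] + segments)

# The INSERTs in the test above would land in folders like these:
print(partition_path("dynamic_part_table", ["partcol1", "partcol2"], [1, 1]))
print(partition_path("dynamic_part_table", ["partcol1", "partcol2"], [None, 1]))
```

A layout check like the one described simply lists these folders and compares their row contents against the corresponding `SELECT ... WHERE partcol1=... AND partcol2=...` results.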





[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1486#issuecomment-56467045
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20678/consoleFull)
 for   PR 1486 at commit 
[`9c4933c`](https://github.com/apache/spark/commit/9c4933c6e18db8bf2e0cbd0deb85b46c2ca0d2b2).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3653] Respect SPARK_*_MEMORY for cluste...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2500#issuecomment-56466221
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20677/consoleFull)
 for   PR 2500 at commit 
[`6217b38`](https://github.com/apache/spark/commit/6217b38e5a71e4ef98b82a2968b8da7df5df94a1).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3653] Respect SPARK_*_MEMORY for cluste...

2014-09-22 Thread andrewor14
GitHub user andrewor14 opened a pull request:

https://github.com/apache/spark/pull/2500

[SPARK-3653] Respect SPARK_*_MEMORY for cluster mode

`SPARK_DRIVER_MEMORY` was only used to start the `SparkSubmit` JVM, which becomes the driver in client mode but not in cluster mode. In cluster mode, this property is simply not propagated to the worker nodes.

`SPARK_EXECUTOR_MEMORY` is picked up from `SparkContext`, but in cluster 
mode the driver runs on one of the worker machines, where this environment 
variable may not be set.
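The precedence problem described above can be sketched as follows. This is a hypothetical resolver, not Spark's actual code; the names `resolve_driver_memory` and the `512m` default are illustrative. The point is that an environment variable set only on the submitting machine never reaches a driver launched on a worker, so the submitting side must fold it into the shipped configuration.

```python
def resolve_driver_memory(conf, env, default="512m"):
    """Resolve driver memory: an explicit conf entry wins, then the
    environment variable, then the default."""
    return (conf.get("spark.driver.memory")
            or env.get("SPARK_DRIVER_MEMORY")
            or default)

# Client mode: the driver JVM sees the submitter's environment.
print(resolve_driver_memory({}, {"SPARK_DRIVER_MEMORY": "4g"}))  # 4g

# Cluster mode: the worker's environment lacks the variable...
print(resolve_driver_memory({}, {}))                             # 512m
# ...unless submission copied it into the conf before shipping the job.
print(resolve_driver_memory({"spark.driver.memory": "4g"}, {}))  # 4g
```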

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andrewor14/spark memory-env-vars

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2500.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2500


commit 6217b38e5a71e4ef98b82a2968b8da7df5df94a1
Author: Andrew Or 
Date:   2014-09-23T01:06:23Z

Respect SPARK_*_MEMORY for cluster mode







[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1486#issuecomment-56465510
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20674/consoleFull)
 for   PR 1486 at commit 
[`8f9c5d6`](https://github.com/apache/spark/commit/8f9c5d66d7a630ebfee64afee7fa922c22f838ee).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3652] [SQL] upgrade spark sql hive vers...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2499#issuecomment-56465504
  
Can one of the admins verify this patch?





[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1486#issuecomment-56465520
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20674/





[GitHub] spark pull request: [SPARK-3652] [SQL] upgrade spark sql hive vers...

2014-09-22 Thread scwf
GitHub user scwf opened a pull request:

https://github.com/apache/spark/pull/2499

[SPARK-3652] [SQL] upgrade spark sql hive version to 0.13.1

Spark SQL's Hive version is currently 0.12.0, and 0.13.1 is not supported because of some API-level changes in the new Hive version.
Since Hive is backwards compatible, this PR simply upgrades the Hive version to 0.13.1 (compiling this PR against 0.12.0 will produce errors). I think this is acceptable for users, and we also do not need to support multiple versions of Hive.

Notes:
1. The package command is unchanged; `sbt/sbt -Phive assembly` produces the assembly jar with Hive 0.13.1.
2. This PR uses `org.apache.hive`, since there is no shaded `org.spark-project.hive` artifact for 0.13.1.
3. I regenerated the golden answers, since some SQL query results changed.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/scwf/spark hive-0.13.1-clean

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2499.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2499


commit 5d9de8ec6145d286ca05906d5e1dd1cfd9760e71
Author: scwf 
Date:   2014-09-21T15:45:00Z

update pom to org.apache.hive 0.13.1 version

commit c3aa95f9861a541df249f29ddb35b5ad9e6a4751
Author: w00228970 
Date:   2014-09-22T04:56:44Z

fix errors of hive/hive-thriftserver when update to org.apache.hive 0.13.1

commit 22f648655aa5941e53e65cbeadb097a32e0af8cf
Author: w00228970 
Date:   2014-09-22T06:00:55Z

fix StatisticsSuite error

commit f9fdc1ca944e14a986d910b7093da3ae4586cc68
Author: w00228970 
Date:   2014-09-22T06:38:01Z

loginFromKeyTab when set hive.server2.authentication

commit 2afcaa1e6f579b209ecc07d98a990520cdb81350
Author: w00228970 
Date:   2014-09-22T08:30:30Z

delete invalid set fs.default.name, this will lead to query error since 
SessionStat.start changed in hive0.13.1

commit a09fc4e37d54fda41c8cbf6afc6d577ece51ec55
Author: w00228970 
Date:   2014-09-22T08:42:28Z

fix Operation cancelled

commit 8b9309014e4e76560378a543fdddec51c874092c
Author: w00228970 
Date:   2014-09-22T09:09:08Z

regenerate golden answer

commit 9bee908fdc4ee947e2c96e8c0e9006f2023eb870
Author: w00228970 
Date:   2014-09-22T10:09:39Z

ignore stats_empty_partition

commit 0b15b748e94fd6afbc19cd4397cf9f74adf9064b
Author: w00228970 
Date:   2014-09-22T10:11:07Z

add logic for case VoidObjectInspector in method inspectorToDataType

commit eab2354187ce88c051ffc6c149847b08e532804b
Author: w00228970 
Date:   2014-09-22T10:39:51Z

reset TestHive in CachedTablesuite

commit 853632d71bdb16a6792776c951257524d728c8eb
Author: w00228970 
Date:   2014-09-22T14:34:52Z

fix Hivequerysuite

commit 6d5d0710eb2ab1c14208deb158c2f4b018ddbf33
Author: w00228970 
Date:   2014-09-22T14:59:41Z

fix analyze MetastoreRelations







[GitHub] spark pull request: [SPARK-3614][MLLIB] Add minimumOccurence filte...

2014-09-22 Thread rnowling
Github user rnowling commented on the pull request:

https://github.com/apache/spark/pull/2494#issuecomment-56464302
  
@Ishiihara Thanks for pointing out the style check -- I found and fixed the style error in IDF.scala.

Thanks for mentioning options for the minimumOccurence members. I decided to add the `val` keyword rather than adding a setter. Earlier, I had considered several approaches, including making it an optional parameter and adding a Scala-style setter; however, I found that neither provided clean Java interoperability. As a result, I settled on the overloaded constructor approach, which also better matches Scala's emphasis on immutability. Since creating IDFs is inexpensive, I don't think performance will be an issue.
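The effect of `minimumOccurence` can be sketched as follows. This is a toy illustration, not MLlib's implementation: MLlib's `IDF` operates on term-frequency vectors, while this version counts document frequency over token lists. The `log((m + 1) / (df + 1))` weighting and the zeroing-out of filtered terms follow the usual smoothed-IDF convention.

```python
import math
from collections import Counter

def idf_with_minimum_occurence(docs, minimum_occurence=0):
    """Compute an IDF weight per term, zeroing out terms that appear in
    fewer than minimum_occurence documents (so noisy rare terms
    contribute nothing downstream)."""
    num_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    return {
        term: (math.log((num_docs + 1.0) / (count + 1.0))
               if count >= minimum_occurence else 0.0)
        for term, count in df.items()
    }

docs = [["spark", "mllib"], ["spark", "mllib"], ["spark", "sql"]]
idf = idf_with_minimum_occurence(docs, minimum_occurence=2)
print(idf["sql"])  # 0.0 -- df=1 is below the threshold, so it is filtered out
```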





[GitHub] spark pull request: [SPARK-3614][MLLIB] Add minimumOccurence filte...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2494#issuecomment-56464144
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20676/consoleFull)
 for   PR 2494 at commit 
[`1801fd2`](https://github.com/apache/spark/commit/1801fd2e9518c610b3657c6a9cb9239fedd43847).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3614][MLLIB] Add minimumOccurence filte...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2494#issuecomment-56463906
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20675/





[GitHub] spark pull request: [SPARK-3614][MLLIB] Add minimumOccurence filte...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2494#issuecomment-56463904
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20675/consoleFull)
 for   PR 2494 at commit 
[`6897252`](https://github.com/apache/spark/commit/689725201b3fbfa1232f4b5f74dc5002c8950b3f).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class IDF(val minimumOccurence: Long) `
  * `  class DocumentFrequencyAggregator(val minimumOccurence: Long) 
extends Serializable `






[GitHub] spark pull request: [SPARK-3614][MLLIB] Add minimumOccurence filte...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2494#issuecomment-56463855
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20675/consoleFull)
 for   PR 2494 at commit 
[`6897252`](https://github.com/apache/spark/commit/689725201b3fbfa1232f4b5f74dc5002c8950b3f).
 * This patch merges cleanly.





[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1486#issuecomment-56463517
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20674/consoleFull)
 for   PR 1486 at commit 
[`8f9c5d6`](https://github.com/apache/spark/commit/8f9c5d66d7a630ebfee64afee7fa922c22f838ee).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3634] [PySpark] User's module should ta...

2014-09-22 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/2492#issuecomment-56463455
  
> Maybe my JIRA was misleadingly named; my motivation here is allowing 
users to specify versions of packages that take precedence over other versions 
of that same package that might be installed on the system, not in overriding 
modules included in Python's standard library (although the ability to do that 
is a side-effect of this change).

Understood, but this side-effect is a bit dangerous. Third-party packages can appear anywhere in sys.path, for example:

```python
>>> import sys
>>> sys.path
['', '//anaconda/lib/python2.7/site-packages/DPark-0.1-py2.7.egg', 
'//anaconda/lib/python2.7/site-packages/protobuf-2.5.0-py2.7.egg', 
'//anaconda/lib/python2.7/site-packages/msgpack_python-0.4.2-py2.7-macosx-10.5-x86_64.egg',
 '//anaconda/lib/python2.7/site-packages/setuptools-3.6-py2.7.egg', 
'/Users/daviesliu/work/spark/python/lib', 
'/Users/daviesliu/work/spark/python/lib/py4j-0.8.2.1-src.zip', 
'/Users/daviesliu/work/spark/python', '//anaconda/lib/python27.zip', 
'//anaconda/lib/python2.7', '//anaconda/lib/python2.7/plat-darwin', 
'//anaconda/lib/python2.7/plat-mac', 
'//anaconda/lib/python2.7/plat-mac/lib-scriptpackages', 
'//anaconda/lib/python2.7/lib-tk', '//anaconda/lib/python2.7/lib-old', 
'//anaconda/lib/python2.7/lib-dynload', 
'//anaconda/lib/python2.7/site-packages', 
'//anaconda/lib/python2.7/site-packages/PIL', 
'//anaconda/lib/python2.7/site-packages/runipy-0.1.0-py2.7.egg']
```
so it's not easy to find an insertion position that comes before the third-party packages but after the standard library modules.
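A minimal sketch of the trade-off being discussed (this is an illustration, not what PySpark actually does, and the egg path below is made up): the only unambiguous insertion point is the very front of `sys.path`, which makes user packages shadow conflicting site-packages eggs but also lets them shadow the standard library — exactly the side-effect in question.

```python
import sys

def prepend_user_paths(paths):
    """Insert user-supplied paths at the very front of sys.path so they
    shadow any conflicting site-packages eggs.  Caveat: because sys.path
    has no reliable boundary between stdlib entries and third-party eggs,
    this also lets user paths shadow the standard library."""
    for p in reversed(paths):   # reversed() preserves the caller's order
        if p in sys.path:
            sys.path.remove(p)  # avoid duplicate entries
        sys.path.insert(0, p)

prepend_user_paths(["/tmp/deps/mypkg-1.0-py2.7.egg"])
print(sys.path[0])  # the user path now takes precedence over everything
```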





[GitHub] spark pull request: WHITESPACE CHANGE DO NOT MERGE

2014-09-22 Thread shaneknapp
Github user shaneknapp closed the pull request at:

https://github.com/apache/spark/pull/2498





[GitHub] spark pull request: [SPARK-3606] [yarn] Correctly configure AmIpFi...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2497#issuecomment-56463199
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20673/consoleFull) for PR 2497 at commit [`b3b3e50`](https://github.com/apache/spark/commit/b3b3e50a0c13df08a607f036592f83e566cded39).
 * This patch merges cleanly.





[GitHub] spark pull request: WHITESPACE CHANGE DO NOT MERGE

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2498#issuecomment-56463163
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20672/





[GitHub] spark pull request: WHITESPACE CHANGE DO NOT MERGE

2014-09-22 Thread shaneknapp
Github user shaneknapp commented on the pull request:

https://github.com/apache/spark/pull/2498#issuecomment-56463049
  
jenkins, test this please





[GitHub] spark pull request: WHITESPACE CHANGE DO NOT MERGE

2014-09-22 Thread shaneknapp
GitHub user shaneknapp opened a pull request:

https://github.com/apache/spark/pull/2498

WHITESPACE CHANGE DO NOT MERGE

WHITESPACE CHANGE DO NOT MERGE

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shaneknapp/spark sknapptest

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2498.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2498


commit c15f44ae06b71ccc0ed629771206760ab1c57797
Author: shane knapp 
Date:   2014-09-11T15:33:50Z

DO NOT MERGE, TESTING ONLY

commit 4e0747f7fd60a61f429b3e623072616690769d67
Author: shane knapp 
Date:   2014-09-23T00:18:03Z

WHITESPACE CHANGE DO NOT MERGE







[GitHub] spark pull request: [SPARK-3606] [yarn] Correctly configure AmIpFi...

2014-09-22 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/2497#issuecomment-56462789
  
Backport of #2469 to branch-1.1. Sending now to speed up the review 
process, since the original PR doesn't merge cleanly into this branch.





[GitHub] spark pull request: [SPARK-3606] [yarn] Correctly configure AmIpFi...

2014-09-22 Thread vanzin
GitHub user vanzin opened a pull request:

https://github.com/apache/spark/pull/2497

[SPARK-3606] [yarn] Correctly configure AmIpFilter for Yarn HA (1.1 vers...

...ion).

This is a backport of SPARK-3606 to branch-1.1. Some of the code had to be
duplicated since branch-1.1 doesn't have the cleanup work that was done to
the Yarn codebase.

I don't know whether the version issue in yarn/alpha/pom.xml was intentional, but I couldn't compile the code without fixing it.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vanzin/spark SPARK-3606-1.1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2497.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2497


commit b3b3e50a0c13df08a607f036592f83e566cded39
Author: Marcelo Vanzin 
Date:   2014-09-19T23:40:43Z

[SPARK-3606] [yarn] Correctly configure AmIpFilter for Yarn HA (1.1 
version).

This is a backport of SPARK-3606 to branch-1.1. Some of the code had to be
duplicated since branch-1.1 doesn't have the cleanup work that was done to
the Yarn codebase.

I don't know whether the version issue in yarn/alpha/pom.xml was intentional, but I couldn't compile the code without fixing it.







[GitHub] spark pull request: [SPARK-3614][MLLIB] Add minimumOccurence filte...

2014-09-22 Thread Ishiihara
Github user Ishiihara commented on a diff in the pull request:

https://github.com/apache/spark/pull/2494#discussion_r17886499
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/IDF.scala ---
@@ -30,9 +30,20 @@ import org.apache.spark.rdd.RDD
  * Inverse document frequency (IDF).
  * The standard formulation is used: `idf = log((m + 1) / (d(t) + 1))`, 
where `m` is the total
  * number of documents and `d(t)` is the number of documents that contain 
term `t`.
+ *
+ * This implementation supports filtering out terms which do not appear in a minimum number
+ * of documents (controlled by the variable minimumOccurence). For terms that are not in
+ * at least `minimumOccurence` documents, the IDF is found as 0, resulting in TF-IDFs of 0.
+ *
+ * @param minimumOccurence minimum of documents in which a term
+ * should appear for filtering
+ *
+ *
  */
 @Experimental
-class IDF {
+class IDF(minimumOccurence: Long) {
--- End diff --

You can add a `val` before minimumOccurence. Alternatively, if you want to set minimumOccurence after calling `new IDF()`, you can define a private field and expose a setter for it.





[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-22 Thread cmccabe
Github user cmccabe commented on a diff in the pull request:

https://github.com/apache/spark/pull/1486#discussion_r17886353
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -309,4 +323,42 @@ private[spark] object HadoopRDD {
   f(inputSplit, firstParent[T].iterator(split, context))
 }
   }
+
+  private[spark] class SplitInfoReflections {
+    val inputSplitWithLocationInfo =
+      Class.forName("org.apache.hadoop.mapred.InputSplitWithLocationInfo")
+    val getLocationInfo = inputSplitWithLocationInfo.getMethod("getLocationInfo")
+    val newInputSplit = Class.forName("org.apache.hadoop.mapreduce.InputSplit")
+    val newGetLocationInfo = newInputSplit.getMethod("getLocationInfo")
+    val splitLocationInfo = Class.forName("org.apache.hadoop.mapred.SplitLocationInfo")
+    val isInMemory = splitLocationInfo.getMethod("isInMemory")
+    val getLocation = splitLocationInfo.getMethod("getLocation")
+  }
+
+  private[spark] val SPLIT_INFO_REFLECTIONS = try {
--- End diff --

Sorry, I forgot about this one. I added a type annotation to SPLIT_INFO_REFLECTIONS.





[GitHub] spark pull request: SPARK-2621. Update task InputMetrics increment...

2014-09-22 Thread sryza
Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/2087#issuecomment-56461580
  
MapReduce doesn't use getPos, but it does look like it might be helpful in some situations. One caveat is that the position is measured in bytes only for file-based input formats; for DBInputFormat, for example, it is the number of records.

If we choose to use getPos for pre-2.5 Hadoop, my preference would be to make that change in a separate patch.





[GitHub] spark pull request: [SPARK-3614][MLLIB] Add minimumOccurence filte...

2014-09-22 Thread Ishiihara
Github user Ishiihara commented on the pull request:

https://github.com/apache/spark/pull/2494#issuecomment-56461303
  
@rnowling Please run sbt/sbt scalastyle on your local machine to clear out 
style issues. 





[GitHub] spark pull request: [SPARK-3650] Fix TriangleCount handling of rev...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2495#issuecomment-56461187
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20670/





[GitHub] spark pull request: [SPARK-3650] Fix TriangleCount handling of rev...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2495#issuecomment-56461185
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20670/consoleFull) for PR 2495 at commit [`d054d33`](https://github.com/apache/spark/commit/d054d33181486e3b90222e5e30b2f20648434673).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3614][MLLIB] Add minimumOccurence filte...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2494#issuecomment-56461090
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20671/





[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-22 Thread cmccabe
Github user cmccabe commented on a diff in the pull request:

https://github.com/apache/spark/pull/1486#discussion_r17886029
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskLocation.scala 
---
@@ -22,13 +22,35 @@ package org.apache.spark.scheduler
  * In the latter case, we will prefer to launch the task on that 
executorID, but our next level
  * of preference will be executors on the same host if this is not 
possible.
  */
-private[spark]
-class TaskLocation private (val host: String, val executorId: Option[String]) extends Serializable {
-  override def toString: String = "TaskLocation(" + host + ", " + executorId + ")"
+private[spark] sealed abstract class TaskLocation(val host: String) {
+}
+
+private [spark] case class ExecutorCacheTaskLocation(override val host: String,
+    val executorId: String) extends TaskLocation(host) {
+}
+
+private [spark] case class HDFSCachedTaskLocation(override val host: String)
+    extends TaskLocation(host) {
+  override def toString = TaskLocation.in_memory_location_tag + host
+}
+
+private [spark] case class HostTaskLocation(override val host: String) extends TaskLocation(host) {
+  override def toString = host
 }
 
 private[spark] object TaskLocation {
-  def apply(host: String, executorId: String) = new TaskLocation(host, Some(executorId))
+  // We identify hosts on which the block is cached with this prefix.  Because this prefix contains
+  // underscores, which are not legal characters in hostnames, there should be no potential for
+  // confusion.  See RFC 952 and RFC 1123 for information about the format of hostnames.
+  val in_memory_location_tag = "_hdfs_cache_"
+
+  def apply(host: String, executorId: String) = new ExecutorCacheTaskLocation(host, executorId)
 
-  def apply(host: String) = new TaskLocation(host, None)
+  def apply(str: String) = {
+    if (str.startsWith(in_memory_location_tag)) {
+      new HDFSCachedTaskLocation(str.substring(in_memory_location_tag.length))
--- End diff --

ok





[GitHub] spark pull request: [SPARK-3614][MLLIB] Add minimumOccurence filte...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2494#issuecomment-56461087
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20671/consoleFull) for PR 2494 at commit [`a200bab`](https://github.com/apache/spark/commit/a200babbad7280d3a20f05abb84140b0b8d51b85).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class IDF(minimumOccurence: Long) `
  * `  class DocumentFrequencyAggregator(minimumOccurence: Long) extends Serializable `






[GitHub] spark pull request: [SPARK-3614][MLLIB] Add minimumOccurence filte...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2494#issuecomment-56461020
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20671/consoleFull) for PR 2494 at commit [`a200bab`](https://github.com/apache/spark/commit/a200babbad7280d3a20f05abb84140b0b8d51b85).
 * This patch merges cleanly.





[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-22 Thread cmccabe
Github user cmccabe commented on a diff in the pull request:

https://github.com/apache/spark/pull/1486#discussion_r17886024
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskLocation.scala 
---
@@ -22,13 +22,35 @@ package org.apache.spark.scheduler
  * In the latter case, we will prefer to launch the task on that 
executorID, but our next level
  * of preference will be executors on the same host if this is not 
possible.
  */
-private[spark]
-class TaskLocation private (val host: String, val executorId: Option[String]) extends Serializable {
-  override def toString: String = "TaskLocation(" + host + ", " + executorId + ")"
+private[spark] sealed abstract class TaskLocation(val host: String) {
+}
+
+private [spark] case class ExecutorCacheTaskLocation(override val host: String,
+    val executorId: String) extends TaskLocation(host) {
+}
+
+private [spark] case class HDFSCachedTaskLocation(override val host: String)
+    extends TaskLocation(host) {
+  override def toString = TaskLocation.in_memory_location_tag + host
+}
+
+private [spark] case class HostTaskLocation(override val host: String) extends TaskLocation(host) {
+  override def toString = host
 }
 
 private[spark] object TaskLocation {
-  def apply(host: String, executorId: String) = new TaskLocation(host, Some(executorId))
+  // We identify hosts on which the block is cached with this prefix.  Because this prefix contains
+  // underscores, which are not legal characters in hostnames, there should be no potential for
+  // confusion.  See RFC 952 and RFC 1123 for information about the format of hostnames.
+  val in_memory_location_tag = "_hdfs_cache_"
+
+  def apply(host: String, executorId: String) = new ExecutorCacheTaskLocation(host, executorId)
 
-  def apply(host: String) = new TaskLocation(host, None)
+  def apply(str: String) = {
--- End diff --

added




