[GitHub] spark pull request: SPARK-1246, added min max API to Double RDDs i...

2014-03-15 Thread ScrapCodes
Github user ScrapCodes closed the pull request at: https://github.com/apache/spark/pull/140 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is

[GitHub] spark pull request: SPARK-1246, added min max API to Double RDDs i...

2014-03-14 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/140#discussion_r10632880 --- Diff: core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala --- @@ -86,14 +92,9 @@ class DoubleRDDFunctions(self: RDD[Double]) extends

[GitHub] spark pull request: Spark 1246 add min max to stat counter

2014-03-14 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/144#issuecomment-37712645 Ahh, I understand the downside: that would be just for numbers then. Makes sense. Maybe we can have both?

[GitHub] spark pull request: SPARK-1246, added min max API to Double RDDs i...

2014-03-14 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/140#discussion_r10632860 --- Diff: project/build.properties --- @@ -14,4 +14,4 @@ # See the License for the specific language governing permissions and # limitations under

[GitHub] spark pull request: Spark 1246 add min max to stat counter

2014-03-14 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/144#issuecomment-37712447 Hey Matei, for a large dataset someone might want to do it once; with stat counter, all of the numbers are calculated in one go.
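
The "one go" point can be illustrated with a minimal Python sketch. It is not Spark's stat counter; the class name `RunningStats` and its fields are hypothetical and chosen only for illustration. It assumes numeric input and shows count, mean, min and max being gathered in a single pass, with a `merge` step so per-partition results could be combined.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Hypothetical single-pass accumulator (not Spark's implementation)."""
    count: int = 0
    total: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf

    def add(self, x):
        # Update every statistic from one value, so the data is read only once.
        self.count += 1
        self.total += x
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)
        return self

    def merge(self, other):
        # Combine two partial results, e.g. one per partition.
        self.count += other.count
        self.total += other.total
        self.minimum = min(self.minimum, other.minimum)
        self.maximum = max(self.maximum, other.maximum)
        return self

    @property
    def mean(self):
        return self.total / self.count if self.count else math.nan


def stats_of(values):
    """Fold an iterable of numbers into a RunningStats in one pass."""
    stats = RunningStats()
    for v in values:
        stats.add(v)
    return stats
```

Two partial results can be combined with `stats_of(part_a).merge(stats_of(part_b))`, which is why a single pass per partition is enough.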

[GitHub] spark pull request: SPARK-1170-pyspark-histogram: added histogram ...

2014-03-14 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/122#issuecomment-37642205 Hi Daniel, thanks for the patch. It would be good to separate out the implementation of min and max into a different PR and provide RDD.min and RDD.max

[GitHub] spark pull request: SPARK-1246, added min max API to Double RDDs i...

2014-03-14 Thread ScrapCodes
GitHub user ScrapCodes opened a pull request: https://github.com/apache/spark/pull/140 SPARK-1246, added min max API to Double RDDs in java and scala APIs. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ScrapCodes/spark-1 SPARK

[GitHub] spark pull request: Spark 1162 Implemented takeOrdered in pyspark.

2014-03-14 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/97#issuecomment-37625562 They were added in 2.7.4 onwards though.

[GitHub] spark pull request: Spark 1162 Implemented takeOrdered in pyspark.

2014-03-13 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/97#issuecomment-37618967 Hey Matei, I got rid of copying `heapq.py` and all the license stuff, but resorted to using the internal API of heapq. It should be simpler. I

[GitHub] spark pull request: Spark 615 map partitions with index callable f...

2014-03-13 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/16#issuecomment-37531010 It might be good to add this test to the java8 API suite? Not sure if it's 100% necessary, but there exists one for all other APIs (I hope!!). Thoughts?

[GitHub] spark pull request: Prevent ContextClassLoader of Actor from becom...

2014-03-13 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/15#issuecomment-37530227 Mind changing the PR title to add the JIRA ID?

[GitHub] spark pull request: Prevent ContextClassLoader of Actor from becom...

2014-03-13 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/15#issuecomment-37529958 Thanks for the fix. For the record, this happens only when MASTER="local" or local[2]. Looks good. It might be good to add the above test case in

[GitHub] spark pull request: SPARK-1144 Added license and RAT to check lice...

2014-03-13 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/125#discussion_r10562618 --- Diff: project/plugins.sbt --- @@ -10,6 +10,8 @@ addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "2.2.0"

[GitHub] spark pull request: SPARK-1096, a space after comment start style ...

2014-03-13 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/124#discussion_r10555984 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMessages.scala --- @@ -35,9 +35,9 @@ private[storage] object BlockManagerMessages

[GitHub] spark pull request: Spark 1162 Implemented takeOrdered in pyspark.

2014-03-13 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/97#issuecomment-37505304 PriorityQueue is, in a way, just a wrapper over heapq that allows blocking on put and get (AFAIU). We would need a max heap (maxheapq) to retain the top N smallest elements. One
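
Why a max heap is needed to retain the N smallest elements can be sketched as follows. This is an illustrative Python sketch, not the code from the pull request: the names `_Reversed` and `n_smallest` are hypothetical, and it uses only the public `heapq` functions by wrapping each item in a comparison-reversing object, so it also works for values such as strings that cannot simply be negated.

```python
import heapq
from functools import total_ordering


@total_ordering
class _Reversed:
    """Hypothetical wrapper that reverses ordering, turning heapq's
    min-heap into a max-heap for any orderable value."""
    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return self.value == other.value

    def __lt__(self, other):
        # Reversed on purpose: the largest value sorts first.
        return self.value > other.value


def n_smallest(iterable, n):
    """Return the n smallest items in ascending order.

    A bounded max-heap keeps the largest retained item at the root,
    so each new item is compared against it and replaces it if smaller.
    """
    if n <= 0:
        return []
    heap = []
    for item in iterable:
        if len(heap) < n:
            heapq.heappush(heap, _Reversed(item))
        elif item < heap[0].value:
            # Evict the current largest and keep the smaller newcomer.
            heapq.heapreplace(heap, _Reversed(item))
    return sorted(w.value for w in heap)
```

The heap never holds more than `n` items, so memory stays bounded regardless of input size. For in-memory data the standard library's `heapq.nsmallest` already covers this case; the sketch only makes the max-heap reasoning explicit.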

[GitHub] spark pull request: SPARK-1096, a space after comment start style ...

2014-03-12 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/124#discussion_r10552982 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMessages.scala --- @@ -35,9 +35,9 @@ private[storage] object BlockManagerMessages

[GitHub] spark pull request: SPARK-1096, a space after comment start style ...

2014-03-12 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/124#discussion_r10552833 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMessages.scala --- @@ -35,9 +35,9 @@ private[storage] object BlockManagerMessages

[GitHub] spark pull request: SPARK-1096, a space after comment start style ...

2014-03-12 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/124#discussion_r10552608 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMessages.scala --- @@ -35,9 +35,9 @@ private[storage] object BlockManagerMessages

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-12 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10514209 --- Diff: core/src/main/scala/org/apache/spark/Dependency.scala --- @@ -49,9 +49,28 @@ class ShuffleDependency[K, V]( @transient rdd: RDD

[GitHub] spark pull request: SPARK-1144 Added license and RAT to check lice...

2014-03-11 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/125#issuecomment-37379933 @pwendell thoughts?

[GitHub] spark pull request: SPARK-1144 Added license and RAT to check lice...

2014-03-11 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/125#issuecomment-37379720 We did not want to have this in our builds (Maven or SBT), and running this is so trivial that it might not even need that. I am not sure about the dynamics of a release

[GitHub] spark pull request: Spark 1162 Implemented takeOrdered in pyspark.

2014-03-11 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/97#issuecomment-37379097 Hi Matei, does this mean that when key is None, it would do the same thing as top? If not, then we would need a max heap, since a min heap will only keep

[GitHub] spark pull request: SPARK-1144 Added license and RAT to check lice...

2014-03-11 Thread ScrapCodes
GitHub user ScrapCodes opened a pull request: https://github.com/apache/spark/pull/125 SPARK-1144 Added license and RAT to check licenses. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ScrapCodes/spark-1 rat-integration

[GitHub] spark pull request: SPARK-1096, a space after comment style checke...

2014-03-11 Thread ScrapCodes
GitHub user ScrapCodes opened a pull request: https://github.com/apache/spark/pull/124 SPARK-1096, a space after comment style checker. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ScrapCodes/spark-1 SPARK-1096/scalastyle

[GitHub] spark pull request: SPARK-1162 Added top in python.

2014-03-11 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/93#issuecomment-37272574 Hey Matei, Thanks!

[GitHub] spark pull request: SPARK-1170 Added histogram(buckets) to pyspark...

2014-03-11 Thread ScrapCodes
GitHub user ScrapCodes opened a pull request: https://github.com/apache/spark/pull/121 SPARK-1170 Added histogram(buckets) to pyspark and not histogram(noOfBuckets). That can be part 2 of this PR. If we can have min and max functions on an RDD of doubles, that would be good. You

[GitHub] spark pull request: Spark 1162 Implemented takeOrdered in pyspark.

2014-03-10 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/97#issuecomment-37161692 Jenkins, test this please.

[GitHub] spark pull request: SPARK-1168, Added foldByKey to pyspark.

2014-03-10 Thread ScrapCodes
GitHub user ScrapCodes opened a pull request: https://github.com/apache/spark/pull/115 SPARK-1168, Added foldByKey to pyspark. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ScrapCodes/spark-1 SPARK-1168/pyspark-foldByKey

[GitHub] spark pull request: Update junitxml plugin to the latest version t...

2014-03-08 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/104#issuecomment-37096375 Very cool, finally we have this!

[GitHub] spark pull request: Spark 1162 Implemented takeOrdered in pyspark.

2014-03-07 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/97#issuecomment-37086086 Hey Matei, the PSF License is included now. I was not sure if the entire history of the license should be included.

[GitHub] spark pull request: Spark 1162 Implemented takeOrdered in pyspark.

2014-03-07 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/97#discussion_r10407050 --- Diff: python/pyspark/maxheapq.py --- @@ -0,0 +1,115 @@ +# -*- coding: latin-1 -*- + +"""Heap queue algorithm (a.k.a.

[GitHub] spark pull request: Spark 1162 Implemented takeOrdered in pyspark.

2014-03-07 Thread ScrapCodes
GitHub user ScrapCodes opened a pull request: https://github.com/apache/spark/pull/97 Spark 1162 Implemented takeOrdered in pyspark. Python does not have a library for a max heap, and the usual tricks like inverting values do not work in all cases. So the best thing I could

[GitHub] spark pull request: SPARK-1162 Added top in python.

2014-03-06 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/93#issuecomment-36971911 Jenkins, test this please.

[GitHub] spark pull request: SPARK-1162 Added top in python.

2014-03-06 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/93#discussion_r10370555 --- Diff: python/pyspark/rdd.py --- @@ -628,6 +669,26 @@ def mergeMaps(m1, m2): m1[k] += v return m1

[GitHub] spark pull request: SPARK-1162 Added top in python.

2014-03-06 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/93#issuecomment-36887864 @mateiz I am learning Python while doing this, so I am not sure if it is going to make sense. Also, I have not figured out how to implement takeOrdered. Will it be fine if

[GitHub] spark pull request: SPARK-1162 Added top in python.

2014-03-06 Thread ScrapCodes
GitHub user ScrapCodes opened a pull request: https://github.com/apache/spark/pull/93 SPARK-1162 Added top in python. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ScrapCodes/spark-1 SPARK-1162/pyspark-top-takeOrdered

[GitHub] spark pull request: Spark 1165 rdd.intersection in python and java

2014-03-05 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/80#issuecomment-36729592 Jenkins, test this please.

[GitHub] spark pull request: Spark 1165 rdd.intersection in python and java

2014-03-05 Thread ScrapCodes
GitHub user ScrapCodes opened a pull request: https://github.com/apache/spark/pull/80 Spark 1165 rdd.intersection in python and java You can merge this pull request into a Git repository by running: $ git pull https://github.com/ScrapCodes/spark-1 SPARK-1165/RDD.intersection

[GitHub] spark pull request: SPARK-964 Fix for -java-home note.

2014-03-04 Thread ScrapCodes
Github user ScrapCodes closed the pull request at: https://github.com/apache/spark/pull/71

[GitHub] spark pull request: SPARK-1109 wrong API docs for pyspark map func...

2014-03-04 Thread ScrapCodes
GitHub user ScrapCodes opened a pull request: https://github.com/apache/spark/pull/73 SPARK-1109 wrong API docs for pyspark map function You can merge this pull request into a Git repository by running: $ git pull https://github.com/ScrapCodes/spark-1 SPARK-1109/wrong-API

[GitHub] spark pull request: SPARK-1164 Deprecated reduceByKeyToDriver as i...

2014-03-04 Thread ScrapCodes
GitHub user ScrapCodes opened a pull request: https://github.com/apache/spark/pull/72 SPARK-1164 Deprecated reduceByKeyToDriver as it is an alias for reduceByKeyLocally You can merge this pull request into a Git repository by running: $ git pull https://github.com/ScrapCodes

[GitHub] spark pull request: [WIP] SPARK-964 Fix for -java-home note.

2014-03-03 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/71#issuecomment-36598821 It does not cover the case where JAVA_HOME points to an invalid directory; it will simply take the alternate path instead of failing nicely.

[GitHub] spark pull request: [WIP] SPARK-964 Fix for -java-home note.

2014-03-03 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/71#issuecomment-36597306 @pwendell Hey Patrick, it might be good to have Jenkins not test the PRs whose titles start with [WIP] or WIP, or something like that?

[GitHub] spark pull request: [WIP] SPARK-964 Fix for -java-home note.

2014-03-03 Thread ScrapCodes
GitHub user ScrapCodes opened a pull request: https://github.com/apache/spark/pull/71 [WIP] SPARK-964 Fix for -java-home note. I just tested this manually: with -java-home "jdk", with just JAVA_HOME set, and with both. I hope that covers all cases. It

[GitHub] spark pull request: [java8API] SPARK-964 Investigate the potential...

2014-03-03 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/17#issuecomment-36594377 One thing to note: `-java-home` currently has a note, and we can actually fix that by moving the check after process args.

[GitHub] spark pull request: [java8API] SPARK-964 Investigate the potential...

2014-03-03 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/17#discussion_r10227199 --- Diff: extras/java8-tests/README.md --- @@ -0,0 +1,15 @@ +# Java 8 test suites. + +These tests are bundled with spark and run if you have java

[GitHub] spark pull request: [HOTFIX] Patching maven build after #6 (SPARK-...

2014-02-28 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/37#issuecomment-36335799 Hey Patrick, forgive me for this; this is the second time I have messed up the Maven build.

[GitHub] spark pull request: SPARK-1121 Only add avro if the build is for H...

2014-02-26 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/6#issuecomment-36217654 Rebased !!