GitHub user bijaybisht reopened a pull request:

    https://github.com/apache/incubator-spark/pull/522

    Hadoop jar name

    This pull request is a copy of #121 (fix for the hadoop client jar name, 
which changed from 1.*). The other one was opened from master, which is the 
wrong way to generate pull requests. 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-spark hadoop_jar_name

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-spark/pull/522.patch

----
commit 0ff38c22205f14770ecca1e66378e7c207ca2d1d
Author: Erik Selin <erik.se...@jadedpixel.com>
Date:   2014-01-29T20:44:54Z

    Merge pull request #494 from tyro89/worker_registration_issue
    
    Issue with failed worker registrations
    
I've been going through the spark source after having some odd issues with 
workers dying and not coming back. After some digging (I'm very new to scala 
and spark) I believe I've found a worker registration issue. It looks to me 
like a failed registration follows the same code path as a successful 
registration, which ends up with workers believing they are connected (since 
they received a `RegisteredWorker` event) even though they are not registered 
on the Master.
    
    This is a quick fix that I hope addresses this issue (assuming I didn't 
completely misread the code and I'm about to look like a silly person :P)
    
    I'm opening this pr now to start a chat with you guys while I do some more 
testing on my side :)
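    The failure mode described above can be sketched in a few lines. This is a 
hypothetical plain-Python stand-in (not the PR's Scala code), using the 
`RegisteredWorker`/`RegisterWorkerFailed` message names mentioned in the 
commits; the `Master` class and its fields are invented for illustration:

```python
# Hypothetical sketch of the fix's shape: a failed registration must not
# take the success path.

class Master:
    def __init__(self):
        self.workers = {}  # address -> worker id

    def register_worker(self, worker_id, address):
        if address in self.workers:
            # Failure path: do not persist the worker, do not run schedule(),
            # and do not send RegisteredWorker -- reply with a failure instead.
            return ("RegisterWorkerFailed",
                    "attempted to re-register worker at same address")
        # Success path only for genuinely new registrations.
        self.workers[address] = worker_id
        return ("RegisteredWorker", worker_id)
```

    Before the fix, both branches effectively returned the success message, so 
the worker believed it was registered even when the Master had rejected it.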
    
    Author: Erik Selin <erik.se...@jadedpixel.com>
    
    == Merge branch commits ==
    
    commit 973012f8a2dcf1ac1e68a69a2086a1b9a50f401b
    Author: Erik Selin <erik.se...@jadedpixel.com>
    Date:   Tue Jan 28 23:36:12 2014 -0500
    
        break logwarning into two lines to respect line character limit.
    
    commit e3754dc5b94730f37e9806974340e6dd93400f85
    Author: Erik Selin <erik.se...@jadedpixel.com>
    Date:   Tue Jan 28 21:16:21 2014 -0500
    
        add log warning when worker registration fails due to attempt to 
re-register on same address.
    
    commit 14baca241fa7823e1213cfc12a3ff2a9b865b1ed
    Author: Erik Selin <erik.se...@jadedpixel.com>
    Date:   Wed Jan 22 21:23:26 2014 -0500
    
        address code style comment
    
    commit 71c0d7e6f59cd378d4e24994c21140ab893954ee
    Author: Erik Selin <erik.se...@jadedpixel.com>
    Date:   Wed Jan 22 16:01:42 2014 -0500
    
        Make a failed registration not persist, not send a `RegisteredWorker` 
event and not run `schedule`, but rather send a `RegisterWorkerFailed` message 
to the worker attempting to register.

commit ac712e48af3068672e629cec7766caae3cd77c37
Author: Reynold Xin <r...@apache.org>
Date:   2014-01-30T17:33:18Z

    Merge pull request #524 from rxin/doc
    
    Added spark.shuffle.file.buffer.kb to configuration doc.
    
    Author: Reynold Xin <r...@apache.org>
    
    == Merge branch commits ==
    
    commit 0eea1d761ff772ff89be234e1e28035d54e5a7de
    Author: Reynold Xin <r...@apache.org>
    Date:   Wed Jan 29 14:40:48 2014 -0800
    
        Added spark.shuffle.file.buffer.kb to configuration doc.

commit a8cf3ec157fc9a512421b319cfffc5e4f07cf1f3
Author: Ankur Dave <ankurd...@gmail.com>
Date:   2014-02-01T00:52:02Z

    Merge pull request #527 from ankurdave/graphx-assembly-pom
    
    Add GraphX to assembly/pom.xml
    
    Author: Ankur Dave <ankurd...@gmail.com>
    
    == Merge branch commits ==
    
    commit bb0b33ef9eb1b3d4a4fc283d9abb2ece4abcac23
    Author: Ankur Dave <ankurd...@gmail.com>
    Date:   Fri Jan 31 15:24:52 2014 -0800
    
        Add GraphX to assembly/pom.xml

commit 0386f42e383dc01b8df33c4a70b024e7902b5fdd
Author: Henry Saputra <hsapu...@apache.org>
Date:   2014-02-03T05:51:17Z

    Merge pull request #529 from hsaputra/cleanup_right_arrowop_scala
    
    Change the ⇒ character (maybe from scalariform) to => in Scala code for 
style consistency
    
    Looks like there are some ⇒ Unicode characters (maybe from scalariform) in 
Scala code.
    This PR is to change it to => to get some consistency on the Scala code.
    
    If we want to use ⇒ as the default, we could use the sbt plugin 
scalariform to make sure all Scala code has ⇒ instead of =>
    
    And remove unused imports found in TwitterInputDStream.scala while I was 
there =)
    
    Author: Henry Saputra <hsapu...@apache.org>
    
    == Merge branch commits ==
    
    commit 29c1771d346dff901b0b778f764e6b4409900234
    Author: Henry Saputra <hsapu...@apache.org>
    Date:   Sat Feb 1 22:05:16 2014 -0800
    
        Change the ⇒ character (maybe from scalariform) to => in Scala code 
for style consistency.

commit 1625d8c44693420de026138f3abecce2d12f895c
Author: Aaron Davidson <aa...@databricks.com>
Date:   2014-02-03T19:25:39Z

    Merge pull request #530 from aarondav/cleanup. Closes #530.
    
    Remove explicit conversion to PairRDDFunctions in cogroup()
    
    As SparkContext._ is already imported, using the implicit conversion 
appears to make the code much cleaner. Perhaps there was some sinister reason 
for doing the conversion explicitly, however.
    
    Author: Aaron Davidson <aa...@databricks.com>
    
    == Merge branch commits ==
    
    commit aa4a63f1bfd5b5178fe67364dd7ce4d84c357996
    Author: Aaron Davidson <aa...@databricks.com>
    Date:   Sun Feb 2 23:48:04 2014 -0800
    
        Remove explicit conversion to PairRDDFunctions in cogroup()
    
        As SparkContext._ is already imported, using the implicit conversion
        appears to make the code much cleaner. Perhaps there was some sinister
        reason for doing the conversion explicitly, however.

commit 23af00f9e0e5108f62cdb9629e3eb4e54bbaa321
Author: Xiangrui Meng <m...@databricks.com>
Date:   2014-02-03T21:02:09Z

    Merge pull request #528 from mengxr/sample. Closes #528.
    
    Refactor RDD sampling and add randomSplit to RDD (update)
    
    Replace SampledRDD by PartitionwiseSampledRDD, which accepts a 
RandomSampler instance as input. The current sample with/without replacement 
can be easily integrated via BernoulliSampler and PoissonSampler. The benefits 
are:
    
    1) RDD.randomSplit is implemented in the same way, related to 
https://github.com/apache/incubator-spark/pull/513
    2) Stratified sampling and importance sampling can be implemented in the 
same manner as well.
    
    Unit tests are included for samplers and RDD.randomSplit.
    
    This should perform better than my previous pull request, where the 
BernoulliSampler creates many Iterator instances:
    https://github.com/apache/incubator-spark/pull/513
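    The partition-wise idea above can be sketched in plain Python (hypothetical 
helper names, not Spark's Scala implementation): each partition gets its own 
RNG seeded deterministically from the job seed and the partition's split 
index, so recomputing a lost partition reproduces the same sample, and 
randomSplit routes each element by one random draw so the splits are disjoint:

```python
import random

def bernoulli_sample(partition, fraction, seed, split_index):
    # One RNG per partition, seeded from (seed, split_index), so resampling
    # the same partition is reproducible.
    rng = random.Random(seed * 65537 + split_index)
    return [x for x in partition if rng.random() < fraction]

def random_split(partition, weights, seed, split_index):
    # Each element draws one random number and goes to the split whose
    # cumulative-weight interval contains the draw, so the splits are
    # disjoint and together cover the whole partition.
    total = float(sum(weights))
    bounds, acc = [], 0.0
    for w in weights:
        acc += w / total
        bounds.append(acc)
    splits = [[] for _ in weights]
    rng = random.Random(seed * 65537 + split_index)
    for x in partition:
        r = rng.random()
        for i, hi in enumerate(bounds):
            if r < hi:
                splits[i].append(x)
                break
        else:
            # guard against floating-point rounding of the last bound
            splits[-1].append(x)
    return splits
```

    Because the two operations share the per-partition RNG scheme, stratified 
or importance sampling can plug in the same way by swapping the accept rule.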
    
    Author: Xiangrui Meng <m...@databricks.com>
    
    == Merge branch commits ==
    
    commit e8ce957e5f0a600f2dec057924f4a2ca6adba373
    Author: Xiangrui Meng <m...@databricks.com>
    Date:   Mon Feb 3 12:21:08 2014 -0800
    
        more docs to PartitionwiseSampledRDD
    
    commit fbb4586d0478ff638b24bce95f75ff06f713d43b
    Author: Xiangrui Meng <m...@databricks.com>
    Date:   Mon Feb 3 00:44:23 2014 -0800
    
        move XORShiftRandom to util.random and use it in BernoulliSampler
    
    commit 987456b0ee8612fd4f73cb8c40967112dc3c4c2d
    Author: Xiangrui Meng <m...@databricks.com>
    Date:   Sat Feb 1 11:06:59 2014 -0800
    
        relax assertions in SortingSuite because the RangePartitioner has large 
variance in this case
    
    commit 3690aae416b2dc9b2f9ba32efa465ba7948477f4
    Author: Xiangrui Meng <m...@databricks.com>
    Date:   Sat Feb 1 09:56:28 2014 -0800
    
        test split ratio of RDD.randomSplit
    
    commit 8a410bc933a60c4d63852606f8bbc812e416d6ae
    Author: Xiangrui Meng <m...@databricks.com>
    Date:   Sat Feb 1 09:25:22 2014 -0800
    
        add a test to ensure seed distribution and minor style update
    
    commit ce7e866f674c30ab48a9ceb09da846d5362ab4b6
    Author: Xiangrui Meng <m...@databricks.com>
    Date:   Fri Jan 31 18:06:22 2014 -0800
    
        minor style change
    
    commit 750912b4d77596ed807d361347bd2b7e3b9b7a74
    Author: Xiangrui Meng <m...@databricks.com>
    Date:   Fri Jan 31 18:04:54 2014 -0800
    
        fix some long lines
    
    commit c446a25c38d81db02821f7f194b0ce5ab4ed7ff5
    Author: Xiangrui Meng <m...@databricks.com>
    Date:   Fri Jan 31 17:59:59 2014 -0800
    
        add complement to BernoulliSampler and minor style changes
    
    commit dbe2bc2bd888a7bdccb127ee6595840274499403
    Author: Xiangrui Meng <m...@databricks.com>
    Date:   Fri Jan 31 17:45:08 2014 -0800
    
        switch to partition-wise sampling for better performance
    
    commit a1fca5232308feb369339eac67864c787455bb23
    Merge: ac712e4 cf6128f
    Author: Xiangrui Meng <m...@databricks.com>
    Date:   Fri Jan 31 16:33:09 2014 -0800
    
        Merge branch 'sample' of github.com:mengxr/incubator-spark into sample
    
    commit cf6128fb672e8c589615adbd3eaa3cbdb72bd461
    Author: Xiangrui Meng <m...@databricks.com>
    Date:   Sun Jan 26 14:40:07 2014 -0800
    
        set SampledRDD deprecated in 1.0
    
    commit f430f847c3df91a3894687c513f23f823f77c255
    Author: Xiangrui Meng <m...@databricks.com>
    Date:   Sun Jan 26 14:38:59 2014 -0800
    
        update code style
    
    commit a8b5e2021a9204e318c80a44d00c5c495f1befb6
    Author: Xiangrui Meng <m...@databricks.com>
    Date:   Sun Jan 26 12:56:27 2014 -0800
    
        move package random to util.random
    
    commit ab0fa2c4965033737a9e3a9bf0a59cbb0df6a6f5
    Author: Xiangrui Meng <m...@databricks.com>
    Date:   Sun Jan 26 12:50:35 2014 -0800
    
        add Apache headers and update code style
    
    commit 985609fe1a55655ad11966e05a93c18c138a403d
    Author: Xiangrui Meng <m...@databricks.com>
    Date:   Sun Jan 26 11:49:25 2014 -0800
    
        add new lines
    
    commit b21bddf29850a2c006a868869b8f91960a029322
    Author: Xiangrui Meng <m...@databricks.com>
    Date:   Sun Jan 26 11:46:35 2014 -0800
    
        move samplers to random.IndependentRandomSampler and add tests
    
    commit c02dacb4a941618e434cefc129c002915db08be6
    Author: Xiangrui Meng <m...@databricks.com>
    Date:   Sat Jan 25 15:20:24 2014 -0800
    
        add RandomSampler
    
    commit 8ff7ba3c5cf1fc338c29ae8b5fa06c222640e89c
    Author: Xiangrui Meng <m...@databricks.com>
    Date:   Fri Jan 24 13:23:22 2014 -0800
    
        init impl of IndependentlySampledRDD

commit 0c05cd374dac309b5444980f10f8dcb820c752c2
Author: Stevo Slavić <ssla...@gmail.com>
Date:   2014-02-04T17:45:46Z

    Merge pull request #535 from sslavic/patch-2. Closes #535.
    
    Fixed typo in scaladoc
    
    Author: Stevo Slavić <ssla...@gmail.com>
    
    == Merge branch commits ==
    
    commit 0a77f789e281930f4168543cc0d3b3ffbf5b3764
    Author: Stevo Slavić <ssla...@gmail.com>
    Date:   Tue Feb 4 15:30:27 2014 +0100
    
        Fixed typo in scaladoc

commit 92092879c3b8001a456fefc2efc0df16585515a8
Author: Stevo Slavić <ssla...@gmail.com>
Date:   2014-02-04T17:47:11Z

    Merge pull request #534 from sslavic/patch-1. Closes #534.
    
    Fixed wrong path to compute-classpath.cmd
    
    compute-classpath.cmd is in bin, not in sbin directory
    
    Author: Stevo Slavić <ssla...@gmail.com>
    
    == Merge branch commits ==
    
    commit 23deca32b69e9429b33ad31d35b7e1bfc9459f59
    Author: Stevo Slavić <ssla...@gmail.com>
    Date:   Tue Feb 4 15:01:47 2014 +0100
    
        Fixed wrong path to compute-classpath.cmd
    
        compute-classpath.cmd is in bin, not in sbin directory

commit f7fd80d9a71069cba94294e6b77c0eaeb90e73d7
Author: Stevo Slavić <ssla...@gmail.com>
Date:   2014-02-05T18:29:45Z

    Merge pull request #540 from sslavic/patch-3. Closes #540.
    
    Fix line end character stripping for Windows
    
    LogQuery Spark example would produce unwanted result when run on Windows 
platform because of different, platform specific trailing line end characters 
(not only \n but \r too).
    
    This fix makes use of Scala's standard library string functions to properly 
strip all trailing line end characters, letting Scala handle the platform 
specific stuff.
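    A minimal Python illustration of the same portable approach (hypothetical 
helper name, not the example's Scala code): let the standard library strip any 
trailing line-end characters instead of hard-coding '\n':

```python
# Strip all trailing line-end characters, whether the input came from a
# Unix file (\n) or a Windows one (\r\n).
def strip_line_end(line):
    return line.rstrip("\r\n")

assert strip_line_end("GET /index.html\r\n") == "GET /index.html"
assert strip_line_end("GET /index.html\n") == "GET /index.html"

# splitlines() handles \n, \r\n, and \r uniformly when splitting whole text:
assert "a\r\nb\nc".splitlines() == ["a", "b", "c"]
```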
    
    Author: Stevo Slavić <ssla...@gmail.com>
    
    == Merge branch commits ==
    
    commit 1e43ba0ea773cc005cf0aef78b6c1755f8e88b27
    Author: Stevo Slavić <ssla...@gmail.com>
    Date:   Wed Feb 5 14:48:29 2014 +0100
    
        Fix line end character stripping for Windows
    
        LogQuery Spark example would produce unwanted result when run on 
Windows platform because of different, platform specific trailing line end 
characters (not only \n but \r too).
    
        This fix makes use of Scala's standard library string functions to 
properly strip all trailing line end characters, letting Scala handle the 
platform specific stuff.

commit cc14ba974c8e98c08548a2ccf64c2765f313f649
Author: Kay Ousterhout <kayousterh...@gmail.com>
Date:   2014-02-05T20:44:24Z

    Merge pull request #544 from kayousterhout/fix_test_warnings. Closes #544.
    
    Fixed warnings in test compilation.
    
    This commit fixes two problems: a redundant import, and a
    deprecated function.
    
    Author: Kay Ousterhout <kayousterh...@gmail.com>
    
    == Merge branch commits ==
    
    commit da9d2e13ee4102bc58888df0559c65cb26232a82
    Author: Kay Ousterhout <kayousterh...@gmail.com>
    Date:   Wed Feb 5 11:41:51 2014 -0800
    
        Fixed warnings in test compilation.
    
        This commit fixes two problems: a redundant import, and a
        deprecated function.

commit 18c4ee71e27189f5f3f4eb6bfc6ad8860aa254c6
Author: CodingCat <zhunans...@gmail.com>
Date:   2014-02-06T06:08:47Z

    Merge pull request #549 from CodingCat/deadcode_master. Closes #549.
    
    remove actorToWorker in master.scala, which is actually not used
    
    actorToWorker is actually not used in the code... just remove it
    
    Author: CodingCat <zhunans...@gmail.com>
    
    == Merge branch commits ==
    
    commit 52656c2d4bbf9abcd8bef65d454badb9cb14a32c
    Author: CodingCat <zhunans...@gmail.com>
    Date:   Thu Feb 6 00:28:26 2014 -0500
    
        remove actorToWorker in master.scala, which is actually not used

commit 38020961d101e792393855fd00d8e42f40713754
Author: Thomas Graves <tgra...@apache.org>
Date:   2014-02-06T07:37:07Z

    Merge pull request #526 from tgravescs/yarn_client_stop_am_fix. Closes #526.
    
    spark on yarn - yarn-client mode doesn't always exit immediately
    
    https://spark-project.atlassian.net/browse/SPARK-1049
    
    If you run in the yarn-client mode but you don't get all the workers you 
requested right away and then you exit your application, the application master 
stays around until it gets the number of workers you initially requested. This 
is a waste of resources.  The AM should exit immediately upon the client going 
away.
    
    This fix simply checks to see if the driver closed while it's waiting for 
the initial # of workers.
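    The shape of that check can be sketched as follows; this is a hypothetical 
simplification (invented function names, plain Python rather than the YARN 
client's Scala), showing only that the wait loop now also watches the driver:

```python
import time

# While waiting for the initially requested number of workers, also check
# whether the driver has gone away and, if so, exit immediately instead of
# holding the application master (and its resources) indefinitely.
def wait_for_workers(requested, worker_count, driver_alive, poll=0.1):
    while worker_count() < requested:
        if not driver_alive():
            return "driver gone - exiting immediately"
        time.sleep(poll)
    return "all workers registered"
```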
    
    Author: Thomas Graves <tgra...@apache.org>
    
    == Merge branch commits ==
    
    commit 03f40a62584b6bdd094ba91670cd4aa6afe7cd81
    Author: Thomas Graves <tgra...@apache.org>
    Date:   Fri Jan 31 11:23:10 2014 -0600
    
        spark on yarn - yarn-client mode doesn't always exit immediately

commit 79c95527a77af32bd83a968c1a56feb22e441b7d
Author: Kay Ousterhout <kayousterh...@gmail.com>
Date:   2014-02-06T07:38:12Z

    Merge pull request #545 from kayousterhout/fix_progress. Closes #545.
    
    Fix off-by-one error with task progress info log.
    
    Author: Kay Ousterhout <kayousterh...@gmail.com>
    
    == Merge branch commits ==
    
    commit 29798fc685c4e7e3eb3bf91c75df7fa8ec94a235
    Author: Kay Ousterhout <kayousterh...@gmail.com>
    Date:   Wed Feb 5 13:40:01 2014 -0800
    
        Fix off-by-one error with task progress info log.

commit 084839ba357e03bb56517620123682b50a91cb0b
Author: Prashant Sharma <prashan...@imaginea.com>
Date:   2014-02-06T22:58:35Z

    Merge pull request #498 from ScrapCodes/python-api. Closes #498.
    
    Python api additions
    
    Author: Prashant Sharma <prashan...@imaginea.com>
    
    == Merge branch commits ==
    
    commit 8b51591f1a7a79a62c13ee66ff8d83040f7eccd8
    Author: Prashant Sharma <prashan...@imaginea.com>
    Date:   Fri Jan 24 11:50:29 2014 +0530
    
        Josh's and Patrick's review comments.
    
    commit d37f9677838e43bef6c18ef61fbf08055ba6d1ca
    Author: Prashant Sharma <prashan...@imaginea.com>
    Date:   Thu Jan 23 17:27:17 2014 +0530
    
        fixed doc tests
    
    commit 27cb54bf5c99b1ea38a73858c291d0a1c43d8b7c
    Author: Prashant Sharma <prashan...@imaginea.com>
    Date:   Thu Jan 23 16:48:43 2014 +0530
    
        Added keys and values methods for PairFunctions in python
    
    commit 4ce76b396fbaefef2386d7a36d611572bdef9b5d
    Author: Prashant Sharma <prashan...@imaginea.com>
    Date:   Thu Jan 23 13:51:26 2014 +0530
    
        Added foreachPartition
    
    commit 05f05341a187cba829ac0e6c2bdf30be49948c89
    Author: Prashant Sharma <prashan...@imaginea.com>
    Date:   Thu Jan 23 13:02:59 2014 +0530
    
        Added coalesce function to python API
    
    commit 6568d2c2fa14845dc56322c0f39ba2e13b3b26dd
    Author: Prashant Sharma <prashan...@imaginea.com>
    Date:   Thu Jan 23 12:52:44 2014 +0530
    
        added repartition function to python API.

commit 446403b63763157831ddbf6209044efc3cc7bf7c
Author: Sandy Ryza <sa...@cloudera.com>
Date:   2014-02-06T23:41:16Z

    Merge pull request #554 from sryza/sandy-spark-1056. Closes #554.
    
    SPARK-1056. Fix header comment in Executor to not imply that it's only 
used for Mesos and Standalone.
    
    Author: Sandy Ryza <sa...@cloudera.com>
    
    == Merge branch commits ==
    
    commit 1f2443d902a26365a5c23e4af9077e1539ed2eab
    Author: Sandy Ryza <sa...@cloudera.com>
    Date:   Thu Feb 6 15:03:50 2014 -0800
    
        SPARK-1056. Fix header comment in Executor to not imply that it's only 
used for Mesos and Standalone

commit 18ad59e2c6b7bd009e8ba5ebf8fcf99630863029
Author: Kay Ousterhout <kayousterh...@gmail.com>
Date:   2014-02-07T00:10:48Z

    Merge pull request #321 from kayousterhout/ui_kill_fix. Closes #321.
    
    Inform DAG scheduler about all started/finished tasks.
    
    Previously, the DAG scheduler was not always informed
    when tasks started and finished. The simplest example here
    is for speculated tasks: the DAGScheduler was only told about
    the first attempt of a task, meaning that SparkListeners were
    also not told about multiple task attempts, so users can't see
    what's going on with speculation in the UI.  The DAGScheduler
    also wasn't always told about finished tasks, so in the UI, some
    tasks will never be shown as finished (this occurs, for example,
    if a task set gets killed).
    
    The other problem is that the fairness accounting was wrong
    -- the number of running tasks in a pool was decreased when a
    task set was considered done, even if all of its tasks hadn't
    yet finished.
    
    Author: Kay Ousterhout <kayousterh...@gmail.com>
    
    == Merge branch commits ==
    
    commit c8d547d0f7a17f5a193bef05f5872b9f475675c5
    Author: Kay Ousterhout <kayousterh...@gmail.com>
    Date:   Wed Jan 15 16:47:33 2014 -0800
    
        Addressed Reynold's review comments.
    
        Always use a TaskEndReason (remove the option), and explicitly
        signal when we don't know the reason. Also, always tell
        DAGScheduler (and associated listeners) about started tasks, even
        when they're speculated.
    
    commit 3fee1e2e3c06b975ff7f95d595448f38cce97a04
    Author: Kay Ousterhout <kayousterh...@gmail.com>
    Date:   Wed Jan 8 22:58:13 2014 -0800
    
        Fixed broken test and improved logging
    
    commit ff12fcaa2567c5d02b75a1d5db35687225bcd46f
    Author: Kay Ousterhout <kayousterh...@gmail.com>
    Date:   Sun Dec 29 21:08:20 2013 -0800
    
        Inform DAG scheduler about all finished tasks.
    
        Previously, the DAG scheduler was not always informed
        when tasks finished. For example, when a task set was
        aborted, the DAG scheduler was never told when the tasks
        in that task set finished. The DAG scheduler was also
        never told about the completion of speculated tasks.
        This led to confusion with SparkListeners because information
        about the completion of those tasks was never passed on to
        the listeners (so in the UI, for example, some tasks will never
        be shown as finished).
    
        The other problem is that the fairness accounting was wrong
        -- the number of running tasks in a pool was decreased when a
        task set was considered done, even if all of its tasks hadn't
        yet finished.

commit 0b448df6ac520a7977b1eb51e8c55e33f3fd2da8
Author: Kay Ousterhout <kayousterh...@gmail.com>
Date:   2014-02-07T00:15:24Z

    Merge pull request #450 from kayousterhout/fetch_failures. Closes #450.
    
    Only run ResubmitFailedStages event after a fetch fails
    
    Previously, the ResubmitFailedStages event was called every
    200 milliseconds, leading to a lot of unnecessary event processing
    and clogged DAGScheduler logs.
    
    Author: Kay Ousterhout <kayousterh...@gmail.com>
    
    == Merge branch commits ==
    
    commit e603784b3a562980e6f1863845097effe2129d3b
    Author: Kay Ousterhout <kayousterh...@gmail.com>
    Date:   Wed Feb 5 11:34:41 2014 -0800
    
        Re-add check for empty set of failed stages
    
    commit d258f0ef50caff4bbb19fb95a6b82186db1935bf
    Author: Kay Ousterhout <kayousterh...@gmail.com>
    Date:   Wed Jan 15 23:35:41 2014 -0800
    
        Only run ResubmitFailedStages event after a fetch fails
    
        Previously, the ResubmitFailedStages event was called every
        200 milliseconds, leading to a lot of unnecessary event processing
        and clogged DAGScheduler logs.

commit 1896c6e7c9f5c29284a045128b4aca0d5a6e7220
Author: Andrew Or <andrewo...@gmail.com>
Date:   2014-02-07T06:05:53Z

    Merge pull request #533 from andrewor14/master. Closes #533.
    
    External spilling - generalize batching logic
    
    The existing implementation consists of a hack for Kryo specifically and 
only works for LZF compression. Introducing an intermediate batch-level stream 
takes care of pre-fetching and other arbitrary behavior of higher level streams 
in a more general way.
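    The batch-level stream idea can be sketched like this; a hypothetical 
plain-Python stand-in (invented class name, not the Scala implementation), 
where each on-disk batch is exposed as its own bounded stream so that a 
pre-fetching decompressor or deserializer can never read past the batch end:

```python
import io

class BatchStream:
    # Wrap the underlying spill file and serve at most `batch_len` bytes.
    # A compression/deserialization stream layered on top may pre-fetch
    # aggressively, but it hits end-of-stream at the batch boundary.
    def __init__(self, raw, batch_len):
        self.raw = raw
        self.remaining = batch_len

    def read(self, n=-1):
        if self.remaining <= 0:
            return b""  # batch boundary: behave like end-of-stream
        if n < 0 or n > self.remaining:
            n = self.remaining
        data = self.raw.read(n)
        self.remaining -= len(data)
        return data

raw = io.BytesIO(b"batch-one|batch-two")
first = BatchStream(raw, 9)
assert first.read() == b"batch-one"
assert first.read() == b""  # a greedy reader is stopped at the boundary
```

    This is what makes the logic library-agnostic: no special-casing of Kryo 
and no restriction to LZF, because the boundary is enforced below them.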
    
    Author: Andrew Or <andrewo...@gmail.com>
    
    == Merge branch commits ==
    
    commit 3ddeb7ef89a0af2b685fb5d071aa0f71c975cc82
    Author: Andrew Or <andrewo...@gmail.com>
    Date:   Wed Feb 5 12:09:32 2014 -0800
    
        Also privatize fields
    
    commit 090544a87a0767effd0c835a53952f72fc8d24f0
    Author: Andrew Or <andrewo...@gmail.com>
    Date:   Wed Feb 5 10:58:23 2014 -0800
    
        Privatize methods
    
    commit 13920c918efe22e66a1760b14beceb17a61fd8cc
    Author: Andrew Or <andrewo...@gmail.com>
    Date:   Tue Feb 4 16:34:15 2014 -0800
    
        Update docs
    
    commit bd5a1d7350467ed3dc19c2de9b2c9f531f0e6aa3
    Author: Andrew Or <andrewo...@gmail.com>
    Date:   Tue Feb 4 13:44:24 2014 -0800
    
        Typo: phyiscal -> physical
    
    commit 287ef44e593ad72f7434b759be3170d9ee2723d2
    Author: Andrew Or <andrewo...@gmail.com>
    Date:   Tue Feb 4 13:38:32 2014 -0800
    
        Avoid reading the entire batch into memory; also simplify streaming 
logic
    
        Additionally, address formatting comments.
    
    commit 3df700509955f7074821e9aab1e74cb53c58b5a5
    Merge: a531d2e 164489d
    Author: Andrew Or <andrewo...@gmail.com>
    Date:   Mon Feb 3 18:27:49 2014 -0800
    
        Merge branch 'master' of github.com:andrewor14/incubator-spark
    
    commit a531d2e347acdcecf2d0ab72cd4f965ab5e145d8
    Author: Andrew Or <andrewo...@gmail.com>
    Date:   Mon Feb 3 18:18:04 2014 -0800
    
        Relax assumptions on compressors and serializers when batching
    
        This commit introduces an intermediate layer of an input stream on the 
batch level.
        This guards against interference from higher level streams (i.e. 
compression and
        deserialization streams), especially pre-fetching, without specifically 
targeting
        particular libraries (Kryo) and forcing shuffle spill compression to 
use LZF.
    
    commit 164489d6f176bdecfa9dabec2dfce5504d1ee8af
    Author: Andrew Or <andrewo...@gmail.com>
    Date:   Mon Feb 3 18:18:04 2014 -0800
    
        Relax assumptions on compressors and serializers when batching
    
        This commit introduces an intermediate layer of an input stream on the 
batch level.
        This guards against interference from higher level streams (i.e. 
compression and
        deserialization streams), especially pre-fetching, without specifically 
targeting
        particular libraries (Kryo) and forcing shuffle spill compression to 
use LZF.

commit 3a9d82cc9e85accb5c1577cf4718aa44c8d5038c
Author: Andrew Ash <and...@andrewash.com>
Date:   2014-02-07T06:38:36Z

    Merge pull request #506 from ash211/intersection. Closes #506.
    
    SPARK-1062 Add rdd.intersection(otherRdd) method
    
    Author: Andrew Ash <and...@andrewash.com>
    
    == Merge branch commits ==
    
    commit 5d9982b171b9572649e9828f37ef0b43f0242912
    Author: Andrew Ash <and...@andrewash.com>
    Date:   Thu Feb 6 18:11:45 2014 -0800
    
        Minor fixes
    
        - style: (v,null) => (v, null)
        - mention the shuffle in Javadoc
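    The (v, null) + cogroup + filter pattern from the commits above can be 
sketched in plain Python (hypothetical local stand-ins for the RDD 
operations, not Spark's code); cogrouping by key is also why the Javadoc 
note mentions a shuffle:

```python
# Group values by key across both inputs (a toy, quadratic cogroup).
def cogroup(a_pairs, b_pairs):
    keys = {k for k, _ in a_pairs} | {k for k, _ in b_pairs}
    return {k: ([v for k2, v in a_pairs if k2 == k],
                [v for k2, v in b_pairs if k2 == k]) for k in keys}

def intersection(rdd_a, rdd_b):
    # Map each element to (v, None), cogroup by key, and keep the keys
    # that appeared on both sides. Output is deduplicated, since keys
    # form a set.
    a_pairs = [(v, None) for v in rdd_a]
    b_pairs = [(v, None) for v in rdd_b]
    grouped = cogroup(a_pairs, b_pairs)
    return [k for k, (left, right) in grouped.items() if left and right]
```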
    
    commit b86d02f14e810902719cef893cf6bfa18ff9acb0
    Author: Andrew Ash <and...@andrewash.com>
    Date:   Sun Feb 2 13:17:40 2014 -0800
    
        Overload .intersection() for numPartitions and custom Partitioner
    
    commit bcaa34911fcc6bb5bc5e4f9fe46d1df73cb71c09
    Author: Andrew Ash <and...@andrewash.com>
    Date:   Sun Feb 2 13:05:40 2014 -0800
    
        Better naming of parameters in intersection's filter
    
    commit b10a6af2d793ec6e9a06c798007fac3f6b860d89
    Author: Andrew Ash <and...@andrewash.com>
    Date:   Sat Jan 25 23:06:26 2014 -0800
    
        Follow spark code format conventions of tab => 2 spaces
    
    commit 965256e4304cca514bb36a1a36087711dec535ec
    Author: Andrew Ash <and...@andrewash.com>
    Date:   Fri Jan 24 00:28:01 2014 -0800
    
        Add rdd.intersection(otherRdd) method

commit fabf1749995103841e6a3975892572f376ee48d0
Author: Martin Jaggi <m.ja...@gmail.com>
Date:   2014-02-08T19:39:13Z

    Merge pull request #552 from martinjaggi/master. Closes #552.
    
    tex formulas in the documentation
    
    using mathjax.
    and splitting the MLlib documentation by techniques
    
    see jira
    https://spark-project.atlassian.net/browse/MLLIB-19
    and
    https://github.com/shivaram/spark/compare/mathjax
    
    Author: Martin Jaggi <m.ja...@gmail.com>
    
    == Merge branch commits ==
    
    commit 0364bfabbfc347f917216057a20c39b631842481
    Author: Martin Jaggi <m.ja...@gmail.com>
    Date:   Fri Feb 7 03:19:38 2014 +0100
    
        minor polishing, as suggested by @pwendell
    
    commit dcd2142c164b2f602bf472bb152ad55bae82d31a
    Author: Martin Jaggi <m.ja...@gmail.com>
    Date:   Thu Feb 6 18:04:26 2014 +0100
    
        enabling inline latex formulas with $.$
    
        same mathjax configuration as used in math.stackexchange.com
    
        sample usage in the linear algebra (SVD) documentation
    
    commit bbafafd2b497a5acaa03a140bb9de1fbb7d67ffa
    Author: Martin Jaggi <m.ja...@gmail.com>
    Date:   Thu Feb 6 17:31:29 2014 +0100
    
        split MLlib documentation by techniques
    
        and linked from the main mllib-guide.md site
    
    commit d1c5212b93c67436543c2d8ddbbf610fdf0a26eb
    Author: Martin Jaggi <m.ja...@gmail.com>
    Date:   Thu Feb 6 16:59:43 2014 +0100
    
        enable mathjax formula in the .md documentation files
    
        code by @shivaram
    
    commit d73948db0d9bc36296054e79fec5b1a657b4eab4
    Author: Martin Jaggi <m.ja...@gmail.com>
    Date:   Thu Feb 6 16:57:23 2014 +0100
    
        minor update on how to compile the documentation

commit 78050805bc691a00788f6e51f23dd785ca25b227
Author: Jey Kottalam <j...@cs.berkeley.edu>
Date:   2014-02-08T20:24:08Z

    Merge pull request #454 from jey/atomic-sbt-download. Closes #454.
    
    Make sbt download an atomic operation
    
    Modifies the `sbt/sbt` script to gracefully recover when a previous 
invocation died in the middle of downloading the SBT jar.
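    The standard trick for this, which the PR's shell script applies to the 
SBT jar, is to download into a temporary file and rename it into place. A 
Python sketch of the same idea (hypothetical function name; os.replace is 
atomic on POSIX filesystems when source and destination are on the same 
filesystem):

```python
import os, tempfile, urllib.request

def atomic_download(url, dest):
    # Fetch into a temp file in the destination's directory, then rename.
    # If this process is killed mid-download, `dest` is never left as a
    # truncated file: readers see either the old state or the full file.
    d = os.path.dirname(os.path.abspath(dest)) or "."
    fd, tmp = tempfile.mkstemp(dir=d)
    try:
        with os.fdopen(fd, "wb") as out, urllib.request.urlopen(url) as resp:
            out.write(resp.read())
        os.replace(tmp, dest)  # atomic rename into place
    except BaseException:
        os.unlink(tmp)  # clean up the partial temp file
        raise
```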
    
    Author: Jey Kottalam <j...@cs.berkeley.edu>
    
    == Merge branch commits ==
    
    commit 6c600eb434a2f3e7d70b67831aeebde9b5c0f43b
    Author: Jey Kottalam <j...@cs.berkeley.edu>
    Date:   Fri Jan 17 10:43:54 2014 -0800
    
        Make sbt download an atomic operation

commit f0ce736fadbcb7642b6148ad740f4508cd7dcd4d
Author: Qiuzhuang Lian <qiuzhuang.l...@gmail.com>
Date:   2014-02-08T20:59:48Z

    Merge pull request #561 from Qiuzhuang/master. Closes #561.
    
    Kill drivers in postStop() for Worker.
    
     JIRA SPARK-1068: https://spark-project.atlassian.net/browse/SPARK-1068
    
    Author: Qiuzhuang Lian <qiuzhuang.l...@gmail.com>
    
    == Merge branch commits ==
    
    commit 9c19ce63637eee9369edd235979288d3d9fc9105
    Author: Qiuzhuang Lian <qiuzhuang.l...@gmail.com>
    Date:   Sat Feb 8 16:07:39 2014 +0800
    
        Kill drivers in postStop() for Worker.
         JIRA SPARK-1068: https://spark-project.atlassian.net/browse/SPARK-1068

commit c2341c92bb206938fd9b18e2a714e5c6de55b06d
Author: Mark Hamstra <markhams...@gmail.com>
Date:   2014-02-09T00:00:43Z

    Merge pull request #542 from markhamstra/versionBump. Closes #542.
    
    Version number to 1.0.0-SNAPSHOT
    
    Since 0.9.0-incubating is done and out the door, we shouldn't be building 
0.9.0-incubating-SNAPSHOT anymore.
    
    @pwendell
    
    Author: Mark Hamstra <markhams...@gmail.com>
    
    == Merge branch commits ==
    
    commit 1b00a8a7c1a7f251b4bb3774b84b9e64758eaa71
    Author: Mark Hamstra <markhams...@gmail.com>
    Date:   Wed Feb 5 09:30:32 2014 -0800
    
        Version number to 1.0.0-SNAPSHOT

commit f892da8716d614467fddcc3a1b2b589979414219
Author: Patrick Wendell <pwend...@gmail.com>
Date:   2014-02-09T07:13:34Z

    Merge pull request #565 from pwendell/dev-scripts. Closes #565.
    
    SPARK-1066: Add developer scripts to repository.
    
    These are some developer scripts I've been maintaining in a separate public 
repo. This patch adds them to the Spark repository so they can evolve here and 
are clearly accessible to all committers.
    
    I may do some small additional clean-up in this PR, but wanted to put them 
here in case others want to review. There are a few types of scripts here:
    
    1. A tool to merge pull requests.
    2. A script for packaging releases.
    3. A script for auditing release candidates.
    
    Author: Patrick Wendell <pwend...@gmail.com>
    
    == Merge branch commits ==
    
    commit 5d5d331d01f6fd59c2eb830f652955119b012173
    Author: Patrick Wendell <pwend...@gmail.com>
    Date:   Sat Feb 8 22:11:47 2014 -0800
    
        SPARK-1066: Add developer scripts to repository.

commit b6d40b782327188a25ded5b22790552121e5271f
Author: Patrick Wendell <pwend...@gmail.com>
Date:   2014-02-09T07:35:31Z

    Merge pull request #560 from pwendell/logging. Closes #560.
    
    [WIP] SPARK-1067: Default log4j initialization causes errors for those not 
using log4j
    
    To fix this - we add a check when initializing log4j.
    
    Author: Patrick Wendell <pwend...@gmail.com>
    
    == Merge branch commits ==
    
    commit ffdce513877f64b6eed6d36138c3e0003d392889
    Author: Patrick Wendell <pwend...@gmail.com>
    Date:   Fri Feb 7 15:22:29 2014 -0800
    
        Logging fix

commit 2ef37c93664d74de6d7f6144834883a4a4ef79b7
Author: jyotiska <jyotiska...@gmail.com>
Date:   2014-02-09T07:36:48Z

    Merge pull request #562 from jyotiska/master. Closes #562.
    
    Added example Python code for sort
    
    I added example Python code for sort. Right now, PySpark has limited examples for new people willing to use the project. This example code sorts integers stored in a file. I was able to sort 5 million, 10 million and 25 million integers with this code.
    
    Author: jyotiska <jyotiska...@gmail.com>
    
    == Merge branch commits ==
    
    commit 8ad8faf6c8e02ae1cd68565d98524edf165f54df
    Author: jyotiska <jyotiska...@gmail.com>
    Date:   Sun Feb 9 11:00:41 2014 +0530
    
        Added comments in code on collect() method
    
    commit 6f98f1e313f4472a7c2207d36c4f0fbcebc95a8c
    Author: jyotiska <jyotiska...@gmail.com>
    Date:   Sat Feb 8 13:12:37 2014 +0530
    
        Updated python example code sort.py
    
    commit 945e39a5d68daa7e5bab0d96cbd35d7c4b04eafb
    Author: jyotiska <jyotiska...@gmail.com>
    Date:   Sat Feb 8 12:59:09 2014 +0530
    
        Added example python code for sort
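The core of such a sort example is small. Since the actual sort.py from the PR is not reproduced here, the following is a plain-Python sketch of the same task (reading integers from a file and sorting them); the real PySpark example would instead build an RDD with `sc.textFile(...)` and sort it through the Spark API:

```python
import os
import tempfile

def sort_integers(path):
    """Read one integer per line from `path` and return them sorted.

    Plain-Python sketch of what a PySpark sort example does with an RDD;
    this version keeps everything in local memory.
    """
    with open(path) as f:
        return sorted(int(line) for line in f if line.strip())

# Tiny usage example with a temporary input file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("5\n3\n9\n1\n")
    name = f.name
result = sort_integers(name)
os.unlink(name)
```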

commit b6dba10ae59215b5c4e40f7632563f592f138c87
Author: CodingCat <zhunans...@gmail.com>
Date:   2014-02-09T07:39:17Z

    Merge pull request #556 from CodingCat/JettyUtil. Closes #556.
    
    [SPARK-1060] startJettyServer should explicitly use IP information
    
    https://spark-project.atlassian.net/browse/SPARK-1060
    
    In the current implementation, the webserver in Master/Worker is started with
    
    val (srv, bPort) = JettyUtils.startJettyServer("0.0.0.0", port, handlers)
    
    inside startJettyServer:
    
    val server = new Server(currentPort) // here, the Server will take "0.0.0.0" as the hostname, i.e. it will always bind to the IP address of the first NIC
    
    This can cause wrong IP binding. E.g., if the host has two NICs, N1 and N2, and the user specifies SPARK_LOCAL_IP as N2's IP address, then when starting the web server, for the reason stated above, it will always bind to N1's address.
    
    Author: CodingCat <zhunans...@gmail.com>
    
    == Merge branch commits ==
    
    commit 6c6d9a8ccc9ec4590678a3b34cb03df19092029d
    Author: CodingCat <zhunans...@gmail.com>
    Date:   Thu Feb 6 14:53:34 2014 -0500
    
        startJettyServer should explicitly use IP information
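The underlying behaviour is easy to demonstrate outside Jetty: a server socket bound to a specific host address listens only on that interface, while a wildcard bind listens on all of them. A small sketch using Python's socket module as an analogue (Jetty's port-only `Server(port)` constructor corresponds to the wildcard case):

```python
import socket

# Wildcard bind: listens on every interface, regardless of any
# SPARK_LOCAL_IP-style setting -- the behaviour the patch fixes.
wild = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
wild.bind(("0.0.0.0", 0))          # port 0 = let the OS pick a free port
wild_addr = wild.getsockname()[0]

# Explicit bind: listens only on the given address, which is what the
# SPARK-1060 fix achieves by passing the host into the Jetty Server.
loop = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
loop.bind(("127.0.0.1", 0))
loop_addr = loop.getsockname()[0]

wild.close()
loop.close()
```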

commit b69f8b2a01669851c656739b6886efe4cddef31a
Author: Patrick Wendell <pwend...@gmail.com>
Date:   2014-02-09T18:09:19Z

    Merge pull request #557 from ScrapCodes/style. Closes #557.
    
    SPARK-1058, Fix Style Errors and Add Scala Style to Spark Build.
    
    Author: Patrick Wendell <pwend...@gmail.com>
    Author: Prashant Sharma <scrapco...@gmail.com>
    
    == Merge branch commits ==
    
    commit 1a8bd1c059b842cb95cc246aaea74a79fec684f4
    Author: Prashant Sharma <scrapco...@gmail.com>
    Date:   Sun Feb 9 17:39:07 2014 +0530
    
        scala style fixes
    
    commit f91709887a8e0b608c5c2b282db19b8a44d53a43
    Author: Patrick Wendell <pwend...@gmail.com>
    Date:   Fri Jan 24 11:22:53 2014 -0800
    
        Adding scalastyle snapshot

commit 94ccf869aacbe99b7ca7a40ca585a759923cb407
Author: Patrick Wendell <pwend...@gmail.com>
Date:   2014-02-09T21:54:27Z

    Merge pull request #569 from pwendell/merge-fixes.
    
    Fixes bug where merges won't close associated pull request.
    
    Previously we added "Closes #XX" in the title. GitHub will sometimes
    line-break the title in a way that causes this to not work. This patch
    instead adds the line in the body.
    
    This also makes the commit format more concise for merge commits.
    We might consider just dropping those in the future.
    
    Author: Patrick Wendell <pwend...@gmail.com>
    
    Closes #569 and squashes the following commits:
    
    732eba1 [Patrick Wendell] Fixes bug where merges won't close associated pull request.

commit afc8f3cb9a7afe3249500a7d135b4a54bb3e58c4
Author: qqsun8819 <jin....@alibaba-inc.com>
Date:   2014-02-09T21:57:29Z

    Merge pull request #551 from qqsun8819/json-protocol.
    
    [SPARK-1038] Add more fields in JsonProtocol and add tests that verify the JSON itself
    
    This is a PR for SPARK-1038. Two major changes:
    1. Add some fields to JsonProtocol which are new and important to standalone-related data structures
    2. Use Diff in liftweb.json to verify the stringified JSON output, for detecting someone changing a type T to Option[T]
    
    Author: qqsun8819 <jin....@alibaba-inc.com>
    
    Closes #551 and squashes the following commits:
    
    fdf0b4e [qqsun8819] [SPARK-1038] 1. Change code style for more readable according to rxin review 2. change submitdate hard-coded string to a date object toString for more complexiblity
    095a26f [qqsun8819] [SPARK-1038] mod according to review of pwendel, use hard-coded json string for json data validation. Each test use its own json string
    0524e41 [qqsun8819] Merge remote-tracking branch 'upstream/master' into json-protocol
    d203d5c [qqsun8819] [SPARK-1038] Add more fields in JsonProtocol and add tests that verify the JSON itself

commit 2182aa3c55737a90e0ff200eede7146b440801a3
Author: Martin Jaggi <m.ja...@gmail.com>
Date:   2014-02-09T23:19:50Z

    Merge pull request #566 from martinjaggi/copy-MLlib-d.
    
    new MLlib documentation for optimization, regression and classification
    
    New documentation with TeX formulas, hopefully improving usability and reproducibility of the offered MLlib methods. Also did some minor changes in the code for consistency. Scala tests pass.
    
    This is the rebased branch; I deleted the old PR.
    
    jira:
    https://spark-project.atlassian.net/browse/MLLIB-19
    
    Author: Martin Jaggi <m.ja...@gmail.com>
    
    Closes #566 and squashes the following commits:
    
    5f0f31e [Martin Jaggi] line wrap at 100 chars
    4e094fb [Martin Jaggi] better description of GradientDescent
    1d6965d [Martin Jaggi] remove broken url
    ea569c3 [Martin Jaggi] telling what updater actually does
    964732b [Martin Jaggi] lambda R() in documentation
    a6c6228 [Martin Jaggi] better comments in SGD code for regression
    b32224a [Martin Jaggi] new optimization documentation
    d5dfef7 [Martin Jaggi] new classification and regression documentation
    b07ead6 [Martin Jaggi] correct scaling for MSE loss
    ba6158c [Martin Jaggi] use d for the number of features
    bab2ed2 [Martin Jaggi] renaming LeastSquaresGradient

commit 919bd7f669c61500eee7231298d9880b320eb6f3
Author: Prashant Sharma <prashan...@imaginea.com>
Date:   2014-02-10T06:17:52Z

    Merge pull request #567 from ScrapCodes/style2.
    
    SPARK-1058, Fix Style Errors and Add Scala Style to Spark Build. Pt 2
    
    Continuation of PR #557
    
    With this, all Scala style errors are fixed across the code base!
    
    The reason for creating a separate PR was to not interrupt an already reviewed and ready-to-merge PR. Hope this gets reviewed soon and merged too.
    
    Author: Prashant Sharma <prashan...@imaginea.com>
    
    Closes #567 and squashes the following commits:
    
    3b1ec30 [Prashant Sharma] scala style fixes

commit d6a9bdc097458ee961072e67627ade8a0a9e3c58
Author: Patrick Wendell <pwend...@gmail.com>
Date:   2014-02-10T07:35:06Z

    Revert "Merge pull request #560 from pwendell/logging. Closes #560."
    
    This reverts commit b6d40b782327188a25ded5b22790552121e5271f.

commit 4afe6ccf40223699c13665b1ed5e98d1604d3247
Author: Chen Chao <crazy...@gmail.com>
Date:   2014-02-11T06:28:39Z

    Merge pull request #579 from CrazyJvm/patch-1.
    
    "in the source DStream" rather than "int the source DStream"
    
    "flatMap is a one-to-many DStream operation that creates a new DStream by generating multiple new records from each record int the source DStream."
    
    Author: Chen Chao <crazy...@gmail.com>
    
    Closes #579 and squashes the following commits:
    
    4abcae3 [Chen Chao] in the source DStream

commit ba38d9892ec922ff11f204cd4c1b8ddc90f1bd55
Author: Henry Saputra <he...@platfora.com>
Date:   2014-02-11T22:46:22Z

    Merge pull request #577 from hsaputra/fix_simple_streaming_doc.
    
    SPARK-1075 Fix doc in the Spark Streaming custom receiver closing bracket in the class constructor
    
    The closing parenthesis in the constructor in the first code block example is reversed:
    
    diff --git a/docs/streaming-custom-receivers.md b/docs/streaming-custom-receivers.md
    index 4e27d65..3fb540c 100644
    --- a/docs/streaming-custom-receivers.md
    +++ b/docs/streaming-custom-receivers.md
    @@ -14,7 +14,7 @@ This starts with implementing NetworkReceiver(api/streaming/index.html#org.apa
     The following is a simple socket text-stream receiver.
     {% highlight scala %}
    -class SocketTextStreamReceiver(host: String, port: Int(
    +class SocketTextStreamReceiver(host: String, port: Int)
     extends NetworkReceiverString
     {
     protected lazy val blocksGenerator: BlockGenerator =
    
    Author: Henry Saputra <he...@platfora.com>
    
    Closes #577 and squashes the following commits:
    
    6508341 [Henry Saputra] SPARK-1075 Fix doc in the Spark Streaming custom receiver.

commit b0dab1bb9f4cfacae68b106a44d9b14f6bea3d29
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2014-02-11T22:48:59Z

    Merge pull request #571 from holdenk/switchtobinarysearch.
    
    SPARK-1072 Use binary search when needed in RangePartioner
    
    Author: Holden Karau <hol...@pigscanfly.ca>
    
    Closes #571 and squashes the following commits:
    
    f31a2e1 [Holden Karau] Swith to using CollectionsUtils in Partitioner
    4c7a0c3 [Holden Karau] Add CollectionsUtil as suggested by aarondav
    7099962 [Holden Karau] Add the binary search to only init once
    1bef01d [Holden Karau] CR feedback
    a21e097 [Holden Karau] Use binary search if we have more than 1000 elements inside of RangePartitioner
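The switch described in the last commit (linear scan for small bound arrays, binary search past a threshold) can be sketched as follows. This is an illustrative reimplementation in Python, not Spark's actual RangePartitioner code; the function name and default threshold are assumptions:

```python
import bisect

def partition_for(key, bounds, linear_threshold=1000):
    """Return the index of the first upper bound >= key, i.e. key's partition.

    Mirrors the SPARK-1072 idea: a linear scan is cheap for small bound
    arrays, but binary search wins once there are many range bounds.
    `bounds` must be sorted.
    """
    if len(bounds) <= linear_threshold:
        p = 0
        while p < len(bounds) and key > bounds[p]:
            p += 1
        return p
    # bisect_left returns the first index where bounds[i] >= key,
    # which is exactly where the linear scan above stops.
    return bisect.bisect_left(bounds, key)
```

Both branches compute the same answer, so the threshold only trades constant-factor cost against O(log n) lookups.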

commit 1352981979acdebfeae66b940319fff35e71ee4f
Author: Bijay Bisht <bijay.bi...@gmail.com>
Date:   2014-02-05T17:34:55Z

    Ported hadoopClient jar for < 1.0.1 fix

----
