GitHub user bijaybisht reopened a pull request:
https://github.com/apache/incubator-spark/pull/522
Hadoop jar name
This pull request is a copy of
#121 - Fix for hadoop client jar name, which got changed from 1.*. The
other one was from master, which is the wrong way to generate pull requests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/incubator-spark hadoop_jar_name
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-spark/pull/522.patch
----
commit 0ff38c22205f14770ecca1e66378e7c207ca2d1d
Author: Erik Selin <[email protected]>
Date: 2014-01-29T20:44:54Z
Merge pull request #494 from tyro89/worker_registration_issue
Issue with failed worker registrations
I've been going through the spark source after having some odd issues with
workers dying and not coming back. After some digging (I'm very new to scala
and spark) I believe I've found a worker registration issue. It looks to me
like a failed registration follows the same code path as a successful
registration, which ends up with workers believing they are connected (since they
received a `RegisteredWorker` event) even though they are not registered on the
Master.
This is a quick fix that I hope addresses this issue (assuming I didn't
completely misread the code and I'm about to look like a silly person :P)
I'm opening this pr now to start a chat with you guys while I do some more
testing on my side :)
Author: Erik Selin <[email protected]>
== Merge branch commits ==
commit 973012f8a2dcf1ac1e68a69a2086a1b9a50f401b
Author: Erik Selin <[email protected]>
Date: Tue Jan 28 23:36:12 2014 -0500
break logwarning into two lines to respect line character limit.
commit e3754dc5b94730f37e9806974340e6dd93400f85
Author: Erik Selin <[email protected]>
Date: Tue Jan 28 21:16:21 2014 -0500
add log warning when worker registration fails due to attempt to
re-register on same address.
commit 14baca241fa7823e1213cfc12a3ff2a9b865b1ed
Author: Erik Selin <[email protected]>
Date: Wed Jan 22 21:23:26 2014 -0500
address code style comment
commit 71c0d7e6f59cd378d4e24994c21140ab893954ee
Author: Erik Selin <[email protected]>
Date: Wed Jan 22 16:01:42 2014 -0500
Make a failed registration not persist, not send a `RegisteredWorker`
event and not run `schedule` but rather send a `RegisterWorkerFailed` message
to the worker attempting to register.
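The intended behaviour can be sketched roughly as follows. This is a toy Python model, not Spark's Scala code: the `Master` class, its fields, and the failure text are hypothetical, and only the `RegisteredWorker` and `RegisterWorkerFailed` message names come from the commit message above.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    """Toy master that tracks registered workers by address (hypothetical model)."""
    workers_by_address: dict = field(default_factory=dict)
    schedule_calls: int = 0

    def register_worker(self, worker_id, address):
        # Failure path: the address is already taken, so nothing is persisted,
        # schedule is not run, and the worker gets an explicit failure message.
        if address in self.workers_by_address:
            reason = "address already registered: " + address  # illustrative text
            return ("RegisterWorkerFailed", reason)
        # Success path: persist the worker, acknowledge it, then schedule.
        self.workers_by_address[address] = worker_id
        self.schedule_calls += 1
        return ("RegisteredWorker", worker_id)
```

The point of the sketch is only that the two outcomes no longer share a code path, so a worker can tell from the reply whether it is really registered.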
commit ac712e48af3068672e629cec7766caae3cd77c37
Author: Reynold Xin <[email protected]>
Date: 2014-01-30T17:33:18Z
Merge pull request #524 from rxin/doc
Added spark.shuffle.file.buffer.kb to configuration doc.
Author: Reynold Xin <[email protected]>
== Merge branch commits ==
commit 0eea1d761ff772ff89be234e1e28035d54e5a7de
Author: Reynold Xin <[email protected]>
Date: Wed Jan 29 14:40:48 2014 -0800
Added spark.shuffle.file.buffer.kb to configuration doc.
commit a8cf3ec157fc9a512421b319cfffc5e4f07cf1f3
Author: Ankur Dave <[email protected]>
Date: 2014-02-01T00:52:02Z
Merge pull request #527 from ankurdave/graphx-assembly-pom
Add GraphX to assembly/pom.xml
Author: Ankur Dave <[email protected]>
== Merge branch commits ==
commit bb0b33ef9eb1b3d4a4fc283d9abb2ece4abcac23
Author: Ankur Dave <[email protected]>
Date: Fri Jan 31 15:24:52 2014 -0800
Add GraphX to assembly/pom.xml
commit 0386f42e383dc01b8df33c4a70b024e7902b5fdd
Author: Henry Saputra <[email protected]>
Date: 2014-02-03T05:51:17Z
Merge pull request #529 from hsaputra/cleanup_right_arrowop_scala
Change the ⇒ character (maybe from scalariform) to => in Scala code for
style consistency
Looks like there are some ⇒ Unicode characters (maybe from scalariform) in
Scala code.
This PR is to change it to => to get some consistency on the Scala code.
If we want to use ⇒ as default we could use sbt plugin scalariform to
make sure all Scala code has ⇒ instead of =>
And remove unused imports found in TwitterInputDStream.scala while I was
there =)
Author: Henry Saputra <[email protected]>
== Merge branch commits ==
commit 29c1771d346dff901b0b778f764e6b4409900234
Author: Henry Saputra <[email protected]>
Date: Sat Feb 1 22:05:16 2014 -0800
Change the ⇒ character (maybe from scalariform) to => in Scala code
for style consistency.
commit 1625d8c44693420de026138f3abecce2d12f895c
Author: Aaron Davidson <[email protected]>
Date: 2014-02-03T19:25:39Z
Merge pull request #530 from aarondav/cleanup. Closes #530.
Remove explicit conversion to PairRDDFunctions in cogroup()
As SparkContext._ is already imported, using the implicit conversion
appears to make the code much cleaner. Perhaps there was some sinister reason
for doing the conversion explicitly, however.
Author: Aaron Davidson <[email protected]>
== Merge branch commits ==
commit aa4a63f1bfd5b5178fe67364dd7ce4d84c357996
Author: Aaron Davidson <[email protected]>
Date: Sun Feb 2 23:48:04 2014 -0800
Remove explicit conversion to PairRDDFunctions in cogroup()
As SparkContext._ is already imported, using the implicit conversion
appears to make the code much cleaner. Perhaps there was some sinister
reason for doing the conversion explicitly, however.
commit 23af00f9e0e5108f62cdb9629e3eb4e54bbaa321
Author: Xiangrui Meng <[email protected]>
Date: 2014-02-03T21:02:09Z
Merge pull request #528 from mengxr/sample. Closes #528.
Refactor RDD sampling and add randomSplit to RDD (update)
Replace SampledRDD by PartitionwiseSampledRDD, which accepts a
RandomSampler instance as input. The current sample with/without replacement
can be easily integrated via BernoulliSampler and PoissonSampler. The benefits
are:
1) RDD.randomSplit is implemented in the same way, related to
https://github.com/apache/incubator-spark/pull/513
2) Stratified sampling and importance sampling can be implemented in the
same manner as well.
Unit tests are included for samplers and RDD.randomSplit.
This should perform better than my previous request where the
BernoulliSampler creates many Iterator instances:
https://github.com/apache/incubator-spark/pull/513
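One way to see why a range-based Bernoulli sampler makes randomSplit easy is the sketch below. It is plain Python over a list, not the Spark implementation: the function names, the `[lb, ub)` acceptance range, and the seed handling are assumptions made for illustration.

```python
import random
from itertools import accumulate


def bernoulli_sample(items, lb, ub, seed, complement=False):
    # Draw one uniform number per item, in order, and keep the item when the
    # draw falls inside [lb, ub) (or outside it when complement is set).
    rng = random.Random(seed)
    kept = []
    for item in items:
        inside = lb <= rng.random() < ub
        if inside != complement:
            kept.append(item)
    return kept


def random_split(items, weights, seed):
    # Normalize the weights into cumulative boundaries on [0, 1].
    total = sum(weights)
    bounds = [0.0] + [c / total for c in accumulate(weights)]
    bounds[-1] = 1.0  # guard against floating-point rounding
    # Every split replays the same draws (same seed), so each item lands in
    # exactly one of the adjacent ranges.
    return [bernoulli_sample(items, lo, hi, seed)
            for lo, hi in zip(bounds, bounds[1:])]
```

Because all splits share the seed, the outputs are disjoint and together cover the input, which is the property a split needs.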
Author: Xiangrui Meng <[email protected]>
== Merge branch commits ==
commit e8ce957e5f0a600f2dec057924f4a2ca6adba373
Author: Xiangrui Meng <[email protected]>
Date: Mon Feb 3 12:21:08 2014 -0800
more docs to PartitionwiseSampledRDD
commit fbb4586d0478ff638b24bce95f75ff06f713d43b
Author: Xiangrui Meng <[email protected]>
Date: Mon Feb 3 00:44:23 2014 -0800
move XORShiftRandom to util.random and use it in BernoulliSampler
commit 987456b0ee8612fd4f73cb8c40967112dc3c4c2d
Author: Xiangrui Meng <[email protected]>
Date: Sat Feb 1 11:06:59 2014 -0800
relax assertions in SortingSuite because the RangePartitioner has large
variance in this case
commit 3690aae416b2dc9b2f9ba32efa465ba7948477f4
Author: Xiangrui Meng <[email protected]>
Date: Sat Feb 1 09:56:28 2014 -0800
test split ratio of RDD.randomSplit
commit 8a410bc933a60c4d63852606f8bbc812e416d6ae
Author: Xiangrui Meng <[email protected]>
Date: Sat Feb 1 09:25:22 2014 -0800
add a test to ensure seed distribution and minor style update
commit ce7e866f674c30ab48a9ceb09da846d5362ab4b6
Author: Xiangrui Meng <[email protected]>
Date: Fri Jan 31 18:06:22 2014 -0800
minor style change
commit 750912b4d77596ed807d361347bd2b7e3b9b7a74
Author: Xiangrui Meng <[email protected]>
Date: Fri Jan 31 18:04:54 2014 -0800
fix some long lines
commit c446a25c38d81db02821f7f194b0ce5ab4ed7ff5
Author: Xiangrui Meng <[email protected]>
Date: Fri Jan 31 17:59:59 2014 -0800
add complement to BernoulliSampler and minor style changes
commit dbe2bc2bd888a7bdccb127ee6595840274499403
Author: Xiangrui Meng <[email protected]>
Date: Fri Jan 31 17:45:08 2014 -0800
switch to partition-wise sampling for better performance
commit a1fca5232308feb369339eac67864c787455bb23
Merge: ac712e4 cf6128f
Author: Xiangrui Meng <[email protected]>
Date: Fri Jan 31 16:33:09 2014 -0800
Merge branch 'sample' of github.com:mengxr/incubator-spark into sample
commit cf6128fb672e8c589615adbd3eaa3cbdb72bd461
Author: Xiangrui Meng <[email protected]>
Date: Sun Jan 26 14:40:07 2014 -0800
set SampledRDD deprecated in 1.0
commit f430f847c3df91a3894687c513f23f823f77c255
Author: Xiangrui Meng <[email protected]>
Date: Sun Jan 26 14:38:59 2014 -0800
update code style
commit a8b5e2021a9204e318c80a44d00c5c495f1befb6
Author: Xiangrui Meng <[email protected]>
Date: Sun Jan 26 12:56:27 2014 -0800
move package random to util.random
commit ab0fa2c4965033737a9e3a9bf0a59cbb0df6a6f5
Author: Xiangrui Meng <[email protected]>
Date: Sun Jan 26 12:50:35 2014 -0800
add Apache headers and update code style
commit 985609fe1a55655ad11966e05a93c18c138a403d
Author: Xiangrui Meng <[email protected]>
Date: Sun Jan 26 11:49:25 2014 -0800
add new lines
commit b21bddf29850a2c006a868869b8f91960a029322
Author: Xiangrui Meng <[email protected]>
Date: Sun Jan 26 11:46:35 2014 -0800
move samplers to random.IndependentRandomSampler and add tests
commit c02dacb4a941618e434cefc129c002915db08be6
Author: Xiangrui Meng <[email protected]>
Date: Sat Jan 25 15:20:24 2014 -0800
add RandomSampler
commit 8ff7ba3c5cf1fc338c29ae8b5fa06c222640e89c
Author: Xiangrui Meng <[email protected]>
Date: Fri Jan 24 13:23:22 2014 -0800
init impl of IndependentlySampledRDD
commit 0c05cd374dac309b5444980f10f8dcb820c752c2
Author: Stevo Slavić <[email protected]>
Date: 2014-02-04T17:45:46Z
Merge pull request #535 from sslavic/patch-2. Closes #535.
Fixed typo in scaladoc
Author: Stevo Slavić <[email protected]>
== Merge branch commits ==
commit 0a77f789e281930f4168543cc0d3b3ffbf5b3764
Author: Stevo Slavić <[email protected]>
Date: Tue Feb 4 15:30:27 2014 +0100
Fixed typo in scaladoc
commit 92092879c3b8001a456fefc2efc0df16585515a8
Author: Stevo Slavić <[email protected]>
Date: 2014-02-04T17:47:11Z
Merge pull request #534 from sslavic/patch-1. Closes #534.
Fixed wrong path to compute-classpath.cmd
compute-classpath.cmd is in bin, not in sbin directory
Author: Stevo Slavić <[email protected]>
== Merge branch commits ==
commit 23deca32b69e9429b33ad31d35b7e1bfc9459f59
Author: Stevo Slavić <[email protected]>
Date: Tue Feb 4 15:01:47 2014 +0100
Fixed wrong path to compute-classpath.cmd
compute-classpath.cmd is in bin, not in sbin directory
commit f7fd80d9a71069cba94294e6b77c0eaeb90e73d7
Author: Stevo Slavić <[email protected]>
Date: 2014-02-05T18:29:45Z
Merge pull request #540 from sslavic/patch-3. Closes #540.
Fix line end character stripping for Windows
LogQuery Spark example would produce unwanted result when run on Windows
platform because of different, platform specific trailing line end characters
(not only \n but \r too).
This fix makes use of Scala's standard library string functions to properly
strip all trailing line end characters, letting Scala handle the platform
specific stuff.
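The idea of stripping every trailing line-end character, rather than only \n, can be illustrated in Python. This is an analogy to the Scala change, not the patched code; `strip_line_end` is a hypothetical helper name.

```python
def strip_line_end(line):
    # Remove all trailing carriage-return and newline characters, so Unix
    # and Windows line endings give the same result.
    return line.rstrip("\r\n")
```

With a helper like this, a log line read on Windows no longer carries a stray carriage return into later parsing.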
Author: Stevo Slavić <[email protected]>
== Merge branch commits ==
commit 1e43ba0ea773cc005cf0aef78b6c1755f8e88b27
Author: Stevo Slavić <[email protected]>
Date: Wed Feb 5 14:48:29 2014 +0100
Fix line end character stripping for Windows
LogQuery Spark example would produce unwanted result when run on
Windows platform because of different, platform specific trailing line end
characters (not only \n but \r too).
This fix makes use of Scala's standard library string functions to
properly strip all trailing line end characters, letting Scala handle the
platform specific stuff.
commit cc14ba974c8e98c08548a2ccf64c2765f313f649
Author: Kay Ousterhout <[email protected]>
Date: 2014-02-05T20:44:24Z
Merge pull request #544 from kayousterhout/fix_test_warnings. Closes #544.
Fixed warnings in test compilation.
This commit fixes two problems: a redundant import, and a
deprecated function.
Author: Kay Ousterhout <[email protected]>
== Merge branch commits ==
commit da9d2e13ee4102bc58888df0559c65cb26232a82
Author: Kay Ousterhout <[email protected]>
Date: Wed Feb 5 11:41:51 2014 -0800
Fixed warnings in test compilation.
This commit fixes two problems: a redundant import, and a
deprecated function.
commit 18c4ee71e27189f5f3f4eb6bfc6ad8860aa254c6
Author: CodingCat <[email protected]>
Date: 2014-02-06T06:08:47Z
Merge pull request #549 from CodingCat/deadcode_master. Closes #549.
remove actorToWorker in master.scala, which is actually not used
actorToWorker is actually not used in the code....just remove it
Author: CodingCat <[email protected]>
== Merge branch commits ==
commit 52656c2d4bbf9abcd8bef65d454badb9cb14a32c
Author: CodingCat <[email protected]>
Date: Thu Feb 6 00:28:26 2014 -0500
remove actorToWorker in master.scala, which is actually not used
commit 38020961d101e792393855fd00d8e42f40713754
Author: Thomas Graves <[email protected]>
Date: 2014-02-06T07:37:07Z
Merge pull request #526 from tgravescs/yarn_client_stop_am_fix. Closes #526.
spark on yarn - yarn-client mode doesn't always exit immediately
https://spark-project.atlassian.net/browse/SPARK-1049
If you run in the yarn-client mode but you don't get all the workers you
requested right away and then you exit your application, the application master
stays around until it gets the number of workers you initially requested. This
is a waste of resources. The AM should exit immediately upon the client going
away.
This fix simply checks to see if the driver closed while it's waiting for
the initial # of workers.
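The shape of such a check might look like the following sketch. It is a simplified Python model, not the YARN application master code: the function name, the two callbacks, and the polling interval are all hypothetical.

```python
import time


def wait_for_initial_workers(requested, count_allocated, driver_closed,
                             poll_seconds=0.01):
    # Return True once enough workers are allocated, or False as soon as the
    # driver is seen to have gone away while we are still waiting.
    while count_allocated() < requested:
        if driver_closed():
            return False
        time.sleep(poll_seconds)
    return True
```

A caller would treat a False result as the signal to shut down immediately instead of continuing to hold resources.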
Author: Thomas Graves <[email protected]>
== Merge branch commits ==
commit 03f40a62584b6bdd094ba91670cd4aa6afe7cd81
Author: Thomas Graves <[email protected]>
Date: Fri Jan 31 11:23:10 2014 -0600
spark on yarn - yarn-client mode doesn't always exit immediately
commit 79c95527a77af32bd83a968c1a56feb22e441b7d
Author: Kay Ousterhout <[email protected]>
Date: 2014-02-06T07:38:12Z
Merge pull request #545 from kayousterhout/fix_progress. Closes #545.
Fix off-by-one error with task progress info log.
Author: Kay Ousterhout <[email protected]>
== Merge branch commits ==
commit 29798fc685c4e7e3eb3bf91c75df7fa8ec94a235
Author: Kay Ousterhout <[email protected]>
Date: Wed Feb 5 13:40:01 2014 -0800
Fix off-by-one error with task progress info log.
commit 084839ba357e03bb56517620123682b50a91cb0b
Author: Prashant Sharma <[email protected]>
Date: 2014-02-06T22:58:35Z
Merge pull request #498 from ScrapCodes/python-api. Closes #498.
Python api additions
Author: Prashant Sharma <[email protected]>
== Merge branch commits ==
commit 8b51591f1a7a79a62c13ee66ff8d83040f7eccd8
Author: Prashant Sharma <[email protected]>
Date: Fri Jan 24 11:50:29 2014 +0530
Josh's and Patrick's review comments.
commit d37f9677838e43bef6c18ef61fbf08055ba6d1ca
Author: Prashant Sharma <[email protected]>
Date: Thu Jan 23 17:27:17 2014 +0530
fixed doc tests
commit 27cb54bf5c99b1ea38a73858c291d0a1c43d8b7c
Author: Prashant Sharma <[email protected]>
Date: Thu Jan 23 16:48:43 2014 +0530
Added keys and values methods for PairFunctions in python
commit 4ce76b396fbaefef2386d7a36d611572bdef9b5d
Author: Prashant Sharma <[email protected]>
Date: Thu Jan 23 13:51:26 2014 +0530
Added foreachPartition
commit 05f05341a187cba829ac0e6c2bdf30be49948c89
Author: Prashant Sharma <[email protected]>
Date: Thu Jan 23 13:02:59 2014 +0530
Added coalesce function to python API
commit 6568d2c2fa14845dc56322c0f39ba2e13b3b26dd
Author: Prashant Sharma <[email protected]>
Date: Thu Jan 23 12:52:44 2014 +0530
added repartition function to python API.
commit 446403b63763157831ddbf6209044efc3cc7bf7c
Author: Sandy Ryza <[email protected]>
Date: 2014-02-06T23:41:16Z
Merge pull request #554 from sryza/sandy-spark-1056. Closes #554.
SPARK-1056. Fix header comment in Executor to not imply that it's only u...
...sed for Mesos and Standalone.
Author: Sandy Ryza <[email protected]>
== Merge branch commits ==
commit 1f2443d902a26365a5c23e4af9077e1539ed2eab
Author: Sandy Ryza <[email protected]>
Date: Thu Feb 6 15:03:50 2014 -0800
SPARK-1056. Fix header comment in Executor to not imply that it's only
used for Mesos and Standalone
commit 18ad59e2c6b7bd009e8ba5ebf8fcf99630863029
Author: Kay Ousterhout <[email protected]>
Date: 2014-02-07T00:10:48Z
Merge pull request #321 from kayousterhout/ui_kill_fix. Closes #321.
Inform DAG scheduler about all started/finished tasks.
Previously, the DAG scheduler was not always informed
when tasks started and finished. The simplest example here
is for speculated tasks: the DAGScheduler was only told about
the first attempt of a task, meaning that SparkListeners were
also not told about multiple task attempts, so users can't see
what's going on with speculation in the UI. The DAGScheduler
also wasn't always told about finished tasks, so in the UI, some
tasks will never be shown as finished (this occurs, for example,
if a task set gets killed).
The other problem is that the fairness accounting was wrong
-- the number of running tasks in a pool was decreased when a
task set was considered done, even if all of its tasks hadn't
yet finished.
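The fairness-accounting point can be illustrated with a toy model in which the pool counter only moves when an individual task starts or finishes. The `Pool` and `TaskSet` classes below are hypothetical Python stand-ins, not Spark's scheduler classes.

```python
class Pool:
    def __init__(self):
        self.running_tasks = 0


class TaskSet:
    def __init__(self, pool):
        self.pool = pool
        self.running = set()
        self.done = False

    def task_started(self, task_id):
        self.running.add(task_id)
        self.pool.running_tasks += 1

    def mark_done(self):
        # Marking the set as done (for example, after a kill) must not touch
        # the pool counter, because its tasks may still be running.
        self.done = True

    def task_finished(self, task_id):
        # Only an actual task completion releases its slot in the pool.
        if task_id in self.running:
            self.running.remove(task_id)
            self.pool.running_tasks -= 1
```

In this model a killed task set keeps counting against its pool until each of its tasks has really finished.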
Author: Kay Ousterhout <[email protected]>
== Merge branch commits ==
commit c8d547d0f7a17f5a193bef05f5872b9f475675c5
Author: Kay Ousterhout <[email protected]>
Date: Wed Jan 15 16:47:33 2014 -0800
Addressed Reynold's review comments.
Always use a TaskEndReason (remove the option), and explicitly
signal when we don't know the reason. Also, always tell
DAGScheduler (and associated listeners) about started tasks, even
when they're speculated.
commit 3fee1e2e3c06b975ff7f95d595448f38cce97a04
Author: Kay Ousterhout <[email protected]>
Date: Wed Jan 8 22:58:13 2014 -0800
Fixed broken test and improved logging
commit ff12fcaa2567c5d02b75a1d5db35687225bcd46f
Author: Kay Ousterhout <[email protected]>
Date: Sun Dec 29 21:08:20 2013 -0800
Inform DAG scheduler about all finished tasks.
Previously, the DAG scheduler was not always informed
when tasks finished. For example, when a task set was
aborted, the DAG scheduler was never told when the tasks
in that task set finished. The DAG scheduler was also
never told about the completion of speculated tasks.
This led to confusion with SparkListeners because information
about the completion of those tasks was never passed on to
the listeners (so in the UI, for example, some tasks will never
be shown as finished).
The other problem is that the fairness accounting was wrong
-- the number of running tasks in a pool was decreased when a
task set was considered done, even if all of its tasks hadn't
yet finished.
commit 0b448df6ac520a7977b1eb51e8c55e33f3fd2da8
Author: Kay Ousterhout <[email protected]>
Date: 2014-02-07T00:15:24Z
Merge pull request #450 from kayousterhout/fetch_failures. Closes #450.
Only run ResubmitFailedStages event after a fetch fails
Previously, the ResubmitFailedStages event was called every
200 milliseconds, leading to a lot of unnecessary event processing
and clogged DAGScheduler logs.
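A minimal event-driven version of this idea is sketched below, assuming a simple in-memory queue. The `Scheduler` class and its methods are hypothetical, and any delay between the fetch failure and the resubmission is left out.

```python
from collections import deque


class Scheduler:
    def __init__(self):
        self.events = deque()
        self.failed_stages = set()
        self.resubmitted = []

    def on_fetch_failed(self, stage_id):
        # Queue a resubmit event only when the failed set goes from empty to
        # non-empty, instead of firing one on a fixed timer.
        first_failure = not self.failed_stages
        self.failed_stages.add(stage_id)
        if first_failure:
            self.events.append("ResubmitFailedStages")

    def process_events(self):
        while self.events:
            event = self.events.popleft()
            # Keep the check for an empty set so a stale event is harmless.
            if event == "ResubmitFailedStages" and self.failed_stages:
                self.resubmitted.extend(sorted(self.failed_stages))
                self.failed_stages.clear()
```

With no fetch failures the queue stays empty, so nothing is processed or logged.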
Author: Kay Ousterhout <[email protected]>
== Merge branch commits ==
commit e603784b3a562980e6f1863845097effe2129d3b
Author: Kay Ousterhout <[email protected]>
Date: Wed Feb 5 11:34:41 2014 -0800
Re-add check for empty set of failed stages
commit d258f0ef50caff4bbb19fb95a6b82186db1935bf
Author: Kay Ousterhout <[email protected]>
Date: Wed Jan 15 23:35:41 2014 -0800
Only run ResubmitFailedStages event after a fetch fails
Previously, the ResubmitFailedStages event was called every
200 milliseconds, leading to a lot of unnecessary event processing
and clogged DAGScheduler logs.
commit 1896c6e7c9f5c29284a045128b4aca0d5a6e7220
Author: Andrew Or <[email protected]>
Date: 2014-02-07T06:05:53Z
Merge pull request #533 from andrewor14/master. Closes #533.
External spilling - generalize batching logic
The existing implementation consists of a hack for Kryo specifically and
only works for LZF compression. Introducing an intermediate batch-level stream
takes care of pre-fetching and other arbitrary behavior of higher level streams
in a more general way.
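The role of a batch-level stream can be sketched in Python as a reader that exposes only one batch's bytes, so that buffering or pre-fetching layers stacked on top cannot read into the next batch. Everything here is an illustrative assumption: `BoundedReader`, the helper functions, and the use of `zlib` and `pickle` as stand-ins for the compression and serialization streams.

```python
import io
import pickle
import zlib


class BoundedReader(io.RawIOBase):
    """Expose at most `limit` bytes of an underlying binary stream."""

    def __init__(self, raw, limit):
        super().__init__()
        self.raw = raw
        self.remaining = limit

    def readable(self):
        return True

    def readinto(self, b):
        n = min(len(b), self.remaining)
        if n == 0:
            return 0  # end of this batch, even if the file continues
        data = self.raw.read(n)
        b[:len(data)] = data
        self.remaining -= len(data)
        return len(data)


def write_batches(batches):
    # Write each batch as an independent compressed blob and record its size.
    out, sizes = io.BytesIO(), []
    for batch in batches:
        payload = zlib.compress(pickle.dumps(batch))
        out.write(payload)
        sizes.append(len(payload))
    return out.getvalue(), sizes


def read_batches(data, sizes):
    source = io.BytesIO(data)
    for size in sizes:
        # BufferedReader may pre-fetch, but never past the bounded window.
        window = io.BufferedReader(BoundedReader(source, size))
        yield pickle.loads(zlib.decompress(window.read()))
```

Because the bound lives below the higher-level streams, the same reader works whichever compressor or serializer is layered above it.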
Author: Andrew Or <[email protected]>
== Merge branch commits ==
commit 3ddeb7ef89a0af2b685fb5d071aa0f71c975cc82
Author: Andrew Or <[email protected]>
Date: Wed Feb 5 12:09:32 2014 -0800
Also privatize fields
commit 090544a87a0767effd0c835a53952f72fc8d24f0
Author: Andrew Or <[email protected]>
Date: Wed Feb 5 10:58:23 2014 -0800
Privatize methods
commit 13920c918efe22e66a1760b14beceb17a61fd8cc
Author: Andrew Or <[email protected]>
Date: Tue Feb 4 16:34:15 2014 -0800
Update docs
commit bd5a1d7350467ed3dc19c2de9b2c9f531f0e6aa3
Author: Andrew Or <[email protected]>
Date: Tue Feb 4 13:44:24 2014 -0800
Typo: phyiscal -> physical
commit 287ef44e593ad72f7434b759be3170d9ee2723d2
Author: Andrew Or <[email protected]>
Date: Tue Feb 4 13:38:32 2014 -0800
Avoid reading the entire batch into memory; also simplify streaming
logic
Additionally, address formatting comments.
commit 3df700509955f7074821e9aab1e74cb53c58b5a5
Merge: a531d2e 164489d
Author: Andrew Or <[email protected]>
Date: Mon Feb 3 18:27:49 2014 -0800
Merge branch 'master' of github.com:andrewor14/incubator-spark
commit a531d2e347acdcecf2d0ab72cd4f965ab5e145d8
Author: Andrew Or <[email protected]>
Date: Mon Feb 3 18:18:04 2014 -0800
Relax assumptions on compressors and serializers when batching
This commit introduces an intermediate layer of an input stream on the
batch level.
This guards against interference from higher level streams (i.e.
compression and deserialization streams), especially pre-fetching, without
specifically targeting particular libraries (Kryo) and forcing shuffle spill
compression to use LZF.
commit 164489d6f176bdecfa9dabec2dfce5504d1ee8af
Author: Andrew Or <[email protected]>
Date: Mon Feb 3 18:18:04 2014 -0800
Relax assumptions on compressors and serializers when batching
This commit introduces an intermediate layer of an input stream on the
batch level.
This guards against interference from higher level streams (i.e.
compression and deserialization streams), especially pre-fetching, without
specifically targeting particular libraries (Kryo) and forcing shuffle spill
compression to use LZF.
commit 3a9d82cc9e85accb5c1577cf4718aa44c8d5038c
Author: Andrew Ash <[email protected]>
Date: 2014-02-07T06:38:36Z
Merge pull request #506 from ash211/intersection. Closes #506.
SPARK-1062 Add rdd.intersection(otherRdd) method
Author: Andrew Ash <[email protected]>
== Merge branch commits ==
commit 5d9982b171b9572649e9828f37ef0b43f0242912
Author: Andrew Ash <[email protected]>
Date: Thu Feb 6 18:11:45 2014 -0800
Minor fixes
- style: (v,null) => (v, null)
- mention the shuffle in Javadoc
commit b86d02f14e810902719cef893cf6bfa18ff9acb0
Author: Andrew Ash <[email protected]>
Date: Sun Feb 2 13:17:40 2014 -0800
Overload .intersection() for numPartitions and custom Partitioner
commit bcaa34911fcc6bb5bc5e4f9fe46d1df73cb71c09
Author: Andrew Ash <[email protected]>
Date: Sun Feb 2 13:05:40 2014 -0800
Better naming of parameters in intersection's filter
commit b10a6af2d793ec6e9a06c798007fac3f6b860d89
Author: Andrew Ash <[email protected]>
Date: Sat Jan 25 23:06:26 2014 -0800
Follow spark code format conventions of tab => 2 spaces
commit 965256e4304cca514bb36a1a36087711dec535ec
Author: Andrew Ash <[email protected]>
Date: Fri Jan 24 00:28:01 2014 -0800
Add rdd.intersection(otherRdd) method
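The commits above hint at the approach (pairing each value with null, a shuffle, then a filter). A rough local analogue over plain Python lists is sketched below. It is not the RDD implementation, and the `intersection` function here is a hypothetical stand-in.

```python
from collections import defaultdict


def intersection(left, right):
    # Group by value and record which side each value was seen on, which is
    # the local equivalent of keying by value and cogrouping.
    seen = defaultdict(lambda: [False, False])
    for v in left:
        seen[v][0] = True
    for v in right:
        seen[v][1] = True
    # Keep only the values present on both sides; duplicates collapse.
    return [v for v, (in_left, in_right) in seen.items()
            if in_left and in_right]
```

In this sketch the result holds each common value once, however often it appears in either input.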
commit fabf1749995103841e6a3975892572f376ee48d0
Author: Martin Jaggi <[email protected]>
Date: 2014-02-08T19:39:13Z
Merge pull request #552 from martinjaggi/master. Closes #552.
tex formulas in the documentation
using mathjax.
and splitting the MLlib documentation by techniques
see jira
https://spark-project.atlassian.net/browse/MLLIB-19
and
https://github.com/shivaram/spark/compare/mathjax
Author: Martin Jaggi <[email protected]>
== Merge branch commits ==
commit 0364bfabbfc347f917216057a20c39b631842481
Author: Martin Jaggi <[email protected]>
Date: Fri Feb 7 03:19:38 2014 +0100
minor polishing, as suggested by @pwendell
commit dcd2142c164b2f602bf472bb152ad55bae82d31a
Author: Martin Jaggi <[email protected]>
Date: Thu Feb 6 18:04:26 2014 +0100
enabling inline latex formulas with $.$
same mathjax configuration as used in math.stackexchange.com
sample usage in the linear algebra (SVD) documentation
commit bbafafd2b497a5acaa03a140bb9de1fbb7d67ffa
Author: Martin Jaggi <[email protected]>
Date: Thu Feb 6 17:31:29 2014 +0100
split MLlib documentation by techniques
and linked from the main mllib-guide.md site
commit d1c5212b93c67436543c2d8ddbbf610fdf0a26eb
Author: Martin Jaggi <[email protected]>
Date: Thu Feb 6 16:59:43 2014 +0100
enable mathjax formula in the .md documentation files
code by @shivaram
commit d73948db0d9bc36296054e79fec5b1a657b4eab4
Author: Martin Jaggi <[email protected]>
Date: Thu Feb 6 16:57:23 2014 +0100
minor update on how to compile the documentation
commit 78050805bc691a00788f6e51f23dd785ca25b227
Author: Jey Kottalam <[email protected]>
Date: 2014-02-08T20:24:08Z
Merge pull request #454 from jey/atomic-sbt-download. Closes #454.
Make sbt download an atomic operation
Modifies the `sbt/sbt` script to gracefully recover when a previous
invocation died in the middle of downloading the SBT jar.
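A common way to make a download atomic is to write to a temporary file and rename it into place only on success. The Python sketch below illustrates that pattern. It is an assumption about the general technique, not a transcription of the `sbt/sbt` shell script, and `atomic_write` and `fetch_chunks` are hypothetical names.

```python
import os
import tempfile


def atomic_write(dest_path, fetch_chunks):
    # Write into a temporary file in the destination directory, so that the
    # final rename stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(dest_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in fetch_chunks():
                tmp.write(chunk)
        # The destination only ever appears as a complete file.
        os.replace(tmp_path, dest_path)
    except BaseException:
        # An interrupted download leaves no partial file behind.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A later invocation can then simply test whether the destination exists, since a half-written file is never visible under that name.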
Author: Jey Kottalam <[email protected]>
== Merge branch commits ==
commit 6c600eb434a2f3e7d70b67831aeebde9b5c0f43b
Author: Jey Kottalam <[email protected]>
Date: Fri Jan 17 10:43:54 2014 -0800
Make sbt download an atomic operation
commit f0ce736fadbcb7642b6148ad740f4508cd7dcd4d
Author: Qiuzhuang Lian <[email protected]>
Date: 2014-02-08T20:59:48Z
Merge pull request #561 from Qiuzhuang/master. Closes #561.
Kill drivers in postStop() for Worker.
JIRA SPARK-1068:https://spark-project.atlassian.net/browse/SPARK-1068
Author: Qiuzhuang Lian <[email protected]>
== Merge branch commits ==
commit 9c19ce63637eee9369edd235979288d3d9fc9105
Author: Qiuzhuang Lian <[email protected]>
Date: Sat Feb 8 16:07:39 2014 +0800
Kill drivers in postStop() for Worker.
JIRA SPARK-1068:https://spark-project.atlassian.net/browse/SPARK-1068
commit c2341c92bb206938fd9b18e2a714e5c6de55b06d
Author: Mark Hamstra <[email protected]>
Date: 2014-02-09T00:00:43Z
Merge pull request #542 from markhamstra/versionBump. Closes #542.
Version number to 1.0.0-SNAPSHOT
Since 0.9.0-incubating is done and out the door, we shouldn't be building
0.9.0-incubating-SNAPSHOT anymore.
@pwendell
Author: Mark Hamstra <[email protected]>
== Merge branch commits ==
commit 1b00a8a7c1a7f251b4bb3774b84b9e64758eaa71
Author: Mark Hamstra <[email protected]>
Date: Wed Feb 5 09:30:32 2014 -0800
Version number to 1.0.0-SNAPSHOT
commit f892da8716d614467fddcc3a1b2b589979414219
Author: Patrick Wendell <[email protected]>
Date: 2014-02-09T07:13:34Z
Merge pull request #565 from pwendell/dev-scripts. Closes #565.
SPARK-1066: Add developer scripts to repository.
These are some developer scripts I've been maintaining in a separate public
repo. This patch adds them to the Spark repository so they can evolve here and
are clearly accessible to all committers.
I may do some small additional clean-up in this PR, but wanted to put them
here in case others want to review. There are a few types of scripts here:
1. A tool to merge pull requests.
2. A script for packaging releases.
3. A script for auditing release candidates.
Author: Patrick Wendell <[email protected]>
== Merge branch commits ==
commit 5d5d331d01f6fd59c2eb830f652955119b012173
Author: Patrick Wendell <[email protected]>
Date: Sat Feb 8 22:11:47 2014 -0800
SPARK-1066: Add developer scripts to repository.
commit b6d40b782327188a25ded5b22790552121e5271f
Author: Patrick Wendell <[email protected]>
Date: 2014-02-09T07:35:31Z
Merge pull request #560 from pwendell/logging. Closes #560.
[WIP] SPARK-1067: Default log4j initialization causes errors for those not
using log4j
To fix this - we add a check when initializing log4j.
Author: Patrick Wendell <[email protected]>
== Merge branch commits ==
commit ffdce513877f64b6eed6d36138c3e0003d392889
Author: Patrick Wendell <[email protected]>
Date: Fri Feb 7 15:22:29 2014 -0800
Logging fix
commit 2ef37c93664d74de6d7f6144834883a4a4ef79b7
Author: jyotiska <[email protected]>
Date: 2014-02-09T07:36:48Z
Merge pull request #562 from jyotiska/master. Closes #562.
Added example Python code for sort
I added an example Python code for sort. Right now, PySpark has limited
examples for new people willing to use the project. This example code sorts
integers stored in a file. I was able to sort 5 million, 10 million and 25
million integers with this code.
Author: jyotiska <[email protected]>
== Merge branch commits ==
commit 8ad8faf6c8e02ae1cd68565d98524edf165f54df
Author: jyotiska <[email protected]>
Date: Sun Feb 9 11:00:41 2014 +0530
Added comments in code on collect() method
commit 6f98f1e313f4472a7c2207d36c4f0fbcebc95a8c
Author: jyotiska <[email protected]>
Date: Sat Feb 8 13:12:37 2014 +0530
Updated python example code sort.py
commit 945e39a5d68daa7e5bab0d96cbd35d7c4b04eafb
Author: jyotiska <[email protected]>
Date: Sat Feb 8 12:59:09 2014 +0530
Added example python code for sort
commit b6dba10ae59215b5c4e40f7632563f592f138c87
Author: CodingCat <[email protected]>
Date: 2014-02-09T07:39:17Z
Merge pull request #556 from CodingCat/JettyUtil. Closes #556.
[SPARK-1060] startJettyServer should explicitly use IP information
https://spark-project.atlassian.net/browse/SPARK-1060
In the current implementation, the webserver in Master/Worker is started
with
val (srv, bPort) = JettyUtils.startJettyServer("0.0.0.0", port, handlers)
inside startJettyServer:
val server = new Server(currentPort) //here, the Server will take "0.0.0.0"
as the hostname, i.e. will always bind to the IP address of the first NIC
This can cause wrong IP binding. For example, if the host has two NICs, N1 and
N2, and the user specifies SPARK_LOCAL_IP as N2's IP address, then when
starting the web server, for the reason stated above, it will always bind to
N1's address.
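The general fix, passing the configured host through to the bind call instead of giving only a port, can be illustrated with a plain socket. This Python sketch is an analogy only: it does not use Jetty, and `start_server` is a hypothetical helper.

```python
import socket


def start_server(host, port=0):
    # Bind to the explicitly requested host, not just to a port, so the
    # listener ends up on the address the caller asked for.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, port))
    srv.listen(1)
    return srv, srv.getsockname()
```

Here port 0 asks the operating system for any free port, and the returned address shows which host and port were actually bound.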
Author: CodingCat <[email protected]>
== Merge branch commits ==
commit 6c6d9a8ccc9ec4590678a3b34cb03df19092029d
Author: CodingCat <[email protected]>
Date: Thu Feb 6 14:53:34 2014 -0500
startJettyServer should explicitly use IP information
commit b69f8b2a01669851c656739b6886efe4cddef31a
Author: Patrick Wendell <[email protected]>
Date: 2014-02-09T18:09:19Z
Merge pull request #557 from ScrapCodes/style. Closes #557.
SPARK-1058, Fix Style Errors and Add Scala Style to Spark Build.
Author: Patrick Wendell <[email protected]>
Author: Prashant Sharma <[email protected]>
== Merge branch commits ==
commit 1a8bd1c059b842cb95cc246aaea74a79fec684f4
Author: Prashant Sharma <[email protected]>
Date: Sun Feb 9 17:39:07 2014 +0530
scala style fixes
commit f91709887a8e0b608c5c2b282db19b8a44d53a43
Author: Patrick Wendell <[email protected]>
Date: Fri Jan 24 11:22:53 2014 -0800
Adding scalastyle snapshot
commit 94ccf869aacbe99b7ca7a40ca585a759923cb407
Author: Patrick Wendell <[email protected]>
Date: 2014-02-09T21:54:27Z
Merge pull request #569 from pwendell/merge-fixes.
Fixes bug where merges won't close associated pull request.
Previously we added "Closes #XX" in the title. Github will sometimes
line-break the title in a way that causes this to not work. This patch
instead adds the line in the body.
This also makes the commit format more concise for merge commits.
We might consider just dropping those in the future.
Author: Patrick Wendell <[email protected]>
Closes #569 and squashes the following commits:
732eba1 [Patrick Wendell] Fixes bug where merges won't close associated
pull request.
commit afc8f3cb9a7afe3249500a7d135b4a54bb3e58c4
Author: qqsun8819 <[email protected]>
Date: 2014-02-09T21:57:29Z
Merge pull request #551 from qqsun8819/json-protocol.
[SPARK-1038] Add more fields in JsonProtocol and add tests that verify the
JSON itself
This is a PR for SPARK-1038. Two major changes:
1 add some fields to JsonProtocol which is new and important to
standalone-related data structures
2 Use Diff in liftweb.json to verify the stringified JSON output, to
detect when someone changes a type T to Option[T]
Author: qqsun8819 <[email protected]>
Closes #551 and squashes the following commits:
fdf0b4e [qqsun8819] [SPARK-1038] 1. Change code style for more readable
according to rxin review 2. change submitdate hard-coded string to a date
object toString for more complexiblity
095a26f [qqsun8819] [SPARK-1038] mod according to review of pwendel, use
hard-coded json string for json data validation. Each test use its own json
string
0524e41 [qqsun8819] Merge remote-tracking branch 'upstream/master' into
json-protocol
d203d5c [qqsun8819] [SPARK-1038] Add more fields in JsonProtocol and add
tests that verify the JSON itself
commit 2182aa3c55737a90e0ff200eede7146b440801a3
Author: Martin Jaggi <[email protected]>
Date: 2014-02-09T23:19:50Z
Merge pull request #566 from martinjaggi/copy-MLlib-d.
new MLlib documentation for optimization, regression and classification
new documentation with tex formulas, hopefully improving usability and
reproducibility of the offered MLlib methods.
also did some minor changes in the code for consistency. scala tests pass.
this is the rebased branch, i deleted the old PR
jira:
https://spark-project.atlassian.net/browse/MLLIB-19
Author: Martin Jaggi <[email protected]>
Closes #566 and squashes the following commits:
5f0f31e [Martin Jaggi] line wrap at 100 chars
4e094fb [Martin Jaggi] better description of GradientDescent
1d6965d [Martin Jaggi] remove broken url
ea569c3 [Martin Jaggi] telling what updater actually does
964732b [Martin Jaggi] lambda R() in documentation
a6c6228 [Martin Jaggi] better comments in SGD code for regression
b32224a [Martin Jaggi] new optimization documentation
d5dfef7 [Martin Jaggi] new classification and regression documentation
b07ead6 [Martin Jaggi] correct scaling for MSE loss
ba6158c [Martin Jaggi] use d for the number of features
bab2ed2 [Martin Jaggi] renaming LeastSquaresGradient
commit 919bd7f669c61500eee7231298d9880b320eb6f3
Author: Prashant Sharma <[email protected]>
Date: 2014-02-10T06:17:52Z
Merge pull request #567 from ScrapCodes/style2.
SPARK-1058, Fix Style Errors and Add Scala Style to Spark Build. Pt 2
Continuation of PR #557
With this all scala style errors are fixed across the code base !!
The reason for creating a separate PR was to not interrupt an already
reviewed and ready to merge PR. Hope this gets reviewed soon and merged too.
Author: Prashant Sharma <[email protected]>
Closes #567 and squashes the following commits:
3b1ec30 [Prashant Sharma] scala style fixes
commit d6a9bdc097458ee961072e67627ade8a0a9e3c58
Author: Patrick Wendell <[email protected]>
Date: 2014-02-10T07:35:06Z
Revert "Merge pull request #560 from pwendell/logging. Closes #560."
This reverts commit b6d40b782327188a25ded5b22790552121e5271f.
commit 4afe6ccf40223699c13665b1ed5e98d1604d3247
Author: Chen Chao <[email protected]>
Date: 2014-02-11T06:28:39Z
Merge pull request #579 from CrazyJvm/patch-1.
"in the source DStream" rather than "int the source DStream"
"flatMap is a one-to-many DStream operation that creates a new DStream by
generating multiple new records from each record int the source DStream."
Author: Chen Chao <[email protected]>
Closes #579 and squashes the following commits:
4abcae3 [Chen Chao] in the source DStream
commit ba38d9892ec922ff11f204cd4c1b8ddc90f1bd55
Author: Henry Saputra <[email protected]>
Date: 2014-02-11T22:46:22Z
Merge pull request #577 from hsaputra/fix_simple_streaming_doc.
SPARK-1075 Fix doc in the Spark Streaming custom receiver closing bracket
in the class constructor
The closing parenthesis in the constructor in the first code block example
is reversed:
diff --git a/docs/streaming-custom-receivers.md
b/docs/streaming-custom-receivers.md
index 4e27d65..3fb540c 100644
--- a/docs/streaming-custom-receivers.md
+++ b/docs/streaming-custom-receivers.md
@@ -14,7 +14,7 @@ This starts with implementing
NetworkReceiver(api/streaming/index.html#org.apa
The following is a simple socket text-stream receiver.
{% highlight scala %}
- class SocketTextStreamReceiver(host: String, port: Int(
+ class SocketTextStreamReceiver(host: String, port: Int)
extends NetworkReceiver[String]
{
protected lazy val blocksGenerator: BlockGenerator =
Author: Henry Saputra <[email protected]>
Closes #577 and squashes the following commits:
6508341 [Henry Saputra] SPARK-1075 Fix doc in the Spark Streaming custom
receiver.
commit b0dab1bb9f4cfacae68b106a44d9b14f6bea3d29
Author: Holden Karau <[email protected]>
Date: 2014-02-11T22:48:59Z
Merge pull request #571 from holdenk/switchtobinarysearch.
SPARK-1072 Use binary search when needed in RangePartitioner
Author: Holden Karau <[email protected]>
Closes #571 and squashes the following commits:
f31a2e1 [Holden Karau] Switch to using CollectionsUtils in Partitioner
4c7a0c3 [Holden Karau] Add CollectionsUtil as suggested by aarondav
7099962 [Holden Karau] Add the binary search to only init once
1bef01d [Holden Karau] CR feedback
a21e097 [Holden Karau] Use binary search if we have more than 1000 elements
inside of RangePartitioner
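For readers unfamiliar with the idea behind SPARK-1072, the following is a rough sketch only, not the code merged in #571. It assumes the range bounds are a sorted array of distinct Int keys; the object name PartitionLookup, its method names, and the constant BinarySearchThreshold are hypothetical, and the cutoff of 1000 is taken from the commit note above. It shows a linear scan for short bounds arrays and java.util.Arrays.binarySearch for long ones.

```scala
import java.util.Arrays

object PartitionLookup {
  // Hypothetical cutoff, taken from the commit note "more than 1000 elements".
  val BinarySearchThreshold = 1000

  /** Linear scan: index of the first bound >= key, or bounds.length if none. */
  def linearLookup(bounds: Array[Int], key: Int): Int = {
    var i = 0
    while (i < bounds.length && key > bounds(i)) i += 1
    i
  }

  /** Binary search over the same sorted bounds; assumes distinct values. */
  def binaryLookup(bounds: Array[Int], key: Int): Int = {
    val idx = Arrays.binarySearch(bounds, key)
    // A negative result encodes the insertion point as -(insertionPoint) - 1.
    if (idx < 0) -idx - 1 else idx
  }

  /** Pick the strategy by the number of bounds. */
  def partitionFor(bounds: Array[Int], key: Int): Int =
    if (bounds.length > BinarySearchThreshold) binaryLookup(bounds, key)
    else linearLookup(bounds, key)
}
```

Both lookups return the same index for sorted, distinct bounds, so the threshold affects only the cost of the lookup and not which partition a key is assigned to.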
commit 1352981979acdebfeae66b940319fff35e71ee4f
Author: Bijay Bisht <[email protected]>
Date: 2014-02-05T17:34:55Z
Ported hadoopClient jar for < 1.0.1 fix
----