GitHub user nssalian opened a pull request: https://github.com/apache/spark/pull/6861
Adding Python code for SPARK-8320

Added Python code to the Level of Parallelism in Data Receiving section of https://spark.apache.org/docs/latest/streaming-programming-guide.html. Please review and let me know if any additional changes are needed. Thank you. (See the sketch below for the kind of snippet involved.)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nssalian/spark SPARK-8320

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/6861.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #6861
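The guide section referenced above parallelizes data receiving by creating several input DStreams and unioning them. A minimal PySpark sketch of that pattern, assuming the Spark 1.x `pyspark.streaming.kafka.KafkaUtils.createStream` receiver API; the ZooKeeper address, consumer group, and topic name are placeholders, and this is not necessarily the exact snippet the PR adds:

```python
from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

conf = SparkConf().setAppName("ReceiverParallelismSketch")
sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, 2)  # 2-second batches

# Several receivers ingest in parallel instead of one; host/group/topic are placeholders.
num_streams = 5
kafka_streams = [
    KafkaUtils.createStream(ssc, "zk-host:2181", "consumer-group", {"topic": 1})
    for _ in range(num_streams)
]

# Union the per-receiver DStreams back into a single DStream for downstream processing.
unified = ssc.union(*kafka_streams)
unified.pprint()

ssc.start()
ssc.awaitTermination()
```

Each `createStream` call registers its own receiver, so `num_streams` controls the receiving parallelism while the union keeps the rest of the pipeline unchanged.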
----

commit 82a396c2f594bade276606dcd0c0545a650fb838
Author: Holden Karau <hol...@pigscanfly.ca>
Date: 2015-05-29T21:59:18Z
[SPARK-7910] [TINY] [JAVAAPI] expose partitioner information in javardd
Author: Holden Karau <hol...@pigscanfly.ca>
Closes #6464 from holdenk/SPARK-7910-expose-partitioner-information-in-javardd and squashes the following commits:
de1e644 [Holden Karau] Fix the test to get the partitioner
bdb31cc [Holden Karau] Add Mima exclude for the new method
347ef4c [Holden Karau] Add a quick little test for the partitioner JavaAPI
f49dca9 [Holden Karau] Add partitoner information to JavaRDDLike and fix some whitespace

commit 5fb97dca9bcfc29ac33823554c8783997e811b99
Author: Shivaram Venkataraman <shiva...@cs.berkeley.edu>
Date: 2015-05-29T22:08:30Z
[SPARK-7954] [SPARKR] Create SparkContext in sparkRSQL init
cc davies
Author: Shivaram Venkataraman <shiva...@cs.berkeley.edu>
Closes #6507 from shivaram/sparkr-init and squashes the following commits:
6fdd169 [Shivaram Venkataraman] Create SparkContext in sparkRSQL init

commit dbf8ff38de0f95f467b874a5b527dcf59439efe8
Author: Ram Sriharsha <rsriharsha@hw11853.local>
Date: 2015-05-29T22:22:26Z
[SPARK-6013] [ML] Add more Python ML examples for spark.ml
Author: Ram Sriharsha <rsriharsha@hw11853.local>
Closes #6443 from harsha2010/SPARK-6013 and squashes the following commits:
732506e [Ram Sriharsha] Code Review Feedback
121c211 [Ram Sriharsha] python style fix
5f9b8c3 [Ram Sriharsha] python style fixes
925ca86 [Ram Sriharsha] Simple Params Example
8b372b1 [Ram Sriharsha] GBT Example
965ec14 [Ram Sriharsha] Random Forest Example

commit 8c9979337f193c72fd2f1a891909283de53777e3
Author: Andrew Or <and...@databricks.com>
Date: 2015-05-29T22:26:49Z
[HOTFIX] [SQL] Maven test compilation issue
Tests compile in SBT but not Maven.

commit a4f24123d8857656524c9138c7c067a4b1033a5e
Author: Andrew Or <and...@databricks.com>
Date: 2015-05-30T00:19:46Z
[HOT FIX] [BUILD] Fix maven build failures
This patch fixes a build break in maven caused by #6441. Note that this patch reverts the changes in flume-sink because this module does not currently depend on Spark core, but the tests require it. There is not an easy way to make this work because mvn test dependencies are not transitive (MNG-1378). For now, we will leave the one test suite in flume-sink out until we figure out a better solution. This patch is mainly intended to unbreak the maven build.
Author: Andrew Or <and...@databricks.com>
Closes #6511 from andrewor14/fix-build-mvn and squashes the following commits:
3d53643 [Andrew Or] [HOT FIX #6441] Fix maven build failures

commit 3792d25836e1e521da64c5a62ca1b6cca1bcb6b9
Author: Taka Shinagawa <taka.epsi...@gmail.com>
Date: 2015-05-30T03:35:14Z
[DOCS][Tiny] Added a missing dash(-) in docs/configuration.md
The first line had only two dashes (--) instead of three(---). Because of this missing dash(-), 'jekyll build' command was not converting configuration.md to _site/configuration.html
Author: Taka Shinagawa <taka.epsi...@gmail.com>
Closes #6513 from mrt/docfix3 and squashes the following commits:
c470e2c [Taka Shinagawa] Added a missing dash(-) preventing jekyll from converting configuration.md to html format

commit 7ed06c39922ac90acab3a78ce0f2f21184ed68a5
Author: Burak Yavuz <brk...@gmail.com>
Date: 2015-05-30T05:19:15Z
[SPARK-7957] Preserve partitioning when using randomSplit
cc JoshRosen Thanks for noticing this!
Author: Burak Yavuz <brk...@gmail.com>
Closes #6509 from brkyvz/sample-perf-reg and squashes the following commits:
497465d [Burak Yavuz] addressed code review
293f95f [Burak Yavuz] [SPARK-7957] Preserve partitioning when using randomSplit

commit 609c4923f98c188bce60ae35c1c8a08a8dfd95f1
Author: Andrew Or <and...@databricks.com>
Date: 2015-05-30T05:57:46Z
[SPARK-7558] Guard against direct uses of FunSuite / FunSuiteLike
This is a follow-up patch to #6441.
Author: Andrew Or <and...@databricks.com>
Closes #6510 from andrewor14/extends-funsuite-check and squashes the following commits:
6618b46 [Andrew Or] Exempt SparkSinkSuite from the FunSuite check
99d02ac [Andrew Or] Merge branch 'master' of github.com:apache/spark into extends-funsuite-check
48874dd [Andrew Or] Guard against direct uses of FunSuite / FunSuiteLike

commit 193dba01c77ef1bb63e3f617213eb257960f8d2f
Author: Andrew Or <and...@databricks.com>
Date: 2015-05-30T06:08:47Z
[TRIVIAL] Typo fix for last commit

commit da2112aef28e63c452f592e0abd007141787877d
Author: Octavian Geagla <ogea...@gmail.com>
Date: 2015-05-30T06:55:19Z
[SPARK-7576] [MLLIB] Add spark.ml user guide doc/example for ElementwiseProduct
Author: Octavian Geagla <ogea...@gmail.com>
Closes #6501 from ogeagla/ml-guide-elemwiseprod and squashes the following commits:
4ad93d5 [Octavian Geagla] [SPARK-7576] [MLLIB] Incorporate code review feedback.
f7be7ad [Octavian Geagla] [SPARK-7576] [MLLIB] Add spark.ml user guide doc/example for ElementwiseProduct.

commit 78657d53d71b9d3e86b675cc519868f99e2ffa01
Author: Timothy Chen <tnac...@gmail.com>
Date: 2015-05-30T06:56:18Z
[SPARK-7962] [MESOS] Fix master url parsing in rest submission client.
Only parse standalone master url when master url starts with spark://
Author: Timothy Chen <tnac...@gmail.com>
Closes #6517 from tnachen/fix_mesos_client and squashes the following commits:
61a1198 [Timothy Chen] Fix master url parsing in rest submission client.

commit e3a43748338b02ef6864ca62de40e218e5677506
Author: Octavian Geagla <ogea...@gmail.com>
Date: 2015-05-30T07:00:36Z
[SPARK-7459] [MLLIB] ElementwiseProduct Java example
Author: Octavian Geagla <ogea...@gmail.com>
Closes #6008 from ogeagla/elementwise-prod-doc and squashes the following commits:
72e6dc0 [Octavian Geagla] [SPARK-7459] [MLLIB] Java example import.
cf2afbd [Octavian Geagla] [SPARK-7459] [MLLIB] Update description of example.
b66431b [Octavian Geagla] [SPARK-7459] [MLLIB] Add override annotation to java example, make scala example use same data as java.
6b26b03 [Octavian Geagla] [SPARK-7459] [MLLIB] Fix line which is too long.
79af020 [Octavian Geagla] [SPARK-7459] [MLLIB] Actually don't use Java 8.
9d5b31a [Octavian Geagla] [SPARK-7459] [MLLIB] Don't use Java 8
4f0c92f [Octavian Geagla] [SPARK-7459] [MLLIB] ElementwiseProduct Java example.
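The two ElementwiseProduct commits above (SPARK-7576 and SPARK-7459) document a transformer that multiplies each input vector element-wise by a fixed scaling vector. A conceptual PySpark sketch of that transformation (plain RDD code illustrating what the transformer computes, not the MLlib API the guide documents; the vectors are made-up values):

```python
from pyspark import SparkContext

sc = SparkContext(appName="ElementwiseProductSketch")

# Fixed scaling vector, applied to every input vector (illustrative values).
scaling = [0.0, 1.0, 2.0]

data = sc.parallelize([[1.0, 2.0, 3.0],
                       [4.0, 5.0, 6.0]])

# Element-wise product: each component is multiplied by the matching scaling component.
transformed = data.map(lambda v: [x * w for x, w in zip(v, scaling)])

print(transformed.collect())  # [[0.0, 2.0, 6.0], [0.0, 5.0, 12.0]]
sc.stop()
```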
commit 0978aec9cd47dc0618e47b74a99e1cc2266be424
Author: Wenchen Fan <cloud0...@outlook.com>
Date: 2015-05-30T07:26:46Z
[SPARK-7964][SQL] remove unnecessary type coercion rule
We have defined these logics in `Cast` already, I think we should remove this rule.
Author: Wenchen Fan <cloud0...@outlook.com>
Closes #6516 from cloud-fan/tmp2 and squashes the following commits:
d5035a4 [Wenchen Fan] remove useless rule

commit 8c8de3ed863985554e84fd07d1cdcaeca7e3375c
Author: Sean Owen <so...@cloudera.com>
Date: 2015-05-30T11:59:27Z
[SPARK-7890] [DOCS] Document that Spark 2.11 now supports Kafka
Remove caveat about Kafka / JDBC not being supported for Scala 2.11
Author: Sean Owen <so...@cloudera.com>
Closes #6470 from srowen/SPARK-7890 and squashes the following commits:
4652634 [Sean Owen] One more rewording
7b7f3c8 [Sean Owen] Restore note about JDBC component
126744d [Sean Owen] Remove caveat about Kafka / JDBC not being supported for Scala 2.11

commit 9d8aadb72bbc86595e253fe30201cda6a8db877e
Author: WangTaoTheTonic <wangtao...@huawei.com>
Date: 2015-05-30T12:04:27Z
[SPARK-7945] [CORE] Do trim to values in properties file
https://issues.apache.org/jira/browse/SPARK-7945
Now applications submited by org.apache.spark.launcher.Main read properties file without doing trim to values in it. If user left a space after a value(say spark.driver.extraClassPath) then it probably affect global functions(like some jar could not be included in the classpath), so we should do it like Utils.getPropertiesFromFile.
Author: WangTaoTheTonic <wangtao...@huawei.com>
Author: Tao Wang <wangtao...@huawei.com>
Closes #6496 from WangTaoTheTonic/SPARK-7945 and squashes the following commits:
bb41b4b [Tao Wang] indent 4 to 2
6dd1cf2 [WangTaoTheTonic] use a simpler way
2c053a1 [WangTaoTheTonic] Do trim to values in properties file

commit 2b35c99c7e73d22e82aef90b675709ae7f8d3b4a
Author: zhichao.li <zhichao...@intel.com>
Date: 2015-05-30T12:06:11Z
[SPARK-7717] [WEBUI] Only showing total memory and cores for alive workers
Author: zhichao.li <zhichao...@intel.com>
Closes #6317 from zhichao-li/workers and squashes the following commits:
d68bf11 [zhichao.li] change prefix
99b6768 [zhichao.li] remove extra space and add 'Alive' prefix
1e8eb06 [zhichao.li] only showing alive workers

commit 3ab71eb9d5e3fe21af7720421eafa51f6da9b63f
Author: Taka Shinagawa <taka.epsi...@gmail.com>
Date: 2015-05-30T12:25:21Z
[DOCS] [MINOR] Update for the Hadoop versions table with hadoop-2.6
Updated the doc for the hadoop-2.6 profile, which is new to Spark 1.4
Author: Taka Shinagawa <taka.epsi...@gmail.com>
Closes #6450 from mrt/docfix2 and squashes the following commits:
db1c43b [Taka Shinagawa] Updated the hadoop versions for hadoop-2.6 profile
323710e [Taka Shinagawa] The hadoop-2.6 profile is added to the Hadoop versions table

commit d34b43bd5964e1feb03a17937de87a3f718806a5
Author: Reynold Xin <r...@databricks.com>
Date: 2015-05-30T19:06:38Z
Closes #4685

commit 6e3f0c7810a6721698b0ed51cfbd41a0cd07a4a3
Author: Cheng Lian <l...@databricks.com>
Date: 2015-05-30T19:16:09Z
[SPARK-7849] [SQL] [Docs] Updates SQL programming guide for 1.4
Author: Cheng Lian <l...@databricks.com>
Closes #6520 from liancheng/spark-7849 and squashes the following commits:
705264b [Cheng Lian] Updates SQL programming guide for 1.4

commit 7716a5a1ec8ff8dc24e0146f8ead2f51da6512ad
Author: Reynold Xin <r...@databricks.com>
Date: 2015-05-30T21:57:23Z
Updated SQL programming guide's Hive connectivity section.
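The last two commits above touch the 1.4 SQL programming guide, including its Hive connectivity section. As a rough illustration of what that section covers, a minimal PySpark sketch using `HiveContext`, assuming a Spark build with Hive support; the table and query are made up for the example:

```python
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="HiveConnectivitySketch")

# HiveContext adds HiveQL support and access to tables in the Hive metastore.
sqlContext = HiveContext(sc)

# Hypothetical table and query, purely for illustration.
sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
results = sqlContext.sql("SELECT key, value FROM src LIMIT 10")

for row in results.collect():
    print(row.key, row.value)

sc.stop()
```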
commit a6430028ecd7a6130f1eb15af9ec00e242c46725
Author: Josh Rosen <joshro...@databricks.com>
Date: 2015-05-30T22:27:51Z
[SPARK-7855] Move bypassMergeSort-handling from ExternalSorter to own component
Spark's `ExternalSorter` writes shuffle output files during sort-based shuffle. Sort-shuffle contains a configuration, `spark.shuffle.sort.bypassMergeThreshold`, which causes ExternalSorter to skip sorting and merging and simply write separate files per partition, which are then concatenated together to form the final map output file.
The code paths used during this bypass are almost completely separate from ExternalSorter's other code paths, so refactoring them into a separate file can significantly simplify the code.
In addition to re-arranging code, this patch deletes a bunch of dead code. The main entry point into ExternalSorter is `insertAll()` and in SPARK-4479 / #3422 this method was modified to completely bypass in-memory buffering of records when `bypassMergeSort` takes effect. As a result, some of the spilling and merging code paths will no longer be called when `bypassMergeSort` is used, so we should be able to safely remove that code.
There's an open JIRA ([SPARK-6026](https://issues.apache.org/jira/browse/SPARK-6026)) for removing the `bypassMergeThreshold` parameter and code paths; I have not done that here, but the changes in this patch will make removing that parameter significantly easier if we ever decide to do that.
This patch also makes several improvements to shuffle-related tests and adds more defensive checks to certain shuffle classes:
- DiskBlockObjectWriter now throws an exception if `fileSegment()` is called before `commitAndClose()` has been called.
- DiskBlockObjectWriter's close methods are now idempotent, so calling any of the close methods twice in a row will no longer result in incorrect shuffle write metrics changes. Calling `revertPartialWritesAndClose()` on a closed DiskBlockObjectWriter now has no effect (before, it might mess up the metrics).
- The end-to-end shuffle record count metrics tests have been moved from InputOutputMetricsSuite to ShuffleSuite. This means that these tests will now be run against all shuffle implementations rather than just the default shuffle configuration.
- The end-to-end metrics tests now include a test of a job which performs aggregation in the shuffle.
- Our tests now check that `shuffleBytesWritten == totalShuffleBytesRead`.
- FileSegment now throws IllegalArgumentException if it is constructed with a negative length or offset.
Author: Josh Rosen <joshro...@databricks.com>
Closes #6397 from JoshRosen/external-sorter-bypass-cleanup and squashes the following commits:
bf3f3f6 [Josh Rosen] Merge remote-tracking branch 'origin/master' into external-sorter-bypass-cleanup
8b216c4 [Josh Rosen] Guard against negative offsets and lengths in FileSegment
03f35a4 [Josh Rosen] Minor fix to cleanup logic.
b5cc35b [Josh Rosen] Move shuffle metrics tests to ShuffleSuite.
8b8fb9e [Josh Rosen] Add more tests + defensive programming to DiskBlockObjectWriter.
16564eb [Josh Rosen] Guard against calling fileSegment() before commitAndClose() has been called.
96811b4 [Josh Rosen] Remove confusing taskMetrics.shuffleWriteMetrics() optional call
8522b6a [Josh Rosen] Do not perform a map-side sort unless we're also doing map-side aggregation
08e40f3 [Josh Rosen] Remove excessively clever (and wrong) implementation of newBuffer()
d7f9938 [Josh Rosen] Add missing overrides; fix compilation
71d76ff [Josh Rosen] Update Javadoc
bf0d98f [Josh Rosen] Add comment to clarify confusing factory code
5197f73 [Josh Rosen] Add missing private[this]
30ef2c8 [Josh Rosen] Convert BypassMergeSortShuffleWriter to Java
bc1a820 [Josh Rosen] Fix bug when aggregator is used but map-side combine is disabled
0d3dcc0 [Josh Rosen] Remove unnecessary overloaded methods
25b964f [Josh Rosen] Rename SortShuffleSorter to SortShuffleFileWriter
0d9848c [Josh Rosen] Make it more clear that curWriteMetrics is now only used for spill metrics
7af7aea [Josh Rosen] Combine spill() and spillToMergeableFile()
6320112 [Josh Rosen] Add missing negation in deletion success check.
d267e0d [Josh Rosen] Fix style issue
7f15f7b [Josh Rosen] Back out extra cleanup-handling code, since this is already covered in stop()
25aa3bd [Josh Rosen] Make sure to delete outputFile after errors.
931ca68 [Josh Rosen] Refactor tests.
6a35716 [Josh Rosen] Refactor logic for deciding when to bypass
4b03539 [Josh Rosen] Move conf prior to first use
1265b25 [Josh Rosen] Fix some style errors and comments.
02355ef [Josh Rosen] More simplification
d4cb536 [Josh Rosen] Delete more unused code
bb96678 [Josh Rosen] Add missing interface file
b6cc1eb [Josh Rosen] Realize that bypass never buffers; proceed to delete tons of code
6185ee2 [Josh Rosen] WIP towards moving bypass code into own file.
8d0678c [Josh Rosen] Move diskBytesSpilled getter next to variable
19bccd6 [Josh Rosen] Remove duplicated buffer creation code.
18959bb [Josh Rosen] Move comparator methods closer together.
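The SPARK-7855 commit above revolves around the sort-shuffle bypass path gated by `spark.shuffle.sort.bypassMergeThreshold` (the config key named in the commit). A minimal sketch of setting that configuration from PySpark and running a small shuffle; the threshold value is an arbitrary example, and whether the bypass actually kicks in depends on the shuffle having no map-side combine or ordering and fewer reduce partitions than the threshold:

```python
from pyspark import SparkConf, SparkContext

# Arbitrary example value; the default is left to Spark if the key is not set.
conf = (SparkConf()
        .setAppName("BypassMergeThresholdSketch")
        .set("spark.shuffle.sort.bypassMergeThreshold", "100"))
sc = SparkContext(conf=conf)

# A small shuffle with no map-side combine and only 4 reduce partitions.
pairs = sc.parallelize(range(1000)).map(lambda x: (x % 10, x))
grouped = pairs.groupByKey(4).mapValues(lambda values: len(list(values)))
print(grouped.collect())

sc.stop()
```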
commit 1617363fbb9b22a2eb09e7bab98c8d05f9508761
Author: Yanbo Liang <yblia...@gmail.com>
Date: 2015-05-30T23:24:07Z
[SPARK-7918] [MLLIB] MLlib Python doc parity check for evaluation and feature
Check then make the MLlib Python evaluation and feature doc to be as complete as the Scala doc.
Author: Yanbo Liang <yblia...@gmail.com>
Closes #6461 from yanboliang/spark-7918 and squashes the following commits:
940e3f1 [Yanbo Liang] truncate too long line and remove extra sparse
a80ae58 [Yanbo Liang] MLlib Python doc parity check for evaluation and feature

commit 1281a3518802bfa624618236e6b9b59bc0e78585
Author: Mike Dusenberry <dusenberr...@gmail.com>
Date: 2015-05-30T23:50:59Z
[SPARK-7920] [MLLIB] Make MLlib ChiSqSelector Serializable (& Fix Related Documentation Example).
The MLlib ChiSqSelector class is not serializable, and so the example in the ChiSqSelector documentation fails. Also, that example is missing the import of ChiSqSelector. This PR makes ChiSqSelector extend Serializable in MLlib, and adds the ChiSqSelector import statement to the associated example in the documentation.
Author: Mike Dusenberry <dusenberr...@gmail.com>
Closes #6462 from dusenberrymw/Make_ChiSqSelector_Serializable_and_Fix_Related_Docs_Example and squashes the following commits:
9cb2f94 [Mike Dusenberry] Make MLlib ChiSqSelector Serializable.
d9003bf [Mike Dusenberry] Add missing import in MLlib ChiSqSelector Docs Scala example.

commit 66a53a69643e0004742667e140bad2aa8dae44e4
Author: Josh Rosen <joshro...@databricks.com>
Date: 2015-05-30T23:52:34Z
[HOTFIX] Replace FunSuite with SparkFunSuite.
This fixes a build break introduced by merging a6430028ecd7a6130f1eb15af9ec00e242c46725, which fails the new style checks that ensure that we use SparkFunSuite instead of FunSuite.
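The SPARK-7918 commit above brings the Python docs for `pyspark.mllib.evaluation` and `pyspark.mllib.feature` toward parity with Scala. A small sketch of the evaluation side, assuming the `BinaryClassificationMetrics` wrapper that module exposes; the score/label pairs are toy values for illustration only:

```python
from pyspark import SparkContext
from pyspark.mllib.evaluation import BinaryClassificationMetrics

sc = SparkContext(appName="EvaluationSketch")

# RDD of (score, label) pairs; toy values, not real model output.
score_and_labels = sc.parallelize([
    (0.9, 1.0), (0.8, 1.0), (0.6, 0.0), (0.4, 1.0), (0.2, 0.0), (0.1, 0.0),
])

metrics = BinaryClassificationMetrics(score_and_labels)
print("Area under ROC:", metrics.areaUnderROC)
print("Area under PR:", metrics.areaUnderPR)

sc.stop()
```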
commit 2b258e1c0784c8ca958bf94cd9e75fa17f104448
Author: Xiangrui Meng <m...@databricks.com>
Date: 2015-05-31T00:21:41Z
[SPARK-5610] [DOC] update genjavadocSettings to use the patched version of genjavadoc
This PR updates `genjavadocSettings` to use a patched version of `genjavadoc-plugin` that hides package private classes/methods/interfaces in the generated Java API doc. The patch can be found at: https://github.com/typesafehub/genjavadoc/compare/master...mengxr:spark-1.4. It wasn't merged into the main repo because there exist corner cases where a package private Scala class has to be a Java public class in order to compile. This doesn't seem to apply to the Spark codebase. So we release a patched version under `org.spark-project` and use it in the Spark build. brkyvz is publishing the artifacts to Maven Central. Need more people audit the generated APIs and make sure we don't have false negatives.
Current listed classes under `org.apache.spark.rdd`:
![screen shot 2015-05-29 at 12 48 52 pm](https://cloud.githubusercontent.com/assets/829644/7891396/28fb9daa-0601-11e5-8ed8-4e9522d25a71.png)
After this PR:
![screen shot 2015-05-29 at 12 48 23 pm](https://cloud.githubusercontent.com/assets/829644/7891408/408e210e-0601-11e5-975c-ff0a02eb5c91.png)
cc: pwendell rxin srowen
Author: Xiangrui Meng <m...@databricks.com>
Closes #6506 from mengxr/SPARK-5610 and squashes the following commits:
489c785 [Xiangrui Meng] update genjavadocSettings to use the patched version of genjavadoc

commit 14b314dc2cad7bbf23976347217c676d338e0a2d
Author: Reynold Xin <r...@databricks.com>
Date: 2015-05-31T02:50:52Z
[SQL] Tighten up visibility for JavaDoc.
I went through all the JavaDocs and tightened up visibility.
Author: Reynold Xin <r...@databricks.com>
Closes #6526 from rxin/sql-1.4-visibility-for-docs and squashes the following commits:
bc37d1e [Reynold Xin] Tighten up visibility for JavaDoc.

commit c63e1a742b3e87e79a4466e9bd0b927a24645756
Author: Reynold Xin <r...@databricks.com>
Date: 2015-05-31T02:51:53Z
[SPARK-7971] Add JavaDoc style deprecation for deprecated DataFrame methods
Scala deprecated annotation actually doesn't show up in JavaDoc.
Author: Reynold Xin <r...@databricks.com>
Closes #6523 from rxin/df-deprecated-javadoc and squashes the following commits:
26da2b2 [Reynold Xin] [SPARK-7971] Add JavaDoc style deprecation for deprecated DataFrame methods.

commit 00a7137900d45188673da85cbcef4f02b7a266c1
Author: Reynold Xin <r...@databricks.com>
Date: 2015-05-31T03:10:02Z
Update documentation for the new DataFrame reader/writer interface.
Author: Reynold Xin <r...@databricks.com>
Closes #6522 from rxin/sql-doc-1.4 and squashes the following commits:
c227be7 [Reynold Xin] Updated link.
040b6d7 [Reynold Xin] Update documentation for the new DataFrame reader/writer interface.
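The last two commits above concern documentation of the DataFrame API for 1.4, including the reader/writer interface that supersedes older load/save helpers. A minimal PySpark sketch of that interface; the file paths are placeholders:

```python
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="ReaderWriterSketch")
sqlContext = SQLContext(sc)

# New-style reader: sqlContext.read.<format>(...) rather than the older jsonFile()/load() helpers.
df = sqlContext.read.json("examples/src/main/resources/people.json")
df.printSchema()

# New-style writer: an explicit save mode and format on df.write.
df.write.mode("overwrite").parquet("/tmp/people.parquet")  # placeholder output path

# The generic format()/load() form works for the round trip as well.
df2 = sqlContext.read.format("parquet").load("/tmp/people.parquet")
print(df2.count())

sc.stop()
```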
commit f7fe9e474417a68635a5ed1aa819d81a9be40895
Author: Cheng Lian <l...@databricks.com>
Date: 2015-05-31T04:56:41Z
[SQL] [MINOR] Fixes a minor comment mistake in IsolatedClientLoader
Author: Cheng Lian <l...@databricks.com>
Closes #6521 from liancheng/classloader-comment-fix and squashes the following commits:
fc09606 [Cheng Lian] Addresses @srowen's comment
59945c5 [Cheng Lian] Fixes a minor comment mistake in IsolatedClientLoader

commit 084fef76e90116c6465cd6fad7c0197c3e4d4313
Author: Reynold Xin <r...@databricks.com>
Date: 2015-05-31T06:36:32Z
[SPARK-7976] Add style checker to disallow overriding finalize.
Author: Reynold Xin <r...@databricks.com>
Closes #6528 from rxin/style-finalizer and squashes the following commits:
a2211ca [Reynold Xin] [SPARK-7976] Enable NoFinalizeChecker.

----