GitHub user pradeepsavadi opened a pull request: https://github.com/apache/spark/pull/11668
Branch 1.6

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/spark branch-1.6

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11668.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #11668

----

commit 5d3722f8e5cdb4abd946ea18950225919af53a11
Author: jerryshao <ss...@hortonworks.com>
Date: 2015-12-10T23:31:46Z

[STREAMING][DOC][MINOR] Update the description of direct Kafka stream doc

With the merge of [SPARK-8337](https://issues.apache.org/jira/browse/SPARK-8337), the Python API now has the same functionality as Scala/Java, so this changes the description to make it more precise.

zsxwing tdas, please review, thanks a lot.

Author: jerryshao <ss...@hortonworks.com>

Closes #10246 from jerryshao/direct-kafka-doc-update.

(cherry picked from commit 24d3357d66e14388faf8709b368edca70ea96432)
Signed-off-by: Shixiong Zhu <shixi...@databricks.com>

commit d09af2cb4237cca9ac72aacb9abb822a2982a820
Author: Davies Liu <dav...@databricks.com>
Date: 2015-12-11T01:22:18Z

[SPARK-12258][SQL] passing null into ScalaUDF

Check nullability and pass null values through into ScalaUDF.

Closes #10249

Author: Davies Liu <dav...@databricks.com>

Closes #10259 from davies/udf_null.
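The bug class addressed by SPARK-12258 can be sketched outside Spark. The following is a hypothetical Java illustration (the names `NullIntoUdf`, `SQUARE`, and `applyNullSafe` are made up, not the actual ScalaUDF code): a user function written against primitives auto-unboxes its argument, so handing it a null wrapper throws a NullPointerException; the caller must check nullability first and propagate the null instead of invoking the function.

```java
// Hypothetical sketch, not Spark source: a "UDF" that expects a
// primitive blows up on null unless the caller guards the invocation.
import java.util.function.Function;

public class NullIntoUdf {
    // x * x unboxes x, so calling this directly with null would throw NPE.
    static final Function<Integer, Integer> SQUARE = x -> x * x;

    // Null-safe invocation: pass the null through rather than unboxing it.
    static Integer applyNullSafe(Function<Integer, Integer> f, Integer x) {
        return (x == null) ? null : f.apply(x);
    }

    public static void main(String[] args) {
        System.out.println(applyNullSafe(SQUARE, 3));    // 9
        System.out.println(applyNullSafe(SQUARE, null)); // null

        try {
            SQUARE.apply(null); // unguarded call: unboxing null fails
        } catch (NullPointerException e) {
            System.out.println("NPE without the null check");
        }
    }
}
```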
(cherry picked from commit b1b4ee7f3541d92c8bc2b0b4fdadf46cfdb09504)
Signed-off-by: Yin Huai <yh...@databricks.com>

commit 3e39925f9296bc126adf3f6828a0adf306900c0a
Author: Patrick Wendell <pwend...@gmail.com>
Date: 2015-12-11T02:45:36Z

Preparing Spark release v1.6.0-rc2

commit 250249e26466ff0d6ee6f8ae34f0225285c9bb9b
Author: Patrick Wendell <pwend...@gmail.com>
Date: 2015-12-11T02:45:42Z

Preparing development version 1.6.0-SNAPSHOT

commit eec36607f9fc92b6c4d306e3930fcf03961625eb
Author: Davies Liu <dav...@databricks.com>
Date: 2015-12-11T19:15:53Z

[SPARK-12258] [SQL] passing null into ScalaUDF (follow-up)

This is a follow-up PR for #10259

Author: Davies Liu <dav...@databricks.com>

Closes #10266 from davies/null_udf2.

(cherry picked from commit c119a34d1e9e599e302acfda92e5de681086a19f)
Signed-off-by: Davies Liu <davies....@gmail.com>

commit 23f8dfd45187cb8f2216328ab907ddb5fbdffd0b
Author: Patrick Wendell <pwend...@gmail.com>
Date: 2015-12-11T19:25:03Z

Preparing Spark release v1.6.0-rc2

commit 2e4523161ddf2417f2570bb75cc2d6694813adf5
Author: Patrick Wendell <pwend...@gmail.com>
Date: 2015-12-11T19:25:09Z

Preparing development version 1.6.0-SNAPSHOT

commit f05bae4a30c422f0d0b2ab1e41d32e9d483fa675
Author: Yanbo Liang <yblia...@gmail.com>
Date: 2015-12-11T19:47:35Z

[SPARK-12146][SPARKR] SparkR jsonFile should support multiple input files

* ```jsonFile``` should support multiple input files, such as:

```R
jsonFile(sqlContext, c("path1", "path2")) # character vector as arguments
jsonFile(sqlContext, "path1,path2")
```

* Meanwhile, ```jsonFile``` has been deprecated by Spark SQL and will be removed in Spark 2.0, so we mark ```jsonFile``` deprecated and use ```read.json``` on the SparkR side.
* Replace all ```jsonFile``` calls with ```read.json``` in test_sparkSQL.R, but still keep the jsonFile test case.
* If this PR is accepted, we should make almost the same change for ```parquetFile```.
cc felixcheung sun-rui shivaram

Author: Yanbo Liang <yblia...@gmail.com>

Closes #10145 from yanboliang/spark-12146.

(cherry picked from commit 0fb9825556dbbcc98d7eafe9ddea8676301e09bb)
Signed-off-by: Shivaram Venkataraman <shiva...@cs.berkeley.edu>

commit 2ddd10486b91619117b0c236c86e4e0f39869cfa
Author: anabranch <wac.chamb...@gmail.com>
Date: 2015-12-11T20:55:56Z

[SPARK-11964][DOCS][ML] Add in Pipeline Import/Export Documentation

Adding in Pipeline Import and Export Documentation.

Author: anabranch <wac.chamb...@gmail.com>
Author: Bill Chambers <wchamb...@ischool.berkeley.edu>

Closes #10179 from anabranch/master.

(cherry picked from commit aa305dcaf5b4148aba9e669e081d0b9235f50857)
Signed-off-by: Joseph K. Bradley <jos...@databricks.com>

commit bfcc8cfee7219e63d2f53fc36627f95dc60428eb
Author: Mike Dusenberry <mwdus...@us.ibm.com>
Date: 2015-12-11T22:21:33Z

[SPARK-11497][MLLIB][PYTHON] PySpark RowMatrix Constructor Has Type Erasure Issue

As noted in PR #9441, implementing `tallSkinnyQR` uncovered a bug with our PySpark `RowMatrix` constructor. As discussed on the dev list [here](http://apache-spark-developers-list.1001551.n3.nabble.com/K-Means-And-Class-Tags-td10038.html), there appears to be an issue with type erasure with RDDs coming from Java, and by extension from PySpark. Although we are attempting to construct a `RowMatrix` from an `RDD[Vector]` in [PythonMLlibAPI](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala#L1115), the `Vector` type is erased, resulting in an `RDD[Object]`. Thus, when calling Scala's `tallSkinnyQR` from PySpark, we get a Java `ClassCastException` in which an `Object` cannot be cast to a Spark `Vector`. As noted in the aforementioned dev list thread, this issue was also encountered with `DecisionTrees`, and the fix involved an explicit `retag` of the RDD with a `Vector` type.
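The erasure failure described above is not specific to Spark. The following hypothetical Java illustration (the class `ErasureDemo` is made up, not Spark source) shows the same pattern: because generics are erased at runtime, a value of the wrong type slips into a generic container unchecked, and the `ClassCastException` only surfaces later at the use site, just as an erased `RDD[Object]` fails when its elements are finally cast to `Vector`.

```java
// Hypothetical illustration of JVM type erasure, not Spark code:
// a raw view of a List<String> accepts an Integer without complaint;
// the ClassCastException appears only when the element is read back.
import java.util.ArrayList;
import java.util.List;

public class ErasureDemo {
    @SuppressWarnings({"unchecked", "rawtypes"})
    static String describe() {
        List<String> strings = new ArrayList<>();
        List raw = strings;           // erased, raw view of the same list
        raw.add(Integer.valueOf(42)); // compiles and succeeds at runtime

        try {
            String s = strings.get(0); // implicit checkcast to String fails
            return "read: " + s;
        } catch (ClassCastException e) {
            return "ClassCastException at the use site";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```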
`IndexedRowMatrix` and `CoordinateMatrix` do not appear to have this issue, likely because their related helper functions in `PythonMLlibAPI` create the RDDs explicitly from DataFrames with pattern matching, thus preserving the types. This PR currently contains that retagging fix applied to the `createRowMatrix` helper function in `PythonMLlibAPI`. This PR blocks #9441, so once this is merged, the other can be rebased.

cc holdenk

Author: Mike Dusenberry <mwdus...@us.ibm.com>

Closes #9458 from dusenberrymw/SPARK-11497_PySpark_RowMatrix_Constructor_Has_Type_Erasure_Issue.

(cherry picked from commit 1b8220387e6903564f765fabb54be0420c3e99d7)
Signed-off-by: Joseph K. Bradley <jos...@databricks.com>

commit 75531c77e85073c7be18985a54c623710894d861
Author: BenFradet <benjamin.fra...@gmail.com>
Date: 2015-12-11T23:43:00Z

[SPARK-12217][ML] Document invalid handling for StringIndexer

Added a paragraph regarding StringIndexer#setHandleInvalid to the ml-features documentation. I wonder if I should also add a snippet to the code example; input welcome.

Author: BenFradet <benjamin.fra...@gmail.com>

Closes #10257 from BenFradet/SPARK-12217.

(cherry picked from commit aea676ca2d07c72b1a752e9308c961118e5bfc3c)
Signed-off-by: Joseph K. Bradley <jos...@databricks.com>

commit c2f20469d5b53a027b022e3c4a9bea57452c5ba6
Author: Yanbo Liang <yblia...@gmail.com>
Date: 2015-12-12T02:02:24Z

[SPARK-11978][ML] Move dataset_example.py to examples/ml and rename to dataframe_example.py

Since ```Dataset``` has a new meaning in Spark 1.6, we should rename the example to avoid confusion. #9873 finished the work for the Scala example; here we focus on the Python one: move dataset_example.py to ```examples/ml``` and rename it to ```dataframe_example.py```. BTW, fix minor missing issues of #9873.

cc mengxr

Author: Yanbo Liang <yblia...@gmail.com>

Closes #9957 from yanboliang/SPARK-11978.

(cherry picked from commit a0ff6d16ef4bcc1b6ff7282e82a9b345d8449454)
Signed-off-by: Joseph K. Bradley <jos...@databricks.com>

commit 03d801587936fe92d4e7541711f1f41965e64956
Author: Ankur Dave <ankurd...@gmail.com>
Date: 2015-12-12T03:07:48Z

[SPARK-12298][SQL] Fix infinite loop in DataFrame.sortWithinPartitions

Modifies the String overload to call the Column overload and ensures this is called in a test.

Author: Ankur Dave <ankurd...@gmail.com>

Closes #10271 from ankurdave/SPARK-12298.

(cherry picked from commit 1e799d617a28cd0eaa8f22d103ea8248c4655ae5)
Signed-off-by: Yin Huai <yh...@databricks.com>

commit 47461fea7c079819de6add308f823c7a8294f891
Author: gatorsmile <gatorsm...@gmail.com>
Date: 2015-12-12T04:55:16Z

[SPARK-12158][SPARKR][SQL] Fix 'sample' functions that break R unit test cases

The existing sample functions miss the parameter `seed`, although the corresponding function interface in `generics` has such a parameter. Thus, even when a caller passes a `seed`, the value is never used, which can cause SparkR unit tests to fail. For example, I hit it in another PR: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47213/consoleFull

Author: gatorsmile <gatorsm...@gmail.com>

Closes #10160 from gatorsmile/sampleR.

(cherry picked from commit 1e3526c2d3de723225024fedd45753b556e18fc6)
Signed-off-by: Shivaram Venkataraman <shiva...@cs.berkeley.edu>

commit 2679fce717704bc6e64e726d1b754a6a48148770
Author: Jean-Baptiste Onofré <jbono...@apache.org>
Date: 2015-12-12T08:51:52Z

[SPARK-11193] Use Java ConcurrentHashMap instead of SynchronizedMap trait in order to avoid ClassCastException due to KryoSerializer in KinesisReceiver

Author: Jean-Baptiste Onofré <jbono...@apache.org>

Closes #10203 from jbonofre/SPARK-11193.
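A sketch of the SPARK-11193 substitution. This is illustrative only: the class `ReceiverState`, its field, and its methods are made up, not the actual KinesisReceiver code. The point is that `java.util.concurrent.ConcurrentHashMap` is a plain, concrete, serializer-friendly class, whereas Scala's `SynchronizedMap` mixin yields an anonymous subclass, which is what the Kryo `ClassCastException` traced back to.

```java
// Hypothetical sketch, not the actual KinesisReceiver code: track the
// highest sequence number seen per shard in a thread-safe, concrete map.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ReceiverState {
    private final Map<String, Long> seqNumbers = new ConcurrentHashMap<>();

    void record(String shardId, long seq) {
        // merge() is atomic on ConcurrentHashMap: keep the max seq per shard.
        seqNumbers.merge(shardId, seq, Math::max);
    }

    Long latest(String shardId) {
        return seqNumbers.get(shardId);
    }

    public static void main(String[] args) {
        ReceiverState s = new ReceiverState();
        s.record("shard-0", 5L);
        s.record("shard-0", 3L);
        System.out.println(s.latest("shard-0")); // 5
    }
}
```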
(cherry picked from commit 03138b67d3ef7f5278ea9f8b9c75f0e357ef79d8)
Signed-off-by: Sean Owen <so...@cloudera.com>

commit e05364baa34cae1d359ebcec1a0a61abf86d464d
Author: Xusen Yin <yinxu...@gmail.com>
Date: 2015-12-13T01:47:01Z

[SPARK-12199][DOC] Follow-up: Refine example code in ml-features.md

https://issues.apache.org/jira/browse/SPARK-12199

Follow-up PR of SPARK-11551. Fix some errors in ml-features.md.

mengxr

Author: Xusen Yin <yinxu...@gmail.com>

Closes #10193 from yinxusen/SPARK-12199.

(cherry picked from commit 98b212d36b34ab490c391ea2adf5b141e4fb9289)
Signed-off-by: Joseph K. Bradley <jos...@databricks.com>

commit d7e3bfd7d33b8fba44ef80932c0d40fb68075cb4
Author: Shixiong Zhu <shixi...@databricks.com>
Date: 2015-12-13T05:58:55Z

[SPARK-12267][CORE] Store the remote RpcEnv address to send the correct disconnection message

Author: Shixiong Zhu <shixi...@databricks.com>

Closes #10261 from zsxwing/SPARK-12267.

(cherry picked from commit 8af2f8c61ae4a59d129fb3530d0f6e9317f4bff8)
Signed-off-by: Shixiong Zhu <shixi...@databricks.com>

commit fbf16da2e53acc8678bd1454b0749d1923d4eddf
Author: Shixiong Zhu <shixi...@databricks.com>
Date: 2015-12-14T06:06:39Z

[SPARK-12281][CORE] Fix a race condition when reporting ExecutorState in the shutdown hook

1. Make sure workers and masters exit so that no worker or master will still be running when triggering the shutdown hook.
2. Set ExecutorState to FAILED if it's still RUNNING when executing the shutdown hook.
This should fix the potential exceptions when exiting a local cluster:

```
java.lang.AssertionError: assertion failed: executor 4 state transfer from RUNNING to RUNNING is illegal
    at scala.Predef$.assert(Predef.scala:179)
    at org.apache.spark.deploy.master.Master$$anonfun$receive$1.applyOrElse(Master.scala:260)
    at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
    at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
    at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
    at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

java.lang.IllegalStateException: Shutdown hooks cannot be modified during shutdown.
    at org.apache.spark.util.SparkShutdownHookManager.add(ShutdownHookManager.scala:246)
    at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:191)
    at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:180)
    at org.apache.spark.deploy.worker.ExecutorRunner.start(ExecutorRunner.scala:73)
    at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:474)
    at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
    at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
    at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
    at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
```

Author: Shixiong Zhu <shixi...@databricks.com>

Closes #10269 from zsxwing/executor-state.
(cherry picked from commit 2aecda284e22ec608992b6221e2f5ffbd51fcd24)
Signed-off-by: Shixiong Zhu <shixi...@databricks.com>

commit 94ce5025f894f01602732b543bc14901e169cc65
Author: yucai <yucai...@intel.com>
Date: 2015-12-14T07:08:21Z

[SPARK-12275][SQL] No plan for BroadcastHint in some condition

When SparkStrategies.BasicOperators's "case BroadcastHint(child) => apply(child)" is hit, it only recursively invokes BasicOperators.apply with this "child". This gives many strategies no chance to process the plan, which probably leads to a "No plan" issue, so we use planLater to go through all strategies.

https://issues.apache.org/jira/browse/SPARK-12275

Author: yucai <yucai...@intel.com>

Closes #10265 from yucai/broadcast_hint.

(cherry picked from commit ed87f6d3b48a85391628c29c43d318c26e2c6de7)
Signed-off-by: Yin Huai <yh...@databricks.com>

commit c0f0f6cb0fef6e939744b60fdd4911c718f8fac5
Author: BenFradet <benjamin.fra...@gmail.com>
Date: 2015-12-14T13:50:30Z

[MINOR][DOC] Fix broken word2vec link

Follow-up of [SPARK-12199](https://issues.apache.org/jira/browse/SPARK-12199) and #10193, where a broken link had been left as is.

Author: BenFradet <benjamin.fra...@gmail.com>

Closes #10282 from BenFradet/SPARK-12199.

(cherry picked from commit e25f1fe42747be71c6b6e6357ca214f9544e3a46)
Signed-off-by: Sean Owen <so...@cloudera.com>

commit 352a0c80f4833a97916a75388ef290067c2dbede
Author: Shivaram Venkataraman <shiva...@cs.berkeley.edu>
Date: 2015-12-15T00:13:55Z

[SPARK-12327] Disable commented code lintr temporarily

cc yhuai felixcheung shaneknapp

Author: Shivaram Venkataraman <shiva...@cs.berkeley.edu>

Closes #10300 from shivaram/comment-lintr-disable.
(cherry picked from commit fb3778de685881df66bf0222b520f94dca99e8c8)
Signed-off-by: Shivaram Venkataraman <shiva...@cs.berkeley.edu>

commit 23c8846050b307fdfe2307f7e7ca9d0f69f969a9
Author: jerryshao <ss...@hortonworks.com>
Date: 2015-12-15T17:41:40Z

[STREAMING][MINOR] Fix typo in function name of StateImpl

cc tdas zsxwing, please review. Thanks a lot.

Author: jerryshao <ss...@hortonworks.com>

Closes #10305 from jerryshao/fix-typo-state-impl.

(cherry picked from commit bc1ff9f4a41401599d3a87fb3c23a2078228a29b)
Signed-off-by: Shixiong Zhu <shixi...@databricks.com>

commit 80d261718c1157e5cd4b0ac27e36ef919ea65afa
Author: Michael Armbrust <mich...@databricks.com>
Date: 2015-12-15T23:03:33Z

Update branch-1.6 for 1.6.0 release

Author: Michael Armbrust <mich...@databricks.com>

Closes #10317 from marmbrus/versions.

commit 00a39d9c05c55b5ffcd4f49aadc91cedf227669a
Author: Patrick Wendell <pwend...@gmail.com>
Date: 2015-12-15T23:09:57Z

Preparing Spark release v1.6.0-rc3

commit 08aa3b47e6a295a8297e741effa14cd0d834aea8
Author: Patrick Wendell <pwend...@gmail.com>
Date: 2015-12-15T23:10:04Z

Preparing development version 1.6.0-SNAPSHOT

commit 9e4ac56452710ddd8efb695e69c8de49317e3f28
Author: tedyu <yuzhih...@gmail.com>
Date: 2015-12-16T02:15:10Z

[SPARK-12056][CORE] Part 2 Create a TaskAttemptContext only after calling setConf

This is a continuation of SPARK-12056 where the change is applied to SqlNewHadoopRDD.scala.

andrewor14 FYI

Author: tedyu <yuzhih...@gmail.com>

Closes #10164 from tedyu/master.

(cherry picked from commit f725b2ec1ab0d89e35b5e2d3ddeddb79fec85f6d)
Signed-off-by: Andrew Or <and...@databricks.com>

commit 2c324d35a698b353c2193e2f9bd8ba08c741c548
Author: Timothy Chen <tnac...@gmail.com>
Date: 2015-12-16T02:20:00Z

[SPARK-12351][MESOS] Add documentation about submitting Spark with mesos cluster mode.

Adding more documentation about submitting jobs with Mesos cluster mode.

Author: Timothy Chen <tnac...@gmail.com>

Closes #10086 from tnachen/mesos_supervise_docs.
(cherry picked from commit c2de99a7c3a52b0da96517c7056d2733ef45495f)
Signed-off-by: Andrew Or <and...@databricks.com>

commit 8e9a600313f3047139d3cebef85acc782903123b
Author: Naveen <naveenmin...@gmail.com>
Date: 2015-12-16T02:25:22Z

[SPARK-9886][CORE] Fix to use ShutdownHookManager in ExternalBlockStore.scala

Author: Naveen <naveenmin...@gmail.com>

Closes #10313 from naveenminchu/branch-fix-SPARK-9886.

(cherry picked from commit 8a215d2338c6286253e20122640592f9d69896c8)
Signed-off-by: Andrew Or <and...@databricks.com>

commit 93095eb29a1e59dbdbf6220bfa732b502330e6ae
Author: Bryan Cutler <bjcut...@us.ibm.com>
Date: 2015-12-16T02:28:16Z

[SPARK-12062][CORE] Change Master to async rebuild UI when application completes

This change builds the event history of completed apps asynchronously so the RPC thread will not be blocked, allowing new workers to register/remove even if the event log history is very large and takes a long time to rebuild.

Author: Bryan Cutler <bjcut...@us.ibm.com>

Closes #10284 from BryanCutler/async-MasterUI-SPARK-12062.

(cherry picked from commit c5b6b398d5e368626e589feede80355fb74c2bd8)
Signed-off-by: Andrew Or <and...@databricks.com>

commit fb08f7b784bc8b5e0cd110f315f72c7d9fc65e08
Author: Wenchen Fan <cloud0...@outlook.com>
Date: 2015-12-16T02:29:19Z

[SPARK-10477][SQL] using DSL in ColumnPruningSuite to improve readability

Author: Wenchen Fan <cloud0...@outlook.com>

Closes #8645 from cloud-fan/test.

(cherry picked from commit a89e8b6122ee5a1517fbcf405b1686619db56696)
Signed-off-by: Andrew Or <and...@databricks.com>

----

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org