GitHub user pradeepsavadi opened a pull request: https://github.com/apache/spark/pull/11668
Branch 1.6

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/spark branch-1.6

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11668.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #11668

----

commit 5d3722f8e5cdb4abd946ea18950225919af53a11
Author: jerryshao <ss...@hortonworks.com>
Date: 2015-12-10T23:31:46Z

[STREAMING][DOC][MINOR] Update the description of direct Kafka stream doc

With the merge of [SPARK-8337](https://issues.apache.org/jira/browse/SPARK-8337), the Python API now has the same functionality as Scala/Java, so this changes the description to make it more precise.

zsxwing tdas, please review, thanks a lot.

Author: jerryshao <ss...@hortonworks.com>

Closes #10246 from jerryshao/direct-kafka-doc-update.

(cherry picked from commit 24d3357d66e14388faf8709b368edca70ea96432)
Signed-off-by: Shixiong Zhu <shixi...@databricks.com>

commit d09af2cb4237cca9ac72aacb9abb822a2982a820
Author: Davies Liu <dav...@databricks.com>
Date: 2015-12-11T01:22:18Z

[SPARK-12258][SQL] passing null into ScalaUDF

Check nullability and pass null values through into ScalaUDF.

Closes #10249

Author: Davies Liu <dav...@databricks.com>

Closes #10259 from davies/udf_null.
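The bug class addressed by SPARK-12258 can be sketched outside Spark. The following is a hypothetical Java illustration (the names `NullIntoUdf`, `SQUARE`, and `applyNullSafe` are made up, not the actual ScalaUDF code): a user function written against primitives auto-unboxes its argument, so handing it a null wrapper throws a NullPointerException; the caller must check nullability first and propagate the null instead of invoking the function.

```java
// Hypothetical sketch, not Spark source: a "UDF" that expects a
// primitive blows up on null unless the caller guards the invocation.
import java.util.function.Function;

public class NullIntoUdf {
    // x * x unboxes x, so calling this directly with null would throw NPE.
    static final Function<Integer, Integer> SQUARE = x -> x * x;

    // Null-safe invocation: pass the null through rather than unboxing it.
    static Integer applyNullSafe(Function<Integer, Integer> f, Integer x) {
        return (x == null) ? null : f.apply(x);
    }

    public static void main(String[] args) {
        System.out.println(applyNullSafe(SQUARE, 3));    // 9
        System.out.println(applyNullSafe(SQUARE, null)); // null

        try {
            SQUARE.apply(null); // unguarded call: unboxing null fails
        } catch (NullPointerException e) {
            System.out.println("NPE without the null check");
        }
    }
}
```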
(cherry picked from commit b1b4ee7f3541d92c8bc2b0b4fdadf46cfdb09504)
Signed-off-by: Yin Huai <yh...@databricks.com>

commit 3e39925f9296bc126adf3f6828a0adf306900c0a
Author: Patrick Wendell <pwend...@gmail.com>
Date: 2015-12-11T02:45:36Z

Preparing Spark release v1.6.0-rc2

commit 250249e26466ff0d6ee6f8ae34f0225285c9bb9b
Author: Patrick Wendell <pwend...@gmail.com>
Date: 2015-12-11T02:45:42Z

Preparing development version 1.6.0-SNAPSHOT

commit eec36607f9fc92b6c4d306e3930fcf03961625eb
Author: Davies Liu <dav...@databricks.com>
Date: 2015-12-11T19:15:53Z

[SPARK-12258] [SQL] passing null into ScalaUDF (follow-up)

This is a follow-up PR for #10259

Author: Davies Liu <dav...@databricks.com>

Closes #10266 from davies/null_udf2.

(cherry picked from commit c119a34d1e9e599e302acfda92e5de681086a19f)
Signed-off-by: Davies Liu <davies....@gmail.com>

commit 23f8dfd45187cb8f2216328ab907ddb5fbdffd0b
Author: Patrick Wendell <pwend...@gmail.com>
Date: 2015-12-11T19:25:03Z

Preparing Spark release v1.6.0-rc2

commit 2e4523161ddf2417f2570bb75cc2d6694813adf5
Author: Patrick Wendell <pwend...@gmail.com>
Date: 2015-12-11T19:25:09Z

Preparing development version 1.6.0-SNAPSHOT

commit f05bae4a30c422f0d0b2ab1e41d32e9d483fa675
Author: Yanbo Liang <yblia...@gmail.com>
Date: 2015-12-11T19:47:35Z

[SPARK-12146][SPARKR] SparkR jsonFile should support multiple input files

* ```jsonFile``` should support multiple input files, such as:

```R
jsonFile(sqlContext, c("path1", "path2")) # character vector as arguments
jsonFile(sqlContext, "path1,path2")
```

* Meanwhile, ```jsonFile``` has been deprecated by Spark SQL and will be removed in Spark 2.0, so we mark ```jsonFile``` deprecated and use ```read.json``` on the SparkR side.
* Replace all ```jsonFile``` calls with ```read.json``` in test_sparkSQL.R, but still keep the jsonFile test case.
* If this PR is accepted, we should make almost the same change for ```parquetFile```.
cc felixcheung sun-rui shivaram

Author: Yanbo Liang <yblia...@gmail.com>

Closes #10145 from yanboliang/spark-12146.

(cherry picked from commit 0fb9825556dbbcc98d7eafe9ddea8676301e09bb)
Signed-off-by: Shivaram Venkataraman <shiva...@cs.berkeley.edu>

commit 2ddd10486b91619117b0c236c86e4e0f39869cfa
Author: anabranch <wac.chamb...@gmail.com>
Date: 2015-12-11T20:55:56Z

[SPARK-11964][DOCS][ML] Add in Pipeline Import/Export Documentation

Adding in Pipeline Import and Export Documentation.

Author: anabranch <wac.chamb...@gmail.com>
Author: Bill Chambers <wchamb...@ischool.berkeley.edu>

Closes #10179 from anabranch/master.

(cherry picked from commit aa305dcaf5b4148aba9e669e081d0b9235f50857)
Signed-off-by: Joseph K. Bradley <jos...@databricks.com>

commit bfcc8cfee7219e63d2f53fc36627f95dc60428eb
Author: Mike Dusenberry <mwdus...@us.ibm.com>
Date: 2015-12-11T22:21:33Z

[SPARK-11497][MLLIB][PYTHON] PySpark RowMatrix Constructor Has Type Erasure Issue

As noted in PR #9441, implementing `tallSkinnyQR` uncovered a bug with our PySpark `RowMatrix` constructor. As discussed on the dev list [here](http://apache-spark-developers-list.1001551.n3.nabble.com/K-Means-And-Class-Tags-td10038.html), there appears to be an issue with type erasure with RDDs coming from Java, and by extension from PySpark. Although we are attempting to construct a `RowMatrix` from an `RDD[Vector]` in [PythonMLlibAPI](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala#L1115), the `Vector` type is erased, resulting in an `RDD[Object]`. Thus, when calling Scala's `tallSkinnyQR` from PySpark, we get a Java `ClassCastException` in which an `Object` cannot be cast to a Spark `Vector`. As noted in the aforementioned dev list thread, this issue was also encountered with `DecisionTrees`, and the fix involved an explicit `retag` of the RDD with a `Vector` type.
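The erasure failure described above is not specific to Spark. The following hypothetical Java illustration (the class `ErasureDemo` is made up, not Spark source) shows the same pattern: because generics are erased at runtime, a value of the wrong type slips into a generic container unchecked, and the `ClassCastException` only surfaces later at the use site, just as an erased `RDD[Object]` fails when its elements are finally cast to `Vector`.

```java
// Hypothetical illustration of JVM type erasure, not Spark code:
// a raw view of a List<String> accepts an Integer without complaint;
// the ClassCastException appears only when the element is read back.
import java.util.ArrayList;
import java.util.List;

public class ErasureDemo {
    @SuppressWarnings({"unchecked", "rawtypes"})
    static String describe() {
        List<String> strings = new ArrayList<>();
        List raw = strings;           // erased, raw view of the same list
        raw.add(Integer.valueOf(42)); // compiles and succeeds at runtime

        try {
            String s = strings.get(0); // implicit checkcast to String fails
            return "read: " + s;
        } catch (ClassCastException e) {
            return "ClassCastException at the use site";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```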
`IndexedRowMatrix` and `CoordinateMatrix` do not appear to have this issue, likely because their related helper functions in `PythonMLlibAPI` create the RDDs explicitly from DataFrames with pattern matching, thus preserving the types. This PR currently contains that retagging fix applied to the `createRowMatrix` helper function in `PythonMLlibAPI`. This PR blocks #9441, so once this is merged, the other can be rebased.

cc holdenk

Author: Mike Dusenberry <mwdus...@us.ibm.com>

Closes #9458 from dusenberrymw/SPARK-11497_PySpark_RowMatrix_Constructor_Has_Type_Erasure_Issue.

(cherry picked from commit 1b8220387e6903564f765fabb54be0420c3e99d7)
Signed-off-by: Joseph K. Bradley <jos...@databricks.com>

commit 75531c77e85073c7be18985a54c623710894d861
Author: BenFradet <benjamin.fra...@gmail.com>
Date: 2015-12-11T23:43:00Z

[SPARK-12217][ML] Document invalid handling for StringIndexer

Added a paragraph regarding StringIndexer#setHandleInvalid to the ml-features documentation. I wonder if I should also add a snippet to the code example; input welcome.

Author: BenFradet <benjamin.fra...@gmail.com>

Closes #10257 from BenFradet/SPARK-12217.

(cherry picked from commit aea676ca2d07c72b1a752e9308c961118e5bfc3c)
Signed-off-by: Joseph K. Bradley <jos...@databricks.com>

commit c2f20469d5b53a027b022e3c4a9bea57452c5ba6
Author: Yanbo Liang <yblia...@gmail.com>
Date: 2015-12-12T02:02:24Z

[SPARK-11978][ML] Move dataset_example.py to examples/ml and rename to dataframe_example.py

Since ```Dataset``` has a new meaning in Spark 1.6, we should rename the example to avoid confusion. #9873 finished the work for the Scala example; here we focus on the Python one: move dataset_example.py to ```examples/ml``` and rename it to ```dataframe_example.py```. BTW, fix minor missing issues of #9873.

cc mengxr

Author: Yanbo Liang <yblia...@gmail.com>

Closes #9957 from yanboliang/SPARK-11978.

(cherry picked from commit a0ff6d16ef4bcc1b6ff7282e82a9b345d8449454)
Signed-off-by: Joseph K. Bradley <jos...@databricks.com>

commit 03d801587936fe92d4e7541711f1f41965e64956
Author: Ankur Dave <ankurd...@gmail.com>
Date: 2015-12-12T03:07:48Z

[SPARK-12298][SQL] Fix infinite loop in DataFrame.sortWithinPartitions

Modifies the String overload to call the Column overload and ensures this is called in a test.

Author: Ankur Dave <ankurd...@gmail.com>

Closes #10271 from ankurdave/SPARK-12298.

(cherry picked from commit 1e799d617a28cd0eaa8f22d103ea8248c4655ae5)
Signed-off-by: Yin Huai <yh...@databricks.com>

commit 47461fea7c079819de6add308f823c7a8294f891
Author: gatorsmile <gatorsm...@gmail.com>
Date: 2015-12-12T04:55:16Z

[SPARK-12158][SPARKR][SQL] Fix 'sample' functions that break R unit test cases

The existing sample functions miss the parameter `seed`, although the corresponding function interface in `generics` has such a parameter. Thus, even when a caller passes a `seed`, the value is never used, which can cause SparkR unit tests to fail. For example, I hit it in another PR: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47213/consoleFull

Author: gatorsmile <gatorsm...@gmail.com>

Closes #10160 from gatorsmile/sampleR.

(cherry picked from commit 1e3526c2d3de723225024fedd45753b556e18fc6)
Signed-off-by: Shivaram Venkataraman <shiva...@cs.berkeley.edu>

commit 2679fce717704bc6e64e726d1b754a6a48148770
Author: Jean-Baptiste Onofré <jbono...@apache.org>
Date: 2015-12-12T08:51:52Z

[SPARK-11193] Use Java ConcurrentHashMap instead of SynchronizedMap trait in order to avoid ClassCastException due to KryoSerializer in KinesisReceiver

Author: Jean-Baptiste Onofré <jbono...@apache.org>

Closes #10203 from jbonofre/SPARK-11193.
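A sketch of the SPARK-11193 substitution. This is illustrative only: the class `ReceiverState`, its field, and its methods are made up, not the actual KinesisReceiver code. The point is that `java.util.concurrent.ConcurrentHashMap` is a plain, concrete, serializer-friendly class, whereas Scala's `SynchronizedMap` mixin yields an anonymous subclass, which is what the Kryo `ClassCastException` traced back to.

```java
// Hypothetical sketch, not the actual KinesisReceiver code: track the
// highest sequence number seen per shard in a thread-safe, concrete map.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ReceiverState {
    private final Map<String, Long> seqNumbers = new ConcurrentHashMap<>();

    void record(String shardId, long seq) {
        // merge() is atomic on ConcurrentHashMap: keep the max seq per shard.
        seqNumbers.merge(shardId, seq, Math::max);
    }

    Long latest(String shardId) {
        return seqNumbers.get(shardId);
    }

    public static void main(String[] args) {
        ReceiverState s = new ReceiverState();
        s.record("shard-0", 5L);
        s.record("shard-0", 3L);
        System.out.println(s.latest("shard-0")); // 5
    }
}
```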
(cherry picked from commit 03138b67d3ef7f5278ea9f8b9c75f0e357ef79d8)
Signed-off-by: Sean Owen <so...@cloudera.com>

commit e05364baa34cae1d359ebcec1a0a61abf86d464d
Author: Xusen Yin <yinxu...@gmail.com>
Date: 2015-12-13T01:47:01Z

[SPARK-12199][DOC] Follow-up: Refine example code in ml-features.md

https://issues.apache.org/jira/browse/SPARK-12199

Follow-up PR of SPARK-11551. Fix some errors in ml-features.md.

mengxr

Author: Xusen Yin <yinxu...@gmail.com>

Closes #10193 from yinxusen/SPARK-12199.

(cherry picked from commit 98b212d36b34ab490c391ea2adf5b141e4fb9289)
Signed-off-by: Joseph K. Bradley <jos...@databricks.com>

commit d7e3bfd7d33b8fba44ef80932c0d40fb68075cb4
Author: Shixiong Zhu <shixi...@databricks.com>
Date: 2015-12-13T05:58:55Z

[SPARK-12267][CORE] Store the remote RpcEnv address to send the correct disconnection message

Author: Shixiong Zhu <shixi...@databricks.com>

Closes #10261 from zsxwing/SPARK-12267.

(cherry picked from commit 8af2f8c61ae4a59d129fb3530d0f6e9317f4bff8)
Signed-off-by: Shixiong Zhu <shixi...@databricks.com>

commit fbf16da2e53acc8678bd1454b0749d1923d4eddf
Author: Shixiong Zhu <shixi...@databricks.com>
Date: 2015-12-14T06:06:39Z

[SPARK-12281][CORE] Fix a race condition when reporting ExecutorState in the shutdown hook

1. Make sure workers and masters exit so that no worker or master will still be running when triggering the shutdown hook.
2. Set ExecutorState to FAILED if it's still RUNNING when executing the shutdown hook.
This should fix the potential exceptions when exiting a local cluster:

```
java.lang.AssertionError: assertion failed: executor 4 state transfer from RUNNING to RUNNING is illegal
    at scala.Predef$.assert(Predef.scala:179)
    at org.apache.spark.deploy.master.Master$$anonfun$receive$1.applyOrElse(Master.scala:260)
    at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
    at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
    at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
    at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

java.lang.IllegalStateException: Shutdown hooks cannot be modified during shutdown.
    at org.apache.spark.util.SparkShutdownHookManager.add(ShutdownHookManager.scala:246)
    at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:191)
    at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:180)
    at org.apache.spark.deploy.worker.ExecutorRunner.start(ExecutorRunner.scala:73)
    at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:474)
    at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
    at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
    at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
    at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
```

Author: Shixiong Zhu <shixi...@databricks.com>

Closes #10269 from zsxwing/executor-state.
(cherry picked from commit 2aecda284e22ec608992b6221e2f5ffbd51fcd24)
Signed-off-by: Shixiong Zhu <shixi...@databricks.com>

commit 94ce5025f894f01602732b543bc14901e169cc65
Author: yucai <yucai...@intel.com>
Date: 2015-12-14T07:08:21Z

[SPARK-12275][SQL] No plan for BroadcastHint in some condition

When SparkStrategies.BasicOperators's "case BroadcastHint(child) => apply(child)" is hit, it only recursively invokes BasicOperators.apply with this "child". This gives many strategies no chance to process the plan, which probably leads to a "No plan" issue, so we use planLater to go through all strategies.

https://issues.apache.org/jira/browse/SPARK-12275

Author: yucai <yucai...@intel.com>

Closes #10265 from yucai/broadcast_hint.

(cherry picked from commit ed87f6d3b48a85391628c29c43d318c26e2c6de7)
Signed-off-by: Yin Huai <yh...@databricks.com>

commit c0f0f6cb0fef6e939744b60fdd4911c718f8fac5
Author: BenFradet <benjamin.fra...@gmail.com>
Date: 2015-12-14T13:50:30Z

[MINOR][DOC] Fix broken word2vec link

Follow-up of [SPARK-12199](https://issues.apache.org/jira/browse/SPARK-12199) and #10193, where a broken link had been left as is.

Author: BenFradet <benjamin.fra...@gmail.com>

Closes #10282 from BenFradet/SPARK-12199.

(cherry picked from commit e25f1fe42747be71c6b6e6357ca214f9544e3a46)
Signed-off-by: Sean Owen <so...@cloudera.com>

commit 352a0c80f4833a97916a75388ef290067c2dbede
Author: Shivaram Venkataraman <shiva...@cs.berkeley.edu>
Date: 2015-12-15T00:13:55Z

[SPARK-12327] Disable commented code lintr temporarily

cc yhuai felixcheung shaneknapp

Author: Shivaram Venkataraman <shiva...@cs.berkeley.edu>

Closes #10300 from shivaram/comment-lintr-disable.
(cherry picked from commit fb3778de685881df66bf0222b520f94dca99e8c8)
Signed-off-by: Shivaram Venkataraman <shiva...@cs.berkeley.edu>

commit 23c8846050b307fdfe2307f7e7ca9d0f69f969a9
Author: jerryshao <ss...@hortonworks.com>
Date: 2015-12-15T17:41:40Z

[STREAMING][MINOR] Fix typo in function name of StateImpl

cc tdas zsxwing, please review. Thanks a lot.

Author: jerryshao <ss...@hortonworks.com>

Closes #10305 from jerryshao/fix-typo-state-impl.

(cherry picked from commit bc1ff9f4a41401599d3a87fb3c23a2078228a29b)
Signed-off-by: Shixiong Zhu <shixi...@databricks.com>

commit 80d261718c1157e5cd4b0ac27e36ef919ea65afa
Author: Michael Armbrust <mich...@databricks.com>
Date: 2015-12-15T23:03:33Z

Update branch-1.6 for 1.6.0 release

Author: Michael Armbrust <mich...@databricks.com>

Closes #10317 from marmbrus/versions.

commit 00a39d9c05c55b5ffcd4f49aadc91cedf227669a
Author: Patrick Wendell <pwend...@gmail.com>
Date: 2015-12-15T23:09:57Z

Preparing Spark release v1.6.0-rc3

commit 08aa3b47e6a295a8297e741effa14cd0d834aea8
Author: Patrick Wendell <pwend...@gmail.com>
Date: 2015-12-15T23:10:04Z

Preparing development version 1.6.0-SNAPSHOT

commit 9e4ac56452710ddd8efb695e69c8de49317e3f28
Author: tedyu <yuzhih...@gmail.com>
Date: 2015-12-16T02:15:10Z

[SPARK-12056][CORE] Part 2 Create a TaskAttemptContext only after calling setConf

This is a continuation of SPARK-12056 where the change is applied to SqlNewHadoopRDD.scala.

andrewor14 FYI

Author: tedyu <yuzhih...@gmail.com>

Closes #10164 from tedyu/master.

(cherry picked from commit f725b2ec1ab0d89e35b5e2d3ddeddb79fec85f6d)
Signed-off-by: Andrew Or <and...@databricks.com>

commit 2c324d35a698b353c2193e2f9bd8ba08c741c548
Author: Timothy Chen <tnac...@gmail.com>
Date: 2015-12-16T02:20:00Z

[SPARK-12351][MESOS] Add documentation about submitting Spark with mesos cluster mode.

Adding more documentation about submitting jobs with Mesos cluster mode.

Author: Timothy Chen <tnac...@gmail.com>

Closes #10086 from tnachen/mesos_supervise_docs.
(cherry picked from commit c2de99a7c3a52b0da96517c7056d2733ef45495f)
Signed-off-by: Andrew Or <and...@databricks.com>

commit 8e9a600313f3047139d3cebef85acc782903123b
Author: Naveen <naveenmin...@gmail.com>
Date: 2015-12-16T02:25:22Z

[SPARK-9886][CORE] Fix to use ShutdownHookManager in ExternalBlockStore.scala

Author: Naveen <naveenmin...@gmail.com>

Closes #10313 from naveenminchu/branch-fix-SPARK-9886.

(cherry picked from commit 8a215d2338c6286253e20122640592f9d69896c8)
Signed-off-by: Andrew Or <and...@databricks.com>

commit 93095eb29a1e59dbdbf6220bfa732b502330e6ae
Author: Bryan Cutler <bjcut...@us.ibm.com>
Date: 2015-12-16T02:28:16Z

[SPARK-12062][CORE] Change Master to async rebuild UI when application completes

This change builds the event history of completed apps asynchronously so the RPC thread will not be blocked, allowing new workers to register/remove even if the event log history is very large and takes a long time to rebuild.

Author: Bryan Cutler <bjcut...@us.ibm.com>

Closes #10284 from BryanCutler/async-MasterUI-SPARK-12062.

(cherry picked from commit c5b6b398d5e368626e589feede80355fb74c2bd8)
Signed-off-by: Andrew Or <and...@databricks.com>

commit fb08f7b784bc8b5e0cd110f315f72c7d9fc65e08
Author: Wenchen Fan <cloud0...@outlook.com>
Date: 2015-12-16T02:29:19Z

[SPARK-10477][SQL] using DSL in ColumnPruningSuite to improve readability

Author: Wenchen Fan <cloud0...@outlook.com>

Closes #8645 from cloud-fan/test.

(cherry picked from commit a89e8b6122ee5a1517fbcf405b1686619db56696)
Signed-off-by: Andrew Or <and...@databricks.com>

----

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org