[GitHub] spark pull request #20587: Branch 2.2
GitHub user zhuge134 opened a pull request: https://github.com/apache/spark/pull/20587 — Branch 2.2

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/spark branch-2.2

Alternatively, you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20587.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20587

commit 4e53a4edd72e372583f243c660bbcc0572205716
Author: Tathagata Das
Date: 2017-07-06T07:20:26Z

[SS][MINOR] Fix flaky test in DatastreamReaderWriterSuite. temp checkpoint dir should be deleted

## What changes were proposed in this pull request?

Stopping a query while it is being initialized can throw an interrupt exception, in which case temporary checkpoint directories will not be deleted and the test will fail.

Author: Tathagata Das
Closes #18442 from tdas/DatastreamReaderWriterSuite-fix.
(cherry picked from commit 60043f22458668ac7ecba94fa78953f23a6bdcec)
Signed-off-by: Tathagata Das

commit 576fd4c3a67b4affc5ac50979e27ae929472f0d9
Author: Tathagata Das
Date: 2017-07-07T00:28:20Z

[SPARK-21267][SS][DOCS] Update Structured Streaming Documentation

## What changes were proposed in this pull request?

A few changes to the Structured Streaming documentation:
- Clarify that the entire stream input table is not materialized.
- Add information for Ganglia.
- Add the Kafka sink to the main docs.
- Remove a couple of leftover experimental tags.
- Add more associated reading material and talk videos.

In addition, https://github.com/apache/spark/pull/16856 broke the link to the RDD programming guide in several places while renaming the page. This PR fixes those links. cc sameeragarwal cloud-fan
- Added a redirection to avoid breaking internal and possibly external links.
- Removed unnecessary redirection pages that had been there since the separate Scala, Java, and Python programming guides were merged together in 2013 or 2014.
## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests.)

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: Tathagata Das
Closes #18485 from tdas/SPARK-21267.
(cherry picked from commit 0217dfd26f89133f146197359b556c9bf5aca172)
Signed-off-by: Shixiong Zhu

commit ab12848d624f6b74d401e924255c0b4fcc535231
Author: Prashant Sharma
Date: 2017-07-08T06:33:12Z

[SPARK-21069][SS][DOCS] Add rate source to programming guide.

## What changes were proposed in this pull request?

SPARK-20979 added a new structured streaming source: the rate source. This patch adds the corresponding documentation to the programming guide.

## How was this patch tested?

Tested by running jekyll locally.

Author: Prashant Sharma
Closes #18562 from ScrapCodes/spark-21069/rate-source-docs.
(cherry picked from commit d0bfc6733521709e453d643582df2bdd68f28de7)
Signed-off-by: Shixiong Zhu

commit 7d0b1c927d92cc2a4932262514ffd12c47593b80
Author: Bogdan Raducanu
Date: 2017-07-08T12:14:59Z

[SPARK-21228][SQL][BRANCH-2.2] InSet incorrect handling of structs

## What changes were proposed in this pull request?

This is a backport of https://github.com/apache/spark/pull/18455. When the data type is a struct, InSet now uses TypeUtils.getInterpretedOrdering (similar to EqualTo) to build a TreeSet. In other cases it uses a HashSet as before (which should be faster). Similarly, In.eval uses Ordering.equiv instead of equals.

## How was this patch tested?

New test in SQLQuerySuite.

Author: Bogdan Raducanu
Closes #18563 from bogdanrdc/SPARK-21228-BRANCH2.2.

commit a64f10800244a8057f7f32c3d2f4a719c5080d05
Author: Dongjoon Hyun
Date: 2017-07-08T12:16:47Z

[SPARK-21345][SQL][TEST][TEST-MAVEN] SparkSessionBuilderSuite should clean up stopped sessions.

## What changes were proposed in this pull request?

`SparkSessionBuilderSuite` should clean up stopped sessions. Otherwise, it leaves behind stopped `SparkContext`s that interfere with other test suites using `SharedSQLContext`. Recently, the master branch has been failing consecutively: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/

## How was this patch tested?

Pass Jenkins with an updated suite.

Author: Dongjoon Hyun
Closes #18567 from dongjoon-hyun/SPARK-SESSION.
(cherry picked from commit 0b8dd2d08460f3e6eb578727d2c336b6f11959e7)
Signed-off-by: Wenchen Fan

commit c8d7855b905742033b7588ce7ee28bc23de13709
Author: Marcelo Vanzin
Date: 2017-07-08T16:24
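The SPARK-21228 change above contrasts an ordering-based TreeSet with a hash-based set for IN-list membership. A minimal, self-contained sketch of that distinction — the `Point` struct stand-in and the `buildOrderedSet`/`buildHashSet` helpers are illustrative names, not Spark's actual internals:

```scala
import scala.collection.immutable.TreeSet

// Stand-in for a struct-typed value. Spark's struct rows need an
// interpreted, field-by-field ordering rather than reference equality.
case class Point(x: Int, y: Int)

// For struct-like types, build a TreeSet with an explicit Ordering,
// mirroring InSet's use of TypeUtils.getInterpretedOrdering: membership
// is decided by ord.compare(a, b) == 0, like Ordering.equiv in In.eval.
def buildOrderedSet[T](values: Seq[T])(implicit ord: Ordering[T]): Set[T] =
  TreeSet(values: _*)

// For all other types, a HashSet remains the faster default.
def buildHashSet[T](values: Seq[T]): Set[T] = values.toSet

// Field-by-field ordering for the stand-in struct.
implicit val pointOrdering: Ordering[Point] = Ordering.by(p => (p.x, p.y))

val structSet = buildOrderedSet(Seq(Point(1, 2), Point(3, 4)))
println(structSet.contains(Point(1, 2))) // found via the ordering
```

The design point is that lookup cost moves from O(1) hashing to O(log n) comparisons, which is why the hash path is kept for types where plain equality is safe.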
[GitHub] spark pull request #20586: Branch 2.1
GitHub user zhuge134 opened a pull request: https://github.com/apache/spark/pull/20586 — Branch 2.1

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix.)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests.)

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/spark branch-2.1

Alternatively, you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20586.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20586

commit 21afc4534f90e063330ad31033aa178b37ef8340
Author: Marcelo Vanzin
Date: 2017-02-22T21:19:31Z

[SPARK-19652][UI] Do auth checks for REST API access (branch-2.1).

The REST API has a security filter that performs auth checks based on the UI root's security manager. That works fine when the UI root is the app's UI, but not when it's the history server. In the SHS case, all users would be allowed to see all applications through the REST API, even if the UI itself wouldn't be available to them. This change adds auth checks for each app access through the API too, so that only authorized users can see the app's data.

The change also modifies the existing security filter to use `HttpServletRequest.getRemoteUser()`, which is used in other places. That is not necessarily the same as the principal's name; for example, when using Hadoop's SPNEGO auth filter, the remote user has the realm information stripped, which then matches the user name registered as the owner of the application.

I also renamed the UIRootFromServletContext trait to a more generic name, since it now stores more context information.
Tested manually with an authentication filter enabled.

Author: Marcelo Vanzin
Closes #17019 from vanzin/SPARK-19652_2.1.

commit d30238f1b9096c9fd85527d95be639de9388fcc7
Author: actuaryzhang
Date: 2017-02-23T19:12:02Z

[SPARK-19682][SPARKR] Issue warning (or error) when subset method "[[" takes vector index

## What changes were proposed in this pull request?

The `[[` method is supposed to take a single index and return a column. This is different from base R, which takes a vector index. We should check for this and issue a warning or error when a vector index is supplied (which is very likely given the behavior in base R). Currently a warning message is issued and only the first element of the vector index is used. We could change this to an error if that's better.

## How was this patch tested?

New tests.

Author: actuaryzhang
Closes #17017 from actuaryzhang/sparkRSubsetter.
(cherry picked from commit 7bf09433f5c5e08154ba106be21fe24f17cd282b)
Signed-off-by: Felix Cheung

commit 43084b3cc3918b720fe28053d2037fa22a71264e
Author: Herman van Hovell
Date: 2017-02-23T22:58:02Z

[SPARK-19459][SQL][BRANCH-2.1] Support for nested char/varchar fields in ORC

## What changes were proposed in this pull request?

This is a backport of the two following commits: https://github.com/apache/spark/commit/78eae7e67fd5dec0c2d5b1853ce86cd0f1ae & https://github.com/apache/spark/commit/de8a03e68202647555e30fffba551f65bc77608d

This PR adds support for ORC tables with (nested) char/varchar fields.

## How was this patch tested?

Added a regression test to `OrcSourceSuite`.

Author: Herman van Hovell
Closes #17041 from hvanhovell/SPARK-19459-branch-2.1.

commit 66a7ca28a9de92e67ce24896a851a0c96c92aec6
Author: Takeshi Yamamuro
Date: 2017-02-24T09:54:00Z

[SPARK-19691][SQL][BRANCH-2.1] Fix ClassCastException when calculating percentile of decimal column

## What changes were proposed in this pull request?

This is a backport of the following commit: https://github.com/apache/spark/commit/93aa4271596a30752dc5234d869c3ae2f6e8e723

This PR fixes the ClassCastException below:

```
scala> spark.range(10).selectExpr("cast (id as decimal) as x").selectExpr("percentile(x, 0.5)").collect()
java.lang.ClassCastException: org.apache.spark.sql.types.Decimal cannot be cast to java.lang.Number
  at org.apache.spark.sql.catalyst.expressions.aggregate.Percentile.update(Percentile.scala:141)
  at org.apache.spark.sql.catalyst.expressions.aggregate.Percentile.update(Percentile.scala:58)
  at org.apache.sp
```
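The exception above arises because Spark's `org.apache.spark.sql.types.Decimal` does not extend `java.lang.Number`, so a blind `asInstanceOf[Number]` fails at runtime. A hedged sketch of that failure mode and the shape of the fix — `MyDecimal`, `toDoubleUnsafe`, and `toDoubleByType` are illustrative stand-ins, not Spark's actual Percentile code:

```scala
// Stand-in for a decimal wrapper that is NOT a java.lang.Number,
// like org.apache.spark.sql.types.Decimal.
final case class MyDecimal(underlying: BigDecimal)

// The buggy pattern: blindly cast any input value to Number.
// Throws ClassCastException for MyDecimal.
def toDoubleUnsafe(v: Any): Double =
  v.asInstanceOf[Number].doubleValue()

// The shape of the fix: convert based on what the value actually is
// (Spark's patch dispatches on the column's declared data type).
def toDoubleByType(v: Any): Double = v match {
  case d: MyDecimal => d.underlying.toDouble // decimal-aware path
  case n: Number    => n.doubleValue()       // plain numeric path
  case other        => sys.error(s"unsupported value: $other")
}

val ok = toDoubleByType(MyDecimal(BigDecimal("0.5"))) // succeeds
val failed =
  try { toDoubleUnsafe(MyDecimal(BigDecimal("0.5"))); false }
  catch { case _: ClassCastException => true }        // reproduces the CCE
```

The same pattern explains why the repro above only breaks for decimal columns: `cast (id as decimal)` routes the value through the non-`Number` wrapper.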