GitHub user somideshmukh opened a pull request:

    https://github.com/apache/spark/pull/9464

    Branch 1.5

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/somideshmukh/spark branch-1.5

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9464.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9464
    
----
commit b7c4ff144e783e80adc1efc2c28965c2e739dd5e
Author: Josh Rosen <joshro...@databricks.com>
Date:   2015-08-25T07:04:10Z

    [SPARK-9293] [SPARK-9813] Analysis should check that set operations are only performed on tables with equal numbers of columns
    
    This patch adds an analyzer rule to ensure that set operations (union, intersect, and except) are only applied to tables with the same number of columns. Without this rule, there are scenarios where invalid queries can return incorrect results instead of failing with error messages; SPARK-9813 provides one example of this problem. In other cases, the invalid query can crash at runtime with extremely confusing exceptions.
    
    I also performed a bit of cleanup to refactor some of those logical operators' code into a common `SetOperation` base class.
    
    Author: Josh Rosen <joshro...@databricks.com>
    
    Closes #7631 from JoshRosen/SPARK-9293.
    
    (cherry picked from commit 82268f07abfa658869df2354ae72f8d6ddd119e8)
    Signed-off-by: Michael Armbrust <mich...@databricks.com>
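
    A minimal sketch (in Scala) of the query shape this rule now rejects; it assumes a live SQLContext, and the DataFrames are made up for illustration:

    ```
    import org.apache.spark.sql.SQLContext

    val sqlContext: SQLContext = ???  // replace with an existing SQLContext
    val twoCols = sqlContext.createDataFrame(Seq((1, "a"), (2, "b"))).toDF("id", "name")
    val oneCol  = sqlContext.createDataFrame(Seq(Tuple1(3), Tuple1(4))).toDF("id")

    // With the new analyzer rule this fails at analysis time with an
    // AnalysisException instead of returning wrong results or crashing later.
    twoCols.unionAll(oneCol)
    ```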

commit 76d920f2b814304051dd76f0ca78301e872fc811
Author: Yu ISHIKAWA <yuu.ishik...@gmail.com>
Date:   2015-08-25T07:28:51Z

    [SPARK-10214] [SPARKR] [DOCS] Improve SparkR Column, DataFrame API docs
    
    cc: shivaram
    
    ## Summary
    
    - Add name tags to each method in DataFrame.R and column.R
    - Replace `rdname column` with `rdname {each_func}`, e.g. for the alias method: `rdname column` => `rdname alias`
    
    ## Generated PDF File
    
    https://drive.google.com/file/d/0B9biIZIU47lLNHN2aFpnQXlSeGs/view?usp=sharing
    
    ## JIRA
    [[SPARK-10214] Improve SparkR Column, DataFrame API docs - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-10214)
    
    Author: Yu ISHIKAWA <yuu.ishik...@gmail.com>
    
    Closes #8414 from yu-iskw/SPARK-10214.
    
    (cherry picked from commit d4549fe58fa0d781e0e891bceff893420cb1d598)
    Signed-off-by: Shivaram Venkataraman <shiva...@cs.berkeley.edu>

commit 4841ebb1861025067a1108c11f64bb144427a308
Author: Sean Owen <so...@cloudera.com>
Date:   2015-08-25T07:32:20Z

    [SPARK-6196] [BUILD] Remove MapR profiles in favor of hadoop-provided
    
    Follow up to https://github.com/apache/spark/pull/7047
    
    pwendell mentioned that MapR should use `hadoop-provided` now, and indeed the new build script no longer produces `mapr3`/`mapr4` artifacts. Hence the action seems to be to remove the profiles, which are no longer used.
    
    CC trystanleftwich
    
    Author: Sean Owen <so...@cloudera.com>
    
    Closes #8338 from srowen/SPARK-6196.
    
    (cherry picked from commit 57b960bf3706728513f9e089455a533f0244312e)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit 2032d66706d165079550f06bf695e0b08be7e143
Author: Tathagata Das <tathagata.das1...@gmail.com>
Date:   2015-08-25T07:35:51Z

    [SPARK-10210] [STREAMING] Filter out non-existent blocks before creating BlockRDD
    
    When the write ahead log is not enabled, a recovered streaming driver still tries to run jobs using pre-failure block ids, and fails because those blocks no longer exist in memory (and cannot be recovered, as the receiver WAL is not enabled).

    This occurs because the driver-side WAL of ReceivedBlockTracker recovers that past block information, and ReceiverInputDStream creates BlockRDDs even if those blocks do not exist.

    The solution in this PR is to filter out block ids that do not exist before creating the BlockRDD. In addition, it adds unit tests to verify other logic in ReceiverInputDStream.
    
    Author: Tathagata Das <tathagata.das1...@gmail.com>
    
    Closes #8405 from tdas/SPARK-10210.
    
    (cherry picked from commit 1fc37581a52530bac5d555dbf14927a5780c3b75)
    Signed-off-by: Tathagata Das <tathagata.das1...@gmail.com>
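
    A minimal sketch of the idea (hypothetical names; the actual change lives in ReceiverInputDStream): keep only the block ids whose blocks still exist before building the BlockRDD.

    ```
    // blockExists would be backed by a block-manager lookup in practice (assumption).
    def filterValidBlocks(blockIds: Seq[String], blockExists: String => Boolean): Seq[String] =
      blockIds.filter(blockExists)
    ```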

commit e5cea566a32d254adc9424a2f9e79b92eda3e6e4
Author: Davies Liu <dav...@databricks.com>
Date:   2015-08-25T08:00:44Z

    [SPARK-10177] [SQL] fix reading Timestamp in parquet from Hive
    
    We misunderstood the Julian day and nanoseconds-of-the-day fields that Hive/Impala write to Parquet (as TimestampType): the two fields overlap, so they cannot simply be added together directly.

    In order to avoid confusing rounding during the conversion, we use `2440588` as the Julian Day of the epoch of the Unix timestamp (which should be 2440587.5).
    
    Author: Davies Liu <dav...@databricks.com>
    Author: Cheng Lian <l...@databricks.com>
    
    Closes #8400 from davies/timestamp_parquet.
    
    (cherry picked from commit 2f493f7e3924b769160a16f73cccbebf21973b91)
    Signed-off-by: Cheng Lian <l...@databricks.com>
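
    A hedged sketch of the conversion described above (helper names are illustrative, not the exact Spark code): Parquet INT96 timestamps from Hive/Impala carry a Julian day number plus nanoseconds within that day, with 2440588 treated as the Julian day containing the Unix epoch.

    ```
    val JulianDayOfEpoch = 2440588L
    val MicrosPerDay     = 24L * 60 * 60 * 1000 * 1000

    def toMicrosSinceEpoch(julianDay: Int, nanosOfDay: Long): Long =
      (julianDay - JulianDayOfEpoch) * MicrosPerDay + nanosOfDay / 1000

    // Noon on 1970-01-01 under this convention: day 2440588 plus 12 hours of nanos.
    toMicrosSinceEpoch(2440588, 12L * 3600 * 1000000000L)  // == 43200000000L micros
    ```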

commit a0f22cf295a1d20814c5be6cc727e39e95a81c27
Author: Josh Rosen <joshro...@databricks.com>
Date:   2015-08-25T08:06:36Z

    [SPARK-10195] [SQL] Data sources Filter should not expose internal types
    
    Spark SQL's data sources API exposes Catalyst's internal types through its Filter interfaces. This is a problem because types like UTF8String are not stable developer APIs and should not be exposed to third parties.

    This issue caused incompatibilities when upgrading our `spark-redshift` library to work against Spark 1.5.0. To avoid these issues in the future we should only expose public types through these Filter objects. This patch accomplishes this by using CatalystTypeConverters to add the appropriate conversions.
    
    Author: Josh Rosen <joshro...@databricks.com>
    
    Closes #8403 from JoshRosen/datasources-internal-vs-external-types.
    
    (cherry picked from commit 7bc9a8c6249300ded31ea931c463d0a8f798e193)
    Signed-off-by: Reynold Xin <r...@databricks.com>
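
    A sketch of what this means for a third-party data source (the describe helper is made up): after the change, values inside org.apache.spark.sql.sources.Filter objects arrive as public types such as java.lang.String rather than Catalyst internals like UTF8String.

    ```
    import org.apache.spark.sql.sources.{EqualTo, Filter}

    def describe(f: Filter): String = f match {
      case EqualTo(attr, value: String) => s"$attr = '$value'"  // plain String, not UTF8String
      case other                        => other.toString
    }
    ```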

commit 73f1dd1b5acf1c6c37045da25902d7ca5ab795e4
Author: Yin Huai <yh...@databricks.com>
Date:   2015-08-25T08:19:34Z

    [SPARK-10197] [SQL] Add null check in wrapperFor (inside HiveInspectors).
    
    https://issues.apache.org/jira/browse/SPARK-10197
    
    Author: Yin Huai <yh...@databricks.com>
    
    Closes #8407 from yhuai/ORCSPARK-10197.
    
    (cherry picked from commit 0e6368ffaec1965d0c7f89420e04a974675c7f6e)
    Signed-off-by: Cheng Lian <l...@databricks.com>

commit 5d6840569761a42624f9852b942e33039d21f46a
Author: Zhang, Liye <liye.zh...@intel.com>
Date:   2015-08-25T10:48:55Z

    [DOC] add missing parameters in SparkContext.scala for scala doc
    
    Author: Zhang, Liye <liye.zh...@intel.com>
    
    Closes #8412 from liyezhang556520/minorDoc.
    
    (cherry picked from commit 5c14890159a5711072bf395f662b2433a389edf9)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit bdcc8e608d9a1160db988faa76808149c28a3b50
Author: ehnalis <zoltan.zv...@gmail.com>
Date:   2015-08-25T11:30:06Z

    Fixed a typo in DAGScheduler.
    
    Author: ehnalis <zoltan.zv...@gmail.com>
    
    Closes #8308 from ehnalis/master.
    
    (cherry picked from commit 7f1e507bf7e82bff323c5dec3c1ee044687c4173)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit 0402f1297c697bfbe8b5c7bfc170fcdc6b2c9de5
Author: Michael Armbrust <mich...@databricks.com>
Date:   2015-08-25T17:22:54Z

    [SPARK-10198] [SQL] Turn off partition verification by default
    
    Author: Michael Armbrust <mich...@databricks.com>
    
    Closes #8404 from marmbrus/turnOffPartitionVerification.
    
    (cherry picked from commit 5c08c86bfa43462fb2ca5f7c5980ddfb44dd57f8)
    Signed-off-by: Michael Armbrust <mich...@databricks.com>

commit 742c82ed97ed3fc60d4f17c4363c52062829ea49
Author: Yuhao Yang <hhb...@gmail.com>
Date:   2015-08-25T17:54:03Z

    [SPARK-8531] [ML] Update ML user guide for MinMaxScaler
    
    jira: https://issues.apache.org/jira/browse/SPARK-8531
    
    Update ML user guide for MinMaxScaler
    
    Author: Yuhao Yang <hhb...@gmail.com>
    Author: unknown <yuhao...@yuhaoyan-mobl1.ccr.corp.intel.com>
    
    Closes #7211 from hhbyyh/minmaxdoc.
    
    (cherry picked from commit b37f0cc1b4c064d6f09edb161250fa8b783de52a)
    Signed-off-by: Joseph K. Bradley <jos...@databricks.com>

commit c740f5dd20459b491a8c088383c19c11a76c225d
Author: Feynman Liang <fli...@databricks.com>
Date:   2015-08-25T18:58:47Z

    [SPARK-10230] [MLLIB] Rename optimizeAlpha to optimizeDocConcentration
    
    See [discussion](https://github.com/apache/spark/pull/8254#discussion_r37837770)
    
    CC jkbradley
    
    Author: Feynman Liang <fli...@databricks.com>
    
    Closes #8422 from feynmanliang/SPARK-10230.
    
    (cherry picked from commit 881208a8e849facf54166bdd69d3634407f952e7)
    Signed-off-by: Joseph K. Bradley <jos...@databricks.com>

commit 5a32ed75c939dc42886ea940aba2b14b89e9f40e
Author: Xiangrui Meng <m...@databricks.com>
Date:   2015-08-25T19:16:23Z

    [SPARK-10231] [MLLIB] update @Since annotation for mllib.classification
    
    Update `Since` annotation in `mllib.classification`:
    
    1. add version to classes, objects, constructors, and public variables declared in constructors
    2. correct some versions
    3. remove `Since` on `toString`
    
    MechCoder dbtsai
    
    Author: Xiangrui Meng <m...@databricks.com>
    
    Closes #8421 from mengxr/SPARK-10231 and squashes the following commits:
    
    b2dce80 [Xiangrui Meng] update @Since annotation for mllib.classification
    
    (cherry picked from commit 16a2be1a84c0a274a60c0a584faaf58b55d4942b)
    Signed-off-by: DB Tsai <d...@netflix.com>

commit 95e44b4df81b09803be2fde8c4e2566be0c8fdbc
Author: Feynman Liang <fli...@databricks.com>
Date:   2015-08-25T20:21:05Z

    [SPARK-9800] Adds docs for GradientDescent$.runMiniBatchSGD alias
    
    * Adds doc for the runMiniBatchSGD alias, documenting the default value for convergeTol
    * Cleans up a note in code
    
    Author: Feynman Liang <fli...@databricks.com>
    
    Closes #8425 from feynmanliang/SPARK-9800.
    
    (cherry picked from commit c0e9ff1588b4d9313cc6ec6e00e5c7663eb67910)
    Signed-off-by: Joseph K. Bradley <jos...@databricks.com>

commit 186326df21daf8d8271a522f2569eb5cd7be1442
Author: Xiangrui Meng <m...@databricks.com>
Date:   2015-08-25T20:22:38Z

    [SPARK-10237] [MLLIB] update since versions in mllib.fpm
    
    Same as #8421 but for `mllib.fpm`.
    
    cc feynmanliang
    
    Author: Xiangrui Meng <m...@databricks.com>
    
    Closes #8429 from mengxr/SPARK-10237.
    
    (cherry picked from commit c619c7552f22d28cfa321ce671fc9ca854dd655f)
    Signed-off-by: Xiangrui Meng <m...@databricks.com>

commit 055387c087989c8790b6761429b68416ecee3a33
Author: Feynman Liang <fli...@databricks.com>
Date:   2015-08-25T20:23:15Z

    [SPARK-9797] [MLLIB] [DOC] StreamingLinearRegressionWithSGD.setConvergenceTol default value

    Adds default convergence tolerance (0.001, set in `GradientDescent.convergenceTol`) to `setConvergenceTol`'s scaladoc
    
    Author: Feynman Liang <fli...@databricks.com>
    
    Closes #8424 from feynmanliang/SPARK-9797.
    
    (cherry picked from commit 9205907876cf65695e56c2a94bedd83df3675c03)
    Signed-off-by: Joseph K. Bradley <jos...@databricks.com>

commit 6f05b7aebd66a00e2556a29b35084e81ac526406
Author: Xiangrui Meng <m...@databricks.com>
Date:   2015-08-25T21:11:38Z

    [SPARK-10239] [SPARK-10244] [MLLIB] update since versions in mllib.pmml and mllib.util
    
    Same as #8421 but for `mllib.pmml` and `mllib.util`.
    
    cc dbtsai
    
    Author: Xiangrui Meng <m...@databricks.com>
    
    Closes #8430 from mengxr/SPARK-10239 and squashes the following commits:
    
    a189acf [Xiangrui Meng] update since versions in mllib.pmml and mllib.util
    
    (cherry picked from commit 00ae4be97f7b205432db2967ba6d506286ef2ca6)
    Signed-off-by: DB Tsai <d...@netflix.com>

commit 8925896b1eb0a13d723d38fb263d3bec0a01ec10
Author: Davies Liu <dav...@databricks.com>
Date:   2015-08-25T21:55:34Z

    [SPARK-10245] [SQL] Fix decimal literals with precision < scale
    
    In BigDecimal or java.math.BigDecimal, the precision can be smaller than the scale; for example, BigDecimal("0.001") has precision = 1 and scale = 3. But DecimalType requires that the precision be no smaller than the scale, so we should use the maximum of precision and scale when inferring the schema from a decimal literal.
    
    Author: Davies Liu <dav...@databricks.com>
    
    Closes #8428 from davies/smaller_decimal.
    
    (cherry picked from commit ec89bd840a6862751999d612f586a962cae63f6d)
    Signed-off-by: Yin Huai <yh...@databricks.com>
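
    A worked example of the inference rule described above (a sketch, not the Spark code itself):

    ```
    val d = BigDecimal("0.001")
    val precision = d.precision            // 1 (digits in the unscaled value)
    val scale     = d.scale                // 3
    // DecimalType needs precision >= scale, so take the maximum:
    val inferredPrecision = math.max(precision, scale)   // 3, i.e. DecimalType(3, 3)
    ```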

commit ab7d46d1d6e7e6705a3348a0cab2d05fe62951cf
Author: Davies Liu <dav...@databricks.com>
Date:   2015-08-25T22:19:41Z

    [SPARK-10215] [SQL] Fix precision of division (follow the rule in Hive)
    
    Follow the rule in Hive for decimal division. See https://github.com/apache/hive/blob/ac755ebe26361a4647d53db2a28500f71697b276/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPDivide.java#L113
    
    cc chenghao-intel
    
    Author: Davies Liu <dav...@databricks.com>
    
    Closes #8415 from davies/decimal_div2.
    
    (cherry picked from commit 7467b52ed07f174d93dfc4cb544dc4b69a2c2826)
    Signed-off-by: Yin Huai <yh...@databricks.com>
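
    A hedged sketch of the Hive-style result type for decimal division referenced above (read off the linked GenericUDFOPDivide; treat the exact formula as an assumption, and note the real code also caps precision at 38):

    ```
    // For Decimal(p1, s1) / Decimal(p2, s2):
    def divideResultType(p1: Int, s1: Int, p2: Int, s2: Int): (Int, Int) = {
      val intDigits = p1 - s1 + s2                // digits left of the decimal point
      val scale     = math.max(6, s1 + p2 + 1)    // at least 6 fractional digits
      (intDigits + scale, scale)                  // (precision, scale)
    }

    divideResultType(10, 2, 5, 3)   // (19, 8) under these assumptions
    ```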

commit 727771352855dbb780008c449a877f5aaa5fc27a
Author: Patrick Wendell <pwend...@gmail.com>
Date:   2015-08-25T22:56:37Z

    Preparing Spark release v1.5.0-rc2

commit 4c03cb4da846bf3ea4cd99f593d74c4a817a7d2d
Author: Patrick Wendell <pwend...@gmail.com>
Date:   2015-08-25T22:56:44Z

    Preparing development version 1.5.1-SNAPSHOT

commit 5cf266fdeb6632622642e5d9bc056a76680b1970
Author: Feynman Liang <fli...@databricks.com>
Date:   2015-08-26T00:39:20Z

    [SPARK-9888] [MLLIB] User guide for new LDA features
    
     * Adds two new sections to LDA's user guide; one for each optimizer/model
     * Documents new features added to LDA (e.g. topXXXperXXX, asymmetric priors, hyperparam optimization)
     * Cleans up a TODO and sets a default parameter in LDA code
    
    jkbradley hhbyyh
    
    Author: Feynman Liang <fli...@databricks.com>
    
    Closes #8254 from feynmanliang/SPARK-9888.
    
    (cherry picked from commit 125205cdb35530cdb4a8fff3e1ee49cf4a299583)
    Signed-off-by: Joseph K. Bradley <jos...@databricks.com>

commit af98e51f273d95e0fc19da1eca32a5f87a8c5576
Author: Xiangrui Meng <m...@databricks.com>
Date:   2015-08-26T01:17:54Z

    [SPARK-10233] [MLLIB] update since version in mllib.evaluation
    
    Same as #8421 but for `mllib.evaluation`.
    
    cc avulanov
    
    Author: Xiangrui Meng <m...@databricks.com>
    
    Closes #8423 from mengxr/SPARK-10233.
    
    (cherry picked from commit 8668ead2e7097b9591069599fbfccf67c53db659)
    Signed-off-by: Xiangrui Meng <m...@databricks.com>

commit 46750b912781433b6ce0845ac22805cde975361e
Author: Xiangrui Meng <m...@databricks.com>
Date:   2015-08-26T03:07:56Z

    [SPARK-10238] [MLLIB] update since versions in mllib.linalg
    
    Same as #8421 but for `mllib.linalg`.
    
    cc dbtsai
    
    Author: Xiangrui Meng <m...@databricks.com>
    
    Closes #8440 from mengxr/SPARK-10238 and squashes the following commits:
    
    b38437e [Xiangrui Meng] update since versions in mllib.linalg
    
    (cherry picked from commit ab431f8a970b85fba34ccb506c0f8815e55c63bf)
    Signed-off-by: DB Tsai <d...@netflix.com>

commit b7766699aef65586b0c3af96fb625efaa218d2b2
Author: Xiangrui Meng <m...@databricks.com>
Date:   2015-08-26T05:31:23Z

    [SPARK-10240] [SPARK-10242] [MLLIB] update since versions in mllib.random and mllib.stat
    
    The same as #8421 but for `mllib.stat` and `mllib.random`.
    
    cc feynmanliang
    
    Author: Xiangrui Meng <m...@databricks.com>
    
    Closes #8439 from mengxr/SPARK-10242.
    
    (cherry picked from commit c3a54843c0c8a14059da4e6716c1ad45c69bbe6c)
    Signed-off-by: Xiangrui Meng <m...@databricks.com>

commit be0c9915c0084a187933f338e51e606dc68e93af
Author: Xiangrui Meng <m...@databricks.com>
Date:   2015-08-26T05:33:48Z

    [SPARK-10234] [MLLIB] update since version in mllib.clustering
    
    Same as #8421 but for `mllib.clustering`.
    
    cc feynmanliang yu-iskw
    
    Author: Xiangrui Meng <m...@databricks.com>
    
    Closes #8435 from mengxr/SPARK-10234.
    
    (cherry picked from commit d703372f86d6a59383ba8569fcd9d379849cffbf)
    Signed-off-by: Xiangrui Meng <m...@databricks.com>

commit 6d8ebc801799714d297c83be6935b37e26dc2df7
Author: Xiangrui Meng <m...@databricks.com>
Date:   2015-08-26T05:35:49Z

    [SPARK-10243] [MLLIB] update since versions in mllib.tree
    
    Same as #8421 but for `mllib.tree`.
    
    cc jkbradley
    
    Author: Xiangrui Meng <m...@databricks.com>
    
    Closes #8442 from mengxr/SPARK-10236.
    
    (cherry picked from commit fb7e12fe2e14af8de4c206ca8096b2e8113bfddc)
    Signed-off-by: Xiangrui Meng <m...@databricks.com>

commit 08d390f457f80ffdc2dfce61ea579d9026047f12
Author: Xiangrui Meng <m...@databricks.com>
Date:   2015-08-26T05:49:33Z

    [SPARK-10235] [MLLIB] update since versions in mllib.regression
    
    Same as #8421 but for `mllib.regression`.
    
    cc freeman-lab dbtsai
    
    Author: Xiangrui Meng <m...@databricks.com>
    
    Closes #8426 from mengxr/SPARK-10235 and squashes the following commits:
    
    6cd28e4 [Xiangrui Meng] update since versions in mllib.regression
    
    (cherry picked from commit 4657fa1f37d41dd4c7240a960342b68c7c591f48)
    Signed-off-by: DB Tsai <d...@netflix.com>

commit 21a10a86d20ec1a6fea42286b4d2aae9ce7e848d
Author: Xiangrui Meng <m...@databricks.com>
Date:   2015-08-26T06:45:41Z

    [SPARK-10236] [MLLIB] update since versions in mllib.feature
    
    Same as #8421 but for `mllib.feature`.
    
    cc dbtsai
    
    Author: Xiangrui Meng <m...@databricks.com>
    
    Closes #8449 from mengxr/SPARK-10236.feature and squashes the following commits:
    
    0e8d658 [Xiangrui Meng] remove unnecessary comment
    ad70b03 [Xiangrui Meng] update since versions in mllib.feature
    
    (cherry picked from commit 321d7759691bed9867b1f0470f12eab2faa50aff)
    Signed-off-by: DB Tsai <d...@netflix.com>

commit 5220db9e352b5d5eae59cead9478ca0a9f73f16b
Author: felixcheung <felixcheun...@hotmail.com>
Date:   2015-08-26T06:48:16Z

    [SPARK-9316] [SPARKR] Add support for filtering using `[` (synonym for filter / select)
    
    Add support for
    ```
       df[df$name == "Smith", c(1,2)]
       df[df$age %in% c(19, 30), 1:2]
    ```
    
    shivaram
    
    Author: felixcheung <felixcheun...@hotmail.com>
    
    Closes #8394 from felixcheung/rsubset.
    
    (cherry picked from commit 75d4773aa50e24972c533e8b48697fde586429eb)
    Signed-off-by: Shivaram Venkataraman <shiva...@cs.berkeley.edu>

----

