GitHub user witgo reopened a pull request:

    https://github.com/apache/spark/pull/234

    Fix inconsistent org.scala-lang:* versions

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/witgo/spark SPARK-1325

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/234.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #234
    
----
commit 841721e03cc44ee7d8fe72c882db8c0f9f3af365
Author: Patrick Wendell <pwend...@gmail.com>
Date:   2014-03-31T19:07:14Z

    SPARK-1352: Improve robustness of spark-submit script
    
    1. Better error messages when required arguments are missing.
    2. Support for unit testing cases where presented arguments are invalid.
    3. Bug fix: only use environment variables when they are set (otherwise 
an NPE will result).
    4. A verbose mode to aid debugging.
    5. Visibility of several variables is set to private.
    6. Deprecation warning for existing scripts.
    
    Author: Patrick Wendell <pwend...@gmail.com>
    
    Closes #271 from pwendell/spark-submit and squashes the following commits:
    
    9146def [Patrick Wendell] SPARK-1352: Improve robustness of spark-submit 
script

commit 5731af5be65ccac831445f351baf040a0d007687
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-03-31T22:23:46Z

    [SQL] Rewrite join implementation to allow streaming of one relation.
    
    Previously we were materializing everything in memory. This also uses the 
projection interface, so it will be easier to plug in code gen (it's ported 
from that branch).
    
    @rxin @liancheng
    
    Author: Michael Armbrust <mich...@databricks.com>
    
    Closes #250 from marmbrus/hashJoin and squashes the following commits:
    
    1ad873e [Michael Armbrust] Change hasNext logic back to the correct version.
    8e6f2a2 [Michael Armbrust] Review comments.
    1e9fb63 [Michael Armbrust] style
    bc0cb84 [Michael Armbrust] Rewrite join implementation to allow streaming 
of one relation.

commit 33b3c2a8c6c71b89744834017a183ea855e1697c
Author: Patrick Wendell <pwend...@gmail.com>
Date:   2014-03-31T23:25:43Z

    SPARK-1365 [HOTFIX] Fix RateLimitedOutputStream test
    
    This test needs to be fixed. It currently depends on Thread.sleep() having 
exact-timing
    semantics, which is not a valid assumption.
    
    Author: Patrick Wendell <pwend...@gmail.com>
    
    Closes #277 from pwendell/rate-limited-stream and squashes the following 
commits:
    
    6c0ff81 [Patrick Wendell] SPARK-1365: Fix RateLimitedOutputStream test
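
    The timing pitfall this hotfix addresses can be sketched as follows (an
    illustrative Python rate limiter and test, not Spark's actual
    `RateLimitedOutputStream`): a robust test bounds the throughput measured
    against the wall clock, rather than assuming `sleep()` returns after
    exactly the requested interval.

    ```python
    import time

    class RateLimitedWriter:
        """Illustrative rate limiter (not Spark's actual class): blocks
        writes so that at most `bytes_per_sec` bytes pass per second."""
        def __init__(self, bytes_per_sec):
            self.bytes_per_sec = bytes_per_sec
            self.written = 0
            self.start = time.monotonic()

        def write(self, data):
            self.written += len(data)
            # Sleep until the observed rate falls back under the limit.
            target_elapsed = self.written / self.bytes_per_sec
            delay = target_elapsed - (time.monotonic() - self.start)
            if delay > 0:
                time.sleep(delay)

    writer = RateLimitedWriter(bytes_per_sec=100_000)
    start = time.monotonic()
    for _ in range(10):
        writer.write(b"x" * 1_000)   # 10 KB total at 100 KB/s -> ~0.1 s
    elapsed = time.monotonic() - start

    # Bound the *measured* rate; never assert that sleep() was exact.
    assert writer.written / elapsed <= 105_000
    ```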

commit 564f1c137caf07bd1f073ec6c93551dcad935ee5
Author: Sandy Ryza <sa...@cloudera.com>
Date:   2014-04-01T02:56:31Z

    SPARK-1376. In the yarn-cluster submitter, rename "args" option to "arg"
    
    Author: Sandy Ryza <sa...@cloudera.com>
    
    Closes #279 from sryza/sandy-spark-1376 and squashes the following commits:
    
    d8aebfa [Sandy Ryza] SPARK-1376. In the yarn-cluster submitter, rename 
"args" option to "arg"

commit 94fe7fd4fa9749cb13e540e4f9caf28de47eaf32
Author: Andrew Or <andrewo...@gmail.com>
Date:   2014-04-01T04:42:36Z

    [SPARK-1377] Upgrade Jetty to 8.1.14v20131031
    
    Previous version was 7.6.8v20121106. The only difference between Jetty 7 
and Jetty 8 is that the former uses Servlet API 2.5, while the latter uses 
Servlet API 3.0.
    
    Author: Andrew Or <andrewo...@gmail.com>
    
    Closes #280 from andrewor14/jetty-upgrade and squashes the following 
commits:
    
    dd57104 [Andrew Or] Merge github.com:apache/spark into jetty-upgrade
    e75fa85 [Andrew Or] Upgrade Jetty to 8.1.14v20131031

commit ada310a9d3d5419e101b24d9b41398f609da1ad3
Author: Andrew Or <andrewo...@gmail.com>
Date:   2014-04-01T06:01:14Z

    [Hot Fix #42] Persisted RDD disappears on storage page if re-used
    
    If a previously persisted RDD is re-used, its information disappears from 
the Storage page.
    
    This is because the tasks associated with re-using the RDD do not report 
the RDD's blocks as updated (which is correct). On stage submit, however, we 
overwrite any existing information regarding that RDD with a fresh one, whether 
or not the information for the RDD already exists.
    
    Author: Andrew Or <andrewo...@gmail.com>
    
    Closes #281 from andrewor14/ui-storage-fix and squashes the following 
commits:
    
    408585a [Andrew Or] Fix storage UI bug
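
    The essence of the fix, reduced to a toy Python analogue (the names are
    illustrative, not Spark's actual UI data structures), is to stop
    clobbering an existing entry on stage submit:

    ```python
    # Illustrative stand-in for the Storage page's RDD info map.
    rdd_info = {"rdd_5": {"name": "cached rdd", "cached_partitions": 4}}

    def on_stage_submit(rdd_id, fresh_info):
        # Buggy version: rdd_info[rdd_id] = fresh_info  -- overwrites the
        # block counts reported earlier, so the RDD "disappears".
        # Fixed version: only register information we don't already have.
        rdd_info.setdefault(rdd_id, fresh_info)

    on_stage_submit("rdd_5", {"name": "cached rdd", "cached_partitions": 0})
    print(rdd_info["rdd_5"]["cached_partitions"])  # 4: earlier info survives
    ```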

commit f5c418da044ef7f3d7185cc5bb1bef79d7f4e25c
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-04-01T21:45:44Z

    [SQL] SPARK-1372 Support for caching and uncaching tables in a SQLContext.
    
    This doesn't yet support different databases in Hive (though you can 
probably work around this by calling `USE <dbname>`). However, given the time 
constraints for 1.0, I think it's worth including this now and extending the 
functionality in the next release.
    
    Author: Michael Armbrust <mich...@databricks.com>
    
    Closes #282 from marmbrus/cacheTables and squashes the following commits:
    
    83785db [Michael Armbrust] Support for caching and uncaching tables in a 
SQLContext.

commit 764353d2c5162352781c273dd3d4af6a309190c7
Author: Mark Hamstra <markhams...@gmail.com>
Date:   2014-04-02T01:35:50Z

    [SPARK-1342] Scala 2.10.4
    
    Just a Scala version increment
    
    Author: Mark Hamstra <markhams...@gmail.com>
    
    Closes #259 from markhamstra/scala-2.10.4 and squashes the following 
commits:
    
    fbec547 [Mark Hamstra] [SPARK-1342] Bumped Scala version to 2.10.4

commit afb5ea62786e3ca055e247176def3e7ecf0d2c9d
Author: Diana Carroll <dcarr...@cloudera.com>
Date:   2014-04-02T02:29:26Z

    [Spark-1134] only call ipython if no arguments are given; remove 
IPYTHONOPTS from call
    
    see comments on Pull Request https://github.com/apache/spark/pull/38
    (I couldn't figure out how to modify an existing pull request, so I'm 
hoping I can withdraw that one and replace it with this one.)
    
    Author: Diana Carroll <dcarr...@cloudera.com>
    
    Closes #227 from dianacarroll/spark-1134 and squashes the following commits:
    
    ffe47f2 [Diana Carroll] [spark-1134] remove ipythonopts from ipython command
    b673bf7 [Diana Carroll] Merge branch 'master' of github.com:apache/spark
    0309cf9 [Diana Carroll] SPARK-1134 bug with ipython prevents 
non-interactive use with spark; only call ipython if no command line arguments 
were supplied

commit 45df9127365f8942794273b8ada004bf6ea3ef10
Author: Matei Zaharia <ma...@databricks.com>
Date:   2014-04-02T02:31:50Z

    Revert "[Spark-1134] only call ipython if no arguments are given; remove 
IPYTHONOPTS from call"
    
    This reverts commit afb5ea62786e3ca055e247176def3e7ecf0d2c9d.

commit 8b3045ceab591a3f3ca18823c7e2c5faca38a06e
Author: Manish Amde <manish...@gmail.com>
Date:   2014-04-02T04:40:49Z

    MLI-1 Decision Trees
    
    Joint work with @hirakendu, @etrain, @atalwalkar and @harsha2010.
    
    Key features:
    + Supports binary classification and regression
    + Supports gini, entropy and variance for information gain calculation
    + Supports both continuous and categorical features
    
    The algorithm has gone through several development iterations over the last 
few months leading to a highly optimized implementation. Optimizations include:
    
    1. Level-wise training to reduce passes over the entire dataset.
    2. Bin-wise split calculation to reduce computation overhead.
    3. Aggregation over partitions before combining to reduce communication 
overhead.
    
    Author: Manish Amde <manish...@gmail.com>
    Author: manishamde <manish...@gmail.com>
    Author: Xiangrui Meng <m...@databricks.com>
    
    Closes #79 from manishamde/tree and squashes the following commits:
    
    1e8c704 [Manish Amde] remove numBins field in the Strategy class
    7d54b4f [manishamde] Merge pull request #4 from mengxr/dtree
    f536ae9 [Xiangrui Meng] another pass on code style
    e1dd86f [Manish Amde] implementing code style suggestions
    62dc723 [Manish Amde] updating javadoc and converting helper methods to 
package private to allow unit testing
    201702f [Manish Amde] making some more methods private
    f963ef5 [Manish Amde] making methods private
    c487e6a [manishamde] Merge pull request #1 from mengxr/dtree
    24500c5 [Xiangrui Meng] minor style updates
    4576b64 [Manish Amde] documentation and for to while loop conversion
    ff363a7 [Manish Amde] binary search for bins and while loop for categorical 
feature bins
    632818f [Manish Amde] removing threshold for classification predict method
    2116360 [Manish Amde] removing dummy bin calculation for categorical 
variables
    6068356 [Manish Amde] ensuring num bins is always greater than max number 
of categories
    62c2562 [Manish Amde] fixing comment indentation
    ad1fc21 [Manish Amde] incorporated mengxr's code style suggestions
    d1ef4f6 [Manish Amde] more documentation
    794ff4d [Manish Amde] minor improvements to docs and style
    eb8fcbe [Manish Amde] minor code style updates
    cd2c2b4 [Manish Amde] fixing code style based on feedback
    63e786b [Manish Amde] added multiple train methods for java compatibility
    d3023b3 [Manish Amde] adding more docs for nested methods
    84f85d6 [Manish Amde] code documentation
    9372779 [Manish Amde] code style: max line length <= 100
    dd0c0d7 [Manish Amde] minor: some docs
    0dd7659 [manishamde] basic doc
    5841c28 [Manish Amde] unit tests for categorical features
    f067d68 [Manish Amde] minor cleanup
    c0e522b [Manish Amde] updated predict and split threshold logic
    b09dc98 [Manish Amde] minor refactoring
    6b7de78 [Manish Amde] minor refactoring and tests
    d504eb1 [Manish Amde] more tests for categorical features
    dbb7ac1 [Manish Amde] categorical feature support
    6df35b9 [Manish Amde] regression predict logic
    53108ed [Manish Amde] fixing index for highest bin
    e23c2e5 [Manish Amde] added regression support
    c8f6d60 [Manish Amde] adding enum for feature type
    b0e3e76 [Manish Amde] adding enum for feature type
    154aa77 [Manish Amde] enums for configurations
    733d6dd [Manish Amde] fixed tests
    02c595c [Manish Amde] added command line parsing
    98ec8d5 [Manish Amde] tree building and prediction logic
    b0eb866 [Manish Amde] added logic to handle leaf nodes
    80e8c66 [Manish Amde] working version of multi-level split calculation
    4798aae [Manish Amde] added gain stats class
    dad0afc [Manish Amde] decision stump functionality working
    03f534c [Manish Amde] some more tests
    0012a77 [Manish Amde] basic stump working
    8bca1e2 [Manish Amde] additional code for creating intermediate RDD
    92cedce [Manish Amde] basic building blocks for intermediate RDD 
calculation. untested.
    cd53eae [Manish Amde] skeletal framework
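
    The information-gain calculation the key-features list refers to can be
    sketched for the Gini case (plain Python, not MLlib's implementation):
    a split's gain is the parent impurity minus the size-weighted impurity
    of its children.

    ```python
    from collections import Counter

    def gini(labels):
        """Gini impurity: 1 - sum(p_k^2) over class frequencies p_k."""
        n = len(labels)
        return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

    def gini_gain(parent, left, right):
        """Information gain of a binary split: parent impurity minus the
        size-weighted impurity of the two children."""
        n = len(parent)
        return (gini(parent)
                - (len(left) / n) * gini(left)
                - (len(right) / n) * gini(right))

    parent = [0, 0, 0, 1, 1, 1]
    left, right = [0, 0, 0], [1, 1, 1]      # a perfect split
    print(gini(parent))                     # 0.5
    print(gini_gain(parent, left, right))   # 0.5: all impurity removed
    ```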

commit ea9de658a365dca2b7403d8fab68a8a87c4e06c8
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-04-02T06:54:38Z

    Remove * from test case golden filename.
    
    @rxin mentioned this might cause issues on windows machines.
    
    Author: Michael Armbrust <mich...@databricks.com>
    
    Closes #297 from marmbrus/noStars and squashes the following commits:
    
    263122a [Michael Armbrust] Remove * from test case golden filename.

commit 11973a7bdad58fdb759033c232d87f0b279c83b4
Author: Kay Ousterhout <kayousterh...@gmail.com>
Date:   2014-04-02T17:35:52Z

    Renamed stageIdToActiveJob to jobIdToActiveJob.
    
    This data structure was misused and, as a result, was later given an 
incorrect name.
    
    This data structure seems to have gotten into this tangled state as a 
result of @henrydavidge using the stageID instead of the job Id to index into 
it and later @andrewor14 renaming the data structure to reflect this 
misunderstanding.
    
    This patch renames it and removes an incorrect indexing into it.  The 
incorrect indexing into it meant that the code added by @henrydavidge to warn 
when a task size is too large (added here 
https://github.com/apache/spark/commit/57579934f0454f258615c10e69ac2adafc5b9835)
 was not always executed; this commit fixes that.
    
    Author: Kay Ousterhout <kayousterh...@gmail.com>
    
    Closes #301 from kayousterhout/fixCancellation and squashes the following 
commits:
    
    bd3d3a4 [Kay Ousterhout] Renamed stageIdToActiveJob to jobIdToActiveJob.

commit de8eefa804e229635eaa29a78b9e9ce161ac58e1
Author: Andrew Or <andrewo...@gmail.com>
Date:   2014-04-02T17:43:09Z

    [SPARK-1385] Use existing code for JSON de/serialization of BlockId
    
    `BlockId.scala` offers a way to reconstruct a BlockId from a string through 
regex matching. `util/JsonProtocol.scala` duplicates this functionality by 
explicitly matching on the BlockId type.
    With this PR, the de/serialization of BlockIds will go through the first 
(older) code path.
    
    (Most of the line changes in this PR involve changing `==` to `===` in 
`JsonProtocolSuite.scala`)
    
    Author: Andrew Or <andrewo...@gmail.com>
    
    Closes #289 from andrewor14/blockid-json and squashes the following commits:
    
    409d226 [Andrew Or] Simplify JSON de/serialization for BlockId
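
    The regex-based reconstruction the PR consolidates onto can be sketched
    in Python (the patterns below are modeled on Spark's block id string
    forms such as `rdd_<rddId>_<splitIndex>`; the authoritative formats live
    in `BlockId.scala`):

    ```python
    import re

    # Illustrative patterns; not an exhaustive list of BlockId types.
    _PATTERNS = {
        "rdd": re.compile(r"rdd_(\d+)_(\d+)"),
        "shuffle": re.compile(r"shuffle_(\d+)_(\d+)_(\d+)"),
    }

    def parse_block_id(s):
        """Reconstruct a structured block id from its string form, so
        callers (e.g. JSON deserialization) need no per-type match."""
        for kind, pat in _PATTERNS.items():
            m = pat.fullmatch(s)
            if m:
                return (kind,) + tuple(int(g) for g in m.groups())
        raise ValueError(f"unrecognized block id: {s}")

    print(parse_block_id("rdd_1_2"))        # ('rdd', 1, 2)
    print(parse_block_id("shuffle_0_4_7"))  # ('shuffle', 0, 4, 7)
    ```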

commit 78236334e4ca7518b6d7d9b38464dbbda854a777
Author: Daniel Darabos <darabos.dan...@gmail.com>
Date:   2014-04-02T19:27:37Z

    Do not re-use objects in the EdgePartition/EdgeTriplet iterators.
    
    This avoids a silent data corruption issue 
(https://spark-project.atlassian.net/browse/SPARK-1188) and has no performance 
impact by my measurements. It also simplifies the code. As far as I can tell 
the object re-use was nothing but premature optimization.
    
    I did actual benchmarks for all the included changes, and there is no 
performance difference. I am not sure where to put the benchmarks. Does Spark 
not have a benchmark suite?
    
    This is an example benchmark I did:
    
    test("benchmark") {
      val builder = new EdgePartitionBuilder[Int]
      for (i <- (1 to 10000000)) {
        builder.add(i.toLong, i.toLong, i)
      }
      val p = builder.toEdgePartition
      p.map(_.attr + 1).iterator.toList
    }
    
    It ran for 10 seconds both before and after this change.
    
    Author: Daniel Darabos <darabos.dan...@gmail.com>
    
    Closes #276 from darabos/spark-1188 and squashes the following commits:
    
    574302b [Daniel Darabos] Restore "manual" copying in 
EdgePartition.map(Iterator). Add comment to discourage novices like myself from 
trying to simplify the code.
    4117a64 [Daniel Darabos] Revert EdgePartitionSuite.
    4955697 [Daniel Darabos] Create a copy of the Edge objects in 
EdgeRDD.compute(). This avoids exposing the object re-use, while still enables 
the more efficient behavior for internal code.
    4ec77f8 [Daniel Darabos] Add comments about object re-use to the affected 
functions.
    2da5e87 [Daniel Darabos] Restore object re-use in EdgePartition.
    0182f2b [Daniel Darabos] Do not re-use objects in the 
EdgePartition/EdgeTriplet iterators. This avoids a silent data corruption issue 
(SPARK-1188) and has no performance impact in my measurements. It also 
simplifies the code.
    c55f52f [Daniel Darabos] Tests that reproduce the problems from SPARK-1188.
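
    The silent-corruption hazard of iterator object re-use is easy to
    reproduce in any language; a Python analogue (illustrative, not GraphX's
    actual Edge classes):

    ```python
    def edges_reusing(n):
        """Anti-pattern analogous to the old EdgePartition iterator:
        yield the SAME mutable object each time, mutated in place."""
        edge = {"src": 0, "attr": 0}
        for i in range(n):
            edge["src"] = i
            edge["attr"] = i * 10
            yield edge  # every caller sees one shared object

    def edges_copying(n):
        """Safe variant: yield a fresh object per element."""
        for i in range(n):
            yield {"src": i, "attr": i * 10}

    # Materializing the re-using iterator silently corrupts the data:
    # every element aliases the final state.
    print(list(edges_reusing(3)))
    # [{'src': 2, 'attr': 20}, {'src': 2, 'attr': 20}, {'src': 2, 'attr': 20}]
    print(list(edges_copying(3)))
    # [{'src': 0, 'attr': 0}, {'src': 1, 'attr': 10}, {'src': 2, 'attr': 20}]
    ```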

commit 1faa57971192226837bea32eb29eae5bfb425a7e
Author: Cheng Lian <lian.cs....@gmail.com>
Date:   2014-04-02T19:47:22Z

    [SPARK-1371][WIP] Compression support for Spark SQL in-memory columnar 
storage
    
    JIRA issue: [SPARK-1373](https://issues.apache.org/jira/browse/SPARK-1373)
    
    (Although tagged as WIP, this PR is structurally complete. The only things 
left unimplemented are 3 more compression algorithms: `BooleanBitSet`, 
`IntDelta` and `LongDelta`, which are trivial to add later in this or another 
separate PR.)
    
    This PR contains compression support for Spark SQL in-memory columnar 
storage. Main interfaces include:
    
    *   `CompressionScheme`
    
        Each `CompressionScheme` represents a concrete compression algorithm, 
which basically consists of an `Encoder` for compression and a `Decoder` for 
decompression. Algorithms implemented include:
    
        * `RunLengthEncoding`
        * `DictionaryEncoding`
    
        Algorithms to be implemented include:
    
        * `BooleanBitSet`
        * `IntDelta`
        * `LongDelta`
    
    *   `CompressibleColumnBuilder`
    
        A stackable `ColumnBuilder` trait used to build byte buffers for 
compressible columns. The `CompressionScheme` that exhibits the lowest 
compression ratio is chosen for each column, according to statistical 
information gathered while elements are appended into the `ColumnBuilder`. 
However, if no `CompressionScheme` can achieve a compression ratio better 
than 80%, the column is left uncompressed to save CPU time.
    
        Memory layout of the final byte buffer is shown below:
    
        ```
         .--------------------------- Column type ID (4 bytes)
         |   .----------------------- Null count N (4 bytes)
         |   |   .------------------- Null positions (4 x N bytes, empty if 
null count is zero)
         |   |   |     .------------- Compression scheme ID (4 bytes)
         |   |   |     |   .--------- Compressed non-null elements
         V   V   V     V   V
        +---+---+-----+---+---------+
        |   |   | ... |   | ... ... |
        +---+---+-----+---+---------+
         \-----------/ \-----------/
            header         body
        ```
    
    *   `CompressibleColumnAccessor`
    
        A stackable `ColumnAccessor` trait used to iterate over a (possibly) 
compressed data column.
    
    *   `ColumnStats`
    
        Used to collect statistical information while loading data into 
in-memory columnar table. Optimizations like partition pruning rely on this 
information.
    
        Strictly speaking, `ColumnStats` related code is not part of the 
compression support. It's contained in this PR to ensure and validate the 
row-based API design (which is used to avoid boxing/unboxing cost whenever 
possible).
    
    A major refactoring change since PR #205 is:
    
    * Refactored all getter/setter methods for primitive types in various 
places into `ColumnType` classes to remove duplicated code.
    
    Author: Cheng Lian <lian.cs....@gmail.com>
    
    Closes #285 from liancheng/memColumnarCompression and squashes the 
following commits:
    
    ed71bbd [Cheng Lian] Addressed all PR comments by @marmbrus
    d3a4fa9 [Cheng Lian] Removed Ordering[T] in ColumnStats for better 
performance
    5034453 [Cheng Lian] Bug fix, more tests, and more refactoring
    c298b76 [Cheng Lian] Test suites refactored
    2780d6a [Cheng Lian] [WIP] in-memory columnar compression support
    211331c [Cheng Lian] WIP: in-memory columnar compression support
    85cc59b [Cheng Lian] Refactored ColumnAccessors & ColumnBuilders to remove 
duplicate code
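
    As a sketch of the simplest scheme listed above, `RunLengthEncoding`
    collapses runs of equal values into (value, run-length) pairs (a toy
    Python version, not the actual columnar Encoder/Decoder):

    ```python
    def rle_encode(values):
        """Run-length encode: collapse runs of equal values into
        (value, run_length) pairs."""
        runs = []
        for v in values:
            if runs and runs[-1][0] == v:
                runs[-1][1] += 1        # extend the current run
            else:
                runs.append([v, 1])     # start a new run
        return [tuple(r) for r in runs]

    def rle_decode(pairs):
        """Inverse of rle_encode: expand each run back to its values."""
        return [v for v, n in pairs for _ in range(n)]

    col = [1, 1, 1, 2, 2, 3]
    enc = rle_encode(col)
    print(enc)                    # [(1, 3), (2, 2), (3, 1)]
    assert rle_decode(enc) == col
    ```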

commit ed730c95026d322f4b24d3d9fe92050ffa74cf4a
Author: Reynold Xin <r...@apache.org>
Date:   2014-04-02T19:48:04Z

    StopAfter / TopK related changes
    
    1. Renamed StopAfter to Limit to be more consistent with naming in other 
relational databases.
    2. Renamed TopK to TakeOrdered to be more consistent with Spark RDD API.
    3. Avoid breaking lineage in Limit.
    4. Added a bunch of override's to execution/basicOperators.scala.
    
    @marmbrus @liancheng
    
    Author: Reynold Xin <r...@apache.org>
    Author: Michael Armbrust <mich...@databricks.com>
    
    Closes #233 from rxin/limit and squashes the following commits:
    
    13eb12a [Reynold Xin] Merge pull request #1 from marmbrus/limit
    92b9727 [Michael Armbrust] More hacks to make Maps serialize with Kryo.
    4fc8b4e [Reynold Xin] Merge branch 'master' of github.com:apache/spark into 
limit
    87b7d37 [Reynold Xin] Use the proper serializer in limit.
    9b79246 [Reynold Xin] Updated doc for Limit.
    47d3327 [Reynold Xin] Copy tuples in Limit before shuffle.
    231af3a [Reynold Xin] Limit/TakeOrdered: 1. Renamed StopAfter to Limit to 
be more consistent with naming in other relational databases. 2. Renamed TopK 
to TakeOrdered to be more consistent with Spark RDD API. 3. Avoid breaking 
lineage in Limit. 4. Added a bunch of override's to 
execution/basicOperators.scala.

commit 9c65fa76f9d413e311a80f29d35d3ff7722e9476
Author: Xiangrui Meng <m...@databricks.com>
Date:   2014-04-02T21:01:12Z

    [SPARK-1212, Part II] Support sparse data in MLlib
    
    In PR https://github.com/apache/spark/pull/117, we added dense/sparse 
vector data model and updated KMeans to support sparse input. This PR is to 
replace all other `Array[Double]` usage by `Vector` in generalized linear 
models (GLMs) and Naive Bayes. Major changes:
    
    1. `LabeledPoint` becomes `LabeledPoint(Double, Vector)`.
    2. Methods that accept `RDD[Array[Double]]` now accept `RDD[Vector]`. We 
cannot support both in an elegant way because of type erasure.
    3. Mark 'createModel' and 'predictPoint' protected because they are not for 
end users.
    4. Add libSVMFile to MLContext.
    5. NaiveBayes can accept arbitrary labels (introducing a breaking change to 
Python's `NaiveBayesModel`).
    6. Gradient computation no longer creates temp vectors.
    7. Column normalization and centering are removed from Lasso and Ridge 
because the operation will densify the data. Simple feature transformation can 
be done before training.
    
    TODO:
    1. ~~Use axpy when possible.~~
    2. ~~Optimize Naive Bayes.~~
    
    Author: Xiangrui Meng <m...@databricks.com>
    
    Closes #245 from mengxr/vector and squashes the following commits:
    
    eb6e793 [Xiangrui Meng] move libSVMFile to MLUtils and rename to 
loadLibSVMData
    c26c4fc [Xiangrui Meng] update DecisionTree to use RDD[Vector]
    11999c7 [Xiangrui Meng] Merge branch 'master' into vector
    f7da54b [Xiangrui Meng] add minSplits to libSVMFile
    da25e24 [Xiangrui Meng] revert the change to default addIntercept because 
it might change the behavior of existing code without warning
    493f26f [Xiangrui Meng] Merge branch 'master' into vector
    7c1bc01 [Xiangrui Meng] add a TODO to NB
    b9b7ef7 [Xiangrui Meng] change default value of addIntercept to false
    b01df54 [Xiangrui Meng] allow to change or clear threshold in LR and SVM
    4addc50 [Xiangrui Meng] merge master
    4ca5b1b [Xiangrui Meng] remove normalization from Lasso and update tests
    f04fe8a [Xiangrui Meng] remove normalization from RidgeRegression and 
update tests
    d088552 [Xiangrui Meng] use static constructor for MLContext
    6f59eed [Xiangrui Meng] update libSVMFile to determine number of features 
automatically
    3432e84 [Xiangrui Meng] update NaiveBayes to support sparse data
    0f8759b [Xiangrui Meng] minor updates to NB
    b11659c [Xiangrui Meng] style update
    78c4671 [Xiangrui Meng] add libSVMFile to MLContext
    f0fe616 [Xiangrui Meng] add a test for sparse linear regression
    44733e1 [Xiangrui Meng] use in-place gradient computation
    e981396 [Xiangrui Meng] use axpy in Updater
    db808a1 [Xiangrui Meng] update JavaLR example
    befa592 [Xiangrui Meng] passed scala/java tests
    75c83a4 [Xiangrui Meng] passed test compile
    1859701 [Xiangrui Meng] passed compile
    834ada2 [Xiangrui Meng] optimized MLUtils.computeStats update some ml 
algorithms to use Vector (cont.)
    135ab72 [Xiangrui Meng] merge glm
    0e57aa4 [Xiangrui Meng] update Lasso and RidgeRegression to parse the 
weights correctly from GLM mark createModel protected mark predictPoint 
protected
    d7f629f [Xiangrui Meng] fix a bug in GLM when intercept is not used
    3f346ba [Xiangrui Meng] update some ml algorithms to use Vector
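
    The point of replacing `Array[Double]` with `Vector` can be sketched
    with a minimal sparse vector in Python (illustrative only, not MLlib's
    `SparseVector`): operations cost O(nnz) instead of O(size).

    ```python
    class SparseVector:
        """Minimal sparse vector: logical size plus parallel arrays of
        non-zero indices and values (indices assumed sorted, in range)."""
        def __init__(self, size, indices, values):
            self.size = size
            self.indices = list(indices)
            self.values = list(values)

        def dot(self, dense):
            # Only non-zero entries contribute, so the cost is O(nnz),
            # not O(size) -- the benefit of sparse support in GLMs.
            return sum(v * dense[i]
                       for i, v in zip(self.indices, self.values))

    # A LabeledPoint then pairs a label with a Vector instead of an array:
    label, features = 1.0, SparseVector(5, [0, 3], [2.0, 4.0])
    print(features.dot([1.0, 1.0, 1.0, 1.0, 1.0]))  # 6.0
    ```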

commit 47ebea5468df2e4f94ef493c5403fcdcda8c5eb2
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-04-03T01:14:31Z

    [SQL] SPARK-1364 Improve datatype and test coverage for ScalaReflection 
schema inference.
    
    Author: Michael Armbrust <mich...@databricks.com>
    
    Closes #293 from marmbrus/reflectTypes and squashes the following commits:
    
    f54e8e8 [Michael Armbrust] Improve datatype and test coverage for 
ScalaReflection schema inference.

commit 92a86b285f8a4af1bdf577dd4c4ea0fd5ca8d682
Author: Mark Hamstra <markhams...@gmail.com>
Date:   2014-04-03T21:08:47Z

    [SPARK-1398] Removed findbugs jsr305 dependency
    
    Should be a painless upgrade, and does offer some significant advantages 
should we want to leverage FindBugs more during the 1.0 lifecycle. 
http://findbugs.sourceforge.net/findbugs2.html
    
    Author: Mark Hamstra <markhams...@gmail.com>
    
    Closes #307 from markhamstra/findbugs and squashes the following commits:
    
    99f2d09 [Mark Hamstra] Removed unnecessary findbugs jsr305 dependency

commit fbebaedf26286ee8a75065822a3af1148351f828
Author: Andre Schumacher <andre.schumac...@iki.fi>
Date:   2014-04-03T22:31:47Z

    Spark parquet improvements
    
    A few improvements to the Parquet support for SQL queries:
    - Instead of files a ParquetRelation is now backed by a directory, which 
simplifies importing data from other
      sources
    - InsertIntoParquetTable operation now supports switching between 
overwriting or appending (at least in
      HiveQL)
    - tests now use the new API
    - Parquet logging can be set to WARNING level (Default)
    - Default compression for Parquet files (GZIP, as in parquet-mr)
    
    Author: Andre Schumacher <andre.schumac...@iki.fi>
    
    Closes #195 from AndreSchumacher/spark_parquet_improvements and squashes 
the following commits:
    
    54df314 [Andre Schumacher] SPARK-1383 [SQL] Improvements to ParquetRelation

commit 5d1feda217d25616d190f9bb369664e57417cd45
Author: Cheng Hao <hao.ch...@intel.com>
Date:   2014-04-03T22:33:17Z

    [SPARK-1360] Add Timestamp Support for SQL
    
    This PR includes:
    1) Add new data type Timestamp
    2) Add more data type casting based on Hive's rules
    3) Fix a bug of missing data types in both parsers (HiveQl & SQLParser).
    
    Author: Cheng Hao <hao.ch...@intel.com>
    
    Closes #275 from chenghao-intel/timestamp and squashes the following 
commits:
    
    df709e5 [Cheng Hao] Move orc_ends_with_nulls to blacklist
    24b04b0 [Cheng Hao] Put 3 cases into the black 
lists(describe_pretty,describe_syntax,lateral_view_outer)
    fc512c2 [Cheng Hao] remove the unnecessary data type equality check in data 
casting
    d0d1919 [Cheng Hao] Add more data type for scala reflection
    3259808 [Cheng Hao] Add the new Golden files
    3823b97 [Cheng Hao] Update the UnitTest cases & add timestamp type for 
HiveQL
    54a0489 [Cheng Hao] fix bug mapping to 0 (which is supposed to be null) 
when NumberFormatException occurs
    9cb505c [Cheng Hao] Fix issues according to PR comments
    e529168 [Cheng Hao] Fix bug of converting from String
    6fc8100 [Cheng Hao] Update Unit Test & CodeStyle
    8a1d4d6 [Cheng Hao] Add DataType for SqlParser
    ce4385e [Cheng Hao] Add TimestampType Support

commit c1ea3afb516c204925259f0928dfb17d0fa89621
Author: Prashant Sharma <prashan...@imaginea.com>
Date:   2014-04-03T22:42:17Z

    Spark 1162 Implemented takeOrdered in pyspark.
    
    Since Python does not have a library for a max heap, and the usual 
tricks like inverting values do not work for all cases, we have our own 
implementation of a max heap.
    
    Author: Prashant Sharma <prashan...@imaginea.com>
    
    Closes #97 from ScrapCodes/SPARK-1162/pyspark-top-takeOrdered2 and squashes 
the following commits:
    
    35f86ba [Prashant Sharma] code review
    2b1124d [Prashant Sharma] fixed tests
    e8a08e2 [Prashant Sharma] Code review comments.
    49e6ba7 [Prashant Sharma] SPARK-1162 added takeOrdered to pyspark
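
    For the numeric case, the value-inversion trick the PR says does not
    generalize looks like this (a sketch of takeOrdered semantics using a
    bounded max-heap over Python's min-heap `heapq`; pyspark's own heap
    exists precisely because this only works for negatable values):

    ```python
    import heapq

    def take_ordered(iterable, num):
        """Return the `num` smallest elements in ascending order, keeping
        a bounded max-heap of candidates by storing negated values."""
        if num <= 0:
            return []
        heap = []  # stores -x, so -heap[0] is the largest kept candidate
        for x in iterable:
            if len(heap) < num:
                heapq.heappush(heap, -x)
            elif -heap[0] > x:           # x beats the worst kept candidate
                heapq.heapreplace(heap, -x)
        return sorted(-v for v in heap)

    print(take_ordered([7, 3, 9, 1, 5], 3))  # [1, 3, 5]
    ```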

commit b8f534196f9a8c99f75728a06e62282d139dee28
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-04-03T22:45:34Z

    [SQL] SPARK-1333 First draft of java API
    
    WIP: Some work remains...
     * [x] Hive support
     * [x] Tests
     * [x] Update docs
    
    Feedback welcome!
    
    Author: Michael Armbrust <mich...@databricks.com>
    
    Closes #248 from marmbrus/javaSchemaRDD and squashes the following commits:
    
    b393913 [Michael Armbrust] @srowen 's java style suggestions.
    f531eb1 [Michael Armbrust] Address matei's comments.
    33a1b1a [Michael Armbrust] Ignore JavaHiveSuite.
    822f626 [Michael Armbrust] improve docs.
    ab91750 [Michael Armbrust] Improve Java SQL API: * Change JavaRow => Row * 
Add support for querying RDDs of JavaBeans * Docs * Tests * Hive support
    0b859c8 [Michael Armbrust] First draft of java API.

commit a599e43d6e0950f6b6b32150ce264a8c2711470c
Author: Diana Carroll <dcarr...@cloudera.com>
Date:   2014-04-03T22:48:42Z

    [SPARK-1134] Fix and document passing of arguments to IPython
    
    This is based on @dianacarroll's previous pull request 
https://github.com/apache/spark/pull/227, and @joshrosen's comments on 
https://github.com/apache/spark/pull/38. Since we do want to allow passing 
arguments to IPython, this does the following:
    * It documents that IPython can't be used with standalone jobs for now. 
(Later versions of IPython will deal with PYTHONSTARTUP properly and enable 
this, see https://github.com/ipython/ipython/pull/5226, but no released version 
has that fix.)
    * If you run `pyspark` with `IPYTHON=1`, it passes your command-line 
arguments to it. This way you can do stuff like `IPYTHON=1 bin/pyspark 
notebook`.
    * The old `IPYTHON_OPTS` remains, but I've removed it from the 
documentation. This is in case people read an old tutorial that uses it.
    
    This is not a perfect solution and I'd also be okay with keeping things as 
they are today (ignoring `$@` for IPython and using IPYTHON_OPTS), and only 
doing the doc change. With this change though, when IPython fixes 
https://github.com/ipython/ipython/pull/5226, people will immediately be able 
to do `IPYTHON=1 bin/pyspark myscript.py` to run a standalone script and get 
all the benefits of running scripts in IPython (presumably better debugging and 
such). Without it, there will be no way to run scripts in IPython.
    
    @joshrosen you should probably take the final call on this.
    
    Author: Diana Carroll <dcarr...@cloudera.com>
    
    Closes #294 from mateiz/spark-1134 and squashes the following commits:
    
    747bb13 [Diana Carroll] SPARK-1134 bug with ipython prevents 
non-interactive use with spark; only call ipython if no command line arguments 
were supplied

commit d94826be6d46edf3bc6377d33787df23a6030a6c
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-04-03T23:12:08Z

    [BUILD FIX] Fix compilation of Spark SQL Java API.
    
    The JavaAPI and the Parquet improvements PRs didn't conflict, but broke the 
build.
    
    Author: Michael Armbrust <mich...@databricks.com>
    
    Closes #316 from marmbrus/hotFixJavaApi and squashes the following commits:
    
    0b84c2d [Michael Armbrust] Fix compilation of Spark SQL Java API.

commit 9231b011a9ba5a2b25bd3d1a68be7d1a7cb735da
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-04-03T23:53:35Z

    Fix jenkins from giving the green light to builds that don't compile.
    
     Adding `| grep` swallows the non-zero return code from sbt failures. See 
[here](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13735/consoleFull)
 for a Jenkins run that fails to compile, but still gets a green light.
    
    Note the [BUILD FIX] commit isn't actually part of this PR, but github is 
out of date.
    
    Author: Michael Armbrust <mich...@databricks.com>
    
    Closes #317 from marmbrus/fixJenkins and squashes the following commits:
    
    7c77ff9 [Michael Armbrust] Remove output filter that was swallowing 
non-zero exit codes for test failures.

commit 33e63618d061eeaae257a7350ea3287a702fc123
Author: Patrick Wendell <pwend...@gmail.com>
Date:   2014-04-04T00:00:06Z

    Revert "[SPARK-1398] Removed findbugs jsr305 dependency"
    
    This reverts commit 92a86b285f8a4af1bdf577dd4c4ea0fd5ca8d682.

commit ee6e9e7d863022304ac9ced405b353b63accb6ab
Author: Patrick Wendell <pwend...@gmail.com>
Date:   2014-04-04T05:13:56Z

    SPARK-1337: Application web UI garbage collects newest stages
    
    Simple fix...
    
    Author: Patrick Wendell <pwend...@gmail.com>
    
    Closes #320 from pwendell/stage-clean-up and squashes the following commits:
    
    29be62e [Patrick Wendell] SPARK-1337: Application web UI garbage collects 
newest stages instead of old ones

commit 7f32fd42aaadcf6626b4d8bcf6295203b43b2037
Author: Sandy Ryza <sa...@cloudera.com>
Date:   2014-04-04T13:54:04Z

    SPARK-1350. Always use JAVA_HOME to run executor container JVMs.
    
    Author: Sandy Ryza <sa...@cloudera.com>
    
    Closes #313 from sryza/sandy-spark-1350 and squashes the following commits:
    
    bb6d187 [Sandy Ryza] SPARK-1350. Always use JAVA_HOME to run executor 
container JVMs.

----

