git commit: Merge the old sbt-launch-lib.bash with the new sbt-launcher jar downloading logic.

2014-03-02 Thread rxin
Repository: spark Updated Branches: refs/heads/master 6fc76e49c - 012bd5fbc Merge the old sbt-launch-lib.bash with the new sbt-launcher jar downloading logic. This allows developers to pass options (such as -D) to sbt. I also modified the SparkBuild to ensure spark specific properties are

git commit: SPARK-1173. Improve scala streaming docs.

2014-03-02 Thread rxin
Repository: spark Updated Branches: refs/heads/master 55a4f11b5 - 2b53447f3 SPARK-1173. Improve scala streaming docs. Clarify imports to add implicit conversions to DStream and fix other small typos in the streaming intro documentation. Tested by inspecting output via a local jekyll server,
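
The import being clarified is the one that brings the DStream implicit conversions into scope. A minimal sketch against the Spark 1.x streaming API (app name, host, and port are placeholders):
```
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._  // implicit conversions that add pair-DStream operations

val ssc = new StreamingContext(new SparkConf().setAppName("streaming-docs-demo").setMaster("local[2]"), Seconds(1))
val words = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" "))
val counts = words.map(word => (word, 1)).reduceByKey(_ + _)  // available only with the import above
counts.print()
```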

git commit: Removed accidentally checked in comment

2014-03-03 Thread rxin
Repository: spark Updated Branches: refs/heads/master f65c1f38e - 369aad6f9 Removed accidentally checked in comment It looks like this comment was added a while ago by @mridulm as part of a merge and was accidentally checked in. We should remove it. Author: Kay Ousterhout

git commit: update proportion of memory

2014-03-03 Thread rxin
Repository: spark Updated Branches: refs/heads/master 369aad6f9 -> 9d225a910 update proportion of memory The default value of spark.storage.memoryFraction has been changed from 0.66 to 0.6, so 60% of the memory is used for caching while 40% is used for task execution. Author: Chen Chao
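
For context, a minimal sketch of setting the property explicitly (the value is just the new default quoted above; the app name is a placeholder):
```
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("memory-fraction-demo")
  .set("spark.storage.memoryFraction", "0.6")  // fraction of the heap reserved for cached blocks
```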

git commit: SPARK-1178: missing document of spark.scheduler.revive.interval

2014-03-04 Thread rxin
Repository: spark Updated Branches: refs/heads/master 2d8e0a062 - 1865dd681 SPARK-1178: missing document of spark.scheduler.revive.interval https://spark-project.atlassian.net/browse/SPARK-1178 The configuration on spark.scheduler.revive.interval is undocumented but actually used

git commit: Fix maven jenkins: Add explicit init for required tables in SQLQuerySuite

2014-03-20 Thread rxin
Repository: spark Updated Branches: refs/heads/master 9aadcffab - e09139d9c Fix maven jenkins: Add explicit init for required tables in SQLQuerySuite Sorry! I added this test at the last minute and failed to run it in maven as well. Note that, this will probably not be sufficient to

git commit: Make SQL keywords case-insensitive

2014-03-21 Thread rxin
Repository: spark Updated Branches: refs/heads/master 2c0aa22e2 - dab5439a0 Make SQL keywords case-insensitive This is a bit of a hack that allows all variations of a keyword, but it still seems to produce valid error messages and such. Author: Matei Zaharia ma...@databricks.com Closes
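
A minimal, self-contained sketch of the idea of accepting any capitalization of a keyword; this is an illustration, not the Catalyst parser itself:
```
// Accept any capitalization of a keyword by matching case-insensitively.
def keyword(k: String): scala.util.matching.Regex =
  ("(?i)" + java.util.regex.Pattern.quote(k)).r

val select = keyword("SELECT")
Seq("select", "Select", "SELECT").foreach { s =>
  println(s"$s matches: ${select.pattern.matcher(s).matches()}")  // true for every variation
}
```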

git commit: Add asCode function for dumping raw tree representations.

2014-03-21 Thread rxin
Repository: spark Updated Branches: refs/heads/master dab5439a0 - d78098364 Add asCode function for dumping raw tree representations. Intended only for use by Catalyst developers. Author: Michael Armbrust mich...@databricks.com Closes #200 from marmbrus/asCode and squashes the following

git commit: Implement the RLike & Like in catalyst

2014-03-29 Thread rxin
Repository: spark Updated Branches: refs/heads/master 161781609 -> af3746ce0 Implement the RLike & Like in catalyst This PR includes: 1) Unify the unit test for expression evaluation 2) Add implementation of RLike & Like Author: Cheng Hao hao.ch...@intel.com Closes #224 from
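
As a rough illustration of the two expressions' semantics (an assumed example, not the Catalyst implementation): LIKE uses `%`/`_` wildcards, while RLIKE takes a regular expression directly.
```
// Translate a SQL LIKE pattern into a Java regex; RLIKE needs no translation.
def likeToRegex(pattern: String): String =
  java.util.regex.Pattern.quote(pattern).replace("%", "\\E.*\\Q").replace("_", "\\E.\\Q")

println("Spark SQL".matches(likeToRegex("Spark%")))  // LIKE 'Spark%'    -> true
println("Spark SQL".matches("^Spark.*"))             // RLIKE '^Spark.*' -> true
```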

git commit: [SQL] Rewrite join implementation to allow streaming of one relation.

2014-03-31 Thread rxin
that branch). @rxin @liancheng Author: Michael Armbrust mich...@databricks.com Closes #250 from marmbrus/hashJoin and squashes the following commits: 1ad873e [Michael Armbrust] Change hasNext logic back to the correct version. 8e6f2a2 [Michael Armbrust] Review comments. 1e9fb63 [Michael
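
The core idea is building a hash table from one relation and streaming the other through it. A simplified, assumed sketch using plain Scala iterators (not the actual Spark SQL operator):
```
// Build a hash table from the (ideally smaller) build side, then stream the other side through it.
def hashJoin[K, A, B](buildSide: Iterator[(K, A)], streamSide: Iterator[(K, B)]): Iterator[(K, (A, B))] = {
  val table: Map[K, Seq[A]] =
    buildSide.toSeq.groupBy(_._1).map { case (k, rows) => k -> rows.map(_._2) }
  streamSide.flatMap { case (k, b) => table.getOrElse(k, Nil).map(a => (k, (a, b))) }
}
```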

git commit: [SQL] SPARK-1372 Support for caching and uncaching tables in a SQLContext.

2014-04-01 Thread rxin
Repository: spark Updated Branches: refs/heads/master ada310a9d - f5c418da0 [SQL] SPARK-1372 Support for caching and uncaching tables in a SQLContext. This doesn't yet support different databases in Hive (though you can probably workaround this by calling `USE dbname`). However, given the
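
A hedged usage sketch, assuming an existing `sqlContext` with a registered table named `people` and the cacheTable/uncacheTable methods of released Spark SQL:
```
sqlContext.cacheTable("people")                       // later scans hit the in-memory columnar cache
val n = sqlContext.sql("SELECT COUNT(*) FROM people").collect()
sqlContext.uncacheTable("people")                     // release the cached buffers
```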

git commit: Do not re-use objects in the EdgePartition/EdgeTriplet iterators.

2014-04-02 Thread rxin
Repository: spark Updated Branches: refs/heads/master de8eefa80 - 78236334e Do not re-use objects in the EdgePartition/EdgeTriplet iterators. This avoids a silent data corruption issue (https://spark-project.atlassian.net/browse/SPARK-1188) and has no performance impact by my measurements.
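
An illustration of the hazard being fixed (not the GraphX code itself): when an iterator keeps handing back the same mutable object, collecting its elements silently yields copies of only the final state.
```
class MutableCell(var value: Int)

val shared = new MutableCell(0)
val it = Iterator.tabulate(3) { i => shared.value = i; shared }  // re-uses one object for every element
println(it.toArray.map(_.value).mkString(","))                   // prints "2,2,2" instead of "0,1,2"
```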

git commit: small fix ( proogram - program )

2014-04-04 Thread rxin
Repository: spark Updated Branches: refs/heads/master 8de038eb3 - 0acc7a02b small fix ( proogram - program ) Author: Prabeesh K prabsma...@gmail.com Closes #331 from prabeesh/patch-3 and squashes the following commits: 9399eb5 [Prabeesh K] small fix(proogram - program) Project:

git commit: [SQL] SPARK-1427 Fix toString for SchemaRDD NativeCommands.

2014-04-07 Thread rxin
Repository: spark Updated Branches: refs/heads/master accd0999f - b5bae849d [SQL] SPARK-1427 Fix toString for SchemaRDD NativeCommands. Author: Michael Armbrust mich...@databricks.com Closes #343 from marmbrus/toStringFix and squashes the following commits: 37198fe [Michael Armbrust] Fix

git commit: [sql] Rename Expression.apply to eval for better readability.

2014-04-07 Thread rxin
Repository: spark Updated Branches: refs/heads/master a3c51c6ea - 83f2a2f14 [sql] Rename Expression.apply to eval for better readability. Also used this opportunity to add a bunch of override's and made some members private. Author: Reynold Xin r...@apache.org Closes #340 from rxin/eval

git commit: [sql] Rename execution/aggregates.scala Aggregate.scala, and added a bunch of private[this] to variables.

2014-04-07 Thread rxin
Repository: spark Updated Branches: refs/heads/master 0307db0f5 - 14c9238aa [sql] Rename execution/aggregates.scala Aggregate.scala, and added a bunch of private[this] to variables. Author: Reynold Xin r...@apache.org Closes #348 from rxin/aggregate and squashes the following commits

git commit: [SPARK-1402] Added 3 more compression schemes

2014-04-07 Thread rxin
Repository: spark Updated Branches: refs/heads/master f27e56aa6 - 0d0493fcf [SPARK-1402] Added 3 more compression schemes JIRA issue: [SPARK-1402](https://issues.apache.org/jira/browse/SPARK-1402) This PR provides 3 more compression schemes for Spark SQL in-memory columnar storage: *

git commit: Remove extra semicolon in import statement and unused import in ApplicationMaster

2014-04-08 Thread rxin
Repository: spark Updated Branches: refs/heads/master 6dc5f5849 - 3bc054893 Remove extra semicolon in import statement and unused import in ApplicationMaster Small nit cleanup to remove extra semicolon and unused import in Yarn's stable ApplicationMaster (it bothers me every time I saw it)

git commit: Update tuning.md

2014-04-10 Thread rxin
Repository: spark Updated Branches: refs/heads/master 7b52b6631 - f04666252 Update tuning.md http://stackoverflow.com/questions/9699071/what-is-the-javas-internal-represention-for-string-modified-utf-8-utf-16 Author: Andrew Ash and...@andrewash.com Closes #384 from ash211/patch-2 and

git commit: SPARK-1501: Ensure assertions in Graph.apply are asserted.

2014-04-15 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 1491b2a0d - 692dd6936 SPARK-1501: Ensure assertions in Graph.apply are asserted. The Graph.apply test in GraphSuite had some assertions in a closure in a graph transformation. As a consequence, these assertions never actually executed.
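
The same pitfall in miniature (an assumed illustration, not the GraphSuite test): assertions placed inside a lazy transformation never run until something forces evaluation.
```
val checked = Iterator(1, 2, 3).map { x => assert(x > 10, s"$x is too small"); x }
// Nothing has failed yet -- map on an iterator is lazy.
// checked.foreach(_ => ())  // forcing evaluation here would throw the AssertionError
```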

git commit: [SQL] SPARK-1424 Generalize insertIntoTable functions on SchemaRDDs

2014-04-15 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 95647fad1 - e5130d978 [SQL] SPARK-1424 Generalize insertIntoTable functions on SchemaRDDs This makes it possible to create tables and insert into them using the DSL and SQL for the scala and java apis. Author: Michael Armbrust

git commit: Rebuild routing table after Graph.reverse

2014-04-16 Thread rxin
Repository: spark Updated Branches: refs/heads/master 987760ec0 - 235a47ce1 Rebuild routing table after Graph.reverse GraphImpl.reverse used to reverse edges in each partition of the edge RDD but preserve the routing table and replicated vertex view, since reversing should not affect

git commit: SPARK-1329: Create pid2vid with correct number of partitions

2014-04-16 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 602b9ea65 - b4ea3d972 SPARK-1329: Create pid2vid with correct number of partitions Each vertex partition is co-located with a pid2vid array created in RoutingTable.scala. This array maps edge partition IDs to the list of vertices in

git commit: remove unnecessary brace and semicolon in 'putBlockInfo.synchronize' block

2014-04-16 Thread rxin
Repository: spark Updated Branches: refs/heads/master 17d323455 - 016a87764 remove unnecessary brace and semicolon in 'putBlockInfo.synchronize' block delete semicolon Author: Chen Chao crazy...@gmail.com Closes #411 from CrazyJvm/patch-5 and squashes the following commits: 72333a3 [Chen

git commit: remove unnecessary brace and semicolon in 'putBlockInfo.synchronize' block

2014-04-16 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 b4ea3d972 - e43e31ded remove unnecessary brace and semicolon in 'putBlockInfo.synchronize' block delete semicolon Author: Chen Chao crazy...@gmail.com Closes #411 from CrazyJvm/patch-5 and squashes the following commits: 72333a3

git commit: Fixing a race condition in event listener unit test

2014-04-16 Thread rxin
Repository: spark Updated Branches: refs/heads/master 016a87764 - 38877ccf3 Fixing a race condition in event listener unit test Author: Kan Zhang kzh...@apache.org Closes #401 from kanzhang/fix-1475 and squashes the following commits: c6058bd [Kan Zhang] Fixing a race condition in event

git commit: misleading task number of groupByKey

2014-04-16 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 f0abf5f08 -> 51c41da51 misleading task number of groupByKey The sentence "By default, this uses only 8 parallel tasks to do the grouping." is quite misleading. Please refer to https://github.com/apache/spark/pull/389; the detail is in the following code:

git commit: misleading task number of groupByKey

2014-04-16 Thread rxin
Repository: spark Updated Branches: refs/heads/master 38877ccf3 -> 9c40b9ead misleading task number of groupByKey The sentence "By default, this uses only 8 parallel tasks to do the grouping." is quite misleading. Please refer to https://github.com/apache/spark/pull/389; the detail is in the following code: def
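
The practical takeaway, as a sketch assuming an existing SparkContext `sc`: pass the partition count explicitly rather than relying on the documented default.
```
import org.apache.spark.SparkContext._  // pair-RDD functions in Spark 1.x

val pairs = sc.parallelize(1 to 10000).map(n => (n % 100, n))
val grouped = pairs.groupByKey(numPartitions = 100)  // explicit, instead of the default parallelism
```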

git commit: Update ReducedWindowedDStream.scala

2014-04-16 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 51c41da51 - 822353dc5 Update ReducedWindowedDStream.scala change _slideDuration to _windowDuration Author: baishuo(白硕) vc_j...@hotmail.com Closes #425 from baishuo/master and squashes the following commits: 6f09ea1

git commit: Include stack trace for exceptions thrown by user code.

2014-04-16 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 822353dc5 -> aef8a4a51 Include stack trace for exceptions thrown by user code. It is very confusing when your code throws an exception, but the only stack trace shown is in the DAGScheduler. This is a simple patch to include the stack

git commit: [python alternative] pyspark require Python2, failing if system default is Py3 from shell.py

2014-04-16 Thread rxin
Repository: spark Updated Branches: refs/heads/master 6ad4c5498 - bb76eae1b [python alternative] pyspark require Python2, failing if system default is Py3 from shell.py Python alternative for https://github.com/apache/spark/pull/392; managed from shell.py Author: AbhishekKr

git commit: [python alternative] pyspark require Python2, failing if system default is Py3 from shell.py

2014-04-16 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 13fb4c782 - b3ad707c4 [python alternative] pyspark require Python2, failing if system default is Py3 from shell.py Python alternative for https://github.com/apache/spark/pull/392; managed from shell.py Author: AbhishekKr

git commit: SPARK-1483: Rename minSplits to minPartitions in public APIs

2014-04-18 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 1c0dc3733 - 969a07577 SPARK-1483: Rename minSplits to minPartitions in public APIs https://issues.apache.org/jira/browse/SPARK-1483 From the original JIRA: The parameter name is part of the public API in Scala and Python, since you
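
After the rename, the hint parameter reads naturally as a partition count. A small sketch (the path and the SparkContext `sc` are assumed):
```
val lines = sc.textFile("hdfs:///data/input.txt", minPartitions = 8)  // was `minSplits` before SPARK-1483
```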

git commit: Reuses Row object in ExistingRdd.productToRowRdd()

2014-04-18 Thread rxin
Repository: spark Updated Branches: refs/heads/master e31c8ffca - 89f47434e Reuses Row object in ExistingRdd.productToRowRdd() Author: Cheng Lian lian.cs@gmail.com Closes #432 from liancheng/reuseRow and squashes the following commits: 9e6d083 [Cheng Lian] Simplified code with

git commit: [SPARK-1520] remove fastutil from dependencies

2014-04-18 Thread rxin
Repository: spark Updated Branches: refs/heads/master 89f47434e - aa17f022c [SPARK-1520] remove fastutil from dependencies A quick fix for https://issues.apache.org/jira/browse/SPARK-1520 By excluding fastutil, we bring the number of files in the assembly jar back under 65536, so Java 7

git commit: [SPARK-1520] remove fastutil from dependencies

2014-04-18 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 977467e51 - c40eec844 [SPARK-1520] remove fastutil from dependencies A quick fix for https://issues.apache.org/jira/browse/SPARK-1520 By excluding fastutil, we bring the number of files in the assembly jar back under 65536, so Java 7

git commit: SPARK-1357 (addendum). More Experimental items in MLlib

2014-04-18 Thread rxin
Repository: spark Updated Branches: refs/heads/master aa17f022c - 8aa1f4c4f SPARK-1357 (addendum). More Experimental items in MLlib Per discussion, this is my suggestion to make ALS Rating, ClassificationModel, RegressionModel experimental for now, to reserve the right to possibly change

git commit: SPARK-1357 (addendum). More Experimental items in MLlib

2014-04-18 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 c40eec844 - 1a304297f SPARK-1357 (addendum). More Experimental items in MLlib Per discussion, this is my suggestion to make ALS Rating, ClassificationModel, RegressionModel experimental for now, to reserve the right to possibly change

git commit: SPARK-1523: improve the readability of code in AkkaUtil

2014-04-18 Thread rxin
Repository: spark Updated Branches: refs/heads/master 8aa1f4c4f - 3c7a9bae9 SPARK-1523: improve the readability of code in AkkaUtil Actually it is separated from https://github.com/apache/spark/pull/85 as suggested by @rxin compare https://github.com/apache/spark/blob/master/core/src/main

git commit: README update

2014-04-18 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 ea174606d - 2fe6b183e README update Author: Reynold Xin r...@apache.org Closes #443 from rxin/readme and squashes the following commits: 16853de [Reynold Xin] Updated SBT and Scala instructions. 3ac3ceb [Reynold Xin] README update

git commit: README update

2014-04-18 Thread rxin
Repository: spark Updated Branches: refs/heads/master 2089e0e7e - 28238c81d README update Author: Reynold Xin r...@apache.org Closes #443 from rxin/readme and squashes the following commits: 16853de [Reynold Xin] Updated SBT and Scala instructions. 3ac3ceb [Reynold Xin] README update

git commit: SPARK-1539: RDDPage.scala contains RddPage class

2014-04-21 Thread rxin
Repository: spark Updated Branches: refs/heads/master af46f1fd0 - b7df31eb3 SPARK-1539: RDDPage.scala contains RddPage class SPARK-1386 changed RDDPage to RddPage but didn't change the filename. I tried sbt/sbt publish-local. Inside the spark-core jar, the unit name is RDDPage.class and

git commit: SPARK-1539: RDDPage.scala contains RddPage class

2014-04-21 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 d62ce6d0c - 8aa3860bf SPARK-1539: RDDPage.scala contains RddPage class SPARK-1386 changed RDDPage to RddPage but didn't change the filename. I tried sbt/sbt publish-local. Inside the spark-core jar, the unit name is RDDPage.class and

git commit: [SQL] Add support for parsing indexing into arrays in SQL.

2014-04-24 Thread rxin
Repository: spark Updated Branches: refs/heads/master 526a518bf - 4660991e6 [SQL] Add support for parsing indexing into arrays in SQL. Author: Michael Armbrust mich...@databricks.com Closes #518 from marmbrus/parseArrayIndex and squashes the following commits: afd2d6b [Michael Armbrust] 100

git commit: Generalize pattern for planning hash joins.

2014-04-24 Thread rxin
want to repeat the logic for finding the join keys. Author: Michael Armbrust mich...@databricks.com Closes #418 from marmbrus/hashFilter and squashes the following commits: d5cc79b [Michael Armbrust] Address @rxin 's comments. 366b6d9 [Michael Armbrust] style fixes 14560eb [Michael Armbrust

git commit: Fix [SPARK-1078]: Remove the Unnecessary lift-json dependency

2014-04-24 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 db698414f - 496b9ae18 Fix [SPARK-1078]: Remove the Unnecessary lift-json dependency Remove the Unnecessary lift-json dependency from pom.xml Author: Sandeep sand...@techaddict.me Closes #536 from techaddict/FIX-SPARK-1078 and

git commit: Fix [SPARK-1078]: Remove the Unnecessary lift-json dependency

2014-04-24 Thread rxin
Repository: spark Updated Branches: refs/heads/master 06e82d94b - 095b51825 Fix [SPARK-1078]: Remove the Unnecessary lift-json dependency Remove the Unnecessary lift-json dependency from pom.xml Author: Sandeep sand...@techaddict.me Closes #536 from techaddict/FIX-SPARK-1078 and squashes

git commit: [Typo] In the maven docs: chd - cdh

2014-04-24 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 ab131abd4 - db698414f [Typo] In the maven docs: chd - cdh Author: Andrew Or andrewo...@gmail.com Closes #548 from andrewor14/doc-typo and squashes the following commits: 3eaf4c4 [Andrew Or] chd - cdh (cherry picked from commit

git commit: add note of how to support table with more than 22 fields

2014-04-26 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 f85c6815e - a020686de add note of how to support table with more than 22 fields Author: wangfei wangf...@huawei.com Closes #564 from scwf/patch-6 and squashes the following commits: a331876 [wangfei] Update sql-programming-guide.md

git commit: [SPARK-1608] [SQL] Fix Cast.nullable when cast from StringType to NumericType/TimestampType.

2014-04-26 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 a020686de - dcea67f11 [SPARK-1608] [SQL] Fix Cast.nullable when cast from StringType to NumericType/TimestampType. `Cast.nullable` should be `true` when cast from `StringType` to `NumericType` or `TimestampType`. Because if

git commit: [SPARK-1608] [SQL] Fix Cast.nullable when cast from StringType to NumericType/TimestampType.

2014-04-26 Thread rxin
Repository: spark Updated Branches: refs/heads/master e6e44e46e - 8e37ed6eb [SPARK-1608] [SQL] Fix Cast.nullable when cast from StringType to NumericType/TimestampType. `Cast.nullable` should be `true` when cast from `StringType` to `NumericType` or `TimestampType`. Because if `StringType`
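
The reasoning in miniature (illustrative, outside Catalyst): a string-to-number cast can fail, so its result must be allowed to be null.
```
def castToInt(s: String): Option[Int] = scala.util.Try(s.toInt).toOption

println(castToInt("42"))   // Some(42)
println(castToInt("abc"))  // None -- in SQL terms the cast yields NULL, hence nullable = true
```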

git commit: Update the import package name for TestHive in sbt shell

2014-04-28 Thread rxin
Repository: spark Updated Branches: refs/heads/master 71f4d2612 - ea01affc3 Update the import package name for TestHive in sbt shell sbt/sbt hive/console will fail as TestHive changed its package from org.apache.spark.sql.hive to org.apache.spark.sql.hive.test. Author: Cheng Hao

git commit: [SQL]Append some missing types for HiveUDF

2014-04-28 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 eb9308e18 - 42cb3b41b [SQL]Append some missing types for HiveUDF Add the missing types Author: Cheng Hao hao.ch...@intel.com Closes #459 from chenghao-intel/missing_types and squashes the following commits: 21cba2e [Cheng Hao]

git commit: [SPARK-1646] Micro-optimisation of ALS

2014-04-29 Thread rxin
Repository: spark Updated Branches: refs/heads/master d33df1c15 - 5c0cd5c1a [SPARK-1646] Micro-optimisation of ALS This change replaces some Scala `for` and `foreach` constructs with `while` constructs. There may be a slight performance gain on the order of 1-2% when training an ALS model.

git commit: [SPARK-1646] Micro-optimisation of ALS

2014-04-29 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 919ed3108 - 92269f97c [SPARK-1646] Micro-optimisation of ALS This change replaces some Scala `for` and `foreach` constructs with `while` constructs. There may be a slight performance gain on the order of 1-2% when training an ALS
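
The kind of rewrite described, shown on a toy function (an assumed illustration, not the ALS code): an index-based while loop avoids the per-element closure of a for comprehension on a hot path.
```
def sumFor(xs: Array[Double]): Double = {
  var s = 0.0
  for (x <- xs) s += x  // desugars to foreach with a closure
  s
}

def sumWhile(xs: Array[Double]): Double = {
  var s = 0.0
  var i = 0
  while (i < xs.length) { s += xs(i); i += 1 }  // straight-line loop, no closure
  s
}
```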

git commit: Handle the vals that never used

2014-04-29 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 775020f00 -> b0ded1f6f Handle the vals that never used In XORShiftRandom.scala, use a val million instead of the constant 1e6.toInt. Delete vals that are never used in other files. Author: WangTao barneystin...@aliyun.com Closes #565 from

git commit: [SQL] SPARK-1661 - Fix regex_serde test

2014-05-01 Thread rxin
Repository: spark Updated Branches: refs/heads/master 98b65593b - a43d9c14f [SQL] SPARK-1661 - Fix regex_serde test The JIRA in question is actually reporting a bug with Shark, but I wanted to make sure Spark SQL did not have similar problems. This fixes a bug in our parsing code that was

git commit: [SQL] Better logging when applying rules.

2014-05-03 Thread rxin
Repository: spark Updated Branches: refs/heads/master 4669a84ab - b29571470 [SQL] Better logging when applying rules. Author: Michael Armbrust mich...@databricks.com Closes #616 from marmbrus/ruleLogging and squashes the following commits: 39c09fe [Michael Armbrust] Fix off by one error.

git commit: Updated doc for spark.closure.serializer to indicate only Java serializer work.

2014-05-05 Thread rxin
: Reynold Xin r...@apache.org Closes #642 from rxin/docs-ser and squashes the following commits: a507db5 [Reynold Xin] Use Java instead of default. 5eb8cdd [Reynold Xin] Updated doc for spark.closure.serializer to indicate only the default serializer work. Project: http://git-wip-us.apache.org

git commit: Update OpenHashSet.scala

2014-05-06 Thread rxin
Repository: spark Updated Branches: refs/heads/master 3c64750bd - 0a5a46811 Update OpenHashSet.scala Modify wrong comment of function addWithoutResize. Author: ArcherShao archers...@users.noreply.github.com Closes #667 from ArcherShao/patch-3 and squashes the following commits: a607358

git commit: Update OpenHashSet.scala

2014-05-06 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 39ac62d6c - 4ff39292c Update OpenHashSet.scala Modify wrong comment of function addWithoutResize. Author: ArcherShao archers...@users.noreply.github.com Closes #667 from ArcherShao/patch-3 and squashes the following commits: a607358

git commit: [SQL] Improve SparkSQL Aggregates

2014-05-10 Thread rxin
Repository: spark Updated Branches: refs/heads/master 6ed7e2cd0 - 19c8fb02b [SQL] Improve SparkSQL Aggregates * Add native min/max (was using hive before). * Handle nulls correctly in Avg and Sum. Author: Michael Armbrust mich...@databricks.com Closes #683 from marmbrus/aggFixes and

git commit: Modify a typo in monitoring.md

2014-05-12 Thread rxin
Repository: spark Updated Branches: refs/heads/master 5c2275d6e - 9cf9f1897 Modify a typo in monitoring.md As I mentioned in SPARK-1765, there is a word 'JXM' in monitoring.md. I think it's typo for 'JMX'. Author: Kousuke Saruta saru...@oss.nttdata.co.jp Closes #698 from sarutak/SPARK-1765

git commit: L-BFGS Documentation

2014-05-12 Thread rxin
Repository: spark Updated Branches: refs/heads/master a5150d199 - 5c2275d6e L-BFGS Documentation Documentation for L-BFGS, and an example of training binary L2 logistic regression using L-BFGS. Author: DB Tsai dbt...@alpinenow.com Closes #702 from dbtsai/dbtsai-lbfgs-doc and squashes the

git commit: SPARK-1757 Failing test for saving null primitives with .saveAsParquetFile()

2014-05-12 Thread rxin
Repository: spark Updated Branches: refs/heads/master 9cf9f1897 - 156df87e7 SPARK-1757 Failing test for saving null primitives with .saveAsParquetFile() https://issues.apache.org/jira/browse/SPARK-1757 The first test succeeds, but the second test fails with exception: ``` [info] - save and

git commit: SPARK-1757 Failing test for saving null primitives with .saveAsParquetFile()

2014-05-12 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 89b56d7b7 - b52ac0e0b SPARK-1757 Failing test for saving null primitives with .saveAsParquetFile() https://issues.apache.org/jira/browse/SPARK-1757 The first test succeeds, but the second test fails with exception: ``` [info] - save

git commit: [SQL] Make Hive Metastore conversion functions publicly visible.

2014-05-12 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 59695b367 - 24cc933c8 [SQL] Make Hive Metastore conversion functions publicly visible. I need this to be public for the implementation of SharkServer2. However, I think this functionality is generally useful and should be pretty

git commit: SPARK-1791 - SVM implementation does not use threshold parameter

2014-05-13 Thread rxin
Repository: spark Updated Branches: refs/heads/master 16ffadcc4 - d1e487473 SPARK-1791 - SVM implementation does not use threshold parameter Summary: https://issues.apache.org/jira/browse/SPARK-1791 Simple fix, and backward compatible, since - anyone who set the threshold was getting

git commit: SPARK-1791 - SVM implementation does not use threshold parameter

2014-05-13 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 d08e9604f - d6994f4e6 SPARK-1791 - SVM implementation does not use threshold parameter Summary: https://issues.apache.org/jira/browse/SPARK-1791 Simple fix, and backward compatible, since - anyone who set the threshold was getting

git commit: Implement ApproximateCountDistinct for SparkSql

2014-05-13 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 66fe4797a - 92b0ec9ac Implement ApproximateCountDistinct for SparkSql Add the implementation for ApproximateCountDistinct to SparkSql. We use the HyperLogLog algorithm implemented in stream-lib, and do the count in two phases: 1)
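
The two-phase shape described, with exact sets standing in for HyperLogLog sketches (an assumed illustration; `sc` is an existing SparkContext):
```
val data = sc.parallelize(Seq("a", "b", "a", "c", "b"), 4)
val partials = data.mapPartitions(it => Iterator(it.toSet))  // phase 1: partial result per partition
val distinctCount = partials.reduce(_ union _).size          // phase 2: merge the partial results
```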

git commit: [SQL] Make it possible to create Java/Python SQLContexts from an existing Scala SQLContext.

2014-05-13 Thread rxin
Repository: spark Updated Branches: refs/heads/master 753b04dea - 44233865c [SQL] Make it possible to create Java/Python SQLContexts from an existing Scala SQLContext. Author: Michael Armbrust mich...@databricks.com Closes #761 from marmbrus/existingContext and squashes the following

git commit: [SPARK-1784] Add a new partitioner to allow specifying # of keys per partition

2014-05-13 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 618b3e6e7 - 66fe4797a [SPARK-1784] Add a new partitioner to allow specifying # of keys per partition This change adds a new partitioner which allows users to specify # of keys per partition. Author: Syed Hashmi shas...@cloudera.com
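
A hypothetical sketch of the idea (not the actual patch): a Partitioner that places a fixed number of consecutive integer keys into each partition.
```
import org.apache.spark.Partitioner

class KeysPerPartitionPartitioner(maxKey: Int, keysPerPartition: Int) extends Partitioner {
  override val numPartitions: Int = (maxKey + keysPerPartition - 1) / keysPerPartition
  override def getPartition(key: Any): Int = key.asInstanceOf[Int] / keysPerPartition
}
```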

git commit: Fix: sbt test throw an java.lang.OutOfMemoryError: PermGen space

2014-05-14 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 e480bcfbd - 379f733e9 Fix: sbt test throw an java.lang.OutOfMemoryError: PermGen space Author: witgo wi...@qq.com Closes #773 from witgo/sbt_javaOptions and squashes the following commits: 26c7d38 [witgo] Improve sbt configuration

git commit: Fix: sbt test throw an java.lang.OutOfMemoryError: PermGen space

2014-05-14 Thread rxin
Repository: spark Updated Branches: refs/heads/master 17f3075bc - fde82c154 Fix: sbt test throw an java.lang.OutOfMemoryError: PermGen space Author: witgo wi...@qq.com Closes #773 from witgo/sbt_javaOptions and squashes the following commits: 26c7d38 [witgo] Improve sbt configuration
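
A common remedy from that sbt/Scala era, shown as a build.sbt fragment (an assumption about the approach, not necessarily what this patch changes): fork the test JVM and give it more PermGen.
```
fork in Test := true
javaOptions in Test ++= Seq("-Xmx2g", "-XX:MaxPermSize=512m")
```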

git commit: SPARK-1829 Sub-second durations shouldn't round to 0 s

2014-05-14 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 379f733e9 -> 530bdf7d4 SPARK-1829 Sub-second durations shouldn't round to 0 s Shown as `99 ms` up to 99 ms; as `0.1 s` from 0.1 s up to 0.9 s. https://issues.apache.org/jira/browse/SPARK-1829 Compare the first image to the second here:

git commit: SPARK-1668: Add implicit preference as an option to examples/MovieLensALS

2014-05-14 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 c7b27043a -> 35aa2448a SPARK-1668: Add implicit preference as an option to examples/MovieLensALS Add --implicitPrefs as a command-line option to the example app MovieLensALS under examples/ Author: Sandeep sand...@techaddict.me

git commit: Use numpy directly for matrix multiply.

2014-05-14 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 35aa2448a - 010040fd0 Use numpy directly for matrix multiply. Using matrix multiply to compute XtX and XtY yields a 5-20x speedup depending on problem size. For example - the following takes 19s locally after this change vs. 5m21s

git commit: Nicer logging for SecurityManager startup

2014-05-14 Thread rxin
Repository: spark Updated Branches: refs/heads/master ca4318686 - 7f6f4a103 Nicer logging for SecurityManager startup Happy to open a jira ticket if you'd like to track one there. Author: Andrew Ash and...@andrewash.com Closes #678 from ash211/SecurityManagerLogging and squashes the

git commit: Typo fix: fetchting - fetching

2014-05-14 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 69e2726d4 - 0759ee790 Typo fix: fetchting - fetching Author: Andrew Ash and...@andrewash.com Closes #680 from ash211/patch-3 and squashes the following commits: 9ce3746 [Andrew Ash] Typo fix: fetchting - fetching (cherry picked from

git commit: [Typo] propertes - properties

2014-05-15 Thread rxin
Repository: spark Updated Branches: refs/heads/master e3d72a74a - 9ad096d55 [Typo] propertes - properties Author: andrewor14 andrewo...@gmail.com Closes #780 from andrewor14/submit-typo and squashes the following commits: e70e057 [andrewor14] propertes - properties Project:

git commit: [Typo] propertes - properties

2014-05-15 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 d6f1a75bc - 5ca3096dd [Typo] propertes - properties Author: andrewor14 andrewo...@gmail.com Closes #780 from andrewor14/submit-typo and squashes the following commits: e70e057 [andrewor14] propertes - properties (cherry picked from

git commit: [FIX] do not load defaults when testing SparkConf in pyspark

2014-05-15 Thread rxin
Repository: spark Updated Branches: refs/heads/master 65533c7ec - 94c6c06ea [FIX] do not load defaults when testing SparkConf in pyspark The default constructor loads default properties, which can fail the test. Author: Xiangrui Meng m...@databricks.com Closes #775 from

git commit: [SPARK-1696][MLLIB] use alpha in dense dspr

2014-05-15 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 31faec790 - d6f1a75bc [SPARK-1696][MLLIB] use alpha in dense dspr It doesn't affect existing code because only `alpha = 1.0` is used in the code. Author: Xiangrui Meng m...@databricks.com Closes #778 from mengxr/mllib-dspr-fix and

git commit: [SQL] Improve SparkSQL Aggregates

2014-05-15 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 010040fd0 - 8f3b9250c [SQL] Improve SparkSQL Aggregates * Add native min/max (was using hive before). * Handle nulls correctly in Avg and Sum. Author: Michael Armbrust mich...@databricks.com Closes #683 from marmbrus/aggFixes and

git commit: Typo fix: fetchting - fetching

2014-05-15 Thread rxin
Repository: spark Updated Branches: refs/heads/master 7f6f4a103 - d00981a95 Typo fix: fetchting - fetching Author: Andrew Ash and...@andrewash.com Closes #680 from ash211/patch-3 and squashes the following commits: 9ce3746 [Andrew Ash] Typo fix: fetchting - fetching Project:

git commit: default task number misleading in several places

2014-05-15 Thread rxin
Repository: spark Updated Branches: refs/heads/master 44165fc91 -> 2f639957f default task number misleading in several places private[streaming] def defaultPartitioner(numPartitions: Int = self.ssc.sc.defaultParallelism) = { new HashPartitioner(numPartitions) } it represents that the

git commit: default task number misleading in several places

2014-05-15 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 fdf9717da -> 9f0f2ecb8 default task number misleading in several places private[streaming] def defaultPartitioner(numPartitions: Int = self.ssc.sc.defaultParallelism) = { new HashPartitioner(numPartitions) } it represents that

git commit: SPARK-1829 Sub-second durations shouldn't round to 0 s

2014-05-15 Thread rxin
Repository: spark Updated Branches: refs/heads/master fde82c154 -> a3315d7f4 SPARK-1829 Sub-second durations shouldn't round to 0 s Shown as `99 ms` up to 99 ms; as `0.1 s` from 0.1 s up to 0.9 s. https://issues.apache.org/jira/browse/SPARK-1829 Compare the first image to the second here:
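
An illustrative formatter matching the behavior described (not the actual web UI code):
```
def formatDuration(ms: Long): String =
  if (ms < 100) s"$ms ms"       // e.g. "99 ms"
  else f"${ms / 1000.0}%.1f s"  // e.g. "0.1 s" instead of rounding down to "0 s"
```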

git commit: [SPARK-1696][MLLIB] use alpha in dense dspr

2014-05-16 Thread rxin
Repository: spark Updated Branches: refs/heads/master 601e37198 - e3d72a74a [SPARK-1696][MLLIB] use alpha in dense dspr It doesn't affect existing code because only `alpha = 1.0` is used in the code. Author: Xiangrui Meng m...@databricks.com Closes #778 from mengxr/mllib-dspr-fix and

git commit: [SQL] Fix tiny/small ints from HiveMetastore.

2014-05-16 Thread rxin
Repository: spark Updated Branches: refs/heads/master e66e31be5 - a4aafe5f9 [SQL] Fix tiny/small ints from HiveMetastore. Author: Michael Armbrust mich...@databricks.com Closes #797 from marmbrus/smallInt and squashes the following commits: 2db9dae [Michael Armbrust] Fix tiny/small ints

git commit: Typos in Spark

2014-05-16 Thread rxin
Repository: spark Updated Branches: refs/heads/master e1e3416c4 - 94c513960 Typos in Spark Author: Huajian Mao huajian...@gmail.com Closes #798 from huajianmao/patch-1 and squashes the following commits: 208a454 [Huajian Mao] A typo in Task 1b515af [Huajian Mao] A typo in the message

git commit: [SPARK-1845] [SQL] Use AllScalaRegistrar for SparkSqlSerializer to register serializers of ...

2014-05-16 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 aa5f989a5 - 7515367e3 [SPARK-1845] [SQL] Use AllScalaRegistrar for SparkSqlSerializer to register serializers of ... ...Scala collections. When I execute `orderBy` or `limit` for `SchemaRDD` including `ArrayType` or `MapType`,
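
A hedged sketch of the registration described, using Twitter chill's AllScalaRegistrar directly (illustrative; not the SparkSqlSerializer source):
```
import com.esotericsoftware.kryo.Kryo
import com.twitter.chill.AllScalaRegistrar

val kryo = new Kryo()
new AllScalaRegistrar().apply(kryo)  // registers serializers for the common Scala collection types
```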

git commit: Nicer logging for SecurityManager startup

2014-05-16 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 82ceda250 - 69e2726d4 Nicer logging for SecurityManager startup Happy to open a jira ticket if you'd like to track one there. Author: Andrew Ash and...@andrewash.com Closes #678 from ash211/SecurityManagerLogging and squashes the

git commit: [SQL] Implement between in hql

2014-05-16 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 ff47cdc0c - 386b31cbc [SQL] Implement between in hql Author: Michael Armbrust mich...@databricks.com Closes #804 from marmbrus/between and squashes the following commits: ae24672 [Michael Armbrust] add golden answer. d9997ef [Michael

git commit: bugfix: overflow of graphx Edge compare function

2014-05-16 Thread rxin
Repository: spark Updated Branches: refs/heads/master e304eb998 - fa6de408a bugfix: overflow of graphx Edge compare function Author: Zhen Peng zhenpen...@baidu.com Closes #769 from zhpengg/bugfix-graphx-edge-compare and squashes the following commits: 8a978ff [Zhen Peng] add ut for graphx
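
The bug class in miniature (an assumed illustration, not the exact GraphX patch): comparing Longs by subtracting them overflows for far-apart values.
```
def badCompare(a: Long, b: Long): Int  = (a - b).toInt                // overflow-prone
def goodCompare(a: Long, b: Long): Int = java.lang.Long.compare(a, b)

println(badCompare(Long.MaxValue, Long.MinValue))   // negative: overflow makes a look smaller than b
println(goodCompare(Long.MaxValue, Long.MinValue))  // 1, as expected
```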

git commit: [SPARK-1819] [SQL] Fix GetField.nullable.

2014-05-16 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 7515367e3 - f9eeddccb [SPARK-1819] [SQL] Fix GetField.nullable. `GetField.nullable` should be `true` not only when `field.nullable` is `true` but also when `child.nullable` is `true`. Author: Takuya UESHIN ues...@happy-camper.st

git commit: [SQL] Implement between in hql

2014-05-16 Thread rxin
Repository: spark Updated Branches: refs/heads/master fa6de408a - 032d6632a [SQL] Implement between in hql Author: Michael Armbrust mich...@databricks.com Closes #804 from marmbrus/between and squashes the following commits: ae24672 [Michael Armbrust] add golden answer. d9997ef [Michael

git commit: Fixes a misplaced comment.

2014-05-16 Thread rxin
Repository: spark Updated Branches: refs/heads/master a4aafe5f9 - e1e3416c4 Fixes a misplaced comment. Fixes a misplaced comment from #785. @pwendell Author: Prashant Sharma prashan...@imaginea.com Closes #788 from ScrapCodes/patch-1 and squashes the following commits: 3ef6a69 [Prashant

git commit: Typos in Spark

2014-05-16 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 2e418f517 - a2742d850 Typos in Spark Author: Huajian Mao huajian...@gmail.com Closes #798 from huajianmao/patch-1 and squashes the following commits: 208a454 [Huajian Mao] A typo in Task 1b515af [Huajian Mao] A typo in the message

git commit: [Spark-1461] Deferred Expression Evaluation (short-circuit evaluation)

2014-05-16 Thread rxin
Repository: spark Updated Branches: refs/heads/master bb98ecafc -> a20fea988 [Spark-1461] Deferred Expression Evaluation (short-circuit evaluation) This patch unifies the foldable and nullable interface for Expression. 1) Non-deterministic UDFs (like Rand()) cannot be folded. 2) Short-circuit will
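
Short-circuit evaluation in miniature (an assumed illustration, not the Catalyst code): the right operand of AND is only evaluated when the left operand is true.
```
def evalAnd(left: () => Boolean, right: () => Boolean): Boolean =
  if (!left()) false else right()

println(evalAnd(() => false, () => sys.error("right side never evaluated")))  // prints false
```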
