spark git commit: [SPARK-8702] [WEBUI] Avoid massive concating strings in Javascript

2015-06-29 Thread sarutak
Repository: spark Updated Branches: refs/heads/master 660c6cec7 -> 630bd5fd8 [SPARK-8702] [WEBUI] Avoid massive concating strings in Javascript When there are massive tasks, such as `sc.parallelize(1 to 10, 1).count()`, the generated JS codes have a lot of string concatenations in
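The performance issue behind this patch generalizes beyond JavaScript: building a large string by repeated concatenation can be quadratic, while collecting fragments and joining once is linear. A minimal Python sketch of the same idea (the helper names are illustrative, not from the patch):

```python
def build_rows_concat(tasks):
    # Repeated concatenation: each += may copy the whole accumulated
    # string, which degrades badly as the task count grows.
    html = ""
    for t in tasks:
        html += "<tr><td>" + str(t) + "</td></tr>"
    return html

def build_rows_join(tasks):
    # Collect fragments and join once: a single linear pass.
    return "".join("<tr><td>%s</td></tr>" % t for t in tasks)
```

Both produce identical output; only the cost model differs.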

spark git commit: Revert [SPARK-8372] History server shows incorrect information for application not started

2015-06-29 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master 715f084ca -> ea88b1a50 Revert [SPARK-8372] History server shows incorrect information for application not started This reverts commit 2837e067099921dd4ab6639ac5f6e89f789d4ff4. Project: http://git-wip-us.apache.org/repos/asf/spark/repo

spark git commit: Revert [SPARK-8372] History server shows incorrect information for application not started

2015-06-29 Thread andrewor14
Repository: spark Updated Branches: refs/heads/branch-1.4 9d9c4b476 -> f7c200e6a Revert [SPARK-8372] History server shows incorrect information for application not started This reverts commit f0513733d4f6fc34f86feffd3062600cbbd56a28. Project:

spark git commit: [SPARK-8661][ML] for LinearRegressionSuite.scala, changed javadoc-style comments to regular multiline comments, to make copy-pasting R code more simple

2015-06-29 Thread rxin
Repository: spark Updated Branches: refs/heads/master ed359de59 -> 4e880cf59 [SPARK-8661][ML] for LinearRegressionSuite.scala, changed javadoc-style comments to regular multiline comments, to make copy-pasting R code more simple for

spark git commit: [SPARK-8710] [SQL] Change ScalaReflection.mirror from a val to a def.

2015-06-29 Thread rxin
Repository: spark Updated Branches: refs/heads/master 4e880cf59 -> 4b497a724 [SPARK-8710] [SQL] Change ScalaReflection.mirror from a val to a def. jira: https://issues.apache.org/jira/browse/SPARK-8710 Author: Yin Huai yh...@databricks.com Closes #7094 from yhuai/SPARK-8710 and squashes the
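The general motivation for a change like this is that a value computed once at load time captures context (for a Scala reflection mirror, a classloader and thread) that may be wrong for later callers, whereas a `def` recomputes it per call. A rough Python analogy of "val vs def", assuming nothing about the actual Scala code:

```python
import threading

class Resource:
    def __init__(self):
        # Imagine construction captures per-thread context, the way a
        # Scala runtime mirror captures a classloader at creation time.
        self.owner = threading.current_thread().name

# "val"-style: computed once at definition time and shared by everyone.
SHARED = Resource()

# "def"-style: a fresh instance per call, so each caller sees context
# captured at call time rather than at module-load time.
def resource():
    return Resource()
```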

spark git commit: [SPARK-8710] [SQL] Change ScalaReflection.mirror from a val to a def.

2015-06-29 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.4 0c3f7fc88 -> 6a45d86db [SPARK-8710] [SQL] Change ScalaReflection.mirror from a val to a def. jira: https://issues.apache.org/jira/browse/SPARK-8710 Author: Yin Huai yh...@databricks.com Closes #7094 from yhuai/SPARK-8710 and squashes

spark git commit: [SPARK-8589] [SQL] cleanup DateTimeUtils

2015-06-29 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 4b497a724 -> 881662e9c [SPARK-8589] [SQL] cleanup DateTimeUtils move date time related operations into `DateTimeUtils` and rename some methods to make it more clear. Author: Wenchen Fan cloud0...@outlook.com Closes #6980 from

spark git commit: [SPARK-8634] [STREAMING] [TESTS] Fix flaky test StreamingListenerSuite receiver info reporting

2015-06-29 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master 881662e9c -> cec98525f [SPARK-8634] [STREAMING] [TESTS] Fix flaky test StreamingListenerSuite receiver info reporting As per the unit test log in https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35754/ ``` 15/06/24

spark git commit: [SPARK-7287] [SPARK-8567] [TEST] Add sc.stop to applications in SparkSubmitSuite

2015-06-29 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master cec98525f -> fbf75738f [SPARK-7287] [SPARK-8567] [TEST] Add sc.stop to applications in SparkSubmitSuite Hopefully, this suite will not be flaky anymore. Author: Yin Huai yh...@databricks.com Closes #7027 from yhuai/SPARK-8567 and squashes

spark git commit: [SQL][DOCS] Remove wrong example from DataFrame.scala

2015-06-29 Thread rxin
Repository: spark Updated Branches: refs/heads/master 492dca3a7 -> 94e040d05 [SQL][DOCS] Remove wrong example from DataFrame.scala In DataFrame.scala, there are examples like as follows. ``` * // The following are equivalent: * peopleDf.filter($"age" > 15) * peopleDf.where($"age" > 15) *

spark git commit: [SQL][DOCS] Remove wrong example from DataFrame.scala

2015-06-29 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.4 f7c200e6a -> da51cf58f [SQL][DOCS] Remove wrong example from DataFrame.scala In DataFrame.scala, there are examples like as follows. ``` * // The following are equivalent: * peopleDf.filter($"age" > 15) * peopleDf.where($"age" > 15) *

spark git commit: [SPARK-8692] [SQL] re-order the case statements that handling catalyst data types

2015-06-29 Thread lian
Repository: spark Updated Branches: refs/heads/master ea88b1a50 -> ed413bcc7 [SPARK-8692] [SQL] re-order the case statements that handling catalyst data types use same order: boolean, byte, short, int, date, long, timestamp, float, double, string, binary, decimal. Then we can easily check

spark git commit: [SPARK-8579] [SQL] support arbitrary object in UnsafeRow

2015-06-29 Thread davies
Repository: spark Updated Branches: refs/heads/master 931da5c8a -> ed359de59 [SPARK-8579] [SQL] support arbitrary object in UnsafeRow This PR brings arbitrary object support in UnsafeRow (both in grouping key and aggregation buffer). Two object pools will be created to hold those

spark git commit: [SPARK-7810] [PYSPARK] solve python rdd socket connection problem

2015-06-29 Thread davies
Repository: spark Updated Branches: refs/heads/branch-1.4 457d07eaa -> 187015f67 [SPARK-7810] [PYSPARK] solve python rdd socket connection problem Method _load_from_socket in rdd.py cannot load data from jvm socket when ipv6 is used. The current method only works well with ipv4. New

spark git commit: [SPARK-7810] [PYSPARK] solve python rdd socket connection problem

2015-06-29 Thread davies
Repository: spark Updated Branches: refs/heads/master f6fc254ec -> ecd3aacf2 [SPARK-7810] [PYSPARK] solve python rdd socket connection problem Method _load_from_socket in rdd.py cannot load data from jvm socket when ipv6 is used. The current method only works well with ipv4. New modification
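The standard fix pattern for IPv4-only client code is to resolve the host with `socket.getaddrinfo` and try each returned address family, rather than hardcoding `AF_INET`. A hedged sketch of that pattern (not the actual `rdd.py` code):

```python
import socket

def connect(host, port, timeout=3.0):
    # getaddrinfo returns a candidate per address family the host
    # resolves to (AF_INET and/or AF_INET6); try each in order.
    err = None
    for family, socktype, proto, _, addr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(addr)
            return sock
        except OSError as e:
            err = e
            sock.close()
    if err:
        raise err
    raise OSError("getaddrinfo returned no addresses for %s" % host)
```

Because the family comes from the resolver, the same code path handles `127.0.0.1`, `::1`, and dual-stack hostnames.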

spark git commit: [HOTFIX] Fix whitespace style error

2015-06-29 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.4 0de1737a8 -> 0c3f7fc88 [HOTFIX] Fix whitespace style error Author: Michael Armbrust mich...@databricks.com Closes #7102 from marmbrus/fixStyle and squashes the following commits: 8c08124 [Michael Armbrust] [HOTFIX] Fix whitespace

spark git commit: [SQL] [MINOR] Skip unresolved expression for InConversion

2015-06-29 Thread lian
Repository: spark Updated Branches: refs/heads/branch-1.4 6b9f3831a -> 457d07eaa [SQL] [MINOR] Skip unresolved expression for InConversion Author: scwf wangf...@huawei.com Closes #6145 from scwf/InConversion and squashes the following commits: 5c8ac6b [scwf] minir fix for InConversion

spark git commit: [SPARK-8056][SQL] Design an easier way to construct schema for both Scala and Python

2015-06-29 Thread rxin
Repository: spark Updated Branches: refs/heads/master 27ef85451 -> f6fc254ec [SPARK-8056][SQL] Design an easier way to construct schema for both Scala and Python I've added functionality to create new StructType similar to how we add parameters to a new SparkContext. I've also added tests
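The incremental-construction idea can be sketched in plain Python without Spark; the `Schema`/`add` names below are illustrative stand-ins, not the actual `StructType` API:

```python
class Schema:
    def __init__(self):
        self.fields = []

    def add(self, name, dtype, nullable=True):
        # Returning self makes calls chainable, so a schema reads like
        # a declaration rather than a sequence of append statements.
        self.fields.append((name, dtype, nullable))
        return self

# Fields accumulate one call at a time, mirroring how parameters are
# added to a builder-style constructor.
schema = Schema().add("name", "string").add("age", "int", nullable=False)
```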

spark git commit: [SPARK-8070] [SQL] [PYSPARK] avoid spark jobs in createDataFrame

2015-06-29 Thread rxin
Repository: spark Updated Branches: refs/heads/master be7ef0676 -> afae9766f [SPARK-8070] [SQL] [PYSPARK] avoid spark jobs in createDataFrame Avoid the unnecessary jobs when infer schema from list. cc yhuai mengxr Author: Davies Liu dav...@databricks.com Closes #6606 from
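Conceptually, inferring a schema from a local list needs no cluster job at all: only a bounded local prefix of the data has to be examined. A minimal sketch of sampling-based inference (the function and field names are hypothetical):

```python
def infer_schema(rows, sample=100):
    # Inspect only a local sample of dict-shaped rows; no distributed
    # job is launched, since the data is already in driver memory.
    fields = {}
    for row in rows[:sample]:
        for key, value in row.items():
            # First type seen for a key wins in this simplified sketch.
            fields.setdefault(key, type(value).__name__)
    return fields
```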

[4/4] spark git commit: [SPARK-8478] [SQL] Harmonize UDF-related code to use uniformly UDF instead of Udf

2015-06-29 Thread marmbrus
[SPARK-8478] [SQL] Harmonize UDF-related code to use uniformly UDF instead of Udf Follow-up of #6902 for being coherent between ```Udf``` and ```UDF``` Author: BenFradet benjamin.fra...@gmail.com Closes #6920 from BenFradet/SPARK-8478 and squashes the following commits: c500f29 [BenFradet]

[2/4] spark git commit: [SPARK-8478] [SQL] Harmonize UDF-related code to use uniformly UDF instead of Udf

2015-06-29 Thread marmbrus
http://git-wip-us.apache.org/repos/asf/spark/blob/931da5c8/sql/core/src/main/scala/org/apache/spark/sql/execution/pythonUDFs.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/pythonUDFs.scala

[3/4] spark git commit: [SPARK-8478] [SQL] Harmonize UDF-related code to use uniformly UDF instead of Udf

2015-06-29 Thread marmbrus
http://git-wip-us.apache.org/repos/asf/spark/blob/931da5c8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUdf.scala -- diff --git

[1/4] spark git commit: [SPARK-8478] [SQL] Harmonize UDF-related code to use uniformly UDF instead of Udf

2015-06-29 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master c8ae887ef -> 931da5c8a http://git-wip-us.apache.org/repos/asf/spark/blob/931da5c8/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala -- diff

spark git commit: [SPARK-7810] [PYSPARK] solve python rdd socket connection problem

2015-06-29 Thread davies
Repository: spark Updated Branches: refs/heads/branch-1.3 ac3591d09 -> 0ce83db11 [SPARK-7810] [PYSPARK] solve python rdd socket connection problem Method _load_from_socket in rdd.py cannot load data from jvm socket when ipv6 is used. The current method only works well with ipv4. New

spark git commit: [SPARK-7862] [SQL] Disable the error message redirect to stderr

2015-06-29 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 637b4eeda -> c6ba2ea34 [SPARK-7862] [SQL] Disable the error message redirect to stderr This is a follow up of #6404, the ScriptTransformation prints the error msg into stderr directly, probably be a disaster for application log. Author:

spark git commit: [SPARK-8681] fixed wrong ordering of columns in crosstab

2015-06-29 Thread rxin
Repository: spark Updated Branches: refs/heads/master c6ba2ea34 -> be7ef0676 [SPARK-8681] fixed wrong ordering of columns in crosstab I specifically randomized the test. What crosstab does is equivalent to a countByKey, therefore if this test fails again for any reason, we will know that we
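The observation that crosstab "is equivalent to a countByKey" can be checked in a few lines of plain Python, with `Counter` standing in for the distributed count (this is a sketch of the semantics, not Spark's implementation):

```python
from collections import Counter

def crosstab(pairs):
    # Count co-occurrences of (row_value, col_value), then lay the
    # counts out as a nested dict. Sorting the keys gives the
    # deterministic column ordering the fix is about.
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}
```

Randomizing the input order leaves the output unchanged, which is exactly the property the randomized test exercises.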

spark git commit: [SPARK-8681] fixed wrong ordering of columns in crosstab

2015-06-29 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.4 da51cf58f -> 6b9f3831a [SPARK-8681] fixed wrong ordering of columns in crosstab I specifically randomized the test. What crosstab does is equivalent to a countByKey, therefore if this test fails again for any reason, we will know that

spark git commit: [SPARK-8709] Exclude hadoop-client's mockito-all dependency

2015-06-29 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master afae9766f -> 27ef85451 [SPARK-8709] Exclude hadoop-client's mockito-all dependency This patch excludes `hadoop-client`'s dependency on `mockito-all`. As of #7061, Spark depends on `mockito-core` instead of `mockito-all`, so the

spark git commit: [SPARK-8567] [SQL] Add logs to record the progress of HiveSparkSubmitSuite (1.4 branch)

2015-06-29 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 187015f67 -> 0de1737a8 [SPARK-8567] [SQL] Add logs to record the progress of HiveSparkSubmitSuite (1.4 branch) Cherry-pick f9b397f54d1c491680d70aba210bb8211fd249c1 to branch 1.4. Author: Yin Huai yh...@databricks.com Closes #7092

spark git commit: [SPARK-8355] [SQL] Python DataFrameReader/Writer should mirror Scala

2015-06-29 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.4 e1bbf1a08 -> 9d9c4b476 [SPARK-8355] [SQL] Python DataFrameReader/Writer should mirror Scala I compared PySpark DataFrameReader/Writer against Scala ones. `Option` function is missing in both reader and writer, but the rest seems to all

spark git commit: [SPARK-8698] partitionBy in Python DataFrame reader/writer interface should not default to empty tuple.

2015-06-29 Thread rxin
Repository: spark Updated Branches: refs/heads/master ac2e17b01 -> 660c6cec7 [SPARK-8698] partitionBy in Python DataFrame reader/writer interface should not default to empty tuple. Author: Reynold Xin r...@databricks.com Closes #7079 from rxin/SPARK-8698 and squashes the following commits:
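The general design point is that an optional argument should default to a sentinel such as `None` rather than an empty collection, so "not specified" stays distinguishable from "specified as empty". A generic sketch, unrelated to the actual writer internals:

```python
def save(path, partition_by=None):
    # None means the caller did not ask for partitioning at all; an
    # empty-tuple default would silently route every call through the
    # partitioned code path.
    if partition_by:
        return "partitioned by %s" % ",".join(partition_by)
    return "unpartitioned"
```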

spark git commit: [SPARK-8355] [SQL] Python DataFrameReader/Writer should mirror Scala

2015-06-29 Thread rxin
Repository: spark Updated Branches: refs/heads/master 0b10662fe -> ac2e17b01 [SPARK-8355] [SQL] Python DataFrameReader/Writer should mirror Scala I compared PySpark DataFrameReader/Writer against Scala ones. `Option` function is missing in both reader and writer, but the rest seems to all

spark git commit: [SPARK-8634] [STREAMING] [TESTS] Fix flaky test StreamingListenerSuite receiver info reporting

2015-06-29 Thread andrewor14
Repository: spark Updated Branches: refs/heads/branch-1.4 6a45d86db -> f84f24769 [SPARK-8634] [STREAMING] [TESTS] Fix flaky test StreamingListenerSuite receiver info reporting As per the unit test log in https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35754/ ``` 15/06/24

spark git commit: [SPARK-8019] [SPARKR] Support SparkR spawning worker R processes with a command other than Rscript

2015-06-29 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master d7f796da4 -> 4a9e03fa8 [SPARK-8019] [SPARKR] Support SparkR spawning worker R processes with a command other than Rscript This is a simple change to add a new environment variable spark.sparkr.r.command that specifies the command that

spark git commit: [SPARK-8715] ArrayOutOfBoundsException fixed for DataFrameStatSuite.crosstab

2015-06-29 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 80d53565a -> ffc793a6c [SPARK-8715] ArrayOutOfBoundsException fixed for DataFrameStatSuite.crosstab cc yhuai Author: Burak Yavuz brk...@gmail.com Closes #7100 from brkyvz/ct-flakiness-fix and squashes the following commits: abc299a

spark git commit: [SPARK-8437] [DOCS] Using directory path without wildcard for filename slow for large number of files with wholeTextFiles and binaryFiles

2015-06-29 Thread andrewor14
Repository: spark Updated Branches: refs/heads/branch-1.4 cdfa388dd -> b2684557f [SPARK-8437] [DOCS] Using directory path without wildcard for filename slow for large number of files with wholeTextFiles and binaryFiles Note that 'dir/*' can be more efficient in some Hadoop FS implementations

spark git commit: [SPARK-8456] [ML] Ngram featurizer python

2015-06-29 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 4c1808be4 -> 620605a4a [SPARK-8456] [ML] Ngram featurizer python Python API for N-gram feature transformer Author: Feynman Liang fli...@databricks.com Closes #6960 from feynmanliang/ngram-featurizer-python and squashes the following
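The ML transformer itself needs a Spark session, but the n-gram operation it wraps is simple: slide a window of size n over a token list. A dependency-free sketch of that operation (not the transformer's code):

```python
def ngrams(tokens, n=2):
    # Join each length-n window of tokens into a space-separated
    # n-gram; a list shorter than n yields no n-grams.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
```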

spark git commit: [SPARK-8410] [SPARK-8475] remove previous ivy resolution when using spark-submit

2015-06-29 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master 5d30eae56 -> d7f796da4 [SPARK-8410] [SPARK-8475] remove previous ivy resolution when using spark-submit This PR also includes re-ordering the order that repositories are used when resolving packages. User provided repositories will be

spark git commit: [SPARK-8410] [SPARK-8475] remove previous ivy resolution when using spark-submit

2015-06-29 Thread andrewor14
Repository: spark Updated Branches: refs/heads/branch-1.4 b2684557f -> c0fbd6781 [SPARK-8410] [SPARK-8475] remove previous ivy resolution when using spark-submit This PR also includes re-ordering the order that repositories are used when resolving packages. User provided repositories will be

spark git commit: Revert [SPARK-8437] [DOCS] Using directory path without wildcard for filename slow for large number of files with wholeTextFiles and binaryFiles

2015-06-29 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master 4a9e03fa8 -> 4c1808be4 Revert [SPARK-8437] [DOCS] Using directory path without wildcard for filename slow for large number of files with wholeTextFiles and binaryFiles This reverts commit 5d30eae56051c563a8427f330b09ef66db0a0d21.

spark git commit: Revert [SPARK-8437] [DOCS] Using directory path without wildcard for filename slow for large number of files with wholeTextFiles and binaryFiles

2015-06-29 Thread andrewor14
Repository: spark Updated Branches: refs/heads/branch-1.4 c0fbd6781 -> 80d53565a Revert [SPARK-8437] [DOCS] Using directory path without wildcard for filename slow for large number of files with wholeTextFiles and binaryFiles This reverts commit b2684557fa0d2ec14b7529324443c8154d81c348.

spark git commit: [SPARK-8437] [DOCS] Using directory path without wildcard for filename slow for large number of files with wholeTextFiles and binaryFiles

2015-06-29 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master fbf75738f -> 5d30eae56 [SPARK-8437] [DOCS] Using directory path without wildcard for filename slow for large number of files with wholeTextFiles and binaryFiles Note that 'dir/*' can be more efficient in some Hadoop FS implementations that

spark git commit: [SPARK-7287] [SPARK-8567] [TEST] Add sc.stop to applications in SparkSubmitSuite

2015-06-29 Thread andrewor14
Repository: spark Updated Branches: refs/heads/branch-1.4 f84f24769 -> cdfa388dd [SPARK-7287] [SPARK-8567] [TEST] Add sc.stop to applications in SparkSubmitSuite Hopefully, this suite will not be flaky anymore. Author: Yin Huai yh...@databricks.com Closes #7027 from yhuai/SPARK-8567 and

spark git commit: [SPARK-8669] [SQL] Fix crash with BINARY (ENUM) fields with Parquet 1.7

2015-06-29 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master ecacb1e88 -> 4915e9e3b [SPARK-8669] [SQL] Fix crash with BINARY (ENUM) fields with Parquet 1.7 Patch to fix crash with BINARY fields with ENUM original types. Author: Steven She ste...@canopylabs.com Closes #7048 from

spark git commit: [SPARK-7667] [MLLIB] MLlib Python API consistency check

2015-06-29 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 4915e9e3b -> f9b6bf2f8 [SPARK-7667] [MLLIB] MLlib Python API consistency check MLlib Python API consistency check Author: Yanbo Liang yblia...@gmail.com Closes #6856 from yanboliang/spark-7667 and squashes the following commits: 21bae35

Git Push Summary

2015-06-29 Thread pwendell
Repository: spark Updated Branches: refs/heads/scala-2.9 [deleted] d2efe1357

spark git commit: MAINTENANCE: Automated closing of pull requests.

2015-06-29 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 7bbbe380c -> ea775b066 MAINTENANCE: Automated closing of pull requests. This commit exists to close the following pull requests on Github: Closes #1767 (close requested by 'andrewor14') Closes #6952 (close requested by 'andrewor14') Closes

spark git commit: [SPARK-5161] Parallelize Python test execution

2015-06-29 Thread davies
Repository: spark Updated Branches: refs/heads/master f9b6bf2f8 -> 7bbbe380c [SPARK-5161] Parallelize Python test execution This commit parallelizes the Python unit test execution, significantly reducing Jenkins build times. Parallelism is now configurable by passing the `-p` or
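The shape of such a parallel test runner can be sketched with the standard library; the module names and `run_module` stand-in below are illustrative, and the real runner launches a subprocess per test module rather than a thread:

```python
from concurrent.futures import ThreadPoolExecutor

def run_module(name):
    # Stand-in for launching one test module and collecting its exit
    # status; 0 means the module passed.
    return (name, 0)

def run_all(modules, parallelism=4):
    # Run up to `parallelism` modules concurrently and report failures.
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        results = list(pool.map(run_module, modules))
    return [name for name, status in results if status != 0]
```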

spark git commit: [SPARK-8650] [SQL] Use the user-specified app name priority in SparkSQLCLIDriver or HiveThriftServer2

2015-06-29 Thread yhuai
Repository: spark Updated Branches: refs/heads/master f79410c49 -> e6c3f7462 [SPARK-8650] [SQL] Use the user-specified app name priority in SparkSQLCLIDriver or HiveThriftServer2 When run `./bin/spark-sql --name query1.sql` [Before]

spark git commit: [SPARK-8721][SQL] Rename ExpectsInputTypes => AutoCastInputTypes.

2015-06-29 Thread rxin
Repository: spark Updated Branches: refs/heads/master ea775b066 -> f79410c49 [SPARK-8721][SQL] Rename ExpectsInputTypes => AutoCastInputTypes. Author: Reynold Xin r...@databricks.com Closes #7109 from rxin/auto-cast and squashes the following commits: a914cc3 [Reynold Xin] [SPARK-8721][SQL]

spark git commit: [SPARK-8214] [SQL] Add function hex

2015-06-29 Thread davies
Repository: spark Updated Branches: refs/heads/master 94e040d05 -> 637b4eeda [SPARK-8214] [SQL] Add function hex cc chenghao-intel adrian-wang Author: zhichao.li zhichao...@intel.com Closes #6976 from zhichao-li/hex and squashes the following commits: e218d1b [zhichao.li] turn off