spark git commit: [SQL] Fix typo in DataframeWriter doc

2017-07-30 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 6550086bb -> 51f99fb25 [SQL] Fix typo in DataframeWriter doc ## What changes were proposed in this pull request? The format of none should be consistent with other compression codecs (`snappy`, `lz4`) as `none`. ## How was this

spark git commit: [SPARK-12717][PYTHON][BRANCH-2.2] Adding thread-safe broadcast pickle registry

2017-08-02 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.2 467ee8dff -> 690f491f6 [SPARK-12717][PYTHON][BRANCH-2.2] Adding thread-safe broadcast pickle registry ## What changes were proposed in this pull request? When using PySpark broadcast variables in a multi-threaded environment,

spark git commit: [SPARK-12717][PYTHON][BRANCH-2.1] Adding thread-safe broadcast pickle registry

2017-08-02 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.1 b31b30209 -> d93e45b8b [SPARK-12717][PYTHON][BRANCH-2.1] Adding thread-safe broadcast pickle registry ## What changes were proposed in this pull request? When using PySpark broadcast variables in a multi-threaded environment,

spark git commit: [SPARK-21602][R] Add map_keys and map_values functions to R

2017-08-03 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master e7c59b417 -> 97ba49183 [SPARK-21602][R] Add map_keys and map_values functions to R ## What changes were proposed in this pull request? This PR adds `map_values` and `map_keys` to the R API. ```r > df <- createDataFrame(cbind(model =
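
The new R wrappers expose the existing map_keys/map_values SQL functions. For reference, a minimal sketch of the same semantics through the Python API, which also gains them in 2.3; the data is illustrative:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import map_keys, map_values

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([({"a": 1, "b": 2},)], ["m"])

# map_keys/map_values return arrays of the map's keys and values.
df.select(map_keys(df.m), map_values(df.m)).show()
```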

spark git commit: [SPARK-12717][PYTHON] Adding thread-safe broadcast pickle registry

2017-08-01 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 58da1a245 -> 77cc0d67d [SPARK-12717][PYTHON] Adding thread-safe broadcast pickle registry ## What changes were proposed in this pull request? When using PySpark broadcast variables in a multi-threaded environment,

spark git commit: [SPARK-21712][PYSPARK] Clarify type error for Column.substr()

2017-08-15 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 42b9eda80 -> 966083105 [SPARK-21712][PYSPARK] Clarify type error for Column.substr() Proposed changes: * Clarify the type error that `Column.substr()` gives. Test plan: * Tested this manually. * Test code: ```python from
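
A minimal sketch of the call the clarified error concerns: `substr` expects `startPos` and `length` to share one type (both int or both Column), and mixing them is what now raises the clearer TypeError.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("Alice",)], ["name"])

# Positions are 1-based: characters 1..3 of "Alice" -> "Ali".
df.select(df.name.substr(1, 3).alias("prefix")).show()
```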

spark git commit: [MINOR][BUILD] Download RAT and R version info over HTTPS; use RAT 0.12

2017-08-11 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master da8c59bde -> b0bdfce9c [MINOR][BUILD] Download RAT and R version info over HTTPS; use RAT 0.12 ## What changes were proposed in this pull request? This is trivial, but bugged me. We should download software over HTTPS. And we can use RAT

spark git commit: [SPARK-21658][SQL][PYSPARK] Add default None for value in na.replace in PySpark

2017-08-14 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 6847e93cf -> 0fcde87aa [SPARK-21658][SQL][PYSPARK] Add default None for value in na.replace in PySpark ## What changes were proposed in this pull request? JIRA issue: https://issues.apache.org/jira/browse/SPARK-21658 Add default None for
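
With `value` defaulting to None, a dict alone can carry the replacements. A minimal sketch; the data is illustrative:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("Alice", 10), ("Bob", 20)], ["name", "age"])

# The dict maps old values to new ones; no separate `value` argument needed.
df.na.replace({"Alice": "A", "Bob": "B"}).show()
```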

spark-website git commit: Update committer page

2017-07-28 Thread gurwls223
Repository: spark-website Updated Branches: refs/heads/asf-site 6ff5039f3 -> 0e09b2f58 Update committer page Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/0e09b2f5 Tree:

spark git commit: [SPARKR][BUILD] AppVeyor change to latest R version

2017-08-06 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 1ba967b25 -> d4e7f20f5 [SPARKR][BUILD] AppVeyor change to latest R version ## What changes were proposed in this pull request? R version update ## How was this patch tested? AppVeyor Author: Felix Cheung

spark git commit: [MINOR] Minor comment fixes in merge_spark_pr.py script

2017-07-30 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 6830e90de -> f1a798b57 [MINOR] Minor comment fixes in merge_spark_pr.py script ## What changes were proposed in this pull request? This PR proposes to fix a few typos in `merge_spark_pr.py`. - `# usage: ./apache-pr-merge.py

spark git commit: [MINOR][R][BUILD] More reliable detection of R version for Windows in AppVeyor

2017-08-08 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master ee1304199 -> 08ef7d718 [MINOR][R][BUILD] More reliable detection of R version for Windows in AppVeyor ## What changes were proposed in this pull request? This PR proposes to use https://rversions.r-pkg.org/r-release-win instead of

spark git commit: [INFRA] Close stale PRs

2017-08-05 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 894d5a453 -> 3a45c7fee [INFRA] Close stale PRs ## What changes were proposed in this pull request? This PR proposes to close stale PRs, mostly the same instances with #18017 Closes #14085 - [SPARK-16408][SQL] SparkSQL Added file get

spark git commit: [SPARK-21778][SQL] Simpler Dataset.sample API in Scala / Java

2017-08-18 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 310454be3 -> 07a2b8738 [SPARK-21778][SQL] Simpler Dataset.sample API in Scala / Java ## What changes were proposed in this pull request? Dataset.sample requires a boolean flag withReplacement as the first argument. However, most of the

spark git commit: [SPARK-21513][SQL][FOLLOWUP] Allow UDF to_json support converting MapType to json for PySpark and SparkR

2017-09-14 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 054ddb2f5 -> a28728a9a [SPARK-21513][SQL][FOLLOWUP] Allow UDF to_json support converting MapType to json for PySpark and SparkR ## What changes were proposed in this pull request? In the previous work, SPARK-21513, we allowed `MapType` and
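
A minimal PySpark sketch of what the follow-up enables; the column name is illustrative:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import to_json

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([({"a": 1},)], ["m"])  # inferred as a MapType column

# to_json now accepts MapType (and arrays of maps), not just structs.
df.select(to_json(df.m).alias("json")).show()  # -> {"a":1}
```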

spark git commit: [SPARK-18136] Fix SPARK_JARS_DIR for Python pip install on Windows

2017-09-23 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master f180b6534 -> c11f24a94 [SPARK-18136] Fix SPARK_JARS_DIR for Python pip install on Windows ## What changes were proposed in this pull request? Fix for setup of `SPARK_JARS_DIR` on Windows as it looks for `%SPARK_HOME%\RELEASE` file

spark git commit: [SPARK-18136] Fix SPARK_JARS_DIR for Python pip install on Windows

2017-09-23 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.2 de6274a58 -> c0a34a9ff [SPARK-18136] Fix SPARK_JARS_DIR for Python pip install on Windows ## What changes were proposed in this pull request? Fix for setup of `SPARK_JARS_DIR` on Windows as it looks for `%SPARK_HOME%\RELEASE` file

spark git commit: [SPARK-18136] Fix SPARK_JARS_DIR for Python pip install on Windows

2017-09-23 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.1 03db72149 -> 0b3e7cc6a [SPARK-18136] Fix SPARK_JARS_DIR for Python pip install on Windows ## What changes were proposed in this pull request? Fix for setup of `SPARK_JARS_DIR` on Windows as it looks for `%SPARK_HOME%\RELEASE` file

spark git commit: [SPARK-21780][R] Simpler Dataset.sample API in R

2017-09-21 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 1da5822e6 -> a8d9ec8a6 [SPARK-21780][R] Simpler Dataset.sample API in R ## What changes were proposed in this pull request? This PR makes `sample(...)` able to omit `withReplacement`, defaulting to `FALSE`. In short, the following examples

spark git commit: [SPARK-22086][DOCS] Add expression description for CASE WHEN

2017-09-21 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 1d1a09be9 -> 1270e7175 [SPARK-22086][DOCS] Add expression description for CASE WHEN ## What changes were proposed in this pull request? In SQL conditional expressions, only CASE WHEN lacks an expression description. This patch fills the

spark git commit: [SPARK-22032][PYSPARK] Speed up StructType conversion

2017-09-17 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 73d906722 -> f4073020a [SPARK-22032][PYSPARK] Speed up StructType conversion ## What changes were proposed in this pull request? StructType.fromInternal is calling f.fromInternal(v) for every field. We can use precalculated information

spark git commit: [SPARK-21985][PYSPARK] PairDeserializer is broken for double-zipped RDDs

2017-09-17 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.2 51e5a821d -> 42852bb17 [SPARK-21985][PYSPARK] PairDeserializer is broken for double-zipped RDDs ## What changes were proposed in this pull request? (edited) Fixes a bug introduced in #16121. In PairDeserializer, convert each batch of

spark git commit: [SPARK-21985][PYSPARK] PairDeserializer is broken for double-zipped RDDs

2017-09-17 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.1 e49c997fe -> 3ae7ab8e8 [SPARK-21985][PYSPARK] PairDeserializer is broken for double-zipped RDDs ## What changes were proposed in this pull request? (edited) Fixes a bug introduced in #16121. In PairDeserializer, convert each batch of

spark git commit: [SPARK-21985][PYSPARK] PairDeserializer is broken for double-zipped RDDs

2017-09-17 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master f4073020a -> 6adf67dd1 [SPARK-21985][PYSPARK] PairDeserializer is broken for double-zipped RDDs ## What changes were proposed in this pull request? (edited) Fixes a bug introduced in #16121. In PairDeserializer, convert each batch of keys
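
A sketch of the double-zip shape the fix repairs, runnable against any SparkContext; the data is illustrative:

```python
from pyspark import SparkContext

sc = SparkContext.getOrCreate()
rdd = sc.parallelize(range(10))

# Zipping an already-zipped RDD was the pattern that corrupted batches.
once = rdd.zip(rdd)
twice = once.zip(once)
print(twice.take(2))  # [((0, 0), (0, 0)), ((1, 1), (1, 1))]
```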

spark git commit: [SPARK-22043][PYTHON] Improves error message for show_profiles and dump_profiles

2017-09-17 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.2 309c401a5 -> a86831d61 [SPARK-22043][PYTHON] Improves error message for show_profiles and dump_profiles ## What changes were proposed in this pull request? This PR proposes to improve error message from: ``` >>> sc.show_profiles()

spark git commit: [SPARK-22043][PYTHON] Improves error message for show_profiles and dump_profiles

2017-09-17 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.1 99de4b8f5 -> b35136a9e [SPARK-22043][PYTHON] Improves error message for show_profiles and dump_profiles ## What changes were proposed in this pull request? This PR proposes to improve error message from: ``` >>> sc.show_profiles()

spark git commit: [SPARK-22043][PYTHON] Improves error message for show_profiles and dump_profiles

2017-09-17 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 6308c65f0 -> 7c7266208 [SPARK-22043][PYTHON] Improves error message for show_profiles and dump_profiles ## What changes were proposed in this pull request? This PR proposes to improve error message from: ``` >>> sc.show_profiles()
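
The calls are only meaningful once profiling is on; a sketch of enabling it, assuming the standard `spark.python.profile` config:

```python
from pyspark import SparkConf, SparkContext

# Without this config, show_profiles()/dump_profiles() now raise an
# error pointing at it instead of failing on an internal None.
conf = SparkConf().set("spark.python.profile", "true")
sc = SparkContext(conf=conf)
sc.parallelize(range(100)).map(lambda x: x * 2).count()
sc.show_profiles()
```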

spark git commit: [SPARK-21766][PYSPARK][SQL] DataFrame toPandas() raises ValueError with nullable int columns

2017-09-22 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master d2b2932d8 -> 3e6a714c9 [SPARK-21766][PYSPARK][SQL] DataFrame toPandas() raises ValueError with nullable int columns ## What changes were proposed in this pull request? When calling `DataFrame.toPandas()` (without Arrow enabled), if there
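
A sketch of the behavior described, assuming the fix as merged (a null in an integer column forces the pandas dtype to float64, since NaN cannot live in an int64 column):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1,), (None,)], ["x"])

# Previously this could raise ValueError during dtype conversion.
pdf = df.toPandas()
print(pdf.dtypes)  # x -> float64; the null row becomes NaN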

spark git commit: [SPARK-22049][DOCS] Confusing behavior of from_utc_timestamp and to_utc_timestamp

2017-09-20 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 2b6ff0cef -> e17901d6d [SPARK-22049][DOCS] Confusing behavior of from_utc_timestamp and to_utc_timestamp ## What changes were proposed in this pull request? Clarify behavior of to_utc_timestamp/from_utc_timestamp with an example ## How
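
The clarified doc example, runnable in PySpark: the input is read as a UTC instant and rendered in the target zone, independent of the session timezone.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Asia/Seoul is UTC+9, so midnight UTC renders as 09:00.
spark.sql("SELECT from_utc_timestamp('2016-08-31', 'Asia/Seoul')").show(truncate=False)
# -> 2016-08-31 09:00:00
```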

spark git commit: [SPARK-21877][DEPLOY, WINDOWS] Handle quotes in Windows command scripts

2017-10-06 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 0c03297bf -> c7b46d4d8 [SPARK-21877][DEPLOY, WINDOWS] Handle quotes in Windows command scripts ## What changes were proposed in this pull request? None of the Windows command scripts can handle quotes in parameters. Run a Windows command

spark git commit: [SPARK-20396][SQL][PYSPARK] groupby().apply() with pandas udf

2017-10-10 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 2028e5a82 -> bfc7e1fe1 [SPARK-20396][SQL][PYSPARK] groupby().apply() with pandas udf ## What changes were proposed in this pull request? This PR adds an apply() function on df.groupby(). apply() takes a pandas udf that is a
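
A sketch of the API as it landed in Spark 2.3: a GROUPED_MAP pandas_udf maps one pandas.DataFrame per group to another with the declared schema.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf, PandasUDFType

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0)], ("id", "v"))

@pandas_udf("id long, v double", PandasUDFType.GROUPED_MAP)
def subtract_mean(pdf):
    # pdf is the full pandas.DataFrame for one group.
    return pdf.assign(v=pdf.v - pdf.v.mean())

df.groupby("id").apply(subtract_mean).show()
```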

spark git commit: [SPARK-22206][SQL][SPARKR] gapply in R can't work on empty grouping columns

2017-10-05 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master c8affec21 -> ae61f187a [SPARK-22206][SQL][SPARKR] gapply in R can't work on empty grouping columns ## What changes were proposed in this pull request? Looks like `FlatMapGroupsInRExec.requiredChildDistribution` didn't consider empty

spark git commit: [SPARK-22206][SQL][SPARKR] gapply in R can't work on empty grouping columns

2017-10-05 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.1 5bb4e931b -> 920372a19 [SPARK-22206][SQL][SPARKR] gapply in R can't work on empty grouping columns ## What changes were proposed in this pull request? Looks like `FlatMapGroupsInRExec.requiredChildDistribution` didn't consider empty

spark git commit: [SPARK-22206][SQL][SPARKR] gapply in R can't work on empty grouping columns

2017-10-05 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.2 81232ce03 -> 8a4e7dd89 [SPARK-22206][SQL][SPARKR] gapply in R can't work on empty grouping columns ## What changes were proposed in this pull request? Looks like `FlatMapGroupsInRExec.requiredChildDistribution` didn't consider empty

spark git commit: [SPARK-22233][CORE] Allow user to filter out empty split in HadoopRDD

2017-10-14 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master e0503a722 -> 014dc8471 [SPARK-22233][CORE] Allow user to filter out empty split in HadoopRDD ## What changes were proposed in this pull request? Add a flag spark.files.ignoreEmptySplits. When true, methods like that use HadoopRDD and

spark git commit: [SPARK-21726][SQL][FOLLOW-UP] Check for structural integrity of the plan in Optimizer in test mode

2017-09-08 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master dbb824125 -> 0dfc1ec59 [SPARK-21726][SQL][FOLLOW-UP] Check for structural integrity of the plan in Optimizer in test mode ## What changes were proposed in this pull request? The condition in `Optimizer.isPlanIntegral` is wrong. We should

spark git commit: [SPARK-21875][BUILD] Fix Java style bugs

2017-08-30 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master d8f454086 -> 313c6ca43 [SPARK-21875][BUILD] Fix Java style bugs ## What changes were proposed in this pull request? Fix Java code style so `./dev/lint-java` succeeds ## How was this patch tested? Run `./dev/lint-java` Author: Andrew

spark git commit: [SPARK-21839][SQL] Support SQL config for ORC compression

2017-08-30 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 6949a9c5c -> d8f454086 [SPARK-21839][SQL] Support SQL config for ORC compression ## What changes were proposed in this pull request? This PR aims to support `spark.sql.orc.compression.codec` like Parquet's
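
A minimal sketch of the new config in use; the path is illustrative, and an explicit writer option would still take precedence:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Sets the default ORC codec for writes in this session.
spark.conf.set("spark.sql.orc.compression.codec", "zlib")
spark.range(10).write.mode("overwrite").orc("/tmp/orc_zlib")
```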

spark git commit: [SPARK-21903][BUILD][FOLLOWUP] Upgrade scalastyle-maven-plugin and scalastyle as well in POM and SparkBuild.scala

2017-09-06 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 16c4c03c7 -> 64936c14a [SPARK-21903][BUILD][FOLLOWUP] Upgrade scalastyle-maven-plugin and scalastyle as well in POM and SparkBuild.scala ## What changes were proposed in this pull request? This PR proposes to match scalastyle version in

spark git commit: Fixed pandoc dependency issue in python/setup.py

2017-09-06 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master fa0092bdd -> aad212547 Fixed pandoc dependency issue in python/setup.py ## Problem Description When pyspark is listed as a dependency of another package, installing the other package will cause an install failure in pyspark. When the

spark git commit: Fixed pandoc dependency issue in python/setup.py

2017-09-06 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.2 342cc2a4c -> 49968de52 Fixed pandoc dependency issue in python/setup.py ## Problem Description When pyspark is listed as a dependency of another package, installing the other package will cause an install failure in pyspark. When the

spark git commit: [SPARK-21513][SQL] Allow UDF to_json support converting MapType to json

2017-09-12 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 1a9857476 -> 371e4e205 [SPARK-21513][SQL] Allow UDF to_json support converting MapType to json # What changes were proposed in this pull request? UDF to_json only supports converting `StructType` or `ArrayType` of `StructType`s to a json

spark git commit: [SPARK-20098][PYSPARK] dataType's typeName fix

2017-09-10 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.2 182478e03 -> b1b5a7fdc [SPARK-20098][PYSPARK] dataType's typeName fix ## What changes were proposed in this pull request? The `typeName` classmethod has been fixed by using a type -> typeName map. ## How was this patch tested? local build

spark git commit: [SPARK-20098][PYSPARK] dataType's typeName fix

2017-09-10 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master f76790557 -> 520d92a19 [SPARK-20098][PYSPARK] dataType's typeName fix ## What changes were proposed in this pull request? The `typeName` classmethod has been fixed by using a type -> typeName map. ## How was this patch tested? local build

spark git commit: [SPARK-21954][SQL] JacksonUtils should verify MapType's value type instead of key type

2017-09-09 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 8a5eb5068 -> 6b45d7e94 [SPARK-21954][SQL] JacksonUtils should verify MapType's value type instead of key type ## What changes were proposed in this pull request? `JacksonUtils.verifySchema` verifies if a data type can be converted to

spark git commit: [SPARK-21954][SQL] JacksonUtils should verify MapType's value type instead of key type

2017-09-09 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.2 987682160 -> 182478e03 [SPARK-21954][SQL] JacksonUtils should verify MapType's value type instead of key type ## What changes were proposed in this pull request? `JacksonUtils.verifySchema` verifies if a data type can be converted to

spark git commit: [BUILD][TEST][SPARKR] add sparksubmitsuite to appveyor tests

2017-09-10 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 6273a711b -> 828fab035 [BUILD][TEST][SPARKR] add sparksubmitsuite to appveyor tests ## What changes were proposed in this pull request? more file regex ## How was this patch tested? Jenkins, AppVeyor Author: Felix Cheung

spark git commit: [SPARK-21610][SQL][FOLLOWUP] Corrupt records are not handled properly when creating a dataframe from a file

2017-09-12 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master dd7816758 -> 7d0a3ef4c [SPARK-21610][SQL][FOLLOWUP] Corrupt records are not handled properly when creating a dataframe from a file ## What changes were proposed in this pull request? When the `requiredSchema` only contains

spark git commit: [SPARK-22107] Change as to alias in python quickstart

2017-09-24 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.2 211d81beb -> 8acce00ac [SPARK-22107] Change as to alias in python quickstart ## What changes were proposed in this pull request? Updated docs so that a line of python in the quick start guide executes. Closes #19283 ## How was this

spark git commit: [SPARK-22107] Change as to alias in python quickstart

2017-09-24 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 576c43fb4 -> 20adf9aa1 [SPARK-22107] Change as to alias in python quickstart ## What changes were proposed in this pull request? Updated docs so that a line of python in the quick start guide executes. Closes #19283 ## How was this
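
`as` is a reserved word in Python, so the quickstart line needs `Column.alias()`. A self-contained sketch of the corrected pattern; any text file works for the input:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.getOrCreate()
textFile = spark.read.text("README.md")

# .alias("word") is the working spelling; .as("word") is a SyntaxError.
words = textFile.select(explode(split(textFile.value, r"\s+")).alias("word"))
words.groupBy("word").count().show()
```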

spark git commit: [SPARK-22112][PYSPARK] Supports RDD of strings as input in spark.read.csv in PySpark

2017-09-26 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master ceaec9383 -> 1fdfe6935 [SPARK-22112][PYSPARK] Supports RDD of strings as input in spark.read.csv in PySpark ## What changes were proposed in this pull request? We added a method to the scala API for creating a `DataFrame` from
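
A minimal sketch of the new input form, mirroring what spark.read.json already allowed; the data is illustrative:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# Each RDD element is one CSV line.
rdd = sc.parallelize(["1,Alice", "2,Bob"])
spark.read.csv(rdd).show()
```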

spark git commit: [BUILD] Close stale PRs

2017-09-26 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master f21f6ce99 -> ceaec9383 [BUILD] Close stale PRs Closes #13794 Closes #18474 Closes #18897 Closes #18978 Closes #19152 Closes #19238 Closes #19295 Closes #19334 Closes #19335 Closes #19347 Closes #19236 Closes #19244 Closes #19300 Closes

spark git commit: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter pandas_udf and add doctests

2017-09-25 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master ce204780e -> d8e825e3b [SPARK-22106][PYSPARK][SQL] Disable 0-parameter pandas_udf and add doctests ## What changes were proposed in this pull request? This change disables the use of 0-parameter pandas_udfs due to the API being overly
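
Zero-argument pandas_udfs are rejected after this change; the minimal valid form takes at least one column. A sketch using the 2.3 scalar API:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf

spark = SparkSession.builder.getOrCreate()

# The UDF receives a pandas.Series per batch and returns one of equal length.
@pandas_udf("long")
def plus_one(v):
    return v + 1

spark.range(5).select(plus_one("id")).show()
```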

spark git commit: [SPARK-22063][R] Fixes lint check failures in R by latest commit sha1 ID of lint-r

2017-10-01 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master c6610a997 -> 02c91e03f [SPARK-22063][R] Fixes lint check failures in R by latest commit sha1 ID of lint-r ## What changes were proposed in this pull request? Currently, we set lintr to jimhester/lintr@a769c0b (see

spark git commit: [MINOR] Fixed up pandas_udf related docs and formatting

2017-09-27 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 9244957b5 -> 7bf4da8a3 [MINOR] Fixed up pandas_udf related docs and formatting ## What changes were proposed in this pull request? Fixed some minor issues with pandas_udf related docs and formatting. ## How was this patch tested? NA

spark git commit: [SPARK-22130][CORE] UTF8String.trim() scans " " twice

2017-09-27 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master d2b8b63b9 -> 12e740bba [SPARK-22130][CORE] UTF8String.trim() scans " " twice ## What changes were proposed in this pull request? This PR allows us to scan a string including only white space (e.g. `" "`) once while the current

spark git commit: [SPARK-22125][PYSPARK][SQL] Enable Arrow Stream format for vectorized UDF.

2017-09-27 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 12e740bba -> 09cbf3df2 [SPARK-22125][PYSPARK][SQL] Enable Arrow Stream format for vectorized UDF. ## What changes were proposed in this pull request? Currently we use Arrow File format to communicate with Python worker when invoking

spark git commit: [SPARK-22093][TESTS] Fixes `assume` in `UtilsSuite` and `HiveDDLSuite`

2017-09-24 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 2274d84ef -> 9d48bd0b3 [SPARK-22093][TESTS] Fixes `assume` in `UtilsSuite` and `HiveDDLSuite` ## What changes were proposed in this pull request? This PR proposes to remove `assume` in `Utils.resolveURIs` and replace `assume` with `assert`

spark git commit: [SPARK-21804][SQL] json_tuple returns null values within repeated columns except the first one

2017-08-24 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 846bc61cf -> 95713eb4f [SPARK-21804][SQL] json_tuple returns null values within repeated columns except the first one ## What changes were proposed in this pull request? When json_tuple is extracting values from JSON, it returns null
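
A minimal repro of the fixed behavior; before the fix only the first occurrence of a repeated field was populated and the duplicate came back null:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# After the fix every occurrence of 'a' is filled in: 1, 2, 1.
spark.sql("""SELECT json_tuple('{"a":1,"b":2}', 'a', 'b', 'a')""").show()
```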

spark git commit: [SPARK-19165][PYTHON][SQL] PySpark APIs using columns as arguments should validate input types for column

2017-08-24 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 95713eb4f -> dc5d34d8d [SPARK-19165][PYTHON][SQL] PySpark APIs using columns as arguments should validate input types for column ## What changes were proposed in this pull request? While preparing to take over

spark git commit: [SPARK-21070][PYSPARK] Attempt to update cloudpickle again

2017-08-21 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master c108a5d30 -> 751f51336 [SPARK-21070][PYSPARK] Attempt to update cloudpickle again ## What changes were proposed in this pull request? Based on https://github.com/apache/spark/pull/18282 by rgbkrk this PR attempts to update to the current

spark git commit: [MINOR][DOCS] Minor doc fixes related with doc build and uses script dir in SQL doc gen script

2017-08-25 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 522e1f80d -> 3b66b1c44 [MINOR][DOCS] Minor doc fixes related with doc build and uses script dir in SQL doc gen script ## What changes were proposed in this pull request? This PR proposes both: - Add information about Javadoc, SQL docs

spark git commit: [SPARK-21773][BUILD][DOCS] Installs mkdocs if missing in the path in SQL documentation build

2017-08-20 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 73e04ecc4 -> 41e0eb71a [SPARK-21773][BUILD][DOCS] Installs mkdocs if missing in the path in SQL documentation build ## What changes were proposed in this pull request? This PR proposes to install `mkdocs` by `pip install` if missing in

spark git commit: [SPARK-21764][TESTS] Fix tests failures on Windows: resources not being closed and incorrect paths

2017-08-30 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 734ed7a7b -> b30a11a6a [SPARK-21764][TESTS] Fix tests failures on Windows: resources not being closed and incorrect paths ## What changes were proposed in this pull request? `org.apache.spark.deploy.RPackageUtilsSuite` ``` - jars

spark-website git commit: Update committer page

2017-08-29 Thread gurwls223
Repository: spark-website Updated Branches: refs/remotes/apache/asf-site [created] 1895d5cb0 Update committer page Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/1895d5cb Tree:

spark git commit: [SPARK-21897][PYTHON][R] Add unionByName API to DataFrame in Python and R

2017-09-03 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master acb7fed23 -> 07fd68a29 [SPARK-21897][PYTHON][R] Add unionByName API to DataFrame in Python and R ## What changes were proposed in this pull request? This PR proposes to add a wrapper for `unionByName` API to R and Python as well.
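
A minimal PySpark sketch of the wrapped API, mirroring the docstring example: columns are matched by name rather than by position, unlike union().

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df1 = spark.createDataFrame([[1, 2, 3]], ["x", "y", "z"])
df2 = spark.createDataFrame([[6, 4, 5]], ["z", "x", "y"])

# The second row comes out as x=4, y=5, z=6.
df1.unionByName(df2).show()
```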

spark git commit: [SPARK-21779][PYTHON] Simpler DataFrame.sample API in Python

2017-08-31 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master f5e10a34e -> 5cd8ea99f [SPARK-21779][PYTHON] Simpler DataFrame.sample API in Python ## What changes were proposed in this pull request? This PR makes `DataFrame.sample(...)` able to omit `withReplacement`, defaulting to `False`, consistently with
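
A minimal sketch of the relaxed signature; both calls below default withReplacement to False:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(100)

df.sample(0.1).count()                   # fraction as the only argument
df.sample(fraction=0.1, seed=3).count()  # keyword form with a seed
```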

spark git commit: [SPARK-21789][PYTHON] Remove obsolete codes for parsing abstract schema strings

2017-08-31 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 5cd8ea99f -> 648a8626b [SPARK-21789][PYTHON] Remove obsolete codes for parsing abstract schema strings ## What changes were proposed in this pull request? This PR proposes to remove private functions that appear unused in the main code,

spark git commit: [SPARK-21903][BUILD] Upgrade scalastyle to 1.0.0.

2017-09-05 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 4e7a29efd -> 7f3c6ff4f [SPARK-21903][BUILD] Upgrade scalastyle to 1.0.0. ## What changes were proposed in this pull request? 1.0.0 fixes an issue with import order, explicit type for public methods, line length limitation and comment

spark git commit: [SPARK-20886][CORE] HadoopMapReduceCommitProtocol to handle FileOutputCommitter.getWorkPath==null

2017-08-29 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 3d0e17424 -> e47f48c73 [SPARK-20886][CORE] HadoopMapReduceCommitProtocol to handle FileOutputCommitter.getWorkPath==null ## What changes were proposed in this pull request? Handles the situation where a

spark git commit: [SPARK-21534][SQL][PYSPARK] PickleException when creating dataframe from python row with empty bytearray

2017-08-30 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 4482ff23a -> ecf437a64 [SPARK-21534][SQL][PYSPARK] PickleException when creating dataframe from python row with empty bytearray ## What changes were proposed in this pull request? `PickleException` is thrown when creating dataframe from

spark git commit: [SPARK-22217][SQL] ParquetFileFormat to support arbitrary OutputCommitters

2017-10-12 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 02218c4c7 -> 9104add4c [SPARK-22217][SQL] ParquetFileFormat to support arbitrary OutputCommitters ## What changes were proposed in this pull request? `ParquetFileFormat` to relax its requirement of output committer class from

spark git commit: [SPARK-22217][SQL] ParquetFileFormat to support arbitrary OutputCommitters

2017-10-12 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.2 cd51e2c32 -> cfc04e062 [SPARK-22217][SQL] ParquetFileFormat to support arbitrary OutputCommitters ## What changes were proposed in this pull request? `ParquetFileFormat` to relax its requirement of output committer class from

spark git commit: [SPARK-21551][PYTHON] Increase timeout for PythonRDD.serveIterator

2017-10-18 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.2 010b50cea -> f8c83fdc5 [SPARK-21551][PYTHON] Increase timeout for PythonRDD.serveIterator Backport of https://github.com/apache/spark/pull/18752 (https://issues.apache.org/jira/browse/SPARK-21551) (cherry picked from commit

spark git commit: [SPARK-22313][PYTHON] Mark/print deprecation warnings as DeprecationWarning for deprecated APIs

2017-10-23 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 884d4f95f -> d9798c834 [SPARK-22313][PYTHON] Mark/print deprecation warnings as DeprecationWarning for deprecated APIs ## What changes were proposed in this pull request? This PR proposes to mark the existing warnings as

spark git commit: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFrame from Pandas

2017-11-12 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 3d90b2cb3 -> 209b9361a [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFrame from Pandas ## What changes were proposed in this pull request? This change uses Arrow to optimize the creation of a Spark DataFrame from a Pandas
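
A minimal sketch, assuming the config name as released in Spark 2.3; with it on, the pandas-to-Spark conversion goes through Arrow instead of per-row pickling:

```python
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.conf.set("spark.sql.execution.arrow.enabled", "true")

pdf = pd.DataFrame({"a": [1, 2, 3], "b": [0.1, 0.2, 0.3]})
spark.createDataFrame(pdf).show()
```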

spark git commit: [SPARK-21693][R][FOLLOWUP] Reduce shuffle partitions running R worker in few tests to speed up

2017-11-26 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master fba63c1a7 -> d49d9e403 [SPARK-21693][R][FOLLOWUP] Reduce shuffle partitions running R worker in few tests to speed up ## What changes were proposed in this pull request? This is a followup to reduce AppVeyor test time. This PR proposes

spark git commit: [SPARK-22495] Fix setup of SPARK_HOME variable on Windows

2017-11-22 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 1edb3175d -> b4edafa99 [SPARK-22495] Fix setup of SPARK_HOME variable on Windows ## What changes were proposed in this pull request? Fixing how `SPARK_HOME` is resolved on Windows. While the previous version was working with the

spark git commit: [SPARK-22572][SPARK SHELL] spark-shell does not re-initialize on :replay

2017-11-22 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 572af5027 -> 327d25fe1 [SPARK-22572][SPARK SHELL] spark-shell does not re-initialize on :replay ## What changes were proposed in this pull request? Ticket: [SPARK-22572](https://issues.apache.org/jira/browse/SPARK-22572) ## How was this

spark git commit: [SPARK-22654][TESTS] Retry Spark tarball download if failed in HiveExternalCatalogVersionsSuite

2017-11-30 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.2 38a0532cf -> d7b14746d [SPARK-22654][TESTS] Retry Spark tarball download if failed in HiveExternalCatalogVersionsSuite ## What changes were proposed in this pull request? Adds a simple loop to retry download of Spark tarballs from

spark git commit: [SPARK-22654][TESTS] Retry Spark tarball download if failed in HiveExternalCatalogVersionsSuite

2017-11-30 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 9c29c5576 -> 6eb203fae [SPARK-22654][TESTS] Retry Spark tarball download if failed in HiveExternalCatalogVersionsSuite ## What changes were proposed in this pull request? Adds a simple loop to retry download of Spark tarballs from

spark git commit: [SPARK-22635][SQL][ORC] FileNotFoundException while reading ORC files containing special characters

2017-11-30 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 6eb203fae -> 932bd09c8 [SPARK-22635][SQL][ORC] FileNotFoundException while reading ORC files containing special characters ## What changes were proposed in this pull request? SPARK-22146 fixed the FileNotFoundException issue only for the

spark git commit: [SPARK-22484][DOC] Document PySpark DataFrame csv writer behavior whe…

2017-11-27 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 087879a77 -> 33d43bf1b [SPARK-22484][DOC] Document PySpark DataFrame csv writer behavior whe… ## What changes were proposed in this pull request? In the PySpark API documentation, DataFrame.write.csv() says that setting the quote parameter to

spark git commit: [SPARK-22585][CORE] Path in addJar is not url encoded

2017-11-29 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 8ff474f6e -> ab6f60c4d [SPARK-22585][CORE] Path in addJar is not url encoded ## What changes were proposed in this pull request? This updates the behavior of the `addJar` method of the `sparkContext` class. If a path without any scheme is passed as

spark git commit: [SPARK-21866][ML][PYTHON][FOLLOWUP] Few cleanups and fix image test failure in Python 3.6.0 / NumPy 1.13.3

2017-11-29 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master ab6f60c4d -> 92cfbeeb5 [SPARK-21866][ML][PYTHON][FOLLOWUP] Few cleanups and fix image test failure in Python 3.6.0 / NumPy 1.13.3 ## What changes were proposed in this pull request? Image test seems failed in Python 3.6.0 / NumPy 1.13.3.

spark git commit: [SPARK-22651][PYTHON][ML] Prevent initiating multiple Hive clients for ImageSchema.readImages

2017-12-01 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master ee10ca7ec -> aa4cf2b19 [SPARK-22651][PYTHON][ML] Prevent initiating multiple Hive clients for ImageSchema.readImages ## What changes were proposed in this pull request? Calling `ImageSchema.readImages` multiple times as below in PySpark

spark git commit: [SPARK-22635][SQL][ORC] FileNotFoundException while reading ORC files containing special characters

2017-12-01 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.2 ba00bd961 -> f3f8c8767 [SPARK-22635][SQL][ORC] FileNotFoundException while reading ORC files containing special characters ## What changes were proposed in this pull request? SPARK-22146 fixed the FileNotFoundException issue only for

spark git commit: [SPARK-22811][PYSPARK][ML] Fix pyspark.ml.tests failure when Hive is not available.

2017-12-15 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 46776234a -> 0c8fca460 [SPARK-22811][PYSPARK][ML] Fix pyspark.ml.tests failure when Hive is not available. ## What changes were proposed in this pull request? pyspark.ml.tests is missing a py4j import. I've added the import and fixed the

spark git commit: [SPARK-19809][SQL][TEST] NullPointerException on zero-size ORC file

2017-12-12 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 704af4bd6 -> 17cdabb88 [SPARK-19809][SQL][TEST] NullPointerException on zero-size ORC file ## What changes were proposed in this pull request? Until 2.2.1, Spark raises `NullPointerException` on zero-size ORC files. Usually, these

spark git commit: [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof in run-tests.py

2017-12-18 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master fbfa9be7e -> 3a07eff5a [SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof in run-tests.py ## What changes were proposed in this pull request? In [the environment where `/usr/sbin/lsof` does not

spark git commit: Revert "Revert "[SPARK-22496][SQL] thrift server adds operation logs""

2017-12-18 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 772e4648d -> fbfa9be7e Revert "Revert "[SPARK-22496][SQL] thrift server adds operation logs"" This reverts commit e58f275678fb4f904124a4a2a1762f04c835eb0e. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-22817][R] Use fixed testthat version for SparkR tests in AppVeyor

2017-12-16 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 0c8fca460 -> c2aeddf9e [SPARK-22817][R] Use fixed testthat version for SparkR tests in AppVeyor ## What changes were proposed in this pull request? `testthat` 2.0.0 is released and AppVeyor now started to use it instead of 1.0.2. And

spark git commit: [SPARK-22817][R] Use fixed testthat version for SparkR tests in AppVeyor

2017-12-16 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.2 b4f4be396 -> 1e4cca02f [SPARK-22817][R] Use fixed testthat version for SparkR tests in AppVeyor ## What changes were proposed in this pull request? `testthat` 2.0.0 is released and AppVeyor now started to use it instead of 1.0.2. And

spark git commit: [SPARK-22377][BUILD] Use /usr/sbin/lsof if lsof does not exist in release-build.sh

2017-11-13 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.1 ca19271cc -> 7bdad58e2 [SPARK-22377][BUILD] Use /usr/sbin/lsof if lsof does not exists in release-build.sh ## What changes were proposed in this pull request? This PR proposes to use `/usr/sbin/lsof` if `lsof` is missing in the path

spark git commit: [SPARK-22377][BUILD] Use /usr/sbin/lsof if lsof does not exist in release-build.sh

2017-11-13 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master f7534b37e -> c8b7f97b8 [SPARK-22377][BUILD] Use /usr/sbin/lsof if lsof does not exists in release-build.sh ## What changes were proposed in this pull request? This PR proposes to use `/usr/sbin/lsof` if `lsof` is missing in the path to

spark git commit: [SPARK-22377][BUILD] Use /usr/sbin/lsof if lsof does not exist in release-build.sh

2017-11-13 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.2 d905e85d2 -> 3ea6fd0c4 [SPARK-22377][BUILD] Use /usr/sbin/lsof if lsof does not exists in release-build.sh ## What changes were proposed in this pull request? This PR proposes to use `/usr/sbin/lsof` if `lsof` is missing in the path

spark git commit: [SPARK-22554][PYTHON] Add a config to control if PySpark should use daemon or not for workers

2017-11-19 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master b10837ab1 -> 57c5514de [SPARK-22554][PYTHON] Add a config to control if PySpark should use daemon or not for workers ## What changes were proposed in this pull request? This PR proposes to add a flag to control if PySpark should use
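
A sketch, assuming the flag name from the merged change (spark.python.use.daemon, effective on non-Windows platforms where daemon workers are the default):

```python
from pyspark import SparkConf, SparkContext

# False forks a fresh worker per task instead of going through the daemon.
conf = SparkConf().set("spark.python.use.daemon", "false")
sc = SparkContext(conf=conf)
print(sc.parallelize(range(10)).sum())
```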

spark git commit: [SPARK-22557][TEST] Use ThreadSignaler explicitly

2017-11-19 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master d54bfec2e -> b10837ab1 [SPARK-22557][TEST] Use ThreadSignaler explicitly ## What changes were proposed in this pull request? ScalaTest 3.0 uses an implicit `Signaler`. This PR makes sure all Spark tests use `ThreadSignaler`

spark git commit: [SPARK-20791][PYTHON][FOLLOWUP] Check for unicode column names in createDataFrame with Arrow

2017-11-15 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master dce1610ae -> 8f0e88df0 [SPARK-20791][PYTHON][FOLLOWUP] Check for unicode column names in createDataFrame with Arrow ## What changes were proposed in this pull request? If schema is passed as a list of unicode strings for column names,

spark git commit: [SPARK-22476][R] Add dayofweek function to R

2017-11-11 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 3eb315d71 -> 223d83ee9 [SPARK-22476][R] Add dayofweek function to R ## What changes were proposed in this pull request? This PR adds `dayofweek` to the R API: ```r data <- list(list(d = as.Date("2012-12-13")), list(d =
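
The Python API gains the same function in 2.3; a minimal sketch using the value from the R example (1 = Sunday ... 7 = Saturday, and 2012-12-13 was a Thursday):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import dayofweek

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("2012-12-13",)], ["d"])

df.select(dayofweek(df.d)).show()  # -> 5
```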
