spark git commit: [SS][MINOR] Fix flaky test in DatastreamReaderWriterSuite. temp checkpoint dir should be deleted

2017-07-06 Thread tdas
Repository: spark Updated Branches: refs/heads/master 14a3bb3a0 -> 60043f224 [SS][MINOR] Fix flaky test in DatastreamReaderWriterSuite. temp checkpoint dir should be deleted ## What changes were proposed in this pull request? Stopping query while it is being initialized can throw interrupt e

spark git commit: [SS][MINOR] Fix flaky test in DatastreamReaderWriterSuite. temp checkpoint dir should be deleted

2017-07-06 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.2 6e1081cbe -> 4e53a4edd [SS][MINOR] Fix flaky test in DatastreamReaderWriterSuite. temp checkpoint dir should be deleted ## What changes were proposed in this pull request? Stopping query while it is being initialized can throw interru

spark git commit: [SPARK-21012][SUBMIT] Add glob support for resources adding to Spark

2017-07-06 Thread wenchen
Repository: spark Updated Branches: refs/heads/master 60043f224 -> 5800144a5 [SPARK-21012][SUBMIT] Add glob support for resources adding to Spark Currently, "--jars (spark.jars)", "--files (spark.files)", "--py-files (spark.submit.pyFiles)" and "--archives (spark.yarn.dist.archives)" only suppo
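
As a rough illustration of what glob support for a comma-separated resource list could look like, here is a minimal Python sketch. It is not Spark's implementation: the helper name `expand_resource_globs` is hypothetical, it handles local paths only, and the choice to pass unmatched entries through unchanged is an assumption made for the example.

```python
import glob
import os
import tempfile


def expand_resource_globs(paths):
    """Expand glob patterns in a comma-separated list of local paths.

    Hypothetical helper for illustration only; not Spark's code.
    """
    expanded = []
    for entry in paths.split(","):
        entry = entry.strip()
        if not entry:
            continue
        matches = sorted(glob.glob(entry))
        # Assumption: an entry that matches nothing is kept unchanged,
        # so a later step can still report the missing path by name.
        expanded.extend(matches if matches else [entry])
    return ",".join(expanded)


# Small demonstration against a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.jar", "b.jar", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    demo_input = os.path.join(tmp, "*.jar") + "," + os.path.join(tmp, "notes.txt")
    demo_result = expand_resource_globs(demo_input)
    demo_names = [os.path.basename(p) for p in demo_result.split(",")]
```

In this sketch the `*.jar` pattern expands to both jar files, while the plain `notes.txt` path is returned as-is.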

spark git commit: [SPARK-20703][SQL] Associate metrics with data writes onto DataFrameWriter operations

2017-07-06 Thread wenchen
Repository: spark Updated Branches: refs/heads/master 5800144a5 -> 6ff05a66f [SPARK-20703][SQL] Associate metrics with data writes onto DataFrameWriter operations ## What changes were proposed in this pull request? Right now in the UI, after SPARK-20213, we can show the operations to write

spark git commit: [SPARK-21324][TEST] Improve statistics test suites

2017-07-06 Thread wenchen
Repository: spark Updated Branches: refs/heads/master 6ff05a66f -> b8e4d567a [SPARK-21324][TEST] Improve statistics test suites ## What changes were proposed in this pull request? 1. move `StatisticsCollectionTestBase` to a separate file. 2. move some test cases to `StatisticsCollectionSuite`

spark-website git commit: Update committer page.

2017-07-06 Thread srowen
Repository: spark-website Updated Branches: refs/heads/asf-site 9749c8e2f -> 8a9ae6b7d Update committer page. Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/8a9ae6b7 Tree: http://git-wip-us.apache.org/repo

spark-website git commit: Update Sandy.

2017-07-06 Thread srowen
Repository: spark-website Updated Branches: refs/heads/asf-site 8a9ae6b7d -> 878dcfd84 Update Sandy. Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/878dcfd8 Tree: http://git-wip-us.apache.org/repos/asf/spa

spark git commit: [SPARK-21273][SQL][FOLLOW-UP] Add missing test cases back and revise code style

2017-07-06 Thread wenchen
Repository: spark Updated Branches: refs/heads/master b8e4d567a -> d540dfbff [SPARK-21273][SQL][FOLLOW-UP] Add missing test cases back and revise code style ## What changes were proposed in this pull request? Add missing test cases back and revise code style. Follow-up to the previous PR: https:

spark git commit: [SPARK-20950][CORE] add a new config to diskWriteBufferSize which is hard coded before

2017-07-06 Thread wenchen
Repository: spark Updated Branches: refs/heads/master d540dfbff -> 565e7a8d4 [SPARK-20950][CORE] add a new config to diskWriteBufferSize which is hard coded before ## What changes were proposed in this pull request? This PR makes two improvements: 1. With spark.shuffle.spill.diskWriteBufferSize c

spark git commit: [SPARK-21228][SQL] InSet incorrect handling of structs

2017-07-06 Thread wenchen
Repository: spark Updated Branches: refs/heads/master 565e7a8d4 -> 26ac085de [SPARK-21228][SQL] InSet incorrect handling of structs ## What changes were proposed in this pull request? When data type is struct, InSet now uses TypeUtils.getInterpretedOrdering (similar to EqualTo) to build a Tre

spark git commit: [SPARK-21204][SQL] Add support for Scala Set collection types in serialization

2017-07-06 Thread wenchen
Repository: spark Updated Branches: refs/heads/master 26ac085de -> 48e44b24a [SPARK-21204][SQL] Add support for Scala Set collection types in serialization ## What changes were proposed in this pull request? Currently we can't produce a `Dataset` containing `Set` in SparkSQL. This PR tries t

spark git commit: [SPARK-21323][SQL] Rename plans.logical.statsEstimation.Range to ValueInterval

2017-07-06 Thread rxin
Repository: spark Updated Branches: refs/heads/master 48e44b24a -> bf66335ac [SPARK-21323][SQL] Rename plans.logical.statsEstimation.Range to ValueInterval ## What changes were proposed in this pull request? Rename org.apache.spark.sql.catalyst.plans.logical.statsEstimation.Range to ValueInt

spark git commit: [SPARK-21267][SS][DOCS] Update Structured Streaming Documentation

2017-07-06 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master bf66335ac -> 0217dfd26 [SPARK-21267][SS][DOCS] Update Structured Streaming Documentation ## What changes were proposed in this pull request? A few changes to the Structured Streaming documentation - Clarify that the entire stream input table

spark git commit: [SPARK-21267][SS][DOCS] Update Structured Streaming Documentation

2017-07-06 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 4e53a4edd -> 576fd4c3a [SPARK-21267][SS][DOCS] Update Structured Streaming Documentation ## What changes were proposed in this pull request? A few changes to the Structured Streaming documentation - Clarify that the entire stream input t

spark git commit: [SPARK-20946][SQL] Do not update conf for existing SparkContext in SparkSession.getOrCreate

2017-07-06 Thread wenchen
Repository: spark Updated Branches: refs/heads/master 0217dfd26 -> 40c7add3a [SPARK-20946][SQL] Do not update conf for existing SparkContext in SparkSession.getOrCreate ## What changes were proposed in this pull request? SparkContext is shared by all sessions, we should not update its conf f

spark git commit: [SPARK-21329][SS] Make EventTimeWatermarkExec explicitly UnaryExecNode

2017-07-06 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 40c7add3a -> e5bb26174 [SPARK-21329][SS] Make EventTimeWatermarkExec explicitly UnaryExecNode ## What changes were proposed in this pull request? Making EventTimeWatermarkExec explicitly UnaryExecNode /cc tdas zsxwing ## How was this pat

spark git commit: [SPARK-21326][SPARK-21066][ML] Use TextFileFormat in LibSVMFileFormat and allow multiple input paths for determining numFeatures

2017-07-06 Thread wenchen
Repository: spark Updated Branches: refs/heads/master e5bb26174 -> d451b7f43 [SPARK-21326][SPARK-21066][ML] Use TextFileFormat in LibSVMFileFormat and allow multiple input paths for determining numFeatures ## What changes were proposed in this pull request? This is related to [SPARK-19918

spark git commit: [SPARK-21327][SQL][PYSPARK] ArrayConstructor should handle an array of typecode 'l' as long rather than int in Python 2.

2017-07-06 Thread ueshin
Repository: spark Updated Branches: refs/heads/master d451b7f43 -> 53c2eb59b [SPARK-21327][SQL][PYSPARK] ArrayConstructor should handle an array of typecode 'l' as long rather than int in Python 2. ## What changes were proposed in this pull request? Currently `ArrayConstructor` handles an ar
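
For context on the typecode named in the title, the sketch below uses only Python's standard `array` module, not Spark. It shows that `'l'` maps to a C `long`, whose item size is platform dependent and at least as wide as that of `'i'`; the variable names are illustrative.

```python
from array import array

# Typecode 'l' stores C signed long values; 'i' stores C signed int values.
long_values = array('l', [1, 2, 3])
int_values = array('i', [1, 2, 3])

# itemsize is the per-element width in bytes on this platform.
long_item_bytes = long_values.itemsize
int_item_bytes = int_values.itemsize

# On platforms where a C long is 8 bytes, 'l' elements may not fit in a 32-bit int.
long_is_wider_than_32_bits = long_item_bytes > 4
```

Whether `long_is_wider_than_32_bits` is true depends on the platform, so a consumer that treats `'l'` data as 32-bit integers can lose range where a C `long` is 8 bytes.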

spark git commit: [SPARK-21217][SQL] Support ColumnVector.Array.toArray()

2017-07-06 Thread wenchen
Repository: spark Updated Branches: refs/heads/master 53c2eb59b -> c09b31eb8 [SPARK-21217][SQL] Support ColumnVector.Array.toArray() ## What changes were proposed in this pull request? This PR implements bulk-copy for `ColumnVector.Array.toArray()` methods (e.g. `toIntArray()`) in `ColumnVec

spark git commit: [SPARK-20703][SQL][FOLLOW-UP] Associate metrics with data writes onto DataFrameWriter operations

2017-07-06 Thread wenchen
Repository: spark Updated Branches: refs/heads/master c09b31eb8 -> 5df99bd36 [SPARK-20703][SQL][FOLLOW-UP] Associate metrics with data writes onto DataFrameWriter operations ## What changes were proposed in this pull request? Remove time metrics, since there seems to be no way to measure them in non per