[GitHub] spark pull request: [SPARK-14678][SQL]Add a file sink log to suppo...

2016-04-19 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12435#discussion_r60302180 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -443,6 +444,27 @@ object SQLConf { .booleanConf

[GitHub] spark pull request: [WIP, DO-NOT-MERGE][SQL][Added support for par...

2016-04-15 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12409#discussion_r59942518 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRelation.scala --- @@ -371,6 +382,97 @@ private[sql] class

[GitHub] spark pull request: [SPARK-14473][SQL] Define analysis rules to ca...

2016-04-15 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12246#discussion_r59942192 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala --- @@ -0,0 +1,145

[GitHub] spark pull request: [WIP, DO-NOT-MERGE][SQL][Added support for par...

2016-04-15 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12409#issuecomment-210614034 General structure looks good! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

[GitHub] spark pull request: [WIP, DO-NOT-MERGE][SQL][Added support for par...

2016-04-15 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12409#discussion_r59930738 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala --- @@ -347,6 +347,14 @@ abstract class OutputWriterFactory extends

[GitHub] spark pull request: [WIP, DO-NOT-MERGE][SQL][Added support for par...

2016-04-15 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12409#discussion_r59930604 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSink.scala --- @@ -40,17 +48,22 @@ object FileStreamSink { class

[GitHub] spark pull request: [WIP, DO-NOT-MERGE][SQL][Added support for par...

2016-04-15 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12409#discussion_r59930551 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRelation.scala --- @@ -371,6 +382,97 @@ private[sql] class

[GitHub] spark pull request: [SPARK-14628][WIP] Simplify task metrics by al...

2016-04-15 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12388#discussion_r59923261 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala --- @@ -434,50 +434,50 @@ class JobProgressListener(conf: SparkConf

[GitHub] spark pull request: [SPARK-14555] First cut of Python API for Stru...

2016-04-15 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12320#discussion_r59922003 --- Diff: python/pyspark/sql/readwriter.py --- @@ -426,6 +488,68 @@ def save(self, path=None, format=None, mode=None, partitionBy=None, **options

[GitHub] spark pull request: [SPARK-14555] First cut of Python API for Stru...

2016-04-15 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12320#discussion_r59919017 --- Diff: python/pyspark/sql/readwriter.py --- @@ -395,6 +425,38 @@ def partitionBy(self, *cols): self._jwrite = self._jwrite.partitionBy

[GitHub] spark pull request: [SPARK-14555] First cut of Python API for Stru...

2016-04-15 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12320#issuecomment-210575281 Overall looks pretty good!

[GitHub] spark pull request: [SPARK-14555] First cut of Python API for Stru...

2016-04-15 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12320#discussion_r59918040 --- Diff: python/pyspark/sql/readwriter.py --- @@ -426,6 +488,68 @@ def save(self, path=None, format=None, mode=None, partitionBy=None, **options

[GitHub] spark pull request: [SPARK-14555] First cut of Python API for Stru...

2016-04-15 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12320#discussion_r59917973 --- Diff: python/pyspark/sql/readwriter.py --- @@ -426,6 +488,68 @@ def save(self, path=None, format=None, mode=None, partitionBy=None, **options

[GitHub] spark pull request: [SPARK-14555] First cut of Python API for Stru...

2016-04-15 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12320#discussion_r59917835 --- Diff: python/pyspark/sql/readwriter.py --- @@ -426,6 +488,68 @@ def save(self, path=None, format=None, mode=None, partitionBy=None, **options

[GitHub] spark pull request: [SPARK-14555] First cut of Python API for Stru...

2016-04-15 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12320#discussion_r59917419 --- Diff: python/pyspark/sql/readwriter.py --- @@ -395,6 +425,38 @@ def partitionBy(self, *cols): self._jwrite = self._jwrite.partitionBy

[GitHub] spark pull request: [SPARK-14555] First cut of Python API for Stru...

2016-04-15 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12320#discussion_r59917201 --- Diff: python/pyspark/sql/readwriter.py --- @@ -426,6 +488,68 @@ def save(self, path=None, format=None, mode=None, partitionBy=None, **options

[GitHub] spark pull request: [SPARK-14614][SQL] Add `bround` function

2016-04-14 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12376#issuecomment-210100953 +1 to native implementations of hive udfs so we can continue to minimize our dependence.

[GitHub] spark pull request: [SPARK-14473][SQL] Define analysis rules to ca...

2016-04-14 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12246#issuecomment-210081598 Some minor comments, otherwise LGTM.

[GitHub] spark pull request: [SPARK-14473][SQL] Define analysis rules to ca...

2016-04-14 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12246#discussion_r59764187 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationsSuite.scala --- @@ -0,0 +1,379

[GitHub] spark pull request: [SPARK-14473][SQL] Define analysis rules to ca...

2016-04-14 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12246#discussion_r59763939 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationsSuite.scala --- @@ -0,0 +1,379

[GitHub] spark pull request: [SPARK-14473][SQL] Define analysis rules to ca...

2016-04-14 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12246#discussion_r59763849 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationsSuite.scala --- @@ -0,0 +1,379

[GitHub] spark pull request: [SPARK-14473][SQL] Define analysis rules to ca...

2016-04-14 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12246#discussion_r59763010 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StreamTest.scala --- @@ -75,6 +76,8 @@ trait StreamTest extends QueryTest with Timeouts

[GitHub] spark pull request: [SPARK-14473][SQL] Define analysis rules to ca...

2016-04-14 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12246#discussion_r59762679 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -42,6 +42,9 @@ abstract class LogicalPlan

[GitHub] spark pull request: [SPARK-14473][SQL] Define analysis rules to ca...

2016-04-14 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12246#discussion_r59762314 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala --- @@ -0,0 +1,143

[GitHub] spark pull request: [SPARK-14473][SQL] Define analysis rules to ca...

2016-04-14 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12246#discussion_r59762181 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/OutputMode.scala --- @@ -0,0 +1,23 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-14607] [SPARK-14484] [SQL] fix case-ins...

2016-04-13 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12371#issuecomment-209673476 LGTM, thanks for fixing this!

[GitHub] spark pull request: [SPARK-14581] [SQL] push predicatese through m...

2016-04-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12342#discussion_r59490343 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -975,6 +939,73 @@ object

[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-04-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/11925#discussion_r59487598 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala --- @@ -196,4 +195,148 @@ class HDFSMetadataLog[T

[GitHub] spark pull request: [SPARK-14554][SQL][follow-up] use checkDataset...

2016-04-12 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12346#issuecomment-209196952 LGTM

[GitHub] spark pull request: [SPARK-14579][SQL]Fix a race condition in Stre...

2016-04-12 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12339#issuecomment-209128576 LGTM

spark git commit: [SPARK-14474][SQL] Move FileSource offset log into checkpointLocation

2016-04-12 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master da60b34d2 -> 6bf692147 [SPARK-14474][SQL] Move FileSource offset log into checkpointLocation ## What changes were proposed in this pull request? Now that we have a single location for storing checkpointed state. This PR just propagates

[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-12 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12247#issuecomment-209024772 Thanks, merging to master.

[GitHub] spark pull request: [SPARK-14288][SQL] Memory Sink for streaming

2016-04-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12119#discussion_r59420081 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -275,23 +277,64 @@ final class DataFrameWriter private[sql](df

[GitHub] spark pull request: [SPARK-14554][SQL] disable whole stage codegen...

2016-04-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12322#discussion_r59417359 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala --- @@ -620,6 +620,12 @@ class DatasetSuite extends QueryTest

[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12247#discussion_r59297156 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala --- @@ -129,8 +129,17 @@ trait SchemaRelationProvider { * Implemented

[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12247#discussion_r59295994 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala --- @@ -129,8 +129,17 @@ trait SchemaRelationProvider { * Implemented

[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12247#discussion_r59282240 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala --- @@ -341,6 +347,33 @@ class FileStreamSourceSuite extends

[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12247#discussion_r59256046 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala --- @@ -67,12 +62,33 @@ class FileStreamSource

[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12247#discussion_r59255827 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -123,8 +123,16 @@ case class DataSource

[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12247#discussion_r59254285 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala --- @@ -341,6 +347,33 @@ class FileStreamSourceSuite extends

[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12247#discussion_r59253946 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala --- @@ -67,12 +62,33 @@ class FileStreamSource

[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12247#discussion_r59253976 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/MemorySinkSuite.scala --- @@ -59,7 +59,7 @@ class MemorySinkSuite extends StreamTest

[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12247#discussion_r59253837 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -123,8 +123,16 @@ case class DataSource

spark git commit: [SPARK-14494][SQL] Fix the race conditions in MemoryStream and MemorySink

2016-04-11 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 5de26194a -> 2dacc81ec [SPARK-14494][SQL] Fix the race conditions in MemoryStream and MemorySink ## What changes were proposed in this pull request? Make sure accessing mutable variables in MemoryStream and MemorySink are protected by

[GitHub] spark pull request: [SPARK-14494][SQL]Fix the race conditions in M...

2016-04-11 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12261#issuecomment-208468200 Thanks, merging to master.

[GitHub] spark pull request: [SPARK-14449][SQL] SparkContext should use Spa...

2016-04-07 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12227#issuecomment-207091195 /cc @rxin

[GitHub] spark pull request: [SPARK-12740] [SPARK-13932] support grouping()...

2016-04-07 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12235#issuecomment-207047005 LGTM

[GitHub] spark pull request: [SPARK-12740] [SPARK-13932] support grouping()...

2016-04-07 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12235#discussion_r58921053 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala --- @@ -333,6 +333,8 @@ case class PrettyAttribute

[GitHub] spark pull request: [SPARK-14456][SQL][MINOR] Remove unused variab...

2016-04-07 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12237#issuecomment-207030624 Thanks, merging to master.

spark git commit: [SPARK-14456][SQL][MINOR] Remove unused variables and logics in DataSource

2016-04-07 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 3aa7d7639 -> 8dcb0c7c9 [SPARK-14456][SQL][MINOR] Remove unused variables and logics in DataSource ## What changes were proposed in this pull request? In DataSource#write method, the variables `dataSchema` and `equality`, and related

[GitHub] spark pull request: [SPARK-14459] [SQL] Detect relation partitioni...

2016-04-07 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12239#issuecomment-207029860 ok to test

[GitHub] spark pull request: [SPARK-14449][SQL] SparkContext should use Spa...

2016-04-06 Thread marmbrus
GitHub user marmbrus opened a pull request: https://github.com/apache/spark/pull/12227 [SPARK-14449][SQL] SparkContext should use SparkListenerInterface Currently all `SparkFirehoseListener` implementations are broken since we expect listeners to extend `SparkListener`, while

[GitHub] spark pull request: [SPARK-14446][tests] Fix ReplSuite for Scala 2...

2016-04-06 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12223#issuecomment-206637406 LGTM

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

2016-04-06 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12047#issuecomment-206476011 This is a huge improvement. A few minor comments, otherwise LGTM.

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

2016-04-06 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12047#discussion_r58746557 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRelation.scala --- @@ -347,32 +358,24 @@ private[sql] class

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

2016-04-06 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12047#discussion_r58746322 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRelation.scala --- @@ -347,32 +358,24 @@ private[sql] class

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

2016-04-06 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12047#discussion_r58745999 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -396,6 +396,12 @@ object SQLConf { .booleanConf

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

2016-04-06 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12047#discussion_r58745859 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala --- @@ -279,7 +279,8 @@ class

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

2016-04-06 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12047#discussion_r58745744 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala --- @@ -469,6 +469,13 @@ trait FileFormat { options: Map[String

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

2016-04-06 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12047#discussion_r58745638 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRelation.scala --- @@ -306,6 +315,10 @@ private[sql] class

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

2016-04-06 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12047#discussion_r58745460 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRelation.scala --- @@ -408,6 +411,10 @@ private[sql] class

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

2016-04-06 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12047#discussion_r58745404 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRelation.scala --- @@ -481,6 +497,17 @@ private[sql] class

[GitHub] spark pull request: [HOTFIX][SPARK-14402] Fix ExpressionDescriptio...

2016-04-05 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12192#issuecomment-206095830 @tdas another streaming failure

[GitHub] spark pull request: [SPARK-14296][SQL] whole stage codegen support...

2016-04-05 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12087#discussion_r58645473 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetBenchmark.scala --- @@ -0,0 +1,85 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-14296][SQL] whole stage codegen support...

2016-04-05 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12087#discussion_r58643806 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetBenchmark.scala --- @@ -0,0 +1,85 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-14296][SQL] whole stage codegen support...

2016-04-05 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12087#discussion_r58643564 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetBenchmark.scala --- @@ -0,0 +1,85 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [HOTFIX][SPARK-14402] Fix ExpressionDescriptio...

2016-04-05 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12192#issuecomment-206068825 ok to test

[GitHub] spark pull request: [SPARK-14288][SQL] Memory Sink for streaming

2016-04-05 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12119#discussion_r58635507 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -275,23 +279,64 @@ final class DataFrameWriter private[sql](df

[GitHub] spark pull request: [SPARK-529] [core] [yarn] Add type-safe config...

2016-04-05 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/10205#issuecomment-206043222 > Also, because of that method, the return value of getConf(CHECKPOINT_LOCATION) would be String and not Option[String], which is probably why intellij is complain

[GitHub] spark pull request: [SPARK-14288][SQL] Memory Sink for streaming

2016-04-05 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12119#discussion_r58634150 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -275,23 +279,64 @@ final class DataFrameWriter private[sql](df

[GitHub] spark pull request: [SPARK-529] [core] [yarn] Add type-safe config...

2016-04-05 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/10205#issuecomment-206039890 In particular the implicit wrapping of `OptionalConfigEntry[T]` to be a `ConfigEntry[Option[T]]` coupled with the unwrapping done via overloading of `getConf` took me

[GitHub] spark pull request: [SPARK-529] [core] [yarn] Add type-safe config...

2016-04-05 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/10205#issuecomment-20605 I realize I'm super late to this party, but I just spent a bunch of time trying to understand this new system while rebasing a PR. Overall, I think all

[GitHub] spark pull request: [SPARK-14372] [SQL] : Dataset.randomSplit() ne...

2016-04-05 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12184#issuecomment-206027097 /cc @liancheng Should we just make the scala version return a `Seq` if `Array` doesn't work for java?

[GitHub] spark pull request: [SPARK-14372] [SQL] : Dataset.randomSplit() ne...

2016-04-05 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12184#issuecomment-206026945 There is no test suite, please add one.

spark git commit: [SPARK-14411][SQL] Add a note to warn that onQueryProgress is asynchronous

2016-04-05 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 45d8cdee3 -> 7329fe272 [SPARK-14411][SQL] Add a note to warn that onQueryProgress is asynchronous ## What changes were proposed in this pull request? onQueryProgress is asynchronous so the user may see some future status of

[GitHub] spark pull request: [SPARK-14411][SQL]Add a note to warn that onQu...

2016-04-05 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12180#issuecomment-206007628 Thanks, merging to master.

[GitHub] spark pull request: [SPARK-13929] Use Scala reflection for UDTs

2016-04-05 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12149#discussion_r58622959 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala --- @@ -81,9 +81,43 @@ case class MultipleConstructorsData

spark git commit: [SPARK-14402][SQL] initcap UDF doesn't match Hive/Oracle behavior in lowercasing rest of string

2016-04-05 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 9ee5c2571 -> c59abad05 [SPARK-14402][SQL] initcap UDF doesn't match Hive/Oracle behavior in lowercasing rest of string ## What changes were proposed in this pull request? Current, SparkSQL `initCap` is using `toTitleCase` function.

[GitHub] spark pull request: [SPARK-14402][SQL] initcap UDF doesn't match H...

2016-04-05 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12175#issuecomment-205978170 Thanks, merging to master.

[GitHub] spark pull request: [SPARK-14411][SQL]Add a note to warn that onQu...

2016-04-05 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12180#issuecomment-205975113 LGTM

spark git commit: [SPARK-14257][SQL] Allow multiple continuous queries to be started from the same DataFrame

2016-04-05 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master f77f11c67 -> 463bac001 [SPARK-14257][SQL] Allow multiple continuous queries to be started from the same DataFrame ## What changes were proposed in this pull request? Make StreamingRelation store the closure to create the source in

[GitHub] spark pull request: [SPARK-14257][SQL]Allow multiple continuous qu...

2016-04-05 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12049#issuecomment-205925072 Thanks, merging to master.

[GitHub] spark pull request: [SPARK-14402][SQL] initcap UDF doesn't match H...

2016-04-05 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12175#issuecomment-205924675 It does seem reasonable to match hive since that was probably the original intention. I've tagged the JIRA for inclusion in the release notes. A few comments

[GitHub] spark pull request: [SPARK-13929] Use Scala reflection for UDTs

2016-04-05 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12149#discussion_r58586141 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala --- @@ -81,9 +81,43 @@ case class MultipleConstructorsData

spark git commit: [SPARK-14345][SQL] Decouple deserializer expression resolution from ObjectOperator

2016-04-05 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master e4bd50412 -> f77f11c67 [SPARK-14345][SQL] Decouple deserializer expression resolution from ObjectOperator ## What changes were proposed in this pull request? This PR decouples deserializer expression resolution from `ObjectOperator`, so

[GitHub] spark pull request: [SPARK-14345][SQL] Decouple deserializer expre...

2016-04-05 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12131#issuecomment-205918635 LGTM, thanks for improving the comments. It's much clearer to me what is happening now! Merging to master.

[GitHub] spark pull request: [SPARK-14296][SQL] whole stage codegen support...

2016-04-05 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12087#discussion_r58582999 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetBenchmark.scala --- @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-14296][SQL] whole stage codegen support...

2016-04-05 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12087#discussion_r58577527 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetBenchmark.scala --- @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-14257][SQL]Allow multiple continuous qu...

2016-04-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12049#discussion_r58478636 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingRelation.scala --- @@ -19,16 +19,33 @@ package

[GitHub] spark pull request: [SPARK-14257][SQL]Allow multiple continuous qu...

2016-04-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12049#discussion_r58478509 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingRelation.scala --- @@ -19,16 +19,33 @@ package

[GitHub] spark pull request: [SPARK-14257][SQL]Allow multiple continuous qu...

2016-04-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12049#discussion_r58478446 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/ContinuousQueryManager.scala --- @@ -178,11 +178,19 @@ class ContinuousQueryManager(sqlContext

spark git commit: [SPARK-14287] isStreaming method for Dataset

2016-04-04 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 7201f033c -> ba24d1ee9 [SPARK-14287] isStreaming method for Dataset With the addition of StreamExecution (ContinuousQuery) to Datasets, data will become unbounded. With unbounded data, the execution of some methods and operations will

[GitHub] spark pull request: [SPARK-14287] isStreaming method for Dataset

2016-04-04 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12080#issuecomment-205598309 Thanks, merging to master!

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

2016-04-04 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12047#issuecomment-205551227 Yeah, this looks much cleaner.

[GitHub] spark pull request: [SPARK-14287] isStreaming method for Dataset

2016-04-04 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12080#issuecomment-205536306 test this please

[GitHub] spark pull request: [SPARK-14287] isStreaming method for Dataset

2016-04-04 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12080#issuecomment-205536342 @tdas here is another failure

[GitHub] spark pull request: [SPARK-14259] [SQL] Merging small files togeth...

2016-04-04 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12095#issuecomment-205497090 LGTM, though I still think pipelined reading is the right thing to do long term.

[GitHub] spark pull request: [SPARK-14287] isStreaming method for Dataset

2016-04-04 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12080#issuecomment-205438436 Implementation LGTM.

[GitHub] spark pull request: [SPARK-14287] isStreaming method for Dataset

2016-04-04 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12080#discussion_r5842 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -449,6 +450,17 @@ class Dataset[T] private[sql]( def isLocal: Boolean

[GitHub] spark pull request: [SPARK-14176][SQL]Add DataFrameWriter.trigger ...

2016-04-04 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/11976#issuecomment-205412532 LGTM, merging to master.
