[GitHub] spark issue #19394: [SPARK-22170][SQL] Reduce memory consumption in broadcas...

2017-10-05 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19394 What's the other value? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #19394: [SPARK-22170][SQL] Reduce memory consumption in broadcas...

2017-10-05 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19394 Not sure - maybe print the chi-value of the test and see if they make sense. If they do, we can change the threshold

[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...

2017-09-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18732 What's the difference between this one and the transform function you also proposed? I'm trying to see if all the naming makes sense when considered together

[GitHub] spark issue #19393: [SPARK-21644][SQL] LocalLimit.maxRows is defined incorre...

2017-09-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19393 LGTM but I wrote most of the code so perhaps we should find somebody else to review. --- - To unsubscribe, e-mail: reviews

[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...

2017-09-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18732 Is this just a mapGroups function? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark pull request #19387: [SPARK-22160][SQL] Make sample points per partiti...

2017-09-28 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19387#discussion_r141786663 --- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala --- @@ -108,11 +108,21 @@ class HashPartitioner(partitions: Int) extends Partitioner

[GitHub] spark issue #19387: [SPARK-22160][SQL] Make sample points per partition (in ...

2017-09-28 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19387 Merging in master. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #19387: [SPARK-22160][SQL] Make sample points per partition (in ...

2017-09-28 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19387 I put up a comment saying this test result should be deterministic, since the sampling uses a fixed seed based on partition id

[GitHub] spark pull request #19387: [SPARK-22160][SQL] Make sample points per partiti...

2017-09-28 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19387#discussion_r141764431 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/ConfigBehaviorSuite.scala --- @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19387: [SPARK-22160][SQL] Make sample points per partiti...

2017-09-28 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19387#discussion_r141764415 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/ConfigBehaviorSuite.scala --- @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19387: [SPARK-22160][SQL] Make sample points per partiti...

2017-09-28 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19387#discussion_r141755874 --- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala --- @@ -108,9 +108,17 @@ class HashPartitioner(partitions: Int) extends Partitioner

[GitHub] spark pull request #19387: [SPARK-22160][SQL] Allow changing sample points p...

2017-09-28 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/19387 [SPARK-22160][SQL] Allow changing sample points per partition in range shuffle exchange ## What changes were proposed in this pull request? Spark's RangePartitioner hard codes the number

[GitHub] spark issue #19384: [SPARK-22159][SQL] Make config names consistently end wi...

2017-09-28 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19384 I reverted the 2nd commit. Should be good for merge now. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark issue #19384: [SPARK-22159][SQL] Make config names consistently end wi...

2017-09-28 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19384 hm the 2nd commit is not meant for this one. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark pull request #19384: [SPARK-22159][SQL] Make config names consistently...

2017-09-28 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/19384 [SPARK-22159][SQL] Make config names consistently end with "enabled". ## What changes were proposed in this pull request? spark.sql.execution.ar

[GitHub] spark pull request #19376: [SPARK-22153][SQL] Rename ShuffleExchange -> Shuf...

2017-09-27 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/19376 [SPARK-22153][SQL] Rename ShuffleExchange -> ShuffleExchangeExec ## What changes were proposed in this pull request? For some reason when we added the Exec suffix to all physical operators,

[GitHub] spark pull request #19362: [SPARK-22141][SQL] Propagate empty relation befor...

2017-09-27 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19362#discussion_r141403817 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -136,6 +134,8 @@ abstract class Optimizer(sessionCatalog

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-09-21 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r140379971 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java --- @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-09-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r139889741 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Command.scala --- @@ -0,0 +1,114

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-09-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r139889045 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/SupportsWriteUnsafeRow.java --- @@ -0,0 +1,44 @@ +/* + * Licensed

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-09-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r13951 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriter.java --- @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

2017-09-19 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18704 cc @michal-databricks any thoughts on this? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #19261: [SPARK-22040] Add current_date function with timezone id

2017-09-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19261 What does this even mean? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #19136: [SPARK-15689][SQL] data source v2 read path

2017-09-14 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19136 LGTM. Still some feedback that can be addressed later. We should also document all the APIs as Evolving

[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2 read path

2017-09-14 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138947707 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceRDD.scala --- @@ -0,0 +1,71 @@ +/* + * Licensed

[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2 read path

2017-09-14 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138947426 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/Statistics.java --- @@ -0,0 +1,29 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2 read path

2017-09-14 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138947297 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/ReadSupportWithSchema.java --- @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2 read path

2017-09-14 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138946124 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/ReadSupport.java --- @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2 read path

2017-09-14 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138945691 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/DataSourceV2.java --- @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2 read path

2017-09-13 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138709319 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/Statistics.java --- @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2 read path

2017-09-13 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138665881 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/DataReader.java --- @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2

2017-09-13 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138624261 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/upward/Statistics.java --- @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2

2017-09-13 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138623586 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/downward/ColumnPruningSupport.java --- @@ -0,0 +1,36 @@ +/* + * Licensed

[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2

2017-09-13 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138622262 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/DataReader.java --- @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2

2017-09-13 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138622067 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SchemaRequiredDataSourceV2.java --- @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2

2017-09-13 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138621970 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SchemaRequiredDataSourceV2.java --- @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2

2017-09-13 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138621700 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/DataSourceV2Options.java --- @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2

2017-09-13 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138621506 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/DataSourceV2Options.java --- @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-09-13 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16578 I tried this and this is definitely super useful! it's a big patch and most of the people working in this area are either doing something else that's not Spark, or working on a few high priority SPIPs

[GitHub] spark pull request #19086: [SPARK-21874][SQL] Support changing database when...

2017-09-01 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19086#discussion_r136640563 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -495,15 +495,16 @@ private[hive] class HiveClientImpl

[GitHub] spark pull request #19064: [SPARK-21848][SQL] Add trait UDFType to identify ...

2017-08-28 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19064#discussion_r135590939 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala --- @@ -23,6 +23,12 @@ import

[GitHub] spark pull request #19064: [SPARK-21848][SQL] Add trait UDFType to identify ...

2017-08-28 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19064#discussion_r135590830 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala --- @@ -23,6 +23,12 @@ import

[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...

2017-08-24 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18906 I understand why you are using Python. What I don't understand is why you'd need to annotate nullability, because those are typically annotated for the purpose of performance improvement, but Python

[GitHub] spark pull request #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample AP...

2017-08-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18999#discussion_r135132622 --- Diff: python/pyspark/sql/dataframe.py --- @@ -659,19 +659,77 @@ def distinct(self): return DataFrame(self._jdf.distinct(), self.sql_ctx

[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...

2017-08-22 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18906 @ptkool have you seen a real use case so far that you need this? I'm a bit surprised since Python UDFs are already pretty slow, and you'd care about this. Are there other cases you run

[GitHub] spark issue #19008: [SPARK-21756][SQL]Add JSON option to allow unquoted cont...

2017-08-22 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19008 Can you also update the various json functions in which we document the options? The way it is right now there is no way for end-users to discover this option. --- If your project is set up

[GitHub] spark pull request #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample AP...

2017-08-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18999#discussion_r134123916 --- Diff: python/pyspark/sql/dataframe.py --- @@ -659,19 +659,77 @@ def distinct(self): return DataFrame(self._jdf.distinct(), self.sql_ctx

[GitHub] spark pull request #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample AP...

2017-08-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18999#discussion_r134123358 --- Diff: python/pyspark/sql/dataframe.py --- @@ -659,19 +659,77 @@ def distinct(self): return DataFrame(self._jdf.distinct(), self.sql_ctx

[GitHub] spark issue #18996: [MINOR][TYPO] Fix typos: runnning and Excecutors

2017-08-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18996 Merging in master. Thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark issue #18988: [SPARK-21778][SQL] Simpler Dataset.sample API in Scala /...

2017-08-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18988 Thanks. Do you want to add the Python and R ones? It is a little bit tricky because in Python we would need to detect whether withReplacement is a boolean or a floating point value

[GitHub] spark pull request #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTas...

2017-08-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18979#discussion_r133887920 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala --- @@ -57,7 +60,14 @@ class

[GitHub] spark pull request #18988: [SPARK-21778][SQL] Simpler Dataset.sample API in ...

2017-08-17 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/18988 [SPARK-21778][SQL] Simpler Dataset.sample API in Scala / Java ## What changes were proposed in this pull request? Dataset.sample requires a boolean flag withReplacement as the first argument

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-15 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18640 lgtm --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature

[GitHub] spark pull request #18956: [SPARK-21726][SQL] Check for structural integrity...

2017-08-15 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18956#discussion_r133360047 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -37,6 +37,12 @@ import org.apache.spark.sql.types

[GitHub] spark pull request #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-15 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18640#discussion_r133131618 --- Diff: sql/core/pom.xml --- @@ -87,6 +87,16 @@ + org.apache.orc + orc-core + ${orc.classifier

[GitHub] spark issue #18923: [SPARK-21710][StSt] Fix OOM on ConsoleSink with large in...

2017-08-11 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18923 Jenkins, test this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark issue #18912: [SPARK-21699][SQL] Remove unused getTableOption in Exter...

2017-08-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18912 Merging in master. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark pull request #18912: [SQL] Remove unused getTableOption in ExternalCat...

2017-08-10 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/18912 [SQL] Remove unused getTableOption in ExternalCatalog ## What changes were proposed in this pull request? This patch removes the unused SessionCatalog.getTableMetadataOption and ExternalCatalog

[GitHub] spark issue #18900: [SPARK-21687][SQL] Spark SQL should set createTime for H...

2017-08-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18900 We should put this in the catalog, shouldn't we? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark issue #18884: [SPARK-21669] Internal API for collecting metrics/stats ...

2017-08-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18884 Merging in master. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark issue #18884: [SPARK-21669] Internal API for collecting metrics/stats ...

2017-08-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18884 this looks good to me, but I didn't review super carefully. cc @cloud-fan --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well

[GitHub] spark issue #18752: [SPARK-21551][Python] Increase timeout for PythonRDD.ser...

2017-08-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18752 Merging in master. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark issue #18886: [SPARK-21671][core] Move kvstore to "util" sub-package, ...

2017-08-08 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18886 thx lgtm --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark issue #18786: [SPARK-21584][SQL][SparkR] Update R method for summary t...

2017-08-08 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18786 I suspect it is ok for R ... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark pull request #18884: [SPARK-21669] Internal API for collecting metrics...

2017-08-08 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18884#discussion_r132022172 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSink.scala --- @@ -128,6 +128,7 @@ class FileStreamSink

[GitHub] spark pull request #18884: [SPARK-21669] Internal API for collecting metrics...

2017-08-08 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18884#discussion_r132006851 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteStatsTracker.scala --- @@ -0,0 +1,121 @@ +/* + * Licensed

[GitHub] spark pull request #18884: [SPARK-21669] Internal API for collecting metrics...

2017-08-08 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18884#discussion_r132006674 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala --- @@ -0,0 +1,133 @@ +/* + * Licensed

[GitHub] spark issue #18607: [SPARK-21362][SQL][Adding Apache Drill JDBC Dialect]

2017-08-08 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18607 We can just put this code in a 3rd party library, can't we? If there is an issue with service/code discovery, we can come up with some sort of registration process similar to the data source API

[GitHub] spark issue #18884: [SPARK-21669] Internal API for collecting metrics/stats ...

2017-08-08 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18884 Jenkins, add to white list. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark issue #18607: [SPARK-21362][SQL][Adding Apache Drill JDBC Dialect]

2017-08-07 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18607 Unfortunately I think drill is not popular enough to warrant inclusion in here yet. If this is not extensible, we should make it possible to include such mappings outside Spark and then perhaps Drill

[GitHub] spark issue #18851: [SPARK-21644][SQL] LocalLimit.maxRows is defined incorre...

2017-08-07 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18851 Looks like the strip global limit is used by at least some test cases. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project

[GitHub] spark issue #18844: [SPARK-21640] Add errorifexists as a valid string for Er...

2017-08-05 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18844 Ok makes sense. LGTM. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark issue #18851: [SPARK-21644][SQL] LocalLimit.maxRows is defined incorre...

2017-08-04 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18851 cc @JoshRosen --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark pull request #18851: [SPARK-21644][SQL] LocalLimit.maxRows is defined ...

2017-08-04 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/18851 [SPARK-21644][SQL] LocalLimit.maxRows is defined incorrectly ## What changes were proposed in this pull request? The definition of `maxRows` in `LocalLimit` operator was simply wrong. This patch

[GitHub] spark issue #18844: [SPARK-21640] Add errorifexists as a valid string for Er...

2017-08-04 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18844 Actually why do we need this? Can't you just add Error? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-04 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18640 I just checked the dependency size. They look pretty reasonable, roughly 2 MBs in total (although I do worry in the future whether ORC would bring in a lot more jars). cc @omalley any

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-04 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18640 Why don't we then create a separate orc module? Just copy a few of the files over? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-04 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18640 To the best of my knowledge almost everybody runs with Hive anyway and the vast majority of users that run ORC are Hive users. In hindsight we probably should have put most of the data source

[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-04 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18749 @srowen that's not what I said. Almost always an explicit LGTM is preferred. There are tiny changes that might not require them, and it is up to the judgement of the committer. But those are more

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-04 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18640 Why are we adding this to core? Why not just the hive module? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

[GitHub] spark issue #18844: [SPARK-21640] Add errorifexists as a valid string for Er...

2017-08-04 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18844 Jenkins, test this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark issue #18839: [SPARK-21634][SQL] Change OneRowRelation from a case obj...

2017-08-04 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18839 cc @gatorsmile --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark issue #18839: [SPARK-21634][SQL] Change OneRowRelation from a case obj...

2017-08-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18839 Some test on string form of the plan might fail. Let's see ... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

[GitHub] spark pull request #18839: [SPARK-21634][SQL] Change OneRowRelation from a c...

2017-08-03 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/18839 [SPARK-21634][SQL] Change OneRowRelation from a case object to case class ## What changes were proposed in this pull request? OneRowRelation is the only plan that is a case object, which causes

[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18749 @HyukjinKwon you weren't a committer before :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18749 @srowen search for "RTC vs CTR (was: Concerning Sentry...)" From Todd Lipcon: ``` I don't have incubator stats... nor do I have a good way to measure "most a

[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18749 Actually Sean I disagree. Spark has always been review then commit from the days before it entered ASF. In a huge debate last year within the ASF on RTC vs CTR, Spark was cited as a prominent example

[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18749 Ah OK. That's what we are discussing here. In the past it has always been an explicit "LGTM". That was defined before github had even the approval feature. Now most committers are actual

[GitHub] spark issue #18828: [SPARK-21619][SQL] Fail the execution of canonicalized p...

2017-08-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18828 Still looking into it, but the failure is related to reuse exchange and caching. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well

[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18749 What's your point? You should be able to merge PR without anybody reviewing? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18749 Yes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18749 It is documented: http://spark.apache.org/contributing.html It's been the convention forever and it's also good to use one way rather than multiple, so I'd prefer us just using that ... until

[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-02 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18749 I think @srowen did it here using the new github approval: srowen approved these changes 20 hours ago @srowen might be better if we stick with the LGTM one. --- If your project is set up

[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-02 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18749 seems fine to me. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark issue #18828: [SPARK-21619][SQL] Fail the execution of canonicalized p...

2017-08-02 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18828 cc @adrian-ionescu @gatorsmile --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark pull request #18828: [SPARK-21619][SQL] Fail the execution of canonica...

2017-08-02 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/18828 [SPARK-21619][SQL] Fail the execution of canonicalized plans explicitly ## What changes were proposed in this pull request? Canonicalized plans are not supposed to be executed. I ran into a case

[GitHub] spark issue #18805: [SPARK-19112][CORE] Support for ZStandard codec

2017-08-02 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18805 @sitalkedia anyway you can talk to the FB team that does that one and relicense, similar to RocksDB? --- If your project is set up for it, you can reply to this email and have your reply appear

[GitHub] spark issue #18805: [SPARK-19112][CORE] Support for ZStandard codec

2017-08-02 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18805 Our compression codec is actually completely decoupled from Hadoops, but dependency management (and licensing) can be annoying to deal with. --- If your project is set up for it, you can reply

[GitHub] spark issue #18805: [SPARK-19112][CORE] Support for ZStandard codec

2017-08-01 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18805 How big is the dependency that's getting pulled in? If we are adding more compression codecs maybe we should retire some old ones, or move them into a separate package so downstream apps can

[GitHub] spark issue #18805: [SPARK-19112][CORE] Support for ZStandard codec

2017-08-01 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18805 Any benchmark data? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

<    1   2   3   4   5   6   7   8   9   10   >