[GitHub] spark pull request #19905: [SPARK-22710] ConfigBuilder.fallbackConf should t...

2017-12-05 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/19905 [SPARK-22710] ConfigBuilder.fallbackConf should trigger onCreate function ## What changes were proposed in this pull request? I was looking at the config code today and found that configs defined

[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...

2017-11-28 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19468 For future pull requests, can you create subtasks under https://issues.apache.org/jira/browse/SPARK-18278 ? --- - To unsubscribe

[1/2] spark git commit: [SPARK-18278][SCHEDULER] Spark on Kubernetes - Basic Scheduler Backend

2017-11-28 Thread rxin
Repository: spark Updated Branches: refs/heads/master 475a29f11 -> e9b2070ab http://git-wip-us.apache.org/repos/asf/spark/blob/e9b2070a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackendSuite.scala

[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...

2017-11-28 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19468 Thanks - merging in master! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[2/2] spark git commit: [SPARK-18278][SCHEDULER] Spark on Kubernetes - Basic Scheduler Backend

2017-11-28 Thread rxin
-on-k8s.github.io/userdocs/running-on-kubernetes.html cc rxin felixcheung mateiz (shepherd) k8s-big-data SIG members & contributors: mccheah ash211 ssuchter varunkatta kimoonkim erikerlandson liyinan926 tnachen ifilonenko Author: Yinan Li <liyinan...@gmail.com> Author: foxish

[GitHub] spark pull request #11994: [SPARK-14151] Expose metrics Source and Sink inte...

2017-11-27 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/11994#discussion_r153321690 --- Diff: core/src/main/scala/org/apache/spark/metrics/sink/Sink.scala --- @@ -17,8 +17,48 @@ package org.apache.spark.metrics.sink

[GitHub] spark issue #11994: [SPARK-14151] Expose metrics Source and Sink interface

2017-11-27 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/11994 Hey so my main question is whether we should expose the coda hale metric library directly. In the past, we have done this and it has come back to bite us. For example, exposing the Hadoop

[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...

2017-11-27 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19468 I went through the changes to make sure the non-k8s changes are ok. They do look ok to me. From that perspective, LGTM

[GitHub] spark pull request #19623: [SPARK-22078][SQL] clarify exception behaviors fo...

2017-11-02 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19623#discussion_r148605519 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java --- @@ -50,28 +53,34 @@ /** * Creates

[GitHub] spark pull request #19623: [SPARK-22078][SQL] clarify exception behaviors fo...

2017-11-02 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19623#discussion_r148553358 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java --- @@ -50,28 +53,34 @@ /** * Creates

spark git commit: [SPARK-22369][PYTHON][DOCS] Exposes catalog API documentation in PySpark

2017-11-02 Thread rxin
Repository: spark Updated Branches: refs/heads/master b2463fad7 -> 41b60125b [SPARK-22369][PYTHON][DOCS] Exposes catalog API documentation in PySpark ## What changes were proposed in this pull request? This PR proposes to add a link from `spark.catalog(..)` to `Catalog` and expose Catalog

[GitHub] spark issue #19596: [SPARK-22369][PYTHON][DOCS] Exposes catalog API document...

2017-11-02 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19596 Merging in master.

[GitHub] spark pull request #19623: [SPARK-22078][SQL] clarify exception behaviors fo...

2017-11-02 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19623#discussion_r148545942 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java --- @@ -50,28 +53,34 @@ /** * Creates

spark git commit: [SPARK-22408][SQL] RelationalGroupedDataset's distinct pivot value calculation launches unnecessary stages

2017-11-02 Thread rxin
Repository: spark Updated Branches: refs/heads/master 849b465bb -> 277b1924b [SPARK-22408][SQL] RelationalGroupedDataset's distinct pivot value calculation launches unnecessary stages ## What changes were proposed in this pull request? Adding a global limit on top of the distinct values

[GitHub] spark issue #19629: [SPARK-22408][SQL] RelationalGroupedDataset's distinct p...

2017-11-02 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19629 Merging in master.

[GitHub] spark pull request #19623: [SPARK-22078][SQL] clarify exception behaviors fo...

2017-11-02 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19623#discussion_r148527451 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java --- @@ -50,28 +53,34 @@ /** * Creates

[GitHub] spark issue #19629: [SPARK-22408][SQL] RelationalGroupedDataset's distinct p...

2017-11-01 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19629 Jenkins, test this please.

spark git commit: [MINOR] Data source v2 docs update.

2017-11-01 Thread rxin
How was this patch tested? This is a doc only change. Author: Reynold Xin <r...@databricks.com> Closes #19626 from rxin/dsv2-update. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d43e1f06 Tree: http://git-wip-us.apache.

[GitHub] spark issue #19626: [minor] Data source v2 docs update.

2017-11-01 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19626 Merging in master.

[GitHub] spark pull request #19626: [minor] Data source v2 docs update.

2017-11-01 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/19626 [minor] Data source v2 docs update. ## What changes were proposed in this pull request? This patch includes some doc updates for data source API v2. I was reading the code and noticed some minor

[GitHub] spark issue #19626: [minor] Data source v2 docs update.

2017-11-01 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19626 cc @cloud-fan

[GitHub] spark issue #19596: [SPARK-22369][PYTHON][DOCS] Exposes catalog API document...

2017-10-28 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19596 Yea definitely.

[GitHub] spark issue #19592: [SPARK-22347][SQL][PySpark] Support optionally running P...

2017-10-27 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19592 Is this complexity worth it? Can we just document it as a behavior and users need to be careful with it?

[GitHub] spark pull request #18828: [SPARK-21619][SQL] Fail the execution of canonica...

2017-10-27 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18828#discussion_r147544081 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/SparkPlanSuite.scala --- @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - B...

2017-10-23 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19468#discussion_r146429113 --- Diff: pom.xml --- @@ -2649,6 +2649,13 @@ + kubernetes + +resource-managers/kubernetes/core

[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...

2017-10-23 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19498 cc @tdas

[GitHub] spark issue #19535: [SPARK-22313][PYTHON] Mark/print deprecation warnings as...

2017-10-19 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19535 Looks good at a high level.

[GitHub] spark issue #19512: [SPARK-21551][Python] Increase timeout for PythonRDD.ser...

2017-10-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19512 Seems fine to backport into 2.2.

[GitHub] spark pull request #19269: [SPARK-22026][SQL] data source v2 write path

2017-10-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r145579513 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriterFactory.java --- @@ -0,0 +1,44 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #19521: [SPARK-22300][BUILD] Update ORC to 1.4.1

2017-10-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19521 LGTM

[GitHub] spark issue #19524: [SPARK-22302][INFRA] Remove manual backports for subproc...

2017-10-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19524 Seems fine.

[GitHub] spark issue #19419: [SPARK-22188] [CORE] Adding security headers for prevent...

2017-10-16 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19419 Yea, in general for security features it seems good to turn them on by default.

[GitHub] spark issue #19419: [SPARK-22188] [CORE] Adding security headers for prevent...

2017-10-16 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19419 Is there a reason why this cannot be always enabled?

[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...

2017-10-15 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18732 Grouped UDFs, or Grouped Vectorized UDFs.

[GitHub] spark pull request #19451: SPARK-22181 Adds ReplaceExceptWithNotFilter rule

2017-10-14 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19451#discussion_r144687542 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1242,6 +1244,51 @@ object ReplaceIntersectWithSemiJoin

[GitHub] spark issue #19451: SPARK-22181 Adds ReplaceExceptWithNotFilter rule

2017-10-13 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19451 If we had to do this all over again, I'd put all rules in their own files. Replace isn't really a great high-level category because all rules at some level replace something

[GitHub] spark issue #19451: SPARK-22181 Adds ReplaceExceptWithNotFilter rule

2017-10-12 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19451 Actually you already have it in the classdoc, so please just update the PR description with it.

[GitHub] spark pull request #19451: SPARK-22181 Adds ReplaceExceptWithNotFilter rule

2017-10-12 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19451#discussion_r144461898 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1242,6 +1244,53 @@ object ReplaceIntersectWithSemiJoin

[GitHub] spark pull request #19451: SPARK-22181 Adds ReplaceExceptWithNotFilter rule

2017-10-12 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19451#discussion_r144461913 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1242,6 +1244,53 @@ object ReplaceIntersectWithSemiJoin

[GitHub] spark pull request #19451: SPARK-22181 Adds ReplaceExceptWithNotFilter rule

2017-10-12 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19451#discussion_r144461813 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1242,6 +1244,53 @@ object ReplaceIntersectWithSemiJoin

[GitHub] spark issue #19451: SPARK-22181 Adds ReplaceExceptWithNotFilter rule

2017-10-12 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19451 Can you update the PR description with an example plan before / after this optimization, and also put that example in the comment section of the doc

[GitHub] spark issue #18805: [SPARK-19112][CORE] Support for ZStandard codec

2017-10-11 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18805 Does the package include a binary distribution for Linux?

[GitHub] spark issue #6751: [SPARK-8300] DataFrame hint for broadcast join.

2017-10-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/6751 Isn't the hint available in SQL?

[GitHub] spark issue #19454: [SPARK-22152][SPARK-18855][SQL] Added flatten functions ...

2017-10-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19454 Honestly I don't think it is worth doing this.

[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...

2017-10-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18732 I'm OK with the naming. We can change them later if needed before the release.

[GitHub] spark issue #19454: [SPARK-22152][SPARK-18855][SQL] Added flatten functions ...

2017-10-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19454 I actually think this can be confusing on Dataset[T], when the Dataset is just untyped and a DataFrame. Do we throw a runtime exception

[GitHub] spark issue #19454: [SPARK-22152][SPARK-18855][SQL] Added flatten functions ...

2017-10-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19454 Is this worth doing?

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-07 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r143340681 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala --- @@ -519,3 +519,18 @@ case class CoGroup

[GitHub] spark pull request #19250: [SPARK-12297] Table timezone correction for Times...

2017-10-06 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19250#discussion_r143243338 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala --- @@ -1213,6 +1213,71 @@ case class

[GitHub] spark pull request #19250: [SPARK-12297] Table timezone correction for Times...

2017-10-06 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19250#discussion_r143122895 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/TimestampTableTimeZone.scala --- @@ -0,0 +1,213 @@ +/* + * Licensed

[GitHub] spark pull request #19250: [SPARK-12297] Table timezone correction for Times...

2017-10-06 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19250#discussion_r143122657 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -230,6 +230,13 @@ case class AlterTableSetPropertiesCommand

[GitHub] spark pull request #19250: [SPARK-12297] Table timezone correction for Times...

2017-10-06 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19250#discussion_r143122503 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -266,6 +267,10 @@ final class DataFrameWriter[T] private[sql](ds: Dataset

[GitHub] spark pull request #19250: [SPARK-12297] Table timezone correction for Times...

2017-10-06 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19250#discussion_r143122396 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala --- @@ -1015,6 +1020,10 @@ object DateTimeUtils { guess

[GitHub] spark pull request #19250: [SPARK-12297] Table timezone correction for Times...

2017-10-06 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19250#discussion_r143122317 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala --- @@ -1213,6 +1213,71 @@ case class

[GitHub] spark issue #19394: [SPARK-22170][SQL] Reduce memory consumption in broadcas...

2017-10-05 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19394 What's the other value?

[GitHub] spark issue #19394: [SPARK-22170][SQL] Reduce memory consumption in broadcas...

2017-10-05 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19394 Not sure - maybe print the chi-value of the test and see if they make sense. If they do, we can change the threshold

[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...

2017-09-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18732 What's the difference between this one and the transform function you also proposed? I'm trying to see if all the naming makes sense when considered together

[GitHub] spark issue #19393: [SPARK-21644][SQL] LocalLimit.maxRows is defined incorre...

2017-09-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19393 LGTM but I wrote most of the code so perhaps we should find somebody else to review.

[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...

2017-09-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18732 Is this just a mapGroups function?

spark git commit: [SPARK-22160][SQL] Make sample points per partition (in range partitioner) configurable and bump the default value up to 100

2017-09-28 Thread rxin
sed on chi square test ... Author: Reynold Xin <r...@databricks.com> Closes #19387 from rxin/SPARK-22160. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/323806e6 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3238

[GitHub] spark pull request #19387: [SPARK-22160][SQL] Make sample points per partiti...

2017-09-28 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19387#discussion_r141786663 --- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala --- @@ -108,11 +108,21 @@ class HashPartitioner(partitions: Int) extends Partitioner

[GitHub] spark issue #19387: [SPARK-22160][SQL] Make sample points per partition (in ...

2017-09-28 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19387 Merging in master.

[GitHub] spark issue #19387: [SPARK-22160][SQL] Make sample points per partition (in ...

2017-09-28 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19387 I put up a comment saying this test result should be deterministic, since the sampling uses a fixed seed based on partition id

[GitHub] spark pull request #19387: [SPARK-22160][SQL] Make sample points per partiti...

2017-09-28 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19387#discussion_r141764431 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/ConfigBehaviorSuite.scala --- @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19387: [SPARK-22160][SQL] Make sample points per partiti...

2017-09-28 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19387#discussion_r141764415 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/ConfigBehaviorSuite.scala --- @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19387: [SPARK-22160][SQL] Make sample points per partiti...

2017-09-28 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19387#discussion_r141755874 --- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala --- @@ -108,9 +108,17 @@ class HashPartitioner(partitions: Int) extends Partitioner

[GitHub] spark pull request #19387: [SPARK-22160][SQL] Allow changing sample points p...

2017-09-28 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/19387 [SPARK-22160][SQL] Allow changing sample points per partition in range shuffle exchange ## What changes were proposed in this pull request? Spark's RangePartitioner hard codes the number

[GitHub] spark issue #19384: [SPARK-22159][SQL] Make config names consistently end wi...

2017-09-28 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19384 I reverted the 2nd commit. Should be good for merge now.

[GitHub] spark issue #19384: [SPARK-22159][SQL] Make config names consistently end wi...

2017-09-28 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19384 Hm, the 2nd commit is not meant for this one.

[GitHub] spark pull request #19384: [SPARK-22159][SQL] Make config names consistently...

2017-09-28 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/19384 [SPARK-22159][SQL] Make config names consistently end with "enabled". ## What changes were proposed in this pull request? spark.sql.execution.ar

[GitHub] spark pull request #19376: [SPARK-22153][SQL] Rename ShuffleExchange -> Shuf...

2017-09-27 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/19376 [SPARK-22153][SQL] Rename ShuffleExchange -> ShuffleExchangeExec ## What changes were proposed in this pull request? For some reason when we added the Exec suffix to all physical operators,

[GitHub] spark pull request #19362: [SPARK-22141][SQL] Propagate empty relation befor...

2017-09-27 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19362#discussion_r141403817 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -136,6 +134,8 @@ abstract class Optimizer(sessionCatalog

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-09-21 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r140379971 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java --- @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-09-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r139889741 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Command.scala --- @@ -0,0 +1,114

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-09-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r139889045 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/SupportsWriteUnsafeRow.java --- @@ -0,0 +1,44 @@ +/* + * Licensed

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-09-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r13951 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriter.java --- @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

2017-09-19 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18704 cc @michal-databricks any thoughts on this?

[GitHub] spark issue #19261: [SPARK-22040] Add current_date function with timezone id

2017-09-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19261 What does this even mean?

[GitHub] spark issue #19136: [SPARK-15689][SQL] data source v2 read path

2017-09-14 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19136 LGTM. Still some feedback that can be addressed later. We should also document all the APIs as Evolving

[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2 read path

2017-09-14 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138947707 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceRDD.scala --- @@ -0,0 +1,71 @@ +/* + * Licensed

[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2 read path

2017-09-14 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138947426 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/Statistics.java --- @@ -0,0 +1,29 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2 read path

2017-09-14 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138947297 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/ReadSupportWithSchema.java --- @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2 read path

2017-09-14 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138946124 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/ReadSupport.java --- @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2 read path

2017-09-14 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138945691 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/DataSourceV2.java --- @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2 read path

2017-09-13 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138709319 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/Statistics.java --- @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2 read path

2017-09-13 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138665881 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/DataReader.java --- @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2

2017-09-13 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138624261 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/upward/Statistics.java --- @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2

2017-09-13 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138623586 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/downward/ColumnPruningSupport.java --- @@ -0,0 +1,36 @@ +/* + * Licensed

[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2

2017-09-13 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138622262 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/DataReader.java --- @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2

2017-09-13 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138622067 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SchemaRequiredDataSourceV2.java --- @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2

2017-09-13 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138621970 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SchemaRequiredDataSourceV2.java --- @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2

2017-09-13 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138621700 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/DataSourceV2Options.java --- @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2

2017-09-13 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138621506 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/DataSourceV2Options.java --- @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-09-13 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16578 I tried this and this is definitely super useful! It's a big patch and most of the people working in this area are either doing something else that's not Spark, or working on a few high priority SPIPs

[GitHub] spark pull request #19086: [SPARK-21874][SQL] Support changing database when...

2017-09-01 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19086#discussion_r136640563 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -495,15 +495,16 @@ private[hive] class HiveClientImpl

[GitHub] spark pull request #19064: [SPARK-21848][SQL] Add trait UDFType to identify ...

2017-08-28 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19064#discussion_r135590939 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala --- @@ -23,6 +23,12 @@ import

[GitHub] spark pull request #19064: [SPARK-21848][SQL] Add trait UDFType to identify ...

2017-08-28 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19064#discussion_r135590830 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala --- @@ -23,6 +23,12 @@ import

[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...

2017-08-24 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18906 I understand why you are using Python. What I don't understand is why you'd need to annotate nullability, because those are typically annotated for the purpose of performance improvement, but Python

[GitHub] spark pull request #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample AP...

2017-08-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18999#discussion_r135132622 --- Diff: python/pyspark/sql/dataframe.py --- @@ -659,19 +659,77 @@ def distinct(self): return DataFrame(self._jdf.distinct(), self.sql_ctx

[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...

2017-08-22 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18906 @ptkool have you seen a real use case so far that you need this? I'm a bit surprised since Python UDFs are already pretty slow, and you'd care about this. Are there other cases you run
