[GitHub] spark pull request #18159: [SPARK-20703][SQL] Associate metrics with data wr...

2017-07-06 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18159#discussion_r126015755 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala --- @@ -314,21 +339,40 @@ object FileFormatWriter extends

[GitHub] spark issue #18549: [SPARK-21323][SQL]Rename plans.logical.statsEstimation.R...

2017-07-06 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18549 Merging in master. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

spark git commit: [SPARK-21323][SQL] Rename plans.logical.statsEstimation.Range to ValueInterval

2017-07-06 Thread rxin
Repository: spark Updated Branches: refs/heads/master 48e44b24a -> bf66335ac [SPARK-21323][SQL] Rename plans.logical.statsEstimation.Range to ValueInterval ## What changes were proposed in this pull request? Rename org.apache.spark.sql.catalyst.plans.logical.statsEstimation.Range to

[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-07-06 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/17633#discussion_r126013379 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -589,18 +590,40 @@ private[client] class Shim_v0_13 extends

[GitHub] spark issue #18307: [SPARK-21100][SQL] describe should give quartiles simila...

2017-07-05 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18307 high level change looks good to me. @aray can you update the title / description of the PR and JIRA ticket? cc @cloud-fan can you review this to make sure the implementation

[GitHub] spark issue #18494: [SPARK-21272] SortMergeJoin LeftAnti does not update num...

2017-07-01 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18494 cc @hvanhovell

[GitHub] spark pull request #18307: [SPARK-21100][SQL] describe should give quartiles...

2017-06-30 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18307#discussion_r125146093 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -2205,37 +2205,151 @@ class Dataset[T] private[sql]( * // max 92.0

[GitHub] spark pull request #18307: [SPARK-21100][SQL] describe should give quartiles...

2017-06-30 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18307#discussion_r125146112 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -2205,37 +2205,151 @@ class Dataset[T] private[sql]( * // max 92.0

[GitHub] spark pull request #18307: [SPARK-21100][SQL] describe should give quartiles...

2017-06-30 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18307#discussion_r125146063 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -2205,37 +2205,151 @@ class Dataset[T] private[sql]( * // max 92.0

[GitHub] spark issue #18479: WIP - logical plan stat propagation using mixin and visi...

2017-06-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18479 Funny tests actually passed. Maybe you guys can just review this. cc @gengliangwang @gatorsmile @wzhfy

[GitHub] spark issue #18307: [SPARK-21100][SQL] describe should give quartiles simila...

2017-06-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18307 OK then let's use summary. @aray want to do that update?

[GitHub] spark pull request #18307: [SPARK-21100][SQL] describe should give quartiles...

2017-06-30 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18307#discussion_r125095026 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -2205,37 +2205,170 @@ class Dataset[T] private[sql]( * // max 92.0

[GitHub] spark issue #18334: [SPARK-21127] [SQL] Update statistics after data changin...

2017-06-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18334 Can the stats be updated incrementally?

[GitHub] spark issue #18424: [SPARK-17091] Add rule to convert IN predicate to equiva...

2017-06-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18424 Have you done actual benchmarks to validate that this is a perf improvement?

[GitHub] spark issue #18469: [SPARK-21256] [SQL] Add withSQLConf to Catalyst Test

2017-06-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18469 Can we minimize the change by just adding this method to PlanTest? It's not that many lines of code.

[GitHub] spark pull request #18479: WIP - stat propagation code using mixin

2017-06-30 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/18479 WIP - stat propagation code using mixin ## What changes were proposed in this pull request? TBD ## How was this patch tested? Should be covered by existing test cases. You can merge

[GitHub] spark issue #17935: [SPARK-20690][SQL] Subqueries in FROM should have alias ...

2017-06-29 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17935 The reason I found out about this is that one of the widely circulated TPC-DS benchmark harnesses online uses this.

[GitHub] spark issue #17935: [SPARK-20690][SQL] Subqueries in FROM should have alias ...

2017-06-29 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17935 I don't think that argument is useful at all. For example, none of the other databases support the DataFrame API. Does that mean few users will write DataFrame code?

[GitHub] spark issue #17935: [SPARK-20690][SQL] Subqueries in FROM should have alias ...

2017-06-29 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17935 Other committers please revert this change until we find a solution or verify that almost no users write queries like this.

[GitHub] spark pull request #18307: [SPARK-21100][SQL] describe should give quartiles...

2017-06-29 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18307#discussion_r124932359 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -2205,37 +2205,170 @@ class Dataset[T] private[sql]( * // max 92.0

[GitHub] spark issue #17935: [SPARK-20690][SQL] Subqueries in FROM should have alias ...

2017-06-29 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17935 Also the description / title is completely different from the JIRA ticket.

[GitHub] spark issue #17935: [SPARK-20690][SQL] Subqueries in FROM should have alias ...

2017-06-29 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17935 Guys - isn't this API breaking?

[GitHub] spark issue #18301: [SPARK-21052][SQL] Add hash map metrics to join

2017-06-28 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18301 hey i didn't track super closely, but it is pretty important to show at least one more digit, e.g. 1.7, rather than just 2.
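
The precision point above can be illustrated with a small sketch. It assumes the metric is the total number of probes divided by the total number of lookups; the snippet does not say how the PR computes it, and the function name `avg_probe` and the sample counts are hypothetical.

```python
def avg_probe(total_probes: int, total_lookups: int) -> float:
    """Average number of hash probes per lookup (hypothetical metric)."""
    if total_lookups == 0:
        return 0.0
    return total_probes / total_lookups


# Hypothetical counts: 17 probes over 10 lookups.
avg = avg_probe(17, 10)

# Rounding to an integer hides how far the average is from 1 or 2.
as_integer = str(round(avg))

# Formatting with one decimal place keeps one more digit.
one_decimal = f"{avg:.1f}"
```

With these counts the integer form reads `2` while the one-decimal form reads `1.7`, which is the distinction the comment asks the UI to preserve.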

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-28 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15821 In the future we should revert PRs that fail builds IMMEDIATELY. There is no way we should've let the build be broken for days.

[GitHub] spark pull request #18429: [SPARK-21222] Move elimination of Distinct clause...

2017-06-27 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18429#discussion_r124457557 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -152,6 +153,19 @@ abstract class Optimizer

[GitHub] spark pull request #18429: [SPARK-21222] Move elimination of Distinct clause...

2017-06-27 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18429#discussion_r124455032 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -152,6 +153,19 @@ abstract class Optimizer

[GitHub] spark pull request #18429: [SPARK-21222] Move elimination of Distinct clause...

2017-06-27 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18429#discussion_r124455104 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -152,6 +153,19 @@ abstract class Optimizer

[GitHub] spark pull request #18429: [SPARK-21222] Move elimination of Distinct clause...

2017-06-27 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18429#discussion_r124452275 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -152,6 +153,19 @@ abstract class Optimizer

[GitHub] spark pull request #18429: [SPARK-21222] Move elimination of Distinct clause...

2017-06-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18429#discussion_r124177929 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateDistinceSuite.scala --- @@ -0,0 +1,56 @@ +/* + * Licensed

[GitHub] spark issue #18368: [SPARK-21102][SQL] Make refresh resource command less ag...

2017-06-26 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18368 Jenkins, retest this please.

[GitHub] spark issue #18395: [SPARK-20655][core] In-memory KVStore implementation.

2017-06-26 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18395 Is this going to be exposed? Either way, we should find something like spark.util.kvstore package rather than a top level package.

[GitHub] spark issue #18042: [SPARK-20817][core] Fix to return "Unknown processor" on...

2017-06-23 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18042 Please let's not waste more time here. I don't think the gain is worth the effort required (or even the discussions here).

[GitHub] spark issue #18377: [SPARK-18016][SQL][CATALYST][BRANCH-2.2] Code Generation...

2017-06-23 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18377 Hm I'm not even sure if we should backport this in branch-2.2. Let's wait and see ...

[GitHub] spark issue #18387: [SPARK-21174] [SQL] Validate sampling fraction in logica...

2017-06-23 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18387 What about CheckAnalysis?

[GitHub] spark issue #18387: [SPARK-21174] [SQL] Validate sampling fraction in logica...

2017-06-22 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18387 hm should we do this? It'd make more sense to throw an analyzer error, rather than some deep call stack that's coming from an operator.

[GitHub] spark issue #18377: [SPARK-18016][SQL][CATALYST][BRANCH-2.2] Code Generation...

2017-06-22 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18377 Why did we backport this? This seems too risky.

spark git commit: [SPARK-21103][SQL] QueryPlanConstraints should be part of LogicalPlan

2017-06-20 Thread rxin
nce the constraint framework is only used for query plan rewriting and not for physical planning. ## How was this patch tested? Should be covered by existing tests, since it is a simple refactoring. Author: Reynold Xin <r...@databricks.com> Closes #18310 from rxin/SPARK-21103. Project: http:

[GitHub] spark issue #18310: [SPARK-21103][SQL] QueryPlanConstraints should be part o...

2017-06-20 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18310 Merging in master.

[GitHub] spark issue #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus#writeE...

2017-06-19 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18343 I was talking about the classname for the internal members.

[GitHub] spark issue #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus#writeE...

2017-06-19 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18343 It's obvious it will reduce data size with custom serialization, since the custom logic doesn't need to write the full classname out which the java default one does. I don't think Kryo knows
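
The size argument above can be sketched with a toy encoder. This is plain Python, not Java's actual serialization stream format or Spark code: the class name `org.example.CompressedStatus`, the two fields, and both function names are hypothetical, chosen only to show why a format that writes the class name costs more bytes than one with an agreed layout.

```python
import struct

CLASS_NAME = "org.example.CompressedStatus"  # hypothetical class name


def encode_self_describing(num_blocks: int, avg_size: int) -> bytes:
    """Toy generic format: length-prefixed class name, then the fields."""
    name = CLASS_NAME.encode("utf-8")
    header = struct.pack(">H", len(name)) + name
    return header + struct.pack(">iq", num_blocks, avg_size)


def encode_custom(num_blocks: int, avg_size: int) -> bytes:
    """Custom format: writer and reader agree on the layout, so only fields are written."""
    return struct.pack(">iq", num_blocks, avg_size)


generic = encode_self_describing(3, 1024)
custom = encode_custom(3, 1024)

# Bytes spent on the class name and its length prefix.
overhead = len(generic) - len(custom)
```

In this sketch the custom payload is the 12 bytes of the two fields, and the generic payload adds the class name plus a two-byte length prefix on top of that.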

[GitHub] spark issue #18307: [SPARK-21100][SQL] describe should give quartiles simila...

2017-06-15 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18307 My worry is that now the default performance will be slow. Maybe this flag can be off by default?

[GitHub] spark pull request #18310: [SPARK-21103][SQL] QueryPlanConstraints should be...

2017-06-15 Thread rxin
GitHub user rxin reopened a pull request: https://github.com/apache/spark/pull/18310 [SPARK-21103][SQL] QueryPlanConstraints should be part of LogicalPlan ## What changes were proposed in this pull request? QueryPlanConstraints should be part of LogicalPlan, rather than

[GitHub] spark issue #18301: [SPARK-21052][SQL] Add hash map metrics to join

2017-06-15 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18301 also the avg probe probably shouldn't be an integer. at least we should show something like 1.9?

[GitHub] spark issue #18301: [SPARK-21052][SQL] Add hash map metrics to join

2017-06-15 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18301 yes but i just feel it is getting very long and verbose ..

[GitHub] spark issue #18301: [SPARK-21052][SQL] Add hash map metrics to join

2017-06-15 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18301 I'd shorten it to "avg hash probe". Also do we really need min, med, max? Maybe just a single global avg?

[GitHub] spark pull request #18301: [SPARK-21052][SQL] Add hash map metrics to join

2017-06-15 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18301#discussion_r122128307 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala --- @@ -573,8 +586,11 @@ private[execution] final class

[GitHub] spark issue #18299: [SPARK-21092][SQL] Wire SQLConf in logical plan and expr...

2017-06-14 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18299 Merging in master.

spark git commit: [SPARK-21092][SQL] Wire SQLConf in logical plan and expressions

2017-06-14 Thread rxin
<r...@databricks.com> Closes #18299 from rxin/SPARK-21092. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fffeb6d7 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fffeb6d7 Diff: http://git-wip-us.apache.org/

[GitHub] spark pull request #18310: [SPARK-21103][SQL] QueryPlanConstraints should be...

2017-06-14 Thread rxin
Github user rxin closed the pull request at: https://github.com/apache/spark/pull/18310

[GitHub] spark issue #18310: [SPARK-21103][SQL] QueryPlanConstraints should be part o...

2017-06-14 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18310 Closing for now, since @sameeragarwal said it might be useful in physical planning in the future.

[GitHub] spark issue #18310: [SPARK-21103][SQL] QueryPlanConstraints should be part o...

2017-06-14 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18310 This currently includes all the changes from https://github.com/apache/spark/pull/18299, but only the last commit matters.

[GitHub] spark pull request #18310: [SPARK-21103][SQL] QueryPlanConstraints should be...

2017-06-14 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/18310 [SPARK-21103][SQL] QueryPlanConstraints should be part of LogicalPlan ## What changes were proposed in this pull request? QueryPlanConstraints should be part of LogicalPlan, rather than QueryPlan

[GitHub] spark issue #18301: [SPARK-21052][SQL] Add hash map metrics to join

2017-06-14 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18301 Can you put a screenshot of the UI up, for both join and aggregate?

[GitHub] spark issue #18307: [SPARK-21100][SQL] describe should give quartiles simila...

2017-06-14 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18307 What's the perf impact here? My worry is that we will significantly slow down describe ...

[GitHub] spark pull request #18299: [SPARK-21092][SQL] Wire SQLConf in logical plan a...

2017-06-14 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18299#discussion_r122072883 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlanConstraints.scala --- @@ -27,18 +27,20 @@ trait QueryPlanConstraints

[GitHub] spark issue #18299: [SPARK-21092][SQL] Wire SQLConf in logical plan and expr...

2017-06-14 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18299 The issue is that SparkSession might change the way they are wired and it's not always the case that when we create a new thread, we set the thread local conf.

[GitHub] spark issue #18306: [SPARK-21029][SS] All StreamingQuery should be stopped w...

2017-06-14 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18306 Is this safe to do @marmbrus ?

[GitHub] spark issue #18298: [SPARK-21091][SQL] Move constraint code into QueryPlanCo...

2017-06-14 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18298 Merging in master.

spark git commit: [SPARK-21091][SQL] Move constraint code into QueryPlanConstraints

2017-06-14 Thread rxin
n't litter QueryPlan with a lot of constraint private functions. ## How was this patch tested? This is a simple move refactoring and should be covered by existing tests. Author: Reynold Xin <r...@databricks.com> Closes #18298 from rxin/SPARK-21091. Project: http://git-wip-us.apache.org/

[GitHub] spark pull request #18298: [SPARK-21091][SQL] Move constraint code into Quer...

2017-06-14 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18298#discussion_r122008512 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlanConstraints.scala --- @@ -0,0 +1,206 @@ +/* + * Licensed

[GitHub] spark issue #18299: [SPARK-21092][SQL] Wire SQLConf in logical plan and expr...

2017-06-14 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18299 cc @wzhfy

[GitHub] spark pull request #18299: Spark 21092

2017-06-14 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/18299 Spark 21092 ## What changes were proposed in this pull request? It is really painful to not have configs in logical plan and expressions. We had to add all sorts of hacks (e.g. pass SQLConf

[GitHub] spark issue #18299: [SPARK-21092][SQL] Wire SQLConf in logical plan and expr...

2017-06-14 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18299 Note that this patch is based on https://github.com/apache/spark/pull/18298. Once we merge that one the diff will become smaller.

[GitHub] spark pull request #18298: [SPARK-21091][SQL] Move constraint code into Quer...

2017-06-14 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/18298 [SPARK-21091][SQL] Move constraint code into QueryPlanConstraints ## What changes were proposed in this pull request? This patch moves constraint related code into a separate trait

[GitHub] spark pull request #18298: [SPARK-21091][SQL] Move constraint code into Quer...

2017-06-14 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18298#discussion_r121865658 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlanConstraints.scala --- @@ -0,0 +1,206 @@ +/* + * Licensed

[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...

2017-06-13 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/15821#discussion_r121729635 --- Diff: pom.xml --- @@ -1871,6 +1872,25 @@ paranamer ${paranamer.version} + +org.apache.arrow

[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...

2017-06-12 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18260 Why are we doing this? Isn't it better potentially for compression to store them separately? We can also easily remove the offset for fixed length arrays.

[GitHub] spark pull request #18273: [SPARK-21059][SQL] LikeSimplification can NPE on ...

2017-06-12 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/18273 [SPARK-21059][SQL] LikeSimplification can NPE on null pattern ## What changes were proposed in this pull request? This patch fixes a bug that can cause NullPointerException in LikeSimplification

[GitHub] spark issue #18257: [SPARK-21044][SPARK-21041][SQL] Add RemoveInvalidRange o...

2017-06-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18257 Sorry it doesn't make sense to do this. Range is used primarily for testing, and it doesn't make sense to have an optimizer rule that removes it. If there is a correctness issue in it, we should fix

[GitHub] spark issue #18258: [SPARK-20953][SQL] Add hash map metrics to aggregate

2017-06-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18258 That's a good idea. In that case, create a subtask on jira for this and another one for join?

[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18258 If there is no regression, I'd remove the flag.

[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18258 Thanks!

[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18258 Can you run it a few more times to tell? Right now it's a difference of 7% almost

[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18258 16.8 vs 15.8?

[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18258 Can you test the perf degradation?

[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18258 Why would the tracking have perf impact? It's just a simple counter increase, isn't it?

[GitHub] spark issue #18209: [SPARK-20992][Scheduler] Add support for Nomad as a sche...

2017-06-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18209 The next one to add is probably Kubernetes. Even the Spark on Kubernetes is going through this cycle of maintaining a separate project for it first.

[GitHub] spark issue #18228: [SPARK-21007][SQL]Add SQL function - RIGHT && LEFT

2017-06-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18228 Are these ANSI SQL functions? If it is just some esoteric MySQL function I don't think we should add them.

[GitHub] spark issue #18236: [SPARK-21015] Check field name is not null and empty in ...

2017-06-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18236 Why do we want this check? If the user passes in null value, it is ok if it is not found, isn't it?

[GitHub] spark pull request #18252: [SPARK-17914][SQL] Fix parsing of timestamp strin...

2017-06-09 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18252#discussion_r121246583 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala --- @@ -32,7 +32,7 @@ import

spark git commit: [SPARK-21042][SQL] Document Dataset.union is resolution by position

2017-06-09 Thread rxin
een a confusing point for a lot of users. ## How was this patch tested? N/A - doc only change. Author: Reynold Xin <r...@databricks.com> Closes #18256 from rxin/SPARK-21042. (cherry picked from commit b78e3849b20d0d09b7146efd7ce8f203ef67b890) Signed-off-by: Reynold Xin <r...@databricks.com>

[GitHub] spark issue #18256: [SPARK-21042][SQL] Document Dataset.union is resolution ...

2017-06-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18256 Merging in master/branch-2.2. Thanks.

spark git commit: [SPARK-21042][SQL] Document Dataset.union is resolution by position

2017-06-09 Thread rxin
ing point for a lot of users. ## How was this patch tested? N/A - doc only change. Author: Reynold Xin <r...@databricks.com> Closes #18256 from rxin/SPARK-21042. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b78e

[GitHub] spark pull request #18256: [SPARK-21042][SQL] Document Dataset.union is reso...

2017-06-09 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/18256 [SPARK-21042][SQL] Document Dataset.union is resolution by position ## What changes were proposed in this pull request? Document Dataset.union is resolution by position, not by name, since
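The distinction this pull request documents, union resolved by column position rather than by column name, can be illustrated with a small toy model. The sketch below is plain Python, not Spark code: the table representation and the helper names `union_by_position` and `union_by_name` are hypothetical and exist only for illustration.

```python
# Toy model of a table: (ordered column names, rows as tuples).
# NOT the Spark API; all names here are assumptions made for illustration.

def union_by_position(left, right):
    """Append right's rows unchanged; the result keeps left's column names."""
    l_cols, l_rows = left
    r_cols, r_rows = right
    if len(l_cols) != len(r_cols):
        raise ValueError("column counts differ")
    return (list(l_cols), list(l_rows) + list(r_rows))

def union_by_name(left, right):
    """Reorder right's columns to match left's names before appending."""
    l_cols, l_rows = left
    r_cols, r_rows = right
    if sorted(l_cols) != sorted(r_cols):
        raise ValueError("column names differ")
    # For each left column, find where that column sits in the right table.
    idx = [r_cols.index(c) for c in l_cols]
    reordered = [tuple(row[i] for i in idx) for row in r_rows]
    return (list(l_cols), list(l_rows) + reordered)

people = (["name", "age"], [("alice", 30)])
swapped = (["age", "name"], [(25, "bob")])  # same columns, different order

by_pos = union_by_position(people, swapped)
by_name = union_by_name(people, swapped)
```

In this model the positional union places `25` under `name` and `"bob"` under `age` without raising an error, which is the kind of silent mix-up that makes the behavior worth documenting; the name-based variant realigns the columns first.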

[GitHub] spark issue #18142: [SPARK-20918] [SQL] Use FunctionIdentifier as function i...

2017-06-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18142 Guys - in the future, please separate bug fixes from refactoring. Don't mix a bunch of cosmetic changes with actual bug fixes.

[GitHub] spark pull request #18113: [SPARK-20890][SQL] Added min and max typed aggreg...

2017-06-08 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18113#discussion_r121025561 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/typedaggregators.scala --- @@ -26,43 +26,64 @@ import

spark git commit: [SPARK-20854][TESTS] Removing duplicate test case

2017-06-06 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.2 421d8ecb8 -> 3f93d076b [SPARK-20854][TESTS] Removing duplicate test case ## What changes were proposed in this pull request? Removed a duplicate case in "SPARK-20854: select hint syntax with expressions" ## How was this patch tested?

spark git commit: [SPARK-20854][TESTS] Removing duplicate test case

2017-06-06 Thread rxin
Repository: spark Updated Branches: refs/heads/master c92949ac2 -> cb83ca143 [SPARK-20854][TESTS] Removing duplicate test case ## What changes were proposed in this pull request? Removed a duplicate case in "SPARK-20854: select hint syntax with expressions" ## How was this patch tested?

[GitHub] spark issue #18217: [SPARK-20854][TESTS] Removing duplicate test case

2017-06-06 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18217 Merging in master/branch-2.2.

[GitHub] spark issue #18221: [SPARK-20655][core] In-memory KVStore implementation.

2017-06-06 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18221 Question: why are these files written in Java?

[GitHub] spark issue #18207: [MINOR][DOC] Update deprecation notes on Python/Hadoop/S...

2017-06-05 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18207 OK great then we have officially deprecated it, haven't we?

[GitHub] spark issue #18207: [MINOR][DOC] Update deprecation notes on Python/Hadoop/S...

2017-06-05 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18207 I believe we still support Python 2.6, given Jenkins runs 2.6... There seems to be no point in removing that support this late in the release cycle.

[GitHub] spark issue #18202: [SPARK-20980] [SQL] Rename `wholeFile` to `multiLine` fo...

2017-06-05 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18202 Wouldn't this break compatibility?

[GitHub] spark issue #18189: [SPARK-20972][SQL] rename HintInfo.isBroadcastable to fo...

2017-06-04 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18189 But isn't it in a hint? If you are worried about user, I'd just change it to "broadcast".

[GitHub] spark issue #18159: [SPARK-20703][SQL] Associate metrics with data writes on...

2017-06-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18159 Hmm, any way to shorten the change? This change is a bit too big for metrics ...

[GitHub] spark pull request #18159: [SPARK-20703][SQL] Associate metrics with data wr...

2017-06-03 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18159#discussion_r119995109 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/commands.scala --- @@ -17,38 +17,97 @@ package

[GitHub] spark issue #18189: [SPARK-20972][SQL] rename HintInfo.isBroadcastable to fo...

2017-06-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18189 tbh the difference is so small that I don't think it is worth spending time here ... as pointed out, it is not forceBroadcast either.

[GitHub] spark pull request #18086: [SPARK-20854][SQL] Extend hint syntax to support ...

2017-05-30 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18086#discussion_r119271262 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -533,13 +533,16 @@ class AstBuilder(conf: SQLConf) extends

[GitHub] spark issue #16598: [SPARK-19236][Core] Added createOrReplaceGlobalTempView ...

2017-05-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16598 @gatorsmile this didn't run any tests!!!

spark git commit: [SPARK-8184][SQL] Add additional function description for weekofyear

2017-05-29 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.2 26640a269 -> 3b79e4cda [SPARK-8184][SQL] Add additional function description for weekofyear ## What changes were proposed in this pull request? Add additional function description for weekofyear. ## How was this patch tested?
