[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...

2018-12-06 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23207 ```var writer: ShuffleWriter[Any, Any] = null try { val manager = SparkEnv.get.shuffleManager writer = manager.getWriter[Any, Any]( dep.shuffleHandle

[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-05 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239308829 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala --- @@ -170,13 +172,23 @@ class SQLMetricsSuite extends

[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-05 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239308706 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLShuffleMetricsReporter.scala --- @@ -95,3 +96,59 @@ private[spark] object

[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-05 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239308197 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLShuffleMetricsReporter.scala --- @@ -95,3 +96,59 @@ private[spark] object

[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-05 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239308082 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -38,12 +38,18 @@ case class CollectLimitExec(limit: Int, child: SparkPlan

[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-05 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239308007 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -38,12 +38,18 @@ case class CollectLimitExec(limit: Int, child: SparkPlan

[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...

2018-12-05 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23207 @xuanyuanking can you separate the prs to rename read side metric and the write side change? --- - To unsubscribe, e-mail: reviews

[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-04 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r238845399 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala --- @@ -299,12 +312,25 @@ class SQLMetricsSuite extends

[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-04 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r238845029 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala --- @@ -170,13 +172,23 @@ class SQLMetricsSuite extends

[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-04 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r238843017 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala --- @@ -163,6 +171,8 @@ object SQLMetrics

[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-04 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r238842276 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala --- @@ -78,6 +78,7 @@ object SQLMetrics { private val

[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-04 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r238837000 --- Diff: core/src/main/scala/org/apache/spark/shuffle/metrics.scala --- @@ -50,3 +50,57 @@ private[spark] trait ShuffleWriteMetricsReporter { private

[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-04 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r238836448 --- Diff: core/src/main/scala/org/apache/spark/shuffle/metrics.scala --- @@ -50,3 +50,57 @@ private[spark] trait ShuffleWriteMetricsReporter { private

[GitHub] spark issue #23171: [SPARK-26205][SQL] Optimize In for bytes, shorts, ints

2018-12-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23171 Basically logically there are only two expressions: In which handles arbitrary expressions, and InSet which handles expressions with literals. Both could work: (1) we provide two separate expressions

[GitHub] spark issue #23171: [SPARK-26205][SQL] Optimize In for bytes, shorts, ints

2018-12-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23171 I thought InSwitch logically is the same as InSet, in which all the child expressions are literals? On Mon, Dec 03, 2018 at 8:38 PM, Wenchen Fan < notificati...@github.com >

[GitHub] spark issue #23171: [SPARK-26205][SQL] Optimize In for bytes, shorts, ints

2018-12-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23171 That probably means we should just optimize InSet to have the switch version though? Rather than do it in In? On Mon, Dec 03, 2018 at 8:20 PM, Wenchen Fan < notificati...@github.com >

[GitHub] spark issue #23171: [SPARK-26205][SQL] Optimize In for bytes, shorts, ints

2018-12-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23171 I'm not a big fan of making the physical implementation of an expression very different depending on the situation. Why can't we just make InSet efficient and convert these cases
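The In/InSet distinction in this thread can be shown with a toy Scala model. All names here (`Expr`, `Lit`, `Col`, `In`, `InSet`, `optimize`) are hypothetical stand-ins, not Spark's Catalyst classes. `In` evaluates every candidate expression for each row. `InSet` collects literal candidates into a hash set once. `optimize` rewrites one into the other only when every candidate is a literal. Spark's real optimizer rule has further conditions that this sketch does not model.

```scala
// Toy model (hypothetical; not Spark's Catalyst classes) contrasting an In
// predicate over arbitrary expressions with an InSet over precomputed literals.
object InVsInSetSketch {
  sealed trait Expr { def eval(row: Map[String, Int]): Int }
  final case class Lit(v: Int) extends Expr {
    def eval(row: Map[String, Int]): Int = v
  }
  final case class Col(name: String) extends Expr {
    def eval(row: Map[String, Int]): Int = row(name)
  }

  // In: candidates are arbitrary expressions, so each is evaluated per row.
  final case class In(value: Expr, list: Seq[Expr]) {
    def eval(row: Map[String, Int]): Boolean = {
      val v = value.eval(row)
      list.exists(_.eval(row) == v)
    }
  }

  // InSet: candidates are literals already gathered into a hash set.
  final case class InSet(value: Expr, hset: Set[Int]) {
    def eval(row: Map[String, Int]): Boolean = hset.contains(value.eval(row))
  }

  // Rewrite In to InSet only when every candidate is a literal.
  def optimize(pred: In): Either[In, InSet] =
    if (pred.list.forall(_.isInstanceOf[Lit]))
      Right(InSet(pred.value, pred.list.collect { case Lit(v) => v }.toSet))
    else Left(pred)
}
```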

[GitHub] spark issue #23192: [SPARK-26241][SQL] Add queryId to IncrementalExecution

2018-12-01 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23192 Thanks @HyukjinKwon. Fixed it.

[GitHub] spark pull request #23193: [SPARK-26226][SQL] Track optimization phase for s...

2018-11-30 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/23193 [SPARK-26226][SQL] Track optimization phase for streaming queries ## What changes were proposed in this pull request? In an earlier PR, we missed measuring the optimization phase time

[GitHub] spark issue #23193: [SPARK-26226][SQL] Track optimization phase for streamin...

2018-11-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23193 cc @gatorsmile @jose-torres

[GitHub] spark issue #23192: [SPARK-26221][SQL] Add queryId to IncrementalExecution

2018-11-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23192 cc @zsxwing @jose-torres

[GitHub] spark pull request #23192: [SPARK-26221][SQL] Add queryId to IncrementalExec...

2018-11-30 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/23192 [SPARK-26221][SQL] Add queryId to IncrementalExecution ## What changes were proposed in this pull request? This is a small change for better debugging: to pass query uuid in IncrementalExecution

[GitHub] spark pull request #23183: [SPARK-26226][SQL] Update query tracker to report...

2018-11-30 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23183#discussion_r238019351 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/QueryPlanningTracker.scala --- @@ -51,6 +58,18 @@ object QueryPlanningTracker

[GitHub] spark issue #23183: [SPARK-26226][SQL] Update query tracker to report timeli...

2018-11-29 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23183 cc @hvanhovell @gatorsmile

[GitHub] spark pull request #23183: [SPARK-26226][SQL] Update query tracker to report...

2018-11-29 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/23183 [SPARK-26226][SQL] Update query tracker to report timeline for phases ## What changes were proposed in this pull request? This patch changes the query plan tracker added earlier to report phase
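A minimal sketch of per-phase timeline tracking follows. It assumes a hypothetical `PhaseTracker` with a `measurePhase(name)(body)` method and an injectable clock; none of these names are taken from the patch itself. The point is that each phase records both its start and end time, so a timeline can be reconstructed rather than only a total duration.

```scala
import scala.collection.mutable

// Hypothetical per-query tracker (not Spark's QueryPlanningTracker API).
object PlanningTrackerSketch {
  final case class PhaseSummary(startMs: Long, endMs: Long) {
    def durationMs: Long = endMs - startMs
  }

  // The clock is injectable so tests can use a deterministic time source.
  final class PhaseTracker(clock: () => Long = () => System.currentTimeMillis()) {
    // Insertion-ordered, so phases come back in the order they ran.
    private val phases = mutable.LinkedHashMap.empty[String, PhaseSummary]

    // Runs `body` and records when the phase started and ended, even if it
    // throws. A repeated phase name overwrites the earlier entry here.
    def measurePhase[T](name: String)(body: => T): T = {
      val start = clock()
      try body
      finally phases(name) = PhaseSummary(start, clock())
    }

    def summary: Map[String, PhaseSummary] = phases.toMap
  }
}
```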

[GitHub] spark issue #23175: [SPARK-26142]followup: Move sql shuffle read metrics rel...

2018-11-29 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23175 LGTM - merged in master.

[GitHub] spark issue #23178: [SPARK-26216][SQL] Do not use case class as public API (...

2018-11-29 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23178 Good idea to have it sealed! > On Nov 29, 2018, at 7:04 AM, Sean Owen wrote: > > @srowen commented on this pull request. > > In sql/core/src/main/scala/org/a

[GitHub] spark issue #23128: [SPARK-26142][SQL] Implement shuffle read metrics in SQL

2018-11-28 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23128 @xuanyuanking @cloud-fan when you think about where to put each code block, make sure you also think about future evolution of the codebase. In general put relevant things closer to each other (e.g

[GitHub] spark pull request #23128: [SPARK-26142][SQL] Implement shuffle read metrics...

2018-11-28 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23128#discussion_r237129249 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala --- @@ -82,6 +82,14 @@ object SQLMetrics { private val

[GitHub] spark pull request #23128: [SPARK-26142][SQL] Implement shuffle read metrics...

2018-11-28 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23128#discussion_r237128247 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLShuffleMetricsReporter.scala --- @@ -0,0 +1,67 @@ +/* + * Licensed

[GitHub] spark pull request #23128: [SPARK-26142][SQL] Implement shuffle read metrics...

2018-11-28 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23128#discussion_r237128189 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala --- @@ -194,4 +202,16 @@ object SQLMetrics

[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...

2018-11-27 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r236845375 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala --- @@ -38,7 +38,7 @@ import org.apache.spark.sql.execution.datasources.jdbc

[GitHub] spark issue #23106: [SPARK-26141] Enable custom metrics implementation in sh...

2018-11-26 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23106 Merging in master. Thanks @squito.

[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...

2018-11-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r236492408 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/Table.java --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #23106: [SPARK-26141] Enable custom metrics implementatio...

2018-11-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23106#discussion_r236432889 --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java --- @@ -242,8 +243,13 @@ private void writeSortedFile(boolean isLastFile

[GitHub] spark issue #23147: [SPARK-26140] followup: rename ShuffleMetricsReporter

2018-11-26 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23147 cc @gatorsmile @xuanyuanking @cloud-fan I misunderstood your comment. Finally saw it today when I was looking at my other PR

[GitHub] spark pull request #23147: [SPARK-26140] followup: rename ShuffleMetricsRepo...

2018-11-26 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/23147 [SPARK-26140] followup: rename ShuffleMetricsReporter ## What changes were proposed in this pull request? In https://github.com/apache/spark/pull/23105, due to working on two parallel PRs at once

[GitHub] spark pull request #23135: [SPARK-26168][SQL] Update the code comments in Ex...

2018-11-25 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23135#discussion_r236089467 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala --- @@ -575,6 +575,19 @@ case class Range

[GitHub] spark pull request #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll

2018-11-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23131#discussion_r236052557 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1852,6 +1852,19 @@ class Dataset[T] private[sql]( CombineUnions(Union

[GitHub] spark issue #23129: [MINOR] Update all DOI links to preferred resolver

2018-11-24 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23129 Jenkins, test this please.

[GitHub] spark pull request #23128: [SPARK-26142][SQL] Support passing shuffle metric...

2018-11-23 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23128#discussion_r236025838 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLShuffleMetricsReporter.scala --- @@ -0,0 +1,60 @@ +/* + * Licensed

[GitHub] spark pull request #23128: [SPARK-26142][SQL] Support passing shuffle metric...

2018-11-23 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23128#discussion_r236025817 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLShuffleMetricsReporter.scala --- @@ -0,0 +1,60 @@ +/* + * Licensed

[GitHub] spark pull request #23105: [SPARK-26140] Enable custom metrics implementatio...

2018-11-23 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23105#discussion_r236020103 --- Diff: core/src/main/scala/org/apache/spark/shuffle/metrics.scala --- @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #23105: [SPARK-26140] Enable custom metrics implementatio...

2018-11-23 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23105#discussion_r235950427 --- Diff: core/src/main/scala/org/apache/spark/shuffle/ShuffleManager.scala --- @@ -48,7 +48,8 @@ private[spark] trait ShuffleManager { handle

[GitHub] spark issue #23110: [SPARK-26129] Followup - edge behavior for QueryPlanning...

2018-11-21 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23110 cc @gatorsmile

[GitHub] spark pull request #23110: [SPARK-26129] Followup - edge behavior for QueryP...

2018-11-21 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/23110 [SPARK-26129] Followup - edge behavior for QueryPlanningTracker.topRulesByTime ## What changes were proposed in this pull request? This is an addendum patch for SPARK-26129 that defines the edge

[GitHub] spark pull request #23106: [SPARK-26141] Enable custom shuffle metrics imple...

2018-11-21 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/23106 [SPARK-26141] Enable custom shuffle metrics implementation in shuffle write ## What changes were proposed in this pull request? This is the write side counterpart to https://github.com/apache

[GitHub] spark issue #23105: [SPARK-26140] Enable custom metrics implementation in sh...

2018-11-21 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23105 cc @jiangxb1987 @squito

[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

2018-11-21 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23096 Merging this. Feel free to leave more comments. I'm hoping we can wire this into the UI eventually.

[GitHub] spark pull request #23105: [SPARK-26140] Enable passing in a custom shuffle ...

2018-11-21 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23105#discussion_r235420647 --- Diff: core/src/main/scala/org/apache/spark/executor/ShuffleReadMetrics.scala --- @@ -122,34 +123,3 @@ class ShuffleReadMetrics private[spark] () extends

[GitHub] spark pull request #23105: [SPARK-26140] Pull TempShuffleReadMetrics creatio...

2018-11-21 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/23105 [SPARK-26140] Pull TempShuffleReadMetrics creation out of shuffle reader ## What changes were proposed in this pull request? This patch defines an internal Spark interface for reporting shuffle
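Reporting through an interface, rather than having the shuffle reader create its own metrics object, can be sketched as follows. The trait `ReadMetricsReporter`, the classes `TaskReadMetrics` and `ForwardingReporter`, and the function `readBlocks` are hypothetical and much smaller than Spark's internal interfaces. The sketch only shows that a SQL-side reporter can update its own counters and forward each update to task-level metrics, while the reader depends on nothing but the trait.

```scala
// Hypothetical, simplified model of the reporter pattern (not Spark's API).
object ShuffleReporterSketch {
  trait ReadMetricsReporter {
    def incRecordsRead(n: Long): Unit
    def incBytesRead(n: Long): Unit
  }

  // Stand-in for per-task metrics.
  final class TaskReadMetrics extends ReadMetricsReporter {
    var recordsRead = 0L
    var bytesRead = 0L
    def incRecordsRead(n: Long): Unit = recordsRead += n
    def incBytesRead(n: Long): Unit = bytesRead += n
  }

  // Stand-in for a SQL-side reporter: updates its own counters, then
  // forwards every update to the wrapped task-level reporter.
  final class ForwardingReporter(underlying: ReadMetricsReporter) extends ReadMetricsReporter {
    var sqlRecords = 0L
    var sqlBytes = 0L
    def incRecordsRead(n: Long): Unit = { sqlRecords += n; underlying.incRecordsRead(n) }
    def incBytesRead(n: Long): Unit = { sqlBytes += n; underlying.incBytesRead(n) }
  }

  // A toy "reader" that only knows about the interface; each block is
  // given as (records, bytes).
  def readBlocks(blocks: Seq[(Long, Long)], reporter: ReadMetricsReporter): Unit =
    blocks.foreach { case (records, bytes) =>
      reporter.incRecordsRead(records)
      reporter.incBytesRead(bytes)
    }
}
```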

[GitHub] spark pull request #23096: [SPARK-26129][SQL] Instrumentation for per-query ...

2018-11-21 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23096#discussion_r235309483 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala --- @@ -648,7 +648,11 @@ class SparkSession private( * @since 2.0.0

[GitHub] spark issue #23100: [WIP][SPARK-26133][ML] Remove deprecated OneHotEncoder a...

2018-11-21 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23100 Change of this type can really piss some people off. Was there consensus on this?

[GitHub] spark pull request #23096: [SPARK-26129][SQL] Instrumentation for per-query ...

2018-11-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23096#discussion_r235182105 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala --- @@ -88,15 +101,20 @@ abstract class RuleExecutor[TreeType

[GitHub] spark pull request #23096: [SPARK-26129][SQL] Instrumentation for per-query ...

2018-11-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23096#discussion_r235162047 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala --- @@ -88,15 +92,18 @@ abstract class RuleExecutor[TreeType

[GitHub] spark pull request #23096: [SPARK-26129][SQL] Instrumentation for per-query ...

2018-11-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23096#discussion_r235161825 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -696,7 +701,7 @@ class Analyzer( s

[GitHub] spark pull request #23096: [SPARK-26129][SQL] Instrumentation for per-query ...

2018-11-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23096#discussion_r235161336 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/QueryPlanningTracker.scala --- @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

2018-11-20 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23096 cc @hvanhovell @gatorsmile This is different from the existing metrics for rules as it is query specific. We might want to replace that one with this in the future

[GitHub] spark pull request #23096: [SPARK-26129][SQL] Instrumentation for query plan...

2018-11-20 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/23096 [SPARK-26129][SQL] Instrumentation for query planning time ## What changes were proposed in this pull request? We currently don't have good visibility into query planning time (analysis vs

[GitHub] spark pull request #23054: [SPARK-26085][SQL] Key attribute of non-struct ty...

2018-11-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23054#discussion_r234569150 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -1594,6 +1594,15 @@ object SQLConf { "WHERE, which

[GitHub] spark issue #23054: [SPARK-26085][SQL] Key attribute of primitive type under...

2018-11-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23054 BTW what do the non-primitive types look like? Do they get flattened, or is there a struct?

[GitHub] spark issue #23054: [SPARK-26085][SQL] Key attribute of primitive type under...

2018-11-17 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23054 We should add a “legacy” flag in case somebody’s workload gets broken by this. We can remove the legacy flag in a future release

[GitHub] spark issue #18784: [SPARK-21559][Mesos] remove mesos fine-grained mode

2018-11-16 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18784 Go for it. On Fri, Nov 16, 2018 at 6:08 AM Stavros Kontopoulos < notificati...@github.com> wrote: > @imaxxs <https://github.com/imaxxs> @rxin <https://

[GitHub] spark issue #23021: [SPARK-26032][PYTHON] Break large sql/tests.py files int...

2018-11-13 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23021 One thing - I would put “pandas” right after test_ so you get the natural logical grouping with sorting by file name. On Tue, Nov 13, 2018 at 4:58 PM Hyukjin Kwon wrote

[GitHub] spark issue #23021: [SPARK-26032][PYTHON] Break large sql/tests.py files int...

2018-11-13 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23021 Great initiative! I'd break the pandas udf one into smaller pieces too, as you suggested. We should also investigate why the runtime didn't improve

[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

2018-11-07 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22957 i didn't look at your new code, but is your old code safe? e.g. a project that depends on the new alias.

[GitHub] spark issue #15899: [SPARK-18466] added withFilter method to RDD

2018-11-06 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15899 Thanks for the example. I didn't even know that was possible in earlier versions. I just looked it up: looks like Scala 2.11 rewrites for comprehensions into map, filter, and flatMap
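The desugaring under discussion can be sketched with a hypothetical `Box` container that defines only `map`, `flatMap`, and `withFilter`. That is enough for a `for` comprehension with an `if` guard to compile, because in current Scala 2 versions the guard is rewritten into a `withFilter` call. Unlike the lazy `withFilter` in the standard collections, this one filters eagerly for simplicity.

```scala
// Hypothetical container (not Spark's RDD) showing which methods a for
// comprehension with a guard is desugared into.
object ForComprehensionSketch {
  final case class Box[A](items: List[A]) {
    def map[B](f: A => B): Box[B] = Box(items.map(f))
    def flatMap[B](f: A => Box[B]): Box[B] = Box(items.flatMap(a => f(a).items))
    // `if` guards in a for comprehension become calls to withFilter.
    def withFilter(p: A => Boolean): Box[A] = Box(items.filter(p))
  }

  val xs = Box(List(1, 2, 3, 4))
  val ys = Box(List(10, 20))

  // Written with a for comprehension...
  val viaFor: Box[Int] = for (x <- xs if x % 2 == 0; y <- ys) yield x * y

  // ...and the equivalent calls the compiler generates for it.
  val desugared: Box[Int] =
    xs.withFilter(x => x % 2 == 0).flatMap(x => ys.map(y => x * y))
}
```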

[GitHub] spark pull request #15899: [SPARK-18466] added withFilter method to RDD

2018-11-06 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/15899#discussion_r231390266 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -387,6 +387,14 @@ abstract class RDD[T: ClassTag]( preservesPartitioning = true

[GitHub] spark issue #22889: [SPARK-25882][SQL] Added a function to join two datasets...

2018-11-05 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22889 Yea good idea (prefer Array over Seq for short lists)

[GitHub] spark issue #22921: [SPARK-25908][CORE][SQL] Remove old deprecated items in ...

2018-11-01 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22921 seems good to me; might want to leave this open for a few days so more people can take a look

[GitHub] spark pull request #22921: [SPARK-25908][CORE][SQL] Remove old deprecated it...

2018-11-01 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22921#discussion_r230135473 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -62,17 +62,6 @@ class SQLContext private[sql](val sparkSession: SparkSession

[GitHub] spark pull request #22921: [SPARK-25908][CORE][SQL] Remove old deprecated it...

2018-11-01 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22921#discussion_r230132632 --- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala --- @@ -639,20 +639,6 @@ private[spark] object SparkConf extends Logging

[GitHub] spark issue #22830: [SPARK-25838][ML] Remove formatVersion from Saveable

2018-10-29 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22830 Perhaps @jkbradley and @mengxr can comment on it. If the trait is inheritable, then protected still means it is part of the API contract
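The point about `protected` can be illustrated with a toy trait loosely mirroring the `Saveable` discussion. It is simplified and is not the actual MLlib definition. A user class outside the library that overrides the protected member would stop compiling if that member were removed, because its `override` would no longer match anything.

```scala
// Hypothetical, simplified names: a protected member of an inheritable
// public trait is visible to, and overridable by, user subclasses.
object ProtectedContractSketch {
  trait Saveable {
    protected def formatVersion: String
    def describe: String = s"format $formatVersion"
  }

  // A subclass written outside the library. Removing formatVersion from
  // Saveable would turn this override into a compile error.
  final class UserModel extends Saveable {
    override protected def formatVersion: String = "1.0"
  }
}
```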

[GitHub] spark issue #22830: [SPARK-25838][ML] Remove formatVersion from Saveable

2018-10-28 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22830 Who introduced this? We should ask the person that introduced it whether it can be removed.

[GitHub] spark pull request #22870: [SPARK-25862][SQL] Remove rangeBetween APIs intro...

2018-10-28 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/22870 [SPARK-25862][SQL] Remove rangeBetween APIs introduced in SPARK-21608 ## What changes were proposed in this pull request? This patch removes the rangeBetween functions introduced in SPARK-21608

[GitHub] spark pull request #22853: [SPARK-25845][SQL] Fix MatchError for calendar in...

2018-10-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22853#discussion_r228608016 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFramesSuite.scala --- @@ -267,6 +267,25 @@ class DataFrameWindowFramesSuite extends

[GitHub] spark pull request #22815: [SPARK-25821][SQL] Remove SQLContext methods depr...

2018-10-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22815#discussion_r228594291 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -54,6 +54,7 @@ import org.apache.spark.sql.util.ExecutionListenerManager

[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...

2018-10-26 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21588 Does this upgrade Hive for execution or also for metastore? Spark supports virtually all Hive metastore versions out there, and a lot of deployments do run different versions of Spark against the same

[GitHub] spark pull request #22841: [SPARK-25842][SQL] Deprecate rangeBetween APIs in...

2018-10-25 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22841#discussion_r228376622 --- Diff: python/pyspark/sql/window.py --- @@ -239,34 +212,27 @@ def rangeBetween(self, start, end): and "5" means the five off after t

[GitHub] spark pull request #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json...

2018-10-25 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22775#discussion_r228372331 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -770,8 +776,17 @@ case class SchemaOfJson

[GitHub] spark issue #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json's inpu...

2018-10-25 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22775 I agree it should be a literal value.

[GitHub] spark pull request #22841: [SPARK-25842][SQL] Deprecate rangeBetween APIs in...

2018-10-25 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/22841 [SPARK-25842][SQL] Deprecate rangeBetween APIs introduced in SPARK-21608 ## What changes were proposed in this pull request? See the detailed information at https://issues.apache.org/jira/browse

[GitHub] spark issue #22821: [SPARK-25832][SQL] remove newly added map related functi...

2018-10-25 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22821 We seem to be splitting hairs here. Why are we providing tech preview to advanced users? Are you saying they construct expressions directly using internal APIs? I doubt that’s tech preview

[GitHub] spark issue #22815: [SPARK-25821][SQL] Remove SQLContext methods deprecated ...

2018-10-24 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22815 LGTM. On a related note, we should probably deprecate the entire SQLContext.

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-23 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22144 @markhamstra how did you arrive at that conclusion? I said "it’s not a new regression and also somewhat eso

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-23 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22144 It’s certainly not a blocker since it’s not a new regression and also somewhat esoteric. Would be good to fix though. On Tue, Oct 23, 2018 at 8:20 AM Wenchen Fan wrote

[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-12 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21157 But that would break both ipython notebooks and repl right? Pretty significant breaking change.

[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-10-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22010 If this is not yet in 2.4 it shouldn’t be merged now. On Wed, Oct 10, 2018 at 10:57 AM Holden Karau wrote: > Open question: is this suitable for branch-2.4 since it preda

[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-09-28 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21157 @superbobry which blog were you referring to?

[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-09-27 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21157 so this change would introduce a pretty big regression?

[GitHub] spark pull request #22543: [SPARK-23715][SQL][DOC] improve document for from...

2018-09-25 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22543#discussion_r220410457 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala --- @@ -1018,9 +1018,20 @@ case class TimeAdd(start

[GitHub] spark issue #22521: [SPARK-24519][CORE] Compute SHUFFLE_MIN_NUM_PARTS_TO_HIG...

2018-09-25 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22521 seems like our tests are really flaky

[GitHub] spark issue #22521: [SPARK-24519] Compute SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_CO...

2018-09-24 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22521 yup; just did

[GitHub] spark pull request #22541: [SPARK-23907][SQL] Revert regr_* functions entire...

2018-09-24 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/22541 [SPARK-23907][SQL] Revert regr_* functions entirely ## What changes were proposed in this pull request? This patch reverts entirely all the regr_* functions added in SPARK-23907. These were added

[GitHub] spark issue #22521: [SPARK-24519] Compute SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_CO...

2018-09-23 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22521 Jenkins, retest this please.

[GitHub] spark pull request #22521: [SPARK-24519] Compute SHUFFLE_MIN_NUM_PARTS_TO_HI...

2018-09-21 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/22521 [SPARK-24519] Compute SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_COMPRESS only once - WIP ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix

[GitHub] spark pull request #21527: [SPARK-24519] Make the threshold for highly compr...

2018-09-21 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21527#discussion_r219559889 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -50,7 +50,9 @@ private[spark] sealed trait MapStatus { private[spark

[GitHub] spark pull request #22515: [SPARK-19724][SQL] allowCreatingManagedTableUsing...

2018-09-21 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/22515 [SPARK-19724][SQL] allowCreatingManagedTableUsingNonemptyLocation should have legacy prefix One more legacy config to go ...

[GitHub] spark pull request #22456: [SPARK-19355][SQL] Fix variable names numberOfOut...

2018-09-20 Thread rxin
Github user rxin closed the pull request at: https://github.com/apache/spark/pull/22456

[GitHub] spark issue #22509: [SPARK-25384][SQL] Clarify fromJsonForceNullableSchema w...

2018-09-20 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22509 cc @dongjoon-hyun @MaxGekk we still need this PR, don't we?
