[GitHub] spark issue #23054: [SPARK-26085][SQL] Key attribute of primitive type under...

2018-11-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23054 BTW what does the non-primitive types look like? Do they get flattened, or is there a struct? --- - To unsubscribe, e-mail

[GitHub] spark issue #23054: [SPARK-26085][SQL] Key attribute of primitive type under...

2018-11-17 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23054 We should add a “legacy” flag in case somebody’s workload gets broken by this. We can remove the legacy flag in a future release

[GitHub] spark issue #18784: [SPARK-21559][Mesos] remove mesos fine-grained mode

2018-11-16 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18784 Go for it. On Fri, Nov 16, 2018 at 6:08 AM Stavros Kontopoulos < notificati...@github.com> wrote: > @imaxxs <https://github.com/imaxxs> @rxin <https://

[GitHub] spark issue #23021: [SPARK-26032][PYTHON] Break large sql/tests.py files int...

2018-11-13 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23021 One thing - I would put “pandas” right after test_ so you get the natural logical grouping with sorting by file name. On Tue, Nov 13, 2018 at 4:58 PM Hyukjin Kwon wrote

[GitHub] spark issue #23021: [SPARK-26032][PYTHON] Break large sql/tests.py files int...

2018-11-13 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23021 Great initiative! I'd break the pandas udf one into smaller pieces too, as you suggested. We should also investigate why the runtime didn't improve

[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

2018-11-07 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22957 i didn't look at your new code, but is your old code safe? e.g. a project that depends on the new alias.

[GitHub] spark issue #15899: [SPARK-18466] added withFilter method to RDD

2018-11-06 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15899 Thanks for the example. I didn't even know that was possible in earlier versions. I just looked it up: looks like Scala 2.11 rewrites for comprehensions into map, filter, and flatMap
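To illustrate the rewriting mentioned in the comment above, here is a minimal sketch of how a for comprehension with a guard maps onto method calls. `Box` is a hypothetical container invented for this example; it is not Spark's RDD, and the sketch uses `withFilter` (the method the pull request proposes adding) for the guard.

```scala
// Hypothetical minimal container used only for illustration; not Spark's RDD.
final case class Box[A](items: List[A]) {
  def map[B](f: A => B): Box[B] = Box(items.map(f))
  def flatMap[B](f: A => Box[B]): Box[B] = Box(items.flatMap(a => f(a).items))
  // The guard in a for comprehension is translated into a withFilter call.
  def withFilter(p: A => Boolean): Box[A] = Box(items.filter(p))
}

object ForDesugarDemo {
  val xs: Box[Int] = Box(List(1, 2, 3, 4))
  val ys: Box[Int] = Box(List(10, 20))

  // For comprehension with a guard on the first generator.
  val viaFor: Box[Int] =
    for {
      x <- xs if x % 2 == 0
      y <- ys
    } yield x + y

  // The equivalent chain of explicit method calls.
  val viaCalls: Box[Int] =
    xs.withFilter(x => x % 2 == 0).flatMap(x => ys.map(y => x + y))
}
```

Both values hold the same elements, which is why a type only needs these three methods to be usable in a guarded for comprehension.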

[GitHub] spark pull request #15899: [SPARK-18466] added withFilter method to RDD

2018-11-06 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/15899#discussion_r231390266 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -387,6 +387,14 @@ abstract class RDD[T: ClassTag]( preservesPartitioning = true

[GitHub] spark issue #22889: [SPARK-25882][SQL] Added a function to join two datasets...

2018-11-05 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22889 Yea good idea (prefer Array over Seq for short lists)

[GitHub] spark issue #22921: [SPARK-25908][CORE][SQL] Remove old deprecated items in ...

2018-11-01 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22921 seems good to me; might want to leave this open for a few days so more people can take a look

[GitHub] spark pull request #22921: [SPARK-25908][CORE][SQL] Remove old deprecated it...

2018-11-01 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22921#discussion_r230135473 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -62,17 +62,6 @@ class SQLContext private[sql](val sparkSession: SparkSession

[GitHub] spark pull request #22921: [SPARK-25908][CORE][SQL] Remove old deprecated it...

2018-11-01 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22921#discussion_r230132632 --- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala --- @@ -639,20 +639,6 @@ private[spark] object SparkConf extends Logging

[GitHub] spark issue #22830: [SPARK-25838][ML] Remove formatVersion from Saveable

2018-10-29 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22830 Perhaps @jkbradley and @mengxr can comment on it. If the trait is inheritable, then protected still means it is part of the API contract

[GitHub] spark issue #22830: [SPARK-25838][ML] Remove formatVersion from Saveable

2018-10-28 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22830 Who introduced this? We should ask the person that introduced it whether it can be removed.

[GitHub] spark pull request #22870: [SPARK-25862][SQL] Remove rangeBetween APIs intro...

2018-10-28 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/22870 [SPARK-25862][SQL] Remove rangeBetween APIs introduced in SPARK-21608 ## What changes were proposed in this pull request? This patch removes the rangeBetween functions introduced in SPARK-21608

[GitHub] spark pull request #22853: [SPARK-25845][SQL] Fix MatchError for calendar in...

2018-10-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22853#discussion_r228608016 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFramesSuite.scala --- @@ -267,6 +267,25 @@ class DataFrameWindowFramesSuite extends

[GitHub] spark pull request #22815: [SPARK-25821][SQL] Remove SQLContext methods depr...

2018-10-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22815#discussion_r228594291 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -54,6 +54,7 @@ import org.apache.spark.sql.util.ExecutionListenerManager

[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...

2018-10-26 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21588 Does this upgrade Hive for execution or also for metastore? Spark supports virtually all Hive metastore versions out there, and a lot of deployments do run different versions of Spark against the same

[GitHub] spark pull request #22841: [SPARK-25842][SQL] Deprecate rangeBetween APIs in...

2018-10-25 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22841#discussion_r228376622 --- Diff: python/pyspark/sql/window.py --- @@ -239,34 +212,27 @@ def rangeBetween(self, start, end): and "5" means the five off after t

[GitHub] spark pull request #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json...

2018-10-25 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22775#discussion_r228372331 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -770,8 +776,17 @@ case class SchemaOfJson

[GitHub] spark issue #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json's inpu...

2018-10-25 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22775 I agree it should be a literal value.

[GitHub] spark pull request #22841: [SPARK-25842][SQL] Deprecate rangeBetween APIs in...

2018-10-25 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/22841 [SPARK-25842][SQL] Deprecate rangeBetween APIs introduced in SPARK-21608 ## What changes were proposed in this pull request? See the detailed information at https://issues.apache.org/jira/browse

spark-website git commit: Use Heilmeier Catechism for SPIP template.

2018-10-25 Thread rxin
Repository: spark-website Updated Branches: refs/heads/asf-site e4b87718d -> 005a2a0d1 Use Heilmeier Catechism for SPIP template. Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/005a2a0d Tree:

[GitHub] spark issue #22821: [SPARK-25832][SQL] remove newly added map related functi...

2018-10-25 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22821 We seem to be splitting hairs here. Why are we providing tech preview to advanced users? Are you saying they construct expressions directly using internal APIs? I doubt that’s tech preview

[GitHub] spark issue #22815: [SPARK-25821][SQL] Remove SQLContext methods deprecated ...

2018-10-24 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22815 LGTM. On a related note, we should probably deprecate the entire SQLContext.

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-23 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22144 @markhamstra how did you arrive at that conclusion? I said "it’s not a new regression and also somewhat eso

[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-23 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22144 It’s certainly not a blocker since it’s not a new regression and also somewhat esoteric. Would be good to fix though. On Tue, Oct 23, 2018 at 8:20 AM Wenchen Fan wrote

[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-12 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21157 But that would break both ipython notebooks and repl right? Pretty significant breaking change.

[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-10-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22010 If this is not yet in 2.4 it shouldn’t be merged now. On Wed, Oct 10, 2018 at 10:57 AM Holden Karau wrote: > Open question: is this suitable for branch-2.4 since it preda

[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-09-28 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21157 @superbobry which blog were you referring to?

[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-09-27 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21157 so this change would introduce a pretty big regression?

[GitHub] spark pull request #22543: [SPARK-23715][SQL][DOC] improve document for from...

2018-09-25 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22543#discussion_r220410457 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala --- @@ -1018,9 +1018,20 @@ case class TimeAdd(start

[GitHub] spark issue #22521: [SPARK-24519][CORE] Compute SHUFFLE_MIN_NUM_PARTS_TO_HIG...

2018-09-25 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22521 seems like our tests are really flaky

[GitHub] spark issue #22521: [SPARK-24519] Compute SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_CO...

2018-09-24 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22521 yup; just did

[GitHub] spark pull request #22541: [SPARK-23907][SQL] Revert regr_* functions entire...

2018-09-24 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/22541 [SPARK-23907][SQL] Revert regr_* functions entirely ## What changes were proposed in this pull request? This patch reverts entirely all the regr_* functions added in SPARK-23907. These were added

[GitHub] spark issue #22521: [SPARK-24519] Compute SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_CO...

2018-09-23 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22521 Jenkins, retest this please.

[GitHub] spark pull request #22521: [SPARK-24519] Compute SHUFFLE_MIN_NUM_PARTS_TO_HI...

2018-09-21 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/22521 [SPARK-24519] Compute SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_COMPRESS only once - WIP ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix

[GitHub] spark pull request #21527: [SPARK-24519] Make the threshold for highly compr...

2018-09-21 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21527#discussion_r219559889 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -50,7 +50,9 @@ private[spark] sealed trait MapStatus { private[spark

[GitHub] spark pull request #22515: [SPARK-19724][SQL] allowCreatingManagedTableUsing...

2018-09-21 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/22515 [SPARK-19724][SQL] allowCreatingManagedTableUsingNonemptyLocation should have legacy prefix One more legacy config to go ... You can merge this pull request into a Git repository by running

[GitHub] spark pull request #22456: [SPARK-19355][SQL] Fix variable names numberOfOut...

2018-09-20 Thread rxin
Github user rxin closed the pull request at: https://github.com/apache/spark/pull/22456

[GitHub] spark issue #22509: [SPARK-25384][SQL] Clarify fromJsonForceNullableSchema w...

2018-09-20 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22509 cc @dongjoon-hyun @MaxGekk we still need this pr don't we?

[GitHub] spark pull request #22509: [SPARK-25384][SQL] Clarify fromJsonForceNullableS...

2018-09-20 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/22509 [SPARK-25384][SQL] Clarify fromJsonForceNullableSchema will be removed in Spark 3.0 See above. This should go into the 2.4 release. You can merge this pull request into a Git repository by running

[GitHub] spark issue #22508: [SPARK-23549][SQL] Rename config spark.sql.legacy.compar...

2018-09-20 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22508 cc @gatorsmile who merged the original pr.

[GitHub] spark pull request #22508: [SPARK-23549][SQL] Rename config spark.sql.legacy...

2018-09-20 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/22508 [SPARK-23549][SQL] Rename config spark.sql.legacy.compareDateTimestampInTimestamp ## What changes were proposed in this pull request? See title. ## How was this patch tested? Make

[GitHub] spark issue #22505: Revert "[SPARK-23715][SQL] the input of to/from_utc_time...

2018-09-20 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22505 lgtm - let's make sure tests pass

[GitHub] spark pull request #22442: [SPARK-25447][SQL] Support JSON options by schema...

2018-09-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22442#discussion_r219297029 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -3611,6 +3611,20 @@ object functions { */ def schema_of_json(e

[GitHub] spark pull request #22471: [SPARK-25470][SQL][Performance] Concat.eval shoul...

2018-09-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22471#discussion_r219023998 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -2274,33 +2274,41 @@ case class Concat

[GitHub] spark issue #22476: [SPARK-24157][SS][FOLLOWUP] Rename to spark.sql.streamin...

2018-09-19 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22476 Merged in master/2.4.

spark git commit: [SPARK-24157][SS][FOLLOWUP] Rename to spark.sql.streaming.noDataMicroBatches.enabled

2018-09-19 Thread rxin
How was this patch tested? Made sure no other references to this config are in the code base: ``` > git grep "noDataMicro" sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: buildConf("spark.sql.streaming.noDataMicroBatches.enabled") ``` Closes #2

spark git commit: [SPARK-24157][SS][FOLLOWUP] Rename to spark.sql.streaming.noDataMicroBatches.enabled

2018-09-19 Thread rxin
How was this patch tested? Made sure no other references to this config are in the code base: ``` > git grep "noDataMicro" sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: buildConf("spark.sql.streaming.noDataMicroBatches.enabled") ``` Closes #2

[GitHub] spark issue #22475: [SPARK-4502][SQL] Rename to spark.sql.optimizer.nestedSc...

2018-09-19 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22475 jenkins, retest this again

[GitHub] spark issue #22475: [SPARK-4502][SQL] Rename to spark.sql.optimizer.nestedSc...

2018-09-19 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22475 done

[GitHub] spark issue #22476: [SPARK-24157][SS][FOLLOWUP] Rename to spark.sql.streamin...

2018-09-19 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22476 done

[GitHub] spark issue #21169: [SPARK-23715][SQL] the input of to/from_utc_timestamp ca...

2018-09-19 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21169 i'm actually not sure if we should do this, given impala treats timestamp as timestamp without timezone, whereas spark treats it as a utc timestamp (with timezone). these functions are super confusing
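The two timestamp interpretations contrasted in the comment above can be sketched with `java.time`, independent of Spark or Impala. This is only an illustration of the general distinction: `Instant` stands in for a timestamp that denotes a point on the UTC timeline, and `LocalDateTime` stands in for a timestamp without time zone. The chosen date and the `America/Los_Angeles` zone are arbitrary assumptions for the example.

```scala
import java.time.{Instant, LocalDateTime, ZoneId, ZoneOffset}

object TimestampSketch {
  private val la: ZoneId = ZoneId.of("America/Los_Angeles")

  // One fixed point on the global timeline ("UTC timestamp" semantics).
  val instant: Instant = Instant.parse("2018-09-19T12:00:00Z")

  // The same instant rendered as wall-clock time in two zones.
  val wallClockUtc: LocalDateTime = LocalDateTime.ofInstant(instant, ZoneOffset.UTC)
  val wallClockLa: LocalDateTime = LocalDateTime.ofInstant(instant, la)

  // One wall-clock value ("timestamp without time zone" semantics)
  // names two different instants depending on the zone it is read in.
  val wallClock: LocalDateTime = LocalDateTime.of(2018, 9, 19, 12, 0)
  val asUtc: Instant = wallClock.toInstant(ZoneOffset.UTC)
  val asLa: Instant = wallClock.atZone(la).toInstant
}
```

Conversion functions between the two representations have to pick one of these readings for their input, which is where the confusion described in the comment tends to arise.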

[GitHub] spark issue #22476: [SPARK-24157] spark.sql.streaming.noDataMicroBatches.ena...

2018-09-19 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22476 cc @tdas @marmbrus @jose-torres

[GitHub] spark pull request #22476: [SPARK-24157] spark.sql.streaming.noDataMicroBatc...

2018-09-19 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/22476 [SPARK-24157] spark.sql.streaming.noDataMicroBatches.enabled ## What changes were proposed in this pull request? This patch changes the config option

[GitHub] spark pull request #22475: [SPARK-4502][SQL] spark.sql.optimizer.nestedSchem...

2018-09-19 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/22475 [SPARK-4502][SQL] spark.sql.optimizer.nestedSchemaPruning.enabled ## What changes were proposed in this pull request? This patch adds an "optimizer" prefix to nested sche

[GitHub] spark issue #22475: [SPARK-4502][SQL] spark.sql.optimizer.nestedSchemaPrunin...

2018-09-19 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22475 cc @cloud-fan

[GitHub] spark issue #22472: [SPARK-23173][SQL] Reverting of spark.sql.fromJsonForceN...

2018-09-19 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22472 im ok either way

[GitHub] spark issue #22471: [SPARK-25470][SQL][Performance] Concat.eval should use p...

2018-09-19 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22471 @ueshin can you review?

[GitHub] spark pull request #20858: [SPARK-23736][SQL] Extending the concat function ...

2018-09-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/20858#discussion_r218677837 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -665,3 +667,219 @@ case class ElementAt

[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2018-09-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19868 can somebody explain to me what the pr description has to do with missingFiles? I'm probably missing something but i feel the implementation is very different from the pr description

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-09-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16677 ok after thinking about it more, i think we should just revert all of these changes and go back to the drawing board. here's why: 1. the prs change some of the most common/core parts of spark

[GitHub] spark pull request #22456: [SPARK-19355][SQL] Fix variable names numberOfOut...

2018-09-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22456#discussion_r218666270 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -31,7 +31,7 @@ import org.apache.spark.util.Utils /** * Result

[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistics to i...

2018-09-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16677#discussion_r218665902 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -93,25 +96,93 @@ trait BaseLimitExec extends UnaryExecNode

[GitHub] spark pull request #21527: [SPARK-24519] Make the threshold for highly compr...

2018-09-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21527#discussion_r218640616 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -50,7 +50,9 @@ private[spark] sealed trait MapStatus { private[spark

[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistics to i...

2018-09-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16677#discussion_r218640368 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -93,25 +96,93 @@ trait BaseLimitExec extends UnaryExecNode

[GitHub] spark pull request #21527: [SPARK-24519] Make the threshold for highly compr...

2018-09-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21527#discussion_r218639496 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -50,7 +50,9 @@ private[spark] sealed trait MapStatus { private[spark

[GitHub] spark pull request #22459: [SPARK-23173] rename spark.sql.fromJsonForceNulla...

2018-09-18 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/22459 [SPARK-23173] rename spark.sql.fromJsonForceNullableSchema ## What changes were proposed in this pull request? `spark.sql.fromJsonForceNullableSchema

[GitHub] spark issue #22459: [SPARK-23173][SQL] rename spark.sql.fromJsonForceNullabl...

2018-09-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22459 cc @mswit-databricks @gatorsmile

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-09-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16677 actually looking at the design - this could cause perf regressions in some cases too right? it introduces a barrier that was previously non-existent. if the number of records to take isn't

[GitHub] spark pull request #22344: [SPARK-25352][SQL] Perform ordered global limit w...

2018-09-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22344#discussion_r218633220 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala --- @@ -68,22 +68,42 @@ abstract class SparkStrategies extends

[GitHub] spark pull request #22344: [SPARK-25352][SQL] Perform ordered global limit w...

2018-09-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22344#discussion_r218632551 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -98,7 +98,8 @@ case class LocalLimitExec(limit: Int, child: SparkPlan

[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistics to i...

2018-09-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16677#discussion_r218631745 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -93,25 +96,93 @@ trait BaseLimitExec extends UnaryExecNode

[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistics to i...

2018-09-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16677#discussion_r218631682 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -93,25 +96,93 @@ trait BaseLimitExec extends UnaryExecNode

[GitHub] spark pull request #22344: [SPARK-25352][SQL] Perform ordered global limit w...

2018-09-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22344#discussion_r218631461 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -98,7 +98,8 @@ case class LocalLimitExec(limit: Int, child: SparkPlan

[GitHub] spark pull request #22344: [SPARK-25352][SQL] Perform ordered global limit w...

2018-09-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22344#discussion_r218630599 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -98,7 +98,8 @@ case class LocalLimitExec(limit: Int, child: SparkPlan

[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistics to i...

2018-09-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16677#discussion_r218630513 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/PruningSuite.scala --- @@ -22,21 +22,29 @@ import scala.collection.JavaConverters

[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistics to i...

2018-09-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16677#discussion_r218630488 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala --- @@ -557,11 +557,13 @@ class DataFrameAggregateSuite extends QueryTest

[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistics to i...

2018-09-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16677#discussion_r218630324 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -204,6 +204,13 @@ object SQLConf { .intConf

[GitHub] spark pull request #22344: [SPARK-25352][SQL] Perform ordered global limit w...

2018-09-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22344#discussion_r218629650 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -98,7 +98,8 @@ case class LocalLimitExec(limit: Int, child: SparkPlan

[GitHub] spark issue #22344: [SPARK-25352][SQL] Perform ordered global limit when lim...

2018-09-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22344 guys - the whole sequence of prs for this feature are contributing a lot of cryptic code with arcane documentation everywhere. i worry a lot about the maintainability of the code that's coming in. can

[GitHub] spark pull request #22344: [SPARK-25352][SQL] Perform ordered global limit w...

2018-09-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22344#discussion_r218623478 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -98,7 +98,8 @@ case class LocalLimitExec(limit: Int, child: SparkPlan

[GitHub] spark pull request #22457: [SPARK-24626] Add statistics prefix to parallelFi...

2018-09-18 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/22457 [SPARK-24626] Add statistics prefix to parallelFileListingInStatsComputation ## What changes were proposed in this pull request? To be more consistent with other statistics based configs

[GitHub] spark issue #22456: [SPARK-19355][SQL] Fix variable names numberOfOutput -> ...

2018-09-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22456 cc @hvanhovell @cloud-fan also @viirya please don't use such cryptic variable names ... we also need to fix the documentation for the config flag - it's arcane

[GitHub] spark pull request #22456: [SPARK-19355][SQL] Fix variable names numberOfOut...

2018-09-18 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/22456 [SPARK-19355][SQL] Fix variable names numberOfOutput ## What changes were proposed in this pull request? SPARK-19355 introduced a variable / method called numberOfOutput, which is a really bad

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-09-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16677 two questions about this (i just saw this from a different place): 1. is numOutput about number of records? 2. how much memory usage will be increased by, for the driver, at scale

[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistics to i...

2018-09-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16677#discussion_r218614872 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -44,18 +45,23 @@ private[spark] sealed trait MapStatus { * necessary

[GitHub] spark issue #22395: [SPARK-16323][SQL] Add IntegralDivide expression

2018-09-17 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22395 Looks like a use case for a legacy config. On Mon, Sep 17, 2018 at 6:41 PM Wenchen Fan wrote: > To clarify, it's not following hive, but following the behavior of > pr

[GitHub] spark issue #22395: [SPARK-16323][SQL] Add IntegralDivide expression

2018-09-17 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22395 why are we always returning long type here? shouldn't they be the same as the left expr's type? see mysql ```mysql> create temporary table rxin_temp select 4 div 2, 123456789124 div 2, 4

[GitHub] spark pull request #22442: [SPARK-25447][SQL] Support JSON options by schema...

2018-09-17 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22442#discussion_r218250393 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -3611,6 +3611,20 @@ object functions { */ def schema_of_json(e

[GitHub] spark issue #21433: [SPARK-23820][CORE] Enable use of long form of callsite ...

2018-09-11 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21433 Yea we can add this back easily. On Tue, Sep 11, 2018 at 12:50 PM Sean Owen wrote: > Given lack of certainty, and that this is small and easy to add back in > a differen

[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-08 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22010 Actually @holdenk is this change even correct? RDD.distinct is not key based. It is based on the value of the elements in RDD. Even if `numPartitions == partitions.length`, it doesn't mean the RDD
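The distinction raised above (distinct compares whole elements, not keys) can be sketched in plain Python. This is only an analogy under stated assumptions: it does not use Spark, and `records`, `distinct_by_value`, and `distinct_by_key` are hypothetical names introduced for illustration, not part of any Spark API.

```python
# Plain-Python analogy (no Spark involved); all names here are illustrative only.
records = [(1, "a"), (1, "b"), (1, "a"), (2, "a")]


def distinct_by_value(items):
    """Drop an element only if an equal element was already seen."""
    seen = set()
    out = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out


def distinct_by_key(pairs):
    """Keep the first pair for each key (this is NOT what distinct does)."""
    seen = set()
    out = []
    for key, value in pairs:
        if key not in seen:
            seen.add(key)
            out.append((key, value))
    return out


value_based = distinct_by_value(records)
key_based = distinct_by_key(records)
print(value_based)
print(key_based)
```

Value-based deduplication keeps both `(1, "a")` and `(1, "b")` because the elements differ, while key-based deduplication would collapse them into one.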

[GitHub] spark pull request #22010: [SPARK-21436][CORE] Take advantage of known parti...

2018-09-08 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22010#discussion_r216145892 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -396,7 +396,26 @@ abstract class RDD[T: ClassTag]( * Return a new RDD containing

[GitHub] spark issue #22332: [SPARK-25333][SQL] Ability add new columns in Dataset in...

2018-09-06 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22332 Thanks guys. On Thu, Sep 6, 2018 at 2:12 AM Hyukjin Kwon wrote: > Thanks, @wmellouli <https://github.com/wmellouli>.

[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-09-04 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21721 BTW I think this is probably SPIP-worthy. At the very least we should write a design doc on this, similar to the other docs for dsv2 sub-components. We should really think about whether it'd

[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-09-04 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21721 Given the uncertainty about how this works across batch, streaming, and CP, and given we are still fleshing out the main APIs, I think we should revert this, and revisit when the main APIs are done

[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-31 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21721 I will take a look at this tomorrow, since I’m already looking at data source apis myself. Can provide opinion after another look on whether we should keep it unstable or revert

[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21721 I'm confused by this api. Is this for streaming only? If yes, why are they not in the stream package? If not, I only found streaming implementation. Maybe I missed

[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21721 Stuff like this merits api discussions. Not just implementation changes ...
