[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...

2018-08-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/7 Please remove the 0 semantics. IMO the zero vs negative number difference is too subtle. I only find Java String supporting that. Python doesn't

[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...

2018-08-30 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/7#discussion_r214135400 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala --- @@ -229,33 +229,58 @@ case class RLike(left

[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...

2018-08-30 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/7#discussion_r214131195 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala --- @@ -229,33 +229,58 @@ case class RLike(left

[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22010 Thanks for pinging. Please don't merge this until you've addressed the OOM issue. The aggregators were created to handle incoming data larger than size of memory. We should never use a Scala or Java

[GitHub] spark pull request #22010: [SPARK-21436][CORE] Take advantage of known parti...

2018-08-30 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22010#discussion_r214103667 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -396,7 +396,16 @@ abstract class RDD[T: ClassTag]( * Return a new RDD containing

[GitHub] spark pull request #22010: [SPARK-21436][CORE] Take advantage of known parti...

2018-08-30 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22010#discussion_r214103223 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -396,7 +396,16 @@ abstract class RDD[T: ClassTag]( * Return a new RDD containing

[GitHub] spark issue #22258: [SPARK-25266] Fix memory leak vulnerability in Barrier E...

2018-08-28 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22258 Can you remove vulnerability from the title? Otherwise it sounds like a security vulnerability here. --- - To unsubscribe, e-mail

[GitHub] spark pull request #22205: [SPARK-25212][SQL] Support Filter in ConvertToLoc...

2018-08-27 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22205#discussion_r213100428 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -130,6 +130,10 @@ abstract class Optimizer

[GitHub] spark issue #18447: [SPARK-21232][SQL][SparkR][PYSPARK] New built-in SQL fun...

2018-08-27 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18447 Yea I'd probably reject this for now, until we see bigger needs for it. --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...

2018-08-27 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22162#discussion_r213026874 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -815,6 +815,24 @@ class Dataset[T] private[sql]( println(showString

[GitHub] spark pull request #22227: [SPARK-25202] [Core] Implements split with limit ...

2018-08-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/7#discussion_r212815703 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala --- @@ -232,30 +232,41 @@ case class RLike(left

[GitHub] spark pull request #22227: [SPARK-25202] [Core] Implements split with limit ...

2018-08-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/7#discussion_r212815685 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala --- @@ -232,30 +232,41 @@ case class RLike(left

[GitHub] spark issue #22185: [SPARK-25127] DataSourceV2: Remove SupportsPushDownCatal...

2018-08-22 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22185 cc @rdblue @cloud-fan @gatorsmile --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark pull request #22185: [SPARK-25127] DataSourceV2: Remove SupportsPushDo...

2018-08-22 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/22185 [SPARK-25127] DataSourceV2: Remove SupportsPushDownCatalystFilters ## What changes were proposed in this pull request? They depend on internal Expression APIs. Let's see how far we can get

[GitHub] spark issue #16600: [SPARK-19242][SQL] SHOW CREATE TABLE should generate new...

2018-08-21 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16600 Can you close this pr? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews

[GitHub] spark issue #22065: [SPARK-23992][CORE] ShuffleDependency does not need to b...

2018-08-20 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22065 Are we talking about a 0.7% margin improvement? It doesn't seem like it's worth the complexity. --- - To unsubscribe, e-mail

[GitHub] spark issue #22157: [SPARK-25126] Avoid creating Reader for all orc files

2018-08-20 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22157 Do we have a similar issue for Parquet? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands

[GitHub] spark issue #22160: Revert "[SPARK-24418][BUILD] Upgrade Scala to 2.11.12 an...

2018-08-20 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22160 Can you add to the pr description why we are reverting? Just copy paste what you had above. Thanks. --- - To unsubscribe, e-mail

[GitHub] spark issue #22134: [SPARK-25143][SQL] Support data source name mapping conf...

2018-08-17 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22134 I think it's premature to introduce this. The extra layer of abstraction actually makes it more difficult to reason about what's going on. We don't have that many data sources that require flexibility

[GitHub] spark issue #21944: [SPARK-24988][SQL]Add a castBySchema method which casts ...

2018-08-06 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21944 Thanks, Mahmoud! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #21951: [SPARK-24957][SQL][FOLLOW-UP] Clean the code for AVERAGE

2018-08-01 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21951 LGTM. On Thu, Aug 2, 2018 at 1:14 AM Xiao Li wrote: > This will simplify the code and improve the readability. We can do the > same in the other expression. >

[GitHub] spark issue #21951: [SPARK-24957][SQL][FOLLOW-UP] Clean the code for AVERAGE

2018-08-01 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21951 Why would we want to use the DSL here? Do we use it in other expressions? --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #21938: [SPARK-24982][SQL] UDAF resolution should not thr...

2018-07-31 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/21938 [SPARK-24982][SQL] UDAF resolution should not throw AssertionError ## What changes were proposed in this pull request? When user calls anUDAF with the wrong number of arguments, Spark previously

[GitHub] spark pull request #21934: [SPARK-24951][SQL] Table valued functions should ...

2018-07-31 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21934#discussion_r206681479 --- Diff: sql/core/src/test/resources/sql-tests/results/table-valued-functions.sql.out --- @@ -83,8 +83,13 @@ select * from range(1, null) -- !query 6

[GitHub] spark issue #21934: [SPARK-24951][SQL] Table valued functions should throw A...

2018-07-31 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21934 Jenkins, retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #21934: [SPARK-24951][SQL] Table valued functions should throw A...

2018-07-31 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21934 cc @gatorsmile @ericl who originally wrote this. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark pull request #21934: [SPARK-24951][SQL] Table valued functions should ...

2018-07-31 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/21934 [SPARK-24951][SQL] Table valued functions should throw AnalysisException ## What changes were proposed in this pull request? Previously TVF resolution could throw IllegalArgumentException

[GitHub] spark issue #21932: [SPARK-24979][SQL] add AnalysisHelper#resolveOperatorsUp

2018-07-31 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21932 Do we really need this? It's almost always the case for resolution that you'd want to do bottom up, so I thought Michael's original design to just call it resolveOperators make a lot of sense

[GitHub] spark issue #21923: [SPARK-24918][Core] Executor Plugin api

2018-07-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21923 Are there more specific use cases? I always feel it'd be impossible to design APIs without seeing couple different use cases

[GitHub] spark issue #21922: [WIP] Add an ANSI SQL parser mode

2018-07-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21922 what are the actual changes? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #21318: [minor] Update docs for functions.scala to make it clear...

2018-07-28 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21318 LGTM. On Fri, Jul 27, 2018 at 10:58 PM Hyukjin Kwon wrote: > @rxin <https://github.com/rxin> re: #21318 (comment) > <https://github.com/apache/s

[GitHub] spark pull request #21318: [minor] Update docs for functions.scala to make i...

2018-07-27 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21318#discussion_r20582 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -39,7 +39,21 @@ import org.apache.spark.util.Utils

[GitHub] spark issue #21897: [minor] Improve documentation for HiveStringType's

2018-07-27 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21897 cc @gatorsmile cc @hvanhovell why did we expose these types as public Scala APIs? I feel they should not have been public. If they are public, we should have more generic VarcharType

[GitHub] spark pull request #21897: [minor] Improve documentation for HiveStringType'...

2018-07-27 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/21897 [minor] Improve documentation for HiveStringType's The diff should be self-explanatory. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin

[GitHub] spark pull request #21706: [SPARK-24702] Fix Unable to cast to calendar inte...

2018-07-27 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21706#discussion_r205851385 --- Diff: sql/core/src/test/resources/sql-tests/inputs/cast.sql --- @@ -42,4 +42,38 @@ SELECT CAST('9223372036854775808' AS long); DESC FUNCTION

[GitHub] spark issue #21896: [SPARK-24865][SQL] Remove AnalysisBarrier addendum

2018-07-27 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21896 cc @gatorsmile @cloud-fan --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #21896: [SPARK-24865][SQL] Remove AnalysisBarrier addendu...

2018-07-27 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/21896 [SPARK-24865][SQL] Remove AnalysisBarrier addendum ## What changes were proposed in this pull request? I didn't want to pollute the diff in the previous PR and left some TODOs. This is a follow

[GitHub] spark issue #21318: [minor] Update docs for functions.scala to make it clear...

2018-07-27 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21318 Yup will do. On Fri, Jul 27, 2018 at 10:23 AM Sean Owen wrote: > Just browsing old PRs .. want to finish this one up @rxin > <https://github.com/rxin>?

[GitHub] spark issue #21699: [SPARK-24722][SQL] pivot() with Column type argument

2018-07-26 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21699 I'm OK with it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #21873: [SPARK-24919][BUILD] New linter rule for sparkCon...

2018-07-25 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21873#discussion_r205252848 --- Diff: scalastyle-config.xml --- @@ -150,6 +150,19 @@ This file is divided into 3 sections: // scalastyle:on println

[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode

2018-07-25 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21758 What's the failure mode if there are not enough slots for the barrier mode? We should throw an exception right

[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-25 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r205250930 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ActiveJob.scala --- @@ -60,4 +60,10 @@ private[spark] class ActiveJob( val finished

[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-25 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r205250352 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1839,6 +1847,20 @@ abstract class RDD[T: ClassTag]( def toJavaRDD() : JavaRDD[T

[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-25 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r205249547 --- Diff: core/src/main/scala/org/apache/spark/rdd/MapPartitionsRDD.scala --- @@ -27,7 +27,8 @@ import org.apache.spark.{Partition, TaskContext} private

[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-25 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r205249449 --- Diff: core/src/main/scala/org/apache/spark/BarrierTaskContext.scala --- @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-25 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r205249225 --- Diff: core/src/main/scala/org/apache/spark/BarrierTaskInfo.scala --- @@ -0,0 +1,31 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-25 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r205249297 --- Diff: core/src/main/scala/org/apache/spark/BarrierTaskInfo.scala --- @@ -0,0 +1,31 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark issue #21875: [SPARK-24288][SQL] Enable preventing predicate pushdown

2018-07-25 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21875 Can you add JDBC to the title? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #21866: [SPARK-24768][FollowUp][SQL]Avro migration follow...

2018-07-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21866#discussion_r204961291 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroFileFormat.scala --- @@ -56,7 +56,7 @@ private[avro] class AvroFileFormat extends

[GitHub] spark pull request #21867: [SPARK-24307][CORE] Add conf to revert to old cod...

2018-07-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21867#discussion_r204959300 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -731,7 +731,14 @@ private[spark] class BlockManager

[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier

2018-07-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21822#discussion_r204957474 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -751,7 +751,8 @@ object TypeCoercion

[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier

2018-07-24 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21822 I changed the way we do the checks in test to use a thread local rather than checking the stacktrace, so they should run faster now. Also added test cases for the various new methods. Also moved

[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21822#discussion_r204955869 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -787,6 +782,7 @@ class Analyzer( right

[GitHub] spark issue #21845: [SPARK-24886][INFRA] Fix the testing script to increase ...

2018-07-24 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21845 If that's the only one I think that PR itself needs to be fixed (significantly increases test runtime), and I wouldn't increase the time here. On Mon, Jul 23, 2018 at 11:44 PM

[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-24 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21822 Yea the extra check in test cases might've contributed to the longer test time. Let me think about how to reduce it. On Mon, Jul 23, 2018 at 11:28 PM Hyukjin Kwon wrote

[GitHub] spark issue #21845: [SPARK-24886][INFRA] Fix the testing script to increase ...

2018-07-23 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21845 Are more pull requests failing due to time out right now? On Mon, Jul 23, 2018 at 6:30 PM Hyukjin Kwon wrote: > @rxin <https://github.com/rxin>, btw you want me close

[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-23 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r204504127 --- Diff: core/src/main/scala/org/apache/spark/BarrierTaskInfo.scala --- @@ -0,0 +1,31 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark issue #21826: [SPARK-24872] Remove the symbol “||” of the “OR”...

2018-07-23 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21826 No we can't because you can still use string concat in filters, e.g. colA || colB == "ab" What is

[GitHub] spark issue #21845: [SPARK-24886][INFRA] Fix the testing script to increase ...

2018-07-23 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21845 This helps, but it is not sustainable to keep increasing the threshold. What we need to do is to look at test time distribution and figure out what test suites are unnecessarily long and actually cut

[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-22 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21822 Jenkins, retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-20 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21822 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #21802: [SPARK-23928][SQL] Add shuffle collection function.

2018-07-20 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21802 Do we really need full codegen for all of these collection functions? They seem pretty slow and specialization with full codegen won't help perf that much (and might even hurt by blowing up the code

[GitHub] spark issue #21826: [SPARK-24872] Remove the symbol “||” of the “OR”...

2018-07-20 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21826 cc @gatorsmile @cloud-fan @HyukjinKwon this is a good thing to do? --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark issue #21826: [SPARK-24872] Remove the symbol “||” of the “OR”...

2018-07-20 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21826 Jenkins, test this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-20 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21822 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21822#discussion_r204163484 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -33,6 +49,116 @@ abstract class LogicalPlan

[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21822#discussion_r204163424 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -23,8 +23,24 @@ import

[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21822#discussion_r204163328 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2390,16 +2375,21 @@ class Analyzer( * scoping

[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21822#discussion_r204160853 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -533,7 +537,8 @@ trait CheckAnalysis extends

[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21822#discussion_r204160150 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -787,6 +782,7 @@ class Analyzer( right

[GitHub] spark issue #21803: [SPARK-24849][SQL] Converting a value of StructType to a...

2018-07-20 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21803 Should we do schema.toDDL, or StructType.toDDL(schema)? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21822#discussion_r203918981 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -533,7 +537,8 @@ trait CheckAnalysis extends

[GitHub] spark issue #18784: [SPARK-21559][Mesos] remove mesos fine-grained mode

2018-07-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18784 Let's remove it in 3.0 then. We can do it after 2.4 release. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark pull request #21742: [SPARK-24768][SQL] Have a built-in AVRO data sour...

2018-07-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21742#discussion_r203496489 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/package.scala --- @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark issue #21766: [SPARK-24803][SQL] add support for numeric

2018-07-16 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21766 Why did you need this change? Given it's very difficult to revert the change (or introduce a proper numeric type if ever needed in the future), I would not merge this pull request unless

[GitHub] spark issue #21568: [SPARK-24562][TESTS] Support different configs for same ...

2018-07-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21568 To me it is actually confusing to have the decimal one in there at all, by defining a list of queries that are reused for different functional testing. It is very easy to just ignore the subtle

[GitHub] spark issue #21568: [SPARK-24562][TESTS] Support different configs for same ...

2018-07-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21568 What are the use cases other than decimal? I am not sure if we need to build a lot of infrastructure just for one or two use cases

[GitHub] spark issue #21568: [SPARK-24562][TESTS] Support different configs for same ...

2018-07-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21568 If they produce different results why do you need any infrastructure for them? They are just part of the normal test flow. If they produce the same result, and you don't want to define

[GitHub] spark issue #21568: [SPARK-24562][TESTS] Support different configs for same ...

2018-07-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21568 Can you just define a config matrix in the beginning of the file, and each file is run with the config matrix

[GitHub] spark issue #21568: [SPARK-24562][TESTS] Support different configs for same ...

2018-07-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21568 I think it's super confusing to have the config names encoded in file names. Makes the names super long and difficult to read, and also hard to verify what was set, and difficult to get multiple

[GitHub] spark pull request #21705: [SPARK-24727][SQL] Add a static config to control...

2018-07-03 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21705#discussion_r199940775 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala --- @@ -66,6 +66,12 @@ object StaticSQLConf { .checkValue

[GitHub] spark issue #21686: [SPARK-24709][SQL] schema_of_json() - schema inference f...

2018-07-02 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21686 Thanks. Awesome. This matches what I had in mind then. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark issue #21459: [SPARK-24420][Build] Upgrade ASM to 6.1 to support JDK9+

2018-07-02 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21459 SGTM. On Mon, Jul 2, 2018 at 4:38 PM DB Tsai wrote: > There are three approvals from the committers, and the changes are pretty > trivial to revert if we see any perfo

[GitHub] spark issue #21686: [SPARK-24709][SQL] schema_of_json() - schema inference f...

2018-07-02 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21686 Does this actually work in SQL? How does it work when we don't have a data type that's a schema? --- - To unsubscribe, e-mail

[GitHub] spark issue #21626: [SPARK-24642][SQL] New function infers schema for JSON c...

2018-06-28 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21626 It is on the public list: https://issues.apache.org/jira/browse/SPARK-24642 --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #21598: [SPARK-24605][SQL] size(null) returns null instea...

2018-06-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21598#discussion_r198364343 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -1324,6 +1324,12 @@ object SQLConf { "Other column v

[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf

2018-06-26 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21482 OK I double checked. I don't think we should be adding this functionality, since different databases implemented it differently, and it is somewhat difficult to create Infinity in Spark SQL given we

[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf

2018-06-26 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21482 Hey I have an additional thought on this. Will leave it in the next ten mins. --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-26 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21598 Here: https://en.wikipedia.org/wiki/Bug_compatibility On Tue, Jun 26, 2018 at 9:28 AM Reynold Xin wrote: > It’s actually common software engineering practice to keep “bug

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-26 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21598 It’s actually common software engineering practice to keep “buggy” semantics if a bug has been out there long enough and a lot of applications depend on the semantics. On Tue, Jun

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-21 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21598 Do we have other "legacy" configs that we haven't released and can change to match this prefix? It's pretty nice to have a single prefix for

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-21 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21598 This is not a "bug" and there is no "right" behavior in APIs. It's been defined as -1 since the very beginning (when was it added?), so we can't just change the default value

[GitHub] spark issue #21544: add one supported type missing from the javadoc

2018-06-15 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21544 Thanks. Merging in master. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

spark git commit: add one supported type missing from the javadoc

2018-06-15 Thread rxin
Repository: spark Updated Branches: refs/heads/master e4fee395e -> c7c0b086a add one supported type missing from the javadoc ## What changes were proposed in this pull request? The supported java.math.BigInteger type is not mentioned in the javadoc of Encoders.bean() ## How was this patch

[GitHub] spark issue #21568: [SPARK-24562][TESTS] Support different configs for same ...

2018-06-15 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21568 I'm confused by the description. What does this PR actually do? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark issue #21574: [SPARK-24478][SQL][followup] Move projection and filter ...

2018-06-15 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21574 Does this move actually make sense? It'd destroy stats estimation for partition pruning. --- - To unsubscribe, e-mail: reviews

[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

2018-06-08 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21502 How does this solve the problem you described? If the container is gone, the process is gone and users can't destroy things anymore

[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...

2018-06-08 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19498 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf

2018-06-07 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21482 Thanks, Henry. In general I'm not a huge fan of adding something because hypothetically somebody might want it. Also if you want this to be compatible with Impala, wouldn't you want to name

[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf

2018-06-06 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21482 @henryr 1.0/0.0 also returns null in Spark SQL ... ``` scala> sql("select cast(1.0 as double)/cast(0 as double)").show() +-+

<    1   2   3   4   5   6   7   8   9   10   >