[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...

2018-10-11 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22688 retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #22674: [SPARK-25680][SQL] SQL execution listener shouldn't happ...

2018-10-11 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22674 hmm, seems it failed at the same test.

[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...

2018-10-10 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22688 Seems the same test failed?

[GitHub] spark issue #22687: [SPARK-25702][SQL] Push down filters with `Not` operator...

2018-10-10 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22687 I prefer not to add code that will not run. Let's see other options too.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-10-10 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16677 @sujith71955 Thanks. I see. The case is somewhat different from the problem this PR wants to solve, but I think it is a reasonable use case. Would you like to create a ticket for us to track

[GitHub] spark issue #22687: [SPARK-25702][SQL] Push down filters with `Not` operator...

2018-10-10 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22687 Won't such predicates be simplified by the `BooleanSimplification` rule?
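To illustrate the kind of rewrite the question refers to, here is a toy sketch in plain Python, not Catalyst code. The tuple-based expression format and the `simplify_not` helper are assumptions made up for this example. It removes `Not` by flipping comparisons, cancelling double negation, and applying De Morgan's laws.

```python
# Toy predicate simplifier (illustrative only; not Spark's BooleanSimplification).
# Expressions are nested tuples:
#   ("not", e), ("and", l, r), ("or", l, r), or (comparison_op, column, literal)
FLIP = {">": "<=", ">=": "<", "<": ">=", "<=": ">"}


def simplify_not(expr):
    op = expr[0]
    if op in ("and", "or"):
        # Recurse into both children of a conjunction or disjunction.
        return (op, simplify_not(expr[1]), simplify_not(expr[2]))
    if op != "not":
        # A plain comparison has nothing to simplify.
        return expr
    inner = expr[1]
    if inner[0] == "not":
        # Not(Not(e)) becomes e.
        return simplify_not(inner[1])
    if inner[0] in FLIP:
        # Not(a > 1) becomes a <= 1, and so on.
        return (FLIP[inner[0]], inner[1], inner[2])
    if inner[0] == "and":
        # De Morgan: Not(l And r) becomes Not(l) Or Not(r).
        return ("or", simplify_not(("not", inner[1])), simplify_not(("not", inner[2])))
    if inner[0] == "or":
        # De Morgan: Not(l Or r) becomes Not(l) And Not(r).
        return ("and", simplify_not(("not", inner[1])), simplify_not(("not", inner[2])))
    return expr
```

After such a rewrite no `Not` node remains over these comparison operators, which is why a data source filter converter might never see one.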

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-10-10 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16677 @sujith71955 For `executeTake`, to optimize it we need to collect statistics of RDD. `executeTake` incrementally scans partitions. Ideally, it should just scan a few partitions to return `n` rows
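The incremental scan described here can be modelled with a short plain-Python sketch. This is a simplified illustration, not Spark's implementation: the `take_incrementally` name, the list-of-lists partition model, and the growth factor of 4 are assumptions for the example.

```python
def take_incrementally(partitions, n, scale_up_factor=4):
    """Collect up to n rows, scanning partitions in growing batches.

    Returns the rows collected and the number of partitions scanned.
    """
    rows = []
    scanned = 0
    batch = 1  # the first attempt looks at a single partition
    while len(rows) < n and scanned < len(partitions):
        # Every partition in the current batch is scanned, even if an
        # earlier one in the same batch already supplied enough rows.
        for part in partitions[scanned:scanned + batch]:
            remaining = n - len(rows)
            rows.extend(list(part)[:remaining])
        scanned = min(scanned + batch, len(partitions))
        batch *= scale_up_factor  # widen the next attempt
    return rows, scanned
```

With empty leading partitions, for example `take_incrementally([[], [], [1, 2], [3, 4, 5], [6]], 3)`, the sketch ends up scanning all five partitions to return three rows. That is the kind of case where statistics about partition sizes could help pick the right partitions up front.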

[GitHub] spark pull request #22684: [SPARK-25699][SQL] Partially push down conjunctiv...

2018-10-10 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22684#discussion_r224005596 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilterSuite.scala --- @@ -382,5 +382,40 @@ class OrcFilterSuite extends

[GitHub] spark pull request #20999: [SPARK-14922][SPARK-17732][SPARK-23866][SQL] Supp...

2018-10-09 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20999#discussion_r223703615 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -521,35 +521,112 @@ case class AlterTableRenamePartitionCommand

[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-10-09 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21732 Thanks. I've updated the PR description to summarize encoding of `Product` and `Option[Product]`.

[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-10-09 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21732 > Option[Product] is the same as Product except at top level? Yeah, at top level, `Option[Product]` was not supported. After this change, it is a struct type col

[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-10-09 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21732 > For Product, normally it should be a struct type column, the special case I can think of is: 1) at root level, flatten it into multiple columns, and the Product can't be null. > For

[GitHub] spark pull request #22653: [SPARK-25659][PYTHON][TEST] Test type inference s...

2018-10-08 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22653#discussion_r223534303 --- Diff: python/pyspark/sql/tests.py --- @@ -1149,6 +1149,75 @@ def test_infer_schema(self): result = self.spark.sql("SELECT l[0].a from

[GitHub] spark issue #22653: [SPARK-25659][PYTHON][TEST] Test type inference specific...

2018-10-08 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22653 Sorry for the late reply. @HyukjinKwon Yes, this LGTM. It is great we can follow up other issues like None and datetime.time in other JIRA

[GitHub] spark issue #22635: [SPARK-25591][PySpark][SQL] Avoid overwriting deserializ...

2018-10-08 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22635 @cloud-fan @gatorsmile @HyukjinKwon Thanks. Yes. As Pandas UDF has the same issue and it is fixed by this PR

[GitHub] spark pull request #20999: [SPARK-14922][SPARK-17732][SPARK-23866][SQL] Supp...

2018-10-08 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20999#discussion_r223414695 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -1015,6 +1036,23 @@ class AstBuilder(conf: SQLConf

[GitHub] spark pull request #20999: [SPARK-14922][SPARK-17732][SPARK-23866][SQL] Supp...

2018-10-08 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20999#discussion_r223415516 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -521,35 +521,112 @@ case class AlterTableRenamePartitionCommand

[GitHub] spark pull request #22630: [SPARK-25497][SQL] Limit operation within whole s...

2018-10-08 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22630#discussion_r223318798 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala --- @@ -362,8 +362,15 @@ trait CodegenSupport extends

[GitHub] spark issue #22630: [SPARK-25497][SQL] Limit operation within whole stage co...

2018-10-08 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22630 retest this please.

[GitHub] spark pull request #22635: [SPARK-25591][PySpark][SQL] Avoid overwriting des...

2018-10-07 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22635#discussion_r223251196 --- Diff: python/pyspark/accumulators.py --- @@ -109,10 +109,14 @@ def _deserialize_accumulator(aid, zero_value, accum_param): from

[GitHub] spark issue #22635: [SPARK-25591][PySpark][SQL] Avoid overwriting deserializ...

2018-10-07 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22635 Thanks @HyukjinKwon

[GitHub] spark pull request #22635: [SPARK-25591][PySpark][SQL] Avoid overwriting des...

2018-10-07 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22635#discussion_r223251175 --- Diff: python/pyspark/sql/tests.py --- @@ -3603,6 +3603,31 @@ def test_repr_behaviors(self): self.assertEquals(None, df

[GitHub] spark pull request #22610: [SPARK-25461][PySpark][SQL] Add document for mism...

2018-10-07 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22610#discussion_r223217704 --- Diff: python/pyspark/sql/functions.py --- @@ -2909,6 +2909,12 @@ def pandas_udf(f=None, returnType=None, functionType=None): can fail

[GitHub] spark pull request #22655: [SPARK-25666][PYTHON] Internally document type co...

2018-10-07 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22655#discussion_r223214617 --- Diff: python/pyspark/sql/functions.py --- @@ -2733,6 +2733,33 @@ def udf(f=None, returnType=StringType()): | 8| JOHN DOE

[GitHub] spark pull request #22655: [SPARK-25666][PYTHON] Internally document type co...

2018-10-07 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22655#discussion_r223214749 --- Diff: python/pyspark/sql/functions.py --- @@ -2733,6 +2733,33 @@ def udf(f=None, returnType=StringType()): | 8| JOHN DOE

[GitHub] spark pull request #22655: [SPARK-25666][PYTHON] Internally document type co...

2018-10-07 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22655#discussion_r223214580 --- Diff: python/pyspark/sql/functions.py --- @@ -2733,6 +2733,33 @@ def udf(f=None, returnType=StringType()): | 8| JOHN DOE

[GitHub] spark pull request #22655: [SPARK-25666][PYTHON] Internally document type co...

2018-10-07 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22655#discussion_r223214510 --- Diff: python/pyspark/sql/functions.py --- @@ -2733,6 +2733,33 @@ def udf(f=None, returnType=StringType()): | 8| JOHN DOE

[GitHub] spark pull request #22646: [SPARK-25654][SQL] Support for nested JavaBean ar...

2018-10-07 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22646#discussion_r223214350 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -1098,12 +1099,19 @@ object SQLContext { data: Iterator

[GitHub] spark issue #22662: [SPARK-25627][TEST] Reduce test time for ContinuousStres...

2018-10-07 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22662 Not sure if the modified triggers/epochs are enough for this test. cc @zsxwing

[GitHub] spark pull request #22662: [SPARK-25627][TEST] Reduce test time for Continuo...

2018-10-07 Thread viirya
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/22662 [SPARK-25627][TEST] Reduce test time for ContinuousStressSuite ## What changes were proposed in this pull request? This reduces the test time for ContinuousStressSuite - from 8 mins 13

[GitHub] spark issue #22630: [SPARK-25497][SQL] Limit operation within whole stage co...

2018-10-06 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22630 LGTM

[GitHub] spark issue #22655: [SPARK-25666][PYTHON] Internally document type conversio...

2018-10-06 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22655 Thanks for pinging me. I'll look into this tonight or tomorrow.

[GitHub] spark pull request #22610: [WIP][SPARK-25461][PySpark][SQL] Print warning wh...

2018-10-06 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22610#discussion_r223177785 --- Diff: python/pyspark/sql/functions.py --- @@ -2909,6 +2909,11 @@ def pandas_udf(f=None, returnType=None, functionType=None): can fail

[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-10-06 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21732 retest this please.

[GitHub] spark issue #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilter shou...

2018-10-05 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22614 The PR description and title may need to change accordingly. Can you update them?

[GitHub] spark issue #22646: [SPARK-25654][SQL] Support for nested JavaBean arrays, l...

2018-10-05 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22646 The `createDataFrame` API for Java Beans doesn't clearly document which JavaBeans are supported. Can you also update it to explicitly document

[GitHub] spark pull request #22646: [SPARK-25654][SQL] Support for nested JavaBean ar...

2018-10-05 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22646#discussion_r223169392 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -1115,8 +1123,31 @@ object SQLContext

[GitHub] spark pull request #22646: [SPARK-25654][SQL] Support for nested JavaBean ar...

2018-10-05 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22646#discussion_r223168936 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -1098,12 +1099,19 @@ object SQLContext { data: Iterator

[GitHub] spark pull request #22646: [SPARK-25654][SQL] Support for nested JavaBean ar...

2018-10-05 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22646#discussion_r223168881 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -1098,12 +1099,19 @@ object SQLContext { data: Iterator

[GitHub] spark issue #22635: [SPARK-25591][PySpark][SQL] Avoid overwriting deserializ...

2018-10-05 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22635 Since this is for correctness, I think we should include this in 2.4 if it can catch up. cc @cloud-fan

[GitHub] spark pull request #22524: [SPARK-25497][SQL] Limit operation within whole s...

2018-10-05 Thread viirya
Github user viirya closed the pull request at: https://github.com/apache/spark/pull/22524

[GitHub] spark issue #22524: [SPARK-25497][SQL] Limit operation within whole stage co...

2018-10-05 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22524 Closing this in favor of #22630.

[GitHub] spark pull request #22630: [SPARK-25497][SQL] Limit operation within whole s...

2018-10-05 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22630#discussion_r223002050 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ColumnarBatchScan.scala --- @@ -166,7 +166,7 @@ private[sql] trait ColumnarBatchScan

[GitHub] spark pull request #22630: [SPARK-25497][SQL] Limit operation within whole s...

2018-10-05 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22630#discussion_r222991883 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -46,6 +46,15 @@ case class CollectLimitExec(limit: Int, child

[GitHub] spark pull request #22635: [SPARK-25591][PySpark][SQL] Avoid overwriting des...

2018-10-05 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22635#discussion_r222905059 --- Diff: python/pyspark/accumulators.py --- @@ -109,10 +109,14 @@ def _deserialize_accumulator(aid, zero_value, accum_param): from

[GitHub] spark pull request #22635: [SPARK-25591][PySpark][SQL] Avoid overwriting des...

2018-10-04 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22635#discussion_r222892890 --- Diff: python/pyspark/accumulators.py --- @@ -109,10 +109,14 @@ def _deserialize_accumulator(aid, zero_value, accum_param): from

[GitHub] spark pull request #22622: [SPARK-25635][SQL][BUILD] Support selective direc...

2018-10-04 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22622#discussion_r222871049 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala --- @@ -182,4 +182,12 @@ class HiveOrcSourceSuite extends OrcSuite

[GitHub] spark pull request #22622: [SPARK-25635][SQL][BUILD] Support selective direc...

2018-10-04 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22622#discussion_r222865396 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala --- @@ -182,4 +182,12 @@ class HiveOrcSourceSuite extends OrcSuite

[GitHub] spark issue #22635: [SPARK-25591][PySpark][SQL] Avoid overwriting deserializ...

2018-10-04 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22635 cc @HyukjinKwon

[GitHub] spark pull request #22635: [SPARK-25591][PySpark][SQL] Avoid overwriting des...

2018-10-04 Thread viirya
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/22635 [SPARK-25591][PySpark][SQL] Avoid overwriting deserialized accumulator ## What changes were proposed in this pull request? If we use accumulators in more than one UDF, it is possible
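The description above is cut off, so the following is only a guess at the failure mode suggested by the PR title, written as a standalone toy model rather than PySpark code. `ToyAccumulator`, `registry`, and both `deserialize_*` helpers are hypothetical names. The assumption is that deserializing the same accumulator id a second time replaces the registry entry, so updates made through the first object are lost.

```python
class ToyAccumulator:
    """Stand-in for an accumulator: an id plus a running value."""

    def __init__(self, aid, value=0):
        self.aid = aid
        self.value = value


# Hypothetical per-worker registry mapping accumulator id to accumulator.
registry = {}


def deserialize_overwriting(aid):
    # Always builds a fresh object and replaces whatever was registered.
    acc = ToyAccumulator(aid)
    registry[aid] = acc
    return acc


def deserialize_reusing(aid):
    # Reuses the accumulator that was already registered for this id.
    if aid not in registry:
        registry[aid] = ToyAccumulator(aid)
    return registry[aid]


def run_two_udfs(deserialize):
    """Simulate two UDFs that each deserialize and update accumulator 1."""
    registry.clear()
    first = deserialize(1)
    first.value += 5
    second = deserialize(1)
    second.value += 7
    # Only the registered object's value would be reported back.
    return registry[1].value
```

In this model `run_two_udfs(deserialize_overwriting)` reports 7 because the first update is dropped, while `run_two_udfs(deserialize_reusing)` reports 12.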

[GitHub] spark pull request #22630: [SPARK-25497][SQL] Limit operation within whole s...

2018-10-04 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22630#discussion_r222855008 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala --- @@ -345,6 +345,16 @@ trait CodegenSupport extends

[GitHub] spark issue #22630: [SPARK-25497][SQL] Limit operation within whole stage co...

2018-10-04 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22630 This is an interesting change. I like this idea.

[GitHub] spark pull request #22630: [SPARK-25497][SQL] Limit operation within whole s...

2018-10-04 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22630#discussion_r222728210 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala --- @@ -159,6 +159,10 @@ case class HashAggregateExec

[GitHub] spark pull request #22615: [SPARK-25016][BUILD][CORE] Remove support for Had...

2018-10-04 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22615#discussion_r222695007 --- Diff: hadoop-cloud/pom.xml --- @@ -166,45 +166,35 @@ httpcore ${hadoop.deps.scope} + + org.apache.hadoop

[GitHub] spark pull request #22610: [WIP][SPARK-25461][PySpark][SQL] Print warning wh...

2018-10-04 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22610#discussion_r222617651 --- Diff: python/pyspark/worker.py --- @@ -84,13 +84,36 @@ def wrap_scalar_pandas_udf(f, return_type): arrow_return_type = to_arrow_type(return_type

[GitHub] spark issue #22610: [WIP][SPARK-25461][PySpark][SQL] Print warning when retu...

2018-10-04 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22610 So I've added a bit of documentation for this. @HyukjinKwon @BryanCutler please check it when you have time.

[GitHub] spark pull request #22610: [WIP][SPARK-25461][PySpark][SQL] Print warning wh...

2018-10-04 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22610#discussion_r222616380 --- Diff: python/pyspark/worker.py --- @@ -84,13 +84,36 @@ def wrap_scalar_pandas_udf(f, return_type): arrow_return_type = to_arrow_type(return_type

[GitHub] spark issue #22621: [SPARK-25602][SQL] SparkPlan.getByteArrayRdd should not ...

2018-10-04 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22621 LGTM

[GitHub] spark issue #22610: [WIP][SPARK-25461][PySpark][SQL] Print warning when retu...

2018-10-03 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22610 Btw, I checked and our `_minimum_pyarrow_version` is 0.8.0, so it seems that even when the next upgrade is available, for users with pyarrow versions before 0.11.0, this is still a potential issue. Isn't

[GitHub] spark issue #22620: [SPARK-25601][PYTHON] Register Grouped aggregate UDF Vec...

2018-10-03 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22620 LGTM

[GitHub] spark issue #22610: [WIP][SPARK-25461][PySpark][SQL] Print warning when retu...

2018-10-03 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22610 I think it is more reasonable to use the option when converting from Pandas to raise an error for unsafe casts. It should be better than displaying a warning message. Not sure how long before
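The difference between raising on an unsafe cast and silently converting can be shown with a plain-Python sketch. This does not call pyarrow. The `cast_to_int` helper and its `safe` flag are assumptions chosen to mirror the idea of the option discussed in this thread.

```python
def cast_to_int(values, safe=True):
    """Convert numbers to ints.

    With safe=True, refuse any conversion that would change a value.
    With safe=False, truncate silently.
    """
    result = []
    for v in values:
        as_int = int(v)
        if safe and as_int != v:
            raise ValueError(f"unsafe cast: {v!r} would be truncated to {as_int}")
        result.append(as_int)
    return result
```

Here `cast_to_int([1.5], safe=False)` quietly yields `[1]`, which is the kind of surprise a warning would only hint at, whereas `cast_to_int([1.5])` fails immediately with a `ValueError`.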

[GitHub] spark issue #22610: [WIP][SPARK-25461][PySpark][SQL] Print warning when retu...

2018-10-03 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22610 > Yeah, it's part of pyarrow.Array now, but will only be in the 0.11.0 release so we would have to do it after the next upgrade. Then I think we can wait for next upgrade to use this feat

[GitHub] spark issue #22610: [WIP][SPARK-25461][PySpark][SQL] Print warning when retu...

2018-10-03 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22610 Thanks @BryanCutler! Looks like a useful option. It is a pretty new one; does that mean we need to upgrade to the latest PyArrow in order to use it? Since it is an option at `Table.from_pandas

[GitHub] spark issue #22621: [SPARK-25602][SQL] range metrics can be wrong if the res...

2018-10-03 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22621 retest this please.

[GitHub] spark pull request #22621: [SPARK-25602][SQL] range metrics can be wrong if ...

2018-10-03 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22621#discussion_r222368401 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala --- @@ -517,4 +517,93 @@ class SQLMetricsSuite extends

[GitHub] spark pull request #22621: [SPARK-25602][SQL] range metrics can be wrong if ...

2018-10-03 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22621#discussion_r222354699 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala --- @@ -397,7 +397,7 @@ case class RangeExec(range

[GitHub] spark pull request #22621: [SPARK-25602][SQL] range metrics can be wrong if ...

2018-10-03 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22621#discussion_r222351322 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala --- @@ -453,45 +453,89 @@ case class RangeExec(range

[GitHub] spark issue #22610: [WIP][SPARK-25461][PySpark][SQL] Print warning when retu...

2018-10-03 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22610 @HyukjinKwon Thanks! I agree that documenting this is definitely useful. I will try to add it and let's see if it is ok for you. I think it is good to mention that users

[GitHub] spark issue #22602: [SPARK-25538][SQL] Zero-out all bytes when writing decim...

2018-10-03 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22602 Two minor comments. LGTM and good catch!

[GitHub] spark pull request #22602: [SPARK-25538][SQL] Zero-out all bytes when writin...

2018-10-03 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22602#discussion_r41581 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriterSuite.scala --- @@ -0,0 +1,48

[GitHub] spark pull request #22602: [SPARK-25538][SQL] Zero-out all bytes when writin...

2018-10-03 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22602#discussion_r41193 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriterSuite.scala --- @@ -0,0 +1,48

[GitHub] spark pull request #10989: [SPARK-12798] [SQL] generated BroadcastHashJoin

2018-10-03 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/10989#discussion_r35960 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoin.scala --- @@ -117,6 +120,87 @@ case class BroadcastHashJoin

[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...

2018-10-03 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22619#discussion_r00361 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala --- @@ -172,51 +172,35 @@ private[csv] object

[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...

2018-10-03 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22619#discussion_r222198383 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala --- @@ -172,51 +172,35 @@ private[csv] object

[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...

2018-10-03 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22619#discussion_r222198257 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala --- @@ -172,51 +172,35 @@ private[csv] object

[GitHub] spark pull request #22610: [WIP][SPARK-25461][PySpark][SQL] Print warning wh...

2018-10-02 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22610#discussion_r222173904 --- Diff: python/pyspark/worker.py --- @@ -84,13 +84,36 @@ def wrap_scalar_pandas_udf(f, return_type): arrow_return_type = to_arrow_type(return_type

[GitHub] spark issue #22610: [WIP][SPARK-25461][PySpark][SQL] Print warning when retu...

2018-10-02 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22610 Thanks @BryanCutler! Yes, this should not be a bug but is used as a warning to users that there might be some type conversions in Pandas UDFs that they do not notice at first glance. For now

[GitHub] spark pull request #22610: [WIP][SPARK-25461][PySpark][SQL] Print warning wh...

2018-10-02 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22610#discussion_r222015287 --- Diff: python/pyspark/worker.py --- @@ -84,13 +84,36 @@ def wrap_scalar_pandas_udf(f, return_type): arrow_return_type = to_arrow_type(return_type

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-10-02 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22318 For a query similar to the one in the PR description: `a.join(b, a("id") === b("id"), "inner").join(c, a("id") === b("id"), "inner"

[GitHub] spark pull request #10989: [SPARK-12798] [SQL] generated BroadcastHashJoin

2018-10-02 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/10989#discussion_r221972056 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoin.scala --- @@ -117,6 +120,87 @@ case class BroadcastHashJoin

[GitHub] spark issue #22524: [SPARK-25497][SQL] Limit operation within whole stage co...

2018-10-02 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22524 Thanks @mgaido91. Resetting a flag in blocking operators is, I think, a feasible solution for now to the issue that the limit operator consumes all inputs. For current blocking operators, we should

[GitHub] spark issue #22610: [WIP][SPARK-25461][PySpark][SQL] Print warning when retu...

2018-10-02 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22610 cc @HyukjinKwon Can you take a look at this when you have time? Thanks.

[GitHub] spark pull request #22610: [WIP][SPARK-25461][PySpark][SQL] Print warning wh...

2018-10-02 Thread viirya
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/22610 [WIP][SPARK-25461][PySpark][SQL] Print warning when return type of Pandas.Series mismatches the arrow return type of pandas udf ## What changes were proposed in this pull request

[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...

2018-10-01 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r221662234 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2546,15 +2546,39 @@ object functions { def soundex(e: Column): Column

[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...

2018-10-01 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r221645288 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2546,15 +2546,39 @@ object functions { def soundex(e: Column): Column

[GitHub] spark issue #22524: [SPARK-25497][SQL] Limit operation within whole stage co...

2018-10-01 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22524 ping @cloud-fan @mgaido91 Any more comments or questions on this change?

[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...

2018-09-29 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22514 retest this please.

[GitHub] spark issue #22588: [SPARK-25262][DOC][FOLLOWUP] Fix link tags in html table

2018-09-28 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22588 cc @rvesse @mccheah @dongjoon-hyun

[GitHub] spark pull request #22588: [SPARK-25262][DOC][FOLLOWUP] Fix link tags in htm...

2018-09-28 Thread viirya
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/22588 [SPARK-25262][DOC][FOLLOWUP] Fix link tags in html table ## What changes were proposed in this pull request? Markdown links are not working inside html table. We should use html link tag

[GitHub] spark pull request #22582: [SPARK-25505][SQL][FOLLOWUP] Fix for attributes c...

2018-09-28 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22582#discussion_r221413500 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -556,7 +556,7 @@ class Analyzer( // Group

[GitHub] spark pull request #22582: [SPARK-25505][SQL][FOLLOWUP] Fix for attributes c...

2018-09-28 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22582#discussion_r221413483 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -556,7 +556,7 @@ class Analyzer( // Group

[GitHub] spark issue #22574: [SPARK-25559][SQL] Just remove the unsupported predicate...

2018-09-28 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22574 retest this please.

[GitHub] spark pull request #22574: [SPARK-25559][SQL] Just remove the unsupported pr...

2018-09-28 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22574#discussion_r221159822 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -488,26 +494,27 @@ private[parquet] class

[GitHub] spark issue #22569: [SPARK-25542][Core][Test] Move flaky test in OpenHashMap...

2018-09-28 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22569 retest this please.

[GitHub] spark pull request #22569: [SPARK-25542][Core][Test] Move flaky test in Open...

2018-09-27 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22569#discussion_r221147850 --- Diff: core/src/test/scala/org/apache/spark/util/collection/OpenHashSetSuite.scala --- @@ -255,4 +255,16 @@ class OpenHashSetSuite extends SparkFunSuite

[GitHub] spark issue #22569: [SPARK-25542][Core][Test] Move flaky test in OpenHashMap...

2018-09-27 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22569 retest this please.

[GitHub] spark pull request #22569: [SPARK-25542][Core][Test] Move flaky test in Open...

2018-09-27 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22569#discussion_r221131588 --- Diff: core/src/test/scala/org/apache/spark/util/collection/OpenHashSetSuite.scala --- @@ -255,4 +255,16 @@ class OpenHashSetSuite extends SparkFunSuite

[GitHub] spark pull request #22573: [SPARK-25558][SQL] Pushdown predicates for nested...

2018-09-27 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22573#discussion_r221130276 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -437,53 +436,65 @@ object DataSourceStrategy

[GitHub] spark pull request #22573: [SPARK-25558][SQL] Pushdown predicates for nested...

2018-09-27 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22573#discussion_r221116417 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -437,53 +436,65 @@ object DataSourceStrategy

[GitHub] spark pull request #22569: [SPARK-25542][SQL][Test] Move flaky test in OpenH...

2018-09-27 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22569#discussion_r220955679 --- Diff: core/src/test/scala/org/apache/spark/util/collection/OpenHashSetSuite.scala --- @@ -255,4 +255,16 @@ class OpenHashSetSuite extends SparkFunSuite
