[GitHub] spark pull request #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json...

2018-10-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22775#discussion_r227237973 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -770,8 +776,17 @@ case class SchemaOfJson

[GitHub] spark issue #22797: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

2018-10-23 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22797 Hi @dilipbiswal sorry for the back and forth, can you try one more approach? Basically we want to evaluate what the simplest logical rewrite looks like. We can create unevaluatable EVERY

[GitHub] spark issue #22800: [SPARK-24499][SQL][DOC][follow-up] Fix spelling in doc

2018-10-22 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22800 thanks, merging to master/2.4! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark issue #22800: [SPARK-24499][SQL][DOC][follow-up] Fix spelling in doc

2018-10-22 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22800 LGTM

[GitHub] spark issue #22790: [SPARK-25793][ML]call SaveLoadV2_0.load for classNameV2_...

2018-10-22 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22790 cc @mengxr @WeichenXu123 how serious is it? shall we treat it as a blocker?

[GitHub] spark issue #22799: [SPARK-25805][SQL][TEST] Fix test for SPARK-25159

2018-10-22 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22799 LGTM

[GitHub] spark pull request #22788: [SPARK-25769][SQL]escape nested columns by backti...

2018-10-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22788#discussion_r227203199 --- Diff: sql/core/src/test/resources/sql-tests/results/columnresolution-negative.sql.out --- @@ -81,7 +81,7 @@ SELECT t1.i1 FROM t1, mydb1.t1

[GitHub] spark pull request #22788: [SPARK-25769][SQL]escape nested columns by backti...

2018-10-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22788#discussion_r227202767 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala --- @@ -99,7 +99,7 @@ case class

[GitHub] spark issue #22512: [SPARK-25498][SQL] InterpretedMutableProjection should h...

2018-10-22 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22512 LGTM, we also need a unit test

[GitHub] spark pull request #22512: [SPARK-25498][SQL] InterpretedMutableProjection s...

2018-10-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22512#discussion_r227202171 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala --- @@ -140,6 +141,14 @@ class SQLQueryTestSuite extends QueryTest

[GitHub] spark pull request #22512: [SPARK-25498][SQL] InterpretedMutableProjection s...

2018-10-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22512#discussion_r227200656 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedMutableProjection.scala --- @@ -49,10 +51,54 @@ class

[GitHub] spark pull request #22512: [SPARK-25498][SQL] InterpretedMutableProjection s...

2018-10-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22512#discussion_r227200458 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedMutableProjection.scala --- @@ -49,10 +51,54 @@ class

[GitHub] spark pull request #22745: [SPARK-25772][SQL] Fix java map of structs deseri...

2018-10-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22745#discussion_r227037566 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala --- @@ -278,24 +278,20 @@ object JavaTypeInference

[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...

2018-10-22 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19788 BTW, let's add a config for this feature. We may enable adaptive execution by default in the future, and we should still allow users to run spark with legacy shuffle service. We should also throw

[GitHub] spark issue #22786: [SPARK-25764][ML][EXAMPLES] Update BisectingKMeans examp...

2018-10-22 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22786 also cc @WeichenXu123

[GitHub] spark issue #20820: [SPARK-23676][SQL]Support left join codegen in SortMerge...

2018-10-22 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20820 This is good to have, but we should follow `BroadcastHashJoinExec` and make the implementation more structured. e.g. `codegenInner`, `codegenOuter`, etc

[GitHub] spark pull request #22788: [SPARK-25769][SQL]change nested columns from `a.b...

2018-10-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22788#discussion_r226988117 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2702,7 +2702,7 @@ class SQLQuerySuite extends QueryTest

[GitHub] spark pull request #22788: [SPARK-25769][SQL]change nested columns from `a.b...

2018-10-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22788#discussion_r226987977 --- Diff: sql/core/src/test/resources/sql-tests/results/columnresolution-negative.sql.out --- @@ -81,7 +81,7 @@ SELECT t1.i1 FROM t1, mydb1.t1

[GitHub] spark issue #17520: [WIP][SPARK-19712][SQL] Move PullupCorrelatedPredicates ...

2018-10-22 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17520 is it time to revisit it?

[GitHub] spark pull request #22047: [SPARK-19851] Add support for EVERY and ANY (SOME...

2018-10-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22047#discussion_r226985475 --- Diff: python/pyspark/sql/functions.py --- @@ -403,6 +403,28 @@ def countDistinct(col, *cols): return Column(jc) +def every

[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...

2018-10-22 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21860 LGTM except the naming

[GitHub] spark pull request #21860: [SPARK-24901][SQL]Merge the codegen of RegularHas...

2018-10-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21860#discussion_r226981919 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala --- @@ -744,6 +744,7 @@ case class HashAggregateExec

[GitHub] spark pull request #22745: [SPARK-25772][SQL] Fix java map of structs deseri...

2018-10-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22745#discussion_r226981527 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala --- @@ -278,24 +278,20 @@ object JavaTypeInference

[GitHub] spark pull request #22785: [SPARK-25791][SQL] Datatype of serializers in Row...

2018-10-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22785#discussion_r226978705 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala --- @@ -187,7 +187,7 @@ object RowEncoder

[GitHub] spark pull request #22785: [SPARK-25791][SQL] Datatype of serializers in Row...

2018-10-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22785#discussion_r226978368 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala --- @@ -171,7 +171,7 @@ object RowEncoder

[GitHub] spark pull request #22785: [SPARK-25791][SQL] Datatype of serializers in Row...

2018-10-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22785#discussion_r226978811 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/RowEncoderSuite.scala --- @@ -273,6 +273,16 @@ class RowEncoderSuite

[GitHub] spark pull request #22785: [SPARK-25791][SQL] Datatype of serializers in Row...

2018-10-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22785#discussion_r226977895 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala --- @@ -171,7 +171,7 @@ object RowEncoder

[GitHub] spark pull request #22785: [SPARK-25791][SQL] Datatype of serializers in Row...

2018-10-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22785#discussion_r226977568 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/RowEncoderSuite.scala --- @@ -273,6 +273,16 @@ class RowEncoderSuite

[GitHub] spark issue #21402: SPARK-24355 Spark external shuffle server improvement to...

2018-10-22 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21402 shall we close it since #22173 is merged?

[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...

2018-10-22 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19788 Hi @yucai , good points on the performance concerns. Let's go with the previous approach: https://github.com/apache/spark/pull/19788#issuecomment-366887404 sorry for the back and forth

[GitHub] spark issue #22575: [SPARK-24630][SS] Support SQLStreaming in Spark

2018-10-21 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22575 do we have a full story about stream sql? is the `STREAM` keyword the only difference between stream sql and normal sql? also cc @tdas @zsxwing

[GitHub] spark issue #22575: [SPARK-24630][SS] Support SQLStreaming in Spark

2018-10-21 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22575 do we have a full story about stream sql? is the `STREAM` keyword the only difference between stream sql and normal sql? how could users define watermark with SQL? also cc @tdas @zsxwing

[GitHub] spark issue #22575: [SPARK-24630][SS] Support SQLStreaming in Spark

2018-10-21 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22575 Do we have a full story about streaming SQL? is the `STREAM` keyword the only difference between stream sql and normal sql? also cc @tdas @zsxwing

[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

2018-10-20 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22756 reverted from master. Let's move the discussion to https://github.com/apache/spark/pull/22764

[GitHub] spark issue #22763: [SPARK-25764][ML][EXAMPLES] Update BisectingKMeans examp...

2018-10-20 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22763 ah i see. @mgaido91 can you resubmit it and update the description? The method is not deprecated now.

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

2018-10-20 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22501 retest this please

[GitHub] spark issue #22547: [SPARK-25528][SQL] data source V2 read side API refactor...

2018-10-20 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22547 Let's move the high-level discussion to https://docs.google.com/document/d/1uUmKCpWLdh9vHxP7AWJ9EgbwB_U6T3EJYNjhISGmiQg/edit?usp=sharing

[GitHub] spark pull request #22750: [SPARK-25747][SQL] remove ColumnarBatchScan.needs...

2018-10-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22750#discussion_r226819049 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -168,10 +168,11 @@ case class FileSourceScanExec

[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

2018-10-19 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22756 shall we revert it from master as well? At least we need to update the message `This method is deprecated and will be removed in 3.0.0

[GitHub] spark pull request #22781: [MINOR][DOC] Fix the building document to describ...

2018-10-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22781#discussion_r226816630 --- Diff: docs/building-spark.md --- @@ -12,7 +12,7 @@ redirect_from: "building-with-maven.html" ## Apache Maven The Maven-b

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

2018-10-19 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22501 seems jenkins is broken, cc @shaneknapp ``` Command "/tmp/tmp.JfFHaoRFPU/3.5/bin/python -c "import setuptools, tokenize;__file__='/home/jenkins/workspace/SparkPullRequestBuil

[GitHub] spark issue #22750: [SPARK-25747][SQL] remove ColumnarBatchScan.needsUnsafeR...

2018-10-19 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22750 retest this please

[GitHub] spark pull request #22547: [SPARK-25528][SQL] data source V2 read side API r...

2018-10-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22547#discussion_r226812577 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/Format.java --- @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #22750: [SPARK-25747][SQL] remove ColumnarBatchScan.needs...

2018-10-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22750#discussion_r226812447 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ColumnarBatchScan.scala --- @@ -164,12 +162,11 @@ private[sql] trait ColumnarBatchScan

[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...

2018-10-19 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22501 thank you guys for refreshing the benchmarks and results! It's very helpful. If possible, can we post the perf regressions we found in the umbrella JIRA? Then people can see if the perf

[GitHub] spark issue #22763: [SPARK-25764][ML][EXAMPLES] Update BisectingKMeans examp...

2018-10-19 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22763 This has been reverted from master/2.4

[GitHub] spark issue #22764: [SPARK-25765][ML] Add training cost to BisectingKMeans s...

2018-10-19 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22764 Since this PR is a little more complicated than we expect, we decided to not have it in 2.4.0. I'm not sure if we can treat it as a special case and put it in 2.4.1, cc @mengxr Anyway

[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...

2018-10-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22466#discussion_r226655438 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -207,6 +207,14 @@ class SessionCatalog

[GitHub] spark pull request #22764: [SPARK-25765][ML] Add training cost to BisectingK...

2018-10-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22764#discussion_r226652377 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeansModel.scala --- @@ -225,13 +227,14 @@ object BisectingKMeansModel

[GitHub] spark issue #22766: [SPARK-25768][SQL] fix constant argument expecting UDAFs

2018-10-19 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22766 thanks, merging to master/2.4/2.3!

[GitHub] spark issue #22743: [SPARK-25740][SQL] Refactor DetermineTableStats to inval...

2018-10-19 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22743 > Datasource table will not cache in tableRelationCache. I don't think so. Spark caches data source table in `FindDataSourceTa

[GitHub] spark pull request #22666: [SPARK-25672][SQL] schema_of_csv() - schema infer...

2018-10-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22666#discussion_r226641023 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -3886,6 +3886,31 @@ object functions { withExpr(new CsvToStructs

[GitHub] spark pull request #22666: [SPARK-25672][SQL] schema_of_csv() - schema infer...

2018-10-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22666#discussion_r226640860 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CsvExpressionsSuite.scala --- @@ -155,4 +155,15 @@ class

[GitHub] spark pull request #22666: [SPARK-25672][SQL] schema_of_csv() - schema infer...

2018-10-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22666#discussion_r226640362 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/csvExpressions.scala --- @@ -60,7 +63,7 @@ case class CsvToStructs

[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...

2018-10-19 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22732 thanks, merging to master/2.4!

[GitHub] spark issue #22750: [SPARK-25747][SQL] remove ColumnarBatchScan.needsUnsafeR...

2018-10-19 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22750 `DataSourceScanExec` does not have `needsUnsafeRowConversion`

[GitHub] spark issue #22743: [SPARK-25740][SQL] Refactor DetermineTableStats to inval...

2018-10-19 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22743 why is it only a problem for hive tables?

[GitHub] spark pull request #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF construc...

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22732#discussion_r226520350 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -932,6 +935,23 @@ trait ScalaReflection

[GitHub] spark pull request #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEnc...

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22749#discussion_r226519284 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala --- @@ -43,10 +44,11 @@ import

[GitHub] spark issue #22743: [SPARK-25740][SQL] Refactor DetermineTableStats to inval...

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22743 can you explain more about how this happens?

[GitHub] spark pull request #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF construc...

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22732#discussion_r226517584 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala --- @@ -39,29 +42,29 @@ import

[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22501#discussion_r226516354 --- Diff: sql/core/benchmarks/WideSchemaBenchmark-results.txt --- @@ -1,117 +1,145 @@ -Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X

[GitHub] spark issue #22763: [SPARK-25764][ML][EXAMPLES] Update BisectingKMeans examp...

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22763 thanks, merging to master/2.4!

[GitHub] spark pull request #22764: [SPARK-25765][ML] Add training cost to BisectingK...

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22764#discussion_r226512051 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala --- @@ -310,4 +317,6 @@ class BisectingKMeansSummary private

[GitHub] spark issue #22750: [SPARK-25747][SQL] remove ColumnarBatchScan.needsUnsafeR...

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22750 which description is inaccurate?

[GitHub] spark pull request #22766: [SPARK-25768][SQL] fix constant argument expectin...

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22766#discussion_r226511589 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala --- @@ -340,39 +340,39 @@ private[hive] case class HiveUDAFFunction

[GitHub] spark pull request #22766: [SPARK-25768][SQL] fix constant argument expectin...

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22766#discussion_r226511506 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala --- @@ -340,39 +340,39 @@ private[hive] case class HiveUDAFFunction

[GitHub] spark pull request #22766: [SPARK-25768][SQL] fix constant argument expectin...

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22766#discussion_r226511479 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala --- @@ -340,39 +340,39 @@ private[hive] case class HiveUDAFFunction

[GitHub] spark pull request #22764: [SPARK-25765][ML] Add training cost to BisectingK...

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22764#discussion_r226384584 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala --- @@ -310,4 +317,6 @@ class BisectingKMeansSummary private

[GitHub] spark pull request #22766: [SPARK-25768][SQL] fix constant argument expectin...

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22766#discussion_r226375347 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala --- @@ -339,40 +339,38 @@ private[hive] case class HiveUDAFFunction

[GitHub] spark issue #22766: [SPARK-25768][SQL] fix constant argument expecting UDAFs

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22766 OK to test

[GitHub] spark pull request #22764: [SPARK-25765][ML] Add training cost to BisectingK...

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22764#discussion_r226372701 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala --- @@ -310,4 +317,6 @@ class BisectingKMeansSummary private

[GitHub] spark issue #22547: [SPARK-25528][SQL] data source V2 read side API refactor...

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22547 A major part of this PR is to update existing streaming sources, which is just moving code around. There are 3 things we need to pay attention to during review: 1. the naming

[GitHub] spark pull request #22547: [SPARK-25528][SQL] data source V2 read side API r...

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22547#discussion_r226363445 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala --- @@ -381,7 +390,7 @@ class StreamSuite extends StreamTest

[GitHub] spark pull request #22547: [SPARK-25528][SQL] data source V2 read side API r...

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22547#discussion_r226363020 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala --- @@ -154,21 +159,25 @@ class StreamSuite extends StreamTest

[GitHub] spark pull request #22547: [SPARK-25528][SQL] data source V2 read side API r...

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22547#discussion_r226361309 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/sources/RateStreamProviderSuite.scala --- @@ -319,29 +307,18 @@ class

[GitHub] spark pull request #22547: [SPARK-25528][SQL] data source V2 read side API r...

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22547#discussion_r226359031 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/RateStreamMicroBatchInputStream.scala --- @@ -60,6 +59,14 @@ class

[GitHub] spark pull request #22547: [SPARK-25528][SQL] data source V2 read side API r...

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22547#discussion_r226355931 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala --- @@ -90,6 +140,8 @@ class

[GitHub] spark pull request #22547: [SPARK-25528][SQL] data source V2 read side API r...

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22547#discussion_r226338580 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SupportsBatchRead.java --- @@ -0,0 +1,40 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #22764: [SPARK-25765][ML] Add training cost to BisectingKMeans s...

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22764 does the example need to be updated with this new API?

[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22749 I like this idea! waiting for tests to pass

[GitHub] spark pull request #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEnc...

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22749#discussion_r226301402 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala --- @@ -212,21 +183,88 @@ object ExpressionEncoder

[GitHub] spark pull request #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEnc...

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22749#discussion_r226301139 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala --- @@ -212,21 +183,88 @@ object ExpressionEncoder

[GitHub] spark pull request #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEnc...

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22749#discussion_r226299441 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala --- @@ -212,21 +183,88 @@ object ExpressionEncoder

[GitHub] spark pull request #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEnc...

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22749#discussion_r226298803 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala --- @@ -212,21 +183,88 @@ object ExpressionEncoder

[GitHub] spark pull request #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEnc...

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22749#discussion_r226296369 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala --- @@ -103,75 +88,61 @@ object ExpressionEncoder

[GitHub] spark pull request #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEnc...

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22749#discussion_r226295859 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala --- @@ -103,75 +88,61 @@ object ExpressionEncoder

[GitHub] spark pull request #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEnc...

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22749#discussion_r226294255 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala --- @@ -103,75 +88,61 @@ object ExpressionEncoder

[GitHub] spark pull request #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEnc...

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22749#discussion_r226294017 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala --- @@ -43,10 +44,11 @@ import

[GitHub] spark issue #22721: [SPARK-25403][SQL] Refreshes the table after inserting t...

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22721 I think it's reasonable to follow `InsertIntoHiveTable`, but it's better to provide more details about what changes in `InsertIntoHadoopFsRelationCommand`: 1. what's refreshed? Previously we

[GitHub] spark pull request #22721: [SPARK-25403][SQL] Refreshes the table after inse...

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22721#discussion_r226280121 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala --- @@ -189,6 +189,7 @@ case class

[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22756 LGTM

[GitHub] spark issue #22547: [SPARK-25528][SQL] data source V2 read side API refactor...

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22547 Hi @rdblue welcome back! I just rebased it so it's ready for review :)

[GitHub] spark pull request #22721: [SPARK-25403][SQL] Refreshes the table after inse...

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22721#discussion_r226208576 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala --- @@ -189,6 +189,7 @@ case class

[GitHub] spark pull request #22758: [SPARK-25332][SQL] Instead of broadcast hash join...

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22758#discussion_r226198591 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala --- @@ -193,6 +193,16 @@ private[hive] class HiveMetastoreCatalog

[GitHub] spark pull request #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF construc...

2018-10-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22732#discussion_r226156536 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala --- @@ -393,4 +393,30 @@ class UDFSuite extends QueryTest with SharedSQLContext

[GitHub] spark pull request #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF construc...

2018-10-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22732#discussion_r226156400 --- Diff: docs/sql-programming-guide.md --- @@ -1978,6 +1978,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see
