[GitHub] spark pull request #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` shou...

2016-08-29 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14698#discussion_r76579917 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala --- @@ -136,7 +136,7 @@ trait

[GitHub] spark pull request #14298: [SPARK-16283][SQL] Implement `percentile_approx` ...

2016-08-28 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14298#discussion_r76548403 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/PercentileApprox.scala --- @@ -0,0 +1,462

[GitHub] spark issue #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` should make...

2016-08-25 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14698 Thanks @hvanhovell for the review! This patch has been updated. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project

[GitHub] spark pull request #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` shou...

2016-08-24 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14698#discussion_r76177037 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala --- @@ -136,7 +136,7 @@ trait

[GitHub] spark pull request #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` shou...

2016-08-24 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14698#discussion_r76176582 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -474,6 +474,20 @@ case class MapObjects private

[GitHub] spark pull request #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` shou...

2016-08-24 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14698#discussion_r76176311 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -474,6 +474,20 @@ case class MapObjects private

[GitHub] spark pull request #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` shou...

2016-08-24 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14698#discussion_r76176321 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -474,6 +474,20 @@ case class MapObjects private

[GitHub] spark pull request #14298: [SPARK-16283][SQL] Implement `percentile_approx` ...

2016-08-23 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14298#discussion_r75830247 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/PercentileApprox.scala --- @@ -0,0 +1,462

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-19 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14118 @rxin yes, all empty values (e.g. zero-sized strings) become null values once they are read back. E.g. given `test.csv`: ``` 1,,3, ``` `spark.read.csv("test.csv"

[GitHub] spark pull request #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make...

2016-08-18 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14118#discussion_r75426062 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala --- @@ -370,7 +370,8 @@ class DataFrameReader private[sql](sparkSession

[GitHub] spark pull request #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` shou...

2016-08-18 Thread lw-lin
GitHub user lw-lin opened a pull request: https://github.com/apache/spark/pull/14698 [SPARK-17061][SPARK-17093][SQL] `MapObjects` should make copies of unsafe-backed data ## What changes were proposed in this pull request? Currently `MapObjects` does not make copies

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-12 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14118 Jenkins retest this please

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-12 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14118 This is ready for review. To summarize, this patch casts user-specified `nullValue`s to `null`s for all supported types including the string type: - this fixes the problem where null

[GitHub] spark pull request #14298: [SPARK-16283][SQL] Implement `percentile_approx` ...

2016-08-08 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14298#discussion_r73995159 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/PercentileApprox.scala --- @@ -0,0 +1,462

[GitHub] spark pull request #14298: [SPARK-16283][SQL] Implement `percentile_approx` ...

2016-08-08 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14298#discussion_r73993189 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/PercentileApprox.scala --- @@ -0,0 +1,462

[GitHub] spark pull request #14298: [SPARK-16283][SQL] Implement `percentile_approx` ...

2016-08-08 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14298#discussion_r73990892 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/PercentileApprox.scala --- @@ -0,0 +1,462

[GitHub] spark pull request #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make...

2016-08-05 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14118#discussion_r73666106 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVTypeCastSuite.scala --- @@ -68,16 +68,48 @@ class CSVTypeCastSuite

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-05 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14118 @falaki could you take a look at the latest update: [[bf01cea] StringType should also respect `nullValue`](https://github.com/apache/spark/pull/14118/commits/bf01cea8273f00386ceef6459f8b8fe2c169e12a

[GitHub] spark issue #14298: [SPARK-16283][SQL] Implement `percentile_approx` SQL fun...

2016-08-02 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14298 @hvanhovell comments addressed. Please let me know when there's more to do.

[GitHub] spark pull request #14298: [SPARK-16283][SQL] Implement `percentile_approx` ...

2016-08-02 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14298#discussion_r73101674 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/PercentileApprox.scala --- @@ -0,0 +1,456

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-01 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14118 Here are some findings as I dug a little: 1. Since https://github.com/databricks/spark-csv/pull/102 (Jul 2015), we would cast `""` as `null` for all types other than strings. F

[GitHub] spark pull request #14370: [SPARK-16713][SQL] Check codegen method size ≤ ...

2016-07-27 Thread lw-lin
Github user lw-lin closed the pull request at: https://github.com/apache/spark/pull/14370

[GitHub] spark issue #14370: [SPARK-16713][SQL] Check codegen method size ≤ 8K on c...

2016-07-27 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14370 @rxin @cloud-fan sorry for the ambiguity, but we're not enforcing this. There are 3 levels: - when `NO_OP` is on (which is the default), we do nothing at all to a huge method; - when `WARN_

[GitHub] spark issue #14370: [SPARK-16713][SQL] Check codegen method size ≤ 8K on c...

2016-07-26 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14370 @davies would you also take a look? Thanks!

[GitHub] spark pull request #14370: [SPARK-16713][SQL] Check codegen method size ≤ ...

2016-07-26 Thread lw-lin
GitHub user lw-lin opened a pull request: https://github.com/apache/spark/pull/14370 [SPARK-16713][SQL] Check codegen method size ≤ 8K on compile ## What changes were proposed in this pull request? Ideally, we would wish codegen methods to be less than 8KB for bytecode

[GitHub] spark issue #14298: [SPARK-16283][SQL] Implement `percentile_approx` SQL fun...

2016-07-26 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14298 Jenkins retest this please

[GitHub] spark issue #14324: [SPARK-16664][SQL] Fix persist call on Data frames with ...

2016-07-23 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14324 @breakdawn what else can we do to actually fix the ≥ 8118 cols issue? We're actually running out of the constant pool when we compile the generated code. So maybe compile it into multiple classes

[GitHub] spark issue #14280: [SPARK-16515][SQL][FOLLOW-UP] Fix test `script` on OS X/...

2016-07-23 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14280 Maybe this is ready to go?

[GitHub] spark pull request #14280: [SPARK-16515][SQL][FOLLOW-UP] Fix test `script` o...

2016-07-23 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14280#discussion_r71971600 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala --- @@ -64,14 +67,17 @@ class SQLQuerySuite extends QueryTest

[GitHub] spark issue #14324: [SPARK-16664][SQL] Fix persist call on Data frames with ...

2016-07-23 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14324 @breakdawn yes that's a different issue and I'm looking into it. Regarding what this PR tries to fix, could you run this PR's change against [this test case](https://github.com/apache/spark

[GitHub] spark issue #14324: [SPARK-16664][SQL] Fix persist call on Data frames with ...

2016-07-22 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14324 @breakdawn it'd be great to do more tests when you open a request. As I'm investigating this too, I found that my same fix works for 201 cols but fails for 8118 cols. The exact limit is 8117

[GitHub] spark issue #14298: [SPARK-16283][SQL] Implement `percentile_approx` SQL fun...

2016-07-21 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14298 @cloud-fan could you also help review this? Thanks!

[GitHub] spark pull request #14298: [SPARK-16283][SQL] Implement `percentile_approx` ...

2016-07-21 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14298#discussion_r71815241 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/PercentileApprox.scala --- @@ -0,0 +1,456

[GitHub] spark issue #14298: [SPARK-16283][SQL] Implement `percentile_approx` SQL fun...

2016-07-21 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14298 @hvanhovell could you take a look at this? Thanks!

[GitHub] spark issue #14298: [SPARK-16283][SQL] Implement `percentile_approx` SQL fun...

2016-07-21 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14298 Jenkins retest this please

[GitHub] spark pull request #14298: [SPARK-16283][SQL] Implement `percentile_approx` ...

2016-07-21 Thread lw-lin
GitHub user lw-lin opened a pull request: https://github.com/apache/spark/pull/14298 [SPARK-16283][SQL] Implement `percentile_approx` SQL function ## What changes were proposed in this pull request? This patch implements `percentile_approx` SQL function using Spark's

[GitHub] spark pull request #14237: [WIP][SPARK-16283][SQL] Implement `percentile_app...

2016-07-21 Thread lw-lin
Github user lw-lin closed the pull request at: https://github.com/apache/spark/pull/14237

[GitHub] spark pull request #14237: [WIP][SPARK-16283][SQL] Implement `percentile_app...

2016-07-21 Thread lw-lin
GitHub user lw-lin reopened a pull request: https://github.com/apache/spark/pull/14237 [WIP][SPARK-16283][SQL] Implement `percentile_approx` SQL function I'll reopen once it's ready for review, thanks! You can merge this pull request into a Git repository by running: $ git

[GitHub] spark issue #14280: [SPARK-16515][SQL][FOLLOW-UP] Fix test `script` on OS X/...

2016-07-20 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14280 `sed $'...'` works on both Linux & OS X! So let's switch back to `sed`. @srowen thanks a lot for the `$`!
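
As a small illustration of the `$'...'` form mentioned in this comment, here is a minimal sketch. It assumes a shell with ANSI-C quoting (such as bash or zsh); the sample input and the variable name `tab_joined` are made up for illustration and are not taken from the PR. The idea is that the shell expands `\t` into a literal tab before `sed` runs, so `sed` never has to understand the escape itself.

```shell
# Assumption: the shell supports ANSI-C quoting ($'...'), e.g. bash or zsh.
# Inside $'...', the shell converts \t into a real tab character,
# so sed receives a literal tab in the replacement text.
tab_joined=$(printf 'a b\n' | sed $'s/ /\t/')

# Print the result; the first space has been replaced by a tab.
printf '%s\n' "$tab_joined"
```

Because the escape is resolved by the shell rather than by `sed`, the same command line should behave the same way with either GNU or BSD `sed`.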

[GitHub] spark pull request #14280: [SPARK-16515][SQL][FOLLOW-UP] Fix test `script` o...

2016-07-20 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14280#discussion_r71480386 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala --- @@ -64,14 +67,19 @@ class SQLQuerySuite extends QueryTest

[GitHub] spark pull request #14280: [SPARK-16515][SQL][FOLLOW-UP] Fix test `script` o...

2016-07-20 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14280#discussion_r71480336 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala --- @@ -64,14 +67,19 @@ class SQLQuerySuite extends QueryTest

[GitHub] spark pull request #14280: [SPARK-16515][SQL][FOLLOW-UP] Fix test `script` o...

2016-07-20 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14280#discussion_r71477795 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala --- @@ -64,14 +67,19 @@ class SQLQuerySuite extends QueryTest

[GitHub] spark pull request #14280: [SPARK-16515][SQL][FOLLOW-UP] Fix test `script` o...

2016-07-20 Thread lw-lin
GitHub user lw-lin opened a pull request: https://github.com/apache/spark/pull/14280 [SPARK-16515][SQL][FOLLOW-UP] Fix test `script` on OS X/Windows... ## Problem OS X's `sed` doesn't understand `\t` at all, so this `script` test would fail: ``` == Results

[GitHub] spark issue #14256: [SPARK-16620][CORE] Add back the tokenization process in...

2016-07-19 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14256 @srowen @dongjoon-hyun, thanks for the clarification. Yep I had verified manually that the java version does not need any change before I opened this PR.

[GitHub] spark pull request #14256: [SPARK-16620][CORE] Add back tokenization process...

2016-07-19 Thread lw-lin
GitHub user lw-lin opened a pull request: https://github.com/apache/spark/pull/14256 [SPARK-16620][CORE] Add back tokenization process in RDD.pipe(command: String) ## What changes were proposed in this pull request? Currently `RDD.pipe(command: String)`: - works only

[GitHub] spark pull request #14237: [WIP][SPARK-16283][SQL] Implement `percentile_app...

2016-07-17 Thread lw-lin
Github user lw-lin closed the pull request at: https://github.com/apache/spark/pull/14237

[GitHub] spark pull request #14237: [WIP][SPARK-16283][SQL] Implement `percentile_app...

2016-07-17 Thread lw-lin
GitHub user lw-lin opened a pull request: https://github.com/apache/spark/pull/14237 [WIP][SPARK-16283][SQL] Implement `percentile_approx` SQL function ## What changes were proposed in this pull request? WIP ## How was this patch tested? WIP

[GitHub] spark pull request #14214: [SPARK-16545][SQL] Eliminate unnecessary rounds o...

2016-07-15 Thread lw-lin
Github user lw-lin closed the pull request at: https://github.com/apache/spark/pull/14214

[GitHub] spark issue #14214: [SPARK-16545][SQL] Eliminate unnecessary rounds of physi...

2016-07-15 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14214 Sure, let's rewrite the incremental planner to solve problems more holistically; actually I'm not satisfied with this patch either. So I'm closing this, and -- thank you for the ideas

[GitHub] spark issue #14214: [SPARK-16545][SQL] Eliminate unnecessary rounds of physi...

2016-07-15 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14214 @marmbrus @zsxwing could you take a look and share some ideas? Thanks!

[GitHub] spark issue #14214: [SPARK-16545][SQL] Eliminate one unnecessary round of ph...

2016-07-15 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14214 @mariobriggs Thanks for the information! > 1 can be eliminated because 'executedPlan' is a 'lazy val' on QueryExecution? Yea indeed. Its being there can provide us debug i

[GitHub] spark pull request #14214: [SPARK-16545][SQL] Eliminate one unnecessary roun...

2016-07-14 Thread lw-lin
GitHub user lw-lin opened a pull request: https://github.com/apache/spark/pull/14214 [SPARK-16545][SQL] Eliminate one unnecessary round of physical planning in ForeachSink ## Problem As reported by [SPARK-16545](https://issues.apache.org/jira/browse/SPARK-16545

[GitHub] spark issue #14165: [SPARK-16503] SparkSession should provide Spark version

2016-07-13 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14165 Jenkins retest this please

[GitHub] spark pull request #14165: [SPARK-16503] SparkSession should provide Spark v...

2016-07-13 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14165#discussion_r70570816 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala --- @@ -79,6 +79,9 @@ class SparkSession private

[GitHub] spark pull request #14165: [SPARK-16503] SparkSession should provide Spark v...

2016-07-12 Thread lw-lin
GitHub user lw-lin opened a pull request: https://github.com/apache/spark/pull/14165 [SPARK-16503] SparkSession should provide Spark version ## What changes were proposed in this pull request? This patch adds the following to SparkSession: ```scala

[GitHub] spark pull request #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHE...

2016-07-11 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14116#discussion_r70243938 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/systemcatalog/InformationSchema.scala --- @@ -0,0 +1,336 @@ +/* + * Licensed

[GitHub] spark pull request #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHE...

2016-07-11 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14116#discussion_r70218352 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/systemcatalog/InformationSchema.scala --- @@ -0,0 +1,336 @@ +/* + * Licensed

[GitHub] spark pull request #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHE...

2016-07-11 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14116#discussion_r70212106 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/systemcatalog/InformationSchema.scala --- @@ -0,0 +1,336 @@ +/* + * Licensed

[GitHub] spark pull request #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHE...

2016-07-11 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14116#discussion_r70210589 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -401,7 +401,9 @@ class SessionCatalog

[GitHub] spark pull request #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHE...

2016-07-11 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14116#discussion_r70209020 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/systemcatalog/InformationSchema.scala --- @@ -0,0 +1,337 @@ +/* + * Licensed

[GitHub] spark pull request #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHE...

2016-07-11 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14116#discussion_r70208643 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/systemcatalog/InformationSchema.scala --- @@ -0,0 +1,337 @@ +/* + * Licensed

[GitHub] spark pull request #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHE...

2016-07-11 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14116#discussion_r70208504 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -425,7 +427,9 @@ class SessionCatalog( def

[GitHub] spark pull request #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHE...

2016-07-11 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14116#discussion_r70208490 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -401,7 +401,9 @@ class SessionCatalog

[GitHub] spark pull request #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHE...

2016-07-11 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14116#discussion_r70208420 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/systemcatalog/InformationSchema.scala --- @@ -0,0 +1,337 @@ +/* + * Licensed

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-07-10 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14118 I think @HyukjinKwon has made a good point: it's kind of strange that null strings can be written out but cannot be read back as nulls. So for `StringType`: nulls

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SQL] Make CSV cast null value...

2016-07-10 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14118 @HyukjinKwon hi. The explanation above intends to help reviewers better understand how we introduced the regression. Regarding whether `StringType` should be ignored or not, I don't have strong

[GitHub] spark pull request #14118: [SPARK-16462][SPARK-16460][SQL] Make CSV cast nul...

2016-07-10 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14118#discussion_r70182426 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala --- @@ -238,59 +238,55 @@ private[csv] object

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SQL] Make CSV cast null value...

2016-07-09 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14118 FYI, before [SPARK-14143](https://issues.apache.org/jira/browse/SPARK-14143), null values had been handled this way: ```scala if (datum == options.nullValue &&

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SQL] Make CSV cast null value...

2016-07-09 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14118 The diff that github shows is a mess. The actual diff (which is quite small) is: ![diff](https://cloud.githubusercontent.com/assets/15843379/16711624/db6faf94-4697-11e6-8c56-53f10711aea5

[GitHub] spark pull request #14118: [SPARK-16462][SPARK-16460][SQL] Make CSV cast nul...

2016-07-09 Thread lw-lin
GitHub user lw-lin opened a pull request: https://github.com/apache/spark/pull/14118 [SPARK-16462][SPARK-16460][SQL] Make CSV cast null values properly ## What changes were proposed in this pull request? When casting given string datum to specified type, CSV should return

[GitHub] spark issue #14030: [SPARK-16350][SQL] Fix support for incremental planning ...

2016-07-07 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14030 Updated. @zsxwing could you take another look?

[GitHub] spark pull request #14030: [SPARK-16350][SQL] Fix support for incremental pl...

2016-07-07 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14030#discussion_r69856084 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ForeachSink.scala --- @@ -30,7 +32,42 @@ import org.apache.spark.sql.{DataFrame

[GitHub] spark pull request #14030: [SPARK-16350][SQL] Fix support for incremental pl...

2016-07-07 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14030#discussion_r69855749 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/ForeachSinkSuite.scala --- @@ -35,35 +35,109 @@ class ForeachSinkSuite extends

[GitHub] spark pull request #14030: [SPARK-16350][SQL] Fix support for incremental pl...

2016-07-07 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14030#discussion_r69855731 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/ForeachSinkSuite.scala --- @@ -35,35 +35,109 @@ class ForeachSinkSuite extends

[GitHub] spark pull request #14030: [SPARK-16350][SQL] Fix support for incremental pl...

2016-07-07 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14030#discussion_r69855673 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -155,7 +155,7 @@ private[sql] object Dataset { class Dataset[T] private[sql

[GitHub] spark pull request #14030: [SPARK-16350][SQL] Fix support for incremental pl...

2016-07-07 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14030#discussion_r69855694 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ForeachSink.scala --- @@ -30,7 +32,42 @@ import org.apache.spark.sql.{DataFrame

[GitHub] spark issue #14030: [SPARK-16350][SQL] Fix support for incremental planning ...

2016-07-05 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14030 @zsxwing could you take a look at this when you have time? Thanks!

[GitHub] spark issue #13804: [Minor][Core] Fix display wrong free memory size in the ...

2016-07-03 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/13804 hi @jerryshao, let's also back-port this into 1.6.x ([MemoryStore.scala#L395](https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/storage/MemoryStore.scala#L395

[GitHub] spark issue #14030: [SPARK-16350][SQL] Fix support for incremental planning ...

2016-07-02 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14030 @zsxwing could you take a look at this? Thanks!

[GitHub] spark issue #14030: [SPARK-16350][SQL] Fix support for incremental planning ...

2016-07-02 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14030 Jenkins retest this please

[GitHub] spark issue #14030: [WIP][SPARK-16350][SQL] Fix `foreach` for streaming Data...

2016-07-02 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14030 Jenkins retest this please

[GitHub] spark pull request #14030: [WIP][SPARK-16350][SQL] Fix `foreach` for streami...

2016-07-02 Thread lw-lin
GitHub user lw-lin opened a pull request: https://github.com/apache/spark/pull/14030 [WIP][SPARK-16350][SQL] Fix `foreach` for streaming Dataset ## What changes were proposed in this pull request? - [x] add tests - [ ] fix `foreach` ## How was this patch tested

[GitHub] spark issue #13978: [SPARK-16256][DOCS] Minor fixes on the Structured Stream...

2016-06-30 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/13978 The programming guide is totally awesome, thanks @tdas! Seems like there is one minor issue: we should also count this `12:11 dog` into window `12:05-12:15`, right? ![ssx](https

[GitHub] spark issue #13685: [SPARK-15963][CORE] Catch `TaskKilledException` correctl...

2016-06-24 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/13685 Thanks, @squito @markhamstra!

[GitHub] spark issue #13685: [SPARK-15963][CORE] Catch `TaskKilledException` correctl...

2016-06-23 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/13685 Addressed all comments. @squito would you take another look? Thanks!

[GitHub] spark pull request #13685: [SPARK-15963][CORE] Catch `TaskKilledException` c...

2016-06-22 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/13685#discussion_r68165895 --- Diff: core/src/test/scala/org/apache/spark/executor/ExecutorSuite.scala --- @@ -0,0 +1,123 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark issue #13685: [SPARK-15963][CORE] Catch `TaskKilledException` correctl...

2016-06-22 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/13685 Hi @squito, thanks for the comments! > how you'd have a TaskKilledException, but without setting the task to `killed` This can be reproduced when a task gets killed ([Executor#L

[GitHub] spark issue #13652: [SPARK-15613] [SQL] Fix incorrect days to millis convers...

2016-06-19 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/13652 hi @davies, two tests still fail after this patch when I build locally: ``` - from UTC timestamp *** FAILED *** "2016-03-13 0[2]:00:00.0" did not equal "2016-03-

[GitHub] spark pull request #13518: [WIP][SPARK-15472][SQL] Add support for writing i...

2016-06-16 Thread lw-lin
Github user lw-lin closed the pull request at: https://github.com/apache/spark/pull/13518

[GitHub] spark issue #13705: [SPARK-15472][SQL] Add support for writing in `csv` form...

2016-06-16 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/13705 I personally feel that it would be great if we can also support writing in `csv`, `json`, `txt` formats in Structured Streaming for the 2.0 release (I'd like to submit patches for `json`, `txt` very

[GitHub] spark pull request #13705: [SPARK-15472][SQL] Add support for writing in `cs...

2016-06-16 Thread lw-lin
GitHub user lw-lin opened a pull request: https://github.com/apache/spark/pull/13705 [SPARK-15472][SQL] Add support for writing in `csv` format in Structured Streaming ## What changes were proposed in this pull request? This patch adds support for writing in `csv` format

[GitHub] spark issue #13518: [WIP][SPARK-15472][SQL] Add support for writing in `csv`...

2016-06-16 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/13518 Jenkins retest this please

[GitHub] spark issue #13518: [WIP][SPARK-15472][SQL] Add support for writing in `csv`...

2016-06-16 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/13518 Jenkins retest this please

[GitHub] spark pull request #13518: [WIP][SPARK-15472][SQL] Add support for writing i...

2016-06-16 Thread lw-lin
GitHub user lw-lin reopened a pull request: https://github.com/apache/spark/pull/13518 [WIP][SPARK-15472][SQL] Add support for writing in `csv`, `json`, `text` formats in Structured Streaming ## What changes were proposed in this pull request? This patch adds support

[GitHub] spark pull request #13575: [SPARK-15472][SQL] Add support for writing in `cs...

2016-06-16 Thread lw-lin
Github user lw-lin closed the pull request at: https://github.com/apache/spark/pull/13575

[GitHub] spark issue #13685: [SPARK-15963][CORE] Catch `TaskKilledException` correctl...

2016-06-15 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/13685 I couldn't come up with good syntax to express something like ```scala case e @ (_: TaskKilledException) | (_: InterruptedException if task.killed) => ... ``` So this pa
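The pattern in the snippet above does not compile because a Scala guard (`if ...`) attaches to a whole `case` clause, not to a single alternative inside `|`. A minimal sketch of one way to express the same intent with two clauses follows; it is not necessarily what the patch does. `TaskKilledException` here is a local stand-in for Spark's class, and `KilledMatchSketch`, `classify`, and `killed` are hypothetical names.

```scala
// Local stand-in for org.apache.spark.TaskKilledException so the sketch is self-contained.
class TaskKilledException extends RuntimeException

object KilledMatchSketch {
  // `killed` plays the role of the `task.killed` flag from the snippet.
  def classify(t: Throwable, killed: Boolean): String = t match {
    // Always treated as a kill, regardless of the flag.
    case _: TaskKilledException => "killed"
    // Treated as a kill only when the task was actually marked as killed.
    case _: InterruptedException if killed => "killed"
    // Anything else is an ordinary failure.
    case _ => "failed"
  }
}
```

Writing `case _: TaskKilledException | _: InterruptedException if killed` would compile, but the guard would then apply to both alternatives, which is a different condition from the one in the snippet.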

[GitHub] spark pull request #13685: [SPARK-15963][CORE] Catch `TaskKilledException` c...

2016-06-15 Thread lw-lin
GitHub user lw-lin opened a pull request: https://github.com/apache/spark/pull/13685 [SPARK-15963][CORE] Catch `TaskKilledException` correctly in Executor.TaskRunner ## What changes were proposed in this pull request? Currently in [Executor.TaskRunner](https://github.com

[GitHub] spark issue #13683: [SPARK-15518][Core][Follow-up] Rename LocalSchedulerBack...

2016-06-15 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/13683 @rxin would you mind taking a look? Thanks!

[GitHub] spark pull request #13683: [SPARK-15518][Core][Follow-up] Rename LocalSchedu...

2016-06-15 Thread lw-lin
GitHub user lw-lin opened a pull request: https://github.com/apache/spark/pull/13683 [SPARK-15518][Core][Follow-up] Rename LocalSchedulerBackendEndpoint -> LocalSchedulerBackend ## What changes were proposed in this pull request? This patch is a follow-up to ht

[GitHub] spark issue #11996: [SPARK-10530] [CORE] Kill other task attempts when one t...

2016-06-14 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/11996 @devaraj-kavali @kayousterhout this is good to have, but I just wonder if this would cause resources to leak? E.g., when the task is in the middle of releasing resources in a `finally` block -- like
