[GitHub] spark pull request: [SPARK-13509][SPARK-13507][SQL] Support for wr...

2016-02-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11389#issuecomment-190033592 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request: [SPARK-13509][SPARK-13507][SQL] Support for wr...

2016-02-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11389#issuecomment-190034727 @yhuai Could I ask whether you have any clue about this? I think this is related to whole-code generation. This is happening in some builds such as https

[GitHub] spark pull request: [SPARK-13509][SPARK-13507][SQL] Support for wr...

2016-02-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11389#issuecomment-190041162 As this sometimes passes (e.g. https://github.com/apache/spark/pull/11016), I will restart.

[GitHub] spark pull request: [SPARK-13509][SPARK-13507][SQL] Support for wr...

2016-02-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11389#issuecomment-190041264 retest this please

[GitHub] spark pull request: [SPARK-13509][SPARK-13507][SQL] Support for wr...

2016-02-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11389#issuecomment-190072703 test this please

[GitHub] spark pull request: [SPARK-13509][SPARK-13507][SQL] Support for wr...

2016-02-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11389#issuecomment-190073355 retest this please

[GitHub] spark pull request: [SPARK-13108][SQL] Validate ascii compatible e...

2016-02-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11016#issuecomment-188550859 @falaki Hm... Do the JSON and TEXT data sources support the `encoding` option?

[GitHub] spark pull request: [SPARK-13108][SQL] Validate ascii compatible e...

2016-02-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11016#issuecomment-188539066 @falaki Thanks. Then, I will try to generalize this and then change the title as well with some more commits.
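The SPARK-13108 title is cut off above. Judging from the question about the `encoding` option, it concerns which character encodings the CSV source can read. The sketch below shows one possible ASCII-compatibility test. It assumes that "ASCII-compatible" means every 7-bit character encodes to its single US-ASCII byte, which is presumably what splitting lines on raw newline bytes relies on. `AsciiCompat` and `isAsciiCompatible` are hypothetical names and do not come from the PR.

```scala
import java.nio.charset.{Charset, StandardCharsets}

// Hypothetical helper, not the PR's code: a charset counts as ASCII-compatible
// here if every 7-bit ASCII character encodes to exactly its US-ASCII byte.
object AsciiCompat {
  def isAsciiCompatible(cs: Charset): Boolean =
    (0 until 128).forall { code =>
      val bytes = String.valueOf(code.toChar).getBytes(cs)
      bytes.length == 1 && bytes(0) == code.toByte
    }
}
```

Under this rule UTF-8 and ISO-8859-1 pass. UTF-16 fails because Java's UTF-16 encoder emits a byte-order mark and two bytes per character.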

[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

2016-02-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11270#issuecomment-188707171 Sorry, `listFiles()` calls `listStatus()` internally. It looks like there is no way to fetch a single deeply nested file without listing files.

[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

2016-02-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11270#issuecomment-188726983 retest this please

[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

2016-02-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11270#issuecomment-188697104 @rxin I found an API, `listFiles()`, which returns an iterator. So this will not list everything in any case; it just tries to find a single file. Also, I added

[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

2016-02-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11270#issuecomment-188699728 @rxin I found an API, `listFiles()`, which returns an iterator. So this will never list all files; it just tries to find a single file. Also, I

[GitHub] spark pull request: [SPARK-13509][SPARK-13507][SQL] Support for wr...

2016-02-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11389#issuecomment-190163936 @rxin BTW would you merge this if it looks good?

[GitHub] spark pull request: [SPARK-13509][SPARK-13507][SQL] Support for wr...

2016-02-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11389#issuecomment-190125209 I see that it's a problem in the new vectorized reader. I missed the exception message. Looking deeper.

[GitHub] spark pull request: [SPARK-13509][SPARK-13507][SQL] Support for wr...

2016-02-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11389#issuecomment-189183710 @rxin I opened this PR because it looks like writing with `csv()` should be added anyway. If I got the documentation stuff wrong, I will move that back.

[GitHub] spark pull request: [SPARK-11691][SQL] Support setting hadoop comp...

2016-02-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11324#issuecomment-189257194 @maropu @zjffdu @rxin I apologize that I carelessly opened the same issue and submitted a PR. This is fixed in https://github.com/apache/spark/pull/11384

[GitHub] spark pull request: [SPARK-13509][SPARK-13507][SQL] Support for wr...

2016-02-26 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/11389 [SPARK-13509][SPARK-13507][SQL] Support for writing CSV with a single function call https://issues.apache.org/jira/browse/SPARK-13507 https://issues.apache.org/jira/browse/SPARK-13509

[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

2016-02-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11270#issuecomment-187459776 retest this please

[GitHub] spark pull request: [SPARK-13442][SQL] Make type inference recogni...

2016-02-22 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/11315 [SPARK-13442][SQL] Make type inference recognize boolean types This PR adds support for inferring `BooleanType` in the schema. It supports inferring case-insensitive `true` / `false
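The sketch below shows case-insensitive boolean inference for a single column. It assumes a column is inferred as boolean only when every non-empty value is `true` or `false` in any letter case. `BooleanInference`, `isBoolean` and `inferColumn` are hypothetical names. A real inference routine would also try numeric and timestamp types and would return Spark types rather than strings.

```scala
// Hypothetical sketch, not the PR's code.
object BooleanInference {
  // True for "true"/"false" in any letter case, e.g. "TRUE" or "False".
  def isBoolean(field: String): Boolean =
    field.equalsIgnoreCase("true") || field.equalsIgnoreCase("false")

  // A column is "boolean" only if it has at least one non-empty value and
  // every non-empty value is a boolean; otherwise this sketch says "string".
  def inferColumn(values: Seq[String]): String = {
    val nonEmpty = values.filter(v => v != null && v.nonEmpty)
    if (nonEmpty.nonEmpty && nonEmpty.forall(isBoolean)) "boolean" else "string"
  }
}
```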

[GitHub] spark pull request: [SPARK-13442][SQL] Make type inference recogni...

2016-02-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11315#issuecomment-187528824 cc @falaki @yucheng1992 Would you review this please?

[GitHub] spark pull request: [SPARK-13442][SQL] Make type inference recogni...

2016-02-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11315#issuecomment-187488584 retest this please

[GitHub] spark pull request: [SPARK-13503][SQL] Support to specify the (wri...

2016-02-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11384#issuecomment-189142088 @rxin Would you merge this if it looks okay?

[GitHub] spark pull request: [SPARK-13503][SQL] Support to specify the (wri...

2016-02-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11384#issuecomment-189162470 @rxin I guess you meant `DataFrameWriter`. Sure.

[GitHub] spark pull request: [SPARK-12871][SQL] Support to specify the opti...

2016-01-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/10805#issuecomment-173016289 Oh yes it does. Actually I am reading compressed files in the test I added [here](https://github.com/HyukjinKwon/spark/blob/SPARK-12420/sql/core/src/test/scala/org

[GitHub] spark pull request: [SPARK-12872][SQL] Support to specify the opti...

2016-01-20 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/10858 [SPARK-12872][SQL] Support to specify the option for compression codec for JSON datasource https://issues.apache.org/jira/browse/SPARK-12872 This PR makes the JSON datasource can

[GitHub] spark pull request: [SPARK-12901][SQL] Refactor options for JSON a...

2016-01-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/10895#issuecomment-174717824 @yhuai Sorry, I just checked this notification. Thank you.

[GitHub] spark pull request: [SPARK-12871][SQL] Support to specify the opti...

2016-01-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/10805#issuecomment-173022112 I see. I will try to figure this out anyway, though. Somehow I think this might be a bit too much, as almost all files would have proper extensions and I think the (almost

[GitHub] spark pull request: [SPARK-12901][SQL] Refactor options for JSON a...

2016-01-24 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/10895 [SPARK-12901][SQL] Refactor options for JSON and CSV datasource (not case class and same format). https://issues.apache.org/jira/browse/SPARK-12901 This PR refactors the options in JSON

[GitHub] spark pull request: [SPARK-12990][BUILD] Fix fatal warnings due to...

2016-01-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/10903#issuecomment-175311637 @srowen Hm... I thought it was already fixed in https://github.com/apache/spark/commit/00026fa9912ecee5637f1e7dd222f977f31f6766.

[GitHub] spark pull request: [SPARK-12990][BUILD] Fix fatal warnings due to...

2016-01-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/10903#issuecomment-175312936 Ah, that was merged yesterday.

[GitHub] spark pull request: [SPARK-12997][SQL] Use cast expression to perf...

2016-02-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/10980#issuecomment-184032962 @rxin @falaki Could you please look through this and tell me if I understood this correctly and this approach is appropriate?

[GitHub] spark pull request: [SPARK-12997][SQL] Use cast expression to perf...

2016-02-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/10980#issuecomment-184033527 @rxin @falaki Could you please look through this and tell me if I understood this correctly and this approach is appropriate?

[GitHub] spark pull request: [SPARK-13137][SQL] NullPoingException in schem...

2016-02-17 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11023#issuecomment-185568359 cc @rxin

[GitHub] spark pull request: [SPARK-13381][SQL] Support for loading CSV wit...

2016-02-18 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/11262 [SPARK-13381][SQL] Support for loading CSV with a single function call https://issues.apache.org/jira/browse/SPARK-13381 This PR adds the support to load CSV data directly by a single

[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

2016-02-19 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/11270 [SPARK-8000][SQL] Support for auto-detecting data sources. https://issues.apache.org/jira/browse/SPARK-8000 This PR adds the support for detecting data source by extension
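As a rough illustration of extension-based detection, the sketch below maps a single file name to a format name. It skips one trailing compression extension such as `.gz`. The extension table and the `FormatDetection.detect` helper are assumptions made for illustration; they are not the mapping used in the PR.

```scala
// Hypothetical sketch: the extension table below is an assumption.
object FormatDetection {
  private val compressionExts = Set("gz", "bz2", "deflate", "snappy", "lz4")
  private val formatByExt = Map(
    "json" -> "json", "csv" -> "csv", "parquet" -> "parquet",
    "orc" -> "orc", "txt" -> "text")

  def detect(fileName: String): Option[String] = {
    // "part-00000.csv.gz" -> List("csv", "gz"): every dot-separated suffix.
    val exts = fileName.toLowerCase.split('.').toList.drop(1)
    // Look from the end, skipping a single trailing compression extension.
    val candidates = exts.reverse match {
      case last :: rest if compressionExts(last) => rest
      case all => all
    }
    candidates.headOption.flatMap(formatByExt.get)
  }
}
```

Looking only at the last meaningful extension means names like `a.snappy.parquet` still resolve to `parquet`. Files without a recognized extension yield `None`, leaving the caller to fall back to a default source.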

[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

2016-02-19 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11270#discussion_r53445997 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala --- @@ -408,7 +408,7 @@ class DataFrameReader private[sql](sqlContext

[GitHub] spark pull request: [SPARK-13381][SQL] Support for loading CSV wit...

2016-02-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11262#issuecomment-186173846 cc @rxin

[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

2016-02-19 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11270#discussion_r53448453 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ResolvedDataSource.scala --- @@ -130,7 +141,49 @@ object

[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

2016-02-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11270#issuecomment-186201103 retest this please

[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

2016-02-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11270#issuecomment-186242687 Retest this please

[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

2016-02-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11270#issuecomment-186242792 retest this please

[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

2016-02-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11270#issuecomment-186457302 @rxin Actually, as you know, `spark.sql.sources.default` can be a different data source, so I think we might have to add some logic to validate all data sources from

[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

2016-02-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11270#issuecomment-186459355 retest this please

[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

2016-02-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11270#issuecomment-186463565 retest this please

[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

2016-02-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11270#issuecomment-186480982 @rxin Thanks! I will update soon.

[GitHub] spark pull request: [SPARK-13381][SQL] Support for loading CSV wit...

2016-02-21 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11262#discussion_r53577852 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala --- @@ -345,6 +346,46 @@ class DataFrameReader private[sql](sqlContext

[GitHub] spark pull request: [SPARK-13381][SQL] Support for loading CSV wit...

2016-02-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11262#issuecomment-186956637 Sure.

[GitHub] spark pull request: [SPARK-13381][SQL] Support for loading CSV wit...

2016-02-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11262#issuecomment-186967397 retest this please

[GitHub] spark pull request: [SPARK-13309][SQL] Fix type inference issue wi...

2016-02-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11194#discussion_r52841904 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala --- @@ -387,4 +387,16 @@ class CSVSuite extends

[GitHub] spark pull request: [SPARK-13309][SQL] Fix type inference issue wi...

2016-02-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11194#issuecomment-183902081 test this please

[GitHub] spark pull request: [SPARK-13309][SQL] Fix type inference issue wi...

2016-02-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11194#issuecomment-183880899 @rxin Sure.

[GitHub] spark pull request: [SPARK-13309][SQL] Fix type inference issue wi...

2016-02-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11194#issuecomment-183883409 Overall, it looks good to me. This was already merged in https://github.com/databricks/spark-csv/pull/261 and the logic looks identical.

[GitHub] spark pull request: [SPARK-13309][SQL] Fix type inference issue wi...

2016-02-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11194#issuecomment-183901833 Actually, I have been thinking that we might need a class such as `TestCSVData` holding datasets for testing (similar to [TestJsonData](https://github.com

[GitHub] spark pull request: [SPARK-13309][SQL] Fix type inference issue wi...

2016-02-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11194#discussion_r52841837 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala --- @@ -66,11 +69,7 @@ private[csv] object

[GitHub] spark pull request: [SPARK-13309][SQL] Fix type inference issue wi...

2016-02-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11194#discussion_r52842098 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala --- @@ -48,7 +47,11 @@ private[csv] object

[GitHub] spark pull request: [SPARK-13309][SQL] Fix type inference issue wi...

2016-02-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11194#discussion_r52842226 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchemaSuite.scala --- @@ -68,4 +68,10 @@ class

[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2016-02-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/10502#issuecomment-182670583 @yhuai Would you look through this please?

[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

2016-02-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11270#issuecomment-187030511 I submitted some more commits. In summary, 1. Added a `DataSourceDetect` class separately. 2. Now, it only picks a single file. If the given path

[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

2016-02-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11270#issuecomment-187031213 I submitted some more commits. In summary, 1. Added a `DataSourceDetect` class separately. 2. Now, it only picks a single file. If the given path

[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

2016-02-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11270#issuecomment-187032665 I submitted some more commits. In summary, 1. Added a `DataSourceDetect` class separately. 2. Now, it only picks a single file. If the given path

[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

2016-02-22 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11270#discussion_r53597525 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceDetection.scala --- @@ -0,0 +1,127 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

2016-02-22 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11270#discussion_r53602790 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceDetectionSuite.scala --- @@ -0,0 +1,70

[GitHub] spark pull request: [SPARK-12997][SQL] Use cast expression to perf...

2016-02-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/10980#issuecomment-187107244 Hm... I can't find a proper way to use `Cast()` for `DoubleType`, `FloatType` and `DecimalType`. The original CSV casting works differently from the `Cast

[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

2016-02-22 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11270#discussion_r53600012 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceDetectionSuite.scala --- @@ -0,0 +1,70

[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

2016-02-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11270#issuecomment-187031347 I submitted some more commits. In summary, 1. Added a `DataSourceDetect` class separately. 2. Now, it only picks a single file. If the given path

[GitHub] spark pull request: [SPARK-13866][SQL] Handle decimal type in CSV ...

2016-03-15 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11724#discussion_r56266566 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala --- @@ -86,6 +86,7 @@ private[csv] object

[GitHub] spark pull request: [SPARK-13866][SQL] Handle decimal type in CSV ...

2016-03-15 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11724#discussion_r56268274 --- Diff: sql/core/src/test/resources/decimal.csv --- @@ -0,0 +1,4 @@ +decimal +21602730330601001035858 --- End diff
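The test resource above holds `21602730330601001035858`. This is a 23-digit integer that overflows `Long`, whose maximum (9223372036854775807) has 19 digits. The sketch below shows an inference fallback that lands on a decimal for such values. It tries `Int`, then `Long`, then `java.math.BigDecimal`. A `BigDecimal` is accepted as a decimal only when it is integral and its precision fits Spark's 38-digit `DecimalType` limit. `NumericInference` and the string type names are hypothetical. Sending fractional values to double is an assumption of this sketch, not necessarily the PR's rule.

```scala
import java.math.BigDecimal
import scala.util.Try

// Hypothetical sketch of a numeric inference fallback chain.
object NumericInference {
  // Spark's DecimalType supports at most 38 digits of precision.
  val MaxPrecision = 38

  // Returns a type name for one field: int, long, decimal(p,s), double or string.
  def infer(field: String): String =
    if (Try(field.toInt).isSuccess) "int"
    else if (Try(field.toLong).isSuccess) "long"
    else Try(new BigDecimal(field)).toOption match {
      // Integral but too large for a long: keep every digit as a decimal.
      case Some(d) if d.scale == 0 && d.precision <= MaxPrecision =>
        s"decimal(${d.precision},${d.scale})"
      // Fractional, exponent notation, or more than 38 digits: use double.
      case Some(_) => "double"
      case None    => "string"
    }
}
```

`BigDecimal.precision` counts the unscaled digits, so the value above is inferred as `decimal(23,0)` instead of losing digits as a double.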

[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

2016-03-16 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11756#discussion_r56300778 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala --- @@ -963,6 +963,31 @@ class JsonSuite extends
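SPARK-13764 brings parse modes to the JSON data source. The sketch below illustrates what the three mode names used by Spark's CSV source (`PERMISSIVE`, `DROPMALFORMED`, `FAILFAST`) could mean for rows whose column count does not match the schema. It assumes a fixed-width integer schema, and it assumes that `PERMISSIVE` pads missing columns with nulls (`None`) and drops extra ones. `ParseModes.parse` is hypothetical and only checks the column count.

```scala
// Hypothetical sketch of parse modes; not Spark's implementation.
object ParseModes {
  sealed trait Mode
  case object Permissive extends Mode    // keep malformed rows, filling gaps
  case object DropMalformed extends Mode // silently skip malformed rows
  case object FailFast extends Mode      // abort on the first malformed row

  // Parses comma-separated integer rows against a schema of `width` columns.
  // Only the column count is checked; non-integer tokens would still throw.
  def parse(lines: Seq[String], width: Int, mode: Mode): Seq[Seq[Option[Int]]] =
    lines.flatMap { line =>
      val values: Seq[Option[Int]] =
        line.split(",", -1).toSeq.map(token => Some(token.trim.toInt))
      if (values.length == width) Seq(values)
      else mode match {
        case Permissive    => Seq(values.padTo(width, None).take(width))
        case DropMalformed => Seq.empty
        case FailFast      => throw new IllegalArgumentException(s"Malformed row: $line")
      }
    }
}
```

Take the rows `1,2,3,4` and `3,2,1` against four columns. Under this sketch, `PERMISSIVE` keeps both rows and pads the second with `None`. `DROPMALFORMED` keeps only the first row, and `FAILFAST` throws.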

[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...

2016-03-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11756#issuecomment-197224540 For example, the data below: ``` 1,2,3,4 3,2,1 ``` will produce the records below: - `PERMISSIVE` ``` Row(1,2,3,4

[GitHub] spark pull request: [SPARK-13866][SQL] Handle decimal type in CSV ...

2016-03-15 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11724#discussion_r56266282 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala --- @@ -108,14 +109,38 @@ private[csv] object

[GitHub] spark pull request: [SPARK-13866][SQL] Handle decimal type in CSV ...

2016-03-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11724#issuecomment-197089379 Just to make sure that the precision check works fine, the code below works correctly. ```scala import java.math.BigDecimal import

[GitHub] spark pull request: [SPARK-13866][SQL] Handle decimal type in CSV ...

2016-03-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11724#issuecomment-197094083 test this please

[GitHub] spark pull request: [SPARK-13174][SparkR] Add read.csv and write.c...

2016-03-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11457#discussion_r55959195 --- Diff: R/pkg/inst/tests/testthat/test_context.R --- @@ -26,7 +26,7 @@ test_that("Check masked functions", { maskedBySparkR

[GitHub] spark pull request: [SPARK-13667][SQL] Support for specifying cust...

2016-03-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11550#issuecomment-196209492 retest this please

[GitHub] spark pull request: [SPARK-13667][SQL] Support for specifying cust...

2016-03-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11550#issuecomment-196215307 cc @rxin

[GitHub] spark pull request: [SPARK-13866][SQL] Handle decimal type in CSV ...

2016-03-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11724#issuecomment-196749567 cc @rxin @falaki

[GitHub] spark pull request: [SPARK-13866][SQL] Handle decimal type in CSV ...

2016-03-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11724#issuecomment-196751413 There should be a conflict with https://github.com/apache/spark/pull/11550. I will resolve the conflict as soon as either this one or that one is merged

[GitHub] spark pull request: [SPARK-13899][SQL] Produce InternalRow instead...

2016-03-15 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/11717 [SPARK-13899][SQL] Produce InternalRow instead of external Row at CSV data source ## What changes were proposed in this pull request? This PR makes CSV data source produce

[GitHub] spark pull request: [SPARK-13899][SQL] Produce InternalRow instead...

2016-03-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11717#issuecomment-196710196 cc @rxin @falaki

[GitHub] spark pull request: [SPARK-13866][SQL] Handle decimal type in CSV ...

2016-03-15 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/11724 [SPARK-13866][SQL] Handle decimal type in CSV inference at CSV data source. ## What changes were proposed in this pull request? https://issues.apache.org/jira/browse/SPARK-13866

[GitHub] spark pull request: [SPARK-13899][SQL] Produce InternalRow instead...

2016-03-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11717#issuecomment-196754674 This PR would allow inferring `TimestampType` more flexibly (e.g. values including `T` and `GMT`) rather than just using `Timestamp.valueOf()`.
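`Timestamp.valueOf()` accepts only the `yyyy-[m]m-[d]d hh:mm:ss[.f...]` form, so a value such as `2015-01-01T12:00:00` is rejected. The sketch below shows a more flexible parser that falls back to ISO-8601 parsing via `java.time`. `TimestampInference.parse` is hypothetical. It does not handle a literal `GMT` suffix, which would need a custom `DateTimeFormatter`.

```scala
import java.sql.Timestamp
import java.time.{LocalDateTime, OffsetDateTime}
import scala.util.Try

// Hypothetical sketch: try the strict JDBC format first, then ISO-8601 forms.
object TimestampInference {
  def parse(field: String): Option[Timestamp] =
    Try(Timestamp.valueOf(field)).toOption
      // ISO-8601 with an offset, e.g. "2015-01-01T12:00:00Z" or "...+09:00".
      .orElse(Try(Timestamp.from(OffsetDateTime.parse(field).toInstant)).toOption)
      // ISO-8601 local date-time with a 'T' separator and no offset.
      .orElse(Try(Timestamp.valueOf(LocalDateTime.parse(field))).toOption)
}
```

Offset-bearing values are converted through an `Instant`, so they denote an absolute point in time. Offset-less values are interpreted in the JVM's default time zone, as `Timestamp.valueOf` does.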

[GitHub] spark pull request: [SPARK-13667][SQL] Support for specifying cust...

2016-03-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11550#issuecomment-194095931 cc @rxin

[GitHub] spark pull request: [SPARK-13728][SQL] Fix ORC PPD test so that pu...

2016-03-08 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/11593 [SPARK-13728][SQL] Fix ORC PPD test so that pushed filters can be checked. ## What changes were proposed in this pull request? https://github.com/apache/spark/pull/11509 makes

[GitHub] spark pull request: [SPARK-13728][SQL] Fix ORC PPD test so that pu...

2016-03-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11593#issuecomment-194035552 cc @marmbrus

[GitHub] spark pull request: [SPARK-13728][SQL] Fix ORC PPD test so that pu...

2016-03-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11593#issuecomment-194036038 this this please

[GitHub] spark pull request: [SPARK-13766][SQL] Consistent file extensions ...

2016-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11604#issuecomment-194539680 @rxin Sure. Thanks!

[GitHub] spark pull request: [SPARK-13997][SQL] Use Hadoop 2.0 default valu...

2016-03-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11806#issuecomment-198206078 Actually, I did not understand why the overhead of record-level compression (a record being a row in Spark, or a key-value pair in a Hadoop output format) would be very high. I

[GitHub] spark pull request: [SPARK-13997][SQL] Use Hadoop 2.0 default valu...

2016-03-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11806#issuecomment-198201760 I see. Should I close this?

[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

2016-03-19 Thread HyukjinKwon
Github user HyukjinKwon closed the pull request at: https://github.com/apache/spark/pull/11270

[GitHub] spark pull request: [MINOR][SQL] Use Hadoop 2.0 default value for ...

2016-03-19 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/11806 [MINOR][SQL] Use Hadoop 2.0 default value for compression in data sources. ## What changes were proposed in this pull request? Currently, JSON, TEXT and CSV data sources use

[GitHub] spark pull request: [SPARK-13997][SQL] Use Hadoop 2.0 default valu...

2016-03-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11806#issuecomment-198327029 @srowen Oh, wait. Would it be better to change `set()` to `setIfUnset()`?

[GitHub] spark pull request: [SPARK-13997][SQL] Use Hadoop 2.0 default valu...

2016-03-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11806#issuecomment-198209073 I see. AFAIK, record-level compression does not actually compress the whole record but only the positions of the values. Could I wait a bit until @tomwhite gives some

[GitHub] spark pull request: [SPARK-13997][SQL] Use Hadoop 2.0 default valu...

2016-03-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11806#issuecomment-198326625 Closing this.

[GitHub] spark pull request: [SPARK-13997][SQL] Use Hadoop 2.0 default valu...

2016-03-19 Thread HyukjinKwon
Github user HyukjinKwon closed the pull request at: https://github.com/apache/spark/pull/11806

[GitHub] spark pull request: [SPARK-3308][SQL][FOLLOW-UP] Parse JSON rows h...

2016-03-15 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/11752 [SPARK-3308][SQL][FOLLOW-UP] Parse JSON rows having an array type and a struct type in the same field ## What changes were proposed in this pull request? This https://github.com

[GitHub] spark pull request: [SPARK-3308][SQL][FOLLOW-UP] Parse JSON rows h...

2016-03-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11752#issuecomment-197152181 cc @yhuai (Since the JIRA is a pretty old one, I was unsure whether I should make a follow-up like this, but I just made this since it is a follow-up

[GitHub] spark pull request: [SPARK-3308][SQL][FOLLOW-UP] Parse JSON rows h...

2016-03-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11752#issuecomment-197152532 Just to make sure, I am doing this partly due to [SPARK-13764](https://issues.apache.org/jira/browse/SPARK-13764), which deals with parse modes just like in CSV

[GitHub] spark pull request: [SPARK-3308][SQL][FOLLOW-UP] Parse JSON rows h...

2016-03-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11752#issuecomment-197152653 Just to make sure, I am doing this partly due to [SPARK-13764](https://issues.apache.org/jira/browse/SPARK-13764), which deals with parse modes just like in CSV

[GitHub] spark pull request: [SPARK-3308][SQL][FOLLOW-UP] Parse JSON rows h...

2016-03-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11752#issuecomment-197152647 cc @yhuai (Since the JIRA is a pretty old one, I was unsure whether I should make a follow-up like this, but I just made this since it is a follow-up

[GitHub] spark pull request: [SPARK-3308][SQL][FOLLOW-UP] Parse JSON rows h...

2016-03-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11752#issuecomment-197174373 Except for the case above, values of all types are set to `null` when they fail to parse with the given schema.
