[GitHub] spark pull request #13771: [SPARK-13748][PYSPARK][DOC] Add the description f...

2016-06-19 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/13771 [SPARK-13748][PYSPARK][DOC] Add the description for explicitly setting None for a named argument for ROW ## What changes were proposed in this pull request? It seems allowed

[GitHub] spark pull request #13725: [SPARK-15892][ML] Backport correctly merging AFTA...

2016-06-17 Thread HyukjinKwon
Github user HyukjinKwon closed the pull request at: https://github.com/apache/spark/pull/13725 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature

[GitHub] spark issue #13759: [SPARK-16044][SQL] input_file_name() returns empty strin...

2016-06-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13759 cc @cloud-fan Could you please take a look maybe? I remember renaming `SqlNewHadoopRDDState` to `InputFileNameHolder` was reviewed by you.

[GitHub] spark issue #13725: [SPARK-15892][ML] Backport correctly merging AFTAggregat...

2016-06-17 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13725 I am closing this as this is merged. Thank you @mengxr again!

[GitHub] spark pull request #13759: [SPARK-16044][SQL] input_file_name() returns empt...

2016-06-18 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/13759 [SPARK-16044][SQL] input_file_name() returns empty strings in data sources based on NewHadoopRDD ## What changes were proposed in this pull request? This PR makes `input_file_name

[GitHub] spark issue #13771: [SPARK-13748][PYSPARK][DOC] Add the description for expl...

2016-06-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13771 Hi @davies, it seems the related code was written by you. Would this be a meaningful change?

[GitHub] spark pull request #13866: [SPARK-16160] [STREAMING] Clear last remembered m...

2016-06-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13866#discussion_r68209882 --- Diff: streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala --- @@ -18,10 +18,12 @@ package

[GitHub] spark issue #12268: [SPARK-14480][SQL] Simplify CSV parsing process with a b...

2016-06-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/12268 @rxin I see. Actually, I added a benchmark for this change in the JIRA. So, would it be okay if I do the following: 1. Get rid of `StringIteratorReader` 2. Refactoring (maybe

[GitHub] spark issue #12268: [SPARK-14480][SQL] Simplify CSV parsing process with a b...

2016-06-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/12268 Sure.

[GitHub] spark issue #13759: [SPARK-16044][SQL] input_file_name() returns empty strin...

2016-06-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13759 Thank you @rxin! Would it be sensible to backport this one to 1.6.2?

[GitHub] spark issue #13759: [SPARK-16044][SQL] input_file_name() returns empty strin...

2016-06-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13759 @rxin Sure!

[GitHub] spark pull request #12268: [SPARK-14480][SQL] Simplify CSV parsing process w...

2016-06-21 Thread HyukjinKwon
Github user HyukjinKwon closed the pull request at: https://github.com/apache/spark/pull/12268

[GitHub] spark issue #13640: [SPARK-15916][SQL] Correctly pushdown top level AND oper...

2016-06-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13640 Thanks!

[GitHub] spark issue #13640: [SPARK-15916][SQL] Correctly pushdown top level AND oper...

2016-06-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13640 cc @rxin, do you mind if I ask you to review this, please?

[GitHub] spark pull request: [SPARK-12506][SQL]push down WHERE clause arith...

2016-01-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/10750#discussion_r49681620 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -480,10 +480,120 @@ private[sql] object

[GitHub] spark pull request: [SPARK-12506][SQL]push down WHERE clause arith...

2016-01-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/10750#issuecomment-171513363 If we keep going to solve in `DataSourceStrategy` in this way, I think we should resolve the operators for other datasources. For this, dealing with `Cast` [SPARK

[GitHub] spark pull request: [SPARK-12506][SQL]push down WHERE clause arith...

2016-01-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/10750#discussion_r49679753 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -480,10 +480,120 @@ private[sql] object

[GitHub] spark pull request: [SPARK-12506][SQL]push down WHERE clause arith...

2016-01-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/10750#issuecomment-171513129 Since `source.Filter` is shared with Parquet, ORC, etc., I think this might have to resolve the arithmetic operators in `DataSourceStrategy` itself

[GitHub] spark pull request: [SPARK-12506][SQL]push down WHERE clause arith...

2016-01-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/10750#issuecomment-171517766 @huaxingao please change the title not to end with …

[GitHub] spark pull request: [SPARK-12506][SQL]push down WHERE clause arith...

2016-01-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/10750#issuecomment-171527726 @viirya Yes, I think so. But the reason why I did not give a try for this is, `expression._` is being rapidly changed, which could affect to update codes

[GitHub] spark pull request: [SPARK-12506][SQL]push down WHERE clause arith...

2016-01-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/10750#issuecomment-171515046 Actually, I also suggested an approach similar to this in [SPARK-9182](https://issues.apache.org/jira/browse/SPARK-9182). If we keep adding filters in this way

[GitHub] spark pull request: [SPARK-12668] Providing aliases for CSV option...

2016-01-17 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/10800#issuecomment-172442044 Sure. Do you think it would be great if JSON datasource has that one as well?

[GitHub] spark pull request: [SPARK-12668] Renaming CSV options to be simil...

2016-01-17 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/10800#issuecomment-172435937 @falaki Ah, right.

[GitHub] spark pull request: [SPARK-12668] Renaming CSV options to be simil...

2016-01-17 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/10800 [SPARK-12668] Renaming CSV options to be similar to Pandas and R https://issues.apache.org/jira/browse/SPARK-12668 Spark CSV datasource has been being merged (file in [SPARK-12420

[GitHub] spark pull request: [SPARK-12668] Providing aliases for CSV option...

2016-01-17 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/10800#issuecomment-172441735 Sorry that I misunderstood the issue patch. For `codec`, I thought it was removed intentionally.

[GitHub] spark pull request: [SPARK-12668][SQL] Providing aliases for CSV o...

2016-01-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/10800#issuecomment-172459208 Hm.. Do you know anything about the test `test different encoding`? It is ignored, and when I run it, it throws the exception below. ``` 00:17

[GitHub] spark pull request: [SPARK-12668][SQL] Providing aliases for CSV o...

2016-01-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/10800#issuecomment-172463923 Ah.. `CSVParameters` is not serializable, but I was trying to pass it out of the driver side

[GitHub] spark pull request: [SPARK-12871][SQL] Support to specify the opti...

2016-01-18 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/10805 [SPARK-12871][SQL] Support to specify the option for compression codec. https://issues.apache.org/jira/browse/SPARK-12871 This PR added an option to support to specify compression codec

[GitHub] spark pull request: [SPARK-12668][SQL] Providing aliases for CSV o...

2016-01-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/10800#issuecomment-172714365 This style failure is occurring in `sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala` which I did not change and it is fine in my local

[GitHub] spark pull request: [SPARK-12668][SQL] Providing aliases for CSV o...

2016-01-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/10800#issuecomment-172706793 Hm.. @rxin Can I make it like `JSONOptions` in a separate PR? Now the object of this class is being passed so it should be serializable. So, I think I should

[GitHub] spark pull request: [SPARK-12871][SQL] Support to specify the opti...

2016-01-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/10805#issuecomment-17234 Although `CSVCompressionCodecs` might be shared with the JSON datasource, I will make it shared in a separate PR for JSON.

[GitHub] spark pull request: [SPARK-12668][SQL] Providing aliases for CSV o...

2016-01-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/10800#issuecomment-172729369 I am not too sure why I am hitting this issue, but I just reordered some imports alphabetically in `SparkStrategies` and `InnerJoinSuite

[GitHub] spark pull request: [SPARK-12871][SQL] Support to specify the opti...

2016-01-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/10805#issuecomment-172747397 I will resolve conflicts and update this soon.

[GitHub] spark pull request: [SPARK-12871][SQL] Support to specify the opti...

2016-01-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/10805#issuecomment-172766510 Supported short names for compression codecs are below (case insensitive): `bzip2` -> `org.apache.hadoop.io.compress.BZip2Codec` `g

[GitHub] spark pull request: [SPARK-12871][SQL] Support to specify the opti...

2016-01-18 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/10805#discussion_r50053254 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVParameters.scala --- @@ -71,6 +71,8 @@ private[sql] case class

[GitHub] spark pull request: [SPARK-12668][SQL] Providing aliases for CSV o...

2016-01-18 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/10800#discussion_r50052445 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVParameters.scala --- @@ -44,9 +44,11 @@ private[sql] case class

[GitHub] spark issue #13845: [SPARK-13023][PROJECT INFRA][FOLLOWUP][BRANCH-1.6] Unabl...

2016-06-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13845 Sure! Thank you!

[GitHub] spark pull request #13845: [SPARK-13023][PROJECT INFRA][FOLLOWUP][BRANCH-1.6...

2016-06-27 Thread HyukjinKwon
Github user HyukjinKwon closed the pull request at: https://github.com/apache/spark/pull/13845

[GitHub] spark issue #13806: [SPARK-16044][SQL] Backport input_file_name() for data s...

2016-06-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13806 retest this please

[GitHub] spark issue #13806: [SPARK-16044][SQL] Backport input_file_name() for data s...

2016-06-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13806 retest this please

[GitHub] spark issue #13806: [SPARK-16044][SQL] Backport input_file_name() for data s...

2016-06-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13806 retest this please

[GitHub] spark issue #13806: [SPARK-16044][SQL] Backport input_file_name() for data s...

2016-06-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13806 Again, I think the error message is not related to this change. I will retest this and meanwhile try to build it locally.

[GitHub] spark issue #13806: [SPARK-16044][SQL] Backport input_file_name() for data s...

2016-06-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13806 Hm... am I doing something wrong here?

[GitHub] spark issue #13806: [SPARK-16044][SQL] Backport input_file_name() for data s...

2016-06-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13806 retest this please

[GitHub] spark issue #13806: [SPARK-16044][SQL] Backport input_file_name() for data s...

2016-06-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13806 retest this please

[GitHub] spark pull request #13942: [SPARK-16250][PYSPARK] Can't use escapeQuotes opt...

2016-06-28 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/13942 [SPARK-16250][PYSPARK] Can't use escapeQuotes option in DataFrameWriter.csv() ## What changes were proposed in this pull request? This PR corrects `DataFrameWriter.csv()` to use

[GitHub] spark issue #13806: [SPARK-16044][SQL] Backport input_file_name() for data s...

2016-06-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13806 retest this please

[GitHub] spark issue #13806: [SPARK-16044][SQL] Backport input_file_name() for data s...

2016-06-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13806 retest this please

[GitHub] spark pull request #13912: [SPARK-16216][SQL] CSV data source does not write...

2016-06-26 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/13912 [SPARK-16216][SQL] CSV data source does not write date and timestamp correctly ## What changes were proposed in this pull request? This PR corrects CSV data sources in order to write

[GitHub] spark issue #13912: [SPARK-16216][SQL] CSV data source does not write date a...

2016-06-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13912 Hm.. Anyway do you think I should close this and deal with JSON one? I can't decide.

[GitHub] spark issue #13912: [SPARK-16216][SQL] CSV data source does not write date a...

2016-06-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13912 Thank you for your comment @srowen ! I did this for two reasons. Firstly, it can't write and read back the dates or timestamps but it might have to be converted manually by users

[GitHub] spark issue #13845: [SPARK-13023][PROJECT INFRA][FOLLOWUP][BRANCH-1.6] Unabl...

2016-06-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13845 @JoshRosen Could you take a look by any chance please?

[GitHub] spark issue #13910: make SparkPi iteration time correct

2016-06-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13910 Get ready to receive a comment suggesting following https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark..

[GitHub] spark issue #13912: [SPARK-16216][SQL] CSV data source does not write date a...

2016-06-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13912 @srowen How about leaving them as long timestamp values but writing them as formatted strings only when `dateFormat` is given?

[GitHub] spark pull request #13912: [SPARK-16216][SQL] CSV data source does not write...

2016-06-26 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13912#discussion_r68498245 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVRelation.scala --- @@ -195,18 +202,50 @@ private[sql] class

[GitHub] spark issue #13912: [SPARK-16216][SQL] CSV data source does not write date a...

2016-06-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13912 I understand the problem without a timezone but.. hm.. to be honest, I have seen a lot of cases of reading and writing times without a timezone (e.g. by Microsoft Excel).

[GitHub] spark pull request #13912: [SPARK-16216][SQL] CSV data source does not write...

2016-06-26 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13912#discussion_r68511688 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVRelation.scala --- @@ -195,18 +202,50 @@ private[sql] class

[GitHub] spark pull request #13912: [SPARK-16216][SQL] CSV data source does not write...

2016-06-26 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13912#discussion_r68511969 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVRelation.scala --- @@ -195,18 +202,50 @@ private[sql] class

[GitHub] spark issue #13915: [SPARK-16081][BUILD] Disallow using `l` as variable name

2016-06-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13915 Oh, this was queued up in my local to submit a PR. I will leave a comment just to note that this complies with https://github.com/databricks/scala-style-guide#variable-naming-convention

[GitHub] spark issue #13912: [SPARK-16216][SQL] CSV data source supports custom date ...

2016-06-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13912 @srowen Hm.. I just wonder if I should correct the JSON data source to be consistent with this, meaning the following: 1. Default is to write time as numeric timestamps 2. Supports both

[GitHub] spark pull request #13917: [MINOR][SQL][TEST] checkAnswer does not print the...

2016-06-26 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/13917 [MINOR][SQL][TEST] checkAnswer does not print the rows being compared in a correct order ## What changes were proposed in this pull request? This PR corrects the order of rows being

[GitHub] spark issue #13517: [SPARK-14839][SQL] Support for other types as option in ...

2016-06-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13517 cc @davies and @rxin I addressed the comments. Thanks!

[GitHub] spark pull request #13917: [MINOR][SQL][TEST] checkAnswer does not print the...

2016-06-26 Thread HyukjinKwon
Github user HyukjinKwon closed the pull request at: https://github.com/apache/spark/pull/13917

[GitHub] spark issue #13917: [MINOR][SQL][TEST] checkAnswer does not print the rows b...

2016-06-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13917 Sorry, this will print the wrong items when the order is different. Closing this.

[GitHub] spark issue #13912: [SPARK-16216][SQL] CSV data source supports custom date ...

2016-06-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13912 @rxin, Could you please take a look?

[GitHub] spark issue #13944: Fix mapping Microsoft SQLServer dialect

2016-06-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13944 (I think it would be even nicer if this PR follows https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark)

[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2016-02-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/10502#issuecomment-182688789 @yhuai Would you look through this please?

[GitHub] spark pull request: [SPARK-13260][SQL] count(*) does not work with...

2016-02-11 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/11169 [SPARK-13260][SQL] count(*) does not work with CSV data source https://issues.apache.org/jira/browse/SPARK-13260 This is a quick fix for `count(*)`. When the `requiredColumns

[GitHub] spark pull request: [SPARK-13260][SQL] count(*) does not work with...

2016-02-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11169#issuecomment-183119538 cc @rxin @falaki

[GitHub] spark pull request: [SPARK-13260][SQL] count(*) does not work with...

2016-02-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11169#issuecomment-183158218 @rxin Can we maybe merge this for now and then take the optimisation into account in another PR? This optimisation would apply to all the pruned scan

[GitHub] spark pull request: [SPARK-13260][SQL] count(*) does not work with...

2016-02-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11169#issuecomment-183146848 This [CSVRelation.scala#L193-L199](https://github.com/HyukjinKwon/spark/blob/SPARK-13260/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv

[GitHub] spark pull request: [SPARK-13260][SQL] count(*) does not work with...

2016-02-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11169#issuecomment-183146606 I think I should have described this in more detail. This works identically to the original CSV datasource. When the parsing mode is drop-malformed

[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2016-02-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/10502#issuecomment-180324662 I will resolve the conflict on Monday.

[GitHub] spark pull request: [SPARK-12997][SQL] Use cast expression to perf...

2016-01-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/10980#issuecomment-176739657 retest this please

[GitHub] spark pull request: [SPARK-12997][SQL] Use cast expression to perf...

2016-01-29 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/10980 [SPARK-12997][SQL] Use cast expression to perform type cast in csv https://issues.apache.org/jira/browse/SPARK-12997 CSVTypeCast.castTo should probably be removed, and just replace its

[GitHub] spark pull request: [SPARK-12506][SPARK-12126][SQL]use CatalystSca...

2016-02-01 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11005#discussion_r51493048 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala --- @@ -455,31 +458,31 @@ class JDBCSuite extends SparkFunSuite

[GitHub] spark pull request: [SPARK-12506][SPARK-12126][SQL]use CatalystSca...

2016-02-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11005#issuecomment-178233988 cc @liancheng

[GitHub] spark pull request: [SPARK-13108][SQL] Validate ascii compatible e...

2016-02-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11016#issuecomment-179678080 cc @rxin

[GitHub] spark pull request: [SPARK-13108][SQL] Validate ascii compatible e...

2016-02-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11016#issuecomment-179685723 retest this please

[GitHub] spark pull request: [SPARK-13114][SQL] Add a test for tokens more ...

2016-02-01 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11020#discussion_r51534298 --- Diff: sql/core/src/test/resources/cars-malformed.csv --- @@ -0,0 +1,5 @@ +year,make,model,comment,blank --- End diff -- Sure

[GitHub] spark pull request: [SPARK-13114][SQL] Add a test for tokens more ...

2016-02-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11020#issuecomment-178442040 retest this please

[GitHub] spark pull request: [SPARK-13137][SQL] NullPoingException in schem...

2016-02-02 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/11023 [SPARK-13137][SQL] NullPoingException in schema inference for CSV when the first line is empty https://issues.apache.org/jira/browse/SPARK-13137 This PR adds a filter in schema

[GitHub] spark pull request: [SPARK-12506][SPARK-12126][SQL]use CatalystSca...

2016-02-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11005#issuecomment-178420517 I agree with @maropu, but some may think differently, as mentioned [here](https://github.com/apache/spark/pull/10750#issuecomment-175400704

[GitHub] spark pull request: [SPARK-13114][SQL] Add a test for tokens more ...

2016-02-01 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/11020 [SPARK-13114][SQL] Add a test for tokens more than the fields in schema https://issues.apache.org/jira/browse/SPARK-13114 This PR adds a test for tokens more than the fields in schema

[GitHub] spark pull request: [SPARK-13108][SQL] Validate ascii compatible e...

2016-02-01 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/11016 [SPARK-13108][SQL] Validate ascii compatible encodings https://issues.apache.org/jira/browse/SPARK-13108 CSV datasource currently does not support non-ascii compatible encodings
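The snippet above is cut off before it describes how the validation works, so the following is only a rough sketch of what "ASCII-compatible" could mean for an encoding, not the check used in the PR. The helper name `is_ascii_compatible` is hypothetical. It assumes an encoding counts as ASCII-compatible when every 7-bit ASCII character encodes to the single byte with the same value.

```python
def is_ascii_compatible(encoding: str) -> bool:
    """Return True if all 128 ASCII characters encode to their own byte values.

    Hypothetical helper for illustration only; it is not taken from the
    Spark CSV datasource.
    """
    ascii_text = "".join(chr(i) for i in range(128))
    try:
        encoded = ascii_text.encode(encoding)
    except (LookupError, UnicodeEncodeError):
        # Unknown encoding name, or an encoding that cannot represent ASCII.
        return False
    # Multi-byte encodings such as UTF-16 produce extra bytes (and a BOM),
    # so the comparison fails for them.
    return encoded == bytes(range(128))


if __name__ == "__main__":
    for name in ("utf-8", "latin-1", "utf-16"):
        print(name, is_ascii_compatible(name))
```

Under this definition, single-byte delimiters such as `,` or `\n` keep the same byte value in the encoded file, which is presumably why a byte-oriented parser would care about the distinction.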

[GitHub] spark pull request: [SPARK-13137][SQL] NullPoingException in schem...

2016-02-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11023#issuecomment-178468002 retest this please

[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

2016-02-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11270#issuecomment-187607865 retest this please

[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

2016-02-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11270#issuecomment-187603526 I marked some tests for the default data source as `ignored` and also commented out some pyspark tests instead of setting `unittest.skip()` as they include some

[GitHub] spark pull request: [SPARK-12997][SQL] Use cast expression to perf...

2016-02-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/10980#issuecomment-188103804 I am closing this. BTW, I think the issue itself is a bit questionable in the end. If we can apply the same rules for casting, then it would

[GitHub] spark pull request: [SPARK-12997][SQL] Use cast expression to perf...

2016-02-23 Thread HyukjinKwon
Github user HyukjinKwon closed the pull request at: https://github.com/apache/spark/pull/10980

[GitHub] spark pull request: [SPARK-12997][SQL] Use cast expression to perf...

2016-02-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/10980#issuecomment-188094839 @falaki Do you think we should close this?

[GitHub] spark pull request: [SPARK-12997][SQL] Use cast expression to perf...

2016-02-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/10980#issuecomment-188094727 @falaki As far as I understand, I am trying to use the existing casting tools in Spark by using `Cast()`. However, it ended up with additional cases

[GitHub] spark pull request: [SPARK-12355][SQL] Implement unhandledFilter i...

2016-02-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/10502#issuecomment-189119103 @yhuai No problem. Then, please inform me when I am supposed to do something else.

[GitHub] spark pull request: [SPARK-13503][SQL] Support to specify the (wri...

2016-02-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11384#issuecomment-189118328 cc @rxin

[GitHub] spark pull request: [SPARK-13503][SQL] Support to specify the (wri...

2016-02-25 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/11384 [SPARK-13503][SQL] Support to specify the (writing) option for compression codec for TEXT ## What changes were proposed in this pull request? This PR makes the TEXT datasource can

[GitHub] spark pull request: [SPARK-13509][SPARK-13507][SQL] Support for wr...

2016-02-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11389#issuecomment-189979099 @rxin Actually, do you think we need the `compression` option for Parquet and ORC as well (although I am not going to deal with them in this PR even if we need

[GitHub] spark pull request: [SPARK-13509][SPARK-13507][SQL] Support for wr...

2016-02-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11389#issuecomment-189987175 @rxin Sure.

[GitHub] spark pull request: [SPARK-13108][SQL] Validate ascii compatible e...

2016-02-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11016#issuecomment-190018739 retest this please

[GitHub] spark pull request: [SPARK-13509][SPARK-13507][SQL] Support for wr...

2016-02-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11389#issuecomment-190031556 retest this please

[GitHub] spark pull request: [SPARK-13509][SPARK-13507][SQL] Support for wr...

2016-02-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11389#issuecomment-190033580 Hm... It looks a bit weird. The type tests in `ParquetHadoopFsRelationSuite` ("test all data types - ...") keep failing. This is also happening at o
