[GitHub] spark pull request: [SPARK-15493][SQL] Allow setting the quoteEsca...

2016-05-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13267#issuecomment-221508989
  
**[Test build #3018 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3018/consoleFull)** for PR 13267 at commit [`8c4bef1`](https://github.com/apache/spark/commit/8c4bef1fb465080db1a9197c0a30b39c0f24b02c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-15493][SQL] Allow setting the quoteEsca...

2016-05-25 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/13267#issuecomment-221489074
  
BTW don't forget to update the title too.






[GitHub] spark pull request: [SPARK-15493][SQL] Allow setting the quoteEsca...

2016-05-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13267#issuecomment-221487940
  
**[Test build #3018 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3018/consoleFull)** for PR 13267 at commit [`8c4bef1`](https://github.com/apache/spark/commit/8c4bef1fb465080db1a9197c0a30b39c0f24b02c).





[GitHub] spark pull request: [SPARK-15493][SQL] Allow setting the quoteEsca...

2016-05-25 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13267#discussion_r64522811
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -787,6 +787,9 @@ def csv(self, path, mode=None, compression=None, sep=None, quote=None, escape=No
   value, ``"``.
 :param escape: sets the single character used for escaping quotes inside an already
 quoted value. If None is set, it uses the default value, ``\``
+:param escapeQuotes: A flag indicating whether values containing quotes should always
+ be enclosed in quotes. Default is to escape all values containing
+ a quote character. ``true``
--- End diff --

what's the true at the end here?






[GitHub] spark pull request: [SPARK-15493][SQL] Allow setting the quoteEsca...

2016-05-25 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13267#discussion_r64522859
  
--- Diff: sql/core/pom.xml ---
@@ -39,7 +39,7 @@
 
   <groupId>com.univocity</groupId>
   <artifactId>univocity-parsers</artifactId>
-  <version>2.1.0</version>
+  <version>2.1.1</version>
--- End diff --

this will fail the dependency check - you will need to update the dep list. 
Jenkins will tell you what to do.






[GitHub] spark pull request: [SPARK-15493][SQL] Allow setting the quoteEsca...

2016-05-25 Thread jurriaan
Github user jurriaan commented on the pull request:

https://github.com/apache/spark/pull/13267#issuecomment-221486596
  
@rxin Done :)





[GitHub] spark pull request: [SPARK-15493][SQL] Allow setting the quoteEsca...

2016-05-24 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/13267#issuecomment-221469255
  
Yea I agree with escapeQuotes.






[GitHub] spark pull request: [SPARK-15493][SQL] Allow setting the quoteEsca...

2016-05-24 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/13267#issuecomment-221469276
  
@jurriaan want to do the change?






[GitHub] spark pull request: [SPARK-15493][SQL] Allow setting the quoteEsca...

2016-05-24 Thread falaki
Github user falaki commented on the pull request:

https://github.com/apache/spark/pull/13267#issuecomment-221468338
  
@rxin and @jurriaan I agree to keep it set by default. However, I think it 
is better to leave it configurable. In two cases before, I assumed a reasonable 
default value is good enough, but ended up exposing them in options. 

Also, I suggest a simpler name like `escapeQuotes` or 
`enableQuoteEscaping`. 
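
A configurable flag that stays on by default could be resolved roughly as in the sketch below. It is only an illustration: `get_bool_option` is a hypothetical helper, not Spark's actual option-parsing code, and the option name `escapeQuotes` is the one suggested in this comment.

```python
def get_bool_option(options, name, default):
    """Return a boolean option, falling back to `default` when it is not set.

    Hypothetical helper for illustration only; not part of Spark.
    """
    raw = options.get(name)
    if raw is None:
        # The user did not supply the option, so the default applies.
        return default
    text = str(raw).strip().lower()
    if text in ("true", "false"):
        return text == "true"
    raise ValueError(f"{name} must be 'true' or 'false', got {raw!r}")


# Not supplied: the default (enabled) is used.
print(get_bool_option({}, "escapeQuotes", default=True))  # True
# Supplied explicitly: the user's value overrides the default.
print(get_bool_option({"escapeQuotes": "false"}, "escapeQuotes", default=True))  # False
```

Keeping the default separate from the lookup means the default can change later without touching callers that set the option explicitly.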





[GitHub] spark pull request: [SPARK-15493][SQL] Allow setting the quoteEsca...

2016-05-24 Thread jbax
Github user jbax commented on the pull request:

https://github.com/apache/spark/pull/13267#issuecomment-221454486
  
@rxin In your case I think it's better to have this turned on by default. 
Regarding your other questions:

1 - There's no timeline. 2.2.x will come out when new features are 
requested by our users and implemented. Currently there's nothing in the 
pipeline so we'll be on 2.1.x adding fixes and minor internal improvements over 
time. We have no open bugs either.

2 - Yes. It fixes a couple of bugs you guys probably won't come across, but 
it also improves the performance of the parser with whitespace trimming enabled 
(it's enabled by default, by the way).

3 - It's OK and I don't see why it would be a problem, other than having 
some client with a very uncommon use case (they are out there, that's why the 
library has a lot of configuration options).






[GitHub] spark pull request: [SPARK-15493][SQL] Allow setting the quoteEsca...

2016-05-24 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/13267#issuecomment-221449233
  
Thanks, @jbax. Given this I think we should just have it on by default.

Some follow-up questions:

1. When will 2.2.x come out?
2. We should probably upgrade to 2.1.1 right?
3. Would it be OK if we always have this on (i.e. not having this option 
exposed to users at all)?






[GitHub] spark pull request: [SPARK-15493][SQL] Allow setting the quoteEsca...

2016-05-24 Thread jbax
Github user jbax commented on the pull request:

https://github.com/apache/spark/pull/13267#issuecomment-221408197
  
It's disabled by default because earlier versions were slower when writing
CSV and it helped a little bit. Also because parsing unquoted values is
faster.

With version 2.1.0 the new algorithm made the writing performance improve a
lot, and having quoteEscaping enabled now makes writing faster. I found
this out after testing version 2.1.1 (a maintenance release) so I didn't
change the default behavior.

Versions 2.2.x and up will have this enabled by default.
On 25 May 2016 6:33 AM, "Reynold Xin"  wrote:

> @jbax  can we get a 2nd opinion here about
> quoteEscapingEnabled?






[GitHub] spark pull request: [SPARK-15493][SQL] Allow setting the quoteEsca...

2016-05-24 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/13267#issuecomment-221390796
  
@jbax can we get a 2nd opinion here about quoteEscapingEnabled?






[GitHub] spark pull request: [SPARK-15493][SQL] Allow setting the quoteEsca...

2016-05-24 Thread jurriaan
Github user jurriaan commented on the pull request:

https://github.com/apache/spark/pull/13267#issuecomment-221387961
  
@rxin Good question, I'm not sure what's the best approach here. It looks 
like setting the flag to true by default could be a good choice.

The comment at 
[src/main/java/com/univocity/parsers/csv/CsvWriterSettings.java](https://github.com/uniVocity/univocity-parsers/blob/f3eb2af26374940e60d91d1703bde54619f50c51/src/main/java/com/univocity/parsers/csv/CsvWriterSettings.java#L231-L247)
mentions that the default behaviour (quoteEscapingEnabled set to false) is not 
valid CSV according to RFC 4180.

To quote RFC 4180:
> If fields are not enclosed with double quotes, then double quotes may not appear inside the fields.

I'm not sure why they chose to turn it off by default. Maybe performance 
reasons? 
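
For comparison, Python's standard `csv` module follows the same rule by default: any field that contains the quote character is enclosed in quotes, with the embedded quotes doubled. The sketch below is independent of Spark and univocity-parsers and only illustrates the RFC 4180 style of output; the sample row is made up.

```python
import csv
import io

# Sample row with quote characters at the start, middle and end of fields.
row = ['test "quote"', 123, 'it "works"!', '"very" well']

# The default dialect uses QUOTE_MINIMAL with doublequote=True, so every
# field containing a quote character is wrapped and its quotes are doubled.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
line = buffer.getvalue().rstrip("\r\n")
print(line)  # "test ""quote""",123,"it ""works""!","""very"" well"

# Reading the line back recovers the original text (numbers come back as strings).
parsed = next(csv.reader(io.StringIO(line)))
print(parsed)  # ['test "quote"', '123', 'it "works"!', '"very" well']
```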





[GitHub] spark pull request: [SPARK-15493][SQL] Allow setting the quoteEsca...

2016-05-24 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/13267#issuecomment-221385900
  
Thanks - a follow up question: should this flag ever be false?






[GitHub] spark pull request: [SPARK-15493][SQL] Allow setting the quoteEsca...

2016-05-24 Thread jurriaan
Github user jurriaan commented on the pull request:

https://github.com/apache/spark/pull/13267#issuecomment-221380701
  
@rxin 
An example using the following dataframe:

```
spark.createDataFrame([['test "quote"', 123, 'it "works"!', '"very" well']])
```

The default CSV behaviour will save the data like this when you specify `"` 
as quote and as escape char:
```
test "quote",123,it "works"!,"""very"" well"
```

With quoteEscapingEnabled set to true the output looks like this:
```
"test ""quote""",123,"it ""works""!","""very"" well"
```

As you can see, by default a value is wrapped in quotes only if it starts 
with a quote character. When quoteEscapingEnabled is turned on, every value 
containing a quote character is wrapped in quotes. Some CSV dialects require this.
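
The two outputs above can be reproduced with a small pure-Python sketch. It only mimics the behaviour described in this comment and is not how univocity-parsers is implemented; `format_field` and `format_row` are hypothetical helper names, and the quote character is assumed to double as the escape character.

```python
def format_field(value, escape_quotes, quote='"', sep=","):
    """Format one CSV field under the two quoting policies described above."""
    text = str(value)
    # Fields containing the separator or a line break are always quoted.
    needs_quotes = sep in text or "\n" in text or "\r" in text
    if escape_quotes:
        # Flag on: quote every field that contains a quote character.
        needs_quotes = needs_quotes or quote in text
    else:
        # Flag off: quote only fields that start with a quote character.
        needs_quotes = needs_quotes or text.startswith(quote)
    if not needs_quotes:
        return text
    # Escape embedded quotes by doubling them, then wrap the field.
    return quote + text.replace(quote, quote + quote) + quote


def format_row(values, escape_quotes):
    return ",".join(format_field(v, escape_quotes) for v in values)


row = ['test "quote"', 123, 'it "works"!', '"very" well']
print(format_row(row, escape_quotes=False))
# test "quote",123,it "works"!,"""very"" well"
print(format_row(row, escape_quotes=True))
# "test ""quote""",123,"it ""works""!","""very"" well"
```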





[GitHub] spark pull request: [SPARK-15493][SQL] Allow setting the quoteEsca...

2016-05-24 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/13267#issuecomment-221370006
  
Can we explain using an example what this does when it is off? 






[GitHub] spark pull request: [SPARK-15493][SQL] Allow setting the quoteEsca...

2016-05-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13267#issuecomment-221185335
  
**[Test build #3011 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3011/consoleFull)** for PR 13267 at commit [`caf8808`](https://github.com/apache/spark/commit/caf8808c78cd3b6feedc34ebbf02a05a6d194034).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15493][SQL] Allow setting the quoteEsca...

2016-05-24 Thread jurriaan
Github user jurriaan commented on the pull request:

https://github.com/apache/spark/pull/13267#issuecomment-221176753
  
@HyukjinKwon Addressed your comments and improved the documentation a bit.





[GitHub] spark pull request: [SPARK-15493][SQL] Allow setting the quoteEsca...

2016-05-24 Thread jurriaan
Github user jurriaan commented on the pull request:

https://github.com/apache/spark/pull/13267#issuecomment-221176282
  
@HyukjinKwon If you don't supply those options they are set to the 
defaults. For how setQuoteEscapingEnabled works, see 
https://github.com/uniVocity/univocity-parsers/issues/38. In the test I 
supplied them to show a possible use case (Redshift's CSV dialect). 





[GitHub] spark pull request: [SPARK-15493][SQL] Allow setting the quoteEsca...

2016-05-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13267#issuecomment-221171295
  
**[Test build #3011 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3011/consoleFull)** for PR 13267 at commit [`caf8808`](https://github.com/apache/spark/commit/caf8808c78cd3b6feedc34ebbf02a05a6d194034).





[GitHub] spark pull request: [SPARK-15493][SQL] Allow setting the quoteEsca...

2016-05-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/13267#issuecomment-221137806
  
@jurriaan Just to double check: it does not escape `quote`s if `quote` 
and/or `escape` are not set?
I think that would be better documented.





[GitHub] spark pull request: [SPARK-15493][SQL] Allow setting the quoteEsca...

2016-05-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/13267#discussion_r64313317
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala ---
@@ -364,6 +364,33 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils {
 }
   }
 
+  test("save csv with quote escaping enabled") {
+withTempDir { dir =>
+  val csvDir = new File(dir, "csv").getCanonicalPath
+
+  val df = spark.createDataFrame(Seq(("test \"quote\"", 123,
+ "it \"works\"!", "\"very\" well")))
+.toDF("a", "b", "c", "d")
+
+  df.coalesce(1).write
+.format("csv")
+.option("quote", "\"")
+.option("escape", "\"")
+.option("quoteEscapingEnabled", "true")
+.save(csvDir)
+
+  val results = spark.read
+.format("text")
+.load(csvDir)
+.collect()
+
+  val expected = Seq(Seq("\"test \"\"quote\"\"\",123,\"it \"\"works\"\"!\"," +
+   "\"\"\"very\"\" well\""))
+
--- End diff --

Here too, maybe:

```scala
val expected =
  Seq(Seq("\"test \"\"quote\"\"\",123,\"it \"\"works\"\"!\",\"\"\"very\"\" well\""))
```

etc.





[GitHub] spark pull request: [SPARK-15493][SQL] Allow setting the quoteEsca...

2016-05-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/13267#discussion_r64312893
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala ---
@@ -364,6 +364,33 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils {
 }
   }
 
+  test("save csv with quote escaping enabled") {
+withTempDir { dir =>
+  val csvDir = new File(dir, "csv").getCanonicalPath
+
+  val df = spark.createDataFrame(Seq(("test \"quote\"", 123,
+ "it \"works\"!", "\"very\" well")))
+.toDF("a", "b", "c", "d")
+
+  df.coalesce(1).write
+.format("csv")
+.option("quote", "\"")
+.option("escape", "\"")
+.option("quoteEscapingEnabled", "true")
+.save(csvDir)
+
+  val results = spark.read
+.format("text")
+.load(csvDir)
+.collect()
+
+  val expected = Seq(Seq("\"test \"\"quote\"\"\",123,\"it \"\"works\"\"!\"," +
+   "\"\"\"very\"\" well\""))
--- End diff --

Here too. I see other code doing it like this:

```scala
val expected =
  Seq(Seq("\"test \"\"quote\"\"\",123,\"it \"\"works\"\"!\",\"\"\"very\"\" well\""))
```
etc.






[GitHub] spark pull request: [SPARK-15493][SQL] Allow setting the quoteEsca...

2016-05-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/13267#discussion_r64312722
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala ---
@@ -364,6 +364,33 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils {
 }
   }
 
+  test("save csv with quote escaping enabled") {
+withTempDir { dir =>
+  val csvDir = new File(dir, "csv").getCanonicalPath
+
+  val df = spark.createDataFrame(Seq(("test \"quote\"", 123,
+ "it \"works\"!", "\"very\" well")))
+.toDF("a", "b", "c", "d")
--- End diff --

Maybe fix the indentation here. I see other code following indentation styles such as:

```scala
spark.createDataFrame(Seq(
  ("test \"quote\"", 123, "it \"works\"!", "\"very\" well")
)).toDF("a", "b", "c", "d")
```

```scala
val data = Seq(("test \"quote\"", 123, "it \"works\"!", "\"very\" well"))
spark.createDataFrame(data).toDF("a", "b", "c", "d")
```

etc.





[GitHub] spark pull request: [SPARK-15493][SQL] Allow setting the quoteEsca...

2016-05-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/13267#discussion_r64312653
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala ---
@@ -364,6 +364,33 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils {
 }
   }
 
+  test("save csv with quote escaping enabled") {
+withTempDir { dir =>
+  val csvDir = new File(dir, "csv").getCanonicalPath
+
+  val df = spark.createDataFrame(Seq(("test \"quote\"", 123,
+ "it \"works\"!", "\"very\" well")))
--- End diff --

Maybe fix the indentation here. I see other code following indentation styles such as:

```scala
spark.createDataFrame(
  Seq(("test \"quote\"", 123, "it \"works\"!", "\"very\" well")))
```

```scala
spark.createDataFrame(Seq(
  ("test \"quote\"", 123, "it \"works\"!", "\"very\" well")
))
```

```scala
val data = Seq(("test \"quote\"", 123, "it \"works\"!", "\"very\" well"))
spark.createDataFrame(data)
```

etc.





[GitHub] spark pull request: [SPARK-15493][SQL] Allow setting the quoteEsca...

2016-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13267#issuecomment-221115849
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-15493][SQL] Allow setting the quoteEsca...

2016-05-23 Thread jurriaan
Github user jurriaan commented on the pull request:

https://github.com/apache/spark/pull/13267#issuecomment-221115470
  
cc @rxin @HyukjinKwon 





[GitHub] spark pull request: [SPARK-15493][SQL] Allow setting the quoteEsca...

2016-05-23 Thread jurriaan
GitHub user jurriaan opened a pull request:

https://github.com/apache/spark/pull/13267

[SPARK-15493][SQL] Allow setting the quoteEscapingEnabled flag when writing 
CSV

## What changes were proposed in this pull request?

See 
https://github.com/uniVocity/univocity-parsers/blob/f3eb2af26374940e60d91d1703bde54619f50c51/src/main/java/com/univocity/parsers/csv/CsvWriterSettings.java#L231-L247

This kind of functionality is needed to be able to write Amazon Redshift 
compatible CSV files 
(https://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-format.html#copy-csv)

https://issues.apache.org/jira/browse/SPARK-15493

## How was this patch tested?

Added a test that verifies the output is quoted correctly.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jurriaan/spark quote-escaping

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13267.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13267


commit 23cb46c83afd4458078c204665130091a65ceec6
Author: Jurriaan Pruis 
Date:   2016-05-23T22:25:52Z

[SPARK-15493][SQL] Allow setting the quoteEscapingEnabled flag when writing 
CSV

See 
https://github.com/uniVocity/univocity-parsers/blob/f3eb2af26374940e60d91d1703bde54619f50c51/src/main/java/com/univocity/parsers/csv/CsvWriterSettings.java#L231-L247

This kind of functionality is needed to be able to write Amazon Redshift 
compatible CSV files 
(https://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-format.html#copy-csv)

https://issues.apache.org/jira/browse/SPARK-15493



