[GitHub] spark issue #21892: [SPARK-24945][SQL] Switching to uniVocity 2.7.2

2018-08-02 Thread jbax
Github user jbax commented on the issue:

https://github.com/apache/spark/pull/21892
  
univocity-parsers-2.7.3 released. Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21892: [SPARK-24945][SQL] Switching to uniVocity 2.7.2

2018-08-01 Thread jbax
Github user jbax commented on the issue:

https://github.com/apache/spark/pull/21892
  
Thanks @MaxGekk, I've fixed the error and also made the parser run faster 
than before when processing fields that were not selected. 

Can you please retest with the latest SNAPSHOT build and let me know how it 
goes?





[GitHub] spark issue #21892: [SPARK-24945][SQL] Switching to uniVocity 2.7.2

2018-07-31 Thread jbax
Github user jbax commented on the issue:

https://github.com/apache/spark/pull/21892
  
Did anyone have a chance to test with the 2.7.3-SNAPSHOT build I released to 
see if the performance issue has been addressed? If it has, let me know 
and I'll release the final 2.7.3 build.





[GitHub] spark issue #17177: [SPARK-19834][SQL] csv escape of quote escape

2017-03-14 Thread jbax
Github user jbax commented on the issue:

https://github.com/apache/spark/pull/17177
  
2.4.0 released, thank you guys!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---




[GitHub] spark issue #17177: [SPARK-19834][SQL] csv escape of quote escape

2017-03-09 Thread jbax
Github user jbax commented on the issue:

https://github.com/apache/spark/pull/17177
  
This doesn't seem correct to me. All test cases use broken CSV and trigger 
the parser's handling of unescaped quotes, where it tries to rescue the data and 
produce something sensible. See my test case here: 
https://github.com/uniVocity/univocity-parsers/issues/143





[GitHub] spark pull request: [SPARK-15493][SQL] Allow setting the quoteEsca...

2016-05-24 Thread jbax
Github user jbax commented on the pull request:

https://github.com/apache/spark/pull/13267#issuecomment-221454486
  
@rxin In your case I think it's better to have this turned on by default. 
Regarding your other questions:

1 - There's no timeline. 2.2.x will come out when new features are 
requested by our users and implemented. Currently there's nothing in the 
pipeline so we'll be on 2.1.x adding fixes and minor internal improvements over 
time. We have no open bugs either.

2 - Yes. It fixes a couple of bugs you guys probably won't come across, but 
it also improves the performance of the parser with whitespace trimming enabled 
(it's enabled by default, by the way).

3 - It's OK and I don't see why it would be a problem, other than having 
some client with a very uncommon use case (they are out there, that's why the 
library has a lot of configuration options).






[GitHub] spark pull request: [SPARK-15493][SQL] Allow setting the quoteEsca...

2016-05-24 Thread jbax
Github user jbax commented on the pull request:

https://github.com/apache/spark/pull/13267#issuecomment-221408197
  
It's disabled by default because earlier versions were slower when writing
CSV and it helped a little bit. Also because parsing unquoted values is
faster.

With version 2.1.0 the new algorithm made the writing performance improve a
lot, and having quoteEscaping enabled now makes writing faster. I found
this out after testing version 2.1.1 (a maintenance release) so I didn't
change the default behavior.

Versions 2.2.x and up will have this enabled by default.
On 25 May 2016 6:33 AM, "Reynold Xin" <notificati...@github.com> wrote:

> @jbax <https://github.com/jbax> can we get a 2nd opinion here about
> quoteEscapingEnabled?






[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

2016-05-04 Thread jbax
Github user jbax commented on the pull request:

https://github.com/apache/spark/pull/12818#issuecomment-216770729
  
By the way, may I suggest you upgrade to version 2.1.0, as it comes 
with substantial performance improvements for parsing and writing CSV.





[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

2016-05-03 Thread jbax
Github user jbax commented on the pull request:

https://github.com/apache/spark/pull/12818#issuecomment-216747285
  
Foo and bar are part of the same value; they just happen to have a line 
ending in between. And yes, `setLineSeparator()` does affect the values 
themselves when writing, unless `normalizeLineEndingsWithinQuotes=false`.





[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

2016-05-03 Thread jbax
Github user jbax commented on the pull request:

https://github.com/apache/spark/pull/12818#issuecomment-216743260
  
What happens if you do this:
```
scala> "foo\r\nbar\r\n".stripLineEnd
```
Shouldn't the result be this?
```
res0: String = foo\r\n
bar
```
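
For reference, the behaviour expected above (only the final line ending removed, the embedded one kept) can be sketched in plain Java. This is an illustrative helper, not Scala's `stripLineEnd` implementation; the class and method names are made up for the example.

```java
// Illustrative sketch: strips a single trailing line ending ("\n" or "\r\n")
// and leaves any embedded line endings untouched.
final class StripLineEndSketch {

    // Hypothetical helper name; not part of Scala or univocity-parsers.
    static String stripTrailingLineEnd(String s) {
        if (s.endsWith("\r\n")) {
            return s.substring(0, s.length() - 2);
        }
        if (s.endsWith("\n")) {
            return s.substring(0, s.length() - 1);
        }
        return s;
    }

    public static void main(String[] args) {
        String stripped = stripTrailingLineEnd("foo\r\nbar\r\n");
        // Make the control characters visible when printing.
        System.out.println(stripped.replace("\r", "\\r").replace("\n", "\\n"));
    }
}
```

Running `main` prints `foo\r\nbar`, i.e. the `\r\n` between `foo` and `bar` is still there.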






[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

2016-05-03 Thread jbax
Github user jbax commented on the pull request:

https://github.com/apache/spark/pull/12818#issuecomment-216738292
  
I just read the rest of this ticket. Be careful with 
`setLineSeparator()`. It uses the default OS line separator but that's not 
always desired.

By default, this is used to transform the `normalizedLineSeparator` when 
writing. If you are running on Windows this will be:

```
normalizedLineSeparator = '\n'
lineSeparator = '\r\n'
```

Then write a value such as `"my \n multi line \n value"`

You will end up with `"my \r\n multi line \r\n value"`

If you actually want to have `"my \n multi line \n value"` you must either:
- set the line separator explicitly to `\n`
- set `normalizeLineEndingsWithinQuotes` to `false`, so whatever is in the 
input will be written to the output, without any line ending transformation.
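
The effect described above can be mimicked with a small stand-alone Java sketch. It does not call univocity-parsers; `writeValue` and its parameters are hypothetical names that only model the rule "replace the normalized `\n` with the configured line separator unless normalization within quotes is turned off".

```java
// Sketch only: models the line-ending rewrite described above with plain
// string replacement. It does not use the univocity-parsers API.
final class LineEndingSketch {

    // Hypothetical helper. When normalization is on, every normalized '\n'
    // inside the value is replaced by the configured line separator.
    // When it is off, the value is returned exactly as given.
    static String writeValue(String value, String lineSeparator, boolean normalizeWithinQuotes) {
        if (!normalizeWithinQuotes) {
            return value;
        }
        return value.replace("\n", lineSeparator);
    }

    // Makes control characters visible for printing.
    static String visible(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        String value = "my \n multi line \n value";

        // Windows default: the embedded '\n' characters become "\r\n".
        System.out.println(visible(writeValue(value, "\r\n", true)));

        // Option 1: set the line separator explicitly to "\n".
        System.out.println(visible(writeValue(value, "\n", true)));

        // Option 2: turn normalization off, keeping the input untouched.
        System.out.println(visible(writeValue(value, "\r\n", false)));
    }
}
```

Only the first call changes the value; the other two leave `"my \n multi line \n value"` as it was, which matches the two options listed above.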





[GitHub] spark pull request: [MINOR][SQL] Remove not affected settings for ...

2016-05-03 Thread jbax
Github user jbax commented on the pull request:

https://github.com/apache/spark/pull/12818#issuecomment-216737346
  
Confirmed. It is only used if you call `CsvWriter.commentRow()` or 
`CsvWriter.commentRowToString()` to write comments to the output.

