> On 23 Jul 2017, at 09:55, Peter Uhnak <i.uh...@gmail.com> wrote:
> 
> Ah, ByteArrayMap, I was trying ByteArray.

ByteArrayMap is not a class, it is still a ByteArray, but of size 256 used as 
an inclusion map for characters that fit a byte.

BTW, using CharacterSet as an abstraction makes the trick easier.

I committed:

===
Name: Neo-CSV-Core-SvenVanCaekenberghe.26
Author: SvenVanCaekenberghe
Time: 23 July 2017, 1:54:22.095185 pm
UUID: e429f87e-5c11-0d00-a72d-ae5d00db2a8c
Ancestors: Neo-CSV-Core-PeterUhnak.25

make the core test in #writeOptionalQuotedField faster by using a primitive 
when possible

add #testWideOptionalQuoted
===
Name: Neo-CSV-Tests-SvenVanCaekenberghe.22
Author: SvenVanCaekenberghe
Time: 23 July 2017, 1:54:52.133003 pm
UUID: 5f81c280-5c11-0d00-a72e-7cd500db2a8c
Ancestors: Neo-CSV-Tests-PeterUhnak.21

make the core test in #writeOptionalQuotedField faster by using a primitive 
when possible

add #testWideOptionalQuoted
===

Extra caution was needed for WideStrings.

Sven

> Not sure what the next step is here: should I add it and send you mczs, or 
> will you do it yourself? (it is a simple change).
> 
> Peter
> 
> On Sat, Jul 22, 2017 at 10:51:42PM +0200, Sven Van Caekenberghe wrote:
>> Peter,
>> 
>>> On 22 Jul 2017, at 22:27, Peter Uhnak <i.uh...@gmail.com> wrote:
>>> 
>>> Hi Sven,
>>> 
>>> my use case was hand-edited CSVs (therefore the quotes are extra clutter), 
>>> which imples that I would be hand-viewing/editing only small CSVs (no 
>>> performance issues).
>>> 
>>> I agree that we should err on the safe side with CR & LF (e.g. tools may 
>>> sometimes autoconvert line endings).
>>> 
>>> Regarding performance:
>>> 
>>> #findFirstInString:inSet:startingAt: didn't work for me (not sure if bug, 
>>> or I don't know how to use), so I've trried with inCharacterSet:
>> 
>> Yes, it is a bitch to use, the the version you used does not use the 
>> primitive.
>> 
>> Try playing with this:
>> 
>> s10 := 'a' repeat: 10.
>> s100 := 'a' repeat: 100.
>> 
>> searchSet := ByteArray new: 256 withAll: 0.
>> ',"', String crlf do: [ :each | searchSet at: each asInteger + 1 put: 1 ].
>> 
>> searchSet := (CharacterSet newFrom: ',"', String crlf) byteArrayMap.
>> 
>> [ ByteString findFirstInString: s10 inSet: searchSet startingAt: 1 ] bench.
>> "15,160,161 per second"
>> [ ByteString findFirstInString: s100 inSet: searchSet startingAt: 1 ] bench.
>> "8,219,081 per second"
>> 
>> ByteString findFirstInString: ',"', String crlf inSet: searchSet startingAt: 
>> 1.
>> 
>> Sven
>> 
>>> Tested on worst-case scenario - strings don't contain tested symbols.
>>> 
>>> s10 := 'a' repeat: 10.
>>> s100 := 'a' repeat: 100.
>>> 
>>> [ {'"'. ','. String cr. String lf } anySatisfy: [ :each | s10 
>>> includesSubstring: each ] ] bench. "'1,200,046 per second'"
>>> [ {'"'. ','. String cr. String lf } anySatisfy: [ :each | s100 
>>> includesSubstring: each ] ] bench. "'495,482 per second'"
>>> 
>>> [ s10 includesAny: { $,. $". Character cr. Character lf } ] bench. 
>>> "'2,819,416 per second'"
>>> [ s100 includesAny: { $,. $". Character cr. Character lf } ] bench. 
>>> "'2,200,668 per second'"
>>> 
>>> [ ByteString findFirstInString: s10 inCharacterSet: ',"', String crlf 
>>> startingAt: 1 ] bench. "'1,187,324 per second'"
>>> [ ByteString findFirstInString: s100 inCharacterSet: ',"', String crlf 
>>> startingAt: 1 ] bench. "'165,526 per second'"
>>> 
>>> 
>>> #includesAny: seems to be the best by far.
>>> 
>>> Storing the tested characters didn't improve it by much.
>>> 
>>> Peter
>>> 
>>> On Sat, Jul 22, 2017 at 06:51:31PM +0200, Sven Van Caekenberghe wrote:
>>>> Hi Peter,
>>>> 
>>>>> On 22 Jul 2017, at 14:12, Peter Uhnak <i.uh...@gmail.com> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> this is a continuation of an older thread about quoting fields only when 
>>>>> necessary. ( 
>>>>> http://forum.world.st/NeoCSVWriter-automatic-quotes-td4924781.html )
>>>>> 
>>>>> I've attached fileouts of NeoCSV packages with the addition (I don't know 
>>>>> if Metacello can file-out only changesets).
>>>>> 
>>>>> The change doesn't affect the default behavior, only when explicitly 
>>>>> requested.
>>>>> 
>>>>> Peter
>>>> 
>>>> I accepted your changes as such, the .mcz's were copied over to the main 
>>>> repositories. This is a pure & clean extension, so that is good. Thank you.
>>>> 
>>>> This option is always going to be slower, but the current implementation 
>>>> might be improved, I think.
>>>> 
>>>> The key test in #writeOptionalQuotedField:
>>>> 
>>>> {
>>>> lineEnd asString.
>>>> separator asString.
>>>> '"' } anySatisfy: [ :each | string includesSubstring: each ]
>>>> 
>>>> will be quite slow. 
>>>> 
>>>> If we accept a little bit of (over safe) error on EOL and use any 
>>>> occurrence of CR or LF as needing a quote, we could switch to characters 
>>>> to test for. There exists a fast (primitive) test, 
>>>> #findFirstInString:inSet:startingAt: that can do all the testing in one 
>>>> go. If your version turns out to be slow, we could try that, if 
>>>> measurements show a difference.
>>>> 
>>>> Regards,
>>>> 
>>>> Sven 
>>>> 
>>>> 
>>> 
>> 
>> 
> 


Reply via email to