> On 23 Jul 2017, at 09:55, Peter Uhnak <i.uh...@gmail.com> wrote: > > Ah, ByteArrayMap, I was trying ByteArray.
ByteArrayMap is not a class, it is still a ByteArray, but of size 256 used as an inclusion map for characters that fit a byte. BTW, using CharacterSet as an abstraction makes the trick easier. I committed: === Name: Neo-CSV-Core-SvenVanCaekenberghe.26 Author: SvenVanCaekenberghe Time: 23 July 2017, 1:54:22.095185 pm UUID: e429f87e-5c11-0d00-a72d-ae5d00db2a8c Ancestors: Neo-CSV-Core-PeterUhnak.25 make the core test in #writeOptionalQuotedField faster by using a primitive when possible add #testWideOptionalQuoted === Name: Neo-CSV-Tests-SvenVanCaekenberghe.22 Author: SvenVanCaekenberghe Time: 23 July 2017, 1:54:52.133003 pm UUID: 5f81c280-5c11-0d00-a72e-7cd500db2a8c Ancestors: Neo-CSV-Tests-PeterUhnak.21 make the core test in #writeOptionalQuotedField faster by using a primitive when possible add #testWideOptionalQuoted === Extra caution was needed for WideStrings. Sven > Not sure what the next step is here: should I add it and send you mczs, or > will you do it yourself? (it is a simple change). > > Peter > > On Sat, Jul 22, 2017 at 10:51:42PM +0200, Sven Van Caekenberghe wrote: >> Peter, >> >>> On 22 Jul 2017, at 22:27, Peter Uhnak <i.uh...@gmail.com> wrote: >>> >>> Hi Sven, >>> >>> my use case was hand-edited CSVs (therefore the quotes are extra clutter), >>> which imples that I would be hand-viewing/editing only small CSVs (no >>> performance issues). >>> >>> I agree that we should err on the safe side with CR & LF (e.g. tools may >>> sometimes autoconvert line endings). >>> >>> Regarding performance: >>> >>> #findFirstInString:inSet:startingAt: didn't work for me (not sure if bug, >>> or I don't know how to use), so I've trried with inCharacterSet: >> >> Yes, it is a bitch to use, the the version you used does not use the >> primitive. >> >> Try playing with this: >> >> s10 := 'a' repeat: 10. >> s100 := 'a' repeat: 100. >> >> searchSet := ByteArray new: 256 withAll: 0. >> ',"', String crlf do: [ :each | searchSet at: each asInteger + 1 put: 1 ]. >> >> searchSet := (CharacterSet newFrom: ',"', String crlf) byteArrayMap. >> >> [ ByteString findFirstInString: s10 inSet: searchSet startingAt: 1 ] bench. >> "15,160,161 per second" >> [ ByteString findFirstInString: s100 inSet: searchSet startingAt: 1 ] bench. >> "8,219,081 per second" >> >> ByteString findFirstInString: ',"', String crlf inSet: searchSet startingAt: >> 1. >> >> Sven >> >>> Tested on worst-case scenario - strings don't contain tested symbols. >>> >>> s10 := 'a' repeat: 10. >>> s100 := 'a' repeat: 100. >>> >>> [ {'"'. ','. String cr. String lf } anySatisfy: [ :each | s10 >>> includesSubstring: each ] ] bench. "'1,200,046 per second'" >>> [ {'"'. ','. String cr. String lf } anySatisfy: [ :each | s100 >>> includesSubstring: each ] ] bench. "'495,482 per second'" >>> >>> [ s10 includesAny: { $,. $". Character cr. Character lf } ] bench. >>> "'2,819,416 per second'" >>> [ s100 includesAny: { $,. $". Character cr. Character lf } ] bench. >>> "'2,200,668 per second'" >>> >>> [ ByteString findFirstInString: s10 inCharacterSet: ',"', String crlf >>> startingAt: 1 ] bench. "'1,187,324 per second'" >>> [ ByteString findFirstInString: s100 inCharacterSet: ',"', String crlf >>> startingAt: 1 ] bench. "'165,526 per second'" >>> >>> >>> #includesAny: seems to be the best by far. >>> >>> Storing the tested characters didn't improve it by much. >>> >>> Peter >>> >>> On Sat, Jul 22, 2017 at 06:51:31PM +0200, Sven Van Caekenberghe wrote: >>>> Hi Peter, >>>> >>>>> On 22 Jul 2017, at 14:12, Peter Uhnak <i.uh...@gmail.com> wrote: >>>>> >>>>> Hi, >>>>> >>>>> this is a continuation of an older thread about quoting fields only when >>>>> necessary. ( >>>>> http://forum.world.st/NeoCSVWriter-automatic-quotes-td4924781.html ) >>>>> >>>>> I've attached fileouts of NeoCSV packages with the addition (I don't know >>>>> if Metacello can file-out only changesets). >>>>> >>>>> The change doesn't affect the default behavior, only when explicitly >>>>> requested. >>>>> >>>>> Peter >>>> >>>> I accepted your changes as such, the .mcz's were copied over to the main >>>> repositories. This is a pure & clean extension, so that is good. Thank you. >>>> >>>> This option is always going to be slower, but the current implementation >>>> might be improved, I think. >>>> >>>> The key test in #writeOptionalQuotedField: >>>> >>>> { >>>> lineEnd asString. >>>> separator asString. >>>> '"' } anySatisfy: [ :each | string includesSubstring: each ] >>>> >>>> will be quite slow. >>>> >>>> If we accept a little bit of (over safe) error on EOL and use any >>>> occurrence of CR or LF as needing a quote, we could switch to characters >>>> to test for. There exists a fast (primitive) test, >>>> #findFirstInString:inSet:startingAt: that can do all the testing in one >>>> go. If your version turns out to be slow, we could try that, if >>>> measurements show a difference. >>>> >>>> Regards, >>>> >>>> Sven >>>> >>>> >>> >> >> >