Ah, ByteArrayMap, I was trying ByteArray.

Not sure what the next step is here: should I add it and send you the .mcz's, 
or will you do it yourself? (It is a simple change.)
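
For reference, the change I have in mind (just a sketch; the variable names 
are hypothetical and the real method would use the writer's configured 
separator):

"Sketch: build the 256-entry byte map once from the actual separator,
then a single primitive call tests the whole field."
map := (CharacterSet newFrom: (String with: separator with: $"), String crlf)
    byteArrayMap.
needsQuotes := (ByteString
    findFirstInString: string
    inSet: map
    startingAt: 1) > 0.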

Peter

On Sat, Jul 22, 2017 at 10:51:42PM +0200, Sven Van Caekenberghe wrote:
> Peter,
> 
> > On 22 Jul 2017, at 22:27, Peter Uhnak <i.uh...@gmail.com> wrote:
> > 
> > Hi Sven,
> > 
> > my use case was hand-edited CSVs (therefore the quotes are extra clutter), 
> > which implies that I would be hand-viewing/editing only small CSVs (no 
> > performance issues).
> > 
> > I agree that we should err on the safe side with CR & LF (e.g. tools may 
> > sometimes autoconvert line endings).
> > 
> > Regarding performance:
> > 
> > #findFirstInString:inSet:startingAt: didn't work for me (not sure if it's 
> > a bug or I don't know how to use it), so I've tried with inCharacterSet:
> 
> Yes, it is a bitch to use (the #inSet: argument must be a 256-element 
> ByteArray map, not the characters themselves), and the version you used does 
> not use the primitive.
> 
> Try playing with this:
> 
> s10 := 'a' repeat: 10.
> s100 := 'a' repeat: 100.
> 
> "Build the 256-entry inclusion map by hand..."
> searchSet := ByteArray new: 256 withAll: 0.
> ',"', String crlf do: [ :each | searchSet at: each asInteger + 1 put: 1 ].
> 
> "...or, equivalently, via CharacterSet:"
> searchSet := (CharacterSet newFrom: ',"', String crlf) byteArrayMap.
> 
> [ ByteString findFirstInString: s10 inSet: searchSet startingAt: 1 ] bench.
>  "15,160,161 per second"
> [ ByteString findFirstInString: s100 inSet: searchSet startingAt: 1 ] bench.
>  "8,219,081 per second"
> 
> ByteString findFirstInString: ',"', String crlf inSet: searchSet startingAt: 1.
>  "answers 1: the comma, the very first character, is in the set"
> 
> Sven
> 
> > Tested on the worst-case scenario: the strings don't contain any of the 
> > tested characters.
> > 
> > s10 := 'a' repeat: 10.
> > s100 := 'a' repeat: 100.
> > 
> > [ {'"'. ','. String cr. String lf } anySatisfy: [ :each | s10 
> > includesSubstring: each ] ] bench. "'1,200,046 per second'"
> > [ {'"'. ','. String cr. String lf } anySatisfy: [ :each | s100 
> > includesSubstring: each ] ] bench. "'495,482 per second'"
> > 
> > [ s10 includesAny: { $,. $". Character cr. Character lf } ] bench. "'2,819,416 per second'"
> > [ s100 includesAny: { $,. $". Character cr. Character lf } ] bench. "'2,200,668 per second'"
> > 
> > [ ByteString findFirstInString: s10 inCharacterSet: ',"', String crlf startingAt: 1 ] bench. "'1,187,324 per second'"
> > [ ByteString findFirstInString: s100 inCharacterSet: ',"', String crlf startingAt: 1 ] bench. "'165,526 per second'"
> > 
> > 
> > #includesAny: seems to be the best by far.
> > 
> > Storing the tested characters in a variable beforehand didn't improve it 
> > by much.
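> > 
> > (By "storing" I mean hoisting the literal array out of the benchmarked 
> > block, e.g.:)
> > 
> > chars := { $,. $". Character cr. Character lf }.
> > [ s100 includesAny: chars ] bench.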
> > 
> > Peter
> > 
> > On Sat, Jul 22, 2017 at 06:51:31PM +0200, Sven Van Caekenberghe wrote:
> >> Hi Peter,
> >> 
> >>> On 22 Jul 2017, at 14:12, Peter Uhnak <i.uh...@gmail.com> wrote:
> >>> 
> >>> Hi,
> >>> 
> >>> this is a continuation of an older thread about quoting fields only when 
> >>> necessary. ( 
> >>> http://forum.world.st/NeoCSVWriter-automatic-quotes-td4924781.html )
> >>> 
> >>> I've attached fileouts of the NeoCSV packages with the addition (I don't 
> >>> know if Monticello can file out only changesets).
> >>> 
> >>> The change doesn't affect the default behavior, only when explicitly 
> >>> requested.
> >>> 
> >>> Peter
> >> 
> >> I accepted your changes as such; the .mcz's were copied over to the main 
> >> repositories. This is a pure & clean extension, so that is good. Thank you.
> >> 
> >> This option is always going to be slower, but the current implementation 
> >> might be improved, I think.
> >> 
> >> The key test in #writeOptionalQuotedField:
> >> 
> >> {
> >>  lineEnd asString.
> >>  separator asString.
> >>  '"' } anySatisfy: [ :each | string includesSubstring: each ]
> >> 
> >> will be quite slow: it allocates a fresh array and may scan the string up 
> >> to three times for every field.
> >> 
> >> If we accept a little bit of (over-safe) error on EOL and treat any 
> >> occurrence of CR or LF as needing a quote, we could test for individual 
> >> characters instead of substrings. There is also a fast (primitive) test, 
> >> #findFirstInString:inSet:startingAt:, that can do all the testing in one 
> >> go. If your version turns out to be slow and measurements show a 
> >> difference, we could try that.
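> >> 
> >> For example, the character-based test could look like this (a sketch, 
> >> assuming the separator is a Character; over-safe in that any CR or LF 
> >> triggers quoting):
> >> 
> >> string includesAny: { separator. $". Character cr. Character lf }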
> >> 
> >> Regards,
> >> 
> >> Sven 