On Tue, Jul 21, 2015 at 7:08 PM, Jameson Nash <vtjn...@gmail.com> wrote:
> does `copy` work? although `bytestring` also seems like a good method for
> this also. it seems wrong to me also that `match` is making a copy of the
> original string (if that is indeed what it is doing)

Isn't it `s[i:end]` that is doing the copy?

>
> On Tue, Jul 21, 2015 at 6:57 PM andrew cooke <and...@acooke.org> wrote:
>>
>>
>> string(bytestring(...)) seems to do it. would appreciate any more
>> efficient solutions (and confirmation the analysis is correct - is this
>> worth filing as an issue?)
>>
>>
>> On Tuesday, 21 July 2015 19:33:05 UTC-3, andrew cooke wrote:
>>>
>>>
>>> well, this was fun... the following code rapidly triggers the OOM killer
>>> on my machine (julia 0.4 trunk):
>>>
>>> s = repeat("a", 1000000)
>>> l = Any[]
>>> r = r"^\w"
>>>
>>> for i in 1:length(s)
>>>     m = match(r, s[i:end])
>>>     push!(l, m.match)
>>> end
>>>
>>> note that: (1) the regexp is only matching one character, so the array l
>>> is at most a million characters long.
>>>
>>> what i think is happening (but this is only a guess) is that s[i:end] is
>>> being passed through to the c level regexp library as a new string. the
>>> result (m.match) is then a substring into that. because the substring is
>>> kept around, the backing string cannot be collected. and so there's an n^2
>>> memory use.
>>>
>>> ideally, i don't think a new copy of the string should be passed to the
>>> regexp engine. maybe i am wrong?
>>>
>>> anyway, for now, if the above is right, i need some way to copy m.match.
>>> as far as i can tell string() doesn't help. so what works? or am i wrong?
>>>
>>> thanks,
>>> andrew
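
For what it's worth, here is a rough sketch of the two workarounds discussed above. It assumes a recent Julia 1.x rather than the 0.4 trunk in the original report, so `String(...)` stands in for `bytestring(...)`, and the string is shortened to 1000 characters so it runs quickly. The names `copies` and `views` are just illustrative. Option 1 keeps the slice but copies the one-character match, so each sliced string can be collected. Option 2 passes a `SubString` view so no new backing string is created in the first place.

```julia
# Sketch only, assuming Julia 1.x semantics (not the 0.4 trunk from the report).
s = repeat("a", 1000)   # shortened from 1000000 so the example runs quickly
r = r"^\w"

# Option 1: keep the slice, but copy the match so the slice can be collected.
copies = String[]
for i in 1:length(s)
    m = match(r, s[i:end])          # s[i:end] allocates a new String
    push!(copies, String(m.match))  # independent one-character String
end

# Option 2: avoid the slice copy by matching against a view into s.
views = SubString{String}[]
for i in 1:length(s)
    m = match(r, SubString(s, i))   # view from index i to the end of s
    push!(views, m.match)           # SubString that refers back to s
end
```

With option 2 the stored matches all point into the single original string, so memory stays linear in the length of `s`. Option 1 is the closer analogue of the `string(bytestring(...))` fix, at the cost of still allocating each slice temporarily.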