string(bytestring(...)) seems to do it. i'd appreciate any more efficient solutions, and confirmation that the analysis is correct: is this worth filing as an issue?
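For readers on a current Julia (1.x): `bytestring` has since been removed from Base. Below is a minimal sketch of the same "copy the match" workaround. It assumes that `String(m.match)` is an acceptable modern stand-in, since it copies the matched bytes out of the temporary string. The string length is reduced from the original 1000000 only so that the sketch runs quickly.

```julia
# Sketch assuming Julia 1.x: `bytestring` no longer exists, so String(::SubString)
# is used here to make an independent copy of each match.
s = repeat("a", 10_000)   # smaller than the original 1_000_000 so it runs quickly
l = String[]
r = r"^\w"

for i in 1:length(s)          # s is ASCII, so character and byte indices coincide
    m = match(r, s[i:end])    # s[i:end] still allocates a fresh String on every iteration
    push!(l, String(m.match)) # copy the 1-char match so the temporary tail can be freed
end
```

This still allocates a fresh copy of the tail on every iteration. The difference is that nothing keeps those copies alive once the loop moves on, so total memory stays roughly linear instead of quadratic.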
On Tuesday, 21 July 2015 19:33:05 UTC-3, andrew cooke wrote:
>
> well, this was fun... the following code rapidly triggers the OOM killer
> on my machine (julia 0.4 trunk):
>
> s = repeat("a", 1000000)
> l = Any[]
> r = r"^\w"
>
> for i in 1:length(s)
>     m = match(r, s[i:end])
>     push!(l, m.match)
> end
>
> note that the regexp only matches one character, so l ends up holding
> at most a million one-character strings.
>
> what i think is happening (but this is only a guess) is that s[i:end] is
> being passed through to the C-level regexp library as a new string. the
> result (m.match) is then a substring into that. because the substring is
> kept around, the backing string cannot be collected, so memory use grows
> as n^2.
>
> ideally, i don't think a new copy of the string should be passed to the
> regexp engine. maybe i am wrong?
>
> anyway, for now, if the above is right, i need some way to copy m.match.
> as far as i can tell string() doesn't help. so what works? or am i wrong?
>
> thanks,
> andrew
>
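On the point that no new copy should be passed to the regexp engine, here is a sketch that avoids the per-iteration copy entirely. It assumes Julia 1.x, where `match` accepts a `SubString{String}` and `SubString(s, i)` is a view into `s` rather than a copy. Under that assumption, `^` anchors at the start of the view. Each `m.match` then points back into `s` itself, which is alive anyway, so no quadratic garbage is retained.

```julia
# Sketch assuming Julia 1.x: match() accepts SubString{String}, and
# SubString(s, i) is a view of s[i:end] that shares s's memory (no copy).
s = repeat("a", 10_000)
l = SubString{String}[]
r = r"^\w"

for i in eachindex(s)
    m = match(r, SubString(s, i))  # regex sees only the view, so ^ anchors at index i
    m === nothing && continue      # skip positions where nothing matches
    push!(l, m.match)              # m.match is a view into s itself, not into a temporary
end
```

If `l` has to outlive `s`, or `s` is very large and only a few matches are kept, wrapping each `m.match` in `String(...)` copies just the matched characters and lets `s` be collected.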