First, nice job paying attention to the memory allocation. Many people seem not to notice this, but it's critical to analyzing performance problems.
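For anyone who wants to look at the allocation numbers in isolation, here is a minimal sketch. It assumes a recent Julia (1.x), where `collect(1:1000)` and `Iterators.filter` take the place of the `[1:1000]` and `Filter` spellings in the quoted message. The names `sum_skip` and `sum_filtered` are hypothetical, chosen only for illustration.

```julia
# Hand-written loop that skips one element (mirrors f() from the question).
function sum_skip(v)
    s = 0
    for x in v
        x == 499 && continue
        s += x
    end
    return s
end

# Same computation through a lazy filtered iterator.
# Assumption: Iterators.filter is the current equivalent of the 2014-era Filter.
function sum_filtered(v)
    s = 0
    for x in Iterators.filter(x -> x != 499, v)
        s += x
    end
    return s
end

v = collect(1:1000)

# Call each function once first so compilation is not counted below.
sum_skip(v)
sum_filtered(v)

# @allocated reports the bytes allocated while evaluating the expression.
println("loop:     ", @allocated(sum_skip(v)), " bytes")
println("filtered: ", @allocated(sum_filtered(v)), " bytes")
```

The exact byte counts depend on the Julia version, so it is more useful to compare the two numbers against each other than against the figures in the quoted message.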
When you have questions like this, one nice tool is ProfileView, which can help you identify the line(s) that are triggering garbage collection (which is coupled, somewhat loosely, to allocation). This identifies iterator.jl:advance_filter:line 88 as the primary culprit.

You might suspect that it would make it better if you made the filtering function a generic function, e.g.,
    not499(x) = x != 499
rather than an anonymous function,
    x->x != 499
It does help, a little bit. But the bigger problem is that functions passed as arguments are not currently inlined. See issues #3426 and #210. This is a "deep" issue and rather nontrivial to fix. It's on the TODO list, however; see issue #3440.

--Tim

On Sunday, April 27, 2014 12:20:03 AM Spencer Liang wrote:
> Sometimes when I have a collection, I would like to be able to iterate over
> all but a few elements. It seems like the best way to do this in a
> functional style is to filter the original iterator. Looking at the code in
> base/iterator.jl, it appears that this should be efficient. However, this
> doesn't seem to be the case in the test I tried:
>
> function f()
>     v = [1:1000]
>     s = 0
>     for x in v
>         x == 499 && continue
>         s += x
>     end
>     return s
> end
>
>
> function g()
>     v = [1:1000]
>     s = 0
>     for x in Filter(x -> x != 499, v)
>         s += x
>     end
>     return s
> end
>
>
> function h()
>     v = [1:1000]
>     s = 0
>     for x in filter(x -> x != 499, v)
>         s += x
>     end
>     return s
> end
>
>
> println(f())
> println(g())
> println(h())
>
>
> @time for i in 1:10000; f(); end;
> @time for i in 1:10000; g(); end;
> @time for i in 1:10000; h(); end;
>
> Function f() is what I would have to do "by hand," function g() is using a
> filtered iterator, and function h() is filtering the array itself.
>
> Here are the results I got:
>
> 500001
> 500001
> 500001
> elapsed time: 0.074505435 seconds (80640000 bytes allocated)
> elapsed time: 0.422853779 seconds (159120000 bytes allocated)
> elapsed time: 0.533385388 seconds (330880000 bytes allocated)
>
> Why does this happen? Where are all the extra bytes being allocated in g()?