First, nice job paying attention to the memory allocation. Many people seem 
not to notice this, but it's critical to analyzing performance problems.

When you have questions like this, one nice tool is ProfileView, which can help 
you identify the line(s) that are triggering garbage collection (which is 
coupled, somewhat loosely, to allocation). In this case it points to 
iterator.jl:advance_filter:line 88 as the primary culprit.
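
For anyone who wants to try this, here is a rough sketch of a profiling run.
It assumes a recent Julia (1.x), where the lazy filter is spelled
Iterators.filter and the sampling profiler lives in the Profile standard
library; the loop count is simply the one from the original timing test. Treat
it as an illustration rather than the exact session that produced the result
above.

```julia
using Profile

# Stand-in for the filtered loop under discussion, in Julia 1.x syntax
# (an assumption; the original post predates this spelling).
function g()
    v = collect(1:1000)
    s = 0
    for x in Iterators.filter(x -> x != 499, v)
        s += x
    end
    return s
end

g()                # run once so that compilation is not profiled
Profile.clear()    # discard any samples collected earlier
@profile for i in 1:10000; g(); end

# Text report of where the samples landed, sorted by sample count.
Profile.print(format=:flat, sortedby=:count)

# With the ProfileView package installed, the same samples can be shown
# graphically:
#     using ProfileView; ProfileView.view()
```

Lines that trigger garbage collection tend to show up with a large share of
the samples, which is how a report like this can point at an allocation site.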

You might suspect that making the filtering function a generic function, e.g.,
    not499(x)  = x != 499
rather than an anonymous function,
    x->x != 499
would improve things. It does help, a little bit. But the bigger problem is 
that functions passed as arguments are not currently inlined. See issues #3426 
and #210. This is a "deep" issue and rather nontrivial to fix. It's on the TODO 
list, however; see issue #3440.
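
If you want to check the named-versus-anonymous difference yourself, something
along these lines should work. It is only a sketch: it assumes Julia 1.x
syntax, and sum_filtered and measure are hypothetical helper names made up for
this example. Later Julia releases changed how functions passed as arguments
are compiled, so the gap described above may well not reproduce there; the
point is just how to measure it.

```julia
# Named (generic) predicate, as suggested above.
not499(x) = x != 499

# Sum the elements of v that pass pred, using a lazy filter.
# (sum_filtered is a hypothetical helper for this example.)
function sum_filtered(pred, v)
    s = 0
    for x in Iterators.filter(pred, v)
        s += x
    end
    return s
end

# Return the bytes allocated by one call, after a warm-up call so that
# compilation is not counted. (measure is also a hypothetical helper.)
function measure(pred, v)
    sum_filtered(pred, v)
    return @allocated sum_filtered(pred, v)
end

v = collect(1:1000)
println("named predicate:     ", measure(not499, v), " bytes")
println("anonymous predicate: ", measure(x -> x != 499, v), " bytes")
```

Comparing the two numbers on your own Julia version tells you whether the
predicate style still matters for allocation.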

--Tim

On Sunday, April 27, 2014 12:20:03 AM Spencer Liang wrote:
> Sometimes when I have a collection, I would like to be able to iterate over
> all but a few elements. It seems like the best way to do this in a
> functional style is to filter the original iterator. Looking at the code in
> base/iterator.jl, it appears that this should be efficient. However, this
> doesn't seem to be the case in the test I tried:
> 
> function f()
>     v = [1:1000]
>     s = 0
>     for x in v
>         x == 499 && continue
>         s += x
>     end
>     return s
> end
> 
> 
> function g()
>     v = [1:1000]
>     s = 0
>     for x in Filter(x -> x != 499, v)
>         s += x
>     end
>     return s
> end
> 
> 
> function h()
>     v = [1:1000]
>     s = 0
>     for x in filter(x -> x != 499, v)
>         s += x
>     end
>     return s
> end
> 
> 
> println(f())
> println(g())
> println(h())
> 
> 
> @time for i in 1:10000; f(); end;
> @time for i in 1:10000; g(); end;
> @time for i in 1:10000; h(); end;
> 
> Function f() is what I would have to do "by hand", function g() is using a
> filtered iterator, and function h() is filtering the array itself.
> 
> Here are the results I got:
> 
> 500001
> 500001
> 500001
> elapsed time: 0.074505435 seconds (80640000 bytes allocated)
> elapsed time: 0.422853779 seconds (159120000 bytes allocated)
> elapsed time: 0.533385388 seconds (330880000 bytes allocated)
> 
> Why does this happen? Where are all the extra bytes being allocated in g()?
