> But you initialized it in both cases.

Yes.

> Is there a compiler optimization going on here that combines the zeros() and fill()?

No. But there is a kernel optimization going on that complicates this measurement. Approximately, the memory requested by `malloc` (& friends) is not actually allocated until you try to read or write to it. So there are in fact 3 effects here (roughly speaking, they are malloc, A[1:4096:end], and fill()), where that second operation is unavoidable, and orders of magnitude slower than the other two. You measured the speed of 1 vs. 1+2+3. Whereas I measured the speed of 1+2+3 vs 1+2+3+3.

On Mon Nov 24 2014 at 6:59:50 PM David Smith <david.sm...@gmail.com> wrote:

> But you initialized it in both cases. Is there a compiler optimization
> going on here that combines the zeros() and fill()?
>
> On Monday, November 24, 2014 5:12:56 PM UTC-6, Jameson wrote:
>
>> yes. the point is to compare the cost of implicitly calling `zero`
>> (resulting in the equivalent of calling zero twice) to the cost of not
>> initializing the memory before writing to it. I could alternatively have
>> done: `@time x=zeros(); @time fill(x, 0)` to measure the same information.
>>
>> On Mon Nov 24 2014 at 5:57:29 PM David Smith <david...@gmail.com> wrote:
>>
>>> Did you mean to call zeros() in both cases?
>>>
>>> On Monday, November 24, 2014 3:09:38 PM UTC-6, Jameson wrote:
>>>
>>>> It appears the fill operation accounts for about 0.15 seconds of the
>>>> 6.15 seconds that my OS X laptop takes to create this array:
>>>>
>>>> $ ./julia -q
>>>> julia> N=10^9
>>>> 1000000000
>>>>
>>>> julia> @time begin x=zeros(Int64,N); fill(x,0) end
>>>> elapsed time: 6.325660691 seconds (8000136616 bytes allocated, 1.71% gc time)
>>>> 0-element Array{Array{Int64,1},1}
>>>>
>>>> $ ./julia -q
>>>> julia> N=10^9
>>>> 1000000000
>>>>
>>>> julia> @time x=zeros(Int64,N)
>>>> elapsed time: 6.160623835 seconds (8000014320 bytes allocated, 0.22% gc time)
>>>>
>>>> On Mon Nov 24 2014 at 3:18:39 PM Erik Schnetter <schn...@cct.lsu.edu> wrote:
>>>>
>>>>> On Mon, Nov 24, 2014 at 3:01 PM, David Smith <david...@gmail.com> wrote:
>>>>> > To add some data to this conversation, I just timed allocating a
>>>>> > billion Int64s on my macbook, and I got this (I ran these multiple
>>>>> > times before this and got similar timings):
>>>>> >
>>>>> > julia> N=1_000_000_000
>>>>> > 1000000000
>>>>> >
>>>>> > julia> @time x = Array(Int64,N);
>>>>> > elapsed time: 0.022577671 seconds (8000000128 bytes allocated)
>>>>> >
>>>>> > julia> @time x = zeros(Int64,N);
>>>>> > elapsed time: 3.95432248 seconds (8000000152 bytes allocated)
>>>>> >
>>>>> > So we are talking adding possibly seconds to a program per large
>>>>> > array allocation.
>>>>>
>>>>> This is not quite right -- the first does not actually map the pages
>>>>> into memory; this is only done lazily when they are accessed the first
>>>>> time. You need to compare "alloc uninitialized; then initialize once"
>>>>> with "alloc zero-initialized; then initialize again".
>>>>>
>>>>> Current high-end system architectures have memory write speeds of ten
>>>>> or twenty GByte per second; this is what you should see for very large
>>>>> arrays -- this would be about 0.4 seconds for your case. For smaller
>>>>> arrays, the data would reside in the cache, so that the allocation
>>>>> overhead should be significantly smaller even.
>>>>>
>>>>> -erik
>>>>>
>>>>> --
>>>>> Erik Schnetter <schn...@cct.lsu.edu>
>>>>> http://www.perimeterinstitute.ca/personal/eschnetter/
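
For anyone who wants to repeat the comparison discussed in this thread, here is a rough sketch that times the three effects separately (allocate, first touch of each page, fill) and then compares "initialize once" with "initialize twice". It is written in current Julia syntax, which is an assumption relative to the thread: `Array(Int64, N)` is spelled `Vector{Int64}(undef, N)` today. The helper names, the small `N`, and the 4 KiB page size are all hypothetical choices for illustration, and the timings will vary by OS and Julia version. Note also that `fill(x, 0)` without the `!` constructs a new 0-element array rather than writing into `x`, which matches the `0-element Array{Array{Int64,1},1}` result shown in the quoted session; `fill!(x, 0)` is the in-place form.

```julia
# Sketch only: sizes and helper names are illustrative, not from the thread.
N = 10^7                      # the thread used 10^9 (8 GB); kept small here
STRIDE = 4096 ÷ sizeof(Int64) # elements per page, assuming 4 KiB pages

# Effect 2: write one element per page so the OS has to map every page.
function touch_pages!(x)
    for i in 1:STRIDE:length(x)
        x[i] = 0
    end
    return x
end

# "alloc uninitialized; then initialize once" (effects 1+2+3)
function alloc_then_fill(n)
    x = Vector{Int64}(undef, n)
    fill!(x, 0)
    return x
end

# "alloc zero-initialized; then initialize again" (effects 1+2+3+3)
function zeros_then_fill(n)
    x = zeros(Int64, n)
    fill!(x, 0)
    return x
end

# Warm up so compilation is not included in the timings.
alloc_then_fill(10); zeros_then_fill(10); touch_pages!(Vector{Int64}(undef, 10))

t_alloc = @elapsed Vector{Int64}(undef, N)               # effect 1 only
x = Vector{Int64}(undef, N)
t_touch = @elapsed touch_pages!(x)                       # effect 2
t_fill  = @elapsed fill!(x, 0)                           # effect 3, pages already mapped
t_once  = @elapsed alloc_then_fill(N)
t_twice = @elapsed zeros_then_fill(N)

println("alloc only:       ", t_alloc, " s")
println("touch pages:      ", t_touch, " s")
println("fill (mapped):    ", t_fill, " s")
println("initialize once:  ", t_once, " s")
println("initialize twice: ", t_twice, " s")

# fill (no bang) returns a new array of length 0 and leaves its argument alone.
small = [1, 2, 3]
empty_result = fill(small, 0)

# Erik's estimate: bytes written divided by an assumed 20 GByte/s write speed.
bytes = 10^9 * sizeof(Int64)
estimate_seconds = bytes / 20e9
println("estimated write time for 10^9 Int64s: ", estimate_seconds, " s")
```

The difference `t_twice - t_once` is the quantity of interest in the discussion above: the extra cost of the second initialization pass, as opposed to the cost of mapping the pages, which is paid either way.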