This is what I was thinking. I just assumed that the fill() time would be 
constant for both and factored that out, not knowing that malloc() was lazy.

I get similar results for Stefan's bench, although the variance is large.
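
For anyone rerunning this on a current Julia, a minimal sketch of that comparison might look like the following. It assumes Julia 1.x syntax (`Vector{Int}(undef, n)` in place of the 0.3-era `Array(Int,N)` used in this thread); the function names and the smaller `N` are placeholders for illustration, not anything from the thread.

```julia
# Sketch only: assumes Julia 1.x. The thread used N = 10^9 (about 8 GB);
# a smaller, hypothetical N is used here so it runs on modest machines.
N = 10^7

# Case A: allocate without initializing, then initialize once.
function alloc_then_fill(n)
    x = Vector{Int}(undef, n)  # contents are arbitrary at this point
    fill!(x, 1)                # single pass that writes every element
    return x
end

# Case B: allocate zero-initialized, then initialize again.
function zeros_then_fill(n)
    x = zeros(Int, n)          # first pass writes zeros
    fill!(x, 1)                # second pass overwrites them
    return x
end

# Warm-up calls so compilation time is not included in the timings.
alloc_then_fill(10)
zeros_then_fill(10)

t_a = @elapsed alloc_then_fill(N)
t_b = @elapsed zeros_then_fill(N)
println("uninitialized + fill!: ", t_a, " s")
println("zeros + fill!:         ", t_b, " s")
```

Timing each case as a whole keeps the page-touching cost inside both measurements, which is the point of the comparison; expect the numbers to vary a lot from run to run.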

On Monday, November 24, 2014 6:20:21 PM UTC-6, Stefan Karpinski wrote:
>
> Should the comparison actually be more like this:
>
> julia> @time begin
>            x = Array(Int,N)
>            fill!(x,1)
>        end;
> elapsed time: 6.782572096 seconds (8000000128 bytes allocated)
>
> julia> @time begin
>            x = zeros(Int,N)
>            fill!(x,1)
>        end;
> elapsed time: 14.166256835 seconds (8000000176 bytes allocated)
>
>
> At least that's the comparison that makes sense for code that allocates 
> and then initializes an array. I consistently see a 2x slowdown or more.
>
> On Mon, Nov 24, 2014 at 7:09 PM, Jameson Nash <vtj...@gmail.com> wrote:
>
>> > But you initialized it in both cases. 
>>
>> Yes.
>>
>> > Is there a compiler optimization going on here that combines the 
>> > zeros() and fill()?
>>
>> No.
>>
>> But there is a kernel optimization going on that complicates this 
>> measurement. Approximately, the memory requested by `malloc` (& friends) is 
>> not actually allocated until you try to read from or write to it. So there 
>> are in fact 3 effects here (roughly speaking, they are malloc, 
>> A[1:4096:end], and fill()), where the second operation (the first touch of 
>> each page) is unavoidable and orders of magnitude slower than the other 
>> two. You measured the speed of 1 vs. 1+2+3, whereas I measured the speed 
>> of 1+2+3 vs. 1+2+3+3.
>>
>> On Mon Nov 24 2014 at 6:59:50 PM David Smith <david...@gmail.com> wrote:
>>
>>> But you initialized it in both cases.  Is there a compiler optimization 
>>> going on here that combines the zeros() and fill()?
>>>
>>>
>>> On Monday, November 24, 2014 5:12:56 PM UTC-6, Jameson wrote:
>>>
>>>> Yes. The point is to compare the cost of implicitly calling `zero` 
>>>> (resulting in the equivalent of calling zero twice) to the cost of not 
>>>> initializing the memory before writing to it. I could alternatively have 
>>>> done: `@time x=zeros(); @time fill(x, 0)` to measure the same information.
>>>>
>>>> On Mon Nov 24 2014 at 5:57:29 PM David Smith <david...@gmail.com> 
>>>> wrote:
>>>>
>>>>> Did you mean to call zeros() in both cases?
>>>>>
>>>>>
>>>>> On Monday, November 24, 2014 3:09:38 PM UTC-6, Jameson wrote:
>>>>>
>>>>>> It appears the fill operation accounts for about 0.15 seconds of the 
>>>>>> 6.15 seconds that my OS X laptop takes to create this array:
>>>>>>
>>>>>> $ ./julia -q
>>>>>>
>>>>>> julia> N=10^9
>>>>>> 1000000000
>>>>>>
>>>>>> julia> @time begin x=zeros(Int64,N); fill(x,0) end
>>>>>> elapsed time: 6.325660691 seconds (8000136616 bytes allocated, 1.71% gc time)
>>>>>> 0-element Array{Array{Int64,1},1}
>>>>>>
>>>>>> $ ./julia -q
>>>>>>
>>>>>> julia> N=10^9
>>>>>> 1000000000
>>>>>>
>>>>>> julia> @time x=zeros(Int64,N)
>>>>>> elapsed time: 6.160623835 seconds (8000014320 bytes allocated, 0.22% gc time)
>>>>>>
>>>>>> On Mon Nov 24 2014 at 3:18:39 PM Erik Schnetter <schn...@cct.lsu.edu> 
>>>>>> wrote:
>>>>>>
>>>>>>> On Mon, Nov 24, 2014 at 3:01 PM, David Smith <david...@gmail.com> 
>>>>>>> wrote:
>>>>>>> > To add some data to this conversation, I just timed allocating a 
>>>>>>> > billion Int64s on my MacBook, and I got this (I ran these multiple 
>>>>>>> > times before this and got similar timings):
>>>>>>> >
>>>>>>> > julia> N=1_000_000_000
>>>>>>> > 1000000000
>>>>>>> >
>>>>>>> > julia> @time x = Array(Int64,N);
>>>>>>> > elapsed time: 0.022577671 seconds (8000000128 bytes allocated)
>>>>>>> >
>>>>>>> > julia> @time x = zeros(Int64,N);
>>>>>>> > elapsed time: 3.95432248 seconds (8000000152 bytes allocated)
>>>>>>> >
>>>>>>> > So we are talking about possibly adding seconds to a program per 
>>>>>>> > large array allocation.
>>>>>>>
>>>>>>> This is not quite right -- the first does not actually map the pages
>>>>>>> into memory; this is only done lazily when they are accessed the
>>>>>>> first time. You need to compare "alloc uninitialized; then initialize
>>>>>>> once" with "alloc zero-initialized; then initialize again".
>>>>>>>
>>>>>>> Current high-end system architectures have memory write speeds of ten
>>>>>>> or twenty GByte per second; this is what you should see for very
>>>>>>> large arrays -- this would be about 0.4 seconds for your case (8 GByte
>>>>>>> at 20 GByte per second). For smaller arrays, the data would reside in
>>>>>>> the cache, so that the allocation overhead should be even smaller.
>>>>>>>
>>>>>>> -erik
>>>>>>>
>>>>>>> --
>>>>>>> Erik Schnetter <schn...@cct.lsu.edu>
>>>>>>> http://www.perimeterinstitute.ca/personal/eschnetter/
>>>>>>>
>>>>>>
>