Filling with random garbage is even more time-consuming than filling with zeros!

-viral

On Tuesday, November 25, 2014 1:26:29 AM UTC+5:30, Stefan Karpinski wrote:
>
> I guess part of the problem is that calling the `zeros` function may be 
> less obvious as a way of constructing an array to many new programmers than 
> calling the `Array` constructor. Having a Boolean fill keyword argument 
> approach might be reasonable, although calling it `zeroed` might be more 
> accurate since we won't be filling the array with a specific value but 
> rather zeroing the memory first. Alternatively, we could just fill the 
> array with random garbage intentionally so that programmers are made 
> painfully aware that they didn't initialize the array ;-)
>
> On Nov 24, 2014, at 2:39 PM, Tomas Lycken <tomas.lyc...@gmail.com> wrote:
>
> That *is* the default usage in most introductory settings - just don't 
> show them the Array(T,n) constructor, but give them zeros and ones 
> functions instead. (It's perfectly fine to do e.g. A = zeros(10);
> fill!(A, 5) if you don't care about the extra write...)
>
> If there is a specific setting where the students actually *need* to 
> allocate uninitialized memory (e.g. for speed), they are probably ready to 
> learn that the Array constructor gives them that.
>
> Julia's approach has so far seemed to be that the users are consenting 
> adults. I like that approach.
>
> // T
>
> On Monday, November 24, 2014 8:30:10 PM UTC+1, Ronald L. Rivest wrote:
>>
>> Regarding initialization:
>>
>>    -- I'm toying with the idea of recommending Julia for an introductory
>>       programming class (rather than Python).
>>
>>    -- For this purpose, the language should not have hazards that catch
>>       the unwary.
>>
>>    -- Not initializing storage is definitely a hazard.  With uninitialized
>>       storage, a program may run fine one day, and fail mysteriously the
>>       next, depending on the contents of memory.  This is about
>>       predictability, reliability, dependability, and correctness.
>>
>>    -- I would favor a solution like
>>              A = Array(Int64,n)                   -- fills with zeros
>>              A = Array(Int64,n,fill=1)          -- to fill with ones
>>              A = Array(Int64,n,fill=None)    -- for an uninitialized array
>>        so that the *default* is an initialized array, but the speed geeks
>>        can get what they want.
>>
>> Cheers,
>> Ron
>>
>> On Monday, November 24, 2014 1:57:14 PM UTC-5, Stefan Karpinski wrote:
>>>
>>> If we can make allocating zeroed arrays faster that's great, but unless 
>>> we can close the performance gap all the way and eliminate the need to 
>>> allocate uninitialized arrays altogether, this proposal is just a rename – 
>>> Unchecked.Array 
>>> plays the exact same role as the current Array constructor. It's 
>>> unclear that this would even address the original concern since it still 
>>> *allows* uninitialized allocation of arrays. This rename would just force 
>>> people who have used Array correctly in code that cares about being as 
>>> efficient as possible even for very large arrays to change their code and 
>>> use Unchecked.Array instead.
>>>
>>> On Nov 24, 2014, at 1:36 PM, Jameson Nash <vtj...@gmail.com> wrote:
>>>
>>> I think that Rivest’s question may be a good reason to rethink the 
>>> initialization of structs and offer the explicit guarantee that all 
>>> unassigned elements will be initialized to 0 (and not just the jl_value_t 
>>> pointers). I would argue that the current behavior resulted more from a 
>>> desire to avoid clearing the array twice (if the user is about to call 
>>> fill, zeros, ones, +, etc.) than an intentional, casual exposure of 
>>> uninitialized memory.
>>>
>>> A random array of integers is also a security concern if an attacker can 
>>> extract some other information (with some probability) about the state of 
>>> the program. Julia is not hardened by design, so you can’t safely run an 
>>> unknown code fragment, but you still might have an unintended memory 
>>> exposure in a client-facing app. While zero’ing memory doesn’t prevent the 
>>> user from simply reusing a memory buffer in a security-unaware fashion 
>>> (rather than consistently allocating a new one for each use), it’s not 
>>> clear to me that the performance penalty would be all that noticeable for 
>>> mapping Array(X) to zero(X) and only providing an internal constructor for 
>>> grabbing uninitialized memory (perhaps Base.Unchecked.Array(X) from 
>>> #8227).
>>>
>>> On Mon Nov 24 2014 at 12:57:22 PM Stefan Karpinski 
>>> <stefan.karpin...@gmail.com> wrote:
>>>
>>>> There are two rather different issues to consider:
>>>>
>>>> 1. Preventing problems due to inadvertent programmer errors.
>>>> 2. Preventing malicious security attacks.
>>>>
>>>> When we initially made this choice, it wasn't clear if 1 would be a big 
>>>> issue but we decided to see how it played out. It hasn't been a problem in 
>>>> practice: once people grok that the Array(T, dims...) constructor gives 
>>>> uninitialized memory and that the standard usage pattern is to call it and 
>>>> then immediately initialize the memory, everything is ok. I can't 
>>>> recall a single situation where someone has had some terrible bug due to 
>>>> uninitialized int/float arrays.
>>>>
>>>> Regarding 2, Julia is not intended to be a hardened language for 
>>>> writing highly secure software. It allows all sorts of unsafe actions: 
>>>> pointer arithmetic, direct memory access, calling arbitrary C functions, 
>>>> etc. The future of really secure software seems to be small formally 
>>>> verified kernels written in statically typed languages that communicate 
>>>> with larger unverified systems over restricted channels. Julia might be 
>>>> appropriate for the larger unverified system but certainly not for the 
>>>> trusted kernel. Adding enough verification to Julia to write secure 
>>>> kernels 
>>>> is not inconceivable, but would be a major research effort. The 
>>>> implementation would have to check lots of things, including, of course, 
>>>> ensuring that all arrays are initialized.
>>>>
>>>> A couple of other points:
>>>>
>>>> Modern OSes protect against data leaking between processes by zeroing 
>>>> pages before a process first accesses them. Thus any data exposed by 
>>>> Array(T, dims...) comes from the same process and is not a security leak.
>>>>
>>>> An uninitialized array of, say, integers is not in itself a security 
>>>> concern – the issue is what you do with those integers. The classic 
>>>> security hole is to use a "random" value from uninitialized memory to 
>>>> access other memory by using it to index into an array or otherwise 
>>>> convert 
>>>> it to a pointer. In the presence of bounds checking, however, this isn't 
>>>> actually a big concern since you will still either get a bounds error or a 
>>>> valid array value – not a meaningful one, of course, but still just a 
>>>> value.
>>>>
>>>> Writing programs that are secure against malicious attacks is a hard, 
>>>> unsolved problem. So is doing efficient, productive high-level numerical 
>>>> programming. Trying to solve both problems at the same time seems like a 
>>>> recipe for failing at both.
>>>>
>>>> On Nov 24, 2014, at 11:43 AM, David Smith <david...@gmail.com> wrote:
>>>>
>>>> Some ideas:
>>>>
>>>> Is there a way to return an error for accesses before at least one 
>>>> assignment in bits types?  I.e. when the object is created uninitialized 
>>>> it 
>>>> is marked "dirty" and only after assignment of some user values can it be 
>>>> "cleanly" accessed?
>>>>
>>>> Can Julia provide a thin memory management layer that grabs memory from 
>>>> the OS first, zeroes it, and then gives it to the user upon initial 
>>>> allocation?  After gc+reallocation it doesn't need to be zeroed again, 
>>>> unless the next allocation is larger than anything previous, at which time 
>>>> Julia grabs more memory, sanitizes it, and hands it off. 
>>>>
>>>> On Monday, November 24, 2014 2:48:05 AM UTC-6, Mauro wrote:
>>>>>
>>>>> Pointer types will initialise to undef and any operation on them fails: 
>>>>> julia> a = Array(ASCIIString, 5); 
>>>>>
>>>>> julia> a[1] 
>>>>> ERROR: access to undefined reference 
>>>>>  in getindex at array.jl:246 
>>>>>
>>>>> But you're right, for bits-types this is not an error and will just 
>>>>> return whatever was there before.  I think the reason this will stay 
>>>>> that way is that Julia is a numerics-oriented language.  Thus you may 
>>>>> want to create a 1GB array of Float64 and then fill it with something as 
>>>>> opposed to first filling it with zeros and then filling it with something. 
>>>>> See: 
>>>>>
>>>>> julia> @time b = Array(Float64, 10^9); 
>>>>> elapsed time: 0.029523638 seconds (8000000144 bytes allocated) 
>>>>>
>>>>> julia> @time c = zeros(Float64, 10^9); 
>>>>> elapsed time: 0.835062841 seconds (8000000168 bytes allocated) 
>>>>>
>>>>> You can argue that the time gain isn't worth the risk but I suspect that 
>>>>> others may feel differently. 
>>>>>
>>>>> On Mon, 2014-11-24 at 09:28, Ronald L. Rivest <rives...@gmail.com> 
>>>>> wrote: 
>>>>> > I am just learning Julia... 
>>>>> > 
>>>>> > I was quite shocked today to learn that Julia does *not* 
>>>>> > initialize allocated storage (e.g. to 0 or some default value). 
>>>>> > E.g. the code 
>>>>> >      A = Array(Int64,5) 
>>>>> >      println(A[1]) 
>>>>> > has unpredictable behavior, may disclose information from 
>>>>> > other modules, etc. 
>>>>> > 
>>>>> > This is really quite unacceptable in a modern programming 
>>>>> > language; it is as bad as not checking array reads for out-of-bounds 
>>>>> > indices.   
>>>>> > 
>>>>> > Google for "uninitialized security" to find numerous instances 
>>>>> > of security violations and unreliability problems caused by the 
>>>>> > use of uninitialized variables, and numerous security advisories 
>>>>> > warning of problems caused by the (perhaps inadvertent) use 
>>>>> > of uninitialized variables. 
>>>>> > 
>>>>> > You can't design a programming language today under the naive 
>>>>> > assumption that code in that language won't be used in highly 
>>>>> > critical applications or won't be under adversarial attack. 
>>>>> > 
>>>>> > You can't reasonably ask all programmers to properly initialize 
>>>>> > their allocated storage manually any more than you can ask them 
>>>>> > to test all indices before accessing an array manually; these are 
>>>>> > things that a high-level language should do for you. 
>>>>> > 
>>>>> > The default non-initialization of allocated storage is a 
>>>>> > mis-feature that should absolutely be fixed. 
>>>>> > 
>>>>> > There is no efficiency argument here in favor of uninitialized storage 
>>>>> > that can outweigh the security and reliability disadvantages... 
>>>>> > 
>>>>> > Cheers, 
>>>>> > Ron Rivest 
>>>>>
>>>
>>>
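
For readers on a current Julia release, the allocation choices discussed in this thread can be sketched roughly as follows. This assumes Julia 1.x syntax, where an uninitialized allocation is spelled `Vector{Float64}(undef, n)` rather than the `Array(Float64, n)` form quoted above. The variable names and the size `n` are illustrative only, and no timings are claimed here.

```julia
# Sketch only: assumes Julia 1.x; the 2014 thread used Array(T, n) instead.
n = 10^6  # illustrative size, not taken from the thread

# Uninitialized allocation: the contents are whatever was already in memory,
# so the array is filled immediately, as the thread recommends.
a = Vector{Float64}(undef, n)
fill!(a, 5.0)

# Zero-initialized allocation: always starts from a known state.
b = zeros(Float64, n)

# Allocate and fill with a chosen value in one call.
c = fill(5.0, n)

# All three arrays now hold well-defined values.
println(all(x -> x == 5.0, a), " ", all(iszero, b), " ", a == c)
```

Reading `a` before the `fill!` call would return arbitrary values for bits types such as `Float64`, which is the hazard the original post describes.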
