Filling random garbage is even more time consuming than filling zeros! -viral
On Tuesday, November 25, 2014 1:26:29 AM UTC+5:30, Stefan Karpinski wrote:
>
> I guess part of the problem is that calling the `zeros` function may be less obvious as a way of constructing an array to many new programmers than calling the `Array` constructor. Having a Boolean fill keyword argument approach might be reasonable, although calling it `zeroed` might be more accurate since we won't be filling the array with a specific value but rather zeroing the memory first. Alternatively, we could just fill the array with random garbage intentionally so that programmers are made painfully aware that they didn't initialize the array ;-)
>
> On Nov 24, 2014, at 2:39 PM, Tomas Lycken <tomas.lyc...@gmail.com> wrote:
>
> That *is* the default usage in most introductory settings - just don't show them the Array(T,n) constructor, but give them zeros and ones functions instead. (It's perfectly fine to do e.g. A = zeros(10); fill!(A, 5) if you don't care about the extra write...)
>
> If there is a specific setting where the students actually *need* to allocate uninitialized memory (e.g. for speed), they are probably ready to learn that the Array constructor gives them that.
>
> Julia's approach has so far seemed to be that the users are consenting adults. I like that approach.
>
> // T
>
> On Monday, November 24, 2014 8:30:10 PM UTC+1, Ronald L. Rivest wrote:
>>
>> Regarding initialization:
>>
>> -- I'm toying with the idea of recommending Julia for an introductory programming class (rather than Python).
>>
>> -- For this purpose, the language should not have hazards that catch the unwary.
>>
>> -- Not initializing storage is definitely a hazard. With uninitialized storage, a program may run fine one day, and fail mysteriously the next, depending on the contents of memory. This is about predictability, reliability, dependability, and correctness.
>>
>> -- I would favor a solution like
>>        A = Array(Int64,n)            -- fills with zeros
>>        A = Array(Int64,n,fill=1)     -- to fill with ones
>>        A = Array(Int64,n,fill=None)  -- for an uninitialized array
>> so that the *default* is an initialized array, but the speed geeks can get what they want.
>>
>> Cheers,
>> Ron
>>
>> On Monday, November 24, 2014 1:57:14 PM UTC-5, Stefan Karpinski wrote:
>>>
>>> If we can make allocating zeroed arrays faster that's great, but unless we can close the performance gap all the way and eliminate the need to allocate uninitialized arrays altogether, this proposal is just a rename – Unchecked.Array plays the exact same role as the current Array constructor. It's unclear that this would even address the original concern since it still *allows* uninitialized allocation of arrays. This rename would just force people who have used Array correctly in code that cares about being as efficient as possible even for very large arrays to change their code and use Unchecked.Array instead.
>>>
>>> On Nov 24, 2014, at 1:36 PM, Jameson Nash <vtj...@gmail.com> wrote:
>>>
>>> I think that Rivest’s question may be a good reason to rethink the initialization of structs and offer the explicit guarantee that all unassigned elements will be initialized to 0 (and not just the jl_value_t pointers). I would argue that the current behavior resulted more from a desire to avoid clearing the array twice (if the user is about to call fill, zeros, ones, +, etc.) than an intentional, casual exposure of uninitialized memory.
>>>
>>> A random array of integers is also a security concern if an attacker can extract some other information (with some probability) about the state of the program. Julia is not hardened by design, so you can’t safely run an unknown code fragment, but you still might have an unintended memory exposure in a client-facing app.
>>> While zero’ing memory doesn’t prevent the user from simply reusing a memory buffer in a security-unaware fashion (rather than consistently allocating a new one for each use), it’s not clear to me that the performance penalty would be all that noticeable for mapping Array(X) to zero(X), and only providing an internal constructor for grabbing uninitialized memory (perhaps Base.Unchecked.Array(X) from #8227)
>>>
>>> On Mon Nov 24 2014 at 12:57:22 PM Stefan Karpinski <stefan.karpin...@gmail.com> wrote:
>>>
>>> There are two rather different issues to consider:
>>>>
>>>> 1. Preventing problems due to inadvertent programmer errors.
>>>> 2. Preventing malicious security attacks.
>>>>
>>>> When we initially made this choice, it wasn't clear if 1 would be a big issue but we decided to see how it played out. It hasn't been a problem in practice: once people grok that the Array(T, dims...) constructor gives uninitialized memory and that the standard usage pattern is to call it and then immediately initialize the memory, everything is ok. I can't recall a single situation where someone has had some terrible bug due to uninitialized int/float arrays.
>>>>
>>>> Regarding 2, Julia is not intended to be a hardened language for writing highly secure software. It allows all sorts of unsafe actions: pointer arithmetic, direct memory access, calling arbitrary C functions, etc. The future of really secure software seems to be small formally verified kernels written in statically typed languages that communicate with larger unverified systems over restricted channels. Julia might be appropriate for the larger unverified system but certainly not for the trusted kernel. Adding enough verification to Julia to write secure kernels is not inconceivable, but would be a major research effort.
>>>> The implementation would have to check lots of things, including, of course, ensuring that all arrays are initialized.
>>>>
>>>> A couple of other points:
>>>>
>>>> Modern OSes protect against data leaking between processes by zeroing pages before a process first accesses them. Thus any data exposed by Array(T, dims...) comes from the same process and is not a security leak.
>>>>
>>>> An uninitialized array of, say, integers is not in itself a security concern – the issue is what you do with those integers. The classic security hole is to use a "random" value from uninitialized memory to access other memory by using it to index into an array or otherwise convert it to a pointer. In the presence of bounds checking, however, this isn't actually a big concern since you will still either get a bounds error or a valid array value – not a meaningful one, of course, but still just a value.
>>>>
>>>> Writing programs that are secure against malicious attacks is a hard, unsolved problem. So is doing efficient, productive high-level numerical programming. Trying to solve both problems at the same time seems like a recipe for failing at both.
>>>>
>>>> On Nov 24, 2014, at 11:43 AM, David Smith <david...@gmail.com> wrote:
>>>>
>>>> Some ideas:
>>>>
>>>> Is there a way to return an error for accesses before at least one assignment in bits types? I.e. when the object is created uninitialized it is marked "dirty" and only after assignment of some user values can it be "cleanly" accessed?
>>>>
>>>> Can Julia provide a thin memory management layer that grabs memory from the OS first, zeroes it, and then gives it to the user upon initial allocation? After gc+reallocation it doesn't need to be zeroed again, unless the next allocation is larger than anything previous, at which time Julia grabs more memory, sanitizes it, and hands it off.
>>>>
>>>> On Monday, November 24, 2014 2:48:05 AM UTC-6, Mauro wrote:
>>>>>
>>>>> Pointer types will initialise to undef and any operation on them fails:
>>>>> julia> a = Array(ASCIIString, 5);
>>>>>
>>>>> julia> a[1]
>>>>> ERROR: access to undefined reference
>>>>>  in getindex at array.jl:246
>>>>>
>>>>> But you're right, for bits-types this is not an error and will just return whatever was there before. I think the reason this will stay that way is that Julia is a numerics oriented language. Thus you may want to create a 1GB array of Float64 and then fill it with something as opposed to first fill it with zeros and then fill it with something. See:
>>>>>
>>>>> julia> @time b = Array(Float64, 10^9);
>>>>> elapsed time: 0.029523638 seconds (8000000144 bytes allocated)
>>>>>
>>>>> julia> @time c = zeros(Float64, 10^9);
>>>>> elapsed time: 0.835062841 seconds (8000000168 bytes allocated)
>>>>>
>>>>> You can argue that the time gain isn't worth the risk but I suspect that others may feel different.
>>>>>
>>>>> On Mon, 2014-11-24 at 09:28, Ronald L. Rivest <rives...@gmail.com> wrote:
>>>>> > I am just learning Julia...
>>>>> >
>>>>> > I was quite shocked today to learn that Julia does *not* initialize allocated storage (e.g. to 0 or some default value). E.g. the code
>>>>> >     A = Array(Int64,5)
>>>>> >     println(A[1])
>>>>> > has unpredictable behavior, may disclose information from other modules, etc.
>>>>> >
>>>>> > This is really quite unacceptable in a modern programming language; it is as bad as not checking array reads for out-of-bounds indices.
>>>>> >
>>>>> > Google for "uninitialized security" to find numerous instances of security violations and unreliability problems caused by the use of uninitialized variables, and numerous security advisories warning of problems caused by the (perhaps inadvertent) use of uninitialized variables.
>>>>> >
>>>>> > You can't design a programming language today under the naive assumption that code in that language won't be used in highly critical applications or won't be under adversarial attack.
>>>>> >
>>>>> > You can't reasonably ask all programmers to properly initialize their allocated storage manually any more than you can ask them to test all indices before accessing an array manually; these are things that a high-level language should do for you.
>>>>> >
>>>>> > The default non-initialization of allocated storage is a mis-feature that should absolutely be fixed.
>>>>> >
>>>>> > There is no efficiency argument here in favor of uninitialized storage that can outweigh the security and reliability disadvantages...
>>>>> >
>>>>> > Cheers,
>>>>> > Ron Rivest
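
For anyone trying the snippets in this thread on a current release, here is a minimal sketch of the allocation styles under discussion. It assumes Julia 1.x, where the uninitialized constructor is spelled `Array{T}(undef, dims...)` rather than the `Array(T, dims...)` form quoted above. The variable names are illustrative only, and none of this reflects the `fill=` keyword proposal, which is not an existing API.

```julia
# Sketch only: assumes Julia 1.x syntax, not the 2014 syntax used in the thread.
n = 5

# Uninitialized bits-type array: fast to allocate, contents are arbitrary
# until written, so initialize it before the first read.
A = Vector{Int64}(undef, n)
fill!(A, 5)                      # A is now [5, 5, 5, 5, 5]

# Explicitly initialized alternatives.
B = zeros(Int64, n)              # all zeros
C = fill(1, n)                   # all ones

# Non-bits (pointer) element types start as undefined references;
# reading an unassigned slot throws instead of returning garbage.
S = Vector{String}(undef, n)
first_slot_set = isassigned(S, 1)   # false until S[1] is assigned

println(A)
println(B)
println(C)
println(first_slot_set)
```

The pattern Stefan describes as standard usage is the first one: allocate with `undef`, then immediately initialize. `zeros` and `fill` trade the extra write for never exposing arbitrary memory.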