Issue created: https://github.com/JuliaLang/julia/issues/9147.
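
On the question quoted below about a safe alternative to new(): one option that works today is to give the type an inner constructor that passes every field to new() explicitly. A minimal sketch, assuming Julia 1.x syntax; the type `Counter` and its fields are hypothetical names used only for illustration.

```julia
# Hypothetical type for illustration; assumes Julia 1.x syntax.
mutable struct Counter
    hits::Int64
    total::Float64
    # The inner constructor passes every field to new(), so no field is
    # left uninitialized.
    Counter() = new(0, 0.0)
end

c = Counter()
println(c.hits, " ", c.total)
```

This does not change what a bare new() does; it only makes the defaults explicit for this one type.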
On Tue, Nov 25, 2014 at 10:16 AM, Stefan Karpinski <ste...@karpinski.org> wrote:
> It seems more reasonable to me to always zero uninitialized fields of
> composite values. This is basically free since objects larger than a memory
> page are not common.
>
> On Tue, Nov 25, 2014 at 1:13 AM, Ronald L. Rivest <rivest....@gmail.com>
> wrote:
>
>> Sorry; zeros() does not work here instead of new(). My mistake.
>> Is there a safe alternative to new() that guarantees that all fields
>> will have a definite fixed value?
>>
>> Cheers,
>> Ron
>>
>>
>> On Tuesday, November 25, 2014 1:05:40 AM UTC-5, Ronald L. Rivest wrote:
>>>
>>> The problem also exists for new() (e.g. when initializing a
>>> record/object). zeros() can apparently be used here instead.
>>>
>>> Cheers,
>>> Ron
>>>
>>> On Tuesday, November 25, 2014 12:29:07 AM UTC-5, Viral Shah wrote:
>>>>
>>>> Much has been already said on this topic.
>>>>
>>>> The Array(...) interface was kind of meant to be low-level for the user
>>>> of scientific computing, only to be used when they know what they are
>>>> doing. You get the raw uninitialized memory as fast as possible.
>>>>
>>>> The user-facing interface was always an array constructor - zeros(),
>>>> ones(), rand(), etc. Some of this is because of our past experience coming
>>>> from a matlab/R-like world.
>>>>
>>>> As Julia has become more popular, we have realized that those not
>>>> coming from matlab/R end up using all the possible constructors. While this
>>>> has raised a variety of issues, I'd like to say that this will not get
>>>> sorted out satisfactorily before the 0.4 release. For a class that may be
>>>> taught soon, the thing to do would be to use the zeros/ones/rand
>>>> constructors to construct arrays, instead of Array(), which currently is
>>>> more for a package developer.
>>>> I understand that Array() is a much better
>>>> name as Stefan points out, but zeros() is not too terrible - it at least
>>>> clearly tells the user that they get zeroed out arrays.
>>>>
>>>> While we have other "features" that can lead to unsafe code (ccall,
>>>> @inbounds), none of these are things one is likely to run into while
>>>> learning the language.
>>>>
>>>> -viral
>>>>
>>>> On Tuesday, November 25, 2014 1:00:10 AM UTC+5:30, Ronald L. Rivest
>>>> wrote:
>>>>>
>>>>> Regarding initialization:
>>>>>
>>>>> -- I'm toying with the idea of recommending Julia for an
>>>>>    introductory programming class (rather than Python).
>>>>>
>>>>> -- For this purpose, the language should not have hazards that
>>>>>    catch the unwary.
>>>>>
>>>>> -- Not initializing storage is definitely a hazard. With
>>>>>    uninitialized storage, a program may run fine one day, and fail
>>>>>    mysteriously the next, depending on the contents of memory.
>>>>>    This is about predictability, reliability, dependability,
>>>>>    and correctness.
>>>>>
>>>>> -- I would favor a solution like
>>>>>        A = Array(Int64,n)            -- fills with zeros
>>>>>        A = Array(Int64,n,fill=1)     -- to fill with ones
>>>>>        A = Array(Int64,n,fill=None)  -- for an uninitialized array
>>>>>    so that the *default* is an initialized array, but the speed
>>>>>    geeks can get what they want.
>>>>>
>>>>> Cheers,
>>>>> Ron
>>>>>
>>>>> On Monday, November 24, 2014 1:57:14 PM UTC-5, Stefan Karpinski wrote:
>>>>>>
>>>>>> If we can make allocating zeroed arrays faster that's great, but
>>>>>> unless we can close the performance gap all the way and eliminate the
>>>>>> need to allocate uninitialized arrays altogether, this proposal is just
>>>>>> a rename – Unchecked.Array plays the exact same role as the current
>>>>>> Array constructor. It's unclear that this would even address the original
>>>>>> concern since it still *allows* uninitialized allocation of arrays.
>>>>>> This rename would just force people who have used Array correctly in
>>>>>> code that cares about being as efficient as possible even for very large
>>>>>> arrays to change their code and use Unchecked.Array instead.
>>>>>>
>>>>>> On Nov 24, 2014, at 1:36 PM, Jameson Nash <vtj...@gmail.com> wrote:
>>>>>>
>>>>>> I think that Rivest’s question may be a good reason to rethink the
>>>>>> initialization of structs and offer the explicit guarantee that all
>>>>>> unassigned elements will be initialized to 0 (and not just the jl_value_t
>>>>>> pointers). I would argue that the current behavior resulted more from a
>>>>>> desire to avoid clearing the array twice (if the user is about to call
>>>>>> fill, zeros, ones, +, etc.) than an intentional, casual exposure of
>>>>>> uninitialized memory.
>>>>>>
>>>>>> A random array of integers is also a security concern if an attacker
>>>>>> can extract some other information (with some probability) about the
>>>>>> state of the program. Julia is not hardened by design, so you can’t
>>>>>> safely run an unknown code fragment, but you still might have an
>>>>>> unintended memory exposure in a client-facing app. While zero’ing memory
>>>>>> doesn’t prevent the user from simply reusing a memory buffer in a
>>>>>> security-unaware fashion (rather than consistently allocating a new one
>>>>>> for each use), it’s not clear to me that the performance penalty would
>>>>>> be all that noticeable for mapping Array(X) to zero(X), and only
>>>>>> providing an internal constructor for grabbing uninitialized memory
>>>>>> (perhaps Base.Unchecked.Array(X) from #8227).
>>>>>>
>>>>>> On Mon Nov 24 2014 at 12:57:22 PM Stefan Karpinski
>>>>>> <stefan.karpin...@gmail.com> wrote:
>>>>>>
>>>>>> There are two rather different issues to consider:
>>>>>>>
>>>>>>> 1. Preventing problems due to inadvertent programmer errors.
>>>>>>> 2. Preventing malicious security attacks.
>>>>>>>
>>>>>>> When we initially made this choice, it wasn't clear if 1 would be a
>>>>>>> big issue but we decided to see how it played out. It hasn't been a
>>>>>>> problem in practice: once people grok that the Array(T, dims...)
>>>>>>> constructor gives uninitialized memory and that the standard usage
>>>>>>> pattern is to call it and then immediately initialize the memory,
>>>>>>> everything is ok. I can't recall a single situation where someone has
>>>>>>> had some terrible bug due to uninitialized int/float arrays.
>>>>>>>
>>>>>>> Regarding 2, Julia is not intended to be a hardened language for
>>>>>>> writing highly secure software. It allows all sorts of unsafe actions:
>>>>>>> pointer arithmetic, direct memory access, calling arbitrary C functions,
>>>>>>> etc. The future of really secure software seems to be small formally
>>>>>>> verified kernels written in statically typed languages that communicate
>>>>>>> with larger unverified systems over restricted channels. Julia might be
>>>>>>> appropriate for the larger unverified system but certainly not for the
>>>>>>> trusted kernel. Adding enough verification to Julia to write secure
>>>>>>> kernels is not inconceivable, but would be a major research effort. The
>>>>>>> implementation would have to check lots of things, including, of course,
>>>>>>> ensuring that all arrays are initialized.
>>>>>>>
>>>>>>> A couple of other points:
>>>>>>>
>>>>>>> Modern OSes protect against data leaking between processes by
>>>>>>> zeroing pages before a process first accesses them. Thus any data
>>>>>>> exposed by Array(T, dims...) comes from the same process and is not a
>>>>>>> security leak.
>>>>>>>
>>>>>>> An uninitialized array of, say, integers is not in itself a security
>>>>>>> concern – the issue is what you do with those integers.
>>>>>>> The classic security hole is to use a "random" value from
>>>>>>> uninitialized memory to access other memory by using it to index into
>>>>>>> an array or otherwise convert it to a pointer. In the presence of
>>>>>>> bounds checking, however, this isn't actually a big concern since you
>>>>>>> will still either get a bounds error or a valid array value – not a
>>>>>>> meaningful one, of course, but still just a value.
>>>>>>>
>>>>>>> Writing programs that are secure against malicious attacks is a
>>>>>>> hard, unsolved problem. So is doing efficient, productive high-level
>>>>>>> numerical programming. Trying to solve both problems at the same time
>>>>>>> seems like a recipe for failing at both.
>>>>>>>
>>>>>>> On Nov 24, 2014, at 11:43 AM, David Smith <david...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Some ideas:
>>>>>>>
>>>>>>> Is there a way to return an error for accesses before at least one
>>>>>>> assignment in bits types? I.e. when the object is created
>>>>>>> uninitialized it is marked "dirty" and only after assignment of some
>>>>>>> user values can it be "cleanly" accessed?
>>>>>>>
>>>>>>> Can Julia provide a thin memory management layer that grabs memory
>>>>>>> from the OS first, zeroes it, and then gives it to the user upon initial
>>>>>>> allocation? After gc+reallocation it doesn't need to be zeroed again,
>>>>>>> unless the next allocation is larger than anything previous, at which
>>>>>>> time Julia grabs more memory, sanitizes it, and hands it off.
>>>>>>>
>>>>>>> On Monday, November 24, 2014 2:48:05 AM UTC-6, Mauro wrote:
>>>>>>>>
>>>>>>>> Pointer types will initialise to undef and any operation on them
>>>>>>>> fails:
>>>>>>>> julia> a = Array(ASCIIString, 5);
>>>>>>>>
>>>>>>>> julia> a[1]
>>>>>>>> ERROR: access to undefined reference
>>>>>>>>  in getindex at array.jl:246
>>>>>>>>
>>>>>>>> But you're right, for bits-types this is not an error and will just
>>>>>>>> return whatever was there before. I think the reason this will stay
>>>>>>>> that way is that Julia is a numerics oriented language. Thus you may
>>>>>>>> wanna create a 1GB array of Float64 and then fill it with something as
>>>>>>>> opposed to first fill it with zeros and then fill it with something.
>>>>>>>> See:
>>>>>>>>
>>>>>>>> julia> @time b = Array(Float64, 10^9);
>>>>>>>> elapsed time: 0.029523638 seconds (8000000144 bytes allocated)
>>>>>>>>
>>>>>>>> julia> @time c = zeros(Float64, 10^9);
>>>>>>>> elapsed time: 0.835062841 seconds (8000000168 bytes allocated)
>>>>>>>>
>>>>>>>> You can argue that the time gain isn't worth the risk but I suspect
>>>>>>>> that others may feel different.
>>>>>>>>
>>>>>>>> On Mon, 2014-11-24 at 09:28, Ronald L. Rivest <rives...@gmail.com>
>>>>>>>> wrote:
>>>>>>>> > I am just learning Julia...
>>>>>>>> >
>>>>>>>> > I was quite shocked today to learn that Julia does *not*
>>>>>>>> > initialize allocated storage (e.g. to 0 or some default value).
>>>>>>>> > E.g. the code
>>>>>>>> >     A = Array(Int64,5)
>>>>>>>> >     println(A[1])
>>>>>>>> > has unpredictable behavior, may disclose information from
>>>>>>>> > other modules, etc.
>>>>>>>> >
>>>>>>>> > This is really quite unacceptable in a modern programming
>>>>>>>> > language; it is as bad as not checking array reads for out-of-bounds
>>>>>>>> > indices.
>>>>>>>> >
>>>>>>>> > Google for "uninitialized security" to find numerous instances
>>>>>>>> > of security violations and unreliability problems caused by the
>>>>>>>> > use of uninitialized variables, and numerous security advisories
>>>>>>>> > warning of problems caused by the (perhaps inadvertent) use
>>>>>>>> > of uninitialized variables.
>>>>>>>> >
>>>>>>>> > You can't design a programming language today under the naive
>>>>>>>> > assumption that code in that language won't be used in highly
>>>>>>>> > critical applications or won't be under adversarial attack.
>>>>>>>> >
>>>>>>>> > You can't reasonably ask all programmers to properly initialize
>>>>>>>> > their allocated storage manually any more than you can ask them
>>>>>>>> > to test all indices before accessing an array manually; these are
>>>>>>>> > things that a high-level language should do for you.
>>>>>>>> >
>>>>>>>> > The default non-initialization of allocated storage is a
>>>>>>>> > mis-feature that should absolutely be fixed.
>>>>>>>> >
>>>>>>>> > There is no efficiency argument here in favor of uninitialized
>>>>>>>> > storage that can outweigh the security and reliability
>>>>>>>> > disadvantages...
>>>>>>>> >
>>>>>>>> > Cheers,
>>>>>>>> > Ron Rivest
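
For reference, the array constructors discussed in this thread can be put side by side. This is a minimal sketch assuming Julia 1.x syntax, where uninitialized allocation has to be requested explicitly with `undef`; the thread itself uses the older Array(Int64, n) spelling for the same thing.

```julia
# Assumes Julia 1.x syntax; the thread's Array(Int64, n) is the older
# spelling of the uninitialized form.
n = 5

a = Array{Int64}(undef, n)  # uninitialized: contents are arbitrary until written
fill!(a, 0)                 # the usual pattern: initialize right after allocating

z = zeros(Int64, n)         # allocated and zero-filled in one step
o = fill(1, n)              # allocated and filled with ones
```

Reading `a` before the fill! call would return whatever happened to be in that memory, which is the hazard the original post describes.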