I also haven't read this thread carefully, but it does seem that in this case 
one needs to automatically mutate code like this:

    x = A[5]
    if isa(x, BadType)
        ...
    elseif isa(x, GoodType)
        ...  # do something with x
    end

into

    dt = peektype(A, 5)  # gets the type of A[5] without evaluating it
    if dt <: BadType
        ...
    elseif dt <: GoodType
        x = A[5]
        ...  # do something with x
    end

so that dt can be type-stable.

--Tim


On Friday, August 01, 2014 02:54:47 PM Jameson Nash wrote:
> I could (and probably will, someday) revive that commit. At the time,
> though, I seemed to find that it provided little performance benefit -- the
> gc cost of allocating boxes was far greater (for type uncertainty involving
> bitstypes) and the type dispatch wasn't as much of a performance impact as
> I had previously assumed.
> 
> 
> On Friday, August 1, 2014, Keno Fischer <kfisc...@college.harvard.edu>
> 
> wrote:
> > It is possible to do generic compiler improvements for Union types
> > (Jameson had a branch at some point that did callsite splitting if we
> > inferred a Union type). However, I think the best way to go here is to
> > maintain the current separation of two arrays (one of the values one
> > for the NAs), but give an option type on access. The option type would
> > then most likely be in memory and wouldn't have much overhead. Please
> > let me know if there's anything specific I should explain how the
> > compiler will handle it, I admit I have only skimmed this thread.
> > 
> > On Fri, Aug 1, 2014 at 9:18 AM, Simon Kornblith <si...@simonster.com
> > 
> > <javascript:;>> wrote:
> > > On Friday, August 1, 2014 6:23:59 AM UTC-4, Milan Bouchet-Valat wrote:
> > >> Le jeudi 31 juillet 2014 à 21:19 -0700, John Myles White a écrit :
> > >> 
> > >> To address Simon’s general points, which are really good reasons to
> > 
> > avoid
> > 
> > >> jumping on the Option{T} bandwagon too soon:
> > >> 
> > >> 
> > >> 
> > >> * I agree that most languages use tagged union types for Option{T}
> > 
> > rather
> > 
> > >> than a wrapper type that contains a Boolean value. It’s also totally
> > 
> > true
> > 
> > >> that many compilers are able to make those constructs more efficient
> > 
> > than
> > 
> > >> Julia currently does. But what we should expect from Julia in the
> > >> coming
> > >> years isn’t so clear to me. (And I personally think we need to settle
> > 
> > on a
> > 
> > >> solution for representing missing data that’s viable in a year rather
> > 
> > than
> > 
> > >> viable in five years.) This is an issue that I’d really like to have
> > 
> > input
> > 
> > >> on from Jeff, Keno, Jameson or someone else involved with the internals
> > 
> > of
> > 
> > >> the compiler. Getting input from the broader community is the main
> > 
> > reason I
> > 
> > >> wanted to put a demo of OptionTypes.jl out in front of other folks.
> > >> 
> > >> 
> > >> 
> > >> * I’m not clear how we could come to know that a datum is not missing
> > >> without a resolution step that’s effectively equivalent to the get()
> > >> function for Option{T}. I agree that the enforced use of get() means
> > 
> > that
> > 
> > >> you can’t hope to use generic functions like sum on collections of
> > >> Option{T}. But I’m also not sure that’s such a bad thing: I think the
> > >> easiest way to express to the compiler that you know that all of the
> > 
> > entries
> > 
> > >> of a DataArray are not NA is to convert the DataArray to a straight
> > 
> > Array.
> > 
> > >> But maybe you have other mechanisms for expressing this knowledge.
> > 
> > Certainly
> > 
> > >> my proposal to do conversions to Arrays isn’t the most elegant
> > >> strategy.
> > >> It’s just all that I’ve got so far.
> > >> 
> > >> 
> > >> 
> > >> * I kind of like the idea of Option{T} standing outside of the main
> > >> type
> > >> system in a kind of mirror type system. I’m less happy about Union(NA,
> > 
> > T)
> > 
> > >> being a super type of T, even though there are some good reasons that
> > 
> > you’d
> > 
> > >> like to view T as a specialization of Union(NA, T). But I agree that I
> > 
> > don’t
> > 
> > >> have a good feel about where missing data belongs in the type
> > >> hierarchy.
> > >> This is another question for which I’d love to get input from others.
> > >> 
> > >> 
> > >> 
> > >> In regard to Simon’s performance points:
> > >> 
> > >> 
> > >> 
> > >> * Yes, memory usage alone argues strongly for working with DataArray{T}
> > >> rather than Array{Option{T}}.
> > >> 
> > >> 
> > >> 
> > >> * Exploting tricks that make operations like anyna() faster is another
> > >> good argument for keeping DataArray{T} around.
> > >> 
> > >> 
> > >> 
> > >> * I’m not sure how to deal with inlining concerns or the undefined
> > >> reference checks. Do you have ideas for improving this within
> > 
> > DataArrays or
> > 
> > >> do we need supporting changes in the compiler?
> > >> 
> > >> Actually it seems it would be possible to make Array{Union(NAtype, T)}
> > >> more similar to and as efficient as DataArray{T}, by handling a few
> > 
> > things
> > 
> > >> in the compiler. This would create a generalization of DataArray to any
> > 
> > kind
> > 
> > >> of union type, which could be useful in other contexts. But more
> > >> importantly, it would make missing values integrate seamlessly into
> > 
> > Julia,
> > 
> > >> getting rid of any hacks.
> > >> 
> > >> More specifically, the following features would need to be supported:
> > >> - a way of telling the compiler to store the data as two arrays of
> > >> concrete types (here T and NAtype), instead of as an array of boxed
> > 
> > values,
> > 
> > >> so that:
> > >>     * efficient operations can be performed on the T values (by
> > >>     skipping
> > >> 
> > >> the missing ones manually)
> > >> 
> > >>     * T values are stored as a dense array and can be converted to
> > >> 
> > >> Array{T} without any copy or passed to BLAS when no missing values are
> > >> present
> > >> 
> > >>    * NA values can be packed in a BitArray to save memory and make NA
> > >> 
> > >> detection faster (see below)
> > >> - a fonction to check whether a given element of the array is of type T
> > >> rather than of NAtype (generalization of isna())
> > >> - a fonction to check whether all elements of the array are of type T
> > >> rather than of NAtype (generalization of anyna(), more efficient than
> > >> calling the previous function on all elements thanks to the packing of
> > 
> > NAs
> > 
> > >> in a BitArray)
> > >> In this scheme, what is missing is how to allow the compiler to pack
> > >> NAs
> > >> in a BitArray. Somehow, NAtype would have to be defined as a 1-bit
> > 
> > object.
> > 
> > >> Maybe by making it an enum-like immutable with a 1-bit field inside it?
> > >> 
> > >> How does it sound?
> > > 
> > > I've thought a bit about this, but it seems like it would be too much
> > > complexity in the compiler. Storing arrays as something besides
> > 
> > contiguous
> > 
> > > elements and interaction between the codegen in C/C++ and the BitArray
> > 
> > code
> > 
> > > in Julia both seem likely to be painful, although Jeff, Keno, and
> > > Jameson
> > > would know better than I. Additionally, this optimization (of storage of
> > > arrays of unions of a singleton type and a bits type) seems pretty
> > 
> > specific
> > 
> > > to DataArrays, but the actual advantages in terms of performance and
> > > expressibility would be small or non-existent. (This is in contrast to
> > > optimizing storage/dispatch with union types, which could benefit a lot
> > 
> > of
> > 
> > > code and is something a lot of languages do.) Finally, there are cases
> > 
> > where
> > 
> > > it is useful to have direct access to the na BitArray chunks beyond
> > 
> > anyna,
> > 
> > > e.g. pairwise summation and reductions across the first dimension.
> > > 
> > > Simon

Reply via email to