Re: [julia-users] Reducing complexity of OffsetArrays

Tamas Papp Wed, 02 Nov 2016 00:21:43 -0700

Is there a general recommendation for dealing with the case when a
function encounters two arrays with different index sets? Eg


OffsetArray(1:5,1:5) + OffsetArray(1:5, -2:2)

currently gives a DimensionMismatch exception via promote_shape, so
perhaps one should simply rely on what the latter does. But one can
imagine other options too (eg all unrepresented indices as 0, etc). If
there is a blog post or issue summarizing the options, I would be
interested in reading it (could not find anything).

On Tue, Nov 01 2016, Tim Holy wrote:

> There's still a simple model: the indices are the *key* of the entry. Think
> about an array as a very special Dict. You could create a Dict with integer
> keys, let's say from -2:5. Presumably you wouldn't be exactly happy if
> there were some magical way that the number you originally stored as `d[-1]
> = 3.2` were alternatively accessible as `d[2]`, simply because the smallest
> index was -2 and therefore 3.2 is the "second" entry?
>
> Like a Dict, for an array the value always goes with the key (the indices).
> Perhaps this will help:
> ```
> julia> using OffsetArrays
>
> julia> a = OffsetArray(rand(11), -5:5)
> OffsetArrays.OffsetArray{Float64,1,Array{Float64,1}} with indices -5:5:
>  0.815289
>  0.0043941
>  0.00403153
>  0.478065
>  0.150709
>  0.256156
>  0.934703
>  0.672495
>  0.428721
>  0.242469
>  0.43742
>
> julia> idx = OffsetArray(-1:1, -1:1)
> OffsetArrays.OffsetArray{Int64,1,UnitRange{Int64}} with indices -1:1:
>  -1
>   0
>   1
>
> julia> b = a[idx]
> OffsetArrays.OffsetArray{Float64,1,Array{Float64,1}} with indices -1:1:
>  0.150709
>  0.256156
>  0.934703
>
> julia> a[-1]
> 0.15070935766983662
>
> julia> b[-1]
> 0.15070935766983662
> ```
> So indexing `b = a[idx]` means that `b[j] = a[idx[j]]`. Does that help?
>
> Best,
> --Tim
>
> On Tue, Nov 1, 2016 at 3:36 PM, Bob Portmann <bobportm...@gmail.com> wrote:
>
>> Like I said, no real practical experience yet. The increase in complexity
>> that I fear is the loss of, e.g., writing arr[2,3] and having it not be the
>> element in the 2nd row and third column (i.e., the loss of a simple model
>> of how things are laid out). Maybe my fears are unfounded. Others don't
>> seem concerned it would seem.
>>
>> I'll check out those packages that you mention.
>>
>> Thanks,
>> Bob
>>
>> On Sun, Oct 30, 2016 at 2:29 PM, Tim Holy <tim.h...@gmail.com> wrote:
>>
>>> I'm afraid I still don't understand the claimed big increment in
>>> complexity. First, let's distinguish "generic offset arrays" from the
>>> OffsetArrays.jl package. If you're happy using OffsetArrays, you don't have
>>> to write your own offset-array type. Being able to use an established &
>>> tested package reduces your burden a lot, and you can ignore the second
>>> half of the devdocs page entirely.
>>>
>>> If you just want to *use* OffsetArrays.jl, the basic changes in coding
>>> style for writing indices-aware code are:
>>>
>>> - any place you used to call `size`, you probably want to call `indices`
>>> instead (and likely make minor adjustments elsewhere, since `incides`
>>> returns a tuple-of-ranges---but such changes tend to be very obvious);
>>> - check all uses of `similar`; some will stay as-is, other will migrate
>>> to `similar(f, inds)` style.
>>>
>>> In my experience, that's just about it. The devdocs goes into quite a lot
>>> of detail to explain the rationale, but really the actual changes are quite
>>> small. While you can't quite do it via `grep` and `sed`, to me that just
>>> doesn't seem complicated.
>>>
>>> Where the pain comes is that if you're converting old code, you sometimes
>>> have to think your way through it again---"hmm, what do I really mean by
>>> this index"? If your code had complicated indexing the first time you wrote
>>> it, unfortunately you're going to have to think about it carefully again;
>>> so in some cases, "porting" code is almost as bad as writing it the first
>>> time. However, if you write indices-aware code in the first place, in my
>>> experience the added burden is almost negligible, and in quite a few cases
>>> the ability to offset array indices makes things *easier* (e.g., "padding"
>>> an array on its edges is oh-so-much-clearer than it used to be, it's like a
>>> different world). That's the whole reason I implemented this facility in
>>> julia-0.5: to make life easier, not to make it harder. (Personally I think
>>> the whole argument over 0-based and 1-based indexing is stupid; it's the
>>> ability to use arbitrary indices that I find interesting & useful, and it
>>> makes most of my code prettier.)
>>>
>>> For examples of packages that use OffsetArrays, check the following:
>>> - CatIndices
>>> - FFTViews
>>> - ImageFiltering
>>>
>>> ImageFiltering is a mixed bag: there's a small performance penalty in a
>>> few cases (even if you use @unsafe) because that LLVM doesn't always
>>> optimize code as well as it could in principle (maybe @polly will help
>>> someday...). Because image filtering is an extraordinarily
>>> performance-sensitive operation, there are a few places where I had to make
>>> some yucky hand optimizations.
>>>
>>> Again, I'm very happy to continue this conversation---I'd really like to
>>> understand your concern, but without a concrete example I confess I'm lost.
>>>
>>> Best,
>>> --Tim
>>>
>>>
>>> On Sat, Oct 29, 2016 at 8:44 PM, Bob Portmann <bobportm...@gmail.com>
>>> wrote:
>>>
>>>> Thanks for the thoughtful response. I hope you'll tolerate one reply in
>>>> the "abstract".
>>>>
>>>> I am resisting the big change that occurred in 0.5. In 0.4 and earlier
>>>> if one declares an array as an `AbstractArray` in a function then one knew
>>>> that the indices were one based (a nice simple model, even if it is hated
>>>> by many). In 0.5, if one wants to write general code, then one has to
>>>> assume that arrays can have ANY indices. And one needs to write code using
>>>> more abstract tools. This seems to me to be a large cost in complexity for
>>>> the small subset of cases where offset indices are helpful. This is the
>>>> core issue to me.
>>>>
>>>> One way around this, it would seem, is to declare arrays as `::Array`
>>>> and not `::AbstractArray` in functions (if one want to be sure they are
>>>> `OneTo`). But then one gives up accepting many types of arrays that would
>>>> pose no problem (e.g, DistributedArrays with OneTo indices). Thus, I'm
>>>> proposing to have a high level abstract type that would capture all arrays
>>>> types that can be assumed to be `OneTo`. Then one can write library
>>>> functions against that type. This alone would help I think.
>>>>
>>>> The auto-conversion is an extra step that I thought might work since it
>>>> is (I think) low cost to convert an `OffsetArray` to a `OneTo` array . Thus
>>>> if you passed an OffsetArray to a function that takes the abstract OneTo
>>>> type (I kept its name AbstractArray above but that need not be its name)
>>>> you expect that if it returned a `similar` array it would be of `OneTo`
>>>> type. You would then have to convert it back to an OffsetArray type. It
>>>> would probably be easy to write one line functions to do this for you. If
>>>> it was sparse I assume it would remain so. It would be up to the writer of
>>>> the new OffsetArray type to declare what type it would convert to in the
>>>> OneTo array abstract tree.
>>>>
>>>> Does this help?
>>>>
>>>> Bob
>>>>
>>>> ps I don't have any concrete examples yet because OffsetArrays are so
>>>> scarce in the julia ecosystem. I'm just looking ahead to a world where it
>>>> will no longer be possible to assume AbstractArrays are OneTo. This seems
>>>> more painful than telling the 0-based crowd to code against 1-based arrays.
>>>>
>>>>
>>>> On Sat, Oct 29, 2016 at 10:40 AM, Tim Holy <tim.h...@gmail.com> wrote:
>>>>
>>>>> I should add, abstract discussions are all well and good, but I fully
>>>>> recognize you may have something very specific in mind and my answer
>>>>> simply
>>>>> fails to understand your concern. If you can give one or more concrete
>>>>> examples of where the current system is either bug-prone or a pain in
>>>>> the
>>>>> neck, I'd love to see what you mean.
>>>>>
>>>>> Best,
>>>>> --Tim
>>>>>
>>>>> On Thursday, October 27, 2016 5:21:26 PM CDT Bob Portmann wrote:
>>>>> > TL;DR I think the type system should take care of converting between
>>>>> > different types of arrays and thus free the user to code against the
>>>>> type
>>>>> > of array they want (OneTo, ZeroTo, Offset, etc).
>>>>> >
>>>>> > Maybe I have a deep mis-understanding of how the new generalized
>>>>> offset
>>>>> > arrays are suppose to work in Julia. If so I'd be happy to be set
>>>>> straight.
>>>>> > If not then I think the present system will lead to unnecessary
>>>>> complexity
>>>>> > and bug-ridden code. In the present system one can define array types
>>>>> that
>>>>> > are not one-based but can have arbitrary offsets. This is a great
>>>>> idea but
>>>>> > to make it work a very general system for indexing relative to the
>>>>> ends of
>>>>> > the array has been devised. If one writes a function that accepts an
>>>>> > `AbstractArray` then one MUST use this more general system or produce
>>>>> bugs
>>>>> > when e.g. a `ZeroToArray` is passed in. This puts a large burden on
>>>>> writers
>>>>> > of library code that works on arrays and I think is an unnecessary
>>>>> > complexity.
>>>>> >
>>>>> > Since Fortran arrays are a nice example let look at the situation in
>>>>> > Fortran. If I define an array in a function `real arr(-10:10)` then I
>>>>> would
>>>>> > index it using indices that range from -10 to 10. If I pass this into
>>>>> > another function that declared the array using its total size (usually
>>>>> > passed in separately or in a parameter statement) `real arr(21)` then
>>>>> in
>>>>> > that subroutine one would index the array using "normal" indices that
>>>>> range
>>>>> > from 1 to 21. I.E., you state in the function how you want to treat
>>>>> the
>>>>> > array and are not forced to work with in offset form unless you want
>>>>> to. In
>>>>> > my opinion, this is what makes using offset arrays in Fortran sane
>>>>> and is
>>>>> > the key thing that Julia should try to emulate.
>>>>> >
>>>>> > One way to get this behavior in Julia would be to use the type system
>>>>> and
>>>>> > `convert`. Since `AbstractArray` has meant until 0.5 a `OneTo` array I
>>>>> > think it is wise that it remain that way so that all old code will
>>>>> work
>>>>> > unchanged (even with the new array types). Thus, I propose adding a
>>>>> new
>>>>> > top-level type above AbstractArray type to capture all arrays
>>>>> something
>>>>> > like:
>>>>> >
>>>>> > ```
>>>>> > Array{T,N} <: DenseArray{T,N} <: AbstractArray{T,N} <:
>>>>> > AbstractAllArray{T,N} <: Any
>>>>> > ```
>>>>> >
>>>>> > And the offset arrays would be subtypes of `AbstractAllArrays` on a
>>>>> > different branch
>>>>> >
>>>>> > ```
>>>>> > OffsetArray{T,N} <: AbstractOffsetArray{T,N} <: AbstractAllArray{T,N}
>>>>> <: Any
>>>>> > ```
>>>>> >
>>>>> > And similarly for `ZeroToArray`.
>>>>> >
>>>>> > Then, if one declares an Array as
>>>>> >
>>>>> > ```
>>>>> > function func(arr::AbstractArray)
>>>>> > ```
>>>>> >
>>>>> > one can safely assume it is a 1-based Array inside `func`. If an
>>>>> offset
>>>>> > array is passed into `func` then it is automatically converted to a
>>>>> `OneTo`
>>>>> > array inside `func`. This is the key point of this proposal and is
>>>>> similar
>>>>> > to the auto-conversion of, e.g., Int to Floats in a function call.
>>>>> > Similarly if one declares an array as a `ZeroToArray` in a function
>>>>> and
>>>>> > passes in an `Array` then it will be converted to a `ZeroToArray` in
>>>>> the
>>>>> > function. Some conversions would need more information and thus would
>>>>> be
>>>>> > disallowed (e.g., `Array` to `OffsetArray`). I think this system
>>>>> would go a
>>>>> > long way towards making `ZeroToArray`s and `OffsetArray`s simple to
>>>>> use in
>>>>> > Julia w/o using the generalized indexing techniques introduced in
>>>>> Julia
>>>>> > 0.5. Note that it would still be possible to use the
>>>>> `AbstractAllArray`
>>>>> > type and accept all arrays w/o conversion and use the generalized
>>>>> indexing
>>>>> > techniques.
>>>>> >
>>>>> > I'm curious what people think about this.
>>>>> >
>>>>> > Bob
>>>>> >
>>>>> > ps Paste into github to see in formatted form. I'm not sure how to
>>>>> get that
>>>>> > in an email.
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>

Re: [julia-users] Reducing complexity of OffsetArrays

Reply via email to