Re: [julia-users] Array/Cell - a useful distinction, or not?

Oliver Woodford Wed, 30 Apr 2014 07:04:54 -0700

Stefan

Firstly, thank you for taking the time to write such a lengthy response. I 
think it comes closest to addressing the particular question I had. Sadly 
I'm an engineer, not a computer scientist, so it's taking me a while to get 
my head round it. I've put a few comments inline.


On Tuesday, April 29, 2014 4:51:52 PM UTC+1, Stefan Karpinski wrote:
>
> The real root issue here is parametric variance
> <https://en.wikipedia.org/wiki/Covariance_and_contravariance_(computer_science)>.
>
> Julia has opted for parametric invariance, i.e. P{A} <: P{B} only if
> A = B, which is what James mentions in his Stack Overflow answer. The
> Wikipedia page does a decent job of summarizing the issue:
>
> A programming language designer will consider variance when devising 
>> typing rules for e.g. arrays, inheritance, and generic datatypes. By making 
>> type constructors covariant or contravariant instead of invariant, more 
>> programs will be accepted as well-typed. On the other hand, programmers 
>> often find contravariance unintuitive, and accurately tracking variance to 
>> avoid runtime type errors can lead to complex typing rules. In order to 
>> keep the type system simple and allow useful programs, a language may treat 
>> a type constructor as invariant even if it would be safe to consider it 
>> variant, or treat it as covariant even when that can violate type safety.
>
>
> Julia has chosen the simple approach of invariance everywhere. This 
> simplifies the subtyping rules, but there is no free lunch – variance 
> issues still do crop up. But at least Julia's rules are simple if 
> occasionally a bit inconvenient. Basically, this is the situation:
>
>    - *covariance:* *P{A} <: P{B} ⟺ A <: B* – correct for read, incorrect 
>    for write.
>    - *contravariance:* *P{A} <: P{B} ⟺ B <: A* – correct for write, 
>    incorrect for read. 
>    - *invariance:* *P{A} <: P{B} ⟺ A = B* – correct for both read and 
>    write.
>
I'm not sure what you mean by "correct for read, incorrect for write" etc.
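A small sketch of what "correct for read, incorrect for write" means in practice. It is written for Julia 1.x (later than this 2014 thread) and uses only Base; `read_first` and `write_half!` are hypothetical helpers for illustration. Reading an element out of a `Vector{Int}` always yields a `Real`, so treating it as a `Vector{Real}` would be harmless for reads; writing `0.5` into it is not. Contravariance is the mirror image: writing an `Int` into a `Vector{Real}` is fine, but reading from it may yield something that is not an `Int`.

```julia
# Julia 1.x. Parametric types are invariant even when their parameters are related.
println(Int <: Real)                   # true
println(Vector{Int} <: Vector{Real})   # false

# Hypothetical helpers, for illustration only.
# Read: every element of a Vector{Int} is a Real, so covariance would be safe here.
read_first(v::Vector{Real}) = v[1]

# Write: a Vector{Real} may receive 0.5, but a Vector{Int} cannot store it,
# so covariance would be unsafe here.
write_half!(v::Vector{Real}) = (v[1] = 0.5; v)

mixed = Real[1, 2]
write_half!(mixed)                     # fine: mixed now holds 0.5 and 2

ints = [1, 2]
println(applicable(write_half!, ints)) # false: invariance rejects the call

# What write_half! would attempt if covariance let `ints` through:
try
    ints[1] = 0.5
catch err
    println(err isa InexactError)      # true: 0.5 cannot be stored as an Int
end
```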
 

> The main purpose of variance in a programming language is to allow 
> definitions to apply to more things. In particular, since the number of 
> types that a parametric type may encompass is unbounded, you don't want to 
> have to write an infinite number of method definitions to define some 
> functionality for P{A} for all A <: B where B is some unbounded abstract 
> type. Instead of accomplishing this with covariance or contravariance, 
> Julia accomplishes it with more powerful dispatch – specifically, method 
> signatures with quantified type parameters, such as:
>
> frob{A<:B}(x::P{A})
>
>
> This is effectively a covariant signature, applying to P{A} for any A <: 
> B. It seems like you would be happier if parametric types were covariant 
> and you were allowed to write this as:
>
> frob(x::P{B})
>
>
> where the method implicitly also applies to x::P{A} for every A <: B. This 
> would certainly be less of an eyesore.
>

Yes, I guess I am saying this, but only for a particular type, which would 
instantiate homogeneous arrays. I will call this Array, and the type for 
heterogeneous arrays I will call Cell. 
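For comparison, here are the two signatures Stefan contrasts, written in Julia 1.x syntax, where the 2014 form `frob{A<:B}(x::P{A})` became a `where` clause. The sketch takes P = Vector and B = Real as stand-ins; `frob_exact` is a hypothetical name for the exact-parameter version.

```julia
# Julia 1.x spelling of frob{A<:B}(x::P{A}) with P = Vector, B = Real.
# Applies to Vector{A} for every A <: Real; returns the actual parameter A.
frob(x::Vector{A}) where {A<:Real} = A

# Exact parameter: applies only to a Vector whose element type is exactly Real.
frob_exact(x::Vector{Real}) = Real

println(frob([1.0, 2.0]))                 # Float64
println(frob(Real[1, 2.0]))               # Real
println(applicable(frob_exact, [1, 2]))   # false: Vector{Int} is not Vector{Real}
```

The first method is effectively covariant; the second names the exact abstract parameter, which is the capability Stefan says covariance would take away.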
 

> There are a couple of important points to be made, however:
>
>    1. In the longer, more explicit form, it is clear that B may not be 
>    the actual parameter type of x; the parameter type of x is some A which is 
>    a subtype of B.
>
With homogeneous arrays it is clear that the input array must have a
concrete element type, even if the declaration uses an abstract type, so
the short form still makes sense. This is like x::Real: we don't have to
write frob{T<:Real}(x::T).
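The scalar analogy can be checked directly. A sketch in Julia 1.x syntax with hypothetical functions `double` and `double_verbose`: a scalar value's type is always concrete, so the abstract annotation `x::Real` already covers every subtype. An array value, however, can currently carry an abstract element type, which is where the analogy stops holding in Julia as it stands.

```julia
# Hypothetical functions. Both accept exactly the same arguments,
# because a scalar value always has a concrete type such as Int or Float64.
double(x::Real) = 2x
double_verbose(x::T) where {T<:Real} = 2x

println(isconcretetype(typeof(2.5)))   # true: the value's type is Float64

# An array value, by contrast, can carry an abstract element type.
v = Real[1, 2.5]
println(isconcretetype(typeof(v)))     # true: Vector{Real} is itself concrete
println(isconcretetype(eltype(v)))     # false: its element type Real is abstract
```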
 

>
>    2. If the shorter form were shorthand for the longer form without 
>    explicit A, there would be no way to express that x has an 
>    exact parameter type of B.
>
With homogeneous arrays, x::Array{Real} would not be a valid concrete 
type, so there is no ambiguity.

>
>    
> The explicit distinction between A – the actual parameter type of x – and 
> B, which is simply an upper bound on A, becomes clearer in situations like 
> this:
>
> frob{A<:B}(x::P{A}, y::A)
> frob{A<:B}(x::P{A}, y::B)
>
>
> The first definition only applies to x and y where y is of the actual 
> parameter type of x and thus could be used as is by x. The second 
> definition applies when y is of type B, regardless of x's actual parameter 
> type, which means that it might not be usable as is by x.
>
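A sketch of the `y::A` versus `y::B` distinction in Julia 1.x, again taking P = Vector and B = Real. `push_same!` and `push_any!` are hypothetical names; in the second case the run-time conversion is done by `push!`, which converts the new item to the vector's element type and throws if that is impossible.

```julia
# Applies only when y already has x's actual element type A.
push_same!(x::Vector{A}, y::A) where {A<:Real} = push!(x, y)
# Applies for any Real y; push! converts y to A at run time (or throws).
push_any!(x::Vector{A}, y::Real) where {A<:Real} = push!(x, y)

xs = [1, 2]                                  # Vector{Int}
push_same!(xs, 3)                            # fine: 3 is an Int
println(applicable(push_same!, xs, 4.0))     # false: 4.0 is not an Int
push_any!(xs, 4.0)                           # fine: 4.0 is converted to 4
println(xs)                                  # [1, 2, 3, 4]

try
    push_any!(xs, 4.5)                       # 4.5 has no exact Int value
catch err
    println(err isa InexactError)            # true
end
```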
> We could actually make parametric types covariant without losing the 
> ability to express this distinction; it would just mean that the second 
> definition could be written as:
>
> frob(x::P{B}, y::B)
>
>
> Currently, this means that x must have exact parameter type B; if types 
> were covariant then it would mean that x's parameter type could be any A <: 
> B. Although it may seem questionable that y not be of the true parameter 
> type of x, allowing this wiggle room and then doing conversion at run-time 
> as necessary is actually the standard way to do things in Julia. Conversion 
> is often done implicitly by assignment to a typed location such as a field
> of x or an array slot. The fact that this is quite common is why covariance 
> might actually be ok. The main thing we would lose with parametric 
> covariance is the ability to talk about arrays that are of *exactly* 
> element type B where B is some abstract type. But I'm just not sure how 
> useful that actually is. Those types are generally discouraged in any case 
> and it's rare to want to give them special behaviors that wouldn't also 
> apply to parametric types with concrete parameters. So perhaps we ought to 
> consider making parametric types in Julia covariant.
>
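Implicit conversion on assignment to a typed location, as described in the paragraph above, looks like this in Julia 1.x. `Holder` is a hypothetical type used only for illustration.

```julia
# Hypothetical type with a field declared as Float64.
mutable struct Holder
    value::Float64
end

h = Holder(1)      # the default constructor converts 1 to 1.0
h.value = 2        # assigning to a typed field converts 2 to 2.0
println(h.value)   # 2.0

v = [10, 20, 30]   # Vector{Int}
v[1] = 4.0         # assigning to an array slot converts 4.0 to 4
println(v)         # [4, 20, 30]
```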

As I said before, with homogeneous arrays, Array{Real} would be an abstract 
type, just like Real, and not a concrete type. If you want to talk about 
heterogeneous arrays that can contain any kind of Real, then I'm suggesting 
the use of Cell{Real}.
 

>
> Thinking about it some more, the big problem with covariance is that it 
> forces the confusion of two distinct meanings of P{B} for abstract B:
>
>    1. The concrete instance of type P with actual parameter type B.
>    2. The abstract type including all instances P{A} where A <: B.
>
> If P{B} meant the latter, then we would either need a new way to express 
> the former or we would be forced to identify the two. Unlike in static
> languages, the concrete interpretation is not just a hallucination of the
> compiler: it exists at run time, so it cannot go unnamed. There needs to be
> some concrete answer when you write typeof(x).
> Identifying the two meanings is very un-Julian as it would allow a concrete 
> type to have subtypes, which would be a massive and to my mind distasteful 
> change.
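Later Julia releases (0.6 onward) kept parametric types invariant, so `P{B}` still names the concrete type, and added a separate spelling, `P{<:B}`, for the abstract set of all `P{A}` with `A <: B`. A Julia 1.x sketch of the two meanings:

```julia
# Meaning 1: the concrete type whose element type is exactly Real.
v = Real[1, 2.5]
println(typeof(v) == Vector{Real})       # true: typeof always gives a concrete type
println(isconcretetype(Vector{Real}))    # true

# Meaning 2: the abstract type covering every Vector{A} with A <: Real.
# Vector{<:Real} is shorthand for Vector{A} where A<:Real.
println(Vector{Int} <: Vector{<:Real})   # true
println(Vector{Real} <: Vector{<:Real})  # true
println(isconcretetype(Vector{<:Real}))  # false
```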


I am suggesting having two types, Array and Cell. With Array, which would 
use covariance, you wouldn't be able to have a concrete instance with an 
abstract element type, so no confusion there. Cell would use invariance, so 
you would be able to distinguish between the two meanings in the current 
way, and we would know that the array would always be heterogeneous (even 
if all the elements have the same type).
 

>
> On Tue, Apr 29, 2014 at 12:13 PM, Oliver Woodford <oliver....@gmail.com>
>  wrote:
>> Quick question, folks:
>> Does f(x::Array{Union(Int8,Int16)}) mean that x must be all Int8 or all 
>> Int16 (homogeneous), or each element can be either Int8 or Int16 
>> (heterogeneous)?
>> If the latter, then would I need to 
>> use f(x::Union(Array{Int8},Array{Int16})) to achieve the former?
> Yes and yes.
>
>> If so, then what I am suggesting is having two array types, Array and 
>> Cell, and have f(x::Array{Union(Int8,Int16)}) be equivalent to the 
>> current f(x::Union(Array{Int8},Array{Int16})), and have 
>> f(x::Cell{Union(Int8,Int16)}) be equivalent to the current 
>> f(x::Array{Union(Int8,Int16)}).
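The two signatures from this exchange, in Julia 1.x syntax (`Union{...}` with braces replaced the 2014 `Union(...)`). `hetero` and `homo` are hypothetical names for illustration.

```julia
# Heterogeneous: one array whose element type is the Union itself.
hetero(x::Vector{Union{Int8, Int16}}) = "heterogeneous"
# Homogeneous: either all Int8 or all Int16.
homo(x::Union{Vector{Int8}, Vector{Int16}}) = "homogeneous"

both = Union{Int8, Int16}[Int8(1), Int16(2)]
println(hetero(both))                    # heterogeneous
println(homo(Int8[1, 2]))                # homogeneous
println(applicable(hetero, Int8[1, 2]))  # false
println(applicable(homo, both))          # false
```

In Julia 1.x, `Vector{<:Union{Int8, Int16}}` would match all three of `Vector{Int8}`, `Vector{Int16}` and `Vector{Union{Int8, Int16}}`.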

 

> This is not a minor change. What you're asking for is having both 
> invariant and covariant parametric types in the language – Array would be 
> covariant and Cell would be invariant. If you're going to have invariant 
> and covariant type parameters, you'll want contravariant type parameters 
> too. At that point, you need to have a way of annotating the variance of a 
> type's parameters – and worse still, you have to explain all this to 
> everyone. At that point, you have Scala and all the complications
> <http://blogs.atlassian.com/2013/01/covariance-and-contravariance-in-scala/>
> and questions
> <http://stackoverflow.com/questions/4531455/whats-the-difference-between-ab-and-b-in-scala>
> that come with that. I don't think that's the right path for a language that's 
> meant to be used by people who are scientists rather than professional 
> programmers (honestly, I don't really think it's a good idea for 
> professional programmers either). Making all parametric types covariant is 
> a more reasonable possibility, but as I pointed out, it has its own 
> significant issues.


Yes. Sorry, I've been answering in order! I don't know if I'll want 
contravariant parameters. I just want one (very important) type which 
behaves differently to others, because this is useful and will be intuitive 
to people.

Currently, if I want to require input arrays to be homogeneous, but allow 
them to hold any Number, then I need to write the following function 
declaration: frob(x::Union(Array{Uint8},Array{Int8},...a very long list of 
other types)). Is there no shorter form for this? Surely there is a simpler 
way.
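A shorter form does exist: the quantified signature pattern from Stefan's message, `frob{T<:Number}(x::Array{T})` in 2014 syntax, written with `where {T<:Number}` in Julia 1.x (where `Uint8` is now spelled `UInt8`). It also matches the heterogeneous `Array{Number}`, because `Number <: Number`. One option, assumed here rather than taken from the thread, is a run-time check that the element type is concrete, so every element has exactly the same type. `frob_numbers` and `frob_homogeneous` are hypothetical names.

```julia
# Julia 1.x spelling of frob{T<:Number}(x::Array{T}).
# Matches Array{Int8}, Array{UInt8}, Array{Float64}, ... and also Array{Number}.
frob_numbers(x::Array{T}) where {T<:Number} = T

# Hypothetical stricter variant: additionally require a concrete element type,
# so that every element has exactly the same type.
function frob_homogeneous(x::Array{T}) where {T<:Number}
    isconcretetype(T) ||
        throw(ArgumentError("expected a concrete element type, got $T"))
    return T
end

println(frob_numbers(UInt8[1, 2]))        # UInt8
println(frob_numbers(Number[1, 2.5]))     # Number (heterogeneous, still accepted)
println(frob_homogeneous([1.0, 2.0]))     # Float64
# frob_homogeneous(Number[1, 2.5]) throws an ArgumentError.
```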
