Re: [julia-users] Re: Array/Cell - a useful distinction, or not?
Does your code break if it is applied to an array with an abstract element type? If so, you must be doing something rather strange. Stefan, I think this is stated too strongly. There have been many questions and issues related to arrays with abstract element type. For functions returning arrays it is not so intuitive to get the promotion right and there are still problems in base, e.g. this funny one julia sum(Real[1.0 1.0],1) 1x2 Array{Int64,2}: 1 1 2014-05-01 17:20 GMT+02:00 Simon Kornblith si...@simonster.com: It seems to me that the massive difference in performance between homogeneous and heterogeneous arrays is at least in part a characteristic of the implementation and not the language. We currently store heterogeneous arrays as arrays of boxed pointers and perform function calls on values taken out of them using jl_apply_generic, but I think this could be made more efficient. For arrays of types that are sufficiently small or sufficiently close in size, e.g. Array{Union(Float32,Float64)}, we could store type information and value in the array instead of resorting to boxing, and we could create code that branches based on type unions instead of resorting to jl_apply_generic. Then (some?) heterogeneous arrays could be nearly as fast as homogeneous arrays, modulo branch prediction for the type. This is basically how we store DataArrays, which could just be Array{Union(T,NA)} if the compiler did this. This is easier for arrays of unions of concrete types than it is for arrays of abstract types, since for abstract types someone could subtype the element type of the array after the array has been constructed and both storage and code need to be able to accommodate that. But I'm not sure it's right to structure the language based on the performance characteristics of the current implementation unless we think they cannot be improved. Simon On Wednesday, April 30, 2014 10:08:15 AM UTC-4, Oliver Woodford wrote: On Wednesday, April 30, 2014 2:31:43 PM UTC+1, Patrick O'Leary wrote: It's a flexible type system, but it doesn't provide the power of ML or Haskell. If you really, really want this, do a runtime check: reduce((==), [typeof(el) for el in a]) I feel that the difference between homogeneous and heterogeneous arrays is a very important distinction, and not some odd thing that you might only rarely care about. The distinction has a massive impact on speed. The point of Julia is to be a fast dynamic language. Hiding the distinction under the carpet seems contrary to one of the aims of Julia. Ivar's suggestion of: @assert isleaftype(T) is nice, but it doesn't feel quite right to me. -- Med venlig hilsen Andreas Noack Jensen
Re: [julia-users] Re: Array/Cell - a useful distinction, or not?
Well, yes, I should have put a rather big asterisk on that: * Unless your code explicitly depends on the element type of that or derived arrays. So, yes, it's not that simple. On Fri, May 2, 2014 at 9:06 AM, Andreas Noack Jensen andreasnoackjen...@gmail.com wrote: Does your code break if it is applied to an array with an abstract element type? If so, you must be doing something rather strange. Stefan, I think this is stated too strongly. There have been many questions and issues related to arrays with abstract element type. For functions returning arrays it is not so intuitive to get the promotion right and there are still problems in base, e.g. this funny one julia sum(Real[1.0 1.0],1) 1x2 Array{Int64,2}: 1 1 2014-05-01 17:20 GMT+02:00 Simon Kornblith si...@simonster.com: It seems to me that the massive difference in performance between homogeneous and heterogeneous arrays is at least in part a characteristic of the implementation and not the language. We currently store heterogeneous arrays as arrays of boxed pointers and perform function calls on values taken out of them using jl_apply_generic, but I think this could be made more efficient. For arrays of types that are sufficiently small or sufficiently close in size, e.g. Array{Union(Float32,Float64)}, we could store type information and value in the array instead of resorting to boxing, and we could create code that branches based on type unions instead of resorting to jl_apply_generic. Then (some?) heterogeneous arrays could be nearly as fast as homogeneous arrays, modulo branch prediction for the type. This is basically how we store DataArrays, which could just be Array{Union(T,NA)} if the compiler did this. This is easier for arrays of unions of concrete types than it is for arrays of abstract types, since for abstract types someone could subtype the element type of the array after the array has been constructed and both storage and code need to be able to accommodate that. But I'm not sure it's right to structure the language based on the performance characteristics of the current implementation unless we think they cannot be improved. Simon On Wednesday, April 30, 2014 10:08:15 AM UTC-4, Oliver Woodford wrote: On Wednesday, April 30, 2014 2:31:43 PM UTC+1, Patrick O'Leary wrote: It's a flexible type system, but it doesn't provide the power of ML or Haskell. If you really, really want this, do a runtime check: reduce((==), [typeof(el) for el in a]) I feel that the difference between homogeneous and heterogeneous arrays is a very important distinction, and not some odd thing that you might only rarely care about. The distinction has a massive impact on speed. The point of Julia is to be a fast dynamic language. Hiding the distinction under the carpet seems contrary to one of the aims of Julia. Ivar's suggestion of: @assert isleaftype(T) is nice, but it doesn't feel quite right to me. -- Med venlig hilsen Andreas Noack Jensen
Re: [julia-users] Re: Array/Cell - a useful distinction, or not?
On Wednesday, April 30, 2014 9:13:50 PM UTC+1, Stefan Karpinski wrote: Back to the original subject, because I think there's some value in summarizing it. 1. Variance. It is somewhat unintuitive to a lot of newcomers that in Julia Array{Real} is a concrete type and Array{Int} is not a subtype of it. It might be possible to make parametric types in Julia covariant by default, but then one must either come up with a different way of writing the concrete type currently known as Array{Real} or allow concrete parametric types like Array{Real} to have subtypes, namely Array{T} for each T : Real. 2. Restricting type parameters to concrete types. The variance issue was a bit of a red herring. What Oliver was really after, it seems (correct me if I'm wrong), was being able to ensure that the T in Array{T} is always a concrete type. Well, that would certainly be a solution. You could have a type modifier, say Concrete(Real), which returns the union of all concrete types which are subtypes of Real (or whatever). Then you could use: frob{T:Concrete(Real)}(x:Array{T}) to assert that input arrays are homogeneous. What I would really like is to be able to write: frob(x:Array{Concrete(Real)}) instead, because the static parameter seems as redundant to a beginner as it does in frob{T:Real}(x::T). However, I understand that the two are different in the case of arrays/parameterized types, as the latter also encompasses some heterogeneous array types, and that that won't change unless the variance of Julia is changed. I'm not in a position to advocate such a thing.
Re: [julia-users] Re: Array/Cell - a useful distinction, or not?
On Thursday, May 1, 2014 9:59:08 AM UTC+1, Oliver Woodford wrote: On Wednesday, April 30, 2014 9:13:50 PM UTC+1, Stefan Karpinski wrote: Back to the original subject, because I think there's some value in summarizing it. 1. Variance. It is somewhat unintuitive to a lot of newcomers that in Julia Array{Real} is a concrete type and Array{Int} is not a subtype of it. It might be possible to make parametric types in Julia covariant by default, but then one must either come up with a different way of writing the concrete type currently known as Array{Real} or allow concrete parametric types like Array{Real} to have subtypes, namely Array{T} for each T : Real. 2. Restricting type parameters to concrete types. The variance issue was a bit of a red herring. What Oliver was really after, it seems (correct me if I'm wrong), was being able to ensure that the T in Array{T} is always a concrete type. Well, that would certainly be a solution. You could have a type modifier, say Concrete(Real), which returns the union of all concrete types which are subtypes of Real (or whatever). Then you could use: frob{T:Concrete(Real)}(x:Array{T}) to assert that input arrays are homogeneous. What I would really like is to be able to write: frob(x:Array{Concrete(Real)}) instead, because the static parameter seems as redundant to a beginner as it does in frob{T:Real}(x::T). However, I understand that the two are different in the case of arrays/parameterized types, as the latter also encompasses some heterogeneous array types, and that that won't change unless the variance of Julia is changed. I'm not in a position to advocate such a thing. In fact, under the current type system frob(x:Array{Concrete(Real)}) is exactly equivalent to frob(x:Array{Real}). Julia could have a different type system, where Concrete() implies covariance, such that frob(x:Array{Concrete(Real)}) enforces arrays to be homogeneous (i.e. frob(x:Array{Concrete(Real)}) and frob(x:Array{Real}) are different), without losing the ability to describe any current array types. Just throwing it out there. Possibly more confusing than enabling.
Re: [julia-users] Re: Array/Cell - a useful distinction, or not?
It seems to me that the massive difference in performance between homogeneous and heterogeneous arrays is at least in part a characteristic of the implementation and not the language. We currently store heterogeneous arrays as arrays of boxed pointers and perform function calls on values taken out of them using jl_apply_generic, but I think this could be made more efficient. For arrays of types that are sufficiently small or sufficiently close in size, e.g. Array{Union(Float32,Float64)}, we could store type information and value in the array instead of resorting to boxing, and we could create code that branches based on type unions instead of resorting to jl_apply_generic. Then (some?) heterogeneous arrays could be nearly as fast as homogeneous arrays, modulo branch prediction for the type. This is basically how we store DataArrays, which could just be Array{Union(T,NA)} if the compiler did this. This is easier for arrays of unions of concrete types than it is for arrays of abstract types, since for abstract types someone could subtype the element type of the array after the array has been constructed and both storage and code need to be able to accommodate that. But I'm not sure it's right to structure the language based on the performance characteristics of the current implementation unless we think they cannot be improved. Simon On Wednesday, April 30, 2014 10:08:15 AM UTC-4, Oliver Woodford wrote: On Wednesday, April 30, 2014 2:31:43 PM UTC+1, Patrick O'Leary wrote: It's a flexible type system, but it doesn't provide the power of ML or Haskell. If you really, really want this, do a runtime check: reduce((==), [typeof(el) for el in a]) I feel that the difference between homogeneous and heterogeneous arrays is a very important distinction, and not some odd thing that you might only rarely care about. The distinction has a massive impact on speed. The point of Julia is to be a fast dynamic language. Hiding the distinction under the carpet seems contrary to one of the aims of Julia. Ivar's suggestion of: @assert isleaftype(T) is nice, but it doesn't feel quite right to me.
Re: [julia-users] Re: Array/Cell - a useful distinction, or not?
On Tuesday, April 29, 2014 8:07:08 PM UTC+1, Tim Holy wrote: On Tuesday, April 29, 2014 11:43:03 AM Tomas Lycken wrote: Say I have a variable x, and I want to make sure that it's an array of real numbers that are all of the same type, but I don't care which one. Can I say x::Array{T:Real} as a type assertion? function f{T:Real}(x::Array{T}) ... end does this. --Tim Really? In an Array{Real}, are the elements all Real, or concrete subtypes of Real, such as Float64, Uint8, etc.? I don't see how they can be just Real. Therefore, given that Array{Real} meets the type assertion, and it's elements can be of different types, the function declaration doesn't enforce the constraint on arrays that Tomas was after, i.e. that input arrays be homogeneous. As I understand it, if you want to do that you need: function f(x::Union(Array{Float64},Array{Float32},etc.)) ... end Is that correct? If not, what really is the correct way to constrain input arrays to be homogenous?
[julia-users] Re: Array/Cell - a useful distinction, or not?
On Tuesday, April 29, 2014 7:43:03 PM UTC+1, Tomas Lycken wrote: OK, well, just because I wrote that, I realize there is one problem that I can't off the top of my head say how I'd solve: type assertions for e.g. arrays. Say I have a variable x, and I want to make sure that it's an array of real numbers that are all of the same type, but I don't care which one. Can I say x::Array{T:Real} as a type assertion? (I'm at a computer without Julia atm, so I can't test it for myself...) No, you can't do x::Array{T:Real} or function f{T:Real}(x::Array{T}) and have that enforce homogeneous arrays. This is exactly the scenario I'm in. Yes (to address an earlier point), if I didn't bother with the type assertion, Julia would do the optimal thing in each case. However, my function will be a lot slower if you pass in a heterogeneous array, and I want to avoid programmers accidentally doing that. This is why I started this thread! Secondly, I don't think that should be done using a static parameter, either (not that it is). I agree with your other points on how nice Julia is, but I want to tackle the one narly point!
Re: [julia-users] Re: Array/Cell - a useful distinction, or not?
On Wednesday, April 30, 2014 7:52:16 AM UTC-5, Oliver Woodford wrote: Is that correct? If not, what really is the correct way to constrain input arrays to be homogenous? The tendency in Julia is to embrace that it's a dynamic language, and not excessively type constrain inputs. While I don't think there's a way to do exactly what you want, why do you want to do it? Patrick
Re: [julia-users] Re: Array/Cell - a useful distinction, or not?
On Wednesday, April 30, 2014 2:09:45 PM UTC+1, Patrick O'Leary wrote: On Wednesday, April 30, 2014 7:52:16 AM UTC-5, Oliver Woodford wrote: Is that correct? If not, what really is the correct way to constrain input arrays to be homogenous? The tendency in Julia is to embrace that it's a dynamic language, and not excessively type constrain inputs. While I don't think there's a way to do exactly what you want, why do you want to do it? When my function will be a lot slower if you pass in a heterogeneous array, and I want to avoid programmers accidentally and obliviously doing that. Now, I could convert heterogeneous arrays to homogeneous ones within the function, but the Julia style guidehttp://julia.readthedocs.org/en/latest/manual/style-guide/#handle-excess-argument-diversity-in-the-callervery sensibly counsels against that.
Re: [julia-users] Re: Array/Cell - a useful distinction, or not?
On Wednesday, April 30, 2014 8:16:40 AM UTC-5, Oliver Woodford wrote: On Wednesday, April 30, 2014 2:09:45 PM UTC+1, Patrick O'Leary wrote: On Wednesday, April 30, 2014 7:52:16 AM UTC-5, Oliver Woodford wrote: Is that correct? If not, what really is the correct way to constrain input arrays to be homogenous? The tendency in Julia is to embrace that it's a dynamic language, and not excessively type constrain inputs. While I don't think there's a way to do exactly what you want, why do you want to do it? When my function will be a lot slower if you pass in a heterogeneous array, and I want to avoid programmers accidentally and obliviously doing that. Now, I could convert heterogeneous arrays to homogeneous ones within the function, but the Julia style guidehttp://julia.readthedocs.org/en/latest/manual/style-guide/#handle-excess-argument-diversity-in-the-callervery sensibly counsels against that. It's a flexible type system, but it doesn't provide the power of ML or Haskell. If you really, really want this, do a runtime check: reduce((==), [typeof(el) for el in a])
Re: [julia-users] Re: Array/Cell - a useful distinction, or not?
You can do something like function f{T:Real}(x::Array{T,1}) @assert isleaftype(T) # function implementation end kl. 15:16:40 UTC+2 onsdag 30. april 2014 skrev Oliver Woodford følgende: On Wednesday, April 30, 2014 2:09:45 PM UTC+1, Patrick O'Leary wrote: On Wednesday, April 30, 2014 7:52:16 AM UTC-5, Oliver Woodford wrote: Is that correct? If not, what really is the correct way to constrain input arrays to be homogenous? The tendency in Julia is to embrace that it's a dynamic language, and not excessively type constrain inputs. While I don't think there's a way to do exactly what you want, why do you want to do it? When my function will be a lot slower if you pass in a heterogeneous array, and I want to avoid programmers accidentally and obliviously doing that. Now, I could convert heterogeneous arrays to homogeneous ones within the function, but the Julia style guidehttp://julia.readthedocs.org/en/latest/manual/style-guide/#handle-excess-argument-diversity-in-the-callervery sensibly counsels against that.
Re: [julia-users] Re: Array/Cell - a useful distinction, or not?
And today I learned about isleaftype()! This is a better implementation than what I posted. On Wednesday, April 30, 2014 8:32:45 AM UTC-5, Ivar Nesje wrote: You can do something like function f{T:Real}(x::Array{T,1}) @assert isleaftype(T) # function implementation end kl. 15:16:40 UTC+2 onsdag 30. april 2014 skrev Oliver Woodford følgende: On Wednesday, April 30, 2014 2:09:45 PM UTC+1, Patrick O'Leary wrote: On Wednesday, April 30, 2014 7:52:16 AM UTC-5, Oliver Woodford wrote: Is that correct? If not, what really is the correct way to constrain input arrays to be homogenous? The tendency in Julia is to embrace that it's a dynamic language, and not excessively type constrain inputs. While I don't think there's a way to do exactly what you want, why do you want to do it? When my function will be a lot slower if you pass in a heterogeneous array, and I want to avoid programmers accidentally and obliviously doing that. Now, I could convert heterogeneous arrays to homogeneous ones within the function, but the Julia style guidehttp://julia.readthedocs.org/en/latest/manual/style-guide/#handle-excess-argument-diversity-in-the-callervery sensibly counsels against that.
Re: [julia-users] Re: Array/Cell - a useful distinction, or not?
On Wednesday, April 30, 2014 2:31:43 PM UTC+1, Patrick O'Leary wrote: It's a flexible type system, but it doesn't provide the power of ML or Haskell. If you really, really want this, do a runtime check: reduce((==), [typeof(el) for el in a]) I feel that the difference between homogeneous and heterogeneous arrays is a very important distinction, and not some odd thing that you might only rarely care about. The distinction has a massive impact on speed. The point of Julia is to be a fast dynamic language. Hiding the distinction under the carpet seems contrary to one of the aims of Julia. Ivar's suggestion of: @assert isleaftype(T) is nice, but it doesn't feel quite right to me.
Re: [julia-users] Re: Array/Cell - a useful distinction, or not?
On Wednesday, April 30, 2014 3:20:23 PM UTC+1, Tim Holy wrote: You can add this as the first line of your function: assert_leaftype(T) where assert_leaftype(T) = isleaftype(T) || error(Must be a concrete type) To your users, this is at least as useful as ERROR: no method myfunction(Array{Real,1}) which is what it would be if you relied on method dispatch to generate the error (which is what I think you're asking for). --Tim It's OK, but it still requires me to have a static parameter. Is there a O(1) time solution which avoids the need for static parameters. Something like: function frob(x::Array) isleaftype(x) || error(Homogeneous array required) though I know this won't work.
Re: [julia-users] Re: Array/Cell - a useful distinction, or not?
What's wrong with parameters? Are you making life harder for yourself than you need to? :-) If you're worried about the (very slight) overhead (isleaftype calls out to C, and therefore can't be inlined), then you could use a Union where you specify all the types you allow. Or define the following function: fastleaftype{T}(::Type{T}) = error(Must be a concrete type) for x in names(Core) T = eval(x) if isleaftype(T) T : Number @eval begin fastleaftype(::Type{$T}) = nothing end end end Then: function myfunction(x::Array) fastleaftype(eltype(x)) x .+ 1 end Now check this out: julia code_typed(myfunction, (Matrix{Float64},)) 1-element Array{Any,1}: :($(Expr(:lambda, {:x}, {{},{{:x,Array{Float64,2},0}},{}}, :(begin # none, line 2: nothing # line 3: return x::Array{Float64,2} .+ 1::Array{Float64,2} end::Array{Float64,2} julia code_typed(myfunction, (Matrix{Real},)) 1-element Array{Any,1}: :($(Expr(:lambda, {:x}, {{},{{:x,Array{Real,2},0}},{}}, :(begin # none, line 2: throw($(Expr(:new, :(top(getfield) (Base,:ErrorException)::Type{ErrorException}), Must be a concrete type))::ErrorException)::None # line 3: return x::Array{Real,2} .+ 1::Array{Real,2} end::Array{Real,2} The compiler does the check for you, and this adds nothing whatsoever to your runtime. --Tim On Wednesday, April 30, 2014 07:37:05 AM Oliver Woodford wrote: On Wednesday, April 30, 2014 3:20:23 PM UTC+1, Tim Holy wrote: You can add this as the first line of your function: assert_leaftype(T) where assert_leaftype(T) = isleaftype(T) || error(Must be a concrete type) To your users, this is at least as useful as ERROR: no method myfunction(Array{Real,1}) which is what it would be if you relied on method dispatch to generate the error (which is what I think you're asking for). --Tim It's OK, but it still requires me to have a static parameter. Is there a O(1) time solution which avoids the need for static parameters. Something like: function frob(x::Array) isleaftype(x) || error(Homogeneous array required) though I know this won't work.
Re: [julia-users] Re: Array/Cell - a useful distinction, or not?
function frob(x::Array) isleaftype(eltype(x)) || error(Homogeneous array required')? Though, IMO, this is all a non-issue in my experience. Just specifying frob{T:Real}(x::Vector{T}) gets you exactly what you want--the ability to have JIT generate fast, efficient code for a range of types that the user can specify. The fact that this has never come up before or in any package implementations, to me, indicates that this issue if more of getting used to idiomatic Julia and spending some time playing with parametric types and the interactions with the type hierarchy. I come from a non-technical background and at first, the idea of parametric functions/types was a little wild and hard to wrap my head around, but after reading through the manual several times (which has a lot of great stuff!) and developing my own non-trivial codebase (Datetime.jl), I feel I'm comfortable with use cases and how they work in general. I think if you spend some more time developing code, poking around popular packages and Base, you'll come to find that there isn't really anything broken here (though quite possibly some things that need cleaned up a little). Cheers, -Jacob On Wed, Apr 30, 2014 at 10:37 AM, Oliver Woodford oliver.woodf...@gmail.com wrote: On Wednesday, April 30, 2014 3:20:23 PM UTC+1, Tim Holy wrote: You can add this as the first line of your function: assert_leaftype(T) where assert_leaftype(T) = isleaftype(T) || error(Must be a concrete type) To your users, this is at least as useful as ERROR: no method myfunction(Array{Real,1}) which is what it would be if you relied on method dispatch to generate the error (which is what I think you're asking for). --Tim It's OK, but it still requires me to have a static parameter. Is there a O(1) time solution which avoids the need for static parameters. Something like: function frob(x::Array) isleaftype(x) || error(Homogeneous array required) though I know this won't work.
Re: [julia-users] Re: Array/Cell - a useful distinction, or not?
I really think the only code that should be too worried about whether something is a leaf type or not is the compiler... On Wed, Apr 30, 2014 at 10:59 AM, Tim Holy tim.h...@gmail.com wrote: What's wrong with parameters? Are you making life harder for yourself than you need to? :-) If you're worried about the (very slight) overhead (isleaftype calls out to C, and therefore can't be inlined), then you could use a Union where you specify all the types you allow. Or define the following function: fastleaftype{T}(::Type{T}) = error(Must be a concrete type) for x in names(Core) T = eval(x) if isleaftype(T) T : Number @eval begin fastleaftype(::Type{$T}) = nothing end end end Then: function myfunction(x::Array) fastleaftype(eltype(x)) x .+ 1 end Now check this out: julia code_typed(myfunction, (Matrix{Float64},)) 1-element Array{Any,1}: :($(Expr(:lambda, {:x}, {{},{{:x,Array{Float64,2},0}},{}}, :(begin # none, line 2: nothing # line 3: return x::Array{Float64,2} .+ 1::Array{Float64,2} end::Array{Float64,2} julia code_typed(myfunction, (Matrix{Real},)) 1-element Array{Any,1}: :($(Expr(:lambda, {:x}, {{},{{:x,Array{Real,2},0}},{}}, :(begin # none, line 2: throw($(Expr(:new, :(top(getfield) (Base,:ErrorException)::Type{ErrorException}), Must be a concrete type))::ErrorException)::None # line 3: return x::Array{Real,2} .+ 1::Array{Real,2} end::Array{Real,2} The compiler does the check for you, and this adds nothing whatsoever to your runtime. --Tim On Wednesday, April 30, 2014 07:37:05 AM Oliver Woodford wrote: On Wednesday, April 30, 2014 3:20:23 PM UTC+1, Tim Holy wrote: You can add this as the first line of your function: assert_leaftype(T) where assert_leaftype(T) = isleaftype(T) || error(Must be a concrete type) To your users, this is at least as useful as ERROR: no method myfunction(Array{Real,1}) which is what it would be if you relied on method dispatch to generate the error (which is what I think you're asking for). --Tim It's OK, but it still requires me to have a static parameter. Is there a O(1) time solution which avoids the need for static parameters. Something like: function frob(x::Array) isleaftype(x) || error(Homogeneous array required) though I know this won't work.
Re: [julia-users] Re: Array/Cell - a useful distinction, or not?
On Wednesday, April 30, 2014 3:46:54 PM UTC+1, Jacob Quinn wrote: function frob(x::Array) isleaftype(eltype(x)) || error(Homogeneous array required')? Though, IMO, this is all a non-issue in my experience. Just specifying frob{T:Real}(x::Vector{T}) gets you exactly what you want--the ability to have JIT generate fast, efficient code for a range of types that the user can specify. The fact that this has never come up before or in any package implementations, to me, indicates that this issue if more of getting used to idiomatic Julia and spending some time playing with parametric types and the interactions with the type hierarchy. I come from a non-technical background and at first, the idea of parametric functions/types was a little wild and hard to wrap my head around, but after reading through the manual several times (which has a lot of great stuff!) and developing my own non-trivial codebase (Datetime.jl), I feel I'm comfortable with use cases and how they work in general. I think if you spend some more time developing code, poking around popular packages and Base, you'll come to find that there isn't really anything broken here (though quite possibly some things that need cleaned up a little). Cheers, -Jacob I have started developing my own non-trivial code base, which is what prompted me to start this thread. I also have a technical background. You don't care whether someone uses homogeneous or heterogeneous arrays. Maybe you think it's their problem if they make things slow. I want to make them aware of it. On that we differ. What I find absurd is that, rather than accept that different position, based on the explanation I have given (to make things fast), or make a point which invalidates my argument, you attempt to explain away this need by the fact that I am new to the language and that I'm simply not used to it. You have taken a very concrete point that I make, and attempted to dismiss it with the suggestion that I simply don't know what I'm talking about. Not at all constructive.
Re: [julia-users] Re: Array/Cell - a useful distinction, or not?
I think Stefan was merely saying that my illustration of how to inline away the type-check was overkill :). --Tim On Wednesday, April 30, 2014 08:37:15 AM Oliver Woodford wrote: On Wednesday, April 30, 2014 4:01:22 PM UTC+1, Stefan Karpinski wrote: I really think the only code that should be too worried about whether something is a leaf type or not is the compiler... Are you saying: a) we should use another way to distinguish between homogeneous and heterogeneous input arrays, or b) that only the compiler should care whether input arrays are homogeneous and heterogeneous input arrays. If b) then: 1) doesn't it make a big difference to the speed of some functions, e.g. sum()? and 2) if so, is it unreasonable to encourage the user to use homogeneous arrays in those situations?
Re: [julia-users] Re: Array/Cell - a useful distinction, or not?
I think this suggestion was based on Jacob's personal experience – which others seem to have also had – that after a bit of time letting the language wash over you Julia, things that seemed problematic or disconcerting early on have a way of not seeming important anymore. That might or might not happen for you with this particular issue. Personally, I don't think library code should be forcing arrays to have concrete element types. Is the library going to raise an error? If so, you're needlessly forcing the user to either not use the library or modify it to excise the pointless check. That seems bad. Is it going to print a warning but let you continue? That's going to be annoying, although at least I guess the user definitely knows they have a array with abstract element type. But is this really your place to warn them? This seems to treat the user like a child – shouldn't a library assume that the user is doing things for a reason and not by mistake? Since the main concern with heterogeneous arrays is performance, I feel that should be discovered via benchmarking or other performance analysis tools, not because some library is feeling avuncular. On Wed, Apr 30, 2014 at 11:30 AM, Oliver Woodford oliver.woodf...@gmail.com wrote: On Wednesday, April 30, 2014 3:46:54 PM UTC+1, Jacob Quinn wrote: function frob(x::Array) isleaftype(eltype(x)) || error(Homogeneous array required')? Though, IMO, this is all a non-issue in my experience. Just specifying frob{T:Real}(x::Vector{T}) gets you exactly what you want--the ability to have JIT generate fast, efficient code for a range of types that the user can specify. The fact that this has never come up before or in any package implementations, to me, indicates that this issue if more of getting used to idiomatic Julia and spending some time playing with parametric types and the interactions with the type hierarchy. I come from a non-technical background and at first, the idea of parametric functions/types was a little wild and hard to wrap my head around, but after reading through the manual several times (which has a lot of great stuff!) and developing my own non-trivial codebase (Datetime.jl), I feel I'm comfortable with use cases and how they work in general. I think if you spend some more time developing code, poking around popular packages and Base, you'll come to find that there isn't really anything broken here (though quite possibly some things that need cleaned up a little). Cheers, -Jacob I have started developing my own non-trivial code base, which is what prompted me to start this thread. I also have a technical background. You don't care whether someone uses homogeneous or heterogeneous arrays. Maybe you think it's their problem if they make things slow. I want to make them aware of it. On that we differ. What I find absurd is that, rather than accept that different position, based on the explanation I have given (to make things fast), or make a point which invalidates my argument, you attempt to explain away this need by the fact that I am new to the language and that I'm simply not used to it. You have taken a very concrete point that I make, and attempted to dismiss it with the suggestion that I simply don't know what I'm talking about. Not at all constructive.
Re: [julia-users] Re: Array/Cell - a useful distinction, or not?
I was just trying to share some of my own experience with Julia from having used it for the last two years, not be dismissive or condescending. That this hasn't been a show-stopper for some 300+ packages now, IMO, *is* a valid point against making a somewhat disruptive change to the type system. By stating that I come from a *non*-technical background, I was also trying to qualify that you may feel free to dismiss my argument because I may well be talking nonsense :). -Jacob On Wed, Apr 30, 2014 at 11:30 AM, Oliver Woodford oliver.woodf...@gmail.com wrote: On Wednesday, April 30, 2014 3:46:54 PM UTC+1, Jacob Quinn wrote: function frob(x::Array) isleaftype(eltype(x)) || error(Homogeneous array required')? Though, IMO, this is all a non-issue in my experience. Just specifying frob{T:Real}(x::Vector{T}) gets you exactly what you want--the ability to have JIT generate fast, efficient code for a range of types that the user can specify. The fact that this has never come up before or in any package implementations, to me, indicates that this issue if more of getting used to idiomatic Julia and spending some time playing with parametric types and the interactions with the type hierarchy. I come from a non-technical background and at first, the idea of parametric functions/types was a little wild and hard to wrap my head around, but after reading through the manual several times (which has a lot of great stuff!) and developing my own non-trivial codebase (Datetime.jl), I feel I'm comfortable with use cases and how they work in general. I think if you spend some more time developing code, poking around popular packages and Base, you'll come to find that there isn't really anything broken here (though quite possibly some things that need cleaned up a little). Cheers, -Jacob I have started developing my own non-trivial code base, which is what prompted me to start this thread. I also have a technical background. You don't care whether someone uses homogeneous or heterogeneous arrays. Maybe you think it's their problem if they make things slow. I want to make them aware of it. On that we differ. What I find absurd is that, rather than accept that different position, based on the explanation I have given (to make things fast), or make a point which invalidates my argument, you attempt to explain away this need by the fact that I am new to the language and that I'm simply not used to it. You have taken a very concrete point that I make, and attempted to dismiss it with the suggestion that I simply don't know what I'm talking about. Not at all constructive.
Re: [julia-users] Re: Array/Cell - a useful distinction, or not?
On Wed, Apr 30, 2014 at 11:37 AM, Oliver Woodford oliver.woodf...@gmail.com wrote: On Wednesday, April 30, 2014 4:01:22 PM UTC+1, Stefan Karpinski wrote: I really think the only code that should be too worried about whether something is a leaf type or not is the compiler... Are you saying: a) we should use another way to distinguish between homogeneous and heterogeneous input arrays, or b) that only the compiler should care whether input arrays are homogeneous and heterogeneous input arrays. b) If b) then: 1) doesn't it make a big difference to the speed of some functions, e.g. sum()? and 2) if so, is it unreasonable to encourage the user to use homogeneous arrays in those situations? It can make a big difference, but I don't think that libraries or the language should refuse perfectly reasonable functionality because it might be a bit slower. There are lots of situations where you don't care that much about performance and flexibility is more important. Doing computations with BigInts and BigFloats is dog slow compared to using native types but sometimes the increased precision is more important than speed. Should libraries refuse to work with BigInts and BigFloats because they're known to be slow? Even if the same code would work just fine with BitInts as Ints? That's very un-Julian. The Julian approach is informed consent. You can use heterogeneous arrays and they will work, but they might be slow. If your code is too slow, consider changing it so that it uses homogeneous arrays instead. If knowing that this is the problem is hard, we should – and do – provide tools to help discover such things. The TypeCheck packagehttps://github.com/astrieanna/TypeCheck.jlis a great example of such a tool, giving many of the benefits of the static type checking without forcing static type checking on the user everywhere. TypeCheck can warn the user about things that they are allowed to do, but which might cause performance problems, like changing the type of a variable over the course of a loop or having functions whose return type depends on the values of its inputs, not just their type.
Re: [julia-users] Re: Array/Cell - a useful distinction, or not?
On Wednesday, April 30, 2014 4:44:53 PM UTC+1, Stefan Karpinski wrote: I think this suggestion was based on Jacob's personal experience – which others seem to have also had – that after a bit of time letting the language wash over you Julia, things that seemed problematic or disconcerting early on have a way of not seeming important anymore. That might or might not happen for you with this particular issue. I understand, but I think it helps to counter arguments that have been made before resorting to more prosaic explanations. After all, we don't all have the same experiences, but we can all agree on whether homogeneous arrays are more efficient. IMO it didn't help the discussion and detracted from from the valid points I made (which, granted, annoyed me). Hence my rebuttal. Personally, I don't think library code should be forcing arrays to have concrete element types. Is the library going to raise an error? If so, you're needlessly forcing the user to either not use the library or modify it to excise the pointless check. That seems bad. Is it going to print a warning but let you continue? That's going to be annoying, although at least I guess the user definitely knows they have a array with abstract element type. But is this really your place to warn them? This seems to treat the user like a child – shouldn't a library assume that the user is doing things for a reason and not by mistake? Since the main concern with heterogeneous arrays is performance, I feel that should be discovered via benchmarking or other performance analysis tools, not because some library is feeling avuncular. The Julian approach. Catchy. I can certainly understand your point of view on this. However, why shouldn't it be the library's place to tell it's users how to make it run faster? The argument of treating the user like a child is an exaggeration, of course. You're treating them like someone who doesn't appreciate the efficiency savings of homogeneous arrays over heterogeneous ones. For the people that applies to, I suspect they'd appreciate the tip. I suppose the only real argument is that some advanced users may have a real need to pass in a heterogeneous array, and the library wouldn't let them. But then you're weighing up that risk against helping less advanced users (and even advanced users who've accidentally allowed an array to be heterogeneous) use it efficiently. It's not a clear cut decision; it's very subjective. Yes, benchmarking is another useful tool that help users improve performance, but it's not an argument as to why libraries shouldn't. So I think this is a situation where we can agree to disagree. An avuncular library - catchy too! The TypeCheck package sounds good. Does it check for homogeneity of arrays? Even so, for a library that would be impossible to catch when the heterogeneous array was be being passed in through the API, so then you're again relying on users to run that on their particular program. Will they?
Re: [julia-users] Re: Array/Cell - a useful distinction, or not?
I apologize if you felt it was a personal attack, because it certainly wasn't intended as such. I've always appreciated the Julia community and how welcoming and encouraging it is and I wouldn't want to tarnish that reputation. I'm far from the best person to comment on the particularities of type invariance/co-variance/contra-variance, but I still enjoy contributing to discussions when I can. In this case, I felt my own experience getting comfortable with parametric types could be useful. -Jacob On Wed, Apr 30, 2014 at 1:00 PM, Oliver Woodford oliver.woodf...@gmail.comwrote: On Wednesday, April 30, 2014 4:51:13 PM UTC+1, Jacob Quinn wrote: I was just trying to share some of my own experience with Julia from having used it for the last two years, not be dismissive or condescending. That this hasn't been a show-stopper for some 300+ packages now, IMO, *is* a valid point against making a somewhat disruptive change to the type system. OK, well from the first two lines it seemed like you were contesting the issue that detecting homo/heterogeneity could ever be useful, not about my suggested change to the language. I had given a good reason for the former already, which you didn't address. Also, the statement frob{T:Real}(x::Vector{T}) gets you exactly what you want isn't true; in my original post I specifically said I wanted to distinguish homogeneous and heterogeneous arrays, and as was pointed out, this doesn't achieve that. I was just trying to... No, you were *also* trying to explain away my question on the grounds of my experience, rather than address it directly. But a true fact is true, regardless of who says it, and we should give weight to the argument rather than who is saying it. The latter just gives more power to those who already have it, and corrupts society. Please consider this point, because I stand by what I said - I don't think your response was constructive. It attacked me and not my argument, and that is not a healthy way to evaluate ideas. By stating that I come from a *non*-technical background, I was also trying to qualify that you may feel free to dismiss my argument because I may well be talking nonsense :). Duly dismissed :).
Re: [julia-users] Re: Array/Cell - a useful distinction, or not?
On Wednesday, April 30, 2014 6:47:43 PM UTC+1, Jacob Quinn wrote: I apologize if you felt it was a personal attack, because it certainly wasn't intended as such. I've always appreciated the Julia community and how welcoming and encouraging it is and I wouldn't want to tarnish that reputation. I'm far from the best person to comment on the particularities of type invariance/co-variance/contra-variance, but I still enjoy contributing to discussions when I can. In this case, I felt my own experience getting comfortable with parametric types could be useful. -Jacob I don't think it was a personal attack - I'm sorry I used the word because you have missed the point I was making as a result. I am not after an apology. I don't want you to stop providing your valuable experience with anyone. Your testimony could be told to every person about to learn Julia - that would be great. However you shared it when we were halfway through a debate about a particular point, that we could perfectly well discuss on its own merits, drawing attention away from that and on to my experience, which at that point seems irrelevant. The point I made in my last post, and that I make again now, is that that approach to debate doesn't do us (me, you, this group, society) any favours - it's akin to focussing on personality not policy in politics, and what does that get us? IMO it simply wasn't the right moment for you to share that experience. All I would like is an understanding and acknowledgement of that point. And if you don't get it or disagree, not to worry. Please do continue to contribute in whatever way you feel is appropriate - and if I have had a positive impact in that respect, great. This will be my last message on the subject (trying to make the world a better place though sometimes it doesn't feel like it), so feel free to have the last word.
Re: [julia-users] Re: Array/Cell - a useful distinction, or not?
Back to the original subject, because I think there's some value in summarizing it. 1. Variance. It is somewhat unintuitive to a lot of newcomers that in Julia Array{Real} is a concrete type and Array{Int} is not a subtype of it. It might be possible to make parametric types in Julia covariant by default, but then one must either come up with a different way of writing the concrete type currently known as Array{Real} or allow concrete parametric types like Array{Real} to have subtypes, namely Array{T} for each T : Real. 2. Restricting type parameters to concrete types. The variance issue was a bit of a red herring. What Oliver was really after, it seems (correct me if I'm wrong), was being able to ensure that the T in Array{T} is always a concrete type. On Wed, Apr 30, 2014 at 4:02 PM, Oliver Woodford oliver.woodf...@gmail.comwrote: On Wednesday, April 30, 2014 6:47:43 PM UTC+1, Jacob Quinn wrote: I apologize if you felt it was a personal attack, because it certainly wasn't intended as such. I've always appreciated the Julia community and how welcoming and encouraging it is and I wouldn't want to tarnish that reputation. I'm far from the best person to comment on the particularities of type invariance/co-variance/contra-variance, but I still enjoy contributing to discussions when I can. In this case, I felt my own experience getting comfortable with parametric types could be useful. -Jacob I don't think it was a personal attack - I'm sorry I used the word because you have missed the point I was making as a result. I am not after an apology. I don't want you to stop providing your valuable experience with anyone. Your testimony could be told to every person about to learn Julia - that would be great. However you shared it when we were halfway through a debate about a particular point, that we could perfectly well discuss on its own merits, drawing attention away from that and on to my experience, which at that point seems irrelevant. The point I made in my last post, and that I make again now, is that that approach to debate doesn't do us (me, you, this group, society) any favours - it's akin to focussing on personality not policy in politics, and what does that get us? IMO it simply wasn't the right moment for you to share that experience. All I would like is an understanding and acknowledgement of that point. And if you don't get it or disagree, not to worry. Please do continue to contribute in whatever way you feel is appropriate - and if I have had a positive impact in that respect, great. This will be my last message on the subject (trying to make the world a better place though sometimes it doesn't feel like it), so feel free to have the last word.
[julia-users] Re: Array/Cell - a useful distinction, or not?
Sorry for nitpicking, but point 3 is wrong, and it might cause trouble in the following discussion. f{T:Real}(a::Array{T}) Matches any array with a element type that is a subtype of Real (eg. Integer[1,2,BigInt(44)] and Real[1, 3.4]) I have had trouble with this too, but now that I somewhat understand the rationale, I'm less frustrated. I'm not at the point of defending the current behaviour (yet), so others will have to do that (again). kl. 11:38:33 UTC+2 tirsdag 29. april 2014 skrev Oliver Woodford følgende: A habitual MATLAB user, I've been trying out Julia over the last two weeks to see if it might be a suitable replacement. I want something that is as fast to develop using, but has much faster runtime. Perhaps I'll write about my general thoughts in another post. However, in this thread I want to address one linguistic thing I found confusing. Ignoring subarrays, dense/sparse arrays, there are two main types of array in Julia. I will call them homogenous and heterogenous. Homogenous arrays are declared as having all elements be the same type: e.g. array{Float64}. They are efficient to store in memory, as the elements are simply laid out consecutively in memory. Heterogenous arrays have an abstract element type, e.g. array{Real}. The way Julia interprets this is that every element must be a concrete subtype of Real, but that they don't have to be the same type. Each element can therefore be a different type, with different storage requirements, so these arrays contain a pointer to each element, which is then stored somewhere else - this carries a massive overhead. In MATLAB these arrays would be termed an array and a cell array respectively, so there is a clear distinction. What I found confusing with Julia is that the distinction is less clear. This confusion was highlighted in a stackoverflow questionhttp://stackoverflow.com/questions/23326848/julia-arrays-with-abstract-parameters-cause-errors-but-variables-with-abstract, which I'll outline it again, now: f(x::Real) = x is equivalent to f{T:Real}(x::T) = x, but f(x::Array{Real}) = x is different from f{T:Real}(x::Array{T}) = x. The second form for input arrays, requiring static parameters, is needed to declare that the array is homogenous, not heterogenous. This seems a funny way of doing things to me because: 1. The homogeneity/heterogeneity of the array is a characteristic of the array only, and not of the function 2. The static parameter T is not required anywhere else, and the Julia style guidehttp://julia.readthedocs.org/en/latest/manual/style-guide/#don-t-use-unnecessary-static-parameters explicitly counsels against the use of such parameters, where they are unnecessary. 3. To declare a function which can take homogenous or heterogenous arrays, I believe you'd have to do something like f{T:Real}(x::Union(Array{T}, Array{Real})) = x, which seems totally bizarre (due to point 1). What I would advocate instead is two types of array, one homogenous, one heterogenous. Array for homogenous and Cell for heterogenous would work. It would do away with the need for static parameters in this case, and also, in my view, make people far more aware of when they are using the different types of array. I suspect many beginners are oblivious to the distinction, currently. In the stackoverflow question, someone suggested two points against this: 1. Having an array whose elements are all guaranteed to be some subtype of Real is not particularly useful without specifying which subtype since without that information almost no structural information is being provided to the compiler (e.g. about memory layout, etc.) Well, firstly I disagree; there is a lot of structural information being supplied - to read each element, the compiler knows that it just needs to compute an offset, rather than compute an offset, read a pointer and then read another memory location. However, I don't think this is exploited (though it could be) because the function will be recompiled from scratch for each element type. Secondly, this isn't about helping the compiler, it's about making the language more consistent and sensible - helping the *user*. 2. You almost always pass homogenous arrays of a concrete type as arguments anyway and the compiler is able to specialize on that. Firstly, homogenous arrays that you pass in *always *have a concrete type. Secondly, you don't always know what that type will be. It might be Float64 or Unit8, etc. I haven't yet heard a convincing counterargument to it making more sense to distinguish homogenous and heterogenous arrays by the array type rather than by static function parameter. Let the discussion begin...
[julia-users] Re: Array/Cell - a useful distinction, or not?
On Tuesday, April 29, 2014 3:48:52 PM UTC+1, Ivar Nesje wrote: Sorry for nitpicking, but point 3 is wrong, and it might cause trouble in the following discussion. f{T:Real}(a::Array{T}) Matches any array with a element type that is a subtype of Real (eg. Integer[1,2,BigInt(44)] and Real[1, 3.4]) Are you saying that f{T:Real}(a::Array{T}) covers both homogeneous and heterogeneous arrays, whereas f(a::Array{Real}) only covers heterogeneous arrays? If that's the case it strikes me as just as confusing. I have had trouble with this too, but now that I somewhat understand the rationale, I'm less frustrated. I'm not at the point of defending the current behaviour (yet), so others will have to do that (again). If this has already been discussed then by all means post a link to it. No point repeating things! kl. 11:38:33 UTC+2 tirsdag 29. april 2014 skrev Oliver Woodford følgende: A habitual MATLAB user, I've been trying out Julia over the last two weeks to see if it might be a suitable replacement. I want something that is as fast to develop using, but has much faster runtime. Perhaps I'll write about my general thoughts in another post. However, in this thread I want to address one linguistic thing I found confusing. Ignoring subarrays, dense/sparse arrays, there are two main types of array in Julia. I will call them homogenous and heterogenous. Homogenous arrays are declared as having all elements be the same type: e.g. array{Float64}. They are efficient to store in memory, as the elements are simply laid out consecutively in memory. Heterogenous arrays have an abstract element type, e.g. array{Real}. The way Julia interprets this is that every element must be a concrete subtype of Real, but that they don't have to be the same type. Each element can therefore be a different type, with different storage requirements, so these arrays contain a pointer to each element, which is then stored somewhere else - this carries a massive overhead. In MATLAB these arrays would be termed an array and a cell array respectively, so there is a clear distinction. What I found confusing with Julia is that the distinction is less clear. This confusion was highlighted in a stackoverflow questionhttp://stackoverflow.com/questions/23326848/julia-arrays-with-abstract-parameters-cause-errors-but-variables-with-abstract, which I'll outline it again, now: f(x::Real) = x is equivalent to f{T:Real}(x::T) = x, but f(x::Array{Real}) = x is different from f{T:Real}(x::Array{T}) = x. The second form for input arrays, requiring static parameters, is needed to declare that the array is homogenous, not heterogenous. This seems a funny way of doing things to me because: 1. The homogeneity/heterogeneity of the array is a characteristic of the array only, and not of the function 2. The static parameter T is not required anywhere else, and the Julia style guidehttp://julia.readthedocs.org/en/latest/manual/style-guide/#don-t-use-unnecessary-static-parameters explicitly counsels against the use of such parameters, where they are unnecessary. 3. To declare a function which can take homogenous or heterogenous arrays, I believe you'd have to do something like f{T:Real}(x::Union(Array{T}, Array{Real})) = x, which seems totally bizarre (due to point 1). What I would advocate instead is two types of array, one homogenous, one heterogenous. Array for homogenous and Cell for heterogenous would work. It would do away with the need for static parameters in this case, and also, in my view, make people far more aware of when they are using the different types of array. I suspect many beginners are oblivious to the distinction, currently. In the stackoverflow question, someone suggested two points against this: 1. Having an array whose elements are all guaranteed to be some subtype of Real is not particularly useful without specifying which subtype since without that information almost no structural information is being provided to the compiler (e.g. about memory layout, etc.) Well, firstly I disagree; there is a lot of structural information being supplied - to read each element, the compiler knows that it just needs to compute an offset, rather than compute an offset, read a pointer and then read another memory location. However, I don't think this is exploited (though it could be) because the function will be recompiled from scratch for each element type. Secondly, this isn't about helping the compiler, it's about making the language more consistent and sensible - helping the *user*. 2. You almost always pass homogenous arrays of a concrete type as arguments anyway and the compiler is able to specialize on that. Firstly, homogenous arrays that you pass in *always *have a concrete type. Secondly, you don't always know what that type will be. It might be Float64 or Unit8, etc. I haven't yet heard a convincing
Re: [julia-users] Re: Array/Cell - a useful distinction, or not?
On Apr 29, 2014, at 8:28 AM, Oliver Woodford oliver.woodf...@gmail.com wrote: Are you saying that f{T:Real}(a::Array{T}) covers both homogeneous and heterogeneous arrays, whereas f(a::Array{Real}) only covers heterogeneous arrays? If that's the case it strikes me as just as confusing. This is indeed how parametric types work in Julia. Array{T} is a family of types, one for each value of T. Defining f{T}(a::Array{T}) allows you to generically define an infinite number of functions -- you get a unique function for each value of T. In contrast, Array{Real} is a single type -- specifically, the type generated by the parametric type Array{T} when its parameter T = Real. There is no variable T, so f(a::Array{Real}) defines exactly one function -- and that function accepts exactly one type of input. Array{Any} is also a single type. -- John
[julia-users] Re: Array/Cell - a useful distinction, or not?
It might be easier to understand if we start by stripping away the constraints and the function syntax and look just at the array type. Array{Real} is a concrete type. It's an array whose elements are all subtypes of the abstract type Real. Array{T} is a family of types. It describes a whole bunch of types, including Array{Real}, but also including Array{Float64}, Array{All} and Array{Bob}. T can itself be either abstract--a heterogenous array--or concrete. The constrained type variable {T:Real}(::Array{T}) still describes a family of types, but that family is smaller. Array{All} and Array{Bob} are out. Hope this helps! On Tuesday, April 29, 2014 10:28:28 AM UTC-5, Oliver Woodford wrote: On Tuesday, April 29, 2014 3:48:52 PM UTC+1, Ivar Nesje wrote: Sorry for nitpicking, but point 3 is wrong, and it might cause trouble in the following discussion. f{T:Real}(a::Array{T}) Matches any array with a element type that is a subtype of Real (eg. Integer[1,2,BigInt(44)] and Real[1, 3.4]) Are you saying that f{T:Real}(a::Array{T}) covers both homogeneous and heterogeneous arrays, whereas f(a::Array{Real}) only covers heterogeneous arrays? If that's the case it strikes me as just as confusing. I have had trouble with this too, but now that I somewhat understand the rationale, I'm less frustrated. I'm not at the point of defending the current behaviour (yet), so others will have to do that (again). If this has already been discussed then by all means post a link to it. No point repeating things! kl. 11:38:33 UTC+2 tirsdag 29. april 2014 skrev Oliver Woodford følgende: A habitual MATLAB user, I've been trying out Julia over the last two weeks to see if it might be a suitable replacement. I want something that is as fast to develop using, but has much faster runtime. Perhaps I'll write about my general thoughts in another post. However, in this thread I want to address one linguistic thing I found confusing. Ignoring subarrays, dense/sparse arrays, there are two main types of array in Julia. I will call them homogenous and heterogenous. Homogenous arrays are declared as having all elements be the same type: e.g. array{Float64}. They are efficient to store in memory, as the elements are simply laid out consecutively in memory. Heterogenous arrays have an abstract element type, e.g. array{Real}. The way Julia interprets this is that every element must be a concrete subtype of Real, but that they don't have to be the same type. Each element can therefore be a different type, with different storage requirements, so these arrays contain a pointer to each element, which is then stored somewhere else - this carries a massive overhead. In MATLAB these arrays would be termed an array and a cell array respectively, so there is a clear distinction. What I found confusing with Julia is that the distinction is less clear. This confusion was highlighted in a stackoverflow questionhttp://stackoverflow.com/questions/23326848/julia-arrays-with-abstract-parameters-cause-errors-but-variables-with-abstract, which I'll outline it again, now: f(x::Real) = x is equivalent to f{T:Real}(x::T) = x, but f(x::Array{Real}) = x is different from f{T:Real}(x::Array{T}) = x. The second form for input arrays, requiring static parameters, is needed to declare that the array is homogenous, not heterogenous. This seems a funny way of doing things to me because: 1. The homogeneity/heterogeneity of the array is a characteristic of the array only, and not of the function 2. The static parameter T is not required anywhere else, and the Julia style guidehttp://julia.readthedocs.org/en/latest/manual/style-guide/#don-t-use-unnecessary-static-parameters explicitly counsels against the use of such parameters, where they are unnecessary. 3. To declare a function which can take homogenous or heterogenous arrays, I believe you'd have to do something like f{T:Real}(x::Union(Array{T}, Array{Real})) = x, which seems totally bizarre (due to point 1). What I would advocate instead is two types of array, one homogenous, one heterogenous. Array for homogenous and Cell for heterogenous would work. It would do away with the need for static parameters in this case, and also, in my view, make people far more aware of when they are using the different types of array. I suspect many beginners are oblivious to the distinction, currently. In the stackoverflow question, someone suggested two points against this: 1. Having an array whose elements are all guaranteed to be some subtype of Real is not particularly useful without specifying which subtype since without that information almost no structural information is being provided to the compiler (e.g. about memory layout, etc.) Well, firstly I disagree; there is a lot of structural information being supplied - to read each element, the compiler knows that it just needs to compute an
Re: [julia-users] Re: Array/Cell - a useful distinction, or not?
On Tue, Apr 29, 2014 at 11:38 AM, John Myles White johnmyleswh...@gmail.com wrote: Array{Any} is also a single type. Not to nit-pick, but there's a second type parameter, so it's actually still abstract. But Array{Any,1} is a single type with no subtypes, which I think is what you meant.
Re: [julia-users] Re: Array/Cell - a useful distinction, or not?
On Tuesday, April 29, 2014 4:38:59 PM UTC+1, John Myles White wrote: On Apr 29, 2014, at 8:28 AM, Oliver Woodford oliver@gmail.comjavascript: wrote: Are you saying that f{T:Real}(a::Array{T}) covers both homogeneous and heterogeneous arrays, whereas f(a::Array{Real}) only covers heterogeneous arrays? If that's the case it strikes me as just as confusing. This is indeed how parametric types work in Julia. Array{T} is a family of types, one for each value of T. Defining f{T}(a::Array{T}) allows you to generically define an infinite number of functions -- you get a unique function for each value of T. Yes, understood. In contrast, Array{Real} is a single type -- specifically, the type generated by the parametric type Array{T} when its parameter T = Real. There is no variable T, so f(a::Array{Real}) defines exactly one function -- and that function accepts exactly one type of input. Yes, understood. Array{Any} is also a single type. -- John However, you haven't addressed the point I make in my post. I will post it again, stated more clearly.
[julia-users] Re: Array/Cell - a useful distinction, or not?
On Tuesday, April 29, 2014 4:41:52 PM UTC+1, Patrick O'Leary wrote: It might be easier to understand if we start by stripping away the constraints and the function syntax and look just at the array type. Array{Real} is a concrete type. It's an array whose elements are all subtypes of the abstract type Real. Array{T} is a family of types. It describes a whole bunch of types, including Array{Real}, but also including Array{Float64}, Array{All} and Array{Bob}. T can itself be either abstract--a heterogenous array--or concrete. The constrained type variable {T:Real}(::Array{T}) still describes a family of types, but that family is smaller. Array{All} and Array{Bob} are out. Hope this helps! Thank you, Patrick. We have now covered how types for arrays currently do work (which I mostly understood), and corrected my mistake in point 3 of my original post. We haven't yet started to address the real point of my post yet, though. I'd be interested to hear more on that.
[julia-users] Re: Array/Cell - a useful distinction, or not?
On Tuesday, April 29, 2014 4:53:05 PM UTC+1, Matt Bauman wrote: I use cell arrays very often in Matlab, too, but I've found that I often don't really need to even worry about the distinction in julia. Square brackets will constrain the types as much as possible, and if it's not possible, Any[] == {}. Moreover, most of what I used cell arrays for in Matlab are completely obviated in Julia — Cell arrays of strings (strings are first class) and passing/parsing/splatting varargs (keyword arguments are wonderful and any Julian collection can be splatted). Yes, I'd been thinking I would use heterogeneous arrays less in Julia, for the reasons you mention.
[julia-users] Re: Array/Cell - a useful distinction, or not?
Quick question, folks: Does f(x::Array{Union(Int8,Int16)}) mean that x must be all Int8 or all Int16 (homogenous), or each element can be either Int8 or Int16 (heterogeneous)? If the latter, then would I need to use f(x::Union(Array{Int8},Array{Int16})) to achieve the former? If so, then what I am suggesting is having two array types, Array and Cell, and have f(x::Array{Union(Int8,Int16)}) be equivalent to the current f(x::Union(Array{Int8},Array{Int16})), and have f(x::Cell{Union(Int8,Int16)}) be equivalent to the current f(x::Array{Union(Int8,Int16)}). Stefan, thank you very much for your answer. I will get back to you on it.
Re: [julia-users] Re: Array/Cell - a useful distinction, or not?
In my experience, I can think of a single time when having an array of a specific abstract type was useful (how multiple Period arithmetic is handled: https://github.com/karbarcca/Dates.jl/blob/master/src/periods.jl#L88). Almost always, I'm concentrating on making sure Arrays I work with are of a specific, concrete type to ensure performance and memory benefits. If I really realize a few different types might make it in an Array, I usually just go straight to a {} (Any Array) and move on. -Jacob On Tue, Apr 29, 2014 at 12:01 PM, Oliver Woodford oliver.woodf...@gmail.com wrote: On Tuesday, April 29, 2014 4:53:05 PM UTC+1, Matt Bauman wrote: I use cell arrays very often in Matlab, too, but I've found that I often don't really need to even worry about the distinction in julia. Square brackets will constrain the types as much as possible, and if it's not possible, Any[] == {}. Moreover, most of what I used cell arrays for in Matlab are completely obviated in Julia — Cell arrays of strings (strings are first class) and passing/parsing/splatting varargs (keyword arguments are wonderful and any Julian collection can be splatted). Yes, I'd been thinking I would use heterogeneous arrays less in Julia, for the reasons you mention.
Re: [julia-users] Re: Array/Cell - a useful distinction, or not?
On Tuesday, April 29, 2014 5:13:08 PM UTC+1, Jacob Quinn wrote: In my experience, I can think of a single time when having an array of a specific abstract type was useful (how multiple Period arithmetic is handled: https://github.com/karbarcca/Dates.jl/blob/master/src/periods.jl#L88). Almost always, I'm concentrating on making sure Arrays I work with are of a specific, concrete type to ensure performance and memory benefits. If I really realize a few different types might make it in an Array, I usually just go straight to a {} (Any Array) and move on. -Jacob You will get performance and memory benefits if your arrays are homogeneous as opposed to heterogeneous. You don't always know what element type they'll have though. Sometimes you want Float64 for accuracy, sometimes Float32 for speed and lower memory use. To handle that you need a static function parameter, currently.
[julia-users] Re: Array/Cell - a useful distinction, or not?
On Tuesday, April 29, 2014 5:14:01 PM UTC+1, Jason Merrill wrote: Suppose the types that you want existed. Let's call them ConcreteArray, and Cell. ConcreteArray gives the guarantee that all of it's elements are stored directly with a fixed stride. Can you give some examples of functions that would use these in their type signature? Sure. sum(x::ConcreteArray{Number}). Because currently if I write sum(x::Array{Real}) and try to pass in an Array{Float64} I get an error. Note that most arrays are actually homogeneous (as Jacob and Matt both stated), so if you want to make sure of that you need to use static parameters. I'm suggesting a system that doesn't require static parameters for the usual case.
Re: [julia-users] Re: Array/Cell - a useful distinction, or not?
On Tue, Apr 29, 2014 at 12:13 PM, Oliver Woodford oliver.woodf...@gmail.com wrote: Quick question, folks: Does f(x::Array{Union(Int8,Int16)}) mean that x must be all Int8 or all Int16 (homogenous), or each element can be either Int8 or Int16 (heterogeneous)? If the latter, then would I need to use f(x::Union(Array{Int8},Array{Int16})) to achieve the former? Yes and yes. If so, then what I am suggesting is having two array types, Array and Cell, and have f(x::Array{Union(Int8,Int16)}) be equivalent to the current f(x::Union(Array{Int8},Array{Int16})), and have f(x::Cell{Union(Int8,Int16)}) be equivalent to the current f(x::Array{Union(Int8,Int16)}). This is not a minor change. What you're asking for is having both invariant and covariant parametric types in the language – Array would be covariant and Cell would be invariant. If you're going to have invariant and covariant type parameters, you'll want contravariant type parameters too. At that point, you need to have a way of annotating the variance of a type's parameters – and worse still, you have to explain all this to everyone. At that point, you have Scala and all the complicationshttp://blogs.atlassian.com/2013/01/covariance-and-contravariance-in-scala/and questionshttp://stackoverflow.com/questions/4531455/whats-the-difference-between-ab-and-b-in-scalathat come with that. I don't think that's the right path for a language that's meant to be used by people who are scientists rather than professional programmers (honestly, I don't really think it's a good idea for professional programmers either). Making all parametric types covariant is a more reasonable possibility, but as I pointed out, it has its own significant issues.
Re: [julia-users] Re: Array/Cell - a useful distinction, or not?
Perhaps the Julian curly-brace array syntax makes Any[] arrays seem special (especially to Matlab converts), when they're really not all that different. Yes, users need to learn that Array{Real} is not a supertype of Array{Int}. But that's a common theme throughout Julia's type system, and really needs to be learned as soon as learning about parametric types. I think the manual does a good job here — Parametric Typeshttp://docs.julialang.org/en/latest/manual/types/#man-parametric-types. Is there a way that it could be written more clearly? On Tuesday, April 29, 2014 12:39:28 PM UTC-4, Stefan Karpinski wrote: On Tue, Apr 29, 2014 at 12:13 PM, Oliver Woodford oliver@gmail.comjavascript: wrote: Quick question, folks: Does f(x::Array{Union(Int8,Int16)}) mean that x must be all Int8 or all Int16 (homogenous), or each element can be either Int8 or Int16 (heterogeneous)? If the latter, then would I need to use f(x::Union(Array{Int8},Array{Int16})) to achieve the former? Yes and yes. If so, then what I am suggesting is having two array types, Array and Cell, and have f(x::Array{Union(Int8,Int16)}) be equivalent to the current f(x::Union(Array{Int8},Array{Int16})), and have f(x::Cell{Union(Int8,Int16)}) be equivalent to the current f(x::Array{Union(Int8,Int16)}). This is not a minor change. What you're asking for is having both invariant and covariant parametric types in the language – Array would be covariant and Cell would be invariant. If you're going to have invariant and covariant type parameters, you'll want contravariant type parameters too. At that point, you need to have a way of annotating the variance of a type's parameters – and worse still, you have to explain all this to everyone. At that point, you have Scala and all the complicationshttp://blogs.atlassian.com/2013/01/covariance-and-contravariance-in-scala/and questionshttp://stackoverflow.com/questions/4531455/whats-the-difference-between-ab-and-b-in-scalathat come with that. I don't think that's the right path for a language that's meant to be used by people who are scientists rather than professional programmers (honestly, I don't really think it's a good idea for professional programmers either). Making all parametric types covariant is a more reasonable possibility, but as I pointed out, it has its own significant issues.
[julia-users] Re: Array/Cell - a useful distinction, or not?
On Tuesday, April 29, 2014 9:29:56 AM UTC-7, Oliver Woodford wrote: On Tuesday, April 29, 2014 5:14:01 PM UTC+1, Jason Merrill wrote: Suppose the types that you want existed. Let's call them ConcreteArray, and Cell. ConcreteArray gives the guarantee that all of it's elements are stored directly with a fixed stride. Can you give some examples of functions that would use these in their type signature? Sure. sum(x::ConcreteArray{Number}). Because currently if I write sum(x::Array{Real}) and try to pass in an Array{Float64} I get an error. Note that most arrays are actually homogeneous (as Jacob and Matt both stated), so if you want to make sure of that you need to use static parameters. I'm suggesting a system that doesn't require static parameters for the usual case. Take a look at the actual implementations of sum in reduce.jl: https://github.com/JuliaLang/julia/blob/master/base/reduce.jl#L179 https://github.com/JuliaLang/julia/blob/master/base/reduce.jl#L275 Julia often has method definitions that are significantly more generic than sum(x::ConcreteArray{Number}), which I think is really nice. The definitions above are fast for Array{Float64}, but the very same definitions also work for e.g. an array of matrices, or anything else with + defined (zero(T) might have to be defined too if the array is empty or has only 1 element). I don't see much advantage of having an implementation like sum(x::ConcreteArray{Number}) if someone later comes along and defines sum(x::AbstractArray) with essentially the same body.
[julia-users] Re: Array/Cell - a useful distinction, or not?
Yet another point of view - incidentally the one that convinced me that Julia has got this right, while most others don't - is the fact that in the grand scheme of things, JIT is *cheap *while working with types that might have subtypes is *really painful* if you don't want to compromise with performance. Julia does an absolutely beautiful job with a function definition such as function sum(x) # implementation goes here end with no arguments defined at all. Mostly, you don't need to define your argument types, since the compiler will generate a strongly typed function for you whenever you need it. For whatever type you need it. If, for some reason, you actually do have an instance of `Array{Real}`, containing both floats, ints and rationals, the above method will (probably) Just Work(TM) anyway, since most operations for real numbers are implemented to make them work with each other. And if you can write an implementation of `sum` that works more or less optimally on `Array{Float64}`, chances are the compiler can too - *and it will, as soon as you need one, *without you doing anything more than asking nicely (and you don't even have to realize that you're asking!). The same user code will be used to compile both methods, and for the Array{Float64} case it will be as fast as it can be, while for the more general method it will not be slower than it would be otherwise. And if you want to be able to say to the compiler that this is going to be an array of numbers - I don't know what kind of numbers yet, but they'll all be the same type, well, then that's already included in the language. You only need the type parameter on the function to say f{T:Number}(x::Array{T}) and you're done. My point is that the problems that types like Array{Real} solve in other languages - e.g. being able to sum a collection of numbers without knowing what type of numbers they are - is solved instead by JIT-compiling and multiple dispatch, just as efficiently (if not better). The syntax - and sometimes the mind-set of the programmer - needs to be a little different, but there are no problems (that I can see) that can't be solved just as efficiently, albeit in another way. OK, well, just because I wrote that, I realize there is one problem that I can't off the top of my head say how I'd solve: type assertions for e.g. arrays. Say I have a variable x, and I want to make sure that it's an array of real numbers that are all of the same type, but I don't care which one. Can I say x::Array{T:Real} as a type assertion? (I'm at a computer without Julia atm, so I can't test it for myself...) But yeah, other than that, I can't really think of a problem that isn't just as solve-able in Julia as in any other language. One just has to embrace the slightly different way of doing things, that is a result of a slightly unusual (in a good way!) type system. // T On Tuesday, April 29, 2014 7:37:52 PM UTC+2, Jason Merrill wrote: On Tuesday, April 29, 2014 9:29:56 AM UTC-7, Oliver Woodford wrote: On Tuesday, April 29, 2014 5:14:01 PM UTC+1, Jason Merrill wrote: Suppose the types that you want existed. Let's call them ConcreteArray, and Cell. ConcreteArray gives the guarantee that all of it's elements are stored directly with a fixed stride. Can you give some examples of functions that would use these in their type signature? Sure. sum(x::ConcreteArray{Number}). Because currently if I write sum(x::Array{Real}) and try to pass in an Array{Float64} I get an error. Note that most arrays are actually homogeneous (as Jacob and Matt both stated), so if you want to make sure of that you need to use static parameters. I'm suggesting a system that doesn't require static parameters for the usual case. Take a look at the actual implementations of sum in reduce.jl: https://github.com/JuliaLang/julia/blob/master/base/reduce.jl#L179 https://github.com/JuliaLang/julia/blob/master/base/reduce.jl#L275 Julia often has method definitions that are significantly more generic than sum(x::ConcreteArray{Number}), which I think is really nice. The definitions above are fast for Array{Float64}, but the very same definitions also work for e.g. an array of matrices, or anything else with + defined (zero(T) might have to be defined too if the array is empty or has only 1 element). I don't see much advantage of having an implementation like sum(x::ConcreteArray{Number}) if someone later comes along and defines sum(x::AbstractArray) with essentially the same body.
Re: [julia-users] Re: Array/Cell - a useful distinction, or not?
On Tuesday, April 29, 2014 11:43:03 AM Tomas Lycken wrote: Say I have a variable x, and I want to make sure that it's an array of real numbers that are all of the same type, but I don't care which one. Can I say x::Array{T:Real} as a type assertion? function f{T:Real}(x::Array{T}) ... end does this. --Tim
Re: [julia-users] Re: Array/Cell - a useful distinction, or not?
Yes, when declaring function arguments, which I believe fall under case 2. in this listhttp://docs.julialang.org/en/latest/manual/types/#type-declarations. Is it possible for a usage of `::` that falls under category 1? // T On Tuesday, April 29, 2014 9:07:08 PM UTC+2, Tim Holy wrote: On Tuesday, April 29, 2014 11:43:03 AM Tomas Lycken wrote: Say I have a variable x, and I want to make sure that it's an array of real numbers that are all of the same type, but I don't care which one. Can I say x::Array{T:Real} as a type assertion? function f{T:Real}(x::Array{T}) ... end does this. --Tim
Re: [julia-users] Re: Array/Cell - a useful distinction, or not?
Excellent points Tomas. I think this would be particularly helpful for those coming from Java. Perhaps some of your comments could be worked into the manual or FAQ somewhere. -Jacob On Tue, Apr 29, 2014 at 2:43 PM, Tomas Lycken tomas.lyc...@gmail.comwrote: Yet another point of view - incidentally the one that convinced me that Julia has got this right, while most others don't - is the fact that in the grand scheme of things, JIT is *cheap *while working with types that might have subtypes is *really painful* if you don't want to compromise with performance. Julia does an absolutely beautiful job with a function definition such as function sum(x) # implementation goes here end with no arguments defined at all. Mostly, you don't need to define your argument types, since the compiler will generate a strongly typed function for you whenever you need it. For whatever type you need it. If, for some reason, you actually do have an instance of `Array{Real}`, containing both floats, ints and rationals, the above method will (probably) Just Work(TM) anyway, since most operations for real numbers are implemented to make them work with each other. And if you can write an implementation of `sum` that works more or less optimally on `Array{Float64}`, chances are the compiler can too - *and it will, as soon as you need one, *without you doing anything more than asking nicely (and you don't even have to realize that you're asking!). The same user code will be used to compile both methods, and for the Array{Float64} case it will be as fast as it can be, while for the more general method it will not be slower than it would be otherwise. And if you want to be able to say to the compiler that this is going to be an array of numbers - I don't know what kind of numbers yet, but they'll all be the same type, well, then that's already included in the language. You only need the type parameter on the function to say f{T:Number}(x::Array{T}) and you're done. My point is that the problems that types like Array{Real} solve in other languages - e.g. being able to sum a collection of numbers without knowing what type of numbers they are - is solved instead by JIT-compiling and multiple dispatch, just as efficiently (if not better). The syntax - and sometimes the mind-set of the programmer - needs to be a little different, but there are no problems (that I can see) that can't be solved just as efficiently, albeit in another way. OK, well, just because I wrote that, I realize there is one problem that I can't off the top of my head say how I'd solve: type assertions for e.g. arrays. Say I have a variable x, and I want to make sure that it's an array of real numbers that are all of the same type, but I don't care which one. Can I say x::Array{T:Real} as a type assertion? (I'm at a computer without Julia atm, so I can't test it for myself...) But yeah, other than that, I can't really think of a problem that isn't just as solve-able in Julia as in any other language. One just has to embrace the slightly different way of doing things, that is a result of a slightly unusual (in a good way!) type system. // T On Tuesday, April 29, 2014 7:37:52 PM UTC+2, Jason Merrill wrote: On Tuesday, April 29, 2014 9:29:56 AM UTC-7, Oliver Woodford wrote: On Tuesday, April 29, 2014 5:14:01 PM UTC+1, Jason Merrill wrote: Suppose the types that you want existed. Let's call them ConcreteArray, and Cell. ConcreteArray gives the guarantee that all of it's elements are stored directly with a fixed stride. Can you give some examples of functions that would use these in their type signature? Sure. sum(x::ConcreteArray{Number}). Because currently if I write sum(x::Array{Real}) and try to pass in an Array{Float64} I get an error. Note that most arrays are actually homogeneous (as Jacob and Matt both stated), so if you want to make sure of that you need to use static parameters. I'm suggesting a system that doesn't require static parameters for the usual case. Take a look at the actual implementations of sum in reduce.jl: https://github.com/JuliaLang/julia/blob/master/base/reduce.jl#L179 https://github.com/JuliaLang/julia/blob/master/base/reduce.jl#L275 Julia often has method definitions that are significantly more generic than sum(x::ConcreteArray{Number}), which I think is really nice. The definitions above are fast for Array{Float64}, but the very same definitions also work for e.g. an array of matrices, or anything else with + defined (zero(T) might have to be defined too if the array is empty or has only 1 element). I don't see much advantage of having an implementation like sum(x::ConcreteArray{Number}) if someone later comes along and defines sum(x::AbstractArray) with essentially the same body.