Re: [julia-users] Re: What do you call an (Associative{Int,T} or AbstractArray{T})
On Thursday, April 14, 2016 01:09:19 PM James Fairbanks wrote: > I can't add the constraint that B and C must be "consistent" to the type > definitions. However, the constructor can use that constraint in the type > annotation on its arguments. > In that case there is nothing to stop you from making an "inconsistent" > instance of type A{B,C}, but anything that comes out of the outer > constructor will be consistent. > What is the PL name for this notion of consistency? You can use an inner constructor to enforce consistency, see the end of http://docs.julialang.org/en/stable/manual/performance-tips/#avoid-fields-with-abstract-containers. > Why does @inline eliminate the splatting penalty specifically? Does > inlining allow some later (LLVM?) pass to remove the operations to gather > the indices into a tuple and the splat them back out again (replacing them > with a no-op?)? In that case is it important that the other methods of > getindex are also defined with @inline? It's simpler than that: the penalty for gathering the arguments to foo(x...) = bar(x...) into a tuple goes away if foo itself gets inlined. (There's a little more going on that that, but this gives the general idea.) It just passes on the challenge to the downstream function bar, but if bar itself is inlined or specialized (perhaps by @generated) to handle a particular number of arguments, you're fine. --Tim
Re: [julia-users] Re: What do you call an (Associative{Int,T} or AbstractArray{T})
Correction: > So the general thing happening here is: > > > type A{B,C} >c::C > end > > function A{B,C}(c::C{B}) >return A{B,typeof(c)}(c) > end > > > Rest of original message: > I can't add the constraint that B and C must be "consistent" to the type > definitions. However, the constructor can use that constraint in the type > annotation on its arguments. > In that case there is nothing to stop you from making an "inconsistent" > instance of type A{B,C}, but anything that comes out of the outer > constructor will be consistent. > What is the PL name for this notion of consistency? > > >> Your 1D constructor could just dispatch to this with SparseArray(data, >> (dims,)) >> >> If you add @inline in front of the getindex/setindex! functions, you may >> be >> able to eliminate the splatting penalty, and you won't need the 1D >> specializations. >> > > Why does @inline eliminate the splatting penalty specifically? Does > inlining allow some later (LLVM?) pass to remove the operations to gather > the indices into a tuple and the splat them back out again (replacing them > with a no-op?)? In that case is it important that the other methods of > getindex are also defined with @inline? > > This is interesting, I don't usually pay this much attention to the types > I use. > > Thanks, > James >
Re: [julia-users] Re: What do you call an (Associative{Int,T} or AbstractArray{T})
Thanks Tim, > As you feared, using .types this way messes up inference. The better > approach > is > > SparseArray{K,V,N}(data::Associative{K,V}, dims::NTuple{N,Int}) = > SparseArray{V, N, typeof(data)}(data, dims) > So the general thing happening here is: type A{B,C} c::C end function A{B,C}(c::C{B}) return A{B,C}(c) end I can't add the constraint that B and C must be "consistent" to the type definitions. However, the constructor can use that constraint in the type annotation on its arguments. In that case there is nothing to stop you from making an "inconsistent" instance of type A{B,C}, but anything that comes out of the outer constructor will be consistent. What is the PL name for this notion of consistency? > Your 1D constructor could just dispatch to this with SparseArray(data, > (dims,)) > > If you add @inline in front of the getindex/setindex! functions, you may be > able to eliminate the splatting penalty, and you won't need the 1D > specializations. > Why does @inline eliminate the splatting penalty specifically? Does inlining allow some later (LLVM?) pass to remove the operations to gather the indices into a tuple and the splat them back out again (replacing them with a no-op?)? In that case is it important that the other methods of getindex are also defined with @inline? This is interesting, I don't usually pay this much attention to the types I use. Thanks, James
Re: [julia-users] Re: What do you call an (Associative{Int,T} or AbstractArray{T})
On Thursday, April 14, 2016 11:54:45 AM James Fairbanks wrote: > function SparseArray(data, dims) > nt, t = eltype(data).types > n = length(nt.types) > C = typeof(data) > return SparseArray{t, n, C}(data, dims) > end As you feared, using .types this way messes up inference. The better approach is SparseArray{K,V,N}(data::Associative{K,V}, dims::NTuple{N,Int}) = SparseArray{V, N, typeof(data)}(data, dims) Your 1D constructor could just dispatch to this with SparseArray(data, (dims,)) If you add @inline in front of the getindex/setindex! functions, you may be able to eliminate the splatting penalty, and you won't need the 1D specializations. Best, --Tim
Re: [julia-users] Re: What do you call an (Associative{Int,T} or AbstractArray{T})
> > Exactly! It's not currently possible to express the > "triangular-dispatch-like" dependency between `C` and `T`, `N`, so you can > just enforce it in the constructors. An ArgumentError here is probably > sensible, but I often find that in cases like these the inner constructor > becomes complicated enough that users always use the more convenient outer > constructors. Just by limiting the signature of your outer constructor, > you effectively end up with MethodErrors. I wrote this up and discovered the triangular dependency between C,T, and N isn't necessary. I don't know if the following is the right way to do it, but it appears to work. Two questions I have are: 1. Does the constructor extract the type information "the right way"? 2. Can you reduce the amount of code for handling the 1D case? import Base: getindex, setindex!, size using Base.Test immutable SparseArray{T,N,C<:Associative} <: AbstractArray{T,N} data::C dims::NTuple{N,Int} end # outer constructor infers the types from the data you pass # triangular dispatch constraints are always satisfied if you use this constructor # I use the types field here, does that ruin type inference or anything mysteriously bad? function SparseArray(data, dims) nt, t = eltype(data).types n = length(nt.types) C = typeof(data) return SparseArray{t, n, C}(data, dims) end # special case for 1D so you can index by plain integers rather than (Int,) function SparseArray(data, dims::Integer) nt, t = eltype(data).types return SparseArray{t, 1, typeof(data)}(data, (dims,)) end #special 1d case size{T,C<:Associative}(sa::SparseArray{T,1,C}) = sa.dims getindex{T,C<:Associative}(sa::SparseArray{T,1,C}, index::Integer) = sa.data[index] setindex!{T,C<:Associative}(sa::SparseArray{T,1,C}, x, index::Integer) = setindex!(sa.data, x, index) similar{T,C<:Associative}(sa::SparseArray{T,1,C}, ::Type{T}, dims) = SparseArray(Dict{Int, T}(), dims) # satisfying the interface size(sa::SparseArray) = sa.dims getindex(sa::SparseArray, indices...) = sa.data[indices] setindex!(sa::SparseArray, x, indices...) = setindex!(sa.data, x, indices) similar{T}(sa::SparseArray, ::Type{T}, dims) = SparseArray(Dict{typeof(dims), T}(), dims) On Thu, Apr 14, 2016 at 10:57 AM, Matt Baumanwrote: > > Yes, and it was incorporated into Base in 0.5. It's basically a > one-column SparseMatrixCSC. > I just upgraded to 4.5 today. I am getting more and more eager for 0.5-rc1. I implemented the
Re: [julia-users] Re: What do you call an (Associative{Int,T} or AbstractArray{T})
On Thursday, April 14, 2016 at 8:38:33 AM UTC-4, James Fairbanks wrote: > > Is that from this package https://github.com/JuliaSparse/SparseVectors.jl > or somewhere else? > Yes, and it was incorporated into Base in 0.5. It's basically a one-column SparseMatrixCSC. > > They should have *significantly* better performance than a dictionary in > many cases. > > What about performance of random order insertions? The structure I'd want > for fast linear algebra would be similar to one column of a csc matrix, > right? > > The workloads will have a relatively high ratio of insertions to reads, > and almost no bulk reads -- like you would have for a matvec. > > I guess one of the benefits of making the underlying storage type generic > is that you could do this comparison easily. > Yup, there are definitely places where the CSC format will struggle. The biggest advantage is when you can use specialized algorithms that only look at the stored elements — and the Base sparse implementation already has lots of those specializations in place (particularly for linear algebra, but other things like reductions, too). The two data structures will have very different algorithmic complexities and cache behaviors, so it'll all depend on how you use them. Your use-case may be a place where this kind of hashed array will shine. >
Re: [julia-users] Re: What do you call an (Associative{Int,T} or AbstractArray{T})
On Apr 13, 2016 7:57 PM, "Matt Bauman"> If your default element is going to be zero(T), have you tried using SparseVectors? Is that from this package https://github.com/JuliaSparse/SparseVectors.jl or somewhere else? > They should have *significantly* better performance than a dictionary in many cases. What about performance of random order insertions? The structure I'd want for fast linear algebra would be similar to one column of a csc matrix, right? The workloads will have a relatively high ratio of insertions to reads, and almost no bulk reads -- like you would have for a matvec. I guess one of the benefits of making the underlying storage type generic is that you could do this comparison easily. -James
[julia-users] Re: What do you call an (Associative{Int,T} or AbstractArray{T})
On Wednesday, April 13, 2016 at 6:29:21 PM UTC-4, James Fairbanks wrote: > > Or do you make a constructor that validates C <: Associative{NTuple{N, > Int},T} at construction time? > What would be the appropriate error to throw in this case? > Exactly! It's not currently possible to express the "triangular-dispatch-like" dependency between `C` and `T`, `N`, so you can just enforce it in the constructors. An ArgumentError here is probably sensible, but I often find that in cases like these the inner constructor becomes complicated enough that users always use the more convenient outer constructors. Just by limiting the signature of your outer constructor, you effectively end up with MethodErrors. If your default element is going to be zero(T), have you tried using SparseVectors? They should have *significantly* better performance than a dictionary in many cases.
[julia-users] Re: What do you call an (Associative{Int,T} or AbstractArray{T})
I am glad to get two consistent answers within 1 minute of each other! On Wednesday, April 13, 2016 at 1:48:20 PM UTC-4, Matt Bauman wrote: > > If you're able to define a `size` method for the Associative types, then > I'd call them all AbstractArrays. > Yeah they are 1 dimensional and have size(A) = n. I am thinking that it is 0 on every index that hasn't been set yet, and the only valid indices are i in 1:n. I want to use Dict{Int,T} as a "lazily allocated" Vector{T}. > I'd use a wrapper type like the last example in > http://docs.julialang.org/en/release-0.4/manual/interfaces/#abstract-arrays. > You could parameterize the `data` field so any associative type could be > wrapped. As a bonus: now it works like an array in Base code, too, without > changing any signatures. > So the SparseArray in the example would become immutable SparseArray{C<:Associative{N, T},T,N} <: AbstractArray{T,N} data::C dims::N end And then all my functions that currently use Array{T,N} can use AbstractArray{T,N} and then can use this SparseArray automagically? This gives an error ERROR: UndefVarError: N not defined immutable SparseArray2{C<:Associative,T,N} <: AbstractArray{T,N} data::C dims::NTuple{N,Int} end Works but then when constructing it you need to make sure that C, T, and N are consistent in the constructor logic because there is nothing in the type definition that would forbid you from instantiating a SparseArray2{Dict{Int, Float64}, Float32, 3}(Dict{Int, Float64}(), (2,3,4)) The problem is that data is not an Associative{NTuple{N, Int}, T}. Is there any way to specify this in the Type definitions? Or do you make a constructor that validates C <: Associative{NTuple{N, Int},T} at construction time? What would be the appropriate error to throw in this case? Thanks, James
[julia-users] Re: What do you call an (Associative{Int,T} or AbstractArray{T})
If you're able to define a `size` method for the Associative types, then I'd call them all AbstractArrays. I'd use a wrapper type like the last example in http://docs.julialang.org/en/release-0.4/manual/interfaces/#abstract-arrays. You could parameterize the `data` field so any associative type could be wrapped. As a bonus: now it works like an array in Base code, too, without changing any signatures. But Unions are fine to use in method signatures — there's no penalty there. I think the only place where you'll see a penalty is if you use a Union as a field type in a performance-critical object. That's not unique to Unions… you just want those fields to have concrete types: http://docs.julialang.org/en/release-0.4/manual/performance-tips/#avoid-fields-with-abstract-type. On Wednesday, April 13, 2016 at 1:14:42 PM UTC-4, James Fairbanks wrote: > > Over at LightGraphs we have been discussing support for collections > indexed by integers that can be either dense like Vector{T} or sparse like > Dict{Int,T}. > The idea is that you want to collect some T's together indexed by a range > of integers 1:n. Often you will have a value for all or most of the indices > and should use a Vector{T} for that case. But sometimes you will only have > a few values and don't want to allocate O(n) storage just to hold a much > smaller amount of data. > > What is the best way to express this in types? Does > Union{Associative{Int,T}, AbstractArray{T}} express this with no > performance penalty? Is there any reason to avoid Union types? > > Thanks, > James >
[julia-users] Re: What do you call an (Associative{Int,T} or AbstractArray{T})
On Wednesday, April 13, 2016 at 1:14:42 PM UTC-4, James Fairbanks wrote: > > What is the best way to express this in types? Does > Union{Associative{Int,T}, AbstractArray{T}} express this with no > performance penalty? Is there any reason to avoid Union types? > Unions have no better or worse performance than any other abstract type. Whether there is any performance impact at all depends on the context. 1) In a function argument, type declarations have zero effect on performance. It doesn't matter to the compiler whether you declare a function as f(a), f(a::Any), f(a::Vector{Float64}), or f{T}(a::Union{Associative{Int,T}, AbstractVector{T}}). The key thing to remember about Julia is that, each time you call a function f(a) with a different type of argument, Julia recompiles a new version of f specialized for that argument type, which is used for all subsequent calls for the same argument types. This specialized version is just as fast as if you had explicitly typed f(a::SomeType) in the first place. Argument-type declarations are not for performance in Julia, they are a filter: they say "use this method for these types". This is useful for multiple dispatch (if you have different methods depending on the argument types), for correctness (if your function will give unexpected/incorrect results for some types), and for clarity (to indicate clearly to the user what is expected to be passed to a function). (I feel like this should be added as an "anti-tip" to the performance tips in the manual: don't declare argument types for performance reasons). 2) When you define a data structure, any abstract type for a field (whether it is a Union type or another abstract type like Associative) will be significantly slower than a concrete type. See: http://docs.julialang.org/en/latest/manual/performance-tips/#avoid-fields-with-abstract-type