Re: [julia-users] Re: What do you call an (Associative{Int,T} or AbstractArray{T})

2016-04-14 Thread Tim Holy
On Thursday, April 14, 2016 01:09:19 PM James Fairbanks wrote:
> I can't add the constraint that B and C must be "consistent" to the type
> definitions. However, the constructor can use that constraint in the type
> annotation on its arguments.
> In that case there is nothing to stop you from making an "inconsistent"
> instance of type A{B,C}, but anything that comes out of the outer
> constructor will be consistent.
> What is the PL name for this notion of consistency?

You can use an inner constructor to enforce consistency, see the end of 
http://docs.julialang.org/en/stable/manual/performance-tips/#avoid-fields-with-abstract-containers.

> Why does @inline eliminate the splatting penalty specifically? Does
> inlining allow some later (LLVM?) pass to remove the operations to gather
> the indices into a tuple and the splat them back out again (replacing them
> with a no-op?)? In that case is it important that the other methods of
> getindex are also defined with @inline?

It's simpler than that: the penalty for gathering the arguments to

foo(x...) = bar(x...)

into a tuple goes away if foo itself gets inlined. (There's a little more 
going on that that, but this gives the general idea.) It just passes on the 
challenge to the downstream function bar, but if bar itself is inlined or 
specialized (perhaps by @generated) to handle a particular number of 
arguments, you're fine.

--Tim



Re: [julia-users] Re: What do you call an (Associative{Int,T} or AbstractArray{T})

2016-04-14 Thread James Fairbanks
Correction:

> So the general thing happening here is:
>
>
> type A{B,C}
>c::C
> end
>
> function A{B,C}(c::C{B})
>return A{B,typeof(c)}(c)
> end
>
>
> Rest of original message:


> I can't add the constraint that B and C must be "consistent" to the type
> definitions. However, the constructor can use that constraint in the type
> annotation on its arguments.
> In that case there is nothing to stop you from making an "inconsistent"
> instance of type A{B,C}, but anything that comes out of the outer
> constructor will be consistent.
> What is the PL name for this notion of consistency?
>
>
>> Your 1D constructor could just dispatch to this with SparseArray(data,
>> (dims,))
>>
>> If you add @inline in front of the getindex/setindex! functions, you may
>> be
>> able to eliminate the splatting penalty, and you won't need the 1D
>> specializations.
>>
>
> Why does @inline eliminate the splatting penalty specifically? Does
> inlining allow some later (LLVM?) pass to remove the operations to gather
> the indices into a tuple and the splat them back out again (replacing them
> with a no-op?)? In that case is it important that the other methods of
> getindex are also defined with @inline?
>
> This is interesting, I don't usually pay this much attention to the types
> I use.
>
> Thanks,
> James
>


Re: [julia-users] Re: What do you call an (Associative{Int,T} or AbstractArray{T})

2016-04-14 Thread James Fairbanks
Thanks Tim,

> As you feared, using .types this way messes up inference. The better
> approach
> is
>
> SparseArray{K,V,N}(data::Associative{K,V}, dims::NTuple{N,Int}) =
> SparseArray{V, N, typeof(data)}(data, dims)
>

So the general thing happening here is:


type A{B,C}
   c::C
end

function A{B,C}(c::C{B})
   return A{B,C}(c)
end


I can't add the constraint that B and C must be "consistent" to the type
definitions. However, the constructor can use that constraint in the type
annotation on its arguments.
In that case there is nothing to stop you from making an "inconsistent"
instance of type A{B,C}, but anything that comes out of the outer
constructor will be consistent.
What is the PL name for this notion of consistency?


> Your 1D constructor could just dispatch to this with SparseArray(data,
> (dims,))
>
> If you add @inline in front of the getindex/setindex! functions, you may be
> able to eliminate the splatting penalty, and you won't need the 1D
> specializations.
>

Why does @inline eliminate the splatting penalty specifically? Does
inlining allow some later (LLVM?) pass to remove the operations to gather
the indices into a tuple and the splat them back out again (replacing them
with a no-op?)? In that case is it important that the other methods of
getindex are also defined with @inline?

This is interesting, I don't usually pay this much attention to the types I
use.

Thanks,
James


Re: [julia-users] Re: What do you call an (Associative{Int,T} or AbstractArray{T})

2016-04-14 Thread Tim Holy
On Thursday, April 14, 2016 11:54:45 AM James Fairbanks wrote:
> function SparseArray(data, dims)
> nt, t = eltype(data).types
> n = length(nt.types)
> C = typeof(data)
> return SparseArray{t, n, C}(data, dims)
> end

As you feared, using .types this way messes up inference. The better approach 
is

SparseArray{K,V,N}(data::Associative{K,V}, dims::NTuple{N,Int}) = 
SparseArray{V, N, typeof(data)}(data, dims)

Your 1D constructor could just dispatch to this with SparseArray(data, 
(dims,))

If you add @inline in front of the getindex/setindex! functions, you may be 
able to eliminate the splatting penalty, and you won't need the 1D 
specializations.

Best,
--Tim



Re: [julia-users] Re: What do you call an (Associative{Int,T} or AbstractArray{T})

2016-04-14 Thread James Fairbanks
>
> Exactly!  It's not currently possible to express the
> "triangular-dispatch-like" dependency between `C` and `T`, `N`, so you can
> just enforce it in the constructors.  An ArgumentError here is probably
> sensible, but I often find that in cases like these the inner constructor
> becomes complicated enough that users always use the more convenient outer
> constructors.  Just by limiting the signature of your outer constructor,
> you effectively end up with MethodErrors.


I wrote this up and discovered the triangular dependency between C,T, and N
isn't necessary.
I don't know if the following is the right way to do it, but it appears to
work.

Two questions I have are:
1. Does the constructor extract the type information "the right way"?
2. Can you reduce the amount of code for handling the 1D case?



import Base: getindex, setindex!, size
using Base.Test

immutable SparseArray{T,N,C<:Associative} <: AbstractArray{T,N}
data::C
dims::NTuple{N,Int}
end

# outer constructor infers the types from the data you pass
# triangular dispatch constraints are always satisfied if you use this
constructor

# I use the types field here, does that ruin type inference or anything
mysteriously bad?

function SparseArray(data, dims)
nt, t = eltype(data).types
n = length(nt.types)
C = typeof(data)
return SparseArray{t, n, C}(data, dims)
end

# special case for 1D so you can index by plain integers rather than (Int,)
function SparseArray(data, dims::Integer)
nt, t = eltype(data).types
return SparseArray{t, 1, typeof(data)}(data, (dims,))
end

#special 1d case
size{T,C<:Associative}(sa::SparseArray{T,1,C}) = sa.dims
getindex{T,C<:Associative}(sa::SparseArray{T,1,C}, index::Integer) =
sa.data[index]
setindex!{T,C<:Associative}(sa::SparseArray{T,1,C}, x, index::Integer) =
setindex!(sa.data, x, index)
similar{T,C<:Associative}(sa::SparseArray{T,1,C}, ::Type{T}, dims) =
SparseArray(Dict{Int, T}(), dims)

# satisfying the interface
size(sa::SparseArray) = sa.dims
getindex(sa::SparseArray, indices...) = sa.data[indices]
setindex!(sa::SparseArray, x, indices...) = setindex!(sa.data, x, indices)
similar{T}(sa::SparseArray, ::Type{T}, dims) =
SparseArray(Dict{typeof(dims), T}(), dims)



On Thu, Apr 14, 2016 at 10:57 AM, Matt Bauman  wrote:
>
> Yes, and it was incorporated into Base in 0.5.  It's basically a
> one-column SparseMatrixCSC.
>
I just upgraded to 4.5 today. I am getting more and more eager for 0.5-rc1.

I implemented the


Re: [julia-users] Re: What do you call an (Associative{Int,T} or AbstractArray{T})

2016-04-14 Thread Matt Bauman
On Thursday, April 14, 2016 at 8:38:33 AM UTC-4, James Fairbanks wrote:
>
> Is that from this package https://github.com/JuliaSparse/SparseVectors.jl 
> or somewhere else?
>

Yes, and it was incorporated into Base in 0.5.  It's basically a one-column 
SparseMatrixCSC.
 

> > They should have *significantly* better performance than a dictionary in 
> many cases.
>
> What about performance of random order insertions? The structure I'd want 
> for fast linear algebra would be similar to one column of a csc matrix, 
> right? 
>
> The workloads will have a relatively high ratio of insertions to reads, 
> and almost no bulk reads -- like you would have for a matvec.
>
> I guess one of the benefits of making the underlying storage type generic 
> is that you could do this comparison easily.
>

Yup, there are definitely places where the CSC format will struggle.  The 
biggest advantage is when you can use specialized algorithms that only look 
at the stored elements — and the Base sparse implementation already has 
lots of those specializations in place (particularly for linear algebra, 
but other things like reductions, too).  The two data structures will have 
very different algorithmic complexities and cache behaviors, so it'll all 
depend on how you use them.  Your use-case may be a place where this kind 
of hashed array will shine.

>

Re: [julia-users] Re: What do you call an (Associative{Int,T} or AbstractArray{T})

2016-04-14 Thread James Fairbanks
On Apr 13, 2016 7:57 PM, "Matt Bauman" 

> If your default element is going to be zero(T), have you tried using
SparseVectors?

Is that from this package https://github.com/JuliaSparse/SparseVectors.jl
or somewhere else?

> They should have *significantly* better performance than a dictionary in
many cases.

What about performance of random order insertions? The structure I'd want
for fast linear algebra would be similar to one column of a csc matrix,
right?

The workloads will have a relatively high ratio of insertions to reads, and
almost no bulk reads -- like you would have for a matvec.

I guess one of the benefits of making the underlying storage type generic
is that you could do this comparison easily.

-James


[julia-users] Re: What do you call an (Associative{Int,T} or AbstractArray{T})

2016-04-13 Thread Matt Bauman
On Wednesday, April 13, 2016 at 6:29:21 PM UTC-4, James Fairbanks wrote:
>
> Or do you make a constructor that validates C <: Associative{NTuple{N, 
> Int},T} at construction time?
> What would be the appropriate error to throw in this case?
>

Exactly!  It's not currently possible to express the 
"triangular-dispatch-like" dependency between `C` and `T`, `N`, so you can 
just enforce it in the constructors.  An ArgumentError here is probably 
sensible, but I often find that in cases like these the inner constructor 
becomes complicated enough that users always use the more convenient outer 
constructors.  Just by limiting the signature of your outer constructor, 
you effectively end up with MethodErrors.

If your default element is going to be zero(T), have you tried using 
SparseVectors?  They should have *significantly* better performance than a 
dictionary in many cases.


[julia-users] Re: What do you call an (Associative{Int,T} or AbstractArray{T})

2016-04-13 Thread James Fairbanks
I am glad to get two consistent answers within 1 minute of each other!

On Wednesday, April 13, 2016 at 1:48:20 PM UTC-4, Matt Bauman wrote:
>
> If you're able to define a `size` method for the Associative types, then 
> I'd call them all AbstractArrays.
>
 
Yeah they are 1 dimensional and have size(A) = n. I am thinking that it is 
0 on every index that hasn't been set yet, and the only valid indices are i 
in 1:n.
I want to use Dict{Int,T} as a "lazily allocated" Vector{T}.
 

>  I'd use a wrapper type like the last example in 
> http://docs.julialang.org/en/release-0.4/manual/interfaces/#abstract-arrays. 
>  You could parameterize the `data` field so any associative type could be 
> wrapped.  As a bonus: now it works like an array in Base code, too, without 
> changing any signatures.
>
 
So the SparseArray in the example would become
immutable SparseArray{C<:Associative{N, T},T,N} <: AbstractArray{T,N}
 data::C
 dims::N
end

And then all my functions that currently use Array{T,N} can use 
AbstractArray{T,N} and then can use this SparseArray automagically?
This gives an error 

ERROR: UndefVarError: N not defined

immutable SparseArray2{C<:Associative,T,N} <: AbstractArray{T,N}
 data::C
 dims::NTuple{N,Int}
end
Works but then when constructing it you need to make sure that C, T, and N 
are consistent in the constructor logic because there is nothing in the 
type definition that would forbid you from instantiating a

SparseArray2{Dict{Int, Float64}, Float32, 3}(Dict{Int, Float64}(), (2,3,4))

The problem is that data is not an Associative{NTuple{N, Int}, T}.
Is there any way to specify this in the Type definitions?
Or do you make a constructor that validates C <: Associative{NTuple{N, 
Int},T} at construction time?
What would be the appropriate error to throw in this case?

Thanks,
James


[julia-users] Re: What do you call an (Associative{Int,T} or AbstractArray{T})

2016-04-13 Thread Matt Bauman
If you're able to define a `size` method for the Associative types, then 
I'd call them all AbstractArrays.  I'd use a wrapper type like the last 
example 
in http://docs.julialang.org/en/release-0.4/manual/interfaces/#abstract-arrays. 
 You could parameterize the `data` field so any associative type could be 
wrapped.  As a bonus: now it works like an array in Base code, too, without 
changing any signatures.

But Unions are fine to use in method signatures — there's no penalty there. 
 I think the only place where you'll see a penalty is if you use a Union as 
a field type in a performance-critical object.  That's not unique to 
Unions… you just want those fields to have concrete 
types: 
http://docs.julialang.org/en/release-0.4/manual/performance-tips/#avoid-fields-with-abstract-type.

On Wednesday, April 13, 2016 at 1:14:42 PM UTC-4, James Fairbanks wrote:
>
> Over at LightGraphs we have been discussing support for collections 
> indexed by integers that can be either dense like Vector{T} or sparse like 
> Dict{Int,T}.
> The idea is that you want to collect some T's together indexed by a range 
> of integers 1:n. Often you will have a value for all or most of the indices 
> and should use a Vector{T} for that case. But sometimes you will only have 
> a few values and don't want to allocate O(n) storage just to hold a much 
> smaller amount of data.
>
> What is the best way to express this in types? Does 
> Union{Associative{Int,T}, AbstractArray{T}} express this with no 
> performance penalty? Is there any reason to avoid Union types?
>
> Thanks,
> James
>


[julia-users] Re: What do you call an (Associative{Int,T} or AbstractArray{T})

2016-04-13 Thread Steven G. Johnson


On Wednesday, April 13, 2016 at 1:14:42 PM UTC-4, James Fairbanks wrote:
>
> What is the best way to express this in types? Does 
> Union{Associative{Int,T}, AbstractArray{T}} express this with no 
> performance penalty? Is there any reason to avoid Union types?
>

Unions have no better or worse performance than any other abstract type.   
Whether there is any performance impact at all depends on the context.

1) In a function argument, type declarations have zero effect on 
performance.  It doesn't matter to the compiler whether you declare a 
function as f(a), f(a::Any), f(a::Vector{Float64}), or 
f{T}(a::Union{Associative{Int,T}, AbstractVector{T}}).   

The key thing to remember about Julia is that, each time you call a 
function f(a) with a different type of argument, Julia recompiles a new 
version of f specialized for that argument type, which is used for all 
subsequent calls for the same argument types.  This specialized version is 
just as fast as if you had explicitly typed f(a::SomeType) in the first 
place.

Argument-type declarations are not for performance in Julia, they are a 
filter: they say "use this method for these types".  This is useful for 
multiple dispatch (if you have different methods depending on the argument 
types), for correctness (if your function will give unexpected/incorrect 
results for some types), and for clarity (to indicate clearly to the user 
what is expected to be passed to a function).

(I feel like this should be added as an "anti-tip" to the performance tips 
in the manual: don't declare argument types for performance reasons).

2) When you define a data structure, any abstract type for a field (whether 
it is a Union type or another abstract type like Associative) will be 
significantly slower than a concrete type.  See:

  
 
http://docs.julialang.org/en/latest/manual/performance-tips/#avoid-fields-with-abstract-type