Re: ANN: Sparse matrix support for Clojure with vectorz-clj 0.28.0

Mike Anderson Sat, 10 Jan 2015 20:41:00 -0800

Thanks Matt! I've just released Vectorz 0.45.0 including your changes.

A lot of sparse operations are much faster now!


On Monday, 29 December 2014 21:56:30 UTC+8, Matt Revelle wrote:
>
> Yes, will do.
>
> On Dec 28, 2014, at 9:58 PM, Mike Anderson <[email protected]> 
> wrote:
>
> Looks like you have some good changes in your Vectorz branch - any chance 
> you could tidy up and make a PR?
>
> I like the idea of specialised getSlices and getColumns in particular - 
> these should be much faster than getting the slices one-by-one if the data 
> is very sparse.
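The intuition behind a specialised bulk column accessor can be shown with a small plain-Clojure sketch. It assumes a hypothetical row-oriented layout (a map from row index to a map of column index to value), not the actual SparseRowMatrix internals or the real getSlices/getColumns code. The one-by-one version looks up every row index, while the bulk version only visits stored non-zeros.

```clojure
;; Hypothetical row-oriented sparse matrix: {row-index {col-index value}}.
;; Only non-zero entries are stored; absent rows are entirely zero.
(def m {:shape [5 5]
        :rows  {0 {1 2.0, 4 7.0}
                3 {1 5.0}}})

(defn column-naive
  "Builds column j by looking up every row index, including empty rows.
  Cost grows with the number of rows, not the number of non-zeros."
  [{:keys [shape rows]} j]
  (into {}
        (for [i (range (first shape))
              :let [v (get-in rows [i j])]
              :when v]
          [i v])))

(defn columns-bulk
  "Builds all columns in one pass over the stored non-zeros.
  Returns {col-index {row-index value}}; cost grows with the non-zeros."
  [{:keys [rows]}]
  (reduce-kv (fn [acc i row]
               (reduce-kv (fn [acc j v] (assoc-in acc [j i] v)) acc row))
             {}
             rows))

;; Example usage
(column-naive m 1)   ;=> {0 2.0, 3 5.0}
(columns-bulk m)     ;=> {1 {0 2.0, 3 5.0}, 4 {0 7.0}}
```

With a million mostly empty rows, the difference between "every row" and "every non-zero" is what makes the bulk path attractive.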
>
> On Monday, 29 December 2014 09:43:54 UTC+8, Matt Revelle wrote:
>>
>>
>> On Dec 28, 2014, at 7:28 PM, Mike Anderson <[email protected]> 
>> wrote:
>>
>> Interesting idea. The challenge is that I'm not sure how to let users 
>> specify the representation in an implementation-independent way. It's a 
>> quirk of vectorz that it has both indexed and hashed storage; I probably 
>> wouldn't expect any other implementations to have that. Likewise, row- and 
>> column-oriented storage are fairly obvious choices, but I still wouldn't 
>> expect every implementation to support both.
>>
>> Any idea how you would specify the API?
>>
>> I guess we could simply pass an optional map of options, but the 
>> behaviour would be completely implementation-specific. 
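To make the indexed vs hashed distinction concrete, here is a minimal plain-Clojure sketch of a sparse vector stored both ways. The function names and layouts are hypothetical illustrations of the trade-off, not the Vectorz classes: indexed storage keeps sorted indices in a primitive array and looks values up by binary search, while hashed storage uses a hash map.

```clojure
;; Two hypothetical layouts for a sparse vector. They only illustrate the
;; trade-off between indexed and hashed storage; they are not Vectorz classes.

(defn indexed-vector
  "Indexed storage: a sorted int array of indices plus a parallel double
  array of values. Compact and fast to iterate in order; lookup is a
  binary search."
  [n entries]                          ; entries: map of index -> value
  (let [sorted (sort-by key entries)]
    {:length  n
     :indices (int-array (map key sorted))
     :values  (double-array (map val sorted))}))

(defn indexed-get [{:keys [^ints indices ^doubles values]} i]
  ;; binarySearch returns a negative number when i is not stored
  (let [pos (java.util.Arrays/binarySearch indices (int i))]
    (if (neg? pos) 0.0 (aget values pos))))

(defn hashed-vector
  "Hashed storage: a hash map from index to value. Cheap random inserts,
  but entries are not kept in index order."
  [n entries]
  {:length n :data (into {} entries)})

(defn hashed-get [{:keys [data]} i]
  (get data i 0.0))

;; Example usage
(def iv (indexed-vector 10 {7 3.0, 2 1.5}))
(def hv (hashed-vector 10 {7 3.0, 2 1.5}))
(indexed-get iv 7)   ;=> 3.0
(indexed-get iv 5)   ;=> 0.0
(hashed-get hv 2)    ;=> 1.5
```

Inserting into the indexed form means shifting arrays, which is why a hashed form can be preferable while a matrix is still being filled.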
>>
>>
>> I think the map is the way to go. You’re probably correct that few other 
>> implementations will have as many options, but a map of “preferences” 
>> still seems like a good approach. Creating a sparse matrix might then look like:
>>
>> ;; preferences as a separate arg
>> (new-sparse-array [100000 100000] :vectorz {:order :row :indexed true})
>>
>> ;; an alternative, preferences combined with implementation selection
>> (new-sparse-array [100000 100000] {:impl :vectorz :order :row :indexed true})
>>
>> Implementations should throw an exception if they don’t support (or 
>> understand) the preferences.
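One way an implementation could reject preferences it doesn't understand is sketched below. The table `supported-prefs` and the function `check-prefs` are hypothetical names, not part of core.matrix or vectorz-clj; the keys `:order` and `:indexed` are taken from the proposed examples above, and only the separate-argument form (no `:impl` key in the map) is assumed.

```clojure
;; Hypothetical table of the preferences one implementation understands.
;; Keys and values are illustrative, taken from the proposed API examples.
(def supported-prefs
  {:order   #{:row :column}
   :indexed #{true false}})

(defn check-prefs
  "Returns prefs unchanged if every key and value is supported,
  otherwise throws ex-info describing the first problem found."
  [prefs]
  (doseq [[k v] prefs]
    (let [allowed (get supported-prefs k)]
      (cond
        (nil? allowed)
        (throw (ex-info (str "Unsupported preference: " k) {:key k}))

        (not (contains? allowed v))
        (throw (ex-info (str "Unsupported value for " k ": " v)
                        {:key k :value v :allowed allowed})))))
  prefs)

;; Example usage
(check-prefs {:order :row :indexed true})   ;=> {:order :row, :indexed true}
;; (check-prefs {:order :diagonal}) throws clojure.lang.ExceptionInfo
```

Putting the allowed values in data also makes it easy to expose them, for example to list which representations an implementation offers.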
>>
>> On Monday, 29 December 2014 02:12:05 UTC+8, Matt Revelle wrote:
>>>
>>> Glad to see the addition of new-sparse-array to core.matrix. It looks 
>>> like it defaults to SparseRowMatrix for the Vectorz implementation? Should 
>>> the API provide a way to specify which sparse matrix representation (e.g., 
>>> row- vs column-based, indexed vs hashed) should be used? I'd suggest a 
>>> 3-arity new-sparse-array which takes a keyword indicating the 
>>> representation to use as well as a new function which returns a list of 
>>> available representations for a specific implementation.
>>>
>>> I think at this point you have incorporated all the changes I had made 
>>> for sparse matrix support in Vectorz (looks like we have some duplication 
>>> too, doh), but I will verify.
>>>
>>
>> I definitely haven't covered all the potential code paths - in particular 
>> a lot of things aren't yet optimised for sparse operations. So any review / 
>> patches would be appreciated!
>>
>>
>> I did some optimization of sparse ops but the code probably needs to be 
>> cleaned up before submitting (e.g., generalized and/or moved to the correct 
>> level in the class hierarchy). Those changes were made hastily when I needed to 
>> quickly get a program running fast.
>>
>> A branch containing all the performance changes, based on an older revision 
>> of the develop branch, is available here:
>> https://github.com/mattrepl/vectorz/tree/sparse-speed
>>
>> There is a related sparse-speed branch in my forks of vectorz-clj and 
>> core.matrix.
>>
>> We should also look into other sparse array representations for Vectorz, 
>> such as those in Matlab, MTJ (https://github.com/fommil/matrix-toolkits-java, 
>> specifically the LinkedSparseMatrix for row and column ops), etc.
>>
>> -Matt
>>
>>  
>>
>>>
>>> On Saturday, December 27, 2014 4:56:55 AM UTC-5, Mike Anderson wrote:
>>>>
>>>> Here is a little belated Christmas present for Clojure data aficionados:
>>>>
>>>> ;; setup
>>>> (use 'clojure.core.matrix)
>>>> (set-current-implementation :vectorz)
>>>>
>>>> ;; create a big sparse matrix with a trillion elements (initially zero)
>>>> (def A (new-sparse-array [1000000 1000000]))
>>>>
>>>> ;; we are hopefully smart enough to avoid printing the whole array!
>>>> A
>>>> => #<SparseRowMatrix Large matrix with shape: [1000000,1000000]>
>>>>
>>>> ;; mutable setter operations are supported, so you can set individual sparse elements
>>>> (dotimes [i 1000]
>>>>      (mset! A (rand-int 1000000) (rand-int 1000000) (rand-int 100)))
>>>>
>>>> ;; all standard core.matrix operations supported
>>>> (esum A)
>>>> => 50479.0
>>>>
>>>> ;; efficient addition
>>>> (time (add A A))
>>>> => "Elapsed time: 12.849859 msecs"
>>>>
>>>> ;; matrix multiplication / inner products actually complete in sensible time
>>>> ;; (i.e. much faster than the usual O(n^3), which might take a few thousand years)
>>>> (time (mmul A (transpose A)))
>>>> => "Elapsed time: 2673.085171 msecs"
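Addition and multiplication stay cheap on sparse data because they only touch stored non-zeros. The following plain-Clojure sketch illustrates that idea on a hypothetical nested-map layout (`{row {col value}}`); it is not how vectorz-clj implements `add` or `mmul`, and the names `sparse-add` and `sparse-mmul` are made up for illustration.

```clojure
;; Hypothetical sparse matrix layout: {row-index {col-index value}}.
;; Zeros are simply not stored. For simplicity, results that cancel
;; to zero are kept as explicit 0 entries rather than removed.

(defn sparse-add
  "Element-wise sum. Only stored entries are merged, so the cost is
  proportional to the non-zeros in a and b, not to the matrix size."
  [a b]
  (merge-with #(merge-with + %1 %2) a b))

(defn sparse-mmul
  "Row-by-row product a * b. For each stored a[i][k], row k of b is
  scaled and accumulated into row i of the result, so empty rows and
  zero entries are never visited."
  [a b]
  (reduce-kv
   (fn [acc i row]
     (let [out (reduce-kv
                (fn [out k aik]
                  (reduce-kv (fn [out j bkj]
                               (update out j (fnil + 0) (* aik bkj)))
                             out
                             (get b k {})))
                {}
                row)]
       ;; rows of the result with no contributions are not stored
       (if (empty? out) acc (assoc acc i out))))
   {}
   a))

;; Example usage
(def s {0 {1 2, 3 1}, 2 {0 4}})
(sparse-add s s)    ;=> {0 {1 4, 3 2}, 2 {0 8}}
(sparse-mmul s s)   ;=> {2 {1 8, 3 4}}
```

The work in `sparse-mmul` scales with the number of matching non-zero pairs rather than with n^3, which is the same reason the million-by-million product above finishes in seconds.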
>>>>
>>>>
>>>> Some nice things to note about the implementation:
>>>> - Everything goes through the core.matrix API, so your code won't have 
>>>> to change to use sparse matrices :-)
>>>> - Sparse matrices are 100% interoperable with non-sparse (dense) 
>>>> matrices
>>>> - Sparse arrays are fully mutable. Management of storage / indexing 
>>>> happens automatically
>>>> - It isn't just matrices - you can have sparse vectors, N-dimensional 
>>>> arrays etc.
>>>> - Code is pure JVM - no native dependencies to worry about
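The points about full mutability, automatic storage management and N-dimensional arrays can be illustrated with a small plain-Clojure sketch. Everything here (`new-sparse`, `sparse-set!`, `sparse-get`, `non-zero-count`, and the atom-of-map layout keyed by index vectors) is hypothetical and is not how vectorz-clj stores data; it only shows one way storage can track non-zeros automatically, by dropping an entry when it is set back to zero.

```clojure
;; Hypothetical mutable sparse N-d array: an atom holding
;; {:shape [d0 d1 ...] :data {[i0 i1 ...] value}}.

(defn new-sparse [shape]
  (atom {:shape (vec shape) :data {}}))

(defn in-bounds?
  "True when idx has one coordinate per dimension and each i satisfies 0 <= i < d."
  [shape idx]
  (and (= (count shape) (count idx))
       (every? true? (map #(< -1 %2 %1) shape idx))))

(defn sparse-set!
  "Sets the element at idx. Writing zero removes the entry, so storage
  only ever holds non-zeros."
  [arr idx v]
  (let [idx (vec idx)]
    (when-not (in-bounds? (:shape @arr) idx)
      (throw (ex-info "Index out of bounds" {:index idx :shape (:shape @arr)})))
    (swap! arr update :data
           (fn [data] (if (zero? v) (dissoc data idx) (assoc data idx v))))
    arr))

(defn sparse-get [arr idx]
  (get-in @arr [:data (vec idx)] 0))

(defn non-zero-count [arr]
  (count (:data @arr)))

;; Example usage: a 1000 x 1000 x 1000 array that stores only what is set
(def t (new-sparse [1000 1000 1000]))
(sparse-set! t [1 2 3] 5)
(sparse-get t [1 2 3])   ;=> 5
(sparse-get t [0 0 0])   ;=> 0
(sparse-set! t [1 2 3] 0)
(non-zero-count t)       ;=> 0
```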
>>>>
>>>> This is all still very much alpha - so any comments / patches / more 
>>>> rigorous testing much appreciated!
>>>>
>>>>
>>>>
>>>>
>> -- 
>> You received this message because you are subscribed to a topic in the 
>> Google Groups "Numerical Clojure" group.
>> To unsubscribe from this topic, visit 
>> https://groups.google.com/d/topic/numerical-clojure/LLpq4WHx-k8/unsubscribe
>> .
>> To unsubscribe from this group and all its topics, send an email to 
>> [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
>>
>>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to [email protected]
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
