[GitHub] incubator-spark pull request: [Proposal] Adding sparse data suppor...

fommil Sun, 16 Feb 2014 14:30:34 -0800

Github user fommil commented on the pull request:

    https://github.com/apache/incubator-spark/pull/575#issuecomment-35218098
  
    @martinjaggi I'm happy to advise on what the best sparse format would be 
for any particular problem that you want to solve in Spark. Just let me 
know the matrix operations that you're performing (noting the sorts of 
structures you expect for each symbol) and at what points the formats have to 
be sent over the wire.
    
    I wouldn't get too caught up on sparse benchmarks. All they will show is 
which storage format works well for that particular problem. I could give you 
some incredibly efficient sparse formats that would fail that test epically, 
because they are designed for a different problem. Column vs. row compression 
is a classic example: column-compressed matrices are great for multiplication 
from the right (or transposed multiplication), whereas row-compressed matrices 
are great for multiplication from the left... but even that depends on the 
format of the matrix or vector on the right. And this might not be the most 
efficient format from a memory point of view: what if the matrices have a small 
bandwidth, i.e. nonzeros only near the diagonal?
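A minimal pure-Python sketch of the row- vs column-compression trade-off, using hypothetical helpers (not Spark's or any library's API). Both `csr_matvec` and `csc_matvec` compute `A x`, but CSR gathers from `x` one row at a time while CSC scatters whole scaled columns into `y`, so CSC can skip a column entirely when the matching `x[j]` is zero; this is one way the format of the vector on the right changes which layout wins. The two storage helpers count stored numbers only (ignoring word sizes) to illustrate the memory point about banded matrices.

```python
# Hypothetical helpers for illustration only; not Spark's or any library's API.
# Matrices are plain lists of rows; compressed layouts are tuples of lists.


def to_csr(dense):
    """Row compression: (values, column_indices, row_pointers)."""
    values, cols, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                cols.append(j)
        row_ptr.append(len(values))
    return values, cols, row_ptr


def to_csc(dense):
    """Column compression: (values, row_indices, column_pointers)."""
    n_rows = len(dense)
    n_cols = len(dense[0]) if dense else 0
    values, rows, col_ptr = [], [], [0]
    for j in range(n_cols):
        for i in range(n_rows):
            if dense[i][j] != 0:
                values.append(dense[i][j])
                rows.append(i)
        col_ptr.append(len(values))
    return values, rows, col_ptr


def csr_matvec(csr, x):
    """y = A x with CSR: each y[i] is a dot product that gathers from x."""
    values, cols, row_ptr = csr
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[cols[k]]
    return y


def csc_matvec(csc, x, n_rows):
    """y = A x with CSC: column j, scaled by x[j], is scattered into y."""
    values, rows, col_ptr = csc
    y = [0.0] * n_rows
    for j in range(len(col_ptr) - 1):
        if x[j] == 0:
            continue  # zero entries of x let CSC skip whole columns
        for k in range(col_ptr[j], col_ptr[j + 1]):
            y[rows[k]] += values[k] * x[j]
    return y


def csr_storage(nnz, n_rows):
    """Numbers stored by CSR: a value and a column index per nonzero, plus n_rows + 1 pointers."""
    return 2 * nnz + n_rows + 1


def band_storage(n, lower, upper):
    """Numbers stored by dense band storage of an n x n matrix: every band slot, no indices."""
    return n * (lower + upper + 1)


A = [[1, 0, 2],
     [0, 3, 0],
     [4, 0, 5]]
x = [1.0, 0.0, 2.0]
print(csr_matvec(to_csr(A), x))            # [5.0, 0.0, 14.0]
print(csc_matvec(to_csc(A), x, len(A)))    # [5.0, 0.0, 14.0]
# Tridiagonal 1000 x 1000 matrix: 3 * 1000 - 2 nonzeros.
print(csr_storage(3 * 1000 - 2, 1000), band_storage(1000, 1, 1))  # 6997 3000
```

For a tridiagonal matrix the band layout stores 3n numbers against CSR's 7n - 3, because it needs no index arrays; the trade-off reverses once the band is wide but mostly empty, since band storage keeps every zero inside the band.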
