Github user nilmeier commented on the pull request:

    https://github.com/apache/spark/pull/8563#issuecomment-138369072
  
    Hello @shivaram:
    
    Here are some more timings from a 10 node cluster.  npb in the legend is
    the number of entries per block.  The x axis is the number of blocks.
    
    For these cases, I am seeing scaling steeper than n^3, which suggests that
    other, slower processes are also showing up here.
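    For context, a scaling exponent like this can be estimated with a simple
    log-log fit over the timing data. The block counts and wall times below are
    made-up placeholders for illustration, not the measured values from the
    attached plots:

```python
import numpy as np

# Hypothetical (block count, wall time in seconds) pairs -- placeholders,
# not the measured timings from the attached data.
n_blocks = np.array([2, 4, 8, 16, 32], dtype=float)
times = np.array([0.1, 1.1, 12.5, 140.0, 1600.0])

# Fit log(t) = p * log(n) + c; the slope p is the scaling exponent,
# i.e. t ~ n^p.
p, c = np.polyfit(np.log(n_blocks), np.log(times), 1)
print(f"estimated scaling exponent: {p:.2f}")
```

    A slope near 3 would indicate the expected n^3 behavior; a larger slope
    points to additional overheads growing with the block count.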
    
    We run into stack overflow errors for large numbers of blocks.  If we
    adjust the stack size, we can run with more blocks.  Here, we report the
    'out of the box' settings.
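    For reference, the JVM thread stack size can be raised through Spark's
    extra Java options. The -Xss4m value and the script name below are
    illustrative guesses, not the settings used for these timings:

```shell
# Raise the JVM thread stack size for the driver and executors.
# -Xss4m is an illustrative value; my_timing_script.py is a hypothetical name.
spark-submit \
  --conf "spark.driver.extraJavaOptions=-Xss4m" \
  --conf "spark.executor.extraJavaOptions=-Xss4m" \
  my_timing_script.py
```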
    
    I'm using a cluster that is dedicated to another project for these timings,
    so it is kind of hard to get a lot of data, but we can continue to work
    through timings if you like.
    
    I have attached the scripts used to generate the data.  Please forgive the
    fact that these are very ad hoc, but you can see how the timings are
    carried out.
    
    Please let me know if these are helpful for you, or if you need anything
    else.
    
    Sincerely, Jerome
    
    
    On Wed, Sep 2, 2015 at 11:05 AM, Jerome Nilmeier <nilme...@gmail.com> wrote:
    
    > This approach has some similarity to the CALU paper that you posted, and
    > follows what the paper describes as "classic right looking algorithms"
    > (p. 3).  There are differences from our approach, which I discuss in a *How
    > it Works* section in the documentation.  We don't have a publication for
    > this work as of yet.
    >
    > In terms of running time, I have some single node (i7 MacBook) data
    > (attached).  The scaling here for the LU calculation appears to be n^3.5,
    > where n is the number of rowBlocks.  The current approach is n^3 at best.
    > We're running timings on a 10 node (24 cores ea.) cluster, and should have
    > some more comprehensive data for you shortly.  Please let me know if I can
    > provide anything else in the meantime, or if you'd like to meet to discuss.
    >
    > Sincerely, Jerome
    >
    > On Wed, Sep 2, 2015 at 10:19 AM, Shivaram Venkataraman <
    > notificati...@github.com> wrote:
    >
    >> @nilmeier <https://github.com/nilmeier> Do you have a reference to a
    >> paper which analyses the running time and communication costs for the
    >> algorithm implemented here ?
    >>
    >> —
    >> Reply to this email directly or view it on GitHub
    >> <https://github.com/apache/spark/pull/8563#issuecomment-137176779>.
    >>
    >
    >
    >
    > --
    > Jerome Nilmeier, PhD
    > Cell:      510-325-8695
    > Home:   925-292-5321
    >
    
    
    


