GitHub user frreiss commented on the pull request:

    https://github.com/apache/spark/pull/8563#issuecomment-143096254
  
    Review comments:
    
    --> Try making the portion of the examples with input data more condensed;
        perhaps by reading the matrices from string constants.
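
        A sketch of what that could look like (matrixFromString is a
        hypothetical helper, not code from the PR; sc is the usual shell
        SparkContext):

            import org.apache.spark.SparkContext
            import org.apache.spark.mllib.linalg.distributed.{BlockMatrix, CoordinateMatrix, MatrixEntry}

            // Parse a whitespace-separated matrix literal into a BlockMatrix.
            def matrixFromString(sc: SparkContext, s: String,
                                 rowsPerBlock: Int, colsPerBlock: Int): BlockMatrix = {
              val entries = for {
                (line, i)  <- s.trim.split("\n").toSeq.zipWithIndex
                (value, j) <- line.trim.split("\\s+").toSeq.zipWithIndex
              } yield MatrixEntry(i, j, value.toDouble)
              new CoordinateMatrix(sc.parallelize(entries))
                .toBlockMatrix(rowsPerBlock, colsPerBlock)
            }

            val a = matrixFromString(sc, """1.0 2.0
                                            3.0 4.0""", 2, 2)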
    
    --> There's an orphan/duplicate ScalaDoc comment at line 356 of
        BlockMatrix.scala
    
    --> Remove carriage returns on lines 370, 481, 569, 732
    
    --> Recommend adding a call to the new subtract method to the MLlib
        programmer's guide
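
        A possible snippet for the guide, assuming the new method is named
        subtract and mirrors the existing add:

            import org.apache.spark.mllib.linalg.distributed.{BlockMatrix, CoordinateMatrix, MatrixEntry}

            val entriesA = sc.parallelize(Seq(MatrixEntry(0, 0, 5.0), MatrixEntry(1, 1, 5.0)))
            val entriesB = sc.parallelize(Seq(MatrixEntry(0, 0, 2.0), MatrixEntry(1, 1, 3.0)))
            val a = new CoordinateMatrix(entriesA, 2, 2).toBlockMatrix()
            val b = new CoordinateMatrix(entriesB, 2, 2).toBlockMatrix()

            // Elementwise difference; the block sizes of a and b must match.
            val diff: BlockMatrix = a.subtract(b)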
    
    --> New APIs added to BlockMatrix should have corresponding PySpark APIs
    
    --> Error message at line 394 should print out the block sizes that don't
        match
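
        Something along these lines (rowsPerBlock/colsPerBlock are the real
        BlockMatrix fields; the surrounding context at line 394 is assumed):

            import org.apache.spark.mllib.linalg.distributed.BlockMatrix

            def checkBlockSizes(a: BlockMatrix, b: BlockMatrix): Unit =
              require(a.rowsPerBlock == b.rowsPerBlock && a.colsPerBlock == b.colsPerBlock,
                s"Block sizes must match: A uses ${a.rowsPerBlock} x ${a.colsPerBlock} blocks, " +
                s"B uses ${b.rowsPerBlock} x ${b.colsPerBlock} blocks")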
    
    --> The code at line 384 should multiply every element of b.head by -1 as
        far as I can see
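
        That is, something like the following, if the blocks at that point are
        local Breeze matrices (the actual type of b in the PR may differ):

            import breeze.linalg.DenseMatrix

            def negatedHead(b: Seq[DenseMatrix[Double]]): DenseMatrix[Double] =
              b.head * -1.0  // elementwise; every entry negated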
    
    --> Lines 456 and 465-471 have wrong indentation
    
    --> Scaladoc at 474 should state that blockRowRange and blockColRange are
        block indexes, not row/column indexes
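
        Suggested wording (the method description is paraphrased; only the
        @param text is the point here):

            /**
             * Extracts a sub-matrix of this BlockMatrix.
             *
             * @param blockRowRange Range of block row indexes (not element row
             *                      indexes) to include.
             * @param blockColRange Range of block column indexes (not element
             *                      column indexes) to include.
             */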
    
    --> In lines 460-463, consider making a single pass over the blocks instead
        of 4 passes
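
        For example, if those lines partition the blocks into the four
        quadrants, a single traversal can replace the four filters (the Block
        type here is an assumption; I haven't matched it to the PR exactly):

            import scala.collection.mutable.ArrayBuffer
            import org.apache.spark.mllib.linalg.Matrix

            type Block = ((Int, Int), Matrix)

            def splitQuadrants(blocks: Seq[Block], splitRow: Int, splitCol: Int) = {
              val (a11, a12) = (ArrayBuffer.empty[Block], ArrayBuffer.empty[Block])
              val (a21, a22) = (ArrayBuffer.empty[Block], ArrayBuffer.empty[Block])
              blocks.foreach { case blk @ ((i, j), _) =>
                (i < splitRow, j < splitCol) match {
                  case (true, true)   => a11 += blk
                  case (true, false)  => a12 += blk
                  case (false, true)  => a21 += blk
                  case (false, false) => a22 += blk
                }
              }
              (a11.toSeq, a12.toSeq, a21.toSeq, a22.toSeq)
            }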
    
    --> Add a note to SchurComplement that the current implementation assumes
        that a11 fits into memory on the driver
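
        e.g. in the Scaladoc:

            /**
             * ...
             * @note The current implementation assumes that the upper-left
             *       block a11 fits into memory on the driver.
             */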
    
    --> Might want to use a case class as the return type of blockLUtoSolver
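
        For example (the component names are illustrative; use whatever the
        current tuple actually holds):

            import org.apache.spark.mllib.linalg.distributed.BlockMatrix

            /** Named result of the block LU factorization. */
            case class BlockLUResult(p: BlockMatrix, l: BlockMatrix, u: BlockMatrix)

            // Self-documenting at the call site:
            // val BlockLUResult(p, l, u) = blockLUtoSolver(a)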
    
    --> Take a close look at the performance impact of the chains of
        multiplications at line 811 when there are many levels of recursion
    
    --> In recursiveSequencesBuild, you may want to break off more than one
        block from the upper left corner of the matrix; in many cases, the
        available memory on the driver can comfortably hold, say, 10x10 blocks.
        You should be able to query the SparkContext's memory information to
        determine how much heap space is available for the in-memory portion
        of the computation. On a side note, it would be good to produce a
        user-friendly error if it looks like there is not enough local heap to
        process one block's data locally.
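
        Rough sketch of the sizing check. SparkContext doesn't directly expose
        free driver heap, so this falls back on the JVM Runtime; the
        8-bytes-per-double estimate and the 0.5 safety factor are assumptions:

            def blocksThatFitLocally(rowsPerBlock: Int, colsPerBlock: Int,
                                     safetyFactor: Double = 0.5): Long = {
              val rt = Runtime.getRuntime
              val freeHeap = rt.maxMemory - (rt.totalMemory - rt.freeMemory)
              val bytesPerBlock = rowsPerBlock.toLong * colsPerBlock * 8  // dense doubles
              val n = (freeHeap * safetyFactor / bytesPerBlock).toLong
              require(n >= 1,
                s"Not enough driver heap to process even one $rowsPerBlock x " +
                s"$colsPerBlock block locally (~$bytesPerBlock bytes needed, " +
                s"~${(freeHeap * safetyFactor).toLong} usable)")
              n
            }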
    
    --> Might want to reuse buffers for the local data allocated at lines
        623-629 to avoid triggering global GC at each level of recursion
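
        That is, allocate the scratch space once at the size of the largest
        block and thread it through the recursion (a sketch only; I haven't
        lined this up with the actual allocations at 623-629):

            // Allocated once, reused at every recursion level.
            final class Scratch(maxRows: Int, maxCols: Int) {
              val values = new Array[Double](maxRows * maxCols)
            }

            def factorStep(level: Int, scratch: Scratch): Unit = {
              // ... fill scratch.values with this level's block data ...
              if (level > 0) factorStep(level - 1, scratch)  // no fresh allocation
            }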
    


