[ https://issues.apache.org/jira/browse/SYSTEMML-413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frederick Reiss updated SYSTEMML-413:
-------------------------------------
    Description: 
Pull the local (non-distributed) linear algebra components of SystemML into a 
separate package. Define a proper object-oriented Java API for creating and 
manipulating local matrices. Document this API. Refactor all tests of local 
linear algebra functionality so that those tests use the new API. Refactor the 
distributed linear algebra operators (both Spark and Hadoop map-reduce) to use 
the new APIs for local linear algebra. 

*Overall Refactoring Plan*
The MatrixBlock class will be the core locus of refactoring. The file is over 
6000 lines long, has dependencies on the HOPS and LOPS layers, and contains a 
lot of sparse matrix code that really ought to be in SparseBlock. Even if it’s 
modified in place, MatrixBlock will bear little resemblance to its current form 
after the refactoring is completed. I recommend setting aside the current 
MatrixBlock class and creating new classes with equivalent functionality by 
copying appropriate blocks of code from the old class. 

Major changes to make relative to MatrixBlock:
* We should create a new DenseMatrixBlock class that only covers dense linear 
algebra.
* Sparse-specific code should be moved into the SparseBlock class. 
* Common functionality across dense and sparse should go into the MatrixValue 
superclass.
* There should be a new class with a name like “Matrix” (we’ll need one anyway 
to serve as the public API) that contains a pointer to a MatrixValue and can 
switch between different representations. Ideally this class should be designed 
so that, in the future, it can serve as a matrix ADT that will wrap both local 
and distributed linear algebra. A rough skeleton of how Matrix, MatrixValue, 
and DenseMatrixBlock could fit together appears after this list.
* Several fields (maxrow, maxcolumn, numGroups, and various estimates of future 
numbers of nonzeros) are used for stashing data that is only for internal 
SystemML use. Either put these into a different data structure or provide a 
generic mechanism for tagging a matrix block with additional 
application-specific data; a minimal tagging sketch also appears after this 
list.
* Clean up and simplify the multiple different initialization methods 
(different variants of the constructors and the methods init() and reset()). 
There should be one canonical method for each major type of initialization. 
Other methods that are shortcuts (e.g. reset() with no arguments) should call 
the canonical method internally; see the initialization sketch after this list.
* Consider refactoring the variants of ternaryOperations() that support 
ctable() into something simpler that is just called ctable(), perhaps a Java 
API that can take null values for the optional arguments. A hypothetical 
signature appears after this list.
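
How the main classes could fit together (a non-binding sketch; all names, 
fields, and method signatures below are illustrative placeholders, not the 
final API):

{code:java}
// Illustrative skeleton only.

/** Common functionality shared by dense and sparse representations. */
abstract class MatrixValue {
    public abstract int getNumRows();
    public abstract int getNumColumns();
    public abstract double get(int row, int col);
    public abstract void set(int row, int col, double value);
}

/** Dense-only representation; no sparse branches inside. */
class DenseMatrixBlock extends MatrixValue {
    private final int rows, cols;
    private final double[] values;   // row-major storage

    DenseMatrixBlock(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.values = new double[rows * cols];
    }
    public int getNumRows()    { return rows; }
    public int getNumColumns() { return cols; }
    public double get(int r, int c)          { return values[r * cols + c]; }
    public void set(int r, int c, double v)  { values[r * cols + c] = v; }
    // a SparseBlock-backed sibling would implement the same MatrixValue contract
}

/** Public entry point: owns a MatrixValue and can switch representations. */
class Matrix {
    private MatrixValue data;

    Matrix(MatrixValue data) { this.data = data; }

    /** Swap the underlying block (dense <-> sparse) based on observed density. */
    void pickRepresentation() {
        // inspect nnz / (rows * cols) and replace 'data' with the better block type
    }

    MatrixValue getData() { return data; }
}
{code}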
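
For the application-specific data, one minimal shape (purely hypothetical; the 
class and key names are made up) is a small property map carried alongside the 
block rather than fields baked into the linear algebra class:

{code:java}
import java.util.HashMap;
import java.util.Map;

/** Hypothetical side-channel for engine-internal annotations on a matrix. */
class MatrixProperties {
    private final Map<String, Object> tags = new HashMap<>();

    void set(String key, Object value) { tags.put(key, value); }

    @SuppressWarnings("unchecked")
    <T> T get(String key) { return (T) tags.get(key); }
}

// Example use inside SystemML (keys invented for illustration):
//   props.set("maxrow", 1000);
//   props.set("estimatedNnz", 52000L);
{code}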
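
The initialization cleanup boils down to one canonical reset per kind of 
initialization, with every convenience variant delegating to it. A sketch of 
that contract (names and parameters are invented, not existing SystemML code):

{code:java}
/** Sketch of the initialization contract only. */
class BlockInitSketch {
    private int rows, cols;
    private double[] values;

    /** Canonical reset: every other variant funnels through here. */
    void reset(int rows, int cols, double initialValue) {
        this.rows = rows;
        this.cols = cols;
        if (values == null || values.length < rows * cols) {
            values = new double[rows * cols];
        }
        java.util.Arrays.fill(values, 0, rows * cols, initialValue);
    }

    /** Shortcut: keep current dimensions, zero the contents. */
    void reset() {
        reset(rows, cols, 0.0);
    }

    /** Shortcut: resize and zero. */
    void reset(int rows, int cols) {
        reset(rows, cols, 0.0);
    }
}
{code}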
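
A hypothetical signature for the ctable() point (argument names are invented; 
MatrixValue is the class discussed above):

{code:java}
/** Hypothetical ctable() API, replacing the ternaryOperations() overloads. */
interface CtableOps {
    /**
     * Contingency table of this matrix against 'second'.
     * @param second  the second input matrix
     * @param weights optional per-entry weights; null means a weight of 1
     * @param outDims optional fixed output dimensions; null means infer from data
     */
    MatrixValue ctable(MatrixValue second, MatrixValue weights, int[] outDims);
}
{code}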

Other changes outside MatrixBlock:
* The matrix classes currently depend on Hadoop I/O classes like Writable and 
DataInputBuffer. A local linear algebra library really shouldn’t require 
Hadoop. I/O methods that use Hadoop APIs should be factored out into a separate 
package. In particular, MatrixValue needs to be separated from Hadoop’s 
WritableComparable API; a sketch of one possible adapter appears after this 
list.
* The contents of the following packages need to move to the new library: 
sysml.runtime.functionobjects and sysml.runtime.matrix.operators
* The library will need local input and output functions. I haven’t found 
suitable functions yet, but they may be hidden somewhere; if such functions do 
exist, they should be moved next to the other local linear algebra code.
* Utility functions under classes in sysml.runtime.util will need to be 
replicated.
* The more obscure subclasses of MatrixValue (MatrixCell, WeightedCell, etc.) 
do NOT need to be moved over.
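
On the Hadoop separation: one option is to keep the core library on plain 
java.io DataInput/DataOutput and confine Writable to a thin adapter in a 
separate, Hadoop-dependent package. The adapter below is a hypothetical 
sketch, not existing SystemML code; MatrixSerializer is an invented helper.

{code:java}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

/** Lives outside the core library; only this class knows about Hadoop. */
class WritableMatrixAdapter implements Writable {
    private final DenseMatrixBlock block;   // core-library type, Hadoop-free

    WritableMatrixAdapter(DenseMatrixBlock block) { this.block = block; }

    @Override
    public void write(DataOutput out) throws IOException {
        // delegate to a Hadoop-free serializer in the core library, e.g.
        // MatrixSerializer.write(block, out);   // hypothetical helper
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // MatrixSerializer.read(block, in);     // hypothetical helper
    }
}
{code}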


  was:Pull the local (non-distributed) linear algebra components of SystemML 
into a separate package. Define a proper object-oriented Java API for creating 
and manipulating local matrices. Document this API. Refactor all tests of local 
linear algebra functionality so that those tests use the new API. Refactor the 
distributed linear algebra operators (both Spark and Hadoop map-reduce) to use 
the new APIs for local linear algebra. 


> Runtime refactoring core matrix block library
> ---------------------------------------------
>
>                 Key: SYSTEMML-413
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-413
>             Project: SystemML
>          Issue Type: Task
>          Components: Runtime
>            Reporter: Matthias Boehm
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
