louislepage commented on PR #1849:
URL: https://github.com/apache/systemds/pull/1849#issuecomment-1615848526

   ## Experiment:
   To analyze and compare the performance of DenseBlockBool as Bitset and as 
BooleanArray to the current FP64 implementation, a java program was created 
that uses the classes from systemds 
([systemds-boolean-matrix-evaluation](https://github.com/louislepage/systemds-boolean-matrix-evaluation)).
 For each given matrix size, it does five computations with pseudo-randomly 
generated matrices for double to boolean operators and for boolean to boolean 
operators. 
   Both the total runtime of each operation and the memory used are stored. 
Finally, the average over the five runs is taken.
   The Matrix sizes are chosen to represent up to 16mio total entries, but with 
different dims. We used square matrices of up to 4000x4000 and non-square ones 
of up to 100x160000. In the non-square case, one dimension was always fixed to 
a length of 100.
   
   ## Evaluation of Double to Bool (GreaterThan)
   As input, two dense FP64 Matrices are randomly generated based on a given 
seed. As the output, a MatrixBlock with a DenseBlock of the desired type (FP64, 
Bitset, or BooleanArray) is created.
   The timer is started, the operator called with the two input and the output 
matrices, the timer is stopped, and the result matrix is returned for 
validation. Memory is computed as the difference between the memory used before 
the in- and output matrices are created and after the result is computed (but 
before any Matrix blocks were dereferenced).
   
   
![matrix_operations_DoubeToBool_large](https://github.com/apache/systemds/assets/66960491/823a061e-a7b7-4dbc-9127-61105edab9a7)
   
   The plot shows the runtimes on the left and memory usage on the right. In 
blue, the results of the bitset implementation are plotted. In green is the 
current Double/FP64 implementation, and in orange is the Boolean Array 
implementation. The different markers represent the results of square (square 
marker), Nx100 (circle), and 100xN (cross) matrices.
   
   As expected, both runtime and memory usage increase linearly with an 
increased number of entries in the matrix.
   
   The results show a visible performance increase for computation regarding 
runtime when using BooleanArray.  This is most likely due to faster, direct 
access to the smaller arrays and fewer translations between boolean results and 
their double representation to store them in the result matrix since the 
operators can return boolean values, which can then be directly stored in the 
result matrices.
   Interestingly the boolean array outperforms the Bitset here. The Bitset was 
even slower than the FP64 implementation. Without further investigation, this 
is probably due to fewer instructions necessary to access the boolean array 
compared to more when storing a value in a bitset. Also, it is worth noting 
that the bitset stores the values internally as `long` and therefore has to 
access an 8bytes large long whenever accessing a value. This might influence 
caching.
   
   Regarding memory, both boolean implementations took less space. As expected, 
the Bitset takes even less space since each boolean in the boolean arrays still 
takes up one byte, but the bitset stores the booleans as individual bits that 
are collected into `long`.
   
   There is no difference in runtime or memory when using non-square matrices 
as long as the total number of entires is the same.
   
   ## Evaluation of Boolen To Boolean (Xor)
   As input, two dense Matrices of the given type (FP64, Bitset, or 
BooleanArray) with boolean values are randomly generated based on a given seed. 
As the output, a MatrixBlock with a DenseBlock of the desired type is created.
   The timer is started, and the operator is called with the two input and the 
output matrices in a loop for 20 iterations. The timer is stopped, and the 
result matrix is returned for validation. Memory is computed as the difference 
between the memory used before the in- and output matrices are created and 
after the loop ended (but before any Matrix blocks were dereferenced).
   
   
![matrix_operations_BoolToBool_large](https://github.com/apache/systemds/assets/66960491/1a456f04-2092-40b6-a018-4189beb83256)
   
   
   The plot shows the runtimes on the left and memory usage on the right. In 
blue, the results of the bitset implementation are plotted. In green is the 
current Double/FP64 implementation, and in orange is the Boolean Array 
implementation. The different markers represent the results of square (square 
marker), Nx100 (circle), and 100xN (cross) matrices.
   
   As expected, an improved runtime for the boolean arrays is also visible in 
this case and both runtime and memory usage increase linearly with an increased 
number of entries in the matrix.
   
   We can also see an improvement in memory usage when using a boolean 
implementation. 
   The memory used is less than when running the Double to Boolean operator, 
even when using FP64, but this is probably due to reusing the result matrix as 
input to the next computation. Therefore after the first iteration, there are 
only two matrices in memory. 
   
   Memory usage for the Boolean to Boolean case, when using actual Booleans as 
input, has decreased significantly since all matrices in memory are small, not 
only the output matrix. It is also visible that the bitset again outperforms 
the boolean array regarding memory usage.
   
   There is no difference in runtime or memory when using non-square matrices 
as long as the total number of entries is the same.
   
   ## Conclusion:
   Since both boolean implementations improve memory usage, but only the 
boolean array outperforms the FP64 implementation in runtime, I suggest using 
the boolean array as the DenseBlock for any operation that results in only 
boolean data.
   
   
   Let me know if there are any further open questions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to