[GitHub] [systemds] louislepage commented on pull request #1849: [SYSTEMDS-3544] Boolean Matrix Support

via GitHub Sat, 01 Jul 2023 03:57:52 -0700


louislepage commented on PR #1849:
URL: https://github.com/apache/systemds/pull/1849#issuecomment-1615848526

## Experiment:
To analyze and compare the performance of DenseBlockBool as Bitset and as
BooleanArray to the current FP64 implementation, a java program was created
that uses the classes from systemds
([systemds-boolean-matrix-evaluation](https://github.com/louislepage/systemds-boolean-matrix-evaluation)).
For each given matrix size, it does five computations with pseudo-randomly
generated matrices for double to boolean operators and for boolean to boolean
operators.
Both the total runtime of each operation and the memory used are stored.
Finally, the average over the five runs is taken.
The Matrix sizes are chosen to represent up to 16mio total entries, but with
different dims. We used square matrices of up to 4000x4000 and non-square ones
of up to 100x160000. In the non-square case, one dimension was always fixed to
a length of 100.

## Evaluation of Double to Bool (GreaterThan)
As input, two dense FP64 Matrices are randomly generated based on a given
seed. As the output, a MatrixBlock with a DenseBlock of the desired type (FP64,
Bitset, or BooleanArray) is created.
The timer is started, the operator called with the two input and the output
matrices, the timer is stopped, and the result matrix is returned for
validation. Memory is computed as the difference between the memory used before
the in- and output matrices are created and after the result is computed (but
before any Matrix blocks were dereferenced).

![matrix_operations_DoubeToBool_large](https://github.com/apache/systemds/assets/66960491/823a061e-a7b7-4dbc-9127-61105edab9a7)

The plot shows the runtimes on the left and memory usage on the right. In
blue, the results of the bitset implementation are plotted. In green is the
current Double/FP64 implementation, and in orange is the Boolean Array
implementation. The different markers represent the results of square (square
marker), Nx100 (circle), and 100xN (cross) matrices.

As expected, both runtime and memory usage increase linearly with an
increased number of entries in the matrix.

The results show a visible performance increase for computation regarding
runtime when using BooleanArray. This is most likely due to faster, direct
access to the smaller arrays and fewer translations between boolean results and
their double representation to store them in the result matrix since the
operators can return boolean values, which can then be directly stored in the
result matrices.
Interestingly the boolean array outperforms the Bitset here. The Bitset was
even slower than the FP64 implementation. Without further investigation, this
is probably due to fewer instructions necessary to access the boolean array
compared to more when storing a value in a bitset. Also, it is worth noting
that the bitset stores the values internally as `long` and therefore has to
access an 8bytes large long whenever accessing a value. This might influence
caching.

Regarding memory, both boolean implementations took less space. As expected,
the Bitset takes even less space since each boolean in the boolean arrays still
takes up one byte, but the bitset stores the booleans as individual bits that
are collected into `long`.

There is no difference in runtime or memory when using non-square matrices
as long as the total number of entires is the same.

## Evaluation of Boolen To Boolean (Xor)
As input, two dense Matrices of the given type (FP64, Bitset, or
BooleanArray) with boolean values are randomly generated based on a given seed.
As the output, a MatrixBlock with a DenseBlock of the desired type is created.
The timer is started, and the operator is called with the two input and the
output matrices in a loop for 20 iterations. The timer is stopped, and the
result matrix is returned for validation. Memory is computed as the difference
between the memory used before the in- and output matrices are created and
after the loop ended (but before any Matrix blocks were dereferenced).

![matrix_operations_BoolToBool_large](https://github.com/apache/systemds/assets/66960491/1a456f04-2092-40b6-a018-4189beb83256)

As expected, an improved runtime for the boolean arrays is also visible in
this case and both runtime and memory usage increase linearly with an increased
number of entries in the matrix.

We can also see an improvement in memory usage when using a boolean
implementation.
The memory used is less than when running the Double to Boolean operator,
even when using FP64, but this is probably due to reusing the result matrix as
input to the next computation. Therefore after the first iteration, there are
only two matrices in memory.

Memory usage for the Boolean to Boolean case, when using actual Booleans as
input, has decreased significantly since all matrices in memory are small, not
only the output matrix. It is also visible that the bitset again outperforms
the boolean array regarding memory usage.

There is no difference in runtime or memory when using non-square matrices
as long as the total number of entries is the same.

## Conclusion:
Since both boolean implementations improve memory usage, but only the
boolean array outperforms the FP64 implementation in runtime, I suggest using
the boolean array as the DenseBlock for any operation that results in only
boolean data.

Let me know if there are any further open questions.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [systemds] louislepage commented on pull request #1849: [SYSTEMDS-3544] Boolean Matrix Support

Reply via email to