Jaybit0 opened a new pull request, #2062:
URL: https://github.com/apache/systemds/pull/2062
This PR extends PR #2045 and implements support for data-dependent n-grams
by using and extending the existing lineage functionality.
Because the instructions form a DAG rather than a linear sequence, the
extension tracks every instruction path of length `n`. For example, given the
DAG `(a*b + c/d)` and recording bigrams, the two operation sequences
`[(*, +), (/, +)]` are added to the bigram store. I also keep track of the
data types of each instruction. This is why I extended the existing lineage
functionality: the `_data` string is sometimes empty and contains inconsistent
information.
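
To make the path-tracking idea concrete, here is a minimal sketch in Python. It is not the SystemDS implementation (which is Java and built on lineage items); the `Node` class and the functions `paths_ending_at` and `collect_ngrams` are hypothetical names introduced only for illustration. It assumes that leaf operands such as `a` or `b` are data rather than instructions, so they never appear in an n-gram.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical DAG node: an opcode plus the nodes that feed it."""
    opcode: str
    inputs: tuple = ()
    is_instruction: bool = True  # False for plain data operands such as `a`


def paths_ending_at(node, n):
    """Return every chain of n instruction opcodes that ends at `node`."""
    if not node.is_instruction:
        return []
    if n == 1:
        return [(node.opcode,)]
    result = []
    for child in node.inputs:
        # Extend every (n-1)-chain ending at an input with this node's opcode.
        for prefix in paths_ending_at(child, n - 1):
            result.append(prefix + (node.opcode,))
    return result


def collect_ngrams(root, n):
    """Visit each node reachable from `root` once and gather its n-grams."""
    seen = set()
    ngrams = []
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:  # shared sub-DAGs are visited only once
            continue
        seen.add(id(node))
        ngrams.extend(paths_ending_at(node, n))
        stack.extend(node.inputs)
    return ngrams


# The example DAG from the description: (a*b + c/d)
a, b, c, d = (Node(name, is_instruction=False) for name in "abcd")
mul = Node("*", (a, b))
div = Node("/", (c, d))
root = Node("+", (mul, div))

bigrams = collect_ngrams(root, 2)
print(bigrams)  # [('*', '+'), ('/', '+')]
```

In this sketch a node with several inputs contributes one n-gram per input path, which is why the single `+` instruction yields two bigrams.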
The n-gram table now looks like this, where the arguments within parentheses
show the input parameters of an instruction (separated by `°`) and the suffix
`[i]` represents the parameter index for the following instruction (e.g. for
the first entry the result of `rblk` is used as the second parameter for
`ba+*`):
```
Most common 2-grams (sorted by absolute time):
 #  N-Gram                          Time(s)  StdDev(t)/Mean(t)  Count
 1  (rblk·MATRIX·FP64(MATRIX·FP64)    1,144  (, 1.067)              4
    [1], ba+*·MATRIX·FP64(MATRIX·F
    P64 ° MATRIX·FP64))
 2  (rblk·MATRIX·FP64(MATRIX·FP64)    0,853  (, 0.469)              3
    [0], ba+*·MATRIX·FP64(MATRIX·F
    P64 ° MATRIX·FP64))
 3  (createvar·MATRIX·FP64()[0], r    0,343  (0.627, 0.929)         2
    blk·MATRIX·FP64(MATRIX·FP64))
 4  (rblk·MATRIX·FP64(MATRIX·FP64)    0,285  -                      1
    [0], cpvar·MATRIX·FP64(MATRIX·
    FP64))
 5  (+*·MATRIX·FP64(MATRIX·FP64 °     0,153  -                      1
    SCALAR·FP64 ° MATRIX·FP64)[0],
    write·MATRIX·FP64(MATRIX·FP64
    ° L_SCALAR·STRING ° L_SCALAR·
    STRING ° L_SCALAR·INT64))
```
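
The entry notation in the table above can be reproduced with a small sketch, again in Python and not taken from the SystemDS code base. The `Step` class and the `format_step` and `format_ngram` helpers are hypothetical; the layout (opcode and output type joined by `·`, inputs separated by ` ° `, an optional `[i]` suffix, entries joined by `, `) is inferred from the printed output only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    """Hypothetical record for one instruction inside an n-gram."""
    opcode: str
    output_type: str                    # e.g. "MATRIX·FP64"
    input_types: tuple = ()             # types of the instruction's inputs
    feeds_index: Optional[int] = None   # parameter slot in the next instruction


def format_step(step):
    """Render one instruction as opcode·type(inputs)[i]."""
    text = f"{step.opcode}·{step.output_type}({' ° '.join(step.input_types)})"
    if step.feeds_index is not None:
        text += f"[{step.feeds_index}]"
    return text


def format_ngram(steps):
    """Render a whole n-gram as a parenthesised, comma-separated list."""
    return "(" + ", ".join(format_step(s) for s in steps) + ")"


# First table entry: the result of rblk is parameter 1 (the second one) of ba+*.
key = format_ngram([
    Step("rblk", "MATRIX·FP64", ("MATRIX·FP64",), feeds_index=1),
    Step("ba+*", "MATRIX·FP64", ("MATRIX·FP64", "MATRIX·FP64")),
])
print(key)
```

The last instruction of an n-gram carries no `[i]` suffix because nothing follows it within that n-gram.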
@mboehm7