Jaybit0 opened a new pull request, #2045: URL: https://github.com/apache/systemds/pull/2045
This PR implements n-gram statistics of instruction sequences, enabled by adding `-ngrams [<comma-separated tuple sizes> <topK>]` to the command line arguments. For instance, the CLI arguments `-ngrams 2,3 10` record 2-grams and 3-grams of instruction sequences and print the top 10 entries of all 2-grams and 3-grams.

N-grams are maintained for each thread individually and are merged when the statistics are printed. The table is currently sorted by the cumulative execution time of the instruction sequence (over all occurrences of that sequence).

For example, a table could look like this:

```
Most common 2-grams (sorted by absolute time):
  #  N-Gram                 Time(s)  StdDev(t)/Mean(t)  Count
  1  (sp_rblk, createvar)   0,281    (0.857, 0.228)         2
  2  (createvar, sp_rblk)   0,281    (0.676, 0.857)         2
  3  (write, write)         0,134    -                      1
  4  (rmvar, write)         0,082    -                      1
  5  (write, rmvar)         0,053    -                      1
  6  (createvar, *)         0,014    (2.792, 1.971)        66
  7  (==, mvvar)            0,014    (3.299, 2.433)        11
  8  (round, rmvar)         0,012    -                      1
  9  (createvar, round)     0,011    -                      1
 10  (*, rmvar)             0,008    (2.303, 2.725)        57
```

@mboehm7
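
To make the description above concrete, here is a minimal Java sketch of the general idea: each thread records n-grams over a sliding window of its own instructions, and the per-thread maps are merged and sorted by cumulative time only when the statistics are printed. This is not the code from the PR. The names `NGramSketch`, `Recorder`, `Stat`, `merge` and `topK` are hypothetical, and the sketch assumes that the time of an n-gram is the sum of the times of its instructions, which the description does not state. It also omits the StdDev(t)/Mean(t) column.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the SystemDS implementation.
final class NGramSketch {

    /** Accumulated statistics for one n-gram: occurrences and total time. */
    static final class Stat {
        long count;
        double totalTimeSec;

        void add(long c, double t) {
            count += c;
            totalTimeSec += t;
        }
    }

    /** One recorder per thread and per tuple size n; not shared across threads. */
    static final class Recorder {
        private final int n;
        private final ArrayDeque<String> opcodes = new ArrayDeque<>();
        private final ArrayDeque<Double> times = new ArrayDeque<>();
        final Map<String, Stat> stats = new HashMap<>();

        Recorder(int n) {
            this.n = n;
        }

        /** Appends one executed instruction and records the n-gram ending at it. */
        void record(String opcode, double timeSec) {
            opcodes.addLast(opcode);
            times.addLast(timeSec);
            if (opcodes.size() > n) { // keep only the last n instructions
                opcodes.removeFirst();
                times.removeFirst();
            }
            if (opcodes.size() == n) {
                String key = "(" + String.join(", ", opcodes) + ")";
                double sum = 0; // assumption: n-gram time = sum of instruction times
                for (double t : times) {
                    sum += t;
                }
                stats.computeIfAbsent(key, k -> new Stat()).add(1, sum);
            }
        }
    }

    /** Merges the per-thread maps; called once, when statistics are printed. */
    static Map<String, Stat> merge(List<Recorder> recorders) {
        Map<String, Stat> merged = new HashMap<>();
        for (Recorder r : recorders) {
            for (Map.Entry<String, Stat> e : r.stats.entrySet()) {
                merged.computeIfAbsent(e.getKey(), k -> new Stat())
                      .add(e.getValue().count, e.getValue().totalTimeSec);
            }
        }
        return merged;
    }

    /** Returns the k entries with the largest cumulative time, descending. */
    static List<Map.Entry<String, Stat>> topK(Map<String, Stat> merged, int k) {
        List<Map.Entry<String, Stat>> entries = new ArrayList<>(merged.entrySet());
        entries.sort(Comparator.comparingDouble(
            (Map.Entry<String, Stat> e) -> e.getValue().totalTimeSec).reversed());
        return entries.subList(0, Math.min(k, entries.size()));
    }

    public static void main(String[] args) {
        Recorder thread1 = new Recorder(2);
        thread1.record("createvar", 0.25);
        thread1.record("*", 0.5);
        thread1.record("rmvar", 0.25);

        Recorder thread2 = new Recorder(2);
        thread2.record("createvar", 0.5);
        thread2.record("*", 1.0);

        for (Map.Entry<String, Stat> e : topK(merge(List.of(thread1, thread2)), 10)) {
            System.out.println(e.getKey() + " time=" + e.getValue().totalTimeSec
                + " count=" + e.getValue().count);
        }
    }
}
```

Keeping one recorder per thread avoids synchronization on the hot path; the only cross-thread work is the final merge. Supporting `-ngrams 2,3` in this sketch would mean keeping one `Recorder` per requested tuple size in each thread.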