Nakroma opened a new pull request, #2189: URL: https://github.com/apache/systemds/pull/2189
Student project `SYSTEMDS-3548` and follow-up to #2154.

Contributions/discussion:
- I followed up on both suggestions from @Baunsgaard in the first PR and tested both chunking into smaller parts and fusing operations into fewer Java calls. I was unable to get any real improvements that were replicable over the larger datasets, although I don't have a ton of experience with Py4J, so this might still have some potential. I did add some of the adjacent code for it (fusing convert, setting only chunks of a FrameBlock, etc.), so at least some of the work I did there contributes to the project.
- As it turns out, anything involving the Java gateway is very costly; for example, even a simple `if var == jvm.gate.sds.ValueType.String` comparison has a big overhead. I was able to remove another constant-time cost by reducing such calls to a minimum, see the first graph below.
- For cases where `cols > rows`, the current column-wise processing is very slow, so I added row-wise processing for that case to speed it up (see second graph below, tested on 1k rows x 10k cols). Note that it currently only does that for the edge case where all columns have the same data type, because in my tests serializing a row with mixed column types was very costly. I wasn't able to spend a lot of time on this, as the deadline is approaching, so I think there is a lot of potential here to find an efficient serialization that would also make it usable for mixed columns. I'd also expect that, in the optimal case, the time would be the same as if cols and rows were switched, so there is probably more optimization potential in the current row-wise processing as well.
- Small note: I switched how I compare times. Before, I was averaging runs, but now I take the min as suggested by the [timeit docs](https://docs.python.org/3/library/timeit.html#timeit.Timer.repeat), so times might be slightly different from the first PR.
- Fixed a regression from my first PR where exceptions in the threaded function calls wouldn't propagate properly.
- Fixed a small bug in the perftests to be able to read multi-file data (since that's what datagen generates for larger datasets).
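
To illustrate the gateway-overhead point without needing a JVM, here is a minimal sketch. `FakeGateway` and both `classify_*` functions are hypothetical stand-ins, not SystemDS or Py4J code; the counter only simulates the round trip that a real gateway lookup would make on every comparison.

```python
class FakeGateway:
    """Hypothetical stand-in for a Py4J gateway; counts simulated round trips."""

    def __init__(self):
        self.round_trips = 0

    def value_type(self, name):
        # With a real gateway, resolving an enum member would cross into the JVM.
        self.round_trips += 1
        return ("ValueType", name)


def classify_naive(gateway, column_types):
    # Resolves the constant again for every column: one round trip per comparison.
    return [t == gateway.value_type("STRING") for t in column_types]


def classify_cached(gateway, column_types):
    # Resolves the constant once, then compares against the local reference.
    string_type = gateway.value_type("STRING")
    return [t == string_type for t in column_types]


column_types = [("ValueType", "STRING"), ("ValueType", "FP64")] * 500

naive_gw = FakeGateway()
naive_result = classify_naive(naive_gw, column_types)

cached_gw = FakeGateway()
cached_result = classify_cached(cached_gw, column_types)

print(naive_gw.round_trips, cached_gw.round_trips)
```

Both variants return the same classification; only the number of simulated gateway lookups differs.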
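
The condition for switching to row-wise processing, as described above, can be sketched as follows. The function name, its parameters, and the string type labels are assumptions made for illustration; the actual check in the PR may look different.

```python
def choose_processing(n_rows, n_cols, column_types):
    """Pick a processing strategy (hypothetical helper, not the PR's code).

    Row-wise processing is only chosen when there are more columns than rows
    and every column shares the same data type.
    """
    all_same_type = len(set(column_types)) == 1
    if n_cols > n_rows and all_same_type:
        return "row-wise"
    return "column-wise"


# Wide frame with a single data type: row-wise path.
wide_uniform = choose_processing(1_000, 10_000, ["float64"] * 10_000)

# Wide frame with mixed data types: falls back to column-wise.
wide_mixed = choose_processing(1_000, 10_000, ["float64", "object"] * 5_000)

# Tall frame: column-wise regardless of data types.
tall_uniform = choose_processing(10_000, 1_000, ["float64"] * 1_000)

print(wide_uniform, wide_mixed, tall_uniform)
```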
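
For reference, taking the min over repeated runs instead of the average looks roughly like this with the standard `timeit` module. The `workload` function is a placeholder, not one of the actual perftests.

```python
import timeit


def workload():
    # Placeholder for the code under test.
    return sum(i * i for i in range(1000))


# Five independent measurements, each timing 100 calls of workload().
timings = timeit.repeat(workload, repeat=5, number=100)

best = min(timings)                  # the value reported now
mean = sum(timings) / len(timings)   # the value reported in the first PR

print(f"min: {best:.6f}s, mean: {mean:.6f}s")
```

The min is less affected by interference from other processes, which is why the timeit docs recommend it over the mean.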
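
The exception-propagation fix concerns the general pattern sketched below. This is not the code from the PR: `convert_column` and `convert_all` are hypothetical names, and `concurrent.futures` is just one common way to make sure a failure in a worker thread reaches the caller.

```python
from concurrent.futures import ThreadPoolExecutor


def convert_column(index):
    # Hypothetical per-column task; column 2 fails on purpose.
    if index == 2:
        raise ValueError(f"cannot convert column {index}")
    return index * 10


def convert_all(n_cols):
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(convert_column, i) for i in range(n_cols)]
        # result() re-raises any exception raised inside the worker thread,
        # so a failure is not silently dropped.
        return [f.result() for f in futures]


try:
    convert_all(4)
    outcome = "no error"
except ValueError as exc:
    outcome = f"propagated: {exc}"

print(outcome)
```

If the futures were never queried with `result()`, the `ValueError` would stay stored in the future and the caller would not see it.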
