Nakroma opened a new pull request, #2189:
URL: https://github.com/apache/systemds/pull/2189

   Student project `SYSTEMDS-3548` and follow-up to #2154 
   
   Contributions/discussion:
   
   - I followed up on both suggestions from @Baunsgaard in the first PR and tested both chunking the data into smaller parts and fusing operations into fewer Java calls. I was unable to get improvements that were replicable across the larger datasets, although I don't have much experience with Py4J, so there might still be potential here. I did add some of the adjacent code (fusing convert, setting only chunks of a FrameBlock, etc.), so at least part of that work contributes to the project.
   - As it turns out, anything that goes through the Java gateway is very costly; even a simple comparison like `if var == jvm.gate.sds.ValueType.String` has significant overhead. By reducing such gateway accesses to a minimum I was able to shave off another constant amount of time, see the first graph below (a minimal sketch of the idea is attached below the graphs).
   - For cases where `cols > rows`, the current column-wise processing is very slow, so I added row-wise processing for that case (see the second graph below, tested on 1k rows x 10k cols). Note that this path is currently only taken in the edge case where all columns share the same data type, because in my tests serializing a row with mixed column types was very costly. I wasn't able to spend much time on this with the deadline approaching, so I think there is a lot of potential in finding an efficient serialization that also covers mixed columns. I'd also expect the optimal case to take roughly the same time as if cols and rows were switched, so there is probably further optimization potential in the current row-wise processing as well (a sketch of the dispatch decision is attached below the graphs).
   - Small note: I changed how I compare times. Previously I averaged runs, but now I take the minimum, as suggested by the [timeit docs](https://docs.python.org/3/library/timeit.html#timeit.Timer.repeat), so times may differ slightly from the first PR (see the snippet below the graphs).
   - Fixed a regression from my first PR where exceptions raised in the threaded function calls wouldn't propagate properly to the caller (the general pattern behind the fix is sketched below the graphs).
   - Fixed a small bug in the perftests so they can read multi-file data (since that is what datagen generates for larger datasets).
   
   
![load_pandas_10k_1k_dense](https://github.com/user-attachments/assets/4b9650f0-5dd9-4df3-b30b-5cbee144485a)
   
![load_pandas_1k_10k_dense](https://github.com/user-attachments/assets/6bb11dbd-9957-4a3e-852c-5ddaaf185e1f)
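
To illustrate the gateway-overhead point from the second bullet, here is a minimal, hypothetical Py4J sketch. It is not the actual SystemDS code; the `jvm.org.example.ValueType` path and the already-running gateway server are assumptions. The idea is just to resolve gateway attribute chains once instead of inside a hot loop.

```python
# Hypothetical Py4J sketch: hoist gateway lookups out of hot loops, since every
# attribute chain like jvm.a.b.C.FIELD is resolved through the Java gateway.
from py4j.java_gateway import JavaGateway

gateway = JavaGateway()  # assumes a GatewayServer is already running on the JVM side
jvm = gateway.jvm

def count_strings_slow(schema):
    # Resolves the (made-up) ValueType.STRING constant through the gateway on
    # every single iteration.
    return sum(1 for t in schema if t == jvm.org.example.ValueType.STRING)

def count_strings_fast(schema):
    # Resolve the constant once and reuse the local reference; the loop no
    # longer triggers gateway traffic for the lookup itself.
    string_type = jvm.org.example.ValueType.STRING
    return sum(1 for t in schema if t == string_type)
```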
   
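For the row-wise vs. column-wise switch from the third bullet, the dispatch decision roughly looks like the sketch below; the `_send_rowwise`/`_send_column` helpers are hypothetical stand-ins for the actual transfer code.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the actual Py4J transfer calls.
def _send_rowwise(block: np.ndarray) -> int:
    return block.nbytes

def _send_column(col: np.ndarray) -> int:
    return col.nbytes

def convert_frame(df: pd.DataFrame):
    """Dispatch sketch: only use the row-wise path when the frame is wide
    (cols > rows) and every column shares a single dtype."""
    homogeneous = df.dtypes.nunique() == 1
    if df.shape[1] > df.shape[0] and homogeneous:
        # A homogeneous frame collapses into one contiguous numpy block, so a
        # single bulk transfer replaces many per-column round-trips.
        return _send_rowwise(df.to_numpy())
    # Mixed dtypes: keep the existing column-wise path.
    return [_send_column(df[c].to_numpy()) for c in df.columns]
```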

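Regarding the timing methodology: the numbers in the graphs are the minimum over repeated runs, roughly like this (the benchmarked statement is just a placeholder).

```python
import timeit

# Repeat the whole benchmark a few times and keep the minimum, as recommended
# by the timeit docs: the minimum is the run least disturbed by other processes.
timer = timeit.Timer("sum(range(10_000))")      # placeholder for the real benchmark
best = min(timer.repeat(repeat=5, number=100))  # total seconds for `number` executions
print(f"best of 5: {best / 100:.6f} s per call")
```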

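And for the exception-propagation regression: this is only a generic illustration of the underlying problem and one standard way to surface worker exceptions in the caller, not the code changed in this PR.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def work():
    raise ValueError("boom")

# Lost: a plain Thread prints the traceback to stderr at best; the exception
# never reaches the caller, so a failed conversion can go unnoticed.
t = threading.Thread(target=work)
t.start()
t.join()

# Propagated: .result() re-raises the worker's exception in the calling thread.
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(work)
    try:
        future.result()
    except ValueError as e:
        print(f"caught from worker: {e}")
```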