GitHub user imatiach-msft commented on the issue:

    https://github.com/apache/spark/pull/22328

@mhamilton723 could you take a look at this PR? Mark added some performance improvements in MMLSpark that we wanted to merge in, and he also added support for streaming (this was one of the PRs: https://github.com/Azure/mmlspark/pull/134/files , and there were a couple more after it).

He also had some concerns about performance, specifically that we store the images as OpenCV bytes in the DataFrame, which he said takes a lot of memory, and that we should use the more compressed format instead. I recall he had a few suggestions on how we could improve this in the future. This seems like a good place to discuss further performance improvements.