christinadionysio commented on PR #1943: URL: https://github.com/apache/systemds/pull/1943#issuecomment-1845913037
After running the perftest, I created two figures showing that the first (`dist`) and second (`dist_missing`) methods do not scale to larger datasets, whereas the third method (`dist_sample`) still runs on them when the sample size is decreased.

The first figure shows the runtime of each method for different dataset sizes. As mentioned, the first two methods fail with a Java heap space exception on larger datasets, which explains their missing values for `# rows` = 100000, 1000000, and 10000000.
[knn_perf_runtime.pdf](https://github.com/apache/systemds/files/13603755/knn_perf_runtime.pdf)

The second figure shows how the sampling size for the third method was decreased for larger datasets.
[knn_perf_sampling_fac.pdf](https://github.com/apache/systemds/files/13603758/knn_perf_sampling_fac.pdf)
