[ https://issues.apache.org/jira/browse/MADLIB-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frank McQuillan closed MADLIB-1342.
-----------------------------------
    Resolution: Fixed

https://github.com/apache/madlib/pull/467

> Mini-batch preprocessor for images - performance issue
> ------------------------------------------------------
>
>                 Key: MADLIB-1342
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1342
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: Module: Utilities
>            Reporter: Frank McQuillan
>            Priority: Major
>             Fix For: v1.17
>
>
> Improve performance of mini-batch preprocessor for images. May involve writing a new matrix aggregation function to support multi-dimensional arrays.
> I have a 2 segment GP5 cluster set up:
> - preprocessing 50k training rows from CIFAR-10 fits into 3 buffers and takes ~1 hour (buffer size of 24415 is reported in the summary file) -- i.e. used NULL buffer size
> - preprocessing 10k training rows from CIFAR-10 fits into 1 buffer and takes ~2 minutes
> More info:
> If I use `buffer_size=5000` it takes 979 sec
> If I use `buffer_size=500` it takes 75 sec
> So I think there is an issue with large buffer sizes



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
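
As a rough check on the figures quoted in the issue, the arithmetic can be sketched in Python. This is an illustration only: the helper names are hypothetical, the sketch assumes both timed `buffer_size` runs used the same input (the issue does not state the row count), and fitting a power law to two data points is an assumption, not something the issue claims.

```python
import math

# Figures quoted in the issue.
DEFAULT_BUFFER_SIZE = 24415          # buffer size reported in the summary file
timings_sec = {500: 75, 5000: 979}   # buffer_size -> elapsed seconds


def buffers_needed(rows, buffer_size):
    """Number of buffers required to hold `rows` rows (hypothetical helper)."""
    return math.ceil(rows / buffer_size)


def growth_exponent(size_a, time_a, size_b, time_b):
    """Exponent k in time ~ buffer_size**k implied by two measurements.

    Assumes a simple power-law model; two points cannot confirm it.
    """
    return math.log(time_b / time_a) / math.log(size_b / size_a)


# 50k rows need 3 buffers and 10k rows need 1, matching the issue text.
print(buffers_needed(50_000, DEFAULT_BUFFER_SIZE))
print(buffers_needed(10_000, DEFAULT_BUFFER_SIZE))

# A 10x larger buffer took roughly 13x longer.
slowdown = timings_sec[5000] / timings_sec[500]
exponent = growth_exponent(500, timings_sec[500], 5000, timings_sec[5000])
print(f"slowdown: {slowdown:.1f}x, implied exponent: {exponent:.2f}")
```

An exponent above 1 means run time grew faster than buffer size between these two runs, which is consistent with the reporter's suspicion that large buffers are the problem.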