[ https://issues.apache.org/jira/browse/MADLIB-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frank McQuillan updated MADLIB-1342:
------------------------------------
    Description: 
Improve performance of mini-batch preprocessor for images.  May involve writing 
a new matrix aggregation function to support multi-dimensional arrays.

I have a 2 segment GP5 cluster set up:

- preprocessing 50k training rows from CIFAR-10 fits into 3 buffers and takes 
~1 hour (`buffer_size` was left NULL; the summary file reports a buffer size 
of 24415)

- preprocessing 10k training rows from CIFAR-10 fits into 1 buffer and takes ~2 
minutes

More info:

If I use `buffer_size=5000` it takes 979 sec.
If I use `buffer_size=500` it takes 75 sec.

So a 10x larger buffer runs about 13x slower, which suggests the issue is with 
large buffer sizes.
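
A rough back-of-the-envelope check of the numbers above, as a sketch only. It 
assumes the two `buffer_size` timings were taken on the same 50k-row CIFAR-10 
training set (the description does not say so) and that the number of buffers 
is simply rows divided by buffer size, rounded up. The helper names 
`num_buffers` and `scaling_exponent` are made up for this illustration and are 
not part of MADlib.

```python
import math

ROWS = 50_000  # ASSUMPTION: both timed runs used the 50k-row training set


def num_buffers(rows, buffer_size):
    """Buffers needed if each buffer holds at most buffer_size rows."""
    return math.ceil(rows / buffer_size)


def scaling_exponent(size_a, secs_a, size_b, secs_b, rows=ROWS):
    """Estimate k in (time per buffer) ~ buffer_size ** k from two runs."""
    per_buffer_a = secs_a / num_buffers(rows, size_a)
    per_buffer_b = secs_b / num_buffers(rows, size_b)
    return math.log(per_buffer_a / per_buffer_b) / math.log(size_a / size_b)


# Sanity check against the summary file: 50k rows at 24415 rows per buffer.
print(num_buffers(ROWS, 24415))  # 3, matching the 3 buffers reported

# Total runtime ratio between the two explicit buffer sizes.
print(round(979 / 75, 1))

# Per-buffer cost growth; a value near 1 would mean linear cost per buffer.
print(round(scaling_exponent(5000, 979, 500, 75), 2))
```

Under those assumptions the per-buffer cost grows roughly quadratically with 
buffer size rather than linearly, which would be consistent with the 
aggregation step being the bottleneck, but this is an inference from two data 
points and not a measured result.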

  was:
Follow on from https://issues.apache.org/jira/browse/MADLIB-1334

Improve performance of mini-batch preprocessor for images.  May involve writing 
a new matrix aggregation function to support multi-dimensional arrays.

I have a 2 segment GP5 cluster set up:

- preprocessing 50k training rows from CIFAR-10 fits into 3 buffers and takes 
~1 hour (buffer size of 24415 is reported in the summary file)
- preprocessing 10k training rows from CIFAR-10 fits into 1 buffer and takes 
~2-3 minutes




> Mini-batch preprocessor for images - performance issue
> ------------------------------------------------------
>
>                 Key: MADLIB-1342
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1342
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: Module: Utilities
>            Reporter: Frank McQuillan
>            Priority: Major
>             Fix For: v1.17



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
