reductionista commented on issue #467: DL: Improve performance of mini-batch preprocessor
URL: https://github.com/apache/madlib/pull/467#issuecomment-573278611

> @reductionista Looking at the 5k buffer size runs, can you pls just double check in the code that we are skipping the normalization if the normalization factor is `1.0` or `NULL`? I am asking since there is no performance improvement on my small test cluster when I skip normalization.

I turned on debugging for a small 3-segment cluster and verified that `scalar_array_mult()` is not called in the `NULL`/`1.0` cases.

Breaking down the timings of the individual queries, the 4 main stages for the 5k buffer size with `NULL` normalization are:

```
1-hot encoding:   2.5s
batching:         422s
redistribution:   1.5s
bytea conversion: 35s
```

For the 5k buffer size with 256.0 normalization they are:

```
1-hot encoding + normalization: 40s
batching:
redistribution:
bytea conversion:
```
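The check being verified can be sketched as a guard in front of the per-row scaling. This is a minimal illustration, not MADlib's code: `needs_normalization` and `normalize_row` are hypothetical names, and it assumes normalization divides each value by the constant, done as a multiply by its reciprocal (the role `scalar_array_mult()` plays in the preprocessor).

```python
from typing import List, Optional


def needs_normalization(normalizing_const: Optional[float]) -> bool:
    """Hypothetical guard: True only if scaling would change the values."""
    # NULL (None here) and 1.0 both mean "leave the data unchanged",
    # so the per-row multiply can be skipped entirely.
    return normalizing_const is not None and normalizing_const != 1.0


def normalize_row(row: List[float], normalizing_const: Optional[float]) -> List[float]:
    """Hypothetical per-row step: divide by the constant via its reciprocal."""
    if not needs_normalization(normalizing_const):
        return row  # fast path: no scalar multiply is performed
    scale = 1.0 / normalizing_const
    return [x * scale for x in row]
```

With this guard, a `NULL` or `1.0` constant returns the row untouched, so a debug trace would show no multiply call in those cases.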
