fmcquillan99 edited a comment on issue #459: DL: Add support for asymmetric segment distribution to preprocessor
URL: https://github.com/apache/madlib/pull/459#issuecomment-558917862

It looks like it's almost there. The last thing to fix is this distribution table:

```
madlib=# SELECT * FROM segments_to_use ORDER BY hostname, dbid;
 dbid |       hostname
------+-----------------------
    2 | pm-demo-machine-keras
    2 | pm-demo-machine-keras
    2 | pm-demo-machine-keras
    2 | pm-demo-machine-keras
    2 | pm-demo-machine-keras
    3 | pm-demo-machine-keras
(6 rows)
```

which produces:

```
madlib=# DROP TABLE IF EXISTS image_data_packed, image_data_packed_summary;
DROP TABLE
Time: 22.951 ms
madlib=# SELECT madlib.training_preprocessor_dl('image_data',         -- Source table
                                                'image_data_packed',  -- Output table
                                                'species',            -- Dependent variable
                                                'rgb',                -- Independent variable
                                                NULL,                 -- Buffer size
                                                255,                  -- Normalizing constant
                                                NULL,
                                                'segments_to_use'
                                               );
-[ RECORD 1 ]------------+-
training_preprocessor_dl |

Time: 2355.189 ms
madlib=#
madlib=#
madlib=# SELECT * FROM image_data_packed_summary;
-[ RECORD 1 ]-----------+------------------
source_table            | image_data
output_table            | image_data_packed
dependent_varname       | species
independent_varname     | rgb
dependent_vartype       | text
class_values            | {bird,cat,dog}
buffer_size             | 26
normalizing_const       | 255
num_classes             | 3
distribution_rules      | {2,2,2,2,2,3}
__internal_gpu_config__ | {0,0,0,0,0,1}
```

I'd suggest throwing an error if there is a duplicate row in the distribution table, rather than passing the duplicates through into `distribution_rules` like this. Besides that, LGTM.
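A minimal sketch of the duplicate check suggested in the comment, assuming the dbid column of the distribution table has already been fetched into a Python list. The function name `validate_distribution_rules`, its `table_name` parameter, and the error wording are hypothetical illustrations, not MADlib's actual code. It raises `ValueError` so it runs standalone; inside PL/Python the error would more likely be reported via `plpy.error`.

```python
from collections import Counter
from typing import Iterable, List


def validate_distribution_rules(dbids: Iterable[int],
                                table_name: str = "segments_to_use") -> List[int]:
    """Hypothetical check: reject a distribution table listing any dbid more than once.

    Returns the dbids in sorted order if they are all unique; raises ValueError otherwise.
    """
    dbid_list = list(dbids)
    # Count how many rows reference each dbid.
    counts = Counter(dbid_list)
    duplicates = sorted(dbid for dbid, n in counts.items() if n > 1)
    if duplicates:
        # Fail loudly instead of silently carrying duplicates into distribution_rules.
        raise ValueError(
            "distribution table {0} contains duplicate dbid values: {1}".format(
                table_name, duplicates))
    return sorted(dbid_list)


if __name__ == "__main__":
    # The dbid values from the segments_to_use table shown in the comment.
    try:
        validate_distribution_rules([2, 2, 2, 2, 2, 3])
    except ValueError as err:
        print(err)

    # A table with exactly one row per segment passes through (sorted).
    print(validate_distribution_rules([3, 2]))
```

With the table from the comment, the check rejects the input and names dbid 2 as the duplicate instead of producing `distribution_rules = {2,2,2,2,2,3}`.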
