fmcquillan99 edited a comment on issue #459: DL: Add support for asymmetric 
segment distribution to preprocessor
URL: https://github.com/apache/madlib/pull/459#issuecomment-558917862
 
 
   It looks like it's almost there. The last thing to fix is that a distribution table with duplicate rows:
   ```
   madlib=# SELECT * FROM segments_to_use ORDER BY hostname, dbid;
    dbid |       hostname        
   ------+-----------------------
       2 | pm-demo-machine-keras
       2 | pm-demo-machine-keras
       2 | pm-demo-machine-keras
       2 | pm-demo-machine-keras
       2 | pm-demo-machine-keras
       3 | pm-demo-machine-keras
   (6 rows)
   ```
   produces
   ```
   madlib=# DROP TABLE IF EXISTS image_data_packed, image_data_packed_summary;
   DROP TABLE
   Time: 22.951 ms
   madlib=# SELECT madlib.training_preprocessor_dl('image_data',         -- Source table
                                                   'image_data_packed',  -- Output table
                                                   'species',            -- Dependent variable
                                                   'rgb',                -- Independent variable
                                                   NULL,                 -- Buffer size
                                                   255,                  -- Normalizing constant
                                                   NULL,
                                                   'segments_to_use'
                                                  );
   -[ RECORD 1 ]------------+-
   training_preprocessor_dl | 
   
   Time: 2355.189 ms
   madlib=# 
   madlib=# 
   madlib=# SELECT * FROM image_data_packed_summary;
   -[ RECORD 1 ]-----------+------------------
   source_table            | image_data
   output_table            | image_data_packed
   dependent_varname       | species
   independent_varname     | rgb
   dependent_vartype       | text
   class_values            | {bird,cat,dog}
   buffer_size             | 26
   normalizing_const       | 255
   num_classes             | 3
   distribution_rules      | {2,2,2,2,2,3}
   __internal_gpu_config__ | {0,0,0,0,0,1}
   ```
   I'd suggest throwing an error if there is a duplicate row in the 
distribution table, rather than passing the duplicates through to 
`distribution_rules` like this.
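
As a rough illustration of the check suggested above, here is a minimal Python sketch. It assumes the rows of the distribution table have already been fetched as `(dbid, hostname)` tuples; the helper names `find_duplicate_segments` and `validate_distribution_table` are hypothetical and are not part of MADlib's actual code.

```python
from collections import Counter


def find_duplicate_segments(rows):
    """Return the sorted (dbid, hostname) pairs that appear more than once.

    `rows` is assumed to be an iterable of (dbid, hostname) tuples, e.g. the
    result of SELECT dbid, hostname FROM segments_to_use.
    """
    counts = Counter((dbid, hostname) for dbid, hostname in rows)
    return sorted(pair for pair, n in counts.items() if n > 1)


def validate_distribution_table(rows):
    """Hypothetical validation step: fail on duplicates, else return dbids."""
    rows = list(rows)
    duplicates = find_duplicate_segments(rows)
    if duplicates:
        raise ValueError(
            "distribution table contains duplicate rows: {0}".format(duplicates)
        )
    return sorted(dbid for dbid, _ in rows)


# The table from the transcript above: dbid 2 is listed five times.
rows = [(2, "pm-demo-machine-keras")] * 5 + [(3, "pm-demo-machine-keras")]

try:
    validate_distribution_table(rows)
except ValueError as exc:
    print(exc)
```

With a check along these lines, the call above would stop with an error instead of writing `{2,2,2,2,2,3}` to the summary table. Whether duplicates should be judged on the whole row or on `dbid` alone is a design choice left open here.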
   
   Besides that
   
   LGTM
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
