fmcquillan99 edited a comment on issue #467: DL: Improve performance of mini-batch preprocessor
URL: https://github.com/apache/madlib/pull/467#issuecomment-573255892
 
 
   Functional and performance tests on a small cluster (1 VM, 2 segments). We do not expect big improvements on a small cluster, but I wanted to compare with 1.16.
   
   
   (1)
   timings for the CIFAR-10 training set (50,000 images)
   
   ```
   DROP TABLE IF EXISTS cifar_10_train_data_packed, cifar_10_train_data_packed_summary;
   
   SELECT madlib.training_preprocessor_dl('cifar_10_train_data',        -- Source table
                                          'cifar_10_train_data_packed', -- Output table
                                          'y',                          -- Dependent variable
                                          'x',                          -- Independent variable
                                          NULL,                         -- Buffer size
                                          256.0                         -- Normalizing constant
                                          );
   ```
   results (buffer size and normalizing constant varied per run)
   ```
   buffer size | normalize           | time (sec)
   100         | 256.0               | 47
   500         | 256.0               | 56
   1000        | 256.0               | 86
   2000        | 256.0               | 152
   5000        | 256.0               | 382
   5000        | 1.0 (no normalize)  | 410
   5000        | NULL (no normalize) | 407
   max         | 256.0               | 1553
   ```
   
   
   (2)
   timings for the CIFAR-10 test set (10,000 images)
   
   ```
   DROP TABLE IF EXISTS cifar_10_test_data_packed, cifar_10_test_data_packed_summary;
   
   SELECT madlib.training_preprocessor_dl('cifar_10_test_data',         -- Source table
                                          'cifar_10_test_data_packed',  -- Output table
                                          'y',                          -- Dependent variable
                                          'x',                          -- Independent variable
                                          NULL,                         -- Buffer size
                                          256.0                         -- Normalizing constant
                                          );
   ```
   results
   ```
   buffer size | normalize | time (sec)
   max         | 256.0     | 118
   ```
   
   (3)
   test distribution rules: 'gpu_segments' (no GPUs are configured on this cluster, so an error is expected)
   
   ```
   DROP TABLE IF EXISTS cifar_10_train_data_packed, cifar_10_train_data_packed_summary;
   
   SELECT madlib.training_preprocessor_dl('cifar_10_train_data',        -- Source table
                                          'cifar_10_train_data_packed', -- Output table
                                          'y',                          -- Dependent variable
                                          'x',                          -- Independent variable
                                          100,                          -- Buffer size
                                          NULL,                         -- Normalizing constant
                                          NULL,                         -- Number of classes
                                          'gpu_segments'                -- Distribution rules
                                          );
   
   ERROR:  plpy.Error: training_preprocessor_dl: No GPUs configured on hosts. (plpython.c:5038)
   CONTEXT:  Traceback (most recent call last):
     PL/Python function "training_preprocessor_dl", line 24, in <module>
       training_preprocessor_obj.training_preprocessor_dl()
     PL/Python function "training_preprocessor_dl", line 820, in training_preprocessor_dl
     PL/Python function "training_preprocessor_dl", line 340, in input_preprocessor_dl
   PL/Python function "training_preprocessor_dl"
   ```
   OK
   
   
   (4)
   test distribution rules: 'all_segments'
   
   ```
   DROP TABLE IF EXISTS cifar_10_train_data_packed, cifar_10_train_data_packed_summary;
   
   SELECT madlib.training_preprocessor_dl('cifar_10_train_data',        -- Source table
                                          'cifar_10_train_data_packed', -- Output table
                                          'y',                          -- Dependent variable
                                          'x',                          -- Independent variable
                                          100,                          -- Buffer size
                                          NULL,                         -- Normalizing constant
                                          NULL,                         -- Number of classes
                                          'all_segments'                -- Distribution rules
                                          );
   ```
   OK
   
   
   (5)
   test distribution rules: custom table listing one segment (dbid 3)
   
   ```
   DROP TABLE IF EXISTS segments_to_use;
   CREATE TABLE segments_to_use(
       dbid INTEGER,
       hostname TEXT
   );
   INSERT INTO segments_to_use VALUES
   (3, 'hostname-01');
   
   DROP TABLE IF EXISTS cifar_10_train_data_packed, cifar_10_train_data_packed_summary;
   
   SELECT madlib.training_preprocessor_dl('cifar_10_train_data',        -- Source table
                                          'cifar_10_train_data_packed', -- Output table
                                          'y',                          -- Dependent variable
                                          'x',                          -- Independent variable
                                          100,                          -- Buffer size
                                          NULL,                         -- Normalizing constant
                                          NULL,                         -- Number of classes
                                          'segments_to_use'             -- Distribution rules
                                          );
   select * from cifar_10_train_data_packed_summary;
   
   -[ RECORD 1 ]-----------+---------------------------
   source_table            | cifar_10_train_data
   output_table            | cifar_10_train_data_packed
   dependent_varname       | y
   independent_varname     | x
   dependent_vartype       | text
   class_values            | {0,1,2,3,4,5,6,7,8,9}
   buffer_size             | 100
   normalizing_const       | 1
   num_classes             | 10
   distribution_rules      | {3}
   __internal_gpu_config__ | {1}
   ```
   OK
   
   
   (6)
   test distribution rules: custom table listing two segments (dbid 2 and 3)
   
   ```
   DROP TABLE IF EXISTS segments_to_use;
   CREATE TABLE segments_to_use(
       dbid INTEGER,
       hostname TEXT
   );
   INSERT INTO segments_to_use VALUES
   (2, 'hostname-01'),
   (3, 'hostname-01');
   
   DROP TABLE IF EXISTS cifar_10_train_data_packed, cifar_10_train_data_packed_summary;
   
   SELECT madlib.training_preprocessor_dl('cifar_10_train_data',        -- Source table
                                          'cifar_10_train_data_packed', -- Output table
                                          'y',                          -- Dependent variable
                                          'x',                          -- Independent variable
                                          100,                          -- Buffer size
                                          NULL,                         -- Normalizing constant
                                          NULL,                         -- Number of classes
                                          'segments_to_use'             -- Distribution rules
                                          );
   select * from cifar_10_train_data_packed_summary;
   
   -[ RECORD 1 ]-----------+---------------------------
   source_table            | cifar_10_train_data
   output_table            | cifar_10_train_data_packed
   dependent_varname       | y
   independent_varname     | x
   dependent_vartype       | text
   class_values            | {0,1,2,3,4,5,6,7,8,9}
   buffer_size             | 100
   normalizing_const       | 1
   num_classes             | 10
   distribution_rules      | {2,3}
   __internal_gpu_config__ | {0,1}
   ```
   OK
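   
   A possible follow-up check for tests (4) to (6), sketched under the assumption of a Greenplum cluster (where `gp_segment_id` is a system column and `gp_segment_configuration` maps `dbid` to `content`). These queries were not part of the runs above, and the alias `num_buffers` is only illustrative:
   
   ```sql
   -- Count packed rows (buffers) stored on each segment.
   SELECT gp_segment_id, count(*) AS num_buffers
   FROM cifar_10_train_data_packed
   GROUP BY gp_segment_id
   ORDER BY gp_segment_id;
   
   -- Map the dbid values listed in distribution_rules to segment content ids.
   SELECT dbid, content, hostname
   FROM gp_segment_configuration
   WHERE role = 'p' AND content >= 0
   ORDER BY dbid;
   ```
   
   If the distribution rules were honored, the first query should return rows only for the segments whose `dbid` appears in `distribution_rules`.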
   
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
