[GitHub] [madlib] khannaekta commented on a change in pull request #439: DL: Add support in preprocessor to evenly distribute data for GPDB

GitBox Mon, 09 Sep 2019 14:00:52 -0700

khannaekta commented on a change in pull request #439: DL: Add support in 
preprocessor to evenly distribute data for GPDB
URL: https://github.com/apache/madlib/pull/439#discussion_r322452392


 ##########
 File path: src/ports/postgres/modules/deep_learning/test/input_data_preprocessor.sql_in
 ##########
 @@ -88,6 +89,23 @@ SELECT training_preprocessor_dl(
   'label',
   'x');
 
+-- Test data is evenly distributed across all segments (GPDB only)
+m4_changequote(`<!', `!>')
+m4_ifdef(<!__POSTGRESQL__!>, <!!>, <!
+DROP TABLE IF EXISTS data_preprocessor_input_batch, data_preprocessor_input_batch_summary;
+SELECT training_preprocessor_dl(
+  'data_preprocessor_input',
+  'data_preprocessor_input_batch',
+  'id',
+  'x',
+  1);
+
+SELECT assert(count(*)=(SELECT ceil(17.0/count(*)) from gp_segment_configuration WHERE role = 'p' and content != -1), 'Even distribution of buffers failed.')
+FROM data_preprocessor_input_batch
+WHERE gp_segment_id = 0
+GROUP BY gp_segment_id;
 
 Review comment:
   Good catch! Initially we were planning to count images on all the segments,
but now we only assert on a single segment, so the `GROUP BY` no longer makes
sense. Will update the test query to remove it.
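
   For illustration, a sketch of how the assertion might read once the
`GROUP BY` is dropped. This assumes nothing else in the query changes and that
it stays inside the existing `m4_ifdef` guard for GPDB; the query that
actually lands in the PR may differ.

```sql
-- Sketch only: the same assertion as in the hunk above, minus GROUP BY.
-- 17.0 is the row-count literal from the original test; the subquery counts
-- primary segments (role = 'p'), excluding the master (content = -1).
SELECT assert(
    count(*) = (SELECT ceil(17.0 / count(*))
                FROM gp_segment_configuration
                WHERE role = 'p' AND content != -1),
    'Even distribution of buffers failed.')
FROM data_preprocessor_input_batch
WHERE gp_segment_id = 0;
```

   One side effect of the change: without `GROUP BY`, the aggregate returns a
row even when segment 0 holds no buffers, so `count(*)` is 0 and the assert
fails. With `GROUP BY`, an empty segment 0 produces no rows and the assert is
never evaluated.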

