reductionista commented on a change in pull request #524:
URL: https://github.com/apache/madlib/pull/524#discussion_r586728227



##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_wrapper.py_in
##########
@@ -287,27 +298,50 @@ def parse_optimizer(compile_dict):
 
 
 # Parse the fit parameters into a dictionary.
-def parse_and_validate_fit_params(fit_param_str):
+def parse_and_validate_fit_params(fit_param_str, current_seg_id=0):
 
     if fit_param_str:
-        fit_params_dict = convert_string_of_args_to_dict(fit_param_str)
-
-        literal_eval_fit_params = ['batch_size','epochs','verbose',
+        fit_params_dict = convert_string_of_args_to_dict(fit_param_str, 
strip_quotes=False)
+        literal_eval_fit_params = ['batch_size','epochs','verbose', 'shuffle',

Review comment:
       We don't, but it makes the code less error prone and more easily 
maintainable.
   
   I think the question you're getting at, which I was very confused about when 
I first looked at this code, is why were we handling it as a special case, 
rather than just running literal eval on it like we do for any other param?
   
   Previously we were adding quotes around the value if they weren't there 
already, and then removing them later if the result is equal to the strings 
'True' or 'False', with this awkward workaround:
   ```
   if 'shuffle' in fit_params_dict:
               shuffle_value = fit_params_dict['shuffle']
               if shuffle_value == 'True' or shuffle_value == 'False':
                   fit_params_dict['shuffle'] = bool(shuffle_value)
   ```
   But there is no reason to accept 'True' or 'False' as a param, as that's not 
valid python code for passing a boolean anyway.  So instead of changing True to 
'True' then back to True again, we just leave it as is.  If they pass in 
shuffle='True' or shuffle="True" that should generate an error, just as it does 
if you passed that as a fit param to keras outside of MADlib--it's not a valid 
parameter value.
   
   We do have to handle the callbacks fit param as a special case, since we 
need to block any callbacks besides Tensorboard (at least for now).
   
   So why were we doing a whole extra song and dance for shuffle as well?  The 
answer is not obvious, but I eventually found that it has to do with older 
versions of keras allowing an alternate syntax for passing optimizer classes to 
compile params.
   
   In recent versions of keras, the optimizer classes are required to be passed 
in the normal pythonic way, as class variables.  But older versions also 
allowed passing the name of the class as a string... which keras then converts 
into the actual class before using.
   
   How does that affect fit params?  It doesn't, except for the fact that we 
re-purposed a function that processes the compile params, which was designed in 
an overly-general way to allow the value of any parameter to either have quotes 
around it or not.
   
   If my understanding is correct, the only reason this was necessary, even for 
processing compile params, is to handle this special case of how optimizer 
classes can be passed in older versions of keras.   It's not necessary at all 
for the fit params, since none of them can be passed in the alternate way, in 
any version of keras.  We were allowing some extra weird possibilities (like 
passing shuffle='True' instead of shuffle=True) that keras doesn't normally 
allow, just so that we didn't have to handle fit params differently from 
compile params, or most compile params differently from optimizer params.
   
   Hopefully eventually, we will be able to stop adding quotes to any of the 
params passed in, and just use literal_eval on everything.  But for now, it 
seems like maintaining backwards compatibility for the optimizer params is a 
good idea... so we still add quotes for all compile params, just not fit params 
now.  This removes the need to handle shuffle as a special case.  As a next 
step perhaps, we can do the same to either most or all of the compile params, 
but I think it may require re-writing some tests.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to