[GitHub] [madlib] khannaekta opened a new pull request #516: DL: Implement caching for fit_multiple_model

GitBox Thu, 10 Sep 2020 16:18:19 -0700


khannaekta opened a new pull request #516:
URL: https://github.com/apache/madlib/pull/516



   Currently, passing the independent and dependent vars to the
   transition function accounts for most of the time.
   As part of this PR, a new fit_multipl_transition function is called that
   reads all the rows (for each seg) into the cache (SD) on the very first
   hop. For each subsequent hop/iteration, the data is read from the
   cache instead of the table, and the cache is cleared at the final
   training call. This reduces the time spent passing the data to the
   transition function.
   Since the data is cached in memory, the memory usage per segment
   increases significantly. For this reason caching is opt-in: a new
   optional param `use_caching` is added to
   madlib_keras_fit_multiple_model(), which should be set to TRUE only if
   the memory on each segment is sufficient for the following
   calculation:
   
      IND_SZ (indep var size of each row) = ((image_dimension)*4)*(#of images per buffer)
      DEP_SZ (dep var size of each row) = (#DEP_VAR * 4)*(#of images per buffer)
      memory_data = (#seg_per_host) * (#rows_per_seg * IND_SZ) + (#seg_per_host) * (#rows_per_seg * DEP_SZ)
      memory_model = model_size * #models_per_seg * #seg_per_host
      total_memory = memory_data + memory_model
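
As a rough worked example of the calculation above, the formulas can be written as a small Python helper. This is only a sketch: the function name and parameter names are hypothetical, `image_dimension` is assumed to be the number of values per image (for example 32*32*3), the factor of 4 is assumed to mean 4 bytes per value, and `model_size` is assumed to be in bytes.

```python
def estimate_cache_memory_bytes(image_dimension, num_dep_var, images_per_buffer,
                                rows_per_seg, seg_per_host,
                                model_size, models_per_seg):
    """Hypothetical helper mirroring the formulas in this description."""
    # Size of the independent / dependent variable in one buffered row
    # (assumes 4 bytes per value).
    ind_sz = image_dimension * 4 * images_per_buffer
    dep_sz = num_dep_var * 4 * images_per_buffer

    # Cached data held by all segments on one host.
    memory_data = (seg_per_host * (rows_per_seg * ind_sz)
                   + seg_per_host * (rows_per_seg * dep_sz))

    # Models held by all segments on one host.
    memory_model = model_size * models_per_seg * seg_per_host

    return memory_data + memory_model


# Illustrative (made-up) numbers: 32x32x3 images, 10 classes,
# 100 images per buffer, 50 buffered rows per segment, 4 segments per host,
# 2 models per segment of 1,000,000 bytes each.
total = estimate_cache_memory_bytes(
    image_dimension=32 * 32 * 3, num_dep_var=10, images_per_buffer=100,
    rows_per_seg=50, seg_per_host=4,
    model_size=1_000_000, models_per_seg=2)
print(total)  # 254560000 bytes, roughly 255 MB
```

With these made-up inputs the cached data dominates (about 247 MB against 8 MB for the models), which is why the data term is the one to check before enabling `use_caching`.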
   
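
The caching behaviour described at the top of this description (fill the cache on the first hop, reuse it on later hops, clear it at the final training call) can be sketched in plain Python. This is not the code in this PR: `SD` here is an ordinary dict standing in for the PL/Python session dictionary, and `fit_hop_with_cache`, its flags, and `train_fn` are hypothetical names used only for illustration.

```python
# Stand-in for the per-session dictionary; an assumption for this sketch.
SD = {}


def fit_hop_with_cache(rows, is_first_hop, is_final_call, train_fn):
    """Hypothetical per-segment hop.

    rows          -- iterable of (indep, dep) buffers; only read on the first hop
    is_first_hop  -- True when the data still has to be read from the table
    is_final_call -- True on the last training call, when the cache is dropped
    train_fn      -- callable that trains on a list of buffers
    """
    if is_first_hop:
        # Read every row for this segment once and keep it in memory.
        SD["cache"] = list(rows)

    # Every hop, including the first, trains from the cached copy.
    result = train_fn(SD["cache"])

    if is_final_call:
        # Release the memory once training is finished.
        SD.pop("cache", None)

    return result


# Usage: the table is only scanned for the first hop.
buffers = [("x1", "y1"), ("x2", "y2")]
fit_hop_with_cache(iter(buffers), True, False, len)
fit_hop_with_cache(None, False, False, len)
fit_hop_with_cache(None, False, True, len)
```

The trade-off is the one stated above: later hops skip the table read entirely, at the cost of holding every buffered row in segment memory until the final call.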
   
   - [ ] Add the module name, JIRA# to PR/commit and description.
   - [ ] Add tests for the change. 
   
   


