Thanks for the writeup Anand, this looks like a good approach.

> There is a *WIP
> <https://docs.google.com/document/d/12j4bDwsIBhMN_8DNT2KGXPol7YS_G-DZFy6fjRsUGOQ/edit#bookmark=id.wdsu0jkyygmh>*
> section in the doc, where I am figuring out a better solution. I would love
> to hear any suggestions or alternatives for that section.
Just wanted to boost this in case there are people who don't click through to the doc. The problem to solve is how to load a new model without disrupting the in-progress threads that are performing inference on the old model (since loading a model can take minutes and consume a lot of memory). Anand's current proposal is to load the second model into memory and require machines to have enough memory to hold two models at once. If anyone has tried loading multiple large objects into a single process before, some insight on best practices would be helpful!

Thanks,
Danny

On Mon, Nov 21, 2022 at 4:26 PM Anand Inguva via dev <dev@beam.apache.org> wrote:

> Hi,
>
> I created a doc
> <https://docs.google.com/document/d/12j4bDwsIBhMN_8DNT2KGXPol7YS_G-DZFy6fjRsUGOQ/edit?usp=sharing>[1]
> on a feature that I am working on for the RunInference
> <https://github.com/apache/beam/blob/814a5ded8c493d55edeaf350c808c131289165e8/sdks/python/apache_beam/ml/inference/base.py#L269>
> transform, where users can provide dynamic model updates via side inputs to
> the RunInference transform.
>
> There is a *WIP
> <https://docs.google.com/document/d/12j4bDwsIBhMN_8DNT2KGXPol7YS_G-DZFy6fjRsUGOQ/edit#bookmark=id.wdsu0jkyygmh>*
> section in the doc, where I am figuring out a better solution. I would love
> to hear any suggestions or alternatives for that section.
>
> Please go through the doc and let me know what you think.
>
> Thanks,
> Anand
>
> [1]
> https://docs.google.com/document/d/12j4bDwsIBhMN_8DNT2KGXPol7YS_G-DZFy6fjRsUGOQ/edit?usp=sharing
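For anyone skimming: the two-models-in-memory proposal is essentially a double-buffered swap. Below is a minimal sketch of that pattern in plain Python. All names here (`ModelHolder`, `loader`, `update`) are hypothetical illustrations, not the actual Beam RunInference implementation — the point is just that the new model is loaded fully off to the side while inference continues on the old one, and the switch itself is a cheap reference swap under a lock. The memory cost is that both models coexist during the load.

```python
import threading


class ModelHolder:
    """Double-buffered model holder (hypothetical sketch, not Beam code).

    In-flight inference keeps using the model reference it already grabbed;
    a concurrent update loads the new model separately and then swaps the
    shared reference atomically. Peak memory is two models during the swap.
    """

    def __init__(self, loader, initial_version):
        self._loader = loader  # callable: version -> loaded model object
        self._lock = threading.Lock()
        self._model = loader(initial_version)
        self._version = initial_version

    def get(self):
        # Callers take a reference under the lock; even if a swap happens
        # right afterwards, they keep inferring on the object they hold.
        with self._lock:
            return self._model

    def update(self, new_version):
        if new_version == self._version:
            return
        # Slow part happens outside the lock: both models are alive here.
        new_model = self._loader(new_version)
        # The swap itself is just a reference assignment.
        with self._lock:
            self._model = new_model
            self._version = new_version
        # The old model is freed once the last in-flight caller drops it.


holder = ModelHolder(loader=lambda v: f"model-{v}", initial_version="v1")
in_flight = holder.get()   # a thread mid-inference holds the old model
holder.update("v2")        # swap happens concurrently
# in_flight is still "model-v1"; holder.get() now returns "model-v2"
```

One nicety of this shape is that no lock is held during the expensive load, so inference throughput is unaffected until the instantaneous swap.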