I did not know that about StringTableSize. I thought it was more of a hard limit. That's good to know. Thanks.
On Wed, Feb 8, 2017 at 2:16 PM, Joern Kottmann <[email protected]> wrote:

> The StringTableSize doesn't limit the number of Strings that can be stored
> in the pool; if the size is too small, lookups just get slower.
> This would only be done for loading models; querying the model wouldn't be
> affected. The predicate / feature strings would be interned.
>
> Jörn
>
> On Wed, Feb 8, 2017 at 6:37 PM, Jeffrey Zemerick <[email protected]> wrote:
>
> > Would it be possible to have an option or setting somewhere that
> > determines if string pooling is used? The option would provide backward
> > compatibility in case someone has to adjust the -XX:StringTableSize
> > because their existing models exceed the default JVM limit, and an
> > option would also be useful for cases when the models were made from
> > different data sources. (I'm assuming that in that case string pooling
> > would be detrimental to performance.)
> >
> > Jeff
> >
> > On Wed, Feb 8, 2017 at 5:50 AM, Joern Kottmann <[email protected]> wrote:
> >
> > > Hello all,
> > >
> > > I often run multiple models in production, often trained on the same
> > > data but with different types (a typical name finder scenario). There
> > > could be one model to detect person names and another to detect
> > > locations. The predicate Strings inside those models are always the
> > > same, but the models can't share the same String instances.
> > >
> > > I would like to propose that we use String.intern in the model reader
> > > to ensure each string is only loaded once.
> > >
> > > We tried that in the past and it caused lots of issues with PermGen
> > > space, but this was improved over time in Java. In Java 8 (on which we
> > > now depend) this should work properly.
> > >
> > > Here is an interesting article about it:
> > > http://java-performance.info/string-intern-in-java-6-7-8/
> > >
> > > Using String.intern will make model loading a bit slower (we can
> > > benchmark that).
> > >
> > > Jörn
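[For readers following along: a minimal sketch of the effect the proposal relies on. This is not OpenNLP reader code; the class name and the `"w=london"` predicate string are made up for illustration. It only shows that `String.intern` maps equal strings loaded from different sources onto one shared instance, which is what would let two models share their predicate Strings.]

```java
public class InternDemo {
    public static void main(String[] args) {
        // Simulate the same predicate string read twice, e.g. once per model
        // file. new String(...) forces two distinct heap instances, just as
        // two independent model loads would produce.
        String fromModelA = new String("w=london");
        String fromModelB = new String("w=london");

        // Distinct instances, even though the contents are equal:
        System.out.println(fromModelA == fromModelB);                   // false

        // intern() returns the canonical copy from the JVM string pool,
        // so both models would end up referencing one shared instance:
        System.out.println(fromModelA.intern() == fromModelB.intern()); // true
    }
}
```

With interning applied in the model reader, the duplicate copies become garbage-collectable and only the pooled instance stays live, which is where the memory saving for same-data models would come from.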
