[ https://issues.apache.org/jira/browse/SOLR-9579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15537281#comment-15537281 ]
John Call commented on SOLR-9579:
---------------------------------

My reasoning for lazily creating the FieldType is that in the following three scenarios the objects simply take up memory without being used:

1) Systems where the implementations of SchemaField override createField and thus don't use this object. For example, TrieField, PointType, and EnumField will not use this object at all. The main use I see is for text fields.
2) This flow is only used on the ingestion path. For systems where queries are the main use, dedicating any extra memory per field seems unnecessary.
3) For high-ingestion systems with thousands of schema fields, some of them sparsely used, creating them all upfront could have a slight performance impact on startup.

Additionally, creating it lazily should still be faster than the current code.

In regard to benchmarking, any suggestions would be appreciated. I'm not sure if there is any standardization on which schema or data set to use (I believe I have seen others discussing using the GettingStarted, but I've never looked at how much data that contains).

I understand that the memory impact of creating the object in the constructor is on the order of KB for most systems, so I can easily make that change if there is consensus around it.

> Reuse lucene FieldType in createField flow during ingestion
> -----------------------------------------------------------
>
>                 Key: SOLR-9579
>                 URL: https://issues.apache.org/jira/browse/SOLR-9579
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>    Affects Versions: 6.x, master (7.0)
>         Environment: This has been primarily tested on Windows 8 and Windows
> Server 2012 R2
>            Reporter: John Call
>            Priority: Minor
>              Labels: gc, memory, reuse
>             Fix For: 6.x, master (7.0)
>
>         Attachments: SOLR-9579.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> During ingestion, createField in FieldType is called for each field on
> each document. For the subclasses of FieldType without their own
> implementation of createField, the lucene version of FieldType is created to
> be stored along with the value. However, the lucene FieldType object is
> identical when created from the same SchemaField. In testing ingestion of one
> million rows with 22 fields each, we were creating 22 million lucene FieldType
> objects when only 22 are needed. Solr should lazily initialize a lucene
> FieldType for each SchemaField and reuse it for future ingestion. Not only
> does this reduce memory usage, it also relieves significant pressure on the
> gc.
> There are also subclasses of Solr FieldType which create a separate Lucene
> FieldType for stored fields instead of reusing the static in StoredField.
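
For illustration only, the lazy-initialize-and-reuse idea described in the issue could be sketched roughly as below. This is not the attached patch and does not use the real Solr or Lucene classes: LuceneTypeSketch, SchemaFieldSketch, LazyFieldTypeDemo, and typesCreated are hypothetical stand-ins, and the instance counter exists only to make the object count visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the lucene FieldType; NOT the real class.
final class LuceneTypeSketch {
    // Counts how many instances were ever constructed (for the demo only).
    static final AtomicInteger CREATED = new AtomicInteger();
    final boolean indexed;
    final boolean stored;

    LuceneTypeSketch(boolean indexed, boolean stored) {
        this.indexed = indexed;
        this.stored = stored;
        CREATED.incrementAndGet();
    }
}

// Hypothetical stand-in for a Solr SchemaField that caches its lucene type.
final class SchemaFieldSketch {
    final String name;
    final boolean indexed;
    final boolean stored;
    // Created on first use, then reused for every later document.
    private volatile LuceneTypeSketch cachedType;

    SchemaFieldSketch(String name, boolean indexed, boolean stored) {
        this.name = name;
        this.indexed = indexed;
        this.stored = stored;
    }

    // Lazy initialization with double-checked locking, so fields that are
    // never ingested never allocate a type object.
    LuceneTypeSketch luceneType() {
        LuceneTypeSketch t = cachedType;
        if (t == null) {
            synchronized (this) {
                t = cachedType;
                if (t == null) {
                    t = new LuceneTypeSketch(indexed, stored);
                    cachedType = t;
                }
            }
        }
        return t;
    }
}

final class LazyFieldTypeDemo {
    // Simulates ingesting docCount documents with fieldCount fields each and
    // returns how many type objects were allocated along the way.
    static int typesCreated(int fieldCount, int docCount) {
        List<SchemaFieldSketch> schema = new ArrayList<>();
        for (int i = 0; i < fieldCount; i++) {
            schema.add(new SchemaFieldSketch("f" + i, true, true));
        }
        int before = LuceneTypeSketch.CREATED.get();
        for (int d = 0; d < docCount; d++) {
            for (SchemaFieldSketch f : schema) {
                f.luceneType(); // a real createField would pass this to the field
            }
        }
        return LuceneTypeSketch.CREATED.get() - before;
    }

    public static void main(String[] args) {
        System.out.println(typesCreated(22, 1000)); // prints 22
    }
}
```

With 22 fields, the count stays at 22 no matter how many documents are ingested, which mirrors the 22-versus-22-million observation in the description. Creating the object eagerly in the constructor instead would remove the synchronization at the cost of allocating a type for fields that are never ingested.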