I'd like to post some documentation to help other people who are trying to deal with thread-safety and lifetime issues in analysis components.
Here is what I think I know; based on corrections here, I'll post something.

Each Solr core has a schema. By default, Solr creates a schema when it creates a core. If, however, shared schemas are enabled, Solr maintains a map from schema names to schemas, and cores that declare the same schema (via the name attribute in the schema XML file) share a single schema object.

The schema declares a set of field types. Each field type is represented by an object of some class that inherits from org.apache.solr.schema.FieldType. This class optionally stores two analyzers: the 'analyzer' for indexing and the 'queryAnalyzer' for queries. If a field type is declared with an <analyzer> element that has no class attribute, Solr creates an analyzer of type org.apache.solr.analysis.TokenizerChain. These objects store a TokenizerFactory, a list of TokenFilterFactories, and a list of CharFilterFactories. On request, they deliver either a java.io.Reader built from the char filters or a TokenStreamComponents object containing a new tokenizer and filter set.

Solr typically runs in a multi-threaded servlet container, so each Solr request runs in the container thread that handled the HTTP request. For an update request, DocInverterPerField calls Field.tokenStream to get a new token stream, and it calls close() on that token stream when it is done (cf. LUCENE-2145, which notes that this only closes the internal reader). So there is a new set of analysis components for each field of each request. For a query, the analysis components are, not too surprisingly, created by the query parser, since it is the query parser that must split any relevant strings into their constituent elements.

To summarize, then, here is the typical situation. The core has a schema, which lives for the length of the core or, in the shared case, the core container. The schema has field types, and each field type has two analyzers. All of this, so far, has the lifetime of the schema.
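To make the shared-schema case concrete, here is a minimal Java sketch of a name-to-schema map, keyed by schema name as described above. SchemaModel and SharedSchemaCache are hypothetical stand-ins, not Solr classes, and the actual cache key Solr uses may well differ from a bare schema name.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a parsed schema; NOT Solr's IndexSchema class.
final class SchemaModel {
    final String name;
    SchemaModel(String name) { this.name = name; }
}

// Hypothetical sketch of the shared-schema case: one map from schema name to
// schema object, owned by the core container, so it lives as long as the container.
final class SharedSchemaCache {
    private final Map<String, SchemaModel> byName = new ConcurrentHashMap<>();

    // Cores that declare the same schema name get the same object back.
    // computeIfAbsent is atomic on ConcurrentHashMap, so two cores loading
    // concurrently still end up sharing a single instance.
    SchemaModel schemaFor(String schemaName) {
        return byName.computeIfAbsent(schemaName, SchemaModel::new);
    }
}

public class SharedSchemaDemo {
    public static void main(String[] args) {
        SharedSchemaCache container = new SharedSchemaCache();
        SchemaModel core1 = container.schemaFor("example");
        SchemaModel core2 = container.schemaFor("example");
        SchemaModel core3 = container.schemaFor("other");
        System.out.println(core1 == core2); // true: same schema name, shared object
        System.out.println(core1 == core3); // false: different schema name
    }
}
```

The point is only that a shared schema, and everything hanging off it (field types, analyzers), is reachable from several cores at once, so it has to be safe to use from many threads.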
At update time, the analyzer is called upon to create tokenization components with the lifetime of processing a single document. At query time, the query analyzer is called upon to create tokenization components with the lifetime of processing one field of the query.
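Here is a rough Java model of the lifetimes just summarized, under simplifying assumptions: the chain object holds only immutable factories and is shared by all request threads (schema lifetime), while each analyze() call builds a fresh, stateful tokenizer that lives only as long as one document or one query field. ChainModel, TokenizerModel, and analyze() are hypothetical names, not the Lucene/Solr API, and the model ignores any reuse of components that the real Analyzer may do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a tokenizer: it is stateful (it has a cursor), so an
// instance must belong to one thread and one document or query field.
final class TokenizerModel {
    private final String[] parts;
    private int pos = 0;
    TokenizerModel(String text) { this.parts = text.trim().split("\\s+"); }
    String next() { return pos < parts.length ? parts[pos++] : null; }
}

// Hypothetical stand-in for a TokenizerChain: it holds only immutable factories,
// lives as long as the schema, and is safe to share between request threads.
final class ChainModel {
    private final Function<String, TokenizerModel> tokenizerFactory;
    private final List<UnaryOperator<String>> filters;

    ChainModel(Function<String, TokenizerModel> tokenizerFactory,
               List<UnaryOperator<String>> filters) {
        this.tokenizerFactory = tokenizerFactory;
        this.filters = List.copyOf(filters); // immutable copy
    }

    // Each call builds fresh components whose lifetime is one document
    // (update time) or one query field (query time).
    List<String> analyze(String text) {
        TokenizerModel tokenizer = tokenizerFactory.apply(text); // new per call
        List<String> tokens = new ArrayList<>();
        for (String raw = tokenizer.next(); raw != null; raw = tokenizer.next()) {
            String term = raw;
            for (UnaryOperator<String> filter : filters) {
                term = filter.apply(term);
            }
            tokens.add(term);
        }
        return tokens;
    }
}

public class LifetimeDemo {
    public static void main(String[] args) throws Exception {
        // Schema lifetime: built once, shared by every request thread.
        ChainModel chain = new ChainModel(TokenizerModel::new,
                List.of(s -> s.toLowerCase(Locale.ROOT)));

        // Request lifetime: each pool thread plays a container thread
        // analyzing its own document with its own tokenizer.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Future<List<String>> a = pool.submit(() -> chain.analyze("Hello Solr World"));
        Future<List<String>> b = pool.submit(() -> chain.analyze("Thread Safe Analysis"));
        System.out.println(a.get()); // [hello, solr, world]
        System.out.println(b.get()); // [thread, safe, analysis]
        pool.shutdown();
    }
}
```

The design point is that only the immutable factories are shared; anything with per-input state is created on the thread that consumes it and discarded afterwards.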