Hi Greg,

Your understanding is correct, and I agree that this limits managed schema functionality.
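Here is a toy Java model of the race you describe. It is a hypothetical sketch, not Solr's actual ZkIndexSchemaReader code. The class SchemaRaceSketch, its methods, and the field name newField are all made up. The schema is reduced to a set of field names, and a latch stands in for download/parse time (much like the sleep you added), so the window is reproduced deterministically.

```java
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model, not Solr's actual classes: one replica's view of a
// managed schema, reduced to a set of field names.
public class SchemaRaceSketch {
    // Live schema; the watcher thread replaces the whole reference after parsing.
    private final AtomicReference<Set<String>> liveSchema;

    public SchemaRaceSketch(Set<String> initialFields) {
        this.liveSchema = new AtomicReference<>(initialFields);
    }

    // Invoked by the (simulated) ZK watch handler once download + parse are done.
    public void swapSchema(Set<String> parsedFields) {
        liveSchema.set(parsedFields);
    }

    // A request binds to whatever schema snapshot is live when it arrives.
    public int indexDocument(String field) {
        Set<String> snapshot = liveSchema.get();
        return snapshot.contains(field) ? 200 : 400; // 400: undefined field
    }

    public static void main(String[] args) throws InterruptedException {
        SchemaRaceSketch server2 = new SchemaRaceSketch(Set.of("id"));
        // Keeping this latch closed stands in for download/parse time.
        CountDownLatch parseMayFinish = new CountDownLatch(1);
        CountDownLatch swapped = new CountDownLatch(1);

        // server1 has written the new schema to ZK; server2's watch has fired.
        Thread watcher = new Thread(() -> {
            try {
                parseMayFinish.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            server2.swapSchema(Set.of("id", "newField"));
            swapped.countDown();
        });
        watcher.start();

        // A document using the new field arrives inside the window.
        System.out.println("during parse: " + server2.indexDocument("newField"));

        parseMayFinish.countDown(); // parsing completes
        swapped.await();            // wait until the new schema is live
        System.out.println("after swap:   " + server2.indexDocument("newField"));
        watcher.join();
    }
}
```

Run as-is, it prints 400 for the document that arrives while parsing is still in progress and 200 once the swap has happened.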
Under SolrCloud, all Solr nodes participating in a collection bound to a configset with a managed schema keep a watch on the corresponding schema ZK node. In my testing (on my laptop), when the managed schema is written to ZK, the other nodes are notified very quickly (single-digit milliseconds) and immediately download and start parsing the schema. Incoming requests are bound to a snapshot of the live schema at the time they arrive, so during the window between the initial posting to ZK and the swap to the newly parsed schema, a node can still process requests against the old schema. Different loads on each participating node, and/or different network latency between ZK and each node, can result in varying delays before all nodes are in sync.

For Schema API users, waiting a couple of seconds after adding fields before using them should work around this problem. While not ideal, I think schema field additions are rare enough in the Solr collection lifecycle that this is not a huge problem.

For schemaless users, the picture is worse, as you noted: immediate distribution of documents that trigger schema field addition could easily prove problematic. Maybe we need a schema update blocking mode, where, after the ZK schema node watch is triggered, all new request processing is halted until the schema has finished downloading, parsing, and swapping out? Can you open an issue, Greg? (Such a mode should help Schema API users too.)

Thanks,
Steve

On Jun 3, 2014, at 8:06 PM, Gregory Chanan <gcha...@cloudera.com> wrote:

> I'm trying to determine if the Managed Schema functionality works with
> SolrCloud, and AFAICT the integration seems pretty limited.
>
> The issue I'm running into is variants of the issue that schema changes are
> not pushed to all shards/replicas synchronously.
> So, for example, I can make
> the following two requests:
> 1) add a field to the collection on server1 using the Schema API
> 2) add a document with the new field, the document is routed to a core on
> server2
>
> Then, there appears to be a race between when the document is processed by
> the core on server2 and when the core on server2, via the
> ZkIndexSchemaReader, gets the new schema. If the document is processed
> first, I get a 400 error because the field doesn't exist. This is easily
> reproducible by adding a sleep to the ZkIndexSchemaReader's processing.
>
> I hit a similar issue with Schemaless: the distributed request handler sends
> out the document updates, but there is no guarantee that the other
> shards/replicas see the schema changes made by the update.chain.
>
> Is my understanding correct? Is this expected?
>
> Thanks,
> Greg

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org