There was a question on the users' list today about making lazily-loaded (aka transient) cores work with SolrCloud, where I basically punted and said "not designed with that in mind". I've kind of avoided thinking about this use case; the transient code wasn't written with SolrCloud in mind.
But what is the general reaction to that pairing? Mostly I'm looking for feedback at the level of "no way that could work without invasive changes to SolrCloud, don't even go there" or "sure, just allow ZK to get a list of all cores and it'll be fine, the user is responsible for the quirks though".

Some questions that come to my mind:

> Would a core that's not loaded be considered "live" by ZK? Would simply returning a list of all cores (both loaded and not loaded) be sufficient for ZK? (This list is already available so the admin UI can list all cores.)
> Does SolrCloud distributed update processing go through (or could it be made to go through) the path that autoloads a core?
> Ditto for querying. I suspect the answer to both is that it'll "just happen".
> Would the idea of waiting for all the cores to load on all the nodes for an update be totally unacceptable? We already have the distributed deadlock potential, and this seems to make that more likely by lengthening the time the semaphore in question is held.
> Would re-synching/leader election be an absolute nightmare? I can imagine that if all the cores for a particular shard weren't loaded at startup, there'd be a terrible time waiting for leader election, for instance.
> Stuff I haven't thought of.

Mostly I'm trying to get a "sense of the community" here about whether supporting transient cores in SolrCloud mode would be something that would be easy/do-able/really_hard/totally_unacceptable.

Thanks,
Erick