On 11/22/2014 10:41 AM, Alexandre Rafalovitch wrote:
> I can't find a relevant Jira/discussion space if this exists. I
> strongly feel that the "basic" example is still far from basic and
> there needs to be a subgroup of people discussing of what can be
> cut-off to demonstrate a true minimal configuration.
>
> I am happy to take a lead on that if nobody is doing it, but I would
> like to do it as part of a group that can focus on _deleting_
> explanations, defaults, and near-identical definitions.
>
> My hope would be to have a solrconfig and schema that are under 30
> lines each not counting license. As a too-extreme example, I can offer
> my earlier attempts to under-15-lines configuration:
> https://github.com/arafalov/simplest-solr-config/tree/master/simplest-solr/collection1/conf
>
> I think such an example schema would go hand-in-hand with writing a
> tutorial and would assist in telling an interesting story in the
> "simplest terms" possible.

The general goal you've outlined sounds really good to me.

The only criticism I have (and I hope it's constructive) is that your
small schema/solrconfig files are basically hiding EVERYTHING. I agree
that the examples we currently have count as information overload, but
stripping it too far might represent another problem.

In particular, the lack of an analyzed textField type makes simple
keyword search impossible with that example -- and I believe that
keyword search is one of the primary reasons that a new user will look
into Solr. The text_general type in the full example seems reasonable
... there's some complexity, but it's not SUPER complicated. One
numeric type/field might be a good thing to add as well, and perhaps
_version_ too.

The solrconfig should have the transaction log turned on. The default
directory factory is the NRT version, which in some circumstances can
hold onto index data only in RAM, which the transaction log protects
against. Having the transaction log turned on means that autoCommit
with openSearcher=false should be configured as well. While these may
not be strictly required for a proof of concept or demo system, these
are part of what I believe are best practices, which we should
encourage in ALL our examples. A very simple and short config snippet:

  <!-- the default high-performance update handler -->
  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxDocs>25000</maxDocs>
      <maxTime>300000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <updateLog />
  </updateHandler>

With both of these ideas added, the size of the schema would still be
in the ballpark of 30 lines, and the solrconfig would be a lot less.
There may be other best practices that need to be considered, which
might push things beyond the 30 line goal you have mentioned.

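To make the schema side concrete, something along these lines might be
a starting point. This is an untested sketch and not taken from any
shipped example: the field names "id", "text", and "count" are made up
for illustration, the text_general definition is deliberately cut down
from the one in the full example (no stopwords or synonyms), and the
Trie numeric classes assume a Solr 4.x schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical minimal schema: one analyzed text field, one numeric
     field, and _version_ (needed once the update log is enabled). -->
<schema name="minimal-sketch" version="1.5">
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <field name="_version_" type="long" indexed="true" stored="true"/>
  <field name="text" type="text_general" indexed="true" stored="true"/>
  <field name="count" type="int" indexed="true" stored="true"/>

  <uniqueKey>id</uniqueKey>

  <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
  <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
  <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>

  <!-- Trimmed-down analyzed type: tokenize and lowercase only -->
  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
</schema>
```

Even with the analyzed type, the numeric field, and _version_
included, a sketch like this stays comfortably under the 30 line mark.
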
Thanks,
Shawn