On 11/22/2014 10:41 AM, Alexandre Rafalovitch wrote:
> I can't find a relevant Jira/discussion space if this exists. I
> strongly feel that the "basic" example is still far from basic and
> there needs to be a subgroup of people discussing what can be
> cut to demonstrate a true minimal configuration.
> 
> I am happy to take a lead on that if nobody is doing it, but I would
> like to do it as part of a group that can focus on _deleting_
> explanations, defaults, and near-identical definitions.
> 
> My hope would be to have a solrconfig and schema that are under 30
> lines each not counting license. As a too-extreme example, I can offer
> my earlier attempt at an under-15-line configuration:
> https://github.com/arafalov/simplest-solr-config/tree/master/simplest-solr/collection1/conf
> 
> I think such an example schema would go hand-in-hand with writing a
> tutorial and would assist in telling an interesting story in the
> "simplest terms" possible.

The general goal you've outlined sounds really good to me.  The only
criticism I have (and I hope it's constructive) is that your small
schema/solrconfig files are basically hiding EVERYTHING.  I agree that
the examples we currently have count as information overload, but
stripping it too far might represent another problem.

In particular, the lack of an analyzed TextField type makes simple
keyword search impossible with that example -- and I believe that
keyword search is one of the primary reasons that a new user will look
into Solr.  The text_general type in the full example seems reasonable
... there's some complexity, but it's not SUPER complicated.  One
numeric type/field might be a good thing to add as well, and perhaps
_version_ too.
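
To make that concrete, here is a rough sketch of the kind of schema I
have in mind.  Treat it as an illustration rather than a tested file:
the field names "text" and "count" are made up, the text_general
analyzer is cut down from the one in the full example (no stopwords or
synonyms), and it assumes a 4.x/5.x-era Solr where the <fields> and
<types> wrapper elements are optional and TrieLongField is the usual
long type.

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="minimal" version="1.5">
  <field name="id" type="string" indexed="true" stored="true" required="true" />
  <!-- needed once the transaction log is enabled -->
  <field name="_version_" type="long" indexed="true" stored="true" />
  <!-- hypothetical analyzed field for keyword search -->
  <field name="text" type="text_general" indexed="true" stored="true" />
  <!-- hypothetical numeric field -->
  <field name="count" type="long" indexed="true" stored="true" />
  <uniqueKey>id</uniqueKey>

  <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
  <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0" />
  <!-- simplified: tokenize and lowercase only -->
  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory" />
      <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
  </fieldType>
</schema>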

The solrconfig should have the transaction log turned on.  The default
directory factory is NRTCachingDirectoryFactory, which in some
circumstances can hold recently indexed data only in RAM; the
transaction log protects against losing that data.  With the transaction
log turned on, autoCommit with openSearcher=false should be configured
as well, so that the log does not grow without bound.  While these may
not be strictly required for a proof of concept or demo system, I
believe they are best practices, which we should encourage in ALL our
examples.

A very simple and short config snippet:

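<!-- the default high-performance update handler -->
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- hard commit after 25000 docs or 300000 ms (5 minutes),
       whichever comes first -->
  <autoCommit>
    <maxDocs>25000</maxDocs>
    <maxTime>300000</maxTime>
    <!-- flush to disk without opening a new searcher -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- enable the transaction log -->
  <updateLog />
</updateHandler>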

With both of these ideas added, the size of the schema would still be in
the ballpark of 30 lines, and the solrconfig would be a lot shorter.
There may be other best practices that need to be considered, which
might push things beyond the 30-line goal you have mentioned.
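
For a sense of scale, the rest of a solrconfig around that update
handler could be roughly as small as the sketch below.  This is an
untested outline, not a recommendation: the luceneMatchVersion value is
only a placeholder for whatever release is actually in use, the df
default points at a hypothetical analyzed field named "text", and newer
releases may not need /update declared explicitly.

<?xml version="1.0" encoding="UTF-8" ?>
<config>
  <!-- assumption: set this to the release you actually run -->
  <luceneMatchVersion>4.10.2</luceneMatchVersion>

  <!-- the updateHandler block shown earlier goes here -->

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <!-- hypothetical default search field -->
      <str name="df">text</str>
    </lst>
  </requestHandler>
  <requestHandler name="/update" class="solr.UpdateRequestHandler" />
</config>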

Thanks,
Shawn

