I looked into inserting a formal validation step in o.a.solr.core.Config
and ran some simple preliminary tests. The code is fairly simple; there
are just a few gotchas:
1) To use the RNC (RELAX NG Compact) validation language (my
preference), we would need to pull in a couple of new jars, one of which
is over 600K. Also, RNC is not widely supported in the XML world: it has
drawn more interest from researchers than uptake in practice, so it
might not be the best choice even if, IMO, it is aesthetically superior.
2) The other alternatives are XML Schema and DTD. I think DTD is a
non-starter since it can't allow things like arbitrary attributes on an
element (you have to list them explicitly). Schema is probably the best
choice, all things considered: support for it is built into the XML
tools already in use, and it is widely adopted. The drawback is that
it's a baroque and unwieldy syntax, designed by an indecisive committee
that loaded it down with featuritis, and someone will end up having to
maintain this: every time a new configuration option is added to the
schema (or solrconfig, etc.), the schema-schema (validation schema?)
will have to be updated to reflect it.
3) Finally, to get good error reporting it's important to show file name
and line number where an error occurred. Although you can validate a
constructed XML tree (a DOM), it's better to run validation on a Stream
so the line numbers are available. Therefore it will probably be
necessary to run two passes (one to validate, and one to construct the
DOM), which means buffering the config. Doesn't seem like a big deal:
these are small files that only get loaded once, but this is a cost of
validation, I think.
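To make the two-pass idea in 3) concrete, here is a rough sketch using only the JDK's built-in javax.xml.validation support. Everything named here is an assumption for illustration: the class TwoPassConfigLoader, the toy schema TOY_XSD, and the <config>/<field> vocabulary are made up and are not Solr's actual config format or code. The toy schema also shows the xs:anyAttribute wildcard that point 2) says DTD lacks.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper, not part of Solr: validate buffered bytes, then build the DOM.
class TwoPassConfigLoader {

    // Toy schema (assumption): <config> holds <field> elements that require
    // a 'name' attribute and tolerate any other attribute.
    static final String TOY_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='config'><xs:complexType><xs:sequence>"
      + "<xs:element name='field' minOccurs='0' maxOccurs='unbounded'>"
      + "<xs:complexType>"
      + "<xs:attribute name='name' type='xs:string' use='required'/>"
      + "<xs:anyAttribute processContents='skip'/>"
      + "</xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    static String describe(String fileName, SAXParseException e) {
        return fileName + ":" + e.getLineNumber() + ": " + e.getMessage();
    }

    /** Pass 1: validate the buffered bytes as a stream so line numbers are available. */
    static List<String> validate(byte[] xml, final String fileName, String xsd) throws Exception {
        Validator validator = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(xsd)))
            .newValidator();
        final List<String> errors = new ArrayList<>();
        validator.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
            @Override public void error(SAXParseException e) {
                errors.add(describe(fileName, e)); // keep going to collect more errors
            }
            @Override public void fatalError(SAXParseException e) throws SAXException {
                errors.add(describe(fileName, e));
                throw e; // not well-formed: no point continuing
            }
        });
        try {
            validator.validate(new StreamSource(new ByteArrayInputStream(xml)));
        } catch (SAXException e) {
            // Normally already recorded by fatalError; record it if it was not.
            if (errors.isEmpty()) errors.add(fileName + ": " + e.getMessage());
        }
        return errors;
    }

    /** Pass 2: build the DOM from the same buffer, only if pass 1 found nothing. */
    static Document load(byte[] xml, String fileName, String xsd) throws Exception {
        List<String> errors = validate(xml, fileName, xsd);
        if (!errors.isEmpty()) {
            throw new IllegalStateException(String.join("\n", errors));
        }
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml));
    }
}
```

With this toy schema, a misspelled element such as <feild> on line 3 of the buffer should come back as an error prefixed with the file name and line number, while an extra attribute on <field> passes. The cost is the one described above: the config has to be held in a byte[] so it can be read twice.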
Of course the benefit is that users would actually get fast-failing,
specific, and informative error messages covering a wide variety of
misconfigurations: I would hope we could be restrictive enough to catch
misspelled versions of known element and attribute names, or places
where elements are out of order.
I'd be willing to work this up, develop a preliminary schema (of
whichever sort we choose), and send in a patch, but other folks would
probably end up having to maintain it from time to time if it's to have
any value at all and not just get disabled, so I just want to make sure
this is something you all think is worthwhile before going any further.
-Mike
On 05/17/2011 09:04 AM, Michael McCandless wrote:
https://issues.apache.org/jira/browse/SOLR-2119 is a good example
where we are failing to catch mis-configuration on startup.
Is there some way we can baby step here? E.g., use one of these XML
validation packages incrementally, on only sub-strings of the XML?
(Or, simpler, just do the checking ourselves w/ custom code.)
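As a rough idea of what the "custom code" route could look like, here is a minimal sketch that flags attribute names outside a known set on one kind of element. The class AttributeNameCheck, the <field> element, and the list of known attributes are all hypothetical, chosen only for illustration; this is not how Solr currently checks its config.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.NodeList;

// Hypothetical hand-rolled check: no schema language, just a set of known names.
class AttributeNameCheck {

    /** Returns one message per attribute on a <tag> element whose name is not in 'known'. */
    static List<String> unknownAttributes(String xml, String tag, Set<String> known)
            throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<String> problems = new ArrayList<>();
        NodeList elements = doc.getElementsByTagName(tag);
        for (int i = 0; i < elements.getLength(); i++) {
            NamedNodeMap attrs = elements.item(i).getAttributes();
            for (int j = 0; j < attrs.getLength(); j++) {
                String name = attrs.item(j).getNodeName();
                if (!known.contains(name)) {
                    problems.add("<" + tag + ">: unrecognized attribute '" + name + "'");
                }
            }
        }
        return problems;
    }
}
```

This can be added one element type at a time, which fits the baby-step approach, but because it works on a DOM it cannot report line numbers, and each check has to be written and maintained by hand.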
Mike
http://blog.mikemccandless.com
On Wed, May 4, 2011 at 10:50 PM, Michael Sokolov <soko...@ifactory.com> wrote:
I'm not sure you will find anyone wanting to put in this effort now, but
another suggestion for a general approach might be:
1 very basic static analysis to catch what you can - this should only be a
minimal effort, given what can reasonably be achieved
2 throw runtime errors as Hoss says (probably already doing this well
enough, but maybe some incremental improvements are needed?)
3 an option to run a "configtest" like httpd provides that preloads all
declared handlers/plugins/modules etc, instantiates them and gives them an
opportunity to read their config and throw whatever errors they find. This
way you can set a standard (error on unrecognized parameter, say) in some
core areas, and distribute the effort. This is a hugely useful sanity check
to be able to run when you want to make config changes and not have your
server fall over when it starts (or worse - later).
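A "configtest" along the lines of item 3 might be sketched as below. The names are assumptions for illustration only: the Configurable interface, ExampleHandler, and the map of class names to arguments are hypothetical and do not correspond to Solr's real plugin API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical configtest driver: instantiate every declared plugin up front
// and collect all complaints instead of failing on the first one.
class ConfigTest {

    /** Hypothetical plugin contract: read your config, throw on anything you reject. */
    interface Configurable {
        void init(Map<String, String> args);
    }

    /** Example plugin enforcing the "error on unrecognized parameter" standard. */
    static class ExampleHandler implements Configurable {
        @Override public void init(Map<String, String> args) {
            for (String key : args.keySet()) {
                if (!key.equals("rows") && !key.equals("timeout")) {
                    throw new IllegalArgumentException("unrecognized parameter '" + key + "'");
                }
            }
        }
    }

    /** Keys are declared class names, values are the arguments from the config. */
    static List<String> run(Map<String, Map<String, String>> declared) {
        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : declared.entrySet()) {
            String className = entry.getKey();
            try {
                Object plugin = Class.forName(className).getDeclaredConstructor().newInstance();
                ((Configurable) plugin).init(entry.getValue());
            } catch (ReflectiveOperationException | RuntimeException ex) {
                // Missing class, wrong type, or the plugin's own complaint.
                errors.add(className + ": " + ex);
            }
        }
        return errors;
    }
}
```

The driver stays small and each plugin owns its own checks, which is the "distribute the effort" part: core plugins can set the standard, and others can tighten up over time.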
-Mike "kibitzer" Sokolov
On 5/4/2011 6:55 PM, Chris Hostetter wrote:
As I said: any improvements to help catch the mistakes we can identify
would be great, but we should maintain perspective of the effort/gain
tradeoff given that there is likely nothing we can do about the basic
problem of "a string that won't be evaluated until runtime".
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org