Re: Solr Config XML DTD's

Mike Sokolov Wed, 18 May 2011 13:17:37 -0700

I looked into inserting a formal validation step in o.a.solr.core.Config and ran some preliminary simple tests. The code is fairly simple; just a couple of gotchas:

1) To use the RNC validation language (my preference), we would need to pull in a couple of new jars, one of which is over 600K. Also, support for RNC in the XML world is not very widespread: it has drawn more interest from researchers than uptake more broadly, so it might not be the best choice, even if it is aesthetically superior, IMO.

2) The other alternatives are XML Schema and DTD. I think DTD is a non-starter since it just can't allow things like arbitrary attributes on an element (you have to list them explicitly). Schema is probably the best choice, all things considered: support for it is built into the XML tools already in use, and it is widely adopted. The drawback is that it's a baroque and unwieldy syntax designed by an indecisive committee that loaded it down with excessive featuritis, and someone will end up having to maintain this: every time you add a new configuration option to the schema (or solrconfig, etc), the schema-schema (validation schema?) will have to be updated to reflect that.

3) Finally, to get good error reporting it's important to show the file name and line number where an error occurred. Although you can validate a constructed XML tree (a DOM), it's better to run validation on a stream so the line numbers are available. Therefore it will probably be necessary to run two passes (one to validate, and one to construct the DOM), which means buffering the config. Doesn't seem like a big deal: these are small files that only get loaded once, but this is a cost of validation, I think.
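
To make the two-pass idea in (3) concrete, here is a rough sketch using only the javax.xml.validation classes that ship with the JDK. It is not the code I tested against o.a.solr.core.Config: the class name ConfigValidationSketch, the load method, and the toy schema (a config element with an optional dataDir child) are all made up for illustration. The xs:anyAttribute line shows the "arbitrary attributes" case from (2) that a DTD can't express.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.SAXParseException;

// Hypothetical sketch; none of these names exist in Solr.
class ConfigValidationSketch {

    // Toy schema (an assumption, not a real Solr schema): a <config> root,
    // an optional <dataDir> child, and any attributes allowed on the root.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='config'>"
      + "<xs:complexType>"
      + "<xs:sequence>"
      + "<xs:element name='dataDir' type='xs:string' minOccurs='0'/>"
      + "</xs:sequence>"
      + "<xs:anyAttribute processContents='skip'/>"
      + "</xs:complexType>"
      + "</xs:element>"
      + "</xs:schema>";

    // The config is buffered in memory so it can be read twice.
    static Document load(byte[] configBytes, String fileName) throws Exception {
        Schema schema = SchemaFactory
            .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(XSD)));
        try {
            // Pass 1: validate the raw stream, where line numbers are available.
            schema.newValidator()
                  .validate(new StreamSource(new ByteArrayInputStream(configBytes)));
        } catch (SAXParseException e) {
            // Report file name and line number along with the parser's message.
            throw new IllegalArgumentException(
                fileName + ":" + e.getLineNumber() + ": " + e.getMessage(), e);
        }
        // Pass 2: build the DOM from the same buffered bytes.
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(new ByteArrayInputStream(configBytes));
    }

    public static void main(String[] args) throws Exception {
        // A misspelled element name should be rejected in pass 1.
        String bad = "<config>\n  <dataDirr>/var/solr</dataDirr>\n</config>";
        try {
            load(bad.getBytes(StandardCharsets.UTF_8), "solrconfig.xml");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The second pass re-parses the same byte array, which is where the buffering cost mentioned above comes from.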

Of course the benefit is that users would actually get fast-failing, specific and informative error messages covering a wide variety of misconfigurations: I would hope we could be restrictive enough to catch misspelled versions of known element and attribute names, or places where elements are out of order.

I'd be willing to work this up, develop a preliminary schema (of whichever sort we choose), and send in a patch, but other folks would probably end up having to maintain it from time to time if it's to have any value at all and not just get disabled, so I just want to make sure this is something you all think is worthwhile before going any further.


-Mike



On 05/17/2011 09:04 AM, Michael McCandless wrote:

https://issues.apache.org/jira/browse/SOLR-2119 is a good example
where we are failing to catch mis-configuration on startup.

Is there some way we can baby step here?  EG use one of these XML
validation packages, incrementally, on only sub-strings from the XML?
(Or simpler is to just do the checking ourselves w/ custom code).

Mike

http://blog.mikemccandless.com

On Wed, May 4, 2011 at 10:50 PM, Michael Sokolov <soko...@ifactory.com> wrote:

I'm not sure you will find anyone wanting to put in this effort now, but
another suggestion for a general approach might be:

1 very basic static analysis to catch what you can - this should be a pretty
minimal effort only given what can reasonably be achieved

2 throw runtime errors as Hoss says (probably already doing this well
enough, but maybe some incremental improvements are needed?)

3 an option to run a "configtest" like httpd provides that preloads all
declared handlers/plugins/modules etc, instantiates them and gives them an
opportunity to read their config and throw whatever errors they find.  This
way you can set a standard (error on unrecognized parameter, say) in some
core areas, and distribute the effort.  This is a hugely useful sanity check
to be able to run when you want to make config changes and not have your
server fall over when it starts (or worse - later).
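
A rough sketch of what option 3 could look like, purely as illustration: the ConfigCheckable interface, ExampleHandler, and configTest method below are hypothetical and are not existing Solr APIs. The idea is that each declared plugin is instantiated by class name and handed its parameters, and every failure is collected rather than stopping at the first one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical contract: a plugin that can check its own configuration.
interface ConfigCheckable {
    void checkConfig(Map<String, String> params);
}

// Hypothetical handler that errors on unrecognized parameters.
class ExampleHandler implements ConfigCheckable {
    private static final Set<String> KNOWN = Set.of("rows", "defType");

    @Override
    public void checkConfig(Map<String, String> params) {
        for (String name : params.keySet()) {
            if (!KNOWN.contains(name)) {
                throw new IllegalArgumentException("unrecognized parameter: " + name);
            }
        }
    }
}

class ConfigTestSketch {

    // Maps each declared plugin class name to its configured parameters and
    // returns one message per plugin that failed to load or check its config.
    static List<String> configTest(Map<String, Map<String, String>> declared) {
        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : declared.entrySet()) {
            try {
                Object plugin = Class.forName(entry.getKey())
                                     .getDeclaredConstructor()
                                     .newInstance();
                if (plugin instanceof ConfigCheckable) {
                    ((ConfigCheckable) plugin).checkConfig(entry.getValue());
                }
            } catch (ReflectiveOperationException | RuntimeException ex) {
                // Keep going so a single run reports every problem found.
                errors.add(entry.getKey() + ": " + ex);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> declared = new LinkedHashMap<>();
        declared.put(ExampleHandler.class.getName(), Map.of("rowz", "10"));
        declared.put("no.such.Handler", Map.of());
        configTest(declared).forEach(System.out::println);
    }
}
```

Plugins that do not implement the check are still instantiated, so a missing class or a failing constructor is caught either way.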

-Mike "kibitzer" Sokolov

On 5/4/2011 6:55 PM, Chris Hostetter wrote:

As I said: any improvements to help catch the mistakes we can identify
would be great, but we should maintain perspective of the effort/gain
tradeoff given that there is likely nothing we can do about the basic
problem of "a string that won't be evaluated until runtime"


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
