[ https://issues.apache.org/jira/browse/SOLR-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038355#comment-13038355 ]
Mike Sokolov commented on SOLR-1758:
------------------------------------
This was originally reported in the context of DIH, but as the OP said, it
applies equally well to all configuration.
The config-validation.patch includes changes to Config that validate all XML
configuration files loaded there. The patch includes a schema with rules for
<config>, <schema>, <solr>, <elevate> and <root> (the last is used in tests),
and it could be extended to cover other files as well. With the change, Config
looks in solr.home for a file called config.xsd. If it is found, it is loaded
and used to validate whichever configuration file is being loaded. If a
validation error occurs, an exception is raised (and logged, since that seemed
to be how it was done before, although it struck me as odd: I'd have expected
exception logging to be handled at the outermost layer).
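A minimal sketch of the lookup-then-validate behaviour described above, using
the standard javax.xml.validation API. The class and method names
(ConfigValidator, validateIfSchemaPresent) are hypothetical and this is not the
code in the patch; it only assumes that config.xsd lives directly in solr.home.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

/**
 * Hypothetical helper, not the code in config-validation.patch: if solr.home
 * contains config.xsd, validate the given configuration file against it.
 */
public final class ConfigValidator {
    private ConfigValidator() {}

    /**
     * Returns false if there is no config.xsd (validation skipped), true if the
     * file validated cleanly. Throws SAXException on the first validation error.
     */
    public static boolean validateIfSchemaPresent(File solrHome, File configFile)
            throws IOException, SAXException {
        File xsd = new File(solrHome, "config.xsd");
        if (!xsd.isFile()) {
            return false; // no schema installed: load the config without validating
        }
        SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(xsd);
        Validator validator = schema.newValidator();
        // With no ErrorHandler set, the first error is thrown as a SAXParseException.
        validator.validate(new StreamSource(configFile));
        return true;
    }
}
```

Because the check is keyed on the presence of config.xsd, installations that
do not ship the file keep their current behaviour.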
In practice Solr's XML configuration is used very flexibly, so the schema
attempts to allow a fair amount of flexibility: for elements marked as
"plugins" in the Wiki documentation, I've allowed pretty much arbitrary child
content. The wildcards in the schema are "lax", which means that they allow any
element, even an unknown one, but when a known element is found it is
validated against the model in the schema (e.g., <str> is not allowed to have
any child elements).
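To illustrate what "lax" means in XML Schema terms, here is a small sketch
with a toy schema (not the patch's config.xsd). A stand-in plugin element,
<requestHandler>, accepts any children via <xs:any processContents='lax'/>;
<str> has a global declaration, so whenever it appears it is validated and
child elements inside it are rejected. Element choices are illustrative only.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public final class LaxWildcardDemo {
    // Toy schema, not the patch's config.xsd: <requestHandler> stands in for a
    // "plugin" element and accepts any children and attributes laxly, while
    // <str> is declared globally as text-only with an optional name attribute.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='requestHandler'><xs:complexType>"
      + "<xs:sequence>"
      + "<xs:any processContents='lax' minOccurs='0' maxOccurs='unbounded'/>"
      + "</xs:sequence>"
      + "<xs:anyAttribute processContents='lax'/>"
      + "</xs:complexType></xs:element>"
      + "<xs:element name='str'><xs:complexType><xs:simpleContent>"
      + "<xs:extension base='xs:string'>"
      + "<xs:attribute name='name' type='xs:string'/>"
      + "</xs:extension>"
      + "</xs:simpleContent></xs:complexType></xs:element>"
      + "</xs:schema>";

    static Schema compile() throws SAXException {
        return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
    }

    /** True if the document validates, false on any validation error. */
    static boolean isValid(Schema schema, String xml) throws IOException {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Schema schema = compile();
        // <foo> and <bar> have no declarations, so the lax wildcard skips them.
        System.out.println(isValid(schema,
            "<requestHandler name='x' class='y'><foo><bar/></foo></requestHandler>"));
        // <str> is known, so it is validated: plain text content is fine...
        System.out.println(isValid(schema,
            "<requestHandler><str name='df'>text</str></requestHandler>"));
        // ...but a child element inside <str> is rejected.
        System.out.println(isValid(schema,
            "<requestHandler><str name='df'><bad/></str></requestHandler>"));
    }
}
```

The trade-off is that typos inside a lax region go unnoticed unless the
misspelled name happens to match some other declared element.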
All the Solr tests but one pass with the patch, which means that the
configuration in the Solr example, as well as the various test configurations
in solr/src/test-files/solr/conf, are all valid according to the schema. The
exception is one solrconfig.xml with luceneMatchVersion=4.0; I think this
should be LUCENE_40? The patch also includes one new test of an invalid schema;
it probably should have a few more.
However, my knowledge of Solr configuration options is far from encyclopedic
(I spent a while with the documentation and examples), and there are almost
certainly additional configuration options in use that should be accounted for
in the "standard" schema, e.g., some elements that should accept arbitrary
attributes but currently don't.
In general I expect the schema could be evolved to be looser in some areas and
perhaps tighter in others.
To help with that, I created some Ant rules to convert the schema from RELAX NG
Compact syntax to XML Schema. I find RELAX NG easier to maintain, but
supporting runtime validation with it would require adding a large jar to
Solr. The patch adds dev-tools/schema, which contains config.rnc (the source
schema) and a build.xml that uses the trang.jar library to compile config.xsd
from it and copy the result into a few places in the Solr source tree.
Some TODOs:
* It might be better to have separate schema files for separate configuration
  documents - this way the decision to validate could be made on a per-file
  basis, rather than globally for all configuration.
* There is no model for <highlighting> in the schema - it's just a big
  wildcard right now.
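For the per-file TODO, one possible shape is sketched below. It assumes a
hypothetical naming convention in which each document has its own optional
schema named after the file (solrconfig.xml -> solrconfig.xsd, schema.xml ->
schema.xsd); PerFileValidator and its methods are illustrative names, not
anything in the patch.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

/**
 * Hypothetical per-file variant: each configuration document gets its own
 * optional schema, named after the file. The naming convention is an assumption.
 */
public final class PerFileValidator {
    private PerFileValidator() {}

    /** Maps a config file name to its schema file name by swapping the extension. */
    static String schemaNameFor(String configFileName) {
        int dot = configFileName.lastIndexOf('.');
        String base = dot >= 0 ? configFileName.substring(0, dot) : configFileName;
        return base + ".xsd";
    }

    /**
     * Validates configFile only if its own schema exists in schemaDir.
     * Returns true if validation ran, false if this file has no schema.
     */
    static boolean validate(File schemaDir, File configFile)
            throws IOException, SAXException {
        File xsd = new File(schemaDir, schemaNameFor(configFile.getName()));
        if (!xsd.isFile()) {
            return false; // this document opted out of validation
        }
        SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(xsd)
                .newValidator()
                .validate(new StreamSource(configFile));
        return true;
    }
}
```

With this layout, shipping only solrconfig.xsd would validate solrconfig.xml
while leaving schema.xml unvalidated, which is the per-file opt-in the TODO
describes.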
> schema definition for configuration files
> -----------------------------------------
>
> Key: SOLR-1758
> URL: https://issues.apache.org/jira/browse/SOLR-1758
> Project: Solr
> Issue Type: New Feature
> Components: contrib - DataImportHandler
> Affects Versions: 1.4
> Reporter: Jorg Heymans
> Attachments: config-validation-20110523.patch
>
>
> A schema definition would be able to spot the subtle error in the config below:
> {code}
> <dataSource name="ora" driver="oracle.jdbc.OracleDriver" url="...." />
> <datasource name="orablob" type="FieldStreamDataSource" />
> <document name="mydoc">
>   <entity dataSource="ora" name="meta" query="select id, filename, bytes from documents">
>     <field column="ID" name="id" />
>     <field column="FILENAME" name="filename" />
>     <entity dataSource="orablob" processor="TikaEntityProcessor"
>             url="bytes" dataField="meta.BYTES">
>       <field column="text" name="mainDocument"/>
>     </entity>
>   </entity>
> </document>
> {code}
> Also, many XML editors support auto-completion based on a schema definition,
> so it would be easier to create configuration without constantly having to
> refer to javadoc or samples from the distribution.
> This applies equally to schema.xml and solrconfig.xml
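The subtle error in the quoted config is the lower-case <datasource>; XML
element names are case-sensitive, so a schema that declares only <dataSource>
rejects it. The sketch below wraps the top level of that config in a
<dataConfig> root and checks it against a toy schema that covers only
<dataSource> and <document> (everything under <document> is left as a lax
wildcard). It is an illustration, not a real DIH schema, and DihCaseCheck is a
hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public final class DihCaseCheck {
    // Toy schema covering only the top level of the quoted example; it is not
    // a real DIH schema. <document> content is left as a lax wildcard.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='dataConfig'><xs:complexType><xs:sequence>"
      + "<xs:element name='dataSource' minOccurs='0' maxOccurs='unbounded'>"
      + "<xs:complexType><xs:anyAttribute processContents='lax'/></xs:complexType>"
      + "</xs:element>"
      + "<xs:element name='document' minOccurs='0'><xs:complexType>"
      + "<xs:sequence><xs:any processContents='lax' minOccurs='0' maxOccurs='unbounded'/></xs:sequence>"
      + "<xs:anyAttribute processContents='lax'/>"
      + "</xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    /** Returns null if the document is valid, otherwise the first error message. */
    static String firstError(String xml) throws IOException, SAXException {
        Validator validator = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)))
                .newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        String ok = "<dataConfig><dataSource name='ora'/>"
                  + "<dataSource name='orablob' type='FieldStreamDataSource'/>"
                  + "<document name='mydoc'/></dataConfig>";
        // Same shape as the quoted config: 'datasource' is lower-case.
        String typo = "<dataConfig><dataSource name='ora'/>"
                    + "<datasource name='orablob' type='FieldStreamDataSource'/>"
                    + "<document name='mydoc'/></dataConfig>";
        System.out.println(firstError(ok));   // valid, so no error message
        System.out.println(firstError(typo)); // validator error about 'datasource'
    }
}
```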