andrea crotti <andrea.crott...@gmail.com> writes:
> ...
> The reason is that it has to work on many platforms and without any c module
> installed, the reason of that

If you are looking for a pure Python solution, you might have a look at "PyXB". It was not designed to validate XML instances against an XML-Schema (its purpose is to map between XML instances and Python objects based on an XML-Schema description), but it detects many problems in XML instances. It does not introduce its own C extensions (it relies on an XML parser shipped with Python).

> Anyway in a sense it's also quite interesting, and I don't need to implement
> the whole XML, so it should be fine.

The XML is the lesser problem. The big problem is XML-Schema: it is *very* complex, with structure definitions (elements, attributes and "#PCData"), inheritance, redefinition, grouping, scoping rules, inclusion, and data types with restrictions and extensions. Thus, if you want to implement a reliable algorithm which, for a given XML-Schema and XML instance, checks whether the instance is valid with respect to the schema, then you have a really big task.

Maybe you have a fixed (and quite simple) schema. Then you may be able to implement a validator (for the fixed schema). But I do not understand why you would want such a validation.

If you generate the XML instances, then thoroughly test your generation process (using any available validator) and then trust it.

If the XML instances come from somewhere else and must be interpreted by your application, then the important thing is that they are understood by your application, not that they are valid. If you get a complaint that your application cannot handle a specific XML instance, then you validate it in your development environment (again with any available validator), and if the validation fails, you have good arguments.

> What I haven't found yet is an explanation of a possible algorithm to use for
> the validation, that I could then implement..

You parse the XML (and get a tree) and then recursively check that the elements, attributes and text nodes in the tree conform to the schema. In an abstract sense, the schema is a collection of content models for the various elements; each content model tells you what the element content and attributes should look like.

For a simple schema, this is straightforward. If the schema starts to include foreign schemas, or uses extensions, restrictions or "redefine"s, then it gets considerably more difficult.

--
Dieter
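
As a rough illustration of the recursive check described above, here is a minimal pure-Python sketch for a fixed, simple schema. It is not real XML-Schema handling: the CONTENT_MODELS dictionary, its keys and the validate() function are hypothetical names invented for this example, the "order"/"item" vocabulary is made up, and the order of child elements is deliberately ignored. It uses only xml.etree.ElementTree from the standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written content models (an assumption of this sketch,
# not XML-Schema): for each element name, the allowed attributes, the allowed
# children with (min, max) occurrence counts, and whether text is allowed.
CONTENT_MODELS = {
    "order": {"required": {"id"}, "optional": set(),
              "children": {"item": (1, None)}, "text": False},
    "item": {"required": {"sku"}, "optional": {"qty"},
             "children": {}, "text": True},
}


def validate(element, models, path=""):
    """Return a list of error strings for element and its descendants."""
    path = f"{path}/{element.tag}"
    model = models.get(element.tag)
    if model is None:
        return [f"{path}: unknown element"]
    errors = []

    # Attributes: every required one present, nothing outside the model.
    attrs = set(element.attrib)
    for name in sorted(model["required"] - attrs):
        errors.append(f"{path}: missing attribute '{name}'")
    for name in sorted(attrs - model["required"] - model["optional"]):
        errors.append(f"{path}: unexpected attribute '{name}'")

    # Text nodes: non-whitespace text only where the model allows it.
    texts = [element.text] + [child.tail for child in element]
    if not model["text"] and any(t and t.strip() for t in texts):
        errors.append(f"{path}: unexpected text")

    # Children: recurse into allowed ones, then check occurrence counts.
    counts = {}
    for child in element:
        counts[child.tag] = counts.get(child.tag, 0) + 1
        if child.tag in model["children"]:
            errors.extend(validate(child, models, path))
        else:
            errors.append(f"{path}: unexpected child '{child.tag}'")
    for tag, (low, high) in model["children"].items():
        n = counts.get(tag, 0)
        if n < low or (high is not None and n > high):
            limit = "unbounded" if high is None else high
            errors.append(f"{path}: {n} '{tag}' children, expected {low}..{limit}")
    return errors


if __name__ == "__main__":
    doc = ET.fromstring('<order><item/><note/></order>')
    for message in validate(doc, CONTENT_MODELS):
        print(message)
```

The function collects all problems instead of stopping at the first one, which tends to be more useful when diagnosing a rejected document. Everything that makes XML-Schema hard (ordered content models, data types, inclusion, extensions, restrictions, "redefine"s) is left out here.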