To answer your first question:
"Schema Agnostic" ...
That is a general term (because there is no 'standard' term for this concept).
What it means more specifically is that MarkLogic supports and uses schemas
but doesn't require them. It's in line with the W3C XQuery and XML specs
(except possibly for the bug you found and a few places where we have
extensions), but the W3C specs don't define it more precisely than
"implementation dependent" and "schema aware". MarkLogic is both, and more,
but it is intentionally not strict about schemas unless you ask it to be, for
example by doing a validate.
First, the reasoning behind this.
We want to support a multitude of use cases and make MarkLogic as easy and
useful as possible for as many people as possible.
So for most things we don't require that you have any schema, a valid schema,
or a locatable schema.
Most of the time this is what is desired. For example, say you load an XML
doc, or a million XML docs, that reference a schema, but you forgot to put the
schema in the database, or one document in a million isn't quite 100% right.
Most people don't want the load to abort (say, two-thirds of the way in, when
it hits the one bad document).
That's *usually* not what people want. So MarkLogic ignores schemas, for
purposes of validation, on document inserts. That is the behavior you are
seeing in the update; it's intentional.
If you want to validate, you can, using validate {} (we need to look into the
dereferencing issue; I suspect something else is going on).
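As a sketch of opting in to validation at load time: the following assumes a
schema that declares <person>, with <children> typed as xs:integer, can be
found in the Schemas database, so the second of the two made-up documents
below is invalid. The URIs, $batch and $abort-on-error are hypothetical names
for illustration. With $abort-on-error set to fn:false(), each invalid
document is logged and skipped while the rest load; with fn:true(), the first
validation error is not caught, so the whole statement rolls back and nothing
from the batch is committed.

```xquery
xquery version "1.0-ml";

(: Hypothetical batch: each wrapper carries the target URI and the content. :)
let $batch := (
  <doc uri="/example/a.xml"><person><children>2</children></person></doc>,
  <doc uri="/example/b.xml"><person><children>two</children></person></doc>
)
(: Hypothetical switch: fn:true() lets the first invalid document abort
   the whole statement, fn:false() skips invalid documents. :)
let $abort-on-error := fn:false()
for $d in $batch
let $uri := fn:string($d/@uri)
let $content := $d/person
return
  if ($abort-on-error)
  then
    (: uncaught error here rolls back every insert in this statement :)
    xdmp:document-insert($uri, validate strict { $content })
  else
    try {
      xdmp:document-insert($uri, validate strict { $content })
    } catch ($e) {
      (: log and skip this document, keep loading the rest :)
      xdmp:log(fn:concat("Skipped invalid document ", $uri))
    }
```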
Another example is *finding* schemas. It can sometimes be quite hard to make
sure your schemas are found correctly, and that the right ones are found,
especially if you don't use namespaces.
If you have 100 schemas with no namespaces, and 10 of them define a <name>
element, and you put a million documents with <name> elements in, most people
would rather the system keep functioning than stop you from doing anything
because you accidentally inserted ambiguous schemas.
There are also issues of typing. W3C XML Schema is a strange thing: it
attempts to fill many roles, but often people only need a few. A good example
is simple atomic types. It's very useful to have a basic schema that defines
that <children> is an xs:integer and <birthdate> is an xs:date, so that when
you write XQuery, make indexes, transform to JSON, etc., you don't have to put
casts and type conversions everywhere. Compare
$doc/person/children gt 10 vs xs:integer($doc/person/children) gt 10
It also allows you to write type-safe XQuery like the following without too
much casting:
declare function local:is-old-enough-to-drink(
    $person as element(person),
    $drinkingAge as xs:yearMonthDuration
) as xs:boolean
{
    (: birthdate is typed xs:date by the schema, so no cast is needed;
       old enough once birthdate + drinking age is on or before today :)
    ($person/birthdate + $drinkingAge) le current-date()
};
...
if (local:is-old-enough-to-drink($person,
        $state-laws[state eq local:my-state()]/minimum-drinking-age))
then "Serve Beer"
else "Call Cops"
This can be done with an incomplete schema that defines only the bare
necessary types for a few elements. The document is allowed not to validate
against the schema unless you call validate {}, but you still get the benefit
of adding just enough type information while spending only as much effort as
is worthwhile for your application and development. You can choose no
schemas, some schemas, partial schemas, or fully explicit schemas,
and MarkLogic will *attempt* to make use of them as best it can. Part of that
is the usability concern of not failing miserably when it can't find a
schema, and of not validating unnecessarily.
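As an illustration, a partial schema of the kind described above might
declare only <children> and <birthdate>, as global elements with atomic
types, and nothing else. This is a sketch under assumptions: the URI
/person-types.xsd is hypothetical, it has to be inserted into the Schemas
database tied to your content database (e.g. by running it in Query Console
with that database selected), and whether the types get applied to those
elements in your documents depends on MarkLogic being able to locate the
schema for them.

```xquery
xquery version "1.0-ml";

(: Run with the Schemas database as the context database.
   Only <children> and <birthdate> get types; everything else in the
   documents, including <person> itself, is left undeclared. :)
xdmp:document-insert(
  "/person-types.xsd",
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="children" type="xs:integer"/>
    <xs:element name="birthdate" type="xs:date"/>
  </xs:schema>
)
```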
That is the tradeoff, or balance, between being completely 'schema free'
(like many NoSQL databases) and completely 'schema aware' or 'schema
required' (like many XML or hybrid databases).
'Schema agnostic' means that MarkLogic, as a product, doesn't require schemas
unless you ask for them.
But at the same time it makes as much use as possible of the schemas you do
supply, without putting an unnecessary burden on you.
For small projects (small amounts of code, data, or both), the tradeoff is
minor: the difference between totally fixing your data and schemas and
writing your code to do the job a schema would do is fairly small.
But when you start working on large projects (say, millions of documents from
external sources of varying quality, and lots of code), having schema support
that isn't a chokehold on you is extremely valuable.
You will see this architecture in many places in MarkLogic where there is a
range of strictness in the APIs and features, because every degree of
required constraint comes with a cost (to guarantee that your data and
schemas are 100% perfect).
So we leave it up to you to decide when that matters and when it does not.
One example is range indexes: there's an option to either reject documents
with data values that are not valid for the index's type, or ignore those
values.
A strict type system (like a typed field in a relational DB) doesn't give you
that choice.
Sometimes you would prefer to load the million documents and simply ignore
the one badly formed value, at least to the point of still being able to
query for the document; other times it's more important to abort the whole
thing and not let ANY of the data in if a single field is bad.
That's "Schema Agnostic"
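The range index option mentioned above can be set through the Admin API.
This is a sketch assuming MarkLogic 7 or later, where
admin:database-range-element-index takes an optional invalid-values argument
("ignore" or "reject"); the "Documents" database and the no-namespace
<children> element are placeholders, so check the Admin API reference for
your release before relying on the exact signature.

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: Placeholder index: xs:int on the no-namespace <children> element.
   "reject" makes a load fail when a value can't be cast to xs:int;
   "ignore" loads the document but leaves the bad value out of the index. :)
let $config := admin:get-configuration()
let $index :=
  admin:database-range-element-index(
    "int",       (: scalar type :)
    "",          (: element namespace (none) :)
    "children",  (: element local name :)
    "",          (: collation, only meaningful for string indexes :)
    fn:false(),  (: range value positions :)
    "ignore"     (: invalid values: "ignore" or "reject" :)
  )
return
  admin:save-configuration(
    admin:database-add-range-element-index(
      $config, xdmp:database("Documents"), $index))
```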
As for the issue you're seeing with dereferencing docs in validate, it would
be useful if you provided a small, complete example of:
1) The full XML file
2) The full schema
3) The XQuery used
The snippet of code you show shouldn't be failing in the way you describe, so
either there is a bug or we're not seeing the big picture.
-----------------------------------------------------------------------------
David Lee
Lead Engineer
MarkLogic Corporation
[email protected]
Phone: +1 812-482-5224
Cell: +1 812-630-7622
www.marklogic.com
From: [email protected]
[mailto:[email protected]] On Behalf Of mohan mohan
Sent: Tuesday, December 09, 2014 11:13 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Schema validation incorrect for
no-namespace document
Can somebody explain to me what schema agnostic is?
On Wed, Dec 10, 2014 at 4:12 AM, Will Thompson
<[email protected]<mailto:[email protected]>> wrote:
Sure, here is the schema:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning"
    vc:minVersion="1.0" vc:maxVersion="1.1">
  <xs:element name="dir">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="doc" type="type-doc" minOccurs="0"
            maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute name="uri-source" type="xs:string" use="required"/>
      <xs:attribute name="uri-target" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
  <xs:complexType name="type-doc">
    <xs:choice minOccurs="0" maxOccurs="unbounded">
      <xs:element name="unknown" type="type-unknown"/>
      <xs:element name="deleted" type="type-deleted"/>
      <xs:element name="updated" type="type-updated"/>
    </xs:choice>
    <xs:attribute name="uri-source" type="xs:string" use="required"/>
    <xs:attribute name="uri-target" type="xs:string" use="required"/>
  </xs:complexType>
  <xs:complexType name="type-updated">
    <xs:attribute name="id-source" type="xs:string" use="required"/>
    <xs:attribute name="id-target" type="xs:string" use="required"/>
    <xs:attribute name="title-target" type="xs:string" use="required"/>
    <xs:attribute name="ancestor-title-target" type="xs:string"
        use="required"/>
    <xs:attribute name="location-source" type="xs:string" use="required"/>
    <xs:attribute name="location-target" type="xs:string" use="required"/>
    <xs:attribute name="status" type="type-status" use="required"/>
  </xs:complexType>
  <xs:complexType name="type-unknown">
    <xs:attribute name="id-source" type="xs:string" use="required"/>
  </xs:complexType>
  <xs:complexType name="type-deleted">
    <xs:attribute name="id-source" type="xs:string" use="required"/>
  </xs:complexType>
  <xs:simpleType name="type-status">
    <xs:restriction base="xs:string">
      <xs:enumeration value="unknown"/>
      <xs:enumeration value="changed"/>
      <xs:enumeration value="unchanged"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
And here is a simple test:
xdmp:document-insert('test-doc.xml',
  <dir uri-source="/books-search/comm/flh/2014/"
       uri-target="/books-search/comm/flh/2015/">
    <doc uri-source="/books-search/comm/flh/2014/FLH_ch02.xml"
         uri-target="/books-search/comm/flh/2015/FLH_ch02.xml">
      <updated status="changed"/>
      <updated id-source="/chapter/subchapter[1]/section[3]/section[3]/p"
               id-target="p0f30b428fdccc"
               title-target="§3.3 Rebutting community-property presumption."
               ancestor-title-target="§3. Establishing Character of Marital Property"
               location-source="2_a_3_3" location-target="2_a_3_3"
               status="unchanged"/>
    </doc>
  </dir>)
Followed by
validate strict { doc('test-doc.xml') }
The first <updated> element is missing attributes that the schema requires
(the ones present on the second), so validation should fail. It doesn't for
me, but after dereferencing the doc it does. Since namespacing the docs (there
aren't many) and the schema does work, that's my current workaround (and
probably better practice anyway).
Let me know if you can't reproduce it. Thanks for following up!
-Will
> On Dec 9, 2014, at 3:25 PM, Mary Holstege
> <[email protected]<mailto:[email protected]>> wrote:
>
> On Tue, 09 Dec 2014 12:42:29 -0800, Will Thompson
> <[email protected]<mailto:[email protected]>> wrote:
>
>> I recently ran into some issues validating a no-namespace document. The
>> schema was updated, which should have caused the document to fail
>> validation, but it didn't. I have been using
>> xdmp:expanded-tree-cache-clear() following schema updates, but neither that
>> nor a server restart had any effect.
>>
>> After doing some more testing, I discovered that dereferencing it before
>> validation works:
>>
>> validate strict { document { doc($uri) }/* }
>>
>> And if I namespace the document and schema, everything works as expected as
>> well. Bug, or am I missing something? This is on 7.0-4.1.
>>
>> -Will
>
> Yes, this sounds like a bug, and probably is not related to schema change
> so much as schema processing in general, since you cleared the cache.
> (Correct me if I am wrong.)
>
> Namespaced and no-namespaced schemas should work consistently.
> That said, no-namespace schemas are a little trickier because of the
> interaction with elementForm and attributeForm, so I wouldn't be too
> shocked if there were a code path somewhere that doesn't handle things
> properly.
>
> If you have a test case you'd be willing to share, I'd love to see it.
>
> //Mary
>
_______________________________________________
General mailing list
[email protected]<mailto:[email protected]>
http://developer.marklogic.com/mailman/listinfo/general