Re: [HippoCMS-dev] Mixed content and extractors

Nico Tromp Mon, 08 Sep 2008 13:35:17 -0700

Ard,

I fiddled with the extractors and the file I tried to get into the
repository, and I found two different ways to get it working.


Solution 1:
Add both namespaces to the documents.

<?xml version="1.0" encoding="UTF-8"?>
<h:human xmlns:h="http://human.org" xmlns:d="http://pet.org">
    <h:name>John Doe</h:name>
</h:human>

and

<?xml version="1.0" encoding="UTF-8"?>
<d:pet xmlns:d="http://pet.org" xmlns:h="http://human.org">
    <d:name>Kitty</d:name>
</d:pet>

and the extractors are:

    <extractor classname="nl.hippo.slide.extractor.HippoSimpleXmlExtractor"
               uri="/files/default.www/demo"
               content-type="text/xml | text/xml; charset=UTF-8">
        <configuration>
            <instruction property="human_name"
                         namespace="http://hippo.nl/cms/1.0"
                         xpath="string(/h:human/h:name)"/>
        </configuration>
    </extractor>

    <extractor classname="nl.hippo.slide.extractor.HippoSimpleXmlExtractor"
               uri="/files/default.www/demo"
               content-type="text/xml | text/xml; charset=UTF-8">
        <configuration>
            <instruction property="pet_name"
                         namespace="http://hippo.nl/cms/1.0"
                         xpath="string(/d:pet/d:name)"/>
        </configuration>
    </extractor>
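
My guess (I have not read the HippoSimpleXmlExtractor source) is that the
prefixes in an XPath are resolved against the namespaces declared in the
document being stored, so an XPath using a prefix the document does not
declare fails outright. The Python sketch below only mimics that idea with
the standard library; it is not the extractor's code. The names
declared_namespaces and extract are made up for illustration, and the
relative path "h:name" stands in for /h:human/h:name.

```python
import io
import xml.etree.ElementTree as ET

# Sample documents: the first declares only its own namespace,
# the second declares both, as in Solution 1.
HUMAN_ONLY = '<h:human xmlns:h="http://human.org"><h:name>John Doe</h:name></h:human>'
HUMAN_BOTH = ('<h:human xmlns:h="http://human.org" xmlns:d="http://pet.org">'
              '<h:name>John Doe</h:name></h:human>')


def declared_namespaces(xml_text):
    """Collect the prefix -> URI bindings declared in the document itself."""
    source = io.BytesIO(xml_text.encode("utf-8"))
    return {prefix: uri
            for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",))}


def extract(xml_text, path):
    """Evaluate path relative to the root, using only the document's own prefixes.

    Returns the matched text, "" when nothing matches, and None when the
    path uses a prefix the document does not declare.
    """
    namespaces = declared_namespaces(xml_text)
    root = ET.fromstring(xml_text)
    try:
        node = root.find(path, namespaces)
    except SyntaxError:
        # ElementTree raises SyntaxError for a prefix missing from the map.
        return None
    return "" if node is None else (node.text or "")
```

In this sketch extract(HUMAN_ONLY, "d:name") fails on the unknown prefix,
which would be consistent with the 406 I saw, while
extract(HUMAN_BOTH, "d:name") simply yields an empty value.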



Solution 2:
Use the MultiValueXMLPropertyExtractor. Each document declares only its own
namespace.

<?xml version="1.0" encoding="UTF-8"?>
<h:human xmlns:h="http://human.org">
    <h:name>John Doe</h:name>
</h:human>

and

<?xml version="1.0" encoding="UTF-8"?>
<d:pet xmlns:d="http://pet.org">
    <d:name>Kitty</d:name>
</d:pet>

    <extractor classname="nl.hippo.slide.extractor.MultiValueXMLPropertyExtractor"
               uri="/files/default.www/demo"
               content-type="text/xml | text/xml; charset=UTF-8">
        <configuration>
            <instruction property="human_name"
                         namespace="http://hippo.nl/cms/1.0"
                         xpath="string(/h:human/h:name)"/>
        </configuration>
    </extractor>

    <extractor classname="nl.hippo.slide.extractor.MultiValueXMLPropertyExtractor"
               uri="/files/default.www/demo"
               content-type="text/xml | text/xml; charset=UTF-8">
        <configuration>
            <instruction property="pet_name"
                         namespace="http://hippo.nl/cms/1.0"
                         xpath="string(/d:pet/d:name)"/>
        </configuration>
    </extractor>

Does the latter work because MultiValueXMLPropertyExtractor is designed to
handle zero or more occurrences of the elements selected by the XPath
expression?
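
To make the question concrete, this is what I mean by "zero or more
occurrences", sketched in Python with the standard library. I do not know
whether MultiValueXMLPropertyExtractor actually works this way; the NS map
and the extract_all function are hypothetical, and the prefix map is handed
to the lookup by the caller rather than taken from each document.

```python
import xml.etree.ElementTree as ET

# Hypothetical prefix map, supplied by the caller instead of by each document.
NS = {"h": "http://human.org", "d": "http://pet.org"}

HUMAN = '<h:human xmlns:h="http://human.org"><h:name>John Doe</h:name></h:human>'
PET = '<d:pet xmlns:d="http://pet.org"><d:name>Kitty</d:name></d:pet>'


def extract_all(xml_text, path):
    """Return the text of every element matching path; the list may be empty."""
    root = ET.fromstring(xml_text)
    return [node.text or "" for node in root.findall(path, NS)]
```

Here a path that matches nothing is not an error: extract_all(HUMAN, "d:name")
gives an empty list, so both lookups can run against both kinds of document.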


I'll have to take a good look at the structure of our documents before I can
decide which solution we will be using. I hope it is possible to use
solution 2, because I don't like adding namespaces to unrelated documents
just to make the extractors happy.


Now that I have them working, I have a new question, which I will post
shortly.


Kind regards

Nico Tromp



On Sun, Sep 7, 2008 at 9:59 PM, Ard Schrijvers <[EMAIL PROTECTED]> wrote:

> Hello Nico,
>
> > Ard, no need to apologize.
> >
> > The main thing I am trying to do is define extractors for
> > two types of documents that have different namespaces. The
> > namespace for the properties is not the issue.
> > So when I define an extractor only for, let's say, documents
> > that are in the http://human.org namespace, adding documents
> > within the same namespace succeeds, while documents within the
> > namespace http://pet.org fail with a 406 error code.
> >
> > (cadaver output) "406
> > Not+Acceptable:+Exception+while+retrieving+content".
> >
> > and vice versa. So the main issue is: is it possible to
> > define extractors for two different kinds of documents that
> > are stored in the same location in the repository?
>
> Yes, extractors can be defined for any type of document AFAIK. Can you
> send the extractors.xml file that is giving you the problems?
>
> Regards Ard
>
> >
> >
> > Kind regards
> >
> > Nico Tromp
> >
> > On Fri, Sep 5, 2008 at 10:18 AM, Ard Schrijvers <[EMAIL PROTECTED]> wrote:
> >
> > > Hello Nico,
> > >
> > > A little addon to my explanation yesterday (it was late :-)):
> > >
> > > Your extractor configuration was slightly off regarding
> > > namespaces. If you do the following:
> > >
> > > <?xml version="1.0"?>
> > > <extractors xmlns:h="http://human.org/props"
> > >             xmlns:p="http://pet.org/props">
> > >     <extractor classname="nl.hippo.slide.extractor.HippoSimpleXmlExtractor"
> > >                uri="/files/default.preview/livingbeings"
> > >                content-type="text/xml | application/xml | text/xml; charset=UTF-8">
> > >         <configuration>
> > >             <instruction property="pet_name"
> > >                          namespace="http://hippo.nl/cms/1.0"
> > >                          xpath="string(/p:pet/p:name)"/>
> > >             <instruction property="human_name"
> > >                          namespace="http://hippo.nl/cms/1.0"
> > >                          xpath="string(/h:human/h:name)"/>
> > >         </configuration>
> > >     </extractor>
> > > </extractors>
> > >
> > > That should do the job. Note my namespace declaration in the
> > > <extractors> element. Also, you could change the namespace in the
> > > <instruction> element from 'http://hippo.nl/cms/1.0' to yours, but
> > > why put every property in the repository with a different
> > > namespace? I wouldn't do so.
> > >
> > > Furthermore (see also [1]), try to avoid implicit namespaces in the
> > > XML content. They result in extraction problems.
> > >
> > > Regards Ard
> > >
> > > [1]
> > > http://www.nabble.com/Namespaces-in-documents-with-extractor-to13854562.html#a13854562
> > >
> > > >
> > > > Hello Nico,
> > > >
> > > >
> > > > > Hi all.
> > > > >
> > > > > Is it possible to define two extractors that map to the same URI and
> > > > > that extract content from documents in two different namespaces?
> > > > >
> > > > >
> > > > > So, given the following situation:
> > > > >
> > > > >
> > > > >
> > > > > Document one looks like this:
> > > > >
> > > > > <h:human xmlns:h="http://human.org">
> > > > >     <h:name>Nico</h:name>
> > > > > </h:human>
> > > > >
> > > > > and the second document looks like this:
> > > > >
> > > > > <p:pet xmlns:p="http://pet.org">
> > > > >     <p:name></p:name>
> > > > > </p:pet>
> > > > >
> > > >
> > > > So, if I understand correctly, you have two unrelated, separate
> > > > documents, right?
> > > >
> > > > >
> > > > > The extractors look like this:
> > > > >
> > > > > <extractor classname="nl.hippo.slide.extractor.HippoSimpleXmlExtractor"
> > > > >            uri="/files/default.www/livingbeings"
> > > > >            content-type="text/xml | text/xml; charset=UTF-8">
> > > > >     <configuration>
> > > > >         <instruction property="human_name"
> > > > >                      namespace="http://human.org/props"
> > > > >                      xpath="string(/h:human/h:name)"/>
> > > > >     </configuration>
> > > > > </extractor>
> > > > >
> > > > > <extractor classname="nl.hippo.slide.extractor.HippoSimpleXmlExtractor"
> > > > >            uri="/files/default.www/livingbeings"
> > > > >            content-type="text/xml | text/xml; charset=UTF-8">
> > > > >     <configuration>
> > > > >         <instruction property="pet_name"
> > > > >                      namespace="http://pet.org/props"
> > > > >                      xpath="string(/p:pet/p:name)"/>
> > > > >     </configuration>
> > > > > </extractor>
> > > >
> > > > Why use your own namespaces? I think that
> > > >
> > > > <instruction property="pet_name" namespace="http://hippo.nl/cms/1.0"
> > > >     xpath="string(/p:pet/p:name)"/>
> > > >
> > > > and
> > > >
> > > > <instruction property="human_name" namespace="http://hippo.nl/cms/1.0"
> > > >     xpath="string(/h:human/h:name)"/>
> > > >
> > > > should be fine. You can also put them both together in one single
> > > > <extractor>.
> > > >
> > > > >
> > > > >
> > > > > I have tried different extractor configurations but none of them
> > > > > seems to do the job.
> > > >
> > > > You mean you cannot see the property value being extracted?
> > > >
> > > > >
> > > > > Should it be possible to put the two documents into
> > > > > "/files/default.www/levingbeings" or a sub directory? If the
> > > >
> > > > Anywhere below /files/default.www/levingbeings would mean that the
> > > > extractor runs for any document below this path. Now I see you have
> > > > /files/default.www/levingbeings, but are you sure you are looking at
> > > > the live documents? Normally, when you edit through the Hippo CMS, we
> > > > set it to /files/default.preview (with or without /levingbeings
> > > > appended). When a document is published, the extracted properties are
> > > > published as well, so there is no need to extract for www again. I
> > > > see the documentation has it set to www, which is not very clear :-)))
> > > >
> > > > > answer is yes, how do I configure the extractors, and should it be
> > > > > possible to use the same setup with the
> > > > > nl.hippo.slide.extractor.MultiValueXMLPropertyExtractor
> > > > > extractor?
> > > >
> > > > No, you won't need this one
> > > >
> > > > Hope this helps,
> > > >
> > > > Regards Ard
> > > >
> > > >
> > > > >
> > > > > Kind regards
> > > > >
> > > > > Nico Tromp
> > > > > ********************************************
> > > > > Hippocms-dev: Hippo CMS development public mailinglist
> > > > >
> > > > > Searchable archives can be found at:
> > > > > MarkMail: http://hippocms-dev.markmail.org
> > > > > Nabble: http://www.nabble.com/Hippo-CMS-f26633.html
> > > > >
> > > > >
