Re: Graph chain: org.apache.jena.riot exception

Rupert Westenthaler Thu, 13 Oct 2016 03:01:55 -0700

Hi Michal,

An engine only "fails" if it cannot process a ContentItem that it
expects to process. E.g. if you send an image, (3) will not fail, as it
does not expect to process image data. If you send a text but (3)
cannot extract topics (e.g. because of an error during the
classification), it will fail.


Enhancement Engines have two methods: canEnhance(..) and
computeEnhancements(..).
* If canEnhance(..) returns that it cannot enhance this ContentItem, it is
not a failure (and that is what (3) is expected to do for image data).
* canEnhance(..) can also throw an EngineException. This would count
as an error, but AFAIK it is not done by actual engine implementations.
* computeEnhancements(..) could also return normally without changing
anything in the ContentItem. This would also not count as an error, but
is rarely used. Some engines will do this, e.g. if you ask the
language detection engine to detect the language of a blank text.
* computeEnhancements(..) can also throw an EngineException. This
counts as an error.
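
A minimal sketch of these four cases, written as a toy Python model rather
than against Stanbol's Java API. Only the names canEnhance,
computeEnhancements, EngineException and CANNOT_ENHANCE come from this
thread; the ENHANCE constant, the numeric values, the TopicEngine class, the
dict-based item and the run_engine helper are assumptions made up for
illustration.

```python
# Toy model of the four outcomes; NOT the Stanbol API.
CANNOT_ENHANCE = 0  # name from the thread; the numeric value is an assumption
ENHANCE = 1         # hypothetical stand-in for "engine can process this item"


class EngineException(Exception):
    """Stand-in for the exception an engine throws on a real failure."""


class TopicEngine:
    """Hypothetical engine (3): extracts topics from text only."""

    def can_enhance(self, item):
        # Image data is simply not this engine's business: no failure.
        return ENHANCE if item["type"] == "text" else CANNOT_ENHANCE

    def compute_enhancements(self, item):
        if item.get("broken"):
            # E.g. an error during the classification.
            raise EngineException("classification failed")
        if item["content"].strip():
            item["topics"] = ["some-topic"]
        # Blank text: return normally without changing the item.


def run_engine(engine, item):
    """Classify one engine execution as 'skipped', 'ok' or 'failed'."""
    try:
        if engine.can_enhance(item) == CANNOT_ENHANCE:
            return "skipped"
        engine.compute_enhancements(item)
        return "ok"
    except EngineException:
        # Thrown by either method: this is the only case that counts
        # as an error.
        return "failed"
```

In this model an image is "skipped", a blank text is "ok" without any
topics being added, and only a thrown EngineException yields "failed".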

Typical engine implementations will return CANNOT_ENHANCE for
canEnhance(..) requests if prerequisites are not fulfilled (e.g.
most NLP processing engines require a language annotation). So the
OPTIONAL flag is useful to define whether a chain should recover from an
abnormal execution of an engine. It is not intended to be used for
assertions on enhancement results (e.g. requiring that a language
annotation is present after langdetect, or that a plain text content
part is present after Tika).
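
To illustrate the difference, here is a rough Python sketch (again a toy
model, not Stanbol code). The run_chain helper, the ChainFailed exception,
the three engine functions and the hard-coded "en" result are all
hypothetical; aborting the chain is modelled here simply by raising
ChainFailed.

```python
# Toy model of the "optional" flag; NOT the Stanbol API.
class EngineException(Exception):
    """Stand-in for an engine failure."""


class ChainFailed(Exception):
    """Hypothetical: raised when a non-optional engine fails."""


def run_chain(steps, item):
    """Run (name, engine_function, optional) steps in order.

    An engine function returns False to decline the item (like
    CANNOT_ENHANCE) and raises EngineException on a real failure.
    """
    log = []
    for name, engine, optional in steps:
        try:
            status = "skipped" if engine(item) is False else "ok"
        except EngineException as exc:
            if not optional:
                raise ChainFailed(name) from exc
            status = "failed"  # optional: record it and carry on
        log.append((name, status))
    return log


def langdetect(item):
    if not item["text"].strip():
        return None  # nothing detected, but no error either
    item["language"] = "en"  # assumed result, for illustration only


def flaky_linker(item):
    raise EngineException("remote service unavailable")


def nlp(item):
    # Prerequisite check: decline instead of failing.
    if "language" not in item:
        return False
    item["tokens"] = item["text"].split()
```

With a blank text, this chain completes without any error although no
language annotation was written: langdetect returns normally and nlp just
declines. That is why the flag cannot serve as an assertion on results; a
downstream engine has to check for the annotations it needs itself.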

best
Rupert

On Thu, Oct 13, 2016 at 11:41 AM, Michal Krajňanský
<michal.krajnan...@gmail.com> wrote:
> Hi Rupert,
>
> Good, what you are saying confirms what I understood from the documentation.
>
> I am thinking about a scenario like this:
>
> (1) Tika -- get document type (mandatory)
>       (2) for images -- find objects (optional, depends on 1)
>       (3) for text -- extract topics (optional, depends on 1)
> (4) do something with the extracted metadata (mandatory, depends on (2) OR
> (3) )
>
> I understand that both (2) and (3) must be optional, as only one of them
> will be executed for a single input doc.
>
> I am also thinking about how best to check the "failure", i.e. if the input
> cannot be processed by either (2) or (3). I guess this needs to be checked
> in the internal logic of (4) then... Is it good practice to analyze the
> metadata graph in the canEnhance() function?
>
> Thank you again!
>
> Best,
> Michal
>
>
>
> On Thu, Oct 13, 2016 at 11:22 AM Rupert Westenthaler <
> rupert.westentha...@gmail.com> wrote:
>
>> On Thu, Oct 13, 2016 at 11:12 AM, Michal Krajňanský
>> <michal.krajnan...@gmail.com> wrote:
>> > Hi Rupert,
>> >
>> > Thank you for the advice, this is very useful. From the name I assumed that
>> > a chain list is only linear, but now I see that you can specify multiple
>> > dependencies of a single node.
>> >
>> > Anyways, I was able to make GraphChain work by correcting the syntax as
>> > A. Soroka suggested -- thank you as well!
>> >
>> > FYI, my aim is to understand how the branching works and how the
>> > "optional" attribute behaves when there are multiple mutually exclusive
>> > paths to the same "sink" node (which is not optional).
>>
>> Optional just means that if an engine fails the execution of the chain
>> is continued.
>>
>> Branching is only used to improve performance by executing several
>> engines at the same time (if the engines support it). A good
>> example is Entity Linking against multiple vocabularies. If you
>> include several entity linking engines in your chain, they will be
>> executed concurrently if you use a WeightedChain. With a
>> ListChain they will be executed in the order configured in the
>> list.
>>
>> WeightedChain "sorts" engine executions based on metadata provided by
>> the engine implementations. This allows it to (1) order engine
>> executions correctly and (2) execute engines of the same type in
>> parallel. If you want (or need) fine-grained control over when an
>> engine can run, you need to use the GraphChain.
>>
>> In a graph chain, an engine depending on an optional engine will wait
>> until the optional engine has completed. It will also be executed if
>> the optional engine fails.
>>
>> best
>> Rupert
>>
>> >
>> > Best Regards,
>> > Michal
>> >
>> > On Thu, Oct 13, 2016 at 10:32 AM Rupert Westenthaler <
>> > rupert.westentha...@gmail.com> wrote:
>> >
>> >> Hi Michal,
>> >>
>> >> I would suggest you try to define the execution plan by using the
>> >> chain list [1]. The best would be to do it via the UI provided by the
>> >> Felix Webconsole. By doing so you do not need to deal with RDF syntax
>> >> directly.
>> >>
>> >> Anyways the GraphChain is an advanced feature that allows fine grained
>> >> control over chain execution. In most of the cases configuring a
>> >> WeightedChain[2] is sufficient.
>> >>
>> >> best
>> >> Rupert
>> >>
>> >>
>> >> [1] https://stanbol.apache.org/docs/trunk/components/enhancer/chains/graphchain#chainlist
>> >> [2] https://stanbol.apache.org/docs/trunk/components/enhancer/chains/weightedchain.html
>> >>
>> >> On Mon, Oct 10, 2016 at 3:45 PM, A. Soroka <aj...@virginia.edu> wrote:
>> >> > Ah, I see. That is not at all parseable Turtle RDF. It really is just a
>> >> > list of URIs. You named the file ".ttl", but it is not proper Turtle:
>> >> >
>> >> > https://en.wikipedia.org/wiki/Turtle_(syntax)#Example
>> >> >
>> >> > There is a great deal of punctuation missing. I'm not at all sure why
>> >> > the example in the docs is printed that way. It won't work.
>> >> >
>> >> > Perhaps someone who knows more about how the documentation is generated
>> >> > can comment? Perhaps that example was meant to acquire some formatting
>> >> > before publication?
>> >> >
>> >> > ---
>> >> > A. Soroka
>> >> > The University of Virginia Library
>> >> >
>> >> >> On Oct 10, 2016, at 8:32 AM, Michal Krajňanský <michal.krajnan...@gmail.com> wrote:
>> >> >>
>> >> >> Dear Mr. Soroka,
>> >> >>
>> >> >> Sorry for the mistake, trying again -- please find the attached file.
>> >> >>
>> >> >> As you can see, I took the example from the Stanbol website:
>> >> >> https://stanbol.apache.org/docs/trunk/components/enhancer/chains/executionplan.html
>> >> >> and copied the prefix declarations from here:
>> >> >> https://stanbol.apache.org/docs/trunk/components/enhancer/chains/graphchain.html
>> >> >>
>> >> >> Thank you for looking into this!
>> >> >>
>> >> >> Michal Krajnansky
>> >> >>
>> >> >>
>> >> >> On Mon, Oct 10, 2016 at 2:23 PM A. Soroka <aj...@virginia.edu> wrote:
>> >> >> That RDF quote didn't come through the mailing list (at least not for
>> >> >> me). It became a list of URIs. Perhaps you can try pasting it as very
>> >> >> plain text? Or you could put it in Gist or some other pastebin service?
>> >> >>
>> >> >> ---
>> >> >> A. Soroka
>> >> >> The University of Virginia Library
>> >> >>
>> >> >> > On Oct 10, 2016, at 5:22 AM, Michal Krajňanský <michal.krajnan...@gmail.com> wrote:
>> >> >> >
>> >> >> > Dear Stanbol Users,
>> >> >> >
>> >> >> > I have been experimenting with the system and I would appreciate your
>> >> >> > help with the following problem.
>> >> >> >
>> >> >> > I am trying to implement a custom Graph chain like the example here:
>> >> >> > https://stanbol.apache.org/docs/trunk/components/enhancer/chains/executionplan.html
>> >> >> > and here:
>> >> >> > https://stanbol.apache.org/docs/trunk/components/enhancer/chains/graphchain.html
>> >> >> >
>> >> >> > However, when trying to invoke the chain, I get the following exception:
>> >> >> >
>> >> >> > org.apache.jena.riot [line: 6, col: 1 ] Undefined prefix: urn
>> >> >> >
>> >> >> > My execution plan RDF looks like this:
>> >> >> >
>> >> >> > @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
>> >> >> > @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>> >> >> > @prefix ep: <http://stanbol.apache.org/ontology/enhancer/executionplan#> .
>> >> >> > @prefix ehp: <http://stanbol.apache.org/ontology/enhancementproperties#> .
>> >> >> >
>> >> >> > urn:execPlan
>> >> >> >     rdf:type ep:ExecutionPlan
>> >> >> >     ep:hasExecutionNode urn:node1, urn:node2, urn:node3, urn:node4, urn:node5
>> >> >> >     ep:chain "demoChain"
>> >> >> > urn:node1
>> >> >> >     rdf:type stanbol:ExecutionNode
>> >> >> >     ep:inExecutionPlan urn:execPlan
>> >> >> >     ep:engine langId
>> >> >> > urn:node2
>> >> >> >     rdf:type ep:ExecutionNode
>> >> >> >     ep:inExecutionPlan urn:execPlan
>> >> >> >     ep:dependsOn urn:node1
>> >> >> >     ep:engine ner
>> >> >> > urn:node3
>> >> >> >     rdf:type ep:ExecutionNode
>> >> >> >     ep:inExecutionPlan urn:execPlan
>> >> >> >     ep:dependsOn urn:node1
>> >> >> >     ep:engine dbpediaLinking
>> >> >> > urn:node4
>> >> >> >     rdf:type ep:ExecutionNode
>> >> >> >     ep:inExecutionPlan urn:execPlan
>> >> >> >     ep:dependsOn urn:node1
>> >> >> >     ep:engine geonamesLinking
>> >> >> > urn:node5
>> >> >> >     rdf:type ep:ExecutionNode
>> >> >> >     ep:inExecutionPlan urn:execPlan
>> >> >> >     ep:engine zemanta
>> >> >> >     ep:optional "true"^^xsd:boolean
>> >> >> >
>> >> >> > Thank you in advance for your kind help.
>> >> >> >
>> >> >> > Best Regards,
>> >> >> > Michal Krajnansky
>> >> >>
>> >> >> <example-chain.ttl>
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> | Rupert Westenthaler             rupert.westentha...@gmail.com
>> >> | Bodenlehenstraße 11                              ++43-699-11108907
>> >> | A-5500 Bischofshofen
>> >> | REDLINK.CO
>> >> ..........................................................................
>> >> | http://redlink.co/
>> >>
>>



-- 
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11                              ++43-699-11108907
| A-5500 Bischofshofen
| REDLINK.CO 
..........................................................................
| http://redlink.co/
