Let's have a go at implementing it! You know my thoughts (make it like OODT ;) )



On 2/5/18, 8:37 AM, "Nick Burch" <apa...@gagravarr.org> wrote:

    Ping - anyone got any thoughts on the proposed metadata parser stuff, and 
    any ideas on the content part?
    
    On Tue, 2 Jan 2018, Nick Burch wrote:
    > On Thu, 26 Oct 2017, Chris Mattmann wrote:
    >> On collision, the precedence order defines what key takes precedence and
    >> _overwrites_ the other. Overwrite is but one option (you could save *all*
    >> the values it’s a multi-valued key structure so…)
    >
    > OK, I think that's fine. I've had a go at updating the wiki for the metadata
    > case:
    > https://wiki.apache.org/tika/CompositeParserDiscussion#Supplementary.2FAdditive
    > And example Tika Config settings for it
    > https://wiki.apache.org/tika/CompositeParserDiscussion#line-20
    > If people are happy with how that sounds/looks, I can have a stab at 
    > implementing it, as I *think* it's quite easy
    >
    >
    > However... that still leaves the Content (XHTML SAX events) case to solve!
    >
    > Anyone have any ideas on how we can append to or cancel/reset the Content
    > Handler series of SAX events when we move onto a second+ parser for a file?
    >
    > Thanks
    > Nick
    >
    >> On 10/26/17, 9:43 AM, "Nick Burch" <apa...@gagravarr.org> wrote:
    >>
    >>    On Thu, 26 Oct 2017, Chris Mattmann wrote:
    >>    > My general approach to conflicting metadata is simply to define
    >>    > precedence orders.
    >>    >
    >>    > For example here is one documented from OODT:
    >>    >
    >>    > https://cwiki.apache.org/confluence/display/OODT/Understanding+CAS-PGE+Metadata+Precendence
    >>    >
    >>    > We can do similar things with Tika, e.g.,
    >>    >
    >>    > [CoreMetadata.PROPERTIES]
    >>    > [ImageParser.METADATA]
    >>    > [TikaOCR.METADATA]
    >>
    >>    What happens if two different parsers both output the same bit of metadata
    >>    though? eg Tim's example of one giving dc:creator of Tim and the second
    >>    giving dc:creator of Chris?
    >> 
    >>
    >>    Secondly, what about the XHTML SAX events stream? I think that's probably
    >>    the harder case...
    >>
    >>    Nick
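
For illustration, the precedence-order idea discussed in this thread (overwrite on collision, versus keeping all values in a multi-valued key) could be sketched roughly as below. This is a plain-Java sketch using only the standard collections, not Tika's or OODT's actual API: the class name PrecedenceMerge, the Policy enum, the merge method and the sample parser outputs are all hypothetical, and the "later entries win" ordering is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from Tika or OODT.
public class PrecedenceMerge {

    // What to do when two parsers emit the same metadata key.
    enum Policy { OVERWRITE, KEEP_ALL }

    /**
     * Merges per-parser metadata. The list is assumed to be ordered from
     * lowest to highest precedence, so later entries win under OVERWRITE.
     */
    static Map<String, List<String>> merge(
            List<Map<String, List<String>>> byPrecedence, Policy policy) {
        Map<String, List<String>> merged = new LinkedHashMap<>();
        for (Map<String, List<String>> parserOutput : byPrecedence) {
            for (Map.Entry<String, List<String>> e : parserOutput.entrySet()) {
                if (policy == Policy.OVERWRITE) {
                    // Higher-precedence parser replaces whatever was there.
                    merged.put(e.getKey(), new ArrayList<>(e.getValue()));
                } else {
                    // Multi-valued key: append, keeping precedence order.
                    merged.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                          .addAll(e.getValue());
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // The dc:creator collision from the thread, with made-up parser outputs.
        Map<String, List<String>> first = Map.of("dc:creator", List.of("Tim"));
        Map<String, List<String>> second = Map.of("dc:creator", List.of("Chris"));
        List<Map<String, List<String>>> ordered = List.of(first, second);

        System.out.println(merge(ordered, Policy.OVERWRITE)); // {dc:creator=[Chris]}
        System.out.println(merge(ordered, Policy.KEEP_ALL));  // {dc:creator=[Tim, Chris]}
    }
}
```

Under these assumptions the only difference between the two options is what happens at the moment of collision; the precedence order itself is just the order in which the parser outputs are fed in.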

