Re: Named entity coref resolution based on dbpedia categories and rdf:type

Cristian Petroaca Fri, 28 Mar 2014 06:14:17 -0700

Examples :

1. Group membership :
    a. Spatial membership :


        "Microsoft announced its 2013 earnings. <coref>The Redmond-based
company</coref> made huge profits."

    b. Organisational membership :

       "Mick Jagger started a new solo album. <coref>The Rolling Stones
singer</coref> did not say what the theme will be."

2. Functional membership :

   "Allianz announced its 2013 earnings. <coref>The financial services
company</coref> made a huge profit."

3. If no match was found for the current NER with the rules above, but the
yago:class which matched has more than 2 nouns, we also consider this a good
co-reference, though perhaps with a lower confidence.

   "Boris Becker will take part in a demonstrative tennis match. <coref>The
former tennis player</coref> will play again after 10 years."
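
A minimal sketch of how the rules quoted below could be combined into a score. Everything in it is an assumption made for illustration: the Entity class, the score method and the numeric confidence values (0.25 steps) are hypothetical and are not part of Stanbol. Only the spatial property names are taken from the proposal. It also treats every word of the yago:class label as a "noun", which a real engine would decide from the POS tags.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CorefRuleSketch {

    enum NerType { PERSON, ORGANISATION, PLACE }

    // Spatial-membership properties per NER type, as listed in the proposal.
    static final Map<NerType, List<String>> SPATIAL = new EnumMap<>(NerType.class);
    static {
        SPATIAL.put(NerType.PERSON, Arrays.asList("birthPlace", "region", "nationality"));
        SPATIAL.put(NerType.ORGANISATION,
                Arrays.asList("foundationPlace", "locationCity", "location", "hometown"));
        SPATIAL.put(NerType.PLACE, Arrays.asList("country", "subdivisionName", "location"));
    }

    /** Hypothetical stand-in for a linked entity; not a Stanbol class. */
    static class Entity {
        final NerType type;
        final Set<String> classWords = new HashSet<>();          // lemmatized yago:class label
        final Map<String, Set<String>> props = new HashMap<>();  // property -> lower-cased values

        Entity(NerType type, String... classWords) {
            this.type = type;
            this.classWords.addAll(Arrays.asList(classWords));
        }

        Entity with(String property, String... values) {
            props.put(property, new HashSet<>(Arrays.asList(values)));
            return this;
        }
    }

    /** Returns an illustrative confidence in (0, 1], or 0 if the phrase is not a co-reference. */
    static double score(Entity ner, Set<String> phraseWords) {
        // Rule 1: the yago:class label must match the noun phrase.
        Set<String> classHits = new HashSet<>(ner.classWords);
        classHits.retainAll(phraseWords);
        if (classHits.isEmpty()) {
            return 0.0;
        }
        // Rule 2a: count phrase words found in the spatial properties of the NER.
        int propertyHits = 0;
        for (String property : SPATIAL.get(ner.type)) {
            Set<String> values = ner.props.getOrDefault(property, Collections.<String>emptySet());
            for (String word : phraseWords) {
                if (values.contains(word)) {
                    propertyHits++;
                }
            }
        }
        if (propertyHits > 0) {
            // At least 2 words matched here: one class word plus one property value.
            return Math.min(1.0, (classHits.size() + propertyHits) / 4.0);
        }
        // Fallback: a class label of more than 2 words that matched completely,
        // accepted with a lower (arbitrary) confidence.
        if (ner.classWords.size() > 2 && classHits.size() == ner.classWords.size()) {
            return 0.25;
        }
        return 0.0;
    }

    public static void main(String[] args) {
        Entity microsoft = new Entity(NerType.ORGANISATION, "company")
                .with("locationCity", "redmond");
        Entity becker = new Entity(NerType.PERSON, "former", "tennis", "player");

        System.out.println(score(microsoft, new HashSet<>(Arrays.asList("redmond", "company"))));
        System.out.println(score(becker, new HashSet<>(Arrays.asList("former", "tennis", "player"))));
        System.out.println(score(microsoft, new HashSet<>(Arrays.asList("tennis", "player"))));
    }
}
```

With these made-up values, "The Redmond-based company" scores higher against Microsoft than "The former tennis player" does against Boris Becker, which only passes through the fallback rule.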


2014-03-28 12:22 GMT+02:00 Rupert Westenthaler <
[email protected]>:

> Hi Cristian, all
>
> Looks good to me, but I am not sure if I got everything. If you could
> provide example texts where those rules apply it would make it much
> easier to understand.
>
> Instead of using dbpedia properties you should define your own domain
> model (ontology). You can then align the dbpedia properties to your
> model. This will make it possible to apply this approach to knowledge
> bases other than dbpedia as well.
>
> For people new to this thread: The above message adds to the
> suggestion first made by Cristian on 4th February. Also the following
> 4 messages (until 7th Feb) provide additional context.
>
> best
> Rupert
>
>
> On Fri, Mar 28, 2014 at 9:23 AM, Cristian Petroaca
> <[email protected]> wrote:
> > Hi guys,
> >
> > After Rupert's last suggestions related to this enhancement engine I
> > devised a more comprehensive algorithm for matching the noun phrases
> > against the NER properties. Please take a look and let me know what you
> > think. Thanks.
> >
> > The following rules will be applied to every noun phrase in order to find
> > co-references:
> >
> > 1. For each NER prior to the current noun phrase in the text match the
> > yago:class label to the contents of the noun phrase.
> >
> > For the NERs which have a yago:class which matches, apply:
> >
> > 2. Group membership rules :
> >
> >     a. spatial membership : the NER is part of a Location. If the noun
> > phrase contains a LOCATION or a demonym then check any location
> > properties of the matching NER.
> >
> >     If matching NER is a :
> >     - person, match against :birthPlace, :region, :nationality
> >     - organisation, match against :foundationPlace, :locationCity,
> > :location, :hometown
> >     - place, match against :country, :subdivisionName, :location
> >
> >     Ex: The Italian President, The Redmond-based company
> >
> >     b. organisational membership : the NER is part of an Organisation. If
> > the noun phrase contains an ORGANISATION then check the following
> > properties of the matching NER:
> >
> >     If matching NER is :
> >     - person, match against :occupation, :associatedActs
> >     - organisation ?
> >     - location ?
> >
> > Ex: The Microsoft executive, The Pink Floyd singer
> >
> > 3. Functional description rule: the noun phrase describes what the NER
> > does conceptually.
> > If there are no NERs in the noun phrase then match the following
> > properties of the matching NER to the contents of the noun phrase (aside
> > from the nouns which are part of the yago:class) :
> >
> >    If NER is a:
> >    - person ?
> >    - organisation : match against :service, :industry, :genre
> >    - location ?
> >
> > Ex:  The software company.
> >
> > 4. If no matches were found for the current NER with rules 2 or 3 then if
> > the yago:class which matched has more than 2 nouns then we also consider
> > this a good co-reference but with a lower confidence maybe.
> >
> > Ex: The former tennis player, the theoretical physicist.
> >
> > 5. Based on the number of nouns which matched we create a confidence
> > level. The number of matched nouns cannot be lower than 2 and we must
> > have a yago:class match.
> >
> > For all NERs which got to this point, select the closest ones in the text
> > to the noun phrase which matched against the same properties (yago:class
> > and dbpedia) and mark them as co-references.
> >
> > Note: all noun phrases need to be lemmatized before all of this in case
> > there are any plurals.
> >
> >
> > 2014-03-25 20:50 GMT+02:00 Cristian Petroaca <
> [email protected]>:
> >
> >> That worked. Thanks.
> >>
> >> So, there are no exceptions during the startup of the launcher.
> >> The component tab in the felix console shows 6 WeightedChains the first
> >> time, including the default one but after my changes and a restart there
> >> are only 5 - the default one is missing altogether.
> >>
> >>
> >> 2014-03-24 20:18 GMT+02:00 Rupert Westenthaler <
> >> [email protected]>:
> >>
> >> Hi Cristian,
> >>>
> >>> I have been seeing the same problem since last Friday. The solution
> >>> mentioned in [1] works for me.
> >>>
> >>>     mvn -Djsse.enableSNIExtension=false {goals}
> >>>
> >>> No idea why https connections to GitHub currently cause this. I
> >>> could not find anything related via Google. So I suggest using the
> >>> system property for now. If this persists for longer we can adapt the
> >>> build files accordingly.
> >>>
> >>> best
> >>> Rupert
> >>>
> >>>
> >>>
> >>>
> >>> [1]
> >>>
> http://stackoverflow.com/questions/7615645/ssl-handshake-alert-unrecognized-name-error-since-upgrade-to-java-1-7-0
> >>>
> >>> On Mon, Mar 24, 2014 at 7:01 PM, Cristian Petroaca
> >>> <[email protected]> wrote:
> >>> > I did a clean on the whole project and now I wanted to do another "mvn
> >>> > clean install" but I am getting this :
> >>> >
> >>> > "[INFO] ------------------------------------------------------------------------
> >>> > [ERROR] Failed to execute goal
> >>> > org.apache.maven.plugins:maven-antrun-plugin:1.6:run (download) on project
> >>> > org.apache.stanbol.data.opennlp.lang.es: An Ant BuildException has occured:
> >>> > The following error occurred while executing this line:
> >>> > [ERROR] C:\Data\Projects\Stanbol\main\data\opennlp\lang\es\download_models.xml:33:
> >>> > Failed to copy
> >>> > https://github.com/utcompling/OpenNLP-Models/raw/58ef0c60031403e66e47ae35edaf58d3478b67af/models/es/opennlp-es-maxent-pos-es.bin
> >>> > to
> >>> > C:\Data\Projects\Stanbol\main\data\opennlp\lang\es\downloads\resources\org\apache\stanbol\data\opennlp\es-pos-maxent.bin
> >>> > due to javax.net.ssl.SSLProtocolException handshake alert : unrecognized_name"
> >>> >
> >>> >
> >>> >
> >>> > 2014-03-20 11:25 GMT+02:00 Rupert Westenthaler <
> >>> > [email protected]>:
> >>> >
> >>> >> Hi Cristian,
> >>> >>
> >>> >> On Thu, Mar 20, 2014 at 10:00 AM, Cristian Petroaca
> >>> >> <[email protected]> wrote:
> >>> >> >
> >>> >>
> >>>
> stanbol.enhancer.chain.weighted.chain=["tika;optional","langdetect","opennlp-sentence","opennlp-token","opennlp-pos","opennlp-ner","dbpediaLinking","entityhubExtraction","dbpedia-dereference","pos-chunker"]
> >>> >> > service.ranking=I"-2147483648"
> >>> >> > stanbol.enhancer.chain.name="default"
> >>> >>
> >>> >> Looks fine to me. Do you see any exception during the startup of
> >>> >> the launcher? Can you check the status of this component in the
> >>> >> component tab of the felix web console [1] (search for
> >>> >> "org.apache.stanbol.enhancer.chain.weighted.impl.WeightedChain")? If
> >>> >> you have multiple you can find the correct one by comparing the
> >>> >> "Properties" with those in the configuration file.
> >>> >>
> >>> >> I guess that the corresponding service is in the 'unsatisfied' state,
> >>> >> as you do not see it in the web interface. If that is the case you
> >>> >> should also see the corresponding exception in the log. You can also
> >>> >> manually stop/start the component. In this case the exception should
> >>> >> be re-thrown and you do not need to search the log for it.
> >>> >>
> >>> >> best
> >>> >> Rupert
> >>> >>
> >>> >>
> >>> >> [1] http://localhost:8080/system/console/components
> >>> >>
> >>> >> >
> >>> >> >
> >>> >> >
> >>> >> > 2014-03-20 7:39 GMT+02:00 Rupert Westenthaler <
> >>> >> [email protected]
> >>> >> >>:
> >>> >> >
> >>> >> >> Hi Cristian,
> >>> >> >>
> >>> >> >> you cannot send attachments to the list. Please copy the contents
> >>> >> >> directly into the mail.
> >>> >> >>
> >>> >> >> thx
> >>> >> >> Rupert
> >>> >> >>
> >>> >> >> On Wed, Mar 19, 2014 at 9:20 PM, Cristian Petroaca
> >>> >> >> <[email protected]> wrote:
> >>> >> >> > The config attached.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > 2014-03-19 9:09 GMT+02:00 Rupert Westenthaler
> >>> >> >> > <[email protected]>:
> >>> >> >> >
> >>> >> >> >> Hi Cristian,
> >>> >> >> >>
> >>> >> >> >> can you provide the contents of the chain after your
> >>> >> >> >> modifications? It would be interesting to test why the chain is
> >>> >> >> >> no longer active after the restart.
> >>> >> >> >>
> >>> >> >> >> You can find the config file in the 'stanbol/fileinstall' folder.
> >>> >> >> >>
> >>> >> >> >> best
> >>> >> >> >> Rupert
> >>> >> >> >>
> >>> >> >> >> On Tue, Mar 18, 2014 at 8:24 PM, Cristian Petroaca
> >>> >> >> >> <[email protected]> wrote:
> >>> >> >> >> > Related to the default chain selection rules : before restart I
> >>> >> >> >> > had a chain with the name 'default', as in I could access it via
> >>> >> >> >> > enhancer/chain/default.
> >>> >> >> >> > Then I just added another engine to the 'default' chain. I
> >>> >> >> >> > assumed that after the restart the chain with the 'default' name
> >>> >> >> >> > would be persisted, so the first rule should have been applied
> >>> >> >> >> > after the restart as well. But instead I cannot reach it via
> >>> >> >> >> > enhancer/chain/default anymore, so it's gone.
> >>> >> >> >> > Anyway, this is not a big deal, it's not blocking me in any way,
> >>> >> >> >> > I just wanted to understand where the problem is.
> >>> >> >> >> >
> >>> >> >> >> >
> >>> >> >> >> > 2014-03-18 7:15 GMT+02:00 Rupert Westenthaler
> >>> >> >> >> > <[email protected]
> >>> >> >> >> >>:
> >>> >> >> >> >
> >>> >> >> >> >> Hi Cristian
> >>> >> >> >> >>
> >>> >> >> >> >> On Mon, Mar 17, 2014 at 9:43 PM, Cristian Petroaca
> >>> >> >> >> >> <[email protected]> wrote:
> >>> >> >> >> >> > 1. Updated to the latest code and it's gone. Cool
> >>> >> >> >> >> >
> >>> >> >> >> >> > 2. I start the stable launcher -> create a new instance of
> >>> >> >> >> >> > the PosChunkerEngine -> add it to the default chain. At this
> >>> >> >> >> >> > point everything looks good and works ok.
> >>> >> >> >> >> > After I restart the server the default chain is gone and
> >>> >> >> >> >> > instead I see this in the enhancement chains page :
> >>> >> >> >> >> > all-active (default, id: 149, ranking: 0, impl:
> >>> >> >> >> >> > AllActiveEnginesChain ). all-active did not contain the
> >>> >> >> >> >> > 'default' word before the restart.
> >>> >> >> >> >> >
> >>> >> >> >> >>
> >>> >> >> >> >> Please note the default chain selection rules as described at
> >>> >> >> >> >> [1]. You can also access chains under
> >>> >> >> >> >> '/enhancer/chain/{chain-name}'
> >>> >> >> >> >>
> >>> >> >> >> >> best
> >>> >> >> >> >> Rupert
> >>> >> >> >> >>
> >>> >> >> >> >> [1]
> >>> >> >> >> >>
> >>> >> >> >> >>
> >>> >> >>
> >>> >>
> >>>
> http://stanbol.staging.apache.org/docs/trunk/components/enhancer/chains/#default-chain
> >>> >> >> >> >>
> >>> >> >> >> >> > It looks like the config files are exactly what I need.
> >>> Thanks.
> >>> >> >> >> >> >
> >>> >> >> >> >> >
> >>> >> >> >> >> > 2014-03-17 9:26 GMT+02:00 Rupert Westenthaler <
> >>> >> >> >> >> [email protected]
> >>> >> >> >> >> >>:
> >>> >> >> >> >> >
> >>> >> >> >> >> >> On Sat, Mar 15, 2014 at 8:34 PM, Cristian Petroaca
> >>> >> >> >> >> >> <[email protected]> wrote:
> >>> >> >> >> >> >> > Thanks Rupert.
> >>> >> >> >> >> >> >
> >>> >> >> >> >> >> > A couple more questions/issues :
> >>> >> >> >> >> >> >
> >>> >> >> >> >> >> > 1. Whenever I start the stanbol server I'm seeing this in
> >>> >> >> >> >> >> > the console output :
> >>> >> >> >> >> >> >
> >>> >> >> >> >> >>
> >>> >> >> >> >> >> This should be fixed with STANBOL-1278 [1] [2]
> >>> >> >> >> >> >>
> >>> >> >> >> >> >> > 2. Whenever I restart the server the Weighted Chains get
> >>> >> >> >> >> >> > messed up. I usually use the 'default' chain and add my
> >>> >> >> >> >> >> > engine to it so there are 11 engines in it. After the
> >>> >> >> >> >> >> > restart this chain now contains around 23 engines in total.
> >>> >> >> >> >> >>
> >>> >> >> >> >> >> I was not able to replicate this. What I tried was
> >>> >> >> >> >> >>
> >>> >> >> >> >> >> (1) start up the stable launcher
> >>> >> >> >> >> >> (2) add an additional engine to the default chain
> >>> >> >> >> >> >> (3) restart the launcher
> >>> >> >> >> >> >>
> >>> >> >> >> >> >> The default chain was not changed after (2) and (3). So I
> >>> >> >> >> >> >> would need further information to find out why this is
> >>> >> >> >> >> >> happening.
> >>> >> >> >> >> >>
> >>> >> >> >> >> >> Generally it is better to create your own chain instance
> >>> >> >> >> >> >> than to modify one that is provided by the default
> >>> >> >> >> >> >> configuration. I would also recommend that you keep your
> >>> >> >> >> >> >> test configuration in text files and copy those to the
> >>> >> >> >> >> >> 'stanbol/fileinstall' folder. Doing so saves you from
> >>> >> >> >> >> >> manually re-entering the configuration after a software
> >>> >> >> >> >> >> update. The production-mode section [3] provides
> >>> >> >> >> >> >> information on how to do that.
> >>> >> >> >> >> >>
> >>> >> >> >> >> >> best
> >>> >> >> >> >> >> Rupert
> >>> >> >> >> >> >>
> >>> >> >> >> >> >> [1] https://issues.apache.org/jira/browse/STANBOL-1278
> >>> >> >> >> >> >> [2] http://svn.apache.org/r1576623
> >>> >> >> >> >> >> [3]
> http://stanbol.apache.org/docs/trunk/production-mode
> >>> >> >> >> >> >>
> >>> >> >> >> >> >> > ERROR: Bundle org.apache.stanbol.enhancer.engine.topic.web [153]: Error starting
> >>> >> >> >> >> >> > slinginstall:c:\Data\Projects\Stanbol\main\launchers\stable\target\stanbol\startup\35\org.apache.stanbol.enhancer.engine.topic.web-1.0.0-SNAPSHOT.jar
> >>> >> >> >> >> >> > (org.osgi.framework.BundleException: Unresolved constraint in bundle
> >>> >> >> >> >> >> > org.apache.stanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0:
> >>> >> >> >> >> >> > missing requirement [153.0] package;
> >>> >> >> >> >> >> > (&(package=javax.ws.rs)(version>=0.0.0)(!(version>=2.0.0))))
> >>> >> >> >> >> >> > org.osgi.framework.BundleException: Unresolved constraint in bundle
> >>> >> >> >> >> >> > org.apache.stanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0:
> >>> >> >> >> >> >> > missing requirement [153.0] package;
> >>> >> >> >> >> >> > (&(package=javax.ws.rs)(version>=0.0.0)(!(version>=2.0.0)))
> >>> >> >> >> >> >> >         at org.apache.felix.framework.Felix.resolveBundle(Felix.java:3443)
> >>> >> >> >> >> >> >         at org.apache.felix.framework.Felix.startBundle(Felix.java:1727)
> >>> >> >> >> >> >> >         at org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1156)
> >>> >> >> >> >> >> >         at org.apache.felix.framework.StartLevelImpl.run(StartLevelImpl.java:264)
> >>> >> >> >> >> >> >         at java.lang.Thread.run(Unknown Source)
> >>> >> >> >> >> >> >
> >>> >> >> >> >> >> > Despite this the server starts fine and I can use the
> >>> >> >> >> >> >> > enhancer fine. Do you guys see this as well?
> >>> >> >> >> >> >> >
> >>> >> >> >> >> >> >
> >>> >> >> >> >> >> > 2. Whenever I restart the server the Weighted Chains get
> >>> >> >> >> >> >> > messed up. I usually use the 'default' chain and add my
> >>> >> >> >> >> >> > engine to it so there are 11 engines in it. After the
> >>> >> >> >> >> >> > restart this chain now contains around 23 engines in total.
> >>> >> >> >> >> >> >
> >>> >> >> >> >> >> >
> >>> >> >> >> >> >> >
> >>> >> >> >> >> >> >
> >>> >> >> >> >> >> > 2014-03-11 9:47 GMT+02:00 Rupert Westenthaler <
> >>> >> >> >> >> >> [email protected]
> >>> >> >> >> >> >> >>:
> >>> >> >> >> >> >> >
> >>> >> >> >> >> >> >> Hi Cristian,
> >>> >> >> >> >> >> >>
> >>> >> >> >> >> >> >> NER Annotations are typically available as both
> >>> >> >> >> >> >> >> NlpAnnotations.NER_ANNOTATION and fise:TextAnnotation [1]
> >>> >> >> >> >> >> >> in the enhancement metadata. As you are already accessing
> >>> >> >> >> >> >> >> the AnalysedText I would prefer using the
> >>> >> >> >> >> >> >> NlpAnnotations.NER_ANNOTATION.
> >>> >> >> >> >> >> >>
> >>> >> >> >> >> >> >> best
> >>> >> >> >> >> >> >> Rupert
> >>> >> >> >> >> >> >>
> >>> >> >> >> >> >> >> [1]
> >>> >> >> >> >> >> >>
> >>> >> >> >> >> >>
> >>> >> >> >> >>
> >>> >> >> >> >>
> >>> >> >>
> >>> >>
> >>>
> http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fisetextannotation
> >>> >> >> >> >> >> >>
> >>> >> >> >> >> >> >> On Mon, Mar 10, 2014 at 10:07 PM, Cristian Petroaca
> >>> >> >> >> >> >> >> <[email protected]> wrote:
> >>> >> >> >> >> >> >> > Thanks.
> >>> >> >> >> >> >> >> > I assume I should get the Named entities using the same
> >>> >> >> >> >> >> >> > approach but with NlpAnnotations.NER_ANNOTATION?
> >>> >> >> >> >> >> >> >
> >>> >> >> >> >> >> >> >
> >>> >> >> >> >> >> >> >
> >>> >> >> >> >> >> >> > 2014-03-10 13:29 GMT+02:00 Rupert Westenthaler <
> >>> >> >> >> >> >> >> > [email protected]>:
> >>> >> >> >> >> >> >> >
> >>> >> >> >> >> >> >> >> Hallo Cristian,
> >>> >> >> >> >> >> >> >>
> >>> >> >> >> >> >> >> >> NounPhrases are not added to the RDF enhancement
> >>> >> >> >> >> >> >> >> results. You need to use the AnalyzedText ContentPart [1].
> >>> >> >> >> >> >> >> >>
> >>> >> >> >> >> >> >> >> Here is some demo code you can use in the
> >>> >> >> >> >> >> >> >> computeEnhancements method:
> >>> >> >> >> >> >> >> >>
> >>> >> >> >> >> >> >> >>     // Get the AnalysedText content part of the ContentItem (ci)
> >>> >> >> >> >> >> >> >>     AnalysedText at = NlpEngineHelper.getAnalysedText(this, ci, true);
> >>> >> >> >> >> >> >> >>     Iterator<? extends Section> sections = at.getSentences();
> >>> >> >> >> >> >> >> >>     if (!sections.hasNext()) { // no sentences: process as single sentence
> >>> >> >> >> >> >> >> >>         sections = Collections.singleton(at).iterator();
> >>> >> >> >> >> >> >> >>     }
> >>> >> >> >> >> >> >> >>
> >>> >> >> >> >> >> >> >>     while (sections.hasNext()) {
> >>> >> >> >> >> >> >> >>         Section section = sections.next();
> >>> >> >> >> >> >> >> >>         Iterator<Span> chunks =
> >>> >> >> >> >> >> >> >>                 section.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
> >>> >> >> >> >> >> >> >>         while (chunks.hasNext()) {
> >>> >> >> >> >> >> >> >>             Span chunk = chunks.next();
> >>> >> >> >> >> >> >> >>             Value<PhraseTag> phrase =
> >>> >> >> >> >> >> >> >>                     chunk.getAnnotation(NlpAnnotations.PHRASE_ANNOTATION);
> >>> >> >> >> >> >> >> >>             // chunks without a phrase annotation are skipped
> >>> >> >> >> >> >> >> >>             if (phrase != null
> >>> >> >> >> >> >> >> >>                     && phrase.value().getCategory() == LexicalCategory.Noun) {
> >>> >> >> >> >> >> >> >>                 log.info(" - NounPhrase [{},{}] {}", new Object[]{
> >>> >> >> >> >> >> >> >>                         chunk.getStart(), chunk.getEnd(), chunk.getSpan()});
> >>> >> >> >> >> >> >> >>             }
> >>> >> >> >> >> >> >> >>         }
> >>> >> >> >> >> >> >> >>     }
> >>> >> >> >> >> >> >> >>
> >>> >> >> >> >> >> >> >> hope this helps
> >>> >> >> >> >> >> >> >>
> >>> >> >> >> >> >> >> >> best
> >>> >> >> >> >> >> >> >> Rupert
> >>> >> >> >> >> >> >> >>
> >>> >> >> >> >> >> >> >> [1]
> >>> >> >> >> >> >> >> >>
> >>> >> >> >> >> >> >>
> >>> >> >> >> >> >>
> >>> >> >> >> >>
> >>> >> >> >> >>
> >>> >> >>
> >>> >>
> >>>
> http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext
> >>> >> >> >> >> >> >> >>
> >>> >> >> >> >> >> >> >> On Sun, Mar 9, 2014 at 6:07 PM, Cristian Petroaca
> >>> >> >> >> >> >> >> >> <[email protected]> wrote:
> >>> >> >> >> >> >> >> >> > I started to implement the engine and I'm having
> >>> >> >> >> >> >> >> >> > problems with getting results for noun phrases. I
> >>> >> >> >> >> >> >> >> > modified the "default" weighted chain to also include
> >>> >> >> >> >> >> >> >> > the PosChunkerEngine and ran a sample text : "Angela
> >>> >> >> >> >> >> >> >> > Merkel visited China. The German chancellor met with
> >>> >> >> >> >> >> >> >> > various people". I expected that the RDF XML output
> >>> >> >> >> >> >> >> >> > would contain some info about the noun phrases but I
> >>> >> >> >> >> >> >> >> > cannot see any.
> >>> >> >> >> >> >> >> >> > Could you point me to the correct way to generate the
> >>> >> >> >> >> >> >> >> > noun phrases?
> >>> >> >> >> >> >> >> >> >
> >>> >> >> >> >> >> >> >> > Thanks,
> >>> >> >> >> >> >> >> >> > Cristian
> >>> >> >> >> >> >> >> >> >
> >>> >> >> >> >> >> >> >> >
> >>> >> >> >> >> >> >> >> > 2014-02-09 14:15 GMT+02:00 Cristian Petroaca <
> >>> >> >> >> >> >> >> >> [email protected]>:
> >>> >> >> >> >> >> >> >> >
> >>> >> >> >> >> >> >> >> >> Opened
> >>> >> >> https://issues.apache.org/jira/browse/STANBOL-1279
> >>> >> >> >> >> >> >> >> >>
> >>> >> >> >> >> >> >> >> >>
> >>> >> >> >> >> >> >> >> >> 2014-02-07 10:53 GMT+02:00 Cristian Petroaca <
> >>> >> >> >> >> >> >> >> [email protected]>
> >>> >> >> >> >> >> >> >> >> :
> >>> >> >> >> >> >> >> >> >>
> >>> >> >> >> >> >> >> >> >> Hi Rupert,
> >>> >> >> >> >> >> >> >> >>>
> >>> >> >> >> >> >> >> >> >>> The "spatial" dimension is a good idea. I'll also
> >>> >> >> >> >> >> >> >> >>> take a look at Yago.
> >>> >> >> >> >> >> >> >> >>>
> >>> >> >> >> >> >> >> >> >>> I will create a Jira with what we talked about here.
> >>> >> >> >> >> >> >> >> >>> It will probably have just a draft-like description
> >>> >> >> >> >> >> >> >> >>> for now and will be updated as I go along.
> >>> >> >> >> >> >> >> >> >>>
> >>> >> >> >> >> >> >> >> >>> Thanks,
> >>> >> >> >> >> >> >> >> >>> Cristian
> >>> >> >> >> >> >> >> >> >>>
> >>> >> >> >> >> >> >> >> >>>
> >>> >> >> >> >> >> >> >> >>> 2014-02-06 15:39 GMT+02:00 Rupert
> Westenthaler <
> >>> >> >> >> >> >> >> >> >>> [email protected]>:
> >>> >> >> >> >> >> >> >> >>>
> >>> >> >> >> >> >> >> >> >>> Hi Cristian,
> >>> >> >> >> >> >> >> >> >>>>
> >>> >> >> >> >> >> >> >> >>>> definitely an interesting approach. You should have a
> >>> >> >> >> >> >> >> >> >>>> look at Yago2 [1]. As far as I can remember the Yago
> >>> >> >> >> >> >> >> >> >>>> taxonomy is much better structured than the one used
> >>> >> >> >> >> >> >> >> >>>> by dbpedia. Mapping suggestions of dbpedia to concepts
> >>> >> >> >> >> >> >> >> >>>> in Yago2 is easy as both dbpedia and yago2 provide
> >>> >> >> >> >> >> >> >> >>>> mappings [2] and [3].
> >>> >> >> >> >> >> >> >> >>>>
> >>> >> >> >> >> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro
> >>> >> >> >> >> >> >> >> >>>> > <[email protected]>:
> >>> >> >> >> >> >> >> >> >>>> >>
> >>> >> >> >> >> >> >> >> >>>> >> "Microsoft posted its 2013 earnings. The Redmond's
> >>> >> >> >> >> >> >> >> >>>> >> company made a huge profit".
> >>> >> >> >> >> >> >> >> >>>>
> >>> >> >> >> >> >> >> >> >>>> That's actually a very good example. Spatial contexts
> >>> >> >> >> >> >> >> >> >>>> are very important as they tend to be often used for
> >>> >> >> >> >> >> >> >> >>>> referencing. So I would suggest treating the spatial
> >>> >> >> >> >> >> >> >> >>>> context specially. For spatial Entities (like a City)
> >>> >> >> >> >> >> >> >> >>>> this is easy, but even for others (like a Person,
> >>> >> >> >> >> >> >> >> >>>> Company) you could use relations to spatial entities
> >>> >> >> >> >> >> >> >> >>>> to define their spatial context. This context could
> >>> >> >> >> >> >> >> >> >>>> then be used to correctly link "The Redmond's company"
> >>> >> >> >> >> >> >> >> >>>> to "Microsoft".
> >>> >> >> >> >> >> >> >> >>>>
> >>> >> >> >> >> >> >> >> >>>> In addition I would suggest using the "spatial"
> >>> >> >> >> >> >> >> >> >>>> context of each entity (basically relations to
> >>> >> >> >> >> >> >> >> >>>> entities that are cities, regions, countries) as a
> >>> >> >> >> >> >> >> >> >>>> separate dimension, because those are very often used
> >>> >> >> >> >> >> >> >> >>>> for coreferences.
> >>> >> >> >> >> >> >> >> >>>>
> >>> >> >> >> >> >> >> >> >>>> [1]
> http://www.mpi-inf.mpg.de/yago-naga/yago/
> >>> >> >> >> >> >> >> >> >>>> [2]
> >>> >> >> >> >> >> >> >> >>>>
> >>> >> >> http://downloads.dbpedia.org/3.9/links/yago_links.nt.bz2
> >>> >> >> >> >> >> >> >> >>>> [3]
> >>> >> >> >> >> >> >> >> >>>>
> >>> >> >> >> >> >> >> >>
> >>> >> >> >> >> >> >>
> >>> >> >> >> >> >>
> >>> >> >> >> >>
> >>> >> >> >> >>
> >>> >> >>
> >>> >>
> >>>
> http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoDBpediaInstances.ttl.7z
> >>> >> >> >> >> >> >> >> >>>>
> >>> >> >> >> >> >> >> >> >>>>
> >>> >> >> >> >> >> >> >> >>>> On Thu, Feb 6, 2014 at 10:33 AM, Cristian
> >>> Petroaca
> >>> >> >> >> >> >> >> >> >>>> <[email protected]> wrote:
> >>> >> >> >> >> >> >> >> >>>> > There are several dbpedia categories for each entity,
> >>> >> >> >> >> >> >> >> >>>> > in this case for Microsoft we have :
> >>> >> >> >> >> >> >> >> >>>> >
> >>> >> >> >> >> >> >> >> >>>> > category:Companies_in_the_NASDAQ-100_Index
> >>> >> >> >> >> >> >> >> >>>> > category:Microsoft
> >>> >> >> >> >> >> >> >> >>>> > category:Software_companies_of_the_United_States
> >>> >> >> >> >> >> >> >> >>>> > category:Software_companies_based_in_Washington_(state)
> >>> >> >> >> >> >> >> >> >>>> > category:Companies_established_in_1975
> >>> >> >> >> >> >> >> >> >>>> > category:1975_establishments_in_the_United_States
> >>> >> >> >> >> >> >> >> >>>> > category:Companies_based_in_Redmond,_Washington
> >>> >> >> >> >> >> >> >> >>>> > category:Multinational_companies_headquartered_in_the_United_States
> >>> >> >> >> >> >> >> >> >>>> > category:Cloud_computing_providers
> >>> >> >> >> >> >> >> >> >>>> > category:Companies_in_the_Dow_Jones_Industrial_Average
> >>> >> >> >> >> >> >> >> >>>> >
> >>> >> >> >> >> >> >> >> >>>> > So we also have "Companies based in Redmond,
> >>> >> >> >> >> >> >> >> >>>> > Washington" which could be matched.
> >>> >> >> >> >> >> >> >> >>>> >
> >>> >> >> >> >> >> >> >> >>>> >
> >>> >> >> >> >> >> >> >> >>>> > There is still other contextual information from
> >>> >> >> >> >> >> >> >> >>>> > dbpedia which can be used.
> >>> >> >> >> >> >> >> >> >>>> > For example for an Organization we could also include :
> >>> >> >> >> >> >> >> >> >>>> > dbpprop:industry = Software
> >>> >> >> >> >> >> >> >> >>>> > dbpprop:service = Online Service Providers
> >>> >> >> >> >> >> >> >> >>>> >
> >>> >> >> >> >> >> >> >> >>>> > and for a Person (that's for Barack Obama) :
> >>> >> >> >> >> >> >> >> >>>> >
> >>> >> >> >> >> >> >> >> >>>> > dbpedia-owl:profession:
> >>> >> >> >> >> >> >> >> >>>> >     dbpedia:Author
> >>> >> >> >> >> >> >> >> >>>> >     dbpedia:Constitutional_law
> >>> >> >> >> >> >> >> >> >>>> >     dbpedia:Lawyer
> >>> >> >> >> >> >> >> >> >>>> >     dbpedia:Community_organizing
> >>> >> >> >> >> >> >> >> >>>> >
> >>> >> >> >> >> >> >> >> >>>> > I'd like to continue investigating this as I think
> >>> >> >> >> >> >> >> >> >>>> > that it may have some value in increasing the number
> >>> >> >> >> >> >> >> >> >>>> > of coreference resolutions, and I'd like to
> >>> >> >> >> >> >> >> >> >>>> > concentrate more on precision rather than recall
> >>> >> >> >> >> >> >> >> >>>> > since we already have a set of coreferences detected
> >>> >> >> >> >> >> >> >> >>>> > by the stanford nlp tool and this would be an
> >>> >> >> >> >> >> >> >> >>>> > addition to that (at least this is how I would like
> >>> >> >> >> >> >> >> >> >>>> > to use it).
> >>> >> >> >> >> >> >> >> >>>> >
> >>> >> >> >> >> >> >> >> >>>> > Is it ok if I track this by opening a jira? I could
> >>> >> >> >> >> >> >> >> >>>> > update it to show my progress and also my
> >>> >> >> >> >> >> >> >> >>>> > conclusions, and if it turns out that it was a bad
> >>> >> >> >> >> >> >> >> >>>> > idea then at least I'll end up with more knowledge
> >>> >> >> >> >> >> >> >> >>>> > about Stanbol in the end :).
> >>> >> >> >> >> >> >> >> >>>> >
> >>> >> >> >> >> >> >> >> >>>> >
> >>> >> >> >> >> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro
> >>> >> >> >> >> >> >> >> >>>> > <[email protected]>:
> >>> >> >> >> >> >> >> >> >>>> >
> >>> >> >> >> >> >> >> >> >>>> >> Hi Cristian,
> >>> >> >> >> >> >> >> >> >>>> >>
> >>> >> >> >> >> >> >> >> >>>> >> The approach sounds nice. I don't want to be the devil's advocate,
> >>> >> >> >> >> >> >> >> >>>> >> but I'm just not sure about the recall using the dbpedia
> >>> >> >> >> >> >> >> >> >>>> >> categories feature. For example, your sentence could also be
> >>> >> >> >> >> >> >> >> >>>> >> "Microsoft posted its 2013 earnings. The Redmond's company made a
> >>> >> >> >> >> >> >> >> >>>> >> huge profit". So maybe including more contextual information from
> >>> >> >> >> >> >> >> >> >>>> >> dbpedia could increase the recall, but of course it would reduce
> >>> >> >> >> >> >> >> >> >>>> >> the precision.
> >>> >> >> >> >> >> >> >> >>>> >>
> >>> >> >> >> >> >> >> >> >>>> >> Cheers,
> >>> >> >> >> >> >> >> >> >>>> >> Rafa
> >>> >> >> >> >> >> >> >> >>>> >>
> >>> >> >> >> >> >> >> >> >>>> >> On 04/02/14 09:50, Cristian Petroaca wrote:
> >>> >> >> >> >> >> >> >> >>>> >>
> >>> >> >> >> >> >> >> >> >>>> >>> Back with a more detailed description of the steps for making
> >>> >> >> >> >> >> >> >> >>>> >>> this kind of coreference work.
> >>> >> >> >> >> >> >> >> >>>> >>>
> >>> >> >> >> >> >> >> >> >>>> >>> I will refer to the following text in the steps below in order to
> >>> >> >> >> >> >> >> >> >>>> >>> make things clearer: "Microsoft posted its 2013 earnings. The
> >>> >> >> >> >> >> >> >> >>>> >>> software company made a huge profit."
> >>> >> >> >> >> >> >> >> >>>> >>>
> >>> >> >> >> >> >> >> >> >>>> >>> 1. Consider every noun phrase in the text which has:
> >>> >> >> >> >> >> >> >> >>>> >>>      a. a determiner (by POS tag) which implies a reference to an
> >>> >> >> >> >> >> >> >> >>>> >>> entity local to the text, such as "the", "this", "these", but not
> >>> >> >> >> >> >> >> >> >>>> >>> "another", "every", etc., which imply a reference to an entity
> >>> >> >> >> >> >> >> >> >>>> >>> outside of the text.
> >>> >> >> >> >> >> >> >> >>>> >>>      b. at least one other noun, aside from the main required
> >>> >> >> >> >> >> >> >> >>>> >>> noun, which further describes it. For example, I will not count
> >>> >> >> >> >> >> >> >> >>>> >>> "The company" as a legitimate candidate, since this could create
> >>> >> >> >> >> >> >> >> >>>> >>> a lot of false positives because of the double meaning of some
> >>> >> >> >> >> >> >> >> >>>> >>> words, as in "in the company of good people".
> >>> >> >> >> >> >> >> >> >>>> >>> "The software company" is a good candidate since we also have
> >>> >> >> >> >> >> >> >> >>>> >>> "software".
> >>> >> >> >> >> >> >> >> >>>> >>>
> >>> >> >> >> >> >> >> >> >>>> >>> 2. Match the nouns in the noun phrase to the contents of the
> >>> >> >> >> >> >> >> >> >>>> >>> dbpedia categories of each named entity found prior to the
> >>> >> >> >> >> >> >> >> >>>> >>> location of the noun phrase in the text.
> >>> >> >> >> >> >> >> >> >>>> >>> The dbpedia categories are in the following format (for
> >>> >> >> >> >> >> >> >> >>>> >>> Microsoft, for example): "Software companies of the United
> >>> >> >> >> >> >> >> >> >>>> >>> States".
> >>> >> >> >> >> >> >> >> >>>> >>>   So we try to match "software company" with that.
> >>> >> >> >> >> >> >> >> >>>> >>> First, as you can see, the main noun in the dbpedia category has
> >>> >> >> >> >> >> >> >> >>>> >>> a plural form, and it's the same for all the categories I saw. I
> >>> >> >> >> >> >> >> >> >>>> >>> don't know if there's an easier way to do this, but I thought of
> >>> >> >> >> >> >> >> >> >>>> >>> applying a lemmatizer to the category and the noun phrase in
> >>> >> >> >> >> >> >> >> >>>> >>> order for them to have a common denominator. This also works if
> >>> >> >> >> >> >> >> >> >>>> >>> the noun phrase itself has a plural form.
> >>> >> >> >> >> >> >> >> >>>> >>>
> >>> >> >> >> >> >> >> >> >>>> >>> Second, for the comparison I'll need to use only the words in the
> >>> >> >> >> >> >> >> >> >>>> >>> category which are themselves nouns, and not prepositions or
> >>> >> >> >> >> >> >> >> >>>> >>> determiners such as "of the". This means that I need to POS tag
> >>> >> >> >> >> >> >> >> >>>> >>> the contents of the categories as well.
> >>> >> >> >> >> >> >> >> >>>> >>> I was thinking of running the POS tagger and lemmatizer on the
> >>> >> >> >> >> >> >> >> >>>> >>> dbpedia categories when building the dbpedia-backed entity hub
> >>> >> >> >> >> >> >> >> >>>> >>> and storing the results for later use. I don't know how feasible
> >>> >> >> >> >> >> >> >> >>>> >>> this is at the moment.
> >>> >> >> >> >> >> >> >> >>>> >>>
> >>> >> >> >> >> >> >> >> >>>> >>> After this I can compare each noun in the noun phrase with the
> >>> >> >> >> >> >> >> >> >>>> >>> equivalent nouns in the categories and, based on the number of
> >>> >> >> >> >> >> >> >> >>>> >>> matches, create a confidence level.
> >>> >> >> >> >> >> >> >> >>>> >>>
> >>> >> >> >> >> >> >> >> >>>> >>> 3. Match the noun of the noun phrase with the rdf:type from
> >>> >> >> >> >> >> >> >> >>>> >>> dbpedia of the named entity. If this matches, increase the
> >>> >> >> >> >> >> >> >> >>>> >>> confidence level.
> >>> >> >> >> >> >> >> >> >>>> >>>
> >>> >> >> >> >> >> >> >> >>>> >>> 4. If there are multiple named entities which can match a certain
> >>> >> >> >> >> >> >> >> >>>> >>> noun phrase, then link the noun phrase with the closest named
> >>> >> >> >> >> >> >> >> >>>> >>> entity prior to it in the text.
> >>> >> >> >> >> >> >> >> >>>> >>>
> >>> >> >> >> >> >> >> >> >>>> >>> What do you think?
> >>> >> >> >> >> >> >> >> >>>> >>>
> >>> >> >> >> >> >> >> >> >>>> >>> Cristian
> >>> >> >> >> >> >> >> >> >>>> >>>
> >>> >> >> >> >> >> >> >> >>>> >>> 2014-01-31 Cristian Petroaca <[email protected]>:
> >>> >> >> >> >> >> >> >> >>>> >>>
> >>> >> >> >> >> >> >> >> >>>> >>>> Hi Rafa,
> >>> >> >> >> >> >> >> >> >>>> >>>>
> >>> >> >> >> >> >> >> >> >>>> >>>> I don't yet have a concrete heuristic but I'm working on it. I'll
> >>> >> >> >> >> >> >> >> >>>> >>>> post it here so that you guys can give me feedback on it.
> >>> >> >> >> >> >> >> >> >>>> >>>>
> >>> >> >> >> >> >> >> >> >>>> >>>> What are "locality" features?
> >>> >> >> >> >> >> >> >> >>>> >>>>
> >>> >> >> >> >> >> >> >> >>>> >>>> I looked at BART and other coref tools such as ArkRef and
> >>> >> >> >> >> >> >> >> >>>> >>>> CherryPicker, and they don't provide this kind of coreference.
> >>> >> >> >> >> >> >> >> >>>> >>>>
> >>> >> >> >> >> >> >> >> >>>> >>>> Cristian
> >>> >> >> >> >> >> >> >> >>>> >>>>
> >>> >> >> >> >> >> >> >> >>>> >>>>
> >>> >> >> >> >> >> >> >> >>>> >>>> 2014-01-30 Rafa Haro <[email protected]>:
> >>> >> >> >> >> >> >> >> >>>> >>>>
> >>> >> >> >> >> >> >> >> >>>> >>>>> Hi Cristian,
> >>> >> >> >> >> >> >> >> >>>> >>>>>
> >>> >> >> >> >> >> >> >> >>>> >>>>> Without having more details about your concrete heuristic, in my
> >>> >> >> >> >> >> >> >> >>>> >>>>> honest opinion such an approach could produce a lot of false
> >>> >> >> >> >> >> >> >> >>>> >>>>> positives. I don't know if you are planning to use some
> >>> >> >> >> >> >> >> >> >>>> >>>>> "locality" features to detect such coreferences, but you need to
> >>> >> >> >> >> >> >> >> >>>> >>>>> take into account that it is quite usual for coreferenced
> >>> >> >> >> >> >> >> >> >>>> >>>>> mentions to occur even in different paragraphs. Although I'm not
> >>> >> >> >> >> >> >> >> >>>> >>>>> an expert in Natural Language Understanding, I would say it is
> >>> >> >> >> >> >> >> >> >>>> >>>>> quite difficult to get decent precision/recall rates for
> >>> >> >> >> >> >> >> >> >>>> >>>>> coreferencing using fixed rules. Maybe you can give other tools
> >>> >> >> >> >> >> >> >> >>>> >>>>> like BART (http://www.bart-coref.org/) a try.
> >>> >> >> >> >> >> >> >> >>>> >>>>>
> >>> >> >> >> >> >> >> >> >>>> >>>>> Cheers,
> >>> >> >> >> >> >> >> >> >>>> >>>>> Rafa Haro
> >>> >> >> >> >> >> >> >> >>>> >>>>>
> >>> >> >> >> >> >> >> >> >>>> >>>>> On 30/01/14 10:33, Cristian Petroaca wrote:
> >>> >> >> >> >> >> >> >> >>>> >>>>>
> >>> >> >> >> >> >> >> >> >>>> >>>>>> Hi,
> >>> >> >> >> >> >> >> >> >>>> >>>>>>
> >>> >> >> >> >> >> >> >> >>>> >>>>>> One of the necessary steps for implementing the Event extraction
> >>> >> >> >> >> >> >> >> >>>> >>>>>> Engine feature (https://issues.apache.org/jira/browse/STANBOL-1121)
> >>> >> >> >> >> >> >> >> >>>> >>>>>> is to have coreference resolution in the given text. This is
> >>> >> >> >> >> >> >> >> >>>> >>>>>> currently provided via the stanford-nlp project, but as far as I
> >>> >> >> >> >> >> >> >> >>>> >>>>>> saw this module performs mostly pronominal (He, She) or nominal
> >>> >> >> >> >> >> >> >> >>>> >>>>>> (Barack Obama and Mr. Obama) coreference resolution.
> >>> >> >> >> >> >> >> >> >>>> >>>>>>
> >>> >> >> >> >> >> >> >> >>>> >>>>>> In order to get more coreferences from the text I thought of
> >>> >> >> >> >> >> >> >> >>>> >>>>>> creating some logic that would detect this kind of coreference:
> >>> >> >> >> >> >> >> >> >>>> >>>>>> "Apple reaches new profit heights. The software company just
> >>> >> >> >> >> >> >> >> >>>> >>>>>> announced its 2013 earnings."
> >>> >> >> >> >> >> >> >> >>>> >>>>>> Here "The software company" obviously refers to "Apple".
> >>> >> >> >> >> >> >> >> >>>> >>>>>> So I'd like to detect coreferences of Named Entities which are of
> >>> >> >> >> >> >> >> >> >>>> >>>>>> the rdf:type of the Named Entity, in this case "company", and
> >>> >> >> >> >> >> >> >> >>>> >>>>>> also have attributes which can be found in the dbpedia categories
> >>> >> >> >> >> >> >> >> >>>> >>>>>> of the named entity, in this case "software".
> >>> >> >> >> >> >> >> >> >>>> >>>>>>
> >>> >> >> >> >> >> >> >> >>>> >>>>>> The detection of coreferences such as "The software company" in
> >>> >> >> >> >> >> >> >> >>>> >>>>>> the text would be done either by using the new Pos Tag Based
> >>> >> >> >> >> >> >> >> >>>> >>>>>> Phrase extraction Engine (noun phrases) or by using a dependency
> >>> >> >> >> >> >> >> >> >>>> >>>>>> tree of the sentence and picking up only subjects or objects.
> >>> >> >> >> >> >> >> >> >>>> >>>>>>
> >>> >> >> >> >> >> >> >> >>>> >>>>>> At this point I'd like to know if this kind of logic would be
> >>> >> >> >> >> >> >> >> >>>> >>>>>> useful as a separate Enhancement Engine in Stanbol (in case the
> >>> >> >> >> >> >> >> >> >>>> >>>>>> precision and recall are good enough)?
> >>> >> >> >> >> >> >> >> >>>> >>>>>>
> >>> >> >> >> >> >> >> >> >>>> >>>>>> Thanks,
> >>> >> >> >> >> >> >> >> >>>> >>>>>> Cristian
> >>> >> >> >> >> >> >> >> >>>> >>>>>>
> >>> >> >> >> >> >> >> >> >>>> >>>>>>
> >>> >> >> >> >> >> >> >> >>>> >>>>>>
> >>> >> >> >> >> >> >> >> >>>> >>
> >>> >> >> >> >> >> >> >> >>>>
> >>> >> >> >> >> >> >> >> >>>>
> >>> >> >> >> >> >> >> >> >>>>
> >>> >> >> >> >> >> >> >> >>>> --
> >>> >> >> >> >> >> >> >> >>>> | Rupert Westenthaler             [email protected]
> >>> >> >> >> >> >> >> >> >>>> | Bodenlehenstraße 11                             ++43-699-11108907
> >>> >> >> >> >> >> >> >> >>>> | A-5500 Bischofshofen
>
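
For readers skimming the quoted thread, the four steps from the 4th
February message (candidate noun phrases, category matching, rdf:type
match, closest preceding entity) could be sketched roughly as below.
This is only an illustration under assumptions, not the engine's
implementation: the lemma() helper is a naive stand-in for a real
lemmatizer, the SKIP word list stands in for POS tagging, every token
after the determiner is treated as a noun, and the Entity class, the
0.75/0.25 weights and the 0.8 threshold are all hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass

LOCAL_DETERMINERS = {"the", "this", "these"}    # step 1a: local references only
SKIP = {"of", "the", "in", "and", "a", "an"}    # crude stand-in for POS filtering


def lemma(word: str) -> str:
    """Naive singularizer; a real lemmatizer should replace this."""
    w = word.lower()
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]
    return w


def content_lemmas(label: str) -> set[str]:
    """Lemmas of the non-function words in a category label."""
    return {lemma(t) for t in label.replace("_", " ").split() if t.lower() not in SKIP}


@dataclass
class Entity:                  # hypothetical container for a linked named entity
    name: str
    offset: int                # character offset of the mention in the text
    categories: list[str]      # dbpedia category labels
    rdf_types: list[str]       # local names of rdf:type values, e.g. "Company"


def confidence(phrase: str, entity: Entity) -> float:
    tokens = phrase.split()
    # Step 1: local determiner plus a head noun and at least one other noun.
    if len(tokens) < 3 or tokens[0].lower() not in LOCAL_DETERMINERS:
        return 0.0
    nouns = [lemma(t) for t in tokens[1:]]
    # Step 2: fraction of phrase nouns found in the best-matching category.
    category_score = max(
        (sum(n in content_lemmas(c) for n in nouns) / len(nouns)
         for c in entity.categories),
        default=0.0,
    )
    # Step 3: the head noun matching an rdf:type raises the confidence.
    type_score = 1.0 if nouns[-1] in {lemma(t) for t in entity.rdf_types} else 0.0
    return 0.75 * category_score + 0.25 * type_score   # arbitrary weights


def resolve(phrase: str, phrase_offset: int, entities: list[Entity],
            threshold: float = 0.8):
    """Step 4: among matching entities before the phrase, pick the closest."""
    matches = [(e, confidence(phrase, e)) for e in entities if e.offset < phrase_offset]
    matches = [(e, c) for e, c in matches if c >= threshold]
    return max(matches, key=lambda ec: ec[0].offset) if matches else None


if __name__ == "__main__":
    text = "Microsoft posted its 2013 earnings. The software company made a huge profit."
    microsoft = Entity("Microsoft", 0, ["Software companies of the United States"], ["Company"])
    match = resolve("The software company", text.index("The software company"), [microsoft])
    print(match[0].name if match else None)
```

With the example text from the thread, "The software company" is linked
to Microsoft, while a bare "The company" is rejected by the step 1
filter. Precomputing content_lemmas() for each category when the entity
hub is built, as suggested in the quoted message, would avoid repeating
that work per noun phrase.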
