Hi Cristian, all

Looks good to me, but I am not sure I got everything. If you could
provide example texts where those rules apply, it would be much
easier to understand.

Instead of using DBpedia properties directly you should define your own
domain model (ontology). You can then align the DBpedia properties to
your model. This will allow you to apply this approach to knowledge
bases other than DBpedia.
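As a sketch of that alignment (the `co:` namespace and property names below are made up for illustration; only the DBpedia property names are real):

```java
import java.util.HashMap;
import java.util.Map;

public class PropertyAlignment {

    // Hypothetical alignment table: the DBpedia property names on the left are
    // real, but the "co:" domain-model properties are invented for this sketch.
    private static final Map<String, String> DBPEDIA_TO_MODEL = new HashMap<>();
    static {
        DBPEDIA_TO_MODEL.put("dbpedia-owl:birthPlace", "co:spatialContext");
        DBPEDIA_TO_MODEL.put("dbpprop:locationCity", "co:spatialContext");
        DBPEDIA_TO_MODEL.put("dbpprop:industry", "co:functionalContext");
        DBPEDIA_TO_MODEL.put("dbpprop:occupation", "co:organisationalContext");
    }

    /** Returns the domain-model property the DBpedia property aligns to, or null. */
    public static String align(String dbpediaProperty) {
        return DBPEDIA_TO_MODEL.get(dbpediaProperty);
    }
}
```

The matching rules can then be written against the `co:` properties only, and a second knowledge base is supported by adding its properties to the same table.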

For people new to this thread: the above message adds to the
suggestion first made by Cristian on 4th February. The following
four messages (until 7th Feb) provide additional context.

best
Rupert


On Fri, Mar 28, 2014 at 9:23 AM, Cristian Petroaca
<cristian.petro...@gmail.com> wrote:
> Hi guys,
>
> After Rupert's last suggestions related to this enhancement engine I
> devised a more comprehensive algorithm for matching the noun phrases
> against the NER properties.Please take a look and let me know what you
> think. Thanks.
>
> The following rules will be applied to every noun phrase in order to find
> co-references:
>
> 1. For each NER prior to the current noun phrase in the text match the
> yago:class label to the contents of the noun phrase.
>
> For the NERs which have a yago:class which matches, apply:
>
> 2. Group membership rules :
>
>     a. spatial membership : the NER is part of a Location. If the noun
> phrase contains a LOCATION or a demonym then check any location properties
> of the matching NER.
>
>     If the matching NER is a:
>     - person, match against :birthPlace, :region, :nationality
>     - organisation, match against :foundationPlace, :locationCity,
> :location, :hometown
>     - place, match against :country, :subdivisionName, :location
>
>     Ex: The Italian President, The Richmond-based company
>
>     b. organisational membership : the NER is part of an Organisation. If
> the noun phrase contains an ORGANISATION then check the following
> properties of the matching NER:
>
>     If matching NER is :
>     - person, match against :occupation, :associatedActs
>     - organisation ?
>     - location ?
>
> Ex: The Microsoft executive, The Pink Floyd singer
>
> 3. Functional description rule: the noun phrase describes what the NER does
> conceptually.
> If there are no NERs in the noun phrase then match the following properties
> of the matching NER to the contents of the noun phrase (aside from the
> nouns which are part of the yago:class) :
>
>    If NER is a:
>    - person ?
>    - organisation, match against :service, :industry, :genre
>    - location ?
>
> Ex:  The software company.
>
> 4. If no matches were found for the current NER with rules 2 or 3, but the
> yago:class which matched has more than 2 nouns, then we also consider
> this a good co-reference, possibly with a lower confidence.
>
> Ex: The former tennis player, the theoretical physicist.
>
> 5. Based on the number of nouns which matched we create a confidence level.
> The number of matched nouns cannot be lower than 2 and we must have a
> yago:class match.
>
> For all NERs which got to this point, select the closest ones in the text
> to the noun phrase which matched against the same properties (yago:class
> and dbpedia) and mark them as co-references.
>
> Note: all noun phrases need to be lemmatized before all of this in case
> there are any plurals.
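Rule 5 above could be scored roughly as follows; the class name, method signature and scoring formula are illustrative assumptions, not Stanbol code:

```java
import java.util.List;
import java.util.Set;

public class CorefScorer {

    /**
     * Counts how many (lemmatized) nouns of the noun phrase occur among the
     * tokens of the candidate NER's yago:class label and matched property
     * values. Per rule 5, fewer than 2 matched nouns means no co-reference;
     * otherwise the confidence is the fraction of phrase nouns that matched
     * (the formula itself is an assumption of this sketch).
     */
    public static double confidence(List<String> phraseNouns,
                                    Set<String> nerTokens) {
        int matched = 0;
        for (String noun : phraseNouns) {
            if (nerTokens.contains(noun.toLowerCase())) {
                matched++;
            }
        }
        if (matched < 2) {
            return 0.0; // rule 5: at least 2 matched nouns required
        }
        return (double) matched / phraseNouns.size();
    }
}
```

For "The software company" against a NER whose class/property tokens include "software" and "company", both nouns match and the score is 1.0; a single matching noun yields no co-reference at all.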
>
>
> 2014-03-25 20:50 GMT+02:00 Cristian Petroaca <cristian.petro...@gmail.com>:
>
>> That worked. Thanks.
>>
>> So, there are no exceptions during the startup of the launcher.
>> The component tab in the felix console shows 6 WeightedChains the first
>> time, including the default one but after my changes and a restart there
>> are only 5 - the default one is missing altogether.
>>
>>
>> 2014-03-24 20:18 GMT+02:00 Rupert Westenthaler <
>> rupert.westentha...@gmail.com>:
>>
>> Hi Cristian,
>>>
>>> I do see the same problem since last Friday. The solution mentioned
>>> in [1] works for me.
>>>
>>>     mvn -Djsse.enableSNIExtension=false {goals}
>>>
>>> No idea why https connections to github currently cause this. I
>>> could not find anything related via Google, so I suggest using the
>>> system property for now. If this persists for longer, we can adapt
>>> the build files accordingly.
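For reference, `jsse.enableSNIExtension` is a standard JSSE system property, so the same workaround can also be applied programmatically before the first TLS connection is opened; the wrapper class below is just illustrative:

```java
public class DisableSni {

    /**
     * Equivalent to passing -Djsse.enableSNIExtension=false on the command
     * line. Must run before the JVM opens its first TLS connection, because
     * JSSE reads the property when the SSL stack is initialized.
     */
    public static void disable() {
        System.setProperty("jsse.enableSNIExtension", "false");
    }
}
```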
>>>
>>> best
>>> Rupert
>>>
>>>
>>>
>>>
>>> [1]
>>> http://stackoverflow.com/questions/7615645/ssl-handshake-alert-unrecognized-name-error-since-upgrade-to-java-1-7-0
>>>
>>> On Mon, Mar 24, 2014 at 7:01 PM, Cristian Petroaca
>>> <cristian.petro...@gmail.com> wrote:
>>> > I did a clean on the whole project and now I wanted to do another "mvn
>>> > clean install" but I am getting this :
>>> >
>>> > "[INFO]
>>> > ------------------------------------------------------------------------
>>> > [ERROR] Failed to execute goal
>>> > org.apache.maven.plugins:maven-antrun-plugin:1.6:run (download) on
>>> > project org.apache.stanbol.data.opennlp.lang.es: An Ant BuildException
>>> > has occured: The following error occurred while executing this line:
>>> > [ERROR]
>>> > C:\Data\Projects\Stanbol\main\data\opennlp\lang\es\download_models.xml:33:
>>> > Failed to copy
>>> > https://github.com/utcompling/OpenNLP-Models/raw/58ef0c60031403e66e47ae35edaf58d3478b67af/models/es/opennlp-es-maxent-pos-es.bin
>>> > to
>>> > C:\Data\Projects\Stanbol\main\data\opennlp\lang\es\downloads\resources\org\apache\stanbol\data\opennlp\es-pos-maxent.bin
>>> > due to javax.net.ssl.SSLProtocolException handshake alert :
>>> > unrecognized_name"
>>> >
>>> >
>>> >
>>> > 2014-03-20 11:25 GMT+02:00 Rupert Westenthaler <
>>> > rupert.westentha...@gmail.com>:
>>> >
>>> >> Hi Cristian,
>>> >>
>>> >> On Thu, Mar 20, 2014 at 10:00 AM, Cristian Petroaca
>>> >> <cristian.petro...@gmail.com> wrote:
>>> >> >
>>> >>
>>> stanbol.enhancer.chain.weighted.chain=["tika;optional","langdetect","opennlp-sentence","opennlp-token","opennlp-pos","opennlp-ner","dbpediaLinking","entityhubExtraction","dbpedia-dereference","pos-chunker"]
>>> >> > service.ranking=I"-2147483648"
>>> >> > stanbol.enhancer.chain.name="default"
>>> >>
>>> >> Does look fine to me. Do you see any exception during the startup of
>>> >> the launcher? Can you check the status of this component in the
>>> >> component tab of the felix web console [1] (search for
>>> >> "org.apache.stanbol.enhancer.chain.weighted.impl.WeightedChain")? If
>>> >> you have multiple, you can find the correct one by comparing the
>>> >> "Properties" with those in the configuration file.
>>> >>
>>> >> I guess that the corresponding service is in the 'unsatisfied' state,
>>> >> as you do not see it in the web interface. But if this is the case you
>>> >> should also see the corresponding exception in the log. You can also
>>> >> manually stop/start the component. In this case the exception should
>>> >> be re-thrown and you do not need to search the log for it.
>>> >>
>>> >> best
>>> >> Rupert
>>> >>
>>> >>
>>> >> [1] http://localhost:8080/system/console/components
>>> >>
>>> >> >
>>> >> >
>>> >> >
>>> >> > 2014-03-20 7:39 GMT+02:00 Rupert Westenthaler <
>>> >> rupert.westentha...@gmail.com
>>> >> >>:
>>> >> >
>>> >> >> Hi Cristian,
>>> >> >>
>>> >> >> you cannot send attachments to the list. Please copy the contents
>>> >> >> directly into the mail.
>>> >> >>
>>> >> >> thx
>>> >> >> Rupert
>>> >> >>
>>> >> >> On Wed, Mar 19, 2014 at 9:20 PM, Cristian Petroaca
>>> >> >> <cristian.petro...@gmail.com> wrote:
>>> >> >> > The config attached.
>>> >> >> >
>>> >> >> >
>>> >> >> > 2014-03-19 9:09 GMT+02:00 Rupert Westenthaler
>>> >> >> > <rupert.westentha...@gmail.com>:
>>> >> >> >
>>> >> >> >> Hi Cristian,
>>> >> >> >>
>>> >> >> >> can you provide the contents of the chain after your
>>> modifications?
>>> >> >> >> Would be interesting to test why the chain is no longer active
>>> after
>>> >> >> >> the restart.
>>> >> >> >>
>>> >> >> >> You can find the config file in the 'stanbol/fileinstall' folder.
>>> >> >> >>
>>> >> >> >> best
>>> >> >> >> Rupert
>>> >> >> >>
>>> >> >> >> On Tue, Mar 18, 2014 at 8:24 PM, Cristian Petroaca
>>> >> >> >> <cristian.petro...@gmail.com> wrote:
>>> >> >> >> > Related to the default chain selection rules: before the restart I
>>> >> >> >> > had a chain with the name 'default', as in I could access it via
>>> >> >> >> > enhancer/chain/default. Then I just added another engine to the
>>> >> >> >> > 'default' chain. I assumed that after the restart the chain with
>>> >> >> >> > the 'default' name would be persisted, so the first rule should
>>> >> >> >> > have been applied after the restart as well. But instead I cannot
>>> >> >> >> > reach it via enhancer/chain/default anymore, so it's gone.
>>> >> >> >> > Anyway, this is not a big deal, it's not blocking me in any way;
>>> >> >> >> > I just wanted to understand where the problem is.
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >> > 2014-03-18 7:15 GMT+02:00 Rupert Westenthaler
>>> >> >> >> > <rupert.westentha...@gmail.com
>>> >> >> >> >>:
>>> >> >> >> >
>>> >> >> >> >> Hi Cristian
>>> >> >> >> >>
>>> >> >> >> >> On Mon, Mar 17, 2014 at 9:43 PM, Cristian Petroaca
>>> >> >> >> >> <cristian.petro...@gmail.com> wrote:
>>> >> >> >> >> > 1. Updated to the latest code and it's gone. Cool
>>> >> >> >> >> >
>>> >> >> >> >> > 2. I start the stable launcher -> create a new instance of the
>>> >> >> >> >> > PosChunkerEngine -> add it to the default chain. At this point
>>> >> >> >> >> > everything looks good and works ok. After I restart the server
>>> >> >> >> >> > the default chain is gone and instead I see this in the
>>> >> >> >> >> > enhancement chains page: all-active (default, id: 149,
>>> >> >> >> >> > ranking: 0, impl: AllActiveEnginesChain). all-active did not
>>> >> >> >> >> > contain the 'default' word before the restart.
>>> >> >> >> >> >
>>> >> >> >> >>
>>> >> >> >> >> Please note the default chain selection rules as described at [1].
>>> >> >> >> >> You can also access chains under '/enhancer/chain/{chain-name}'.
>>> >> >> >> >>
>>> >> >> >> >> best
>>> >> >> >> >> Rupert
>>> >> >> >> >>
>>> >> >> >> >> [1]
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >>
>>> >>
>>> http://stanbol.staging.apache.org/docs/trunk/components/enhancer/chains/#default-chain
>>> >> >> >> >>
>>> >> >> >> >> > It looks like the config files are exactly what I need.
>>> Thanks.
>>> >> >> >> >> >
>>> >> >> >> >> >
>>> >> >> >> >> > 2014-03-17 9:26 GMT+02:00 Rupert Westenthaler <
>>> >> >> >> >> rupert.westentha...@gmail.com
>>> >> >> >> >> >>:
>>> >> >> >> >> >
>>> >> >> >> >> >> On Sat, Mar 15, 2014 at 8:34 PM, Cristian Petroaca
>>> >> >> >> >> >> <cristian.petro...@gmail.com> wrote:
>>> >> >> >> >> >> > Thanks Rupert.
>>> >> >> >> >> >> >
>>> >> >> >> >> >> > A couple more questions/issues :
>>> >> >> >> >> >> >
>>> >> >> >> >> >> > 1. Whenever I start the stanbol server I'm seeing this
>>> in the
>>> >> >> >> >> >> > console
>>> >> >> >> >> >> > output :
>>> >> >> >> >> >> >
>>> >> >> >> >> >>
>>> >> >> >> >> >> This should be fixed with STANBOL-1278 [1] [2]
>>> >> >> >> >> >>
>>> >> >> >> >> >> > 2. Whenever I restart the server the Weighted Chains get
>>> >> messed
>>> >> >> >> >> >> > up. I
>>> >> >> >> >> >> > usually use the 'default' chain and add my engine to it
>>> so
>>> >> there
>>> >> >> >> >> >> > are
>>> >> >> >> >> 11
>>> >> >> >> >> >> > engines in it. After the restart this chain now contains
>>> >> around
>>> >> >> 23
>>> >> >> >> >> >> engines
>>> >> >> >> >> >> > in total.
>>> >> >> >> >> >>
>>> >> >> >> >> >> I was not able to replicate this. What I tried was
>>> >> >> >> >> >>
>>> >> >> >> >> >> (1) start up the stable launcher
>>> >> >> >> >> >> (2) add an additional engine to the default chain
>>> >> >> >> >> >> (3) restart the launcher
>>> >> >> >> >> >>
>>> >> >> >> >> >> The default chain was not changed after (2) and (3), so I would
>>> >> >> >> >> >> need further information to understand why this is happening.
>>> >> >> >> >> >>
>>> >> >> >> >> >> Generally it is better to create your own chain instance than to
>>> >> >> >> >> >> modify one that is provided by the default configuration. I would
>>> >> >> >> >> >> also recommend that you keep your test configuration in text
>>> >> >> >> >> >> files and copy those to the 'stanbol/fileinstall' folder. Doing
>>> >> >> >> >> >> so prevents you from manually re-entering the configuration after
>>> >> >> >> >> >> a software update. The production-mode section [3] provides
>>> >> >> >> >> >> information on how to do that.
>>> >> >> >> >> >>
>>> >> >> >> >> >> best
>>> >> >> >> >> >> Rupert
>>> >> >> >> >> >>
>>> >> >> >> >> >> [1] https://issues.apache.org/jira/browse/STANBOL-1278
>>> >> >> >> >> >> [2] http://svn.apache.org/r1576623
>>> >> >> >> >> >> [3] http://stanbol.apache.org/docs/trunk/production-mode
>>> >> >> >> >> >>
>>> >> >> >> >> >> > ERROR: Bundle org.apache.stanbol.enhancer.engine.topic.web [153]: Error starting
>>> >> >> >> >> >> > slinginstall:c:\Data\Projects\Stanbol\main\launchers\stable\target\stanbol\startup\35\org.apache.stanbol.enhancer.engine.topic.web-1.0.0-SNAPSHOT.jar
>>> >> >> >> >> >> > (org.osgi.framework.BundleException: Unresolved constraint in bundle
>>> >> >> >> >> >> > org.apache.stanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0:
>>> >> >> >> >> >> > missing requirement [153.0] package; (&(package=javax.ws.rs)(version>=0.0.0)(!(version>=2.0.0))))
>>> >> >> >> >> >> > org.osgi.framework.BundleException: Unresolved constraint in bundle
>>> >> >> >> >> >> > org.apache.stanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0:
>>> >> >> >> >> >> > missing requirement [153.0] package; (&(package=javax.ws.rs)(version>=0.0.0)(!(version>=2.0.0)))
>>> >> >> >> >> >> >         at org.apache.felix.framework.Felix.resolveBundle(Felix.java:3443)
>>> >> >> >> >> >> >         at org.apache.felix.framework.Felix.startBundle(Felix.java:1727)
>>> >> >> >> >> >> >         at org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1156)
>>> >> >> >> >> >> >         at org.apache.felix.framework.StartLevelImpl.run(StartLevelImpl.java:264)
>>> >> >> >> >> >> >         at java.lang.Thread.run(Unknown Source)
>>> >> >> >> >> >> >
>>> >> >> >> >> >> > Despite this, the server starts fine and I can use the enhancer
>>> >> >> >> >> >> > fine. Do you guys see this as well?
>>> >> >> >> >> >> >
>>> >> >> >> >> >> >
>>> >> >> >> >> >> > 2014-03-11 9:47 GMT+02:00 Rupert Westenthaler <
>>> >> >> >> >> >> rupert.westentha...@gmail.com
>>> >> >> >> >> >> >>:
>>> >> >> >> >> >> >
>>> >> >> >> >> >> >> Hi Cristian,
>>> >> >> >> >> >> >>
>>> >> >> >> >> >> NER Annotations are typically available as both
>>> >> >> >> >> >> NlpAnnotations.NER_ANNOTATION and fise:TextAnnotation [1] in the
>>> >> >> >> >> >> enhancement metadata. As you are already accessing the
>>> >> >> >> >> >> AnalysedText I would prefer using the NlpAnnotations.NER_ANNOTATION.
>>> >> >> >> >> >> >>
>>> >> >> >> >> >> >> best
>>> >> >> >> >> >> >> Rupert
>>> >> >> >> >> >> >>
>>> >> >> >> >> >> >> [1]
>>> >> >> >> >> >> >>
>>> >> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >>
>>> >>
>>> http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fisetextannotation
>>> >> >> >> >> >> >>
>>> >> >> >> >> >> >> On Mon, Mar 10, 2014 at 10:07 PM, Cristian Petroaca
>>> >> >> >> >> >> >> <cristian.petro...@gmail.com> wrote:
>>> >> >> >> >> >> >> > Thanks.
>>> >> >> >> >> >> >> > I assume I should get the Named entities using the
>>> same
>>> >> but
>>> >> >> >> >> >> >> > with
>>> >> >> >> >> >> >> > NlpAnnotations.NER_ANNOTATION?
>>> >> >> >> >> >> >> >
>>> >> >> >> >> >> >> >
>>> >> >> >> >> >> >> >
>>> >> >> >> >> >> >> > 2014-03-10 13:29 GMT+02:00 Rupert Westenthaler <
>>> >> >> >> >> >> >> > rupert.westentha...@gmail.com>:
>>> >> >> >> >> >> >> >
>>> >> >> >> >> >> >> >> Hallo Cristian,
>>> >> >> >> >> >> >> >>
>>> >> >> >> >> >> >> NounPhrases are not added to the RDF enhancement results. You
>>> >> >> >> >> >> >> need to use the AnalysedText ContentPart [1].
>>> >> >> >> >> >> >>
>>> >> >> >> >> >> >> Here is some demo code you can use in the computeEnhancement method:
>>> >> >> >> >> >> >> >>
>>> >> >> >> >> >> >>         AnalysedText at = NlpEngineHelper.getAnalysedText(this, ci, true);
>>> >> >> >> >> >> >>         Iterator<? extends Section> sections = at.getSentences();
>>> >> >> >> >> >> >>         if(!sections.hasNext()){ //process as single sentence
>>> >> >> >> >> >> >>             sections = Collections.singleton(at).iterator();
>>> >> >> >> >> >> >>         }
>>> >> >> >> >> >> >>
>>> >> >> >> >> >> >>         while(sections.hasNext()){
>>> >> >> >> >> >> >>             Section section = sections.next();
>>> >> >> >> >> >> >>             Iterator<Span> chunks =
>>> >> >> >> >> >> >>                 section.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
>>> >> >> >> >> >> >>             while(chunks.hasNext()){
>>> >> >> >> >> >> >>                 Span chunk = chunks.next();
>>> >> >> >> >> >> >>                 Value<PhraseTag> phrase =
>>> >> >> >> >> >> >>                     chunk.getAnnotation(NlpAnnotations.PHRASE_ANNOTATION);
>>> >> >> >> >> >> >>                 //chunks without a phrase annotation are skipped
>>> >> >> >> >> >> >>                 if(phrase != null &&
>>> >> >> >> >> >> >>                         phrase.value().getCategory() == LexicalCategory.Noun){
>>> >> >> >> >> >> >>                     log.info(" - NounPhrase [{},{}] {}", new Object[]{
>>> >> >> >> >> >> >>                         chunk.getStart(), chunk.getEnd(), chunk.getSpan()});
>>> >> >> >> >> >> >>                 }
>>> >> >> >> >> >> >>             }
>>> >> >> >> >> >> >>         }
>>> >> >> >> >> >> >> >>
>>> >> >> >> >> >> >> >> hope this helps
>>> >> >> >> >> >> >> >>
>>> >> >> >> >> >> >> >> best
>>> >> >> >> >> >> >> >> Rupert
>>> >> >> >> >> >> >> >>
>>> >> >> >> >> >> >> >> [1]
>>> >> >> >> >> >> >> >>
>>> >> >> >> >> >> >>
>>> >> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >>
>>> >>
>>> http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext
>>> >> >> >> >> >> >> >>
>>> >> >> >> >> >> >> >> On Sun, Mar 9, 2014 at 6:07 PM, Cristian Petroaca
>>> >> >> >> >> >> >> >> <cristian.petro...@gmail.com> wrote:
>>> >> >> >> >> >> >> > I started to implement the engine and I'm having problems with
>>> >> >> >> >> >> >> > getting results for noun phrases. I modified the "default"
>>> >> >> >> >> >> >> > weighted chain to also include the PosChunkerEngine and ran a
>>> >> >> >> >> >> >> > sample text: "Angela Merkel visited China. The German
>>> >> >> >> >> >> >> > chancellor met with various people". I expected that the RDF
>>> >> >> >> >> >> >> > XML output would contain some info about the noun phrases but
>>> >> >> >> >> >> >> > I cannot see any.
>>> >> >> >> >> >> >> > Could you point me to the correct way to generate the noun
>>> >> >> >> >> >> >> > phrases?
>>> >> >> >> >> >> >> >> >
>>> >> >> >> >> >> >> >> > Thanks,
>>> >> >> >> >> >> >> >> > Cristian
>>> >> >> >> >> >> >> >> >
>>> >> >> >> >> >> >> >> >
>>> >> >> >> >> >> >> >> > 2014-02-09 14:15 GMT+02:00 Cristian Petroaca <
>>> >> >> >> >> >> >> >> cristian.petro...@gmail.com>:
>>> >> >> >> >> >> >> >> >
>>> >> >> >> >> >> >> >> >> Opened
>>> >> >> https://issues.apache.org/jira/browse/STANBOL-1279
>>> >> >> >> >> >> >> >> >>
>>> >> >> >> >> >> >> >> >>
>>> >> >> >> >> >> >> >> >> 2014-02-07 10:53 GMT+02:00 Cristian Petroaca <
>>> >> >> >> >> >> >> >> cristian.petro...@gmail.com>
>>> >> >> >> >> >> >> >> >> :
>>> >> >> >> >> >> >> >> >>
>>> >> >> >> >> >> >> >> >> Hi Rupert,
>>> >> >> >> >> >> >> >> >>>
>>> >> >> >> >> >> >> >> >>> The "spatial" dimension is a good idea. I'll also
>>> >> take a
>>> >> >> >> >> >> >> >> >>> look
>>> >> >> >> >> at
>>> >> >> >> >> >> >> Yago.
>>> >> >> >> >> >> >> >> >>>
>>> >> >> >> >> >> >> >> >>> I will create a Jira with what we talked about
>>> here.
>>> >> It
>>> >> >> >> >> >> >> >> >>> will
>>> >> >> >> >> >> >> probably
>>> >> >> >> >> >> >> >> >>> have just a draft-like description for now and
>>> will
>>> >> be
>>> >> >> >> >> >> >> >> >>> updated
>>> >> >> >> >> >> as I
>>> >> >> >> >> >> >> go
>>> >> >> >> >> >> >> >> >>> along.
>>> >> >> >> >> >> >> >> >>>
>>> >> >> >> >> >> >> >> >>> Thanks,
>>> >> >> >> >> >> >> >> >>> Cristian
>>> >> >> >> >> >> >> >> >>>
>>> >> >> >> >> >> >> >> >>>
>>> >> >> >> >> >> >> >> >>> 2014-02-06 15:39 GMT+02:00 Rupert Westenthaler <
>>> >> >> >> >> >> >> >> >>> rupert.westentha...@gmail.com>:
>>> >> >> >> >> >> >> >> >>>
>>> >> >> >> >> >> >> >> >>> Hi Cristian,
>>> >> >> >> >> >> >> >> >>>>
>>> >> >> >> >> >> >> >>>> definitely an interesting approach. You should have a look
>>> >> >> >> >> >> >> >>>> at Yago2 [1]. As far as I can remember, the Yago taxonomy is
>>> >> >> >> >> >> >> >>>> much better structured than the one used by dbpedia. Mapping
>>> >> >> >> >> >> >> >>>> suggestions of dbpedia to concepts in Yago2 is easy, as both
>>> >> >> >> >> >> >> >>>> dbpedia and yago2 provide mappings [2] and [3]
>>> >> >> >> >> >> >> >> >>>>
>>> >> >> >> >> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro
>>> >> >> >> >> >> >> >> >>>> > <rh...@apache.org>:
>>> >> >> >> >> >> >> >> >>>> >>
>>> >> >> >> >> >> >> >> >>>> >> "Microsoft posted its 2013 earnings. The
>>> >> Redmond's
>>> >> >> >> >> >> >> >> >>>> >> company
>>> >> >> >> >> >> made
>>> >> >> >> >> >> >> a
>>> >> >> >> >> >> >> >> >>>> >> huge profit".
>>> >> >> >> >> >> >> >> >>>>
>>> >> >> >> >> >> >> >>>> That's actually a very good example. Spatial contexts are
>>> >> >> >> >> >> >> >>>> very important, as they tend to be used often for
>>> >> >> >> >> >> >> >>>> referencing, so I would suggest treating the spatial context
>>> >> >> >> >> >> >> >>>> specially. For spatial entities (like a City) this is easy,
>>> >> >> >> >> >> >> >>>> but even for others (like a Person or Company) you could use
>>> >> >> >> >> >> >> >>>> relations to spatial entities to define their spatial
>>> >> >> >> >> >> >> >>>> context. This context could then be used to correctly link
>>> >> >> >> >> >> >> >>>> "The Redmond's company" to "Microsoft".
>>> >> >> >> >> >> >> >> >>>>
>>> >> >> >> >> >> >> >>>> In addition, I would suggest using the "spatial" context of
>>> >> >> >> >> >> >> >>>> each entity (basically relations to entities that are cities,
>>> >> >> >> >> >> >> >>>> regions or countries) as a separate dimension, because those
>>> >> >> >> >> >> >> >>>> are very often used for coreferences.
>>> >> >> >> >> >> >> >> >>>>
>>> >> >> >> >> >> >> >> >>>> [1] http://www.mpi-inf.mpg.de/yago-naga/yago/
>>> >> >> >> >> >> >> >> >>>> [2]
>>> >> >> >> >> >> >> >> >>>>
>>> >> >> http://downloads.dbpedia.org/3.9/links/yago_links.nt.bz2
>>> >> >> >> >> >> >> >> >>>> [3]
>>> >> >> >> >> >> >> >> >>>>
>>> >> >> >> >> >> >> >>
>>> >> >> >> >> >> >>
>>> >> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >>
>>> >>
>>> http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoDBpediaInstances.ttl.7z
>>> >> >> >> >> >> >> >> >>>>
>>> >> >> >> >> >> >> >> >>>>
>>> >> >> >> >> >> >> >> >>>> On Thu, Feb 6, 2014 at 10:33 AM, Cristian
>>> Petroaca
>>> >> >> >> >> >> >> >> >>>> <cristian.petro...@gmail.com> wrote:
>>> >> >> >> >> >> >> >> >>>> > There are several dbpedia categories for each
>>> >> entity,
>>> >> >> >> >> >> >> >> >>>> > in
>>> >> >> >> >> this
>>> >> >> >> >> >> >> case
>>> >> >> >> >> >> >> >> for
>>> >> >> >> >> >> >> >> >>>> > Microsoft we have :
>>> >> >> >> >> >> >> >> >>>> >
>>> >> >> >> >> >> >> >> >>>> > category:Companies_in_the_NASDAQ-100_Index
>>> >> >> >> >> >> >> >> >>>> > category:Microsoft
>>> >> >> >> >> >> >> >> >>>> >
>>> category:Software_companies_of_the_United_States
>>> >> >> >> >> >> >> >> >>>> >
>>> >> >> category:Software_companies_based_in_Washington_(state)
>>> >> >> >> >> >> >> >> >>>> > category:Companies_established_in_1975
>>> >> >> >> >> >> >> >> >>>> >
>>> category:1975_establishments_in_the_United_States
>>> >> >> >> >> >> >> >> >>>> >
>>> category:Companies_based_in_Redmond,_Washington
>>> >> >> >> >> >> >> >> >>>> >
>>> >> >> >> >> >> >>
>>> >> >> >> >> >> >>
>>> >> >> category:Multinational_companies_headquartered_in_the_United_States
>>> >> >> >> >> >> >> >> >>>> > category:Cloud_computing_providers
>>> >> >> >> >> >> >> >> >>>> >
>>> >> >> category:Companies_in_the_Dow_Jones_Industrial_Average
>>> >> >> >> >> >> >> >> >>>> >
>>> >> >> >> >> >> >> >>>> > So we also have "Companies based in Redmond, Washington"
>>> >> >> >> >> >> >> >>>> > which could be matched.
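As an illustration of that kind of matching, a naive token-overlap check between a noun phrase and a category label might look like this (the tokenization and the length-based stop-word filter are simplifying assumptions):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class CategoryMatcher {

    /** True if any content token of the noun phrase occurs in the category label. */
    public static boolean matches(String nounPhrase, String categoryLabel) {
        Set<String> labelTokens = new HashSet<>(
                Arrays.asList(categoryLabel.toLowerCase().split("[^a-z0-9]+")));
        for (String token : nounPhrase.toLowerCase().split("[^a-z0-9]+")) {
            // Ignore very short tokens ("the", "in", ...) as a crude stop-word filter
            if (token.length() > 3 && labelTokens.contains(token)) {
                return true;
            }
        }
        return false;
    }
}
```

Note that "company" vs. "Companies" would still fail to match without the lemmatization mentioned earlier in the thread; in the "Redmond-based" example above, only "Redmond" and "based" carry the match.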
>>> >> >> >> >> >> >> >> >>>> >
>>> >> >> >> >> >> >> >> >>>> >
>>> >> >> >> >> >> >> >> >>>> > There is still other contextual information
>>> from
>>> >> >> >> >> >> >> >> >>>> > dbpedia
>>> >> >> >> >> which
>>> >> >> >> >> >> >> can
>>> >> >> >> >> >> >> >> be
>>> >> >> >> >> >> >> >> >>>> used.
>>> >> >> >> >> >> >> >> >>>> > For example for an Organization we could also
>>> >> >> include :
>>> >> >> >> >> >> >> >> >>>> > dbpprop:industry = Software
>>> >> >> >> >> >> >> >> >>>> > dbpprop:service = Online Service Providers
>>> >> >> >> >> >> >> >> >>>> >
>>> >> >> >> >> >> >> >> >>>> > and for a Person (that's for Barack Obama) :
>>> >> >> >> >> >> >> >> >>>> >
>>> >> >> >> >> >> >> >> >>>> > dbpedia-owl:profession:
>>> >> >> >> >> >> >> >> >>>> >                                dbpedia:Author
>>> >> >> >> >> >> >> >> >>>> >
>>> >> >> >> >> >> >> >> >>>> > dbpedia:Constitutional_law
>>> >> >> >> >> >> >> >> >>>> >                                dbpedia:Lawyer
>>> >> >> >> >> >> >> >> >>>> >
>>> >> >> >> >> >> >> >> >>>> > dbpedia:Community_organizing
>>> >> >> >> >> >> >> >> >>>> >
>>> >> >> >> >> >> >> >>>> > I'd like to continue investigating this, as I think it may
>>> >> >> >> >> >> >> >>>> > have some value in increasing the number of coreference
>>> >> >> >> >> >> >> >>>> > resolutions. I'd like to concentrate more on precision
>>> >> >> >> >> >> >> >>>> > rather than recall, since we already have a set of
>>> >> >> >> >> >> >> >>>> > coreferences detected by the Stanford NLP tool and this
>>> >> >> >> >> >> >> >>>> > would be an addition to that (at least this is how I would
>>> >> >> >> >> >> >> >>>> > like to use it).
>>> >> >> >> >> >> >> >> >>>> >
>>> >> >> >> >> >> >> >>>> > Is it ok if I track this by opening a jira? I could update
>>> >> >> >> >> >> >> >>>> > it to show my progress and my conclusions, and if it turns
>>> >> >> >> >> >> >> >>>> > out that it was a bad idea, then at least I'll end up with
>>> >> >> >> >> >> >> >>>> > more knowledge about Stanbol in the end :).
>>> >> >> >> >> >> >> >> >>>> >
>>> >> >> >> >> >> >> >> >>>> >
>>> >> >> >> >> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro
>>> >> >> >> >> >> >> >> >>>> > <rh...@apache.org>:
>>> >> >> >> >> >> >> >> >>>> >
>>> >> >> >> >> >> >> >> >>>> >> Hi Cristian,
>>> >> >> >> >> >> >> >> >>>> >>
>>> >> >> >> >> >> >> >>>> >> The approach sounds nice. I don't want to be the devil's
>>> >> >> >> >> >> >> >>>> >> advocate, but I'm just not sure about the recall using the
>>> >> >> >> >> >> >> >>>> >> dbpedia categories feature. For example, your sentence
>>> >> >> >> >> >> >> >>>> >> could also be "Microsoft posted its 2013 earnings. The
>>> >> >> >> >> >> >> >>>> >> Redmond's company made a huge profit". So maybe including
>>> >> >> >> >> >> >> >>>> >> more contextual information from dbpedia could increase
>>> >> >> >> >> >> >> >>>> >> the recall, but of course it will reduce the precision.
>>> >> >> >> >> >> >> >> >>>> >>
>>> >> >> >> >> >> >> >> >>>> >> Cheers,
>>> >> >> >> >> >> >> >> >>>> >> Rafa
>>
>> On 04/02/14 09:50, Cristian Petroaca wrote:
>>
>>> Back with a more detailed description of the steps for making this
>>> kind of coreference work.
>>>
>>> I will be using references to the following text in the steps below
>>> in order to make things clearer: "Microsoft posted its 2013
>>> earnings. The software company made a huge profit."
>>>
>>> 1. For every noun phrase in the text which has:
>>>     a. a determiner POS which implies a reference to an entity
>>> local to the text (such as "the, this, these"), but not one such as
>>> "another, every", etc., which implies a reference to an entity
>>> outside of the text.
>>>     b. at least one other noun aside from the main required noun
>>> which further describes it. For example I will not count "The
>>> company" as a legitimate candidate, since this could create a lot
>>> of false positives given the double meaning of some words, as in
>>> "in the company of good people".
>>> "The software company" is a good candidate since we also have
>>> "software".
>>>
>>> 2. Match the nouns in the noun phrase to the contents of the
>>> dbpedia categories of each named entity found prior to the location
>>> of the noun phrase in the text.
>>> The dbpedia categories have the following format (for Microsoft,
>>> for example): "Software companies of the United States". So we try
>>> to match "software company" with that.
>>> First, as you can see, the main noun in the dbpedia category has a
>>> plural form, and it's the same for all categories I saw. I don't
>>> know if there's an easier way to do this, but I thought of applying
>>> a lemmatizer to both the category and the noun phrase so that they
>>> have a common denominator. This also works if the noun phrase
>>> itself has a plural form.
>>>
>>> Second, I'll need to use for comparison only the words in the
>>> category which are themselves nouns, not prepositions or
>>> determiners such as "of the". This means that I need to POS tag the
>>> categories' contents as well.
>>> I was thinking of running the POS tagging and lemmatization on the
>>> dbpedia categories when building the dbpedia-backed entityhub and
>>> storing the results for later use - I don't know how feasible this
>>> is at the moment.
>>>
>>> After this I can compare each noun in the noun phrase with the
>>> equivalent nouns in the categories and, based on the number of
>>> matches, compute a confidence level.
>>>
>>> 3. Match the noun of the noun phrase with the rdf:type from dbpedia
>>> of the named entity. If this matches, increase the confidence
>>> level.
>>>
>>> 4. If there are multiple named entities which can match a certain
>>> noun phrase, link the noun phrase with the closest named entity
>>> prior to it in the text.
>>>
>>> What do you think?
>>>
>>> Cristian
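The candidate filtering and category matching described in steps 1 and 2 above could be sketched roughly as follows. This is only a toy illustration: the `lemma` function and the stopword set are naive stand-ins for the real lemmatizer and POS tagger the thread discusses, and all names are made up for the example.

```python
# Toy sketch of steps 1-2: filter candidate noun phrases, then match
# their content words against the words of a dbpedia category string.

LOCAL_DETERMINERS = {"the", "this", "these"}   # step 1a: "local" reference
STOPWORDS = {"of", "the", "in", "a", "an"}     # crude non-noun filter

def is_candidate(noun_phrase):
    # step 1: needs a "local" determiner plus at least one extra noun
    # besides the main one ("The company" fails, "The software company" passes)
    words = noun_phrase.lower().split()
    return len(words) >= 3 and words[0] in LOCAL_DETERMINERS

def lemma(word):
    # naive plural stripping; a placeholder for a real lemmatizer
    w = word.lower()
    if w.endswith("ies"):
        return w[:-3] + "y"
    if w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w

def match_confidence(noun_phrase, category):
    # step 2: fraction of the phrase's content words whose lemma appears
    # among the lemmatized content words of the category
    nouns = [w for w in noun_phrase.lower().split()[1:] if w not in STOPWORDS]
    cat = {lemma(t) for t in category.split() if t.lower() not in STOPWORDS}
    return sum(1 for n in nouns if lemma(n) in cat) / len(nouns) if nouns else 0.0

is_candidate("The company")           # False: no extra noun (step 1b)
is_candidate("The software company")  # True
match_confidence("The software company",
                 "Software companies of the United States")  # 1.0
```

The lemmatization makes "company" match the plural "companies" in the category label, which is the common-denominator trick described above.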
>>>
>>> 2014-01-31 Cristian Petroaca <cristian.petro...@gmail.com>:
>>>
>>>> Hi Rafa,
>>>>
>>>> I don't yet have a concrete heuristic, but I'm working on it. I'll
>>>> provide it here so that you guys can give me feedback on it.
>>>>
>>>> What are "locality" features?
>>>>
>>>> I looked at BART and other coref tools such as ArkRef and
>>>> CherryPicker and they don't provide such a coreference.
>>>>
>>>> Cristian
>>>>
>>>> 2014-01-30 Rafa Haro <rh...@apache.org>:
>>>>
>>>>> Hi Cristian,
>>>>>
>>>>> Without having more details about your concrete heuristic, in my
>>>>> honest opinion such an approach could produce a lot of false
>>>>> positives. I don't know if you are planning to use some
>>>>> "locality" features to detect such coreferences, but you need to
>>>>> take into account that it is quite usual for coreferenced
>>>>> mentions to occur even in different paragraphs. Although I'm not
>>>>> an expert in Natural Language Understanding, I would say it is
>>>>> quite difficult to get decent precision/recall rates for
>>>>> coreference using fixed rules. Maybe you can give other tools
>>>>> like BART (http://www.bart-coref.org/) a try.
>>>>>
>>>>> Cheers,
>>>>> Rafa Haro
>>>>>
>>>>> On 30/01/14 10:33, Cristian Petroaca wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> One of the necessary steps for implementing the Event extraction
>>>>>> Engine feature
>>>>>> (https://issues.apache.org/jira/browse/STANBOL-1121) is to have
>>>>>> coreference resolution in the given text. This is provided now
>>>>>> via the stanford-nlp project, but as far as I saw this module
>>>>>> performs mostly pronominal (He, She) or nominal (Barack Obama
>>>>>> and Mr. Obama) coreference resolution.
>>>>>>
>>>>>> In order to get more coreferences from the text I thought of
>>>>>> creating some logic that would detect this kind of coreference:
>>>>>> "Apple reaches new profit heights. The software company just
>>>>>> announced its 2013 earnings."
>>>>>> Here "The software company" obviously refers to "Apple".
>>>>>> So I'd like to detect coreferences of Named Entities where the
>>>>>> noun phrase matches the rdf:type of the Named Entity, in this
>>>>>> case "company", and also has attributes which can be found in
>>>>>> the dbpedia categories of the named entity, in this case
>>>>>> "software".
>>>>>>
>>>>>> The detection of coreferences such as "The software company" in
>>>>>> the text would also be done either by using the new Pos Tag
>>>>>> Based Phrase extraction Engine (noun phrases) or by using a
>>>>>> dependency tree of the sentence and picking up only subjects or
>>>>>> objects.
>>>>>>
>>>>>> At this point I'd like to know: would this kind of logic be
>>>>>> useful as a separate Enhancement Engine in Stanbol (in case the
>>>>>> precision and recall are good enough)?
>>>>>>
>>>>>> Thanks,
>>>>>> Cristian
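The original proposal (match the noun phrase's head against the entity's type and its modifiers against the dbpedia categories, preferring the closest prior entity) could be sketched like this. The entity data here is a hand-made illustration, not real dbpedia output, and the tuple-based representation is purely for the example.

```python
# Toy resolver for the proposed idea: "The software company" -> "Apple".
# entities: list of (name, position, type_label, category_words) tuples,
# with position being the entity's offset in the text.

def resolve(noun_phrase, entities):
    # head noun must equal the entity's type label, and every modifier
    # must appear among the entity's category words; ties go to the
    # closest (largest-position) prior entity, as in step 4 of the thread
    words = noun_phrase.lower().split()
    head, modifiers = words[-1], words[:-1]
    candidates = [e for e in entities
                  if e[2] == head and all(m in e[3] for m in modifiers)]
    return max(candidates, key=lambda e: e[1])[0] if candidates else None

entities = [
    ("Apple", 0, "company", {"software", "technology"}),
    ("Romania", 5, "country", {"european"}),
]
resolve("software company", entities)  # -> "Apple"
```

The determiner ("The") is assumed to have been stripped by the candidate-filtering step before this function is called.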
>>>>
>>>> --
>>>> | Rupert Westenthaler             rupert.westentha...@gmail.com
>>>> | Bodenlehenstraße 11                             ++43-699-11108907
>>>> | A-5500 Bischofshofen
>>> >> >> >> >> >> >> >> >>>
>>> >> >> >> >> >> >> >> >>>
>>> >> >> >> >> >> >> >> >>
>>> >> >> >> >> >> >> >>
>>> >> >> >> >> >> >> >>
>>> >> >> >> >> >> >> >>
>>> >> >> >> >> >> >> >> --
>>> >> >> >> >> >> >> >> | Rupert Westenthaler
>>> >> >> >> >> >> >> >> rupert.westentha...@gmail.com
>>> >> >> >> >> >> >> >> | Bodenlehenstraße 11
>>> >> >> >> >> ++43-699-11108907
>>> >> >> >> >> >> >> >> | A-5500 Bischofshofen
>>> >> >> >> >> >> >> >>
>>> >> >> >> >> >> >>
>>> >> >> >> >> >> >>
>>> >> >> >> >> >> >>
>>> >> >> >> >> >> >> --
>>> >> >> >> >> >> >> | Rupert Westenthaler
>>> >> >> rupert.westentha...@gmail.com
>>> >> >> >> >> >> >> | Bodenlehenstraße 11
>>> >> >> >> >> >> >> ++43-699-11108907
>>> >> >> >> >> >> >> | A-5500 Bischofshofen
>>> >> >> >> >> >> >>
>>> >> >> >> >> >>
>>> >> >> >> >> >>
>>> >> >> >> >> >>
>>> >> >> >> >> >> --
>>> >> >> >> >> >> | Rupert Westenthaler
>>> >> rupert.westentha...@gmail.com
>>> >> >> >> >> >> | Bodenlehenstraße 11
>>> >> >> ++43-699-11108907
>>> >> >> >> >> >> | A-5500 Bischofshofen
>>> >> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >> >> --
>>> >> >> >> >> | Rupert Westenthaler
>>> rupert.westentha...@gmail.com
>>> >> >> >> >> | Bodenlehenstraße 11
>>> >> ++43-699-11108907
>>> >> >> >> >> | A-5500 Bischofshofen
>>> >> >> >> >>
>>> >> >> >>
>>> >> >> >>
>>> >> >> >>
>>> >> >> >> --
>>> >> >> >> | Rupert Westenthaler             rupert.westentha...@gmail.com
>>> >> >> >> | Bodenlehenstraße 11
>>> ++43-699-11108907
>>> >> >> >> | A-5500 Bischofshofen
>>> >> >> >
>>> >> >> >
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> --
>>> >> >> | Rupert Westenthaler             rupert.westentha...@gmail.com
>>> >> >> | Bodenlehenstraße 11                             ++43-699-11108907
>>> >> >> | A-5500 Bischofshofen
>>> >> >>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> | Rupert Westenthaler             rupert.westentha...@gmail.com
>>> >> | Bodenlehenstraße 11                             ++43-699-11108907
>>> >> | A-5500 Bischofshofen
>>> >>
>>>
>>>
>>>
>>> --
>>> | Rupert Westenthaler             rupert.westentha...@gmail.com
>>> | Bodenlehenstraße 11                             ++43-699-11108907
>>> | A-5500 Bischofshofen
>>>
>>
>>



-- 
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
