Examples:

1. Group membership:
   a. Spatial membership:
      "Microsoft announced its 2013 earnings. <coref>The Redmond-based company</coref> made huge profits."
   b. Organisational membership:
      "Mick Jagger started a new solo album. <coref>The Rolling Stones singer</coref> did not say what the theme will be."
2. Functional membership:
   "Allianz announced its 2013 earnings. <coref>The financial services company</coref> made a huge profit."
3. If no matches were found for the current NER with the rules above, but the yago:class which matched has more than 2 nouns, then we also consider this a good co-reference, possibly with a lower confidence:
   "Boris Becker will take part in a demonstration tennis match. <coref>The former tennis player</coref> will play again after 10 years."

2014-03-28 12:22 GMT+02:00 Rupert Westenthaler <rupert.westentha...@gmail.com>:
Hi Cristian, all

Looks good to me, but I am not sure if I got everything. If you could provide example texts where those rules apply, it would make it much easier to understand.

Instead of using dbpedia properties you should define your own domain model (ontology). You can then align the dbpedia properties to your model. This will allow you to apply this approach to knowledge bases other than dbpedia as well.

For people new to this thread: the above message adds to the suggestion first made by Cristian on 4th February. The following 4 messages (until 7th Feb) also provide additional context.

best
Rupert

On Fri, Mar 28, 2014 at 9:23 AM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
Hi guys,

After Rupert's last suggestions related to this enhancement engine, I devised a more comprehensive algorithm for matching the noun phrases against the NER properties. Please take a look and let me know what you think. Thanks.

The following rules will be applied to every noun phrase in order to find co-references:

1. For each NER prior to the current noun phrase in the text, match the yago:class label to the contents of the noun phrase.

   For the NERs which have a matching yago:class, apply:

2. Group membership rules:

   a. Spatial membership: the NER is part of a Location. If the noun phrase contains a LOCATION or a demonym, then check any location properties of the matching NER.

      If the matching NER is a:
      - person, match against :birthPlace, :region, :nationality
      - organisation, match against :foundationPlace, :locationCity, :location, :hometown
      - place, match against :country, :subdivisionName, :location

      Ex: The Italian President, The Redmond-based company

   b. Organisational membership: the NER is part of an Organisation. If the noun phrase contains an ORGANISATION, then check the following properties of the matching NER:

      If the matching NER is a:
      - person, match against :occupation, :associatedActs
      - organisation ?
      - location ?

      Ex: The Microsoft executive, The Pink Floyd singer

3. Functional description rule: the noun phrase describes what the NER does conceptually. If there are no NERs in the noun phrase, then match the following properties of the matching NER to the contents of the noun phrase (aside from the nouns which are part of the yago:class):

   If the NER is a:
   - person ?
   - organisation, match against :service, :industry, :genre
   - location ?

   Ex: The software company.

4. If no matches were found for the current NER with rules 2 or 3, but the yago:class which matched has more than 2 nouns, then we also consider this a good co-reference, possibly with a lower confidence.

   Ex: The former tennis player, the theoretical physicist.

5. Based on the number of nouns which matched, we create a confidence level. The number of matched nouns cannot be lower than 2, and we must have a yago:class match.

For all NERs which got to this point, select the ones closest in the text to the noun phrase which matched against the same properties (yago:class and dbpedia) and mark them as co-references.

Note: all noun phrases need to be lemmatized before all of this, in case there are any plurals.

2014-03-25 20:50 GMT+02:00 Cristian Petroaca <cristian.petro...@gmail.com>:
That worked. Thanks.

So, there are no exceptions during the startup of the launcher. The component tab in the felix console shows 6 WeightedChains the first time, including the default one, but after my changes and a restart there are only 5: the default one is missing altogether.

2014-03-24 20:18 GMT+02:00 Rupert Westenthaler <rupert.westentha...@gmail.com>:
Hi Cristian,

I have been seeing the same problem since last Friday. The solution mentioned in [1] works for me:

    mvn -Djsse.enableSNIExtension=false {goals}

No idea why https connections to github currently cause this. I could not find anything related via Google, so I suggest using the system property for now. If this persists for longer we can adapt the build files accordingly.
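The co-reference rules in Cristian's 28 March message above can be sketched in Java, the language of the engine code elsewhere in this thread. This is a minimal sketch under stated assumptions. It is not the engine's implementation.

- Entities live in a hypothetical in-memory model (the Entity record) instead of dbpedia/yago lookups.
- Noun phrases and property values are assumed to be lemmatized and lower-cased already.
- Following Rupert's advice, dbpedia property names are aligned to a small hypothetical domain model (the Dimension enum).
- Rules 2 and 3 are collapsed into one check: the nouns outside the yago:class label are compared against the values of aligned properties. Detecting LOCATION or ORGANISATION NERs inside the phrase is left out.

```java
import java.util.EnumSet;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Minimal sketch of the proposed co-reference rules over a hypothetical in-memory model.
// Noun phrases and property values are assumed to be lemmatized and lower-cased already.
class CorefRules {

    // Hypothetical domain model (Rupert's suggestion); dbpedia properties are aligned to it.
    enum Dimension { SPATIAL, ORGANISATIONAL, FUNCTIONAL }

    static final Map<String, Dimension> ALIGNMENT = Map.ofEntries(
            Map.entry("birthPlace", Dimension.SPATIAL), Map.entry("region", Dimension.SPATIAL),
            Map.entry("nationality", Dimension.SPATIAL), Map.entry("foundationPlace", Dimension.SPATIAL),
            Map.entry("locationCity", Dimension.SPATIAL), Map.entry("location", Dimension.SPATIAL),
            Map.entry("hometown", Dimension.SPATIAL), Map.entry("country", Dimension.SPATIAL),
            Map.entry("subdivisionName", Dimension.SPATIAL),
            Map.entry("occupation", Dimension.ORGANISATIONAL),
            Map.entry("associatedActs", Dimension.ORGANISATIONAL),
            Map.entry("service", Dimension.FUNCTIONAL), Map.entry("industry", Dimension.FUNCTIONAL),
            Map.entry("genre", Dimension.FUNCTIONAL));

    // An NER occurrence: text position, yago:class label nouns, dbpedia property values.
    record Entity(String name, int position, List<String> yagoClassNouns,
                  Map<String, List<String>> properties) {}

    // A co-reference candidate; confidence is the number of matched nouns.
    record Coref(Entity entity, int confidence, boolean lowConfidence, Set<Dimension> dimensions) {}

    static Optional<Coref> score(List<String> phrase, Entity e) {
        // Rule 1: the yago:class label has to match the noun phrase.
        int classNouns = 0;
        for (String noun : e.yagoClassNouns()) {
            if (phrase.contains(noun)) classNouns++;
        }
        if (classNouns == 0) return Optional.empty();

        // Rules 2 and 3 (collapsed): nouns outside the yago:class label are matched
        // against values of properties that are aligned to the domain model.
        Set<String> rest = new HashSet<>(phrase);
        rest.removeAll(e.yagoClassNouns());
        EnumSet<Dimension> dims = EnumSet.noneOf(Dimension.class);
        int propertyNouns = 0;
        for (Map.Entry<String, List<String>> p : e.properties().entrySet()) {
            Dimension dim = ALIGNMENT.get(p.getKey());
            if (dim == null) continue; // property not aligned to the domain model
            for (String value : p.getValue()) {
                if (rest.remove(value)) { // each phrase noun counts only once
                    propertyNouns++;
                    dims.add(dim);
                }
            }
        }

        // Rule 4: without a property match, only a yago:class with more than 2 nouns counts.
        boolean low = propertyNouns == 0;
        if (low && e.yagoClassNouns().size() <= 2) return Optional.empty();

        // Rule 5: confidence = matched nouns; fewer than 2 is rejected.
        int confidence = classNouns + propertyNouns;
        if (confidence < 2) return Optional.empty();
        return Optional.of(new Coref(e, confidence, low, dims));
    }

    // Among qualifying NERs that occur before the noun phrase, pick the closest one.
    static Optional<Coref> resolve(List<String> phrase, int phrasePosition, List<Entity> entities) {
        Coref best = null;
        for (Entity e : entities) {
            if (e.position() >= phrasePosition) continue; // only NERs prior to the phrase
            Optional<Coref> c = score(phrase, e);
            if (c.isPresent() && (best == null || e.position() > best.entity().position())) {
                best = c.get();
            }
        }
        return Optional.ofNullable(best);
    }
}
```

Because a bare yago:class hit with 2 or fewer nouns is rejected, a closer company NER does not win "software company" unless one of its aligned property values also matches. The returned dimensions record which kind of membership produced the match.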
best
Rupert

[1] http://stackoverflow.com/questions/7615645/ssl-handshake-alert-unrecognized-name-error-since-upgrade-to-java-1-7-0

On Mon, Mar 24, 2014 at 7:01 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
I did a clean on the whole project and now I wanted to do another "mvn clean install", but I am getting this:

    [INFO] ------------------------------------------------------------------------
    [ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.6:run (download) on project org.apache.stanbol.data.opennlp.lang.es: An Ant BuildException has occured: The following error occurred while executing this line:
    [ERROR] C:\Data\Projects\Stanbol\main\data\opennlp\lang\es\download_models.xml:33: Failed to copy https://github.com/utcompling/OpenNLP-Models/raw/58ef0c60031403e66e47ae35edaf58d3478b67af/models/es/opennlp-es-maxent-pos-es.bin to C:\Data\Projects\Stanbol\main\data\opennlp\lang\es\downloads\resources\org\apache\stanbol\data\opennlp\es-pos-maxent.bin due to javax.net.ssl.SSLProtocolException handshake alert : unrecognized_name

2014-03-20 11:25 GMT+02:00 Rupert Westenthaler <rupert.westentha...@gmail.com>:
Hi Cristian,

On Thu, Mar 20, 2014 at 10:00 AM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
> stanbol.enhancer.chain.weighted.chain=["tika;optional","langdetect","opennlp-sentence","opennlp-token","opennlp-pos","opennlp-ner","dbpediaLinking","entityhubExtraction","dbpedia-dereference","pos-chunker"]
> service.ranking=I"-2147483648"
> stanbol.enhancer.chain.name="default"

That looks fine to me.
Do you see any exception during the startup of the launcher? Can you check the status of this component in the component tab of the felix web console [1] (search for "org.apache.stanbol.enhancer.chain.weighted.impl.WeightedChain")? If there are several, you can find the correct one by comparing the "Properties" with those in the configuration file.

I guess that the corresponding service is in the 'unsatisfied' state, as you do not see it in the web interface. But if this is the case, you should also see the corresponding exception in the log. You can also manually stop/start the component. In this case the exception should be re-thrown and you do not need to search the log for it.

best
Rupert

[1] http://localhost:8080/system/console/components

2014-03-20 7:39 GMT+02:00 Rupert Westenthaler <rupert.westentha...@gmail.com>:
Hi Cristian,

you cannot send attachments to the list. Please copy the contents directly into the mail.

thx
Rupert

On Wed, Mar 19, 2014 at 9:20 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
The config is attached.

2014-03-19 9:09 GMT+02:00 Rupert Westenthaler <rupert.westentha...@gmail.com>:
Hi Cristian,

can you provide the contents of the chain after your modifications? It would be interesting to find out why the chain is no longer active after the restart.

You can find the config file in the 'stanbol/fileinstall' folder.
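The chain configuration Cristian posted on 20 March (quoted above) is a Felix fileinstall .config file. As a reading aid, here is a simplified, hypothetical Java reader for exactly that kind of content. It is not the Felix or Stanbol parser. It only handles single-line key=value pairs, flat arrays of quoted strings, and I"..." integers.

Two details of the chain entry are worth knowing:

- An engine name can carry a ";optional" suffix. As far as we understand the Stanbol chain syntax, this marks an engine whose absence or failure does not abort the chain.
- The value I"-2147483648" is typed as an Integer and equals Integer.MIN_VALUE.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified, hypothetical reader for a weighted-chain .config file (not the Felix parser).
class ChainConfig {

    // One chain entry: the engine name plus whether ";optional" was given.
    record EngineRef(String name, boolean optional) {}

    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");
    private static final Pattern TYPED_INT = Pattern.compile("I\"(-?\\d+)\"");

    // Splits "key=value" lines into an ordered map, skipping blank lines and comments.
    static Map<String, String> read(List<String> lines) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String line : lines) {
            String l = line.trim();
            int eq = l.indexOf('=');
            if (l.isEmpty() || l.startsWith("#") || eq <= 0) continue;
            props.put(l.substring(0, eq).trim(), l.substring(eq + 1).trim());
        }
        return props;
    }

    // Parses a flat array such as ["tika;optional","langdetect"] into engine references.
    static List<EngineRef> engines(String value) {
        List<EngineRef> refs = new ArrayList<>();
        Matcher m = QUOTED.matcher(value);
        while (m.find()) {
            String[] parts = m.group(1).split(";");
            boolean optional = false;
            for (int i = 1; i < parts.length; i++) {
                optional |= parts[i].trim().equals("optional");
            }
            refs.add(new EngineRef(parts[0].trim(), optional));
        }
        return refs;
    }

    // Parses a typed integer such as I"-2147483648".
    static int typedInt(String value) {
        Matcher m = TYPED_INT.matcher(value);
        if (!m.matches()) throw new IllegalArgumentException("not an I\"...\" value: " + value);
        return Integer.parseInt(m.group(1));
    }
}
```

Keeping such files as text, as Rupert recommends, also makes it easy to diff the chain before and after a restart.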
best
Rupert

On Tue, Mar 18, 2014 at 8:24 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
Related to the default chain selection rules: before the restart I had a chain with the name 'default', in that I could access it via enhancer/chain/default. Then I just added another engine to the 'default' chain. I assumed that after the restart the chain with the 'default' name would be persisted, so the first rule should have been applied after the restart as well. But instead I cannot reach it via enhancer/chain/default anymore, so it's gone. Anyway, this is not a big deal and it's not blocking me in any way; I just wanted to understand where the problem is.

2014-03-18 7:15 GMT+02:00 Rupert Westenthaler <rupert.westentha...@gmail.com>:
Hi Cristian

On Mon, Mar 17, 2014 at 9:43 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
> 1. Updated to the latest code and it's gone. Cool
>
> 2. I start the stable launcher -> create a new instance of the PosChunkerEngine -> add it to the default chain. At this point everything looks good and works ok.
> After I restart the server the default chain is gone, and instead I see this in the enhancement chains page: all-active (default, id: 149, ranking: 0, impl: AllActiveEnginesChain). all-active did not contain the 'default' word before the restart.

Please note the default chain selection rules as described at [1]. You can also access chains under '/enhancer/chain/{chain-name}'.

best
Rupert

[1] http://stanbol.staging.apache.org/docs/trunk/components/enhancer/chains/#default-chain

> It looks like the config files are exactly what I need. Thanks.

2014-03-17 9:26 GMT+02:00 Rupert Westenthaler <rupert.westentha...@gmail.com>:
On Sat, Mar 15, 2014 at 8:34 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
> Thanks Rupert.
>
> A couple more questions/issues:
>
> 1. Whenever I start the stanbol server I'm seeing this in the console output:

This should be fixed with STANBOL-1278 [1] [2]

> 2. Whenever I restart the server the Weighted Chains get messed up. I usually use the 'default' chain and add my engine to it, so there are 11 engines in it. After the restart this chain contains around 23 engines in total.

I was not able to replicate this. What I tried was:

(1) start up the stable launcher
(2) add an additional engine to the default chain
(3) restart the launcher

The default chain was not changed after (2) and (3), so I would need further information to find out why this is happening.

Generally it is better to create your own chain instance than to modify one that is provided by the default configuration. I would also recommend that you keep your test configuration in text files and copy those to the 'stanbol/fileinstall' folder. Doing so saves you from manually re-entering the configuration after a software update. The production-mode section [3] provides information on how to do that.
best
Rupert

[1] https://issues.apache.org/jira/browse/STANBOL-1278
[2] http://svn.apache.org/r1576623
[3] http://stanbol.apache.org/docs/trunk/production-mode

>     ERROR: Bundle org.apache.stanbol.enhancer.engine.topic.web [153]: Error starting slinginstall:c:\Data\Projects\Stanbol\main\launchers\stable\target\stanbol\startup\35\org.apache.stanbol.enhancer.engine.topic.web-1.0.0-SNAPSHOT.jar (org.osgi.framework.BundleException: Unresolved constraint in bundle org.apache.stanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0: missing requirement [153.0] package; (&(package=javax.ws.rs)(version>=0.0.0)(!(version>=2.0.0))))
>     org.osgi.framework.BundleException: Unresolved constraint in bundle org.apache.stanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0: missing requirement [153.0] package; (&(package=javax.ws.rs)(version>=0.0.0)(!(version>=2.0.0)))
>         at org.apache.felix.framework.Felix.resolveBundle(Felix.java:3443)
>         at org.apache.felix.framework.Felix.startBundle(Felix.java:1727)
>         at org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1156)
>         at org.apache.felix.framework.StartLevelImpl.run(StartLevelImpl.java:264)
>         at java.lang.Thread.run(Unknown Source)
>
> Despite this the server starts fine and I can use the enhancer without problems. Do you guys see this as well?

2014-03-11 9:47 GMT+02:00 Rupert Westenthaler <rupert.westentha...@gmail.com>:
Hi Cristian,

NER annotations are typically available both as NlpAnnotations.NER_ANNOTATION and as fise:TextAnnotation [1] in the enhancement metadata. As you are already accessing the AnalysedText, I would prefer using NlpAnnotations.NER_ANNOTATION.
best
Rupert

[1] http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fisetextannotation

On Mon, Mar 10, 2014 at 10:07 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
Thanks.
I assume I should get the named entities the same way, but with NlpAnnotations.NER_ANNOTATION?

2014-03-10 13:29 GMT+02:00 Rupert Westenthaler <rupert.westentha...@gmail.com>:
Hello Cristian,

NounPhrases are not added to the RDF enhancement results. You need to use the AnalysedText ContentPart [1].

Here is some demo code you can use in the computeEnhancement method:

    AnalysedText at = NlpEngineHelper.getAnalysedText(this, ci, true);
    Iterator<? extends Section> sections = at.getSentences();
    if (!sections.hasNext()) { // no sentences detected: process the text as a single section
        sections = Collections.singleton(at).iterator();
    }

    while (sections.hasNext()) {
        Section section = sections.next();
        // iterate over the chunks (phrases) enclosed by the current section
        Iterator<Span> chunks = section.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
        while (chunks.hasNext()) {
            Span chunk = chunks.next();
            Value<PhraseTag> phrase = chunk.getAnnotation(NlpAnnotations.PHRASE_ANNOTATION);
            // skip chunks without a phrase annotation to avoid a NullPointerException
            if (phrase != null && phrase.value().getCategory() == LexicalCategory.Noun) {
                log.info(" - NounPhrase [{},{}] {}", new Object[]{
                    chunk.getStart(), chunk.getEnd(), chunk.getSpan()});
            }
        }
    }

hope this helps

best
Rupert

[1] http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext

On Sun, Mar 9, 2014 at 6:07 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
I started to implement the engine and I'm having problems getting results for noun phrases. I modified the "default" weighted chain to also include the PosChunkerEngine and ran a sample text: "Angela Merkel visited China. The german chancellor met with various people". I expected that the RDF XML output would contain some info about the noun phrases, but I cannot see any.
Could you point me to the correct way to generate the noun phrases?

Thanks,
Cristian

2014-02-09 14:15 GMT+02:00 Cristian Petroaca <cristian.petro...@gmail.com>:
Opened https://issues.apache.org/jira/browse/STANBOL-1279

2014-02-07 10:53 GMT+02:00 Cristian Petroaca <cristian.petro...@gmail.com>:
Hi Rupert,

The "spatial" dimension is a good idea. I'll also take a look at Yago.

I will create a Jira with what we talked about here.
It will probably have just a draft-like description for now and will be updated as I go along.

Thanks,
Cristian

2014-02-06 15:39 GMT+02:00 Rupert Westenthaler <rupert.westentha...@gmail.com>:
Hi Cristian,

Definitely an interesting approach. You should have a look at Yago2 [1]. As far as I can remember, the Yago taxonomy is much better structured than the one used by dbpedia. Mapping suggestions of dbpedia to concepts in Yago2 is easy, as both dbpedia and Yago2 provide mappings [2] and [3].

> 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
>> "Microsoft posted its 2013 earnings. The Redmond's company made a huge profit".

That's actually a very good example.
Spatial contexts are very important, as they tend to be used often for referencing. So I would suggest treating the spatial context specially. For spatial entities (like a City) this is easy, but even for others (like a Person or Company) you could use relations to spatial entities to define their spatial context. This context could then be used to correctly link "The Redmond's company" to "Microsoft".

In addition, I would suggest using the "spatial" context of each entity (basically its relations to entities that are cities, regions or countries) as a separate dimension, because those are very often used for coreferences.
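Rupert's suggestion of a separate spatial dimension can be illustrated with a small Java sketch. The triple list and the place-type map are hypothetical stand-ins for dbpedia data.

- The spatial context of an entity is every related entity typed as a city, region or country, independent of which property links them.
- A noun phrase is linked to the candidates whose spatial context it mentions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the "spatial context" dimension over an in-memory triple list.
class SpatialContext {

    enum PlaceType { CITY, REGION, COUNTRY }

    // A statement about an entity, e.g. (Microsoft, locationCity, Redmond).
    record Triple(String subject, String predicate, String object) {}

    // Every object related to the entity that is a known place, whatever the predicate.
    static Set<String> spatialContext(String entity, List<Triple> triples,
                                      Map<String, PlaceType> placeTypes) {
        Set<String> context = new TreeSet<>();
        for (Triple t : triples) {
            if (t.subject().equals(entity) && placeTypes.containsKey(t.object())) {
                context.add(t.object());
            }
        }
        return context;
    }

    // Candidates (in input order) whose spatial context is mentioned in the noun phrase.
    static List<String> matches(List<String> phraseTokens, List<String> candidates,
                                List<Triple> triples, Map<String, PlaceType> placeTypes) {
        List<String> result = new ArrayList<>();
        for (String candidate : candidates) {
            Set<String> context = spatialContext(candidate, triples, placeTypes);
            if (phraseTokens.stream().anyMatch(context::contains)) result.add(candidate);
        }
        return result;
    }
}
```

This version follows direct relations only. Also following part-of links (for example, from Redmond to Washington) would raise recall at some cost in precision.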
[1] http://www.mpi-inf.mpg.de/yago-naga/yago/
[2] http://downloads.dbpedia.org/3.9/links/yago_links.nt.bz2
[3] http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoDBpediaInstances.ttl.7z

On Thu, Feb 6, 2014 at 10:33 AM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
There are several dbpedia categories for each entity. In this case, for Microsoft we have:

    category:Companies_in_the_NASDAQ-100_Index
    category:Microsoft
    category:Software_companies_of_the_United_States
    category:Software_companies_based_in_Washington_(state)
    category:Companies_established_in_1975
    category:1975_establishments_in_the_United_States
    category:Companies_based_in_Redmond,_Washington
    category:Multinational_companies_headquartered_in_the_United_States
    category:Cloud_computing_providers
    category:Companies_in_the_Dow_Jones_Industrial_Average
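A minimal Java sketch of matching a noun phrase against category labels like these follows. It makes two assumptions, neither of which is dbpedia or Stanbol functionality:

- Labels are split on underscores, commas, brackets and whitespace.
- A toy plural-stripping function stands in for a real lemmatizer.

A category is kept when its label contains every noun of the phrase. With the list above, "Redmond company" selects only category:Companies_based_in_Redmond,_Washington.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: match lemmatized noun-phrase nouns against dbpedia category labels.
class CategoryMatch {

    // "category:Companies_based_in_Redmond,_Washington" -> [company, based, in, redmond, washington]
    static List<String> lemmas(String category) {
        String label = category.startsWith("category:")
                ? category.substring("category:".length()) : category;
        List<String> out = new ArrayList<>();
        for (String token : label.split("[_,()\\s]+")) {
            if (!token.isEmpty()) out.add(lemma(token.toLowerCase(Locale.ROOT)));
        }
        return out;
    }

    // Toy plural stripping, standing in for a real lemmatizer (assumption).
    static String lemma(String word) {
        if (word.endsWith("ies") && word.length() > 4) return word.substring(0, word.length() - 3) + "y";
        if (word.endsWith("s") && !word.endsWith("ss") && word.length() > 3) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    // Categories whose label contains every (lemmatized) noun of the phrase.
    static List<String> matching(List<String> phraseNouns, List<String> categories) {
        List<String> result = new ArrayList<>();
        for (String category : categories) {
            List<String> labelLemmas = lemmas(category);
            boolean all = true;
            for (String noun : phraseNouns) {
                if (!labelLemmas.contains(lemma(noun.toLowerCase(Locale.ROOT)))) {
                    all = false;
                    break;
                }
            }
            if (all) result.add(category);
        }
        return result;
    }
}
```

Requiring every noun to match favours precision over recall, which is the priority Cristian states in his message. It also shows Rafa's concern: phrasings that do not reuse a category word are missed.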
So we also have "Companies based in Redmond, Washington", which could be matched.

There is still other contextual information from dbpedia which can be used. For example, for an Organization we could also include:

    dbpprop:industry = Software
    dbpprop:service = Online Service Providers

and for a Person (this is for Barack Obama):

    dbpedia-owl:profession:
        dbpedia:Author
        dbpedia:Constitutional_law
        dbpedia:Lawyer
        dbpedia:Community_organizing

I'd like to continue investigating this, as I think it may have some value in increasing the number of coreference resolutions. I'd like to concentrate more on precision than on recall, since we already have a set of coreferences detected by the Stanford NLP tool and this would be an addition to that (at least this is how I would like to use it).

Is it ok if I track this by opening a Jira? I could update it to show my progress and also my conclusions. If it turns out to be a bad idea, then at least I'll end up with more knowledge about Stanbol :).

2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
Hi Cristian,

The approach sounds nice.
I don't want to be the devil's advocate, but I'm just not sure about the recall of the DBpedia categories feature. For example, your sentence could also be "Microsoft posted its 2013 earnings. The Redmond-based company made a huge profit." So maybe including more contextual information from DBpedia could increase the recall, but of course it will reduce the precision.

Cheers,
Rafa

On 04/02/14 09:50, Cristian Petroaca wrote:

Back with a more detailed description of the steps for making this kind of coreference work.
I will use the following text as a reference in the steps below, to make things clearer: "Microsoft posted its 2013 earnings. The software company made a huge profit."

1. Consider every noun phrase in the text which has:
   a. a determiner that implies a reference to an entity local to the text, such as "the", "this", "these", but not "another", "every", etc., which imply a reference to an entity outside of the text;
   b. at least one other noun aside from the required main noun, which further describes it.
   For example, I will not count "The company" as a legitimate candidate, since this could create a lot of false positives given the double meaning of some words, as in "in the company of good people". "The software company" is a good candidate since we also have "software".

2. Match the nouns in the noun phrase to the contents of the DBpedia categories of each named entity found before the noun phrase in the text. The DBpedia categories have the following format (for Microsoft, for example): "Software companies of the United States". So we try to match "software company" against that.
   First, as you can see, the main noun in the DBpedia category is in plural form, and the same is true for all categories I saw. I don't know if there's an easier way to do this, but I thought of applying a lemmatizer to both the category and the noun phrase so that they share a common denominator. This also works if the noun phrase itself is plural.

   Second, I'll need to compare only the words in the category which are themselves nouns, not prepositions or determiners such as "of the". This means that I need to POS tag the category contents as well.
   I was thinking of running the POS tagger and lemmatizer on the DBpedia categories when building the DBpedia-backed Entityhub and storing the results for later use; I don't know how feasible this is at the moment.

   After this I can compare each noun in the noun phrase with the equivalent nouns in the categories and, based on the number of matches, create a confidence level.

3. Match the head noun of the noun phrase with the rdf:type of the named entity from DBpedia. If this matches, increase the confidence level.

4. If there are multiple named entities which can match a certain noun phrase, link the noun phrase with the closest named entity preceding it in the text.
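Steps 1 to 4 above can be sketched in Python roughly as follows. This is a minimal sketch under assumptions not stated in the thread: tokens are already POS-tagged with Penn Treebank tags, `naive_lemma` is a hypothetical suffix-stripping stand-in for a real lemmatizer, and the type bonus (0.25) and acceptance threshold (0.5) are arbitrary illustrative values. All class and function names are hypothetical and not part of any Stanbol API.

```python
from __future__ import annotations

from dataclasses import dataclass

# Step 1a: determiners pointing to an entity local to the text. "another",
# "every", ... are deliberately absent because they point outside the text.
LOCAL_DETERMINERS = {"the", "this", "these", "that", "those"}
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}  # Penn Treebank noun tags


def naive_lemma(word: str) -> str:
    """Hypothetical stand-in for a real lemmatizer: lowercase, strip a plural suffix."""
    w = word.lower()
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]
    return w


@dataclass
class NamedEntity:
    label: str
    position: int                              # token offset of the mention
    rdf_types: set[str]                        # lemmatized type labels, e.g. {"company"}
    categories: list[list[tuple[str, str]]]    # POS-tagged DBpedia category labels


@dataclass
class NounPhrase:
    tokens: list[tuple[str, str]]              # (word, Penn Treebank tag)
    position: int                              # token offset of the first word


def nouns(tokens: list[tuple[str, str]]) -> list[str]:
    """Lemmatized nouns only; prepositions and determiners such as "of the" are dropped."""
    return [naive_lemma(w) for w, t in tokens if t in NOUN_TAGS]


def is_candidate(np: NounPhrase) -> bool:
    """Step 1: a local determiner plus at least one noun besides the head noun."""
    word, tag = np.tokens[0]
    return tag == "DT" and word.lower() in LOCAL_DETERMINERS and len(nouns(np.tokens)) >= 2


def confidence(np: NounPhrase, entity: NamedEntity, type_bonus: float = 0.25) -> float:
    """Steps 2 and 3: share of the phrase's nouns found in the best-matching category,
    raised by `type_bonus` (assumed value) when the head noun matches an rdf:type."""
    np_nouns = set(nouns(np.tokens))
    if not np_nouns:
        return 0.0
    best = max((len(np_nouns & set(nouns(c))) / len(np_nouns) for c in entity.categories),
               default=0.0)
    if best == 0.0:
        return 0.0  # assumption: no category overlap means no match at all
    head = nouns(np.tokens)[-1]  # the head is usually the last noun of an English NP
    if head in entity.rdf_types:
        best += type_bonus
    return min(best, 1.0)


def resolve(np: NounPhrase, entities: list[NamedEntity], threshold: float = 0.5):
    """Step 4: among preceding entities scoring >= `threshold` (assumed), pick the closest."""
    if not is_candidate(np):
        return None
    matches = [(confidence(np, e), e) for e in entities if e.position < np.position]
    matches = [(s, e) for s, e in matches if s >= threshold]
    if not matches:
        return None
    score, entity = max(matches, key=lambda m: m[1].position)
    return entity.label, score


microsoft = NamedEntity(
    label="Microsoft", position=0, rdf_types={"company", "organisation"},
    categories=[[("Software", "NN"), ("companies", "NNS"), ("of", "IN"),
                 ("the", "DT"), ("United", "NNP"), ("States", "NNPS")]],
)
phrase = NounPhrase([("The", "DT"), ("software", "NN"), ("company", "NN")], position=6)
print(resolve(phrase, [microsoft]))  # ('Microsoft', 1.0)
```

With the example text, "The software company" resolves to Microsoft because both "software" and "company" appear (after lemmatization) among the nouns of "Software companies of the United States", and "company" also matches the rdf:type. "The company" is rejected by step 1 because it has only one noun.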
What do you think?

Cristian

2014-01-31 Cristian Petroaca <cristian.petro...@gmail.com>:

Hi Rafa,

I don't have a concrete heuristic yet, but I'm working on it. I'll post it here so that you can give me feedback on it.

What are "locality" features?

I looked at BART and other coref tools such as ArkRef and CherryPicker, and they don't provide this kind of coreference.

Cristian

2014-01-30 Rafa Haro <rh...@apache.org>:

Hi Cristian,

Without more details about your concrete heuristic, in my honest opinion such an approach could produce a lot of false positives.
I don't know if you are planning to use some "locality" features to detect such coreferences, but you need to take into account that coreferent mentions quite often occur in different paragraphs. Although I'm not an expert in Natural Language Understanding, I would say it is quite difficult to get decent precision/recall rates for coreference resolution using fixed rules. Maybe you could give other tools a try, like BART (http://www.bart-coref.org/).
Cheers,
Rafa Haro

On 30/01/14 10:33, Cristian Petroaca wrote:

Hi,

One of the necessary steps for implementing the Event Extraction Engine feature (https://issues.apache.org/jira/browse/STANBOL-1121) is to have coreference resolution in the given text. This is currently provided via the stanford-nlp project, but as far as I saw, this module mostly performs pronominal (He, She) or nominal (Barack Obama and Mr. Obama) coreference resolution.

In order to get more coreferences from the text, I thought of creating some logic that would detect this kind of coreference: "Apple reaches new profit heights. The software company just announced its 2013 earnings." Here "The software company" obviously refers to "Apple".

So I'd like to detect coreferences of Named Entities where the noun phrase contains the rdf:type of the Named Entity, in this case "company", and also has attributes which can be found in the DBpedia categories of the named entity, in this case "software".

The detection of candidate mentions such as "The software company" in the text would be done either by using the new POS Tag Based Phrase Extraction Engine (noun phrases) or by using a dependency tree of the sentence and picking up only subjects or objects.
At this point I'd like to know whether this kind of logic would be useful in Stanbol as a separate Enhancement Engine (in case the precision and recall are good enough).

Thanks,
Cristian

--
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen