Hi, I've started to implement the dbpedia properties logic and I'd like some feedback on my approach: I want to take each NER from the text and look it up in the dbpedia data so that I can retrieve certain dbpedia properties. My plan is to take the text of the NER_ANNOTATION chunk and search for it in the Entityhub (which, from what I saw, is configured with dbpedia data by default). I haven't yet performed a query to actually get the data, but before I continue I'd like to ask: is this the way to go?
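A minimal sketch of the lookup step described above, assuming the Entityhub's REST `find` service on a referenced site with the id `dbpedia`, queried with `name`, `lang` and `limit` form parameters. The host, port, site id and parameter names are assumptions to check against the running Stanbol instance; the class and method names are made up for illustration. It only builds the request body from the NER chunk text and does not send anything.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EntityhubFindSketch {

    // Assumed endpoint of the default "dbpedia" referenced site; adjust host, port and site id.
    public static final String FIND_URL = "http://localhost:8080/entityhub/site/dbpedia/find";

    /**
     * Builds the form-encoded body for a label search on the text of a NER chunk.
     * The parameter names (name, lang, limit) are assumptions, see above.
     */
    public static String buildFindBody(String nerText, String lang, int limit) {
        try {
            StringBuilder body = new StringBuilder();
            body.append("name=").append(URLEncoder.encode(nerText, "UTF-8"));
            if (lang != null) {
                // restrict the label search to one language when it is known
                body.append("&lang=").append(URLEncoder.encode(lang, "UTF-8"));
            }
            body.append("&limit=").append(limit);
            return body.toString();
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a JVM, so this should never happen
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("POST " + FIND_URL);
        System.out.println(buildFindBody("Angela Merkel", "en", 5));
    }
}
```

The body would be sent as an `application/x-www-form-urlencoded` POST to the find URL; the returned entities could then be inspected for the properties of interest.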
Thanks,
Cristian

2014-03-28 15:12 GMT+02:00 Cristian Petroaca <cristian.petro...@gmail.com>:

> Examples:
>
> 1. Group membership:
>    a. Spatial membership:
>
>       "Microsoft announced its 2013 earnings. <coref>The Redmond-based
>       company</coref> made huge profits."
>
>    b. Organisational membership:
>
>       "Mick Jagger started a new solo album. <coref>The Rolling Stones
>       singer</coref> did not say what the theme will be."
>
> 2. Functional membership:
>
>    "Allianz announced its 2013 earnings. <coref>The financial services
>    company</coref> made a huge profit."
>
> 3. If no matches were found for the current NER with the rules above,
>    then if the yago:class which matched has more than 2 nouns we also
>    consider this a good co-reference, but maybe with a lower confidence.
>
>    "Boris Becker will take part in a demonstrative tennis match.
>    <coref>The former tennis player</coref> will play again after 10 years."
>
>
> 2014-03-28 12:22 GMT+02:00 Rupert Westenthaler <
> rupert.westentha...@gmail.com>:
>
>> Hi Cristian, all
>>
>> Looks good to me, but I am not sure if I got everything. If you could
>> provide example texts where those rules apply it would make it much
>> easier to understand.
>>
>> Instead of using dbpedia properties you should define your own domain
>> model (ontology). You can then align the dbpedia properties to your
>> model. This will allow this approach to be applied to knowledge
>> bases other than dbpedia as well.
>>
>> For people new to this thread: The above message adds to the
>> suggestion first made by Cristian on 4th February. Also the following
>> 4 messages (until 7th Feb) provide additional context.
>>
>> best
>> Rupert
>>
>>
>> On Fri, Mar 28, 2014 at 9:23 AM, Cristian Petroaca
>> <cristian.petro...@gmail.com> wrote:
>> > Hi guys,
>> >
>> > After Rupert's last suggestions related to this enhancement engine I
>> > devised a more comprehensive algorithm for matching the noun phrases
>> > against the NER properties. Please take a look and let me know what you
>> > think. Thanks.
>> >
>> > The following rules will be applied to every noun phrase in order to
>> > find co-references:
>> >
>> > 1. For each NER prior to the current noun phrase in the text, match the
>> > yago:class label to the contents of the noun phrase.
>> >
>> > For the NERs which have a yago:class which matches, apply:
>> >
>> > 2. Group membership rules:
>> >
>> >    a. spatial membership: the NER is part of a Location. If the noun
>> >    phrase contains a LOCATION or a demonym then check any location
>> >    properties of the matching NER.
>> >
>> >    If the matching NER is a:
>> >    - person, match against :birthPlace, :region, :nationality
>> >    - organisation, match against :foundationPlace, :locationCity,
>> >      :location, :hometown
>> >    - place, match against :country, :subdivisionName, :location
>> >
>> >    Ex: The Italian President, The Richmond-based company
>> >
>> >    b. organisational membership: the NER is part of an Organisation. If
>> >    the noun phrase contains an ORGANISATION then check the following
>> >    properties of the matching NER:
>> >
>> >    If the matching NER is a:
>> >    - person, match against :occupation, :associatedActs
>> >    - organisation ?
>> >    - location ?
>> >
>> >    Ex: The Microsoft executive, The Pink Floyd singer
>> >
>> > 3. Functional description rule: the noun phrase describes what the NER
>> > does conceptually.
>> > If there are no NERs in the noun phrase then match the following
>> > properties of the matching NER to the contents of the noun phrase (aside
>> > from the nouns which are part of the yago:class):
>> >
>> > If the NER is a:
>> > - person ?
>> > - organisation: match against :service, :industry, :genre
>> > - location ?
>> >
>> > Ex: The software company.
>> >
>> > 4. If no matches were found for the current NER with rules 2 or 3, then
>> > if the yago:class which matched has more than 2 nouns we also consider
>> > this a good co-reference, but maybe with a lower confidence.
>> >
>> > Ex: The former tennis player, the theoretical physicist.
>> >
>> > 5. Based on the number of nouns which matched we create a confidence
>> > level. The number of matched nouns cannot be lower than 2 and we must
>> > have a yago:class match.
>> >
>> > For all NERs which got to this point, select the closest ones in the
>> > text to the noun phrase which matched against the same properties
>> > (yago:class and dbpedia) and mark them as co-references.
>> >
>> > Note: all noun phrases need to be lemmatized before all of this in case
>> > there are any plurals.
>> >
>> >
>> > 2014-03-25 20:50 GMT+02:00 Cristian Petroaca <
>> > cristian.petro...@gmail.com>:
>> >
>> >> That worked. Thanks.
>> >>
>> >> So, there are no exceptions during the startup of the launcher.
>> >> The component tab in the felix console shows 6 WeightedChains the first
>> >> time, including the default one, but after my changes and a restart
>> >> there are only 5 - the default one is missing altogether.
>> >>
>> >>
>> >> 2014-03-24 20:18 GMT+02:00 Rupert Westenthaler <
>> >> rupert.westentha...@gmail.com>:
>> >>
>> >>> Hi Cristian,
>> >>>
>> >>> I do see the same problem since last Friday. The solution mentioned
>> >>> in [1] works for me.
>> >>>
>> >>>     mvn -Djsse.enableSNIExtension=false {goals}
>> >>>
>> >>> No idea why https connections to github currently cause this. I
>> >>> could not find anything related via Google. So I suggest using the
>> >>> system property for now. If this persists for longer we can adapt the
>> >>> build files accordingly.
>> >>>
>> >>> best
>> >>> Rupert
>> >>>
>> >>> [1]
>> >>> http://stackoverflow.com/questions/7615645/ssl-handshake-alert-unrecognized-name-error-since-upgrade-to-java-1-7-0
>> >>>
>> >>> On Mon, Mar 24, 2014 at 7:01 PM, Cristian Petroaca
>> >>> <cristian.petro...@gmail.com> wrote:
>> >>> > I did a clean on the whole project and now I wanted to do another
>> >>> > "mvn clean install" but I am getting this:
>> >>> >
>> >>> > "[INFO]
>> >>> > ------------------------------------------------------------------------
>> >>> > [ERROR] Failed to execute goal
>> >>> > org.apache.maven.plugins:maven-antrun-plugin:1.6:run (download) on
>> >>> > project org.apache.stanbol.data.opennlp.lang.es: An Ant BuildException
>> >>> > has occured: The following error occurred while executing this line:
>> >>> > [ERROR]
>> >>> > C:\Data\Projects\Stanbol\main\data\opennlp\lang\es\download_models.xml:33:
>> >>> > Failed to copy
>> >>> > https://github.com/utcompling/OpenNLP-Models/raw/58ef0c60031403e66e47ae35edaf58d3478b67af/models/es/opennlp-es-maxent-pos-es.bin
>> >>> > to
>> >>> > C:\Data\Projects\Stanbol\main\data\opennlp\lang\es\downloads\resources\org\apache\stanbol\data\opennlp\es-pos-maxent.bin
>> >>> > due to javax.net.ssl.SSLProtocolException handshake alert :
>> >>> > unrecognized_name"
>> >>> >
>> >>> >
>> >>> > 2014-03-20 11:25 GMT+02:00 Rupert Westenthaler <
>> >>> > rupert.westentha...@gmail.com>:
>> >>> >
>> >>> >> Hi Cristian,
>> >>> >>
>> >>> >> On Thu, Mar 20, 2014 at 10:00 AM, Cristian Petroaca
>> >>> >> <cristian.petro...@gmail.com> wrote:
>> >>> >> >
>> >>> >> > stanbol.enhancer.chain.weighted.chain=["tika;optional","langdetect","opennlp-sentence","opennlp-token","opennlp-pos","opennlp-ner","dbpediaLinking","entityhubExtraction","dbpedia-dereference","pos-chunker"]
>> >>> >> > service.ranking=I"-2147483648"
>> >>> >> > stanbol.enhancer.chain.name="default"
>> >>> >>
>> >>> >> Does look fine to me. Do you see any exception during the startup of
>> >>> >> the launcher? Can you check the status of this component in the
>> >>> >> component tab of the felix web console [1] (search for
>> >>> >> "org.apache.stanbol.enhancer.chain.weighted.impl.WeightedChain")? If
>> >>> >> you have multiple you can find the correct one by comparing the
>> >>> >> "Properties" with those in the configuration file.
>> >>> >>
>> >>> >> I guess that the corresponding service is in the 'unsatisfied' state,
>> >>> >> as you do not see it in the web interface. But if this is the case you
>> >>> >> should also see the corresponding exception in the log. You can also
>> >>> >> manually stop/start the component. In this case the exception should be
>> >>> >> re-thrown and you do not need to search the log for it.
>> >>> >>
>> >>> >> best
>> >>> >> Rupert
>> >>> >>
>> >>> >>
>> >>> >> [1] http://localhost:8080/system/console/components
>> >>> >>
>> >>> >> >
>> >>> >> >
>> >>> >> > 2014-03-20 7:39 GMT+02:00 Rupert Westenthaler <
>> >>> >> > rupert.westentha...@gmail.com>:
>> >>> >> >
>> >>> >> >> Hi Cristian,
>> >>> >> >>
>> >>> >> >> you cannot send attachments to the list. Please copy the contents
>> >>> >> >> directly to the mail
>> >>> >> >>
>> >>> >> >> thx
>> >>> >> >> Rupert
>> >>> >> >>
>> >>> >> >> On Wed, Mar 19, 2014 at 9:20 PM, Cristian Petroaca
>> >>> >> >> <cristian.petro...@gmail.com> wrote:
>> >>> >> >> > The config attached.
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > 2014-03-19 9:09 GMT+02:00 Rupert Westenthaler
>> >>> >> >> > <rupert.westentha...@gmail.com>:
>> >>> >> >> >
>> >>> >> >> >> Hi Cristian,
>> >>> >> >> >>
>> >>> >> >> >> can you provide the contents of the chain after your
>> >>> >> >> >> modifications? Would be interesting to test why the chain is no
>> >>> >> >> >> longer active after the restart.
>> >>> >> >> >> >> >>> >> >> >> You can find the config file in the 'stanbol/fileinstall' >> folder. >> >>> >> >> >> >> >>> >> >> >> best >> >>> >> >> >> Rupert >> >>> >> >> >> >> >>> >> >> >> On Tue, Mar 18, 2014 at 8:24 PM, Cristian Petroaca >> >>> >> >> >> <cristian.petro...@gmail.com> wrote: >> >>> >> >> >> > Related to the default chain selection rules : before >> restart I >> >>> >> had a >> >>> >> >> >> > chain >> >>> >> >> >> > with the name 'default' as in I could access it via >> >>> >> >> >> > enhancer/chain/default. >> >>> >> >> >> > Then I just added another engine to the 'default' chain. I >> >>> assumed >> >>> >> >> that >> >>> >> >> >> > after the restart the chain with the 'default' name would >> be >> >>> >> >> persisted. >> >>> >> >> >> > So >> >>> >> >> >> > the first rule should have been applied after the restart >> as >> >>> well. >> >>> >> But >> >>> >> >> >> > instead I cannot reach it via enhancer/chain/default >> anymore >> >>> so its >> >>> >> >> >> > gone. >> >>> >> >> >> > Anyway, this is not a big deal, it's not blocking me in any >> >>> way, I >> >>> >> >> just >> >>> >> >> >> > wanted to understand where the problem is. >> >>> >> >> >> > >> >>> >> >> >> > >> >>> >> >> >> > 2014-03-18 7:15 GMT+02:00 Rupert Westenthaler >> >>> >> >> >> > <rupert.westentha...@gmail.com >> >>> >> >> >> >>: >> >>> >> >> >> > >> >>> >> >> >> >> Hi Cristian >> >>> >> >> >> >> >> >>> >> >> >> >> On Mon, Mar 17, 2014 at 9:43 PM, Cristian Petroaca >> >>> >> >> >> >> <cristian.petro...@gmail.com> wrote: >> >>> >> >> >> >> > 1. Updated to the latest code and it's gone. Cool >> >>> >> >> >> >> > >> >>> >> >> >> >> > 2. I start the stable launcher -> create a new instance >> of >> >>> the >> >>> >> >> >> >> > PosChunkerEngine -> add it to the default chain. At this >> >>> point >> >>> >> >> >> >> > everything >> >>> >> >> >> >> > looks good and works ok. 
>> >>> >> >> >> >> > After I restart the server the default chain is gone and >> >>> >> instead I >> >>> >> >> >> >> > see >> >>> >> >> >> >> this >> >>> >> >> >> >> > in the enhancement chains page : all-active (default, >> id: >> >>> 149, >> >>> >> >> >> >> > ranking: >> >>> >> >> >> >> 0, >> >>> >> >> >> >> > impl: AllActiveEnginesChain ). all-active did not >> contain >> >>> the >> >>> >> >> >> >> > 'default' >> >>> >> >> >> >> > word before the restart. >> >>> >> >> >> >> > >> >>> >> >> >> >> >> >>> >> >> >> >> Please note the default chain selection rules as >> described at >> >>> [1]. >> >>> >> >> You >> >>> >> >> >> >> can also access chains chains under >> >>> '/enhancer/chain/{chain-name}' >> >>> >> >> >> >> >> >>> >> >> >> >> best >> >>> >> >> >> >> Rupert >> >>> >> >> >> >> >> >>> >> >> >> >> [1] >> >>> >> >> >> >> >> >>> >> >> >> >> >> >>> >> >> >> >>> >> >> >>> >> http://stanbol.staging.apache.org/docs/trunk/components/enhancer/chains/#default-chain >> >>> >> >> >> >> >> >>> >> >> >> >> > It looks like the config files are exactly what I need. >> >>> Thanks. >> >>> >> >> >> >> > >> >>> >> >> >> >> > >> >>> >> >> >> >> > 2014-03-17 9:26 GMT+02:00 Rupert Westenthaler < >> >>> >> >> >> >> rupert.westentha...@gmail.com >> >>> >> >> >> >> >>: >> >>> >> >> >> >> > >> >>> >> >> >> >> >> On Sat, Mar 15, 2014 at 8:34 PM, Cristian Petroaca >> >>> >> >> >> >> >> <cristian.petro...@gmail.com> wrote: >> >>> >> >> >> >> >> > Thanks Rupert. >> >>> >> >> >> >> >> > >> >>> >> >> >> >> >> > A couple more questions/issues : >> >>> >> >> >> >> >> > >> >>> >> >> >> >> >> > 1. Whenever I start the stanbol server I'm seeing >> this >> >>> in the >> >>> >> >> >> >> >> > console >> >>> >> >> >> >> >> > output : >> >>> >> >> >> >> >> > >> >>> >> >> >> >> >> >> >>> >> >> >> >> >> This should be fixed with STANBOL-1278 [1] [2] >> >>> >> >> >> >> >> >> >>> >> >> >> >> >> > 2. Whenever I restart the server the Weighted Chains >> get >> >>> >> messed >> >>> >> >> >> >> >> > up. 
I >> >>> >> >> >> >> >> > usually use the 'default' chain and add my engine to >> it >> >>> so >> >>> >> there >> >>> >> >> >> >> >> > are >> >>> >> >> >> >> 11 >> >>> >> >> >> >> >> > engines in it. After the restart this chain now >> contains >> >>> >> around >> >>> >> >> 23 >> >>> >> >> >> >> >> engines >> >>> >> >> >> >> >> > in total. >> >>> >> >> >> >> >> >> >>> >> >> >> >> >> I was not able to replicate this. What I tried was >> >>> >> >> >> >> >> >> >>> >> >> >> >> >> (1) start up the stable launcher >> >>> >> >> >> >> >> (2) add an additional engine to the default chain >> >>> >> >> >> >> >> (3) restart the launcher >> >>> >> >> >> >> >> >> >>> >> >> >> >> >> The default chain was not changed after (2) and (3). >> So I >> >>> would >> >>> >> >> need >> >>> >> >> >> >> >> further information for knowing why this is happening. >> >>> >> >> >> >> >> >> >>> >> >> >> >> >> Generally it is better to create you own chain >> instance as >> >>> >> >> modifying >> >>> >> >> >> >> >> one that is provided by the default configuration. I >> would >> >>> also >> >>> >> >> >> >> >> recommend that you keep your test configuration in text >> >>> files >> >>> >> and >> >>> >> >> to >> >>> >> >> >> >> >> copy those to the 'stanbol/fileinstall' folder. Doing >> so >> >>> >> prevent >> >>> >> >> you >> >>> >> >> >> >> >> from manually entering the configuration after a >> software >> >>> >> update. >> >>> >> >> >> >> >> The >> >>> >> >> >> >> >> production-mode section [3] provides information on >> how to >> >>> do >> >>> >> >> that. 
>> >>> >> >> >> >> >> >> >>> >> >> >> >> >> best >> >>> >> >> >> >> >> Rupert >> >>> >> >> >> >> >> >> >>> >> >> >> >> >> [1] https://issues.apache.org/jira/browse/STANBOL-1278 >> >>> >> >> >> >> >> [2] http://svn.apache.org/r1576623 >> >>> >> >> >> >> >> [3] >> http://stanbol.apache.org/docs/trunk/production-mode >> >>> >> >> >> >> >> >> >>> >> >> >> >> >> > ERROR: Bundle >> >>> org.apache.stanbol.enhancer.engine.topic.web >> >>> >> >> [153]: >> >>> >> >> >> >> Error >> >>> >> >> >> >> >> > starting >> >>> >> >> >> >> >> > >> >>> >> >> >> >> >> >> >>> >> >> >> >> >> >>> >> >> >> >> >> >>> >> >> >> >>> >> >> >>> >> slinginstall:c:\Data\Projects\Stanbol\main\launchers\stable\target\stanbol\star >> >>> >> >> >> >> >> > >> >>> >> >> >> >> >> > >> >>> >> >> >> >>> tup\35\org.apache.stanbol.enhancer.engine.topic.web-1.0.0-SNAPSHOT.jar >> >>> >> >> >> >> >> > (org.osgi >> >>> >> >> >> >> >> > .framework.BundleException: Unresolved constraint in >> >>> bundle >> >>> >> >> >> >> >> > org.apache.stanbol.e >> >>> >> >> >> >> >> > nhancer.engine.topic.web [153]: Unable to resolve >> 153.0: >> >>> >> missing >> >>> >> >> >> >> >> > requirement [15 >> >>> >> >> >> >> >> > 3.0] package; (&(package=javax.ws.rs >> >>> >> >> >> >> >> )(version>=0.0.0)(!(version>=2.0.0)))) >> >>> >> >> >> >> >> > org.osgi.framework.BundleException: Unresolved >> >>> constraint in >> >>> >> >> >> >> >> > bundle >> >>> >> >> >> >> >> > org.apache.s >> >>> >> >> >> >> >> > tanbol.enhancer.engine.topic.web [153]: Unable to >> resolve >> >>> >> 153.0: >> >>> >> >> >> >> missing >> >>> >> >> >> >> >> > require >> >>> >> >> >> >> >> > ment [153.0] package; (&(package=javax.ws.rs >> >>> >> >> >> >> >> > )(version>=0.0.0)(!(version>=2.0.0)) >> >>> >> >> >> >> >> > ) >> >>> >> >> >> >> >> > at >> >>> >> >> >> >> >> >> >>> org.apache.felix.framework.Felix.resolveBundle(Felix.java:3443) >> >>> >> >> >> >> >> > at >> >>> >> >> >> >> >> org.apache.felix.framework.Felix.startBundle(Felix.java:1727) >> >>> >> >> >> 
>> >> > at >> >>> >> >> >> >> >> > >> >>> >> >> >> >> >> > >> >>> >> >> >> >>> org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1156) >> >>> >> >> >> >> >> > >> >>> >> >> >> >> >> > at >> >>> >> >> >> >> >> > >> >>> >> >> >> >> >> > >> >>> >> >> >> >>> org.apache.felix.framework.StartLevelImpl.run(StartLevelImpl.java:264 >> >>> >> >> >> >> >> > ) >> >>> >> >> >> >> >> > at java.lang.Thread.run(Unknown Source) >> >>> >> >> >> >> >> > >> >>> >> >> >> >> >> > Despite of this the server starts fine and I can use >> the >> >>> >> >> enhancer >> >>> >> >> >> >> fine. >> >>> >> >> >> >> >> Do >> >>> >> >> >> >> >> > you guys see this as well? >> >>> >> >> >> >> >> > >> >>> >> >> >> >> >> > >> >>> >> >> >> >> >> > 2. Whenever I restart the server the Weighted Chains >> get >> >>> >> messed >> >>> >> >> >> >> >> > up. I >> >>> >> >> >> >> >> > usually use the 'default' chain and add my engine to >> it >> >>> so >> >>> >> there >> >>> >> >> >> >> >> > are >> >>> >> >> >> >> 11 >> >>> >> >> >> >> >> > engines in it. After the restart this chain now >> contains >> >>> >> around >> >>> >> >> 23 >> >>> >> >> >> >> >> engines >> >>> >> >> >> >> >> > in total. >> >>> >> >> >> >> >> > >> >>> >> >> >> >> >> > >> >>> >> >> >> >> >> > >> >>> >> >> >> >> >> > >> >>> >> >> >> >> >> > 2014-03-11 9:47 GMT+02:00 Rupert Westenthaler < >> >>> >> >> >> >> >> rupert.westentha...@gmail.com >> >>> >> >> >> >> >> >>: >> >>> >> >> >> >> >> > >> >>> >> >> >> >> >> >> Hi Cristian, >> >>> >> >> >> >> >> >> >> >>> >> >> >> >> >> >> NER Annotations are typically available as both >> >>> >> >> >> >> >> >> NlpAnnotations.NER_ANNOTATION and >> fise:TextAnnotation >> >>> [1] >> >>> >> in >> >>> >> >> the >> >>> >> >> >> >> >> >> enhancement metadata. As you are already accessing >> the >> >>> >> >> >> >> >> >> AnayzedText I >> >>> >> >> >> >> >> >> would prefer using the >> NlpAnnotations.NER_ANNOTATION. 
>> >>> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> best
>> >>> >> >> >> >> >> >> Rupert
>> >>> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> [1]
>> >>> >> >> >> >> >> >> http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fisetextannotation
>> >>> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> On Mon, Mar 10, 2014 at 10:07 PM, Cristian Petroaca
>> >>> >> >> >> >> >> >> <cristian.petro...@gmail.com> wrote:
>> >>> >> >> >> >> >> >> > Thanks.
>> >>> >> >> >> >> >> >> > I assume I should get the named entities the same way, but with
>> >>> >> >> >> >> >> >> > NlpAnnotations.NER_ANNOTATION?
>> >>> >> >> >> >> >> >> >
>> >>> >> >> >> >> >> >> >
>> >>> >> >> >> >> >> >> > 2014-03-10 13:29 GMT+02:00 Rupert Westenthaler <
>> >>> >> >> >> >> >> >> > rupert.westentha...@gmail.com>:
>> >>> >> >> >> >> >> >> >
>> >>> >> >> >> >> >> >> >> Hallo Cristian,
>> >>> >> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> >> NounPhrases are not added to the RDF enhancement results. You
>> >>> >> >> >> >> >> >> >> need to use the AnalyzedText ContentPart [1]
>> >>> >> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> >> here is some demo code you can use in the computeEnhancement
>> >>> >> >> >> >> >> >> >> method
>> >>> >> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> >>     AnalysedText at = NlpEngineHelper.getAnalysedText(this, ci, true);
>> >>> >> >> >> >> >> >> >>     Iterator<? extends Section> sections = at.getSentences();
>> >>> >> >> >> >> >> >> >>     if (!sections.hasNext()) { // no sentences: process as single sentence
>> >>> >> >> >> >> >> >> >>         sections = Collections.singleton(at).iterator();
>> >>> >> >> >> >> >> >> >>     }
>> >>> >> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> >>     while (sections.hasNext()) {
>> >>> >> >> >> >> >> >> >>         Section section = sections.next();
>> >>> >> >> >> >> >> >> >>         // iterate over the chunks (phrases) enclosed by this section
>> >>> >> >> >> >> >> >> >>         Iterator<Span> chunks =
>> >>> >> >> >> >> >> >> >>             section.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
>> >>> >> >> >> >> >> >> >>         while (chunks.hasNext()) {
>> >>> >> >> >> >> >> >> >>             Span chunk = chunks.next();
>> >>> >> >> >> >> >> >> >>             Value<PhraseTag> phrase =
>> >>> >> >> >> >> >> >> >>                 chunk.getAnnotation(NlpAnnotations.PHRASE_ANNOTATION);
>> >>> >> >> >> >> >> >> >>             // skip chunks without a phrase annotation; log only noun phrases
>> >>> >> >> >> >> >> >> >>             if (phrase != null
>> >>> >> >> >> >> >> >> >>                     && phrase.value().getCategory() == LexicalCategory.Noun) {
>> >>> >> >> >> >> >> >> >>                 log.info(" - NounPhrase [{},{}] {}", new Object[]{
>> >>> >> >> >> >> >> >> >>                     chunk.getStart(), chunk.getEnd(), chunk.getSpan()});
>> >>> >> >> >> >> >> >> >>             }
>> >>> >> >> >> >> >> >> >>         }
>> >>> >> >> >> >> >> >> >>     }
>> >>> >> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> >> hope this helps
>> >>> >> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> >> best
>> >>> >> >> >> >> >> >> >> Rupert
>> >>> >> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> >> [1]
>> >>> >> >> >> >> >> >> >> http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext
>> >>> >> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> >> On Sun, Mar 9, 2014 at 6:07 PM, Cristian Petroaca
>> >>> >> >> >> >> >> >> >> <cristian.petro...@gmail.com> wrote:
>> >>> >> >> >> >> >> >> >> > I started to implement the engine and I'm having problems
>> >> >> >> > with >> >>> >> >> >> >> >> getting >> >>> >> >> >> >> >> >> >> > results for noun phrases. I modified the >> "default" >> >>> >> >> weighted >> >>> >> >> >> >> chain >> >>> >> >> >> >> >> to >> >>> >> >> >> >> >> >> also >> >>> >> >> >> >> >> >> >> > include the PosChunkerEngine and ran a sample >> text >> >>> : >> >>> >> >> "Angela >> >>> >> >> >> >> Merkel >> >>> >> >> >> >> >> >> >> visted >> >>> >> >> >> >> >> >> >> > China. The german chancellor met with various >> >>> people". >> >>> >> I >> >>> >> >> >> >> expected >> >>> >> >> >> >> >> that >> >>> >> >> >> >> >> >> >> the >> >>> >> >> >> >> >> >> >> > RDF XML output would contain some info about >> the >> >>> noun >> >>> >> >> >> >> >> >> >> > phrases >> >>> >> >> >> >> but I >> >>> >> >> >> >> >> >> >> cannot >> >>> >> >> >> >> >> >> >> > see any. >> >>> >> >> >> >> >> >> >> > Could you point me to the correct way to >> generate >> >>> the >> >>> >> noun >> >>> >> >> >> >> phrases? >> >>> >> >> >> >> >> >> >> > >> >>> >> >> >> >> >> >> >> > Thanks, >> >>> >> >> >> >> >> >> >> > Cristian >> >>> >> >> >> >> >> >> >> > >> >>> >> >> >> >> >> >> >> > >> >>> >> >> >> >> >> >> >> > 2014-02-09 14:15 GMT+02:00 Cristian Petroaca < >> >>> >> >> >> >> >> >> >> cristian.petro...@gmail.com>: >> >>> >> >> >> >> >> >> >> > >> >>> >> >> >> >> >> >> >> >> Opened >> >>> >> >> https://issues.apache.org/jira/browse/STANBOL-1279 >> >>> >> >> >> >> >> >> >> >> >> >>> >> >> >> >> >> >> >> >> >> >>> >> >> >> >> >> >> >> >> 2014-02-07 10:53 GMT+02:00 Cristian Petroaca < >> >>> >> >> >> >> >> >> >> cristian.petro...@gmail.com> >> >>> >> >> >> >> >> >> >> >> : >> >>> >> >> >> >> >> >> >> >> >> >>> >> >> >> >> >> >> >> >> Hi Rupert, >> >>> >> >> >> >> >> >> >> >>> >> >>> >> >> >> >> >> >> >> >>> The "spatial" dimension is a good idea. I'll >> also >> >>> >> take a >> >>> >> >> >> >> >> >> >> >>> look >> >>> >> >> >> >> at >> >>> >> >> >> >> >> >> Yago. 
>> >>> >> >> >> >> >> >> >> >>> >> >>> >> >> >> >> >> >> >> >>> I will create a Jira with what we talked >> about >> >>> here. >> >>> >> It >> >>> >> >> >> >> >> >> >> >>> will >> >>> >> >> >> >> >> >> probably >> >>> >> >> >> >> >> >> >> >>> have just a draft-like description for now >> and >> >>> will >> >>> >> be >> >>> >> >> >> >> >> >> >> >>> updated >> >>> >> >> >> >> >> as I >> >>> >> >> >> >> >> >> go >> >>> >> >> >> >> >> >> >> >>> along. >> >>> >> >> >> >> >> >> >> >>> >> >>> >> >> >> >> >> >> >> >>> Thanks, >> >>> >> >> >> >> >> >> >> >>> Cristian >> >>> >> >> >> >> >> >> >> >>> >> >>> >> >> >> >> >> >> >> >>> >> >>> >> >> >> >> >> >> >> >>> 2014-02-06 15:39 GMT+02:00 Rupert >> Westenthaler < >> >>> >> >> >> >> >> >> >> >>> rupert.westentha...@gmail.com>: >> >>> >> >> >> >> >> >> >> >>> >> >>> >> >> >> >> >> >> >> >>> Hi Cristian, >> >>> >> >> >> >> >> >> >> >>>> >> >>> >> >> >> >> >> >> >> >>>> definitely an interesting approach. You >> should >> >>> have >> >>> >> a >> >>> >> >> >> >> >> >> >> >>>> look at >> >>> >> >> >> >> >> Yago2 >> >>> >> >> >> >> >> >> >> >>>> [1]. As far as I can remember the Yago >> taxonomy >> >>> is >> >>> >> much >> >>> >> >> >> >> better >> >>> >> >> >> >> >> >> >> >>>> structured as the one used by dbpedia. >> Mapping >> >>> >> >> >> >> >> >> >> >>>> suggestions of >> >>> >> >> >> >> >> >> dbpedia >> >>> >> >> >> >> >> >> >> >>>> to concepts in Yago2 is easy as both >> dbpedia and >> >>> >> yago2 >> >>> >> >> do >> >>> >> >> >> >> >> provide >> >>> >> >> >> >> >> >> >> >>>> mappings [2] and [3] >> >>> >> >> >> >> >> >> >> >>>> >> >>> >> >> >> >> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro >> >>> >> >> >> >> >> >> >> >>>> > <rh...@apache.org>: >> >>> >> >> >> >> >> >> >> >>>> >> >> >>> >> >> >> >> >> >> >> >>>> >> "Microsoft posted its 2013 earnings. 
The >> >>> >> Redmond's >> >>> >> >> >> >> >> >> >> >>>> >> company >> >>> >> >> >> >> >> made >> >>> >> >> >> >> >> >> a >> >>> >> >> >> >> >> >> >> >>>> >> huge profit". >> >>> >> >> >> >> >> >> >> >>>> >> >>> >> >> >> >> >> >> >> >>>> Thats actually a very good example. Spatial >> >>> contexts >> >>> >> >> are >> >>> >> >> >> >> >> >> >> >>>> very >> >>> >> >> >> >> >> >> >> >>>> important as they tend to be often used for >> >>> >> >> referencing. >> >>> >> >> >> >> >> >> >> >>>> So I >> >>> >> >> >> >> >> would >> >>> >> >> >> >> >> >> >> >>>> suggest to specially treat the spatial >> context. >> >>> For >> >>> >> >> >> >> >> >> >> >>>> spatial >> >>> >> >> >> >> >> >> Entities >> >>> >> >> >> >> >> >> >> >>>> (like a City) this is easy, but even for >> other >> >>> >> (like a >> >>> >> >> >> >> Person, >> >>> >> >> >> >> >> >> >> >>>> Company) you could use relations to spatial >> >>> entities >> >>> >> >> >> >> >> >> >> >>>> define >> >>> >> >> >> >> >> their >> >>> >> >> >> >> >> >> >> >>>> spatial context. This context could than be >> >>> used to >> >>> >> >> >> >> >> >> >> >>>> correctly >> >>> >> >> >> >> >> link >> >>> >> >> >> >> >> >> >> >>>> "The Redmond's company" to "Microsoft". >> >>> >> >> >> >> >> >> >> >>>> >> >>> >> >> >> >> >> >> >> >>>> In addition I would suggest to use the >> "spatial" >> >>> >> >> context >> >>> >> >> >> >> >> >> >> >>>> of >> >>> >> >> >> >> each >> >>> >> >> >> >> >> >> >> >>>> entity (basically relation to entities that >> are >> >>> >> cities, >> >>> >> >> >> >> regions, >> >>> >> >> >> >> >> >> >> >>>> countries) as a separate dimension, because >> >>> those >> >>> >> are >> >>> >> >> >> >> >> >> >> >>>> very >> >>> >> >> >> >> often >> >>> >> >> >> >> >> >> used >> >>> >> >> >> >> >> >> >> >>>> for coreferences. 
>> >>> >> >> >> >> >> >> >> >>>> >> >>> >> >> >> >> >> >> >> >>>> [1] >> http://www.mpi-inf.mpg.de/yago-naga/yago/ >> >>> >> >> >> >> >> >> >> >>>> [2] >> >>> >> >> >> >> >> >> >> >>>> >> >>> >> >> http://downloads.dbpedia.org/3.9/links/yago_links.nt.bz2 >> >>> >> >> >> >> >> >> >> >>>> [3] >> >>> >> >> >> >> >> >> >> >>>> >> >>> >> >> >> >> >> >> >> >> >>> >> >> >> >> >> >> >> >>> >> >> >> >> >> >> >>> >> >> >> >> >> >>> >> >> >> >> >> >>> >> >> >> >>> >> >> >>> >> http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoDBpediaInstances.ttl.7z >> >>> >> >> >> >> >> >> >> >>>> >> >>> >> >> >> >> >> >> >> >>>> >> >>> >> >> >> >> >> >> >> >>>> On Thu, Feb 6, 2014 at 10:33 AM, Cristian >> >>> Petroaca >> >>> >> >> >> >> >> >> >> >>>> <cristian.petro...@gmail.com> wrote: >> >>> >> >> >> >> >> >> >> >>>> > There are several dbpedia categories for >> each >> >>> >> entity, >> >>> >> >> >> >> >> >> >> >>>> > in >> >>> >> >> >> >> this >> >>> >> >> >> >> >> >> case >> >>> >> >> >> >> >> >> >> for >> >>> >> >> >> >> >> >> >> >>>> > Microsoft we have : >> >>> >> >> >> >> >> >> >> >>>> > >> >>> >> >> >> >> >> >> >> >>>> > category:Companies_in_the_NASDAQ-100_Index >> >>> >> >> >> >> >> >> >> >>>> > category:Microsoft >> >>> >> >> >> >> >> >> >> >>>> > >> >>> category:Software_companies_of_the_United_States >> >>> >> >> >> >> >> >> >> >>>> > >> >>> >> >> category:Software_companies_based_in_Washington_(state) >> >>> >> >> >> >> >> >> >> >>>> > category:Companies_established_in_1975 >> >>> >> >> >> >> >> >> >> >>>> > >> >>> category:1975_establishments_in_the_United_States >> >>> >> >> >> >> >> >> >> >>>> > >> >>> category:Companies_based_in_Redmond,_Washington >> >>> >> >> >> >> >> >> >> >>>> > >> >>> >> >> >> >> >> >> >> >>> >> >> >> >> >> >> >> >>> >> >> >> category:Multinational_companies_headquartered_in_the_United_States >> >>> >> >> >> >> >> >> >> >>>> > category:Cloud_computing_providers >> >>> >> >> >> >> >> >> >> >>>> > >> >>> >> >> 
category:Companies_in_the_Dow_Jones_Industrial_Average >> >>> >> >> >> >> >> >> >> >>>> > >> >>> >> >> >> >> >> >> >> >>>> > So we also have "Companies based in >> >>> >> >> Redmont,Washington" >> >>> >> >> >> >> which >> >>> >> >> >> >> >> >> could >> >>> >> >> >> >> >> >> >> be >> >>> >> >> >> >> >> >> >> >>>> > matched. >> >>> >> >> >> >> >> >> >> >>>> > >> >>> >> >> >> >> >> >> >> >>>> > >> >>> >> >> >> >> >> >> >> >>>> > There is still other contextual >> information >> >>> from >> >>> >> >> >> >> >> >> >> >>>> > dbpedia >> >>> >> >> >> >> which >> >>> >> >> >> >> >> >> can >> >>> >> >> >> >> >> >> >> be >> >>> >> >> >> >> >> >> >> >>>> used. >> >>> >> >> >> >> >> >> >> >>>> > For example for an Organization we could >> also >> >>> >> >> include : >> >>> >> >> >> >> >> >> >> >>>> > dbpprop:industry = Software >> >>> >> >> >> >> >> >> >> >>>> > dbpprop:service = Online Service Providers >> >>> >> >> >> >> >> >> >> >>>> > >> >>> >> >> >> >> >> >> >> >>>> > and for a Person (that's for Barack >> Obama) : >> >>> >> >> >> >> >> >> >> >>>> > >> >>> >> >> >> >> >> >> >> >>>> > dbpedia-owl:profession: >> >>> >> >> >> >> >> >> >> >>>> > >> dbpedia:Author >> >>> >> >> >> >> >> >> >> >>>> > >> >>> >> >> >> >> >> >> >> >>>> > dbpedia:Constitutional_law >> >>> >> >> >> >> >> >> >> >>>> > >> dbpedia:Lawyer >> >>> >> >> >> >> >> >> >> >>>> > >> >>> >> >> >> >> >> >> >> >>>> > dbpedia:Community_organizing >> >>> >> >> >> >> >> >> >> >>>> > >> >>> >> >> >> >> >> >> >> >>>> > I'd like to continue investigating this >> as I >> >>> think >> >>> >> >> that >> >>> >> >> >> >> >> >> >> >>>> > it >> >>> >> >> >> >> may >> >>> >> >> >> >> >> >> have >> >>> >> >> >> >> >> >> >> >>>> some >> >>> >> >> >> >> >> >> >> >>>> > value in increasing the number of >> coreference >> >>> >> >> >> >> >> >> >> >>>> > resolutions >> >>> >> >> >> >> and >> >>> >> >> >> >> >> I'd >> >>> >> >> >> >> >> >> >> like >> >>> >> >> >> >> >> >> >> >>>> to >> >>> >> >> >> >> >> >> >> >>>> > concentrate more on 
> precision rather than recall, since we already have a set of
> coreferences detected by the Stanford NLP tool and this would be an
> addition to that (at least this is how I would like to use it).
>
> Is it OK if I track this by opening a JIRA issue? I could update it to
> show my progress and my conclusions, and if it turns out that it was a
> bad idea, then at least I'll end up with more knowledge about Stanbol
> in the end :).
>
> 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
>
>> Hi Cristian,
>>
>> The approach sounds nice.
>> I don't want to be the devil's advocate, but I'm just not sure about
>> the recall using the dbpedia categories feature. For example, your
>> sentence could also be "Microsoft posted its 2013 earnings. The
>> Redmond's company made a huge profit". So maybe including more
>> contextual information from dbpedia could increase the recall, but of
>> course it will reduce the precision.
>>
>> Cheers,
>> Rafa
>>
>> On 04/02/14 09:50, Cristian Petroaca wrote:
>>
>>> Back with a more detailed description of the steps for making this
>>> kind of coreference work.
>>>
>>> I will be referring to the following text in the steps below in order
>>> to make things clearer: "Microsoft posted its 2013 earnings. The
>>> software company made a huge profit."
>>>
>>> 1. For every noun phrase in the text which has:
>>>    a. a determiner POS which implies reference to an entity local to
>>>       the text, such as "the, this, these", but not "another, every",
>>>       etc., which imply a reference to an entity outside of the text.
>>>    b. at least one other noun aside from the main required noun,
>>>       which further describes it.
>>>       For example I will not count "The company" as being a
>>>       legitimate candidate since this could create a lot of false
>>>       positives by considering the double meaning of some words such
>>>       as "in the company of good people".
>>>       "The software company" is a good candidate since we also have
>>>       "software".
>>>
>>> 2. Match the nouns in the noun phrase to the contents of the dbpedia
>>>    categories of each named entity found prior to the location of the
>>>    noun phrase in the text.
>>>    The dbpedia categories are in the following format (for Microsoft
>>>    for example): "Software companies of the United States".
>>>    So we try to match "software company" with that.
>>>    First, as you can see, the main noun in the dbpedia category has a
>>>    plural form and it's the same for all categories which I saw. I
>>>    don't know if there's an easier way to do this but I thought of
>>>    applying a lemmatizer on the category and the noun phrase in order
>>>    for them to have a common denominator. This also works if the noun
>>>    phrase itself has a plural form.
>>>
>>>    Second, I'll need to use for comparison only the words in the
>>>    category which are themselves nouns and not prepositions or
>>>    determiners such as "of the". This means that I need to POS tag the
>>>    categories' contents as well.
>>>    I was thinking of running the POS tagger and lemmatizer on the
>>>    dbpedia categories when building the dbpedia-backed Entityhub and
>>>    storing the results for later use. I don't know how feasible this
>>>    is at the moment.
>>>
>>>    After this I can compare each noun in the noun phrase with the
>>>    equivalent nouns in the categories and, based on the number of
>>>    matches, create a confidence level.
>>>
>>> 3. Match the noun of the noun phrase with the rdf:type from dbpedia of
>>>    the named entity. If this matches, increase the confidence level.
>>>
>>> 4. If there are multiple named entities which can match a certain noun
>>>    phrase, then link the noun phrase with the closest named entity
>>>    prior to it in the text.
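Steps 1 to 4 above could be sketched roughly as follows. This is only an illustration in Python, not Stanbol code. The entity dictionaries, the toy lemma() function, the fixed function-word list standing in for POS tagging, the 0.5 type bonus and the 1.25 threshold are all assumptions made for the example, and the category string given for the second entity is illustrative rather than taken from dbpedia.

```python
import re

# Determiners that point back to an entity already mentioned in the text (step 1a).
ANAPHORIC_DETERMINERS = {"the", "this", "these"}
# Stand-in for POS tagging: function words dropped before comparison (step 2).
FUNCTION_WORDS = {"of", "the", "in", "a", "an", "and", "by", "for"}


def lemma(word):
    """Toy lemmatizer: lower-case and strip common English plural endings."""
    w = word.lower()
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]
    return w


def content_lemmas(text):
    """Lemmas of all words in text except the function words."""
    words = re.findall(r"[A-Za-z]+", text)
    return {lemma(w) for w in words if w.lower() not in FUNCTION_WORDS}


def is_candidate(noun_phrase):
    """Step 1: anaphoric determiner followed by at least two more words."""
    words = noun_phrase.split()
    return len(words) >= 3 and words[0].lower() in ANAPHORIC_DETERMINERS


def confidence(noun_phrase, entity):
    """Steps 2 and 3: share of phrase words found in the categories, plus a type bonus."""
    nouns = content_lemmas(noun_phrase)
    if not nouns:
        return 0.0
    category_lemmas = set()
    for category in entity["categories"]:
        category_lemmas |= content_lemmas(category)
    score = len(nouns & category_lemmas) / len(nouns)
    head = lemma(noun_phrase.split()[-1])  # assume the last word is the head noun
    if head in {lemma(t) for t in entity["types"]}:
        score += 0.5  # arbitrary bonus for an rdf:type match
    return score


def resolve(noun_phrase, np_offset, entities, threshold=1.25):
    """Step 4: among matching entities before the phrase, pick the closest one."""
    if not is_candidate(noun_phrase):
        return None
    matches = [e for e in entities
               if e["offset"] < np_offset and confidence(noun_phrase, e) >= threshold]
    return max(matches, key=lambda e: e["offset"], default=None)


TEXT = "Microsoft posted its 2013 earnings. The software company made a huge profit."
NP = "The software company"
MICROSOFT = {
    "name": "Microsoft",
    "offset": TEXT.index("Microsoft"),
    "types": ["Company"],
    "categories": ["Software companies of the United States",
                   "Companies based in Redmond, Washington"],
}
# Hypothetical second entity, used only to compare confidence values.
ALLIANZ = {
    "name": "Allianz",
    "types": ["Company"],
    "categories": ["Financial services companies of Germany"],
}

match = resolve(NP, TEXT.index(NP), [MICROSOFT])
print(match["name"] if match else None)
```

In this sketch "The company" is rejected in step 1, and "The software company" scores higher against the Microsoft entry than against the hypothetical Allianz entry because both "software" and "company" appear in the Microsoft categories.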
>>>
>>> What do you think?
>>>
>>> Cristian
>>>
>>> 2014-01-31 Cristian Petroaca <cristian.petro...@gmail.com>:
>>>
>>>> Hi Rafa,
>>>>
>>>> I don't yet have a concrete heuristic, but I'm working on it. I'll
>>>> provide it here so that you guys can give me feedback on it.
>>>>
>>>> What are "locality" features?
>>>>
>>>> I looked at BART and other coref tools such as ArkRef and
>>>> CherryPicker, and they don't provide this kind of coreference.
>>>>
>>>> Cristian
>>>>
>>>> 2014-01-30 Rafa Haro <rh...@apache.org>:
>>>>
>>>>> Hi Cristian,
>>>>>
>>>>> Without having more details about your concrete heuristic, in my
>>>>> honest opinion, such an approach could produce a lot of false
>>>>> positives. I don't know if you are planning to use some "locality"
>>>>> features to detect such coreferences, but you need to take into
>>>>> account that it is quite usual for coreferent mentions to occur
>>>>> even in different paragraphs.
>>>>> Although I'm not an expert in Natural Language Understanding, I
>>>>> would say it is quite difficult to get decent precision/recall
>>>>> rates for coreference resolution using fixed rules. Maybe you can
>>>>> try other tools like BART (http://www.bart-coref.org/).
>>>>>
>>>>> Cheers,
>>>>> Rafa Haro
>>>>>
>>>>> On 30/01/14 10:33, Cristian Petroaca wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> One of the necessary steps for implementing the Event extraction
>>>>>> Engine feature (https://issues.apache.org/jira/browse/STANBOL-1121)
>>>>>> is to have coreference resolution in the given text.
>>>>>> This is provided now via the stanford-nlp project, but as far as I
>>>>>> saw this module performs mostly pronominal (He, She) or nominal
>>>>>> (Barack Obama and Mr. Obama) coreference resolution.
>>>>>>
>>>>>> In order to get more coreferences from the text, I thought of
>>>>>> creating some logic that would detect this kind of coreference:
>>>>>> "Apple reaches new profit heights. The software company just
>>>>>> announced its 2013 earnings."
>>>>>> Here "The software company" obviously refers to "Apple".
>>>>>> So I'd like to detect coreferences of Named Entities which are of
>>>>>> the rdf:type of the Named Entity, in this case "company", and which
>>>>>> also have attributes that can be found in the dbpedia categories
>>>>>> of the named entity, in this case "software".
>>>>>>
>>>>>> The detection of coreferences such as "The software company" in the
>>>>>> text would also be done either by using the new POS Tag Based
>>>>>> Phrase extraction Engine (noun phrases) or by using a dependency
>>>>>> tree of the sentence and picking up only subjects or objects.
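The dependency-tree option mentioned above could look roughly like the following Python sketch. The parse is written by hand for the example sentence and is an assumption, not the output of any parser; the relation labels (nsubj, dobj, det, nn, poss, num) follow the Stanford typed dependencies naming, and the function names are hypothetical.

```python
# Hand-written parse of "The software company announced its 2013 earnings."
# Each token is (index, word, head index, relation); head index 0 marks the root.
TOKENS = [
    (1, "The", 3, "det"),
    (2, "software", 3, "nn"),
    (3, "company", 4, "nsubj"),
    (4, "announced", 0, "root"),
    (5, "its", 7, "poss"),
    (6, "2013", 7, "num"),
    (7, "earnings", 4, "dobj"),
]


def subtree(tokens, head_index):
    """Indices of head_index and every token depending on it, directly or indirectly."""
    indices = {head_index}
    changed = True
    while changed:
        changed = False
        for index, _, head, _ in tokens:
            if head in indices and index not in indices:
                indices.add(index)
                changed = True
    return indices


def subject_object_phrases(tokens, relations=("nsubj", "dobj")):
    """Collect the full phrase under every token attached as a subject or object."""
    phrases = []
    for index, _, _, relation in tokens:
        if relation in relations:
            span = subtree(tokens, index)
            words = [word for i, word, _, _ in tokens if i in span]
            phrases.append((relation, " ".join(words)))
    return phrases


for relation, phrase in subject_object_phrases(TOKENS):
    print(relation, "->", phrase)
```

Only the subject phrase "The software company" and the object phrase "its 2013 earnings" are returned here; the phrases would then still need the determiner and noun checks before being treated as coreference candidates.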
>>>>>>
>>>>>> At this point I'd like to know if this kind of logic would be
>>>>>> useful as a separate Enhancement Engine (in case the precision and
>>>>>> recall are good enough) in Stanbol?
>>>>>>
>>>>>> Thanks,
>>>>>> Cristian

--
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen