Hi Cristian,

NER annotations are typically available both as NlpAnnotations.NER_ANNOTATION on the AnalysedText and as fise:TextAnnotation [1] in the enhancement metadata. As you are already accessing the AnalysedText, I would prefer using NlpAnnotations.NER_ANNOTATION.
best
Rupert

[1] http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fisetextannotation

On Mon, Mar 10, 2014 at 10:07 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:

Thanks.
I assume I should get the named entities the same way, but using NlpAnnotations.NER_ANNOTATION?

2014-03-10 13:29 GMT+02:00 Rupert Westenthaler <rupert.westentha...@gmail.com>:

Hello Cristian,

Noun phrases are not added to the RDF enhancement results. You need to use the AnalysedText ContentPart [1].

Here is some demo code you can use in the computeEnhancement method:

    // get the AnalysedText content part of the ContentItem
    AnalysedText at = NlpEngineHelper.getAnalysedText(this, ci, true);
    Iterator<? extends Section> sections = at.getSentences();
    if (!sections.hasNext()) { // no sentences detected: process the whole text as a single sentence
        sections = Collections.singleton(at).iterator();
    }
    while (sections.hasNext()) {
        Section section = sections.next();
        Iterator<Span> chunks = section.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
        while (chunks.hasNext()) {
            Span chunk = chunks.next();
            Value<PhraseTag> phrase = chunk.getAnnotation(NlpAnnotations.PHRASE_ANNOTATION);
            // chunks without a phrase annotation would otherwise cause a NullPointerException
            if (phrase != null && phrase.value().getCategory() == LexicalCategory.Noun) {
                log.info(" - NounPhrase [{},{}] {}", new Object[]{
                        chunk.getStart(), chunk.getEnd(), chunk.getSpan()});
            }
        }
    }

Hope this helps

best
Rupert

[1] http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext

On Sun, Mar 9, 2014 at 6:07 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:

I started to implement the engine and I'm having problems getting results for noun phrases. I modified the "default" weighted chain to also include the PosChunkerEngine and ran a sample text: "Angela Merkel visited China. The German chancellor met with various people". I expected the RDF/XML output to contain some information about the noun phrases, but I cannot see any.
Could you point me to the correct way to generate the noun phrases?

Thanks,
Cristian

2014-02-09 14:15 GMT+02:00 Cristian Petroaca <cristian.petro...@gmail.com>:

Opened https://issues.apache.org/jira/browse/STANBOL-1279

2014-02-07 10:53 GMT+02:00 Cristian Petroaca <cristian.petro...@gmail.com>:

Hi Rupert,

The "spatial" dimension is a good idea. I'll also take a look at Yago.

I will create a Jira issue with what we talked about here. It will probably have just a draft-like description for now and will be updated as I go along.

Thanks,
Cristian

2014-02-06 15:39 GMT+02:00 Rupert Westenthaler <rupert.westentha...@gmail.com>:

Hi Cristian,

Definitely an interesting approach. You should have a look at Yago2 [1]. As far as I can remember, the Yago taxonomy is much better structured than the one used by dbpedia. Mapping dbpedia suggestions to concepts in Yago2 is easy, as both dbpedia and Yago2 provide mappings [2] and [3].

> 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
>> "Microsoft posted its 2013 earnings. The Redmond's company made a huge profit".

That's actually a very good example. Spatial contexts are very important, as they are often used for referencing. So I would suggest treating the spatial context specially. For spatial entities (like a city) this is easy, but even for other entities (like a person or a company) you could use their relations to spatial entities to define their spatial context. This context could then be used to correctly link "The Redmond's company" to "Microsoft".
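Rupert's idea of linking through an entity's spatial context can be illustrated with a minimal Java sketch. It assumes the relations of each candidate entity to spatial entities (cities, regions, countries) have already been collected from dbpedia into a map; the class, the method names and the example data are hypothetical and not part of Stanbol. Tokens are compared after lower-casing and stripping an English possessive, so "Redmond's" matches "Redmond"; multi-word places such as "United States" would need phrase matching, which this sketch omits.

```java
import java.util.*;

/**
 * Hypothetical sketch: link a noun phrase to a named entity found earlier in
 * the text through the entity's spatial context (related cities, regions, countries).
 */
public class SpatialContextLinker {

    /** Lower-cases a token and strips an English possessive ("Redmond's" -> "redmond"). */
    static String normalize(String token) {
        String t = token.toLowerCase(Locale.ROOT);
        if (t.endsWith("'s")) {
            t = t.substring(0, t.length() - 2);
        }
        return t;
    }

    /**
     * Returns the first entity (in the map's iteration order) whose spatial
     * context is mentioned by one of the noun phrase tokens, or empty if none is.
     * Only single-token place names can match in this sketch.
     */
    static Optional<String> link(List<String> nounPhrase,
                                 Map<String, Set<String>> spatialContext) {
        Set<String> tokens = new HashSet<>();
        for (String token : nounPhrase) {
            tokens.add(normalize(token));
        }
        for (Map.Entry<String, Set<String>> entry : spatialContext.entrySet()) {
            for (String place : entry.getValue()) {
                if (tokens.contains(normalize(place))) {
                    return Optional.of(entry.getKey());
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Example data only: spatial relations assumed to be collected beforehand
        // for the entities found earlier in the text.
        Map<String, Set<String>> context = new LinkedHashMap<>();
        context.put("Microsoft", Set.of("Redmond", "Washington", "United States"));
        context.put("Apple", Set.of("Cupertino", "California", "United States"));

        List<String> phrase = List.of("The", "Redmond's", "company");
        System.out.println(link(phrase, context).orElse("no match")); // Microsoft
    }
}
```

Here the first matching entity wins; a fuller version would treat the spatial match as one signal among the others discussed in this thread rather than return on the first hit.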
In addition, I would suggest using the "spatial" context of each entity (basically its relations to entities that are cities, regions or countries) as a separate dimension, because those are very often used for coreferences.

[1] http://www.mpi-inf.mpg.de/yago-naga/yago/
[2] http://downloads.dbpedia.org/3.9/links/yago_links.nt.bz2
[3] http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoDBpediaInstances.ttl.7z

On Thu, Feb 6, 2014 at 10:33 AM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:

There are several dbpedia categories for each entity; in this case, for Microsoft we have:

category:Companies_in_the_NASDAQ-100_Index
category:Microsoft
category:Software_companies_of_the_United_States
category:Software_companies_based_in_Washington_(state)
category:Companies_established_in_1975
category:1975_establishments_in_the_United_States
category:Companies_based_in_Redmond,_Washington
category:Multinational_companies_headquartered_in_the_United_States
category:Cloud_computing_providers
category:Companies_in_the_Dow_Jones_Industrial_Average

So we also have "Companies based in Redmond, Washington", which could be matched.

There is still other contextual information from dbpedia which can be used.
For example, for an Organization we could also include:

dbpprop:industry = Software
dbpprop:service = Online Service Providers

and for a Person (this is for Barack Obama):

dbpedia-owl:profession:
    dbpedia:Author
    dbpedia:Constitutional_law
    dbpedia:Lawyer
    dbpedia:Community_organizing

I'd like to continue investigating this, as I think it may have some value in increasing the number of resolved coreferences. I'd like to concentrate on precision rather than recall, since we already have a set of coreferences detected by the Stanford NLP tool and this would be an addition to that (at least this is how I would like to use it).

Is it OK if I track this by opening a Jira issue? I could update it to show my progress and my conclusions, and if it turns out to be a bad idea, at least I'll end up knowing more about Stanbol :).

2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:

Hi Cristian,

The approach sounds nice. I don't want to be the devil's advocate, but I'm just not sure about the recall of the dbpedia categories feature. For example, your sentence could also be "Microsoft posted its 2013 earnings. The Redmond's company made a huge profit". So, including more contextual information from dbpedia could maybe increase the recall, but of course it will reduce the precision.
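One way to keep Rafa's precision/recall trade-off measurable is to treat each kind of dbpedia context (categories, and property values such as dbpprop:industry or dbpprop:service) as a separate dimension that can be switched on or off. The following minimal Java sketch assumes those values have already been fetched per entity; the class and method names are hypothetical, and no lemmatization is applied, so "companies" would not match "company" here.

```java
import java.util.*;

/**
 * Hypothetical sketch: check a noun phrase modifier (e.g. "software") against
 * several context dimensions of an entity and report which dimensions matched.
 */
public class ContextDimensions {

    /** Splits a label such as "Software_companies_of_the_United_States" into lower-case words. */
    static Set<String> words(String label) {
        Set<String> out = new HashSet<>();
        for (String w : label.toLowerCase(Locale.ROOT).split("[_\\s,()]+")) {
            if (!w.isEmpty()) {
                out.add(w);
            }
        }
        return out;
    }

    /** Returns the names of the enabled dimensions in which the modifier occurs as a word. */
    static List<String> matchingDimensions(String modifier,
                                           Map<String, List<String>> dimensions,
                                           Set<String> enabled) {
        String m = modifier.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, List<String>> dim : dimensions.entrySet()) {
            if (!enabled.contains(dim.getKey())) {
                continue; // dimension switched off, e.g. to measure its effect on precision
            }
            for (String value : dim.getValue()) {
                if (words(value).contains(m)) {
                    hits.add(dim.getKey());
                    break; // one hit per dimension is enough
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Example values taken from the Microsoft data quoted in this thread.
        Map<String, List<String>> microsoft = new LinkedHashMap<>();
        microsoft.put("category", List.of(
                "Software_companies_of_the_United_States",
                "Companies_based_in_Redmond,_Washington"));
        microsoft.put("industry", List.of("Software"));
        microsoft.put("service", List.of("Online Service Providers"));

        Set<String> all = Set.of("category", "industry", "service");
        System.out.println(matchingDimensions("software", microsoft, all));              // [category, industry]
        System.out.println(matchingDimensions("online", microsoft, all));                // [service]
        System.out.println(matchingDimensions("online", microsoft, Set.of("category"))); // []
    }
}
```

Running an evaluation set once with only the category dimension enabled and once with all dimensions would show how much recall the extra properties add and how many false positives they bring in.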
Cheers,
Rafa

On 04/02/14 09:50, Cristian Petroaca wrote:

Back with a more detailed description of the steps for making this kind of coreference work.

I will refer to the following text in the steps below to make things clearer: "Microsoft posted its 2013 earnings. The software company made a huge profit."

1. Consider every noun phrase in the text which:
   a. has a determiner implying a reference to an entity local to the text, such as "the", "this" or "these", but not "another", "every", etc., which imply a reference to an entity outside the text;
   b. has at least one other noun, aside from the required main noun, which further describes it. For example, I will not count "The company" as a legitimate candidate, since it could create a lot of false positives because of the double meaning of some words, as in "in the company of good people". "The software company" is a good candidate, since we also have "software".

2. Match the nouns in the noun phrase against the contents of the dbpedia categories of each named entity found before the noun phrase in the text.

   The dbpedia categories have the following format (for Microsoft, for example): "Software companies of the United States". So we try to match "software company" with that.

   First, as you can see, the main noun in the dbpedia category is in plural form, and this is the same for all the categories I saw.
   I don't know if there's an easier way to do this, but I thought of applying a lemmatizer to both the category and the noun phrase so that they share a common denominator. This also works if the noun phrase itself is in plural form.

   Second, I need to compare only the words in the category which are themselves nouns, not prepositions or determiners such as "of the". This means that I need to POS-tag the category contents as well. I was thinking of running the POS tagger and the lemmatizer on the dbpedia categories when building the dbpedia-backed Entityhub and storing the results for later use; I don't know how feasible this is at the moment.

   After this I can compare each noun in the noun phrase with the equivalent nouns in the categories and, based on the number of matches, compute a confidence level.

3. Match the main noun of the noun phrase with the dbpedia rdf:type of the named entity. If it matches, increase the confidence level.

4. If multiple named entities can match a given noun phrase, link the noun phrase to the closest named entity preceding it in the text.

What do you think?

Cristian

2014-01-31 Cristian Petroaca <cristian.petro...@gmail.com>:

Hi Rafa,

I don't yet have a concrete heuristic, but I'm working on it. I'll post it here so that you can give me feedback on it.

What are "locality" features?
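Steps 1 and 2 of Cristian's 2014-02-04 proposal above can be sketched in Java as follows. This is a minimal sketch under several assumptions: tokens arrive already POS-tagged with Penn Treebank tags, the set of "local" determiners is illustrative, and the plural-stripping lemma() method is a naive stand-in for a real lemmatizer. All names are hypothetical.

```java
import java.util.*;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch of steps 1 and 2: select candidate noun phrases and score
 * them against one dbpedia category of a named entity. Tokens carry Penn
 * Treebank POS tags assumed to come from an upstream POS tagger.
 */
public class CategoryMatcher {

    record Token(String text, String pos) {
        boolean isNoun() {
            return pos.startsWith("NN"); // NN, NNS, NNP, NNPS
        }
    }

    /** Determiners taken to refer to an entity mentioned earlier in the text. */
    static final Set<String> LOCAL_DETERMINERS = Set.of("the", "this", "these", "that", "those");

    /** Step 1: a local determiner plus at least two nouns (the main noun and a modifier). */
    static boolean isCandidate(List<Token> phrase) {
        if (phrase.isEmpty()) {
            return false;
        }
        Token first = phrase.get(0);
        boolean localDeterminer = first.pos().equals("DT")
                && LOCAL_DETERMINERS.contains(first.text().toLowerCase(Locale.ROOT));
        long nouns = phrase.stream().filter(Token::isNoun).count();
        return localDeterminer && nouns >= 2;
    }

    /** Naive stand-in for a real lemmatizer: strips common English plural endings. */
    static String lemma(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.endsWith("ies") && w.length() > 4) {
            return w.substring(0, w.length() - 3) + "y"; // companies -> company
        }
        if (w.endsWith("s") && !w.endsWith("ss") && w.length() > 3) {
            return w.substring(0, w.length() - 1); // states -> state
        }
        return w;
    }

    /** Step 2: fraction of the phrase's nouns whose lemma occurs among the category's nouns. */
    static double confidence(List<Token> phrase, List<Token> category) {
        Set<String> categoryNouns = category.stream()
                .filter(Token::isNoun) // skips "of", "the", ...
                .map(t -> lemma(t.text()))
                .collect(Collectors.toSet());
        List<Token> phraseNouns = phrase.stream().filter(Token::isNoun).collect(Collectors.toList());
        if (phraseNouns.isEmpty()) {
            return 0.0;
        }
        long matched = phraseNouns.stream()
                .filter(t -> categoryNouns.contains(lemma(t.text())))
                .count();
        return (double) matched / phraseNouns.size();
    }

    public static void main(String[] args) {
        List<Token> softwareCompany = List.of(
                new Token("The", "DT"), new Token("software", "NN"), new Token("company", "NN"));
        List<Token> company = List.of(new Token("The", "DT"), new Token("company", "NN"));
        // "Software companies of the United States", tagged by hand for this example
        List<Token> category = List.of(
                new Token("Software", "NNP"), new Token("companies", "NNS"), new Token("of", "IN"),
                new Token("the", "DT"), new Token("United", "NNP"), new Token("States", "NNPS"));

        System.out.println(isCandidate(softwareCompany));          // true
        System.out.println(isCandidate(company));                  // false
        System.out.println(confidence(softwareCompany, category)); // 1.0
    }
}
```

The confidence is simply the fraction of the phrase's nouns found in one category; how to aggregate it over all categories of an entity (maximum, sum, ...) is left open, as in the proposal.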
I looked at BART and other coreference tools such as ArkRef and CherryPicker, and they don't provide this kind of coreference.

Cristian

2014-01-30 Rafa Haro <rh...@apache.org>:

Hi Cristian,

Without more details about your concrete heuristic, in my honest opinion such an approach could produce a lot of false positives. I don't know if you are planning to use some "locality" features to detect such coreferences, but you need to take into account that it is quite common for coreferent mentions to occur even in different paragraphs. Although I'm not an expert in Natural Language Understanding, I would say it is quite difficult to get decent precision/recall rates for coreference resolution using fixed rules. Maybe you can give other tools like BART (http://www.bart-coref.org/) a try.

Cheers,
Rafa Haro

On 30/01/14 10:33, Cristian Petroaca wrote:

Hi,

One of the necessary steps for implementing the Event extraction Engine feature (https://issues.apache.org/jira/browse/STANBOL-1121) is to have coreference resolution in the given text. This is currently provided via the stanford-nlp project, but as far as I saw, this module performs mostly pronominal (He, She) or nominal (Barack Obama and Mr. Obama) coreference resolution.
In order to get more coreferences from the text, I thought of creating some logic that would detect this kind of coreference:
"Apple reaches new profit heights. The software company just announced its 2013 earnings."
Here "The software company" obviously refers to "Apple".
So I'd like to detect noun phrases that corefer with a named entity when their main noun matches the rdf:type of the named entity (in this case "company") and they also have attributes that can be found in the dbpedia categories of the named entity (in this case "software").

Mentions such as "The software company" would be detected in the text either by using the new POS Tag Based Phrase Extraction Engine (noun phrases) or by using a dependency tree of the sentence and picking only subjects or objects.

At this point I'd like to know: would this kind of logic be useful in Stanbol as a separate Enhancement Engine (in case the precision and recall are good enough)?

Thanks,
Cristian

--
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
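Steps 3 and 4 of the 2014-02-04 proposal (matching the main noun against the entity's rdf:type, then choosing the closest preceding entity), which refine the rdf:type and dbpedia-category idea of the 2014-01-30 message, could look like the following minimal Java sketch. It assumes the step-2 category confidence has already been computed per entity, that rdf:types are available as lower-cased local names, and that entities carry character offsets. The boost and threshold values are illustrative, not tuned, and all names are hypothetical.

```java
import java.util.*;

/**
 * Hypothetical sketch of steps 3 and 4: boost the category confidence when the
 * noun phrase's main noun matches the entity's rdf:type, then link the phrase
 * to the closest qualifying named entity that precedes it.
 */
public class CorefResolver {

    /**
     * A named entity found earlier in the text: 'end' is its end offset, 'types'
     * holds lower-cased rdf:type local names (e.g. "company"), and
     * 'categoryConfidence' is the step-2 score for the noun phrase at hand.
     */
    record Entity(String name, int end, Set<String> types, double categoryConfidence) {}

    static final double TYPE_BOOST = 0.25; // illustrative value, not tuned
    static final double THRESHOLD = 0.5;   // illustrative value, not tuned

    /** Step 3: add a fixed boost when the main noun matches one of the entity's types. */
    static double score(Entity e, String mainNoun) {
        double s = e.categoryConfidence();
        if (e.types().contains(mainNoun.toLowerCase(Locale.ROOT))) {
            s += TYPE_BOOST;
        }
        return Math.min(1.0, s);
    }

    /** Step 4: among qualifying entities ending before the phrase, pick the closest one. */
    static Optional<Entity> resolve(List<Entity> entities, int phraseStart, String mainNoun) {
        Entity best = null;
        for (Entity e : entities) {
            if (e.end() > phraseStart || score(e, mainNoun) < THRESHOLD) {
                continue; // not before the phrase, or not similar enough
            }
            if (best == null || e.end() > best.end()) {
                best = e;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        String text = "Apple reaches new profit heights. Microsoft posted its 2013 earnings. "
                + "The software company made a huge profit.";
        // Example inputs only: types and category confidences are made up for illustration.
        List<Entity> entities = List.of(
                new Entity("Apple", text.indexOf("Apple") + "Apple".length(), Set.of("company"), 1.0),
                new Entity("Microsoft", text.indexOf("Microsoft") + "Microsoft".length(), Set.of("company"), 1.0));
        int phraseStart = text.indexOf("The software company");

        System.out.println(resolve(entities, phraseStart, "company")
                .map(Entity::name).orElse("none")); // Microsoft
    }
}
```

Both Apple and Microsoft qualify in this example; Microsoft is chosen because it ends closer to the noun phrase.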