Re: Stanbol NER questions

Rajan Shah Tue, 09 Jun 2015 03:48:13 -0700

Hi Rupert,

Thanks a lot for the detailed answer.


Is there any plan from Cristian to deliver something soon? Or is it even on
the Stanbol roadmap for the coming quarter? How can someone vote for the
feature request?

In the meantime, suppose I want to develop my own custom enhancer that
captures a very small subset of the feature request: two entities
associated by a simple relationship.

For example:
Apple buys Metaio

What is the best way to approach this in the current framework? Could you
provide a snippet or a reference?
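
A minimal sketch of the simplest case, assuming the two entity mentions (with
character offsets) have already been found by the existing NER/linking engines,
and that only the text between two adjacent mentions is checked against a small
hand-written list of trigger verbs. `Mention`, `RELATION_VERBS` and
`extract_relations` are hypothetical names for illustration; this is plain
Python, not the Stanbol EnhancementEngine API.

```python
from dataclasses import dataclass


# Hypothetical stand-in for an entity mention; in Stanbol this information
# would come from the TextAnnotations produced by earlier engines.
@dataclass(frozen=True)
class Mention:
    text: str
    start: int
    end: int


# Hypothetical, hand-picked trigger words mapped to a relation label.
RELATION_VERBS = {"buys": "acquires", "bought": "acquires", "acquires": "acquires"}


def extract_relations(text, mentions):
    """Return (subject, relation, object) tuples for 'ENTITY verb ENTITY' patterns."""
    ordered = sorted(mentions, key=lambda m: m.start)
    relations = []
    # Only look at pairs of mentions that are adjacent in the text.
    for left, right in zip(ordered, ordered[1:]):
        between = text[left.end:right.start].strip().lower()
        if between in RELATION_VERBS:
            relations.append((left.text, RELATION_VERBS[between], right.text))
    return relations


text = "Apple buys Metaio"
mentions = [Mention("Apple", 0, 5), Mention("Metaio", 11, 17)]
print(extract_relations(text, mentions))  # [('Apple', 'acquires', 'Metaio')]
```

A real engine would additionally write the extracted relation back into the
ContentItem's enhancement metadata; how to model that is exactly the open
question above.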

With best regards,
Rajan

On Mon, Jun 8, 2015 at 3:46 AM, Rupert Westenthaler <
rupert.westentha...@gmail.com> wrote:

> Hi Rajan,
>
> Regarding dereferencing:
>
>
> For small and medium-sized datasets, using the SolrYard for both is the
> way to go. For big datasets (e.g. DBpedia) you can still use the
> SolrYard, but the SolrCore will be much bigger than the
> TripleStore. This is because Solr stores documents (stored
> fields), while the TripleStore stores a graph. So if your dataset
> contains 200k entities of type dbpedia:Person, the SolrYard stores the
> URI "dbpedia:Person" 200k times, whereas the TripleStore stores it
> only once. So even though Solr compresses stored fields (by default),
> it is still less storage-efficient if your dataset contains a lot of
> URI values. If your dataset mainly uses literal values, this does not
> apply.
>
> On the other hand: Solr is amazingly fast for dereferencing ^^
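
A toy count illustrating the storage argument above, assuming a document store
keeps one copy of each value per entity document, while a triple store keeps
each distinct value once and references it by ID (dictionary encoding). The
entity URIs and both functions are hypothetical; real Solr compression and
real triple-store layouts are not modelled.

```python
def document_store_copies(entities):
    """Copies of values stored when every entity document keeps its own values."""
    return sum(len(values) for values in entities.values())


def triple_store_copies(entities):
    """Copies stored when each distinct value is kept once and referenced by ID."""
    distinct = set()
    for values in entities.values():
        distinct.update(values)
    return len(distinct)


# 200k hypothetical person entities, all typed with the same class URI.
entities = {f"ex:person{i}": ["dbpedia:Person"] for i in range(200_000)}
print(document_store_copies(entities))  # 200000
print(triple_store_copies(entities))    # 1
```

With mostly distinct literal values (e.g. one name per person), the two counts
would be nearly equal, which is why the argument does not apply to
literal-heavy datasets.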
>
>
> Regarding entity co-mention:
>
> >> >
> >> > *3. Entity co-mention:*
> >> >
> >> > From the documentation, it's not crystal clear how this engine
> >> > works. Is it possible to provide a quick concrete example in a
> >> > couple of lines?
> >> >
> >> > Does it require the two entities to live in the same Solr index or
> >> > namespace?
> >>
> >> IMO the example
> >>
> >>     ... Barack Obama gave a talk to members of the Labor Union ...
> >>     Obama specially mentioned ...
> >>
> >> describes it well. Because "Barack Obama" is mentioned earlier,
> >> "Obama" is treated as a co-mention. The engine builds an index over
> >> the mentions of previous fise:TextAnnotations. It only works on data
> >> already present in the ContentItem, so it does not require the CV to
> >> be in any specific storage (e.g. the Entityhub).
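
A simplified sketch of the idea described above, assuming mentions are given as
surface strings in document order and that a later mention counts as a
co-mention when its tokens are a proper subset of an earlier mention's tokens.
This matching rule and the name `find_co_mentions` are assumptions for
illustration, not the actual Stanbol co-mention engine's logic. Note that, like
the engine, it only uses mentions already found in the same text, with no
external index.

```python
def find_co_mentions(mentions):
    """Return (later, earlier) pairs where the later mention is a co-mention.

    `mentions` is a list of surface strings in document order.
    """
    seen = []    # earlier mentions as (text, token set)
    links = []
    for text in mentions:
        tokens = set(text.lower().split())
        for earlier, earlier_tokens in seen:
            # A shorter mention whose tokens all occur in an earlier mention.
            if tokens < earlier_tokens:
                links.append((text, earlier))
                break
        seen.append((text, tokens))
    return links


print(find_co_mentions(["Barack Obama", "Labor Union", "Obama"]))
# [('Obama', 'Barack Obama')]
```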
> >>
> >>
> > Is there any plan to extend it to capture relations such as:
> > "Researcher1" and "Researcher2" are two different entities, and they
> > are mentioned in a research paper they published together?
>
> This is more about putting three entities (researcher1, researcher2,
> the research paper) in context with each other. Cristian Petroaca is
> doing some work on this, but there is nothing ready to be used ATM.
> You might be interested in STANBOL-1121 and maybe
> http://markmail.org/message/3fqdprc7nsjgaz3t for more background
> information.
>
> best
> Rupert
>
>
> On Tue, Jun 2, 2015 at 6:06 PM, Rajan Shah <raja...@gmail.com> wrote:
> > Cool. Thanks a lot for the quick reply.
> >
> > Yes, it works very well.
> >
> > With best regards,
> > Rajan
> >
> > On Tue, Jun 2, 2015 at 10:57 AM, aj...@virginia.edu <aj...@virginia.edu>
> > wrote:
> >
> >> On Jun 2, 2015, at 10:54 AM, Rajan Shah <raja...@gmail.com> wrote:
> >>
> >> > In this case, is it fair to assume that one needs to have both of
> >> > these yards?
> >> >
> >> > a. Solr yard for fast search
> >> > b. Clerezza yard for dereferencing
> >> >
> >> > Is this the optimal way to use Stanbol NER and leverage its full
> >> > potential?
> >>
> >> If your entity definitions are relatively simple (no bnodes, no
> >> "internal structure", just predicates with simple values), you can
> >> dereference them perfectly well from a SolrYard.
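
A minimal sketch of what "simple" means here, assuming triples are plain
(subject, predicate, object) string tuples and blank nodes are written as
`_:label`: if no object of the entity is a blank node, its description
flattens into one predicate-to-values map, roughly the shape a document with
one field per predicate can hold. `to_flat_document` and this layout are
hypothetical, not the SolrYard's actual index schema.

```python
from collections import defaultdict


def is_blank_node(term):
    # Convention assumed for this sketch: blank nodes are written as "_:label".
    return term.startswith("_:")


def to_flat_document(subject, triples):
    """Flatten one entity's triples into {predicate: [values]}.

    Raises ValueError if an object is a blank node, i.e. the entity has
    nested "internal structure" that a flat document cannot represent.
    """
    doc = defaultdict(list)
    for s, p, o in triples:
        if s != subject:
            continue
        if is_blank_node(o):
            raise ValueError(f"{subject} has nested structure via {p} -> {o}")
        doc[p].append(o)
    return dict(doc)


triples = [
    ("ex:Vienna", "rdfs:label", "Vienna"),
    ("ex:Vienna", "rdf:type", "dbpedia:City"),
]
print(to_flat_document("ex:Vienna", triples))
# {'rdfs:label': ['Vienna'], 'rdf:type': ['dbpedia:City']}
```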
> >>
> >>
> >> ---
> >> A. Soroka
> >> The University of Virginia Library
> >>
> >>
> >>
>
>
>
> --
> | Rupert Westenthaler             rupert.westentha...@gmail.com
> | Bodenlehenstraße 11                              ++43-699-11108907
> | A-5500 Bischofshofen
> | REDLINK.CO
> ..........................................................................
> | http://redlink.co/
>
