Hi Pablo,
Thinking more deeply about this... if LingPipeSpotter uses exact
dictionary-based chunking, how is it possible that a stopword was spotted
just because it is part of a surface form? As far as I know, LingPipe's
dictionary implementation is based on exact matching.
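To illustrate the question, here is a minimal sketch of exact, token-based dictionary matching. It is not LingPipe's API; the function names and the whitespace tokenizer are assumptions made for the example. Under exact matching, a stopword such as "de" is only spotted on its own if "de" is itself an entry in the dictionary, not merely because it occurs inside a longer surface form.

```python
# Hypothetical sketch of exact dictionary matching (NOT LingPipe's API).

def tokenize(text):
    # Naive whitespace tokenizer (assumption: real spotters tokenize more carefully).
    return text.split()

def exact_dictionary_spots(text, surface_forms):
    """Return (start_token, end_token, phrase) for every contiguous token span
    that exactly equals a dictionary entry. Overlapping matches are all kept."""
    tokens = tokenize(text)
    entries = {tuple(sf.split()) for sf in surface_forms}
    max_len = max((len(e) for e in entries), default=0)
    spots = []
    for start in range(len(tokens)):
        for length in range(1, max_len + 1):
            span = tuple(tokens[start:start + length])
            # Only a full, exact dictionary entry counts as a match.
            if len(span) == length and span in entries:
                spots.append((start, start + length, " ".join(span)))
    return spots
```

With the dictionary {"Universidad de Sevilla"}, the text "Estudió en la Universidad de Sevilla" yields only the three-token spot; "de" appears as a separate spot only once "de" is added to the dictionary as its own entry.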
Regards
On 15/12/12 17:39, Pablo N. Mendes wrote:
Hi Rafa,
The part that is perhaps confusing here is that the stopword list is
used in multiple places. The SpanishAnalyzer removes stopwords from the
context index (used in disambiguation). What you report is that you
see stopwords being spotted, which points to a problem either with your
spotter dictionary (and the class that created it) or with the spotter
implementation.
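A small sketch of why the two uses are independent. Everything here is hypothetical (the function names and the tiny stopword sample are assumptions, not DBpedia Spotlight code): stopwords removed by the context analyzer never reach the context index, but that has no effect on the spotter dictionary, which keeps stopword surface forms unless the class that builds it also filters them.

```python
# Hypothetical illustration, not DBpedia Spotlight code.

SPANISH_STOPWORDS = {"de", "la", "el", "en", "y"}  # tiny sample list (assumption)

def analyze_context(text, stopwords):
    # Roughly what a stopword-removing analyzer does for the context index:
    # lowercase, tokenize on whitespace, drop stopwords.
    return [t for t in text.lower().split() if t not in stopwords]

def build_spotter_dictionary(surface_forms, stopwords=None):
    # If the dictionary builder never receives the stopword list
    # (stopwords=None), stopword surface forms survive into the spotter.
    if stopwords is None:
        return set(surface_forms)
    return {sf for sf in surface_forms if sf.lower() not in stopwords}
```

For example, `analyze_context("La casa de Bernarda Alba", SPANISH_STOPWORDS)` drops "la" and "de", yet `build_spotter_dictionary({"de", "Bernarda Alba"})` still contains "de".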
Try this:
1) Check whether your *indexing.es.properties* configuration points to
the right stopwords file for Spanish. If it doesn't, that's your problem.
If it does, check whether that file contains the undesired words you
see spotted.
2) Check whether surfaceForms.tsv contains these spurious stopwords. If
it does, then you need to double-check what's happening in
IndexLingPipeSpotter. Create a small surfaceForms.tsv and
stopwords.txt and step through the code.
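The checks in steps 1 and 2 can be scripted roughly as follows. This is a sketch under assumptions: stopwords.txt is taken to hold one word per line, and surfaceForms.tsv is assumed to be tab-separated with the surface form in the first column. The helper names are hypothetical. It lists surface forms that are stopwords or consist only of punctuation (the dots and quotes mentioned in the original report).

```python
# Hypothetical helpers; file layouts are assumptions, not documented formats.

def load_stopwords(lines):
    # One stopword per line; blank lines ignored; compared case-insensitively.
    return {line.strip().lower() for line in lines if line.strip()}

def suspicious_surface_forms(tsv_lines, stopwords):
    """Return surface forms (assumed first TSV column) that are stopwords
    or contain no letters or digits at all."""
    suspicious = []
    for line in tsv_lines:
        # Split manually instead of using csv, so a lone '"' is not
        # treated as a quoting character.
        sf = line.rstrip("\n").split("\t")[0].strip()
        if not sf:
            continue
        punctuation_only = not any(ch.isalnum() for ch in sf)
        if sf.lower() in stopwords or punctuation_only:
            suspicious.append(sf)
    return suspicious

def check_files(stopwords_path, surface_forms_path):
    # Convenience wrapper for the real files (paths supplied by the caller).
    with open(stopwords_path, encoding="utf-8") as f:
        stopwords = load_stopwords(f)
    with open(surface_forms_path, encoding="utf-8") as f:
        return suspicious_surface_forms(f, stopwords)
```

Calling `check_files("stopwords.txt", "surfaceForms.tsv")` on the small test files gives a quick yes/no for step 2 before stepping through IndexLingPipeSpotter.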
Which spotter are you using? I am assuming it is LingPipeSpotter.
Cheers
pablo
On Dec 15, 2012 12:13 AM, "Rafa Haro" <[email protected]> wrote:
Hi all,
I'm not sure whether this is a bug, a problem with my local installation,
or an issue in the project. Testing our local installation in Spanish, we
are having problems with the list of stopwords. I'm almost sure that the
list is being used properly during indexing with Lucene's
SpanishAnalyzer. But then, when we annotate a Spanish text, some
stopwords are spotted as surface forms and finally linked to a
candidate.
The same sometimes happens with punctuation marks (dots,
quotes...).
Actually, I don't know whether the system applies a stopword removal
process to the input text, but I assumed it should, to prevent
this behaviour. Am I right?
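For reference, a post-spotting filter of the kind asked about could look like the sketch below. Whether DBpedia Spotlight applies anything like it is not established here; the function name and the (surface_text, offset) spot representation are hypothetical. It drops spots that are stopwords or contain no letters or digits.

```python
# Hypothetical post-filter over spotted candidates (not Spotlight code).

def filter_spots(spots, stopwords):
    """Keep only spots whose text is not a stopword and contains at least
    one letter or digit. spots: list of (surface_text, offset) tuples."""
    kept = []
    for text, offset in spots:
        normalized = text.strip().lower()
        if not normalized or normalized in stopwords:
            continue  # empty or stopword spot
        if not any(ch.isalnum() for ch in normalized):
            continue  # punctuation-only spot, e.g. "..." or a quote
        kept.append((text, offset))
    return kept
```

Filtering after spotting treats the symptom; the checks Pablo suggests address why such entries ended up in the spotter dictionary in the first place.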
Thanks. Regards
This message should be regarded as confidential. If you have
received this email in error please notify the sender and destroy
it immediately. Statements of intent shall only become binding
when confirmed in hard copy by an authorised signatory.
Zaizi Ltd is registered in England and Wales with the registration
number 6440931. The Registered Office is 222 Westbourne Studios,
242 Acklam Road, London W10 5JJ, UK.
------------------------------------------------------------------------------
LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial
Remotely access PCs and mobile devices and provide instant support
Improve your efficiency, and focus on delivering more value-add
services
Discover what IT Professionals Know. Rescue delivers
http://p.sf.net/sfu/logmein_12329d2d
_______________________________________________
Dbp-spotlight-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbp-spotlight-users