[jira] Updated: (UIMA-1639) Fixed bugs which disabled compiled dicts, static dict attributes

2009-10-28 Thread Michael Tanenblatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/UIMA-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Tanenblatt updated UIMA-1639:
-

Attachment: EntryPropertiesRoot.java

org/apache/uima/conceptMapper/support/dictionaryResource/EntryPropertiesRoot.java

for repository

> Fixed bugs which disabled compiled dicts, static dict attributes
> 
>
> Key: UIMA-1639
> URL: https://issues.apache.org/jira/browse/UIMA-1639
> Project: UIMA
>  Issue Type: Bug
>  Components: Sandbox-ConceptMapper
>Reporter: Michael Tanenblatt
> Attachments: CM-patch-20091027.txt, EntryPropertiesRoot.java
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (UIMA-1639) Fixed bugs which disabled compiled dicts, static dict attributes

2009-10-27 Thread Michael Tanenblatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/UIMA-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Tanenblatt updated UIMA-1639:
-

Attachment: CM-patch-20091027.txt

> Fixed bugs which disabled compiled dicts, static dict attributes
> 
>
> Key: UIMA-1639
> URL: https://issues.apache.org/jira/browse/UIMA-1639
> Project: UIMA
>  Issue Type: Bug
>  Components: Sandbox-ConceptMapper
>Reporter: Michael Tanenblatt
> Attachments: CM-patch-20091027.txt
>
>





[jira] Created: (UIMA-1639) Fixed bugs which disabled compiled dicts, static dict attributes

2009-10-27 Thread Michael Tanenblatt (JIRA)
Fixed bugs which disabled compiled dicts, static dict attributes


 Key: UIMA-1639
 URL: https://issues.apache.org/jira/browse/UIMA-1639
 Project: UIMA
  Issue Type: Bug
  Components: Sandbox-ConceptMapper
Reporter: Michael Tanenblatt







[jira] Updated: (UIMA-1605) Fixed Findbugs issues

2009-10-09 Thread Michael Tanenblatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/UIMA-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Tanenblatt updated UIMA-1605:
-

Attachment: patch20091009-corrected.txt

Corrected patch file (?)

> Fixed Findbugs issues
> -
>
> Key: UIMA-1605
> URL: https://issues.apache.org/jira/browse/UIMA-1605
> Project: UIMA
>  Issue Type: Bug
>  Components: Sandbox-ConceptMapper
>Reporter: Michael Tanenblatt
>Assignee: Marshall Schor
>Priority: Minor
> Attachments: patch20091009-corrected.txt, patch20091009.txt
>
>
> Corrected a few issues discovered by Findbugs.




[jira] Updated: (UIMA-1605) Fixed Findbugs issues

2009-10-09 Thread Michael Tanenblatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/UIMA-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Tanenblatt updated UIMA-1605:
-

Attachment: patch20091009.txt

> Fixed Findbugs issues
> -
>
> Key: UIMA-1605
> URL: https://issues.apache.org/jira/browse/UIMA-1605
> Project: UIMA
>  Issue Type: Bug
>  Components: Sandbox-ConceptMapper
>Reporter: Michael Tanenblatt
>Priority: Minor
> Attachments: patch20091009.txt
>
>
> Corrected a few issues discovered by Findbugs.




[jira] Created: (UIMA-1605) Fixed Findbugs issues

2009-10-09 Thread Michael Tanenblatt (JIRA)
Fixed Findbugs issues
-

 Key: UIMA-1605
 URL: https://issues.apache.org/jira/browse/UIMA-1605
 Project: UIMA
  Issue Type: Bug
  Components: Sandbox-ConceptMapper
Reporter: Michael Tanenblatt
Priority: Minor
 Attachments: patch20091009.txt

Corrected a few issues discovered by Findbugs.




[jira] Updated: (UIMA-1498) if an exception is rethrown, the original exception is not currently passed through

2009-08-17 Thread Michael Tanenblatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/UIMA-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Tanenblatt updated UIMA-1498:
-

Attachment: patch.txt

This patch resolves the issue

> if an exception is rethrown, the original exception is not currently passed 
> through
> ---
>
> Key: UIMA-1498
> URL: https://issues.apache.org/jira/browse/UIMA-1498
> Project: UIMA
>  Issue Type: Improvement
>  Components: Sandbox-ConceptMapper
>Reporter: Michael Tanenblatt
>Priority: Minor
> Attachments: patch.txt
>
>
> if an exception is rethrown, the original exception is not currently passed 
> through




[jira] Created: (UIMA-1498) if an exception is rethrown, the original exception is not currently passed through

2009-08-17 Thread Michael Tanenblatt (JIRA)
if an exception is rethrown, the original exception is not currently passed 
through
---

 Key: UIMA-1498
 URL: https://issues.apache.org/jira/browse/UIMA-1498
 Project: UIMA
  Issue Type: Improvement
  Components: Sandbox-ConceptMapper
Reporter: Michael Tanenblatt
Priority: Minor


if an exception is rethrown, the original exception is not currently passed 
through




[jira] Closed: (UIMA-1371) Performance improvement: remove reliance on Property class and excess String building to reduce in-memory dictionary size.

2009-08-17 Thread Michael Tanenblatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/UIMA-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Tanenblatt closed UIMA-1371.



> Performance improvement: remove reliance on Property class and excess String 
> building to reduce in-memory dictionary size.
> --
>
> Key: UIMA-1371
> URL: https://issues.apache.org/jira/browse/UIMA-1371
> Project: UIMA
>  Issue Type: New Feature
>  Components: Sandbox-ConceptMapper
> Environment: All
>Reporter: Michael Tanenblatt
>Assignee: Michael Tanenblatt
> Attachments: cm-patch20090605.txt
>
>
> Performance improvement: remove reliance on Property class and excess String 
> building to reduce in-memory dictionary size.




[jira] Created: (UIMA-1372) Improve description of ConceptMapper on UIMA sandbox components web page

2009-06-05 Thread Michael Tanenblatt (JIRA)
Improve description of ConceptMapper on UIMA sandbox components web page


 Key: UIMA-1372
 URL: https://issues.apache.org/jira/browse/UIMA-1372
 Project: UIMA
  Issue Type: Improvement
 Environment: all
Reporter: Michael Tanenblatt


Here is the proposed new wording:


ConceptMapper is a powerful, highly configurable, UIMA-based dictionary 
annotator. Numerous parameters can be used to specify various aspects of the 
lookup algorithm, input processing and output options. The dictionary structure 
is flexible, allowing any number of synonyms to be associated with an entry, and 
any number of attributes to be associated with entries or synonyms. Lookup and 
matching against dictionary entries can be performed against contiguous or 
non-contiguous blocks of text, and token-order-independent lookup is also 
allowed (for example, the tokens "A" "B" would be considered a match against 
dictionary entry "B" "A"). Additionally, ConceptMapper can be configured to use 
any tokenizer annotator, so that its dictionaries are tokenized identically to 
the input text.
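
The order-independent lookup mentioned above can be illustrated with a small
Python sketch. This is not ConceptMapper's implementation (ConceptMapper is a
Java component); the function names are hypothetical, and the sketch assumes
that a match means both sides contain exactly the same tokens, in any order.

```python
def order_independent_key(tokens):
    """Return a key that is equal for any permutation of the same tokens."""
    # A sorted tuple keeps duplicates, so ("A", "A", "B") differs from ("A", "B").
    return tuple(sorted(tokens))


def matches_any_order(entry_tokens, text_tokens):
    """True if the text tokens are a reordering of the dictionary entry tokens."""
    return order_independent_key(entry_tokens) == order_independent_key(text_tokens)
```

With this definition, the tokens "A" "B" match the entry "B" "A", while
"A" "C" does not.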




[jira] Updated: (UIMA-1371) Performance improvement: remove reliance on Property class and excess String building to reduce in-memory dictionary size.

2009-06-05 Thread Michael Tanenblatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/UIMA-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Tanenblatt updated UIMA-1371:
-

Attachment: cm-patch20090605.txt

addresses issues in UIMA-1371

> Performance improvement: remove reliance on Property class and excess String 
> building to reduce in-memory dictionary size.
> --
>
> Key: UIMA-1371
> URL: https://issues.apache.org/jira/browse/UIMA-1371
> Project: UIMA
>  Issue Type: New Feature
>  Components: Sandbox-ConceptMapper
> Environment: All
>Reporter: Michael Tanenblatt
> Attachments: cm-patch20090605.txt
>
>
> Performance improvement: remove reliance on Property class and excess String 
> building to reduce in-memory dictionary size.




[jira] Created: (UIMA-1371) Performance improvement: remove reliance on Property class and excess String building to reduce in-memory dictionary size.

2009-06-05 Thread Michael Tanenblatt (JIRA)
Performance improvement: remove reliance on Property class and excess String 
building to reduce in-memory dictionary size.
--

 Key: UIMA-1371
 URL: https://issues.apache.org/jira/browse/UIMA-1371
 Project: UIMA
  Issue Type: New Feature
  Components: Sandbox-ConceptMapper
 Environment: All
Reporter: Michael Tanenblatt


Performance improvement: remove reliance on Property class and excess String 
building to reduce in-memory dictionary size.




[jira] Updated: (UIMA-1336) allow multiple dictionary entries to match against a single string

2009-04-29 Thread Michael Tanenblatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/UIMA-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Tanenblatt updated UIMA-1336:
-

Attachment: CM-multireturn-patch.txt

This patch fixes the issue

> allow multiple dictionary entries to match against a single string
> --
>
> Key: UIMA-1336
> URL: https://issues.apache.org/jira/browse/UIMA-1336
> Project: UIMA
>  Issue Type: Improvement
>  Components: Sandbox-ConceptMapper
>Reporter: Michael Tanenblatt
> Attachments: CM-multireturn-patch.txt
>
>
> If multiple dictionary entries contain the same text, only one will be 
> selected to match against the input text, even if the parameter 
> "FindAllMatches" is set to true




[jira] Created: (UIMA-1336) allow multiple dictionary entries to match against a single string

2009-04-29 Thread Michael Tanenblatt (JIRA)
allow multiple dictionary entries to match against a single string
--

 Key: UIMA-1336
 URL: https://issues.apache.org/jira/browse/UIMA-1336
 Project: UIMA
  Issue Type: Improvement
  Components: Sandbox-ConceptMapper
Reporter: Michael Tanenblatt


If multiple dictionary entries contain the same text, only one will be selected 
to match against the input text, even if the parameter "FindAllMatches" is set 
to true
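
A minimal Python sketch of the behaviour requested here, assuming a dictionary
index keyed by entry text. It is not the attached patch: the function names
and the `find_all_matches` argument are hypothetical stand-ins for the
"FindAllMatches" parameter. The point is that the index stores a list of
entries per text instead of a single entry, so identical texts no longer
overwrite each other.

```python
from collections import defaultdict


def build_index(entries):
    """Map each entry text to every entry id that uses that text."""
    index = defaultdict(list)
    for entry_id, text in entries:
        index[text].append(entry_id)
    return index


def lookup(index, text, find_all_matches):
    """Return all entries for the text, or only the first when not finding all."""
    hits = index.get(text, [])
    return list(hits) if find_all_matches else hits[:1]
```

For example, with entries ("E1", "cold") and ("E2", "cold"), a lookup of
"cold" returns both ids when `find_all_matches` is true and only "E1"
otherwise.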




[jira] Updated: (UIMA-1301) Update documentation, log problems when dictionary entries don't load, remove diagnostic message during dictionary loading

2009-03-11 Thread Michael Tanenblatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/UIMA-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Tanenblatt updated UIMA-1301:
-

Attachment: ConceptMapper20090311Patch.txt

> Update documentation, log problems when dictionary entries don't load, remove 
> diagnostic message during dictionary loading
> --
>
> Key: UIMA-1301
> URL: https://issues.apache.org/jira/browse/UIMA-1301
> Project: UIMA
>  Issue Type: Improvement
>  Components: Sandbox-ConceptMapper
>Reporter: Michael Tanenblatt
> Attachments: ConceptMapper20090311Patch.txt
>
>





[jira] Created: (UIMA-1301) Update documentation, log problems when dictionary entries don't load, remove diagnostic message during dictionary loading

2009-03-11 Thread Michael Tanenblatt (JIRA)
Update documentation, log problems when dictionary entries don't load, remove 
diagnostic message during dictionary loading
--

 Key: UIMA-1301
 URL: https://issues.apache.org/jira/browse/UIMA-1301
 Project: UIMA
  Issue Type: Improvement
  Components: Sandbox-ConceptMapper
Reporter: Michael Tanenblatt







[jira] Commented: (UIMA-1033) ConceptMapper--a highly configurable, token-based dictionary lookup UIMA component

2008-05-19 Thread Michael Tanenblatt (JIRA)

[ 
https://issues.apache.org/jira/browse/UIMA-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597986#action_12597986
 ] 

Michael Tanenblatt commented on UIMA-1033:
--

Sorry about that missing uima.tt.TokenAnnotation--I must have done that in an 
overexuberant fit of cleaning!
As to future plans: we use ConceptMapper extensively in our projects, and I am 
certainly interested in helping maintain and enhance it as needed, time 
permitting.


> ConceptMapper--a highly configurable, token-based dictionary lookup UIMA 
> component
> --
>
> Key: UIMA-1033
> URL: https://issues.apache.org/jira/browse/UIMA-1033
> Project: UIMA
>  Issue Type: New Feature
>  Components: Sandbox
> Environment: Java 5
>Reporter: Michael Tanenblatt
>Priority: Minor
> Attachments: conceptMapper.zip, conceptMapper.zip.md5
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>

[jira] Updated: (UIMA-1033) ConceptMapper--a highly configurable, token-based dictionary lookup UIMA component

2008-05-14 Thread Michael Tanenblatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/UIMA-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Tanenblatt updated UIMA-1033:
-

Attachment: conceptMapper.zip.md5
conceptMapper.zip

Source and md5 signature of ConceptMapper

> ConceptMapper--a highly configurable, token-based dictionary lookup UIMA 
> component
> --
>
> Key: UIMA-1033
> URL: https://issues.apache.org/jira/browse/UIMA-1033
> Project: UIMA
>  Issue Type: New Feature
>  Components: Sandbox
> Environment: Java 5
>Reporter: Michael Tanenblatt
>Priority: Minor
> Attachments: conceptMapper.zip, conceptMapper.zip.md5
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>




[jira] Created: (UIMA-1033) ConceptMapper--a highly configurable, token-based dictionary lookup UIMA component

2008-05-14 Thread Michael Tanenblatt (JIRA)
ConceptMapper--a highly configurable, token-based dictionary lookup UIMA 
component
--

 Key: UIMA-1033
 URL: https://issues.apache.org/jira/browse/UIMA-1033
 Project: UIMA
  Issue Type: New Feature
  Components: Sandbox
 Environment: Java 5
Reporter: Michael Tanenblatt
Priority: Minor


ConceptMapper is a token-based dictionary lookup UIMA component. It was
designed specifically to allow any external tokenizer that is a UIMA
component to be used to tokenize its dictionary. Using the same tokenizer
for both the dictionary and subsequent text processing prevents
situations where a particular dictionary entry is not found, though it
exists, because it was tokenized differently than the text being processed.

ConceptMapper is highly configurable, in terms of:
 * the way dictionary entries are mapped to resultant annotations
 * the way input documents are processed
 * the availability of multiple lookup strategies
 * its various output options.

Additionally, a set of post-processing filters is supplied, as well as an
interface to easily create new filters. This allows for overgenerating
results during the lookup phase, if so desired, then reducing the result
set according to particular rules.

More details:

The structure of the dictionary itself is quite flexible. Entries can have
any number of variants (synonyms), and arbitrary features can be associated
with dictionary entries. Individual variants inherit features from their parent
token (i.e., the canonical form), but can override them or add additional
features. In the following sample dictionary entry, there are 5 variants of
the canonical form, and as described earlier, each inherits the SemClass
and POS attributes from the canonical form, with the exception of the
variant "mesenteric fibromatosis (c48.1)", which overrides the value of the
SemClass attribute (this is somewhat of a contrived example, just to make
that point):


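[Sample XML dictionary entry not preserved in this archive; the only surviving fragment, from the overriding variant, is: SemClass="Diagnosis-Site" />]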

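The inheritance rule described above can be sketched in Python as follows.
This is an illustration only, not ConceptMapper code: the function name, the
canonical attribute values ("Diagnosis", "NN") and the first variant text are
made up for the example. Only the variant "mesenteric fibromatosis (c48.1)"
and its SemClass value "Diagnosis-Site" come from the description.

```python
def resolve_variant_attributes(canonical_attrs, variant_attrs):
    """Inherit every canonical attribute, letting the variant override or add."""
    resolved = dict(canonical_attrs)   # start from the canonical form
    resolved.update(variant_attrs)     # variant values take precedence
    return resolved


# Hypothetical canonical attributes; the real values are not in the archive.
canonical = {"SemClass": "Diagnosis", "POS": "NN"}
variants = {
    "example variant": {},  # hypothetical variant that overrides nothing
    "mesenteric fibromatosis (c48.1)": {"SemClass": "Diagnosis-Site"},
}
resolved = {
    text: resolve_variant_attributes(canonical, attrs)
    for text, attrs in variants.items()
}
```

Here the first variant ends up with both inherited attributes unchanged,
while the second keeps the inherited POS but carries its own SemClass.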

Input tokens are processed one span at a time, where both the token and
span (usually a sentence) annotation type are configurable. Additionally,
the particular feature of the token annotation to use for lookups can be
specified; otherwise, its covered text is used. Other input configuration
settings are whether to use case-sensitive matching, an optional class name
of a stemmer to apply to the tokens, and a list of stop words to ignore
during lookup. One additional input control mechanism is the ability to
skip tokens during lookups based on particular feature values. In this way,
it is easy to skip, for example, all tokens with particular part of speech
tags, or with some previously computed semantic class.
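
The input-side settings described above could be modelled roughly as in the
following Python sketch. It is a simplification under stated assumptions, not
the component's actual code: the `Token` class, the `pos` feature name and
all parameter names are hypothetical, the stemmer is any callable from string
to string, and stop words are assumed to be given in lower case when matching
is case-insensitive.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    pos: str  # part-of-speech tag; a hypothetical feature used for skipping


def lookup_keys(tokens, case_sensitive=False, stop_words=frozenset(),
                skip_pos=frozenset(), stemmer=None):
    """Turn the tokens of one span into the keys used for dictionary lookup."""
    keys = []
    for tok in tokens:
        if tok.pos in skip_pos:
            continue  # skipped because of a feature value
        key = tok.text if case_sensitive else tok.text.lower()
        if key in stop_words:
            continue  # stop words are ignored during lookup
        if stemmer is not None:
            key = stemmer(key)
        keys.append(key)
    return keys
```

For example, with the tokens "The" (DT), "Tumors" (NNS), "of" (IN) and
"Colon" (NN), skipping the tag "DT" and the stop word "of" leaves the lookup
keys "tumors" and "colon".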

Output is in the form of new annotations, and the type of resulting
annotations can be specified in a descriptor file. The mapping from
dictionary entry attributes to the result annotation features can also be
specified. Additionally, a string containing the matched text, a list of
matched tokens, and the span enclosing the match can be specified to be set
in the result annotations. It is also possible to indicate dictionary
attributes to write back into each of the matched tokens.

Dictionary lookup is controlled by three parameters in the descriptor. One
allows for order-independent lookup (i.e., A B == B A); another toggles
between finding only the longest match and finding all possible matches.
The final parameter specifies the search strategy, of which there are
three. The default search strategy only considers contiguous tokens (not
including tokens from the stop word list or otherwise skipped tokens),
and then begins the subsequent search after the longest match. The second
strategy allows for ignoring non-matching tokens, allowing for disjoint
matches, so that a dictionary entry of

A C

would match against the text

A B C

As with the default search strategy, the subsequent search begins after the
longest match. The final search strategy is identical to the previous,
except that subsequent searches begin one token ahead, instead of after the
previous match. This enables overlapped matching.
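
The three search strategies can be compared with a short Python sketch. It is
an approximation for illustration, not ConceptMapper's algorithm: the
strategy names ("contiguous", "skip_any", "skip_any_overlap") and function
names are hypothetical, stop words and skipped tokens are assumed to have
been removed already, only the longest match per start position is kept
(measured in dictionary tokens), and a match is assumed to start at the
current token.

```python
def match_from(entry, tokens, start, allow_gaps):
    """Return the index just past a match of entry beginning at start, else None."""
    if tokens[start] != entry[0]:
        return None  # a match must begin at the current token
    pos = start
    for wanted in entry:
        if allow_gaps:
            # Ignore non-matching tokens until the wanted one appears.
            while pos < len(tokens) and tokens[pos] != wanted:
                pos += 1
        if pos >= len(tokens) or tokens[pos] != wanted:
            return None
        pos += 1
    return pos


def find_matches(dictionary, tokens, strategy="contiguous"):
    """Return (entry, start, end) tuples for the tokens of one span."""
    allow_gaps = strategy != "contiguous"
    results, i = [], 0
    while i < len(tokens):
        best = None
        for entry in dictionary:
            end = match_from(entry, tokens, i, allow_gaps)
            if end is not None and (best is None or len(entry) > len(best[0])):
                best = (entry, end)
        if best is None:
            i += 1
            continue
        results.append((best[0], i, best[1]))
        # Overlapped matching restarts one token ahead instead of after the match.
        i = i + 1 if strategy == "skip_any_overlap" else best[1]
    return results
```

With the entries ("A", "C") and ("B", "C") and the text A B C, the contiguous
strategy finds only B C, the gap-skipping strategy finds only A C (the search
resumes after that match), and the overlapping variant finds both.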


