[ https://issues.apache.org/jira/browse/CTAKES-373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959132#comment-15959132 ]
Sean Finan commented on CTAKES-373:
-----------------------------------
The RegexSectionizer can also make use of line dividers. They can be used to
partition sections, and they can also be "removed" from the CAS as parseable
text, so that they are never parsed later for sentences, lists, identified
annotations, etc.
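
To make the idea concrete, here is a rough standalone sketch of how a divider
line might be recognized with a regular expression and kept out of the text
handed to later parsing. This is not the RegexSectionizer code or its
configuration format: the class name DividerLineFilter, the pattern, and the
minimum run length of 4 are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class DividerLineFilter {

    // Assumed definition of a divider: optional whitespace, then one
    // non-alphanumeric, non-whitespace character repeated at least 4 times,
    // then optional whitespace. The threshold of 4 is arbitrary.
    private static final Pattern DIVIDER =
            Pattern.compile("^\\s*([^\\p{Alnum}\\s])\\1{3,}\\s*$");

    // True if the whole line is a run of a single divider character.
    public static boolean isDivider(String line) {
        return DIVIDER.matcher(line).matches();
    }

    // Returns the lines of a note with divider lines dropped, i.e. the
    // text that would still be worth sending to a sentence detector or
    // constituency parser.
    public static List<String> parseableLines(String note) {
        List<String> kept = new ArrayList<>();
        for (String line : note.split("\\R")) {
            if (!isDivider(line)) {
                kept.add(line);
            }
        }
        return kept;
    }
}
```

A real annotator would more likely keep the character offsets of the divider
lines and exclude those spans from the sections it creates, rather than
rebuild the text, so that existing annotation offsets stay valid.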
> MaxentParserWrapper can't handle section dividers: "=========="
> ---------------------------------------------------------------
>
> Key: CTAKES-373
> URL: https://issues.apache.org/jira/browse/CTAKES-373
> Project: cTAKES
> Issue Type: Improvement
> Components: ctakes-constituency-parser
> Affects Versions: 3.2.0, 3.1.1, 3.2.1, 3.2.2, 3.2.3
> Reporter: Sean Finan
> Assignee: Sean Finan
> Labels: performance
> Fix For: 3.2.3
>
>
> Notes often contain section "dividers" of a single text character, such as:
> "============================================"
> When the Constituency Parser hits these [sentences], it can churn for 30
> seconds (in my runs). For 60 notes containing two of these lines, that is a
> solid hour of useless processing.
> There shouldn't be any downstream dependencies on such lines, so they
> shouldn't be parsed.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)