Quoth Sean:

> Make a simple annotation engine that determines note type and adjusts the
> properties of sections identified with the common section header based upon
> the note type.


FWIW, this is what we do.  For inpatient documents, "History" maps to "Past
Medical History"; for outpatient radiology, "History" maps to "Reason for
Exam".
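
A minimal sketch of that kind of note-type-dependent mapping, in plain Java. The class name, the note-type keys, and the method name are hypothetical; only the two "History" mappings come from the description above, and nothing here is a cTAKES API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: resolves a raw section header to a note-type-specific name.
class SectionNameResolver {
   // noteType -> ( raw header -> resolved section name ).  Keys are illustrative.
   private static final Map<String, Map<String, String>> NAMES_BY_NOTE_TYPE = new HashMap<>();
   static {
      Map<String, String> inpatient = new HashMap<>();
      inpatient.put( "History", "Past Medical History" );
      NAMES_BY_NOTE_TYPE.put( "inpatient", inpatient );

      Map<String, String> radiology = new HashMap<>();
      radiology.put( "History", "Reason for Exam" );
      NAMES_BY_NOTE_TYPE.put( "outpatient radiology", radiology );
   }

   // Returns the mapped name, or the raw header unchanged when no mapping exists.
   static String resolve( String noteType, String header ) {
      Map<String, String> names = NAMES_BY_NOTE_TYPE.get( noteType );
      if ( names == null ) {
         return header;
      }
      return names.getOrDefault( header, header );
   }

   public static void main( String[] args ) {
      System.out.println( resolve( "inpatient", "History" ) );            // Past Medical History
      System.out.println( resolve( "outpatient radiology", "History" ) ); // Reason for Exam
   }
}
```

Headers with no entry for a note type fall through unchanged, so only the ambiguous ones need to be listed.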

> A lot of people in the community don't dream in java


I do, sometimes... but then I wake up screaming.  ;-)

Kean Kaufmann
Chief Architect - NLP
RecordsOne, Inc.

On Sat, Jan 30, 2021 at 10:01 AM Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi Thomas,
>
> Short answer:
> You can't do that.  The collection of Section definitions is shared
> through all of the pipelines.
>
> Long answer:
> I think that there might be another approach.
>
> My guess is that within your two different note types there is some common
> section header expression, but the content and intention and use of the
> section information is different.
>
> If that is the case, I would propose the following:
>
> 1.  Use just a single sectionizer.
> -- sectionization, as with any regex process, can be "slow".  It is better
> to detect a common word by running just a single regex over text than two
> different regex that look for the same word.
> 2.  Use one pipeline definition.
> -- While using two unlike pipelines simultaneously, if processing n notes
> of type A takes X seconds and processing n' notes of type B takes >>X
> seconds then you are stuck waiting on B process time.
> -- It also makes later description of a single pipeline easier ...  as
> below (hopefully).
> 3.  Make a simple annotation engine that determines note type and adjusts
> the properties of sections identified with the common section header based
> upon the note type.
> -- The complexity of this depends upon the differences in sections with
> common headers.
>
> -- Please Note: I am typing this freehand, so there are probably typos and
> missing items.  There are also probably better ways to do the same thing.
> It should give you the general idea.  A lot of people in the community
> don't dream in java so I sometimes add this kind of thing to (hopefully)
> save time.
>
>
> String noteType = new NoteSpecs( jCas ).getDocumentType();
>
> List<Segment> sections = new ArrayList<>( JCasUtil.select( jCas,
> Segment.class ) );
> sections.sort( Comparator.comparingInt( Segment::getBegin ) );
>
> if ( sections.size() <= 1 ) {
>    return;
> }
>
> //  Join sections if one is unwanted.
> Collection<Segment> unwantedSections = new HashSet<>();
> Segment previousSection = sections.get( 0 );
> for ( int i=1; i<sections.size(); i++ ) {
>    Segment section = sections.get( i );
>    if ( !isWantedSection( noteType, section.getPreferredText() ) ) {
>       previousSection.setEnd( section.getEnd() );
>       unwantedSections.add( section );
>       section.removeFromIndices();
>       continue;
>    }
>    previousSection = section;
> }
> sections.removeAll( unwantedSections );
>
> // Rename Sections
> sections.forEach( s -> adjustSectionInfo( noteType, s ) );
>
>
> //  Something to define unwanted sections:
> Collection<String> BAD_A_SECTIONS = Arrays.asList( "Bilge", "Plumbing" );
> Collection<String> BAD_B_SECTIONS = Arrays.asList( "Joint", "Elbow" );
> boolean isWantedSection( String noteType, String sectionType ) {
>    // A section is unwanted when it is on the bad list for this note type.
>    boolean unwanted
>          = ( noteType.equals( "A" ) && BAD_A_SECTIONS.contains( sectionType ) )
>            || ( noteType.equals( "B" ) && BAD_B_SECTIONS.contains( sectionType ) );
>    return !unwanted;
> }
>
> // And something to adjust properties of certain section types:
> Map<String,String> X_TO_A_SECTIONS = new HashMap<>();
> Map<String,String> X_TO_B_SECTIONS = new HashMap<>();
> void initRenameMaps() {
>    X_TO_A_SECTIONS.put( "Stern", "Sternum" );
>    X_TO_B_SECTIONS.put( "Stern", "Tough Guy" );
> }
> void adjustSectionInfo( String noteType, Segment section ) {
>    if ( noteType.equals( "A" ) ) {
>       String newName = X_TO_A_SECTIONS.get( section.getPreferredText() );
>       if ( newName != null ) {
>          section.setPreferredText( newName );
>       }
>    } else if ( noteType.equals( "B" ) ) {
>       // Same lookup, using the B map.
>       String newName = X_TO_B_SECTIONS.get( section.getPreferredText() );
>       if ( newName != null ) {
>          section.setPreferredText( newName );
>       }
>    }
> }
>
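
To try the section-joining logic above without a UIMA setup, here is a plain-Java sketch of the same idea. Span and SectionMerger are hypothetical stand-ins (not cTAKES classes). Like the loop above, it always keeps the first section and folds each unwanted section into the kept section before it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a cTAKES Segment: only what the merge logic needs.
class Span {
   final String name;
   final int begin;
   int end;

   Span( String name, int begin, int end ) {
      this.name = name;
      this.begin = begin;
      this.end = end;
   }
}

class SectionMerger {
   // Illustrative list of unwanted section names for note type "A".
   static final Set<String> UNWANTED_IN_A = new HashSet<>( Arrays.asList( "Bilge", "Plumbing" ) );

   // Returns the kept sections in text order.  Each unwanted section is dropped
   // and the end offset of the preceding kept section is extended to cover it.
   // Note: this mutates the end offset of the Span objects passed in.
   static List<Span> mergeUnwanted( List<Span> spans, Set<String> unwanted ) {
      List<Span> sorted = new ArrayList<>( spans );
      sorted.sort( Comparator.comparingInt( s -> s.begin ) );
      List<Span> kept = new ArrayList<>();
      for ( Span span : sorted ) {
         if ( !kept.isEmpty() && unwanted.contains( span.name ) ) {
            kept.get( kept.size() - 1 ).end = span.end;
         } else {
            kept.add( span );
         }
      }
      return kept;
   }

   public static void main( String[] args ) {
      List<Span> spans = Arrays.asList(
            new Span( "Findings", 20, 30 ),
            new Span( "History", 0, 10 ),
            new Span( "Bilge", 10, 20 ) );
      for ( Span s : mergeUnwanted( spans, UNWANTED_IN_A ) ) {
         System.out.println( s.name + " " + s.begin + "-" + s.end );
      }
   }
}
```

In a real annotation engine the dropped sections would also need to be removed from the CAS indexes, as the quoted code does with removeFromIndices().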
>
>
> Sean
>
>
>
> ________________________________________
> From: Thomas W Loehfelm <twloehf...@ucdavis.edu.INVALID>
> Sent: Friday, January 29, 2021 7:25 PM
> To: dev@ctakes.apache.org
> Subject: Re: Passing SectionsBsv to piper containing BsvRegexSectionizer
> [EXTERNAL]
>
>
>
> Sorry for the second email.
>
> The a_engine and b_engine lines contain typos in that they do not specify
> the specific a_ or b_pipeline – I inadvertently introduced this typo just
> while reproducing the generic example into the email – the original code is
> correct so that is not the source of the problem.
>
> And to further clarify, the general concept works – both AE pools are
> created, and both can process text, it is literally just that the
> SectionsBsv param setting persists between the two so that the second pool
> ends up using the same BSV file as the first one.
>
>
> From: Thomas W Loehfelm <twloehf...@ucdavis.edu.INVALID>
> Date: Friday, January 29, 2021 at 4:11 PM
> To: dev@ctakes.apache.org <dev@ctakes.apache.org>
> Subject: Passing SectionsBsv to piper containing BsvRegexSectionizer
> I have a CTakes API endpoint based on the REST API and I am trying to
> specify a different BSV file depending on the type of text.
>
> My idea is to instantiate two different analysis engine pools, and direct
> text to one or the other depending on which type of report it is. This seems
> simpler to me than spinning up two entirely separate ctakes end points and
> using one for one type and one for the other, though I know that I could
> accomplish what I am looking to do by going that direction. It seems like I
> am missing something basic that is preventing my initial plan from working
> though.
>
> Let’s say the different AE pools are A and B as below, and say the
> PIPER_FILEs at the paths are the same except they hard code a different Bsv
> file like so:
> A_PIPER_FILE includes: add BsvRegexSectionizer SectionsBsv=resources/a.bsv
> B_PIPER_FILE includes: add BsvRegexSectionizer SectionsBsv=resources/b.bsv
>
> final PiperFileReader a_reader = new PiperFileReader(A_PIPER_FILE_PATH);
> final PipelineBuilder a_builder = a_reader.getBuilder();
> final AnalysisEngineDescription a_pipeline =
> a_builder.getAnalysisEngineDesc();
> _a_engine = UIMAFramework.produceAnalysisEngine(pipeline);
> _a_pool = new JCasPool( 2, _a_engine );
>
> final PiperFileReader b_reader = new PiperFileReader(B_PIPER_FILE_PATH);
> final PipelineBuilder b_builder = b_reader.getBuilder();
> final AnalysisEngineDescription b_pipeline =
> b_builder.getAnalysisEngineDesc();
> _b_engine = UIMAFramework.produceAnalysisEngine(pipeline);
> _b_pool = new JCasPool( 2, _b_engine );
>
>
> The problem I am running into is that the “B” analysis engine uses the
> “A” SectionsBsv file even though the piper files specify the correct one to
> use. It seems that once SectionsBsv is set, it is not reset even
> though a subsequent piper file may specify a different resource to use.
>
> Any ideas on what is happening, how I can clear or reset that param, or
> whether there is a different way to accomplish what I am trying to do?
>
> Things I have tried:
>
>   1.  Adding “_b_engine.reconfigure();” between _b_engine and _b_pool
> lines.
>      *   No effect.
>   2.  Removing the hard-coded SectionsBsv assignment from the piper file,
> using the SAME piper file for each instance, and passing in SectionsBsv as
> a param.
>      *   I am not sure how to do this using the construction above. I have
> looked into CliOptionals but do not have a good grasp of them.
>      *   I have tried adding “a_builder.set(“SectionsBsv”,
> “resources/a.bsv”)” after the a_builder is created but that had no effect
> either.
>
> Thanks in advance for your consideration.
>
> Tom
>
