Quoth Sean:

> Make a simple annotation engine that determines note type and adjusts the
> properties of sections identified with the common section header based upon
> the note type.

FWIW, this is what we do. For inpatient documents, "History" maps to "Past Medical History"; for outpatient radiology, "History" maps to "Reason for Exam".

> A lot of people in the community don't dream in java

I do, sometimes... but then I wake up screaming. ;-)

Kean Kaufmann
Chief Architect - NLP
RecordsOne, Inc.

On Sat, Jan 30, 2021 at 10:01 AM Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:

> Hi Thomas,
>
> Short answer:
> You can't do that. The collection of Section definitions is shared
> through all of the pipelines.
>
> Long answer:
> I think that there might be another approach.
>
> My guess is that within your two different note types there is some common
> section header expression, but the content and intention and use of the
> section information is different.
>
> If that is the case, I would propose the following:
>
> 1. Use just a single sectionizer.
> -- sectionization, as with any regex process, can be "slow". It is better
> to detect a common word by running just a single regex over text than two
> different regex that look for the same word.
> 2. Use one pipeline definition.
> -- While using two unlike pipelines simultaneously, if processing n notes
> of type A takes X seconds and processing n' notes of type B takes >>X
> seconds then you are stuck waiting on B process time.
> -- It also makes later description of a single pipeline easier ... as
> below (hopefully).
> 3. Make a simple annotation engine that determines note type and adjusts
> the properties of sections identified with the common section header based
> upon the note type.
> -- The complexity of this depends upon the differences in sections with
> common headers.
>
> -- Please Note: I am typing this freehand, so there are probably typos and
> missing items. There are also probably better ways to do the same thing.
> It should give you the general idea. A lot of people in the community
> don't dream in java so I sometimes add this kind of thing to (hopefully)
> save time.
>
>
> String noteType = new NoteSpecs( jCas ).getDocumentType();
>
> List<Segment> sections = new ArrayList<>( JCasUtil.select( jCas, Segment.class ) );
> sections.sort( Comparator.comparingInt( Segment::getBegin ) );
>
> if ( sections.size() <= 1 ) {
>    return;
> }
>
> // Join sections if one is unwanted.
> Collection<Segment> unwantedSections = new HashSet<>();
> Segment previousSection = sections.get( 0 );
> for ( int i=1; i<sections.size(); i++ ) {
>    Segment section = sections.get( i );
>    if ( !isWantedSection( noteType, section.getPreferredText() ) ) {
>       previousSection.setEnd( section.getEnd() );
>       unwantedSections.add( section );
>       section.removeFromIndexes();
>       continue;
>    }
>    previousSection = section;
> }
> sections.removeAll( unwantedSections );
>
> // Rename Sections
> sections.forEach( s -> adjustSectionInfo( noteType, s ) );
>
>
> // Something to define unwanted sections:
> Collection<String> BAD_A_SECTIONS = Arrays.asList( "Bilge", "Plumbing" );
> Collection<String> BAD_B_SECTIONS = Arrays.asList( "Joint", "Elbow" );
> // A section is wanted unless it is on the unwanted list for this note type.
> boolean isWantedSection( String noteType, String sectionType ) {
>    return !( ( noteType.equals( "A" ) && BAD_A_SECTIONS.contains( sectionType ) )
>           || ( noteType.equals( "B" ) && BAD_B_SECTIONS.contains( sectionType ) ) );
> }
>
> // And something to adjust properties of certain section types:
> Map<String,String> X_TO_A_SECTIONS = new HashMap<>();
> Map<String,String> X_TO_B_SECTIONS = new HashMap<>();
> void initRenameMaps() {
>    X_TO_A_SECTIONS.put( "Stern", "Sternum" );
>    X_TO_B_SECTIONS.put( "Stern", "Tough Guy" );
> }
> void adjustSectionInfo( String noteType, Segment section ) {
>    if ( noteType.equals( "A" ) ) {
>       String newName = X_TO_A_SECTIONS.get( section.getPreferredText() );
>       if ( newName != null ) {
>          section.setPreferredText( newName );
>       }
>    } else if ( noteType.equals( "B" ) ) {
>       etc.
>    }
> }
>
>
>
> Sean
>
>
>
> ________________________________________
> From: Thomas W Loehfelm <twloehf...@ucdavis.edu.INVALID>
> Sent: Friday, January 29, 2021 7:25 PM
> To: dev@ctakes.apache.org
> Subject: Re: Passing SectionsBsv to piper containing BsvRegexSectionizer [EXTERNAL]
>
> * External Email - Caution *
>
>
> Sorry for the second email.
>
> The a_engine and b_engine lines contain typos in that they do not specify
> the specific a_ or b_pipeline – I inadvertently introduced this typo just
> while reproducing the generic example into the email – the original code is
> correct so that is not the source of the problem.
>
> And to further clarify, the general concept works – both AE pools are
> created, and both can process text, it is literally just that the
> SectionsBsv param setting persists between the two so that the second pool
> ends up using the same BSV file as the first one.
>
>
> From: Thomas W Loehfelm <twloehf...@ucdavis.edu.INVALID>
> Date: Friday, January 29, 2021 at 4:11 PM
> To: dev@ctakes.apache.org <dev@ctakes.apache.org>
> Subject: Passing SectionsBsv to piper containing BsvRegexSectionizer
> I have a CTakes API endpoint based on the REST API and I am trying to
> specify a different BSV file depending on the type of text.
>
> My idea is to instantiate two different analysis engine pools, and direct
> text to one or the other depending on which type of report it is. This seems
> simpler to me than spinning up two entirely separate ctakes end points and
> using one for one type and one for the other, though I know that I could
> accomplish what I am looking to do by going that direction. It seems like I
> am missing something basic that is preventing my initial plan from working
> though.
>
> Let’s say the different AE pools are A and B as below, and say the
> PIPER_FILEs at the paths are the same except they hard code a different Bsv
> file like so:
> A_PIPER_FILE includes: add BsvRegexSectionizer SectionsBsv=resources/a.bsv
> B_PIPER_FILE includes: add BsvRegexSectionizer SectionsBsv=resources/b.bsv
>
> final PiperFileReader a_reader = new PiperFileReader(A_PIPER_FILE_PATH);
> final PipelineBuilder a_builder = a_reader.getBuilder();
> final AnalysisEngineDescription a_pipeline = a_builder.getAnalysisEngineDesc();
> _a_engine = UIMAFramework.produceAnalysisEngine(pipeline);
> _a_pool = new JCasPool( 2, _a_engine );
>
> final PiperFileReader b_reader = new PiperFileReader(B_PIPER_FILE_PATH);
> final PipelineBuilder b_builder = b_reader.getBuilder();
> final AnalysisEngineDescription b_pipeline = b_builder.getAnalysisEngineDesc();
> _b_engine = UIMAFramework.produceAnalysisEngine(pipeline);
> _b_pool = new JCasPool( 2, _b_engine );
>
>
> The problem I am running into is that the “B” analysis engine uses the
> “A” SectionsBsv file even though the piper files specify the correct one to
> use. It seems that once SectionsBsv is set once, it is not reset even
> though a subsequent piper file may specify a different resource to use.
>
> Any ideas on what is happening, how I can clear or reset that param, or
> whether there is a different way to accomplish what I am trying to do?
>
> Things I have tried:
>
>   1. Adding “_b_engine.reconfigure();” between _b_engine and _b_pool
>      lines.
>      * No effect.
>   2. Removing the hard-coded SectionsBsv assignment from the piper file,
>      using the SAME piper file for each instance, and passing in SectionsBsv as
>      a param.
>      * I am not sure how to do this using the construction above. I have
>        looked into CliOptionals but do not have a good grasp of them.
>      * I have tried adding “a_builder.set(“SectionsBsv”,
>        “resources/a.bsv”)” after the a_builder is created but that had no effect
>        either.
>
> Thanks in advance for your consideration.
>
> Tom
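
The note-type-dependent renaming discussed in this thread ("History" becoming "Past Medical History" for inpatient documents and "Reason for Exam" for outpatient radiology) can be sketched in plain Java without any cTAKES or UIMA dependency. This is only a minimal illustration under stated assumptions: the class name SectionRenamer, the method rename, and the note-type strings "inpatient" and "outpatient radiology" are hypothetical and do not come from cTAKES. In a real annotation engine the returned name would presumably be applied to each Segment, as in Sean's freehand example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of cTAKES: looks up a note-type-specific
// section name for a section header shared by several note types.
final class SectionRenamer {

    // note type -> ( common header -> note-type-specific section name )
    private static final Map<String, Map<String, String>> RENAMES = new HashMap<>();

    static {
        // The note-type keys are assumptions made for this sketch.
        Map<String, String> inpatient = new HashMap<>();
        inpatient.put("History", "Past Medical History");
        RENAMES.put("inpatient", inpatient);

        Map<String, String> radiology = new HashMap<>();
        radiology.put("History", "Reason for Exam");
        RENAMES.put("outpatient radiology", radiology);
    }

    private SectionRenamer() {
    }

    // Returns the renamed header, or the header unchanged when the note type
    // is unknown or has no rename registered for that header.
    static String rename(String noteType, String header) {
        Map<String, String> byHeader = RENAMES.get(noteType);
        if (byHeader == null) {
            return header;
        }
        return byHeader.getOrDefault(header, header);
    }

    public static void main(String[] args) {
        System.out.println(rename("inpatient", "History"));
        System.out.println(rename("outpatient radiology", "History"));
        System.out.println(rename("inpatient", "Plan"));
    }
}
```

Keeping one nested map keyed by note type keeps a single sectionizer and a single pipeline in play, with only the lookup varying per note, which is the approach suggested above.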