Re: Passing SectionsBsv to piper containing BsvRegexSectionizer [EXTERNAL] [SUSPICIOUS]

Thomas W Loehfelm Wed, 03 Feb 2021 15:01:28 -0800

Thanks for the suggestion, Peter. This is sort of what I thought I was doing,
but I may not be creating the second singleton class at a deep enough layer.


I follow the general pattern of the ctakes-tiny-rest RestPipelineRunner:
https://svn.apache.org/repos/asf/ctakes/trunk/ctakes-tiny-rest/src/main/java/org/apache/ctakes/rest/service/RestPipelineRunner.java

I thought that by defining two different RestPipelineRunner classes 
(A_PipelineRunner and B_PipelineRunner) that the analysis engines and JCasPools 
created within would be unrelated to each other, but it turns out that once the 
A_PipelineRunner specifies a BSV file to use, the B_PipelineRunner uses the 
same BSV file regardless of what the B_piper file says.
________________________________
From: Finan, Sean <[email protected]>
Sent: Saturday, January 30, 2021 2:32 PM
To: [email protected] <[email protected]>
Subject: Re: Passing SectionsBsv to piper containing BsvRegexSectionizer 
[EXTERNAL] [SUSPICIOUS]

True, true.
________________________________________
From: Peter Abramowitsch <[email protected]>
Sent: Saturday, January 30, 2021 5:21 PM
To: [email protected]
Subject: Re: Passing SectionsBsv to piper containing BsvRegexSectionizer 
[EXTERNAL] [SUSPICIOUS]


Hi Tom, I think there is a way to do what you were thinking of. I'm not
suggesting it's a better solution; it's just a thought. In Java you can
create a new ClassLoader, and with this ClassLoader you can create a second
definition of a Class, and from that you can create a new instance of the
class which does not share anything, not even statics, with the other class
instance, thus allowing you to create two singletons of the same
"class". I doubt that you would be able to do it via a piper, though.
You'd have to create the pipeline programmatically using the AnalysisEngine
APIs and add the BSV lookups from the different class loaders. It would
be a lot of work and not as tidy as the kind of thing Sean is suggesting.
But you can lie awake at night thinking about it anyway.
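
To make the class loader idea concrete, here is a minimal plain-Java sketch.
It is not cTAKES code. SectionRegistry is a hypothetical stand-in for any class
that keeps its configuration in a static field, and the "first caller wins"
behavior is only an assumption meant to mimic what is described in this thread.
IsolatingLoader and initVia are likewise names made up for the illustration.
The sketch assumes Java 9 or later and that the compiled classes can be read
as resources from the class path.

```java
import java.io.IOException;
import java.io.InputStream;

public class ClassLoaderIsolationSketch {

    /** Hypothetical stand-in for a class that keeps configuration in a static field. */
    public static class SectionRegistry {
        private static String bsvPath;

        /** First caller wins: later calls do not replace the stored path. */
        public static synchronized String init(String path) {
            if (bsvPath == null) {
                bsvPath = path;
            }
            return bsvPath;
        }
    }

    /** Defines its own copy of one named class; every other class is delegated to the parent. */
    static class IsolatingLoader extends ClassLoader {
        private final String isolatedName;

        IsolatingLoader(String isolatedName, ClassLoader parent) {
            super(parent);
            this.isolatedName = isolatedName;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(isolatedName)) {
                return super.loadClass(name, resolve);
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    // Read the class bytes through the parent, but define the class in this loader.
                    String resource = name.replace('.', '/') + ".class";
                    try (InputStream in = getParent().getResourceAsStream(resource)) {
                        if (in == null) {
                            throw new ClassNotFoundException(name);
                        }
                        byte[] bytes = in.readAllBytes();
                        c = defineClass(name, bytes, 0, bytes.length);
                    } catch (IOException e) {
                        throw new ClassNotFoundException(name, e);
                    }
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }
    }

    /** Calls SectionRegistry.init(path) reflectively on the copy of the class owned by the given loader. */
    static String initVia(ClassLoader loader, String path) throws ReflectiveOperationException {
        Class<?> registry = Class.forName(SectionRegistry.class.getName(), true, loader);
        return (String) registry.getMethod("init", String.class).invoke(null, path);
    }

    public static void main(String[] args) throws Exception {
        ClassLoader app = ClassLoaderIsolationSketch.class.getClassLoader();
        // Same loader twice: one class definition, one static field, so the second path is ignored.
        System.out.println(initVia(app, "resources/a.bsv"));
        System.out.println(initVia(app, "resources/b.bsv"));
        // Separate loader: a second definition of the class with its own static field.
        ClassLoader isolated = new IsolatingLoader(SectionRegistry.class.getName(), app);
        System.out.println(initVia(isolated, "resources/b.bsv"));
    }
}
```

The first two calls go through the same loader and therefore share one static
field; the third call goes through its own loader and gets a fresh one. Doing
this for a real pipeline would mean isolating every class that touches the
shared state, which is why it is a lot more work than the toy above suggests.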

Peter

On Sat, Jan 30, 2021 at 4:27 PM Finan, Sean <
[email protected]> wrote:

> Hi Tom,
>
> You aren't the only person with an itchy "send" clicker.
>
> You probably don't want
>    if ( sections.size() <= 1 ) {
>       return;
>    }
> Because you may want to rename that first section.
>
>
> Anyway, all of the code would go into a new JCasAnnotator_ImplBase class
> process( jCas ) method.
>
> class SectionAdjuster extends JCasAnnotator_ImplBase {
>    @Override
>    public void process( final JCas jcas ) throws AnalysisEngineProcessException {
>       // all that code ...
>    }
> }
>
> Then that would go into the piper after the BsvRegexSectionizer
>
> // Fetch Sections
> add BsvRegexSectionizer SectionsBsv=/my/custom/file.bsv
> // Change or remove Sections
> add my.java.package.SectionAdjuster
>
>
>
> Sean
>
>
> ________________________________________
> From: Finan, Sean <[email protected]>
> Sent: Saturday, January 30, 2021 10:00 AM
> To: [email protected]
> Subject: Re: Passing SectionsBsv to piper containing BsvRegexSectionizer
> [EXTERNAL] [SUSPICIOUS]
>
>
> Hi Thomas,
>
> Short answer:
> You can't do that.  The collection of Section definitions is shared
> through all of the pipelines.
>
> Long answer:
> I think that there might be another approach.
>
> My guess is that within your two different note types there is some common
> section header expression, but the content and intention and use of the
> section information is different.
>
> If that is the case, I would propose the following:
>
> 1.  Use just a single sectionizer.
> -- Sectionization, as with any regex process, can be "slow".  It is better
> to detect a common word by running just a single regex over text than two
> different regexes that look for the same word.
> 2.  Use one pipeline definition.
> -- While using two unlike pipelines simultaneously, if processing n notes
> of type A takes X seconds and processing n' notes of type B takes >>X
> seconds then you are stuck waiting on B process time.
> -- It also makes the later description of a single pipeline easier ...  as
> below (hopefully).
> 3.  Make a simple annotation engine that determines note type and adjusts
> the properties of sections identified with the common section header based
> upon the note type.
> -- The complexity of this depends upon the differences in sections with
> common headers.
>
> -- Please Note: I am typing this freehand, so there are probably typos and
> missing items.  There are also probably better ways to do the same thing.
> It should give you the general idea.  A lot of people in the community
> don't dream in java so I sometimes add this kind of thing to (hopefully)
> save time.
>
>
> String noteType = new NoteSpecs( jCas ).getDocumentType();
>
> List<Segment> sections = new ArrayList<>( JCasUtil.select( jCas, Segment.class ) );
> // Sort sections by their position in the document.
> sections.sort( Comparator.comparingInt( Segment::getBegin ) );
>
> if ( sections.size() <= 1 ) {
>    return;
> }
>
> //  Join sections if one is unwanted.
> Collection<Segment> unwantedSections = new HashSet<>();
> Segment previousSection = sections.get( 0 );
> for ( int i=1; i<sections.size(); i++ ) {
>    Segment section = sections.get( i );
>    if ( !isWantedSection( noteType, section.getPreferredText() ) ) {
>       previousSection.setEnd( section.getEnd() );
>       unwantedSections.add( section );
>       section.removeFromIndices();
>       continue;
>    }
>    previousSection = section;
> }
> sections.removeAll( unwantedSections );
>
> // Rename Sections
> sections.forEach( s -> adjustSectionInfo( noteType, s ) );
>
>
> //  Something to define unwanted sections:
> Collection<String> BAD_A_SECTIONS = Arrays.asList( "Bilge", "Plumbing" );
> Collection<String> BAD_B_SECTIONS = Arrays.asList( "Joint", "Elbow" );
> boolean isWantedSection( String noteType, String sectionType ) {
>    // A section is wanted unless it is on the unwanted list for its note type.
>    return !( ( noteType.equals( "A" ) && BAD_A_SECTIONS.contains( sectionType ) )
>           || ( noteType.equals( "B" ) && BAD_B_SECTIONS.contains( sectionType ) ) );
> }
>
> // And something to adjust properties of certain section types:
> Map<String,String> X_TO_A_SECTIONS = new HashMap<>();
> Map<String,String> X_TO_B_SECTIONS = new HashMap<>();
> void initRenameMaps() {
>    X_TO_A_SECTIONS.put( "Stern", "Sternum" );
>    X_TO_B_SECTIONS.put( "Stern", "Tough Guy" );
> }
> void adjustSectionInfo( String noteType, Segment section ) {
>    if ( noteType.equals( "A" ) ) {
>       String newName = X_TO_A_SECTIONS.get( section.getPreferredText() );
>       if ( newName != null ) {
>          section.setPreferredText( newName );
>       }
>    } else if ( noteType.equals( "B" ) ) {
>       // etc.
>    }
> }
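
For anyone who wants to try the join-and-rename logic above without a cTAKES
install, here is a minimal, self-contained sketch of the same idea. It is an
illustration under assumptions, not the code Sean would put in the annotator:
Section is a hypothetical stand-in for the cTAKES Segment annotation (just a
name and begin/end offsets), and the UNWANTED and RENAMES tables, adjust and
isWanted are names chosen for this example. The section names are the ones
used in the message. Requires Java 9 or later for Map.of and Set.of.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SectionAdjusterSketch {

    /** Hypothetical stand-in for the cTAKES Segment annotation. */
    static final class Section {
        String name;
        final int begin;
        int end;

        Section(String name, int begin, int end) {
            this.name = name;
            this.begin = begin;
            this.end = end;
        }
    }

    // Unwanted section names per note type (example values from the message above).
    static final Map<String, Set<String>> UNWANTED = Map.of(
            "A", Set.of("Bilge", "Plumbing"),
            "B", Set.of("Joint", "Elbow"));

    // Renames per note type (example values from the message above).
    static final Map<String, Map<String, String>> RENAMES = Map.of(
            "A", Map.of("Stern", "Sternum"),
            "B", Map.of("Stern", "Tough Guy"));

    /** A section is wanted unless it is on the unwanted list for its note type. */
    static boolean isWanted(String noteType, String sectionName) {
        return !UNWANTED.getOrDefault(noteType, Set.of()).contains(sectionName);
    }

    /**
     * Sorts sections by offset, folds each unwanted section into the section
     * before it, then renames what is left. The first section is always kept.
     * Note that the Section objects passed in are modified in place.
     */
    static List<Section> adjust(String noteType, List<Section> input) {
        List<Section> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparingInt((Section s) -> s.begin));

        List<Section> kept = new ArrayList<>();
        for (Section section : sorted) {
            if (!kept.isEmpty() && !isWanted(noteType, section.name)) {
                // Extend the previous kept section over the unwanted one.
                kept.get(kept.size() - 1).end = section.end;
                continue;
            }
            kept.add(section);
        }

        Map<String, String> renames = RENAMES.getOrDefault(noteType, Map.of());
        for (Section s : kept) {
            s.name = renames.getOrDefault(s.name, s.name);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Section> sections = List.of(
                new Section("Stern", 0, 10),
                new Section("Bilge", 10, 20),
                new Section("Plan", 20, 30));
        for (Section s : adjust("A", sections)) {
            System.out.println(s.name + " [" + s.begin + ", " + s.end + ")");
        }
    }
}
```

For note type "A" the "Bilge" section is folded into "Stern", which is then
renamed to "Sternum"; "Plan" is left alone. In a real annotator the same steps
would run on Segment objects inside process( jCas ), and the removed sections
would also need to be taken out of the CAS indexes as in the code above.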
>
>
>
> Sean
>
>
>
> ________________________________________
> From: Thomas W Loehfelm <[email protected]>
> Sent: Friday, January 29, 2021 7:25 PM
> To: [email protected]
> Subject: Re: Passing SectionsBsv to piper containing BsvRegexSectionizer
> [EXTERNAL]
>
>
> Sorry for the second email.
>
> The a_engine and b_engine lines contain typos in that they do not specify
> the specific a_ or b_pipeline – I inadvertently introduced this typo just
> while reproducing the generic example into the email – the original code is
> correct so that is not the source of the problem.
>
> And to further clarify, the general concept works – both AE pools are
> created, and both can process text, it is literally just that the
> SectionsBsv param setting persists between the two so that the second pool
> ends up using the same BSV file as the first one.
>
>
> From: Thomas W Loehfelm <[email protected]>
> Date: Friday, January 29, 2021 at 4:11 PM
> To: [email protected] <[email protected]>
> Subject: Passing SectionsBsv to piper containing BsvRegexSectionizer
> I have a cTAKES API endpoint based on the REST API and I am trying to
> specify a different BSV file depending on the type of text.
>
> My idea is to instantiate two different analysis engine pools, and direct
> text to one or the other depending on which type of report it is. This seems
> simpler to me than spinning up two entirely separate ctakes end points and
> using one for one type and one for the other, though I know that I could
> accomplish what I am looking to do by going that direction. It seems like I
> am missing something basic that is preventing my initial plan from working
> though.
>
> Let’s say the different AE pools are A and B as below, and say the
> PIPER_FILEs at the paths are the same except they hard code a different Bsv
> file like so:
> A_PIPER_FILE includes: add BsvRegexSectionizer SectionsBsv=resources/a.bsv
> B_PIPER_FILE includes: add BsvRegexSectionizer SectionsBsv=resources/b.bsv
>
> final PiperFileReader a_reader = new PiperFileReader(A_PIPER_FILE_PATH);
> final PipelineBuilder a_builder = a_reader.getBuilder();
> final AnalysisEngineDescription a_pipeline =
> a_builder.getAnalysisEngineDesc();
> _a_engine = UIMAFramework.produceAnalysisEngine(pipeline);
> _a_pool = new JCasPool( 2, _a_engine );
>
> final PiperFileReader b_reader = new PiperFileReader(B_PIPER_FILE_PATH);
> final PipelineBuilder b_builder = b_reader.getBuilder();
> final AnalysisEngineDescription b_pipeline =
> b_builder.getAnalysisEngineDesc();
> _b_engine = UIMAFramework.produceAnalysisEngine(pipeline);
> _b_pool = new JCasPool( 2, _b_engine );
>
>
> The problem I am running into is that the “B” analysis engine uses the
> “A” SectionsBsv file even though the piper files specify the correct one to
> use. It seems that once SectionsBsv is set, it is not reset even
> though a subsequent piper file may specify a different resource to use.
>
> Any ideas on what is happening, how I can clear or reset that param, or
> whether there is a different way to accomplish what I am trying to do?
>
> Things I have tried:
>
>   1.  Adding “_b_engine.reconfigure();” between _b_engine and _b_pool
> lines.
>      *   No effect.
>   2.  Removing the hard-coded SectionsBsv assignment from the piper file,
> using the SAME piper file for each instance, and passing in SectionsBsv as
> a param.
>      *   I am not sure how to do this using the construction above. I have
> looked into CliOptionals but do not have a good grasp of them.
>      *   I have tried adding “a_builder.set("SectionsBsv",
> "resources/a.bsv");” after the a_builder is created, but that had no effect
> either.
>
> Thanks in advance for your consideration.
>
> Tom
