Re: duplicate types in ctakes types system?

2019-10-24 Thread Dligach, Dmitriy
 was taken or administered.  
Value set includes Topical, Enteral_Oral, Parenteral_Intravenous, Other, 
undetermined, etc.
  
org.apache.ctakes.typesystem.type.refsem.Attribute
  

  value
  
  uima.cas.String

  


 
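<typeDescription>
  <name>org.apache.ctakes.typesystem.type.refsem.ProcedureMethod</name>
  <description>The way or the equipment used to give or administration 
something (medication, test). This corresponds to the Procedures UMLS semantic 
group.
More qualifying information on how the procedure was done.</description>
  <supertypeName>org.apache.ctakes.typesystem.type.refsem.Attribute</supertypeName>
  <features>
    <featureDescription>
      <name>value</name>
      <description/>
      <rangeTypeName>uima.cas.String</rangeTypeName>
    </featureDescription>
  </features>
</typeDescription>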


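<typeDescription>
  <name>org.apache.ctakes.typesystem.type.refsem.BodyLaterality</name>
  <description>The proximity of the location in anatomical terms (distal, 
proximal, superior, anterior and etc.). This is finer-grained to allow 
combinations of values.</description>
  <supertypeName>org.apache.ctakes.typesystem.type.refsem.Attribute</supertypeName>
  <features>
    <featureDescription>
      <name>value</name>
      <description/>
      <rangeTypeName>uima.cas.String</rangeTypeName>
    </featureDescription>
  </features>
</typeDescription>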



Dima


> On Oct 24, 2019, at 15:27, Dligach, Dmitriy wrote:
> 
> Dear cTAKES developers,
> 
> Does anybody know why quite a few types are defined multiple times in 
> TypeSystem.xml?
> 
> E.g. I see this at line 576:
> 
>
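> <typeDescription>
>   <name>org.apache.ctakes.typesystem.type.refsem.LabReferenceRange</name>
>   <description>Holds a narrative (i.e. string) reference range</description>
>   <supertypeName>org.apache.ctakes.typesystem.type.refsem.Attribute</supertypeName>
>   <features>
>     <featureDescription>
>       <name>value</name>
>       <description/>
>       <rangeTypeName>uima.cas.String</rangeTypeName>
>     </featureDescription>
>   </features>
> </typeDescription>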
>
> 
> And then I see this at line 2165:
> 
>
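> <typeDescription>
>   <name>org.apache.ctakes.typesystem.type.refsem.LabReferenceRange</name>
>   <description>Holds a narrative (i.e. string) reference range</description>
>   <supertypeName>org.apache.ctakes.typesystem.type.refsem.Attribute</supertypeName>
>   <features>
>     <featureDescription>
>       <name>value</name>
>       <description/>
>       <rangeTypeName>uima.cas.String</rangeTypeName>
>     </featureDescription>
>   </features>
> </typeDescription>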
>
> 
> Thank you in advance,
> 
> Dima
> 



duplicate types in ctakes types system?

2019-10-24 Thread Dligach, Dmitriy
Dear cTAKES developers,

Does anybody know why quite a few types are defined multiple times in 
TypeSystem.xml?

E.g. I see this at line 576:


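<typeDescription>
  <name>org.apache.ctakes.typesystem.type.refsem.LabReferenceRange</name>
  <description>Holds a narrative (i.e. string) reference range</description>
  <supertypeName>org.apache.ctakes.typesystem.type.refsem.Attribute</supertypeName>
  <features>
    <featureDescription>
      <name>value</name>
      <description/>
      <rangeTypeName>uima.cas.String</rangeTypeName>
    </featureDescription>
  </features>
</typeDescription>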


And then I see this at line 2165:


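<typeDescription>
  <name>org.apache.ctakes.typesystem.type.refsem.LabReferenceRange</name>
  <description>Holds a narrative (i.e. string) reference range</description>
  <supertypeName>org.apache.ctakes.typesystem.type.refsem.Attribute</supertypeName>
  <features>
    <featureDescription>
      <name>value</name>
      <description/>
      <rangeTypeName>uima.cas.String</rangeTypeName>
    </featureDescription>
  </features>
</typeDescription>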


Thank you in advance,

Dima
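
One way to list every type that is defined more than once is to count the type names in the descriptor. The following is a minimal sketch, not part of cTAKES: the class name DuplicateTypeFinder and its method are made up for illustration, and it assumes the file follows the usual UIMA descriptor layout in which each typeDescription element has a direct name child. It uses only the JDK's DOM parser.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper (not a cTAKES or UIMA class): counts type names in a
// UIMA type system descriptor so that duplicates can be reported.
final class DuplicateTypeFinder {

    /** Maps each type name to the number of typeDescription elements that declare it. */
    static Map<String, Integer> countTypeNames(InputStream xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        NodeList types = doc.getElementsByTagName("typeDescription");
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < types.getLength(); i++) {
            // Only the direct <name> child is the type name; <name> elements
            // nested deeper belong to feature descriptions and are skipped.
            for (Node child = types.item(i).getFirstChild(); child != null; child = child.getNextSibling()) {
                if (child instanceof Element && "name".equals(child.getNodeName())) {
                    counts.merge(child.getTextContent().trim(), 1, Integer::sum);
                    break;
                }
            }
        }
        return counts;
    }

    /** Prints every type name that is declared more than once in the given file. */
    public static void main(String[] args) throws Exception {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            countTypeNames(in).forEach((name, n) -> {
                if (n > 1) {
                    System.out.println(n + "x " + name);
                }
            });
        }
    }
}
```

Running it with the path to TypeSystem.xml as the only argument prints one line per repeated type name. It only reports repeats; it does not check whether the repeated definitions are identical.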



Re: Relation Extractor Training Data

2019-08-07 Thread Dligach, Dmitriy
Hi John,

The relation extractor in cTAKES was trained using SHARP data. Please contact 
the hNLP center (http://center.healthnlp.org) to see if you can get access to 
this dataset.

All the best,

Dima

--
Dmitriy (Dima) Dligach
Assistant Professor
Department of Computer Science
Loyola University Chicago

On Aug 7, 2019, at 08:07, Petersam, John Contractor 
<john.peter...@ssa.gov> wrote:

Hi,
Can someone point me to the original data set that was used to train the 
relation extractor?  We have a requirement to add some additional relations but 
want to maintain the existing set as well.

Thanks,
John Petersam



mapping drug mentions to rxcuis

2018-10-04 Thread Dligach, Dmitriy
Dear cTAKES developers and users,

cTAKES dictionary lookup is very much capable of identifying drug mentions 
(e.g. ‘morphine sulfate’). I believe ctakes-drug-ner is capable of identifying 
drug dosage, frequency, etc. (although I haven’t tried it — if somebody had 
good experience with it please let me know).

My main question is:

Is there a tool in cTAKES or elsewhere that could map drug mentions to RxCUIs? 
This is slightly different from what dictionary lookup does; e.g. RxCUI for 
MORPHINE SULFATE 4 MG/ML INJ SOLN is 894779, while for MORPHINE SULFATE 2 MG/ML 
INJ, it is 892588 (i.e. it differs because of the dosage).

Thank you in advance,


Dima






Re: Cannot authenticate license on REST API TRACKING:000308016 [EXTERNAL]

2018-07-19 Thread Dligach, Dmitriy
Hi Ritika,

We had similar issues back in February (you can search the email archive). I 
was under the impression that this happened when multiple instances of cTAKES 
were run on the same machine under different user names. A couple of times we 
were able to fix this problem by cleaning out a bunch of files from /tmp (such 
as conn.xml), but that “fix” stopped working at some point. We weren’t able to 
determine the root cause of this problem...

Ultimately, we replaced CentOS with Ubuntu on that machine and the problem went 
away for good.

Best,

Dima



On Jul 19, 2018, at 06:09, Finan, Sean 
<sean.fi...@childrens.harvard.edu> wrote:

Hi Ritika,

I am glad that adding your proxy information got you one step closer to a 
working configuration.  However, I cannot say why your password isn't being 
properly validated.  If you can reach the umls server and your credentials are 
correct then the umls server should reply positively and ctakes should let the 
pipeline continue.

Does anybody else on the devlist have any ideas?

Sean

From: Jain, Ritika <ritika.j...@philips.com>
Sent: Thursday, July 19, 2018 5:06 AM
To: dev@ctakes.apache.org
Subject: RE: Cannot authenticate license on REST API TRACKING:000308016 
[EXTERNAL]

I can get it working adding proxy parameters in the java command, now I do not 
get the connection timeout, but a different error that the user is not valid. 
If you follow the email chain below, the support person from UMLS says that my 
user is a valid user and the account user to validate the user is not for end 
point users.

Can you help me with this?



14:29:06,054 DEBUG [DataBinder] DataBinder requires binding of required fields 
[]
14:29:06,060 TRACE [TypeConverterDelegate] Converting String to [class 
java.lang.String] using property editor 
[org.apache.uima.fit.internal.propertyeditors.GetAsTextStringEditor@39c6fd02]
14:29:06,067 TRACE [TypeConverterDelegate] Converting String to [class 
java.lang.String] using property editor 
[org.apache.uima.fit.internal.propertyeditors.GetAsTextStringEditor@39c6fd02]
14:29:06,072 INFO  [Chunker] Chunker model file: 
org/apache/ctakes/chunker/models/chunker-model.zip
14:29:07,745 DEBUG [DataBinder] DataBinder requires binding of required fields 
[]
14:29:07,746 INFO  [TokenizerAnnotatorPTB] Initializing 
org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
14:29:07,756 DEBUG [DataBinder] DataBinder requires binding of required fields 
[]
14:29:07,770 INFO  [ContextDependentTokenizerAnnotator] Finite state machines 
loaded.
14:29:07,778 DEBUG [DataBinder] DataBinder requires binding of required fields 
[]
14:29:07,779 TRACE [TypeConverterDelegate] Converting String to [class 
java.lang.String] using property editor 
[org.apache.uima.fit.internal.propertyeditors.GetAsTextStringEditor@725e7dcc]
14:29:07,782 TRACE [TypeConverterDelegate] Converting String to [class 
java.lang.String] using property editor 
[org.apache.uima.fit.internal.propertyeditors.GetAsTextStringEditor@725e7dcc]
14:29:07,785 TRACE [TypeConverterDelegate] Converting String to [class 
java.lang.String] using property editor 
[org.apache.uima.fit.internal.propertyeditors.GetAsTextStringEditor@725e7dcc]
14:29:07,792 TRACE [TypeConverterDelegate] Converting String to [class 
java.lang.String] using property editor 
[org.apache.uima.fit.internal.propertyeditors.GetAsTextStringEditor@725e7dcc]
14:29:07,854 DEBUG [StandardEnvironment] Initializing new StandardEnvironment
14:29:07,857 DEBUG [StandardEnvironment] Adding [systemProperties] 
PropertySource with lowest search precedence
14:29:07,858 DEBUG [StandardEnvironment] Adding [systemEnvironment] 
PropertySource with lowest search precedence
14:29:07,860 DEBUG [StandardEnvironment] Initialized StandardEnvironment with 
PropertySources [systemProperties,systemEnvironment]
14:29:07,864 DEBUG [StandardEnvironment] Initializing new StandardEnvironment
14:29:07,865 DEBUG [StandardEnvironment] Adding [systemProperties] 
PropertySource with lowest search precedence
14:29:07,867 DEBUG [StandardEnvironment] Adding [systemEnvironment] 
PropertySource with lowest search precedence
14:29:07,870 DEBUG [StandardEnvironment] Initialized StandardEnvironment with 
PropertySources [systemProperties,systemEnvironment]
14:29:07,873 DEBUG [StandardEnvironment] Initializing new StandardEnvironment
14:29:07,875 DEBUG [StandardEnvironment] Adding [systemProperties] 
PropertySource with lowest search precedence
14:29:07,877 DEBUG [StandardEnvironment] Adding [systemEnvironment] 
PropertySource with lowest search precedence
14:29:07,880 DEBUG [StandardEnvironment] Initialized StandardEnvironment with 
PropertySources [systemProperties,systemEnvironment]
14:29:07,890 DEBUG [StandardEnvironment] Initializing new StandardEnvironment
14:29:07,892 DEBUG [StandardEnvironment] Adding [systemProperties] 
PropertySource with lowest search precedence
14:29:07,895 DEBUG 

Re: UmlsUserApprover Error [EXTERNAL]

2018-03-13 Thread Dligach, Dmitriy
Hi Sean and everybody,

I just wanted to confirm that I intermittently run into the same issue. I was 
able to fix it yesterday by removing a bunch of files from /tmp (such as 
conn.xml, which dictionary lookup apparently creates there under my user name). 
However, today, the problem returned and I haven’t found a way to fix it.

Again, the same pipeline runs fine on my laptop (Mac OS) and another machine 
that runs Linux.

Dima



On Mar 13, 2018, at 14:13, Andrew Phillips wrote:

Hi Sean,

I looked into changing the relevant configs to non-checking, but didn't
have much success. I am back to primarily trying to troubleshoot the
original error. I have tried removing files from /tmp (especially conn.xml)
as they seemed to be contributing to the issue. I also tried setting up
cTAKES with a freshly created account, and am still encountering this
issue. Other users on the server also have varying mileage with the same
task. Do you have any more ideas of anything else I can try?

Thank you.

*Andrew Phillips*
GitHub: github.com/skeledrew
LinkedIn: 
www.linkedin.com/in/aphillipstech

On 27 February 2018 at 17:45, Finan, Sean wrote:

Hi Andrew,

If you created your own dictionary then you are already bound by your own
license credentials used to get the umls download.  You do not need to use
the ctakes license check.

In your lookup configuration xml file you can change the UmlsJdbc** to the
non-checking Jdbc**.  There are several email threads with associated
information such as
http://mail-archives.apache.org/mod_mbox/ctakes-dev/201802.mbox/%3CCANLSW%2B%3DbGXkqDoY_fUMc%3DgvGjLbTfXfL4aq7vy-abe%3DW6qm6Vw%40mail.gmail.com%3E

Sean

-----Original Message-----
From: Andrew Phillips [mailto:aphilli...@luc.edu]
Sent: Tuesday, February 27, 2018 5:13 PM
To: dev@ctakes.apache.org
Subject: Re: UmlsUserApprover Error [EXTERNAL]

Hi Sean,

I am using a dictionary created from the 2017AA-full which I downloaded
from the UTS site. Does this count as default?

*Andrew Phillips*
GitHub: github.com/skeledrew
LinkedIn: 
www.linkedin.com/in/aphillipstech

On 27 February 2018 at 15:41, Finan, Sean wrote:

Hi Andrew,

As far as I know there isn't an explicit timeout imposed by ctakes.
There is probably a java or system timeout, but I don't know of an
easy way to change it.

If you use the default dictionary then you should allow the account
check.

Sean

-----Original Message-----
From: Andrew Phillips [mailto:aphilli...@luc.edu]
Sent: Tuesday, February 27, 2018 4:27 PM
To: dev@ctakes.apache.org
Subject: Re: UmlsUserApprover Error [EXTERNAL]

@Gandhi: I have always skipped the tests whenever I run the install
command.

@Sean: 2 of the 4 folders I ran on were eventually processed, and only
after running the script multiple times, so it does seem to be a
connection issue. However I am trying to run the commands for the last
couple of folders manually and the error has been consistent. Is there
any way to bypass the account check, or increase the timeout? Or can I
just have a loop that continually retries until the operation succeeds?

*Andrew Phillips*
GitHub: github.com/skeledrew
LinkedIn: 
www.linkedin.com/in/aphillipstech
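
The retry loop asked about above could look roughly like the following. This is a generic sketch under stated assumptions, not a cTAKES feature: the Retry class and withBackoff method are made up for illustration, and the task passed in stands for whatever code launches the pipeline. It retries a bounded number of times and doubles the wait between attempts rather than looping forever.

```java
import java.util.concurrent.Callable;

// Hypothetical helper (not part of cTAKES): retries a task a limited number
// of times, doubling the pause after each failed attempt.
final class Retry {

    static <T> T withBackoff(Callable<T> task, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (InterruptedException e) {
                // Do not retry when the thread was asked to stop.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);
                    delay *= 2;
                }
            }
        }
        // Every attempt failed: surface the most recent failure.
        throw last;
    }
}
```

A call such as Retry.withBackoff(() -> runPipelineOnFolder(folder), 5, 2000L) would try up to five times, where runPipelineOnFolder is a placeholder for the caller's own code. Retrying only helps if the failure is transient, for example a slow credential server, and it does not bypass the account check.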

On 27 February 2018 at 07:18, Finan, Sean wrote:

Hi Andrew,

You wrote:
I ran my script earlier, [...]and only the first run was successful.

Are you saying that one run did succeed?  If that is the case then
the problem probably is your network.

The umls credential check will print dots in the log as time
progresses, such as the dots from your log below:
24 Feb 2018 18:22:25  INFO UmlsUserApprover - Checking UMLS
Account
[ ... ]
.. 10 ...
24 Feb 2018 18:22:40 ERROR UmlsUserApprover - 
uts-ws.nlm.nih.gov

It looks like the credential check took several  (~15?) seconds,
which might indicate a slow connection or an eventual connection
refused.
That does not mean that the slowdown is on your side.  It could be
that the nih server that handles the credential checks is rarely
getting to your request.  I'm not sure why that would be (No net
neutrality rants, please).

Anyway, if the credential check works even once then that is a good
indication that the problem is outside ctakes.

Sean






-----Original Message-----
From: Gandhi Rajan Natarajan
[mailto:gandhi.natara...@arisglobal.com]
Sent: Tuesday, February 27, 2018 7:49 AM
To: dev@ctakes.apache.org

Re: CAS Visual Debugger - [EXTERNAL]

2017-10-25 Thread Dligach, Dmitriy
+1

Also, I’d love to be able to point CVD to a directory containing XMI files at 
startup.

Dima



On Oct 25, 2017, at 12:41, Miller, Timothy wrote:

I've had the same thought, and come to the same conclusions.
Tim


From: Melvin Ma
Sent: Wednesday, October 25, 2017 1:33 PM
To: dev@ctakes.apache.org
Subject: CAS Visual Debugger - [EXTERNAL]

This is more of a question. I am fully aware that CAS Visual Debugger is
maintained in UIMA project.

For now, I frequently need to use CVD to view .xmi files. It would be really
nice if I could pass the type system XML as a CVD startup argument (instead
of manually looking up this file and loading it). Do you know any way to do
it? I checked the documentation multiple times and was not able to find
anything.

Thanks.

Melvin



Re: ctakes demo down [EXTERNAL]

2017-09-04 Thread Dligach, Dmitriy
James, thanks for pointing out about creating a JIRA issue. I created one:

https://issues.apache.org/jira/browse/CTAKES-454

Dima



On Sep 1, 2017, at 18:21, James Masanz <masanz.ja...@gmail.com> wrote:

Hi Dima,

I would be surprised if the dictionary demo was gone for good since the
temporal one is working, and they were both updated to 4.0 earlier this
year, but can't say, since others maintain those servers.

Not sure if you noticed further down that page, for support it suggests to
submit a JIRA issue via https://issues.apache.org/jira/browse/CTAKES

-- James

On Fri, Sep 1, 2017 at 5:50 PM, Finan, Sean
<Sean.Finan@childrens.harvard.edu> wrote:

Hi Dima,

It looks like the temporal demo is up and running.

Sean
____
From: Dligach, Dmitriy <ddlig...@luc.edu>
Sent: Friday, September 1, 2017 4:16 PM
To: cTAKES Developer list
Subject: ctakes demo down [EXTERNAL]

I wanted to demo cTAKES to a physician collaborator and discovered that
the dictionary lookup demo (http://healthnlp.github.io/examples/) is
down. Does anybody know if it’s down for good or there is some simple way
to bring it back up?




Thank you in advance,



Dima











ctakes demo down

2017-09-01 Thread Dligach, Dmitriy
I wanted to demo cTAKES to a physician collaborator and discovered that the 
dictionary lookup demo (http://healthnlp.github.io/examples/) is down. Does 
anybody know if it’s down for good or there is some simple way to bring it back 
up?

Thank you in advance,

Dima





Re: negation/uncertainty: pipeline runs very slowly [EXTERNAL]

2017-06-30 Thread Dligach, Dmitriy
Hi Tim,

Good point, but I happen to be using the ctakes-core sentence detector.

Dima



> On Jun 23, 2017, at 06:31, Miller, Timothy 
> <timothy.mil...@childrens.harvard.edu> wrote:
> 
> Something I just thought of is that if you are using the new (beta) sentence 
> detector trained on Mimic, it is a bit of a "lumper" rather than a 
> "splitter," meaning it is more likely to miss a sentence break and make 
> longer sentences, sometimes absurdly long if there are no clear cues. I know 
> that will slow down the constituency parser and dependency parser, but not 
> sure why it would only slow down when negation processing is added. So, not a 
> solution but something to keep in mind while debugging, especially if it 
> interacts with Steve and Sean's feedback.
> Tim
> 
> 
> 
> From: Dligach, Dmitriy <ddlig...@luc.edu>
> Sent: Wednesday, June 21, 2017 9:18 PM
> To: dev@ctakes.apache.org
> Cc: Miller, Timothy
> Subject: Re: negation/uncertainty: pipeline runs very slowly [EXTERNAL]
> 
> Sean, thanks for your comments. You are right. The slowdown doesn’t have 
> anything to do with documentID.
> 
> I am now convinced that the slowdown has to do with the Polarity annotator. 
> The reason you and others haven’t seen this in other pipelines is that you’ve 
> probably been processing relatively small files.
> 
> I am processing MIMIC patient files, which typically have thousands of words. 
> I just tried to process 300 files from the THYME corpus (where the files have 
> hundreds of words) and the slowdown was barely noticeable. When running the 
> same pipeline on the MIMIC files, the slowdown becomes very noticeable.
> 
> 
> Dima
> 
> 
> 
>> On Jun 5, 2017, at 10:42, Finan, Sean <sean.fi...@childrens.harvard.edu> 
>> wrote:
>> 
>> Hi Dima,
>> 
>> It looks like the UriCollectionReader that you are using never sets a 
>> document id (type DocumentID) in the cas.  However, this shouldn't be a 
>> problem as each document will be assigned a unique id "UnknownDocument"{###} 
>> where {###} is a number incremented per new document with an unknown id.  
>> The message that you are seeing is just a warning.  The code fetching the 
>> documentID and creating a default are very simple and should not take any 
>> real processing time.
>> 
>> The call to get document id is the very first line in 
>> AssertionCleartkAnalysisEngine:
>> @Override
>> public void process(JCas jCas) throws AnalysisEngineProcessException
>> {
>>   String documentId = DocumentIDAnnotationUtil.getDocumentID(jCas);
>> 
>> So, the slowdown occurring after the warning message leads me to believe 
>> that the problem lies later in the process ...
>> 
>> My suggestion is that you put a breakpoint there and run your pipeline 
>> through a debugger.  Optionally, there are a couple of log.debug messages in 
>> that class, so you could change the granularity of your log4j and see if you 
>> can narrow down the problem.  Add more debug statements if it helps.
>> 
>> At any rate, I have not seen this problem in other pipelines.
>> 
>> Sean
>> 
>> -----Original Message-----
>> From: Dligach, Dmitriy [mailto:ddlig...@luc.edu]
>> Sent: Wednesday, May 24, 2017 10:34 AM
>> To: cTAKES Developer list
>> Subject: negation/uncertainty: pipeline runs very slowly
>> 
>> Dear cTAKES developers,
>> 
>> I am observing something strange. As soon as I add at the end of my pipeline 
>> the uncertainty/negation AEs:
>> 
>> aggregateBuilder.add( 
>> PolarityCleartkAnalysisEngine.createAnnotatorDescription() ); 
>> aggregateBuilder.add( 
>> UncertaintyCleartkAnalysisEngine.createAnnotatorDescription() );
>> 
>> the pipeline becomes 10-20 times slower. I just confirmed this again. As 
>> soon as I remove these two AEs at the end of my pipeline, it runs very fast 
>> again.
>> 
>> It seems to get stuck often right after it outputs this warning:
>> WARN DocumentIDAnnotationUtil - Unable to find DocumentIDAnnotation
>> 
>> If I remove the two AEs, this warning disappears.
>> 
>> The full pipeline is here:
>> https://github.com/dmitriydligach/ctakes-misc/blob/master/src/main/java/org/apache/ctakes/pipelines/UmlsLookupPipeline.java
>> 
>> Any clues?
>> 
>> Thank you very much,
>> 
>> Dima
>> 
>> 
>> 
> 



Re: negation/uncertainty: pipeline runs very slowly [EXTERNAL]

2017-06-30 Thread Dligach, Dmitriy
Hi Sean,

First of all, thank you Sean, Steve, and Tim for giving this a thought. I 
definitely agree that the problem lies in this line:

List<Sentence> sents = new ArrayList<>(JCasUtil.selectCovering(jCas, 
Sentence.class, entityOrEventMention.getBegin(), 
entityOrEventMention.getEnd()));

The negation AE runs fine on shorter documents but as soon as I try to run it 
on large documents, which have LOTS of sentences, it becomes extremely slow.

I am sorry I haven’t been able to try the proposed solutions. I may have a 
little time after the long weekend. 

In the meantime, I created a JIRA issue: 
https://issues.apache.org/jira/browse/CTAKES-449

Thanks again for your help.

Dima
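
To illustrate why a per-mention lookup gets expensive on long documents, and one way around it, here is a small sketch. It is not the fix recorded in CTAKES-449 and does not use UIMA types: SentenceLookup and Span are made-up stand-ins, and it assumes sentences do not overlap. The idea is to sort the sentence spans once per document and then find the covering sentence for each mention with a binary search, so the cost per mention grows with the logarithm of the sentence count rather than with the sentence count itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a per-document sentence index (not a cTAKES or
// uimaFIT class). Assumes sentence spans do not overlap.
final class SentenceLookup {

    /** Simple character-offset span, standing in for an annotation. */
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    private final List<Span> sentences;

    /** Copies and sorts the sentences by begin offset, once per document. */
    SentenceLookup(List<Span> input) {
        sentences = new ArrayList<>(input);
        sentences.sort((a, b) -> Integer.compare(a.begin, b.begin));
    }

    /** Returns the sentence covering [begin, end], or null if none does. */
    Span covering(int begin, int end) {
        int lo = 0;
        int hi = sentences.size() - 1;
        Span candidate = null;
        // Binary search for the last sentence that starts at or before begin.
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            Span s = sentences.get(mid);
            if (s.begin <= begin) {
                candidate = s;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        // That sentence covers the mention only if it also ends late enough.
        return (candidate != null && candidate.end >= end) ? candidate : null;
    }
}
```

The index would be built once in process() and queried for every mention. uimaFIT's JCasUtil.indexCovering may serve a similar purpose inside a real annotator, although whether it suits this case has not been checked here.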



> On Jun 30, 2017, at 07:26, Finan, Sean <sean.fi...@childrens.harvard.edu> 
> wrote:
> 
> Hi Dima,
> Have you had a chance to play with the proposed solutions?  If not then let 
> us know and somebody will eventually get to it.
> Meanwhile, would you mind submitting a tar on jira?
> Thanks,
> Sean
> 
> -----Original Message-----
> From: Dligach, Dmitriy [mailto:ddlig...@luc.edu] 
> Sent: Wednesday, June 21, 2017 3:18 PM
> To: dev@ctakes.apache.org
> Cc: Miller, Timothy
> Subject: Re: negation/uncertainty: pipeline runs very slowly [EXTERNAL]
> 
> Sean, thanks for your comments. You are right. The slowdown doesn’t have 
> anything to do with documentID.
> 
> I am now convinced that the slowdown has to do with the Polarity annotator. 
> The reason you and others haven’t seen this in other pipelines is that you’ve 
> probably been processing relatively small files. 
> 
> I am processing MIMIC patient files, which typically have thousands of words. 
> I just tried to process 300 files from the THYME corpus (where the files have 
> hundreds of words) and the slowdown was barely noticeable. When running the 
> same pipeline on the MIMIC files, the slowdown becomes very noticeable.
> 
> 
> Dima
> 
> 
> 
>> On Jun 5, 2017, at 10:42, Finan, Sean <sean.fi...@childrens.harvard.edu> 
>> wrote:
>> 
>> Hi Dima,
>> 
>> It looks like the UriCollectionReader that you are using never sets a 
>> document id (type DocumentID) in the cas.  However, this shouldn't be a 
>> problem as each document will be assigned a unique id "UnknownDocument"{###} 
>> where {###} is a number incremented per new document with an unknown id.  
>> The message that you are seeing is just a warning.  The code fetching the 
>> documentID and creating a default are very simple and should not take any 
>> real processing time.
>> 
>> The call to get document id is the very first line in 
>> AssertionCleartkAnalysisEngine:
>> @Override
>> public void process(JCas jCas) throws AnalysisEngineProcessException  
>> {
>>   String documentId = DocumentIDAnnotationUtil.getDocumentID(jCas);
>> 
>> So, the slowdown occurring after the warning message leads me to believe 
>> that the problem lies later in the process ...
>> 
>> My suggestion is that you put a breakpoint there and run your pipeline 
>> through a debugger.  Optionally, there are a couple of log.debug messages in 
>> that class, so you could change the granularity of your log4j and see if you 
>> can narrow down the problem.  Add more debug statements if it helps.
>> 
>> At any rate, I have not seen this problem in other pipelines.
>> 
>> Sean
>> 
>> -----Original Message-----
>> From: Dligach, Dmitriy [mailto:ddlig...@luc.edu]
>> Sent: Wednesday, May 24, 2017 10:34 AM
>> To: cTAKES Developer list
>> Subject: negation/uncertainty: pipeline runs very slowly
>> 
>> Dear cTAKES developers,
>> 
>> I am observing something strange. As soon as I add at the end of my pipeline 
>> the uncertainty/negation AEs:
>> 
>> aggregateBuilder.add( 
>> PolarityCleartkAnalysisEngine.createAnnotatorDescription() ); 
>> aggregateBuilder.add( 
>> UncertaintyCleartkAnalysisEngine.createAnnotatorDescription() );
>> 
>> the pipeline becomes 10-20 times slower. I just confirmed this again. As 
>> soon as I remove these two AEs at the end of my pipeline, it runs very fast 
>> again.
>> 
>> It seems to get stuck often right after it outputs this warning:
>> WARN DocumentIDAnnotationUtil - Unable to find DocumentIDAnnotation
>> 
>> If I remove the two AEs, this warning disappears.
>> 
>> The full pipeline is here:
>> https://github.com/dmitriydligach/ctakes-misc/blob/master/src/main/java/org/apache/ctakes/pipelines/UmlsLookupPipeline.java
>> 
>> Any clues?
>> 
>> Thank you very much,
>> 
>> Dima
>> 
>> 
>> 
> 



Re: negation/uncertainty: pipeline runs very slowly

2017-06-21 Thread Dligach, Dmitriy
Sean, thanks for your comments. You are right. The slowdown doesn’t have 
anything to do with documentID.

I am now convinced that the slowdown has to do with the Polarity annotator. The 
reason you and others haven’t seen this in other pipelines is that you’ve 
probably been processing relatively small files. 

I am processing MIMIC patient files, which typically have thousands of words. I 
just tried to process 300 files from the THYME corpus (where the files have 
hundreds of words) and the slowdown was barely noticeable. When running the 
same pipeline on the MIMIC files, the slowdown becomes very noticeable.


Dima



> On Jun 5, 2017, at 10:42, Finan, Sean <sean.fi...@childrens.harvard.edu> 
> wrote:
> 
> Hi Dima,
> 
> It looks like the UriCollectionReader that you are using never sets a 
> document id (type DocumentID) in the cas.  However, this shouldn't be a 
> problem as each document will be assigned a unique id "UnknownDocument"{###} 
> where {###} is a number incremented per new document with an unknown id.  The 
> message that you are seeing is just a warning.  The code fetching the 
> documentID and creating a default are very simple and should not take any 
> real processing time.
> 
> The call to get document id is the very first line in 
> AssertionCleartkAnalysisEngine:
>  @Override
>  public void process(JCas jCas) throws AnalysisEngineProcessException
>  {
>    String documentId = DocumentIDAnnotationUtil.getDocumentID(jCas);
> 
> So, the slowdown occurring after the warning message leads me to believe that 
> the problem lies later in the process ...
> 
> My suggestion is that you put a breakpoint there and run your pipeline 
> through a debugger.  Optionally, there are a couple of log.debug messages in 
> that class, so you could change the granularity of your log4j and see if you 
> can narrow down the problem.  Add more debug statements if it helps.
> 
> At any rate, I have not seen this problem in other pipelines.
> 
> Sean
> 
> -----Original Message-----
> From: Dligach, Dmitriy [mailto:ddlig...@luc.edu] 
> Sent: Wednesday, May 24, 2017 10:34 AM
> To: cTAKES Developer list
> Subject: negation/uncertainty: pipeline runs very slowly
> 
> Dear cTAKES developers, 
> 
> I am observing something strange. As soon as I add at the end of my pipeline 
> the uncertainty/negation AEs:
> 
> aggregateBuilder.add( 
> PolarityCleartkAnalysisEngine.createAnnotatorDescription() ); 
> aggregateBuilder.add( 
> UncertaintyCleartkAnalysisEngine.createAnnotatorDescription() );
> 
> the pipeline becomes 10-20 times slower. I just confirmed this again. As soon 
> as I remove these two AEs at the end of my pipeline, it runs very fast again.
> 
> It seems to get stuck often right after it outputs this warning:
> WARN DocumentIDAnnotationUtil - Unable to find DocumentIDAnnotation
> 
> If I remove the two AEs, this warning disappears.
> 
> The full pipeline is here:
> https://github.com/dmitriydligach/ctakes-misc/blob/master/src/main/java/org/apache/ctakes/pipelines/UmlsLookupPipeline.java
>  
> 
> Any clues?
> 
> Thank you very much,
> 
> Dima
> 
> 
> 



Re: mvn install error on ctakes-temporal module test

2017-05-18 Thread Dligach, Dmitriy
You may be getting the same “URI is not hierarchical” error I was experiencing 
last month. This is a known problem. While it’s getting fixed, there’s a 
temporary workaround. Please take a look at the message archive and look for 
the messages with the title “URI is not hierarchical”. Here’s the gist of what 
you need to do:

"I checked out lvg from 
svn.code.sf.net/p/ctakesresources/code/trunk/ctakes-resources-lvg2008/src/main/resources/org/apache/ctakes/lvg/
 and put it in target/classes/org/apache/ctakes/."

Dima



> On May 18, 2017, at 13:55, Mullane, Sean *HS wrote:
> 
> I have so far been unable to build and install ctakes 4.0.0 or 
> 4.0.1-snapshot. With both versions, the mvn clean install fails on the 
> ctakes-temporal module test. Environment info and test results below. Has 
> anyone else encountered this? This seems similar to the "URI is not 
> hierarchical" thread from last month but it's not clear to me whether that 
> thread applies here.
> 
> Environment:
> Windows 7 Pro
> Eclipse 4.5 Mars
> jdk1.8.0_131
> 
> I followed the step by step install instructions here exactly: 
> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+Developer+Install+Guide
> Then in Eclipse I right-clicked the top-level package -> Run As -> Maven 
> Build... -> mvn clean install
> 
> The results are below:
> 
> 
> ---
> T E S T S
> ---
> Running org.apache.ctakes.temporal.ae.BackwardsTimeAnnotatorTest
> log4j: reset attribute= "false".
> log4j: Threshold ="null".
> log4j: Retreiving an instance of org.apache.log4j.Logger.
> log4j: Setting [ProgressAppender] additivity to [false].
> log4j: Level value for ProgressAppender is  [INFO].
> log4j: ProgressAppender level set to INFO
> log4j: Class name: [org.apache.log4j.ConsoleAppender]
> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
> log4j: Setting property [conversionPattern] to [%m].
> log4j: Adding appender named [noEolAppender] to category [ProgressAppender].
> log4j: Retreiving an instance of org.apache.log4j.Logger.
> log4j: Setting [ProgressDone] additivity to [false].
> log4j: Level value for ProgressDone is  [INFO].
> log4j: ProgressDone level set to INFO
> log4j: Class name: [org.apache.log4j.ConsoleAppender]
> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
> log4j: Setting property [conversionPattern] to [%m%n].
> log4j: Adding appender named [eolAppender] to category [ProgressDone].
> log4j: Level value for root is  [INFO].
> log4j: root level set to INFO
> log4j: Class name: [org.apache.log4j.ConsoleAppender]
> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
> log4j: Setting property [conversionPattern] to [%d{dd MMM  HH:mm:ss} %5p 
> %c{1} - %m%n].
> log4j: Adding appender named [consoleAppender] to category [root].
> 18 May 2017 14:09:40  INFO LvgAnnotator - URL for lvg.properties 
> =file:/C:/Users/*/.m2/repository/net/sourceforge/ctakesresources/ctakes-resources-lvg2008/4.0.0/ctakes-resources-lvg2008-4.0.0.jar!/org/apache/ctakes/lvg/data/config/lvg.properties
> 18 May 2017 14:09:41  INFO SentenceDetector - Sentence detector model file: 
> org/apache/ctakes/core/sentdetect/sd-med-model.zip
> 18 May 2017 14:09:42  INFO TokenizerAnnotatorPTB - Initializing 
> org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 3.311 sec <<< 
> FAILURE!
> Running org.apache.ctakes.temporal.ae.ContextualModalityAnnotatorTest
> 18 May 2017 14:09:42  INFO LvgAnnotator - URL for lvg.properties 
> =file:/C:/Users/*/.m2/repository/net/sourceforge/ctakesresources/ctakes-resources-lvg2008/4.0.0/ctakes-resources-lvg2008-4.0.0.jar!/org/apache/ctakes/lvg/data/config/lvg.properties
> 18 May 2017 14:09:44  INFO SentenceDetector - Sentence detector model file: 
> org/apache/ctakes/core/sentdetect/sd-med-model.zip
> 18 May 2017 14:09:44  INFO TokenizerAnnotatorPTB - Initializing 
> org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.355 sec <<< 
> FAILURE!
> Running org.apache.ctakes.temporal.ae.EventAnnotatorTest
> 18 May 2017 14:09:45  INFO LvgAnnotator - URL for lvg.properties 
> =file:/C:/Users/*/.m2/repository/net/sourceforge/ctakesresources/ctakes-resources-lvg2008/4.0.0/ctakes-resources-lvg2008-4.0.0.jar!/org/apache/ctakes/lvg/data/config/lvg.properties
> 18 May 2017 14:09:46  INFO SentenceDetector - Sentence detector model file: 
> org/apache/ctakes/core/sentdetect/sd-med-model.zip
> 18 May 2017 14:09:46  INFO TokenizerAnnotatorPTB - Initializing 
> org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.653 sec <<< 
> FAILURE!
> Running org.apache.ctakes.temporal.ae.EventEventRelationAnnotatorTest
> 18 May 2017 14:09:46  INFO LvgAnnotator - URL for lvg.properties 
> 

Re: Temporal module dictionary

2017-05-11 Thread Dligach, Dmitriy
Hi Erin,

I would expect the relation annotator to catch contains(for 4 years, sinusitis), 
but clearly it’s far from perfect.

I am surprised it catches contains(4 years ago, *city*), but that is mostly 
because *city* is marked as an event (not sure why). 

Dima



> On May 11, 2017, at 12:00, Erin Gustafson <erin.gustaf...@northwestern.edu> 
> wrote:
> 
> Hey Dima,
> 
> Yes, that is one solution! Perhaps that's just the best way to go.
> 
> I'm seeing right now that the annotator is missing some relations that I 
> would expect it to catch (unless I'm misunderstanding something, which is 
> possible). For example:
> 
> He reports sinusitis and rhinitis for 4 years, which started when he moved to 
> *city* 4 years ago.
> -> misses sinusitis for 4 years, rhinitis for 4 years
> -> catches *city* 4 years ago
> 
> Would you expect those sorts of expressions to be captured by the temporal 
> module?
> 
> Thanks,
> Erin
> 
> 
> -----Original Message-----
> From: Dligach, Dmitriy [mailto:ddlig...@luc.edu] 
> Sent: Thursday, May 11, 2017 11:50 AM
> To: dev@ctakes.apache.org
> Subject: Re: Temporal module dictionary
> 
> Hi Erin,
> 
> Is it an option to use all events and then just post-process the output of 
> temporal relation extraction to include the events you are interested in? The 
> temporal module may break if you exclude some events.
> 
> Dima
> 
> 
> 
>> On May 11, 2017, at 11:44, Erin Gustafson <erin.gustaf...@northwestern.edu> 
>> wrote:
>> 
>> Hi all,
>> 
>> I would like to use the temporal module to detect temporal relations 
>> involving events specific to a phenotype. I've created a custom .bsv 
>> dictionary with a limited set of concepts relevant to that phenotype, which 
>> I have used in the past as input to the dictionary look-up algorithm. Now 
>> I'd like to try to use the same dictionary with the temporal module to limit 
>> the extracted relations to those involving events of interest.
>> 
>> Is it possible to do this? I've plugged my dictionary in to 
>> FullTemporalExtractionPipeline, but the detected events still include 
>> concepts that fall outside my dictionary.
>> 
>> Thanks,
>> Erin
> 



Re: Temporal module dictionary

2017-05-11 Thread Dligach, Dmitriy
Hi Erin,

Is it an option to use all events and then just post-process the output of 
temporal relation extraction to include the events you are interested in? The 
temporal module may break if you exclude some events.
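
A minimal sketch of that post-processing step, in plain Java. The `Relation` class and the `keepRelevant` helper are hypothetical stand-ins, not cTAKES types: in a real pipeline the relation arguments would be read from the temporal relation annotations, and the terms of interest from the custom dictionary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class RelationFilter {

    /** Hypothetical stand-in for one extracted temporal relation; not a cTAKES type. */
    public static final class Relation {
        public final String category;   // e.g. "CONTAINS"
        public final String timeText;   // covered text of the time expression
        public final String eventText;  // covered text of the event

        public Relation(String category, String timeText, String eventText) {
            this.category = category;
            this.timeText = timeText;
            this.eventText = eventText;
        }
    }

    /**
     * Keeps only the relations whose event text is one of the terms of interest.
     * The terms are expected in lower case; the event text is lower-cased before lookup.
     */
    public static List<Relation> keepRelevant(List<Relation> all, Set<String> termsOfInterest) {
        List<Relation> kept = new ArrayList<>();
        for (Relation r : all) {
            if (termsOfInterest.contains(r.eventText.toLowerCase(Locale.ROOT))) {
                kept.add(r);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> terms = new HashSet<>(Arrays.asList("sinusitis", "rhinitis"));
        List<Relation> all = Arrays.asList(
                new Relation("CONTAINS", "for 4 years", "sinusitis"),
                new Relation("CONTAINS", "4 years ago", "*city*"));
        for (Relation r : keepRelevant(all, terms)) {
            System.out.println(r.category + "(" + r.timeText + ", " + r.eventText + ")");
        }
    }
}
```

The temporal module still sees every event this way; only its output is narrowed afterwards.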

Dima



> On May 11, 2017, at 11:44, Erin Gustafson  
> wrote:
> 
> Hi all,
> 
> I would like to use the temporal module to detect temporal relations 
> involving events specific to a phenotype. I've created a custom .bsv 
> dictionary with a limited set of concepts relevant to that phenotype, which I 
> have used in the past as input to the dictionary look-up algorithm. Now I'd 
> like to try to use the same dictionary with the temporal module to limit the 
> extracted relations to those involving events of interest.
> 
> Is it possible to do this? I've plugged my dictionary in to 
> FullTemporalExtractionPipeline, but the detected events still include 
> concepts that fall outside my dictionary.
> 
> Thanks,
> Erin



Re: URI is not hierarchical

2017-05-01 Thread Dligach, Dmitriy
James,

I was able to get around the “URI not hierarchical” issue by doing what you 
suggested!

I checked out lvg from 
svn.code.sf.net/p/ctakesresources/code/trunk/ctakes-resources-lvg2008/src/main/resources/org/apache/ctakes/lvg/
 and put it in target/classes/org/apache/ctakes/.
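
For anyone wondering why the copy helps: with lvg under target/classes the resource resolves to a plain file: URL instead of a location inside a jar. A rough sketch of code that copes with both cases is below. The `ResourceFiles` class and `toFile` helper are made up for illustration and are not part of cTAKES; the sketch handles a single file only, whereas lvg needs its whole data subtree, which is why copying the directory was the practical workaround here.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class ResourceFiles {

    /**
     * Returns a File for the given classpath resource.
     * A file: URL is converted directly; any other URL (for example one that
     * points inside a jar) is copied to a temporary file first.
     */
    public static File toFile(URL resource) {
        try {
            if ("file".equals(resource.getProtocol())) {
                return new File(resource.toURI());
            }
            Path copy = Files.createTempFile("resource-", ".tmp");
            copy.toFile().deleteOnExit();
            try (InputStream in = resource.openStream()) {
                Files.copy(in, copy, StandardCopyOption.REPLACE_EXISTING);
            }
            return copy.toFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Bad resource URL: " + resource, e);
        }
    }

    public static void main(String[] args) {
        // String.class is used only because it is always available as a resource.
        File f = toFile(String.class.getResource("String.class"));
        System.out.println(f + " (" + f.length() + " bytes)");
    }
}
```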

Thank you so much, you made my day today :)

Dima



> On Apr 29, 2017, at 01:57, James Masanz <masanz.ja...@gmail.com> wrote:
> 
> Hi Dima,
> 
> I modified my local copy of FileLocator to avoid the
> StringIndexOutOfBoundsException I was seeing using HEAD, and here's the
> workaround for the problem you were seeing until the code change is made to
> LvgCmdApiResourceImpl:
> 
> Copy the lvg directory that's in
> 
> %CTAKES_HOME%/resources/org/apache/ctakes/
> 
> to be under the  target/classes/org/apache/ctakes  directory that's within
> your local copy of ctakes-misc
> 
> so for example, on my system, after the copy, I see
> C:\from.svn\ctakes-misc.git\trunk\target\classes\org\apache\ctakes\lvg\data\config\lvg.properties
> 
> (For anyone else reading this, that resources directory will show up when
> you do   mvn clean install   of cTAKES)
> 
> If you want lvg to use the actual lvg resources, you can't just copy the
> lvg.properties file for this workaround; you have to copy the entire
> subtree, or it will create an empty lvg database.
> 
> 
> I'll check out the StringIndexOutOfBoundsException on Monday unless someone
> else beats me to it before then.
> 
> 
> FYI, to test this workaround, before I did the copy, I was seeing
> 
> java.lang.IllegalArgumentException: URI is not hierarchical
>     at java.io.File.<init>(File.java:418)
>     at org.apache.ctakes.lvg.resource.LvgCmdApiResourceImpl.load(LvgCmdApiResourceImpl.java:65)
> 
> After I copied that lvg directory, I was then able to see output in the
> outputDirectory defined by UmlsLookupPipeline.java
> 
> FYI #2, I'm using Win 7 Pro. This workaround should work for linux/mac but
> was not tested there.
> 
> FYI #3, this was using your idea of not having ctakes-misc be under ctakes.
> thanks for that tip!  keeps things nicely separated - no need to update
> the parent pom at all.
> 
> 
> 
> On Sat, Apr 29, 2017 at 2:00 AM, James Masanz <masanz.ja...@gmail.com>
> wrote:
> 
>> Hi Dima,
>> 
>> what revision of trunk are you using?  I'm getting an error you weren't
>> seeing so I'm guessing it's because I checked out ctakes just today.
>> 
>> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -7
>>     at java.lang.String.substring(String.java:1967)
>>     at org.apache.ctakes.dictionary.lookup2.util.JdbcConnectionFactory.getConnectionUrl(JdbcConnectionFactory.java:110)
>>     at org.apache.ctakes.dictionary.lookup2.util.JdbcConnectionFactory.getConnection(JdbcConnectionFactory.java:63)
>>     at org.apache.ctakes.dictionary.lookup2.dictionary.JdbcRareWordDictionary.<init>(JdbcRareWordDictionary.java:91)
>>     at org.apache.ctakes.dictionary.lookup2.dictionary.JdbcRareWordDictionary.<init>(JdbcRareWordDictionary.java:72)
>>     at org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary.<init>(UmlsJdbcRareWordDictionary.java:31)
>>     ... 27 more
>> 
>> FYI, I created the directories needed by UmlsLookupPipeline.java for
>> chunker-model.zip, inputDirectory and outputDirectory, and I get the
>> above regardless of whether I use a text input file containing just "pain
>> in left knee started on Wednesday." or GenSurg_UmbilicalHernia_1.rtf as
>> the input file instead.
>> 
>> 
>> On Fri, Apr 28, 2017 at 5:48 PM, Dligach, Dmitriy <ddlig...@luc.edu>
>> wrote:
>> 
>>> Hi James,
>>> 
>>> Thank you so much for looking into this!
>>> 
>>> Your general setup matches mine. I also do:
>>> 
>>> 1. svn co https://svn.apache.org/repos/asf/ctakes/trunk/
>>> 2. git clone https://github.com/dmitriydligach/ctakes-misc.git (in
>>> trunk/)
>>> 3. mvn clean compile (in trunk/)
>>> 4. mvn clean compile (in ctakes-misc/)
>>> 
>>> BTW, I just discovered that it’s not necessary to check out a fresh copy
>>> of ctakes-misc into a subdirectory in trunk. It will build no matter where
>>> it is on your system as long as you first do an ‘mvn clean compile’ in
>>> trunk/ (without it, ctakes-misc/ will not build).
>>> 
>>> Thanks again, James.
>>> 
>>> Dima
>>> 
>>> 
>>> 
>>>> On Apr 28, 2017, at 16:30, James Masanz 

Re: URI is not hierarchical

2017-04-29 Thread Dligach, Dmitriy
Hi James,

> Copy the lvg directory that's in
> 
> %CTAKES_HOME%/resources/org/apache/ctakes/

Sorry, in my case, I am only seeing one directory here called ‘dictionary’. Is 
that the one or should there be something called ‘lvg’?

Dima


> to be under the  target/classes/org/apache/ctakes  directory that's within
> your local copy of ctakes-misc
> 
> so for example, on my system, after the copy, I see
> C:\from.svn\ctakes-misc.git\trunk\target\classes\org\apache\ctakes\lvg\data\config\lvg.properties
> 
> (For anyone else reading this, that resources directory will show up when
> you do   mvn clean install   of cTAKES)
> 
> If you want lvg to use the actual lvg resources, you can't just copy the
> lvg.properties file for this workaround; you have to copy the entire
> subtree, or it will create an empty lvg database.
> 
> 
> I'll check out the StringIndexOutOfBoundsException on Monday unless someone
> else beats me to it before then.
> 
> 
> FYI, to test this workaround, before I did the copy, I was seeing
> 
> java.lang.IllegalArgumentException: URI is not hierarchical
>     at java.io.File.<init>(File.java:418)
>     at org.apache.ctakes.lvg.resource.LvgCmdApiResourceImpl.load(LvgCmdApiResourceImpl.java:65)
> 
> After I copied that lvg directory, I was then able to see output in the
> outputDirectory defined by UmlsLookupPipeline.java
> 
> FYI #2, I'm using Win 7 Pro. This workaround should work for linux/mac but
> was not tested there.
> 
> FYI #3, this was using your idea of not having ctakes-misc be under ctakes.
> thanks for that tip!  keeps things nicely separated - no need to update
> the parent pom at all.
> 
> 
> 
> On Sat, Apr 29, 2017 at 2:00 AM, James Masanz <masanz.ja...@gmail.com>
> wrote:
> 
>> Hi Dima,
>> 
>> what revision of trunk are you using?  I'm getting an error you weren't
>> seeing so I'm guessing it's because I checked out ctakes just today.
>> 
>> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -7
>>     at java.lang.String.substring(String.java:1967)
>>     at org.apache.ctakes.dictionary.lookup2.util.JdbcConnectionFactory.getConnectionUrl(JdbcConnectionFactory.java:110)
>>     at org.apache.ctakes.dictionary.lookup2.util.JdbcConnectionFactory.getConnection(JdbcConnectionFactory.java:63)
>>     at org.apache.ctakes.dictionary.lookup2.dictionary.JdbcRareWordDictionary.<init>(JdbcRareWordDictionary.java:91)
>>     at org.apache.ctakes.dictionary.lookup2.dictionary.JdbcRareWordDictionary.<init>(JdbcRareWordDictionary.java:72)
>>     at org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary.<init>(UmlsJdbcRareWordDictionary.java:31)
>>     ... 27 more
>> 
>> FYI, I created the directories needed by UmlsLookupPipeline.java for
>> chunker-model.zip, inputDirectory and outputDirectory, and I get the
>> above regardless of whether I use a text input file containing just "pain
>> in left knee started on Wednesday." or GenSurg_UmbilicalHernia_1.rtf as
>> the input file instead.
>> 
>> 
>> On Fri, Apr 28, 2017 at 5:48 PM, Dligach, Dmitriy <ddlig...@luc.edu>
>> wrote:
>> 
>>> Hi James,
>>> 
>>> Thank you so much for looking into this!
>>> 
>>> Your general setup matches mine. I also do:
>>> 
>>> 1. svn co https://svn.apache.org/repos/asf/ctakes/trunk/
>>> 2. git clone https://github.com/dmitriydligach/ctakes-misc.git (in
>>> trunk/)
>>> 3. mvn clean compile (in trunk/)
>>> 4. mvn clean compile (in ctakes-misc/)
>>> 
>>> BTW, I just discovered that it’s not necessary to check out a fresh copy
>>> of ctakes-misc into a subdirectory in trunk. It will build no matter where
>>> it is on your system as long as you first do an ‘mvn clean compile’ in
>>> trunk/ (without it, ctakes-misc/ will not build).
>>> 
>>> Thanks again, James.
>>> 
>>> Dima
>>> 
>>> 
>>> 
>>>> On Apr 28, 2017, at 16:30, James Masanz <masanz.ja...@gmail.com> wrote:
>>>> 
>>>> Hi Dima,
>>>> 
>>>> Just to let you know I am taking a look at this.  More later, if not
>>> today,
>>>> then tomorrow. FYI here is where I'm at so far.
>>>> 
>>>> I checked out a fresh copy of ctakes trunk and put files from your
>>>> ctakes-misc into a subdirectory called ctakes-misc, and I updated my
>>> local
>>>> copy of the main pom.xml for ctakes to include c

Re: URI is not hierarchical

2017-04-26 Thread Dligach, Dmitriy
I am definitely still seeing the “URI is not hierarchical” issue. Here’s a 
piece of information that might help you figure out what the problem is:

It only happens if the pipeline includes dictionary lookup. For instance, this 
one fails:

https://github.com/dmitriydligach/ctakes-misc/blob/master/src/main/java/org/apache/ctakes/pipelines/UmlsLookupPipeline.java

But this one succeeds:

https://github.com/dmitriydligach/ctakes-misc/blob/master/src/main/java/org/apache/ctakes/pipelines/BasicPipeline.java

(it’s the same as the first one, but the dictionary lookup part is removed).
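
One plausible reading, going by the stack trace reported elsewhere in this thread (java.io.File.<init> called from LvgCmdApiResourceImpl.load) and by the `.jar!/` in the logged lvg.properties location: a resource that lives inside a jar has a jar: URI, and java.io.File refuses such URIs with exactly this message. The stand-alone sketch below reproduces that behaviour; the paths are invented and `JarUriDemo` is not cTAKES code.

```java
import java.io.File;
import java.net.URI;

public class JarUriDemo {

    /** Returns "ok" if the URI can be turned into a File, otherwise the exception message. */
    public static String tryAsFile(String uri) {
        try {
            new File(URI.create(uri));
            return "ok";
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // A resource sitting in a directory on disk: a hierarchical file: URI.
        System.out.println(tryAsFile("file:/tmp/classes/org/apache/ctakes/lvg/data/config/lvg.properties"));
        // The same resource packaged in a jar: an opaque jar: URI.
        System.out.println(tryAsFile("jar:file:/tmp/lvg.jar!/org/apache/ctakes/lvg/data/config/lvg.properties"));
    }
}
```

That would also be consistent with the pipeline working from Eclipse, if the resources there are picked up from a directory on the classpath rather than from a jar.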

Dima



> On Apr 26, 2017, at 11:37, Finan, Sean <sean.fi...@childrens.harvard.edu> 
> wrote:
> 
> Hi again Dima,
> 
> The piper files are not meant to replace uimafit.  Uimafit is great for many 
> purposes.
> 
> As for that annoying old "URI is not hierarchical" bug, a while back I 
> checked in a fix that worked for me.  Since then I cannot duplicate it.  
> 
> Sean
> 
> -----Original Message-----
> From: Dligach, Dmitriy [mailto:ddlig...@luc.edu] 
> Sent: Wednesday, April 26, 2017 12:18 PM
> To: dev@ctakes.apache.org
> Subject: Re: URI is not hierarchical
> 
> As I said in my previous email, the piper approach looks very promising. 
> However, many of us probably still have lots of existing uimaFIT pipelines, 
> and it would be nice to be able to run them from the command line.
> 
> So, are there any plans to finally fix this old “URI is not hierarchical” 
> problem? Do we at least know what’s causing it?
> 
> Dima
> 
> 
> 
>> On Apr 14, 2017, at 12:14, Finan, Sean <sean.fi...@childrens.harvard.edu> 
>> wrote:
>> 
>> Ok, thanks.  For your original question: 
>> 
>>> it fails with “URI is not hierarchical” when the dictionary lookup is 
>>> enabled. 
>>> I believe this is an old issue, so are there any plans for fixing it in the 
>>> new release?
>> 
>> I thought that I had already fixed it.  So much for my thorough testing.
>> 
>> Let me know what happens with the piper approach.
>> Sean
>> 
>> 
>> -----Original Message-----
>> From: Dligach, Dmitriy [mailto:ddlig...@luc.edu] 
>> Sent: Friday, April 14, 2017 12:47 PM
>> To: dev@ctakes.apache.org
>> Subject: Re: URI is not hierarchical
>> 
>> Hi Sean,
>> 
>> The pipeline I am trying to run is this:
>> 
>> https://github.com/dmitriydligach/ctakes-misc/blob/master/src/main/java/org/apache/ctakes/pipelines/UmlsLookupPipeline.java
>> 
>> (This is the UmlsLookupPipeline class).
>> 
>> It runs fine in Eclipse but fails when I run it from the command line.
>> 
>> I will look into the solution you are suggesting (thanks!).
>> 
>> Dima
>> 
>> 
>> 
>>> On Apr 14, 2017, at 11:35, Finan, Sean <sean.fi...@childrens.harvard.edu> 
>>> wrote:
>>> 
>>> Hi Dima,
>>> 
>>> Where did you get that class?  I don't have UmlsLookupPipeline or the 
>>> package org.apache.ctakes.pipelines.
>>> 
>>> If you want to run from command-line I highly recommend that you use the 
>>> PiperFileRunner class in core.pipeline.
>>> 
>>> To run the clinical pipeline use cli parameters:
>>> -p DefaultFastPipeline.piper
>>> -i {inputDir}
>>> --xmiOut {outputDir}
>>> --user {umlsUsername}
>>> --pass {umlsPassword}
>>> 
>>> If you have the binary installation there is a runClinicalPipeline script 
>>> in bin/
>>> 
>>> PiperFileRunner can run other piper files and take other parameters
>>> #   Runs the pipeline in the piper file specified by -p (piperfile)
>>> #   with any other provided parameters.  Standard parameters are:
>>> # -i , --inputDir {inputDirectory}
>>> # -o , --outputDir {outputDirectory}
>>> # -s , --subDir {subDirectory}  (for i/o)
>>> # --xmiOut {xmiOutputDirectory} (if different from -o)
>>> # -l , --lookupXml {dictionaryConfigFile} (fast onl

Re: URI is not hierarchical

2017-04-26 Thread Dligach, Dmitriy
Sean, thanks for getting back to me on this.

I am now trying to run PiperFileRunner in Eclipse (ultimately I want to run it 
from the command line), so I believe the working directory is now ctakes-core. 

I am specifying the full path to DefaultFastPipeline.piper because I couldn’t 
get it to work any other way.
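
A quick way to tell whether the annotator named in the error is visible from a given run configuration is to ask the class loader directly. This is only a diagnostic sketch: `ClasspathCheck` is a made-up helper, and the fully qualified class name in `main` is my assumption about where ContextDependentTokenizerAnnotator lives, so adjust it if the package differs.

```java
public class ClasspathCheck {

    /** Returns true if the named class can be located by this class's class loader. */
    public static boolean isOnClasspath(String className) {
        try {
            // 'false' avoids running static initializers; we only want to know if it resolves.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed fully qualified name (not taken from this thread).
        String ae = "org.apache.ctakes.contexttokenizer.ae.ContextDependentTokenizerAnnotator";
        System.out.println(ae + " on classpath: " + isOnClasspath(ae));
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```

If this prints false when run with the same configuration as PiperFileRunner, the module that provides the annotator is probably missing from that configuration's classpath.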

Dima



> On Apr 26, 2017, at 11:27, Finan, Sean <sean.fi...@childrens.harvard.edu> 
> wrote:
> 
> Hi Dima,
> 
> The error message is telling you that ContextDependentTokenizerAnnotator is 
> not found.  That is the first ae outside of core.  It is in 
> ctakes-contexttokenizer.
> 
> It also looks like you are specifying a full path to 
> DefaultFastPipeline.piper.
> 
> So I have to ask: what is your working directory and what is your classpath?
> 
> Thanks,
> Sean
> 
> -----Original Message-----
> From: Dligach, Dmitriy [mailto:ddlig...@luc.edu] 
> Sent: Wednesday, April 26, 2017 12:16 PM
> To: dev@ctakes.apache.org
> Subject: Re: URI is not hierarchical
> 
> Hi Sean,
> 
> Thanks again for providing this information; the piper approach looks very 
> promising.
> 
> So I gave it a try, but it didn’t quite work. As you suggested, I am trying 
> to run the PiperFileRunner class in core.pipeline. I give it the following 
> parameters:
> 
> -p 
> /Users/Dima/Loyola/Workspaces/cTakes/ctakes/ctakes-clinical-pipeline-res/src/main/resources/org/apache/ctakes/clinical/pipeline/DefaultFastPipeline.piper
> -i 
> /Users/Dima/Loyola/Workspaces/cTakes/ctakes/ctakes-examples-res/src/main/resources/org/apache/ctakes/examples/notes/rtf/
> --xmiOut /Users/Dima/Temp/
> --user 
> --pass 
> 
> I get this error:
> 
> 26 Apr 2017 11:11:40 ERROR PiperFileRunner - MESSAGE LOCALIZATION FAILED: 
> Can't find resource for bundle java.util.PropertyResourceBundle, key No 
> Analysis Component found for ContextDependentTokenizerAnnotator
> 
> Any thoughts?
> 
> Best,
> 
> Dima
> 
> 
> 
>> On Apr 14, 2017, at 11:35, Finan, Sean <sean.fi...@childrens.harvard.edu> 
>> wrote:
>> 
>> Hi Dima,
>> 
>> Where did you get that class?  I don't have UmlsLookupPipeline or the 
>> package org.apache.ctakes.pipelines.
>> 
>> If you want to run from command-line I highly recommend that you use the 
>> PiperFileRunner class in core.pipeline.
>> 
>> To run the clinical pipeline use cli parameters:
>> -p DefaultFastPipeline.piper
>> -i {inputDir}
>> --xmiOut {outputDir}
>> --user {umlsUsername}
>> --pass {umlsPassword}
>> 
>> If you have the binary installation there is a runClinicalPipeline script in 
>> bin/
>> 
>> PiperFileRunner can run other piper files and take other parameters
>> #   Runs the pipeline in the piper file specified by -p (piperfile)
>> #   with any other provided parameters.  Standard parameters are:
>> # -i , --inputDir {inputDirectory}
>> # -o , --outputDir {outputDirectory}
>> # -s , --subDir {subDirectory}  (for i/o)
>> # --xmiOut {xmiOutputDirectory} (if different from -o)
>> # -l , --lookupXml {dictionaryConfigFile} (fast only)
>> # --user {umlsUsername}
>> # --pass {umlsPassword}
>> # -? , --help
>> #
>> #   Other parameters may be declared in the piper file using the cli command:
>> # cli {parameterName}={singleCharacter}
>> #   For instance, for declaration of ParagraphAnnotator path to regex file 
>> optional parameter PARAGRAPH_TYPES_PATH,
>> #   in the custom piper file add the line:
>> # cli PARAGRAPH_TYPES_PATH=t
>> #   and when executing this script use:
>> #  runPiperFile -p path/to/my/custom.piper -t path/to/my/custom.bsv  ...
>> 
>> 
>> The above is a snippet from the runPiperFile script in the bin/ directory. 
>> 
>> I am in the process of writing documentation on piper files in the wiki.
>> 
>> https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files
>> 
>> 
>> -----Original Message-----
>> From: Dligach, Dmitriy [mailto:ddlig...

Re: URI is not hierarchical

2017-04-26 Thread Dligach, Dmitriy
As I said in my previous email, the piper approach looks very promising. 
However, many of us probably still have lots of existing uimaFIT pipelines, 
and it would be nice to be able to run them from the command line.

So, are there any plans to finally fix this old “URI is not hierarchical” 
problem? Do we at least know what’s causing it?

Dima



> On Apr 14, 2017, at 12:14, Finan, Sean <sean.fi...@childrens.harvard.edu> 
> wrote:
> 
> Ok, thanks.  For your original question: 
> 
>> it fails with “URI is not hierarchical” when the dictionary lookup is 
>> enabled. 
>> I believe this is an old issue, so are there any plans for fixing it in the 
>> new release?
> 
> I thought that I had already fixed it.  So much for my thorough testing.
> 
> Let me know what happens with the piper approach.
> Sean
> 
> 
> -----Original Message-----
> From: Dligach, Dmitriy [mailto:ddlig...@luc.edu] 
> Sent: Friday, April 14, 2017 12:47 PM
> To: dev@ctakes.apache.org
> Subject: Re: URI is not hierarchical
> 
> Hi Sean,
> 
> The pipeline I am trying to run is this:
> 
> https://github.com/dmitriydligach/ctakes-misc/blob/master/src/main/java/org/apache/ctakes/pipelines/UmlsLookupPipeline.java
> 
> (This is the UmlsLookupPipeline class).
> 
> It runs fine in Eclipse but fails when I run it from the command line.
> 
> I will look into the solution you are suggesting (thanks!).
> 
> Dima
> 
> 
> 
>> On Apr 14, 2017, at 11:35, Finan, Sean <sean.fi...@childrens.harvard.edu> 
>> wrote:
>> 
>> Hi Dima,
>> 
>> Where did you get that class?  I don't have UmlsLookupPipeline or the 
>> package org.apache.ctakes.pipelines.
>> 
>> If you want to run from command-line I highly recommend that you use the 
>> PiperFileRunner class in core.pipeline.
>> 
>> To run the clinical pipeline use cli parameters:
>> -p DefaultFastPipeline.piper
>> -i {inputDir}
>> --xmiOut {outputDir}
>> --user {umlsUsername}
>> --pass {umlsPassword}
>> 
>> If you have the binary installation there is a runClinicalPipeline script in 
>> bin/
>> 
>> PiperFileRunner can run other piper files and take other parameters
>> #   Runs the pipeline in the piper file specified by -p (piperfile)
>> #   with any other provided parameters.  Standard parameters are:
>> # -i , --inputDir {inputDirectory}
>> # -o , --outputDir {outputDirectory}
>> # -s , --subDir {subDirectory}  (for i/o)
>> # --xmiOut {xmiOutputDirectory} (if different from -o)
>> # -l , --lookupXml {dictionaryConfigFile} (fast only)
>> # --user {umlsUsername}
>> # --pass {umlsPassword}
>> # -? , --help
>> #
>> #   Other parameters may be declared in the piper file using the cli command:
>> # cli {parameterName}={singleCharacter}
>> #   For instance, for declaration of ParagraphAnnotator path to regex file 
>> optional parameter PARAGRAPH_TYPES_PATH,
>> #   in the custom piper file add the line:
>> # cli PARAGRAPH_TYPES_PATH=t
>> #   and when executing this script use:
>> #  runPiperFile -p path/to/my/custom.piper -t path/to/my/custom.bsv  ...
>> 
>> 
>> The above is a snippet from the runPiperFile script in the bin/ directory. 
>> 
>> I am in the process of writing documentation on piper files in the wiki.
>> 
>> https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files
>> 
>> 
>> -----Original Message-----
>> From: Dligach, Dmitriy [mailto:ddlig...@luc.edu] 
>> Sent: Friday, April 14, 2017 12:17 PM
>> To: cTAKES Developer list
>> Subject: URI is not hierarchical
>> 
>> Dear cTAKES developers,
>> 
>> I am trying to run a simple pipeline that involves dictionary lookup:

Re: URI is not hierarchical

2017-04-26 Thread Dligach, Dmitriy
Hi Sean,

Thanks again for providing this information; the piper approach looks very 
promising.

So I gave it a try, but it didn’t quite work. As you suggested, I am trying to 
run the PiperFileRunner class in core.pipeline. I give it the following 
parameters:

-p 
/Users/Dima/Loyola/Workspaces/cTakes/ctakes/ctakes-clinical-pipeline-res/src/main/resources/org/apache/ctakes/clinical/pipeline/DefaultFastPipeline.piper
-i 
/Users/Dima/Loyola/Workspaces/cTakes/ctakes/ctakes-examples-res/src/main/resources/org/apache/ctakes/examples/notes/rtf/
--xmiOut /Users/Dima/Temp/
--user 
--pass 

I get this error:

26 Apr 2017 11:11:40 ERROR PiperFileRunner - MESSAGE LOCALIZATION FAILED: Can't 
find resource for bundle java.util.PropertyResourceBundle, key No Analysis 
Component found for ContextDependentTokenizerAnnotator

Any thoughts?

Best,

Dima



> On Apr 14, 2017, at 11:35, Finan, Sean <sean.fi...@childrens.harvard.edu> 
> wrote:
> 
> Hi Dima,
> 
> Where did you get that class?  I don't have UmlsLookupPipeline or the package 
> org.apache.ctakes.pipelines.
> 
> If you want to run from command-line I highly recommend that you use the 
> PiperFileRunner class in core.pipeline.
> 
> To run the clinical pipeline use cli parameters:
> -p DefaultFastPipeline.piper
> -i {inputDir}
> --xmiOut {outputDir}
> --user {umlsUsername}
> --pass {umlsPassword}
> 
> If you have the binary installation there is a runClinicalPipeline script in 
> bin/
> 
> PiperFileRunner can run other piper files and take other parameters
> #   Runs the pipeline in the piper file specified by -p (piperfile)
> #   with any other provided parameters.  Standard parameters are:
> # -i , --inputDir {inputDirectory}
> # -o , --outputDir {outputDirectory}
> # -s , --subDir {subDirectory}  (for i/o)
> # --xmiOut {xmiOutputDirectory} (if different from -o)
> # -l , --lookupXml {dictionaryConfigFile} (fast only)
> # --user {umlsUsername}
> # --pass {umlsPassword}
> # -? , --help
> #
> #   Other parameters may be declared in the piper file using the cli command:
> # cli {parameterName}={singleCharacter}
> #   For instance, for declaration of ParagraphAnnotator path to regex file 
> optional parameter PARAGRAPH_TYPES_PATH,
> #   in the custom piper file add the line:
> # cli PARAGRAPH_TYPES_PATH=t
> #   and when executing this script use:
> #  runPiperFile -p path/to/my/custom.piper -t path/to/my/custom.bsv  ...
> 
> 
> The above is a snippet from the runPiperFile script in the bin/ directory. 
> 
> I am in the process of writing documentation on piper files in the wiki.
> 
> https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files
> 
> 
> -----Original Message-----
> From: Dligach, Dmitriy [mailto:ddlig...@luc.edu] 
> Sent: Friday, April 14, 2017 12:17 PM
> To: cTAKES Developer list
> Subject: URI is not hierarchical
> 
> Dear cTAKES developers,
> 
> I am trying to run a simple pipeline that involves dictionary lookup:
> 
> https://github.com/dmitriydligach/ctakes-misc/blob/master/src/main/java/org/apache/ctakes/pipelines/UmlsLookupPipeline.java
> 
> from the command line as follows:
> 
> mvn exec:java -Dexec.mainClass="org.apache.ctakes.pipelines.UmlsLookupPipeline"
> 
> It runs fine if the dictionary-lookup-related fragment is commented out, 
> but it fails with “URI is not hierarchical” when the dictionary lookup is 
> enabled.
> 
> I believe this is an old issue, so are there any plans for fixing it in the 
> new release? In the meantime, are there any workarounds?
> 
> Many thanks!
> 
> The full error is below.
> 
> Dima
> 
> 
> 
> 14 Apr 2017 11:04:24  INFO LvgAnnotator - URL for lvg.properties 
> =file:/home/dima/.m2/repository/net/sourceforge/ctakesresources/ctakes-resources-lvg2008/4.0.0/ctakes-resources-lvg2008-4.0.0.jar!/org/apache/ctakes/lvg/data/config/lvg.properties
> 14 Apr 2017 11:04:24  INFO SentenceDetector - Sentence detector model file: 
> org/apache/ctakes/core/sentdetect/sd-med-model.zip
> 14 Apr 2017 11:04:24  INFO TokenizerAnnotatorPTB - Initializing 
> org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
> 14 Apr 2017 11:04:24  INFO ContextDependentTokenizerAnnotator - Finite state 
> machines loaded.
> 14 Apr 2017 11:04:24  INFO POSTagger - POS tagger model file: 
> org/apache/ctakes/postagger/models/mayo-pos.zi

Re: URI is not hierarchical

2017-04-14 Thread Dligach, Dmitriy
Hi Sean,

The pipeline I am trying to run is this:

https://github.com/dmitriydligach/ctakes-misc/blob/master/src/main/java/org/apache/ctakes/pipelines/UmlsLookupPipeline.java

(This is the UmlsLookupPipeline class).

It runs fine in Eclipse but fails when I run it from the command line.

I will look into the solution you are suggesting (thanks!).

Dima



> On Apr 14, 2017, at 11:35, Finan, Sean <sean.fi...@childrens.harvard.edu> 
> wrote:
> 
> Hi Dima,
> 
> Where did you get that class?  I don't have UmlsLookupPipeline or the package 
> org.apache.ctakes.pipelines.
> 
> If you want to run from command-line I highly recommend that you use the 
> PiperFileRunner class in core.pipeline.
> 
> To run the clinical pipeline use cli parameters:
> -p DefaultFastPipeline.piper
> -i {inputDir}
> --xmiOut {outputDir}
> --user {umlsUsername}
> --pass {umlsPassword}
> 
> If you have the binary installation there is a runClinicalPipeline script in 
> bin/
> 
> PiperFileRunner can run other piper files and take other parameters
> #   Runs the pipeline in the piper file specified by -p (piperfile)
> #   with any other provided parameters.  Standard parameters are:
> # -i , --inputDir {inputDirectory}
> # -o , --outputDir {outputDirectory}
> # -s , --subDir {subDirectory}  (for i/o)
> # --xmiOut {xmiOutputDirectory} (if different from -o)
> # -l , --lookupXml {dictionaryConfigFile} (fast only)
> # --user {umlsUsername}
> # --pass {umlsPassword}
> # -? , --help
> #
> #   Other parameters may be declared in the piper file using the cli command:
> # cli {parameterName}={singleCharacter}
> #   For instance, for declaration of ParagraphAnnotator path to regex file 
> optional parameter PARAGRAPH_TYPES_PATH,
> #   in the custom piper file add the line:
> # cli PARAGRAPH_TYPES_PATH=t
> #   and when executing this script use:
> #  runPiperFile -p path/to/my/custom.piper -t path/to/my/custom.bsv  ...
> 
> 
> The above is a snippet from the runPiperFile script in the bin/ directory. 
> 
> I am in the process of writing documentation on piper files in the wiki.
> 
> https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files
> 
> 
> -Original Message-
> From: Dligach, Dmitriy [mailto:ddlig...@luc.edu] 
> Sent: Friday, April 14, 2017 12:17 PM
> To: cTAKES Developer list
> Subject: URI is not hierarchical
> 
> Dear cTAKES developers,
> 
> 
> 
> I am trying to run a simple pipeline that involves dictionary lookup:
> 
> 
> 
> https://github.com/dmitriydligach/ctakes-misc/blob/master/src/main/java/org/apache/ctakes/pipelines/UmlsLookupPipeline.java
>  
> 
> 
> 
> from command line as follows:
> 
> 
> 
> mvn exec:java 
> -Dexec.mainClass="org.apache.ctakes.pipelines.UmlsLookupPipeline"
> 
> 
> 
> It runs fine if the dictionary lookup related fragment is commented out, 
> but it fails with “URI is not hierarchical” when the dictionary lookup is 
> enabled.
> 
> 
> 
> I believe this is an old issue, so are there any plans for fixing it in the 
> new release? In the meantime, are there any workarounds?
> 
> 
> 
> Many thanks!
> 
> 
> 
> The full error is below.
> 
> 
> 
> Dima
> 
> 
> 
> 
> 
> 
> 
> 14 Apr 2017 11:04:24  INFO LvgAnnotator - URL for lvg.properties 
> =file:/home/dima/.m2/repository/net/sourceforge/ctakesresources/ctakes-resources-lvg2008/4.0.0/ctakes-resources-lvg2008-4.0.0.jar!/org/apache/ctakes/lvg/data/config/lvg.properties
> 
> 14 Apr 2017 11:04:24  INFO SentenceDetector - Sentence detector model file: 
> org/apache/ctakes/core/sentdetect/sd-med-model.zip
> 
> 14 Apr 2017 11:04:24  INFO TokenizerAnnotatorPTB - Initializing 
> org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
> 
> 14 Apr 2017 11:04:24  INFO ContextDependentTokenizerAnnotator - Finite state 
> machines loaded.
> 
> 14 Apr 2017 11:04:24  INFO POSTagger - POS tagger model file: 
> org/apache/ctakes/postagger/models/mayo-pos.zip
> 
> 14 Apr 2017 11:04:24  INFO Chunker - Chunker model file: 
> /home/dima/cTakes/trunk/ctakes-chunker-res/src/main/resources/org/apache/ctakes/chunker/models/chunker-model.zip
> 
> 14 Apr 2017 11:04:26  INFO AbstractJCasTermAnnotator - Using dictionary 
> lookup window type: org.apache.ctakes.typesystem.type.textspan.Sentence
> 
> 14 Apr 2017 11:04:26  INFO AbstractJCasTermAnnotator - Exclusion tagset 
> loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB VBD VBG VB

URI is not hierarchical

2017-04-14 Thread Dligach, Dmitriy
Dear cTAKES developers,

I am trying to run a simple pipeline that involves dictionary lookup:

https://github.com/dmitriydligach/ctakes-misc/blob/master/src/main/java/org/apache/ctakes/pipelines/UmlsLookupPipeline.java

from command line as follows:

mvn exec:java -Dexec.mainClass="org.apache.ctakes.pipelines.UmlsLookupPipeline"

It runs fine if the dictionary lookup related fragment is commented out, but 
it fails with “URI is not hierarchical” when the dictionary lookup is enabled.

I believe this is an old issue, so are there any plans for fixing it in the new 
release? In the meantime, are there any workarounds?

Many thanks!

The full error is below.

Dima



14 Apr 2017 11:04:24  INFO LvgAnnotator - URL for lvg.properties 
=file:/home/dima/.m2/repository/net/sourceforge/ctakesresources/ctakes-resources-lvg2008/4.0.0/ctakes-resources-lvg2008-4.0.0.jar!/org/apache/ctakes/lvg/data/config/lvg.properties
14 Apr 2017 11:04:24  INFO SentenceDetector - Sentence detector model file: 
org/apache/ctakes/core/sentdetect/sd-med-model.zip
14 Apr 2017 11:04:24  INFO TokenizerAnnotatorPTB - Initializing 
org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
14 Apr 2017 11:04:24  INFO ContextDependentTokenizerAnnotator - Finite state 
machines loaded.
14 Apr 2017 11:04:24  INFO POSTagger - POS tagger model file: 
org/apache/ctakes/postagger/models/mayo-pos.zip
14 Apr 2017 11:04:24  INFO Chunker - Chunker model file: 
/home/dima/cTakes/trunk/ctakes-chunker-res/src/main/resources/org/apache/ctakes/chunker/models/chunker-model.zip
14 Apr 2017 11:04:26  INFO AbstractJCasTermAnnotator - Using dictionary lookup 
window type: org.apache.ctakes.typesystem.type.textspan.Sentence
14 Apr 2017 11:04:26  INFO AbstractJCasTermAnnotator - Exclusion tagset loaded: 
CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB VBD VBG VBN VBP VBZ WDT 
WP WPS WRB
14 Apr 2017 11:04:26  INFO AbstractJCasTermAnnotator - Using minimum term text 
span: 3
14 Apr 2017 11:04:26  INFO AbstractJCasTermAnnotator - Using Dictionary 
Descriptor: org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
14 Apr 2017 11:04:26  INFO DictionaryDescriptorParser - Parsing dictionary 
specifications:
14 Apr 2017 11:04:26  INFO UmlsUserApprover - Checking UMLS Account at 
https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user dmitriydligach:
.14 Apr 2017 11:04:26  INFO UmlsUserApprover -   UMLS Account at 
https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user dmitriydligach has 
been validated

14 Apr 2017 11:04:26  INFO JdbcConnectionFactory - Connecting to 
jdbc:hsqldb:file:resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab/sno_rx_16ab:
14 Apr 2017 11:04:26  INFO ENGINE - open start - state not modified
.
14 Apr 2017 11:04:32  INFO JdbcConnectionFactory -  Database connected
14 Apr 2017 11:04:32  INFO JdbcRareWordDictionary - Connected to cui and term 
table CUI_TERMS
14 Apr 2017 11:04:32  INFO JdbcConceptFactory - Connected to concept table TUI 
with class TUI
14 Apr 2017 11:04:32  INFO JdbcConceptFactory - Connected to concept table 
RXNORM with class LONG
14 Apr 2017 11:04:32  INFO JdbcConceptFactory - Connected to concept table 
PREFTERM with class PREFTERM
14 Apr 2017 11:04:32  INFO JdbcConceptFactory - Connected to concept table 
SNOMEDCT_US with class LONG
[WARNING]
java.lang.IllegalArgumentException: URI is not hierarchical
at java.io.File.<init>(File.java:418)
at 
org.apache.ctakes.lvg.resource.LvgCmdApiResourceImpl.load(LvgCmdApiResourceImpl.java:65)
at 
org.apache.uima.resource.impl.ResourceManager_impl.registerResource(ResourceManager_impl.java:628)
at 
org.apache.uima.resource.impl.ResourceManager_impl.initializeExternalResources(ResourceManager_impl.java:464)
at 
org.apache.uima.resource.Resource_ImplBase.initialize(Resource_ImplBase.java:193)
at 
org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.initialize(AnalysisEngineImplBase.java:157)
at 
org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:131)
at 
org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
at 
org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
at 
org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:407)
at 
org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:256)
at 
org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:429)
at 
org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:373)
at 
org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:186)
at 
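
A note on the error itself: the message "URI is not hierarchical" comes from the java.io.File(URI) constructor, which rejects opaque URIs. The trace above suggests that LvgCmdApiResourceImpl builds a File from the lvg.properties location. The sketch below assumes that location carries the jar: scheme when the resource sits inside the jar in the Maven repository; that is an inference from the log, not something confirmed against the cTAKES source. The class and method names are hypothetical, and copying the resource to a temporary file is only one possible workaround.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical demo class; not part of cTAKES.
final class JarUriDemo {

    // Returns the message of the exception thrown by new File(uri),
    // or null when the File can be constructed.
    static String fileFromUriError(URI uri) {
        try {
            new File(uri);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    // Workaround sketch: copy a resource stream to a temporary file so that
    // code which insists on a java.io.File has something to open.
    static File copyToTempFile(InputStream in, String suffix) throws IOException {
        Path tmp = Files.createTempFile("resource-", suffix);
        tmp.toFile().deleteOnExit();
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp.toFile();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder paths; the jar: form is what a resource inside a jar looks like.
        URI inJar = URI.create("jar:file:/tmp/example.jar!/config/lvg.properties");
        URI onDisk = URI.create("file:/tmp/config/lvg.properties");

        System.out.println(fileFromUriError(inJar));   // prints "URI is not hierarchical"
        System.out.println(fileFromUriError(onDisk));  // prints "null"

        // In real use the stream would come from getResourceAsStream(...).
        try (InputStream in = new ByteArrayInputStream(
                "demo=1\n".getBytes(StandardCharsets.UTF_8))) {
            File copy = copyToTempFile(in, ".properties");
            System.out.println(copy.isFile());         // prints "true"
        }
    }
}
```

This would also be consistent with the pipeline working in Eclipse, where resources are typically read from project directories rather than from jars.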

Re: dictionary lookup error

2017-04-14 Thread Dligach, Dmitriy
Sean, thank you very much for your help.

I just wanted to inform everybody that after I did an ‘svn up’ this morning, 
dictionary lookup started to work without doing the additional steps Sean 
describes below.

Dima



> On Apr 11, 2017, at 11:50, Finan, Sean <sean.fi...@childrens.harvard.edu> 
> wrote:
> 
> Hi Dima,
> 
> Great question.  This is part of the change for v.4.0
> 
> You will need to pick up the new dictionary (sno_rx_16ab.zip) from 
> sourceforge:
> https://sourceforge.net/projects/ctakesresources/files/
> and unzip it in your resources/ directory.
> 
> Sno_rx_16ab is a new database in hsqldb 2.3.4 format containing the standard 
> ctakes tuis, cuis, snomedct and rxnorm from the umls 2016ab release.
> 
> The ctakes.apache.org website will have an updated dictionary-fast link when 
> we are ready to release 4.0.
> The pom in trunk will be switched to automatically grab that resource when 
> you compile as soon as it is published from sourceforge (i.e. maven can find 
> it).
> 
> I apologize for not sending out this information earlier.  A few items have 
> fallen through the cracks wrt the 4.0 rc, trunk 4.0.1 updates, sourceforge 
> publication, etc.
> 
> There are running edits on the instructions for future releases ... I hope 
> that we capture everything.
> 
> Sean
> 
> -Original Message-
> From: Dligach, Dmitriy [mailto:ddlig...@luc.edu] 
> Sent: Tuesday, April 11, 2017 12:36 PM
> To: cTAKES Developer list
> Subject: dictionary lookup error 
> 
> Hello,
> 
> 
> 
> After a recent ‘svn up’, a simple pipeline involving dictionary lookup 
> started to fail. It fails to locate a file: 
> org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml. 
> 
> 
> 
> The pipeline is here:
> 
> 
> 
> https://github.com/dmitriydligach/ctakes-misc/blob/master/src/main/java/org/apache/ctakes/pipelines/BasicAnnotations.java
>  
> 
> 
> 
> Any thoughts on what might be going wrong? 
> 
> 
> 
> Thank you in advance.
> 
> 
> 
> Here’s the full error message:
> 
> 
> 
> Exception in thread "main" 
> org.apache.uima.resource.ResourceInitializationException: Initialization of 
> annotator class 
> "org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator" failed.  
> (Descriptor: )
> 
>   at 
> org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:271)
> 
>   at 
> org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:170)
> 
>   at 
> org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
> 
>   at 
> org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
> 
>   at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
> 
>   at 
> org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:407)
> 
>   at 
> org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:256)
> 
>   at 
> org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:429)
> 
>   at 
> org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:373)
> 
>   at 
> org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:186)
> 
>   at 
> org.apache.uima.fit.factory.AnalysisEngineFactory.createEngine(AnalysisEngineFactory.java:711)
> 
>   at 
> org.apache.uima.fit.factory.AggregateBuilder.createAggregate(AggregateBuilder.java:207)
> 
>   at 
> org.apache.ctakes.pipelines.BasicAnnotations.main(BasicAnnotations.java:67)
> 
> Caused by: org.apache.uima.resource.ResourceInitializationException
> 
>   at 
> org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:131)
> 
>   at 
> org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:266)
> 
>   ... 12 more
> 
> Caused by: java.io.FileNotFoundException: 
> org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml (No such file or 
> directory)
> 
>   at java.io.FileInputStream.open0(Native Method)
> 
>   at java.io.FileInputStream.open(FileInputStream.java:195)
> 
>   at java.io.FileInputStream.<init>(FileInputStream.java:138)
> 
>   at 
> org.apache.ctakes.core.resource.FileLocator.getAsStream(FileLocator.java:61)
> 
>   at 
> org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:128)
> 
>   ... 13 more
> 
> 
> 
> 
> 
> Dima
> 
> 
> 
> 
> 
> 
> 



cTAKES as a dependency

2017-04-14 Thread Dligach, Dmitriy
Hello,

Has anybody tried to run a cTAKES pipeline without having a local cTAKES 
installation? In other words, is it possible to set up a maven project that 
will use cTAKES as an external dependency?

Dima





dictionary lookup error

2017-04-11 Thread Dligach, Dmitriy
Hello,

After a recent ‘svn up’, a simple pipeline involving dictionary lookup started 
to fail. It fails to locate a file: 
org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml. 

The pipeline is here:

https://github.com/dmitriydligach/ctakes-misc/blob/master/src/main/java/org/apache/ctakes/pipelines/BasicAnnotations.java

Any thoughts on what might be going wrong? 

Thank you in advance.

Here’s the full error message:

Exception in thread "main" 
org.apache.uima.resource.ResourceInitializationException: Initialization of 
annotator class 
"org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator" failed.  
(Descriptor: )
at 
org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:271)
at 
org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:170)
at 
org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
at 
org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
at 
org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:407)
at 
org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:256)
at 
org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:429)
at 
org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:373)
at 
org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:186)
at 
org.apache.uima.fit.factory.AnalysisEngineFactory.createEngine(AnalysisEngineFactory.java:711)
at 
org.apache.uima.fit.factory.AggregateBuilder.createAggregate(AggregateBuilder.java:207)
at 
org.apache.ctakes.pipelines.BasicAnnotations.main(BasicAnnotations.java:67)
Caused by: org.apache.uima.resource.ResourceInitializationException
at 
org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:131)
at 
org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:266)
... 12 more
Caused by: java.io.FileNotFoundException: 
org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml (No such file or 
directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at 
org.apache.ctakes.core.resource.FileLocator.getAsStream(FileLocator.java:61)
at 
org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:128)
... 13 more
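
A quick way to narrow down a FileNotFoundException like the one above is to check where, if anywhere, the relative path resolves. The probe below is a minimal sketch with a hypothetical class name; it does not reproduce the logic of cTAKES's FileLocator, it only tests the two obvious places (the working directory and the classpath).

```java
import java.io.File;
import java.net.URL;

// Hypothetical diagnostic; not part of cTAKES.
final class ResourceProbe {

    // Reports whether a path resolves as a file (relative paths are taken
    // against the working directory), as a classpath resource, or not at all.
    static String locate(String path) {
        if (new File(path).isFile()) {
            return "file";
        }
        URL url = ClassLoader.getSystemResource(path);
        return url != null ? "classpath" : "missing";
    }

    public static void main(String[] args) {
        String path = args.length > 0
                ? args[0]
                : "org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml";
        System.out.println("working directory: " + new File("").getAbsolutePath());
        System.out.println(path + " -> " + locate(path));
    }
}
```

Running it from the same directory and with the same classpath as the failing pipeline shows whether the descriptor is visible at all.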


Dima





Re: [Relation_Extraction] Ctakes relation extraction

2017-03-06 Thread Dligach, Dmitriy
Hi Oleg,

Currently, cTAKES only includes the models for the locationOf and degreeOf 
relations. Even though I added the code for other relation types a while ago, 
we haven’t released the models (mostly because the performance on those 
relation types is not satisfactory).

So, you should just ignore everything except locationOf and degreeOf relations.
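
As a rough illustration of that filtering step, the sketch below keeps only relations whose category is location_of or degree_of and drops the ones labeled "O". The Relation class is a plain stand-in invented for this example, not the cTAKES BinaryTextRelation API, and the sample values are taken from the output quoted below.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch; Relation is a stand-in, not a cTAKES type.
final class RelationFilterSketch {

    static final class Relation {
        final String category;
        final String arg1;
        final String arg2;

        Relation(String category, String arg1, String arg2) {
            this.category = category;
            this.arg1 = arg1;
            this.arg2 = arg2;
        }

        @Override
        public String toString() {
            return category + "(" + arg1 + ", " + arg2 + ")";
        }
    }

    // Categories for which models are released, per the message above.
    static final Set<String> SUPPORTED =
            new HashSet<>(Arrays.asList("location_of", "degree_of"));

    static List<Relation> keepSupported(List<Relation> relations) {
        return relations.stream()
                .filter(r -> SUPPORTED.contains(r.category))
                .collect(Collectors.toList());
    }

    // Sample data copied from the output in the quoted message.
    static List<Relation> sample() {
        return Arrays.asList(
                new Relation("location_of", "blood pressure", "blood"),
                new Relation("degree_of", "anxiety", "acute"),
                new Relation("O", "anxiety", "anxiety"),
                new Relation("O", "anxiety", "symptoms"));
    }

    public static void main(String[] args) {
        // Prints the location_of and degree_of relations only.
        keepSupported(sample()).forEach(System.out::println);
    }
}
```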

Hope this helps,

Dima



> On Mar 6, 2017, at 03:28, Oleg Bogatiryov <oleg.bogatir...@ctco.lv> wrote:
> 
> Hi Dmitriy.
> 
> Thank you for your response.
> 
> I am using RelationExtractorPipeline class to generate xmi's.
> I got good results for LocationOfRelationExtractorAnnotator and 
> DegreeOfRelationExtractorAnnotator.
> 
> I've edited RelationExtractorAggregate.xml file with adding more annotators.
> There are some results but category is displayed like "O".
> 
> For example working results:
> 
> class org.apache.ctakes.typesystem.type.relation.LocationOfTextRelation
> Category: location_of
> Argument1:blood pressure
> Role: Argument
> Argument2:blood
> Role: Related_to
> 
> 
> class org.apache.ctakes.typesystem.type.relation.DegreeOfTextRelation
> Category: degree_of
> Argument1:anxiety
> Role: Argument
> Argument2:acute
> Role: Related_to
> 
> 
> 
> And non-properly working
> 
> class org.apache.ctakes.typesystem.type.relation.ManifestationOfTextRelation
> Category: O
> Argument1:anxiety
> Role: Argument
> Argument2:anxiety
> Role: Related_to
> 
> class 
> org.apache.ctakes.typesystem.type.relation.CausesBringsAboutTextRelation
> Category: O
> Argument1:anxiety
> Role: Argument
> Argument2:symptoms
> Role: Related_to
> 
> Category is O.
> Could you please help me with getting right results for other relations 
> extraction annotators ?
> 
> I've attached my ae descriptors. They are primitive and included into 
> RelationExtractionAggregate.xml.
> Please remove .ze from the archive and unzip it.
> 
> Thanks in advance,
> Oleg.
> 
> 
> -Original Message-
> From: Dligach, Dmitriy [mailto:ddlig...@luc.edu]
> Sent: Tuesday, February 28, 2017 1:58
> To: dev@ctakes.apache.org
> Subject: Re: Ctakes relation extraction
> 
> Hi Oleg,
> 
> You may want to look into learning about UIMA and UIMAFIT.
> 
> Once you are somewhat comfortable with these frameworks, take a look at the 
> package:
> 
> ctakes/ctakes-relation-extractor/src/main/java/org/apache/ctakes/relationextractor/pipelines/
> 
> There are several pipelines that may work for you (e.g. take a look at 
> RelationExtractorPipeline.java). Many of them should generate XMI files 
> which contain the relation annotations you are looking for. Once you have 
> generated the XMI files, you can parse them using UIMAFIT. E.g. take a look 
> at RelationAnnotationViewer.java which you should use as an example.
> 
> Hope this helps!
> 
> Dima
> 
> 
> 
>> On Feb 24, 2017, at 03:07, Oleg Bogatiryov <oleg.bogatir...@ctco.lv> 
>> wrote:
>> 
>> Hi Dmitriy.
>> 
>> Thank you for your reply.
>> 
>> I am not sure how can I extract location_of and degree_of.
>> AggregateTemplateFiller doesn't return anything in CVD as well as
>> TemporalAggregatePipeline.
>> There is BinaryTextRelation class that as I understand should fill in
>> relations but no data is displayed.
>> 
>> I was trying to execute RelationExtractorPipelineSingleCas but it
>> doesn't print anything related to relations.
>> 
>> Could you please help me with extraction of relations from the med text ?
>> 
>> As I am new to ctakes and uima, could you please provide step-by-step
>> instructions and the exact classes to use?
>> 
>> 
>> Thanks in advance,
>> Oleg.
>> 
>> -Original Message-
>> From: Dligach, Dmitriy [mailto:ddlig...@luc.edu]
>> Sent: 20 февраля 2017 г. 17:48
>> To: dev@ctakes.apache.org
>> Subject: Re: Ctakes relation extraction
>> 
>> Hi Oleg,
>> 
>> The relation extraction AE currently only handles location_of and
>> degree_of relations as described here:
>> 
>> https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3994852/
>> 
>> The dependencies are handled by the dependency parser (i.e. a separate
>> module).
>> 
>> So you’ll need to run both and collect all annotated relations from
>> the CAS.
>> 
>> Hope this helps.
>> 
>> Dima
>> 
>> 
>> 
>>> On Feb 20, 2017, at 01:31, Oleg Bogatiryov <oleg.bogatir...@ctco.lv>
>>> wrote:
>>> 
>>> Hello to everyone.
>>> 
>>> 
>>> 
>>> I am pleased to join the group.
>>> 
>>> 
>>> 
>>> I am trying to extract relation from the document.
>>> 
>>> Ideally I'd like to get the graph or tree of dependencies/relations
>>> from the clinical documents.
>>> 
>>> 
>>> 
>>> Could you please let me know how can I achieve it ?
>>> 
>>> 
>>> 
>>> I am able to run CVD and RelationExtractorAggregate analysis engine
>>> but there is no useful information
>>> 
>>> in results that can be used in order to build a relation graph.
>>> 
>>> 
>>> 
>>> Thanks in advance,
>>> 
>>> Oleg.
>>> 
>> 
> 
> 



Re: Ctakes relation extraction

2017-02-27 Thread Dligach, Dmitriy
Hi Oleg,

You may want to look into learning about UIMA and UIMAFIT.

Once you are somewhat comfortable with these frameworks, take a look at the 
package:

ctakes/ctakes-relation-extractor/src/main/java/org/apache/ctakes/relationextractor/pipelines/

There are several pipelines that may work for you (e.g. take a look at 
RelationExtractorPipeline.java). Many of them should generate XMI files which 
contain the relation annotations you are looking for. Once you have generated 
the XMI files, you can parse them using UIMAFIT. E.g. take a look at 
RelationAnnotationViewer.java which you should use as an example.

Hope this helps!

Dima



> On Feb 24, 2017, at 03:07, Oleg Bogatiryov <oleg.bogatir...@ctco.lv> wrote:
> 
> Hi Dmitriy.
> 
> Thank you for your reply.
> 
> I am not sure how can I extract location_of and degree_of.
> AggregateTemplateFiller doesn't return anything in CVD as well as 
> TemporalAggregatePipeline.
> There is BinaryTextRelation class that as I understand should fill in 
> relations but no data is displayed.
> 
> I was trying to execute RelationExtractorPipelineSingleCas but it doesn't 
> print anything related to relations.
> 
> Could you please help me with extraction of relations from the med text ?
> 
> As I am new to ctakes and uima, could you please provide step-by-step 
> instructions and the exact classes to use?
> 
> 
> Thanks in advance,
> Oleg.
> 
> -Original Message-
> From: Dligach, Dmitriy [mailto:ddlig...@luc.edu]
> Sent: 20 февраля 2017 г. 17:48
> To: dev@ctakes.apache.org
> Subject: Re: Ctakes relation extraction
> 
> Hi Oleg,
> 
> The relation extraction AE currently only handles location_of and degree_of 
> relations as described here:
> 
> https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3994852/
> 
> The dependencies are handled by the dependency parser (i.e. a separate 
> module).
> 
> So you’ll need to run both and collect all annotated relations from the 
> CAS.
> 
> Hope this helps.
> 
> Dima
> 
> 
> 
>> On Feb 20, 2017, at 01:31, Oleg Bogatiryov <oleg.bogatir...@ctco.lv> 
>> wrote:
>> 
>> Hello to everyone.
>> 
>> 
>> 
>> I am pleased to join the group.
>> 
>> 
>> 
>> I am trying to extract relation from the document.
>> 
>> Ideally I'd like to get the graph or tree of dependencies/relations
>> from the clinical documents.
>> 
>> 
>> 
>> Could you please let me know how can I achieve it ?
>> 
>> 
>> 
>> I am able to run CVD and RelationExtractorAggregate analysis engine
>> but there is no useful information
>> 
>> in results that can be used in order to build a relation graph.
>> 
>> 
>> 
>> Thanks in advance,
>> 
>> Oleg.
>> 
> 



Re: Phenotype-specific entities [SUSPICIOUS] [SUSPICIOUS]

2017-02-15 Thread Dligach, Dmitriy
Very nice! Thank you, Tim and Sean.

Dima



> On Feb 15, 2017, at 13:25, Miller, Timothy 
> <timothy.mil...@childrens.harvard.edu> wrote:
> 
> Lol was just about to send this:
> https://github.com/tmills/umls-graph-api
> 
> It points at your umls META directory, reads in the ctakes list of TUIs, and 
> builds a neo4j graph database with all the ISA links, and has a simple API 
> for getting parent/child CUIs.
> I used it for coreference.
> Tim
> 
> 
> From: Finan, Sean <sean.fi...@childrens.harvard.edu>
> Sent: Wednesday, February 15, 2017 2:23 PM
> To: dev@ctakes.apache.org
> Subject: RE: Phenotype-specific entities [SUSPICIOUS] [SUSPICIOUS]
> 
> The dictionary gui doesn't walk the ontology.  There are umls tables that 
> list relations, wherein things like "isa" (is a) relations may satisfy a 
> hypernym requirement.  If you have the umls rrf files look at mrrel.rrf.  The 
> structure is basically concept1|..| concept2|..|relationtype|..   See section 
> 3.3.9: 
> https://www.ncbi.nlm.nih.gov/books/NBK9685/
> 
> If anybody has made anything that parses/uses umls relations and can be used 
> by ctakes, please contribute!  Something like a simple traversable umls 
> graphdb would be a great addition ...  Even if it is incomplete or rough, it 
> could be a valuable seed for a new effort.
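
To make the mrrel.rrf layout mentioned above concrete, here is a minimal sketch that pulls "isa" rows out of pipe-delimited lines. The column positions (CUI1 at 0, REL at 3, CUI2 at 4, RELA at 7) are assumed from the UMLS reference manual and should be verified against your release; the rows and CUIs are made-up placeholders, and the class name is hypothetical. Which side of an "isa" row is the broader concept should also be checked against the manual before building a hierarchy on it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not part of cTAKES.
final class MrrelIsaSketch {

    // Assumed MRREL column positions; verify against the UMLS reference manual.
    static final int CUI1 = 0;
    static final int CUI2 = 4;
    static final int RELA = 7;

    // Returns {CUI1, CUI2} pairs for every row whose RELA field is "isa".
    static List<String[]> isaPairs(List<String> lines) {
        List<String[]> pairs = new ArrayList<>();
        for (String line : lines) {
            // Limit -1 keeps empty trailing fields.
            String[] fields = line.split("\\|", -1);
            if (fields.length > RELA && "isa".equals(fields[RELA])) {
                pairs.add(new String[] { fields[CUI1], fields[CUI2] });
            }
        }
        return pairs;
    }

    // Placeholder rows; a real run would read lines from MRREL.RRF,
    // for example with java.nio.file.Files.lines.
    static List<String> sampleRows() {
        return Arrays.asList(
                "C0000001|A0000001|AUI|CHD|C0000002|A0000002|AUI|isa|R0000001||SRC|SRC|||N||",
                "C0000001|A0000001|AUI|RO|C0000003|A0000003|AUI||R0000002||SRC|SRC|||N||",
                "C0000002|A0000002|AUI|PAR|C0000001|A0000001|AUI|inverse_isa|R0000003||SRC|SRC|||N||");
    }

    public static void main(String[] args) {
        for (String[] pair : isaPairs(sampleRows())) {
            System.out.println(pair[0] + " isa-row " + pair[1]);
        }
    }
}
```

The resulting pairs could be loaded into a map or a graph database to get parent/child lookups.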
> 
> Sean
> 
> -Original Message-
> From: Savova, Guergana [mailto:guergana.sav...@childrens.harvard.edu]
> Sent: Wednesday, February 15, 2017 1:54 PM
> To: dev@ctakes.apache.org
> Subject: RE: Phenotype-specific entities [SUSPICIOUS]
> 
> I don't believe there is a tool for walking the UMLS ontology, Dima. But Sean 
> should confirm that his dictionary building tool does not have that 
> functionality.
> 
> I think you can use the UMLS tables to get that information. It has been 
> quite a while since I used these tables, but I remember I was able to get that 
> information from them...
> 
> Sean,
> Does your dictionary building tool implement ontology walking?
> 
> --Guergana
> 
> -Original Message-
> From: Dligach, Dmitriy [mailto:ddlig...@luc.edu]
> Sent: Wednesday, February 15, 2017 1:50 PM
> To: dev@ctakes.apache.org
> Subject: Re: Phenotype-specific entities
> 
> Guergana, thank you.
> 
> Is there anything in cTAKES now for walking the UMLS ontology (e.g. for 
> finding hypernyms, synonyms, etc.)?
> 
> Dima
> 
> 
> 
>> On Feb 15, 2017, at 12:45, Savova, Guergana 
>> <guergana.sav...@childrens.harvard.edu> wrote:
>> 
>> Hi Erin,
>> Yes, creating your customized dictionary is the way to go. You can prune by 
>> semantic types of interest and then remove branches that are not relevant to 
>> your specific phenotype. I am not aware of cTAKES implementing such a tool 
>> for a very customized dictionary.
>> 
>> You can also start with  a few terms that you know are relevant to your 
>> phenotype and then find their synonyms in the UMLS. Then, you can further 
>> walk a specific ontology and take siblings, parents if you think they are 
>> relevant.
>> 
>> Then, there is the whole field of using word embeddings to find 
>> synonyms/related terms from unlabeled data  if you want to become really 
>> fancy :-) At this point, cTAKES does not implement any deep learning 
>> algorithms, in the future we are planning to release a bridge to KERAS.
>> 
>> I hope this makes sense.
>> 
>> --
>> Guergana Savova, PhD, FACMI
>> Associate Professor
>> PI Natural Language Processing Lab
>> Boston Children's Hospital and Harvard Medical School
>> 300 Longwood Avenue
>> Mailstop: BCH3092
>> Enders 144.1
>> Boston, MA 02115
>> Tel: (617) 919-2972
>> Fax: (617) 730-0817
>> guergana.sav...@childrens.harvard.edu
>> Harvard Scholar: 
>> http://scholar.harvard.edu/guergana_k_savova/biocv
>> ctakes.apache.org
>> thyme.healthnlp.org
>> cancer.healthnlp.org
>> share.healthnlp.org
>> 
>> 
>> -Original Message-
>> From: Erin Nicole Gustafson [mailto:erin.gustaf...@northwestern.edu]
>> Sent: Wednesday, Feb

Re: Phenotype-specific entities

2017-02-15 Thread Dligach, Dmitriy
Guergana, thank you. 

Is there anything in cTAKES now for walking the UMLS ontology (e.g. for finding 
hypernyms, synonyms, etc.)?

Dima



> On Feb 15, 2017, at 12:45, Savova, Guergana 
>  wrote:
> 
> Hi Erin,
> Yes, creating your customized dictionary is the way to go. You can prune by 
> semantic types of interest and then remove branches that are not relevant to 
> your specific phenotype. I am not aware of cTAKES implementing such a tool 
> for a very customized dictionary.
> 
> You can also start with  a few terms that you know are relevant to your 
> phenotype and then find their synonyms in the UMLS. Then, you can further 
> walk a specific ontology and take siblings, parents if you think they are 
> relevant.
> 
> Then, there is the whole field of using word embeddings to find 
> synonyms/related terms from unlabeled data  if you want to become really 
> fancy :-) At this point, cTAKES does not implement any deep learning 
> algorithms, in the future we are planning to release a bridge to KERAS. 
> 
> I hope this makes sense.
> 
> --
> Guergana Savova, PhD, FACMI
> Associate Professor
> PI Natural Language Processing Lab
> Boston Children's Hospital and Harvard Medical School
> 300 Longwood Avenue
> Mailstop: BCH3092
> Enders 144.1
> Boston, MA 02115
> Tel: (617) 919-2972
> Fax: (617) 730-0817
> guergana.sav...@childrens.harvard.edu
> Harvard Scholar: http://scholar.harvard.edu/guergana_k_savova/biocv
> ctakes.apache.org
> thyme.healthnlp.org
> cancer.healthnlp.org
> share.healthnlp.org
> 
> 
> -Original Message-
> From: Erin Nicole Gustafson [mailto:erin.gustaf...@northwestern.edu] 
> Sent: Wednesday, February 15, 2017 1:38 PM
> To: dev@ctakes.apache.org
> Subject: Phenotype-specific entities
> 
> Hi all,
> 
> I would like to be able to only identify entities that are relevant for some 
> specific phenotype. One step towards achieving this would be to build a 
> custom dictionary with a limited set of semantic types. However, this is not 
> quite specific enough to only identify mentions related to one disease while 
> ignoring those related to some other disease, for example.
> 
> Does cTAKES currently have a way to do this sort of filtering? Or, has anyone 
> developed their own tools that they'd be willing to share?
> 
> Thanks,
> Erin



Re: running pipeline from command line

2016-06-29 Thread Dligach, Dmitriy
Hi Peter,

This helps, thank you.

I also found a way to get the full command-line that Eclipse uses to run a 
class. (I believe Tim Miller discovered this a while ago too, but I had to 
re-discover this now).

http://stackoverflow.com/questions/2276219/can-i-run-from-command-line-program-created-by-eclipse

"Go to the Debug perspective, and select the program you just run (where it 
says Terminated, exit value... in the Debug tab) Right click, and choose 
Properties, there you can see the whole command line command that was launched 
by eclipse."

It would still be nice to figure out how to run using maven (i.e. without 
having to specify the classpath manually).
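
Another way to capture what a working launch uses, assuming you can add a few lines to the class being run, is to print the JVM arguments and classpath from inside the program. The class name below is hypothetical. Note that under mvn exec:java the program may run inside Maven's own JVM, in which case java.class.path can reflect Maven's launcher rather than the project classpath, so this is most useful when started from Eclipse.

```java
import java.io.File;
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper; not part of cTAKES.
final class LaunchInfo {

    // Splits the java.class.path system property into its entries.
    static List<String> classpathEntries() {
        String classpath = System.getProperty("java.class.path", "");
        return Arrays.asList(classpath.split(Pattern.quote(File.pathSeparator)));
    }

    public static void main(String[] args) {
        // JVM options such as -Xmx or -D flags passed at launch.
        System.out.println("JVM arguments: "
                + ManagementFactory.getRuntimeMXBean().getInputArguments());
        // One classpath entry per line, ready to paste into a -cp argument.
        for (String entry : classpathEntries()) {
            System.out.println(entry);
        }
    }
}
```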

Thank you everybody who helped with this issue.

Dima

> On Jun 28, 2016, at 13:32, Abramowitsch, Peter <pabramowit...@hearst.com> 
> wrote:
> 
> Hi Dima
> 
> Personally I hate Maven because of how it obscures classpaths and
> dependencies (even though it purports to make that easier).   My use case
> is a little different than yours, but when I launch a little pipeline from
> the command line, it looks as simple as this:
> 
> Cd into the top of my application directory
> export MSPK=$PWD
> 
> /usr/bin/java -cp 
> "${MSPK}/classes":"${MSPK}/desc/":"${MSPK}/resources/":"${MSPK}/lib/*"
> -Dlog4j.configuration=file:${MSPK}/config/log4j.xml
> -Dctakes.umlsuser=x  -Dctakes.umlspw=  -Xms512M -Xmx3g
> com.hbm.Main
> 
> 
> ** Desc being a tree where I have recursively copied all the descs from
> individual ctakes packages
> ** Resources being a tree where I have recursively copied all the resource
> from individual ctakes packages
> ** Lib containing the complete set of Ctakes jars plus their supporting
> jars.
> 
> My main starts up a REST server that loads a pipeline and connects
> external requests to it.
> 
> 
> Hope this helps
> Peter
> 
> 
> 
> On 6/28/16, 11:22 AM, "Dligach, Dmitriy" <ddlig...@luc.edu> wrote:
> 
>> Dear cTAKES developers,
>> 
>> 
>> 
>> I ran into new issues running a uimafit pipeline from command line. I am
>> trying to get a basic pipeline to run using this command:
>> 
>> 
>> 
>> mvn exec:java 
>> -Dexec.mainClass="org.apache.ctakes.examples.pipelines.BasicAnnotations"
>> 
>> 
>> 
>> It runs fine in Eclipse, but it stumbles on dictionary lookup when I run
>> from command line. The dictionary lookup is added as follows:
>> 
>> 
>> 
>> aggregateBuilder.add(
>> DefaultJCasTermAnnotator.createAnnotatorDescription() )
>> 
>> 
>> 
>> The full error message is below, but basically it is unable to find
>> org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml.
>> 
>> 
>> 
>> Which does exist on my system in two places:
>> 
>> 
>> 
>> ./ctakes-dictionary-lookup-fast-res/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml
>> 
>> ./ctakes-dictionary-lookup-fast-res/target/classes/org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml
>> 
>> 
>> 
>> I noticed that when I execute the mvn exec:java command, maven downloads
>> the jars into my maven repository. Not sure why it's doing that;
>> shouldn't it be able to run everything from the project directories?
>> 
>> 
>> 
>> Any thoughts will be greatly appreciated.
>> 
>> 
>> 
>> Dima
>> 
>> 
>> 
>> 28 Jun 2016 12:28:29  INFO AbstractJCasTermAnnotator - Using Dictionary
>> Descriptor: org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml
>> 
>> [WARNING] 
>> 
>> java.lang.reflect.InvocationTargetException
>> 
>>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> 
>>  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> 
>>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> 
>>  at java.lang.reflect.Method.invoke(Method.java:497)
>> 
>>  at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:294)
>> 
>>  at java.lang.Thread.run(Thread.java:745)
>> 
>> Caused by: org.apache.uima.resource.ResourceInitializationException:
>> Initialization of annotator class
>> "org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator"
>> failed.  (Descriptor: )
>> 
>>  at 
>> org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initiali
>> zeAnalysisComp

Re: URI is not hierarchical

2016-06-28 Thread Dligach, Dmitriy
Thanks, Chris.

I didn’t realize cTAKES was relying on environment variables for locating 
resources. Do you know which environment variable might be the issue?

Thanks again for your help,

Dima

> On Jun 28, 2016, at 10:26, Mattmann, Chris A (3980) 
> <chris.a.mattm...@jpl.nasa.gov> wrote:
> 
> Hi Dima,
> 
> In Apache OODT, when we do environment variable replacement on a URI,
> e.g., file://[SOME_ENV]/path, if $SOME_ENV isn’t defined, you get that
> error like you had below so that was the source of my suggestion sorry
> for not clarifying.
> 
> Cheers,
> Chris
> 
> ++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattm...@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++
> Director, Information Retrieval and Data Science Group (IRDS)
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> WWW: http://irds.usc.edu/
> ++++++++++
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> On 6/28/16, 8:03 AM, "Dligach, Dmitriy" <ddlig...@luc.edu> wrote:
> 
>> Hi Chris,
>> 
>> Sorry, could you please clarify which environment variable I am missing?
>> 
>> Thanks,
>> 
>> Dima
>> 
>>> On Jun 28, 2016, at 09:49, Mattmann, Chris A (3980) 
>>> <chris.a.mattm...@jpl.nasa.gov> wrote:
>>> 
>>> Looks to me like a missing environment variable..
>>> 
>>> ++
>>> Chris Mattmann, Ph.D.
>>> Chief Architect
>>> Instrument Software and Science Data Systems Section (398)
>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>> Office: 168-519, Mailstop: 168-527
>>> Email: chris.a.mattm...@nasa.gov
>>> WWW:  http://sunset.usc.edu/~mattmann/
>>> ++
>>> Director, Information Retrieval and Data Science Group (IRDS)
>>> Adjunct Associate Professor, Computer Science Department
>>> University of Southern California, Los Angeles, CA 90089 USA
>>> WWW: http://irds.usc.edu/
>>> ++
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On 6/28/16, 7:45 AM, "Dligach, Dmitriy" <ddlig...@luc.edu> wrote:
>>> 
>>>> Dear cTAKES developers,
>>>> 
>>>> I am looking for a simple cTAKES pipeline that can be run from command 
>>>> line to introduce a group of people to cTAKES. This pipeline:
>>>> 
>>>> ctakes-clinical-pipeline/src/main/java/org/apache/ctakes/clinicalpipeline/ClinicalPipelineWithUmls.java
>>>> 
>>>> seemed like a good candidate so I gave it a try. It runs fine when in 
>>>> Eclipse, but when I run it from command line:
>>>> 
>>>> mvn exec:java 
>>>> -Dexec.mainClass="org.apache.ctakes.clinicalpipeline.ClinicalPipelineWithUmls"
>>>> 
>>>> It begins to run but then crashes. I am including the error messages 
>>>> below. Any clues?
>>>> 
>>>> Thank you in advance,
>>>> 
>>>> Dima
>>>> 
>>>> java.lang.reflect.InvocationTargetException
>>>>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>at 
>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>at 
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>at java.lang.reflect.Method.invoke(Method.java:497)
>>>>at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:293)
>>>>at java.lang.Thread.run(Thread.java:745)
>>>> Caused by: java.lang.IllegalArgumentException: URI is not hierarchical
>>>>at java.io.File.<init>(File.java:418)
>>>>at 
>>>> org.apache.ctakes.lvg.resource.LvgCmdApiResourceImpl.load(LvgCmdApiResourceImpl.java:65)
>>>>at 
>>>> org.apache.uima.resource.impl.ResourceManager_impl.registerResource(ResourceManager

Re: URI is not hierarchical

2016-06-28 Thread Dligach, Dmitriy
Hi Chris,

Sorry, could you please clarify which environment variable I am missing?

Thanks,

Dima

> On Jun 28, 2016, at 09:49, Mattmann, Chris A (3980) 
> <chris.a.mattm...@jpl.nasa.gov> wrote:
> 
> Looks to me like a missing environment variable..
> 
> ++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattm...@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++
> Director, Information Retrieval and Data Science Group (IRDS)
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> WWW: http://irds.usc.edu/
> ++++++
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> On 6/28/16, 7:45 AM, "Dligach, Dmitriy" <ddlig...@luc.edu> wrote:
> 
>> Dear cTAKES developers,
>> 
>> I am looking for a simple cTAKES pipeline that can be run from command line 
>> to introduce a group of people to cTAKES. This pipeline:
>> 
>> ctakes-clinical-pipeline/src/main/java/org/apache/ctakes/clinicalpipeline/ClinicalPipelineWithUmls.java
>> 
>> seemed like a good candidate so I gave it a try. It runs fine when in 
>> Eclipse, but when I run it from command line:
>> 
>> mvn exec:java 
>> -Dexec.mainClass="org.apache.ctakes.clinicalpipeline.ClinicalPipelineWithUmls"
>> 
>> It begins to run but then crashes. I am including the error messages below. 
>> Any clues?
>> 
>> Thank you in advance,
>> 
>> Dima
>> 
>> java.lang.reflect.InvocationTargetException
>>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>  at 
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>  at 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>  at java.lang.reflect.Method.invoke(Method.java:497)
>>  at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:293)
>>  at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.lang.IllegalArgumentException: URI is not hierarchical
>>  at java.io.File.<init>(File.java:418)
>>  at 
>> org.apache.ctakes.lvg.resource.LvgCmdApiResourceImpl.load(LvgCmdApiResourceImpl.java:65)
>>  at 
>> org.apache.uima.resource.impl.ResourceManager_impl.registerResource(ResourceManager_impl.java:603)
>>  at 
>> org.apache.uima.resource.impl.ResourceManager_impl.initializeExternalResources(ResourceManager_impl.java:442)
>>  at 
>> org.apache.uima.resource.Resource_ImplBase.initialize(Resource_ImplBase.java:153)
>>  at 
>> org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.initialize(AnalysisEngineImplBase.java:157)
>>  at 
>> org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:123)
>>  at 
>> org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
>>  at 
>> org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
>>  at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
>>  at 
>> org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:387)
>>  at 
>> org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:254)
>>  at 
>> org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:431)
>>  at 
>> org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:375)
>>  at 
>> org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:185)
>>  at 
>> org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
>>  at 
>> org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
>>  at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
>>  at 
>> org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:387)
>>  at 
>> org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:254)
>> 

URI is not hierarchical

2016-06-28 Thread Dligach, Dmitriy
Dear cTAKES developers,

I am looking for a simple cTAKES pipeline that can be run from command line to 
introduce a group of people to cTAKES. This pipeline:

ctakes-clinical-pipeline/src/main/java/org/apache/ctakes/clinicalpipeline/ClinicalPipelineWithUmls.java

seemed like a good candidate so I gave it a try. It runs fine when in Eclipse, 
but when I run it from command line:

mvn exec:java 
-Dexec.mainClass="org.apache.ctakes.clinicalpipeline.ClinicalPipelineWithUmls"

It begins to run but then crashes. I am including the error messages below. Any 
clues?

Thank you in advance,

Dima

java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:293)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: URI is not hierarchical
at java.io.File.<init>(File.java:418)
at 
org.apache.ctakes.lvg.resource.LvgCmdApiResourceImpl.load(LvgCmdApiResourceImpl.java:65)
at 
org.apache.uima.resource.impl.ResourceManager_impl.registerResource(ResourceManager_impl.java:603)
at 
org.apache.uima.resource.impl.ResourceManager_impl.initializeExternalResources(ResourceManager_impl.java:442)
at 
org.apache.uima.resource.Resource_ImplBase.initialize(Resource_ImplBase.java:153)
at 
org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.initialize(AnalysisEngineImplBase.java:157)
at 
org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:123)
at 
org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
at 
org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
at 
org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:387)
at 
org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:254)
at 
org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:431)
at 
org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:375)
at 
org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:185)
at 
org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
at 
org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
at 
org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:387)
at 
org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:254)
at 
org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:431)
at 
org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:375)
at 
org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:185)
at 
org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
at 
org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:314)
at 
org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:425)
at 
org.apache.uima.fit.pipeline.JCasIterable.iterator(JCasIterable.java:76)
at 
org.apache.uima.fit.pipeline.JCasIterable.iterator(JCasIterable.java:42)
at 
org.apache.ctakes.clinicalpipeline.ClinicalPipelineWithUmls.main(ClinicalPipelineWithUmls.java:68)
... 6 more
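
The thread does not settle the cause of this trace. One plausible reading, offered here as an assumption rather than a confirmed diagnosis, is that under `mvn exec:java` the LVG resource resolves to an entry inside a jar, and the `java.io.File(URI)` constructor rejects such `jar:` URIs with exactly this message. The sketch below reproduces only that JDK behaviour; the class name `UriNotHierarchicalDemo` and both example paths are hypothetical.

```java
import java.io.File;
import java.net.URI;

public class UriNotHierarchicalDemo {

    /**
     * Tries to build a File from the URI. Returns the exception message
     * if the constructor rejects it, or null if the URI was accepted.
     */
    public static String tryAsFile(URI uri) {
        try {
            new File(uri);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // A resource sitting in a directory on disk (hypothetical path).
        URI onDisk = URI.create("file:/tmp/some/resource.txt");
        // The same resource packaged inside a jar (hypothetical path).
        URI inJar = URI.create("jar:file:/tmp/example.jar!/some/resource.txt");

        System.out.println(onDisk + " -> " + tryAsFile(onDisk));
        System.out.println(inJar + " -> " + tryAsFile(inJar));
    }
}
```

If that reading applies, code that needs the resource contents is usually better served by `getResourceAsStream`, which works for both directories and jars, than by converting the resource URL to a `File`.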




file not found error

2016-04-11 Thread Dligach, Dmitriy
I installed a fresh copy of cTAKES and tried to run 
org.apache.ctakes.clinicalpipeline.ClinicalPipelineFactory (see below). It 
completed, but in the process I got a lot of FileNotFoundException(s) that all 
referred to a few subdirectories in ctakes-clinical-pipeline/. Did I need to 
install these files separately?

Thanks,

Dima

--

Exception: java.io.FileNotFoundException: 
/Users/Dima/Loyola/Workspaces/cTakes/ctakes/ctakes-clinical-pipeline/data/misc/stopWords.data
 (No such file or directory)
** Error: problem of opening/reading stop words file: 
'/Users/Dima/Loyola/Workspaces/cTakes/ctakes/ctakes-clinical-pipeline/data/misc/stopWords.data'.
Exception: java.io.FileNotFoundException: 
/Users/Dima/Loyola/Workspaces/cTakes/ctakes/ctakes-clinical-pipeline/data/misc/nonInfoWords.data
 (No such file or directory)
** Error: problem of opening/reading non-Info words file: 
'/Users/Dima/Loyola/Workspaces/cTakes/ctakes/ctakes-clinical-pipeline/data/misc/nonInfoWords.data'.
Exception: java.io.FileNotFoundException: 
/Users/Dima/Loyola/Workspaces/cTakes/ctakes/ctakes-clinical-pipeline/data/misc/conjunctionWord.data
 (No such file or directory)
** Error: problem of opening/reading conjunction words file: 
'/Users/Dima/Loyola/Workspaces/cTakes/ctakes/ctakes-clinical-pipeline/data/misc/conjunctionWord.data'.
** ERR: problem of opening/reading diacritics file: 
'/Users/Dima/Loyola/Workspaces/cTakes/ctakes/ctakes-clinical-pipeline/data/Unicode/diacriticMap.data'.
Exception: java.io.FileNotFoundException: 
/Users/Dima/Loyola/Workspaces/cTakes/ctakes/ctakes-clinical-pipeline/data/Unicode/diacriticMap.data
 (No such file or directory)
Exception: java.io.FileNotFoundException: 
/Users/Dima/Loyola/Workspaces/cTakes/ctakes/ctakes-clinical-pipeline/data/Unicode/ligatureMap.data
 (No such file or directory)
** Error: problem of opening/reading ligature file: 
'/Users/Dima/Loyola/Workspaces/cTakes/ctakes/ctakes-clinical-pipeline/data/Unicode/ligatureMap.data'.
** Error: problem of opening/reading symbol synonym file: 
'/Users/Dima/Loyola/Workspaces/cTakes/ctakes/ctakes-clinical-pipeline/data/Unicode/synonymMap.data'.
Exception: java.io.FileNotFoundException: 
/Users/Dima/Loyola/Workspaces/cTakes/ctakes/ctakes-clinical-pipeline/data/Unicode/synonymMap.data
 (No such file or directory)
**Error: problem of opening/reading file 
'/Users/Dima/Loyola/Workspaces/cTakes/ctakes/ctakes-clinical-pipeline/data/misc/removeS.data'.
Exception: java.io.FileNotFoundException: 
/Users/Dima/Loyola/Workspaces/cTakes/ctakes/ctakes-clinical-pipeline/data/misc/removeS.data
 (No such file or directory)
** Error: problem of opening/reading Unicode symbol file: 
'/Users/Dima/Loyola/Workspaces/cTakes/ctakes/ctakes-clinical-pipeline/data/Unicode/symbolMap.data'.
Exception: java.io.FileNotFoundException: 
/Users/Dima/Loyola/Workspaces/cTakes/ctakes/ctakes-clinical-pipeline/data/Unicode/symbolMap.data
 (No such file or directory)
** Error: problem of opening/reading Unicode file: 
'/Users/Dima/Loyola/Workspaces/cTakes/ctakes/ctakes-clinical-pipeline/data/Unicode/unicodeMap.data'.
Exception: java.io.FileNotFoundException: 
/Users/Dima/Loyola/Workspaces/cTakes/ctakes/ctakes-clinical-pipeline/data/Unicode/unicodeMap.data
 (No such file or directory)
** Error: problem of opening/reading nonStripMap file: 
'/Users/Dima/Loyola/Workspaces/cTakes/ctakes/ctakes-clinical-pipeline/data/Unicode/nonStripMap.data'.
Exception: java.io.FileNotFoundException: 
/Users/Dima/Loyola/Workspaces/cTakes/ctakes/ctakes-clinical-pipeline/data/Unicode/nonStripMap.data
 (No such file or directory)
Exception: java.io.FileNotFoundException: 
/Users/Dima/Loyola/Workspaces/cTakes/ctakes/ctakes-clinical-pipeline/data/rules/im.rul
 (No such file or directory)
**Error: problem of opening/reading file 
'/Users/Dima/Loyola/Workspaces/cTakes/ctakes/ctakes-clinical-pipeline//data/rules/im.rul'.
Exception: java.io.FileNotFoundException: 
/Users/Dima/Loyola/Workspaces/cTakes/ctakes/ctakes-clinical-pipeline/data/rules/im.rul
 (No such file or directory)
**Error: problem of opening/reading file 
'/Users/Dima/Loyola/Workspaces/cTakes/ctakes/ctakes-clinical-pipeline//data/rules/im.rul'.
Exception: java.io.FileNotFoundException: 
/Users/Dima/Loyola/Workspaces/cTakes/ctakes/ctakes-clinical-pipeline/data/rules/dm.rul
 (No such file or directory)
**Error: problem of opening/reading file 
'/Users/Dima/Loyola/Workspaces/cTakes/ctakes/ctakes-clinical-pipeline//data/rules/dm.rul'.
Exception: java.io.FileNotFoundException: 
/Users/Dima/Loyola/Workspaces/cTakes/ctakes/ctakes-clinical-pipeline/data/rules/dm.rul
 (No such file or directory)
**Error: problem of opening/reading file 
'/Users/Dima/Loyola/Workspaces/cTakes/ctakes/ctakes-clinical-pipeline//data/rules/dm.rul'.
Exception: java.io.FileNotFoundException: 
/Users/Dima/Loyola/Workspaces/cTakes/ctakes/ctakes-clinical-pipeline/data/misc/stopWords.data
 (No such file or directory)
** Error: problem of opening/reading stop words file: 

type for lists/conjunctions

2015-10-27 Thread Dligach, Dmitriy
Do we now have a type in the cTAKES type system to handle lists and conjunctions 
such as in:

Evidence for metastatic disease in the [liver, spleen, pancreas or gallbladder].

The span I’d like to capture is in the square brackets. If we don’t have 
anything for this type of annotation, would it be worth creating one? Maybe 
make it a subtype of IdentifiedAnnotation?



Dima

--
Dmitriy (Dima) Dligach, Ph.D.
Boston Children's Hospital and Harvard Medical School
(617) 651-0397





Re: Fast Dictionary Update

2015-09-17 Thread Dligach, Dmitriy
Hi Brandon,

Relation extraction at the moment only handles two specific relation types: 
LocationOf and DegreeOf. You are welcome to run it if you need these specific 
relations.


Dima

--
Dmitriy (Dima) Dligach, Ph.D.
Boston Children's Hospital and Harvard Medical School
(617) 651-0397



On Sep 17, 2015, at 17:08, Geise, Brandon D. wrote:

Does the RelationsExtractor need to be run in order to generate information on 
relationships from cTakes?  When running with 2011 UMLS dictionary I'm able to 
get relationships for BodyLocationMentions but with the dictionary I created I 
am not getting this information.  Any advice?

Thanks,
Brandon

-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Thursday, September 17, 2015 1:18 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

It claims that the database is connected, and the preceding lines were spat out 
during loading, which took ~3-4 seconds (so something was there):

17 Sep 2015 12:58:58  INFO JdbcConnectionFactory -  Database connected

Strange.  I don't really know what to tell you right now.  Perhaps something 
will click with me later ...


Did you also run org.apache.ctakes.dictionarytool.CodeMapCreator ?  It isn't 
strictly necessary but it stores the tuis in the database so that cTakes can 
identify the semantic group of a mention.




-Original Message-
From: Geise, Brandon D. [mailto:bdge...@geisinger.edu]
Sent: Thursday, September 17, 2015 1:02 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Not specifically loaded.  Here's what I see when loading the pipeline:

17 Sep 2015 12:58:54  INFO JdbcConnectionFactory - Connecting to 
jdbc:hsqldb:file:path/to/ctakes/ctakes-dictionary-lookup-fast-res/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/UMLS2015/snorx2015:

17 Sep 2015 12:58:58  INFO JdbcConnectionFactory -  Database connected

-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Thursday, September 17, 2015 12:57 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Making an alternate copy of cTakesHsql.xml and pointing to the new dictionary 
is all that is necessary.  Do you see a message in the initialization output 
indicating that the dictionary db has been loaded?

-Original Message-
From: Geise, Brandon D. [mailto:bdge...@geisinger.edu]
Sent: Thursday, September 17, 2015 12:54 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Great, thanks. Both seemed to work for populating the script table.

Besides the path to the new dictionary needing to be changed in cTakesHsql.xml, 
does anything else need to be modified to use the new dictionary?  My pipeline 
runs; however, there aren't any annotations related to the UMLS concepts.  The 
only annotations I'm seeing are date, roman numeral, or modifier related. (My 
pipeline is UMLSFastProcessor with additions for modifiers and templatefiller.) 
Any suggestions would be appreciated.

Thanks,
Brandon

-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Thursday, September 17, 2015 10:40 AM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Correct, Hsql should automatically read the .log file upon first use, and then 
perform the inserts into the .script file.

In case you want to play it safe, check the README in the resource/ directory 
(where you got the hsqldb template).  The last paragraph indicates how you can 
launch a simple sql tool to play with the db.  You will need to change the name 
of the db accordingly.  Upon first launch of the sql tool everything should be 
moved from the .log to the .script file.   It is a strange setup/workflow, but 
it seems to work.

Sean

-Original Message-
From: Geise, Brandon D. [mailto:bdge...@geisinger.edu]
Sent: Thursday, September 17, 2015 10:31 AM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

When I run the tool it outputs a file with a .log extension that has all the 
insert statements.  Do I copy this to the .script template from memcachedb in 
the dictionarytool project or should the inserts be put into the .script file 
by default on the program execution?

Thanks,
Brandon

-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Wednesday, September 16, 2015 9:59 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Excellent!

-Original Message-
From: Geise, Brandon D. [mailto:bdge...@geisinger.edu]
Sent: Wednesday, September 16, 2015 9:55 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

No, I had changed it on 

Re: Allergy Annotator

2015-07-10 Thread Dligach, Dmitriy
Hi Tom,

If the patterns are pretty simple, you could just add a few rules on top of the 
cTAKES dictionary lookup output. Something of the kind “allergic to 
medication” or “allergies: medication1, medication2, substance1, ...”.

If these patterns are hard to express as rules, you should consider a machine 
learning based sequence labeling route (e.g. something similar to the cTAKES 
chunker).
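
The rule-based route mentioned above could be sketched roughly as follows. This is an illustration under stated assumptions, not an existing cTAKES module: it works on plain strings rather than on dictionary lookup annotations so that it stays self-contained, the class name `AllergyRules` is hypothetical, and the two patterns only cover the "allergies: a, b" header and the "allergic to x" phrase quoted in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AllergyRules {

    // "ALLERGIES: PENICILLIN, WHEAT": a header followed by a comma-separated list,
    // read up to the next period or line break.
    private static final Pattern LIST =
            Pattern.compile("(?i)\\ballergies\\s*:\\s*([^.\\n]+)");

    // "allergic to penicillin": captures only the single token after "allergic to".
    private static final Pattern PHRASE =
            Pattern.compile("(?i)\\ballergic to\\s+([a-z][a-z-]*)");

    /** Returns lower-cased allergen strings: list-header matches first, then phrase matches. */
    public static List<String> findAllergens(String text) {
        List<String> found = new ArrayList<>();
        Matcher list = LIST.matcher(text);
        while (list.find()) {
            for (String item : list.group(1).split(",")) {
                String allergen = item.trim().toLowerCase(Locale.ROOT);
                if (!allergen.isEmpty()) {
                    found.add(allergen);
                }
            }
        }
        Matcher phrase = PHRASE.matcher(text);
        while (phrase.find()) {
            found.add(phrase.group(1).toLowerCase(Locale.ROOT));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAllergens("ALLERGIES:  PENICILLIN, WHEAT"));
        System.out.println(findAllergens("The patient is allergic to penicillin."));
    }
}
```

In a real pipeline the same rules would more likely test whether a medication or substance mention found by dictionary lookup falls inside the matched span, instead of trusting the raw text after the trigger phrase.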


Dima

--
Dmitriy (Dima) Dligach, Ph.D.
Boston Children's Hospital and Harvard Medical School
(617) 651-0397



On Jul 10, 2015, at 13:40, Tom Devel deve...@gmail.com wrote:

Sean,

It would be a wider net, such that if an allergy is mentioned in the
clinical note, this is captured in the corresponding IdentifiedAnnotation
(or alternatively, if the IdentifiedAnnotation class should not be changed
with a new attribute, in a separate allergy annotation).

This annotator would then have to of course run after the clinical pipeline
has run and discovered all IdentifiedAnnotations.

I am familiar with writing UIMA/cTAKES annotators, but not sure how a new
ML method could be integrated here for detecting allergies. Do you have any
thoughts about how to approach this in general?

Thanks,
Tom

On Fri, Jul 10, 2015 at 11:54 AM, Finan, Sean 
sean.fi...@childrens.harvard.edu wrote:

Hi Tom,

Are you interested in catching all allergies or just a few specific
allergies for a study?  If you are only concerned with a few then there is
a (possibly) simple solution.  If you are interested in throwing a wider
net then I think that a new module would need to be created; does anybody
reading this have an ML or regex style module?

Sean

-Original Message-
From: Tom Devel [mailto:deve...@gmail.com]
Sent: Friday, July 10, 2015 12:42 PM
To: dev@ctakes.apache.org
Subject: Allergy Annotator

Hi,

I would like to use/extend cTAKES to detect allergies.

In the cTAKES publication (2010)

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2995668/
there is the mention
that: Allergies to a given medication are handled by setting the negation
attribute of that medication to ‘is negated’.

However, in a post here in 2014 (RE: Allergy Indication) it is said that
cTAKES does not have a module for allergy discovery.

1. What is the current status of allergy detection in cTAKES?

2. I did some testing: while cTAKES discovers concepts about allergies
(wheat allergy is found as C0949570), using "ALLERGIES:  PENICILLIN,
WHEAT" or "The patient is allergic to penicillin." does not give the penicillin
or wheat annotations an allergy status.

How would I go about detecting these allergy mentions?

Thanks,
Tom




Re: OpenNLP VS UIMA, general question.

2015-05-19 Thread Dligach, Dmitriy
I am also not fully clear on what’s being asked here, but I want to point out 
that OpenNLP provides some UIMA wrappers (if I remember correctly).

Some links are available here:

http://stackoverflow.com/questions/10829410/how-can-i-integrate-opennlp-with-uima


Dima

--
Dmitriy (Dima) Dligach, Ph.D.
Boston Children's Hospital and Harvard Medical School
(617) 651-0397



On May 19, 2015, at 9:52, Miller, Timothy 
timothy.mil...@childrens.harvard.edu wrote:

I'm not totally sure I understand your question. But if you are asking
if it's possible to use the clinical-trained OpenNLP models released
with cTAKES without using UIMA, yes, it should be possible. Some of the
cTAKES modules simply wrap OpenNLP APIs, and convert the
POS-tagged/Chunked/Parsed output into the UIMA typesystem.

Or are you asking if you can write a UIMA annotator that consumes
OpenNLP annotations rather than cTAKES' UIMA-based annotations? That is
possible too, though would definitely add complexity to a UIMA pipeline.

Tim

On 05/19/2015 10:08 AM, Damir Olejar wrote:
To whom it may concern,

I would like to ask whether it is possible to have a code written for
OpenNLP and then, if necessary, integrate it with UIMA.  Furthermore, is it
possible to go from UIMA to OpenNLP ? For example, I am interested in a
medical analysis with cTakes, but I cannot find a way how to do it using
only the OpenNLP.

The reason why I want to rely on OpenNLP as much as possible, is simply due
to a complexity of applications I am developing, and UIMA would simply
complicate everything without a necessity.

Thank you kindly for your answers!

Damir Olejar





head word identification

2015-03-02 Thread Dligach, Dmitriy
Hello,

Is anybody aware of a reliable way of identifying the head word of a UMLS 
entity? In the general domain, people often use Collins rules, but I’m not sure 
whether they would be applicable to clinical entities.

Until recently I was under the impression that taking the last word of an entity 
would work pretty well, but now that I have looked at the data more closely, I 
am not so sure. E.g. it fails in these cases: “breast, left”, “ductal carcinoma 
in situ”, “carcinoma, consistent with breast primary”.
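
For the three failing cases listed above, a slightly less naive fallback than "last word" can be sketched as: keep only the text before the first comma, stop at the first preposition, and take the last remaining token. This is purely an illustrative heuristic, not Collins rules and not anything shipped with cTAKES; the class name `HeadWordHeuristic` and the small preposition list are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class HeadWordHeuristic {

    // Small illustrative stop list; a real one would need tuning on clinical data.
    private static final Set<String> PREPOSITIONS = new HashSet<>(
            Arrays.asList("in", "of", "with", "on", "at", "to", "for", "from"));

    /** Returns a guessed head word for an entity string. */
    public static String headWord(String entity) {
        // 1. Drop everything after the first comma ("breast, left" -> "breast").
        int comma = entity.indexOf(',');
        String segment = (comma >= 0 ? entity.substring(0, comma) : entity).trim();

        // 2. Walk the tokens and stop at the first preposition.
        String[] tokens = segment.split("\\s+");
        String head = "";
        for (String token : tokens) {
            if (PREPOSITIONS.contains(token.toLowerCase(Locale.ROOT))) {
                break;
            }
            head = token;
        }

        // 3. If the segment starts with a preposition, fall back to its last token.
        return head.isEmpty() ? tokens[tokens.length - 1] : head;
    }

    public static void main(String[] args) {
        System.out.println(headWord("breast, left"));
        System.out.println(headWord("ductal carcinoma in situ"));
        System.out.println(headWord("carcinoma, consistent with breast primary"));
    }
}
```

The heuristic handles these three strings but will misfire on entities whose head follows a comma or a preposition, so it is a starting point for error analysis rather than an answer to the question.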

Dima


Dmitriy (Dima) Dligach, Ph.D.
Boston Children's Hospital and Harvard Medical School
(617) 651-0397





Re: slow dictionary lookup

2015-02-23 Thread Dligach, Dmitriy
I seem to have fixed this problem for now by unchecking the Project/Build 
Automatically option in Eclipse (and then following the instructions on the web 
page).

Dima


On Feb 22, 2015, at 10:16, Dmitriy Dligach 
dmitriy.dlig...@childrens.harvard.edu wrote:

Hello,

Despite following the instructions here:
http://ctakes.apache.org/developer-faqs.html#how-do-i-work-around-issues-with-resource-directories-in-eclipse-and-m2e

I am still having the issue described on that page (slow dictionary lookup). 
Does anybody have any advice? Any alternative solutions?

Thank you in advance,

Dima







slow dictionary lookup

2015-02-22 Thread Dligach, Dmitriy
Hello,

Despite following the instructions here:
http://ctakes.apache.org/developer-faqs.html#how-do-i-work-around-issues-with-resource-directories-in-eclipse-and-m2e

I am still having the issue described on that page (slow dictionary lookup). 
Does anybody have any advice? Any alternative solutions?

Thank you in advance,

Dima






Re: cTakes and uimaFIT

2014-10-31 Thread Dligach, Dmitriy
It’s already used quite extensively. Did you mean beyond what’s already done?

Dima




On Oct 31, 2014, at 7:48, Renaud Richardet ren...@apache.org wrote:

 Are there any plans and/or interest to move the cTAKES readers and engines
 to use uimaFIT?
 
 Thanks, Renaud



Re: sentence detector model

2014-09-29 Thread Dligach, Dmitriy
Maybe creating a made-up set of sentences would be an option? That way we could 
agree on the annotation of concrete cases. Although this would be more of a 
unit test than a corpus.
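
A rough sketch of what such a made-up test could look like, assuming sentence boundaries are compared as character offsets. The note text, the gold offsets, and both function names are invented for illustration; newline_baseline only mimics the old "every newline is a break" behavior and is not the cTAKES sentence detector.

```python
def boundary_prf(gold, predicted):
    """Precision, recall and F1 over sets of sentence-end offsets."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def newline_baseline(text):
    """Treat every newline, and the end of the text, as a sentence end."""
    return {i for i, ch in enumerate(text) if ch == "\n"} | {len(text)}


# Made-up note: the first newline (offset 15) is a line wrap inside a
# sentence, the second (offset 21) is a real sentence break.
note = "Pt denies chest\npain.\nFollow up in 2 weeks."
gold_boundaries = {21, len(note)}

precision, recall, f1 = boundary_prf(gold_boundaries, newline_baseline(note))
print(precision, recall, f1)
```

On this single made-up note the baseline finds both real breaks but also splits at the line wrap, so recall is 1.0, precision is 2/3, and F1 is about 0.8.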

Dima




On Sep 27, 2014, at 12:15, Miller, Timothy 
timothy.mil...@childrens.harvard.edu wrote:

 I've just been using the opennlp command line cross validator on the small 
 dataset I annotated (along with some eyeballing). It would be cool if there 
 was a standard clinical resource available for this task, but I hadn't 
 considered it much because the data I annotated pulls from multiple datasets 
 and the process of arranging with different institutions to make something 
 like that available would probably be a nightmare.
 Tim
 
 Sent from my iPad. Sorry about the typos.
 
 On Sep 27, 2014, at 12:16 PM, Dligach, Dmitriy 
 dmitriy.dlig...@childrens.harvard.edu wrote:
 
 Tim, thanks for working on this!
 
 Question: do we have some formal way of evaluating the sentence detector? 
 Maybe we should come up with some dev set that would include examples from 
 mimic...
 
 Dima
 
 
 
 
 On Sep 27, 2014, at 8:57, Miller, Timothy 
 timothy.mil...@childrens.harvard.edu wrote:
 
 I have been working on the sentence detector newline issue, training a 
 model to probabilistically split sentences on newlines rather than forcing 
 sentence breaks. I have checked in a model to the repo under 
 ctakes-core-res. I also attached a patch to ctakes-core to the jira issue:
 https://issues.apache.org/jira/browse/CTAKES-41
 
 for people to test. The status of my testing is that it doesn't seem to 
 break on notes where ctakes worked well before (those where newlines are 
 always sentence breaks), and is a slight improvement on notes where 
 newlines may or may not be sentence breaks. Once the change is checked in 
 we can continue improving the model by adding more data and features, but 
 the first hurdle I'd like to get past is making sure it runs well enough on 
 the type of data that the old model worked well on. Let me know if you have 
 any questions.
 
 Thanks
 Tim
 



Re: sentence detector model

2014-09-27 Thread Dligach, Dmitriy
Tim, thanks for working on this!

Question: do we have some formal way of evaluating the sentence detector? Maybe 
we should come up with some dev set that would include examples from mimic...

Dima




On Sep 27, 2014, at 8:57, Miller, Timothy 
timothy.mil...@childrens.harvard.edu wrote:

 I have been working on the sentence detector newline issue, training a model 
 to probabilistically split sentences on newlines rather than forcing sentence 
 breaks. I have checked in a model to the repo under ctakes-core-res. I also 
 attached a patch to ctakes-core to the jira issue:
 https://issues.apache.org/jira/browse/CTAKES-41
 
 for people to test. The status of my testing is that it doesn't seem to break 
 on notes where ctakes worked well before (those where newlines are always 
 sentence breaks), and is a slight improvement on notes where newlines may or 
 may not be sentence breaks. Once the change is checked in we can continue 
 improving the model by adding more data and features, but the first hurdle 
 I'd like to get past is making sure it runs well enough on the type of data 
 that the old model worked well on. Let me know if you have any questions.
 
 Thanks
 Tim



Re: Semantic similarity standard

2014-09-27 Thread Dligach, Dmitriy
This paper:

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3900161/

Mentions an evaluation on a dataset from Ted Pedersen. It might be one of the 
datasets listed here:

http://www.tc.umn.edu/~bthomson/sim/index.html

Dima




On Sep 27, 2014, at 12:26, John Green john.travis.gr...@gmail.com wrote:

 Does anyone know of a larger set of human generated measures for semantic 
 similarity than the 500+ one cited by Vijay in his paper on semantic 
 similarity? The paper cited was PMID 21347043.
 
 Jg
 —
 Sent from Mailbox



Re: uimafit and cleartk upgrades

2014-09-18 Thread Dligach, Dmitriy
Tim, thank you very much for working on this upgrade. I know it was a pretty 
serious effort.

Question: were there any particular changes in uimafit/cleartk that we should 
be aware of? Any major deviation from the way we typically used uimafit/cleartk?

Dima




On Sep 16, 2014, at 15:02, Miller, Timothy 
timothy.mil...@childrens.harvard.edu wrote:

 This mega-revision is now checked in. Jenkins will let me know if there
 are any major issues with compiling, but please send feedback to the list if
 there is anything more subtle that was a mistake and we can try to work
 through it.
 
 Thanks
 Tim
 
 On 09/11/2014 03:41 PM, Miller, Timothy wrote:
 Just wanted to send a heads up to the list/community that I am working
 on upgrading ctakes to use most recent versions of Uimafit (2.1) and
 Cleartk (2.0). This will allow us to keep up with namespace changes in
 uimafit and new machine learning methods being used in cleartk (that we
 are also contributing to). However, this will entail a lot of small
 changes across the codebase, so I thought it would be good to give a bit
 of a warning. Specifically, if anyone else has any large unchecked-in
 changes to the code, svn may throw some conflicts at you.
 
 Tim
 
 



Re: uimafit and cleartk upgrades

2014-09-18 Thread Dligach, Dmitriy
Very useful, thanks!

Dima




On Sep 18, 2014, at 10:13, Miller, Timothy 
timothy.mil...@childrens.harvard.edu wrote:

 For Uimafit the biggest change is the namespace. There were also some
 minor changes to method names of the factory methods for readers and
 aes. The JCasIterable changed and now the JCasIterator is more like what
 the JCasIterable used to be.
 
 ConfigurationParameters now have mandatory=true by default. There were
 one or two places where I had to specify mandatory=false to allow for
 this to compile.
 
 XWriter and XWriterFileNamer were removed. There is a new utility class
 called CasIoUtil and there is also still a ctakes class called
 XmiWriterCasConsumerCtakes. CasIoUtil is a simple standalone utility
 class -- has methods that take an already processed JCas and a File to
 write to. XmiWriterCasConsumerCtakes is a consumer that goes on the end
 of a pipeline (similar to XWriter).
 
 Anyone interested in more detail on these issues can check out this link
 which I used to do the upgrade:
 https://cwiki.apache.org/confluence/display/UIMA/Migration+guide+1.x+to+2.x
 
 Cleartk changed many package names. The common feature interface
 SimpleFeatureExtractor was changed to FeatureExtractor1 (meaning, takes
 1 argument). There is also now FeatureExtractor2 which takes 2
 arguments, suitable for things like relation classification.
 
 Class names were standardized to prefer camel case conventions over
 library naming conventions (LibSVM = LibSvm, LIBLINEAR = LibLinear).
 
 There were some other similar things but I think those were the ones
 that concerned ctakes the most.
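
A minimal sketch of how two of the renames above could be applied mechanically to Java source text. The old uimaFIT package prefix org.uimafit and its replacement org.apache.uima.fit are the 1.x and 2.x namespaces; the SimpleFeatureExtractor to FeatureExtractor1 rename is taken from the note above. The RENAMES table and migrate_source are hypothetical names, and a plain textual rename will not fix changed method signatures or generic type parameters.

```python
import re

# (pattern, replacement) pairs; word boundaries avoid touching longer names.
RENAMES = [
    (re.compile(r"\borg\.uimafit\b"), "org.apache.uima.fit"),
    (re.compile(r"\bSimpleFeatureExtractor\b"), "FeatureExtractor1"),
]


def migrate_source(java_source):
    """Apply each textual rename to a Java source string."""
    for pattern, replacement in RENAMES:
        java_source = pattern.sub(replacement, java_source)
    return java_source


old = (
    "import org.uimafit.factory.AnalysisEngineFactory;\n"
    "private SimpleFeatureExtractor extractor;\n"
)
print(migrate_source(old))
```

Anything the sketch rewrites should still be compiled and reviewed by hand, since the migration guide linked above lists changes that are not simple renames.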
 
 Tim
 
 
 
 
 
 
 On 09/18/2014 10:37 AM, Dligach, Dmitriy wrote:
 Tim, thank you very much for working on this upgrade. I know it was a pretty 
 serious effort.
 
 Question: were there any particular changes in uimafit/cleartk that we 
 should be aware of? Any major deviation from the way we typically used 
 uimafit/cleartk?
 
 Dima
 
 
 
 
 On Sep 16, 2014, at 15:02, Miller, Timothy 
 timothy.mil...@childrens.harvard.edu wrote:
 
 This mega-revision is now checked in. Jenkins will let me know if there
 are any major issues with compiling, but please send feedback to the list if
 there is anything more subtle that was a mistake and we can try to work
 through it.
 
 Thanks
 Tim
 
 On 09/11/2014 03:41 PM, Miller, Timothy wrote:
 Just wanted to send a heads up to the list/community that I am working
 on upgrading ctakes to use most recent versions of Uimafit (2.1) and
 Cleartk (2.0). This will allow us to keep up with namespace changes in
 uimafit and new machine learning methods being used in cleartk (that we
 are also contributing to). However, this will entail a lot of small
 changes across the codebase, so I thought it would be good to give a bit
 of a warning. Specifically, if anyone else has any large unchecked-in
 changes to the code, svn may throw some conflicts at you.
 
 Tim
 
 
 



Re: Preparing for an Apache cTAKES 3.2 Release?

2014-06-16 Thread Dligach, Dmitriy
+1

Dima




On Jun 16, 2014, at 9:42, Miller, Timothy 
timothy.mil...@childrens.harvard.edu wrote:

 Sorry to weigh in so late on this -- just returned from vacation. If we
 want to have a one release delay before making dictionary2 default for
 testing/documentation/configuration purposes, and there isn't an obvious
 function-related name, and the main difference is speed, maybe we could
 call it dictionary-lookup-fast? Besides being accurate and more
 descriptive than 2, it might lure people into trying it and give us
 some feedback.
 
 Tim
 
 
 On 06/16/2014 10:34 AM, Chen, Pei wrote:
 I'm making some significant updates to trunk that may cause some instability 
 for this release.
 It should be mostly transparent, but let me know if you encounter any issues 
 with trunk.
 
 Also, regarding the dictionary-lookup2.  If there are no strong objections, 
 we can leave the default as-is (old behavior).  Folks who wish to give the 
 new one a try are welcome to do so and we can change the default behavior in 
 a future release.
 
 [ducks for cover now]
 --Pei
 
 -Original Message-
 From: ksa...@gmail.com [mailto:ksa...@gmail.com] On Behalf Of Karthik
 Sarma
 Sent: Wednesday, June 11, 2014 9:58 AM
 To: dev@ctakes.apache.org
 Subject: Re: Preparing for an Apache cTAKES 3.2 Release?
 
 Agreed
 
 On Wednesday, June 11, 2014, vijay garla vnga...@gmail.com wrote:
 
 regardless of the name, I think it would be incredibly helpful to have
 thorough documentation on the dictionary lookup, how to configure it,
 and how to create new dictionaries.  I would venture to say that this
 is the most important component in cTAKES, and probably the one that
 has generated the most questions on the newsgroup.
 
 
 
 On Wed, Jun 11, 2014 at 9:21 AM, Finan, Sean 
 sean.fi...@childrens.harvard.edu wrote:
 
 . The newer NER should have in its name the Behavior...
 I agree, but the *2 module is a complete replacement for the current
 lookup.  It does not (really) have any different behavior, just a
 different implementation and performance.  We plan to swap out the old
 with the new in the next release and get rid of the *2 suffix.  So, any
 name provided now is just temporary - unless people don't like the
 name dictionary-lookup at all.
 
 In my original sandbox it was named RareWordLookup, a nod to its
 implementation.  However, this doesn't help any users.
 
 Sean
 
 -Original Message-
 From: andy mcmurry [mailto:mcmurry.a...@gmail.com]
 Sent: Wednesday, June 11, 2014 3:09 AM
 To: dev@ctakes.apache.org
 Subject: Re: Preparing for an Apache cTAKES 3.2 Release?
 
 2 doesn't mean much. The newer NER should have in its name the
 Behavior...
 
 Perhaps something like MetaMap Usage
 http://metamap.nlm.nih.gov/Docs/MM09_Usage.shtml
 --allow_overmatches or --allow_concept_gaps or other?
 
 Since yTex already provides a pluggable *DictionaryLookup*, that
 seems like the best place to define the differing Behavior / Usage.
 
 https://cwiki.apache.org/confluence/display/CTAKES/User's+Guide
 https://code.google.com/p/ytex/wiki/DictionaryLookup_V05
 
 
 AndyMC
 
 On Tue, Jun 10, 2014 at 9:55 AM, britt fitch britt.fi...@gmail.com
 wrote:
 
 I don’t have an issue with the *-2 name. I also don’t have any
 objections to renaming it.
 
 It might be nice to keep the old dictionary code around for a
 release-worth of time but after that I would vote for purging it.
 If someone needs it after that it’ll be accessible in the archived
 releases.
 
 
 
 On Jun 10, 2014, at 12:48 PM, Chen, Pei
 pei.c...@childrens.harvard.edu
 wrote:
 
 I think James has a fair point here.
 It may be worthwhile biting the bullet here and push forward.
 
 Since this essentially will be a full replacement of the
 ctakes-dictionary-lookup module, a good option may be to just
 replace the entire module now and rename the existing module to
 *_deprecated.
 How do folks feel about that?  In a nutshell, ctakes-dictionary-lookup-2
 is a faster algorithm with a simpler code base and comparable
 results (Sean has a full comparison in the documentation for those
 who are curious).
 --Pei
 
 -Original Message-
 From: britt fitch [mailto:britt.fi...@gmail.com]
 Sent: Monday, June 09, 2014 5:42 PM
 To: dev@ctakes.apache.org
 Subject: Re: Preparing for an Apache cTAKES 3.2 Release?
 
 There is some documentation in the dictionary2 module under
 /doc/DictionaryLookupHelp.{txt | docx} that gives some details of the
 different lookup implementation options within that module that
 I found helpful.
 
 
 On Jun 9, 2014, at 5:17 PM, Masanz, James J.
 
 
 
 --
 
 
 
 
 --
 Karthik Sarma
 UCLA Medical Scientist Training Program Class of 20??
 Member, UCLA Medical Imaging  Informatics Lab Member, CA Delegation
 to the House of Delegates of the American Medical Association
 ksa...@ksarma.com
 gchat: ksa...@gmail.com
 linkedin: www.linkedin.com/in/ksarma
 
 -- 
 Tim Miller
 Instructor
 Boston Children's Hospital and Harvard Medical School
 

Re: markable types

2014-05-16 Thread Dligach, Dmitriy
Weird… I sent this email days ago.

Dima




On May 16, 2014, at 16:16, Dligach, Dmitriy 
dmitriy.dlig...@childrens.harvard.edu wrote:

 Probably a good idea. Would this new type be related to IdentifiedAnnotation? 
 Its super type?
 
 Dima
 
 
 
 
 On May 15, 2014, at 3:02, Miller, Timothy 
 timothy.mil...@childrens.harvard.edu wrote:
 
 What do people think about taking the markable types out of the
 coreference project and adding them to the standard type system? This is
 a pretty standard concept in coreference that doesn't really have a
 great natural representation in the current type system -- it
 encompasses IdentifiedAnnotations as well as pronouns (It, him,
 her) and some determiners (this).
 
 The drawback I can see is that it is probably not something anyone would
 want extracted -- ultimately you want the actual coref pairs or chains.
 But it is useful for things like representing gold standard input or
 splitting coreference resolution into separate markable recognition and
 relation classification steps.
 
 Tim
 
 



Re: lvg entries

2014-04-17 Thread Dligach, Dmitriy
Tim, this is a very interesting observation. Could you please send a few 
examples of what LVG generates? Both sensical and non :)

Dima




On Apr 17, 2014, at 11:28, Miller, Timothy 
timothy.mil...@childrens.harvard.edu wrote:

 The LVG annotator creates an enormous number of lemmas for every
 WordToken in the CAS, and I'm wondering what the original purpose was? I
 think this is probably a minor bottleneck for speed but mostly a pretty
 big space hog (at least 50% of the space of xmi files in my tests).
 
 As of right now I'm not sure if any downstream components are using
 these lemmas, and on a manual inspection the precision seems to be
 pretty abysmal (meaning most of them are nonsensical as lexical
 variants), so as I said, just wondering if we can revisit why cTAKES
 generates so many and whether that component can be optimized.
 
 Thanks
 Tim
 



Re: lvg entries

2014-04-17 Thread Dligach, Dmitriy
I don’t know of any applications within cTAKES that make use of this… The 
reverse (mapping from these “variants” to the normal form) may be useful though.
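
A small sketch of that reverse mapping, assuming each variant is a string of the form surface/POS like the ones Tim lists below. The input dictionary and the name build_reverse_index are hypothetical, and this does not use the LVG API.

```python
from collections import defaultdict


def build_reverse_index(variants_by_base):
    """Map each lowercased variant surface form back to its base form(s)."""
    reverse = defaultdict(set)
    for base, variants in variants_by_base.items():
        for variant in variants:
            # Drop the part-of-speech tag after the last slash.
            surface = variant.rsplit("/", 1)[0].lower()
            reverse[surface].add(base)
    return reverse


# A few of the generated variants for "symptomatic" from the thread.
generated = {
    "symptomatic": ["Symptomatics/VB", "Symptomatics/NN", "Symptomatic/JJ"],
}
index = build_reverse_index(generated)
print(sorted(index["symptomatics"]))
```

Storing only this variant-to-base index, rather than every generated variant on every token, is one way the space cost could be reduced, at roughly 11.5 variants per word in Tim's example.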

Dima




On Apr 17, 2014, at 11:50, Miller, Timothy 
timothy.mil...@childrens.harvard.edu wrote:

 Sure, just as an example, I gave it a note with about 1000 words. It
 generates 11500 NonEmptyFSList elements (each is basically one lexical
 variant).
 
 For the word symptomatic, these are the first 10 of 20 lexical variants:
 Symptomaticer/JJ
 Symptomaticer/RB
 Symptomaticed/VB
 Symptomaticcing/VB
 Symptomatics/VB
 Symptomatics/NN
 Symptomaticked/VB
 Symptomatic/VB
 Symptomatic/JJ
 Symptomatic/RB
 
 Tim
 
 
 On 04/17/2014 12:31 PM, Dligach, Dmitriy wrote:
 Tim, this is a very interesting observation. Could you please send a few 
 examples of what LVG generates? Both sensical and non :)
 
 Dima
 
 
 
 
 On Apr 17, 2014, at 11:28, Miller, Timothy 
 timothy.mil...@childrens.harvard.edu wrote:
 
 The LVG annotator creates an enormous number of lemmas for every
 WordToken in the CAS, and I'm wondering what the original purpose was? I
 think this is probably a minor bottleneck for speed but mostly a pretty
 big space hog (at least 50% of the space of xmi files in my tests).
 
 As of right now I'm not sure if any downstream components are using
 these lemmas, and on a manual inspection the precision seems to be
 pretty abysmal (meaning most of them are nonsensical as lexical
 variants), so as I said, just wondering if we can revisit why cTAKES
 generates so many and whether that component can be optimized.
 
 Thanks
 Tim