Re: The SegmentRegexAnnotator of Ytex

2015-07-15 Thread vijay garla
Can you make sure you did everything documented here:
https://cwiki.apache.org/confluence/display/CTAKES/YTEX+Installation
I can see from the stack trace that hibernate is not in the classpath (see
section 'Unzip YTEX Libraries')
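
One quick way to confirm whether a library is visible to the JVM is to try loading one of its classes from the same classpath the pipeline uses. The snippet below is only a sketch: `ClasspathCheckSketch` and `isOnClasspath` are hypothetical names, and `org.hibernate.Session` is used on the assumption that it is a representative Hibernate class for the version YTEX depends on.

```java
class ClasspathCheckSketch {

    /** Returns true if the named class can be located by this class's class loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheckSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: org.hibernate.Session stands in for "Hibernate is present".
        String probe = args.length > 0 ? args[0] : "org.hibernate.Session";
        System.out.println(probe + (isOnClasspath(probe) ? " is" : " is NOT") + " on the classpath");
    }
}
```

Run it with the same classpath as the failing pipeline; a "NOT" result would be consistent with the missing-library diagnosis.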

Best,

VJ

On Tue, Jul 14, 2015 at 2:41 AM, Oranit Dror ora...@algotec.co.il wrote:

 Thank you, Vijay.
 However, I am still encountering the crash.

 Best,
 Oranit.

 -----Original Message-----
 From: vijay garla [mailto:vnga...@gmail.com]
 Sent: Monday, July 13, 2015 5:53 PM
 To: dev@ctakes.apache.org
 Subject: Re: The SegmentRegexAnnotator of Ytex

 see https://cwiki.apache.org/confluence/display/CTAKES/User%27s+Guide

 best,

 vj


Re: The SegmentRegexAnnotator of Ytex

2015-07-13 Thread vijay garla
see https://cwiki.apache.org/confluence/display/CTAKES/User%27s+Guide

best,

vj

On Mon, Jul 13, 2015 at 2:50 AM, Oranit Dror ora...@algotec.co.il wrote:

 Hello,

 I am using cTAKES 3.2.2 and recently tried to run the YTEX pipeline. In
 particular, I am interested in the SegmentRegexAnnotator of YTEX.

 My questions are:

 1.   When running the pipeline, an
 org.apache.uima.resource.ResourceInitializationException is thrown,
 probably due to a failure in the initialization of
 org.apache.ctakes.ytex.uima.annotators.SegmentRegexAnnotator. Below is the
 stack trace.

 2.   Where can I find information on how the SegmentRegexAnnotator
 works, especially where the list of segments is defined?

 Thank you,
 Oranit.


 The stack trace for the YTEX pipeline crash:

 12 Jul 2015 09:47:52 ERROR RunEngine - Failed to create AE from xml descriptor :E:/Data/Views/oranit_nlp/subprod1/nlp/java/algotec-nlp/desc/desc/algotec-nlp/desc/analysis_engine/AggregateDiseaseYtexUMLSProcessorDescriptor.xml
 org.apache.uima.resource.ResourceInitializationException: Initialization of annotator class org.apache.ctakes.ytex.uima.annotators.SegmentRegexAnnotator failed. (Descriptor: file:/E:/Program Files/apache-ctakes-3.2.2-rc2/desc/ctakes-ytex-uima/desc/analysis_engine/SegmentRegexAnnotator.xml)
     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:252)
     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:156)
     at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
     at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
     at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
     at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:387)
     at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:254)
     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:431)
     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:375)
     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:185)
     at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
     at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
     at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
     at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:387)
     at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:254)
     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:431)
     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:375)
     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:185)
     at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
     at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
     at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
     at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:354)
     at com.algotec.nlp.RunEngine.createCasObjects(RunEngine.java:1399)
     at com.algotec.nlp.RunEngine.ensureCasObjects(RunEngine.java:1373)
     at com.algotec.nlp.RunEngine.analyze(RunEngine.java:954)
     at com.algotec.nlp.servlet.ReportNLPServlet.doPost(ReportNLPServlet.java:128)
     at com.algotec.nlp.servlet.ReportNLPServlet.doPost(ReportNLPServlet.java:103)
     at javax.servlet.http.HttpServlet.service(HttpServlet.java:647)
     at javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305)
     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
     at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:51)
     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
at
 

Re: Question about YTEX

2015-06-12 Thread vijay garla
You need to annotate some documents with a Collection Processing Engine
that stores results in the YTEX database.   I suggest walking through the
fracture demo sample here:
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2.0+-+YTEX+DBCollectionReader

-vj

On Fri, Jun 12, 2015 at 6:46 PM, Tsung-Ting Kuo ts...@ucsd.edu wrote:

 Hi Vijay,



 I would like to update my questions:



 (1)   *Auto-complete of concept IDs. *I realized that only concepts in
 the “v_snomed_fword_lookup” table are usable (e.g., I use “cough” and the
 auto-complete works as attached).



 (2)   *Clinical document searching. *Which table should I use to put the
 clinical documents in order to be searchable from the “Semantic Search”
 function?



 (3)   *Fracture demo. *I saw there is a “fracture_demo” table in the YTEX
 database, how could I use it in the “Semantic Search” function?



 Thanks very much!



 Best regards,

 Tim



 *From:* Tsung-Ting Kuo [mailto:ts...@ucsd.edu]
 *Sent:* Thursday, June 11, 2015 11:10 AM
 *To:* 'Vijay Garla'
 *Cc:* dev@ctakes.apache.org

 *Subject:* RE: Question about YTEX



 Thanks a lot. I followed the instructions, ran the “ytexweb.sh” script, and
 successfully started the web app! However, the search does not return any
 results (no matter what keyword I type), and the auto-complete function does
 not seem to work. Our experimental website is here:



 http://textmining.ucsd.edu:8080/semanticSearch.iface



 And I also attached the screenshot and the output of the “ytexweb.sh”
 script. Could you kindly help to see if there is any setting /
 configuration needed to be modified in order to do the search?



 Thanks very much!



 Best regards,

 Tim



 *From:* Vijay Garla [mailto:vijay.ga...@yale.edu]
 *Sent:* Wednesday, June 10, 2015 11:36 PM
 *To:* ts...@ucsd.edu
 *Cc:* Vijay Garla; dev@ctakes.apache.org
 *Subject:* Re: Question about YTEX



 This is part of the Semantic Similarity Web App.  There is a script to
 start it; see
 https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2.0+-+Semantic+Similarity
 - just run the ytexweb.sh script in the bin directory






Re: Question about YTEX

2015-06-11 Thread Vijay Garla
This is part of the Semantic Similarity Web App.  There is a script to
start it; see
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2.0+-+Semantic+Similarity
- just run the ytexweb.sh script in the bin directory


On Wed, Jun 10, 2015 at 11:32 PM, Tsung-Ting Kuo ts...@ucsd.edu wrote:

 Thanks very much, this is really helpful! I surely can see the
 “v_snomed_fword_lookup” table in my database!



 So my next question is: is there any documentation on how to set up the
 Clinical NLP Semantic Search Engine on Tomcat (this is actually the main
 reason I am seeking help with YTEX)? I saw
 “ctakes-ytex-web-3.2.2-classes.jar” in the “lib” folder, but I am not sure
 how to deploy it as a web service.



 Best regards,

 Tim





 *From:* Vijay Garla [mailto:vijay.ga...@yale.edu]
 *Sent:* Wednesday, June 10, 2015 12:10 PM
 *To:* ts...@ucsd.edu
 *Cc:* Vijay Garla; dev@ctakes.apache.org

 *Subject:* Re: Question about YTEX



 You should see a v_snomed_fword_lookup table in your database.



 YTEX doesn't use all of UMLS.  It filters to SNOMED-CT and RXNORM, and to
 specific Semantic Types.



 The script it uses is here:
 https://svn.apache.org/repos/asf/ctakes/trunk/ctakes-ytex/scripts/data/mysql/umls/insert_view.template.sql



 You can create a dictionary with different vocabularies/semantic types
 very easily (just change the sql slightly).



 For more detail on how this works, see
 https://code.google.com/p/ytex/wiki/DictionaryLookup_V07
 (docs slightly out of date).



 I have not tested this with the 'fast' dictionary lookup, but I think the
 main speed gain of the fast lookup is due to skipping an extra(neous)
 database query.



 HTH,



 VJ



 On Wed, Jun 10, 2015 at 8:15 PM, Tsung-Ting Kuo ts...@ucsd.edu wrote:

 BTW, my YTEX installation just completed, and the results are attached.
 Did my YTEX installation successfully create the dictionary lookup table
 with all concepts from the UMLS?



 Thanks very much!



 Best regards,

 Tim



 *From:* Tsung-Ting Kuo [mailto:ts...@ucsd.edu]
 *Sent:* Wednesday, June 10, 2015 9:49 AM
 *To:* 'Vijay Garla'; dev@ctakes.apache.org
 *Subject:* RE: Question about YTEX



 Hi Vijay,



 You are absolutely right – after changing the permissions of the “resource”
 directory, YTEX started installing without a problem! In the meantime, I have
 a quick question: since I have installed the UMLS database and set
 “umls.schema” to the UMLS database name (I am using MySQL), how do I know
 whether YTEX successfully creates a dictionary lookup table with all
 concepts from the UMLS or not?



 Thanks again!



 Best regards,

 Tim



 *From:* Vijay Garla [mailto:vijay.ga...@yale.edu]
 *Sent:* Tuesday, June 9, 2015 11:31 PM
 *To:* ts...@ucsd.edu; dev@ctakes.apache.org
 *Cc:* vijay.ga...@yale.edu
 *Subject:* Re: Question about YTEX



 Hi Tsung-Ting,



 I see the following error:



 templateToConfig.extractTemplates:

  [echo] unpacking ytex templates from
 /usr/local/apache-ctakes-3.2.2/lib/ctakes-ytex-res-3.2.2.jar to
 /usr/local/apache-ctakes-3.2.2/resources

 [unzip] Expanding:
 /usr/local/apache-ctakes-3.2.2/lib/ctakes-ytex-res-3.2.2.jar into
 /usr/local/apache-ctakes-3.2.2/resources

 [unzip] Unable to expand to file
 /usr/local/apache-ctakes-3.2.2/resources/org/apache/ctakes/ytex/conceptGraph/sct-rxnorm.template.xml

 [unzip] Unable to expand to file
 /usr/local/apache-ctakes-3.2.2/resources/org/apache/ctakes/ytex/conceptGraph/sct-msh-csp-aod.template.xml

 [unzip] Unable to expand to file
 /usr/local/apache-ctakes-3.2.2/resources/org/apache/ctakes/ytex/conceptGraph/sct-umls.template.xml

 [unzip] Unable to expand to file
 /usr/local/apache-ctakes-3.2.2/resources/org/apache/ctakes/ytex/umls/model/UMLS.hbm.template.xml

 [unzip] Unable to expand to file
 /usr/local/apache-ctakes-3.2.2/resources/org/apache/ctakes/ytex/dictionary/lookup/LookupDesc_stem_SNOMED.template.xml

 [unzip] Unable to expand to file
 /usr/local/apache-ctakes-3.2.2/resources/org/apache/ctakes/ytex/dictionary/lookup/LookupDesc_SNOMED.template.xml



 I don't know why there was an issue extracting ctakes-ytex-res-3.2.2.jar.
 Can you make sure that  /usr/local/apache-ctakes-3.2.2/resources exists and
 is writable?



 -vj



 On Wed, Jun 10, 2015 at 12:17 AM, Tsung-Ting Kuo ts...@ucsd.edu wrote:

 Hi Vijay Garla,



 I am Tsung-Ting Kuo from UCSD

Re: cTakes polarity problem

2015-01-02 Thread vijay garla
As Guergana mentioned, cTAKES has a rule-based negation detection module.
In addition, YTEX adds a NegEx-based analysis engine.  Both approaches are
very sensitive to sentence splitting (see previous threads on alternative
sentence splitters).

An additional advantage of rule-based negation is that you don't need some of
the memory- and CPU-intensive analysis engines required by the ML-based
negation detection AE.

Hth

Vj

On Thursday, January 1, 2015, John Green john.travis.gr...@gmail.com
wrote:

 As I was reading this thread I had the same thought as Tim: perhaps a
 combination. It seems that with the perfect training corpus this wouldn't be
 necessary, but as a stopgap an ensemble approach might help those using
 your training data but working in a different corpus (not that I really have
 the time to write anything here, just spitballing because it's an
 interesting thread). I'm still bootstrapping myself in ML so I may not have
 followed David's reasoning perfectly, but couldn't a simple approach be that
 anything that isn't negated by the new algorithm gets passed to NegEx as a
 fallback? I think that was what you were saying, Tim.

 One area that I can comment on in a more meaningful way is Tim's remarks
 regarding the legitimacy of the phrase "Deny hepatitis": I agree, my
 clinical intuition says it's an unlikely phrase. More probably it was a
 typo; "Negative for hepatitis" would be more reasonable after, say,
 serology for HepB markers, though strictly speaking this would be less
 likely to be in a phrase reporting results of just that specific test
 (this would more likely be something along the lines of "hep panel
 negative" or simply "the labs were unremarkable"). However, I could see
 this phrase in something like "the std screen was negative for hep but
 positive for hiv".

 The latter is definitely just one clinical opinion, people talk all kinds
 of ways on the wards, good and bad, and it ends up in their notes too.

 Best,
 JG

 On Wed, Dec 31, 2014 at 12:32 PM, David Kincaid kincaid.d...@gmail.com
 wrote:

  Tim, I like your idea of a hybrid approach. I've thought about trying a
  hybrid approach in the past myself, but haven't had a chance to try it or
  seen any papers on it. It seems you could do it by either treating the
  NegEx output simply as a feature in the ML model or combining the output
  of NegEx and the ML model as an ensemble of sorts. The former would
  probably have the problem of the NegEx feature overwhelming any other
  features since it would be right most of the time. If I were doing it I
  think I'd start with the latter approach.
 
  In any event, it seems like right now people will need to see how the two
  systems (NegEx and ML) work on their particular data and go with
  whichever is best.
 
  - Dave
 
  On Wed, Dec 31, 2014 at 10:40 AM, Miller, Timothy
  timothy.mil...@childrens.harvard.edu wrote:
 
   Hi Michael,
   I'm somewhat sympathetic to that opinion. But we did a bunch of
   experiments and it seemed to us that negex was too hand-tailored for a
   specific dataset and that our new module did better across datasets and
   overall. The tradeoff is that it is harder to improve and it sometimes
   gives unexpected results on the kind of inputs people input by hand for
   preliminary testing. That is a tradeoff people will have to consider and,
   like Guergana said, the rule-based module is still part of cTAKES.
   (FWIW, I believe it is possible to engineer examples that make Negex
   fail in unintuitive ways as well.) If you are interested in these
   experiments please check out our paper in Plos One where we look at the
   difficulty of the polarity problem, specifically porting systems to new
   domains:
  
 
 http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0112774
  
   I've been wondering if some hybrid approach might be useful. For
   example, maybe a system that runs the ML module and Negex and adds in
   all the recalled negated terms that Negex finds over and above the ML.
   This would probably fix some of the issues with test sentences but does
   not solve the problem of being hard to debug. Another possibility is
   using a more transparent ML method like decision trees or something.
  
   Tim
  
  
  
  
  
   On 12/31/2014 11:22 AM, Michael J Gurley wrote:
I think this demonstrates that machine learning is not the right approach
to the negation/polarity problem.
   
   
Michael Gurley
m-gur...@northwestern.edu
312 925 3268
Northwestern University Clinical and Translational Sciences Institute
(NUCATS)
http://www.nucats.northwestern.edu
Rubloff Building
750 N Lake Shore Drive, 11th Floor
Chicago, IL 60611
   
   
   
   
   
   
   
On 12/31/14 9:13 AM, Miller, Timothy
timothy.mil...@childrens.harvard.edu wrote:
   
Hi Yu,
   
The new polarity module is machine-learning based so it is not always
easy to 

Re: YTEX semantic similarity concept graph questions

2014-10-16 Thread vijay garla
I don't know what the difference between PAR/CHD (parent/child) and RB/RN
(broader/narrower) is supposed to be.  Some UMLS source vocabularies use
PAR/CHD only or predominantly (e.g. SNOMED-CT), others use RB/RN (e.g.
RXNORM).  You can use and experiment with whatever relationships you want
(I think there might be part-of/contains relationships too).

The concept graph is a directed acyclic graph, and the query should return
parent-child edges (or maybe the other way around, not sure).  If your
query uses e.g. rel in ('PAR', 'CHD'), it will return edges going in both
directions.  This shouldn't cause any problems, as we discard edges that
induce cycles, but it will create a bunch of overhead for no gain.
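
To make "discard edges that induce cycles" concrete, here is a minimal sketch of one way to do it. It is not YTEX's actual implementation: the class `AcyclicGraphSketch`, its methods, and the node labels are hypothetical, and the reachability check is a plain depth-first search chosen for clarity rather than speed.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class AcyclicGraphSketch {
    // parent -> set of children
    private final Map<String, Set<String>> children = new HashMap<>();

    /** Adds parent -> child unless that would close a cycle; returns true if the edge was kept. */
    boolean addEdge(String parent, String child) {
        if (parent.equals(child) || reaches(child, parent)) {
            return false; // child already leads back to parent: discard the edge
        }
        children.computeIfAbsent(parent, k -> new HashSet<>()).add(child);
        return true;
    }

    /** Depth-first search: is 'target' reachable from 'from' along existing edges? */
    private boolean reaches(String from, String target) {
        Deque<String> stack = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        stack.push(from);
        while (!stack.isEmpty()) {
            String node = stack.pop();
            if (node.equals(target)) {
                return true;
            }
            if (seen.add(node)) {
                for (String next : children.getOrDefault(node, Collections.<String>emptySet())) {
                    stack.push(next);
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        AcyclicGraphSketch g = new AcyclicGraphSketch();
        // Made-up labels; a PAR row and the matching CHD row describe the same pair reversed.
        System.out.println(g.addEdge("A", "B")); // kept
        System.out.println(g.addEdge("B", "A")); // reverse edge would form a cycle, discarded
    }
}
```

With both PAR and CHD rows in the query result, every pair shows up twice in opposite directions, so half of the edges go through the reachability check only to be thrown away, which is the overhead mentioned above.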

If you look at other concept graph configs, e.g.
https://svn.apache.org/repos/asf/ctakes/trunk/ctakes-ytex-res/src/main/resources/org/apache/ctakes/ytex/conceptGraph/sct-rxnorm.template.xml,
you will see that we use both PAR and RB relationships.

HTH,

VJ





On Thu, Oct 16, 2014 at 2:58 AM, John Green john.travis.gr...@gmail.com
wrote:

 Hope this finds everyone well.

 It is not immediately clear to me why

 select distinct cui1, cui2
 from umls.MRREL
 where sab in ('SNOMEDCT')
 and rel in ('PAR')
 order by cui1, cui2

 would only be selecting the relationship (REL) of PAR. I'm not sure what the
 selection criteria are. This is honestly probably directed mostly at Vijay, but
 anyone else with experience in this domain would be a welcome voice. In the
 paper on YTEX, for instance, PAR and RB are chosen for UMLS. Why? Does this
 have to do with the flattening or orphaning that UMLS does to the
 vocabularies it includes? Why not PAR, RB, and RN? Why not more? Was this a
 computational (speed/memory) consideration, or a functional one that my
 lack of familiarity with the domain is keeping me from seeing?

 I'm posting this fairly specific question to the dev list because it directly
 relates to building YTEX concept graphs, which is a functionality of our
 distro here.

 Best!
 JG



Re: NPE with ytex in ctakes 3.2.0

2014-10-14 Thread vijay garla
The error is caused by not finding the required properties files/xml config
files.
There are some issues with the ytex setup scripts for the 3.2 release; I
have fixed that in trunk.  I am updating the 3.2 installation guide with
the patched setup scripts.

It's not clear to me if you're running from a dev environment/eclipse, or
running from the ctakes distro.
If running from a development environment, see
https://svn.apache.org/repos/asf/ctakes/trunk/ctakes-ytex/README
If running from the ctakes distro, make sure you follow the ytex setup:
https://cwiki.apache.org/confluence/display/CTAKES/YTEX+Installation

For the dev environment, the xml config file is in the ctakes-ytex-res (
https://svn.apache.org/repos/asf/ctakes/trunk/ctakes-ytex-res/src/main/resources/org/apache/ctakes/ytex/uima/beanRefContext.xml
)
For the binary distro, the xml config files are in
lib\ctakes-ytex-res-3.2.0.jar
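
To check whether the config file is actually visible at runtime, a lookup along these lines may help. This is only a sketch: `ResourceCheckSketch` and `findResource` are hypothetical names, and the resource path is the one mentioned in this thread.

```java
import java.net.URL;

class ResourceCheckSketch {

    /** Returns the URL of a classpath resource, or null if it cannot be found. */
    static URL findResource(String path) {
        ClassLoader loader = ResourceCheckSketch.class.getClassLoader();
        return loader != null ? loader.getResource(path) : ClassLoader.getSystemResource(path);
    }

    public static void main(String[] args) {
        // Path taken from the thread; pass a different one as the first argument if needed.
        String path = args.length > 0 ? args[0] : "org/apache/ctakes/ytex/uima/beanRefContext.xml";
        URL url = findResource(path);
        System.out.println(url == null
                ? path + " was NOT found on the classpath"
                : path + " found at " + url);
    }
}
```

When the file is found, the printed URL also shows which jar or directory it was loaded from, which helps tell a dev checkout apart from the binary distro.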


-vj


On Fri, Oct 10, 2014 at 10:56 PM, David Kincaid kincaid.d...@gmail.com
wrote:

 I don't have that file anywhere either. Where do I get it from?

 On Fri, Oct 10, 2014 at 3:53 PM, Chen, Pei pei.c...@childrens.harvard.edu
 
 wrote:

  I think it’s in ctakes-ytex-res.jar (is that in your classpath)?
  This is just a guess… vj may have a better idea if it still doesn’t work
  for you.
 
  From: David Kincaid [mailto:kincaid.d...@gmail.com]
  Sent: Friday, October 10, 2014 4:51 PM
  To: u...@ctakes.apache.org
  Subject: Re: NPE with ytex in ctakes 3.2.0
 
  No. I have no file named beanRefContext.xml anywhere on my hard drive.
 
 
 
  On Fri, Oct 10, 2014 at 3:45 PM, Chen, Pei
  pei.c...@childrens.harvard.edu wrote:
  I’m not too familiar with the ytex component,
  but my guess is that the ytexApplicationContext bean is null?
  It seems that it would be expected to be at
  classpath*:org/apache/ctakes/ytex/uima/beanRefContext.xml.  Does that
  file exist?
 
  From: David Kincaid [mailto:kincaid.d...@gmail.com]
  Sent: Friday, October 10, 2014 4:23 PM
  To: u...@ctakes.apache.org
  Subject: NPE with ytex in ctakes 3.2.0
 
  I'm trying to experiment with YTEX in 3.2.0, running
  AggregatePlaintextUMLSProcessor with the FilesInDirectoryCollectionReader
  and FileWriterCASConsumer. When I try to run it against some text files
  it blows up with a null pointer exception during initialization. Here's the
  relevant part of the stack trace. Does anyone have any idea what I might
  have wrong?
 
  Caused by: org.apache.uima.resource.ResourceInitializationException: Initialization of annotator class org.apache.ctakes.ytex.uima.annotators.SegmentRegexAnnotator failed. (Descriptor: file:/home/davek/apps/apache-ctakes-3.2.0/desc/ctakes-ytex-uima/desc/analysis_engine/SegmentRegexAnnotator.xml)
      at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:252)
      at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:156)
      at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
      at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
      at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
      at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:387)
      at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:254)
      at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:431)
      at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:375)
      at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:185)
      at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
      at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
      at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
      at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:314)
      at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:425)
      at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1088)
      ... 9 more
  Caused by: java.lang.NullPointerException
      at org.apache.ctakes.ytex.uima.ApplicationContextHolder.getApplicationContext(ApplicationContextHolder.java:79)
at
 
 

Re: YTEX Semantic Sim RESTful

2014-10-14 Thread vijay garla
Hi John,

Looking at the code, that error is due to the concept graph 'umls' not
being loaded.  By default, YTEX is configured to use the sct-rxnorm concept
graph.

Can you see if this works:
http://localhost:8080/services/rest/similarity?conceptGraph=sct-rxnorm&concept1=C0018787&concept2=C0024109&metrics=LCH,INTRINSIC_LCH

To set the concept graph name, set ytex.conceptGraphName in
resources/org/apache/ctakes/ytex/ytex.properties
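
Since the query parameters have to be separated by `&`, it can be less error-prone to assemble the request URL in code than by hand. The sketch below assumes the endpoint path and parameter names from the URL in this message; `SimilarityUrlSketch` and `buildUrl` are hypothetical names, not part of YTEX.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class SimilarityUrlSketch {

    /** Appends URL-encoded key=value pairs to base, using '?' first and '&' afterwards. */
    static String buildUrl(String base, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(base);
        char separator = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(separator).append(encode(e.getKey())).append('=').append(encode(e.getValue()));
            separator = '&';
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException("UTF-8 is always supported", ex);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("conceptGraph", "sct-rxnorm");
        params.put("concept1", "C0018787");
        params.put("concept2", "C0024109");
        params.put("metrics", "LCH,INTRINSIC_LCH");
        System.out.println(buildUrl("http://localhost:8080/services/rest/similarity", params));
    }
}
```

Note that `URLEncoder` percent-encodes the comma in the metrics list as `%2C`; a server that decodes query parameters in the usual way should see the same value as with a literal comma.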

If not, there may be an issue in a config file; if you get another NPE,
please
* copy
https://svn.apache.org/repos/asf/ctakes/trunk/ctakes-ytex-res/src/main/resources/org/apache/ctakes/ytex/web/beans-kernel-simweb.xml
to CTAKES_HOME/resources/org/apache/ctakes/ytex/web/
* copy
https://svn.apache.org/repos/asf/ctakes/trunk/ctakes-distribution/src/main/bin/ytexweb.bat
to CTAKES_HOME/bin

HTH!

Vijay

On Tue, Oct 14, 2014 at 1:51 PM, John Green john.travis.gr...@gmail.com
wrote:

 Good idea Kim! Unfortunately, that wasn't it. I'll admit, though, I hadn't
 looked at that variable yet.

 Thanks for your help,
 JG

 On Mon, Oct 13, 2014 at 6:28 PM, Kim Ebert 
 kim.eb...@perfectsearchcorp.com
 wrote:

  Perhaps your JVM is running out of heap? I've noticed that when I run
  out of heap, cTakes tends to behave erratically.
 
  Kim Ebert
  1.801.669.7342
  Perfect Search Corp
  http://www.perfectsearchcorp.com/
 
  On 10/13/2014 09:29 AM, John Green wrote:
   I've been putting off debugging this as it was a piece of this app I'm
   working on, but one that fit in down the road in development. Development
   has progressed, and here I am. I have posted this one before, was hoping to
   find fresh help.
  
   When running ytex.sh in a distro installed at something like
   ./ctakes3.2.0/apache-ctakes-3.1.2-SNAPSHOT/bin$ under Ubuntu 14 and trying
   to access the restful interface per the docs on a query such as
  
   http://localhost:8080/similarity?conceptGraph=umls&concept1=C0018787&concept2=C0024109&metrics=LCH,INTRINSIC_LCH
   the query fails with a 500 (see below).
  
   Of note, the http://localhost:8080/semanticSim.jsf works just fine.
  
   Am I missing something simple?
  
   Thanks for any and all help,
   Best,
   JG
  
   500 error:
   HTTP ERROR 500
  
   Problem accessing /services/rest/similarity. Reason:
  
   Server Error
  
   Caused by:
  
   java.lang.RuntimeException: org.apache.cxf.interceptor.Fault
   at org.apache.cxf.interceptor.AbstractFaultChainInitiatorObserver.onMessage(AbstractFaultChainInitiatorObserver.java:116)
   at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:333)
   at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
   at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:239)
   at org.apache.cxf.transport.servlet.ServletController.invokeDestination(ServletController.java:248)
   at org.apache.cxf.transport.servlet.ServletController.invoke(ServletController.java:222)
   at org.apache.cxf.transport.servlet.ServletController.invoke(ServletController.java:153)
   at org.apache.cxf.transport.servlet.CXFNonSpringServlet.invoke(CXFNonSpringServlet.java:167)
   at org.apache.cxf.transport.servlet.AbstractHTTPServlet.handleRequest(AbstractHTTPServlet.java:286)
   at org.apache.cxf.transport.servlet.AbstractHTTPServlet.doGet(AbstractHTTPServlet.java:211)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
   at org.apache.cxf.transport.servlet.AbstractHTTPServlet.service(AbstractHTTPServlet.java:262)
   at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:698)
   at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:526)
   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138)
   at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:568)
   at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221)
   at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1105)
   at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:453)
   at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:183)
   at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1039)
   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:136)
   at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:201)
   at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:109)
   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
   at org.eclipse.jetty.server.Server.handle(Server.java:445)
   at

Re: sentence detector model

2014-09-29 Thread vijay garla
Why not use the i2b2 corpora?

On Monday, September 29, 2014, Dligach, Dmitriy 
dmitriy.dlig...@childrens.harvard.edu wrote:

 Maybe creating a made-up set of sentences would be an option? That way we
 could agree on the annotation of concrete cases. Although this would be
 more of a unit test than a corpus.
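
A made-up set like this could be scored mechanically once the sentence breaks are agreed on. A minimal sketch in Python, assuming gold and predicted sentence ends are available as character offsets. The function name and the sample offsets are invented for illustration; this is not part of cTAKES or OpenNLP.

```python
def boundary_scores(gold, predicted):
    """Precision, recall and F1 over sets of sentence-end character offsets."""
    gold, predicted = set(gold), set(predicted)
    hits = len(gold & predicted)  # breaks the detector placed correctly
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Invented example: three agreed breaks, one spurious break at offset 60.
gold_breaks = {12, 40, 77}
predicted_breaks = {12, 40, 60, 77}
print(boundary_scores(gold_breaks, predicted_breaks))
```

Scoring on offsets rather than on sentence strings keeps the comparison independent of how whitespace and newlines are normalized.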

 Dima




 On Sep 27, 2014, at 12:15, Miller, Timothy 
 timothy.mil...@childrens.harvard.edu wrote:

  I've just been using the opennlp command line cross validator on the
 small dataset I annotated (along with some eyeballing). It would be cool if
 there was a standard clinical resource available for this task, but I
 hadn't considered it much because the data I annotated pulls from multiple
 datasets and the process of arranging with different institutions to make
 something like that available would probably be a nightmare.
  Tim
 
  Sent from my iPad. Sorry about the typos.
 
  On Sep 27, 2014, at 12:16 PM, Dligach, Dmitriy 
 dmitriy.dlig...@childrens.harvard.edu wrote:
 
  Tim, thanks for working on this!
 
  Question: do we have some formal way of evaluating the sentence
 detector? Maybe we should come up with some dev set that would include
 examples from mimic...
 
  Dima
 
 
 
 
  On Sep 27, 2014, at 8:57, Miller, Timothy 
 timothy.mil...@childrens.harvard.edu wrote:
 
  I have been working on the sentence detector newline issue, training a
 model to probabilistically split sentences on newlines rather than forcing
 sentence breaks. I have checked in a model to the repo under
 ctakes-core-res. I also attached a patch to ctakes-core to the jira issue:
  https://issues.apache.org/jira/browse/CTAKES-41
 
  for people to test. The status of my testing is that it doesn't seem
 to break on notes where ctakes worked well before (those where newlines are
 always sentence breaks), and is a slight improvement on notes where
 newlines may or may not be sentence breaks. Once the change is checked in
 we can continue improving the model by adding more data and features, but
 the first hurdle I'd like to get past is making sure it runs well enough on
 the type of data that the old model worked well on. Let me know if you have
 any questions.
 
  Thanks
  Tim
 




Re: org.apache.ctakes.ytex.umls.dao.UMLSDaoTest

2014-08-25 Thread vijay garla
That is an expected error having to do with the fact that UMLS isn't
installed in the test database that gets fired up for unit tests.  That is
actually a warning (and should be interpreted as an error only if you do
have UMLS set up).


On Mon, Aug 25, 2014 at 9:02 PM, Pei Chen chen...@apache.org wrote:

 Hi VJ,
 While on the subject of unit tests-

 I didn't get a chance to dig deeper and was hoping you would know the
 cause of this unit test failure:  mvn clean install

 2014-08-25 13:33:50,830 WARN  net.sf.ehcache.CacheManager  - Creating
 a new instance of CacheManager using the diskStorePath
 /var/folders/qc/d7xd4zzs0_xcybv88skt5_7mgn/T/ which is already
 used by an existing CacheManager.

 The source of the configuration was

 net.sf.ehcache.config.generator.ConfigurationSource$InputStreamConfigurationSource@7433a719.

 The diskStore path for this CacheManager will be set to

 /var/folders/qc/d7xd4zzs0_xcybv88skt5_7mgn/T//ehcache_auto_created_1408988030830.

 To avoid this warning consider using the CacheManager factory methods
 to create a singleton CacheManager or specifying a separate ehcache
 configuration (ehcache.xml) for each CacheManager instance.

 2014-08-25 13:33:51,082 WARN
 org.hibernate.engine.jdbc.spi.SqlExceptionHelper  - SQL Error: 62,
 SQLState: S0010

 2014-08-25 13:33:51,082 ERROR
 org.hibernate.engine.jdbc.spi.SqlExceptionHelper  - Unknown JDBC
 escape sequence: {{db.schema}.MRCONSO mrconso0_ where mrconso0_.aui?
 and length(mrconso0_.aui)0 and length(mrconso0_.str)200 and
 mrconso0_.lat='ENG' order by mrconso0_.aui

 2014-08-25 13:33:51,085 WARN
 org.apache.ctakes.ytex.umls.dao.UMLSDaoTest  - sql exception - mrconso
 probably doesn't exist, check error

 org.hibernate.exception.SQLGrammarException: could not prepare statement

 at org.hibernate.exception.internal.SQLStateConversionDelegate.convert(SQLStateConversionDelegate.java:123)
 at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:49)
 at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:125)
 at org.hibernate.engine.jdbc.internal.StatementPreparerImpl$StatementPreparationTemplate.prepareStatement(StatementPreparerImpl.java:188)
 at org.hibernate.engine.jdbc.internal.StatementPreparerImpl.prepareQueryStatement(StatementPreparerImpl.java:159)
 at org.hibernate.loader.Loader.prepareQueryStatement(Loader.java:1859)
 at org.hibernate.loader.Loader.executeQueryStatement(Loader.java:1836)
 at org.hibernate.loader.Loader.executeQueryStatement(Loader.java:1816)
 at org.hibernate.loader.Loader.doQuery(Loader.java:900)
 at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:342)
 at org.hibernate.loader.Loader.doList(Loader.java:2526)
 at org.hibernate.loader.Loader.doList(Loader.java:2512)
 at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2342)
 at org.hibernate.loader.Loader.list(Loader.java:2337)
 at org.hibernate.loader.hql.QueryLoader.list(QueryLoader.java:495)
 at org.hibernate.hql.internal.ast.QueryTranslatorImpl.list(QueryTranslatorImpl.java:357)
 at org.hibernate.engine.query.spi.HQLQueryPlan.performList(HQLQueryPlan.java:195)
 at org.hibernate.internal.SessionImpl.list(SessionImpl.java:1269)
 at org.hibernate.internal.QueryImpl.list(QueryImpl.java:101)
 at org.apache.ctakes.ytex.umls.dao.UMLSDaoImpl.getAllAuiStr(UMLSDaoImpl.java:106)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:319)
 at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
 at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
 at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:110)
 at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
 at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:90)
 at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
 at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
 at com.sun.proxy.$Proxy11.getAllAuiStr(Unknown Source)
 at org.apache.ctakes.ytex.umls.dao.UMLSDaoTest.testGetAllAuiStr(UMLSDaoTest.java:53)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

Re: Exporting YTEX Pipeline

2014-07-30 Thread vijay garla
Can you try this:
copy
https://code.google.com/p/ytex/source/browse/trunk/workspace/examples/fracture/cui/export.template.xml
to CTAKES_HOME\desc\ctakes-ytex\fracture\cui.xml
replace %DB_SCHEMA% with your database schema name (value of db.schema in
your ytex.properties file)

Then from a command prompt, execute the following commands:
cd CTAKES_HOME
bin\setenv.bat
java -cp %CLASSPATH%
-Dlog4j.configuration=file:/%CTAKES_HOME%/config/log4j.xml -Xmx256m
org.apache.ctakes.ytex.kernel.SparseDataExporterImpl -prop
desc\ctakes-ytex\fracture\cui.xml -type weka
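
The %DB_SCHEMA% replacement in the step above can also be scripted rather than done by hand. A minimal Python sketch, assuming the template has already been saved locally. The function name and the one-line sample are invented for illustration, and the schema value should be whatever db.schema is set to in your ytex.properties.

```python
def fill_schema(template_text, schema):
    """Replace every %DB_SCHEMA% placeholder with the given schema name."""
    if not schema:
        raise ValueError("schema name must not be empty")
    return template_text.replace("%DB_SCHEMA%", schema)


# Invented one-line sample; the real export.template.xml is longer.
sample = "select * from %DB_SCHEMA%.document"
print(fill_schema(sample, "ytex"))
```

In practice you would read the template file, pass its text through the function, and write the result to the cui.xml path used in the command above.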

Tell me if you run into any issues.

I will add this to the ctakes confluence doc.

Best,

VJ


On Wed, Jul 30, 2014 at 5:11 PM, Clayton Turner caturn...@g.cofc.edu
wrote:

 Hi, I'm trying to export the data I get from running the pipeline through
 the Collection Processing Engine.

 I set up the pipeline where I have a directory where all the XML is output
 to, but I am having issues at this point.

 I've tried using the built in Exporter from the Data Mining section on this
 page https://cwiki.apache.org/confluence/display/CTAKES/User%27s+Guide but
 those notes are out of date. Even altering directories to match the files
 still gives me errors about not being able to find the ExporterImpl class.
 The class version of this file only exists outside of the target directory
 for the ctakes snapshot and attempting to use it still fails.

 I then ventured to here:

 https://code.google.com/p/ytex/source/browse/#svn%2Ftrunk%2Fworkspace%2Fexamples%2Ffracture

 The files here match up to the data mining section from the previous link -
 so I created my export.xml file and changed everything that needed to be
 changed for my example (tried to even run bone fracture), but I cannot get
 data exported, no matter what I do.

 Is there a way to use some new(er) implementation of the
 SparseDataExporterImpl class or is there an alternative for extracting data
 for use with weka?

 I've messaged about this in the past but I don't believe I was thorough
 enough with my issues.

 Thanks in advance,
 Clayton



Re: Exporting YTEX Pipeline

2014-07-30 Thread vijay garla
Great that it worked!  Note that the examples for fracture (bag of
words/bag of cuis) are just scratching the surface of feature
representations - there are a gazillion ways to export the document (bag of
words per section, include negation status, ...).  Doing this via SQL makes
it super easy.

Best,

VJ


On Wed, Jul 30, 2014 at 9:07 PM, Clayton Turner caturn...@g.cofc.edu
wrote:

 Awesome!!

 It worked!

 The only things I had to change (since I'm on Windows) were flipping the
 slashes when necessary and removing the first slash when specifying the
 -Dlog4j.configuration=file:/...

 Thank you so much for putting up with my issues

 -Clayton


 On Wed, Jul 30, 2014 at 2:48 PM, vijay garla vnga...@gmail.com wrote:

  Can you try this:
  copy
 
 
 https://code.google.com/p/ytex/source/browse/trunk/workspace/examples/fracture/cui/export.template.xml
  to CTAKES_HOME\desc\ctakes-ytex\fracture\cui.xml
  replace %DB_SCHEMA% with your database schema name (value of db.schema in
  your ytex.properties file)
 
  Then from a command prompt, execute the following commands:
  cd CTAKES_HOME
  bin\setenv.bat
  java -cp %CLASSPATH%
  -Dlog4j.configuration=file:/%CTAKES_HOME%/config/log4j.xml -Xmx256m
  org.apache.ctakes.ytex.kernel.SparseDataExporterImpl -prop
  desc\ctakes-ytex\fracture\cui.xml -type weka
 
  Tell me if you run into any issues.
 
  I will add this to the ctakes confluence doc.
 
  Best,
 
  VJ
 
 
  On Wed, Jul 30, 2014 at 5:11 PM, Clayton Turner caturn...@g.cofc.edu
  wrote:
 
   Hi, I'm trying to export the data I get from running the pipeline through
   the Collection Processing Engine.
  
   I set up the pipeline where I have a directory where all the XML is output
   to, but I am having issues at this point.
  
   I've tried using the built in Exporter from the Data Mining section on this
   page https://cwiki.apache.org/confluence/display/CTAKES/User%27s+Guide but
   those notes are out of date. Even altering directories to match the files
   still gives me errors about not being able to find the ExporterImpl class.
   The class version of this file only exists outside of the target directory
   for the ctakes snapshot and attempting to use it still fails.
  
   I then ventured to here:
  
   https://code.google.com/p/ytex/source/browse/#svn%2Ftrunk%2Fworkspace%2Fexamples%2Ffracture
  
   The files here match up to the data mining section from the previous link -
   so I created my export.xml file and changed everything that needed to be
   changed for my example (tried to even run bone fracture), but I cannot get
   data exported, no matter what I do.
  
   Is there a way to use some new(er) implementation of the
   SparseDataExporterImpl class or is there an alternative for extracting data
   for use with weka?
  
   I've messaged about this in the past but I don't believe I was thorough
   enough with my issues.
  
   Thanks in advance,
   Clayton
  
 



 --
 --
 Clayton Turner
 email: caturn...@g.cofc.edu
 phone: (843)-424-3784
 web: claytonturner.blogspot.com

 -
 “When scientifically investigating the natural world, the only thing worse
 than a blind believer is a seeing denier.”
 - Neil deGrasse Tyson



Re: cTAKES CPE MySQL Exception

2014-07-24 Thread vijay garla
My guess is that this exception is coming out of the DictionaryLookup (it
creates a connection and holds on to it for the life of the AE).

If it is coming out of the DBCollectionReader/DBConsumer you're in luck, as
those use a connection pool, and you can configure it to check the
connection upon pulling from the pool

The file is: resources\org\apache\ctakes\ytex\beans-datasource.xml
see
http://commons.apache.org/proper/commons-dbcp/api-1.4/org/apache/commons/dbcp/BasicDataSource.html
- you want to set testOnBorrow to true, and set the validationQuery to
something like select 1
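
To make the effect of those two settings concrete: with testOnBorrow enabled, the pool runs the validation query on a connection before handing it out, and replaces the connection if the query fails. The toy Python pool below only illustrates that idea with sqlite3. It is not DBCP code, and the class and method names are invented.

```python
import sqlite3


class ValidatingPool:
    """Toy pool: runs a validation query before lending out a connection."""

    def __init__(self, connect, validation_query="select 1"):
        self._connect = connect  # factory that opens a new connection
        self._validation_query = validation_query
        self._idle = []  # connections waiting to be reused

    def _is_valid(self, conn):
        try:
            conn.execute(self._validation_query).fetchone()
            return True
        except sqlite3.Error:
            return False

    def borrow(self):
        while self._idle:
            conn = self._idle.pop()
            if self._is_valid(conn):
                return conn
            # stale connection: drop it and try the next idle one
        return self._connect()

    def give_back(self, conn):
        self._idle.append(conn)


pool = ValidatingPool(lambda: sqlite3.connect(":memory:"))
first = pool.borrow()
pool.give_back(first)
first.close()           # simulate the server dropping an idle connection
second = pool.borrow()  # validation fails on the stale one, so a fresh one is opened
print(second.execute("select 1").fetchone())
```

The caller never sees the dead connection, which is the behaviour you want when a long CPE run leaves pooled connections idle past the server's wait_timeout.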

You should also set the errorRateThreshold in the CPE config (you can't do
this via the gui - you have to do this in the xml) - that way the cpe
doesn't bomb on the first error it sees - a few bad apples shouldn't kill
the processing.

HTH,

VJ



On Thu, Jul 24, 2014 at 4:32 PM, Clayton Turner caturn...@g.cofc.edu
wrote:

 Hi, everyone.

 First off, I'd like to say awesome and thank you for the cTAKES 3.2
 release and information. I've been following those pages and it's been
 really helpful for helping me move along in my own progress. Really cool
 stuff.

 So I'm using the Collection Processing Engine (with ytex and umls) and I'm
 trying to process ~1 million notes (as opposed to the about 30 in the given
 demo).

 I've tried this the past 2 days and when I come back in to check the
 progress I see that I've received an error about 14000 notes into the
 process:

 org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator
 processing failed.
 CausedBy: org.springframework.transaction.CannotCreateTransactionException:
 Could not open Hibernate Session for transaction; nested exception is
 com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: The last packet
 successfully received from the server was 53,888,249 milliseconds ago. The
 last packet sent successfully to the server was 53,888,249 milliseconds
 ago. is longer than the server configured value of 'wait_timeout'. You
 should consider either expiring and/or testing connection validity before
 use in your application, increasing the server configured values for client
 timeouts, or using the Connector/J connection property 'autoReconnect=true'
 to avoid this problem.

 So, in my own debugging, I have ensured that autoReconnect true was on (it
 always has been).

 I looked at my CPE output in the command prompt and noticed a
 PacketTooBigException so I increased the packet max size to 1G (the max
 for sql server).

 I increased the time allowed for timeouts.

 I'm really unsure of what to do here. Should I find a way to see if there
 is a problematic note that is giving me issues (though I can't understand
 how 1 note would make a packet too large)? Should I try to do some
 horizontal sharding and break the problem into smaller chunks (though I
 would think this program could handle large datasets since it's using a
 query language)? I'm just at a loss with this error, especially since it
 takes so long to actually spit the error out at me.

 Thanks in advance everyone,
 Clayton

 --
 --
 Clayton Turner
 email: caturn...@g.cofc.edu
 phone: (843)-424-3784
 web: claytonturner.blogspot.com

 -
 “When scientifically investigating the natural world, the only thing worse
 than a blind believer is a seeing denier.”
 - Neil deGrasse Tyson



Re: DBConsumer

2014-07-15 Thread vijay garla
You can add the DBConsumer to any pipeline, or add it to any CPE config.
 See
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.1.2+-+YTEX+DBConsumer

You will have to set up ctakes and your database as documented here:
https://cwiki.apache.org/confluence/display/CTAKES/YTEX+Installation

-vj


On Tue, Jul 15, 2014 at 2:16 AM, John Green john.travis.gr...@gmail.com
wrote:

 The Ytex DBConsumer - If someone has a free moment, could they give me a
 hint at how I can plug into a mysql DB with the ytex DB consumer? For
 example, taking the default ytex pipeline and sending it to a db.

 If I get pointed in the right direction such that I figure it out I'll
 update the confluence with the how to for the future.

 Thanks!
 JG



Re: DBConsumer

2014-07-15 Thread vijay garla
I should probably add to the docs to use the component descriptor:
YTEX_HOME\desc\ctakes-ytex-uima\desc\analysis_engine\DBConsumer.xml


On Tue, Jul 15, 2014 at 2:53 PM, John Green john.travis.gr...@gmail.com
wrote:

 Ok. I must have missed something. I did read both of those. I'll go back
 and look again.


 Thank you for your time Vijay,
 JG


 —
 Sent from Mailbox for iPhone

 On Tue, Jul 15, 2014 at 8:26 AM, vijay garla vnga...@gmail.com wrote:

  You can add the DBConsumer to any pipeline, or add it to any CPE config.
   See
 
 https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.1.2+-+YTEX+DBConsumer
  You will have to set up ctakes and your database as documented here:
  https://cwiki.apache.org/confluence/display/CTAKES/YTEX+Installation
  -vj
  On Tue, Jul 15, 2014 at 2:16 AM, John Green john.travis.gr...@gmail.com
 
  wrote:
  The Ytex DBConsumer - If someone has a free moment, could they give me a
  hint at how I can plug into a mysql DB with the ytex DB consumer? For
  example, taking the default ytex pipeline and sending it to a db.
 
  If I get pointed in the right direction such that I figure it out I'll
  update the confluence with the how to for the future.
 
  Thanks!
  JG
 



Re: [VOTE] Release Apache cTAKES 3.2.0 (rc2)

2014-07-09 Thread vijay garla
Sorry, I meant the snomed dictionary lookup database - I think it is hsql,
not lucene.

-vj


On Wed, Jul 9, 2014 at 5:30 PM, Masanz, James J. masanz.ja...@mayo.edu
wrote:

 I believe the only dictionaries shipped with Apache cTAKES in lucene
 indexes contain just the Orange Book,  RxNorm, and some made-up terms, not
 SNOMED-CT or the other sources taken from UMLS. If that is not correct,
 then I agree there is a problem with what is being shipped in lucene
 indexes.

 -- James

 -Original Message-
 From: vijay garla [mailto:vnga...@gmail.com]
 Sent: Wednesday, July 09, 2014 9:30 AM
 To: dev@ctakes.apache.org
 Subject: Re: [VOTE] Release Apache cTAKES 3.2.0 (rc2)

 ctakes-ytex-lib-3.1.2-SNAPSHOT.zip
 https://ytex.googlecode.com/files/ctakes-ytex-lib-3.1.2-SNAPSHOT.zip -
 this contains non-asf compliant ytex libs.  I would like to add it to the
 sourceforge site / or add it to the ctakes resources directly (that way
 users simply have to unzip a single zip file)

 ctakes-ytex-resources-3.1.2-SNAPSHOT.zip
 
 http://www.ytex-nlp.org/umls.download/secure/3.1/ctakes-ytex-resources-3.1.2-SNAPSHOT.zip
 
 -
 this contains data derived from the UMLS - concept graphs and dictionary
 lookup tables.  downloading this requires a UTS login.  It is conceptually
 no different from the ctakes resources, so I believe it would be OK to add
 it to that zip file, but I'm not a lawyer.

 On another note: I think forcing users to specify the UTS username/password
 and contacting NIH every time you run cTAKES is problematic, and doesn't
 prevent users who don't have a valid UTS login from viewing the data
 contained in the lucene index dictionary.  I personally believe requiring a
 UTS login to download would be the best way to make resources derived from
 the UMLS available to users (this is what I'm doing for ytex-resources).

 to summarize: for now, I would like to add the ytex libs to the ctakes
 resources zip.

 -vj




 On Wed, Jul 9, 2014 at 4:04 PM, Chen, Pei pei.c...@childrens.harvard.edu
 wrote:

  The maven artifacts are also available in the staging area:
  https://repository.apache.org/content/repositories/orgapachectakes-1001
  VJ: Just curious- how did you envision ytex users downloading the
  jars/war? From the distro bin.zip or from maven central?
 
  --Pei
 
   -Original Message-
   From: Pei Chen [mailto:chen...@apache.org]
   Sent: Tuesday, July 08, 2014 6:11 PM
   To: dev@ctakes.apache.org
   Subject: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
  
   Hi all,
  
   The main difference between rc1 and rc2 is that we removed the lvg-res and
   assertion-res.jar from the distro.  They still need to be unpacked.
  
   This is a call for a vote on releasing the following candidate (rc2) as
   Apache cTAKES 3.2.0.
   The major changes include:
   - New optional YTEX component(s) (Yale Extensions to cTAKES)
   - New optional improved/faster dictionary lookup (dictionary-lookup-fast)
   - New optional Temporal component (Time + Event extraction.  Relations will
     be included in a future release.)
   - Other bug fixes/enhancements from Jira
  
   [TODO: Online documentation still needs to be updated on wiki]
  
   For more detailed information on the changes/release notes, please visit:
   https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313621&version=12324066
  
   The release was made using the cTAKES release process documented here:
   http://ctakes.apache.org/ctakes-release-guide.html
  
   The candidate is available at:
   http://people.apache.org/~chenpei/RCs/ctakes-3.2.0-rc2/apache-ctakes-3.2.0-src.tar.gz
   /.zip
  
   The tag to be voted on:
   http://svn.apache.org/repos/asf/ctakes/tags/ctakes-3.2.0-rc2
  
   The MD5 checksum of the tarball can be found at:
   http://people.apache.org/~chenpei/RCs/ctakes-3.2.0-rc2/apache-ctakes-3.2.0-src.tar.gz.md5
   /.zip.md5
  
   The signature of the tarball can be found at:
   http://people.apache.org/~chenpei/RCs/ctakes-3.2.0-rc2/apache-ctakes-3.2.0-src.tar.gz.asc
   /.zip.asc
  
   Apache cTAKES' KEYS file, containing the PGP keys used to sign the release:
   https://dist.apache.org/repos/dist/release/ctakes/KEYS
  
   Please vote on releasing these packages as Apache cTAKES 3.2.0. The vote is
   open for at least the next 72 hours.
   Only votes from the cTAKES PMC are binding, but folks are welcome to check
   the release candidate and voice their approval or disapproval.
   The vote passes if at least three binding +1 votes are cast.
  
   [ ] +1 Release the packages as Apache cTAKES 3.2.0
   [ ] -1 Do not release the packages because...
  
   Also, the convenience binary can be found at:
   http://people.apache.org/~chenpei/RCs/ctakes-3.2.0-rc2/apache-ctakes-3.2.0-bin.tar.gz
   /.zip
  
   Note: It's temporarily on people.a.o because the artifacts were too large for
   https://dist.apache.org/repos/dist/dev/ctakes (Working with infra on
   increasing the limit).
  
  
   Thanks!
 



Re: cTAKES 3.2 Analysis Batch Issue

2014-07-08 Thread vijay garla
Hi Clayton,

The screenshot is not coming through via the newsgroup emails.  Can you
attach the log file?

vj


On Mon, Jul 7, 2014 at 5:38 PM, Clayton Turner caturn...@g.cofc.edu wrote:

 Any update on this issue? I have this problem even if I don't use the ytex
 version of the aggregate text processor (UMLS-independent as well).


 On Thu, Jul 3, 2014 at 2:33 PM, Clayton Turner caturn...@g.cofc.edu
 wrote:

 Yes, I am running the fracture_demo.xml cpe.

 There is no option for the analysis batch (that's the main issue). I also
 get no response in my MySQL database (umls installed - not sure if that can
 be related).

 Here's a screenshot of my CPE (using ytex):
 [image: Inline image 1]




 On Wed, Jul 2, 2014 at 10:48 PM, vijay garla vnga...@gmail.com wrote:

 Hi clayton,

 I assume you are running the fracture_demo.xml cpe - is that correct?
  The CPE GUI should give you the option to set the analysis batch. (see
 attached screenshot).  That being said, the analysis_batch is not required
 (it will default to the current date).  Can you attach the log file?

 -vj

 [image: Inline image 1]


 On Wed, Jul 2, 2014 at 12:22 PM, Clayton Turner caturn...@g.cofc.edu
 wrote:

 Hi, I'm a relatively new user of cTAKES.

 I recently cloned cTAKES from the repository and I am using UMLS installed
 in my mysql database. I have recently noticed an issue, though. When
 conducting the bone fracture demo, in the CPE, I use the DBCollectionReader
 and Analysis Engine from the ctakes-ytex-uima directory within my
 CTAKES_HOME.

 I can get this to run successfully, but I am not able to specify an
 analysis batch in the CPE. Because of this, my ytex database is not being
 updated with results of the CPE run (in the v_document tables). Any ideas
 why the analysis batch field is missing?

 Side question: Any update on when cTAKES 3.2 will be officially released? I
 see we're past the expected release and was curious on how long it will
 be until it will officially come out.
 Thanks a lot,
 --
 Clayton Turner





 --
 --
 Clayton Turner
 email: caturn...@g.cofc.edu
 phone: (843)-424-3784
 web: claytonturner.blogspot.com

 -
 “When scientifically investigating the natural world, the only thing
 worse than a blind believer is a seeing denier.”
 - Neil deGrasse Tyson




 --
 --
 Clayton Turner
 email: caturn...@g.cofc.edu
 phone: (843)-424-3784
 web: claytonturner.blogspot.com

 -
 “When scientifically investigating the natural world, the only thing worse
 than a blind believer is a seeing denier.”
 - Neil deGrasse Tyson



Re: cTAKES 3.2 Analysis Batch Issue

2014-07-08 Thread vijay garla
My bad, the default log4j config just sends everything to the console.  Can
you run the CPE and redirect the output to a file like this:
runctakesCPE.bat > cpe.log 2>&1

vj


On Tue, Jul 8, 2014 at 6:01 PM, Clayton Turner caturn...@g.cofc.edu wrote:

 I don't see a log file when running the CPE. When running the CVD I have
 access to a log file within the gui, but that does not seem to be present
 here. Is there a specific place that this log file is saved?


 On Tue, Jul 8, 2014 at 3:14 AM, vijay garla vnga...@gmail.com wrote:

  Hi Clayton,
 
  The screenshot is not coming through via the newsgroup emails.  Can you
  attach the log file?
 
  vj
 
 
  On Mon, Jul 7, 2014 at 5:38 PM, Clayton Turner caturn...@g.cofc.edu
  wrote:
 
   Any update on this issue? I have this problem even if I don't use the
  ytex
   version of the aggregate text processor (UMLS-independent as well).
  
  
   On Thu, Jul 3, 2014 at 2:33 PM, Clayton Turner caturn...@g.cofc.edu
   wrote:
  
   Yes, I am running the fracture_demo.xml cpe.
  
   There is no option for the analysis batch (that's the main issue). I
  also
   get no response in my MySQL database (umls installed - not sure if
 that
  can
   be related).
  
   Here's a screenshot of my CPE (using ytex):
   [image: Inline image 1]
  
  
  
  
   On Wed, Jul 2, 2014 at 10:48 PM, vijay garla vnga...@gmail.com
 wrote:
  
   Hi clayton,
  
   I assume you are running the fracture_demo.xml cpe - is that correct?
The CPE GUI should give you the option to set the analysis batch.
 (see
   attached screenshot).  That being said, the analysis_batch is not
  required
   (it will default to the current date).  Can you attach the log file?
  
   -vj
  
   [image: Inline image 1]
  
  
   On Wed, Jul 2, 2014 at 12:22 PM, Clayton Turner 
 caturn...@g.cofc.edu
   wrote:
  
   Hi, I'm a relatively new user of cTAKES.
  
   I recently cloned cTAKES from the repository and I am using UMLS
   installed
   in my mysql database. I have recently noticed an issue, though. When
   conducting the bone fracture demo, In the CPE, I use the
   DBCollectionReader
   and Analysis Engine from the ctakes-ytex-uima directory within my
   CTAKES_HOME.
  
   I can get this to run successfully, but I am not able to specify an
   analysis batch in the CPE. Because of this, my ytex database is
 not
   being
   updated with results of the CPE run (in the v_document tables). Any
   ideas
   why the analysis batch field is missing?
  
   Side question: Any update on when cTAKES 3.2 will be officially
   released? I
   see we're past the expected release and was curious on how long it
   will
   be until it will officially come out.
  
   Thanks a lot,
   --
   Clayton Turner
  
  
  
  
  
   --
   --
   Clayton Turner
   email: caturn...@g.cofc.edu
   phone: (843)-424-3784
   web: claytonturner.blogspot.com
  
  
 
 -
   “When scientifically investigating the natural world, the only thing
   worse than a blind believer is a seeing denier.”
   - Neil deGrasse Tyson
  
  
  
  
  
 






Re: Building

2014-07-04 Thread vijay garla
When you run the webapp, the RESTful services run as well.

On Friday, July 4, 2014, John Green john.travis.gr...@gmail.com wrote:

 Vijay - Ha! Ok. Works perfect with cuis.

 Is there a way to run the web application as a RESTful API? You mention
 this as a service on your yale box, but I dont see a way to deploy it this
 way local.

 Thanks again,
 JG


 On Wed, Jul 2, 2014 at 10:58 PM, vijay garla vnga...@gmail.com wrote:

  The ytexWeb application tries to look up concepts from terms using the
 ytex
  dictionary lookup table, which is a small subset of the UMLS.  Can you
 try
  specifying cuis?  That skips the lookup - if the concepts are in the
  concept graph, this will work.
 
  Best,
 
  vj
 
 
  On Sun, Jun 29, 2014 at 6:10 PM, John Green john.travis.gr...@gmail.com
  wrote:
 
   Hi Vijay, thank you for your time.
  
   Your documentation was quite good. I had no problem setting up ytex
 with
   UMLS running on my local mysql server. Where I ran into problems was
   understanding how to launch the web service (also, is there anyway to
 run
   this in a RESTful mode? Btw, the informatics.yale links returns 502).
  After
   I did get it launched, and the confusion was probably all my fault, the
   concepts available to the similarity fields seemed very sparse; I just
   started typing randomly, hematochezia, choledocholithiasis, etc, and
   nothing would come up. The best I got was gallbladder function test,
  which,
   if Im understanding it right, would be an alkphos, but alkaline
  phosphatase
   didnt come up, which led to me to believe they were smaller sets of the
  the
   snomed, mesh, etc compilations (as I checked the UMLS db and these
  concepts
   are there).
  
   I think I got that execution command from the code.google, which is
   probably why it was stale. I did not see the ytex semantic similarity
  guide
   under the ctakes components part (sorry, thanks for pointing me there,
  ill
   get to work on reading it).
  
   So bottom line: are the ones that shipped watered down versions? And if
   not, why are my concepts coming up short? If you give me a hint at
 where
  to
   check Ill investigate.
  
   Thanks!
   JG
  
  
   On Sun, Jun 29, 2014 at 8:56 PM, vijay garla vnga...@gmail.com wrote:
  
Hi John,
   
YTEX ships with 3 concept graphs (see
   
   
  
 
 https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.1.2+-+Semantic+Similarity
):
   
   - sct-rxnorm: concepts from SNOMED-CT and RXNORM.  This is the
   default.
   - sct-msh-csp-aod: concepts from the SNOMED-CT, MeSH, CRISP, and
   Alcohol
   and Drug thesaurus
   - umls: concepts from all restriction free (level 0) UMLS source
   vocabularies and SNOMED-CT
   
   
These concept graphs are included in ytex resources zip (see
https://cwiki.apache.org/confluence/display/CTAKES/YTEX+Installation
 ):
3) Unzip YTEX Resources (Optional - UTS login required)
   
Download and unzip ctakes-ytex-resources-3.1.2-SNAPSHOT.zip

   
  
 
 http://www.ytex-nlp.org/umls.download/secure/3.1/ctakes-ytex-resources-3.1.2-SNAPSHOT.zip

'over'
your installation. This contains:
   
   - Concept Graphs derived from the UMLS2013AA used to compute
  semantic
   similarity measures
   
   
   
All YTEX packages moved from the ytex namespace into
   org.apache.ctakes.ytex
- can you tell me which document you were looking at that mentioned
ytex.kernel.dao.ConceptDaoImpl?  I thought I had fixed this in the
documentation.
   
HTH,
   
-vj
   
   
On Sun, Jun 29, 2014 at 2:25 PM, John Green john.travis.gr...@gmail.com wrote:
   
 I got the semantic similarity web app running in ytex. Im still
   learning
 umls terminology, but I believe it says that out of the box its
  concept
 graphs are limited to the free set from umls? Does this mean
 without
 permissions? Similar to ctakes with umls rights? The concepts
  available
 seem limited so this would make sense.

 So, to take full advantage I would need to rebuild the concept
 graph,
 correct? Im in the process of doing this but getting classpath
  errors.
   I
 used java a bit ten years ago, so you can probably guess these will
   take
me
 a minute to resolve. Notably, it is complaining about
 ytex.kernel.dao.ConceptDaoImpl.


 Thanks all,


 JG

 —
 Sent from Mailbox for iPhone
   
  
 



Re: cTAKES 3.2 Analysis Batch Issue

2014-07-02 Thread vijay garla
Hi clayton,

I assume you are running the fracture_demo.xml cpe - is that correct?  The
CPE GUI should give you the option to set the analysis batch. (see attached
screenshot).  That being said, the analysis_batch is not required (it will
default to the current date).  Can you attach the log file?

-vj

[image: Inline image 1]


On Wed, Jul 2, 2014 at 12:22 PM, Clayton Turner caturn...@g.cofc.edu
wrote:

 Hi, I'm a relatively new user of cTAKES.

 I recently cloned cTAKES from the repository and I am using UMLS installed
 in my mysql database. I have recently noticed an issue, though. When
 conducting the bone fracture demo, In the CPE, I use the DBCollectionReader
 and Analysis Engine from the ctakes-ytex-uima directory within my
 CTAKES_HOME.

 I can get this to run successfully, but I am not able to specify an
 analysis batch in the CPE. Because of this, my ytex database is not being
 updated with results of the CPE run (in the v_document tables). Any ideas
 why the analysis batch field is missing?

 Side question: Any update on when cTAKES 3.2 will be officially released? I
 see we're past the expected release and was curious on how long it will
 be until it will officially come out.

 Thanks a lot,
 --
 Clayton Turner



Re: Building

2014-07-02 Thread vijay garla
the concept graph used by the webapp is defined in ytex.properties.  You
can also override it using the ytex.conceptGraph system property (add
-Dytex.conceptGraph=xxx to the beginning of the ytexweb.bat java command
line).
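
A minimal sketch of the override order described above, assuming the webapp
reads the concept graph name from ytex.properties and lets a JVM system
property take precedence. Only the ytex.conceptGraph property and the graph
names come from this thread; reusing the same key inside ytex.properties, and
the class and method names, are assumptions for illustration and not the
actual YTEX code.

```java
import java.util.Properties;

// Hypothetical helper, not part of YTEX: shows "system property wins over file".
final class ConceptGraphNameDemo {
    // Property name taken from the thread; its use as the file key is assumed.
    static final String KEY = "ytex.conceptGraph";

    // Returns the system-property value if set and non-blank, else the file value.
    static String resolve(Properties fileProps, Properties systemProps) {
        String override = systemProps.getProperty(KEY);
        if (override != null && !override.trim().isEmpty()) {
            return override.trim();
        }
        return fileProps.getProperty(KEY);
    }

    public static void main(String[] args) {
        Properties fileProps = new Properties();
        fileProps.setProperty(KEY, "sct-rxnorm"); // stands in for ytex.properties
        // Run with -Dytex.conceptGraph=umls to see the override take effect.
        System.out.println(resolve(fileProps, System.getProperties()));
    }
}
```

Run without the -D option this prints the value from the file; with
-Dytex.conceptGraph=umls it prints the override instead.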

I'm not sure why you don't see any log output.  The general form of the
command is:
java -cp %CLASSPATH% -Dlog4j.configuration=file:/%CTAKES_HOME%/config/log4j.xml
-Xmx1g org.apache.ctakes.ytex.kernel.dao.ConceptDaoImpl -name <concept graph name>

When I run it specifying an invalid concept graph name:
C:\java\apache-ctakes-3.1.2-SNAPSHOT>java -cp %CLASSPATH%
-Dlog4j.configuration=file:/%CTAKES_HOME%/config/log4j.xml -Xmx1g
org.apache.ctakes.ytex.kernel.dao.ConceptDaoImpl -name test

I get this output (indicating that the corresponding properties file can't
be found):
log4j: reset attribute= false.
log4j: Threshold =null.
log4j: Level value for root is  [INFO].
log4j: root level set to INFO
log4j: Class name: [org.apache.log4j.ConsoleAppender]
log4j: Parsing layout of class: org.apache.log4j.PatternLayout
log4j: Setting property [conversionPattern] to [%d{dd MMM  HH:mm:ss}
%5p %c{1} - %m%n].
log4j: Adding appender named [consoleAppender] to category [root].
*properties file could not be located:
org/apache/ctakes/ytex/conceptGraph/test.xml *

If you're on linux, can you play around with the file url for log4j?
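
One possible reason the file URL behaves differently on Linux, sketched under
the assumption that CTAKES_HOME is an absolute path such as /opt/ctakes (a
hypothetical value): prefixing it with "file:/" yields "file://opt/...", in
which "opt" is parsed as a host name rather than as part of the path. The
class and method names below are illustrative only.

```java
import java.net.URI;
import java.nio.file.Paths;

// Hypothetical demo, not part of cTAKES: compares two ways of building the log4j URL.
final class Log4jUrlDemo {
    // Mirrors the Windows-style option: "file:/" + CTAKES_HOME + "/config/log4j.xml".
    static String naiveUrl(String ctakesHome) {
        return "file:/" + ctakesHome + "/config/log4j.xml";
    }

    // Lets the JDK build the file URL from a path instead of concatenating strings.
    static String pathBasedUrl(String ctakesHome) {
        return Paths.get(ctakesHome, "config", "log4j.xml").toUri().toString();
    }

    public static void main(String[] args) {
        String home = "/opt/ctakes"; // assumed install location
        URI naive = URI.create(naiveUrl(home));
        // With an absolute Unix path, the first path segment ends up as the host.
        System.out.println(naive + " host=" + naive.getHost() + " path=" + naive.getPath());
        System.out.println(pathBasedUrl(home));
    }
}
```

If the naive form is the problem, passing the path-based URL to
-Dlog4j.configuration is one thing to try.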

Best,

VJ


On Sun, Jun 29, 2014 at 6:30 PM, John Green john.travis.gr...@gmail.com
wrote:

 Successfully ran command to build the concept graph, however, it seems to
 be failing silently. The version issued with ytex is 10m. I expected, worst
 case, for mine to be the same, it was 400 bytes (the .gz output). I cant
 find anything logged. log4j is complaining it isnt setup correctly,
 however, it is directed to the correct config file. Im not familiar with
 this logging program, so perhaps the errors are ending up in some kind of
 /dev/null.

 Also, the web app is only loading sct-msh-csp-aod. I see that in the same
 dir there are the others you spoke of. The web app doesnt give an option
 for using them (this makes sense as the command line output makes no
 mention of loading them) but I can find where what is loaded is defined.

 I hope that wasnt too poorly explained,
 Thanks,
 John


 On Sun, Jun 29, 2014 at 9:10 PM, John Green john.travis.gr...@gmail.com
 wrote:

  Hi Vijay, thank you for your time.
 
  Your documentation was quite good. I had no problem setting up ytex with
  UMLS running on my local mysql server. Where I ran into problems was
  understanding how to launch the web service (also, is there anyway to run
  this in a RESTful mode? Btw, the informatics.yale links returns 502).
 After
  I did get it launched, and the confusion was probably all my fault, the
  concepts available to the similarity fields seemed very sparse; I just
  started typing randomly, hematochezia, choledocholithiasis, etc, and
  nothing would come up. The best I got was gallbladder function test,
 which,
  if Im understanding it right, would be an alkphos, but alkaline
 phosphatase
  didnt come up, which led to me to believe they were smaller sets of the
 the
  snomed, mesh, etc compilations (as I checked the UMLS db and these
 concepts
  are there).
 
  I think I got that execution command from the code.google, which is
  probably why it was stale. I did not see the ytex semantic similarity
 guide
  under the ctakes components part (sorry, thanks for pointing me there,
 ill
  get to work on reading it).
 
  So bottom line: are the ones that shipped watered down versions? And if
  not, why are my concepts coming up short? If you give me a hint at where
 to
  check Ill investigate.
 
  Thanks!
  JG
 
 
  On Sun, Jun 29, 2014 at 8:56 PM, vijay garla vnga...@gmail.com wrote:
 
  Hi John,
 
  YTEX ships with 3 concept graphs (see
 
 
 https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.1.2+-+Semantic+Similarity
  ):
 
 - sct-rxnorm: concepts from SNOMED-CT and RXNORM.  This is the
 default.
 - sct-msh-csp-aod: concepts from the SNOMED-CT, MeSH, CRISP, and
  Alcohol
 and Drug thesaurus
 - umls: concepts from all restriction free (level 0) UMLS source
 vocabularies and SNOMED-CT
 
 
  These concept graphs are included in ytex resources zip (see
  https://cwiki.apache.org/confluence/display/CTAKES/YTEX+Installation):
  3) Unzip YTEX Resources (Optional - UTS login required)
 
  Download and unzip ctakes-ytex-resources-3.1.2-SNAPSHOT.zip
  
 
 http://www.ytex-nlp.org/umls.download/secure/3.1/ctakes-ytex-resources-3.1.2-SNAPSHOT.zip
  
  'over'
  your installation. This contains:
 
 - Concept Graphs derived from the UMLS2013AA used to compute semantic
 similarity measures
 
 
 
  All YTEX packages moved from the ytex namespace into
  org.apache.ctakes.ytex
  - can you tell me which document you were looking at that mentioned
  ytex.kernel.dao.ConceptDaoImpl?  I thought I had fixed this in the
  documentation.
 
  HTH,
 
  -vj
 
 
  On Sun, Jun 29, 2014 at 2:25 PM, John Green

Re: Building

2014-07-02 Thread vijay garla
The ytexWeb application tries to look up concepts from terms using the ytex
dictionary lookup table, which is a small subset of the UMLS.  Can you try
specifying cuis?  That skips the lookup - if the concepts are in the
concept graph, this will work.

Best,

vj


On Sun, Jun 29, 2014 at 6:10 PM, John Green john.travis.gr...@gmail.com
wrote:

 Hi Vijay, thank you for your time.

 Your documentation was quite good. I had no problem setting up ytex with
 UMLS running on my local mysql server. Where I ran into problems was
 understanding how to launch the web service (also, is there anyway to run
 this in a RESTful mode? Btw, the informatics.yale links returns 502). After
 I did get it launched, and the confusion was probably all my fault, the
 concepts available to the similarity fields seemed very sparse; I just
 started typing randomly, hematochezia, choledocholithiasis, etc, and
 nothing would come up. The best I got was gallbladder function test, which,
 if Im understanding it right, would be an alkphos, but alkaline phosphatase
 didnt come up, which led to me to believe they were smaller sets of the the
 snomed, mesh, etc compilations (as I checked the UMLS db and these concepts
 are there).

 I think I got that execution command from the code.google, which is
 probably why it was stale. I did not see the ytex semantic similarity guide
 under the ctakes components part (sorry, thanks for pointing me there, ill
 get to work on reading it).

 So bottom line: are the ones that shipped watered down versions? And if
 not, why are my concepts coming up short? If you give me a hint at where to
 check Ill investigate.

 Thanks!
 JG


 On Sun, Jun 29, 2014 at 8:56 PM, vijay garla vnga...@gmail.com wrote:

  Hi John,
 
  YTEX ships with 3 concept graphs (see
 
 
 https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.1.2+-+Semantic+Similarity
  ):
 
 - sct-rxnorm: concepts from SNOMED-CT and RXNORM.  This is the
 default.
 - sct-msh-csp-aod: concepts from the SNOMED-CT, MeSH, CRISP, and
 Alcohol
 and Drug thesaurus
 - umls: concepts from all restriction free (level 0) UMLS source
 vocabularies and SNOMED-CT
 
 
  These concept graphs are included in ytex resources zip (see
  https://cwiki.apache.org/confluence/display/CTAKES/YTEX+Installation):
  3) Unzip YTEX Resources (Optional - UTS login required)
 
  Download and unzip ctakes-ytex-resources-3.1.2-SNAPSHOT.zip
  
 
 http://www.ytex-nlp.org/umls.download/secure/3.1/ctakes-ytex-resources-3.1.2-SNAPSHOT.zip
  
  'over'
  your installation. This contains:
 
 - Concept Graphs derived from the UMLS2013AA used to compute semantic
 similarity measures
 
 
 
  All YTEX packages moved from the ytex namespace into
 org.apache.ctakes.ytex
  - can you tell me which document you were looking at that mentioned
  ytex.kernel.dao.ConceptDaoImpl?  I thought I had fixed this in the
  documentation.
 
  HTH,
 
  -vj
 
 
  On Sun, Jun 29, 2014 at 2:25 PM, John Green john.travis.gr...@gmail.com
 
  wrote:
 
   I got the semantic similarity web app running in ytex. Im still
 learning
   umls terminology, but I believe it says that out of the box its concept
   graphs are limited to the free set from umls? Does this mean without
   permissions? Similar to ctakes with umls rights? The concepts available
   seem limited so this would make sense.
  
   So, to take full advantage I would need to rebuild the concept graph,
   correct? Im in the process of doing this but getting classpath errors.
 I
   used java a bit ten years ago, so you can probably guess these will
 take
  me
   a minute to resolve. Notably, it is complaining about
   ytex.kernel.dao.ConceptDaoImpl.
  
  
   Thanks all,
  
  
   JG
  
   —
   Sent from Mailbox for iPhone
 



Re: Ctakes-data-vis

2014-04-29 Thread vijay garla
I think one major issue with ctakes in a web server is thread safety.  I
know that LVG is not thread safe, and it isn't clear what the status is on
other components.
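
A minimal sketch of one common workaround, assuming a component that is not
thread safe can be hidden behind a single lock so that only one request
thread uses it at a time. The wrapper and the stand-in "component" below are
hypothetical and do not use the LVG or cTAKES APIs; serializing calls trades
throughput for safety.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical demo, not cTAKES code: serializes access to a non-thread-safe component.
final class SerializedComponentDemo {

    // Wraps any function-like component so that calls never overlap.
    static final class Serialized<I, O> {
        private final Function<I, O> delegate;
        private final Object lock = new Object();

        Serialized(Function<I, O> delegate) {
            this.delegate = delegate;
        }

        O process(I input) {
            synchronized (lock) { // only one caller at a time reaches the delegate
                return delegate.apply(input);
            }
        }
    }

    // Calls the wrapped component from several threads and returns the call count.
    static int countCalls(int threads, int callsPerThread) {
        int[] calls = {0}; // unsynchronized state, standing in for a non-thread-safe resource
        Serialized<String, String> component = new Serialized<>(s -> {
            calls[0]++;
            return s.toLowerCase();
        });
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    component.process("Fracture");
                }
            });
            workers.add(worker);
            worker.start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return calls[0];
    }

    public static void main(String[] args) {
        System.out.println(countCalls(4, 1000));
    }
}
```

An alternative with the same intent is one component instance per worker
thread, at the cost of loading the resources several times.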


On Tue, Apr 29, 2014 at 9:20 AM, John Green john.travis.gr...@gmail.com wrote:

 Pei - I meant as a web app, can we keep the credentials loaded and the
 resources (more importantly) loaded in memory across runs? E.g. treat it
 like a queue with the machinery already loaded and fed? I'm sure this can be
 done, I just run ctakes from CPE right now and haven't toyed with this so
 wasn't sure.


 Where is this at? Is anyone developing the front end? I might be able to
 invest some time into the easily.




 Jg
 —
 Sent from Mailbox for iPhone

 On Wed, Apr 16, 2014 at 12:23 PM, Chen, Pei
 pei.c...@childrens.harvard.edu wrote:

  John,
  How we use the VM is up to us to decide.  For an online demo,
  We can certainly load up cTAKES and it's resources.
  If it's a web app, we can prompt the user to enter umls credentials if
 they choose the umls resources?
  --Pei
  -Original Message-
  From: John Green [mailto:john.travis.gr...@gmail.com]
  Sent: Sunday, April 13, 2014 9:16 PM
  To: dev@ctakes.apache.org
  Subject: Re: Ctakes-data-vis
 
  Great! Ill try and fix that soon. Im back on the wards so time is slim.
 
 
  What are the next steps for the vm? For the demo site?
 
 
 
 
  Out of curiosity, would this allow resources to stay loaded and a kind
 of que
  be setup? Is there a solution that allows to do this now? That is, the
  resources stay loaded in mem, the umls auth stays current, and I could
 just
  pass content as it becomes available?
 
 
 
 
  Jg
  —
  Sent from Mailbox for iPhone
 
  On Sat, Apr 12, 2014 at 2:57 PM, andy mcmurry mcmurry.a...@gmail.com
  wrote:
 
   It looks great! The transitions are smooth and the hierarchical
   browsing is straightforward. The only edit I recommend I have is about
   spacing -- The information often exceeds the space of a single page.
   On Sat, Apr 5, 2014 at 12:13 PM, John Green
  john.travis.gr...@gmail.com wrote:
   Had to refresh my svn skills as its been years. As a result not much
   cleaning up got done Andy/Pei. The code is solid though and I sent
   four different ways to view the json up too; collapsable dendrogram
   is the most useful.
  
  
   The script could easily be re written to iterate through a directory
   as its in the form of a simple class. Also, it should take command
 line args.
   Im out of time this weekend, even for the ten minutes that would
   take, but I can do both next weekend.
  
  
   Let me know if its useful at all Andy or if you need tweaks on
   anything to make it useful for whatever demo u have in mind, id be
   happy to as time permits.
  
  
   Hope to make more significant contributions to this wonderful project
   sometime in the next year, Jg
   --
   Sent from Mailbox for iPhone


Re: ytex merged into trunk

2014-04-28 Thread vijay garla
 is org.hibernate.exception.SQLGrammarException: could not prepare statement
   org.apache.ctakes.ytex.uima.annotators.SparseDataExporterTest: Unable to
 initialize group definition. Group resource name
 [classpath*:org/apache/ctakes/ytex/uima/beanRefContext.xml], factory key
 [ytexApplicationContext]; nested exception is
 org.springframework.beans.factory.BeanCreationException: Error creating bean
 with name 'ytexApplicationContext' defined in URL
 [file:/C:/Spiffy/Dev/ApacheCtakesTrunk/ctakes-ytex-res/src/main/resources/org/apache/ctakes/ytex/uima/beanRefContext.xml]:
 Instantiation of bean failed; nested exception is
 org.springframework.beans.BeanInstantiationException: Could not instantiate
 bean class [org.springframework.context.support.ClassPathXmlApplicationContext]:
 Constructor threw exception; nested exception is
 org.springframework.beans.factory.BeanCreationException: Error creating bean
 with name 'documentMapperService' defined in class path resource
 [org/apache/ctakes/ytex/uima/beans-uima-mapper.xml]: Invocation of init
 method failed; nested exception is
 org.hibernate.exception.SQLGrammarException: could not prepare statement

 Tests run: 12, Failures: 0, Errors: 7, Skipped: 0
 ...

 [INFO] Apache cTAKES Resources ctakes-ytex-res ... SUCCESS [ 1.089 s]
 [INFO] Apache cTAKES YTEX ... SUCCESS [14.592 s]
 [INFO] Apache cTAKES YTEX UIMA ... FAILURE [01:34 min]
 [INFO] Apache cTAKES ctakes-clinical-pipeline ... SKIPPED
 [INFO] Apache cTAKES YTEX Web ... SKIPPED
 ...
 [ERROR] Failed to execute goal
 org.apache.maven.plugins:maven-surefire-plugin:2.12:test (default-test) on
 project ctakes-ytex-uima:
 There are test failures.


 -Original Message-
 From: vijay garla [mailto:vnga...@gmail.com]
 Sent: Sunday, April 27, 2014 10:56 PM
 To: dev@ctakes.apache.org
 Subject: ytex merged into trunk

 Hello All,

 I have merged YTEX into trunk, will keep the branch around a little while
 then delete it.  Some non-ytex related changes (I will gladly change/revert
 if there are objections):
 * ctakes-temporal does not compile; from the email threads I take it that
 this is under development and the compilation failures are to be expected.
  I have commented out these modules from the root pom.xml so that ctakes
 builds.  I wasn't able to use maven profiles to exclude ctakes-temporal,
 not sure why.
 * default max memory runctakesCPE/CVD.bat: I have increased this to 2G
 (was 1G).  I do not think it is possible to run cTAKES with less memory
 (definitely not in a 64-bit jdk) when loading the assertion models (which
 is the 'default' pipeline).

 I will clean up the ytex docs shortly.

 Best,

 VJ



Re: YTEX install - one error after building

2014-03-26 Thread vijay garla
Hi Paula,

UMLS.hbm.template.xml is a template used to generate a valid hibernate xml
config file.  If you have imported YTEX into eclipse, follow these
guidelines:

https://svn.apache.org/repos/asf/ctakes/branches/ytex/ctakes-ytex/README

I believe the issue might be that you have validation enabled for XML; I
believe you can disable it for specific files (like UMLS.hbm.template.xml).
I am using Kepler, and it doesn't complain about UMLS.hbm.template.xml; I'm
not sure if I tweaked my validator settings.
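
To illustrate why an XML validator rejects the template but accepts the
generated file, here is a minimal sketch. It assumes the @...@ tokens are
replaced by plain text substitution during the build; the substituted values
("umls" and an empty string) and the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo, not the YTEX build: a template with a bare @token@ is not well-formed XML.
final class TemplateTokenDemo {
    // Shortened stand-in for the opening element of UMLS.hbm.template.xml.
    static final String TEMPLATE =
        "<hibernate-mapping package=\"org.apache.ctakes.ytex.umls.model\" "
        + "schema=\"@umls.schema@\" @filter.umls.catalog@></hibernate-mapping>";

    // Replaces every @name@ token with its value (simple text substitution).
    static String applyTokens(String template, Map<String, String> tokens) {
        String result = template;
        for (Map.Entry<String, String> e : tokens.entrySet()) {
            result = result.replace("@" + e.getKey() + "@", e.getValue());
        }
        return result;
    }

    // Returns true if the text parses as XML, false on a parse error.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // rethrow fatal errors without printing to stderr
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("template well-formed:  " + isWellFormed(TEMPLATE));
        String generated = applyTokens(TEMPLATE,
            Map.of("umls.schema", "umls", "filter.umls.catalog", "")); // assumed values
        System.out.println("generated well-formed: " + isWellFormed(generated));
    }
}
```

The bare token after the schema attribute is what the parser trips over, so
the error in the IDE says nothing about the generated mapping file.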

-vj



On Tue, Mar 25, 2014 at 6:16 PM, digital paula cybersat...@hotmail.com wrote:

 Hi VJ,

 As part of testing, I  did a fresh install of cTAKES with YTEX and
 everything installed correctly but after building I got one error
 pertaining to this page, five lines down.


 https://svn.apache.org/repos/asf/ctakes/branches/ytex/ctakes-ytex-res/src/main/resources/org/apache/ctakes/ytex/umls/model/UMLS.hbm.template.xml


 The error is this line:
 <hibernate-mapping package="org.apache.ctakes.ytex.umls.model"
 schema="@umls.schema@" @filter.umls.catalog@>

 Using Eclipse Juno, the error states:

 Element type "hibernate-mapping" must be followed by either attribute
 specifications, ">" or "/>".

 I tried using /> instead of > and putting it all on one line instead of
 two but can't seem to fix it.

 Also,  I was about to install the sectionizer separately as a module but I
 see that YTEX already has a sectionizer(SegmentRegexSectionizer) so I look
 forward to exploring it further.

 Regards,
 Paula

  Date: Thu, 20 Mar 2014 14:08:32 -0400
  Subject: Re: YTEX Doc in cwiki
  From: vnga...@gmail.com
  To: dev@ctakes.apache.org
 
  I plan to fix all the links.
 
  I have not yet moved the scripts for the semantic similarity benchmark to
  cTAKES, so I dropped that from the cTAKES semantic similarity docs.  When
  those scripts get moved to cTAKES, I'll update the docs.
 
 
  On Thu, Mar 20, 2014 at 12:33 PM, Masanz, James J.
 masanz.ja...@mayo.edu wrote:
 
   hi vijay,
  
   I have just skimmed a few sections so far.
  
   the page has links at the top to google docs pages and then links to
 our
   web pages (the children pages) at the bottom. Is your intent to remove
 the
   first 3 links once things are finalized?
  
   some of the examples on the Semantic+Similarity page use cd
 CTAKES_HOME
   but later use %CTAKES_HOME%
  
   so it looks like you meant cd %CTAKES_HOME%
  
   I didn't see anything about the Similarity Benchmark on the new pages.
 Is
   that still part of ytex?
  
   -- james
  
  
   -Original Message-
   From: vijay garla [mailto:vnga...@gmail.com]
   Sent: Sunday, March 16, 2014 8:53 PM
   To: dev@ctakes.apache.org
   Subject: YTEX Doc in cwiki
  
   Hello All,
  
   I've made a first cut at moving and updating the YTEX docs over from
 google
   code to the cTAKES confluence site.
  
   This is a first cut, and I'm trying to keep the YTEX docs separated,
 as it
   is not yet in trunk/released, and I don't want to mess up any existing
   docs.
  
   see https://cwiki.apache.org/confluence/display/CTAKES/YTEX+3.2
  
   Best,
  
   VJ
  



Re: YTEX Doc in cwiki

2014-03-20 Thread vijay garla
I plan to fix all the links.

I have not yet moved the scripts for the semantic similarity benchmark to
cTAKES, so I dropped that from the cTAKES semantic similarity docs.  When
those scripts get moved to cTAKES, I'll update the docs.


On Thu, Mar 20, 2014 at 12:33 PM, Masanz, James J. masanz.ja...@mayo.edu wrote:

 hi vijay,

 I have just skimmed a few sections so far.

 the page has links at the top to google docs pages and then links to our
 web pages (the children pages) at the bottom. Is your intent to remove the
 first 3 links once things are finalized?

 some of the examples on the Semantic+Similarity page use cd CTAKES_HOME
 but later use %CTAKES_HOME%

 so it looks like you meant cd %CTAKES_HOME%

 I didn't see anything about the Similarity Benchmark on the new pages. Is
 that still part of ytex?

 -- james


 -Original Message-
 From: vijay garla [mailto:vnga...@gmail.com]
 Sent: Sunday, March 16, 2014 8:53 PM
 To: dev@ctakes.apache.org
 Subject: YTEX Doc in cwiki

 Hello All,

 I've made a first cut at moving and updating the YTEX docs over from google
 code to the cTAKES confluence site.

 This is a first cut, and I'm trying to keep the YTEX docs separated, as it
 is not yet in trunk/released, and I don't want to mess up any existing
 docs.

 see https://cwiki.apache.org/confluence/display/CTAKES/YTEX+3.2

 Best,

 VJ



YTEX Doc in cwiki

2014-03-16 Thread vijay garla
Hello All,

I've made a first cut at moving and updating the YTEX docs over from google
code to the cTAKES confluence site.

This is a first cut, and I'm trying to keep the YTEX docs separated, as it
is not yet in trunk/released, and I don't want to mess up any existing docs.

see https://cwiki.apache.org/confluence/display/CTAKES/YTEX+3.2

Best,

VJ


Re: YTEX LVG Fix

2014-02-11 Thread vijay garla
Hi John,

Thanks for this.  I've updated the YTEXPipeline, fixed the lvg paths in
SetupAUIFirstWord.

If you want to re-run SetupAUIFirstWord (not necessary unless you are using
the stemmed words for dictionary lookup), just svn update, rebuild
ctakes-ytex-uima, and copy the jar to the lib dir of your ctakes install.

Best,

VJ


On Mon, Feb 10, 2014 at 6:25 PM, John David Osborne (Campus) ozb...@uab.edu
 wrote:

  These were the changes I made to get the YTEX pipeline working with LVG
 (2008). It looks like there were just a couple of spots with some old
 hard-coded paths in SetupAUIFirstWord.java that were appropriate to the old
 ytex directory structure.

 For now I have just swapped them out to fit with the new directory
 structure, but I suppose the correct fix may be to extract them out
 somewhere...  In any case I don't have write privileges, so someone else
 may want to fix this (Vijay?)

 I also included the YTEXPipeline.xml descriptor file I fixed as well in
 case anybody needs it.

  -John





Re: YTEX cTAKES 3.1.1 ready

2014-02-06 Thread vijay garla
I believe it is worth migrating to trunk.

Note that the sentence detector is also complementary - the existing ctakes
sentence detector is unchanged - users can choose which sentence detector
to use.  There are changes to assertion & dependency parsing to support
sentences without newlines, and that works with both sentence detectors.

I believe cTAKES absolutely has to support sentences with newlines within
them - I have yet to run across clinical text from a real EMR where
newlines represent the end of a sentence - the changes to assertion &
dependency parsing will have to be done at some point.
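
A minimal sketch of the point about hard-wrapped text, assuming a toy
splitter that ends sentences only at ., ! or ? followed by whitespace. This
is not the cTAKES or YTEX sentence detector; the class name and sample note
are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: newline-as-boundary versus punctuation-as-boundary on wrapped text.
final class HardWrapSentenceDemo {
    // Treats every line break as a sentence end.
    static List<String> splitOnNewlines(String text) {
        List<String> out = new ArrayList<>();
        for (String line : text.split("\\R")) {
            if (!line.trim().isEmpty()) {
                out.add(line.trim());
            }
        }
        return out;
    }

    // Splits only after ., ! or ? followed by whitespace; newlines inside a sentence are kept.
    static List<String> splitOnPunctuation(String text) {
        List<String> out = new ArrayList<>();
        for (String sentence : text.split("(?<=[.!?])\\s+")) {
            String normalized = sentence.replaceAll("\\s+", " ").trim();
            if (!normalized.isEmpty()) {
                out.add(normalized);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String note = "Patient denies chest\npain. No fever\nor chills.";
        System.out.println(splitOnNewlines(note));    // three fragments
        System.out.println(splitOnPunctuation(note)); // two sentences
    }
}
```

The newline-based split cuts "chest pain" in half, which is the kind of
fragment that downstream assertion and dependency parsing then has to cope
with.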

-vj


On Thu, Feb 6, 2014 at 10:19 AM, Chen, Pei
pei.c...@childrens.harvard.edu wrote:

 VJ,
 Aside from the changes to the existing cTAKES code (sentence detector,
 etc.) [which we could leave out if it's still being debated],
 Do you think it's worth migrating the ytex code to trunk at this point?
  As you mentioned earlier, it's largely complementary.
 [I was just thinking of saving effort to maintain the separate branch and
 for simplicity for dev...]

 --Pei

  -Original Message-
  From: vijay garla [mailto:vnga...@gmail.com]
  Sent: Wednesday, February 05, 2014 9:30 PM
  To: ytex-us...@googlegroups.com; ctakes-...@incubator.apache.org;
  vlad.valtchi...@gmail.com
  Subject: Re: YTEX cTAKES 3.1.1 ready
 
  Hi Vlad,
 
  I Updated the umls install guide; see
  https://code.google.com/p/ytex/wiki/UMLS_SQL_SERVER_3_1
 
  I would prefer to add the docs in the ctakes confluence, but as far as I
 can
  tell, I don't have write access there - can somebody give me write
 privileges
  on the ctakes confluence site?
 
  There was a bug in the umls install; copy
  https://svn.apache.org/repos/asf/ctakes/branches/ytex/ctakes-ytex/scripts/data/build.xml
  over the corresponding file in your ctakes-3.1.2 install
  (CTAKES_HOME\bin\ctakes-ytex\scripts\data) and you should be set.  The
  import is currently running on the UMLS 2013AA (I assume this will
 complete
  without issues as long as the umls schema hasn't changed from 2012).
 
  what trial and error did you have to go through to build the distro?
 
  -vj
 
 
  On Wed, Feb 5, 2014 at 5:33 PM, vijay garla vnga...@gmail.com wrote:
 
   Hi Vlad,
  
   sorry that the instructions aren't clear.
  
   re 1) What I am trying to say is install apache-ctakes-3.2.0-snapshot
   as usual (this is unchanged from 3.1.1).  After that you still have to
   apply the lib and resources (these are things that cannot be
   distributed via apache).
  
   re 2) Yes, I need to update those docs.  Hopefully will get to that at
   some point.  However, I assume you already have a UMLS DB (also assume
   SQL Server).  If you can't/don't want to use your existing umls DB,
   please tell me.  Then I'll prioritize upgrading the doc on importing the
   umls tables (the scripts are there).
  
   best,
  
   VJ
  
  
   On Wed, Feb 5, 2014 at 4:44 PM, vlad.valtchi...@gmail.com wrote:
  
   Hi VJ-
  
   so, with trial and error we were able to make the distribution and now
   have the apache-ctakes-3.1.2-SNAPSHOT-bin.zip archive.
  
   Here's what's unclear.
  
   1. Is now this the only (combined) thing that you need for ctakes
   3.1.1 + Ytex?
   the current documentation (https://code.google.com/p/ytex/wiki/Installation_cTAKES_3_1?ts=1388793998&updated=Installation_cTAKES_3_1)
   which most probably is outdated, talks about installing cTakes 3.1.1
   first and then applying 2 SNAPSHOT archives (downloadable) , lib and
   resources.
   This is a confusion point.
  
   2. The directions to import UMLS subset are then outdated as well.
   Maybe one should use the old version (ctakes 2.5 and ytex 0.8) to
   import the RRF files for the UMLS subset and then just use the
   resulting db. Thoughts?
  
   Thanks,
   Vlad Valtchinov
   Brigham Rad
  
  
   On Thursday, January 30, 2014 5:17:43 PM UTC-5, vijay garla wrote:
  
   Hi Vlad,
  
  
   All of ytex has been moved into ctakes, it is currently in a branch (
   https://svn.apache.org/repos/asf/ctakes/branches/ytex).  You don't
   have to install ytex-0.8 - instead you will have to build and
   install from the ytex branch to create your own distribution.  Steps
   2 & 3 are correct.
  
   Although it is a pain, if you have the jdk, maven, and svn, you can
   easily build your own distro:
   * open a command prompt
   * make sure jdk, maven, and svn are in your path
   * cd to some directory where you want to check stuff out (I like
   c:\temp)
   * run the following commands
   rmdir /s /q ctakes
   svn co https://svn.apache.org/repos/asf/ctakes/branches/ytex ctakes
   cd ctakes
   mvn clean install -DskipTests
  
   And you will have the ctakes (with ytex) distro in
   ctakes\ctakes-distribution\target\apache-ctakes-3.1.2-SNAPSHOT-bin.zip
  
   What is the process for getting the ytex branch merged into trunk?
   As I mentioned, there are very few changes to other ctakes
   classes/types - this should be completely complementary

Re: YTEX cTAKES 3.1.1 ready

2014-02-05 Thread vijay garla
Hi Vlad,

I Updated the umls install guide; see
https://code.google.com/p/ytex/wiki/UMLS_SQL_SERVER_3_1

I would prefer to add the docs in the ctakes confluence, but as far as I
can tell, I don't have write access there - can somebody give me write
privileges on the ctakes confluence site?

There was a bug in the umls install; copy
https://svn.apache.org/repos/asf/ctakes/branches/ytex/ctakes-ytex/scripts/data/build.xml
over the corresponding file in your ctakes-3.1.2 install
(CTAKES_HOME\bin\ctakes-ytex\scripts\data) and you should be set.  The
import is currently running on the UMLS 2013AA (I assume this will complete
without issues as long as the umls schema hasn't changed from 2012).

what trial and error did you have to go through to build the distro?

-vj


On Wed, Feb 5, 2014 at 5:33 PM, vijay garla vnga...@gmail.com wrote:

 Hi Vlad,

 sorry that the instructions aren't clear.

 re 1) What I am trying to say is install apache-ctakes-3.2.0-snapshot as
 usual (this is unchanged from 3.1.1).  After that you still have to apply
 the lib and resources (these are things that cannot be distributed via
 apache).

 re 2) Yes, I need to update those docs.  Hopefully will get to that at
 some point.  However, I assume you already have a UMLS DB (also assume SQL
 Server).  If you can't/don't want to use your existing umls DB, please tell
 me.  Then I'll prioritize upgrading the doc on importing the umls tables (the
 scripts are there).

 best,

 VJ


 On Wed, Feb 5, 2014 at 4:44 PM, vlad.valtchi...@gmail.com wrote:

 Hi VJ-

 so, with trial and error we were able to make the distribution and now have
 the apache-ctakes-3.1.2-SNAPSHOT-bin.zip archive.

 Here's what's unclear.

 1. Is now this the only (combined) thing that you need for ctakes 3.1.1 +
 Ytex?
 the current documentation
 (https://code.google.com/p/ytex/wiki/Installation_cTAKES_3_1?ts=1388793998&updated=Installation_cTAKES_3_1)
 which most probably is outdated, talks about installing cTakes 3.1.1
 first and then applying 2 SNAPSHOT archives (downloadable) , lib and
 resources.
 This is a confusion point.

 2. The directions to import UMLS subset are then outdated as well. Maybe
 one should use the old version (ctakes 2.5 and ytex 0.8) to
 import the RRF files for the UMLS subset and then just use the resulting
 db. Thoughts?

 Thanks,
 Vlad Valtchinov
 Brigham Rad


 On Thursday, January 30, 2014 5:17:43 PM UTC-5, vijay garla wrote:

 Hi Vlad,


 All of ytex has been moved into ctakes, it is currently in a branch (
 https://svn.apache.org/repos/asf/ctakes/branches/ytex).  You don't have
 to install ytex-0.8 - instead you will have to build and install from the
 ytex branch to create your own distribution.  Steps 2 & 3 are correct.

 Although it is a pain, if you have the jdk, maven, and svn, you can
 easily build your own distro:
 * open a command prompt
 * make sure jdk, maven, and svn are in your path
 * cd to some directory where you want to check stuff out (I like c:\temp)
 * run the following commands
 rmdir /s /q ctakes
 svn co https://svn.apache.org/repos/asf/ctakes/branches/ytex ctakes
 cd ctakes
 mvn clean install -DskipTests

 And you will have the ctakes (with ytex) distro in
 ctakes\ctakes-distribution\target\apache-ctakes-3.1.2-SNAPSHOT-bin.zip

 What is the process for getting the ytex branch merged into trunk?  As I
 mentioned, there are very few changes to other ctakes classes/types - this
 should be completely complementary and not affect any existing ctakes
 functionality.

 -vj






 On Thu, Jan 30, 2014 at 4:56 PM, vlad.va...@gmail.com wrote:

 Hi VJ--

 this is great!! Thanks for all the hard work on it!

 We're starting to look into the new install. For now we're trying the
 binaries out.

 There were these questions about the proper install steps:

 1. Do we first install ytex-0.8
 2. Then install the new cTakes 3.1.1 instance and also apply the
 SNAPSHOT lib and resources zips
 3. Work our way to install the UMLS ontologies in the db

 It is not entirely clear from the new document (
 https://code.google.com/p/ytex/wiki/Installation_cTAKES_3_1?ts=1388793998&updated=Installation_cTAKES_3_1)
 if there's still need to install ytex-0.8, or YTEX has been entirely
 merged into cTakes?

 If the last statement is correct, there are missing parts in i.e the
 UMLS install steps that are linked from the new ctakes 3.1.1 document.

 Thanks,
 vlad


 On Friday, January 3, 2014 10:21:52 PM UTC-5, vijay garla wrote:

 Hello All,

 I have finished an initial cut at the port of YTEX to cTAKES 3.1.1.
  Most of the YTEX functionality has been ported and integrated with 
 cTAKES,
 and I've tested with MySQL and MS SQL Server (oracle tests pending).

 Most of the changes were made in new projects - very little existing
 cTAKES code has been modified.  The only non-trivial changes are
 in 
 /ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api
 - here I modified CharacterOffsetToLineTokenConverterCtakesImpl

Re: YTEX cTAKES 3.1.1 ready

2014-01-30 Thread vijay garla
Hi Vlad,

All of ytex has been moved into ctakes, it is currently in a branch (
https://svn.apache.org/repos/asf/ctakes/branches/ytex).  You don't have to
install ytex-0.8 - instead you will have to build and install from the ytex
branch to create your own distribution.  Steps 2 & 3 are correct.

Although it is a pain, if you have the jdk, maven, and svn, you can easily
build your own distro:
* open a command prompt
* make sure jdk, maven, and svn are in your path
* cd to some directory where you want to check stuff out (I like c:\temp)
* run the following commands
rmdir /s /q ctakes
svn co https://svn.apache.org/repos/asf/ctakes/branches/ytex ctakes
cd ctakes
mvn clean install -DskipTests

And you will have the ctakes (with ytex) distro in
ctakes\ctakes-distribution\target\apache-ctakes-3.1.2-SNAPSHOT-bin.zip

What is the process for getting the ytex branch merged into trunk?  As I
mentioned, there are very few changes to other ctakes classes/types - this
should be completely complementary and not affect any existing ctakes
functionality.

-vj






On Thu, Jan 30, 2014 at 4:56 PM, vlad.valtchi...@gmail.com wrote:

 Hi VJ--

 this is great!! Thanks for all the hard work on it!

 We're starting to look into the new install. For now we're trying the
 binaries out.

 There were these questions about the proper install steps:

 1. Do we first install ytex-0.8
 2. Then install the new cTakes 3.1.1 instance and also apply the SNAPSHOT
 lib and resources zips
 3. Work our way to install the UMLS ontologies in the db

 It is not entirely clear from the new document (
 https://code.google.com/p/ytex/wiki/Installation_cTAKES_3_1?ts=1388793998&updated=Installation_cTAKES_3_1)
 if there's still need to install ytex-0.8, or YTEX has been entirely
 merged into cTakes?

 If the last statement is correct, there are missing parts in i.e the UMLS
 install steps that are linked from the new ctakes 3.1.1 document.

 Thanks,
 vlad


 On Friday, January 3, 2014 10:21:52 PM UTC-5, vijay garla wrote:

 Hello All,

 I have finished an initial cut at the port of YTEX to cTAKES 3.1.1.  Most
 of the YTEX functionality has been ported and integrated with cTAKES, and
 I've tested with MySQL and MS SQL Server (oracle tests pending).

 Most of the changes were made in new projects - very little existing
 cTAKES code has been modified.  The only non-trivial changes are
 in 
 /ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api
 - here I modified CharacterOffsetToLineTokenConverterCtakesImpl and
 SingleDocumentProcessorCtakes to deal with newlines within sentences
 correctly.  Can somebody take a look at the changes in the ytex branch?

 I believe that the branch
 https://svn.apache.org/repos/asf/ctakes/branches/ytex is ready to be merged
 into ctakes trunk, but would like other users to test it as well.  Questions:

 * How can I distribute the ctakes binary distribution to ytex users
 before the merge? Can we make the branch build available somewhere?  The
 binary distribution is too large to host on the ytex google code site (max
 200 MB)
 * Non-ASF libraries - I have segregated these out into their own zip file
 that can be distributed via sourceforge.  As a stopgap, I can upload this
 to the ytex google code site, but would prefer to upload to sourceforge.
 * UMLS Derivatives - Ditto for these - would like to move to sourceforge.
 * Documentation - How can I update the confluence docs?  I would migrate
 the documentation from the google code website.

 Here are the installation instructions (putting the wagon in front of the
 horse ...)

 https://code.google.com/p/ytex/wiki/Installation_cTAKES_3_1?ts=1388793998&updated=Installation_cTAKES_3_1

 Best,

 VJ





Re: sentence detector newline behavior

2014-01-27 Thread vijay garla
For clarity, I'd like to stress that the opennlp sentence model distributed
with ctakes today does 'work' with sentences that span newlines - as I
understand it, this model ignores newline tokens (or newlines are not
provided as features to that model).

I believe the improvements Tim and others are suggesting are for a new
sentence model + feature representation that takes advantage of newlines as
features.

Whatever we do, I believe we need backwards compatibility - those who are
using the current sentence model may need to continue using it.  To that
end:
* If we upgrade to the newest version of opennlp, will the old model work
(and produce the same results)?
* If a contributor trains a new model that uses a different feature
representation, I believe that should go into a new Sentence Detector
AnalysisEngine (or the same AE but with different configuration
parameters), so users have a choice between the old and the new.

-vj
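
To make the "same AE but with different configuration parameters" option concrete, here is a minimal sketch. It is not cTAKES code: the class name, the forceBreakAtNewline flag, and the use of plain int pairs instead of UIMA annotations are all assumptions made for illustration, and the sentence model's spans are simply taken as input.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: post-process model sentence spans according to one flag. */
public class NewlineSentencePolicy {

    /**
     * @param text                document text
     * @param modelSpans          [begin, end) offsets produced by some sentence model
     * @param forceBreakAtNewline true mimics the old behavior (break at every newline);
     *                            false leaves the model's spans untouched
     */
    public static List<int[]> apply(String text, List<int[]> modelSpans,
                                    boolean forceBreakAtNewline) {
        if (!forceBreakAtNewline) {
            return new ArrayList<>(modelSpans);
        }
        List<int[]> out = new ArrayList<>();
        for (int[] span : modelSpans) {
            int begin = span[0];
            for (int i = span[0]; i < span[1]; i++) {
                char c = text.charAt(i);
                if (c == '\n' || c == '\r') {
                    addIfNotBlank(text, begin, i, out);
                    begin = i + 1;
                }
            }
            addIfNotBlank(text, begin, span[1], out);
        }
        return out;
    }

    /** Skip pieces that contain only whitespace (e.g. between "\r" and "\n"). */
    private static void addIfNotBlank(String text, int begin, int end, List<int[]> out) {
        if (!text.substring(begin, end).trim().isEmpty()) {
            out.add(new int[] {begin, end});
        }
    }

    public static void main(String[] args) {
        String text = "patient did not have\ndiabetes. No fever.";
        List<int[]> model = new ArrayList<>();
        model.add(new int[] {0, 30});   // "patient did not have\ndiabetes."
        model.add(new int[] {31, 40});  // "No fever."
        System.out.println(apply(text, model, true).size());   // old behavior: 3 sentences
        System.out.println(apply(text, model, false).size());  // new behavior: 2 sentences
    }
}
```

With a switch like this the default can stay on the old behavior, so existing users see no change unless they opt in.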


On Mon, Jan 27, 2014 at 1:09 PM, digital paula cybersat...@hotmail.com wrote:




 Tim,

 I just had to chime in on a comment you made.  My deadline has been
 extended a bit on my pressing issue but I do intend to get back to testing
 per VJ's fix or maybe another fix is in the works based on latest
 emails...I need to read them again since a lot has been stated on the issue.

 Okay, as a new user (working w/cTAKES since October) I have never thought
 what you had stated:

  And I think this is the kind of thing that can leave new users
 scratching their heads and doubting our overall competence.

 Yeah, the sentence-spanning-newline issue was a problem so I just brought
 attention to it by my post of inquiry earlier this month on VJ's fix from
 last month and worked around it with treating narrative as one string.

 Anyone who's looked at the code would appreciate and acknowledge that
 cTAKES is a powerful and complex application.  I'm overall impressed with
 it and I intend to continue to use it, improve it, and grow with it.  I've
 been delving deeper into cTAKES on the machine learning aspect...I'm
 struggling a bit with it and if anything I scratch my head and doubt my
 competence. ;-)

 Regards,
 Paula

  Date: Mon, 27 Jan 2014 09:52:00 -0500
  From: timothy.mil...@childrens.harvard.edu
  To: dev@ctakes.apache.org
  Subject: Re: sentence detector newline behavior
 
  OK, with the most recent version I am able to replicate the performance
  I was getting before. Thanks a lot Jörn!
 
  Assuming this is in the next incremental release of opennlp, how quickly
  can we get a re-trained model into cTAKES? I heard from a researcher at
  AMIA who tried cTAKES and because of this bug in the way we handle
  sentences was trying to find an outside sentence detector as a
  preprocess to cTAKES, and frankly that is insane. We should be able to
  get something this simple right. And I think this is the kind of thing
  that can leave new users scratching their heads and doubting our overall
  competence.
 
  James, I believe you are usually the one who rebuilds the models? What
  would be the best way to incorporate the data I have that has some
  instances of non-sentence terminating newlines?
 
  Tim
 
 
  On 01/27/2014 06:10 AM, Jörn Kottmann wrote:
   On 01/26/2014 11:29 PM, Miller, Timothy wrote:
   Yes, this fixes the whitespace sentence issue but the evaluation issue
   remains. I believe the problem is in SentenceSampleStream, where in the
   following block the whitespace trim happens before the LF character is
   replaced with the \n character. So test sentences that ended with LF
   will be one character longer than they should be.
  
  sentence = sentence.trim();
  sentence = replaceNewLineEscapeTags(sentence);
  sentencesString.append(sentence);
  int end = sentencesString.length();
  sentenceSpans.add(new Span(begin, end));
  sentencesString.append(' ');
  
   Yes, that must be the issue. During training the new line is included
   in the span, and during detection the white space remover creates a
   span without the new line char.
  
   I suggest that the evaluator just ignores white space differences
   between sentences. My test case then
   has the expected performance numbers.
  
   What do you think?
  
   Anyway, I committed the change. Please give it a try.
  
   Jörn
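
The ordering problem and the proposed evaluator change quoted above can be reproduced in isolation. This is a minimal sketch under stated assumptions: the helper below is a stand-in written for this example (not OpenNLP's implementation), the literal "<LF>" / "<CR>" tag spellings are assumed, and the whitespace-tolerant comparison is only one way to read "ignore white space differences".

```java
public class TrimOrderDemo {

    /** Stand-in for the escape-tag replacement; the tag spellings are an assumption. */
    public static String replaceNewLineEscapeTags(String s) {
        return s.replace("<LF>", "\n").replace("<CR>", "\r");
    }

    /** Order quoted in the thread: trim first, then expand the tags. */
    public static int lengthTrimThenReplace(String raw) {
        return replaceNewLineEscapeTags(raw.trim()).length();
    }

    /** Alternative order: expand tags first, so trim() can strip the trailing newline. */
    public static int lengthReplaceThenTrim(String raw) {
        return replaceNewLineEscapeTags(raw).trim().length();
    }

    /** Two spans match if they cover the same offsets once edge whitespace is dropped. */
    public static boolean sameIgnoringWhitespace(String text, int b1, int e1, int b2, int e2) {
        int[] a = shrink(text, b1, e1);
        int[] b = shrink(text, b2, e2);
        return a[0] == b[0] && a[1] == b[1];
    }

    /** Move both ends of [begin, end) inward past any whitespace. */
    private static int[] shrink(String text, int begin, int end) {
        while (begin < end && Character.isWhitespace(text.charAt(begin))) begin++;
        while (end > begin && Character.isWhitespace(text.charAt(end - 1))) end--;
        return new int[] {begin, end};
    }

    public static void main(String[] args) {
        String raw = "No fever.<LF>";
        System.out.println(lengthTrimThenReplace(raw));  // 10: the newline survives
        System.out.println(lengthReplaceThenTrim(raw));  // 9: the newline is trimmed
        String text = "No fever.\nNo cough.";
        // Reference span includes the newline, predicted span does not.
        System.out.println(sameIgnoringWhitespace(text, 0, 10, 0, 9));  // true
    }
}
```

Either fix removes the one-character mismatch: reorder the two calls, or compare spans after shrinking them, which is closer to what the committed evaluator change is described as doing.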
 





Re: sentence detector newline behavior

2014-01-23 Thread vijay garla
 behavior

 The only rule I know of is that cTAKES (prior to ytex integration) always
 forces a sentence break at a newline.
 This was because the clinical notes cTAKES originally processed never had
 newlines in the middle of a sentence, but did need sentence breaks to occur
 at end of sentence for good negation detection on those notes.
 I think Guergana earlier mentioned other EMRs also have this need, but it
 seems to not be ubiquitous.

 From others' posts, it seems that we could use an option in cTAKES to turn
 off this forcing of sentence breaks at newlines (or depending on how you
 look at it, an option to turn on the forcing of sentence breaks if we
 change the default behavior)

 I think we (cTAKES) need to decide the following:
  - do we want to do this for entire notes, or would it be  worth it to
 have it be on a section-by-section basis.
  - what do we make the default behavior - to force or not to force
 newlines to be sentence breaks
  - what data (that contains newlines) will we use for training the
 sentence detector

 Regardless of those answers, I think OpenNLP support for including
 newlines in training data would be valuable for those others who have
 sentences that span lines.  And having an option on OpenNLP to always break
 at newline would be useful for at least some cTAKES users (and we could
 remove the cTAKES code that does that)

 -- James

 -Original Message-
 From: dev-return-2390-Masanz.James=mayo@ctakes.apache.org [mailto:
 dev-return-2390-Masanz.James=mayo@ctakes.apache.org] On Behalf Of
 Jörn Kottmann
 Sent: Tuesday, January 21, 2014 4:29 AM
 To: dev@ctakes.apache.org
 Subject: Re: sentence detector newline behavior

 Yes, exactly, OPENNLP-602 is about training a sentence detector model
 which can use a new line as a end-of-sentence character.

 In case you have certain rules to split sentences we should have a look at
 them. The Sentence Detector could be extended to support a user provided
 rule based splitter. If there is an interest in that we could probably get
 it into 1.6.0 as well.

 Jörn

 On 01/20/2014 10:02 PM, Chen, Pei wrote:
  I presume Joern was suggesting that if he supports new lines in the
 opennlp SentenceDetector (either part of the trained models or post
 processing with some rules?) cTAKES will be able to use it out of the box
 and we should be able to remove any additional custom logic that we
 currently have - which seems like a good idea.
 
  [but when to use within cTAKES individual components such as negation
  might be another discussion?] --Pei
 
  On Jan 20, 2014, at 12:46 PM, vijay garla vnga...@gmail.com wrote:
 
  The sentence detection opennlp model used by ctakes does not split
  sentences at newlines - there is additional logic in the ctakes
  sentence splitter that does this (and an alternative impl that
  doesn't is in the ytex branch). Afaik no retraining / change to the
  feature representation is necessary.
 
  Vj
 
  On Monday, January 20, 2014, Jörn Kottmann kottm...@gmail.com wrote:
 
  Hi all,
 
  currently I have quite a bit of time to work on OpenNLP, and would
  like to help you out with this issue.
 
  Here is the follow up issue for this change:
  https://issues.apache.org/jira/browse/OPENNLP-602
 
  I am still trying to figure out what would be the best option to
  implement this.
  In the training data a user could just use a special tag to identify
  the chars.
 
  Instead of <NEWLINE> it might be better to use <CR> and <LF> to
  encode these two chars in the training data. Any thoughts?
 
  I am planning to release this as part of OpenNLP 1.6.0.
 
  Thanks,
  Jörn
 
  On 05/22/2013 02:03 PM, Jörn Kottmann wrote:
 
  On 05/22/2013 01:17 PM, Miller, Timothy wrote:
 
  That's awesome! It might be worth trying at least. How does the
  training process change? Previously the training data would be one
  sentence per line, but with newlines as possible mid-sentence
  characters that could be trouble, is there a new representation
  for training data? Or would we have to use the training api?

  Good point, yes that will be a problem with the default training
  format, but it shouldn't be hard to solve. In the format itself we
  could define a new line tag, e.g. <NEWLINE>, to mark new lines.
  As a hack to make it work with 1.5.3 you could instead use a
  special char as a replacement for the new line char.
  When you pass the text down to the sentence detector a simple
  string replace could be used to convert all new line chars to the
  special new line marker char.
 
  If things work out for you performance wise as well we will just
  integrate it properly into OpenNLP for the next release.
 
  Could you produce a sentence detector training file with a new line
  marker char?
 
  You should try to pick a char you can also pass in on a terminal
  otherwise you have to use the API to train the model. The build in
  cross validation could be used to evaluate the performance.
 
  Jörn
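
The marker-character hack described above is small enough to sketch. Everything here is illustrative: the class name and the '~' marker are assumptions, and the call into the sentence detector is left out. The point is only that a one-for-one character swap keeps every offset valid, so spans found on the encoded text apply directly to the original.

```java
public class NewlineMarker {

    /** Replace each '\n' with the marker, one char for one char, so offsets are unchanged. */
    public static String encode(String text, char marker) {
        if (text.indexOf(marker) >= 0) {
            // The marker must be unambiguous, otherwise decode() would invent newlines.
            throw new IllegalArgumentException("marker already occurs in text: " + marker);
        }
        return text.replace('\n', marker);
    }

    /** Reverse of encode(): turn every marker back into '\n'. */
    public static String decode(String text, char marker) {
        return text.replace(marker, '\n');
    }

    public static void main(String[] args) {
        char marker = '~';  // assumption: typeable on a terminal and absent from the notes
        String note = "patient did not have\ndiabetes";
        String encoded = encode(note, marker);
        System.out.println(encoded);                                // patient did not have~diabetes
        System.out.println(encoded.length() == note.length());      // true
        System.out.println(decode(encoded, marker).equals(note));   // true
    }
}
```

The same encode step would have to be applied to the training file and to the text passed to the detector at run time, so that the model sees the marker in both places.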
 




Re: svn commit: r1551805 - /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java

2014-01-15 Thread vijay garla
The issue is indeed the sentence splitter - negation is limited to words
within the sentence, and if newlines are considered sentence boundaries, it
doesn't work properly (splitting on newlines breaks many other things as
well).  The YTEX branch includes a sentence splitter that does not
automatically split sentences on newlines.

best,

vj
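
A toy illustration of why a sentence break at the newline hides the negation, using the two-line "patient did not have / diabetes" example quoted below. This is not the cTAKES negation or assertion module: the cue list, the class, and both splitters are hypothetical stand-ins, kept only to show the scoping effect.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SentenceScopedNegation {

    /** Toy rule: a term is negated if "not" or "no" precedes it within the same sentence. */
    public static boolean isNegated(List<String> sentences, String term) {
        String needle = term.toLowerCase();
        for (String s : sentences) {
            String lower = s.toLowerCase();
            int at = lower.indexOf(needle);
            if (at < 0) {
                continue;  // term is not in this sentence, so this sentence's cues do not count
            }
            String before = " " + lower.substring(0, at);
            if (before.contains(" not ") || before.contains(" no ")) {
                return true;
            }
        }
        return false;
    }

    /** Mimics a splitter that forces a sentence break at every newline. */
    public static List<String> splitOnNewlines(String text) {
        return Arrays.asList(text.split("\\r?\\n"));
    }

    /** Mimics a splitter that lets a sentence span newlines. */
    public static List<String> keepNewlines(String text) {
        return Collections.singletonList(text);
    }

    public static void main(String[] args) {
        String oneLine = "patient did not have diabetes";
        String twoLines = "patient did not have\ndiabetes";
        System.out.println(isNegated(splitOnNewlines(oneLine), "diabetes"));   // true
        System.out.println(isNegated(splitOnNewlines(twoLines), "diabetes"));  // false
        System.out.println(isNegated(keepNewlines(twoLines), "diabetes"));     // true
    }
}
```

In the second call the cue and the term land in different sentences, so the sentence-scoped rule never sees them together.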


On Wed, Jan 15, 2014 at 10:03 AM, Masanz, James J. masanz.ja...@mayo.edu wrote:

 Hi Paula,

 The sentence detector in 3.1.0 and 3.1.1 (and previous releases) assumes
 sentences don't cross line boundaries.
 OpenNLP is used to find sentence breaks, but then if newlines are found,
 those are also set (within cTAKES, not OpenNLP) to be sentence breaks.

 (just FYI I haven't had a chance to look at the ytex branch, which the
 subject commit is about)

 -- James

 -Original Message-
 From: dev-return-2375-Masanz.James=mayo@ctakes.apache.org [mailto:
 dev-return-2375-Masanz.James=mayo@ctakes.apache.org] On Behalf Of
 digital paula
 Sent: Tuesday, January 14, 2014 10:25 PM
 To: dev@ctakes.apache.org
 Subject: RE: svn commit: r1551805 -
 /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java







 Hello cTAKES Developer Community,
  I'm a little behind on reading posts...this one is from last month.  I
 think this issue is already addressed in current release? I'm still running
 the previous release...3.1.0.
 I just noticed something interesting, the negation didn't take when it is
 on a different line.  I just removed all carriage returns from narratives
 and negation picked it up as long as it's treated as one long string.   To
 better explain what I mean.  Two narrative comments below.

 1.  patient did not have diabetes
 2. patient did not have
 diabetes

 Number 1 above got negated but number 2 did not. This might be related to
 the issue w/the sectionizer.  I noticed that when I treated the narrative
 as one string the sectionizer never crashes with the NPE.   Well the
 sectionizer is of no point if narrative is as one string but it's helping
 me pinpoint the problem.

 Regards,
 Paula


  Date: Thu, 19 Dec 2013 11:04:57 -0500
  Subject: Re: FW: svn commit: r1551805 -
 /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
  From: vnga...@gmail.com
  To: dev@ctakes.apache.org
 
  Hi Pei,
 
  I'm not sure if that would solve the problem: the change in the ytex
  branch causes newlines to be ignored (i.e. not treated as a token).
  trunk's sentence splitter splits sentences on newlines, so newlines
  would never be found in a sentence.  However, if we had a reproducer we
  could check it fairly easily in the ytex branch.
 
  Best,
 
  VJ
 
 
  On Thu, Dec 19, 2013 at 10:15 AM, Chen, Pei
  pei.c...@childrens.harvard.edu wrote:
 
   Vj,
   Do you think this is what was causing the NPE's [1]?
   If so, shall we make the same fix in trunk?
   --Pei
  
   [1]
  
 http://mail-archives.apache.org/mod_mbox/ctakes-dev/201309.mbox/%3C924DE05C19409B438EB81DE683A942D9105A93CB%40CHEXMBX1A.CHBOSTON.ORG%3E
  
   -Original Message-
   From: vjapa...@apache.org [mailto:vjapa...@apache.org]
   Sent: Tuesday, December 17, 2013 9:15 PM
   To: comm...@ctakes.apache.org
   Subject: svn commit: r1551805 -
  
 /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
  
   Author: vjapache
   Date: Wed Dec 18 02:14:13 2013
   New Revision: 1551805
  
   URL: http://svn.apache.org/r1551805
   Log:
   add support for sentences that contain newline tokens.
  
   Modified:
  
  
 ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
  
   Modified:
  
 ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
   URL:
  
 http://svn.apache.org/viewvc/ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java?rev=1551805&r1=1551804&r2=1551805&view=diff
  
  
 ==
   ---
  
 ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
   (original)
   +++
 ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
   Wed Dec 18 02:14:13 2013
   @@ -32,8 +32,8 @@ import org.apache.uima.jcas.tcas.Annotat  import
   org.mitre.medfacts.i2b2.api.ApiConcept;
import org.mitre.medfacts.zoner.CharacterOffsetToLineTokenConverter;
import 

Re: sentence splitter forks/branches

2014-01-15 Thread vijay garla
It is unfortunately not that trivial, as allowing newlines within sentences
requires changes to the assertion and dependency parser modules.

If you're not using those AEs you could theoretically build the ytex
branch, and just add ctakes-ytex-uima.jar and
ctakes-ytex-uima\desc\analysis_engine\SentenceDetectorAnnotator.xml to your
existing ctakes install (haven't tried it, but it should work).

-vj


On Wed, Jan 15, 2014 at 1:57 PM, Lingren, Todd todd.ling...@cchmc.org wrote:

 I have a general question about forks, specifically the YTEX branch that
 Vijay mentions.
 If I wanted to implement just the sentence splitter from YTEX into a
 currently existing 3.1 install, how would I do that? Is it possible? Or do
 I have to switch over completely to run from YTEX branch?

 Todd Lingren
 Biomedical Informatics
 Cincinnati Children's Hospital
 todd.ling...@cchmc.org
 513-803-9032


 -Original Message-
 From: vijay garla [mailto:vnga...@gmail.com]
 Sent: Wednesday, January 15, 2014 11:34 AM
 To: dev@ctakes.apache.org
 Subject: Re: svn commit: r1551805 -
 /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java

 The issue is indeed the sentence splitter - negation is limited to words
 within the sentence, and if newlines are considered sentence boundaries, it
 doesn't work properly (splitting on newlines breaks many other things as
 well).  The YTEX branch includes a sentence splitter that does not
 automatically split sentences on newlines.

 best,

 vj


 On Wed, Jan 15, 2014 at 10:03 AM, Masanz, James J. masanz.ja...@mayo.edu
 wrote:

  Hi Paula,
 
  The sentence detector in 3.1.0 and 3.1.1 (and previous releases)
  assumes sentences don't cross line boundaries.
  OpenNLP is used to find sentence breaks, but then if newlines are
  found, those are also set (within cTAKES, not OpenNLP) to be sentence
 breaks.
 
  (just FYI I haven't had a chance to look at the ytex branch, which the
  subject commit is about)
 
  -- James
 
  -Original Message-
  From: dev-return-2375-Masanz.James=mayo@ctakes.apache.org [mailto:
  dev-return-2375-Masanz.James=mayo@ctakes.apache.org] On Behalf Of
  digital paula
  Sent: Tuesday, January 14, 2014 10:25 PM
  To: dev@ctakes.apache.org
  Subject: RE: svn commit: r1551805 -
  /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
 
 
 
 
 
 
 
  Hello cTAKES Developer Community,
   I'm a little behind on reading posts...this one is from last month.
  I think this issue is already addressed in current release? I'm still
  running the previous release...3.1.0.
  I just noticed something interesting, the negation didn't take when it
  is on a different line.  I just removed all carriage returns from
  narratives and negation picked it up as long as it's treated as one long
  string.  To better explain what I mean.  Two narrative comments below.
 
  1.  patient did not have diabetes
  2. patient did not have
  diabetes
 
  Number 1 above got negated but number 2 did not. This might be related
  to the issue w/the sectionizer.  I noticed that when I treated the
  narrative as one string the sectionizer never crashes with the NPE.
  Well the sectionizer is of no point if narrative is as one string but
  it's helping me pinpoint the problem.
 
  Regards,
  Paula
 
 
   Date: Thu, 19 Dec 2013 11:04:57 -0500
   Subject: Re: FW: svn commit: r1551805 -
  /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
   From: vnga...@gmail.com
   To: dev@ctakes.apache.org
  
   Hi Pei,
  
   I'm not sure if that would solve the problem: the change in the ytex
   branch causes newlines to be ignored (i.e. not treated as a token).
   trunk's sentence splitter splits sentences on newlines, so newlines
   would never be found in a sentence.  However, if we had a reproducer
   we could check it fairly easily in the ytex branch.
  
   Best,
  
   VJ
  
  
   On Thu, Dec 19, 2013 at 10:15 AM, Chen, Pei
   pei.c...@childrens.harvard.edu wrote:
  
Vj,
Do you think this is what was causing the NPE's [1]?
If so, shall we make the same fix in trunk?
--Pei
   
[1]
   
  http://mail-archives.apache.org/mod_mbox/ctakes-dev/201309.mbox/%3C924DE05C19409B438EB81DE683A942D9105A93CB%40CHEXMBX1A.CHBOSTON.ORG%3E
   
-Original Message-
From: vjapa...@apache.org [mailto:vjapa...@apache.org]
Sent: Tuesday, December 17, 2013 9:15 PM
To: comm...@ctakes.apache.org
Subject: svn commit: r1551805 -
   
  /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
   
Author: vjapache
Date: Wed Dec 18 02:14:13 2013
New Revision: 1551805
   
URL

Re: YTEX cTAKES 3.1.1 ready

2014-01-08 Thread vijay garla
 sisEngineFactory_impl.java:94)
 at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
 at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
 at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:387)
 at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:254)
 at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:431)
 at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:375)
 at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:185)
 at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
 at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
 at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
 at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:354)
 at org.apache.uima.tools.cvd.MainFrame.setupAE(MainFrame.java:1484)
 at org.apache.uima.tools.cvd.MainFrame.loadAEDescriptor(MainFrame.java:477)
 at org.apache.uima.tools.cvd.control.AnnotatorOpenEventHandler.actionPerformed(AnnotatorOpenEventHandler.java:52)
 at javax.swing.AbstractButton.fireActionPerformed(Unknown Source)
 at javax.swing.AbstractButton$Handler.actionPerformed(Unknown Source)
 at javax.swing.DefaultButtonModel.fireActionPerformed(Unknown Source)
 at javax.swing.DefaultButtonModel.setPressed(Unknown Source)
 at javax.swing.AbstractButton.doClick(Unknown Source)
 at javax.swing.plaf.basic.BasicMenuItemUI.doClick(Unknown Source)
 at javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(Unknown Source)
 at java.awt.Component.processMouseEvent(Unknown Source)
 at javax.swing.JComponent.processMouseEvent(Unknown Source)
 at java.awt.Component.processEvent(Unknown Source)
 at java.awt.Container.processEvent(Unknown Source)
 at java.awt.Component.dispatchEventImpl(Unknown Source)
 at java.awt.Container.dispatchEventImpl(Unknown Source)
 at java.awt.Component.dispatchEvent(Unknown Source)
 at java.awt.LightweightDispatcher.retargetMouseEvent(Unknown Source)
 at java.awt.LightweightDispatcher.processMouseEvent(Unknown Source)
 at java.awt.LightweightDispatcher.dispatchEvent(Unknown Source)
 at java.awt.Container.dispatchEventImpl(Unknown Source)
 at java.awt.Window.dispatchEventImpl(Unknown Source)
 at java.awt.Component.dispatchEvent(Unknown Source)
 at java.awt.EventQueue.dispatchEventImpl(Unknown Source)
 at java.awt.EventQueue.access$200(Unknown Source)
 at java.awt.EventQueue$3.run(Unknown Source)
 at java.awt.EventQueue$3.run(Unknown Source)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.security.ProtectionDomain$1.doIntersectionPrivilege(Unknown Source)
 at java.security.ProtectionDomain$1.doIntersectionPrivilege(Unknown Source)
 at java.awt.EventQueue$4.run(Unknown Source)
 at java.awt.EventQueue$4.run(Unknown Source)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.security.ProtectionDomain$1.doIntersectionPrivilege(Unknown Source)
 at java.awt.EventQueue.dispatchEvent(Unknown Source)
 at java.awt.EventDispatchThread.pumpOneEventForFilters(Unknown Source)
 at java.awt.EventDispatchThread.pumpEventsForFilter(Unknown Source)
 at java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source)
 at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
 at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
 at java.awt.EventDispatchThread.run(Unknown Source)



 On Saturday, January 4, 2014 9:06:52 AM UTC+5:45, vijay garla wrote:

 Hello All,

 I have finished an initial cut at the port of YTEX to cTAKES 3.1.1.  Most
 of the YTEX functionality has been ported and integrated with cTAKES, and
 I've tested with MySQL and MS SQL Server (oracle tests pending).

 Most of the changes were made in new projects - very little existing
 cTAKES code has been modified.  The only non-trivial changes are
 in 
 /ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api
 - here I modified CharacterOffsetToLineTokenConverterCtakesImpl &
 SingleDocumentProcessorCtakes to deal with newlines within sentences
 correctly.  Can somebody take a look at the changes in the ytex branch?

 I believe that the branch
 https://svn.apache.org/repos/asf/ctakes/branches/ytex is ready

Re: YTEX cTAKES 3.1.1 ready

2014-01-07 Thread vijay garla
see answers inline


On Tue, Jan 7, 2014 at 10:35 AM, Chen, Pei
pei.c...@childrens.harvard.edu wrote:

  * How can I distribute the ctakes binary distribution to ytex users before
  the merge? Can we make the branch build available somewhere?  The binary
  distribution is too large to host on the ytex google code site (max 200 MB)
 Is this for testing purposes?  Or official release? If it's just for
 testing, there will be more options...
 Are you referring to the convenience binary/zip file?  Or maven artifacts
 that could be deployed to the SNAPSHOTS repo [1]?
 If it's for testing, you can always have users build from source via mvn
 package (assuming you added the ytex* to the ctakes-distribution module)?
 Again if it's for testing, you can always try the svn or home dir.  But
 it's not the recommended channel for actual distribution to users because
 that normally has to go through the normal release process (Voting, etc.).


This is for testing.  Ytex has been added to the ctakes distro



  * Non-ASF libraries - I have segregated these out into their own zip file
  that can be distributed via sourceforge.  As a stopgap, I can upload this to
  the ytex google code site, but would prefer to upload to sourceforge.
 Are these optional 3rd party libs available via maven central?


Most of them are.  The only exception is the MS SQL Driver, which is freely
redistributable (see http://msdn.microsoft.com/en-us/sqlserver/aa937725).
 I did not find anything similar for the oracle jdbc driver so I left that
out (users will have to download that separately).

The zip is here:
https://ytex.googlecode.com/files/ctakes-ytex-lib-3.1.2-SNAPSHOT.zip



  * UMLS Derivatives - Ditto for these - would like to move to sourceforge.
 Are you planning to distribute them via maven central?  I think it would
 be nice to make these available as maven artifacts.
 If so, what is your sourceforge id? We can grant you access to the
 existing ctakes resources project [2]:
 The pom.xml is already setup to upload to OSS Sonatype (request a login
 for oss sonatype to perform a mvn deploy for the actual upload later on)...


I have placed the umls resources behind a server that requires UTS
authentication (note that this obviates the need for supplying umls
username and password in ctakes config files/scripts).

The umls resources are here:
http://www.ytex-nlp.org/umls.download/secure/3.1/ctakes-ytex-resources-3.1.2-SNAPSHOT.zip

This is a plain old apache http server with the module for CAS (the other
CAS) authentication.  If ctakes has an apache server somewhere, we could do
the same.



  * Documentation - How can I update the confluence docs?  I would migrate
  the documentation from the google code website.
 This would be great; You've been added to the cTAKES confluence space [3].

 Downloading the code now... To be continued...

 [1]
 https://repository.apache.org/content/groups/snapshots/org/apache/ctakes/
 [2] http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/
 [3] https://cwiki.apache.org/confluence/display/CTAKES/cTAKES

  -Original Message-
  From: vijay garla [mailto:vnga...@gmail.com]
  Sent: Friday, January 03, 2014 10:23 PM
  To: ytex-us...@googlegroups.com; ctakes-...@incubator.apache.org
  Subject: YTEX cTAKES 3.1.1 ready
 
  Hello All,
 
  I have finished an initial cut at the port of YTEX to cTAKES 3.1.1.  Most
  of the YTEX functionality has been ported and integrated with cTAKES, and
  I've tested with MySQL and MS SQL Server (oracle tests pending).
 
  Most of the changes were made in new projects - very little existing
  cTAKES code has been modified.  The only non-trivial changes are in
  /ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api
  - here I modified CharacterOffsetToLineTokenConverterCtakesImpl &
  SingleDocumentProcessorCtakes to deal with newlines within sentences
  correctly.  Can somebody take a look at the changes in the ytex branch?
 
  I believe that the branch
  https://svn.apache.org/repos/asf/ctakes/branches/ytex is ready to be
  merged into ctakes trunk, but would like other users to test it as well.
   Questions:
 
  * How can I distribute the ctakes binary distribution to ytex users before
  the merge? Can we make the branch build available somewhere?  The binary
  distribution is too large to host on the ytex google code site (max 200 MB)
  * Non-ASF libraries - I have segregated these out into their own zip file
  that can be distributed via sourceforge.  As a stopgap, I can upload this to
  the ytex google code site, but would prefer to upload to sourceforge.
  * UMLS Derivatives - Ditto for these - would like to move to sourceforge.
  * Documentation - How can I update the confluence docs?  I would migrate
  the documentation from the google code website.
 
  Here are the installation instructions (putting the wagon in front of the
  horse ...)
 
  https://code.google.com/p/ytex/wiki/Installation_cTAKES_3_1?ts=1388793998&updated

Re: ytex branch

2013-11-26 Thread vijay garla
Just adding fields

Best

Vj

On Tuesday, November 26, 2013, Chen, Pei wrote:

 Hi VJ,
 Sounds cool.  I guess once things are in the branch, we can start to take
 a look to see if it makes sense to incorporate them directly into existing
 ctakes modules or not?
 Just curious- were the type system changes mainly adding additional
 fields?  Just planning ahead especially for proposed type system changes...

 --Pei

  -Original Message-
  From: vijay garla [mailto:vnga...@gmail.com]
  Sent: Monday, November 25, 2013 5:07 PM
  To: ctakes-...@incubator.apache.org
  Subject: ytex branch
 
  Hello All,
 
  I'm close to done with the port of ytex to ctakes.  I would like to create
  a branch to commit the changes to for review by the ctakes elders and other
  developers.  I will be adding the following projects:
  * ctakes-ytex-res - resources
  * ctakes-ytex - no uima/ctakes dependencies - primarily semantic similarity
  code
  * ctakes-ytex-uima - ctakes annotators and pipeline configs
 
  I made very few changes to other ctakes modules, these include:
  * fixing spring version conflicts
  * treatment of newlines in various annotators
  * added properties to OntologyConcept type to support word sense
  disambiguation
 
  Any objections to a branch?
 
  The main thing left to do is packaging for the binary distro.
  * setup ant scripts: I think bin\scripts would be a good spot
  * adding to ctakes-resources download: I have the following to add:
  - delimited text file with lookup dictionary (similar to hsqldb for current
  dictionary lookup)
  - concept graphs for semantic similarity and WSD
  - libraries for jdbc drivers (mysql, oracle, sql server) and hibernate
  For the ctakes-resources additions, I can create a zip file to add to the
  ctakes-resources, and send it to somebody (I think it will be a bit big to
  attach to a ticket, and the whole point is not to have non-asf compliant
  stuff lurking around apache)
 
  TIA,
 
  VJ



Re: using umls dictionary lookup offline?

2013-11-18 Thread vijay garla
No need to write any new annotator, just point
org.apache.ctakes.dictionary.lookup.ae.DictionaryLookupAnnotator at the
UMLS HSQL DB.

The UMLS password check is a very weak chastity belt.  For ytex, we make
umls resources available via a UTS password protected website - to get the
resources the user has to enter their UTS username/password which ensures
that the user has accepted the UMLS licenses.  I think making ctakes
resources available via a similar method would be more elegant.

-vj


On Fri, Nov 15, 2013 at 8:50 AM, Chen, Pei
pei.c...@childrens.harvard.edu wrote:

 Hi Matt,
 The license validation is ultimately the responsibility of the user.
 So if you ensure you have the license, technically, what you can do is
 just write an Annotator that just uses those resources directly :).
 --Pei

  -Original Message-
  From: Coarr, Matt [mailto:mco...@mitre.org]
  Sent: Thursday, November 14, 2013 9:37 PM
  To: dev@ctakes.apache.org
  Subject: using umls dictionary lookup offline?
 
  What would I need to do to run ctakes offline? In particular, I believe
  that means what do I need to do to get the full dictionary lookup to work?
 
  Do I need to build my own UMLS dictionary database and use a custom
  dictionary class?
 
  I looked around on the wiki (in the ctakes 3.0 and 3.1 developer guides
  and the ctakes 3.0 component guide).
 
  I know way back when we used to have this before we simplified things by
  pre-packaging the UMLS dictionary and doing online validation of the UMLS
  username and password.
 
  Thanks!
  Matt
 




ytex ctakes patches

2013-10-28 Thread vijay garla
For the YTEX port, I've taken a few baby steps ... I've filed some jira
tickets with patches:
CTAKES-253 (https://issues.apache.org/jira/browse/CTAKES-253)
and CTAKES-252 (https://issues.apache.org/jira/browse/CTAKES-252), more
coming soon.

I have a question regarding testing: it seems to me that the old analysis
engines all use xml descriptors, whereas the newer analysis engines appear
to be using uimafit.  I understand why that's the case, but the dissonance
between the development & user directory structures makes it very difficult
to write portable tests and portable xml-based ae configs: for a 'user'
install, everything under desc is in the classpath, whereas for the
developer install, none of the desc directories are in the classpath.

When I'm writing an XML-based aggregate AE config, I prefer to import
delegate AEs by name instead of location as resolving files by classpath is
much more flexible than resolution by file paths.  Can we add the desc
directories to the maven-surefire-plugin classpath (as is done with
resources) so that the classpath is consistent across developer/user
installs?
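
To illustrate why resolving by name is more flexible, here is a minimal, self-contained Java sketch. It is not cTAKES or UIMA code: the class name DescriptorLookup, the temporary "desc" directory and the file name SampleAE.xml are all made up for the example. It only shows the general mechanism: a lookup by name succeeds wherever the directory lives, as long as that directory is on the classpath being searched.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class DescriptorLookup {

    /**
     * Resolve a descriptor by name against the given classpath roots.
     * Returns null when no root contains a resource with that name.
     */
    public static URL resolveByName(String name, Path... classpathRoots) throws IOException {
        URL[] urls = new URL[classpathRoots.length];
        for (int i = 0; i < classpathRoots.length; i++) {
            // For an existing directory, toUri() ends with '/', which
            // URLClassLoader needs in order to treat the URL as a directory.
            urls[i] = classpathRoots[i].toUri().toURL();
        }
        // Parent is null, so only the given roots (plus the JDK) are searched.
        try (URLClassLoader loader = new URLClassLoader(urls, null)) {
            return loader.getResource(name);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical layout: a "desc" directory holding one descriptor file.
        Path desc = Files.createTempDirectory("desc");
        Path file = desc.resolve("SampleAE.xml");
        Files.write(file, "<analysisEngineDescription/>".getBytes(StandardCharsets.UTF_8));

        // By location: the caller has to know where desc lives on this machine.
        System.out.println("by location: " + Files.exists(file));

        // By name: found only if desc is among the classpath roots.
        System.out.println("desc on classpath: "
                + (resolveByName("SampleAE.xml", desc) != null));
        System.out.println("desc not on classpath: "
                + (resolveByName("SampleAE.xml") != null));
    }
}
```

The last two lookups differ only in whether the directory is on the search path, which is the inconsistency between the user and developer layouts described above.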

TIA,

VJ


Re: move ytex annotators to ctakes.apache.org?

2013-10-21 Thread vijay garla
Hello All,

I've started on the ytex-ctakes port, and have some packaging questions.

* Hibernate & Weka & JDBC Driver (SQL Server, Oracle) dependencies:
I understand that we will not ship these jars as part of the ctakes
download.  Can we bundle the jars and ship them as part of an additional
download, available via sourceforge?  Hibernate is available via maven
central, weka and jdbc are not.  I have added weka & jdbc drivers as system
dependencies.  I'm not sure how you collect all the dependencies for
shipment, but how do I tell maven not to include these?  Is it OK to check
weka & jdbc into source control?

* desc vs project-res
What are the guidelines for what goes where?  Configuration files are found
in both places, whereas data/models are in the -res directory.  Ytex has
many non-uima config files (hibernate, spring) which should be
user-modifiable, and I would put them in the desc directory.  However, desc
is not in the project classpath (but it is in the classpath for the ctakes
distro, e.g. in runctakesCPE.bat).  Any reason for this dissonance?  I
would add desc as a resources directory in the pom.

* distribution of umls concept graphs
for semantic similarity and word sense disambiguation, ytex provides
concept graphs derived from the UMLS.  We have a download site that
requires UTS login to get these concept graphs (
http://www.ytex-nlp.org/umls.download/secure/0.7/umls.zip).  I take it I
would just create a -res directory and add the concept graphs here, and
they would automagically appear in the ctakes-resources zip?

* patches to other ctakes projects
ytex has some patches to other ctakes annotators for handling edge cases
where they fail with an exception; I will check to see if these changes
have already been made.  If not, I will file separate Jira tickets for
these patches.  Also, the CharacterOffsetToLineTokenConverterCtakesImpl
needs to be modified to properly handle cases where newlines are in
sentences; I will add a patch for that as well.

* post download setup
ytex provides an ant script to simplify the post download setup (database
schema, setup, configuration file generation).  Would it be possible to
ship ant with the ctakes distro, so that users can execute these scripts?
 If not, how best to automate setup?  I know from experience with earlier
versions of ytex that setting up the database schema is error prone, and
that this needs to be automated.


I was planning on creating the following projects:
* ctakes-ytex:
Base ytex, includes semantic similarity tools.  This has no dependencies on
ctakes, and I would create a separate distribution of just this package for
a semantic similarity distro.
* ctakes-ytex-res
Includes concept graphs for semantic similarity.
* ctakes-ytex-web
Provides User Interface, RESTful, and WebServices interface to semantic
similarity service.  This has no dependencies on ctakes, and this would be
included in the semantic similarity distro.
* ctakes-ytex-uima
Includes ytex analysis engines
* ctakes-ytex-uima-res
resources for ytex analysis engines

Alternatively, I can add ctakes-ytex-uima and ctakes-ytex-uima-res to
existing projects (don't know where they would fit).

Best,

Vijay




On Thu, Oct 3, 2013 at 7:06 PM, vijay garla vnga...@gmail.com wrote:

 Hi Pei,

 The WSD annotator relies on the semantic similarity component, which
 is a general purpose tool not strictly limited to ctakes or NLP.  I
 would like to keep the semantic similarity component 'standalone',
 i.e. with no dependencies on ctakes, and make it  redistributable on
 its own.  If that is possible as part of ctakes, I'd love to move it.
 If not, I'd leave the semantic similarity and the associated WSD
 annotator on google code.

 For those of you who want the back story:
 http://www.biomedcentral.com/1471-2105/13/261
 http://jamia.bmj.com/content/20/5/882.long


 -vj

 On Thu, Oct 3, 2013 at 5:13 PM, Chen, Pei
 pei.c...@childrens.harvard.edu wrote:
  vj,
  Were you thinking of contributing the new ytex Word Sense
  Disambiguation component as well- I think that will be really cool.
  --Pei
 
  -Original Message-
  From: ksa...@gmail.com [mailto:ksa...@gmail.com] On Behalf Of Karthik
  Sarma
  Sent: Thursday, October 03, 2013 1:05 PM
  To: dev@ctakes.apache.org
  Subject: Re: move ytex annotators to ctakes.apache.org?
 
  This would be quite valuable -- in particular, ytex's annotation database
  connection is much easier to use than what ships with cTAKES. There are a
  fair number of other advantages, and I think they'd all be very valuable!
 
 
 
 
 
  --
  Karthik Sarma
  UCLA Medical Scientist Training Program Class of 20??
  Member, UCLA Medical Imaging  Informatics Lab Member, CA Delegation
  to the House of Delegates of the American Medical Association
  ksa...@ksarma.com
  gchat: ksa...@gmail.com
  linkedin: www.linkedin.com/in/ksarma
 
 
  On Thu, Oct 3, 2013 at 5:50 AM, vijay garla vnga...@gmail.com wrote:
 
   Hello All,
  
   I'd like to contribute ytex to ctakes

move ytex annotators to ctakes.apache.org?

2013-10-03 Thread vijay garla
Hello All,

I'd like to contribute ytex to ctakes.  YTEX's main feature is the
ability to store *any* ctakes (or uima) annotation in a relational
database (in a relational format), and the ability to export these
annotations to ML packages (weka, libsvm, matlab, R).  All of this is
purely declarative/via configuration.

In addition, Ytex provides the following:
* Negation Detection with Negex
* SegmentRegexAnnotator - section detection with regular expressions
* NamedEntityRegexAnnotator - named entity detection with regular expressions
* Sentence Splitter - modified ctakes sentence splitter making
sentence split patterns configurable (not hardcoded to \n)
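
As a rough illustration of section detection with regular expressions, here is a small Java sketch of the general technique. It is not the actual SegmentRegexAnnotator code: the class name SectionSketch, the segment ids and the heading patterns are invented for the example. Each segment id maps to a regex that matches its heading, and a section is assumed to run from its heading to the next heading or to the end of the document.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SectionSketch {

    /** A detected section: its id and the character span [begin, end). */
    public static final class Section {
        public final String id;
        public final int begin;
        public final int end;

        Section(String id, int begin, int end) {
            this.id = id;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Find headings with the given patterns and turn them into section spans. */
    public static List<Section> detect(String text, Map<String, Pattern> headings) {
        // Collect every heading match, then order the matches by offset.
        List<Section> starts = new ArrayList<>();
        for (Map.Entry<String, Pattern> e : headings.entrySet()) {
            Matcher m = e.getValue().matcher(text);
            while (m.find()) {
                starts.add(new Section(e.getKey(), m.start(), m.end()));
            }
        }
        starts.sort((a, b) -> Integer.compare(a.begin, b.begin));

        // A section extends from its heading to the next heading (or end of text).
        List<Section> sections = new ArrayList<>();
        for (int i = 0; i < starts.size(); i++) {
            int end = (i + 1 < starts.size()) ? starts.get(i + 1).begin : text.length();
            sections.add(new Section(starts.get(i).id, starts.get(i).begin, end));
        }
        return sections;
    }

    public static void main(String[] args) {
        // Hypothetical segment ids and heading patterns.
        Map<String, Pattern> headings = new LinkedHashMap<>();
        headings.put("HISTORY", Pattern.compile("(?im)^history of present illness:"));
        headings.put("MEDICATIONS", Pattern.compile("(?im)^medications:"));

        String note = "History of Present Illness:\nChest pain for two days.\n"
                + "Medications:\nAspirin 81 mg daily.\n";
        for (Section s : detect(note, headings)) {
            System.out.println(s.id + " [" + s.begin + ", " + s.end + ")");
        }
    }
}
```

Keeping the patterns in a map rather than in code mirrors the "via configuration" idea: the list of segments could be loaded from a file or a table without touching the detection logic.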

YTEX currently works with ctakes 2.5; I would like to upgrade it to
the latest ctakes, and if the community is interested, contribute to
ctakes.apache.org.

A licensing question: YTEX uses Spring (apache 2.0 license), Hibernate
(lgpl 2.1),  weka (gpl).  Are there any issues with including these?

Cheers

vj


Re: move ytex annotators to ctakes.apache.org?

2013-10-03 Thread vijay garla
Hi Pei,

The WSD annotator relies on the semantic similarity component, which
is a general purpose tool not strictly limited to ctakes or NLP.  I
would like to keep the semantic similarity component 'standalone',
i.e. with no dependencies on ctakes, and make it  redistributable on
its own.  If that is possible as part of ctakes, I'd love to move it.
If not, I'd leave the semantic similarity and the associated WSD
annotator on google code.

For those of you who want the back story:
http://www.biomedcentral.com/1471-2105/13/261
http://jamia.bmj.com/content/20/5/882.long


-vj

On Thu, Oct 3, 2013 at 5:13 PM, Chen, Pei
pei.c...@childrens.harvard.edu wrote:
 vj,
 Were you thinking of contributing the new ytex Word Sense Disambiguation
 component as well- I think that will be really cool.
 --Pei

 -Original Message-
 From: ksa...@gmail.com [mailto:ksa...@gmail.com] On Behalf Of Karthik
 Sarma
 Sent: Thursday, October 03, 2013 1:05 PM
 To: dev@ctakes.apache.org
 Subject: Re: move ytex annotators to ctakes.apache.org?

 This would be quite valuable -- in particular, ytex's annotation database
 connection is much easier to use than what ships with cTAKES. There are a
 fair number of other advantages, and I think they'd all be very valuable!





 --
 Karthik Sarma
 UCLA Medical Scientist Training Program Class of 20??
 Member, UCLA Medical Imaging  Informatics Lab Member, CA Delegation
 to the House of Delegates of the American Medical Association
 ksa...@ksarma.com
 gchat: ksa...@gmail.com
 linkedin: www.linkedin.com/in/ksarma


 On Thu, Oct 3, 2013 at 5:50 AM, vijay garla vnga...@gmail.com wrote:

  Hello All,
 
  I'd like to contribute ytex to ctakes.  YTEX's main feature is the
  ability to store *any* ctakes (or uima) annotation in a relational
  database (in a relational format), and the ability to export these
  annotations to ML packages (weka, libsvm, matlab, R).  All of this is
  purely declarative/via configuration.
 
  In addition, Ytex provides the following:
  * Negation Detection with Negex
  * SegmentRegexAnnotator - section detection with regular expressions
  * NamedEntityRegexAnnotator - named entity detection with regular
  expressions
  * Sentence Splitter - modified ctakes sentence splitter making
  sentence split patterns configurable (not hardcoded to \n)
 
  YTEX currently works with ctakes 2.5; I would like to upgrade it to
  the latest ctakes, and if the community is interested, contribute to
  ctakes.apache.org.
 
  A licensing question: YTEX uses Spring (apache 2.0 license), Hibernate
  (lgpl 2.1),  weka (gpl).  Are there any issues with including these?
 
  Cheers
 
  vj
 


Re: Next cTAKES release (3.1)?

2013-06-27 Thread vijay garla
We released code on using cTAKES to annotate clinical text and SVMs that
use the annotations to classify clinical text from the CMC 2007 and I2B2
2008 challenges:

We did the CMC 2007 with cTAKES 2.5:
https://code.google.com/p/ytex/wiki/WordSenseDisambiguation_V08#Reproducing_results_on_CMC_2007_challenge
https://code.google.com/p/ytex/downloads/list


And the i2b2 2008 with the version of cTAKES distributed with the first
version of ARC:
https://code.google.com/p/ytex/wiki/FeatEng_V05#i2b2_2008

These are both publicly available datasets, and represent real-world
problems (in general I believe when publishing a paper the code should be
reproducible and made publicly available, but that's a different issue).

When we get around to upgrading YTEX to cTAKES 3.1, we would like to
upgrade these samples as well.

Best,

VJ



On Thu, Jun 27, 2013 at 8:32 PM, Andy McMurry mcmurry.a...@gmail.com wrote:

 +1 suggestion for documenting many examples of getting started NLP
 datasets.

 I have at least one we can use that was created by our lead Pathologist

 https://open.med.harvard.edu/svn/scrubber/releases/3.0/data/input/cases/train/traincase.xml

 We should provide at least one sample for each domain.
 Trouble is, privacy requires that these examples be made up by hand and
 not copy-pasted from EMR systems.

 --Andy

 On Jun 27, 2013, at 5:32 PM, Girivaraprasad Nambari girinamb...@gmail.com
 wrote:

  +1 for this observation Andy!
 
  Lowering the time will motivate users to write blogs about features, how-tos,
  etc., which reduces the core team's workload on documentation.
 
  I have been trying to write a small how-to on writing a standalone client for
  ctakes from my experience (I saw at least 4 users post similar questions
  in the last 2 months), but I am not getting enough time because ctakes depends
  on a lot of other frameworks (UimaFit, cleartk, UIMA Framework, etc.); most of
  my spare time is spent juggling between these frameworks, posting
  and browsing those forums, and relating observations to ctakes code. I think
  we need to have some high-level documentation about these (with links to
  corresponding forums).
 
  The above case is for developers (I think this will be the larger user base as
  ctakes progresses); for users I think the documentation is a lot better, though
  some improvements need to be done.
 
  As a developer I found the lack of sample training data tough (I am still
  struggling in this area even though I browsed all the relevant code), though
  the training classes are there. I understand that there are licensing issues
  with REAL data, but at least some hand-made example sentences, which may not
  be real, would help developers understand the type/structure of input the
  TRAINING classes are expecting. This way people who browse the code can
  reverse engineer and develop their own models. Sorry if you guys feel this is
  a novice issue, but I feel most developers will be novices when they
  adopt a system, and Machine Learning/NLP is an ocean. Some documentation in
  this area will save a lot of time for us.
 
  I hope there will be some activity in this area from the ctakes core team.
 
  Thank you,
  Giri
 
 
 
  On Thu, Jun 27, 2013 at 5:11 PM, Andy McMurry mcmurry.a...@gmail.com
 wrote:
 
  ctakes is at a point where we have a LOT of features but it is still hard
  to get started.
 
  Judging from the mailing lists a lot of how cTakes works is not obvious
  and requires hand holding.
  This is very typical in early FOSS projects.
 
  Lowering the time to get invested in ctakes gets more users AND better bug
  reports, FAQ, etc.
 
  thoughts?
  --Andy
 
 
  On Apr 11, 2013, at 8:55 PM, Chen, Pei 
 pei.c...@childrens.harvard.edu
  wrote:
 
  Hi,
  I just wanted to gauge the interest of creating the next release of
  cTAKES (3.1) which is currently marked for May in Jira-
 
  There have already been 22/53 issues [1] marked as fixed or closed.
  Plenty of bug fixes and new components including:
  - New CEM Instance Template population
  - New Dependency Parser/Semantic Role Labeler
  - New optional Clear POSTagger
  - New regression testing component
 
  Should we wait for the Temporal component?
 
  [1]
 
 https://issues.apache.org/jira/issues/?jql=fixVersion%20%3D%20%223.1%22%20AND%20project%20%3D%20CTAKES