Re: The SegmentRegexAnnotator of Ytex
Can you make sure you did everything documented here: https://cwiki.apache.org/confluence/display/CTAKES/YTEX+Installation

I can see from the stack trace that hibernate is not in the classpath (see section 'Unzip YTEX Libraries').

Best, VJ

On Tue, Jul 14, 2015 at 2:41 AM, Oranit Dror ora...@algotec.co.il wrote:

Thank you, Vijay. However, I am still encountering the crash. Best, Oranit.

-Original Message- From: vijay garla [mailto:vnga...@gmail.com] Sent: Monday, July 13, 2015 5:53 PM To: dev@ctakes.apache.org Subject: Re: The SegmentRegexAnnotator of Ytex

see https://cwiki.apache.org/confluence/display/CTAKES/User%27s+Guide best, vj

On Mon, Jul 13, 2015 at 2:50 AM, Oranit Dror ora...@algotec.co.il wrote:

Hello, I am using cTAKES 3.2.2 and recently I tried to apply the YTEX pipeline. In particular, I am interested in the SegmentRegexAnnotator of YTEX. My questions are:

1. When running the pipeline, an org.apache.uima.resource.ResourceInitializationException is thrown, probably due to a failure in the initialization of org.apache.ctakes.ytex.uima.annotators.SegmentRegexAnnotator. Below is the stack trace.
2. Where can I find information on how the SegmentRegexAnnotator works, especially where the list of segments is defined?

Thank you, Oranit.

The stack trace for the YTEX pipeline crash:

12 Jul 2015 09:47:52 ERROR RunEngine - Failed to create AE from xml descriptor :E:/Data/Views/oranit_nlp/subprod1/nlp/java/algotec-nlp/desc/desc/algotec-nlp/desc/analysis_engine/AggregateDiseaseYtexUMLSProcessorDescriptor.xml org.apache.uima.resource.ResourceInitializationException: Initialization of annotator class org.apache.ctakes.ytex.uima.annotators.SegmentRegexAnnotator failed. 
(Descriptor: file:/E:/Program Files/apache-ctakes-3.2.2-rc2/desc/ctakes-ytex-uima/desc/analysis_engine/SegmentRegexAnnotator.xml) at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:252) at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:156) at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94) at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62) at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269) at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:387) at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:254) at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:431) at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:375) at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:185) at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94) at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62) at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269) at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:387) at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:254) at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:431) at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:375) at 
org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:185) at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94) at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62) at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269) at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:354) at com.algotec.nlp.RunEngine.createCasObjects(RunEngine.java:1399) at com.algotec.nlp.RunEngine.ensureCasObjects(RunEngine.java:1373) at com.algotec.nlp.RunEngine.analyze(RunEngine.java:954) at com.algotec.nlp.servlet.ReportNLPServlet.doPost(ReportNLPServlet.java:128) at com.algotec.nlp.servlet.ReportNLPServlet.doPost(ReportNLPServlet.java:103) at javax.servlet.http.HttpServlet.service(HttpServlet.java:647) at javax.servlet.http.HttpServlet.service(HttpServlet.java
Re: The SegmentRegexAnnotator of Ytex
see https://cwiki.apache.org/confluence/display/CTAKES/User%27s+Guide best, vj

On Mon, Jul 13, 2015 at 2:50 AM, Oranit Dror ora...@algotec.co.il wrote:

Hello, I am using cTAKES 3.2.2 and recently I tried to apply the YTEX pipeline. In particular, I am interested in the SegmentRegexAnnotator of YTEX. My questions are:

1. When running the pipeline, an org.apache.uima.resource.ResourceInitializationException is thrown, probably due to a failure in the initialization of org.apache.ctakes.ytex.uima.annotators.SegmentRegexAnnotator. Below is the stack trace.
2. Where can I find information on how the SegmentRegexAnnotator works, especially where the list of segments is defined?

Thank you, Oranit.

The stack trace for the YTEX pipeline crash:

12 Jul 2015 09:47:52 ERROR RunEngine - Failed to create AE from xml descriptor :E:/Data/Views/oranit_nlp/subprod1/nlp/java/algotec-nlp/desc/desc/algotec-nlp/desc/analysis_engine/AggregateDiseaseYtexUMLSProcessorDescriptor.xml org.apache.uima.resource.ResourceInitializationException: Initialization of annotator class org.apache.ctakes.ytex.uima.annotators.SegmentRegexAnnotator failed. 
(Descriptor: file:/E:/Program Files/apache-ctakes-3.2.2-rc2/desc/ctakes-ytex-uima/desc/analysis_engine/SegmentRegexAnnotator.xml) at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:252) at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:156) at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94) at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62) at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269) at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:387) at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:254) at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:431) at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:375) at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:185) at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94) at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62) at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269) at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:387) at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:254) at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:431) at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:375) at 
org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:185) at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94) at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62) at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269) at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:354) at com.algotec.nlp.RunEngine.createCasObjects(RunEngine.java:1399) at com.algotec.nlp.RunEngine.ensureCasObjects(RunEngine.java:1373) at com.algotec.nlp.RunEngine.analyze(RunEngine.java:954) at com.algotec.nlp.servlet.ReportNLPServlet.doPost(ReportNLPServlet.java:128) at com.algotec.nlp.servlet.ReportNLPServlet.doPost(ReportNLPServlet.java:103) at javax.servlet.http.HttpServlet.service(HttpServlet.java:647) at javax.servlet.http.HttpServlet.service(HttpServlet.java:728) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:51) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222) at
Re: Question about YTEX
You need to annotate some documents with a Collection Processing Engine that stores results in the YTEX database. I suggest walking through the fracture demo sample here: https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2.0+-+YTEX+DBCollectionReader

-vj

On Fri, Jun 12, 2015 at 6:46 PM, Tsung-Ting Kuo ts...@ucsd.edu wrote:

Hi Vijay, I would like to update my questions:

(1) *Auto-complete of concept IDs.* I realized that only concepts in the “v_snomed_fword_lookup” table are usable (e.g., I use “cough” and the auto-complete works as attached).
(2) *Clinical document searching.* Which table should I put the clinical documents in so that they are searchable from the “Semantic Search” function?
(3) *Fracture demo.* I saw there is a “fracture_demo” table in the YTEX database; how can I use it in the “Semantic Search” function?

Thanks very much! Best regards, Tim

*From:* Tsung-Ting Kuo [mailto:ts...@ucsd.edu] *Sent:* Thursday, June 11, 2015 11:10 AM *To:* 'Vijay Garla' *Cc:* dev@ctakes.apache.org *Subject:* RE: Question about YTEX

Thanks a lot, I followed the instructions, ran the “ytexweb.sh” script, and successfully started the web app! However, the search does not return any results (no matter what keyword I type), and the auto-complete function does not seem to work. Our experimental website is here: http://textmining.ucsd.edu:8080/semanticSearch.iface I have also attached a screenshot and the output of the “ytexweb.sh” script. Could you kindly check whether any setting or configuration needs to be modified in order to do the search? Thanks very much! Best regards, Tim

*From:* Vijay Garla [mailto:vijay.ga...@yale.edu] *Sent:* Wednesday, June 10, 2015 11:36 PM *To:* ts...@ucsd.edu *Cc:* Vijay Garla; dev@ctakes.apache.org *Subject:* Re: Question about YTEX

This is part of the Semantic Similarity Web App. 
There is a script to start it; see https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2.0+-+Semantic+Similarity - just run the ytexweb.sh script in the bin directory.

On Wed, Jun 10, 2015 at 11:32 PM, Tsung-Ting Kuo ts...@ucsd.edu wrote:

Thanks very much, this is really helpful! I can indeed see the “v_snomed_fword_lookup” table in my database! So my next question is: is there any documentation on how to set up the Clinical NLP Semantic Search Engine on Tomcat (this is actually the main reason I am seeking YTEX’s help)? I saw “ctakes-ytex-web-3.2.2-classes.jar” in the “lib” folder, but I am not sure how to deploy it as a web service. Best regards, Tim

*From:* Vijay Garla [mailto:vijay.ga...@yale.edu] *Sent:* Wednesday, June 10, 2015 12:10 PM *To:* ts...@ucsd.edu *Cc:* Vijay Garla; dev@ctakes.apache.org *Subject:* Re: Question about YTEX

You should see a v_snomed_fword_lookup table in your database. YTEX doesn't use all of UMLS. It filters to SNOMED-CT and RXNORM, and specific Semantic Types. The script it uses is here: https://svn.apache.org/repos/asf/ctakes/trunk/ctakes-ytex/scripts/data/mysql/umls/insert_view.template.sql You can create a dictionary with different vocabularies/semantic types very easily (just change the sql slightly). 
For more detail on how this works, see https://code.google.com/p/ytex/wiki/DictionaryLookup_V07 (docs slightly out of date). I have not tested this with the 'fast' dictionary lookup, but I think the main speed gain of the fast lookup is due to skipping an extra(neous) database query. HTH, VJ

On Wed, Jun 10, 2015 at 8:15 PM, Tsung-Ting Kuo ts...@ucsd.edu wrote:

BTW, my YTEX installation just completed, and the results are attached. Did my YTEX installation successfully create a dictionary lookup table with all concepts from the UMLS? Thanks very much! Best regards, Tim

*From:* Tsung-Ting Kuo [mailto:ts...@ucsd.edu] *Sent:* Wednesday, June 10, 2015 9:49 AM *To:* 'Vijay Garla'; dev@ctakes.apache.org *Subject:* RE: Question about YTEX

Hi Vijay, You are absolutely right – after changing the permissions of the “resource” directory, YTEX started installing without problems! In the meantime, I have a quick question: since I have installed the UMLS database and set “umls.schema” to the UMLS database name (I am using MySQL), how do I know
Re: Question about YTEX
This is part of the Semantic Similarity Web App. There is a script to start it; see https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2.0+-+Semantic+Similarity - just run the ytexweb.sh script in the bin directory.

On Wed, Jun 10, 2015 at 11:32 PM, Tsung-Ting Kuo ts...@ucsd.edu wrote:

Thanks very much, this is really helpful! I can indeed see the “v_snomed_fword_lookup” table in my database! So my next question is: is there any documentation on how to set up the Clinical NLP Semantic Search Engine on Tomcat (this is actually the main reason I am seeking YTEX’s help)? I saw “ctakes-ytex-web-3.2.2-classes.jar” in the “lib” folder, but I am not sure how to deploy it as a web service. Best regards, Tim

*From:* Vijay Garla [mailto:vijay.ga...@yale.edu] *Sent:* Wednesday, June 10, 2015 12:10 PM *To:* ts...@ucsd.edu *Cc:* Vijay Garla; dev@ctakes.apache.org *Subject:* Re: Question about YTEX

You should see a v_snomed_fword_lookup table in your database. YTEX doesn't use all of UMLS. It filters to SNOMED-CT and RXNORM, and specific Semantic Types. The script it uses is here: https://svn.apache.org/repos/asf/ctakes/trunk/ctakes-ytex/scripts/data/mysql/umls/insert_view.template.sql You can create a dictionary with different vocabularies/semantic types very easily (just change the sql slightly). 
For more detail on how this works, see https://code.google.com/p/ytex/wiki/DictionaryLookup_V07 (docs slightly out of date). I have not tested this with the 'fast' dictionary lookup, but I think the main speed gain of the fast lookup is due to skipping an extra(neous) database query. HTH, VJ

On Wed, Jun 10, 2015 at 8:15 PM, Tsung-Ting Kuo ts...@ucsd.edu wrote:

BTW, my YTEX installation just completed, and the results are attached. Did my YTEX installation successfully create a dictionary lookup table with all concepts from the UMLS? Thanks very much! Best regards, Tim

*From:* Tsung-Ting Kuo [mailto:ts...@ucsd.edu] *Sent:* Wednesday, June 10, 2015 9:49 AM *To:* 'Vijay Garla'; dev@ctakes.apache.org *Subject:* RE: Question about YTEX

Hi Vijay, You are absolutely right – after changing the permissions of the “resource” directory, YTEX started installing without problems! In the meantime, I have a quick question: since I have installed the UMLS database and set “umls.schema” to the UMLS database name (I am using MySQL), how do I know whether YTEX successfully created a dictionary lookup table with all concepts from the UMLS? Thanks again! 
Best regards, Tim

*From:* Vijay Garla [mailto:vijay.ga...@yale.edu] *Sent:* Tuesday, June 9, 2015 11:31 PM *To:* ts...@ucsd.edu; dev@ctakes.apache.org *Cc:* vijay.ga...@yale.edu *Subject:* Re: Question about YTEX

Hi Tsung-Ting, I see the following error:

templateToConfig.extractTemplates:
[echo] unpacking ytex templates from /usr/local/apache-ctakes-3.2.2/lib/ctakes-ytex-res-3.2.2.jar to /usr/local/apache-ctakes-3.2.2/resources
[unzip] Expanding: /usr/local/apache-ctakes-3.2.2/lib/ctakes-ytex-res-3.2.2.jar into /usr/local/apache-ctakes-3.2.2/resources
[unzip] Unable to expand to file /usr/local/apache-ctakes-3.2.2/resources/org/apache/ctakes/ytex/conceptGraph/sct-rxnorm.template.xml
[unzip] Unable to expand to file /usr/local/apache-ctakes-3.2.2/resources/org/apache/ctakes/ytex/conceptGraph/sct-msh-csp-aod.template.xml
[unzip] Unable to expand to file /usr/local/apache-ctakes-3.2.2/resources/org/apache/ctakes/ytex/conceptGraph/sct-umls.template.xml
[unzip] Unable to expand to file /usr/local/apache-ctakes-3.2.2/resources/org/apache/ctakes/ytex/umls/model/UMLS.hbm.template.xml
[unzip] Unable to expand to file /usr/local/apache-ctakes-3.2.2/resources/org/apache/ctakes/ytex/dictionary/lookup/LookupDesc_stem_SNOMED.template.xml
[unzip] Unable to expand to file /usr/local/apache-ctakes-3.2.2/resources/org/apache/ctakes/ytex/dictionary/lookup/LookupDesc_SNOMED.template.xml

I don't know why there was an issue extracting ctakes-ytex-res-3.2.2.jar. Can you make sure that /usr/local/apache-ctakes-3.2.2/resources exists and is writable? -vj

On Wed, Jun 10, 2015 at 12:17 AM, Tsung-Ting Kuo ts...@ucsd.edu wrote:

Hi Vijay Garla, I am Tsung-Ting Kuo from UCSD
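
The check Vijay asks for (does the resources directory exist, and is it writable?) can be scripted before rerunning the setup. The following is only a minimal sketch: the function name `check_extract_target` is hypothetical, and the path is the one shown in the log above, which may differ on other installs.

```python
import os


def check_extract_target(path):
    """Return a list of problems that would stop files being unpacked into path."""
    problems = []
    if not os.path.isdir(path):
        problems.append(f"{path} does not exist or is not a directory")
    elif not os.access(path, os.W_OK | os.X_OK):
        # Creating files inside a directory needs write and search permission.
        problems.append(f"{path} is not writable by the current user")
    return problems


if __name__ == "__main__":
    # Path copied from the log in this thread; adjust it for your install.
    target = "/usr/local/apache-ctakes-3.2.2/resources"
    for line in check_extract_target(target) or ["no problems found"]:
        print(line)
```

Run it as the same user that runs the YTEX setup script, since the permission check is per user.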
Re: cTakes polarity problem
As Guergana mentioned, cTAKES has a rule-based negation detection module. In addition, YTEX adds a NegEx-based analysis engine. Both approaches are very sensitive to sentence splitting (see previous threads on alternative sentence splitters). An additional advantage of rule-based negation is that you don't need some of the memory- and CPU-intensive analysis engines required by the ML-based negation detection AE. HTH, VJ

On Thursday, January 1, 2015, John Green john.travis.gr...@gmail.com wrote:

As I was reading this thread I had the same thought as Tim: perhaps a combination. It seems that with the perfect training corpus this wouldn't be necessary, but perhaps as a stopgap the ensemble approach could help some people who use your training data but work in a different corpus (not that I really have the time to write anything here, just spitballing because it's an interesting thread). I'm still bootstrapping myself in ML so I may not have followed David's reasoning perfectly, but couldn't a simple approach be that anything that isn't negated by the new algorithm gets passed to NegEx as a fallback? I think that was what you were saying, Tim. One area where I can comment in a more meaningful way is Tim's remarks regarding the legitimacy of the phrase "Deny hepatitis": I agree, my clinical intuition says it's an unlikely phrase. More probably it was a typo; "Negative for hepatitis" would be more reasonable after, say, serology for HepB markers, though strictly speaking this would be less likely to be in a phrase reporting results of just that specific test (this would more likely be something along the lines of "hep panel negative" or simply "the labs were unremarkable"). However, I could see this phrase in something like "the STD screen was negative for hep but positive for HIV". The latter is definitely just one clinical opinion; people talk all kinds of ways on the wards, good and bad, and it ends up in their notes too. 
Best, JG

On Wed, Dec 31, 2014 at 12:32 PM, David Kincaid kincaid.d...@gmail.com wrote:

Tim, I like your idea of a hybrid approach. I've thought about trying a hybrid approach in the past myself, but haven't had a chance to try it or seen any papers on it. It seems you could do it by either treating the NegEx output simply as a feature in the ML model or combining the output of NegEx and the ML model as an ensemble of sorts. The former would probably have the problem of the NegEx feature overwhelming any other features since it would be right most of the time. If I were doing it I think I'd start with the latter approach. In any event, it seems like right now people will need to see how the two systems (NegEx and ML) work on their particular data and go with whichever is best. - Dave

On Wed, Dec 31, 2014 at 10:40 AM, Miller, Timothy timothy.mil...@childrens.harvard.edu wrote:

Hi Michael, I'm somewhat sympathetic to that opinion. But we did a bunch of experiments and it seemed to us that negex was too hand-tailored for a specific dataset and that our new module did better across datasets and overall. The tradeoff is that it is harder to improve and it sometimes gives unexpected results on the kind of inputs people input by hand for preliminary testing. That is a tradeoff people will have to consider and like Guergana said, the rule-based module is still part of cTAKES. (FWIW, I believe it is possible to engineer examples that make Negex fail in unintuitive ways as well.) If you are interested in these experiments please check out our paper in PLOS ONE where we look at the difficulty of the polarity problem, specifically porting systems to new domains: http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0112774 I've been wondering if some hybrid approach might be useful. For example, maybe a system that runs the ML module and Negex and adds in all the recalled negated terms that Negex finds over and above the ML. 
This would probably fix some of the issues with test sentences but does not solve the problem of being hard to debug. Another possibility is using a more transparent ML method like decision trees or something. Tim

On 12/31/2014 11:22 AM, Michael J Gurley wrote:

I think this demonstrates that machine learning is not the right approach to the negation/polarity problem. Michael Gurley m-gur...@northwestern.edu 312 925 3268 Northwestern University Clinical and Translational Sciences Institute (NUCATS) http://www.nucats.northwestern.edu Rubloff Building 750 N Lake Shore Drive, 11th Floor Chicago, IL 60611

On 12/31/14 9:13 AM, Miller, Timothy timothy.mil...@childrens.harvard.edu wrote:

Hi Yu, The new polarity module is machine-learning based so it is not always easy to
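
The hybrid idea discussed in this thread (keep the ML module's negations and add whatever NegEx finds over and above them) amounts to a union of the two outputs. The sketch below is not cTAKES or YTEX code: the function name `hybrid_negated` is hypothetical, and mentions are represented as plain strings purely for illustration, whereas a real pipeline would work on annotation spans.

```python
def hybrid_negated(mentions, ml_negated, negex_negated):
    """Map each mention to the system that negated it, or None if neither did.

    A mention counts as negated if either system flags it; the ML result is
    checked first, and NegEx only acts as a fallback.
    """
    ml = set(ml_negated)
    negex = set(negex_negated)
    result = {}
    for mention in mentions:
        if mention in ml:
            result[mention] = "ml"
        elif mention in negex:
            result[mention] = "negex"
        else:
            result[mention] = None
    return result


if __name__ == "__main__":
    # Made-up mentions and system outputs, for illustration only.
    mentions = ["hepatitis", "cough", "fever"]
    print(hybrid_negated(mentions, ml_negated=["cough"],
                         negex_negated=["hepatitis", "cough"]))
```

Recording which system produced each negation keeps the result easier to debug, which is one of the concerns raised above; the alternative David mentions, feeding the NegEx output into the ML model as a feature, would need retraining and is not shown here.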
Re: YTEX semantic similarity concept graph questions
I don't know what the difference between PAR/CHD (parent/child) and RB/RN (broader/narrower) is supposed to be. Some UMLS source vocabularies use PAR/CHD only or predominantly (e.g. SNOMED-CT), others use RB/RN (e.g. RXNORM). You can use and experiment with whatever relationships you want (I think there might be part-of/contains relationships too). The concept graph is a directed acyclic graph, and the query should return parent-child edges (or maybe the other way around, not sure). If your query uses e.g. rel in ('PAR', 'CHD'), you will return edges going in both directions. This shouldn't cause any problems, as we discard edges that induce cycles, but it will create a bunch of overhead for no gain. If you look at other concept graph configs, e.g. https://svn.apache.org/repos/asf/ctakes/trunk/ctakes-ytex-res/src/main/resources/org/apache/ctakes/ytex/conceptGraph/sct-rxnorm.template.xml, you will see that we use both PAR and RB relationships. HTH, VJ

On Thu, Oct 16, 2014 at 2:58 AM, John Green john.travis.gr...@gmail.com wrote:

Hope this finds everyone well. It is not immediately clear to me why

select distinct cui1, cui2 from umls.MRREL where sab in ('SNOMEDCT') and rel in ('PAR') order by cui1, cui2

would only be selecting the relationship (REL) of PAR. I'm not sure what the selection criteria are. This is honestly probably directed mostly at Vijay, but anyone else with experience in this domain would be a welcome voice. In the paper on YTEX, for instance, PAR and RB are chosen for UMLS. Why? Does this have to do with the flattening or orphaning that UMLS does to the vocabularies it includes? Why not PAR, RB, and RN? Why not more? Was this a computational (speed/memory) consideration, or a functional one that my lack of familiarity with the domain is keeping me from seeing? I'm posting this fairly specific question to the dev list because it directly relates to building YTEX concept graphs, which is a functionality of our distro here. Best! JG
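
To illustrate why querying both directions (e.g. PAR and CHD) only adds overhead, here is a rough sketch of building a directed acyclic graph while discarding edges that would induce a cycle. This is not the YTEX implementation; the function name `build_dag` is hypothetical, the node labels are placeholders rather than real CUIs, and which edge gets dropped depends on the order in which edges arrive.

```python
from collections import defaultdict


def build_dag(edges):
    """Add (src, dst) edges in order, skipping any edge that would close a cycle."""
    children = defaultdict(set)
    kept, dropped = [], []

    def reaches(start, goal):
        # Iterative depth-first search over the edges kept so far.
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(children[node])
        return False

    for src, dst in edges:
        if reaches(dst, src):
            # dst already leads back to src, so src -> dst would form a cycle.
            dropped.append((src, dst))
        else:
            children[src].add(dst)
            kept.append((src, dst))
    return kept, dropped


if __name__ == "__main__":
    # The last two edges are reversed relations and are dropped.
    edges = [("A", "B"), ("B", "C"), ("C", "A"), ("B", "A")]
    print(build_dag(edges))
```

Every reversed edge still costs a reachability check before being thrown away, which matches the "overhead for no gain" remark in the reply.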
Re: NPE with ytex in ctakes 3.2.0
The error is caused by not finding the required properties files/xml config files. There are some issues with the ytex setup scripts for the 3.2 release; I have fixed that in trunk. I am updating the 3.2 installation guide with the patched setup scripts. It's not clear to me if you're running from a dev environment/eclipse, or running from the ctakes distro. If running from a development environment, see https://svn.apache.org/repos/asf/ctakes/trunk/ctakes-ytex/README If running from the ctakes distro, make sure you follow the ytex setup: https://cwiki.apache.org/confluence/display/CTAKES/YTEX+Installation For the dev environment, the xml config file is in ctakes-ytex-res ( https://svn.apache.org/repos/asf/ctakes/trunk/ctakes-ytex-res/src/main/resources/org/apache/ctakes/ytex/uima/beanRefContext.xml ). For the binary distro, the xml config files are in lib\ctakes-ytex-res-3.2.0.jar -vj

On Fri, Oct 10, 2014 at 10:56 PM, David Kincaid kincaid.d...@gmail.com wrote:

I don't have that file anywhere either. Where do I get it from?

On Fri, Oct 10, 2014 at 3:53 PM, Chen, Pei pei.c...@childrens.harvard.edu wrote:

I think it’s in ctakes-ytex-res.jar (is that in your classpath)? This is just a guess… vj may have a better idea if it still doesn’t work for you.

From: David Kincaid [mailto:kincaid.d...@gmail.com] Sent: Friday, October 10, 2014 4:51 PM To: u...@ctakes.apache.org Subject: Re: NPE with ytex in ctakes 3.2.0

No. I have no file named beanRefContext.xml anywhere on my hard drive.

On Fri, Oct 10, 2014 at 3:45 PM, Chen, Pei pei.c...@childrens.harvard.edu wrote:

I’m not too familiar with the ytex component, but my guess is that the ytexApplicationContext bean is null? It seems that it would be expected to be in the classpath*:org/apache/ctakes/ytex/uima/beanRefContext.xml? Does that exist? 
From: David Kincaid [mailto:kincaid.d...@gmail.com] Sent: Friday, October 10, 2014 4:23 PM To: u...@ctakes.apache.org Subject: NPE with ytex in ctakes 3.2.0

I'm trying to experiment with ytex in 3.2.0, running AggregatePlaintextUMLSProcessor with the FilesInDirectoryCollectionReader and FileWriterCASConsumer. When I try to run it against some text files it blows up with a null pointer exception during initialization. Here's the relevant part of the stack trace. Anyone have any ideas what I might have wrong?

Caused by: org.apache.uima.resource.ResourceInitializationException: Initialization of annotator class org.apache.ctakes.ytex.uima.annotators.SegmentRegexAnnotator failed. (Descriptor: file:/home/davek/apps/apache-ctakes-3.2.0/desc/ctakes-ytex-uima/desc/analysis_engine/SegmentRegexAnnotator.xml) at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:252) at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:156) at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94) at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62) at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269) at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:387) at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:254) at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:431) at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:375) at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:185) at 
org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94) at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62) at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269) at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:314) at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:425) at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1088) ... 9 more Caused by: java.lang.NullPointerException at org.apache.ctakes.ytex.uima.ApplicationContextHolder.getApplicationContext(ApplicationContextHolder.java:79) at
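
Since the replies in this thread point at beanRefContext.xml being packaged inside a jar under lib, one quick way to see which jar (if any) carries it is to list the jar entries. This is a sketch under assumptions: the function name `jars_containing` is hypothetical, and the `lib/*.jar` glob assumes the script is run from the cTAKES install directory.

```python
import glob
import zipfile


def jars_containing(resource, jar_paths):
    """Return the jars (zip archives) whose entry list includes resource."""
    hits = []
    for jar in jar_paths:
        try:
            with zipfile.ZipFile(jar) as archive:
                if resource in archive.namelist():
                    hits.append(jar)
        except (OSError, zipfile.BadZipFile):
            # Missing or unreadable jars are skipped rather than treated as fatal.
            continue
    return hits


if __name__ == "__main__":
    # Resource path taken from the thread; the glob pattern is an assumption.
    wanted = "org/apache/ctakes/ytex/uima/beanRefContext.xml"
    print(jars_containing(wanted, sorted(glob.glob("lib/*.jar"))))
```

An empty result only shows the file is not in those jars; it could still be supplied from a plain directory on the classpath, and a jar that contains it must also be on the classpath of the process that fails.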
Re: YTEX Semantic Sim RESTful
Hi John, Looking at the code, that error is due to the concept graph 'umls' not being loaded. By default, ytex is configured to use the sct-rxnorm concept graph. Can you see if this works: http://localhost:8080/services/rest/similarity?conceptGraph=sct-rxnorm&concept1=C0018787&concept2=C0024109&metrics=LCH,INTRINSIC_LCH

To set the concept graph name, set ytex.conceptGraphName in resources/org/apache/ctakes/ytex/ytex.properties

If not, there may be an issue in a config file; if you get another NPE, please
* copy https://svn.apache.org/repos/asf/ctakes/trunk/ctakes-ytex-res/src/main/resources/org/apache/ctakes/ytex/web/beans-kernel-simweb.xml to CTAKES_HOME/resources/org/apache/ctakes/ytex/web/
* copy https://svn.apache.org/repos/asf/ctakes/trunk/ctakes-distribution/src/main/bin/ytexweb.bat to CTAKES_HOME/bin

HTH! Vijay

On Tue, Oct 14, 2014 at 1:51 PM, John Green john.travis.gr...@gmail.com wrote:

Good idea Kim! Unfortunately, that wasn't it. I'll admit, though, I hadn't looked at that variable yet. Thanks for your help, JG

On Mon, Oct 13, 2014 at 6:28 PM, Kim Ebert kim.eb...@perfectsearchcorp.com wrote:

Perhaps your JVM is running out of heap? I've noticed that when I run out of heap, cTAKES tends to behave erratically. Kim Ebert 1.801.669.7342 Perfect Search Corp http://www.perfectsearchcorp.com/

On 10/13/2014 09:29 AM, John Green wrote:

I've been putting off debugging this, as it is a piece of the app I'm working on that fit in further down the road in development. Development has progressed, and here I am. I have posted this one before and was hoping to find fresh help. When running ytex.sh in a distro installed at something like ./ctakes3.2.0/apache-ctakes-3.1.2-SNAPSHOT/bin$ under Ubuntu 14 and trying to access the RESTful interface per the docs with a query such as http://localhost:8080/similarity?conceptGraph=umls&concept1=C0018787&concept2=C0024109&metrics=LCH,INTRINSIC_LCH the query fails with a 500 (see below). 
Of note, the http://localhost:8080/semanticSim.jsf works just fine. Am I missing something simple? Thanks for any and all help, Best, JG 500 error: HTTP ERROR 500 Problem accessing /services/rest/similarity. Reason: Server Error Caused by: java.lang.RuntimeException: org.apache.cxf.interceptor.Fault at org.apache.cxf.interceptor.AbstractFaultChainInitiatorObserver.onMessage(AbstractFaultChainInitiatorObserver.java:116) at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:333) at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121) at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:239) at org.apache.cxf.transport.servlet.ServletController.invokeDestination(ServletController.java:248) at org.apache.cxf.transport.servlet.ServletController.invoke(ServletController.java:222) at org.apache.cxf.transport.servlet.ServletController.invoke(ServletController.java:153) at org.apache.cxf.transport.servlet.CXFNonSpringServlet.invoke(CXFNonSpringServlet.java:167) at org.apache.cxf.transport.servlet.AbstractHTTPServlet.handleRequest(AbstractHTTPServlet.java:286) at org.apache.cxf.transport.servlet.AbstractHTTPServlet.doGet(AbstractHTTPServlet.java:211) at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at org.apache.cxf.transport.servlet.AbstractHTTPServlet.service(AbstractHTTPServlet.java:262) at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:698) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:526) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:568) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1105) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:453) at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:183) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1039) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:136) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:201) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:109) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) at org.eclipse.jetty.server.Server.handle(Server.java:445) at
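
A minimal sketch of how the similarity query discussed in this thread could be assembled so that the parameter separators are not lost. The endpoint path and the parameter names (conceptGraph, concept1, concept2, metrics) are taken from the messages above; the class and method names (SimilarityQuery, build) are hypothetical, and the host and port are assumptions that should match your own install.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that builds the similarity REST query URL. */
final class SimilarityQuery {

    // Base URL as quoted in the thread; adjust host and port as needed.
    static final String BASE = "http://localhost:8080/services/rest/similarity";

    static String build(String conceptGraph, String cui1, String cui2, String... metrics) {
        // Encode each metric separately and join with literal commas,
        // matching the comma-separated form used in the thread.
        List<String> encodedMetrics = new ArrayList<>();
        for (String metric : metrics) {
            encodedMetrics.add(encode(metric));
        }
        return BASE
                + "?conceptGraph=" + encode(conceptGraph)
                + "&concept1=" + encode(cui1)
                + "&concept2=" + encode(cui2)
                + "&metrics=" + String.join(",", encodedMetrics);
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 support is mandatory on every Java platform.
            throw new IllegalStateException(ex);
        }
    }
}
```

Calling SimilarityQuery.build("sct-rxnorm", "C0018787", "C0024109", "LCH", "INTRINSIC_LCH") yields a URL with each parameter separated by an ampersand, which can then be pasted into a browser or passed to any HTTP client.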
Re: sentence detector model
Why not use the i2b2 corpora?

On Monday, September 29, 2014, Dligach, Dmitriy dmitriy.dlig...@childrens.harvard.edu wrote:

Maybe creating a made-up set of sentences would be an option? That way we could agree on the annotation of concrete cases. Although this would be more of a unit test than a corpus. Dima

On Sep 27, 2014, at 12:15, Miller, Timothy timothy.mil...@childrens.harvard.edu wrote:

I've just been using the opennlp command line cross validator on the small dataset I annotated (along with some eyeballing). It would be cool if there was a standard clinical resource available for this task, but I hadn't considered it much because the data I annotated pulls from multiple datasets and the process of arranging with different institutions to make something like that available would probably be a nightmare. Tim

Sent from my iPad. Sorry about the typos.

On Sep 27, 2014, at 12:16 PM, Dligach, Dmitriy dmitriy.dlig...@childrens.harvard.edu wrote:

Tim, thanks for working on this! Question: do we have some formal way of evaluating the sentence detector? Maybe we should come up with some dev set that would include examples from mimic... Dima

On Sep 27, 2014, at 8:57, Miller, Timothy timothy.mil...@childrens.harvard.edu wrote:

I have been working on the sentence detector newline issue, training a model to probabilistically split sentences on newlines rather than forcing sentence breaks. I have checked in a model to the repo under ctakes-core-res. I also attached a patch to ctakes-core to the jira issue: https://issues.apache.org/jira/browse/CTAKES-41 for people to test. The status of my testing is that it doesn't seem to break on notes where ctakes worked well before (those where newlines are always sentence breaks), and is a slight improvement on notes where newlines may or may not be sentence breaks.
Once the change is checked in we can continue improving the model by adding more data and features, but the first hurdle I'd like to get past is making sure it runs well enough on the type of data that the old model worked well on. Let me know if you have any questions. Thanks Tim
Re: org.apache.ctakes.ytex.umls.dao.UMLSDaoTest
That is an expected error having to do with the fact that UMLS isn't installed in the test database that gets fired up for unit tests. That is actually a warning (and should be interpreted as an error only if you do have UMLS set up).

On Mon, Aug 25, 2014 at 9:02 PM, Pei Chen chen...@apache.org wrote:

Hi VJ, While on the subject of unit tests- I didn't get a chance to dig deeper and was hoping you would know the cause of this unit test failure:

mvn clean install

2014-08-25 13:33:50,830 WARN net.sf.ehcache.CacheManager - Creating a new instance of CacheManager using the diskStorePath /var/folders/qc/d7xd4zzs0_xcybv88skt5_7mgn/T/ which is already used by an existing CacheManager. The source of the configuration was net.sf.ehcache.config.generator.ConfigurationSource$InputStreamConfigurationSource@7433a719. The diskStore path for this CacheManager will be set to /var/folders/qc/d7xd4zzs0_xcybv88skt5_7mgn/T//ehcache_auto_created_1408988030830. To avoid this warning consider using the CacheManager factory methods to create a singleton CacheManager or specifying a separate ehcache configuration (ehcache.xml) for each CacheManager instance.
2014-08-25 13:33:51,082 WARN org.hibernate.engine.jdbc.spi.SqlExceptionHelper - SQL Error: 62, SQLState: S0010
2014-08-25 13:33:51,082 ERROR org.hibernate.engine.jdbc.spi.SqlExceptionHelper - Unknown JDBC escape sequence: {{db.schema}.MRCONSO mrconso0_ where mrconso0_.aui?
and length(mrconso0_.aui)0 and length(mrconso0_.str)200 and mrconso0_.lat='ENG' order by mrconso0_.aui 2014-08-25 13:33:51,085 WARN org.apache.ctakes.ytex.umls.dao.UMLSDaoTest - sql exception - mrconso probably doesn't exist, check error org.hibernate.exception.SQLGrammarException: could not prepare statement at org.hibernate.exception.internal.SQLStateConversionDelegate.convert(SQLStateConversionDelegate.java:123) at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:49) at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:125) at org.hibernate.engine.jdbc.internal.StatementPreparerImpl$StatementPreparationTemplate.prepareStatement(StatementPreparerImpl.java:188) at org.hibernate.engine.jdbc.internal.StatementPreparerImpl.prepareQueryStatement(StatementPreparerImpl.java:159) at org.hibernate.loader.Loader.prepareQueryStatement(Loader.java:1859) at org.hibernate.loader.Loader.executeQueryStatement(Loader.java:1836) at org.hibernate.loader.Loader.executeQueryStatement(Loader.java:1816) at org.hibernate.loader.Loader.doQuery(Loader.java:900) at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:342) at org.hibernate.loader.Loader.doList(Loader.java:2526) at org.hibernate.loader.Loader.doList(Loader.java:2512) at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2342) at org.hibernate.loader.Loader.list(Loader.java:2337) at org.hibernate.loader.hql.QueryLoader.list(QueryLoader.java:495) at org.hibernate.hql.internal.ast.QueryTranslatorImpl.list(QueryTranslatorImpl.java:357) at org.hibernate.engine.query.spi.HQLQueryPlan.performList(HQLQueryPlan.java:195) at org.hibernate.internal.SessionImpl.list(SessionImpl.java:1269) at org.hibernate.internal.QueryImpl.list(QueryImpl.java:101) at org.apache.ctakes.ytex.umls.dao.UMLSDaoImpl.getAllAuiStr(UMLSDaoImpl.java:106) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:319) at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150) at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:110) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172) at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:90) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172) at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202) at com.sun.proxy.$Proxy11.getAllAuiStr(Unknown Source) at org.apache.ctakes.ytex.umls.dao.UMLSDaoTest.testGetAllAuiStr(UMLSDaoTest.java:53) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Re: Exporting YTEX Pipeline
Can you try this:

* copy https://code.google.com/p/ytex/source/browse/trunk/workspace/examples/fracture/cui/export.template.xml to CTAKES_HOME\desc\ctakes-ytex\fracture\cui.xml
* replace %DB_SCHEMA% with your database schema name (value of db.schema in your ytex.properties file)

Then from a command prompt, execute the following commands:

cd CTAKES_HOME
bin\setenv.bat
java -cp %CLASSPATH% -Dlog4j.configuration=file:/%CTAKES_HOME%/config/log4j.xml -Xmx256m org.apache.ctakes.ytex.kernel.SparseDataExporterImpl -prop desc\ctakes-ytex\fracture\cui.xml -type weka

Tell me if you run into any issues. I will add this to the ctakes confluence doc.

Best, VJ

On Wed, Jul 30, 2014 at 5:11 PM, Clayton Turner caturn...@g.cofc.edu wrote:

Hi, I'm trying to export the data I get from running the pipeline through the Collection Processing Engine. I set up the pipeline where I have a directory where all the XML is output to, but I am having issues at this point. I've tried using the built in Exporter from the Data Mining section on this page https://cwiki.apache.org/confluence/display/CTAKES/User%27s+Guide but those notes are out of date. Even altering directories to match the files still gives me errors about not being able to find the ExporterImpl class. The class version of this file only exists outside of the target directory for the ctakes snapshot and attempting to use it still fails. I then ventured to here: https://code.google.com/p/ytex/source/browse/#svn%2Ftrunk%2Fworkspace%2Fexamples%2Ffracture The files here match up to the data mining section from the previous link - so I created my export.xml file and changed everything that needed to be changed for my example (tried to even run bone fracture), but I cannot get data exported, no matter what I do. Is there a way to use some new(er) implementation of the SparseDataExporterImpl class or is there an alternative for extracting data for use with weka?
I've messaged about this in the past but I don't believe I was thorough enough with my issues. Thanks in advance, Clayton
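
The placeholder step in the reply above (replace %DB_SCHEMA% with the value of db.schema from ytex.properties) can also be done programmatically. The following is only a sketch under stated assumptions: ExportTemplate, schemaFrom and fill are hypothetical names, not part of YTEX, and the sketch works on in-memory strings so that reading and writing the actual files is left to the caller.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical helper for filling the %DB_SCHEMA% placeholder in an export template. */
final class ExportTemplate {

    /** Extracts db.schema from text in java.util.Properties format. */
    static String schemaFrom(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        String schema = props.getProperty("db.schema");
        if (schema == null || schema.trim().isEmpty()) {
            throw new IllegalArgumentException("db.schema is not set");
        }
        // Properties.load keeps trailing whitespace in values, so trim it.
        return schema.trim();
    }

    /** Replaces every occurrence of the placeholder with the schema name. */
    static String fill(String templateText, String schema) {
        return templateText.replace("%DB_SCHEMA%", schema);
    }
}
```

In practice the two string arguments would come from the contents of ytex.properties and export.template.xml, and the result of fill would be written to the cui.xml path named in the reply.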
Re: Exporting YTEX Pipeline
Great that it worked! Note that the examples for fracture (bag of words/bag of cuis) is just scratching the surface of feature representations - there are a gazillion ways to export the document (bag of words per section, include negation status, ...) Doing this via SQL makes it super easy Best, VJ On Wed, Jul 30, 2014 at 9:07 PM, Clayton Turner caturn...@g.cofc.edu wrote: Awesome!! It worked! The only things I had to change (since I'm on Windows) was flipping the slashes when necessary and removing the first slash when specifying the -Dlog4j.configuration=file:/... Thank you so much for putting up with my issues -Clayton On Wed, Jul 30, 2014 at 2:48 PM, vijay garla vnga...@gmail.com wrote: Can you try this: copy https://code.google.com/p/ytex/source/browse/trunk/workspace/examples/fracture/cui/export.template.xml to CTAKES_HOME\desc\ctakes-ytex\fracture\cui.xml replace %DB_SCHEMA% with your database schema name (value of db.schema in your ytex.properties file) Then from a command prompt, execute the following commands: cd CTAKES_HOME bin\setenv.bat java -cp %CLASSPATH% -Dlog4j.configuration=file:/%CTAKES_HOME%/config/log4j.xml -Xmx256m org.apache.ctakes.ytex.kernel.SparseDataExporterImpl -prop desc\ctakes-ytex\fracture\cui.xml -type weka Tell me if you run into any issues. I will add this to the ctakes confluence doc. Best, VJ On Wed, Jul 30, 2014 at 5:11 PM, Clayton Turner caturn...@g.cofc.edu wrote: Hi, I'm trying to export the data I get from running the pipeline through the Collection Processing Engine. I set up the pipeline where I have a directory where all the XML is output to, but I am having issues at this point. I've tried using the built in Exporter from the Data Mining section on this page https://cwiki.apache.org/confluence/display/CTAKES/User%27s+Guide but those notes are out of date. Even altering directories to match the files still gives me errors about not being able to find the ExporterImpl class. 
The class version of this file only exists outside of the target directory for the ctakes snapshot and attempting to use it still fails. I then ventured to here: https://code.google.com/p/ytex/source/browse/#svn%2Ftrunk%2Fworkspace%2Fexamples%2Ffracture The files here match up to the data mining section from the previous link - so I created my export.xml file and changed everything that needed to be changed for my example (tried to even run bone fracture), but I cannot get data exported, no matter what I do. Is there a way to use some new(er) implementation of the SparseDataExporterImpl class or is there an alternative for extracting data for use with weka? I've messaged about this in the past but I don't believe I was thorough enough with my issues. Thanks in advance, Clayton -- -- Clayton Turner email: caturn...@g.cofc.edu phone: (843)-424-3784 web: claytonturner.blogspot.com - “When scientifically investigating the natural world, the only thing worse than a blind believer is a seeing denier.” - Neil deGrasse Tyson
Re: cTAKES CPE MySQL Exception
My guess is that this exception is coming out of the DictionaryLookup (it creates a connection and holds on to it for the life of the AE). If it is coming out of the DBCollectionReader/DBConsumer you're in luck, as those use a connection pool, and you can configure it to check the connection upon pulling from the pool.

The file is: resources\org\apache\ctakes\ytex\beans-datasource.xml

See http://commons.apache.org/proper/commons-dbcp/api-1.4/org/apache/commons/dbcp/BasicDataSource.html - you want to set testOnBorrow to true, and set the validationQuery to something like select 1

You should also set the errorRateThreshold in the CPE config (you can't do this via the gui - you have to do this in the xml) - that way the cpe doesn't bomb on the first error it sees - a few bad apples shouldn't kill the processing.

HTH, VJ

On Thu, Jul 24, 2014 at 4:32 PM, Clayton Turner caturn...@g.cofc.edu wrote:

Hi, everyone. First off, I'd like to say awesome and thank you for the cTAKES 3.2 release and information. I've been following those pages and it's been really helpful for helping me move along in my own progress. Really cool stuff. So I'm using the Collection Processing Engine (with ytex and umls) and I'm trying to process ~1 million notes (as opposed to the about 30 in the given demo). I've tried this the past 2 days and when I come back in to check the progress I see that I've received an error about 14000 notes into the process:

org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator processing failed. CausedBy: org.springframework.transaction.CannotCreateTransactionException: Could not open Hibernate Session for transaction; nested exception is com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: The last packet successfully received from the server was 53,888,249 milliseconds ago. The last packet sent successfully to the server was 53,888,249 milliseconds ago. is longer than the server configured value of 'wait_timeout'.
You should consider either expiring and/or testing connection validity before use in your application, increasing the server configured values for client timeouts, or using the Connector/J connection property 'autoReconnect=true' to avoid this problem. So, in my own debugging, I have ensured that autoReconnect true was on (it always has been). I looked at my CPE output in the command prompt and noticed a PacketTooBigException so I increased the packet max size to 1G (the max for sql server). I increased the time allowed for timeouts. I'm really unsure of what to do here. Should I find a way to see if there is a problematic note that is giving me issues (though I can't understand how 1 note would make a packet too large)? Should I try to do some horizontal sharding and break the problem into smaller chunks (though I would think this program could handle large datasets since it's using a query language)? I'm just at a loss with this error, especially since it takes so long to actually spit the error out at me. Thanks in advance everyone, Clayton -- -- Clayton Turner email: caturn...@g.cofc.edu phone: (843)-424-3784 web: claytonturner.blogspot.com - “When scientifically investigating the natural world, the only thing worse than a blind believer is a seeing denier.” - Neil deGrasse Tyson
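
To illustrate why checking a connection when it is pulled from the pool (the testOnBorrow setting mentioned in the reply) helps with stale connections, here is a toy sketch of the idea. It is not the commons-dbcp implementation: ValidatingPool, borrow and giveBack are hypothetical names, and the validator stands in for whatever check the real pool runs, such as a cheap query like select 1.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Toy pool illustrating validate-on-borrow; hypothetical, not commons-dbcp. */
final class ValidatingPool<T> {

    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;    // opens a fresh resource
    private final Predicate<T> validator; // plays the role of the validation query

    ValidatingPool(Supplier<T> factory, Predicate<T> validator) {
        this.factory = factory;
        this.validator = validator;
    }

    /** Returns an idle resource that passes validation, discarding stale ones. */
    synchronized T borrow() {
        while (!idle.isEmpty()) {
            T candidate = idle.pop();
            if (validator.test(candidate)) {
                return candidate;
            }
            // Stale resource (for example, dropped by the server after
            // wait_timeout): discard it and try the next idle one.
        }
        // Nothing usable was idle, so open a new resource.
        return factory.get();
    }

    /** Returns a resource to the pool for later reuse. */
    synchronized void giveBack(T resource) {
        idle.push(resource);
    }
}
```

The point of the design is that a connection which went stale while sitting idle is detected and replaced at borrow time, before the caller tries to use it. It does not help a component that holds a single connection for its whole lifetime, which is the DictionaryLookup case described in the reply.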
Re: DBConsumer
You can add the DBConsumer to any pipeline, or add it to any CPE config. See https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.1.2+-+YTEX+DBConsumer

You will have to set up ctakes and your database as documented here: https://cwiki.apache.org/confluence/display/CTAKES/YTEX+Installation

-vj

On Tue, Jul 15, 2014 at 2:16 AM, John Green john.travis.gr...@gmail.com wrote:

The Ytex DBConsumer - If someone has a free moment, could they give me a hint at how I can plug into a mysql DB with the ytex DB consumer? For example, taking the default ytex pipeline and sending it to a db. If I get pointed in the right direction such that I figure it out I'll update the confluence with the how-to for the future. Thanks! JG
Re: DBConsumer
I should probably add to the docs to use the component descriptor: YTEX_HOME\desc\ctakes-ytex-uima\desc\analysis_engine\DBConsumer.xml

On Tue, Jul 15, 2014 at 2:53 PM, John Green john.travis.gr...@gmail.com wrote:

Ok. I must have missed something. I did read both of those. I'll go back and look again. Thank you for your time Vijay, JG

Sent from Mailbox for iPhone

On Tue, Jul 15, 2014 at 8:26 AM, vijay garla vnga...@gmail.com wrote:

You can add the DBConsumer to any pipeline, or add it to any CPE config. See https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.1.2+-+YTEX+DBConsumer

You will have to set up ctakes and your database as documented here: https://cwiki.apache.org/confluence/display/CTAKES/YTEX+Installation

-vj

On Tue, Jul 15, 2014 at 2:16 AM, John Green john.travis.gr...@gmail.com wrote:

The Ytex DBConsumer - If someone has a free moment, could they give me a hint at how I can plug into a mysql DB with the ytex DB consumer? For example, taking the default ytex pipeline and sending it to a db. If I get pointed in the right direction such that I figure it out I'll update the confluence with the how-to for the future. Thanks! JG
Re: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
Sorry, I meant the snomed dictionary lookup database - I think it is hsql, not lucene. -vj On Wed, Jul 9, 2014 at 5:30 PM, Masanz, James J. masanz.ja...@mayo.edu wrote: I believe the only dictionaries shipped with Apache cTAKES in lucene indexes contain just the Orange Book, RxNorm, and some made-up terms, not SNOMED-CT or the other sources taken from UMLS. If that is not correct, then I agree there is a problem with what is being shipped in lucene indexes. -- James -Original Message- From: vijay garla [mailto:vnga...@gmail.com] Sent: Wednesday, July 09, 2014 9:30 AM To: dev@ctakes.apache.org Subject: Re: [VOTE] Release Apache cTAKES 3.2.0 (rc2) ctakes-ytex-lib-3.1.2-SNAPSHOT.zip https://ytex.googlecode.com/files/ctakes-ytex-lib-3.1.2-SNAPSHOT.zip - this contains non-asf compliant ytex libs. I would like to add it to the sourceforge site / or add it to the ctakes resources directly (that way users simply have to unzip a single zip file) ctakes-ytex-resources-3.1.2-SNAPSHOT.zip http://www.ytex-nlp.org/umls.download/secure/3.1/ctakes-ytex-resources-3.1.2-SNAPSHOT.zip - this contains data derived from the UMLS - concept graphs and dictionary lookup tables. downloading this requires a UTS login. It is conceptually no different from the ctakes resources, so I believe it would be OK to add it to that zip file, but I'm not a lawyer. On another note: I think forcing users to specify the UTS username/password and contacting NIH every time you run cTAKES is problematic, and doesn't prevent users who don't have a valid UTS login from viewing the data contained in the lucene index dictionary. I personally believe requiring a UTS login to download would be the best way to make resources derived from the UMLS available to users (this is what I'm doing for ytex-resources). to summarize: for now, I would like to add the ytex libs to the ctakes resources zip. 
-vj On Wed, Jul 9, 2014 at 4:04 PM, Chen, Pei pei.c...@childrens.harvard.edu wrote: The maven artifacts are also available in the staging area: https://repository.apache.org/content/repositories/orgapachectakes-1001 VJ: Just curious- how did you envision ytex users downloading the jars/war? From the distro bin.zip or from maven central? --Pei -Original Message- From: Pei Chen [mailto:chen...@apache.org] Sent: Tuesday, July 08, 2014 6:11 PM To: dev@ctakes.apache.org Subject: [VOTE] Release Apache cTAKES 3.2.0 (rc2) Hi all, The main difference between rc1 and rc2 is that we removed the lvg-res and assertion-res.jar from the distro. They still need to be unpacked. This is a call for a vote on releasing the following candidate (rc2) as Apache cTAKES 3.2.0. The major changes include: - New optional YTEX component(s) (Yale Extensions to cTAKES) - New optional improved/faster dictionary lookup (dictionary-lookup-fast) - New optional Temporal component (Time + Event extraction. Relations will be including in a future release.) 
- Other bug fixes/enhancements from Jira

[TODO: Online documentation still needs to be updated on wiki]

For more detailed information on the changes/release notes, please visit: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313621&version=12324066

The release was made using the cTAKES release process documented here: http://ctakes.apache.org/ctakes-release-guide.html

The candidate is available at: http://people.apache.org/~chenpei/RCs/ctakes-3.2.0-rc2/apache-ctakes-3.2.0-src.tar.gz /.zip

The tag to be voted on: http://svn.apache.org/repos/asf/ctakes/tags/ctakes-3.2.0-rc2

The MD5 checksum of the tarball can be found at: http://people.apache.org/~chenpei/RCs/ctakes-3.2.0-rc2/apache-ctakes-3.2.0-src.tar.gz.md5 /.zip.md5

The signature of the tarball can be found at: http://people.apache.org/~chenpei/RCs/ctakes-3.2.0-rc2/apache-ctakes-3.2.0-src.tar.gz.asc /.zip.asc

Apache cTAKES' KEYS file, containing the PGP keys used to sign the release: https://dist.apache.org/repos/dist/release/ctakes/KEYS

Please vote on releasing these packages as Apache cTAKES 3.2.0. The vote is open for at least the next 72 hours. Only votes from the cTAKES PMC are binding, but folks are welcome to check the release candidate and voice their approval or disapproval. The vote passes if at least three binding +1 votes are cast.

[ ] +1 Release the packages as Apache cTAKES 3.2.0
[ ] -1 Do not release the packages because...

Also, the convenience binary can be found at: http://people.apache.org/~chenpei/RCs/ctakes-3.2.0-rc2/apache-ctakes-3.2.0-bin.tar.gz /.zip

Note: It's temporarily on people.a.o because the artifacts were too large for https://dist.apache.org/repos/dist/dev/ctakes (Working with infra on increasing the limit).

Thanks!
Re: cTAKES 3.2 Analysis Batch Issue
Hi Clayton, The screenshot is not coming through via the newsgroup emails. can you attach the log file? vj On Mon, Jul 7, 2014 at 5:38 PM, Clayton Turner caturn...@g.cofc.edu wrote: Any update on this issue? I have this problem even if I don't use the ytex version of the aggregate text processor (UMLS-independent as well). On Thu, Jul 3, 2014 at 2:33 PM, Clayton Turner caturn...@g.cofc.edu wrote: Yes, I am running the fracture_demo.xml cpe. There is no option for the analysis batch (that's the main issue). I also get no response in my MySQL database (umls installed - not sure if that can be related). Here's a screenshot of my CPE (using ytex): [image: Inline image 1] On Wed, Jul 2, 2014 at 10:48 PM, vijay garla vnga...@gmail.com wrote: Hi clayton, I assume you are running the fracture_demo.xml cpe - is that correct? The CPE GUI should give you the option to set the analysis batch. (see attached screenshot). That being said, the analysis_batch is not required (it will default to the current date). Can you attach the log file? -vj [image: Inline image 1] On Wed, Jul 2, 2014 at 12:22 PM, Clayton Turner caturn...@g.cofc.edu wrote: Hi, I'm a relatively new user of cTAKES. I recently cloned cTAKES from the repository and I am using UMLS installed in my mysql database. I have recently noticed an issue, though. When conducting the bone fracture demo, In the CPE, I use the DBCollectionReader and Analysis Engine from the ctakes-ytex-uima directory within my CTAKES_HOME. I can get this to run successfully, but I am not able to specify an analysis batch in the CPE. Because of this, my ytex database is not being updated with results of the CPE run (in the v_document tables). Any ideas why the analysis batch field is missing? Side question: Any update on when cTAKES 3.2 will be officially released? I see we're passed the expected release and was curious on how long it will be until it will officially come out. 
Thanks a lot, -- Clayton Turner

-- Clayton Turner email: caturn...@g.cofc.edu phone: (843)-424-3784 web: claytonturner.blogspot.com - “When scientifically investigating the natural world, the only thing worse than a blind believer is a seeing denier.” - Neil deGrasse Tyson
Re: cTAKES 3.2 Analysis Batch Issue
My bad, the default log4j config just sends everything to the console. Can you run the CPE and redirect the output to a file like this:

runctakesCPE.bat > cpe.log 2>&1

vj

On Tue, Jul 8, 2014 at 6:01 PM, Clayton Turner caturn...@g.cofc.edu wrote:

I don't see a log file when running the CPE. When running the CVD I have access to a log file within the gui, but that does not seem to be present here. Is there a specific place that this log file is saved?

On Tue, Jul 8, 2014 at 3:14 AM, vijay garla vnga...@gmail.com wrote:

Hi Clayton, The screenshot is not coming through via the newsgroup emails. Can you attach the log file? vj

On Mon, Jul 7, 2014 at 5:38 PM, Clayton Turner caturn...@g.cofc.edu wrote:

Any update on this issue? I have this problem even if I don't use the ytex version of the aggregate text processor (UMLS-independent as well).

On Thu, Jul 3, 2014 at 2:33 PM, Clayton Turner caturn...@g.cofc.edu wrote:

Yes, I am running the fracture_demo.xml cpe. There is no option for the analysis batch (that's the main issue). I also get no response in my MySQL database (umls installed - not sure if that can be related). Here's a screenshot of my CPE (using ytex): [image: Inline image 1]

On Wed, Jul 2, 2014 at 10:48 PM, vijay garla vnga...@gmail.com wrote:

Hi clayton, I assume you are running the fracture_demo.xml cpe - is that correct? The CPE GUI should give you the option to set the analysis batch. (see attached screenshot). That being said, the analysis_batch is not required (it will default to the current date). Can you attach the log file? -vj [image: Inline image 1]

On Wed, Jul 2, 2014 at 12:22 PM, Clayton Turner caturn...@g.cofc.edu wrote:

Hi, I'm a relatively new user of cTAKES. I recently cloned cTAKES from the repository and I am using UMLS installed in my mysql database. I have recently noticed an issue, though.
When conducting the bone fracture demo, In the CPE, I use the DBCollectionReader and Analysis Engine from the ctakes-ytex-uima directory within my CTAKES_HOME. I can get this to run successfully, but I am not able to specify an analysis batch in the CPE. Because of this, my ytex database is not being updated with results of the CPE run (in the v_document tables). Any ideas why the analysis batch field is missing? Side question: Any update on when cTAKES 3.2 will be officially released? I see we're passed the expected release and was curious on how long it will be until it will officially come out. Thanks a lot, -- Clayton Turner

-- Clayton Turner email: caturn...@g.cofc.edu phone: (843)-424-3784 web: claytonturner.blogspot.com - “When scientifically investigating the natural world, the only thing worse than a blind believer is a seeing denier.” - Neil deGrasse Tyson
Re: Building
When you run the webapp, the restful services run as well.

On Friday, July 4, 2014, John Green john.travis.gr...@gmail.com wrote:

Vijay - Ha! Ok. Works perfectly with cuis. Is there a way to run the web application as a RESTful API? You mention this as a service on your yale box, but I don't see a way to deploy it this way locally. Thanks again, JG

On Wed, Jul 2, 2014 at 10:58 PM, vijay garla vnga...@gmail.com wrote:

The ytexWeb application tries to look up concepts from terms using the ytex dictionary lookup table, which is a small subset of the UMLS. Can you try specifying cuis? That skips the lookup - if the concepts are in the concept graph, this will work. Best, vj

On Sun, Jun 29, 2014 at 6:10 PM, John Green john.travis.gr...@gmail.com wrote:

Hi Vijay, thank you for your time. Your documentation was quite good. I had no problem setting up ytex with UMLS running on my local mysql server. Where I ran into problems was understanding how to launch the web service (also, is there any way to run this in a RESTful mode? Btw, the informatics.yale links return 502).

After I did get it launched, and the confusion was probably all my fault, the concepts available to the similarity fields seemed very sparse; I just started typing randomly, hematochezia, choledocholithiasis, etc, and nothing would come up. The best I got was gallbladder function test, which, if I'm understanding it right, would be an alkphos, but alkaline phosphatase didn't come up, which led me to believe they were smaller sets of the snomed, mesh, etc compilations (as I checked the UMLS db and these concepts are there).

I think I got that execution command from the code.google, which is probably why it was stale. I did not see the ytex semantic similarity guide under the ctakes components part (sorry, thanks for pointing me there, I'll get to work on reading it). So bottom line: are the ones that shipped watered down versions? And if not, why are my concepts coming up short?
If you give me a hint at where to check I'll investigate. Thanks! JG On Sun, Jun 29, 2014 at 8:56 PM, vijay garla vnga...@gmail.com wrote: Hi John, YTEX ships with 3 concept graphs (see https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.1.2+-+Semantic+Similarity ): - sct-rxnorm: concepts from SNOMED-CT and RXNORM. This is the default. - sct-msh-csp-aod: concepts from the SNOMED-CT, MeSH, CRISP, and Alcohol and Drug thesaurus - umls: concepts from all restriction free (level 0) UMLS source vocabularies and SNOMED-CT These concept graphs are included in the ytex resources zip (see https://cwiki.apache.org/confluence/display/CTAKES/YTEX+Installation ): 3) Unzip YTEX Resources (Optional - UTS login required) Download and unzip ctakes-ytex-resources-3.1.2-SNAPSHOT.zip http://www.ytex-nlp.org/umls.download/secure/3.1/ctakes-ytex-resources-3.1.2-SNAPSHOT.zip 'over' your installation. This contains: - Concept Graphs derived from the UMLS2013AA used to compute semantic similarity measures All YTEX packages moved from the ytex namespace into org.apache.ctakes.ytex - can you tell me which document you were looking at that mentioned ytex.kernel.dao.ConceptDaoImpl? I thought I had fixed this in the documentation. HTH, -vj On Sun, Jun 29, 2014 at 2:25 PM, John Green john.travis.gr...@gmail.com wrote: I got the semantic similarity web app running in ytex. I'm still learning umls terminology, but I believe it says that out of the box its concept graphs are limited to the free set from umls? Does this mean without permissions? Similar to ctakes with umls rights? The concepts available seem limited so this would make sense. So, to take full advantage I would need to rebuild the concept graph, correct? I'm in the process of doing this but getting classpath errors. I used java a bit ten years ago, so you can probably guess these will take me a minute to resolve. Notably, it is complaining about ytex.kernel.dao.ConceptDaoImpl.
Thanks all, JG — Sent from Mailbox for iPhone
Re: cTAKES 3.2 Analysis Batch Issue
Hi Clayton, I assume you are running the fracture_demo.xml CPE - is that correct? The CPE GUI should give you the option to set the analysis batch (see attached screenshot). That being said, the analysis_batch is not required (it will default to the current date). Can you attach the log file? -vj [image: Inline image 1] On Wed, Jul 2, 2014 at 12:22 PM, Clayton Turner caturn...@g.cofc.edu wrote: Hi, I'm a relatively new user of cTAKES. I recently cloned cTAKES from the repository and I am using UMLS installed in my mysql database. I have recently noticed an issue, though. When conducting the bone fracture demo in the CPE, I use the DBCollectionReader and Analysis Engine from the ctakes-ytex-uima directory within my CTAKES_HOME. I can get this to run successfully, but I am not able to specify an analysis batch in the CPE. Because of this, my ytex database is not being updated with results of the CPE run (in the v_document tables). Any ideas why the analysis batch field is missing? Side question: Any update on when cTAKES 3.2 will be officially released? I see we're past the expected release and was curious how long it will be until it officially comes out. Thanks a lot, -- Clayton Turner
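Vijay's note that analysis_batch falls back to the current date can be pictured with the sketch below. It is only an illustration: the class and method names are hypothetical, and the ISO yyyy-MM-dd format is an assumption, since the thread does not say how YTEX formats the default value.

```java
import java.time.Clock;
import java.time.LocalDate;

/** Hypothetical helper, not part of YTEX: picks the batch name to record for a CPE run. */
final class AnalysisBatchDefault {

    /** Returns the configured batch name, or today's date when none was given. */
    static String effectiveBatch(String configured, Clock clock) {
        if (configured != null && !configured.trim().isEmpty()) {
            return configured.trim();
        }
        // Assumption: ISO-8601 date; the real default format is not stated in the thread.
        return LocalDate.now(clock).toString();
    }

    public static void main(String[] args) {
        System.out.println(effectiveBatch(null, Clock.systemDefaultZone()));
        System.out.println(effectiveBatch("fracture-demo-run-1", Clock.systemDefaultZone()));
    }
}
```

If the default really is date-based, results from a run without an explicit batch would be found by filtering on the run date rather than on a name of your choosing.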
Re: Building
The concept graph used by the webapp is defined in ytex.properties. You can also override it using the ytex.conceptGraph system property (add -Dytex.conceptGraph=xxx to the beginning of the ytexweb.bat java command line). I'm not sure why you don't see any log output. The command is:
java -cp %CLASSPATH% -Dlog4j.configuration=file:/%CTAKES_HOME%/config/log4j.xml -Xmx1g org.apache.ctakes.ytex.kernel.dao.ConceptDaoImpl -name <concept graph name>
When I run it specifying an invalid concept graph name:
C:\java\apache-ctakes-3.1.2-SNAPSHOT>java -cp %CLASSPATH% -Dlog4j.configuration=file:/%CTAKES_HOME%/config/log4j.xml -Xmx1g org.apache.ctakes.ytex.kernel.dao.ConceptDaoImpl -name test
I get this output (indicating that the corresponding properties file can't be found):
log4j: reset attribute= false.
log4j: Threshold =null.
log4j: Level value for root is [INFO].
log4j: root level set to INFO
log4j: Class name: [org.apache.log4j.ConsoleAppender]
log4j: Parsing layout of class: org.apache.log4j.PatternLayout
log4j: Setting property [conversionPattern] to [%d{dd MMM HH:mm:ss} %5p %c{1} - %m%n].
log4j: Adding appender named [consoleAppender] to category [root].
properties file could not be located: org/apache/ctakes/ytex/conceptGraph/test.xml
If you're on Linux, can you play around with the file URL for log4j? Best, VJ On Sun, Jun 29, 2014 at 6:30 PM, John Green john.travis.gr...@gmail.com wrote: Successfully ran the command to build the concept graph; however, it seems to be failing silently. The version issued with ytex is 10m. I expected, worst case, for mine to be the same; it was 400 bytes (the .gz output). I can't find anything logged. log4j is complaining it isn't set up correctly; however, it is directed to the correct config file. I'm not familiar with this logging program, so perhaps the errors are ending up in some kind of /dev/null. Also, the web app is only loading sct-msh-csp-aod. I see that in the same dir there are the others you spoke of.
The web app doesnt give an option for using them (this makes sense as the command line output makes no mention of loading them) but I can find where what is loaded is defined. I hope that wasnt too poorly explained, Thanks, John On Sun, Jun 29, 2014 at 9:10 PM, John Green john.travis.gr...@gmail.com wrote: Hi Vijay, thank you for your time. Your documentation was quite good. I had no problem setting up ytex with UMLS running on my local mysql server. Where I ran into problems was understanding how to launch the web service (also, is there anyway to run this in a RESTful mode? Btw, the informatics.yale links returns 502). After I did get it launched, and the confusion was probably all my fault, the concepts available to the similarity fields seemed very sparse; I just started typing randomly, hematochezia, choledocholithiasis, etc, and nothing would come up. The best I got was gallbladder function test, which, if Im understanding it right, would be an alkphos, but alkaline phosphatase didnt come up, which led to me to believe they were smaller sets of the the snomed, mesh, etc compilations (as I checked the UMLS db and these concepts are there). I think I got that execution command from the code.google, which is probably why it was stale. I did not see the ytex semantic similarity guide under the ctakes components part (sorry, thanks for pointing me there, ill get to work on reading it). So bottom line: are the ones that shipped watered down versions? And if not, why are my concepts coming up short? If you give me a hint at where to check Ill investigate. Thanks! JG On Sun, Jun 29, 2014 at 8:56 PM, vijay garla vnga...@gmail.com wrote: Hi John, YTEX ships with 3 concept graphs (see https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.1.2+-+Semantic+Similarity ): - sct-rxnorm: concepts from SNOMED-CT and RXNORM. This is the default. 
- sct-msh-csp-aod: concepts from the SNOMED-CT, MeSH, CRISP, and Alcohol and Drug thesaurus - umls: concepts from all restriction free (level 0) UMLS source vocabularies and SNOMED-CT These concept graphs are included in ytex resources zip (see https://cwiki.apache.org/confluence/display/CTAKES/YTEX+Installation): 3) Unzip YTEX Resources (Optional - UTS login required) Download and unzip ctakes-ytex-resources-3.1.2-SNAPSHOT.zip http://www.ytex-nlp.org/umls.download/secure/3.1/ctakes-ytex-resources-3.1.2-SNAPSHOT.zip 'over' your installation. This contains: - Concept Graphs derived from the UMLS2013AA used to compute semantic similarity measures All YTEX packages moved from the ytex namespace into org.apache.ctakes.ytex - can you tell me which document you were looking at that mentioned ytex.kernel.dao.ConceptDaoImpl? I thought I had fixed this in the documentation. HTH, -vj On Sun, Jun 29, 2014 at 2:25 PM, John Green
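The precedence Vijay describes in this message (a -Dytex.conceptGraph system property overriding the value in ytex.properties) can be sketched as follows. This is not the YTEX implementation: the class is hypothetical, and reusing ytex.conceptGraph as the key inside ytex.properties is an assumption. The descriptor path follows the "properties file could not be located" line in the log output above.

```java
import java.util.Properties;

/** Hypothetical sketch of how the concept graph name could be resolved. */
final class ConceptGraphName {

    // System property named in the thread; using the same key in ytex.properties is an assumption.
    static final String KEY = "ytex.conceptGraph";

    /** Returns the -D override if present and non-empty, else the properties-file value (may be null). */
    static String resolve(Properties ytexProperties, Properties systemProperties) {
        String override = systemProperties.getProperty(KEY);
        if (override != null && !override.trim().isEmpty()) {
            return override.trim();
        }
        return ytexProperties.getProperty(KEY);
    }

    /** Classpath resource that describes a graph, following the path shown in the log output. */
    static String descriptorResource(String graphName) {
        return "org/apache/ctakes/ytex/conceptGraph/" + graphName + ".xml";
    }

    public static void main(String[] args) {
        Properties fromFile = new Properties();
        fromFile.setProperty(KEY, "sct-rxnorm"); // stand-in for the value read from ytex.properties
        String name = resolve(fromFile, System.getProperties());
        System.out.println(name + " -> " + descriptorResource(name));
    }
}
```

Running the sketch with -Dytex.conceptGraph=umls on the java command line would print the umls descriptor path instead of the file's value, which mirrors the override John was looking for.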
Re: Building
The ytexWeb application tries to look up concepts from terms using the ytex dictionary lookup table, which is a small subset of the UMLS. Can you try specifying cuis? That skips the lookup - if the concepts are in the concept graph, this will work. Best, vj On Sun, Jun 29, 2014 at 6:10 PM, John Green john.travis.gr...@gmail.com wrote: Hi Vijay, thank you for your time. Your documentation was quite good. I had no problem setting up ytex with UMLS running on my local mysql server. Where I ran into problems was understanding how to launch the web service (also, is there anyway to run this in a RESTful mode? Btw, the informatics.yale links returns 502). After I did get it launched, and the confusion was probably all my fault, the concepts available to the similarity fields seemed very sparse; I just started typing randomly, hematochezia, choledocholithiasis, etc, and nothing would come up. The best I got was gallbladder function test, which, if Im understanding it right, would be an alkphos, but alkaline phosphatase didnt come up, which led to me to believe they were smaller sets of the the snomed, mesh, etc compilations (as I checked the UMLS db and these concepts are there). I think I got that execution command from the code.google, which is probably why it was stale. I did not see the ytex semantic similarity guide under the ctakes components part (sorry, thanks for pointing me there, ill get to work on reading it). So bottom line: are the ones that shipped watered down versions? And if not, why are my concepts coming up short? If you give me a hint at where to check Ill investigate. Thanks! JG On Sun, Jun 29, 2014 at 8:56 PM, vijay garla vnga...@gmail.com wrote: Hi John, YTEX ships with 3 concept graphs (see https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.1.2+-+Semantic+Similarity ): - sct-rxnorm: concepts from SNOMED-CT and RXNORM. This is the default. 
- sct-msh-csp-aod: concepts from the SNOMED-CT, MeSH, CRISP, and Alcohol and Drug thesaurus - umls: concepts from all restriction free (level 0) UMLS source vocabularies and SNOMED-CT These concept graphs are included in ytex resources zip (see https://cwiki.apache.org/confluence/display/CTAKES/YTEX+Installation): 3) Unzip YTEX Resources (Optional - UTS login required) Download and unzip ctakes-ytex-resources-3.1.2-SNAPSHOT.zip http://www.ytex-nlp.org/umls.download/secure/3.1/ctakes-ytex-resources-3.1.2-SNAPSHOT.zip 'over' your installation. This contains: - Concept Graphs derived from the UMLS2013AA used to compute semantic similarity measures All YTEX packages moved from the ytex namespace into org.apache.ctakes.ytex - can you tell me which document you were looking at that mentioned ytex.kernel.dao.ConceptDaoImpl? I thought I had fixed this in the documentation. HTH, -vj On Sun, Jun 29, 2014 at 2:25 PM, John Green john.travis.gr...@gmail.com wrote: I got the semantic similarity web app running in ytex. Im still learning umls terminology, but I believe it says that out of the box its concept graphs are limited to the free set from umls? Does this mean without permissions? Similar to ctakes with umls rights? The concepts available seem limited so this would make sense. So, to take full advantage I would need to rebuild the concept graph, correct? Im in the process of doing this but getting classpath errors. I used java a bit ten years ago, so you can probably guess these will take me a minute to resolve. Notably, it is complaining about ytex.kernel.dao.ConceptDaoImpl. Thanks all, JG — Sent from Mailbox for iPhone
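Why a term can fail while its CUI works, as Vijay explains at the top of this message, is easier to see in a small model. The sketch below is hypothetical and uses in-memory collections in place of the ytex dictionary lookup table and the concept graph; the CUIs are made-up placeholders, not real UMLS identifiers.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical model of the lookup: terms go through a small dictionary, CUIs bypass it. */
final class ConceptResolver {

    // UMLS concept unique identifiers are the letter C followed by seven digits.
    private static final Pattern CUI = Pattern.compile("C\\d{7}");

    private final Map<String, String> termToCui; // stand-in for the small dictionary lookup table
    private final Set<String> graphCuis;         // stand-in for the concepts in the loaded graph

    ConceptResolver(Map<String, String> termToCui, Set<String> graphCuis) {
        this.termToCui = termToCui;
        this.graphCuis = graphCuis;
    }

    /** Resolves user input to a CUI that exists in the concept graph, if any. */
    Optional<String> resolve(String input) {
        String trimmed = input.trim();
        String cui = CUI.matcher(trimmed).matches()
                ? trimmed                                          // CUI given: skip the term lookup
                : termToCui.get(trimmed.toLowerCase(Locale.ROOT)); // term given: dictionary may miss it
        return (cui != null && graphCuis.contains(cui)) ? Optional.of(cui) : Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> dictionary = new HashMap<>();
        dictionary.put("gallbladder function test", "C0000001"); // placeholder CUI
        Set<String> graph = new HashSet<>(Arrays.asList("C0000001", "C0000002"));
        ConceptResolver resolver = new ConceptResolver(dictionary, graph);

        System.out.println(resolver.resolve("alkaline phosphatase")); // term missing from the dictionary
        System.out.println(resolver.resolve("C0000002"));             // same graph, addressed by CUI
    }
}
```

In this model the graph holds more concepts than the dictionary can name, so a sparse term list does not imply a sparse graph.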
Re: Ctakes-data-vis
I think one major issue with ctakes in a web server is thread safety. I know that LVG is not thread safe, and it isn't clear what the status is on other components. On Tue, Apr 29, 2014 at 9:20 AM, John Green john.travis.gr...@gmail.comwrote: Pei - I meant as a web app, can we keep the credentials loaded and the resources (more importantly) loaded in memory accross runs? E.g. Treat it like a que with the machinery already loaded and fed? Im sure this can be done, I just run ctakes from CPE right now and havent toyed with this so wasnt sure. Where is this at? Is anyone developing the front end? I might be able to invest some time into the easily. Jg — Sent from Mailbox for iPhone On Wed, Apr 16, 2014 at 12:23 PM, Chen, Pei pei.c...@childrens.harvard.edu wrote: John, How we use the VM is up to us to decide. For an online demo, We can certainly load up cTAKES and it's resources. If it's a web app, we can prompt the user to enter umls credentials if they choose the umls resources? --Pei -Original Message- From: John Green [mailto:john.travis.gr...@gmail.com] Sent: Sunday, April 13, 2014 9:16 PM To: dev@ctakes.apache.org Subject: Re: Ctakes-data-vis Great! Ill try and fix that soon. Im back on the wards so time is slim. What are the next steps for the vm? For the demo site? Out of curiosity, would this allow resources to stay loaded and a kind of que be setup? Is there a solution that allows to do this now? That is, the resources stay loaded in mem, the umls auth stays current, and I could just pass content as it becomes available? Jg — Sent from Mailbox for iPhone On Sat, Apr 12, 2014 at 2:57 PM, andy mcmurry mcmurry.a...@gmail.com wrote: It looks great! The transitions are smooth and the hierarchical browsing is straightforward. The only edit I recommend I have is about spacing -- The information often exceeds the space of a single page. On Sat, Apr 5, 2014 at 12:13 PM, John Green john.travis.gr...@gmail.comwrote: Had to refresh my svn skills as its been years. 
As a result not much cleaning up got done Andy/Pei. The code is solid though and I sent four different ways to view the json up too; collapsable dendrogram is the most useful. The script could easily be re written to iterate through a directory as its in the form of a simple class. Also, it should take command line args. Im out of time this weekend, even for the ten minutes that would take, but I can do both next weekend. Let me know if its useful at all Andy or if you need tweaks on anything to make it useful for whatever demo u have in mind, id be happy to as time permits. Hope to make more significant contributions to this wonderful project sometime in the next year, Jg -- Sent from Mailbox for iPhone
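One common way to reconcile the two concerns in this thread (keeping resources loaded between documents, and components such as LVG not being thread safe) is to confine the pipeline to a single worker thread and feed it through a queue. The sketch below shows only that general pattern; it is not a cTAKES or UIMA API, and the Function<String, String> stands in for whatever actually runs the annotators.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical wrapper: loads a pipeline once and runs every request on one thread. */
final class SerializedPipeline implements AutoCloseable {

    // A single worker thread means the wrapped pipeline is never used concurrently.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final CompletableFuture<Function<String, String>> pipeline;

    SerializedPipeline(Supplier<Function<String, String>> loader) {
        // The expensive load is the first task on the worker, so it happens exactly once.
        this.pipeline = CompletableFuture.supplyAsync(loader, worker);
    }

    /** Queues one document; requests run in submission order after the load has finished. */
    CompletableFuture<String> process(String text) {
        return CompletableFuture.supplyAsync(() -> pipeline.join().apply(text), worker);
    }

    @Override
    public void close() {
        worker.shutdown();
    }

    public static void main(String[] args) {
        try (SerializedPipeline p = new SerializedPipeline(() -> {
            System.out.println("loading resources once");
            return text -> text.toUpperCase(); // stand-in for running the annotators
        })) {
            System.out.println(p.process("first note").join());
            System.out.println(p.process("second note").join());
        }
    }
}
```

A web front end could call process from many request threads; throughput is limited to one document at a time, which is the price of not relying on thread safety that has not been verified.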
Re: ytex merged into trunk
is org.hibernate.exception.SQLGrammarException: could not prepare statement
org.apache.ctakes.ytex.uima.annotators.SparseDataExporterTest: Unable to initialize group definition. Group resource name [classpath*:org/apache/ctakes/ytex/uima/beanRefContext.xml], factory key [ytexApplicationContext]; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'ytexApplicationContext' defined in URL [file:/C:/Spiffy/Dev/ApacheCtakesTrunk/ctakes-ytex-res/src/main/resources/org/apache/ctakes/ytex/uima/beanRefContext.xml]: Instantiation of bean failed; nested exception is org.springframework.beans.BeanInstantiationException: Could not instantiate bean class [org.springframework.context.support.ClassPathXmlApplicationContext]: Constructor threw exception; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'documentMapperService' defined in class path resource [org/apache/ctakes/ytex/uima/beans-uima-mapper.xml]: Invocation of init method failed; nested exception is org.hibernate.exception.SQLGrammarException: could not prepare statement
Tests run: 12, Failures: 0, Errors: 7, Skipped: 0
...
[INFO] Apache cTAKES Resources ctakes-ytex-res ... SUCCESS [ 1.089 s]
[INFO] Apache cTAKES YTEX SUCCESS [ 14.592 s]
[INFO] Apache cTAKES YTEX UIMA ... FAILURE [01:34 min]
[INFO] Apache cTAKES ctakes-clinical-pipeline SKIPPED
[INFO] Apache cTAKES YTEX Web SKIPPED
...
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12:test (default-test) on project ctakes-ytex-uima: There are test failures.
-Original Message- From: vijay garla [mailto:vnga...@gmail.com] Sent: Sunday, April 27, 2014 10:56 PM To: dev@ctakes.apache.org Subject: ytex merged into trunk Hello All, I have merged YTEX into trunk, will keep the branch around a little while then delete it.
Some non-ytex related changes (I will gladly change/revert if there are objections): * ctakes-temporal does not compile; from the email threads I take it that this is under development and the compilation failures are to be expected. I have commented out these modules from the root pom.xml so that ctakes builds. I wasn't able to use maven profiles to exclude ctakes-temporal, not sure why. * default max memory runctakesCPE/CVD.bat: I have increased this to 2G (was 1G). I do not think it is possible to run cTAKES with less memory (definitely not in a 64-bit jdk) when loading the assertion models (which is the 'default' pipeline). I will clean up the ytex docs shortly. Best, VJ
Re: YTEX install - one error after building
Hi Paula, UMLS.hbm.template.xml is a template used to generate a valid hibernate xml config file. If you have imported YTEX into eclipse, follow these guidelines: https://svn.apache.org/repos/asf/ctakes/branches/ytex/ctakes-ytex/README I believe the issue might be that you have validation enabled for XML; I believe you can disable it for specific files (like UMLS.hbm.template.xml). I am using Kepler, and it doesn't complain about UMLS.hbm.template.xml; I'm not sure if I tweaked my validator settings. -vj On Tue, Mar 25, 2014 at 6:16 PM, digital paula cybersat...@hotmail.com wrote: Hi VJ, As part of testing, I did a fresh install of cTAKES with YTEX and everything installed correctly but after building I got one error pertaining to this page, five lines down. https://svn.apache.org/repos/asf/ctakes/branches/ytex/ctakes-ytex-res/src/main/resources/org/apache/ctakes/ytex/umls/model/UMLS.hbm.template.xml The error is this line: <hibernate-mapping package="org.apache.ctakes.ytex.umls.model" schema="@umls.schema@" @filter.umls.catalog@> Using Eclipse Juno, the error states: Element type "hibernate-mapping" must be followed by either attribute specifications, ">" or "/>". I tried using /> instead of > and putting it all on one line instead of two but can't seem to fix it. Also, I was about to install the sectionizer separately as a module but I see that YTEX already has a sectionizer (SegmentRegexSectionizer) so I look forward to exploring it further. Regards, Paula Date: Thu, 20 Mar 2014 14:08:32 -0400 Subject: Re: YTEX Doc in cwiki From: vnga...@gmail.com To: dev@ctakes.apache.org I plan to fix all the links. I have not yet moved the scripts for the semantic similarity benchmark to cTAKES, so I dropped that from the cTAKES semantic similarity docs. When those scripts get moved to cTAKES, I'll update the docs. On Thu, Mar 20, 2014 at 12:33 PM, Masanz, James J. masanz.ja...@mayo.edu wrote: hi vijay, I have just skimmed a few sections so far.
the page has links at the top to google docs pages and then links to our web pages (the children pages) at the bottom. Is your intent to remove the first 3 links once things are finalized? some of the examples on the Semantic+Similarity page use cd CTAKES_HOME but later use %CTAKES_HOME% so it looks like you meant cd %CTAKES_HOME% I didn't see anything about the Similarity Benchmark on the new pages. Is that still part of ytex? -- james -Original Message- From: vijay garla [mailto:vnga...@gmail.com] Sent: Sunday, March 16, 2014 8:53 PM To: dev@ctakes.apache.org Subject: YTEX Doc in cwiki Hello All, I've made a first cut at moving and updating the YTEX docs over from google code to the cTAKES confluence site. This is a first cut, and I'm trying to keep the YTEX docs separated, as it is not yet in trunk/released, and I don't want to mess up any existing docs. see https://cwiki.apache.org/confluence/display/CTAKES/YTEX+3.2 Best, VJ
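On the UMLS.hbm.template.xml question earlier in this thread: the file is a template, so a token such as @filter.umls.catalog@ sitting where an attribute belongs is not valid XML until it has been substituted, which is why an XML validator flags it. The sketch below shows that kind of @token@ substitution in general terms. It is not the YTEX build logic; the class is hypothetical, and the replacement values (including expanding the catalog token to a whole attribute) are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of @token@ substitution applied to a template line. */
final class TemplateTokens {

    // Matches tokens such as @umls.schema@ and captures the name between the @ signs.
    private static final Pattern TOKEN = Pattern.compile("@([A-Za-z0-9_.]+)@");

    /** Replaces every @name@ with its value; fails loudly if a token has no value. */
    static String render(String template, Map<String, String> values) {
        Matcher m = TOKEN.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("no value for token @" + m.group(1) + "@");
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String template = "<hibernate-mapping package=\"org.apache.ctakes.ytex.umls.model\""
                + " schema=\"@umls.schema@\" @filter.umls.catalog@>";
        Map<String, String> values = new HashMap<>();
        values.put("umls.schema", "umls");                    // assumed schema name
        values.put("filter.umls.catalog", "catalog=\"umls\""); // assumed expansion of the token
        System.out.println(render(template, values));
    }
}
```

The rendered line is well-formed XML, whereas the template line is not, so excluding the template from validation (as Vijay suggests) is reasonable rather than editing it.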
Re: YTEX Doc in cwiki
I plan to fix all the links. I have not yet moved the scripts for the semantic similarity benchmark to cTAKES, so I dropped that from the cTAKES semantic similarity docs. When those scripts get moved to cTAKES, I'll update the docs. On Thu, Mar 20, 2014 at 12:33 PM, Masanz, James J. masanz.ja...@mayo.eduwrote: hi vijay, I have just skimmed a few sections so far. the page has links at the top to google docs pages and then links to our web pages (the children pages) at the bottom. Is your intent to remove the first 3 links once things are finalized? some of the examples on the Semantic+Similarity page use cd CTAKES_HOME but later use %CTAKES_HOME% so it looks like you meant cd %CTAKES_HOME% I didn't see anything about the Similarity Benchmark on the new pages. Is that still part of ytex? -- james -Original Message- From: vijay garla [mailto:vnga...@gmail.com] Sent: Sunday, March 16, 2014 8:53 PM To: dev@ctakes.apache.org Subject: YTEX Doc in cwiki Hello All, I've made a first cut at moving and updating the YTEX docs over from google code to the cTAKES confluence site. This is a first cut, and I'm trying to keep the YTEX docs separated, as it is not yet in trunk/released, and I don't want to mess up any existing docs. see https://cwiki.apache.org/confluence/display/CTAKES/YTEX+3.2 Best, VJ
YTEX Doc in cwiki
Hello All, I've made a first cut at moving and updating the YTEX docs over from google code to the cTAKES confluence site. This is a first cut, and I'm trying to keep the YTEX docs separated, as it is not yet in trunk/released, and I don't want to mess up any existing docs. see https://cwiki.apache.org/confluence/display/CTAKES/YTEX+3.2 Best, VJ
Re: YTEX LVG Fix
Hi John, Thanks for this. I've updated the YTEXPipeline and fixed the lvg paths in SetupAUIFirstWord. If you want to re-run SetupAUIFirstWord (not necessary unless you are using the stemmed words for dictionary lookup), just svn update, rebuild ctakes-ytex-uima, and copy the jar to the lib dir of your ctakes install. Best, VJ On Mon, Feb 10, 2014 at 6:25 PM, John David Osborne (Campus) ozb...@uab.edu wrote: These were the changes I made to get the YTEX pipeline working with LVG (2008). It looks like there were just a couple of spots with some old hard-coded paths in SetupAUIFirstWord.java that were appropriate to the old ytex directory structure. For now I have just swapped them out to fit with the new directory structure, but I suppose the correct fix may be to extract them out somewhere... In any case I don't have write privileges, so someone else may want to fix this (Vijay?). I also included the YTEXPipeline.xml descriptor file I fixed in case anybody needs it. -John
Re: YTEX cTAKES 3.1.1 ready
I believe it is worth migrating to trunk. Note that the sentence detector is also complementary - the existing ctakes sentence detector is unchanged - users can choose which sentence detector to use. There are changes to assertion dependency parsing to support sentences without newlines, and that works with both sentence detectors. I believe cTAKES absolutely has to support sentences with newlines within them - I have yet to run across clinical text from a real EMR where newlines represent the end of a sentence - the changes to assertion dependency parsing will have to be done at some point. -vj On Thu, Feb 6, 2014 at 10:19 AM, Chen, Pei pei.c...@childrens.harvard.eduwrote: VJ, Aside from the changes to the existing cTAKES code (sentence detector, etc.) [which we could leave out if it's still being debated], Do you think it's worth migrating the ytex code to trunk at this point? As you mentioned earlier, it's largely complementary. [I was just thinking of saving effort to maintain the separate branch and for simplicity for dev...] --Pei -Original Message- From: vijay garla [mailto:vnga...@gmail.com] Sent: Wednesday, February 05, 2014 9:30 PM To: ytex-us...@googlegroups.com; ctakes-...@incubator.apache.org; vlad.valtchi...@gmail.com Subject: Re: YTEX cTAKES 3.1.1 ready Hi Vlad, I Updated the umls install guide; see https://code.google.com/p/ytex/wiki/UMLS_SQL_SERVER_3_1 I would prefer to add the docs in the ctakes confluence, but as far as I can tell, I don't have write access there - can somebody give me write privileges on the ctakes confluence site? There was a bug in the umls install; copy https://svn.apache.org/repos/asf/ctakes/branches/ytex/ctakes- ytex/scripts/data/build.xmlover the corresponding file in your ctakes-3.1.2 install (CTAKES_HOME\bin\ctakes-ytex\scripts\data) and you should be set. The import is currently running on the UMLS 2013AA (I assume this will complete without issues as long as the umls schema hasn't changed from 2012). 
what trial and error did you have to go through to build the distro? -vj On Wed, Feb 5, 2014 at 5:33 PM, vijay garla vnga...@gmail.com wrote: Hi Vlad, sorry that the instructions aren't clear. re 1) What I am trying to say is install apache-ctakes-3.2.0-snapshot as usual (this is unchanged from 3.1.1). After that you still have to apply the lib and resources (these are things that cannot be distributed via apache). re 2) Yes, I need to update those docs. Hopefully will get to that at some point. However, I assume you already have a UMLS DB (also assume SQL Server). If you can't/don't want to use your existing umls DB, please tell me. The I'll priortize upgrading the doc on importing the umls tables (the scripts are there). best, VJ On Wed, Feb 5, 2014 at 4:44 PM, vlad.valtchi...@gmail.com wrote: Hi VJ- so, with trial and error were able to make the distribution and now have the apache-ctakes-3.1.2-SNAPSHOT-bin.zip archive. Here's what's unclear. 1. Is now this the only (combined) thing that you need for ctakes 3.1.1 + Ytex? the current documentation (https://code.google.com/p/yte x/wiki/Installation_cTAKES_3_1?ts=1388793998updated=Instal lation_cTAKES_3_1) which most probably is outdated, talks about installing cTakes 3.1.1 first and then applying 2 SNAPSHOT archives (downloadable) , lib and resources. This is a confusion point. 2. The directions to import UMLS subset are then outdated as well. Maybe one should use the old version (ctakes 2.5 and ytex 0.8) to import the RRF files for the UMLS subset and then just use the resulting db. Thoughts? Thanks, Vlad Valtchinov Brigham Rad On Thursday, January 30, 2014 5:17:43 PM UTC-5, vijay garla wrote: Hi Vlad, All of ytex has been moved into ctakes, it is currently in a branch ( https://svn.apache.org/repos/asf/ctakes/branches/ytex). You don't have to install ytex-0.8 - instead you will have to build and install from the ytex branch to create your own distribution. Steps 2 3 are correct. 
Although it is a pain, if you have the jdk, maven, and svn, you can easily build your own distro:
* open a command prompt
* make sure jdk, maven, and svn are in your path
* cd to some directory where you want to check stuff out (I like c:\temp)
* run the following commands:
rmdir /s /q ctakes
svn co https://svn.apache.org/repos/asf/ctakes/branches/ytex ctakes
cd ctakes
mvn clean install -DskipTests
And you will have the ctakes (with ytex) distro in ctakes\ctakes-distribution\target\apache-ctakes-3.1.2-SNAPSHOT-bin.zip What is the process for getting the ytex branch merged into trunk? As I mentioned, there are very few changes to other ctakes classes/types - this should be completely complementary
Re: YTEX cTAKES 3.1.1 ready
Hi Vlad, I updated the umls install guide; see https://code.google.com/p/ytex/wiki/UMLS_SQL_SERVER_3_1 I would prefer to add the docs in the ctakes confluence, but as far as I can tell, I don't have write access there - can somebody give me write privileges on the ctakes confluence site? There was a bug in the umls install; copy https://svn.apache.org/repos/asf/ctakes/branches/ytex/ctakes-ytex/scripts/data/build.xml over the corresponding file in your ctakes-3.1.2 install (CTAKES_HOME\bin\ctakes-ytex\scripts\data) and you should be set. The import is currently running on the UMLS 2013AA (I assume this will complete without issues as long as the umls schema hasn't changed from 2012). What trial and error did you have to go through to build the distro? -vj On Wed, Feb 5, 2014 at 5:33 PM, vijay garla vnga...@gmail.com wrote: Hi Vlad, sorry that the instructions aren't clear. re 1) What I am trying to say is install apache-ctakes-3.2.0-snapshot as usual (this is unchanged from 3.1.1). After that you still have to apply the lib and resources (these are things that cannot be distributed via apache). re 2) Yes, I need to update those docs. Hopefully I will get to that at some point. However, I assume you already have a UMLS DB (also assume SQL Server). If you can't/don't want to use your existing umls DB, please tell me. Then I'll prioritize upgrading the doc on importing the umls tables (the scripts are there). best, VJ On Wed, Feb 5, 2014 at 4:44 PM, vlad.valtchi...@gmail.com wrote: Hi VJ- so, with trial and error we were able to make the distribution and now have the apache-ctakes-3.1.2-SNAPSHOT-bin.zip archive. Here's what's unclear. 1. Is now this the only (combined) thing that you need for ctakes 3.1.1 + Ytex?
the current documentation (https://code.google.com/p/yte x/wiki/Installation_cTAKES_3_1?ts=1388793998updated=Instal lation_cTAKES_3_1) which most probably is outdated, talks about installing cTakes 3.1.1 first and then applying 2 SNAPSHOT archives (downloadable) , lib and resources. This is a confusion point. 2. The directions to import UMLS subset are then outdated as well. Maybe one should use the old version (ctakes 2.5 and ytex 0.8) to import the RRF files for the UMLS subset and then just use the resulting db. Thoughts? Thanks, Vlad Valtchinov Brigham Rad On Thursday, January 30, 2014 5:17:43 PM UTC-5, vijay garla wrote: Hi Vlad, All of ytex has been moved into ctakes, it is currently in a branch ( https://svn.apache.org/repos/asf/ctakes/branches/ytex). You don't have to install ytex-0.8 - instead you will have to build and install from the ytex branch to create your own distribution. Steps 2 3 are correct. Although it is a pain, if you have the jdk, maven, and svn, you can easily build your own distro: * open a command prompt * make sure jdk, maven, and svn are in your path * cd to some directory where you want to check stuff out (I like c:\temp) * run the following commands rmdir /s /q ctakes svn co https://svn.apache.org/repos/asf/ctakes/branches/ytex ctakes cd ctakes mvn clean install -DskipTests And you will have the ctakes (with ytex) distro in ctakes\ctakes-distribution\target\apache-ctakes-3.1.2-SNAPSHOT-bin.zip What is the process for getting the ytex branch merged into trunk? As I mentioned, there are very few changes to other ctakes classes/types - this should be completely complementary and not affect any existing ctakes functionality. -vj On Thu, Jan 30, 2014 at 4:56 PM, vlad.va...@gmail.com wrote: Hi VJ-- this is great!! Thanks for all the hard work on it! We're starting to look into the new install. For now we're trying the binaries out. There were these questions about the proper install steps: 1. Do we first install ytex-0.8 2. 
Then install the new cTakes 3.1.1 instance and also apply the SNAPSHOT lib and resources zips 3. Work our way to install the UMLS ontologies in the db Its is not entirely clear from the new document ( https://code.google.com/p/ytex/wiki/Installation_cTAKES_3_ 1?ts=1388793998updated=Installation_cTAKES_3_1) if there's still need to install ytex-0.8, or YTEX has been entirely merged into cTakes? If the last statement is correct, there are missing parts in i.e the UMLS install steps that are linked from the new ctakes 3.1.1 document. Thanks, vlad On Friday, January 3, 2014 10:21:52 PM UTC-5, vijay garla wrote: Hello All, I have finished an initial cut at the port of YTEX to cTAKES 3.1.1. Most of the YTEX functionality has been ported and integrated with cTAKES, and I've tested with MySQL and MS SQL Server (oracle tests pending). Most of the changes were made in new projects - very little existing cTAKES code has been modified. The only non-trivial changes are in /ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api - here I modified CharacterOffsetToLineTokenConverterCtakesImpl
Re: YTEX cTAKES 3.1.1 ready
Hi Vlad,

All of ytex has been moved into ctakes; it is currently in a branch (https://svn.apache.org/repos/asf/ctakes/branches/ytex). You don't have to install ytex-0.8 - instead you will have to build and install from the ytex branch to create your own distribution. Steps 2 and 3 are correct.

Although it is a pain, if you have the jdk, maven, and svn, you can easily build your own distro:
* open a command prompt
* make sure jdk, maven, and svn are in your path
* cd to some directory where you want to check stuff out (I like c:\temp)
* run the following commands

    rmdir /s /q ctakes
    svn co https://svn.apache.org/repos/asf/ctakes/branches/ytex ctakes
    cd ctakes
    mvn clean install -DskipTests

And you will have the ctakes (with ytex) distro in ctakes\ctakes-distribution\target\apache-ctakes-3.1.2-SNAPSHOT-bin.zip

What is the process for getting the ytex branch merged into trunk? As I mentioned, there are very few changes to other ctakes classes/types - this should be completely complementary and not affect any existing ctakes functionality.

-vj

On Thu, Jan 30, 2014 at 4:56 PM, vlad.valtchi...@gmail.com wrote:

Hi VJ-- this is great!! Thanks for all the hard work on it!

We're starting to look into the new install. For now we're trying the binaries out. There were these questions about the proper install steps:

1. Do we first install ytex-0.8
2. Then install the new cTakes 3.1.1 instance and also apply the SNAPSHOT lib and resources zips
3. Work our way to install the UMLS ontologies in the db

It is not entirely clear from the new document (https://code.google.com/p/ytex/wiki/Installation_cTAKES_3_1?ts=1388793998&updated=Installation_cTAKES_3_1) whether there is still a need to install ytex-0.8, or whether YTEX has been entirely merged into cTakes. If the latter is correct, there are missing parts, e.g. in the UMLS install steps that are linked from the new ctakes 3.1.1 document.
Thanks,
vlad

On Friday, January 3, 2014 10:21:52 PM UTC-5, vijay garla wrote:

Hello All,

I have finished an initial cut at the port of YTEX to cTAKES 3.1.1. Most of the YTEX functionality has been ported and integrated with cTAKES, and I've tested with MySQL and MS SQL Server (oracle tests pending).

Most of the changes were made in new projects - very little existing cTAKES code has been modified. The only non-trivial changes are in /ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api - here I modified CharacterOffsetToLineTokenConverterCtakesImpl and SingleDocumentProcessorCtakes to deal with newlines within sentences correctly.

Can somebody take a look at the changes in the ytex branch? I believe that the branch https://svn.apache.org/repos/asf/ctakes/branches/ytex is ready to be merged into ctakes trunk, but would like other users to test it as well.

Questions:
* How can I distribute the ctakes binary distribution to ytex users before the merge? Can we make the branch build available somewhere? The binary distribution is too large to host on the ytex google code site (max 200 MB)
* Non-ASF libraries - I have segregated these out into their own zip file that can be distributed via sourceforge. As a stopgap, I can upload this to the ytex google code site, but would prefer to upload to sourceforge.
* UMLS Derivatives - Ditto for these - would like to move to sourceforge.
* Documentation - How can I update the confluence docs? I would migrate the documentation from the google code website.

Here are the installation instructions (putting the wagon in front of the horse ...): https://code.google.com/p/ytex/wiki/Installation_cTAKES_3_1?ts=1388793998&updated=Installation_cTAKES_3_1

Best,
VJ
Re: sentence detector newline behavior
For clarity, I'd like to stress that the opennlp sentence model distributed with ctakes today does 'work' with sentences that span newlines - as I understand it, this model ignores newline tokens (or newlines are not provided as features to that model). I believe the improvements Tim and others are suggesting are for a new sentence model + feature representation that takes advantage of newlines as features.

Whatever we do, I believe we need backwards compatibility - those who are using the current sentence model may need to continue using it. To that end:
* If we upgrade to the newest version of opennlp, will the old model work (and produce the same results)?
* If a contributor trains a new model that uses a different feature representation, I believe that should go into a new Sentence Detector AnalysisEngine (or the same AE but with different configuration parameters), so users have a choice between the old and the new.

-vj

On Mon, Jan 27, 2014 at 1:09 PM, digital paula cybersat...@hotmail.com wrote:

Tim,

I just had to chime in on a comment you made. My deadline has been extended a bit on my pressing issue but I do intend to get back to testing per VJ's fix or maybe another fix is in the works based on latest emails...I need to read them again since a lot has been stated on the issue.

Okay, as a new user (working w/cTAKES since October) I have never thought what you had stated: "And I think this is the kind of thing that can leave new users scratching their heads and doubting our overall competence." Yeah, the sentence-spanning-newline issue was a problem so I just brought attention to it by my post of inquiry earlier this month on VJ's fix from last month and worked around it with treating narrative as one string. Anyone who's looked at the code would appreciate and acknowledge that cTAKES is a powerful and complex application. I'm overall impressed with it and I intend to continue to use it, improve it, and grow with it.
I've been delving deeper into cTAKES on the machine learning aspect...I'm struggling a bit with it and if anything I scratch my head and doubt my competence. ;-)

Regards,
Paula

Date: Mon, 27 Jan 2014 09:52:00 -0500
From: timothy.mil...@childrens.harvard.edu
To: dev@ctakes.apache.org
Subject: Re: sentence detector newline behavior

OK, with the most recent version I am able to replicate the performance I was getting before. Thanks a lot Jörn! Assuming this is in the next incremental release of opennlp, how quickly can we get a re-trained model into cTAKES?

I heard from a researcher at AMIA who tried cTAKES and, because of this bug in the way we handle sentences, was trying to find an outside sentence detector as a preprocess to cTAKES, and frankly that is insane. We should be able to get something this simple right. And I think this is the kind of thing that can leave new users scratching their heads and doubting our overall competence.

James, I believe you are usually the one who rebuilds the models? What would be the best way to incorporate the data I have that has some instances of non-sentence terminating newlines?

Tim

On 01/27/2014 06:10 AM, Jörn Kottmann wrote:

On 01/26/2014 11:29 PM, Miller, Timothy wrote:

Yes, this fixes the whitespace sentence issue but the evaluation issue remains. I believe the problem is in SentenceSampleStream, where in the following block the whitespace trim happens before the LF character is replaced with the \n character. So test sentences that ended with LF will be one character longer than they should be.

    sentence = sentence.trim();
    sentence = replaceNewLineEscapeTags(sentence);
    sentencesString.append(sentence);
    int end = sentencesString.length();
    sentenceSpans.add(new Span(begin, end));
    sentencesString.append(' ');

Yes, that must be the issue. During training the new line is included in the span, and during detection the white space remover creates a span without the new line char.
I suggest that the evaluator just ignores white space differences between sentences. My test case then has the expected performance numbers. What do you think? Anyway, I committed the change. Please give it a try. Jörn
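
Jörn's suggestion that the evaluator ignore whitespace differences between sentences can be illustrated with a small sketch. This is not the change that was committed to OpenNLP; the class and method names below are hypothetical, and spans are modelled as plain `[begin, end)` offset pairs rather than OpenNLP's `Span` type.

```java
/**
 * Illustrative sketch only: compares two sentence spans while ignoring
 * leading and trailing whitespace. All names here are hypothetical; this
 * is not the code committed to OpenNLP.
 */
final class WhitespaceInsensitiveSpans {

    /** Shrinks [begin, end) so that it neither starts nor ends on whitespace. */
    static int[] trim(CharSequence text, int begin, int end) {
        while (begin < end && Character.isWhitespace(text.charAt(begin))) {
            begin++;
        }
        while (end > begin && Character.isWhitespace(text.charAt(end - 1))) {
            end--;
        }
        return new int[] {begin, end};
    }

    /** True if both spans cover the same text once surrounding whitespace is dropped. */
    static boolean sameSentence(CharSequence text, int[] reference, int[] predicted) {
        int[] a = trim(text, reference[0], reference[1]);
        int[] b = trim(text, predicted[0], predicted[1]);
        return a[0] == b[0] && a[1] == b[1];
    }
}
```

With a comparison like this, a reference span that still includes the trailing newline and a predicted span that stops just before it count as the same sentence, which is the mismatch Tim describes above.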
Re: sentence detector newline behavior
The only rule I know of is that cTAKES (prior to ytex integration) always forces a sentence break at a newline. This was because the clinical notes cTAKES originally processed never had newlines in the middle of a sentence, but did need sentence breaks to occur at end of sentence for good negation detection on those notes. I think Guergana earlier mentioned other EMRs also have this need, but it seems to not be ubiquitous.

From others' posts, it seems that we could use an option in cTAKES to turn off this forcing of sentence breaks at newlines (or depending on how you look at it, an option to turn on the forcing of sentence breaks if we change the default behavior).

I think we (cTAKES) need to decide the following:
- do we want to do this for entire notes, or would it be worth it to have it be on a section-by-section basis.
- what do we make the default behavior - to force or not to force newlines to be sentence breaks
- what data (that contains newlines) will we use for training the sentence detector

Regardless of those answers, I think OpenNLP support for including newlines in training data would be valuable for those others who have sentences that span lines. And having an option on OpenNLP to always break at newline would be useful for at least some cTAKES users (and we could remove the cTAKES code that does that).

-- James

-----Original Message-----
From: dev-return-2390-Masanz.James=mayo@ctakes.apache.org [mailto:dev-return-2390-Masanz.James=mayo@ctakes.apache.org] On Behalf Of Jörn Kottmann
Sent: Tuesday, January 21, 2014 4:29 AM
To: dev@ctakes.apache.org
Subject: Re: sentence detector newline behavior

Yes, exactly, OPENNLP-602 is about training a sentence detector model which can use a new line as an end-of-sentence character.

In case you have certain rules to split sentences we should have a look at them. The Sentence Detector could be extended to support a user provided rule based splitter.
If there is an interest in that we could probably get it into 1.6.0 as well.

Jörn

On 01/20/2014 10:02 PM, Chen, Pei wrote:

I presume Joern was suggesting that if he supports new lines in the opennlp SentenceDetector (either part of the trained models or post processing with some rules?) cTAKES will be able to use it out of the box and we should be able to remove any additional custom logic that we currently have - which seems like a good idea. [but when to use within cTAKES individual components such as negation might be another discussion?]

--Pei

On Jan 20, 2014, at 12:46 PM, vijay garla vnga...@gmail.com wrote:

The sentence detection opennlp model used by ctakes does not split sentences at newlines - there is additional logic in the ctakes sentence splitter that does this (and an alternative impl that doesn't is in the ytex branch). Afaik no retraining / change to the feature representation is necessary.

Vj

On Monday, January 20, 2014, Jörn Kottmann kottm...@gmail.com wrote:

Hi all,

currently I have quite a bit of time to work on OpenNLP, and would like to help you out with this issue. Here is the follow up issue for this change: https://issues.apache.org/jira/browse/OPENNLP-602

I am still trying to figure out what would be the best option to implement this. In the training data a user could just use a special tag to identify the chars. Instead of NEWLINE it might be better to use CR and LF to encode these two chars in the training data. Any thoughts?

I am planning to release this as part of OpenNLP 1.6.0.

Thanks,
Jörn

On 05/22/2013 02:03 PM, Jörn Kottmann wrote:

On 05/22/2013 01:17 PM, Miller, Timothy wrote:

That's awesome! It might be worth trying at least. How does the training process change? Previously the training data would be one sentence per line, but with newlines as possible mid-sentence characters that could be trouble, is there a new representation for training data? Or would we have to use the training api?
Good point, yes that will be a problem with the default training format, but it shouldn't be hard to solve. In the format itself we could define a new line tag, e.g. NEWLINE, to mark new lines.

As a hack to make it work with 1.5.3 you could instead use a special char as a replacement for the new line char. When you pass the text down to the sentence detector a simple string replace could be used to convert all new line chars to the special new line marker char.

If things work out for you performance wise as well we will just integrate it properly into OpenNLP for the next release.

Could you produce a sentence detector training file with a new line marker char? You should try to pick a char you can also pass in on a terminal, otherwise you have to use the API to train the model. The built-in cross validation could be used to evaluate the performance.

Jörn
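
The string-replace hack Jörn describes for 1.5.3 might look roughly like the sketch below. The marker character (`~`) and the class and method names are assumptions made for illustration; the thread does not say which character was actually used, and any real choice has to be a character that never occurs in the notes themselves.

```java
/**
 * Illustrative sketch of the newline-marker workaround. MARKER and all
 * names are hypothetical choices, not something defined by OpenNLP or cTAKES.
 */
final class NewlineMarker {

    /** Assumed marker char: typeable on a terminal and absent from the input. */
    static final char MARKER = '~';

    /** Replaces every LF with the marker before text goes to the sentence detector. */
    static String encode(String text) {
        if (text.indexOf(MARKER) >= 0) {
            throw new IllegalArgumentException("text already contains the marker char");
        }
        // One char is swapped for one char, so character offsets stay unchanged.
        return text.replace('\n', MARKER);
    }

    /** Restores the original newlines after sentence detection. */
    static String decode(String text) {
        return text.replace(MARKER, '\n');
    }
}
```

Because the replacement is one character for one character, sentence spans computed on the encoded text can be applied directly to the original text. The sketch handles LF only; CR would need its own marker if the notes use CRLF line endings.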
Re: svn commit: r1551805 - /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
The issue is indeed the sentence splitter - negation is limited to words within the sentence, and if newlines are considered sentence boundaries, it doesn't work properly (splitting on newlines breaks many other things as well). The YTEX branch includes a sentence splitter that does not automatically split sentences on newlines.

best,
vj

On Wed, Jan 15, 2014 at 10:03 AM, Masanz, James J. masanz.ja...@mayo.edu wrote:

Hi Paula,

The sentence detector in 3.1.0 and 3.1.1 (and previous releases) assumes sentences don't cross line boundaries. OpenNLP is used to find sentence breaks, but then if newlines are found, those are also set (within cTAKES, not OpenNLP) to be sentence breaks.

(just FYI I haven't had a chance to look at the ytex branch, which the subject commit is about)

-- James

-----Original Message-----
From: dev-return-2375-Masanz.James=mayo@ctakes.apache.org [mailto:dev-return-2375-Masanz.James=mayo@ctakes.apache.org] On Behalf Of digital paula
Sent: Tuesday, January 14, 2014 10:25 PM
To: dev@ctakes.apache.org
Subject: RE: svn commit: r1551805 - /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java

Hello cTAKES Developer Community,

I'm a little behind on reading posts...this one is from last month. I think this issue is already addressed in current release? I'm still running the previous release...3.1.0.

I just noticed something interesting, the negation didn't take when it is on a different line. I just removed all carriage returns from narratives and negation picked it up as long as it's treated as one long string. To better explain what I mean, two narrative comments below.

1. patient did not have diabetes
2. patient did not have diabetes

Number 1 above got negated but number 2 did not.

This might be related to the issue w/the sectionizer. I noticed that when I treated the narrative as one string the sectionizer never crashes with the NPE.
Well the sectionizer is of no point if narrative is as one string but it's helping me pinpoint the problem.

Regards,
Paula

Date: Thu, 19 Dec 2013 11:04:57 -0500
Subject: Re: FW: svn commit: r1551805 - /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
From: vnga...@gmail.com
To: dev@ctakes.apache.org

Hi Pei,

I'm not sure if that would solve the problem: the change in the ytex branch causes newlines to be ignored (i.e. not treated as a token). Trunk's sentence splitter splits sentences on newlines, so newlines would never be found in a sentence. However, if we had a reproducer we could check it fairly easily in the ytex branch.

Best,
VJ

On Thu, Dec 19, 2013 at 10:15 AM, Chen, Pei pei.c...@childrens.harvard.edu wrote:

Vj,

Do you think this is what was causing the NPE's [1]? If so, shall we make the same fix in trunk?

--Pei

[1] http://mail-archives.apache.org/mod_mbox/ctakes-dev/201309.mbox/%3C924DE05C19409B438EB81DE683A942D9105A93CB%40CHEXMBX1A.CHBOSTON.ORG%3E

-----Original Message-----
From: vjapa...@apache.org [mailto:vjapa...@apache.org]
Sent: Tuesday, December 17, 2013 9:15 PM
To: comm...@ctakes.apache.org
Subject: svn commit: r1551805 - /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java

Author: vjapache
Date: Wed Dec 18 02:14:13 2013
New Revision: 1551805

URL: http://svn.apache.org/r1551805
Log: add support for sentences that contain newline tokens.
Modified:
    ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java

Modified: ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java
URL: http://svn.apache.org/viewvc/ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java?rev=1551805&r1=1551804&r2=1551805&view=diff
==============================================================================
--- ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java (original)
+++ ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java Wed Dec 18 02:14:13 2013
@@ -32,8 +32,8 @@ import org.apache.uima.jcas.tcas.Annotat
 import org.mitre.medfacts.i2b2.api.ApiConcept;
 import org.mitre.medfacts.zoner.CharacterOffsetToLineTokenConverter;
 import
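
The behavior James describes in this thread (OpenNLP proposes sentence spans, then cTAKES additionally breaks them at newlines) and the alternative in the ytex branch (no forced break) can be pictured as a single post-processing step with a switch. The sketch below is an assumption-laden illustration: the class name, the `forceBreaks` flag, and the use of `int[]` offset pairs are all hypothetical, and it is not the code in either cTAKES trunk or the ytex branch.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch only: optionally splits detected sentence spans at
 * newlines. Names and the forceBreaks flag are hypothetical.
 */
final class NewlineSentenceBreaks {

    /**
     * @param text        the document text
     * @param spans       sentence spans as [begin, end) offsets from a detector
     * @param forceBreaks when true, every CR or LF inside a span ends a sentence
     * @return the spans unchanged, or split at newlines with empty pieces dropped
     */
    static List<int[]> apply(String text, List<int[]> spans, boolean forceBreaks) {
        if (!forceBreaks) {
            return new ArrayList<>(spans);
        }
        List<int[]> out = new ArrayList<>();
        for (int[] span : spans) {
            int start = span[0];
            for (int i = span[0]; i < span[1]; i++) {
                char c = text.charAt(i);
                if (c == '\n' || c == '\r') {
                    if (i > start) {
                        out.add(new int[] {start, i});
                    }
                    start = i + 1; // skip the newline character itself
                }
            }
            if (span[1] > start) {
                out.add(new int[] {start, span[1]});
            }
        }
        return out;
    }
}
```

Assuming the line break in Paula's second example fell between "not" and "have" (the archive has lost the original line break, so that placement is a guess), forcing breaks would put "not" and "diabetes" in different sentences, which would explain why a sentence-scoped negation step misses it.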
Re: sentence splitter forks/branches
It is unfortunately not that trivial, as allowing newlines within sentences requires changes to the assertion and dependency parser modules. If you're not using those AEs you could theoretically build the ytex branch, and just add ctakes-ytex-uima.jar and ctakes-ytex-uima\desc\analysis_engine\SentenceDetectorAnnotator.xml to your existing ctakes install (haven't tried it, but it should work).

-vj

On Wed, Jan 15, 2014 at 1:57 PM, Lingren, Todd todd.ling...@cchmc.org wrote:

I have a general question about forks, specifically the YTEX branch that Vijay mentions. If I wanted to implement just the sentence splitter from YTEX into a currently existing 3.1 install, how would I do that? Is it possible? Or do I have to switch over completely to run from YTEX branch?

Todd Lingren
Biomedical Informatics
Cincinnati Children's Hospital
todd.ling...@cchmc.org
513-803-9032

-----Original Message-----
From: vijay garla [mailto:vnga...@gmail.com]
Sent: Wednesday, January 15, 2014 11:34 AM
To: dev@ctakes.apache.org
Subject: Re: svn commit: r1551805 - /ctakes/branches/ytex/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api/CharacterOffsetToLineTokenConverterCtakesImpl.java

The issue is indeed the sentence splitter - negation is limited to words within the sentence, and if newlines are considered sentence boundaries, it doesn't work properly (splitting on newlines breaks many other things as well). The YTEX branch includes a sentence splitter that does not automatically split sentences on newlines.

best,
vj

On Wed, Jan 15, 2014 at 10:03 AM, Masanz, James J. masanz.ja...@mayo.edu wrote:

Hi Paula,

The sentence detector in 3.1.0 and 3.1.1 (and previous releases) assumes sentences don't cross line boundaries. OpenNLP is used to find sentence breaks, but then if newlines are found, those are also set (within cTAKES, not OpenNLP) to be sentence breaks.
Re: YTEX cTAKES 3.1.1 ready
sisEngineFactory_impl.java:94)
    at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
    at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
    at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:387)
    at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:254)
    at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:431)
    at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:375)
    at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:185)
    at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
    at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
    at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
    at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:354)
    at org.apache.uima.tools.cvd.MainFrame.setupAE(MainFrame.java:1484)
    at org.apache.uima.tools.cvd.MainFrame.loadAEDescriptor(MainFrame.java:477)
    at org.apache.uima.tools.cvd.control.AnnotatorOpenEventHandler.actionPerformed(AnnotatorOpenEventHandler.java:52)
    at javax.swing.AbstractButton.fireActionPerformed(Unknown Source)
    at javax.swing.AbstractButton$Handler.actionPerformed(Unknown Source)
    at javax.swing.DefaultButtonModel.fireActionPerformed(Unknown Source)
    at javax.swing.DefaultButtonModel.setPressed(Unknown Source)
    at javax.swing.AbstractButton.doClick(Unknown Source)
    at javax.swing.plaf.basic.BasicMenuItemUI.doClick(Unknown Source)
    at javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(Unknown Source)
    at java.awt.Component.processMouseEvent(Unknown Source)
    at javax.swing.JComponent.processMouseEvent(Unknown Source)
    at java.awt.Component.processEvent(Unknown Source)
    at java.awt.Container.processEvent(Unknown Source)
    at java.awt.Component.dispatchEventImpl(Unknown Source)
    at java.awt.Container.dispatchEventImpl(Unknown Source)
    at java.awt.Component.dispatchEvent(Unknown Source)
    at java.awt.LightweightDispatcher.retargetMouseEvent(Unknown Source)
    at java.awt.LightweightDispatcher.processMouseEvent(Unknown Source)
    at java.awt.LightweightDispatcher.dispatchEvent(Unknown Source)
    at java.awt.Container.dispatchEventImpl(Unknown Source)
    at java.awt.Window.dispatchEventImpl(Unknown Source)
    at java.awt.Component.dispatchEvent(Unknown Source)
    at java.awt.EventQueue.dispatchEventImpl(Unknown Source)
    at java.awt.EventQueue.access$200(Unknown Source)
    at java.awt.EventQueue$3.run(Unknown Source)
    at java.awt.EventQueue$3.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.security.ProtectionDomain$1.doIntersectionPrivilege(Unknown Source)
    at java.security.ProtectionDomain$1.doIntersectionPrivilege(Unknown Source)
    at java.awt.EventQueue$4.run(Unknown Source)
    at java.awt.EventQueue$4.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.security.ProtectionDomain$1.doIntersectionPrivilege(Unknown Source)
    at java.awt.EventQueue.dispatchEvent(Unknown Source)
    at java.awt.EventDispatchThread.pumpOneEventForFilters(Unknown Source)
    at java.awt.EventDispatchThread.pumpEventsForFilter(Unknown Source)
    at java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source)
    at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
    at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
    at java.awt.EventDispatchThread.run(Unknown Source)

On Saturday, January 4, 2014 9:06:52 AM UTC+5:45, vijay garla wrote:

Hello All,

I have finished an initial cut at the port of YTEX to cTAKES 3.1.1. Most of the YTEX functionality has been ported and integrated with cTAKES, and I've tested with MySQL and MS SQL Server (oracle tests pending).

Most of the changes were made in new projects - very little existing cTAKES code has been modified. The only non-trivial changes are in /ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api - here I modified CharacterOffsetToLineTokenConverterCtakesImpl and SingleDocumentProcessorCtakes to deal with newlines within sentences correctly.

Can somebody take a look at the changes in the ytex branch? I believe that the branch https://svn.apache.org/repos/asf/ctakes/branches/ytex is ready
Re: YTEX cTAKES 3.1.1 ready
see answers inline

On Tue, Jan 7, 2014 at 10:35 AM, Chen, Pei pei.c...@childrens.harvard.edu wrote:

* How can I distribute the ctakes binary distribution to ytex users before the merge? Can we make the branch build available somewhere? The binary distribution is too large to host on the ytex google code site (max 200 MB)

Is this for testing purposes? Or official release? If it's just for testing, there will be more options... Are you referring to the convenience binary/zip file? Or maven artifacts that could be deployed to the SNAPSHOTS repo [1]? If it's for testing, you can always have users build from source via mvn package (assuming you added the ytex* to the ctakes-distribution module)? Again if it's for testing, you can always try the svn or home dir. But it's not the recommended channel for actual distribution to users because that normally has to go through the normal release process (Voting, etc.).

This is for testing. Ytex has been added to the ctakes distro

* Non-ASF libraries - I have segregated these out into their own zip file that can be distributed via sourceforge. As a stopgap, I can upload this to the ytex google code site, but would prefer to upload to sourceforge.

Are these optional 3rd party libs available via maven central?

Most of them are. The only exception is the MS SQL Driver, which is freely redistributable (see http://msdn.microsoft.com/en-us/sqlserver/aa937725). I did not find anything similar for the oracle jdbc driver so I left that out (users will have to download that separately). The zip is here: https://ytex.googlecode.com/files/ctakes-ytex-lib-3.1.2-SNAPSHOT.zip

* UMLS Derivatives - Ditto for these - would like to move to sourceforge.

Are you planning to distribute them via maven central? I think it would be nice to make these available as maven artifacts. If so, what is your sourceforge id?
We can grant you access to the existing ctakes resources project [2]. The pom.xml is already set up to upload to OSS Sonatype (request a login for oss sonatype to perform a mvn deploy for the actual upload later on)...

I have placed the umls resources behind a server that requires UTS authentication (note that this obviates the need for supplying umls username and password in ctakes config files/scripts). The umls resources are here: http://www.ytex-nlp.org/umls.download/secure/3.1/ctakes-ytex-resources-3.1.2-SNAPSHOT.zip

This is a plain old apache http server with the module for CAS (the other CAS) authentication. If ctakes has an apache server somewhere, we could do the same.

* Documentation - How can I update the confluence docs? I would migrate the documentation from the google code website.

This would be great; you've been added to the cTAKES confluence space [3].

Downloading the code now... To be continued...

[1] https://repository.apache.org/content/groups/snapshots/org/apache/ctakes/
[2] http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/
[3] https://cwiki.apache.org/confluence/display/CTAKES/cTAKES

-----Original Message-----
From: vijay garla [mailto:vnga...@gmail.com]
Sent: Friday, January 03, 2014 10:23 PM
To: ytex-us...@googlegroups.com; ctakes-...@incubator.apache.org
Subject: YTEX cTAKES 3.1.1 ready

Hello All,

I have finished an initial cut at the port of YTEX to cTAKES 3.1.1. Most of the YTEX functionality has been ported and integrated with cTAKES, and I've tested with MySQL and MS SQL Server (oracle tests pending).

Most of the changes were made in new projects - very little existing cTAKES code has been modified. The only non-trivial changes are in /ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api - here I modified CharacterOffsetToLineTokenConverterCtakesImpl and SingleDocumentProcessorCtakes to deal with newlines within sentences correctly.

Can somebody take a look at the changes in the ytex branch?
I believe that the branch https://svn.apache.org/repos/asf/ctakes/branches/ytex is ready to be merged into ctakes trunk, but would like other users to test it as well.

Questions:
* How can I distribute the ctakes binary distribution to ytex users before the merge? Can we make the branch build available somewhere? The binary distribution is too large to host on the ytex google code site (max 200 MB)
* Non-ASF libraries - I have segregated these out into their own zip file that can be distributed via sourceforge. As a stopgap, I can upload this to the ytex google code site, but would prefer to upload to sourceforge.
* UMLS Derivatives - Ditto for these - would like to move to sourceforge.
* Documentation - How can I update the confluence docs? I would migrate the documentation from the google code website.

Here are the installation instructions (putting the wagon in front of the horse ...): https://code.google.com/p/ytex/wiki/Installation_cTAKES_3_1?ts=1388793998&updated
Re: ytex branch
Just adding fields

Best
Vj

On Tuesday, November 26, 2013, Chen, Pei wrote:

Hi VJ,

Sounds cool. I guess once things are in the branch, we can start to take a look to see if it makes sense to incorporate them directly into existing ctakes modules or not? Just curious - were the type system changes mainly adding additional fields? Just planning ahead, especially for proposed type system changes...

--Pei

-----Original Message-----
From: vijay garla [mailto:vnga...@gmail.com]
Sent: Monday, November 25, 2013 5:07 PM
To: ctakes-...@incubator.apache.org
Subject: ytex branch

Hello All,

I'm close to done with the port of ytex to ctakes. I would like to create a branch to commit the changes to for review by the ctakes elders and other developers. I will be adding the following projects:
* ctakes-ytex-res - resources
* ctakes-ytex - no uima/ctakes dependencies - primarily semantic similarity code
* ctakes-ytex-uima - ctakes annotators and pipeline configs

I made very few changes to other ctakes modules; these include:
* fixing spring version conflicts
* treatment of newlines in various annotators
* added properties to OntologyConcept type to support word sense disambiguation

Any objections to a branch?

The main thing left to do is packaging for the binary distro.
* setup ant scripts: I think bin\scripts would be a good spot
* adding to ctakes-resources download: I have the following to add:
  - delimited text file with lookup dictionary (similar to hsqldb for current dictionary lookup)
  - concept graphs for semantic similarity and WSD
  - libraries for jdbc drivers (mysql, oracle, sql server) and hibernate

For the ctakes-resources additions, I can create a zip file to add to the ctakes-resources, and send it to somebody (I think it will be a bit big to attach to a ticket, and the whole point is not to have non-asf compliant stuff lurking around apache)

TIA,
VJ
Re: using umls dictionary lookup offline?
No need to write any new annotator, just point org.apache.ctakes.dictionary.lookup.ae.DictionaryLookupAnnotator at the UMLS HSQL DB. The UMLS password check is a very weak chastity belt. For ytex, we make umls resources available via a UTS password protected website - to get the resources the user has to enter their UTS username/password, which ensures that the user has accepted the UMLS licenses. I think making ctakes resources available via a similar method would be more elegant.
-vj
On Fri, Nov 15, 2013 at 8:50 AM, Chen, Pei pei.c...@childrens.harvard.edu wrote:
Hi Matt, The license validation is ultimately the responsibility of the user. So if you ensure you have the license, technically, what you can do is just write an Annotator that uses those resources directly :).
--Pei
-Original Message-
From: Coarr, Matt [mailto:mco...@mitre.org]
Sent: Thursday, November 14, 2013 9:37 PM
To: dev@ctakes.apache.org
Subject: using umls dictionary lookup offline?
What would I need to do to run ctakes offline? In particular, I believe that means: what do I need to do to get the full dictionary lookup to work? Do I need to build my own UMLS dictionary database and use a custom dictionary class? I looked around on the wiki (in the ctakes 3.0 and 3.1 developer guides and the ctakes 3.0 component guide). I know way back when we used to have this before we simplified things by pre-packaging the UMLS dictionary and doing online validation of the UMLS username and password.
Thanks! Matt
ytex ctakes patches
For the YTEX port, I've taken a few baby steps ... I've filed some jira tickets with patches: CTAKES-253 (https://issues.apache.org/jira/browse/CTAKES-253) and CTAKES-252 (https://issues.apache.org/jira/browse/CTAKES-252), more coming soon.
I have a question regarding testing: it seems to me that the old analysis engines all use xml descriptors, whereas the newer analysis engines appear to be using uimafit. I understand why that's the case, but the dissonance between the developer and user directory structures makes it very difficult to write portable tests and portable xml-based ae configs: for a 'user' install, everything under desc is in the classpath, whereas for the developer install, none of the desc directories are in the classpath. When I'm writing an XML-based aggregate AE config, I prefer to import delegate AEs by name instead of by location, as resolving files via the classpath is much more flexible than resolving them via file paths. Can we add the desc directories to the maven-surefire-plugin classpath (as is done with resources) so that the classpath is consistent across developer/user installs?
TIA, VJ
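To make the name-versus-location point concrete: UIMA resolves an import by name by replacing the dots with slashes, appending .xml, and looking the result up on the classpath (or datapath). The following is a minimal sketch of that lookup using only the JDK, not UIMA's own resolver. The class name ImportByNameSketch and the descriptor name in main are hypothetical, chosen purely for illustration.

```java
import java.net.URL;

/** Sketch only: mimics how a UIMA import-by-name maps onto a classpath lookup. */
final class ImportByNameSketch {

    /** Converts a dotted import name to the resource path searched on the classpath. */
    static String toResourcePath(String importName) {
        return importName.replace('.', '/') + ".xml";
    }

    /** Returns the URL of the descriptor if it is on the classpath, otherwise null. */
    static URL resolve(String importName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ImportByNameSketch.class.getClassLoader();
        }
        return loader.getResource(toResourcePath(importName));
    }

    public static void main(String[] args) {
        // Hypothetical descriptor name; it resolves only if its parent directory is on the classpath.
        String name = "analysis_engine.ExampleAnnotator";
        URL url = resolve(name);
        System.out.println(toResourcePath(name) + " -> " + (url == null ? "not on classpath" : url));
    }
}
```

The same name resolves in any install where the descriptor directory is on the classpath, which is why a test run that lacks the desc directories cannot see descriptors that a user install finds.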
Re: move ytex annotators to ctakes.apache.org?
Hello All, I've started on the ytex-ctakes port, and have some packaging questions.
* Hibernate, Weka, and JDBC driver (SQL Server, Oracle) dependencies: I understand that we will not ship these jars as part of the ctakes download. Can we bundle the jars and ship them as part of an additional download, available via sourceforge? Hibernate is available via maven central; weka and the jdbc drivers are not. I have added weka and the jdbc drivers as system dependencies. I'm not sure how you collect all the dependencies for shipment, but how do I tell maven not to include these? Is it OK to check weka and the jdbc drivers into source control?
* desc vs project-res: What are the guidelines for what goes where? Configuration files are found in both places, whereas data/models are in the -res directory. Ytex has many non-uima config files (hibernate, spring) which should be user-modifiable, and I would put them in the desc directory. However, desc is not in the project classpath (but it is in the classpath for the ctakes distro, e.g. in runctakesCPE.bat). Any reason for this dissonance? I would add desc as a resources directory in the pom.
* distribution of umls concept graphs: for semantic similarity and word sense disambiguation, ytex provides concept graphs derived from the UMLS. We have a download site that requires UTS login to get these concept graphs (http://www.ytex-nlp.org/umls.download/secure/0.7/umls.zip). I take it I would just create a -res directory and add the concept graphs here, and they would automagically appear in the ctakes-resources zip?
* patches to other ctakes projects: ytex has some patches to other ctakes annotators for handling edge cases where they throw an exception; I will check to see if these changes have already been made. If not, I will file separate Jira tickets for these patches. Also, the CharacterOffsetToLineTokenConverterCtakesImpl needs to be modified to properly handle cases where newlines are in sentences; I will add a patch for that as well.
* post download setup: ytex provides an ant script to simplify the post download setup (database schema, setup, configuration file generation). Would it be possible to ship ant with the ctakes distro, so that users can execute these scripts? If not, how best to automate setup? I know from experience with earlier versions of ytex that setting up the database schema is error prone, and that this needs to be automated.
I was planning on creating the following projects:
* ctakes-ytex: Base ytex, includes semantic similarity tools. This has no dependencies on ctakes, and I would create a separate distribution of just this package for a semantic similarity distro.
* ctakes-ytex-res: Includes concept graphs for semantic similarity.
* ctakes-ytex-web: Provides User Interface, RESTful, and WebServices interface to semantic similarity service. This has no dependencies on ctakes, and this would be included in the semantic similarity distro.
* ctakes-ytex-uima: Includes ytex analysis engines.
* ctakes-ytex-uima-res: Resources for ytex analysis engines.
Alternatively, I can add ctakes-ytex-uima and ctakes-ytex-uima-res to existing projects (don't know where they would fit).
Best, Vijay
On Thu, Oct 3, 2013 at 7:06 PM, vijay garla vnga...@gmail.com wrote:
Hi Pei, The WSD annotator relies on the semantic similarity component, which is a general purpose tool not strictly limited to ctakes or NLP. I would like to keep the semantic similarity component 'standalone', i.e. with no dependencies on ctakes, and make it redistributable on its own. If that is possible as part of ctakes, I'd love to move it. If not, I'd leave the semantic similarity and the associated WSD annotator on google code.
For those of you who want the back story: http://www.biomedcentral.com/1471-2105/13/261 http://jamia.bmj.com/content/20/5/882.long
-vj
On Thu, Oct 3, 2013 at 5:13 PM, Chen, Pei pei.c...@childrens.harvard.edu wrote:
vj, Were you thinking of contributing the new ytex Word Sense Disambiguation component as well? I think that will be really cool.
--Pei
-Original Message-
From: ksa...@gmail.com [mailto:ksa...@gmail.com] On Behalf Of Karthik Sarma
Sent: Thursday, October 03, 2013 1:05 PM
To: dev@ctakes.apache.org
Subject: Re: move ytex annotators to ctakes.apache.org?
This would be quite valuable -- in particular, ytex's annotation database connection is much easier to use than what ships with cTAKES. There are a fair number of other advantages, and I think they'd all be very valuable!
-- Karthik Sarma UCLA Medical Scientist Training Program Class of 20?? Member, UCLA Medical Imaging Informatics Lab Member, CA Delegation to the House of Delegates of the American Medical Association ksa...@ksarma.com gchat: ksa...@gmail.com linkedin: www.linkedin.com/in/ksarma
On Thu, Oct 3, 2013 at 5:50 AM, vijay garla vnga...@gmail.com wrote:
Hello All, I'd like to contribute ytex to ctakes
move ytex annotators to ctakes.apache.org?
Hello All, I'd like to contribute ytex to ctakes. YTEX's main feature is the ability to store *any* ctakes (or uima) annotation in a relational database (in a relational format), and the ability to export these annotations to ML packages (weka, libsvm, matlab, R). All of this is purely declarative/via configuration. In addition, Ytex provides the following:
* Negation Detection with Negex
* SegmentRegexAnnotator - section detection with regular expressions
* NamedEntityRegexAnnotator - named entity detection with regular expressions
* Sentence Splitter - modified ctakes sentence splitter making sentence split patterns configurable (not hardcoded to \n)
YTEX currently works with ctakes 2.5; I would like to upgrade it to the latest ctakes, and if the community is interested, contribute to ctakes.apache.org. A licensing question: YTEX uses Spring (apache 2.0 license), Hibernate (lgpl 2.1), weka (gpl). Are there any issues with including these?
Cheers vj
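As an illustration of what "section detection with regular expressions" can look like, here is a minimal sketch using only java.util.regex. It is not the SegmentRegexAnnotator implementation: the class name, the Segment type, the two heading patterns, and the rule that a section runs until the next heading are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch only: regex-driven section detection, not the YTEX implementation. */
final class SectionRegexSketch {

    /** A detected section: its id and character span [begin, end) in the note. */
    static final class Segment {
        final String id;
        final int begin;
        final int end;

        Segment(String id, int begin, int end) {
            this.id = id;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Hypothetical heading patterns; a real system would load these from configuration. */
    static Map<String, Pattern> defaultPatterns() {
        Map<String, Pattern> patterns = new LinkedHashMap<>();
        patterns.put("HISTORY", Pattern.compile("(?im)^history of present illness:"));
        patterns.put("MEDS", Pattern.compile("(?im)^medications:"));
        return patterns;
    }

    /** Each heading match opens a section that runs until the next heading or end of text. */
    static List<Segment> detect(String text, Map<String, Pattern> patterns) {
        TreeMap<Integer, String> headings = new TreeMap<>(); // heading offset -> section id
        for (Map.Entry<String, Pattern> e : patterns.entrySet()) {
            Matcher m = e.getValue().matcher(text);
            while (m.find()) {
                headings.put(m.start(), e.getKey());
            }
        }
        List<Segment> segments = new ArrayList<>();
        Integer begin = headings.isEmpty() ? null : headings.firstKey();
        while (begin != null) {
            Integer next = headings.higherKey(begin);
            int end = (next == null) ? text.length() : next;
            segments.add(new Segment(headings.get(begin), begin, end));
            begin = next;
        }
        return segments;
    }

    public static void main(String[] args) {
        String note = "History of Present Illness:\nCough for 3 days.\nMedications:\nNone.\n";
        for (Segment s : detect(note, defaultPatterns())) {
            System.out.println(s.id + " [" + s.begin + ", " + s.end + ")");
        }
    }
}
```

Keeping the patterns in a map keyed by section id means new sections can be added without touching the detection loop, which is the kind of configuration-driven behaviour the message describes.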
Re: move ytex annotators to ctakes.apache.org?
Hi Pei, The WSD annotator relies on the semantic similarity component, which is a general purpose tool not strictly limited to ctakes or NLP. I would like to keep the semantic similarity component 'standalone', i.e. with no dependencies on ctakes, and make it redistributable on its own. If that is possible as part of ctakes, I'd love to move it. If not, I'd leave the semantic similarity and the associated WSD annotator on google code. For those of you who want the back story: http://www.biomedcentral.com/1471-2105/13/261 http://jamia.bmj.com/content/20/5/882.long
-vj
On Thu, Oct 3, 2013 at 5:13 PM, Chen, Pei pei.c...@childrens.harvard.edu wrote:
vj, Were you thinking of contributing the new ytex Word Sense Disambiguation component as well? I think that will be really cool.
--Pei
-Original Message-
From: ksa...@gmail.com [mailto:ksa...@gmail.com] On Behalf Of Karthik Sarma
Sent: Thursday, October 03, 2013 1:05 PM
To: dev@ctakes.apache.org
Subject: Re: move ytex annotators to ctakes.apache.org?
This would be quite valuable -- in particular, ytex's annotation database connection is much easier to use than what ships with cTAKES. There are a fair number of other advantages, and I think they'd all be very valuable!
-- Karthik Sarma UCLA Medical Scientist Training Program Class of 20?? Member, UCLA Medical Imaging Informatics Lab Member, CA Delegation to the House of Delegates of the American Medical Association ksa...@ksarma.com gchat: ksa...@gmail.com linkedin: www.linkedin.com/in/ksarma
On Thu, Oct 3, 2013 at 5:50 AM, vijay garla vnga...@gmail.com wrote:
Hello All, I'd like to contribute ytex to ctakes. YTEX's main feature is the ability to store *any* ctakes (or uima) annotation in a relational database (in a relational format), and the ability to export these annotations to ML packages (weka, libsvm, matlab, R). All of this is purely declarative/via configuration. In addition, Ytex provides the following:
* Negation Detection with Negex
* SegmentRegexAnnotator - section detection with regular expressions
* NamedEntityRegexAnnotator - named entity detection with regular expressions
* Sentence Splitter - modified ctakes sentence splitter making sentence split patterns configurable (not hardcoded to \n)
YTEX currently works with ctakes 2.5; I would like to upgrade it to the latest ctakes, and if the community is interested, contribute to ctakes.apache.org. A licensing question: YTEX uses Spring (apache 2.0 license), Hibernate (lgpl 2.1), weka (gpl). Are there any issues with including these?
Cheers vj
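The last item in the list above, a split pattern that is configurable rather than hardcoded to \n, can be sketched as follows. This is only an illustration of the idea under stated assumptions: it pre-splits text into chunks at a caller-supplied pattern and is not the modified ctakes sentence splitter. The class name and the example patterns are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Sketch only: the split pattern is a parameter instead of a hard-coded "\n". */
final class SplitPatternSketch {

    /** Splits text into trimmed, non-empty chunks at every match of splitPattern. */
    static List<String> split(String text, Pattern splitPattern) {
        List<String> chunks = new ArrayList<>();
        for (String piece : splitPattern.split(text)) {
            String trimmed = piece.trim();
            if (!trimmed.isEmpty()) {
                chunks.add(trimmed);
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        String text = "Patient reports chest\npain on exertion.\n\nNo fever.";
        // Hard-coded behaviour: every newline ends a chunk, cutting the first sentence in two.
        System.out.println(split(text, Pattern.compile("\n")));
        // Configured behaviour: only a blank line ends a chunk, so the wrapped sentence stays whole.
        System.out.println(split(text, Pattern.compile("\n\\s*\n")));
    }
}
```

With notes that are hard-wrapped at a fixed column, treating every newline as a boundary breaks sentences mid-phrase; passing the pattern in lets each installation choose the boundary that matches its documents.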
Re: Next cTAKES release (3.1)?
We released code on using cTAKES to annotate clinical text and SVMs that use the annotations to classify clinical text from the CMC 2007 and I2B2 2008 challenges:
We did the CMC 2007 with cTAKES 2.5: https://code.google.com/p/ytex/wiki/WordSenseDisambiguation_V08#Reproducing_results_on_CMC_2007_challenge https://code.google.com/p/ytex/downloads/list
And the i2b2 2008 with the version of cTAKES distributed with the first version of ARC: https://code.google.com/p/ytex/wiki/FeatEng_V05#i2b2_2008
These are both publicly available datasets, and represent real-world problems (in general I believe when publishing a paper the results should be reproducible and the code made publicly available, but that's a different issue). When we get around to upgrading YTEX to cTAKES 3.1, we would like to upgrade these samples as well.
Best, VJ
On Thu, Jun 27, 2013 at 8:32 PM, Andy McMurry mcmurry.a...@gmail.com wrote:
+1 suggestion for documenting many examples of getting started NLP datasets. I have at least one we can use that was created by our lead Pathologist https://open.med.harvard.edu/svn/scrubber/releases/3.0/data/input/cases/train/traincase.xml We should provide at least one sample for each domain. Trouble is, privacy requires that these examples be made up by hand and not copy-pasted from EMR systems.
--Andy
On Jun 27, 2013, at 5:32 PM, Girivaraprasad Nambari girinamb...@gmail.com wrote:
+1 for this observation Andy! Lowering the time will motivate users to write blogs about features, how-tos, etc., which reduces the core team's workload on documentation. I have been trying to write a small how-to on writing a standalone client for ctakes based on my experience (I saw at least 4 users post similar questions in the last 2 months), but am not getting enough time because ctakes depends on a lot of other frameworks (UimaFit, cleartk, UIMA Framework, etc.); most of my spare time is being spent juggling between these frameworks, posting on and browsing those forums, and relating observations to ctakes code.
I think we need to have some high level documentation about these (with links to corresponding forums). The above case is for developers (I think this will be the larger user base as ctakes progresses); for users I think the documentation is a lot better, though some improvements need to be made. As a developer I found it tough with the lack of sample training data (I am still struggling in this area even though I browsed all the relevant code), though the training classes are there. I understand that there are licensing issues with REAL data, but at least some hand-made example sentences, which may not be real, would help developers understand the type/structure of input the TRAINING classes expect. This way people who browse the code can reverse engineer and develop their own models. Sorry if you guys feel this is a novice issue, but I feel most developers will be novices when they adopt a system, and Machine Learning/NLP is an ocean. Some documentation in this area will save a lot of time for us. I hope there will be some activity in this area from the ctakes core team.
Thank you, Giri
On Thu, Jun 27, 2013 at 5:11 PM, Andy McMurry mcmurry.a...@gmail.com wrote:
ctakes is at a point where we have a LOT of features but it is still hard to get started. Judging from the mailing lists a lot of how cTakes works is not obvious and requires hand holding. This is very typical in early FOSS projects. Lowering the time to get invested in ctakes gets more users AND better bug reports, FAQ, etc. thoughts?
--Andy
On Apr 11, 2013, at 8:55 PM, Chen, Pei pei.c...@childrens.harvard.edu wrote:
Hi, I just wanted to gauge the interest of creating the next release of cTAKES (3.1), which is currently marked for May in Jira. There have already been 22/53 issues [1] marked as fixed or closed.
Plenty of bug fixes and new components including:
- New CEM Instance Template population
- New Dependency Parser/Semantic Role Labeler
- New optional Clear POSTagger
- New regression testing component
Should we wait for the Temporal component?
[1] https://issues.apache.org/jira/issues/?jql=fixVersion%20%3D%20%223.1%22%20AND%20project%20%3D%20CTAKES