Hi Sandeep,

That is great news, and good job. For some ideas about developing your proposal, you may want to simply start with a Google Doc and then share it with Pei. I'd be happy to help co-mentor if you and Pei think that would be useful too.

Your proposal should likely cover:

1. Background - what's the state of CTAKES-189 and what is it trying to accomplish? (Include some figures, etc. along with your text.)
2. Approach - what are you going to do to solve CTAKES-189? Be specific, and try to break it down into smaller, easily reversible steps.
3. Schedule - how long will it take, and what is the schedule for achieving this?
4. Risks/etc. - any known risks, like are you taking a vacation anytime soon :) or are there other time constraints?
5. References, etc.

HTH, and I'd be happy if you want to share the GDocs with me as you develop it.

Cheers!
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: sandeep rg <sandeep.f...@gmail.com>
Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
Date: Saturday, July 13, 2013 8:57 AM
To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
Subject: Re: to involve in your development group

>I have also gone through the technologies available for OCR development.
>From that, I think Apache Tika and Tesseract are the best for resolving the
>problem.
>
>On Sat, Jul 13, 2013 at 9:02 PM, sandeep rg <sandeep.f...@gmail.com> wrote:
>
>> Hi Mattmann Chris,
>> I participated in the event coordinated by Luciano Resende
>>
>> http://community.apache.org/mentoringprogramme-icfoss-pilot.html
>>
>> and from that I learned about open source and would like to work on your
>> project, cTAKES. I would like to fix the JIRA issue
>>
>> https://issues.apache.org/jira/browse/CTAKES-189
>>
>> Chen Pei accepted my request to be my mentor. Now I want to give a
>> proposal to Apache about the project I am going to work on. Can you help
>> me prepare a proposal to be submitted before the 18th of this July?
>>
>> On Sat, Jul 13, 2013 at 2:26 AM, Mattmann, Chris A (398J) <
>> chris.a.mattm...@jpl.nasa.gov> wrote:
>>
>>> Hi Sandeep,
>>>
>>> I think the best thing to do is:
>>>
>>> 1. Develop a JIRA issue here:
>>> https://issues.apache.org/jira/browse/CTAKES
>>> 1a. you can register for a new account on JIRA
>>> 2. Once your JIRA issue is created, feel free to start a [DISCUSS] thread
>>> (e.g., with subject [DISCUSS] "some topic" where "some topic" is perhaps
>>> the main idea you have) on dev@ctakes.apache.org, referencing your issue
>>> and asking for feedback
>>> 3. Work with the Apache cTAKES PMC and committers to get your patches and
>>> other items attached to your issue from #1 committed into the sources
>>>
>>> Ideally if 1-3 happen and it's a good interaction, Apache is built on
>>> meritocracy and you could possibly earn the merit to become a PMC member
>>> or committer on the project.
>>>
>>> Cheers,
>>> Chris
>>>
>>> -----Original Message-----
>>> From: sandeep rg <sandeep.f...@gmail.com>
>>> Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
>>> Date: Thursday, July 11, 2013 11:30 AM
>>> To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
>>> Subject: Re: to involve in your development group
>>>
>>> >Can you tell me what details I should include in a proposal? Should I
>>> >include all implementation (technical) details in the proposal?
>>> >
>>> >On Thu, Jul 11, 2013 at 9:45 PM, Mattmann, Chris A (398J) <
>>> >chris.a.mattm...@jpl.nasa.gov> wrote:
>>> >
>>> >> Dear Sandeep,
>>> >>
>>> >> Thanks for your interest in cTAKES. We would welcome your contribution
>>> >> and are happy to have your interest in the project.
>>> >>
>>> >> Cheers,
>>> >> Chris
>>> >>
>>> >> -----Original Message-----
>>> >> From: sandeep rg <sandeep.f...@gmail.com>
>>> >> Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
>>> >> Date: Wednesday, July 10, 2013 11:01 AM
>>> >> To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
>>> >> Subject: Re: to involve in your development group
>>> >>
>>> >> >Sir,
>>> >> >
>>> >> >My name is sandeep rg. I am a B.Tech graduate in computer science, now
>>> >> >doing an internship at a company, working in Java.
>>> >> >
>>> >> >I have installed everything successfully and am now downloading the
>>> >> >resources. It takes too much time.
>>> >> >
>>> >> >I have gone through the suggested OCR technologies.
>>> >> >JavaOCR has some good user reviews.
>>> >> >Apache Tika is able to process many different formats.
>>> >> >Beyond that, there is Tesseract, which is also used for OCR.
>>> >> >Apache PDFBox is also used for text extraction, but only for PDF
>>> >> >files.
>>> >> >I am now going through everything to find the best technology among
>>> >> >these.
>>> >> >
>>> >> >On Wed, Jul 10, 2013 at 12:52 AM, Chen, Pei
>>> >> ><pei.c...@childrens.harvard.edu> wrote:
>>> >> >
>>> >> >> Hi Sandeep,
>>> >> >> I am delighted to work with you on this project.
>>> >> >>
>>> >> >> I was not sure if I understood you correctly - did you mean to say that
>>> >> >> you have already tried using cTAKES and its components?
>>> >> >> If not, you can do an svn checkout of the code and try running the
>>> >> >> debugger GUI from the command line (or the Eclipse IDE), which will
>>> >> >> allow you to type in plain text and get back the different structured
>>> >> >> content (types) that cTAKES produces:
>>> >> >> MAVEN_OPTS="-Xmx2g -Xms1g"
>>> >> >> mvn -PrunCVD compile
>>> >> >> From the guide:
>>> >> >> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.0+Developer+Install+Guide
>>> >> >>
>>> >> >> A bit of background:
>>> >> >> Apache cTAKES uses SVN for version control:
>>> >> >> https://svn.apache.org/repos/asf/ctakes/trunk/
>>> >> >> Jira for issue tracking:
>>> >> >> https://issues.apache.org/jira/browse/ctakes
>>> >> >> Maven for building and dependency management.
>>> >> >> A lot of the developers use the Eclipse IDE for their development.
>>> >> >> More info on ctakes.apache.org
>>> >> >>
>>> >> >> cTAKES is built on top of the Apache UIMA Framework. Essentially, cTAKES
>>> >> >> is a collection of Annotators (Java classes) wired together into
>>> >> >> a pipeline.
>>> >> >> Its goal in a nutshell is to turn unstructured plain text into
>>> >> >> structured/normalized form, and it is specially trained for medical notes.
>>> >> >> Right now, the input cTAKES expects is plain text, and cTAKES
>>> >> >> does not have an OCR component.
>>> >> >> CTAKES-189 (GSoC: implement OCR/tika to standardize text inputs) was an
>>> >> >> idea to allow cTAKES to take in any type of input (PDF, images, Word,
>>> >> >> XLS, etc.) and pass the text on for cTAKES processing.
>>> >> >> [I was originally thinking this could be done in some kind of
>>> >> >> preprocessing, or an optional Annotator that could be added at the
>>> >> >> beginning of a pipeline].
>>> >> >> There may be some existing work that could
>>> >> >> potentially be reused: Apache Tika (
>>> >> >> https://issues.apache.org/jira/browse/TIKA-93 ) as well as some open
>>> >> >> source OCR toolkits (JavaOCR).
>>> >> >>
>>> >> >> About Me:
>>> >> >> http://childrenshospital.org/cfapps/research/data_admin/Site3240/mainpageS3240P8.html
>>> >> >> http://www.linkedin.com/in/peistation
>>> >> >> http://people.apache.org/committer-index.html#chenpei
>>> >> >>
>>> >> >> > -----Original Message-----
>>> >> >> > From: sandeep rg [mailto:sandeep.f...@gmail.com]
>>> >> >> > Sent: Tuesday, July 09, 2013 1:19 PM
>>> >> >> > To: dev@ctakes.apache.org
>>> >> >> > Subject: Re: to involve in your development group
>>> >> >> >
>>> >> >> > Thanks a lot for giving me support. I would like to work with you.
>>> >> >> >
>>> >> >> > I have gone through the objectives of the software, used the software,
>>> >> >> > and gone through various components of the project. Can you give me a
>>> >> >> > starting point for learning more about the coding part of the
>>> >> >> > project?
>>> >> >> >
>>> >> >> > Can you tell me more about the project, and about yourself as well?
>>> >> >> >
>>> >> >> > On Tue, Jul 9, 2013 at 1:14 AM, Chen, Pei
>>> >> >> > <pei.c...@childrens.harvard.edu> wrote:
>>> >> >> >
>>> >> >> > > Hi Sandeep,
>>> >> >> > > Thank you for the interest. I just had a quick look at the ICFOSS
>>> >> >> > > pilot mentoring program and will be happy to serve as a mentor for
>>> >> >> > > your project proposal(s) if you are interested.
>>> >> >> > >
>>> >> >> > > --Pei
>>> >> >> > >
>>> >> >> > > > -----Original Message-----
>>> >> >> > > > From: sandeep rg [mailto:sandeep.f...@gmail.com]
>>> >> >> > > > Sent: Monday, July 08, 2013 2:24 PM
>>> >> >> > > > To: dev@ctakes.apache.org
>>> >> >> > > > Subject: Re: to involve in your development group
>>> >> >> > > >
>>> >> >> > > > Sir,
>>> >> >> > > >
>>> >> >> > > > Details of the pilot mentoring programme with India ICFOSS are
>>> >> >> > > > given at the web address below:
>>> >> >> > > >
>>> >> >> > > > http://community.apache.org/mentoringprogramme-icfoss-pilot.html
>>> >> >> > > >
>>> >> >> > > > I am new to this community, so I need a mentor for the project. It
>>> >> >> > > > would be very helpful for me.
>>> >> >> > > >
>>> >> >> > > > On Mon, Jul 8, 2013 at 7:22 PM, Chen, Pei
>>> >> >> > > > <pei.c...@childrens.harvard.edu> wrote:
>>> >> >> > > >
>>> >> >> > > > > Hi Sandeep,
>>> >> >> > > > > Welcome! I am not familiar with the details of icfoss-apache, but
>>> >> >> > > > > please - you are more than welcome to work on the code, and
>>> >> >> > > > > contributions will be greatly appreciated!
>>> >> >> > > > > There may be a learning curve, but feel free to let us know if you
>>> >> >> > > > > have any questions/issues.
>>> >> >> > > > > Thanks,
>>> >> >> > > > > Pei
>>> >> >> > > > >
>>> >> >> > > > > > -----Original Message-----
>>> >> >> > > > > > From: sandeep rg [mailto:sandeep.f...@gmail.com]
>>> >> >> > > > > > Sent: Saturday, July 06, 2013 11:50 AM
>>> >> >> > > > > > To: dev@ctakes.apache.org
>>> >> >> > > > > > Subject: to involve in your development group
>>> >> >> > > > > >
>>> >> >> > > > > > My name is Sandeep. I am a B.Tech graduate. I participated in
>>> >> >> > > > > > a camp held in Kerala, India, in association with
>>> >> >> > > > > > icfoss-apache, called the youth mentoring programme and
>>> >> >> > > > > > coordinated by Luciano Resende.
>>> >> >> > > > > >
>>> >> >> > > > > > I like the project and would like to be involved in it as a
>>> >> >> > > > > > programmer. I have gone through your project and the
>>> >> >> > > > > > bug list. I would like to work on the bug
>>> >> >> > > > > > "CTAKES-189: GSoC: implement OCR/tika to standardize text
>>> >> >> > > > > > inputs for cTAKES". Can you allow me to work on that?
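
A note on the preprocessing idea Pei describes in the thread (turning PDF, image, Word, etc. inputs into plain text before they reach the cTAKES pipeline): the following is only a minimal Java sketch of one way such a step might be shaped. It is not cTAKES or Tika code. The class name InputNormalizer, its register and toPlainText methods, and the idea of choosing an extractor by file extension are all hypothetical; a real implementation would presumably delegate to a library such as Apache Tika or an OCR toolkit where the placeholder lambda is.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical preprocessing step: converts raw input bytes to plain text
// before the text is handed to a plain-text pipeline such as cTAKES.
final class InputNormalizer {

    // Maps a lower-case file extension to a function that extracts text.
    private final Map<String, Function<byte[], String>> extractors = new HashMap<>();

    InputNormalizer() {
        // Plain text needs no conversion; assume UTF-8 for this sketch.
        register("txt", bytes -> new String(bytes, StandardCharsets.UTF_8));
    }

    // Registers an extractor for one extension (e.g. "pdf", "png").
    void register(String extension, Function<byte[], String> extractor) {
        extractors.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    // Picks an extractor from the file name's extension and applies it.
    String toPlainText(String fileName, byte[] content) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        Function<byte[], String> extractor = extractors.get(ext);
        if (extractor == null) {
            throw new IllegalArgumentException("No extractor registered for: " + fileName);
        }
        return extractor.apply(content);
    }

    public static void main(String[] args) {
        InputNormalizer normalizer = new InputNormalizer();
        // Placeholder only: a real extractor would call a PDF/OCR library here.
        normalizer.register("pdf", bytes -> "[text extracted from PDF]");

        byte[] note = "Patient denies chest pain.".getBytes(StandardCharsets.UTF_8);
        System.out.println(normalizer.toPlainText("note.txt", note));
    }
}
```

Keeping the extractors behind a single toPlainText call is one way to leave the existing annotators untouched: whichever OCR or parsing library is chosen, the pipeline would still only ever see plain text.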