Can you provide your Gmail ID so that I can share the proposal document with you?
On Tue, Jul 16, 2013 at 11:33 PM, sandeep rg <sandeep.f...@gmail.com> wrote:

> Sir,
> I will provide the proposal within two days. I am mainly going through the
> ASF-ICFOSS gateway, because if I go that way and my proposal is selected,
> ICFOSS will provide us with some support, such as certificates and small
> financial support.
>
> But the main thing is that I like programming: I like to explore new
> technologies and to work with code. So even if my proposal is rejected, I
> would still like to work on your project as a volunteer, if you allow me.
>
> I am preparing the proposal now and will submit it within 2 days. Chris
> Mattmann helped me learn more about the proposal format.
>
> On Tue, Jul 16, 2013 at 8:12 PM, Chen, Pei <pei.c...@childrens.harvard.edu>
> wrote:
>
>> Chris/Sandeep,
>> According to ASF-ICFOSS, I believe the deadline for submitting proposals
>> is this coming Friday (July 19).
>> After that point, mentors will have 2 weeks to review and score/accept.
>> Just curious, are we planning to follow the same process here? Or, since
>> it's all volunteer work, technically Sandeep can still contribute code to
>> the community and participate in the dev group here.
>>
>> Looking forward to it.
>> --Pei
>>
>> > -----Original Message-----
>> > From: sandeep rg [mailto:sandeep.f...@gmail.com]
>> > Sent: Monday, July 15, 2013 1:05 PM
>> > To: dev@ctakes.apache.org
>> > Subject: Re: to involve in your development group
>> >
>> > Sir,
>> > I have gone through most of the OCR technologies and reached a
>> > conclusion: I would like to use Apache Tika and JavaOCR for this purpose.
>> >
>> > Tesseract is an OCR tool that can extract text in multiple languages. It
>> > is implemented in C++, so it can be accessed from Java through native
>> > functions. Another tool, Tess4J, is also available, but reviews say that
>> > it has many bugs.
>> >
>> > Apache Tika is developed in Java. It can extract text data from .xls,
>> > Word, .txt, PDF, and many other formats. It is also easy to integrate
>> > into a project; I have just gone through how it is implemented.
>> >
>> > As for JavaOCR, it is good for extracting text from JPEG or scanned
>> > images. We can train it with various fonts; the more we train it, the
>> > better its accuracy, but its speed decreases. I didn't find any
>> > particular documentation for it.
>> >
>> > On Sun, Jul 14, 2013 at 9:18 PM, sandeep rg <sandeep.f...@gmail.com>
>> > wrote:
>> >
>> > > Thanks a lot to both of you for your support. I will do my best to find
>> > > a solution for the JIRA problem. I will share the proposal with both of
>> > > you.
>> > >
>> > > On Sun, Jul 14, 2013 at 1:46 AM, Chen, Pei
>> > > <pei.c...@childrens.harvard.edu> wrote:
>> > >
>> > >> Sandeep,
>> > >> It's great to have Chris on board as well; he was one of the
>> > >> coordinators of GSoC.
>> > >> Looking forward to it.
>> > >>
>> > >> Sent from my iPhone
>> > >>
>> > >> On Jul 13, 2013, at 12:24 PM, "Mattmann, Chris A (398J)"
>> > >> <chris.a.mattm...@jpl.nasa.gov> wrote:
>> > >>
>> > >> > Hi Sandeep,
>> > >> >
>> > >> > That is great news, and good job. OK, for some ideas about developing
>> > >> > your proposal, you may want to simply start with a Google Doc, and
>> > >> > then share it with Pei. I'd be happy to help co-mentor if Pei and you
>> > >> > think it's useful too.
>> > >> >
>> > >> > Your proposal should likely cover:
>> > >> >
>> > >> > 1. Background - what's the state of CTAKES-189 and what's it trying to
>> > >> > accomplish (include some figures, etc. along with your text)
>> > >> >
>> > >> > 2. Approach - what are you going to do to solve CTAKES-189. Be
>> > >> > specific, and try to break it down into smaller, easily reversible
>> > >> > steps
>> > >> >
>> > >> > 3. Schedule - how long and what is the schedule for achieving this?
>> > >> >
>> > >> > 4. Risks/etc. - any known risks, like are you taking a vacation
>> > >> > anytime soon :) or are there other time constraints?
>> > >> >
>> > >> > 5. References, etc.
>> > >> >
>> > >> > HTH and I'd be happy if you want to share the GDocs with me as you
>> > >> > develop it.
>> > >> >
>> > >> > Cheers!
>> > >> >
>> > >> > Chris
>> > >> >
>> > >> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> > >> > Chris Mattmann, Ph.D.
>> > >> > Senior Computer Scientist
>> > >> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> > >> > Office: 171-266B, Mailstop: 171-246
>> > >> > Email: chris.a.mattm...@nasa.gov
>> > >> > WWW: http://sunset.usc.edu/~mattmann/
>> > >> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> > >> > Adjunct Assistant Professor, Computer Science Department
>> > >> > University of Southern California, Los Angeles, CA 90089 USA
>> > >> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> > >> >
>> > >> > -----Original Message-----
>> > >> > From: sandeep rg <sandeep.f...@gmail.com>
>> > >> > Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
>> > >> > Date: Saturday, July 13, 2013 8:57 AM
>> > >> > To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
>> > >> > Subject: Re: to involve in your development group
>> > >> >
>> > >> >> I have also gone through the technologies available for OCR
>> > >> >> development; from those, I think Apache Tika and Tesseract are best
>> > >> >> for resolving the problem.
>> > >> >>
>> > >> >> On Sat, Jul 13, 2013 at 9:02 PM, sandeep rg <sandeep.f...@gmail.com>
>> > >> >> wrote:
>> > >> >>
>> > >> >>> Hi Chris Mattmann,
>> > >> >>> I participated in the event coordinated by Luciano Resende
>> > >> >>>
>> > >> >>> http://community.apache.org/mentoringprogramme-icfoss-pilot.html
>> > >> >>>
>> > >> >>> and from that I learned about open source, and I would like to work
>> > >> >>> on your project, cTAKES. I would like to fix the JIRA issue
>> > >> >>>
>> > >> >>> https://issues.apache.org/jira/browse/CTAKES-189
>> > >> >>>
>> > >> >>> Chen Pei accepted my request to be my mentor. Now I want to submit a
>> > >> >>> proposal to Apache about the project I am going to work on. Can you
>> > >> >>> help me prepare a proposal to be submitted before the 18th of this
>> > >> >>> July?
>> > >> >>>
>> > >> >>> On Sat, Jul 13, 2013 at 2:26 AM, Mattmann, Chris A (398J)
>> > >> >>> <chris.a.mattm...@jpl.nasa.gov> wrote:
>> > >> >>>
>> > >> >>>> Hi Sandeep,
>> > >> >>>>
>> > >> >>>> I think the best thing to do is:
>> > >> >>>>
>> > >> >>>> 1. Develop a JIRA issue here:
>> > >> >>>> https://issues.apache.org/jira/browse/CTAKES
>> > >> >>>> 1a. you can register for a new account on JIRA
>> > >> >>>> 2. Once your JIRA issue is created, feel free to start a [DISCUSS]
>> > >> >>>> thread (e.g., with subject [DISCUSS] "some topic" where "some topic"
>> > >> >>>> is perhaps the main idea you have) on dev@ctakes.apache.org,
>> > >> >>>> referencing your issue and asking for feedback
>> > >> >>>> 3. Work with the Apache cTAKES PMC and committers to get your
>> > >> >>>> patches and other items attached to your issue from #1 committed
>> > >> >>>> into the sources
>> > >> >>>>
>> > >> >>>> Ideally if 1-3 happen and it's a good interaction, Apache is built
>> > >> >>>> on meritocracy and you could possibly earn the merit to become a PMC
>> > >> >>>> member or committer on the project.
>> > >> >>>>
>> > >> >>>> Cheers,
>> > >> >>>> Chris
>> > >> >>>>
>> > >> >>>> -----Original Message-----
>> > >> >>>> From: sandeep rg <sandeep.f...@gmail.com>
>> > >> >>>> Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
>> > >> >>>> Date: Thursday, July 11, 2013 11:30 AM
>> > >> >>>> To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
>> > >> >>>> Subject: Re: to involve in your development group
>> > >> >>>>
>> > >> >>>>> Can you tell me which details I should include in a proposal?
>> > >> >>>>> Should I include all implementation (technical) details in the
>> > >> >>>>> proposal?
>> > >> >>>>>
>> > >> >>>>> On Thu, Jul 11, 2013 at 9:45 PM, Mattmann, Chris A (398J)
>> > >> >>>>> <chris.a.mattm...@jpl.nasa.gov> wrote:
>> > >> >>>>>
>> > >> >>>>>> Dear Sandeep,
>> > >> >>>>>>
>> > >> >>>>>> Thanks for your interest in cTAKES. We would welcome your
>> > >> >>>>>> contribution and are happy to have your interest in the project.
>> > >> >>>>>>
>> > >> >>>>>> Cheers,
>> > >> >>>>>> Chris
>> > >> >>>>>>
>> > >> >>>>>> -----Original Message-----
>> > >> >>>>>> From: sandeep rg <sandeep.f...@gmail.com>
>> > >> >>>>>> Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
>> > >> >>>>>> Date: Wednesday, July 10, 2013 11:01 AM
>> > >> >>>>>> To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
>> > >> >>>>>> Subject: Re: to involve in your development group
>> > >> >>>>>>
>> > >> >>>>>>> Sir,
>> > >> >>>>>>>
>> > >> >>>>>>> My name is Sandeep RG. I am a B.Tech graduate in computer
>> > >> >>>>>>> science, now doing an internship at a company, working in Java.
>> > >> >>>>>>>
>> > >> >>>>>>> I have installed everything successfully and am now downloading
>> > >> >>>>>>> the resources. It takes too much time.
>> > >> >>>>>>>
>> > >> >>>>>>> I have gone through the suggested OCR technologies.
>> > >> >>>>>>> JavaOCR has some good user reviews.
>> > >> >>>>>>> Apache Tika can process many different formats.
>> > >> >>>>>>> In addition, there is Tesseract, which is also used for OCR.
>> > >> >>>>>>> Apache PDFBox is also used for text extraction, but only for PDF
>> > >> >>>>>>> files.
>> > >> >>>>>>> I am now going through everything to find the best technology
>> > >> >>>>>>> among these.
>> > >> >>>>>>>
>> > >> >>>>>>> On Wed, Jul 10, 2013 at 12:52 AM, Chen, Pei
>> > >> >>>>>>> <pei.c...@childrens.harvard.edu> wrote:
>> > >> >>>>>>>
>> > >> >>>>>>>> Hi Sandeep,
>> > >> >>>>>>>> I am delighted to work with you on this project.
>> > >> >>>>>>>>
>> > >> >>>>>>>> I was not sure if I understood you correctly: did you mean to
>> > >> >>>>>>>> say that you have already tried using cTAKES and its components?
>> > >> >>>>>>>> If not, you can do an svn checkout of the code and try running
>> > >> >>>>>>>> the debugger GUI from the command line (or the Eclipse IDE),
>> > >> >>>>>>>> which will allow you to type in plain text and get back the
>> > >> >>>>>>>> different structured content (types) that cTAKES produces:
>> > >> >>>>>>>> MAVEN_OPTS="-Xmx2g -Xms1g"
>> > >> >>>>>>>> mvn -PrunCVD compile
>> > >> >>>>>>>> From the guide:
>> > >> >>>>>>>> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.0+Developer+Install+Guide
>> > >> >>>>>>>>
>> > >> >>>>>>>> A bit of background:
>> > >> >>>>>>>> Apache cTAKES uses SVN for version control:
>> > >> >>>>>>>> https://svn.apache.org/repos/asf/ctakes/trunk/
>> > >> >>>>>>>> Jira for issue tracking:
>> > >> >>>>>>>> https://issues.apache.org/jira/browse/ctakes
>> > >> >>>>>>>> Maven for building and dependency management.
>> > >> >>>>>>>> A lot of the developers use the Eclipse IDE for their
>> > >> >>>>>>>> development.
>> > >> >>>>>>>> More info on ctakes.apache.org
>> > >> >>>>>>>>
>> > >> >>>>>>>> cTAKES is built on top of the Apache UIMA Framework.
>> > >> >>>>>>>> Essentially, cTAKES is a collection of Annotators (Java classes)
>> > >> >>>>>>>> wired together into a pipeline.
>> > >> >>>>>>>> Its goal in a nutshell is to turn unstructured plain text into
>> > >> >>>>>>>> structured/normalized form, and it is specially trained for
>> > >> >>>>>>>> medical notes.
>> > >> >>>>>>>> Right now, the input cTAKES expects would be in plain text form,
>> > >> >>>>>>>> and cTAKES does not have an OCR component.
>> > >> >>>>>>>> CTAKES-189:GSoC:implement OCR/tika to standardize text inputs
>> > >> >>>>>>>> was an idea to allow cTAKES to take in any type of input (PDF,
>> > >> >>>>>>>> images, Word, XLS, etc.) and pass the text on for cTAKES
>> > >> >>>>>>>> processing.
>> > >> >>>>>>>> [I was originally thinking this could be done in some kind of
>> > >> >>>>>>>> preprocessing, or an optional Annotator that could be added at
>> > >> >>>>>>>> the beginning of a pipeline]. There may be some existing work
>> > >> >>>>>>>> that could be potentially reused: Apache Tika
>> > >> >>>>>>>> ( https://issues.apache.org/jira/browse/TIKA-93 ) as well as
>> > >> >>>>>>>> some open source OCR toolkits (JavaOCR).
>> > >> >>>>>>>>
>> > >> >>>>>>>> About Me:
>> > >> >>>>>>>> http://childrenshospital.org/cfapps/research/data_admin/Site3240/mainpageS3240P8.html
>> > >> >>>>>>>> http://www.linkedin.com/in/peistation
>> > >> >>>>>>>> http://people.apache.org/committer-index.html#chenpei
>> > >> >>>>>>>>
>> > >> >>>>>>>>> -----Original Message-----
>> > >> >>>>>>>>> From: sandeep rg [mailto:sandeep.f...@gmail.com]
>> > >> >>>>>>>>> Sent: Tuesday, July 09, 2013 1:19 PM
>> > >> >>>>>>>>> To: dev@ctakes.apache.org
>> > >> >>>>>>>>> Subject: Re: to involve in your development group
>> > >> >>>>>>>>>
>> > >> >>>>>>>>> Thanks a lot for your support. I would like to work with you.
>> > >> >>>>>>>>>
>> > >> >>>>>>>>> I have gone through the objectives of the software, used the
>> > >> >>>>>>>>> software, and gone through various components of the project.
>> > >> >>>>>>>>> Can you give me a starting point for learning more about the
>> > >> >>>>>>>>> coding part of the project?
>> > >> >>>>>>>>>
>> > >> >>>>>>>>> Can you tell me more about the project, and about yourself as
>> > >> >>>>>>>>> well?
>> > >> >>>>>>>>>
>> > >> >>>>>>>>> On Tue, Jul 9, 2013 at 1:14 AM, Chen, Pei
>> > >> >>>>>>>>> <pei.c...@childrens.harvard.edu> wrote:
>> > >> >>>>>>>>>
>> > >> >>>>>>>>>> Hi Sandeep,
>> > >> >>>>>>>>>> Thank you for the interest. I just had a quick look at the
>> > >> >>>>>>>>>> ICFOSS pilot mentoring program and will be happy to serve as a
>> > >> >>>>>>>>>> mentor for your project proposal(s) if you are interested.
>> > >> >>>>>>>>>>
>> > >> >>>>>>>>>> --Pei
>> > >> >>>>>>>>>>
>> > >> >>>>>>>>>>> -----Original Message-----
>> > >> >>>>>>>>>>> From: sandeep rg [mailto:sandeep.f...@gmail.com]
>> > >> >>>>>>>>>>> Sent: Monday, July 08, 2013 2:24 PM
>> > >> >>>>>>>>>>> To: dev@ctakes.apache.org
>> > >> >>>>>>>>>>> Subject: Re: to involve in your development group
>> > >> >>>>>>>>>>>
>> > >> >>>>>>>>>>> Sir,
>> > >> >>>>>>>>>>>
>> > >> >>>>>>>>>>> Details of the pilot mentoring programme with India ICFOSS
>> > >> >>>>>>>>>>> are given at the web address below:
>> > >> >>>>>>>>>>> http://community.apache.org/mentoringprogramme-icfoss-pilot.html
>> > >> >>>>>>>>>>>
>> > >> >>>>>>>>>>> I am new to this community, so I need a mentor for the
>> > >> >>>>>>>>>>> project. It would be very helpful for me.
>> > >> >>>>>>>>>>>
>> > >> >>>>>>>>>>> On Mon, Jul 8, 2013 at 7:22 PM, Chen, Pei
>> > >> >>>>>>>>>>> <pei.c...@childrens.harvard.edu> wrote:
>> > >> >>>>>>>>>>>
>> > >> >>>>>>>>>>>> Hi Sandeep,
>> > >> >>>>>>>>>>>> Welcome! I am not familiar with the details of
>> > >> >>>>>>>>>>>> icfoss-apache, but please, you are more than welcome to work
>> > >> >>>>>>>>>>>> on the code, and contributions will be greatly appreciated!
>> > >> >>>>>>>>>>>> There may be a learning curve, but feel free to let us know
>> > >> >>>>>>>>>>>> if you have any questions/issues.
>> > >> >>>>>>>>>>>> Thanks,
>> > >> >>>>>>>>>>>> Pei
>> > >> >>>>>>>>>>>>
>> > >> >>>>>>>>>>>>> -----Original Message-----
>> > >> >>>>>>>>>>>>> From: sandeep rg [mailto:sandeep.f...@gmail.com]
>> > >> >>>>>>>>>>>>> Sent: Saturday, July 06, 2013 11:50 AM
>> > >> >>>>>>>>>>>>> To: dev@ctakes.apache.org
>> > >> >>>>>>>>>>>>> Subject: to involve in your development group
>> > >> >>>>>>>>>>>>>
>> > >> >>>>>>>>>>>>> My name is Sandeep. I am a B.Tech graduate. I participated
>> > >> >>>>>>>>>>>>> in a camp held in Kerala, India, in association with
>> > >> >>>>>>>>>>>>> icfoss-apache, called the youth mentoring programme and
>> > >> >>>>>>>>>>>>> coordinated by Luciano Resende.
>> > >> >>>>>>>>>>>>>
>> > >> >>>>>>>>>>>>> I like the project and would like to be involved in it as
>> > >> >>>>>>>>>>>>> a programmer. I have gone through your project and through
>> > >> >>>>>>>>>>>>> the bugs list. I would like to work on the bug
>> > >> >>>>>>>>>>>>> "CTAKES-189:GSoC:implement OCR/tika to standardize text
>> > >> >>>>>>>>>>>>> inputs for cTAKES". Would you allow me to work on that?