Sir, I will provide the proposal within two days. For now I am mainly going through the ASF-ICFOSS gateway, because if I go that way and my proposal gets selected, ICFOSS will provide us with some support, such as certificates and a small amount of financial support.
But the main thing is that I like programming; I like to explore new technologies in coding and to work with code. So even if my proposal is rejected, I would still like to work on your project as a volunteer, if you allow me. I am preparing the proposal now and will submit it within 2 days. Chris Mattmann helped me understand the format of the proposal.

On Tue, Jul 16, 2013 at 8:12 PM, Chen, Pei <[email protected]> wrote:

> Chris/Sandeep,
> According to ASF-ICFOSS, I believe the deadline for submitting proposals is this coming Friday (July 19).
> After which point, mentors will have 2 weeks to review and score/accept.
> Just curious, are we planning to follow the same process here? Or, since it's all volunteer work, technically Sandeep can still contribute code to the community and participate in the dev group here.
>
> Looking forward to it.
> --Pei
>
> > -----Original Message-----
> > From: sandeep rg [mailto:[email protected]]
> > Sent: Monday, July 15, 2013 1:05 PM
> > To: [email protected]
> > Subject: Re: to involve in your development group
> >
> > Sir,
> > I went through most of the OCR technologies and reached a conclusion: I would like to use Apache Tika and JavaOCR for this purpose.
> >
> > Tesseract is an OCR tool that can extract text in multiple languages. It is implemented in C++, so it can be accessed from Java through native functions. There is also another tool, Tess4J, but reviews say that it has many bugs.
> >
> > Apache Tika is developed in Java. It can be used to extract text data from .xls, Word, txt, PDF and many other formats. It is also easy to integrate into a project. I have just gone through how it is implemented.
> >
> > As for JavaOCR, it is good for extracting text from JPEG or scanned images. We can train it with various fonts; the more we train it, the better its accuracy, but its speed decreases. I didn't find any particular documentation for it.
> >
> > On Sun, Jul 14, 2013 at 9:18 PM, sandeep rg <[email protected]> wrote:
> >
> > > Thanks a lot to both of you for your support. I will do my best to find a solution for the JIRA issue. I will share the proposal with both of you.
> > >
> > > On Sun, Jul 14, 2013 at 1:46 AM, Chen, Pei <[email protected]> wrote:
> > >
> > >> Sandeep,
> > >> It's great to have Chris on board as well; he was one of the coordinators of GSoC.
> > >> Looking forward to it.
> > >>
> > >> Sent from my iPhone
> > >>
> > >> On Jul 13, 2013, at 12:24 PM, "Mattmann, Chris A (398J)" <[email protected]> wrote:
> > >>
> > >> > Hi Sandeep,
> > >> >
> > >> > That is great news, and good job. OK, for some ideas about developing your proposal, you may want to simply start with a Google Doc, and then share it with Pei. I'd be happy to help co-mentor if Pei and you think it's useful too.
> > >> >
> > >> > Your proposal should likely cover:
> > >> >
> > >> > 1. Background - what's the state of CTAKES-189 and what's it trying to accomplish (include some figures, etc. along with your text)
> > >> >
> > >> > 2. Approach - what are you going to do to solve CTAKES-189. Be specific, and try to break it down into smaller, easily reversible steps
> > >> >
> > >> > 3. Schedule - how long and what is the schedule for achieving this?
> > >> >
> > >> > 4. Risks/etc. - any known risks like are you taking a vacation anytime soon :) or are there other time constraints?
> > >> >
> > >> > 5. References, etc.
> > >> >
> > >> > HTH and I'd be happy if you want to share the GDocs with me as you develop it.
> > >> >
> > >> > Cheers!
> > >> >
> > >> > Chris
> > >> >
> > >> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >> > Chris Mattmann, Ph.D.
> > >> > Senior Computer Scientist
> > >> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > >> > Office: 171-266B, Mailstop: 171-246
> > >> > Email: [email protected]
> > >> > WWW: http://sunset.usc.edu/~mattmann/
> > >> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >> > Adjunct Assistant Professor, Computer Science Department
> > >> > University of Southern California, Los Angeles, CA 90089 USA
> > >> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >> >
> > >> > -----Original Message-----
> > >> > From: sandeep rg <[email protected]>
> > >> > Reply-To: "[email protected]" <[email protected]>
> > >> > Date: Saturday, July 13, 2013 8:57 AM
> > >> > To: "[email protected]" <[email protected]>
> > >> > Subject: Re: to involve in your development group
> > >> >
> > >> >> I have also gone through the technologies available for OCR development; of those, I think Apache Tika and Tesseract are the best for resolving the problem.
> > >> >>
> > >> >> On Sat, Jul 13, 2013 at 9:02 PM, sandeep rg <[email protected]> wrote:
> > >> >>
> > >> >>> Hi Chris Mattmann,
> > >> >>> I participated in the event coordinated by Luciano Resende
> > >> >>>
> > >> >>> http://community.apache.org/mentoringprogramme-icfoss-pilot.html
> > >> >>>
> > >> >>> and from that I learned about open source and would like to work on your project, cTAKES. I would like to fix the JIRA issue
> > >> >>>
> > >> >>> https://issues.apache.org/jira/browse/CTAKES-189
> > >> >>>
> > >> >>> Pei Chen accepted my request to be my mentor. Now I want to submit a proposal to Apache about the project I am going to work on. Can you help me prepare a proposal to be submitted before the 18th of this July?
> > >> >>>
> > >> >>> On Sat, Jul 13, 2013 at 2:26 AM, Mattmann, Chris A (398J) <[email protected]> wrote:
> > >> >>>
> > >> >>>> Hi Sandeep,
> > >> >>>>
> > >> >>>> I think the best thing to do is:
> > >> >>>>
> > >> >>>> 1. Develop a JIRA issue here: https://issues.apache.org/jira/browse/CTAKES
> > >> >>>> 1a. you can register for a new account on JIRA
> > >> >>>> 2. Once your JIRA issue is created, feel free to start a [DISCUSS] thread (e.g., with subject [DISCUSS] "some topic" where "some topic" is perhaps the main idea you have) on [email protected], referencing your issue and asking for feedback
> > >> >>>> 3. Work with the Apache cTAKES PMC and committers to get your patches and other items attached to your issue from #1 committed into the sources
> > >> >>>>
> > >> >>>> Ideally if 1-3 happen and it's a good interaction, Apache is built on meritocracy and you could possibly earn the merit to become a PMC member or committer on the project.
> > >> >>>>
> > >> >>>> Cheers,
> > >> >>>> Chris
> > >> >>>>
> > >> >>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >> >>>> Chris Mattmann, Ph.D.
> > >> >>>> Senior Computer Scientist
> > >> >>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > >> >>>> Office: 171-266B, Mailstop: 171-246
> > >> >>>> Email: [email protected]
> > >> >>>> WWW: http://sunset.usc.edu/~mattmann/
> > >> >>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >> >>>> Adjunct Assistant Professor, Computer Science Department
> > >> >>>> University of Southern California, Los Angeles, CA 90089 USA
> > >> >>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >> >>>>
> > >> >>>> -----Original Message-----
> > >> >>>> From: sandeep rg <[email protected]>
> > >> >>>> Reply-To: "[email protected]" <[email protected]>
> > >> >>>> Date: Thursday, July 11, 2013 11:30 AM
> > >> >>>> To: "[email protected]" <[email protected]>
> > >> >>>> Subject: Re: to involve in your development group
> > >> >>>>
> > >> >>>>> Can you tell me what details I should include in a proposal? Should I include all the implementation (technical) details in the proposal?
> > >> >>>>>
> > >> >>>>> On Thu, Jul 11, 2013 at 9:45 PM, Mattmann, Chris A (398J) <[email protected]> wrote:
> > >> >>>>>
> > >> >>>>>> Dear Sandeep,
> > >> >>>>>>
> > >> >>>>>> Thanks for your interest in cTAKES. We would welcome your contribution and are happy to have your interest in the project.
> > >> >>>>>>
> > >> >>>>>> Cheers,
> > >> >>>>>> Chris
> > >> >>>>>>
> > >> >>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >> >>>>>> Chris Mattmann, Ph.D.
> > >> >>>>>> Senior Computer Scientist
> > >> >>>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > >> >>>>>> Office: 171-266B, Mailstop: 171-246
> > >> >>>>>> Email: [email protected]
> > >> >>>>>> WWW: http://sunset.usc.edu/~mattmann/
> > >> >>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >> >>>>>> Adjunct Assistant Professor, Computer Science Department
> > >> >>>>>> University of Southern California, Los Angeles, CA 90089 USA
> > >> >>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >> >>>>>>
> > >> >>>>>> -----Original Message-----
> > >> >>>>>> From: sandeep rg <[email protected]>
> > >> >>>>>> Reply-To: "[email protected]" <[email protected]>
> > >> >>>>>> Date: Wednesday, July 10, 2013 11:01 AM
> > >> >>>>>> To: "[email protected]" <[email protected]>
> > >> >>>>>> Subject: Re: to involve in your development group
> > >> >>>>>>
> > >> >>>>>>> Sir,
> > >> >>>>>>>
> > >> >>>>>>> My name is Sandeep RG. I am a B.Tech graduate in computer science, now doing an internship at a company, working in Java.
> > >> >>>>>>>
> > >> >>>>>>> I have installed everything successfully and am now downloading the resources; it is taking a long time.
> > >> >>>>>>>
> > >> >>>>>>> I have gone through the suggested OCR technologies.
> > >> >>>>>>> JavaOCR has some good user reviews.
> > >> >>>>>>> Apache Tika can process many different file formats.
> > >> >>>>>>> In addition, there is Tesseract, which is also used for OCR.
> > >> >>>>>>> Apache PDFBox is also used for text extraction, but only for PDF files.
> > >> >>>>>>> I am now going through everything to find the best technology among these.
> > >> >>>>>>>
> > >> >>>>>>> On Wed, Jul 10, 2013 at 12:52 AM, Chen, Pei <[email protected]> wrote:
> > >> >>>>>>>
> > >> >>>>>>>> Hi Sandeep,
> > >> >>>>>>>> I am delighted to work with you on this project.
> > >> >>>>>>>>
> > >> >>>>>>>> I was not sure if I understood you correctly: did you mean to say that you have already tried using cTAKES and its components?
> > >> >>>>>>>> If not, you can do an svn checkout of the code and try running the debugger GUI from the command line (or Eclipse IDE), which will allow you to type in plain text and get back the different structured content (types) that cTAKES produces:
> > >> >>>>>>>> MAVEN_OPTS="-Xmx2g -Xms1g"
> > >> >>>>>>>> mvn -PrunCVD compile
> > >> >>>>>>>> From the guide:
> > >> >>>>>>>> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.0+Developer+Install+Guide
> > >> >>>>>>>>
> > >> >>>>>>>> A bit of background:
> > >> >>>>>>>> Apache cTAKES uses SVN for version control: https://svn.apache.org/repos/asf/ctakes/trunk/
> > >> >>>>>>>> Jira for issue tracking: https://issues.apache.org/jira/browse/ctakes
> > >> >>>>>>>> Maven for building and dependency management.
> > >> >>>>>>>> A lot of the developers use Eclipse IDE for their development.
> > >> >>>>>>>> More info on ctakes.apache.org
> > >> >>>>>>>>
> > >> >>>>>>>> cTAKES is built on top of the Apache UIMA Framework. Essentially, cTAKES is a collection of Annotators (Java classes) wired together into a pipeline.
> > >> >>>>>>>> Its goal in a nutshell is to turn unstructured plain text into structured/normalized form, and it is specially trained for medical notes.
> > >> >>>>>>>> Right now, the input cTAKES expects is plain text, and cTAKES does not have an OCR component.
> > >> >>>>>>>> CTAKES-189 (GSoC: implement OCR/tika to standardize text inputs) was an idea to allow cTAKES to take in any type of input (PDF, images, Word, XLS, etc.) and pass the text on for cTAKES processing.
> > >> >>>>>>>> [I was originally thinking this could be done in some kind of preprocessing, or an optional Annotator that could be added in the beginning of a pipeline]. There may be some existing work that could be potentially reused: Apache Tika ( https://issues.apache.org/jira/browse/TIKA-93 ) as well as some open source OCR toolkits (JavaOCR).
> > >> >>>>>>>>
> > >> >>>>>>>> About Me:
> > >> >>>>>>>> http://childrenshospital.org/cfapps/research/data_admin/Site3240/mainpageS3240P8.html
> > >> >>>>>>>> http://www.linkedin.com/in/peistation
> > >> >>>>>>>> http://people.apache.org/committer-index.html#chenpei
> > >> >>>>>>>>
> > >> >>>>>>>>> -----Original Message-----
> > >> >>>>>>>>> From: sandeep rg [mailto:[email protected]]
> > >> >>>>>>>>> Sent: Tuesday, July 09, 2013 1:19 PM
> > >> >>>>>>>>> To: [email protected]
> > >> >>>>>>>>> Subject: Re: to involve in your development group
> > >> >>>>>>>>>
> > >> >>>>>>>>> Thanks a lot for your support. I would like to work with you.
> > >> >>>>>>>>>
> > >> >>>>>>>>> I have gone through the objectives of the software, used the software, and gone through the various components of the project. Can you give me a starting point from which I can learn more about the coding part of the project?
> > >> >>>>>>>>>
> > >> >>>>>>>>> Can you tell me more about the project, and about yourself as well?
> > >> >>>>>>>>>
> > >> >>>>>>>>> On Tue, Jul 9, 2013 at 1:14 AM, Chen, Pei <[email protected]> wrote:
> > >> >>>>>>>>>
> > >> >>>>>>>>>> Hi Sandeep,
> > >> >>>>>>>>>> Thank you for the interest. I just had a quick look at the ICFOSS pilot mentoring program and will be happy to serve as a mentor for your project proposal(s) if you are interested.
> > >> >>>>>>>>>>
> > >> >>>>>>>>>> --Pei
> > >> >>>>>>>>>>
> > >> >>>>>>>>>>> -----Original Message-----
> > >> >>>>>>>>>>> From: sandeep rg [mailto:[email protected]]
> > >> >>>>>>>>>>> Sent: Monday, July 08, 2013 2:24 PM
> > >> >>>>>>>>>>> To: [email protected]
> > >> >>>>>>>>>>> Subject: Re: to involve in your development group
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>> Sir,
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>> Details of the pilot mentoring programme with ICFOSS, India, are given at the web address below:
> > >> >>>>>>>>>>> http://community.apache.org/mentoringprogramme-icfoss-pilot.html
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>> I am new to this community, so I need a mentor for the project. It would be very helpful for me.
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>> On Mon, Jul 8, 2013 at 7:22 PM, Chen, Pei <[email protected]> wrote:
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>>> Hi Sandeep,
> > >> >>>>>>>>>>>> Welcome! I am not familiar with the details of icfoss-apache, but please, you are more than welcome to work on the code, and contributions will be greatly appreciated!
> > >> >>>>>>>>>>>> There may be a learning curve, but feel free to let us know if you have any questions/issues.
> > >> >>>>>>>>>>>> Thanks,
> > >> >>>>>>>>>>>> Pei
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>>> -----Original Message-----
> > >> >>>>>>>>>>>>> From: sandeep rg [mailto:[email protected]]
> > >> >>>>>>>>>>>>> Sent: Saturday, July 06, 2013 11:50 AM
> > >> >>>>>>>>>>>>> To: [email protected]
> > >> >>>>>>>>>>>>> Subject: to involve in your development group
> > >> >>>>>>>>>>>>>
> > >> >>>>>>>>>>>>> My name is Sandeep. I am a B.Tech graduate. I participated in a camp held in Kerala, India, in association with icfoss-apache, called the youth mentoring programme and coordinated by Luciano Resende.
> > >> >>>>>>>>>>>>>
> > >> >>>>>>>>>>>>> I like the project and would like to get involved in it as a programmer. I have gone through your project and through the bug list. I would like to work on the bug "CTAKES-189: GSoC: implement OCR/tika to standardize text inputs for cTAKES". Can you allow me to work on that?
