Prakash,

There is an open-source library for extracting text from a Word document here: http://www.textmining.org
-Ryan

-----Original Message-----
From: prakash jaya [mailto:[EMAIL PROTECTED]
Sent: Thursday, October 27, 2005 9:28 AM
To: [email protected]
Subject: Re: how can i extract text from Powerpointfiles,Ms word files

Hello friends,

Thank you for your suggestions. I tried them and got some results, but I am not getting the whole document; I am getting only its first line. Please suggest a solution. Here is my code:

////////////////////////////////////////////////////////////////////
import java.io.*;
import org.apache.poi.hwpf.usermodel.*;
import org.apache.poi.hwpf.HWPFDocument;

public class Test11 {

    public Test11() {
    }

    public static void main(String[] args) throws IOException {
        try {
            HWPFDocument doc = new HWPFDocument(new FileInputStream(fin));
            Range r = doc.getRange();
            FileOutputStream out = new FileOutputStream("d:\\example.txt");
            for (int x = 0; x < r.numSections(); x++) {
                Section s = r.getSection(x);
                for (int y = 0; y < s.numParagraphs(); y++) {
                    Paragraph p = s.getParagraph(y);
                    for (int z = 0; z < p.numCharacterRuns(); z++) {
                        // character run
                        CharacterRun run = p.getCharacterRun(z);
                        // character run text
                        String text = run.text();
                        byte[] b1 = text.getBytes();
                        // show us the text
                        out.write(b1);
                    }
                    out.close();
                }
            }
        } catch (Throwable t) {
            t.printStackTrace();
        }
    }
}
///////////////////////////////////////////////////////

My original document is:

I want to read a powerpoit file "A" and write it content to create another powerpoint "B". The simple way is to use FileInputStream to read a byte array from file A.ppt and FileOutputStream to write the byte array to B.ppt. It's work. But today, i don't want to use raw byte array to write to B.ppt immediately(Just the program's demand, i do not want to do this too!!:<). I translate the byte array to "String", then translate it back to byte array and write it to B.ppt. One problem happens!!

The program's output is:

I want to read a powerpoit file "A" and write it content to create another powerpoint "B".
Please suggest a solution to this problem. I would be grateful for any help.

>From: Rama Subba Reddy <[EMAIL PROTECTED]>
>Reply-To: "POI Users List" <[email protected]>
>To: POI Users List <[email protected]>
>Subject: Re: how can i extract text from Powerpointfiles,Ms word files
>Date: Thu, 27 Oct 2005 12:53:07 +0100 (BST)
>
>Hello,
>Use the following code to extract the text:
>
>HWPFDocument doc = new HWPFDocument(fin);
>Range range = doc.getRange();
>int totParagraphs = range.numParagraphs();
>for (int i = 0; i < totParagraphs; i++) {
>    Paragraph para = range.getParagraph(i);
>    // get the text runs from para, then get the text and properties from each run
>}
>
>prakash jaya <[EMAIL PROTECTED]> wrote:
>
>Hello friend, good morning.
>I am getting text from PowerPoint presentations using POI's
>PowerPointExtractor class, but how do I get text from MS Word files?
>I ran the HWPFDocument.java class. According to the specification it takes
>two arguments (one is the source file, the other is the destination file).
>It does not give any result, and it does not create any destination file.
>Can you please suggest a solution to this problem? I would be grateful.
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: [EMAIL PROTECTED]
>Mailing List: http://jakarta.apache.org/site/mail2.html#poi
>The Apache Jakarta Poi Project: http://jakarta.apache.org/poi/
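A note on the Test11 code posted in this thread: out.close() appears to sit inside the paragraph loop, so the stream is closed after the first paragraph is written. The next out.write() would then fail with an IOException, which the catch block only prints, and that would match the symptom of getting just the first line. Below is a minimal sketch of the alternative, closing the stream once after every chunk has been written. It uses only the standard library so it runs without POI; ParagraphWriter and writeAll are hypothetical names, and the strings in main are stand-ins for what CharacterRun.text() returned in the posted code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of POI: writes every text chunk to one stream.
final class ParagraphWriter {

    // Writes each chunk in order. The caller owns the stream and closes it once,
    // after this method returns, never inside the loop.
    static void writeAll(List<String> chunks, OutputStream out) throws IOException {
        for (String chunk : chunks) {
            // Explicit charset so the bytes do not depend on the platform default.
            out.write(chunk.getBytes(StandardCharsets.UTF_8));
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins (assumed data) for the text of each paragraph's character runs.
        List<String> chunks = Arrays.asList(
                "first paragraph\n", "second paragraph\n", "third paragraph\n");

        // try-with-resources closes the stream exactly once, after all writes.
        try (ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            writeAll(chunks, out);
            System.out.print(new String(out.toByteArray(), StandardCharsets.UTF_8));
        }
    }
}
```

To write to a file as in the original program, the ByteArrayOutputStream in the try-with-resources header could be swapped for a FileOutputStream; the loop itself stays the same.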
