Hi Mark,

I would sincerely like to thank you.
The tips you have given are really helpful.
I appreciate your time and effort.

Regards,
Bihag 



MSB wrote:
> 
> I have not had the time to do much work or ANY testing so please treat this
> with caution.
> 
> What I am proposing is that the contents of a Word document be converted
> into an ArrayList. That ArrayList will contain instances of the
> DocumentPart class and these will facilitate the comparison operation. I
> have not given those a great deal of thought yet but believe that we
> should check for any paragraphs - not tables yet - being inserted,
> deleted, modified (not sure how to proceed with this one yet) or moved. As
> you can see, I have provided constants in the DocumentPart class to
> support these different results. The comparison status flag is there to
> prevent a paragraph being checked again once a match has been found but I
> am thinking of another use if the logic holds.
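
A rough sketch of how the paragraph matching described above might work. It is not the compareDocs() method from the code below, which is still to be written. Plain Strings stand in for DocumentPart instances so that it runs without POI, a boolean array plays the role of the comparison status flag, and the class and method names are invented for illustration. MODIFIED is left out because, as noted above, that case is undecided.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Sketch only: classifies paragraphs by exact text match.
 * ParagraphCompareSketch and compare() are hypothetical names.
 */
class ParagraphCompareSketch {

    // Same values as the constants declared in DocumentPart.
    static final int INSERTED = 0;
    static final int DELETED = 1;
    static final int UN_MODIFIED = 3;
    static final int MOVED = 4;

    /**
     * Returns two arrays: result[0] holds one code per paragraph of the
     * original document, result[1] one code per paragraph of the document
     * it is compared to.
     */
    static int[][] compare(List<String> original, List<String> compareTo) {
        int[] origResult = new int[original.size()];
        int[] compResult = new int[compareTo.size()];
        // Stands in for the comparison status flag: once a paragraph has
        // been matched it is never checked again.
        boolean[] matched = new boolean[compareTo.size()];

        // Anything in the second document that is never matched is new.
        Arrays.fill(compResult, INSERTED);
        for (int i = 0; i < original.size(); i++) {
            origResult[i] = DELETED; // assumed deleted until a match is found
            for (int j = 0; j < compareTo.size(); j++) {
                if (!matched[j] && original.get(i).equals(compareTo.get(j))) {
                    matched[j] = true;
                    // Same index means untouched, different index means moved.
                    int code = (i == j) ? UN_MODIFIED : MOVED;
                    origResult[i] = code;
                    compResult[j] = code;
                    break;
                }
            }
        }
        return new int[][] { origResult, compResult };
    }
}
```

Calling compare() with the paragraphs A, B, C and A, C, D marks B as deleted, C as moved and D as inserted. Comparing positions by index is naive: a single paragraph inserted at the top shifts every later paragraph and reports them all as moved, so a longest-common-subsequence style diff would probably be needed in practice.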
> 
> As yet, I have not coded the compare methods or the save results method as
> I think it is wise to thoroughly test the loading method first. We need
> to be certain that the ArrayList of DocumentPart(s) accurately describes
> the documents. I think that you are in a 'better' time-zone and that you
> may have the opportunity to test the code before me. If you look at the
> main method of the DocumentComparator class, you will see how to run the
> code. All you need to do for now is make sure that the first two
> parameters to the compareDocuments() method point to Word files and then
> run the code. To check the results, you can either modify DocumentPart to
> add a toString() method that outputs the instance's contents or simply
> call the getParagraphText() and getCellContents() methods from the
> compareDocuments() method.
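
For the toString() suggestion, something along these lines might do. This is only a sketch: PartSummarySketch is a made-up stand-in class so that the example runs without POI. In the real DocumentPart the flag would come from isTable() and the text from getParagraphText().

```java
/**
 * Sketch only: a stand-in for DocumentPart showing one possible toString().
 * The class name and constructor are hypothetical.
 */
class PartSummarySketch {

    private final boolean table;
    private final String text;

    PartSummarySketch(boolean table, String text) {
        this.table = table;
        this.text = text;
    }

    @Override
    public String toString() {
        if (table) {
            return "[TABLE]";
        }
        // trim() drops leading and trailing whitespace and control
        // characters, for example a trailing carriage return.
        return "[PARAGRAPH] \"" + text.trim() + "\"";
    }
}
```

Printing each element of the ArrayList returned by loadDocument() in a loop would then give a quick view of what was loaded.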
> 
> Anyway, here is the code so far. Have a look and see if it is the way you
> want to go - or think makes sense. Do not feel that you cannot criticise
> or alter the code or the approach as, for now, we are not committed to any
> particular strategy, just exploring what is possible.
> 
> package comparedocuments;
> 
> import java.io.File;
> import java.io.FileInputStream;
> import java.util.ArrayList;
> import java.io.FileNotFoundException;
> import java.io.IOException;
> 
> import org.apache.poi.hwpf.HWPFDocument;
> import org.apache.poi.hwpf.usermodel.Range;
> import org.apache.poi.hwpf.usermodel.Paragraph;
> 
> /**
>  * An instance of this class can be used to perform a comparison between two
>  * binary (OLE2CDF) Microsoft Word documents.
>  *
>  * @author Mark B
>  * @version 1.00 27th July 2009
>  */
> public class DocumentComparator {
>     
>     /**
>      * Called to compare the two documents and output the results of the
>      * comparison to a third Microsoft Word document.
>      * 
>      * @param originalDoc The path to and name of the original document, the
>      *                    document that is the basis for the comparison.
>      * @param compareToDoc The path to and name of the document that should
>      *                     be compared with the original for any modifications.
>      * @param resultDoc The path to and name of the document that should contain
>      *                  the results of the comparison process.
>      * @param docTemplate The path to and name of the empty Word document that
>      *                    should be used as the basis for the results document.
>      * @throws java.io.IOException Thrown to signal that some sort of I/O
>      *                             Exception has occurred.
>      * @throws java.io.FileNotFoundException Thrown to signal that a file
>      *                                       could not be located.
>      */
>     public void compareDocuments(String originalDoc, String compareToDoc,
>                                  String resultDoc, String docTemplate)
>                                  throws IOException, FileNotFoundException {
>         ArrayList<DocumentPart> originalDocParts =
>                 this.loadDocument(originalDoc);
>         ArrayList<DocumentPart> compareToDocParts =
>                 this.loadDocument(compareToDoc);
>         this.compareDocs(originalDocParts, compareToDocParts);
>         this.saveResults(originalDocParts, compareToDocParts, resultDoc);
>     }
>     
>     /**
>      * Opens a named binary (OLE2CDF) Microsoft Word document and converts that
>      * document's contents into an ArrayList of instances of the DocumentPart
>      * class.
>      * @param docName The path to and name of a Microsoft Word document file.
>      * @return An instance of the ArrayList class encapsulating instances
>      *         of the DocumentPart class. Each DocumentPart will encapsulate
>      *         information about a paragraph of text or a table recovered from
>      *         the Microsoft Word document.
>      * @throws java.io.IOException If an I/O Exception occurs
>      * @throws java.io.FileNotFoundException Thrown to indicate that the
>      *                                       named Microsoft Word file could
>      *                                       not be located.
>      */
>     public ArrayList<DocumentPart> loadDocument(String docName)
>                                      throws IOException, FileNotFoundException {
>         File file = null;
>         FileInputStream fis = null;
>         HWPFDocument document = null;
>         Range overallRange = null;
>         Paragraph para = null;
>         int numParas = 0;
>         boolean inTable = false;
>         ArrayList<DocumentPart> docParts = new ArrayList<DocumentPart>();
>         try {
>             // Open the Word file.
>             file = new File(docName);
>             fis = new FileInputStream(file);
>             document = new HWPFDocument(fis);
>             // Get the overall Range for the document and the number
>             // of paragraphs from this Range.
>             overallRange = document.getOverallRange();
>             numParas = overallRange.numParagraphs();
>             for(int i = 0; i < numParas; i++) {
>                 para = overallRange.getParagraph(i);
>                 // Is the paragraph 'in' a table? If so, it is possible to
>                 // recover a reference to that Table from the first paragraph
>                 // only. If calls are made to the getTable() method using
>                 // subsequent paragraphs then an exception will be thrown. So,
>                 // after getting the Table, a flag is set to prevent further
>                 // calls to the getTable() method.
>                 if(para.isInTable()) {
>                     if(!inTable) {
>                         // Get a reference to the Table and pass it to the
>                         // constructor of the DocumentPart class. Add the
>                         // DocumentPart instance to the ArrayList.
>                         docParts.add(new DocumentPart(
>                                 overallRange.getTable(para)));
>                         inTable = true;
>                     }
>                 }
>                 // The paragraph is not in a table so simply add a new instance
>                 // to the ArrayList that encapsulates the paragraph of text.
>                 else {
>                     docParts.add(new DocumentPart(para));
>                     inTable = false;
>                 }
>             }
>             return(docParts);
>         }
>         finally {
>             if(fis != null) {
>                 try {
>                   fis.close();  
>                 }
>                 catch(IOException ioEx) {
>                     // I G N O R E
>                 }
>             }
>         }
>     }
>     
>     public void compareDocs(ArrayList<DocumentPart> originalDocParts,
>                             ArrayList<DocumentPart> compareToDocParts) {
>         // TO DO: Code comparison
>     }
>     
>     public void saveResults(ArrayList<DocumentPart> originalDocParts,
>                             ArrayList<DocumentPart> compareToDocParts,
>                             String resultDoc)
>                                      throws IOException, FileNotFoundException {
>         // TO DO: Code saving of results.
>     }
> 
>     /**
>      * Main entry point to the program.
>      *
>      * @param args
>      */
>     public static void main(String[] args) {
>         try {
>             DocumentComparator docComp = new DocumentComparator();
>             docComp.compareDocuments("original document",
>                                      "compare to document",
>                                      "results document",
>                                      "results document template");
>         }
>         catch(FileNotFoundException fnfEx) {
>             // TO DO: Code exception handling.
>         }
>         catch(IOException ioEx) {
>             // TO DO: Code exception handling.
>         }
>     }
> }
> 
> package comparedocuments;
> 
> import org.apache.poi.hwpf.usermodel.Range;
> import org.apache.poi.hwpf.usermodel.Paragraph;
> import org.apache.poi.hwpf.usermodel.Table;
> import org.apache.poi.hwpf.usermodel.TableRow;
> 
> /**
>  * Encapsulates a 'part' of a Microsoft Word document. Currently, that part can
>  * either be a Table or a paragraph of text.
>  *
>  * @author Mark B
>  * @version 1.00 27th July 2009.
>  */
> public class DocumentPart {
> 
>     private Range docPart = null;
>     private boolean comparisonStatus = false;
>     private int comparisonResult = 0;
> 
>     public static final int INSERTED = 0;
>     public static final int DELETED = 1;
>     public static final int MODIFIED = 2;
>     public static final int UN_MODIFIED = 3;
>     public static final int MOVED = 4;
> 
>     /**
>      * Create a new instance of the DocumentPart class using the following
>      * parameter.
>      *
>      * @param docPart An instance of the org.apache.poi.hwpf.usermodel.Range
>      *                class that will encapsulate an instance of the
>      *                org.apache.poi.hwpf.usermodel.Paragraph or an instance
>      *                of the org.apache.poi.hwpf.usermodel.Table class.
>      */
>     public DocumentPart(Range docPart) {
>         this.docPart = docPart;
>         // Note that as the part has not been successfully compared to another
>         // part the status is false.
>         this.comparisonStatus = false;
>         // and that the type is set to un-modified. Any parts that have not been
>         // checked or that are not un-modified will be written away to the
>         // results document.
>         this.comparisonResult = DocumentPart.UN_MODIFIED;
>     }
> 
>     /**
>      * Has a match been found for this document part?
>      *
>      * @return A boolean value that indicates whether a match was found between
>      *         two document parts.
>      */
>     public boolean isMatched() {
>         return(this.comparisonStatus);
>     }
> 
>     /**
>      * Get the result of the comparison.
>      *
>      * @return A primitive int value that indicates the result of comparing
>      *         this document part to others. The following constants have been
>      *         declared:
>      *             DocumentPart.INSERTED = 0;
>      *             DocumentPart.DELETED = 1;
>      *             DocumentPart.MODIFIED = 2;
>      *             DocumentPart.UN_MODIFIED = 3;
>      *             DocumentPart.MOVED = 4;
>      *
>      */
>     public int getComparisonResult() {
>         return(this.comparisonResult);
>     }
> 
>     /**
>      * Store the result of the comparison between document parts.
>      *
>      * @param comparisonResult A primitive int whose value indicates the result
>      *                         of comparing one document part with others.
>      */
>     public void setComparisonResult(int comparisonResult) {
>         this.comparisonResult = comparisonResult;
>     }
> 
>     /**
>      * Does a DocumentPart encapsulate a table?
>      * @return A primitive boolean value; true if the DocumentPart encapsulates
>      *         a Table, false otherwise.
>      */
>     public boolean isTable() {
>         return(this.docPart instanceof Table);
>     }
> 
>     /**
>      * If the DocumentPart encapsulates a Table, get the number of rows in the
>      * table.
>      *
>      * @return A primitive int whose value indicates how many rows there are in
>      *         the table.
>      * @throws java.lang.UnsupportedOperationException Thrown if this method is
>      *         called for a DocumentPart instance that encapsulates a Paragraph.
>      */
>     public int getNumRows() throws UnsupportedOperationException {
>         int numRows = 0;
>         if(this.isTable()) {
>             Table table = (Table)this.docPart;
>             numRows = table.numRows();
>         }
>         else {
>             throw new UnsupportedOperationException(
>                     "The DocumentPart does not encapsulate a Table.");
>         }
>         return(numRows);
>     }
> 
>     /**
>      * How many columns are there in the Table? This method assumes that the
>      * table is 'square', i.e. that each row of the Table holds the same number
>      * of columns.
>      *
>      * @return A primitive int whose value indicates how many columns there are
>      *         in the Table.
>      * @throws java.lang.UnsupportedOperationException Thrown if this method is
>      *         called for a DocumentPart instance that encapsulates a Paragraph.
>      */
>     public int getNumColumns() throws UnsupportedOperationException {
>         return(this.getNumColumns(0));
>     }
> 
>     /**
>      * How many columns are there in a specific row of the Table?
>      *
>      * @param rowNum A primitive int that indicates the row to inspect.
>      *               Remember that row indices are zero based.
>      * @return A primitive int whose value indicates how many columns there are
>      *         in the Table row.
>      * @throws java.lang.UnsupportedOperationException Thrown if this method is
>      *         called for a DocumentPart instance that encapsulates a Paragraph.
>      */
>     public int getNumColumns(int rowNum) throws UnsupportedOperationException {
>         int numColumns = 0;
>         if(this.isTable()) {
>             Table table = (Table)this.docPart;
>             TableRow row = table.getRow(rowNum);
>             numColumns = row.numCells();
>         }
>         else {
>             throw new UnsupportedOperationException(
>                     "The DocumentPart does not encapsulate a Table.");
>         }
>         return(numColumns);
>     }
> 
>     /**
>      * Return the contents of a specific cell.
>      *
>      * @param rowNum A primitive int that indicates the row the cell is on.
>      *               Remember that row indices are zero based.
>      * @param colNum A primitive int that indicates the column the cell is in.
>      *               Remember that column indices are zero based.
>      * @return An instance of the String class that encapsulates the cell's
>      *         contents
>      * @throws java.lang.UnsupportedOperationException Thrown if this method is
>      *         called for a DocumentPart instance that encapsulates a Paragraph.
>      */
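>     public String getCellContents(int rowNum, int colNum)
>                                       throws UnsupportedOperationException {
>         if(!this.isTable()) {
>             throw new UnsupportedOperationException(
>                     "The DocumentPart does not encapsulate a Table.");
>         }
>         // Untested: return the raw text of the requested cell.
>         return(((Table)this.docPart).getRow(rowNum).getCell(colNum).text());
>     }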
> 
>     /**
>      * Return the text of the Paragraph.
>      *
>      * @return An instance of the String class that encapsulates the text
>      *         the Paragraph contained. Note that this will be stripped of
>      *         all fields.
>      * @throws java.lang.UnsupportedOperationException Thrown if this method is
>      *         called for a DocumentPart instance that encapsulates a Table.
>      */
>     public String getParagraphText() throws UnsupportedOperationException {
>         String returnValue = null;
>         if(!this.isTable()) {
>             Paragraph para = (Paragraph)this.docPart;
>             returnValue = Range.stripFields(para.text());
>         }
>         else {
>             throw new UnsupportedOperationException(
>                     "The DocumentPart does not encapsulate a Paragraph.");
>         }
>         return(returnValue);
>     }
> }
> 
> 
> 
> bihag wrote:
>> 
>> Hi All,
>> 
>> We want to compare two documents and highlight whatever is not common to
>> both, with some color or in any other way ... So I think we have to merge
>> the documents or create a new document which has the content of both
>> documents and shows the differences with some color, like deleted in red,
>> newly added in blue ...
>> 
>> Mainly we are looking for an OLE2CDF doc compare solution ...
>> 
>> Please provide a code snippet if possible ...
>> 
>> Thanking you in advance ...
>> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/How-to-compare-2-word-doc-%28OLE2CDF-or-OpenXML%29.-tp24673506p24692130.html
Sent from the POI - Dev mailing list archive at Nabble.com.

