Hi Mark,
I would sincerely like to convey my thanks to you.
The tips you have given are really helpful.
I appreciate your time and effort.
Regards,
Bihag
MSB wrote:
>
> Have not had the time to do much work or ANY testing so please treat this
> with caution.
>
> What I am proposing is that the contents of a Word document be converted
> into an ArrayList. That ArrayList will contain instances of the
> DocumentPart class and these will facilitate the comparison operation. I
> have not given those a great deal of thought yet but believe that we
> should check for any paragraphs - not tables yet - being inserted,
> deleted, modified (not sure how to proceed with this one yet) or moved. As
> you can see, I have provided constants in the DocumentPart class to
> support these different results. The comparison status flag is there to
> prevent a paragraph being checked again once a match has been found but I
> am thinking of another use if the logic holds.
>
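A minimal sketch of the comparison step proposed above, assuming exact text equality is enough to call two paragraphs a match. ParagraphDiffSketch, Part, toParts and compare are hypothetical names used only for illustration; Part is a plain-Java stand-in for DocumentPart so the logic can be tried without POI on the classpath. MODIFIED is deliberately left out, since how to detect it is still an open question here.

```java
import java.util.ArrayList;
import java.util.List;

final class ParagraphDiffSketch {

    // Same values as the constants in DocumentPart (MODIFIED = 2 is not handled).
    static final int INSERTED = 0;
    static final int DELETED = 1;
    static final int UN_MODIFIED = 3;
    static final int MOVED = 4;

    /** Hypothetical stand-in for DocumentPart: text plus comparison state. */
    static final class Part {
        final String text;
        boolean matched = false;      // plays the role of the comparison status flag
        int result = UN_MODIFIED;
        Part(String text) { this.text = text; }
    }

    static List<Part> toParts(String... paragraphs) {
        List<Part> parts = new ArrayList<>();
        for (String p : paragraphs) {
            parts.add(new Part(p));
        }
        return parts;
    }

    /** Pairs each original paragraph with the first unmatched identical one. */
    static void compare(List<Part> original, List<Part> revised) {
        for (int i = 0; i < original.size(); i++) {
            Part orig = original.get(i);
            for (int j = 0; j < revised.size(); j++) {
                Part rev = revised.get(j);
                // The matched flag stops a paragraph being paired twice.
                if (!rev.matched && rev.text.equals(orig.text)) {
                    orig.matched = true;
                    rev.matched = true;
                    int status = (i == j) ? UN_MODIFIED : MOVED;
                    orig.result = status;
                    rev.result = status;
                    break;
                }
            }
            if (!orig.matched) {
                orig.result = DELETED;   // present only in the original
            }
        }
        for (Part rev : revised) {
            if (!rev.matched) {
                rev.result = INSERTED;   // present only in the revised document
            }
        }
    }

    public static void main(String[] args) {
        List<Part> original = toParts("Intro", "Body", "Summary");
        List<Part> revised = toParts("Intro", "Summary", "Appendix");
        compare(original, revised);
        for (Part p : original) {
            System.out.println("original: " + p.text + " -> " + p.result);
        }
        for (Part p : revised) {
            System.out.println("revised:  " + p.text + " -> " + p.result);
        }
    }
}
```

Comparing positions by index is naive: one paragraph inserted near the top shifts everything after it, and those paragraphs would all be flagged MOVED. A real implementation would probably compare relative order instead, for example with a longest-common-subsequence pass.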
> As yet, I have not coded the compare methods or the save results method as
> I think it is wise to thoroughly test the loading method first. We need
> to be certain that the ArrayList of DocumentPart(s) accurately describes
> the documents. I think that you are in a 'better' time-zone and that you
> may have the opportunity to test the code before me. If you look at the
> main method of the DocumentComparator class, you will see how to run the
> code. All you need to do for now is make sure that the first two
> parameters to the compareDocuments() method point to Word files and then
> run the code. To check the results, you can either modify DocumentPart to
> add a toString() method that outputs the instance's contents or simply
> call the getParagraphText() and getCellContents() methods from the
> compareDocuments() method.
>
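A rough sketch of what such a toString() could print, written as a standalone helper so it runs without POI. PartDumpSketch and describe are hypothetical names; in the real class the body would sit in DocumentPart.toString() and read its values from isTable(), isMatched(), getComparisonResult() and getParagraphText(). The text is trimmed on the assumption that paragraph text may end with a paragraph mark.

```java
final class PartDumpSketch {

    // Names in the same order as the int constants in DocumentPart (0..4).
    private static final String[] RESULT_NAMES = {
        "INSERTED", "DELETED", "MODIFIED", "UN_MODIFIED", "MOVED"
    };

    /** Builds a one-line description of a document part. */
    static String describe(boolean isTable, boolean matched,
                           int comparisonResult, String text) {
        String kind = isTable ? "Table" : "Paragraph";
        String result =
                (comparisonResult >= 0 && comparisonResult < RESULT_NAMES.length)
                ? RESULT_NAMES[comparisonResult]
                : "UNKNOWN(" + comparisonResult + ")";
        StringBuilder sb = new StringBuilder();
        sb.append('[').append(kind).append("] matched=").append(matched)
          .append(" result=").append(result);
        // Only paragraphs carry text in this sketch; tables are just labelled.
        if (!isTable && text != null) {
            sb.append(" text=\"").append(text.trim()).append('"');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(false, false, 3, "First paragraph.\r"));
        System.out.println(describe(true, false, 3, null));
    }
}
```

Printing one such line per DocumentPart from compareDocuments() should be enough to check by eye that the loaded list mirrors the document.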
> Anyway, here is the code so far. Have a look and see if it is the way you
> want to go - or think makes sense. Do not feel that you cannot criticise
> or alter the code or the approach as, for now, we are not committed to any
> particular strategy, just exploring what is possible.
>
> package comparedocuments;
>
> import java.io.File;
> import java.io.FileInputStream;
> import java.util.ArrayList;
> import java.io.FileNotFoundException;
> import java.io.IOException;
>
> import org.apache.poi.hwpf.HWPFDocument;
> import org.apache.poi.hwpf.usermodel.Range;
> import org.apache.poi.hwpf.usermodel.Paragraph;
>
> /**
>  * An instance of this class can be used to perform a comparison between two
>  * binary (OLE2CDF) Microsoft Word documents.
>  *
>  * @author Mark B
>  * @version 1.00 27th July 2009
>  */
> public class DocumentComparator {
>
>     /**
>      * Called to compare the two documents and output the results of the
>      * comparison to a third Microsoft Word document.
>      *
>      * @param originalDoc The path to and name of the original document, the
>      *                    document that is the basis for the comparison.
>      * @param compareToDoc The path to and name of the document that should
>      *                     be compared with the original for any modifications.
>      * @param resultDoc The path to and name of the document that should
>      *                  contain the results of the comparison process.
>      * @param docTemplate The path to and name of the empty Word document that
>      *                    should be used as the basis for the results document.
>      * @throws java.io.IOException Thrown to signal that some sort of I/O
>      *                             Exception has occurred.
>      * @throws java.io.FileNotFoundException Thrown to signal that a file
>      *                                       could not be located.
>      */
>     public void compareDocuments(String originalDoc, String compareToDoc,
>                                  String resultDoc, String docTemplate)
>                                  throws IOException, FileNotFoundException
>     {
>         ArrayList<DocumentPart> originalDocParts =
>                 this.loadDocument(originalDoc);
>         ArrayList<DocumentPart> compareToDocParts =
>                 this.loadDocument(compareToDoc);
>         this.compareDocs(originalDocParts, compareToDocParts);
>         this.saveResults(originalDocParts, compareToDocParts, resultDoc);
>     }
>
>     /**
>      * Opens a named binary (OLE2CDF) Microsoft Word document and converts that
>      * document's contents into an ArrayList of instances of the DocumentPart
>      * class.
>      *
>      * @param docName The path to and name of a Microsoft Word document file.
>      * @return An instance of the ArrayList class encapsulating instances
>      *         of the DocumentPart class. Each DocumentPart will encapsulate
>      *         information about a paragraph of text or a table recovered from
>      *         the Microsoft Word document.
>      * @throws java.io.IOException If an I/O Exception occurs
>      * @throws java.io.FileNotFoundException Thrown to indicate that the
>      *                                       named Microsoft Word file could
>      *                                       not be located.
>      */
>     public ArrayList<DocumentPart> loadDocument(String docName)
>                                    throws IOException, FileNotFoundException {
>         File file = null;
>         FileInputStream fis = null;
>         HWPFDocument document = null;
>         Range overallRange = null;
>         Paragraph para = null;
>         int numParas = 0;
>         boolean inTable = false;
>         // Must be instantiated here; the loop below calls add() on it.
>         ArrayList<DocumentPart> docParts = new ArrayList<DocumentPart>();
>         try {
>             // Open the Word file.
>             file = new File(docName);
>             fis = new FileInputStream(file);
>             document = new HWPFDocument(fis);
>             // Get the overall Range for the document and the number
>             // of paragraphs from this Range.
>             overallRange = document.getOverallRange();
>             numParas = overallRange.numParagraphs();
>             for(int i = 0; i < numParas; i++) {
>                 para = overallRange.getParagraph(i);
>                 // Is the paragraph 'in' a table? If so, it is possible to
>                 // recover a reference to that Table from the first paragraph
>                 // only. If calls are made to the getTable() method using
>                 // subsequent paragraphs then an exception will be thrown. So,
>                 // after getting the Table, a flag is set to prevent further
>                 // calls to the getTable() method.
>                 if(para.isInTable()) {
>                     if(!inTable) {
>                         // Get a reference to the Table and pass it to the
>                         // constructor of the DocumentPart class. Add the
>                         // DocumentPart instance to the ArrayList.
>                         docParts.add(new DocumentPart(
>                                 overallRange.getTable(para)));
>                         inTable = true;
>                     }
>                 }
>                 // The paragraph is not in a table so simply add a new instance
>                 // to the ArrayList that encapsulates the paragraph of text.
>                 else {
>                     docParts.add(new DocumentPart(para));
>                     inTable = false;
>                 }
>             }
>             return(docParts);
>         }
>         finally {
>             if(fis != null) {
>                 try {
>                     fis.close();
>                 }
>                 catch(IOException ioEx) {
>                     // I G N O R E
>                 }
>             }
>         }
>     }
>
>     public void compareDocs(ArrayList<DocumentPart> originalDocParts,
>                             ArrayList<DocumentPart> compareToDocParts) {
>         // TO DO: Code comparison
>     }
>
>     public void saveResults(ArrayList<DocumentPart> originalDocParts,
>                             ArrayList<DocumentPart> compareToDocParts,
>                             String resultDoc)
>                             throws IOException, FileNotFoundException {
>         // TO DO: Code saving of results.
>     }
>
>     /**
>      * Main entry point to the program.
>      *
>      * @param args
>      */
>     public static void main(String[] args) {
>         try {
>             DocumentComparator docComp = new DocumentComparator();
>             docComp.compareDocuments("original document",
>                                      "compare to document",
>                                      "results document",
>                                      "results document template");
>         }
>         catch(FileNotFoundException fnfEx) {
>             // TO DO: Code exception handling.
>         }
>         catch(IOException ioEx) {
>             // TO DO: Code exception handling.
>         }
>     }
> }
>
> package comparedocuments;
>
> import org.apache.poi.hwpf.usermodel.Range;
> import org.apache.poi.hwpf.usermodel.Paragraph;
> import org.apache.poi.hwpf.usermodel.Table;
> import org.apache.poi.hwpf.usermodel.TableRow;
>
> /**
>  * Encapsulates a 'part' of a Microsoft Word document. Currently, that part can
>  * either be a Table or a paragraph of text.
>  *
>  * @author Mark B
>  * @version 1.00 27th July 2009.
>  */
> public class DocumentPart {
>
>     private Range docPart = null;
>     private boolean comparisonStatus = false;
>     private int comparisonResult = 0;
>
>     public static final int INSERTED = 0;
>     public static final int DELETED = 1;
>     public static final int MODIFIED = 2;
>     public static final int UN_MODIFIED = 3;
>     public static final int MOVED = 4;
>
>     /**
>      * Create a new instance of the DocumentPart class using the following
>      * parameter.
>      *
>      * @param docPart An instance of the org.apache.poi.hwpf.usermodel.Range
>      *                class that will encapsulate an instance of the
>      *                org.apache.poi.hwpf.usermodel.Paragraph or an instance
>      *                of the org.apache.poi.hwpf.usermodel.Table class.
>      */
>     public DocumentPart(Range docPart) {
>         this.docPart = docPart;
>         // Note that as the part has not been successfully compared to another
>         // part the status is false.
>         this.comparisonStatus = false;
>         // and that the type is set to un-modified. Any parts that have not been
>         // checked or that are not un-modified will be written away to the
>         // results document.
>         this.comparisonResult = DocumentPart.UN_MODIFIED;
>     }
>
>     /**
>      * Has a match been found for this document part?
>      *
>      * @return A boolean value that indicates whether a match was found between
>      *         two document parts.
>      */
>     public boolean isMatched() {
>         return(this.comparisonStatus);
>     }
>
>     /**
>      * Get the result of the comparison.
>      *
>      * @return A primitive int value that indicates the result of comparing
>      *         this document part to others. The following constants have been
>      *         declared;
>      *         DocumentPart.INSERTED = 0;
>      *         DocumentPart.DELETED = 1;
>      *         DocumentPart.MODIFIED = 2;
>      *         DocumentPart.UN_MODIFIED = 3;
>      *         DocumentPart.MOVED = 4;
>      */
>     public int getComparisonResult() {
>         return(this.comparisonResult);
>     }
>
>     /**
>      * Store the result of the comparison between document parts.
>      *
>      * @param comparisonResult A primitive int whose value indicates the result
>      *                         of comparing one document part with others.
>      */
>     public void setComparisonResult(int comparisonResult) {
>         this.comparisonResult = comparisonResult;
>     }
>
>     /**
>      * Does a DocumentPart encapsulate a table?
>      *
>      * @return A primitive boolean value; true if the DocumentPart encapsulates
>      *         a Table, false otherwise.
>      */
>     public boolean isTable() {
>         return(this.docPart instanceof Table);
>     }
>
>     /**
>      * If the DocumentPart encapsulates a Table, get the number of rows in the
>      * table.
>      *
>      * @return A primitive int whose value indicates how many rows there are in
>      *         the table.
>      * @throws java.lang.UnsupportedOperationException Thrown if this method is
>      *         called for a DocumentPart instance that encapsulates a Paragraph.
>      */
>     public int getNumRows() throws UnsupportedOperationException {
>         int numRows = 0;
>         if(this.isTable()) {
>             Table table = (Table)this.docPart;
>             numRows = table.numRows();
>         }
>         else {
>             throw new UnsupportedOperationException("The DocumentPart does " +
>                     "not encapsulate a Table.");
>         }
>         return(numRows);
>     }
>
>     /**
>      * How many columns are there in the Table. This method assumes that the
>      * table is 'square', i.e. that each row of the Table holds the same number
>      * of columns.
>      *
>      * @return A primitive int whose value indicates how many columns there are
>      *         in the Table.
>      * @throws java.lang.UnsupportedOperationException Thrown if this method is
>      *         called for a DocumentPart instance that encapsulates a Paragraph.
>      */
>     public int getNumColumns() throws UnsupportedOperationException {
>         return(this.getNumColumns(0));
>     }
>
>     /**
>      * How many columns are there in a specific row of the Table.
>      *
>      * @param rowNum A primitive int that indicates the row to inspect.
>      *               Remember that row indices are zero based.
>      * @return A primitive int whose value indicates how many columns there are
>      *         in the Table row.
>      * @throws java.lang.UnsupportedOperationException Thrown if this method is
>      *         called for a DocumentPart instance that encapsulates a Paragraph.
>      */
>     public int getNumColumns(int rowNum) throws UnsupportedOperationException {
>         int numColumns = 0;
>         if(this.isTable()) {
>             Table table = (Table)this.docPart;
>             TableRow row = table.getRow(rowNum);
>             numColumns = row.numCells();
>         }
>         else {
>             throw new UnsupportedOperationException("The DocumentPart does " +
>                     "not encapsulate a Table.");
>         }
>         return(numColumns);
>     }
>
>     /**
>      * Return the contents of a specific cell.
>      *
>      * @param rowNum A primitive int that indicates the row the cell is on.
>      *               Remember that row indices are zero based.
>      * @param colNum A primitive int that indicates the column the cell is in.
>      *               Remember that column indices are zero based.
>      * @return An instance of the String class that encapsulates the cell's
>      *         contents
>      * @throws java.lang.UnsupportedOperationException Thrown if this method is
>      *         called for a DocumentPart instance that encapsulates a Paragraph.
>      */
>     public String getCellContents(int rowNum, int colNum)
>                                   throws UnsupportedOperationException {
>         // TO DO: Code recovery of the cell's contents; returns null for now.
>         return(null);
>     }
>
>     /**
>      * Return the text of the Paragraph.
>      *
>      * @return An instance of the String class that encapsulates the text
>      *         the Paragraph contained. Note that this will be stripped of
>      *         all fields.
>      * @throws java.lang.UnsupportedOperationException Thrown if this method is
>      *         called for a DocumentPart instance that encapsulates a Table.
>      */
>     public String getParagraphText() throws UnsupportedOperationException {
>         String returnValue = null;
>         if(!this.isTable()) {
>             Paragraph para = (Paragraph)this.docPart;
>             returnValue = Range.stripFields(para.text());
>         }
>         else {
>             // Matches the declared exception and the other accessors.
>             throw new UnsupportedOperationException("The DocumentPart does " +
>                     "not encapsulate a Paragraph.");
>         }
>         return(returnValue);
>     }
> }
>
>
>
> bihag wrote:
>>
>> Hi All,
>>
>> We want to compare two documents and highlight whatever is not common to
>> both, with some color or in any other way ... So I think we have to merge
>> the documents, or create a new document which has the content of both
>> documents, and show the differences with some color, like deleted in red,
>> newly added in blue ...
>>
>> Mainly we are looking for OLE2CDF doc compare solution ...
>>
>> Please provide a code snippet if possible ...
>>
>> Thanking you in advance ...
>>
>
>
--
View this message in context:
http://www.nabble.com/How-to-compare-2-word-doc-%28OLE2CDF-or-OpenXML%29.-tp24673506p24692130.html
Sent from the POI - Dev mailing list archive at Nabble.com.