Hi,

Thanks Mark. I think we first have to work out an algorithm ... I have already
tried some code that compares two paragraphs, but the problem is just as you
wrote: paragraph one of the first document can turn up as the third paragraph
of the second, and that is where the comparison goes wrong ...
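
A minimal sketch of one way to cope with paragraphs that change position,
assuming the paragraph text of each document is already available as a
String[] (for example from WordExtractor.getParagraphText()). The class and
method names (ParagraphMoves, classify) are made up for illustration and are
not part of POI. Paragraphs are matched by exact text, and the longest run of
paragraphs that keep their relative order is treated as unchanged; anything
matched outside that run is reported as moved. An edited paragraph therefore
shows up as one REMOVED plus one ADDED entry.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of POI.
class ParagraphMoves {

    static List<String> classify(String[] a, String[] b) {
        // Position of each paragraph text in b; iterating backwards lets the
        // first occurrence win. Assumption: paragraph texts are mostly distinct.
        Map<String, Integer> posInB = new HashMap<>();
        for (int j = b.length - 1; j >= 0; j--) {
            posInB.put(b[j], j);
        }

        // pos[i] = index in b of the paragraph a[i], or -1 if b does not contain it.
        int[] pos = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            pos[i] = posInB.getOrDefault(a[i], -1);
        }

        // Longest increasing subsequence of pos: the largest set of paragraphs
        // that keep their relative order in both documents.
        int[] len = new int[a.length];
        int[] prev = new int[a.length];
        int best = -1;
        for (int i = 0; i < a.length; i++) {
            prev[i] = -1;
            if (pos[i] < 0) continue;
            len[i] = 1;
            for (int k = 0; k < i; k++) {
                if (pos[k] >= 0 && pos[k] < pos[i] && len[k] + 1 > len[i]) {
                    len[i] = len[k] + 1;
                    prev[i] = k;
                }
            }
            if (best < 0 || len[i] > len[best]) best = i;
        }
        boolean[] inOrder = new boolean[a.length];
        for (int i = best; i >= 0; i = prev[i]) inOrder[i] = true;

        List<String> report = new ArrayList<>();
        for (int i = 0; i < a.length; i++) {
            if (pos[i] < 0) report.add("REMOVED: " + a[i]);
            else if (inOrder[i]) report.add("UNCHANGED: " + a[i]);
            else report.add("MOVED: " + a[i] + " (A#" + (i + 1) + " -> B#" + (pos[i] + 1) + ")");
        }
        Set<String> inA = new HashSet<>(Arrays.asList(a));
        for (String p : b) {
            if (!inA.contains(p)) report.add("ADDED: " + p);
        }
        return report;
    }

    public static void main(String[] args) {
        String[] docA = {"P1", "P2", "P3"};
        String[] docB = {"P2", "P3", "P1"};   // paragraph one now comes third
        classify(docA, docB).forEach(System.out::println);
    }
}
```

With the sample in main, P1 is reported as moved from position 1 to position 3
while P2 and P3 are reported as unchanged. A paragraph inserted at the top of
the second document only shifts the others down, so they stay UNCHANGED and the
new one is listed as ADDED.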

I will go through the API ... let's see if I can find any clue ...

Anyway thanks and take care ...

Regards,
Bihag Raval.


MSB wrote:
> 
> That should be doable, with one caveat: images are the only thing I am
> unsure about at this point. The complicating factor will be the depth
> of the comparison. Your first task, IMO, would be to decide exactly how the
> comparison should proceed; to sketch out an algorithm that will determine
> 'differentness'. Imagine that we have the first paragraph from document
> one, what should we compare it to in document two? Should we only compare
> corresponding paragraphs, i.e. only compare paragraph one in document one
> with paragraph one in document two? What happens if a new paragraph was
> inserted into document two so that now paragraph one in document one
> matches paragraph two in document two?
> 
> If you have a good search around the list, there is code that demonstrates
> how to extract the text from a document along with the tables. I am
> guessing that you will need to get at the tables 'in line' so to speak as
> a change in the position of the table within the document will be a change
> as far as your algorithm is concerned. At this time, I cannot offer to
> help any further as I am about to leave for the 'office' - a damp, rainy
> nature reserve in actuality. If I have the time tonight, I will try to put
> something together but would suggest that you search through the posts to
> the list to track down some code that will allow you to get at the
> document's contents as a starting point; I am confident that there is code
> there that demonstrates how to get at the tables' contents in-line. As
> always though, I cannot promise anything - I am grappling with other Word
> 'issues' that are absorbing quite a bit of time - but will help out where
> I can. Finally, XSSF is still a bit of a mystery to me, I have not done
> any 'real' work with the API.
> 
> Yours
> 
> Mark B
> 
> 
> bihag wrote:
>> 
>> Hi Mark,
>> 
>> Thanks for the reply ...
>> 
>> What I want is to compare two versions of the same document and note any
>> changes that have been made.
>> If I can get image and table changes, that's really great ... but if I can
>> only get text changes, that's more than enough for the current requirement ...
>> 
>> What I will do is pass the two documents to a function; that function
>> should create a new document with the content of both files and the changes
>> marked, like MS Word does with the Compare option in its menu.
>> 
>> ex. 
>> 
>> File A.doc contains:- The brown fox jumps from lazy dog.
>> File B.doc contains:- The fox jumps from lazy donkey.
>> 
>> The file generated after comparing A.doc and B.doc looks like this image:
>> 
>>  http://www.nabble.com/file/p24674962/compare.jpg 
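
For the A.doc / B.doc sentences above, a word-level comparison could be
sketched roughly as follows. This is only an illustration under stated
assumptions: the class name WordDiff and the [-...-] / {+...+} markers are
invented here, the text is split on whitespace, and the alignment uses a plain
longest-common-subsequence table. In a real HWPF document the markers would
presumably be replaced by formatting on the CharacterRun (a color or
strikethrough) rather than by literal brackets.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of POI.
class WordDiff {

    /** Merges both sentences: words only in a become [-word-], words only in b become {+word+}. */
    static String diff(String a, String b) {
        String[] x = a.trim().split("\\s+");
        String[] y = b.trim().split("\\s+");

        // lcs[i][j] = length of the longest common subsequence of x[i..] and y[j..]
        int[][] lcs = new int[x.length + 1][y.length + 1];
        for (int i = x.length - 1; i >= 0; i--) {
            for (int j = y.length - 1; j >= 0; j--) {
                lcs[i][j] = x[i].equals(y[j])
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }

        // Walk both word lists, following the table towards the longest match.
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < x.length && j < y.length) {
            if (x[i].equals(y[j])) {
                out.add(x[i]);                      // word present in both
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                out.add("[-" + x[i++] + "-]");      // word deleted from a
            } else {
                out.add("{+" + y[j++] + "+}");      // word added in b
            }
        }
        while (i < x.length) out.add("[-" + x[i++] + "-]");
        while (j < y.length) out.add("{+" + y[j++] + "+}");
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        System.out.println(diff("The brown fox jumps from lazy dog.",
                                "The fox jumps from lazy donkey."));
    }
}
```

Run on the two example sentences, this prints
"The [-brown-] fox jumps from lazy [-dog.-] {+donkey.+}". Punctuation stays
attached to the word because the split is on whitespace only.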
>> 
>> 
>> MSB wrote:
>>> 
>>> This could very well be possible; I have certainly had some success
>>> creating new Word documents using the API. Merging one document into
>>> another is more tricky and not something I would try to do myself with
>>> the API just yet. The first thing to be clear on is exactly how you
>>> wish to compare the documents. Are you saying that you want to compare
>>> two versions of the same document and note any changes that have been
>>> made? Are you looking just at the text and not at any formatting applied
>>> to the text?
>>> 
>>> If so, then you could use the WordExtractor class to get at the text of
>>> the two documents. This class can return an array of String(s) where
>>> each element maps to a paragraph (I think) in the source document. Next,
>>> you could compare the elements within the arrays to determine if a
>>> paragraph had been deleted, added, moved, modified, etc. If you found a
>>> difference and identified what it was, then that paragraph could be
>>> written away to a new 'results' document. To be completely honest, I have
>>> never tried to do much work with the formatting of the text and I cannot
>>> claim sole authorship of this code because I got a start from an example
>>> I found on the 'net. Anyway, here is some very simple code to create a
>>> Word document;
>>> 
>>> import java.io.FileInputStream;
>>> import java.io.FileOutputStream;
>>> import org.apache.poi.hpsf.CustomProperties;
>>> import org.apache.poi.hpsf.DocumentSummaryInformation;
>>> import org.apache.poi.hwpf.HWPFDocument;
>>> import org.apache.poi.hwpf.usermodel.CharacterRun;
>>> import org.apache.poi.hwpf.usermodel.Paragraph;
>>> import org.apache.poi.hwpf.usermodel.ParagraphProperties;
>>> import org.apache.poi.hwpf.usermodel.Range;
>>> import org.apache.poi.poifs.filesystem.POIFSFileSystem;
>>> 
>>> // open the existing, empty Word file that serves as a template
>>> POIFSFileSystem fs = new POIFSFileSystem(new FileInputStream("..empty file.."));
>>> HWPFDocument doc = new HWPFDocument(fs);
>>> 
>>> // centered paragraph with large font size
>>> Range range = doc.getRange();
>>> Paragraph par1 = range.insertAfter(new ParagraphProperties(), 0);
>>> par1.setSpacingAfter(200);
>>> // justification: 0=left, 1=center, 2=right, 3=left and right
>>> par1.setJustification((byte) 1);
>>> 
>>> // font size is given in half-points, so this is 18pt
>>> CharacterRun run1 = par1.insertAfter("one");
>>> run1.setFontSize(2 * 18);
>>> 
>>> // paragraph with bold typeface
>>> Paragraph par2 = run1.insertAfter(new ParagraphProperties(), 0);
>>> par2.setSpacingAfter(200);
>>> CharacterRun run2 = par2.insertAfter(
>>>     "two two two two two two two two two two two two two");
>>> run2.setBold(true);
>>> 
>>> // paragraph with italic typeface and a line indent in the first line
>>> Paragraph par3 = run2.insertAfter(new ParagraphProperties(), 0);
>>> par3.setFirstLineIndent(200);
>>> par3.setSpacingAfter(200);
>>> CharacterRun run3 = par3.insertAfter(
>>>     "three three three three three three three three three "
>>>     + "three three three three three three three three three three three three three three "
>>>     + "three three three three three three three three three three three three three three");
>>> run3.setItalic(true);
>>> 
>>> // add a custom document property
>>> // (needs POI 3.5; POI 3.2 doesn't save custom properties)
>>> DocumentSummaryInformation dsi = doc.getDocumentSummaryInformation();
>>> CustomProperties cp = dsi.getCustomProperties();
>>> if (cp == null) {
>>>     cp = new CustomProperties();
>>> }
>>> cp.put("myProperty", "prop prop prop");
>>> dsi.setCustomProperties(cp);
>>> 
>>> // write the result and release the file handle
>>> FileOutputStream out = new FileOutputStream("..final file..");
>>> doc.write(out);
>>> out.close();
>>> 
>>> The key wrinkle is that HWPF cannot actually create a new, empty Word
>>> document; you will need to use Word itself to create a new file that can
>>> be used as the input to this process - I have called it the empty file
>>> in the code above. All you need to do is open Word, select New->Document
>>> and then save this away. Use this empty file as the input to the process
>>> and you should be away.
>>> 
>>> There is a setColor() method defined on the CharacterRun class but I
>>> have never used it myself. The only advice I can offer is to play with
>>> it and see what the effect is on a simple bit of code such as this one.
>>> You will have access to the usual effects such as strikethrough again
>>> using the CharacterRun class.
>>> 
>>> Yours
>>> 
>>> Mark B
>>> 
>>> 
>>> bihag wrote:
>>>> 
>>>> Hi All,
>>>> 
>>>> We want to compare two documents and highlight whatever is not common
>>>> to both, with some color or in some other way ... So I think we have to
>>>> merge the documents, or create a new document that has the content of
>>>> both, and show the differences in color, e.g. deleted text in red and
>>>> newly added text in blue ...
>>>> 
>>>> Mainly we are looking for an OLE2CDF doc compare solution ...
>>>> 
>>>> Please provide a code snippet if possible ...
>>>> 
>>>> Thanking you in advance ...
>>>> 
>>> 
>>> 
>> 
>> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/How-to-compare-2-word-doc-%28OLE2CDF-or-OpenXML%29.-tp24673506p24676761.html
Sent from the POI - Dev mailing list archive at Nabble.com.

