On Wed, Apr 24, 2013 at 12:41 PM, hari prasadh <hariprasad...@gmail.com> wrote:

> Is there any way to compare two doc files in Linux (CentOS)? I tried the
> diff command and it is not working.
>

doc files carry a lot of metadata such as author, dates, and times, so a
direct comparison with a tool like diff will almost always report
differences. Since you mentioned in the other message that you only want to
compare text content and not formatting, you could use the following method.

Install LibreOffice and run the following command to convert the doc file
to plain text:
libreoffice --headless --invisible --convert-to txt:Text filetoconvert.doc

After this, you will have all the text in a file called filetoconvert.txt.
Run the stream editor `sed` to remove all empty lines:
sed '/^$/d' filetoconvert.txt > convertedandstripped.txt

Now you can run `diff` to find simple differences. If you want to compare
in a more intelligent fashion, open the text file using your favorite
programming language and write the logic :)
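
The steps above can be strung together for two files. What follows is only a
sketch under some assumptions: the function names (strip_blank_lines,
doc_to_text, diff_ignoring_blank_lines, compare_docs) are made up for
illustration, mktemp is available, and your LibreOffice build accepts the
--outdir option to choose where the .txt file is written.

```shell
#!/bin/sh
# Sketch only: all function names here are hypothetical helpers.

# Remove empty lines from stdin (same sed expression as in the step above).
strip_blank_lines() {
    sed '/^$/d'
}

# Convert one doc file ($1) to plain text inside directory $2.
doc_to_text() {
    libreoffice --headless --invisible --convert-to txt:Text \
        --outdir "$2" "$1" >/dev/null
}

# Diff two plain-text files, ignoring empty lines.
# Exit status follows diff: 0 = same, 1 = different, 2 = trouble.
diff_ignoring_blank_lines() {
    [ -f "$1" ] && [ -f "$2" ] || { echo "missing input file" >&2; return 2; }
    tmp_a=$(mktemp) || return 2
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 2; }
    strip_blank_lines < "$1" > "$tmp_a"
    strip_blank_lines < "$2" > "$tmp_b"
    diff "$tmp_a" "$tmp_b"
    rc=$?
    rm -f "$tmp_a" "$tmp_b"
    return "$rc"
}

# Convert two doc files and compare their text content.
compare_docs() {
    workdir=$(mktemp -d) || return 2
    # Separate output directories, in case both files share a base name.
    mkdir "$workdir/a" "$workdir/b"
    doc_to_text "$1" "$workdir/a"
    doc_to_text "$2" "$workdir/b"
    base_a=$(basename "$1")
    base_b=$(basename "$2")
    diff_ignoring_blank_lines "$workdir/a/${base_a%.*}.txt" \
                              "$workdir/b/${base_b%.*}.txt"
    rc=$?
    rm -rf "$workdir"
    return "$rc"
}
```

After sourcing this file, something like `compare_docs old.doc new.doc` (file
names are placeholders) prints the differences. The existence check is there
because a failed conversion would otherwise leave nothing to compare and look
like "no differences".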

If you are going to call LibreOffice from within a shell script, a PHP
script, or any other app, use the following command. It took me hours of
internet searching to get this working!

export HOME=/tmp && libreoffice --headless --invisible --convert-to txt:Text filetoconvert.doc

LibreOffice requires a valid, writable home directory, which a script run by
a web server or cron may not have.
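
If you would rather not change HOME for the rest of your script, one possible
variation is to set it only for the one command. This is a sketch, not
something LibreOffice provides: with_temp_home is a hypothetical helper name,
and it assumes mktemp is available. It uses a fresh temporary directory
instead of /tmp itself.

```shell
#!/bin/sh
# Run a command with HOME pointing at a fresh, writable temporary directory.
# with_temp_home is a hypothetical helper, not part of LibreOffice.
with_temp_home() {
    tmp_home=$(mktemp -d) || return 1
    # The subshell keeps the caller's own HOME untouched.
    ( HOME=$tmp_home; export HOME; "$@" )
    rc=$?
    rm -rf "$tmp_home"
    return "$rc"
}
```

Usage would look like
`with_temp_home libreoffice --headless --invisible --convert-to txt:Text filetoconvert.doc`.
The trade-off is that LibreOffice has to create its profile from scratch on
every call, so the plain `export HOME=/tmp` is likely faster when you convert
many files.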

All the best.

Regards,
Arun Venkataswamy
http://wondroussky.blogspot.in/

"கற்றது கைமண் அளவு, கல்லாதது உலகளவு" - ஔவையார்
Known is a drop, Unknown is an ocean
_______________________________________________
ILUGC Mailing List:
http://www.ae.iitm.ac.in/mailman/listinfo/ilugc
