[
https://issues.apache.org/jira/browse/PDFBOX-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
John Hewson closed PDFBOX-2116.
-------------------------------
Resolution: Won't Fix
As Andreas says, PDF isn't text-based like HTML or RTF; it's more like
PostScript, where a page is a graphic which happens to include text, or
sometimes just images of text (e.g. JPEG). So a text-based diff isn't going to
work unless you're only interested in the extracted text content (which might
be different from the actual text the user sees, because text extraction can't
be done perfectly with PDF).
You can use PDFBox to render the page to an image and then compare the pixels
of two images for differences, which is trivial. But there's nothing else that
PDFBox can do for you.
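For illustration, a minimal sketch of that pixel comparison, using only
java.awt.image.BufferedImage. It assumes each page has already been rendered
to an image of the same size (in PDFBox 1.8.x that would be something like
PDPage.convertToImage()). The class name PageImageDiff and its methods are
made up for this example and are not part of PDFBox.

```java
import java.awt.image.BufferedImage;

public class PageImageDiff {

    /** Opaque red, painted over every pixel that differs. */
    static final int HIGHLIGHT_RGB = 0xFFFF0000;

    /** Counts the pixels whose ARGB value differs between the two images. */
    static int countDifferences(BufferedImage expected, BufferedImage actual) {
        checkSameSize(expected, actual);
        int count = 0;
        for (int y = 0; y < expected.getHeight(); y++) {
            for (int x = 0; x < expected.getWidth(); x++) {
                if (expected.getRGB(x, y) != actual.getRGB(x, y)) {
                    count++;
                }
            }
        }
        return count;
    }

    /** Returns a copy of 'actual' with every differing pixel painted red. */
    static BufferedImage highlightDifferences(BufferedImage expected, BufferedImage actual) {
        checkSameSize(expected, actual);
        BufferedImage out = new BufferedImage(
                actual.getWidth(), actual.getHeight(), BufferedImage.TYPE_INT_ARGB);
        for (int y = 0; y < actual.getHeight(); y++) {
            for (int x = 0; x < actual.getWidth(); x++) {
                int rgb = actual.getRGB(x, y);
                out.setRGB(x, y, expected.getRGB(x, y) == rgb ? rgb : HIGHLIGHT_RGB);
            }
        }
        return out;
    }

    private static void checkSameSize(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            throw new IllegalArgumentException("Images must have the same dimensions");
        }
    }

    public static void main(String[] args) {
        // Stand-ins for two rendered pages: identical except for one pixel.
        BufferedImage page1 = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        BufferedImage page2 = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        page2.setRGB(1, 2, 0xFFFFFFFF);

        System.out.println("Differing pixels: " + countDifferences(page1, page2));
        BufferedImage diff = highlightDifferences(page1, page2);
        System.out.println("Highlighted: " + (diff.getRGB(1, 2) == HIGHLIGHT_RGB));
    }
}
```

An exact comparison like this will flag anti-aliasing and font rendering
differences too, so in practice you may want a per-pixel tolerance.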
> Compare tow pdf file and hilight the mismatch value in generated pdf file
> --------------------------------------------------------------------------
>
> Key: PDFBOX-2116
> URL: https://issues.apache.org/jira/browse/PDFBOX-2116
> Project: PDFBox
> Issue Type: Task
> Components: PDModel
> Affects Versions: 1.8.5
> Environment: Java Environment using PDF box
> Reporter: Amit Vishwakarma
> Labels: test
> Original Estimate: 12h
> Remaining Estimate: 12h
>
> {code}
> PDDocument doc = PDDocument.load(pdf1);
> PDDocument doc2 = PDDocument.load(pdf2);
>
> System.out.println(doc);
>
> @SuppressWarnings("rawtypes")
> List list = doc.getDocumentCatalog().getAllPages();
> @SuppressWarnings("rawtypes")
> List list2 = doc2.getDocumentCatalog().getAllPages();
>
> PDFTextStripper stripper = new PDFTextStripper();
> PDFTextStripper stripper2 = new PDFTextStripper();
>
> String pages = null;
> String pages2 = null;
>
> System.out.println("list1 size : " + list.size());
> System.out.println("list2 size : " + list2.size());
>
> if (list.size() == list2.size()) {
>     // PDFTextStripper page numbers are 1-based.
>     for (int i = 1; i <= list.size(); i++) {
>         stripper.setStartPage(i);
>         stripper.setEndPage(i);
>
>         stripper2.setStartPage(i);
>         stripper2.setEndPage(i);
>
>         // System.out.println("-----------" + stripper.getEndPage());
>
>         pages = stripper.getText(doc);
>         pages2 = stripper2.getText(doc2);
>
>         String[] lines = pages.split("\\r?\\n");
>         String[] lines2 = pages2.split("\\r?\\n");
>
>         System.out.println("Line in first page : " + lines.length);
>         System.out.println("Line in second page : " + lines2.length);
>
>         if (lines.length == lines2.length) {
>             for (int a = 0; a < lines.length; a++) {
>                 // System.out.println(lines[a]);
>                 // System.out.println("************----------**********");
>
>                 // Split each line into whitespace-separated columns.
>                 String[] cols = lines[a].split("\\s+");
>                 String[] cols2 = lines2[a].split("\\s+");
>                 if (cols.length == cols2.length) {
>                     for (int b = 0; b < cols.length; b++) {
>                         // System.out.println(cols[b] + " - - - - " + cols2[b]);
>                         if (!cols[b].equalsIgnoreCase(cols2[b])) {
>                             System.out.println("Not matched : " + cols2[b]);
>                             // System.out.println("Page : " + i + " Row : " + a + " Column : " + b);
>                         }
>                     }
>                 } else {
>                     System.out.println("column are not equals");
>                 }
>             }
>             System.out.println("******");
>         } else {
>             System.out.println("Line are not equal");
>         }
>     }
> } else {
>     System.out.println("Page size is not equal");
> }
>
> // Close both documents, not just the first.
> doc.close();
> doc2.close();
> {code}