Re: The Multivio project
Dear Ferran,

I work on the Multivio project, especially on the server side, which does the PDF rendering.

On 3 May 2010 at 11:19, Ferran Jorba wrote:

> Hello Miguel,
>
> It certainly looks interesting! I've tested a couple of PDF files from our site. The first one happened to weigh 11 MB (an old scanned journal, from http://ddd.uab.cat/record/53804), and it took so long that I had to abort it:
>
> http://demo.multivio.org/client/#geturl=http://ddd.uab.cat/pub/garbanzo/garbanzo_a1873n47.pdf
>
> The second one, a modern native PDF (from http://ddd.uab.cat/record/5), was slightly better:
>
> http://demo.multivio.org/client/#geturl=http://ddd.uab.cat/pub/autonoma/autonoma_a2010m3n233.pdf
>
> I'd certainly choose Multivio instead of our Flash-based equivalent (no, it's not my fault; I give it to you so you can see a proprietary alternative):
>
> http://www.uab.es/revista-autonoma/

The fact that some PDFs can take a while to be displayed in Multivio is mainly due to network problems on the server side. An external file has to be downloaded to the Multivio server first, and it seems that we have some problem with our firewall. To avoid this, you can try Multivio with files that are local to RERO DOC. For example:

http://doc.rero.ch/record/18242?ln=fr (try the Multivio button)

or directly:

http://demo.multivio.org/client/#geturl=http://doc.rero.ch/record/18242/export/xd

However, we are working on a new prototype to make Multivio more responsive. But as you can see, accessing very big files takes only a few seconds. The main idea is that only the information seen by the user is downloaded. This is particularly useful for smartphones or GPRS/UMTS internet connections.

Fast web PDF is part of the solution for big PDF files. At RERO, we have specific collections such as L'Impartial, which is a Swiss newspaper. That represents gigabytes of data. For such cases, fast web PDF is not really useful. Moreover, to search in a PDF or display the table of contents, the file has to be completely downloaded.
> However, I'd say that the quality of the thumbnails can be improved.

Do you mean the quality of the rendering?

> Which tool are you using? In my case, I've found that, by far, the fastest and best results come from a combination of Xpdf's pdftoppm and ImageMagick's convert. We create the thumbnails of the first page of our PDFs this way (simplified):
>
>   $ pdftoppm -f $page -l $page $file.pdf $file
>   $ convert -thumbnail 85 $file-0$page.ppm $file.png
>   $ rm $file-0$page.ppm
>
> pdftoppm converts all pages to ppm if no -f or -l parameters are given. If you rely on ImageMagick's own PDF to PNG (or any other graphic format) conversion, the route goes through Ghostscript, which brings any system to its knees, and the quality is worse.

I use the same kind of tools: poppler, with a Python wrapper that I wrote around the poppler classes to perform the rendering, and PIL for the image manipulation. I do not want to make system calls. I know that the rendering is affected by the font configuration and installation of the Linux distribution. I have to do some tests to obtain the best results. If you have some advice on that, do not hesitate.

> Hope it helps,

Of course, all comments are welcome. Thanks again.

> Ferran

--
Johnny
Re: The Multivio project
Hi!

Looks very interesting :) Thanks for the information about this.

Just as a curiosity: it does not support Polish fiscal declarations ;) (Very few PDF readers do ;) )

http://demo.multivio.org/client/#geturl=http://e-deklaracje.mf.gov.pl/files/pdf/PIT-36%2814%29_v2-0.pdf

Cheers,
Piotr

2010/5/3 Miguel Moreira miguel.more...@rero.ch:
> i...@multivio.org
Re: The Multivio project
Hello Miguel,

> I hope you'll excuse me for using the list, but I have an announcement that might be of interest to you. There's a project going on here at RERO called Multivio whose goal is to provide a presentation layer for archives of digital documents: https://www.multivio.org/
>
> [...]
>
> Please don't hesitate to take a look at the project site https://www.multivio.org/, try some examples, try it with your own documents and send us some feedback at i...@multivio.org.

It certainly looks interesting! I've tested a couple of PDF files from our site. The first one happened to weigh 11 MB (an old scanned journal, from http://ddd.uab.cat/record/53804), and it took so long that I had to abort it:

http://demo.multivio.org/client/#geturl=http://ddd.uab.cat/pub/garbanzo/garbanzo_a1873n47.pdf

The second one, a modern native PDF (from http://ddd.uab.cat/record/5), was slightly better:

http://demo.multivio.org/client/#geturl=http://ddd.uab.cat/pub/autonoma/autonoma_a2010m3n233.pdf

I'd certainly choose Multivio instead of our Flash-based equivalent (no, it's not my fault; I give it to you so you can see a proprietary alternative):

http://www.uab.es/revista-autonoma/

However, I'd say that the quality of the thumbnails can be improved. Which tool are you using? In my case, I've found that, by far, the fastest and best results come from a combination of Xpdf's pdftoppm and ImageMagick's convert. We create the thumbnails of the first page of our PDFs this way (simplified):

  $ pdftoppm -f $page -l $page $file.pdf $file
  $ convert -thumbnail 85 $file-0$page.ppm $file.png
  $ rm $file-0$page.ppm

pdftoppm converts all pages to ppm if no -f or -l parameters are given. If you rely on ImageMagick's own PDF to PNG (or any other graphic format) conversion, the route goes through Ghostscript, which brings any system to its knees, and the quality is worse.

Hope it helps,

Ferran
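
For anyone who prefers to drive the same pipeline from a script, here is a rough Python sketch of the three shell steps in the message above. It is only an illustration under assumptions: the function names (pdftoppm_command, convert_command, make_thumbnail) are made up for this example, it assumes pdftoppm and ImageMagick's convert are on the PATH, and it looks up the generated .ppm file instead of guessing its name, since the zero padding of the page number can differ between pdftoppm versions.

```python
import glob
import os
import subprocess
import tempfile


def pdftoppm_command(pdf_path, page, out_prefix):
    """argv for rendering a single page to PPM (same -f/-l flags as above)."""
    return ["pdftoppm", "-f", str(page), "-l", str(page), pdf_path, out_prefix]


def convert_command(ppm_path, png_path, width=85):
    """argv for shrinking the PPM to a PNG thumbnail with ImageMagick."""
    return ["convert", "-thumbnail", str(width), ppm_path, png_path]


def make_thumbnail(pdf_path, png_path, page=1, width=85):
    """Render one page and shrink it; needs pdftoppm and convert installed."""
    with tempfile.TemporaryDirectory() as tmp:
        prefix = os.path.join(tmp, "page")
        subprocess.run(pdftoppm_command(pdf_path, page, prefix), check=True)
        # Exactly one PPM is expected because -f and -l name the same page;
        # the unpacking fails loudly if that assumption does not hold.
        (ppm_path,) = glob.glob(prefix + "-*.ppm")
        subprocess.run(convert_command(ppm_path, png_path, width), check=True)
    # The temporary directory (and the PPM in it) is removed on exit,
    # which plays the role of the final "rm" step.
    return png_path
```

A call such as make_thumbnail("journal.pdf", "journal.png") would then replace the three commands; the file names here are placeholders.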
Re: The Multivio project
Hello Samuele,

> On Monday 3 May 2010 at 11:19:23, Ferran Jorba wrote:
>
>> It certainly looks interesting! I've tested a couple of PDF files from our site. The first one happened to weigh 11 MB (an old scanned journal, from http://ddd.uab.cat/record/53804), and it took so long that I had to abort it:
>
> Just for reference, in case it's also needed by other users, the pdfopt utility (from Ghostscript) can transform any PDF into a linearized PDF (also called fast web view mode), which adds hints to the PDF so that single pages can be referenced without downloading the full file. I guess this would make Multivio able to open your 11 MB scanned document without any problem.

Thanks for the suggestion. I've tried it on one of our 100 MB+ monsters and what I've seen is that the size doesn't vary. But Xpdf's pdfinfo certainly notes the change in the «Optimized» field:

                  before             after pdfopt
  Pages:          294                294
  Encrypted:      no                 no
  File size:      112858067 bytes    112838493 bytes
  Optimized:      no                 yes
  PDF version:    1.5                1.5

Another task in our TODO list...

Thanks,

Ferran
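
If the «Optimized» check has to be repeated over many files, the pdfinfo output can be read by a script. The following Python sketch is only an illustration: the function names (parse_pdfinfo, is_optimized) are invented here, and the sample text is modelled on the fields quoted in the message above rather than copied from a real run.

```python
def parse_pdfinfo(text):
    """Turn 'Key: value' lines, as printed by pdfinfo, into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values containing colons survive.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def is_optimized(text):
    """True when the 'Optimized' field of the pdfinfo output says 'yes'."""
    return parse_pdfinfo(text).get("Optimized", "").lower() == "yes"


# Sample modelled on the "after pdfopt" column (an assumption about layout).
sample = (
    "Pages:          294\n"
    "Encrypted:      no\n"
    "File size:      112838493 bytes\n"
    "Optimized:      yes\n"
    "PDF version:    1.5\n"
)
print(is_optimized(sample))
```

In practice the text would come from running pdfinfo on each file, for instance through subprocess.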
Re: The Multivio project
On Monday 3 May 2010 at 16:23:06, Ferran Jorba wrote:

> Thanks for the suggestion. I've tried it on one of our 100 MB+ monsters and what I've seen is that the size doesn't vary. But Xpdf's pdfinfo certainly notes the change in the «Optimized» field:
>
>                   before             after pdfopt
>   Pages:          294                294
>   Encrypted:      no                 no
>   File size:      112858067 bytes    112838493 bytes
>   Optimized:      no                 yes
>   PDF version:    1.5                1.5
>
> Another task in our TODO list...

Yes, this is correct. The file is basically the same and the size should just get a bit larger. What really changes is that the PDF is reorganized so that a special table, stored inside the PDF at an easily reachable position, is filled with pointers to the pages. This makes it possible to jump directly to any given page: over HTTP, a client can request exactly the range of bytes that is sufficient to render that page, while continuing to prefetch the rest of the document in the background.

So if you actually try to access your optimized monster through Multivio (if Multivio is taking advantage of this feature), you should definitely be able to jump to each page quickly, regardless of the size.

Cheers,

Sam

P.S. The next release of Invenio will integrate a conversion library that will, among other things, wrap pdfopt in the fulltext management operations, so that in principle you get this optimization for free (though in the current git master it is not yet fully integrated in WebSubmit and friends...)

--
Samuele Kaplun ** CERN Document Server ** http://cds.cern.ch/
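
To make the byte-range idea above a little more concrete, here is a minimal Python sketch. It is an illustration only, not a description of what Multivio or Invenio do: the function names (range_header, looks_linearized, fetch_range) are invented, the check for the /Linearized key near the start of the file is a heuristic, and working out which byte range belongs to which page (by reading the PDF's hint table) is left out entirely.

```python
import urllib.request


def range_header(start, length):
    """HTTP Range header covering `length` bytes from offset `start`."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    # Range end positions are inclusive, hence the "- 1".
    return {"Range": "bytes=%d-%d" % (start, start + length - 1)}


def looks_linearized(head):
    """Heuristic: a linearized PDF carries a /Linearized key in a
    dictionary within the first 1024 bytes of the file."""
    return head.startswith(b"%PDF-") and b"/Linearized" in head[:1024]


def fetch_range(url, start, length):
    """Fetch only part of a remote file; the server must honour Range."""
    request = urllib.request.Request(url, headers=range_header(start, length))
    with urllib.request.urlopen(request) as response:
        # 206 means the range was honoured; 200 means the server ignored
        # it and is sending the whole file.
        return response.status, response.read()
```

A client could first call fetch_range(url, 0, 1024), pass the returned bytes to looks_linearized, and only then decide whether page-by-page requests are worth attempting.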