Thanks so much for your answer, it sounds like exactly what I need! I followed 
the instructions, so that my "pdf2html" script appears in the Apple scripts 
menu. However, when I selected a folder of 68 pdf files and then ran the 
script, nothing seemed to happen. Is the output folder the pdf source folder, 
or somewhere else? The readme said it would be the source folder, but maybe it 
isn't? I did not see any errors, and cmd-k in the script editor did not throw 
any errors, so I'm not sure what happened. Any ideas would be wonderful. Thanks 
again!

On Jan 25, 2013, at 6:26 PM, Esther <mori...@mac-access.net> wrote:

> Hi Alex, Dan, Gena, Annie, and Others,
> 
> I can adapt Dan's suggestion of using pdftotext to an AppleScript by putting 
> a wrapper around the command line instruction, so you can select files in 
> Finder and run the AppleScript in order to create .html versions of the 
> selected files in the same directory.  This might work for his purposes, but 
> the problem is the requirement "I'd need line spacing preserved, and 
> navigation could be made into headings."
> 
> It is actually difficult to find a program that maintains formatting in 
> converting back to HTML, especially of table data in PDF files, although that 
> might depend on how the PDF file was generated.  The test case I use for this 
> is the HTML page for Appendix A of the "VoiceOver Getting Started Guide" at:
> http://help.apple.com/voiceover/info/guide/10.8/English.lproj/index.html
> You'll notice that the columns of shortcut keys and associated actions read 
> correctly when you use the web page.  If you print out a PDF version of the 
> page, VoiceOver reads the entries under the first column of shortcuts and 
> then the entries for the column of associated actions.  This does not get 
> fixed if you convert back to HTML with pdftotext (or many other programs).
> 
> The best solution I could find, in response to a question from a member of 
> the mac-access list whose bank statement was delivered in PDF format, was to 
> get the trial version of Wondershare PDF Converter or PDF Converter Pro for 
> Mac from the developer's web site.  This had a limit of 5 pages for 
> conversion in the trial version, which was suitable for bank statements, but 
> not in general.  Note, this is not a general recommendation for that 
> software, but it worked for that specific purpose.  To read more about this 
> and usage tests, see the mail archive copy of my mac-access list post at:
> • Re: Tables in PDF documents
> http://www.mail-archive.com/mac-access%40mac-access.net/msg11985.html
> 
> To get back to this specific suggestion of pdftotext, I'll post the link to 
> my earlier recommendation of pdftotext to Dan, which gave some suggestions 
> for alternatively using this either as an AppleScript or Automator action as 
> well as other notes about the application:
> • pdftotext utility [was Re: Xpdf for mac]
> http://www.mail-archive.com/macvisionaries%40googlegroups.com/msg61916.html
> 
> The AppleScript described there for HTML conversions can be adapted for 
> similar use (e.g., highlight files in Finder, then run the AppleScript to 
> batch convert files without having to use Terminal.)
> 
> I'll paste in the AppleScript below my signature starting below the line 
> "---Cut Here---" and ending with the line "end run". You can save it from the 
> AppleScript editor under a name of your choice, like "PDF to HTML".
> 
> HTH.  Cheers,
> 
> Esther
> 
> ---Cut Here---
> (*
> Use pdftotext to create an HTML version of the selected PDF file
>      Created 25 January 2013; modified from PDF to Text AppleScript of 17 May 2011
> *)
> on run
>       tell application "Finder"
>               set chosenFile to the selection as alias
>       end tell
>       do shell script "/usr/local/bin/pdftotext -htmlmeta " & quoted form of POSIX path of chosenFile
> end run
> 
> 
> 
> 
> On Jan 25, 2013, at 8:32 AM, - wrote:
> 
>> 
>> In terminal one can use the pdftotext program found at:
>> 
>> http://www.bluem.net/en/mac/packages/
>> 
>> The command to convert to html is:
>> 
>> pdftotext file.pdf -htmlmeta
>> 
>> The converted file has a html extension.  The original files are retained as 
>> pdf.
>> 
>> This can be put in a script with a loop to convert all pdf files in a 
>> directory.
>> 
>> XB
>> 
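For anyone who wants to try the loop idea mentioned above, here is a minimal sketch of what such a script might look like. It is an illustration only, not something posted in this thread: the function name convert_pdfs and the PDFTOTEXT override variable are made up for the example, and the default path is simply the one used in the AppleScript quoted earlier. It assumes pdftotext is installed there and that file names end in lowercase ".pdf".

```shell
#!/bin/sh
# Sketch only: convert every PDF in one folder to HTML with pdftotext.
# PDFTOTEXT is a hypothetical override variable; the default is the path
# used in the AppleScript quoted earlier in this message.
PDFTOTEXT="${PDFTOTEXT:-/usr/local/bin/pdftotext}"

# convert_pdfs DIR
# Runs pdftotext -htmlmeta on each *.pdf file directly inside DIR and
# prints how many conversions succeeded.
convert_pdfs() {
    dir="$1"
    count=0
    for pdf in "$dir"/*.pdf; do
        # When nothing matches, the pattern itself is left over; skip it.
        [ -e "$pdf" ] || continue
        # Quoting "$pdf" keeps file names with spaces intact. Per the
        # message above, the .html file should land next to the PDF.
        "$PDFTOTEXT" -htmlmeta "$pdf" && count=$((count + 1))
    done
    echo "$count"
}
```

After loading the function in Terminal, it could be run as: convert_pdfs /path/to/folder (the path is a placeholder). The printed number is a quick check that something actually happened, and a result of 0 suggests the folder held no matching files or pdftotext was not found at the assumed path.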
> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "MacVisionaries" group.
> To post to this group, send email to macvisionaries@googlegroups.com.
> To unsubscribe from this group, send email to 
> macvisionaries+unsubscr...@googlegroups.com.
> Visit this group at http://groups.google.com/group/macvisionaries?hl=en.
> For more options, visit https://groups.google.com/groups/opt_out.
>  
>  



Have a great day,
Alex (msg sent from Mac Mini)
mehg...@gmail.com


