Re: Copying from pdf (was Re: GSOC 2014 project list: on LyX<-->docx roundtrip conversion)

2014-02-11 Thread stefano franchi
On Mon, Feb 10, 2014 at 2:25 PM, Andrew Parsloe wrote:

>
>
> On 11/02/2014 2:59 a.m., stefano franchi wrote:
>
>>
>>
>> I had to convert a ~50,000 words book from LyX to Word last month
>> and it took me 2 full days. I think I tried all exporters known to men
>> (and women). They all failed to various degrees. In the end, I had
>> better luck converting the file from the pdf (!) output to word and
>> then reinserting manually all footnotes (all 450 of them).  I am facing
>> the prospect of converting a 200,000 words manuscript in a few months
>> and I am already sweating at night at the very idea. <\rant>
>>
>>
>> Cheers,
>>
>> Stefano
>>
>>
> Did you just copy & paste from the pdf? That's something I've done before.
> The main problem is always that each line on the page in the pdf ends up as
> a separate paragraph in the pasted text in Word. How did you handle that?
>
> I wrote a macro in Word to join up the lines into paragraphs, judging the
> end of a paragraph by the existence of a shorter line -- which obviously
> fails sometimes. (Copying a pdf followed by paste special has the same
> problem in LyX. I have an unfinished script, for the pLyX system, to do the
> same in LyX.)
>
>

No, I didn't copy and paste. That would have been even worse. In addition
to the problem of line-paragraphs, you also face the problem of headers and
footers, hyphenation, etc. I guess I could have produced a pdf with no
hyphenation, no headers, no footers, etc. before trying the conversion, but
I didn't do that. I used one of the many pdf-to-word or pdf-to-odt utilities
available online. I cannot actually remember which one, to be frank. I
tried several until I got a reasonable output. I still had to do some
cleaning, as some (but not all) apostrophes were lost and, as I mentioned,
all footnotes came through, but as plain text and not as footnotes.


S.


-- 
__
Stefano Franchi
Associate Research Professor
Department of Hispanic Studies Ph:   +1 (979) 845-2125
Texas A&M University  Fax:  +1 (979) 845-6421
College Station, Texas, USA

stef...@tamu.edu
http://stefano.cleinias.org


Copying from pdf (was Re: GSOC 2014 project list: on LyX<-->docx roundtrip conversion)

2014-02-10 Thread Andrew Parsloe



On 11/02/2014 2:59 a.m., stefano franchi wrote:



I had to convert a ~50,000 words book from LyX to Word last month
and it took me 2 full days. I think I tried all exporters known to men
(and women). They all failed to various degrees. In the end, I had
better luck converting the file from the pdf (!) output to word and
then reinserting manually all footnotes (all 450 of them).  I am facing
the prospect of converting a 200,000 words manuscript in a few months
and I am already sweating at night at the very idea. <\rant>


Cheers,

Stefano



Did you just copy & paste from the pdf? That's something I've done 
before. The main problem is always that each line on the page in the pdf 
ends up as a separate paragraph in the pasted text in Word. How did you 
handle that?


I wrote a macro in Word to join up the lines into paragraphs, judging 
the end of a paragraph by the existence of a shorter line -- which 
obviously fails sometimes. (Copying a pdf followed by paste special has 
the same problem in LyX. I have an unfinished script, for the pLyX 
system, to do the same in LyX.)
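
For illustration only, the shorter-line heuristic described above might be
sketched roughly as follows in Python. This is not the Word macro or the
pLyX script mentioned here; the function name join_lines and the ratio
threshold are assumptions made up for the example, and, as noted, the
heuristic fails whenever a paragraph's last line happens to be nearly full
width.

```python
def join_lines(lines, ratio=0.8):
    """Join hard-wrapped lines (e.g. pasted from a pdf) into paragraphs.

    Heuristic: a line noticeably shorter than the longest line is assumed
    to be the last line of a paragraph. `ratio` is a hypothetical tuning
    knob, not something taken from the original macro.
    """
    stripped = [ln.strip() for ln in lines]
    nonblank = [ln for ln in stripped if ln]
    if not nonblank:
        return []

    # Use the longest line as an estimate of the full text width.
    full_width = max(len(ln) for ln in nonblank)

    paragraphs, current = [], []
    for ln in stripped:
        if not ln:
            # A blank line always closes the paragraph in progress.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(ln)
        # A short line is taken as the end of a paragraph.
        if len(ln) < ratio * full_width:
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


if __name__ == "__main__":
    pasted = [
        "The quick brown fox jumps over the lazy",
        "dog and keeps running through the field",
        "until dusk.",
        "A second paragraph starts here and goes",
        "on a bit.",
    ]
    for para in join_lines(pasted):
        print(para)
        print()
```

Lowering ratio makes the sketch less eager to split; raising it splits more
often, at the cost of breaking paragraphs at lines that merely end early
because the next word was long.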

