Re: [gentoo-user] Join plain text paragraphs
* on the Tue, Jun 13, 2006 at 07:28:17AM +0100, David Morgan said:
> On 22:46 Mon 12 Jun , JimD wrote:
> > David Morgan wrote:
> > > On 18:53 Mon 12 Jun , JimD wrote:
> > >> Sweet. Thanks for the tips. I need to start using OOo more ;-)
> > >
> > > No need.
> > >
> > > sed -e :a -e '$!N;s/\n[^$]//;ta' -e 'p;D' filename
> >
> > Close. It is removing the first character of every paragraph. I am
> > trying to digitize my book collection. For example, here is a test
> > output from Narnia - The Magician's Nephew:
>
> Indeed - didn't my corrected version get through? I received it before I
> received your reply anyway.
>
> sed -e :a -e '$!N;s/\n\([^$]\)/\1/;ta' -e 'p;D' filename

Almost perfect. It now joins the lines without removing the first
character. However, there is now no space between the joined lines.
For example:

CHAPTER ONE
THE WRONG DOOR

becomes

CHAPTER ONETHE WRONG DOOR

I added a space to the end of every line except the blank lines, and
now it gets me pretty much what I was looking for.

Thanks,
Jim

-- 
gentoo-user@gentoo.org mailing list
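Jim's workaround can be sketched as a three-stage pipeline. This is a minimal sketch assuming GNU sed and paragraphs separated by blank lines; the function name `join_paras_spaced` is hypothetical, and the last stage (stripping the space left at the end of each paragraph) is an addition, not something Jim described.

```shell
#!/bin/sh
# Hypothetical helper: join hard-wrapped paragraphs with a space between
# the joined lines (assumes GNU sed; paragraphs separated by blank lines).
join_paras_spaced() {
    # 1. Append a space to every non-blank line (Jim's workaround).
    sed '/./s/$/ /' "$@" |
    # 2. David's corrected one-liner: glue each line onto the previous one.
    sed -e :a -e '$!N;s/\n\([^$]\)/\1/;ta' -e 'p;D' |
    # 3. Extra step (not from the thread): drop trailing spaces.
    sed 's/ *$//'
}

# Usage: join_paras_spaced book.txt > book-joined.txt
```

Putting the space straight into David's replacement (`/ \1/`) looks simpler, but then the paragraph after each blank line picks up a leading space, because the script joins the leftover empty line onto the next paragraph's first line.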
Re: [gentoo-user] Join plain text paragraphs
On 22:46 Mon 12 Jun , JimD wrote:
> David Morgan wrote:
> > On 18:53 Mon 12 Jun , JimD wrote:
> >> Sweet. Thanks for the tips. I need to start using OOo more ;-)
> >
> > No need.
> >
> > sed -e :a -e '$!N;s/\n[^$]//;ta' -e 'p;D' filename
>
> Close. It is removing the first character of every paragraph. I am
> trying to digitize my book collection. For example, here is a test
> output from Narnia - The Magician's Nephew:

Indeed - didn't my corrected version get through? I received it before I
received your reply anyway.

sed -e :a -e '$!N;s/\n\([^$]\)/\1/;ta' -e 'p;D' filename

-- 
Join The no2id Coalition, http://www.no2id.net/
djm
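For readers new to sed, here is the corrected one-liner spread over several lines with comments. It is only an annotated restatement, wrapped in a hypothetical `join_paras` shell function, and it assumes GNU sed, which accepts `#` comments inside a script. Note that `[^$]` is a bracket expression, so it matches any single character except a literal `$`; in practice it means "the appended line is not empty", although a line that starts with `$` would not be joined.

```shell
#!/bin/sh
# The corrected one-liner, one sed command per line (assumes GNU sed).
join_paras() {
    sed -e '
:a
# Unless this is the last line, append the next line to the pattern space.
$!N
# If the newline is followed by a character (the appended line is not
# blank), delete the newline but put that character back via \1.
s/\n\([^$]\)/\1/
# A join happened: loop back and try to append yet another line.
ta
# The appended line was blank (or input ended): print the pattern space,
# then delete its first line and restart without reading new input.
p
D
' "$@"
}

# Usage: join_paras book.txt > book-joined.txt
```

Because the replacement is just `\1`, the joined lines run together with no space, which is exactly the "CHAPTER ONETHE WRONG DOOR" effect Jim reports.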
Re: [gentoo-user] Join plain text paragraphs
David Morgan wrote:
> On 18:53 Mon 12 Jun , JimD wrote:
>> Sweet. Thanks for the tips. I need to start using OOo more ;-)
>
> No need.
>
> sed -e :a -e '$!N;s/\n[^$]//;ta' -e 'p;D' filename

Close. It is removing the first character of every paragraph. I am
trying to digitize my book collection. For example, here is a test
output from Narnia - The Magician's Nephew:

=== cut here ==
CHAPTER ONE
THE WRONG DOOR

This is a story about something that happened long ago when your grandfather was a child. It is a very important story because it shows how all the comings and goings between our own world and the land of Narnia first began. In those days Mr Sherlock Holmes was still living in Baker Street and the Bastables were looking for treasure in the Lewisham Road. In those days, if you were a boy you had to wear a stiff Eton collar every day, and schools were usually nastier than now. But meals were nicer; and as for sweets, I won't tell you how cheap and good they were, because it would only make your mouth water in vain. And in those days there lived in London a girl called Polly Plummer. She lived in one of a long row of houses which were all joined together. One morning she was out in the back garden when a boy scrambled up from the garden next door and put his face over the wall. Polly was very surprised because up till now there had never been any children in that house, but only Mr Ketterley and Miss Ketterley, a brother and sister, old bachelor and old maid, living together. So she looked up, full of curiosity. The face of the strange boy was very grubby. It could hardly have been grubbier if he had first rubbed his hands in the earth, and then had a good cry, and then dried his face with his hands. As a matter of fact, this was very nearly what he had been doing.
=== cut here ==

Jim
-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
You roll an 18 in Dex and see if you don't end up with a girlfriend
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
JimD
Central FL, USA, Earth, Sol
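Why the first version eats characters: `\n[^$]` matches the newline plus the first character of the line being appended, and the replacement is empty, so both disappear. Every appended line loses its first character, and since the script also appends the first line of each new paragraph onto the leftover empty line, paragraph openings lose one too. A minimal reproduction, assuming GNU sed; the function name is hypothetical:

```shell
#!/bin/sh
# David's first one-liner, unchanged, wrapped in a hypothetical function.
join_paras_v1() {
    # s/\n[^$]// deletes the newline AND the character after it.
    sed -e :a -e '$!N;s/\n[^$]//;ta' -e 'p;D' "$@"
}

# printf 'CHAPTER ONE\nTHE WRONG DOOR\n\nThis is a story\n' | join_paras_v1
# prints:
# CHAPTER ONEHE WRONG DOOR
#
# his is a story
```

The corrected version captures that character with `\([^$]\)` and puts it back with `\1`.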
Re: [gentoo-user] Join plain text paragraphs
On 00:13 Tue 13 Jun , David Morgan wrote:
> sed -e :a -e '$!N;s/\n[^$]//;ta' -e 'p;D' filename

Gosh, what was I thinking?

sed -e :a -e '$!N;s/\n\([^$]\)/\1/;ta' -e 'p;D' filename

I expect there's a slightly nicer way, but I'm tired and I have an exam
in the morning...

-- 
Join The no2id Coalition, http://www.no2id.net/
djm
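As for a "slightly nicer way": one option, swapped in here rather than taken from the thread, is awk's paragraph mode. Setting `RS` to the empty string makes each blank-line-separated paragraph a single record, so the newlines inside it can simply be replaced with spaces. A minimal sketch; the function name `join_paras_awk` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: one output line per paragraph, using awk paragraph mode.
join_paras_awk() {
    awk 'BEGIN { RS = "" }              # records are separated by blank lines
         {
             # Replace each newline, with any spaces or tabs around it,
             # by a single space, then trim trailing whitespace.
             gsub(/[ \t]*\n[ \t]*/, " ")
             sub(/[ \t]+$/, "")
             if (NR > 1) printf "\n"    # one empty line between paragraphs
             print
         }' "$@"
}

# Usage: join_paras_awk book.txt > book-joined.txt
```

Unlike the sed loop, this puts a space between joined lines without any preprocessing, and it collapses runs of blank lines into a single blank line.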
Re: [gentoo-user] Join plain text paragraphs
On 18:53 Mon 12 Jun , JimD wrote:
> Sweet. Thanks for the tips. I need to start using OOo more ;-)

No need.

sed -e :a -e '$!N;s/\n[^$]//;ta' -e 'p;D' filename

-- 
Join The no2id Coalition, http://www.no2id.net/
djm
Re: [gentoo-user] Join plain text paragraphs
Alan McKinnon wrote:
> On Monday 12 June 2006 19:22, JimD wrote:
>> I have an MS Word "HTML" file. I used Lynx to dump it to text and
>> now I want to get it to pdf. I opened it in OOo and saved as an
>> OpenDocument. However, all the paragraphs are hard wrapped at 80
>> characters so the text does not take up the whole page.
>>
>> Is there an easy way to go through the 100+ pages and just join the
>> lines of each paragraph so that they will be flowed correctly in
>> OOo?
>>
>> I have the dumped text file and the OOo file and both have the
>> paragraphs hard wrapped at column 80. I would think there would
>> have to be some simple tool out there to go through the plain text
>> file and just join all the lines of a paragraph, no?
>
> You already have an OOo file so that's a good place to start.
>
> First, check on Tools -> Autocorrect -> Options that "Remove blank
> paragraphs" is checked. Then highlight all the text you want to
> modify and do Format -> AutoFormat -> Apply.
>
> This should remove hard line returns in the middle of paras then
> remove blank paras. Then print to pdf.

Sweet. Thanks for the tips. I need to start using OOo more ;-)

Jim
-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
You roll an 18 in Dex and see if you don't end up with a girlfriend
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
JimD
Central FL, USA, Earth, Sol
Re: [gentoo-user] Join plain text paragraphs
On Monday 12 June 2006 19:22, JimD wrote:
> I have an MS Word "HTML" file. I used Lynx to dump it to text and
> now I want to get it to pdf. I opened it in OOo and saved as an
> OpenDocument. However, all the paragraphs are hard wrapped at 80
> characters so the text does not take up the whole page.
>
> Is there an easy way to go through the 100+ pages and just join the
> lines of each paragraph so that they will be flowed correctly in
> OOo?
>
> I have the dumped text file and the OOo file and both have the
> paragraphs hard wrapped at column 80. I would think there would
> have to be some simple tool out there to go through the plain text
> file and just join all the lines of a paragraph, no?

You already have an OOo file so that's a good place to start.

First, check on Tools -> Autocorrect -> Options that "Remove blank
paragraphs" is checked. Then highlight all the text you want to
modify and do Format -> AutoFormat -> Apply.

This should remove hard line returns in the middle of paras then
remove blank paras. Then print to pdf.

-- 
If only me, you and dead people understand hex, how many people
understand hex?

Alan McKinnon
alan at linuxholdings dot co dot za
+27 82, double three seven, one nine three five
[gentoo-user] Join plain text paragraphs
I have an MS Word "HTML" file. I used Lynx to dump it to text and
now I want to get it to pdf. I opened it in OOo and saved as an
OpenDocument. However, all the paragraphs are hard wrapped at 80
characters so the text does not take up the whole page.

Is there an easy way to go through the 100+ pages and just join the
lines of each paragraph so that they will be flowed correctly in
OOo?

I have the dumped text file and the OOo file and both have the
paragraphs hard wrapped at column 80. I would think there would
have to be some simple tool out there to go through the plain text
file and just join all the lines of a paragraph, no?

Thanks,
Jim
-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
You roll an 18 in Dex and see if you don't end up with a girlfriend
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
JimD
Central FL, USA, Earth, Sol