Fwd: RE: making PDFs workable

2012-09-14 Thread James Crawford



What I did to set up for the conversion. Note that I'm doing this on a CentOS
5.x system.

1. Add RPMforge to the yum repositories
wget http://packages.sw.be/rpmforge-release/rpmforge-release-0.5.2-2.el5.rf.i386.rpm
rpm --import http://apt.sw.be/RPM-GPG-KEY.dag.txt
rpm -K rpmforge-release-0.5.2-2.el5.rf.i386.rpm
rpm -i rpmforge-release-0.5.2-2.el5.rf.i386.rpm
2. Install tesseract
yum -y install tesseract tesseract-en
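
Before going further it may be worth confirming that both tools actually ended up on the PATH. A minimal sketch; the helper name missing_tools is made up for illustration and is not part of gs or tesseract:

```shell
# Print the name of every command in the argument list that is not on the PATH.
# missing_tools is a hypothetical helper used only for this example.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Check for the two programs the conversion relies on.
missing=$(missing_tools gs tesseract)
if [ -n "$missing" ]; then
    echo "not installed: $missing" >&2
fi
```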

This makes it possible to do the following:
gs -r300x300 -sDEVICE=tiffgray -sOutputFile=ocr_%02d.tif -dBATCH -dNOPAUSE input.pdf

where the options are
-r is the resolution
-sDEVICE selects the output device (tiffgray gives grayscale TIFF output)
-sOutputFile=outputfilename; note that %02d causes the page
number to be inserted into the filename
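
To see what the %02d placeholder does to the filenames, the same pattern can be tried with the shell's printf. This is only an illustration of the naming; page_name is a made-up helper and gs is not called here:

```shell
# Name of the TIFF that would be written for a given page number, using the
# same %02d pattern: the number is zero-padded to at least two digits.
page_name() {
    printf 'ocr_%02d.tif\n' "$1"
}

page_name 1     # ocr_01.tif
page_name 12    # ocr_12.tif
page_name 100   # ocr_100.tif
```

A three-digit page number no longer lines up with the two-digit names, so a sorted listing would put ocr_100.tif between ocr_10.tif and ocr_11.tif.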

Followed by
tesseract inputFile outputfile -l eng
where the options are
inputFile is one of the output tif files from gs
outputfile will be given a .txt extension
-l is the language of the input file (eng = English)
And then put the pages back together with
cat tess-outfile01.txt tess-outfile02.txt ... tess-outfilenn.txt > Input.txt
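
Rather than typing every page name, a shell glob can probably do the same job, since the zero-padded numbers sort in page order. A rough sketch; join_pages and its two arguments are hypothetical names:

```shell
# Concatenate per-page text files (prefix01.txt, prefix02.txt, ...) into one file.
# The glob expands in sorted order, which matches page order as long as
# every page number is zero-padded to two digits.
join_pages() {
    local prefix="$1" out="$2"
    cat "$prefix"[0-9][0-9].txt > "$out"
}

# Example: join_pages tess-outfile Input.txt
```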


There will be some failed conversions and bad guesses by the tesseract program,
so check the final output for correctness.
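
One cheap check before proofreading is to look for pages where nothing at all was recognized. This is only a sketch and catches only empty files, not bad guesses; empty_pages is a made-up name:

```shell
# Print the names of the given text files that are empty (zero bytes),
# which usually means the OCR produced nothing for that page.
empty_pages() {
    local f
    for f in "$@"; do
        [ -s "$f" ] || echo "$f"
    done
}

# Example: empty_pages tess-outfile*.txt
```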


Bash Script to do the conversion
< This got reformatted and I attempted to put it back the way I 
remembered it.>

< the tesseract step takes a while on each page>
<<<
#!/bin/bash
#
# Use this script to convert a PDF-formatted file to text.
# The input file will be split into single-page TIFF files,
# which will be run through tesseract to OCR them into
# text files. The text files will be reassembled into a
# single text file.
#
# NOTE: There will still be some cleanup of the text files,
# as the OCR is not perfect.
#
# Get input file name and final output filename
InFile=${1:-"infile.pdf"}
TIFFile="${InFile%.pdf}"
OutFile=${2:-"$TIFFile.txt"}
echo "Input from $InFile, OCR output to $OutFile"
if [ ! -e "$InFile" ] ; then
echo "$InFile not found. exiting"
exit 1
elif [ ! -r "$InFile" ] ; then
echo " Read not allowed on $InFile. exiting"
exit 1
fi
# setup a temp working area
WrkDir="/tmp/$(date +%s)"
mkdir "$WrkDir"
echo " Working Dir = $WrkDir"
cp "$InFile" "$WrkDir/"
Hdir=$(pwd)
cd "$WrkDir" || exit 1
# pwd
gs -r300x300 -sDEVICE=tiffgray -sOutputFile="$TIFFile%02d.tif" -dBATCH \
-dNOPAUSE "$InFile" >files

TifCount=$(grep "Page " files | wc -l)
rm files
#
ls -l *.tif
echo "number of pages to process = $TifCount"
for wtif in *.tif; do
wtxt=${wtif%.tif}
tesseract "$wtif" "$wtxt" -l eng
done
#
ls -l *.txt
TxtFiles=$( ls *.txt )
touch "$OutFile"
for Tf in $TxtFiles; do
#
echo "Working on $Tf, "
cat "$Tf" >> "$OutFile"
done
ls -l
cp "$OutFile" "$Hdir/"
cd "$Hdir"
# once debugged enable the following
rm -fr "$WrkDir"
exit 0
>>>
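
One weakness of naming the working directory after date +%s is that two runs started in the same second would share a directory. A possible alternative, assuming mktemp is available (it is not used in the script above), might look like this:

```shell
# Create a private scratch directory; mktemp picks a unique name.
WrkDir=$(mktemp -d) || exit 1
# Remove the scratch directory when the script exits, even after an error.
trap 'rm -rf "$WrkDir"' EXIT
echo " Working Dir = $WrkDir"
```

With the trap in place the final rm -fr would no longer be needed, at the cost of not being able to inspect the directory afterwards while debugging.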

James C


Pdf to Text

The bash script Pdf2txt.sh, located in the same directory as this file,
will do all of the following steps for PDFs of up to 99 pages. It is also
on test.sb.state.az.us (10.168.30.100) in /home/jimc.


Convert the .pdf document to single-page .tif documents
>gs -r300x300 -sDEVICE=tiffgray -sOutputFile=ocr_%02d.tif -dBATCH -dNOPAUSE input.pdf
-r is the resolution
-sDEVICE selects the output device (tiffgray gives grayscale TIFF output)
-sOutputFile=outputfilename
note that %02d causes the page number to be inserted
into the filename

Next, use tesseract to convert each page to text
>tesseract inputFile outputfile -l eng
inputFile is one of the output tif files from gs
outputfile will be given a .txt extension
-l is the language of the input file (eng = English)

Reassemble the OCR'ed .txt files into a single document
>cat tess-outfile01.txt tess-outfile02.txt ... tess-outfilenn.txt > Input.txt


On the test server <10.168.30.100 test.sb.state.az.us>
I have installed tesseract using the following
yum -y install tesseract tesseract-en
using the RPMforge repository
  wget http://packages.sw.be/rpmforge-release/rpmforge-release-0.5.2-2.el5.rf.i386.rpm
  rpm --import http://apt.sw.be/RPM-GPG-KEY.dag.txt
  rpm -K rpmforge-release-0.5.2-2.el5.rf.i386.rpm
  rpm -i rpmforge-release-0.5.2-2.el5.rf.i386.rpm


---
PLUG-discuss mailing list - PLUG-discuss@lists.plug.phoenix.az.us
To subscribe, unsubscribe, or to change your mail settings:
http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss

Re: making PDFs workable

2012-09-12 Thread James Crawford

Joseph Sinclair wrote:
> As noted earlier, none of this helps if the PDF is just a big image.
>
> The PDF referenced below is an image exported from Xara Xtreme Pro (graphics
> software for Windows), and every test I can run on it indicates it is a big
> image file; no native text to copy.
>
> I don't run Adobe reader, it may have some added specialization (e.g. OCR) to
> allow text to be copied.


I have some instructions on a work system that will take a pdf image, convert 
it to tiff then ocr the result to a txt file.
We had a large number of documents that they wanted as text.

I'll try to remember to post the steps and script I used on Thursday.

James C.



Re: making PDFs workable

2012-09-12 Thread JD Austin
I haven't tried this, but you can probably use cuneiform and exactimage to
create text-searchable PDFs from image-only PDFs and TIFFs; you can do it
via a live CD:
http://www.watchocr.com/

On Tue, Sep 11, 2012 at 6:22 PM, Joseph Sinclair
wrote:

> As noted earlier, none of this helps if the PDF is just a big image.
>
> The PDF referenced below is an image exported from Xara Xtreme Pro
> (graphics software for Windows), and every test I can run on it indicates
> is a big image file; no native text to copy.
>
> I don't run Adobe reader, it may have some added specialization (e.g. OCR)
> to allow text to be copied.
>
>
>
> On 09/11/2012 04:33 PM, Brian Cluff wrote:
> > I think I remember that you were running KDE..  If so, the Okular PDF
> viewer will allow you to copy and paste, you just need to be in selection
> mode (Don't polute your KDE install with evince).  Just click the
> "selection" icon or pick "tools -> selection" from the menu (ctrl-3 will do
> it too).
> >
> > You can also load the "libreoffice-pdfimport" package and load PDFs
> directly into openoffice.
> >
> > Also inkscape can do a VERY good to percfect job of loading a PDF, the
> quality being mostly dependent on if you have all the fonts installed that
> the PDF is using, but it can only handle a single page at a time.
> >
> > If you have been doing any of that with no luck, you might have a PDF
> where the text is actually a graphics and nothing will allow you to copy
> and paste text in it.  You best bet for those is to extract the graphics
> out of the PDF and see if one of the OCR software packages can turn it into
> text for you.
> >
> > Brian Cluff
> >
> > On 09/11/2012 02:20 PM, Michael Havens wrote:
> >> Well, the reason seems to be that 'document viewer is the default. I
> >> jusat d/l evince and can't seem to make it the default PDGF viewer. I
> >> right click on a pdf>open with>evince but it keeps opening with Document
> >> Viewer!
> >> :-)~MIKE~(-:
> >>
> >>
> >> On Tue, Sep 11, 2012 at 1:18 PM, Matt Graham wrote:
> >>
> >>  > Michael Havens wrote:
> >>  >> HOw can I make it so I can copy-n-paste the text from
> >>  >> a pdf into a oo document?
> >> From: Mark Jarvis <m.jar...@cox.net>
> >>  > The Foxitpro PDF reader allows text to be marked and copied.
> >>  > Unfortunately, it's only available for Windows. I don't know if
> >>  > there's a Linux PDF reader that has that capability.
> >>
> >> AFAICT, evince (the PDF reader that's standard for GNOME-based
> >> distros) will
> >> allow you to copy and paste text from PDFs as well.  Also remember
> >> that some
> >> PDF readers have multiple tools available, and the default tool
> might be
> >> "scroll/drag pages" not "select text".
> >>
> >> Also also remember that if the PDF doesn't actually contain text,
> >> but is a
> >> pile of images, then there will be no text to select.  The PDF that
> >> you're
> >> trying to look at doesn't have that problem, but for some reason,
> >> evince won't
> >> let you copy the text.  Acrobrat Reader will.  No, I don't know why
> >> either
> >>
> >> --
> >> Matt G / Dances With Crows
> >> The Crow202 Blog: http://crow202.org/wordpress/
> >> There is no Darkness in Eternity/But only Light too dim for us to
> see
> >>

Re: making PDFs workable

2012-09-11 Thread Joseph Sinclair
As noted earlier, none of this helps if the PDF is just a big image.

The PDF referenced below is an image exported from Xara Xtreme Pro (graphics
software for Windows), and every test I can run on it indicates it is a big
image file; no native text to copy.

I don't run Adobe Reader; it may have some added specialization (e.g. OCR) to
allow text to be copied.
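
For anyone wanting to run a similar test, one rough way is to pipe whatever a text extractor finds into a check for non-blank characters. The sketch below assumes pdftotext (from poppler-utils) is installed; has_text and scan.pdf are made-up names, and this is not necessarily the test used above:

```shell
# Succeed if standard input contains at least one non-whitespace character.
# Form feeds between pages count as whitespace, so an image-only PDF
# that yields only page breaks is reported as having no text.
has_text() {
    grep -q '[^[:space:]]'
}

# Example (scan.pdf is a placeholder):
#   if pdftotext scan.pdf - | has_text; then
#       echo "has a text layer"
#   else
#       echo "image only, needs OCR"
#   fi
```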



On 09/11/2012 04:33 PM, Brian Cluff wrote:
> I think I remember that you were running KDE..  If so, the Okular PDF viewer 
> will allow you to copy and paste, you just need to be in selection mode 
> (Don't polute your KDE install with evince).  Just click the "selection" icon 
> or pick "tools -> selection" from the menu (ctrl-3 will do it too).
> 
> You can also load the "libreoffice-pdfimport" package and load PDFs directly 
> into openoffice.
> 
> Also inkscape can do a VERY good to percfect job of loading a PDF, the 
> quality being mostly dependent on if you have all the fonts installed that 
> the PDF is using, but it can only handle a single page at a time.
> 
> If you have been doing any of that with no luck, you might have a PDF where 
> the text is actually a graphics and nothing will allow you to copy and paste 
> text in it.  You best bet for those is to extract the graphics out of the PDF 
> and see if one of the OCR software packages can turn it into text for you.
> 
> Brian Cluff
> 
> On 09/11/2012 02:20 PM, Michael Havens wrote:
>> Well, the reason seems to be that 'document viewer is the default. I
>> jusat d/l evince and can't seem to make it the default PDGF viewer. I
>> right click on a pdf>open with>evince but it keeps opening with Document
>> Viewer!
>> :-)~MIKE~(-:
>>
>>
>> On Tue, Sep 11, 2012 at 1:18 PM, Matt Graham wrote:
>>
>>  > Michael Havens wrote:
>>  >> HOw can I make it so I can copy-n-paste the text from
>>  >> a pdf into a oo document?
>> From: Mark Jarvis <m.jar...@cox.net>
>>  > The Foxitpro PDF reader allows text to be marked and copied.
>>  > Unfortunately, it's only available for Windows. I don't know if
>>  > there's a Linux PDF reader that has that capability.
>>
>> AFAICT, evince (the PDF reader that's standard for GNOME-based
>> distros) will
>> allow you to copy and paste text from PDFs as well.  Also remember
>> that some
>> PDF readers have multiple tools available, and the default tool might be
>> "scroll/drag pages" not "select text".
>>
>> Also also remember that if the PDF doesn't actually contain text,
>> but is a
>> pile of images, then there will be no text to select.  The PDF that
>> you're
>> trying to look at doesn't have that problem, but for some reason,
>> evince won't
>> let you copy the text.  Acrobrat Reader will.  No, I don't know why
>> either
>>
>> --
>> Matt G / Dances With Crows
>> The Crow202 Blog: http://crow202.org/wordpress/
>> There is no Darkness in Eternity/But only Light too dim for us to see
>>

Re: making PDFs workable

2012-09-11 Thread Brian Cluff
I think I remember that you were running KDE. If so, the Okular PDF
viewer will allow you to copy and paste; you just need to be in
selection mode (don't pollute your KDE install with evince). Just click
the "selection" icon or pick "tools -> selection" from the menu (ctrl-3
will do it too).


You can also install the "libreoffice-pdfimport" package and load PDFs
directly into openoffice.


Also, inkscape can do a VERY good to perfect job of loading a PDF, the
quality being mostly dependent on whether you have all the fonts installed
that the PDF is using, but it can only handle a single page at a time.


If you have been doing any of that with no luck, you might have a PDF
where the text is actually graphics, and nothing will allow you to copy
and paste text from it. Your best bet for those is to extract the graphics
out of the PDF and see if one of the OCR software packages can turn it
into text for you.


Brian Cluff

On 09/11/2012 02:20 PM, Michael Havens wrote:

Well, the reason seems to be that 'document viewer is the default. I
jusat d/l evince and can't seem to make it the default PDGF viewer. I
right click on a pdf>open with>evince but it keeps opening with Document
Viewer!
:-)~MIKE~(-:


On Tue, Sep 11, 2012 at 1:18 PM, Matt Graham <danceswithcr...@usa.net> wrote:

 > Michael Havens wrote:
 >> HOw can I make it so I can copy-n-paste the text from
 >> a pdf into a oo document?
From: Mark Jarvis <m.jar...@cox.net>
 > The Foxitpro PDF reader allows text to be marked and copied.
 > Unfortunately, it's only available for Windows. I don't know if
 > there's a Linux PDF reader that has that capability.

AFAICT, evince (the PDF reader that's standard for GNOME-based
distros) will
allow you to copy and paste text from PDFs as well.  Also remember
that some
PDF readers have multiple tools available, and the default tool might be
"scroll/drag pages" not "select text".

Also also remember that if the PDF doesn't actually contain text,
but is a
pile of images, then there will be no text to select.  The PDF that
you're
trying to look at doesn't have that problem, but for some reason,
evince won't
let you copy the text.  Acrobrat Reader will.  No, I don't know why
either

--
Matt G / Dances With Crows
The Crow202 Blog: http://crow202.org/wordpress/
There is no Darkness in Eternity/But only Light too dim for us to see



Re: making PDFs workable

2012-09-11 Thread Michael Havens
Well, the reason seems to be that 'Document Viewer' is the default. I just
downloaded evince and can't seem to make it the default PDF viewer. I right-click
on a pdf > open with > evince, but it keeps opening with Document Viewer!
:-)~MIKE~(-:


On Tue, Sep 11, 2012 at 1:18 PM, Matt Graham wrote:

> > Michael Havens wrote:
> >> HOw can I make it so I can copy-n-paste the text from
> >> a pdf into a oo document?
> From: Mark Jarvis 
> > The Foxitpro PDF reader allows text to be marked and copied.
> > Unfortunately, it's only available for Windows. I don't know if
> > there's a Linux PDF reader that has that capability.
>
> AFAICT, evince (the PDF reader that's standard for GNOME-based distros)
> will
> allow you to copy and paste text from PDFs as well.  Also remember that
> some
> PDF readers have multiple tools available, and the default tool might be
> "scroll/drag pages" not "select text".
>
> Also also remember that if the PDF doesn't actually contain text, but is a
> pile of images, then there will be no text to select.  The PDF that you're
> trying to look at doesn't have that problem, but for some reason, evince
> won't
> let you copy the text.  Acrobrat Reader will.  No, I don't know why
> either
>
> --
> Matt G / Dances With Crows
> The Crow202 Blog:  http://crow202.org/wordpress/
> There is no Darkness in Eternity/But only Light too dim for us to see
>

Re: making PDFs workable

2012-09-11 Thread Matt Graham
> Michael Havens wrote:
>> HOw can I make it so I can copy-n-paste the text from
>> a pdf into a oo document?
From: Mark Jarvis 
> The Foxitpro PDF reader allows text to be marked and copied.
> Unfortunately, it's only available for Windows. I don't know if
> there's a Linux PDF reader that has that capability.

AFAICT, evince (the PDF reader that's standard for GNOME-based distros) will
allow you to copy and paste text from PDFs as well.  Also remember that some
PDF readers have multiple tools available, and the default tool might be
"scroll/drag pages" not "select text".

Also also remember that if the PDF doesn't actually contain text, but is a
pile of images, then there will be no text to select.  The PDF that you're
trying to look at doesn't have that problem, but for some reason, evince won't
let you copy the text.  Acrobrat Reader will.  No, I don't know why
either

-- 
Matt G / Dances With Crows
The Crow202 Blog:  http://crow202.org/wordpress/
There is no Darkness in Eternity/But only Light too dim for us to see



Re: making PDFs workable

2012-09-11 Thread JD Austin
If Adobe Reader doesn't do it, use calibre to convert it to HTML (I
usually do htmlz and then extract the archive to a folder).
http://calibre-ebook.com/download

On Tue, Sep 11, 2012 at 12:17 PM, Mark Jarvis  wrote:

>
> The Foxitpro PDF reader allows text to be marked and copied.
> Unfortunately, it's only available for Windows. I don't know if there's a
> Linux PDF reader that has that capability.
>
> -mj-
>
>
> Michael Havens wrote:
>
> HOw can I make it so I can copy-n-paste the text from a pdf into a oo
> document?
> :-)~MIKE~(-:
>
>
> On Tue, Feb 21, 2012 at 7:32 AM, Sam Kreimeyer  wrote:
>
>> Here's a pdf of a quick guide to regular expressions
>> http://www.addedbytes.com/download/regular-expressions-cheat-sheet-v1/pdf/
>>
>> Basically, it's a format for defining search patterns that supports
>> special meanings for certain characters. For instance:
>>
>> a - finds any string like "a"
>> a. - finds any string like "a" plus any other character except a new line
>> (matches "aa", "ab", "ac", etc)
>> a.* - finds any string like "a" plus zero or more characters except a new
>> line (matches "aa", "abcdefghijk")
>> Other special characters can further modify this behavior.
>>
>> So here's an explanation of the earlier command.
>>
>> 's/\.JPG$/.jpg/' *.JPG
>>
>> Basic search and replace format s/[string we search for]/[string to
>> replace matches with]/
>>
>> "\.JPG$" - Because "." is special, we escape it with "\" to keep the
>> regex from interpreting it, so the "." will be treated literally. "JPG" is
>> what we're looking for. Placing a "$" at the end of the string tells the
>> regex to match the string only at the end of the strings you're searching.
>> This means that you will match "example.JPG" but not "JPG.example".
>>
>> ".jpg" - This is our replacement string. This is what goes in the place
>> of every match we find.
>>
>> "*.JPG" - while this isn't part of the regex, "*" is a wildcard (can be
>> substituted for any number of characters).
>>
>> Hope that helps!
>>

Re: making PDFs workable

2012-09-11 Thread Michael Havens
Thanks for letting me know about that. Now we'll just wait and see if
someone else chimes in!
:-)~MIKE~(-:


On Tue, Sep 11, 2012 at 12:17 PM, Mark Jarvis  wrote:

>
> The Foxitpro PDF reader allows text to be marked and copied.
> Unfortunately, it's only available for Windows. I don't know if there's a
> Linux PDF reader that has that capability.
>
> -mj-
>
>
> Michael Havens wrote:
>
> How can I make it so I can copy-n-paste the text from a pdf into a oo
> document?
> :-)~MIKE~(-:
>
>
> On Tue, Feb 21, 2012 at 7:32 AM, Sam Kreimeyer  wrote:
>
>> Here's a pdf of a quick guide to regular expressions
>> http://www.addedbytes.com/download/regular-expressions-cheat-sheet-v1/pdf/
>>
>> Basically, it's a format for defining search patterns that supports
>> special meanings for certain characters. For instance:
>>
>> a - finds any string like "a"
>> a. - finds any string like "a" plus any other character except a new line
>> (matches "aa", "ab", "ac", etc)
>> a.* - finds any string like "a" plus zero or more characters except a new
>> line (matches "aa", "abcdefghijk")
>> Other special characters can further modify this behavior.
>>
>> So here's an explanation of the earlier command.
>>
>> 's/\.JPG$/.jpg/' *.JPG
>>
>> Basic search and replace format s/[string we search for]/[string to
>> replace matches with]/
>>
>> "\.JPG$" - Because "." is special, we escape it with "\" to keep the
>> regex from interpreting it, so the "." will be treated literally. "JPG" is
>> what we're looking for. Placing a "$" at the end of the string tells the
>> regex to match the string only at the end of the strings you're searching.
>> This means that you will match "example.JPG" but not "JPG.example".
>>
>> ".jpg" - This is our replacement string. This is what goes in the place
>> of every match we find.
>>
>> "*.JPG" - while this isn't part of the regex, "*" is a wildcard (can be
>> substituted for any number of characters).
>>
>> Hope that helps!
>>

Re: making PDFs workable

2012-09-11 Thread Mark Jarvis

  
  

The Foxitpro PDF reader allows text to be marked and copied.
Unfortunately, it's only available for Windows. I don't know if
there's a Linux PDF reader that has that capability.

-mj-

Michael Havens wrote:

How can I make it so I can copy-n-paste the text from
  a pdf into a oo document?
  :-)~MIKE~(-:


  On Tue, Feb 21, 2012 at 7:32 AM, Sam Kreimeyer wrote:

  Here's a pdf of a quick guide to regular expressions
  http://www.addedbytes.com/download/regular-expressions-cheat-sheet-v1/pdf/
  
  Basically, it's a format for defining search patterns that
  supports special meanings for certain characters. For
  instance:
  
  a - finds any string like "a"
  a. - finds any string like "a" plus any other character except
  a new line (matches "aa", "ab", "ac", etc)
  a.* - finds any string like "a" plus zero or more characters
  except a new line (matches "aa", "abcdefghijk")
  Other special characters can further modify this behavior.
  
  So here's an explanation of the earlier command.
  
  's/\.JPG$/.jpg/' *.JPG
  
  Basic search and replace format s/[string we search
  for]/[string to replace matches with]/
  
  "\.JPG$" - Because "." is special, we escape it with "\" to
  keep the regex from interpreting it, so the "." will be
  treated literally. "JPG" is what we're looking for. Placing a
  "$" at the end of the string tells the regex to match the
  string only at the end of the strings you're searching. This
  means that you will match "example.JPG" but not "JPG.example".
  
  ".jpg" - This is our replacement string. This is what goes in
  the place of every match we find.
  
  "*.JPG" - while this isn't part of the regex, "*" is a
  wildcard (can be substituted for any number of characters).
  
  Hope that helps!
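
  The "earlier command" itself is not shown in this thread, so as a stand-in the same expression can be tried with sed to see which names it would change. A small sketch; lowercase_jpg_ext is a made-up helper name:

```shell
# Apply the substitution s/\.JPG$/.jpg/ to a single file name and print the result.
lowercase_jpg_ext() {
    printf '%s\n' "$1" | sed 's/\.JPG$/.jpg/'
}

lowercase_jpg_ext example.JPG    # prints example.jpg
lowercase_jpg_ext JPG.example    # prints JPG.example (".JPG" is not at the end)
```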
  