PDF to Text

2007-07-13 Thread Mike Lesser

Hi all. Like it says, I need to extract the content of a PDF file.

I installed the tool pdftotext, and it works fine for my needs. I  
recall there was a very simple module that used this to extract text,  
but for the life of me, I can't find it on CPAN! Any leads? Using a  
command-line script in my own code makes me feel icky, but I guess  
I'll deal...
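
For anyone landing here with the same question, a minimal sketch of wrapping pdftotext directly, assuming the pdftotext binary is on the PATH. The helper names pdftotext_cmd and pdf_text are made up for illustration and do not come from any CPAN module:

```perl
use strict;
use warnings;

# Build the argument list for pdftotext. A trailing '-' asks it to write
# the extracted text to standard output instead of a file.
sub pdftotext_cmd {
    my ($pdf) = @_;
    return ( 'pdftotext', '-layout', $pdf, '-' );
}

# Hypothetical helper: returns the text of a PDF as one string.
sub pdf_text {
    my ($pdf) = @_;
    # List-form pipe open: no shell is involved, so spaces in $pdf are safe.
    open my $fh, '-|', pdftotext_cmd($pdf)
        or die "cannot run pdftotext: $!";
    local $/;    # slurp mode
    my $text = <$fh>;
    close $fh or die "pdftotext failed on $pdf\n";
    return $text;
}

# Only runs the tool when a PDF path is given on the command line.
print pdf_text( $ARGV[0] ) if @ARGV;
```

Because the command is passed as a list, the "icky" part is limited to one small sub that can be swapped for a module later.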


thx
Mike

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
http://learn.perl.org/




Paths, Spaces, Getopt::Long

2007-06-03 Thread Mike Lesser
Hi all. I have a problem that _must_ have a very simple solution  
(that I can't find).


I use the module Getopt::Long to read arguments, one of which is a  
file path that may have spaces. The path string that is returned from  
Getopt has spaces without escape chars.  The string seems to be fine  
for Perl use, but not so great for other things, such as the Shell  
module, which can't handle the spaces.


I have to assume that paths can be converted easily for use in shells  
and such, without resorting to RegEx. Any ideas?


Mike







Fwd: Paths, Spaces, Getopt::Long

2007-06-03 Thread Mike Lesser


Well I'm not sure. I may be explaining this badly. I'll go thru all  
the details in case it helps.


The path I pass when I'm executing the script is escaped, which I  
assume is correct.


Once that path is read by Getopt, I print it and, voila, no escapes,  
just nice-to-read spaces.


This path gets a filename appended as if it were a regular string,  
and is used when I make a file (via another module). This file is
created & written just fine. This made me assume all was well, and
that Perl or the modules covered all the issues with spaces. I now  
realize this may have been naive.


Then I attempted to use Tidy, sans HTML::Tidy, through Shell. The  
HTML::Tidy lib won't work on my system. So, I have been futzing with  
tidy and I've discovered that tidy and simple commands like cd fail,
most likely because of the spaces in my paths.


For example, here's the path I pass to the script (no quotes):
/Users/mike/Airline\ Sheets/Original\ Schedules/UnitedJune.html

Here's the path as found via File::Basename/fileparse:
/Users/mike/Projects/Omni/Airline Sheets/Original Schedules/

My script uses modules that create files based on this path, and it  
seems okay. If however I try to  use the path with say,  
the Shell mod, it fails. This is what cd returns:


/Users/mike/Projects/Omni/Airline: No such file or directory
I need to use the Shell because I need to run tidy, locally.


I fear that you're using the Shell module for more than it was
intended to do, perhaps because you don't know about system().



That may very well be the case!

One easy solution may be to give a list of arguments to system(). The
first is the name of the program you're trying to run, the rest of the
list are the command-line arguments to give it. You don't need to
escape anything, because the strings are passed as-is.

 # use the system's chmod command on a list of filenames
 system 'chmod', 'u+w', @filenames;
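
The same idea applied to a path with spaces can be sketched as follows. This is only an illustration: the directory name is hypothetical, and the running perl ($^X) stands in for an external tool such as tidy so that the example runs anywhere:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical directory name containing a space, created under a temp dir.
my $base = tempdir( CLEANUP => 1 );
my $dir  = File::Spec->catdir( $base, 'Original Schedules' );
mkdir $dir or die "mkdir failed: $!";
my $file = File::Spec->catfile( $dir, 'UnitedJune.html' );

# List form: each element reaches the child program as one argument,
# so no shell is involved and the space needs no escaping.
# $^X (the running perl) stands in for an external tool such as tidy.
my $status = system( $^X, '-e',
    'open my $fh, ">", $ARGV[0] or exit 1; print {$fh} "ok\n"', $file );

print $status == 0 && -e $file ? "created: $file\n" : "failed\n";
```

With a single-string call such as system "tool $file", the shell would split the path at the space, which matches the "No such file or directory" error quoted above.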

Hope this helps!

--Tom Phoenix
Stonehenge Perl Training








Fwd: Paths, Spaces, Getopt::Long

2007-06-03 Thread Mike Lesser



Begin forwarded message:

From: Mike Lesser [EMAIL PROTECTED]
Date: June 3, 2007 3:48:56 PM EDT
To: Chas Owens [EMAIL PROTECTED]
Subject: Re: Paths, Spaces, Getopt::Long

On Jun 3, 2007, at 1:59 PM, Chas Owens wrote:


On 6/3/07, Mike Lesser [EMAIL PROTECTED] wrote:
snip

I have to assume that paths can be converted easily for use in shells
and such, without resorting to RegEx. Any ideas?

snip

Aside from the multi argument version of system that Tom has already
mentioned, the bigger question is Why are you running code outside of
Perl?  Often people think they need to say things like



The script needs to use tidy to strip garbage from an html file prior  
to reading it. It's a file automatically generated by another company  
and it's filled with junk, hence no chance to fix it at the source.


The HTML::Tidy module would be fine but it doesn't pass testing on my  
box, and won't work with a forced install. I took a look and found  
that that seems to be a recurring problem on OS X 10.4. I haven't yet  
looked thru the code to determine the source of the problem as it  
seemed that running either Shell or system () was an interesting  
thing to learn. I might have been wrong there!


I've had success running hard coded paths and stuff, but now see that  
there's this space problem, which I didn't realize since Perl was  
handling paths nicely all by itself!




system "rm -rf $path";
system "mkdir $path";
system "chmod 666 $path";

when they could just as easily say

use File::Path;
use File::chmod;

rmtree $path;
mkpath $path;
chmod 0666, $path;








Re: Paths, Spaces, Getopt::Long

2007-06-03 Thread Mike Lesser


On Jun 3, 2007, at 1:59 PM, Chas Owens wrote:


On 6/3/07, Mike Lesser [EMAIL PROTECTED] wrote:
snip

I have to assume that paths can be converted easily for use in shells
and such, without resorting to RegEx. Any ideas?

snip

Aside from the multi argument version of system that Tom has already
mentioned, the bigger question is Why are you running code outside of
Perl?  Often people think they need to say things like

system "rm -rf $path";
system "mkdir $path";
system "chmod 666 $path";



My intent is to keep it within Perl, but I seem to be going further  
outside of it due to this problem. I'm involved in all sorts of Perl- 
unrelated nonsense.


I've taken another look at HTML::Tidy, and it appears that there are  
some critical issues with the version of tidylib that's on Mac OS X,  
including the version number. Tidy's included, which is nice, but  
fink is intent on keeping the same (old) version.


I figure, get the latest from CVS, but there's been some problems  
there (probably due to me having never used it before). Rebuilding a  
new version of the lib hasn't been successful yet due to a variety of  
problems (but I may solve them tonite). Thus writing an Xsub to it  
(another thing for me to learn) seems excessive. Now I'm really far out!


Roadblocks everywhere!

I'm thinking that a little RegEx might just serve me better even  
though it's reinventing the wheel. This script is intended for use  
with a file from a specific vendor, which has its own quirks (it
looks like someone set up an exporter 10 years ago and then left the
company) that make no sense. It's not _so_ bad to make some custom  
code, is it? Yuck.









Re: Paths, Spaces, Getopt::Long

2007-06-03 Thread Mike Lesser

Okay, I eliminated the tidy with some more robust regex. D'oh!

Case closed!





testing return values

2007-04-22 Thread Mike Lesser
Hiya. I'm looking for the correct Perl style for testing and storing  
a return value in a control statement. The solution in any other  
language is pretty obvious, but I get the distinct impression that  
there's a 'right' way in Perl...


Let's say I want to test a scalar returned from a subroutine, and  
also keep a copy for my own use:


 $scalar = some_sub( $argument );

 if( $scalar ){
 }

Naturally that's no big deal. Now let's say I have a tree I want to  
traverse, or some similar open-ended thing to evaluate, and want to  
run it until a condition is reached..


 while( read_tree( $argument ) ){
 }

Again no biggie. The problem is if I want to keep the result.  
Obviously I can't do this:


while( $tree_element = read_tree( $argument ) ){
   do_something( $tree_element );
}

I can come up with a brute-force solution of course, but there's  
probably a better, Perlish way that I'm not aware of. In addition, I  
don't expect a return value from some module to be consistently  
undefined or zero; it could change under some circumstances. This  
makes me think that the problem has been dealt with long ago, and  
just doesn't stick out in the llama/alpaca/whatever books.
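
For what it's worth, assigning inside the loop condition is legal Perl, and wrapping the assignment in defined() is one common way to cope with return values that may legitimately be zero. A minimal sketch, where read_tree is a hypothetical stand-in that returns undef when nothing is left:

```perl
use strict;
use warnings;

# Hypothetical stand-in for read_tree(): hands back queued elements one at a
# time and returns undef when nothing is left. Note that 0 is a valid element.
my @queue = ( 3, 0, 7 );
sub read_tree { return shift @queue }

my @seen;
# Assigning inside the condition is fine; wrapping it in defined() keeps a
# legitimate false value such as 0 or '' from ending the loop early.
while ( defined( my $tree_element = read_tree() ) ) {
    push @seen, $tree_element;
}

print "visited: @seen\n";    # visited: 3 0 7
```

This only helps when the routine signals "done" with undef; a module that ends on some other sentinel needs a test written for that sentinel.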


Hopefully I explained this correctly!











Re: Diagnosing use/lib problems

2007-03-27 Thread Mike Lesser


On Mar 26, 2007, at 7:32 PM, Tom Phoenix wrote:


On 3/26/07, Mike Lesser [EMAIL PROTECTED] wrote:


Hi all. First time using subroutines in external files. I've had some
sporadic success with some simple libs (not modules), but can't seem
to get consistent results.


What do you mean by sporadic success? Does something work only on
some invocations of your program? Once you can get it to work, doesn't
it keep working?


I've been able to use FindBin as described in the alpaca book to load a
single file. When I try to use another - almost identical - file, it  
fails.

I've tried FindLib-again as well as pretty much everything I've seen
online. There's a level of



use FindBin qw($Bin);
use lib "$Bin/Libs";


'use lib' doesn't load your library files; it just tells Perl where to
find them. You still need to load them (either with require or use).
You could put code like this after those two lines; the BEGIN block
ensures that the external subroutines are compiled before the
following code begins compilation.

 BEGIN {
   require "first.pl";
   require "another.pl";
   require "one_more.pl";
 }
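
A self-contained sketch of the two steps, assuming a throwaway library written to a temp directory so the example runs anywhere. The file name route_lib.pl and its contents are made up for illustration; in the thread the directory would be "$Bin/Libs":

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $libdir;
BEGIN {
    # Hypothetical library written on the fly so the sketch runs anywhere;
    # in the thread this would be an existing file under $Bin/Libs.
    $libdir = tempdir( CLEANUP => 1 );
    open my $fh, '>', "$libdir/route_lib.pl" or die "cannot write: $!";
    print {$fh} "sub route_get_hash { return ( SFO => 'San Francisco' ) }\n1;\n";
    close $fh or die "cannot close: $!";
}

use lib $libdir;    # only adds the directory to @INC; loads nothing

BEGIN {
    require 'route_lib.pl';    # this is the step that compiles the subs
}

my %routes = route_get_hash();
print "SFO => $routes{SFO}\n";
```

Dropping the second BEGIN block reproduces the "Undefined subroutine" error even though the directory is visible in @INC.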



It works! Thanks. That's what I didn't glean from the books. I think the
stuff I've seen online is either outdated or simply unsuited to multiple
files. Makes sense, since modules is the way to go, apparently.


yeah, they're executable, and yeah, I've tried several approaches.


Modules and libraries don't normally need to be marked as executable.

Hope this helps!

--Tom Phoenix
Stonehenge Perl Training







Diagnosing use/lib problems

2007-03-26 Thread Mike Lesser
Hi all. First time using subroutines in external files. I've had some  
sporadic success with some simple libs (not modules), but can't seem  
to get consistent results.


I've got a few subs files ending in 'pl' in a folder within my  
scripts' folder:


UnitedScripts (dir)
myScript.pl
Libs (dir)
  Lib1.pl, etc

I'm using the following (at the moment) to include them:

use FindBin qw($Bin);
use lib "$Bin/Libs";

I consistently get errors like so: Undefined subroutine
&main::route_get_hash called at ./ReadSingleTable.pl line 114.


I first assumed it was an @INC problem (I know that much), but
printing the contents of @INC clearly shows my path is included:


/Users/mike/Code/Perl/UnitedScripts/Libs
/System/Library/Perl/5.8.6/darwin-thread-multi-2level
/System/Library/Perl/5.8.6
...and so on

yeah, they're executable, and yeah, I've tried several approaches.  
Still no success. Anyone have any idea? Thanks.







Re: perl module to create pdf reports

2005-04-30 Thread Mike Lesser
Whoops! Top-post!
Chris,
Yes, I did search CPAN and found a lot of interesting modules.
I will mention that I had asked for ideas and insights, over and above
just plain module names. I guess you often assume people do not search
CPAN, and hence I can understand the frustration your email seems to
portray. I do value your comments.

As far as free solutions, the Big Badass is PDF::API2. There are many
users, a zillion modules, and a mailing list. There are several modules
that are built on top of this, to relieve the difficulty of using it.
I'm not so sure they make sense to use.

The problem is that it has only very recently started to include
documentation. It's as complex as PDF itself, and assumes that you are
familiar with the gigantic PDF Reference. The author knows PDF inside
and out, but IMHO doesn't get that his user base is struggling, and
needs dox more than they need attitude.

Making very simple documents is easy, using a couple of the simpler
modules, but once you want to stretch out it becomes very heavy. The
samples are all minimal, and don't always address a solution that anyone
would want to use. [Some of them have even used outdated APIs.]

The O'Reilly book Web Graphics Programming has a chapter on using the
module, but they are simplistic examples for the most part. However,
when they give you soup, you take the soup.

The author is starting to add dox, so that's a step in the right
direction. I think the best thing to do is to plan on a big learning
curve, including (at least) learning the terminology and models from the
PDF Reference (which is online).

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
http://learn.perl.org/ http://learn.perl.org/first-response



CGIs & CSS - References

2005-04-30 Thread Mike Lesser
Hi gang
I'm working on my first CGIs, and have been adding stylesheets. I'm
currently working on my own machine, under localhost. I have two
questions:

(1) What's the contemporary way to print xhtml from my CGI? Should I use
CGI.pm, or something else? I currently just have a mess of print
commands.

(2) My stylesheets don't work with @import - Apache claims that it can't
find the file. They _do_ work with a hard reference like
	<link rel="stylesheet" type="text/css" href="http://localhost/the_stylesheet.css" />

I've been advised that @import is the way to go, and either way I need
relative references to the stylesheets. I can only assume that it's an
httpd.conf thing that I haven't been able to google up.

Thanks
Mike



Re: CGIs & CSS - References

2005-04-30 Thread Mike Lesser
Current versions of CGI.pm should generate XHTML by default, iirc.
Hmm. Does it include the doctype stuff and so forth? I seem to be getting
along fine with print qq() today, but I figured CGI.pm would help with
tables and links and so forth. I guess there's no need to settle on using
just CGI.pm, right? Using both CGI.pm and print qq isn't considered gross?


(2) My stylesheets don't work with @import - Apache claims that it
can't find the file. They _do_ work with a hard reference like
<link rel="stylesheet" type="text/css" href="http://localhost/the_stylesheet.css" />
If you really really want to have them in an @import call in the HTML,
reconsider. The <link ...> approach should be fine. If you still want
to do it this way, please clarify how you're trying to implement it.

[The only reason I'm trying it is because the book (my first CSS book)
uses it in the examples; it says that older browsers won't choke on the
stylesheets. Frankly, I don't care about any older browsers at the
moment! Also I don't even know where this @import symbol comes from! If
the <link> tag method is considered to be sufficient, then hey, I'm fine
with that for now.]

However I think I have bigger issues. My stylesheet link tags only seem
to work when I make a complete url to the stylesheet in my personal web
space - a partial ref, or a full ref to the global web space doesn't
work. Strangely enough, a javascript ref works just fine!

[Note that my machine has a global cgi folder aliased as
/localhost/cgi-bin/, as well as a global web folder at /localhost/. I
also can serve pages (only) from my user space at /localhost/~mike/]

Can you construct a simple HTML file with the same @import call that
fails in the same way your CGI script does? If so, then CGI is ruled out
as a culprit and the problem is with the URL, or with Apache; if the
page works that way, then something your CGI is doing is breaking.

Ignoring the import issue for the moment...
I _definitely_ can take the CGI-created source and make it a static web
page, and it will find the stylesheet (by just its name) just fine - no
problems at all. This is true for both the global web folder and my
personal one. (Naturally the stylesheets and js files are in both
places). This makes me think it's Apache. I'm trying to make a static
page that _doesn't_ work.

Inside my head tag I have this.

<link rel="stylesheet" type="text/css"
	href="http://localhost/~mike/07_p188_adding_separator_stripes.css" /></link>
<script src="http://localhost/07_p188_adding_separator_stripes.js"
	type="text/javascript"></script>

(and yeah, the goofy file names are from the book examples' site)
Now if I have a full href to the stylesheet - http://localhost/<that
giant name> - it won't be used and the Apache log will reflect this.
Likewise with a partial href to the file. But the javascript src tag
seems to be a lot smarter!

 

Just for kicks;

The original version using the import, btw, looks like this inside the
head tag:
<style type="text/css" title="text/css">
/* <![CDATA[ */
@import url(07_p188_adding_separator_stripes.css);
/* ]]> */
</style>
<script src="07_p188_adding_separator_stripes.js"
	type="text/javascript"></script>

And yeah, this works as a static page, but not as a CGI. I made a simple
CGI that's just some text and a stylesheet - the import doesn't work.
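
One possible explanation, not confirmed anywhere in this thread: a relative URL is resolved against the URL of the page that contains it, so a bare stylesheet name in output served from /cgi-bin/ points back into /cgi-bin/, where Apache typically does not serve static files. A root-relative href avoids that. A minimal sketch with a hypothetical page() helper and the stylesheet assumed to live in the global web folder:

```perl
use strict;
use warnings;

# Hypothetical helper: builds a minimal XHTML page whose stylesheet href is
# root-relative, so it resolves the same way from /cgi-bin/ as from a static page.
sub page {
    my ($css_path) = @_;
    return <<"END_PAGE";
Content-Type: text/html; charset=utf-8

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>Stylesheet test</title>
  <link rel="stylesheet" type="text/css" href="$css_path" />
</head>
<body><p>Some text.</p></body>
</html>
END_PAGE
}

# The leading slash makes the path relative to the server root.
print page('/the_stylesheet.css');
```

If the Apache error log shows requests for /cgi-bin/the_stylesheet.css when the bare name is used, that would support this reading.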

 


Thanks
Mike



CGI advice

2005-04-12 Thread Mike Lesser
Hi all. Okay, here's the latest.
I've been screwing around with my first CGIs - pretty cool, now I see
the appeal. Anyway, I'm fairly comfortable using MySQL and generating
XML output from it, and printing it to files and stdout. Now I'd like to
create the usual web interface that starts with a form and results in a
table of links from the query results. Probably multiple-page results
per query.

The problem is, which way to go? There seem to be a zillion techniques,
and I don't want to go down a dead (or annoying) path. For example,
should I go XML to HTML (via some module like XML::Parser), or use
CGI.pm? Should I create a layer between the query results and the HTML
generation?

I'm about to jump into CGI.pm anyway, since it seems like the logical
place to start, but any advice appreciated...

thx
Mike



Re: .pm and .pl files.

2005-04-01 Thread Mike Lesser
snip
So if we don't want to be like dweebs, it's no extension for scripts,
pl for libraries?

What's weird is that many of the O'Reilly books I have use the pl... so
I blame them.

Is the library file thing passe? I was going to put some stuff in a
library, but I get the impression that I should jump right to a pm. As a
newbie this is a little daunting.

Mike



Re: More Regex for Newbies

2005-03-29 Thread Mike Lesser

I'm fairly new too, but
s/\(.*\)//g
Should work?

Thanks David. The g sort of throws me, but I had The Epiphany last
night. I'm still a little bit confused, but I'm making headway now.




More Regex for Newbies

2005-03-28 Thread Mike Lesser
Hi all. I have this strange relationship with Regex. I seem to be able
to get simple stuff accomplished, but in really brute-force ways, and I
think I'm missing some fundamental aspect of its usage. I'm forced to
chomp thru strings from the side like pac-man. It's like I can do a
simple match or substitution, but I can't get stuff out of a string that
I want (which is what seems to be going on in all the examples).

For example, if I have a string that goes like
  Joe Shmoe (alphanumerics)
and I want to get the alphanumerics between the parens, it's like
pulling teeth. I think (i think..) what I want to do is match the stuff
that's _not_ between the parens, and substitute that to nothing? Is that
how to do it?
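
An alternative to deleting everything outside the parens is to capture what is inside them. A minimal sketch, assuming a made-up sample string in the same shape as the one above:

```perl
use strict;
use warnings;

my $string = 'Joe Shmoe (abc123)';    # hypothetical sample input

# In list context a match returns its captures, so the text between the
# parentheses lands directly in $inside (undef if there is no match).
# [^)]* matches any run of characters that are not a closing paren.
my ($inside) = $string =~ /\(([^)]*)\)/;

print defined $inside ? "$inside\n" : "no parens found\n";    # abc123
```

The escaped \( and \) match literal parentheses, while the unescaped inner pair marks the part to keep.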

Mike



Re: XML Generator DBI

2005-03-23 Thread Mike Lesser

-w is kind of pan-galactic and influences modules you're calling,
regardless of their own 'warnings' settings. Try 'use warnings' without
-w. See p861 in the camel 3rd edition.

HTH, GStC.
That's exactly what I needed to hear. Thank you Graeme!
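
The lexical scoping of 'use warnings' can be sketched roughly like this. The inner block with 'no warnings' is only a stand-in for code that never enabled warnings (such as a module without the pragma), and $null stands in for a NULL column value:

```perl
use strict;
use warnings;    # lexical: covers this file, not the modules it calls

my @caught;
local $SIG{__WARN__} = sub { push @caught, $_[0] };

my $null;    # stands in for a NULL column value

{
    # Stand-in for code that has not enabled warnings: with only the
    # lexical pragma in play, this comparison stays quiet.
    no warnings;
    my $same = ( $null eq '' );
}

# In this scope the lexical pragma is active, so the same comparison warns.
my $same = ( $null eq '' );

print scalar(@caught), " warning(s) caught\n";
```

Under -w, by contrast, code that never mentions the warnings pragma is switched on as well, which is why the module's own lines show up in the output.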



XML Generator DBI

2005-03-19 Thread Mike Lesser
Hi all. I'm messing with XML & MySQL, and I may have run into a newbie
snag.
I'm using XML::Generator::DBI with XML::Handler::YAWriter, in the
same manner that I've seen in tutorials, the MySQL Cookbook, and other
assorted places.
$writer = XML::Handler::YAWriter->new(AsString => 1);
$query = 'SELECT cd_airdate, cd_copies, cd_hilites FROM cds WHERE cd_labeled=0 ORDER BY cd_airdate';
$generator = XML::Generator::DBI->new(
    dbh          => $dbh,
    Handler      => $writer,
    RootElement  => "Creators",
    Indent       => 1,
    QueryElement => '',
);
$xml_output = $generator->execute($query);
According to the XML::Gen perldoc, Nulls are handled by excluding either
the attribute or the tag. The XML looks and works just fine. No prob.

However, when I use the -w option with the shebang, I get a couple
warnings for each record of the database. They're all about the same:

Use of uninitialized value in string eq at
/Library/Perl/5.8.1/XML/Generator/DBI.pm line 180.

Now it seems to me that if nulls are supposed to be handled by the
modules, then why the warnings? Is it normal for modules to put out
warnings during their normal function? I'm used to C/C++ environments
where you normally eliminate any warning that appears.
Mike