PDF to Text
Hi all. Like it says, I need to extract the content of a PDF file. I installed the tool pdftotext, and it works fine for my needs. I recall there was a very simple module that used this to extract text, but for the life of me, I can't find it on CPAN! Any leads? Using a command-line script in my own code makes me feel icky, but I guess I'll deal... thx Mike -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/
Paths, Spaces, Getopt::Long
Hi all. I have a problem that _must_ have a very simple solution (that I can't find). I use the module Getopt::Long to read arguments, one of which is a file path that may have spaces. The path string that is returned from Getopt has spaces without escape chars. The string seems to be fine for Perl use, but not so great for other things, such as the Shell module, which can't handle the spaces. I have to assume that paths can be converted easily for use in shells and such, without resorting to RegEx. Any ideas? Mike
Fwd: Paths, Spaces, Getopt::Long
Well I'm not sure. I may be explaining this badly. I'll go thru all the details in case it helps.

The path I pass when I'm executing the script is escaped, which I assume is correct. Once that path is read by Getopt, I print it and, voila, no escapes, just nice-to-read spaces. This path gets a filename appended as if it were a regular string, and is used when I make a file (via another module). This file is created and written just fine. This made me assume all was well, and that Perl or the modules covered all the issues with spaces. I now realize this may have been naive.

Then I attempted to use Tidy, sans HTML::Tidy, through Shell. The HTML::Tidy lib won't work on my system. So I have been futzing with tidy, and I've discovered that tidy and simple commands like cd fail, most likely because of the spaces in my paths.

For example, here's the path I pass to the script (no quotes):

    /Users/mike/Airline\ Sheets/Original\ Schedules/UnitedJune.html

Here's the path as found via File::Basename/fileparse:

    /Users/mike/Projects/Omni/Airline Sheets/Original Schedules/

My script uses modules that create files based on this path, and it seems okay. If however I try to use the path with, say, the Shell mod, it fails. This is what cd returns:

    /Users/mike/Projects/Omni/Airline: No such file or directory

I need to use the Shell because I need to run tidy, locally.

I fear that you're using the Shell module for more than it was intended to do, perhaps because you don't know about system().

That may very well be the case!

One easy solution may be to give a list of arguments to system(). The first is the name of the program you're trying to run; the rest of the list are the command-line arguments to give it. You don't need to escape anything, because the strings are passed as-is.

    # use the system's chmod command on a list of filenames
    system 'chmod', 'u+w', @filenames;

Hope this helps!

--Tom Phoenix
Stonehenge Perl Training
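The list form of system() described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the directory name is a made-up stand-in for a path with spaces, and the child program is simply a second copy of perl ($^X) rather than tidy, so the example runs anywhere Perl does.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical directory whose name contains a space, standing in for
# something like "Airline Sheets" from the thread.
my $base = tempdir(CLEANUP => 1);
my $dir  = File::Spec->catdir($base, 'Airline Sheets');
mkdir $dir or die "mkdir failed: $!";

# List form: program name first, then one list element per argument.
# No shell is involved, so the space in $dir needs no escaping.
# The child exits 0 only if it received the whole path as one argument.
my @cmd    = ($^X, '-e', 'exit(-d $ARGV[0] ? 0 : 1)', $dir);
my $status = system(@cmd);

print "child exit status: $status\n";
```

The same pattern should apply to any external program: put its name first and pass each argument, including the unescaped path, as its own list element.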
Fwd: Paths, Spaces, Getopt::Long
Begin forwarded message:

From: Mike Lesser [EMAIL PROTECTED]
Date: June 3, 2007 3:48:56 PM EDT
To: Chas Owens [EMAIL PROTECTED]
Subject: Re: Paths, Spaces, Getopt::Long

On Jun 3, 2007, at 1:59 PM, Chas Owens wrote:

On 6/3/07, Mike Lesser [EMAIL PROTECTED] wrote:
snip
I have to assume that paths can be converted easily for use in shells and such, without resorting to RegEx. Any ideas?
snip

Aside from the multi argument version of system that Tom has already mentioned, the bigger question is "Why are you running code outside of Perl?" Often people think they need to say things like

The script needs to use tidy to strip garbage from an html file prior to reading it. It's a file automatically generated by another company and it's filled with junk, hence no chance to fix it at the source.

The HTML::Tidy module would be fine but it doesn't pass testing on my box, and won't work with a forced install. I took a look and found that that seems to be a recurring problem on OS X 10.4. I haven't yet looked thru the code to determine the source of the problem as it seemed that running either Shell or system() was an interesting thing to learn. I might have been wrong there!

I've had success running hard coded paths and stuff, but now see that there's this space problem, which I didn't realize since Perl was handling paths nicely all by itself!

    system "rm -rf $path";
    system "mkdir $path";
    system "chmod 666 $path";

when they could just as easily say

    use File::Path;
    use File::chmod;
    rmtree $path;
    mkpath $path;
    chmod 0666, $path;
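A small, self-contained sketch of the pure-Perl approach Chas describes, assuming a throwaway temp directory; the "Original Schedules/output" path is a hypothetical example. It uses mode 0755 instead of the 0666 in the message so the directory stays traversable, and relies on Perl's built-in chmod.

```perl
use strict;
use warnings;
use File::Path qw(mkpath rmtree);
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical nested path containing a space.
my $base = tempdir(CLEANUP => 1);
my $path = File::Spec->catdir($base, 'Original Schedules', 'output');

mkpath($path);                       # creates intermediate directories too
my $created = -d $path ? 1 : 0;

chmod 0755, $path or die "chmod failed: $!";   # built-in, octal mode

rmtree($path);                       # removes the directory tree
my $removed = -e $path ? 0 : 1;

print "created=$created removed=$removed\n";
```

Because no shell ever sees $path, the space in the directory name is never an issue here.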
Re: Paths, Spaces, Getopt::Long
On Jun 3, 2007, at 1:59 PM, Chas Owens wrote:

On 6/3/07, Mike Lesser [EMAIL PROTECTED] wrote:
snip
I have to assume that paths can be converted easily for use in shells and such, without resorting to RegEx. Any ideas?
snip

Aside from the multi argument version of system that Tom has already mentioned, the bigger question is "Why are you running code outside of Perl?" Often people think they need to say things like

    system "rm -rf $path";
    system "mkdir $path";
    system "chmod 666 $path";

My intent is to keep it within Perl, but I seem to be going further outside of it due to this problem. I'm involved in all sorts of Perl-unrelated nonsense.

I've taken another look at HTML::Tidy, and it appears that there are some critical issues with the version of tidylib that's on Mac OS X, including the version number. Tidy's included, which is nice, but fink is intent on keeping the same (old) version. I figure, get the latest from CVS, but there have been some problems there (probably due to me having never used it before). Rebuilding a new version of the lib hasn't been successful yet due to a variety of problems (but I may solve them tonite). Thus writing an Xsub to it (another thing for me to learn) seems excessive.

Now I'm really far out! Roadblocks everywhere! I'm thinking that a little RegEx might just serve me better even though it's reinventing the wheel. This script is intended for use with a file from a specific vendor, which has its own quirks (it looks like someone set up an exporter 10 years ago and then left the company) that make no sense. It's not _so_ bad to make some custom code, is it? Yuck.
Re: Paths, Spaces, Getopt::Long
Okay, I eliminated the tidy with some more robust regex. D'oh! Case closed!
testing return values
Hiya. I'm looking for the correct Perl style for testing and storing a return value in a control statement. The solution in any other language is pretty obvious, but I get the distinct impression that there's a 'right' way in Perl...

Let's say I want to test a scalar returned from a subroutine, and also keep a copy for my own use:

    $scalar = sub( $argument );
    if( $scalar ){ }

Naturally that's no big deal. Now let's say I have a tree I want to traverse, or some similar open-ended thing to evaluate, and want to run it until a condition is reached..

    while( read_tree( $argument ) ){ }

Again no biggie. The problem is if I want to keep the result. Obviously I can't do this:

    while( $tree_element = read_tree( $argument ) ){
        do_something( $tree_element );
    }

I can come up with a brute-force solution of course, but there's probably a better, Perlish way that I'm not aware of. In addition, I don't expect a return value from some module to be consistently undefined or zero; it could change under some circumstances. This makes me think that the problem has been dealt with long ago, and just doesn't stick out in the llama/alpaca/whatever books.

Hopefully I explained this correctly!
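For what it's worth, Perl does accept an assignment as a loop condition, and wrapping it in defined() is one common way to cope with return values that are false but still valid. The sketch below assumes a read_tree() that returns undef when it runs out of elements; the sub and its data are hypothetical stand-ins, not code from the thread.

```perl
use strict;
use warnings;

# Hypothetical stand-in for read_tree(): hands back the next element,
# or undef once nothing is left.
my @queue = (3, 0, 'leaf', '');
sub read_tree { return shift @queue }

my @seen;
# The assignment itself is the loop condition. defined() keeps values
# such as 0 or '' from ending the loop early; only undef stops it.
while ( defined( my $tree_element = read_tree() ) ) {
    push @seen, $tree_element;
}

print scalar(@seen), " elements read\n";
```

Without the defined() wrapper the loop would stop at the 0, which is the concern raised about return values that are not consistently undefined or zero.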
Re: Diagnosing use/lib problems
On Mar 26, 2007, at 7:32 PM, Tom Phoenix wrote:

On 3/26/07, Mike Lesser [EMAIL PROTECTED] wrote:

Hi all. First time using subroutines in external files. I've had some sporadic success with some simple libs (not modules), but can't seem to get consistent results.

What do you mean by sporadic success? Does something work only on some invocations of your program? Once you can get it to work, doesn't it keep working?

I've been able to use FindBin as described in the alpaca book to load a single file. When I try to use another - almost identical - file, it fails. I've tried FindLib-again as well as pretty much everything I've seen online. There's a level of

    use FindBin qw($Bin);
    use lib "$Bin/Libs";

'use lib' doesn't load your library files; it just tells Perl where to find them. You still need to load them (either with require or use). You could put code like this after those two lines; the BEGIN block ensures that the external subroutines are compiled before the following code begins compilation.

    BEGIN {
        require 'first.pl';
        require 'another.pl';
        require 'one_more.pl';
    }

It works! Thanks. That's what I didn't glean from the books. I think the stuff I've seen online is either outdated or simply unsuited to multiple files. Makes sense, since modules are the way to go, apparently.

yeah, they're executable, and yeah, I've tried several approaches.

Modules and libraries don't normally need to be marked as executable.

Hope this helps!

--Tom Phoenix
Stonehenge Perl Training
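The distinction Tom draws (adding a directory to @INC versus actually loading a file) can be seen in a small runnable sketch. Everything here is an assumption made for illustration: the library file is generated on the fly in a temp directory, the file name route_lib.pl and the airport codes are made up, and @INC is changed at run time instead of with 'use lib' so the example fits in one file.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Build a throwaway "Libs" directory holding one library file.
my $libs = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($libs, 'route_lib.pl');
open my $fh, '>', $file or die "cannot write $file: $!";
print {$fh} "sub route_get_hash { return (from => 'EWR', to => 'ORD') }\n1;\n";
close $fh or die "close failed: $!";

# Like 'use lib', but at run time: Perl now knows where to look...
unshift @INC, $libs;
my $before = defined &main::route_get_hash ? 1 : 0;   # ...yet nothing is loaded

# require is what compiles the file and defines its subroutines.
require 'route_lib.pl';
my $after = defined &main::route_get_hash ? 1 : 0;

my %route = route_get_hash();
print "before=$before after=$after to=$route{to}\n";
```

In a real script the 'use lib' line plus a BEGIN block of require statements, as in the message, does the same thing at compile time.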
Diagnosing use/lib problems
Hi all. First time using subroutines in external files. I've had some sporadic success with some simple libs (not modules), but can't seem to get consistent results. I've got a few subs files ending in 'pl' in a folder within my scripts' folder:

    UnitedScripts (dir)
        myScript.pl
        Libs (dir)
            Lib1.pl, etc

I'm using the following (at the moment) to include them:

    use FindBin qw($Bin);
    use lib "$Bin/Libs";

I consistently get errors like so:

    Undefined subroutine main::route_get_hash called at ./ReadSingleTable.pl line 114.

I first assumed it was an @INC problem (I know that much), but printing the contents of @INC clearly shows my path is included:

    /Users/mike/Code/Perl/UnitedScripts/Libs
    /System/Library/Perl/5.8.6/darwin-thread-multi-2level
    /System/Library/Perl/5.8.6
    ...and so on

yeah, they're executable, and yeah, I've tried several approaches. Still no success. Anyone have any idea? Thanks.
Re: perl module to create pdf reports
Whoops! Top-post!

Chris, Yes, I did search CPAN and found a lot of interesting modules. I will mention that I had asked for ideas and insights, over and above just plain module names. I guess you often assume people do not search CPAN, and hence I can understand the frustration your email seems to portray. I do value your comments.

As far as free solutions, the Big Badass is PDF::API2. There are many users, a zillion modules, and a mailing list. There are several modules that are built on top of this, to relieve the difficulty of using it. I'm not so sure they make sense to use.

The problem is that it has only very recently started to include documentation. It's as complex as PDF itself, and assumes that you are familiar with the gigantic PDF Reference. The author knows PDF inside and out, but IMHO doesn't get that his user base is struggling, and needs dox more than they need attitude.

Making very simple documents is easy, using a couple of the simpler modules, but once you want to stretch out it becomes very heavy. The samples are all minimal, and don't always address a solution that anyone would want to use. [Some of them have even used outdated APIs.] The O'Reilly book Web Graphics Programming has a chapter on using the module, but they are simplistic examples for the most part. However, when they give you soup, you take the soup.

The author is starting to add dox, so that's a step in the right direction. I think the best thing to do is to plan on a big learning curve, including (at least) learning the terminology and models from the PDF Reference (which is online).
CGIs & CSS - References
Hi gang. I'm working on my first CGIs, and have been adding stylesheets. I'm currently working on my own machine, under localhost. I have two questions:

(1) What's the contemporary way to print xhtml from my CGI? Should I use CGI.pm, or something else? I currently just have a mess of print commands.

(2) My stylesheets don't work with @import - Apache claims that it can't find the file. They _do_ work with a hard reference like

    <link rel="stylesheet" type="text/css" href="http://localhost/the_stylesheet.css" />

I've been advised that @import is the way to go, and either way I need relative references to the stylesheets. I can only assume that it's an httpd.conf thing that I haven't been able to google up.

Thanks
Mike
Re: CGIs & CSS - References
Current versions of CGI.pm should generate XHTML by default, iirc.

Hmm. Does it include the doctype stuff and so forth? I seem to be getting along fine with print qq() today, but I figured CGI.pm would help with tables and links and so forth. I guess there's no need to settle on using just CGI.pm, right? Using both CGI.pm and print qq isn't considered gross?

(2) My stylesheets don't work with @import - Apache claims that it can't find the file. They _do_ work with a hard reference like

    <link rel="stylesheet" type="text/css" href="http://localhost/the_stylesheet.css" />

If you really really want to have them in an @import call in the HTML, reconsider. The <link ...> approach should be fine. If you still want to do it this way, please clarify how you're trying to implement it.

[The only reason I'm trying it is because the book (my first CSS book) uses it in the examples; it says that older browsers won't choke on the stylesheets. Frankly, I don't care about any older browsers at the moment! Also I don't even know where this @import symbol comes from! If the link tag method is considered to be sufficient, then hey, I'm fine with that for now.]

However I think I have bigger issues. My stylesheet link tags only seem to work when I make a complete url to the stylesheet in my personal web space - a partial ref, or a full ref to the global web space doesn't work. Strangely enough, a javascript ref works just fine!

[Note that my machine has a global cgi folder aliased as /localhost/cgi-bin/, as well as a global web folder at /localhost/. I also can serve pages (only) from my user space at /localhost/~mike/]

Can you construct a simple HTML file with the same @import call that fails in the same way your CGI script does? If so, then CGI is ruled out as a culprit and the problem is with the URL, or with Apache; if the page works that way, then something your CGI is doing is breaking.

Ignoring the import issue for the moment... I _definitely_ can take the CGI-created source and make it a static web page, and it will find the stylesheet (by just its name) just fine - no problems at all. This is true for both the global web folder and my personal one. (Naturally the stylesheets and js files are in both places.) This makes me think it's Apache.

I'm trying to make a static page that _doesn't_ work. Inside my head tag I have this.

    <link rel="stylesheet" type="text/css" href="http://localhost/~mike/07_p188_adding_separator_stripes.css" /></link>
    <script src="http://localhost/07_p188_adding_separator_stripes.js" type="text/javascript"></script>

(and yeah, the goofy file names are from the book examples' site)

Now if I have a full href to the stylesheet - http://localhost/that giant name - it won't be used and the Apache log will reflect this. Likewise with a partial href to the file. But the javascript src tag seems to be a lot smarter!

Just for kicks: the original version using the import, btw, looks like this inside the head tag:

    <style type="text/css" title="text/css">
    /* <![CDATA[ */
    @import url(07_p188_adding_separator_stripes.css);
    /* ]]> */
    </style>
    <script src="07_p188_adding_separator_stripes.js" type="text/javascript"></script>

And yeah, this works as a static page, but not as a CGI. I made a simple CGI that's just some text and a stylesheet - the import doesn't work.

Thanks
Mike
CGI advice
Hi all. Okay, here's the latest. I've been screwing around with my first CGIs - pretty cool, now I see the appeal. Anyway, I'm fairly comfortable using MySQL and generating XML output from it, and printing it to files and stdout. Now I'd like to create the usual web interface that starts with a form and results in a table of links from the query results. Probably multiple-page results per query. The problem is, which way to go? There seem to be a zillion techniques, and I don't want to go down a dead (or annoying) path. For example, should I go XML to HTML (via some module like XML::Parser), or use CGI.pm? Should I create a layer between the query results and the HTML generation? I'm about to jump into CGI.pm anyway, since it seems like the logical place to start, but any advice appreciated... thx Mike
Re: .pm and .pl files.
snip

So if we don't want to be like dweebs, it's no extension for scripts, pl for libraries? What's weird is that many of the O'Reilly books I have use the pl... so I blame them.

Is the library file thing passe? I was going to put some stuff in a library, but I get the impression that I should jump right to a pm. As a newbie this is a little daunting.

Mike
Re: More Regex for Newbies
I'm fairly new too, but

    s/\(.*\)//g

Should work?

Thanks David. The g sort of throws me, but I had The Epiphany last night. I'm still a little bit confused, but I'm making headway now.
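To see what the g in that substitution does, here is a minimal sketch. The sample string with two parenthesised groups is made up for illustration, and the second pattern (a negated character class) is an alternative added here, not something suggested in the thread.

```perl
use strict;
use warnings;

# Hypothetical sample with two parenthesised groups.
my $line = 'Joe Shmoe (abc123) and Jane Doe (xyz789)';

# Greedy .* runs from the first "(" all the way to the last ")".
(my $greedy = $line) =~ s/\(.*\)//g;

# [^)]* stops at the nearest ")", so each group is matched on its own,
# and /g repeats the substitution for every match in the string.
(my $each = $line) =~ s/\s*\([^)]*\)//g;

print "[$greedy]\n";
print "[$each]\n";
```

With a single pair of parens, as in the original question, both patterns remove the same text; the difference only shows up once a line has several groups.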
More Regex for Newbies
Hi all. I have this strange relationship with Regex. I seem to be able to get simple stuff accomplished, but in really brute-force ways, and I think I'm missing some fundamental aspect of its usage. I'm forced to chomp thru strings from the side like pac-man. It's like I can do a simple match or substitution, but I can't get stuff out of a string that I want (which is what seems to be going on in all the examples).

For example, if I have a string that goes like

    Joe Shmoe (alphanumerics)

and I want to get the alphanumerics between the parens, it's like pulling teeth. I think (I think..) what I want to do is match the stuff that's _not_ between the parens, and substitute that to nothing? Is that how to do it?

Mike
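One way to pull the text out directly, instead of deleting everything around it, is a capture group. This is a minimal sketch using a made-up sample string in the same shape as the one in the question:

```perl
use strict;
use warnings;

my $string = 'Joe Shmoe (abc123)';

# \( and \) match the literal parens; the unescaped pair between them
# captures whatever sits inside. In list context the match returns
# the captured text.
my ($inside) = $string =~ /\(([^)]*)\)/;
print "captured: $inside\n" if defined $inside;

# With /g in list context, every capture in the string comes back at once.
my @all = 'a (one) b (two)' =~ /\(([^)]*)\)/g;
print "all: @all\n";
```

The same captured text is also available as $1 right after a successful match, which is the form most of the book examples tend to use.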
Re: XML Generator DBI
-w is kind of pan-galactic and influences modules you're calling, regardless of their own 'warnings' settings. Try 'use warnings' without -w. See p861 in the camel 3rd edition. HTH, GStC.

That's exactly what I needed to hear. Thank you Graeme!
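The difference between the lexical 'use warnings' and the global -w switch can be demonstrated in one file. This is a sketch under assumptions: the two subs are hypothetical stand-ins (one plays the part of a module written without 'use warnings'), and localising $^W is used here to imitate what -w switches on, so the script should be run without -w.

```perl
use strict;

# Collect warnings instead of printing them, so they can be counted.
my @caught;
$SIG{__WARN__} = sub { push @caught, $_[0] };

sub quiet_compare {
    # No 'use warnings' in scope, like a module written without it.
    my $value;
    return $value eq '' ? 1 : 0;
}

sub loud_compare {
    use warnings;            # lexical: applies only inside this block
    my $value;
    return $value eq '' ? 1 : 0;
}

quiet_compare();
my $after_quiet = @caught;   # no warning expected here

loud_compare();
my $after_loud = @caught;    # one "uninitialized value" warning

{
    local $^W = 1;           # the global flag that -w sets
    quiet_compare();         # now even the "quiet" code warns
}
my $after_global = @caught;

print "quiet=$after_quiet loud=$after_loud global=$after_global\n";
```

This mirrors the situation in the thread: with the global flag on, code that never asked for warnings still produces them.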
XML Generator DBI
Hi all. I'm messing with XML & MySQL, and I may have run into a newbie snag. I'm using XML::Generator::DBI with XML::Handler::YAWriter, in the same manner that I've seen in tutorials, the MySQL Cookbook, and other assorted places.

    $writer = XML::Handler::YAWriter->new(AsString => 1);
    $query = 'SELECT cd_airdate, cd_copies, cd_hilites FROM cds WHERE cd_labeled=0 ORDER BY cd_airdate';
    $generator = XML::Generator::DBI->new(
        dbh          => $dbh,
        Handler      => $writer,
        RootElement  => "Creators",
        Indent       => 1,
        QueryElement => '');
    $xml_output = $generator->execute($query);

According to the XML::Gen perldoc, nulls are handled by excluding either the attribute or the tag. The XML looks and works just fine. No prob. However, when I use the -w option with the shebang, I get a couple of warnings for each record of the database. They're all about the same:

    Use of uninitialized value in string eq at /Library/Perl/5.8.1/XML/Generator/DBI.pm line 180.

Now it seems to me that if nulls are supposed to be handled by the modules, then why the warnings? Is it normal for modules to put out warnings during their normal function? I'm used to C/C++ environments where you normally eliminate any warning that appears.

Mike