pdf to spreadsheet advice

2010-09-06 Thread Matt Johnson
Hello,

I periodically receive PDFs containing a table of member names, addresses,
etc. in a badly formatted, hard-to-read layout. I would like to open the
PDF, extract the data, do a little reorganizing, and write it to an
Excel spreadsheet. Perl seems like the best way to do this.

I have searched CPAN and seen that there are a bunch of PDF- and
spreadsheet-related modules. I am looking for advice about the best
modules to use for this.

Which modules would be best for extracting the data from the PDF and
writing it to Excel?

I will probably do this on OS X, though I can use Windows if I need to.

Thanks
-- Matt

-- 
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




Re: Help with WWW::Mechanize

2006-11-30 Thread Matt Johnson
Mathew Snyder wrote:
 Tom Phoenix wrote:
 On 11/28/06, Mathew Snyder [EMAIL PROTECTED] wrote:

 I have a form I'm trying to fill out.  One of the fields, despite
 being named in
 the HTML source keeps erroring out on me.
 <select name="ValueOfActor">
 fields => {
 ValueOfStatus => $status,
 Value0fActor => $username,
 },
 Maybe you should use a typeface that shows the difference between O
 and 0 more clearly. As another good practice, you can use
 copy-and-paste when you need to be sure to get the exact spelling.
 Hope this helps!

 --Tom Phoenix
 Stonehenge Perl Training

 
 Upon further review, this wasn't it.  I'm not sure why the 'O' in either of the
 fields above looks different, but I made sure they are the same in the code.
 
 It seems like it isn't getting past the login screen for the page I'm trying to
 access.  There's only one form on it, with two fields, one called 'user' and
 the other 'pass', and a submit button.  If I just run this
 
 #!/usr/bin/perl
 
 use warnings;
 use strict;
 use WWW::Mechanize;
 use HTML::TokeParser;
 
 my $username = 'msnyder';
 print "Enter your password: ";
 chomp(my $password = <STDIN>);
 my $status   = 'open';
 my $url = 'https://rt.xxx.xxx.com/Search/Build.html';
 my $textRegex = 'Tickets';
 
 my $agent = WWW::Mechanize->new();
 $agent->get('https://rt.xxx.xxx.com/');
 $agent->form(1);
 $agent->field('user',$username);
 $agent->field('pass',$password);
 $agent->click_button(value => 'Login');
 sleep(10);
 
 it returns to the prompt with no errors or output.  However, as soon as I add a
 line to follow a link based on either a URL or a regex, I get an error saying the
 sought-after item wasn't found on the page.  It should be noted that the login
 page has the same URL as the page loaded after logging in.  I don't know if that
 matters, though.
 
 I looked at the test subroutines but need to sort them out to figure out how to
 use them to tell me what the issue might be.
 
 Mathew
 
Mathew,
My reading of the WWW::Mechanize documentation is that
$agent->click_button returns an HTTP::Response object. I don't think
it would print anything on error. In this situation I would check the
HTTP::Response and possibly print out $agent->content to verify that I
got the page I expected.
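
A minimal sketch of that kind of check, under some assumptions: check_response is a hypothetical helper name, and the two HTTP::Response objects are built by hand so the example runs without a network connection. In the script above you would instead pass in whatever $agent->get(...) and $agent->click_button(...) return.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: report the status of a response and return
# true on success so the caller can decide whether to continue.
sub check_response {
    my ($response, $label) = @_;
    if ($response->is_success) {
        print "$label: ok (", $response->status_line, ")\n";
        return 1;
    }
    warn "$label: failed (", $response->status_line, ")\n";
    return 0;
}

# Stand-in responses, constructed by hand for illustration only.
my $ok  = HTTP::Response->new(200, 'OK', ['Content-Type' => 'text/html'],
                              '<title>Login</title>');
my $bad = HTTP::Response->new(403, 'Forbidden');

check_response($ok,  'get');
check_response($bad, 'login');

# A successful status can still be the login form served again,
# so look at the body as well (with Mechanize: $agent->content).
print $ok->content, "\n";
```

If the body printed after the login step still shows the login form, the credentials or the form selection are the likely problem rather than the later link-following code.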

Hope that helps
-- MattJ







Re: How to manipulate environment variables in parent process?

2006-11-14 Thread Matt Johnson
siegfried wrote:
 I think the best I could hope for would be to write a perl script that
 generated a bat file and then I manually execute the bat file. I don't
 think there is any way to automate the execution of the bat file.
 I'm sure that there is; if you can't put it into its own bat file, you
 could have Perl itself execute it via the system() command. Since new
 processes (such as those run with system()) inherit the environment,
 
 Yeah: that is precisely the problem. The child inherits from the parent. Can
 I make the child manipulate the environment table in the parent? I don't
 think so. Please tell me I'm wrong.
 
 
 it's easy to set up %ENV however you'd like.

 Hope this helps!

 --Tom Phoenix
 Stonehenge Perl Training
 
 Tom,
 Please elaborate. I need to have an interactive command shell with the
 symbols set up. 
 
 Anytime I run a perl program that uses backquotes or system, those symbols
 that are defined by perl will only be good for duration of the perl program
 and as soon as perl exits, I'll have a command prompt with none of the new
 symbols defined -- correct?
 
 I believe this is true for all *nix and windows shells.
 
 Thanks,
 Siegfried
 
 
Siegfried, as far as I know a child process cannot modify its parent's
environment. Tom is smarter than I am, though, so he may know something I don't.
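
A small sketch that illustrates the point, assuming nothing beyond core Perl. DEMO_VAR is a hypothetical variable name used only here; the child is a second perl started through system(), which receives a copy of the parent's environment.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# DEMO_VAR is a made-up name for this illustration.
$ENV{DEMO_VAR} = 'parent';

# The child inherits a copy of %ENV and changes only its own copy.
# $^X is the path of the currently running perl; passing a list to
# system() avoids any shell quoting issues.
my @child = ($^X, '-e',
    '$ENV{DEMO_VAR} = "child"; print "child sees: $ENV{DEMO_VAR}\n"');
system(@child) == 0 or die "child failed: $?";

# Back in the parent, the value is unchanged.
print "parent still sees: $ENV{DEMO_VAR}\n";
```

The same relationship holds between the interactive shell and the Perl script it starts, which is why the script's changes to %ENV disappear when it exits.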

You can get the time and date to use in file or directory name in
Windows batch. http://www.robvanderwoude.com/index.html has some examples.

Of course, Windows batch is an abysmal scripting language, so if you want
to do anything non-trivial it is worth doing, or redoing, in Perl.
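
For the date-stamped names, a rough Perl sketch might look like this. The "backup" and "run" prefixes and the ".log" suffix are hypothetical; the script only prints the names and does not create anything.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Build a sortable timestamp (year-month-day_hourminutesecond) from local time.
my $stamp = strftime('%Y-%m-%d_%H%M%S', localtime);

# Hypothetical directory and file names, for illustration only.
my $dir  = "backup_$stamp";
my $file = "$dir/run_$stamp.log";

print "directory: $dir\n";
print "file:      $file\n";
```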

-- MattJ






Re: How to use Perl for API testing

2006-07-10 Thread Matt Johnson

You might want to consider
Test::WWW::Mechanize http://search.cpan.org/author/PETDANCE/Test-WWW-Mechanize-1.12/Mechanize.pm
or WWW::Mechanize http://search.cpan.org/~petdance/WWW-Mechanize-1.18/lib/WWW/Mechanize.pm


-- MattJ


On Jul 9, 2006, at 10:13 PM, Suja Emmanuel wrote:



Hi,

I want to use PERL for API testing, i.e., I want to call
different URLs through the browser. How much is possin











Re: best way of getting a web document

2006-06-25 Thread Matt Johnson

Mumia W. wrote:

Dan wrote:

LWP or HTTP::Client?
i've used both and run across..some problems. [...]
i need the most reliable to fetch the feed, and pass me the body of 
the page so i can pass it to an xml parser of sort.


unless there's something else which can already do that? [...]


Hi Dan.

I've played with LWP before, and it worked okay.

Another option is to use the lynx web browser to fetch the page source. 
As far as I know, lynx cannot parse XML, so you'd have to use a separate 
XML parser after fetching the page with lynx.


More options for fetching pages are curl (the module) and curl (the 
program).


Foremost among the XML parsers is XML::Parser; however, CPAN has many 
XML parsing modules.





Hi,
I use WWW::Mechanize
(http://search.cpan.org/~petdance/WWW-Mechanize-1.18/lib/WWW/Mechanize.pm)
to get pages. In some cases I also do some simple XML validation and
manipulation.


-- Matt Johnson

