what to call my module
I am planning to distribute my module, but am confused about what to call it. I have a series of modules in a directory that I will call Rtf2xml. (I checked on CPAN, and don't believe this namespace is taken.) I named each module with a simple name--for example, Pict.pm.

In the main script, I have:

    use Rtf2xml::Pict;
    ...
    Pict::process_pict();

In the actual module, I have:

    package Pict;

Everything works fine this way. But when I look at other modules, I notice that they use a different naming convention. My question is whether I should follow this syntax:

    package Rtf2xml::Pict;            # for the actual module
    Rtf2xml::Pict::process_pict();    # for calling a subroutine in the package

I believe that the second way provides a more distinct namespace?

I *have* read the documentation on how to distribute a module, but I couldn't follow most of the jargon.

Thanks

Paul

--
*Paul Tremblay*
*[EMAIL PROTECTED]*

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
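The second convention in the question can be sketched in a single file. This is only an illustration: the body of process_pict is a made-up stand-in, and in a real distribution the package block would live in its own file, Rtf2xml/Pict.pm, ending with "1;".

```perl
use strict;
use warnings;

# In a real distribution this block would be the file lib/Rtf2xml/Pict.pm.
# It is inlined here so the sketch runs as one script.
package Rtf2xml::Pict;

# Hypothetical stand-in for the real subroutine.
sub process_pict {
    my ($data) = @_;
    return "processed: $data";
}

package main;

# The fully qualified name mirrors the file path Rtf2xml/Pict.pm.
my $result = Rtf2xml::Pict::process_pict('picture data');
print "$result\n";
```

Keeping the package name identical to the path given to "use" means the subroutines cannot collide with some other module that also happens to declare a package called Pict.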
Re: how to set up Makefile.pl
Thanks Bob, and thanks RF. I used seek and tell, and now have the data as part of a module.

Paul

On Wed, Feb 05, 2003 at 05:13:24PM -0500, Bob Showalter wrote:
> Paul Tremblay wrote:
> > Thanks, but this won't work. I need to open the data file and read some data. Later in the script, I need to open the data file again and read more data. When I use DATA, perl apparently reads one line at a time until it finds what I want. It then starts at the line I left off when I need to read more data. It does not start at the beginning again, so I cannot find the data I need.
> >
> > The only solution I can think of using the __DATA__ method would be to have my script print out *all* data to a temporary file. I could then open and close this file when I wanted data. But this seems like kind of a hack--and it would take a bit more time, though only a second or two.
>
> Two alternate approaches:
>
> 1) read the data into a memory structure (array or hash)
> 2) use tell()/seek() on the DATA file handle to move around.
>
> n.b. seek(DATA, 0, SEEK_SET) does not put you at the first line after __DATA__, it puts you at the top of your script file. So use tell() to mark the start point before you start reading.
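Bob's second approach might look roughly like the sketch below. To keep it runnable as one file, an in-memory filehandle stands in for DATA, and the tab-separated table is invented for illustration; with the real DATA handle the important step is the same, namely calling tell() before the first read.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Look up $key in tab-separated "key<TAB>value" lines, always starting
# from byte offset $start rather than from wherever the last read stopped.
sub lookup {
    my ($fh, $start, $key) = @_;
    seek($fh, $start, SEEK_SET) or die "seek failed: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my ($k, $v) = split /\t/, $line, 2;
        return $v if defined $k && $k eq $key;
    }
    return;    # not found
}

# Stand-in for DATA: an in-memory file with two hypothetical entries.
my $table = "ldblquote\tlt_quote\nrdblquote\trt_quote\n";
open my $fh, '<', \$table or die "cannot open in-memory file: $!";

my $start = tell($fh);    # with DATA, record this before the first read
print lookup($fh, $start, 'rdblquote'), "\n";
print lookup($fh, $start, 'ldblquote'), "\n";    # found again after the rewind
```

In a script or module that carries its own __DATA__ section, \*DATA would be passed in place of $fh.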
Re: how to set up Makefile.pl
Thanks, but this won't work. I need to open the data file and read some data. Later in the script, I need to open the data file again and read more data. When I use DATA, perl apparently reads one line at a time until it finds what I want. It then starts at the line I left off when I need to read more data. It does not start at the beginning again, so I cannot find the data I need.

The only solution I can think of using the __DATA__ method would be to have my script print out *all* data to a temporary file. I could then open and close this file when I wanted data. But this seems like kind of a hack--and it would take a bit more time, though only a second or two.

Paul

On Wed, Feb 05, 2003 at 09:13:13AM -0500, RF wrote:
> > However, I have several files that aren't modules but are needed for my script. One of these is a data file. How do I set up Makefile.PL to make sure this data file gets put in a place where the script can read it?
>
> The answer verbatim from The Perl Cookbook (http://www.oreilly.com/catalog/cookbook)
>
> Problem
>
> You have data that you want to bundle with your program and treat as though it were in a file, but you don't want it to be in a different file.
>
> Solution
>
> Use the __DATA__ or __END__ tokens after your program code to mark the start of a data block, which can be read inside your program or module from the DATA filehandle.
>
> Use __DATA__ within a module:
>
>     while (<DATA>) {
>         # process the line
>     }
>     __DATA__
>     # your data goes here
>
> Similarly, use __END__ within the main program file:
>
>     while (<main::DATA>) {
>         # process the line
>     }
>     __END__
>     # your data goes here
>
> Discussion
>
> __DATA__ and __END__ indicate the logical end of a module or script before the physical end of file is reached. Text after __DATA__ or __END__ can be read through the per-package DATA filehandle. For example, take the hypothetical module Primes. Text after __DATA__ in Primes.pm can be read from the Primes::DATA filehandle.
>
> __END__ behaves as a synonym for __DATA__ in the main package. Text after __END__ tokens in modules is inaccessible.
>
> This lets you write self-contained programs that would ordinarily keep data in separate files. Often this is used for documentation. Sometimes it's configuration data or old test data that the program was originally developed with, left lying about in case it ever needs to be recreated.
>
> Another trick is to use DATA to find out the current program's or module's size or last modification date. On most systems, the $0 variable will contain the full pathname to your running script. On systems where $0 is not correct, you could try the DATA filehandle instead. This can be used to pull in the size, modification date, etc. Put a special token __DATA__ at the end of the file (and maybe a warning not to delete it), and the DATA filehandle will be to the script itself.
>
>     use POSIX qw(strftime);
>
>     $raw_time = (stat(DATA))[9];
>     $size     = -s DATA;
>     $kilosize = int($size / 1024) . 'k';
>
>     print "<P>Script size is $kilosize\n";
>     print strftime("<P>Last script update: %c (%Z)\n", localtime($raw_time));
>
>     __DATA__
>     DO NOT REMOVE THE PRECEDING LINE.
>
>     Everything else in this file will be ignored.
>
> See Also
>
> The "Scalar Value Constructors" section of perldata(1), and the "Other literal tokens" section of Chapter 2 of Programming Perl
how to set up Makefile.pl
I'm trying to write a Makefile.PL to distribute a module. By just playing around, I believe I've kind of got the hang of how to make sure my module gets put in the right place.

However, I have several files that aren't modules but are needed for my script. One of these is a data file. How do I set up Makefile.PL to make sure this data file gets put in a place where the script can read it?

Also, I would like the Makefile.PL to put an executable in an appropriate place (/usr/bin, or wherever) so the user can run my script with just a command.

I have looked on the web but haven't found any straightforward documentation.

Thanks

Paul
Re: how to distribute a module
On Sun, Feb 02, 2003 at 10:10:02PM -0800, Randal L. Schwartz wrote:
> rtf is a Really Bad Name for a module. First, it starts with a lowercase letter, which is reserved for system packages and modules. Not You.
>
> Second, if you ever plan on distributing it outside your local group, the name should be selected by contacting [EMAIL PROTECTED] If you *don't* plan on distributing it outside your local group, pick a group prefix uniquely identifying your group. I use Stonehenge::.

Thanks. I will check this out. I really don't want to create a module so much as I want to package my script for distribution. The script consists of one main script which uses my own modules. I doubt anyone else will use these modules themselves, so perhaps I shouldn't even try to package them as modules?

Right now I have this line in the main script:

    use lib "/perl5/rtf";

I could have the user put the rtf directory wherever s/he wants, and then just change the line in the main script. This just seemed like kind of a hack.

> Third... you *are* aware that there are a lot of RTF things already, right? So you might just be reinventing some portion of something that is already tested and deployed. See search.cpan.org for more details.

No. Every time I ask for help on this script, on different mailing lists, I get this response. But there is no open source project that converts RTF to XML. There are scripts that convert RTF to other formats, but not XML. I have checked several times. I am working with someone else, and he has also checked. (Of course, there is always the possibility that I am still wrong!)

Thanks

Paul
how to distribute a module
I have a series of modules that I would like to distribute for public use. These modules convert RTF to XML, and they are all contained in a folder called rtf.

Can anyone tell me how to package these modules for distribution? The *Perl Cookbook* doesn't seem to be much help in this area. If I type

    h2xs -XA rtf

then I am given just the tidbits to get started with making my module ready for distribution. How do I get all the modules into the right perl library? And how do I get the executable to the right place?

Also, I have a data table that the script needs, and I have a DTD that also needs to be put in the right place. I don't know exactly what the right place is right now--as long as the script knows where they are, it will work.

I could have the user install everything manually, but that seems like kind of a hack. I would like a user to be able to type

    make
    make install
    make test

and have everything working right.

Thanks

Paul
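One way for the script to "know where they are" is to ship the data table in the same directory as a module and build the path from the module's own location. The sketch below assumes that layout; the file name char_data.data comes from elsewhere in this archive, and whether the install step copies non-.pm files from lib/ automatically should be checked (ExtUtils::MakeMaker's PM parameter can list them explicitly if it does not).

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec;

# Build the path to a data file that sits in the same directory as this
# source file. Inside an installed module, __FILE__ is the installed .pm
# file, so the data file is looked for wherever the module ended up.
sub data_file_path {
    my ($name) = @_;
    my $here = dirname(File::Spec->rel2abs(__FILE__));
    return File::Spec->catfile($here, $name);
}

print data_file_path('char_data.data'), "\n";
```

Because the path is computed at run time, nothing in the code needs editing when the user installs to a non-default location.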
Re: how to distribute a module
On Sun, Feb 02, 2003 at 08:23:04PM +0100, Paul Johnson wrote:
> > I have a series of modules that I would like to distribute for public use. These modules convert RTF to XML, and they are all contained in a folder called rtf.
> >
> > I could have the user install everything manually, but that seems like kind of a hack. I would like a user to be able to type make, make install, make test and have everything working right.
>
> You need to create a Makefile.PL. For documentation on how to do this see:
>
>     perldoc ExtUtils::MakeMaker

Thanks. I have run

    h2xs -XA -n rtf

This creates a skeleton Makefile.PL. I checked out an example Makefile.PL from a module called HTML-Format. This person put all the modules in a directory called HTML, then put this directory in a directory called lib. So the Makefile.PL looks like:

    WriteMakefile(
        NAME         => 'HTML-Format',
        VERSION_FROM => 'lib/HTML/Formatter.pm',
        PREREQ_PM    => {
            'HTML::Element' => 1.44,
            'Font::AFM'     => 1.17,
        },
        dist => { COMPRESS => 'gzip -9f', SUFFIX => 'gz', },
    );

In my case, I believe I just need to do the same thing, except change the line

    VERSION_FROM => 'lib/HTML/Formatter.pm',

to

    VERSION_FROM => 'lib/rtf/some_file_with_a_version',

? In addition, I would delete the line starting with PREREQ_PM, since my modules require no other modules.

However, how would I tell the Makefile.PL to install an executable, and how do I put the data file in the right place? This data file contains the data for character encoding. Obviously, the main script needs to know where to find this data file.

Last, there is a DTD. Once my module converts the RTF to XML, it writes a path to the DTD. Again, the script needs to put this DTD some place, and needs to know where it is.

Thanks

Paul
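For the executable, ExtUtils::MakeMaker has an EXE_FILES parameter that lists scripts to install alongside the modules. A minimal sketch follows; the distribution name and both file paths are assumptions about how the files might be laid out, not anything taken from the thread.

```perl
use ExtUtils::MakeMaker;

WriteMakefile(
    NAME         => 'Rtf2xml',
    VERSION_FROM => 'lib/Rtf2xml.pm',    # hypothetical file that sets $VERSION
    EXE_FILES    => ['bin/rtf2xml'],     # hypothetical path to the main script
    dist         => { COMPRESS => 'gzip -9f', SUFFIX => 'gz' },
);
```

With this in place, "perl Makefile.PL", "make" and "make install" should copy the script into the script directory perl was configured with, which is normally on the user's PATH.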
finding invalid options in getopt
I would like to have my script print a short help message and then quit if a user uses an invalid option. I am using the Getopt::Long module.

Also, I've noticed that some scripts have a trick whereby if a user types

    scriptname --help

the script prints out the pod. There is a link on perldoc.com for Pod::Usage, but this link is broken. (I read your tutorial, Drieux, but it was a bit too advanced for me!)

Thanks

Paul
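Getopt::Long reports an unknown option by printing a warning and returning false, so the return value is enough to decide whether to print help and quit. The sketch below shows that check; the option names and the usage text are invented for illustration.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parse a list of arguments. Returns a success flag and a reference to a
# hash of the options seen. The option names are only examples.
sub parse_options {
    my @args = @_;
    my %opt;
    my $ok = GetOptionsFromArray(\@args, \%opt, 'help', 'output=s');
    return ($ok, \%opt);
}

my $usage = "usage: rtf2xml [--help] [--output FILE] input.rtf\n";

my ($ok, $opt) = parse_options(@ARGV);
if (!$ok) {
    # An invalid option was given; Getopt::Long has already named it.
    print STDERR $usage;
    exit 2;
}
if ($opt->{help}) {
    # pod2usage() from Pod::Usage could print the script's own pod here.
    print $usage;
    exit 0;
}
```

The Pod::Usage module is the usual way to get the "print the pod" behaviour: its pod2usage function reads the pod embedded in the running script, so the help text and the documentation stay in one place.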
Re: where to put data files
On Thu, Aug 29, 2002 at 06:47:26AM -0700, drieux wrote:
> ah! I see! you really are in the have code, the rest is temporary data files

I'm not sure I catch your drift here. Did a line get dropped from the email?

> sorry for not quite catching your drift the first time.
>
> just a dumb-bunny question - but if I type
>
>     rtf2xml --help
>
> will your code dump me some 'advice'? and/or
>
>     perldoc rtf2xml
>
> show me if
>
>     rtf2xml input_file output_file

No to both questions! Which means I need to include both. I have just added some switches for this script. I guess I can just add a switch for --help.

    # I've already defined my switch --help and attached it to $help
    if ($help) {
        print "in order to use this script...";
    }

As far as perldoc goes, if I include any pod in the script, then typing

    perldoc rtf2xml

should output the documentation? (I'll try this out right now. I'll probably have the answer before you respond!)

> would work - or that in this release it is only sending output to STDOUT?
>
> that being the other rack of basics - all you really need to do to distribute it would be to hang the code on a webPage and let folks 'save to file' from their browser
>
> The next level of complexity is to either go with the MakeMaker approach of building an installer based upon the usual
>
>     perl *.PL
>     make
>     install
>
> model - or you will want to look at the other option of
>
> a) hand crafting an installation script
> b) adopting someone else's package installer - rpm/pkgadd and complying with their standard.
>
> The basic stuff that you will want to put into the general release are
>
> a) README - what we are about - clues to any other documentation
> b) ChangeLog - when we did what to this for why
> c) The Installation Stuff
>    - how to check for all the dependencies
>    - how to check for the installer's desires and wishes as well as explain to them if they try to install into places they shouldn't
>    - how to actually install it...
> d) The Code Stuff - the one item - at this point
> e) Manifest - what should be in the tarball
>
> This way the person who just wants 'the application' Can just install it - without having to understand what all the rest of that stuff is about - while others will be able to keep track of how this evolves as it evolves...

Thanks. This is all useful info. Right now in order to run the script, a user needs:

1. the code
2. the character_set file
3. a folder called rtf2xml_dir located in /usr/share
4. a temp folder located in the above folder.

It is pretty easy to write a script to make a folder and copy the character_set file to this folder. Will all users have permission to make such a folder? And where should I put the executable script? In /bin, or /usr/bin, or what?

I think I'll go with simply writing my own simple script for making the installation. I'll also give explicit instructions in README on how to install the few components by hand (if the user wants to), as well as how to change the place the script writes its temporary files, and where it reads its data. (I put a variable called $directory at the beginning of the script so a user can change it easily if he wants.)

I guess I need to work on the pod documentation. I believe I should be able to simply follow the Perl Cookbook.

Thanks

Paul
convert decimal to hexadecimal
Is there a way to convert a decimal number to a hexadecimal number in perl? I have expressions like this:

    \u8195\'20
    \u9824\'3f

The number after the u is a decimal number that needs to be converted to hexadecimal. The number after the second backslash simply needs to be eliminated. Thus, the two lines above should look like:

    &#x2003;
    &#x2660;

Thanks

Paul
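Perl's sprintf does this conversion with its %x format. The sketch below applies it inside a substitution; the subroutine name is made up, and the pattern assumes the escape always has exactly the \uNNNN\'hh shape shown in the question.

```perl
use strict;
use warnings;

# Turn RTF unicode escapes such as \u8195\'20 into XML character
# references. sprintf's %x format does the decimal-to-hex conversion,
# and the trailing \'hh fallback byte is matched and thrown away.
sub to_hex_refs {
    my ($text) = @_;
    $text =~ s/\\u(\d+)\\'[0-9a-fA-F]{2}/sprintf('&#x%x;', $1)/ge;
    return $text;
}

print to_hex_refs(q{\u8195\'20 \u9824\'3f}), "\n";
# prints: &#x2003; &#x2660;
```

The /e flag makes the replacement side run as code, which is what allows sprintf to be called for each match.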
Re: convert decimal to hexadecimal
On Wed, Aug 28, 2002 at 09:51:11PM -0700, Randal L. Schwartz wrote:
> Why? &#8195; &#9824; is perfectly valid, unless you're talking about something besides XML or HTML.

Yes, it certainly is! I realized this as soon as I sent off the email. Most of RTF (I am writing a script to convert RTF to XML) uses hexadecimal, so I was thinking along those lines. But yes, I only have to do a simple substitution.

Nonetheless, I am glad for the tips I have gotten. Who knows if I'll have to convert to hexadecimal sometime in the future?

Thanks everyone

Paul
Re: where to put data files
On Wed, Aug 21, 2002 at 11:43:43AM -0700, drieux wrote:
> On Wednesday, August 21, 2002, at 10:06 , Paul Tremblay wrote:
> > I am writing a script that converts RTF to XML, and this script needs to read an external data file to form a hash. I plan to make this script available to anyone who needs it, and am wondering what to do with this external data file.
>
> when you say 'external data file' - do you mean things that are required for the script itself to work? if so, you may want to look at the traditional solution of getting those to install into the site_perl as a perl module, hence you will most likely want to look at h2xs as a first round for how to do that

Okay, I'm getting closer to finishing the d**n beast of a script, so I need to think about how to distribute it.

This script will be a command line utility. Right now it has no switches:

    rtf2xml file

I want to make it easy to use for people who don't know perl. Should I still use a module? My understanding of a module is that it provides code for other perl scripts, rather than being a standalone utility by itself. I am thinking that if I distribute this script as a module, I will have to write a wrapper for it? That is, say I have a script like this:

    #!perl
    print "hello world";

I have to rewrite it so it looks like:

    sub main {
        print "hello world";
    }

Now I call this module hello_world.pm. This installs in the proper places. I write another script that looks like this:

    #!perl
    use hello_world;
    main();

Thanks

Paul

> or do you mean, as seems to be suggested below, any 'temporary files' that are created in transit?
>
> > Should I simply make the data file available with the script and tell the user where to put it? Right now, I have it in /usr/share/rtf2xml/char_data.data.
>
> a reasonable alternative plan - but going with the traditional perl module approach will mean that they can then just do the usual
>
>     perl Makefile.PL
>     make
>     make test
>     make install
>
> and let it go at that... This way if you need to upgrade along the way - and it's all nice and neatly in a perl module - you can then distribute a new release of it - and not need to upgrade the core code itself.
>
> [..]
> > Also, the script outputs to standard output. Future versions might require that the script make one run through the file, write to a temp file, then make a run through the temp file. Where should I put this temp file (if I need to make one, that is)?
> [..]
>
> you might want to look at IO::File and use the new_tmpfile method to create a temp file 'for the duration' - before sending it to stdout
>
> also IF in this future upgrade you start doing the
>
>     myCode Infile Outfile
>
> then you could use the file named as Outfile as the temp file, unless you allow things like
>
>     myCode Infile - | doFoo
>
> where you expressly allow '-' as the 'cheat' that you want the output to finally go to stdout
>
> ciao
> drieux
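The new_tmpfile suggestion could be used along these lines for a two-pass run. This is a sketch only: the subroutine name and the sample lines are invented, and a real second pass would transform the lines rather than just return them.

```perl
use strict;
use warnings;
use IO::File;

# First pass: write intermediate results to a temporary file.
# Second pass: rewind and read them back. Returns the lines read.
sub two_pass {
    my @lines = @_;
    my $tmp = IO::File->new_tmpfile
        or die "cannot create temp file: $!";
    print {$tmp} "$_\n" for @lines;
    $tmp->seek(0, 0) or die "cannot rewind: $!";    # offset 0 from the start
    my @back = <$tmp>;
    chomp @back;
    return @back;    # the temp file goes away once $tmp is closed
}

print join(',', two_pass('first', 'second')), "\n";
```

Since the handle is opened for both reading and writing and the system picks the location, the script never has to decide where temporary files live.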
where to put data files
I am writing a script that converts RTF to XML, and this script needs to read an external data file to form a hash. I plan to make this script available to anyone who needs it, and am wondering what to do with this external data file.

Should I simply make the data file available with the script and tell the user where to put it? Right now, I have it in /usr/share/rtf2xml/char_data.data. How about on a Windows system, which I don't know much about? Should I write a script that creates a directory and puts the data file in that directory? What is the standard procedure here?

Also, the script outputs to standard output. Future versions might require that the script make one run through the file, write to a temp file, then make a run through the temp file. Where should I put this temp file (if I need to make one, that is)?

Thanks

Paul
to slurp or not to slurp
I am writing a script to convert RTF to XML, and my output looks like this (I explain this ugliness below):

    <id1listlevel1> text <id1listlevel2> text </id1listlevel2> </id1listlevel1>
    text
    <id1listlevel1> text <id1listlevel2> text </id1listlevel2> </id1listlevel1>

I know that is ugly to read, but I'm just pointing out that the tags repeat themselves. It should look like this:

    <id1listlevel1> text <id1listlevel2> text text text text </id1listlevel2> </id1listlevel1>

In other words, the list starts and then stops in the middle. In order to get rid of these excessive tags, I was thinking of reading the whole file into memory at once, and then doing this substitution:

    my @array = split /(<id1listlevel1>(.*)<\/id1listlevel1>)/, $_;
    for my $name (@array) {
        if ($name =~ /<id1listlevel1>/) {
            $name =~ s/<id1listlevel1>//g;
            $name = "<id1listlevel1>$name</id1listlevel1>";
        }
        print $name;
    }
    my @array = split /(<id1listlevel2>(.*)<\/id1listlevel2>)/, $_;

I would actually use a loop for each level.

However, isn't it a bad idea to read the whole file in at once? What happens if the user had a really huge file?

My other method was to read my result file one line at a time. Once I found <id1listlevel1> or anything that matches a similar pattern, I would push it into an array. If I found it again, I would simply delete it. Then, read the file in backwards one line at a time, look for the pattern </id1listlevel1>, and allow only the first one of these.

So, should I slurp or do it one line at a time?

In case you are wondering why my output has extra tags in the middle, you can blame good ol' Bill Gates. RTF really does suck. In earlier versions of RTF, the code told you when the user was skipping an item in a list (but was still continuing the list). For a reason only known to the morons at micro$oft, they changed this code in Word 97 and 2000.

Thanks

Paul
Re: speed and perl
On Fri, Aug 02, 2002 at 08:43:45AM -0500, Bryan DeLuca wrote:
> If you are interested in language benchmarks you might want to check out the Great Computer Language Shootout: http://www.bagley.org/~doug/shootout/
>
> It has some surprising results.

Yes, the results are surprising. According to these benchmarks, perl is not that much faster than python when doing regexes. Can this be so? When I did my own anecdotal, unscientific tests, perl was so much faster that I decided to write my script in perl. I would have expected perl to blow python or java or even C right out of the water when it comes to munging text.

Paul
Re: speed and perl
On Sat, Aug 03, 2002 at 10:58:24AM +0200, Paul Johnson wrote:
> On Fri, Aug 02, 2002 at 02:08:25AM -0400, Paul Tremblay wrote:
> > (I know I did a little test with sed, a python script, and a perl script, just changing the word "the" to "teh" in a huge file. Sed and python took about the same time, while perl was six times faster.)
>
> This is from the perl source code (sv.c if you are interested):
>
>     /* Here is some breathtakingly efficient cheating */
>     if (rslen) {
>         while (cnt > 0) {                      /* this     |  eat */
>             cnt--;
>             if ((*bp++ = *ptr++) == rslast)    /* really   |  dust */
>                 goto thats_all_folks;          /* screams  |  sed :-) */
>         }
>     }
>     else {
>         Copy(ptr, bp, cnt, char);              /* this     |  eat */
>         bp += cnt;                             /* screams  |  dust */
>         ptr += cnt;                            /* louder   |  sed :-) */
>         cnt = 0;
>     }

Is this for real, and not a joke? If so, it is pretty funny!

Paul
changing multiple flags and changing them back
I have a series of flags that I need to change all at once, and then change back, and was wondering if I could use an array or hash to do this.

I am parsing an RTF file, and when I find a footnote, I need to preserve the flags of the non-footnote text. So if I was in a table, I need to save the $in_table flag. Then when I am done with the footnote text, I need to reset the $in_table flag to its previous state.

So far I have this:

    sub start_footnote {
        $previous_in_table = $in_table;
        ...
    }

    sub end_footnote {
        $in_table = $previous_in_table;
        ...
    }

This works fine, except I might have 15 or 20 flags I need to set or reset. I would like to use an array like this:

    @flags = ($in_table, $after_cell, $in_paragraph);

When I finish with my footnote, I will have an array of the previous values. Now how do I assign these values to the variables?

Thanks

Paul
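The literal answer to the array question is a list assignment in the other direction: ($in_table, $after_cell, $in_paragraph) = @flags; restores all three at once. With 15 or 20 flags, though, a hash may be easier to manage. The sketch below assumes the flags can be moved into one hash and that a footnote should start with every flag cleared; both are assumptions, not something the post states.

```perl
use strict;
use warnings;

# Keep all state flags in one hash instead of 15 or 20 separate scalars.
my %flag = (in_table => 0, after_cell => 0, in_paragraph => 0);
my @saved;    # stack of snapshots, so nested saves would also work

sub start_footnote {
    push @saved, { %flag };          # copy the current values
    $flag{$_} = 0 for keys %flag;    # assumed: a footnote starts with clean flags
}

sub end_footnote {
    %flag = %{ pop @saved };         # restore the snapshot
}

$flag{in_table} = 1;
start_footnote();
$flag{in_paragraph} = 1;    # a change made while inside the footnote
end_footnote();
print "in_table=$flag{in_table} in_paragraph=$flag{in_paragraph}\n";
# prints: in_table=1 in_paragraph=0
```

Adding a new flag then only means adding a key to %flag; the save and restore code does not change.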
speed and perl
This question may be too vague for a good answer, but my curiosity makes me ask it anyway.

I thought I read somewhere that perl is actually faster than C for certain tasks. The vagueness of the question probably lies in exactly what task, who writes the program, the size and type of data, and a dozen other factors.

Specifically, I am writing a perl script to convert RTF to XML. I downloaded a utility called unrtf, written in C, which converts RTF to HTML, a somewhat easier task. I got depressed when I ran unrtf on small files and saw that it didn't take *any* time. But when I ran it on a big file of 1.8 megabytes, it actually took 6 minutes and 30 seconds. My script only took a minute! (Hooray, after all this hard work, that's a little encouraging to see!)

When I ran the same file through a java utility called majix, it took over a minute. It is hard to say exactly, because majix supplies its own timer, and this timer only starts when java starts processing the documents, some 20 or 30 seconds after you launch it.

I had actually considered learning C++ to make my little script really fast so that people would consider using it. I really doubt I would have gone through all that trouble, but now I am wondering if C++ would have given that much of a time advantage--if any at all. Certainly, a perl script would be easier to maintain and debug.

Thoughts on how C, java, and perl compare on speed?

(I know I did a little test with sed, a python script, and a perl script, just changing the word "the" to "teh" in a huge file. Sed and python took about the same time, while perl was six times faster.)

Paul
Re: Perl IDE's v. Perl Editors was Re: Editor
I'm surprised that more posters didn't advocate vim as *the* editor. I think it was Linux World that did a survey and found that 80 percent of the users picked vim as their favorite editor.

I originally started using vim, then switched to nedit, and now have switched back to vim. Nedit is very nice, but I had problems with keyboard commands. About half the time when I would press keys like ctrl-z to undo a command, I would get a ^ack or something like that on the screen.

I can see why vim is the most popular linux editor. It is extremely powerful. It has all sorts of options for automatic indenting, as well as a feature called folding, which I still have to learn how to use; it basically hides lines on your screen. So if you were working between your main program and a subroutine 1000 lines below, you could hide those thousand lines. Of course, vim has full highlighting capabilities. Vim offers so many options that I doubt I could learn them all.

The drawback to vim is that it is a bit hard to learn at first. It is keyboard driven, which goes against how most people learn to operate a computer, with a mouse. Of course, there is a full graphical interface version of vim, which I use, called gvim. Once you get over the initial difficulty of learning vim (which should only take a few days?), it can be easier to use, depending on your preferences.

Vim is also one hundred percent free--as is nedit. I wouldn't pay for an editor, with all the excellent free choices out there.

But nedit is also a good choice as an editor. It is very intuitive and also powerful.

Paul
Re: So what is munging?
On Mon, Jul 29, 2002 at 12:07:05PM -0400, Jeff 'japhy' Pinyan wrote:
> I've got an article in the July Linux Magazine (Hitting the Motherlode) about regexes (mainly in Perl). In it, I use the term "munge". FOLDOC[1] says "a derogatory term meaning to imperfectly transform data", but I don't think it's such a bad term.
>
> For me, "munge" just means using whatever means necessary to massage data from one format to another -- that might mean extracting stuff, changing its layout, or transforming it to an entirely different format. And I use "massage" here, because it's accurate: sometimes, you get a gentle backrub, and sometimes, you get painful hand-chopping on your spine.

Huh, pretty funny.

Would you make a distinction between parsing and munging? For example, if you wrote a script to convert Word RTF to XML, is that parsing or is that still munging?

Paul
Re: fastest way to substitute
On Sat, Jul 27, 2002 at 02:40:52PM -0700, John W. Krahn wrote:
>     s[([&<>]|(?<=\\)$rx)][$rep{$1}]go;

John, this doesn't work. My representative line is:

    \ldblquote \rdblquote & < > \par

My code is:

    my %rep = qw(
        &           &amp;
        >           &gt;
        <           &lt;
        ldblquote   <lt_quote/>
        rdblquote   <rt_quote/>
    );
    my $rx = join '|', map quotemeta, keys %rep;
    s[([&<>]|(?<=\\)$rx)][$rep{$1}]go;

And the result is:

    \<lt_quote/> \<rt_quote/> &amp; &lt; &gt; \par

I'm getting the backslashes in front of my XML tags. I need

    <lt_quote/> <rt_quote/> &amp; &lt; &gt; \par

I've tried a number of different variations with no success. Perhaps I will need two lines of substitution in my script, one for backslashed characters, and one for characters that are not backslashed?

Thanks

Paul
Re: fastest way to substitute
On Sat, Jul 27, 2002 at 11:23:59AM -0400, Jeff 'japhy' Pinyan wrote:
> That's because my method requires creating a hash each time. If you were to take that out of the function, it would run faster.

Right. This makes perfect sense. I followed your advice and put the hash outside of the subroutine. I also took out the ampersand substitution to make things equal.

I got some interesting results. If I used a long line full of tokens, then each method was as fast. But if I used a more representative line with just a few tokens, then your method was around twice as fast. That makes sense. The read_each_line method has to read the non-tokens 35 times (once for each substitution). Your method gets to skip over them. If I used a line with no tokens, your method ran 10 times faster.

That's why your suggestion above:

    '\\this ' => '<this/>',

might not be a good idea. If I wrote my hash like this, then as perl searches the line, it has to search each item in the hash. Your original suggestion looked like this:

    $l =~ s[\\($rx) ][<$rep{$1}/>]go;

With this method, perl stops searching if it doesn't find a \. That means it can skip over lines with no \, which in turn means that it will run around 10 times faster than using my original method.

The only problem is how I should replace &, <, and >. I think I'll do single-line subs for this text. Even with huge files it shouldn't take more than 1/2 a second or so, and that allows me to use your original method to speed things up.

Or this just occurred to me:

    s[(&)|(<)|(>)|\\($rx)][<$rep{$1}/>]go;

Yeah, that should work!

I also realize that I shouldn't initialize my hashes in my subroutines. Making them global should also speed up my script a bit.

Thanks!

Paul
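A single-pass version that handles both the control words and the bare characters might look like the sketch below. It departs from the one-hash idea in the post: two hashes and two capture groups are used so the replacement can tell which alternative matched, and the backslash is matched (and so removed) rather than left in place. The hash contents are a small illustrative subset.

```perl
use strict;
use warnings;

my %control = (ldblquote => '<lt_quote/>', rdblquote => '<rt_quote/>');
my %entity  = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');

my $names = join '|', map { quotemeta } keys %control;

# Group 1 captures a control word (the backslash is consumed, not kept);
# group 2 captures a bare XML-special character.
my $pattern = qr/\\($names)\b ?|([&<>])/;

sub convert_line {
    my ($line) = @_;
    $line =~ s/$pattern/defined($1) ? $control{$1} : $entity{$2}/ge;
    return $line;
}

print convert_line('\ldblquote A & B\rdblquote\par'), "\n";
# prints: <lt_quote/>A &amp; B<rt_quote/>\par
```

Unknown control words such as \par pass through untouched, so they can be dealt with by a later step.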
Re: fastest way to substitute
On Sat, Jul 27, 2002 at 02:40:52PM -0700, John W. Krahn wrote:
> I thought '<' and '/>' were already in the hash values? If so, wouldn't this work?
>
>     s[([<>&]|(?<=\\)$rx)][$rep{$1}]go;

I don't understand this syntax:

    s[([<>&]|(?<=\\)$rx)][$rep{$1}]go;
             ^^^

Is that another way of telling the regex that you don't want to save the value?

If my hash looks like this:

    my %rep = qw(
        ldblquote  <rt_quote/>
        rdblquote  <lt_quote/>
        &          &amp;
        >          &gt;
        <          &lt;
        .
    );

then your solution should work. I planned to change my hash, but I guess I got ahead of myself in my email!

Paul
-- *Paul Tremblay* *[EMAIL PROTECTED]*
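For reference, `(?<=...)` is a lookbehind assertion: it requires the text before the current position to match, but that text is not part of the match. The construct that groups without saving is `(?:...)`. A small sketch of the difference, with made-up function names:

```perl
use strict;
use warnings;

# Lookbehind: the backslash must be there, but it is not part of the
# match, so it survives the substitution.
sub with_lookbehind {
    my ($text) = @_;
    $text =~ s{(?<=\\)tab}{<tab/>}g;
    return $text;
}

# Plain match: the backslash is part of the match, so it is replaced too.
sub with_consumed_backslash {
    my ($text) = @_;
    $text =~ s{\\tab}{<tab/>}g;
    return $text;
}

print with_lookbehind('a\tab b'), "\n";
print with_consumed_backslash('a\tab b'), "\n";
```

The first call keeps the backslash in front of the tag and the second does not, which matches the stray backslashes reported elsewhere in this thread.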
Re: fastest way to substitute
On Fri, Jul 26, 2002 at 02:02:54PM -0400, Jeff 'japhy' Pinyan wrote:
> It's best to come up with a hash of strings and replacements:
>
>     my %rep = qw(
>         ldblquote  rt_quote
>         rdblquote  lt_quote
>         emdash     em_dash
>         rquote     r_quote
>         tab        tab
>         lquote     l_quote
>     );
>
> Then create a regex:
>
>     my $rx = join "|", map quotemeta, keys %rep;
>
> Then use it in a larger regex:
>
>     $source =~ s[\\($rx) ][<$rep{$1}/>]g;
>
> Ta da! ONLY one pass through the string.

This looks really nice! I'll have to test it with a timer. I'd imagine it would be much faster because you only make one pass through. On the other hand, doesn't perl have to recompile the $rx each time because it is a variable? After all, $rx might have changed, though in my case it definitely wouldn't have.

> You'll need to beef up the hash and the regex as needed, if not everything is '\\IN ' and not every replacement is '<OUT/>'.

As a matter of fact, the expressions take only two forms:

    \emdash Regular text
    \'9oeRegular text

Some of the expressions (the ones for foreign characters) don't have a space after the control word. So I think:

    $source =~ s[\\($rx)(?:\s)*][<$rep{$1}/>]g;

should work?

On another note, my script is 1100 lines long, and seems to work. It seems like there is a need for converting RTF to XML, since the perl converters available only convert to HTML. I would like to release the script at some point, but when I get tips off this list, I realize how much better an experienced perl programmer could do things. It would be much more effective to work on this as part of a team, but I've never done something like this before. I guess I'll post feelers on other mailing lists. (This really should be another thread!)

Thanks!

Paul
-- *Paul Tremblay* *[EMAIL PROTECTED]*
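A rough sketch of the "space is optional" idea, under the assumption that at most one space after a control word is a delimiter and any further whitespace belongs to the text. The table is an illustrative subset and `replace_control_words` is a hypothetical name.

```perl
use strict;
use warnings;

my %rep = (emdash => 'em_dash', tab => 'tab', lquote => 'l_quote');
my $names = join '|', map quotemeta,
    sort { length $b <=> length $a } keys %rep;

# \b stops "tab" from matching inside a longer word such as "\tabs";
# " ?" swallows at most one delimiter space.
my $rx = qr/\\($names)\b ?/;

sub replace_control_words {
    my ($source) = @_;
    $source =~ s{$rx}{<$rep{$1}/>}g;
    return $source;
}

print replace_control_words('\emdash Regular\tab  text'), "\n";
```

Compared with `(?:\s)*`, this keeps any extra whitespace in the document instead of deleting all of it, which may or may not be what the converter wants.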
Re: time comparison
On Fri, Jul 26, 2002 at 06:51:08AM -0700, lz wrote:
> I went to search.cpan.org trying to look for utility that will return current GMT time, and couldn't find any.

According to *Perl Cookbook*:

    use Time::gmtime;
    $seconds = $tm->sec;

The second line is just an example of reading one field, the seconds (0-59) of the current GMT time, from the object that gmtime returns (assumed here to be stored in $tm). If you want the seconds that have passed since 1970, a useful value when you are trying to figure out if a file is older than a certain date, the built-in time function returns that. I believe you should be able to use any time functions with the gmtime module that you can with the localtime module.

Hope that helps

Paul
-- *Paul Tremblay* *[EMAIL PROTECTED]*
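A minimal sketch of the file-age comparison using only built-ins (`time`, `gmtime`, `stat`), with no extra module. The helper name `is_older_than` and the 7-day threshold are assumptions for illustration.

```perl
use strict;
use warnings;

# True if $mtime (epoch seconds) is more than $days days before $now.
sub is_older_than {
    my ($mtime, $now, $days) = @_;
    return ($now - $mtime) > $days * 24 * 60 * 60;
}

my $now = time;             # epoch seconds, independent of time zone
my @gmt = gmtime($now);     # (sec, min, hour, mday, mon, year, ...)
printf "GMT now: %04d-%02d-%02d %02d:%02d:%02d\n",
    $gmt[5] + 1900, $gmt[4] + 1, $gmt[3], $gmt[2], $gmt[1], $gmt[0];

my $mtime = (stat $0)[9];   # modification time of this script, if it is a file
print "script is older than 7 days\n"
    if defined $mtime && is_older_than($mtime, $now, 7);
```

Epoch seconds are the same everywhere, so the comparison itself does not need GMT; `gmtime` only matters when the time is displayed.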
Re: fastest way to substitute
On Fri, Jul 26, 2002 at 02:02:54PM -0400, Jeff 'japhy' Pinyan wrote:

Jeff: I ran a benchmark on your method, and it actually proved slower. I ran a test line 10,000 times. Directly substituting each line took 38 wall seconds. Using your method took 60. I included my test below, in case I made a mistake.

Thanks

Paul

    #!/usr/bin/perl #-w
    use strict;
    use Benchmark;

    my $loopcount = 10_000;
    ##my $loopcount = 1;

    my $line = "\\tab Paul Tom said \\ldblquote we are brothers \\rdblquote \\emdash which was the truth. \\e4 \\e5 \\e6 text text \\e7 text \\e8 text \\e9 text text text \\e10 text \\e11 text \\e12 text \\e13 text \\e14 text text \\e15 text \\e16 text \\e17 text \\e18 text \\e19 text \\e20 text e\\21 text \\e21 text \\e22 text \\e23 text \\e24 text \\e25 text \\e26 text \\e27 text \\e28 text \\e29 text \\e30 ";

    sub each_line {
        $_ = $line;
        s/&/&amp;/g; s/</&lt;/g; s/>/&gt;/g;
        s/\\ldblquote /<rt_quote\/>/g; s/\\rdblquote /<lt_quote\/>/g;
        s/\\emdash /<em_dash\/>/g;     s/\\rquote /<r_quote\/>/g;
        s/\\tab /<tab\/>/g;            s/\\lquote /<l_quote\/>/g;
        s/\\e4 /<e4\/>/g;   s/\\e5 /<e5\/>/g;   s/\\e6 /<e6\/>/g;
        s/\\e7 /<e7\/>/g;   s/\\e8 /<e8\/>/g;   s/\\e9 /<e9\/>/g;
        s/\\e10 /<e10\/>/g; s/\\e11 /<e11\/>/g; s/\\e12 /<e12\/>/g;
        s/\\e13 /<e13\/>/g; s/\\e14 /<e14\/>/g; s/\\e15 /<e15\/>/g;
        s/\\e16 /<e16\/>/g; s/\\e17 /<e17\/>/g; s/\\e18 /<e18\/>/g;
        s/\\e19 /<e19\/>/g; s/\\e20 /<e20\/>/g; s/\\e21 /<e21\/>/g;
        s/\\e22 /<e22\/>/g; s/\\e23 /<e23\/>/g; s/\\e24 /<e24\/>/g;
        s/\\e25 /<e25\/>/g; s/\\e26 /<e26\/>/g; s/\\e27 /<e27\/>/g;
        s/\\e28 /<e28\/>/g; s/\\e29 /<e29\/>/g; s/\\e30 /<e30\/>/g;
        ##print $_;
    }

    sub hash_method {
        my $line = $line;
        my %rep = qw(
            ldblquote rt_quote  rdblquote lt_quote  emdash em_dash
            rquote r_quote      tab tab             lquote l_quote
            & &amp;  < &lt;  > &gt;
            e4 e4   e5 e5   e6 e6   e7 e7   e8 e8   e9 e9
            e10 e10 e11 e11 e12 e12 e13 e13 e14 e14 e15 e15
            e16 e16 e17 e17 e18 e18 e19 e19 e20 e20 e21 e21
            e22 e22 e23 e23 e24 e24 e25 e25 e26 e26 e27 e27
            e28 e28 e29 e29 e30 e30
        );
        my $rx = join "|", map quotemeta, keys %rep;
        $line =~ s[\\($rx) ][<$rep{$1}/>]go;
        ##print "$line\n";
    }

    #--
    # the main loop section
    timethese $loopcount, {
        each_line   => \&each_line,
        hash_method => \&hash_method,
    };
    # end of the world as I knew it [EMAIL PROTECTED] all rights reserved

On Jul 26, Paul Tremblay said:
> > Is there a quicker way to substitute an item in a line than reading the line in each time? I am writing a script to convert RTF to XML. One part of the script involves simple substitution, like this:
> >
> >     s/\\ldblquote /<rt_quote\/>/g;
> >     s/\\rdblquote /<lt_quote\/>/g;
> >     s/\\emdash /<em_dash\/>/g;
> >     s/\\rquote /<r_quote\/>/g;
> >     s/\\tab /<tab\/>/g;
> >     s/\\lquote /<l_quote\/>/g;
>
> It's best to come up with a hash of strings and replacements:
>
>     my %rep = qw(
>         ldblquote  rt_quote
>         rdblquote  lt_quote
>         emdash     em_dash
>         rquote     r_quote
>         tab        tab
>         lquote     l_quote
>     );
>
> Then create a regex:
>
>     my $rx = join "|", map quotemeta, keys %rep;
>
> Then use it in a larger regex:
>
>     $source =~ s[\\($rx) ][<$rep{$1}/>]g;
>
> Ta da! ONLY one pass through the string.
>
> You'll need to beef up the hash and the regex as needed, if not everything is '\\IN ' and not every replacement is '<OUT/>'.
>
> -- Jeff "japhy" Pinyan [EMAIL PROTECTED] http://www.pobox.com/~japhy/

-- *Paul Tremblay* *[EMAIL PROTECTED]*
create anonymous hash here?
I have a small array that looks like this:

    @array = qw(rtf1_deff fonttbl s33_up7);

I generate this array from parsing an rtf file. I have to process each element in the array, and I have to separate the letters from the numbers. So the first element would break down to rtf and 1. Based on the first part of the element (the rtf part), I have to determine a set of actions.

Later on in the program, I need to once again process each element in the array, and that means separating the letters from the numbers again. It seems inefficient to do this twice. Should I create a hash within my array? In other words, each element in the array will have one or more values attached to it:

    rtf1_deff: ignore, 1    (ignore this, its number is one)
    fonttbl:   ignore
    s33_up7:   print, 33

I'm not sure how to create this data structure. The *Cookbook* shows you how to create hashes within hashes, but not hashes within arrays. (Or do I want to create an array within an array?)

My second question is whether I should bother. I would rather just process the elements twice by sending them to a subroutine, but I don't want to unnecessarily slow down my script.

Thanks!

Paul
-- *Paul Tremblay* *[EMAIL PROTECTED]*
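The structure being asked about is an array of hash references. A minimal sketch follows, in which each element is parsed once and the result is stored; the `%action_for` table, the field names, and `parse_elements` are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical action table keyed on the leading letters of an element.
my %action_for = (rtf => 'ignore', fonttbl => 'ignore', s => 'print');

# Parse each element once and keep the pieces in an anonymous hash.
sub parse_elements {
    my @parsed;
    for my $element (@_) {
        my ($word, $number) = $element =~ /^([a-z]+)(\d*)/i;
        push @parsed, {
            raw    => $element,
            word   => $word,
            number => ($number eq '' ? undef : $number),
            action => (exists $action_for{$word} ? $action_for{$word} : 'unknown'),
        };
    }
    return @parsed;
}

my @array = parse_elements(qw(rtf1_deff fonttbl s33_up7));

# Later passes read the stored fields instead of re-parsing.
for my $entry (@array) {
    printf "%s: %s%s\n", $entry->{raw}, $entry->{action},
        defined $entry->{number} ? ", $entry->{number}" : '';
}
```

For three short strings the second parse would cost almost nothing, so this is mostly a readability choice; it pays off when the list is long or the parsing is expensive.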
Re: script too slow?
On Mon, Jul 15, 2002 at 10:26:25AM +0200, Janek Schleicher wrote:
> To increase speed, we can make also a lookahead statement:
>
>     my @tokens = split /
>         (
>             \\ (?=\S)      # there's never a whitespace
>             (?: [^\s{}]+ | [^\s\\}]+ | [\\}] )
>           | }
>         )/x => $line;

Can you explain the lookahead statement to me, or better yet, point me to some good documentation? It is not explained in *Perl Cookbook.*

As John pointed out, your solution leaves off the leading open bracket. However, it is about twice as quick as the previous solution. Your solution tokenizes this line:

    {\i italics}

this way:

    '{' '\i' ' italics'

Actually, I can deal with this. I would just have to set a flag in my program: if the preceding token was '{', then the next token is part of an opening group.

However, your solution brings up another problem. It does not separate true brackets (which convey formatting information) from escaped brackets (those in the text). Likewise, no distinction is made between true backslashes and escaped backslashes. Here is a very typical line, and the tokens it should be split into:

    \pard\plain \{All of this text {\i italicized words} is \{\} \}\{between brackets\} \\escaped_back_slash\par

    '\pard' '' '\plain' ' ' '\{' 'All of this text ' '{\i' ' italicized words' '}' ' is ' '\{' '' '\}' ' ' '\}' '' '\{between' ' brackets' '\}' '' '\\' '\\escaped_back_slash' '\par' '

Thanks

Paul
-- *Paul Tremblay* *[EMAIL PROTECTED]*
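One possible way to keep escaped brackets and backslashes apart from real ones is to match tokens with `m//g` and list the escaped forms first in the alternation. This is only a sketch, not the tokenizer discussed above: it emits `{` and the control word as separate tokens, and `tokenize` is a hypothetical name.

```perl
use strict;
use warnings;

sub tokenize {
    my ($line) = @_;
    # Alternatives are tried left to right, so escapes win over brackets.
    my @tokens = $line =~ /
        \\[\\{}]                 # escaped backslash or brace, literal text
      | \\[a-zA-Z]+-?\d*\x20?    # control word, optional number, one space
      | \\'[0-9a-fA-F]{2}        # hex escape
      | \\.                      # any other control symbol
      | [{}]                     # real group brackets
      | [^\\{}]+                 # run of plain text
    /gx;
    return @tokens;
}

print join('|', tokenize('{\i italics} \{not a group\}\par')), "\n";
```

A later pass can then tell the two apart by looking at the first character of each token: a bare `{` or `}` is structure, while a token starting with a backslash followed by a brace is text.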
Re: script too slow?
On Sun, Jul 14, 2002 at 04:45:19AM -0700, John W. Krahn wrote:
> So your split could be simplified to:
>
>     my @tokens = split /({\\[^\s}{]+|\\[^\s\\}]+|\\[\\}]|})/, $line;

Ah, that cuts the tokenize process in half. My entire script now takes 40 seconds to run instead of 50.

Thanks!

Paul
-- *Paul Tremblay* *[EMAIL PROTECTED]*
script too slow?
I just finished my first version of a script that converts rtf to xml and was wondering if I went about writing it the wrong way.

My method was to read in one line at a time, split the line into tokens, and then read one token at a time. I used this line to split up the text:

    @tokens = split(/({\\[^\s\n\}{]+)|(\\[^\s\n\\}]+)|()|(})|(\\})/,$line);

Splitting up the text of my 1.8 megabyte test file took 25 seconds. The entire script took 50 seconds.

I had written a previous, uncompleted version in which I relied on regular expressions rather than tokens, and that script took only 10 seconds to run. I gave up on this method because it seemed there would always be an exception that would require another regexp.

So why does splitting a text into tokens take so long? Has anybody done something similar to what I am trying, and do you have any advice?

The good news is that, relatively speaking, perl is very, very fast. I tried a similar script in python using a lexer called plex, and the 1.8 megabyte file took 12 minutes to parse!

In case you are wondering why I'm seemingly obsessed with speed, I would like to make this script available to anyone. Right now the only free utility for converting rtf to xml is a java utility called majix, which deletes your footnotes and only allows for 9 user-defined styles. If my perl script is too slow, it won't be very useful.

Thanks

Paul
-- *Paul Tremblay* *[EMAIL PROTECTED]*
Re: script too slow?
On Sat, Jul 13, 2002 at 08:08:50PM -0700, Marco Antonio Valenzuela Escárcega wrote:
> maybe you should check this out:
>
> http://search.cpan.org/search?dist=RTF-Tokenizer
> http://search.cpan.org/search?dist=RTF-Parser

These modules could be exactly what I need. However, since the documentation is so minimal, I don't know how to use them. I have a feeling that the second module converts to html and not xml? Is the first module just a tokenizer? Splitting the document into tokens seems to be the easiest part of the job.

Thanks

Paul
-- *Paul Tremblay* *[EMAIL PROTECTED]*
Re: script too slow?
On Sat, Jul 13, 2002 at 10:57:04PM -0400, Tanton Gibbs wrote:
> I'm not exactly sure what the problems are; however, here are a couple of things to try
>
> 1.) If you don't need to save the value of each of the subexpressions, then tell perl so by using ?: after each opening paren.

Once I tokenize the text, I don't use regular expressions at all. Here is an example of an rtf line:

    \pard \s1\fi720 \ldblquote Big guy, I didn\rquote t expect to see you so early, \rdblquote Joe said. \par

Here are the tokens:

    '\pard' : '\s1' : '\fi720' : '\ldblquote' : ' Big guy, I didn' : '\rquote' : ' t expect to see you so early, ' : '\rdblquote' : ' Joe said. ' : '\par'

Each of the escaped sequences represents some type of info that I have to decide what to do with. I use the substr function to determine the nature of the token.

Actually, I have simplified the list of tokens. My split function actually produced 31 empty ('') tokens for this one line. So perl is doing a lot of searching.

> 2.) Usually alternation is much slower than doing separate regexes...however, in your case separating the regexes is seemingly impossible.

I'm not sure what alternation is. But now I am thinking that regexes are really not at all impossible. Perhaps they require a little more thought. That's not what stopped me from using them. I thought that as I encountered more complex rtf, with different (and insidious) versions of Word, I would have to tweak my code so much that I wouldn't be able to maintain it.

However, on second thought, I don't think the problem is that complicated. Let's take a look at the line above. It starts with \pard. This means start a paragraph with a new style. The style names are stored in the escaped sequences afterwards. So this style name is \s1 (style 1), \fi720. The fi means first indent, here by 36 pts. There are a zillion other tokens, all of which I don't understand. What I need to know is when the text starts. I could just look for non-escaped text. But the '\ldblquote' actually marks the start of the text, because it means left quote.

You can start to see some of the complexities and why I thought it better to handle one token at a time.

However, I was just playing around with perl. I substituted every instance of \ldblquote and 4 other control sequences (right quote, em-dash, tab, and right curly). That only took 4 seconds for a 1.8 megabyte document. So I am thinking of doing the simple substitutions first, and then proceeding.

For example, if I substitute

    s/\\ldblquote/<lft_quote\/>/g;
    s/\\rdblquote/<rt_quote\/>/g;

then my line looks like this:

    \pard \s1\fi720 <lft_quote/> Big guy, I didn\rquote t expect to see you so early, <rt_quote/> Joe said. \par

Now I can substitute:

    s/\\pard(.*?)\s[^\\]/<para style="$1">/;   # pard, followed by a space,
                                               # followed by any character
                                               # that is not a backslash

The most difficult part will be dealing with footnotes. They look something like this:

    {\footnote \pard \fi720 {\i italics word} text {\b bold words}}

This line contains a nested structure, and I have to determine when it ends, because the paragraph styles are independent of the styles in the main body. For this I will have to use //g as you suggested, and keep counting the open and closed brackets until the count returns to zero.

One last note on why I think I can change my strategy. An rtf line can look like this:

    \pard He was reading {
    \i The Sun Also Rises} when he heard the dog bark.\par

This line should look like:

    \pard He was reading {\i The Sun Also Rises} ...

In other words, rtf is so screwed up that it even splits tokens across lines. However, I just read the Perl Cookbook and realize I can do this:

    $/ = "\\par";   # read in one \par-delimited record at a time
    s/\n//g;        # get rid of line endings. This will work. The only
                    # line ending should come at the \par delimiter

Also, rtf does this:

    \pard {\i The Sun Also Rises \par }

I have to read the whole file in and switch it so it reads:

    \pard {\i The Sun Also Rises} \par

I tried this on my big document, and it took only 4 tenths of a second.

In sum, I am thinking that the regexes are so super fast in perl that if I choose carefully what to substitute first, I can parse my document much faster.

Thanks!

-- *Paul Tremblay* *[EMAIL PROTECTED]*
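The bracket counting described for footnotes could look roughly like the sketch below. It assumes the whole group is available in one string and that the starting offset points at the opening `{`; `group_end` is a made-up name. Escaped brackets and backslashes are skipped so that they do not disturb the count.

```perl
use strict;
use warnings;

# Return the offset just past the brace that closes the group which
# opens at offset $start, or nothing if the group is never closed.
sub group_end {
    my ($text, $start) = @_;
    my $depth = 0;
    pos($text) = $start;
    while ($text =~ /\\[\\{}]|([{}])/g) {
        next unless defined $1;            # escaped character, not a bracket
        $depth += $1 eq '{' ? 1 : -1;
        return pos($text) if $depth == 0;  # back to zero: group is complete
    }
    return;                                # unbalanced input
}

my $rtf = '{\footnote \pard \fi720 {\i italics word} text {\b bold words}} body text';
my $end = group_end($rtf, 0);
print substr($rtf, 0, $end), "\n" if defined $end;
```

If the footnote can span several input records, the records would need to be joined (or the depth carried over between them) before the count can reach zero.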
Re: fastest regexp: split or (.*)?
On Thu, Jul 11, 2002 at 09:17:18PM -0700, drieux wrote:
> http://www.wetware.com/drieux/pbl/Other/BenchMarks/split_v.re_for_RTF.txt

Thanks. It appears splitting is quickest.

What is bizarre about your link is that the code appears *almost identical* to what I had written. I mean, I asked a question with an example line, and the same example line was used; I chose two variables, and the same two variables were used. I kept having to tell myself, no, they didn't use the code you posted a few hours ago.

Paul
-- *Paul Tremblay* *[EMAIL PROTECTED]*
fastest regexp: split or (.*)?
I am writing a script to parse rtf into xml and have a number of questions about regular expressions. The first is this. I have this line:

    \sbknone\linemod0\linex0\cols1\endnhere \pard\plain text\par

I want to split it at the expression \endnhere. I can either use

    @array = split(/endnhere/, $line, 2);
    $first_part_of_line = $array[0];
    $second_part_of_line = $array[1];

Or:

    $line =~ /(.*?)endnhere(.*)/;
    $first_part_of_line = $1;
    $second_part_of_line = $2;

Which is quickest? I am using this method repeatedly in my script, so I want the quickest method.

I have also come across the \G anchor. For example:

    $line =~ /(.*?)endnhere/g;
    $rest_of_line =~ /\G.*/;

Is it advisable to use this anchor? I know that $` and $' are deprecated because they slow down a script (at least according to *Perl Cookbook*). How about these anchors? They seem like they would be very useful.

Thanks!

Paul
-- *Paul Tremblay* *[EMAIL PROTECTED]*
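A small sketch of the three approaches side by side, so they can be checked against each other before timing them. The function names are made up. Note that the position `\G` refers to is stored per variable, so the second match has to be applied to the same string as the first.

```perl
use strict;
use warnings;

# split with a limit of 2: everything before and after the first match.
sub with_split {
    my ($line) = @_;
    return split /\\endnhere/, $line, 2;
}

# One regex with two captures.
sub with_captures {
    my ($line) = @_;
    return $line =~ /^(.*?)\\endnhere(.*)$/s ? ($1, $2) : ();
}

# Scalar-context //g leaves pos($line) after the match; \G resumes there.
sub with_g_anchor {
    my ($line) = @_;
    $line =~ /(.*?)\\endnhere/g or return;
    my $first = $1;
    $line =~ /\G(.*)/gs or return;
    return ($first, $1);
}

my $line = '\sbknone\linemod0\linex0\cols1\endnhere \pard\plain text\par';
for my $method (\&with_split, \&with_captures, \&with_g_anchor) {
    my ($first, $second) = $method->($line);
    print "[$first] [$second]\n";
}
```

All three should print the same pair here; which one is fastest is best settled with the Benchmark module on representative input.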
Re: converting html to text
On Fri, Apr 05, 2002 at 05:15:08AM -0800, drieux wrote:
### #!/usr/bin/perl
###
### use HTML::Parser;
### use HTML::FormatText;
### use HTML::TreeBuilder;
###
### my $html_text;
### my $filename = $ARGV[0];
### open(FH, "$filename") or die "unable to open file $filename :$!\n";
### while (<FH>) { $html_text .= $_ ; }
###
### ###my $plain_text = HTML::FormatText->new->format(parse_html($html_text));
### my $tree = HTML::TreeBuilder->new->parse($html_text);
### my $plain_text = HTML::FormatText->new->format($tree);
###
### print "$plain_text\n";

I tried this code, and it did not work. I also tried this code:

    use HTML::TreeBuilder;
    my $tree = HTML::TreeBuilder->new();
    $tree->parse_file("/tmp/cleanup");

    use HTML::FormatText;
    my $formatter = HTML::FormatText->new(leftmargin => 0, rightmargin => 70);
    #print $formatter->format($tree);
    my $ascii = $formatter->format($tree);

It also did not work. The problem is that the filter deletes all of my text and outputs this:

    [TABLE NOT SHOWN][TABLE NOT SHOWN][TABLE NOT SHOWN][TABLE NOT SHOWN][TABLE NOT SHOWN]

I have tried it on five different files, all from the same website. It appears that this module is broken. That is, it can't handle certain html (which is valid when looked at in a browser).

I think I'm ready to try other filters (those not in perl).

Paul
-- *Paul Tremblay* *[EMAIL PROTECTED]*
converting html to text
I spent several hours last night trying to convert an html file to text, so that I could include it in an email. Someone from a mailing list sent me a simple perl script, which worked for my purpose. However, this script simply eliminates tables and lists.

I am wondering if there isn't a CPAN module already written. Converting html to text seems like such a common task that there ought to be some robust scripts out there. Interestingly enough, I found many scripts to convert html to rtf and LaTeX and every other format, but not plain old text!

Paul
-- *Paul Tremblay* *[EMAIL PROTECTED]*
Re: converting html to text
On Thu, Apr 04, 2002 at 10:36:36AM -0800, Agustin Rivera wrote:
> Are you looking to keep the basic formatting of the HTML intact during the conversion, or just want the HTML stripped? I wouldn't imagine that it would be that hard to convert the HTML to text if the HTML wasn't overly complicated.

I am just trying to strip the tags and preserve formatting. Of course, you can do a quick hack (s/<.*?>//g; # etc), but that still leaves you with messy line breaks. I've already tried saving the file as text from both Netscape and Lynx, and they did a pretty poor job.

A good script might put a * before list items, might approximate tables, etc. No sense in hacking something together when someone has probably written a powerful version.

Paul
-- *Paul Tremblay* *[EMAIL PROTECTED]*
Re: anonymous subroutine problem
On Wed, Mar 27, 2002 at 09:05:02PM -0800, bob ackerman wrote:
> i copied the code as is, and got no error. are you sure line #29 is where you are calling $_incr_count->()? are you sure the code you are executing is what you posted?

No, apparently this is not the code that gives me the error message! Below is the code that gives me the same error message.

    #!/usr/bin/perl -w
    my $cd = CD::Music->new("Canon in D", "Pachelbel", "Boering Mussak GmbH",
                            "1729-67836847-1", 1, 8,8, 5.0);

    # How many CDs in the entire collection?
    print "The number of CDs is ", CD::Music->get_count, "\n";

    package CD::Music;
    use strict;
    {
        my $_count = 0;
        sub get_count {$_count}
        my $_incr_count = sub {++$_count};   #create an anonymous subroutine

        sub new{
            $_incr_count->();   #call anonymous subroutine
            #
            # Instead I could put this code?:
            # my $_incr_count = sub {++$_count};
            #$_incr_count->();
            ###
            $_incr_count->();   #call the anonymous subroutine
                                #Gives me an error message shown below
            my ($class) = @_;
            bless {
                _name      => $_[1],   #second value passed to subroutine
                _artist    => $_[2],   #third value passed to subroutine
                _publisher => $_[3],   #and so on
                _ISBN      => $_[4],
                _tracks    => $_[5],
                _room      => $_[6],
                _shelf     => $_[7],
                _rating    => $_[8],
            }, $class;
        }
    }

    #ERROR MESSAGE WITH 'use strict' COMMENTED:
    #
    #Use of uninitialized value in subroutine entry at
    #/home/paul/bin/test16.pl line 27.
    #Undefined subroutine &main:: called at /home/paul/bin/test16.pl
    #line 27.

    #ERROR MESSAGE WITH 'use strict'
    #Use of uninitialized value in subroutine entry at
    #/home/paul/bin/test16.pl line 27.
    #Can't use string ("") as a subroutine ref while "strict refs"
    #in use at /home/paul/bin/test16.pl line 27.

Again, I believe that the reference to the anonymous subroutine expires once I start the 'new' subroutine. If I put the anonymous subroutine within the 'new' subroutine block, then the script executes correctly, and I still make sure that you can't increment the number of CDs directly.

Thanks!

Paul
-- *Paul Tremblay* *[EMAIL PROTECTED]*
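A likely explanation, offered as a sketch rather than a diagnosis of the exact script: named subroutines such as `new` are installed at compile time, but the assignment `my $_incr_count = sub {...}` only runs when execution reaches that line. If the calling code sits above the package in the same file, `new` is called while `$_incr_count` is still undefined. Wrapping the class block in `BEGIN` (or moving the package above the calling code, or into its own module file) makes the assignment happen first. The example is trimmed to two fields.

```perl
use strict;
use warnings;

# The calling code comes first in the file, as in the failing script.
my $cd = CD::Music->new('Canon in D', 'Pachelbel');
print 'The number of CDs is ', CD::Music->get_count, "\n";

{
    package CD::Music;

    # BEGIN runs this block at compile time, so $_incr_count already holds
    # its code reference when the call to new() above executes at run time.
    BEGIN {
        my $_count = 0;
        my $_incr_count = sub { ++$_count };

        sub get_count { $_count }

        sub new {
            my ($class, $name, $artist) = @_;
            $_incr_count->();    # private: not reachable from outside the block
            return bless { _name => $name, _artist => $artist }, $class;
        }
    }
}
```

The counter stays private either way, since `$_incr_count` is a lexical visible only inside the block.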
path for personal library
I am trying to set up a library for my own modules. According to *Object Oriented Perl* (Conway), I should do the following:

    PERL5LIB=${PERL5LIB}:/home/paul/perl5
    export PERL5LIB

However, this doesn't work for me. If I put a module in the directory /home/paul/perl5, then my perl scripts tell me they can't find the module. If I use

    use lib "/home/paul/perl5";

then I don't get an error message. How do I set a permanent path for my own personal library?

Thanks

Paul
-- *Paul Tremblay* *[EMAIL PROTECTED]*
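One way to see what perl actually received, as a rough diagnostic sketch: print the environment variable and the search path from inside a script. The helper `in_inc` is a made-up name. If PERL5LIB shows as not set, a plausible cause (an assumption, not confirmed in this thread) is that the export was typed in a different shell or lives in a startup file that has not been re-read.

```perl
use strict;
use warnings;

# Number of times a directory appears in perl's module search path.
sub in_inc {
    my ($dir) = @_;
    return scalar grep { $_ eq $dir } @INC;
}

print 'PERL5LIB = ', (defined $ENV{PERL5LIB} ? $ENV{PERL5LIB} : '(not set)'), "\n";
print "search path:\n";
print "  $_\n" for @INC;
print '/home/paul/perl5 is ', (in_inc('/home/paul/perl5') ? '' : 'NOT '), "in \@INC\n";
```

Directories listed in PERL5LIB show up near the front of `@INC`, so the output makes it clear whether the setting reached the script.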
anonymous subroutine problem
I am copying the code below directly from *Object Oriented Perl* by Conway. (Error message is below.)

    package CD::Music;
    #use strict;   # turn this off for testing purposes
    {
        my $_count = 0;
        sub get_count {$_count}
        my $_incr_count = sub {++$_count};   #create an anonymous subroutine
        $_incr_count->();                    # it works if I call it here

        sub new{
            $_incr_count->();   #call the anonymous subroutine
                                #this line above causes the error below
            #Isn't this where the anonymous subroutine should go?
            #my $_incr_count = sub {++$_count};
            my ($class) = @_;
            bless {
                _name      => $_[1],   #second value passed to subroutine
                _artist    => $_[2],   #third value passed to subroutine
                _publisher => $_[3],   #and so on
                _ISBN      => $_[4],
                _tracks    => $_[5],
                _room      => $_[6],
                _shelf     => $_[7],
                _rating    => $_[8],
            }, $class;
        }
    }

I am getting this error message:

    /home/paul/bin/test16.pl
    Use of uninitialized value in subroutine entry at /home/paul/bin/test16.pl line 29.
    Undefined subroutine &main:: called at /home/paul/bin/test16.pl line 29.

I am nearly positive that this error message results because the reference to the anonymous subroutine expires once I start the subroutine new. Could a more experienced user confirm this? I believe that the anonymous subroutine (and the other code that initializes the counter) should go in the new subroutine block. The idea is to make sure you can't access the subroutine directly. That holds as long as I make an anonymous subroutine, no matter where I put it.

Thanks!

PS: Typos in computer books can really cause you to lose your mind. You keep thinking that you typed something wrong!

-- *Paul Tremblay* *[EMAIL PROTECTED]*
looking for book on object oriented perl
I finished *Learning Perl* (the O'Reilly book), and thought it one of the best books relating to a linux subject I have yet read. (I know that many people on this mailing list probably use Windows. If you are a linux user and have had to suffer through some of the awful documentation on various aspects of linux, then you will think books like *Learning Perl* a masterpiece!)

However, I now believe I should learn at least some of the concepts of object oriented perl, especially since I want to use modules like XML::Parser.

My search on amazon.com came up with a book called *Object Oriented Perl,* written by Damian Conway and others. Amazon.com lets regular people review the book, and almost every one of these reviewers raved about it. It was the best book they had ever read; they wished they had read it a long time ago.

However, I believe I came upon a portion of this book online. This is the url:

http://www.google.com/search?q=cache:NiHGVcWGmw8C:www.csse.monash.edu.au/~damian/papers/PDF/cyberdigest.pdf+object+oriented+perl&hl=en

(1) Could someone tell me if this in fact is from the same book?

From what I read on this site, I was not too impressed with the book at all. It seemed to go on forever explaining theory without giving any concrete examples with perl code. My second question is:

(2) Does this book get better in chapters 2 and 3? Do people on this mailing list recommend it?

Keep in mind that I am a beginner with no previous experience in any object oriented language.

Thanks!

Paul
-- *Paul Tremblay* *[EMAIL PROTECTED]*
Re: looking for book on object oriented perl
Thanks. Looks like I'll give it a try. I came across an excellent tutorial:

http://www.extropia.com/tutorials/perl5/oop.html

I recommend this as a place to start for any beginner. I have the Cookbook, and although it is excellent in most parts, its treatment of oop is confusing unless you are already grounded in oop.

I understand what you mean when you say that you have to have some theory before tackling object oriented programming. The web page I looked at, however, went on and on without ever trying to ground the theory in practice. I think the book will be better.

Paul

On Mon, Mar 18, 2002 at 10:58:50AM +0000, Jonathan E. Paton wrote:
> > (1) Could someone tell me if this in fact is from the same book?
> >
> > From what I read on this site, I was not too impressed with the book at all. It seemed to go on forever explaining theory without giving any concrete examples with perl code.
>
> This is a series of extracts from Object Oriented Perl, and it looks familiar enough but isn't any one chapter - it's a preview as it were.
>
> In my opinion, the balance between theory and technique in the book is good. You really can't expect a book about object orientation to be without any theory at all, right? There is a lot of theory, explaining the various approaches and then following with an implementation of what was discussed.
>
> In OO there is NO ONE WAY to implement a system; you need to know what the different techniques are and when to apply them - this book will teach you that.
>
> > (2) Does this book get better in chapters 2 and 3? Do people on this mailing list recommend it?
>
> Yes, they do - I bought it on other people's recommendations and didn't regret doing so :)
>
> > Keep in mind that I am a beginner with no previous experience in any object oriented language.
>
> I was very fresh to OO perl when I first read the book, and I did find a lot of good tips and advice in it. If you are fresh to OO perl too, I would say this is a good place to start.
>
> OO is difficult at first, but makes larger problems easier. However, before you can do anything complex you need to learn a lot of things first.
>
> You should also remember that the Cookbook and the Camel both have chapters on OO that you may want to start with - although they are geared towards those with some prior experience.
>
> Jonathan Paton

-- *Paul Tremblay* *[EMAIL PROTECTED]*