RE: finding rows in a large file (22 millions of rows)

2003-02-11 Thread Westgate, Jared

Madhu Reddy Wrote:
 We are trying to load data into Teradata (which is 
 data warehousing, stores Terabytes of data, and which 
 is 10 times faster than any other database..)

Data warehousing is always an exciting subject!  However, I'd be surprised to
see this kind of performance increase.  A major factor in database performance
is the database design.  Many database designers do not know how to build 
data warehouses; they are stuck on normal relational concepts.  Anyway, sorry 
to be off topic...  I just can't turn down a database debate! :)

 before loading data into Teradata, we need to do some
 massaging on data..basically eliminating..duplicate
 rows and invalid rows...

I don't know anything about the Teradata database system, but I know how I 
would do this on other systems: 

1. Load the data as it is into a temporary database
2. Do a select (or a report), returning unique (distinct) rows.  This same
   select could also filter out your invalid rows and massage data.
3. Load the result of the select into the final database.

If you really want to do this with Perl, you can load the data into a hash
and then print the unique values.  I have no idea how long this would take
to run on 22 million rows (or how much memory it would need), but the code
would be fairly straightforward:

Load each row into a hash, using the whole row (every column) as the key.
Hash keys are unique, so a duplicate row simply lands on the key that is
already there.  If you need the output in order, sort the keys (this may
take a little while).  Then cycle through the keys and print each one to a
file.  If you sort the raw rows instead of using a hash, compare each row
with the one you last read: if they are the same, don't print it; otherwise,
do print it.  At this point you could also do some formatting, etc.
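
Here is a rough sketch of the hash idea, assuming each row is one line of
text and that the unique rows fit in memory (with 22 million rows that may
not be true).  The sub names unique_rows and dedupe_stream are made up for
illustration.

```perl
use strict;
use warnings;

# Return the rows in @$rows with duplicates removed, keeping first-seen order.
sub unique_rows {
    my ($rows) = @_;
    my %seen;
    # $seen{$row}++ is false only the first time a given row shows up
    return grep { !$seen{$_}++ } @$rows;
}

# Streaming version: copy each distinct line from $in to $out exactly once.
sub dedupe_stream {
    my ($in, $out) = @_;
    my %seen;
    while (my $line = <$in>) {
        print {$out} $line unless $seen{$line}++;
    }
}
```

dedupe_stream takes two open filehandles, so it could be called with
\*STDIN and \*STDOUT to use the script as a filter.  A check for invalid
rows could go inside the while loop, just before the print.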

I guess it all just depends on which you are more comfortable with.

Hope this helps,

Jared

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




RE: Formatting Variables.

2003-02-11 Thread Westgate, Jared
Ramón Chávez wrote:
 I mean, if I don't want to get printed 3.1415926535 (Or any irrational
 number) but something like 3.14, is there a way to use format??

I agree with the other posts.  Use printf.  Here is some more reading to check out:

perldoc -q "long decimals"
perldoc -q round
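
As a small sketch of the printf/sprintf approach (round_to is a made-up
helper name, not something built into Perl):

```perl
use strict;
use warnings;

# Format $number with $places digits after the decimal point.
# sprintf rounds the value while formatting it.
sub round_to {
    my ($number, $places) = @_;
    return sprintf "%.${places}f", $number;
}

printf "%.2f\n", 3.1415926535;           # prints 3.14
print round_to(3.1415926535, 2), "\n";   # prints 3.14
```

printf sends the formatted text straight to output, while sprintf returns it
so you can keep it in a variable.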

Hope this helps,

Jared





RE: still needing help

2003-01-30 Thread Westgate, Jared
Warning: opinionated text follows, so please don't take offense :)

stepping onto soapbox

  whatever it gets text where it needs to go... and if all you need is
  text the form parser below is fine.. also...if your not offering any
  real help...maybe you can keep your comments to yourself :)
  I hate people who answer questions with no or you cant do that or
  the like..it freakin lame...maybe you could tell him why you think this
  form parser is broken and actully really help someone...ok
  so maybe stop being so freakin cool for just a sec and try to help
 
 sure =0)..
 
 Im no Randal Schwartz:

I kept telling myself that I wasn't going to get involved...  Oh well.  Although I 
think Jdavis was a little too harsh, I have to agree with him (although not as 
adamantly).  First off, I also admit I'm no Perl guru, but I'm learning.  I think 
Jdavis was saying that the _reasons_ why something is broken (or doesn't work, or is 
bad, or whatever) are helpful to beginners.  
 
In fact, I occasionally find myself frustrated with the brevity of many responses to 
people's questions.  I think a lot of people are using this list to learn, not just to 
be told what to do.  I'm not saying to write a novel out of each response, but a 
little detail can be nice.  You have to remember, a lot of people who are learning 
Perl (and even many who are learning English) are using this list.

 I dont think the parser is broken, I KNOW it is ;0). Among other things,
 this:
 
  @in = split(//,$in);
 
 is 'bad, bad, bad, bad, ' x 100_000_000

Why is this bad?  Don't get me wrong... I'm not saying you are incorrect, because 
frankly I don't know.  Is it because he is using a scalar with the same name as the 
array he is assigning it to?  Oh well, I don't even remember the rest of the code that 
was posted :)
 
 I cant even use programs that use that parser on my RH konquerer or Mozilla

Why does it cause problems with Konqueror or Mozilla?
 
I guess I've always been the type to question someone else's opinions. :)  There is 
really no offense intended.  I'm just hoping to keep people's minds open and ease 
tensions.  Let's try to keep this list as helpful as possible.

stepping down from soapbox

Jared







RE: how do i get rid of and , chars ??????

2003-01-28 Thread Westgate, Jared
Swami wrote:
 I am reading a line from a file and splitting it into
 a 2 dimensional array, this is no probs..
 BUT i want to get rid of and , out of each line
 - how do i do this ???

You can use the transliteration operator for this.  You will have to use the d 
modifier to tell it to delete the characters you specify.  Just put this into your 
code:

while ($line = <INFILE>)  # This is where you're reading in the file
{
 chomp $line;  # chomp removes only the trailing newline

 # this is the transliteration.  Look for any , characters (add any
 # other characters you want removed to the list) and delete them. 
 $line =~ tr/,//d;

 # now, you can do your splitting, etc...   
} 
 
If you prefer, you could of course use the tr on the array elements, after it was 
split.
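
A rough sketch of that second option, running tr on the array elements after
the split.  The tab separator and the sub name clean_fields are assumptions;
the original post does not say what the line is split on.

```perl
use strict;
use warnings;

# Split a line into fields, then delete commas from every field.
sub clean_fields {
    my ($line) = @_;
    my @fields = split /\t/, $line;   # assumed separator: a tab
    tr/,//d for @fields;              # tr works on $_, which aliases each element
    return @fields;
}

my @row = clean_fields("1,000\t2,500\tabc");
print join('|', @row), "\n";   # prints 1000|2500|abc
```

Because the loop variable aliases the array elements, the tr changes the
array in place and no second array is needed.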

Regards,

Jared





RE: Getting Perl

2003-01-22 Thread Westgate, Jared

Scott Barnett wrote:

 Is there a free version of Perl that I can get that will run 
 on Win98 machine. I want to start learning Perl. checked 
 ActiveState but it looks like that is only a 15 or 30 day 
 evaluation, I may be wrong?

I assume you want a binary, not the source code?  If so, check out the CPAN Perl ports 
page (Windows binaries section): http://www.cpan.org/ports/#win32

I'm personally using SiePerl.  It seems to be able to do everything I need. :)  If you 
go to the link, there is installation documentation and a list of modules.  You 
probably want the 5.8 version.

Hope this helps,

Jared





RE: Size of number in scalar

2003-01-22 Thread Westgate, Jared
Chris Said:
 Someone posted a question as to the size of number which a 
 scalar would tolerate.  

I guess I missed this thread, so I hope I'm not repeating information. :)

 Perl seems to tolerate quite a bit of this, as the app has been churning
 away, printing every so many number just to let me know where it's at,
 and it recently went past two hundred billion (I cheated and am
 incrementing by 1000 instead of 1, because I got impatient incrementing
 by one).  

From the O'Reilly Camel Book:

"Perl stores numbers as signed integers if possible, or as double-precision floating 
point values in the machine's native format otherwise."

 $scalars in perl handle big numbers ... and maybe Perl notices when a 
 boundary is being crossed and reconstitutes the number?

Also, see perldoc perlnumber for more information.  It explains a lot of this. 
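
A small sketch of the counting experiment described above (count_up is a
made-up name).  It assumes nothing about the integer size of your perl: a
double-precision float holds every whole number up to 2**53 exactly, so the
count stays exact even if the value no longer fits in a native integer.

```perl
use strict;
use warnings;

# Add $step to $start $times times and return the result.
sub count_up {
    my ($start, $step, $times) = @_;
    my $n = $start;
    $n += $step for 1 .. $times;
    return $n;
}

# Well past the 32-bit signed integer limit of 2_147_483_647
print count_up(200_000_000_000, 1000, 5), "\n";   # prints 200000005000
```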

Sorry if I'm repeating information,

Jared





RE: printing number with commas in it

2003-01-15 Thread Westgate, Jared
Reggie Wrote:
 
 I am trying to print a number with commas in it. I cannot 
 find the correct syntax to do this with printf.
 I considered using the substr function but this depends on 
 me always knowing the size of the number.
 Can you help me with this?

I like to use the method listed in the perldocs.  Try this:

  perldoc -q "output my numbers with commas"

It lists a really cool solution, although I'm sure there are plenty of others :)
It credits Benjamin Goldberg with this:

   s/(^[-+]?\d+?(?=(?>(?:\d{3})+)(?!\d))|\G\d{3}(?=\d))/$1,/g;
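
If the one-liner is hard to read, here is a sketch of a common loop-based
alternative.  It is a different technique from the regex above, and it
assumes the input is a plain number string with an optional sign and
optional decimal part.

```perl
use strict;
use warnings;

# Insert a comma before each group of three digits in the integer part.
sub commify {
    my ($number) = @_;
    # Each pass adds one comma, working from the right; stop when none fit.
    1 while $number =~ s/^([-+]?\d+)(\d{3})/$1,$2/;
    return $number;
}

print commify(1234567), "\n";        # prints 1,234,567
print commify("-1234.5678"), "\n";   # prints -1,234.5678
```

The pattern is anchored at the start of the string, so digits after the
decimal point are left alone.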

Hope this will work for you,

Jared





RE: uploading and downloading files to MySQL (--2--)

2003-01-09 Thread Westgate, Jared

Mariusz wrote:

 What type of field should I use for storing the path; just 
 VARCHAR I guess?
 And as far as the filenames - make up some random file name for each
 submitted file?

This is a little off topic for a Perl list, but I'll give it a shot.  Be forewarned 
that I do not know much about MySQL.  I am writing this based on my experiences with 
Oracle and other databases.

It all depends on how you want to handle it.  Varchar would work fine for storing a 
path.  If you wanted to, you could also come up with a random name for the file.  Of 
course, you would have to do some checking to ensure the file name was not in use, or 
maybe just use a fancy directory naming scheme.  
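
One way to sketch the "random name plus a check that it is not in use" idea
in Perl.  The sub name create_unique_file, the 12-character name length and
the 100-attempt limit are all assumptions made for illustration.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);

# Create a new, empty file with a random name inside $dir.
# Returns the path and an open filehandle for writing.
sub create_unique_file {
    my ($dir, $ext) = @_;
    my @chars = ('a' .. 'z', '0' .. '9');
    for my $attempt (1 .. 100) {
        my $name = join '', map { $chars[rand @chars] } 1 .. 12;
        my $path = "$dir/$name$ext";
        # O_EXCL makes the open fail if the file already exists,
        # so a name that is already taken is skipped and we try again
        if (sysopen(my $fh, $path, O_WRONLY | O_CREAT | O_EXCL)) {
            return ($path, $fh);
        }
    }
    die "Could not create a unique file in $dir\n";
}
```

Creating the file and checking for a clash in one step avoids the gap
between "does this name exist?" and "create it" that a separate -e test
would leave.  The returned path is what you would store in the VARCHAR
column.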

 ps. If storing files in the DB is not common, what exactly is 
 the BLOB type
 for?

Storing files in the DB is not common, but it is used in some cases.  Depending on 
your application, it may be appropriate.  There are several advantages to storing 
these types of files in the database:

1.  You don't have to worry about file names.
2.  You get an indexed search to find the file itself.  In other words, you
    don't need another system call to get the file off of the file system and
    then return it.
3.  Maybe faster retrieval times (depending on how your hardware and DB are
    set up).  Not real likely unless you have a lot of money to work with.
    I'm talking Oracle with multiple servers, etc.
4.  Restoring the database will restore the files within it.
5.  All of the database features themselves would apply to the files.  For
    example, greater control over security.

Of course, there are also disadvantages:

1.  Much larger database (this is a big disadvantage).
2.  Probably slower data retrieval times (once again, depending on how things
    are set up).  This would probably be the case for you.
3.  BLOBs, CLOBs, etc. are typically much harder to work with.

I'm sure there are many other advantages/disadvantages that I could not think of 
offhand.  Just be aware that in most situations, as a general guideline, you want to 
store the path to a file, not the file itself.  I highly suggest trying it both ways 
(although I know that sometimes that is impossible).  But, never take anyone's word 
for it!  :)  Also, try googling for "database storing files" or "database file 
storage" or something like that.  I think you will find a lot of useful information. 

Hope this helps,

Jared





RE: Beginners -- a suggestion

2002-11-21 Thread Westgate, Jared
Just to help out Huan Huang below:

Huan Huang wrote: 
I have set up Activeperl in my windows NT system. I have some source file.
But I really get no idea how to run the source file. Could anybody give me
a go?

You need to associate the file type with the perl interpreter.  I am not using NT, so 
I can't tell you the exact steps, but the idea is to open Explorer and double-click a 
.pl file; it will ask you what to run it with.  Specify the perl.exe file from the 
path where it was installed.

Also, you want to make sure you add the location of the perl.exe file and your 
perl/bin directory to your path.

As far as good tutorials...  It depends on what you need help with.  But, if you need 
help learning Perl, you might find these helpful:
http://www.comp.leeds.ac.uk/Perl/start.html
http://www.cclabs.missouri.edu/things/instruction/perl/perlcourse.html

I'm not sure, but I think I got them from the perlfaq (even if not, you'll want to 
read the FAQ):
http://www.perldoc.com/perl5.8.0/pod/perlfaq.html

Hope this helps,

Jared


-Original Message-
From: Zemer Rick [mailto:[EMAIL PROTECTED]]
Sent: Thursday, November 21, 2002 9:20 AM
To: [EMAIL PROTECTED]
Subject: Beginners -- a suggestion



I too am rather new to Perl, and have found the OpenPerlIDE debugger to be
helpful.  It is available on sourceforge.net and probably elsewhere.

Hope that helps.

-rz.

She was trying to construct a life that made sense from things she found in
gift shops. 
 -kv.

-Original Message-
From: Jim Blanchard [mailto:[EMAIL PROTECTED]]
Sent: Thursday, November 21, 2002 11:02 AM
To: 'Sorin Marti'; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: RE: Email To Mysql.


I am new too! I have tried Activeperl on my XP machine at home and found
that you have to run the version I have in a command window.

c:\> perl file name


I hope this helps.

Jim Blanchard
-Original Message-
From: Sorin Marti [mailto:[EMAIL PROTECTED]]
Sent: Thursday, November 21, 2002 10:49 AM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: Email To Mysql.


[EMAIL PROTECTED] wrote:

Hi, I am really new to Perl.

me too

I have set up Activeperl in my windows NT system. I have some source file.
But I really get no idea how to run the source file. Could anybody give me
a go?

try this: http://www.wdvl.com/Authoring/Languages/Perl/Windows/

I have checked the following sites but found few introduction to windows
user.

http://www.perldoc.com/perl5.8.0/pod/perl.html#NAME

could anybody give me some helpful websites as well? thanks a lot!

Huan Huang

Please choose a good subject when you mail to a mailing list. Your message has 
nothing to do with "Email To Mysql", so choose a subject like "Perl 
documentation for windows user" so everyone knows what you're talking about.


Greets

Sorin



Confidentiality Notice: This e-mail message, including any
attachments, is for the sole use of the intended recipient(s) and may
contain confidential and privileged information.  Any unauthorized
review, use, disclosure or distribution is prohibited.  If you are not
the intended recipient, please contact the sender by reply e-mail and
destroy all copies of the original message.

PARTNERS Health Plan            Phone: 574-233-4899
100 E. Wayne St., Suite 502     Fax:   574-234-7484
South Bend, IN 46601            www.partnersindiana.com







RE: Working with text files.Please help!

2002-11-21 Thread Westgate, Jared
Hey Vitali,

I am also a beginner, so I don't know if I am doing everything most efficiently... 
but, here is what I wrote to do this.  I wouldn't normally just write a program for 
someone, but the problem intrigued me.  Take a look through the code and it will 
hopefully make sense.  I hope it will at least get you going on the right road!

Jared

begin code
$infile = "input_file.txt";

if (! -e $infile)   # if the input file doesn't exist, die
{
 die "File $infile not found!";
}

open(JARREAD, "<", $infile) or die "Can't read $infile: $!";  # a filehandle I made up
while ($line = <JARREAD>)  # reads each line of the file until done
{
 chomp $line;
 if (substr($line,0,2) eq "10")
 {
  $outfile = substr($line,3,4).".out";  # grab the file name and append a .out
  if ($prefix > 0)  # a prefix is already set, so this is not the first time through
  {
   close(JARWRITE);  # close the previous output file
  }
  $prefix = substr($line,0,2);  # prefix is what we add to the beginning of each line.
  open(JARWRITE, ">", $outfile) or die "Can't write $outfile: $!";
  print {JARWRITE} "$line\n";  # print the first line with no prefix
 }
 else
 {
  if (substr($line,0,2) > 10) # this says it found the next section ie 20
  {
   $prefix = substr($line,0,1); # make the prefix 1 character
   # for each section label 10,20,etc, do not do a prefix
   print {JARWRITE} "$line\n";
  }
  elsif (substr($line,0,2) eq "AB" or substr($line,0,2) eq "EP" or substr($line,0,2) eq "A0")
  {
   # if we found one of the above sections titles, just spit out the line
   print {JARWRITE} "$line\n";
  }
  else
  {
   # this substr is because the first char is a space for all other lines
   $line = substr($line,1);
   print {JARWRITE} $prefix."$line\n";
  }
 }
}
close(JARREAD);  #close the files
close(JARWRITE);
end code

-Original Message-
From: Vitali Pokrovski [mailto:[EMAIL PROTECTED]]
Sent: Thursday, November 21, 2002 2:39 AM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Working with text files.Please help!



Dear friends,

could you please show me how to write the code for converting data from 
input_file to the output file (1001.out, 1002.out, 1003.out) format? (Please see the 
example attached.)
I have been working on writing this code for about four weeks, but 
unfortunately... because I'm just a beginner :-(

I have a text file (input_file.txt), and the output I need is the files 
1001.out, 1002.out and 1003.out.

Here are the steps the program must do:

1. Open the input file.

2. Read the lines from canal 10 to canal A0 (the info from line 10 to line A0 must 
go into one file).

3. Convert the lines (canal numbers 10, 20, 30 ... in the first two positions) as shown 
in the output files. For example:

10
20
20
20
20
30
30
30

4. Save this data to a new file (1001.out). P.S. The file names come from the "10 
1001" lines; actually the file name is the four characters from position 3, in this 
case 1001 or 1002 or 1003..

I'm not sure what I'm trying to do here..

#!perl -w 
$count = 0;

 while (<STDIN>) { 
 if  (substr($_,0,2) eq "10")  {   #If left 2 
   ++$count; 
   $lines = 0; 
   open  OUT, ">$count.out"; 
   print OUT $_;   }; 

   } 
  __EOF__ 


Any help  welcome!

Regards,

VItali
Estonia,Tallinn




















