On Apr 11, 2005 11:22 AM, Craig Cardimon <[EMAIL PROTECTED]> wrote:
> I am working with huge ASCII text files and large text fields.
> 
> As needs and wants have changed, I will be reprocessing data we have
> already gone through to see if more records can be extracted.
> 
> I will need to compare strings to ensure that records I am inserting
> into our SQL Server 2000 database are not duplicates of records already
> there.
> 
> I have come up with two ways:
> 
> (1) use string length (number of characters a string holds):
>  [---]
> (2) compare strings (or 200-character substrings thereof) directly:
> 
>  [---]
> Does this sound sane to you folks? If anyone has a better way, don't be shy.

  I'm not one to judge sanity one way or another :-)

   ... but you might want to consider loading your data into a
temporary table and having T-SQL determine the duplicates.  If the
files are as big as you say, I would think SQL Server BCP, in combination
with a native T-SQL stored procedure, would be MUCH faster
than any line-by-line Perl ETL utility you can write; I mean hey,
I have SQL Server 7 running on a laptop importing over 4 million rows a
minute... Perl is great and all, but physics says it isn't going to
give you that kind of bulk database I/O unless you really, REALLY work at it.
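
Something along these lines, for example (the table and column names here
are just guesses, so adjust them to your actual schema):

-- Hypothetical staging table; swap in your real column list and types.
CREATE TABLE dbo.StagingRecords (
    RecordText VARCHAR(8000) NOT NULL
)
GO

-- Bulk-load the flat file with bcp from a command prompt, e.g.:
--   bcp YourDb.dbo.StagingRecords in records.txt -c -S yourserver -T

-- Then insert only the rows whose first 200 characters don't already
-- appear in the live table (the comparison you described, done set-wise).
INSERT INTO dbo.Records (RecordText)
SELECT s.RecordText
FROM dbo.StagingRecords s
WHERE NOT EXISTS (
    SELECT 1
    FROM dbo.Records r
    WHERE SUBSTRING(r.RecordText, 1, 200) = SUBSTRING(s.RecordText, 1, 200)
)
GO

SUBSTRING works on TEXT columns in SQL Server 2000 as well, so the same
idea holds if your real column is TEXT rather than VARCHAR.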

Just my thoughts- 

kevdot
_______________________________________________
Perl-Win32-Users mailing list
Perl-Win32-Users@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
