On Apr 11, 2005 11:22 AM, Craig Cardimon <[EMAIL PROTECTED]> wrote:
> I am working with huge ASCII text files and large text fields.
>
> As needs and wants have changed, I will be reprocessing data we have
> already gone through to see if more records can be extracted.
>
> I will need to compare strings to ensure that records I am inserting
> into our SQL Server 2000 database are not duplicates of records already
> there.
>
> I have come up with two ways:
>
> (1) use string length (number of characters a string holds): [---]
>
> (2) compare strings (or 200-character substrings thereof) directly: [---]
>
> Does this sound sane to you folks? If anyone has a better way, don't be shy.
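For reference, a minimal Perl sketch of how checks (1) and (2) above might be combined. This is an illustration only: it assumes one newline-delimited record per line, and it does not restore the code snipped behind the [---] markers.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Illustration only: assumes one newline-delimited record per line and
    # that the first 200 characters are a usable comparison key.
    my %seen_by_len;    # length => list of 200-char prefixes already accepted

    while ( my $record = <STDIN> ) {
        chomp $record;
        my $len    = length $record;
        my $prefix = substr $record, 0, 200;

        # Check (1): only records of identical length can match, so the
        # length lookup cheaply narrows the candidates.
        # Check (2): confirm with a direct comparison of the 200-char substring.
        my $is_dup = grep { $_ eq $prefix } @{ $seen_by_len{$len} || [] };

        unless ($is_dup) {
            push @{ $seen_by_len{$len} }, $prefix;
            print "$record\n";    # new record; pass it through for insertion
        }
    }

With truly huge files that in-memory hash will grow; reducing each record to a digest (Digest::MD5, say) would keep it considerably smaller.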
I'm not one to judge sanity one way or another :-) ... but you might want to consider loading your data into a temporary table and letting T-SQL determine the duplicates. If the files are as big as you say, I would think SQL Server BCP, in combination with a native T-SQL stored procedure, would be MUCH faster than any line-by-line Perl ETL utility you could write; I mean hey, I have SQL Server 7 running on a laptop importing over 4 million rows a minute. Perl is great and all, but physics says it isn't going to give you super-threaded DB I/O unless you really, REALLY work at it.

Just my thoughts,
kevdot
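P.S. For concreteness, the staging-table route might look roughly like this when driven from Perl. This is a sketch only: the server, database, DSN, table, and column names and the file path are all placeholders, and it assumes the record column is a varchar (a TEXT column would need a different comparison, such as on a SUBSTRING or a checksum).

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    # Sketch only: MYSERVER, MyDb, MyDsn, staging_records, target_records,
    # record_text, and the file path are placeholders. Assumes record_text
    # is a varchar column, not TEXT.

    # 1. Bulk-load the flat file into a staging table with bcp in character mode.
    system( 'bcp', 'MyDb.dbo.staging_records', 'in', 'C:\\data\\records.txt',
            '-c', '-T', '-S', 'MYSERVER' ) == 0
        or die "bcp load failed: $?";

    # 2. Let the server eliminate duplicates set-wise in a single INSERT.
    my $dbh = DBI->connect( 'dbi:ODBC:MyDsn', 'user', 'password',
                            { RaiseError => 1, AutoCommit => 1 } );

    $dbh->do(q{
        INSERT INTO target_records (record_text)
        SELECT DISTINCT s.record_text
        FROM   staging_records AS s
        WHERE  NOT EXISTS (
                   SELECT 1
                   FROM   target_records AS t
                   WHERE  t.record_text = s.record_text
               )
    });

    $dbh->disconnect;

The NOT EXISTS (or an equivalent LEFT JOIN ... IS NULL) keeps the duplicate test inside one set-based statement, so the server does the work instead of a row-at-a-time loop on the client.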