[Perl-unix-users] regex and multibyte characters

Sundara Rajan Fri, 10 May 2002 09:02:59 -0700

I have a tool written in Perl that processes a file using regexes and then
loads the data into a Berkeley DB file. The relevant part of the logic is
shown below as an example:


    use Text::ParseWords;    # provides quotewords()

    my @LMUData = ();

    # Initiate the variables

    # Main Logic
    open (LMUFILE, $InputFile) or die "Could not open $InputFile: $! ";
    open (OPFILE, "> $OutputFile") or die "Could not open $OutputFile: $! ";
    open (LOGFILE, "> $LogFile") or die "Could not open $LogFile: $! ";

    while (<LMUFILE>) {
        # split the tab-separated line into fields
        @LMUData = &quotewords('\t', 0, $_);

        # do some pattern matches
        if ($LMUData[10]=~m/<.*>|Ctrl\+|^\-|[\*]|Alt\+|[\(]/) {
            print LOGFILE "$LMUData[10]\t$LMUData[11]\n";
        }
        elsif (!$LMUData[10]) {
            print LOGFILE "$LMUData[10]\t$LMUData[11]\n";
        }
        else {
            print OPFILE "$LMUData[10]\t$LMUData[11]\n";
        }
    }

    close(LOGFILE);
    close(OPFILE);
    close(LMUFILE);


I'm using the utf8 pragma, and the input file is also UTF-8.
The pattern matching does not work, and in some cases I get a warning that
says "Use of uninitialized value in concatenation (.) or string at
C:\paksa3\DataHandler.pl line 102, <LMUFILE> line 155".

How do I do pattern matching on Japanese/Chinese characters?
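
To make the question concrete, here is a minimal sketch of the kind of handling being asked about. It is not a confirmed fix. It assumes Perl 5.8 or later with the Encode module, and that each line is read as raw bytes and decoded explicitly before matching. The helper names has_cjk and pick_columns, the column indices, and the sample line are all hypothetical. The guard against undefined columns reflects one plausible source of the "uninitialized value" warning, namely a line with fewer fields than the code indexes.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: true if the (already decoded) string contains
# at least one Han, Hiragana, or Katakana character.
sub has_cjk {
    my ($text) = @_;
    return 0 unless defined $text;
    return $text =~ /[\p{Han}\p{Hiragana}\p{Katakana}]/ ? 1 : 0;
}

# Hypothetical helper: decode one raw UTF-8 line and return the two
# requested columns, substituting '' for columns the line does not have.
sub pick_columns {
    my ($raw_line, $i, $j) = @_;
    my $line = decode('UTF-8', $raw_line);    # bytes -> characters
    $line =~ s/\r?\n\z//;                     # strip the line ending
    my @fields = split /\t/, $line, -1;       # keep trailing empty fields
    return map { defined $fields[$_] ? $fields[$_] : '' } ($i, $j);
}

# Stand-in for a line read from the input file as raw bytes
# (assumption: two tab-separated columns, the second in Japanese).
my $raw = encode('UTF-8', "key\t\x{65E5}\x{672C}\x{8A9E}\n");

# Column 2 does not exist on this line, so it comes back as ''.
my ($first, $second) = pick_columns($raw, 1, 2);

print "CJK text found\n"         if has_cjk($first);
print "second column is empty\n" if $second eq '';
```

The Unicode properties match characters only after the bytes have been decoded. On undecoded UTF-8 bytes, the same pattern would see each Japanese character as several unrelated bytes.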
