Re: [fpc-pascal] CSV via PCRE
--- Graeme Geldenhuys <[EMAIL PROTECTED]> wrote:

> The code shown in the url below works just fine. Also the usage sample
> is all you need to use the tokenizer. Just replace the FieldSpecLine
> variable with the content from a CSV file and you are good to go. I
> use it as-is in my production code. Plus that unit has been well unit
> tested as part of the hourly tests run by the tiOPF project.
>
> http://tinyurl.com/395vgp

It won't compile.

Free Pascal Compiler version 2.2.0 [2007/09/09] for i386
Copyright (c) 1993-2007 by Florian Klaempfl
Target OS: Win32 for i386
Compiling tiTokenLibrary.pp
tiTokenLibrary.pp(26,2) Fatal: Can't open include file "tiDefines.inc"

> But if you insist, I can give you a full running application. My
> point was that regular expression are normally a nightmare to debug
> and maintain.

I don't think so.

> Plus not everybody knows them (syntax wise),

Not everyone knows how to tie his shoelaces.

Regular expressions are used by vi and emacs; in fact, any editor that doesn't let you do a regex search is a joke. (Even some microsoft applications understand regexes.) So everyone who programs should learn regular expressions.

Regular expressions are also used by grep, and by the languages awk, Ruby, Perl, etc. Every programmer should know how to use at least one of these languages. Those who don't should perhaps be lumped together with COBOL programmers.

Therefore, I honestly believe that anyone who doesn't understand regular expressions has a huge, glaring, inexcusable hole in his computer knowledge. Ignorance of regular expressions is like not knowing how to tie one's shoelaces: it's not something to boast about.

The fact that a single regular expression takes as much time to read and understand as many lines of Pascal isn't a problem, because the regex does as much work as many lines of Pascal. Don't worry about that; you'll spend more time writing them than reading them.

Regular expressions are no more against the spirit of Pascal than associative arrays (hash tables) or any other feature that is added by using a unit.

___
fpc-pascal maillist - fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
[fpc-pascal] regex-dna: finally fast enough?
I don't think so, although it's over twice as fast as the last incarnation.

One speedup I stole from the Perl program: instead of counting matches for /foo|bar/, count matches for /foo/ and for /bar/. The other speedup is lowercasing the string that is searched instead of requiring the regex engine to do a case-insensitive search.

I don't think this should be submitted to the shootout site unless it will improve Free Pascal's standing; i.e., unless it is not more than 1.4 times as slow as the Perl program.

{ The Computer Language Benchmarks Game
  http://shootout.alioth.debian.org

  contributed by Steve Fisher
  modified by Peter Vreman
  modified by Steve Fisher
}

uses regexpr;

const
  patterns : array[1..9] of string[255] =
    (
      'agggtaaa|tttaccct',
      '[cgt]gggtaaa|tttaccc[acg]',
      'a[act]ggtaaa|tttacc[agt]t',
      'ag[act]gtaaa|tttac[agt]ct',
      'agg[act]taaa|ttta[agt]cct',
      'aggg[acg]aaa|ttt[cgt]ccct',
      'agggt[cgt]aa|tt[acg]accct',
      'agggta[cgt]a|t[acg]taccct',
      'agggtaa[cgt]|[acg]ttaccct'
    );
  replacements : array[0..10,0..1] of string[15] =
    (
      ('B', '(c|g|t)'), ('D', '(a|g|t)'), ('H', '(a|c|t)'), ('K', '(g|t)'),
      ('M', '(a|c)'), ('N', '(a|c|g|t)'), ('R', '(a|g)'), ('S', '(c|g)'),
      ('V', '(a|c|g)'), ('W', '(a|t)'), ('Y', '(c|t)')
    );

// Append 2 strings to an ansistring rapidly.  Note: the ansistring's
// length will be increased by a more than sufficient amount.
function append2( var dest: ansistring; len0: longint;
                  s1: pchar; len1: longint;
                  s2: pchar; len2: longint): longint; inline;
const
  quantum = 599000;
var
  newlength: longint;
begin
  newlength := len0 + len1 + len2;
  // Since setlength() is somewhat costly, we'll do it less
  // often than you would think.
  if length( dest ) < newlength then
    setlength( dest, newlength + quantum );
  move( s1^, dest[len0 + 1], len1 );
  move( s2^, dest[len0 + 1 + len1], len2 );
  exit( newlength );
end;

procedure replace_matches( const str: ansistring; var dest: ansistring );
var
  engine : tRegexprEngine;
  starti, index, size, truelength, i : longint;
  pstart : pchar;
  target, repl: string[255];
begin
  // Build a single character class, e.g. [BDHKMNRSVWY], from the table.
  target := '[';
  for i := 0 to high(replacements) do
    target += replacements[i,0];
  target += ']' + #0;

  GenerateRegExprEngine( @target[1], [], engine);
  dest := '';
  truelength := 0;
  starti := 1;
  pstart := pchar(str);
  while starti <= length(str) do
    if RegExprPos(engine, pstart, index, size ) then
    begin
      // The matched letter sits at position k+2 in target, so k is its row.
      repl := replacements[ pos( (pstart+index)^ , target)-2, 1 ];
      truelength := append2(
        dest, truelength, pstart, index, @repl[1], length(repl) );
      inc(pstart, index+size);
      inc(starti, index+size);
    end
    else
      break;
  DestroyRegExprEngine( engine );
  // Trim the over-allocated buffer, then append the unmatched tail.
  setlength( dest, truelength );
  dest := dest + Copy( str, starti, length(str)-starti+1);
end;

function count_matches_simple( pattern: pchar; const str: ansistring ): longint;
var
  engine : tRegexprEngine;
  p_start, p_end : pchar;
  count, index, size : longint;
begin
  GenerateRegExprEngine( pattern, [], engine);
  count := 0;
  p_start := pchar(str);
  p_end := @str[ length(str) ];
  while p_start <= p_end do
    if RegExprPos(engine, p_start, index, size ) then
    begin
      inc(count);
      inc(p_start, index+size);
    end
    else
      break;
  DestroyRegExprEngine( engine );
  exit(count)
end;

// Count matches of 'foo|bar' as matches of 'foo' plus matches of 'bar'.
function count_matches( pattern: string[255]; const str: ansistring ): longint;
var
  count, p: longint;
begin
  pattern += #0;
  p := pos( '|', pattern );
  pattern[p] := #0;
  count := count_matches_simple( @pattern[1], str );
  count += count_matches_simple( @pattern[p+1], str );
  exit( count )
end;

var
  sequence, new_seq, lowered : ansiString;
  line: string[255];
  i, count, init_length, clean_length : longint;
  inbuf : array[0..64*1024] of char;

begin
  settextbuf(input,inbuf);
  sequence := '';
  init_length := 0;
  while not eof do
  begin
    readln( line );
    init_length += length( line ) + 1;
    if line[1] <> '>' then
      sequence := sequence + line;
  end;
  clean_length := length(sequence);

  // Count pattern-matches.
  lowered := lowercase( sequence );
  for i := low(patterns) to high(patterns) do
  begin
    count := count_matches( patterns[i], lowered );
    writeln( patterns[i], ' ', count);
  end;

  // Replace.
  replace_matches(sequence, new_seq);

  writeln;
  writeln( init_length );
  writeln( clean_length );
  writeln( length(new_seq) );
end.
Re: [fpc-pascal] CSV via PCRE
On 11/11/2007, S. Fisher <[EMAIL PROTECTED]> wrote:

> That's not a working sample. It has no CSV record to parse.
>
> Give a working program that we can run with no modifications
> whatsoever; parse an actual CSV record; print every field
> in the record. That's what my sample did.

The code shown in the url below works just fine. Also the usage sample is all you need to use the tokenizer. Just replace the FieldSpecLine variable with the content from a CSV file and you are good to go. I use it as-is in my production code. Plus that unit has been well unit tested as part of the hourly tests run by the tiOPF project.

http://tinyurl.com/395vgp

But if you insist, I can give you a full running application. My point was that regular expressions are normally a nightmare to debug and maintain. Plus not everybody knows them (syntax wise), so it makes it hard for other developers to maintain that code.

Implementing something like a CSV parser in a more Object Pascal fashion goes more with the style of the language (easy to read and understand) and is much easier for others to maintain if needed.

Regards,
- Graeme -

___
fpGUI - a cross-platform Free Pascal GUI toolkit
http://opensoft.homeip.net/fpgui/
Re: [fpc-pascal] CSV via PCRE
--- Graeme Geldenhuys <[EMAIL PROTECTED]> wrote:

> OK, while we are busy with show-and-tell... Then have a look at my
> token library implementation.
>
> http://tinyurl.com/395vgp
>
> Sample Usage:
>
>   tokenizer := TTokens.Create(FieldSpecLine, ', ', '"', '"', '\',
>       tsMultipleSeparatorsBetweenTokens);
>   try
>     lField := tokenizer.Token(2);
>     lAnotherField := tokenizer.Token(4);
>   finally
>     tokenizer.Free;
>   end;

That's not a working sample. It has no CSV record to parse.

Give a working program that we can run with no modifications whatsoever; parse an actual CSV record; print every field in the record. That's what my sample did.
Re: [fpc-pascal] outportb
> > On Nov 10, 2007 4:30 PM, Mihai <[EMAIL PROTECTED]> wrote:
> > > I am new with FPC and I am trying something nasty on
> > > Linux SO (some lp0 output command by data bits).
> > > Is there any chance to use FPC on Linux to handle
> > > parallel port data bits as in outportb[$378]:=32; ?
> >
> > Explained here:
> >
> > http://wiki.lazarus.freepascal.org/Hardware_Access#Using_ioperm_to_access_ports_on_Linux
>
> Most notably the first note, that these are portably
> implemented in unit x86.

Got it. Thanks very much to all,
Mihai
Re: [fpc-pascal] More help with unicode
I now see that I should probably be using SendMessageW, but that didn't make any difference.

thanks,
--
Felipe Monteiro de Carvalho
Re: [fpc-pascal] Help making code unicode capable
Mattias Gaertner wrote:
> On Sat, 10 Nov 2007 15:44:54 +0100
> "Felipe Monteiro de Carvalho" <[EMAIL PROTECTED]> wrote:
>
>> Thanks, I arrived at this:
>>
>> var
>>   FilterBuffer: WideString;
>> ...
>>
>>   FilterBuffer := Utf8Decode(Filter);
>>   lpStrFilter := GetMem(Length(FilterBuffer) * 2 + 2);
>>   Move(FilterBuffer, lpStrFilter, Length(FilterBuffer) * 2 + 2);
>>
>> But now it crashes when loading the dialog =/
>>
>> any ideas?
>
> Move(FilterBuffer[0], lpStrFilter^, Length(FilterBuffer) * 2 + 2);

See my other mail, I think that

  lpStrFilter := PWChar(FilterBuffer)

should work here. No need for a temp buffer.

Marc
Re: [fpc-pascal] Help making code unicode capable
Felipe Monteiro de Carvalho wrote:
> Hello,
>
> I have a small piece of code on LCL which I have found hard to convert
> to unicode:
>
>   lpStrFilter := StrAlloc(Length(Filter)+1);
>   StrPCopy(lpStrFilter, Filter);

There is a big chance that this is an inheritance from the pre-1.0 fpc times. At that time casting a string to a PChar didn't work reliably, so in all cases where you would now use PChar(S), these constructs were used.

I think in this case you can now simply use a string (or widestring).

Marc
Re: [fpc-pascal] outportb
> On Nov 10, 2007 4:30 PM, Mihai <[EMAIL PROTECTED]> wrote:
> > I am new with FPC and I am trying something nasty on Linux
> > SO (some lp0 output command by data bits).
> > Is there any chance to use FPC on Linux to handle parallel
> > port data bits as in outportb[$378]:=32; ?
>
> Explained here:
>
> http://wiki.lazarus.freepascal.org/Hardware_Access#Using_ioperm_to_access_ports_on_Linux

Most notably the first note, that these are portably implemented in unit x86.
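As a rough illustration of what the wiki page and the note about unit x86 point at, here is a minimal sketch. It assumes that unit x86 exports fpIOperm and WritePortB with the signatures used below, that the program runs as root on x86 Linux, and that the first parallel port's data register really is at $378 as in the question; the program name and the LptBase constant are made up for this example.

```pascal
program lptout;
{$mode objfpc}

uses
  x86;  // assumption: provides fpIOperm and WritePortB on x86 Linux

const
  LptBase = $378;  // data register address taken from the question

begin
  // Ask the kernel for access to 3 consecutive ports starting at LptBase.
  // This is expected to fail unless the program runs as root.
  if fpIOperm(LptBase, 3, 1) <> 0 then
  begin
    WriteLn('fpIOperm failed; is the program running as root?');
    Halt(1);
  end;

  // Roughly what the DOS-style outportb[$378] := 32 would have done.
  WritePortB(LptBase, 32);

  // Give the port permission back.
  fpIOperm(LptBase, 3, 0);
end.
```

Going through the lp driver (writing to /dev/lp0) avoids the need for root, but it does not give direct control over individual data bits, which is what the question asks for.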
Re: [fpc-pascal] outportb
On Nov 10, 2007 4:30 PM, Mihai <[EMAIL PROTECTED]> wrote:
> I am new with FPC and I am trying something nasty on Linux
> SO (some lp0 output command by data bits).
> Is there any chance to use FPC on Linux to handle parallel
> port data bits as in outportb[$378]:=32; ?

Explained here:

http://wiki.lazarus.freepascal.org/Hardware_Access#Using_ioperm_to_access_ports_on_Linux

--
Felipe Monteiro de Carvalho
Re: [fpc-pascal] More help with unicode
On Nov 10, 2007 4:30 PM, Jonas Maebe <[EMAIL PROTECTED]> wrote:
> You're making the same error which Marco pointed out earlier:
> Utf8Decode returns a (reference counted) widestring, but you are not
> assigning it to anything so it ends up in a (reusable) temp location.
> As soon as the next temporary widestring has to be created, the
> previous one is destroyed and the pwidechar will point to a random
> memory block.

Ummm, but there is nothing else in that function, so I don't see how the temporary string can be destroyed before I call SETTEXT, unless SETTEXT expects that I give it storage that is available at all times...

procedure TWin32MemoStrings.SetText(TheText: PChar);
begin
  SendMessage(fHandle, WM_SETTEXT, 0, LPARAM(TheText));
end;

Which I have currently converted into this (but it still doesn't work):

procedure TWin32MemoStrings.SetText(TheText: PChar);
var
  AnsiBuffer: ansistring;
  WideBuffer: widestring;
begin
  {$ifdef WindowsUnicodeSupport}
  if UnicodeEnabledOS then
  begin
    WideBuffer := Utf8Decode(TheText);
    SendMessage(fHandle, WM_SETTEXT, 0, LPARAM(PWideChar(WideBuffer)));
  end
  else
  begin
    AnsiBuffer := Utf8ToAnsi(TheText);
    SendMessage(fHandle, WM_SETTEXT, 0, LPARAM(PChar(AnsiBuffer)));
  end;
  {$else}
  SendMessage(fHandle, WM_SETTEXT, 0, LPARAM(TheText));
  {$endif}
end;

Alternatively I moved AnsiBuffer and WideBuffer to the class declaration, so they are always available, which also didn't solve the problem of showing wrong characters.

It shows the string ééé as if it was Ã©Ã©Ã©, which is what I would expect if I used ansi routines to show a utf-8 string.

thanks,
--
Felipe Monteiro de Carvalho
[fpc-pascal] outportb
Hello all,

I am new to FPC and I am trying something nasty on Linux OS (sending output to lp0 by setting individual data bits). Is there any chance to use FPC on Linux to handle parallel port data bits as in outportb[$378]:=32; ?

Thank you very much,
Mihai
Re: [fpc-pascal] More help with unicode
On 10 Nov 2007, at 16:23, Felipe Monteiro de Carvalho wrote:

> Which I tryed to convert to:
>
>   if UnicodeEnabledOS then
>     SendMessage(fHandle, WM_SETTEXT, 0, LPARAM(PWideChar(Utf8Decode(TheText))))
>   else
>     SendMessage(fHandle, WM_SETTEXT, 0, LPARAM(PChar(Utf8ToAnsi(TheText))));
>
> But this doesn't seam to work, and the text is shown scrambled.

You're making the same error which Marco pointed out earlier: Utf8Decode returns a (reference counted) widestring, but you are not assigning it to anything so it ends up in a (reusable) temp location. As soon as the next temporary widestring has to be created, the previous one is destroyed and the pwidechar will point to a random memory block.

The same goes for the Utf8ToAnsi() call, except that here it is an ansistring instead of a widestring.

Jonas
Re: [fpc-pascal] Help making code unicode capable
On Nov 10, 2007 3:54 PM, Mattias Gaertner <[EMAIL PROTECTED]> wrote:
> Move(FilterBuffer[0], lpStrFilter^, Length(FilterBuffer) * 2 + 2);

The compiler wisely doesn't allow accessing [0], but:

  Move(FilterBuffer[1], lpStrFilter^, Length(FilterBuffer) * 2 + 2);

Seems to work perfectly =)

thanks,
--
Felipe Monteiro de Carvalho
[fpc-pascal] More help with unicode
Hello,

I still have some issues with unicode support =)

I am trying to implement unicode support for TMemo, but somehow the solution isn't simple. TMemo sends a WM_SETTEXT message when its text is set, like this:

  SendMessage(fHandle, WM_SETTEXT, 0, LPARAM(TheText));

Which I tried to convert to:

  if UnicodeEnabledOS then
    SendMessage(fHandle, WM_SETTEXT, 0, LPARAM(PWideChar(Utf8Decode(TheText))))
  else
    SendMessage(fHandle, WM_SETTEXT, 0, LPARAM(PChar(Utf8ToAnsi(TheText))));

But this doesn't seem to work, and the text is shown scrambled.

I would think that the control wasn't created with unicode functions, but this is impossible because handle creation is centralized and all other controls work normally when the same text is set on them.

I already debugged it, and the execution flow does go through that part. There is just something about using WM_SETTEXT to set the text of a control which escapes me.

Any ideas?

thanks,
--
Felipe Monteiro de Carvalho
Re: [fpc-pascal] Help making code unicode capable
On Sat, 10 Nov 2007 15:44:54 +0100
"Felipe Monteiro de Carvalho" <[EMAIL PROTECTED]> wrote:

> Thanks, I arrived at this:
>
> var
>   FilterBuffer: WideString;
> ...
>
>   FilterBuffer := Utf8Decode(Filter);
>   lpStrFilter := GetMem(Length(FilterBuffer) * 2 + 2);
>   Move(FilterBuffer, lpStrFilter, Length(FilterBuffer) * 2 + 2);
>
> But now it crashes when loading the dialog =/
>
> any ideas?

Move(FilterBuffer[0], lpStrFilter^, Length(FilterBuffer) * 2 + 2);

Mattias
Re: [fpc-pascal] Help making code unicode capable
Thanks, I arrived at this:

var
  FilterBuffer: WideString;
...

  FilterBuffer := Utf8Decode(Filter);
  lpStrFilter := GetMem(Length(FilterBuffer) * 2 + 2);
  Move(FilterBuffer, lpStrFilter, Length(FilterBuffer) * 2 + 2);

But now it crashes when loading the dialog =/

any ideas?

thanks,
--
Felipe Monteiro de Carvalho
Re: [fpc-pascal] Help making code unicode capable
On 10 Nov 2007, at 15:24, Felipe Monteiro de Carvalho wrote:

>> utf8decode returns string I assume?
>
> It returns WideString. Is there a function to manually alloc a
> widestring like StrAlloc?

Assign it to a variable of the type widestring. If you cannot keep this variable in scope the whole time, you have to allocate enough memory (length(widestring)*2+2 bytes) and copy the contents (including the terminating zero) to the allocated memory block.

Jonas
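The allocate-and-copy approach described above can be sketched as follows. This is only an illustration under stated assumptions: AllocWideCopy is a hypothetical helper name, not an RTL or LCL function, and the caller is assumed to release the block with FreeMem once the API no longer needs it.

```pascal
program widecopy;
{$mode objfpc}{$H+}

// Hypothetical helper: returns a heap copy of S, including the
// terminating zero, that stays valid after S goes out of scope.
function AllocWideCopy(const S: WideString): PWideChar;
begin
  // Length(S)*2 + 2 bytes: two bytes per character plus the terminator.
  Result := GetMem((Length(S) + 1) * SizeOf(WideChar));
  if Length(S) > 0 then
    // Copy from the first character (index 1), not from the variable
    // itself, which only holds a pointer to the string data.
    Move(S[1], Result^, Length(S) * SizeOf(WideChar));
  Result[Length(S)] := #0;  // write the terminating zero explicitly
end;

var
  Source, Back: WideString;
  P: PWideChar;
begin
  Source := UTF8Decode('abc');
  P := AllocWideCopy(Source);
  Source := '';            // the copy must survive this
  Back := P;               // read the copy back to check it
  WriteLn(Length(Back));
  FreeMem(P);
end.
```

Writing the terminator explicitly avoids relying on the WideString's own trailing zero being readable through S[1], and the empty-string check avoids indexing S[1] when there are no characters.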
Re: [fpc-pascal] Help making code unicode capable
> utf8decode returns string I assume?

It returns WideString. Is there a function to manually alloc a widestring like StrAlloc?

--
Felipe Monteiro de Carvalho
Re: [fpc-pascal] Help making code unicode capable
> On 10 Nov 2007, at 15:08, Felipe Monteiro de Carvalho wrote:
>
> > Having a win9x version is trivial, but having a version where we put a
> > PWideChar into lpStrFilter looks hard...
> >
> >   lpStrFilter := StrAlloc(Length(Filter)+1);
> >   StrPCopy(lpStrFilter, Utf8ToAnsi(Filter));
> >
> > I tryed this as unicode version:
> >
> >   lpStrFilter := PChar(PWideChar(Utf8Decode(Filter)));
> >
> > But it crashes the dialog when it is closed... I don't know what
> > exactly is happening here, or why the original code manually alocated
> > memory and then copyed the string, instead of just copying,
>
> Maybe because Filter may already have disappeared/freed by the time
> the dialog is closed? (and for some reason it may still need that text
> at that time)

utf8decode returns a string, I assume? I think it is that temp string that is lost.

Moreover, the first piece of code assumes that length(filter) >= length(utf8toansi(filter)).

I think that is ok in general, but I would assign the result of utf8toansi to a local string first, because even if a shortcut abusing temps works, that will be more comprehensible.
Re: [fpc-pascal] Help making code unicode capable
On 10 Nov 2007, at 15:08, Felipe Monteiro de Carvalho wrote:

> Having a win9x version is trivial, but having a version where we put a
> PWideChar into lpStrFilter looks hard...
>
>   lpStrFilter := StrAlloc(Length(Filter)+1);
>   StrPCopy(lpStrFilter, Utf8ToAnsi(Filter));
>
> I tryed this as unicode version:
>
>   lpStrFilter := PChar(PWideChar(Utf8Decode(Filter)));
>
> But it crashes the dialog when it is closed... I don't know what
> exactly is happening here, or why the original code manually alocated
> memory and then copyed the string, instead of just copying,

Maybe because Filter may already have disappeared/freed by the time the dialog is closed? (and for some reason it may still need that text at that time)

Jonas
[fpc-pascal] Help making code unicode capable
Hello,

I have a small piece of code in the LCL, in win32wsdialogs.pp, which I have found hard to convert to unicode:

  lpStrFilter := StrAlloc(Length(Filter)+1);
  StrPCopy(lpStrFilter, Filter);

lpStrFilter is a member of the LPOPENFILENAME winapi structure.

Having a win9x version is trivial, but having a version where we put a PWideChar into lpStrFilter looks hard...

  lpStrFilter := StrAlloc(Length(Filter)+1);
  StrPCopy(lpStrFilter, Utf8ToAnsi(Filter));

I tried this as the unicode version:

  lpStrFilter := PChar(PWideChar(Utf8Decode(Filter)));

But it crashes the dialog when it is closed... I don't know what exactly is happening here, or why the original code manually allocated memory and then copied the string, instead of just assigning it, so I'm lost about solving this.

thanks,
--
Felipe Monteiro de Carvalho
Re: [fpc-pascal] CSV via PCRE
Graeme Geldenhuys wrote:
> OK, while we are busy with show-and-tell... Then have a look at my
> token library implementation.

You've implemented some kind of 'cut'. But grep is also very useful (and more often used in a shell, at least by me).

Micha
Re: [fpc-pascal] CSV via PCRE
OK, while we are busy with show-and-tell... Then have a look at my token library implementation.

http://tinyurl.com/395vgp

* It's based on a Finite State Machine.
* No external units required.
* Allows multiple separators (user selectable) between tokens.
* Allows for user selectable separator characters.
* Does line number and position error reporting in case the CSV file is not well formatted.
* Only parses the string once, so if you request multiple tokens, it takes no performance hit.
* I also think it is much easier to understand and extend than that single regex, which looks more to me like you are cursing! ;-)

Sample Usage:

  tokenizer := TTokens.Create(FieldSpecLine, ', ', '"', '"', '\',
      tsMultipleSeparatorsBetweenTokens);
  try
    lField := tokenizer.Token(2);
    lAnotherField := tokenizer.Token(4);
  finally
    tokenizer.Free;
  end;

Regards,
- Graeme -

___
fpGUI - a cross-platform Free Pascal GUI toolkit
http://opensoft.homeip.net/fpgui/
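To make the state-machine idea concrete, here is a small self-contained sketch of a CSV record splitter. It is not the tiOPF TTokens code and makes no claim about how that unit works; TCsvState and SplitCsvRecord are hypothetical names, the separator is fixed to a comma, and the only quoting rule assumed is that a doubled quote inside a quoted field stands for one literal quote.

```pascal
program csvsplit;
{$mode objfpc}{$H+}

uses
  Classes;

type
  // States of this illustrative splitter.
  TCsvState = (csFieldStart, csUnquoted, csQuoted, csQuoteSeen);

// Split one CSV record into Fields, reading each character exactly once.
procedure SplitCsvRecord(const Line: string; Fields: TStrings);
var
  State: TCsvState;
  Field: string;
  i: Integer;
  c: Char;
begin
  Fields.Clear;
  State := csFieldStart;
  Field := '';
  for i := 1 to Length(Line) do
  begin
    c := Line[i];
    case State of
      csFieldStart, csUnquoted:
        if c = ',' then
        begin
          Fields.Add(Field);
          Field := '';
          State := csFieldStart;
        end
        else if (c = '"') and (State = csFieldStart) then
          State := csQuoted          // opening quote, not part of the field
        else
        begin
          Field := Field + c;
          State := csUnquoted;
        end;
      csQuoted:
        if c = '"' then
          State := csQuoteSeen       // closing quote or first half of ""
        else
          Field := Field + c;        // commas are ordinary text in here
      csQuoteSeen:
        if c = '"' then
        begin
          Field := Field + '"';      // "" inside quotes is a literal quote
          State := csQuoted;
        end
        else if c = ',' then
        begin
          Fields.Add(Field);
          Field := '';
          State := csFieldStart;
        end
        else
        begin
          Field := Field + c;        // lenient: keep text after closing quote
          State := csUnquoted;
        end;
    end;
  end;
  Fields.Add(Field);                 // the last field has no trailing comma
end;

var
  Fields: TStringList;
  i: Integer;
begin
  Fields := TStringList.Create;
  try
    SplitCsvRecord('one,"two, with comma","say ""hi""",,five', Fields);
    for i := 0 to Fields.Count - 1 do
      WriteLn(i + 1, ': [', Fields[i], ']');
  finally
    Fields.Free;
  end;
end.
```

The sample record splits into five fields: the second keeps its embedded comma, the third keeps its embedded quotes, and the fourth is empty. Unlike the library described above, this sketch does no error reporting; an unterminated quote simply runs to the end of the line.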