Re: [fpc-pascal] How to split file of whitespace separated numbers?

2016-12-24 Thread Luiz Americo Pereira Camara
2016-12-23 15:27 GMT-03:00 Marco van de Voort :

> In our previous episode, Graeme Geldenhuys said:
> > For many other things, plain code could be faster, but often a lot more
> > effort and time consuming to implement. Where as you could have written
> > a regex expression in under 10 seconds and accomplish the same task 8
> > lines of code or less - very little effort required.
>
> Writing or even worse, reading/debugging regex is about the most intensive
> effort there is IMHO.
>


Agreed, regex carries extra mental overhead. This is why I kept away
from it for a long time.

Early this year I needed to use it in one of my projects, so I decided to
bite the bullet and read the book Mastering Regular Expressions.

Once you understand the reasoning behind regex, it's a lot less
intimidating.

These days I use it occasionally.

By coincidence, yesterday I was writing code to parse raw text and extract
some data.

Initially I did it manually, but when I needed to extract a new field I
realized things would get even worse. Then I rewrote it with regex.

See diff here: https://www.diffchecker.com/NDDa9gpH

IMO much better.

I'm not saying that it is easy or that it should be used at will. But once
you learn the basics, regex is a valuable tool.

For debugging I use http://regexr.com/ and rely on unit tests to ensure
correctness.

Luiz
___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal

Re: [fpc-pascal] How to split file of whitespace separated numbers?

2016-12-24 Thread Sven Barth
On 24.12.2016 12:53, "Mark Morgan Lloyd" <
markmll.fpc-pas...@telemetry.co.uk> wrote:
>
> On 24/12/16 11:30, Lars wrote:
>>
>> On Fri, December 23, 2016 12:54 pm, Graeme Geldenhuys wrote:
>>>
>>> On 2016-12-23 18:27, Marco van de Voort wrote:
>>>
>>>> Writing or even worse, reading/debugging regex is about the most
>>>> intensive effort there is IMHO.
>>>
>>>
>>> So is standard programming code - if you don't know the syntax or how it
>>> works. ;-)  Also the reason why I posted a couple of links to regex sites
>>> to get the original poster started (in case he doesn't know regex). Here
>>> is another link (by the author of EditPad Pro), who really knows his
>>> regex!
>>>
>>> http://www.regular-expressions.info/tutorial.html
>>>
>>
>> Next thing to do: implement PERL inside pascal programs, compiled in perl.
>> Then, realize, why you didn't originally want to go there ;-)
>
>
> Or even allow FPC to call Lua.

You realize that we already have language bindings for Lua somewhere? ;)

Regards,
Sven

Re: [fpc-pascal] How to split file of whitespace separated numbers?

2016-12-24 Thread Mark Morgan Lloyd

On 24/12/16 11:30, Lars wrote:
> On Fri, December 23, 2016 12:54 pm, Graeme Geldenhuys wrote:
>> On 2016-12-23 18:27, Marco van de Voort wrote:
>>
>>> Writing or even worse, reading/debugging regex is about the most
>>> intensive effort there is IMHO.
>>
>> So is standard programming code - if you don't know the syntax or how it
>> works. ;-)  Also the reason why I posted a couple of links to regex sites
>> to get the original poster started (in case he doesn't know regex). Here
>> is another link (by the author of EditPad Pro), who really knows his
>> regex!
>>
>> http://www.regular-expressions.info/tutorial.html
>
> Next thing to do: implement PERL inside pascal programs, compiled in perl.
> Then, realize, why you didn't originally want to go there ;-)

Or even allow FPC to call Lua.

I know this is rare and probably wouldn't happen outside the "season of 
goodwill", but I actually agree with Graeme here: regexes are useful. 
BUT FFS DOCUMENT WHAT YOU'RE DOING FOR PEOPLE WHO DON'T UNDERSTAND THEM!
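
In that spirit, here is a minimal sketch of what a documented regex for
the splitting problem in this thread might look like. It assumes the
TRegExpr class from FPC's RegExpr unit; the function name SplitTokens,
the type TTokenArray and the sample line are made up for illustration.

```pascal
program SplitWithRegex;
{$mode objfpc}{$H+}

uses
  RegExpr;

type
  TTokenArray = array of string;   // hypothetical helper type

// Hypothetical helper: returns every whitespace-delimited token of Line.
function SplitTokens(const Line: string): TTokenArray;
var
  Re: TRegExpr;
  Count: Integer;
begin
  Result := nil;
  Count := 0;
  Re := TRegExpr.Create;
  try
    // \S+ = one or more consecutive non-whitespace characters,
    //       i.e. exactly one number. Whatever lies between two
    //       matches (any mix of spaces and TABs, any width) is
    //       skipped, so the separator width does not matter.
    Re.Expression := '\S+';
    if Re.Exec(Line) then
      repeat
        SetLength(Result, Count + 1);
        Result[Count] := Re.Match[0];   // Match[0] is the whole match
        Inc(Count);
      until not Re.ExecNext;
  finally
    Re.Free;
  end;
end;

var
  Token: string;
begin
  // Leading spaces, 4-space, TAB and 2-space separators in one line.
  for Token in SplitTokens('   0.000    0.000'#9'7.000  29.6628') do
    WriteLn(Token);
end.
```

The comment above the expression is the point: someone who does not read
regex can still see what is matched and why.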



--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]


Re: [fpc-pascal] How to split file of whitespace separated numbers?

2016-12-24 Thread Lars
On Fri, December 23, 2016 12:54 pm, Graeme Geldenhuys wrote:
> On 2016-12-23 18:27, Marco van de Voort wrote:
>
>> Writing or even worse, reading/debugging regex is about the most
>> intensive effort there is IMHO.
>
> So is standard programming code - if you don't know the syntax or how it
> works. ;-)  Also the reason why I posted a couple of links to regex sites
> to get the original poster started (in case he doesn't know regex). Here
> is another link (by the author of EditPad Pro), who really knows his
> regex!
>
> http://www.regular-expressions.info/tutorial.html
>

Next thing to do: implement PERL inside pascal programs, compiled in perl.
Then, realize, why you didn't originally want to go there ;-)


Re: [fpc-pascal] How to split file of whitespace separated numbers?

2016-12-24 Thread Luiz Americo Pereira Camara
On 23 Dec 2016 05:15, "Bo Berglund"  wrote:

Is there a quick way to split a string of whitespace separated values
into the separate members?



Unit StrUtils:

WordCount + ExtractWord

or

ExtractSubstr in a loop


http://www.freepascal.org/docs-html/rtl/strutils/extractsubstr.html
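
A minimal sketch of the WordCount + ExtractWord route, assuming that
spaces and TABs are the only separators and that '.' is the decimal
separator. WordCount and ExtractWord are the StrUtils routines named
above; the function name SplitLine, the type TDoubleArray as used here
and the sample line are made up for illustration.

```pascal
program SplitWithStrUtils;
{$mode objfpc}{$H+}

uses
  SysUtils, StrUtils;

type
  TDoubleArray = array of Double;

const
  // Assumption: only space and TAB separate the values.
  Separators = [' ', #9];

// Hypothetical helper: converts one line of text into an array of doubles.
function SplitLine(const Line: string): TDoubleArray;
var
  Fmt: TFormatSettings;
  i, n: Integer;
begin
  Result := nil;
  // Force '.' as decimal separator regardless of the system locale.
  Fmt := DefaultFormatSettings;
  Fmt.DecimalSeparator := '.';

  // Runs of separators count as one, so leading spaces and
  // varying gap widths need no special handling.
  n := WordCount(Line, Separators);
  SetLength(Result, n);
  for i := 1 to n do                      // ExtractWord is 1-based
    Result[i - 1] := StrToFloat(ExtractWord(i, Line, Separators), Fmt);
end;

var
  V: Double;
begin
  for V in SplitLine('   0.000    0.000    7.000    0.000  29.6628') do
    WriteLn(V:0:4);
end.
```

ExtractWord scans from the start of the string on every call, which
should not matter much for lines of 4-6 values but is worth measuring on
the real files.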

Luiz

I have to create a function to process a number of big data files where
numbers are stored in lines of 4-6 values using whitespace in between.
First I got a sample looking like this:
{code}
0.4167    0.3636    -14.1483    227.2260
{code}
Here the separators were 4 spaces so on each line I used (slDecode is
a TStringList):
{code}
  sLine := StringReplace(sLine, '    ', #13, [rfReplaceAll]);
  slDecode.Text := sLine;
{code}
Worked fine if a bit slow...
The stringlist items are then passed to a string to float function and
stored into a dynamic array.

But then it failed on a file containing lines like this:
{code}
   0.000    0.000    7.000    0.000  29.6628
{code}
Here there are 3 leading spaces, and one separator is only 2 spaces
wide. So I had to modify the code:
{code}
  sLine := Trim(sLine);
  sLine := StringReplace(sLine, '    ', #13, [rfReplaceAll]);
  sLine := StringReplace(sLine, '  ', #13, [rfReplaceAll]);
  slDecode.Text := sLine;
{code}

This works in this case but now I realize I need something better,
which can deal with a varying number of whitespace chars in between
numbers.
The test files are very big, like half a million lines and up, so I
cannot introduce a lot of code in the loop since processing time will
increase.

Is there any good and quick way to extract real data from a space
separated list without knowing beforehand the size of the whitespace
separators?

I guess that my next sample problem will be a file with TAB rather
than space or even mixed TAB and space...

--
Bo Berglund
Developer in Sweden


Re: [fpc-pascal] How to split file of whitespace separated numbers?

2016-12-24 Thread Lars
On Fri, December 23, 2016 4:49 am, Howard Page-Clark wrote:
> On 23/12/16 08:14, Bo Berglund wrote:
>
>> Is there a quick way to split a string of whitespace separated values
>> into the separate members?
> It is possible that a custom string parser (something along these lines)
> might improve your processing speed:
>
> type TDoubleArray = array of Double;
>
>
> function StrToDblArray(const aString: string): TDoubleArray;
> var c: Char;

And as soon as Char is involved, Unicode gets screwed up.

Am I right, am I right...

But if he is not dealing with any Unicode data and it is all simple chars,
it should be okay.
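
For reference, a minimal sketch of a Char-based scanner of the kind
discussed here. This is not Howard's original code, whose body is cut
off in the quote above; the function name ScanDoubles and the sample
line are made up for illustration, and it assumes space and TAB are the
only separators and '.' is the decimal separator. Because it only
compares against space and TAB, it also stays safe on UTF-8 input:
bytes of multi-byte UTF-8 sequences are all >= $80 and can never be
mistaken for either separator.

```pascal
program SplitByScanning;
{$mode objfpc}{$H+}

uses
  SysUtils;

type
  TDoubleArray = array of Double;

// Hypothetical helper: one pass over Line, splitting on runs of
// spaces/TABs and converting each token with StrToFloat.
function ScanDoubles(const Line: string): TDoubleArray;
var
  Fmt: TFormatSettings;
  i, Start, Count, Len: Integer;
begin
  Result := nil;
  Fmt := DefaultFormatSettings;
  Fmt.DecimalSeparator := '.';     // do not depend on the system locale
  Len := Length(Line);
  SetLength(Result, 8);            // room for the expected 4-6 values
  Count := 0;
  i := 1;
  while i <= Len do
  begin
    // Skip any run of separators (also handles leading whitespace).
    while (i <= Len) and (Line[i] in [' ', #9]) do
      Inc(i);
    if i > Len then
      Break;
    // Collect one token up to the next separator or end of line.
    Start := i;
    while (i <= Len) and not (Line[i] in [' ', #9]) do
      Inc(i);
    if Count = Length(Result) then
      SetLength(Result, 2 * Count); // grow if a line has more values
    // Raises EConvertError if the token is not a valid number.
    Result[Count] := StrToFloat(Copy(Line, Start, i - Start), Fmt);
    Inc(Count);
  end;
  SetLength(Result, Count);        // trim to the number of values found
end;

var
  V: Double;
begin
  for V in ScanDoubles(#9'0.4167    0.3636'#9'-14.1483  227.2260') do
    WriteLn(V:0:4);
end.
```

In a loop over half a million lines, the TFormatSettings setup could be
moved out of the function and done once.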
