Re: [Gambas-user] Help with some parsing

2009-02-18 Thread Benoît Minisini
 nando wrote:
  I got a chuckle from this one.
  A typewriter, one of those mechanical things that now appear only in
  black-and-white movies and museums, actually performed CR-LF using that
  bar to do the line feed and carriage return. And CR LF comes from that.
  In all the software I write for HTTP and ASCII files, when reading, I only
  look for the LF. If I find a CR, I just ignore it. I have yet to find a
  case where that doesn't work.

 I agree partially with you - I don't like the CR-LF convention: I think
 that a single LF is enough.
 But your programs, even if they ignore CR when reading, still use CR-LF
 when writing (I hope!), otherwise they do not respect the standard.

 And about the standard... someone in the past said a single LF would be
 the line terminator. But someone else thought that CR was enough (Apple
 computers for example, and before, other mainframes like VAX). Then,
 still someone else thought that neither was good, and used CRLF. If the
 American DoD, from which our Internet comes, decided CR-LF was the way
 to go, they too had good reasons, I think, even if I don't know them. I
 admire the people who invented IP, TCP/IP, FTP and HTTP so much that I
 accept what they said: CR-LF is the right line terminator even if I don't
 like it.


Maybe this is the reason why they decided to use CR-LF. This way, lines can be 
easily interpreted on Unix, Windows, and MacOS. Choosing just CR or just LF 
would have made things more difficult for the other two OSes.

Just my 2 cents.

-- 
Benoît

--
Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
-OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
-Strategies to boost innovation and cut costs with open source participation
-Receive a $600 discount off the registration fee with the source code: SFAD
http://p.sf.net/sfu/XcvMzF8H
___
Gambas-user mailing list
Gambas-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/gambas-user


Re: [Gambas-user] Help with some parsing

2009-02-17 Thread richard terry
On Tue, 17 Feb 2009 05:55:10 pm you wrote:
 So my question is how to discover what the character is (chr$(10)? chr$(13)?)
 and how to eliminate those before parsing.

 CHR$(10) & CHR$(13) - carriage return & line feed
 Those are special control characters from the stone age of computing,
 used to go to the next line (add a new line). They were needed in the
 days when a matrix printer with an ink ribbon was the computer's only
 output device.


 Try using REPLACE$. I have never used it in Gambas, but the same function in
 VB6 was able to remove any character (including those with special functions):

 ResultString = REPLACE$(InputStringWithBadChars, CHR$(10), "")
 ResultString = REPLACE$(ResultString, CHR$(13), "")


 This will search (in 2 stages) first for CHR$(10) and then for CHR$(13) and
 replace them with empty strings...
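
For illustration, the same two-stage removal can be sketched in Python (not Gambas; `strip_line_breaks` is a made-up name). One caveat, stated as an assumption about how it would be used: run over a whole file, it would also remove the breaks that separate the records, so it is probably best applied to a single record or field once that has been isolated.

```python
def strip_line_breaks(text: str) -> str:
    """Remove every LF (character 10) and CR (character 13) from text."""
    # Two passes, as in the REPLACE$ suggestion: first LF, then CR.
    without_lf = text.replace(chr(10), "")
    return without_lf.replace(chr(13), "")
```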


 hope this helps
That's the sort of solution I was looking for and I will try it and see if it 
works better than my crude solution:

In the meantime I found another solution. I re-exported from msAccess, but 
added a dummy field at the end, which I called 'end', so a complete line of 
data would look like

surname|firstname|birthdate|socialhistory|end

I then read the lines concatenating them until I had a complete line:

  Line Input #hfile, sLineInput
  sIncompleteLine = sLineInput             'keep a copy
  Do Until Right(sIncompleteLine, 4) = "|end"
    Line Input #hfile, sLineInput          'not complete yet, so get the next line
    sIncompleteLine = sIncompleteLine & sLineInput  'and append it to what we have so far
  Loop
  sLineInput = sIncompleteLine             'ok now, keep going

  'Line is complete, so split it up into bits
  bits = Split(sLineInput, "|")

Thanks all. Benoît, I'll look at the Gambas data thing, but I've never been 
able to get it to work - it won't read my postgres database without crashing. 
I've posted about that before.

Regards

richard






 kind regards

 Emil


Re: [Gambas-user] Help with some parsing

2009-02-17 Thread Doriano Blengino
nando wrote:
 I got a chuckle from this one.
 A typewriter, one of those mechanical things that now appear only in
 black-and-white movies and museums, actually performed CR-LF using that
 bar to do the line feed and carriage return. And CR LF comes from that.
 In all the software I write for HTTP and ASCII files, when reading, I only
 look for the LF. If I find a CR, I just ignore it. I have yet to find a
 case where that doesn't work.
   
I agree partially with you - I don't like the CR-LF convention: I think 
that a single LF is enough.
But your programs, even if they ignore CR when reading, still use CR-LF 
when writing (I hope!), otherwise they do not respect the standard.
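
A rough sketch of that "read tolerantly, write strictly" habit, in Python rather than Gambas. The names `read_lines` and `write_lines` are hypothetical, and working on raw bytes is an assumption made only to keep the line endings visible.

```python
def read_lines(data: bytes) -> list:
    """Split on LF only; any CR found is simply ignored."""
    lines = data.replace(b"\r", b"").split(b"\n")
    if lines and lines[-1] == b"":
        # The data ended with a terminator, so drop the empty tail.
        lines.pop()
    return lines


def write_lines(lines) -> bytes:
    """Terminate every line with CR-LF when writing."""
    return b"".join(line + b"\r\n" for line in lines)
```

With this, input using either convention (or a mix of both) reads the same, while everything written back out uses CR-LF.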

And about the standard... someone in the past said a single LF would be 
the line terminator. But someone else thought that CR was enough (Apple 
computers for example, and before, other mainframes like VAX). Then, 
still someone else thought that neither was good, and used CRLF. If the 
American DoD, from which our Internet comes, decided CR-LF was the way 
to go, they too had good reasons, I think, even if I don't know them. I 
admire the people who invented IP, TCP/IP, FTP and HTTP so much that I 
accept what they said: CR-LF is the right line terminator even if I don't 
like it.

Pardon me for getting so heated about this, but I find it difficult to 
think that CR-LF is bad only because it is so old... :-) The only 
important thing is to cooperate on a common standard. Well: the 
IP/TCP/HTTP/HTML standard is out there, and it works. It has been 
developed in full democracy (you know what RFC stands for), and so many 
different kinds of computer can intercommunicate thanks to this standard. 
Do you want to say this is a stupid thing?

And finally... I still use what you could call old museum printers, 
which need CR *and* LF. There are some around, even if you don't see 
them... the world is not made only of computers with 4 GB of RAM and 
30 PPM laser printers...

Best regards, and don't get too angry with me...

-- 
Doriano Blengino

Listen twice before you speak.
This is why we have two ears, but only one mouth.




Re: [Gambas-user] Help with some parsing

2009-02-17 Thread Emil Tchekov
Hello Doriano,

I must apologize - it was never my intention to offend your, as it seems,
deep feelings about these particular 2 bytes :-D

Of course you are right - they are still in use (and will be in the future
too)...

But here is the point where I do not agree with you:

I do not know how old you are... My first date with CRLF was on my very
expensive (4000 USD) Apple ][ with the huge amount of 48 KB RAM, 16 KB
ROM, a 144 KB floppy disk and a rare extension - a video card allowing me
to work with 80x25 chars instead of the usual 40x25... Not to mention the
extraordinary calculation power of the 1 MHz 6502 CPU ;-).

So please excuse me for calling this the stone age, but the very simple
mobile phone lying in my pocket offers 2 GB of storage, in a volume in
which the old Apple used to store about 16 KB, and about 300 MHz of
RISC-based power...

It was not an attempt to diminish the meaning and importance of those
control chars, but simply to show how long they have been in use...

Very best regards

And once more, excuse my lack of respect


Emil Tchekov










-----Original Message-----
From: Doriano Blengino [mailto:doriano.bleng...@fastwebnet.it]
Sent: Tuesday, 17 February 2009 23:15
To: mailing list for gambas users
Subject: Re: [Gambas-user] Help with some parsing


Emil Tchekov wrote:
 CHR$(10) & CHR$(13) - carriage return & line feed
 Those are special control characters from the stone age of computing,
 used to go to the next line (add a new line). They were needed in the
 days when a matrix printer with an ink ribbon was the computer's only
 output device.

Pardon me for pointing this out, but I disagree:
chr$(13) is CR; chr$(10) is LF.
While it is true they are an old standard, saying they are from the
stone age is not fair. The HTTP protocol relies on CR-LF, and you must
admit that HTTP is alive and well - not a stone age standard. The
fact that CR-LF is so old is proof of its strength.
Moreover, text files are the most expressive and versatile form of
digital data. HTML, XML, SVG, PostScript and PDF are all in wide use and
are based on ASCII, and so they contain CR-LF sequences.

The fact that Unix/Linux uses a single LF instead of CR-LF is pretty
marginal - it was simply a design choice. A Windows HTTP server can take
text files from disk and serve them verbatim, while a Unix HTTP server
has to translate them (to add the missing CR). Anyway, the concept is
the same. Most programs today can cope perfectly well with these two
standards.

In the end, the only thing I wanted to say is that CR and LF are far
more than what you depicted, and they are still needed.

Regards,

--
Doriano Blengino

Listen twice before you speak.
This is why we have two ears, but only one mouth.





[Gambas-user] Help with some parsing

2009-02-16 Thread richard terry
I'm importing some old data from windows. The data is exported from msAccess97 
and some of the text fields have a carriage return of some sort in the 
middle. The same character seems to be used by access as the end of line for 
each record, so when I try and import it, I'm getting a truncated line, which 
then mucks up the next line

I wondered how one could check this

eg: If I look at the file in a text editor with line numbers:

1 Doe|john|01/02/1950|some text in here saying something
2 Smith|Peter|19/02/1944|also some text 
3 but is split onto a new line so the parser crashes
4 Brown|Michael|17/05/1966|but this line is ok

ie: there is a character in there, which I've designated here as [xx], that is 
interpreted as a new line


1 Doe|john|01/02/1950|some text in here saying something
2 Smith|Peter|19/02/1944|also some text [xx]
3 but is split onto a new line so the parser crashes
4 Brown|Michael|17/05/1966|but this line is ok

So my question is how to discover what the character is (chr$(10)? chr$(13)?)
and how to eliminate those before parsing.

Thanks in anticipation.

richard




Re: [Gambas-user] Help with some parsing

2009-02-16 Thread Doriano Blengino
richard terry wrote:
 I'm importing some old data from windows. The data is exported from
 msAccess97 and some of the text fields have a carriage return of some sort
 in the middle. The same character seems to be used by access as the end of
 line for each record, so when I try and import it, I'm getting a truncated
 line, which then mucks up the next line

 ie: there is a character in there, which I've designated here as [xx], that
 is interpreted as a new line


 1 Doe|john|01/02/1950|some text in here saying something
 2 Smith|Peter|19/02/1944|also some text [xx]
 3 but is split onto a new line so the parser crashes
 4 Brown|Michael|17/05/1966|but this line is ok

 So my question is how to discover what the character is (chr$(10)? chr$(13)?)
 and how to eliminate those before parsing.
   
You could open your data file with a hex editor, or use hexdump or 
similar, and look at both the correct lines and the split ones to see 
how they end. If their endings differ, you could read your file in 
binary mode and reconstruct it in code.
But I suspect that all the lines end in the same way - CR-LF.
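
If no hex editor is at hand, a few lines of code can answer the same question. What follows is a minimal Python sketch (not Gambas) with hypothetical helper names; it assumes the export is small enough to read into memory in one go.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CR-LF pairs, lone LFs and lone CRs in raw file bytes."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "LF only": data.count(b"\n") - crlf,
        "CR only": data.count(b"\r") - crlf,
    }


def show_control_bytes(data: bytes) -> str:
    """Keep printable ASCII as-is and show every other byte as [xx] in hex."""
    return "".join(chr(b) if 32 <= b < 127 else f"[{b:02x}]" for b in data)
```

Feeding it the raw bytes of the export, read in binary mode, shows at a glance whether the broken lines end differently from the complete ones: a CR appears as [0d] and an LF as [0a].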

Another way, probably better, is to work from the number of fields a line 
should have; say every line must have 3 fields and 2 pipes (|). You 
read the file line by line and, if a line does not contain the expected 
number of pipes, it is a continuation and must be appended to the one 
read before. In the excerpt you reported, line number 3 contains no 
pipes, so it should be joined to the previous line.
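
The pipe-counting idea could be sketched roughly as follows, in Python rather than Gambas. The name `join_continuations` and its `expected_pipes` parameter are made up for the example. It assumes the stray line break only ever falls inside the last field, and that a continuation never happens to contain exactly the expected number of pipes; if either assumption fails, a complete-looking fragment would be mistaken for a new record.

```python
def join_continuations(lines, expected_pipes):
    """Glue continuation lines back onto the record they belong to.

    A line holding exactly `expected_pipes` '|' characters starts a new
    record; any other line is treated as the tail of the previous one.
    """
    records = []
    for line in lines:
        line = line.rstrip("\r\n")  # drop the terminator, whatever it is
        if line.count("|") == expected_pipes or not records:
            records.append(line)
        else:
            # Continuation: join with a single space in place of the break.
            records[-1] = records[-1].rstrip() + " " + line.lstrip()
    return records
```

Applied to the four-line excerpt with `expected_pipes=3`, this yields three records, the second being the Smith line with its tail reattached.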

Hope this helps - regards,

-- 
Doriano Blengino

Listen twice before you speak.
This is why we have two ears, but only one mouth.

