Re: [Gambas-user] Help with some parsing
> nando wrote:
>> I got a chuckle from this one. A typewriter, one of those mechanical things that are only in black-and-white movies and museums, actually performed CR-LF using that bar to do the line feed and the carriage return. And CR LF comes from that. In all the software I write for HTTP and ASCII files, when reading I only look for the LF. If I find a CR, I just ignore it. I have yet to find a case where this doesn't work.
>
> I agree partially with you: I don't like the CR-LF convention, and I think that a single LF is enough. But your programs, even if they ignore CR when reading, still use CR-LF when writing (I hope!), otherwise they do not respect the standard. And about the standard... someone in the past said a single LF would be the line terminator. But someone else thought that CR was enough (Apple computers for example, and before them other mainframes like VAX). Then someone else again thought that neither was good, and used CR-LF. If the American DoD, from which our Internet comes, decided CR-LF was the way to go, they too had good reasons, I think, even if I don't know them. I admire those who invented IP, TCP/IP, FTP and HTTP so much that I accept what they said: CR-LF is the right line terminator, even if I don't like it.

Maybe this is the reason why they decided to use CR-LF: this way, lines can be easily interpreted on Unix, Windows, and MacOS. Choosing just CR or just LF would have made things more difficult for the other two OSes.

Just my 2 cents.

--
Benoît

--
Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
-OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
-Strategies to boost innovation and cut costs with open source participation
-Receive a $600 discount off the registration fee with the source code: SFAD
http://p.sf.net/sfu/XcvMzF8H
___
Gambas-user mailing list
Gambas-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/gambas-user
Re: [Gambas-user] Help with some parsing
On Tue, 17 Feb 2009 05:55:10 pm you wrote:
>> So my question is: how to discover what the character is (chr$(10)? chr$(13)?) and how to eliminate those before parsing.
>
> CHR$(10) CHR$(13) - carriage return line feed
>
> Those are special command characters from the stone age of informatics, used to go to the next line (add a new line). They were needed in the times when a matrix printer with an ink ribbon was the single output device of the computer.
>
> Try using REPLACE$. I never used it in GAMBAS, but the same operator in VB6 was able to remove every char (also those with special functions):
>
>   ResultString = REPLACE$(InputStringWithBadChars, CHR$(10), "")
>   ResultString = REPLACE$(ResultString, CHR$(13), "")
>
> This will search (in 2 stages) first for CHR$(10) and then for CHR$(13) and replace them with empty strings... hope this helps

That's the sort of solution I was looking for, and I will try it and see if it works better than my crude solution.

In the meantime I found another solution. I re-exported from msAccess, but added a dummy field at the end that I called 'end', so a theoretically complete line of data would look like

  surname|firstname|birthdate|socialhistory|end

I then read the lines, concatenating them until I had a complete line:

  Line Input #hfile, sLineInput
  sIncompleteLine = sLineInput                       'keep copy
  Do Until Right(sIncompleteLine, 4) = "|end"        'complete line yet?
    Line Input #hfile, sLineInput                    'No, then get the next line
    sIncompleteLine = sIncompleteLine & sLineInput   'and concatenate
  Loop
  sLineInput = sIncompleteLine                       'ok now, keep going.
  'Line is complete, so split up into bits
  bits = Split(sLineInput, "|")

Thanks all, and Benoit, I'll look at the gambas data thing, but I've never been able to get it to work: it won't read my postgres database without crashing. I've posted that before.
Regards
richard

> kind regards
> Emil
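The sentinel-field idea described above can be sketched in a language-neutral way. This is only an illustration in Python rather than Gambas; the function name `join_until_sentinel` and the sample rows are made up for the example, and it assumes every exported record really ends with the dummy `|end` field.

```python
def join_until_sentinel(lines, sentinel="|end"):
    """Yield complete records as lists of fields.

    Physical lines are glued together until the accumulated text ends with
    the sentinel (hypothetical helper, not part of Gambas or Access).
    """
    pending = ""
    for raw in lines:
        # Drop the line terminator (LF, plus a CR if there is one).
        pending += raw.rstrip("\r\n")
        if pending.endswith(sentinel):
            # Cut the sentinel off, then split the record on the pipes.
            yield pending[: -len(sentinel)].split("|")
            pending = ""
    if pending:
        raise ValueError("input ended in the middle of a record: %r" % pending)


# Made-up sample: the second record is broken across two physical lines.
sample = [
    "Doe|john|01/02/1950|some text in here saying something|end\r\n",
    "Smith|Peter|19/02/1944|also some text \r\n",
    "but is split onto a new line|end\r\n",
]
records = list(join_until_sentinel(sample))
```

Note that removing CHR$(10) and CHR$(13) with a replace only helps if it is applied to text that still contains them; once a file has been read line by line, the break has already been consumed, which is why the joining step is needed.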
Re: [Gambas-user] Help with some parsing
nando wrote:
> I got a chuckle from this one. A typewriter, one of those mechanical things that are only in black-and-white movies and museums, actually performed CR-LF using that bar to do the line feed and the carriage return. And CR LF comes from that. In all the software I write for HTTP and ASCII files, when reading I only look for the LF. If I find a CR, I just ignore it. I have yet to find a case where this doesn't work.

I agree partially with you: I don't like the CR-LF convention, and I think that a single LF is enough. But your programs, even if they ignore CR when reading, still use CR-LF when writing (I hope!), otherwise they do not respect the standard.

And about the standard... someone in the past said a single LF would be the line terminator. But someone else thought that CR was enough (Apple computers for example, and before them other mainframes like VAX). Then someone else again thought that neither was good, and used CR-LF. If the American DoD, from which our Internet comes, decided CR-LF was the way to go, they too had good reasons, I think, even if I don't know them. I admire those who invented IP, TCP/IP, FTP and HTTP so much that I accept what they said: CR-LF is the right line terminator, even if I don't like it.

Pardon me for getting so heated about this. I find it difficult to think that CR-LF is bad only because it is so old... :-) The only important thing is to cooperate on a common standard. Well: the IP/TCP/HTTP/HTML standard is out there, and it works. It has been developed in full democracy (you know what RFC stands for), and so many different kinds of computers can intercommunicate thanks to this standard. Do you want to say this is a stupid thing?

And finally... I still use what you could call old museum printers, which need CR *and* LF. There are some around, even if you don't see them... the world is not only made of 4 GB RAM computers and 30 PPM laser printers...

Best regards, and don't get too angry with me...
--
Doriano Blengino
Listen twice before you speak. This is why we have two ears, but only one mouth.
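The habit discussed in this message (accept a bare LF when reading, ignore a CR in front of it, but always write CR-LF) can be sketched roughly as follows. This is an illustrative Python sketch, not Gambas; both function names and the sample request are invented for the example.

```python
def split_lines_lenient(data: bytes):
    """Split on LF and drop any CR at the end of each line, so that both
    LF-terminated and CR-LF-terminated input are accepted."""
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # the final terminator does not start a new line
    return [line.rstrip(b"\r") for line in lines]


def join_lines_crlf(lines):
    """Write every line with an explicit CR-LF terminator."""
    return b"".join(line + b"\r\n" for line in lines)


# Made-up input mixing both conventions: the second line ends in a bare LF.
mixed = b"GET / HTTP/1.1\r\nHost: example.org\n\r\n"
parsed = split_lines_lenient(mixed)
rewritten = join_lines_crlf(parsed)
```

Being lenient on input and strict on output is what lets such a program interoperate with peers that follow either convention.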
Re: [Gambas-user] Help with some parsing
Hello Doriano,

I have to apologize: it was never my intention to insult your, as it seems, deeper feelings regarding these particular 2 bytes :-D

Of course you are right: they are still in use (and will be in the future too)...

But here is the point where I do not agree with you. I do not know how old you are... My first date with CRLF was on my very expensive (4000 USD) Apple ][ with the huge amount of 48 KB RAM, 16 KB ROM, a 144 KB floppy disk and a rare extension: a video card allowing me to work with 80x25 chars instead of the usual 40x25... Not to mention the extraordinary calculation power based on the 1 MHz 6502 CPU ;-).

So please excuse me when I call this the stone age, but the very simple mobile phone lying in my pocket offers 2 GB of storage capacity in the volume that the old Apple needed to store about 16 KB, and about 300 MHz of RISC-based power...

It was not an attempt to reduce the meaning and importance of those command chars, but simply to show how long they have been in use...

Very best regards
And one more time, excuse my lack of respect

Emil Tchekov

-----Original Message-----
From: Doriano Blengino [mailto:doriano.bleng...@fastwebnet.it]
Sent: Tuesday, 17 February 2009 23:15
To: mailing list for gambas users
Subject: Re: [Gambas-user] Help with some parsing

Emil Tchekov wrote:
> CHR$(10) CHR$(13) - carriage return line feed
>
> Those are special command characters from the stone age of informatics, used to go to the next line (add a new line). They were needed in the times when a matrix printer with an ink ribbon was the single output device of the computer.

Pardon me for pointing this out, but I disagree: chr$(13) is CR; chr$(10) is LF. While it is true they are an old standard, saying they are from the stone age is not fair. The HTTP protocol relies on CR-LF, and you must admit that HTTP is well alive, not a stone age standard. The fact that CR-LF is so old is proof of its power.

Moreover, text files are the most expressive and versatile form of digital data. Html, Xml, Svg, Postscript and PDF are all in wide use and are based on ASCII, and so they contain CR-LF sequences. The fact that Unix/Linux uses a single LF instead of a CR-LF is pretty marginal: it was simply a design choice. A Windows HTTP server can take text files from disk and serve them verbatim, while a Unix HTTP server has to translate them (to add the missing CR). Anyway, the concept is the same. Most programs today can cope perfectly well with these two standards.

In the end, the only thing I wanted to say is that CR and LF are far more than what you depicted, and they are still needed.

Regards,

--
Doriano Blengino
Listen twice before you speak. This is why we have two ears, but only one mouth.
[Gambas-user] Help with some parsing
I'm importing some old data from Windows. The data is exported from msAccess97, and some of the text fields have a carriage return of some sort in the middle. The same character seems to be used by Access as the end of line for each record, so when I try to import it, I get a truncated line, which then mucks up the next line.

I wondered how one could check this, e.g. if I look at the file in a text editor with line numbers:

1 Doe|john|01/02/1950|some text in here saying something
2 Smith|Peter|19/02/1944|also some text
3 but is split onto a new line so the parser crashes
3 Brown|Michael|17/05/1966|but this line is ok

i.e. there is a character in there, which I've designated here as [xx], that is interpreted as a new line:

1 Doe|john|01/02/1950|some text in here saying something
2 Smith|Peter|19/02/1944|also some text [xx]
3 but is split onto a new line so the parser crashes
3 Brown|Michael|17/05/1966|but this line is ok

So my question is: how to discover what the character is (chr$(10)? chr$(13)?) and how to eliminate those before parsing.

Thanks in anticipation.

richard
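One way to find out which control characters a file actually contains is to look at its raw bytes instead of at decoded lines. The sketch below is Python rather than Gambas and is only an illustration; the function name `describe_line_endings` and the sample bytes are made up, and a real export would be read in binary mode, for instance with `open(path, "rb")`.

```python
def describe_line_endings(data: bytes):
    """Count CR-LF pairs, lone LF bytes and lone CR bytes in raw data."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "LF only": data.count(b"\n") - crlf,  # chr(10) not preceded by chr(13)
        "CR only": data.count(b"\r") - crlf,  # chr(13) not followed by chr(10)
    }


# Made-up bytes standing in for a few lines of the exported file.
sample = (
    b"Smith|Peter|19/02/1944|also some text \r\n"
    b"but is split onto a new line\r\n"
    b"Brown|Michael|17/05/1966|ok\n"
)
counts = describe_line_endings(sample)

# repr() makes the otherwise invisible characters visible as \r and \n.
preview = repr(sample[:45])
```

If the break inside a field and the break at the end of a record turn out to be the same byte sequence, the bytes alone cannot tell them apart and the record structure has to be used instead.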
Re: [Gambas-user] Help with some parsing
richard terry wrote:
> I'm importing some old data from Windows. The data is exported from msAccess97, and some of the text fields have a carriage return of some sort in the middle. The same character seems to be used by Access as the end of line for each record, so when I try to import it, I get a truncated line, which then mucks up the next line.
>
> i.e. there is a character in there, which I've designated here as [xx], that is interpreted as a new line:
>
> 1 Doe|john|01/02/1950|some text in here saying something
> 2 Smith|Peter|19/02/1944|also some text [xx]
> 3 but is split onto a new line so the parser crashes
> 3 Brown|Michael|17/05/1966|but this line is ok
>
> So my question is: how to discover what the character is (chr$(10)? chr$(13)?) and how to eliminate those before parsing.

You could open your data file with a hex editor, or use hexdump or similar, and look at both the correct lines and the split ones to see how they end. If their endings are different, you could read your file in binary mode and reconstruct it in code. But I suspect that all the lines end in the same way: CR-LF.

Another way, probably better, is to work out the number of fields a line should have; say, every line must have 3 fields and 2 pipes "|". You read the file line by line and, if you don't find the expected number of pipes, then the line is a continuation and must be appended to the one read before. In the excerpt you reported, line number 3 does not contain pipes, so it should be joined with the previous one.

Hope this helps - regards,

--
Doriano Blengino
Listen twice before you speak. This is why we have two ears, but only one mouth.
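The pipe-counting rule suggested above could look roughly like this. It is a Python sketch rather than Gambas, the name `join_continuations` is invented, and it assumes 4 fields (3 pipes) per record as in the quoted excerpt, and that only the last field can contain a line break.

```python
def join_continuations(lines, expected_pipes=3):
    """Append every line lacking the expected number of pipes to the
    record read before it, then split each record into fields."""
    records = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.count("|") == expected_pipes or not records:
            records.append(line)   # looks like the start of a record
        else:
            records[-1] += line    # continuation of the previous record
    return [record.split("|") for record in records]


# The excerpt from the thread, without the editor's line numbers.
excerpt = [
    "Doe|john|01/02/1950|some text in here saying something\r\n",
    "Smith|Peter|19/02/1944|also some text \r\n",
    "but is split onto a new line so the parser crashes\r\n",
    "Brown|Michael|17/05/1966|but this line is ok\r\n",
]
rows = join_continuations(excerpt)
```

A break inside an earlier field would leave both halves with too few pipes, so this simple rule would glue the first half onto the wrong record; the dummy end-field approach avoids that ambiguity.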