Re: binary vs. text?
Thanks to everyone for your suggestions. I'll probably go with this one from Mark, even though Sarah's was very good. Since my files will have to be in a certain format anyway, it's easy for me just to verify the data.

Chris

On Dec 11, 2006, at 4:03 PM, Mark Schonewille wrote:

> When you import a file, you always want to do something with its contents. Just check to see if the text contents fits the destination. If not, it might be a binary file and you may need to handle it differently.

--
Chris Sheffield
Read Naturally
The Fluency Company
http://www.readnaturally.com
--
___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution
Re: binary vs. text?
On Dec 11, 2006, at 3:09 PM, Chris Sheffield wrote:

> Does anyone have a sure-fire way to determine if a file is binary or text?
>
> I have need to create an import utility that will import data from a text file (csv, tab-delimited, etc) into a database, but I'd like to check the file before doing anything else just to make sure it is in fact text and not binary.

In general, there is no way. However, all is not lost. A text file is a special case of a binary file: a sequence of characters whose representations are binary. For very short files, it is hard to tell. However, if you have some idea of the pattern you are expecting, you can increase your confidence that a file is binary or text.

Many file formats have magic words and header data that indicate the type. These provide a hint, and an additional check can provide some confidence. For example, a magic word plus a required element can identify a .png file; that is, check whether it starts with this: format("\211PNG\r\n\032\n\000\000\000\015IHDR").

Unicode files often have BOM markers at the start, but they are not required in some cases and the BOM shouldn't be there in others. I have a function I use to differentiate among Unicode files, but it already assumes I know they are Unicode, and even then it has trouble with some perverse files. (It does get it right more often than Microsoft programs do.) UTF-8 also restricts which byte sequences are valid, so that can help.

Text files should have certain patterns. For example, if the file is ASCII and is comma-delimited or tab-delimited, there are some indicators. You should see only certain control characters. You should see the expected delimiter. You should see either CR or LF or both. All characters have codes less than 128. You might want to require the same number of delimiters per line. So, given some specified pattern of what you expect in binary or text, you should be able to differentiate.
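A minimal sketch of Dar's pattern checks, written in Python purely for illustration (it is not Revolution script). The function names are hypothetical, the allowed control characters are assumed to be tab, LF and CR, and quoted CSV fields that contain the delimiter are not handled.

```python
# Python illustration of Dar's checks; function names are hypothetical.

# PNG magic word (0x89 "PNG" CR LF 0x1A LF) followed by the length (13)
# and type of the mandatory first chunk, IHDR.
PNG_PREFIX = b"\x89PNG\r\n\x1a\n\x00\x00\x00\x0dIHDR"

# Control characters a delimited ASCII file is assumed to contain.
ALLOWED_CONTROLS = {9, 10, 13}  # tab, LF, CR


def looks_like_png(data: bytes) -> bool:
    """True if the data starts with the PNG magic word and IHDR header."""
    return data.startswith(PNG_PREFIX)


def looks_like_ascii_delimited(data: bytes, delimiter: bytes = b",") -> bool:
    """Heuristic text check for an ASCII comma- or tab-delimited file."""
    if not data:
        return False  # too short to tell
    for byte in data:
        if byte >= 128:
            return False  # not ASCII
        if byte < 32 and byte not in ALLOWED_CONTROLS:
            return False  # unexpected control character
    # Accept CR, LF or CRLF line endings, then ignore blank lines.
    normalized = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    lines = [line for line in normalized.split(b"\n") if line]
    # Every line must contain the delimiter, the same number of times.
    counts = {line.count(delimiter) for line in lines}
    return len(counts) == 1 and counts.pop() > 0
```

Each check only raises or lowers confidence, as Dar says; a file passing `looks_like_ascii_delimited` is still only probably text.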
However, an alternative approach would be to parse the file and, if it does not pass, reject it no matter what form the data takes.

Dar
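A sketch of the parse-and-reject approach in Python, assuming ASCII input and a fixed delimiter. `parse_or_reject` is a hypothetical name, and the standard `csv` module stands in for whatever parser the real importer uses.

```python
import csv
import io


def parse_or_reject(data: bytes, delimiter: str = ",", encoding: str = "ascii"):
    """Hypothetical helper: return the rows if the data parses as a
    rectangular delimited table, otherwise None (reject the file)."""
    try:
        text = data.decode(encoding)
    except UnicodeDecodeError:
        return None  # bytes outside the expected encoding
    if "\x00" in text:
        return None  # NUL bytes never belong in this kind of import
    try:
        reader = csv.reader(io.StringIO(text, newline=""),
                            delimiter=delimiter, strict=True)
        rows = [row for row in reader if row]  # skip blank lines
    except csv.Error:
        return None  # malformed quoting and similar errors
    if not rows or len({len(row) for row in rows}) != 1:
        return None  # empty, or rows with differing field counts
    return rows
```

The file is never classified as "binary" or "text" here; it is simply accepted or rejected according to whether it fits the expected format.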
Re: binary vs. text?
Chris,

When you import a file, you always want to do something with its contents. Just check to see if the text contents fits the destination. If not, it might be a binary file and you may need to handle it differently.

There is another way. You could make a guess about the percentage of spaces, returns, and alphanumeric characters in a normal text file (nearly 100%) and in a binary file (significantly less). If the actual percentage is lower than some threshold value, assume it is a binary file. If it is higher, assume it is a text file. If it is approximately equal to the threshold value, ask the user.

You can store a copy of (a part of) the data in another variable, use replaceText to remove all characters that are not alphanumeric, spaces, or returns, and compare the remaining length with the original length to get the percentage. If you have a really large file, you don't need to analyse the entire file.

Best,

Mark

--
Economy-x-Talk Consultancy and Software Engineering
http://economy-x-talk.com
http://www.salery.biz
Get your store on-line within minutes with Salery Web Store software. Download at http://www.salery.biz

On 11 Dec 2006, at 23:09, Chris Sheffield wrote:

> Does anyone have a sure-fire way to determine if a file is binary or text?
>
> I have need to create an import utility that will import data from a text file (csv, tab-delimited, etc) into a database, but I'd like to check the file before doing anything else just to make sure it is in fact text and not binary.
>
> Any thoughts?
>
> Thanks,
> Chris
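Mark's percentage test might look like this in Python (an illustration, not Revolution script). The threshold, the width of the "ask the user" band, the 4096-byte sample and the extra delimiter characters are all assumptions to tune. Punctuation such as commas counts against the ratio unless it is added to the accepted set, which matters for CSV files.

```python
import string

# Characters Mark counts as "normal text": letters, digits, spaces, returns.
BASE_TEXT_BYTES = frozenset(
    (string.ascii_letters + string.digits + " \r\n").encode("ascii"))


def classify_by_ratio(data: bytes, extra: bytes = b",\t", threshold: float = 0.85,
                      margin: float = 0.05, sample_size: int = 4096) -> str:
    """Return "text", "binary" or "ask user" from the share of text-like bytes.

    extra: expected delimiters, also counted as text (an assumption for CSV/TSV).
    Only the first sample_size bytes are examined, so large files stay cheap.
    """
    sample = data[:sample_size]
    if not sample:
        return "ask user"  # nothing to measure
    accepted = BASE_TEXT_BYTES | set(extra)
    ratio = sum(1 for b in sample if b in accepted) / len(sample)
    if ratio > threshold + margin:
        return "text"
    if ratio < threshold - margin:
        return "binary"
    return "ask user"  # too close to the threshold to decide
```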
Re: binary vs. text?
On 11 Dec 2006, at 22:21, Chris Sheffield wrote:

> Thanks, Sarah. Very cool idea. Seems to work for me. Can anyone think of any cases where this might fail?

I think this would only work if you can be sure that the line endings in the text files are Unix style (numToChar(10)). If, for example, the file had CRLF line endings, they would be converted when the file is read as file: but not when it is read as binfile:, so the comparison would fail even though it was a text file. (Not tested, so please give it a try.)

Cheers
Dave
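To see the line-ending problem Dave describes, here is a rough Python analog of Sarah's comparison. Python's text mode with `newline=None` translates CRLF and lone CR into LF, which plays the role of Revolution's file: URL here; decoding as Latin-1 is an assumption chosen so that every byte survives the round trip.

```python
import io


def text_read_equals_binary_read(raw: bytes) -> bool:
    """Python analog of Sarah's test: read the same bytes in text mode and
    in binary mode and compare.

    Text mode with newline=None turns CRLF and lone CR into LF; Latin-1 maps
    each byte to one character, so only line endings can change.
    """
    text = io.TextIOWrapper(io.BytesIO(raw), encoding="latin-1", newline=None).read()
    return text.encode("latin-1") == raw


# A CRLF text file (e.g. from Windows) fails the test...
print(text_read_equals_binary_read(b"name,age\r\nAnn,34\r\n"))  # False
# ...while binary data without CR bytes passes it.
print(text_read_equals_binary_read(b"\x00\x01\xff"))  # True
```

In this analog, data with no CR bytes compares equal whatever else it contains, so the test says more about line endings than about whether the file is text.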
Re: binary vs. text?
On 12/12/06 12:55 AM, "Mark Schonewille" <[EMAIL PROTECTED]> wrote:

> Ruslan,
>
> You can't do that, because there are about a dozen different Unicode
> signatures and some streams have no Unicode signature at all.

Hi Mark,

We do this for Valentina Studio. :-)

Of course, a file can be without a signature. It is optional...

--
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]
Re: binary vs. text?
Ruslan,

You can't do that, because there are about a dozen different Unicode signatures and some streams have no Unicode signature at all.

Best,

Mark

On 11 Dec 2006, at 23:15, Ruslan Zasukhin wrote:

> If a file follows Unicode rules, it has a special signature at the start.
>
> I have not heard of such a way to determine the kind of file.
Re: binary vs. text?
Hi Sarah,

You can't do that. Rev translates CRLF into linefeed on DOS, and CR into linefeed on Mac OS 9, which means that a text file with native line endings is never equal to its binary equivalent on those platforms.

Best,

Mark

On 11 Dec 2006, at 23:14, Sarah Reichelt wrote:

> Just a guess here, but how about reading the file twice: once as file: and once as binfile:? If the two are identical, then I assume it's text only.
>
> I have no idea if this will work, but it's worth a try :-)
>
> Sarah
Re: binary vs. text?
Sarah Reichelt wrote:

> On 12/12/06, Chris Sheffield <[EMAIL PROTECTED]> wrote:
>
>> Does anyone have a sure-fire way to determine if a file is binary or text?
>>
>> I have need to create an import utility that will import data from a text file (csv, tab-delimited, etc) into a database, but I'd like to check the file before doing anything else just to make sure it is in fact text and not binary.
>
> Just a guess here, but how about reading the file twice: once as file: and once as binfile:? If the two are identical, then I assume it's text only.
>
> I have no idea if this will work, but it's worth a try :-)
>
> Sarah

The solution won't scale, since it reads the whole file twice, and it also depends on whether it would handle multi-byte characters (used for Kanji, for example) and certain Unicode formats that use the NULL character.

If this is on a particular platform, then there are various ways to do this. On Linux/UNIX systems, for example, you can run `file` on a file to classify it.

--
Geir A. Myrestrand
Re: binary vs. text?
Thanks, Eric. I shouldn't have a problem with Sarah's method, since these files will not be downloaded from the internet at all.

On Dec 11, 2006, at 3:23 PM, Eric Chatonet wrote:

> Hi Chris,
>
> Sarah's answer sounds good (I mean fully reliable) but may be unusable with large files you download from the internet.
>
> We use another method, based on statistics, that checks the charToNum of the characters. On Windows, it appears that checking 60 chars and finding more than 3 chars whose charToNum is less than 9 or greater than 175 gives an "almost" fully reliable result ;-)

--
Chris Sheffield
Re: binary vs. text?
Chris Sheffield wrote:

> Does anyone have a sure-fire way to determine if a file is binary or text?

There is no sure-fire way to determine whether a file is text or not, unless "text" can be clearly defined.

> I have need to create an import utility that will import data from a text file (csv, tab-delimited, etc) into a database, but I'd like to check the file before doing anything else just to make sure it is in fact text and not binary.
>
> Any thoughts?

I would just assume it is text, and then handle "wrong" input gracefully. Trying to parse the contents first to verify them would just add overhead, and is unnecessary if you do the former.

Consider my input as general, and not Revolution-specific; my exposure to Revolution is modest, at least at this point...

--
Geir A. Myrestrand
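A sketch of "assume it is text and handle wrong input gracefully", in Python for illustration: decode leniently, keep the rows that fit, and report the rest by line number instead of pre-scanning the file. The function name, the expected column count and UTF-8 as the default encoding are assumptions.

```python
import csv
import io


def import_leniently(data: bytes, expected_columns: int, encoding: str = "utf-8"):
    """Hypothetical importer: assume the file is text, keep rows that fit,
    and report problem rows by line number instead of pre-scanning."""
    text = data.decode(encoding, errors="replace")  # bad bytes become U+FFFD
    good, problems = [], []
    reader = csv.reader(io.StringIO(text, newline=""))
    try:
        for row in reader:
            if not row:
                continue  # blank line
            joined = "".join(row)
            if "\ufffd" in joined or "\x00" in joined:
                problems.append((reader.line_num, "undecodable or NUL bytes"))
            elif len(row) != expected_columns:
                problems.append((reader.line_num,
                                 f"expected {expected_columns} fields, got {len(row)}"))
            else:
                good.append(row)
    except csv.Error as err:
        problems.append((reader.line_num, f"unparseable: {err}"))
    return good, problems
```

A binary file then shows up as a list of problems rather than as a separate up-front check, and the caller decides whether to import the good rows or reject everything.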
Re: binary vs. text?
Hi Chris,

Sarah's answer sounds good (I mean fully reliable) but may be unusable with large files you download from the internet.

We use another method, based on statistics, that checks the charToNum of the characters. On Windows, it appears that checking 60 chars and finding more than 3 chars whose charToNum is less than 9 or greater than 175 gives an "almost" fully reliable result ;-)

On 11 Dec 06, at 23:09, Chris Sheffield wrote:

> Does anyone have a sure-fire way to determine if a file is binary or text?
>
> I have need to create an import utility that will import data from a text file (csv, tab-delimited, etc) into a database, but I'd like to check the file before doing anything else just to make sure it is in fact text and not binary.
>
> Any thoughts?

Best Regards from Paris,

Eric Chatonet
--
http://www.sosmartsoftware.com/[EMAIL PROTECTED]/
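Eric's rule of thumb written as a Python function, for illustration only. The numbers 60, 3, 9 and 175 come from his post; treating the file as raw bytes rather than Revolution chars is an assumption.

```python
def looks_binary_eric(data: bytes, sample_size: int = 60, max_suspicious: int = 3) -> bool:
    """Eric's statistical rule: in the first 60 characters, more than 3 with a
    code below 9 or above 175 suggests a binary file."""
    sample = data[:sample_size]
    suspicious = sum(1 for code in sample if code < 9 or code > 175)
    return suspicious > max_suspicious
```

Codes above 175 include accented letters in Windows-1252 and the lead bytes of UTF-8 sequences, so text with several accented characters near the start can be flagged; a short UTF-8 phrase such as "Café crème brûlée" already trips it.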
Re: binary vs. text?
On 12/12/06 12:15 AM, "Ruslan Zasukhin" <[EMAIL PROTECTED]> wrote:

> On 12/12/06 12:09 AM, "Chris Sheffield" <[EMAIL PROTECTED]> wrote:
>
> Hi Chris,
>
>> Does anyone have a sure-fire way to determine if a file is binary or
>> text?
>>
>> I have need to create an import utility that will import data from a
>> text file (csv, tab-delimited, etc) into a database, but I'd like to
>> check the file before doing anything else just to make sure it is in
>> fact text and not binary.
>>
>> Any thoughts?
>
> If a file follows Unicode rules, it has a special signature at the start.
>
> I have not heard of such a way to determine the kind of file.

Well, I think you can try the following:

1) Check for a Unicode signature. If you find one, this is a Unicode text file.

2) Otherwise, scan the whole file, or part of it, byte by byte to see if you meet a zero byte. If yes, it is a binary file.

--
Best regards,

Ruslan Zasukhin
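Ruslan's two steps as a Python sketch (an illustration, not Valentina's or Revolution's code). The BOM list covers only the common UTF-8, UTF-16 and UTF-32 signatures, which is fewer than the dozen or so Mark mentions, and the 8192-byte sample size is an assumption.

```python
# Common byte-order marks. UTF-32 LE is listed before UTF-16 LE because
# its BOM (FF FE 00 00) starts with the UTF-16 LE BOM (FF FE).
BOMS = [
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]


def classify_ruslan(data: bytes, sample_size: int = 8192) -> str:
    """Step 1: a known BOM means Unicode text.
    Step 2: otherwise, a zero byte in the sample means binary."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return "unicode text (" + encoding + ")"
    if b"\x00" in data[:sample_size]:
        return "binary"
    return "text"
```

As Mark and Geir warn, the BOM is optional, and BOM-less UTF-16 text contains zero bytes, so step 2 reports such a file as binary.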
Re: binary vs. text?
Thanks, Sarah. Very cool idea. Seems to work for me. Can anyone think of any cases where this might fail?

On Dec 11, 2006, at 3:14 PM, Sarah Reichelt wrote:

> On 12/12/06, Chris Sheffield <[EMAIL PROTECTED]> wrote:
>
>> Does anyone have a sure-fire way to determine if a file is binary or text?
>>
>> I have need to create an import utility that will import data from a text file (csv, tab-delimited, etc) into a database, but I'd like to check the file before doing anything else just to make sure it is in fact text and not binary.
>
> Just a guess here, but how about reading the file twice: once as file: and once as binfile:? If the two are identical, then I assume it's text only.
>
> I have no idea if this will work, but it's worth a try :-)
>
> Sarah

--
Chris Sheffield
Re: binary vs. text?
On 12/12/06 12:09 AM, "Chris Sheffield" <[EMAIL PROTECTED]> wrote:

Hi Chris,

> Does anyone have a sure-fire way to determine if a file is binary or
> text?
>
> I have need to create an import utility that will import data from a
> text file (csv, tab-delimited, etc) into a database, but I'd like to
> check the file before doing anything else just to make sure it is in
> fact text and not binary.
>
> Any thoughts?

If a file follows Unicode rules, it has a special signature at the start.

I have not heard of such a way to determine the kind of file.

--
Best regards,

Ruslan Zasukhin
Re: binary vs. text?
On 12/12/06, Chris Sheffield <[EMAIL PROTECTED]> wrote:

> Does anyone have a sure-fire way to determine if a file is binary or text?
>
> I have need to create an import utility that will import data from a text file (csv, tab-delimited, etc) into a database, but I'd like to check the file before doing anything else just to make sure it is in fact text and not binary.

Just a guess here, but how about reading the file twice: once as file: and once as binfile:? If the two are identical, then I assume it's text only.

I have no idea if this will work, but it's worth a try :-)

Sarah
binary vs. text?
Does anyone have a sure-fire way to determine if a file is binary or text?

I have need to create an import utility that will import data from a text file (csv, tab-delimited, etc) into a database, but I'd like to check the file before doing anything else, just to make sure it is in fact text and not binary.

Any thoughts?

Thanks,
Chris

--
Chris Sheffield