Thanks, Earl.

Your utility ran like a charm, and confirmed that my effort to adapt Enrico's code to this purpose had not gone astray, which is to say, I found no funky characters. Your help is greatly appreciated.

Sincerely,
andrewH

On Tue, Dec 10, 2013 at 7:35 AM, Earl F Glynn [via R] <ml-node+s789695n4681952...@n4.nabble.com> wrote:

> andrewH wrote:
>
> > However, my suspicion is that there are some funky characters, either
> > control characters or characters with some non-standard encoding, somewhere
> > in this 14 gig file. Moreover, I am concerned that these characters may
> > cause me trouble down the road even if I use a different approach to getting
> > columns out of the file.
>
> This is not an R solution, but here's a Windows utility I wrote to
> produce a table of frequency counts for all hex characters x00 to xFF in
> a file.
>
> http://www.efg2.com/Lab/OtherProjects/CharCount.ZIP
>
> Normally, you'll want to scrutinize anything below x20 or above x7F,
> since ASCII printable characters are in the range x20 to x7E. You can
> see how many tab (x09) characters are in the file, and whether the line
> endings are from Linux (x0A) or Windows (paired x0A and x0D).
>
> The ZIP includes Delphi source code, but provides a Windows executable.
> I made a change several months ago to allow drag-and-drop, so you can
> just drop the file on the application to have the characters counted.
> Just run the EXE after unzipping. No installation is needed.
>
> Once you find problem characters in the file, you can read the file as
> character data and use sub/gsub or other tools to remove or alter
> problem characters.
>
> efg
> Earl F Glynn
> UMKC School of Medicine
> Center for Health Insights
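
For readers without Windows, the byte-frequency idea described in the quoted message can be sketched in a few lines of Python. This is not Earl's CharCount utility or its Delphi source, only a rough illustration of the same technique. The function names (`count_bytes`, `suspicious_bytes`, `report`) and the 1 MiB chunk size are assumptions made for this example.

```python
from collections import Counter


def count_bytes(path, chunk_size=1 << 20):
    """Return a Counter mapping each byte value (0-255) to its frequency."""
    counts = Counter()
    with open(path, "rb") as f:
        # Read fixed-size chunks so a multi-gigabyte file never has to
        # fit in memory at once. The chunk size is an arbitrary choice.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            counts.update(chunk)  # iterating over bytes yields ints
    return counts


def suspicious_bytes(counts):
    """Keep only bytes outside the printable ASCII range x20..x7E."""
    return {b: n for b, n in sorted(counts.items()) if b < 0x20 or b > 0x7E}


def report(counts):
    """Format the suspicious bytes as one 'xHH  count' line per byte value."""
    return "\n".join(f"x{b:02X}  {n}" for b, n in suspicious_bytes(counts).items())
```

Calling `print(report(count_bytes("big_file.txt")))`, where the file name is a placeholder, would list each non-printable byte value with its count. Tabs (x09), line feeds (x0A), and carriage returns (x0D) will show up in that list, so the line-ending check mentioned in the quoted message can be read off the same table.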

--
J. Andrew Hoerner
Director, Sustainable Economics Program
Redefining Progress
(510) 507-4820

--
View this message in context: http://r.789695.n4.nabble.com/How-can-I-find-nonstandard-or-control-characters-in-a-large-file-tp4681896p4682257.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.