Thanks, Earl. Your utility ran like a charm, and confirmed that my effort
to adapt Enrico's code to this purpose had not gone astray, which is to
say, I found no funky characters. Your help is greatly appreciated.
Sincerely, andrewH


On Tue, Dec 10, 2013 at 7:35 AM, Earl F Glynn [via R] <
ml-node+s789695n4681952...@n4.nabble.com> wrote:

> andrewH wrote:
>
> > However, my suspicion is that there are some funky characters, either
> > control characters or characters with some non-standard encoding,
> > somewhere in this 14 gig file. Moreover, I am concerned that these
> > characters may cause me trouble down the road even if I use a different
> > approach to getting columns out of the file.
>
> This is not an R solution, but here's a Windows utility I wrote to
> produce a table of frequency counts for all byte values x00 to xFF in
> a file.
>
> http://www.efg2.com/Lab/OtherProjects/CharCount.ZIP
>
> Normally, you'll want to scrutinize anything below x20 or above x7E,
> since ASCII printable characters are in the range x20 to x7E. You can
> see how many tab (x09) characters are in the file, and whether the line
> endings are from Linux (x0A alone) or Windows (paired x0D and x0A).
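
For anyone reading the archive without a Windows machine, the same kind of
byte-frequency table can be sketched in a few lines of Python. This is not
Earl's Delphi code, only a rough equivalent under my own assumptions: the
names count_bytes and suspicious and the 1 MB chunk size are made up for
illustration, and tab, LF and CR are treated as ordinary.

```python
from collections import Counter


def count_bytes(path, chunk_size=1 << 20):
    """Return a Counter mapping each byte value (0-255) to its frequency.

    The file is read in fixed-size binary chunks, so a multi-gigabyte
    file never has to fit in memory.
    """
    counts = Counter()
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            counts.update(chunk)  # iterating over bytes yields ints
    return counts


def suspicious(counts):
    """Keep only bytes outside printable ASCII (x20 to x7E).

    Tab (x09), LF (x0A) and CR (x0D) are assumed to be ordinary here.
    """
    ordinary = {0x09, 0x0A, 0x0D}
    return {b: n for b, n in sorted(counts.items())
            if (b < 0x20 or b > 0x7E) and b not in ordinary}
```

Calling suspicious(count_bytes("bigfile.txt")), with a hypothetical file
name, gives a dict of the byte values worth a closer look. Comparing the
counts for x0D and x0A shows whether the line endings are Linux or Windows
style.
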
>
>
> The ZIP includes Delphi source code as well as a Windows executable.
> No installation is needed: just run the EXE after unzipping. I made a
> change several months ago to allow drag-and-drop, so you can drop the
> file on the application to have the characters counted.
>
> Once you find problem characters in the file, you can read the file as
> character data and use sub/gsub or other tools to remove or alter
> them.
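
If the file is too large to read into R as character data in one go, the
clean-up can also be done as a streaming pass before R ever sees the file.
The Python sketch below is one way to do that, not the sub/gsub route Earl
describes. The function name strip_control_bytes and the choice of which
bytes to drop (control characters and DEL, keeping tab, LF and CR) are my
assumptions. Bytes above x7F are left alone, since they may be legitimate
non-ASCII text.

```python
# Bytes to delete: control characters below x20 plus DEL (x7F),
# keeping tab (x09), LF (x0A) and CR (x0D). This set is an assumption;
# adjust it to whatever the frequency table actually turns up.
DROP = bytes(b for b in range(256)
             if (b < 0x20 or b == 0x7F) and b not in (0x09, 0x0A, 0x0D))


def strip_control_bytes(src, dst, chunk_size=1 << 20):
    """Copy src to dst without the bytes in DROP; return how many were removed."""
    removed = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            # With table=None, translate() only deletes the listed bytes.
            cleaned = chunk.translate(None, DROP)
            removed += len(chunk) - len(cleaned)
            fout.write(cleaned)
    return removed
```

Because the work is done byte by byte, chunk boundaries cannot split
anything, and the original file is left untouched.
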
>
> efg
> Earl F Glynn
> UMKC School of Medicine
> Center for Health Insights
>



-- 
J. Andrew Hoerner
Director, Sustainable Economics Program
Redefining Progress
(510) 507-4820




--
View this message in context: 
http://r.789695.n4.nabble.com/How-can-I-find-nonstandard-or-control-characters-in-a-large-file-tp4681896p4682257.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
