There are actually non-breaking spaces in the source code of the page. If you look at it, you will see things like this:
<B>A/D&nbsp;<BR>

Whether or not XML::trim gets rid of them for you may be OS specific. See an
answer to an old question of mine on R-help, for example:
https://stat.ethz.ch/pipermail/r-help/2012-February/302417.html

Best,
Garrett

On Tue, Feb 5, 2013 at 8:20 AM, David Reiner <david.rei...@xrtrading.com> wrote:
> Very nice, Garrett!
> More curious than anything, but does anyone know why I get the extraneous
> characters when I do it?
> They are present in x as well. I believe they are non-breaking spaces.
>
>> head(SIC)
> SICCode A/D  Office  Industry Title
> 4 100 5  AGRICULTURAL PRODUCTION-CROPS
> 5 200 5  AGRICULTURAL PROD-LIVESTOCK & ANIMAL SPECIALTIES
> 6 700 5  AGRICULTURAL SERVICES
> 7 800 5  FORESTRY
> 8 900 5  FISHING, HUNTING AND TRAPPING
> 9 1000 9  METAL MINING
>> sessionInfo()
> R version 2.15.2 (2012-10-26)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
> States.1252 LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] XML_3.95-0.1
>
> loaded via a namespace (and not attached):
> [1] tools_2.15.2
>
> Thanks,
> -- David Reiner
>
>
> -----Original Message-----
> From: r-sig-finance-boun...@r-project.org
> [mailto:r-sig-finance-boun...@r-project.org] On Behalf Of G See
> Sent: Monday, February 04, 2013 9:30 PM
> To: Bastian Offermann
> Cc: r-sig-finance@r-project.org
> Subject: Re: [R-SIG-Finance] 4-digit SIC codes
>
> I'm not sure, but here's a really quick and dirty way to get it
>
>> library(XML)
>> x <- readHTMLTable("http://www.sec.gov/info/edgar/siccodes.htm",
> stringsAsFactors=FALSE)[[4]]
>> colnames(x) <- x[2, ]
>> SIC <- x[-c(1:3), ]
>> head(SIC)
> SICCode A/D Office Industry Title
> 4 100 5 AGRICULTURAL PRODUCTION-CROPS
> 5 200 5 AGRICULTURAL PROD-LIVESTOCK & ANIMAL SPECIALTIES
> 6 700 5 AGRICULTURAL SERVICES
> 7 800 5 FORESTRY
> 8 900 5 FISHING, HUNTING AND TRAPPING
> 9 1000 9 METAL MINING
>
>> SIC[SIC$SICCode == "2834", ]
> SICCode A/D Office Industry Title
> 91 2834 1 PHARMACEUTICAL PREPARATIONS
>
> HTH,
> Garrett
>
> On Mon, Feb 4, 2013 at 9:19 PM, Bastian Offermann
> <bastian250...@yahoo.co.uk> wrote:
>> Hi,
>> does anybody know whether 4-digit SIC codes are available in R? Something
>> along the lines
>>
>> "2834" "Pharmaceutical Preparations"
>>
>> Thank you.
>>
>> _______________________________________________
>> R-SIG-Finance@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>> -- Subscriber-posting only. If you want to post, subscribe first.
>> -- Also note that this is not the r-help list where general R questions
>> should go.
>
> _______________________________________________
> R-SIG-Finance@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
> -- Subscriber-posting only. If you want to post, subscribe first.
> -- Also note that this is not the r-help list where general R questions
> should go.
>
>
> This e-mail and any materials attached hereto, including, without limitation,
> all content hereof and thereof (collectively, "XR Content") are confidential
> and proprietary to XR Trading, LLC ("XR") and/or its affiliates, and are
> protected by intellectual property laws. Without the prior written consent
> of XR, the XR Content may not (i) be disclosed to any third party or (ii) be
> reproduced or otherwise used by anyone other than current employees of XR or
> its affiliates, on behalf of XR or its affiliates.
>
> THE XR CONTENT IS PROVIDED AS IS, WITHOUT REPRESENTATIONS OR WARRANTIES OF
> ANY KIND.
> TO THE MAXIMUM EXTENT PERMISSIBLE UNDER APPLICABLE LAW, XR HEREBY
> DISCLAIMS ANY AND ALL WARRANTIES, EXPRESS AND IMPLIED, RELATING TO THE XR
> CONTENT, AND NEITHER XR NOR ANY OF ITS AFFILIATES SHALL IN ANY EVENT BE
> LIABLE FOR ANY DAMAGES OF ANY NATURE WHATSOEVER, INCLUDING, BUT NOT LIMITED
> TO, DIRECT, INDIRECT, CONSEQUENTIAL, SPECIAL AND PUNITIVE DAMAGES, LOSS OF
> PROFITS AND TRADING LOSSES, RESULTING FROM ANY PERSON'S USE OR RELIANCE UPON,
> OR INABILITY TO USE, ANY XR CONTENT, EVEN IF XR IS ADVISED OF THE POSSIBILITY
> OF SUCH DAMAGES OR IF SUCH DAMAGES WERE FORESEEABLE.
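
For anyone who hits the same extraneous characters discussed in this thread, here is a minimal sketch of one way to strip them after the fact. It assumes the stray characters really are non-breaking spaces (U+00A0), which is only a guess based on the page source. The helper name strip_nbsp and the small sample data frame are hypothetical stand-ins for the scraped table, not part of the XML package.

```r
# Hypothetical helper: turn each non-breaking space (U+00A0) into a regular
# space, then trim leading and trailing whitespace.
strip_nbsp <- function(x) {
  x <- gsub("\u00A0", " ", x, fixed = TRUE)
  gsub("^\\s+|\\s+$", "", x)
}

# Hypothetical stand-in for the table returned by readHTMLTable(), with
# non-breaking spaces planted in one column name and in the cell values.
SIC <- data.frame(
  a = c("100", "2834"),
  b = c("5\u00A0", "1\u00A0"),
  c = c("AGRICULTURAL PRODUCTION-CROPS", "PHARMACEUTICAL PREPARATIONS"),
  stringsAsFactors = FALSE
)
colnames(SIC) <- c("SICCode", "A/D\u00A0Office", "Industry Title")

# Clean every column and the column names.
SIC[] <- lapply(SIC, strip_nbsp)
colnames(SIC) <- strip_nbsp(colnames(SIC))

print(SIC[SIC$SICCode == "2834", ])
```

Cleaning after the read, rather than relying on trimming inside the parser, should behave the same regardless of OS or locale, as long as the strings are correctly decoded.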