There are actually non-breaking spaces in the source code of the page.
If you look at it, you will see things like this:

<B>A/D &nbsp;<BR>

Whether or not XML::trim gets rid of them for you may be OS-specific.
See an answer to an old question of mine on R-help, for example:
https://stat.ethz.ch/pipermail/r-help/2012-February/302417.html
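
If the trimming does not happen on your platform, one way to clean things up
by hand is sketched below. This is only a rough sketch, not something from the
XML package: strip_nbsp is a made-up helper name, the sample strings are toy
values rather than real cells from the page, and it assumes the strings are
correctly marked as UTF-8. The stray "Â" in a Windows-1252 locale is most
likely the first byte of the UTF-8 encoded non-breaking space being shown as
Latin-1, so if the encoding is mis-declared you may need to set it with
Encoding() first.

```r
# Hypothetical helper (not part of the XML package): replace non-breaking
# spaces (U+00A0) with ordinary spaces, then trim surrounding whitespace.
strip_nbsp <- function(s) {
  s <- enc2utf8(s)                           # work in UTF-8 regardless of locale
  s <- gsub("\u00a0", " ", s, fixed = TRUE)  # NBSP -> ordinary space
  gsub("^\\s+|\\s+$", "", s)                 # drop leading/trailing whitespace
}

# Toy values standing in for cells scraped from the page
cells <- c("5\u00a0", "\u00a0AGRICULTURAL SERVICES", "METAL MINING")
cleaned <- strip_nbsp(cells)
print(cleaned)

# Applied to every column of a character data frame such as SIC:
# SIC[] <- lapply(SIC, strip_nbsp)
```

The replacement is done with fixed = TRUE so the non-breaking space is matched
literally rather than relying on how the regex engine classifies it.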

Best,
Garrett

On Tue, Feb 5, 2013 at 8:20 AM, David Reiner <david.rei...@xrtrading.com> wrote:
> Very nice, Garrett!
> More curious than anything, but does anyone know why I get the extraneous 
> characters when I do it?
> They are present in x as well. I believe they are non-breaking spaces.
>
>> head(SIC)
>   SICCode A/D  Office                                    Industry Title
> 4     100            5 Â                    AGRICULTURAL PRODUCTION-CROPS
> 5     200            5 Â AGRICULTURAL PROD-LIVESTOCK & ANIMAL SPECIALTIES
> 6     700            5 Â                            AGRICULTURAL SERVICES
> 7     800            5 Â                                         FORESTRY
> 8     900            5 Â                    FISHING, HUNTING AND TRAPPING
> 9    1000            9 Â                                     METAL MINING
>> sessionInfo()
> R version 2.15.2 (2012-10-26)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United 
> States.1252    LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C                           LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] XML_3.95-0.1
>
> loaded via a namespace (and not attached):
> [1] tools_2.15.2
>
> Thanks,
> -- David Reiner
>
>
> -----Original Message-----
> From: r-sig-finance-boun...@r-project.org 
> [mailto:r-sig-finance-boun...@r-project.org] On Behalf Of G See
> Sent: Monday, February 04, 2013 9:30 PM
> To: Bastian Offermann
> Cc: r-sig-finance@r-project.org
> Subject: Re: [R-SIG-Finance] 4-digit SIC codes
>
> I'm not sure, but here's a really quick and dirty way to get it
>
>> library(XML)
>> x <- readHTMLTable("http://www.sec.gov/info/edgar/siccodes.htm",
>                                   stringsAsFactors=FALSE)[[4]]
>> colnames(x) <- x[2, ]
>> SIC <- x[-c(1:3), ]
>> head(SIC)
>   SICCode A/D  Office                                    Industry Title
> 4     100           5                     AGRICULTURAL PRODUCTION-CROPS
> 5     200           5  AGRICULTURAL PROD-LIVESTOCK & ANIMAL SPECIALTIES
> 6     700           5                             AGRICULTURAL SERVICES
> 7     800           5                                          FORESTRY
> 8     900           5                     FISHING, HUNTING AND TRAPPING
> 9    1000           9                                      METAL MINING
>
>> SIC[SIC$SICCode == "2834", ]
>    SICCode A/D  Office               Industry Title
> 91    2834           1  PHARMACEUTICAL PREPARATIONS
>
> HTH,
> Garrett
>
> On Mon, Feb 4, 2013 at 9:19 PM, Bastian Offermann
> <bastian250...@yahoo.co.uk> wrote:
>> Hi,
>> does anybody know whether 4-digit SIC codes are available in R? Something
>> along the lines
>>
>> "2834" "Pharmaceutical Preparations"
>>
>> Thank you.
>>
>> _______________________________________________
>> R-SIG-Finance@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>> -- Subscriber-posting only. If you want to post, subscribe first.
>> -- Also note that this is not the r-help list where general R questions
>> should go.
>
> _______________________________________________
> R-SIG-Finance@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
> -- Subscriber-posting only. If you want to post, subscribe first.
> -- Also note that this is not the r-help list where general R questions 
> should go.
>
>
> This e-mail and any materials attached hereto, including, without limitation, 
> all content hereof and thereof (collectively, "XR Content") are confidential 
> and proprietary to XR Trading, LLC ("XR") and/or its affiliates, and are 
> protected by intellectual property laws.  Without the prior written consent 
> of XR, the XR Content may not (i) be disclosed to any third party or (ii) be 
> reproduced or otherwise used by anyone other than current employees of XR or 
> its affiliates, on behalf of XR or its affiliates.
>
> THE XR CONTENT IS PROVIDED AS IS, WITHOUT REPRESENTATIONS OR WARRANTIES OF 
> ANY KIND.  TO THE MAXIMUM EXTENT PERMISSIBLE UNDER APPLICABLE LAW, XR HEREBY 
> DISCLAIMS ANY AND ALL WARRANTIES, EXPRESS AND IMPLIED, RELATING TO THE XR 
> CONTENT, AND NEITHER XR NOR ANY OF ITS AFFILIATES SHALL IN ANY EVENT BE 
> LIABLE FOR ANY DAMAGES OF ANY NATURE WHATSOEVER, INCLUDING, BUT NOT LIMITED 
> TO, DIRECT, INDIRECT, CONSEQUENTIAL, SPECIAL AND PUNITIVE DAMAGES, LOSS OF 
> PROFITS AND TRADING LOSSES, RESULTING FROM ANY PERSON'S USE OR RELIANCE UPON, 
> OR INABILITY TO USE, ANY XR CONTENT, EVEN IF XR IS ADVISED OF THE POSSIBILITY 
> OF SUCH DAMAGES OR IF SUCH DAMAGES WERE FORESEEABLE.

_______________________________________________
R-SIG-Finance@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.
