> -----Original Message-----
> From: gerald.j...@dgag.ca [mailto:gerald.j...@dgag.ca]
> Sent: Friday, February 05, 2010 10:58 AM
> To: William Dunlap
> Cc: Uwe Ligges; r-help@r-project.org
> Subject: RE: [R] Importing data coming from Splus into R.
>
> Hello Bill,
>
> here is what I tried with the Splus built-in data set "claims".
>
> In Splus:
>
> apply(claims, 2, class)
>       age   car.age     type      cost    number
> "ordered" "ordered" "factor" "numeric" "numeric"
> dump(list = "claims",
>      fileout = "/home/jeg002/splus/R/Exemples/R/myclaims.txt",
>      oldStyle = T)  ## I tried both, oldStyle = T and oldStyle = F, same results.
>
> In R:
>
> claims <- source("/home/jeg002/splus/R/Exemples/R/myclaims.txt")
> apply(claims$value, 2, class)  ## oldStyle = T this time.
>         age     car.age        type        cost      number
> "character" "character" "character" "character" "character"
>
Use lapply(claims$value, class) instead of
apply(claims$value, 2, class). In R, apply converts
its first argument into a matrix, which will be a
character matrix if any columns are factors. In recent
versions of S+, apply(data.frame, MARGIN=2, ...) avoids the
convert-to-matrix step and works on the columns of the
data.frame.

In this example it looks like the Splus dump -> R source
route works.

R> lapply(claims$value, class)
$age
[1] "ordered" "factor"

$car.age
[1] "ordered" "factor"

$type
[1] "factor"

$cost
[1] "numeric"

$number
[1] "numeric"

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> I must admit I had not tried using "write.table" from Splus. I did, now,
> always with the "claims" data set. On the first attempt R complained of no
> method to change the character variables to the "ordered" class. I made a
> copy of the data set in Splus, changed the class of two variables from
> "ordered" to "factor" and gave it another try. Here are the results:
>
> In Splus:
>
> new.claims <- claims
> class(new.claims$age) <- "factor"
> class(new.claims$car.age) <- "factor"
> apply(new.claims, 2, class)
>      age  car.age     type      cost    number
> "factor" "factor" "factor" "numeric" "numeric"
> write.table(data = new.claims,
>             file = "/home/jeg002/splus/R/Exemples/R/myclaims.txt",
>             sep = "@", append = F, quote.strings = T,
>             dimnames.write = T, na = NA, end.of.row = "\n",
>             justify.format = "decimal")
>
> In R:
>
> claims.classes <- c("character", "factor", "factor", "factor", "numeric",
>                     "numeric")  ## The first "character" is for the row.names
> claims <-
>   read.table(file = "/home/jeg002/splus/R/Exemples/R/myclaims.txt",
>              header = TRUE, sep = "@", quote = "\"", as.is = FALSE,
>              strip.white = FALSE, comment.char = "", na.strings = "NA",
>              nrows = 200, colClasses = claims.classes)
> apply(claims, 2, class)
>   row.names         age     car.age        type        cost      number
> "character" "character" "character" "character" "character" "character"
>
>
> I'd be more than happy to supply
> you a small sample of my data set if the
> built-in "claims" doesn't do the job.
>
> Thanks for your support,
>
> Gérald Jean
> Senior Statistical Advisor,
> VP Planification et Développement des Marchés,
> Desjardins Groupe d'Assurances Générales
> telephone: (418) 835-4900 ext. 7639
> fax: (418) 835-6657
> e-mail: gerald.j...@dgag.ca
>
> "In God we trust, all others must bring data"  W. Edwards Deming
>
>
> "William Dunlap" <wdun...@tibco.com> wrote on 2010/02/05 12:37:25:
>
> > For a data.frame with only numeric and factor
> > columns using dump() on the S+ end and source()
> > on the R end ought to work. If you have timeDate
> > columns you will need to convert them to character
> > data before exporting and convert them to your
> > favorite R time/date class after importing them.
> >
> > If you could send me a fairly small sample of your
> > data that shows the incompatibility between S+'s
> > write.table and R's read.table I could try to fix
> > things up so they were more compatible.
> >
> > Code that reads the S+ native binary format must
> > be 32/64 bit aware, since S+ integers are 32 bits
> > on 32-bit versions of S+ and 64 bits on 64-bit
> > versions.
> >
> > Bill Dunlap
> > Spotfire, TIBCO Software
> > wdunlap tibco.com
> >
> > > -----Original Message-----
> > > From: r-help-boun...@r-project.org
> > > [mailto:r-help-boun...@r-project.org] On Behalf Of Uwe Ligges
> > > Sent: Friday, February 05, 2010 8:05 AM
> > > To: Gerald Jean
> > > Cc: r-help@r-project.org
> > > Subject: Re: [R] Importing data coming from Splus into R.
> > >
> > > 1. I am stuck with a copy of S-PLUS 4.x. At that time I used dump() in
> > > S-PLUS and source() to get things into R afterwards ...
> > >
> > > 2. Why do you think that 32-bit vs. 64-bit issues matter? The file
> > > format does not change (well, this is guessed since I do not have any
> > > 64-bit S-PLUS version available).
> > >
> > > Best,
> > > Uwe Ligges
> > >
> > >
> > > On 05.02.2010 16:35, gerald.j...@dgag.ca wrote:
> > > >
> > > > Hello there,
> > > >
> > > > I spent all day yesterday trying to get a small data set from Splus into R,
> > > > no luck! Both Splus and R are run on a 64-bit RedHat Linux machine; the
> > > > versions of the software are 64-bit and are as follows:
> > > >
> > > > Splus:
> > > > TIBCO Software Inc. Confidential Information
> > > > Copyright (c) 1988-2008 TIBCO Software Inc. ALL RIGHTS RESERVED.
> > > > TIBCO Spotfire S+ Version 8.1.1 for Linux 2.6.9-34.EL, 64-bit : 2008
> > > >
> > > > R:
> > > > R version 2.8.0 (2008-10-20)
> > > > Copyright (C) 2008 The R Foundation for Statistical Computing
> > > > ISBN 3-900051-07-0
> > > >
> > > > I know that the "foreign" package has a function to directly import Splus
> > > > data sets into R, but I also know that it works only for 32-bit
> > > > versions of the software, hence I didn't try that route. Here is what I
> > > > have done:
> > > >
> > > > In Splus:
> > > >
> > > > ttt <- exportData(data = FMD.CR.test,
> > > >                   file = "/home/jeg002/splus/R/Exemples/R/FMD-CR-test.csv",
> > > >                   type = "ASCII", delimiter = "@", quote = T,
> > > >                   na.string = "NA")
> > > > ttt.class <- unlist(lapply(FMD.CR.test, class))
> > > >
> > > > ### I am using "@" as delimiter since some factor levels contain both the
> > > > ### "," and the ";".
> > > >
> > > > In R:
> > > >
> > > > FMD.CR.test.fields <- count.fields(file =
> > > >     "/home/jeg002/splus/R/Exemples/R/FMD-CR-test.csv",
> > > >     sep = "@", quote = "\"", comment.char = "")
> > > > all(FMD.CR.test.fields == 327)
> > > > [1] TRUE  ## Hence all observations have the same number of fields; so far, so good!
> > > >
> > > > FMD.CR.test.classes <- c("factor", "character", "factor", "factor",
> > > >                          "factor", "factor", "factor", "factor",
> > > >                          "factor", "factor", "factor", "numeric",
> > > >                          "character", and so on)
> > > > names(FMD.CR.test.classes) <- c("RTA", "police", "mnt.rent.bnct",
> > > >                                 "mnt.rent.boni", "mnt.rent.cred.bnct",
> > > >                                 "mnt.rent.epar.bnct", "mnt.rent.snbn",
> > > >                                 "mnt.rent.trxl", "solde.eop", "solde.nenr.es",
> > > >                                 "solde.enr.es", "num.enreg", "trouve", and so on)
> > > > FMD.CR.test <-
> > > >   read.table(file = "/home/jeg002/splus/R/Exemples/R/FMD-CR-test.csv",
> > > >              header = TRUE, sep = "@", quote = "\"", as.is = FALSE,
> > > >              strip.white = FALSE, comment.char = "", na.strings = "NA",
> > > >              nrows = 65000, colClasses = FMD.CR.test.classes)
> > > > dim(FMD.CR.test)
> > > > [1] 64093   327  ## OK
> > > >
> > > > ### Testing if classes are the same as the Splus classes.
> > > >
> > > > FMD.CR.test.R.classes <- apply(FMD.CR.test, 2, FUN = class)
> > > > sum(FMD.CR.test.R.classes == FMD.CR.test.classes)
> > > > [1] 79  ## Not exactly what I was expecting!
> > > > all(FMD.CR.test.R.classes == "character")
> > > > [1] TRUE
> > > >
> > > > Hence all variables were imported as character, which I find very
> > > > inconvenient; since the data set has a few hundred factor variables
> > > > recoding them is a lot of work, this work has already been done in Splus;
> > > > furthermore, the numeric variables would need conversion as well.
> > > >
> > > > I tried all combinations of the arguments "as.is", "stringsAsFactors" and
> > > > "colClasses" to no avail. I also tried to export the data set in SAS
> > > > transport format from Splus and read it through the foreign's read.xport
> > > > function, always the same result, everything is imported as character.
> > > > I searched the r-help archives and found several messages relating to this
> > > > problem, but no satisfactory solution!
> > > >
> > > > I am a long-time user of Splus and I am planning to use R more often,
> > > > mainly due to its wealth of packages and the convenience of installing
> > > > them. I hope to find a reliable and user-friendly way of transferring data
> > > > between the two cousin pieces of software.
> > > >
> > > > Thanks for any insights,
> > > >
> > > > Gérald Jean
> > > > Senior Statistical Advisor,
> > > > VP Planification et Développement des Marchés,
> > > > Desjardins Groupe d'Assurances Générales
> > > > telephone: (418) 835-4900 ext. 7639
> > > > fax: (418) 835-6657
> > > > e-mail: gerald.j...@dgag.ca
> > > >
> > > > "In God we trust, all others must bring data"  W. Edwards Deming
> > > >
> > > > This communication (and/or the attachments) is intended for named
> > > > recipients only and may contain privileged or confidential information
> > > > which is not to be disclosed. If you received this communication by mistake
> > > > please destroy all copies.
> > > >
> > > > Think green before you print!
> > > >
> > > > ______________________________________________
> > > > R-help@r-project.org mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > > and provide commented, minimal, self-contained, reproducible code.
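
Bill Dunlap's point about apply() versus lapply() can be checked with a minimal R sketch along the following lines. The data frame `df` and its values are made up for illustration (they are not the S+ "claims" data), and the file is a temporary one rather than the paths used in the thread. The sketch writes a small data frame with "@" as the separator, reads it back with colClasses, and compares what apply() and sapply() report as the column classes.

```r
# Hypothetical stand-in data; not the S+ "claims" data set.
df <- data.frame(
  age  = factor(c("17-20", "21-24", "25-29")),
  type = factor(c("A", "B", "A")),
  cost = c(289, 372, 189)
)

# Round trip through a text file, using "@" as the field separator.
tmp <- tempfile(fileext = ".txt")
write.table(df, file = tmp, sep = "@", quote = TRUE, row.names = FALSE)
back <- read.table(tmp, header = TRUE, sep = "@", quote = "\"",
                   comment.char = "",
                   colClasses = c("factor", "factor", "numeric"))
unlink(tmp)

# apply() coerces the data frame to a matrix first; with factor columns
# present that matrix is character, so every column looks like "character".
via_apply <- apply(back, 2, class)

# sapply() (like lapply()) looks at each column as it is actually stored.
via_sapply <- sapply(back, class)

print(via_apply)
print(via_sapply)
```

If the classes reported by sapply() match the ones requested in colClasses, the import itself worked and only the apply()-based check was misleading.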