Re: [R] difficulties in reading a .prn file
I would guess that your separator is not really a tab like you think it is. Take a small subset of the data, open it in a text editor, check the contents, and then try to read it. Always start small to see whether it is working the way you think it should.

Also, it seems to have a header, so why are you ignoring it? Reading the header row as data may make your numeric columns look like factors, which is probably not what you want.

On Wed, Oct 29, 2008 at 12:19 PM, [EMAIL PROTECTED] wrote:
> Hello,
>
> I am having problems reading a huge .prn file of almost 450,000 rows
> and 29 columns. The variables consist of characters, dates, times, and
> numeric values. I use
>
>     read.table("file.prn", header=F, sep="\t", na.strings="*")
>
> where the missing values are declared as *. R reads the file, but when
> I ask for the dimensions of the data frame I get the right number of
> rows but only 1 column:
>
>     dim(file)
>     [1] 422344      1
>
> It is as if it reads each whole row as one column. When I ask for the
> first 3 lines, for example, R shows that it is reading everything as
> factors, and I get something like this:
>
>     data12L[1:3,]
>     [1] ID DATETime RRR VEl Leng Weig Sub var1 var2 var3 var4 var5 var6 var7 var8 var9 var10var11var12var13var14var15VAR1VAR2VAR3 VAR4VAR5VAR6VAR7VAR8VAR9 VAR10 VAR11 VAR12 VAR13 VAR14 VAR15
>     [2] 54678611 39356 0.1572569RW 892014 21400 V11A11 4500 7200 4700 5000 * * * * * * * * * * * 0 527 594 567 * * * * * * * * * * *
>     [3] 54678612 39356 0.158RW 811716 33000 T11O3 7100 9100 5700 5600 5500 * * * * * * * * * * 0 397 605 133 133 * * * * * * * * * *
>     422344 Levels: ID DATETime RRR VEl LengWeig Sub var1 var2 var3 var4 var5 var6 var7 var8 var9var10var11var12var13var14 var15VAR1VAR2VAR3VAR4VAR5VAR6VAR7VAR8 VAR9 VAR10 VAR11 VAR12 VAR13 VAR14 VAR15 ..
>
> Is there any solution? Any suggestion?
>
> And what is going on with the *? Is there any suggestion for this as
> well?
>
> Thanks for your time!
> Ismini

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
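
The suggestion in this reply, to check a small subset before reading the whole file, could look roughly like the sketch below. The sample lines and the temporary file are invented for illustration, since the real file is not available in this thread; `readLines` and `count.fields` are standard R functions.

```r
# Hypothetical sample standing in for the first lines of the real .prn file
sample_lines <- c(
  "ID        DATE   Time       RRR  var1  var2",
  "54678611  39356  0.1572569  RW   4500  *",
  "54678612  39356  0.158      RW   7100  5500"
)
path <- tempfile(fileext = ".prn")
writeLines(sample_lines, path)

# Look at the raw text of the first few lines, exactly as stored on disk
head_lines <- readLines(path, n = 3)
print(head_lines)

# Does any of these lines actually contain a tab character?
has_tab <- grepl("\t", head_lines, fixed = TRUE)
print(has_tab)

# Count the fields per line under two candidate separators:
# a literal tab, and the default "" (any run of whitespace)
fields_tab   <- count.fields(path, sep = "\t")
fields_space <- count.fields(path, sep = "")
print(fields_tab)
print(fields_space)
```

If the tab count is 1 on every line while the whitespace count matches the number of column names, the file is almost certainly not tab separated.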
Re: [R] difficulties in reading a .prn file
On Wed, Oct 29, 2008 at 06:19:51PM +0200, [EMAIL PROTECTED] wrote:
> I am having problems reading a huge .prn file of almost 450,000 rows
> and 29 columns. The variables consist of characters, dates, times, and
> numeric values. I use
>
>     read.table("file.prn", header=F, sep="\t", na.strings="*")
>
> where the missing values are declared as *. R reads the file, but when
> I ask for the dimensions of the data frame I get the right number of
> rows but only 1 column:
>
>     dim(file)
>     [1] 422344      1

The most likely explanation is that your file is not tab separated.

> And what is going on with the *? Is there any suggestion for this as
> well?

That should work fine as soon as you figure out the correct value for sep.

BTW: your output looks like you want to use header=T.

cu
Philipp

--
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://mips.gsf.de/staff/pagel
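
To illustrate the two points in this reply (that na.strings works once sep is right, and that header=T is needed), here is a small sketch. It assumes a made-up sample file that really is tab separated; the column names and values are only loosely modelled on the posted output.

```r
# Hypothetical tab-separated sample; the real file may use another separator
path <- tempfile(fileext = ".prn")
writeLines(c(
  "ID\tRRR\tvar1\tvar2",
  "54678611\tRW\t4500\t*",
  "54678612\tRW\t7100\t5500"
), path)

# header = FALSE treats the column names as a data row, so every column
# holds text (character or factor, depending on the R version)
no_header <- read.table(path, header = FALSE, sep = "\t", na.strings = "*")

# header = TRUE uses the first row as names, so numeric columns stay numeric
with_header <- read.table(path, header = TRUE, sep = "\t", na.strings = "*")

print(sapply(no_header, class))
print(sapply(with_header, class))

# The "*" entry is read as a missing value
print(is.na(with_header$var2))
```

With the correct separator, the `*` entries become NA and no longer force the column to be read as text.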
Re: [R] difficulties in reading a .prn file
jim holtman wrote:
> I would guess that your separator is not really a tab like you think
> it is. Take a small subset of the data, open it in a text editor,
> check the contents, and then try to read it. Always start small to see
> whether it is working the way you think it should.
>
> Also, it seems to have a header, so why are you ignoring it? Reading
> the header row as data may make your numeric columns look like
> factors, which is probably not what you want.

Also, there seem to be 38 columns, not 29...

Does it not work with plain whitespace separation? I.e.:

    read.table("file.prn", header=T, na.strings="*")

--
Peter Dalgaard
Dept. of Biostatistics, University of Copenhagen
Øster Farimagsgade 5, Entr.B
PO Box 2099, 1014 Cph. K, Denmark
Ph: (+45) 35327918
FAX: (+45) 35327907
([EMAIL PROTECTED])
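
A minimal sketch of the whitespace-separated read suggested here, run on an invented sample rather than the poster's file. The `nrows` argument is added only to keep the trial small and is not part of the original suggestion.

```r
# Hypothetical space-aligned sample standing in for the real .prn file
path <- tempfile(fileext = ".prn")
writeLines(c(
  "ID        DATE   Time       RRR  var1  var2",
  "54678611  39356  0.1572569  RW   4500  *",
  "54678612  39356  0.158      RW   7100  5500"
), path)

# With the default sep = "", fields are split on any run of spaces or tabs
dat <- read.table(path, header = TRUE, na.strings = "*", nrows = 100)

# Check that the column count matches the header and inspect the types
print(dim(dat))
str(dat)
```

One caveat, offered as an assumption about the real file: the posted output shows values running together (for example `0.1572569RW`), which would make whitespace splitting miscount the fields. If the file turns out to be fixed width, `read.fwf` with explicit column widths may be the better tool.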