Re: [R] difficulties in reading a .prn file

2008-10-29 Thread jim holtman
I would guess that your separator is not really a tab like you think
it is.  Take a small subset of the data, bring it up in a text editor,
check the contents and then try to read it.  Always start small to see
if it is working the way you think it should.  Also, it seems to have a
header, so why are you ignoring it?  Ignoring it may make your numeric
columns look like factors, which is probably not what you want.
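
A minimal sketch of that check done from within R rather than a text
editor. The sample file and its contents are invented for illustration;
substitute the path to the real .prn file. It prints the first raw lines,
tests them for tab characters, and counts the fields per line under two
candidate separators.

```r
# Hypothetical stand-in for the real file: a small space-padded sample
# written to a temporary file (contents invented for illustration).
path <- tempfile(fileext = ".prn")
writeLines(c("ID    DATE   Time  var1",
             "1     39356  0.15  4500",
             "2     39356  0.16  *"), path)

# Look at the first few raw lines before trying to parse anything.
head_lines <- readLines(path, n = 3)
print(head_lines)

# Does any of these lines actually contain a tab character?
has_tab <- grepl("\t", head_lines, fixed = TRUE)
print(has_tab)

# Number of fields per line under each candidate separator.
# sep = "" means "any run of whitespace".
fields_tab   <- count.fields(path, sep = "\t")
fields_white <- count.fields(path, sep = "")
print(fields_tab)
print(fields_white)
```

If the tab count is 1 on every line while the whitespace count is larger
and constant, the file is most likely space-padded rather than tab
separated.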

On Wed, Oct 29, 2008 at 12:19 PM,  [EMAIL PROTECTED] wrote:

 Hello,

 I am having problems reading a huge .prn file of almost 
 450.000 rows and 29 columns.
 The variables consist of characters, dates, times, and numeric values.
 I use read.table("file.prn", header=F, sep="\t", na.strings="*"), where the 
 missing values are coded as *.
 R reads the file without complaint, but when I ask for the dimensions 
 of the data frame I get the right number of rows but only 1 column...
 dim(file)
 [1] 422344  1

 It seems to read each whole row as one column.
 When I ask for the first 3 lines, for example, R shows that everything 
 was read as a factor, and I get something like this:

  data12L[1:3,]
 ID DATE Time RRR VEl Leng Weig Sub
 var1 var2 var3 var4 var5 var6 var7 var8 var9 var10 var11 var12 var13 var14 var15
 VAR1 VAR2 VAR3 VAR4 VAR5 VAR6 VAR7 VAR8 VAR9 VAR10 VAR11 VAR12 VAR13 VAR14 VAR15
 [2] 54678611   39356   0.1572569RW  892014   
 21400  V11A11  4500  7200  4700  5000 *   
   * * * * * * * * 
 * * 0   527   594   567 * 
 * * * * * * * *   
   * *
 [3] 54678612   39356   0.158RW   811716   
 33000   T11O3  7100  9100  5700  5600  5500   
   * * * * * * * * 
 * * 0   397   605   133   133 
 * * * * * * * *   
   * *

 422344 Levels: ID DATE Time RRR VEl Leng Weig Sub
 var1 var2 var3 var4 var5 var6 var7 var8 var9 var10 var11 var12 var13 var14 var15
 VAR1 VAR2 VAR3 VAR4 VAR5 VAR6 VAR7 VAR8 VAR9 VAR10 VAR11 VAR12 VAR13 VAR14 VAR15 ..

 Is there any solution? Any suggestion?
 And what is going on with the *? Is there any suggestion for this as well???
 Thanks for your time!

 Ismini

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] difficulties in reading a .prn file

2008-10-29 Thread Philipp Pagel
On Wed, Oct 29, 2008 at 06:19:51PM +0200, [EMAIL PROTECTED] wrote:
 I am having problems reading a huge .prn file of almost
 450.000 rows and 29 columns.  The variables consist of characters,
 dates, times, and numeric values.  I use read.table("file.prn", header=F,
 sep="\t", na.strings="*"), where the missing values are coded as *.  R
 reads the file without complaint, but when I ask for the dimensions of
 the data frame I get the right number of rows but only 1 column...
 dim(file)
 [1] 422344  1

The most likely explanation is that your file is not tab separated.

 And what is going on with the *? Is there any suggestion for this as well???

That should work fine as soon as you figure out the correct value for sep.

BTW: your output looks like you want to use header=T.
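
To illustrate the header point, here is a small sketch on invented data.
The column names and values are made up, and the tab separator is used
only because this toy sample really is tab separated. It compares the
column classes with and without header=TRUE, and shows "*" turning into
NA once the fields are split correctly.

```r
# Invented tab-separated sample; "*" marks a missing value,
# as in the original post.
txt <- c("ID\tvar1\tvar2",
         "1\t4500\t*",
         "2\t7100\t5500")

# header = FALSE: the names become the first data row, so every column
# is read as text (factor or character, depending on the R version).
no_header <- read.table(text = txt, header = FALSE, sep = "\t",
                        na.strings = "*")

# header = TRUE: the first line supplies the column names and the
# remaining values are read as numbers.
with_header <- read.table(text = txt, header = TRUE, sep = "\t",
                          na.strings = "*")

print(sapply(no_header, class))
print(sapply(with_header, class))

# The "*" in var2 is now a proper missing value.
print(is.na(with_header$var2))
```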

cu
Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://mips.gsf.de/staff/pagel



Re: [R] difficulties in reading a .prn file

2008-10-29 Thread Peter Dalgaard
jim holtman wrote:
 I would guess that your separator is not really a tab like you think
 it is.  Take a small subset of the data, bring it up in a text editor,
 check the contents and then try to read it.  Always start small to see
 if it is working the way you think it should.  Also, it seems to have a
 header, so why are you ignoring it?  Ignoring it may make your numeric
 columns look like factors, which is probably not what you want.


Also, there seem to be 38 columns, not 29...

Does it not work with plain whitespace separation, i.e.:

read.table("file.prn", header=T, na.strings="*")
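
A quick way to try this on a few rows before reading all of the file,
sketched on an invented space-padded sample (the names and values below
are made up; for the real file, pass its path instead of text=). The
nrows argument limits the trial read, and dim() then shows whether the
columns were split as expected.

```r
# Invented space-padded sample standing in for the real .prn file.
txt <- c("ID   DATE   Time   Sub   var1  var2",
         "101  39356  0.157  AB12  4500  *",
         "102  39356  0.158  CD3   7100  5500",
         "103  39357  0.160  EF45  *     5600")

# Default sep = "" splits on any run of spaces or tabs.
# nrows = 2 reads only the first two data rows as a trial.
trial <- read.table(text = txt, header = TRUE, na.strings = "*",
                    nrows = 2)

print(dim(trial))
str(trial)
```

Note that whitespace separation only works if no character field contains
embedded spaces; if some do, the rows will not all have the same number
of fields and read.table will complain.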


-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907
