Re: [R] R eat my data

2010-05-25 Thread Changbin Du
> id_gname<-read.table("/home/cdu/operon/id_name_gh5.txt", sep="\t",
+ quote="", skip=0, header=F, fill=T)
> dim(id_gname)
[1] 1932    3


Yes, it works after adding quote="" to the read.table options.
Thanks, Chris!







-- 
Sincerely,
Changbin
--

Changbin Du
DOE Joint Genome Institute
Bldg 400 Rm 457
2800 Mitchell Dr
Walnut Creek, CA 94598
Phone: 925-927-2856


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R eat my data

2010-05-25 Thread Chris Stubben

Gene names often have single quotes like

5'-methylthioadenosine phosphorylase  
ATP synthase B' chain
ppGpp 3'-pyrophosphohydrolase

so maybe try adding quote="" to the read.table options.
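A minimal sketch of the effect described above, assuming a tab-separated file with an ID column and a gene-name column (the IDs below are made up; the names are the ones quoted above). read.table() treats both " and ' as quote characters by default, so an apostrophe inside a name can open a quoted field that only closes at the next apostrophe, possibly several lines later:

```r
# Hypothetical four-line, tab-separated file: ID <TAB> gene name.
txt <- c("644727342\tconserved hypothetical protein",
         "644727343\tATP synthase B' chain",
         "644727344\tppGpp 3'-pyrophosphohydrolase",
         "644727345\tABC-2 type transporter")

# Default quote = "\"'": the apostrophe on line 2 is only closed by the one
# on line 3, so those two lines collapse into a single record.
bad <- read.table(text = txt, sep = "\t", fill = TRUE,
                  stringsAsFactors = FALSE)

# quote = "" switches quote processing off, so each line becomes one row.
good <- read.table(text = txt, sep = "\t", quote = "", fill = TRUE,
                   stringsAsFactors = FALSE)

c(lines = length(txt), default_quote = nrow(bad), no_quote = nrow(good))
```

Turning quoting off is only safe if no field legitimately relies on quotes, for example a field that contains an embedded tab.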

Chris Stubben
-- 
View this message in context: 
http://r.789695.n4.nabble.com/R-eat-my-data-tp2230217p2230303.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] R eat my data

2010-05-25 Thread Joris Meys
How do the last entries in the data frame look?




-- 
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



Re: [R] R eat my data

2010-05-25 Thread Changbin Du
> gene_name<-read.delim("/home/cdu/operon/id_name_gh5.txt", sep="\t", skip=0,
+ header=F, fill=T)
> dim(gene_name)
[1] 1932    3

Thanks, Tao!  Now you see, read.delim works!


Thanks, all, for your inputs! I really appreciate it!











-- 
Sincerely,
Changbin
--



Re: [R] R eat my data

2010-05-25 Thread Changbin Du
Thanks, Tao, I will try to do it.




-- 
Sincerely,
Changbin
--



Re: [R] R eat my data

2010-05-25 Thread Shi, Tao
Changbin,

It looks like you're trying to read in a gene annotation file, and those usually
contain many strange characters, e.g. "#", "&" (as other people also suggest).
I encounter this all the time. So try to be very thorough in your search
(the first place I'd look is the line where R stopped reading; see if anything
strange is there).

Also, changing "read.table" to "read.delim" often works.
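For reference, read.delim() is read.table() with different defaults: sep = "\t", quote = "\"" (double quote only), fill = TRUE, comment.char = "" and header = TRUE. Apostrophes and "#" are therefore ordinary characters there. A small sketch with made-up lines (the "#" name is hypothetical); note that header = FALSE has to be given explicitly:

```r
# Hypothetical lines containing apostrophes and a '#'.
txt <- c("644727343\tATP synthase B' chain",
         "644727344\tppGpp 3'-pyrophosphohydrolase",
         "644727345\tsubunit #2, ABC-2 type transporter")

# read.delim() only treats " as a quote and has no comment character,
# so every line is read as one row with its text intact.
genes <- read.delim(text = txt, header = FALSE, stringsAsFactors = FALSE)
nrow(genes)
genes$V2
```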

...Tao







Re: [R] R eat my data

2010-05-25 Thread Changbin Du
Thank you all for the contributions!

I will send the data back to the computer guys who collected the data yesterday.
Actually, the file can be opened in Excel and in a text editor. After replacing some
";" characters:

> gene_name<-read.table("/home/cdu/operon/id_name_gh5.txt", sep="\t",
+ skip=0, header=FALSE, fill=TRUE)
> dim(gene_name)
[1] 1205    3

It is now 1205 lines. It needs to be cleaned. Thanks so much for the great
help and input; I am afraid I took up a lot of your time.

Thanks!







-- 
Sincerely,
Changbin
--



Re: [R] R eat my data

2010-05-25 Thread David Winsemius
Have you compared them to

tail(gene_name, 2)

Come on, man, show some initiative.
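A minimal sketch of that comparison, assuming a tab-separated file whose first column is an ID (the temporary file and its contents are hypothetical stand-ins for id_name_gh5.txt). It compares the IDs in the last rows R produced with the IDs at the end of the raw file:

```r
# Hypothetical stand-in for id_name_gh5.txt.
f <- tempfile(fileext = ".txt")
writeLines(c("101\tATP synthase B' chain",
             "102\tconserved hypothetical protein",
             "103\tppGpp 3'-pyrophosphohydrolase",
             "104\tABC-2 type transporter"), f)

# Default quoting; IDs kept as character so they compare cleanly.
gene_name <- read.table(f, sep = "\t", fill = TRUE, colClasses = "character")
raw <- readLines(f)

tail(gene_name, 2)               # last records as R parsed them
tail(raw, 2)                     # last lines as they are on disk
tail(gene_name$V1, 2)            # IDs R ended up with
tail(sub("\t.*", "", raw), 2)    # IDs the file actually ends with
```

If the two ID vectors differ, some lines were merged or dropped before the end of the file.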


David Winsemius, MD
West Hartford, CT




Re: [R] R eat my data

2010-05-25 Thread Kevin E. Thorpe
When I encounter problems like this, I make sure each row has the 
expected number of columns.  Something like the following awk code is 
useful.


awk -F"\t" '{print NF}' id_name_gh5.txt | sort | uniq -c

Note: I'm not sure if the \t will work with the -F switch as above.
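An R-only alternative to the awk pipeline is count.fields(). A sketch, assuming three tab-separated columns are expected (the file below is hypothetical). Setting quote = "" and comment.char = "" makes it count raw tab-separated fields; blank lines are skipped by default, so indices can drift from file line numbers if the file has any:

```r
# Hypothetical file: line 2 is missing its third column.
f <- tempfile(fileext = ".txt")
writeLines(c("644727343\tATP synthase B' chain\tATP synthase B' chain",
             "644727344\tABC-2 type transporter",
             "644727345\tconserved hypothetical protein\tconserved hypothetical protein"), f)

# Number of tab-separated fields on each line, with quote and comment
# processing switched off.
nf <- count.fields(f, sep = "\t", quote = "", comment.char = "")

table(nf)        # comparable to `sort | uniq -c` on the awk output
which(nf != 3)   # lines that do not have the expected 3 fields
```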

Kevin









--
Kevin E. Thorpe
Biostatistician/Trialist, Knowledge Translation Program
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.tho...@utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016



Re: [R] R eat my data

2010-05-25 Thread Joris Meys
Without any clue about your data file, this is definitely unsolvable. But
some things to consider: Where is the dataset coming from? Did you check
for special characters? Is there an apostrophe somewhere in a string? (That
messed things up for me once.) Is the delimiter placed correctly everywhere?


Did you check what the data frame looks like? If you see which observation
was read in last, you can jump to that line number in the txt file and
check for yourself what goes wrong.
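One way to find that line number is to check which raw lines never became rows. A sketch, assuming the first column is a unique ID (the file and IDs are hypothetical):

```r
# Hypothetical file with an apostrophe on line 2 and another on line 4.
f <- tempfile(fileext = ".txt")
writeLines(c("101\tconserved hypothetical protein",
             "102\tATP synthase B' chain",
             "103\tABC-2 type transporter",
             "104\tppGpp 3'-pyrophosphohydrolase"), f)

genes <- read.table(f, sep = "\t", fill = TRUE, colClasses = "character")
raw   <- readLines(f)

# Line numbers whose ID never became a row of the data frame.
raw_ids <- sub("\t.*", "", raw)
lost    <- which(!raw_ids %in% genes$V1)
lost

# The line just before the first lost one is a good first place to look.
raw[min(lost) - 1]
```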





-- 
Joris Meys


Re: [R] R eat my data

2010-05-25 Thread Sarah Goslee
Without the actual file to look at, this is like playing 20 questions,
only not so much fun.

However, this kind of problem is most often caused by the presence
in your file of something that R interprets as a special character, usually
# or ' or ".

Can you open the file in a spreadsheet?
Can you open the file in a text editor?
Can you search for and remove/replace symbol characters?

The problem is usually in the last line read or the first line omitted,
so I'd start by looking near line 1068 for issues.
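A quick way to list candidate lines is a regular-expression search of the raw lines for the characters mentioned above. A sketch; the sample lines are hypothetical, and for the real file raw would come from readLines():

```r
# For the real file: raw <- readLines("/home/cdu/operon/id_name_gh5.txt")
raw <- c("644727343\tconserved hypothetical protein",
         "644727344\tATP synthase B' chain",
         "644727345\t\"putative\" transporter",
         "644727346\tsubunit #2")

# Line numbers containing #, ' or ", which read.table() treats specially
# by default.
suspect <- grep("[#'\"]", raw)
suspect

# Narrow down to a window around a given line number, e.g. near 1068:
# suspect[suspect >= 1060 & suspect <= 1075]
```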

*Reproducible examples* do a much better job of not wasting
people's time.

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] R eat my data

2010-05-25 Thread Changbin Du
c...@nuuk:~/operon$ grep '^#' id_name_gh5.txt
c...@nuuk:~/operon$

No lines start with #.






-- 
Sincerely,
Changbin
--



Re: [R] R eat my data

2010-05-25 Thread Changbin Du
644727344	ABC-2 type transporter	ABC-2 type transporter
644727345	conserved hypothetical protein	conserved hypothetical protein

Here are the last two lines of the file id_name_gh5.txt.





-- 
Sincerely,
Changbin
--



Re: [R] R eat my data

2010-05-25 Thread Mohamed Lajnef

Check the comments sent by David!

M








--


Mohamed Lajnef,IE 
INSERM U955 eq 15

Pôle de Psychiatrie
Hôpital CHENEVIER
40, rue Mesly
94010 CRETEIL Cedex FRANCE
mohamed.laj...@inserm.fr
tel : 01 49 81 31 31 (ext. 18470)
Sec : 01 49 81 32 90
fax : 01 49 81 30 99 




Re: [R] R eat my data

2010-05-25 Thread Barry Rowlingson
On Tue, May 25, 2010 at 4:42 PM, Changbin Du  wrote:
> HI, Dear R community,
>
> My original file has 1932 lines, but when I read into R, it changed to 1068
> lines, how comes?
>
>
> c...@nuuk:~/operon$ wc -l id_name_gh5.txt
> 1932 id_name_gh5.txt
>
>
>> gene_name<-read.table("/home/cdu/operon/id_name_gh5.txt", sep="\t",
> skip=0, header=F, fill=T)
>> dim(gene_name)
> [1] 1068    3
>
>

 Do any of your lines start with a "#"?

> read.table("test.txt",sep="\t")
  V1
1 line 1
2 line 2
3 line 3
4 line 4

> read.table("test.txt",comment.char="",sep="\t")
   V1
1  line 1
2  #commented
3  line 2
4  line 3
5 #nother comment
6  line 4
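The same demonstration can be made self-contained with the text argument instead of a test.txt file (the six lines are assumed from the output above):

```r
txt <- c("line 1", "#commented", "line 2", "line 3", "#nother comment", "line 4")

# Default comment.char = "#": the two comment lines become blank and are skipped.
default_read <- read.table(text = txt, sep = "\t")

# comment.char = "" keeps every line.
all_lines <- read.table(text = txt, sep = "\t", comment.char = "")

c(default = nrow(default_read), no_comment_char = nrow(all_lines))
```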

Just a guess; hard to tell without the file...

Barry



Re: [R] R eat my data

2010-05-25 Thread Changbin Du
> length(count.fields("/home/cdu/operon/id_name_gh5.txt"))
[1] 1932

It is 1932 lines when counted in R.






-- 
Sincerely,
Changbin
--



Re: [R] R eat my data

2010-05-25 Thread David Winsemius


On May 25, 2010, at 11:42 AM, Changbin Du wrote:

> HI, Dear R community,
>
> My original file has 1932 lines, but when I read into R, it changed
> to 1068 lines, how comes?

We are being asked to investigate this quest, how?

Have you looked at the last line to see if it looks like gene_name?

Isn't this isomorphic to genetics questions? What sort of mutation is
it? Deletion? Abnormal stop codon? Figure out where the transcription
process went wrong. This sort of analysis would appear to be right up
the alley of someone doing genetics.

> c...@nuuk:~/operon$ wc -l id_name_gh5.txt
> 1932 id_name_gh5.txt
>
>> gene_name<-read.table("/home/cdu/operon/id_name_gh5.txt", sep="\t",
> skip=0, header=F, fill=T)
>> dim(gene_name)
> [1] 1068    3



David Winsemius, MD
West Hartford, CT



Re: [R] R eat my data

2010-05-25 Thread Mohamed Lajnef

Hi Changbin,

Try this code in R to count the lines of your file without opening it:

 length(count.fields("id_name_gh5.txt")) 
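That line count can also be turned into an automatic check. A sketch of a hypothetical wrapper, read_checked(), which compares the number of rows read with the number of non-blank lines counted with quoting and comments switched off, and warns on a mismatch. It assumes a tab-separated file without a header line:

```r
# Hypothetical helper: read a tab-separated file and warn if rows went missing.
read_checked <- function(path, ...) {
  # One entry per non-blank line, counted with quote/comment handling off.
  n_lines <- length(count.fields(path, sep = "\t", quote = "",
                                 comment.char = ""))
  df <- read.table(path, sep = "\t", ...)
  if (nrow(df) != n_lines) {
    warning(sprintf("read %d rows but the file has %d non-blank lines",
                    nrow(df), n_lines))
  }
  df
}

# Usage with a hypothetical three-line file containing apostrophes.
f <- tempfile(fileext = ".txt")
writeLines(c("1\tATP synthase B' chain", "2\tABC-2 type transporter",
             "3\tppGpp 3'-pyrophosphohydrolase"), f)
ok <- read_checked(f, quote = "", fill = TRUE)   # 3 rows, no warning
# read_checked(f, fill = TRUE) warns: default quoting merges the lines
```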

Regards 
Mohamed





Changbin Du wrote:

> HI, Dear R community,
>
> My original file has 1932 lines, but when I read into R, it changed to 1068
> lines, how comes?
>
>
> c...@nuuk:~/operon$ wc -l id_name_gh5.txt
> 1932 id_name_gh5.txt
>
>> gene_name<-read.table("/home/cdu/operon/id_name_gh5.txt", sep="\t",
> skip=0, header=F, fill=T)
>> dim(gene_name)
> [1] 1068    3


  


