Re: [R] about data problem
You can use the latter IF you know there are no problems with the input data. If you need to troubleshoot, then you need separate columns so you can compare them.
--
Sent from my phone. Please excuse my brevity.

On September 20, 2016 4:22:41 PM PDT, lily li wrote:
>Thanks. The former method works. I confused character with factor.
>
>Besides, I should use: dta$DischargeNum <- as.numeric( dta$Discharge )
>instead of: dta$Discharge <- as.numeric( dta$Discharge )
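The difference between overwriting the column and adding a separate one can be sketched as follows. This is only an illustration: the data values and the stray "8.1O" entry (letter O instead of zero) are assumptions, not the poster's actual file.

```r
# Hypothetical stand-in for the poster's data; the "8.1O" typo is an assumption.
dta <- data.frame(
  Discharge = c("6.4", "7.58", "6.82", "8.63", "8.1O", "7.58"),
  stringsAsFactors = FALSE
)

# Overwriting in place: the bad entry becomes NA and its original text is gone.
overwritten <- dta
overwritten$Discharge <- suppressWarnings(as.numeric(overwritten$Discharge))

# Separate column: the raw text stays next to the converted value.
dta$DischargeNum <- suppressWarnings(as.numeric(dta$Discharge))

# Only the second form can still show what the failed entry looked like.
failed <- is.na(dta$DischargeNum)
print(dta[failed, ])
```

Once the data are known to be clean, overwriting is fine; the separate column is mainly a troubleshooting aid.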
Re: [R] about data problem
Thanks. The former method works. I confused character with factor.

Besides, I should use:
dta$DischargeNum <- as.numeric( dta$Discharge )
instead of:
dta$Discharge <- as.numeric( dta$Discharge )

On Tue, Sep 20, 2016 at 5:18 PM, Jeff Newmiller wrote:
> Which means it avoided converting to factor... Success!
>
> Note that the column apparently has garbage characters in one or more of
> the rows, which should be evident when you LOOK AT THE CHARACTERS in the
> column. They should all be numeric symbols, plus or minus, and perhaps
> decimal points. If they are not, then the conversion to numeric will be
> incomplete. See my other message. You have the choice of editing the file
> (may have concerns with traceability), or you can write R code that removes
> the garbage characters using gsub.
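The character-versus-factor confusion mentioned above matters because as.numeric() behaves differently on the two types. A minimal sketch, using made-up values (the "8.1O" entry is an assumption chosen to make the column non-numeric):

```r
# Hypothetical values; the stray "8.1O" is an assumption.
raw <- c("6.4", "7.58", "6.82", "8.63", "8.1O", "7.58")

as_factor <- factor(raw)

# On a factor, as.numeric() returns the internal level codes, not the printed values.
codes <- as.numeric(as_factor)

# Going through character first converts the labels themselves;
# labels that are not valid numbers become NA.
values <- suppressWarnings(as.numeric(as.character(as_factor)))

print(codes)
print(values)
```

With a character column the intermediate as.character() step is unnecessary, which is one reason reading with stringsAsFactors=FALSE makes the conversion easier to reason about.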
Re: [R] about data problem
Which means it avoided converting to factor... Success!

Note that the column apparently has garbage characters in one or more of the rows, which should be evident when you LOOK AT THE CHARACTERS in the column. They should all be numeric symbols, plus or minus, and perhaps decimal points. If they are not, then the conversion to numeric will be incomplete. See my other message. You have the choice of editing the file (may have concerns with traceability), or you can write R code that removes the garbage characters using gsub.

On September 20, 2016 4:09:02 PM PDT, lily li wrote:
>Yes, I tried to add this statement when reading the dataset.
>But when I use summary(df), it shows:
>Discharge
>Length:
>Class :character
>Mode :character
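The gsub approach mentioned above might look like the sketch below. The raw strings and the kinds of garbage in them are assumptions for illustration; the real file may contain something else entirely.

```r
# Hypothetical raw strings; the specific kinds of garbage shown are assumptions.
raw <- c("6.4", "7.58 ", "6.82*", "8.63", "8,16", "-7.58")

# Flag anything that is not an optional sign, digits, and an optional decimal part.
looks_numeric <- grepl("^[+-]?[0-9]+(\\.[0-9]+)?$", raw)
print(raw[!looks_numeric])

# Remove every character that is not a digit, sign, or decimal point.
cleaned <- gsub("[^0-9.+-]", "", raw)
print(cleaned)

# Caution: "8,16" becomes "816" here, which converts cleanly but is
# probably not the intended value.
converted <- as.numeric(cleaned)
```

Because blind stripping can silently change a value, as with the comma entry, it seems safer to review the flagged strings first and only then decide on the replacement pattern.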
Re: [R] about data problem
Yes, I tried to add this statement when reading the dataset.
But when I use summary(df), it shows:

Discharge
Length:
Class :character
Mode :character

On Tue, Sep 20, 2016 at 5:06 PM, Joe Ceradini wrote:
> read.csv("your_data.csv", stringsAsFactors=FALSE)
> (I'm just reiterating Jianling said...)
>
> Joe
Re: [R] about data problem
Find the offending data. One approach is to look at the input data with your image sensors and neural pattern processor (eyes and brain). One way to reduce the load on those tools is to read in the data with the stringsAsFactors=FALSE argument and try manually converting the resulting character strings into numeric values. You can then use the is.na function to find which rows failed to convert and use indexing to review the strings that had trouble.

# I recommend against using df as a variable name, since it is the name of a function in base R
dta$DischargeNum <- as.numeric( dta$Discharge )
dta[ is.na( dta$DischargeNum ), "Discharge" ]

On September 20, 2016 3:42:39 PM PDT, lily li wrote:
>Thanks. Then what should I do to solve the problem?
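A self-contained version of the is.na approach above, starting from the read step, could look like this. The CSV text is a hypothetical stand-in for the poster's file, and the letter "O" in row 5 is an assumption.

```r
# Hypothetical CSV contents; the letter "O" in row 5 is an assumption.
csv_text <- "month,day,year,Discharge
3,1,2010,6.4
3,2,2010,7.58
3,3,2010,6.82
3,4,2010,8.63
3,5,2010,8.1O
3,6,2010,7.58"

dta <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Convert into a new column; entries that are not valid numbers become NA (with a warning).
dta$DischargeNum <- suppressWarnings(as.numeric(dta$Discharge))

# Row numbers and original strings of the entries that failed to convert.
bad_rows <- which(is.na(dta$DischargeNum))
print(dta[bad_rows, c("month", "day", "year", "Discharge")])
```

If the file also contains genuine missing values, those rows show up in bad_rows as well, so the printed strings still need a look.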
Re: [R] about data problem
read.csv("your_data.csv", stringsAsFactors=FALSE)
(I'm just reiterating what Jianling said...)

Joe

On Tue, Sep 20, 2016 at 4:56 PM, lily li wrote:
> Is there an argument in read.csv that I can use to avoid converting numeric
> to factor? Thanks a lot.
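The effect of the argument can be seen on a small inline example. The CSV text below is hypothetical, with one deliberately non-numeric entry; stringsAsFactors is set explicitly in both calls because its default changed to FALSE in R 4.0.0.

```r
# Hypothetical CSV text with one non-numeric entry ("8.1O", an assumption).
csv_text <- "Discharge\n6.4\n7.58\n8.1O\n"

as_fac <- read.csv(text = csv_text, stringsAsFactors = TRUE)
as_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)

print(class(as_fac$Discharge))
print(class(as_chr$Discharge))

# A column with only valid numbers is read as numeric either way.
clean <- read.csv(text = "Discharge\n6.4\n7.58\n", stringsAsFactors = FALSE)
print(class(clean$Discharge))
```

So the argument only changes what a non-numeric column turns into; it does not by itself make the column numeric.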
Re: [R] about data problem
Is there an argument in read.csv that I can use to avoid converting numeric to factor? Thanks a lot.

On Tue, Sep 20, 2016 at 4:42 PM, lily li wrote:
> Thanks. Then what should I do to solve the problem?
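Besides stringsAsFactors, the colClasses argument suggested earlier in the thread declares the column types up front. A rough sketch, using hypothetical CSV text shaped like the poster's data:

```r
# Hypothetical CSV text shaped like the poster's data.
csv_text <- "month,day,year,Discharge
3,1,2010,6.4
3,2,2010,7.58"

# Named colClasses are matched to column names.
dta <- read.csv(
  text = csv_text,
  colClasses = c(month = "integer", day = "integer",
                 year = "integer", Discharge = "numeric")
)
str(dta)

# With a non-numeric entry (the "8.1O" here is an assumption), the read
# stops with an error instead of quietly producing a factor or character column.
bad_text <- "month,day,year,Discharge\n3,5,2010,8.1O"
result <- tryCatch(
  read.csv(text = bad_text, colClasses = c(Discharge = "numeric")),
  error = function(e) conditionMessage(e)
)
print(result)
```

Failing loudly at read time is arguably the point of this option: the bad entry has to be dealt with before any analysis runs.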
Re: [R] about data problem
Thanks. Then what should I do to solve the problem?

On Tue, Sep 20, 2016 at 4:30 PM, Jeff Newmiller wrote:
> I suppose you can do what works for your data, but I wouldn't recommend
> na.rm=TRUE because it hides problems rather than clarifying them.
>
> If in fact your data includes true NA values (the letters NA or simply
> nothing between the commas are typical ways this information may be
> indicated), then read.csv will NOT change from integer to factor
> (particularly if you have specified which markers represent NA using the
> na.strings argument documented under read.table)... so you probably DO have
> unexpected garbage still in your data which could be obscuring valuable
> information that could affect your conclusions.
Re: [R] about data problem
I suppose you can do what works for your data, but I wouldn't recommend
na.rm=TRUE because it hides problems rather than clarifying them.

If in fact your data includes true NA values (the letters NA or simply
nothing between the commas are typical ways this information may be
indicated), then read.csv will NOT change from integer to factor
(particularly if you have specified which markers represent NA using the
na.strings argument documented under read.table)... so you probably DO have
unexpected garbage still in your data which could be obscuring valuable
information that could affect your conclusions.
--
Sent from my phone. Please excuse my brevity.

On September 20, 2016 3:11:42 PM PDT, lily li wrote:
> I reread the data and used 'na.rm = T' when reading it. This time there
> is no such problem. It seems that the existence of NAs converts the
> integer column to a factor. Thanks for your help.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
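As a rough illustration of the na.strings point above, here is a minimal sketch. It reads inline text through the text= argument instead of a real file, and the values and the "missing" marker are invented for the example; they are assumptions, not the poster's data.

```r
# Hypothetical CSV content; "missing" stands in for a non-standard NA marker.
csv_text <- "month,day,year,Discharge
3,1,2010,6.4
3,2,2010,NA
3,3,2010,
3,4,2010,missing"

# Without na.strings, the word "missing" forces the column to character.
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(raw$Discharge)

# Declaring every marker that means "no value" lets the column parse as numeric.
dta <- read.csv(text = csv_text, na.strings = c("NA", "", "missing"))
class(dta$Discharge)
sum(is.na(dta$Discharge))
```

If the second call still yields a character or factor column on real data, that suggests there are other unexpected entries left to find.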
Re: [R] about data problem
I reread the data and used 'na.rm = T' when reading it. This time there
is no such problem. It seems that the existence of NAs converts the
integer column to a factor. Thanks for your help.

On Tue, Sep 20, 2016 at 4:09 PM, Jianling Fan wrote:
> Add "stringsAsFactors = FALSE" when you read the data, and then
> convert the column to numeric.
Re: [R] about data problem
Add "stringsAsFactors = FALSE" when you read the data, and then
convert the column to numeric.

On 20 September 2016 at 16:00, lily li wrote:
> Yes, it is stored as a factor. I can't find any problem in the original
> data. Rereading the data doesn't help either. I use read.csv to read in
> the data; do you think it is better to use read.table? Thanks again.

--
Jianling Fan
樊建凌
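The advice above (read as character, then convert) can be sketched as follows. The inline data and the stray letter O are made up for illustration; the DischargeNum column name follows the one used elsewhere in this thread. Converting into a separate column keeps the original text around so the offending rows can be inspected.

```r
# Hypothetical input: the third value uses a letter O instead of a zero.
csv_text <- "Discharge
6.4
7.58
6.8O
8.63"

dta <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Convert into a new column so the original text stays available for checking.
dta$DischargeNum <- suppressWarnings(as.numeric(dta$Discharge))

# Rows where conversion failed although the original text was not missing.
bad <- which(is.na(dta$DischargeNum) & !is.na(dta$Discharge))
dta[bad, ]

# If a column has already become a factor, go through character first;
# as.numeric() on a factor returns the internal level codes, not the values.
f <- factor(c("6.4", "7.58"))
as.numeric(f)
as.numeric(as.character(f))
```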
Re: [R] about data problem
Yes, it is stored as a factor. I can't find any problem in the original
data. Rereading the data doesn't help either. I use read.csv to read in
the data; do you think it is better to use read.table? Thanks again.

On Tue, Sep 20, 2016 at 3:55 PM, Greg Snow <538...@gmail.com> wrote:
> This indicates that your Discharge column has been stored/converted as
> a factor (run str(df) to verify and check other columns). This
> usually happens when functions like read.table are left to figure out
> what each column is and find something in that column that cannot be
> converted to a number (possibly an oh instead of a zero, an el instead
> of a one, or just a letter or punctuation mark accidentally in the
> file). You can either find the error in your original data, fix it,
> and reread the data, or specify that the column should be numeric using
> the colClasses argument to read.table or a similar function.
Re: [R] about data problem
This indicates that your Discharge column has been stored/converted as
a factor (run str(df) to verify and check other columns). This
usually happens when functions like read.table are left to figure out
what each column is and find something in that column that cannot be
converted to a number (possibly an oh instead of a zero, an el instead
of a one, or just a letter or punctuation mark accidentally in the
file). You can either find the error in your original data, fix it,
and reread the data, or specify that the column should be numeric using
the colClasses argument to read.table or a similar function.

On Tue, Sep 20, 2016 at 3:46 PM, lily li wrote:
> Hi R users,
>
> I have a problem in reading data.
> For example, part of my dataframe is like this:
>
> df
> month  day  year  Discharge
>     3    1  2010       6.4
>     3    2  2010       7.58
>     3    3  2010       6.82
>     3    4  2010       8.63
>     3    5  2010       8.16
>     3    6  2010       7.58
>
> Then if I type summary(df), why does it convert the Discharge data to
> levels? I also ran into the same problem when reading some other csv
> files. How can I solve this problem? Thanks.
>
> Discharge
> 7.58 :2
> 6.4  :1
> 6.82 :1
> 8.63 :1
> 8.16 :1

--
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com
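A minimal sketch of the colClasses route described above, using inline text instead of a file. The column layout mirrors the example in the question, and the "7.5B" typo is invented to show what happens when a declared numeric column contains a stray character.

```r
csv_text <- "month,day,year,Discharge
3,1,2010,6.4
3,2,2010,7.58
3,3,2010,6.82"

# Declare the column types up front instead of letting read.csv guess.
types <- c("integer", "integer", "integer", "numeric")
df <- read.csv(text = csv_text, colClasses = types)
str(df)

# With a stray character in a declared numeric column, the read stops with
# an error instead of quietly producing a factor or character column.
bad_text <- sub("7.58", "7.5B", csv_text, fixed = TRUE)
result <- tryCatch(
  read.csv(text = bad_text, colClasses = types),
  error = function(e) conditionMessage(e)
)
result
```

The error message names the value that could not be parsed, which is often the quickest way to locate the bad entry in the file.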
[R] about data problem
Hi R users,

I have a problem in reading data.
For example, part of my dataframe is like this:

df
month  day  year  Discharge
    3    1  2010       6.4
    3    2  2010       7.58
    3    3  2010       6.82
    3    4  2010       8.63
    3    5  2010       8.16
    3    6  2010       7.58

Then if I type summary(df), why does it convert the Discharge data to
levels? I also ran into the same problem when reading some other csv
files. How can I solve this problem? Thanks.

Discharge
7.58 :2
6.4  :1
6.82 :1
8.63 :1
8.16 :1
Re: [R] "invalid argument to unary operator" while selecting rows by name
Sorry, I've made a stupid mistake. It's obviously the other way around.

ix <- which(rownames(data) %in% c("601", "604"))
clean <- data[-ix, ]

Rui Barradas

Quoting ruipbarra...@sapo.pt:
> Hello,
>
> Try something like the following.
>
> ix <- which(c("601", "604") %in% rownames(data))
> clean <- data[-ix, ]
>
> Hope this helps,
>
> Rui Barradas
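A variant of the corrected code above, sketched with a logical mask instead of which() plus negative indexing. The small data frame and the to_drop name are made up for illustration. One reason to prefer the mask: if none of the names match, which() returns an empty vector, and data[-ix, ] then returns zero rows rather than the full data frame.

```r
# Hypothetical data frame with character row names, as after read.csv2()
# with the first column used for row names.
data <- data.frame(x = 1:5, row.names = c("6", "29", "601", "604", "610"))

# Unary minus is defined for numbers, not strings, so -"601" is an error.
# A logical mask sidesteps negative indexing altogether.
to_drop <- c("601", "604")
clean <- data[!(rownames(data) %in% to_drop), , drop = FALSE]
rownames(clean)

# The mask stays safe when nothing matches: every row is kept.
nrow(data[!(rownames(data) %in% "999"), , drop = FALSE])
```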
Re: [R] "invalid argument to unary operator" while selecting rows by name
Hello,

Try something like the following.

ix <- which(c("601", "604") %in% rownames(data))
clean <- data[-ix, ]

Hope this helps,

Rui Barradas

Quoting Pauline Laïlle:
> Dear all,
>
> I built a dataframe with read.csv2(). Initially, row names are integers
> (order of answers to a survey). They are listed in the csv's first column.
> The import works well and my dataframe looks like I wanted it to look.
>
> Row names go as follows :
> [1] "6" "29" "31" "32" "52" "55" "63" "71" "72" "80" "88" "89" "91" "93" "105" "110" "111" "117" "119" "120"
> [21] "122" "127" "128" "133" "137" "140" "163" "165" "167" "169" "177" "178" "179" "184" "186" "192" "193" "200" "201" "228"
> etc.
>
> I would like to drop rows "601" & "604" to clean the dataframe.
>
> While data["601",] shows me the first row I'd like to drop, data[-"601",]
> returns the following :
> Error in -"601" : invalid argument to unary operator
>
> The same happens with data[c("601","604"),] and data[-c("601","604"),]
>
> It is the first time that I run into this specific error. After reading a
> bit about it I still don't understand what it means and how to fix it.
>
> Thanks for reading!
> Best,
> Pauline.
Re: [R] How to plot the regression line of multivariable linear model?
You might consider the Predict.Plot and TkPredict functions in the
TeachingDemos package. These help you explore multiple linear
regression models by plotting the "line" relating the response to one
of the predictors at given values of the other predictors. These
lines can be combined in a single plot (Predict.Plot) or changed
interactively (TkPredict). See the examples in the help page.

On Sun, Sep 18, 2016 at 9:26 AM, mviljamaa wrote:
> I'm having a bit of trouble plotting the regression line of multivariable
> linear model.
>
> Specifically my model has one response and two predictors, i.e. it's of the
> form
>
> Y = b_0+b_1*X_1+b_2*X_2
>
> Plotting the regression line for a single predictor model
>
> Y = b_0+b_1*X_1
>
> is simple enough, just call abline() with the coefficients returned by lm().
>
> However, I don't know if this can be adapted to multivariable linear models.
>
> I also know about curve(), but I don't know how am I supposed to input the
> multivariable model's coefficients into it.

--
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com
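The idea of drawing the line for one predictor at a given value of the other can also be sketched in base R with predict(). This is not the TeachingDemos approach recommended above, only a plain alternative; the data are simulated, and holding X_2 at its mean is an arbitrary choice made for the example.

```r
# Simulated data for illustration only.
set.seed(1)
n  <- 50
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 5)
y  <- 2 + 1.5 * x1 - 0.8 * x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

# Y against X_1 with X_2 held at its mean: predict over a grid of x1 values.
grid <- data.frame(x1 = seq(min(x1), max(x1), length.out = 100),
                   x2 = mean(x2))
grid$yhat <- predict(fit, newdata = grid)

plot(x1, y, xlab = "X_1", ylab = "Y")
lines(grid$x1, grid$yhat)

# The same line via abline(): the intercept absorbs the fixed X_2 term.
b <- coef(fit)
abline(a = b["(Intercept)"] + b["x2"] * mean(x2), b = b["x1"], lty = 2)
```

Repeating the grid step with other fixed values of x2 gives a family of parallel lines, one per value.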
Re: [R] "invalid argument to unary operator" while selecting rows by name
Hint: "601" is not 601.

Have you gone through any R tutorials?

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Tue, Sep 20, 2016 at 5:42 AM, Pauline Laïlle wrote:
> Dear all,
>
> I built a dataframe with read.csv2(). Initially, row names are integers
> (order of answers to a survey). They are listed in the csv's first column.
> The import works well and my dataframe looks like I wanted it to look.
>
> Row names go as follows :
> [1] "6" "29" "31" "32" "52" "55" "63" "71" "72" "80" "88" "89" "91" "93" "105" "110" "111" "117" "119" "120"
> [21] "122" "127" "128" "133" "137" "140" "163" "165" "167" "169" "177" "178" "179" "184" "186" "192" "193" "200" "201" "228"
> etc.
>
> I would like to drop rows "601" & "604" to clean the dataframe.
>
> While data["601",] shows me the first row I'd like to drop, data[-"601",]
> returns the following :
> Error in -"601" : invalid argument to unary operator
>
> The same happens with data[c("601","604"),] and data[-c("601","604"),]
>
> It is the first time that I run into this specific error. After reading a
> bit about it I still don't understand what it means and how to fix it.
>
> Thanks for reading!
> Best,
> Pauline.
[R] "invalid argument to unary operator" while selecting rows by name
Dear all,

I built a dataframe with read.csv2(). Initially, row names are integers
(order of answers to a survey). They are listed in the csv's first column.
The import works well and my dataframe looks like I wanted it to look.

Row names go as follows :
[1] "6" "29" "31" "32" "52" "55" "63" "71" "72" "80" "88" "89" "91" "93" "105" "110" "111" "117" "119" "120"
[21] "122" "127" "128" "133" "137" "140" "163" "165" "167" "169" "177" "178" "179" "184" "186" "192" "193" "200" "201" "228"
etc.

I would like to drop rows "601" & "604" to clean the dataframe.

While data["601",] shows me the first row I'd like to drop, data[-"601",]
returns the following :
Error in -"601" : invalid argument to unary operator

The same happens with data[c("601","604"),] and data[-c("601","604"),]

It is the first time that I run into this specific error. After reading a
bit about it I still don't understand what it means and how to fix it.

Thanks for reading!
Best,
Pauline.
[R] mgcv: bam(), error in models with random intercepts and random slopes
Hi all,

I am using the bam function of the mgcv package to model behavioral data of
a learning experiment. To model individual variation in learning rate, I am
testing models with (a) by-participant random intercepts of trial, (b)
by-participant random slopes and random intercepts of trial, and (c)
by-participant random smooth terms.

While all (a) and (c) models converge, I am getting an error for every
possible variation of a model with random intercepts and random slopes. For
example:

m1.rs <- bam(acc ~ 1 + igc + s(ctrial) + s(sbj, bs="re") + s(ctrial, sbj, bs="re"),
             data=data_a, family=binomial)
Error in G$smooth[[i]]$first.para:G$smooth[[i]]$last.para :
  argument of length 0

Any idea on what that error might be?

Thank you in advance for your time.
Fotis

P.S.: R version: 3.3.1, mgcv version: 1.8.15

--
PhD Candidate
Department of Philosophy and History of Science
University of Athens, Greece.
http://users.uoa.gr/~aprotopapas/LLL/en/members.html#fotisfotiadis
Re: [R] Run a fixed effect regression and a logit regression on a national survey that need to be "weighted"
If you want your records to be weighted by the survey weights during the
analysis, then use the weights= argument of the glm() function.

Jean

On Tue, Sep 20, 2016 at 5:04 AM, laura roncaglia wrote:
> I am a beginner user of R. I am using a national survey to test what
> variables influence the participation in complementary pensions (the
> participation in complementary pension is voluntary in my country).
>
> Since the dependent variable is a dummy (1 if the person participates and
> 0 otherwise) I want to run a logit or probit regression; moreover I want
> to run a fixed effect regression since I subset the survey in order to
> have only the individuals interviewed more than one time.
>
> The data frame is composed of several social and economic variables and
> it also contains a variable "weight" which is the survey weight (they are
> weighting coefficients to adjust the results of the sample to the national
> data).
>
>   family pers sex income pension
> 1      1   01   F      1       1
> 2      2   01   F      2       1
> 3      2   02   M      4       0
> 4      3   01   M  25000       0
> 5      3   02   F      5       0
> 6      4   01   M      6       1
>
> pers is the component of the family and pension takes 1 if the person
> participates in a complementary pension (it is a simplification of the
> original survey, which contains more variables and observations (around
> 22k observations)).
>
> I know how to use the plm and glm functions for a fixed effect or logit
> regression; in this case I don't know what to do since I need to take
> account of the survey weights.
>
> I used the svydesign function to "weight" the data frame:
>
> df1 <- svydesign(ids=~1, data=df, weights=~dfweight)
>
> I used ids=~1 because there isn't a "cluster" variable in the survey (I
> know that the towns are randomly selected and then individuals are
> randomly selected, but there isn't a variable that indicates the
> stratification).
>
> At this point I am lost: I don't know if it is right to use the survey
> package and then what function to use to run the regression, or if there
> is a way to use the plm or glm functions taking account of the weights.
>
> I tried hard to find a solution on the web, but if you could give me an
> answer I'd be glad.
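A minimal sketch of the weights= suggestion above, using base R only. The data are simulated and every variable name is an assumption made for the example. Note that glm() does not treat its weights as a sampling design, so the standard errors may not reflect how the survey was drawn; the survey package's svyglm(), applied to a svydesign object like the one in the question, is the design-based alternative. The sketch also leaves the fixed-effects part of the question aside.

```r
# Simulated survey-like data; all names and values are made up.
set.seed(42)
n  <- 200
df <- data.frame(
  income = round(runif(n, 10000, 60000)),
  sex    = factor(sample(c("F", "M"), n, replace = TRUE)),
  weight = runif(n, 0.5, 2)
)
df$pension <- rbinom(n, 1, plogis(-2 + 0.00005 * df$income))

# Weighted logit. quasibinomial gives the same point estimates as binomial
# but does not warn about non-integer weights.
fit <- glm(pension ~ income + sex, data = df,
           weights = weight, family = quasibinomial(link = "logit"))
coef(fit)
```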
Re: [R] Return the indices of rows of a data frame
There are many good R tutorials on the web. Some recommendations can be found here: https://www.rstudio.com/online-learning/#R

Please spend some time learning fundamental R constructs and functionality before posting what appear to be very basic questions here.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Mon, Sep 19, 2016 at 8:37 PM, John wrote:

> Hi,
>
> I have the following dataframe:
>
> > temp<-data.frame(a=c(1,1,2), b=2:4, c=1:3)
> > row.names(temp)<-c("D", "E", "F")
> > temp
>   a b c
> D 1 2 1
> E 1 3 2
> F 2 4 3
>
> I would like R to tell me which rows have value "a" equal to 1. The
> answer is the first row and the second row, or row D and row E. Which
> function should I use? Function subset? Function which?
>
> Thanks!
Re: [R] lm model with many categorical variables
You need statistical help, which is generally off topic here. I suggest you post to a statistical site like stats.stackexchange.com instead. Better yet, find a local statistical expert with whom you can consult.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Tue, Sep 20, 2016 at 1:34 AM, Michael Haenlein wrote:

> Dear all,
>
> I am trying to estimate a lm model with one continuous dependent variable
> and 11 independent variables that are all categorical, some of which have
> many categories (several dozen in some cases).
>
> I am not interested in statistical inference to a larger population. The
> objective of my model is to find a way to best predict my continuous
> variable within the sample.
>
> When I run the lm model I evidently get many regression coefficients that
> are not significant. Is there some way to automatically combine levels of a
> categorical variable if the regression coefficients for the
> individual levels are not significant?
>
> My idea is to find some form of grouping of the different categories that
> allows me to work with fewer levels while keeping or even improving the
> quality of predictions.
>
> Thanks,
>
> Michael
Re: [R] Return the indices of rows of a data frame
On 9/19/2016 10:37 PM, John wrote:

> Hi,
>
> I have the following dataframe:
>
> temp<-data.frame(a=c(1,1,2), b=2:4, c=1:3)
> row.names(temp)<-c("D", "E", "F")
> temp
>   a b c
> D 1 2 1
> E 1 3 2
> F 2 4 3
>
> I would like R to tell me which rows have value "a" equal to 1. The
> answer is the first row and the second row, or row D and row E. Which
> function should I use? Function subset? Function which?

row.names(temp[temp$a==1,])

--
Robert W. Baer, Ph.D.
Professor of Physiology
Kirksville College of Osteopathic Medicine
A T Still University of Health Sciences
800 W. Jefferson St
Kirksville, MO 63501
660-626-2321 Department
660-626-2965 FAX
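For completeness, a short sketch of the which() route the question also asks about, on the same data frame. which() returns the row positions, while indexing row.names() with the logical condition returns the row names; `idx` and `nms` are just illustrative variable names.

```r
temp <- data.frame(a = c(1, 1, 2), b = 2:4, c = 1:3)
row.names(temp) <- c("D", "E", "F")

idx <- which(temp$a == 1)            # integer positions of the matching rows
nms <- row.names(temp)[temp$a == 1]  # row names of the matching rows

idx  # 1 2
nms  # "D" "E"
```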
Re: [R] Issue on LGP solving
Thanks Petr!!

However, in the goalprog package I found "achievements" described as "a data frame with the deviation variables for each objective together with the priority level". I defined

> p1<-c(2,0,0,0,0,0)
> p2 <- c(0,0,0,0,1,0)
> p3<- c(0,0,0,0,0,1)
> achievement <- data.frame(p1,p2,p3)

Here p1, p2 and p3 are the 3 priority levels. I understand the problem is with the "achievement" data frame. To your point about a data frame with four named columns (objective, priority, p and n): how are these four columns defined?

Appreciate your time Petr. Thanks again!!

Regards,
Debasis Ghosh, Ph.D

-----Original Message-----
From: PIKAL Petr [mailto:petr.pi...@precheza.cz]
Sent: Tuesday, September 20, 2016 6:55 AM
To: Dr. Debasis Ghosh; R-help@r-project.org
Subject: RE: [R] Issue on LGP solving

Hi

Just a wild guess. Achievement in the goalprog package is a data frame with four named columns (objective, priority, p and n). Your achievement is a 3-column data.frame with names p1, p2 and p3. Maybe a data frame with a defined structure is required.

Cheers
Petr

> -----Original Message-----
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Dr. Debasis Ghosh
> Sent: Tuesday, September 20, 2016 8:12 AM
> To: R-help@r-project.org
> Subject: [R] Issue on LGP solving
>
> I was solving an LGP problem which is very basic.
>
> Find x0 = [x1; x2], n0 = [n1; n2; n3] and p0 = [p1; p2; p3] that minimize
> a = [(2p1); (n2); (n3)]
>
> The objectives are as follows:
>
> 10x1 + 15x2 + n1 - p1 = 40
> 100x1 + 100x2 + n2 - p2 = 1000
> x2 + n3 - p3 = 7
> x; n; p >= 0
>
> The solution is x' = [4; 0] and a = [0; 600; 7]
>
> > local({pkg <- select.list(sort(.packages(all.available = TRUE)),graphics=TRUE)
> + if(nchar(pkg)) library(pkg, character.only=TRUE)})
> > local({pkg <- select.list(sort(.packages(all.available = TRUE)),graphics=TRUE)
> + if(nchar(pkg)) library(pkg, character.only=TRUE)})
> > coeff<-matrix (c(10,15,100,100,0,1), nrow=3, ncol=2, byrow=TRUE)
> > target<-c(40,1000,7)
> > p1<-c(2,0,0,0,0,0)
> > p2 <- c(0,0,0,0,1,0)
> > p3<- c(0,0,0,0,0,1)
> > achievement <- data.frame(p1,p2,p3)
> > achievement
>   p1 p2 p3
> 1  2  0  0
> 2  0  0  0
> 3  0  0  0
> 4  0  0  0
> 5  0  1  0
> 6  0  0  1
> > llgp(coeff,target,achievement)
>
> Do you have any idea why I am seeing the error below?
>
> Error in matrix(0, nrow = levels, ncol = nonbasics) :
>   invalid 'nrow' value (too large or NA)
> In addition: Warning messages:
> 1: In max(achievements$priority) :
>   no non-missing arguments to max; returning -Inf
> 2: In matrix(0, nrow = levels, ncol = nonbasics) :
>   NAs introduced by coercion to integer range
>
> Regards,
> Debasis Ghosh, Ph.D
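On how the four columns might be defined: a sketch, assuming one row per objective, with `p` and `n` holding the weights on the positive and negative deviation variables of the achievement function a = [(2p1); (n2); (n3)] from the original post. The column names come from Petr's message; the mapping of the weights onto them is my reading of the problem, not something confirmed in this thread. The llgp() call is guarded so the data frame can be inspected even when goalprog is not installed.

```r
coefficients <- matrix(c(10, 15, 100, 100, 0, 1), nrow = 3, ncol = 2, byrow = TRUE)
targets <- c(40, 1000, 7)

# One row per objective (assumed layout):
#   priority 1 minimizes 2*p1, priority 2 minimizes n2, priority 3 minimizes n3
achievements <- data.frame(
  objective = 1:3,
  priority  = 1:3,
  p         = c(2, 0, 0),  # weights on positive deviations p1, p2, p3
  n         = c(0, 1, 1)   # weights on negative deviations n1, n2, n3
)

# Only attempt the solve when the goalprog package is available
if (requireNamespace("goalprog", quietly = TRUE)) {
  soln <- goalprog::llgp(coefficients, targets, achievements)
}
```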
Re: [R] Errors in Raster to Point
First answer is that R is not the proper environment for such a question. There are many free packages for image analysis or even GIS. Try, for example, Q-GIS.

David

On 19/09/2016 22:34, GwanSeon Kim wrote:

> Hi, all
>
> I am just a beginner with R. I am working with a TIF image file, and the
> information about the raster is the following:
>
> class       : RasterLayer
> dimensions  : 11150, 21808, 243159200  (nrow, ncol, ncell)
> resolution  : 30, 30  (x, y)
> extent      : 569685, 1223925, 1513995, 1848495  (xmin, xmax, ymin, ymax)
> coord. ref. : +proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0
> data source : C:\Users\Gwan\AppData\Local\Temp\Rtmpg506Ee\raster\r_tmp_2016-09-14_122409_6260_09589.grd
> names       : test_map
> values      : 1, 225  (min, max)
> attributes  :
>        ID OBJECTID Value Red Green Blue Count Class_Name Opacity
>  from: 02 1 1 00 5982503 Corn 1
>  to  : 48 255 254 0 00 10336 Dbl Crop Barley/Soybeans 1
>
> From this RasterLayer, I want to convert the raster to a point for each
> pixel based on "Value" (one of the column names) and create a raster with
> georeferenced information.
>
> I used the following code:
>
> RP <- rasterToPoints(KY_raster)
>
> However, I could not get the points and got the error messages "cannot
> allocate vector of size 5.4 Gb" and "Your computer is low on memory. Save
> your files and close these programs".
>
> Could someone please help me convert the raster to points?
>
> Best,
Re: [R] lm model with many categorical variables
> On 20 Sep 2016, at 11:34, Michael Haenlein wrote:
>
> Dear all,
>
> I am trying to estimate a lm model with one continuous dependent variable
> and 11 independent variables that are all categorical, some of which have
> many categories (several dozen in some cases).

If I'm not wrong (I assume that the categorical variables are stored as factors), lm will pick the most crowded categories and will try to fit a linear model over them. (This might be wrong; somebody please correct me.)

> I am not interested in statistical inference to a larger population. The
> objective of my model is to find a way to best predict my continuous
> variable within the sample.

The best pick would be a CART (Classification and Regression Tree, rpart) or CIT (Conditional Inference Tree, ctree) model to predict a continuous response variable from categorical variables. Please see the new partykit (old party) package for CIT.

> When I run the lm model I evidently get many regression coefficients that
> are not significant. Is there some way to automatically combine levels of a
> categorical variable if the regression coefficients for the
> individual levels are not significant?
>
> My idea is to find some form of grouping of the different categories that
> allows me to work with fewer levels while keeping or even improving the
> quality of predictions.

I also want to mention cforest here; with it you can measure the importance of your predictor variables. I would recommend the partykit package for categorical predictors, but you can also give rpart a try.

> Thanks,
>
> Michael
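On the open question in this reply of how lm() handles factors: with R's default treatment contrasts, lm() does not choose among categories by size. It keeps every level, treating the first level as the baseline and adding one dummy column per remaining level. A small sketch with made-up data (`d`, `y` and `g` are illustrative names only):

```r
# Made-up data: one factor with three levels
d <- data.frame(
  y = c(1.0, 1.2, 2.1, 2.3, 3.9, 4.1),
  g = factor(c("a", "a", "b", "b", "c", "c"))
)

# Design matrix lm() builds: intercept plus one dummy per non-baseline level
X <- model.matrix(y ~ g, data = d)
colnames(X)  # "(Intercept)" "gb" "gc"

# Intercept is the mean of level "a"; gb and gc are differences from "a"
fit <- lm(y ~ g, data = d)
coef(fit)
```

With several dozen levels per factor this means several dozen coefficients per factor, which is why the individual estimates tend to be noisy.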
Re: [R] Issue on LGP solving
Hi

Just a wild guess. Achievement in the goalprog package is a data frame with four named columns (objective, priority, p and n). Your achievement is a 3-column data.frame with names p1, p2 and p3. Maybe a data frame with a defined structure is required.

Cheers
Petr

> -----Original Message-----
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Dr. Debasis Ghosh
> Sent: Tuesday, September 20, 2016 8:12 AM
> To: R-help@r-project.org
> Subject: [R] Issue on LGP solving
>
> I was solving an LGP problem which is very basic.
>
> Find x0 = [x1; x2], n0 = [n1; n2; n3] and p0 = [p1; p2; p3] that minimize
> a = [(2p1); (n2); (n3)]
>
> The objectives are as follows:
>
> 10x1 + 15x2 + n1 - p1 = 40
> 100x1 + 100x2 + n2 - p2 = 1000
> x2 + n3 - p3 = 7
> x; n; p >= 0
>
> The solution is x' = [4; 0] and a = [0; 600; 7]
>
> > local({pkg <- select.list(sort(.packages(all.available = TRUE)),graphics=TRUE)
> + if(nchar(pkg)) library(pkg, character.only=TRUE)})
> > local({pkg <- select.list(sort(.packages(all.available = TRUE)),graphics=TRUE)
> + if(nchar(pkg)) library(pkg, character.only=TRUE)})
> > coeff<-matrix (c(10,15,100,100,0,1), nrow=3, ncol=2, byrow=TRUE)
> > target<-c(40,1000,7)
> > p1<-c(2,0,0,0,0,0)
> > p2 <- c(0,0,0,0,1,0)
> > p3<- c(0,0,0,0,0,1)
> > achievement <- data.frame(p1,p2,p3)
> > achievement
>   p1 p2 p3
> 1  2  0  0
> 2  0  0  0
> 3  0  0  0
> 4  0  0  0
> 5  0  1  0
> 6  0  0  1
> > llgp(coeff,target,achievement)
>
> Do you have any idea why I am seeing the error below?
>
> Error in matrix(0, nrow = levels, ncol = nonbasics) :
>   invalid 'nrow' value (too large or NA)
> In addition: Warning messages:
> 1: In max(achievements$priority) :
>   no non-missing arguments to max; returning -Inf
> 2: In matrix(0, nrow = levels, ncol = nonbasics) :
>   NAs introduced by coercion to integer range
>
> Regards,
> Debasis Ghosh, Ph.D
[R] Run a fixed effect regression and a logit regression on a national survey that need to be "weighted"
I am a beginner user of R. I am using a national survey to test which variables influence participation in complementary pensions (participation in a complementary pension is voluntary in my country).

Since the dependent variable is a dummy (1 if the person participates and 0 otherwise) I want to run a logit or probit regression; moreover I want to run a fixed effect regression, since I subset the survey in order to keep only the individuals interviewed more than once.

The data frame is composed of several social and economic variables, and it also contains a variable "weight", which is the survey weight (weighting coefficients that adjust the results of the sample to the national data).

  family pers sex income pension
1      1   01   F      1       1
2      2   01   F      2       1
3      2   02   M      4       0
4      3   01   M  25000       0
5      3   02   F      5       0
6      4   01   M      6       1

pers is the member of the family, and pension takes 1 if the person participates in a complementary pension (this is a simplification of the original survey, which contains more variables and observations (around 22k observations)).

I know how to use the plm and glm functions for a fixed effect or logit regression; in this case I don't know what to do, since I need to take account of the survey weights.

I used the svydesign function to "weight" the data frame:

df1 <- svydesign(ids=~1, data=df, weights=~dfweight)

I used ids=~1 because there isn't a "cluster" variable in the survey (I know that the towns are randomly selected and then individuals are randomly selected, but there isn't a variable that indicates the stratification).

At this point I am lost: I don't know whether it is right to use the survey package, and if so which function to use to run the regression, or whether there is a way to use the plm or glm functions taking account of the weights.

I searched hard for a solution on the web, but if you could give me an answer I'd be glad.
[R] Issue on LGP solving
I was solving an LGP problem which is very basic.

Find x0 = [x1; x2], n0 = [n1; n2; n3] and p0 = [p1; p2; p3] that minimize a = [(2p1); (n2); (n3)]

The objectives are as follows:

10x1 + 15x2 + n1 - p1 = 40
100x1 + 100x2 + n2 - p2 = 1000
x2 + n3 - p3 = 7
x; n; p >= 0

The solution is x' = [4; 0] and a = [0; 600; 7]

> local({pkg <- select.list(sort(.packages(all.available = TRUE)),graphics=TRUE)
+ if(nchar(pkg)) library(pkg, character.only=TRUE)})
> local({pkg <- select.list(sort(.packages(all.available = TRUE)),graphics=TRUE)
+ if(nchar(pkg)) library(pkg, character.only=TRUE)})
> coeff<-matrix (c(10,15,100,100,0,1), nrow=3, ncol=2, byrow=TRUE)
> target<-c(40,1000,7)
> p1<-c(2,0,0,0,0,0)
> p2 <- c(0,0,0,0,1,0)
> p3<- c(0,0,0,0,0,1)
> achievement <- data.frame(p1,p2,p3)
> achievement
  p1 p2 p3
1  2  0  0
2  0  0  0
3  0  0  0
4  0  0  0
5  0  1  0
6  0  0  1
> llgp(coeff,target,achievement)

Do you have any idea why I am seeing the error below?

Error in matrix(0, nrow = levels, ncol = nonbasics) :
  invalid 'nrow' value (too large or NA)
In addition: Warning messages:
1: In max(achievements$priority) :
  no non-missing arguments to max; returning -Inf
2: In matrix(0, nrow = levels, ncol = nonbasics) :
  NAs introduced by coercion to integer range

Regards,
Debasis Ghosh, Ph.D
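The first warning in that output points at the likely cause: the function looks up a `priority` column, and the data frame built above has none. The base R sketch below reproduces just that step, without calling goalprog at all.

```r
p1 <- c(2, 0, 0, 0, 0, 0)
p2 <- c(0, 0, 0, 0, 1, 0)
p3 <- c(0, 0, 0, 0, 0, 1)
achievement <- data.frame(p1, p2, p3)

# There is no column named "priority", so $ returns NULL ...
is.null(achievement$priority)                # TRUE

# ... and max(NULL) is -Inf (with the warning seen in the post),
# which is not a usable row count for matrix()
suppressWarnings(max(achievement$priority))  # -Inf
```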
Re: [R] Using lm's subset parameter results in Error in xj[i] : invalid subscript type 'list'
Hi

see in line

> -----Original Message-----
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of mviljamaa
> Sent: Tuesday, September 20, 2016 10:01 AM
> To: r-help@r-project.org
> Subject: [R] Using lm's subset parameter results in Error in xj[i] : invalid
> subscript type 'list'
>
> I'm trying to take lm on a subset of my dataset and to do this I believe I
> need to pass my subset of the data as the subset parameter of lm.
>
> So I do my subsetting:
>
> firstkids <- kidmomhsage[0:234,], i.e. the first 234 rows of the data frame.

It works; however, row numbering in R starts at 1, not 0. R is clever enough to subset with a 0:xx vector, but you will not get row 1 by subsetting with 0:

dd<-data.frame(a=1:10, b=rnorm(10))
dd[0,]
[1] a b
<0 rows> (or 0-length row.names)

> Then construct the model:
>
> fit4 <- lm(kidmomhsage$kid_score ~ kidmomhsage$mom_age +
> kidmomhsage$mom_hs + kidmomhsage$mom_age * kidmomhsage$mom_hs,
> subset=firstkids)

You definitely should spend some time with introductory documentation. The above construction should be, e.g.:

fit4 <- lm(kid_score ~ mom_age + mom_hs + mom_age * mom_hs, data = kidmomhsage, subset = 1:234)

> which results in:
>
> Error in xj[i] : invalid subscript type 'list'
>
> I read somewhere a recommendation to use "unlist":

I wonder where you read such a recommendation for the lm function; you had better avoid that source.

Cheers
Petr

> fit4 <- lm(kidmomhsage$kid_score ~ kidmomhsage$mom_age +
> kidmomhsage$mom_hs + kidmomhsage$mom_age * kidmomhsage$mom_hs,
> subset=unlist(firstkids))
>
> which seems to not produce the error and results in some sort of model, but
> is this model the correct one (i.e. for the data set firstkids, just as it
> originally appears)? How does unlist change the data?
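The point about the zero index can be checked directly. In this sketch `dd` is the same kind of toy data frame as in the reply, with fixed values in place of rnorm() so the result is reproducible.

```r
dd <- data.frame(a = 1:10, b = 11:20)

nrow(dd[0, ])    # 0: index 0 selects nothing
nrow(dd[0:3, ])  # 3: the 0 is silently dropped, rows 1 to 3 remain
dd[0:3, "a"]     # 1 2 3
```

So kidmomhsage[0:234, ] does return the first 234 rows, but only because the 0 is ignored; 1:234 says the same thing without relying on that.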
[R] lm model with many categorical variables
Dear all,

I am trying to estimate a lm model with one continuous dependent variable and 11 independent variables that are all categorical, some of which have many categories (several dozen in some cases).

I am not interested in statistical inference to a larger population. The objective of my model is to find a way to best predict my continuous variable within the sample.

When I run the lm model I evidently get many regression coefficients that are not significant. Is there some way to automatically combine levels of a categorical variable if the regression coefficients for the individual levels are not significant?

My idea is to find some form of grouping of the different categories that allows me to work with fewer levels while keeping or even improving the quality of predictions.

Thanks,

Michael
Re: [R] Using lm's subset parameter results in Error in xj[i] : invalid subscript type 'list'
By subsetting the rows into the firstkids data frame, you've already subsetted your data. Try specifying firstkids as your data instead of passing it as the subset in the lm call. Also, remove the kidmomhsage prefix from all of your variables, since you're running the linear model on a different data frame (firstkids).

Something along these lines:

lm(kid_score ~ mom_age, data = firstkids)

*Jorge Cimentada*
*Ph.D. Candidate*
Dpt. Ciències Polítiques i Socials
Ramon Trias Fargas, 25-27 | 08005 Barcelona
Office 24.331
[Tel.] 697 382 009
jorge.ciment...@upf.edu
http://www.upf.edu/dcpis/

On Tue, Sep 20, 2016 at 10:00 AM, mviljamaa wrote:

> I'm trying to take lm on a subset of my dataset and to do this I believe I
> need to pass my subset of the data as the subset parameter of lm.
>
> So I do my subsetting:
>
> firstkids <- kidmomhsage[0:234,], i.e. the first 234 rows of the data
> frame.
>
> Then construct the model:
>
> fit4 <- lm(kidmomhsage$kid_score ~ kidmomhsage$mom_age +
> kidmomhsage$mom_hs + kidmomhsage$mom_age * kidmomhsage$mom_hs,
> subset=firstkids)
>
> which results in:
>
> Error in xj[i] : invalid subscript type 'list'
>
> I read somewhere a recommendation to use "unlist":
>
> fit4 <- lm(kidmomhsage$kid_score ~ kidmomhsage$mom_age +
> kidmomhsage$mom_hs + kidmomhsage$mom_age * kidmomhsage$mom_hs,
> subset=unlist(firstkids))
>
> which seems to not produce the error and results in some sort of model,
> but is this model the correct one (i.e. for the data set firstkids, just as
> it originally appears)? How does unlist change the data?
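The two suggestions in this thread (data = firstkids, or data = the full frame plus subset = 1:234) should give the same fit. A sketch on made-up data, since kidmomhsage itself is not available here: the data frame `kids` and its values are invented, only the column names follow the post.

```r
set.seed(42)
# Made-up stand-in for kidmomhsage (values are invented)
kids <- data.frame(mom_age = sample(18:30, 300, replace = TRUE),
                   mom_hs  = rbinom(300, 1, 0.8))
kids$kid_score <- 60 + 0.5 * kids$mom_age + 10 * kids$mom_hs + rnorm(300, sd = 15)

# Option 1: subset the data frame first, then fit on it
firstkids <- kids[1:234, ]
fit_a <- lm(kid_score ~ mom_age * mom_hs, data = firstkids)

# Option 2: fit on the full frame and let lm() pick the rows
fit_b <- lm(kid_score ~ mom_age * mom_hs, data = kids, subset = 1:234)

all.equal(coef(fit_a), coef(fit_b))  # TRUE
```

In the formula, mom_age * mom_hs already expands to both main effects plus their interaction, so the main effects need not be listed separately.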
Re: [R] Errors in Raster to Point
On Tue, 20 Sep 2016, 15:55 GwanSeon Kimwrote: > Hi, all > I am just beginner to use R. > I am working with TIF image file, and the information about the raster is > following: > > class : RasterLayer > dimensions : 11150, 21808, 243159200 (nrow, ncol, ncell) > resolution : 30, 30 (x, y) > extent : 569685, 1223925, 1513995, 1848495 (xmin, xmax, ymin, ymax) > coord. ref. : +proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +x_0=0 > +y_0=0 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0 > data source : > > C:\Users\Gwan\AppData\Local\Temp\Rtmpg506Ee\raster\r_tmp_2016-09-14_122409_6260_09589.grd > names : test_map > values : 1, 225 (min, max) > attributes : >ID OBJECTID Value Red Green Blue Count Class_Name > Opacity > from: 02 1 1 00 5982503 Corn > 1 > to : 48 255 254 0 00 10336 Dbl Crop Barley/Soybeans > 1 > > > > >From this Rasterlayer, I want to convert raster to point for each pixel > based on "Value (one of column name)" and create a raster with > georeferenced information. > I used code as following: RP <- rasterToPoints(KY_raster) > However, I could not get the points and have an error message "cannot > allocate vector of size 5.4 Gb" and "Your computer is low on memory. Save > your files and close these programs". > Could someone please help me how I can convert to raster to points?? > Best, > The First question is why? The point (centre)coordinates of every pixel are massively redundant since they are a simple function of cell index and the raster's extent. You might try as.data.frame with xy =TRUE to avoid any overhead in casting to Spatial, but still it's very likely that this is just a step towards your actual goal. Tell us what you want to do and I am sure there is a better way. 
Cheers, Mike

--
Dr. Michael Sumner
Software and Database Engineer
Australian Antarctic Division
203 Channel Highway
Kingston Tasmania 7050 Australia
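To illustrate the point in the reply above that cell centres are a simple function of cell index and extent, here is a minimal base-R sketch. The grid numbers are copied from the RasterLayer summary in the question; the helper name cell_to_xy is made up for this example, and it assumes cells are numbered from 1, row by row, starting at the top-left corner (the convention the raster package uses, where xyFromCell does this job).

```r
# Grid parameters copied from the RasterLayer summary quoted in the question.
xmin <- 569685; ymax <- 1848495
res_x <- 30; res_y <- 30
ncol_r <- 21808; nrow_r <- 11150

# Hypothetical helper: centre coordinates of a cell, assuming cells are
# numbered from 1, row by row, starting at the top-left corner.
cell_to_xy <- function(cell) {
  row <- (cell - 1) %/% ncol_r + 1
  col <- (cell - 1) %% ncol_r + 1
  cbind(x = xmin + (col - 0.5) * res_x,
        y = ymax - (row - 0.5) * res_y)
}

# Centres of the first and the last cell of the grid.
cell_to_xy(c(1, ncol_r * nrow_r))
```

Because the coordinates can be recomputed on demand like this, there is usually no need to materialise all 243159200 points in memory at once.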
[R] Using lm's subset parameter results in Error in xj[i] : invalid subscript type 'list'
I'm trying to take lm on a subset of my dataset and to do this I believe I need to pass my subset of the data as the subset parameter of lm.

So I do my subsetting:

    firstkids <- kidmomhsage[0:234,]

i.e. the first 234 rows of the data frame. Then construct the model:

    fit4 <- lm(kidmomhsage$kid_score ~ kidmomhsage$mom_age + kidmomhsage$mom_hs + kidmomhsage$mom_age * kidmomhsage$mom_hs, subset=firstkids)

which results in:

    Error in xj[i] : invalid subscript type 'list'

I read somewhere a recommendation to use "unlist":

    fit4 <- lm(kidmomhsage$kid_score ~ kidmomhsage$mom_age + kidmomhsage$mom_hs + kidmomhsage$mom_age * kidmomhsage$mom_hs, subset=unlist(firstkids))

which seems to not produce the error and results in some sort of model, but is this model the correct one (i.e. for the data set firstkids, just as it originally appears)? How does unlist change the data?
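As a hedged sketch of how the subset argument is normally used: lm expects subset to be a vector of row indices or a logical vector that selects rows, not a data frame. The example below runs on made-up data that only borrows the column names from the question, and fits the same formula two ways (mom_age * mom_hs expands to both main effects plus their interaction). Note that unlist(firstkids) flattens every column into one long vector of data values, which lm would then treat as row indices, so that model is unlikely to describe the first 234 rows.

```r
# Synthetic stand-in for the poster's data frame (all values are made up).
set.seed(1)
n <- 400
kidmomhsage <- data.frame(
  mom_age = sample(18:29, n, replace = TRUE),
  mom_hs  = rbinom(n, 1, 0.8)
)
kidmomhsage$kid_score <- 70 + 0.5 * kidmomhsage$mom_age +
  10 * kidmomhsage$mom_hs + rnorm(n, sd = 15)

# Option 1: keep the full data frame and pass row indices to `subset`.
fit_subset <- lm(kid_score ~ mom_age * mom_hs,
                 data = kidmomhsage, subset = 1:234)

# Option 2: subset first, then fit on the smaller data frame.
firstkids <- kidmomhsage[1:234, ]
fit_direct <- lm(kid_score ~ mom_age * mom_hs, data = firstkids)

# Both fits should use the same 234 rows and agree on the coefficients.
all.equal(coef(fit_subset), coef(fit_direct))
nobs(fit_subset)
```

Using the data argument with bare column names, rather than kidmomhsage$... inside the formula, is what lets subset (or a pre-subsetted data frame) take effect cleanly.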
Re: [R] Where is R installed on my Linux?
Mike Wojnowicz writes:

> I have successfully installed R on my AWS EC2 r3.8 box running Linux with
> >sudo yum install -y R
>
> However, I cannot find R anywhere (which I want for the sake of
> tar'ing it up and decompressing to make future installations easier.)
> For example,
>
> > rpm -ql R
>
> Says there is nothing to show.
>
> Does anyone have any ideas?
>
> -Mike

If you run

    yum info R

you'll find this description:

    This is a metapackage that provides both core R userspace and all R
    development components.

The metapackage consists of 'r-core' plus various other packages. These are the packages that 'rpm' sees.

Cheers,

Loris

--
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin
Email loris.benn...@fu-berlin.de
Re: [R] Where is R installed on my Linux?
Hi Mike,

Depending upon the flavor of Linux (looks like it's in the RedHat family), R will usually start when you run the command "R" in a terminal. What does:

    which R

say? Then look in the startup file (often in /usr/local/bin) for the R_HOME directory.

Jim

On Tue, Sep 20, 2016 at 9:38 AM, Mike Wojnowicz wrote:

> I have successfully installed R on my AWS EC2 r3.8 box running Linux with
> >sudo yum install -y R
>
> However, I cannot find R anywhere (which I want for the sake of tar'ing it up
> and decompressing to make future installations easier.) For example,
>
> > rpm -ql R
>
> Says there is nothing to show.
>
> Does anyone have any ideas?
>
> -Mike
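As a complement to inspecting the startup script, R can report its own location from inside a session. This is only a small sketch using base R functions; the variable name r_home is chosen for this example, and the paths printed will differ from system to system.

```r
# Directory where this R installation lives (the value of R_HOME).
r_home <- R.home()
print(r_home)

# Sub-directory holding the R executables.
print(R.home("bin"))

# Full path of the `R` launcher found on the PATH, or "" if none is found.
print(Sys.which("R"))
```

The directory reported by R.home() is the natural starting point if the aim is to archive an installation, although packages installed into separate library paths (see .libPaths()) would need to be collected as well.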