Re: [R] about data problem

2016-09-20 Thread Jeff Newmiller
You can use the latter IF you know there are no problems with the input data. 
If you need to troubleshoot then you need separate columns so you can compare 
them. 
-- 
Sent from my phone. Please excuse my brevity.

On September 20, 2016 4:22:41 PM PDT, lily li  wrote:
>Thanks. The former method works. I confused character with factor.
>
>Besides, I should use: dta$DischargeNum <- as.numeric( dta$Discharge )
>instead of: dta$Discharge <- as.numeric( dta$Discharge )
>
>
>On Tue, Sep 20, 2016 at 5:18 PM, Jeff Newmiller
>
>wrote:
>
>> Which means it avoided converting to factor... Success!
>>
>> Note that the column apparently has garbage characters in one or more
>of
>> the rows, which should be evident when you LOOK AT THE CHARACTERS in
>the
>> column. They should all be numeric symbols, plus or minus, and
>perhaps
>> decimal points. If they are not, then the conversion to numeric will
>be
>> incomplete. See my other message. You have the choice of editing the
>file
>> (may have concerns with traceability), or you can write R code that
>removes
>> the garbage characters using gsub.
>> --
>> Sent from my phone. Please excuse my brevity.
>>
>> On September 20, 2016 4:09:02 PM PDT, lily li 
>wrote:
>> >Yes, I tried to add this statement when reading the dataset.
>> >But when I use summary(df), it shows:
>> >Discharge
>> >Length:
>> >Class  :character
>> >Mode  :character
>> >
>> >
>> >On Tue, Sep 20, 2016 at 5:06 PM, Joe Ceradini
>
>> >wrote:
>> >
>> >> read.csv("your_data.csv", stringsAsFactors=FALSE)
>> >> (I'm just reiterating Jianling said...)
>> >>
>> >> Joe
>> >>
>> >> On Tue, Sep 20, 2016 at 4:56 PM, lily li 
>wrote:
>> >>
>> >>> Is there a function in read.csv that I can use to avoid
>converting
>> >numeric
>> >>> to factor? Thanks a lot.
>> >>>
>> >>>
>> >>>
>> >>> On Tue, Sep 20, 2016 at 4:42 PM, lily li 
>> >wrote:
>> >>>
>> >>> > Thanks. Then what should I do to solve the problem?
>> >>> >
>> >>> > On Tue, Sep 20, 2016 at 4:30 PM, Jeff Newmiller <
>> >>> jdnew...@dcn.davis.ca.us>
>> >>> > wrote:
>> >>> >
>> >>> >> I suppose you can do what works for your data, but I wouldn't
>> >recommend
>> >>> >> na.rm=TRUE because it hides problems rather than clarifying
>them.
>> >>> >>
>> >>> >> If in fact your data includes true NA values (the letters NA
>or
>> >simply
>> >>> >> nothing between the commas are typical ways this information
>may
>> >be
>> >>> >> indicated), then read.csv will NOT change from integer to
>factor
>> >>> >> (particularly if you have specified which markers represent NA
>> >using
>> >>> the
>> >>> >> na.strings argument documented under read.table)... so you
>> >probably DO
>> >>> have
>> >>> >> unexpected garbage still in your data which could be obscuring
>> >valuable
>> >>> >> information that could affect your conclusions.
>> >>> >> --
>> >>> >> Sent from my phone. Please excuse my brevity.
>> >>> >>
>> >>> >> On September 20, 2016 3:11:42 PM PDT, lily li
>> >
>> >>> >> wrote:
>> >>> >> >I reread the data, and use 'na.rm = T' when reading the data.
>> >This
>> >>> time
>> >>> >> >it
>> >>> >> >has no such problem. It seems that the existence of NAs
>convert
>> >the
>> >>> >> >integer
>> >>> >> >to factor. Thanks for your help.
>> >>> >> >
>> >>> >> >
>> >>> >> >On Tue, Sep 20, 2016 at 4:09 PM, Jianling Fan
>> >
>> >>> >> >wrote:
>> >>> >> >
>> >>> >> >> Add the "stringsAsFactors = F"  when you read the data, and
>> >then
>> >>> >> >> convert them to numeric.
>> >>> >> >>
>> >>> >> >> On 20 September 2016 at 16:00, lily li
>
>> >wrote:
>> >>> >> >> > Yes, it is stored as factor. I can't check out any
>problem
>> >in the
>> >>> >> >> original
>> >>> >> >> > data. Reread data doesn't help either. I use read.csv to
>> >read in
>> >>> >> >the
>> >>> >> >> data,
>> >>> >> >> > do you think it is better to use read.table? Thanks
>again.
>> >>> >> >> >
>> >>> >> >> > On Tue, Sep 20, 2016 at 3:55 PM, Greg Snow
>> ><538...@gmail.com>
>> >>> >> >wrote:
>> >>> >> >> >
>> >>> >> >> >> This indicates that your Discharge column has been
>> >>> >> >stored/converted as
>> >>> >> >> >> a factor (run str(df) to verify and check other
>columns).
>> >This
>> >>> >> >> >> usually happens when functions like read.table are left
>to
>> >try to
>> >>> >> >> >> figure out what each column is and it finds something in
>> >that
>> >>> >> >column
>> >>> >> >> >> that cannot be converted to a number (possibly an oh
>> >instead of a
>> >>> >> >> >> zero, an el instead of a one, or just a letter or
>> >punctuation
>> >>> mark
>> >>> >> >> >> accidentally in the file).  You can either find the
>error
>> >in your
>> >>> >> >> >> original data, fix it, and reread the data, or specify
>that
>> >the
>> >>> >> >column
>> >>> >> >> >> should be numeric using the colClasses argument to
>> >read.table or
>> >>> >> >other
>> 

Re: [R] about data problem

2016-09-20 Thread lily li
Thanks. The former method works. I confused character with factor.

Besides, I should use: dta$DischargeNum <- as.numeric( dta$Discharge )
instead of: dta$Discharge <- as.numeric( dta$Discharge )


On Tue, Sep 20, 2016 at 5:18 PM, Jeff Newmiller 
wrote:

> Which means it avoided converting to factor... Success!
>
> Note that the column apparently has garbage characters in one or more of
> the rows, which should be evident when you LOOK AT THE CHARACTERS in the
> column. They should all be numeric symbols, plus or minus, and perhaps
> decimal points. If they are not, then the conversion to numeric will be
> incomplete. See my other message. You have the choice of editing the file
> (may have concerns with traceability), or you can write R code that removes
> the garbage characters using gsub.
> --
> Sent from my phone. Please excuse my brevity.
>
> On September 20, 2016 4:09:02 PM PDT, lily li  wrote:
> >Yes, I tried to add this statement when reading the dataset.
> >But when I use summary(df), it shows:
> >Discharge
> >Length:
> >Class  :character
> >Mode  :character
> >
> >
> >On Tue, Sep 20, 2016 at 5:06 PM, Joe Ceradini 
> >wrote:
> >
> >> read.csv("your_data.csv", stringsAsFactors=FALSE)
> >> (I'm just reiterating Jianling said...)
> >>
> >> Joe
> >>
> >> On Tue, Sep 20, 2016 at 4:56 PM, lily li  wrote:
> >>
> >>> Is there a function in read.csv that I can use to avoid converting
> >numeric
> >>> to factor? Thanks a lot.
> >>>
> >>>
> >>>
> >>> On Tue, Sep 20, 2016 at 4:42 PM, lily li 
> >wrote:
> >>>
> >>> > Thanks. Then what should I do to solve the problem?
> >>> >
> >>> > On Tue, Sep 20, 2016 at 4:30 PM, Jeff Newmiller <
> >>> jdnew...@dcn.davis.ca.us>
> >>> > wrote:
> >>> >
> >>> >> I suppose you can do what works for your data, but I wouldn't
> >recommend
> >>> >> na.rm=TRUE because it hides problems rather than clarifying them.
> >>> >>
> >>> >> If in fact your data includes true NA values (the letters NA or
> >simply
> >>> >> nothing between the commas are typical ways this information may
> >be
> >>> >> indicated), then read.csv will NOT change from integer to factor
> >>> >> (particularly if you have specified which markers represent NA
> >using
> >>> the
> >>> >> na.strings argument documented under read.table)... so you
> >probably DO
> >>> have
> >>> >> unexpected garbage still in your data which could be obscuring
> >valuable
> >>> >> information that could affect your conclusions.
> >>> >> --
> >>> >> Sent from my phone. Please excuse my brevity.
> >>> >>
> >>> >> On September 20, 2016 3:11:42 PM PDT, lily li
> >
> >>> >> wrote:
> >>> >> >I reread the data, and use 'na.rm = T' when reading the data.
> >This
> >>> time
> >>> >> >it
> >>> >> >has no such problem. It seems that the existence of NAs convert
> >the
> >>> >> >integer
> >>> >> >to factor. Thanks for your help.
> >>> >> >
> >>> >> >
> >>> >> >On Tue, Sep 20, 2016 at 4:09 PM, Jianling Fan
> >
> >>> >> >wrote:
> >>> >> >
> >>> >> >> Add the "stringsAsFactors = F"  when you read the data, and
> >then
> >>> >> >> convert them to numeric.
> >>> >> >>
> >>> >> >> On 20 September 2016 at 16:00, lily li 
> >wrote:
> >>> >> >> > Yes, it is stored as factor. I can't check out any problem
> >in the
> >>> >> >> original
> >>> >> >> > data. Reread data doesn't help either. I use read.csv to
> >read in
> >>> >> >the
> >>> >> >> data,
> >>> >> >> > do you think it is better to use read.table? Thanks again.
> >>> >> >> >
> >>> >> >> > On Tue, Sep 20, 2016 at 3:55 PM, Greg Snow
> ><538...@gmail.com>
> >>> >> >wrote:
> >>> >> >> >
> >>> >> >> >> This indicates that your Discharge column has been
> >>> >> >stored/converted as
> >>> >> >> >> a factor (run str(df) to verify and check other columns).
> >This
> >>> >> >> >> usually happens when functions like read.table are left to
> >try to
> >>> >> >> >> figure out what each column is and it finds something in
> >that
> >>> >> >column
> >>> >> >> >> that cannot be converted to a number (possibly an oh
> >instead of a
> >>> >> >> >> zero, an el instead of a one, or just a letter or
> >punctuation
> >>> mark
> >>> >> >> >> accidentally in the file).  You can either find the error
> >in your
> >>> >> >> >> original data, fix it, and reread the data, or specify that
> >the
> >>> >> >column
> >>> >> >> >> should be numeric using the colClasses argument to
> >read.table or
> >>> >> >other
> >>> >> >> >> function.
> >>> >> >> >>
> >>> >> >> >>
> >>> >> >> >>
> >>> >> >> >> On Tue, Sep 20, 2016 at 3:46 PM, lily li
> >
> >>> >> >wrote:
> >>> >> >> >> > Hi R users,
> >>> >> >> >> >
> >>> >> >> >> > I have a problem in reading data.
> >>> >> >> >> > For example, part of my dataframe is like this:
> >>> >> >> >> >
> >>> >> >> >> > df
> >>> >> >> >> > month day year  Discharge
> >>> >> >> >> >31   

Re: [R] about data problem

2016-09-20 Thread Jeff Newmiller
Which means it avoided converting to factor... Success!

Note that the column apparently has garbage characters in one or more of the 
rows, which should be evident when you LOOK AT THE CHARACTERS in the column. 
They should all be numeric symbols, plus or minus, and perhaps decimal points. 
If they are not, then the conversion to numeric will be incomplete. See my 
other message. You have the choice of editing the file (may have concerns with 
traceability), or you can write R code that removes the garbage characters 
using gsub.
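
As a rough sketch of the gsub route (the sample values below, including the
stray asterisk and the letter O, are invented for illustration and are not
taken from the poster's file), one could strip everything that is not a digit,
a sign, or a decimal point, and then check which entries were altered:

```r
# Hypothetical raw strings; "8.63*" and "8.1O" stand in for garbage entries.
raw <- c("6.4", "7.58", "6.82", "8.63*", "8.1O", "7.58")

# Remove every character that is not a digit, ".", "+" or "-".
cleaned <- gsub("[^0-9.+-]", "", raw)

# Convert the cleaned strings; no NAs are expected for this sample.
num <- as.numeric(cleaned)

# Positions of the entries that gsub had to change, for manual review.
changed <- which(raw != cleaned)
changed  # 4 5
```

Stripping characters can silently change what a value means (for example
"1O.5" would become 1.5 rather than 10.5), so it is probably worth reviewing
the changed rows before trusting the converted column.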
-- 
Sent from my phone. Please excuse my brevity.

On September 20, 2016 4:09:02 PM PDT, lily li  wrote:
>Yes, I tried to add this statement when reading the dataset.
>But when I use summary(df), it shows:
>Discharge
>Length:
>Class  :character
>Mode  :character
>
>
>On Tue, Sep 20, 2016 at 5:06 PM, Joe Ceradini 
>wrote:
>
>> read.csv("your_data.csv", stringsAsFactors=FALSE)
>> (I'm just reiterating Jianling said...)
>>
>> Joe
>>
>> On Tue, Sep 20, 2016 at 4:56 PM, lily li  wrote:
>>
>>> Is there a function in read.csv that I can use to avoid converting
>numeric
>>> to factor? Thanks a lot.
>>>
>>>
>>>
>>> On Tue, Sep 20, 2016 at 4:42 PM, lily li 
>wrote:
>>>
>>> > Thanks. Then what should I do to solve the problem?
>>> >
>>> > On Tue, Sep 20, 2016 at 4:30 PM, Jeff Newmiller <
>>> jdnew...@dcn.davis.ca.us>
>>> > wrote:
>>> >
>>> >> I suppose you can do what works for your data, but I wouldn't
>recommend
>>> >> na.rm=TRUE because it hides problems rather than clarifying them.
>>> >>
>>> >> If in fact your data includes true NA values (the letters NA or
>simply
>>> >> nothing between the commas are typical ways this information may
>be
>>> >> indicated), then read.csv will NOT change from integer to factor
>>> >> (particularly if you have specified which markers represent NA
>using
>>> the
>>> >> na.strings argument documented under read.table)... so you
>probably DO
>>> have
>>> >> unexpected garbage still in your data which could be obscuring
>valuable
>>> >> information that could affect your conclusions.
>>> >> --
>>> >> Sent from my phone. Please excuse my brevity.
>>> >>
>>> >> On September 20, 2016 3:11:42 PM PDT, lily li
>
>>> >> wrote:
>>> >> >I reread the data, and use 'na.rm = T' when reading the data.
>This
>>> time
>>> >> >it
>>> >> >has no such problem. It seems that the existence of NAs convert
>the
>>> >> >integer
>>> >> >to factor. Thanks for your help.
>>> >> >
>>> >> >
>>> >> >On Tue, Sep 20, 2016 at 4:09 PM, Jianling Fan
>
>>> >> >wrote:
>>> >> >
>>> >> >> Add the "stringsAsFactors = F"  when you read the data, and
>then
>>> >> >> convert them to numeric.
>>> >> >>
>>> >> >> On 20 September 2016 at 16:00, lily li 
>wrote:
>>> >> >> > Yes, it is stored as factor. I can't check out any problem
>in the
>>> >> >> original
>>> >> >> > data. Reread data doesn't help either. I use read.csv to
>read in
>>> >> >the
>>> >> >> data,
>>> >> >> > do you think it is better to use read.table? Thanks again.
>>> >> >> >
>>> >> >> > On Tue, Sep 20, 2016 at 3:55 PM, Greg Snow
><538...@gmail.com>
>>> >> >wrote:
>>> >> >> >
>>> >> >> >> This indicates that your Discharge column has been
>>> >> >stored/converted as
>>> >> >> >> a factor (run str(df) to verify and check other columns). 
>This
>>> >> >> >> usually happens when functions like read.table are left to
>try to
>>> >> >> >> figure out what each column is and it finds something in
>that
>>> >> >column
>>> >> >> >> that cannot be converted to a number (possibly an oh
>instead of a
>>> >> >> >> zero, an el instead of a one, or just a letter or
>punctuation
>>> mark
>>> >> >> >> accidentally in the file).  You can either find the error
>in your
>>> >> >> >> original data, fix it, and reread the data, or specify that
>the
>>> >> >column
>>> >> >> >> should be numeric using the colClasses argument to
>read.table or
>>> >> >other
>>> >> >> >> function.
>>> >> >> >>
>>> >> >> >>
>>> >> >> >>
>>> >> >> >> On Tue, Sep 20, 2016 at 3:46 PM, lily li
>
>>> >> >wrote:
>>> >> >> >> > Hi R users,
>>> >> >> >> >
>>> >> >> >> > I have a problem in reading data.
>>> >> >> >> > For example, part of my dataframe is like this:
>>> >> >> >> >
>>> >> >> >> > df
>>> >> >> >> > month day year  Discharge
>>> >> >> >> >     3   1 2010        6.4
>>> >> >> >> >     3   2 2010       7.58
>>> >> >> >> >     3   3 2010       6.82
>>> >> >> >> >     3   4 2010       8.63
>>> >> >> >> >     3   5 2010       8.16
>>> >> >> >> >     3   6 2010       7.58
>>> >> >> >> >
>>> >> >> >> > Then if I type summary(df), why it converts the discharge
>data
>>> >> >to
>>> >> >> >> levels? I
>>> >> >> >> > also met the same problem when reading some other csv
>files.
>>> How
>>> >> >to
>>> >> >> solve
>>> >> >> 

Re: [R] about data problem

2016-09-20 Thread lily li
Yes, I tried to add this statement when reading the dataset.
But when I use summary(df), it shows:
Discharge
Length:
Class  :character
Mode  :character


On Tue, Sep 20, 2016 at 5:06 PM, Joe Ceradini  wrote:

> read.csv("your_data.csv", stringsAsFactors=FALSE)
> (I'm just reiterating Jianling said...)
>
> Joe
>
> On Tue, Sep 20, 2016 at 4:56 PM, lily li  wrote:
>
>> Is there a function in read.csv that I can use to avoid converting numeric
>> to factor? Thanks a lot.
>>
>>
>>
>> On Tue, Sep 20, 2016 at 4:42 PM, lily li  wrote:
>>
>> > Thanks. Then what should I do to solve the problem?
>> >
>> > On Tue, Sep 20, 2016 at 4:30 PM, Jeff Newmiller <
>> jdnew...@dcn.davis.ca.us>
>> > wrote:
>> >
>> >> I suppose you can do what works for your data, but I wouldn't recommend
>> >> na.rm=TRUE because it hides problems rather than clarifying them.
>> >>
>> >> If in fact your data includes true NA values (the letters NA or simply
>> >> nothing between the commas are typical ways this information may be
>> >> indicated), then read.csv will NOT change from integer to factor
>> >> (particularly if you have specified which markers represent NA using
>> the
>> >> na.strings argument documented under read.table)... so you probably DO
>> have
>> >> unexpected garbage still in your data which could be obscuring valuable
>> >> information that could affect your conclusions.
>> >> --
>> >> Sent from my phone. Please excuse my brevity.
>> >>
>> >> On September 20, 2016 3:11:42 PM PDT, lily li 
>> >> wrote:
>> >> >I reread the data, and use 'na.rm = T' when reading the data. This
>> time
>> >> >it
>> >> >has no such problem. It seems that the existence of NAs convert the
>> >> >integer
>> >> >to factor. Thanks for your help.
>> >> >
>> >> >
>> >> >On Tue, Sep 20, 2016 at 4:09 PM, Jianling Fan 
>> >> >wrote:
>> >> >
>> >> >> Add the "stringsAsFactors = F"  when you read the data, and then
>> >> >> convert them to numeric.
>> >> >>
>> >> >> On 20 September 2016 at 16:00, lily li  wrote:
>> >> >> > Yes, it is stored as factor. I can't check out any problem in the
>> >> >> original
>> >> >> > data. Reread data doesn't help either. I use read.csv to read in
>> >> >the
>> >> >> data,
>> >> >> > do you think it is better to use read.table? Thanks again.
>> >> >> >
>> >> >> > On Tue, Sep 20, 2016 at 3:55 PM, Greg Snow <538...@gmail.com>
>> >> >wrote:
>> >> >> >
>> >> >> >> This indicates that your Discharge column has been
>> >> >stored/converted as
>> >> >> >> a factor (run str(df) to verify and check other columns).  This
>> >> >> >> usually happens when functions like read.table are left to try to
>> >> >> >> figure out what each column is and it finds something in that
>> >> >column
>> >> >> >> that cannot be converted to a number (possibly an oh instead of a
>> >> >> >> zero, an el instead of a one, or just a letter or punctuation
>> mark
>> >> >> >> accidentally in the file).  You can either find the error in your
>> >> >> >> original data, fix it, and reread the data, or specify that the
>> >> >column
>> >> >> >> should be numeric using the colClasses argument to read.table or
>> >> >other
>> >> >> >> function.
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> On Tue, Sep 20, 2016 at 3:46 PM, lily li 
>> >> >wrote:
>> >> >> >> > Hi R users,
>> >> >> >> >
>> >> >> >> > I have a problem in reading data.
>> >> >> >> > For example, part of my dataframe is like this:
>> >> >> >> >
>> >> >> >> > df
>> >> >> >> > month day year  Discharge
>> >> >> >> >     3   1 2010        6.4
>> >> >> >> >     3   2 2010       7.58
>> >> >> >> >     3   3 2010       6.82
>> >> >> >> >     3   4 2010       8.63
>> >> >> >> >     3   5 2010       8.16
>> >> >> >> >     3   6 2010       7.58
>> >> >> >> >
>> >> >> >> > Then if I type summary(df), why it converts the discharge data
>> >> >to
>> >> >> >> levels? I
>> >> >> >> > also met the same problem when reading some other csv files.
>> How
>> >> >to
>> >> >> solve
>> >> >> >> > this problem? Thanks.
>> >> >> >> >
>> >> >> >> > Discharge
>> >> >> >> > 7.58 :2
>> >> >> >> > 6.4   :1
>> >> >> >> > 6.82 :1
>> >> >> >> > 8.63 :1
>> >> >> >> > 8.16 :1
>> >> >> >> >
>> >> >> >> > [[alternative HTML version deleted]]
>> >> >> >> >
>> >> >> >> > __
>> >> >> >> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more,
>> >> >see
>> >> >> >> > https://stat.ethz.ch/mailman/listinfo/r-help
>> >> >> >> > PLEASE do read the posting guide http://www.R-project.org/
>> >> >> >> posting-guide.html
>> >> >> >> > and provide commented, minimal, self-contained, reproducible
>> >> >code.
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> --
>> >> >> >> Gregory (Greg) L. Snow Ph.D.
>> >> >> >> 538...@gmail.com
>> 

Re: [R] about data problem

2016-09-20 Thread Jeff Newmiller
Find the offending data. One approach is to look at the input data with your 
image sensors and neural pattern processor (eyes and brain). One way to reduce 
the load on those tools is to read in the data with the stringsAsFactors=FALSE 
argument and try manually converting the resulting character strings into 
numeric values. You can then use the is.na function to find which rows failed 
to convert and use indexing to review the strings that had trouble. 

# I recommend against using df as a variable name, since it is the name of a
# function in base R
dta$DischargeNum <- as.numeric( dta$Discharge )
dta[ is.na( dta$DischargeNum ), "Discharge" ]
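
For a self-contained illustration of the same idea, here is a small sketch
using an inline CSV. The rows are modeled on the example in this thread, and
the "6.B2" entry is made up to stand in for a bad value; the variable names
csv_lines and bad are likewise only for illustration.

```r
# Inline stand-in for the real file; row 3 holds an invented bad entry.
csv_lines <- c("month,day,year,Discharge",
               "3,1,2010,6.4",
               "3,2,2010,7.58",
               "3,3,2010,6.B2",
               "3,4,2010,8.63")

# Keep Discharge as character so the original strings stay available.
dta <- read.csv(text = paste(csv_lines, collapse = "\n"),
                stringsAsFactors = FALSE)

# Convert into a separate column; the coercion warning is expected here.
dta$DischargeNum <- suppressWarnings(as.numeric(dta$Discharge))

# Rows where a non-missing string failed to convert.
bad <- is.na(dta$DischargeNum) & !is.na(dta$Discharge)
dta[bad, "Discharge"]
```

Keeping the converted values in their own column means the strings that failed
can still be inspected next to the NA they produced.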
-- 
Sent from my phone. Please excuse my brevity.

On September 20, 2016 3:42:39 PM PDT, lily li  wrote:
>Thanks. Then what should I do to solve the problem?
>
>On Tue, Sep 20, 2016 at 4:30 PM, Jeff Newmiller
>
>wrote:
>
>> I suppose you can do what works for your data, but I wouldn't
>recommend
>> na.rm=TRUE because it hides problems rather than clarifying them.
>>
>> If in fact your data includes true NA values (the letters NA or
>simply
>> nothing between the commas are typical ways this information may be
>> indicated), then read.csv will NOT change from integer to factor
>> (particularly if you have specified which markers represent NA using
>the
>> na.strings argument documented under read.table)... so you probably
>DO have
>> unexpected garbage still in your data which could be obscuring
>valuable
>> information that could affect your conclusions.
>> --
>> Sent from my phone. Please excuse my brevity.
>>
>> On September 20, 2016 3:11:42 PM PDT, lily li 
>wrote:
>> >I reread the data, and use 'na.rm = T' when reading the data. This
>time
>> >it
>> >has no such problem. It seems that the existence of NAs convert the
>> >integer
>> >to factor. Thanks for your help.
>> >
>> >
>> >On Tue, Sep 20, 2016 at 4:09 PM, Jianling Fan
>
>> >wrote:
>> >
>> >> Add the "stringsAsFactors = F"  when you read the data, and then
>> >> convert them to numeric.
>> >>
>> >> On 20 September 2016 at 16:00, lily li 
>wrote:
>> >> > Yes, it is stored as factor. I can't check out any problem in
>the
>> >> original
>> >> > data. Reread data doesn't help either. I use read.csv to read in
>> >the
>> >> data,
>> >> > do you think it is better to use read.table? Thanks again.
>> >> >
>> >> > On Tue, Sep 20, 2016 at 3:55 PM, Greg Snow <538...@gmail.com>
>> >wrote:
>> >> >
>> >> >> This indicates that your Discharge column has been
>> >stored/converted as
>> >> >> a factor (run str(df) to verify and check other columns).  This
>> >> >> usually happens when functions like read.table are left to try
>to
>> >> >> figure out what each column is and it finds something in that
>> >column
>> >> >> that cannot be converted to a number (possibly an oh instead of
>a
>> >> >> zero, an el instead of a one, or just a letter or punctuation
>mark
>> >> >> accidentally in the file).  You can either find the error in
>your
>> >> >> original data, fix it, and reread the data, or specify that the
>> >column
>> >> >> should be numeric using the colClasses argument to read.table
>or
>> >other
>> >> >> function.
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Tue, Sep 20, 2016 at 3:46 PM, lily li 
>> >wrote:
>> >> >> > Hi R users,
>> >> >> >
>> >> >> > I have a problem in reading data.
>> >> >> > For example, part of my dataframe is like this:
>> >> >> >
>> >> >> > df
>> >> >> > month day year  Discharge
>> >> >> >     3   1 2010        6.4
>> >> >> >     3   2 2010       7.58
>> >> >> >     3   3 2010       6.82
>> >> >> >     3   4 2010       8.63
>> >> >> >     3   5 2010       8.16
>> >> >> >     3   6 2010       7.58
>> >> >> >
>> >> >> > Then if I type summary(df), why it converts the discharge
>data
>> >to
>> >> >> levels? I
>> >> >> > also met the same problem when reading some other csv files.
>How
>> >to
>> >> solve
>> >> >> > this problem? Thanks.
>> >> >> >
>> >> >> > Discharge
>> >> >> > 7.58 :2
>> >> >> > 6.4   :1
>> >> >> > 6.82 :1
>> >> >> > 8.63 :1
>> >> >> > 8.16 :1
>> >> >> >
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Gregory (Greg) L. Snow Ph.D.
>> >> >> 538...@gmail.com
>> >> >>
>> >> >

Re: [R] about data problem

2016-09-20 Thread Joe Ceradini
read.csv("your_data.csv", stringsAsFactors=FALSE)
(I'm just reiterating what Jianling said...)
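
To make the effect of that argument concrete, here is a minimal sketch with
inline data (the values and the "8.l6" entry, with an el in place of a one,
are invented for illustration). It passes stringsAsFactors explicitly both
ways so the result does not depend on the default of the R version in use:

```r
# One column with a single non-numeric entry ("8.l6" has a letter l).
csv_text <- "Discharge\n6.4\n7.58\n6.82\n8.l6\n"

# With stringsAsFactors = TRUE the column comes back as a factor.
as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)
class(as_factor$Discharge)  # "factor"

# With stringsAsFactors = FALSE it stays a character vector.
as_char <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(as_char$Discharge)    # "character"

# A column containing only valid numbers is read as numeric either way.
clean <- read.csv(text = "Discharge\n6.4\n7.58\n", stringsAsFactors = FALSE)
class(clean$Discharge)      # "numeric"
```

Note that the argument only changes how the non-numeric column is stored; the
bad entry is still in the data and still has to be found.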

Joe

On Tue, Sep 20, 2016 at 4:56 PM, lily li  wrote:

> Is there a function in read.csv that I can use to avoid converting numeric
> to factor? Thanks a lot.
>
>
>
> On Tue, Sep 20, 2016 at 4:42 PM, lily li  wrote:
>
> > Thanks. Then what should I do to solve the problem?
> >
> > On Tue, Sep 20, 2016 at 4:30 PM, Jeff Newmiller <
> jdnew...@dcn.davis.ca.us>
> > wrote:
> >
> >> I suppose you can do what works for your data, but I wouldn't recommend
> >> na.rm=TRUE because it hides problems rather than clarifying them.
> >>
> >> If in fact your data includes true NA values (the letters NA or simply
> >> nothing between the commas are typical ways this information may be
> >> indicated), then read.csv will NOT change from integer to factor
> >> (particularly if you have specified which markers represent NA using the
> >> na.strings argument documented under read.table)... so you probably DO
> have
> >> unexpected garbage still in your data which could be obscuring valuable
> >> information that could affect your conclusions.
> >> --
> >> Sent from my phone. Please excuse my brevity.
> >>
> >> On September 20, 2016 3:11:42 PM PDT, lily li 
> >> wrote:
> >> >I reread the data, and use 'na.rm = T' when reading the data. This time
> >> >it
> >> >has no such problem. It seems that the existence of NAs convert the
> >> >integer
> >> >to factor. Thanks for your help.
> >> >
> >> >
> >> >On Tue, Sep 20, 2016 at 4:09 PM, Jianling Fan 
> >> >wrote:
> >> >
> >> >> Add the "stringsAsFactors = F"  when you read the data, and then
> >> >> convert them to numeric.
> >> >>
> >> >> On 20 September 2016 at 16:00, lily li  wrote:
> >> >> > Yes, it is stored as factor. I can't check out any problem in the
> >> >> original
> >> >> > data. Reread data doesn't help either. I use read.csv to read in
> >> >the
> >> >> data,
> >> >> > do you think it is better to use read.table? Thanks again.
> >> >> >
> >> >> > On Tue, Sep 20, 2016 at 3:55 PM, Greg Snow <538...@gmail.com>
> >> >wrote:
> >> >> >
> >> >> >> This indicates that your Discharge column has been
> >> >stored/converted as
> >> >> >> a factor (run str(df) to verify and check other columns).  This
> >> >> >> usually happens when functions like read.table are left to try to
> >> >> >> figure out what each column is and it finds something in that
> >> >column
> >> >> >> that cannot be converted to a number (possibly an oh instead of a
> >> >> >> zero, an el instead of a one, or just a letter or punctuation mark
> >> >> >> accidentally in the file).  You can either find the error in your
> >> >> >> original data, fix it, and reread the data, or specify that the
> >> >column
> >> >> >> should be numeric using the colClasses argument to read.table or
> >> >other
> >> >> >> function.
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> On Tue, Sep 20, 2016 at 3:46 PM, lily li 
> >> >wrote:
> >> >> >> > Hi R users,
> >> >> >> >
> >> >> >> > I have a problem in reading data.
> >> >> >> > For example, part of my dataframe is like this:
> >> >> >> >
> >> >> >> > df
> >> >> >> > month day year  Discharge
> >> >> >> >     3   1 2010        6.4
> >> >> >> >     3   2 2010       7.58
> >> >> >> >     3   3 2010       6.82
> >> >> >> >     3   4 2010       8.63
> >> >> >> >     3   5 2010       8.16
> >> >> >> >     3   6 2010       7.58
> >> >> >> >
> >> >> >> > Then if I type summary(df), why it converts the discharge data
> >> >to
> >> >> >> levels? I
> >> >> >> > also met the same problem when reading some other csv files. How
> >> >to
> >> >> solve
> >> >> >> > this problem? Thanks.
> >> >> >> >
> >> >> >> > Discharge
> >> >> >> > 7.58 :2
> >> >> >> > 6.4   :1
> >> >> >> > 6.82 :1
> >> >> >> > 8.63 :1
> >> >> >> > 8.16 :1
> >> >> >> >
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> --
> >> >> >> Gregory (Greg) L. Snow Ph.D.
> >> >> >> 538...@gmail.com
> >> >> >>
> >> >> >
> >> 

Re: [R] about data problem

2016-09-20 Thread lily li
Is there an argument in read.csv that I can use to avoid converting numeric
to factor? Thanks a lot.



On Tue, Sep 20, 2016 at 4:42 PM, lily li  wrote:

> Thanks. Then what should I do to solve the problem?
>
> On Tue, Sep 20, 2016 at 4:30 PM, Jeff Newmiller 
> wrote:
>
>> I suppose you can do what works for your data, but I wouldn't recommend
>> na.rm=TRUE because it hides problems rather than clarifying them.
>>
>> If in fact your data includes true NA values (the letters NA or simply
>> nothing between the commas are typical ways this information may be
>> indicated), then read.csv will NOT change from integer to factor
>> (particularly if you have specified which markers represent NA using the
>> na.strings argument documented under read.table)... so you probably DO have
>> unexpected garbage still in your data which could be obscuring valuable
>> information that could affect your conclusions.
>> --
>> Sent from my phone. Please excuse my brevity.
>>
>> On September 20, 2016 3:11:42 PM PDT, lily li 
>> wrote:
>> >I reread the data, and use 'na.rm = T' when reading the data. This time
>> >it
>> >has no such problem. It seems that the existence of NAs convert the
>> >integer
>> >to factor. Thanks for your help.
>> >
>> >
>> >On Tue, Sep 20, 2016 at 4:09 PM, Jianling Fan 
>> >wrote:
>> >
>> >> Add the "stringsAsFactors = F"  when you read the data, and then
>> >> convert them to numeric.
>> >>
>> >> On 20 September 2016 at 16:00, lily li  wrote:
>> >> > Yes, it is stored as factor. I can't check out any problem in the
>> >> original
>> >> > data. Reread data doesn't help either. I use read.csv to read in
>> >the
>> >> data,
>> >> > do you think it is better to use read.table? Thanks again.
>> >> >
>> >> > On Tue, Sep 20, 2016 at 3:55 PM, Greg Snow <538...@gmail.com>
>> >wrote:
>> >> >
>> >> >> This indicates that your Discharge column has been
>> >stored/converted as
>> >> >> a factor (run str(df) to verify and check other columns).  This
>> >> >> usually happens when functions like read.table are left to try to
>> >> >> figure out what each column is and it finds something in that
>> >column
>> >> >> that cannot be converted to a number (possibly an oh instead of a
>> >> >> zero, an el instead of a one, or just a letter or punctuation mark
>> >> >> accidentally in the file).  You can either find the error in your
>> >> >> original data, fix it, and reread the data, or specify that the
>> >column
>> >> >> should be numeric using the colClasses argument to read.table or
>> >other
>> >> >> function.
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Tue, Sep 20, 2016 at 3:46 PM, lily li 
>> >wrote:
>> >> >> > Hi R users,
>> >> >> >
>> >> >> > I have a problem in reading data.
>> >> >> > For example, part of my dataframe is like this:
>> >> >> >
>> >> >> > df
>> >> >> > month day year  Discharge
>> >> >> >     3   1 2010        6.4
>> >> >> >     3   2 2010       7.58
>> >> >> >     3   3 2010       6.82
>> >> >> >     3   4 2010       8.63
>> >> >> >     3   5 2010       8.16
>> >> >> >     3   6 2010       7.58
>> >> >> >
>> >> >> > Then if I type summary(df), why it converts the discharge data
>> >to
>> >> >> levels? I
>> >> >> > also met the same problem when reading some other csv files. How
>> >to
>> >> solve
>> >> >> > this problem? Thanks.
>> >> >> >
>> >> >> > Discharge
>> >> >> > 7.58 :2
>> >> >> > 6.4   :1
>> >> >> > 6.82 :1
>> >> >> > 8.63 :1
>> >> >> > 8.16 :1
>> >> >> >
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Gregory (Greg) L. Snow Ph.D.
>> >> >> 538...@gmail.com
>> >> >>
>> >> >
>> >> > [[alternative HTML version deleted]]
>> >> >
>> >> > __
>> >> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> >> > https://stat.ethz.ch/mailman/listinfo/r-help
>> >> > PLEASE do read the posting guide http://www.R-project.org/
>> >> posting-guide.html
>> >> > and provide commented, minimal, self-contained, reproducible code.
>> >>
>> >>
>> >>
>> >> --
>> >> Jianling Fan
>> >> 樊建凌
>> >>
>> >
>> >   [[alternative HTML version deleted]]
>> >
>> >__
>> >R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> >https://stat.ethz.ch/mailman/listinfo/r-help
>> >PLEASE do read the posting guide
>> 

Re: [R] about data problem

2016-09-20 Thread lily li
Thanks. Then what should I do to solve the problem?

On Tue, Sep 20, 2016 at 4:30 PM, Jeff Newmiller 
wrote:

> I suppose you can do what works for your data, but I wouldn't recommend
> na.rm=TRUE because it hides problems rather than clarifying them.
>
> If in fact your data includes true NA values (the letters NA or simply
> nothing between the commas are typical ways this information may be
> indicated), then read.csv will NOT change from integer to factor
> (particularly if you have specified which markers represent NA using the
> na.strings argument documented under read.table)... so you probably DO have
> unexpected garbage still in your data which could be obscuring valuable
> information that could affect your conclusions.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help

Re: [R] about data problem

2016-09-20 Thread Jeff Newmiller
I suppose you can do what works for your data, but I wouldn't recommend 
na.rm=TRUE because it hides problems rather than clarifying them. 

If in fact your data includes true NA values (the letters NA or simply nothing 
between the commas are typical ways this information may be indicated), then 
read.csv will NOT change from integer to factor (particularly if you have 
specified which markers represent NA using the na.strings argument documented 
under read.table)... so you probably DO have unexpected garbage still in your 
data which could be obscuring valuable information that could affect your 
conclusions. 
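
A minimal sketch of the point above. The inline CSV text is made up for illustration (it is not the poster's file), and stringsAsFactors = TRUE is set explicitly on the assumption that factor conversion is in effect:

```r
# Hypothetical inline data; not the original poster's file.
with_na <- "month,day,year,Discharge
3,1,2010,6.4
3,2,2010,NA
3,3,2010,
3,4,2010,8.63"

# True missing values ("NA" or an empty field) leave the column numeric.
ok <- read.csv(text = with_na, na.strings = c("NA", ""))
class(ok$Discharge)    # "numeric"

# A single stray character is enough to block the numeric conversion.
with_garbage <- sub("8.63", "8.63x", with_na, fixed = TRUE)
bad <- read.csv(text = with_garbage, na.strings = c("NA", ""),
                stringsAsFactors = TRUE)
class(bad$Discharge)   # "factor"
```

So if the column still comes back as a factor (or character), something other than a plain missing value is probably sitting in it.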
-- 
Sent from my phone. Please excuse my brevity.

On September 20, 2016 3:11:42 PM PDT, lily li  wrote:
>I reread the data, and use 'na.rm = T' when reading the data. This time it
>has no such problem. It seems that the existence of NAs convert the integer
>to factor. Thanks for your help.

Re: [R] about data problem

2016-09-20 Thread lily li
I reread the data and used 'na.rm = T' when reading it. This time the
problem did not occur. It seems that the presence of NAs converts the integer
column to a factor. Thanks for your help.


On Tue, Sep 20, 2016 at 4:09 PM, Jianling Fan  wrote:

> Add the "stringsAsFactors = F"  when you read the data, and then
> convert them to numeric.

Re: [R] about data problem

2016-09-20 Thread Jianling Fan
Add "stringsAsFactors = FALSE" when you read the data, and then
convert the column to numeric.
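
A rough sketch of that two-step approach. The inline data and the column name DischargeNum are hypothetical, chosen only for illustration:

```r
# Hypothetical inline data with one bad entry (letter O instead of a zero).
csv_text <- "month,day,year,Discharge
3,1,2010,6.4
3,2,2010,7.5O
3,3,2010,6.82"

dta <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Convert into a new column so the original text stays available for comparison.
dta$DischargeNum <- suppressWarnings(as.numeric(dta$Discharge))

# Rows where conversion failed although the original entry was not missing.
failed <- is.na(dta$DischargeNum) & !is.na(dta$Discharge)
dta[failed, c("Discharge", "DischargeNum")]
```

If the column has already been read as a factor, as.numeric() on it returns the internal level codes, so it would need as.numeric(as.character(x)) instead.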

On 20 September 2016 at 16:00, lily li  wrote:
> Yes, it is stored as factor. I can't check out any problem in the original
> data. Reread data doesn't help either. I use read.csv to read in the data,
> do you think it is better to use read.table? Thanks again.



-- 
Jianling Fan
樊建凌


Re: [R] about data problem

2016-09-20 Thread lily li
Yes, it is stored as a factor. I can't find any problem in the original
data. Rereading the data doesn't help either. I use read.csv to read in the data;
do you think it is better to use read.table? Thanks again.

On Tue, Sep 20, 2016 at 3:55 PM, Greg Snow <538...@gmail.com> wrote:

> This indicates that your Discharge column has been stored/converted as
> a factor (run str(df) to verify and check other columns).  This
> usually happens when functions like read.table are left to try to
> figure out what each column is and it finds something in that column
> that cannot be converted to a number (possibly an oh instead of a
> zero, an el instead of a one, or just a letter or punctuation mark
> accidentally in the file).  You can either find the error in your
> original data, fix it, and reread the data, or specify that the column
> should be numeric using the colClasses argument to read.table or other
> function.


Re: [R] about data problem

2016-09-20 Thread Greg Snow
This indicates that your Discharge column has been stored/converted as
a factor (run str(df) to verify and check other columns).  This
usually happens when functions like read.table are left to try to
figure out what each column is and it finds something in that column
that cannot be converted to a number (possibly an oh instead of a
zero, an el instead of a one, or just a letter or punctuation mark
accidentally in the file).  You can either find the error in your
original data, fix it, and reread the data, or specify that the column
should be numeric using the colClasses argument to read.table or other
function.
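
A small sketch of the colClasses route, using made-up inline data (the column names follow the example in the question; everything else is an assumption). With the class forced to numeric, a bad entry should make the read fail loudly instead of silently producing a factor:

```r
# Hypothetical inline data; 'bad' has a letter O in place of a digit.
good <- "month,day,year,Discharge
3,1,2010,6.4
3,2,2010,7.58"
bad <- sub("7.58", "7.5O", good, fixed = TRUE)

classes <- c(month = "integer", day = "integer",
             year = "integer", Discharge = "numeric")

# Clean file: the column is read as numeric.
df_good <- read.csv(text = good, colClasses = classes)
str(df_good)

# Dirty file: capture the error rather than stopping the script.
result <- tryCatch(read.csv(text = bad, colClasses = classes),
                   error = function(e) e)
if (inherits(result, "error")) message("Read failed: ", conditionMessage(result))
```

The error message usually names the offending value, which helps when hunting for it in the original file.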






-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com



[R] about data problem

2016-09-20 Thread lily li
Hi R users,

I have a problem in reading data.
For example, part of my dataframe is like this:

df
month day year  Discharge
   3   1   2010   6.4
   3   2   2010   7.58
   3   3   2010   6.82
   3   4   2010   8.63
   3   5   2010   8.16
   3   6   2010   7.58

Then if I type summary(df), why does it convert the discharge data to levels? I
also ran into the same problem when reading some other csv files. How can I solve
this problem? Thanks.

Discharge
7.58 :2
6.4   :1
6.82 :1
8.63 :1
8.16 :1



Re: [R] "invalid argument to unary operator" while selecting rows by name

2016-09-20 Thread ruipbarradas

Sorry, I've made a stupid mistake.
It's obviously the other way around.

ix <- which(rownames(data) %in% c("601", "604"))
clean <- data[-ix, ]
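
To see why the direction of %in% matters, here is a toy example. The data frame below is invented for illustration; only the row names "601" and "604" come from the question:

```r
# Hypothetical data frame with character row names.
data <- data.frame(x = 1:5, row.names = c("6", "29", "601", "602", "604"))

# Positions of the rows whose names are in the drop list.
ix <- which(rownames(data) %in% c("601", "604"))        # 3 5
clean <- data[-ix, , drop = FALSE]
rownames(clean)                                         # "6" "29" "602"

# The reversed test indexes the drop list, not the rows, so the
# wrong rows would be removed.
wrong_ix <- which(c("601", "604") %in% rownames(data))  # 1 2

# Logical indexing avoids which() and also behaves when nothing matches
# (data[-integer(0), ] would return zero rows).
clean2 <- data[!rownames(data) %in% c("601", "604"), , drop = FALSE]
```

The unary minus only works on numeric positions, which is why -"601" fails; the logical form sidesteps the issue.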


Rui Barradas


Quoting ruipbarra...@sapo.pt:


Hello,

Try something like the following.

ix <- which(c("601", "604") %in% rownames(data))
clean <- data[-ix, ]


Hope this helps,

Rui Barradas





Re: [R] "invalid argument to unary operator" while selecting rows by name

2016-09-20 Thread ruipbarradas

Hello,

Try something like the following.

ix <- which(c("601", "604") %in% rownames(data))
clean <- data[-ix, ]


Hope this helps,

Rui Barradas





Re: [R] How to plot the regression line of multivariable linear model?

2016-09-20 Thread Greg Snow
You might consider the Predict.Plot and TkPredict functions in the
TeachingDemos package.  These help you explore multiple linear
regression models by plotting the "line" relating the response to one
of the predictors at given values of the other predictors.  These
lines can be combined in a single plot (Predict.Plot) or changed
interactively (TkPredict).  See the examples in the help page.
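
The same idea can be sketched in base R without the package: hold the other predictor at a fixed value and plot the fitted line against the predictor of interest. This is only an approximation of what those functions do, on simulated data; the variable names and the choice of mean(X2) as the reference value are assumptions:

```r
set.seed(1)
n <- 50
d <- data.frame(X1 = runif(n, 0, 10), X2 = runif(n, 0, 5))
d$Y <- 2 + 1.5 * d$X1 - 0.8 * d$X2 + rnorm(n)

fit <- lm(Y ~ X1 + X2, data = d)

# Vary X1 over its observed range while holding X2 at its mean.
grid <- data.frame(X1 = seq(min(d$X1), max(d$X1), length.out = 100),
                   X2 = mean(d$X2))
grid$Yhat <- predict(fit, newdata = grid)

plot(Y ~ X1, data = d)
lines(grid$X1, grid$Yhat)   # fitted "line" for Y vs X1 at X2 = mean(X2)
```

Repeating the lines() call with X2 set to other values gives a family of parallel lines, one per chosen value of X2.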

On Sun, Sep 18, 2016 at 9:26 AM, mviljamaa  wrote:
> I'm having a bit of trouble plotting the regression line of multivariable
> linear model.
>
> Specifically my model has one response and two predictors, i.e. it's of the
> form
>
> Y = b_0+b_1*X_1+b_2*X_2
>
> Plotting the regression line for a single predictor model
>
> Y = b_0+b_1*X_1
>
> is simple enough, just call abline() with the coefficients returned by lm().
>
> However, I don't know if this can be adapted to multivariable linear models.
>
> I also know about curve(), but I don't know how am I supposed to input the
> multivariable model's coefficients into it.
>



-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com



Re: [R] "invalid argument to unary operator" while selecting rows by name

2016-09-20 Thread Bert Gunter
Hint: "601"  is not 601.

Have you gone through any R tutorials?

Cheers,
Bert
Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )



[R] "invalid argument to unary operator" while selecting rows by name

2016-09-20 Thread Pauline Laïlle
Dear all,

I built a dataframe with read.csv2(). Initially, row names are integers
(order of answers to a survey). They are listed in the csv's first column.
The import works well and my dataframe looks like I wanted it to look.

Row names go as follows :
 [1] "6"   "29"  "31"  "32"  "52"  "55"  "63"  "71"  "72"  "80"  "88"  "89"
 "91"  "93"  "105" "110" "111" "117" "119" "120"
 [21] "122" "127" "128" "133" "137" "140" "163" "165" "167" "169" "177"
"178" "179" "184" "186" "192" "193" "200" "201" "228"
etc.

I would like to drop rows "601" & "604" to clean the dataframe.

While data["601",] shows me the first row I'd like to drop, data[-"601",]
returns the following:
Error in -"601" : invalid argument to unary operator

The same goes for data[c("601","604"),] and data[-c("601","604"),].

It is the first time that I run into this specific error. After reading a
bit about it I still don't understand what it means and how to fix it.

Thanks for reading!
Best,
Pauline.



[R] mgcv: bam(), error in models with random intercepts and random slopes

2016-09-20 Thread Fotis Fotiadis
Hi all

I am using the bam function of the mgcv package to model behavioral data of
a learning experiment. To model individual variation in learning rate, I am
testing models with (a) by-participant random intercepts of trial, (b)
by-participant random slopes and random intercepts of trial, and (c)
by-participant random smooth terms.

While all (a) and (c) models converge, I am getting an error for every
possible variation of a model with random intercepts and random slopes. For
example:

m1.rs<-bam(acc~ 1 + igc + s(ctrial) + s(sbj, bs="re") + s(ctrial, sbj,
bs="re") , data=data_a, family=binomial)
Error in G$smooth[[i]]$first.para:G$smooth[[i]]$last.para :
  argument of length 0

Any idea on what that error might be?

Thank you in advance for your time.
Fotis

P.S.: R version: 3.3.1, mgcv version: 1.8.15

-- 
PhD Candidate
Department of Philosophy and History of Science
University of Athens, Greece.
http://users.uoa.gr/~aprotopapas/LLL/en/members.html#fotisfotiadis

Notice: Please do not use this account for social networks invitations, for
sending chain-mails to me, or as it were a facebook account. Thank you for
respecting my privacy.



Re: [R] Run a fixed effect regression and a logit regression on a national survey that need to be "weighted"

2016-09-20 Thread Adams, Jean
If you want your records to be weighted by the survey weights during the
analysis, then use the weights= argument of the glm() function.
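
A minimal sketch of what that could look like, on simulated data. The data frame, the column names and the choice of quasibinomial (used here only to avoid the warning binomial gives for non-integer weights) are assumptions, not taken from the actual survey:

```r
set.seed(42)
n <- 200
# Simulated stand-in for the survey data; all columns are hypothetical.
df <- data.frame(
  income_k = runif(n, 10, 60),                        # income in thousands
  sex      = factor(sample(c("F", "M"), n, replace = TRUE)),
  weight   = runif(n, 0.5, 3)                         # survey weight
)
df$pension <- rbinom(n, 1, plogis(-2 + 0.05 * df$income_k))

# Logit model with each record weighted by its survey weight.
fit <- glm(pension ~ income_k + sex, data = df,
           weights = weight, family = quasibinomial(link = "logit"))
summary(fit)
```

Note that glm() does not treat these as sampling weights when computing standard errors; as far as I know, svyglm() in the survey package, applied to the svydesign object, is the design-based alternative.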

Jean

On Tue, Sep 20, 2016 at 5:04 AM, laura roncaglia 
wrote:

> I am a beginner user of R. I am using a national survey to test what
> variables influence the participation in complementary pensions (the
> participation in complementary pension is voluntary in my country).
>
> Since the dependent variable is a dummy (1 if the person participates and 0
> otherwise) I want to run a logit or probit regression; moreover I want to
> run a fixed effect regression since I subset the survey in order to have
> only the individuals interviewed more than one time.
>
> The data frame is composed of several social and economic variables and
> it also contains a variable "weight" which is the survey weight (they are
> weighting coefficients to adjust the results of the sample to the national
> data).
>
>   family pers sex income pension
> 1      1   01   F      1       1
> 2      2   01   F      2       1
> 3      2   02   M      4       0
> 4      3   01   M  25000       0
> 5      3   02   F      5       0
> 6      4   01   M      6       1
>
> pers identifies the member within the family, and pension takes 1 if the
> person participates in a complementary pension (this is a simplification of
> the original survey, which contains more variables and observations (around
> 22k observations)).
>
> I know how to use the plm and glm functions for a fixed effect or logit
> regression; in this case I don't know what to do, since I need to take
> account of the survey weights.
>
> I used the svydesign function to "weight" the data frame:
>
> df1 <- svydesign(ids=~1, data=df, weights=~dfweight)
>
> I used ids=~1 because there isn't a "cluster" variable in the survey (I
> know that the towns are randomly selected and then individuals are randomly
> selected, but there isn't a variable that indicates the stratification).
>
> At this point I am lost: I don't know whether it is right to use the survey
> package and, if so, which function to use to run the regression, or whether
> there is a way to use the plm or glm functions taking account of the weights.
>
> I searched hard for a solution on the web, but if you could give
> me an answer I'd be glad.
>


Re: [R] Return the indices of rows of a data frame

2016-09-20 Thread Bert Gunter
There are many good R tutorials on the web. Some recommendations can
be found here:

https://www.rstudio.com/online-learning/#R

Please spend some time learning fundamental R constructs and
functionality before posting what appear to be very basic questions
here.

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Mon, Sep 19, 2016 at 8:37 PM, John  wrote:
> Hi,
>
>I have the following dataframe:
>
>> temp<-data.frame(a=c(1,1,2), b=2:4, c=1:3)
>> row.names(temp)<-c("D", "E", "F")
>> temp
>   a b c
> D 1 2 1
> E 1 3 2
> F 2 4 3
>
>I would like R to tell me which rows have value "a" equal to 1. The
> answer is the first row and the second row, or row D and row E. Which
> function should I use? function subset? function which?
>
>Thanks!
>


Re: [R] lm model with many categorical variables

2016-09-20 Thread Bert Gunter
You need statistical help, which is generally off topic here. I
suggest you post to a statistical site like stats.stackexchange.com
instead. Better yet, find a local statistical expert with whom you can
consult.

Cheers,
Bert
Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Sep 20, 2016 at 1:34 AM, Michael Haenlein
 wrote:
> Dear all,
>
> I am trying to estimate a lm model with one continuous dependent variable
> and 11 independent variables that are all categorical, some of which have
> many categories (several dozens in some cases).
>
> I am not interested in statistical inference to a larger population. The
> objective of my model is to find a way to best predict my continuous
> variable within the sample.
>
> When I run the lm model I evidently get many regression coefficients that
> are not significant. Is there some way to automatically combine levels of a
> categorical variable together if the regression coefficients for the
> individual levels are not significant?
>
> My idea is to find some form of grouping of the different categories that
> allows me to work with fewer levels while keeping or even improving the
> quality of predictions.
>
> Thanks,
>
> Michael
>


Re: [R] Return the indices of rows of a data frame

2016-09-20 Thread Robert Baer



On 9/19/2016 10:37 PM, John wrote:

> Hi,
>
> I have the following dataframe:
>
> temp<-data.frame(a=c(1,1,2), b=2:4, c=1:3)
> row.names(temp)<-c("D", "E", "F")
> temp
>
>   a b c
> D 1 2 1
> E 1 3 2
> F 2 4 3
>
> I would like R to tell me which rows have value "a" equal to 1. The
> answer is the first row and the second row, or row D and row E. Which
> function should I use? function subset? function which?


row.names(temp[temp$a==1,])
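
For completeness, a small sketch using the same `temp` data frame from the question that returns both forms the poster asked about: which() gives the row positions, and indexing row.names() with the logical condition gives the row names. The object names `idx` and `nms` are only illustrative.

```r
temp <- data.frame(a = c(1, 1, 2), b = 2:4, c = 1:3)
row.names(temp) <- c("D", "E", "F")

idx <- which(temp$a == 1)             # integer positions of the matching rows
nms <- row.names(temp)[temp$a == 1]   # row names of the matching rows

idx   # 1 2
nms   # "D" "E"
```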

--
Robert W. Baer, Ph.D.
Professor of Physiology
Kirksville College of Osteopathic Medicine
A T Still University of Health Sciences
800 W. Jefferson St
Kirksville, MO 63501
660-626-2321 Department
660-626-2965 FAX



Re: [R] Issue on LGP solving

2016-09-20 Thread Dr. Debasis Ghosh
Thanks Petr!! However, in the goalprog package I found "achievements" 
described as "a data frame with the deviation variables for each objective 
together with the priority level". I defined

> p1<-c(2,0,0,0,0,0)
> p2 <- c(0,0,0,0,1,0)
> p3<- c(0,0,0,0,0,1)
> achievement <- data.frame(p1,p2,p3)

Here p1, p2 and p3 are the 3 priority levels. 

I understand the problem is in the "achievement" data frame. To your point 
about a data frame with four named columns (objective, priority, p and n): how 
are these four columns defined?
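
A sketch of how the four columns might be filled in for this problem, based on my reading of the goalprog documentation, so please treat the column meanings as an assumption to check against ?llgp. Each row describes one term of the achievement function a = [(2p1); (n2); (n3)]: which objective it belongs to, its priority level, and the weights on the positive (p) and negative (n) deviation. The warning about max(achievements$priority) in the original output is at least consistent with this: a data frame named p1, p2, p3 has no priority column, so max() receives NULL.

```r
coefficients <- matrix(c(10, 15, 100, 100, 0, 1), nrow = 3, ncol = 2, byrow = TRUE)
targets <- c(40, 1000, 7)

# One row per term of the achievement function (column meanings assumed, see above)
achievements <- data.frame(
  objective = c(1, 2, 3),   # goal (row of coefficients) the deviation belongs to
  priority  = c(1, 2, 3),   # priority level of that term
  p         = c(2, 0, 0),   # weight on the positive deviation p_i
  n         = c(0, 1, 1)    # weight on the negative deviation n_i
)

# Only call the solver if goalprog happens to be installed
if (requireNamespace("goalprog", quietly = TRUE)) {
  soln <- goalprog::llgp(coefficients, targets, achievements)
}
```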

Appreciate your time Petr. Thanks again!! 

Regards,
Debasis Ghosh, Ph.D

-Original Message-
From: PIKAL Petr [mailto:petr.pi...@precheza.cz] 
Sent: Tuesday, September 20, 2016 6:55 AM
To: Dr. Debasis Ghosh; R-help@r-project.org
Subject: RE: [R] Issue on LGP solving

Hi

Just a wild guess. Achievement in the goalprog package is a data frame with four 
named columns (objective, priority, p and n).

Your achievement is a 3-column data.frame with names p1, p2 and p3.

Maybe a data frame with a defined structure is required.

Cheers
Petr


> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Dr.
> Debasis Ghosh
> Sent: Tuesday, September 20, 2016 8:12 AM
> To: R-help@r-project.org
> Subject: [R] Issue on LGP solving
>
> I was solving a LGP problem which is very basic.
>
>
>
> Find x0 = [x1; x2], n0 = [n1; n2; n3] and p0 = [p1; p2; p3] that minimize a =
> [(2p1); (n2); (n3)]
>
> The objectives are as follows
>
> 10x1 + 15x2 + n1 - p1 = 40
>
> 100x1 + 100x2 + n2 - p2 = 1000
>
> x2 + n3 - p3 = 7
>
> x; n; p >= 0
>
> The solution is x' = [4; 0] and a = [0; 600; 7]
>
>
>
>
>
> > local({pkg <- select.list(sort(.packages(all.available =
> TRUE)),graphics=TRUE)
>
> + if(nchar(pkg)) library(pkg, character.only=TRUE)})
>
> > local({pkg <- select.list(sort(.packages(all.available =
> TRUE)),graphics=TRUE)
>
> + if(nchar(pkg)) library(pkg, character.only=TRUE)})
>
>
>
>
>
> > coeff<-matrix (c(10,15,100,100,0,1), nrow=3, ncol=2, byrow=TRUE)
>
> > target<-c(40,1000,7)
>
> > p1<-c(2,0,0,0,0,0)
>
> > p2 <- c(0,0,0,0,1,0)
>
> > p3<- c(0,0,0,0,0,1)
>
> > achievement <- data.frame(p1,p2,p3)
>
> > achievement
>
>   p1 p2 p3
>
> 1  2  0  0
>
> 2  0  0  0
>
> 3  0  0  0
>
> 4  0  0  0
>
> 5  0  1  0
>
> 6  0  0  1
>
> > llgp(coeff,target,achievement)
>
>
>
> Do you have any idea why I am seeing the error below?
>
>
>
>
>
> Error in matrix(0, nrow = levels, ncol = nonbasics) :
>
>   invalid 'nrow' value (too large or NA)
>
> In addition: Warning messages:
>
> 1: In max(achievements$priority) :
>
>   no non-missing arguments to max; returning -Inf
>
> 2: In matrix(0, nrow = levels, ncol = nonbasics) :
>
>   NAs introduced by coercion to integer range
>
>
>
> Regards,
>
> Debasis Ghosh, Ph.D
>
>
>
>


This e-mail and any documents attached to it may be confidential and are 
intended only for its intended recipients.
If you received this e-mail by mistake, please immediately inform its sender. 
Delete the contents of this e-mail with all attachments and its copies from 
your system.
If you are not the intended recipient of this e-mail, you are not authorized to 
use, disseminate, copy or disclose this e-mail in any manner.
The sender of this e-mail shall not be liable for 

Re: [R] Errors in Raster to Point

2016-09-20 Thread David Remotti
My first answer is that R is not the proper environment for such a 
question. There are many free packages for image analysis or even GIS. 
Try, for example, QGIS.
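
Whichever tool is used, the size of the request in the quoted message can be checked with simple arithmetic. This is a rough sketch, assuming rasterToPoints() builds a numeric matrix with three columns (x, y and one value column) for every cell; the variable names are only illustrative.

```r
ncell <- 11150 * 21808       # nrow * ncol reported for the RasterLayer
ncols <- 3                   # assumed: x, y and the cell value
bytes <- ncell * ncols * 8   # 8 bytes per double
gib   <- bytes / 1024^3
round(gib, 1)                # 5.4
```

Under that assumption the figure matches the "cannot allocate vector of size 5.4 Gb" message, which suggests the error is about holding every pixel as a point in memory at once. Working on a cropped extent, an aggregated raster, or only the cells of interest would shrink the request accordingly.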


David


On 19/09/2016 22:34, GwanSeon Kim wrote:

Hi, all
I am just a beginner with R.
I am working with a TIF image file, and the information about the raster is
as follows:

class   : RasterLayer
dimensions  : 11150, 21808, 243159200  (nrow, ncol, ncell)
resolution  : 30, 30  (x, y)
extent  : 569685, 1223925, 1513995, 1848495  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0
data source : C:\Users\Gwan\AppData\Local\Temp\Rtmpg506Ee\raster\r_tmp_2016-09-14_122409_6260_09589.grd
names   : test_map
values  : 1, 225  (min, max)
attributes  :
ID OBJECTID Value Red Green Blue   Count   Class_Name Opacity
  from:  02 1   1 00 5982503 Corn 1
  to  : 48  255   254   0 00   10336 Dbl Crop Barley/Soybeans 1



From this RasterLayer, I want to convert the raster to points, one per pixel,
based on "Value" (one of the column names), and create a raster with
georeferenced information.
I used the following code: RP <- rasterToPoints(KY_raster)
However, I could not get the points; I got the error message "cannot
allocate vector of size 5.4 Gb" and "Your computer is low on memory. Save
your files and close these programs".
Could someone please help me convert the raster to points?
Best,



Re: [R] lm model with many categorical variables

2016-09-20 Thread Ismail SEZEN

> On 20 Sep 2016, at 11:34, Michael Haenlein  wrote:
> 
> Dear all,
> 
> I am trying to estimate a lm model with one continuous dependent variable
> and 11 independent variables that are all categorical, some of which have
> many categories (several dozens in some cases).

If I’m not wrong (assuming the categorical variables are stored as factors), lm 
will pick the most crowded categories and try to fit a linear model over 
them. (This might be wrong; somebody please correct me.)
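
On that last point: as far as I know, lm() does not pick out the most populated categories. With the default treatment contrasts it expands each factor into one indicator column per level except the first, which becomes the baseline, regardless of how many observations each level has. A small sketch on made-up data (`d`, `g` and `y` are hypothetical names):

```r
set.seed(42)
d <- data.frame(
  g = factor(rep(c("a", "b", "c"), times = c(10, 3, 2))),  # deliberately unbalanced
  y = rnorm(15)
)

fit <- lm(y ~ g, data = d)
names(coef(fit))                     # "(Intercept)" "gb" "gc"
colnames(model.matrix(~ g, data = d))  # the same three columns
```

So a factor with several dozen levels contributes several dozen coefficients, which is why the original model ends up with so many individually non-significant terms.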

> 
> I am not interested in statistical inference to a larger population. The
> objective of my model is to find a way to best predict my continuous
> variable within the sample.

The best pick would be a CART (Classification and Regression Tree, rpart) or CIT 
(Conditional Inference Tree, ctree) model to predict a continuous response 
variable from categorical variables. For CIT, please see the newer partykit 
package (or the older party package).
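
A rough sketch of the tree idea on simulated data, only to show the mechanics. The data frame `d`, the 30-level factor `g` and the three underlying groups are all made up, and rpart is called only if it is installed. Rows that land in the same terminal node share a prediction, so the tree effectively groups factor levels.

```r
set.seed(7)
n <- 500
d <- data.frame(g = factor(sample(paste0("lev", 1:30), n, replace = TRUE)))

# Hypothetical truth: the factor levels fall into 3 groups with different means
grp <- (as.integer(d$g) - 1) %/% 10
d$y <- c(0, 2, 5)[grp + 1] + rnorm(n)

if (requireNamespace("rpart", quietly = TRUE)) {
  tree <- rpart::rpart(y ~ g, data = d, method = "anova")
  d$leaf <- factor(tree$where)   # terminal node each observation falls into
  print(nlevels(d$leaf))         # far fewer leaves than factor levels, typically
  d$pred <- predict(tree, newdata = d)
}
```

Whether such a grouping improves prediction on the real data is an empirical question; comparing against the plain lm fit with cross-validation would be one way to check.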

> 
> When I run the lm model I evidently get many regression coefficients that
> are not significant. Is there some way to automatically combine levels of a
> categorical variable together if the regression coefficients for the
> individual levels are not significant?


> 
> My idea is to find some form of grouping of the different categories that
> allows me to work with fewer levels while keeping or even improving the
> quality of predictions.

I also want to mention cforest here, with which you can measure the importance 
of your predictor variables. I would recommend the partykit package for 
categorical predictors, but you can also give rpart a try.

> 
> Thanks,
> 
> Michael
> 

Re: [R] Issue on LGP solving

2016-09-20 Thread PIKAL Petr
Hi

Just a wild guess. Achievement in the goalprog package is a data frame with four 
named columns (objective, priority, p and n).

Your achievement is a 3-column data.frame with names p1, p2 and p3.

Maybe a data frame with a defined structure is required.

Cheers
Petr


> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Dr.
> Debasis Ghosh
> Sent: Tuesday, September 20, 2016 8:12 AM
> To: R-help@r-project.org
> Subject: [R] Issue on LGP solving
>
> I was solving a LGP problem which is very basic.
>
>
>
> Find x0 = [x1; x2], n0 = [n1; n2; n3] and p0 = [p1; p2; p3] that minimize a =
> [(2p1); (n2); (n3)]
>
> The objectives are as follows
>
> 10x1 + 15x2 + n1 - p1 = 40
>
> 100x1 + 100x2 + n2 - p2 = 1000
>
> x2 + n3 - p3 = 7
>
> x; n; p >= 0
>
> The solution is x' = [4; 0] and a = [0; 600; 7]
>
>
>
>
>
> > local({pkg <- select.list(sort(.packages(all.available =
> TRUE)),graphics=TRUE)
>
> + if(nchar(pkg)) library(pkg, character.only=TRUE)})
>
> > local({pkg <- select.list(sort(.packages(all.available =
> TRUE)),graphics=TRUE)
>
> + if(nchar(pkg)) library(pkg, character.only=TRUE)})
>
>
>
>
>
> > coeff<-matrix (c(10,15,100,100,0,1), nrow=3, ncol=2, byrow=TRUE)
>
> > target<-c(40,1000,7)
>
> > p1<-c(2,0,0,0,0,0)
>
> > p2 <- c(0,0,0,0,1,0)
>
> > p3<- c(0,0,0,0,0,1)
>
> > achievement <- data.frame(p1,p2,p3)
>
> > achievement
>
>   p1 p2 p3
>
> 1  2  0  0
>
> 2  0  0  0
>
> 3  0  0  0
>
> 4  0  0  0
>
> 5  0  1  0
>
> 6  0  0  1
>
> > llgp(coeff,target,achievement)
>
>
>
> Do you have any idea why I am seeing the error below?
>
>
>
>
>
> Error in matrix(0, nrow = levels, ncol = nonbasics) :
>
>   invalid 'nrow' value (too large or NA)
>
> In addition: Warning messages:
>
> 1: In max(achievements$priority) :
>
>   no non-missing arguments to max; returning -Inf
>
> 2: In matrix(0, nrow = levels, ncol = nonbasics) :
>
>   NAs introduced by coercion to integer range
>
>
>
> Regards,
>
> Debasis Ghosh, Ph.D
>
>
>
>



[R] Run a fixed effect regression and a logit regression on a national survey that need to be "weighted"

2016-09-20 Thread laura roncaglia
I am a beginner user of R. I am using a national survey to test which
variables influence participation in complementary pensions
(participation in a complementary pension is voluntary in my country).

Since the dependent variable is a dummy (1 if the person participates and 0
otherwise) I want to run a logit or probit regression; moreover I want to
run a fixed effect regression, since I subset the survey in order to keep
only the individuals interviewed more than once.

The data frame is composed of several social and economic variables, and
it also contains a variable "weight", which is the survey weight (these are
weighting coefficients that adjust the sample results to the national
data).

  family pers sex income pension
1      1   01   F      1       1
2      2   01   F      2       1
3      2   02   M      4       0
4      3   01   M  25000       0
5      3   02   F      5       0
6      4   01   M      6       1

pers identifies the member within the family, and pension takes 1 if the
person participates in a complementary pension (this is a simplification of
the original survey, which contains more variables and observations (around
22k observations)).

I know how to use the plm and glm functions for a fixed effect or logit
regression; in this case I don't know what to do, since I need to take
account of the survey weights.

I used the svydesign function to "weight" the data frame:

df1 <- svydesign(ids=~1, data=df, weights=~dfweight)

I used ids=~1 because there isn't a "cluster" variable in the survey (I
know that the towns are randomly selected and then individuals are randomly
selected, but there isn't a variable that indicates the stratification).

At this point I am lost: I don't know whether it is right to use the survey
package and, if so, which function to use to run the regression, or whether
there is a way to use the plm or glm functions taking account of the weights.

I searched hard for a solution on the web, but if you could give
me an answer I'd be glad.



[R] Issue on LGP solving

2016-09-20 Thread Dr. Debasis Ghosh
I was solving a LGP problem which is very basic. 

 

Find x0 = [x1; x2], n0 = [n1; n2; n3] and p0 = [p1; p2; p3] that minimize a
= [(2p1); (n2); (n3)]

The objectives are as follows

10x1 + 15x2 + n1 - p1 = 40

100x1 + 100x2 + n2 - p2 = 1000

x2 + n3 - p3 = 7

x; n; p >= 0

The solution is x' = [4; 0] and a = [0; 600; 7]

 

 

> local({pkg <- select.list(sort(.packages(all.available =
TRUE)),graphics=TRUE)

+ if(nchar(pkg)) library(pkg, character.only=TRUE)})

> local({pkg <- select.list(sort(.packages(all.available =
TRUE)),graphics=TRUE)

+ if(nchar(pkg)) library(pkg, character.only=TRUE)})

 

 

> coeff<-matrix (c(10,15,100,100,0,1), nrow=3, ncol=2, byrow=TRUE)

> target<-c(40,1000,7)

> p1<-c(2,0,0,0,0,0)

> p2 <- c(0,0,0,0,1,0)

> p3<- c(0,0,0,0,0,1)

> achievement <- data.frame(p1,p2,p3)

> achievement

  p1 p2 p3

1  2  0  0

2  0  0  0

3  0  0  0

4  0  0  0

5  0  1  0

6  0  0  1

> llgp(coeff,target,achievement)

 

Do you have any idea why I am seeing the error below?

 

 

Error in matrix(0, nrow = levels, ncol = nonbasics) : 

  invalid 'nrow' value (too large or NA)

In addition: Warning messages:

1: In max(achievements$priority) :

  no non-missing arguments to max; returning -Inf

2: In matrix(0, nrow = levels, ncol = nonbasics) :

  NAs introduced by coercion to integer range

 

Regards,

Debasis Ghosh, Ph.D

 




Re: [R] Using lm's subset parameter results in Error in xj[i] : invalid subscript type 'list'

2016-09-20 Thread PIKAL Petr
Hi

See inline.

> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of mviljamaa
> Sent: Tuesday, September 20, 2016 10:01 AM
> To: r-help@r-project.org
> Subject: [R] Using lm's subset parameter results in Error in xj[i] : invalid
> subscript type 'list'
>
> I'm trying to take lm on a subset of my dataset and to do this I believe I
> need to pass my subset of the data as the subset parameter of lm.
>
> So I do my subsetting:
>
> firstkids <- kidmomhsage[0:234,], i.e. the first 234 rows of the data frame.

It works; however, row numbering in R starts at 1, not 0. R is clever enough 
to subset with a 0:xx vector (the 0 is silently ignored), but you will not get 
row 1 by subsetting with 0:

dd<-data.frame(a=1:10, b=rnorm(10))
dd[0,]
[1] a b
<0 rows> (or 0-length row.names)

>
> Then construct the model:
>
> fit4 <- lm(kidmomhsage$kid_score ~ kidmomhsage$mom_age +
> kidmomhsage$mom_hs + kidmomhsage$mom_age *
> kidmomhsage$mom_hs,
> subset=firstkids)

You definitely should spend some time with introductory documentation. The 
above construction should be, e.g.:

fit4 <- lm(kid_score ~ mom_age + mom_hs + mom_age * mom_hs, data = kidmomhsage,
           subset = 1:234)
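
To illustrate that subset = with row numbers and fitting on the pre-subsetted rows give the same model, here is a small sketch on simulated data. The `kidmomhsage` data frame below is a made-up stand-in with the same column names, not the poster's data.

```r
set.seed(3)
kidmomhsage <- data.frame(            # hypothetical stand-in for the real data
  kid_score = rnorm(400, 87, 20),
  mom_age   = sample(17:29, 400, replace = TRUE),
  mom_hs    = rbinom(400, 1, 0.8)
)

# mom_age * mom_hs expands to both main effects plus their interaction
fit_subset <- lm(kid_score ~ mom_age * mom_hs, data = kidmomhsage, subset = 1:234)
fit_rows   <- lm(kid_score ~ mom_age * mom_hs, data = kidmomhsage[1:234, ])

all.equal(coef(fit_subset), coef(fit_rows))   # TRUE
nobs(fit_subset)                              # 234
```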

>
> which results in:
>
> Error in xj[i] : invalid subscript type 'list'
>
> I read somewhere a recommendation to use "unlist":

I wonder where you read such a recommendation for the lm function; you had 
better avoid that source.
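
As for what unlist does to a data frame: it flattens all columns into one long named vector. As far as I understand, lm() evaluates subset and uses the result to index rows, so the data values themselves (scores, ages, ...) would be treated as row numbers, which is not the intended first-234-rows selection. A tiny sketch with a made-up three-row `firstkids`:

```r
firstkids <- data.frame(kid_score = c(65, 98, 85), mom_age = c(27, 25, 27))

idx <- unlist(firstkids)
idx           # kid_score1 kid_score2 ... mom_age3: the values, not row numbers
length(idx)   # 6, i.e. nrow * ncol, rather than 3 rows
```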

Cheers
Petr

>
> fit4 <- lm(kidmomhsage$kid_score ~ kidmomhsage$mom_age +
> kidmomhsage$mom_hs + kidmomhsage$mom_age *
> kidmomhsage$mom_hs,
> subset=unlist(firstkids))
>
> which seems to not produce the error and results in some sort of model, but
> is this model the correct one (i.e. for the data set firstkids, just as it
> originally appears)? How does unlist change the data?
>



[R] lm model with many categorical variables

2016-09-20 Thread Michael Haenlein
Dear all,

I am trying to estimate a lm model with one continuous dependent variable
and 11 independent variables that are all categorical, some of which have
many categories (several dozens in some cases).

I am not interested in statistical inference to a larger population. The
objective of my model is to find a way to best predict my continuous
variable within the sample.

When I run the lm model I evidently get many regression coefficients that
are not significant. Is there some way to automatically combine levels of a
categorical variable together if the regression coefficients for the
individual levels are not significant?

My idea is to find some form of grouping of the different categories that
allows me to work with fewer levels while keeping or even improving the
quality of predictions.

Thanks,

Michael



Re: [R] Using lm's subset parameter results in Error in xj[i] : invalid subscript type 'list'

2016-09-20 Thread Jorge Cimentada
By subsetting the rows into the firstkids data frame, you've already
subsetted your data.

Try specifying firstkids as your data instead of as a subset in the lm call.
Also, drop the kidmomhsage$ prefix from all of your variables, since
you're running the linear model on a different data frame (firstkids).

Something along this line:

lm(kid_score ~ mom_age , data = firstkids)

*Jorge Cimentada*
*Ph.D. Candidate*
Dpt. Ciències Polítiques i Socials
Ramon Trias Fargas, 25-27 | 08005 Barcelona

Office 24.331
[Tel.] 697 382 009
jorge.ciment...@upf.edu
http://www.upf.edu/dcpis/



On Tue, Sep 20, 2016 at 10:00 AM, mviljamaa  wrote:

> I'm trying to take lm on a subset of my dataset and to do this I believe I
> need to pass my subset of the data as the subset parameter of lm.
>
> So I do my subsetting:
>
> firstkids <- kidmomhsage[0:234,], i.e. the first 234 rows of the data
> frame.
>
> Then construct the model:
>
> fit4 <- lm(kidmomhsage$kid_score ~ kidmomhsage$mom_age +
> kidmomhsage$mom_hs + kidmomhsage$mom_age * kidmomhsage$mom_hs,
> subset=firstkids)
>
> which results in:
>
> Error in xj[i] : invalid subscript type 'list'
>
> I read somewhere a recommendation to use "unlist":
>
> fit4 <- lm(kidmomhsage$kid_score ~ kidmomhsage$mom_age +
> kidmomhsage$mom_hs + kidmomhsage$mom_age * kidmomhsage$mom_hs,
> subset=unlist(firstkids))
>
> which seems to not produce the error and results in some sort of model,
> but is this model the correct one (i.e. for the data set firstkids, just as
> it originally appears)? How does unlist change the data?
>

Re: [R] Errors in Raster to Point

2016-09-20 Thread Michael Sumner
On Tue, 20 Sep 2016, 15:55 GwanSeon Kim  wrote:

> Hi all,
> I am just a beginner with R.
> I am working with a TIF image file, and the information about the raster is
> as follows:
>
> class   : RasterLayer
> dimensions  : 11150, 21808, 243159200  (nrow, ncol, ncell)
> resolution  : 30, 30  (x, y)
> extent  : 569685, 1223925, 1513995, 1848495  (xmin, xmax, ymin, ymax)
> coord. ref. : +proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +x_0=0
> +y_0=0 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0
> data source : C:\Users\Gwan\AppData\Local\Temp\Rtmpg506Ee\raster\r_tmp_2016-09-14_122409_6260_09589.grd
> names   : test_map
> values  : 1, 225  (min, max)
> attributes  :
>        ID OBJECTID Value Red Green Blue   Count               Class_Name Opacity
>  from:  0        2     1   1     0    0 5982503                     Corn       1
>  to  : 48      255   254   0     0    0   10336 Dbl Crop Barley/Soybeans       1
>
>
>
> From this RasterLayer, I want to convert the raster to points, one for each
> pixel, based on "Value" (one of the column names) and create a raster with
> georeferenced information.
> I used the following code: RP <- rasterToPoints(KY_raster)
> However, I could not get the points and got the error messages "cannot
> allocate vector of size 5.4 Gb" and "Your computer is low on memory. Save
> your files and close these programs".
> Could someone please help me convert the raster to points?
> Best,
>


The first question is why? The point (centre) coordinates of every pixel are
massively redundant, since they are a simple function of cell index and the
raster's extent.

You might try as.data.frame with xy = TRUE to avoid any overhead in casting
to Spatial, but it's still very likely that this is just a step towards
your actual goal. Tell us what you want to do and I am sure there is a
better way.
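
To illustrate the redundancy, here is a base-R sketch that computes cell-centre coordinates directly from the extent and resolution in the printout above. The helper name cell_centre is hypothetical, and the sketch assumes the usual raster convention that cells are numbered from 1 at the upper-left corner, row by row.

```r
# Values copied from the RasterLayer printout in the question
x_min <- 569685
y_max <- 1848495
cell_size <- 30
n_row <- 11150
n_col <- 21808

# Hypothetical helper: centre coordinates for a vector of cell numbers,
# assuming cell 1 is the upper-left cell and numbering runs row by row
cell_centre <- function(cell) {
  row0 <- (cell - 1) %/% n_col  # 0-based row, counted from the top
  col0 <- (cell - 1) %% n_col   # 0-based column, counted from the left
  cbind(x = x_min + (col0 + 0.5) * cell_size,
        y = y_max - (row0 + 0.5) * cell_size)
}

# First and last cell of the grid
cell_centre(c(1, n_row * n_col))
```

Nothing needs to be stored per pixel: any coordinate can be recomputed on demand, which avoids the multi-gigabyte allocation.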

Cheers, Mike

-- 
Dr. Michael Sumner
Software and Database Engineer
Australian Antarctic Division
203 Channel Highway
Kingston Tasmania 7050 Australia



[R] Using lm's subset parameter results in Error in xj[i] : invalid subscript type 'list'

2016-09-20 Thread mviljamaa
I'm trying to take lm on a subset of my dataset and to do this I believe 
I need to pass my subset of the data as the subset parameter of lm.


So I do my subsetting:

firstkids <- kidmomhsage[0:234,], i.e. the first 234 rows of the data 
frame.


Then construct the model:

fit4 <- lm(kidmomhsage$kid_score ~ kidmomhsage$mom_age + 
kidmomhsage$mom_hs + kidmomhsage$mom_age * kidmomhsage$mom_hs, 
subset=firstkids)


which results in:

Error in xj[i] : invalid subscript type 'list'

I read somewhere a recommendation to use "unlist":

fit4 <- lm(kidmomhsage$kid_score ~ kidmomhsage$mom_age + 
kidmomhsage$mom_hs + kidmomhsage$mom_age * kidmomhsage$mom_hs, 
subset=unlist(firstkids))


which seems to not produce the error and results in some sort of model, 
but is this model the correct one (i.e. for the data set firstkids, just 
as it originally appears)? How does unlist change the data?
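
A small sketch of what unlist does to a data frame, using a made-up two-column data frame (df and its values are hypothetical). It flattens all columns into one long vector of data values, so passing that as subset would treat those values as if they were row indices, which is most likely not the model for firstkids.

```r
# Made-up data frame for illustration
df <- data.frame(a = 1:3, b = c(10, 20, 30))

# unlist() concatenates the columns into a single named vector
flat <- unlist(df)
names(flat)
length(flat)

# Side note: index 0 is silently dropped, so 0:2 selects rows 1 and 2,
# just as 0:234 selects the first 234 rows
nrow(df[0:2, ])
```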




Re: [R] Where is R installed on my Linux?

2016-09-20 Thread Loris Bennett
Mike Wojnowicz  writes:

> I have successfully installed R on my AWS EC2 r3.8 box running Linux with
>>sudo yum install -y R
>
> However, I cannot find R anywhere (which I want for the sake of
> tar'ing it up and decompressing to make future installations easier.)
> For example,
>
>> rpm -ql R
>
> Says there is nothing to show.
>
> Does anyone have any ideas?
>
> -Mike

If you run

  yum info R

you'll find this description:

  This is a metapackage that provides both core R userspace and
  all R development components.

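The metapackage itself installs no files, which is why 'rpm -ql R' shows
nothing. It pulls in 'R-core' plus various other packages, and these are
the packages whose files 'rpm' can list (for example with 'rpm -ql R-core').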

Cheers,

Loris

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de


Re: [R] Where is R installed on my Linux?

2016-09-20 Thread Jim Lemon
Hi Mike,
Depending upon the flavor of Linux (looks like it's in the RedHat
family), R will usually start when you run the command "R" in a
terminal. What does:

which R

say? Then look in the startup script it points to (often in /usr/bin or
/usr/local/bin) for the R_HOME directory.
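
The same information can also be queried from inside a running R session. A minimal sketch, assuming R starts at all; the variable names r_home and r_bin are only illustrative.

```r
# Directory where R itself is installed (the R_HOME directory)
r_home <- R.home()
r_home

# Full path of the "R" command found on the PATH
# (an empty string if it is not on the PATH)
r_bin <- Sys.which("R")
r_bin

# Directories where packages are installed
.libPaths()
```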

Jim


On Tue, Sep 20, 2016 at 9:38 AM, Mike Wojnowicz  wrote:
> I have successfully installed R on my AWS EC2 r3.8 box running Linux with
>>sudo yum install -y R
>
> However, I cannot find R anywhere (which I want for the sake of tar'ing it up 
> and decompressing to make future installations easier.)  For example,
>
>> rpm -ql R
>
> Says there is nothing to show.
>
> Does anyone have any ideas?
>
> -Mike