Re: [R-sig-eco] subsetting data in R

Manuel Spínola Tue, 26 Apr 2011 17:07:03 -0700

Thank you very much Ben.

I was running an indicator species analysis on the subsetted data, but
the unused levels were still present in the subset, so the analysis was
taking them into account.
My 3000 columns are plant species presence/absence data.
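
For the archive, here is a minimal sketch of the droplevels(subset(...))
approach suggested in the quoted reply. The object name pa and the column
influencia come from this thread; the two species columns (sp1, sp2) and
all values are invented for illustration and are not my real data.

```r
# Toy stand-in for the real data set (values are hypothetical).
# factor() is called explicitly so the example does not depend on the
# stringsAsFactors default of the installed R version.
pa <- data.frame(
  influencia = factor(c("AID", "AII", "AP", "AID", "AP")),
  sp1 = c(1, 0, 1, 1, 0),
  sp2 = c(0, 0, 1, 0, 1)
)

# subset() alone keeps all original levels, even those with no rows left
pa_kept <- subset(pa, influencia == "AID")
levels(pa_kept$influencia)

# droplevels() (base R, available since 2.12.0) drops the unused levels
# of every factor column in the data frame in a single call
pa_dropped <- droplevels(subset(pa, influencia == "AID"))
levels(pa_dropped$influencia)
```

With a data frame that has several factor columns, this should avoid
having to call factor() on each column separately.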


Best,

Manuel

On 26/04/2011 12:06 p.m., Ben Bolker wrote:
>    If this isn't already answered:
>
>    I don't quite understand the question: what do you mean by "do a
> complete data set from an object in R"?  What do you mean by "the
> subsetting is dangerous ... as you need to specify the levels for all
> your factors again"?
>
>    (What do your 3000 columns of data represent?  If these are predictor
> variables I hope you have a truly enormous number of responses ...)
>
>    It may have been mentioned already, but droplevels(subset(...)) will
> probably do what you want.  (I have tried very hard over the years to
> get drop.levels= added as an optional argument to subset(), but so far I
> have failed.)  droplevels() is an improvement over the drop.levels()
> function in gdata because (1) it is in base R and (2) it doesn't reorder
> the factor by default (which is what gdata::drop.levels [insanely in my
> opinion] does).
>
> On 11-04-24 11:21 AM, Manuel Spínola wrote:
>> Thank you for all the responses.
>>
>> Is there a way to do a complete data set from an object in R?
>> I have a data set with more than 3000 columns.
>>
>> The subsetting is ok but it could be dangerous if you are using other
>> factors to do some analysis as you need to specify the levels for all
>> your factors again.
>>
>> Best,
>>
>> Manuel
>>
>> On 24/04/2011 08:30 a.m., Gustavo Carvalho wrote:
>>> pa2<- subset(pa, influencia=="AP")
>>> pa2$influencia<- factor(pa2$influencia)
>>> levels(pa2$influencia)
>>>
>>> On Sun, Apr 24, 2011 at 11:24 AM, Manuel Spínola <mspinol...@gmail.com> wrote:
>>>> Thank you very much for your response, Christian, Roman, and Sarah.
>>>>
>>>> Sarah,
>>>>
>>>> I am trying your suggestion but I cannot see the levels:
>>>>
>>>>    >   pa2 = factor(subset(pa, influencia=="AP")$influencia)
>>>>    >   levels(pa2$influencia)
>>>> Error in pa2$influencia : $ operator is invalid for atomic vectors
>>>>
>>>> Best,
>>>>
>>>> Manuel
>>>>
>>>>
>>>>
>>>> On 24/04/2011 07:51 a.m., Sarah Goslee wrote:
>>>>> By default, read.csv() turns character variables into factors, using all
>>>>> the unique values as the levels.
>>>>>
>>>>> subset() retains those levels by default, as they are a vital element of
>>>>> the data. If you are studying some attribute of men and women, say height,
>>>>> even if you are only looking at the heights for women it's important to
>>>>> remember that men still exist.
>>>>>
>>>>> If you don't want influencia to be a factor, you can change that at import
>>>>> by passing stringsAsFactors=FALSE to read.csv().
>>>>>
>>>>> If you do want influencia to be a factor, but want the unused levels to be
>>>>> removed, you can use factor() to do that.
>>>>>
>>>>> > testdata <- data.frame(group=c("A", "B", "C", "A", "B", "C"), value=1:6)
>>>>> > testdata
>>>>>   group value
>>>>> 1     A     1
>>>>> 2     B     2
>>>>> 3     C     3
>>>>> 4     A     4
>>>>> 5     B     5
>>>>> 6     C     6
>>>>> > str(testdata)
>>>>> 'data.frame': 6 obs. of  2 variables:
>>>>>  $ group: Factor w/ 3 levels "A","B","C": 1 2 3 1 2 3
>>>>>  $ value: int  1 2 3 4 5 6
>>>>> > subset(testdata, group=="A")
>>>>>   group value
>>>>> 1     A     1
>>>>> 4     A     4
>>>>> > subset(testdata, group=="A")$group
>>>>> [1] A A
>>>>> Levels: A B C
>>>>> > ?subset
>>>>> > factor(subset(testdata, group=="A")$group)
>>>>> [1] A A
>>>>> Levels: A
>>>>>
>>>>> Sarah
>>>>>
>>>>> On Sun, Apr 24, 2011 at 9:04 AM, Manuel Spínola <mspinol...@gmail.com> wrote:
>>>>>> Dear list members,
>>>>>>
>>>>>> I have a question regarding subsetting a data set in R.
>>>>>>
>>>>>> I created an object for my data:
>>>>>>
>>>>>>     >pa = read.csv("espec_indic.csv", header = TRUE, sep=",", check.names = FALSE)
>>>>>>
>>>>>>     >     levels(pa$influencia)
>>>>>> [1] "AID" "AII" "AP"
>>>>>>
>>>>>> The object has 3 levels for influencia (AP, AID, AII)
>>>>>>
>>>>>> Now I subset only observations with influencia = "AID"
>>>>>>
>>>>>>     >pa2 = subset(pa, influencia=="AID")
>>>>>>
>>>>>> but if I ask for the levels of influencia, it still shows me the 3 levels,
>>>>>> AP, AID, AII.
>>>>>>
>>>>>>     >     levels(pa2$influencia)
>>>>>> [1] "AID" "AII" "AP"
>>>>>>
>>>>>> Why is that?
>>>>>>
>>>>>> I thought I was creating a new data frame with only AID as a
>>>>>> level for influencia.
>>>>>>
>>>>>> How can I make a completely new object that contains only the observations
>>>>>> for "AID" and has "AID" as the only level of influencia?
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Manuel
>>>>>>
>>>>>>
>>>> --
>>>> *Manuel Spínola, Ph.D.*
>>>> Instituto Internacional en Conservación y Manejo de Vida Silvestre
>>>> Universidad Nacional
>>>> Apartado 1350-3000
>>>> Heredia
>>>> COSTA RICA
>>>> mspin...@una.ac.cr
>>>> mspinol...@gmail.com
>>>> Telephone: (506) 2277-3598
>>>> Fax: (506) 2237-7036
>>>> Personal website: Lobito de río
>>>> <https://sites.google.com/site/lobitoderio/>
>>>> Institutional website: ICOMVIS <http://www.icomvis.una.ac.cr/>
>>>>
>>>>          [[alternative HTML version deleted]]
>>>>
>>>>
>>>> _______________________________________________
>>>> R-sig-ecology mailing list
>>>> R-sig-ecology@r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>>>>
>>>>
>>
>

