To clarify my previous post, here is a representation of what I am trying to 
accomplish:

I would like every unique value in any of the columns to be assigned a single 
number, like so:

    V1    V2       V3
1   sun  moon    stars
2 stars  moon      sun
3   cat   dog   catdog
4   dog  moon      sun
5  bird plane superman
6  1000   dog     2000

Level         Value
sun       ->  0
stars     ->  1
cat       ->  2
dog       ->  3
bird      ->  4
1000      ->  5
moon      ->  6
plane     ->  7
catdog    ->  8
superman  ->  9
2000      ->  10
etc
etc

So internally it's represented as:

  V1 V2 V3
1  0  6  1
2  1  6  0
3  2  3  8
4  3  6  0
5  4  7  9
6  5  3 10

Does this make sense? I am hoping there is a way to accomplish this.
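
For concreteness, here is a rough sketch of one way the mapping above might be produced. It is not a definitive answer. It assumes the columns hold plain character strings (for example, read with stringsAsFactors = FALSE), and it rebuilds the example data inline rather than reading test.csv. The names shared_levels and codes are just illustrative.

```r
# Rebuild the example data inline (assumption: columns are character, not factor).
data <- data.frame(
  V1 = c("sun", "stars", "cat", "dog", "bird", "1000"),
  V2 = c("moon", "moon", "dog", "moon", "plane", "dog"),
  V3 = c("stars", "sun", "catdog", "sun", "superman", "2000"),
  stringsAsFactors = FALSE
)

# One shared level set: every unique value, in order of first appearance
# going down V1, then V2, then V3.
shared_levels <- unique(unlist(data, use.names = FALSE))

# Give every column the same levels, so a value gets the same code everywhere.
data[] <- lapply(data, factor, levels = shared_levels)

# R's integer codes are 1-based; subtract 1 to match the 0-based table above.
codes <- sapply(data, function(col) as.integer(col) - 1L)
print(codes)
```

With shared levels, "dog" gets the same code in every column. The 0-based numbering is only for display, since R itself stores factor codes starting at 1.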

Brian

On Nov 23, 2012, at 11:42 PM, Brian Feeny <bfe...@mac.com> wrote:

> 
> I am trying to make two columns with similar data use the same internal 
> numbers for the same factor levels. Here is the example:
> 
>> read.csv("test.csv",header =FALSE,sep=",")
>     V1    V2       V3
> 1   sun  moon    stars
> 2 stars  moon      sun
> 3   cat   dog   catdog
> 4   dog  moon      sun
> 5  bird plane superman
> 6  1000   dog     2000
>> data <- read.csv("test.csv",header =FALSE,sep=",")
>> str(data)
> 'data.frame': 6 obs. of  3 variables:
> $ V1: Factor w/ 6 levels "1000","bird",..: 6 5 3 4 2 1
> $ V2: Factor w/ 3 levels "dog","moon","plane": 2 2 1 2 3 1
> $ V3: Factor w/ 5 levels "2000","catdog",..: 3 4 2 4 5 1
> 
>> as.numeric(data$V1)
> [1] 6 5 3 4 2 1
>> as.numeric(data$V2)
> [1] 2 2 1 2 3 1
>> as.factor(data$V1)
> [1] sun   stars cat   dog   bird  1000 
> Levels: 1000 bird cat dog stars sun
>> as.factor(data$V2)
> [1] moon  moon  dog   moon  plane dog  
> Levels: dog moon plane
> 
> 
> So notice "dog" is 4 in V1, yet it's 1 in V2.  Is there a way, either on 
> import or afterwards, to have factors computed for both columns and assigned
> the same internal values?
> 
> Brian
>

