On Jul 30, 2009, at 11:15 AM, Jose Iparraguirre D'Elia wrote:

Dear R users,

Consider the first two columns of a data frame like this:

z[,1:2]
  x y
1 1 1
2 2 2
3 3 3
4 1 4


Imagine that y represents the number of times the value x occurs in a population. But z is not exactly a frequency table, because x=1 appears twice in z. The x=1 in the first row and the x=1 in the fourth row are not the same; they differ according to a third variable in the data frame.

Now, I use the function rep() in order to obtain a vector of values of x in the population:

x.pop <- rep(x,y)
x.pop
 [1] 1 2 2 3 3 3 1 1 1 1

How can I go from x.pop back to z? If I use table(x.pop), I obtain a frequency table like the one below, but not z.

table(x.pop)
x.pop
1 2 3
5 2 3


(I know I haven't deleted z, obviously, but I need to write a piece of code to do something very similar).

Just in case anyone is wondering whether this is a college assignment: it is not. The real-world problem I'm working on at the moment has to do with income distribution in Northern Ireland. I want to see how many people would leave poverty if the income of those currently below 60% of median income increased by, say, £20 a week. I am working with the Family Resources Survey sample for Northern Ireland (n=2,263), which I have to gross up before increasing the incomes (grossed-up n=1,712,886). Once I have increased the income figures for those individuals in poverty, I need to 'un-gross' the data to get back to n=2,263, and table() simply does not do the trick, for exactly the same reason as in the example above.

So, please, how can I retrieve z?

Many thanks,

Jose

Presuming that your larger case is similar in structure to 'x.pop', which is to say that the values from each original row form one contiguous run, you can use:

z <- do.call(data.frame, rle(x.pop))[, c(2, 1)]

colnames(z) <- c("x", "y")

> z
x y
1 1 1
2 2 2
3 3 3
4 1 4
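
The presumption about sequential runs matters. As a minimal sketch with made-up data (z2, x.pop2 and z2.back are hypothetical names, not from the original post), two adjacent rows that share the same x are merged by rle() into a single run, so the original rows cannot be told apart:

```r
# Hypothetical data: rows 1 and 2 are adjacent and both have x = 1
z2 <- data.frame(x = c(1, 1, 2), y = c(2, 3, 1))
x.pop2 <- rep(z2$x, z2$y)

# rle() sees one run of five 1s followed by one 2,
# so the first two rows of z2 collapse into a single row
r <- rle(x.pop2)
z2.back <- data.frame(x = r$values, y = r$lengths)
z2.back
```

Here z2.back has two rows where z2 had three, so the approach only recovers the original table when neighbouring rows never share a value.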


See ?rle for more information on summarizing runs of values. The core of the first step above yields:

> rle(x.pop)
Run Length Encoding
lengths: int [1:4] 1 2 3 4
values : num [1:4] 1 2 3 1

which is a list of two elements that we coerce to a data frame using do.call(), then reverse the two columns to match your original order.
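
If the values themselves may change between grossing up and un-grossing (as with adjusted incomes), one possible alternative is to repeat the row indices rather than the values, and keep that index vector alongside the expanded data. This is only a sketch under assumptions: the names id, keep and z.back are illustrative, every weight y is taken to be a positive integer, and all copies of a row are assumed to receive the same adjustment.

```r
z <- data.frame(x = c(1, 2, 3, 1), y = c(1, 2, 3, 4))

# Gross up: repeat each row index y times, then look up the values
id <- rep(seq_len(nrow(z)), times = z$y)
x.pop <- z$x[id]

# ... adjust x.pop here, e.g. raise the values below some threshold ...

# Un-gross: keep the first copy of each original row,
# and count how many copies each row had
keep <- !duplicated(id)
z.back <- data.frame(x = x.pop[keep],
                     y = tabulate(id)[id[keep]])
z.back
```

Because the grouping comes from id and not from the values, rows with equal or adjacent x stay separate, and the result has one row per original row.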

HTH,

Marc Schwartz

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
