On Sat, 30 May 2009, Sebastián Goinheix wrote:

> OK. This is a great project, and I'm very happy to participate (at least in
> the user list).
> Thank you very much.
>
> 2009/5/30 Allin Cottrell <cottrell(a)wfu.edu>
>
> > I'm in transit right now, but will try to offer an answer before
> > long.

Meanwhile Jack Lucchetti has posted a possible solution. But I'll go
ahead and give mine too -- it's more complicated but I think it may be
more general.

I'm supposing you have a data set that is structurally similar to this
simple hypothetical example:

hhid    y    x
10004  100   1
10004  110   4
24532   90   4
24532  120   4
24532  100   2
39800  150   5
46541  100   4
46541   80   3
46541   90   6

where "hhid" records the household identifier for various individuals,
and y and x are the variables of interest. I'm assuming you want to
consolidate the data by household, either by summing the values or
possibly taking a household average. Here's my solution:

<script>
# Suppose the above data are in hh.txt
open hh.txt
scalar n = $nobs

# how many households are there?
matrix hhvals = values(hhid)
scalar nhh = rows(hhvals)
printf "Found %d households\n", nhh

# how many variables are there? (excluding the constant)
scalar nv = $nvars - 1
printf "We have %d variables\n", nv

# create a matrix to hold the household data (with an extra
# column for the number of members)
matrix X = zeros(nhh, nv + 1)

# create list of variables (excluding hhid)
list vars = dataset
vars -= hhid

# scalars for accounting
scalar j, Xrow, Xcol

# form household-level variables in matrix X: here I'm just
# summing the values for the members of the household
loop i=1..n --quiet
    loop j=1..nhh --quiet
        if hhid[i] = hhvals[j]
            printf "obs %d belongs to household %d\n", i, hhvals[j]
            Xrow = j
            break
        endif
    endloop
    # column 1 holds the household ID
    X[Xrow,1] = hhid[i]
    Xcol = 2
    loop foreach k vars --quiet
        X[Xrow, Xcol] += $k[i]
        Xcol++
    endloop
    # in the last column of X, cumulate the number of members
    # in the given household
    X[Xrow,Xcol] += 1
endloop

# print HH data in matrix form to check
print X

# replace original dataset with household version (one could
# form household means here, if wanted)
loop i=1..nhh --quiet
    hhid[i] = X[i,1]
    Xcol = 2
    loop foreach k vars --quiet
        $k[i] = X[i, Xcol]
        Xcol++
    endloop
endloop

# restrict the sample to the number of households and save
smpl 1 nhh
series nmembers = X[, nv+1]
setinfo nmembers -d "Number of people in household"
print --byobs
store hh2.gdt
</script>

The outline is that we take the original data, cumulate it into a
matrix, use the matrix to overwrite the first nhh rows of the original
dataset, and finally chop off the unwanted rows with "smpl" and save
under a new name. The household IDs don't have to be consecutive, or
1-based, and the rows do not have to be organized by household.

Allin Cottrell
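
The script comments mention that household means could be formed in
place of sums. A minimal sketch of that step, assuming a matrix laid
out like X in the script (household ID in column 1, summed variables in
the middle columns, member count in the last column). The literal
values are the sums worked out by hand from the hypothetical example
data, and the names Xmean, nc and last are made up for illustration:

<script>
# columns: hhid, sum of y, sum of x, number of members
# (hand-computed from the hypothetical example data)
matrix X = {10004, 210, 5, 2; 24532, 310, 10, 3; 39800, 150, 5, 1; 46541, 270, 13, 3}
scalar nc = cols(X)
scalar last = nc - 1

# copy X, then divide each summed column by the member count;
# the ID column and the count column are left untouched
matrix Xmean = X
loop c=2..last --quiet
    Xmean[,c] = X[,c] ./ X[,nc]
endloop
print Xmean
</script>

If this is wanted in the saved dataset, Xmean could be used in place of
X in the loop that overwrites the first nhh rows.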