Hello, I am facing a task that is quite challenging (at least for me), and I was wondering if someone could advise how R could help me speed it up.
I am dealing with a dataset with 3 discrete variables and one continuous variable. The discrete variables are:

  V1: 8 modalities
  V2: 13 modalities
  V3: 13 modalities

The continuous variable V4 is a decimal number that is always greater than zero in the marginals of each of the 3 variables, but it is sometimes equal to zero (and sometimes negative) in the joint tables.

I have 2 files:
=> one with the distribution of all possible combinations of V1xV2 (some of which are zero or negative), and
=> one with the marginal distribution of V3.

I am trying to build the long and narrow dataset V1xV2xV3 in such a way that each V1xV2 cell does not get modified and V3 fits as closely as possible to its marginal distribution. Does that make sense?

To be more specific, my 2 input files look like the following.

FILE 1
V1, V2, V4
A, A, 24.251
A, B, 1.065
(...)
B, C, 0.294
B, D, 2.731
(...)
H, L, 0.345
H, M, 0.000

FILE 2
V3, V4
A, 1.575
B, 4.294
C, 10.044
(...)
L, 5.123
M, 3.334

What I need to produce is a file such as the following.

FILE 3
V1, V2, V3, V4
A, A, A, ???
A, A, B, ???
(...)
D, D, E, ???
D, D, F, ???
(...)
H, M, L, ???
H, M, M, ???

Please note that FILE 3 needs to be such that if I aggregate on V1+V2 I recover FILE 1 exactly, and that if I aggregate on V3 I recover a file as close as possible to FILE 2 (ideally the same file).

Can anyone suggest how I could do that with R?

Thank you very much indeed for any assistance you are able to provide.

Kind regards,
Luca
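
P.S. To make the two constraints concrete, here is a minimal sketch of the naive approach I can think of: split each V1xV2 cell across V3 in proportion to the V3 marginal shares. The toy values, the object names (file1, file2, file3, share) and the proportional rule itself are assumptions for illustration only, not the real data and not necessarily the right method. By construction it reproduces FILE 1 exactly; it matches FILE 2 only when both files have the same grand total, otherwise the V3 margin comes out rescaled to the FILE 1 total.

```r
# Toy stand-ins for FILE 1 and FILE 2. The values are made up for
# illustration and chosen so that both files have the same grand total.
file1 <- data.frame(
  V1 = c("A", "A", "B", "B"),
  V2 = c("A", "B", "A", "B"),
  V4 = c(24.251, 1.065, -0.500, 0.000)
)
file2 <- data.frame(
  V3 = c("A", "B", "C"),
  V4 = c(5.000, 7.316, 12.500)
)

# Share of each V3 modality in the V3 total (assumed allocation rule).
share <- file2$V4 / sum(file2$V4)

# Cross join: every (V1, V2) cell paired with every V3 modality.
file3 <- merge(file1, data.frame(V3 = file2$V3, share = share), by = NULL)

# Split each V1xV2 cell proportionally across the V3 modalities.
file3$V4 <- file3$V4 * file3$share
file3 <- file3[order(file3$V1, file3$V2, file3$V3), c("V1", "V2", "V3", "V4")]

# Aggregate back to check both constraints.
check1 <- aggregate(V4 ~ V1 + V2, data = file3, FUN = sum)  # compare with FILE 1
check3 <- aggregate(V4 ~ V3, data = file3, FUN = sum)       # compare with FILE 2

print(file3)
print(check1)
print(check3)
```

With the real data, the two data frames would be read with read.csv() instead. Under this rule a negative or zero cell in FILE 1 simply yields negative or zero entries in FILE 3, which may or may not be acceptable.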