Thanks for this. I knew that my not knowing how to do that join was down to me not understanding data.table, not a problem with data.table itself.
Very good to know the 'bysubl' "error" is fixed in 1.9.3 (even if it is brought about by users like me trying to do our joins wrongly :))

thanks,
Amy

On 20 June 2014 11:18, Arunkumar Srinivasan <[email protected]> wrote:

> Hi Amy,
>
> Good to know that it’s not reproducible in 1.9.3. Matt already fixed it.
>
> X[Y, LHS := RHS] cannot exceed nrow(X) because this assignment is made
> *by reference*. If the join from X[Y] results in more than nrow(X) rows,
> then X will have to be re-allocated entirely.
>
> If you only want those that match with X, then you should do:
> X[Y, female := i.female, nomatch=0L].
>
> If instead you want all the rows from y, then you could do:
> x[y, allow.cartesian=TRUE].
>
> Arun
>
> From: Amy [email protected]
> Reply: Amy [email protected]
> Date: June 20, 2014 at 3:01:50 AM
> To: Arunkumar Srinivasan [email protected]
> Cc: [email protected] [email protected]
> Subject: Re: [datatable-help] What is going on with R 3.1 ?
>
> Hi Arun,
>
> In 1.9.3 I get the "Error in vecseq(f__, len__, if (allow.cartesian) NULL
> else as.integer(max(nrow(x), : Join results in 33 rows; more than 28 =
> max(nrow(x),nrow(i))...." message and it doesn't assign the column (upon
> `x[y, female:=female]`), so no, the error doesn't occur.
>
> But as an aside, shouldn't this command work?
> If I have x with subjects a, a, b, c, d; y with genders for subjects a--f,
> shouldn't x[y, female:=female] copy the female column from y to x,
> duplicating as necessary?
> Of course y[x] produces the table I'm after, but in the case that y has
> extra columns I /don't/ want in the output and x has extra columns I /do/,
> `y[x]` is then not the table I'm after. (But now we are straying into a
> different question, my limited understanding of how to use data.table, as
> opposed to the bug this thread is about).
>
> PS - typo on the data.table README in the "if you get latex errors during
> installation" bit:
>
>     devtools:::install_github("datat.able", ...)
>
> "datat.able" --> "data.table".
>
> cheers
> Amy
>
> On 20 June 2014 10:51, Arunkumar Srinivasan <[email protected]> wrote:
>
>> Hi,
>>
>> Could you let us know if you’re able to reproduce it in the devel
>> version 1.9.3 <https://github.com/Rdatatable/data.table> as well?
>>
>> Arun
>>
>> From: mathematical.coffee [email protected]
>> Reply: mathematical.coffee [email protected]
>> Date: June 20, 2014 at 2:44:50 AM
>> To: [email protected] [email protected]
>> Subject: Re: [datatable-help] What is going on with R 3.1 ?
>>
>> Hi all,
>>
>> Sorry to resurrect an old thread, but I've been experiencing these
>> problems too and have come up with a reproducible example (for me anyway).
>>
>> Data.table 1.9.2, R 3.1.0
>>
>> I was trying to join some tables and got the usual "rerun with
>> allow.cartesian=TRUE" message like Michele, and then got this error:
>>
>> Error in if (!is.null(lhs)) { : missing value where TRUE/FALSE needed
>>
>> However, while I was trying to strip down my data to reproduce the error,
>> I now consistently get this one instead:
>>
>> Error in `[.data.table`(x, y, `:=`(female, female)) :
>>   object 'bysubl' not found
>>
>> rather than the TRUE/FALSE one. But they seem to be related.
>>
>> * x has a column of subjects, some duplicated
>> * y has a column of subjects, none duplicated, and some not present in x
>>   (all subjects of x are in y though).
>> * y additionally has a binary column `female` that I wish to join into x
>>
>> (I know there are other ways to do this, but this is a stripped-down
>> example that seems to point to something going wrong in data.table, so it
>> is just illustrative):
>>
>> ```
>> library(data.table)
>> x=fread('x.csv')
>> y=fread('y.csv')
>> setkey(x, subject)
>> setkey(y, subject)
>>
>> x[y]
>> # Error in vecseq(f__, len__, if (allow.cartesian) NULL else as.integer(max(nrow(x), :
>> # Join results in 33 rows; more than 28 = max(nrow(x),nrow(i)). Check for
>> # duplicate key values in i, each of which join to the same group in x over
>> # and over again. If that's ok, try including `j` and dropping `by`
>> # (by-without-by) so that j runs for each group to avoid the large
>> # allocation.
>> # If you are sure you wish to proceed, rerun with allow.cartesian=TRUE.
>> # Otherwise, please search for this error message in the FAQ, Wiki, Stack
>> # Overflow and datatable-help for advice.
>>
>> x[y, female:=female]
>> # Error in `[.data.table`(x, y, `:=`(female, female)) :
>> #   object 'bysubl' not found
>> ```
>>
>> I get the above reproducibly with this dataset.
>>
>> From now onwards, if I type 'x' or 'y' into the prompt I get nothing
>> printed at all. Additionally:
>>
>> ```
>> tables()
>> # Error in gettext(domain, unlist(args)) : invalid 'string' value
>> # Error: argument "finally" is missing, with no default
>> ```
>>
>> The only solution is to restart the R session.
>>
>> Note: this *doesn't* occur if the column I try to merge (`female` in this
>> case) is continuous, for example. I can only get it if it's logical.
>>
>> I've attached x.csv and y.csv to this email for you to play with.
>>
>> I think it might be possible to strip down the tables to fewer rows (x has
>> 28, y has 26) but in my (not exhaustive) attempts to do so, I didn't get
>> this particular error.
>>
>> x.csv <http://r.789695.n4.nabble.com/file/n4692401/x.csv>
>> y.csv <http://r.789695.n4.nabble.com/file/n4692401/y.csv>
>>
>> --
>> View this message in context:
>> http://r.789695.n4.nabble.com/What-is-going-on-with-R-3-1-tp4689002p4692401.html
>> Sent from the datatable-help mailing list archive at Nabble.com.
>> _______________________________________________
>> datatable-help mailing list
>> [email protected]
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
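
For reference, here is a minimal sketch of the update join discussed in this thread, on toy data shaped like the description above (x with subjects a, a, b, c, d; y with one row per subject a to f). The `female` values, the `score` and `age` columns, and the use of the `on=` argument (which assumes a data.table release newer than the 1.9.3 discussed here) are assumptions for illustration; none of this is taken from the attached x.csv and y.csv.

```r
library(data.table)

# Toy data following the description in the thread; all values are made up.
x <- data.table(subject = c("a", "a", "b", "c", "d"),
                score   = 1:5)                       # hypothetical extra column of x
y <- data.table(subject = letters[1:6],
                female  = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE),
                age     = 21:26)                     # hypothetical column we do NOT want in x

# Update join: for every row of x that matches a row of y on `subject`,
# copy y's `female` into x by reference. The `i.` prefix says "take the
# column from the table in the i position", which is y here.
x[y, on = "subject", female := i.female]

print(x)
```

In this sketch x keeps its five rows and its own columns, gains `female` (repeated for the duplicated subject a), and does not pick up `age`; subjects e and f, which appear only in y, have no matching row in x to assign to.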
