On 31/01/2021 3:57 p.m., Martin Møller Skarbiniks Pedersen wrote:
This is really puzzling me, and when I try to make a small example
everything works as expected.

The problem:

I have these two large vectors of strings.

str(s1)
  chr [1:766608] "0.dk" ...
str(s2)
  chr [1:59387] "043.dk" "0606.dk" "0618.dk" "0888.dk" "0iq.dk" "0it.dk" ...

And I need to create the union-set of s1 and s2.
I expect the size of the union-set to be between 766608 and 766608+59387.
However, it is 681193, which is less than the number of elements in s1!

length(base::union(s1, s2))
[1] 681193

Any hints?

I imagine unique(s1) is shorter than s1.  The union function is the same as

unique(c(s1, s2))

for your data. (The only difference is if s1 or s2 is named: the names are dropped.)

Duncan Murdoch
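
A minimal sketch of the point above, using small made-up vectors in place of the poster's real s1 and s2 (the values below are hypothetical and chosen only so that s1 contains duplicates). It shows that the union can be shorter than s1 itself, and that the lower bound to expect is length(unique(s1)), not length(s1):

```r
# Hypothetical stand-ins for the real data; s1 deliberately has duplicates.
s1 <- c("0.dk", "0.dk", "043.dk", "0iq.dk", "0iq.dk")
s2 <- c("043.dk", "0606.dk")

n_s1        <- length(s1)           # number of elements, duplicates included
n_s1_unique <- length(unique(s1))   # number of distinct elements in s1
n_union     <- length(union(s1, s2))

# A nonzero result means s1 has at least one repeated element.
has_dups <- anyDuplicated(s1) > 0

# For unnamed character vectors, union() gives the same result as
# unique(c(s1, s2)).
same <- identical(union(s1, s2), unique(c(s1, s2)))

print(c(n_s1 = n_s1, n_s1_unique = n_s1_unique, n_union = n_union))
print(has_dups)
print(same)
```

On the real data, comparing length(s1) with length(unique(s1)), or checking anyDuplicated(s1), should show whether duplicates in s1 account for the smaller-than-expected union.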

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
