On Thursday, 15 June 2017 at 11:48:54 UTC, Ivan Kazmenko wrote:
> On Thursday, 15 June 2017 at 06:06:01 UTC, MGW wrote:
>> There are two arrays of strings, string[] mas1, mas2, each with
>> about 5M lines. They differ in size, but about 95% of the lines
>> match between them. I need to find all lines in mas2 that do
>> not occur in mas1. The main criterion is speed. I have an
>> 8-core processor, so it would be good to use multithreading.
>
> The approaches which come to mind are:
> [...]
> taking constant time.
> Ivan Kazmenko.
As a follow-up to this: if your alphabet is reasonably small,
perhaps you could run a radix sort on the first few characters to
split your arrays into smaller subsets, and then apply one of
Ivan's suggestions within each subset. A line of mas2 can only
match lines of mas1 in the subset with the same prefix, so the
subset searches are independent and could easily be run in
parallel.
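A minimal D sketch of the idea, under a few assumptions: it buckets on only the first byte of each line (a single radix pass: 256 buckets plus one for empty lines), and inside each bucket it uses a built-in associative array as a hash set. That per-bucket method stands in for "one of Ivan's suggestions", whose list is clipped in the quote, so it is an assumption. The function name diffLines is hypothetical.

```d
import std.algorithm.iteration : joiner;
import std.array : array;
import std.parallelism : parallel;
import std.range : iota;
import std.stdio : writeln;

// Hypothetical helper: returns the lines of mas2 that do not occur in mas1.
// Step 1 partitions both arrays by first byte (one radix pass).
// Step 2 processes each bucket independently and in parallel, since a
// line can only match lines that share its first byte.
string[] diffLines(string[] mas1, string[] mas2)
{
    enum nBuckets = 257; // 256 possible first bytes + 1 bucket for ""

    // Bucket index of a line: its first byte, or 256 for an empty line.
    static size_t bucketOf(string s)
    {
        if (s.length == 0)
            return 256;
        return s[0];
    }

    // Only slices are copied into the buckets, not the string data.
    auto b1 = new string[][](nBuckets);
    auto b2 = new string[][](nBuckets);
    foreach (s; mas1) b1[bucketOf(s)] ~= s;
    foreach (s; mas2) b2[bucketOf(s)] ~= s;

    // Each parallel iteration writes only to its own results[i],
    // so no locking is needed.
    auto results = new string[][](nBuckets);
    foreach (i; parallel(iota(nBuckets)))
    {
        bool[string] seen; // hash set of this bucket's mas1 lines
        foreach (s; b1[i]) seen[s] = true;
        foreach (s; b2[i])
            if (s !in seen)
                results[i] ~= s;
    }

    // Concatenate the per-bucket results in bucket order.
    return results.joiner.array;
}

void main()
{
    auto mas1 = ["apple", "banana", "cherry", ""];
    auto mas2 = ["banana", "blueberry", "apple", "avocado", "", "zucchini"];
    writeln(diffLines(mas1, mas2));
}
```

Because results are gathered per bucket, the output is grouped by first byte rather than kept in mas2's original order. With a skewed alphabet (e.g. most lines starting with the same character) the buckets will be unbalanced. Bucketing on the first two bytes spreads them further, at the cost of 65536 buckets.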