On Thursday, 15 June 2017 at 11:48:54 UTC, Ivan Kazmenko wrote:
On Thursday, 15 June 2017 at 06:06:01 UTC, MGW wrote:
There are two arrays of string [] mas1, mas2; Size of each about 5M lines. By the size they different, but lines in both match for 95%. It is necessary to find all lines in an array of mas2 which differ from mas1. The principal criterion - speed. There are the 8th core processor and it is good to involve a multithreading.

The approaches which come to mind are:

clip
taking constant time.

Ivan Kazmenko.

As a follow up to this, if your alphabet is reasonably small perhaps could run radix sort based on the first few characters to split your arrays up into smaller subsets, and then use one of Ivan's suggestions within each subset. Subset searches could be easily run in parallel.

Reply via email to