My gut tells me that if you process the lines in chunks, simple for loops will be faster. If you have multiple cores, set up one process per core to parse the files. You could chunk by file or by line number, but each chunk should be big (100K+ lines) so there is little overhead from switching between processes. Joining the resulting lists into one big one is very fast.

Warning: multi-process work can be *very hard* to debug :-)

Greg Harris
Harris Consulting Group Pty Ltd
www.HarrisConsultingGroup.com <http://harrisconsultinggroup.com/home.htm>
Sydney, Australia
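Greg's suggestion can be sketched in Python: one worker process per core, each running a plain loop over a large chunk of lines, with the results joined at the end. This is an illustration under assumptions, not code from the thread, which is about .NET. `parse_line` is a hypothetical stand-in that assumes one integer per line. `chunk_size` defaults to the 100K lines Greg mentions, and `os.cpu_count()` plays the role of `Environment.ProcessorCount`.

```python
import os
from concurrent.futures import ProcessPoolExecutor
from itertools import chain
from typing import List, Optional


def parse_line(line: str) -> int:
    # Hypothetical per-line parser (assumption): each line holds one integer.
    return int(line)


def parse_chunk(lines: List[str]) -> List[int]:
    # A simple for loop over one large chunk; runs inside a worker process.
    parsed = []
    for line in lines:
        parsed.append(parse_line(line))
    return parsed


def make_chunks(lines: List[str], chunk_size: int) -> List[List[str]]:
    # Split by line number into contiguous slices of at most chunk_size lines.
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def parse_parallel(lines: List[str], chunk_size: int = 100_000,
                   workers: Optional[int] = None) -> List[int]:
    # Default: one worker process per logical core.
    workers = workers or os.cpu_count() or 1
    chunks = make_chunks(lines, chunk_size)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() yields results in chunk order, so joining preserves line order.
        return list(chain.from_iterable(pool.map(parse_chunk, chunks)))
```

Call `parse_parallel` from under an `if __name__ == "__main__":` guard. Platforms that start workers with `spawn` (the default on Windows and macOS) re-import the main module in each child. Large chunks matter for the reason Greg gives: each chunk is pickled and shipped to a worker as a single task, so fewer, bigger chunks keep that per-task overhead small.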
On Thu, Dec 12, 2019 at 9:23 AM mike smith wrote:

> On Thu, Dec 12, 2019, 08:42 Greg Keogh wrote:
>
>>> What if you do parallel, but on batches of
>>> <LineCount>/Environment.ProcessorCount lines per thread?
>>
>> I used a simple Parallel.ForEach and I think the default partitioning
>> algorithm does what you describe (I hope!) -- *Greg*
>
> Your description of "string parse over in a blink, threading overhead
> more" suggests that it isn't?
>
> Set up 20 (??) threads permanently and feed them with lines from file(s) on
> demand?
>
> Mike
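Mike's alternative is a fixed set of long-lived threads that are fed work on demand. It might look roughly like the Python sketch below, which rests on a few assumptions. `parse_with_worker_pool`, `iter_batches` and the 10,000-line `batch_size` default are hypothetical. The sketch hands out batches rather than single lines, because pushing every line through a shared queue would bring back the per-item overhead Greg Keogh describes; `batch_size=1` gives Mike's literal line-at-a-time version. One Python-specific caveat applies: in CPython the GIL stops pure-Python parsing on threads from running in parallel, so this shows the structure rather than a speed-up. For CPU-bound work in Python a process pool is the usual workaround, whereas in .NET, threads do run in parallel.

```python
import os
import queue
import threading
from typing import Callable, Iterable, Iterator, List, Optional, Tuple

_STOP = object()  # sentinel that tells a worker thread to exit


def iter_batches(lines: Iterable[str], batch_size: int) -> Iterator[Tuple[int, List[str]]]:
    # Yield (batch_index, lines) pairs; the index lets us restore order later.
    batch: List[str] = []
    index = 0
    for line in lines:
        batch.append(line)
        if len(batch) == batch_size:
            yield index, batch
            index += 1
            batch = []
    if batch:
        yield index, batch


def parse_with_worker_pool(lines: Iterable[str],
                           parse_line: Callable[[str], object],
                           batch_size: int = 10_000,
                           workers: Optional[int] = None) -> list:
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    workers = workers or os.cpu_count() or 1
    # Bounded queue: the producer blocks when the workers fall behind.
    work: queue.Queue = queue.Queue(maxsize=workers * 2)
    results: dict = {}
    errors: list = []
    lock = threading.Lock()

    def worker() -> None:
        # Long-lived loop: take the next batch whenever this thread is free.
        while True:
            item = work.get()
            if item is _STOP:
                return
            index, batch = item
            try:
                parsed = [parse_line(line) for line in batch]
            except Exception as exc:
                # Record the error but keep draining so the producer never blocks.
                with lock:
                    errors.append(exc)
                continue
            with lock:
                results[index] = parsed

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(workers)]
    for t in threads:
        t.start()
    for item in iter_batches(lines, batch_size):
        work.put(item)   # feed batches on demand
    for _ in threads:
        work.put(_STOP)  # one sentinel per worker
    for t in threads:
        t.join()

    if errors:
        raise errors[0]
    # Reassemble the batches in their original order.
    out: list = []
    for index in sorted(results):
        out.extend(results[index])
    return out
```

An idle worker pulls the next batch as soon as it finishes, so one slow batch does not hold up the others. That load balancing is what a fixed `<LineCount>/Environment.ProcessorCount` slice per thread gives up.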
