On 27/10/2020 15:36, vignesh C wrote:
> Attached v9 patches have the fixes for the above comments.

I find this design to be very complicated. Why does the line-boundary information need to be in shared memory? I think this would be much simpler if each worker grabbed a fixed-size block of raw data, and processed that.

In your patch, the leader process scans the input to find out where one line ends and another begins, and because of that decision, the leader needs to make the line boundaries available in shared memory, for the worker processes. If we moved that responsibility to the worker processes, you wouldn't need to keep the line boundaries in shared memory. A worker would only need to pass enough state to the next worker to tell it where to start scanning the next block.
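To make the hand-off concrete, here is a rough sketch of what "enough state" could look like. It is not code from the patch: ScanState and scan_block() are hypothetical names, and it assumes a simplified CSV dialect where '"' is both the quote and the escape character and lines end in '\n'. Each worker scans its own block, and the only thing carried over to the next block is whether the block ended inside a quoted field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical carry-over state handed from one block's scan to the next. */
typedef struct ScanState
{
    bool        in_quote;   /* did the block end inside a quoted field? */
} ScanState;

/*
 * Scan one raw block, starting from the state left by the previous block.
 * Returns the number of lines that end in this block and updates *state
 * for the next block.  *tail_start is set to the offset just past the last
 * EOL found, i.e. where the partial line continuing into the next block
 * begins (0 if the block contains no EOL at all).
 */
static size_t
scan_block(const char *buf, size_t len, ScanState *state, size_t *tail_start)
{
    size_t      nlines = 0;

    assert(state != NULL && tail_start != NULL);
    *tail_start = 0;

    for (size_t i = 0; i < len; i++)
    {
        char        c = buf[i];

        if (c == '"')
            state->in_quote = !state->in_quote; /* "" toggles twice */
        else if (c == '\n' && !state->in_quote)
        {
            nlines++;
            *tail_start = i + 1;
        }
    }
    return nlines;
}
```

With something like this, nothing per-line has to live in shared memory; a worker only publishes its final ScanState (plus the tail offset) for whoever takes the next block. A real version would also have to deal with a separate escape character, '\r\n' line endings and the text-format backslash rules.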

Whether the leader process or the worker processes find the EOLs, it's pretty clear that it needs to be done ASAP, one chunk at a time, because that step cannot be done in parallel. I think some refactoring in CopyReadLine() and friends would be in order. It probably would be faster, or at least not slower, to find all the EOLs in a block in one tight loop, even when parallel copy is not used.
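As an illustration of the "one tight loop" idea, something along these lines is what I have in mind. This is only a sketch, not the existing CopyReadLine() logic: find_eols() is a made-up name, and it assumes plain '\n' line endings with no quoting or escaping to worry about.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Record the offsets of all '\n' bytes in buf[0..len) into eols[], up to
 * max_eols entries.  Returns the number of offsets stored.  memchr() does
 * the byte search, so the loop body runs once per line, not once per byte.
 */
static size_t
find_eols(const char *buf, size_t len, size_t *eols, size_t max_eols)
{
    size_t      n = 0;
    const char *p = buf;
    const char *end = buf + len;

    assert(eols != NULL || max_eols == 0);

    while (n < max_eols && p < end)
    {
        const char *nl = memchr(p, '\n', (size_t) (end - p));

        if (nl == NULL)
            break;              /* rest of the block is a partial line */
        eols[n++] = (size_t) (nl - buf);
        p = nl + 1;
    }
    return n;
}
```

The caller would then split the block into lines from the offsets array, instead of interleaving EOL detection with per-line processing.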

- Heikki

