On 30/10/2020 22:56, Tomas Vondra wrote:
> I agree this design looks simpler. I'm a bit worried about serializing
> the parsing like this, though. It's true the current approach (where the
> first phase of parsing happens in the leader) has a similar issue, but I
> think that would be easier to improve in that design.
>
> My plan was to parallelize the parsing roughly like this:
>
> 1) split the input buffer into smaller chunks
>
> 2) let workers scan the buffers and record positions of interesting
> characters (delimiters, quotes, ...) and pass them back to the leader
>
> 3) use the information to actually parse the input data (we only need to
> look at the interesting characters, skipping large parts of data)
>
> 4) pass the parsed chunks to workers, just like in the current patch
>
> But maybe something like that would be possible even with the approach
> you propose - we could have a special parse phase for processing each
> buffer, where any worker could look for the special characters, record
> the positions in a bitmap next to the buffer. So the whole sequence of
> states would look something like this:
>
> EMPTY
> FILLED
> PARSED
> READY
> PROCESSING
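For illustration, that sequence of chunk states could be sketched as a C enum with a helper that advances a chunk one step at a time. All names here are hypothetical, not taken from the actual patch; a final DONE state is included because the discussion also speaks of chunks going from FILLED to DONE:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names, not from the actual patch. */
typedef enum ChunkState
{
	CHUNK_EMPTY,		/* buffer slot is free */
	CHUNK_FILLED,		/* raw input has been copied into the buffer */
	CHUNK_PARSED,		/* positions of special characters recorded */
	CHUNK_READY,		/* line boundaries known, safe to tokenize */
	CHUNK_PROCESSING,	/* a worker is parsing/inserting the rows */
	CHUNK_DONE			/* chunk fully consumed, slot can be reused */
} ChunkState;

typedef struct ChunkSlot
{
	ChunkState	state;
	uint64_t   *special_bitmap; /* one bit per input byte */
	size_t		len;
} ChunkSlot;

/* Move a slot to the next state in the sequence. */
static ChunkState
chunk_advance(ChunkSlot *slot)
{
	assert(slot->state < CHUNK_DONE);
	slot->state = (ChunkState) (slot->state + 1);
	return slot->state;
}
```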
I think it's even simpler than that. You don't need to communicate the
"interesting positions" between processes, if the same worker takes care
of the chunk through all states from FILLED to DONE.
You can build the bitmap of interesting positions immediately in FILLED
state, independently of all previous blocks. Once you've built the
bitmap, you need to wait for the information on where the first line
starts, but presumably finding the interesting positions is the
expensive part.
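As a minimal sketch of that idea (a hypothetical helper, not the patch's actual code): building the bitmap is a single pass over the chunk that needs no information from earlier chunks, so any worker can run it as soon as the buffer is FILLED. Here the "interesting" characters are assumed to be the CSV delimiter, quote, escape and newline:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Record the positions of interesting characters in a chunk,
 * one bit per input byte.  Hypothetical sketch, not from the patch.
 */
static void
build_special_bitmap(const char *buf, size_t len, uint64_t *bitmap)
{
	memset(bitmap, 0, ((len + 63) / 64) * sizeof(uint64_t));

	for (size_t i = 0; i < len; i++)
	{
		char		c = buf[i];

		if (c == ',' || c == '"' || c == '\\' || c == '\n')
			bitmap[i / 64] |= UINT64_C(1) << (i % 64);
	}
}
```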
> Of course, the question is whether parsing really is sufficiently
> expensive for this to be worth it.
Yeah, I don't think it's worth it. Splitting the lines is pretty fast;
I think we have many years to come before that becomes a bottleneck.
But if it turns out I'm wrong and we need to implement that, the path
is pretty straightforward.
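If it ever did become a bottleneck, the "look only at the interesting characters" part could be as simple as jumping from set bit to set bit in the chunk's bitmap instead of re-reading every input byte. A hedged sketch, assuming GCC/Clang's __builtin_ctzll and a bitmap with no bits set beyond nbits (names hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the offset of the next interesting position at or after
 * 'start', or -1 if there is none.  Skips whole 64-bit words of
 * uninteresting input at a time.  Hypothetical sketch, not from
 * the patch; __builtin_ctzll is a GCC/Clang builtin.
 */
static long
next_set_bit(const uint64_t *bitmap, size_t nbits, size_t start)
{
	for (size_t i = start; i < nbits;)
	{
		uint64_t	word = bitmap[i / 64] >> (i % 64);

		if (word != 0)
			return (long) (i + __builtin_ctzll(word));

		/* nothing interesting left in this word, skip to the next */
		i = (i / 64 + 1) * 64;
	}
	return -1;
}
```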
- Heikki