Specific answers based on my understanding.

> I did not mean that a pass-through operator should not take the
> ownership of a batch it processes. My question was whether they do so
> and if they do, when and how.

Yes, operators do take ownership, somewhere in the process of calling next() on 
their inputs. The exact place may vary between operators. In the Sort, for 
example, the code first checks the incoming batch size, spills sorted batches 
if needed to make space, then takes ownership. I'd go so far as to say that, if 
an operator does not take ownership, then it is a bug.
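
As a rough illustration of that ordering (check the size, spill if needed, then take ownership), here is a toy sketch. It is not the actual Sort code: the class, its fields, and the byte-count bookkeeping are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model; none of these names are Drill classes.
class SortOwnershipSketch {
    final long memoryLimit;
    long bytesOwned = 0;
    int spillCount = 0;
    // Sizes of the batches this operator currently owns in memory.
    final List<Long> inMemoryBatches = new ArrayList<>();

    SortOwnershipSketch(long memoryLimit) {
        this.memoryLimit = memoryLimit;
    }

    void acceptIncoming(long incomingBytes) {
        // Steps 1 and 2: check the incoming size and spill until it fits
        // (or until there is nothing left to spill).
        while (bytesOwned + incomingBytes > memoryLimit && !inMemoryBatches.isEmpty()) {
            spillOldest();
        }
        // Step 3: only now take ownership of the incoming batch.
        inMemoryBatches.add(incomingBytes);
        bytesOwned += incomingBytes;
    }

    private void spillOldest() {
        long freed = inMemoryBatches.remove(0);
        bytesOwned -= freed;
        spillCount++;
    }
}
```

The point of the ordering is that the operator's own accounting never has to absorb a batch it has no room for.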

> As far as I can see in the
> ProjectorTemplate code, the transfer is not done in all cases and when
> Projector operates in sv2 mode, there is no transfer of the ownership.

Template code is code that is copied for each generated operator. In general, 
this code should be minimal. Code that is common to all operator instances 
should not reside in the template. Instead, it should reside in the operator 
(the so-called RecordBatch). There is really no reason to copy the same byte 
codes over and over, taking up space in the code cache.

That said, the code to take ownership is likely to be in the Project operator 
implementation. Look for a place that works with "transfer pairs"; they are the 
actual transfer mechanism. A quick glance at the code suggests this is done in 
ProjectRecordBatch.setupNewSchemaFromInput(). (An unfortunate name if we also 
do transfers.)

> Additionally, when there is a transfer, it is done when the processing
> of the batch is almost complete.

Depends on what you mean by "almost complete." Since Project is 
single-threaded, there is no harm in doing the transfer later rather than 
sooner; the upstream operator won't be called until Project again calls next(). 
Makes sense to do it earlier, but not necessary.

> IMO, such behavior is counter intuitive
> and I would expect that if there is a transfer of the ownership, it is
> part of RecordBatch.next(), meaning that once an operator gets a
> reference to a record batch, it owns it.

Perhaps. But, the Operator (that is, RecordBatch) protocol is a bit fussy. The 
next() call to RecordBatch tells that RecordBatch to build a batch of data and 
make it available. An operator has no visibility into its parent (its downstream 
operator). The caller must do the transfer as only the caller has visibility to 
its own vector container and that of the upstream (incoming) record batch. Yes, 
this is quite confusing. Nothing beats stepping through several operators to see 
how this works in practice.
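
To make the "caller does the transfer" point concrete, here is a toy sketch of the pull protocol. All of the classes are hypothetical stand-ins (an int array plays the role of the underlying buffer); the real RecordBatch and transfer-pair code is considerably more involved.

```java
// Hypothetical toy model of the pull protocol; these are not Drill's classes.
class ToyVector {
    int[] buffer; // stands in for the buffer underlying a vector

    ToyVector(int[] buffer) {
        this.buffer = buffer;
    }

    // Move the buffer to 'target'; this vector no longer references it.
    void transferTo(ToyVector target) {
        target.buffer = this.buffer;
        this.buffer = null;
    }
}

class ToyScan {
    final ToyVector outgoing = new ToyVector(null);
    private int nextValue = 0;

    // next() only builds a batch in this operator's own container.
    // It knows nothing about whoever called it.
    boolean next() {
        if (nextValue >= 6) {
            return false;
        }
        outgoing.buffer = new int[] {nextValue, nextValue + 1, nextValue + 2};
        nextValue += 3;
        return true;
    }
}

class ToyPassThrough {
    final ToyScan upstream;
    final ToyVector outgoing = new ToyVector(null);

    ToyPassThrough(ToyScan upstream) {
        this.upstream = upstream;
    }

    boolean next() {
        if (!upstream.next()) {
            return false;
        }
        // Only the caller sees both containers, so it performs the transfer.
        upstream.outgoing.transferTo(outgoing);
        return true;
    }
}
```

After each successful next() on the pass-through, the upstream vector no longer holds the buffer; the downstream one does.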

Here, I will put in a plug for the revised Operator classes in the "batch 
handling" code. The new classes try to disentangle the three bits of 
functionality combined in RecordBatch: 1) iterator protocol, 
2) batch management, and 3) operator implementation. I believe we'll all 
understand this code better if we can separate these three concerns.

> At this point, an operator may
> consume content of the record batch and create a completely new record
> batch or it can modify the record batch and pass it to the next
> downstream operator.

Just to be clear, record batches (specifically vectors) are immutable. It is 
not possible to modify a record batch. One can, however, reuse parts of it. A 
Filter can slap on an SV2. A Project can discard some vectors, add others, and 
retain still others. But, in both cases, the operator must produce a new batch 
based on those vectors. Specifically, each operator has its own VectorContainer 
that contains its own vectors. Sharing occurs at the level of DrillBufs that 
underlie the vectors. (Again, quite confusing, but it makes sense once you 
understand the operator allocators we discussed previously.)
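
Here is a toy sketch of the Filter case: the data is left untouched and a selection vector of row indexes is layered on top. The class and method names are made up, and a plain int array stands in for a vector; the char array is only meant to suggest small (two-byte) indexes, not to mirror the real SV2 class.

```java
import java.util.Arrays;

// Hypothetical toy model of a filter adding a selection vector.
class Sv2Sketch {
    // Returns the indexes of rows that pass the filter; 'values' is never written.
    static char[] select(int[] values, int threshold) {
        char[] sv2 = new char[values.length];
        int count = 0;
        for (int i = 0; i < values.length; i++) {
            if (values[i] > threshold) {
                sv2[count++] = (char) i;
            }
        }
        return Arrays.copyOf(sv2, count);
    }

    // Readers downstream go through the selection vector to reach the data.
    static int get(int[] values, char[] sv2, int row) {
        return values[sv2[row]];
    }
}
```

The "new batch" here is the pair (same data, new selection vector); nothing in the original values changes.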

Part of the complexity comes from proper memory management. New vectors are 
allocated in the Project operator's allocator. Retained vectors are transferred 
from the upstream operator's allocator (ledger) to that of the Project 
operator. Discarded vectors are released (perhaps after being shifted into the 
Project operator's allocator).
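
The accounting for those three cases can be sketched with a toy allocator. Everything below is hypothetical (a single counter per allocator, no reference counting, no ledger objects); it only shows which allocator is charged for which buffer.

```java
// Hypothetical toy model of per-operator memory accounting.
class ToyAllocator {
    final String name;
    long allocated = 0;

    ToyAllocator(String name) {
        this.name = name;
    }
}

class ToyBuf {
    final long size;
    ToyAllocator owner; // null once released

    // A new buffer is charged to the allocator that creates it.
    ToyBuf(ToyAllocator owner, long size) {
        this.owner = owner;
        this.size = size;
        owner.allocated += size;
    }

    // Retained buffer: the charge moves from the old owner to the new one.
    void transferTo(ToyAllocator newOwner) {
        owner.allocated -= size;
        newOwner.allocated += size;
        owner = newOwner;
    }

    // Discarded buffer: whoever owns it at that moment gets the memory back.
    void release() {
        owner.allocated -= size;
        owner = null;
    }
}
```

For example, if the upstream operator produced a 100-byte vector that Project retains and a 50-byte vector that Project discards, and Project adds a 30-byte vector of its own, the upstream allocator ends at 0 and the Project allocator at 130.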

OK, again enough for one note. More to come.

Thanks,

- Paul
  
