Hi!

The join implementations used for the DataSet API and for the Blink Planner
are quite intricate. They make use of these custom memory segments, to
operate as much as possible on bytes, to control JVM memory utilization and
to save serialization costs.
That makes the implementation super complicated.

Do your experiments need that kind of behavior?
If not, it might actually be simpler to just start with some very simple
object-based hash table.

If you want to actually work with these classes, I would suggest to first
change the MutableHashTable a bit. It is probably easier to remove the
logic to pull from iterators into build and probe side, but rather do this
outside the class.
(I am assuming you plan to work with a fork for research here).

Best,
Stephan




On Thu, Jul 25, 2019 at 11:58 AM Benjamin Burkhardt <
pri...@benjaminburkhardt.de> wrote:

> Hi,
>
> while doing a join, a MutableHashTable is created and filled while
> building. After building it is closed and the probing can begin.
> I would like to start probing while building the hash table still runs.
> (ignoring the fact that this would lead to join misses...)
>
> Anyone having an idea how one could do that?
>
> Am 24. Juli 2019, 04:14 +0200 schrieb Caizhi Weng <tsreape...@gmail.com>:
> > Hi Benjamin,
> >
> > As you mentioned hash join I assume that you are referring to
> > `HashJoinOperator` in blink planner.
> >
> > The input is selected by `nextSelection` method. As you can see, it will
> > first read all records in the build side then read all records in the
> probe
> > side. So the probing will only start after the build side has been fully
> > received.
> >
> > Apart from this specific question, I'm interested in how you're going to
> > implement a hash join where any of the two sides can be read. Could you
> > share your ideas or give some hints about this? Thanks a lot.
> >
> > Benjamin Burkhardt <pri...@benjaminburkhardt.de> 于2019年7月24日周三 上午1:24写道:
> >
> > > Hi all,
> > >
> > > Let’s imagine a simple repartition hash Join oft two tables.
> > >
> > > As soon as the first table is hashed completely (all EndOfPartition
> Events
> > > sent) the shipping and probing of the second table starts.
> > >
> > > What I can’t find:
> > >
> > > 1. What triggers to start the probing exactly?
> > > 2. Where can I find it in the code?
> > >
> > >
> > > My final goal is to change the 2-phase join mechanism to a mixed
> > > implementation where probing for finished subpartitions begins earlier.
> > >
> > > I appreciate any help.
> > >
> > > Thanks.
> > >
> > > Benjamin
> > >
>

Reply via email to