Hey,

I agree with Martin on this. It's the optimizer's job to decide the join
strategy.

Maybe the join hint worked in 99% of your cases, but we can't simply
generalize this to all datasets and algorithms and hard-code a join hint
that assumes the vertex set is always much smaller than the edge set.
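To illustrate the alternative being discussed (caller-supplied hints instead of a hard-coded one), here is a minimal, hedged sketch. It does not use Flink itself; `JoinHint` below is a local stand-in for Flink's `JoinOperatorBase.JoinHint`, and `joinWithVertices` is a hypothetical Gelly-style method, overloaded as Martin suggests so the default defers to the optimizer:

```java
// Illustrative sketch only. JoinHint here is a local stand-in for
// org.apache.flink.api.common.operators.base.JoinOperatorBase.JoinHint,
// and joinWithVertices is a hypothetical Gelly-style method.
public class JoinHintExample {

    enum JoinHint { OPTIMIZER_CHOOSES, BROADCAST_HASH_FIRST, BROADCAST_HASH_SECOND }

    // Default overload: no hard-coded strategy, defer to the optimizer.
    static JoinHint joinWithVertices() {
        return joinWithVertices(JoinHint.OPTIMIZER_CHOOSES);
    }

    // Overloaded variant: the caller, who knows the data, picks the strategy.
    static JoinHint joinWithVertices(JoinHint hint) {
        return hint;
    }

    public static void main(String[] args) {
        System.out.println(joinWithVertices());                               // OPTIMIZER_CHOOSES
        System.out.println(joinWithVertices(JoinHint.BROADCAST_HASH_SECOND)); // BROADCAST_HASH_SECOND
    }
}
```

The point of the two overloads is exactly the trade-off in this thread: the library does not assume the vertex set is the smaller side, but a user who knows it is can still ask for a broadcast join.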

Cheers,
Vasia.

On 22 August 2015 at 11:28, Martin Junghanns <m.jungha...@mailbox.org>
wrote:

> Hi,
>
> I guess enforcing a join strategy by default is not the best option, since
> you can't assume what the user did before actually calling the Gelly
> functions or what the data looks like (maybe it's one of the 1% of graphs
> where the relation is the other way around, or the vertex data set is very
> large); maybe the data sets are already sorted / partitioned. Another
> solution could be overloading the Gelly functions that use joins and
> letting the users decide whether to give hints or not.
>
> As an example, I am currently benchmarking graphs with up to 700M vertices
> and 3B edges on a YARN cluster and at one point in the job I need to join
> vertices and edges. I also tried to give the broadcast-hash-second
> (vertices) hint and the job performed significantly slower than letting the
> system decide.
>
> Best,
> Martin
>
>
> On 22.08.2015 09:51, Andra Lungu wrote:
>
>> Hey everyone,
>>
>> When coding for my thesis, I observed that half of the current Gelly
>> functions (the ones that use join operators) fail on a cluster environment
>> with the following exception:
>>
>> java.lang.IllegalArgumentException: Too few memory segments provided.
>> Hash Join needs at least 33 memory segments.
>>
>> This is because, in 99% of the cases, the vertex data set is significantly
>> smaller than the edge data set. What I did to get rid of the error was the
>> following:
>>
>> DataSet<Tuple2<Edge<K, EV>, Vertex<K, VV>>> edgesWithSources = edges
>>         .join(this.vertices, JoinOperatorBase.JoinHint.BROADCAST_HASH_SECOND)
>>         .where(0).equalTo(0)
>>
>> In short, I added join hints. I believe these should also be added to
>> Gelly itself, in case someone bumps into the same problem in the future.
>>
>> What do you think?
>>
>>
>
