Your arguments are perfectly valid. So, what I suggest is to keep the
functions as they are now, e.g. groupReduceOnNeighbors,
and to add an overloaded groupReduceOnNeighbors(blablaSameArguments, boolean
useJoinHints). That way, the user can decide whether they'd like to trade
speed for a program that actually finishes :).
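
To make the proposal concrete, here is a rough sketch of how the overloaded
pair could look (the delegation, parameter names, and exact generics are my
guess at a possible shape, not the actual Gelly signatures):

    // existing signature, unchanged: the optimizer keeps picking the join strategy
    public <T> DataSet<T> groupReduceOnNeighbors(
            NeighborsFunctionWithVertexValue<K, VV, EV, T> function,
            EdgeDirection direction) {
        return groupReduceOnNeighbors(function, direction, false);
    }

    // proposed overload: useJoinHints opts into broadcasting the (small) vertex set
    public <T> DataSet<T> groupReduceOnNeighbors(
            NeighborsFunctionWithVertexValue<K, VV, EV, T> function,
            EdgeDirection direction,
            boolean useJoinHints) {
        JoinOperatorBase.JoinHint hint = useJoinHints
                ? JoinOperatorBase.JoinHint.BROADCAST_HASH_SECOND
                : JoinOperatorBase.JoinHint.OPTIMIZER_CHOOSES;
        // ... run the internal vertex/edge joins with 'hint', as today ...
    }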

On Sat, Aug 22, 2015 at 11:28 AM, Martin Junghanns <m.jungha...@mailbox.org>
wrote:

> Hi,
>
> I guess enforcing a join strategy by default is not the best option, since
> you can't assume what the user did before actually calling the Gelly
> functions or what the data looks like (maybe it's one of the 1% of graphs
> where the relation is the other way around, or the vertex data set is very
> large); maybe the data sets are already sorted / partitioned. Another
> solution could be overloading the Gelly functions that use joins and
> letting users decide whether to give hints or not?
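>
> As a sketch, these are the strategies a user could pass explicitly (the
> JoinHint constants from Flink's JoinOperatorBase):
>
>     JoinHint.OPTIMIZER_CHOOSES        // let the optimizer decide (the default)
>     JoinHint.BROADCAST_HASH_FIRST     // broadcast the first input, hash it
>     JoinHint.BROADCAST_HASH_SECOND    // broadcast the second input, hash it
>     JoinHint.REPARTITION_HASH_FIRST   // repartition both, build the table on the first
>     JoinHint.REPARTITION_HASH_SECOND  // repartition both, build the table on the second
>     JoinHint.REPARTITION_SORT_MERGE   // repartition both and sort-merge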
>
> As an example, I am currently benchmarking graphs with up to 700M vertices
> and 3B edges on a YARN cluster, and at one point in the job I need to join
> vertices and edges. I also tried giving the broadcast-hash-second
> (vertices) hint, and the job performed significantly slower than when the
> system decided.
>
> Best,
> Martin
>
>
> On 22.08.2015 09:51, Andra Lungu wrote:
>
>> Hey everyone,
>>
>> When coding for my thesis, I observed that half of the current Gelly
>> functions (the ones that use join operators) fail in a cluster environment
>> with the following exception:
>>
>> java.lang.IllegalArgumentException: Too few memory segments provided.
>> Hash Join needs at least 33 memory segments.
>>
>> This is because, in 99% of cases, the vertex data set is significantly
>> smaller than the edge data set. What I did to get rid of the error was the
>> following:
>>
>> DataSet<Tuple2<Edge<K, EV>, Vertex<K, VV>>> edgesWithSources = edges
>>         .join(this.vertices, JoinOperatorBase.JoinHint.BROADCAST_HASH_SECOND)
>>         .where(0).equalTo(0);
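>>
>> (Presumably, if the join were written the other way around, i.e.
>> vertices.join(edges), the matching hint would be BROADCAST_HASH_FIRST;
>> leaving the hint out falls back to JoinHint.OPTIMIZER_CHOOSES.)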
>>
>> In short, I added join hints. I believe this should also be in Gelly, in
>> case someone bumps into the same problem in the future.
>>
>> What do you think?
>>
>>
>
