Re: HashedRelation Memory Pressure on Broadcast Joins

2016-03-07 Thread Davies Liu
The underlying buffer for UnsafeRow is reused in UnsafeProjection.

On Thu, Mar 3, 2016 at 9:11 PM, Rishi Mishra wrote:
> Hi Davies,
> When you say "UnsafeRow could come from UnsafeProjection, so We should copy
> the rows for safety." do you intend to say that the
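Why buffer reuse forces a copy can be shown with a small Python sketch. `Row` and `ReusingProjection` are hypothetical stand-ins, not Spark classes. The assumption is only the behaviour described in this message: every output row shares one buffer that is overwritten on each call. A table that keeps the returned object without copying ends up with N references to whatever was written last.

```python
class Row:
    """Hypothetical stand-in for an UnsafeRow: a view over a byte buffer."""

    def __init__(self, buf: bytearray):
        self.buf = buf

    def get_int(self) -> int:
        return int.from_bytes(self.buf[:4], "little", signed=True)

    def copy(self) -> "Row":
        # Deep copy: the new row owns a fresh buffer.
        return Row(bytearray(self.buf))


class ReusingProjection:
    """Hypothetical projection that writes every output into the same
    buffer and returns the same Row object each time."""

    def __init__(self):
        self._row = Row(bytearray(4))

    def apply(self, value: int) -> Row:
        self._row.buf[:4] = value.to_bytes(4, "little", signed=True)
        return self._row


def collect(values, copy_rows: bool):
    proj = ReusingProjection()
    table = []
    for v in values:
        row = proj.apply(v)
        table.append(row.copy() if copy_rows else row)
    return [r.get_int() for r in table]


print(collect([1, 2, 3], copy_rows=False))  # [3, 3, 3]
print(collect([1, 2, 3], copy_rows=True))   # [1, 2, 3]
```

Without the copy, all three table entries alias the projection's single row, so only the last value survives.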

Re: HashedRelation Memory Pressure on Broadcast Joins

2016-03-03 Thread Rishi Mishra
Hi Davies, When you say "UnsafeRow could come from UnsafeProjection, so We should copy the rows for safety.", do you intend to say that the underlying state might change because of some state update APIs? Or is it due to some other rationale? Regards, Rishitesh Mishra, SnappyData

Re: HashedRelation Memory Pressure on Broadcast Joins

2016-03-02 Thread Davies Liu
I see. We could reduce the memory by moving the copy out of HashedRelation, but then we would need to do the copy before calling HashedRelation for shuffle hash join. Another thing is that when we broadcast, we will have another serialized copy of the hash table. For the table that's larger than 100M,
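Moving the copy out of the relation could look roughly like the following Python sketch. `Row`, `build_relation`, and `reusing_iterator` are hypothetical illustrations, not Spark APIs. The relation stores whatever rows it is given. The shuffle-hash-join caller, whose input reuses one buffer, copies each row first. A caller whose rows already own their memory skips the copy.

```python
from typing import Dict, Iterable, Iterator, List


class Row:
    """Hypothetical row: a join key plus a mutable byte buffer."""

    def __init__(self, key: int, buf: bytearray):
        self.key = key
        self.buf = buf

    def copy(self) -> "Row":
        return Row(self.key, bytearray(self.buf))


def build_relation(rows: Iterable[Row]) -> Dict[int, List[Row]]:
    # The relation itself no longer copies; callers must pass rows
    # that stay valid for the lifetime of the table.
    table: Dict[int, List[Row]] = {}
    for row in rows:
        table.setdefault(row.key, []).append(row)
    return table


def reusing_iterator(keys: Iterable[int]) -> Iterator[Row]:
    # Mimics a projection that overwrites one shared Row for every output.
    shared = Row(0, bytearray(1))
    for k in keys:
        shared.key = k
        shared.buf[0] = k
        yield shared


# Shuffle hash join path: the input reuses a buffer, so copy before building.
# The generator copies each row before the iterator overwrites it again.
shuffle_side = build_relation(r.copy() for r in reusing_iterator([1, 2, 2]))

# Path whose rows already own their memory: no extra copy is made.
owned_side = build_relation(Row(k, bytearray([k])) for k in [1, 2, 2])
```

The trade-off is that correctness now depends on every caller knowing whether its iterator reuses rows.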

Re: HashedRelation Memory Pressure on Broadcast Joins

2016-03-02 Thread Matt Cheah
I would expect the memory pressure to grow because not only are we storing the backing array to the iterator of the rows on the driver, but we’re also storing a copy of each of those rows in the hash table, whereas if we didn’t do the copy on the driver side, the hash table would only have to
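The accounting in this message can be made concrete with a rough Python sketch. The fixed 16-byte row size, the single `backing` buffer, and `build_table` are illustrative assumptions, not Spark's actual memory layout. With copying, each row's bytes exist twice: once in the backing array and once in the table. Without copying, the table holds zero-copy views into the backing array. Only row payload bytes are counted, not per-object overhead.

```python
from typing import Dict, List, Union

RowRef = Union[bytes, memoryview]


def build_table(backing: bytes, row_size: int, copy_rows: bool) -> Dict[int, List[RowRef]]:
    """Hypothetical hash table keyed by the first byte of each fixed-size row."""
    view = memoryview(backing)
    table: Dict[int, List[RowRef]] = {}
    for off in range(0, len(backing), row_size):
        row: RowRef = view[off:off + row_size]  # zero-copy view into backing
        if copy_rows:
            row = bytes(row)  # allocates row_size new bytes
        table.setdefault(row[0], []).append(row)
    return table


def row_bytes_held(backing: bytes, table: Dict[int, List[RowRef]]) -> int:
    """Payload bytes alive: the backing array plus any copied rows."""
    copied = sum(len(r) for rows in table.values() for r in rows if isinstance(r, bytes))
    return len(backing) + copied


backing = bytes(i % 256 for i in range(16 * 1000))  # 1000 rows of 16 bytes
print(row_bytes_held(backing, build_table(backing, 16, copy_rows=True)))   # 32000
print(row_bytes_held(backing, build_table(backing, 16, copy_rows=False)))  # 16000
```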

Re: HashedRelation Memory Pressure on Broadcast Joins

2016-03-02 Thread Davies Liu
UnsafeHashedRelation and HashedRelation could also be used in the executor (for non-broadcast hash joins); in that case the UnsafeRow could come from UnsafeProjection, so we should copy the rows for safety. We could have a smarter copy() for UnsafeRow (avoiding the copy if the row has already been copied), but I don't think
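The "smarter copy()" idea can be sketched in Python. The `owned` flag is a hypothetical field, not part of UnsafeRow. A row that is already a private copy returns itself from `copy()`, and only a row backed by a shared, reused buffer pays for an allocation. The preview is cut off before the reservation that follows ("but I don't think"). So this shows only the idea, not whether it was adopted. It is also sound only if nothing mutates an owned row afterwards.

```python
class Row:
    """Hypothetical row wrapping a byte buffer, with an ownership flag."""

    def __init__(self, buf: bytearray, owned: bool = False):
        self.buf = buf
        # Hypothetical flag: True when this row is a private copy that no
        # projection will overwrite, so it can be shared safely.
        self.owned = owned

    def copy(self) -> "Row":
        if self.owned:
            # Already a private copy: skip the allocation.
            return self
        return Row(bytearray(self.buf), owned=True)


shared = Row(bytearray(b"\x01\x02"))  # e.g. a projection's reused output row
first = shared.copy()   # allocates a new buffer
second = first.copy()   # returns `first` itself; no allocation
shared.buf[0] = 9       # the projection overwrites its buffer
print(first is second, first.buf[0])  # True 1
```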

HashedRelation Memory Pressure on Broadcast Joins

2016-03-01 Thread Matt Cheah
Hi everyone, I had a quick question regarding our implementation of UnsafeHashedRelation and HashedRelation. It appears that we copy the rows that we’ve collected into