Re: Iterator of KeyValueGroupedDataset.flatMapGroupsWithState function

2018-10-31 Thread Tathagata Das
It is okay to collect the iterator. That will not break Spark. However,
collecting it requires memory in the executor, so you may cause OOMs if a
group has a LOT of new data.
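
A minimal sketch of what collecting the iterator can look like, assuming a hypothetical Event type and a median as the "needs the whole group" logic (neither appears in this thread). The helper is plain Scala so it runs without Spark. The commented part shows how it might be called from a function with the (K, Iterator[V], GroupState[S]) => Iterator[U] shape, and is untested here.

```scala
object GroupMedian {
  // Hypothetical value type; the thread does not show the real V.
  final case class Event(user: String, value: Long)

  // Logic that needs every value of the group at once, which is awkward
  // to express on a one-pass Iterator.
  def medianOfGroup(events: Iterator[Event]): Option[Long] = {
    // toVector drains the iterator and keeps all of this group's new
    // rows in executor memory, so memory use grows with the group size.
    val values = events.map(_.value).toVector.sorted
    // For an even count this picks the upper of the two middle values.
    if (values.isEmpty) None else Some(values(values.size / 2))
  }

  // Possible wiring inside a Spark job (assumption, needs Spark on the
  // classpath and org.apache.spark.sql.streaming.GroupState imported):
  //
  //   def update(user: String,
  //              events: Iterator[Event],
  //              state: GroupState[Long]): Iterator[(String, Long)] = {
  //     val batch = events.toList            // collect once, reuse freely
  //     state.update(state.getOption.getOrElse(0L) + batch.size)
  //     medianOfGroup(batch.iterator).map(m => (user, m)).iterator
  //   }
}
```

The iterator can only be traversed once, so collect it a single time and reuse the resulting collection. If the per-group logic can be written as a fold, consuming the iterator incrementally avoids holding the whole group in memory.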

On Wed, Oct 31, 2018 at 3:44 AM Antonio Murgia -
antonio.murg...@studio.unibo.it  wrote:

> Hi all,
>
> I'm currently developing a Spark Structured Streaming job and I'm
> performing flatMapGroupsWithState.
>
> I'm concerned about the laziness of the Iterator[V] that is passed to my
> custom function (func: (K, Iterator[V], GroupState[S]) => Iterator[U]).
>
> Is it ok to collect that iterator (with a toList, for example)? I have
> logic that is practically impossible to perform on an Iterator, but I do
> not want to break Spark's lazy chain, obviously.
>
>
> Thank you in advance.
>
>
> #A.M.
>


Iterator of KeyValueGroupedDataset.flatMapGroupsWithState function

2018-10-31 Thread Antonio Murgia - antonio.murg...@studio.unibo.it
Hi all,

I'm currently developing a Spark Structured Streaming job and I'm performing 
flatMapGroupsWithState.

I'm concerned about the laziness of the Iterator[V] that is passed to my custom 
function (func: (K, Iterator[V], GroupState[S]) => Iterator[U]).

Is it ok to collect that iterator (with a toList, for example)? I have logic
that is practically impossible to perform on an Iterator, but I do not want to
break Spark's lazy chain, obviously.


Thank you in advance.


#A.M.