Re: when to use broadcast variables
Broadcast variables need to fit entirely in memory - so that's a pretty good litmus test for whether or not to broadcast a smaller dataset or turn it into an RDD.

On Fri, May 2, 2014 at 7:50 AM, Prashant Sharma wrote:
> I had like to be corrected on this but I am just trying to say small enough
> of the order of few 100 MBs. Imagine the size gets shipped to all nodes, it
> can be a GB but not GBs and then depends on the network too.
>
> Prashant Sharma
>
>
> On Fri, May 2, 2014 at 6:42 PM, Diana Carroll wrote:
>>
>> Anyone have any guidance on using a broadcast variable to ship data to
>> workers vs. an RDD?
>>
>> Like, say I'm joining web logs in an RDD with user account data. I could
>> keep the account data in an RDD or if it's "small", a broadcast variable
>> instead. How small is small? Small enough that I know it can easily fit in
>> memory on a single node? Some other guideline?
>>
>> Thanks!
>>
>> Diana
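One way to apply that litmus test is to measure the serialized size of the candidate dataset on the driver before deciding. The sketch below is plain Python, not Spark code. The helper names, the 100 MB budget, and the sample account table are all made up for illustration, and pickled size is only a rough proxy for what the object occupies once it is deserialized on each node.

```python
import pickle

def serialized_size_bytes(obj):
    """Size of the pickled object in bytes, a rough proxy for what gets shipped."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))

def fits_for_broadcast(obj, memory_budget_bytes):
    """True if the pickled object is within the per-node budget (an assumed figure)."""
    return serialized_size_bytes(obj) <= memory_budget_bytes

# Hypothetical account table: user id -> account name.
accounts = {user_id: "user-%d" % user_id for user_id in range(10000)}

# Assumed budget of 100 MB per node; pick a number that matches your own cluster.
budget = 100 * 1024 * 1024
decision = "broadcast" if fits_for_broadcast(accounts, budget) else "keep as RDD"
print(serialized_size_bytes(accounts), decision)
```

The budget should leave plenty of headroom, since the same memory also has to hold the partitions being processed.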
Re: when to use broadcast variables
I'd like to be corrected on this, but by "small" I mean on the order of a few hundred MB. Keep in mind that the whole thing gets shipped to every node, so it can be a GB but not several GBs, and it also depends on the network.

Prashant Sharma

On Fri, May 2, 2014 at 6:42 PM, Diana Carroll wrote:
> Anyone have any guidance on using a broadcast variable to ship data to
> workers vs. an RDD?
>
> Like, say I'm joining web logs in an RDD with user account data. I could
> keep the account data in an RDD or if it's "small", a broadcast variable
> instead. How small is small? Small enough that I know it can easily fit
> in memory on a single node? Some other guideline?
>
> Thanks!
>
> Diana
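To put rough numbers on that: every node receives a full copy of a broadcast value, so the total data moved grows with cluster size. Here is a back-of-the-envelope sketch in plain Python. The node count and bandwidth are assumed figures, and the timing model is a deliberately naive worst case (one sender pushing the copies one after another), not a description of how any particular cluster distributes the data.

```python
def total_bytes_shipped(broadcast_bytes, num_nodes):
    """Every node receives one full copy of the broadcast value."""
    return broadcast_bytes * num_nodes

def naive_transfer_seconds(broadcast_bytes, num_nodes, bytes_per_second):
    """Time if a single sender pushed all copies one after another (worst case)."""
    return total_bytes_shipped(broadcast_bytes, num_nodes) / bytes_per_second

MB = 1024 * 1024
size = 300 * MB        # "a few hundred MB", as in the message above
nodes = 20             # assumed cluster size
bandwidth = 100 * MB   # assumed 100 MB/s effective throughput

print(total_bytes_shipped(size, nodes) // MB, "MB total")
print(naive_transfer_seconds(size, nodes, bandwidth), "seconds")
```

With these assumed numbers, a 300 MB value sent to 20 nodes means 6000 MB moved in total, which is why a few GB per broadcast gets expensive quickly.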
when to use broadcast variables
Anyone have any guidance on using a broadcast variable to ship data to workers vs. an RDD?

Like, say I'm joining web logs in an RDD with user account data. I could keep the account data in an RDD or if it's "small", a broadcast variable instead. How small is small? Small enough that I know it can easily fit in memory on a single node? Some other guideline?

Thanks!

Diana
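For concreteness, here are the two approaches from the question sketched in plain Python rather than Spark, so the data, the function names, and the two-partition layout are all invented for illustration. The first mimics the broadcast approach: every partition of the logs sees the full account table and does a local lookup. The second mimics an RDD-to-RDD join: both sides are regrouped by key first, which is the step that costs a shuffle on a real cluster.

```python
from collections import defaultdict

# Hypothetical data: web log partitions of (user_id, url) and an account table.
log_partitions = [
    [(1, "/home"), (2, "/cart")],
    [(1, "/checkout"), (3, "/home")],
]
accounts = {1: "alice", 2: "bob"}  # user 3 has no account record

def lookup_join(partitions, table):
    """Broadcast-style: each partition joins locally against a full copy of the table."""
    joined = []
    for part in partitions:
        for user_id, url in part:
            if user_id in table:  # inner join: drop logs with no account
                joined.append((user_id, (url, table[user_id])))
    return joined

def grouped_join(partitions, table_records):
    """Shuffle-style: regroup both sides by key, then pair up matching values."""
    logs_by_key = defaultdict(list)
    accounts_by_key = defaultdict(list)
    for part in partitions:
        for user_id, url in part:
            logs_by_key[user_id].append(url)
    for user_id, name in table_records:
        accounts_by_key[user_id].append(name)
    joined = []
    for user_id, urls in logs_by_key.items():
        for url in urls:
            for name in accounts_by_key.get(user_id, []):
                joined.append((user_id, (url, name)))
    return joined

a = lookup_join(log_partitions, accounts)
b = grouped_join(log_partitions, accounts.items())
print(sorted(a) == sorted(b))
```

In Spark terms, the first shape roughly corresponds to calling `sc.broadcast(...)` on the driver and reading `.value` inside a `map` or `filter`, and the second to calling `join` on two pair RDDs. Both give the same rows; the difference is where the account data has to travel.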