Re: spark hangs at broadcasting during a filter

2015-08-06 Thread Alex Gittens
Thanks. Repartitioning to a smaller number of partitions seems to fix my issue, but I'll keep broadcasting in mind (droprows is an integer array with about 4 million entries).

On Wed, Aug 5, 2015 at 12:34 PM, Philip Weaver philip.wea...@gmail.com wrote:
> How big is droprows? Try explicitly

Re: spark hangs at broadcasting during a filter

2015-08-05 Thread Philip Weaver
How big is droprows? Try explicitly broadcasting it like this:

    val broadcastDropRows = sc.broadcast(dropRows)
    val valsrows = ... .filter(x => !broadcastDropRows.value.contains(x._1))

- Philip

On Wed, Aug 5, 2015 at 11:54 AM, AlexG swift...@gmail.com wrote:
> I'm trying to load a 1 Tb file
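
For readers following the broadcast suggestion in this thread, here is a minimal, self-contained sketch of the pattern. It is not the original poster's code: the `rows` RDD, the sample `dropRows` values, the app name, and the `local[*]` master are all hypothetical stand-ins. It also makes one change that the thread does not mention: the array is converted to a `Set` before broadcasting, on the assumption that with about 4 million entries a linear `Array.contains` scan per row could be costly, whereas a hash-set lookup is roughly constant time.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastFilterSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical local configuration, for illustration only.
    val conf = new SparkConf()
      .setAppName("broadcast-filter-sketch")
      .setMaster("local[*]")
    val sc = new SparkContext(conf)

    try {
      // Hypothetical stand-in for the real data: (rowIndex, value) pairs.
      val rows = sc.parallelize(0 until 10).map(i => (i, i * 1.5))

      // Hypothetical stand-in for the ~4 million row indices to drop.
      val dropRows: Array[Int] = Array(2, 5, 7)

      // Assumption not in the thread: broadcast a Set rather than the Array,
      // so each membership test is a hash lookup instead of a linear scan.
      val broadcastDropRows = sc.broadcast(dropRows.toSet)

      // Keep only the rows whose index is not in the broadcast set.
      val kept = rows.filter { case (idx, _) =>
        !broadcastDropRows.value.contains(idx)
      }

      println(kept.map(_._1).collect().sorted.mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

The broadcast variable is shipped to each executor once instead of being serialized into every task closure, which is the point of the suggestion above. Whether this or repartitioning is the better fix depends on the job, and the thread only confirms that repartitioning worked.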