Thanks. Repartitioning to a smaller number of partitions seems to fix my
issue, but I'll keep broadcasting in mind (droprows is an integer array
with about 4 million entries).
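
For reference, here is a minimal sketch of how I might try the broadcast variant later. It assumes the row index is the first element of each pair and converts the array to a Set so each lookup is a hash probe rather than a linear scan over 4 million entries. The object name, the keep helper, the local master setting, and the tiny sample data are all made up for illustration, and a Set of boxed integers will use more memory than the raw array.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastDropRows {
  // Pure helper: keep a row only if its index is not in the drop set.
  def keep(dropSet: Set[Int])(row: (Int, Array[Double])): Boolean =
    !dropSet.contains(row._1)

  def main(args: Array[String]): Unit = {
    // Local master is an assumption for trying this out on one machine.
    val conf = new SparkConf()
      .setAppName("broadcast-droprows-sketch")
      .setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical stand-ins for the real dropRows array and row data.
      val dropRows: Array[Int] = Array(1, 3)
      val rows = sc.parallelize(Seq(
        (0, Array(0.0)), (1, Array(1.0)), (2, Array(2.0)), (3, Array(3.0))))

      // Ship the set to each executor once instead of once per task.
      val broadcastDropRows = sc.broadcast(dropRows.toSet)

      // Read .value inside the closure so tasks use the broadcast copy
      // rather than capturing the set itself from the driver.
      val kept = rows.filter(row => keep(broadcastDropRows.value)(row))

      println(kept.map(_._1).collect().sorted.mkString(","))
    } finally {
      sc.stop()
    }
  }
}
```

The one thing to watch is that .value is called inside the filter function; calling it outside would serialize the whole set into every task closure, which is what broadcasting is meant to avoid.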
On Wed, Aug 5, 2015 at 12:34 PM, Philip Weaver philip.wea...@gmail.com
wrote:
How big is droprows?
Try explicitly broadcasting it like this:
val broadcastDropRows = sc.broadcast(dropRows)
val valsrows = ...
  .filter(x => !broadcastDropRows.value.contains(x._1))
- Philip
On Wed, Aug 5, 2015 at 11:54 AM, AlexG swift...@gmail.com wrote:
I'm trying to load a 1 TB file