subject:"Spark UNEVENLY distributing data"

Re: Spark UNEVENLY distributing data

2018-05-22 Thread Saad Mufti

I think TableInputFormat will try to maintain as much locality as possible, assigning one Spark partition per region and trying to assign that partition to a YARN container/executor on the same node (assuming you're using Spark over YARN). So the reason for the uneven distribution could be that
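
To make the one-partition-per-region point concrete, here is a minimal sketch in plain Java (no Spark or HBase dependency) that compares partition sizes under that mapping with an idealized even spread. The class name, method names, and region row counts are all hypothetical, chosen only for illustration; they are not taken from the original poster's cluster.

```java
import java.util.Arrays;

public class RegionSkewSketch {

    // One Spark partition per HBase region: each partition holds
    // exactly as many rows as its region does.
    public static long[] onePartitionPerRegion(long[] regionRowCounts) {
        return regionRowCounts.clone();
    }

    // Idealized even spread of the same rows over numPartitions partitions.
    // This is an assumption about what a shuffle such as repartition(n)
    // roughly aims for, not a model of Spark's actual shuffle.
    public static long[] evenlyRepartitioned(long[] regionRowCounts, int numPartitions) {
        long total = Arrays.stream(regionRowCounts).sum();
        long base = total / numPartitions;
        long remainder = total % numPartitions;
        long[] sizes = new long[numPartitions];
        for (int i = 0; i < numPartitions; i++) {
            // The first `remainder` partitions take one extra row each.
            sizes[i] = base + (i < remainder ? 1 : 0);
        }
        return sizes;
    }

    // Largest partition size; the slowest task is bounded by this.
    public static long max(long[] sizes) {
        return Arrays.stream(sizes).max().orElse(0L);
    }

    public static void main(String[] args) {
        // Hypothetical row counts for four regions, one of them much larger.
        long[] regions = {1_000_000L, 50_000L, 50_000L, 20_000L};

        System.out.println("per-region partitions: "
                + Arrays.toString(onePartitionPerRegion(regions)));
        System.out.println("even spread over 4:    "
                + Arrays.toString(evenlyRepartitioned(regions, 4)));
    }
}
```

In a real job, the second case would correspond to calling `repartition(n)` on the RDD returned by `newAPIHadoopRDD`. That evens out the work at the cost of a shuffle and gives up the data locality described above.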

Spark UNEVENLY distributing data

2018-05-19 Thread Alchemist

I am trying to parallelize a simple Spark program that processes HBase data in parallel.

    // Get HBase RDD
    JavaPairRDD<ImmutableBytesWritable, Result> hBaseRDD =
        jsc.newAPIHadoopRDD(conf, TableInputFormat.class, ImmutableBytesWritable.class, Result.class);
    long