I'm running a load process into HBase with Spark (around 150M records). Toward the end of the process, a lot of tasks fail.
I get this error:

```
19/05/28 11:02:31 ERROR client.AsyncProcess: Failed to get region location
org.apache.hadoop.hbase.TableNotFoundException: my_table
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1417)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1211)
    at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:410)
    at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:359)
    at org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:238)
    at org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:146)
    at org.apache.hadoop.hbase.client.HTable.put(HTable.java:1092)
    at example.bigdata.v360.dsl.UpsertDsl$$anonfun$writeToHBase$1.apply(UpsertDsl.scala:25)
    at example.bigdata.v360.dsl.UpsertDsl$$anonfun$writeToHBase$1.apply(UpsertDsl.scala:19)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:929)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:929)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2067)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2067)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
```

When I run a scan from the HBase shell, the table works fine. What could be the reason? I'm not sure whether this is an HBase error or a Spark error.
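For context, the write path looks roughly like the sketch below, reconstructed from the stack trace (`foreachPartition` → `writeToHBase` → `HTable.put`). The table name `my_table` comes from the error message; the column family `cf`, the column qualifier, and the `(key, value)` record shape are assumptions for illustration, not my actual schema:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.rdd.RDD

object UpsertDsl {
  // One HBase connection per partition; puts go through the table's
  // buffered mutator (visible as BufferedMutatorImpl in the stack trace)
  // and are flushed on close.
  def writeToHBase(rdd: RDD[(String, String)]): Unit = {
    rdd.foreachPartition { rows =>
      // Picks up hbase-site.xml from the executor classpath; if that file
      // is missing on the executors, the client falls back to defaults
      // (e.g. localhost ZooKeeper) and cannot locate the table's regions.
      val conf = HBaseConfiguration.create()
      val connection = ConnectionFactory.createConnection(conf)
      try {
        val table = connection.getTable(TableName.valueOf("my_table"))
        try {
          rows.foreach { case (key, value) =>
            val put = new Put(Bytes.toBytes(key))
            // "cf" / "col" are placeholder family/qualifier names
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(value))
            table.put(put)
          }
        } finally table.close()
      } finally connection.close()
    }
  }
}
```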