terrytlu created HBASE-29272:
--------------------------------
Summary: When Spark reads an HBase snapshot, it always reads empty
data.
Key: HBASE-29272
URL: https://issues.apache.org/jira/browse/HBASE-29272
Project: HBase
Issue Type: Bug
Reporter: terrytlu
Attachments: HbaseSnapshot.java
We found that when Spark reads an HBase snapshot, it always reads empty data.
This is because
org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormatImpl.InputSplit#getLength
always returns 0.
Spark ignores empty splits when spark.hadoopRDD.ignoreEmptySplits is true, and
since Spark 3.2.0 (SPARK-34809) the default value is true.
As a result, every snapshot split is treated as empty and skipped, so the
attached program always returns 0 rows in Spark 3.2.0 even if the HBase
snapshot actually has data.
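The interaction described above can be illustrated with a small standalone
sketch. It is a simplified model only, not Spark or HBase source code: the
Split class, planSplits, and snapshotSplits are hypothetical names, and the
only behaviour modelled is "drop any split whose reported length is 0 when
ignoreEmptySplits is true".

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the behaviour described in this report.
// All names here are hypothetical; this is not Spark or HBase code.
final class EmptySplitFilterSketch {

    /** Stand-in for an input split; only the reported length matters here. */
    static final class Split {
        final String name;
        final long reportedLength;

        Split(String name, long reportedLength) {
            this.name = name;
            this.reportedLength = reportedLength;
        }
    }

    /**
     * Models split planning: when ignoreEmptySplits is true, every split
     * that reports a length of 0 is dropped before any task reads it.
     */
    static List<Split> planSplits(List<Split> all, boolean ignoreEmptySplits) {
        List<Split> kept = new ArrayList<>();
        for (Split s : all) {
            if (!ignoreEmptySplits || s.reportedLength > 0) {
                kept.add(s);
            }
        }
        return kept;
    }

    /** Snapshot splits as described in the report: each one reports length 0. */
    static List<Split> snapshotSplits() {
        List<Split> splits = new ArrayList<>();
        splits.add(new Split("region-a", 0L)); // region names are made up
        splits.add(new Split("region-b", 0L));
        return splits;
    }
}
```

In this model, planSplits(snapshotSplits(), true) keeps no splits, which
corresponds to the 0 rows observed, while planSplits(snapshotSplits(), false)
keeps both. By the same reasoning, setting spark.hadoopRDD.ignoreEmptySplits
to false would presumably avoid the problem until getLength reports a
non-zero size, but that is an inference from this report and not a confirmed
fix.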
--
This message was sent by Atlassian Jira
(v8.20.10#820010)