Cool, thanks for the report, Ben. For what it's worth, I think there's
still some low-hanging fruit in the Spark connector for Kudu (for example,
I believe locality on reads is currently broken). So, you can expect
performance to continue to improve in future versions. I'd also be
interested to
FYI.
I did a quick-n-dirty performance test.
First, the setup:
QA cluster:
  15 data nodes
    64GB memory each
    HBase is using 4GB of memory
    Kudu is using 1GB of memory
  1 HBase/Kudu master node
    64GB memory
    HBase/Kudu master is using 1GB of memory each
  10Gb Ethernet
Using Spark on both to load/read