Hi,

Here are the details of our environment:
Phoenix 4.3
HBase 0.98.6

I'm loading data into a Phoenix table using the CSV bulk loader (after making
some changes to the map(...) method), and it is processing about 16,000 -
20,000 rows/sec. I noticed that the bulk loader spends up to 40% of its
execution time in the following steps.

//...
// parse one CSV line into a CSVRecord
csvRecord = csvLineParser.parse(value.toString());
// run the generated UPSERT against the PhoenixConnection (buffered client-side)
csvUpsertExecutor.execute(ImmutableList.of(csvRecord));
// collect the KeyValues that the upsert buffered on the connection
Iterator<Pair<byte[], List<KeyValue>>> uncommittedDataIterator =
    PhoenixRuntime.getUncommittedDataIterator(conn, true);
//...

We plan to load up to 100 TB of data, and at this rate the overall
throughput of the bulk loader is not satisfactory.

Could someone comment on the following?
- Is there a way to perform bulk loading without creating a
PhoenixConnection and performing an upsert + conn.rollback() (sketched below)?
- Why is bulk loading designed this way? A reference to a JIRA with the
details would help too.
- If I want to bypass CSV parsing and the uncommittedDataIterator, are there
any Phoenix APIs that can be used to create the output key-values directly?
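
To make the first question concrete, here is a minimal sketch of the
round-trip I would like to avoid (the class and method names here are mine,
not Phoenix's; the real logic lives in the mapper):

import java.sql.Connection;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.util.Pair;
import org.apache.phoenix.util.PhoenixRuntime;

public class UncommittedKvSketch {
    // Run an UPSERT on a PhoenixConnection, harvest the KeyValues it
    // buffered, then roll back so nothing is sent to the region servers.
    static void harvestKeyValues(Connection conn, String upsertSql) throws Exception {
        conn.createStatement().executeUpdate(upsertSql); // buffered client-side
        Iterator<Pair<byte[], List<KeyValue>>> it =
                PhoenixRuntime.getUncommittedDataIterator(conn, true);
        while (it.hasNext()) {
            for (KeyValue kv : it.next().getSecond()) {
                // the bulk loader emits these key-values for HFile generation
            }
        }
        conn.rollback(); // discard the buffered mutations
    }
}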

Cheers
