Hi Tulasi,

Answers (and questions) inlined below:
On Thu, Mar 5, 2015 at 2:41 AM Tulasi Paradarami <[email protected]> wrote:
> Hi,
>
> Here are the details of our environment:
> Phoenix 4.3
> HBase 0.98.6
>
> I'm loading data to a Phoenix table using the csv bulk-loader (after making
> some changes to the map(...) method) and it is processing about 16,000 -
> 20,000 rows/sec. I noticed that the bulk-loader spends upto 40% of the
> execution time in the following steps.
>
> //...
> csvRecord = csvLineParser.parse(value.toString());
> csvUpsertExecutor.execute(ImmutableList.of(csvRecord));
> Iterator<Pair<byte[], List<KeyValue>>> uncommittedDataIterator =
>     PhoenixRuntime.getUncommittedDataIterator(conn, true);
> //...

The non-code translation of those steps is:
1. Parse the CSV record
2. Convert the contents of the CSV record into KeyValues

Although it may look as though data is being written over the wire to Phoenix, the execution of the upsert executor and the retrieval of the uncommitted KeyValues are entirely local (in memory); a minimal sketch of that flow is included below. The code is implemented this way because JDBC is the general API used within Phoenix -- there isn't a direct "convert fields to Phoenix encoding" API, although this is doing the equivalent operation.

Could you give some more information on your performance numbers? For example, is this the throughput that you're getting in a single process, or over a number of processes? If so, how many processes? Also, how many columns are in the records that you're loading?

> We plan to load up-to 100TB of data and overall performance of the
> bulk-loader is not satisfactory.

How many records are in that 100TB? What is the current (projected) time required to load the data? What is the minimum allowable ingest speed to be considered satisfactory?

- Gabriel
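For reference, below is a minimal, self-contained sketch of the local encode path described above, using plain JDBC against a Phoenix connection rather than the bulk-loader's internal CsvUpsertExecutor. The JDBC URL, table name, and column names (MY_TABLE, PK, COL1) are placeholders and not part of the original mapper code; the only Phoenix-specific call is PhoenixRuntime.getUncommittedDataIterator, the same one quoted above.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.util.Pair;
import org.apache.phoenix.util.PhoenixRuntime;

public class LocalKeyValueEncoding {

    public static void main(String[] args) throws Exception {
        // Placeholder JDBC URL; adjust to your cluster and schema.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost")) {
            // Auto-commit stays off so the UPSERT is only buffered client-side.
            conn.setAutoCommit(false);

            // A single parsed CSV record, represented here as plain strings.
            String[] csvFields = {"row1", "some value"};

            try (PreparedStatement stmt =
                     conn.prepareStatement("UPSERT INTO MY_TABLE (PK, COL1) VALUES (?, ?)")) {
                stmt.setString(1, csvFields[0]);
                stmt.setString(2, csvFields[1]);
                stmt.executeUpdate();   // buffered in the client; nothing goes over the wire
            }

            // Pull the KeyValues that the buffered UPSERT would have written.
            Iterator<Pair<byte[], List<KeyValue>>> uncommittedDataIterator =
                PhoenixRuntime.getUncommittedDataIterator(conn, true);
            while (uncommittedDataIterator.hasNext()) {
                Pair<byte[], List<KeyValue>> tableKvs = uncommittedDataIterator.next();
                for (KeyValue kv : tableKvs.getSecond()) {
                    System.out.println(kv);   // these are what the bulk loader writes out as HFiles
                }
            }

            // Discard the buffered mutations so they are never sent to the server.
            conn.rollback();
        }
    }
}

The rollback at the end is what keeps this purely client-side: the UPSERT plus the iterator call together reproduce the "parse, then encode to KeyValues" steps the mapper spends its time in, without any RPC to the region servers.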
