Hi,

Here are the details of our environment: Phoenix 4.3, HBase 0.98.6.
I'm loading data into a Phoenix table using the CSV bulk loader (after making some changes to the map(...) method), and it is processing about 16,000-20,000 rows/sec. I noticed that the bulk loader spends up to 40% of the execution time in the following steps:

    //...
    csvRecord = csvLineParser.parse(value.toString());
    csvUpsertExecutor.execute(ImmutableList.of(csvRecord));
    Iterator<Pair<byte[], List<KeyValue>>> uncommittedDataIterator =
        PhoenixRuntime.getUncommittedDataIterator(conn, true);
    //...

We plan to load up to 100 TB of data, and the overall performance of the bulk loader is not satisfactory. Could someone comment on the following?

- Is there a way to perform bulk loading without creating a PhoenixConnection and performing an upsert + conn.rollback()?
- Can someone share additional details on why bulk loading is designed this way? A reference to a JIRA with details would help too.
- If I want to bypass CSV parsing and the uncommittedDataIterator, are there any Phoenix APIs that can be used for creating the output key-values? (A sketch of the pattern in question is below.)
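For reference, here is a minimal standalone sketch of the upsert + getUncommittedDataIterator + rollback pattern as we understand it; the JDBC URL, table, and columns are placeholders for illustration, and we use a plain PreparedStatement in place of the bulk loader's CsvUpsertExecutor:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.util.Iterator;
    import java.util.List;

    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.util.Pair;
    import org.apache.phoenix.util.PhoenixRuntime;

    public class UncommittedKeyValueSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder JDBC URL; adjust to your cluster.
            Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
            // Keep mutations client-side so they remain "uncommitted".
            conn.setAutoCommit(false);

            // Placeholder table/columns standing in for the CSV upsert path.
            PreparedStatement stmt =
                conn.prepareStatement("UPSERT INTO MY_TABLE (ID, NAME) VALUES (?, ?)");
            stmt.setLong(1, 1L);
            stmt.setString(2, "example");
            stmt.executeUpdate();

            // Pull the KeyValues Phoenix built for the pending upsert,
            // without ever committing them.
            Iterator<Pair<byte[], List<KeyValue>>> it =
                PhoenixRuntime.getUncommittedDataIterator(conn, true);
            while (it.hasNext()) {
                Pair<byte[], List<KeyValue>> tableKvs = it.next();
                for (KeyValue kv : tableKvs.getSecond()) {
                    // In the real mapper these would be written out for HFile generation.
                    System.out.println(kv);
                }
            }

            // Discard the pending mutations; nothing is sent to the region servers.
            conn.rollback();
            conn.close();
        }
    }

Cheers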
