If you use the Kudu API and set the flush mode for a session to anything other than AUTO_FLUSH_SYNC, the inserts are accumulated into batches on the client side and sent to the corresponding tablet servers in chunks. Consider using the AUTO_FLUSH_BACKGROUND mode when working with the KuduSession API (with MANUAL_FLUSH you would have to flush those batches yourself before the accumulated data reaches the maximum allowed size, which is configurable).

Also, if the lines in your file(s) contain data for independent rows (i.e. you are not expecting to perform upserts for some lines), you could split those lines into non-overlapping ranges (e.g., 0 -- 99999, 100000 -- 199999, etc.) and run multiple Kudu sessions (one per line range in the file) in parallel.
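To make the range-splitting idea a bit more concrete, here is a rough sketch in plain Java. It is only an illustration under my own assumptions, not code from Kudu: RangeSplitLoader, LineRange and RangeWorker are hypothetical names, and the worker passed in main() is a placeholder for the place where each task would open its own Kudu session and apply inserts for its lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: splits a file's line numbers into ranges and
// processes each range in its own task.
final class RangeSplitLoader {

    /** Half-open range of line numbers: [start, end). */
    static final class LineRange {
        final long start;
        final long end;
        LineRange(long start, long end) { this.start = start; this.end = end; }
    }

    /** Hypothetical callback: loads one range, returns the number of rows handled. */
    interface RangeWorker {
        long load(LineRange range) throws Exception;
    }

    /** Splits [0, totalLines) into consecutive, non-overlapping ranges. */
    static List<LineRange> split(long totalLines, long chunkSize) {
        if (totalLines < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("totalLines must be >= 0 and chunkSize > 0");
        }
        List<LineRange> ranges = new ArrayList<>();
        for (long start = 0; start < totalLines; start += chunkSize) {
            ranges.add(new LineRange(start, Math.min(start + chunkSize, totalLines)));
        }
        return ranges;
    }

    /** Runs one task per range on a fixed-size pool and sums the reported row counts. */
    static long loadInParallel(List<LineRange> ranges, int threads, RangeWorker worker) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (LineRange range : ranges) {
                Callable<Long> task = () -> worker.load(range);
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> future : futures) {
                total += future.get(); // waits for the task; rethrows its failure
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("loading a range failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<LineRange> ranges = split(250_000, 100_000);
        // Placeholder worker: a real one would read lines [start, end) from the
        // file and apply one insert per line through its own Kudu session.
        long rows = loadInParallel(ranges, 3, r -> r.end - r.start);
        System.out.println("ranges=" + ranges.size() + " rows=" + rows);
    }
}
```

The ranges are half-open so that neighbouring ranges never share a line, which matters because each line must be inserted by exactly one session.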

Hope this helps.


Best regards,

Alexey



On 7/10/17 7:54 PM, sky wrote:
Hi,
     If I load data from a CSV file, can I only traverse the file and
insert the rows one by one through the API?






At 2017-07-10 22:40:05, "Jean-Daniel Cryans" <jdcry...@gmail.com> wrote:
(sending to user@ and putting dev@ in bcc)

Hi,

Kudu by itself doesn't really have file loading capabilities, you'd have to
write your own code that reads a file and then uses either the Java or C++
API to insert the data.

Hope this helps,

J-D

On Mon, Jul 10, 2017 at 1:55 AM, sky <x_h...@163.com> wrote:

Hi all,
     How can Kudu load data from a file? I know that Kudu can insert data
from Impala, but is there any other way, one that does not go through
Impala and is executed by Kudu alone?
     Thanks.
