On a single node, you can easily achieve tens of thousands of key-value inserts per second. Depending on how many columns are in each row, 600 a second is rather slow :)

Your loop looks good. Using a single BatchWriter and letting it amortize sending data from your client to the servers will be the most efficient.
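For reference, a minimal sketch of that loop in Java. The Connector, the table name "mytable", and the Record/parseLine placeholders are assumptions to swap for your own code, and the BatchWriterConfig values are just starting points:

import java.io.BufferedReader;
import java.util.concurrent.TimeUnit;

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;

public class JsonIngest {

  // stand-ins for whatever your JSON parser produces
  static class Record { String timestamp = "", family = "", qualifier = "", value = ""; }
  static Record parseLine(String json) { return new Record(); }  // your parser goes here

  static void ingest(Connector conn, BufferedReader reader) throws Exception {
    BatchWriterConfig cfg = new BatchWriterConfig()
        .setMaxMemory(50 * 1024 * 1024)      // buffer ~50 MB of mutations client-side
        .setMaxLatency(2, TimeUnit.MINUTES)  // flush at least this often
        .setMaxWriteThreads(4);              // threads sending batches to tablet servers

    BatchWriter writer = conn.createBatchWriter("mytable", cfg);
    try {
      String line;
      while ((line = reader.readLine()) != null) {
        Record rec = parseLine(line);
        Mutation m = new Mutation(new Text(rec.timestamp));
        m.put(new Text(rec.family), new Text(rec.qualifier),
            new Value(rec.value.getBytes()));
        writer.addMutation(m);               // buffered; sent in batches, not per call
      }
    } finally {
      writer.close();                        // flushes anything still buffered
    }
  }
}

The key point is that addMutation() only buffers; the BatchWriter groups mutations by tablet server and sends them in the background, so the per-insert cost on the client stays small.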

If the JSON parsing is the slowest part, you could have a single thread read the file and hand each line off to a thread pool, which parses the line and adds the result to some concurrent data structure. A consumer on that data structure then reads each parsed object and sends it to Accumulo.
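One possible shape for that, sketched with a plain ExecutorService and a BlockingQueue. The pool size, queue capacity, and the toMutation placeholder are assumptions, and real code also needs proper shutdown handling:

import java.io.BufferedReader;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;

public class ParallelParse {

  // stand-in for your JSON parse + Mutation construction
  static Mutation toMutation(String jsonLine) {
    Mutation m = new Mutation(new Text("row"));
    m.put(new Text("fam"), new Text("qual"), new Value(jsonLine.getBytes()));
    return m;
  }

  static void ingest(BatchWriter writer, BufferedReader reader) throws Exception {
    BlockingQueue<Mutation> parsed = new ArrayBlockingQueue<>(10000);
    ExecutorService parsers = Executors.newFixedThreadPool(4);

    // consumer: drains parsed mutations and feeds the single BatchWriter
    Thread consumer = new Thread(() -> {
      try {
        while (true) {
          writer.addMutation(parsed.take());  // blocks until a parsed record is ready
        }
      } catch (Exception e) {
        // real code wants a poison pill / shutdown signal instead of this
      }
    });
    consumer.start();

    // producer: one thread reads lines, the pool does the CPU-heavy parsing
    String line;
    while ((line = reader.readLine()) != null) {
      final String l = line;
      parsers.submit(() -> {
        try {
          parsed.put(toMutation(l));          // back-pressure when the queue fills
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }
    parsers.shutdown();
    // ...wait for the pool and queue to drain, stop the consumer, then close the writer
  }
}

The idea is just to separate the CPU-bound parsing (fanned out across the pool) from the writing, which the single BatchWriter already batches efficiently on its own.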

Alternatively, this is where MapReduce is a clear win, as it's very good at parallelizing these types of problems. You could use a FileInputFormat and the AccumuloOutputFormat to accomplish this task.
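A rough outline of the mapper side, assuming the 1.x mapreduce API with TextInputFormat (a FileInputFormat) feeding one JSON line per call; the row and column values shown are placeholders:

import java.io.IOException;

import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Each mapper parses its share of the JSON lines and emits Mutations;
// AccumuloOutputFormat writes them out per task, so the ingest is
// parallelized across however many map tasks the input splits into.
public class JsonIngestMapper extends Mapper<LongWritable, Text, Text, Mutation> {

  private static final Text TABLE = new Text("mytable");  // placeholder table name

  @Override
  protected void map(LongWritable offset, Text jsonLine, Context context)
      throws IOException, InterruptedException {
    // parse jsonLine with your JSON library, then build the mutation
    Mutation m = new Mutation(new Text("timestamp-row"));  // row = your timestamp
    m.put(new Text("family"), new Text("qualifier"),
        new Value(jsonLine.toString().getBytes()));
    context.write(TABLE, m);   // key = destination table, value = the mutation
  }
}

In the driver you'd point the job's input format at the JSON file(s) and set AccumuloOutputFormat as the output format, giving it your connection details (the 1.x API has static helpers such as setConnectorInfo, setZooKeeperInstance, and setDefaultTableName for this); the exact calls vary between Accumulo versions, so check the javadoc for the one you run.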

Andrea Leoni wrote:
Thank you for your answer.
Today I tried to create a big command file and push it to the shell (about 300k
inserts per file). As you said, it is too slow for me (about 600 inserted
rows/sec).

I've been on Accumulo for just one week. I'm a noob, but I'm learning.

My app has to store a large amount of data.

The row is the timestamp and the family/qualifier are the columns... I get my
data from a JSON file, so my app scans it for new records, parses them, and for
each record creates a mutation and pushes it to Accumulo with a BatchWriter...

Maybe I'm doing something wrong, and fixing it could increase the speed of my inserts.

Currently I:

LOOP
1) read a JSON line
2) parse it
3) create a mutation
4) put the line's information into the mutation
5) use the BatchWriter to insert the mutation into Accumulo
END LOOP

Is it all right? I know that steps 1) and 2) are slow, but they're necessary, and
I use the fastest JSON parser I've found online.

Thank you so much again!
(and sorry again for my bad English!)



-----
Andrea Leoni
Italy
Computer Engineering