Re: Fastest way to import 100m rows
Hi,

I have tried some bulk sizes:

    Bulk size    Rows per second
    5000         500
    4000         625
    2000         625
    1000         550
    500          500

So I think 4000 is the best value. My docs have about 270 columns; is that too much? It is a denormalized view. During the imports the CPU load is around 90%. Could that be the bottleneck?

Thanks and regards,
Andreas

On Monday, May 19, 2014 at 18:51:54 UTC+2, Itamar Syn-Hershko wrote:
> That doesn't seem right, try making larger bulk sizes. Also, what size are your docs?
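Since the thread revolves around picking a bulk size, a minimal sketch of what one bulk request looks like on the wire may help; the index name "myindex", the type "row", and the field names are placeholders, not from the thread. Each row costs two newline-delimited JSON lines (an action line plus the document source), so a bulk size of 4000 means 8000 lines per request body:

    curl -XPOST 'http://localhost:9200/myindex/_bulk' --data-binary @batch.ndjson

where batch.ndjson contains pairs like the following (the bulk API requires the body to end with a trailing newline):

    {"index":{"_index":"myindex","_type":"row"}}
    {"col_1":"a","col_2":"b"}
    {"index":{"_index":"myindex","_type":"row"}}
    {"col_1":"c","col_2":"d"}

With roughly 270 columns per document, each source line is large, so the sweet spot in rows per request will sit lower than for small documents; measuring it empirically, as done above, is the right approach.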
Re: Fastest way to import 100m rows
Hi,

I am importing locally. OK, it is a test server with only 2 CPUs at 2.40 GHz and 4 GB RAM, but for testing I only import 100k rows.

Greetings,
Andreas

On Monday, May 19, 2014 at 18:17:52 UTC+2, Itamar Syn-Hershko wrote:
> That's a very low rate. Are you importing locally or via a remote connection?
Fastest way to import 100m rows
Hi all,

I need to import 100m rows. At the moment I use the bulk API with 100 entries at once (is that a good or a bad value?), but I only get ~500 rows/second imported (is that a lot or a little?). Is there a way to import the data faster?

I set number_of_replicas = 0 and refresh_interval = -1, but it made no big difference compared to the default values.

Thank you for your help,
Andreas
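For reference, the two settings mentioned above can be changed on a live index through the update-settings API. A minimal sketch; the index name "myindex" is a placeholder, not from the thread:

    curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
      "index": {
        "number_of_replicas": 0,
        "refresh_interval": "-1"
      }
    }'

Setting refresh_interval to -1 disables the periodic refresh entirely, so newly indexed documents only become searchable after an explicit refresh; remember to set it back (e.g. to "1s") and restore the replica count once the import is finished.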
Need help with aggregation and counting unique values
Hi all,

I'm new here and have a problem with a query. I hope someone can help me.

My problem: I have a log with user clicks, the user revenue, and their session IDs. Now I want to build a date histogram with all clicks counted, the unique session IDs, and the user revenue.

My query:

    {
      "query": {
        "match_all": {}
      },
      "aggs": {
        "log_over_time": {
          "date_histogram": {
            "field": "dateline",
            "interval": "month",
            "format": "yyyy-MM"
          },
          "aggs": {
            "amount": {
              "sum": {
                "field": "order_amount"
              }
            },
            "unique": {
              "terms": {
                "field": "user_session_id",
                "size": 10
              }
            }
          }
        }
      }
    }

My first approach is to count the entries in the "unique" aggregation, but the response is very large and limited to 10 entries. Is there a better way to do this? Can I do something like GROUP BY on a value?

A big thank you for the help!

Greetings,
Andreas
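One possible approach, assuming Elasticsearch 1.1 or later: if only the number of distinct sessions per bucket is needed, rather than the session IDs themselves, the cardinality aggregation returns an approximate unique count and keeps the response small. A sketch against the same fields as above:

    {
      "query": { "match_all": {} },
      "aggs": {
        "log_over_time": {
          "date_histogram": {
            "field": "dateline",
            "interval": "month",
            "format": "yyyy-MM"
          },
          "aggs": {
            "amount": { "sum": { "field": "order_amount" } },
            "unique_sessions": {
              "cardinality": { "field": "user_session_id" }
            }
          }
        }
      }
    }

Note the count is approximate (HyperLogLog++ based); the optional precision_threshold parameter trades memory for accuracy.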