Re: Fastest way to import 100m rows

2014-05-19 Thread Andreas Hembach
Hi,

I have tried several bulk sizes.

Bulk size => Rows per second
5000      => 500
4000      => 625
2000      => 625
1000      => 550
 500      => 500

So I think 4000 is the best value. My docs have about 270 columns. Is that too 
many? It is a denormalized view.

During the imports, the CPU load is around 90%. Could that be the bottleneck?
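
If you want to confirm that CPU is really the bottleneck, the hot threads API 
shows what the node is busy with while the bulk requests are running (a minimal 
check; the endpoint exists on 1.x):

GET /_nodes/hot_threads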

Thanks and regards,
Andreas
On Monday, May 19, 2014 at 18:51:54 UTC+2, Itamar Syn-Hershko wrote:
>
> That doesn't seem right, try making larger bulk sizes. Also, what size are 
> your docs?
>
> --
>
> Itamar Syn-Hershko
> http://code972.com | @synhershko <https://twitter.com/synhershko>
> Freelance Developer & Consultant
> Author of RavenDB in Action <http://manning.com/synhershko/>
>
>
> On Mon, May 19, 2014 at 7:35 PM, Andreas Hembach wrote:
>
>> Hi,
>>
>> I am importing locally. It is a test server with only 2 CPUs at 2.40 GHz and 
>> 4 GB of RAM, but for testing I only import 100k rows.
>>
>> Greetings,
>> Andreas
>>
>> On Monday, May 19, 2014 at 18:17:52 UTC+2, Itamar Syn-Hershko wrote:
>>>
>>> That's a very low rate. Are you importing locally or via remote 
>>> connection?
>>>
>>> --
>>>
>>> Itamar Syn-Hershko
>>> http://code972.com | @synhershko <https://twitter.com/synhershko>
>>> Freelance Developer & Consultant
>>> Author of RavenDB in Action <http://manning.com/synhershko/>
>>>
>>>
>>> On Mon, May 19, 2014 at 7:16 PM, Andreas Hembach wrote:
>>>
>>>> Hi all,
>>>>
>>>> I need to import 100m rows. At the moment I use the bulk API with 100 
>>>> entries at once (is that a good or a bad value?).
>>>> But I only get ~500 rows/second imported (is that a lot or a little?). 
>>>> Is there a way to import the data faster?
>>>>
>>>> I set number_of_replicas = 0 and refresh_interval = -1, without much 
>>>> difference compared to the default values.
>>>>
>>>> Thank you for your help,
>>>> Andreas
>>>>
>>
>
>



Re: Fastest way to import 100m rows

2014-05-19 Thread Andreas Hembach
Hi,

I am importing locally. It is a test server with only 2 CPUs at 2.40 GHz and 4 
GB of RAM, but for testing I only import 100k rows.

Greetings,
Andreas

On Monday, May 19, 2014 at 18:17:52 UTC+2, Itamar Syn-Hershko wrote:
>
> That's a very low rate. Are you importing locally or via remote connection?
>
> --
>
> Itamar Syn-Hershko
> http://code972.com | @synhershko <https://twitter.com/synhershko>
> Freelance Developer & Consultant
> Author of RavenDB in Action <http://manning.com/synhershko/>
>
>
> On Mon, May 19, 2014 at 7:16 PM, Andreas Hembach wrote:
>
>> Hi all,
>>
>> I need to import 100m rows. At the moment I use the bulk API with 100 
>> entries at once (is that a good or a bad value?).
>> But I only get ~500 rows/second imported (is that a lot or a little?). 
>> Is there a way to import the data faster?
>>
>> I set number_of_replicas = 0 and refresh_interval = -1, without much 
>> difference compared to the default values.
>>
>> Thank you for your help,
>> Andreas
>>
>>
>
>



Fastest way to import 100m rows

2014-05-19 Thread Andreas Hembach
Hi all,

I need to import 100m rows. At the moment I use the bulk API with 100 
entries at once (is that a good or a bad value?).
But I only get ~500 rows/second imported (is that a lot or a little?). Is 
there a way to import the data faster?
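
For context, a bulk request is just newline-delimited pairs of an action line 
and a source line, and the "100 entries" above means 100 such pairs per 
request. A minimal sketch of the request body (index, type and field names are 
made up):

POST /myindex/row/_bulk
{ "index": {} }
{ "col_1": "foo", "col_2": 42 }
{ "index": {} }
{ "col_1": "bar", "col_2": 43 }

The body has to end with a newline; a larger bulk size simply means more 
action/source pairs per request.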

I set number_of_replicas = 0 and refresh_interval = -1, without much 
difference compared to the default values.
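
In case it helps anyone reading later: both settings are dynamic, so they can 
be applied to a live index before the import and reverted afterwards 
("myindex" and the 1s value are placeholders):

PUT /myindex/_settings
{
   "index": {
      "number_of_replicas": 0,
      "refresh_interval": "-1"
   }
}

and once the import is done, re-enable refreshing:

PUT /myindex/_settings
{
   "index": {
      "refresh_interval": "1s"
   }
}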

Thank you for your help,
Andreas



Need help with aggregation and unique counted values

2014-03-18 Thread Andreas Hembach
Hi all,

I'm new here and have a problem with a query. I hope someone can help me.

My problem:
- I have a log of user clicks with the user revenue and their session IDs. 
Now I want to build a date histogram with the total click count, the unique 
session IDs and the user revenue.

My Query:
{
   "query": {
      "match_all": {}
   },
   "aggs": {
      "log_over_time": {
         "date_histogram": {
            "field": "dateline",
            "interval": "month",
            "format": "yyyy-MM"
         },
         "aggs": {
            "amount": {
               "sum": {
                  "field": "order_amount"
               }
            },
            "unique": {
               "terms": {
                  "field": "user_session_id",
                  "size": 10
               }
            }
         }
      }
   }
}

My first approach was to count the "unique" entries, but the response is 
very large and limited to 10 entries.

Is there a better way to do this? Can I do something like a group-by on the 
value?
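
If you only need the number of distinct sessions per bucket, rather than the 
sessions themselves, a cardinality aggregation (available from Elasticsearch 
1.1) may be a better fit. A sketch of the sub-aggregations, reusing the field 
names from the query above:

"aggs": {
   "amount": {
      "sum": { "field": "order_amount" }
   },
   "unique_sessions": {
      "cardinality": { "field": "user_session_id" }
   }
}

Note that cardinality returns an approximate distinct count (HyperLogLog-based); 
its precision_threshold option trades memory for accuracy.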

A big thank you for the help!

Greetings,
Andreas
