Re: What tools to use for indexing ES in production?

2015-02-19 Thread Kevin Liu
Well, the sales messages are coming through Kafka, and we need to pull some 
extra info from the database. We can do pretty much anything. I'm just not 
sure what the common practice is here; there seem to be so many options. 
What kind of questions am I not asking here? 

On Monday, February 16, 2015 at 2:30:45 AM UTC-8, Mark Walkom wrote:
>
> This depends on how the app is made and what options you have to extract 
> data from it.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/4a11083b-4e9f-4be2-9a8e-83caf34d56b4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


What tools to use for indexing ES in production?

2015-02-16 Thread Kevin Liu
We want to index our sales records as they come in from our apps. We are 
using a Quartz job right now, which is really slow and not real-time. 

We will be implementing a message bus soon for firing sales events. 

The process is: 
1. Read from a message queue.
2. Grab some extra data from MySQL.
3. Do some ETL to construct the document.
4. Index it into ES. 
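Steps 1 to 3 above can be sketched as a small, library-free Python function. This is a minimal sketch under assumptions: the message shape (`sale_id`, `user_id`, `product_ids`, `amount`), the `lookup_user`/`lookup_product` callables standing in for the MySQL queries, and all field names are hypothetical. A real consumer would read from the message bus and query MySQL instead of using in-memory dicts.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-in type for the MySQL lookups in step 2.
LookupFn = Callable[[int], Dict[str, Any]]


def build_sale_document(
    message: Dict[str, Any],
    lookup_user: LookupFn,
    lookup_product: LookupFn,
) -> Dict[str, Any]:
    """Steps 2-3: enrich one raw sale event and shape it into a document."""
    user = lookup_user(message["user_id"])
    products = [lookup_product(pid) for pid in message["product_ids"]]
    return {
        "sale_id": message["sale_id"],
        "amount": message["amount"],
        # Copy only the fields that need to be searchable (price is dropped).
        "user": {"id": user["id"], "firstname": user["firstname"]},
        "products": [{"id": p["id"], "name": p["name"]} for p in products],
    }


def process(
    messages: List[Dict[str, Any]],
    lookup_user: LookupFn,
    lookup_product: LookupFn,
) -> List[Dict[str, Any]]:
    """Step 1 stand-in: iterate over messages already read from the queue."""
    return [build_sale_document(m, lookup_user, lookup_product) for m in messages]


if __name__ == "__main__":
    users = {111: {"id": 111, "firstname": "kevin"}}
    products = {12345: {"id": 12345, "name": "yellow jacket", "price": 40}}
    queue = [{"sale_id": 1, "user_id": 111, "product_ids": [12345], "amount": 40}]
    docs = process(queue, users.__getitem__, products.__getitem__)
    print(docs[0]["products"][0]["name"])  # yellow jacket
```

Keeping the enrichment in a plain function like this means it can be called from whatever ends up reading the queue, whether that is a hand-written consumer or a job inside a larger framework.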

I've been reading about Apache Spark, ES rivers, and Logstash. 

My question is: what kind of tools are right for the job here? 
Is Apache Spark overkill? 
Is DIY a better option here? 
What are you guys using? 
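For the DIY option, the indexing step can come down to batching documents into requests for Elasticsearch's `_bulk` endpoint, whose body is newline-delimited JSON: an action/metadata line followed by the document source, with the whole body ending in a newline. The `bulk_bodies` helper below is a hypothetical, stdlib-only sketch that only builds those bodies; sending them (with an official client or a plain HTTP POST) is left out. The index name `sales`, the batch size, and the use of `sale_id` as `_id` are assumptions. Clusters on 1.x, current when this thread was written, also expect a `_type` in the action line or the request URL.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def bulk_bodies(
    docs: Iterable[Dict[str, Any]],
    index: str,
    batch_size: int = 500,  # arbitrary; tune against real document sizes
) -> Iterator[str]:
    """Group documents into newline-delimited bodies for the _bulk endpoint."""
    lines: List[str] = []
    count = 0
    for doc in docs:
        # Action/metadata line. Using the sale id as _id means that
        # re-processing the same message overwrites instead of duplicating.
        action = {"index": {"_index": index, "_id": str(doc["sale_id"])}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))  # the document source itself
        count += 1
        if count == batch_size:
            yield "\n".join(lines) + "\n"  # bulk bodies must end with a newline
            lines, count = [], 0
    if lines:  # flush the final, partially filled batch
        yield "\n".join(lines) + "\n"
```

For example, `list(bulk_bodies(docs, "sales", batch_size=2))` over three documents yields two bodies: one with two action/source pairs and one with the remaining pair.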

Please advise, and point me to the right things to read. 



Schema Design Question

2014-12-09 Thread Kevin Liu
I'm doing research on deploying Elasticsearch in production. This is 
really new to me, and I couldn't find much on Google.

We need to search sales in Elasticsearch. 
- A sale has several payments, several products, and one user. 
- A huge number of sales come in every day.
- The product objects in a sale might get updated a few times a day (fields 
such as name, price, etc.). 

Sale: {                  // very high write volume
  payments: {
    payment: {
      id: 123,
      payment_type: "visa",
      ...
    }
  },
  products: {
    product: {           // gets updated a few times per day
      id: 12345,
      name: "yellow jacket",
      ...
    }
  },
  user: {                // changes once in a while
    id: 111,
    firstname: "kevin",
    ...
  }
}

What is the best way to design this for fast search? I've read that there 
are three ways of doing it:
1. Put everything you need in one document, but then you have to update 
every affected document whenever anything changes.
2. Put only the searchable terms in the document, and run a second query if 
you need more detailed information.
3. Put only the entity IDs in the inner objects. Very few updates are needed, 
since foreign-key IDs rarely change, but you still have to run two queries 
each time to get the detailed information.
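To make the trade-off between options 1 and 3 concrete, here is a small in-memory Python sketch. It is not Elasticsearch code: plain dicts stand in for the index and for the primary data store, and every name in it is hypothetical. With full denormalization (option 1), a product rename fans out to every sale document that embeds that product; with ID-only inner objects (option 3), no sale document changes, but every read needs a second lookup to resolve the IDs.

```python
from typing import Any, Dict, List

# Option 1: each sale document embeds a full copy of its products.
denormalized_sales: Dict[int, Dict[str, Any]] = {
    1: {"products": [{"id": 12345, "name": "yellow jacket"}]},
    2: {"products": [{"id": 12345, "name": "yellow jacket"},
                     {"id": 777, "name": "red scarf"}]},
    3: {"products": [{"id": 777, "name": "red scarf"}]},
}

# Option 3: sale documents hold only product IDs; details live elsewhere.
id_only_sales: Dict[int, Dict[str, Any]] = {
    1: {"product_ids": [12345]},
    2: {"product_ids": [12345, 777]},
    3: {"product_ids": [777]},
}
products: Dict[int, Dict[str, Any]] = {
    12345: {"id": 12345, "name": "yellow jacket"},
    777: {"id": 777, "name": "red scarf"},
}


def rename_product_denormalized(product_id: int, new_name: str) -> List[int]:
    """Option 1: rewrite every sale embedding the product; return their IDs."""
    touched: List[int] = []
    for sale_id, sale in denormalized_sales.items():
        hit = False
        for p in sale["products"]:
            if p["id"] == product_id:
                p["name"] = new_name
                hit = True
        if hit:
            # In a real cluster, each of these is a document to reindex.
            touched.append(sale_id)
    return touched


def rename_product_id_only(product_id: int, new_name: str) -> List[int]:
    """Option 3: only the product record changes; no sale is touched."""
    products[product_id]["name"] = new_name
    return []


def read_sale_id_only(sale_id: int) -> Dict[str, Any]:
    """Option 3 read path: a second lookup resolves IDs into details."""
    sale = id_only_sales[sale_id]
    return {"products": [products[pid] for pid in sale["product_ids"]]}
```

With write-heavy sales and products that change a few times a day, the number of sale IDs returned by `rename_product_denormalized` is roughly what option 1 costs per product update, while option 3 pays instead on every read.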

I understand this largely depends on the usage and environment, but is 
there some kind of best practice for sale indexes? I couldn't find any 
real-world Elasticsearch schema to use as a reference, and I'd really like 
to see how other people do it. I'm sure this problem has been solved before. 

Can someone please give me their thoughts on this? 

