Re: What tools to use for indexing ES in production?
Well, the sales messages are coming through Kafka, and we need to pull some extra info from the database. We can do anything, really. I'm just not sure what the common practice is here; there seem to be so many options. What kinds of questions am I not asking?

On Monday, February 16, 2015 at 2:30:45 AM UTC-8, Mark Walkom wrote:
> This depends on how the app is made and what options you have to extract
> data from it.
>
> On 16 February 2015 at 20:28, Kevin Liu wrote:
>> We want to index our sales records as they come in from our apps. We are
>> using a Quartz job right now, which is really slow and not really real-time.
>>
>> We will be implementing a message bus soon for firing sales events.
>>
>> The process is:
>> read from a message queue
>> grab some extra data from MySQL
>> do some ETL to construct the document
>> index it into ES
>>
>> I've been reading about Apache Spark, ES rivers, and Logstash.
>>
>> My question is: what kind of tools are right for the job here?
>> Is Apache Spark overkill?
>> Is DIY a better option here?
>> What are you guys using?
>>
>> Please advise and point me to the right things to read.

-- 
You received this message because you are subscribed to the Google Groups "elasticsearch" group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/4a11083b-4e9f-4be2-9a8e-83caf34d56b4%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
What tools to use for indexing ES in production?
We want to index our sales records as they come in from our apps. We are using a Quartz job right now, which is really slow and not really real-time.

We will be implementing a message bus soon for firing sales events.

The process is:
read from a message queue
grab some extra data from MySQL
do some ETL to construct the document
index it into ES

I've been reading about Apache Spark, ES rivers, and Logstash.

My question is: what kind of tools are right for the job here?
Is Apache Spark overkill?
Is DIY a better option here?
What are you guys using?

Please advise and point me to the right things to read.
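For what it's worth, the four steps in the process above can be sketched in a few lines of plain Python. This is only a rough illustration, not a recommendation of any tool: the event fields (`sale_id`, `amount`), the `user_name` column, the `lookup` callable standing in for the MySQL query, and the `sales` index name are all made up for the example, and the queue is just an in-memory list rather than a real Kafka consumer. The output is a newline-delimited body in the shape the Elasticsearch bulk endpoint accepts; depending on the Elasticsearch version, the action line may also need a `_type`.

```python
import json
from typing import Callable, Iterable, Iterator


def build_document(event: dict, extra: dict) -> dict:
    """ETL step: merge the queue event with the row looked up in the database."""
    return {
        "sale_id": event["sale_id"],
        "amount": event["amount"],
        "user_name": extra.get("user_name"),  # hypothetical column
    }


def to_bulk_lines(
    events: Iterable[dict],
    lookup: Callable[[int], dict],
    index: str = "sales",  # hypothetical index name
) -> Iterator[str]:
    """Yield bulk-format lines: one action line, then one source line, per event."""
    for event in events:
        doc = build_document(event, lookup(event["sale_id"]))
        yield json.dumps({"index": {"_index": index, "_id": str(event["sale_id"])}})
        yield json.dumps(doc)


# In-memory stand-ins for the message queue and the MySQL table (assumptions).
events = [{"sale_id": 1, "amount": 19.99}, {"sale_id": 2, "amount": 5.0}]
rows = {1: {"user_name": "kevin"}, 2: {"user_name": "alice"}}

# A bulk request body must end with a trailing newline.
bulk_body = "\n".join(to_bulk_lines(events, lambda sale_id: rows.get(sale_id, {}))) + "\n"
print(bulk_body)
```

In a real deployment the list would be replaced by a queue consumer and `bulk_body` would be posted to the cluster in batches; batching is usually where most of the speedup over one-document-at-a-time indexing comes from.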
Schema Design Question
I'm doing research on deploying Elasticsearch in production. This is really new to me, and I couldn't find much on Google. We need to search sales in Elasticsearch.

- A sale has a bunch of payments, products, and one user.
- There is a huge amount of sale data per day.
- The product object in a sale might get updated a few times a day, in fields such as name, price, etc.

    Sale: {                          // very high write volume
      payments: {
        payment: { id: 123, payment_type: "visa" ... }
      },
      products: {
        product: {                   // this will get updated a few times per day
          id: 12345, name: "yellow jacket", ...
        }
      },
      user: {                        // this will change once in a while
        id: 111, firstname: "kevin", ...
      }
    }

What is the best way to design this for fast search? I read that there are three ways of doing it:

1. Have everything you need in the document, but you'll have to update every affected document if anything changes.
2. Have only the searchable terms in the document, and do a second query if you need more detailed information.
3. Have only entity IDs in the inner objects. Very few updates are needed since foreign key IDs don't really change much, but you still have to do two queries each time to get the detailed information.

I understand this largely depends on the usage and environment, but is there some kind of best practice for sale indexes? I couldn't find any real-world Elasticsearch schema that I can reference. I really want to see how other people do it. I'm sure this problem has been solved before.

Can someone please give me their thoughts on this?
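To make the trade-off between options 1 and 3 above a bit more concrete, here is a small Python sketch using plain dicts instead of a real cluster. The sample sales, the second product, and the helper names (`denormalized`, `ids_only`, `docs_to_rewrite`) are all invented for illustration; only the `yellow jacket` product and the `kevin` user come from the example above. It just counts how many sale documents would have to be rewritten when one product changes.

```python
# Hypothetical reference data; in practice this would live in MySQL.
products = {
    12345: {"id": 12345, "name": "yellow jacket"},
    777: {"id": 777, "name": "blue scarf"},
}
users = {111: {"id": 111, "firstname": "kevin"}}

sales = [
    {"id": 1, "product_ids": [12345], "user_id": 111},
    {"id": 2, "product_ids": [777], "user_id": 111},
    {"id": 3, "product_ids": [12345, 777], "user_id": 111},
]


def denormalized(sale: dict) -> dict:
    """Option 1: embed full copies of the product and user objects."""
    return {
        "id": sale["id"],
        "products": [dict(products[p]) for p in sale["product_ids"]],
        "user": dict(users[sale["user_id"]]),
    }


def ids_only(sale: dict) -> dict:
    """Option 3: store only foreign keys; details need a second lookup."""
    return {
        "id": sale["id"],
        "product_ids": list(sale["product_ids"]),
        "user_id": sale["user_id"],
    }


def docs_to_rewrite(docs: list, product_id: int) -> list:
    """IDs of sale docs that embed a copy of the product and go stale when it changes."""
    return [
        d["id"]
        for d in docs
        if any(p["id"] == product_id for p in d.get("products", []))
    ]


full_docs = [denormalized(s) for s in sales]
slim_docs = [ids_only(s) for s in sales]

print(docs_to_rewrite(full_docs, 12345))  # sales 1 and 3 embed the product
print(docs_to_rewrite(slim_docs, 12345))  # nothing embedded, nothing to rewrite
```

The flip side, which the sketch does not show, is that with the slim documents you cannot search sales by product name without either a second query or keeping a few searchable fields in the document (option 2).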