Re: SOLR for Log analysis feasibility
My thoughts exactly: it may seem fairly straightforward, but I fear the day a client asks for a perfectly reasonable new feature in their report and Solr simply cannot support it. I am hoping we won't have the real scalability issues Loggly has, because we don't index and store large documents of data within Solr; most of our documents will be very small.

Does anyone have any experience with using field collapsing in a production environment?

Thank you for all your replies.

Joe
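To make the scalability question concrete: once a single index becomes the bottleneck, Solr's distributed search fans a query out over several shards and merges the results. A minimal sketch, assuming two hypothetical hosts that each hold part of the corpus (documents must be partitioned between them at index time):

  http://solr1:8983/solr/select?q=itemId:item123-v1&shards=solr1:8983/solr,solr2:8983/solr

With documents this small, a single well-provisioned box may well carry the 100+ million records mentioned in the thread before sharding is needed, but the escape hatch is there.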
Re: SOLR for Log analysis feasibility
We do a lot of precisely this sort of thing. Ours is a commercial product (Honeycomb Lexicon) that extracts behavioural information from logs, events and network data (don't worry, I'm not pushing this on you!) - only to say that there are a lot of considerations beyond base Solr when it comes to handling log, event and other 'transient' data streams.

Aside from the obvious issues of horizontal scaling, reliable delivery/retry/replication etc., there are other important issues, particularly with regard to data classification, reporting engines and numerous other items. It's one of those things that sounds perfectly reasonable at the outset, but all sorts of things crop up the deeper you get into it.

Peter

On Tue, Nov 30, 2010 at 11:44 AM, phoey wrote:
>
> We are looking into building a reporting feature and investigating
> solutions which will allow us to search through our logs for downloads,
> searches and view history.
>
> Each log item is relatively small.
>
> download history (field values):
>
>   item123-v1
>   photography
>   item 1
>   1
>   1
>   hires
>   123
>   2009-11-07T14:50:54Z
>
> search history (field values):
>
>   1
>   brand assets
>   1
>   2009-11-07T14:50:54Z
>
> view history (field values):
>
>   1
>   123
>   1
>   2009-11-07T14:50:54Z
>
> and we reckon that we could have around 10-30 million log records for
> each type (downloads, searches, views), so 70 million records in total,
> but it obviously must scale higher.
>
> Concurrent users will be around 10-20 (relatively low).
>
> New logs will be imported as a batch overnight.
>
> Because we have some previous experience with Solr, and because the
> interface needs to have full-text searching and filtering, we built a
> prototype using Solr 4.0. We used the new field collapsing feature in
> Solr 4.0 to collapse on groups of data. For example, view history needs
> to collapse on itemId. Each row will then show how many views the item
> has had, taken from the number of documents in its group.
>
> The requirements for the solution are that it be schemaless, to make
> adding new fields to new documents easier, and that it have a powerful
> search interface; Solr can do both.
>
> QUESTIONS
>
> Our prototype is working as expected, but I'm unsure about a few things:
>
> 1. Has anyone got experience with using Solr for log analysis?
> 2. Solr can scale, but at what point should I start considering sharding
>    the index? It should be fine with 100+ million records.
> 3. We are using a nightly build of Solr for the "field collapsing"
>    feature. Would it be possible to patch Solr 1.4.1 with the SOLR-236
>    patch? Has anyone used this in production?
>
> thanks
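For reference, the collapsing query described above would look something like this on a grouping-enabled build (the itemId field name comes from the post; the type field used to select view-history records is an assumption):

  http://localhost:8983/solr/select?q=*:*&fq=type:view&group=true&group.field=itemId

Each group in the response carries its own numFound, which gives the per-item view frequency directly. Parameter names have shifted between nightlies, so treat this as a sketch rather than a recipe for any particular build.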
Re: SOLR for Log analysis feasibility
I know it's not Solr, but perhaps you should have a look at this:

http://www.cloudera.com/blog/2010/09/using-flume-to-collect-apache-2-web-server-logs/

On Tue, Nov 30, 2010 at 12:58 PM, Peter Karich wrote:
> take a look into this: http://vimeo.com/16102543
>
> for that amount of data it isn't that easy :-)
>
> --
> http://jetwick.com twitter search prototype
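To give a flavour of what the linked post sets up: Flume tails the web server log and forwards each line to a downstream sink. A minimal sketch in Flume's properties-file configuration (this is the style of later Flume releases, not the syntax in the linked post, and the agent, source, channel and sink names are all placeholders):

  agent.sources = apache
  agent.channels = mem
  agent.sinks = console

  agent.sources.apache.type = exec
  agent.sources.apache.command = tail -F /var/log/apache2/access.log
  agent.sources.apache.channels = mem

  agent.channels.mem.type = memory

  agent.sinks.console.type = logger
  agent.sinks.console.channel = mem

The sink could just as well feed an indexer that posts each batch to Solr, which would fit the overnight batch-import plan in the original post.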
Re: SOLR for Log analysis feasibility
take a look into this: http://vimeo.com/16102543

for that amount of data it isn't that easy :-)

--
http://jetwick.com twitter search prototype
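On the schemaless requirement from the original post: Solr is not schemaless out of the box, but dynamic fields get close, letting new fields appear in documents without schema edits as long as they follow a naming convention. A minimal schema.xml sketch (the suffix convention and the fieldType names are the stock example-schema ones, assumed here):

  <dynamicField name="*_s"  type="string" indexed="true" stored="true"/>
  <dynamicField name="*_i"  type="int"    indexed="true" stored="true"/>
  <dynamicField name="*_dt" type="date"   indexed="true" stored="true"/>

A new log attribute can then be indexed as, say, resolution_s or userId_i without touching the schema.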