Hey,
Your system sounds similar to the work done by Stu Hood at Rackspace in their
Mailtrust unit. See
http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data
for more details and inspiration.
Regards,
Jeff
On Thu, Jun 4, 2009 at 4:58 PM, silentsurfe...@yahoo.com wrote:
Hi,
It is encouraging to know that a Solr/Lucene solution may work.
Can anyone using Solr/Lucene for such a scenario confirm that the
solution is in use and working fine? That would be really helpful, as I
started looking into Solr/Lucene only a couple of days back, and it
might be difficult to be 100% confident before proposing this
approach in the next couple of days.
Thanks,
Surfer
--- On Thu, 6/4/09, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:
From: Otis Gospodnetic otis_gospodne...@yahoo.com
Subject: Re: Questions regarding IT search solution
To: solr-user@lucene.apache.org
Date: Thursday, June 4, 2009, 10:26 PM
My guess is Solr/Lucene would work. Not sure how well/fast, but it would,
esp. if you avoid range queries (or use tdate), and esp. if you
shard/segment indices smartly, so that at query time you send (or distribute
if you have to) the query to only those shards that have the data (if your
query is for a limited time period).
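A minimal sketch of the time-based shard routing described above, assuming one index per day. The host name, the `logs_YYYYMMDD` naming scheme, and both function names are hypothetical; the only Solr feature relied on is the `shards` request parameter used for distributed search, which takes a comma-separated list of shard addresses.

```python
from datetime import date, timedelta

# Hypothetical layout: one Solr index per day, named logs_YYYYMMDD.
# The host below is an assumption for illustration, not from the thread.
SHARD_HOST = "solr.example.com:8983/solr"


def shards_for_range(start, end):
    """Return the daily shard addresses covering [start, end], inclusive."""
    if end < start:
        raise ValueError("end must not be before start")
    days = (end - start).days + 1
    return [
        f"{SHARD_HOST}/logs_{(start + timedelta(days=i)):%Y%m%d}"
        for i in range(days)
    ]


def shards_param(start, end):
    """Build the value for Solr's `shards` parameter for a time-limited query."""
    return ",".join(shards_for_range(start, end))
```

A query limited to June 2-4, 2009 would then be sent with `shards=shards_param(date(2009, 6, 2), date(2009, 6, 4))`, so only three daily indices are searched instead of all of them.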
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message -----
From: Silent Surfer silentsurfe...@yahoo.com
To: solr-user@lucene.apache.org
Sent: Thursday, June 4, 2009 5:52:21 PM
Subject: Re: Questions regarding IT search solution
Hi,
As Alex correctly pointed out, my main intention is to figure out whether
Solr/Lucene offers the functionality to replicate what Splunk does in
terms of building indexes etc. to enable search.
We evaluated Splunk, but it is not a very cost-effective solution for us: we
may have logs running into a few GBs per day, as there can be around 25-30
servers running, and Splunk's licensing model is based on the size of logs per
day; on top of that, the license is valid for only 1 year.
With this background, any further inputs on this are greatly appreciated.
Thanks,
Surfer
--- On Thu, 6/4/09, Alexandre Rafalovitch wrote:
From: Alexandre Rafalovitch
Subject: Re: Questions regarding IT search solution
To: solr-user@lucene.apache.org
Date: Thursday, June 4, 2009, 9:27 PM
I would also be interested to know what other solutions exist.
Splunk's advantage is that it does extraction of the fields with
advanced searching functionality (it has lexers/parsers for multiple
content types). I believe that is the functionality the original posting
wants from Solr. At the time they came out (2004), I was not aware of
any good open source solutions to do what they did. And I would have
loved one, as I was analyzing multi-gigabyte logs.
Hadoop might be a way to process the files, but what would do the
indexing and searching?
Regards,
Alex.
On Thu, Jun 4, 2009 at 11:56 AM, Walter Underwood wrote:
Why build one? Don't those already exist?
Personally, I'd start with Hadoop instead of Solr. Putting logs in a
search index is guaranteed to not scale. People were already trying
different approaches ten years ago.
wunder
On 6/4/09 8:41 AM, Silent Surfer wrote:
Hi,
Any help/pointers on the following message would really help me.
Thanks,
Surfer
--- On Tue, 6/2/09, Silent Surfer wrote:
From: Silent Surfer
Subject: Questions regarding IT search solution
To: solr-user@lucene.apache.org
Date: Tuesday, June 2, 2009, 5:45 PM
Hi,
I am new to the Lucene forum and this is my first question. I need a
clarification from you.
Requirements:
1. Build an IT search tool for logs, similar to Splunk (only with respect to
searching logs, not reporting, graphs etc.), using Solr/Lucene. The log
files are mainly server logs such as JBoss and custom application server
logs (which may or may not be log4j logs), and file sizes can potentially
go up to 100 MB.
2. The logs are spread across multiple servers (25 to 30 servers).
3. Capability to search in almost real time.
4. Support distributed search.
Our search criteria can be based on a keyword, timestamp, IP address
etc.
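A rough sketch of how a log line could be split into searchable fields (timestamp, IP address, message text) before indexing. The log4j-style line layout, the field names, and the `to_document` helper are assumptions for illustration; real JBoss or custom application logs would need their own patterns.

```python
import re
from datetime import datetime

# Assumed log4j-style layout: "YYYY-MM-DD HH:MM:SS,mmm LEVEL message".
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\s+"
    r"(?P<level>[A-Z]+)\s+(?P<message>.*)$"
)
# Loose IPv4 pattern; it does not validate that each octet is <= 255.
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def to_document(line, host):
    """Turn one log line into a dict of fields, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
    return {
        "host": host,
        # ISO 8601 form; this assumes the log timestamps are already in UTC.
        "timestamp": ts.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "level": m.group("level"),
        "ips": IP_RE.findall(m.group("message")),
        "message": m.group("message"),
    }
```

Each resulting dict maps onto one indexed document, so a keyword search runs against `message` while timestamp and IP lookups use their own fields.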
Can anyone shed some light on whether Solr/Lucene is the right solution for
this?
Appreciate any quick help in this regard.
Thanks,
Surfer