Re: Questions regarding IT search solution
Hi Jeff,

Thanks for the link. You are my lifesaver :) This is very similar to what I am looking for.

Thanks,
Surfer

--- On Fri, 6/5/09, Jeff Hammerbacher ham...@cloudera.com wrote:

From: Jeff Hammerbacher ham...@cloudera.com
Subject: Re: Questions regarding IT search solution
To: solr-user@lucene.apache.org, silentsurfe...@yahoo.com
Date: Friday, June 5, 2009, 12:15 AM

Hey,

Your system sounds similar to the work done by Stu Hood at Rackspace in their Mailtrust unit. See http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data for more details and inspiration.

Regards,
Jeff

On Thu, Jun 4, 2009 at 4:58 PM, silentsurfe...@yahoo.com wrote:

Hi,

It is encouraging to know that a Solr/Lucene solution may work. Can anyone using Solr/Lucene for such a scenario confirm that the solution is in use and working fine? That would be really helpful, as I started looking into Solr/Lucene only a couple of days ago, and it may be difficult to be 100% confident before proposing the approach in the next couple of days.

Thanks,
Surfer

--- On Thu, 6/4/09, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:

From: Otis Gospodnetic otis_gospodne...@yahoo.com
Subject: Re: Questions regarding IT search solution
To: solr-user@lucene.apache.org
Date: Thursday, June 4, 2009, 10:26 PM

My guess is Solr/Lucene would work. Not sure how well/fast, but it would, esp. if you avoid range queries (or use tdate), and esp. if you shard/segment indices smartly, so that at query time you send (or distribute if you have to) the query to only those shards that have the data (if your query is for a limited time period).

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Silent Surfer silentsurfe...@yahoo.com
To: solr-user@lucene.apache.org
Sent: Thursday, June 4, 2009 5:52:21 PM
Subject: Re: Questions regarding IT search solution

Hi,

As Alex correctly pointed out, my main intention is to figure out whether Solr/Lucene offers the functionality to replicate what Splunk does in terms of building indexes etc. to enable search. We evaluated Splunk, but it is not a very cost-effective solution for us: we may have logs running into a few GB per day, as there can be around 25 to 30 servers running, and Splunk's licensing model is based on the size of logs per day. On top of that, the license is valid for only one year.

With this background, any further input on this is greatly appreciated.

Thanks,
Surfer

--- On Thu, 6/4/09, Alexandre Rafalovitch wrote:

From: Alexandre Rafalovitch
Subject: Re: Questions regarding IT search solution
To: solr-user@lucene.apache.org
Date: Thursday, June 4, 2009, 9:27 PM

I would also be interested to know what other solutions exist.

Splunk's advantage is that it extracts fields and offers advanced search functionality (it has lexers/parsers for multiple content types). I believe that is the functionality the original posting wants from Solr.

At the time they came out (2004), I was not aware of any good open source solutions that did what they did. And I would have loved one, as I was analyzing multi-gigabyte logs.

Hadoop might be a way to process the files, but what would do the indexing and searching?

Regards,
Alex.

On Thu, Jun 4, 2009 at 11:56 AM, Walter Underwood wrote:

Why build one? Don't those already exist?

Personally, I'd start with Hadoop instead of Solr. Putting logs in a search index is guaranteed to not scale. People were already trying different approaches ten years ago.

wunder

On 6/4/09 8:41 AM, Silent Surfer wrote:

Hi,

Any help/pointers on the following message would really help me.

Thanks,
Surfer

--- On Tue, 6/2/09, Silent Surfer wrote:

From: Silent Surfer
Subject: Questions regarding IT search solution
To: solr-user@lucene.apache.org
Date: Tuesday, June 2, 2009, 5:45 PM

Hi,

I am new to the Lucene forum and this is my first question. I need a clarification from you.

Requirement:
1. Build an IT search tool for logs similar to Splunk (only with respect to searching logs, not reporting, graphs, etc.) using Solr/Lucene. The log files are mainly server logs such as JBoss and custom application server logs (which may or may not be log4j logs), and file sizes can potentially go up to 100 MB.
2. The logs are spread across multiple servers (25 to 30 servers).
3. Capability to search in almost real time.
4. Support distributed search.

Our search criteria can be based on a keyword, timestamp, IP address, etc.

Can anyone shed some light on whether Solr/Lucene is the right solution for this? I appreciate any quick help in this regard.

Thanks,
Surfer
Re: Questions regarding IT search solution
Hi,

Any help/pointers on the following message would really help me.

Thanks,
Surfer

--- On Tue, 6/2/09, Silent Surfer silentsurfe...@yahoo.com wrote:

From: Silent Surfer silentsurfe...@yahoo.com
Subject: Questions regarding IT search solution
To: solr-user@lucene.apache.org
Date: Tuesday, June 2, 2009, 5:45 PM

Hi,

I am new to the Lucene forum and this is my first question. I need a clarification from you.

Requirement:
1. Build an IT search tool for logs similar to Splunk (only with respect to searching logs, not reporting, graphs, etc.) using Solr/Lucene. The log files are mainly server logs such as JBoss and custom application server logs (which may or may not be log4j logs), and file sizes can potentially go up to 100 MB.
2. The logs are spread across multiple servers (25 to 30 servers).
3. Capability to search in almost real time.
4. Support distributed search.

Our search criteria can be based on a keyword, timestamp, IP address, etc.

Can anyone shed some light on whether Solr/Lucene is the right solution for this? I appreciate any quick help in this regard.

Thanks,
Surfer
Re: Questions regarding IT search solution
Why build one? Don't those already exist?

Personally, I'd start with Hadoop instead of Solr. Putting logs in a search index is guaranteed to not scale. People were already trying different approaches ten years ago.

wunder
Re: Questions regarding IT search solution
I would also be interested to know what other solutions exist.

Splunk's advantage is that it extracts fields and offers advanced search functionality (it has lexers/parsers for multiple content types). I believe that is the functionality the original posting wants from Solr.

At the time they came out (2004), I was not aware of any good open source solutions that did what they did. And I would have loved one, as I was analyzing multi-gigabyte logs.

Hadoop might be a way to process the files, but what would do the indexing and searching?

Regards,
Alex.
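
For readers wondering what the field extraction mentioned above might look like before anything reaches an index, here is a minimal sketch. It is not how Splunk or Solr do it. The log4j-style line layout, the field names, and the `parse_line` helper are all assumptions made for illustration, and the timestamp is assumed to already be in UTC.

```python
import re
from datetime import datetime

# Assumed (hypothetical) line layout, loosely log4j-like:
#   2009-06-04 17:52:21,123 ERROR [org.jboss.web] some message text
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"(?P<message>.*)$"
)

# Loose IPv4 pattern; it does not validate that each octet is <= 255.
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def parse_line(line, host):
    """Turn one log line into a dict of searchable fields, or None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    # Assumption: the log timestamp is already UTC, so we only reformat it.
    ts = datetime.strptime(match.group("timestamp"), "%Y-%m-%d %H:%M:%S")
    return {
        "host": host,
        "timestamp": ts.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "level": match.group("level"),
        "logger": match.group("logger"),
        "message": match.group("message"),
        "ip": IP_RE.findall(match.group("message")),
    }


if __name__ == "__main__":
    sample = "2009-06-04 17:52:21,123 ERROR [org.jboss.web] Connection refused from 10.0.0.12"
    print(parse_line(sample, host="app01"))
```

Each resulting dict carries the three criteria from the original question (keyword in `message`, `timestamp`, and `ip`) as separate fields, so whatever does the indexing can treat them separately. Custom application logs would each need their own pattern.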
Re: Questions regarding IT search solution
Hi,

As Alex correctly pointed out, my main intention is to figure out whether Solr/Lucene offers the functionality to replicate what Splunk does in terms of building indexes etc. to enable search. We evaluated Splunk, but it is not a very cost-effective solution for us: we may have logs running into a few GB per day, as there can be around 25 to 30 servers running, and Splunk's licensing model is based on the size of logs per day. On top of that, the license is valid for only one year.

With this background, any further input on this is greatly appreciated.

Thanks,
Surfer
Re: Questions regarding IT search solution
My guess is Solr/Lucene would work. Not sure how well/fast, but it would, esp. if you avoid range queries (or use tdate), and esp. if you shard/segment indices smartly, so that at query time you send (or distribute if you have to) the query to only those shards that have the data (if your query is for a limited time period).

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
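
The idea of sending a time-limited query only to the shards that hold the data can be sketched roughly as follows. This is an illustration under stated assumptions, not a recommended deployment: the one-core-per-day layout, the `logs_YYYYMMDD` naming, the `solr.example.com` host, and the `message` field are all hypothetical. The only Solr feature relied on is the `shards` request parameter used for distributed search.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Hypothetical layout: one Solr core per day of logs, all on one host.
SHARD_HOST = "solr.example.com:8983"


def shards_for_range(start, end):
    """Return the daily shard addresses covering start..end (inclusive)."""
    if end < start:
        raise ValueError("end date is before start date")
    shards = []
    for offset in range((end - start).days + 1):
        day = start + timedelta(days=offset)
        shards.append(f"{SHARD_HOST}/solr/logs_{day:%Y%m%d}")
    return shards


def build_query(keyword, start, end):
    """Build a query string that targets only the shards for the given days.

    Because the shards are day-aligned, a whole-day time window needs no
    range clause on the timestamp field at all.
    """
    params = {
        "q": f"message:{keyword}",  # 'message' is an assumed field name
        "shards": ",".join(shards_for_range(start, end)),
    }
    return urlencode(params)


if __name__ == "__main__":
    print(build_query("OutOfMemoryError", date(2009, 6, 2), date(2009, 6, 4)))
```

A query for three days touches three shards regardless of how much history is kept. Windows narrower than a day would still need a filter on the timestamp field inside the selected shard.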
Re: Questions regarding IT search solution
Hi,

It is encouraging to know that a Solr/Lucene solution may work. Can anyone using Solr/Lucene for such a scenario confirm that the solution is in use and working fine? That would be really helpful, as I started looking into Solr/Lucene only a couple of days ago, and it may be difficult to be 100% confident before proposing the approach in the next couple of days.

Thanks,
Surfer
Re: Questions regarding IT search solution
Hey,

Your system sounds similar to the work done by Stu Hood at Rackspace in their Mailtrust unit. See http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data for more details and inspiration.

Regards,
Jeff
Questions regarding IT search solution
Hi,

I am new to the Lucene forum and this is my first question. I need a clarification from you.

Requirement:
1. Build an IT search tool for logs similar to Splunk (only with respect to searching logs, not reporting, graphs, etc.) using Solr/Lucene. The log files are mainly server logs such as JBoss and custom application server logs (which may or may not be log4j logs), and file sizes can potentially go up to 100 MB.
2. The logs are spread across multiple servers (25 to 30 servers).
3. Capability to search in almost real time.
4. Support distributed search.

Our search criteria can be based on a keyword, timestamp, IP address, etc.

Can anyone shed some light on whether Solr/Lucene is the right solution for this? I appreciate any quick help in this regard.

Thanks,
Surfer

Thanks,
Tiru