Re: Query time only Ranges

2010-03-31 Thread Silent Surfer
Small typo. Corrected and resending.

the query [1900-01-01T16:43:42Z/HOUR TO 1900-01-01T18:55:23Z/HOUR]
would be equivalent to
[1900-01-01T16:00:00Z TO 1900-01-01T18:00:00Z]
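
For reference, a minimal SolrJ sketch of the same trick (the field name "eventtime" is an assumption, not from this thread):

---
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TimeOnlyRange {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        // /HOUR date math rounds both endpoints down to the hour, so any
        // time-of-day input collapses onto the fixed-date hour grid.
        SolrQuery query = new SolrQuery(
            "eventtime:[1900-01-01T16:43:42Z/HOUR TO 1900-01-01T18:55:23Z/HOUR]");
        QueryResponse response = solr.query(query);
        System.out.println("Hits: " + response.getResults().getNumFound());
    }
}
---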


Thx,
Tiru


- Original Message 
From: Silent Surfer 
To: solr-user@lucene.apache.org
Sent: Wed, March 31, 2010 12:36:22 PM
Subject: Re: Query time only Ranges

Hi Ankit,

Try the following approach.
create a query like [1900-01-01T16:00:00Z/HOUR TO 1900-01-01T18:00:00Z/HOUR]

Solr will automatically take care of rounding down to the HOUR specified.

For example:
the query [1900-01-01T16:43:42Z/HOUR TO 1900-01-01T18:55:23Z/HOUR]
would be equivalent to
[1900-01-01T16:00:00Z/HOUR TO 1900-01-01T18:00:00Z/HOUR]

Regards,
sS


- Original Message 
From: "abhatna...@vantage.com" 
To: solr-user@lucene.apache.org
Sent: Wed, March 31, 2010 9:56:38 AM
Subject: Query time only Ranges


Hi All,

I am working on a use case wherein I need to query just time ranges,
without a date component.

search for docs between 4pm and 6pm

Approaches -
create something like [1900-01-01T16:00:00Z TO 1900-01-01T18:00:00Z] - a
fixed date component

or

create a field for the hour (hh) only

or maybe create a custom field for time only


Please suggest which would be a good approach, or any other approach if
possible.


Ankit



-- 
View this message in context: 
http://n3.nabble.com/Query-time-only-Ranges-tp688831p688831.html
Sent from the Solr - User mailing list archive at Nabble.com.


  




Re: DIH - Unable to connect to a DB using JDBC:ODBC bridge

2010-03-31 Thread Silent Surfer
Hi Mitch,

The configuration that you have seems to be perfectly fine.
Could you please let us know what error you are seeing in the logs?

Also, could you please confirm whether you have the
mysql-connector-java-5.1.12-bin.jar under the lib folder?

The following is the configuration that I used, and it works perfectly fine:
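
(A minimal sketch of such a configuration; the host, database name, and credentials below are placeholders, not the original values.)

---
<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/db"
            user="solr" password="solr"/>
---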



Thanks,
sS


- Original Message 
From: MitchK 
To: solr-user@lucene.apache.org
Sent: Wed, March 31, 2010 12:57:04 AM
Subject: Re: DIH - Unable to connect to a DB using JDBC:ODBC bridge


Hi,

sorry, I don't have much experience doing this with Solr, but my
data-config.xml looks like:
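
(A minimal sketch of that shape; the table and field names are placeholders, not Mitch's original values.)

---
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/db"
              user="root" password=""/>
  <document>
    <entity name="item" query="SELECT id, name FROM item">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
    </entity>
  </document>
</dataConfig>
---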
The "db" at the end of the url stands for the db you want to use. 

Perhaps this helps a little bit.

Kind regards
- Mitch
-- 
View this message in context: 
http://n3.nabble.com/DIH-Unable-to-connect-to-a-DB-using-JDBC-ODBC-bridge-tp686781p687887.html
Sent from the Solr - User mailing list archive at Nabble.com.



  



Re: multicore query via solrJ

2009-10-23 Thread Silent Surfer
Hi Lici,

You may want to try the following snippet

---
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.params.ModifiableSolrParams;

SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

ModifiableSolrParams params = new ModifiableSolrParams();

params.set("wt", "json"); // Response writer; can be json, standard, ...
params.set("rows", rowsToFetch); // # of rows to fetch
params.set("start", startingRow); // Starting record
params.set("shards",
    "localhost:8983/solr,localhost:8984/solr,localhost:8985/solr"); // Shard URLs
...
params.set("q", queryStr.toString()); // User query
QueryResponse response = solr.query(params);
SolrDocumentList docs = response.getResults();
---

Thanks,
sS

--- On Fri, 10/23/09, Licinio Fernández Maurelo  
wrote:

> From: Licinio Fernández Maurelo 
> Subject: Re: multicore query via solrJ
> To: solr-user@lucene.apache.org
> Date: Friday, October 23, 2009, 7:30 AM
> As no answer is given, I assume it's
> not possible. It will be great to code
> a method like this
> 
> query(SolrServer,  List)
> 
> 
> 
> El 20 de octubre de 2009 11:21, Licinio Fernández Maurelo
> <
> licinio.fernan...@gmail.com>
> escribió:
> 
> > Hi there,
> > is there any way to perform a multi-core query using
> solrj?
> >
> > P.S.:
> >
> > I know about this syntax:
> > http://localhost:8983/solr/core0/select?shards=localhost:8983/solr/core0,localhost:8983/solr/core1&q=
> > but i'm looking for a more fancy way to do this using
> solrj (something like
> > shards(query) )
> >
> > thx
> >
> >
> >
> > --
> > Lici
> >
> 
> 
> 
> -- 
> Lici
> 






Re: Can we point a Solr server to index directory dynamically at runtime..

2009-09-25 Thread Silent Surfer
Hi Michael,

We are storing all our data in addition to indexing it, as we need to display
those values to the user. So unfortunately we cannot go with the option
stored=false, which could potentially have solved our issue.

Appreciate any other pointers/suggestions

Thanks,
sS

--- On Fri, 9/25/09, Michael  wrote:

> From: Michael 
> Subject: Re: Can we point a Solr server to index directory dynamically at  
> runtime..
> To: solr-user@lucene.apache.org
> Date: Friday, September 25, 2009, 2:00 PM
> Are you storing (in addition to
> indexing) your data?  Perhaps you could turn
> off storage on data older than 7 days (requires
> reindexing), thus losing the
> ability to return snippets but cutting down on your storage
> space and server
> count.  I've experienced 10x decrease in space
> requirements and a large
> boost in speed after cutting extraneous storage from Solr
> -- the stored data
> is mixed in with the index data and so it slows down
> searches.
> You could also put all 200G onto one Solr instance rather
> than 10 for >7days
> data, and accept that those searches will be slower.
> 
> Michael
> 
> On Fri, Sep 25, 2009 at 1:34 AM, Silent Surfer  wrote:
>
> > Hi,
> >
> > Thank you Michael and Chris for the response.
> >
> > Today after the mail from Michael, we tested the dynamic loading of
> > cores and it worked well. So we need to go with the hybrid approach of
> > multicore and distributed searching.
> >
> > As per our testing, we found that a Solr instance with 20 GB of index
> > (single index or spread across multiple cores) can provide better
> > performance when compared to a Solr instance with, say, 40 or 50 GB of
> > index (single index or index spread across cores).
> >
> > So the 200 GB of index on day 1 will be spread across 200/20 = 10 Solr
> > slave instances.
> >
> > For day 2's data, 10 more Solr slave servers are required; cumulative
> > Solr slave instances = 200*2/20 = 20
> > ...
> > For day 30's data, 10 more Solr slave servers are required; cumulative
> > Solr slave instances = 200*30/20 = 300
> >
> > So with the above approach, we may need ~300 Solr slave instances,
> > which becomes very unmanageable.
> >
> > But we know that most of the queries are for the past week, i.e. we
> > definitely need 70 Solr slaves containing the last 7 days worth of
> > data up and running.
> >
> > Now for the rest of the 230 Solr instances, do we need to keep them
> > running for the odd query that can span across the 30 days of data
> > (30*200 GB = 6 TB) and which may come up only a couple of times a day?
> > This linear increase of Solr servers with the retention period doesn't
> > seem to be a very scalable solution.
> >
> > So we are looking for a simpler approach to handle this scenario.
> >
> > Appreciate any further inputs/suggestions.
> >
> > Regards,
> > sS
> >
> >
> > --- On Fri, 9/25/09, Chris Hostetter  wrote:
> >
> > > From: Chris Hostetter 
> > > Subject: Re: Can we point a Solr server to index directory
> > > dynamically at runtime..
> > > To: solr-user@lucene.apache.org
> > > Date: Friday, September 25, 2009, 4:04 AM
> > > : Using a multicore approach, you could send a "create a core named
> > > : 'core3weeksold' pointing to '/datadirs/3weeksold' " command to a
> > > : live Solr, which would spin it up on the fly.  Then you query it,
> > > : and maybe keep it spun up until it's not queried for 60 seconds or
> > > : something, then send a "remove core 'core3weeksold' " command.
> > > : See http://wiki.apache.org/solr/CoreAdmin#CoreAdminHandler .
> > >
> > > something that seems implicit in the question is what to do when the
> > > request spans all of the data ... this is where (in theory)
> > > distributed searching could help you out.
> > >
> > > index each day's worth of data into its own core, that makes it
> > > really easy to expire the old data (just UNLOAD and delete an entire
> > > core once it's more than 30 days old). if your user is only
> > > searching "current" data then your app can directly query the core
> > > containing the most current data -- but if they want to query the
> > > last week, or last two weeks worth of data, you do a distributed
> > > request for all of the shards needed to search the appropriate
> > > amount of data.
> > >
> > > Between the ALIAS and SWAP commands on the CoreAdmin screen it
> > > should be pretty easy to have cores with names like "today",
> > > "1dayold", "2dayold" so that your app can configure simple shard
> > > params for all the permutations you'll need to query.
> > >
> > > -Hoss
> > >
> >
> >
> >
> >
> >
> >
> 
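
For what it's worth, a sketch of the create/unload cycle Hoss describes, using SolrJ's CoreAdminRequest (the core name and directory below are hypothetical):

---
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class SpinUpDormantCore {
    public static void main(String[] args) throws Exception {
        // Core admin commands go to the container URL, not an individual core
        SolrServer admin = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // Spin up a core over a dormant index directory
        CoreAdminRequest.createCore("core3weeksold", "/datadirs/3weeksold", admin);

        // ... query http://localhost:8983/solr/core3weeksold while it is loaded ...

        // Unload it again once it has been idle long enough
        CoreAdminRequest.unloadCore("core3weeksold", admin);
    }
}
---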






Re: Can we point a Solr server to index directory dynamically at runtime..

2009-09-24 Thread Silent Surfer
Hi,

Thank you Michael and Chris for the response. 

Today after the mail from Michael, we tested the dynamic loading of cores
and it worked well. So we need to go with the hybrid approach of multicore and
distributed searching.

As per our testing, we found that a Solr instance with 20 GB of index (single
index or spread across multiple cores) can provide better performance when
compared to a Solr instance with, say, 40 or 50 GB of index (single index or
index spread across cores).

So the 200 GB of index on day 1 will be spread across 200/20 = 10 Solr slave
instances.

For day 2's data, 10 more Solr slave servers are required; cumulative Solr
slave instances = 200*2/20 = 20
...
For day 30's data, 10 more Solr slave servers are required; cumulative Solr
slave instances = 200*30/20 = 300

So with the above approach, we may need ~300 Solr slave instances, which 
becomes very unmanageable.

But we know that most of the queries are for the past week, i.e. we definitely
need 70 Solr slaves containing the last 7 days worth of data up and running.

Now for the rest of the 230 Solr instances, do we need to keep them running for
the odd query that can span across the 30 days of data (30*200 GB = 6 TB) and
which may come up only a couple of times a day?
This linear increase of Solr servers with the retention period doesn't seem to
be a very scalable solution.

So we are looking for a simpler approach to handle this scenario.

Appreciate any further inputs/suggestions.

Regards,
sS

--- On Fri, 9/25/09, Chris Hostetter  wrote:

> From: Chris Hostetter 
> Subject: Re: Can we point a Solr server to index directory dynamically at  
> runtime..
> To: solr-user@lucene.apache.org
> Date: Friday, September 25, 2009, 4:04 AM
> : Using a multicore approach, you
> could send a "create a core named
> : 'core3weeksold' pointing to '/datadirs/3weeksold' "
> command to a live Solr,
> : which would spin it up on the fly.  Then you query
> it, and maybe keep it
> : spun up until it's not queried for 60 seconds or
> something, then send a
> : "remove core 'core3weeksold' " command.
> : See http://wiki.apache.org/solr/CoreAdmin#CoreAdminHandler
> .
> 
> something that seems implicit in the question is what to do
> when the 
> request spans all of the data ... this is where (in theory)
> distributed 
> searching could help you out.
> 
> index each day's worth of data into its own core, that
> makes it really
> easy to expire the old data (just UNLOAD and delete an
> entire core once
> it's more than 30 days old). if your user is only searching
> "current" data
> then your app can directly query the core containing the
> most current data
> -- but if they want to query the last week, or last two
> weeks worth of
> data, you do a distributed request for all of the shards
> needed to search
> the appropriate amount of data.
> 
> Between the ALIAS and SWAP commands on the CoreAdmin
> screen it should
> be pretty easy to have cores with names like
> "today", "1dayold", "2dayold" so
> that your app can configure simple shard params for all the
> permutations
> you'll need to query.
> 
> 
> -Hoss
> 
>







Can we point a Solr server to index directory dynamically at runtime..

2009-09-23 Thread Silent Surfer
Hi,

Is there any way to dynamically point the Solr servers to index/data
directories at run time?

We are generating 200 GB worth of index per day and we want to retain the index 
for approximately 1 month. So our idea is to keep the first week of index
available at any time for the users, i.e. have a set of Solr servers up and
running to handle requests for the past week's data.

But when a user tries to query data which is older than 7 days, we want to
dynamically point the existing Solr instances to the inactive/dormant indexes
and get the results.

The main intention is to limit the number of Solr slave instances and thereby
limit the number of servers required.

If the index directories and Solr instances are tightly coupled, then most of
the Solr instances are just up and running yet hardly used, as most users are
mainly interested in the past week's data and not beyond that.

Any thoughts or any other approaches to tackle this would be greatly 
appreciated.

Thanks,
sS


  



Query regarding incremental index replication

2009-09-09 Thread Silent Surfer
Hi,

Currently we are using Solr 1.3 and we have the following requirement.

As we need to process very high volumes of documents (of the order of 400 GB
per day), we are planning to separate the indexer(s) and searcher(s), so that
there won't be a performance hit.

Our idea is to have a set of servers used only as indexers for index creation,
and then every 5 minutes or so the index is copied to the searchers (a set of
Solr servers used only for querying). For this we tried to use snapshooter,
rsync, etc.

But the problem with this approach is that the same index is present on both
the indexer and the searcher, hence occupying twice the file system space.

What we need is a mechanism wherein the indexer contains only the index for
the past 5 minutes (the last indexing cycle before snapshooter runs) and the
searcher has the accumulated (total) index, i.e. every 5 minutes we should be
able to move the entire index from indexer to searcher, and so on.

The above scenario is slightly different from the master/slave implementation,
as on the master we want only the latest (work-in-progress) index while the
slave should contain the entire index.

Appreciate if anyone can throw some light on how to achieve this.

Thanks,
sS


  



Re: date field

2009-09-08 Thread Silent Surfer
Hi,

If you have not gone live already, I would suggest using a long field instead
of a date field. According to our testing, searches based on date fields are
very slow compared to searches based on long fields.

You can use System.currentTimeMillis() to get the time.
When showing it to the user, apply a date formatter.

When taking input from the user, let them enter whatever date they want, and
then you can convert it to a long and do your searches based on that.

Experts can pitch in with any other ideas..
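
A minimal sketch of that round trip, assuming a sortable long field named "timestamp" (both the field name and type are assumptions):

---
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class EpochMillis {
    public static void main(String[] args) throws Exception {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));

        // Index side: store the event time as a long (epoch milliseconds)
        long millis = System.currentTimeMillis();

        // Display side: format the stored long back into a date string
        System.out.println(fmt.format(new Date(millis)));

        // Query side: convert the user's day into a millisecond range query
        long from = fmt.parse("2009-05-01T00:00:00Z").getTime();
        long to   = fmt.parse("2009-05-02T00:00:00Z").getTime();
        System.out.println("timestamp:[" + from + " TO " + to + "]");
    }
}
---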

Thanks,
sS

--- On Tue, 9/8/09, Gérard Dupont  wrote:

> From: Gérard Dupont 
> Subject: date field
> To: solr-user@lucene.apache.org
> Cc: "Nicolas Bureau" 
> Date: Tuesday, September 8, 2009, 8:51 AM
> Hi all,
> 
> I'm currently facing a little difficulty to index and
> search on date field.
> The indexing is done in the right way (I guess) and I can
> find valid date in
> the field like "2009-05-01T12:45:32Z". However when I'm
> searching the user
> don't always give an exact date. for instance they give
> "2008-05-01" to get
> all documents related to that day.  I can do a trick
> using wildcard but is
> there another way to do it ? Moreover if they give the full
> date string (or
> if I hack the query parser) I can have the full syntax, but
> then the ":"
> annoy me because the Lucene parser does not allow it
> without quotes. Any
> ideas ?
> 
> -- 
> Gérard Dupont
> Information Processing Control and Cognition (IPCC) - EADS
> DS
> http://weblab.forge.ow2.org
> 
> Document & Learning team - LITIS Laboratory
> 


 



Impact of compressed=true attribute (in schema.xml) on Indexing/Query

2009-08-29 Thread Silent Surfer
Hi,

We observed that when we use the setting compressed=true, the index size is
around 0.66 times the size of the actual log file, whereas if we do not set
compressed=true, the index size is almost 2.6 times the log size.

Our sample Solr document size is approximately 1000 bytes. In addition to the
text data we have around 9 metadata tags associated with it.

We need to display all of the metadata values on the GUI, and hence we are
setting stored=true in our schema.xml.
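
For context, compressed=true is a per-field attribute on stored values; a schema.xml line of the shape in question (the field name is an assumption) would be:

---
<field name="message" type="text" indexed="true" stored="true" compressed="true"/>
---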

Now the question is how the compressed=true flag impacts the indexing and
querying operations. I am sure that there will be CPU utilization spikes, as
there will be compression (during indexing) and decompression (during querying)
of the stored data. I am mainly looking for any benchmarks for the above
scenario.

The expected volume of incoming data is approximately 400 GB per day, so it is
very important for us to evaluate compressed=true, due to the file system
utilization and index sizing issues.

Any help would be greatly appreciated..

Thanks,
sS


  



How to reduce the Solr index size..

2009-08-20 Thread Silent Surfer
Hi,

I am a newbie to Solr. We recently started using Solr.

We are using Solr to process server logs. We are indexing each line of the
logs, so that users can do fine-grained searches down to the second or
millisecond.

What we are observing is that the index being created is almost double the
size of the actual logs, i.e. if the log size is say 1 MB, the index size is
around 2 MB.

Could anyone let us know what can be done to reduce the index size? Do we need
to change any configuration, or delete any files which are created during the
indexing process but not required for searching?

Our schema is as follows:
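
(An illustrative shape only; all field names and types except message are placeholder assumptions.)

---
<fields>
   <field name="id"        type="string" indexed="true" stored="true"/>
   <field name="timestamp" type="slong"  indexed="true" stored="true"/>
   <field name="host"      type="string" indexed="true" stored="true"/>
   <field name="severity"  type="string" indexed="true" stored="true"/>
   <field name="message"   type="text"   indexed="true" stored="true"/>
</fields>
---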



The message field holds the actual log text.

Thanks,
sS


  



Re: Limit of Index size per machine..

2009-08-05 Thread Silent Surfer

Hi,

We initially went down the Hadoop path, but as it is one more software-based
file system on top of the OS file system, we didn't get buy-in from our system
engineers, i.e. if we run into any HDFS issues, the SEs won't be supporting
us :(

Regards,
sS

--- On Thu, 8/6/09, Walter Underwood  wrote:

> From: Walter Underwood 
> Subject: Re: Limit of Index size per machine..
> To: solr-user@lucene.apache.org
> Date: Thursday, August 6, 2009, 5:12 AM
> That is why people don't use search
> engines to manage logs. Look at a  
> Hadoop cluster.
> 
> wunder
> 
> On Aug 5, 2009, at 10:08 PM, Silent Surfer wrote:
> 
> >
> > Hi,
> >
> > That means we need approximately 3000 GB (Index
> Size)/24 GB (RAM) =  
> > 125 servers.
> >
> > It would be very hard to convince my org to go for 125
> servers for  
> > log management of 3 Terabytes of indexes.
> >
> > Has anyone used Solr for processing and handling of
> > the indexes of the order of 3 TB? If so how many
> > servers were used for indexing alone.
> >
> > Thanks,
> > sS
> >
> >
> > --- On Wed, 8/5/09, Ian Connor 
> wrote:
> >
> >> From: Ian Connor 
> >> Subject: Re: Limit of Index size per machine..
> >> To: solr-user@lucene.apache.org
> >> Date: Wednesday, August 5, 2009, 9:38 PM
> >> I try to keep the index directory
> >> size less than the amount of RAM and rely
> >> on the OS to cache as it needs. Linux does a
> pretty good
> >> job here and I am
> >> sure OS X will do a good job also.
> >>
> >> Distributed search here will be your friend so you
> can
> >> chunk it up to a
> >> number of servers to keep your cost down (2GB RAM
> sticks
> >> are much cheaper
> >> than 4GB RAM sticks $20 < $100).
> >>
> >> Ian.
> >>
> >> On Wed, Aug 5, 2009 at 1:44 PM, Silent Surfer
>  
> >> >wrote:
> >>
> >>>
> >>> Hi ,
> >>>
> >>> We are planning to use Solr for indexing the
> server
> >> log contents.
> >>> The expected processed log file size per day:
> 100 GB
> >>> We are expecting to retain these indexes for
> 30 days
> >> (100*30 ~ 3 TB).
> >>>
> >>> Can any one provide what would be the optimal
> size of
> >> the index that I can
> >>> store on a single server, without hampering
> the search
> >> performance etc.
> >>>
> >>> We are planning to use OSX server with a
> configuration
> >> of 16 GB (Can go to
> >>> 24 GB).
> >>>
> >>> We need to figure out how many servers are
> required to
> >> handle such amount
> >>> of data..
> >>>
> >>> Any help would be greatly appreciated.
> >>>
> >>> Thanks
> >>> SilentSurfer
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >> -- 
> >> Regards,
> >>
> >> Ian Connor
> >> 1 Leighton St #723
> >> Cambridge, MA 02141
> >> Call Center Phone: +1 (714) 239 3875 (24 hrs)
> >> Fax: +1(770) 818 5697
> >> Skype: ian.connor
> >>
> >
> >
> >
> >
> 
>







Re: Limit of Index size per machine..

2009-08-05 Thread Silent Surfer

Hi,

That means we need approximately 3000 GB (Index Size)/24 GB (RAM) = 125 
servers. 

It would be very hard to convince my org to go for 125 servers for log
management of 3 terabytes of indexes.

Has anyone used Solr for processing and handling indexes of the order of 3 TB?
If so, how many servers were used for indexing alone?

Thanks,
sS


--- On Wed, 8/5/09, Ian Connor  wrote:

> From: Ian Connor 
> Subject: Re: Limit of Index size per machine..
> To: solr-user@lucene.apache.org
> Date: Wednesday, August 5, 2009, 9:38 PM
> I try to keep the index directory
> size less than the amount of RAM and rely
> on the OS to cache as it needs. Linux does a pretty good
> job here and I am
> sure OS X will do a good job also.
> 
> Distributed search here will be your friend so you can
> chunk it up to a
> number of servers to keep your cost down (2GB RAM sticks
> are much cheaper
> than 4GB RAM sticks $20 < $100).
> 
> Ian.
> 
> On Wed, Aug 5, 2009 at 1:44 PM, Silent Surfer wrote:
> 
> >
> > Hi ,
> >
> > We are planning to use Solr for indexing the server
> log contents.
> > The expected processed log file size per day: 100 GB
> > We are expecting to retain these indexes for 30 days
> (100*30 ~ 3 TB).
> >
> > Can any one provide what would be the optimal size of
> the index that I can
> > store on a single server, without hampering the search
> performance etc.
> >
> > We are planning to use OSX server with a configuration
> of 16 GB (Can go to
> > 24 GB).
> >
> > We need to figure out how many servers are required to
> handle such amount
> > of data..
> >
> > Any help would be greatly appreciated.
> >
> > Thanks
> > SilentSurfer
> >
> >
> >
> >
> >
> 
> 
> -- 
> Regards,
> 
> Ian Connor
> 1 Leighton St #723
> Cambridge, MA 02141
> Call Center Phone: +1 (714) 239 3875 (24 hrs)
> Fax: +1(770) 818 5697
> Skype: ian.connor
> 


  



Limit of Index size per machine..

2009-08-05 Thread Silent Surfer

Hi,

We are planning to use Solr for indexing the server log contents.
The expected processed log file size per day: 100 GB
We are expecting to retain these indexes for 30 days (100*30 ~ 3 TB).

Can anyone advise what would be the optimal size of index that I can store on
a single server without hampering search performance, etc.?

We are planning to use OSX server with a configuration of 16 GB (Can go to 24 
GB).

We need to figure out how many servers are required to handle such an amount
of data.

Any help would be greatly appreciated.

Thanks
SilentSurfer


  



Query regarding Solr search options..

2009-06-23 Thread Silent Surfer

Hi,

Can Solr search be customized to return N lines before and after the line that
matches the keyword?

For example, suppose I have a document with 10 lines, and the 5th line contains
the keyword 'X' I am interested in. If I fire a Solr search for the keyword
'X', is there any preference/option available in Solr which can be set so that
the search results contain only the 3 lines above and 3 lines after the line
where the keyword matched?

Thanks,
Silent Surfer


  


Re: Questions regarding IT search solution

2009-06-08 Thread Silent Surfer
Hi Jeff,
Thanks for the link. You are my lifesaver :) This is exactly similar to what I
am looking for.
Thanks,
Surfer

--- On Fri, 6/5/09, Jeff Hammerbacher  wrote:

From: Jeff Hammerbacher 
Subject: Re: Questions regarding IT search solution
To: solr-user@lucene.apache.org, silentsurfe...@yahoo.com
Date: Friday, June 5, 2009, 12:15 AM

Hey,

Your system sounds similar to the work done by Stu Hood at Rackspace in their
Mailtrust unit. See
http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data
for more details and inspiration.

Regards,
Jeff

On Thu, Jun 4, 2009 at 4:58 PM,  wrote:

> Hi,
> This is encouraging to know that the solr/lucene solution may work.
> Can anyone using solr/lucene for such a scenario confirm that the
> solution is used and working fine? That would be really helpful, as I just
> started looking into the solr/lucene solution only a couple of days back and
> it might be difficult to be 100% confident before proposing the solution
> approach in the next couple of days.
> Thanks,
> Surfer
>
> --- On Thu, 6/4/09, Otis Gospodnetic  wrote:
>
> From: Otis Gospodnetic 
> Subject: Re: Questions regarding IT search solution
> To: solr-user@lucene.apache.org
> Date: Thursday, June 4, 2009, 10:26 PM
>
>
> My guess is Solr/Lucene would work.  Not sure how well/fast, but it would,
> esp. if you avoid range queries (or use tdate), and esp. if you
> shard/segment indices smartly, so that at query time you send (or distribute
> if you have to) the query to only those shards that have the data (if your
> query is for a limited time period).
>
>  Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> - Original Message 
> > From: Silent Surfer 
> > To: solr-user@lucene.apache.org
> > Sent: Thursday, June 4, 2009 5:52:21 PM
> > Subject: Re: Questions regarding IT search solution
> >
> > Hi,
> > As Alex correctly pointed out, my main intention is to figure out whether
> > solr/lucene offers the functionality to replicate what Splunk does in
> > terms of building indexes etc. for enabling search capabilities.
> > We evaluated Splunk, but it is not a very cost-effective solution for us,
> > as we may have logs running into a few GBs per day (there can be around
> > 20-25 servers running), and Splunk's licensing model is based on the size
> > of logs per day; on top of that, the license is valid for only 1 year.
> > With this background, any further inputs on this are greatly appreciated.
> > Thanks,
> > Surfer
> >
> > --- On Thu, 6/4/09, Alexandre Rafalovitch wrote:
> >
> > From: Alexandre Rafalovitch
> > Subject: Re: Questions regarding IT search solution
> > To: solr-user@lucene.apache.org
> > Date: Thursday, June 4, 2009, 9:27 PM
> >
> > I would also be interested to know what other solutions exist.
> >
> > Splunk's advantage is that it does extraction of the fields with
> > advanced searching functionality (it has lexers/parsers for multiple
> > content types). I believe that's the Solr function desired in the
> > original posting. At the time they came out (2004), I was not aware of
> > any good open source solutions to do what they did. And I would have
> > loved one, as I was analyzing multi-gigabyte logs.
> >
> > Hadoop might be a way to process the files, but what would do the
> > indexing and searching?
> >
> > Regards,
> >     Alex.
> >
> > On Thu, Jun 4, 2009 at 11:56 AM, Walter Underwood wrote:
> > > Why build one? Don't those already exist?
> > >
> > > Personally, I'd start with Hadoop instead of Solr. Putting logs in a
> > > search index is guaranteed to not scale. People were already trying
> > > different approaches ten years ago.
> > >
> > > wunder
> > >
> > > On 6/4/09 8:41 AM, "Silent Surfer" wrote:
> > >
> > >> Hi,
> > >> Any help/pointers on the following message would really help me..
> > >> Thanks, Surfer
> > >>
> > >> --- On Tue, 6/2/09, Silent Surfer wrote:
> > >>
> > >> From: Silent Surfer
> > >> Subject: Questions regarding IT search solution
> > >> To: solr-user@lucene.apache.org
> > >> Date: Tuesday, June 2, 2009, 5:45 PM
> > >>
> > >> Hi,
> > >> I am new to the Lucene forum and this is my first question. I need a
> > >> clarification from you.
> > >>
> > >> Requirement:
> > >> ------------
> > >> 1. Build an IT search tool for logs similar to that of Splunk (only
> > >> w.r.t. searching logs, not reporting, graphs, etc.) using solr/lucene.
> > >> The log files are mainly server logs like JBoss and custom application
> > >> server logs (may or may not be log4j logs), and the file sizes can go
> > >> potentially up to 100 MB.
> > >> 2. The logs are spread across multiple servers (25 to 30 servers).
> > >> 3. Capability to search almost in real time.
> > >> 4. Support distributed search.
> > >>
> > >> Our search criteria can be based on a keyword or timestamp or IP
> > >> address etc.
> > >> Can anyone throw some light on whether solr/lucene is the right
> > >> solution for this?
> > >> Appreciate any quick help in this regard.
> > >> Thanks,
> > >> Surfer
>
>
>
>
>
>



  

Re: Questions regarding IT search solution

2009-06-04 Thread Silent Surfer
Hi,
As Alex correctly pointed out, my main intention is to figure out whether
solr/lucene offers the functionality to replicate what Splunk does in terms of
building indexes etc. for enabling search capabilities.
We evaluated Splunk, but it is not a very cost-effective solution for us, as we
may have logs running into a few GBs per day (there can be around 20-25 servers
running), and Splunk's licensing model is based on the size of logs per day; on
top of that, the license is valid for only 1 year.
With this background, any further inputs on this are greatly appreciated.
Thanks,
Surfer

--- On Thu, 6/4/09, Alexandre Rafalovitch  wrote:

From: Alexandre Rafalovitch 
Subject: Re: Questions regarding IT search solution
To: solr-user@lucene.apache.org
Date: Thursday, June 4, 2009, 9:27 PM

I would also be interested to know what other solutions exist.

Splunk's advantage is that it does extraction of the fields with
advanced searching functionality (it has lexers/parsers for multiple
content types). I believe that's the Solr function desired in the
original posting. At the time they came out (2004), I was not aware of
any good open source solutions to do what they did. And I would have
loved one, as I was analyzing multi-gigabyte logs.

Hadoop might be a way to process the files, but what would do the
indexing and searching?

Regards,
    Alex.

On Thu, Jun 4, 2009 at 11:56 AM, Walter Underwood wrote:
> Why build one? Don't those already exist?
>
> Personally, I'd start with Hadoop instead of Solr. Putting logs in a
> search index is guaranteed to not scale. People were already trying
> different approaches ten years ago.
>
> wunder
>
> On 6/4/09 8:41 AM, "Silent Surfer"  wrote:
>
>> Hi,
>> Any help/pointers on the following message would really help me..
>> Thanks, Surfer
>>
>> --- On Tue, 6/2/09, Silent Surfer  wrote:
>>
>> From: Silent Surfer 
>> Subject: Questions regarding IT search solution
>> To: solr-user@lucene.apache.org
>> Date: Tuesday, June 2, 2009, 5:45 PM
>>
>> Hi,
>> I am new to the Lucene forum and this is my first question. I need a
>> clarification from you.
>>
>> Requirement:
>> ------------
>> 1. Build an IT search tool for logs similar to that of Splunk (only w.r.t.
>> searching logs, not reporting, graphs, etc.) using solr/lucene. The log
>> files are mainly server logs like JBoss and custom application server logs
>> (may or may not be log4j logs), and the file sizes can go potentially up to
>> 100 MB.
>> 2. The logs are spread across multiple servers (25 to 30 servers).
>> 3. Capability to search almost in real time.
>> 4. Support distributed search.
>>
>> Our search criteria can be based on a keyword or timestamp or IP address
>> etc.
>> Can anyone throw some light on whether solr/lucene is the right solution
>> for this?
>> Appreciate any quick help in this regard.
>> Thanks,
>> Surfer



  

Re: Questions regarding IT search solution

2009-06-04 Thread Silent Surfer
Hi,
Any help/pointers on the following message would really help me..
Thanks,
Surfer

--- On Tue, 6/2/09, Silent Surfer  wrote:

From: Silent Surfer 
Subject: Questions regarding IT search solution
To: solr-user@lucene.apache.org
Date: Tuesday, June 2, 2009, 5:45 PM

Hi,
I am new to the Lucene forum and this is my first question. I need a
clarification from you.

Requirement:
------------
1. Build an IT search tool for logs similar to that of Splunk (only w.r.t.
searching logs, not reporting, graphs, etc.) using solr/lucene. The log files
are mainly server logs like JBoss and custom application server logs (may or
may not be log4j logs), and the file sizes can go potentially up to 100 MB.
2. The logs are spread across multiple servers (25 to 30 servers).
3. Capability to search almost in real time.
4. Support distributed search.

Our search criteria can be based on a keyword or timestamp or IP address etc.
Can anyone throw some light on whether solr/lucene is the right solution for
this?
Appreciate any quick help in this regard.
Thanks,
Surfer

  


  


Questions regarding IT search solution

2009-06-02 Thread Silent Surfer
Hi,
I am new to the Lucene forum and this is my first question. I need a
clarification from you.

Requirement:
------------
1. Build an IT search tool for logs similar to that of Splunk (only w.r.t.
searching logs, not reporting, graphs, etc.) using solr/lucene. The log files
are mainly server logs like JBoss and custom application server logs (may or
may not be log4j logs), and the file sizes can go potentially up to 100 MB.
2. The logs are spread across multiple servers (25 to 30 servers).
3. Capability to search almost in real time.
4. Support distributed search.

Our search criteria can be based on a keyword or timestamp or IP address etc.
Can anyone throw some light on whether solr/lucene is the right solution for
this?
Appreciate any quick help in this regard.
Thanks,
Surfer


