Thank you for the great answer, Uwe! Sadly my department rejected the above
combination of using Logstash + Elasticsearch. In their experience,
Elasticsearch works fine on about 3 days of log data, but slows down terribly
when given on the order of 3 months of data or so.
But I will take a look at Logstash anyway. After skimming through the Logstash
documentation I can see that there are so-called Logstash "outputs":
http://logstash.net/docs/1.4.2/tutorials/getting-started-with-logstash

What do you think: is it possible to use Logstash as a preprocessor which
outputs the filtered logs and feeds them into my Lucene app? Or, if that's not
a good idea, can you elaborate on how I can do the preprocessing you are
referring to? Do you mean implementing an Analyzer like these?
https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/analysis/package-summary.html

Thank you,
Gergely Nagy

2015-02-09 17:10 GMT+09:00 Uwe Schindler <u...@thetaphi.de>:
> Hi,
>
> > I am in the beginning of implementing a Lucene application which would
> > supposedly search through some log files.
> >
> > One of the requirements is to return results between a time range.
> > Let's say these are two lines in a series of log files:
> > 2015-02-08 00:02:06.852Z INFO...
> > ...
> > 2015-02-08 18:02:04.012Z INFO...
> >
> > Now I need to search for these lines and return all the text
> > in-between. I was using this demo application to build an index:
> > http://lucene.apache.org/core/4_10_3/demo/src-html/org/apache/lucene/demo/IndexFiles.html
> >
> > After that my first thought was using a term range query like this:
> > TermRangeQuery query = TermRangeQuery.newStringRange("contents",
> >     "2015-02-08 00:02:06.852Z", "2015-02-08 18:02:04.012Z", true, true);
> >
> > But for some reason this didn't return any results.
>
> Lucene tokenizes the text, so you can search for terms ("words"). Those
> dates are split into several terms. In general, this is not the way to
> search on a numeric / date range:
> - it is horribly slow, because there are many terms in that "contents"
>   field.
>
> > Then I was Googling for a while how to solve this problem, but all the
> > datetime examples I found are searching based on a much simpler field.
> > Those examples usually use a field like this:
> > doc.add(new LongField("modified", file.lastModified(), Field.Store.NO));
>
> That is the way to do it. Log files are "structured", so you need to do
> preprocessing. You have to put the different pieces of information into
> different fields (like the "modified" field mentioned in your example).
> You can still fill the "contents" field as you did above with all the
> information, to do plain full-text search (like finding a log line based
> on some message contents), but in addition you use other fields for more
> specific searches like ranges. In Lucene you generally fill several
> fields with redundant information (like dates in the full-text field and
> in some extra timestamp field).
>
> The information you return to the user can be put into a stored-only
> field. This one is returned with search results.
>
> > So I was wondering, how can I index these log files to make a range
> > query work on them? Any ideas? Maybe my approach is completely wrong.
> > I am still new to Lucene, so any help is appreciated.
>
> The first approach is wrong, the second approach is right. You just have
> to make your field definitions correct.
>
> An alternative would be to use Logstash in combination with
> Elasticsearch, which is based on Lucene. This has everything you want to
> do already implemented for log files.
>
> Uwe
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
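For what it's worth, here is a minimal sketch of the per-field approach Uwe
describes, written against the Lucene 4.10 API referenced in the thread. The
field names ("timestamp", "contents", "line"), the sample log lines, and the
in-memory RAMDirectory are my own illustrative assumptions, not anything fixed
by the list — the point is only to show a LongField for range queries next to
a tokenized field and a stored-only field:

```java
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.List;
import java.util.TimeZone;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongField;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class LogRangeDemo {

    // Parses the timestamp prefix of a log line, e.g. "2015-02-08 00:02:06.852Z"
    static final SimpleDateFormat FMT =
        new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS'Z'");
    static { FMT.setTimeZone(TimeZone.getTimeZone("UTC")); }

    public static List<String> run() throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriterConfig cfg = new IndexWriterConfig(
            Version.LUCENE_4_10_3, new StandardAnalyzer());
        try (IndexWriter writer = new IndexWriter(dir, cfg)) {
            String[] lines = {                       // made-up sample data
                "2015-02-08 00:02:06.852Z INFO starting up",
                "2015-02-08 12:30:00.000Z WARN disk almost full",
                "2015-02-09 01:00:00.000Z INFO shutting down",
            };
            for (String line : lines) {
                long ts = FMT.parse(line.substring(0, 24)).getTime();
                Document doc = new Document();
                // numeric field: enables fast range queries
                doc.add(new LongField("timestamp", ts, Field.Store.NO));
                // tokenized field: ordinary full-text search
                doc.add(new TextField("contents", line, Field.Store.NO));
                // stored-only field: returned with search results
                doc.add(new StoredField("line", line));
                writer.addDocument(doc);
            }
        }

        // The range from the original question
        long from = FMT.parse("2015-02-08 00:02:06.852Z").getTime();
        long to   = FMT.parse("2015-02-08 18:02:04.012Z").getTime();

        List<String> matches = new ArrayList<>();
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            ScoreDoc[] hits = searcher.search(
                NumericRangeQuery.newLongRange("timestamp", from, to, true, true),
                100).scoreDocs;
            for (ScoreDoc hit : hits) {
                matches.add(searcher.doc(hit.doc).get("line"));
            }
        }
        return matches;
    }

    public static void main(String[] args) throws Exception {
        for (String line : run()) {
            System.out.println(line);
        }
    }
}
```

Running this should return the first two sample lines (both inside the range)
and skip the third, which falls on the next day.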