I'd suggest writing a Perl script (or insert-favourite-scripting-language-here script) to pre-filter this content out of the files before it gets to Lucene/Solr. Or you could just grep for "Data" and "Description" (or is "Description" multi-line?).
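For what it's worth, here is a minimal sketch of such a pre-filter in Python. It assumes the DATA_BEGIN and DATA_END markers each sit alone on their own line, as in your sample, and that blocks are never nested. The function name strip_data_blocks and the command-line handling are just illustrative choices, not anything Lucene provides.

```python
import sys


def strip_data_blocks(lines, begin="DATA_BEGIN", end="DATA_END"):
    """Yield the input lines, dropping each begin..end block (markers included).

    Assumption: each marker appears alone on its line and blocks do not nest.
    """
    skipping = False
    for line in lines:
        marker = line.strip()
        if skipping:
            # Inside a data block: discard lines until the end marker is seen.
            if marker == end:
                skipping = False
            continue
        if marker == begin:
            skipping = True
            continue
        yield line


# Hypothetical usage: python prefilter.py input.txt > filtered.txt
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], encoding="utf-8") as source:
        sys.stdout.writelines(strip_data_blocks(source))
```

The filtered output keeps the "Data for ..." and "Description: ..." lines and can then be handed to the indexer unchanged. If a file is missing a DATA_END, everything after the last DATA_BEGIN is dropped, so you may want to log that case.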
-Glen Newton

On Mon, Feb 27, 2012 at 11:55 AM, Prakash Reddy Bande <praka...@altair.com> wrote:
> Hi,
>
> I want to customize the indexing of some specific kind of files I have. I am
> using 2.9.3 but upgrading is possible.
> This is how my file's data looks
>
> *****************************
> Data for 2010
> Description: This section has a general description of the data.
> DATA_BEGIN
> Month P1 P2 P3
> 01 3243.433 43534.324 45345.2443
> 02 3242.324 234234.24 323.2343
> ...
> ...
> ...
> ...
> DATA_END
> Data for 2011
> Description: This section has a general description of the data.
> DATA_BEGIN
> Month P1 P2 P3
> 01 3243.433 43534.324 45345.2443
> 02 3242.324 234234.24 323.2343
> ...
> ...
> ...
> ...
> DATA_END
> *****************************
>
> I would like to use a StandardAnalyser, but do not want to index the data of
> the columns, i.e. skip all those numbers. Basically, as soon as I hit the
> keyword DATA_BEGIN, I want to jump to DATA_END.
> So, what is the best approach? Using a custom Reader, custom tokenizer or
> some other mechanism.
> Regards,
>
> Prakash Bande
> Altair Eng. Inc.
> Troy MI
> Ph: 248-614-2400 ext 489
> Cell: 248-404-0292
>

--
-
http://zzzoot.blogspot.com/
-