OK, say each line is an address. The text file would look like:

123 Water St. Somerville, GA 12345
456 Easy St. Hope, CA 45676
34 Ocean Blvd. Staten Island, NY 93843
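Here is a brute-force sketch, in plain Java, of the search requested in this message: split the query on commas and keep every line that contains all of the pieces as plain, case-sensitive substrings. The class and method names are hypothetical and no Lucene API is involved. It also makes explicit that the rule is substring matching. A term query against a tokenized Lucene field would typically match whole tokens only, so "34" would not find "12345" without some wildcard or n-gram approach.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical brute-force sketch (no Lucene): a line matches when it contains
// every comma-separated query term as a plain, case-sensitive substring.
class SubstringLineSearch {

    // Split "34, St" into ["34", "St"], trimming spaces and dropping empty pieces.
    static List<String> parseTerms(String query) {
        List<String> terms = new ArrayList<>();
        for (String part : query.split(",")) {
            String t = part.trim();
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Return every line that contains all of the query terms somewhere in it.
    static List<String> search(List<String> lines, String query) {
        List<String> terms = parseTerms(query);
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            boolean matchesAll = true;
            for (String term : terms) {
                if (!line.contains(term)) {
                    matchesAll = false;
                    break;
                }
            }
            if (matchesAll) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "123 Water St. Somerville, GA 12345",
            "456 Easy St. Hope, CA 45676",
            "34 Ocean Blvd. Staten Island, NY 93843");
        for (String hit : search(lines, "34, St")) {
            System.out.println(hit);
        }
    }
}
```

Running `main` prints the first and third addresses. A linear scan like this touches every line on every search, which is exactly what an index is meant to avoid once the file holds hundreds of thousands of lines.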
The file would have hundreds of thousands of addresses. The user would type "34, St" in the search box and press a "Search" button. In the table below the search box, the first and third records from the addresses above would be displayed, because both have a "34" somewhere in them and both have a "St" somewhere in them. So the table would show:

123 Water St. Somerville, GA 12345
34 Ocean Blvd. Staten Island, NY 93843

because they match both criteria, as highlighted here:

123 Water "St". Somerville, GA 12"34"5
"34" Ocean Blvd. "St"aten Island, NY 93843

Thanks.
Brittany

Well, this could get to be a really ugly query. Let's say you have 10 lines. Then the doc would have 10 different fields ("line1", "line2", etc.)? To search it you would need an OR clause across all the fields, and a file with 100,000 lines would become a 100,000-term query. Or I misunderstand you completely.

Calling doc.add with the *same* field (say "text") is a possibility, especially if you provide your own analyzer that returns a large position increment gap, say 1000. The gap is inserted between successive values added to the same field. So say you have 10 lines, each with 5 tokens, and a gap of 10: the first token of each line would be at positions 0, 15, 30, 45...

You have a couple of choices here. Say you can guarantee that no line will be longer than 100 terms. Each line could then begin on an even multiple of 100 (assuming you're not indexing something with many millions of lines), and to find the line you just divide the position by 100. Another possibility is to keep a field in the document that maps positions to lines and read it in when you need it.

It all depends on why you need to keep track of lines. For a single document, this kind of thing can work, but if you want line information for every hit, it could be too expensive. The increment gap will also play interesting games with span queries (or slop in phrase queries).
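The position arithmetic in the reply above can be checked with a small model. This is a simplified, hypothetical sketch in plain Java: it does not call Lucene, it treats a line's tokens as whitespace-separated words, and the names `LinePositions`, `SLOT`, `firstPositionWithGap` and `positionsOf` are made up for illustration. `firstPositionWithGap` reproduces the fixed-gap example (5 tokens per line plus a gap of 10), and the rest models the "each line starts on a multiple of 100" layout, where the line number is recovered by integer division.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of token positions when every line of a file
// is added to the same field. It does not call Lucene.
class LinePositions {

    // Assumed maximum number of tokens on any line ("even 100" layout).
    static final int SLOT = 100;

    // Fixed-gap model: every line has the same token count and a constant gap
    // separates consecutive lines. Returns the position of a line's first token.
    static int firstPositionWithGap(int line, int tokensPerLine, int gap) {
        return line * (tokensPerLine + gap);
    }

    // "Even 100" model: token k of line n sits at n * 100 + k.
    static int position(int line, int tokenIndex) {
        if (tokenIndex < 0 || tokenIndex >= SLOT) {
            throw new IllegalArgumentException("token index outside a " + SLOT + "-token line");
        }
        return line * SLOT + tokenIndex;
    }

    // Recovering the line from a position is just an integer division.
    static int lineOf(int position) {
        return position / SLOT;
    }

    // Positions at which a term occurs, under the "even 100" model.
    static List<Integer> positionsOf(List<String> lines, String term) {
        List<Integer> result = new ArrayList<>();
        for (int line = 0; line < lines.size(); line++) {
            String[] tokens = lines.get(line).trim().split("\\s+");
            if (tokens.length > SLOT) {
                throw new IllegalArgumentException("line " + line + " is longer than " + SLOT + " tokens");
            }
            for (int i = 0; i < tokens.length; i++) {
                if (tokens[i].equals(term)) {
                    result.add(position(line, i));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Fixed-gap example: 5 tokens per line with a gap of 10.
        for (int line = 0; line < 4; line++) {
            System.out.println(firstPositionWithGap(line, 5, 10));
        }

        List<String> lines = List.of("Dear firstname", "lastname said hello");
        int first = positionsOf(lines, "firstname").get(0);
        int last = positionsOf(lines, "lastname").get(0);
        System.out.println("line of lastname: " + lineOf(last));
        // Words that are adjacent across a line break end up far apart,
        // so a "within 10 terms" proximity query would not match them.
        System.out.println("distance: " + (last - first));
    }
}
```

Running `main` prints 0, 15, 30 and 45, then `line of lastname: 1` and `distance: 99`. The last two lines illustrate the cost of the scheme: a proximity search cannot cross a line boundary once lines are spread this far apart.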
If you need proximity to span lines, this scheme needs some modification. Say I want hits when "firstname" is within 10 terms of "lastname". If the two words can fall on different lines, a large increment gap pushes them far apart and the query won't match. So it would be a good idea to tell us a bit more about why you want to distinguish lines, so we can give better advice <G>.

Best
Erick

On Fri, Aug 1, 2008 at 9:59 AM, ನಾಗೇಶ್ ಸುಬ್ರಹ್ಮಣ್ಯ (Nagesh S) <[EMAIL PROTECTED]> wrote:

> Why should each line be a Document? If there is a single document having
> each line as a Field, then the search would return a single Document as
> the 'hit', not the individual lines matching it. Is this right?
>
> Nagesh
>
> On Fri, Aug 1, 2008 at 7:21 PM, <[EMAIL PROTECTED]> wrote:
>
> > Hello Brittany,
> >
> > I think the easiest thing for you to do is make each line a Document.
> > You might want a FileName and LineNumber field on top of a "Text" field;
> > this way, if you need to gather all the lines of your file back together
> > again, you can do a search on the FileName.
> >
> > So in your case:
> >
> > Document 1
> >   FileName: [the file]
> >   LineNumber: 1
> >   Text: I like apples
> > Document 2
> >   ...etc
> >
> > Regards,
> > Roy
> >
> > On Fri, Aug 1, 2008 at 9:28 AM, Brittany Jacobs <[EMAIL PROTECTED]> wrote:
> >
> > > Just trying to grasp the concept.
> > >
> > > I want to search a text file where each line is a separate item to be
> > > searched. When text is entered by the user, I want to return all the
> > > lines in which that text appears.
> > >
> > > For example, if the text file has:
> > >
> > > I like apples.
> > > I went to the store.
> > > I bought an apple.
> > >
> > > If the user searches "apple", I want it to return the first and third
> > > sentences.
> > >
> > > Is each sentence a Token? Is the user input going to be a QueryParser?
> > > How should I read in the file so that each line of text is a token to
> > > search?
> > >
> > > Thanks in advance.
> > >
> > > Brittany Jacobs
> > > Java Developer
> > > JBManagement, Inc.
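Roy's one-Document-per-line suggestion can be mocked up without Lucene to show why it answers Nagesh's question: every matching line comes back as its own hit, with its file name and line number attached. In this hypothetical sketch the `LineDoc` class stands in for a Lucene `Document` with FileName, LineNumber and Text fields, and an in-memory list stands in for the index. Matching is a case-insensitive substring test, an assumption chosen so that "apple" also finds "I like apples."; a typical Lucene analyzer without stemming indexes "apples" as its own token, so a plain term query for "apple" would miss that first sentence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical in-memory stand-in for Roy's layout: one "document" per line,
// each carrying FileName, LineNumber and Text. No Lucene classes are used.
class LinePerDocument {

    static final class LineDoc {
        final String fileName;
        final int lineNumber; // 1-based, as in Roy's example
        final String text;

        LineDoc(String fileName, int lineNumber, String text) {
            this.fileName = fileName;
            this.lineNumber = lineNumber;
            this.text = text;
        }
    }

    // One LineDoc per non-blank line. For a real file the lines could come
    // from java.nio.file.Files.readAllLines(path).
    static List<LineDoc> index(String fileName, List<String> lines) {
        List<LineDoc> docs = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String text = lines.get(i);
            if (!text.trim().isEmpty()) {
                docs.add(new LineDoc(fileName, i + 1, text));
            }
        }
        return docs;
    }

    // Every matching line is a separate hit. The case-insensitive substring
    // test is an assumption made so that "apple" also matches "apples".
    static List<LineDoc> search(List<LineDoc> docs, String userText) {
        String needle = userText.toLowerCase(Locale.ROOT);
        List<LineDoc> hits = new ArrayList<>();
        for (LineDoc doc : docs) {
            if (doc.text.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("I like apples.", "I went to the store.", "I bought an apple.");
        List<LineDoc> docs = index("example.txt", lines);
        for (LineDoc hit : search(docs, "apple")) {
            System.out.println(hit.fileName + ":" + hit.lineNumber + " " + hit.text);
        }
    }
}
```

Running `main` prints `example.txt:1 I like apples.` and `example.txt:3 I bought an apple.`. Because each line is its own unit, no position-gap bookkeeping is needed to report which line matched, and all lines of one file can still be gathered by their shared file name.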