Thanks for the help

Uwe Schindler wrote:
> 
> Hi,
> 
> Read here: http://wiki.apache.org/lucene-java/LuceneFAQ
> 
> And I think that this type of question is more for the Lucene Users mailing list
> (http://lucene.apache.org/java/docs/mailinglists.html#Java%20User%20List).
> This list is for developers of Lucene itself, not for users asking for help
> on how to implement something specific with Lucene.
> 
> Uwe
> 
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: [EMAIL PROTECTED]
> 
>> -----Original Message-----
>> From: blazingwolf7 [mailto:[EMAIL PROTECTED]
>> Sent: Monday, July 07, 2008 9:15 AM
>> To: java-dev@lucene.apache.org
>> Subject: RE: Untokenized URL
>> 
>> Well, I am open to suggestions, except for using a reader. The Document.get()
>> & Co., how does it work?
>> 
>> Uwe Schindler wrote:
>> > 
>> > As Shai said before, you should add the field twice: as a tokenized field
>> > for your search, and under a different name (e.g. "field-untokenized").
>> > For your TermEnum code you may use the untokenized field; for normal search
>> > queries, the tokenized one.
>> > If you want to retrieve the field contents with Document.get() & Co.
>> > instead of TermEnum, you may add the field only once with the flags
>> > Tokenized & Stored. But this does not work with your TermEnum solution.
>> > 
>> >> -----Original Message-----
>> >> From: blazingwolf7 [mailto:[EMAIL PROTECTED]
>> >> Sent: Monday, July 07, 2008 7:39 AM
>> >> To: java-dev@lucene.apache.org
>> >> Subject: Re: Untokenized URL
>> >> 
>> >> I am trying to retrieve the URL and use it as a filter. The main problem is
>> >> that I don't want to use a reader to continuously retrieve the URL for each
>> >> document found.
>> >> 
>> >> TermDocs termDocs = reader.termDocs();
>> >> // terms(t) starts at the first term >= t and runs on into later fields
>> >> TermEnum termEnum = reader.terms(new Term(field, ""));
>> >> do {
>> >>     Term term = termEnum.term();
>> >>     if (term == null || !term.field().equals(field)) break; // past this field
>> >>     // term.text() is one token of the URL, not the whole URL
>> >> } while (termEnum.next());
>> >> termEnum.close();
>> >> 
>> >> I am using this code to retrieve the field containing the URL, but it is
>> >> tokenized. Is there any way to untokenize it, or is there a better way to do
>> >> this?
>> >> 
>> >> Shai Erera wrote:
>> >> > 
>> >> > I think that the simplest solution will be to index the URL field twice,
>> >> > once as TOKENIZED and once as UN_TOKENIZED. Then you can look up the
>> >> > UN_TOKENIZED term.
>> >> > If you have a document in hand and only want to fetch its URL, then add
>> >> > the URL twice, once as Store.NO, Index.TOKENIZED and once as Store.YES /
>> >> > Store.COMPRESS and Index.NO.
>> >> > 
>> >> > Perhaps I don't understand the entire scenario. When do you need to fetch
>> >> > the contentLength and URL? For what purpose?
>> >> > 
>> >> > On Sun, Jul 6, 2008 at 4:26 AM, blazingwolf7 <[EMAIL PROTECTED]> wrote:
>> >> > 
>> >> >> No, I didn't store the contentLength; I just added it to the index.
>> >> >> I am still scratching my head, as I can't think of another way to
>> >> >> retrieve it without continuously using the reader.
>> >> >> 
>> >> >> As for the URL, I use doc.add(new Field("url", Store.NO, Index.TOKENIZED)).
>> >> >> I would like to keep it this way, with the URL tokenized. I am looking
>> >> >> for a way to UNtokenize it. I retrieved it using a method that retrieves
>> >> >> the entire field and then extracts the information in it. But the problem
>> >> >> is that the URL is broken down. I am looking for a way to reconstruct it
>> >> >> in its original format. Can it be done?
>> >> >> 
>> >> >> Shai Erera wrote:
>> >> >> > 
>> >> >> > Hi
>> >> >> > 
>> >> >> > Regarding the contentLength, when you add it to the document, do you
>> >> >> > *store* it as well (i.e., pass Store.YES or Store.COMPRESS)?
>> >> >> > 
>> >> >> > Regarding the URL, how do you add it to the document? For example, if you
>> >> >> > do doc.add(new Field("url", "http://www.cnn.com", Store.NO, Index.UN_TOKENIZED)),
>> >> >> > it would create a single term like "url:http://www.cnn.com" without
>> >> >> > breaking it into its parts. Is that what you're looking for?
>> >> >> > 
>> >> >> > Shai
>> >> >> > 
>> >> >> > On Fri, Jul 4, 2008 at 11:19 AM, blazingwolf7 <[EMAIL PROTECTED]> wrote:
>> >> >> > 
>> >> >> >> Hi,
>> >> >> >> 
>> >> >> >> I am currently working on retrieving the URL and contentLength of each
>> >> >> >> document found during the search. I want to retrieve them during score
>> >> >> >> calculation so that I can influence the score in some other way.
>> >> >> >> 
>> >> >> >> I used the methods from TermDocs and TermEnum to get the information.
>> >> >> >> However, as most people know, the URL I retrieve is tokenized. It is
>> >> >> >> broken down into several parts and I will have to rejoin them. Can anyone
>> >> >> >> help me with this? I am stuck here wondering how to get back the whole
>> >> >> >> URL without using a Reader.
>> >> >> >> 
>> >> >> >> Also, I tried to retrieve the contentLength, but the results returned are
>> >> >> >> null. Why is that? I opened the index using Luke and the contentLength is
>> >> >> >> there, but when I try to get it this way, the result is null.
>> >> >> >> 
>> >> >> >> Can anyone help me with both of these problems? Any help will be
>> >> >> >> appreciated.
>> >> >> >> Thanks
>> >> >> >> -- 
>> >> >> >> View this message in context: http://www.nabble.com/Untokenized-URL-tp18275048p18275048.html
>> >> >> >> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
>> >> >> >> 
>> >> >> >> ---------------------------------------------------------------------
>> >> >> >> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> >> >> >> For additional commands, e-mail: [EMAIL PROTECTED]
>> >> >> > 
>> >> >> > -- 
>> >> >> > Regards,
>> >> >> > 
>> >> >> > Shai Erera
>> >> > 
>> >> > -- 
>> >> > Regards,
>> >> > 
>> >> > Shai Erera
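The advice in this thread (index the URL twice, once tokenized for searching and once untokenized under a second field name such as "url-untokenized", and store one copy if you need Document.get()) can be illustrated with a small self-contained sketch. It is a toy in-memory model, not the Lucene API: the class UrlFieldsSketch, its methods, and its letter/digit tokenizer are hypothetical. It only assumes two properties of Lucene's term dictionary: terms are sorted by field and then by text, and they are shared by all documents, so enumerating one field has to stop at the field boundary.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/**
 * Toy model of an inverted index (NOT the Lucene API). Terms are kept in one
 * sorted dictionary, ordered by field name and then by term text, and shared
 * by all documents.
 */
public class UrlFieldsSketch {
    // Separator between field name and term text. The NUL char sorts before
    // every other char, so all "url" terms sort before "url-untokenized" terms.
    private static final char SEP = '\u0000';

    private final TreeSet<String> dictionary = new TreeSet<>();               // keys: field + SEP + text
    private final List<Map<String, String>> storedDocs = new ArrayList<>();   // stored values per doc id

    /** Hypothetical tokenizer: lowercases and splits on anything that is not a letter or digit. */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Indexes the URL twice and stores the original value; returns the new doc id. */
    public int addDocument(String url) {
        for (String token : tokenize(url)) {
            dictionary.add("url" + SEP + token);          // tokenized copy, for search
        }
        dictionary.add("url-untokenized" + SEP + url);    // untokenized copy: one whole term
        Map<String, String> stored = new HashMap<>();
        stored.put("url", url);                           // plays the role of a stored field
        storedDocs.add(stored);
        return storedDocs.size() - 1;
    }

    /**
     * Enumerates the terms of one field: seek to the first key >= (field, "")
     * and stop as soon as the enumeration reaches a term of another field.
     */
    public List<String> termsOf(String field) {
        String prefix = field + SEP;
        List<String> result = new ArrayList<>();
        for (String key : dictionary.tailSet(prefix)) {
            if (!key.startsWith(prefix)) {
                break;                                    // left this field: stop
            }
            result.add(key.substring(prefix.length()));
        }
        return result;
    }

    /** Returns the stored value, or null if the field was never stored. */
    public String storedValue(int docId, String field) {
        return storedDocs.get(docId).get(field);
    }

    public static void main(String[] args) {
        UrlFieldsSketch index = new UrlFieldsSketch();
        int cnn = index.addDocument("http://www.cnn.com/world");
        index.addDocument("http://lucene.apache.org/java");

        System.out.println(index.termsOf("url"));              // tokens of both URLs, merged and sorted
        System.out.println(index.termsOf("url-untokenized"));  // one term per whole URL
        System.out.println(index.storedValue(cnn, "url"));     // original value of doc 0
        System.out.println(index.storedValue(cnn, "contentLength")); // never stored
    }
}
```

Running main prints [apache, cnn, com, http, java, lucene, org, world, www] for the tokenized field: in this model the separators are gone and the tokens of different documents are interleaved, so the original URLs cannot be rebuilt from those terms alone. The untokenized field yields each URL as a single term, and asking for a field that was never stored ("contentLength" here) returns null.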