Hi,

Read here: http://wiki.apache.org/lucene-java/LuceneFAQ

I also think this type of question belongs on the Lucene Users mailing list (http://lucene.apache.org/java/docs/mailinglists.html#Java%20User%20List). This list is for developers of Lucene itself, not for users asking for help with implementing something specific with Lucene.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [EMAIL PROTECTED]

> -----Original Message-----
> From: blazingwolf7 [mailto:[EMAIL PROTECTED]
> Sent: Monday, July 07, 2008 9:15 AM
> To: java-dev@lucene.apache.org
> Subject: RE: Untokenized URL
>
> Well, I am open to suggestions, except for using a reader. How do
> Document.get() & Co. work?
>
> Uwe Schindler wrote:
> >
> > As Shai said before, you should store the field twice: as a tokenized
> > field for your search, and as an untokenized field under a different
> > name (e.g. "field-untokenized"). For your TermEnum code you can use the
> > untokenized field; for normal search queries, the tokenized one.
> > If you want to retrieve the field contents with Document.get() & Co.
> > instead of TermEnum, you can store the field once with the flags
> > Tokenized & Stored. But this does not work with your TermEnum solution.
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: [EMAIL PROTECTED]
> >
> >> -----Original Message-----
> >> From: blazingwolf7 [mailto:[EMAIL PROTECTED]
> >> Sent: Monday, July 07, 2008 7:39 AM
> >> To: java-dev@lucene.apache.org
> >> Subject: Re: Untokenized URL
> >>
> >> I am trying to retrieve the url and use it as a filter. The main problem
> >> is that I don't want to use a reader to continuously retrieve the url
> >> for each document located.
> >>
> >> TermDocs termDocs = reader.termDocs();
> >> TermEnum termEnum = reader.terms(new Term(field, ""));
> >> do {
> >>     Term term = termEnum.term();
> >> } while (termEnum.next());
> >>
> >> I am using this code to retrieve the field containing the url, but it is
> >> tokenized. Is there any way to untokenize it, or is there a better way
> >> to do this?
> >>
> >> Shai Erera wrote:
> >> >
> >> > I think that the simplest solution will be to index the URL field
> >> > twice, once as TOKENIZED and once as UN_TOKENIZED. Then you can look
> >> > up the un_tokenized term.
> >> > If you have a document in hand and only want to fetch its URL, then
> >> > add the URL twice, once as Store.NO, Index.TOKENIZED and once as
> >> > Store.YES / COMPRESS and Index.NO.
> >> >
> >> > Perhaps I don't understand the entire scenario. When do you need to
> >> > fetch the contentLength and URL? To what purpose?
> >> >
> >> > On Sun, Jul 6, 2008 at 4:26 AM, blazingwolf7 <[EMAIL PROTECTED]>
> >> > wrote:
> >> >
> >> >> No, I didn't store the contentLength; I only added it to the index.
> >> >> I am still scratching my head, as I can't think of another way to
> >> >> retrieve it without continuously using the reader.
> >> >>
> >> >> As for the url, I use
> >> >> doc.add(new Field("url", Store.NO, Index.TOKENIZED)).
> >> >> I would like to keep it this way, with the url tokenized. I am
> >> >> looking for a way to untokenize it. I retrieve it using a method
> >> >> that retrieves the entire field and then extracts the information in
> >> >> it. But the problem is that the url is broken down. I am looking for
> >> >> a way to reconstruct it in its original format. Can it be done?
> >> >>
> >> >> Shai Erera wrote:
> >> >> >
> >> >> > Hi
> >> >> >
> >> >> > Regarding the contentLength, when you add it to the document, do
> >> >> > you *store* it as well (i.e., passing Store.YES or Store.COMPRESS)?
> >> >> >
> >> >> > Regarding the URL, how do you add it to the document? For example,
> >> >> > if you do
> >> >> > doc.add(new Field("url", "http://www.cnn.com", Store.NO,
> >> >> > Index.UN_TOKENIZED)), it would create a token like
> >> >> > "url: http://www.cnn.com" without breaking it into its parts. Is
> >> >> > that what you're looking for?
> >> >> >
> >> >> > Shai
> >> >> >
> >> >> > On Fri, Jul 4, 2008 at 11:19 AM, blazingwolf7 <[EMAIL PROTECTED]>
> >> >> > wrote:
> >> >> >
> >> >> >> Hi,
> >> >> >>
> >> >> >> I am currently working on retrieving the url and contentLength of
> >> >> >> each document found during the search. I want to retrieve them
> >> >> >> during the calculation of the score so that I can influence the
> >> >> >> score in some other way.
> >> >> >>
> >> >> >> I used the methods from TermDocs and TermEnum to get the
> >> >> >> information. However, the url I retrieve is, as most will know,
> >> >> >> tokenized. It is broken down into several parts and I will have to
> >> >> >> rejoin them. Can anyone help me with this? I am stuck here
> >> >> >> wondering how to get back the whole url without using a Reader.
> >> >> >>
> >> >> >> Also, I tried to retrieve the contentLength, but the results
> >> >> >> returned are null. Why is that? I opened the index using Luke and
> >> >> >> the contentLength is there, but when I try to get it this way, the
> >> >> >> result is null.
> >> >> >>
> >> >> >> Can anyone help me with both of these problems? Any help will be
> >> >> >> appreciated.
> >> >> >> Thanks
> >> >> >> --
> >> >> >> View this message in context:
> >> >> >> http://www.nabble.com/Untokenized-URL-tp18275048p18275048.html
> >> >> >> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
> >> >> >>
> >> >> >> ---------------------------------------------------------------------
> >> >> >> To unsubscribe, e-mail: [EMAIL PROTECTED]
> >> >> >> For additional commands, e-mail: [EMAIL PROTECTED]
> >> >> >
> >> >> > --
> >> >> > Regards,
> >> >> >
> >> >> > Shai Erera
> >> >>
> >> >> --
> >> >> View this message in context:
> >> >> http://www.nabble.com/Untokenized-URL-tp18275048p18298055.html
> >> >> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
> >> >>
> >> >> ---------------------------------------------------------------------
> >> >> To unsubscribe, e-mail: [EMAIL PROTECTED]
> >> >> For additional commands, e-mail: [EMAIL PROTECTED]
> >> >
> >> > --
> >> > Regards,
> >> >
> >> > Shai Erera
> >>
> >> --
> >> View this message in context:
> >> http://www.nabble.com/Untokenized-URL-tp18275048p18310348.html
> >> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [EMAIL PROTECTED]
> >> For additional commands, e-mail: [EMAIL PROTECTED]
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
>
> --
> View this message in context:
> http://www.nabble.com/Untokenized-URL-tp18275048p18311247.html
> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
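
For reference, here is a minimal sketch of the "index the URL twice" suggestion discussed in this thread. It is plain Java and does not use the Lucene API: the class TwoFieldIndexSketch, its methods, and the simple split-on-punctuation tokenizer are all hypothetical stand-ins. They are chosen only to show why the terms of a tokenized field cannot give back the original URL, while a second, untokenized field keeps it as a single term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model of a term dictionary; NOT the Lucene API (all names are hypothetical). */
public class TwoFieldIndexSketch {

    // field name -> sorted set of terms indexed under that field
    private final Map<String, SortedSet<String>> termsByField = new TreeMap<>();

    /** Stand-in for an analyzer: lowercases and splits on anything non-alphanumeric. */
    public static List<String> tokenize(String value) {
        List<String> tokens = new ArrayList<>();
        for (String part : value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    /** Indexes the value as several terms, one per token. */
    public void addTokenized(String field, String value) {
        termsByField.computeIfAbsent(field, f -> new TreeSet<>()).addAll(tokenize(value));
    }

    /** Indexes the whole value as one single term. */
    public void addUntokenized(String field, String value) {
        termsByField.computeIfAbsent(field, f -> new TreeSet<>()).add(value);
    }

    /** Returns the terms of one field, roughly what a term enumeration would walk over. */
    public SortedSet<String> terms(String field) {
        return termsByField.getOrDefault(field, new TreeSet<>());
    }

    public static void main(String[] args) {
        TwoFieldIndexSketch index = new TwoFieldIndexSketch();
        String url = "http://www.cnn.com";

        // Same value, two fields: one for searching, one for reading the URL back.
        index.addTokenized("url", url);
        index.addUntokenized("url-untokenized", url);

        System.out.println(index.terms("url"));             // [cnn, com, http, www]
        System.out.println(index.terms("url-untokenized")); // [http://www.cnn.com]
    }
}
```

The tokenized field only holds the pieces, in term order rather than in their original order, so the URL cannot be rebuilt from them. In Lucene itself, the second field would presumably be the one added with Index.UN_TOKENIZED, as Shai describes, and the TermEnum loop would be pointed at that field's name.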