I couldn't get much from what you said. Could you please elaborate? I'd appreciate it.
On Mon, Jun 10, 2013 at 5:20 PM, Jack Krupansky <[email protected]> wrote:

> Your stored value could be very different from your indexed (searchable)
> value. You can also associate payloads with an indexed term. And there are
> DocValues as well.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: nikhil desai
> Sent: Monday, June 10, 2013 8:06 PM
> To: [email protected]
> Subject: Re: Lucene Indexes explanantion
>
> Sure. Thanks Jack.
> I don't have much experience working with Lucene; however, here is what I
> am trying to resolve.
>
> I learned that custom attributes cannot be used for indexing or searching.
> However, I wanted the attributes to be used for indexing and searching. So
> I created custom attributes and inserted them as tokens into the
> TokenStream by setting the positionIncrement attribute to 0. Since my new
> token stream has the attributes (as tokens) and they are used while
> indexing, I can now search for a document based on the attributes (the
> tokens I newly inserted). However, I still have an issue. And by the way,
> I have a lot of attributes that I need to assign to an individual token.
>
> Ex: Sentence: "LinkedIn is famous"
> After passing through a custom analyzer and a few filters that I have
> written, and appending attributes to the tokens, the new TokenStream we
> get is "LinkedIn Noun SocialSite famous JJ Positive". (What that means is
> that LinkedIn is a Noun and is also a SocialSite, famous is an adjective
> and also a Positive word; 'is' is removed as it does not make sense to
> index 'is'.)
>
> This is now definitely searchable based on attributes (here: Noun,
> SocialSite, JJ, Positive).
>
> However, since I have put the entire text "LinkedIn is famous" as a Field
> while adding a Document, when I search for, say, "SocialSite", I get as
> output a Document which has "LinkedIn is famous" as one of its fields.
>
> However, is it possible to get only "LinkedIn" as output rather than the
> entire text? That is, only the actual token (the token present in the
> original input) as output?
> Another example: if I search for, say, "Positive", I should get "famous"
> as output and not the entire "LinkedIn is famous".
>
> I know that if I put it as a Field in the document, I should be able to
> get it, but how do I add such a Field? Only when the tokens are passed
> through the filters do we get to know which attributes will be attached
> to them, so when we call indexwriter.addDocument() we have no idea about
> the attributes.
>
> The problem I see is that indexing is done based on the new token stream,
> which is good, but when the Document is retrieved, it has the original
> token stream (or actual input), and that is what is given as output.
>
> Does that make any sense? Or do I have a use case that does not go well
> with Lucene?
>
> Any help or comments are appreciated.
>
> On Mon, Jun 10, 2013 at 1:32 PM, Jack Krupansky <[email protected]> wrote:
>
>> Even though you've posted for Lucene, you might want to consider taking
>> a look at Solr, because Solr has an Admin UI with an Analysis page which
>> gives you a nice display of how index and query text is analyzed into
>> tokens, terms, and attributes - all of which Solr inherits from Lucene.
>>
>> And check out the unit tests for Lucene (and Solr) for indexing. Then
>> you can actually step through code and see it happen.
>>
>> Otherwise, google for blogs on various sub-topics of interest with
>> specific terms.
>>
>> OTOH... don't try diving too deeply until you've written and understood
>> a fair amount of Java code using Lucene. Otherwise, you won't have
>> enough context to understand or even ask intelligent questions.
>>
>> -- Jack Krupansky
>>
>> -----Original Message-----
>> From: nikhil desai
>> Sent: Monday, June 10, 2013 1:24 PM
>> To: [email protected]
>> Subject: Lucene Indexes explanantion
>>
>> Hello,
>>
>> This is my first post in this group.
>>
>> I have been using Lucene recently. I have a question.
>>
>> Where can I find a good explanation of indexes? Or rather, how does
>> indexing (not really the mathematical aspect) happen in Lucene, and
>> which attributes (charTerm, Offset, etc.) come into play? And how is it
>> implemented? I checked "Lucene In Action" and could not find much on
>> actual indexing or on which classes are being used.
>>
>> Appreciate your help.
>>
>> Thanks
>> NIKHIL
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
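
For reference, the token-injection idea described in the quoted message can be modelled in a few lines of plain Java. This is only a sketch and deliberately does not use the Lucene API: the class TagIndexSketch, the Token type, and the hard-coded analysis output are all hypothetical names invented for illustration. It assumes each injected tag (position increment 0) carries the same start and end offsets as the word it describes, so that a hit on a tag can be mapped back to the original word instead of the whole field text. In Lucene itself that information would have to come from somewhere such as stored offsets or payloads, which is not shown here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TagIndexSketch {

    /** A token as an analyzer might emit it: term text, position increment, source offsets. */
    static final class Token {
        final String term;
        final int positionIncrement;
        final int startOffset;
        final int endOffset;

        Token(String term, int positionIncrement, int startOffset, int endOffset) {
            this.term = term;
            this.positionIncrement = positionIncrement;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /**
     * Hypothetical analysis output for "LinkedIn is famous".
     * Tags are stacked on their word (increment 0) and reuse its offsets.
     */
    static List<Token> analyze() {
        List<Token> tokens = new ArrayList<>();
        tokens.add(new Token("LinkedIn", 1, 0, 8));
        tokens.add(new Token("Noun", 0, 0, 8));
        tokens.add(new Token("SocialSite", 0, 0, 8));
        tokens.add(new Token("famous", 2, 12, 18)); // increment 2: "is" was removed
        tokens.add(new Token("JJ", 0, 12, 18));
        tokens.add(new Token("Positive", 0, 12, 18));
        return tokens;
    }

    /** Builds a map from term to the [start, end) source offsets where it occurs. */
    static Map<String, List<int[]>> index(List<Token> tokens) {
        Map<String, List<int[]>> postings = new HashMap<>();
        for (Token t : tokens) {
            postings.computeIfAbsent(t.term, k -> new ArrayList<>())
                    .add(new int[] {t.startOffset, t.endOffset});
        }
        return postings;
    }

    /** Returns the original words that the given tag (or plain term) is attached to. */
    static List<String> lookup(String source, Map<String, List<int[]>> postings, String term) {
        List<String> hits = new ArrayList<>();
        for (int[] span : postings.getOrDefault(term, new ArrayList<>())) {
            hits.add(source.substring(span[0], span[1]));
        }
        return hits;
    }

    public static void main(String[] args) {
        String source = "LinkedIn is famous";
        Map<String, List<int[]>> postings = index(analyze());
        System.out.println(lookup(source, postings, "SocialSite")); // [LinkedIn]
        System.out.println(lookup(source, postings, "Positive"));   // [famous]
    }
}
```

The design choice here is that the tag tokens never need their own stored field: the offsets alone are enough to cut the matching word out of the stored text, which sidesteps the problem of not knowing the attributes at addDocument() time.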
