Certainly sounds like a bug in your analyzer. You could start a new thread if you need help with that. But from your previous email it sounds like you could use WhitespaceTokenizer chained with LowerCaseFilter.
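As a rough illustration of what that chain would emit, here is a plain-Java sketch that splits on whitespace only and lowercases each token, keeping start/end offsets. It does not use the Lucene API: the class name WhitespaceLowercaseSketch, the Token type and tokenize() are made up for this example, and the input string is the one from your earlier mail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WhitespaceLowercaseSketch {

    /** A token plus its character offsets in the original text (hypothetical type). */
    public static final class Token {
        public final String text;
        public final int start; // inclusive
        public final int end;   // exclusive

        Token(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /** Split on whitespace only, then lowercase each token; offsets refer to the input. */
    public static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            // Skip whitespace between tokens.
            while (i < input.length() && Character.isWhitespace(input.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume everything up to the next whitespace, punctuation included.
            while (i < input.length() && !Character.isWhitespace(input.charAt(i))) {
                i++;
            }
            if (i > start) {
                String text = input.substring(start, i).toLowerCase(Locale.ROOT);
                tokens.add(new Token(text, start, i));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize(",Andrey Gubarev,JingGoogle,Inc.")) {
            System.out.println(t.text + " [" + t.start + "," + t.end + ")");
        }
    }
}
```

On that input the sketch produces two tokens, ",andrey" and "gubarev,jinggoogle,inc.", because punctuation stays attached to the word it touches. A query term therefore has to carry the same punctuation to match; if that is not what you want, whitespace-only tokenization is probably not the right fit.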
--
Ian.

On Thu, Oct 3, 2013 at 7:16 AM, VIGNESH S <vigneshkln...@gmail.com> wrote:

> Hi,
>
> In my analyzer, the problem actually occurs for words which are preceded
> by punctuation marks.
>
> For example:
> If I am indexing the content ",Andrey Gubarev,JingGoogle,Inc."
>
> If I search for "Andrey Gubarev", it is not working properly since the
> word Andrey is preceded by the punctuation mark ",".
>
>
> On Thu, Oct 3, 2013 at 11:23 AM, VIGNESH S <vigneshkln...@gmail.com> wrote:
>
>> Hi Ian,
>>
>> In Lucene, is there any default analyzer we can use which will ignore
>> only spaces? It should preserve everything else: numbers, punctuation,
>> dates.
>>
>> I created my analyzer with a tokenizer which returns
>> Character.isDefined(cn) && (!Character.isWhitespace(cn)).
>> My analyzer uses a lowercase filter on top of the tokenizer. This works
>> perfectly in 3.6.
>> In 4.3 it is creating problems in the offsets of tokens.
>>
>>
>> On Mon, Sep 30, 2013 at 8:21 PM, Ian Lea <ian....@gmail.com> wrote:
>>
>>> Whenever someone says they are using a custom analyzer that has to be
>>> a suspect. Does it work if you use one of the core lucene analyzers
>>> instead? Have you used Luke to verify that the index holds what you
>>> think it does?
>>>
>>>
>>> --
>>> Ian.
>>>
>>>
>>> On Mon, Sep 30, 2013 at 3:21 PM, VIGNESH S <vigneshkln...@gmail.com>
>>> wrote:
>>> > Hi,
>>> >
>>> > It is not a problem with case, because I am using LowerCaseFilter.
>>> >
>>> > My analyzer is a custom analyzer which ignores just white space. It
>>> > keeps everything else: numbers, dates and other special characters.
>>> > The same analyzer works for Lucene 3.6.
>>> >
>>> >
>>> > When I do a single term query for "Geoffrey" it gives hits. But when
>>> > it is given as part of a multiphrase query, it is not found. When the
>>> > code below is executed with, say, word = "Geoffrey", it does not find
>>> > the word itself:
>>> >
>>> > if (TermsEnum.SeekStatus.FOUND == trm.seekCeil(new BytesRef(word))) {
>>> >     do {
>>> >         String s = trm.term().utf8ToString();
>>> >         if (s.equals(word)) {
>>> >             termsWithPrefix.add(new Term("content", s));
>>> >         } else {
>>> >             break;
>>> >         }
>>> >     } while (trm.next() != null);
>>> > }
>>> >
>>> >
>>> > On Mon, Sep 30, 2013 at 3:01 PM, Ian Lea <ian....@gmail.com> wrote:
>>> >
>>> >> Whenever someone says something along the lines of a search for
>>> >> "geoffrey" not matching "Geoffrey" the case difference springs out.
>>> >> Can't recall what if anything you said about the analysis side of
>>> >> things but that could be the cause. See
>>> >>
>>> >> http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2F_incorrect_hits.3F
>>> >>
>>> >> If on the other hand the problem is more obscure, and only related to
>>> >> the multi phrase stuff, I suggest you build a tiny but complete
>>> >> RAMDirectory based program or test case that shows the problem and
>>> >> post it here.
>>> >>
>>> >>
>>> >> --
>>> >> Ian.
>>> >>
>>> >>
>>> >> On Mon, Sep 30, 2013 at 6:46 AM, VIGNESH S <vigneshkln...@gmail.com>
>>> >> wrote:
>>> >> > Hi,
>>> >> >
>>> >> > Thanks for your reply. The problem I face is that there is a name,
>>> >> > Geoffrey Romer, in my field.
>>> >> >
>>> >> > I am forming a MultiPhraseQuery object properly, like "Geoffrey
>>> >> > Romer". But when I do a search, it is not returning hits. This
>>> >> > problem does not occur for all phrases; it happens for only a few.
>>> >> >
>>> >> > When I do a single query like Geoffrey it gives a hit. But when I
>>> >> > do it in a MultiPhraseQuery it is not able to find "geoffrey".
>>> >> > I confirmed this by doing trm.seekCeil(new BytesRef("Geoffrey"))
>>> >> > and then String s = trm.term().utf8ToString(). It points to a
>>> >> > different word instead of geoffrey. seekCeil is working properly
>>> >> > for many phrases, though.
>>> >> >
>>> >> > What could be the problem? Please kindly suggest.
>>> >> >
>>> >> >
>>> >> > On Fri, Sep 27, 2013 at 6:58 PM, Allison, Timothy B.
>>> >> > <talli...@mitre.org> wrote:
>>> >> >
>>> >> >> 1) An alternate method to your original question would be to do
>>> >> >> something like this (I haven't compiled or tested this!):
>>> >> >>
>>> >> >> Query q = new PrefixQuery(new Term("field", "app"));
>>> >> >>
>>> >> >> q = q.rewrite(indexReader);
>>> >> >> Set<Term> terms = new HashSet<Term>();
>>> >> >> q.extractTerms(terms);
>>> >> >> Term[] arr = terms.toArray(new Term[terms.size()]);
>>> >> >> MultiPhraseQuery mpq = new MultiPhraseQuery();
>>> >> >> mpq.add(new Term("field", "microsoft"));
>>> >> >> mpq.add(arr);
>>> >> >>
>>> >> >>
>>> >> >> 2) At a higher level, do you need to generate your query
>>> >> >> programmatically? Here are three parsers that could handle this:
>>> >> >> a) ComplexPhraseQueryParser
>>> >> >> b) SurroundQueryParser: oal.queryparser.surround.parser.QueryParser
>>> >> >> c) experimental: <self_promotion degree="shameless">
>>> >> >> http://issues.apache.org/jira/browse/LUCENE-5205</self_promotion>
>>> >> >>
>>> >> >>
>>> >> >> -----Original Message-----
>>> >> >> From: VIGNESH S [mailto:vigneshkln...@gmail.com]
>>> >> >> Sent: Friday, September 27, 2013 3:33 AM
>>> >> >> To: java-user@lucene.apache.org
>>> >> >> Subject: Re: Multiphrase Query in Lucene 4.3
>>> >> >>
>>> >> >> Hi,
>>> >> >>
>>> >> >> The word I am giving is "Romer Geoffrey". The word is in the field.
>>> >> >>
>>> >> >> I do trm.seekCeil(new BytesRef("Geoffrey")) and then String s =
>>> >> >> trm.term().utf8ToString();
>>> >> >>
>>> >> >> It gives a different word. I think this is why my MultiPhraseQuery
>>> >> >> is not giving the desired results.
>>> >> >>
>>> >> >> What may be the reason?
>>> >> >>
>>> >> >>
>>> >> >> On Fri, Sep 27, 2013 at 11:49 AM, VIGNESH S <vigneshkln...@gmail.com>
>>> >> >> wrote:
>>> >> >>
>>> >> >> > Hi Ian,
>>> >> >> >
>>> >> >> > Thanks for your reply.
>>> >> >> >
>>> >> >> > I am doing something similar to this. The actual phrase goes
>>> >> >> > into the MultiPhraseQuery object properly, but it is not
>>> >> >> > returning any hits.
>>> >> >> >
>>> >> >> > In Lucene 3.6, I implemented the same logic and it is working.
>>> >> >> >
>>> >> >> > In Lucene 4.3, I implemented the index for that using
>>> >> >> >
>>> >> >> > FieldType offsetsType = new FieldType(TextField.TYPE_STORED);
>>> >> >> > offsetsType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
>>> >> >> >
>>> >> >> > For MultiPhraseQuery, do I need to add any other parameter in
>>> >> >> > addition to this while indexing?
>>> >> >> >
>>> >> >> > Is there any MultiPhraseQueryTest java file for Lucene 4.3? I
>>> >> >> > checked in the Lucene branch and I was not able to find one.
>>> >> >> > Please kindly help.
>>> >> >> >
>>> >> >> >
>>> >> >> > On Thu, Sep 26, 2013 at 2:55 PM, Ian Lea <ian....@gmail.com> wrote:
>>> >> >> >
>>> >> >> >> I use the code below to do something like this. Not exactly what
>>> >> >> >> you want but should be easy to adapt.
>>> >> >> >>
>>> >> >> >>
>>> >> >> >> public List<String> findTerms(IndexReader _reader,
>>> >> >> >>                               String _field) throws IOException {
>>> >> >> >>     List<String> l = new ArrayList<String>();
>>> >> >> >>     Fields ff = MultiFields.getFields(_reader);
>>> >> >> >>     Terms trms = ff.terms(_field);
>>> >> >> >>     TermsEnum te = trms.iterator(null);
>>> >> >> >>     BytesRef br;
>>> >> >> >>     while ((br = te.next()) != null) {
>>> >> >> >>         l.add(br.utf8ToString());
>>> >> >> >>     }
>>> >> >> >>     return l;
>>> >> >> >> }
>>> >> >> >>
>>> >> >> >> --
>>> >> >> >> Ian.
>>> >> >> >>
>>> >> >> >> On Wed, Sep 25, 2013 at 3:04 PM, VIGNESH S <vigneshkln...@gmail.com>
>>> >> >> >> wrote:
>>> >> >> >> > Hi,
>>> >> >> >> >
>>> >> >> >> > In the example for MultiPhraseQuery it is mentioned:
>>> >> >> >> >
>>> >> >> >> > "To use this class, to search for the phrase "Microsoft app*"
>>> >> >> >> > first use add(Term) on the term "Microsoft", then find all
>>> >> >> >> > terms that have "app" as prefix using IndexReader.terms(Term),
>>> >> >> >> > and use MultiPhraseQuery.add(Term[] terms) to add them to the
>>> >> >> >> > query"
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >> > How can I replicate the same in Lucene 4.3, since
>>> >> >> >> > IndexReader.terms(Term) is no longer used?
>>> >> >> >> >
>>> >> >> >> > --
>>> >> >> >> > Thanks and Regards
>>> >> >> >> > Vignesh Srinivasan
>>> >> >> >>
>>> >> >> >> ---------------------------------------------------------------------
>>> >> >> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> >> >> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>> >> >> >>
>>> >> >> >
>>> >> >> > --
>>> >> >> > Thanks and Regards
>>> >> >> > Vignesh Srinivasan
>>> >> >> > 9739135640
>>> >> >> >
>>> >> >>
>>> >> >> --
>>> >> >> Thanks and Regards
>>> >> >> Vignesh Srinivasan
>>> >> >> 9739135640
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org