Certainly sounds like a bug in your analyzer.  You could start a new
thread if you need help with that.  But from your previous email it
sounds like you could use WhitespaceTokenizer chained with
LowerCaseFilter.
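
To see what that chain would index for your sample text, here is a rough
plain-Java sketch. It is a stand-in, not Lucene code: the class name, the
Token holder and tokenize() are made up for illustration. It assumes only
that tokens are split at whitespace, lowercased, and given offsets into the
original string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java stand-in, NOT Lucene code: mimics "split on whitespace, then lowercase".
class WhitespaceLowercaseSketch {

    // Hypothetical holder for a token and its character offsets in the input.
    static final class Token {
        final String text;
        final int start; // inclusive offset into the original input
        final int end;   // exclusive offset into the original input

        Token(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return text + "[" + start + "," + end + ")";
        }
    }

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<Token>();
        final int n = input.length();
        int i = 0;
        while (i < n) {
            // Skip the whitespace between tokens.
            while (i < n && Character.isWhitespace(input.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume everything up to the next whitespace, punctuation included.
            while (i < n && !Character.isWhitespace(input.charAt(i))) {
                i++;
            }
            if (i > start) {
                String text = input.substring(start, i).toLowerCase(Locale.ROOT);
                tokens.add(new Token(text, start, i));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize(",Andrey Gubarev,JingGoogle,Inc.")) {
            System.out.println(t);
        }
    }
}
```

For ",Andrey Gubarev,JingGoogle,Inc." this yields two tokens, ",andrey" and
"gubarev,jinggoogle,inc.", so the punctuation stays attached to the words.
A query term of plain "andrey" or "gubarev" would then not equal either
indexed term.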


--
Ian.


On Thu, Oct 3, 2013 at 7:16 AM, VIGNESH S <vigneshkln...@gmail.com> wrote:
> Hi,
>
> In my analyzer, the problem actually occurs for words that are preceded by
> punctuation marks.
>
> For example, if I index the content ",Andrey Gubarev,JingGoogle,Inc."
> and then search for "Andrey Gubarev", it does not work properly, since the
> word "Andrey" is preceded by the punctuation mark ",".
>
>
> On Thu, Oct 3, 2013 at 11:23 AM, VIGNESH S <vigneshkln...@gmail.com> wrote:
>
>> Hi Ian,
>>
>> In Lucene, is there any default analyzer we can use that will ignore only
>> spaces? It should preserve everything else: numbers, punctuation, dates.
>>
>> I created my analyzer with a tokenizer that returns
>> Character.isDefined(cn) && (!Character.isWhitespace(cn)).
>> My analyzer uses a lowercase filter on top of the tokenizer. This works
>> perfectly in 3.6.
>> In 4.3 it is creating problems with the offsets of tokens.
>>
>>
>>
>>
>> On Mon, Sep 30, 2013 at 8:21 PM, Ian Lea <ian....@gmail.com> wrote:
>>
>>> Whenever someone says they are using a custom analyzer that has to be
>>> a suspect.  Does it work if you use one of the core lucene analyzers
>>> instead?  Have you used Luke to verify that the index holds what you
>>> think it does?
>>>
>>>
>>> --
>>> Ian.
>>>
>>>
>>> On Mon, Sep 30, 2013 at 3:21 PM, VIGNESH S <vigneshkln...@gmail.com>
>>> wrote:
>>> > Hi,
>>> >
>>> > It is not a problem with case, because I am using LowerCaseFilter.
>>> >
>>> > My analyzer is a custom analyzer that ignores just whitespace. It keeps
>>> > all other numbers, dates and other special characters. The same
>>> > analyzer works for Lucene 3.6.
>>> >
>>> >
>>> > When I do a single-term query for "Geoffrey" it gives hits. But when it
>>> > is given as part of a MultiPhraseQuery, it is not found. When the code
>>> > below is executed with, say, word = "Geoffrey", it does not find the word
>>> > itself.
>>> >
>>> > // seekCeil positions the enum on the smallest term >= word;
>>> > // FOUND means that term is exactly word.
>>> > if (TermsEnum.SeekStatus.FOUND == trm.seekCeil(new BytesRef(word))) {
>>> >   do {
>>> >     String s = trm.term().utf8ToString();
>>> >     if (s.equals(word)) {
>>> >       termsWithPrefix.add(new Term("content", s));
>>> >     } else {
>>> >       break;
>>> >     }
>>> >   } while (trm.next() != null);
>>> > }
>>> >
>>> >
>>> >
>>> > On Mon, Sep 30, 2013 at 3:01 PM, Ian Lea <ian....@gmail.com> wrote:
>>> >
>>> >> Whenever someone says something along the lines of a search for
>>> >> "geoffrey" not matching "Geoffrey", the case difference springs out.
>>> >> Can't recall what, if anything, you said about the analysis side of
>>> >> things, but that could be the cause.  See
>>> >>
>>> >> http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2F_incorrect_hits.3F
>>> >>
>>> >> If on the other hand the problem is more obscure, and only related to
>>> >> the multi phrase stuff, I suggest you build a tiny but complete
>>> >> RAMDirectory based program or test case that shows the problem and
>>> >> post it here.
>>> >>
>>> >>
>>> >> --
>>> >> Ian.
>>> >>
>>> >>
>>> >>
>>> >> On Mon, Sep 30, 2013 at 6:46 AM, VIGNESH S <vigneshkln...@gmail.com>
>>> >> wrote:
>>> >> > Hi,
>>> >> >
>>> >> > Thanks for your reply. The problem I face is that there is a phrase,
>>> >> > "Geoffrey Romer", in my field.
>>> >> >
>>> >> > I am forming a MultiPhraseQuery object properly, like "Geoffrey Romer".
>>> >> > But when I do a search, it returns no hits. I do not face this problem
>>> >> > for all phrases; it happens for only a few.
>>> >> >
>>> >> > When I do a single query like "Geoffrey" it gives a hit. But when I do
>>> >> > it in a MultiPhraseQuery it is not able to find "geoffrey". I confirmed
>>> >> > this by doing trm.seekCeil(new BytesRef("Geoffrey")) and then
>>> >> > String s = trm.term().utf8ToString(). It points to a different word
>>> >> > instead of "geoffrey". seekCeil works properly for many phrases, though.
>>> >> >
>>> >> > What could be the problem? Please kindly suggest.
>>> >> >
>>> >> >
>>> >> >
>>> >> > On Fri, Sep 27, 2013 at 6:58 PM, Allison, Timothy B. <talli...@mitre.org> wrote:
>>> >> >
>>> >> >> 1) An alternate method to your original question would be to do something
>>> >> >> like this (I haven't compiled or tested this!):
>>> >> >>
>>> >> >> Query q = new PrefixQuery(new Term("field", "app"));
>>> >> >>
>>> >> >> q = q.rewrite(indexReader) ;
>>> >> >> Set<Term> terms = new HashSet<Term>();
>>> >> >> q.extractTerms(terms);
>>> >> >> Term[] arr = terms.toArray(new Term[terms.size()]);
>>> >> >> MultiPhraseQuery mpq = new MultiPhraseQuery();
>>> >> >> mpq.add(new Term("field", "microsoft"));
>>> >> >> mpq.add(arr);
>>> >> >>
>>> >> >>
>>> >> >> 2) At a higher level, do you need to generate your query programmatically?
>>> >> >> Here are three parsers that could handle this:
>>> >> >>   a) ComplexPhraseQueryParser
>>> >> >>   b) SurroundQueryParser: oal.queryparser.surround.parser.QueryParser
>>> >> >>   c) experimental: <self_promotion degree="shameless">
>>> >> >> http://issues.apache.org/jira/browse/LUCENE-5205</self_promotion>
>>> >> >>
>>> >> >>
>>> >> >> -----Original Message-----
>>> >> >> From: VIGNESH S [mailto:vigneshkln...@gmail.com]
>>> >> >> Sent: Friday, September 27, 2013 3:33 AM
>>> >> >> To: java-user@lucene.apache.org
>>> >> >> Subject: Re: Multiphrase Query in Lucene 4.3
>>> >> >>
>>> >> >> Hi,
>>> >> >>
>>> >> >> The phrase I am giving is "Romer Geoffrey". The phrase is in the field.
>>> >> >>
>>> >> >> I call trm.seekCeil(new BytesRef("Geoffrey")) and then
>>> >> >> String s = trm.term().utf8ToString();
>>> >> >>
>>> >> >> It gives a different word. I think this is why my MultiPhraseQuery is
>>> >> >> not giving the desired results.
>>> >> >>
>>> >> >> What may be the reason?
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> On Fri, Sep 27, 2013 at 11:49 AM, VIGNESH S <vigneshkln...@gmail.com> wrote:
>>> >> >>
>>> >> >> > Hi Ian,
>>> >> >> >
>>> >> >> > Thanks for your reply.
>>> >> >> >
>>> >> >> > I am doing something similar to this. The MultiPhraseQuery object
>>> >> >> > contains the correct phrase, but it does not return any hits.
>>> >> >> >
>>> >> >> > In Lucene 3.6, I implemented the same logic and it works.
>>> >> >> >
>>> >> >> > In Lucene 4.3, I built the index for that using
>>> >> >> >
>>> >> >> >  FieldType offsetsType = new FieldType(TextField.TYPE_STORED);
>>> >> >> >  offsetsType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
>>> >> >> >
>>> >> >> > For MultiPhraseQuery, do I need to set any other parameter in
>>> >> >> > addition to this while indexing?
>>> >> >> >
>>> >> >> > Is there a MultiPhraseQueryTest Java file for Lucene 4.3? I checked
>>> >> >> > the Lucene branch and was not able to find one. Please kindly help.
>>> >> >> >
>>> >> >> >
>>> >> >> >
>>> >> >> >
>>> >> >> >
>>> >> >> >
>>> >> >> > On Thu, Sep 26, 2013 at 2:55 PM, Ian Lea <ian....@gmail.com> wrote:
>>> >> >> >
>>> >> >> >> I use the code below to do something like this.  Not exactly what you
>>> >> >> >> want but should be easy to adapt.
>>> >> >> >>
>>> >> >> >>
>>> >> >> >> public List<String> findTerms(IndexReader _reader,
>>> >> >> >>                               String _field) throws IOException {
>>> >> >> >>   List<String> l = new ArrayList<String>();
>>> >> >> >>   Fields ff = MultiFields.getFields(_reader);
>>> >> >> >>   Terms trms = ff.terms(_field);
>>> >> >> >>   TermsEnum te = trms.iterator(null);
>>> >> >> >>   BytesRef br;
>>> >> >> >>   while ((br = te.next()) != null) {
>>> >> >> >>     l.add(br.utf8ToString());
>>> >> >> >>   }
>>> >> >> >>   return l;
>>> >> >> >> }
>>> >> >> >>
>>> >> >> >> --
>>> >> >> >> Ian.
>>> >> >> >>
>>> >> >> >> On Wed, Sep 25, 2013 at 3:04 PM, VIGNESH S <vigneshkln...@gmail.com> wrote:
>>> >> >> >> > Hi,
>>> >> >> >> >
>>> >> >> >> > The MultiPhraseQuery documentation gives this example:
>>> >> >> >> >
>>> >> >> >> > "To use this class, to search for the phrase "Microsoft app*" first
>>> >> >> >> > use add(Term) on the term "Microsoft", then find all terms that have
>>> >> >> >> > "app" as prefix using IndexReader.terms(Term), and use
>>> >> >> >> > MultiPhraseQuery.add(Term[] terms) to add them to the query"
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >> > How can I replicate the same in Lucene 4.3, since
>>> >> >> >> > IndexReader.terms(Term) is no longer available?
>>> >> >> >> >
>>> >> >> >> > --
>>> >> >> >> > Thanks and Regards
>>> >> >> >> > Vignesh Srinivasan
>>> >> >> >>
>>> >> >> >>
>>> >> >> >> ---------------------------------------------------------------------
>>> >> >> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> >> >> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>> >> >> >>
>>> >> >> >>
>>> >> >> >
>>> >> >> >
>>> >> >> >
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >
>>> >
>>>
>>>
>>>
>>
>>
>>
>
>
>

