Re: matching "starts with" only

2014-08-13 Thread Erick Erickson
I'd recommend that you spend some time with the
admin/analysis page.

KeywordTokenizer doesn't break up the input at _all_. So
the text "this is a black cat" will never match a query for anything
that starts with "black". String is even more restrictive: it not only
doesn't tokenize, it doesn't lowercase either, so matching is case-sensitive.

You haven't articulated the use-case you're really trying to support.
Is it a requirement that you always match from the left? I.e., if the
text is "this is a black cat", you don't want a query for "black cat" to
match it, only one for "this is a black cat"? If so, try EdgeNGramTokenizer.
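
A rough Python model of the left-anchored matching that an edge n-gram approach gives you, under stated assumptions: the whole value is treated as one lowercased token, expanded into leading n-grams at index time only, and the query is matched as a single lowercased term. `edge_ngrams` and `matches_from_left` are hypothetical helpers, not Solr APIs, and the gram sizes are illustrative (in Solr they would correspond to the factory's minGramSize and maxGramSize).

```python
def edge_ngrams(value: str, min_gram: int = 1, max_gram: int = 50) -> set:
    """Leading n-grams of the whole value, treated as one lowercased token."""
    token = value.lower()  # the entire input is a single token, lowercased
    upper = min(max_gram, len(token))
    return {token[:n] for n in range(min_gram, upper + 1)}


def matches_from_left(doc: str, query: str) -> bool:
    """Assumed query side: one lowercased term that must equal an indexed gram."""
    return query.lower() in edge_ngrams(doc)


docs = ["this is a black cat", "black cat", "black cat is beautiful", "black forest cat"]
for d in docs:
    print(f"{d!r}: {matches_from_left(d, 'black cat')}")
```

Under these assumptions a query for "black cat" hits "black cat" and "black cat is beautiful" but not "this is a black cat" or "black forest cat". A prefix longer than max_gram can never match, so the real gram limit has to cover the longest prefix users will type.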

Best,
Erick


Re: matching "starts with" only

2014-08-12 Thread zameer
On Solr 3.6, when I query "black\ cat*" (as you suggested in your post),
I don't get any results. If I query "black*\ cat*" instead, I get these
results:
black forest cat
black cat
black color cat

But I only need results like these:
black cat
black cat is beautiful
black cat and dog

Note: I am using Solr 3.6.
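
To see why "black*\ cat*" also pulls in "black forest cat", here is a small Python analogy using `fnmatch.fnmatchcase`, whose `*` matches any run of characters within one string, much like a Lucene wildcard within a single term. It assumes each title is indexed as one lowercased term (keyword-style), which is an assumption about the schema, not something stated in this thread; `titles` and `wildcard_hits` are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical field values, each assumed to be indexed as a single lowercased term.
titles = ["black forest cat", "black cat", "black color cat",
          "black cat is beautiful", "black cat and dog"]


def wildcard_hits(pattern: str) -> list:
    """Return the titles whose whole term matches the wildcard pattern."""
    return [t for t in titles if fnmatchcase(t, pattern)]


print(wildcard_hits("black* cat*"))  # the '*' after "black" can swallow " forest"
print(wildcard_hits("black cat*"))   # only values that literally start with "black cat"
```

With this model the first pattern matches all five titles, while the second returns only "black cat", "black cat is beautiful" and "black cat and dog", which is the result set wanted here.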


--
View this message in context: 
http://lucene.472066.n3.nabble.com/matching-starts-with-only-tp4094430p4152662.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: matching "starts with" only

2014-08-06 Thread Erick Erickson
Right, this is a quirk of phrase queries. For wildcards to work in phrase
queries you need SOLR-1604 (ComplexPhraseQueryParser).

Or you need to escape your spaces, i.e.
black\ cat*
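
A simplified Python model of why escaping the space matters: the classic Lucene/Solr query parser splits the query text on unescaped whitespace before building clauses, so "black cat*" becomes two separate clauses while "black\ cat*" stays one wildcard term. `split_clauses` is a hypothetical helper that only models backslash escapes, not the full query syntax (it drops the escape, so an escaped `*` would no longer be distinguishable from a wildcard).

```python
def split_clauses(query: str) -> list:
    """Split on whitespace unless the character is escaped with a backslash."""
    clauses, current, i = [], [], 0
    while i < len(query):
        ch = query[i]
        if ch == "\\" and i + 1 < len(query):
            current.append(query[i + 1])  # keep the escaped character literally
            i += 2
            continue
        if ch.isspace():
            if current:  # an unescaped space ends the current clause
                clauses.append("".join(current))
                current = []
        else:
            current.append(ch)
        i += 1
    if current:
        clauses.append("".join(current))
    return clauses


print(split_clauses("black cat*"))    # two clauses: 'black' and 'cat*'
print(split_clauses(r"black\ cat*"))  # one wildcard term: 'black cat*'
```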

Best,
Erick


Re: matching "starts with" only

2014-08-05 Thread zameer
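Searching for just "black*" works, but when we search for "black cat*",
"(black cat)*", or "(black cat*)*", we get no results.

[schema.xml fieldType and field definitions not preserved by the archive;
quoted copies of this message retain only the fragments
positionIncrementGap="100" and type="text_general_long"]

Thank you in advance.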




--
View this message in context: 
http://lucene.472066.n3.nabble.com/matching-starts-with-only-tp4094430p4151379.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: matching "starts with" only

2013-10-10 Thread Erick Erickson
Be aware that the string type is not analyzed in any way,
so your searches are case-sensitive. There's a "lowercase"
type in the example schema.xml that combines
KeywordTokenizer with LowerCaseFilter for case-insensitive
searches that you might find useful.

Besides regex, this might be a good place for wildcards, just
black*.
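
A small Python model of the case-sensitivity difference, assuming the string type indexes the raw value as a single term and the example lowercase type indexes a single lowercased term. `prefix_query` is a hypothetical stand-in for a `black*` wildcard; lowercasing the query text for the lowercase field is an assumption about how your Solr version analyzes wildcard terms, so verify it on your setup.

```python
def index_string(value: str) -> str:
    """'string' field: the raw value is the single indexed term, case preserved."""
    return value


def index_lowercase(value: str) -> str:
    """KeywordTokenizer + LowerCaseFilter: still one term, but lowercased."""
    return value.lower()


def prefix_query(terms: list, prefix: str) -> list:
    """Hypothetical stand-in for a 'prefix*' wildcard: keep terms starting with prefix."""
    return [t for t in terms if t.startswith(prefix)]


values = ["Black Cat", "black", "cat is black"]
string_terms = [index_string(v) for v in values]
lower_terms = [index_lowercase(v) for v in values]

print(prefix_query(string_terms, "black"))         # case-sensitive: misses "Black Cat"
print(prefix_query(lower_terms, "Black".lower()))  # assumes the query text is lowercased too
```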

Best,
Erick


Re: matching "starts with" only

2013-10-10 Thread adm1n
I've changed the field name to the string type (the default one in
schema.xml), and I got what I needed.

Thanks for your time.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/matching-starts-with-only-tp4094430p4094637.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: matching "starts with" only

2013-10-09 Thread adm1n
Searching by "starts with" is a new feature I have to add, as is the data I
have to index for it, so it's OK to create a new field.

But after adding the following field type:

[fieldType definition not preserved by the archive]

and field:

[field definition not preserved by the archive]

and indexing, searching with my_name:/^black/ returns no results,
while searching with my_name:black returns only the "black" document.

What am I missing?

Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/matching-starts-with-only-tp4094430p4094453.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: matching "starts with" only

2013-10-09 Thread Shawn Heisey

On 10/9/2013 2:16 PM, adm1n wrote:
> Why does this field have to be a copyField? Couldn't it be a single field, for example:

I always assume that people are already using the existing field and
type for other purposes. Offering advice without making that assumption
will usually result in people making a change and then complaining that
something else no longer works.

If you don't need what you already have for something else, then you
could change the type on the existing field with no problem.

Thanks,
Shawn



Re: matching "starts with" only

2013-10-09 Thread adm1n
Shawn Heisey-4:

Thanks for the quick response.

Why does this field have to be a copyField? Couldn't it be a single field,
for example:

[fieldType and field definitions not preserved by the archive]

Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/matching-starts-with-only-tp4094430p4094447.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: matching "starts with" only

2013-10-09 Thread Shawn Heisey

On 10/9/2013 12:57 PM, adm1n wrote:
> My index contains documents that are either a single word or a short
> sentence of up to 4-5 words. I need to return only the documents that
> start with the searched pattern; in regex it would be ^my_query.
>
> For example, given the docs:
>
> black
> beautiful black cat
> cat
> cat is black
> black cat
>
> and the query "black", only "black" and "black cat" should be returned.
>
> The text field I'm using is as follows:
>
> [fieldType definition not preserved by the archive]
>
> Solr version is 4.2.
>
> Thanks!


The presence of either the whitespace tokenizer or the NGram filter makes
this impossible, because they both break the indexed value into smaller
pieces. Together, they *really* break things up. Matching is done on a
per-term basis, and these two components in your analysis chain ensure
that "black" will be a term for all of those input documents, whether it
appears at the beginning, middle, or end.
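
A simplified Python illustration of this point using the example documents from adm1n's question: with a whitespace tokenizer, "black" becomes a term in every document that contains the word anywhere, while a single keyword-style token lets a prefix test succeed only for values that begin with it. Both functions are hypothetical models of the two analysis chains, not Solr code, and the NGram filter is left out for brevity.

```python
docs = ["black", "beautiful black cat", "cat", "cat is black", "black cat"]


def whitespace_terms(value: str) -> set:
    """Whitespace tokenizer + lowercasing: every word becomes its own term."""
    return set(value.lower().split())


def keyword_term(value: str) -> str:
    """Keyword tokenizer + lowercasing: the whole value is one term."""
    return value.lower()


# Term query "black" against whitespace-tokenized docs: the word's position is lost.
print([d for d in docs if "black" in whitespace_terms(d)])
# Prefix test against the single keyword term: only values that start with "black".
print([d for d in docs if keyword_term(d).startswith("black")])
```

The first list contains four of the five documents; the second contains only "black" and "black cat".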


If you set up a copyField to a new field whose fieldType uses the
Keyword tokenizer (which treats the entire string as a single token) and
the lowercase filter, you would be able to use the regex support in Solr
4.x and have this as your query string:


newfield:/^black/
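
One caveat worth checking against the Lucene RegExp documentation for your version: Lucene regular expressions are matched against the entire indexed term and, to my knowledge, do not treat `^` as a start anchor, so a pattern like `/black.*/` is the usual way to say "starts with black". That may also explain the empty result for my_name:/^black/ reported elsewhere in this thread. The Python sketch below only mimics whole-term matching with `re.fullmatch`; it is an analogy, not Lucene's regex engine, and `whole_term_regex` is hypothetical.

```python
import re

# Single lowercased keyword terms, as the copyField's analysis chain would index them.
terms = ["black", "beautiful black cat", "cat", "cat is black", "black cat"]


def whole_term_regex(pattern: str) -> list:
    """Keep terms the pattern matches in full, mimicking whole-term regex matching."""
    return [t for t in terms if re.fullmatch(pattern, t)]


print(whole_term_regex("black"))    # exact term only
print(whole_term_regex("black.*"))  # terms that start with "black"
```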

Thanks,
Shawn