Re: Compound words

2013-10-28 Thread Parvesh Garg
One more thing: is there a way to remove my "accidentally sent phone number
in the signature" from the previous mail? aarrrggghhh


Re: Compound words

2013-10-28 Thread Erick Erickson
Why did you reject using synonyms? You can have multi-word
synonyms just fine at index time. At query time, since the
multiple words are already substituted in the index, you don't
need to do the same substitution; just query the raw strings.

I freely acknowledge you may have very good reasons for doing
this yourself; I'm just making sure you know what's already
there.

See:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

Look particularly at the explanations for "sea biscuit" in that section.

Best,
Erick




Re: Compound words

2013-10-28 Thread Parvesh Garg
Hi Erick,

Thanks for the suggestion. Like I said, I'm an infant.

We tried synonyms both ways (sea biscuit => seabiscuit and seabiscuit =>
sea biscuit) and didn't understand exactly how they worked. But I just
checked the analysis tool, and it seems to work perfectly fine at index
time. Now I can happily discard my own filter and 4 days of work. I'm happy
I got to know a few ways on how/when not to write a Solr filter :)

I tried the string "sea biscuit sea bird" with expand=false and the tokens
I got were seabiscuit, sea, bird at positions 1, 2 and 3 respectively. But
at query time, when I enter the same term "sea biscuit sea bird", using
edismax with qf, pf2 and pf3, the parsedQuery looks like this:

"+((text:sea) (text:biscuit) (text:sea) (text:bird)) ((text:\"biscuit sea\")
(text:\"sea bird\")) ((text:\"seabiscuit sea\") (text:\"biscuit sea
bird\"))"

What I wanted instead was this:

"+((text:seabiscuit) (text:sea) (text:bird)) ((text:\"seabiscuit sea\")
(text:\"sea bird\")) (text:\"seabiscuit sea bird\")"

Looks like there isn't any other way than to pre-process the query myself
and create the compound word. What do you mean by "just query the raw
string"? Am I still missing something?
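
For illustration, a minimal sketch of what such client-side pre-processing
could look like. It assumes the query parser hands each whitespace-separated
word to the analyzer on its own (which is what the parsedQuery above
suggests), so the phrase has to be joined before the query is sent. The
names compound, preprocess_query and RULES are hypothetical helpers, not
part of Solr, and the phrase table would have to be kept in sync with the
index-time synonyms by hand.

```python
# Hypothetical phrase table: multi-word phrase -> single compound token.
RULES = {("sea", "biscuit"): "seabiscuit"}


def compound(tokens, rules):
    """Greedy longest-match replacement of multi-word phrases by one token."""
    longest = max((len(key) for key in rules), default=1)
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at position i first.
        for n in range(min(longest, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in rules:
                out.append(rules[key])
                i += n
                break
        else:
            # No phrase starts here; keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


def preprocess_query(q, rules=RULES):
    """Lowercase, split on whitespace, and join known phrases."""
    return " ".join(compound(q.lower().split(), rules))


print(preprocess_query("sea biscuit sea bird"))
```

The rewritten string would then be sent as q in place of the user's input,
so that qf, pf2 and pf3 all see seabiscuit as a single term.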

Parvesh Garg
http://www.zettata.com
(This time I did remove my phone number :) )



Re: Compound words

2013-10-28 Thread Erick Erickson
Consider setting expand=true at index time. That
puts all the tokens in your index, and then you
may not need to have any synonym
processing at query time since all the variants will
already be in the index.

As it is, you've replaced the words in the original with
synonyms, essentially collapsing them down to a single
word, and then you have to do something at query time
to get matches. If all the variants are in the index, you
shouldn't have to. That's what I meant by "raw".

Best,
Erick




Re: Compound words

2013-10-28 Thread Roman Chyla
Hi Parvesh,
I think you should check the following JIRA issue:
https://issues.apache.org/jira/browse/SOLR-5379
You will find links there to other possible solutions/problems :-)
Roman


Re: Compound words

2013-10-28 Thread Parvesh Garg
Hi Roman, thanks for the link, will go through it.

Erick, will try with expand=true once and check out the results. Will
update this thread with the findings. I remember we rejected expand=true
because of some weird spaghetti problem. Will check it out again.

Thanks,

Parvesh Garg
http://www.zettata.com




Re: Compound words

2013-10-29 Thread Parvesh Garg
Hi Erick,

I tried with expand=true and got exactly the same tokens, i.e. seabiscuit,
sea, bird at positions 1, 2 and 3 respectively. As per the Solr
documentation at
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory,
explicit mappings (those written with =>) ignore the expand parameter in
the schema.

So, the problem of creating compound words at query time remains.
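
To illustrate why expand made no difference here, a small sketch of how a
synonyms.txt-style rule could be interpreted, following the wiki's
description: a rule with => is an explicit mapping and is used as written,
while a plain comma-separated list is expanded to every term when
expand=true and reduced to its first term when expand=false. parse_rule is
a hypothetical helper for illustration, not Solr's actual parser.

```python
def parse_rule(line, expand=True):
    """Return {source phrase: [replacement phrases]} for one rule line."""
    if "=>" in line:
        # Explicit mapping: the left side is replaced by the right side,
        # and the expand flag plays no part.
        lhs, rhs = line.split("=>", 1)
        targets = [t.strip() for t in rhs.split(",") if t.strip()]
        return {s.strip(): list(targets) for s in lhs.split(",") if s.strip()}
    # Equivalence list: expand decides whether every term maps to all
    # terms (True) or only to the first one (False).
    terms = [t.strip() for t in line.split(",") if t.strip()]
    targets = terms if expand else terms[:1]
    return {t: list(targets) for t in terms}


# The explicit mapping from this thread gives the same result either way.
print(parse_rule("sea biscuit => seabiscuit", expand=True))
print(parse_rule("sea biscuit => seabiscuit", expand=False))

# An equivalence list is where expand actually changes the outcome.
print(parse_rule("seabiscuit, sea biscuit", expand=True))
print(parse_rule("seabiscuit, sea biscuit", expand=False))
```

Under this reading, the rule would have to be written as the list
"seabiscuit, sea biscuit" rather than with => for expand=true to put both
forms in the index.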


Parvesh Garg
http://www.zettata.com

