Re: How to let crawlers in, but prevent their damage?

2011-01-10 Thread lee carroll
Sorry, not an answer, but a +1 vote for finding out best practice for this.

Related to this is DoS attacks. We have rewrite rules between the proxy
server and Solr which attempt to filter out undesirable stuff, but would it
be better to have a query app doing this?

Any standard rewrite rules which drop invalid or potentially malicious
queries would be very nice :-)
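A rough sketch of what such a pre-Solr filter might look like (the patterns and limits here are illustrative examples only, not a vetted blocklist):

```python
import re

# Toy pre-Solr query filter: drop requests that look malformed or abusive.
# The patterns and the length limit are illustrative, not a vetted blocklist.
SUSPICIOUS = [
    re.compile(r"[{}<>]"),         # markup / local-params injection attempts
    re.compile(r"\brows=\d{4,}"),  # absurdly large page sizes
    re.compile(r"\bstart=\d{5,}"), # crawlers paging absurdly deep into results
]

def allow_query(raw_query: str, max_len: int = 500) -> bool:
    """Return True if the raw query string looks safe to forward to Solr."""
    if not raw_query or len(raw_query) > max_len:
        return False
    return not any(p.search(raw_query) for p in SUSPICIOUS)
```

Whether this lives in rewrite rules at the proxy or in a query app is the open question above; the logic is the same either way.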

lee c

On 10 January 2011 13:41, Otis Gospodnetic wrote:

> Hi,
>
> How do people with public search services deal with bots/crawlers?
> And I don't mean to ask how one bans them (robots.txt) or slow them down
> (Delay
> stuff in robots.txt) or prevent them from digging too deep in search
> results...
>
> What I mean is that when you have publicly exposed search that bots crawl,
> they
> issue all kinds of crazy "queries" that result in errors, that add noise to
> Solr
> caches, increase Solr cache evictions, etc. etc.
>
> Are there some known recipes for dealing with them, minimizing their
> negative
> side-effects, while still letting them crawl you?
>
> Thanks,
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>


first steps in nlp

2011-01-10 Thread lee carroll
Hi

I'm indexing a set of documents which have a conversational writing style.
In particular the authors are very fond of listing facts in a variety of
ways (this is to keep a human reader interested), but it's causing my index
trouble.

For example, instead of listing facts like "the house is white, the castle
is pretty", we get "the house is the complete opposite of black" and "the
castle is not ugly".

What are the best approaches to resolving these sorts of issues? Even just
handling "not" correctly would be a good start.


cheers lee c


Re: first steps in nlp

2011-01-10 Thread lee carroll
Hi Grant,

It's a search relevancy problem. For example, a document about London reads
like:

London is not very good for a peaceful break.

We analyse this at the (I can't remember the technical term) lexical level?
(Bloody hell, I think you may have written the book!) Anyway, this produces
tokens in our index of, say,

"London good peaceful holiday"

Users search for cities which would be nice to take a holiday in; say the
search is
"good for a peaceful break"

and bang, London is top. Talk about a relevancy problem :-)

Now I was thinking of using phrase matches in the synonyms file, but is that
the best approach, or could NLP help here?

cheers lee




On 10 January 2011 18:21, Grant Ingersoll  wrote:

>
> On Jan 10, 2011, at 12:42 PM, lee carroll wrote:
>
> > Hi
> >
> > I'm indexing a set of documents which have a conversational writing
> style.
> > In particular the authors are very fond
> > of listing facts in a variety of ways (this is to keep a human reader
> > interested) but its causing my index trouble.
> >
> > For example instead of listing facts like: the house is white, the castle
> is
> > pretty.
> >
> > We get the house is the complete opposite of black and the castle is not
> > ugly.
> >
> > What are the best approaches to resolve these sorts of issues. Even if
> its
> > just handling "not" correctly would be a good start
> >
>
> Hmm, good problem.  I guess I'd start by stepping back and ask what is the
> problem you are trying to solve?  You've stated, I think, one half of the
> problem, namely that your authors have a conversational style, but you
> haven't stated what your users are expecting to do with this information?
>  Is this a pure search app?  Is it something else that is just backed by
> Solr but the user would never do a search?
>
> Do you have a relevance problem?  Also, what is your notion of handling
> "not" correctly?  In other words, more details are welcome!
>
> -Grant
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com
>
>


Re: first steps in nlp

2011-01-11 Thread lee carroll
Just to be more explicit in terms of using synonyms. Our thinking was
something like:

1. analyse texts for patterns such as "not x" and list these out
2. in a synonyms txt file, list in effect antonyms, e.g.
   not pretty -> ugly
   not ugly -> pretty
   not lively -> quiet
   not very nice -> ugly
   etc.
3. use a synonym filter referencing the antonyms at index time only.

However, the language in the text is probably more complex than these
simple phrases, and NLP seems to promise a lot :-) Should we venture down
that route instead?
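A minimal sketch of the rewrite in steps 1 and 2 (the antonym map is a hand-built example; real text would need much more robust matching):

```python
# Naive "not X" -> antonym rewrite, as in steps 1-2 above.
# The antonym map is a hand-built example; real text needs more robust matching.
ANTONYMS = {
    "not pretty": "ugly",
    "not ugly": "pretty",
    "not lively": "quiet",
    "not very nice": "ugly",
}

def rewrite(text: str) -> str:
    out = text.lower()
    # Replace longer phrases first so "not very nice" is not clobbered by a
    # shorter overlapping pattern.
    for phrase in sorted(ANTONYMS, key=len, reverse=True):
        out = out.replace(phrase, ANTONYMS[phrase])
    return out
```

Applied before tokenisation, this would feed "the castle is pretty" into the index instead of "the castle is not ugly", which is exactly the substitution the synonym-file approach is trying to express.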

cheers lee c


On 10 January 2011 22:04, lee carroll  wrote:

> Hi Grant,
>
> Its a search relevancy problem. For example:
>
> a document about london reads like
>
> London is not very good for a peaceful break.
>
> we analyse this at the (i can't remember the technical term) is it lexical
> level? (bloody hell i think you may have wrote the book !) anyway which
> produces tokens in our index of say
>
> "London good peaceful holiday"
>
> users search for cities which would be nice for them to take a holiday in
> say the search is
> "good for a peaceful break"
>
> and bang london is top. talk about a relevancy problem :-)
>
> now i was thinking of using phrase matches in the synonyms file but is that
> the best approach or could nlp help here?
>
> cheers lee
>
>
>
>
>
> On 10 January 2011 18:21, Grant Ingersoll  wrote:
>
>>
>> On Jan 10, 2011, at 12:42 PM, lee carroll wrote:
>>
>> > Hi
>> >
>> > I'm indexing a set of documents which have a conversational writing
>> style.
>> > In particular the authors are very fond
>> > of listing facts in a variety of ways (this is to keep a human reader
>> > interested) but its causing my index trouble.
>> >
>> > For example instead of listing facts like: the house is white, the
>> castle is
>> > pretty.
>> >
>> > We get the house is the complete opposite of black and the castle is not
>> > ugly.
>> >
>> > What are the best approaches to resolve these sorts of issues. Even if
>> its
>> > just handling "not" correctly would be a good start
>> >
>>
>> Hmm, good problem.  I guess I'd start by stepping back and ask what is the
>> problem you are trying to solve?  You've stated, I think, one half of the
>> problem, namely that your authors have a conversational style, but you
>> haven't stated what your users are expecting to do with this information?
>>  Is this a pure search app?  Is it something else that is just backed by
>> Solr but the user would never do a search?
>>
>> Do you have a relevance problem?  Also, what is your notion of handling
>> "not" correctly?  In other words, more details are welcome!
>>
>> -Grant
>>
>> --
>> Grant Ingersoll
>> http://www.lucidimagination.com
>>
>>
>


Re: DismaxParser Query

2011-01-27 Thread lee carroll
Use the dismax q for the first three fields and a filter query for the 4th
and 5th fields, so:

q="keyword1 keyword2"
qf=field1 field2 field3
pf=field1 field2 field3
mm=something sensible for you
defType=dismax
fq="field4:(keyword3 OR keyword4) AND field5:(keyword5)"

Take a look at the dismax docs for extra params.
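Assembled as request parameters, the suggestion above looks something like this (a sketch; the field names are the thread's placeholders, and the mm value is just one sensible choice):

```python
from urllib.parse import urlencode

# The dismax request suggested above, assembled as URL parameters.
# Field names are the placeholders used in this thread; mm=2 is one
# "sensible" choice, not a recommendation.
params = {
    "defType": "dismax",
    "q": "keyword1 keyword2",
    "qf": "field1 field2 field3",
    "pf": "field1 field2 field3",
    "mm": "2",
    "fq": "field4:(keyword3 OR keyword4) AND field5:(keyword5)",
}
query_string = urlencode(params)  # append to .../select? on your Solr host
```

The fq clause keeps the boolean field4/field5 constraints exact (and cached separately), while dismax handles the relevance scoring across the first three fields.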



On 27 January 2011 08:52, Isan Fulia  wrote:

> Hi all,
> The query for standard request handler is as follows
> field1:(keyword1 OR keyword2) OR field2:(keyword1 OR keyword2) OR
> field3:(keyword1 OR keyword2) AND field4:(keyword3 OR keyword4) AND
> field5:(keyword5)
>
>
> How the same above query can be written for dismax request handler
>
> --
> Thanks & Regards,
> Isan Fulia.
>


Re: DismaxParser Query

2011-01-27 Thread lee carroll
The default operator can be set in your config to be OR, or on the query
with something like q.op=OR.



On 27 January 2011 11:26, Isan Fulia  wrote:

> but q="keyword1 keyword2"  does AND operation  not OR
>
> On 27 January 2011 16:22, lee carroll 
> wrote:
>
> > use dismax q for first three fields and a filter query for the 4th and
> 5th
> > fields
> > so
> > q="keyword1 keyword 2"
> > qf = field1,feild2,field3
> > pf = field1,feild2,field3
> > mm=something sensible for you
> > defType=dismax
> > fq=" field4:(keyword3 OR keyword4) AND field5:(keyword5)"
> >
> > take a look at the dismax docs for extra params
> >
> >
> >
> > On 27 January 2011 08:52, Isan Fulia  wrote:
> >
> > > Hi all,
> > > The query for standard request handler is as follows
> > > field1:(keyword1 OR keyword2) OR field2:(keyword1 OR keyword2) OR
> > > field3:(keyword1 OR keyword2) AND field4:(keyword3 OR keyword4) AND
> > > field5:(keyword5)
> > >
> > >
> > > How the same above query can be written for dismax request handler
> > >
> > > --
> > > Thanks & Regards,
> > > Isan Fulia.
> > >
> >
>
>
>
> --
> Thanks & Regards,
> Isan Fulia.
>


Re: DismaxParser Query

2011-01-27 Thread lee carroll
Sorry, ignore that; we are using dismax here. Look at the mm param in the
docs: you can set this to achieve what you need.

On 27 January 2011 11:34, lee carroll  wrote:

> the default operation can be set in your config to be "or" or on the query
> something like q.op=OR
>
>
>
> On 27 January 2011 11:26, Isan Fulia  wrote:
>
>> but q="keyword1 keyword2"  does AND operation  not OR
>>
>> On 27 January 2011 16:22, lee carroll 
>> wrote:
>>
>> > use dismax q for first three fields and a filter query for the 4th and
>> 5th
>> > fields
>> > so
>> > q="keyword1 keyword 2"
>> > qf = field1,feild2,field3
>> > pf = field1,feild2,field3
>> > mm=something sensible for you
>> > defType=dismax
>> > fq=" field4:(keyword3 OR keyword4) AND field5:(keyword5)"
>> >
>> > take a look at the dismax docs for extra params
>> >
>> >
>> >
>> > On 27 January 2011 08:52, Isan Fulia  wrote:
>> >
>> > > Hi all,
>> > > The query for standard request handler is as follows
>> > > field1:(keyword1 OR keyword2) OR field2:(keyword1 OR keyword2) OR
>> > > field3:(keyword1 OR keyword2) AND field4:(keyword3 OR keyword4) AND
>> > > field5:(keyword5)
>> > >
>> > >
>> > > How the same above query can be written for dismax request handler
>> > >
>> > > --
>> > > Thanks & Regards,
>> > > Isan Fulia.
>> > >
>> >
>>
>>
>>
>> --
>> Thanks & Regards,
>> Isan Fulia.
>>
>
>


Re: DismaxParser Query

2011-01-27 Thread lee carroll
With dismax you get to say things like "match all terms if fewer than 3
terms are entered, else match x terms". It produces highly flexible and
relevant matches and works very well in lots of common search use cases;
field boosting allows further tuning.

If you have rigid rules like the last one you quote, I don't think dismax is
for you. Although I might be wrong and someone might be able to help.



On 27 January 2011 13:32, Isan Fulia  wrote:

> It worked by making mm=0 (it acted as OR operator)
> but how to handle this
>
> field1:((keyword1 AND keyword2) OR (keyword3 AND keyword4)) OR
> field2:((keyword1 AND keyword2) OR (keyword3 AND keyword4)) OR
> field3:((keyword1 AND keyword2) OR (keyword3 AND keyword4))
>
>
>
>
> On 27 January 2011 17:06, lee carroll 
> wrote:
>
> > sorry ignore that - we are on dismax here - look at mm param in the docs
> > you can set this to achieve what you need
> >
> > On 27 January 2011 11:34, lee carroll 
> > wrote:
> >
> > > the default operation can be set in your config to be "or" or on the
> > query
> > > something like q.op=OR
> > >
> > >
> > >
> > > On 27 January 2011 11:26, Isan Fulia  wrote:
> > >
> > >> but q="keyword1 keyword2"  does AND operation  not OR
> > >>
> > >> On 27 January 2011 16:22, lee carroll 
> > >> wrote:
> > >>
> > >> > use dismax q for first three fields and a filter query for the 4th
> and
> > >> 5th
> > >> > fields
> > >> > so
> > >> > q="keyword1 keyword 2"
> > >> > qf = field1,feild2,field3
> > >> > pf = field1,feild2,field3
> > >> > mm=something sensible for you
> > >> > defType=dismax
> > >> > fq=" field4:(keyword3 OR keyword4) AND field5:(keyword5)"
> > >> >
> > >> > take a look at the dismax docs for extra params
> > >> >
> > >> >
> > >> >
> > >> > On 27 January 2011 08:52, Isan Fulia 
> > wrote:
> > >> >
> > >> > > Hi all,
> > >> > > The query for standard request handler is as follows
> > >> > > field1:(keyword1 OR keyword2) OR field2:(keyword1 OR keyword2) OR
> > >> > > field3:(keyword1 OR keyword2) AND field4:(keyword3 OR keyword4)
> AND
> > >> > > field5:(keyword5)
> > >> > >
> > >> > >
> > >> > > How the same above query can be written for dismax request handler
> > >> > >
> > >> > > --
> > >> > > Thanks & Regards,
> > >> > > Isan Fulia.
> > >> > >
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> Thanks & Regards,
> > >> Isan Fulia.
> > >>
> > >
> > >
> >
>
>
>
> --
> Thanks & Regards,
> Isan Fulia.
>


jndi datasource in dataimport

2011-02-05 Thread lee carroll
Hi list,

It looks like you can use a JNDI datasource in the data import handler;
however, I can't find any syntax for this.

Where is the best place to look for this? (And can anyone confirm that JNDI
does work in the DataImportHandler?)


Re: jndi datasource in dataimport

2011-02-05 Thread lee carroll
Ah, should this work, or am I doing something obviously wrong?

in config

   <dataSource jndiName="java:sourcepathName"
      type="JdbcDataSource"
      user="xxx" password="xxx"/>

in dataimport config

What am I doing wrong?




On 5 February 2011 10:16, lee carroll  wrote:

> Hi list,
>
> It looks like you can use a jndi datsource in the data import handler.
> however i can't find any syntax on this.
>
> Where is the best place to look for this ? (and confirm if jndi does work
> in dataimporthandler)
>


keepword file with phrases

2011-02-05 Thread lee carroll
Hi List
I'm trying to achieve the following.

text in: "this aisle contains preserves and savoury spreads"

desired index entry for a field to be used for faceting (i.e. a strict set
of normalised terms) is "jam" "savoury spreads", i.e. two facet terms

current set up for the field is

<fieldType name="..." class="solr.TextField">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
        outputUnigrams="true"/>
    <filter class="solr.SynonymFilterFactory"
        synonyms="goodForSynonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.KeepWordFilterFactory"
        words="goodForKeepWords.txt" ignoreCase="true"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
        outputUnigrams="true"/>
    <filter class="solr.SynonymFilterFactory"
        synonyms="goodForSynonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.KeepWordFilterFactory"
        words="goodForKeepWords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>

The thinking here is:

get rid of any markup nonsense

split into tokens based on whitespace => "this" "aisle" "contains"
"preserves" "and" "savoury" "spreads"

produce shingles of 1 or 2 tokens => "this", "this aisle", "aisle",
"aisle contains", "contains", "contains preserves", "preserves", "and",
"and savoury", "savoury", "savoury spreads", "spreads"

expand synonyms using a synonym file (preserves -> jam) =>

"this", "this aisle", "aisle", "aisle contains", "contains",
"contains preserves", "preserves", "jam", "and", "and savoury", "savoury",
"savoury spreads", "spreads"

produce a normalised term list using a keepword file with jam,
"savoury spreads" in it

which should place "jam" "savoury spreads" into the index field facet.
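A rough simulation of the intended chain (this sketches the analysis logic only, not Solr's actual filter classes, and assumes single-token synonym expansion):

```python
# Rough simulation of the intended analysis chain:
# whitespace tokenize -> 1- and 2-token shingles -> synonyms -> keep list.
# Illustrative data only; Solr's real filters are configured, not hard-coded.
SYNONYMS = {"preserves": "jam"}
KEEP = {"jam", "savoury spreads"}

def analyze(text: str) -> list[str]:
    tokens = text.lower().split()
    # Unigrams plus adjacent-pair shingles, mimicking maxShingleSize=2.
    shingles = tokens + [" ".join(p) for p in zip(tokens, tokens[1:])]
    expanded = [SYNONYMS.get(t, t) for t in shingles]
    return [t for t in expanded if t in KEEP]
```

Run over the example sentence this yields exactly the two desired facet terms, which is the behaviour the keepword file is failing to reproduce for the multi-word entry.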

However, I don't get savoury spreads in the index. From analysis.jsp,
everything goes to plan up to the last step, where the keepword file does
not like keeping the phrase "savoury spreads". I've tried naively quoting
the phrase in the keepword file :-)

What is the best way to achieve the above? Is this the correct approach, or
is there a better way?

thanks in advance lee


Re: keepword file with phrases

2011-02-05 Thread lee carroll
Just to add: things are not going as expected even before the keepword
filter. The synonym list is not being expanded for shingles; I think I
don't understand term position.

On 5 February 2011 16:08, lee carroll  wrote:

> Hi List
> I'm trying to achieve the following
>
> text in "this aisle contains preserves and savoury spreads"
>
> desired index entry for a field to be used for faceting (ie strict set of
> normalised terms)
> is "jams" "savoury spreads" ie two facet terms
>
> current set up for the field is
>
> <fieldType name="..." class="solr.TextField">
>   <analyzer type="index">
>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
>  outputUnigrams="true"/>
>     <filter class="solr.SynonymFilterFactory"
>  synonyms="goodForSynonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.KeepWordFilterFactory"
>  words="goodForKeepWords.txt" ignoreCase="true"/>
>   </analyzer>
>   <analyzer type="query">
>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
>  outputUnigrams="true"/>
>     <filter class="solr.SynonymFilterFactory"
>  synonyms="goodForSynonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.KeepWordFilterFactory"
>  words="goodForKeepWords.txt" ignoreCase="true"/>
>   </analyzer>
> </fieldType>
> 
>
> The thinking here is
> get rid of any mark up nonsense
> split into tokens based on whitespace => "this" "aisle" "contains"
> "preserves" "and" "savoury" "spreads"
> produce shingles of 1 or 2 tokens => "this","this aisle", "aisle", "aisle
> contains", "contains", "contains preserves","preserves","and",
>   "and savoury",
> "savoury", "savoury spreads", "spreads"
>
> expand synonyms using a synomym file (preserves -> jam) =>
>
> "this","this aisle", "aisle", "aisle contains", "contains","contains
> preserves","preserves","jam","and","and savoury", "savoury", "savoury
> spreads", "spreads"
>
> produce a normalised term list using a keepword file of jam , "savoury
> spreads" in it
>
> which should place "jam" "savoury spreads" into the index field facet.
>
> However i don't get savoury spreads in the index. from the analysis.jsp
> everything goes to plan upto the last step where the keepword file does not
> like keeping the phrase "savoury spreads". i've tried niavely quoting the
> phrase in the keepword file :-)
>
> What is the best way to achive the above ? Is this the correct approach or
> is there a better way ?
>
> thanks in advance lee
>
>
>
>
>


Re: keepword file with phrases

2011-02-06 Thread lee carroll
Hi Bill,

quoting in the synonyms file did not produce the correct expansion :-(

Looking at Chris's comments now

cheers

lee

On 5 February 2011 23:38, Bill Bell  wrote:

> OK that makes sense.
>
> If you double quote the synonyms file will that help for white space?
>
> Bill
>
>
> On 2/5/11 4:37 PM, "Chris Hostetter"  wrote:
>
> >
> >: You need to switch the order. Do synonyms and expansion first, then
> >: shingles..
> >
> >except then he would be building shingles out of all the permutations of
> >"words" in his symonyms -- including the multi-word synonyms.  i don't
> >*think* that's what he wants based on his example (but i may be wrong)
> >
> >: Have you tried using analysis.jsp ?
> >
> >he already mentioned he has, in his original mail, and that's how he can
> >tell it's not working.
> >
> >lee: based on your followup post about seeing problems in the synonyms
> >output, i suspect the problem you are having is with how the
> >synonymfilter
> >"parses" the synonyms file -- by default it assumes it should split on
> >certain characters to creates multi-word synonyms -- but in your case the
> >tokens you are feeding synonym filter (the output of your shingle filter)
> >really do have whitespace in them
> >
> >there is a "tokenizerFactory" option that Koji added a hwile back to the
> >SYnonymFilterFactory that lets you specify the classname of a
> >TokenizerFactory to use when parsing the synonym rule -- that may be what
> >you need to get your synonyms with spaces in them (so they work properly
> >with your shingles)
> >
> >(assuming of course that i really understand your problem)
> >
> >
> >-Hoss
>
>
>


Re: keepword file with phrases

2011-02-06 Thread lee carroll
Hi Chris,

Yes you've identified the problem :-)

I've tried using the keyword tokeniser, but that seems to merge each
comma-separated list of synonyms into one.

The pattern tokeniser would seem to be a candidate, but can you pass the
pattern attribute through the tokenizerFactory attribute in the synonym
filter?

Example synonym lines which are problematic:

termA1,termA2,termA3, phrase termA, termA4 => normalisedTermA
termB1,termB2,termB3 => normalisedTermB

When the synonym filter uses the keyword tokeniser, only "phrase termA"
ends up being matched as a synonym :-)


lee


On 6 February 2011 12:58, lee carroll  wrote:

> Hi Bill,
>
> quoting in the synonyms file did not produce the correct expansion :-(
>
> Looking at Chris's comments now
>
> cheers
>
> lee
>
>
> On 5 February 2011 23:38, Bill Bell  wrote:
>
>> OK that makes sense.
>>
>> If you double quote the synonyms file will that help for white space?
>>
>> Bill
>>
>>
>> On 2/5/11 4:37 PM, "Chris Hostetter"  wrote:
>>
>> >
>> >: You need to switch the order. Do synonyms and expansion first, then
>> >: shingles..
>> >
>> >except then he would be building shingles out of all the permutations of
>> >"words" in his symonyms -- including the multi-word synonyms.  i don't
>> >*think* that's what he wants based on his example (but i may be wrong)
>> >
>> >: Have you tried using analysis.jsp ?
>> >
>> >he already mentioned he has, in his original mail, and that's how he can
>> >tell it's not working.
>> >
>> >lee: based on your followup post about seeing problems in the synonyms
>> >output, i suspect the problem you are having is with how the
>> >synonymfilter
>> >"parses" the synonyms file -- by default it assumes it should split on
>> >certain characters to creates multi-word synonyms -- but in your case the
>> >tokens you are feeding synonym filter (the output of your shingle filter)
>> >really do have whitespace in them
>> >
>> >there is a "tokenizerFactory" option that Koji added a hwile back to the
>> >SYnonymFilterFactory that lets you specify the classname of a
>> >TokenizerFactory to use when parsing the synonym rule -- that may be what
>> >you need to get your synonyms with spaces in them (so they work properly
>> >with your shingles)
>> >
>> >(assuming of course that i really understand your problem)
>> >
>> >
>> >-Hoss
>>
>>
>>
>


Re: jndi datasource in dataimport

2011-02-08 Thread lee carroll
Hi, still no luck with this. Is the problem with the name attribute of the
dataSource element in the data config?



On 5 February 2011 10:48, lee carroll  wrote:

> ah should this work or am i doing something obvious wrong
>
> in config
>
>   <dataSource jndiName="java:sourcepathName"
>   type="JdbcDataSource"
>   user="xxx" password="xxx"/>
>
> in dataimport config
>  />
>
> what am i doing wrong ?
>
>
>
>
> On 5 February 2011 10:16, lee carroll wrote:
>
>> Hi list,
>>
>> It looks like you can use a jndi datsource in the data import handler.
>> however i can't find any syntax on this.
>>
>> Where is the best place to look for this ? (and confirm if jndi does work
>> in dataimporthandler)
>>
>
>


more like this

2011-02-11 Thread lee carroll
Hi, a MLT query with a q parameter which returns multiple matches, such as

q=id:45 id:34 id:54&mlt.fl=field1&mlt.mindf=1&mlt.mintf=1&mlt=true&fl=id,name

seems to return the results of three separate MLT queries, i.e.

q=id:45&mlt.fl=field1&mlt.mindf=1&mlt.mintf=1&mlt=true&fl=id,name
+
q=id:34&mlt.fl=field1&mlt.mindf=1&mlt.mintf=1&mlt=true&fl=id,name
+
q=id:54&mlt.fl=field1&mlt.mindf=1&mlt.mintf=1&mlt=true&fl=id,name

rather than a combined similarity of all three.

Is this because field1 is not storing term vectors?

How best to achieve a combined-similarity MLT?


Re: Synonyms.txt

2011-02-20 Thread lee carroll
Hi Marc,
I don't want to sound too prissy, and also assume too much about your
application, but a generic synonym file could do more harm than good. Lots
of applications have specific vocabularies, and a specific synonym list is
what is needed. Remember synonyms increase recall but reduce precision. The
better matched your synonym list is to your users and their searches, the
better this trade-off between recall and precision will be.

Without knowing your app or motivation I'd say don't go for a generic list,
but maybe it's right for your circumstances.

see this thread here

http://lucene.472066.n3.nabble.com/French-synonyms-amp-Online-synonyms-td488829.html

On 20 February 2011 15:58, Marc Kalberer  wrote:

> Hello,
> Is there any free Synonyms.txt available on internet ?  Wasn't able to find
> any.  Specially interested by the french version.
> ++
> Marc
> --
> *Programmers.ch*
> Développement WEB
> Solutions libres et Opensources
> Tel: ++41 76 44 888 72
> Site: http://www.programmers.ch
>


Re: Faceting question

2011-05-13 Thread lee carroll
Hi Mark,
I think you would need to issue two separate queries. It's also a (I was
going to say odd use case, but who am I to judge) interesting use case. If
you have a faceted navigation front end, you are in real danger of
confusing your users. I suppose it's a case of what you want to achieve;
faceting may not be the way to go.

lee c

On 13 May 2011 15:56, Mark  wrote:

> No mixup. I probably didn't explain myself correctly.
>
> Suppose my document has fields "title", "description" and "foo". When I
> search I would like to search across "title" and "description". I then would
> like facet counts on "foo" for documents that matched the "title" field
> only. IE, I would like the faceting behavior on "foo" to be exactly as if i
> searched against only the "title" field.
>
> Does that make sense?
>
>
> On 5/12/11 11:30 PM, Otis Gospodnetic wrote:
>
>> Hi,
>>
>> I think there is a bit of a mixup here.  Facets are not about which field
>> a
>> match was on, but about what values hits have in one or more fields you
>> facet
>> on.
>>
>> Otis
>> 
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>> Lucene ecosystem search :: http://search-lucene.com/
>>
>>
>>
>> - Original Message 
>>
>>> From: Mark
>>> To: solr-user@lucene.apache.org
>>> Sent: Fri, May 13, 2011 1:19:10 AM
>>> Subject: Faceting question
>>>
>>> Is there anyway to perform a search that searches across 2 fields yet
>>> only
>>> gives  me facets accounts for documents matching 1 field?
>>>
>>> For example
>>>
>>> If  I have fields A&  B and I perform a search across I would like to
>>> match my
>>> query across either of these two fields. I would then like facet counts
>>> for how
>>> many documents matched in field A only.
>>>
>>> Can this accomplished? If not out  of the box what classes should I look
>>> into
>>> to create this  myself?
>>>
>>> Thanks
>>>
>>>


Re: Bulk indexing, UpdateProcessor overwriteDupes and poor IO performances

2011-05-31 Thread lee carroll
Tanguy

You might have tried this already, but can you set overwriteDupes to false
and set the signature field to be the id? That way Solr will manage
updates.

from the wiki

http://wiki.apache.org/solr/Deduplication

   <updateRequestProcessorChain name="dedupe">
     <processor class="solr.processor.SignatureUpdateProcessorFactory">
       <bool name="enabled">true</bool>
       <str name="signatureField">id</str>
       <bool name="overwriteDupes">false</bool>
       <str name="fields">name,features,cat</str>
       <str name="signatureClass">solr.processor.Lookup3Signature</str>
     </processor>
     <processor class="solr.LogUpdateProcessorFactory" />
     <processor class="solr.RunUpdateProcessorFactory" />
   </updateRequestProcessorChain>
HTH

Lee


On 30 May 2011 08:32, Tanguy Moal  wrote:
>
> Hello,
>
> Sorry for re-posting this but it seems my message got lost in the mailing 
> list's messages stream without hitting anyone's attention... =D
>
> Shortly, has anyone already experienced dramatic indexing slowdowns during 
> large bulk imports with overwriteDupes turned on and a fairly high duplicates 
> rate (around 4-8x) ?
>
> It seems to produce a lot of deletions, which in turn appear to make the 
> merging of segments pretty slow, by fairly increasing the number of little 
> reads operations occuring simultaneously with the regular large write 
> operations of the merge. Added to the poor IO performances of a commodity 
> SATA drive, indexing takes ages.
>
> I temporarily bypassed that limitation by disabling the overwriting of 
> duplicates, but that changes the way I request the index, requiring me to 
> turn on field collapsing at search time.
>
> Is this a known limitation ?
>
> Has anyone a few hints on how to optimize the handling of index time 
> deduplication ?
>
> More details on my setup and the state of my understanding are in my previous 
> message here-after.
>
> Thank you very much in advance.
>
> Regards,
>
> Tanguy
>
> On 05/25/11 15:35, Tanguy Moal wrote:
>>
>> Dear list,
>>
>> I'm posting here after some unsuccessful investigations.
>> In my setup I push documents to Solr using the StreamingUpdateSolrServer.
>>
>> I'm sending a comfortable initial amount of documents (~250M) and wished to 
>> perform overwriting of duplicated documents at index time, during the 
>> update, taking advantage of the UpdateProcessorChain.
>>
>> At the beginning of the indexing stage, everything is quite fast; documents 
>> arrive at a rate of about 1000 doc/s.
>> The only extra processing during the import is computation of a couple of 
>> hashes that are used to identify uniquely documents given their content, 
>> using both stock (MD5Signature) and custom (derived from Lookup3Signature) 
>> update processors.
>> I send a commit command to the server every 500k documents sent.
>>
>> During a first period, the server is CPU bound. After a short while (~10 
>> minutes), the rate at which documents are received starts to fall 
>> dramatically, the server being IO bound.
>> I've been firstly thinking of a normal speed decrease during the commit, 
>> while my push client is waiting for the flush to occur. That would have been 
>> a normal slowdown.
>>
>> The thing that retained my attention was the fact that unexpectedly, the 
>> server was performing a lot of small reads, way more the number writes, 
>> which seem to be larger.
>> The combination of the many small reads with the constant amount of bigger 
>> writes seem to be creating a lot of IO contention on my commodity SATA 
>> drive, and the ETA of my built index started to increase scarily =D
>>
>> I then restarted the JVM with JMX enabled so I could start investigating a 
>> little bit more. I've the realized that the UpdateHandler was performing 
>> many reads while processing the update request.
>>
>> Are there any known limitations around the UpdateProcessorChain, when 
>> overwriteDupes is set to true ?
>> I turned that off, which of course breaks the intent of my built index, but 
>> for comparison purposes it's good.
>>
>> That did the trick, indexing is fast again, even with the periodic commits.
>>
>> I therefor have two questions, an interesting first  one and a boring second 
>> one :
>>
>> 1 / What's the workflow of the UpdateProcessorChain when one or more 
>> processors have overwriting of duplicates turned on ? What happens under the 
>> hood ?
>>
>> I tried to answer that myself looking at DirectUpdateHandler2 and my 
>> understanding stopped at the following :
>> - The document is added to the lucene IW
>> - The duplicates are deleted from the lucene IW
>> The dark magic I couldn't understand seems to occur around the idTerm and 
>> updateTerm things, in the addDoc method. The deletions seem to be buffered 
>> somewhere, I just didn't get it :-)
>>
>> I might be wrong since I didn't read the code more than that, but the point 
>> might be at how does solr handles deletions, which is something still 
>> unclear to me. In anyways, a lot of reads seem to occur for that precise 
>> task and it tends to produce a lot of IO, killing indexing performances when 
>> overwriteDupes is on. I don't even understand why so many read operations 
>> occur at this stage since my process had a comfortable amount of RAM (with 
>> Xms=Xmx=8GB), with only 4.5GB are used so far.
>>
>> Any help, recommandation or idea is welcome :-)
>>
>> 2 / In the case there isn't a simple fix for this, I'll have to do with 
>> duplicates in my index. I don't mind since solr offers a great gro

Re: Synonyms valid only in specific categories of data

2011-06-01 Thread lee carroll
I don't think you can assign a synonyms file dynamically to a field. You
would need to create a separate field for each language/category
combination, each referencing its own synonyms file. That would be a lot of
fields.



On 1 June 2011 09:59, Spyros Kapnissis  wrote:
> Hello to all,
>
>
> I have a collection of text phrases in more than 20 languages that I'm 
> indexing
> in solr. Each phrase belongs to one of about 30 different phrase categories. I
> have specified different fields for each language and added a synonym filter 
> at
> query time. I would however like the synonym filter to take into account the
> category as well. So, a specific synonym should be valid and used only in one 
> or
> more categories per language. (the category is indexed in another field).
>
> Is this somehow possible in the current SynonymFilterFactory implementation?
>
> Hope it makes sense.
>
> Thank you,
> Spyros
>


Re: synonyms problem

2011-06-02 Thread lee carroll
Deniz,

It looks like you are missing an index analyzer. Or have you removed that
for brevity?

lee c

On 2 June 2011 10:41, Gora Mohanty  wrote:
> On Thu, Jun 2, 2011 at 11:58 AM, deniz  wrote:
>> Hi all,
>>
>> here is a piece from my solfconfig:
> [...]
>> but somehow synonyms are not read... I mean there is no match when i use a
>> word in the synonym file... any ideas?
> [...]
>
> Please provide further details, e.g., is your field in schema.xml using
> this fieldType, one example line from the synonyms.txt file, how are
> you searching, what results you expect to get, and what are the actual
> results.
>
> Also, while this is not the issue here, normally the fieldType
> "string" is a non-analyzed field, and one would normally use
> a different fieldType, e.g., "text" for data that are to be analyzed.
>
> Regards,
> Gora
>


Re: synonyms problem

2011-06-02 Thread lee carroll
oh and it's a string field - change this to a text type if you need analysis

class="solr.StrField"

lee c

On 2 June 2011 11:45, lee carroll  wrote:
> Deniz,
>
> it looks like you are missing an index anlayzer ? or have you removed
> that for brevity ?
>
> lee c
>
> On 2 June 2011 10:41, Gora Mohanty  wrote:
>> On Thu, Jun 2, 2011 at 11:58 AM, deniz  wrote:
>>> Hi all,
>>>
>>> here is a piece from my solfconfig:
>> [...]
>>> but somehow synonyms are not read... I mean there is no match when i use a
>>> word in the synonym file... any ideas?
>> [...]
>>
>> Please provide further details, e.g., is your field in schema.xml using
>> this fieldType, one example line from the synonyms.txt file, how are
>> you searching, what results you expect to get, and what are the actual
>> results.
>>
>> Also, while this is not the issue here, normally the fieldType
>> "string" is a non-analyzed field, and one would normally use
>> a different fieldType, e.g., "text" for data that are to be analyzed.
>>
>> Regards,
>> Gora
>>
>


Re: Multilingual text analysis

2011-06-02 Thread lee carroll
Juan

I don't think so.

you can try indexing fields like myfield_en, myfield_fr, myfield_xx
if you know what language you are dealing with at index and query time.

you can also have separate cores for your documents, one for each language,
if you don't want to complicate your schema;
again you will need to know the language at index and query time



On 2 June 2011 08:57, Juan Antonio Farré Basurte
 wrote:
> Hello,
> Some of the possible analyzers that can be applied to a text field, depend on 
> the language of the text to analyze and can be configured for a concrete 
> language.
> In my case, the text fields can be in many different languages, but each 
> document also includes a field containing the language of text fields.
> Is it possible to configure analyzers to use the suitable language for each 
> document, in function of the language field?
> Thanks,
>
> Juan


Re: How to display search results of solr in to other application.

2011-06-02 Thread lee carroll
This is from another post and could help.

Can you use a JavaScript library which handles Ajax and JSON/JSONP?
You will end up with much cleaner client code. For example, a jQuery
implementation looks quite nice using Solr's neat JSONP support:

var queryString = "*:*";
$.getJSON(
"http://[server]:[port]/solr/select/?jsoncallback=?",
{"q": queryString,
"version": "2.2",
"start": "0",
"rows": "10",
"indent": "on",
"json.wrf": "callbackFunctionToDoSomethingWithOurData",
"wt": "json",
"fl": "field1"}
);

and the callback function

function callbackFunctionToDoSomethingWithOurData(solrData) {
   // do stuff with your nice data
}

There is also a JavaScript client for Solr, but I've not used it

On 2 June 2011 08:14, Romi  wrote:
> Hi, I am creating indexes using solr which is running on jetty server port
> 8983, and my application is running on tomcat server port 8080. Now my
> problem is i want to display the results of search on my application. i
> created a ajax-javascript page for parsing Json object. now please suggest
> me how i send my request to solr server for search and get back the result.
>
> Here is my sample html file where i parsed Json data.
>
> 
> 
> Solr Ajax Example
>
> 
> 
>
> 
>  query: 
>  
>
> 
> Raw JSON String: 
> 
> 
> 
>
>
>
> I suppose i am making mistake in xmlhttpPost("/solr/db/select").
>
> Thanks and regards
> Romi.
>
> -
> Thanks & Regards
> Romi
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-display-search-results-of-solr-in-to-other-application-tp3014101p3014101.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: how to request for Json object

2011-06-02 Thread lee carroll
use Solr's JSONP format



On 2 June 2011 08:54, Romi  wrote:
> sorry for the inconvenience, please look at this file
> http://lucene.472066.n3.nabble.com/file/n3014224/JsonJquery.text
> JsonJquery.text
>
>
>
> -
> Thanks & Regards
> Romi
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/how-to-request-for-Json-object-tp3014138p3014224.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: how to request for Json object

2011-06-02 Thread lee carroll
just to reiterate: JSONP gets round the Ajax same-origin policy



2011/6/2 François Schiettecatte :
> This is not really an issue with SOLR per se, and I have run into this 
> before, you will need to read up on 'Access-Control-Allow-Origin' which needs 
> to be set in the http headers that your ajax pager is returning. Beware that 
> not all browsers obey it and Olivier is right when he suggested creating a 
> proxy, which I did.
>
> François
>
> On Jun 2, 2011, at 3:27 AM, Romi wrote:
>
>> How to parse Json through ajax when your ajax pager is on one
>> server(Tomcat)and Json object is of onther server(solr server). i mean i
>> have to make a request to another server, how can i do it .
>>
>> -
>> Thanks & Regards
>> Romi
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/how-to-request-for-Json-object-tp3014138p3014138.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: How to display search results of solr in to other application.

2011-06-02 Thread lee carroll
did you include the jquery lib?
make sure you use the jsoncallback parameter

ie
$.getJSON(
    "http://[server]:[port]/solr/select/?jsoncallback=?",
   {"q": queryString,
   "version": "2.2",
   "start": "0",
   "rows": "10",
   "indent": "on",
   "json.wrf": "callbackFunctionToDoSomethingWithOurData",
   "wt": "json",
   "fl": "field1"}
   );

not what you have got



On 2 June 2011 13:00, Romi  wrote:
> I did this:
>
>  $(document).ready(function(){
>
>
> $.getJSON("http://[remotehost]:8983/solr/select/?q=diamond&wt=json&json.wrf=?";,
> function(result){
>
>  alert("hello" + result.response.docs[0].name);
> });
> });
>
>
> But i am not getting any result, what i did wrong ??
>
> -
> Thanks & Regards
> Romi
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-display-search-results-of-solr-in-to-other-application-tp3014101p3014797.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: how to make getJson parameter dynamic

2011-06-02 Thread lee carroll
Hi Romi, this is the third thread you have created on this subject.
It's not good and will get you ignored by many people who could help.

The question relates to JS rather than Solr now. See any good JS
manual or site for how to assign values to a variable and then
concatenate these into a string.

lee c
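In other words, read the value from the text box and concatenate it into the request instead of hard-coding "diamond". A minimal sketch (the element id, URL and helper name here are illustrative, not from the thread):

```javascript
// Build the Solr request URL from user input instead of hard-coding the query.
function buildSolrQuery(base, userQuery) {
  // encodeURIComponent handles spaces and reserved characters in the query
  return base + "?q=" + encodeURIComponent(userQuery) + "&wt=json&json.wrf=?";
}

// In the page you would wire this to a text box, e.g.:
//   var url = buildSolrQuery("http://192.168.1.9:8983/solr/db/select/",
//                            document.getElementById("searchBox").value);
//   $.getJSON(url, function (result) { alert(result.response.docs[0].name); });
console.log(buildSolrQuery("http://192.168.1.9:8983/solr/db/select/", "diamond"));
```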

On 2 June 2011 13:40, Romi  wrote:
>  $.getJSON("http://192.168.1.9:8983/solr/db/select/?q=diamond&wt=json&json.wrf=?";,
> function(result){
>
>  alert("hello" + result.response.docs[0].name);
> });
> });
>
> using this i am parsing solr json response, but as you can see it is hard
> coded (q=diamond) how can i make it user's choice. i mean user can pass the
> query at run time for example using a text box.
>
> -
> Thanks & Regards
> Romi
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/how-to-make-getJson-parameter-dynamic-tp3014941p3014941.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Need Schema help

2011-06-02 Thread lee carroll
Denis,

would dynamic fields help?
field defined as *_price in the schema

at index time you index fields named like:
[1-9]_[0-99]_price

at query time you search the price field for a given country/region:
1_10_price:[10 TO 100]

This may work for some use cases, I guess

lee
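The schema side of that is a single dynamic field declaration (the type name here is an assumption; any numeric type that supports range queries would do):

```xml
<!-- Matches any field whose name ends in _price, e.g. 1_10_price -->
<dynamicField name="*_price" type="tint" indexed="true" stored="true"/>
```

A document then carries fields such as 1_10_price (country 1, region 10), and the query for that country/region becomes q=1_10_price:[10 TO 100].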

2011/6/2 Denis Kuzmenok :
> Hi)
>
> What i need:
> Index  prices  to  products, each product has multiple prices, to each
> region, country, and price itself.
> I   tried   to  do  with  field  type  "long"  multiple:true, and form
> value  as  "country  code  +  region code + price" (1004000349601, for
> example), but it has strange behaviour.. price:[* TO 1004000349600] do
> include 1004000349601.. I am doing something wrong?
>
> Possible data:
> Country: 1-9
> Region: 0-99
> Price: 1-999
>
>


Re: Search with Synonyms in two fields

2011-06-04 Thread lee carroll
I'm not sure if this is what you mean:
copy field1 to field2, and for field2 apply your analysis chain with
your synonym list

query something like field1:DE123 OR field2:DE123

or have I missed the point? if so, can you clarify your use case

cheers lee c
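A sketch of the copyField wiring in schema.xml, using the field names from the thread (the text_syn type and its synonym analysis chain are assumptions):

```xml
<field name="field1" type="string" indexed="true" stored="true"/>
<!-- text_syn would be a TextField whose query analyzer includes your
     SynonymFilterFactory with the DE123 => 123,456,789 mappings -->
<field name="field2" type="text_syn" indexed="true" stored="false"/>
<copyField source="field1" dest="field2"/>
```

Querying both fields then covers the literal code and its synonyms.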

On 4 June 2011 08:44, occurred  wrote:
> Hello,
>
> a query will be like this:
>
> field1:(DE123)
>
> my synonyms are:
> DE123 => 123,456,789
>
> then SOLR should search in field1 for DE123 and in another specified field
> for the synonyms so for example:
> in field2 for 123 OR 456 OR 789
>
> is this somehow possible?
>
> cheers
> Charlie
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Search-with-Synonyms-in-two-fields-tp3022534p3022534.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Search with Synonyms in two fields

2011-06-04 Thread lee carroll
your app can do the above search?

On 4 June 2011 20:42, occurred  wrote:
> No, there should be only one field search:
> field1:DE123
>
> and then based on the config of a FilterFactory Solr will also search in
> field2 based on a synonym list.
> But also field1 should be search with DE123
>
> cheers
> Charlie
>
> Am 04.06.11 17:34, schrieb lee carroll [via Lucene]:
>> I'm not sure if this is what you mean:
>> copy field1 to field2 and for field 2 apply your analysis chain with
>> your synonym list
>>
>> query something like field1:DE123 and field2:DE123
>>
>> or have i missed the point, if so can you clarify your use case
>>
>> cheers lee c
>>
>> On 4 June 2011 08:44, occurred <[hidden email]
>> > wrote:
>>
>> > Hello,
>> >
>> > a query will be like this:
>> >
>> > field1:(DE123)
>> >
>> > my synonyms are:
>> > DE123 => 123,456,789
>> >
>> > then SOLR should search in field1 for DE123 and in another specified
>> field
>> > for the synonyms so for example:
>> > in field2 for 123 OR 456 OR 789
>> >
>> > is this somehow possible?
>> >
>> > cheers
>> > Charlie
>> >
>> > --
>> > View this message in context:
>> http://lucene.472066.n3.nabble.com/Search-with-Synonyms-in-two-fields-tp3022534p3022534.html
>> > Sent from the Solr - User mailing list archive at Nabble.com.
>> >
>>
>>
>> 
>> If you reply to this email, your message will be added to the
>> discussion below:
>> http://lucene.472066.n3.nabble.com/Search-with-Synonyms-in-two-fields-tp3022534p3023443.html
>>
>> To unsubscribe from Search with Synonyms in two fields, click here
>> <http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code&node=3022534&code=c2NoYXVibWFpckBpbmZvZGllbnN0LWF1c3NjaHJlaWJ1bmdlbi5kZXwzMDIyNTM0fDUxNjgwOTg4>.
>>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Search-with-Synonyms-in-two-fields-tp3022534p3024166.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: URGENT HELP: Improving Solr indexing time

2011-06-04 Thread lee carroll
Rohit - you may have double-posted. Did Otis's answer not help with
your issue, or does it at least need a response to clarify?

On 4 June 2011 22:53, Chris Cowan  wrote:
> How long does the query against the DB take (outside of Solr)? If that's slow 
> then it's going to take a while to update the index. You might need to figure 
> a way to break things up a bit, maybe use a delta import instead of a full 
> import.
>
> Chris
>
> On Jun 4, 2011, at 6:23 AM, Rohit Gupta wrote:
>
>> My Solr server takes very long to update index. The table it hits to index is
>> huge with 10Million + records , but even in that case I feel this is very 
>> long
>> time to index. Below is the snapshot of the /dataimport page
>>
>> busy
>> A command is still running...
>> 
>> 1:53:39.664
>> 16276
>> 24237
>> 16273
>> 0
>> 2011-06-04 11:25:26
>> 
>>
>> How can i determine why this is happening and how can I improve this. During 
>> all
>> our test on the local server before the migration we could index 5 million
>> records in 4-5 hrs, but now its taking too long on the live server.
>>
>> Regards,
>> Rohit
>
>


Re: How do I make sure the resulting documents contain the query terms?

2011-06-07 Thread lee carroll
Gabriele
Lucene uses a combination of boolean and VSM for its IR.

A straight forward query for a keyword will only match docs with that keyword.

Now things quickly get subtle and complex the more sugar you add
(more complicated queries across fields and more complex
analysis chains), but I think the short answer to your question is: C
will not be returned, and it will not be scored either

lee c

On 7 June 2011 08:30, Gabriele Kahlout  wrote:
> On Tue, Jun 7, 2011 at 8:43 AM, pravesh  wrote:
>
>> >k0 --> A | C
>> >k1 --> A | B
>> >k2 --> A | B | C
>> >k3 --> B | C
>> >Now let q=k1, how do I make sure C doesn't appear as a result since it
>> doesn't contain any occurence of k1?
>> Do we bother to do that. Now that's what lucene does :)
>>
>> Lucene/Solr doesn't do that, it ranks documents based on a scoring
> function, and with that it lacks the possibility of specifying that a
> particular term must appear (the closest way I know of is boosting it).
>
> The solution would be a way to tell Solr/lucene which documents/indices to
> query, i.e. query only the union/intersection of the documents in which
> k1,...kn appear, instead of query all indexed documents and apply the
> ranking function (which will give weight to documents that contains
> k1...kn).
>
>
>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/How-do-I-make-sure-the-resulting-documents-contain-the-query-terms-tp3031637p3033451.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>
>
> --
> Regards,
> K. Gabriele
>
> --- unchanged since 20/9/10 ---
> P.S. If the subject contains "[LON]" or the addressee acknowledges the
> receipt within 48 hours then I don't resend the email.
> subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
> < Now + 48h) ⇒ ¬resend(I, this).
>
> If an email is sent by a sender that is not a trusted contact or the email
> does not contain a valid code then the email is not received. A valid code
> starts with a hyphen and ends with "X".
> ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
> L(-[a-z]+[0-9]X)).
>


Re: Compound word search not what I expected

2011-06-07 Thread lee carroll
see
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory

from the wiki

Example of generateWordParts="1" and catenateWords="1":
"PowerShot" -> 0:"Power", 1:"Shot" 1:"PowerShot"
(where 0,1,1 are token positions)
"A's+B's&C's" -> 0:"A", 1:"B", 2:"C", 2:"ABC"
"Super-Duper-XL500-42-AutoCoder!" -> 0:"Super", 1:"Duper", 2:"XL",
2:"SuperDuperXL", 3:"500" 4:"42", 5:"Auto", 6:"Coder", 6:"AutoCoder"

One use for WordDelimiterFilter is to help match words with different
delimiters. One way of doing so is to specify generateWordParts="1"
catenateWords="1" in the analyzer used for indexing, and
generateWordParts="1" in the analyzer used for querying. Given that
the current StandardTokenizer immediately removes many intra-word
delimiters, it is recommended that this filter be used after a
tokenizer that leaves them in place (such as WhitespaceTokenizer).
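Wired into a fieldType, the recommendation above looks roughly like this (the type name is illustrative):

```xml
<fieldType name="text_wdf" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" catenateWords="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" catenateWords="0"/>
  </analyzer>
</fieldType>
```

The index side keeps both the parts and the catenated form, so a document containing "Power-Shot" can match queries written with or without the delimiter.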


Re: Default query parser operator

2011-06-07 Thread lee carroll
Hi Brian, could your front-end app do this field query logic?

(assuming you have an app in front of Solr)



On 7 June 2011 18:53, Jonathan Rochkind  wrote:
> There's no feature in Solr to do what you ask, no. I don't think.
>
> On 6/7/2011 1:30 PM, Brian Lamb wrote:
>>
>> Hi Jonathan,
>>
>> Thank you for your reply. Your point about my example is a good one. So
>> let
>> me try to restate using your example. Suppose I want to apply AND to any
>> search terms within field1.
>>
>> Then
>>
>> field1:foo field2:bar field1:baz field2:bom
>>
>> would by written as
>>
>> http://localhost:8983/solr/?q=field1:foo OR field2:bar OR field1:baz OR
>> field2:bom
>>
>> But if they were written together like:
>>
>> http://localhost:8983/solr/?q=field1:(foo baz) field2:(bar bom)
>>
>> I would want it to be
>>
>> http://localhost:8983/solr/?q=field1:(foo AND baz) OR field2:(bar OR bom)
>>
>> But it sounds like you are saying that would not be possible.
>>
>> Thanks,
>>
>> Brian Lamb
>>
>> On Tue, Jun 7, 2011 at 11:27 AM, Jonathan Rochkind
>>  wrote:
>>
>>> Nope, not possible.
>>>
>>> I'm not even sure what it would mean semantically. If you had default
>>> operator "OR" ordinarily, but default operator "AND" just for "field2",
>>> then
>>> what would happen if you entered:
>>>
>>> field1:foo field2:bar field1:baz field2:bom
>>>
>>> Where the heck would the ANDs and ORs go?  The operators are BETWEEN the
>>> clauses that specify fields, they don't belong to a field. In general,
>>> the
>>> operators are part of the query as a whole, not any specific field.
>>>
>>> In fact, I'd be careful of your example query:
>>>    q=field1:foo bar field2:baz
>>>
>>> I don't think that means what you think it means, I don't think the
>>> "field1" applies to the "bar" in that case. Although I could be wrong,
>>> but
>>> you definitely want to check it.  You need "field1:foo field1:bar", or
>>> set
>>> the default field for the query to "field1", or use parens (although that
>>> will change the execution strategy and ranking): q=field1:(foo bar)
>>> 
>>>
>>> At any rate, even if there's a way to specify this so it makes sense, no,
>>> Solr/lucene doesn't support any such thing.
>>>
>>>
>>>
>>>
>>> On 6/7/2011 10:56 AM, Brian Lamb wrote:
>>>
 I feel like this should be fairly easy to do but I just don't see
 anywhere
 in the documentation on how to do this. Perhaps I am using the wrong
 search
 parameters.

 On Mon, Jun 6, 2011 at 12:19 PM, Brian Lamb
 wrote:

  Hi all,
>
> Is it possible to change the query parser operator for a specific field
> without having to explicitly type it in the search field?
>
> For example, I'd like to use:
>
> http://localhost:8983/solr/search/?q=field1:word token field2:parser
> syntax
>
> instead of
>
> http://localhost:8983/solr/search/?q=field1:word AND token
> field2:parser
> syntax
>
> But, I only want it to be applied to field1, not field2 and I want the
> operator to always be AND unless the user explicitly types in OR.
>
> Thanks,
>
> Brian Lamb
>
>
>


Re: Solr Coldfusion Search Issue

2011-06-07 Thread lee carroll
Can you see the query actually presented to Solr in the logs?

Maybe capture that and then run it with debugQuery=on from the admin pages.

Sorry, I can't help directly with your syntax


On 7 June 2011 23:06, Alejandro Delgadillo  wrote:
> Hi,
>
> I'm having some troubles using Solr through Coldfusion, the problem right
> now is that when I search for a term in a Custom field, the results
> sometimes have the value that I sent to the custom field and not to the
> field that contains the text, this is the cfsearch syntax that I'm using:
>
>  criteria='""contents:#form.search#""AND""custom1:#form.tema#""AND""custom2:#
> form.dia#""AND""custom4:#form.anio#""AND""custom3:#form.mon#""'
> name="result" status="meta" startrow="#url.start#" maxrows="#max#"
> contextpassages="5" contexthighlightbegin="B"
> contexthighlightend="BE" suggestions="always">
>
> Every custom fields gets the value by a combo box or drop box with a list of
> option, the thing is that when the users sends a search for CUSTOM1,
> sometimes the results include the same searched value un CONTENTS...
>
> Do anyone have an idea on how to fix this?
>
> I'll appreciate all the help I can get.
>
> Regards.
> Alex
>


Re: Boosting result on query.

2011-06-08 Thread lee carroll
If you could move to 3.x, and your "linked item" boosts could be
calculated offline in a periodic batch, you could use an external
file field to store the doc boost.

A few ifs, though.
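A sketch of what that could look like on 3.x (field names are made up; the values live in a file your batch job regenerates):

```xml
<!-- schema.xml: per-document boost values read from a file named
     external_linkedBoost in the index data directory, one
     "docid=value" line per document -->
<fieldType name="fileFloat" class="solr.ExternalFileField"
           keyField="id" defVal="1" valType="pfloat"/>
<field name="linkedBoost" type="fileFloat"/>
```

The batch job rewrites the file, a commit picks it up, and the values can then feed a boost function (e.g. bf=linkedBoost with dismax).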



On 8 June 2011 03:23, Jeff Boul  wrote:
> Hi,
>
> I am trying to figure out options for the following problem. I am on
> Solr 1.4.1 (Lucene 2.9.1).
>
> I need to perform a boost on a query related to the value of a multiple
> value field.
>
> Lets say the result return the following documents:
>
> id   name    linked_items
> 3     doc3    (item1, item33, item55)
> 8     doc8    (item2, item55, item8)
> 0     doc0    (item7)
> 1     doc1    (item1)
> 
>
> I want the result to be boosted regarding the foollowing ordered list of
> linked_items values:
>
> item2 > item55 > item1 > ...
>
> So doc8 will received the higher boost because his 'linked_items' contains
> 'item2'
> then doc3 will received a lower boost because his 'linked_items' contains
> 'item55'
> then doc1 will received a much lower boost because his 'linked_items'
> contains 'item1'
> and maybe doc0 will received some boost if 'item7' is somewhere in the list.
>
> The tricky part is that the ordered list is obtained by querying on an other
> index. So the result of the query on the other index will give me a result
> and I will use the values of one field of those documents to construct the
> ordered list.
>
> It would be even better if the boost not use only the order but also the
> score of the result of the query on the other index.
>
> I'm not very used to Solr and Lucene but from what I read, I think that the
> solution turns around a customization of the Query object.
>
> So the questions are:
>
> 1) Am I right with the Query's cutomization assumption? (if so... can
> someone could give me advices or point me an example of something related)
> 2) Is something already exist that i could use to do that?
> 3) Is that a good approach to use separate index?
>
> Thanks for the help
>
> Jeff
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Boosting-result-on-query-tp3037649p3037649.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Getting a query on an "fl" parameter value ?

2011-06-08 Thread lee carroll
try
http://wiki.apache.org/solr/CommonQueryParameters#fq

On 7 June 2011 16:14, duddy67  wrote:
> Hi all,
>
> I'd like to know if it's possible to get a query on an "fl" value.
> For now my url query looks like that:
>
> /solr/select/?q=keyword&version=2.2&start=0&rows=10&indent=on&fl=id+name+title
>
> it works but I need request also on a "fl" parameter value.
> I'd like to add to my initial query a kind of:  WHERE the "fl" id value is
> equal to 12 OR 45 OR 32.
>
> How can I do that ?
>
>
> Thanks for advance.
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Getting-a-query-on-an-fl-parameter-value-tp3034887p3034887.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Tokenising based on known words?

2011-06-09 Thread lee carroll
we've played with HyphenationCompoundWordTokenFilterFactory it works
better than maintaining a word dictionary to split (although we ended
up not using it for reasons i can't recall)

see

http://lucene.apache.org/solr/api/org/apache/solr/analysis/HyphenationCompoundWordTokenFilterFactory.html



On 9 June 2011 06:42, Gora Mohanty  wrote:
> On Thu, Jun 9, 2011 at 4:37 AM, Mark Mandel  wrote:
>> Not sure if this possible, but figured I would ask the question.
>>
>> Basically, we have some users who do some pretty rediculous things ;o)
>>
>> Rather than writing "red jacket", they write "redjacket", which obviously
>> returns no results.
> [...]
>
> Have you tried using synonyms,
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
> It seems like they should fit your use case.
>
> Regards,
> Gora
>


Re: Boost or sort a query with range values

2011-06-09 Thread lee carroll
myfield:[-1 TO 1]^5

On 9 June 2011 11:31, jlefebvre  wrote:
> Hello
>
> I try to boost a query with a range values but I can't find the correct
> syntax :
> this is ok .&bq=myfield:"-1"^5 but I want to do something lik this
> &bq=myfield:"-1 to 1"^5
>
> Boost value from -1 to 1
>
> thanks
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Boost-or-sort-a-query-with-range-values-tp3043328p3043328.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Document has fields with different update frequencies: how best to model

2011-06-10 Thread lee carroll
Hi,
We have a document type which has fields which are pretty static - say
they change once every 6 months - but the same document has a field
which changes hourly.
What are the best approaches to indexing this document?

Eg
Hotel ID (static) , Hotel Description (static and costly to get from a
url etc), FromPrice (changes hourly)

Option 1
Index hourly as a single document and don't worry about the unneeded
field updates

Option 2
Split into 2 document types and index independently. This would
require the front end application to query multiple times?
doc1
ID,Description,DocType
doc2
ID,HotelID,Price,DocType

application performs searches based on hotel attributes
for each hotel match issue query to get price


Any other options ? Can you query across documents ?

We run 1.4.1, we could maybe update to 3.2 but I don't think I could
swing to trunk for JOIN feature (if that indeed is JOIN's use case)

Thanks in advance

PS Am I just worrying about de-normalised data, and should I sort the
source data out (maybe by caching) and get over it ...?

cheers Lee c


Re: Document has fields with different update frequencies: how best to model

2011-06-11 Thread lee carroll
Hi Jay,
I thought external file field could not be returned as a field, but
only used in scoring.
Trunk has pseudo-fields which can take a function value, but we can't
move to trunk.

Also, it's a more general question around schema design: what happens if
you have several fields with different update frequencies? It does not
seem external file field is the use case for that.



On 10 June 2011 20:13, Jay Luker  wrote:
> Take a look at ExternalFileField [1]. It's meant for exactly what you
> want to do here.
>
> FYI, there is an issue with caching of the external values introduced
> in v1.4 but, thankfully, resolved in v3.2 [2]
>
> --jay
>
> [1] 
> http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html
> [2] https://issues.apache.org/jira/browse/SOLR-2536
>
>
> On Fri, Jun 10, 2011 at 12:54 PM, lee carroll
>  wrote:
>> Hi,
>> We have a document type which has fields which are pretty static. Say
>> they change once every 6 month. But the same document has a field
>> which changes hourly
>> What are the best approaches to index this document ?
>>
>> Eg
>> Hotel ID (static) , Hotel Description (static and costly to get from a
>> url etc), FromPrice (changes hourly)
>>
>> Option 1
>> Index hourly as a single document and don't worry about the unneeded
>> field updates
>>
>> Option 2
>> Split into 2 document types and index independently. This would
>> require the front end application to query multiple times?
>> doc1
>> ID,Description,DocType
>> doc2
>> ID,HotelID,Price,DocType
>>
>> application performs searches based on hotel attributes
>> for each hotel match issue query to get price
>>
>>
>> Any other options ? Can you query across documents ?
>>
>> We run 1.4.1, we could maybe update to 3.2 but I don't think I could
>> swing to trunk for JOIN feature (if that indeed is JOIN's use case)
>>
>> Thanks in advance
>>
>> PS Am I just worrying about de-normalised data and should sort the
>> source data out maybe by caching and get over it ...?
>>
>> cheers Lee c
>>
>


Re: Document has fields with different update frequencies: how best to model

2011-06-11 Thread lee carroll
Thanks Jay for the quick reply.

Maybe we can set up a dev env with trunk and use JOIN.

Is JOIN a good use case for this ?

On 11 June 2011 15:28, Jay Luker  wrote:
> You are correct that ExternalFileField values can only be used in
> query functions (i.e. scoring, basically). Sorry for firing off that
> answer without reading your use case more carefully.
>
> I'd be inclined towards giving your Option #1 a try, but that's
> without knowing much about the scale of your app, size of your index,
> documents, etc. Unneeded field updates are only a problem if they're
> causing performance problems, right? Otherwise, trying to avoid seems
> like premature optimization.
>
> --jay
>
> On Sat, Jun 11, 2011 at 5:26 AM, lee carroll
>  wrote:
>> Hi Jay
>> I thought external file field could not be returned as a field but
>> only used in scoring.
>> trunk has pseudo field which can take a function value but we cant
>> move to trunk.
>>
>> also its a more general question around schema design, what happens if
>> you have several fields with different update frequencies. It does not
>> seem external file field is the use case for this.
>>
>>
>>
>> On 10 June 2011 20:13, Jay Luker  wrote:
>>> Take a look at ExternalFileField [1]. It's meant for exactly what you
>>> want to do here.
>>>
>>> FYI, there is an issue with caching of the external values introduced
>>> in v1.4 but, thankfully, resolved in v3.2 [2]
>>>
>>> --jay
>>>
>>> [1] 
>>> http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html
>>> [2] https://issues.apache.org/jira/browse/SOLR-2536
>>>
>>>
>>> On Fri, Jun 10, 2011 at 12:54 PM, lee carroll
>>>  wrote:
>>>> Hi,
>>>> We have a document type which has fields which are pretty static. Say
>>>> they change once every 6 month. But the same document has a field
>>>> which changes hourly
>>>> What are the best approaches to index this document ?
>>>>
>>>> Eg
>>>> Hotel ID (static) , Hotel Description (static and costly to get from a
>>>> url etc), FromPrice (changes hourly)
>>>>
>>>> Option 1
>>>> Index hourly as a single document and don't worry about the unneeded
>>>> field updates
>>>>
>>>> Option 2
>>>> Split into 2 document types and index independently. This would
>>>> require the front end application to query multiple times?
>>>> doc1
>>>> ID,Description,DocType
>>>> doc2
>>>> ID,HotelID,Price,DocType
>>>>
>>>> application performs searches based on hotel attributes
>>>> for each hotel match issue query to get price
>>>>
>>>>
>>>> Any other options ? Can you query across documents ?
>>>>
>>>> We run 1.4.1, we could maybe update to 3.2 but I don't think I could
>>>> swing to trunk for JOIN feature (if that indeed is JOIN's use case)
>>>>
>>>> Thanks in advance
>>>>
>>>> PS Am I just worrying about de-normalised data and should sort the
>>>> source data out maybe by caching and get over it ...?
>>>>
>>>> cheers Lee c
>>>>
>>>
>>
>


Re: WordDelimiter and stemEnglishPossessive doesn't work

2011-06-14 Thread lee carroll
do you need the word delimiter at all?
try a pattern like: #|\s
i think it's just a regex in the pattern tokenizer - i might be wrong though
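i.e. something along these lines - an untested sketch, where the pattern attribute is a Java regex:

```xml
<analyzer>
  <!-- split only on '#' or runs of whitespace; apostrophes and ampersands survive -->
  <tokenizer class="solr.PatternTokenizerFactory" pattern="#|\s+"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```

which should turn mcdonald's#burgerking#Free record shop#h&m into mcdonald's / burgerking / free / record / shop / h&m without a WordDelimiterFilter at all.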




On 14 June 2011 11:15, roySolr  wrote:
> Ok, with catenatewords the index term will be mcdonalds. But that's not what
> i want.
>
> I only use the wordDelimiter to split on whitespace. I have already used the
> PatternTokenizerFactory so i can't use the whitespacetokenizer.
>
> I want my index looks like this:
>
> dataset: mcdonald's#burgerking#Free record shop#h&m
>
> mcdonald's
> burgerking
> free
> record
> shop
> h&m
>
> Can i configure the wordDelimiter as an whitespaceTokenizer? So it only
> splits on whitespaces and nothing more(not removing 's etc)..
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/WordDelimiter-and-stemEnglishPossessive-doesn-t-work-tp3047678p3062461.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


simple production set up

2010-11-18 Thread lee carroll
Hi I'm pretty new to SOLR and interested in getting an idea about a simple
standard way of setting up a production SOLR service. I have read the FAQs
and the wiki around SOLR security and performance but have not found much on
a best practice architecture. I'm particularly interested in best practices
around DOS prevention, securing the SOLR web app and setting up dev, test,
production indexes.

Any pointers, links to resources would be great. Thanks in advance

Lee C


Can a URL based datasource in DIH return non xml

2010-11-21 Thread lee carroll
Hi,

Can a URL based datasource in DIH return non xml. My pages being indexed are
written by many authors and will
often be invalid xhtml. Can DIH cope with this or will I need another
approach ?

thanks in advance Lee C


Re: Can a URL based datasource in DIH return non xml

2010-11-22 Thread lee carroll
Hi Erik,

Thank you for the response. Just for completeness of the thread
I'm going to process the xhtml off-line. Another approach could be to set up
a web service which DIH could call which returned xml from a html parser.
However for my purposes its just as easy to use curl and perl and then use
DIH

cheers Lee

On 22 November 2010 12:59, Erick Erickson  wrote:

> DIH does some good stuff, but it doesn't handle bad input very robustly
> (actually, how could it intuit what "the right thing" is?). I'd consider
> SolrJ coupled with a "forgiving" HTML parser, e.g.
> http://sourceforge.net/projects/nekohtml/
>
> <http://sourceforge.net/projects/nekohtml/>Best
> Erick
>
> On Sun, Nov 21, 2010 at 7:46 PM, lee carroll
> wrote:
>
> > Hi,
> >
> > Can a URL based datasource in DIH return non xml. My pages being indexed
> > are
> > written by many authors and will
> > often be invalid xhtml. Can DIH cope with this or will I need another
> > approach ?
> >
> > thanks in advance Lee C
> >
>
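
For completeness, the "forgiving parser" idea can be sketched with Python's
lenient stdlib `html.parser` standing in for NekoHTML - a hedged
illustration of the offline clean-up step Lee describes, not the DIH
pipeline itself:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Pull text out of sloppy markup; the stdlib parser does not
    raise on unclosed or mismatched tags, it just keeps going."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())

    def text(self):
        return " ".join(self.chunks)

# Invalid XHTML: unclosed <p> tags, a stray </b>, no </html>.
bad = "<html><body><p>First line<p>Second line</b></body>"
parser = TextExtractor()
parser.feed(bad)
print(parser.text())
```

The recovered text can then be wrapped in well-formed XML and fed to DIH.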


Solr running in a weblogic cluster - best configuration?

2010-11-24 Thread lee carroll
Hi We are investigating / looking to deploy solr on to a weblogic cluster of
4 servers.
The cluster has no concept of a master / slave configuration so we are
thinking of the following solutions (new to solr so some  or all may be bad
ideas:-)

1) all 4 servers run their own index. we re-index each server individually
this works best in terms of fail over but we worry about machine indexes
being slightly different due to re-index timings. can we index one server
and then copy the index?
what is the effect on a solr server when an index is copied but it is still
reading the index?

2) all 4 servers use the same index
not very good for fail over, performance and we worry about multiple servers
updating. although if we controlled re-indexing to always use one server and
then commit on all 4 would this
solve this issue?

3) set up master - slave as per docs for 1.4.X http replication.
the down side here is we introduce a master server in our cluster which
should be HA - what happens if we lose the master, will the slaves still be
operable ?
is it just a case of having stale indexes until the master is brought back
into the cluster
is limiting updates to the master server good enough

4) set up repeaters (master/slave servers) as per docs
this fits in better with a weblogic HA cluster but seems to be not the exact
match to the solution repeaters are aimed at (reducing data transfers across
a wan). ie will we bring down the cluster with endless updates with the
following config or
will the set up stop propagating the changes ?

server1 (master, slave of server 2,3,4)
server2 (master, slave of server 1,3,4)
server3 (master, slave of server 1,2,4)
server4 (master, slave of server 1,2,3)

mmph this one sounds very whacky

Anyways if we can index a single instance and copy the index seamlessly to
others this seems maybe the best approach; if not, maybe option 3 and deal with
stale indexes if the master drops out (will have a tough time convincing the
police on this one maybe). how have others solved solr's master slave model
within a HA cluster ? (particularly weblogic clusters)

Thanks for all help so far (this is 3rd question posted this week :-)

lee


Re: Solr running in a weblogic cluster - best configuration?

2010-11-24 Thread lee carroll
Just been reading about a 5th possible set up:

all indexes in the cluster are slaves to a master index outside of the
cluster. building and maintaining the index happens outside of the cluster
which would only be used for queries. are there any issues with this set up
?

thanks

lee

On 24 November 2010 11:53, lee carroll  wrote:

> Hi We are investigating / looking to deploy solr on to a weblogic cluster
> of 4 servers.
> The cluster has no concept of a master / slave configuration so we are
> thinking of the following solutions (new to solr so some  or all may be bad
> ideas:-)
>
> 1) all 4 servers run their own index. we re-index each server individually
> this works best in terms of fail over but we worry about machine indexs
> being slightly different due to re-index timings. can we index one server
> and then copy index?
> what is the effect on a solr server when an index is copied but it is still
> reading the index?
>
> 2) all 4 servers use the same index
> not very good for fail over, performance and we worry about multiple
> servers updating. although if we controlled re-indexing to always use one
> server and then commit on all 4 would this
> solve this issue?
>
> 3) set up master - slave as per docs for 1.4.X http replication.
> the down side here is we introduce a master server in our cluster which
> should be HA - what happens if we loose the master will the slaves still be
> operable ?
> is it justa case of having stale indexes until the master is brought back
> into the cluster
> is limiting updates to the master server good enough
>
> 4) set up repeaters (master/slave servers) as per docs
> this fits in better with a weblogic HA cluster but seems to be not the
> exact match to the solution repeaters are aimed at (reducing data transfers
> across a wan). ie will we bring down the cluster with endless updates with
> the follwoing config or
> will the set up stop propigating the changes ?
>
> server1 (master, slave of server 2,3,4)
> server2 (master, slave of server 1,3,4)
> server3 (master, slave of server 1,2,4)
> server4 (master, slave of server 1,2,3)
>
> mmph this one sounds very whacky
>
> Anyways if we can index a single instance and copy index seemlessly to
> others this seems maybe the best approach if not maybe option3 and deal with
> stale indexes if the master drops out (will have a tough time convincing the
> police on this one maybe) how have others solved solrs master slave model
> within a HA cluster ? (particularly weblogic clusters..
>
> Thanks for all help so far (this is 3rd question posted this week :-)
>
> lee
>
>
>
>
>
>
>


schema design for related fields

2010-12-01 Thread lee carroll
Hi

I've built a schema for a proof of concept and it is all working fairly
fine, naive maybe but fine.
However I think we might run into trouble in the future if we ever use
facets.

The data models train destination city routes from an origin city:
Doc:City
Name: cityname [uniq key]
CityType: city type values [nine possible values so good for faceting]
... [other city attributes which relate directly to the doc unique key]
all have limited vocab so good for faceting
FareJanStandard:cheapest standard fare in january(float value)
FareJanFirst:cheapest first class fare in january(float value)
FareFebStandard:cheapest standard fare in feb(float value)
FareFebFirst:cheapest first fare in feb(float value)
. etc

The question is how would I best facet fare price? The desire is to return

number of cities with jan prices in a set of ranges
etc
number of cities with first prices in a set of ranges
etc

install is 1.4.1 running in weblogic

Any ideas ?



Lee C


Re: schema design for related fields

2010-12-01 Thread lee carroll
Hi Erick,
so if I understand you we could do something like:

if Jan is selected in the user interface and we have 10 price ranges

query would be 20 clauses in the query (10 * 2 fare classes)

if first is selected in the user interface and we have 10 price ranges
query would be 120 clauses (12 months * 10 price ranges)

if first and jan selected with 10 price ranges
query would be 10 clauses

if we required facets to be returned for all price combinations we'd need to
supply
240 clauses

the user interface would also need to collate the individual fields into
meaningful aggregates for the user (ie numbers by month, numbers by fare
class)

have I understood or missed the point (i usually have)




On 1 December 2010 15:00, Erick Erickson  wrote:

> I'd think that facet.query would work for you, something like:
> &facet=true&facet.query=FareJanStandard:[price1 TO
> price2]&facet.query:fareJanStandard[price2 TO price3]
> You can string as many facet.query clauses as you want, across as many
> fields as you want, they're all
> independent and will get their own sections in the response.
>
> Best
> Erick
>
> On Wed, Dec 1, 2010 at 4:55 AM, lee carroll  >wrote:
>
> > Hi
> >
> > I've built a schema for a proof of concept and it is all working fairly
> > fine, niave maybe but fine.
> > However I think we might run into trouble in the future if we ever use
> > facets.
> >
> > The data models train destination city routes from a origin city:
> > Doc:City
> >Name: cityname [uniq key]
> >CityType: city type values [nine possible values so good for faceting]
> >... [other city attricbutes which relate directy to the doc unique
> key]
> > all have limited vocab so good for faceting
> >FareJanStandard:cheapest standard fare in january(float value)
> >FareJanFirst:cheapest first class fare in january(float value)
> >FareFebStandard:cheapest standard fare in feb(float value)
> >FareFebFirst:cheapest first fare in feb(float value)
> >. etc
> >
> > The question is how would i best facet fare price? The desire is to
> return
> >
> > number of citys with jan prices in a set of ranges
> > etc
> > number of citys with first prices in a set of ranges
> > etc
> >
> > install is 1.4.1 running in weblogic
> >
> > Any ideas ?
> >
> >
> >
> > Lee C
> >
>
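
Erick's facet.query clauses are each an independent range count, so every
clause behaves like a bucket count over the fare field. A toy illustration
of what three such clauses would return - made-up city data, with the
FareJanStandard field name taken from the thread:

```python
# Hypothetical city docs using the FareJanStandard field
# from the schema sketched in the thread.
cities = [
    {"Name": "paris", "FareJanStandard": 20.0},
    {"Name": "york",  "FareJanStandard": 45.0},
    {"Name": "leeds", "FareJanStandard": 95.0},
    {"Name": "oslo",  "FareJanStandard": 30.0},
]

# Each facet.query=FareJanStandard:[lo TO hi] is simply an
# independent count of docs whose value falls in the range
# (Solr's [a TO b] syntax is inclusive at both ends).
ranges = [(0, 25), (25, 50), (50, 100)]
counts = {
    f"[{lo} TO {hi}]": sum(lo <= c["FareJanStandard"] <= hi for c in cities)
    for lo, hi in ranges
}
print(counts)
```

Because the clauses are independent, a value sitting exactly on a shared
boundary would be counted in both adjacent buckets - worth remembering when
choosing range edges.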


Re: schema design for related fields

2010-12-01 Thread lee carroll
Geert

The UI would be something like:
user selections
for the facet price
max price: £100
fare class: any

city attributes facet
cityattribute1 etc: xxx

results displayed something like

Facet price
Standard fares [10]
First fares [3]
in Jan [9]
in feb [10]
in march [1]
etc
is this compatible with your approach ?

Erick the price is an interval scale ie a fare can be any value (not high,
low, medium etc)

How sensible would the following approach be
index city docs with fields only related to the city unique key
in the same index also index fare docs which would be something like:
Fare:
cityID: xxx
Fareclass:standard
FareMonth: Jan
FarePrice: 100

the query would be something like:
q=FarePrice:[* TO 100] FareMonth:Jan fl=cityID
returning facets for FareClass and FareMonth. hold on, this will not facet
city docs correctly. sorry that's not going to work.







On 1 December 2010 16:25, Erick Erickson  wrote:

> Hmmm, that's getting to be a pretty clunky query sure enough. Now you're
> going to
> have to insure that HTTP request that long get through and stuff like
> that
>
> I'm reaching a bit here, but you can facet on a tokenized field. Although
> that's not
> often done there's no prohibition against it.
>
> So, what if you had just one field for each city that contained some
> abstract
> information about your fares etc. Something like
> janstdfareclass1 jancheapfareclass3 febstdfareclass6
>
> Now just facet on that field? Not #values# in that field, just the field
> itself. You'd then have to make those into human-readable text, but that
> would considerably simplify your query. Probably only works if your user is
> selecting from pre-defined ranges, if they expect to put in arbitrary
> ranges
> this scheme probably wouldn't work...
>
> Best
> Erick
>
> On Wed, Dec 1, 2010 at 10:22 AM, lee carroll
> wrote:
>
> > Hi Erick,
> > so if i understand you we could do something like:
> >
> > if Jan is selected in the user interface and we have 10 price ranges
> >
> > query would be 20 cluases in the query (10 * 2 fare clases)
> >
> > if first is selected in the user interface and we have 10 price ranges
> > query would be 120 cluases (12 months * 10 price ranges)
> >
> > if first and jan selected with 10 price ranges
> > query would be 10 cluases
> >
> > if we required facets to be returned for all price combinations we'd need
> > to
> > supply
> > 240 cluases
> >
> > the user interface would also need to collate the individual fields into
> > meaningful aggragates for the user (ie numbers by month, numbers by fare
> > class)
> >
> > have I understood or missed the point (i usually have)
> >
> >
> >
> >
> > On 1 December 2010 15:00, Erick Erickson 
> wrote:
> >
> > > I'd think that facet.query would work for you, something like:
> > > &facet=true&facet.query=FareJanStandard:[price1 TO
> > > price2]&facet.query:fareJanStandard[price2 TO price3]
> > > You can string as many facet.query clauses as you want, across as many
> > > fields as you want, they're all
> > > independent and will get their own sections in the response.
> > >
> > > Best
> > > Erick
> > >
> > > On Wed, Dec 1, 2010 at 4:55 AM, lee carroll <
> > lee.a.carr...@googlemail.com
> > > >wrote:
> > >
> > > > Hi
> > > >
> > > > I've built a schema for a proof of concept and it is all working
> fairly
> > > > fine, niave maybe but fine.
> > > > However I think we might run into trouble in the future if we ever
> use
> > > > facets.
> > > >
> > > > The data models train destination city routes from a origin city:
> > > > Doc:City
> > > >Name: cityname [uniq key]
> > > >CityType: city type values [nine possible values so good for
> > faceting]
> > > >... [other city attricbutes which relate directy to the doc unique
> > > key]
> > > > all have limited vocab so good for faceting
> > > >FareJanStandard:cheapest standard fare in january(float value)
> > > >FareJanFirst:cheapest first class fare in january(float value)
> > > >FareFebStandard:cheapest standard fare in feb(float value)
> > > >FareFebFirst:cheapest first fare in feb(float value)
> > > >. etc
> > > >
> > > > The question is how would i best facet fare price? The desire is to
> > > return
> > > >
> > > > number of citys with jan prices in a set of ranges
> > > > etc
> > > > number of citys with first prices in a set of ranges
> > > > etc
> > > >
> > > > install is 1.4.1 running in weblogic
> > > >
> > > > Any ideas ?
> > > >
> > > >
> > > >
> > > > Lee C
> > > >
> > >
> >
>


Re: schema design for related fields

2010-12-01 Thread lee carroll
Sorry Geert, missed off the price value bit from the user interface, so we'd
display

Facet price
Standard fares [10]
First fares [3]

When traveling
in Jan [9]
in feb [10]
in march [1]

Fare Price
0 - 25 :  [20]
25 - 50: [10]
50 - 100 [2]

cheers lee c


On 1 December 2010 17:00, lee carroll  wrote:

> Geert
>
> The UI would be something like:
> user selections
> for the facet price
> max price: £100
> fare class: any
>
> city attributes facet
> cityattribute1 etc: xxx
>
> results displayed something like
>
> Facet price
> Standard fares [10]
> First fares [3]
> in Jan [9]
> in feb [10]
> in march [1]
> etc
> is this compatible with your approach ?
>
> Erick the price is an interval scale ie a fare can be any value (not high,
> low, medium etc)
>
> How sensible would the following approach be
> index city docs with fields only related to the city unique key
> in the same index also index fare docs which would be something like:
> Fare:
> cityID: xxx
> Fareclass:standard
> FareMonth: Jan
> FarePrice: 100
>
> the query would be something like:
> q=FarePrice:[* TO 100] FareMonth:Jan fl=cityID
> returning facets for FareClass and FareMonth. hold on this will not facet
> city docs correctly. sorry thasts not going to work.
>
>
>
>
>
>
>
>
> On 1 December 2010 16:25, Erick Erickson  wrote:
>
>> Hmmm, that's getting to be a pretty clunky query sure enough. Now you're
>> going to
>> have to insure that HTTP request that long get through and stuff like
>> that
>>
>> I'm reaching a bit here, but you can facet on a tokenized field. Although
>> that's not
>> often done there's no prohibition against it.
>>
>> So, what if you had just one field for each city that contained some
>> abstract
>> information about your fares etc. Something like
>> janstdfareclass1 jancheapfareclass3 febstdfareclass6
>>
>> Now just facet on that field? Not #values# in that field, just the field
>> itself. You'd then have to make those into human-readable text, but that
>> would considerably simplify your query. Probably only works if your user
>> is
>> selecting from pre-defined ranges, if they expect to put in arbitrary
>> ranges
>> this scheme probably wouldn't work...
>>
>> Best
>> Erick
>>
>> On Wed, Dec 1, 2010 at 10:22 AM, lee carroll
>> wrote:
>>
>> > Hi Erick,
>> > so if i understand you we could do something like:
>> >
>> > if Jan is selected in the user interface and we have 10 price ranges
>> >
>> > query would be 20 cluases in the query (10 * 2 fare clases)
>> >
>> > if first is selected in the user interface and we have 10 price ranges
>> > query would be 120 cluases (12 months * 10 price ranges)
>> >
>> > if first and jan selected with 10 price ranges
>> > query would be 10 cluases
>> >
>> > if we required facets to be returned for all price combinations we'd
>> need
>> > to
>> > supply
>> > 240 cluases
>> >
>> > the user interface would also need to collate the individual fields into
>> > meaningful aggragates for the user (ie numbers by month, numbers by fare
>> > class)
>> >
>> > have I understood or missed the point (i usually have)
>> >
>> >
>> >
>> >
>> > On 1 December 2010 15:00, Erick Erickson 
>> wrote:
>> >
>> > > I'd think that facet.query would work for you, something like:
>> > > &facet=true&facet.query=FareJanStandard:[price1 TO
>> > > price2]&facet.query:fareJanStandard[price2 TO price3]
>> > > You can string as many facet.query clauses as you want, across as many
>> > > fields as you want, they're all
>> > > independent and will get their own sections in the response.
>> > >
>> > > Best
>> > > Erick
>> > >
>> > > On Wed, Dec 1, 2010 at 4:55 AM, lee carroll <
>> > lee.a.carr...@googlemail.com
>> > > >wrote:
>> > >
>> > > > Hi
>> > > >
>> > > > I've built a schema for a proof of concept and it is all working
>> fairly
>> > > > fine, niave maybe but fine.
>> > > > However I think we might run into trouble in the future if we ever
>> use
>> > > > facets.
>> > > >
>> > > > The data models train destination city routes from a origin city:
>> > > > Doc:City
>> > > >Name: cityname [uniq key]
>> > > >CityType: city type values [nine possible values so good for
>> > faceting]
>> > > >... [other city attricbutes which relate directy to the doc
>> unique
>> > > key]
>> > > > all have limited vocab so good for faceting
>> > > >FareJanStandard:cheapest standard fare in january(float value)
>> > > >FareJanFirst:cheapest first class fare in january(float value)
>> > > >FareFebStandard:cheapest standard fare in feb(float value)
>> > > >FareFebFirst:cheapest first fare in feb(float value)
>> > > >. etc
>> > > >
>> > > > The question is how would i best facet fare price? The desire is to
>> > > return
>> > > >
>> > > > number of citys with jan prices in a set of ranges
>> > > > etc
>> > > > number of citys with first prices in a set of ranges
>> > > > etc
>> > > >
>> > > > install is 1.4.1 running in weblogic
>> > > >
>> > > > Any ideas ?
>> > > >
>> > > >
>> > > >
>> > > > Lee C
>> > > >
>> > >
>> >
>>
>
>


Re: schema design for related fields

2010-12-01 Thread lee carroll
Hi Geert,

Ok, I think I follow. The magic is in the multi-valued field.

The only danger would be complexity if we allow users to multi-select
months/prices/fare classes. For example they can search for first prices in
jan, april and november. I think what you describe is possible in this case,
just complicated. I'll see if I can hack some facets into the prototype
tomorrow. Thanks for your help

Lee C

On 1 December 2010 17:57, Geert-Jan Brits  wrote:

> Ok longer answer than anticipated (and good conceptual practice ;-)
>
> Yeah I belief that would work if I understand correctly that:
>
> 'in Jan [9]
> in feb [10]
> in march [1]'
>
> has nothing to do with pricing, but only with availability?
>
> If so you could seperate it out as two seperate issues:
>
> 1. ) showing pricing (based on context)
> 2. ) showing availabilities (based on context)
>
> For 1.)  you get 39 pricefields ([jan,feb,..,dec,dc] * [standard,first,dc])
> note: 'dc' indicates 'don't care.
>
> depending on the context you query the correct pricefield to populate the
> price facet-values.
> for discussion lets call the fields: _p[fare][date].
> IN other words the price field for no preference at all would become:
> _pdcdc
>
>
> For 2.) define a multivalued field 'FaresPerDate 'which indicate
> availability, which is used to display:
>
> A)
> Standard fares [10]
> First fares [3]
>
> B)
> in Jan [9]
> in feb [10]
> in march [1]
>
> A) depends on your selection (or dont caring) about a month
> B) vice versa depends on your selection (or dont caring)  about a fare type
>
> given all possible date values: [jan,feb,..dec,dontcare]
> given all possible fare values:[standard,first,dontcare]
>
> FaresPerDate consists of multiple values per document where each value
> indicates the availability of a combination of 'fare' and 'date':
>
> (standardJan,firstJan,DCjan...,standardJan,firstDec,DCdec,standardDC,firstDC,DCDC)
> Note that the nr of possible values = 39.
>
> Example:
> 1. ) the user hasn't selected any preference:
>
> q=*:*&facet.field:FaresPerDate&facet.query=_pdcdc:[0 TO
> 20]&facet.query=_pdcdc:[20 TO 40], etc.
>
> in the client you have to make sure to select the correct values of
> 'FaresPerDate' for display:
> in this case:
>
> Standard fares [10] --> FaresPerDate.standardDC
> First fares [3] --> FaresPerDate.firstDC
>
> in Jan [9] -> FaresPerDate.DCJan
> in feb [10] -> FaresPerDate.DCFeb
> in march [1]-> FaresPerDate.DCMarch
>
> 2) the user has selected January
> q=*:*&facet.field:FaresPerDate&fq=FaresPerDate:DCJan&facet.query=_pDCJan:[0
> TO 20]&facet.query=_pDCJan:[20 TO 40]
>
> Standard fares [10] --> FaresPerDate.standardJan
> First fares [3] --> FaresPerDate.firstJan
>
> in Jan [9] -> FaresPerDate.DCJan
> in feb [10] -> FaresPerDate.DCFeb
> in march [1]-> FaresPerDate.DCMarch
>
> Hope that helps,
> Geert-Jan
>
>
> 2010/12/1 lee carroll 
>
> > Sorry Geert missed of the price value bit from the user interface so we'd
> > display
> >
> > Facet price
> > Standard fares [10]
> > First fares [3]
> >
> > When traveling
> > in Jan [9]
> > in feb [10]
> > in march [1]
> >
> > Fare Price
> > 0 - 25 :  [20]
> > 25 - 50: [10]
> > 50 - 100 [2]
> >
> > cheers lee c
> >
> >
> > On 1 December 2010 17:00, lee carroll 
> > wrote:
> >
> > > Geert
> > >
> > > The UI would be something like:
> > > user selections
> > > for the facet price
> > > max price: £100
> > > fare class: any
> > >
> > > city attributes facet
> > > cityattribute1 etc: xxx
> > >
> > > results displayed something like
> > >
> > > Facet price
> > > Standard fares [10]
> > > First fares [3]
> > > in Jan [9]
> > > in feb [10]
> > > in march [1]
> > > etc
> > > is this compatible with your approach ?
> > >
> > > Erick the price is an interval scale ie a fare can be any value (not
> > high,
> > > low, medium etc)
> > >
> > > How sensible would the following approach be
> > > index city docs with fields only related to the city unique key
> > > in the same index also index fare docs which would be something like:
> > > Fare:
> > > cityID: xxx
> > > Fareclass:standard
> > > FareMonth: Jan
> > > FarePrice: 100
> > >
> > > the query would be somethin
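
Geert-Jan's encoding can be made concrete: each real (fare, month)
availability expands into the "don't care" ('DC') combinations, so one
multivalued field answers "standard in Jan", "any fare in Jan", "standard in
any month" and "any at all". A small sketch of the token generation, with
names following the thread (the full vocabulary is 3 fares x 13 dates = 39
possible values):

```python
def fares_per_date(pairs):
    """Expand (fare, month) availabilities into the multivalued
    FaresPerDate tokens, including the don't-care combinations."""
    tokens = set()
    for fare, month in pairs:
        tokens.update({fare + month, "DC" + month, fare + "DC", "DCDC"})
    return tokens

# A city selling standard and first fares in January only:
tokens = fares_per_date([("standard", "Jan"), ("first", "Jan")])
print(sorted(tokens))
```

Filtering on `FaresPerDate:DCJan` then matches any city with some fare in
January, while `FaresPerDate:standardDC` matches standard fares in any month.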

SOLR Thesaurus

2010-12-02 Thread lee carroll
Hi List,

Coming to an end of a prototype evaluation of SOLR (all very good etc etc).
Getting to the point of looking at bells and whistles. Does SOLR have a
thesaurus? Can't find any reference
to one in the docs or on the wiki etc. (Apart from a few mail threads which
describe the synonym.txt as a thesaurus.)

I mean something like:

PT: 
BT: xxx,,
NT: xxx,,
RT:xxx,xxx,xxx
Scope Note: xx,

Like i say bells and whistles

cheers Lee


Re: SOLR Thesaurus

2010-12-02 Thread lee carroll
Hi

Stephen, yes sorry should have been more plain

a term can have a Preferred Term (PT), many Broader Terms (BT), many Narrower
Terms (NT), Related Terms (RT) etc

So

User supplied Term is say : Ski

Preferred term: Skiing
Broader terms could be : Ski and Snow Boarding, Mountain Sports, Sports
Narrower terms: down hill skiing, telemark, cross country
Related terms: boarding, snow boarding, winter holidays

Michael,

yes exactly, SKOS, although maybe without the overweening ambition to take
over the world.

By the sounds of it though, out of the box you get a simple (but pretty
effective) synonym list and ring. Anything more we'd need to write
ourselves, ie your thesaurus filter plus a change to the response, as
broader terms, narrower terms etc would be good to be suggested to the ui.

No plugins out there ?

On 2 December 2010 16:16, Michael Zach  wrote:

> Hello Lee,
>
> these bells sound like "SKOS" ;o)
>
> AFAIK Solr does not support thesauri just plain flat synonym lists.
>
> One could implement a thesaurus filter and put it into the end of the
> analyzer chain of solr.
>
> The filter would then do a thesaurus lookup for each token it receives and
> possibly
> * expand the query
> or
> * kind of "stem" document tokens to some prefered variants according to the
> thesaurus
>
> Maybe even taking term relations from thesaurus into account and boost
> queries or doc fields at index time.
>
> Maybe have a look at http://poolparty.punkt.at/ a full features SKOS
> thesaurus management server.
> It's also providing webservices which could feed such a Solr filter.
>
> Kind regards
> Michael
>
>
> - Original Message -
> From: "lee carroll" 
> To: solr-user@lucene.apache.org
> Sent: Thursday, 2 December 2010 09:55:54
> Subject: SOLR Thesaurus
>
> Hi List,
>
> Coming to and end of a proto type evaluation of SOLR (all very good etc
> etc)
> Getting to the point at looking at bells and whistles. Does SOLR have a
> thesuarus. Cant find any refrerence
> to one in the docs and on the wiki etc. (Apart from a few mail threads
> which
> describe the synonym.txt as a thesuarus)
>
> I mean something like:
>
> PT: 
> BT: xxx,,
> NT: xxx,,
> RT:xxx,xxx,xxx
> Scope Note: xx,
>
> Like i say bells and whistles
>
> cheers Lee
>
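
Lee's Ski example can be sketched as a minimal lookup structure - a toy
stand-in for a real SKOS store such as PoolParty, with one illustrative
expansion policy (preferred term plus narrower terms; broader terms are
often reserved for boosting instead, to avoid flooding results):

```python
# Toy thesaurus entry built from the Ski example in the thread.
THESAURUS = {
    "ski": {
        "PT": "skiing",
        "BT": ["ski and snow boarding", "mountain sports", "sports"],
        "NT": ["down hill skiing", "telemark", "cross country"],
        "RT": ["boarding", "snow boarding", "winter holidays"],
    }
}

def expand(term):
    """Rewrite a user term to its preferred term plus narrower
    terms; unknown terms pass through unchanged."""
    entry = THESAURUS.get(term.lower())
    if entry is None:
        return [term]
    return [entry["PT"]] + entry["NT"]

print(expand("Ski"))
```

A custom filter doing this lookup at the end of the analyzer chain is
roughly what Michael describes; the RT/BT lists could likewise feed
"see also" suggestions in the UI.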


only index synonyms

2010-12-06 Thread lee carroll
Hi, can the following use case be achieved?

value to be analysed at index time "this is a pretty line of text"

synonym list is pretty => scenic , text => words

value placed in the index is "scenic words"

That is to say, only the matching synonyms. Basically I want to produce a
normalised set of phrases for faceting.

Cheers Lee C


Re: only index synonyms

2010-12-06 Thread lee carroll
Hi Erik, thanks for the reply. I only want the synonyms to be in the index -
how can I achieve that? Sorry, probably missing something obvious in the
docs
On 7 Dec 2010 01:28, "Erick Erickson"  wrote:
> See:
>
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>
> with the => syntax, I think that's what you're looking for
>
> Best
> Erick
>
> On Mon, Dec 6, 2010 at 6:34 PM, lee carroll wrote:
>
>> Hi Can the following usecase be achieved.
>>
>> value to be analysed at index time "this is a pretty line of text"
>>
>> synonym list is pretty => scenic , text => words
>>
>> valued placed in the index is "scenic words"
>>
>> That is to say only the matching synonyms. Basically i want to produce a
>> normalised set of phrases for faceting.
>>
>> Cheers Lee C
>>


Re: only index synonyms

2010-12-07 Thread lee carroll
Hi tom

This seems to place "this is a scenic line of words" in the index.
I just want "scenic" and "words" in the index

I'm not at a terminal at the moment but will try again to make sure. I'm
sure I'm missing the obvious

Cheers lee
On 7 Dec 2010 07:40, "Tom Hill"  wrote:
> Hi Lee,
>
>
> On Mon, Dec 6, 2010 at 10:56 PM, lee carroll
>  wrote:
>> Hi Erik
>
> Nope, Erik is the other one. :-)
>
>> thanks for the reply. I only want the synonyms to be in the index
>> how can I achieve that ? Sorry probably missing something obvious in the
>> docs
>
> Exactly what he said, use the => syntax. You've already got it. Add the
lines
>
> pretty => scenic
> text => words
>
> to synonyms.txt, and it will do what you want.
>
> Tom
>
>> On 7 Dec 2010 01:28, "Erick Erickson"  wrote:
>>> See:
>>>
>>
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>>>
>>> with the => syntax, I think that's what you're looking for
>>>
>>> Best
>>> Erick
>>>
>>> On Mon, Dec 6, 2010 at 6:34 PM, lee carroll <
lee.a.carr...@googlemail.com
>>>wrote:
>>>
>>>> Hi Can the following usecase be achieved.
>>>>
>>>> value to be analysed at index time "this is a pretty line of text"
>>>>
>>>> synonym list is pretty => scenic , text => words
>>>>
>>>> valued placed in the index is "scenic words"
>>>>
>>>> That is to say only the matching synonyms. Basically i want to produce
a
>>>> normalised set of phrases for faceting.
>>>>
>>>> Cheers Lee C
>>>>
>>


Re: only index synonyms

2010-12-07 Thread lee carroll
ok thanks for your response

To summarise the solution then:

To only index synonyms you must only send words that will match the synonym
list. If words without synonym matches are in the field to be indexed, these
words will be indexed. No way to avoid this by using schema.xml config.

thanks lee c

On 7 December 2010 13:21, Erick Erickson  wrote:

> OK, the light finally dawns
>
> *If* you have a defined list of words to remove, you can put them in
> with your stopwords and add a stopword filter to the field in
> schema.xml.
>
> Otherwise, you'll have to do some pre-processing and only send to
> solr words you want. I'm assuming you have a list of valid words
> (i.e. the words in your synonyms file) and could pre-filter the input
> to remove everything else. In that case you don't need a synonyms
> filter since you're controlling the whole process anyway
>
> Best
> Erick
>
> On Tue, Dec 7, 2010 at 6:07 AM, lee carroll  >wrote:
>
> > Hi tom
> >
> > This seems to place in the index
> > This is a scenic line of words
> > I just want scenic and words in the index
> >
> > I'm not at a terminal at the moment but will try again to make sure. I'm
> > sure I'm missing the obvious
> >
> > Cheers lee
> > On 7 Dec 2010 07:40, "Tom Hill"  wrote:
> > > Hi Lee,
> > >
> > >
> > > On Mon, Dec 6, 2010 at 10:56 PM, lee carroll
> > >  wrote:
> > >> Hi Erik
> > >
> > > Nope, Erik is the other one. :-)
> > >
> > >> thanks for the reply. I only want the synonyms to be in the index
> > >> how can I achieve that ? Sorry probably missing something obvious in
> the
> > >> docs
> > >
> > > Exactly what he said, use the => syntax. You've already got it. Add the
> > lines
> > >
> > > pretty => scenic
> > > text => words
> > >
> > > to synonyms.txt, and it will do what you want.
> > >
> > > Tom
> > >
> > >> On 7 Dec 2010 01:28, "Erick Erickson" 
> wrote:
> > >>> See:
> > >>>
> > >>
> >
> >
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
> > >>>
> > >>> with the => syntax, I think that's what you're looking for
> > >>>
> > >>> Best
> > >>> Erick
> > >>>
> > >>> On Mon, Dec 6, 2010 at 6:34 PM, lee carroll <
> > lee.a.carr...@googlemail.com
> > >>>wrote:
> > >>>
> > >>>> Hi Can the following usecase be achieved.
> > >>>>
> > >>>> value to be analysed at index time "this is a pretty line of text"
> > >>>>
> > >>>> synonym list is pretty => scenic , text => words
> > >>>>
> > >>>> valued placed in the index is "scenic words"
> > >>>>
> > >>>> That is to say only the matching synonyms. Basically i want to
> produce
> > a
> > >>>> normalised set of phrases for faceting.
> > >>>>
> > >>>> Cheers Lee C
> > >>>>
> > >>
> >
>


Re: only index synonyms

2010-12-07 Thread lee carroll
That's ace tom
Will give it a go but sounds spot on
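For anyone finding this thread later, here is a sketch of the field type Tom describes - a SynonymFilterFactory followed by a KeepWordFilterFactory. This is an illustration only, assuming Solr 1.4-era factory names and a hypothetical keepwords.txt that lists just the right-hand-side synonym terms (scenic, words):

```xml
<!-- Sketch only: index-time chain that maps synonyms and then keeps
     only the normalised terms. keepwords.txt lists the right-hand
     sides of synonyms.txt (e.g. scenic, words). -->
<fieldType name="normalisedPhrases" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
    <filter class="solr.KeepWordFilterFactory" words="keepwords.txt"
            ignoreCase="true"/>
  </analyzer>
</fieldType>
```

With synonyms.txt containing `pretty => scenic` and `text => words`, indexing "this is a pretty line of text" should leave only `scenic` and `words` in the index, which is the behaviour asked for at the start of the thread.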
On 7 Dec 2010 20:49, "Tom Hill"  wrote:
> Hi Lee,
>
> Sorry, I think Erick and I both thought the issue was converting the
> synonyms, not removing the other words.
>
> To keep only a set of words that match a list, use the
> KeepWordFilterFactory, with your list of synonyms.
>
>
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.KeepWordFilterFactory
>
> I'd put the synonym filter first in your configuration for the field,
> then the keep words filter factory.
>
> Tom
>
>
>
>
> On Tue, Dec 7, 2010 at 12:06 PM, lee carroll
>  wrote:
>> ok thanks for your response
>>
>> To summarise the solution then:
>>
>> To only index synonyms you must only send words that will match the
>> synonym list. If words without synonym matches are in the field to be
>> indexed, these words will be indexed. No way to avoid this by using
>> schema.xml config.
>>
>> thanks lee c
>>
>> On 7 December 2010 13:21, Erick Erickson  wrote:
>>
>>> OK, the light finally dawns
>>>
>>> *If* you have a defined list of words to remove, you can put them in
>>> with your stopwords and add a stopword filter to the field in
>>> schema.xml.
>>>
>>> Otherwise, you'll have to do some pre-processing and only send to
>>> solr words you want. I'm assuming you have a list of valid words
>>> (i.e. the words in your synonyms file) and could pre-filter the input
>>> to remove everything else. In that case you don't need a synonyms
>>> filter since you're controlling the whole process anyway
>>>
>>> Best
>>> Erick
>>>


Re: SolJSON

2010-12-09 Thread lee carroll
Hi Alessandro,

Can you use a JavaScript library which handles Ajax and JSON / JSONP?
You will end up with much cleaner client code. For example, a jQuery
implementation looks quite nice using Solr's neat JSONP support:

queryString = "*:*"
$.getJSON(
"http://[server]:[port]/solr/select/?jsoncallback=?",
{"q": queryString,
"version": "2.2",
"start": "0",
"rows": "10",
"indent": "on",
"json.wrf": "callbackFunctionToDoSomethingWithOurData",
"wt": "json",
"fl": "field1"}
);

and the callback function

function callbackFunctionToDoSomethingWithOurData(solrData) {
   // do stuff with your nice data
}

There is also a JavaScript client for Solr as well, but I've not used it.

Cheers Lee C
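To flesh out the callback from the snippet above: Solr's wt=json response nests the documents under response.docs. A sketch of what the callback might do, run here against a hand-built object that mirrors that shape (the response object is invented for illustration):

```javascript
// Sketch: pull fields out of a Solr wt=json response inside the
// json.wrf callback. fakeResponse below is hand-built to mirror
// Solr's standard JSON response shape.
function callbackFunctionToDoSomethingWithOurData(solrData) {
    var docs = solrData.response.docs;
    var values = [];
    for (var i = 0; i < docs.length; i++) {
        values.push(docs[i].field1); // field1 requested via fl above
    }
    return values;
}

// Example response in the shape Solr returns for wt=json
var fakeResponse = {
    responseHeader: { status: 0 },
    response: { numFound: 2, start: 0,
                docs: [ { field1: "a" }, { field1: "b" } ] }
};
var result = callbackFunctionToDoSomethingWithOurData(fakeResponse);
// result is ["a", "b"]
```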

On 9 December 2010 17:30, alessandro.ri...@virgilio.it <
alessandro.ri...@virgilio.it> wrote:

>
>  Dear all,
>
> First of all sorry for the previous email with the missing subject.
> I'm trying to call our Solr server with the JSON parameter in order to
> parse the response on the client side, which is JavaScript.
> My problem is that when I try the call (see the code below) using the
> wiki instructions (http://wiki.apache.org/solr/SolJSON), the
> XMLHttpRequest object gets blank when using the W3C standard rather
> than the Microsoft ActiveX one, which is working just fine.
> Do you know if there is some further implementation I have to use in
> order to get the standard implementation working?
>
>
>
> function xmlhttpPost(strURL) {
>     var xmlHttpReq = false;
>     var self = this;
>     if (window.XMLHttpRequest) { // Mozilla/Safari
>         self.xmlHttpReq = new XMLHttpRequest();
>     } else if (window.ActiveXObject) { // IE
>         self.xmlHttpReq = new ActiveXObject("Microsoft.XMLHTTP");
>     }
>     self.xmlHttpReq.open('GET', strURL, true);
>     self.xmlHttpReq.setRequestHeader('Content-Type',
>         'text/plain;charset=UTF-8');
>     self.xmlHttpReq.onreadystatechange = function() {
>         if (self.xmlHttpReq.readyState == 4) {
>             updatepage(self.xmlHttpReq.responseText);
>         }
>     }
>     var params = getstandardargs().concat(getquerystring());
>     var strData = params.join('&');
>     self.xmlHttpReq.send(strData);
> }
> Thanks, Alessandro
>


Re: SOLR Thesaurus

2010-12-10 Thread lee carroll
Hi Chris,

It's all a bit early in the morning for this, mind :-)

The question asked, in good faith, was whether Solr supports or extends to
implementing a thesaurus. It looks like it does not, which is fine. It does
support synonyms and synonym rings, which is again fine. The ski example was
an illustration in response to a follow-up question for more explanation on
what a thesaurus is.

An attempt at an answer of why a thesaurus is below.

Use case 1: improve facets

Motivation
Unstructured lists of labels in facets offer very poor user experience.
Similar to tag clouds users find them arbitrary, with out focus and often
overwhelming. Labels in facets which are grouped in meaningful ways relevant
to the user increase engagement, perceived relevance and user satisfaction.

Solution
A thesaurus of term relationships could be used to group facet labels

Implementation
(er completely out of my depth at this point)
Thesaurus relationships defined in a simple text file
term, bt=>term,term nt=> term, term rt=>term, term, pt=>term
if a search specifies a facet to be returned the field terms are identified
by reading the thesaurus into groups, broader terms, narrower terms, related
terms etc
These groups are returned as part of the response for the UI to display
faceted labels as broader, narrower, related terms etc

Use case 2: Increase synonym search precision

Motivation
Synonym rings do not allow differences between synonyms to be identified.
Rarely are synonyms exactly equivalent, and this leads to a decrease in
search precision.

Solution
Boost queries based on search term thesaurus relationships

Implementation
(again completely  out of depth here)
Allow terms in the index to be identified as bt , nt, .. terms of the search
term. Allow query parser to boost terms differentially based on these
thesaurus relationships



As for the X and Y stuff I'm not sure; like I say, it's quite early in the
morning for me. I'm sure there may well be a different way of achieving the
above (but note it is more than a hierarchy). However, the librarians have
been doing this for 50 years now.

Again though, just to repeat, this is hardly a killer for us. We've looked at
Solr for a project; created a prototype; generated tons of questions, had
them answered in the main by the docs, some on this list, and been amazed at
the fantastic results Solr has given us. In fact with a combination of
keepwords and synonyms we have got a pretty nice simple set of facet labels
anyway (my motivation for the original question), so our corpus at the
moment does not really need a thesaurus! :-)

Thanks Lee
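The flat-file format proposed above is easy to prototype outside Solr. A toy sketch (the function name and the exact line format are assumptions, following the `term, bt=>... nt=>... rt=>... pt=>...` layout written above) that splits one line into its relationship groups:

```javascript
// Toy sketch: parse one line of the proposed thesaurus file into
// relationship groups. Format assumed (from the post above):
//   term, bt=>t1,t2 nt=>t3,t4 rt=>t5 pt=>t6
function parseThesaurusLine(line) {
    var entry = { bt: [], nt: [], rt: [], pt: [] };
    // everything before the first relationship marker is the head term
    var m = line.match(/^(.*?),\s*(?=(bt|nt|rt|pt)=>)/);
    entry.term = m ? m[1].trim() : line.trim();
    // pick up each marker and the comma-separated terms that follow it
    var re = /(bt|nt|rt|pt)=>\s*([^=]*?)(?=(?:\s(?:bt|nt|rt|pt)=>)|$)/g;
    var g;
    while ((g = re.exec(line)) !== null) {
        entry[g[1]] = g[2].split(',')
            .map(function (t) { return t.trim(); })
            .filter(function (t) { return t.length > 0; });
    }
    return entry;
}

var skiing = parseThesaurusLine(
    "ski, bt=>mountain sports,sports nt=>telemark,cross country pt=>skiing");
// skiing.term => "ski"; skiing.bt => ["mountain sports", "sports"]
```

From groups like these, a UI could label facet values as broader, narrower or related, as sketched in use case 1.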


On 9 December 2010 23:38, Chris Hostetter  wrote:

>
>
> : a term can have a Prefered Term (PT), many Broader Terms (BT), Many
> Narrower
> : Terms (NT) Related Terms (RT) etc
> ...
> : User supplied Term is say : Ski
> :
> : Prefered term: Skiing
> : Broader terms could be : Ski and Snow Boarding, Mountain Sports, Sports
> : Narrower terms: down hill skiing, telemark, cross country
> : Related terms: boarding, snow boarding, winter holidays
>
> I'm still lost.
>
> You've described a black box with some sample input ("Ski") and some
> corrisponding sample output (PT=..., BT=..., NT=..., RT=) -- but you
> haven't explained what you want to do with tht black box.  Assuming such a
> black box existed in solr what are you expecting/hoping to do with it?
> how would such a black box modify solr's user experience?  what is your
> goal?
>
> Smells like an XY Problem...
> http://people.apache.org/~hossman/#xyproblem
>
> Your question appears to be an "XY Problem" ... that is: you are dealing
> with "X", you are assuming "Y" will help you, and you are asking about "Y"
> without giving more details about the "X" so that we can understand the
> full issue.  Perhaps the best solution doesn't involve "Y" at all?
> See Also: http://www.perlmonks.org/index.pl?node_id=542341
>
>
> -Hoss
>


Re: SOLR Thesaurus

2010-12-10 Thread lee carroll
Hi Peter,

That's way too clever for me :-)
Discovering thesaurus relationships would be fantastic, but it's not clear
what heuristics you would need to use to discover broader, narrower and
related documents etc. Although I might be doing the clustering down, I'm
sceptical about the accuracy.

cheers Lee c

On 10 December 2010 09:38, Peter Sturge  wrote:

> Hi Lee,
>
> Perhaps Solr's clustering component might be helpful for your use case?
> http://wiki.apache.org/solr/ClusteringComponent
>


Re: SOLR Thesaurus

2010-12-10 Thread lee carroll
Two Peters (or rather a stupid english bloke who can't work out how to type
fancy accents :-)

Sorry Péter (took me 10 minutes to work out I could cut and paste) - my reply
was to the clustering post by Peter Sturge. Clustering sounds great, but
being able to define a thesaurus scheme exactly would be good too.



2010/12/10 Péter Király 

> Hi Lee,
>
> according to my vision the user could decide which relationship types
> would he likes to attach to his search, and the application would call
> his attention to other possibilities. So there would be no heuristic
> method applied, because e.g. boarder terms would cause lots of
> misleading results.
>
> Péter
>

Re: search for a number within a range, where range values are mentioned in documents

2010-12-16 Thread lee carroll
During data import can you update a record with min and max fields? These
would be equal in the case of a single non-range value.

I know this is not a Solr solution but a data pre-processing one, but it
would work.

Failing the above, I've seen in the docs a reference to a compound value
field (in the context of points, i.e. point = lat,lon) which would be a
nice way to store your range fields, although I still think you will need
to pre-process your data.

cheers lee
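The pre-processing step is simple enough to sketch. Assuming the attribute arrives as a comma-separated string of ranges and single values (as in record1 above), hypothetical helpers - not Solr API - could expand it into min/max pairs before indexing:

```javascript
// Sketch: expand "2-4,5000-8000,454" into min/max pairs; single
// values become min == max. The membership test for a number N is
// then: some pair with min <= N <= max. (If the pairs are indexed
// as two multiValued fields, the pairing must still be respected,
// so the check is shown client-side here.)
function toMinMax(rangeAttr) {
    return rangeAttr.split(',').map(function (part) {
        var bits = part.trim().split('-');
        var min = parseInt(bits[0], 10);
        var max = bits.length > 1 ? parseInt(bits[1], 10) : min;
        return { min: min, max: max };
    });
}

function matches(pairs, n) {
    return pairs.some(function (p) { return p.min <= n && n <= p.max; });
}

var pairs = toMinMax("2-4,5000-8000,454");
// matches(pairs, 5003) => true; matches(pairs, 454) => true;
// matches(pairs, 9) => false
```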

On 15 December 2010 18:22, Jonathan Rochkind  wrote:

> I'm not sure you're right that it will result in an out-of-memory error if
> the range is too large. I don't think it will, I think it'll be fine as far
> as memory goes, because of how Lucene works. Or do you actually have reason
> to believe it was causing you memory issues?  Or do you just mean memory
> issues in your "transformer", not actually in Solr?
>
> Using Trie fields should also make it fine as far as CPU time goes.  Using
> a trie int field with a non-zero "precision" should likely be helpful in
> this case.
>
> It _will_ increase the on-disk size of your indexes.
>
> I'm not sure if there's a better approach, i can't think of one, but maybe
> someone else knows one.
>
>
> On 12/15/2010 12:56 PM, Arunkumar Ayyavu wrote:
>
>> Hi!
>>
>> I have a typical case where in an attribute (in a DB record) can
>> contain different ranges of numeric values. Let us say the range
>> values in this attribute for "record1" are
>> (2-4,5000-8000,45000-5,454,231,1000). As you can see this
>> attribute can also contain isolated numeric values such as 454, 231
>> and 1000. Now, I want to return "record1" if the user searches for
>> 20001 or 5003 or 231 or 5. Right now, I'm exploding the range
>> values (within a transformer) and indexing "record1" for each of the
>> values within a range. But this could result in out-of-memory error if
>> the range is too large. Could you help me figure out a better way of
>> addressing this type of queries using Solr.
>>
>> Thanks a ton.
>>
>>


Re: Jquery Autocomplete Json formatting ?

2010-12-16 Thread lee carroll
I think this could be down to the same-origin policy applied to Ajax
requests.

You're not allowed to request content from a server other than the one that
served the page :-(

The good news: Solr supports JSONP, which is a neat trick around this. Try
this (pasted from another thread):

queryString = "*:*"
$.getJSON(
"http://[server]:[port]/solr/select/?jsoncallback=?",
{"q": queryString,
"version": "2.2",
"start": "0",
"rows": "10",
"indent": "on",
"json.wrf": "callbackFunctionToDoSomethingWithOurData",
"wt": "json",
"fl": "field1"}
);

and the callback function

function callbackFunctionToDoSomethingWithOurData(solrData) {
   // do stuff with your nice data
}




cheers lee c

On 16 December 2010 23:18, Anurag  wrote:

>
> Installed Firebug
>
> Now getting the following error
> 4139 matches.call( document.documentElement, "[test!='']:sizzle" );
>
> Though my solr server is running on port8983, I am not using any server to
> run this jquery, its just an html file in my home folder that i am opening
> in my firefox browser.
>
>
>
> -
> Kumar Anurag
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Jquery-Autocomplete-Json-formatting-tp2101346p2101595.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: synonyms database

2010-12-23 Thread lee carroll
Hi ramzesua,
Synonym lists will often be application specific and will of course be
language specific. Given this I don't think you can talk about a generic
solr synonym list, just won't be very helpful in lots of cases.

What are you hoping to achieve with your synonyms for your app?




On 23 December 2010 11:50, ramzesua  wrote:

>
> Hi all. Where can I get synonyms database for Solr?
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/synonyms-database-tp2136076p2136076.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: error in html???

2010-12-24 Thread lee carroll
Hi Satya,

This is not a Solr issue. In your client which makes the JSON request you
need to have some error checking so you catch the error.

Occasionally people have Apache set up to return a 200 OK HTTP response with
a custom page on HTTP errors (often for spurious security considerations),
but this breaks REST-like services such as the one Solr implements and IMO
should not be done.

Take a look at the response coming back from Solr and make sure you are
getting the correct HTTP response header (500 etc.) when your query errors.
If you are, great stuff: you can then check your JSON invocation
documentation and catch and deal with these HTTP errors in the client. If
you're getting a 200 response, check your Apache config.

lee c
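The client-side check is only a few lines. A sketch (a made-up helper, no framework assumed) of treating anything other than a 200 as an error before attempting to parse the body as JSON:

```javascript
// Sketch: only parse the body when the HTTP status says success;
// otherwise surface the error instead of choking on Solr's HTML
// error page.
function handleSolrResponse(status, body) {
    if (status !== 200) {
        throw new Error("Solr returned HTTP " + status);
    }
    return JSON.parse(body);
}

var ok = handleSolrResponse(200, '{"response":{"numFound":1}}');
// ok.response.numFound => 1

var failed = null;
try {
    handleSolrResponse(400, "<html>error</html>");
} catch (e) {
    failed = e.message;  // "Solr returned HTTP 400"
}
```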

On 24 December 2010 05:18, satya swaroop  wrote:

> Hi Erick,
>   Every result comes in xml format. But when you get any errors
> like http 500 or http 400 like wise we will get in html format. My query is
> cant we make that html file into json or vice versa..
>
> Regards,
> satya
>


difficult sort

2011-06-17 Thread lee carroll
Is this possible in 1.4.1

Return a result set sorted by a field but within categorical groups,
limited to 1 record per group.
Something like:
group1
xxx (bottom of sorted field within group)
group2
xxx (bottom of sorted field within group)
etc

is the only approach to issue multiple queries and collate in the
front end app ?

cheers lee c


Re: difficult sort

2011-06-17 Thread lee carroll
Thanks for the reply Pravesh

We can't go to trunk or apply patch to production so the field
collapsing goodness is out of reach for now.

Is multiple queries the only way to go for this ?
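Short of the collapsing patch, the usual options are one query per category (rows=1, sorted) collated in the app, or one over-fetched query collapsed client-side. A sketch of the client-side collapse (a made-up helper over already-fetched docs; field names are illustrative, not from the real schema):

```javascript
// Sketch: keep only the first doc per category from a result list
// that is already sorted by the sort field. Field names are
// illustrative only.
function topOnePerGroup(docs, groupField) {
    var seen = {};
    return docs.filter(function (d) {
        var key = d[groupField];
        if (seen[key]) { return false; }
        seen[key] = true;
        return true;
    });
}

var docs = [   // pre-sorted by the sort field
    { cat: "group1", name: "a" },
    { cat: "group2", name: "b" },
    { cat: "group1", name: "c" }   // dropped: group1 already seen
];
var collapsed = topOnePerGroup(docs, "cat");
// collapsed => [{cat:"group1",name:"a"}, {cat:"group2",name:"b"}]
```

The over-fetch approach can miss groups that fall outside the fetched window, which is why the per-category query is the safer of the two.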

On 17 June 2011 11:23, pravesh  wrote:
> I'm not sure, but have looked at Collapsing feature in SOLR yet? You may have
> to apply patch for 1.4.1 version, if this is what u want?
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/difficult-sort-tp3075563p3075661.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Showing facet of first N docs

2011-06-18 Thread lee carroll
Hi Tommaso

I don't think you can achieve what you want using vanilla Solr.
Facet counts are computed over the whole matching result set, not over the
top n matching results.

However, what is your use case? Assuming it's faceted navigation, showing
facets for only the top n results could be confusing to your users. The
next incremental filter applied by the user would change their "relevancy
focus" and produce another set of top-n facet counts over a document set
unrelated to the last result set. This could be a very bad user experience,
producing fluctuating facet counts (i.e. a filter narrowing the search
could produce an increase in a facet term count - very odd), and the result
set could change strangely, with docs floating in and out of the result
list.

Relevancy seems to be the answer here: if your docs are scored correctly
then counting all docs in the result set for the facet counts is correct.
Do you need to improve relevancy?




On 18 June 2011 08:23, Dmitry Kan  wrote:
> Do you mean you would like to boost the facets that contain the most of the
> lemmas?
> What is the user query in this case and if possible, what is the use case
> (may be some other solution exists for what you are trying to achieve)?
>
> On Thu, Jun 16, 2011 at 5:23 PM, Tommaso Teofili
> wrote:
>
>> Thanks Dmitry, but maybe I didn't explain correctly as I am not sure
>> facet.offset is the right solution, I'd like not to page but to filter
>> facets.
>> I'll try to explain better with an example.
>> Imagine I make a query and first 2 docs in results have both 'xyz' and
>> 'abc'
>> as values for field 'lemmas' while also other docs in the results have
>> 'xyz'
>> or 'abc' as values of field 'lemmas' then I would like to show facets
>> "coming from" only the first 2 docs in the results thus having :
>> 
>>  2
>>  2
>> 
>> You can imagine this like a 'give me only facets related to the most
>> relevant docs in the results' functionality.
>> Any idea on how to do that?
>> Tommaso
>>
>>
>> 2011/6/16 Dmitry Kan 
>>
>> > http://wiki.apache.org/solr/SimpleFacetParameters
>> > facet.offset
>> >
>> > This param indicates an offset into the list of constraints to allow
>> > paging.
>> >
>> > The default value is 0.
>> >
>> > This parameter can be specified on a per field basis.
>> >
>> >
>> > Dmitry
>> >
>> >
>> > On Thu, Jun 16, 2011 at 1:39 PM, Tommaso Teofili
>> > wrote:
>> >
>> > > Hi all,
>> > > Do you know if it is possible to show the facets for a particular field
>> > > related only to the first N docs of the total number of results?
>> > > It seems facet.limit doesn't help with it as it defines a window in the
>> > > facet constraints returned.
>> > > Thanks in advance,
>> > > Tommaso
>> > >
>> >
>> >
>> >
>> > --
>> > Regards,
>> >
>> > Dmitry Kan
>> >
>>
>
>
>
> --
> Regards,
>
> Dmitry Kan
>


Re: Multiple indexes

2011-06-19 Thread lee carroll
Your data is being used to build an inverted index rather than being
stored as a set of records, so de-normalising is fine in most cases. What
is your use case that requires a normalised set of indices?

2011/6/18 François Schiettecatte :
> You would need to run two independent searches and then 'join' the results.
>
> It is best not to apply a 'sql' mindset to SOLR when it comes to 
> (de)normalization, whereas you strive for normalization in sql, that is 
> usually counter-productive in SOLR. For example, I am working on a project 
> with 30+ normalized tables, but only 4 cores.
>
> Perhaps describing what you are trying to achieve would give us greater 
> insight and thus be able to make more concrete recommendation?
>
> Cheers
>
> François
>
> On Jun 18, 2011, at 2:36 PM, shacky wrote:
>
>> Il 18 giugno 2011 20:27, François Schiettecatte
>>  ha scritto:
>>> Sure.
>>
>> So I can have some searches similar to JOIN on MySQL?
>> The problem is that I need at least two tables in which search data..
>
>


Re: Why are not query keywords treated as a set?

2011-06-19 Thread lee carroll
do you mean a phrase query? "past past"
can you give some more detail?

On 18 June 2011 13:02, Gabriele Kahlout  wrote:
> q=past past
>
> 1.0 = (MATCH) sum of:
> *  0.5 = (MATCH) fieldWeight(content:past in 0), product of:*
>   1.0 = tf(termFreq(content:past)=1)
>   1.0 = idf(docFreq=1, maxDocs=2)
>   0.5 = fieldNorm(field=content, doc=0)
> *  0.5 = (MATCH) fieldWeight(content:past in 0), product of:*
>   1.0 = tf(termFreq(content:past)=1)
>   1.0 = idf(docFreq=1, maxDocs=2)
>   0.5 = fieldNorm(field=content, doc=0)
>
> Is there how I can treat the query keywords as a set?
>
> --
> Regards,
> K. Gabriele
>
> --- unchanged since 20/9/10 ---
> P.S. If the subject contains "[LON]" or the addressee acknowledges the
> receipt within 48 hours then I don't resend the email.
> subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
> < Now + 48h) ⇒ ¬resend(I, this).
>
> If an email is sent by a sender that is not a trusted contact or the email
> does not contain a valid code then the email is not received. A valid code
> starts with a hyphen and ends with "X".
> ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
> L(-[a-z]+[0-9]X)).
>


Re: Why are not query keywords treated as a set?

2011-06-19 Thread lee carroll
This might help in your analysis chain:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.RemoveDuplicatesTokenFilterFactory
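One caveat: as I understand it, that filter only removes duplicate tokens at the same position, so depending on the analysis chain it may not collapse "past past". A belt-and-braces option is to de-duplicate the query terms in the client before sending them - a sketch (the helper name is made up):

```javascript
// Sketch: collapse repeated whitespace-separated query terms into a
// set before sending the query to Solr, preserving first-seen order.
function dedupeQueryTerms(q) {
    var seen = {};
    return q.split(/\s+/).filter(function (t) {
        if (t.length === 0 || seen[t]) { return false; }
        seen[t] = true;
        return true;
    }).join(' ');
}

var q = dedupeQueryTerms("past past");
// q => "past"
```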



On 20 June 2011 04:21, Gabriele Kahlout  wrote:
> past past
> *past past*
> *content:past content:past*
>
> I was expecting the query to get parsed into content:past only and not
> content:past content:past.
>
>
>
> --
> Regards,
> K. Gabriele
>
> --- unchanged since 20/9/10 ---
> P.S. If the subject contains "[LON]" or the addressee acknowledges the
> receipt within 48 hours then I don't resend the email.
> subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
> < Now + 48h) ⇒ ¬resend(I, this).
>
> If an email is sent by a sender that is not a trusted contact or the email
> does not contain a valid code then the email is not received. A valid code
> starts with a hyphen and ends with "X".
> ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
> L(-[a-z]+[0-9]X)).
>


Re: Parse solr json object

2011-06-22 Thread lee carroll
try this mail list
http://docs.jquery.com/Discussion
or this doc
http://api.jquery.com/jQuery.each/
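The jQuery docs above cover the client-side loop; the shape of the object you iterate is the same in any language. Here is a hedged sketch in Python of walking a Solr-style highlighting section (the doc id and field name below are invented for illustration, not taken from Romi's schema):

```python
import json

# A minimal Solr-style JSON response; "doc1" and "description"
# are made-up names for illustration.
raw = '''
{
  "highlighting": {
    "doc1": {
      "description": ["These <em>elegant</em> and fluid earrings ..."]
    }
  }
}
'''

response = json.loads(raw)
snippets = []
# Same traversal jQuery.each would do: doc id -> field -> fragment list.
for doc_id, fields in response["highlighting"].items():
    for field, fragments in fields.items():
        for frag in fragments:
            snippets.append((doc_id, field, frag))
print(snippets)
```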


On 21 June 2011 07:32, Romi  wrote:
> Hi, for enabling highlighting i want to parse json object. for readilibility
> i included xml format of that json object. please tell me how should i parse
> this object using
> $.each("", function(i,item){
>
> so that i could get highlighted result.
>
>
> 
> -
> 
> -
> 
> -
> 
> These elegant and fluid earrings have six round prong-set and
> twenty-six faceted briolette
> 
> 
> 
> -
> 
> -
> 
> -
> 
> These elegant and fluid earrings have six round prong-set and
> twenty-six faceted briolette
> 
> 
> 
>
>
> -
> Thanks & Regards
> Romi
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Parse-solr-json-object-tp3089470p3089470.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: MultiValued facet behavior question

2011-06-22 Thread lee carroll
Can your front end app normalize the q parameter? Either with a drop-down
or a type-ahead derived from the values in the specialties
field. That way q will match value(s) in your facet results. I'm not
sure what you are trying to achieve though, so maybe I'm off the mark.



On 22 June 2011 04:37, Bill Bell  wrote:
> Doing it with q=specialities:Cardiologist or
> q=Cardiologist&defType=dismax&qf=specialties
> does not matter, the issue is how I see facets. I want the facets to only
> show the one match,
> and not all the multiValued fields in specialties that match...
>
> Example,
>
> Name|specialties
> Bell|Cardiologist
> Smith|Cardiologist,Family Doctor
> Adams,Cardiologist,Family Doctor,Internist
>
> When I facet.field=specialties I get:
>
> Cardiologist: 3
> Internist: 1
> Family Doctor: 1
>
>
> I only want it to return:
>
> Cardiologist: 3
>
> Because this matches exactly... Facet on the field that matches and only
> return the number for that.
>
> It can get more complicated. Here is another example:
>
> q=cardiology&defType=dismax&qf=specialties
>
>
> (Cardiology and cardiologist are stems)...
>
> But I don't really know which value in Cardiologist match perfectly.
>
> Again, I only want it to return:
>
> Cardiologist: 3
>
> If I searched on q=internist&defType=dismax&qf=specialties, I want the
> result to be:
>
>
> Internist: 1
>
>
> Does this all make sense?
>
>
>
>
>
>
>
> On 6/21/11 8:23 PM, "Darren Govoni"  wrote:
>
>>So are you saying that for all results for "cardiologist",
>>you don't want facets not matching "Cardiologist" to be
>>returned as facets?
>>
>>what happens when you make q=specialities:Cardiologist?
>>instead of just q=Cardiologist?
>>
>>Seems that if you make the query on the field, then all
>>your results will necessarily qualify and you can discard
>>any additional facets you don't want (e.g. that don't
>>match the initial query term).
>>
>>Maybe you can write what you see now, with what you
>>want to help clarify.
>>
>>On 06/21/2011 09:47 PM, Bill Bell wrote:
>>> I have a field: specialties that is multiValued.
>>>
>>> It indicates the doctor's specialties: cardiologist, internist, etc.
>>>
>>> When someone does a search: "Cardiologist", I use
>>>
>>>q=cardiologist&defType=dismax&qf=specialties&facet=true&facet.field=speci
>>>alt
>>> ies
>>>
>>> What I want to come out in the facet is the Cardiologist (since it
>>>matches
>>> exactly) and the number that matches: 700.
>>> I don't want to see the other values that are not Cardiologist.
>>>
>>> Now I see:
>>>
>>> Cardiologist: 700
>>> Internist: 45
>>> Family Doctor: 20
>>>
>>> This means that several Cardiologist's are also internists and family
>>> doctors. When it matches exactly, I don't want to see Internists, Family
>>> Doctors. How do I send a query to Solr with a condition.
>>> Facet.query=specialties:Cardiologist&facet.field=specialties
>>>
>>> Then if the query returns something use it, otherwise use the field one?
>>>
>>> Other ideas?
>>>
>>>
>>>
>>>
>>
>
>
>


Re: MultiValued facet behavior question

2011-06-22 Thread lee carroll
Oh sorry, forgot to also say:
often facet fields are not stemmed or heavily analysed. The facet
values come straight from the index.


On 22 June 2011 08:21, lee carroll  wrote:
> Can your front end app normalize the q parameter. Either with a drop
> down or a type a head derived from the values in the specialties
> field. that way q will match value(s) in your facet results. I'm not
> sure what you are trying to achieve though so maybe i'm off the mark.
>
>
>
> On 22 June 2011 04:37, Bill Bell  wrote:
>> Doing it with q=specialities:Cardiologist or
>> q=Cardiologist&defType=dismax&qf=specialties
>> does not matter, the issue is how I see facets. I want the facets to only
>> show the one match,
>> and not all the multiValued fields in specialties that match...
>>
>> Example,
>>
>> Name|specialties
>> Bell|Cardiologist
>> Smith|Cardiologist,Family Doctor
>> Adams,Cardiologist,Family Doctor,Internist
>>
>> When I facet.field=specialties I get:
>>
>> Cardiologist: 3
>> Internist: 1
>> Family Doctor: 1
>>
>>
>> I only want it to return:
>>
>> Cardiologist: 3
>>
>> Because this matches exactly... Facet on the field that matches and only
>> return the number for that.
>>
>> It can get more complicated. Here is another example:
>>
>> q=cardiology&defType=dismax&qf=specialties
>>
>>
>> (Cardiology and cardiologist are stems)...
>>
>> But I don't really know which value in Cardiologist match perfectly.
>>
>> Again, I only want it to return:
>>
>> Cardiologist: 3
>>
>> If I searched on q=internist&defType=dismax&qf=specialties, I want the
>> result to be:
>>
>>
>> Internist: 1
>>
>>
>> Does this all make sense?
>>
>>
>>
>>
>>
>>
>>
>> On 6/21/11 8:23 PM, "Darren Govoni"  wrote:
>>
>>>So are you saying that for all results for "cardiologist",
>>>you don't want facets not matching "Cardiologist" to be
>>>returned as facets?
>>>
>>>what happens when you make q=specialities:Cardiologist?
>>>instead of just q=Cardiologist?
>>>
>>>Seems that if you make the query on the field, then all
>>>your results will necessarily qualify and you can discard
>>>any additional facets you don't want (e.g. that don't
>>>match the initial query term).
>>>
>>>Maybe you can write what you see now, with what you
>>>want to help clarify.
>>>
>>>On 06/21/2011 09:47 PM, Bill Bell wrote:
>>>> I have a field: specialties that is multiValued.
>>>>
>>>> It indicates the doctor's specialties: cardiologist, internist, etc.
>>>>
>>>> When someone does a search: "Cardiologist", I use
>>>>
>>>>q=cardiologist&defType=dismax&qf=specialties&facet=true&facet.field=speci
>>>>alt
>>>> ies
>>>>
>>>> What I want to come out in the facet is the Cardiologist (since it
>>>>matches
>>>> exactly) and the number that matches: 700.
>>>> I don't want to see the other values that are not Cardiologist.
>>>>
>>>> Now I see:
>>>>
>>>> Cardiologist: 700
>>>> Internist: 45
>>>> Family Doctor: 20
>>>>
>>>> This means that several Cardiologist's are also internists and family
>>>> doctors. When it matches exactly, I don't want to see Internists, Family
>>>> Doctors. How do I send a query to Solr with a condition.
>>>> Facet.query=specialties:Cardiologist&facet.field=specialties
>>>>
>>>> Then if the query returns something use it, otherwise use the field one?
>>>>
>>>> Other ideas?
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>


Re: MultiValued facet behavior question

2011-06-22 Thread lee carroll
Hi Bill, can you explain a little bit more about why you need this?
Knowing the motivation
might suggest a different solution, not just one involving faceting.



On 22 June 2011 08:49, Bill Bell  wrote:
> You can type q=cardiology and match on cardiologist. If stemming did not
> work you can just add a synonym:
>
> cardiology,cardiologist
>
> But that is not the issue. The issue is around multiValue fields and
> facets. You would expect a user
> Who is searching on the multiValued field to match on some values in
> there. For example,
> they type "Cardiologist" and it matches on the value "Cardiologist". So it
> matches "in the multiValue field".
> So that part works. Then when I output the facet, I need a different
> behavior than the default. I need
> The facet to only output the value that matches (scored) - NOT ALL VALUES
> in the multiValued field.
>
> I think it makes sense?
>
>
> On 6/22/11 1:42 AM, "Michael Kuhlmann"  wrote:
>
>>Am 22.06.2011 05:37, schrieb Bill Bell:
>>> It can get more complicated. Here is another example:
>>>
>>> q=cardiology&defType=dismax&qf=specialties
>>>
>>>
>>> (Cardiology and cardiologist are stems)...
>>>
>>> But I don't really know which value in Cardiologist match perfectly.
>>>
>>> Again, I only want it to return:
>>>
>>> Cardiologist: 3
>>
>>You would never get "Cardiologist: 3" as the facet result, because if
>>"Cardiologist" would be in your index, it's impossible to find it when
>>searching for "cardiology" (except when you manage to write some strange
>>tokenizer that translates "cardiology" to "Cardiologist" on query time,
>>including the upper case letter).
>>
>>Facets are always taken from the index, so they usually match exactly or
>>never when querying for it.
>>
>>-Kuli
>
>
>


Re: MultiValued facet behavior question

2011-06-22 Thread lee carroll
Hi Bill,

>So that part works. Then when I output the facet, I need a different
>behavior than the default. I need
>The facet to only output the value that matches (scored) - NOT ALL VALUES
>in the multiValued field.

>I think it makes sense?

Why do you need this? If your use case is faceted navigation then not showing
all the facet terms which match your query would be misleading to your users.
The fact is your data indicates Ben the cardiologist is also a GP etc.
Is it not valid for
your users to be able to further filter on cardiologists who are also
specialists in x other disciplines ? If the specialisms are mutually
exclusive then your data will reflect this.

The fact is x number of cardiologists match and x number of GP's match etc

I may be missing the point here as you have not said why you need to do this ?

cheers lee c


On 22 June 2011 09:34, Michael Kuhlmann  wrote:
> Am 22.06.2011 09:49, schrieb Bill Bell:
>> You can type q=cardiology and match on cardiologist. If stemming did not
>> work you can just add a synonym:
>>
>> cardiology,cardiologist
>
> Okay, synonyms are the only way I can think of a realistic match.
>
> Stemming won't work on a facet field; you wouldn't get "Cardiologist: 3"
> as the result but "cardiolog: 3" or something like that instead.
>
> Normally, you use declare facet field explicitly for facetting, and not
> for searching, exactly because stemming and tokenizing on facet fields
> don't make sense.
>
> And the short answer is: No, that's not possible.
>
> -Kuli
>


Re: Understanding query explain information

2011-06-22 Thread lee carroll
Hi are you using synonyms ?



On 22 June 2011 10:30, Alexander Ramos Jardim
 wrote:
> Hi guys,
>
> I am getting some doubts about how to correctly understand the debugQuery
> output. I have a field named itemName in my index. This is a text field,
> just that. When I query a simple ?q=itemName:iPad , I end up with the
> following query result.
>
> Simply trying to understand why these strings generated such scores, and as
> far as I can understand, the only difference between them is the field
> norms, as all the other results maintain themselves.
>
> Now, how do I get these field norm values? Field Norm is the result of this
> formula right?
>
> *1/square root of (terms)*,* where terms is the number of terms in my field
>> after it is indexed*
>>
>
> Well, if this is true, the field norm for my first document should be 0.5
> (1/sqrt(4)) as  "Livro - IPAD - O Guia do Profissional" ends up with the
> terms "livro|ipad|guia|profissional" as tokens.
>
> What I am forgetting to take into account?
>
> 
> 
>
> 
>  0
>  3
>  
>  on
>  0
>
>  10
>  
>        on
>        on
>  
>  itemName,score
>  2.2
>
>  itemName:ipad
>  
> 
> 
>  
>  3.6808658
>  Livro - IPAD - O Guia do Profissional
>  
>
>  
>  3.1550279
>  Leitor de Cartão para Ipad - Mobimax
>  
>  
>  3.1550279
>  Sleeve para iPad
>
>  
>  
>  3.1550279
>  Sleeve de Neoprene para iPad
>  
>  
>  3.1550279
>
>  Carregador de parede para iPad
>  
>  
>  2.6291897
>  Case Envelope para iPad - Black - Built NY
>  
>  
>
>  2.6291897
>  Case Protetora p/ IPad de Silicone Duo - Browm
> - Iskin
>  
>  
>  2.6291897
>  Case Protetora p/ IPad de Silicone Duo - Clear
> - Iskin
>  
>
>  
>  2.6291897
>  Case p/ iPad Sleeve - Black - Built NY
>  
>  
>  2.6291897
>  Bolsa de Proteção p/ iPad Preta - Geonav
>
>  
> 
> 
>  itemName:ipad
>  itemName:ipad
>  itemName:ipad
>  itemName:ipad
>  
>
>  
> 3.6808658 = (MATCH) fieldWeight(itemName:ipad in 102507), product of:
>  1.0 = tf(termFreq(itemName:ipad)=1)
>  8.413407 = idf(docFreq=165, maxDocs=275239)
>  0.4375 = fieldNorm(field=itemName, doc=102507)
> 
>  
> 3.1550279 = (MATCH) fieldWeight(itemName:ipad in 226401), product of:
>  1.0 = tf(termFreq(itemName:ipad)=1)
>  8.413407 = idf(docFreq=165, maxDocs=275239)
>  0.375 = fieldNorm(field=itemName, doc=226401)
> 
>  
> 3.1550279 = (MATCH) fieldWeight(itemName:ipad in 226409), product of:
>  1.0 = tf(termFreq(itemName:ipad)=1)
>  8.413407 = idf(docFreq=165, maxDocs=275239)
>  0.375 = fieldNorm(field=itemName, doc=226409)
> 
>  
> 3.1550279 = (MATCH) fieldWeight(itemName:ipad in 226447), product of:
>  1.0 = tf(termFreq(itemName:ipad)=1)
>  8.413407 = idf(docFreq=165, maxDocs=275239)
>  0.375 = fieldNorm(field=itemName, doc=226447)
> 
>  
>
> 3.1550279 = (MATCH) fieldWeight(itemName:ipad in 226583), product of:
>  1.0 = tf(termFreq(itemName:ipad)=1)
>  8.413407 = idf(docFreq=165, maxDocs=275239)
>  0.375 = fieldNorm(field=itemName, doc=226583)
> 
>  
> 2.6291897 = (MATCH) fieldWeight(itemName:ipad in 223178), product of:
>  1.0 = tf(termFreq(itemName:ipad)=1)
>  8.413407 = idf(docFreq=165, maxDocs=275239)
>  0.3125 = fieldNorm(field=itemName, doc=223178)
> 
>  
> 2.6291897 = (MATCH) fieldWeight(itemName:ipad in 223196), product of:
>  1.0 = tf(termFreq(itemName:ipad)=1)
>  8.413407 = idf(docFreq=165, maxDocs=275239)
>  0.3125 = fieldNorm(field=itemName, doc=223196)
> 
>  
> 2.6291897 = (MATCH) fieldWeight(itemName:ipad in 223831), product of:
>  1.0 = tf(termFreq(itemName:ipad)=1)
>  8.413407 = idf(docFreq=165, maxDocs=275239)
>  0.3125 = fieldNorm(field=itemName, doc=223831)
> 
>  
> 2.6291897 = (MATCH) fieldWeight(itemName:ipad in 223856), product of:
>  1.0 = tf(termFreq(itemName:ipad)=1)
>  8.413407 = idf(docFreq=165, maxDocs=275239)
>  0.3125 = fieldNorm(field=itemName, doc=223856)
>
> 
>  
> 2.6291897 = (MATCH) fieldWeight(itemName:ipad in 223908), product of:
>  1.0 = tf(termFreq(itemName:ipad)=1)
>  8.413407 = idf(docFreq=165, maxDocs=275239)
>  0.3125 = fieldNorm(field=itemName, doc=223908)
> 
>  
>  LuceneQParser
>  
>  3.0
>  
>
>        1.0
>        
>         1.0
>        
>        
>         0.0
>        
>
>        
>         0.0
>        
>        
>         0.0
>        
>        
>         0.0
>
>        
>        
>         0.0
>        
>        
>         0.0
>        
>  
>
>  
>        2.0
>        
>         1.0
>        
>        
>         0.0
>
>        
>        
>         0.0
>        
>        
>         0.0
>        
>        
>
>         0.0
>        
>        
>         0.0
>        
>        
>         1.0
>
>        
>  
>  
> 
> 
>
>
>
> --
> Alexander Ramos Jardim
>


Re: MultiValued facet behavior question

2011-06-22 Thread lee carroll
Hi Dennis,

I think maybe I just disagree. You're not showing facet counts for
cardiologists and Family Doctors independently. The Family Doctor
count will be all Family Doctors who are also Cardiologists.

This allows users to further filter Cardiologists who are also family
Doctors. (this could be of use to them ??)

If your front end app implements the filtering as a list of fq=xxx
then that would make for consistent results ?

I don't see how not showing that some cardiologists are also Family
Doctors is a better user experience... But again you might have a very
specific use case?

On 22 June 2011 13:44, Dennis de Boer  wrote:
> Hi Lee,
>
> since I have the same problem, I might as well try to answer this question.
>
> You want this behaviour to make things clear for your users. If they select
> cardiologists, does it make sense to also show family doctors as a
> facetvalue to the user.
> The same thing goes for the facets that are related to family doctors. They
> are returned as well, thus making it even more unclear for the end-user.
>
>
>
> On Wed, Jun 22, 2011 at 2:27 PM, lee carroll
> wrote:
>
>> Hi Bill,
>>
>> >So that part works. Then when I output the facet, I need a different
>> >behavior than the default. I need
>> >The facet to only output the value that matches (scored) - NOT ALL VALUES
>> >in the multiValued field.
>>
>> >I think it makes sense?
>>
>> Why do you need this ? If your use case is faceted navigation then not
>> showing
>> all the facet terms which match your query would be mis-leading to your
>> users.
>> The fact is your data indicates Ben the cardiologist is also a GP etc.
>> Is it not valid for
>> your users to be able to further filter on cardiologists who are also
>> specialists in x other disciplines ? If the specialisms are mutually
>> exclusive then your data will reflect this.
>>
>> The fact is x number of cardiologists match and x number of GP's match etc
>>
>> I may be missing the point here as you have not said why you need to do
>> this ?
>>
>> cheers lee c
>>
>>
>> On 22 June 2011 09:34, Michael Kuhlmann  wrote:
>> > Am 22.06.2011 09:49, schrieb Bill Bell:
>> >> You can type q=cardiology and match on cardiologist. If stemming did not
>> >> work you can just add a synonym:
>> >>
>> >> cardiology,cardiologist
>> >
>> > Okay, synonyms are the only way I can think of a realistic match.
>> >
>> > Stemming won't work on a facet field; you wouldn't get "Cardiologist: 3"
>> > as the result but "cardiolog: 3" or something like that instead.
>> >
>> > Normally, you use declare facet field explicitly for facetting, and not
>> > for searching, exactly because stemming and tokenizing on facet fields
>> > don't make sense.
>> >
>> > And the short answer is: No, that's not possible.
>> >
>> > -Kuli
>> >
>>
>


Re: Complex situation

2011-06-23 Thread lee carroll
Hi Roy,

You have lost the relationship between the season dates and the closing
times due to the de-normalising of your data.

I don't have a good answer to this and I guess this is a "classic" question.

One approach is maybe to do the following:

make sure you have field collapsing available. trunk or a patch maybe

index not at the shop entity level but shop-opening level so your records are

shop  fromDate     toDate       closingTime
1     12/12/2010   12/12/2011   18:00
1     12/12/2011   12/12/2012   20:00

Field collapse on shop id. Note this impacts on your number of records
and could be a lot of change for your app :-)

I'm also not sure if field collapsing will have the desired effect on
the facet counts and will behave as expected. Anyone with better
knowledge? Is there a better way?
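The de-normalised layout suggested above can be sanity-checked with a short sketch (field names are illustrative, dates simplified): because each record keeps its own season-to-closing-time link, filtering by today's date yields only the closing time actually in force.

```python
from datetime import date

# One record per shop/season, as suggested above (names are illustrative).
records = [
    {"shop": 1, "from": date(2010, 12, 12), "to": date(2011, 12, 12), "closes": "18:00"},
    {"shop": 1, "from": date(2011, 12, 12), "to": date(2012, 12, 12), "closes": "20:00"},
]

def closing_times_on(day, recs):
    """Closing times in force on a given day; the per-record date range
    plays the role of the facet.query date filter."""
    return [r["closes"] for r in recs if r["from"] <= day <= r["to"]]

print(closing_times_on(date(2011, 6, 23), records))   # ['18:00']
```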

Anyway, good luck with it Roy


On 23 June 2011 08:29, roySolr  wrote:
> Hello,
>
> I have change my db dates to the correct format like 2011-01-11T00:00:00Z.
>
> Now i have the following data:
>
>
> Manchester Store        2011-01-01T00:00:00Z
> 2011-31-03T00:00:00Z     18:00
> Manchester Store        2011-01-04T00:00:00Z
> 2011-31-12T00:00:00Z     20:00
>
> The "Manchester Store" has 2 seasons with different closing times(18:00 and
> 20:00). Now i have
> 4 fields in SOLR.
>
> Companyname             Manchester Store
> startdate(multiV)          2011-01-01T00:00:00Z, 2011-01-04T00:00:00Z
> enddate(multiV)           2011-31-03T00:00:00Z, 2011-31-12T00:00:00Z
> closingTime(multiV)      18:00, 20:00
>
> I want some facets like this:
>
> Open today(2011-23-06):
> 20:00(1)
>
> The facet query needs to look what's the current date and needs to use that
> closing time. My facet.query look like this:
>
> facet.query=startdate:[* TO NOW] AND enddate:[NOW TO *] AND
> closingTime:"18:00"
>
> This returns 1 count like this: 18:00(1)
>
> When i use this facet.query it returns also 1 result:
>
> facet.query=startdate:[* TO NOW] AND enddate:[NOW TO *] AND
> closingTime:"20:00"
>
> This result is not correct because NOW(2011-23-06) it's not open till 20:00.
> It looks like there is no link between the season and the closingTime. Can
> somebody helps me?? The fields in SOLR are not correct?
>
> Thanks Roy
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Complex-situation-tp3071936p3098875.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Understanding query explain information

2011-06-24 Thread lee carroll
Is it possible that synonyms are being added (synonym expansion), or at
least changing
the field length? I've seen this before. Check exactly what terms
have been added.
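A worked check of why one extra injected term changes the score: Lucene computes 1/sqrt(terms) and then squeezes it into a single byte, which is lossy. The sketch below mimics that encoding (a 3-mantissa-bit small float, copied as an assumption from Lucene's SmallFloat source, edge cases omitted). With 4 terms the norm survives as 0.5, but a fifth term, e.g. one injected by synonym expansion, gives 1/sqrt(5) ≈ 0.447, which decodes to the 0.4375 seen in the explain output in this thread.

```python
import math
import struct

def float_bits(f):
    return struct.unpack(">I", struct.pack(">f", f))[0]

def norm_to_byte(f, mantissa_bits=3, zero_exp=15):
    # Mimics Lucene SmallFloat.floatToByte(f, 3, 15) for positive norms
    # (assumption: behaviour copied from the 3.x source; zero/overflow
    # handling omitted).
    fzero = (63 - zero_exp) << mantissa_bits
    small = float_bits(f) >> (24 - mantissa_bits)
    return small - fzero

def byte_to_norm(b, mantissa_bits=3, zero_exp=15):
    bits = (b & 0xFF) << (24 - mantissa_bits)
    bits += (63 - zero_exp) << 24
    return struct.unpack(">f", struct.pack(">I", bits))[0]

for terms in (4, 5):
    norm = 1.0 / math.sqrt(terms)
    stored = byte_to_norm(norm_to_byte(norm))
    print(terms, round(norm, 4), stored)
# 4 terms -> 0.5 round-trips as 0.5; 5 terms -> 0.4472 is stored as 0.4375
```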


On 23 June 2011 22:50, Alexander Ramos Jardim
 wrote:
> Yes, I am using synonims in index time.
>
> 2011/6/22 lee carroll 
>
>> Hi are you using synonyms ?
>>
>>
>>
>> On 22 June 2011 10:30, Alexander Ramos Jardim
>>  wrote:
>> > Hi guys,
>> >
>> > I am getting some doubts about how to correctly understand the debugQuery
>> > output. I have a field named itemName in my index. This is a text field,
>> > just that. When I quqery a simple ?q=itemName:iPad , I end up with the
>> > following query result.
>> >
>> > Simply trying to understand why these strings generated such scores, and
>> as
>> > far as I can understand, the only difference between them is the field
>> > norms, as all the other results maintain themselves.
>> >
>> > Now, how do I get these field norm values? Field Norm is the result of
>> this
>> > formula right?
>> >
>> > *1/square root of (terms)*,* where terms is the number of terms in my
>> field
>> >> after it is indexed*
>> >>
>> >
>> > Well, if this is true, the field norm for my first document should be 0.5
>> > (1/sqrt(4)) as  "Livro - IPAD - O Guia do Profissional" ends up with the
>> > terms "livro|ipad|guia|profissional" as tokens.
>> >
>> > What I am forgetting to take into account?
>> >
>> > 
>> > 
>> >
>> > 
>> >  0
>> >  3
>> >  
>> >  on
>> >  0
>> >
>> >  10
>> >  
>> >        on
>> >        on
>> >  
>> >  itemName,score
>> >  2.2
>> >
>> >  itemName:ipad
>> >  
>> > 
>> > 
>> >  
>> >  3.6808658
>> >  Livro - IPAD - O Guia do Profissional
>> >  
>> >
>> >  
>> >  3.1550279
>> >  Leitor de Cartão para Ipad - Mobimax
>> >  
>> >  
>> >  3.1550279
>> >  Sleeve para iPad
>> >
>> >  
>> >  
>> >  3.1550279
>> >  Sleeve de Neoprene para iPad
>> >  
>> >  
>> >  3.1550279
>> >
>> >  Carregador de parede para iPad
>> >  
>> >  
>> >  2.6291897
>> >  Case Envelope para iPad - Black - Built NY
>> >  
>> >  
>> >
>> >  2.6291897
>> >  Case Protetora p/ IPad de Silicone Duo - Browm
>> > - Iskin
>> >  
>> >  
>> >  2.6291897
>> >  Case Protetora p/ IPad de Silicone Duo - Clear
>> > - Iskin
>> >  
>> >
>> >  
>> >  2.6291897
>> >  Case p/ iPad Sleeve - Black - Built NY
>> >  
>> >  
>> >  2.6291897
>> >  Bolsa de Proteção p/ iPad Preta - Geonav
>> >
>> >  
>> > 
>> > 
>> >  itemName:ipad
>> >  itemName:ipad
>> >  itemName:ipad
>> >  itemName:ipad
>> >  
>> >
>> >  
>> > 3.6808658 = (MATCH) fieldWeight(itemName:ipad in 102507), product of:
>> >  1.0 = tf(termFreq(itemName:ipad)=1)
>> >  8.413407 = idf(docFreq=165, maxDocs=275239)
>> >  0.4375 = fieldNorm(field=itemName, doc=102507)
>> > 
>> >  
>> > 3.1550279 = (MATCH) fieldWeight(itemName:ipad in 226401), product of:
>> >  1.0 = tf(termFreq(itemName:ipad)=1)
>> >  8.413407 = idf(docFreq=165, maxDocs=275239)
>> >  0.375 = fieldNorm(field=itemName, doc=226401)
>> > 
>> >  
>> > 3.1550279 = (MATCH) fieldWeight(itemName:ipad in 226409), product of:
>> >  1.0 = tf(termFreq(itemName:ipad)=1)
>> >  8.413407 = idf(docFreq=165, maxDocs=275239)
>> >  0.375 = fieldNorm(field=itemName, doc=226409)
>> > 
>> >  
>> > 3.1550279 = (MATCH) fieldWeight(itemName:ipad in 226447), product of:
>> >  1.0 = tf(termFreq(itemName:ipad)=1)
>> >  8.413407 = idf(docFreq=165, maxDocs=275239)
>> >  0.375 = fieldNorm(field=itemName, doc=226447)
>> > 
>> >  
>> >
>> > 3.1550279 = (MATCH) fieldWeight(itemName:ipad in 226583), product of:
>> >  1.0 = tf(termFreq(itemName:ipad)=1)
>> >  8.413407 = idf(docFreq=165, maxDocs=275239)
>> >  0.375 = fieldNorm(field=itemName, doc=226583)
>> > 
>> >  
>

Re: Solr 3.1 indexing error Invalid UTF-8 character 0xffff

2011-06-27 Thread lee carroll
Hi Markus

I've seen a similar issue before (but not with Solr) when processing files as XML.
In our case the problem was due to processing a UTF-16 file with a byte
order mark. This presents itself as
0xFFFF to the XML parser, which is not a valid UTF-8 character (that code
point would be encoded as EF BF BF in UTF-8). This caused the UTF-8
aware parser to choke.

I don't want to get involved in any unicode / utf war as I'm confused
enough as it stands, but
could you check for UTF-16 files before processing?
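A minimal sketch of that pre-flight check: sniff the first bytes for a byte order mark before handing the file to a UTF-8 parser (encoding names are Python's; UTF-32 BOMs and BOM-less files are deliberately ignored here):

```python
def sniff_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le"
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be"
    return None

print(sniff_bom(b"\xff\xfe<\x00?\x00"))       # utf-16-le
print(sniff_bom(b"<?xml version='1.0'?>"))    # None
```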

lee c

On 27 June 2011 14:26, Thomas Fischer  wrote:
> Hello,
>
> Am 27.06.2011 um 12:40 schrieb Markus Jelsma:
>
>> Hi,
>>
>> I came across the indexing error below. It happened in a huge batch update
>> from Nutch with SolrJ 3.1. Since the crawl was huge it is very hard to trace
>> the error back to a specific document. So i try my luck here: anyone seen 
>> this
>> before with SolrJ 3.1? Anything else on the Nutch part i should have taken
>> care off?
>>
>> Thanks!
>>
>>
>> Jun 27, 2011 10:24:28 AM org.apache.solr.core.SolrCore execute
>> INFO: [] webapp=/solr path=/update params={wt=javabin&version=2} status=500 
>> QTime=423
>> Jun 27, 2011 10:24:28 AM org.apache.solr.common.SolrException log
>> SEVERE: java.lang.RuntimeException: [was class 
>> java.io.CharConversionException] Invalid UTF-8 character 0xffff at char 
>> #1142033, byte #1155068)
>>       at 
>> com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
>
> and loads of other rubbish and
>
>>       ... 26 more
>
>
> I see this as a problem of solr error-reporting. This is not only obnoxiously 
> "loud" (white on grey with oversized fonts), but less useful than it should 
> be.
> Instead of telling the user where the error occurred (i.e. while reading 
> which file, which column at which line) it unravels the stack. This is 
> useless if the program just choked on some unexpected input, like a typo in a 
> schema of config file or an invalid character in a file to be indexed.
> I don't know if this is due to the Tomcat, the logging system of solr itself, 
> but it is annoying.
>
> And yes, I've seen something like this before and found the error not by 
> inspecting solr but by opening the suspected files with an appropriate 
> browser (e.g. Firefox) which tells me exactly where something goes wrong.
>
> All the best
> Thomas
>
>


Re: Default schema - 'keywords' not multivalued

2011-06-27 Thread lee carroll
Hi Tod,
A list of keywords would be fine in a non-multivalued field:

keywords : "xxx yyy sss aaa"

A multivalued field would allow you to repeat the field when indexing:

keywords: "xxx"
keywords: "yyy"
keywords: "sss"
etc
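One practical difference between the two layouts, sketched below (the gap of 100 mirrors the positionIncrementGap used in Solr's example schema; treat the exact mechanics as an approximation): a single whitespace-tokenized value puts consecutive keywords at adjacent positions, while a multivalued field inserts a large position gap between values so phrase queries don't match across value boundaries.

```python
POSITION_INCREMENT_GAP = 100  # common Solr default for multivalued text

def positions(values, gap=POSITION_INCREMENT_GAP):
    """Assign token positions roughly the way an analyzer would:
    adjacent within a value, separated by `gap` between values."""
    out, pos = [], 0
    for value in values:
        for token in value.split():
            out.append((token, pos))
            pos += 1
        pos += gap - 1  # jump so the next value's tokens aren't adjacent
    return out

print(positions(["xxx yyy sss aaa"]))   # positions 0..3, all adjacent
print(positions(["xxx", "yyy"]))        # 'yyy' lands at position 100
```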


On 27 June 2011 16:13, Tod  wrote:
> This was a little curious to me and I wondered what the thought process was
> behind it before I decide to change it.
>
>
> Thanks - Tod
>


moving to multicore without changing existing index

2011-06-28 Thread lee carroll
hi
I'm looking at setting up multi-core indices but also have an existing
index. Can I run
this index alongside new indices set up as cores? On a dev machine
I've experimented with
simply adding solr.xml in solr home and listing the new cores in the
cores element, but this breaks the existing
index.

container is tomcat and attempted set up was:

solrHome
 conf (existing running index)
 core1 (new core directory)
 solr.xml (cores element has one entry for core1)

Is this a valid approach ?

thanks lee


Re: Building a facet search filter frontend in XSLT

2011-06-29 Thread lee carroll
Hi Filype,

In the response you should have the list of fq arguments echoed back, something like:

field:facetValue
field:FacetValue

Use these to set your inputs to be selected / checked.
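In other words, compare each facet value against the fq parameters echoed in the response to decide whether its checkbox starts checked; the XSLT test is the same comparison. A hedged sketch of the logic in Python (the "colour"/"size" field names are invented):

```python
def selected_facets(fq_params):
    """Parse echoed fq values like 'field:value' into {field: set(values)}."""
    selected = {}
    for fq in fq_params:
        field, _, value = fq.partition(":")
        selected.setdefault(field, set()).add(value)
    return selected

fq = ["colour:red", "size:large"]
checked = selected_facets(fq)
# A checkbox for (field, value) is rendered checked when:
print("red" in checked.get("colour", set()))    # True
print("blue" in checked.get("colour", set()))   # False
```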



On 29 June 2011 23:54, Filype Pereira  wrote:
> Hi all,
> I am looking for some help in building a front end facet filter using XSLT.
> The code I use is: http://pastebin.com/xVv9La9j
> On the image attached, the checkbox should be selected. (You clicked and
> submited the facet form. The URL changed)
> I can use xsl:if, but there's nothing that I can use on the XML that will
> let me test before outputting the input checkbox.
> Has anyone done any similar thing?
> I haven't seen any examples building a facet search filter frontend in XSLT,
> the example.xsl that comes with solr is pretty basic, are there any other
> examples in XSLT implementing the facet filters around?
> Thanks,
> Filype
>


Re: How do I add a custom field?

2011-07-03 Thread lee carroll
Hi Gabriele,
Did you index any docs with your new field ?

The results will just bring back docs and what fields they have. They won't
bring back "null" fields just because they are in your schema. Lucene
is schema-less.
Solr adds the schema to make it nice to administer and very powerful to use.





On 3 July 2011 11:01, Gabriele Kahlout  wrote:
> Hello,
>
> I want to have an additional  field that appears for every document in
> search results. I understand that I should do this by adding the field to
> the schema.xml, so I add:
>     indexed="false"/>
> Then I restart Solr (so that I loads the new schema.xml) and make a query
> specifying that it should return myField too, but it doesn't. Will it do
> only for newly indexed documents? Am I missing something?
>
> --
> Regards,
> K. Gabriele
>
> --- unchanged since 20/9/10 ---
> P.S. If the subject contains "[LON]" or the addressee acknowledges the
> receipt within 48 hours then I don't resend the email.
> subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
> < Now + 48h) ⇒ ¬resend(I, this).
>
> If an email is sent by a sender that is not a trusted contact or the email
> does not contain a valid code then the email is not received. A valid code
> starts with a hyphen and ends with "X".
> ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
> L(-[a-z]+[0-9]X)).
>


Stored Field

2011-07-14 Thread lee carroll
Hi
Do Stored field values get added to the index for each document field
combination literally or is a pointer used ?
I've been reading http://lucene.apache.org/java/2_4_0/fileformats.pdf
and I think thats the case but not 100% so thought I'd ask.

In logical terms for stored fields do we get this sort of storage:

doc0 field0 > "xxx xx xx xx xx xx xx xx xx xxx"
doc0 field1 > "yyy yy yy yy yy yy yy yy yyy"
doc1 field0 > "xxx xx xx xx xx xx xx xx xx xxx"
doc1 field1 > "yyy yy yy yy yy yy yy yy yyy"

or this:

doc0 field0 > {1}
doc0 field1 > {2}
doc1 field0 > {1}
doc1 field1 > {2}

val1 > "xxx xx xx xx xx xx xx xx xx xxx"
val2 > "yyy yy yy yy yy yy yy yy yyy"

I'm trying to understand the possible impact of storing fields which have
a small set of repeating values, hoping it would not have an impact on
file size. But I now think it will?

thanks in advance


Re: Strip special chars like "-"

2011-08-09 Thread lee carroll
Hi, I might be wrong as I've not tried it out to be sure, but from the wiki docs:

These parameters may be combined in any way.

Example of generateWordParts="1" and catenateWords="1":
"PowerShot" -> 0:"Power", 1:"Shot" 1:"PowerShot"
(where 0,1,1 are token positions)

Does that fit the bill?

On 9 August 2011 16:03, roySolr  wrote:
> Ok, there are three query possibilities:
>
> Manchester-united
> Manchester united
> Manchesterunited
>
> The original name of the club is "manchester-united".
>
>
> generateWordParts will fix two of these possibilities:
>
> "Manchester-united" => "manchester","united"
>
> I can search for "Manchester-united" and "manchester" "united". When I
> search for "manchesterunited" I get no results.
>
> To fix this i could use catenateWords:
>
> "Manchester-united" => "manchesterunited"
>
> In this situation I can search for "Manchester-united" and
> "manchesterunited". When I search for "manchester united" I get no results.
> The catenateWords option also fixes only 2 situations.
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Strip-special-chars-like-tp3238942p3239256.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: document indexing

2011-08-10 Thread lee carroll
It really does depend on what you want to do in your app, but from
the info given I'd go for denormalising by repeating the least number
of values. So in your case that would be book:

PageID+BookID (uniqueKey), pageID, PageVal1, PageValn, BookID, BookName




On 10 August 2011 09:46, directorscott  wrote:
> Hello,
>
> First of all, I am a beginner and I am trying to develop a sample
> application using SolrNet.
>
> I am struggling with the schema definition I need to use to meet my
> needs. In the database, I have Books(bookId, name) and Pages(pageId, bookId,
> text) tables. They have a master-detail relationship. I want to be able to
> search the Text field of Pages but list the books. Should I use a schema for
> Pages (with pageId as unique key) or for Books (with bookId as unique key)
> in this scenario?
>
> Thanks.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/document-indexing-tp3241832p3241832.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: document indexing

2011-08-10 Thread lee carroll
With the first option you can be page-specific in your search results
and searches.
Field collapsing/grouping will help with your normalisation issue.
(What you have listed is different from what I listed: you don't have a
unique key.)

Option 2 means you lose any ability to reference pages, but as you
note, your documents are at the level you wish your search results to
be returned.

If you are not interested in pages, then option 2.

On 10 August 2011 12:22, directorscott  wrote:
> Could you please tell me the schema.xml "fields" tag content for such a case?
> Currently the index data is something like this:
>
> PageID BookID Text
> 1         1        "some text"
> 2         1        "some text"
> 3         1        "some text"
> 4         1        "some text"
> 5         2        "some text"
> 6         2        "some text"
> 7         2        "some text"
> 8         2        "some text"
>
> when I make a simple query for the word "some" on the Text field, I will have
> all 8 rows returned, but I want to list only 2 items (Books with IDs 1 and
> 2).
>
> I am also considering concatenating the Text columns and having the index like
> this:
>
> BookID     PageTexts
> 1             "some text some text some text"
> 2             "some text some text some text"
>
> I wonder which index structure is better.
>
>
>
>
> lee carroll wrote:
>>
>> It really does depend upon what you want to do in your app but from
>> the info given I'd go for denormalizing by repeating the least number
>> of values. So in your case that would be book
>>
>> PageID+BookID(uniqueKey), pageID, PageVal1, PageValn, BookID, BookName
>>
>>
>>
>>
>> On 10 August 2011 09:46, directorscott <dgul...@gmail.com> wrote:
>>> Hello,
>>>
>>> First of all, I am a beginner and i am trying to develop a sample
>>> application using SolrNet.
>>>
>>> I am struggling about schema definition i need to use to correspond my
>>> needs. In database, i have Books(bookId, name) and Pages(pageId, bookId,
>>> text) tables. They have master-detail relationship. I want to be able to
>>> search in Text area of Pages but list the books. Should i use a schema
>>> for
>>> Pages (with pageid as unique key) or for Books (with bookId as unique
>>> key)
>>> in this scenario?
>>>
>>> Thanks.
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://lucene.472066.n3.nabble.com/document-indexing-tp3241832p3241832.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/document-indexing-tp3241832p3242219.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
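A quick sketch of the denormalisation being discussed (field names are made up for illustration): one document per page, with the book-level values repeated, and grouping on book_id to collapse page hits back to distinct books.

```python
# One Solr document per page, repeating only the book-level values.
books = {1: "A Tale of Two Cities", 2: "Moby Dick"}
pages = [
    (1, 1, "some text"), (2, 1, "more text"),
    (3, 2, "some text"), (4, 2, "other text"),
]

docs = [
    {
        "id": f"{page_id}-{book_id}",   # pageID+bookID as the unique key
        "page_id": page_id,
        "book_id": book_id,
        "book_name": books[book_id],
        "text": text,
    }
    for page_id, book_id, text in pages
]

# Collapsing page-level hits back to distinct books: what field
# collapsing / grouping on book_id would do on the Solr side.
hits = [d for d in docs if "some" in d["text"]]
distinct_books = sorted({d["book_id"] for d in hits})
print(distinct_books)  # prints: [1, 2]
```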


Re: Count rows with tokens

2011-08-22 Thread lee carroll
Hi, this looks like a faceting problem.

See
http://wiki.apache.org/solr/SolrFacetingOverview

cheers lee c

On 22 August 2011 11:52, tom135  wrote:
> Hello,
>
> I want to use Solr as a search engine. I have indexed data like:
> ID | TEXT | CREATION_DATE
>
> Daily increase by 500 000 rows.
>
> My problem:
> *INPUT:* fixed set of tokens (max size 40), set of days
> *RESULT:* How many rows (TEXT) contain fixed set of tokens and are created
> in day1, day2, ..., day20
>
> I tried to build aggregates like:
> *1. Solution*
> DATE (days) | TOKEN_1 | TOKEN_2 | ... | TOKEN_40
>
> where for example:
> TOKEN_3 - string like "ID_1,ID_2,...,ID_N", where ID_* contain the TOKEN_3
>
> then I can split TOKEN_* into a Set, and the size of the Set is the number of
> distinct rows.
> *PROBLEM:* But here is the problem with sending too-long strings that must be
> split on the client side (too big a response).
>
> *2. Solution*
> DATE (days) | TOKENS | COUNT
>
> where
> TOKENS contains combination of input tokens.
> For 3 tokens I have 7 combinations
> For 5 tokens I have 31 combinations
> For 10 tokens I have 1023 combinations
> For 20 tokens I have 1048575 combinations
> etc.
> *PROBLEM:* Too many cases (combinations) with 40 tokens
>
> Maybe the 1st solution would be good if I could split the strings with some Solr
> function (custom function) or...?
>
> Thanks for any ideas
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Count-rows-with-tokens-tp3274643p3274643.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
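A toy model of what the faceting approach computes (field and token values invented for illustration): restrict to rows containing all the input tokens, then count matches per day — which is what a facet on the date field would return for a query ANDing the tokens together.

```python
from collections import Counter

# Rows of (day, text); a facet on the date field for q=TEXT:(tok1 AND tok2)
# would return these per-day counts directly from Solr.
rows = [
    ("2011-08-20", "quick brown fox"),
    ("2011-08-20", "quick red fox"),
    ("2011-08-21", "quick brown dog"),
    ("2011-08-21", "lazy brown fox"),
]
tokens = {"quick", "fox"}

# Keep only rows whose token set contains all input tokens, count per day.
per_day = Counter(day for day, text in rows if tokens <= set(text.split()))
print(dict(per_day))  # prints: {'2011-08-20': 2}
```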


Re: Best way to anchor solr searches?

2011-08-25 Thread lee carroll
I don't think Solr conforms to ACID-type behaviours for its queries.
This is not to say your use case is not important,
just that it's not Solr's focus. I think it's an interesting question, but
the solution is probably going to involve rolling your own.

Something like returning 1 user docs and caching these in an
application cache; pagination occurs against this cache rather than as a
Solr query with the start param incremented. Maybe offer a refresh
data link which repopulates the cache from Solr.

cheers lee c

On 25 August 2011 01:01, arian487  wrote:
> If I'm searching for users based on last login time, and I search once, then
> go to the second page with a new offset, I could potentially see the same
> users on page 2 if the index has changed.  What is the best way to anchor it
> so I avoid this?
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Best-way-to-anchor-solr-searches-tp3282576p3282576.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
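A minimal sketch of that application-side cache, with all names hypothetical: one Solr round trip populates a snapshot, pages are sliced from the snapshot so index changes cannot shuffle users between page views, and a refresh call repopulates it on demand.

```python
class SearchSnapshot:
    """Paginate against one fixed result set instead of re-querying Solr."""

    def __init__(self, fetch_all):
        self._fetch_all = fetch_all      # callable that queries Solr once
        self._results = fetch_all()

    def page(self, number, size=10):
        start = number * size
        return self._results[start:start + size]

    def refresh(self):
        """The 'refresh data' link: repopulate the snapshot from Solr."""
        self._results = self._fetch_all()

# Stand-in for a real Solr query returning users ordered by last login.
snap = SearchSnapshot(lambda: [f"user{i}" for i in range(100)])
print(snap.page(0, 3))  # prints: ['user0', 'user1', 'user2']
print(snap.page(1, 3))  # prints: ['user3', 'user4', 'user5']
```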


Re: I can't pass the unit test when compile from apache-solr-3.3.0-src

2011-08-31 Thread lee carroll
Not sure if this has progressed further, but I'm getting a test failure
for 3.3 also.

Trunk builds and tests fine, but 3.3 fails the test below.

(Note: I've a new box so it could be a silly setup issue I've missed, but
I think everything is in place (latest version of Java 1.6, latest
version of Ant); the main difference is that the number of CPUs went from
1 to 4.)

failed test output is:
Testsuite: org.apache.solr.common.util.ContentStreamTest
Tests run: 3, Failures: 0, Errors: 1, Time elapsed: 21.172 sec
- Standard Error -
NOTE: reproduce with: ant test -Dtestcase=ContentStreamTest
-Dtestmethod=testURLStream
-Dtests.seed=743785413891938113:-7792321629547565878
NOTE: test params are: locale=ar_QA, timezone=Europe/Vilnius
NOTE: all tests run in this JVM:
[CommonGramsQueryFilterFactoryTest, TestBrazilianStemFilterFactory,
TestCzechStemFilterFactory, TestFrenchMinimalStemFilterFactory,
TestHindiFilters, TestKeywordMarkerFilterFactory,
TestPatternReplaceFilter, TestRemoveDuplicatesTokenFilter,
TestStemmerOverrideFilterFactory, TestUAX29URLEmailTokenizerFactory,
SolrExceptionTest, LargeVolumeJettyTest, TestUpdateRequestCodec,
ContentStreamTest]
NOTE: Windows XP 5.1 x86/Sun Microsystems Inc. 1.6.0_27
(32-bit)/cpus=4,threads=2,free=6342464,total=16252928
-  ---

Testcase: testStringStream took 0 sec
Testcase: testFileStream took 0 sec
Testcase: testURLStream took 21.157 sec
Caused an ERROR
Connection timed out: connect
java.net.ConnectException: Connection timed out: connect
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:529)
at java.net.Socket.connect(Socket.java:478)
at sun.net.NetworkClient.doConnect(NetworkClient.java:163)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:395)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:530)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:234)
at sun.net.www.http.HttpClient.New(HttpClient.java:307)
at sun.net.www.http.HttpClient.New(HttpClient.java:324)
at 
sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:970)
at 
sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:911)
at 
sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:836)
at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1172)
at java.net.URL.openStream(URL.java:1010)
at 
org.apache.solr.common.util.ContentStreamTest.testURLStream(ContentStreamTest.java:70)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1195)



On 3 August 2011 01:51, Shawn Heisey  wrote:
> On 7/29/2011 5:26 PM, Chris Hostetter wrote:
>>
>> Can you please be specific...
>>  * which test(s) fail for you?
>>  * what are the failures?
>>
>> Any time a test fails, that info appears in the "ant test" output, and the
>> full details or all tests are written to build/test-results
>>
>> you can run "ant test-reports" from the solr directory to generate an HTML
>> report of all the success/failure info.
>
> I am also having a consistent build failure with the 3.3 source.  Some info
> from junit about the failure is below.  If you want something different I
> still have it in my session, let me know.
>
>    [junit] NOTE: reproduce with: ant test
> -Dtestcase=TestSqlEntityProcessorDelta
> -Dtestmethod=testNonWritablePersistFile
> -Dtests.seed=4609081405510352067:771607526385155597
>    [junit] NOTE: test params are: locale=ko_KR, timezone=Asia/Saigon
>    [junit] NOTE: all tests run in this JVM:
>    [junit] [TestCachedSqlEntityProcessor, TestClobTransformer,
> TestContentStreamDataSource, TestDataConfig, TestDateFormatTransformer,
> TestDocBuilder, TestDocBuilder2, TestEntityProcessorBase, TestErrorHandling,
> TestEvaluatorBag, TestFieldReader,
> TestFileListEntityProcessor, TestJdbcDataSource, TestLineEntityProcessor,
> TestNumberFormatTransformer, TestPlainTextEntityProcessor,
> TestRegexTransformer, TestScriptTransformer, TestSqlEntityProcessor,
> TestSqlEntityProcessor2, TestSqlEntityProcessorDelta]
>    [junit] NOTE: Linux 2.6.18-238.12.1.el5.centos.plusxen amd64/Sun
> Microsystems Inc. 1.6.0_26
> (64-bit)/cpus=3,threads=4,free=100917744,total=254148608
>
>
> Here's what I did on the last run:
>
> rm -rf lucene_solr_3_3
> svn co https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_3
> lucene_sol

Re: I can't pass the unit test when compile from apache-solr-3.3.0-src

2011-09-02 Thread lee carroll
Hi Chris,

That makes sense. I was behind a firewall when running both builds. I
thought I was correctly proxied - but maybe the request was being
squashed by something else before it even got to the firewall.

I've just run the tests again, this time outside the firewall, and all pass.

Thanks a lot

Lee C

On 2 September 2011 01:17, Chris Hostetter  wrote:
>
> : Trunk builds and tests fine but 3.3 fails the test below
>        ...
> : NOTE: reproduce with: ant test -Dtestcase=ContentStreamTest
> : -Dtestmethod=testURLStream
> : -Dtests.seed=743785413891938113:-7792321629547565878
>        ...
> : java.net.ConnectException: Connection timed out: connect
>        ...
> :       at 
> org.apache.solr.common.util.ContentStreamTest.testURLStream(ContentStreamTest.java:70)
> :       at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
> :       at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1195)
>
> The line in question above is just trying to open a URL connection to
> svn.apache.org.  so if it's failing for you that probably means you are
> running tests on a box where the public internet is unreachable?
>
> the reason you don't see the same failure on trunk is because simonw
> seems to have fixed the test on trunk to use a proper "assume" statement,
> as part of a larger unrealted issue...
>
> https://svn.apache.org/viewvc?view=revision&revision=1055636
> https://svn.apache.org/viewvc/lucene/dev/trunk/solr/src/test/org/apache/solr/common/util/ContentStreamTest.java?r1=1055636&r2=1055635&pathrev=1055636
>
> ...i'll backport that particular fix.
>
>
> -Hoss
>


Re: solr equivalent of "select distinct"

2011-09-12 Thread lee carroll
If you have a limited set of searches which need this, and they
act on a limited, known set of fields, you can concat the fields at index
time and then facet.

PK   FLD1 FLD2 FLD3 FLD4 FLD5 copy45
AB0  A    B    0    x    y    x y
AB1  A    B    1    x    y    x y
CD0  C    D    0    a    b    a b
CD1  C    D    1    e    f    e f

Faceting on the copy45 field would give you the correct "distinct" term
values (plus their counts).
It's pretty contrived and limited to knowing the fields you need to concat.

What is the use case for this? Maybe another approach would fit better.

lee c

On 11 September 2011 22:26, Michael Sokolov  wrote:
> You can get what you want - unique lists of values from docs matching your
> query - for a single field (using facets), but not for the co-occurrence of
> two field values.  So you could combine the two fields together, if you know
> what they are going to be "in advance."  Facets also give you counts, so in
> some special cases, you could get what you want - eg you can tell when there
> is only a single pair of values since their counts will be the same and the
> same as the total.  But that's all I can think of.
>
> -Mike
>
> On 9/11/2011 12:39 PM, Mark juszczec wrote:
>>
>> Here's an example:
>>
>> PK   FLD1 FLD2 FLD3 FLD4 FLD5
>> AB0  A    B    0    x    y
>> AB1  A    B    1    x    y
>> CD0  C    D    0    a    b
>> CD1  C    D    1    e    f
>>
>> I want to write a query using only the terms FLD1 and FLD2 and ONLY get
>> back:
>>
>> A B x y
>> C D a b
>> C D e f
>>
>> Since FLD4 and FLD5 are the same for PK=AB0 and AB1, I only want one
>> occurrence of those records.
>>
>> Since FLD4 and FLD5 are different for PK=CD0 and CD1, I want BOTH
>> occurrences of those records.
>>
>
>
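A toy illustration of the concat-then-facet trick on Mark's data (outside Solr, just to show the counts you'd get back): facet counts on the concatenated field are exactly the distinct value combinations plus their frequencies.

```python
from collections import Counter

# Mark's four documents; fld1..fld5 mirror his FLD1..FLD5 columns.
rows = [
    {"pk": "AB0", "fld1": "A", "fld2": "B", "fld4": "x", "fld5": "y"},
    {"pk": "AB1", "fld1": "A", "fld2": "B", "fld4": "x", "fld5": "y"},
    {"pk": "CD0", "fld1": "C", "fld2": "D", "fld4": "a", "fld5": "b"},
    {"pk": "CD1", "fld1": "C", "fld2": "D", "fld4": "e", "fld5": "f"},
]

# The index-time concatenated field; faceting on it yields distinct combos.
facet = Counter(f"{r['fld1']} {r['fld2']} {r['fld4']} {r['fld5']}" for r in rows)
print(dict(facet))  # prints: {'A B x y': 2, 'C D a b': 1, 'C D e f': 1}
```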


Re: Searching multiple fields

2011-09-27 Thread lee carroll
see

http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html



On 27 September 2011 16:04, Mark  wrote:
I thought that a similarity class will only affect the scoring of a single
field, not across multiple fields? Can anyone else chime in with some
> input? Thanks.
>
> On 9/26/11 9:02 PM, Otis Gospodnetic wrote:
>>
>> Hi Mark,
>>
>> Eh, I don't have Lucene/Solr source code handy, but I *think* for that
>> you'd need to write custom Lucene similarity.
>>
>> Otis
>> 
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>> Lucene ecosystem search :: http://search-lucene.com/
>>
>>
>>> 
>>> From: Mark
>>> To: solr-user@lucene.apache.org
>>> Sent: Monday, September 26, 2011 8:12 PM
>>> Subject: Searching multiple fields
>>>
>>> I have a use case where I would like to search across two fields but I do
>>> not want to weight a document that has a match in both fields higher than a
>>> document that has a match in only 1 field.
>>>
>>> For example.
>>>
>>> Document 1
>>> - Field A: "Foo Bar"
>>> - Field B: "Foo Baz"
>>>
>>> Document 2
>>> - Field A: "Foo Blarg"
>>> - Field B: "Something else"
>>>
>>> Now when I search for "Foo" I would like document 1 and 2 to be similarly
>>> scored however document 1 will be scored much higher in this use case
>>> because it matches in both fields. I could create a third field and use
>>> copyField directive to search across that but I was wondering if there is an
>>> alternative way. It would be nice if we could search across some sort of
>>> "virtual field" that will use both underlying fields but not actually
>>> increase the size of the index.
>>>
>>> Thanks
>>>
>>>
>>>
>


lib directory on 1.4.1 with multi cores and tomcat

2011-10-07 Thread lee carroll
lib directory on 1.4.1 with multi cores

I've specified the shared lib as "lib" in the solr.xml file, my assumption
being that this will be the lib directory under solr home.
However my cores cannot load classes from any new jars placed in this
dir after a Tomcat restart.

What am I missing?
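For reference, in a multicore setup the shared lib is declared as a sharedLib attribute on the <solr> element in solr.xml, with the path resolved relative to solr home. A sketch (core names invented for illustration):

```xml
<!-- solr.xml in solr home; sharedLib="lib" resolves to $SOLR_HOME/lib -->
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0"/>
    <core name="core1" instanceDir="core1"/>
  </cores>
</solr>
```

If it is already declared like this and the jars still aren't picked up, the Solr startup log should report which jar files it adds to the classloader, which may help narrow down where it is actually looking.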


  1   2   >