copyField based on value of another field

2015-06-23 Thread Alistair Young
Hi folks,

is it possible to copyField only if another field has a certain value? e.g.

copyField 'dc.subject' to 'image_suggestions' only if rdf 
http://www.nsdl.org/ontologies/relationships#isInImageBank is true

thanks,

Alistair

--
mov eax,1
mov ebx,0
int 80h


Re: suggester returning stems instead of whole words

2015-06-17 Thread Alistair Young
ah looks like I need to use copyField to get a non stemmed version of the
suggester field

Alistair

-- 
mov eax,1
mov ebx,0
int 80h







suggester returning stems instead of whole words

2015-06-17 Thread Alistair Young
I was wondering if there's a way to get the suggester to return whole words. 
Instead of returning 'technology' , 'temperature' and 'tutorial', it's 
returning 'technolog' , 'temperatur' and 'tutori'

using this config:

<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.WFSTLookupFactory</str>
    <str name="field">dc.subject</str>
    <float name="threshold">0.005</float>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
<requestHandler class="org.apache.solr.handler.component.SearchHandler"
                name="/suggest">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

thanks,

Alistair

--
mov eax,1
mov ebx,0
int 80h


Re: suggester returning stems instead of whole words

2015-06-17 Thread Alistair Young
copyField doesn't seem to fix the suggestion stemming. Copying the field
to another field of this type:

<field name="subject_autocomplete" type="text_auto" indexed="true"
       stored="true" multiValued="false" />

<copyField source="dc.subject" dest="subject_autocomplete" />


<fieldType class="solr.TextField" name="text_auto"
           positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>


but I'm still getting stemmed suggestions after rebuilding the index.

Alistair

-- 
mov eax,1
mov ebx,0
int 80h








Re: suggester returning stems instead of whole words

2015-06-17 Thread Alistair Young
yep did both of those things. Getting the same results as using dc.subject

On 17/06/2015 14:44, Shalin Shekhar Mangar shalinman...@gmail.com
wrote:

Did you change the SpellCheckComponent's configuration to use
subject_autocomplete instead of dc.subject? After you made that
change, did you invoke spellcheck.build=true to re-build the
spellcheck index?


-- 
Regards,
Shalin Shekhar Mangar.
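
The two steps Shalin asks about (pointing the suggester at the copied field, then rebuilding its dictionary) end with a request to the /suggest handler that carries spellcheck.build=true. A minimal sketch of building that request URL follows. The host, port and core name in SOLR_BASE are assumptions for illustration, not values from this thread; the handler name and the spellcheck.build parameter come from the messages above.

```python
from urllib.parse import urlencode

# Hypothetical Solr location; adjust host, port and core name for your setup.
SOLR_BASE = "http://localhost:8983/solr/collection1"


def suggest_url(prefix: str, rebuild: bool = False) -> str:
    """Build a URL for the /suggest request handler shown in the thread."""
    params = [("q", prefix), ("wt", "json")]
    if rebuild:
        # Ask the spellcheck component to rebuild its dictionary on this request.
        params.append(("spellcheck.build", "true"))
    return f"{SOLR_BASE}/suggest?{urlencode(params)}"


# One request with rebuild=True after changing the configured field,
# then ordinary requests without it.
print(suggest_url("te", rebuild=True))
print(suggest_url("te"))
```

Rebuilding only helps if the "field" entry of the spellchecker really names the unstemmed field, so it is worth checking that in the schema browser first.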



Re: suggester returning stems instead of whole words

2015-06-17 Thread Alistair Young
looking at the schema browser, subject_autocomplete has a type of text_en
rather than text_auto and all the terms are stemmed. Its contents are the
same as the one it was copied from, dc.subject, which is text_en and
stemmed.

On 17/06/2015 14:58, Erick Erickson erickerick...@gmail.com wrote:

Hmmm, shouldn't be happening that way. Spellcheck is supposed to be
looking at indexed terms. If you go into the admin/schema browser
page and look at the new field, what are the terms in the index? They
shouldn't be stemmed.

And I always get confused where this
  <str name="spellcheck.dictionary">suggest</str>
is supposed to point. Do you have any other component named suggest
that you might be picking up?

Best,
Erick





Re: suggester returning stems instead of whole words

2015-06-17 Thread Alistair Young
working in a tiny tmux window does have some disadvantages, such as losing
one’s place in the file! the subject_autocomplete definition wasn’t inside
<fields>. Now that it is, everything is working. thanks for listening

Alistair

-- 
mov eax,1
mov ebx,0
int 80h









Re: suggester returning stems instead of whole words

2015-06-17 Thread Alistair Young
yep, 4.3.1. The API changed after that so it’s finding the time to rewrite
the entire backend that uses it

On 17/06/2015 16:55, Shalin Shekhar Mangar shalinman...@gmail.com
wrote:

You must be using an old version of Solr. Since Solr 4.8 and beyond,
the <fields> and <types> tags have been deprecated and you can place
the field and field type definitions anywhere in the schema.xml.

See http://issues.apache.org/jira/browse/SOLR-5228
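
For a pre-4.8 schema.xml, the mistake described in this thread (a field definition sitting outside the fields section) can be spotted with a small script. This is only a sketch under assumptions: the sample schema is invented for illustration, and the check just compares element positions; it does not reproduce how any Solr version actually loads the file.

```python
import xml.etree.ElementTree as ET

# Invented sample: one field is correctly nested, one sits outside <fields>.
SAMPLE_SCHEMA = """
<schema name="example" version="1.5">
  <fields>
    <field name="dc.subject" type="text_en" indexed="true" stored="true"/>
  </fields>
  <field name="subject_autocomplete" type="text_auto" indexed="true" stored="true"/>
  <copyField source="dc.subject" dest="subject_autocomplete"/>
</schema>
"""


def misplaced_fields(schema_xml: str) -> list:
    """Return names of <field> elements that are not direct children of <fields>."""
    root = ET.fromstring(schema_xml)
    nested = {f.get("name") for f in root.findall("./fields/field")}
    # root.iter("field") walks every <field> element anywhere in the document.
    return [f.get("name") for f in root.iter("field") if f.get("name") not in nested]


print(misplaced_fields(SAMPLE_SCHEMA))
```

On Solr 4.8 and later the same layout is allowed, so a non-empty result only matters for older versions such as the 4.3.1 mentioned above.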




Re: phrase matches returning near matches

2015-06-16 Thread Alistair Young
yep seems that’s the answer. The highlighting is done separately by the
rails app, so I’ll look into proper solr highlighting.

thanks a lot for the use of your ears, much improved understanding!

cheers,

Alistair

-- 
mov eax,1
mov ebx,0
int 80h




On 16/06/2015 16:33, Erick Erickson erickerick...@gmail.com wrote:

Hmmm. First, highlighting should work here, if you have it configured
to work on the dc.description field.

As to whether the phrase "management changes" is near enough, I
pretty much guarantee it is. This is where the admin/analysis page can
answer this type of question authoritatively since it's based exactly
on your particular analysis chain.

Best,
Erick

Re: phrase matches returning near matches

2015-06-16 Thread Alistair Young
yes prolly not a bug. The highlighting is on but nothing is highlighted.
Perhaps this text is triggering it?

'consider the impacts of land management changes'

that would seem reasonable. It's not a direct match so no highlighting
(the highlighting does work on a direct match) but 'management changes'
must be near enough 'manage change' to trigger a result.

Alistair

-- 
mov eax,1
mov ebx,0
int 80h




On 16/06/2015 16:18, Erick Erickson erickerick...@gmail.com wrote:

I agree with Alessandro: the behavior you're describing
is _not_ correct at all given your description. So either

1> There's something interesting about your configuration
   that doesn't seem important that you haven't told us,
   although what it could be is a mystery to me too ;)

2> It's matching on something else. Note that the
   phrase has been stemmed, so something in there
   besides "management" might stem to "manag" and/or
   something other than "changes" might stem to "chang"
   and the two of _them_ happen to be next to each
   other. "are managers changing?" for instance. Or
   even something less likely. Perhaps turn on
   highlighting and see if it pops out?

3> You've uncovered a bug. Although I suspect others
   would have reported it and the unit tests would have
   barfed all over the place.

One other thing you can do. Go to the admin/analysis
page and turn on the verbose check box. Put
"management is undergoing many changes"
in both the query and index boxes. The result (it's
kind of hard to read I'll admit) will include the position
of each token after all the analysis is done. Phrase
queries (without slop) should only be matching adjacent
positions. So the question is whether the position info
looks correct.

Best,
Erick
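
Erick's second point, that stemmed tokens in adjacent positions satisfy a phrase query, can be illustrated with a toy model. This is a sketch under loud assumptions: toy_stem is a made-up four-rule suffix stripper, not the stemmer Solr uses for text_en, and phrase_match only mimics a phrase query without slop by comparing consecutive token positions.

```python
SUFFIXES = ("ement", "es", "e", "s")  # toy rules, NOT the real Porter stemmer


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text: str) -> list:
    """Whitespace-tokenize and stem; list index stands in for token position."""
    return [toy_stem(token) for token in text.split()]


def phrase_match(query: str, document: str) -> bool:
    """True if the stemmed query tokens occur at consecutive positions."""
    q = analyze(query)
    d = analyze(document)
    return any(d[i : i + len(q)] == q for i in range(len(d) - len(q) + 1))


print(analyze("manage change"))
print(phrase_match("manage change", "consider the impacts of land management changes"))
print(phrase_match("manage change", "management is undergoing many changes"))
```

In this model the first document matches because "management changes" reduces to the same two adjacent tokens as the query, while the second does not because other tokens sit between them. The admin/analysis page remains the authoritative check for a real analysis chain.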

On Tue, Jun 16, 2015 at 4:40 AM, Alessandro Benedetti
benedetti.ale...@gmail.com wrote:
According to your debug output you are using the default Lucene query parser.
This surprises me, as I would expect a match with distance 0 between the
2 terms with that query.

Are you sure nothing else in that field matches the phrase query?

From the documentation:

Lucene supports finding words are a within a specific distance away. To do
a proximity search use the tilde, "~", symbol at the end of a Phrase. For
example to search for a "apache" and "jakarta" within 10 words of each
other in a document use the search:

"jakarta apache"~10


 Cheers
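
The proximity search quoted above can be pictured with a deliberately simplified model. This is a sketch, not Lucene's algorithm: real slop is counted as the number of position moves needed to line the terms up and it also allows reordering, whereas the hypothetical within_slop helper below only handles two terms in query order and counts the tokens between them.

```python
def within_slop(tokens, first, second, slop):
    """True if `second` follows `first` with at most `slop` tokens in between."""
    first_positions = [i for i, t in enumerate(tokens) if t == first]
    second_positions = [i for i, t in enumerate(tokens) if t == second]
    return any(
        0 < s - f <= slop + 1
        for f in first_positions
        for s in second_positions
    )


tokens = "jakarta is a project hosted by apache".split()

# Five tokens separate the two terms, so a slop of 5 or more matches.
print(within_slop(tokens, "jakarta", "apache", 10))
print(within_slop(tokens, "jakarta", "apache", 0))
```

With a slop of 0 the model behaves like a plain phrase query: only adjacent positions match, which is why the parsed PhraseQuery in this thread should not match distant words unless stemming puts matching tokens next to each other.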



phrase matches returning near matches

2015-06-16 Thread Alistair Young
Hiya,

I've been looking for documentation that would point to where I could modify or 
explain why 'near neighbours' are returned from a phrase search. If I search 
for:

manage change

I get back a document that contains "this will help in your management of lots
more words... changes". It's relevant but I'd like to understand why Solr is
returning it. Is it a combination of fuzzy/slop? The distance between the two
variations of the two words in the document is quite large.

thanks,

Alistair

--
mov eax,1
mov ebx,0
int 80h


Re: phrase matches returning near matches

2015-06-16 Thread Alistair Young
it's a useful behaviour. I'd just like to understand where it's deciding
the document is relevant. debug output is:

<lst name="debug">
  <str name="rawquerystring">dc.description:"manage change"</str>
  <str name="querystring">dc.description:"manage change"</str>
  <str name="parsedquery">PhraseQuery(dc.description:"manag chang")</str>
  <str name="parsedquery_toString">dc.description:"manag chang"</str>
  <lst name="explain">
    <str name="tst:test">
1.2008798 = (MATCH) weight(dc.description:"manag chang" in 221) [DefaultSimilarity], result of:
  1.2008798 = fieldWeight in 221, product of:
    1.0 = tf(freq=1.0), with freq of:
      1.0 = phraseFreq=1.0
    9.6070385 = idf(), sum of:
      4.0365543 = idf(docFreq=101, maxDocs=2125)
      5.5704846 = idf(docFreq=21, maxDocs=2125)
    0.125 = fieldNorm(doc=221)
    </str>
  </lst>
  <str name="QParser">LuceneQParser</str>
  <lst name="timing">
    <double name="time">41.0</double>
    <lst name="prepare">
      <double name="time">3.0</double>
      <lst name="query">
        <double name="time">0.0</double>
      </lst>
      <lst name="facet">
        <double name="time">0.0</double>
      </lst>
      <lst name="mlt">
        <double name="time">0.0</double>
      </lst>
      <lst name="highlight">
        <double name="time">0.0</double>
      </lst>
      <lst name="stats">
        <double name="time">0.0</double>
      </lst>
      <lst name="debug">
        <double name="time">0.0</double>
      </lst>
    </lst>
    <lst name="process">
      <double name="time">35.0</double>
      <lst name="query">
        <double name="time">0.0</double>
      </lst>
      <lst name="facet">
        <double name="time">0.0</double>
      </lst>
      <lst name="mlt">
        <double name="time">0.0</double>
      </lst>
      <lst name="highlight">
        <double name="time">0.0</double>
      </lst>
      <lst name="stats">
        <double name="time">0.0</double>
      </lst>
      <lst name="debug">
        <double name="time">35.0</double>
      </lst>
    </lst>
  </lst>
</lst>


thanks,

Alistair

-- 
mov eax,1
mov ebx,0
int 80h
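
The numbers in the explain output above can be reproduced by hand, which helps when reading why a document scored as it did. The sketch below assumes the classic TF-IDF formulas used by Lucene's DefaultSimilarity, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function names are made up for illustration, and fieldNorm is taken directly from the output rather than recomputed.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    """Classic inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def phrase_field_weight(phrase_freq, doc_freqs, max_docs, field_norm):
    """fieldWeight = tf * (sum of per-term idf) * fieldNorm, as in the explain tree."""
    tf = math.sqrt(phrase_freq)
    idf_sum = sum(idf(df, max_docs) for df in doc_freqs)
    return tf * idf_sum * field_norm


# Values copied from the debug output: docFreq 101 and 21, maxDocs 2125,
# phraseFreq 1.0, fieldNorm 0.125.
print(idf(101, 2125))
print(idf(21, 2125))
print(phrase_field_weight(1.0, [101, 21], 2125, 0.125))
```

The results should agree with the 4.0365543, 5.5704846 and 1.2008798 reported by Solr up to single-precision rounding. Note that phraseFreq=1.0 means the stemmed phrase was found once at adjacent positions in document 221.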




On 16/06/2015 11:26, Alessandro Benedetti benedetti.ale...@gmail.com
wrote:

Can you show us how the query is parsed?
You didn't tell us anything about the query parser you are using.
Enabling debugQuery=true will show you how the query is parsed, and this
will be quite useful for us.


Cheers


-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England



Solr spellcheck - onlyMorePopular threshold?

2014-06-09 Thread Alistair
Hello all,

I was wondering what the onlyMorePopular option for spellchecking uses as
its threshold. Will it always pick the suggestion that returns the most
results, or does it base its choice on some threshold that can be
configured?

Thanks!

Ali.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-spellcheck-onlyMorePopular-threshold-tp4140727.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Having trouble with German compound words in Solr 4.7

2014-04-22 Thread Alistair
I've managed to solve this (in a quite hacky sort of way) by using filter
queries and the edismax queryparser. 

I added in my solrconfig.xml the following parameters:

<str name="defType">edismax</str>
<str name="mm">75%</str>

Then when searching for multiple keywords (for example: schwarzkleid wenz,
where wenz is a German brand name), I use the first keyword as the query and
add anything after that as a filter query. So my final query looks
something like this:

   
fl=id&sort=popular+desc&indent=on&q=keywords:'schwarzkleide'+&wt=json&fq={!edismax}+keywords:'wenz'&fq=deleted:0

My compound splitter filter splits schwarzkleide correctly and it is parsed
as edismax with mm=75%, then the filterqueries are added, for keywords they
are also parsed as edismax. The returned result is all the black dresses
from 'Wenz'. 
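
In case it helps anyone trying the same workaround, here is a rough sketch of how those parameters could be assembled on the client side. It is not the PHP code I actually use; the function name `build_params` is made up for illustration, and the field names simply mirror the query above.

```python
from urllib.parse import urlencode


def build_params(keywords):
    """First keyword becomes q; every later keyword becomes an edismax fq."""
    if not keywords:
        raise ValueError("at least one keyword is required")
    first, rest = keywords[0], keywords[1:]
    params = [
        ("fl", "id"),
        ("sort", "popular desc"),
        ("indent", "on"),
        ("q", "keywords:'%s'" % first),
        ("wt", "json"),
    ]
    # Each remaining keyword narrows the result set as its own filter query.
    for keyword in rest:
        params.append(("fq", "{!edismax} keywords:'%s'" % keyword))
    # Always exclude deleted items, as in the query shown above.
    params.append(("fq", "deleted:0"))
    return params


print(urlencode(build_params(["schwarzkleide", "wenz"])))
```

Passing a list of pairs rather than a dict is what allows `fq` to appear more than once in the query string.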

If anybody has a better solution to what I've posted I would be more than
happy to read up on it as I'm quite new to Solr and I think my way is a bit
convoluted to be honest.

Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Having-trouble-with-German-compound-words-in-Solr-4-7-tp4131964p4132478.html


Re: Having trouble with German compound words in Solr 4.7

2014-04-21 Thread Alistair
Hi Siegfried,

the debug shows that the separated keywords get OR'd together, so a match on
either keyword appears in the results. So if I am searching for
*keywords:schwarzkleid*, this gets transformed to *keywords:schwarz
keywords:kleid*, which is equivalent to *keywords:schwarz OR keywords:kleid*.
I need this query to default to *keywords:schwarz AND keywords:kleid*
so only items that match both keywords appear in my results (in this case
black dresses).
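
To make the difference concrete, this is the kind of query string I am after, written as a small client-side sketch. It assumes the application already knows the split tokens, which is not how Solr works internally, and the helper names are invented for the example.

```python
def implicit_or(field, tokens):
    """What the debug output shows today: bare clauses, treated as OR."""
    return " ".join("%s:%s" % (field, token) for token in tokens)


def require_all(field, tokens):
    """What I want instead: every split token must match."""
    return " AND ".join("%s:%s" % (field, token) for token in tokens)


print(implicit_or("keywords", ["schwarz", "kleid"]))
print(require_all("keywords", ["schwarz", "kleid"]))
```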

I am pretty confused as to why replacing the default boolean operator is
this difficult :(

Any other suggestions?

Ali



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Having-trouble-with-German-compound-words-in-Solr-4-7-tp4131964p4132338.html


Having trouble with German compound words in Solr 4.7

2014-04-18 Thread Alistair
Hello all,

I'm a fairly new Solr user and I need my search function to handle compound
words in German. I've searched through the archives and found that Solr
already has a filter factory made for such words called
DictionaryCompoundWordTokenFilterFactory. I've already built a list of words
that I want split, and the filter seems to work correctly in most cases. The
majority of our searches are for clothing items, so, for example,
/schwarzkleid/ (black dress) becomes /schwarz/ /kleid/, which is what I want
to happen.

However, it seems like the keyword search is done using an *OR* operator, so
I'm seeing items that are either black or are dresses, but I just want to see
items that are both. I've also read that changing the default operator in
schema.xml or adding q.op as *AND* in solrconfig.xml will rectify this issue,
but nothing has changed in my query results. It still uses the *OR* operator.

I've tried using Extended DisMax in my queries, but I am using the Solr PHP
library and I don't think it supports adding DisMax filters to the queries
themselves (if I'm wrong, please correct me). By the way, I am using Zend
Framework 2.0 in the backend and am communicating with Solr through the Solr
PHP library: http://www.php.net/manual/tr/book.solr.php
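
To show the splitting behaviour I am describing, here is a toy sketch of dictionary-based decompounding. It is a simplified greedy longest-match splitter written for illustration only; it is not the algorithm Lucene's filter actually implements, and the word list and `min_subword` value are assumptions.

```python
def split_compound(word, dictionary, min_subword=3):
    """Greedily split word into dictionary entries; return [word] on failure."""
    word = word.lower()
    parts = []
    start = 0
    while start < len(word):
        match = None
        # Try the longest candidate first, down to the minimum subword length.
        for end in range(len(word), start + min_subword - 1, -1):
            if word[start:end] in dictionary:
                match = word[start:end]
                break
        if match is None:
            # Some part is not in the dictionary, so leave the word unsplit.
            return [word]
        parts.append(match)
        start += len(match)
    return parts


words = {"schwarz", "kleid"}
print(split_compound("schwarzkleid", words))  # ['schwarz', 'kleid']
print(split_compound("wenz", words))          # ['wenz']
```

The split itself is fine; my problem is only how the resulting tokens are combined afterwards.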

Any suggestions on how to change the operator after my compound word queries
have been split?

Thanks!

Ali



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Having-trouble-with-German-compound-words-in-Solr-4-7-tp4131964.html


Re: Having trouble with German compound words in Solr 4.7

2014-04-18 Thread Alistair
Hey Jack,

thanks for the reply. I added autoGeneratePhraseQueries=true to the
fieldType and now it's giving me even more results! I'm not sure if the
debug of my query will be helpful but I'll paste it just in case someone
might have an idea. This produces 113524 results, whereas if I manually
enter the query as keyword:schwarz AND keyword:kleid I only get 20283
results (which is the correct one). 





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Having-trouble-with-German-compound-words-in-Solr-4-7-tp4131964p4131973.html


Strange behaviour with single word and phrase

2013-09-04 Thread Alistair Young
I wonder if anyone could point me in the right direction please?

If I search on the phrase "the toolkit" I get hits containing that phrase but 
also hits that have the word 'the' before the word 'toolkit', no matter how far 
apart they are.

Also, if I search on the word 'the' there are no hits at all.

Thanks,

Alistair

-
mov eax,1
mov ebx,0
int 80


Re: Strange behaviour with single word and phrase

2013-09-04 Thread Alistair Young
Yep ignoring stop words. Thanks for the pointer.

Alistair

-
mov eax,1
mov ebx,0
int 80




On 04/09/2013 13:43, Jack Krupansky j...@basetechnology.com wrote:

Do you have stop word filtering enabled? What does your field type look
like?

If stop words are ignored, you will get exactly the behavior you
described.
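
A toy model of why that happens, for illustration only. The stop-word list is an assumption, and real analyzers also track token positions, which this sketch ignores.

```python
# Assumed stop-word list; the real one comes from the field type's stopwords file.
STOP_WORDS = {"the", "a", "an", "of"}


def analyze(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [token for token in text.lower().split() if token not in STOP_WORDS]


def phrase_matches(query, document):
    """True if the analyzed query occurs as a run inside the analyzed document."""
    wanted = analyze(query)
    if not wanted:
        # A query made only of stop words has nothing left to match.
        return False
    tokens = analyze(document)
    return any(
        tokens[i:i + len(wanted)] == wanted
        for i in range(len(tokens) - len(wanted) + 1)
    )


print(phrase_matches("the toolkit", "the big new toolkit"))  # True
print(phrase_matches("the", "the toolkit"))                  # False
```

Once 'the' is dropped, the phrase is effectively just 'toolkit', so any document containing 'toolkit' matches, and a search for 'the' alone has nothing left to search for.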

-- Jack Krupansky

-Original Message-
From: Alistair Young
Sent: Wednesday, September 04, 2013 6:57 AM
To: solr-user@lucene.apache.org
Subject: Strange behaviour with single word and phrase

I wonder if anyone could point me in the right direction please?

If I search on the phrase "the toolkit" I get hits containing that phrase
but also hits that have the word 'the' before the word 'toolkit', no
matter how far apart they are.

Also, if I search on the word 'the' there are no hits at all.

Thanks,

Alistair

-
mov eax,1
mov ebx,0
int 80 






Re: Collection not current after insert

2013-07-24 Thread Alistair Young
thanks Michael, adding autoCommit sorted it.

cheers,

Alistair

-- 
mov eax,1
mov ebx,0
int 80h




On 23/07/2013 18:34, Michael Della Bitta
michael.della.bi...@appinions.com wrote:

Hi Alistair,

You probably need a commit, and not an optimize.

Which version of Solr are you running against? The 4.0 releases have more
complications, but generally sending a commit will do. Not sure if GSearch
sends one, only partly because I never was able to make it work. :)
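
As a rough sketch of what "sending a commit" can look like, the update handler accepts a `commit=true` parameter. The host, port and core name below are assumptions, so substitute your own.

```python
from urllib.parse import urlencode


def commit_url(base="http://localhost:8983/solr/collection1"):
    """Build the update URL that asks Solr to commit pending changes."""
    # The base URL is a placeholder, not a value taken from this thread.
    return base + "/update?" + urlencode({"commit": "true"})


print(commit_url())  # http://localhost:8983/solr/collection1/update?commit=true
```

Requesting that URL after an insert should make the new document searchable without running a full optimize.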


Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

"The Science of Influence Marketing"

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinions
w: appinions.com http://www.appinions.com/


On Tue, Jul 23, 2013 at 9:57 AM, Alistair Young
alistair.yo...@uhi.ac.uk wrote:

 Hi there,

 My Solr is being fed by Fedora GSearch and when uploading a new resource,
 the Collection is optimized but not current so the new resource can't be
 found. I have to go to the Core Admin page and Optimize it from there, in
 order to make the collection current. Is there anything I should look for
 to see what the problem is? This is the comms to solr when inserting:

 DEBUG 2013-07-23 13:27:37,023 (OperationsImpl) resultXml =
 <solrUpdateIndex indexName="FgsIndex">
 <inserted>ltk:13000116</inserted>
 <counts insertTotal="1" updateTotal="0" deleteTotal="0" emptyTotal="0" docCount="854" warnCount="0"/>
 </solrUpdateIndex>

 DEBUG 2013-07-23 13:27:37,023 (GTransformer) xsltName=fgsconfigFinal/index/FgsIndex/updateIndexToResultPage
 DEBUG 2013-07-23 13:27:37,027 (GTransformer) getTransformer transformer=org.apache.xalan.transformer.TransformerImpl@6561b973 uriResolver=null
 DEBUG 2013-07-23 13:27:37,028 (GenericOperationsImpl) resultXml=<?xml version="1.0" encoding="UTF-8"?>
 <resultPage operation="updateIndex" action="fromPid" value="ltk:13000116" repositoryName="FgsRepos" indexNames="" resultPageXslt="" dateTime="Tue Jul 23 13:27:36 UTC 2013">
 <updateIndex xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:foxml="info:fedora/fedora-system:def/foxml#" xmlns:zs="http://www.loc.gov/zing/srw/" warnCount="0" docCount="854" deleteTotal="0" updateTotal="0" insertTotal="1" indexName="FgsIndex"/>
 </resultPage>

 INFO 2013-07-23 13:27:37,028 (UpdateListener) Index updated by notification message, returning:
 <?xml version="1.0" encoding="UTF-8"?>
 <resultPage operation="updateIndex" action="fromPid" value="ltk:13000116" repositoryName="FgsRepos" indexNames="" resultPageXslt="" dateTime="Tue Jul 23 13:27:36 UTC 2013">
 <updateIndex xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:foxml="info:fedora/fedora-system:def/foxml#" xmlns:zs="http://www.loc.gov/zing/srw/" warnCount="0" docCount="854" deleteTotal="0" updateTotal="0" insertTotal="1" indexName="FgsIndex"/>
 </resultPage>

 thanks,

 Alistair

 --
 mov eax,1
 mov ebx,0
 int 80h





Collection not current after insert

2013-07-23 Thread Alistair Young
Hi there,

My Solr is being fed by Fedora GSearch and when uploading a new resource, the 
Collection is optimized but not current so the new resource can't be found. I 
have to go to the Core Admin page and Optimize it from there, in order to make 
the collection current. Is there anything I should look for to see what the 
problem is? This is the comms to solr when inserting:

DEBUG 2013-07-23 13:27:37,023 (OperationsImpl) resultXml =
<solrUpdateIndex indexName="FgsIndex">
<inserted>ltk:13000116</inserted>
<counts insertTotal="1" updateTotal="0" deleteTotal="0" emptyTotal="0" docCount="854" warnCount="0"/>
</solrUpdateIndex>

DEBUG 2013-07-23 13:27:37,023 (GTransformer) xsltName=fgsconfigFinal/index/FgsIndex/updateIndexToResultPage
DEBUG 2013-07-23 13:27:37,027 (GTransformer) getTransformer transformer=org.apache.xalan.transformer.TransformerImpl@6561b973 uriResolver=null
DEBUG 2013-07-23 13:27:37,028 (GenericOperationsImpl) resultXml=<?xml version="1.0" encoding="UTF-8"?>
<resultPage operation="updateIndex" action="fromPid" value="ltk:13000116" repositoryName="FgsRepos" indexNames="" resultPageXslt="" dateTime="Tue Jul 23 13:27:36 UTC 2013">
<updateIndex xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:foxml="info:fedora/fedora-system:def/foxml#" xmlns:zs="http://www.loc.gov/zing/srw/" warnCount="0" docCount="854" deleteTotal="0" updateTotal="0" insertTotal="1" indexName="FgsIndex"/>
</resultPage>

INFO 2013-07-23 13:27:37,028 (UpdateListener) Index updated by notification message, returning:
<?xml version="1.0" encoding="UTF-8"?>
<resultPage operation="updateIndex" action="fromPid" value="ltk:13000116" repositoryName="FgsRepos" indexNames="" resultPageXslt="" dateTime="Tue Jul 23 13:27:36 UTC 2013">
<updateIndex xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:foxml="info:fedora/fedora-system:def/foxml#" xmlns:zs="http://www.loc.gov/zing/srw/" warnCount="0" docCount="854" deleteTotal="0" updateTotal="0" insertTotal="1" indexName="FgsIndex"/>
</resultPage>

thanks,

Alistair

--
mov eax,1
mov ebx,0
int 80h