Faceting within groups

2013-12-07 Thread Cool Techi
Hi,
I am not sure if faceting within groups is supported; the documentation does
seem to suggest it works, but I can't seem to get the intended results:
<str name="q">(Amazon Cloud) OR (IBM Cloud)</str>
<str name="group.field">sourceId</str>
<str name="facet.field">sentiment</str>
<str name="group">true</str>
<str name="group.facet">true</str>
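
For reference, the same request expressed as URL parameters (host and
collection name are placeholders):

http://localhost:8983/solr/collection1/select?q=(Amazon Cloud) OR (IBM Cloud)&group=true&group.field=sourceId&facet=true&facet.field=sentiment&group.facet=true

(Note that facet=true is needed to turn faceting on at all; group.facet=true
then asks for facet counts computed per matching group, based on the first
group.field, rather than per document.)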
Also, if it works, does SolrCloud support it?
Regards,
Ayush

Re: Function query matching

2013-12-07 Thread Peter Keegan
  But for your specific goal Peter: Yes, if the whole point of the function
  you have is to generate a scaled score of your base $qq, ...

Thanks for the confirmation, Chris. So, to do this efficiently, I think I
need to implement a custom Collector that performs the scaling (and other
math) after collecting the matching dismax query docs. I started a separate
thread asking about the state of configurable collectors.
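
(A minimal sketch of what such a collector might look like against the
Lucene 4.x Collector API -- the class, its helper type, and the min/max
rescale below are hypothetical illustrations, not Solr code:)

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

/** Collects matching docs with raw scores, then rescales after collection. */
public class ScalingCollector extends Collector {

  public static final class DocScore {
    public final int doc;   // index-wide docid
    public float score;
    DocScore(int doc, float score) { this.doc = doc; this.score = score; }
  }

  private final List<DocScore> hits = new ArrayList<DocScore>();
  private Scorer scorer;
  private int docBase;

  @Override
  public void setScorer(Scorer scorer) throws IOException {
    this.scorer = scorer;
  }

  @Override
  public void setNextReader(AtomicReaderContext context) throws IOException {
    this.docBase = context.docBase;  // translate per-segment ids to index-wide ids
  }

  @Override
  public void collect(int doc) throws IOException {
    hits.add(new DocScore(docBase + doc, scorer.score()));
  }

  @Override
  public boolean acceptsDocsOutOfOrder() {
    return true;  // we post-process the whole list anyway
  }

  /** Linearly rescale all collected scores into [min, max], like scale(). */
  public List<DocScore> scaledHits(float min, float max) {
    float lo = Float.POSITIVE_INFINITY, hi = Float.NEGATIVE_INFINITY;
    for (DocScore ds : hits) {
      lo = Math.min(lo, ds.score);
      hi = Math.max(hi, ds.score);
    }
    float range = (hi > lo) ? (hi - lo) : 1f;  // avoid divide-by-zero
    for (DocScore ds : hits) {
      ds.score = min + (max - min) * ((ds.score - lo) / range);
    }
    return hits;
  }
}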

Thanks,
Peter


On Sat, Dec 7, 2013 at 1:45 AM, Chris Hostetter hossman_luc...@fucit.org wrote:


 I had to do a double take when I read this sentence...

 : Even with any improvements to 'scale', all function queries will add a
 : linear increase to the Qtime as index size increases, since they match all
 : docs.

 ...because that smelled like either a bug in your methodology, or a bug in
 Solr.  To convince myself there wasn't a bug in Solr, I wrote a test case
 (I'll commit tomorrow; a bunch of churn in svn right now is making ant
 precommit unhappy) to prove that when wrapping boost functions around
 queries, Solr will only evaluate the functions for docs matching the
 wrapped query -- so there is no linear increase as the index size
 increases, just the (necessary) linear increase as the number of
 *matching* docs grows. (for most functions anyway -- as mentioned, scale
 is special).
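
 (To spell out why scale() is special: a function like

   scale(query($qq),0,1)

 cannot return a value for any single document until the min and max of
 query($qq) over every document are known, so the whole index must be
 visited no matter how few docs match.)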

 BUT! ... then I remembered how this thread started, and your goal of
 scaling the scores from a wrapped query.

 I want to be clear: for the 99% of people reading this, if you find
 yourself writing a query structure like this...

   q={!func}...functions involving wrapping $qq ...
  qq={!edismax ...lots of stuff but still only matching a subset of the index...}
  fq={!query v=$qq}

 ...try to restructure the math you want to do into the form of a
 multiplier

   q={!boost b=$b v=$qq}
   b=...functions producing a score multiplier...
  qq={!edismax ...lots of stuff but still only matching a subset of the index...}

 Because the latter case is much more efficient: Solr will only compute
 the function values for the docs it needs to (those that match the wrapped
 $qq query).
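
 For example, a hypothetical filled-in version of that second form (field
 names made up):

   q={!boost b=$b v=$qq}
   b=log(sum(popularity,1))
  qq={!edismax qf='title body'}ipod accessories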

 But for your specific goal Peter: Yes, if the whole point of the function
 you have is to generate a scaled score of your base $qq, then the
 function (wrapping the scale(), wrapping the query()) is going to have to
 be evaluated for every doc -- that will definitely be linear in the
 size of the index.



 -Hoss
 http://www.lucidworks.com/



Re: [Spellcheck] NullPointerException on QueryComponent.mergeIds

2013-12-07 Thread Jean-Marc Desprez
James,
Sorry for the late response.
The shards.qt parameter actually solved my problem!

Thanks
Jean-Marc


2013/11/12 Dyer, James james.d...@ingramcontent.com

 Jean-Marc,

 This might not solve the particular problem you're having, but to get
 spellcheck to work properly in a distributed environment, be sure to set
 the shards.qt parameter to the name of your request handler.  See
 http://wiki.apache.org/solr/SpellCheckComponent#Distributed_Search_Support.

 James Dyer
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: Jean-Marc Desprez [mailto:jm.desp...@gmail.com]
 Sent: Tuesday, November 12, 2013 8:57 AM
 To: solr-user@lucene.apache.org
 Subject: [Spellcheck] NullPointerException on QueryComponent.mergeIds

 Hello,

 I'm following this tutorial: http://wiki.apache.org/solr/SolrCloud with
 Solr 4.5.0.

 I'm at the very first step, with only two replicas and two shards, and I
 have only *one* document in the index.

 When I try to get a spellcheck, I get this error:
 java.lang.NullPointerException
 at

 org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:843)

 I do not understand what I'm doing wrong, or how I can get an error in
 mergeIds with only one document in the index (merge this doc with ... ??).

 Some technical details:
 URL:

 http://127.0.0.1:8983/solr/bench/select?shards.qt=ri_spell_fr_FR&q=sistem&distrib=true
 If I set distrib to false, no error.

 My uniqueKey is indexed and stored:

 <field name="ref" type="string" indexed="true" stored="true"
        multiValued="false" />
 <uniqueKey>ref</uniqueKey>


 My conf:
 <requestHandler name="ri_spell_fr_FR" class="solr.SearchHandler" lazy="true">
   <lst name="defaults">
     <bool name="spellcheck">true</bool>
     <str name="spellcheck.onlyMorePopular">true</str>
     <str name="spellcheck.extendedResults">true</str>
     <str name="spellcheck.collateExtendedResults">true</str>
     <str name="spellcheck.maxCollationTries">3</str>
     <str name="spellcheck.collate">true</str>
     <str name="spellcheck.count">5</str>
     <str name="spellcheck.dictionary">ri_spell_fr_FR</str>
     <str name="spellcheck.build">false</str>
   </lst>

   <arr name="components">
     <str>spellcheck_fr_FR</str>
   </arr>
 </requestHandler>

 <searchComponent name="spellcheck_fr_FR" class="solr.SpellCheckComponent">
   <str name="queryAnalyzerFieldType">suggest_fr_FR</str>

   <lst name="spellchecker">
     <str name="name">ri_spell_fr_FR</str>
     <str name="field">spell_fr_FR</str>
     <str name="spellcheckIndexDir">./spellchecker_fr_FR</str>
     <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
   </lst>

   ...

 </searchComponent>

 With this URL:
 http://127.0.0.1:8983/solr/bench/select?qt=ri_spell_fr_FR&q=sistem

 I have no error, but the response is empty:
 <response><lst name="responseHeader"><int name="status">0</int><int
 name="QTime">1</int></lst></response>


 Thanks
 Jean-Marc



luke 4.6.0 released

2013-12-07 Thread Dmitry Kan
Just released #luke 4.6.0 for the latest Lucene 4.6.0:
https://github.com/DmitryKey/luke/releases/tag/4.6.0

-- 
Dmitry Kan
Blog: http://dmitrykan.blogspot.com
Twitter: twitter.com/dmitrykan


Re: post filtering for boolean filter queries

2013-12-07 Thread Dmitry Kan
On Thu, Dec 5, 2013 at 4:49 PM, Yonik Seeley yo...@heliosearch.com wrote:

 On Thu, Dec 5, 2013 at 7:39 AM, Dmitry Kan solrexp...@gmail.com wrote:
  Thanks Erick!
  To be sure: we are using cost 101 and no cache. It seems to affect
  searches as we expected.
 
  Basically, with cache on we see fatter spikes around commit points, as
  the cache is getting flushed (we don't re-run too many entries from the
  old cache).
  But when post-filtering is involved, those spikes are thinner, but the
  rest of the queries take about 2 seconds longer (our queries are pretty
  heavy-duty stuff).
 
  So post-filtering gives an option of making trade-offs between query
  times for all users during normal execution and query times during
  commits.
  To rephrase, we have 2 options:
 
  1. Make all searches somewhat slower for all users and avoid really slow
  searches around commit points: the post-filtering option

  OR

  2. Make the majority of searches really fast, but really slow around
  commit points: the normal, cached option

 OR

 3. Use warming queries or auto-warming of caches to make all searches fast
 but the commits themselves slow.


Thanks Yonik. This is indeed what we tried originally. But, as I briefly
described at Stump the Chump in Dublin, auto-warming takes way too long and
does not complete even within an hour. So the next commit kicks in, and so
on. So we opted for an external automatic warming.
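
(For context, the option 3 Yonik mentions uses the warming hooks in
solrconfig.xml; a minimal sketch, with a placeholder query:)

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">some heavy representative query</str>
    </lst>
  </arr>
</listener>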



 -Yonik
 http://heliosearch.com -- making solr shine




-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: twitter.com/dmitrykan


Re: post filtering for boolean filter queries

2013-12-07 Thread Dmitry Kan
bq. How slow is around commit points really slow? You could at least lessen
the pain here by committing less often if you can stand the latency

They are shamelessly slow, like 60-70 seconds, while normal searches are
within the 1-3 second range. And yes, your idea is right and is what we are
pursuing: fewer commits. However, we do have shards that are hot because we
need to keep them that hot, i.e. we commit as often as data arrives. This
is where the slow searches pop up.

bq. Often users are more disturbed by getting (numbers from thin air) 2 second
responses occasionally spiking to 20 seconds with an average of 3 seconds
than getting all responses between 4 and 6 seconds with an average of 5.

Yes, I believe so too. So at the moment, the call between post-filtering
and caching is more or less for the business folks to make. We have been
looking into other things, like making our shards as small as possible.
This is a parallel route to making our cache efficient.
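
(For reference, the local-params form being discussed, with a made-up
filter query: cache=false disables caching, cost orders non-cached filters
cheapest-first, and a cost of 100 or more additionally requests
post-filtering for query types that support it.)

fq={!cache=false cost=101}category:(legal OR finance) AND region:emea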

Thanks,
Dmitry


On Thu, Dec 5, 2013 at 3:59 PM, Erick Erickson erickerick...@gmail.com wrote:

 bq: To be sure we are using cost 101 and no cache

 The guy who wrote the code is really good, but I'm paranoid too so I use
 101. Based on the number of off-by-one errors I've coded :)...

 How slow is "around commit points really slow"? You could at least lessen
 the pain here by committing less often, if you can stand the latency...

 But otherwise you've pretty much nailed your options. One approach is to
 give users _predictable_ responses, not necessarily the best average. Often
 users are more disturbed by getting (numbers from thin air) 2-second
 responses occasionally spiking to 20 seconds with an average of 3 seconds,
 than getting all responses between 4 and 6 seconds with an average of 5.

 FWIW,
 Erick


 On Thu, Dec 5, 2013 at 7:39 AM, Dmitry Kan solrexp...@gmail.com wrote:

  Thanks Erick!
   To be sure: we are using cost 101 and no cache. It seems to affect
   searches as we expected.
 
   Basically, with cache on we see fatter spikes around commit points, as
   the cache is getting flushed (we don't re-run too many entries from the
   old cache).
   But when post-filtering is involved, those spikes are thinner, but the
   rest of the queries take about 2 seconds longer (our queries are pretty
   heavy-duty stuff).
 
   So post-filtering gives an option of making trade-offs between query
   times for all users during normal execution and query times during
   commits.
   To rephrase, we have 2 options:
 
   1. Make all searches somewhat slower for all users and avoid really slow
   searches around commit points: the post-filtering option
  
   OR
  
   2. Make the majority of searches really fast, but really slow around
   commit points: the normal, cached option
 
  Dmitry
 
 
  On Wed, Dec 4, 2013 at 3:34 PM, Erick Erickson erickerick...@gmail.com wrote:
 
   OK, so cache=false and cost=100 should do it, see:
   http://searchhub.org/2012/02/22/custom-security-filtering-in-solr/
  
   Best,
   Erick
  
  
 On Wed, Dec 4, 2013 at 5:56 AM, Dmitry Kan solrexp...@gmail.com wrote:
  
Thanks Yonik.
   
 For our use case, we would like to skip caching for only one particular
 filter, yet apply a high cost to it to make sure it executes last of all
 the filter queries.

 So this means the rest of the fqs will execute and cache as usual.
   
   
   
   
 On Tue, Dec 3, 2013 at 9:58 PM, Yonik Seeley yo...@heliosearch.com wrote:
   
  On Tue, Dec 3, 2013 at 4:45 AM, Dmitry Kan solrexp...@gmail.com wrote:
   ok, we were able to confirm the behavior regarding not caching the
   filter query. It works as expected. It does not cache with
   {!cache=false}.

   We are still looking into clarifying the cost assignment: i.e. whether
   it works as expected for long boolean filter queries.

  Yes, filters should be ordered by cost (cheapest first) whenever you
  use {!cache=false}

 -Yonik
 http://heliosearch.com -- making solr shine

   
   
   
--
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: twitter.com/dmitrykan
   
  
 
 
 
  --
  Dmitry
  Blog: http://dmitrykan.blogspot.com
  Twitter: twitter.com/dmitrykan
 




-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: twitter.com/dmitrykan


simple tokenizer question

2013-12-07 Thread Vulcanoid Developer
Hi,

I am new to Solr, and I guess this is a basic tokenizer question, so please
bear with me.

I am trying to use Solr to index a few (Indian) legal judgments in text
form and search against them. One of the key points with these documents is
that the sections/provisions of law usually have punctuation/special
characters in them. For example, search queries will TYPICALLY be section
12AA, section 80-IA, or section 9(1)(vii), and the text of the judgments
themselves contains this sort of text, with section references all over
the place.

Now, using a default schema setup with StandardTokenizer, which seems to
delimit on whitespace AND punctuation, I get really bad results, because it
looks like 12AA is split, and results merely having 12 and AA in them turn
up. It becomes worse with 9(1)(vii), with results containing 9 and 1 etc.
being turned up.

What is the best solution here? I really just want to index the document
as-is and also to do whitespace tokenizing on the search and nothing more.

So in other words:
a) I would like the text document to be indexed as-is with say 12AA and
9(1)(vii) in the document stored as it is mentioned.
b) I would like to be able to search for 12AA and for 9(1)(vii) and get
proper full matches on them without any splitting up/munging etc.

Any suggestions are appreciated.  Thank you for your time.

Thanks
Vulcanoid


Re: Function query matching

2013-12-07 Thread Chris Hostetter

(This is why I shouldn't send emails just before going to bed.)

I woke up this morning realizing that of course I was completely wrong
when I said this...

: I want to be clear: for the 99% of people reading this, if you find
: yourself writing a query structure like this...
: 
:   q={!func}..functions involving wrapping $qq ...
...
: ...try to restructure the math you want to do into the form of a
: multiplier
... 
: Because the latter case is much more efficient and Solr will only compute
: the function values for the docs it needs to (that match the wrapped $qq
: query)

The reason i was wrong...

Even though function queries do, by default, match all documents, and even
if the main query is a function query (i.e. q={!func}...), if there is
an fq that filters down the set of documents, then the (main) function
query will only be calculated for the documents that match the filter.
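
In concrete terms (a hypothetical example -- the field names are made up):
with

  q={!func}product(query($qq),popularity)
 fq={!query v=$qq}
 qq={!edismax qf=title}ipod

the product(...) is only evaluated for the docs that match the fq, i.e.
the docs matching $qq.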

It was trivial to amend the test I mentioned last night to show this (and
I feel silly for not doing that last night and stopping myself from saying
something foolish)...

  https://svn.apache.org/viewvc?view=revision&revision=r1548955

The bottom line for Peter is still the same: using scale() wrapped around
a function/query does involve computing the results for every document,
and that is going to scale linearly as the size of the index grows -- but
it is *only* because of the scale function.



-Hoss
http://www.lucidworks.com/


solr.xml

2013-12-07 Thread William Bell
We are having issues with SWAP in CoreAdmin in 4.5.1 and 4.6.

Using legacy solr.xml we issue a SWAP, and we want it to be persistent. It
had been running flawlessly since 4.5. Now it creates duplicate lines in
solr.xml.

Even the example multi-core schema in 4.5.1 doesn't work with
persistent=true - it creates duplicate lines in solr.xml.
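
For context, a sketch of the kind of SWAP call we issue (host and core
names are examples):

http://localhost:8983/solr/admin/cores?action=SWAP&core=linesvcgeo&other=linesvcgeofull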

<cores adminPath="/admin/cores">
  <core name="autosuggest" loadOnStartup="true" instanceDir="autosuggest" transient="false"/>
  <core name="citystateprovider" loadOnStartup="true" instanceDir="citystateprovider" transient="false"/>
  <core name="collection1" loadOnStartup="true" instanceDir="collection1" transient="false"/>
  <core name="facility" loadOnStartup="true" instanceDir="facility" transient="false"/>
  <core name="inactiveproviders" loadOnStartup="true" instanceDir="inactiveproviders" transient="false"/>
  <core name="linesvcgeo" instanceDir="linesvcgeo" loadOnStartup="true" transient="false"/>
  <core name="linesvcgeofull" instanceDir="linesvcgeofull" loadOnStartup="true" transient="false"/>
  <core name="locationgeo" loadOnStartup="true" instanceDir="locationgeo" transient="false"/>
  <core name="market" loadOnStartup="true" instanceDir="market" transient="false"/>
  <core name="portalprovider" loadOnStartup="true" instanceDir="portalprovider" transient="false"/>
  <core name="practice" loadOnStartup="true" instanceDir="practice" transient="false"/>
  <core name="provider" loadOnStartup="true" instanceDir="provider" transient="false"/>
  <core name="providersearch" loadOnStartup="true" instanceDir="providersearch" transient="false"/>
  <core name="tridioncomponents" loadOnStartup="true" instanceDir="tridioncomponents" transient="false"/>
  <core name="linesvcgeo" instanceDir="linesvcgeo" loadOnStartup="true" transient="false"/>
  <core name="linesvcgeofull" instanceDir="linesvcgeofull" loadOnStartup="true" transient="false"/>
</cores>


-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: simple tokenizer question

2013-12-07 Thread Upayavira
Have you tried a WhitespaceTokenizerFactory followed by the
WordDelimiterFilterFactory? The latter is quite configurable in what it
does. Alternatively, you could use a PatternReplaceFilterFactory to
remove extraneous punctuation that wasn't removed by the whitespace
tokenizer.
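
A minimal, untested sketch of such a chain -- the flag settings are one
plausible starting point, not a recommendation:

<fieldType name="text_legal" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- split on whitespace only, so tokens like 9(1)(vii) survive intact -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- keep the original token, but also index sub-parts for looser matching -->
    <filter class="solr.WordDelimiterFilterFactory"
            preserveOriginal="1"
            generateWordParts="1" generateNumberParts="1"
            splitOnNumerics="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>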

Upayavira

On Sat, Dec 7, 2013, at 06:15 PM, Vulcanoid Developer wrote:
 Hi,
 
 I am new to Solr, and I guess this is a basic tokenizer question, so please
 bear with me.
 
 I am trying to use Solr to index a few (Indian) legal judgments in text
 form and search against them. One of the key points with these documents is
 that the sections/provisions of law usually have punctuation/special
 characters in them. For example, search queries will TYPICALLY be section
 12AA, section 80-IA, or section 9(1)(vii), and the text of the judgments
 themselves contains this sort of text, with section references all over
 the place.
 
 Now, using a default schema setup with StandardTokenizer, which seems to
 delimit on whitespace AND punctuation, I get really bad results, because it
 looks like 12AA is split, and results merely having 12 and AA in them turn
 up. It becomes worse with 9(1)(vii), with results containing 9 and 1 etc.
 being turned up.
 
 What is the best solution here? I really just want to index the document
 as-is and also to do whitespace tokenizing on the search and nothing
 more.
 
 So in other words:
 a) I would like the text document to be indexed as-is with say 12AA and
 9(1)(vii) in the document stored as it is mentioned.
 b) I would like to be able to search for 12AA and for 9(1)(vii) and get
 proper full matches on them without any splitting up/munging etc.
 
 Any suggestions are appreciated.  Thank you for your time.
 
 Thanks
 Vulcanoid


How to boost documents with all the query terms

2013-12-07 Thread Ing. Jorge Luis Betancourt Gonzalez
Hi:

I'm using Solr 3.6 with the dismax query parser, and I've found that docs that
don't have all the query terms get ranked above others that contain all the
terms in the search query. Using debugQuery I could see that most of the score
in these cases comes from the coord(q,d) factor. Is there any way I could boost
the documents that contain all the search query terms?

Greetings!



Re: How to boost documents with all the query terms

2013-12-07 Thread Ahmet Arslan
Hi, Jorge,

Here is a similar discussion : http://search-lucene.com/m/nK6t9j1fuc2/



On Sunday, December 8, 2013 2:48 AM, Ing. Jorge Luis Betancourt Gonzalez 
jlbetanco...@uci.cu wrote:
 
Hi:

I'm using Solr 3.6 with the dismax query parser, and I've found that docs that
don't have all the query terms get ranked above others that contain all the
terms in the search query. Using debugQuery I could see that most of the score
in these cases comes from the coord(q,d) factor. Is there any way I could boost
the documents that contain all the search query terms?

Greetings!
