Meta-search by subclassing SearchHandler

2013-08-15 Thread Dan Davis
I am considering enabling a true "Federated Search", or meta-search, using
the following basic configuration (this configuration is only for
development and evaluation):

Three Solr cores:

   - One to search data I have indexed locally
   - One with a custom SearchHandler that is a facade, i.e. it performs a
   meta-search (aka Federated Search)
   - One that queries and merges the above cores as "shards"

Lest I seem completely like Sauron, I read
http://2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/files/AndrzejBialecki-Buzzwords-2011_0.pdf
and am familiar with evaluating "precision at 10", etc. although I am no
doubt less familiar with IR than many.

I think that it is much, much better for performance and relevancy to index
it all on a level playing field.  But my employer cannot do that, because
we do not have a license to all the data we may wish to search in the
future.

My questions are simple: has anybody implemented such a SearchHandler that
is a facade for another search engine? How would I get started with that?

I have made a similar post on the blacklight developers Google group.
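For concreteness, a rough sketch of the kind of facade I have in mind
(untested, assuming the Solr 4.x plugin API; queryRemoteEngine and the field
names are placeholders for whatever client actually talks to the remote
engine):

import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.params.CommonParams;
import org.apache.solr.handler.component.SearchHandler;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;

/** Facade core: answers searches by querying a remote engine instead of a local index. */
public class MetaSearchHandler extends SearchHandler {

    @Override
    public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
            throws Exception {
        String q = req.getParams().get(CommonParams.Q);
        // Translate the remote engine's hits into Solr documents so this core
        // can be queried alongside the locally indexed core.
        SolrDocumentList results = queryRemoteEngine(q);
        rsp.add("response", results);
    }

    // Placeholder: call the remote engine's API here and map each hit
    // (id, title, score, ...) onto a SolrDocument.
    private SolrDocumentList queryRemoteEngine(String q) {
        SolrDocumentList list = new SolrDocumentList();
        SolrDocument doc = new SolrDocument();
        doc.setField("id", "remote-1");
        doc.setField("title", "remote hit for: " + q);
        list.add(doc);
        list.setNumFound(list.size());
        return list;
    }
}

I realize true "shards" merging would also require honoring the distributed
protocol (isShard, ids, etc.), which this sketch glosses over.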


Re: SolrCloud Load Balancer "weight"

2013-08-15 Thread Tim Vaillancourt

Soon ended up being a while :), feel free to add any thoughts.

https://issues.apache.org/jira/browse/SOLR-5166

Tim

On 07/06/13 03:07 PM, Vaillancourt, Tim wrote:

Cool!

Having those values influenced by stats is a neat idea too. I'll get on that 
soon.

Tim

-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: Monday, June 03, 2013 5:07 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud Load Balancer "weight"


On Jun 3, 2013, at 3:33 PM, Tim Vaillancourt  wrote:


Should I JIRA this? Thoughts?

Yeah - it's always been in the back of my mind - it's come up a few times - 
eventually we would like nodes to report some stats to zk to influence load 
balancing.

- mark


Where is the webapps directory of servlet container?

2013-08-15 Thread Kamaljeet Kaur

Hello,
The reference guide says, "Copy the solr.war file from the Solr distribution
to the webapps directory of your servlet container."

I can't find the webapps directory of my Java servlet container. Please
help.

In the next step it's written: "Copy the Solr Home directory
apache-solr-4.x.0/example/solr/ from the distribution to your desired Solr
Home location. "

Which Solr Home directory do they refer to? And where is the desired Solr
Home location? Please tell.
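(If Tomcat were the servlet container, I assume the steps would look roughly
like this, though the paths are guesses for my setup:

cp apache-solr-4.x.0/dist/apache-solr-4.x.0.war $CATALINA_HOME/webapps/solr.war
cp -r apache-solr-4.x.0/example/solr /path/to/solr/home

where /path/to/solr/home is whatever Solr Home location I choose.)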





is indirection possible?

2013-08-15 Thread Sandy Mustard

I am relatively new to the use of Solr.

I have a set of documents with fields that contain the id of other 
documents.  Is it possible to specify a query that will return the 
related documents?


Doc 1   id=444, name=First Document, pointer=777
Doc 2   id=777, name=Next Document, pointer=555


I would like to query for Doc 1 and also get Doc 2 returned.
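One direction I have seen mentioned (assuming Solr 4.x, which added the
{!join} query parser) would be something like:

q=id:444 OR _query_:"{!join from=pointer to=id}id:444"

where the join clause returns the documents whose id appears in the pointer
field of documents matching id:444 (i.e. Doc 2). Would that be the right
approach?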

Thanks,
Sandy Mustard


Re: AND not working

2013-08-15 Thread Steven Bower
https://issues.apache.org/jira/browse/SOLR-5163


On Thu, Aug 15, 2013 at 6:04 PM, Steven Bower  wrote:

> @Yonik that was exactly the issue... I'll file a ticket... there def
> should be an exception thrown for something like this..
>
> It would seem to me that eating any sort of exception is a really bad
> thing...
>
> steve
>
>
> On Thu, Aug 15, 2013 at 5:59 PM, Yonik Seeley wrote:
>
>> I can reproduce something like this by specifying a field that doesn't
>> exist for a "qf" param.
>> This seems like a bug... if a field doesn't exist, we should throw an
>> exception (since it's a parameter error not related to the "q" string
>> where
>> we avoid throwing any errors).
>>
>> -Yonik
>> http://lucidworks.com
>>
>>
>> On Thu, Aug 15, 2013 at 5:19 PM, Steven Bower 
>> wrote:
>>
>> > I have a query like:
>> >
>> > q=foo AND bar
>> > defType=edismax
>> > qf=field1
>> > qf=field2
>> > qf=field3
>> >
>> > with debug on I see it parsing to this:
>> >
>> > (+(DisjunctionMaxQuery((field1:foo | field2:foo | field3:foo))
>> > DisjunctionMaxQuery((field1:and | field2:and | field3:and))
>> > DisjunctionMaxQuery((field1:bar | field2:bar | field3:bar))))/no_coord
>> >
>> > basically it seems to be treating the AND as a term... any thoughts?
>> >
>> > thx,
>> >
>> > steve
>> >
>>
>
>


Re: AND not working

2013-08-15 Thread Steven Bower
@Yonik that was exactly the issue... I'll file a ticket... there def should
be an exception thrown for something like this..

It would seem to me that eating any sort of exception is a really bad
thing...

steve


On Thu, Aug 15, 2013 at 5:59 PM, Yonik Seeley  wrote:

> I can reproduce something like this by specifying a field that doesn't
> exist for a "qf" param.
> This seems like a bug... if a field doesn't exist, we should throw an
> exception (since it's a parameter error not related to the "q" string where
> we avoid throwing any errors).
>
> -Yonik
> http://lucidworks.com
>
>
> On Thu, Aug 15, 2013 at 5:19 PM, Steven Bower 
> wrote:
>
> > I have a query like:
> >
> > q=foo AND bar
> > defType=edismax
> > qf=field1
> > qf=field2
> > qf=field3
> >
> > with debug on I see it parsing to this:
> >
> > (+(DisjunctionMaxQuery((field1:foo | field2:foo | field3:foo))
> > DisjunctionMaxQuery((field1:and | field2:and | field3:and))
> > DisjunctionMaxQuery((field1:bar | field2:bar | field3:bar))))/no_coord
> >
> > basically it seems to be treating the AND as a term... any thoughts?
> >
> > thx,
> >
> > steve
> >
>


Re: AND not working

2013-08-15 Thread Yonik Seeley
I can reproduce something like this by specifying a field that doesn't
exist for a "qf" param.
This seems like a bug... if a field doesn't exist, we should throw an
exception (since it's a parameter error not related to the "q" string where
we avoid throwing any errors).

-Yonik
http://lucidworks.com


On Thu, Aug 15, 2013 at 5:19 PM, Steven Bower  wrote:

> I have a query like:
>
> q=foo AND bar
> defType=edismax
> qf=field1
> qf=field2
> qf=field3
>
> with debug on I see it parsing to this:
>
> (+(DisjunctionMaxQuery((field1:foo | field2:foo | field3:foo))
> DisjunctionMaxQuery((field1:and | field2:and | field3:and))
> DisjunctionMaxQuery((field1:bar | field2:bar | field3:bar))))/no_coord
>
> basically it seems to be treating the AND as a term... any thoughts?
>
> thx,
>
> steve
>


Re: AND not working

2013-08-15 Thread Jack Krupansky
If something causes an exception when edismax calls the basic Lucene
query parser, edismax will eat the exception, escape the operators (your "AND"),
and reparse, at which point your "AND" gets treated as a simple term.
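
For comparison, when all the qf fields exist, the same query should show the
AND enforced as required clauses, roughly:

(+(+DisjunctionMaxQuery((field1:foo | field2:foo | field3:foo))
+DisjunctionMaxQuery((field1:bar | field2:bar | field3:bar))))/no_coord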


Maybe some difficulty in your field type analyzer?

We need to see your full query URL, request handler, and field types.

-- Jack Krupansky

-Original Message- 
From: Steven Bower

Sent: Thursday, August 15, 2013 5:19 PM
To: solr-user
Subject: AND not working

I have a query like:

q=foo AND bar
defType=edismax
qf=field1
qf=field2
qf=field3

with debug on I see it parsing to this:

(+(DisjunctionMaxQuery((field1:foo | field2:foo | field3:foo))
DisjunctionMaxQuery((field1:and | field2:and | field3:and))
DisjunctionMaxQuery((field1:bar | field2:bar | field3:bar))))/no_coord

basically it seems to be treating the AND as a term... any thoughts?

thx,

steve 



Re: list docs with geo location info

2013-08-15 Thread Mingfeng Yang
Figured it out: using author_geo:[* TO *] will do the trick.
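
For the record, the earlier curl then becomes something like this (with the
range query URL-escaped):

curl 'localhost/solr/select?q=*:*&rows=10&wt=json&indent=true&fq=author_geo:%5B*%20TO%20*%5D&fl=author_geo'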




On Thu, Aug 15, 2013 at 1:26 PM, Mingfeng Yang wrote:

> I have a schema with a geolocation field named "author_geo" defined as
>
>   stored="true" />
>
> How can I list docs whose author_geo fields are not empty?
>
> Seems the filter query "fq=author_geo:*" does not work as it does for other
> fields which are string, text, or float type.
>
> curl
> 'localhost/solr/select?q=*:*&rows=10&wt=json&indent=true&fq=author_geo:*&fl=author_geo'
>
> What's the right way of doing it?
>
> Thanks,
> Mingfeng
>
>


Re: Synonym Expansion in Spellchecking Field Solr 4.3.1

2013-08-15 Thread Brendan Grainger
Hi All,

I didn't have the lucene-solr source compiling cleanly in Eclipse
initially, so I created a very quick Maven project to demonstrate this issue:

https://github.com/rainkinz/solr_spellcheck_index_out_of_bounds.git

Having said that I just got everything set up in eclipse, so I can create a
test case if this is actually an issue and not something weird with my
configuration.

Thanks
Brendan



On Thu, Aug 15, 2013 at 1:43 PM, Brendan Grainger <
brendan.grain...@gmail.com> wrote:

> Further to this. If I change:
>
> tpms,service tire monitor,tire monitor,tire pressure monitor,tire pressure
> monitoring system,tpm,low tire warning,tire pressure monitor system
>
> to
>
> service tire monitor,tire monitor,tire pressure monitor,tire pressure
> monitoring system,tpm,low tire warning,tire pressure monitor system,tpms
>
> I don't get a crash. I tried it with some other synonym entries too, e.g.:
>
> asdm,airbag system diagnostic module => crash
>
> airbag system diagnostic module,asdm => no crash
>
> Thanks
> Brendan
>
>
>
> On Thu, Aug 15, 2013 at 1:37 PM, Brendan Grainger <
> brendan.grain...@gmail.com> wrote:
>
>> Hi All,
>>
>> I've been debugging an issue where the query 'tpms' would make the
>> spellchecker throw the following exception:
>>
>> 21021 [qtp91486057-17] ERROR org.apache.solr.servlet.SolrDispatchFilter
>>  – null:java.lang.StringIndexOutOfBoundsException: String index out of
>> range: -1
>>  at
>> java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:789)
>> at java.lang.StringBuilder.replace(StringBuilder.java:266)
>>  at
>> org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:190)
>> at
>> org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:75)
>>
>>
>> I have the following synonyms defined for tpms:
>>
>> tpms,service tire monitor,tire monitor,tire pressure monitor,tire
>> pressure monitoring system,tpm,low tire warning,tire pressure monitor system
>>
>> Note that if you query any of the other synonyms there is no issue, only
>> tpms.
>>
>> Looking at my field definition for my spellchecker I realized I am doing
>> query time synonym expansion:
>>
>> > positionIncrementGap="100" omitNorms="true">
>>   
>> 
>> > ignoreCase="true"
>> words="lang/stopwords_en.txt"
>> enablePositionIncrements="true"
>> />
>> 
>> 
>>   
>>   
>> 
>> > ignoreCase="true" expand="true"/>
>> > ignoreCase="true"
>> words="lang/stopwords_en.txt"
>> enablePositionIncrements="true"
>> />
>> 
>> 
>>   
>> 
>>
>> I copied this field definition from:
>> http://wiki.apache.org/solr/SpellCheckingAnalysis. As the issue seemed
>> related to synonyms I removed the SynonymFilterFactory and everything
>> works.
>>
>> I'm going to try to create a reproducible test case for the crash, but
>> right now I'm wondering what I lose by not having synonym expansion when
>> spell checking?
>>
>> Thanks
>>  Brendan
>>
>>
>>
>
>
> --
> Brendan Grainger
> www.kuripai.com
>



-- 
Brendan Grainger
www.kuripai.com


AND not working

2013-08-15 Thread Steven Bower
I have a query like:

q=foo AND bar
defType=edismax
qf=field1
qf=field2
qf=field3

with debug on I see it parsing to this:

(+(DisjunctionMaxQuery((field1:foo | field2:foo | field3:foo))
DisjunctionMaxQuery((field1:and | field2:and | field3:and))
DisjunctionMaxQuery((field1:bar | field2:bar | field3:bar))))/no_coord

basically it seems to be treating the AND as a term... any thoughts?

thx,

steve


list docs with geo location info

2013-08-15 Thread Mingfeng Yang
I have a schema with a geolocation field named "author_geo" defined as

 

How can I list docs whose author_geo fields are not empty?

Seems the filter query "fq=author_geo:*" does not work as it does for other
fields which are string, text, or float type.

curl
'localhost/solr/select?q=*:*&rows=10&wt=json&indent=true&fq=author_geo:*&fl=author_geo'

What's the right way of doing it?

Thanks,
Mingfeng


Re: Indexoutofbounds size: 9 index: 8 with data import handler

2013-08-15 Thread eShard
Ok, these errors seem to be caused by passing incorrect parameters in a
search query.
Such as: spellcheck=extendedResults=true 
instead of 
spellcheck.extendedResults=true

Thankfully, it seems to have nothing to do with the DIH at all.





Large cache settings values - sanity check

2013-08-15 Thread Eoghan Ó Carragáin
Hi,

I’m involved in an open source project called Vufind, which uses Solr to
search across library catalogue records [1].

The project uses what seem to be very high default cache settings in
solrconfig.xml [2]:

   - filterCache (size="30" initialSize="30" autowarmCount="5")
   - queryResultCache (size="10" initialSize="10" autowarmCount="5")
   - documentCache (size="5" initialSize="5")


These settings haven’t been reviewed since early in the project history (c.
2007) but came up in a recent discussion around out-of-memory issues and
garbage collection.

Of course decisions on cache configuration (along with jvm settings,
sharding etc) vary depending on the instance (index size, query/sec etc),
but I wanted to run these values past this list as a sanity check for what
you’d consider good default settings, given that most adopters of the
software will not touch the defaults.

Some characteristics of library data & Vufind’s schema [3] which may have a
bearing on the issue:

   - quite a few facet fields & filtering (~12 facets configured by default)
   - high number of unique facet values (e.g. several hundred thousand in a
     facet field for authors or subjects)
   - most libraries would do only one or two incremental commits a day (which
     may justify high auto-warming settings, since the next commit isn’t for 24
     hours)
   - sorting: relevance by default, but other options configured by default
     (title, author, callnumber, year, etc.)
   - mostly small, sparse documents (MARC records containing title, author,
     description etc. but no full-text content)
   - quite a few stored fields, including a field which stores the full MARC
     record for additional parsing by the application
   - average number of documents for most adopters probably somewhere between
     500K and 2 million MARC records (Vufind has several adopters with up to 50m
     full-text docs, but these make considerable customisations to their Solr
     setup)
   - query/sec will vary from library to library, but shouldn't be anything
     too taxing for most adopters


Do the current cache settings make sense in this context, or should we
consider dropping back to the much lower values given in the Solr example
and wiki?
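
For reference, the stock Solr 4 example solrconfig.xml ships with values
along these lines (quoted from memory, so double-check against your
distribution):

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
<documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>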

Many thanks

Eoghan


[1] vufind.org

[2]
https://github.com/vufind-org/vufind/blob/master/solr/biblio/conf/solrconfig.xml
[3]
https://github.com/vufind-org/vufind/blob/master/solr/biblio/conf/schema.xml


Re: Synonym Expansion in Spellchecking Field Solr 4.3.1

2013-08-15 Thread Brendan Grainger
Further to this. If I change:

tpms,service tire monitor,tire monitor,tire pressure monitor,tire pressure
monitoring system,tpm,low tire warning,tire pressure monitor system

to

service tire monitor,tire monitor,tire pressure monitor,tire pressure
monitoring system,tpm,low tire warning,tire pressure monitor system,tpms

I don't get a crash. I tried it with some other synonym entries too, e.g.:

asdm,airbag system diagnostic module => crash

airbag system diagnostic module,asdm => no crash

Thanks
Brendan



On Thu, Aug 15, 2013 at 1:37 PM, Brendan Grainger <
brendan.grain...@gmail.com> wrote:

> Hi All,
>
> I've been debugging an issue where the query 'tpms' would make the
> spellchecker throw the following exception:
>
> 21021 [qtp91486057-17] ERROR org.apache.solr.servlet.SolrDispatchFilter  –
> null:java.lang.StringIndexOutOfBoundsException: String index out of range:
> -1
>  at
> java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:789)
> at java.lang.StringBuilder.replace(StringBuilder.java:266)
>  at
> org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:190)
> at
> org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:75)
>
>
> I have the following synonyms defined for tpms:
>
> tpms,service tire monitor,tire monitor,tire pressure monitor,tire pressure
> monitoring system,tpm,low tire warning,tire pressure monitor system
>
> Note that if you query any of the other synonyms there is no issue, only
> tpms.
>
> Looking at my field definition for my spellchecker I realized I am doing
> query time synonym expansion:
>
>  positionIncrementGap="100" omitNorms="true">
>   
> 
>  ignoreCase="true"
> words="lang/stopwords_en.txt"
> enablePositionIncrements="true"
> />
> 
> 
>   
>   
> 
>  ignoreCase="true" expand="true"/>
>  ignoreCase="true"
> words="lang/stopwords_en.txt"
> enablePositionIncrements="true"
> />
> 
> 
>   
> 
>
> I copied this field definition from:
> http://wiki.apache.org/solr/SpellCheckingAnalysis. As the issue seemed
> related to synonyms I removed the SynonymFilterFactory and everything
> works.
>
> I'm going to try to create a reproducible test case for the crash, but
> right now I'm wondering what I lose by not having synonym expansion when
> spell checking?
>
> Thanks
> Brendan
>
>
>


-- 
Brendan Grainger
www.kuripai.com


Synonym Expansion in Spellchecking Field Solr 4.3.1

2013-08-15 Thread Brendan Grainger
Hi All,

I've been debugging an issue where the query 'tpms' would make the
spellchecker throw the following exception:

21021 [qtp91486057-17] ERROR org.apache.solr.servlet.SolrDispatchFilter  –
null:java.lang.StringIndexOutOfBoundsException: String index out of range:
-1
at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:789)
at java.lang.StringBuilder.replace(StringBuilder.java:266)
at
org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:190)
at
org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:75)


I have the following synonyms defined for tpms:

tpms,service tire monitor,tire monitor,tire pressure monitor,tire pressure
monitoring system,tpm,low tire warning,tire pressure monitor system

Note that if you query any of the other synonyms there is no issue, only
tpms.

Looking at my field definition for my spellchecker I realized I am doing
query time synonym expansion:


  




  
  





  


I copied this field definition from:
http://wiki.apache.org/solr/SpellCheckingAnalysis. As the issue seemed
related to synonyms I removed the SynonymFilterFactory and everything
works.

I'm going to try to create a reproducible test case for the crash, but
right now I'm wondering what I lose by not having synonym expansion when
spell checking?

Thanks
Brendan


AUTO: Siobhan Roche is out of the office (returning 26/08/2013)

2013-08-15 Thread Siobhan Roche

I am out of the office until 26/08/2013.

I will respond to your query on my return.
For any urgent queries, please contact Andrew Morrison.
Thanks
Siobhan


Note: This is an automated response to your message  "Re: SOLR4 Spatial
sorting and query string" sent on 15/08/2013 16:10:07.

This is the only notification you will receive while this person is away.



Re: SOLR4 Spatial sorting and query string

2013-08-15 Thread David Smiley (@MITRE.org)
Hi Roy,

You'll have to calculate this client-side.  I am aware of this conundrum and
I put up a TODO JIRA item for it months ago:
https://issues.apache.org/jira/browse/SOLR-4633
It actually shouldn't be that hard to do.

~ David


roySolr wrote
> Hello David,
> 
> The first months there will be not that many points in a doc, i will keep
> the topic in mind!
> 
> The next step is that i want to now which location matched my query.
> Example:
> 
> Product A is available in 3 stores, the doc looks like this:
> /
> Product A
>
>   store1_geo
>   store2_geo
>   store3_geo
>
>
> London#store1_geo
> Amsterdam#store2_geo
> New York#store3_geo
>
> /
> 
> I query the index with my location set to Berlin and a radius of 250 km. I
> know that this result comes back in first place because it's close to
> Amsterdam (store2_geo). But how can I know which one matched my
> query as the closest point? Is it possible to get this back? I can do it in my
> application, but with 200 stores in a doc I don't think it's the best
> solution.
> 
> Thanks,
> 
> Roy





-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book


Indexoutofbounds size: 9 index: 8 with data import handler

2013-08-15 Thread eShard
Good morning,
I'm using Solr 4.0 final on Tomcat 7.0.34 on Linux.
I created 3 new data import handlers to consume 3 RSS feeds.
They seemed to work perfectly.
However, today, I'm getting these errors:
10:42:17  SEVERE  SolrCore  java.lang.IndexOutOfBoundsException: Index: 9, Size: 8
10:42:17  SEVERE  SolrDispatchFilter  null:java.lang.IndexOutOfBoundsException: Index: 9, Size: 8
10:42:17  SEVERE  SolrCore  org.apache.solr.common.SolrException: Server at
https://search:7443/solr/Communities returned non ok status:500,
message:Internal Server Error
10:42:17  SEVERE  SolrDispatchFilter  null:org.apache.solr.common.SolrException: Server at
https://search/solr/Communities returned non ok status:500,
message:Internal Server Error

I read that this means the index is corrupt, so I deleted it and restarted;
then the same errors jumped to the next core with the DIH for the RSS feed.

How do I fix this?

Here's my dih in solrconfig.xml
  

dih-comm-feed.xml
 SemaAC

  

Here's the dih config




https://search/C3CommunityFeedDEV/";
processor="XPathEntityProcessor"
forEach="/rss/channel/item"
transformer="DateFormatTransformer">
















Here a partial of my schema
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   

   
   
   
  
   
   

   
   
   
   
   
   
   
  





Re: Question about filter query: "half" of my index is slower than the other?

2013-08-15 Thread Dmitry Kan
@Erick: thanks for sharing the knowledge on the hit ratio - evictions
interplay. Sounds quite reasonable.

Dmitry


On Sat, Aug 10, 2013 at 3:11 AM, Erick Erickson wrote:

> To add to what Shawn said, this filterCache is enormous. The key statistics
> are
> the hit ratio and evictions. Evictions aren't bad if the hit ratio is high.
> If hit ratio is
> low and evictions are high, only then should you consider making it larger.
> So
> I'd drop it back to 512.
>
> Hit ratios around 75% are my personal "too low" number, but YMMV...
>
> BUT, it's an LRU cache. So assuming you're forming a filter query for the
> two "sides" and that you append an fq clause to every query,
> you'll only need two entries. Plus, of course, other fqs.
>
> The first thing I'd do is only return 10 rows, turn off highlighting, and
> anything
> else that comes to mind. Then add them back and see which ones
> are causing you grief.
>
> Or add &debug=timing. That'll return a list of how much time each
> component takes and may give you a clue as well.
>
> Best
> Erick
>
>
> On Fri, Aug 9, 2013 at 1:55 PM, Shawn Heisey  wrote:
>
> > On 8/9/2013 9:36 AM, Neal Ensor wrote:
> >
> >> I have an 8 million document solr index, roughly divided down the middle
> >> by
> >> an identifying "product" value, one of two distinct values.  The
> documents
> >> in both "sides" are very similar, with stored text fields, etc.  I have
> >> two
> >> nearly identical request handlers, one for each "side".
> >>
> >> When I perform very similar queries on either "side" for random phrases,
> >> requesting 500 rows with highlighting on titles and summaries, I get
> very
> >> different results.  One "side" consistently returns results in around
> 1-2
> >> seconds, whereas the other one consistently returns in 6-10 seconds.  I
> >> don't see any reason why it's worse; each run of queries is deliberately
> >> randomized to avoid caches getting in the way.  Each test query returns
> >> the
> >> full first 500 in most cases.
> >>
> >
> >  My filter query cache configuration looks like:
> >>
> >>  >>   size="75"
> >>   initialSize="1"
> >>   autowarmCount="0"/>
> >>
> >
> > This filterCache is *enormous* ... even the initialSize is larger than I
> > would normally expect to see for the total size.  With 8 million
> documents,
> > each entry in the cache can be 1 megabyte, and in practice, the entry
> will
> > be either very small or it will be the full 1 megabyte ... depending on
> how
> > many documents get matched by a filter. This has the potential to chew
> up a
> > lot of RAM without really doing much for you.
> >
> > If the same problem happens when you drastically reduce the size of
> > filterCache, I suspect basic performance problems.  Even 1-2 seconds
> seems
> > very slow to me.
> >
> > The first questions I have are some statistics about your index and the
> > server you're running it on.  How big is that index in terms of disk
> space?
> >  How much RAM are you allocating to the JVM?  How much RAM is in the
> entire
> > machine?  Is the machine running software other than Solr, such as a web
> > server, database server, etc?  What operating system are you running on,
> is
> > it 64 bit, and is Java 64 bit?
> >
> > Next, I'd like to know more about your queries.  Can you include typical
> > examples of all query parameters for both "sides"?  What does the indexed
> > and stored data look like for a typical document?  Depending on what I
> > learn here, I might need to see all or part of your config and schema.
> >
> > How often do you send updates/deletes to your index?  How often and
> > exactly how are you doing commits, and do you have any auto commit in
> your
> > config?
> >
> > Thanks,
> > Shawn
> >
> >
>


sort different groups using different boost fields, not by score

2013-08-15 Thread Gerd Koenig
Hi,

I'd like to ask for your support on how I can boost different groups by
using different fields.
The current query
"select?defType=edismax&q=madonna&group=true&group.field=type&group.limit=3"
returns



drinks

...



food

...




The documents within each groupValue are sorted by score (default). What I
want to achieve is that, for each result group, the returned documents are
determined by boosting on different fields.
Example:
- the document list in type "drinks" shall be boosted by fields
name^3,year^2,no_of_ingredients^1
- the document list in type "food" shall be boosted by fields calories^2,
difficulty^0.5,title,author

Is this possible at all using a single query? If yes, how? Or do I have
to use one additional query per group?

Or do I have to use faceting to receive the top-3 results of drinks and
foods boosted by the fields mentioned above? How would I implement my
requirement in the case of using faceting?


Re: SOLR4 Spatial sorting and query string

2013-08-15 Thread roySolr
Hello David,

For the first months there will not be that many points in a doc; I will keep
the topic in mind!

The next step is that I want to know which location matched my query.
Example:

Product A is available in 3 stores, the doc looks like this:

/
Product A

  store1_geo
  store2_geo
  store3_geo


London#store1_geo
Amsterdam#store2_geo
New York#store3_geo

/

I query the index with my location set to Berlin and a radius of 250 km. I
know that this result comes back in first place because it's close to
Amsterdam (store2_geo). But how can I know which one matched my
query as the closest point? Is it possible to get this back? I can do it in my
application, but with 200 stores I don't think it's the best solution.

Thanks,

Roy





Solr 4.3 and above core swap

2013-08-15 Thread richardg
Since upgrading to Solr 4.3 we get the following errors on our slaves when we
swap cores on our master, and it is still an issue on 4.4:

Solr index directory
'/usr/local/solr_aggregate/solr_aggregate/data/index.20130513152644966' is
locked.  Throwing exception

SEVERE: Unable to reload core: production
org.apache.solr.common.SolrException: Index locked for write for core
production

SEVERE: Could not reload core 
org.apache.solr.common.SolrException: Unable to reload core: production

On older Solr versions it would create a new index.* directory and use it;
that hasn't been the case with 4.3+.  The new core seems to replicate fine, and
the new index files are in the original index.* directory, so I'm not sure
what is happening.






Re: Troubles defining suggester/ understanding results

2013-08-15 Thread Jack Krupansky
To say some of that more specifically, the keyword tokenizer returns the
full input as one long token (embedded spaces and all), and the regex filter
modifies that one long token (replacing some characters with spaces), but it
remains one long token. So the final output is a single token with no
true tokenization, and without tokenization "labelled" is never output as a
discrete token, and hence never seen by the suggester.
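
Schematically, running the stored text through that analyzer gives:

input:  that are labelled in black on a black background a little black light
output: one token: "that are labelled in black on a black background a little black light"

so a suggest prefix of "th" can match the start of that single token, while
"lab" matches nothing because "labelled" never exists as a token of its own.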


-- Jack Krupansky

-Original Message- 
From: Jack Krupansky

Sent: Thursday, August 15, 2013 8:26 AM
To: solr-user@lucene.apache.org
Subject: Re: Troubles defining suggester/ understanding results

Isn't your phrase_suggest analyzer always going to treat every input string
as one single token, and that single token is what the suggester is
returning? And, that single token starts with "th", not "lab"? Is that what
you want?

Try using the Solr admin UI analyzer page to verify how the analyzer is
working.

-- Jack Krupansky

-Original Message- 
From: Mysurf Mail

Sent: Thursday, August 15, 2013 4:32 AM
To: solr-user@lucene.apache.org
Subject: Troubles defining suggester/ understanding results

I am having trouble defining a suggester for autocomplete after reading the
tutorial.

Here are my schema definitions:



...


I also added two field types


 
   
   
 




 
   
   
   
   
 


Now, since I want to make suggestions from multiple fields and I can't
declare two fields, I defined:

and copied three of the fields using:

Problems:
1. Everything loads pretty well, but copying the fields to a new field
just inflates my index. Is there a possibility to define the suggester on
more than one field?
2. I can't understand the results. Querying

http://127.0.0.1:8983/solr/Book/suggest?q=th

returns docs such as
"that are labelled in black on a black background a little black light"
though querying

http://127.0.0.1:8983/solr/vault-Book/suggest?q=lab

doesn't return anything, even though "lab" is found in the previous result.
What is the problem?



Re: Troubles defining suggester/ understanding results

2013-08-15 Thread Jack Krupansky
Isn't your phrase_suggest analyzer always going to treat every input string 
as one single token, and that single token is what the suggester is 
returning? And, that single token starts with "th", not "lab"? Is that what 
you want?


Try using the Solr admin UI analyzer page to verify how the analyzer is 
working.


-- Jack Krupansky

-Original Message- 
From: Mysurf Mail

Sent: Thursday, August 15, 2013 4:32 AM
To: solr-user@lucene.apache.org
Subject: Troubles defining suggester/ understanding results

I am having trouble defining a suggester for autocomplete after reading the
tutorial.

Here are my schema definitions:



...


I also added two field types


 
   
   
 




 
   
   
   
   
 


Now, since I want to make suggestions from multiple fields and I can't
declare two fields, I defined:

and copied three of the fields using:

Problems:
1. Everything loads pretty well, but copying the fields to a new field
just inflates my index. Is there a possibility to define the suggester on
more than one field?
2. I can't understand the results. Querying

http://127.0.0.1:8983/solr/Book/suggest?q=th

returns docs such as
"that are labelled in black on a black background a little black light"
though querying

http://127.0.0.1:8983/solr/vault-Book/suggest?q=lab

doesn't return anything, even though "lab" is found in the previous result.
What is the problem?



Re: Solr4 update and query performance question

2013-08-15 Thread Erick Erickson
bq: There is no batching while updating/inserting documents in Solr3

Correct, but all the updates only went to the server you targeted them for.
The batching you're seeing is the auto-distributing the docs to the various
shards, a whole different animal.

Keep an eye on: https://issues.apache.org/jira/browse/SOLR-4816. You might
prompt Joel to see if this is testable. This JIRA routes the docs directly
to the leader of the shard they should go to. IOW it does the routing on
the client side. There will still be batching from the leader to the
replicas, but this should help.

It is usually a Bad Thing to commit after every batch either in Solr 3 or
Solr 4 from the client. I suspect you're right that the wait for all the
searchers on all the shards is one of your problems. Try configuring
autocommit (both hard and soft) in solrconfig.xml and forgetting the commit
bits from the client. This is the usual pattern in Solr4.

Your soft commit (which may be commented out) controls when the documents
are searchable. It is less expensive than hard commits with
openSearcher=true and makes docs visible. Hard commit closes the current
segment and opens a new one. So set up openSearcher=false for your hard
commit and a soft commit interval of whatever latency you can stand would
by my recommendation.

Final note: if you set your hard commit with openSearcher=false, do it
fairly often, since it truncates the transaction logs and is quite
inexpensive. If you let your tlog grow huge and you kill your server and
restart Solr, you can get into a situation where Solr replays the tlog. If
it has a bazillion docs in it, that can take a very long time to start up.
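
The shape of it in solrconfig.xml is something like this (a sketch; tune the
intervals to your own latency needs):

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>60000</maxTime>            <!-- hard commit every 60s: truncates the tlog, opens no searcher -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>15000</maxTime>            <!-- soft commit: docs become searchable within 15s -->
  </autoSoftCommit>
</updateHandler>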

Best
Erick




On Wed, Aug 14, 2013 at 4:39 PM, Joshi, Shital  wrote:

> We didn't copy/paste Solr3 config to solr4. We started with Solr4 config
> and only updated new searcher queries and few other things.
>
> There is no batching while updating/inserting documents in Solr3, is that
> correct? Committing 1000 documents in Solr3 takes 19 seconds while in Solr4
> it takes about 3-4 minutes. We noticed in the Solr4 logs that commit only
> returns after the new searcher is created across all nodes. This is possibly
> because waitSearcher=true by default in Solr4. This was not the case with
> Solr3, commit would return without waiting for new searcher creation.
>
> In order to improve performance with Solr4, we first changed from
> commit=true to commit=false in update URL and added autoHardCommit setting
> in solrconfig.xml. This improved performance from 3-4 minutes to 1-2
> minutes but that is not good enough.
>
> Then we changed maxBufferedAddsPerServer value in SolrCmdDistributor class
> from 10 to 1000 and deployed this class in
> $JETTY_TEMP_FOLDER/solr-webapp/webapp/WEB-INF/classes folder and restarted
> solr4 nodes. But we still see the batch size of 10 being used. Did we
> change correct variable/class?
>
> Next thing We will try using softCommit=true in update url and check if it
> gives us desired performance.
>
> Thanks for looking into this. Appreciate your help.
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Tuesday, August 13, 2013 8:12 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr4 update and query performance question
>
> 1> That's hard-coded at present. There's anecdotal evidence that there
>  are throughput improvements with larger batch sizes, but no action
>  yet.
> 2> Yep, all searchers are also re-opened, caches re-warmed, etc.
> 3> Odd. I'm assuming your Solr3 was master/slave setup? Seeing the
> queries would help diagnose this. Also, did you try to copy/paste
> the configuration from your Solr3 to Solr4? I'd start with the
> Solr4 and copy/paste only the parts needed from your SOlr3 setup.
>
> Best
> Erick
>
>
> On Mon, Aug 12, 2013 at 11:38 AM, Joshi, Shital 
> wrote:
>
> > Hi,
> >
> > We have SolrCloud (4.4.0) cluster (5 shards and 2 replicas) on 10 boxes
> > with about 450 mil documents (~90 mil per shard). We're loading 1000 or
> > less documents in CSV format every few minutes. In Solr3, with 300 mil
> > documents, it used to take 30 seconds to load 1000 documents while in
> > Solr4, its taking up to 3 minutes to load 1000 documents. We're using
> > custom sharding, we include _shard_=shardid parameter in update command.
> > Upon looking Solr4 log files we found that:
> >
> > 1.   Documents are added in a batch of 10 records. How do we increase
> > this batch size from 10 to 1000 documents?
> >
> > 2.  We do hard commit after loading 1000 documents. For every hard
> > commit, it refreshes searcher on all nodes. Are all caches also refreshed
> > when hard commit happens? We're planning to change to soft commit and do
> > auto hard commit every 10-15 minutes.
> >
> > 3.  We're not seeing improved query performance compared to Solr3.
> > Queries which took 3-5 seconds in Solr3 (300 mil docs) are taking 20
> > seconds with Solr4. We think this could be due to freque

Re: PostingsHighlighter returning fields which don't match

2013-08-15 Thread ses
Thanks, we tried modifying the source as suggested but found in our case
PostingsHighlighter was returning no highlighting at all once we removed the
self-closing tags. I think perhaps we were not using it in the correct way.


Robert Muir wrote
> Do you want to open a JIRA issue to just change the behavior?

Yes, I think it would be useful to have it as an optional feature which can
be triggered by a parameter as suggested. This is how we implemented it, and
if it were returning highlighting we would happily contribute this back, but
as it stands it's not properly tested. I will create a JIRA ticket to cover
this desired functionality though.


Robert Muir wrote
> Unrelated: If your queries actually go against a large number of fields,
> I'm not sure how efficient this highlighter will be. That's because at some
> number of N fields, it will be much more efficient to use a
> document-oriented term vector approach (e.g. standard
> highlighter/fast-vector-highlighter).

Yes unfortunately it is not any faster. Our original problem was
highlighting performance and in our case PostingsHighlighter is performing
similarly to the default highlighter. 

We are now trying a solution which involves running one query to obtain the
field names in the N documents retrieved (where N=rows) and then a separate
query to specify those fields in the 'hl.fl' parameter. This is working on the
basis that those two separate queries run much faster than one query with
hl.fl=my_dynamic_field_*

Thanks for your detailed responses.





Troubles defining suggester/ understanding results

2013-08-15 Thread Mysurf Mail
I am having trouble defining a suggester for autocomplete after reading the
tutorial.

Here are my schema definitions:



 ...


I also added two field types


  


  




  




  


Now, since I want to make suggestions from multiple fields and I can't
declare two fields, I defined:

and copied three of the fields using:

Problems:
1. Everything loads pretty well, but copying the fields to a new field
just inflates my index. Is there a possibility to define the suggester on
more than one field?
2. I can't understand the results. Querying

 http://127.0.0.1:8983/solr/Book/suggest?q=th

returns docs such as
"that are labelled in black on a black background a little black light"
though querying

 http://127.0.0.1:8983/solr/vault-Book/suggest?q=lab

doesn't return anything, even though "lab" is found in the previous result.
What is the problem?


Re: Who's cleaning the Fieldcache?

2013-08-15 Thread Andrea Gazzarini

Hi Chris, Robert

Thank you very much. First, answers to your questions:

1) which version of Solr are you using?

3.6.0


2) is it possible you have multiple searchers open (ie: one in use
while another one is warming up) when you're seeing these stats?

No, no multiple searchers.


Now, after one day of experiments, I think I got what's happening.
Briefly, the behaviour seems to be exactly what Chris described (weak
references that are garbage collected when needed). However, I'm not
seeing what Robert described.


This is what I understood about my problem:

- Xms4GB
- sort fields definitely too big... once loaded they take more than
2GB of memory


So when replication occurs, "new" sort field values (belonging to the new
replicated segment) are loaded in memory... but before the old segment's
sort field values are garbage collected on the slave, I don't have enough
memory (2GB + 2GB... only for sort fields), so et voilà: OutOfMemory


What I've done is reduce the unique values of the sort fields (now the
FieldCacheImpl is about 600MB) so there's enough memory for ordinary
work, for replication stuff, and for retaining two different
FieldCacheImpl references (old and replicated segment)... in this way I
see exactly what Chris described: when replication occurs, memory grows
(on the slave) by about 800MB; thereafter the memory has a constant, slowly
growing graph, but when it reaches a certain point (about 3.7GB) the
garbage collector runs and frees something like 2.2GB. Good. I repeated
this test a lot of times and the behaviour is always the same.


Thank you very much to both of you
Andrea


On 08/14/2013 11:58 PM, Chris Hostetter wrote:

: > FieldCaches are managed using a WeakHashMap - so once the IndexReaders
: > associated with those FieldCaches are no longer used, they will be garbage
: > collected when and if the JVM's garbage collector gets around to it.
: >
: > if they sit around after you are done with them, they might look like they
: > take up a lot of memory, but that just means your JVM Heap has that memory
: > to spare and hasn't needed to clean them up yet.
:
: I don't think this is correct.
:
: When you register an entry in the fieldcache, it registers event
: listeners on the segment's core so that when its close()d, any entries
: are purged rather than waiting on GC.
:
: See FieldCacheImpl.java

Ah ... sweet.  I didn't realize that got added.

(In any case: it looks like a WeakHashMap is still used in case the
listeners never get called, correct?)

But based on the details from the OP's first message, it looks like he's
running Solr 3.x (there are mentions of "SolrIndexReader", which from what
I can tell was gone by 4.0), so perhaps this is an older version before all
the kinks were worked out in the reader close listeners used by
fieldcache?  (I'm noticing things like LUCENE-3644 in particular)

Andrea:

1) which version of Solr are you using?
2) is it possible you have multiple searchers open (ie: one in use
while another one is warming up) when you're seeing these stats?



-Hoss




Re: Getter API for SolrCloud

2013-08-15 Thread Furkan KAMACI
Here is a conversation about it:
http://lucene.472066.n3.nabble.com/SolrCloud-with-Zookeeper-ensemble-in-production-environment-SEVERE-problems-td4047089.html
However, the outcome of that conversation is not clear. Any ideas?


2013/8/15 Furkan KAMACI 

> I've implemented an application that connects my UI and SolrCloud. I want
> to write code that makes a search request to SolrCloud, and I will send the
> result to my UI. I know that there are some examples about it, but I want a
> fast and really good way to do it. One way I did it:
>
> ModifiableSolrParams params = new ModifiableSolrParams();
> params.set("q", "*:*");
> params.set("fl", "url lang");
> params.set("sort", "url desc");
> params.set("start", start);
> QueryResponse response = lbHttpSolrServer.query(params);
> for (SolrDocument document : response.getResults()) {
> ...
> }
>
> I want to use CloudSolrServer. Is there any example that is really fast
> for getting data from SolrCloud? (I can accept data in any format, e.g.
> javabin. I will process it in my bridge application and send it to the UI.)
>


Getter API for SolrCloud

2013-08-15 Thread Furkan KAMACI
I've implemented an application that connects my UI and SolrCloud. I want
to write code that makes a search request to SolrCloud, and I will send the
result to my UI. I know that there are some examples about it, but I want a
fast and really good way to do it. One way I did it:

ModifiableSolrParams params = new ModifiableSolrParams();
params.set("q", "*:*");
params.set("fl", "url lang");
params.set("sort", "url desc");
params.set("start", start);
QueryResponse response = lbHttpSolrServer.query(params);
for (SolrDocument document : response.getResults()) {
...
}

I want to use CloudSolrServer. Is there any example that is really fast for
getting data from SolrCloud? (I can accept data in any format, e.g. javabin. I
will process it in my bridge application and send it to the UI.)
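
For reference, the CloudSolrServer variant I am considering looks roughly
like this (assuming SolrJ 4.x; the ZooKeeper hosts and collection name are
placeholders, and exception handling is omitted):

CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
server.setDefaultCollection("collection1");
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("q", "*:*");
params.set("fl", "url lang");
params.set("sort", "url desc");
params.set("start", start);
// CloudSolrServer routes requests using the cluster state in ZooKeeper,
// so no external load balancer is needed.
QueryResponse response = server.query(params);
for (SolrDocument document : response.getResults()) {
    // process document
}
server.shutdown();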