Re: OR-FilterQuery

2012-02-14 Thread Em
Hi Mikhail,

> it's just how org.apache.lucene.search.CachingWrapperFilter works. The
> first out-of-the box stuff which I've found.
Thanks for your explanation and snippets - I thought this was configurable.

Regards,
Em

Am 15.02.2012 06:16, schrieb Mikhail Khludnev:
> On Tue, Feb 14, 2012 at 11:13 PM, Em  wrote:
> 
>> Hi Mikhail,
>>
>>> it will use a per-segment bitset, in contrast to Solr's fq which caches for
>>> the top-level reader.
>> Could you explain why this bitset would be per-segment based, please?
> 
> I don't see a reason why this *has* to be so.
>>
> it's just how org.apache.lucene.search.CachingWrapperFilter works - the
> first out-of-the-box thing I found.
> as a top-level (whole-reader) alternative we need
> org.apache.solr.search.SolrIndexSearcher.getDocSet(Query).
> 
> btw, one more top-level snippet
> 
> class FQParser extends QParser {
> 
>   Query parse(...) {
>     return new SolrConstantScoreQuery(
>         solrIndexSearcher.getDocSet(
>             subQuery(localParams.get(V))
>         ).getTopFilter());
>   }
> }
> 
> 
> 
>> What is the benefit you are seeing?
>>
>  It seems like two different POVs: Lucene prefers per-segment caching to
> get fast incremental updates, though maybe 'because it's good but not in the
> worst case' (I guess I heard it there:
> http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011/many-facets-apache-solr)
> Solr prefers top-reader caches.
> 
> 
>> Kind regards,
>> Em
>>
>> Am 14.02.2012 19:33, schrieb Mikhail Khludnev:
>>> Hi Em,
>>>
>>> I briefly read the thread. Are you talking about combining cached
>> clauses
>>> of a BooleanQuery, instead of evaluating the whole BQ as a filter?
>>>
>>> I found something like that in API (but only in API)
>>>
>> http://lucene.apache.org/solr/api/org/apache/solr/search/ExtendedQuery.html#setCacheSep(boolean)
>>>
>>> Am I getting you right? Why do you need it, btw? If I'm ..
>>> I have an idea how to do it in two minutes:
>>>
>>> q=+f:text
>>> +(_query_:{!fq}id:1 _query_:{!fq}id:2 _query_:{!fq}id:3
>> _query_:{!fq}id:4)...
>>>
>>> The right-hand part will be a BooleanQuery with SHOULD clauses backed by
>>> cached queries (see below).
>>>
>>> if you are not scared by the syntax yet, you can implement a trivial
>>> "fq" QParserPlugin, which will be just:
>>>
>>> // lazily through User/Generic Cache
>>> q = new FilteredQuery(new MatchAllDocsQuery(),
>>>     new CachingWrapperFilter(new QueryWrapperFilter(
>>>         subQuery(localParams.get(QueryParsing.V)))));
>>> return q;
>>>
>>> it will use a per-segment bitset, in contrast to Solr's fq which caches for
>>> the top-level reader.
>>>
>>> WDYT?
>>>
>>> On Mon, Feb 13, 2012 at 11:34 PM, Em 
>> wrote:
>>>
 Hi,

 have a look at:
 http://search-lucene.com/m/Z8lWGEiKoI

 I think not much has changed since then.

 Regards,
 Em

 Am 13.02.2012 20:17, schrieb spr...@gmx.eu:
> Hi,
>
> how efficient is such a query:
>
> q=some text
> fq=id:(1 OR 2 OR 3...)
>
> Would it be better to use q=some text AND id:(1 OR 2 OR 3...)?
>
> Is the Filter Cache used for the OR'ed fq?
>
> Thank you
>
>

>>>
>>>
>>>
>>
> 
> 
> 


Re: OR-FilterQuery

2012-02-14 Thread Mikhail Khludnev
On Tue, Feb 14, 2012 at 11:13 PM, Em  wrote:

> Hi Mikhail,
>
> > it will use a per-segment bitset, in contrast to Solr's fq which caches for
> > the top-level reader.
> Could you explain why this bitset would be per-segment based, please?

I don't see a reason why this *has* to be so.
>
it's just how org.apache.lucene.search.CachingWrapperFilter works - the
first out-of-the-box thing I found.
as a top-level (whole-reader) alternative we need
org.apache.solr.search.SolrIndexSearcher.getDocSet(Query).

btw, one more top-level snippet

class FQParser extends QParser {

  Query parse(...) {
    return new SolrConstantScoreQuery(
        solrIndexSearcher.getDocSet(
            subQuery(localParams.get(V))
        ).getTopFilter());
  }
}



> What is the benefit you are seeing?
>
 It seems like two different POVs: Lucene prefers per-segment caching to
get fast incremental updates, though maybe 'because it's good but not in the
worst case' (I guess I heard it there:
http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011/many-facets-apache-solr)
Solr prefers top-reader caches.


> Kind regards,
> Em
>
> Am 14.02.2012 19:33, schrieb Mikhail Khludnev:
> > Hi Em,
> >
> > I briefly read the thread. Are you talking about combining cached
> clauses
> > of a BooleanQuery, instead of evaluating the whole BQ as a filter?
> >
> > I found something like that in API (but only in API)
> >
> http://lucene.apache.org/solr/api/org/apache/solr/search/ExtendedQuery.html#setCacheSep(boolean)
> >
> > Am I getting you right? Why do you need it, btw? If I'm ..
> > I have an idea how to do it in two minutes:
> >
> > q=+f:text
> > +(_query_:{!fq}id:1 _query_:{!fq}id:2 _query_:{!fq}id:3
> _query_:{!fq}id:4)...
> >
> > The right-hand part will be a BooleanQuery with SHOULD clauses backed by
> > cached queries (see below).
> >
> > if you are not scared by the syntax yet, you can implement a trivial
> > "fq" QParserPlugin, which will be just:
> >
> > // lazily through User/Generic Cache
> > q = new FilteredQuery(new MatchAllDocsQuery(),
> >     new CachingWrapperFilter(new QueryWrapperFilter(
> >         subQuery(localParams.get(QueryParsing.V)))));
> > return q;
> >
> > it will use a per-segment bitset, in contrast to Solr's fq which caches for
> > the top-level reader.
> >
> > WDYT?
> >
> > On Mon, Feb 13, 2012 at 11:34 PM, Em 
> wrote:
> >
> >> Hi,
> >>
> >> have a look at:
> >> http://search-lucene.com/m/Z8lWGEiKoI
> >>
> >> I think not much has changed since then.
> >>
> >> Regards,
> >> Em
> >>
> >> Am 13.02.2012 20:17, schrieb spr...@gmx.eu:
> >>> Hi,
> >>>
> >>> how efficient is such a query:
> >>>
> >>> q=some text
> >>> fq=id:(1 OR 2 OR 3...)
> >>>
> >>> Would it be better to use q=some text AND id:(1 OR 2 OR 3...)?
> >>>
> >>> Is the Filter Cache used for the OR'ed fq?
> >>>
> >>> Thank you
> >>>
> >>>
> >>
> >
> >
> >
>



-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics
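To make the per-segment vs. top-level trade-off in this thread concrete, here is a small, self-contained Java sketch of CachingWrapperFilter-style caching: the cache is keyed by the (segment) reader, so an index with N segments yields N small bitsets that survive incremental updates, instead of one whole-index bitset. The `Segment` class and the even-docid "filter" are stand-ins invented for illustration; this is not the Lucene API.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

public class PerSegmentCacheDemo {
    // stand-in for a segment reader; maxDoc is the segment's doc count
    static class Segment {
        final int maxDoc;
        Segment(int maxDoc) { this.maxDoc = maxDoc; }
    }

    // per-segment cache, keyed by the segment object, the way
    // CachingWrapperFilter keys its cache by the IndexReader
    static final Map<Segment, BitSet> cache = new HashMap<>();

    static BitSet docsMatching(Segment seg) {
        return cache.computeIfAbsent(seg, s -> {
            BitSet bits = new BitSet(s.maxDoc);
            // pretend every even local docid matches the filter
            for (int d = 0; d < s.maxDoc; d += 2) bits.set(d);
            return bits;
        });
    }

    public static void main(String[] args) {
        Segment s1 = new Segment(4), s2 = new Segment(3);
        // first call computes, second call hits the cache
        System.out.println(docsMatching(s1) == docsMatching(s1)); // same cached object
        System.out.println(docsMatching(s2).cardinality());       // docs 0 and 2 match
    }
}
```

When a new segment appears after an update, only that segment's bitset needs computing; a top-level cache (Solr's filterCache keyed on the top reader) must be rebuilt wholesale, which is the trade-off Mikhail describes.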


 


feeding mahout cluster output back to solr

2012-02-14 Thread abhayd
Hi,
at present we use Carrot2 for clustering and doing analysis on customer
feedback data. Since it is in-memory and done at search time, we are having
issues with performance and cluster size.

I was reading about generating clusters using Mahout from Solr index data.

But can we feed the segmentation generated by Mahout back into Solr to use as
facets? I am not even sure what the output from Mahout looks like, so I wanted
to know.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/feeding-mahout-cluster-output-back-to-solr-tp3745883p3745883.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud Replication Question

2012-02-14 Thread Jamie Johnson
All of the nodes now show as being Active.  When starting the replicas
I did receive the following message though.  Not sure if this is
expected or not.

INFO: Attempting to replicate from
http://JamiesMac.local:8501/solr/slice2_shard2/
Feb 14, 2012 10:53:34 PM org.apache.solr.common.SolrException log
SEVERE: Error while trying to
recover:org.apache.solr.common.SolrException: null
java.lang.NullPointerException  at
org.apache.solr.handler.admin.CoreAdminHandler.handlePrepRecoveryAction(CoreAdminHandler.java:646)
at 
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:181)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at 
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:358)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:172)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326) at
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)  at
org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)at
org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

null  java.lang.NullPointerExceptionat
org.apache.solr.handler.admin.CoreAdminHandler.handlePrepRecoveryAction(CoreAdminHandler.java:646)
at 
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:181)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at 
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:358)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:172)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326) at
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at
org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)at
org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

request: 
http://JamiesMac.local:8501/solr/admin/cores?action=PREPRECOVERY&core=slice2_shard2&nodeName=JamiesMac.local:8502_solr&coreNodeName=JamiesMac.local:8502_solr_slice2_shard1&wt=javabin&version=2
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:433)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:251)
at 
org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:120)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:208)

Feb 14, 2012 10:53:34 PM org.apache.solr.update.UpdateLog dropBufferedUpdates

Re: SolrCloud Replication Question

2012-02-14 Thread Jamie Johnson
Doing so now, will let you know if I continue to see the same issues

On Tue, Feb 14, 2012 at 4:59 PM, Mark Miller  wrote:
> Doh - looks like I was just seeing a test issue. Do you mind updating and 
> trying the latest rev? At the least there should be some better logging 
> around the recovery.
>
> I'll keep working on tests in the meantime.
>
> - Mark
>
> On Feb 14, 2012, at 3:15 PM, Jamie Johnson wrote:
>
>> Sounds good, if I pull the latest from trunk and rerun will that be
>> useful or were you able to duplicate my issue now?
>>
>> On Tue, Feb 14, 2012 at 3:00 PM, Mark Miller  wrote:
>>> Okay Jamie, I think I have a handle on this. It looks like an issue with 
>>> what config files are being used by cores created with the admin core 
>>> handler - I think it's just picking up default config and not the correct 
>>> config for the collection. This means they end up using config that has no 
>>> UpdateLog defined - and so recovery fails.
>>>
>>> I've added more logging around this so that it's easy to determine that.
>>>
>>> I'm investigating more and working on a test + fix. I'll file a JIRA issue 
>>> soon as well.
>>>
>>> - Mark
>>>
>>> On Feb 14, 2012, at 11:39 AM, Jamie Johnson wrote:
>>>
 Thanks Mark, not a huge rush, just me trying to get to use the latest
 stuff on our project.

 On Tue, Feb 14, 2012 at 10:53 AM, Mark Miller  
 wrote:
> Sorry, have not gotten it yet, but will be back trying later today - 
> monday, tuesday tend to be slow for me (meetings and crap).
>
> - Mark
>
> On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote:
>
>> Has there been any success in replicating this?  I'm wondering if it
>> could be something with my setup that is causing the issue...
>>
>>
>> On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson  wrote:
>>> Yes, I have the following layout on the FS
>>>
>>> ./bootstrap.sh
>>> ./example (standard example directory from distro containing jetty
>>> jars, solr confs, solr war, etc)
>>> ./slice1
>>>  - start.sh
>>>  -solr.xml
>>>  - slice1_shard1
>>>   - data
>>>  - slice2_shard2
>>>   -data
>>> ./slice2
>>>  - start.sh
>>>  - solr.xml
>>>  -slice2_shard1
>>>    -data
>>>  -slice1_shard2
>>>    -data
>>>
>>> if it matters I'm running everything from localhost, zk and the solr 
>>> shards
>>>
>>> On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren  wrote:
 Do you have unique dataDir for each instance?
 13.2.2012 14.30 "Jamie Johnson"  kirjoitti:
>
> - Mark Miller
> lucidimagination.com
>
>
>
>
>
>
>
>
>
>
>
>>>
>>> - Mark Miller
>>> lucidimagination.com
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>
> - Mark Miller
> lucidimagination.com
>
>
>
>
>
>
>
>
>
>
>


Re: Can I rebuild an index and remove some fields?

2012-02-14 Thread Li Li
I have briefly read the code of the 4.0 trunk; maybe it's feasible.
SegmentMerger.add(IndexReader) will add the to-be-merged readers.
merge() will call
  mergeTerms(segmentWriteState);
  mergePerDoc(segmentWriteState);

   mergeTerms() will construct fields from the IndexReaders:
for(int readerIndex=0; readerIndex < ...; readerIndex++)

On Tue, Feb 14, 2012, Robert Stewart wrote:

> I was thinking that if I make a wrapper class that aggregates another
> IndexReader and filters out terms I don't want anymore, it might work.  And
> then pass that wrapper into SegmentMerger.  I think if I filter out terms
> on GetFieldNames(...) and Terms(...) it might work.
>
> Something like:
>
> HashSet ignoredTerms=...;
>
> FilteringIndexReader wrapper=new FilterIndexReader(reader);
>
> SegmentMerger merger=new SegmentMerger(writer);
>
> merger.add(wrapper);
>
> merger.Merge();
>
>
>
>
>
> On Feb 14, 2012, at 1:49 AM, Li Li wrote:
>
> > for method 2, delete is wrong. we can't delete terms.
> >   you also should hack with the tii and tis file.
> >
> > On Tue, Feb 14, 2012 at 2:46 PM, Li Li  wrote:
> >
> >> method1, dumping data
> >> for stored fields, you can traverse the whole index and save it to
> >> somewhere else.
> >> for indexed but not stored fields, it may be more difficult.
> >>if the indexed and not stored field is not analyzed(fields such as
> >> id), it's easy to get from FieldCache.StringIndex.
> >>But for analyzed fields, though theoretically it can be restored from
> >> term vector and term position, it's hard to recover from index.
> >>
> >> method 2, hack with metadata
> >> 1. indexed fields
> >>  delete by query, e.g. field:*
> >> 2. stored fields
> >>   because all fields are stored sequentially. it's not easy to
> delete
> >> some fields. this will not affect search speed. but if you want to get
> >> stored fields,  and the useless fields are very long, then it will slow
> >> down.
> >>   also it's possible to hack with it. but need more effort to
> >> understand the index file format  and traverse the fdt/fdx file.
> >>
> http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/fileformats.html
> >>
> >> this will give you some insight.
> >>
> >>
> >>> On Tue, Feb 14, 2012 at 6:29 AM, Robert Stewart wrote:
> >>
> >>> Lets say I have a large index (100M docs, 1TB, split up between 10
> >>> indexes).  And a bunch of the "stored" and "indexed" fields are not
> used in
> >>> search at all.  In order to save memory and disk, I'd like to rebuild
> that
> >>> index *without* those fields, but I don't have original documents to
> >>> rebuild entire index with (don't have the full-text anymore, etc.).  Is
> >>> there some way to rebuild or optimize an existing index with only a
> sub-set
> >>> of the existing indexed fields?  Or alternatively is there a way to
> avoid
> >>> loading some indexed fields at all ( to avoid loading term infos and
> terms
> >>> index ) ?
> >>>
> >>> Thanks
> >>> Bob
> >>
> >>
> >>
>
>


Re: Solr soft commit feature

2012-02-14 Thread Mark Miller
This has not been ported back to the 3.X line yet - mostly because it involved 
some rather large and invasive changes that I wanted to bake on trunk for some 
time first.

Even still, the back port is not trivial, so I don't know that it's something 
I'd personally be able to get to in the short term. If I had any free time, I'd 
probably prefer pushing towards a 4 release with NRT. Some of the changes also 
broke back compat behavior in ways that are more acceptable over a major 
release.

Someone else might jump in and do the work of course.

On Feb 14, 2012, at 7:41 PM, Dipti Srivastava wrote:

> Hi All,
> Is there a way to soft commit in the current released version of solr 3.5?
> 
> Regards,
> Dipti Srivastava
> 
> 
> This message is private and confidential. If you have received it in error, 
> please notify the sender and remove it from your system.
> 
> 

- Mark Miller
lucidimagination.com













Re: OR-FilterQuery

2012-02-14 Thread Erick Erickson
Ah, OK, I misread your post apparently. And yes, what you suggest
would result in some efficiencies, but at present I don't think there's any
syntax that allows one to combine filter queries as you suggest. There
was some discussion about it in the JIRA I referenced, but no action that
I could see.

That is, efficiencies in some circumstances, though I think it would be
hard to predict. For instance, imagine a set of 100 entries in an FQ. And
no, I'm not making things up, I've seen applications where this makes
sense. Splitting that out into 100 separate entries in the filterCache would
use up a lot of space. Likewise, I suspect that the actual process of
creating the heuristics that were able to analyze an incoming filter
query and "do the right thing" in terms of splitting it up and recombining
it would be pretty hairy. Local parameters for instance, and let's throw in
dereferencing too ...

So I suspect that this is one of those features that is quite easy to see
the benefits of in the simple case, but pretty quickly becomes a
nightmare to actually implement correctly, but that's mostly
a guess.

And before putting the work into it, I think modeling the actual
benefits would be wise, as well as convincing myself that there
are enough cases where this *would* be beneficial. I mean Solr
does a pretty reasonable job of caching these anyway, and with the
"non-cached" filters it's not clear to me that the benefits are
sufficient...

Good luck, though, if you want to tackle it!
Erick



On Tue, Feb 14, 2012 at 4:54 PM, Em  wrote:
> Hi Erick,
>
>> Whoa!
>>
>> fq=id(1 OR 2)
>> is not the same thing at all as
>> fq=id:1&fq=id:2
> Ahm, who said they would be the same? :)
> I mean, you are completely right in what you are saying but it seems to
> me that we are talking about two different things.
>
> I was talking about caching each filter-criteria instead of the whole
> filter-query to recombine the cached filter-criteria based on the
> boolean-operators the client sends.
>
> In other words:
> currently
> fq=id:1 OR id:2
> results into ONE cached filter-entry.
>
> fq=id:2 OR id:1
> results into ANOTHER cached filter-entry
>
> fq=id:2 AND id:1
> results into (surprise, surprise) a third filter-entry (although this
> example does not make sense).
>
> My idea was to cache each filter-criteria, that means caching the bitset
> for id:1 and the bitset for id:2 to recombine both bitsets via AND, OR,
> NOT etc. whenever this is neccessary.
>
> This way one could save memory (and maybe computing-time as well) which
> definitely makes sense when you got a way smaller set of
> filter-criterias while having a much larger set of possible (and used)
> combinations of each filter-criteria with a small number of repetitions
> per combination (which would destroy the benefit of caching).
>
> Don't you agree?
>
> Kind regards,
> Em
>
>
> Am 14.02.2012 22:33, schrieb Erick Erickson:
>> Whoa!
>>
>> fq=id(1 OR 2)
>> is not the same thing at all as
>> fq=id:1&fq=id:2
>>
>> Assuming that any document had one and only one ID,  the second clause
>> would return exactly 0 documents, each and every time.
>>
>> Multiple fq clauses are essentially set intersections. So the first query is 
>> the
>> set of all documents where id is 1 or 2
>> the second is the intersection of two sets of documents, one set
>> with an id of 1 and one with an id of 2. Not the same thing at all.
>>
>> There's no support for the concept of
>> (fq=id:1 OR fq=id:2)
>>
>> Best
>> Erick
>>
>> On Tue, Feb 14, 2012 at 2:13 PM, Em  wrote:
>>> Hi Mikhail,
>>>
>>> thanks for kicking in some brainstorming-code!
>>> The given thread is almost a year old and I was working with Solr in my
>>> freetime to see where it fails to behave/perform as I expect/wish.
>>>
>>> I found out that if you got a lot of different access-patterns for a
>>> filter-query, you might end up with either a big cache to make things
>>> fast or with lower performance (impact depends on usecase and
>>> circumstances).
>>>
>>> Scenario:
>>> You got a permission-field and the client is able to filter by one to
>>> three permission-values.
>>> That is:
>>> fq=foo:user
>>> fq=foo:moderator
>>> fq=foo:manager
>>>
> If you cannot control/guarantee the order of the fq's values, you could
>>> end up with a lot of mess which all returns the same.
>>>
>>> Example:
>>> fq=permission:user OR permission:moderator OR permission:manager
>>> fq=permission:user OR permission:manager OR permission:moderator
>>> fq=permission:moderator OR permission:user OR permission:manager
>>> ...
> They all return the same results but are cached separately, which leads to
> the fact that you are wasting a lot of memory.
>>>
>>> Furthermore, if your access pattern will lead to a lot of different fq's
>>> on a small set of distinct values, it may make more sense to cache each
>>> filter-query for itself from a memory-consuming point of view (may cost
>>> a little bit performance).
>>>
>>> That being said, if you cache a filter for foo:user
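Em's proposal above — cache one bitset per filter criterion and recombine them with boolean operators per request, so `user OR moderator` and `moderator OR user` share cache entries — can be sketched with plain `java.util.BitSet`. All names here (`criterionCache`, `bitsFor`, the permission values and matching docids) are illustrative, not Solr APIs.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

public class CriterionCacheDemo {
    // one cached bitset per criterion, regardless of how criteria are combined
    static final Map<String, BitSet> criterionCache = new HashMap<>();

    static BitSet bitsFor(String criterion, int maxDoc) {
        BitSet cached = criterionCache.computeIfAbsent(criterion, c -> {
            BitSet b = new BitSet(maxDoc);
            // stand-in for evaluating the criterion against the index
            if (c.equals("permission:user"))      { b.set(0); b.set(1); b.set(2); }
            if (c.equals("permission:moderator")) { b.set(2); b.set(3); }
            return b;
        });
        // clone so callers can combine results without mutating the cache
        return (BitSet) cached.clone();
    }

    public static void main(String[] args) {
        int maxDoc = 5;
        // permission:user OR permission:moderator
        BitSet or1 = bitsFor("permission:user", maxDoc);
        or1.or(bitsFor("permission:moderator", maxDoc));
        // permission:moderator OR permission:user - same result, same cache entries
        BitSet or2 = bitsFor("permission:moderator", maxDoc);
        or2.or(bitsFor("permission:user", maxDoc));
        System.out.println(or1.equals(or2));       // the orderings agree
        System.out.println(criterionCache.size()); // 2 entries, not one per ordering
    }
}
```

AND and NOT combinations would use `BitSet.and` and `BitSet.andNot` the same way, which is the recombination step Em describes; the open question in the thread is whether the per-request combination cost outweighs the memory saved.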

Solr soft commit feature

2012-02-14 Thread Dipti Srivastava
Hi All,
Is there a way to soft commit in the current released version of solr 3.5?

Regards,
Dipti Srivastava


This message is private and confidential. If you have received it in error, 
please notify the sender and remove it from your system.




payload and exact match

2012-02-14 Thread leonardo2
Is it possible to perform an 'exact search' on a payload field?

I have to index text with auxiliary info for each word. In particular, each
word is associated with the bounding box containing it in the original PDF
page (it is used for highlighting the search terms in the PDF). I used the
payload to store that information.

In the schema.xml, the fieldType definition is:

---








---

while the field definition is:

---

---

When indexing, the field 'words' contains a list of word|box as in the
following example:

---
doc_id=example
words={Fonte:|307.62,948.16,324.62,954.25 Comune|326.29,948.16,349.07,954.25
di|350.74,948.16,355.62,954.25 Bologna|358.95,948.16,381.28,954.25}
---

This solution works well except for exact search. For example,
assuming the only indexed doc is the 'example' doc (shown above), the query
words:"Comune di Bologna" returns no results.

Does anyone know if it is possible to perform an 'exact search' on a
payload field?

Thanks in advance,
Leonardo


--
View this message in context: 
http://lucene.472066.n3.nabble.com/payload-and-exact-match-tp3745369p3745369.html
Sent from the Solr - User mailing list archive at Nabble.com.
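For illustration, here is a plain-Java sketch of splitting the `word|box` pairs from Leonardo's example into (term, payload) pairs — the kind of split a delimiter-based payload token filter performs at index time, where only the word becomes the searchable term and the box travels along as the payload. The class and record names are invented; this is not the Solr analysis chain.

```java
import java.util.ArrayList;
import java.util.List;

public class PayloadSplitDemo {
    record TokenPayload(String term, String payload) {}

    // splits whitespace-separated "word|box" tokens at the last '|'
    static List<TokenPayload> parse(String field) {
        List<TokenPayload> out = new ArrayList<>();
        for (String tok : field.split("\\s+")) {
            int bar = tok.lastIndexOf('|');
            out.add(new TokenPayload(tok.substring(0, bar), tok.substring(bar + 1)));
        }
        return out;
    }

    public static void main(String[] args) {
        String words = "Fonte:|307.62,948.16,324.62,954.25 "
                     + "Comune|326.29,948.16,349.07,954.25 "
                     + "di|350.74,948.16,355.62,954.25 "
                     + "Bologna|358.95,948.16,381.28,954.25";
        // the terms a phrase query must match are just the words;
        // the bounding boxes ride along as payloads
        for (TokenPayload tp : parse(words)) System.out.println(tp.term());
    }
}
```

If the indexed terms really are the bare words (as above) with consecutive positions, a phrase query should be able to match them; checking the analyzed terms and positions (e.g. with Solr's analysis page) would show whether the `|box` suffix is leaking into the indexed terms.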


Re: Semantic autocomplete with Solr

2012-02-14 Thread Roman Chyla
done something along these lines:

https://svnweb.cern.ch/trac/rcarepo/wiki/InspireAutoSuggest#Autosuggestautocompletefunctionality

but you would need MontySolr for that - https://github.com/romanchyla/montysolr

roman

On Tue, Feb 14, 2012 at 11:10 PM, Octavian Covalschi
 wrote:
> Hey guys,
>
> Has anyone done any kind of "smart" autocomplete? Let's say we have a web
> store, and we'd like to autocomplete user's searches. So if I'll type in
> "jacket" next word that will be suggested should be something related to
> jacket (color, fabric) etc...
>
> It seems to me I have to structure this data in a particular way, but that
> way I can do without solr, so I was wondering if Solr could help us.
>
> Thank you in advance.


Re: Semantic autocomplete with Solr

2012-02-14 Thread Octavian Covalschi
Hm... I used it for some basic group by feature, but haven't thought of it
for autocomplete. I'll give it a shot.

Thanks!


On Tue, Feb 14, 2012 at 4:19 PM, Paul Libbrecht  wrote:

> facetting?
>
> paul
>
>
> Le 14 févr. 2012 à 23:10, Octavian Covalschi a écrit :
>
> > Hey guys,
> >
> > Has anyone done any kind of "smart" autocomplete? Let's say we have a web
> > store, and we'd like to autocomplete user's searches. So if I'll type in
> > "jacket" next word that will be suggested should be something related to
> > jacket (color, fabric) etc...
> >
> > It seems to me I have to structure this data in a particular way, but
> that
> > way I can do without solr, so I was wondering if Solr could help us.
> >
> > Thank you in advance.
>
>


Re: Can I rebuild an index and remove some fields?

2012-02-14 Thread Robert Stewart
I was thinking that if I make a wrapper class that aggregates another IndexReader
and filters out terms I don't want anymore, it might work.  And then pass that
wrapper into SegmentMerger.  I think if I filter out terms on
GetFieldNames(...) and Terms(...) it might work.

Something like:

HashSet ignoredTerms=...;

FilteringIndexReader wrapper=new FilterIndexReader(reader);

SegmentMerger merger=new SegmentMerger(writer);

merger.add(wrapper);

merger.Merge();





On Feb 14, 2012, at 1:49 AM, Li Li wrote:

> for method 2, delete is wrong. we can't delete terms.
>   you also should hack with the tii and tis file.
> 
> On Tue, Feb 14, 2012 at 2:46 PM, Li Li  wrote:
> 
>> method1, dumping data
>> for stored fields, you can traverse the whole index and save it to
>> somewhere else.
>> for indexed but not stored fields, it may be more difficult.
>>if the indexed and not stored field is not analyzed(fields such as
>> id), it's easy to get from FieldCache.StringIndex.
>>But for analyzed fields, though theoretically it can be restored from
>> term vector and term position, it's hard to recover from index.
>> 
>> method 2, hack with metadata
>> 1. indexed fields
>>  delete by query, e.g. field:*
>> 2. stored fields
>>   because all fields are stored sequentially. it's not easy to delete
>> some fields. this will not affect search speed. but if you want to get
>> stored fields,  and the useless fields are very long, then it will slow
>> down.
>>   also it's possible to hack with it. but need more effort to
>> understand the index file format  and traverse the fdt/fdx file.
>> http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/fileformats.html
>> 
>> this will give you some insight.
>> 
>> 
>> On Tue, Feb 14, 2012 at 6:29 AM, Robert Stewart wrote:
>> 
>>> Lets say I have a large index (100M docs, 1TB, split up between 10
>>> indexes).  And a bunch of the "stored" and "indexed" fields are not used in
>>> search at all.  In order to save memory and disk, I'd like to rebuild that
>>> index *without* those fields, but I don't have original documents to
>>> rebuild entire index with (don't have the full-text anymore, etc.).  Is
>>> there some way to rebuild or optimize an existing index with only a sub-set
>>> of the existing indexed fields?  Or alternatively is there a way to avoid
>>> loading some indexed fields at all ( to avoid loading term infos and terms
>>> index ) ?
>>> 
>>> Thanks
>>> Bob
>> 
>> 
>> 
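Robert's wrapper idea — expose only the wanted fields to the merger so the rewritten index never contains the dropped ones — can be modeled with plain collections. This is a toy stand-in for overriding GetFieldNames(...)/Terms(...) on a filtering reader, not the Lucene API; the field names are invented.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class FieldFilterDemo {
    // field -> terms, as a stand-in for the reader's term dictionary
    static Map<String, Set<String>> dropFields(Map<String, Set<String>> fieldTerms,
                                               Set<String> ignoredFields) {
        Map<String, Set<String>> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> e : fieldTerms.entrySet()) {
            // the "merger" only ever sees fields the wrapper lets through
            if (!ignoredFields.contains(e.getKey())) kept.put(e.getKey(), e.getValue());
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> index = new LinkedHashMap<>();
        index.put("id", new LinkedHashSet<>(Arrays.asList("1", "2")));
        index.put("body", new LinkedHashSet<>(Arrays.asList("hello", "world")));
        index.put("unused_meta", new LinkedHashSet<>(Arrays.asList("x")));
        // merging through the wrapper never sees the ignored field
        System.out.println(dropFields(index, Set.of("unused_meta")).keySet());
    }
}
```

The real wrapper would also have to keep stored fields, norms, and term vectors consistent with the filtered field set, which is where Li Li's warnings about the .tii/.tis and .fdt/.fdx files come in.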



Re: Semantic autocomplete with Solr

2012-02-14 Thread Paul Libbrecht
facetting?

paul


Le 14 févr. 2012 à 23:10, Octavian Covalschi a écrit :

> Hey guys,
> 
> Has anyone done any kind of "smart" autocomplete? Let's say we have a web
> store, and we'd like to autocomplete user's searches. So if I'll type in
> "jacket" next word that will be suggested should be something related to
> jacket (color, fabric) etc...
> 
> It seems to me I have to structure this data in a particular way, but that
> way I can do without solr, so I was wondering if Solr could help us.
> 
> Thank you in advance.



Re: Need help with graphing function (MATH)

2012-02-14 Thread Kent Fitch
agreeing with wunder - I don't know the application, but I think almost
always, a set of linear approximations over a few ranges would be ok (and
you could increase the number of ranges until it was), and will be faster.

And if you need just one equation, a sigmoid function will do the trick,
such as

110 - 50((x-100)/20)/(sqrt(1+((x-100)/20)^2))

http://www.wolframalpha.com/input/?i=plot+110+-+50%28%28x-100%29%2F20%29%2F%28sqrt%281%2B%28%28x-100%29%2F20%29^2%29%29%2C+x%3D0..200

Regards

Kent Fitch

On Wed, Feb 15, 2012 at 6:17 AM, Walter Underwood wrote:

> In practice, I expect a linear piecewise function (with sharp corners)
> would be indistinguishable from the smoothed function. It is also much
> easier to read, test, and debug. It might even be faster.
>
> Try the sharp corners one first.
>
> wunder
>
> On Feb 14, 2012, at 10:56 AM, Ted Dunning wrote:
>
> > In general this kind of function is very easy to construct using sums of
> basic sigmoidal functions. The logistic and probit functions are commonly
> used for this.
> >
> > Sent from my iPhone
> >
> > On Feb 14, 2012, at 10:05, Mark  wrote:
> >
> >> Thanks I'll have a look at this. I should have mentioned that the
> actual values on the graph aren't important rather I was showing an example
> of how the function should behave.
> >>
> >> On 2/13/12 6:25 PM, Kent Fitch wrote:
> >>> Hi, assuming you have x and want to generate y, then maybe
> >>>
> >>> - if x < 50, y = 150
> >>>
> >>> - if x > 175, y = 60
> >>>
> >>> - otherwise :
> >>>
> >>> either y = (100/(e^((x-50)/75)^2)) + 50
> >>> http://www.wolframalpha.com/input/?i=plot++%28100%2F%28e^%28%28x+-50%29%2F75%29^2%29%29+%2B+50%2C+x%3D50..175
> >>>
> >>>
> >>> - or maybe y = sin((x+5)/38)*42+105
> >>>
> >>> http://www.wolframalpha.com/input/?i=plot++sin%28%28x%2B5%29%2F38%29*42%2B105%2C+x%3D50..175
> >>>
> >>> Regards,
> >>>
> >>> Kent Fitch
> >>>
> >>> On Tue, Feb 14, 2012 at 12:29 PM, Mark <static.void@gmail.com> wrote:
> >>>
> >>>   I need some help with one of my boost functions. I would like the
> >>>   function to look something like the following mockup below. Starts
> >>>   off flat then there is a gradual decline, steep decline then
> >>>   gradual decline and then back to flat.
> >>>
> >>>   Can some of you math guys please help :)
> >>>
> >>>   Thanks.
> >>>
>
>
>
>
>
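Kent's sigmoid and Walter's piecewise-linear suggestion can be compared directly. A small Java sketch, with illustrative breakpoints taken from the earlier suggestion in the thread (flat at 150 below x=50 and at 60 above x=175); the exact values don't matter since the poster said only the shape does.

```java
public class BoostCurveDemo {
    // Kent's curve: y = 110 - 50u/sqrt(1+u^2), with u = (x-100)/20
    static double sigmoid(double x) {
        double u = (x - 100) / 20.0;
        return 110 - 50 * u / Math.sqrt(1 + u * u);
    }

    // Walter's alternative: linear pieces with sharp corners
    static double piecewise(double x) {
        if (x <= 50) return 150;
        if (x >= 175) return 60;
        // straight line between (50, 150) and (175, 60)
        return 150 + (x - 50) * (60 - 150) / (175 - 50);
    }

    public static void main(String[] args) {
        // sample both curves to see how close they track each other
        for (double x : new double[]{0, 50, 100, 150, 200}) {
            System.out.printf("x=%5.0f  sigmoid=%6.1f  piecewise=%6.1f%n",
                    x, sigmoid(x), piecewise(x));
        }
    }
}
```

As the thread suggests, if the sampled differences are within noise for the application, the piecewise version is easier to read, test, and debug, and avoids the `sqrt` per document.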


Re: SolrCloud Replication Question

2012-02-14 Thread Mark Miller
Doh - looks like I was just seeing a test issue. Do you mind updating and 
trying the latest rev? At the least there should be some better logging around 
the recovery.

I'll keep working on tests in the meantime.

- Mark

On Feb 14, 2012, at 3:15 PM, Jamie Johnson wrote:

> Sounds good, if I pull the latest from trunk and rerun will that be
> useful or were you able to duplicate my issue now?
> 
> On Tue, Feb 14, 2012 at 3:00 PM, Mark Miller  wrote:
>> Okay Jamie, I think I have a handle on this. It looks like an issue with 
>> what config files are being used by cores created with the admin core 
>> handler - I think it's just picking up default config and not the correct 
>> config for the collection. This means they end up using config that has no 
>> UpdateLog defined - and so recovery fails.
>> 
>> I've added more logging around this so that it's easy to determine that.
>> 
>> I'm investigating more and working on a test + fix. I'll file a JIRA issue 
>> soon as well.
>> 
>> - Mark
>> 
>> On Feb 14, 2012, at 11:39 AM, Jamie Johnson wrote:
>> 
>>> Thanks Mark, not a huge rush, just me trying to get to use the latest
>>> stuff on our project.
>>> 
>>> On Tue, Feb 14, 2012 at 10:53 AM, Mark Miller  wrote:
 Sorry, have not gotten it yet, but will be back trying later today - 
 monday, tuesday tend to be slow for me (meetings and crap).
 
 - Mark
 
 On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote:
 
> Has there been any success in replicating this?  I'm wondering if it
> could be something with my setup that is causing the issue...
> 
> 
> On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson  wrote:
>> Yes, I have the following layout on the FS
>> 
>> ./bootstrap.sh
>> ./example (standard example directory from distro containing jetty
>> jars, solr confs, solr war, etc)
>> ./slice1
>>  - start.sh
>>  -solr.xml
>>  - slice1_shard1
>>   - data
>>  - slice2_shard2
>>   -data
>> ./slice2
>>  - start.sh
>>  - solr.xml
>>  -slice2_shard1
>>-data
>>  -slice1_shard2
>>-data
>> 
>> if it matters I'm running everything from localhost, zk and the solr 
>> shards
>> 
>> On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren  wrote:
>>> Do you have unique dataDir for each instance?
>>> 13.2.2012 14.30 "Jamie Johnson"  kirjoitti:
 
 - Mark Miller
 lucidimagination.com
>> 
>> - Mark Miller
>> lucidimagination.com

- Mark Miller
lucidimagination.com

Re: OR-FilterQuery

2012-02-14 Thread Em
Hi Erick,

> Whoa!
>
> fq=id:(1 OR 2)
> is not the same thing at all as
> fq=id:1&fq=id:2
Ahm, who said they would be the same? :)
I mean, you are completely right in what you are saying but it seems to
me that we are talking about two different things.

I was talking about caching each filter-criteria instead of the whole
filter-query to recombine the cached filter-criteria based on the
boolean-operators the client sends.

In other words:
currently
fq=id:1 OR id:2
results into ONE cached filter-entry.

fq=id:2 OR id:1
results into ANOTHER cached filter-entry

fq=id:2 AND id:1
results into (surprise, surprise) a third filter-entry (although this
example does not make sense).

My idea was to cache each filter-criteria, that means caching the bitset
for id:1 and the bitset for id:2 to recombine both bitsets via AND, OR,
NOT etc. whenever this is necessary.

This way one could save memory (and maybe computing-time as well), which
definitely makes sense when you have a much smaller set of
filter-criteria while having a much larger set of possible (and used)
combinations of those filter-criteria, with a small number of repetitions
per combination (which would otherwise destroy the benefit of caching).

Don't you agree?

Kind regards,
Em
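A rough sketch of the recombination idea described above, using java.util.BitSet as a stand-in for Lucene's document bitsets (the class, the cache keys, and the method names are invented here purely for illustration):

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

public class FilterCombiner {
    // one cached bitset per filter criterion, e.g. "permission:user"
    private final Map<String, BitSet> cache = new HashMap<>();

    void put(String criterion, BitSet matchingDocs) {
        cache.put(criterion, matchingDocs);
    }

    // OR-combine cached criteria without re-evaluating any of them
    BitSet union(String... criteria) {
        BitSet result = new BitSet();
        for (String c : criteria) {
            result.or(cache.get(c));
        }
        return result;
    }

    // AND-combine cached criteria (assumes at least one criterion)
    BitSet intersection(String... criteria) {
        BitSet result = (BitSet) cache.get(criteria[0]).clone();
        for (int i = 1; i < criteria.length; i++) {
            result.and(cache.get(criteria[i]));
        }
        return result;
    }
}
```

With this, `permission:user OR permission:moderator` and `permission:moderator OR permission:user` both resolve to the same two cached entries instead of occupying two separate cache slots.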


Am 14.02.2012 22:33, schrieb Erick Erickson:
> Whoa!
> 
> fq=id:(1 OR 2)
> is not the same thing at all as
> fq=id:1&fq=id:2
> 
> Assuming that any document had one and only one ID,  the second clause
> would return exactly 0 documents, each and every time.
> 
> Multiple fq clauses are essentially set intersections. So the first query is 
> the
> set of all documents where id is 1 or 2
> the second is the intersection of two sets of documents, one set
> with an id of 1 and one with an id of 2. Not the same thing at all.
> 
> There's no support for the concept of
> (fq=id:1 OR fq=id:2)
> 
> Best
> Erick
> 
> On Tue, Feb 14, 2012 at 2:13 PM, Em  wrote:
>> Hi Mikhail,
>>
>> thanks for kicking in some brainstorming-code!
>> The given thread is almost a year old and I was working with Solr in my
>> freetime to see where it fails to behave/perform as I expect/wish.
>>
>> I found out that if you got a lot of different access-patterns for a
>> filter-query, you might end up with either a big cache to make things
>> fast or with lower performance (impact depends on usecase and
>> circumstances).
>>
>> Scenario:
>> You got a permission-field and the client is able to filter by one to
>> three permission-values.
>> That is:
>> fq=foo:user
>> fq=foo:moderator
>> fq=foo:manager
>>
>> If you cannot control/guarantee the order of the fq's values, you could
>> end up with a lot of variants which all return the same.
>>
>> Example:
>> fq=permission:user OR permission:moderator OR permission:manager
>> fq=permission:user OR permission:manager OR permission:moderator
>> fq=permission:moderator OR permission:user OR permission:manager
>> ...
>> They all return the same but were cached separately, which means you
>> are wasting a lot of memory.
>>
>> Furthermore, if your access pattern will lead to a lot of different fq's
>> on a small set of distinct values, it may make more sense to cache each
>> filter-query for itself from a memory-consuming point of view (may cost
>> a little bit performance).
>>
>> That being said, if you cache a filter for foo:user, foo:moderator and
>> foo:manager you can combine those filters with AND, OR, NOT or whatever
>> without recomputing every filter over and over again which would be the
>> case if your filter-cache is not large enough.
>>
>> However, I never compared the performance differences (in terms of
>> speed) of a cached filter-query like
>> foo:bar OR foo:baz
>> With a combination of two cached filter-queries like
>> foo:bar
>> foo:baz
>> combined by a logical OR.
>>
>> That's how the background looks like.
>> Unfortunately I didn't have the time to implement this in the past.
>>
>> Back to your post:
>> Looks like a cool idea and is almost what I had in mind!
>>
>> I would formulate an easier syntax so that one is able to "parse" each
>> fq-clause on its own to cache the CachingWrapperFilter to reuse it again.
>>
>>> it will use a per-segment bitset in contrast to Solr's fq which caches for
>>> top level reader.
>> Could you explain why this bitset would be per-segment based, please?
>> I don't see a reason why this *has* to be so.
>> What is the benefit you are seeing?
>>
>> Kind regards,
>> Em
>>
>> Am 14.02.2012 19:33, schrieb Mikhail Khludnev:
>>> Hi Em,
>>>
>>> I briefly read the thread. Are you talking about combining of cached clauses
>>> of BooleanQuery, instead of evaluating whole BQ as a filter?
>>>
>>> I found something like that in API (but only in API)
>>> http://lucene.apache.org/solr/api/org/apache/solr/search/ExtendedQuery.html#setCacheSep(boolean)
>>>
>>> Did I get you right? Why do you need it, btw? If I'm ..
>>> I have idea how to do it in two mins:
>>>
>>> q=+f:text
>>> +(_query_:{!fq}id:1 _query_:{!fq}id:2 _query_:{!fq}id:3 
>>> _query_:{!

Re: OR-FilterQuery

2012-02-14 Thread Erick Erickson
BTW, you're not the first person who would like this capability, see:
https://issues.apache.org/jira/browse/SOLR-1223

But the fact that this JIRA was originally opened in June of 2009
and hasn't been implemented yet indicates that it's not super-high
priority.

Best
Erick

On Tue, Feb 14, 2012 at 4:33 PM, Erick Erickson  wrote:
> Whoa!
>
> fq=id:(1 OR 2)
> is not the same thing at all as
> fq=id:1&fq=id:2
>
> Assuming that any document had one and only one ID,  the second clause
> would return exactly 0 documents, each and every time.
>
> Multiple fq clauses are essentially set intersections. So the first query is 
> the
> set of all documents where id is 1 or 2
> the second is the intersection of two sets of documents, one set
> with an id of 1 and one with an id of 2. Not the same thing at all.
>
> There's no support for the concept of
> (fq=id:1 OR fq=id:2)
>
> Best
> Erick
>
> On Tue, Feb 14, 2012 at 2:13 PM, Em  wrote:
>> Hi Mikhail,
>>
>> thanks for kicking in some brainstorming-code!
>> The given thread is almost a year old and I was working with Solr in my
>> freetime to see where it fails to behave/perform as I expect/wish.
>>
>> I found out that if you got a lot of different access-patterns for a
>> filter-query, you might end up with either a big cache to make things
>> fast or with lower performance (impact depends on usecase and
>> circumstances).
>>
>> Scenario:
>> You got a permission-field and the client is able to filter by one to
>> three permission-values.
>> That is:
>> fq=foo:user
>> fq=foo:moderator
>> fq=foo:manager
>>
>> If you cannot control/guarantee the order of the fq's values, you could
>> end up with a lot of variants which all return the same.
>>
>> Example:
>> fq=permission:user OR permission:moderator OR permission:manager
>> fq=permission:user OR permission:manager OR permission:moderator
>> fq=permission:moderator OR permission:user OR permission:manager
>> ...
>> They all return the same but were cached separately, which means you
>> are wasting a lot of memory.
>>
>> Furthermore, if your access pattern will lead to a lot of different fq's
>> on a small set of distinct values, it may make more sense to cache each
>> filter-query for itself from a memory-consuming point of view (may cost
>> a little bit performance).
>>
>> That being said, if you cache a filter for foo:user, foo:moderator and
>> foo:manager you can combine those filters with AND, OR, NOT or whatever
>> without recomputing every filter over and over again which would be the
>> case if your filter-cache is not large enough.
>>
>> However, I never compared the performance differences (in terms of
>> speed) of a cached filter-query like
>> foo:bar OR foo:baz
>> With a combination of two cached filter-queries like
>> foo:bar
>> foo:baz
>> combined by a logical OR.
>>
>> That's how the background looks like.
>> Unfortunately I didn't have the time to implement this in the past.
>>
>> Back to your post:
>> Looks like a cool idea and is almost what I had in mind!
>>
>> I would formulate an easier syntax so that one is able to "parse" each
>> fq-clause on its own to cache the CachingWrapperFilter to reuse it again.
>>
>>> it will use a per-segment bitset in contrast to Solr's fq which caches for
>>> top level reader.
>> Could you explain why this bitset would be per-segment based, please?
>> I don't see a reason why this *has* to be so.
>> What is the benefit you are seeing?
>>
>> Kind regards,
>> Em
>>
>> Am 14.02.2012 19:33, schrieb Mikhail Khludnev:
>>> Hi Em,
>>>
>>> I briefly read the thread. Are you talking about combining of cached clauses
>>> of BooleanQuery, instead of evaluating whole BQ as a filter?
>>>
>>> I found something like that in API (but only in API)
>>> http://lucene.apache.org/solr/api/org/apache/solr/search/ExtendedQuery.html#setCacheSep(boolean)
>>>
>>> Did I get you right? Why do you need it, btw? If I'm ..
>>> I have idea how to do it in two mins:
>>>
>>> q=+f:text
>>> +(_query_:{!fq}id:1 _query_:{!fq}id:2 _query_:{!fq}id:3 
>>> _query_:{!fq}id:4)...
>>>
>>> Right leg will be a BooleanQuery with SHOULD clauses backed on cached
>>> queries (see below).
>>>
>>> if you are not scared by the syntax yet you can implement trivial
>>> "fq"QParserPlugin, which will be just
>>>
>>> // lazily through User/Generic Cache
>>> q = new FilteredQuery (new MatchAllDocsQuery(), new
>>> CachingWrapperFilter(new
>>> QueryWrapperFilter(subQuery(localParams.get(QueryParsing.V)))));
>>> return q;
>>>
>>> it will use a per-segment bitset in contrast to Solr's fq which caches for
>>> top level reader.
>>>
>>> WDYT?
>>>
>>> On Mon, Feb 13, 2012 at 11:34 PM, Em  wrote:
>>>
 Hi,

 have a look at:
 http://search-lucene.com/m/Z8lWGEiKoI

 I think not much had changed since then.

 Regards,
 Em

 Am 13.02.2012 20:17, schrieb spr...@gmx.eu:
> Hi,
>
> how efficient is such a query:
>
> q=some text
> fq=id:(1 OR 2 OR 3...)
>
> Should I 

Re: OR-FilterQuery

2012-02-14 Thread Erick Erickson
Whoa!

fq=id:(1 OR 2)
is not the same thing at all as
fq=id:1&fq=id:2

Assuming that any document had one and only one ID,  the second clause
would return exactly 0 documents, each and every time.

Multiple fq clauses are essentially set intersections. So the first query is the
set of all documents where id is 1 or 2
the second is the intersection of two sets of documents, one set
with an id of 1 and one with an id of 2. Not the same thing at all.

There's no support for the concept of
(fq=id:1 OR fq=id:2)

Best
Erick
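The set-intersection point above can be sketched with plain Java collections (the doc-id sets are invented here for illustration only):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class FqSemantics {
    // fq=id:(1 OR 2) -> a single filter matching the union of both ids
    static Set<Integer> orFilter(Set<Integer> ids1, Set<Integer> ids2) {
        Set<Integer> union = new TreeSet<>(ids1);
        union.addAll(ids2);
        return union;
    }

    // fq=id:1&fq=id:2 -> two separate filters whose results are intersected
    static Set<Integer> separateFqs(Set<Integer> ids1, Set<Integer> ids2) {
        Set<Integer> intersection = new HashSet<>(ids1);
        intersection.retainAll(ids2);
        return intersection;
    }

    public static void main(String[] args) {
        Set<Integer> docsWithId1 = new HashSet<>(Arrays.asList(1));
        Set<Integer> docsWithId2 = new HashSet<>(Arrays.asList(2));
        System.out.println(orFilter(docsWithId1, docsWithId2));    // [1, 2]
        System.out.println(separateFqs(docsWithId1, docsWithId2)); // [] -- no doc has both ids
    }
}
```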

On Tue, Feb 14, 2012 at 2:13 PM, Em  wrote:
> Hi Mikhail,
>
> thanks for kicking in some brainstorming-code!
> The given thread is almost a year old and I was working with Solr in my
> freetime to see where it fails to behave/perform as I expect/wish.
>
> I found out that if you got a lot of different access-patterns for a
> filter-query, you might end up with either a big cache to make things
> fast or with lower performance (impact depends on usecase and
> circumstances).
>
> Scenario:
> You got a permission-field and the client is able to filter by one to
> three permission-values.
> That is:
> fq=foo:user
> fq=foo:moderator
> fq=foo:manager
>
> If you cannot control/guarantee the order of the fq's values, you could
> end up with a lot of variants which all return the same.
>
> Example:
> fq=permission:user OR permission:moderator OR permission:manager
> fq=permission:user OR permission:manager OR permission:moderator
> fq=permission:moderator OR permission:user OR permission:manager
> ...
> They all return the same but were cached separately, which means you
> are wasting a lot of memory.
>
> Furthermore, if your access pattern will lead to a lot of different fq's
> on a small set of distinct values, it may make more sense to cache each
> filter-query for itself from a memory-consuming point of view (may cost
> a little bit performance).
>
> That being said, if you cache a filter for foo:user, foo:moderator and
> foo:manager you can combine those filters with AND, OR, NOT or whatever
> without recomputing every filter over and over again which would be the
> case if your filter-cache is not large enough.
>
> However, I never compared the performance differences (in terms of
> speed) of a cached filter-query like
> foo:bar OR foo:baz
> With a combination of two cached filter-queries like
> foo:bar
> foo:baz
> combined by a logical OR.
>
> That's how the background looks like.
> Unfortunately I didn't have the time to implement this in the past.
>
> Back to your post:
> Looks like a cool idea and is almost what I had in mind!
>
> I would formulate an easier syntax so that one is able to "parse" each
> fq-clause on its own to cache the CachingWrapperFilter to reuse it again.
>
>> it will use a per-segment bitset in contrast to Solr's fq which caches for
>> top level reader.
> Could you explain why this bitset would be per-segment based, please?
> I don't see a reason why this *has* to be so.
> What is the benefit you are seeing?
>
> Kind regards,
> Em
>
> Am 14.02.2012 19:33, schrieb Mikhail Khludnev:
>> Hi Em,
>>
>> I briefly read the thread. Are you talking about combining of cached clauses
>> of BooleanQuery, instead of evaluating whole BQ as a filter?
>>
>> I found something like that in API (but only in API)
>> http://lucene.apache.org/solr/api/org/apache/solr/search/ExtendedQuery.html#setCacheSep(boolean)
>>
>> Did I get you right? Why do you need it, btw? If I'm ..
>> I have idea how to do it in two mins:
>>
>> q=+f:text
>> +(_query_:{!fq}id:1 _query_:{!fq}id:2 _query_:{!fq}id:3 _query_:{!fq}id:4)...
>>
>> Right leg will be a BooleanQuery with SHOULD clauses backed on cached
>> queries (see below).
>>
>> if you are not scared by the syntax yet you can implement trivial
>> "fq"QParserPlugin, which will be just
>>
>> // lazily through User/Generic Cache
>> q = new FilteredQuery (new MatchAllDocsQuery(), new
>> CachingWrapperFilter(new
>> QueryWrapperFilter(subQuery(localParams.get(QueryParsing.V)))));
>> return q;
>>
>> it will use a per-segment bitset in contrast to Solr's fq which caches for
>> top level reader.
>>
>> WDYT?
>>
>> On Mon, Feb 13, 2012 at 11:34 PM, Em  wrote:
>>
>>> Hi,
>>>
>>> have a look at:
>>> http://search-lucene.com/m/Z8lWGEiKoI
>>>
>>> I think not much had changed since then.
>>>
>>> Regards,
>>> Em
>>>
>>> Am 13.02.2012 20:17, schrieb spr...@gmx.eu:
 Hi,

 how efficient is such a query:

 q=some text
 fq=id:(1 OR 2 OR 3...)

 Should I better use q:some text AND id:(1 OR 2 OR 3...)?

 Is the Filter Cache used for the OR'ed fq?

 Thank you


>>>
>>
>>
>>


Re: SolrCloud Replication Question

2012-02-14 Thread Jamie Johnson
Sounds good, if I pull the latest from trunk and rerun will that be
useful or were you able to duplicate my issue now?

On Tue, Feb 14, 2012 at 3:00 PM, Mark Miller  wrote:
> Okay Jamie, I think I have a handle on this. It looks like an issue with what 
> config files are being used by cores created with the admin core handler - I 
> think it's just picking up default config and not the correct config for the 
> collection. This means they end up using config that has no UpdateLog defined 
> - and so recovery fails.
>
> I've added more logging around this so that it's easy to determine that.
>
> I'm investigating more and working on a test + fix. I'll file a JIRA issue 
> soon as well.
>
> - Mark
>
> On Feb 14, 2012, at 11:39 AM, Jamie Johnson wrote:
>
>> Thanks Mark, not a huge rush, just me trying to get to use the latest
>> stuff on our project.
>>
>> On Tue, Feb 14, 2012 at 10:53 AM, Mark Miller  wrote:
>>> Sorry, have not gotten it yet, but will be back trying later today - 
>>> monday, tuesday tend to be slow for me (meetings and crap).
>>>
>>> - Mark
>>>
>>> On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote:
>>>
 Has there been any success in replicating this?  I'm wondering if it
 could be something with my setup that is causing the issue...


 On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson  wrote:
> Yes, I have the following layout on the FS
>
> ./bootstrap.sh
> ./example (standard example directory from distro containing jetty
> jars, solr confs, solr war, etc)
> ./slice1
>  - start.sh
>  -solr.xml
>  - slice1_shard1
>   - data
>  - slice2_shard2
>   -data
> ./slice2
>  - start.sh
>  - solr.xml
>  -slice2_shard1
>    -data
>  -slice1_shard2
>    -data
>
> if it matters I'm running everything from localhost, zk and the solr 
> shards
>
> On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren  wrote:
>> Do you have unique dataDir for each instance?
>> 13.2.2012 14.30 "Jamie Johnson"  kirjoitti:
>>>
>>> - Mark Miller
>>> lucidimagination.com
>
> - Mark Miller
> lucidimagination.com


Re: SolrCloud Replication Question

2012-02-14 Thread Mark Miller
Okay Jamie, I think I have a handle on this. It looks like an issue with what 
config files are being used by cores created with the admin core handler - I 
think it's just picking up default config and not the correct config for the 
collection. This means they end up using config that has no UpdateLog defined - 
and so recovery fails.

I've added more logging around this so that it's easy to determine that.

I'm investigating more and working on a test + fix. I'll file a JIRA issue soon 
as well.

- Mark

On Feb 14, 2012, at 11:39 AM, Jamie Johnson wrote:

> Thanks Mark, not a huge rush, just me trying to get to use the latest
> stuff on our project.
> 
> On Tue, Feb 14, 2012 at 10:53 AM, Mark Miller  wrote:
>> Sorry, have not gotten it yet, but will be back trying later today - monday, 
>> tuesday tend to be slow for me (meetings and crap).
>> 
>> - Mark
>> 
>> On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote:
>> 
>>> Has there been any success in replicating this?  I'm wondering if it
>>> could be something with my setup that is causing the issue...
>>> 
>>> 
>>> On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson  wrote:
 Yes, I have the following layout on the FS
 
 ./bootstrap.sh
 ./example (standard example directory from distro containing jetty
 jars, solr confs, solr war, etc)
 ./slice1
  - start.sh
  -solr.xml
  - slice1_shard1
   - data
  - slice2_shard2
   -data
 ./slice2
  - start.sh
  - solr.xml
  -slice2_shard1
-data
  -slice1_shard2
-data
 
 if it matters I'm running everything from localhost, zk and the solr shards
 
 On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren  wrote:
> Do you have unique dataDir for each instance?
> 13.2.2012 14.30 "Jamie Johnson"  kirjoitti:
>> 
>> - Mark Miller
>> lucidimagination.com

- Mark Miller
lucidimagination.com

Re: Solr 3.5 not starting on CentOS 6 or RHEL 5

2012-02-14 Thread Yonik Seeley
Perhaps this is some kind of vufind specific issue?
The server (/example) bundled with solr unpacks the war in
/example/work and not /tmp

-Yonik
lucidimagination.com

On Mon, Feb 13, 2012 at 7:06 PM, Bernhardt, Russell (CIV)
 wrote:
> A software package we use recently upgraded to Solr 3.5 (from 1.4.1) and now 
> we're having problems getting the Solr server to start up under RHEL 5 or 
> CentOS 6.
>
> I upgraded our local install of Java to the latest from Oracle and it didn't 
> help, even removed the local OpenJDK just to be sure.
>
> When starting jetty manually (with java -jar start.jar) I get the following 
> messages:
>
> 2012-02-13 07:52:55.954::INFO:  Logging to STDERR via 
> org.mortbay.log.StdErrLog
> 2012-02-13 07:52:56.120::INFO:  jetty-6.1.11
> 2012-02-13 07:52:56.184::INFO:  Extract 
> jar:file:/opt/vufind/solr/jetty/webapps/solr.war!/ to 
> /tmp/Jetty_0_0_0_0_8080_solr.war__solr__7k9npr/webapp
> 2012-02-13 07:52:56.702::WARN:  Failed startup of context 
> org.mortbay.jetty.webapp.WebAppContext@15aaf0b3{/solr,jar:file:/opt/vufind/solr/jetty/webapps/solr.war!/}
> java.util.zip.ZipException: error in opening zip file
>        at java.util.zip.ZipFile.open(Native Method)
>        at java.util.zip.ZipFile.<init>(Unknown Source)
>        at java.util.jar.JarFile.<init>(Unknown Source)
>        at java.util.jar.JarFile.<init>(Unknown Source)
>        at 
> org.mortbay.jetty.webapp.TagLibConfiguration.configureWebApp(TagLibConfiguration.java:168)
>        at 
> org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1217)
>        at 
> org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:513)
>        at 
> org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
>        at 
> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:39)
>        at 
> org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
>        at 
> org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
>        at 
> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:39)
>        at 
> org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
>        at 
> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:39)
>        at 
> org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
>        at org.mortbay.jetty.Server.doStart(Server.java:222)
>        at 
> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:39)
>        at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:977)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>        at java.lang.reflect.Method.invoke(Unknown Source)
>        at org.mortbay.start.Main.invokeMain(Main.java:194)
>        at org.mortbay.start.Main.start(Main.java:512)
>        at org.mortbay.start.Main.main(Main.java:119)
> 2012-02-13 07:52:56.713::INFO:  Opened 
> /opt/vufind/solr/jetty/logs/2012_02_13.request.log
> 2012-02-13 07:52:56.740::INFO:  Started SelectChannelConnector@0.0.0.0:8080
>
> Jetty starts up just fine but shows a 503 error when attempting to access 
> localhost:8080/solr/. The temp directory structure does exist in /tmp/. Any 
> ideas?
>
> Thanks,
>
> Russ Bernhardt
> Systems Analyst
> Library Information Systems
> Naval Postgraduate School, Monterey CA
>


Re: Need help with graphing function (MATH)

2012-02-14 Thread Walter Underwood
In practice, I expect a linear piecewise function (with sharp corners) would be 
indistinguishable from the smoothed function. It is also much easier to read, 
test, and debug. It might even be faster.

Try the sharp corners one first.

wunder
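The linear piecewise version, using the breakpoints from Kent's reply (flat at 150 below x=50, flat at 60 above x=175, a straight line in between), could be sketched as plain Java before being ported into a function query or custom ValueSource; the values are placeholders, not tuned:

```java
public class PiecewiseBoost {
    // Kent's breakpoints: flat at 150 up to x=50, flat at 60 from x=175,
    // straight line in between (slope = (60 - 150) / (175 - 50) = -0.72)
    static double boost(double x) {
        if (x <= 50) return 150;
        if (x >= 175) return 60;
        return 150 - 0.72 * (x - 50);
    }
}
```

A smoothed sigmoid variant can be swapped in later if the sharp corners turn out to matter.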

On Feb 14, 2012, at 10:56 AM, Ted Dunning wrote:

> In general this kind of function is very easy to construct using sums of 
> basic sigmoidal functions. The logistic and probit functions are commonly 
> used for this. 
> 
> Sent from my iPhone
> 
> On Feb 14, 2012, at 10:05, Mark  wrote:
> 
>> Thanks I'll have a look at this. I should have mentioned that the actual 
>> values on the graph aren't important rather I was showing an example of how 
>> the function should behave.
>> 
>> On 2/13/12 6:25 PM, Kent Fitch wrote:
>>> Hi, assuming you have x and want to generate y, then maybe
>>> 
>>> - if x < 50, y = 150
>>> 
>>> - if x > 175, y = 60
>>> 
>>> - otherwise :
>>> 
>>> either y = (100/(e^((x -50)/75)^2)) + 50
>>> http://www.wolframalpha.com/input/?i=plot++%28100%2F%28e^%28%28x+-50%29%2F75%29^2%29%29+%2B+50%2C+x%3D50..175
>>> 
>>> 
>>> - or maybe y =sin((x+5)/38)*42+105
>>> 
>>> http://www.wolframalpha.com/input/?i=plot++sin%28%28x%2B5%29%2F38%29*42%2B105%2C+x%3D50..175
>>> 
>>> Regards,
>>> 
>>> Kent Fitch
>>> 
>>> On Tue, Feb 14, 2012 at 12:29 PM, Mark  wrote:
>>> 
>>>   I need some help with one of my boost functions. I would like the
>>>   function to look something like the following mockup below. Starts
>>>   off flat then there is a gradual decline, steep decline then
>>>   gradual decline and then back to flat.
>>> 
>>>   Can some of you math guys please help :)
>>> 
>>>   Thanks.
>>> 






Re: Need help with graphing function (MATH)

2012-02-14 Thread Em
Hi Mark,

did you already have a look at http://wiki.apache.org/solr/FunctionQuery ?

Regards,
Em
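For the piecewise curve from the other thread, one possible shape as a function query is nested map() calls. This is an untested sketch: the field name x, the breakpoints, and the 10000 upper bound are placeholders, and it assumes map() accepts nested function arguments for its target/default values, which may depend on the Solr version:

```
bf=map(x,0,50,150,map(x,175,10000,60,sub(150,product(0.72,sub(x,50)))))
```

The outer map pins y=150 for x in [0,50], the inner map pins y=60 for x in [175,10000], and everything else falls through to the linear segment 150 - 0.72*(x-50).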

Am 14.02.2012 20:09, schrieb Mark:
> Or better yet an example in solr would be best :)
> 
> Thanks!
> 
> On 2/14/12 11:05 AM, Mark wrote:
>> Would you mind throwing out an example of these types of functions.
>> Looking at Wikipedia (http://en.wikipedia.org/wiki/Probit) it seems
>> like the Probit function is very similar to what I want.
>>
>> Thanks
>>
>> On 2/14/12 10:56 AM, Ted Dunning wrote:
>>> In general this kind of function is very easy to construct using sums
>>> of basic sigmoidal functions. The logistic and probit functions are
>>> commonly used for this.
>>>
>>> Sent from my iPhone
>>>
>>> On Feb 14, 2012, at 10:05, Mark  wrote:
>>>
 Thanks I'll have a look at this. I should have mentioned that the
 actual values on the graph aren't important rather I was showing an
 example of how the function should behave.

 On 2/13/12 6:25 PM, Kent Fitch wrote:
> Hi, assuming you have x and want to generate y, then maybe
>
> - if x<  50, y = 150
>
> - if x>  175, y = 60
>
> - otherwise :
>
> either y = (100/(e^((x -50)/75)^2)) + 50
> http://www.wolframalpha.com/input/?i=plot++%28100%2F%28e^%28%28x+-50%29%2F75%29^2%29%29+%2B+50%2C+x%3D50..175
>
>
>
> - or maybe y =sin((x+5)/38)*42+105
>
> http://www.wolframalpha.com/input/?i=plot++sin%28%28x%2B5%29%2F38%29*42%2B105%2C+x%3D50..175
>
>
> Regards,
>
> Kent Fitch
>
> On Tue, Feb 14, 2012 at 12:29 PM,
> Mark  wrote:
>
> I need some help with one of my boost functions. I would like the
> function to look something like the following mockup below. Starts
> off flat then there is a gradual decline, steep decline then
> gradual decline and then back to flat.
>
> Can some of you math guys please help :)
>
> Thanks.
>
>
>
>
> 


Re: OR-FilterQuery

2012-02-14 Thread Em
Hi Mikhail,

thanks for kicking in some brainstorming-code!
The given thread is almost a year old and I was working with Solr in my
freetime to see where it fails to behave/perform as I expect/wish.

I found out that if you got a lot of different access-patterns for a
filter-query, you might end up with either a big cache to make things
fast or with lower performance (impact depends on usecase and
circumstances).

Scenario:
You got a permission-field and the client is able to filter by one to
three permission-values.
That is:
fq=foo:user
fq=foo:moderator
fq=foo:manager

If you cannot control/guarantee the order of the fq's values, you could
end up with a lot of variants which all return the same.

Example:
fq=permission:user OR permission:moderator OR permission:manager
fq=permission:user OR permission:manager OR permission:moderator
fq=permission:moderator OR permission:user OR permission:manager
...
They all return the same but were cached separately, which means you
are wasting a lot of memory.

Furthermore, if your access pattern will lead to a lot of different fq's
on a small set of distinct values, it may make more sense to cache each
filter-query for itself from a memory-consuming point of view (may cost
a little bit performance).

That being said, if you cache a filter for foo:user, foo:moderator and
foo:manager you can combine those filters with AND, OR, NOT or whatever
without recomputing every filter over and over again which would be the
case if your filter-cache is not large enough.

However, I never compared the performance differences (in terms of
speed) of a cached filter-query like
foo:bar OR foo:baz
With a combination of two cached filter-queries like
foo:bar
foo:baz
combined by a logical OR.

That's how the background looks like.
Unfortunately I didn't have the time to implement this in the past.

Back to your post:
Looks like a cool idea and is almost what I had in mind!

I would formulate an easier syntax so that one is able to "parse" each
fq-clause on its own to cache the CachingWrapperFilter to reuse it again.

> it will use a per-segment bitset in contrast to Solr's fq which caches for
> top level reader.
Could you explain why this bitset would be per-segment based, please?
I don't see a reason why this *has* to be so.
What is the benefit you are seeing?

Kind regards,
Em

Am 14.02.2012 19:33, schrieb Mikhail Khludnev:
> Hi Em,
> 
> I briefly read the thread. Are you talking about combining of cached clauses
> of BooleanQuery, instead of evaluating whole BQ as a filter?
> 
> I found something like that in API (but only in API)
> http://lucene.apache.org/solr/api/org/apache/solr/search/ExtendedQuery.html#setCacheSep(boolean)
> 
> Did I get you right? Why do you need it, btw? If I'm ..
> I have idea how to do it in two mins:
> 
> q=+f:text
> +(_query_:{!fq}id:1 _query_:{!fq}id:2 _query_:{!fq}id:3 _query_:{!fq}id:4)...
> 
> Right leg will be a BooleanQuery with SHOULD clauses backed on cached
> queries (see below).
> 
> if you are not scared by the syntax yet you can implement trivial
> "fq"QParserPlugin, which will be just
> 
> // lazily through User/Generic Cache
> q = new FilteredQuery (new MatchAllDocsQuery(), new
> CachingWrapperFilter(new
> QueryWrapperFilter(subQuery(localParams.get(QueryParsing.V)))));
> return q;
> 
> it will use a per-segment bitset in contrast to Solr's fq which caches for
> top level reader.
> 
> WDYT?
> 
> On Mon, Feb 13, 2012 at 11:34 PM, Em  wrote:
> 
>> Hi,
>>
>> have a look at:
>> http://search-lucene.com/m/Z8lWGEiKoI
>>
>> I think not much had changed since then.
>>
>> Regards,
>> Em
>>
>> Am 13.02.2012 20:17, schrieb spr...@gmx.eu:
>>> Hi,
>>>
>>> how efficient is such a query:
>>>
>>> q=some text
>>> fq=id:(1 OR 2 OR 3...)
>>>
>>> Should I better use q:some text AND id:(1 OR 2 OR 3...)?
>>>
>>> Is the Filter Cache used for the OR'ed fq?
>>>
>>> Thank you
>>>
>>>
>>
> 
> 
> 


Re: Need help with graphing function (MATH)

2012-02-14 Thread Mark

Or better yet an example in solr would be best :)

Thanks!

On 2/14/12 11:05 AM, Mark wrote:
Would you mind throwing out an example of these types of functions. 
Looking at Wikipedia (http://en.wikipedia.org/wiki/Probit) it seems 
like the Probit function is very similar to what I want.


Thanks

On 2/14/12 10:56 AM, Ted Dunning wrote:
In general this kind of function is very easy to construct using sums 
of basic sigmoidal functions. The logistic and probit functions are 
commonly used for this.


Sent from my iPhone

On Feb 14, 2012, at 10:05, Mark  wrote:

Thanks, I'll have a look at this. I should have mentioned that the 
actual values on the graph aren't important; rather, I was showing an 
example of how the function should behave.


On 2/13/12 6:25 PM, Kent Fitch wrote:

Hi, assuming you have x and want to generate y, then maybe

- if x<  50, y = 150

- if x>  175, y = 60

- otherwise :

either y = (100/(e^((x -50)/75)^2)) + 50
http://www.wolframalpha.com/input/?i=plot++%28100%2F%28e^%28%28x+-50%29%2F75%29^2%29%29+%2B+50%2C+x%3D50..175 




- or maybe y =sin((x+5)/38)*42+105

http://www.wolframalpha.com/input/?i=plot++sin%28%28x%2B5%29%2F38%29*42%2B105%2C+x%3D50..175 



Regards,

Kent Fitch

On Tue, Feb 14, 2012 at 12:29 PM, Mark  wrote:


I need some help with one of my boost functions. I would like the
function to look something like the following mockup below. Starts
off flat then there is a gradual decline, steep decline then
gradual decline and then back to flat.

Can some of you math guys please help :)

Thanks.






Re: Need help with graphing function (MATH)

2012-02-14 Thread Mark
Would you mind throwing out an example of these types of functions. 
Looking at Wikipedia (http://en.wikipedia.org/wiki/Probit) it seems 
like the Probit function is very similar to what I want.


Thanks

On 2/14/12 10:56 AM, Ted Dunning wrote:

In general this kind of function is very easy to construct using sums of basic 
sigmoidal functions. The logistic and probit functions are commonly used for 
this.

Sent from my iPhone

On Feb 14, 2012, at 10:05, Mark  wrote:


Thanks, I'll have a look at this. I should have mentioned that the actual values 
on the graph aren't important; rather, I was showing an example of how the 
function should behave.

On 2/13/12 6:25 PM, Kent Fitch wrote:

Hi, assuming you have x and want to generate y, then maybe

- if x<  50, y = 150

- if x>  175, y = 60

- otherwise :

either y = (100/(e^((x -50)/75)^2)) + 50
http://www.wolframalpha.com/input/?i=plot++%28100%2F%28e^%28%28x+-50%29%2F75%29^2%29%29+%2B+50%2C+x%3D50..175


- or maybe y =sin((x+5)/38)*42+105

http://www.wolframalpha.com/input/?i=plot++sin%28%28x%2B5%29%2F38%29*42%2B105%2C+x%3D50..175

Regards,

Kent Fitch

On Tue, Feb 14, 2012 at 12:29 PM, Mark  wrote:

I need some help with one of my boost functions. I would like the
function to look something like the following mockup below. Starts
off flat then there is a gradual decline, steep decline then
gradual decline and then back to flat.

Can some of you math guys please help :)

Thanks.






Re: Need help with graphing function (MATH)

2012-02-14 Thread Ted Dunning
In general this kind of function is very easy to construct using sums of basic 
sigmoidal functions. The logistic and probit functions are commonly used for 
this. 

Sent from my iPhone

On Feb 14, 2012, at 10:05, Mark  wrote:

> Thanks, I'll have a look at this. I should have mentioned that the actual 
> values on the graph aren't important; rather, I was showing an example of how 
> the function should behave.
> 
> On 2/13/12 6:25 PM, Kent Fitch wrote:
>> Hi, assuming you have x and want to generate y, then maybe
>> 
>> - if x < 50, y = 150
>> 
>> - if x > 175, y = 60
>> 
>> - otherwise :
>> 
>> either y = (100/(e^((x -50)/75)^2)) + 50
>> http://www.wolframalpha.com/input/?i=plot++%28100%2F%28e^%28%28x+-50%29%2F75%29^2%29%29+%2B+50%2C+x%3D50..175
>> 
>> 
>> - or maybe y =sin((x+5)/38)*42+105
>> 
>> http://www.wolframalpha.com/input/?i=plot++sin%28%28x%2B5%29%2F38%29*42%2B105%2C+x%3D50..175
>> 
>> Regards,
>> 
>> Kent Fitch
>> 
>> On Tue, Feb 14, 2012 at 12:29 PM, Mark  wrote:
>> 
>>I need some help with one of my boost functions. I would like the
>>function to look something like the following mockup below. Starts
>>off flat then there is a gradual decline, steep decline then
>>gradual decline and then back to flat.
>> 
>>Can some of you math guys please help :)
>> 
>>Thanks.
>> 
>> 
>> 
>> 


Re: OR-FilterQuery

2012-02-14 Thread Mikhail Khludnev
Hi Em,

I briefly read the thread. Are you talking about combining cached clauses
of BooleanQuery, instead of evaluating the whole BQ as a filter?

I found something like that in API (but only in API)
http://lucene.apache.org/solr/api/org/apache/solr/search/ExtendedQuery.html#setCacheSep(boolean)

Am I getting you right? Why do you need it, btw? If I'm ..
I have an idea of how to do it in two mins:

q=+f:text
+(_query_:{!fq}id:1 _query_:{!fq}id:2 _query_:{!fq}id:3 _query_:{!fq}id:4)...

The right leg will be a BooleanQuery with SHOULD clauses backed by cached
queries (see below).

if you are not scared by the syntax yet, you can implement a trivial
"fq" QParserPlugin, which will be just

// lazily through User/Generic Cache
q = new FilteredQuery(new MatchAllDocsQuery(),
    new CachingWrapperFilter(
        new QueryWrapperFilter(subQuery(localParams.get(QueryParsing.V)))));
return q;

it will use a per-segment bitset, in contrast to Solr's fq which caches for
the top-level reader.

WDYT?

On Mon, Feb 13, 2012 at 11:34 PM, Em  wrote:

> Hi,
>
> have a look at:
> http://search-lucene.com/m/Z8lWGEiKoI
>
> I think not much has changed since then.
>
> Regards,
> Em
>
> Am 13.02.2012 20:17, schrieb spr...@gmx.eu:
> > Hi,
> >
> > how efficient is such a query:
> >
> > q=some text
> > fq=id:(1 OR 2 OR 3...)
> >
> > Should I better use q:some text AND id:(1 OR 2 OR 3...)?
> >
> > Is the Filter Cache used for the OR'ed fq?
> >
> > Thank you
> >
> >
>



-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics


 


Re: Re: Solr 3.5 not starting on CentOS 6 or RHEL 5

2012-02-14 Thread Bernhardt, Russell (CIV)
Nope, I don't have a custom /tmp mount in fstab, I just have a basic CentOS 6 
install for development and testing...
 Full everyone read/write permissions are in place on /tmp too.


> Is /tmp a separate file system? There are problems with people
> mounting /tmp with 'noexec' as a security precaution, which then
> causes Solr to fail.


Russ Bernhardt
Systems Office
Dudley Knox Library, Naval Postgraduate School
Monterey, CA


Re: Need help with graphing function (MATH)

2012-02-14 Thread Gora Mohanty
On 14 February 2012 23:35, Mark  wrote:
> Thanks, I'll have a look at this. I should have mentioned that the actual
> values on the graph aren't important; rather, I was showing an example of how
> the function should behave.
[...]

>> either y = (100/(e^((x -50)/75)^2)) + 50
[...]

In general, the exponential will be better behaved than the sinusoid.
You can change the exact values by tweaking the coefficients in the
equation.

Regards,
Gora


Re: OR-FilterQuery

2012-02-14 Thread Erick Erickson
bq:  Is the Filter Cache used for the OR'ed fq?

The filter cache is actually pretty simple conceptually. It's
just a map where the key is the fq and the value is the set
of documents that satisfy that fq (we'll skip the implementation
here, just think of it as the list of all the docs that the fq selects).

Solr doesn't attempt to do much with the key, just think of it
as a single string. Whether or not an fq is reused from the
cache depends upon whether the key is in the map.

So fq=id:(1 OR 2 OR 3) will just look to see if
"id:(1 OR 2 OR 3)" is a key. If so, it'll just use the
document list stored in the cache.

It won't match
"id:(1 OR 2)"
or
"id:(2)
or
"id:1 OR id:2 OR id:3"

In other words, there's no attempt to decompose the fq clause
and store parts of it in the cache; it's exact match or
nothing.
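That exact-string-match behaviour can be sketched as a plain map. This is a toy model only: the class and method names are made up, and the real filterCache maps the parsed query to a DocSet, but the lookup is still an exact match on the key, as modeled here:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the filter cache: a map from the fq clause (treated as an
// opaque key) to the set of documents it selects.
class FilterCacheModel {
    private final Map<String, Set<Integer>> cache = new HashMap<>();

    // Returns the cached doc set, or null on a cache miss.
    Set<Integer> lookup(String fq) {
        return cache.get(fq);
    }

    void put(String fq, Set<Integer> docs) {
        cache.put(fq, docs);
    }

    public static void main(String[] args) {
        FilterCacheModel fc = new FilterCacheModel();
        fc.put("id:(1 OR 2 OR 3)", new HashSet<>(Arrays.asList(1, 2, 3)));

        // Exact key: hit
        System.out.println(fc.lookup("id:(1 OR 2 OR 3)") != null); // true
        // Logically related but textually different keys: miss
        System.out.println(fc.lookup("id:(1 OR 2)") != null);          // false
        System.out.println(fc.lookup("id:1 OR id:2 OR id:3") != null); // false
    }
}
```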

Hope that helps
Erick

On Mon, Feb 13, 2012 at 2:17 PM,   wrote:
> Hi,
>
> how efficient is such a query:
>
> q=some text
> fq=id:(1 OR 2 OR 3...)
>
> Should I better use q:some text AND id:(1 OR 2 OR 3...)?
>
> Is the Filter Cache used for the OR'ed fq?
>
> Thank you
>


Re: Need help with graphing function (MATH)

2012-02-14 Thread Mark
Thanks, I'll have a look at this. I should have mentioned that the actual 
values on the graph aren't important; rather, I was showing an example of 
how the function should behave.


On 2/13/12 6:25 PM, Kent Fitch wrote:

Hi, assuming you have x and want to generate y, then maybe

- if x < 50, y = 150

- if x > 175, y = 60

- otherwise :

either y = (100/(e^((x -50)/75)^2)) + 50
http://www.wolframalpha.com/input/?i=plot++%28100%2F%28e^%28%28x+-50%29%2F75%29^2%29%29+%2B+50%2C+x%3D50..175


- or maybe y =sin((x+5)/38)*42+105

http://www.wolframalpha.com/input/?i=plot++sin%28%28x%2B5%29%2F38%29*42%2B105%2C+x%3D50..175
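The piecewise definition above can be sketched in a few lines of Java (the class name is made up; the middle branch uses the first formula, with the exponent grouping read from the WolframAlpha link, and the curve as specified has a small jump at x = 175):

```java
// Piecewise boost curve from Kent's suggestion: flat at 150 below x=50,
// flat at 60 above x=175, and a decaying exponential in between.
class BoostCurve {
    static double y(double x) {
        if (x < 50) return 150;
        if (x > 175) return 60;
        double u = (x - 50) / 75.0;
        // y = 100 / (e^((x-50)/75))^2 + 50, as in the WolframAlpha plot
        return 100.0 / Math.pow(Math.exp(u), 2) + 50.0;
    }

    public static void main(String[] args) {
        // Continuous at x=50 (y=150); jumps from ~53.6 to 60 at x=175
        for (double x : new double[] {0, 50, 100, 150, 200}) {
            System.out.printf("x=%.0f -> y=%.2f%n", x, y(x));
        }
    }
}
```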

Regards,

Kent Fitch

On Tue, Feb 14, 2012 at 12:29 PM, Mark  wrote:


I need some help with one of my boost functions. I would like the
function to look something like the following mockup below. Starts
off flat then there is a gradual decline, steep decline then
gradual decline and then back to flat.

Can some of you math guys please help :)

Thanks.






Re: OR-FilterQuery

2012-02-14 Thread Mikhail Khludnev
On Mon, Feb 13, 2012 at 11:17 PM,  wrote:

> Hi,
>
> how efficient is such a query:
>
> q=some text
> fq=id:(1 OR 2 OR 3...)
>
> Should I better use q:some text AND id:(1 OR 2 OR 3...)?
>
1. These two options produce different scoring.
2. If you hit the same fq=id:(1 OR 2 OR 3...) many times, you benefit from
reading the docset from the heap instead of searching on disk.
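Concretely, the two variants from the question look like this (same matching documents, different scoring and caching behaviour):

```
q=some text AND id:(1 OR 2 OR 3...)      the id clause contributes to the score
q=some text&fq=id:(1 OR 2 OR 3...)       the id clause only filters; its docset
                                         can be cached and reused
```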


>
> Is the Filter Cache used for the OR'ed fq?
>
The filter cache is used for any filter. I guess I didn't get you. Could
you rephrase your question?


>
> Thank you
>
>


-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics


 


Re: SolrCloud Replication Question

2012-02-14 Thread Jamie Johnson
Thanks Mark, not a huge rush, just me trying to get to use the latest
stuff on our project.

On Tue, Feb 14, 2012 at 10:53 AM, Mark Miller  wrote:
> Sorry, have not gotten it yet, but will be back trying later today - monday, 
> tuesday tend to be slow for me (meetings and crap).
>
> - Mark
>
> On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote:
>
>> Has there been any success in replicating this?  I'm wondering if it
>> could be something with my setup that is causing the issue...
>>
>>
>> On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson  wrote:
>>> Yes, I have the following layout on the FS
>>>
>>> ./bootstrap.sh
>>> ./example (standard example directory from distro containing jetty
>>> jars, solr confs, solr war, etc)
>>> ./slice1
>>>  - start.sh
>>>  -solr.xml
>>>  - slice1_shard1
>>>   - data
>>>  - slice2_shard2
>>>   -data
>>> ./slice2
>>>  - start.sh
>>>  - solr.xml
>>>  -slice2_shard1
>>>    -data
>>>  -slice1_shard2
>>>    -data
>>>
>>> if it matters I'm running everything from localhost, zk and the solr shards
>>>
>>> On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren  wrote:
 Do you have unique dataDir for each instance?
 13.2.2012 14.30 "Jamie Johnson"  kirjoitti:
>
> - Mark Miller
> lucidimagination.com
>
>
>
>
>
>
>
>
>
>
>


Re: SolrJ + SolrCloud

2012-02-14 Thread Mark Miller
No hard plans around that at the moment, but when I free up some time I 
plan on looking at the JIRA issue I pointed to. Looks like a lot of the work 
may already be done.

- mark

On Feb 12, 2012, at 8:14 AM, Darren Govoni wrote:

> Thanks Mark. Is there any plan to make all the Solr search handlers work
> with SolrCloud, like MLT? That missing feature would prohibit us from
> using SolrCloud at the moment. :(
> 
> On Sat, 2012-02-11 at 18:24 -0500, Mark Miller wrote:
>> On Feb 11, 2012, at 6:02 PM, Darren Govoni wrote:
>> 
>>> Hi,
>>> Do all the normal facilities of Solr work with SolrCloud from SolrJ?
>>> Things like /mlt, /cluster, facets , tvf's, etc.
>>> 
>>> Darren
>>> 
>> 
>> 
>> SolrJ works the same in SolrCloud mode as it does in non SolrCloud mode - 
>> it's fully supported. There is even a new SolrJ client called 
>> CloudSolrServer that has built in cluster awareness and load balancing.
>> 
>> In terms of what is supported - anything that is supported with distributed 
>> search - that is most things, but there is the odd man out - like MLT - 
>> looks like an issue is open here: 
>> https://issues.apache.org/jira/browse/SOLR-788 but it's not resolved yet.
>> 
>> - Mark Miller
>> lucidimagination.com
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
> 
> 

- Mark Miller
lucidimagination.com













Re: SolrCloud Replication Question

2012-02-14 Thread Mark Miller
Sorry, have not gotten it yet, but will be back trying later today - Monday and 
Tuesday tend to be slow for me (meetings and crap).

- Mark

On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote:

> Has there been any success in replicating this?  I'm wondering if it
> could be something with my setup that is causing the issue...
> 
> 
> On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson  wrote:
>> Yes, I have the following layout on the FS
>> 
>> ./bootstrap.sh
>> ./example (standard example directory from distro containing jetty
>> jars, solr confs, solr war, etc)
>> ./slice1
>>  - start.sh
>>  -solr.xml
>>  - slice1_shard1
>>   - data
>>  - slice2_shard2
>>   -data
>> ./slice2
>>  - start.sh
>>  - solr.xml
>>  -slice2_shard1
>>-data
>>  -slice1_shard2
>>-data
>> 
>> if it matters I'm running everything from localhost, zk and the solr shards
>> 
>> On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren  wrote:
>>> Do you have unique dataDir for each instance?
>>> 13.2.2012 14.30 "Jamie Johnson"  kirjoitti:

- Mark Miller
lucidimagination.com













Re: Highlighting stopwords

2012-02-14 Thread O. Klein

Koji Sekiguchi wrote
> 
> Uh, what you tried was changing the field between q and hl.q;
> that's a use case I hadn't expected when I proposed hl.q.
> 
> Do you think that hl.text meets your needs?
> 
> https://issues.apache.org/jira/browse/SOLR-1926?focusedCommentId=12871234&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12871234
> 
> koji
> -- 
> Apache Solr Query Log Visualizer
> http://soleami.com/
> 

Well, if I understand it correctly, yes.

If this means that queries are analyzed like the field they are
highlighting, that would give the highlighter a lot more flexibility.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Highlighting-stopwords-tp3681901p3744054.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Stemming and accents (HunspellStemFilterFactory)

2012-02-14 Thread Chantal Ackermann
Hi Bráulio,

I don't know about HunspellStemFilterFactory especially but concerning
accents:

There are several accent filters that will remove accents from your
tokens. If the Hunspell filter factory requires the accents, then simply
add the accent filters after Hunspell in your index and query filter
chains.

You would then have Hunspell produce the tokens as result of the
stemming and only afterwards the accents would be removed (your example:
'forum' instead of 'fórum'). Do the same on the query side in case
someone inputs accents.

Accent filters are:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUTokenizerFactory
(lowercases, as well!)
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ASCIIFoldingFilterFactory

and others on that page.
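A sketch of such a chain for the example from the original mail (everything here is hypothetical: the field type name and dictionary file names are placeholders, and the exact Hunspell attributes should be checked against the wiki page for your Solr version):

```xml
<!-- Hypothetical field type: Hunspell stems first (it needs the accented
     forms from its dictionary), then ASCIIFoldingFilterFactory removes the
     accents, so 'fórum' is indexed and queried as 'forum'. -->
<fieldType name="text_pt" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.HunspellStemFilterFactory"
            dictionary="pt_PT.dic" affix="pt_PT.aff" ignoreCase="true"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```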

Chantal


On Tue, 2012-02-14 at 14:48 +0100, Bráulio Bhavamitra wrote:
> Hello all,
> 
> I'm evaluating the HunspellStemFilterFactory. I found it works with a
> pt_PT dictionary.
> 
> For example, if I search for 'fóruns' it stems it to 'fórum' and then finds
> 'fórum' references.
> 
> But if I search for 'foruns' (without accent),
> then HunspellStemFilterFactory cannot stem the
> word, as it does not exist in its dictionary.
> 
> Is there any way to make HunspellStemFilterFactory work without accent
> differences?
> 
> best,
> bráulio



Re: Highlighting stopwords

2012-02-14 Thread Koji Sekiguchi

(12/02/14 22:25), O. Klein wrote:

I have not been able to find any logic in the behavior of hl.q and how it
analyses the query. Could you explain how it is supposed to work?


Nothing special on hl.q. If you use hl.q, the value of it will be used for
highlighting rather than the value of q. There are no tricks, I think.

> When using hl.q=content_hl:(spell Check) I now get highlighting including
> stopwords.
>
> but when using hl.q=content_hl:(SC) where SC is synonym I get no
> highlighting.
>
> Can you verify if synonyms work when using hl.q?

  :

> OK I got it working by using hl.q=content_hl:(spell Check)
> content_text:(spell Check) but it makes no sense to me.
>
> only difference between the 2 fields is the use of Stopwords.

Uh, what you tried was changing the field between q and hl.q;
that's a use case I hadn't expected when I proposed hl.q.

Do you think that hl.text meets your needs?

https://issues.apache.org/jira/browse/SOLR-1926?focusedCommentId=12871234&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12871234

koji
--
Apache Solr Query Log Visualizer
http://soleami.com/


Re: Improving performance for SOLR geo queries?

2012-02-14 Thread Bill Bell
Can we get this backported to 3.x?

Bill Bell
Sent from mobile


On Feb 14, 2012, at 3:45 AM, Matthias Käppler  wrote:

> hey thanks all for the suggestions, didn't have time to look into them
> yet as we're feature-sprinting for MWC, but will report back with some
> feedback over the next weeks (we will have a few more performance
> sprints in March)
> 
> Best,
> Matthias
> 
> On Mon, Feb 13, 2012 at 2:32 AM, Yonik Seeley
>  wrote:
>> On Thu, Feb 9, 2012 at 1:46 PM, Yonik Seeley  
>> wrote:
>>> One way to speed up numeric range queries (at the cost of increased
>>> index size) is to lower the precisionStep.  You could try changing
>>> this from 8 to 4 and then re-indexing to see how that affects your
>>> query speed.
>> 
>> Your issue, and the fact that I had been looking at the post-filtering
>> code again for another client, reminded me that I had been planning on
>> implementing post-filtering for spatial.  It's now checked into trunk.
>> 
>> If you have the ability to use trunk, you can add a high cost (like
>> cost=200) along with cache=false to trigger it.
>> 
>> More details here:
>> http://www.lucidimagination.com/blog/2012/02/10/advanced-filter-caching-in-solr/
>> 
>> -Yonik
>> lucidimagination.com
> 
> 
> 
> -- 
> Matthias Käppler
> Lead Developer API & Mobile
> 
> Qype GmbH
> Großer Burstah 50-52
> 20457 Hamburg
> Telephone: +49 (0)40 - 219 019 2 - 160
> Skype: m_kaeppler
> Email: matth...@qype.com
> 
> Managing Director: Ian Brotherston
> Amtsgericht Hamburg
> HRB 95913
> 
> This e-mail and its attachments may contain confidential and/or
> privileged information. If you are not the intended recipient (or have
> received this e-mail in error) please notify the sender immediately
> and destroy this e-mail and its attachments. Any unauthorized copying,
> disclosure or distribution of this e-mail and  its attachments is
> strictly forbidden. This notice also applies to future messages.


Mmap

2012-02-14 Thread Bill Bell
Does someone have an example of using unmap in 3.5 and chunksize?

 I am using Solr 3.5.

I noticed in solrconfig.xml:



I don't see this parameter taking effect when I set 
-Dsolr.directoryFactory=solr.MMapDirectoryFactory

How do I see the setting in the log or in stats.jsp? I cannot find a place 
that indicates whether it is set or not.

I would assume StandardDirectoryFactory is being used, but I see the same 
thing whether I set it or not.

Bill Bell
Sent from mobile



Re: Solr binary response for C#?

2012-02-14 Thread Erick Erickson
It's not as compact as the binary format, but would just using something
like JSON help enough? This is really simple, just specify
&wt=json (there's a method to set this on the server, at least in Java).

Otherwise, you might get a more knowledgeable response on the
C# list; I'm frankly clueless here.

Best
Erick

On Mon, Feb 13, 2012 at 1:15 PM, naptowndev  wrote:
> Admittedly I'm new to this, but the project we're working on feeds results
> from Solr to an ASP.net application.  Currently we are using XML, but our
> payloads can be rather large, some up to 17MB.  We are looking for a way to
> minimize that payload and increase performance and I'm curious if there's
> anything anyone has been working out that creates a binary response that can
> be read by C# (similar to the javabin response built into Solr).
>
> That, or if anyone has experience implementing an external protocol like
> Thrift with Solr and consuming it with C# - again all in the effort to
> increase performance across the wire and while being consumed.
>
> Any help and direction would be greatly appreciated!
>
> Thanks!
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-binary-response-for-C-tp3741101p3741101.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Debugging on 3,5

2012-02-14 Thread Bill Bell

I did find a solution, but the output is horrible. Why does the explain 
output look so bad?


6.351252 = (MATCH) boost(*:*,query(specialties_ids: #1;#0;#0;#0;#0;#0;#0;#0;#0; 
,def=0.0)), product of:
  1.0 = (MATCH) MatchAllDocsQuery, product of:
1.0 = queryNorm
  6.351252 = query(specialties_ids: #1;#0;#0;#0;#0;#0;#0;#0;#0; 
,def=0.0)=6.351252



defType=edismax&boost=query($param)&param=multi_field:87
--


We like the boost parameter in SOLR 3.5 with eDismax.

The question we have is that we would like to replace bq with boost, but we get 
the "multi-valued field issue" when we try to do this.

Bill Bell
Sent from mobile



Re: SolrCloud Replication Question

2012-02-14 Thread Jamie Johnson
Has there been any success in replicating this?  I'm wondering if it
could be something with my setup that is causing the issue...


On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson  wrote:
> Yes, I have the following layout on the FS
>
> ./bootstrap.sh
> ./example (standard example directory from distro containing jetty
> jars, solr confs, solr war, etc)
> ./slice1
>  - start.sh
>  -solr.xml
>  - slice1_shard1
>   - data
>  - slice2_shard2
>   -data
> ./slice2
>  - start.sh
>  - solr.xml
>  -slice2_shard1
>    -data
>  -slice1_shard2
>    -data
>
> if it matters I'm running everything from localhost, zk and the solr shards
>
> On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren  wrote:
>> Do you have unique dataDir for each instance?
>> 13.2.2012 14.30 "Jamie Johnson"  kirjoitti:


Stemming and accents (HunspellStemFilterFactory)

2012-02-14 Thread Bráulio Bhavamitra
Hello all,

I'm evaluating the HunspellStemFilterFactory. I found it works with a
pt_PT dictionary.

For example, if I search for 'fóruns' it stems it to 'fórum' and then finds
'fórum' references.

But if I search for 'foruns' (without accent),
then HunspellStemFilterFactory cannot stem the
word, as it does not exist in its dictionary.

Is there any way to make HunspellStemFilterFactory work without accent
differences?

best,
bráulio


'foruns' doesn't match 'forum' with NGramFilterFactory (or EdgeNGramFilterFactory)

2012-02-14 Thread Bráulio Bhavamitra
Hello all,

I'm experimenting with NGramFilterFactory and EdgeNGramFilterFactory.

Both of them show a match in my solr admin analysis, but when I query
'foruns'
it doesn't find any 'forum'.
analysis
http://bhakta.casadomato.org:8982/solr/admin/analysis.jsp?nt=type&name=text&verbose=on&highlight=on&val=f%C3%B3runs&qverbose=on&qval=f%C3%B3runs
search
http://bhakta.casadomato.org:8982/solr/select/?q=foruns&version=2.2&start=0&rows=10&indent=on

Anybody knows what's the problem?

bráulio


Re: Highlighting stopwords

2012-02-14 Thread O. Klein

O. Klein wrote
> 
> 
> O. Klein wrote
>> 
>> Hmm, now the synonyms aren't highlighted anymore.
>> 
>> OK back to basic (im using trunk and FVH).
>> 
>> What is the way to go about if I want to search on a field without
>> stopwords, but still want to highlight the stopwords? (and still
>> highlight synonyms and stemmed words)?
>> 
> 
> I made a new field content_hl to prevent problems coming from copyField.
> 
> When using hl.q=content_hl:(spell Check) I now get highlighting including
> stopwords.
> 
> but when using hl.q=content_hl:(SC) where SC is synonym I get no
> highlighting.
> 
> Can you verify if synonyms work when using hl.q?
> 

OK I got it working by using hl.q=content_hl:(spell Check)
content_text:(spell Check) but it makes no sense to me.

only difference between the 2 fields is the use of Stopwords.

What's also weird is that a query like hl.q=content_spell:(SC) also
highlights synonyms, even though this field has no synonyms.

I have not been able to find any logic in the behavior of hl.q and how it
analyses the query. Could you explain how it is supposed to work?


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Highlighting-stopwords-tp3681901p3743616.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: sort my results alphabetically on facetnames

2012-02-14 Thread Michael Kuhlmann

Hi!

On 14.02.2012 13:09, PeterKerk wrote:

I want to sort my results on the facet names (not by their number of results).


From the example you gave, I'd assume you don't want to sort by facet 
names but by facet values.


Simply add facet.sort=index to your request; see
http://wiki.apache.org/solr/SimpleFacetParameters#facet.sort
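Applied to the request URL from the original mail, that would be (one line):

```
http://localhost:8983/solr/zz_healthorg/select/?indent=on&facet=true&q=*:*&fl=id&facet.field=healthorganizationtypes_raw_nl&facet.mincount=1&facet.sort=index
```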

Or simply sort the facet result on your own.

Greetings,
Kuli


Re: Re:how to monitor solr in newrelic

2012-02-14 Thread roySolr
Try this when you start SOLR

java -javaagent:/NEWRELICPATH/newrelic.jar -jar start.jar

Normally you will see your SOLR installation on your newrelic dashboard in 2
minutes.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-monitor-solr-in-newrelic-tp3739567p3743488.html
Sent from the Solr - User mailing list archive at Nabble.com.


sort my results alphabetically on facetnames

2012-02-14 Thread PeterKerk
I want to sort my results on the facet names (not by their number of results).

So now I have this (ordered by number of results):
Instelling voor auditief gehandicapten (16)
Audiologisch centrum (13)
Huisartsenpraktijk (13)
Instelling voor lichamelijk gehandicapten (13)
Ambulancezorg (12)
Beroepsorganisatie (12)

What I want is this:
Ambulancezorg (12)
Audiologisch centrum (13)
Beroepsorganisatie (12)
Huisartsenpraktijk (13)
Instelling voor auditief gehandicapten (16)
Instelling voor lichamelijk gehandicapten (13)

How can I change my request url to sort differently?

My current request url is like so:
http://localhost:8983/solr/zz_healthorg/select/?indent=on&facet=true&q=*:*&fl=id&facet.field=healthorganizationtypes_raw_nl&facet.mincount=1
With the result below:

[XML response stripped by the mail archive; it contained a responseHeader
(status 0, QTime 1), the echoed request parameters, ten returned documents
with id values 1 through 10, and the facet counts 16, 13, 13, 13, 12, 12
for healthorganizationtypes_raw_nl]

--
View this message in context: 
http://lucene.472066.n3.nabble.com/sort-my-results-alphabetically-on-facetnames-tp3743471p3743471.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Language specific tokenizer for purpose of multilingual search in single-core solr,

2012-02-14 Thread Paul Libbrecht
Only one field element?
There should be two, no?
One for each language.

paul


Le 14 févr. 2012 à 07:34, bing a écrit :

> 
> Hi, all, 
> 
> I want to do multilingual search in single-core solr. That requires
> defining language-specific tokenizers in schema.xml. Say for example, I have
> two tokenizers, one for English ("en") and one for simplified Chinese
> ("zh-cn"). Can I just put following definitions together in one schema.xml,
> and both sets of the files ( stopwords, synonym, and protwords) in one
> directory? 
> 
> 
> 1. fieldType and field definition for english ("en")  
> 
>  positionIncrementGap="100">
>  
>
> words="stopwords_en.txt" enablePositionIncrements="true" />
> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="1"/>
>
> protected="protwords_en.txt"/>
>  
>  .
> 
> 
>  multiValued="true"/>
> 
> 
> 2. fieldType and field definition for Chinese ("zh_cn")  
> 
>  positionIncrementGap="100">
>  
>/>
> words="stopwords_ch.txt" enablePositionIncrements="true" />
> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="1"/>
>
> protected="protwords_en.txt"/>
>  
>  .
> 
> 
>  multiValued="true"/>
> 
> 
> Best 
> Bing
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Language-specific-tokenizer-for-purpose-of-multilingual-search-in-single-core-solr-tp3742873p3742873.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Improving performance for SOLR geo queries?

2012-02-14 Thread Matthias Käppler
Hey, thanks all for the suggestions. I didn't have time to look into them
yet as we're feature-sprinting for MWC, but will report back with some
feedback over the next weeks (we will have a few more performance
sprints in March).

Best,
Matthias

On Mon, Feb 13, 2012 at 2:32 AM, Yonik Seeley
 wrote:
> On Thu, Feb 9, 2012 at 1:46 PM, Yonik Seeley  
> wrote:
>> One way to speed up numeric range queries (at the cost of increased
>> index size) is to lower the precisionStep.  You could try changing
>> this from 8 to 4 and then re-indexing to see how that affects your
>> query speed.
>
> Your issue, and the fact that I had been looking at the post-filtering
> code again for another client, reminded me that I had been planning on
> implementing post-filtering for spatial.  It's now checked into trunk.
>
> If you have the ability to use trunk, you can add a high cost (like
> cost=200) along with cache=false to trigger it.
>
> More details here:
> http://www.lucidimagination.com/blog/2012/02/10/advanced-filter-caching-in-solr/
>
> -Yonik
> lucidimagination.com



-- 
Matthias Käppler
Lead Developer API & Mobile

Qype GmbH
Großer Burstah 50-52
20457 Hamburg
Telephone: +49 (0)40 - 219 019 2 - 160
Skype: m_kaeppler
Email: matth...@qype.com

Managing Director: Ian Brotherston
Amtsgericht Hamburg
HRB 95913

This e-mail and its attachments may contain confidential and/or
privileged information. If you are not the intended recipient (or have
received this e-mail in error) please notify the sender immediately
and destroy this e-mail and its attachments. Any unauthorized copying,
disclosure or distribution of this e-mail and  its attachments is
strictly forbidden. This notice also applies to future messages.


Re: Highlighting stopwords

2012-02-14 Thread O. Klein

O. Klein wrote
> 
> Hmm, now the synonyms aren't highlighted anymore.
> 
> OK back to basic (im using trunk and FVH).
> 
> What is the way to go about if I want to search on a field without
> stopwords, but still want to highlight the stopwords? (and still highlight
> synonyms and stemmed words)?
> 

I made a new field content_hl to prevent problems coming from copyField.

When using hl.q=content_hl:(spell Check) I now get highlighting including
stopwords.

but when using hl.q=content_hl:(SC) where SC is a synonym I get no
highlighting.

Can you verify if synonyms work when using hl.q?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Highlighting-stopwords-tp3681901p3743317.html
Sent from the Solr - User mailing list archive at Nabble.com.