Re: Indexing Multiple Languages with solr (Arabic English)

2013-12-03 Thread aniljayanti
Hi,

Thanks for your post.

I do not know how to use the text_ar field type for the Arabic language. What
configuration do I need to add to schema.xml? Please guide me.


AnilJayanti



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-Multiple-Languages-with-solr-Arabic-English-tp4104580p4104613.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Indexing Multiple Languages with solr (Arabic English)

2013-12-03 Thread Alexandre Rafalovitch
It's just a text type. So just declare another field and, instead of
text_general or text_en, use text_ar. Then use copyField from the source text
field to it.

Go through the tutorial if you haven't yet. It explains some of these things.
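A minimal sketch of what that can look like in schema.xml — the field names here are examples, and the text_ar field type is assumed to be the one shipped with the stock example schema:

```xml
<!-- an Arabic-analyzed copy of the main text field (field names are examples) -->
<field name="text_arabic" type="text_ar" indexed="true" stored="false" multiValued="true"/>

<!-- feed the same source content into the Arabic-analyzed field -->
<copyField source="text" dest="text_arabic"/>
```

At query time, search both fields (for example via qf with edismax) so whichever analysis chain matches can contribute.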

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Tue, Dec 3, 2013 at 3:12 PM, aniljayanti aniljaya...@yahoo.co.in wrote:

 Hi,

 Thanks for your post.

 I do not know how to use the text_ar field type for the Arabic language. What
 configuration do I need to add to schema.xml? Please guide me.


 AnilJayanti



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Indexing-Multiple-Languages-with-solr-Arabic-English-tp4104580p4104613.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: post filtering for boolean filter queries

2013-12-03 Thread Dmitry Kan
OK, we were able to confirm the behavior regarding not caching the filter
query: it works as expected and does not cache with {!cache=false}.

We are still looking into clarifying the cost assignment: i.e. whether it
works as expected for long boolean filter queries.


On Tue, Dec 3, 2013 at 8:55 AM, Dmitry Kan solrexp...@gmail.com wrote:

 Hello!

 We have been experimenting with post filtering lately. Our setup is a
 filter with a long boolean query; drawing the example from the Dublin
 Stump the Chump session:

 fq=UserId:(user1 OR user2 OR...OR user1000)

 The underlying issue impacting performance is that the combination of
 user ids in the query above is unique per user in the system, and on
 top of that the combination changes every day.

 Our idea was to stop caching the filter query with {!cache=false}. Since
 there is no way (to our knowledge) to introspect the contents of the filter
 cache (JMX?), we can't be sure those are not cached: the initial query for
 each combination takes substantially more time (as if it was *not* cached)
 than the second and subsequent queries with the same fq (as if it *was*
 cached).

 Question is: does post filtering support boolean queries in fq params?

 Another thing we have been trying is assigning the fq a cost higher than
 that of the other filter queries. Does this feature support boolean
 queries in fq params as well?

 --
 Dmitry
 Blog: http://dmitrykan.blogspot.com
 Twitter: twitter.com/dmitrykan




-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: twitter.com/dmitrykan
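For reference, the two knobs discussed in this thread are expressed as local params on the fq itself; a sketch using the field from the example above (whether post filtering actually applies to plain boolean queries is exactly the open question here):

```text
fq={!cache=false}UserId:(user1 OR user2 OR ... OR user1000)
    never enters the filterCache

fq={!cache=false cost=150}UserId:(user1 OR user2 OR ... OR user1000)
    runs after cheaper filters; cost >= 100 requests post-filtering,
    but only for query types that implement the PostFilter interface
```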


Re: SolrCloud keeps repeating exception 'SolrCoreState already closed'

2013-12-03 Thread Shalin Shekhar Mangar
I just ran into this issue on Solr 4.6 on an EC2 machine while
indexing a Wikipedia dump with DIH. I'm trying to isolate exceptions
before the 'SolrCoreState already closed' exception.

On Sun, Nov 10, 2013 at 11:58 PM, Mark Miller markrmil...@gmail.com wrote:
 Can you isolate any exceptions that happened just before that exception
 started repeating?

 - Mark

 On Nov 7, 2013, at 9:09 AM, Eric Bus eric@websight.nl wrote:

 Hi,

 I'm having a problem with one of my shards. Since yesterday, SOLR keeps
 repeating the same exception over and over for this shard.
 The web interface for this SOLR instance is also not working (it hangs on the
 'Loading' indicator).

 Nov 7, 2013 9:08:12 AM org.apache.solr.update.processor.LogUpdateProcessor finish
 INFO: [website1_shard1_replica3] webapp=/solr path=/update params={update.distrib=TOLEADER&wt=javabin&version=2} {} 0 0
 Nov 7, 2013 9:08:12 AM org.apache.solr.common.SolrException log
 SEVERE: java.lang.RuntimeException: SolrCoreState already closed
        at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:79)
        at org.apache.solr.update.DirectUpdateHandler2.delete(DirectUpdateHandler2.java:276)
        at org.apache.solr.update.processor.RunUpdateProcessor.processDelete(RunUpdateProcessorFactory.java:77)
        at org.apache.solr.update.processor.UpdateRequestProcessor.processDelete(UpdateRequestProcessor.java:55)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalDelete(DistributedUpdateProcessor.java:460)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.versionDelete(DistributedUpdateProcessor.java:1036)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.processDelete(DistributedUpdateProcessor.java:721)
        at org.apache.solr.update.processor.LogUpdateProcessor.processDelete(LogUpdateProcessorFactory.java:121)
        at org.apache.solr.handler.loader.XMLLoader.processDelete(XMLLoader.java:346)
        at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:277)
        at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
        at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:448)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:269)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
        at java.lang.Thread.run(Thread.java:662)

 I have about 3GB of logfiles for this single message. Reloading the
 collection does not work. Reloading the specific shard core returns the same
 exception. The only option seems to be to restart the server. But because
 it's the leader for a lot of collections, I want to know why this is
 happening. I've seen this problem before, and I haven't figured out what is
 causing it.

 I reported a different problem a few days ago with 'hanging' deleted
 logfiles. Could this be related? Could the hanging logfiles prevent a new
 Searcher from opening? I've updated two of my three hosts to 4.5.1, but after
 only 2 days of uptime I'm still seeing about 11,000 deleted logfiles in the
 lsof output.

 Best regards,
 Eric Bus





-- 
Regards,
Shalin Shekhar Mangar.


RE: SolrCloud keeps repeating exception 'SolrCoreState already closed'

2013-12-03 Thread Eric Bus
Are you currently running SOLR under Tomcat or standalone with Jetty? I 
switched from Tomcat to Jetty and the problems went away.

- Eric


-Original Message-
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
Sent: Tuesday, 3 December 2013 12:44
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud keeps repeating exception 'SolrCoreState already
closed'

I just ran into this issue on Solr 4.6 on an EC2 machine while indexing
a Wikipedia dump with DIH. I'm trying to isolate exceptions before the
'SolrCoreState already closed' exception.

On Sun, Nov 10, 2013 at 11:58 PM, Mark Miller markrmil...@gmail.com wrote:
 Can you isolate any exceptions that happened just before that exception
 started repeating?

 - Mark

 On Nov 7, 2013, at 9:09 AM, Eric Bus eric@websight.nl wrote:

 [rest of the quoted message, including the full stack trace, snipped; it is quoted in full earlier in the thread]




--
Regards,
Shalin Shekhar Mangar.

Re: Using the flexible query parser in Solr instead of classic

2013-12-03 Thread Jack Krupansky
I don't recall hearing any discussion of such a switch. In fact, Solr now 
has its own copy of the classic Lucene query parser since Solr needed some 
features that the Lucene guys did not find acceptable.


That said, if you have a proposal to dramatically upgrade the base Solr 
query parser, as well as edismax, I'm sure people would be interested. I 
think the intent was to evolve edismax to the point where it would become 
the default Solr query parser.


So, maybe that would be the ideal starting point - a version of edismax 
based on the flexible query parser rather than the classic query parser.


-- Jack Krupansky

-Original Message- 
From: Karsten R.

Sent: Tuesday, December 03, 2013 1:24 AM
To: solr-user@lucene.apache.org
Subject: Using the flexible query parser in Solr instead of classic

Hi folks,

last year we built a 3.x Solr QueryParser based on
org.apache.lucene.queryparser.flexible.standard.StandardQueryParser because
we had some additions with SpanQueries and PhraseQueries. We are thinking
about adapting this for 4.x.

At the moment the SolrQueryParser is based on
org.apache.lucene.queryparser.classic.QueryParser.jj

Is there a plan for 4.x to switch the LuceneQParser from classic to
flexible (
org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser.jj
)?
Is there a Solr task to use the flexible QP?
Does anyone else need this?

Best regards,
 Karsten


P.S. I only found one (unanswered) thread and no task about Solr and the
flexible QP (thread:
http://lucene.472066.n3.nabble.com/Using-the-contrib-flexible-query-parser-in-Solr-td819.html
)
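For context, the flexible parser in question is driven roughly like this on Lucene 4.x — a sketch, not compiled or tested here, and it needs lucene-core, lucene-analyzers-common, and lucene-queryparser on the classpath:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.flexible.standard.StandardQueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

public class FlexibleParserSketch {
    public static void main(String[] args) throws Exception {
        // the flexible-framework counterpart of the classic QueryParser
        StandardQueryParser parser =
                new StandardQueryParser(new StandardAnalyzer(Version.LUCENE_46));
        // second argument is the default field
        Query q = parser.parse("title:foo AND agenda", "text");
        System.out.println(q);
    }
}
```

The flexible framework splits parsing into a syntax parser, processor pipeline, and query builder, which is what makes additions like custom SpanQuery handling easier to slot in.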





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-the-flexible-query-parser-in-Solr-instead-of-classic-tp4104584.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: SolrCloud keeps repeating exception 'SolrCoreState already closed'

2013-12-03 Thread Shalin Shekhar Mangar
No, I am running on the example jetty. I am re-running the import and
haven't hit the problem yet. Still running.

On Tue, Dec 3, 2013 at 5:45 PM, Eric Bus eric@websight.nl wrote:
 Are you currently running SOLR under Tomcat or standalone with Jetty? I 
 switched from Tomcat to Jetty and the problems went away.

 - Eric


 [earlier messages in the thread, including the full stack trace, snipped; they are quoted in full earlier in the thread]

Deleting and committing inside a SearchComponent

2013-12-03 Thread Peyman Faratin
Hi

Is it possible to delete and commit updates to an index inside a custom
SearchComponent? I know I can do it with SolrJ, but due to several business
logic requirements I need to build the logic inside the search component. I am
using SOLR 4.5.0.

thank you
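An untested sketch of the usual approach on 4.x: obtain an update processor chain from the core inside the component and feed it delete/commit commands. The class names are from the 4.x API, but the chain name, the delete query, and the error handling below are assumptions to verify against your own setup:

```java
// inside SearchComponent.process(ResponseBuilder rb) -- a sketch, not a verified implementation
SolrQueryRequest req = rb.req;
UpdateRequestProcessorChain chain =
        req.getCore().getUpdateProcessingChain(null);   // null = default chain (assumption)
UpdateRequestProcessor proc = chain.createProcessor(req, rb.rsp);
try {
    DeleteUpdateCommand del = new DeleteUpdateCommand(req);
    del.query = "expired:true";                          // hypothetical delete-by-query
    proc.processDelete(del);
    proc.processCommit(new CommitUpdateCommand(req, false));
} finally {
    proc.finish();
}
```

Going through the processor chain, rather than calling the UpdateHandler directly, should keep the update log and any distributed-update processing in the loop.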

Re: Constantly increasing time of full data import

2013-12-03 Thread michallos
This occurs only in the production environment, so I can't profile it :-) Any
clues?

DirectUpdateHandler2 config:

<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Constantly-increasing-time-of-full-data-import-tp4103873p4104722.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Automatically build spellcheck dictionary on replicas

2013-12-03 Thread Kydryavtsev Andrey
Did you try to add the
  <str name="buildOnCommit">true</str>
parameter to your slave's spellcheck configuration?

03.12.2013, 12:04, Mirko idonthaveenoughinformat...@googlemail.com:
 Hi all,
 We use a Solr SpellcheckComponent with a file-based dictionary. We run a
 master and some replica slave servers. To update the dictionary, we copy
 the dictionary txt file to the master, from where it is automatically
 replicated to all slaves. However, it seems we need to run the
 spellcheck.build query on all servers individually.

 Is there a way to automatically build the spellcheck dictionary on all
 servers without calling spellcheck.build on all slaves individually?

 We use Solr 4.0.0

 Thanks,
 Mirko


Re: Automatically build spellcheck dictionary on replicas

2013-12-03 Thread Mirko
Yes, I have that, but it doesn't help. It seems Solr still needs the query
with the spellcheck.build parameter to build the spellchecker index.
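For reference, the manual build being described is triggered with a request like the one below; the host, core, row count, and dictionary name are examples, not taken from this thread:

```text
http://slave-host:8983/solr/collection1/select?q=*:*&rows=0&spellcheck=true&spellcheck.dictionary=file&spellcheck.build=true
```

Until something like this runs on each slave, the replicated dictionary .txt file is not turned into a spellcheck index there.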


2013/12/3 Kydryavtsev Andrey werde...@yandex.ru

 Did you try to add the
   <str name="buildOnCommit">true</str>
 parameter to your slave's spellcheck configuration?

 03.12.2013, 12:04, Mirko idonthaveenoughinformat...@googlemail.com:
  Hi all,
  We use a Solr SpellcheckComponent with a file-based dictionary. We run a
  master and some replica slave servers. To update the dictionary, we copy
  the dictionary txt file to the master, from where it is automatically
  replicated to all slaves. However, it seems we need to run the
  spellcheck.build query on all servers individually.
 
  Is there a way to automatically build the spellcheck dictionary on all
  servers without calling spellcheck.build on all slaves individually?
 
  We use Solr 4.0.0
 
  Thanks,
  Mirko



json update moves doc to end

2013-12-03 Thread Andreas Owen
When I search for "agenda" I get a lot of hits. Now if I update the 2nd
result via a JSON update, the doc is moved to the end of the results when I
search for it again. The field I change is "editorschoice" and it never
contains the search term "agenda", so I don't see why it changes the order.
Why does it?

 

Part of the solrconfig requestHandler I use:

<requestHandler name="/select2" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="defType">synonym_edismax</str>
    <str name="synonyms">true</str>
    <str name="qf">plain_text^10 editorschoice^200
        title^20 h_*^14
        tags^10 thema^15 inhaltstyp^6 breadcrumb^6 doctype^10
        contentmanager^5 links^5
        last_modified^5 url^5
    </str>
    <str name="bq">(expiration:[NOW TO *] OR (*:* -expiration:*))^6</str> <!-- tested: now or newer or empty gets a small boost -->
    <str name="bf">log(clicks)^8</str> <!-- tested -->
    <!-- todo: number of links (count urlparse in links query) / frequency of search term (bf = count in title and text) -->
    <str name="df">text</str>
    <str name="fl">*,path,score</str>
    <str name="wt">json</str>
    <str name="q.op">AND</str>

    <!-- Highlighting defaults -->
    <str name="hl">on</str>
    <str name="hl.fl">plain_text,title</str>
    <str name="hl.simple.pre">&lt;b&gt;</str>
    <str name="hl.simple.post">&lt;/b&gt;</str>

    <!-- <lst name="invariants"> -->
    <str name="facet">on</str>
    <str name="facet.mincount">1</str>
    <str name="facet.field">{!ex=inhaltstyp}inhaltstyp</str>
    <str name="f.inhaltstyp.facet.sort">index</str>
    <str name="facet.field">{!ex=doctype}doctype</str>
    <str name="f.doctype.facet.sort">index</str>
    <str name="facet.field">{!ex=thema_f}thema_f</str>
    <str name="f.thema_f.facet.sort">index</str>
    <str name="facet.field">{!ex=author_s}author_s</str>
    <str name="f.author_s.facet.sort">index</str>
    <str name="facet.field">{!ex=sachverstaendiger_s}sachverstaendiger_s</str>
    <str name="f.sachverstaendiger_s.facet.sort">index</str>
    <str name="facet.field">{!ex=veranstaltung}veranstaltung</str>
    <str name="f.veranstaltung.facet.sort">index</str>
    <str name="facet.date">{!ex=last_modified}last_modified</str>
    <str name="facet.date.gap">+1MONTH</str>
    <str name="facet.date.end">NOW/MONTH+1MONTH</str>
    <str name="facet.date.start">NOW/MONTH-36MONTHS</str>
    <str name="facet.date.other">after</str>
  </lst>
</requestHandler>



Re: json update moves doc to end

2013-12-03 Thread Jonathan Rochkind

What order, the order if you supply no explicit sort at all?

Solr does not make any guarantees about what order documents will come 
back in if you do not ask for a sort.


In general in Solr/Lucene, the only way to update a document is to
re-add it as a new document, so that's probably what's going on behind
the scenes, and it probably affects the 'default' sort order -- which
Solr makes no agreement about anyway; you probably shouldn't even count
on it being consistent at all.


If you want a consistent sort order, maybe add a field with a timestamp,
and ask for results sorted by the timestamp field? And then make sure
not to change the timestamp on updates where you don't want the order
to change.
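A sketch of that suggestion — the field name and type below are examples, and the assumption is that the client supplies the value once and re-sends the same value on later updates:

```xml
<!-- schema.xml: a client-populated timestamp, left unchanged on re-adds -->
<field name="first_indexed" type="tdate" indexed="true" stored="true"/>
```

Then request results with sort=first_indexed desc (or asc) for a stable ordering that updates don't disturb.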


Apologies if I've misunderstood the situation.

On 12/3/13 1:00 PM, Andreas Owen wrote:

When I search for agenda I get a lot of hits. Now if I update the 2.
Result by json-update the doc is moved to the end of the index when I search
for it again. The field I change is editorschoice and it never contains
the search term agenda so I don't see why it changes the order. Why does
it?



[solrconfig excerpt snipped; it is quoted in full in the original message above]




Re: json update moves doc to end

2013-12-03 Thread Andrea Gazzarini
AFAIK, if you don't supply or configure a sort parameter, Solr sorts by
score descending.
In that case, you may want to understand (or at least view) how each
document's score is calculated: run the query with debugQuery=true and
look at the whole explain output.


This great tool helped me a lot: http://explain.solr.pl
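The per-document score breakdown can be requested like this; the host and handler path are examples, using the /select2 handler from the earlier message in this thread:

```text
http://localhost:8983/solr/select2?q=agenda&fl=*,score&debugQuery=true
```

The "explain" section of the debug block shows, term by term, which boosts produced each document's score.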

Best,
Andrea

On 12/03/2013 07:00 PM, Andreas Owen wrote:

When I search for agenda I get a lot of hits. Now if I update the 2.
Result by json-update the doc is moved to the end of the index when I search
for it again. The field I change is editorschoice and it never contains
the search term agenda so I don't see why it changes the order. Why does
it?

  


[solrconfig excerpt snipped; it is quoted in full in the original message above]






RE: json update moves doc to end

2013-12-03 Thread Andreas Owen
So isn't it sorted automatically by relevance (the boost values)? If not,
should I set a sort in solrconfig?

-Original Message-
From: Jonathan Rochkind [mailto:rochk...@jhu.edu] 
Sent: Tuesday, 3 December 2013 19:07
To: solr-user@lucene.apache.org
Subject: Re: json update moves doc to end

What order, the order if you supply no explicit sort at all?

Solr does not make any guarantees about what order documents will come back
in if you do not ask for a sort.

In general in Solr/Lucene, the only way to update a document is to re-add it
as a new document, so that's probably what's going on behind the scenes, and
it probably affects the 'default' sort order -- which Solr makes no
agreement about anyway; you probably shouldn't even count on it being
consistent at all.

If you want a consistent sort order, maybe add a field with a timestamp, and
ask for results sorted by the timestamp field? And then make sure not to
change the timestamp when you do an update that you don't want to change the
order?

Apologies if I've misunderstood the situation.

On 12/3/13 1:00 PM, Andreas Owen wrote:
 When I search for agenda I get a lot of hits. Now if I update the 2.
 Result by json-update the doc is moved to the end of the index when I 
 search for it again. The field I change is editorschoice and it 
 never contains the search term agenda so I don't see why it changes 
 the order. Why does it?



 [solrconfig excerpt snipped; it is quoted in full in the original message above]





Re: Automatically build spellcheck dictionary on replicas

2013-12-03 Thread Kydryavtsev Andrey
Yep, sorry, it doesn't work for file-based dictionaries:

 In particular, you still need to index the dictionary file once by issuing a 
 search with spellcheck.build=true on the end of the URL; if your system 
 doesn't update that dictionary file, then this only needs to be done once. 
 This manual step may be required even if your configuration sets build=true 
 and reload=true.

http://wiki.apache.org/solr/FileBasedSpellChecker
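One workaround sometimes suggested (a sketch only -- untested here, and the
dictionary name is an assumption) is to let each slave trigger the build
itself from a warming query in its solrconfig.xml, so the build runs whenever
replication causes a new searcher to be opened:

```xml
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">solr</str>
      <str name="spellcheck">true</str>
      <str name="spellcheck.dictionary">file</str>
      <str name="spellcheck.build">true</str>
    </lst>
  </arr>
</listener>
```

The spellcheck component must be registered on the handler these warming
queries hit for this to have any effect.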

03.12.2013, 21:27, Mirko idonthaveenoughinformat...@googlemail.com:
 Yes, I have that, but it doesn't help. It seems Solr still needs the query
 with the spellcheck.build parameter to build the spellchecker index.

 2013/12/3 Kydryavtsev Andrey werde...@yandex.ru

  Did you try to add
    str name=buildOnCommittrue/str
   parameter to your slave's spellcheck configuration?

  03.12.2013, 12:04, Mirko idonthaveenoughinformat...@googlemail.com:
  Hi all,
  We use a Solr SpellcheckComponent with a file-based dictionary. We run a
  master and some replica slave servers. To update the dictionary, we copy
  the dictionary txt file to the master, from where it is automatically
  replicated to all slaves. However, it seems we need to run the
  spellcheck.build query on all servers individually.

  Is there a way to automatically build the spellcheck dictionary on all
  servers without calling spellcheck.build on all slaves individually?

  We use Solr 4.0.0

  Thanks,
  Mirko


Re: json update moves doc to end

2013-12-03 Thread Erick Erickson
Try adding debug=all and you'll see exactly how docs
are scored. Also, it'll show you exactly how your query is
parsed. Paste that if it's confused, it'll help figure out
what's going wrong.


On Tue, Dec 3, 2013 at 1:37 PM, Andreas Owen a...@conx.ch wrote:

 So isn't it sorted automaticly by relevance (boost value)? If not do should
 i set it in solrconfig?

 -Original Message-
 From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
 Sent: Dienstag, 3. Dezember 2013 19:07
 To: solr-user@lucene.apache.org
 Subject: Re: json update moves doc to end

 What order, the order if you supply no explicit sort at all?

 Solr does not make any guarantees about what order documents will come back
 in if you do not ask for a sort.

 In general in Solr/lucene, the only way to update a document is to re-add
 it
 as a new document, so that's probably what's going on behind the scenes,
 and
 it probably effects the 'default' sort order -- which Solr makes no
 agreement about anyway, you probably shouldn't even count on it being
 consistent at all.

 If you want a consistent sort order, maybe add a field with a timestamp,
 and
 ask for results sorted by the timestamp field? And then make sure not to
 change the timestamp when you do an update that you don't want to change
 the
 order?

 Apologies if I've misunderstood the situation.

 On 12/3/13 1:00 PM, Andreas Owen wrote:
  When I search for agenda I get a lot of hits. Now if I update the 2.
  Result by json-update the doc is moved to the end of the index when I
  search for it again. The field I change is editorschoice and it
  never contains the search term agenda so I don't see why it changes
  the order. Why does it?
 
 
 
  Part of Solrconfig requesthandler I use:
 
  requestHandler name=/select2 class=solr.SearchHandler
 
lst name=defaults
 
   str name=echoParamsexplicit/str
 
   int name=rows10/int
 
str name=defTypesynonym_edismax/str
 
  str name=synonymstrue/str
 
  str name=qfplain_text^10 editorschoice^200
 
  title^20 h_*^14
 
  tags^10 thema^15 inhaltstyp^6
  breadcrumb^6
  doctype^10
 
  contentmanager^5 links^5
 
  last_modified^5  url^5
 
  /str
 
  str name=bq(expiration:[NOW TO *] OR (*:*
  -expiration:*))^6/str  !-- tested: now or newer or empty gets small
  boost
  --
 
  str name=bflog(clicks)^8/str !-- tested --
 
  !-- todo: anzahl-links(count urlparse in links
  query) / häufigkeit von suchbegriff (bf= count in title and text)--
 
str name=dftext/str
 
  str name=fl*,path,score/str
 
  str name=wtjson/str
 
  str name=q.opAND/str
 
 
 
  !-- Highlighting defaults --
 
   str name=hlon/str
 
str name=hl.flplain_text,title/str
 
  str name=hl.simple.prelt;bgt;/str
 
   str name=hl.simple.postlt;/bgt;/str
 
 
 
!-- lst name=invariants --
 
   str name=faceton/str
 
  str name=facet.mincount1/str
 
   str
  name=facet.field{!ex=inhaltstyp}inhaltstyp/str
 
  str
  name=f.inhaltstyp.facet.sortindex/str
 
  str
  name=facet.field{!ex=doctype}doctype/str
 
  str
  name=f.doctype.facet.sortindex/str
 
  str
  name=facet.field{!ex=thema_f}thema_f/str
 
  str
  name=f.thema_f.facet.sortindex/str
 
  str
  name=facet.field{!ex=author_s}author_s/str
 
  str
  name=f.author_s.facet.sortindex/str
 
  str
  name=facet.field{!ex=sachverstaendiger_s}sachverstaendiger_s/str
 
  str
  name=f.sachverstaendiger_s.facet.sortindex/str
 
  str
  name=facet.field{!ex=veranstaltung}veranstaltung/str
 
  str
  name=f.veranstaltung.facet.sortindex/str
 
  str
  name=facet.date{!ex=last_modified}last_modified/str
 
  str
  name=facet.date.gap+1MONTH/str
 
  str
  name=facet.date.endNOW/MONTH+1MONTH/str
 
  str
  name=facet.date.startNOW/MONTH-36MONTHS/str
 
  str
  name=facet.date.otherafter/str
 
  /lst
 
  /requestHandler
 
 




a core for every user, lots of users... are there issues

2013-12-03 Thread hank williams
 We are building a system where there is a core for every user. There will
be many tens or perhaps ultimately hundreds of thousands or millions of
users. We do not need each of those users to have “warm” data in memory. In
fact doing so would consume lots of memory unnecessarily, for users that
might not have logged in in a long time.

So my question is, is the default behavior of Solr to try to keep all of
our cores warm, and if so, can we stop it? Also given the number of cores
that we will likely have is there anything else we should be keeping in
mind to maximize performance and minimize memory usage?


Re: post filtering for boolean filter queries

2013-12-03 Thread Yonik Seeley
On Tue, Dec 3, 2013 at 4:45 AM, Dmitry Kan solrexp...@gmail.com wrote:
 ok, we were able to confirm the behavior regarding not caching the filter
 query. It works as expected. It does not cache with {!cache=false}.

 We are still looking into clarifying the cost assignment: i.e. whether it
 works as expected for long boolean filter queries.

Yes, filters should be ordered by cost (cheapest first) whenever you
use {!cache=false}
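For example (illustrative values), the per-user filter and a cheaper filter
could be sent as:

```text
fq={!cache=false cost=50}type:document
fq={!cache=false cost=200}UserId:(user1 OR user2 OR ... OR user1000)
```

Note that a cost of 100 or more only turns a filter into a true post filter
when the underlying query implements the PostFilter interface (e.g. {!frange});
a plain boolean query like the one above is simply evaluated after the
cheaper filters.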

-Yonik
http://heliosearch.com -- making solr shine


Re: a core for every user, lots of users... are there issues

2013-12-03 Thread Erick Erickson
You probably want to look at transient cores, see:
http://wiki.apache.org/solr/LotsOfCores

But millions will be interesting for a single node, you must have some
kind of partitioning in mind?

Best,
Erick


On Tue, Dec 3, 2013 at 2:38 PM, hank williams hank...@gmail.com wrote:

  We are building a system where there is a core for every user. There will
 be many tens or perhaps ultimately hundreds of thousands or millions of
 users. We do not need each of those users to have “warm” data in memory. In
 fact doing so would consume lots of memory unnecessarily, for users that
 might not have logged in in a long time.

 So my question is, is the default behavior of Solr to try to keep all of
 our cores warm, and if so, can we stop it? Also given the number of cores
 that we will likely have is there anything else we should be keeping in
 mind to maximize performance and minimize memory usage?



Re: Using Payloads as a Coefficient For Score At a Custom QParser That extends ExtendedDismaxQParser

2013-12-03 Thread Furkan KAMACI
I've implemented what I want: I can add the payload score into the document
score. I've modified ExtendedDismaxQParser and I can use all the abilities
of edismax in my case. I will explain what I did on my blog.

Thanks;
Furkan KAMACI


2013/12/1 Furkan KAMACI furkankam...@gmail.com

 Hi;

 I use Solr 4.5.1. I have a case: when a user searches for some specific
 keywords, some documents should be ranked much higher than their usual
 score. I mean I have probabilities of which documents the user may want to see
 for given keywords.

 I have come up with that idea. I can put a new field to my schema. This
 field holds keyword and probability as payload. When a user searches for a
 keyword I will calculate usual document score for given fields and also I
 will make a search on payloaded field and I will multiply the total score
 with that payload.

 I followed that example:
 http://sujitpal.blogspot.com/2013/07/porting-payloads-to-solr4.html#! However
 that example extends Qparser directly but I want to use capabilities of
 edismax.

 So I found that example:
 http://digitalpebble.blogspot.com/2010/08/using-payloads-with-dismaxqparser-in.html
 This one extends dismax, but I could not use payloads in that example.

 I want to combine above to solutions. First solution has that case:

 @Override
 public Similarity get(String name) {
 if (payloads.equals(name) || cscores.equals(name)) {
 return new PayloadSimilarity();
 } else {
 return new DefaultSimilarity();
 }
 }

 However dismax behaves differently, i.e. when you search for cscores:A it
 changes it into:

 *+((text:cscores:y text:cscores text:y text:cscoresy)) ()*

 When I debug it, name is text instead of cscores, so it does not work. My
 idea is to combine the two examples and extend edismax. Do you have any idea
 how to do that for edismax, or what to do for my case?

 *PS:* I've sent same question at Lucene user list too. I ask it here to
 get an idea from Solr perspective too.

 Thanks;
 Furkan KAMACI



Re: a core for every user, lots of users... are there issues

2013-12-03 Thread hank williams
On Tue, Dec 3, 2013 at 3:20 PM, Erick Erickson erickerick...@gmail.comwrote:

 You probably want to look at transient cores, see:
 http://wiki.apache.org/solr/LotsOfCores

 But millions will be interesting for a single node, you must have some
 kind of partitioning in mind?


Wow. Thanks for that great link. Yes, we are sharding, so it's not like there
would be millions of cores on one machine or even one cluster. And since the
cores are one per user, this is a totally clean approach. But still, we want
to make sure that we are not overloading the machine. Do you have any sense
of what a good upper limit might be, or how we might figure that out?



 Best,
 Erick


 On Tue, Dec 3, 2013 at 2:38 PM, hank williams hank...@gmail.com wrote:

   We are building a system where there is a core for every user. There
 will
  be many tens or perhaps ultimately hundreds of thousands or millions of
  users. We do not need each of those users to have “warm” data in memory.
 In
  fact doing so would consume lots of memory unnecessarily, for users that
  might not have logged in in a long time.
 
  So my question is, is the default behavior of Solr to try to keep all of
  our cores warm, and if so, can we stop it? Also given the number of cores
  that we will likely have is there anything else we should be keeping in
  mind to maximize performance and minimize memory usage?
 




-- 
blog: whydoeseverythingsuck.com


How to Empty Content of a Field via Solrj?

2013-12-03 Thread Furkan KAMACI
How can I empty the content of a field in Solr (I use Solr 4.5.1 as SolrCloud)
via Solrj? I mean, if I have this document in my index:

field1: abc
field2: def
field3: ghi

and if I want to empty the content of field2, I want to have:


field1: abc
field2:  
field3: ghi
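For reference, this can be done with an atomic update (all other fields must
be stored for this to work). A sketch of the JSON body POSTed to /update (the
id field name and value here are assumptions):

```json
[{"id": "doc1", "field2": {"set": ""}}]
```

Using "set" with an empty string leaves the field present but empty, while
{"set": null} removes the field from the document entirely. With SolrJ the
equivalent is adding a Map with the key "set" as the field value on a
SolrInputDocument.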


Re: How to Empty Content of a Field via Solrj?

2013-12-03 Thread Furkan KAMACI
I know that I can use Atomic Updates for such cases, but I want to
atomically update a field based on a search result (I want to use that
functionality like nested queries). Any other ideas are welcome.


2013/12/3 Furkan KAMACI furkankam...@gmail.com

 How can I empty content of a field at Solr (I use Solr 4.5.1 as SolrCloud)
 via Solrj? I mean if I have that document at my index:

 field1: abc
 field2: def
 field3: ghi

 and if I want to empty the content of field2. I want to have:


 field1: abc
 field2:  
 field3: ghi






Re: a core for every user, lots of users... are there issues

2013-12-03 Thread hank williams
Also, I see that the lotsofcores stuff is for solr 4.4 and above. What is
the state of the 4.4 codebase? Could we start using it now? Is it safe?


On Tue, Dec 3, 2013 at 3:33 PM, hank williams hank...@gmail.com wrote:




 On Tue, Dec 3, 2013 at 3:20 PM, Erick Erickson erickerick...@gmail.comwrote:

 You probably want to look at transient cores, see:
 http://wiki.apache.org/solr/LotsOfCores

 But millions will be interesting for a single node, you must have some
 kind of partitioning in mind?


 Wow. Thanks for that great link. Yes we are sharding so its not like there
 would be millions of cores on one machine or even cluster. And since the
 cores are one per user, this is a totally clean approach. But still we want
 to make sure that we are not overloading the machine. Do you have any sense
 of what a good upper limit might be, or how we might figure that out?



 Best,
 Erick


 On Tue, Dec 3, 2013 at 2:38 PM, hank williams hank...@gmail.com wrote:

   We are building a system where there is a core for every user. There
 will
  be many tens or perhaps ultimately hundreds of thousands or millions of
  users. We do not need each of those users to have “warm” data in
 memory. In
  fact doing so would consume lots of memory unnecessarily, for users that
  might not have logged in in a long time.
 
  So my question is, is the default behavior of Solr to try to keep all of
  our cores warm, and if so, can we stop it? Also given the number of
 cores
  that we will likely have is there anything else we should be keeping in
  mind to maximize performance and minimize memory usage?
 




 --
 blog: whydoeseverythingsuck.com




-- 
blog: whydoeseverythingsuck.com


Re: a core for every user, lots of users... are there issues

2013-12-03 Thread hank williams
Sorry, I see that we are up to solr 4.6. I missed that.


On Tue, Dec 3, 2013 at 3:53 PM, hank williams hank...@gmail.com wrote:

 Also, I see that the lotsofcores stuff is for solr 4.4 and above. What
 is the state of the 4.4 codebase? Could we start using it now? Is it safe?


 On Tue, Dec 3, 2013 at 3:33 PM, hank williams hank...@gmail.com wrote:




 On Tue, Dec 3, 2013 at 3:20 PM, Erick Erickson 
 erickerick...@gmail.comwrote:

 You probably want to look at transient cores, see:
 http://wiki.apache.org/solr/LotsOfCores

 But millions will be interesting for a single node, you must have some
 kind of partitioning in mind?


 Wow. Thanks for that great link. Yes we are sharding so its not like
 there would be millions of cores on one machine or even cluster. And since
 the cores are one per user, this is a totally clean approach. But still we
 want to make sure that we are not overloading the machine. Do you have any
 sense of what a good upper limit might be, or how we might figure that out?



 Best,
 Erick


 On Tue, Dec 3, 2013 at 2:38 PM, hank williams hank...@gmail.com wrote:

   We are building a system where there is a core for every user. There
 will
  be many tens or perhaps ultimately hundreds of thousands or millions of
  users. We do not need each of those users to have “warm” data in
 memory. In
  fact doing so would consume lots of memory unnecessarily, for users
 that
  might not have logged in in a long time.
 
  So my question is, is the default behavior of Solr to try to keep all
 of
  our cores warm, and if so, can we stop it? Also given the number of
 cores
  that we will likely have is there anything else we should be keeping in
  mind to maximize performance and minimize memory usage?
 




 --
 blog: whydoeseverythingsuck.com




 --
 blog: whydoeseverythingsuck.com




-- 
blog: whydoeseverythingsuck.com


Re: post filtering for boolean filter queries

2013-12-03 Thread Michael Sokolov

On 12/03/2013 01:55 AM, Dmitry Kan wrote:

Hello!

We have been experimenting with post filtering lately. Our setup is a
filter with a long boolean query; drawing on the example from the Dublin
Stump the Chump session:

fq=UserId:(user1 OR user2 OR...OR user1000)

The underlying issue impacting performance is that the combination of user
ids in the query above is unique per user in the system, and on top of that
the combination is changing every day.

Our idea was to stop caching the filter query with {!cache=false}. Since
there is no way to introspect the contents of the filter cache to our
knowledge (jmx?), we can't be sure those are not cached. This is because
the initial query per each combination takes substantially more time (as if
it was *not* cached) than the second and subsequent queries with the same
fq (as if it *was* cached).

Question is: does post filtering support boolean queries in fq params?

Another thing we have been trying is assigning a cost to the fq relatively
higher than for other filter queries. Does this feature support the boolean
queries in fq params as well?

Dmitry - I went to a talk at LR where this problem came up, and 
the solution of implementing a custom filter cache only for logged-in 
users was discussed -- sounds interesting, but there may be some tricky 
parts to it.


-Mike


Re: SolrCloud FunctionQuery inconsistency

2013-12-03 Thread Chris Hostetter
: Yes, I am populating ptime using a default of NOW.
: 
: I only store the id, so I can't get ptime values. But from the perspective
: of business logic, ptime should not change.

if you are populating it using a *schema* default then the warning text I 
pasted into my last message would definitely apply to your situation and 
easily explain the behavior you are seeing -- because schema 
defaults are applied on a per-node basis, the values wouldn't be 
guaranteed to be consistent for the entire shard.

If you are populating it using an update processor that fills in a 
default (such as the TimestampUpdateProcessorFactory I linked to in my 
last message) prior to the distributed update logic, then everything should 
be working fine, and if you are seeing the order change then the problem is 
likely unrelated to my wild guess.
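For reference, a chain along those lines would look something like this (the
chain name is made up; the field name matches the ptime discussed in this
thread):

```xml
<updateRequestProcessorChain name="set-ptime">
  <processor class="solr.TimestampUpdateProcessorFactory">
    <str name="fieldName">ptime</str>
  </processor>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

Because the timestamp processor sits before DistributedUpdateProcessorFactory,
the value is filled in once on the node that first receives the document, and
every replica then indexes the same value.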

As erick said: you have to give us a *lot* more details (exactly what 
your data looks like, what queries you are doing, what results you see, 
how those results differ from what you expect, etc...) in order to provide 
more useful/meaningful advice.

https://wiki.apache.org/solr/UsingMailingLists


-Hoss
http://www.lucidworks.com/


Re: a core for every user, lots of users... are there issues

2013-12-03 Thread Erick Erickson
bq: Do you have any sense of what a good upper limit might be, or how we
might figure that out?

As always, it depends (tm). And the biggest thing it depends upon is the
number of simultaneous users you have and the size of their indexes. And
we've arrived at the black box of estimating size again. Sigh... I'm
afraid that the only way is to test and establish some rules of thumb.

The transient core constraint will limit the number of cores loaded at
once. If you allow too many cores at once, you'll get OOM errors when all
the users pile on at the same time.

Let's say you've determined that 100 is the limit for transient cores. What
I suspect you'll see is degrading response times if this is too low. Say
110 users are signed on and say they submit queries perfectly in order, one
after the other. Every request will require the core to be opened and it'll
take a bit. So that'll be a flag.

Or that's a fine limit but your users have added more and more documents
and you're coming under memory pressure.

As you can tell I don't have any good answers. I've seen between 10M and
300M documents on a single machine.

BTW, in a _very_ casual test I found about 1000 cores/second discovered in
discovery mode. While they aren't loaded if they're transient, it's still a
consideration if you have 10s of thousands.
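For reference, the relevant knobs in discovery-mode (4.4+) configs -- the
values here are illustrative:

```xml
<!-- solr.xml: cap how many transient cores may be loaded at once -->
<solr>
  <int name="transientCacheSize">100</int>
</solr>
```

plus, in each user core's core.properties, the lines transient=true and
loadOnStartup=false, so cores are opened lazily and evicted LRU-style once
the cache is full.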

Best,
Erick



On Tue, Dec 3, 2013 at 3:33 PM, hank williams hank...@gmail.com wrote:

 On Tue, Dec 3, 2013 at 3:20 PM, Erick Erickson erickerick...@gmail.com
 wrote:

  You probably want to look at transient cores, see:
  http://wiki.apache.org/solr/LotsOfCores
 
  But millions will be interesting for a single node, you must have some
  kind of partitioning in mind?
 
 
 Wow. Thanks for that great link. Yes we are sharding so its not like there
 would be millions of cores on one machine or even cluster. And since the
 cores are one per user, this is a totally clean approach. But still we want
 to make sure that we are not overloading the machine. Do you have any sense
 of what a good upper limit might be, or how we might figure that out?



  Best,
  Erick
 
 
  On Tue, Dec 3, 2013 at 2:38 PM, hank williams hank...@gmail.com wrote:
 
We are building a system where there is a core for every user. There
  will
   be many tens or perhaps ultimately hundreds of thousands or millions of
   users. We do not need each of those users to have “warm” data in
 memory.
  In
   fact doing so would consume lots of memory unnecessarily, for users
 that
   might not have logged in in a long time.
  
   So my question is, is the default behavior of Solr to try to keep all
 of
   our cores warm, and if so, can we stop it? Also given the number of
 cores
   that we will likely have is there anything else we should be keeping in
   mind to maximize performance and minimize memory usage?
  
 



 --
 blog: whydoeseverythingsuck.com



Re: Deleting and committing inside a SearchComponent

2013-12-03 Thread Upayavira


On Tue, Dec 3, 2013, at 03:22 PM, Peyman Faratin wrote:
 Hi
 
 Is it possible to delete and commit updates to an index inside a custom
 SearchComponent? I know I can do it with solrj but due to several
 business logic requirements I need to build the logic inside the search
 component.  I am using SOLR 4.5.0. 

That just doesn't make sense. Search components are read only.

What are you trying to do? What stuff do you need to change? Could you
do it within an UpdateProcessor?

Upayavira


Re: json update moves doc to end

2013-12-03 Thread Upayavira
By default it sorts by score. If the score is a consistent one, it will
order docs as they appear in the index, which effectively means an
undefined order.

For example a *:* query doesn't have terms that can be used to score, so
every doc will get a score of 1.
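E.g., adding an explicit tiebreak keeps the order stable across updates (the
date field here is just an example):

```text
sort=score desc,last_modified desc
```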

Upayavira

On Tue, Dec 3, 2013, at 06:37 PM, Andreas Owen wrote:
 So isn't it sorted automaticly by relevance (boost value)? If not do
 should
 i set it in solrconfig?
 
 -Original Message-
 From: Jonathan Rochkind [mailto:rochk...@jhu.edu] 
 Sent: Dienstag, 3. Dezember 2013 19:07
 To: solr-user@lucene.apache.org
 Subject: Re: json update moves doc to end
 
 What order, the order if you supply no explicit sort at all?
 
 Solr does not make any guarantees about what order documents will come
 back
 in if you do not ask for a sort.
 
 In general in Solr/lucene, the only way to update a document is to re-add
 it
 as a new document, so that's probably what's going on behind the scenes,
 and
 it probably effects the 'default' sort order -- which Solr makes no
 agreement about anyway, you probably shouldn't even count on it being
 consistent at all.
 
 If you want a consistent sort order, maybe add a field with a timestamp,
 and
 ask for results sorted by the timestamp field? And then make sure not to
 change the timestamp when you do an update that you don't want to change
 the
 order?
 
 Apologies if I've misunderstood the situation.
 
 On 12/3/13 1:00 PM, Andreas Owen wrote:
  When I search for agenda I get a lot of hits. Now if I update the 2.
  Result by json-update the doc is moved to the end of the index when I 
  search for it again. The field I change is editorschoice and it 
  never contains the search term agenda so I don't see why it changes 
  the order. Why does it?
 
 
 
  Part of Solrconfig requesthandler I use:
 
  requestHandler name=/select2 class=solr.SearchHandler
 
lst name=defaults
 
   str name=echoParamsexplicit/str
 
   int name=rows10/int
 
str name=defTypesynonym_edismax/str
 
  str name=synonymstrue/str
 
  str name=qfplain_text^10 editorschoice^200
 
  title^20 h_*^14
 
  tags^10 thema^15 inhaltstyp^6 
  breadcrumb^6
  doctype^10
 
  contentmanager^5 links^5
 
  last_modified^5  url^5
 
  /str
 
  str name=bq(expiration:[NOW TO *] OR (*:* 
  -expiration:*))^6/str  !-- tested: now or newer or empty gets small 
  boost
  --
 
  str name=bflog(clicks)^8/str !-- tested --
 
  !-- todo: anzahl-links(count urlparse in links 
  query) / häufigkeit von suchbegriff (bf= count in title and text)--
 
str name=dftext/str
 
  str name=fl*,path,score/str
 
  str name=wtjson/str
 
  str name=q.opAND/str
 
 
 
  !-- Highlighting defaults --
 
   str name=hlon/str
 
str name=hl.flplain_text,title/str
 
  str name=hl.simple.prelt;bgt;/str
 
   str name=hl.simple.postlt;/bgt;/str
 
 
 
!-- lst name=invariants --
 
   str name=faceton/str
 
  str name=facet.mincount1/str
 
   str
  name=facet.field{!ex=inhaltstyp}inhaltstyp/str
 
  str
  name=f.inhaltstyp.facet.sortindex/str
 
  str
  name=facet.field{!ex=doctype}doctype/str
 
  str 
  name=f.doctype.facet.sortindex/str
 
  str
  name=facet.field{!ex=thema_f}thema_f/str
 
  str 
  name=f.thema_f.facet.sortindex/str
 
  str
  name=facet.field{!ex=author_s}author_s/str
 
  str 
  name=f.author_s.facet.sortindex/str
 
  str
  name=facet.field{!ex=sachverstaendiger_s}sachverstaendiger_s/str
 
  str
  name=f.sachverstaendiger_s.facet.sortindex/str
 
  str
  name=facet.field{!ex=veranstaltung}veranstaltung/str
 
  str
  name=f.veranstaltung.facet.sortindex/str
 
  str
  name=facet.date{!ex=last_modified}last_modified/str
 
  str 
  name=facet.date.gap+1MONTH/str
 
  str 
  name=facet.date.endNOW/MONTH+1MONTH/str
 
  str 
  name=facet.date.startNOW/MONTH-36MONTHS/str
 
  str 
  name=facet.date.otherafter/str
 
  /lst
 
  

Re: Deleting and committing inside a SearchComponent

2013-12-03 Thread Peyman Faratin

On Dec 3, 2013, at 8:41 PM, Upayavira u...@odoko.co.uk wrote:

 
 
 On Tue, Dec 3, 2013, at 03:22 PM, Peyman Faratin wrote:
 Hi
 
 Is it possible to delete and commit updates to an index inside a custom
 SearchComponent? I know I can do it with solrj but due to several
 business logic requirements I need to build the logic inside the search
 component.  I am using SOLR 4.5.0. 
 
 That just doesn't make sense. Search components are read only.
 
I can think of many situations in which it makes sense. For instance, you search 
for a document and your index contains many duplicates that differ only in one 
field, such as the time they were indexed (think news feeds from multiple 
sources). After the search we want to delete the duplicate documents that 
satisfy some policy (here date, but it could be some other policy). 

 What are you trying to do? What stuff do you need to change? Could you
 do it within an UpdateProcessor?

Solution I am working with:

// look up the update chain configured for this request and build a processor
UpdateRequestProcessorChain processorChain = rb.req.getCore()
    .getUpdateProcessingChain(rb.req.getParams().get(UpdateParams.UPDATE_CHAIN));
UpdateRequestProcessor processor = processorChain.createProcessor(rb.req, rb.rsp);
...
docId = f();
...
// issue a delete-by-id through the chain (note: rb.req, not req)
DeleteUpdateCommand cmd = new DeleteUpdateCommand(rb.req);
cmd.setId(docId.toString());
processor.processDelete(cmd);
processor.finish(); // flush the chain when done


 
 Upayavira



Re: SolrCloud FunctionQuery inconsistency

2013-12-03 Thread sling
Thanks, Chris:
The schema is:
field name=title type=textComplex indexed=true stored=false
multiValued=false omitNorms=true  /
field name=dkeys type=textComplex indexed=true stored=false
multiValued=false omitNorms=true /
field name=ptime type=date indexed=true stored=false
multiValued=false omitNorms=true /

There is no default value for ptime. It is generated by users.

There are 4 shards in this solrcloud, and 2 nodes in each shard.

I was trying a query with a function query ({!boost b=dateDeboost(ptime)}
channelid:0082  title:abc), which yields different results from the same
shard (using the param shards=shard3).

The difference is the maxScore, which is not consistent: it is
either score A or score B.
At the same time, new docs are being indexed.
In my opinion, the maxScore should be the same between queries issued in a
very short time, or at least it should not keep alternating between score A
and score B.

And occasionally the sort result is even inconsistent (say there is a
doc in this query, and not in another query, over and over). It appears
once, but does not reappear again.


Does this mean that, when the query happens, the index in the replica has not
synced from its leader? So if we query different nodes of the shard at the
same time, they show different results?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-FunctionQuery-inconsistency-tp4104346p4104851.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: SolrCloud FunctionQuery inconsistency

2013-12-03 Thread Raju Shikha
Hi All,

Sorry to ask, but is it possible to create multiple collections in Solr
standalone mode, i.e. with only one Solr instance? I am able to create
multiple collections in a SolrCloud environment, but when creating them in
standalone Solr it says Solr is not in cloud mode. Any suggestions would be a
great help.


Regards,
Raju Shikha

-Original Message-
From: sling [mailto:sling...@gmail.com] 
Sent: 04 December 2013 08:33
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud FunctionQuery inconsistency

Thanks, Chirs:
The schema is:
field name=title type=textComplex indexed=true stored=false
multiValued=false omitNorms=true  /
field name=dkeys type=textComplex indexed=true stored=false
multiValued=false omitNorms=true /
field name=ptime type=date indexed=true stored=false
multiValued=false omitNorms=true /

There is no default value for ptime. It is generated by users.

There are 4 shards in this solrcloud, and 2 nodes in each shard.

I was trying query with a function query({!boost b=dateDeboost(ptime)}
channelid:0082  title:abc), which leads differents results from the same
shard(using the param: shards=shard3).

The diffenence is maxScore, which is not consistent. And the maxScore is
either score A or score B.
And at the same time, new docs are indexed.
In my opinion, the maxScore should be the same between querys in a very
short time. or at least, it shoud not always change between score A and
score B.

And quite by accident, the sort result is even inconsistent(say there is a
doc in this query, and not in another query, over and over ). It does appear
once, but not reappear again.


Does this mean , when query happens, the index in replica has not synced
from its leader? so if query from different nodes from the shard at the same
time, it shows different results.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-FunctionQuery-inconsistency-tp4104346p4104851.html
Sent from the Solr - User mailing list archive at Nabble.com.