Problem with pdf files indexing
Hi! I'm using Solr 3.3 and I have some PDF files which I want to index. I followed the instructions from the wiki page: http://wiki.apache.org/solr/ExtractingRequestHandler The problem is that I can add my documents to Solr but I cannot query them. Here is what I have:

*solrconfig.xml*:

<requestHandler name="/update/extract" startup="lazy" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">text</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>

*schema.xml*:

<field name="title" type="string" indexed="true" stored="true"/>
<field name="author" type="string" indexed="true" stored="true"/>
<field name="text" type="text_general" indexed="true" stored="true" multiValued="true"/>

*data-config.xml*:

...
<dataSource type="BinFileDataSource" name="ds-file"/>
...
<entity processor="TikaEntityProcessor" dataSource="ds-file" url="../${document.filename}">
  <field column="Author" name="author" meta="true"/>
  <field column="title" name="title" meta="true"/>
  <field column="text" name="text"/>
</entity>
...

I use SolrJ to add documents as follows:

SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr");
ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
up.addFile(new File("d:\\test.pdf"));
up.setParam("literal.id", "test");
up.setParam("extractOnly", "true");
server.commit();
NamedList result = server.request(up);
System.out.println("Result: " + result); // can display information about test.pdf
QueryResponse rsp = server.query(new SolrQuery("*:*"));
System.out.println("rsp: " + rsp); // returns nothing

Any suggestion?

-- View this message in context: http://lucene.472066.n3.nabble.com/Problem-with-pdf-files-indexing-tp3527202p3527202.html Sent from the Solr - User mailing list archive at Nabble.com.
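For comparison, here is a minimal SolrJ sketch of the indexing path (the class name is just for illustration). Two details of the snippet above are worth noting: extractOnly=true asks the handler to return the extracted content without adding a document to the index, and commit() is called before the extract request is even sent. The sketch below drops extractOnly and commits as part of the request:

import java.io.File;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class ExtractExample {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr");
        // send the PDF to the /update/extract handler configured in solrconfig.xml
        ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
        up.addFile(new File("d:\\test.pdf"));
        up.setParam("literal.id", "test");   // unique key for the new document
        // no extractOnly parameter, so the extracted text is actually indexed
        up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true); // commit with this request
        server.request(up);
    }
}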
Re: wild card search and lower-casing
I guess, I have found your comment, thanks. For our current needs I have just set: setLowercaseExpandedTerms(true); // changed from default false in the SolrQueryParser's constructor and that seem to work so far. In order not to start a separate thread on wildcards. Is it so, that for the trailing wildcard there is a minimum of 2 preceding characters for a search to happen? Dmitry On Mon, Nov 21, 2011 at 2:59 PM, Erick Erickson erickerick...@gmail.comwrote: It may be. The tricky bit is that there is a constant governing the behavior of this that restricts it to 3.6 and above. You'll have to change it after applying the patch for this to work for you. Should be trivial, I'll leave a note in the code about this, look for SOLR-2438 in the 3x code line for the place to change. On Mon, Nov 21, 2011 at 2:14 AM, Dmitry Kan dmitry@gmail.com wrote: Thanks Erick. Do you think the patch you are working on will be applicable as well to 3.4? Best, Dmitry On Mon, Nov 21, 2011 at 5:06 AM, Erick Erickson erickerick...@gmail.com wrote: As it happens I'm working on SOLR-2438 which should address this. This patch will provide two things: The ability to define a new analysis chain in your schema.xml, currently called multiterm that will be applied to queries of various sorts, including wildcard, prefix, range. This will be somewhat of an expert thing to make yourself... In the absence of an explicit definition it'll synthesize a multiterm analyzer out of the query analyzer, taking any char fitlers, and lowercaseFilter (if present), and ASCIIFoldingfilter (if present) and putting them in the multiterm analyzer along with a (hardcoded) WhitespaceTokenizer. As of 3.6 and 4.0, this will be the default behavior, although you can explicitly define a field type parameter to specify the current behavior. The reason it is on 3.6 is that I want it to bake for a while before getting into the wild, so I have no intention of trying to get it into the 3.5 release. The patch is up for review now, I'd like another set of eyeballs or two on it before committing. The patch that's up there now is against trunk but I hope to have a 3x patch that I'll apply to the 3x code line after 3.5 RC1 is cut. Best Erick On Fri, Nov 18, 2011 at 12:05 PM, Ahmet Arslan iori...@yahoo.com wrote: You're right: public SolrQueryParser(IndexSchema schema, String defaultField) { ... setLowercaseExpandedTerms(false); ... } Please note that lowercaseExpandedTerms uses String.toLowercase() (uses default Locale) which is a Locale sensitive operation. In Lucene AnalyzingQueryParser exists for this purposes, but I am not sure if it is ported to solr. http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html
date range in solr 3.1
i try to use range faceting in solr 3.1 using facet.range=date, f.date.facet.range.gap=+1DAY, f.date.facet.range.start=NOW/DAY-5DAYS, and f.date.facet.range.end=NOW/DAY and i get this exception Exception during facet.range of date org.apache.solr.common.SolrException: Can't add gap 1DAYS to value Sun Nov 13 00:00:00 UTC 2011 for field: date at org.apache.solr.request.SimpleFacets$RangeEndpointCalculator.addGap(SimpleFacets.java:1093) at org.apache.solr.request.SimpleFacets.getFacetRangeCounts(SimpleFacets.java:873) at org.apache.solr.request.SimpleFacets.getFacetRangeCounts(SimpleFacets.java:839) at org.apache.solr.request.SimpleFacets.getFacetRangeCounts(SimpleFacets.java:778) at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:178) at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:72) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:240) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:164) at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:462) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:563) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:399) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:317) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:204) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:182) at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:311) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.text.ParseException: Unrecognized command: at org.apache.solr.util.DateMathParser.parseMath(DateMathParser.java:277) at org.apache.solr.request.SimpleFacets$DateRangeEndpointCalculator.parseAndAddGap(SimpleFacets.java:1188) at org.apache.solr.request.SimpleFacets$DateRangeEndpointCalculator.parseAndAddGap(SimpleFacets.java:1160) at org.apache.solr.request.SimpleFacets$RangeEndpointCalculator.addGap(SimpleFacets.java:1091) ... 27 more can you help me plz thanks in advance :) -- View this message in context: http://lucene.472066.n3.nabble.com/date-range-in-solr-3-1-tp3527498p3527498.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Integrating Surround Query Parser
--- On Tue, 11/22/11, Rahul Mehta rahul23134...@gmail.com wrote: From: Rahul Mehta rahul23134...@gmail.com Subject: Integrating Surround Query Parser To: solr-user@lucene.apache.org Date: Tuesday, November 22, 2011, 8:05 AM

Hello, I want to run a surround query.
1. Downloaded the jar from http://www.java2s.com/Code/Jar/JKL/Downloadlucenesurround241jar.htm
2. Moved lucene-surround-2.4.1.jar to /apache-solr-3.1.0/example/lib
3. Edited solrconfig.xml with: <queryParser name="SurroundQParser" class="org.apache.lucene.queryParser.surround.parser.QueryParser"/>
4. Restarted Solr

Got this error:
org.apache.solr.common.SolrException: Error Instantiating QParserPlugin, org.apache.lucene.queryParser.surround.parser.QueryParser is not a org.apache.solr.search.QParserPlugin at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:425)

-- Thanks Regards

Hello Rahul, It is already integrated. Please see: http://wiki.apache.org/solr/SurroundQueryParser
Re: how to use term proximity queries with apache solr
We have used proximity queries, but they only work using a sloppy phrase query (e.g. "catalyst polymer"~5) and do not allow wildcards. We want to use proximity queries between any terms (e.g. (poly* NEAR *lyst)). Is this possible using additional query parsers like Surround? If yes, please suggest how to install Surround; currently we are using Solr 3.1.

Not sure about leading wildcards, but you can use https://issues.apache.org for this.
How to be sure that surround
I have done the following steps for installing surround plugin. 1. Downloading from http://www.java2s.com/Code/Jar/JKL/Downloadlucenesurround241jar.htm 2. Moved the lucene-surround-2.4.1.jar to /apache-solr-3.1.0/example/lib 3. restart solr . But How to be sure that surround plugin is being installed . Means what query i can run. -- Thanks Regards Rahul Mehta
Re: how to make effective search with fq and q params
Usually, Use the 'q' parameter to search for the free text values entered by the users (where you might want to parse the query and/or apply boosting/phrase-sloppy, minimum match,tie etc ) Use the 'fq' to limit the searches to certain criterias like location, date-ranges etc. Also, avoid using the q=*:* as it implicitly translates to matchalldocsquery Regds Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-make-effective-search-with-fq-and-q-params-tp3527217p3527535.html Sent from the Solr - User mailing list archive at Nabble.com.
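As a small SolrJ illustration of that split (the field names and query text here are made up):

import org.apache.solr.client.solrj.SolrQuery;

public class FilterQueryExample {
    public static void main(String[] args) {
        // q carries the user's free text and is scored for relevance
        SolrQuery query = new SolrQuery("laptop bag");
        // fq entries only restrict the result set; they are cached separately and do not affect scoring
        query.addFilterQuery("location:london");
        query.addFilterQuery("date:[NOW-7DAYS TO NOW]");
        System.out.println(query); // prints the encoded q and fq parameters
    }
}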
Re: how to use term proximity queries with apache solr
Not sure about leading wildcard but you can use https://issues.apache.org for this. Sorry, link was : https://issues.apache.org/jira/browse/SOLR-1604
Re: How to be sure that surround
I have done the following steps for installing surround plugin. 1. Downloading from http://www.java2s.com/Code/Jar/JKL/Downloadlucenesurround241jar.htm 2. Moved the lucene-surround-2.4.1.jar to /apache-solr-3.1.0/example/lib 3. restart solr . But How to be sure that surround plugin is being installed . Means what query i can run. Rahul, you need to switch to solr-trunk, it is already there http://wiki.apache.org/solr/SurroundQueryParser
Re: How to be sure that surround
I have the solr-trunk , but queries are running on both (on trunk (4.0) and on (3.1) ) . then how i can be sure that what query will run by surround query parser plugin. The query i tried : http://localhost:8983/solr/select?q=abstracts:99n(flat,panel,display) http://localhost:8983/solr/select?q=abstracts:(poly*%20NEAR%20*lyst) The above queries both are running on 3.1 and 4.0 How i can sure that these query are running by Surround Plugin. On Tue, Nov 22, 2011 at 5:51 PM, Ahmet Arslan iori...@yahoo.com wrote: I have done the following steps for installing surround plugin. 1. Downloading from http://www.java2s.com/Code/Jar/JKL/Downloadlucenesurround241jar.htm 2. Moved the lucene-surround-2.4.1.jar to /apache-solr-3.1.0/example/lib 3. restart solr . But How to be sure that surround plugin is being installed . Means what query i can run. Rahul, you need to switch to solr-trunk, it is already there http://wiki.apache.org/solr/SurroundQueryParser -- Thanks Regards Rahul Mehta
Re: how to use term proximity queries with apache solr
do i need to install this seperately or it is integrated in solr 4.0 ? On Tue, Nov 22, 2011 at 5:49 PM, Ahmet Arslan iori...@yahoo.com wrote: Not sure about leading wildcard but you can use https://issues.apache.org for this. Sorry, link was : https://issues.apache.org/jira/browse/SOLR-1604 -- Thanks Regards Rahul Mehta
Solr highlighting isn't working!
Hello!!! I have a problem with Solr highlighting. I have documents with the following fields: TYPE, DBID and others. When I make the following request:

https://localhost:8443/solr/myCore/afts?wt=standard&q=TYPE:cm:content&indent=on&hl=true&hl.fl=DBID&hl.usePhraseHighlighter=true&fl=DBID

the following response is returned:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">3</int>
  </lst>
  <result name="response" numFound="166" start="0">
    <doc>
      <arr name="DBID">
        <str>892</str>
      </arr>
    </doc>
    <doc>
    ...
  </result>
  <lst name="highlighting">
    <lst name="LEAF-892"/>
  </lst>
</response>

What is the problem? Thank you!

-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-highlighting-isn-t-work-tp3527701p3527701.html Sent from the Solr - User mailing list archive at Nabble.com.
Stats per group with StatsComponent?
Hi We need to get minimum and maximum values for a field, within a group in a grouped search-result. Is this possible today, perhaps by using StatsComponent some way? I'll flesh out the example a little, to make the question clearer. We have a number of documents, indexed with a price, date and a hotel. For each hotel, there are a number of documents, each representing a price/date combination. We then group our search result on hotel. We want to show the minimum and maximum price for each hotel. A little googling leads us to look at StatsComponent, as what it does would be what we need, if it could be done for each group. There was a thread on this list in August, Grouping and performing statistics per group that seemed to go into this a bit, but didn't find a solution. Is this possible in Solr 3.4, either with StatsComponent, or some other way? -- Morten We all live in a yellow subroutine.
Re: how to make effective search with fq and q params
Thanks Pravesh for your reply.. I definitely try this.. i hope it will improve solr response time. pravesh wrote Usually, Use the 'q' parameter to search for the free text values entered by the users (where you might want to parse the query and/or apply boosting/phrase-sloppy, minimum match,tie etc ) Use the 'fq' to limit the searches to certain criterias like location, date-ranges etc. Also, avoid using the q=*:* as it implicitly translates to matchalldocsquery Regds Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-make-effective-search-with-fq-and-q-params-tp3527217p3527654.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to be sure that surround
I have the solr-trunk , but queries are running on both (on trunk (4.0) and on (3.1) ) . then how i can be sure that what query will run by surround query parser plugin. The query i tried : http://localhost:8983/solr/select?q=abstracts:99n(flat,panel,display) http://localhost:8983/solr/select?q=abstracts:(poly*%20NEAR%20*lyst) The above queries both are running on 3.1 and 4.0 How i can sure that these query are running by Surround Plugin. You can use q={!surround df=abstracts}99n(flat,panel,display) If you append debugQuery=on, it should display some info regarding which query parser is used, which Query is constructed etc.
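For example, against the standard example port that would look something like the request below (the braces and spaces of the local-params syntax need URL-escaping when typed into a browser):

http://localhost:8983/solr/select?q={!surround df=abstracts}99n(flat,panel,display)&debugQuery=on

With debugQuery=on, the parsedquery entry in the debug section shows which parser handled the query; a surround proximity query should appear as a span query (e.g. SpanNearQuery) rather than a plain term or phrase query.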
Re: how to use term proximity queries with apache solr
do i need to install this seperately or it is integrated in solr 4.0 ? You need to install SOLR-1604 separately. But this is easy since it is implemented as a solr plugin.
Re: Integrating Surround Query Parser
The surround query parser is fully wired into Solr trunk/4.0, if that helps. See http://wiki.apache.org/solr/SurroundQueryParser and the JIRA issue linked there in case you want to patch it into a different version. Erik

On Jan 21, 2011, at 02:24, Ahson Iqbal wrote:

Hi All, I want to integrate the Surround Query Parser with Solr. To do this I downloaded the jar file from the internet, put that jar file in web-inf/lib and configured the query parser in solrconfig.xml as:

<queryParser name="SurroundQParser" class="org.apache.lucene.queryParser.surround.parser.QueryParser"/>

Now when I load the Solr admin page the following exception comes up:

org.apache.solr.common.SolrException: Error Instantiating QParserPlugin, org.apache.lucene.queryParser.surround.parser.QueryParser is not a org.apache.solr.search.QParserPlugin

I think I didn't get the right plugin. Can anybody guide me on where to get the right plugin for the surround query parser, or how to integrate this plugin with Solr correctly? thanx Ahsan
Re: how to make effective search with fq and q params
If all you're doing is filtering (browsing by facets perhaps), it's perfectly fine to have q=*:*. MatchAllDocsQuery is fast (and would be cached anyway), so use *:* as appropriate without worries. Erik On Nov 22, 2011, at 07:18 , pravesh wrote: Usually, Use the 'q' parameter to search for the free text values entered by the users (where you might want to parse the query and/or apply boosting/phrase-sloppy, minimum match,tie etc ) Use the 'fq' to limit the searches to certain criterias like location, date-ranges etc. Also, avoid using the q=*:* as it implicitly translates to matchalldocsquery Regds Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-make-effective-search-with-fq-and-q-params-tp3527217p3527535.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: [ANNOUNCEMENT] Second Edition of the First Book on Solr
Congratulations! Feel free to write a shorter version of the announcement text, suitable as a news teaser on the Solr site, and we'll try to update the site with new thumb and all. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 18. nov. 2011, at 06:17, Smiley, David W. wrote: Fellow Solr users, I am proud to announce that the book Apache Solr 3 Enterprise Search Server is officially published! This is the second edition of the first book on Solr by me, David Smiley, and my co-author Eric Pugh. You can find full details about the book, download a free chapter, and purchase it here: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book It is also available through other channels like Amazon. You can feel good about the purchase knowing that 5% of each sale goes to support the Apache Software Foundation. If you buy directly from the publisher, then the basis of the percentage that goes to the ASF (and to me) is higher than if you buy it through other channels. This book naturally covers the latest features in Solr as of version 3.4 like Result Grouping and Geospatial, but this is not a small update to the first book. We have more experience with Solr and we've listened to reader feedback from the first edition. No chapter was untouched: Faceting gets its own chapter, all search relevancy matters are discussed in one chapter, auto-complete approaches are all discussed together, much of the chapter on integration was rewritten to discuss newer technologies, and the first chapter was greatly streamlined. Furthermore, each chapter has a tip in the introduction that advises readers in a hurry on what parts should be read now or later. Finally, we developed a 2-page parameter quick-reference appendix that you will surely find useful printed on your desk. In summary, we improved the existing content, and added about 25% more by page count. Software, errata, and other information about this book and the previous edition is on our website: http://www.solrenterprisesearchserver.com/ We've been working hard on this book for the last 10 months and we hope it really helps saves you time and improves your search project! Apache Solr 3 Enterprise Search Server In Detail: If you are a developer building an app today then you know how important a good search experience is. Apache Solr, built on Apache Lucene, is a wildly popular open source enterprise search server that easily delivers powerful search and faceted navigation features that are elusive with databases. Solr supports complex search criteria, faceting, result highlighting, query-completion, query spell-check, relevancy tuning, and more. Apache Solr 3 Enterprise Search Server is a comprehensive reference guide for every feature Solr has to offer. It serves the reader right from initiation to development to deployment. It also comes with complete running examples to demonstrate its use and show how to integrate Solr with other languages and frameworks. Through using a large set of metadata about artists, releases, and tracks courtesy of the MusicBrainz.org project, you will have a testing ground for Solr, and will learn how to import this data in various ways. You will then learn how to search this data in different ways, including Solr's rich query syntax and boosting match scores based on record data. 
Finally, we'll cover various deployment considerations to include indexing strategies and performance-oriented configuration that will enable you to scale Solr to meet the needs of a high-volume site. Sincerely, David Smiley (primary author) david.w.smi...@gmail.com Eric Pugh (co-author) ep...@opensourceconnections.com
Re: date range in solr 3.1
Hi, Long shot: Try f.date.facet.range.gap=%2B1DAY instead, in case your + was interpreted as space by your browser... -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 22. nov. 2011, at 12:57, do3do3 wrote: i try to use range faceting in solr 3.1 using facet.range=date, f.date.facet.range.gap=+1DAY, f.date.facet.range.start=NOW/DAY-5DAYS, and f.date.facet.range.end=NOW/DAY and i get this exception Exception during facet.range of date org.apache.solr.common.SolrException: Can't add gap 1DAYS to value Sun Nov 13 00:00:00 UTC 2011 for field: date at org.apache.solr.request.SimpleFacets$RangeEndpointCalculator.addGap(SimpleFacets.java:1093) at org.apache.solr.request.SimpleFacets.getFacetRangeCounts(SimpleFacets.java:873) at org.apache.solr.request.SimpleFacets.getFacetRangeCounts(SimpleFacets.java:839) at org.apache.solr.request.SimpleFacets.getFacetRangeCounts(SimpleFacets.java:778) at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:178) at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:72) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:240) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:164) at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:462) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:563) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:399) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:317) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:204) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:182) at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:311) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.text.ParseException: Unrecognized command: at org.apache.solr.util.DateMathParser.parseMath(DateMathParser.java:277) at org.apache.solr.request.SimpleFacets$DateRangeEndpointCalculator.parseAndAddGap(SimpleFacets.java:1188) at org.apache.solr.request.SimpleFacets$DateRangeEndpointCalculator.parseAndAddGap(SimpleFacets.java:1160) at org.apache.solr.request.SimpleFacets$RangeEndpointCalculator.addGap(SimpleFacets.java:1091) ... 
27 more can you help me plz thanks in advance :) -- View this message in context: http://lucene.472066.n3.nabble.com/date-range-in-solr-3-1-tp3527498p3527498.html Sent from the Solr - User mailing list archive at Nabble.com.
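Spelling out Jan's suggestion, the full set of range-facet parameters with the leading '+' of the gap percent-encoded (so it is not decoded as a space) would look roughly like this, appended to the query URL:

&facet=true&facet.range=date&f.date.facet.range.start=NOW/DAY-5DAYS&f.date.facet.range.end=NOW/DAY&f.date.facet.range.gap=%2B1DAY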
Re: Matching + and &
Why do you need spaces in the replacement? Try pattern="\+" replacement="plus" - it will cause the transformed charstream to contain as many tokens as the original and avoid the highlighting crash.

-- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com

On 22. nov. 2011, at 05:40, Tomasz Wegrzanowski wrote:

Hi, I've been trying to match some phrases with + and & (like c++, google+, r&d etc.), but the tokenizer gets rid of them before I can do anything with synonym filters. So I tried using CharFilters like this:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\+" replacement=" plus "/>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="&amp;" replacement=" and "/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_case_sensitive.txt" ignoreCase="false" expand="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="query_synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

This mostly works, but for a very small number of documents, mostly those with a large number of pluses in them, the highlighter just crashes (and it is the highlighter, since turning it off and reissuing the query works just fine; if I replace pluses with spaces and reindex, the same query reruns just fine) with an exception like this:

Nov 21, 2011 11:35:11 PM org.apache.solr.common.SolrException log SEVERE: java.lang.StringIndexOutOfBoundsException: String index out of range: -1 at java.lang.String.substring(String.java:1938) at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:237) at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:462) at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:378) at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:116) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:343) at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:244) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) at
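In schema.xml terms, Jan's suggestion above would look something like the sketch below (the second line assumes the analogous change for the '&' rule); with no spaces in the replacement, the transformed char stream contains as many tokens as the original, which is Jan's point:

<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\+" replacement="plus"/>
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="&amp;" replacement="and"/>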
Unexpected CPU load and increased Solr response time
Hi, we currently have 2 servers running on JBoss container (master and slave) with 20mln documents and about 3GB index size. Java was executed with options: *-Xms12G -Xmx12G -XX:NewSize=4G -XX:MaxNewSize=4G -XX:MaxPermSize=256m -Dorg.jboss.resolver.warning=true -Dsun.rmi.dgc.client.gcInterval=360 -Dsun.rmi.dgc.server.gcInterval=360 -XX:+UseCompressedOops -XX:+UseTLAB -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled* Commit duration on master is 5 minutes and we use Solr 3.3 (because in 3.4 we have a problem with dataimport https://issues.apache.org/jira/browse/SOLR-2804). We have problem that occurs when server gets about *34* qps. *Do you have any advice how to fix this problem?* I have attached the charts below. The load and threads count increases between 19:00 and 20:00. On 20:10 we reduced by half the number of queries. http://lucene.472066.n3.nabble.com/file/n3527914/solr_users_reqs-day.png http://lucene.472066.n3.nabble.com/file/n3527914/load-day.png http://lucene.472066.n3.nabble.com/file/n3527914/threads-day.png http://lucene.472066.n3.nabble.com/file/n3527914/jboss_threads-day.png http://lucene.472066.n3.nabble.com/file/n3527914/cpu-day.png http://lucene.472066.n3.nabble.com/file/n3527914/_avg_response_query_time-22-11-2011_15_50_17.png -- View this message in context: http://lucene.472066.n3.nabble.com/Unexpected-cpu-load-and-Solr-incrase-response-time-tp3527914p3527914.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: how to make effective search with fq and q params
Hi Erik: When using [e]dismax, does configuring q.alt=*:* and not specifying q affect the performance/caching in any way? As a side note, a while back I configured q.alt=*:*, and the application (via SolrJ) still set q=*:* if no user input was provided (faceting). With both of them set that way, I got zero results. (Solr 3.4.0) Interesting. Thanks, Jeff On Nov 22, 2011, at 7:06 AM, Erik Hatcher wrote: If all you're doing is filtering (browsing by facets perhaps), it's perfectly fine to have q=*:*. MatchAllDocsQuery is fast (and would be cached anyway), so use *:* as appropriate without worries. Erik On Nov 22, 2011, at 07:18 , pravesh wrote: Usually, Use the 'q' parameter to search for the free text values entered by the users (where you might want to parse the query and/or apply boosting/phrase-sloppy, minimum match,tie etc ) Use the 'fq' to limit the searches to certain criterias like location, date-ranges etc. Also, avoid using the q=*:* as it implicitly translates to matchalldocsquery Regds Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-make-effective-search-with-fq-and-q-params-tp3527217p3527535.html Sent from the Solr - User mailing list archive at Nabble.com. -- Jeff Schmidt 535 Consulting j...@535consulting.com http://www.535consulting.com (650) 423-1068
Re: how to make effective search with fq and q params
On Nov 22, 2011, at 09:55 , Jeff Schmidt wrote: When using [e]dismax, does configuring q.alt=*:* and not specifying q affect the performance/caching in any way? No different than using q=*:* with the lucene query parser. MatchAllDocsQuery is possibly the fastest query out there! (it simply matches documents in index order, all scores are 1.0) As a side note, a while back I configured q.alt=*:*, and the application (via SolrJ) still set q=*:* if no user input was provided (faceting). With both of them set that way, I got zero results. (Solr 3.4.0) Interesting. Ouch. Really? I don't see in the code (looking at my trunk checkout) where there's any *:* used in the SolrJ library. Can you provide some details on how you used SolrJ? It'd be good to track this down as that seems like a bug to me. Erik Thanks, Jeff On Nov 22, 2011, at 7:06 AM, Erik Hatcher wrote: If all you're doing is filtering (browsing by facets perhaps), it's perfectly fine to have q=*:*. MatchAllDocsQuery is fast (and would be cached anyway), so use *:* as appropriate without worries. Erik On Nov 22, 2011, at 07:18 , pravesh wrote: Usually, Use the 'q' parameter to search for the free text values entered by the users (where you might want to parse the query and/or apply boosting/phrase-sloppy, minimum match,tie etc ) Use the 'fq' to limit the searches to certain criterias like location, date-ranges etc. Also, avoid using the q=*:* as it implicitly translates to matchalldocsquery Regds Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-make-effective-search-with-fq-and-q-params-tp3527217p3527535.html Sent from the Solr - User mailing list archive at Nabble.com. -- Jeff Schmidt 535 Consulting j...@535consulting.com http://www.535consulting.com (650) 423-1068
How to select all docs of 'today' ?
Hi, I have a fetch-time (date) field to know when the documents were fetched. I want to make a query to get all documents fetched today. I tried : fetch-time:NOW/DAY but it returns always 0. fetch-time:[NOW/DAY TO NOW/DAY] (it returns 0) fetch-time:[NOW/DAY-1DAY TO NOW/DAY] but it returns documents fetched yesterday. fetch-time:[NOW/DAY-1HOUR TO NOW/DAY] but it's incorrect too. Do you have any idea ? Thanks in advance.
Re: Solr real time update
Yu: To get Near Real Time update in Solr 1.4.1 you will need to use Solr 1.4.1 with RankingAlgorithm. This allows you to update documents in near real time. You can download and give this a try from here: http://solr-ra.tgels.org/ Regards, - Nagendra Nagarajayya http://solr-ra.tgels.org/ http://rankingalgorithm.tgels.org/ On 11/21/2011 9:47 PM, yu shen wrote: Hi All, After some study, I used below snippet. Seems the documents is updated, while still takes a long time. Feels like the parameter does not take effect. Any comments? UpdateRequest req = new UpdateRequest(); req.add(solrDocs); req.setCommitWithin(5000); req.setParam(commitWithin, 5000); req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true); req.process(SOLR_SERVER); 2011/11/22 yu shenshenyu...@gmail.com Hi All, I try to do a 'nearly real time update' to solr. My solr version is 1.4.1. I read this solr CommentWithinhttp://wiki.apache.org/solr/CommitWithinwiki, and a related threadhttp://lucene.472066.n3.nabble.com/Solr-real-time-update-taking-time-td3472709.htmlmostly on the difficulty to do this. My issue is I tried the code snippet in the wiki: UpdateRequest req = new UpdateRequest(); req.add(mySolrInputDocument); req.setCommitWithin(1); req.process(server); But my index did not get updated, unless I call SOLR_SERVER.commit(); explicitly. The latter call will take more than 1 minute on average to return. Can I do a real time update on solr 1.4.1? Would someone help to show a workable code snippet? Spark
Re: wild card search and lower-casing
No, no, no That's something buried in Lucene, it has nothing to do with the patch! The patch has NOT yet been applied to any released code. You could pull the patch from the JIRA and apply it to trunk locally if you wanted. But there's no patch for 3.x, I'll probably put that up over the holiday. But things have changed a bit (one of the things I'll have to do is create some documentation). You *should* be able to specify just legacyMultiTerm=true in your fieldType if you want to apply the 3.x patch to pre 3.6 code. It would be a good field test if that worked for you. But you can't do any of this until the JIRA (SOLR-2438) is marked Resolution: Fixed. Don't be fooled by Fix Version. Fix Version simply says that those are the earliest versions it *could* go in. Best Erick Best Erick On Tue, Nov 22, 2011 at 6:32 AM, Dmitry Kan dmitry@gmail.com wrote: I guess, I have found your comment, thanks. For our current needs I have just set: setLowercaseExpandedTerms(true); // changed from default false in the SolrQueryParser's constructor and that seem to work so far. In order not to start a separate thread on wildcards. Is it so, that for the trailing wildcard there is a minimum of 2 preceding characters for a search to happen? Dmitry On Mon, Nov 21, 2011 at 2:59 PM, Erick Erickson erickerick...@gmail.comwrote: It may be. The tricky bit is that there is a constant governing the behavior of this that restricts it to 3.6 and above. You'll have to change it after applying the patch for this to work for you. Should be trivial, I'll leave a note in the code about this, look for SOLR-2438 in the 3x code line for the place to change. On Mon, Nov 21, 2011 at 2:14 AM, Dmitry Kan dmitry@gmail.com wrote: Thanks Erick. Do you think the patch you are working on will be applicable as well to 3.4? Best, Dmitry On Mon, Nov 21, 2011 at 5:06 AM, Erick Erickson erickerick...@gmail.com wrote: As it happens I'm working on SOLR-2438 which should address this. This patch will provide two things: The ability to define a new analysis chain in your schema.xml, currently called multiterm that will be applied to queries of various sorts, including wildcard, prefix, range. This will be somewhat of an expert thing to make yourself... In the absence of an explicit definition it'll synthesize a multiterm analyzer out of the query analyzer, taking any char fitlers, and lowercaseFilter (if present), and ASCIIFoldingfilter (if present) and putting them in the multiterm analyzer along with a (hardcoded) WhitespaceTokenizer. As of 3.6 and 4.0, this will be the default behavior, although you can explicitly define a field type parameter to specify the current behavior. The reason it is on 3.6 is that I want it to bake for a while before getting into the wild, so I have no intention of trying to get it into the 3.5 release. The patch is up for review now, I'd like another set of eyeballs or two on it before committing. The patch that's up there now is against trunk but I hope to have a 3x patch that I'll apply to the 3x code line after 3.5 RC1 is cut. Best Erick On Fri, Nov 18, 2011 at 12:05 PM, Ahmet Arslan iori...@yahoo.com wrote: You're right: public SolrQueryParser(IndexSchema schema, String defaultField) { ... setLowercaseExpandedTerms(false); ... } Please note that lowercaseExpandedTerms uses String.toLowercase() (uses default Locale) which is a Locale sensitive operation. In Lucene AnalyzingQueryParser exists for this purposes, but I am not sure if it is ported to solr. 
http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html
Re: wild card search and lower-casing
Thanks, Erick. I was in fact reading the patch (the one attached as a file to the aforementioned jira) you updated sometime yesterday. I'll watch the issue, but as said the change of a hard-coded boolean to its opposite worked just fine for me. Best, Dmitry On 11/22/11, Erick Erickson erickerick...@gmail.com wrote: No, no, no That's something buried in Lucene, it has nothing to do with the patch! The patch has NOT yet been applied to any released code. You could pull the patch from the JIRA and apply it to trunk locally if you wanted. But there's no patch for 3.x, I'll probably put that up over the holiday. But things have changed a bit (one of the things I'll have to do is create some documentation). You *should* be able to specify just legacyMultiTerm=true in your fieldType if you want to apply the 3.x patch to pre 3.6 code. It would be a good field test if that worked for you. But you can't do any of this until the JIRA (SOLR-2438) is marked Resolution: Fixed. Don't be fooled by Fix Version. Fix Version simply says that those are the earliest versions it *could* go in. Best Erick Best Erick On Tue, Nov 22, 2011 at 6:32 AM, Dmitry Kan dmitry@gmail.com wrote: I guess, I have found your comment, thanks. For our current needs I have just set: setLowercaseExpandedTerms(true); // changed from default false in the SolrQueryParser's constructor and that seem to work so far. In order not to start a separate thread on wildcards. Is it so, that for the trailing wildcard there is a minimum of 2 preceding characters for a search to happen? Dmitry On Mon, Nov 21, 2011 at 2:59 PM, Erick Erickson erickerick...@gmail.comwrote: It may be. The tricky bit is that there is a constant governing the behavior of this that restricts it to 3.6 and above. You'll have to change it after applying the patch for this to work for you. Should be trivial, I'll leave a note in the code about this, look for SOLR-2438 in the 3x code line for the place to change. On Mon, Nov 21, 2011 at 2:14 AM, Dmitry Kan dmitry@gmail.com wrote: Thanks Erick. Do you think the patch you are working on will be applicable as well to 3.4? Best, Dmitry On Mon, Nov 21, 2011 at 5:06 AM, Erick Erickson erickerick...@gmail.com wrote: As it happens I'm working on SOLR-2438 which should address this. This patch will provide two things: The ability to define a new analysis chain in your schema.xml, currently called multiterm that will be applied to queries of various sorts, including wildcard, prefix, range. This will be somewhat of an expert thing to make yourself... In the absence of an explicit definition it'll synthesize a multiterm analyzer out of the query analyzer, taking any char fitlers, and lowercaseFilter (if present), and ASCIIFoldingfilter (if present) and putting them in the multiterm analyzer along with a (hardcoded) WhitespaceTokenizer. As of 3.6 and 4.0, this will be the default behavior, although you can explicitly define a field type parameter to specify the current behavior. The reason it is on 3.6 is that I want it to bake for a while before getting into the wild, so I have no intention of trying to get it into the 3.5 release. The patch is up for review now, I'd like another set of eyeballs or two on it before committing. The patch that's up there now is against trunk but I hope to have a 3x patch that I'll apply to the 3x code line after 3.5 RC1 is cut. Best Erick On Fri, Nov 18, 2011 at 12:05 PM, Ahmet Arslan iori...@yahoo.com wrote: You're right: public SolrQueryParser(IndexSchema schema, String defaultField) { ... 
setLowercaseExpandedTerms(false); ... } Please note that lowercaseExpandedTerms uses String.toLowercase() (uses default Locale) which is a Locale sensitive operation. In Lucene AnalyzingQueryParser exists for this purposes, but I am not sure if it is ported to solr. http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html -- Regards, Dmitry Kan
AW: How to select all docs of 'today' ?
Hi, fetch-time:[NOW/DAY TO NOW] should do it. Best Sebastian

-----Original Message----- From: Danicela nutch [mailto:danicela-nu...@mail.com] Sent: Tuesday, 22 November 2011 16:08 To: solr-user@lucene.apache.org Subject: How to select all docs of 'today' ?

Hi, I have a fetch-time (date) field to know when the documents were fetched. I want to make a query to get all documents fetched today. I tried: fetch-time:NOW/DAY but it always returns 0. fetch-time:[NOW/DAY TO NOW/DAY] (it returns 0). fetch-time:[NOW/DAY-1DAY TO NOW/DAY] but it returns documents fetched yesterday. fetch-time:[NOW/DAY-1HOUR TO NOW/DAY] but it's incorrect too. Do you have any idea? Thanks in advance.
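If the query is built through SolrJ, the client handles escaping the space inside the range; a tiny sketch of the range suggested above:

SolrQuery q = new SolrQuery("fetch-time:[NOW/DAY TO NOW]");

(NOW/DAY rounds down to midnight, so this covers everything fetched since the start of today.)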
Re: Autocomplete(terms) performance problem
You should try out the autocomplete component using Solr with RankingAlgorithm. The performance is less than 3ms for a 1 million Wikipedia titles index with very low deviation. You can get more information about the performance with different indexes of size 3k, 390k, 1m, 10m docs from here: http://solr-ra.tgels.org/solr-ra-autocomplete.jsp - Nagendra Nagarajayya -- View this message in context: http://lucene.472066.n3.nabble.com/Autocomplete-terms-performance-problem-tp3351352p3528112.html Sent from the Solr - User mailing list archive at Nabble.com.
weird issue with solr and CentOS 5.7
Hi all, I'm facing a real weird issue here with solr (lucene 3.3) and CentOS 5.7. I've two servers, one running CentOS 5.5 and the other running CentOS 5.7. Both servers has the same solr, java and tomcat versions, the only difference between them is OS version. I added a custom field to schema.xml: field name=stream_isPrivate type=boolean indexed=true stored=true required=false/. When that type is boolean, on CentOS 5.5 works OK indexing Chinese characters, but on CentOS 5.7 I got this exception: Nov 22, 2011 11:27:11 PM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/select/ params={indent=onstart=0q=我们从右上角讲起rows=10version=2.2} hits=1 status=0 QTime=8 Nov 22, 2011 11:27:11 PM org.apache.solr.common.SolrException log SEVERE: java.lang.StringIndexOutOfBoundsException: String index out of range: 0 at java.lang.String.charAt(String.java:694) at org.apache.solr.schema.BoolField.write(BoolField.java:129) at org.apache.solr.schema.SchemaField.write(SchemaField.java:124) at org.apache.solr.response.XMLWriter.writeDoc(XMLWriter.java:369) at org.apache.solr.response.XMLWriter$3.writeDocs(XMLWriter.java:545) at org.apache.solr.response.XMLWriter.writeDocuments(XMLWriter.java:482) at org.apache.solr.response.XMLWriter.writeDocList(XMLWriter.java:519) at org.apache.solr.response.XMLWriter.writeVal(XMLWriter.java:582) at org.apache.solr.response.XMLWriter.writeResponse(XMLWriter.java:131) at org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:35) at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:343) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:210) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875) at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665) at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528) at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81) at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:685) at java.lang.Thread.run(Thread.java:636) That only happens on CentOS 5.7. I also tested on Ubuntu Server, and also works OK. solrconfig.xml and everything else is the same on both servers. Any idea what could be happening? Should it be a CentOS bug? Regards. -- Boris Quiroz boris.qui...@menco.it
NullPointerException with distributed facets
Hi, When doing a distributed query in solr 4.0 (4.0.0.2011.06.25.15.36.22) with facet.missing=true and facet.limit=20 I get a NullPointerException. By increasing the facet limit to 200 or setting facet missing to false it seems to fix it. The shards both contain the field but one shard always has a value and one never has a value. Single shard queries work fine on each shard. Does anyone know the cause or a fix? java.lang.NullPointerException at org.apache.solr.handler.component.FacetComponent.refineFacets(FacetComponent.java:489) at org.apache.solr.handler.component.FacetComponent.handleResponses(FacetComponent.java:278) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:292) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1452) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:926) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) Phil
Re: how to make effective search with fq and q params
Hi Erik: It's not in the SolrJ library, but rather my use of it: In my application code: protected static final String SOLR_ALL_DOCS_QUERY = *:*; /* * If no search terms provided, then return all neighbors. * Results are to be returned in neighbor symbol alphabetical order. */ if (searchTerms == null) { searchTerms = SOLR_ALL_DOCS_QUERY; nodeQuery.addSortField(n_name, SolrQuery.ORDER.asc); } So, if no user search terms are provided, I search all documents (there are other fqs in effect) and return them in name order. That worked just fine. Then I read more about [e]dismax, and went and configured: str name=q.alt*:*/str Then I would get zero results. It's not a SolrJ issue though, as this request in my browser also resulted in zero results: http://localhost:8091/solr/ing-content/select/?qt=partner-tmofq=type%3Anodefq=n_neighborof_id%3AING\:afaq=*:*rows=5facet=truefacet.mincount=1facet.field=n_neighborof_processExactfacet.field=n_neighborof_edge_type That was due to the q=*:*. Once I set, say, q=cancer, I got results. So I guess this is a [e]dismax thing? (partner-tmo is the name of my request handler). I solved my problem by net setting *:* in my application, and left q.alt=*:* in place. Hope this helps. Again, this is stock Solr 3.4.0, running the Apache war under Tomcat 6. Jeff On Nov 22, 2011, at 8:05 AM, Erik Hatcher wrote: On Nov 22, 2011, at 09:55 , Jeff Schmidt wrote: When using [e]dismax, does configuring q.alt=*:* and not specifying q affect the performance/caching in any way? No different than using q=*:* with the lucene query parser. MatchAllDocsQuery is possibly the fastest query out there! (it simply matches documents in index order, all scores are 1.0) As a side note, a while back I configured q.alt=*:*, and the application (via SolrJ) still set q=*:* if no user input was provided (faceting). With both of them set that way, I got zero results. (Solr 3.4.0) Interesting. Ouch. Really? I don't see in the code (looking at my trunk checkout) where there's any *:* used in the SolrJ library. Can you provide some details on how you used SolrJ? It'd be good to track this down as that seems like a bug to me. Erik Thanks, Jeff On Nov 22, 2011, at 7:06 AM, Erik Hatcher wrote: If all you're doing is filtering (browsing by facets perhaps), it's perfectly fine to have q=*:*. MatchAllDocsQuery is fast (and would be cached anyway), so use *:* as appropriate without worries. Erik On Nov 22, 2011, at 07:18 , pravesh wrote: Usually, Use the 'q' parameter to search for the free text values entered by the users (where you might want to parse the query and/or apply boosting/phrase-sloppy, minimum match,tie etc ) Use the 'fq' to limit the searches to certain criterias like location, date-ranges etc. Also, avoid using the q=*:* as it implicitly translates to matchalldocsquery Regds Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-make-effective-search-with-fq-and-q-params-tp3527217p3527535.html Sent from the Solr - User mailing list archive at Nabble.com. -- Jeff Schmidt 535 Consulting j...@535consulting.com http://www.535consulting.com (650) 423-1068 -- Jeff Schmidt 535 Consulting j...@535consulting.com http://www.535consulting.com (650) 423-1068
Re : AW: How to select all docs of 'today' ?
Thanks, it works. All this is based on the fact that NOW/DAY means the beginning of the day.

----- Original Message ----- From: sebastian.pet...@tib.uni-hannover.de Sent: 22.11.11 16:46 To: solr-user@lucene.apache.org Subject: AW: How to select all docs of 'today' ?

Hi, fetch-time:[NOW/DAY TO NOW] should do it. Best Sebastian

-----Original Message----- From: Danicela nutch [mailto:danicela-nu...@mail.com] Sent: Tuesday, 22 November 2011 16:08 To: solr-user@lucene.apache.org Subject: How to select all docs of 'today' ?

Hi, I have a fetch-time (date) field to know when the documents were fetched. I want to make a query to get all documents fetched today. I tried: fetch-time:NOW/DAY but it always returns 0. fetch-time:[NOW/DAY TO NOW/DAY] (it returns 0). fetch-time:[NOW/DAY-1DAY TO NOW/DAY] but it returns documents fetched yesterday. fetch-time:[NOW/DAY-1HOUR TO NOW/DAY] but it's incorrect too. Do you have any idea? Thanks in advance.
Re: FunctionQuery score=0
Can this be fixed somehow? I also need the real score. On Sun, Nov 20, 2011 at 10:44 AM, John fatmanc...@gmail.com wrote: After playing some more with this I managed to get what I want, almost. My query now looks like: q={!frange l=0 incl=false}query({!type=edismax qf=abstract^0.02 title^0.08 categorysearch^0.05 boost='eqsim(alltokens,xyz)' v='+tokens5:xyz '}) With the above query, I am getting only the results that I want, the ones whose score after my FucntionQuery are above 0, but the problem now is that the final score for all results is changed to 1, which affects the sorting. How can I keep the original score that is calculated by the edismax query? Cheers, John On Fri, Nov 18, 2011 at 10:50 AM, Andre Bois-Crettez andre.b...@kelkoo.com wrote: Definitely worked for me, with a classic full text search on ipod and such. Changing the lower bound changed the number of results. Follow Chris advice, and give more details. John wrote: Doesn't seem to work. I though that FilterQueries work before the search is performed and not after... no? Debug doesn't include filter query only the below (changed a bit): BoostedQuery(boost(+fieldName:**,boostedFunction(ord(** fieldName),query))) On Thu, Nov 17, 2011 at 5:04 PM, Andre Bois-Crettez andre.b...@kelkoo.comwrote: John wrote: Some of the results are receiving score=0 in my function and I would like them not to appear in the search results. you can use frange, and filter by score: q=ipodfq={!frange l=0 incl=false}query($q) -- André Bois-Crettez Search technology, Kelkoo http://www.kelkoo.com/ -- André Bois-Crettez Search technology, Kelkoo http://www.kelkoo.com/
Faceting is not Using Field Value Cache . . ?
Seeing something odd going on with faceting... we execute facets with every query and yet the fieldValueCache is not being used:

name: fieldValueCache
class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=1, initialSize=10, minSize=9000, acceptableSize=9500, cleanupThread=false)
stats:
lookups : 0
hits : 0
hitratio : 0.00
inserts : 0
evictions : 0
size : 0
warmupTime : 0
cumulative_lookups : 0
cumulative_hits : 0
cumulative_hitratio : 0.00
cumulative_inserts : 0
cumulative_evictions : 0

I was under the impression the fieldValueCache was an implicit cache (if you don't define it, it will still exist). We are running Solr v3.3 (and NOT using {!cache=false}). Thoughts?
Re: Problems with AutoSuggest feature(Terms Components)
Hi Erick, Thanks for your reply. I would like to know all the options that can be given under the defaults section and how they can be overridden. Is there any documentation available in the Solr forum? We tried searching but weren't able to find any.

My exact scenario is that I have one master core with many underlying shard cores (distributed architecture). I want terms.limit to default to 10 in the underlying shard cores. When I hit the master core, it will in turn hit the underlying shard cores. At that point, the terms.limit that was passed to the master core has to be passed to these underlying shard cores, overriding the default value. Can you please suggest the definition of the terms component for the underlying shard cores?

Regards, Sivaganesh
--
View this message in context: http://lucene.472066.n3.nabble.com/Problems-with-AutoSuggest-feature-Terms-Components-tp3512734p3528597.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: To push the terms.limit parameter from the master core to all the shard cores.
Hi Mark, Thanks for your suggestion. My exact scenario is that I have one master core with many underlying shard cores (distributed architecture). I want terms.limit to default to 10 in the underlying shard cores. When I hit the master core, it will in turn hit the underlying shard cores. At that point, the terms.limit that was passed to the master core has to be passed dynamically to these underlying shard cores, overriding the default value. Can you please suggest the definition of the terms component for the underlying shard cores?

I would also like to know all the options that can be given under the defaults section and how they can be overridden. Is there any documentation available in the Solr forum? We tried searching but weren't able to find any.

Regards, Sivaganesh
--
View this message in context: http://lucene.472066.n3.nabble.com/To-push-the-terms-limit-parameter-from-the-master-core-to-all-the-shard-cores-tp3520609p3528608.html
Sent from the Solr - User mailing list archive at Nabble.com.
Highlight with multi word synonyms
I'm trying to use multi-word synonyms. For example, in my synonyms file I have: nhl, national hockey league. If I apply this at index time only, a search for nhl returns a correct match, but highlights only the first word, national. Ideally, it would highlight national hockey league or not highlight at all. If I apply the synonyms at both index and query time, it finds the match and does the correct highlighting, but I understand it is not ideal to do synonyms at both index and query time. I am expanding synonyms and using edismax. Thoughts?
Re: how to make effective search with fq and q params
I think you're using dismax, not edismax. edismax will take q=*:* just fine as it handles all Lucene syntax queries also. dismax does not.

So, if you're using dismax and there is no actual query (but you want to get facets), you set q.alt=*:* and omit q - that's entirely by design. If there's a non-empty q parameter, q.alt is not considered, so there shouldn't be any issues with always having q.alt set if that's what you want.

Erik

On Nov 22, 2011, at 11:15 , Jeff Schmidt wrote:

Hi Erik: It's not in the SolrJ library, but rather my use of it. In my application code:

protected static final String SOLR_ALL_DOCS_QUERY = "*:*";

/*
 * If no search terms provided, then return all neighbors.
 * Results are to be returned in neighbor symbol alphabetical order.
 */
if (searchTerms == null) {
    searchTerms = SOLR_ALL_DOCS_QUERY;
    nodeQuery.addSortField("n_name", SolrQuery.ORDER.asc);
}

So, if no user search terms are provided, I search all documents (there are other fqs in effect) and return them in name order. That worked just fine. Then I read more about [e]dismax, and went and configured:

<str name="q.alt">*:*</str>

Then I would get zero results. It's not a SolrJ issue though, as this request in my browser also resulted in zero results:

http://localhost:8091/solr/ing-content/select/?qt=partner-tmo&fq=type%3Anode&fq=n_neighborof_id%3AING\:afa&q=*:*&rows=5&facet=true&facet.mincount=1&facet.field=n_neighborof_processExact&facet.field=n_neighborof_edge_type

That was due to the q=*:*. Once I set, say, q=cancer, I got results. So I guess this is an [e]dismax thing? (partner-tmo is the name of my request handler.) I solved my problem by not setting *:* in my application, and left q.alt=*:* in place.

Hope this helps. Again, this is stock Solr 3.4.0, running the Apache war under Tomcat 6.

Jeff

On Nov 22, 2011, at 8:05 AM, Erik Hatcher wrote:

On Nov 22, 2011, at 09:55 , Jeff Schmidt wrote:
When using [e]dismax, does configuring q.alt=*:* and not specifying q affect the performance/caching in any way?

No different than using q=*:* with the lucene query parser. MatchAllDocsQuery is possibly the fastest query out there! (it simply matches documents in index order, all scores are 1.0)

As a side note, a while back I configured q.alt=*:*, and the application (via SolrJ) still set q=*:* if no user input was provided (faceting). With both of them set that way, I got zero results. (Solr 3.4.0)

Interesting. Ouch. Really? I don't see in the code (looking at my trunk checkout) where there's any *:* used in the SolrJ library. Can you provide some details on how you used SolrJ? It'd be good to track this down as that seems like a bug to me.

Erik

Thanks, Jeff

On Nov 22, 2011, at 7:06 AM, Erik Hatcher wrote:
If all you're doing is filtering (browsing by facets perhaps), it's perfectly fine to have q=*:*. MatchAllDocsQuery is fast (and would be cached anyway), so use *:* as appropriate without worries.

Erik

On Nov 22, 2011, at 07:18 , pravesh wrote:
Usually, use the 'q' parameter to search for the free text values entered by the users (where you might want to parse the query and/or apply boosting, phrase slop, minimum match, tie etc.). Use 'fq' to limit the searches to certain criteria like location, date ranges etc. Also, avoid using q=*:* as it implicitly translates to MatchAllDocsQuery.

Regds
Pravesh
--
View this message in context: http://lucene.472066.n3.nabble.com/how-to-make-effective-search-with-fq-and-q-params-tp3527217p3527535.html
Sent from the Solr - User mailing list archive at Nabble.com.
--
Jeff Schmidt
535 Consulting
j...@535consulting.com
http://www.535consulting.com
(650) 423-1068
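As an illustration of that pattern, here is a rough SolrJ sketch (the handler, core URL and field names follow the examples above but are only assumptions): the client sends q only when the user typed something, and otherwise omits it so the handler's q.alt=*:* takes over.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FacetBrowse {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8091/solr/ing-content");

        SolrQuery query = new SolrQuery();
        query.setQueryType("partner-tmo");    // qt=partner-tmo, a dismax handler with q.alt=*:* in its defaults
        String userInput = null;              // e.g. nothing entered in the search box

        if (userInput != null && userInput.length() > 0) {
            query.setQuery(userInput);        // q is only sent when the user actually typed something
        } else {
            // q is omitted entirely; q.alt=*:* matches everything, and we just sort by name
            query.addSortField("n_name", SolrQuery.ORDER.asc);
        }

        query.addFilterQuery("type:node");
        query.setFacet(true);
        query.addFacetField("n_neighborof_edge_type");

        QueryResponse rsp = server.query(query);
        System.out.println("Found: " + rsp.getResults().getNumFound());
    }
}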
Re: FunctionQuery score=0
: q={!frange l=0 incl=false}query({!type=edismax qf=abstract^0.02
: title^0.08 categorysearch^0.05 boost='eqsim(alltokens,xyz)'
: v='+tokens5:xyz '})
:
: With the above query, I am getting only the results that I want, the ones
: whose score after my FunctionQuery is above 0, but the problem now is that
: the final score for all results is changed to 1, which affects the sorting.
:
: How can I keep the original score that is calculated by the edismax query?

a) Like I said: details matter. In your earlier messages you mentioned that you were wrapping a function around a query and wanted the function not to match anything where the result was 0 -- the suggestions provided have done that. This is the first time you mentioned that you needed the values returned by the function as the scores of the documents (had you mentioned that, you might have gotten different answers).

b) If you look closely at the suggestion from André, you'll see that his specific suggestion will actually do what you want if you follow it -- express the query you want in the q param (so you get the scores from it) and then express an fq that refers to the q query as a variable...

: q=ipod&fq={!frange l=0 incl=false}query($q)

c) Based on the concrete example you've given above, it's not clear to me that you actually need any of this -- if the above query is giving you the results you want, but you want the scores from the edismax query to be used as the final scores of the documents, then there is no need to wrap the query in any sort of function at all, or to exclude any 0 values. This should be exactly what you want...

q={!type=edismax qf=abstract^0.02 title^0.08 categorysearch^0.05 boost='eqsim(alltokens,xyz)' v='+tokens5:xyz '}

...why exactly did you think you needed to wrap that query in a function?

https://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an XY Problem ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about Y without giving more details about the X so that we can understand the full issue. Perhaps the best solution doesn't involve Y at all? See Also: http://www.perlmonks.org/index.pl?node_id=542341

-Hoss
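To make suggestion (b) concrete for the query above, a rough SolrJ sketch (it assumes the same fields and the custom eqsim function from the earlier mails, so treat the names as placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class FrangeFilter {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // The boosted edismax query stays in q, so its scores are kept for sorting...
        SolrQuery query = new SolrQuery("{!type=edismax qf='abstract^0.02 title^0.08 categorysearch^0.05' "
                + "boost='eqsim(alltokens,xyz)' v='+tokens5:xyz'}");

        // ...while frange over query($q) only filters out documents whose score is <= 0.
        query.addFilterQuery("{!frange l=0 incl=false}query($q)");

        System.out.println(server.query(query).getResults().getNumFound());
    }
}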
spellcheck in dismax
I put the following into the dismax requestHandler, but no suggestion field is returned.

<lst name="defaults">
  <str name="spellcheck.onlyMorePopular">true</str>
  <str name="spellcheck.extendedResults">false</str>
  <str name="spellcheck.count">1</str>
</lst>
<arr name="last-components">
  <str>spellcheck</str>
</arr>

But everything works if I put it as a separate requestHandler. Did I miss something? Thanks Richard
Re: spellcheck in dismax
It seems you forgot this:

<str name="spellcheck">true</str>

-----Original Message-----
From: Ruixiang Zhang rxzh...@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Tue, Nov 22, 2011 11:54 am
Subject: spellcheck in dismax

I put the following into the dismax requestHandler, but no suggestion field is returned.

<lst name="defaults">
  <str name="spellcheck.onlyMorePopular">true</str>
  <str name="spellcheck.extendedResults">false</str>
  <str name="spellcheck.count">1</str>
</lst>
<arr name="last-components">
  <str>spellcheck</str>
</arr>

But everything works if I put it as a separate requestHandler. Did I miss something? Thanks Richard
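If you would rather not change the handler defaults, the same switch can also be passed per request; a small SolrJ sketch (the handler name and the misspelled term are only placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.SpellCheckResponse;

public class SpellcheckRequest {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery query = new SolrQuery("beleive");   // a deliberately misspelled term
        query.setQueryType("dismax");                 // handler that has spellcheck in last-components
        query.set("spellcheck", "true");              // same effect as <str name="spellcheck">true</str> in defaults
        query.set("spellcheck.count", "1");

        QueryResponse rsp = server.query(query);
        SpellCheckResponse spell = rsp.getSpellCheckResponse();
        if (spell != null && !spell.getSuggestions().isEmpty()) {
            System.out.println("Did you mean: " + spell.getSuggestions().get(0).getAlternatives());
        }
    }
}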
Re: Faceting is not Using Field Value Cache . . ?
AFAIK, fieldValueCache is only used for faceting on tokenized fields. Maybe you are confusing it with the FieldCache (http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/search/FieldCache.html)? That one is used for common facets (facet.method=fc on non-tokenized fields). Does this make sense to you?

On Tue, Nov 22, 2011 at 7:21 PM, CRB sub.scripti...@metaheuristica.com wrote:

Seeing something odd going on with faceting... we execute facets with every query and yet the fieldValueCache is not being used:

name: fieldValueCache
class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=1, initialSize=10, minSize=9000, acceptableSize=9500, cleanupThread=false)
stats:
lookups : 0
hits : 0
hitratio : 0.00
inserts : 0
evictions : 0
size : 0
warmupTime : 0
cumulative_lookups : 0
cumulative_hits : 0
cumulative_hitratio : 0.00
cumulative_inserts : 0
cumulative_evictions : 0

I was under the impression the fieldValueCache was an implicit cache (if you don't define it, it will still exist). We are running Solr v3.3 (and NOT using {!cache=false}). Thoughts?

--
Regards,
Samuel García.
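A rough way to see the difference from the client side; the field names here are placeholders and the comments describe the usual 3.x behaviour as explained above:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class FacetMethodCheck {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery query = new SolrQuery("*:*");
        query.setFacet(true);
        query.addFacetField("category");   // single-valued string field: served from the Lucene FieldCache
        query.addFacetField("keywords");   // multivalued/tokenized field: this is what fills fieldValueCache
        query.set("facet.method", "fc");
        query.setFacetMinCount(1);

        server.query(query);
        // After a few of these requests, the stats page should show lookups/inserts on
        // fieldValueCache only if at least one faceted field is tokenized or multivalued.
    }
}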
Re: Solr highlighting isn't working!
(11/11/22 22:30), VladislavLysov wrote:

Hello!!! I have trouble with Solr highlighting. I have a document with the following fields: TYPE, DBID and others. When I make the following request:

https://localhost:8443/solr/myCore/afts?wt=standard&q=TYPE:cm:content&indent=on&hl=true&hl.fl=DBID&hl.usePhraseHighlighter=true&fl=DBID

it returns the following text:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">3</int>
  </lst>
  <result name="response" numFound="166" start="0">
    <doc>
      <arr name="DBID">
        <str>892</str>
      </arr>
    </doc>
  </result>
  <lst name="highlighting">
    <lst name="LEAF-892"/>
  </lst>
</response>

What is the problem? Thank you!

What term are you trying to highlight? You queried cm:content on the TYPE field and asked the highlighter to highlight on the DBID field. But since the DBID field seems to contain only 892, the highlighter cannot create any highlighted snippets.

With Solr 3.5 (RC2 is now available) or the trunk version of Solr, you can use the hl.q parameter to give the highlighter its own query.

http://wiki.apache.org/solr/HighlightingParameters#hl.q

koji
--
Check out Query Log Visualizer for Apache Solr
http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
http://www.rondhuit.com/en/
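For completeness, a small SolrJ sketch of the hl.q idea on Solr 3.5+ (the core URL, query and values follow the question above and are only assumptions):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class HighlightQuery {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("https://localhost:8443/solr/myCore");

        // Match on TYPE, but tell the highlighter to look for a different term in DBID.
        SolrQuery query = new SolrQuery("TYPE:\"cm:content\"");
        query.setHighlight(true);
        query.addHighlightField("DBID");
        query.set("hl.q", "DBID:892");   // hl.q is available from Solr 3.5 on

        System.out.println(server.query(query).getHighlighting());
    }
}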
Re: Solr real time update
Hi Nagarajayya, Thanks for your information. Do I need to change any configuration of my current solr server to integrate your plugin? Spark 2011/11/22 Nagendra Nagarajayya nnagaraja...@transaxtions.com Yu: To get Near Real Time update in Solr 1.4.1 you will need to use Solr 1.4.1 with RankingAlgorithm. This allows you to update documents in near real time. You can download and give this a try from here: http://solr-ra.tgels.org/ Regards, - Nagendra Nagarajayya http://solr-ra.tgels.org/ http://rankingalgorithm.tgels.**org/ http://rankingalgorithm.tgels.org/ On 11/21/2011 9:47 PM, yu shen wrote: Hi All, After some study, I used below snippet. Seems the documents is updated, while still takes a long time. Feels like the parameter does not take effect. Any comments? UpdateRequest req = new UpdateRequest(); req.add(solrDocs); req.setCommitWithin(5000); req.setParam(commitWithin, 5000); req.setAction(**AbstractUpdateRequest.ACTION.**COMMIT, true, true); req.process(SOLR_SERVER); 2011/11/22 yu shenshenyu...@gmail.com Hi All, I try to do a 'nearly real time update' to solr. My solr version is 1.4.1. I read this solr CommentWithinhttp://wiki.** apache.org/solr/CommitWithin http://wiki.apache.org/solr/CommitWithin **wiki, and a related threadhttp://lucene.472066.**n3.nabble.com/Solr-real-time-** update-taking-time-td3472709.**htmlhttp://lucene.472066.n3.nabble.com/Solr-real-time-update-taking-time-td3472709.htmlmostly on the difficulty to do this. My issue is I tried the code snippet in the wiki: UpdateRequest req = new UpdateRequest(); req.add(mySolrInputDocument); req.setCommitWithin(1); req.process(server); But my index did not get updated, unless I call SOLR_SERVER.commit(); explicitly. The latter call will take more than 1 minute on average to return. Can I do a real time update on solr 1.4.1? Would someone help to show a workable code snippet? Spark
If search matches index in the middle of filter chain, will result return?
Hi all I am using Solr 3.4 with Win7 and Jetty. When I do a search on a field, according to the Analysis from Solr, the search string matches the index in the middle of the chain. Here is the schema: fieldType name=substring_search class=solr.TextField positionIncrementGap=100 analyzer type=index charFilter class=solr.MappingCharFilterFactory mapping=../../filters/filter-mappings.txt/ charFilter class=solr.HTMLStripCharFilterFactory / tokenizer class=solr.KeywordTokenizerFactory/ filter class=solr.ASCIIFoldingFilterFactory/ filter class=solr.TrimFilterFactory / filter class=solr.LowerCaseFilterFactory / filter class=solr.CommonGramsFilterFactory words=../../filters/stopwords.txt ignoreCase=true/ filter class=solr.NGramFilterFactory minGramSize=1 maxGramSize=20/ filter class=solr.RemoveDuplicatesTokenFilterFactory / /analyzer analyzer type=query charFilter class=solr.MappingCharFilterFactory mapping=../../filters/filter-mappings.txt/ charFilter class=solr.HTMLStripCharFilterFactory / tokenizer class=solr.KeywordTokenizerFactory/ filter class=solr.ASCIIFoldingFilterFactory/ filter class=solr.TrimFilterFactory / filter class=solr.LowerCaseFilterFactory / filter class=solr.RemoveDuplicatesTokenFilterFactory / /analyzer /fieldType I am searching for an email called: off...@officeofficeoffice.com. If I search any text under 20 characters, result will be returned. But when I search the whole string: off...@officeofficeoffice.com, no result return. As you all see in the schema in index part, when I search the whole string, it will match the index chain before NGramFilterFactory. But after NGram, no result found. Here are my questions: - Is this behavior normal? - In order to get off...@officeofficeoffice.com, does it mean that I have to make the maxGramSize larger (like 70)? Thank you in advance for all your support. This is a great community.
Separate ACL and document index
Hi there, Is it possible to separate the ACL index from the document index and still search by user role in Solr? Currently my implementation indexes the ACL together with the document, but the documents' ACLs change frequently, so I have to rebuild the index every time an ACL changes. That is heavy for the whole system because there are many documents and their content is huge. Do you have any solution to this problem? I've been reading the mailing list for a while, and there doesn't seem to be a suitable solution for me. I want each user's search results to contain only the documents his role allows, but I don't want to re-index a document every time its ACL changes. Is it possible to perform a join, like in a database, to achieve this? How? Thanks Floyd
Re: If search matches index in the middle of filter chain, will result return?
On 11/22/2011 7:54 PM, Ellery Leung wrote: I am searching for an email called: off...@officeofficeoffice.com. If I search any text under 20 characters, result will be returned. But when I search the whole string: off...@officeofficeoffice.com, no result return. As you all see in the schema in index part, when I search the whole string, it will match the index chain before NGramFilterFactory. But after NGram, no result found. Here are my questions: - Is this behavior normal? I'm pretty sure that your query must match after the entire analyzer chain is done. I would expect that behavior to be normal. - In order to get off...@officeofficeoffice.com, does it mean that I have to make the maxGramSize larger (like 70)? If you were to increase the maxGramSize to 70, you would get a match in this case, but your index might get a lot larger, depending on what's in your source data. That's probably not the right approach, though. In general, you want to have your index and query analyzer chains exactly the same. There are some exceptions, but I don't think the NGram filter is one of them. The synonym filter and WordDelimiterFilter are examples where it is expected that your index and query analyzer chains will be different. Add the NGram and CommonGram filters to the query chain, and everything should start working. If you were to go with a single analyzer for both like the following, I think it would start working. You wouldn't even need to reindex, since you wouldn't be changing the index analyzer. fieldType name=substring_search class=solr.TextField positionIncrementGap=100 analyzer charFilter class=solr.MappingCharFilterFactory mapping=../../filters/filter-mappings.txt/ charFilter class=solr.HTMLStripCharFilterFactory / tokenizer class=solr.KeywordTokenizerFactory/ filter class=solr.ASCIIFoldingFilterFactory/ filter class=solr.TrimFilterFactory / filter class=solr.LowerCaseFilterFactory / filter class=solr.CommonGramsFilterFactory words=../../filters/stopwords.txt ignoreCase=true/ filter class=solr.NGramFilterFactory minGramSize=1 maxGramSize=20/ filter class=solr.RemoveDuplicatesTokenFilterFactory / /analyzer /fieldType Regarding your NGram filter, I would actually increase the minGramSize to at least 2 and decrease the maxGramSize to something like 10 or 15, then reindex. An additional note: CommonGrams may not be all that useful unless you are indexing large numbers of huge documents, like entire books. This particular fieldType is not suitable for full text anyway, since it uses KeywordTokenizer. Consider removing CommonGrams from this fieldType and reindexing. Unless you are dealing with large amounts of text, consider removing it from the entire schema. If you do remove it, it's usually not a good idea to replace it with a StopFilter. The index size reduction found in stopword removal is not usually worth the potential loss of recall. Be prepared to test all reasonable analyzer combinations, rather than taking my word for it. After reading the Hathi Trust blog, I tried CommonGrams on my own index. It actually made things slower, not faster. My typical document is only a few thousand bytes of metadata. The Hathi Trust is indexing millions of full-length books. Thanks, Shawn
Re: Solr real time update
Spark: Solr with RankingAlgorithm is not a plugin but a change of search library from Lucene to RankingAlgorithm. Here is more info on the changes you will need to make to your solrconfig.xml: http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search Regards, - Nagendra Nagrajayya http://solr-ra.tgels.org/ http://rankingalgorithm.tgels.org/ On 11/22/2011 5:40 PM, yu shen wrote: Hi Nagarajayya, Thanks for your information. Do I need to change any configuration of my current solr server to integrate your plugin? Spark 2011/11/22 Nagendra Nagarajayyannagaraja...@transaxtions.com Yu: To get Near Real Time update in Solr 1.4.1 you will need to use Solr 1.4.1 with RankingAlgorithm. This allows you to update documents in near real time. You can download and give this a try from here: http://solr-ra.tgels.org/ Regards, - Nagendra Nagarajayya http://solr-ra.tgels.org/ http://rankingalgorithm.tgels.**org/http://rankingalgorithm.tgels.org/ On 11/21/2011 9:47 PM, yu shen wrote: Hi All, After some study, I used below snippet. Seems the documents is updated, while still takes a long time. Feels like the parameter does not take effect. Any comments? UpdateRequest req = new UpdateRequest(); req.add(solrDocs); req.setCommitWithin(5000); req.setParam(commitWithin, 5000); req.setAction(**AbstractUpdateRequest.ACTION.**COMMIT, true, true); req.process(SOLR_SERVER); 2011/11/22 yu shenshenyu...@gmail.com Hi All, I try to do a 'nearly real time update' to solr. My solr version is 1.4.1. I read this solr CommentWithinhttp://wiki.** apache.org/solr/CommitWithinhttp://wiki.apache.org/solr/CommitWithin **wiki, and a related threadhttp://lucene.472066.**n3.nabble.com/Solr-real-time-** update-taking-time-td3472709.**htmlhttp://lucene.472066.n3.nabble.com/Solr-real-time-update-taking-time-td3472709.htmlmostly on the difficulty to do this. My issue is I tried the code snippet in the wiki: UpdateRequest req = new UpdateRequest(); req.add(mySolrInputDocument); req.setCommitWithin(1); req.process(server); But my index did not get updated, unless I call SOLR_SERVER.commit(); explicitly. The latter call will take more than 1 minute on average to return. Can I do a real time update on solr 1.4.1? Would someone help to show a workable code snippet? Spark
RE: If search matches index in the middle of filter chain, will result return?
Thanks Shawn. So to recap: - Every match must be found after entire chain, not in the middle of the chain. - Suggested: index and query chain should be the same. In my situation, if I make both of them the same, the result may be misleading because it will also match other records that have the same partial string. But your suggestion is wonderful. Thank you very much. -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: 2011年11月23日 12:04 下午 To: solr-user@lucene.apache.org Subject: Re: If search matches index in the middle of filter chain, will result return? On 11/22/2011 7:54 PM, Ellery Leung wrote: I am searching for an email called: off...@officeofficeoffice.com. If I search any text under 20 characters, result will be returned. But when I search the whole string: off...@officeofficeoffice.com, no result return. As you all see in the schema in index part, when I search the whole string, it will match the index chain before NGramFilterFactory. But after NGram, no result found. Here are my questions: - Is this behavior normal? I'm pretty sure that your query must match after the entire analyzer chain is done. I would expect that behavior to be normal. - In order to get off...@officeofficeoffice.com, does it mean that I have to make the maxGramSize larger (like 70)? If you were to increase the maxGramSize to 70, you would get a match in this case, but your index might get a lot larger, depending on what's in your source data. That's probably not the right approach, though. In general, you want to have your index and query analyzer chains exactly the same. There are some exceptions, but I don't think the NGram filter is one of them. The synonym filter and WordDelimiterFilter are examples where it is expected that your index and query analyzer chains will be different. Add the NGram and CommonGram filters to the query chain, and everything should start working. If you were to go with a single analyzer for both like the following, I think it would start working. You wouldn't even need to reindex, since you wouldn't be changing the index analyzer. fieldType name=substring_search class=solr.TextField positionIncrementGap=100 analyzer charFilter class=solr.MappingCharFilterFactory mapping=../../filters/filter-mappings.txt/ charFilter class=solr.HTMLStripCharFilterFactory / tokenizer class=solr.KeywordTokenizerFactory/ filter class=solr.ASCIIFoldingFilterFactory/ filter class=solr.TrimFilterFactory / filter class=solr.LowerCaseFilterFactory / filter class=solr.CommonGramsFilterFactory words=../../filters/stopwords.txt ignoreCase=true/ filter class=solr.NGramFilterFactory minGramSize=1 maxGramSize=20/ filter class=solr.RemoveDuplicatesTokenFilterFactory / /analyzer /fieldType Regarding your NGram filter, I would actually increase the minGramSize to at least 2 and decrease the maxGramSize to something like 10 or 15, then reindex. An additional note: CommonGrams may not be all that useful unless you are indexing large numbers of huge documents, like entire books. This particular fieldType is not suitable for full text anyway, since it uses KeywordTokenizer. Consider removing CommonGrams from this fieldType and reindexing. Unless you are dealing with large amounts of text, consider removing it from the entire schema. If you do remove it, it's usually not a good idea to replace it with a StopFilter. The index size reduction found in stopword removal is not usually worth the potential loss of recall. 
Be prepared to test all reasonable analyzer combinations, rather than taking my word for it. After reading the Hathi Trust blog, I tried CommonGrams on my own index. It actually made things slower, not faster. My typical document is only a few thousand bytes of metadata. The Hathi Trust is indexing millions of full-length books. Thanks, Shawn
Re: Integrating Surround Query Parser
How do I apply this patch https://issues.apache.org/jira/browse/SOLR-2703 to Solr 3.1 to install surround as a plugin?

On Tue, Nov 22, 2011 at 7:34 PM, Erik Hatcher erik.hatc...@gmail.com wrote:

The surround query parser is fully wired into Solr trunk/4.0, if that helps. See http://wiki.apache.org/solr/SurroundQueryParser and the JIRA issue linked there in case you want to patch it into a different version.

Erik

On Jan 21, 2011, at 02:24 , Ahson Iqbal wrote:

Hi All

I want to integrate the Surround Query Parser with Solr. To do this I downloaded a jar file from the internet, pasted that jar file into web-inf/lib, and configured the query parser in solrconfig.xml as

<queryParser name="SurroundQParser" class="org.apache.lucene.queryParser.surround.parser.QueryParser"/>

Now when I load the Solr admin page the following exception comes up:

org.apache.solr.common.SolrException: Error Instantiating QParserPlugin, org.apache.lucene.queryParser.surround.parser.QueryParser is not a org.apache.solr.search.QParserPlugin

What I think is that I didn't get the right plugin. Can anybody guide me on where to get the right plugin for the surround query parser, or how to correctly integrate this plugin with Solr?

thanx
Ahsan

--
Thanks Regards
Rahul Mehta
Re: FunctionQuery score=0
Hi Hoss, Thanks for the detailed response. My XY problem is: 1) I am trying to search for a complex query: q={!type=edismax qf=abstract^0.02 title^0.08 categorysearch^0.05 boost='eqsim(alltokens,xyz)' v='+tokens5:xyz '} Which answers my query needs. BUT, my boost function actually changes some of the results to be of score 0, which I want to be excluded from the result set. 2) This is why I used the frange query to solve the issue with the score 0: q={!frange l=0 incl=false}query({!type=edismax qf=abstract^0.02 title^0.08 categorysearch^0.05 boost='eqsim(alltokens,xyz)' v='+tokens5:xyz '}) But this time, the remaining results lost their *boosted* scores, and therefore the sort by score got all mixed up. 3) I assume I can use filter queries, but from my understanding FQs actually perform another query before the main one and these queries are expensive in time and I would like to avoid it if possible. Hope this explains a bit more. Thanks, Lev On Tue, Nov 22, 2011 at 9:15 PM, Chris Hostetter hossman_luc...@fucit.orgwrote: : q={!frange l=0 incl=false}query({!type=edismax qf=abstract^0.02 : title^0.08 categorysearch^0.05 boost='eqsim(alltokens,xyz)' : v='+tokens5:xyz '}) : : : With the above query, I am getting only the results that I want, the ones : whose score after my FucntionQuery are above 0, but the problem now is that : the final score for all results is changed to 1, which affects the sorting. : : How can I keep the original score that is calculated by the edismax query? a) Like i said. details matter. In your earlier messages you mentioned that you were wrapping a function arround a query and wanted to not have the function match anythign where the result was 0 -- the suggestions provided have done that. this is the first time you mentioned that you needed the values returned by the function as the scores of the documents (had you mentioned that you might have gotten differnet answers) b) if you look closely at the suggestion from André, you'll see that his specific suggestion will actually do what you want if you follow it -- express the query you want in the q param (so you get the scores from it) and then express an fq that refers to the q query as a variable... : q=ipodfq={!frange l=0 incl=false}query($q) c) Based on the concrete example you've given above, i'ts not clear to me that you actually need any of this -- if the above query is giving you the results you want, but you want the scores from the edismax query to be used as the final scores of the function, then there is no need to wrap the query in any sort of function at all, or exclude any 0 values this should be exactly what you want... q={!type=edismax qf=abstract^0.02 title^0.08 categorysearch^0.05 boost='eqsim(alltokens,xyz)' v='+tokens5:xyz '} ...why exactly did you think you needed to wrap that query in a function? https://people.apache.org/~hossman/#xyproblem XY Problem Your question appears to be an XY Problem ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about Y without giving more details about the X so that we can understand the full issue. Perhaps the best solution doesn't involve Y at all? See Also: http://www.perlmonks.org/index.pl?node_id=542341 -Hoss
Re: Integrating Surround Query Parser
This is what I tried:
- Went to the Solr 3.1 directory, downloaded from here: http://www.trieuvan.com/apache//lucene/solr/3.1.0/apache-solr-3.1.0.tgz
- wget https://issues.apache.org/jira/secure/attachment/12493167/SOLR-2703.patch
- Ran: patch -p0 -i SOLR-2703.patch --dry-run
- Got an error:
  - patching file core/src/test/org/apache/solr/search/TestSurroundQueryParser.java
  - patching file core/src/test-files/solr/conf/schemasurround.xml
  - patching file core/src/test-files/solr/conf/solrconfigsurround.xml
  - patching file core/src/java/org/apache/solr/search/SurroundQParserPlugin.java
  - patching file example/solr/conf/solrconfig.xml
  - Hunk #1 FAILED at 1538.
  - 1 out of 1 hunk FAILED -- saving rejects to file example/solr/conf/solrconfig.xml.rej
- Our solrconfig.xml ends at line 1508.
- Tried sudo find / -name TestSurroundQueryParser.java, but the file is not found anywhere in the directory.
- And when I run svn up, it just says Skipped '.'

Please suggest what I should do now.

On Wed, Nov 23, 2011 at 10:39 AM, Rahul Mehta rahul23134...@gmail.com wrote:

How do I apply this patch https://issues.apache.org/jira/browse/SOLR-2703 to Solr 3.1 to install surround as a plugin?

On Tue, Nov 22, 2011 at 7:34 PM, Erik Hatcher erik.hatc...@gmail.com wrote:

The surround query parser is fully wired into Solr trunk/4.0, if that helps. See http://wiki.apache.org/solr/SurroundQueryParser and the JIRA issue linked there in case you want to patch it into a different version.

Erik

On Jan 21, 2011, at 02:24 , Ahson Iqbal wrote:

Hi All

I want to integrate the Surround Query Parser with Solr. To do this I downloaded a jar file from the internet, pasted that jar file into web-inf/lib, and configured the query parser in solrconfig.xml as

<queryParser name="SurroundQParser" class="org.apache.lucene.queryParser.surround.parser.QueryParser"/>

Now when I load the Solr admin page the following exception comes up:

org.apache.solr.common.SolrException: Error Instantiating QParserPlugin, org.apache.lucene.queryParser.surround.parser.QueryParser is not a org.apache.solr.search.QParserPlugin

What I think is that I didn't get the right plugin. Can anybody guide me on where to get the right plugin for the surround query parser, or how to correctly integrate this plugin with Solr?

thanx
Ahsan

--
Thanks Regards
Rahul Mehta
Re: Can files be faceted based on their size ?
Thanks for replying. I tried using Trie types for faceting in Solr, but that did not solve the problem. If I use a Trie type (e.g. I used tlong), it shows a schema mismatch error, because in the FileListEntityProcessor API fileSize is defined as a string. That means we cannot apply facet.range on fileSize. Am I right? Thanks
--
View this message in context: http://lucene.472066.n3.nabble.com/Can-files-be-faceted-based-on-their-size-tp3518393p3529923.html
Sent from the Solr - User mailing list archive at Nabble.com.
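Just as an illustration of what the request would look like once a numeric copy of the size exists, a rough SolrJ sketch (the fileSizeNum field and the bucket sizes are pure assumptions, not part of the original setup):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FileSizeFacets {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery query = new SolrQuery("*:*");
        query.setFacet(true);
        // Assumes a numeric field, e.g. <field name="fileSizeNum" type="tlong" .../>,
        // populated from the string fileSize value during import.
        query.set("facet.range", "fileSizeNum");
        query.set("facet.range.start", "0");
        query.set("facet.range.end", String.valueOf(100L * 1024 * 1024)); // 0..100 MB
        query.set("facet.range.gap", String.valueOf(10L * 1024 * 1024));  // 10 MB buckets

        QueryResponse rsp = server.query(query);
        System.out.println(rsp);
    }
}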
RE: Solr Performance/Architecture
Hi Shawn That was so great of you to explain the architecture in such a detail. I enjoyed reading it multiple times. I have a question here: You mentioned that we can use crc32(DocumentId)% NumServers. Now actually I am using that in my data-config.xml in the sql query itself, something like: For Documents to be indexed on Server 1: select DocumentId,PNum,... from Sample where crc32(DocumentId)%2=0; For Documents to be indexed on Server 2: select DocumentId,PNum,... from Sample where crc32(DocumentId)%2=1; Will that be a right way? Will it not be a slow query? Thanks once again. -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Monday, November 21, 2011 7:47 PM To: solr-user@lucene.apache.org Subject: Re: Solr Performance/Architecture On 11/21/2011 12:41 AM, Husain, Yavar wrote: Number of rows in SQL Table (Indexed till now using Solr): 1 million Total Size of Data in the table: 4GB Total Index Size: 3.5 GB Total Number of Rows that I have to index: 20 Million (approximately 100 GB Data) and growing What is the best practices with respect to distributing the index? What I mean to say here is when should I distribute and what is the magic number that I can have for index size per instance? For 1 million itself Solr instance running on a VM is taking roughly 2.5 hrs to index for me. So for 20 million roughly it would take 60 -70 hrs. That would be too much. What would be the best distributed architecture for my case? It will be great if people may share their best practices and experience. I have a MySQL database with 66 million rows at the moment, always growing. My Solr index is split into six large shards and a small shard with the newest data. The small shard (incremental) is calculated by looking at counts of data in hourly increments between 7 and 3.5 days old, and either choosing a boundary that results in less than 500,000 documents or the 3.5 day boundary. This index is usually about 1GB in size. The rest of the documents are split between the other six shards using crc32(did) % 6. The did field is a mysql bigint autoincrement field. These large shards are very close to 11 million records and 20GB each. By indexing all six at once, I can complete a full index rebuild in about 3.5 hours. Each full index chain lives on two 64GB Dell servers with dual quad-core processors. Each server contains a Solr instance with 8GB of heap, running three large shards. One server contains the incremental index, the other server runs the load balancer. Both servers run an index-free Solr core that we call the broker. Its search handlers have the shards parameter in solrconfig.xml, pointed at the appropriate cores for that index chain. To keep index size down and search speed up, it's important that your index only contain the fields needed for two purposes: Searching (indexed fields) and displaying a results grid (stored fields). Any other information should be excluded from your schema.xml and/or DIH config. Full item details should be populated from the database or other information store (possibly a filesystem), using the unique identifier from the search results. If you are aggregating data from more than one table, see if you can have your database get the information into one SELECT statement with JOINs, rather than having more than one entity in your DIH config. Alternatively, if your secondary tables are small, try using the CachedSQLEntityProcessor on them so they are loaded entirely into RAM for the import. 
Your database software is usually much better at combining tables than Solr, so take advantage of it. If you have multivalued search fields from secondary entities in DIH, you can often get your database software to CONCAT them together into a single field, then use an appropriate tokenizer to split them into separate terms. I have one such field that is semicolon separated by a database JOIN that's specified in a view, then I use a pattern tokenizer that splits it at index time. I hope this is helpful. Thanks, Shawn ** This message may contain confidential or proprietary information intended only for the use of the addressee(s) named above or may contain information that is legally privileged. If you are not the intended addressee, or the person responsible for delivering it to the intended addressee, you are hereby notified that reading, disseminating, distributing or copying this message is strictly prohibited. If you have received this message by mistake, please immediately notify us by replying to the message and delete the original message and any copies immediately thereafter. Thank you.- ** FAFLD
Solr Search for misspelled search term
Hi all, I need to find a way for Solr to check for and return results for misspelled search terms. Does anybody have any idea? Thank you!! Meghana
--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Search-for-misspelled-search-term-tp3529961p3529961.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr highlighting isn't working!
Thank you for the reply! I made the following request and everything is OK:

https://localhost:8443/solr/alfresco/select?wt=standard&q=TYPE:%22{http://www.test.com/test/test/model/content/0.1}field%22&indent=on&hl=true&hl.fl=TYPE

But now I have another problem. If I have a field with the name @{http://www.test.com/test/eln/model/content/0.1}label.__ and the value {en}label1, and I make this request:

https://localhost:8443/solr/alfresco/select?q=@{http://www.test.com/test/test/model/content/0.1}label.__:{en}label1&wt=xml

it returns an exception:

HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot parse '@{http://www.agilent.com/openlab/eln/model/content/0.1}label.__:{en}label1': Encountered } } at line 1, column 54. Was expecting one of: TO ... RANGEEX_QUOTED ... RANGEEX_GOOP ...
--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-highlighting-isn-t-work-tp3527701p3530016.html
Sent from the Solr - User mailing list archive at Nabble.com.