Problem with pdf files indexing

2011-11-22 Thread Dali
Hi! I'm using Solr 3.3 and I have some PDF files which I want to
index. I followed the instructions from the wiki page:
http://wiki.apache.org/solr/ExtractingRequestHandler
The problem is that I can add my documents to Solr but I cannot query
them. Here is what I have:

*solrconfig.xml*:
<requestHandler name="/update/extract"
  startup="lazy"
  class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">text</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>

*schema.xml *:
<field name="title" type="string" indexed="true" stored="true"/>
<field name="author" type="string" indexed="true" stored="true"/>
<field name="text" type="text_general" indexed="true" stored="true"
  multiValued="true"/>

*data-config.xml* :
 ...
<dataSource type="BinFileDataSource" name="ds-file"/>
...
<entity processor="TikaEntityProcessor" dataSource="ds-file"
  url="../${document.filename}">
  <field column="Author" name="author" meta="true"/>
  <field column="title" name="title" meta="true"/>
  <field column="text" name="text"/>
</entity>
...

I use Solrj to add documents as follows:
SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr");
   ContentStreamUpdateRequest up = new
ContentStreamUpdateRequest("/update/extract");
   up.addFile(new File("d:\\test.pdf"));
   up.setParam("literal.id", "test");
   up.setParam("extractOnly", "true");
   server.commit();
   NamedList result = server.request(up);
   System.out.println("Result: " + result);  // can display information
about test.pdf
   QueryResponse rsp = server.query(new SolrQuery("*:*"));
   System.out.println("rsp: " + rsp); // returns nothing

Any suggestion?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-with-pdf-files-indexing-tp3527202p3527202.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: wild card search and lower-casing

2011-11-22 Thread Dmitry Kan
I guess, I have found your comment, thanks.

For our current needs I have just set:

setLowercaseExpandedTerms(true); // changed from default false

in the SolrQueryParser's constructor, and that seems to work so far.

So as not to start a separate thread on wildcards: is it the case that a
trailing wildcard needs a minimum of 2 preceding characters for a
search to happen?

Dmitry

On Mon, Nov 21, 2011 at 2:59 PM, Erick Erickson erickerick...@gmail.com wrote:

 It may be. The tricky bit is that there is a constant governing the
 behavior of
 this that restricts it to 3.6 and above. You'll have to change it after
 applying
 the patch for this to work for you. Should be trivial, I'll leave a note
 in the
 code about this, look for SOLR-2438 in the 3x code line for the place
 to change.

 On Mon, Nov 21, 2011 at 2:14 AM, Dmitry Kan dmitry@gmail.com wrote:
  Thanks Erick.
 
  Do you think the patch you are working on will be applicable as well to
 3.4?
 
  Best,
  Dmitry
 
  On Mon, Nov 21, 2011 at 5:06 AM, Erick Erickson erickerick...@gmail.com
 wrote:
 
  As it happens I'm working on SOLR-2438 which should address this. This
  patch
  will provide two things:
 
  The ability to define a new analysis chain in your schema.xml, currently
  called
  multiterm that will be applied to queries of various sorts,
  including wildcard,
  prefix, range. This will be somewhat of an expert thing to make
  yourself...
 
  In the absence of an explicit definition it'll synthesize a multiterm
  analyzer
  out of the query analyzer, taking any char filters, the
  LowerCaseFilter (if present), and the ASCIIFoldingFilter (if present)
  and putting them in the multiterm analyzer along
  with a (hardcoded) WhitespaceTokenizer.
 
  As of 3.6 and 4.0, this will be the default behavior, although you can
  explicitly
  define a field type parameter to specify the current behavior.
 
  The reason it is on 3.6 is that I want it to bake for a while before
  getting into the
  wild, so I have no intention of trying to get it into the 3.5 release.
 
  The patch is up for review now, I'd like another set of eyeballs or
  two on it before
  committing.
 
  The patch that's up there now is against trunk but I hope to have a 3x
  patch that
  I'll apply to the 3x code line after 3.5 RC1 is cut.
 
  Best
  Erick
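
For readers following along, the synthesized chain Erick describes would correspond to something like this explicit multiterm analyzer in schema.xml. This is a sketch against the SOLR-2438 design as described above; the field type name and exact factory set are assumptions, not the committed patch:

```xml
<fieldType name="text_wildcard" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
  <!-- Applied to wildcard/prefix/range terms instead of the raw query analyzer -->
  <analyzer type="multiterm">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```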
 
 
  On Fri, Nov 18, 2011 at 12:05 PM, Ahmet Arslan iori...@yahoo.com
 wrote:
  
   You're right:
  
   public SolrQueryParser(IndexSchema schema, String
   defaultField) {
   ...
   setLowercaseExpandedTerms(false);
   ...
   }
  
   Please note that lowercaseExpandedTerms uses String.toLowerCase()
  (which uses the default Locale), a Locale-sensitive operation.
  
   In Lucene, AnalyzingQueryParser exists for this purpose, but I am not
  sure if it has been ported to Solr.
  
  
 
 http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html
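
The Locale sensitivity Ahmet mentions is easy to demonstrate: with a Turkish default Locale, String.toLowerCase() maps 'I' to the dotless 'ı' (U+0131), so a lowercased wildcard term may no longer match terms indexed as plain ASCII. A self-contained sketch:

```java
import java.util.Locale;

public class LocaleLowercase {
    public static void main(String[] args) {
        // Locale-sensitive lowercasing: English behaves as expected,
        // while Turkish maps 'I' to dotless 'ı' (U+0131).
        System.out.println("TITLE".toLowerCase(Locale.ENGLISH));         // title
        System.out.println("TITLE".toLowerCase(new Locale("tr", "TR"))); // tıtle
    }
}
```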
  
 
 



date range in solr 3.1

2011-11-22 Thread do3do3
I try to use range faceting in Solr 3.1 using facet.range=date,
f.date.facet.range.gap=+1DAY, f.date.facet.range.start=NOW/DAY-5DAYS, and
f.date.facet.range.end=NOW/DAY
and I get this exception:

Exception during facet.range of date org.apache.solr.common.SolrException:
Can't add gap 1DAYS to value Sun Nov 13 00:00:00 UTC 2011 for field: date at
org.apache.solr.request.SimpleFacets$RangeEndpointCalculator.addGap(SimpleFacets.java:1093)
at
org.apache.solr.request.SimpleFacets.getFacetRangeCounts(SimpleFacets.java:873)
at
org.apache.solr.request.SimpleFacets.getFacetRangeCounts(SimpleFacets.java:839)
at
org.apache.solr.request.SimpleFacets.getFacetRangeCounts(SimpleFacets.java:778)
at
org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:178)
at
org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:72)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360) at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:240)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:164)
at
org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:462)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:563)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:399)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:317)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:204)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:182)
at
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:311)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662) Caused by:
java.text.ParseException: Unrecognized command:   at
org.apache.solr.util.DateMathParser.parseMath(DateMathParser.java:277) at
org.apache.solr.request.SimpleFacets$DateRangeEndpointCalculator.parseAndAddGap(SimpleFacets.java:1188)
at
org.apache.solr.request.SimpleFacets$DateRangeEndpointCalculator.parseAndAddGap(SimpleFacets.java:1160)
at
org.apache.solr.request.SimpleFacets$RangeEndpointCalculator.addGap(SimpleFacets.java:1091)
... 27 more
Can you help me please?
Thanks in advance :)

--
View this message in context: 
http://lucene.472066.n3.nabble.com/date-range-in-solr-3-1-tp3527498p3527498.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Integrating Surround Query Parser

2011-11-22 Thread Ahmet Arslan


--- On Tue, 11/22/11, Rahul Mehta rahul23134...@gmail.com wrote:

 From: Rahul Mehta rahul23134...@gmail.com
 Subject: Integrating Surround Query Parser
 To: solr-user@lucene.apache.org
 Date: Tuesday, November 22, 2011, 8:05 AM
 Hello,
 
 I want to run a surround query.
 
 
    1. Downloading from
    http://www.java2s.com/Code/Jar/JKL/Downloadlucenesurround241jar.htm
    2. Moved the
 lucene-surround-2.4.1.jar  to
 /apache-solr-3.1.0/example/lib
    3. Edit the solrconfig.xml with
       1. <queryParser name="SurroundQParser"
          class="org.apache.lucene.queryParser.surround.parser.QueryParser"/>
    4. Restart Solr
 
 Got this error :
 
 org.apache.solr.common.SolrException: Error Instantiating
 QParserPlugin,
 org.apache.lucene.queryParser.surround.parser.QueryParser
 is not a org.apache.solr.search.QParserPlugin
     at
 org.apache.solr.core.SolrCore.createInstance(SolrCore.java:425)
 
 
 
 -- 
 Thanks & Regards
 

Hello Rahul,

It is already integrated. Please see : 
http://wiki.apache.org/solr/SurroundQueryParser



Re: how to use term proximity queries with apache solr

2011-11-22 Thread Ahmet Arslan
 We have used proximity queries; they only work as a sloppy phrase
 query (e.g.:
 "catalyst polymer"~5) but do not allow wildcards.
 
 Want to use Proximity Queries between any terms (e.g.:
 (poly* NEAR *lyst))
 is this possible using additional query parsers like
 Surround?
 
 If yes, please suggest how to install Surround.
 
 Currently we are using Solr 3.1.

Not sure about leading wildcard, but you can use https://issues.apache.org for 
this.



How to be sure that surround

2011-11-22 Thread Rahul Mehta
I have done the following steps to install the surround plugin.

   1. Downloading from
   http://www.java2s.com/Code/Jar/JKL/Downloadlucenesurround241jar.htm
   2. Moved the lucene-surround-2.4.1.jar  to /apache-solr-3.1.0/example/lib
   3. restart solr .

But how can I be sure that the surround plugin is actually installed?
I mean, what query can I run to verify it?

-- 
Thanks & Regards

Rahul Mehta


Re: how to make effective search with fq and q params

2011-11-22 Thread pravesh
Usually,

Use the 'q' parameter to search for the free text values entered by the
users (where you might want to parse the query and/or apply
boosting/phrase-sloppy, minimum match,tie etc )

Use the 'fq' to limit the searches to certain criteria like location,
date-ranges etc.

Also, avoid using q=*:* as it implicitly translates to MatchAllDocsQuery.

Regds
Pravesh
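
Pravesh's split can be sketched as a tiny URL builder. Everything here (endpoint, field names) is hypothetical; only the q-vs-fq placement matters: the user's free text goes in q (parsed, scored, boosted), structural constraints go in fq (cached as filters, no scoring):

```java
import java.net.URLEncoder;

public class QueryVsFilter {
    // q carries the user's free text; fq carries structural constraints.
    static String buildQuery(String userText, String... filters) throws Exception {
        StringBuilder url = new StringBuilder("/select?q=")
                .append(URLEncoder.encode(userText, "UTF-8"));
        for (String fq : filters) {
            url.append("&fq=").append(URLEncoder.encode(fq, "UTF-8"));
        }
        return url.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(buildQuery("cheap hotel oslo",
                "location:Oslo", "date:[NOW/DAY-7DAYS TO NOW/DAY]"));
    }
}
```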

--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-make-effective-search-with-fq-and-q-params-tp3527217p3527535.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to use term proximity queries with apache solr

2011-11-22 Thread Ahmet Arslan
 Not sure about leading wildcard but you can use https://issues.apache.org for 
 this.

Sorry, link was : https://issues.apache.org/jira/browse/SOLR-1604 


Re: How to be sure that surround

2011-11-22 Thread Ahmet Arslan
 I have done the following steps for
 installing surround plugin.
 
    1. Downloading from
    http://www.java2s.com/Code/Jar/JKL/Downloadlucenesurround241jar.htm
    2. Moved the
 lucene-surround-2.4.1.jar  to
 /apache-solr-3.1.0/example/lib
    3. restart solr .
 
 But How to be sure that surround plugin is being installed
 .
 Means what query i can run.
 

Rahul, you need to switch to solr-trunk, it is already there
http://wiki.apache.org/solr/SurroundQueryParser


Re: How to be sure that surround

2011-11-22 Thread Rahul Mehta
I have the solr-trunk, but the queries run on both (on trunk (4.0) and
on (3.1)). So how can I be sure which queries will be handled by the surround
query parser plugin?

The query i tried :
http://localhost:8983/solr/select?q=abstracts:99n(flat,panel,display)

http://localhost:8983/solr/select?q=abstracts:(poly*%20NEAR%20*lyst)

The above queries both run on 3.1 and 4.0.

How can I be sure that these queries are handled by the Surround plugin?


On Tue, Nov 22, 2011 at 5:51 PM, Ahmet Arslan iori...@yahoo.com wrote:

  I have done the following steps for
  installing surround plugin.
 
 1. Downloading from
 http://www.java2s.com/Code/Jar/JKL/Downloadlucenesurround241jar.htm
 2. Moved the
  lucene-surround-2.4.1.jar  to
  /apache-solr-3.1.0/example/lib
 3. restart solr .
 
  But How to be sure that surround plugin is being installed
  .
  Means what query i can run.
 

 Rahul, you need to switch to solr-trunk, it is already there
 http://wiki.apache.org/solr/SurroundQueryParser




-- 
Thanks & Regards

Rahul Mehta


Re: how to use term proximity queries with apache solr

2011-11-22 Thread Rahul Mehta
Do I need to install this separately, or is it integrated in Solr 4.0?

On Tue, Nov 22, 2011 at 5:49 PM, Ahmet Arslan iori...@yahoo.com wrote:

  Not sure about leading wildcard but you can use
 https://issues.apache.org for this.

 Sorry, link was : https://issues.apache.org/jira/browse/SOLR-1604




-- 
Thanks & Regards

Rahul Mehta


Solr highlighting isn't working!

2011-11-22 Thread VladislavLysov
Hello!!!
  I have a trouble with Solr highlighting. I have any document with next
fields- TYPE, DBID and others. When i do next request - 
https://localhost:8443/solr/myCore/afts?wt=standardq=TYPE:
https://localhost:8443/solr/myCore/afts?wt=standardq=TYPE:cm:contentindent=onhl=truehl.fl=DBIDhl.usePhraseHighlighter=truefl=DBID
 
it was returned next text:
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">3</int>
  </lst>
  <result name="response" numFound="166" start="0">
    <doc>
      <arr name="DBID">
        <str>892</str>
      </arr>
    </doc>
    ...
  </result>
  <lst name="highlighting">
    <lst name="LEAF-892"/>
  </lst>
</response>
What is the problem?
Thank you!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-highlighting-isn-t-work-tp3527701p3527701.html
Sent from the Solr - User mailing list archive at Nabble.com.


Stats per group with StatsComponent?

2011-11-22 Thread Morten Lied Johansen


Hi

We need to get minimum and maximum values for a field, within a group in 
a grouped search-result. Is this possible today, perhaps by using 
StatsComponent in some way?


I'll flesh out the example a little, to make the question clearer.

We have a number of documents, indexed with a price, date and a hotel. 
For each hotel, there are a number of documents, each representing a 
price/date combination. We then group our search result on hotel.


We want to show the minimum and maximum price for each hotel.

A little googling leads us to look at StatsComponent, as what it does 
would be what we need, if it could be done for each group. There was a 
thread on this list in August, "Grouping and performing statistics per 
group", that seemed to go into this a bit, but didn't find a solution.


Is this possible in Solr 3.4, either with StatsComponent, or some other way?

--
Morten
We all live in a yellow subroutine.


Re: how to make effective search with fq and q params

2011-11-22 Thread meghana
Thanks Pravesh for your reply.
I'll definitely try this; I hope it will improve Solr response time.

pravesh wrote
 
 Usually,
 
 Use the 'q' parameter to search for the free text values entered by the
 users (where you might want to parse the query and/or apply
 boosting/phrase-sloppy, minimum match,tie etc )
 
  Use the 'fq' to limit the searches to certain criteria like location,
  date-ranges etc.
  
  Also, avoid using q=*:* as it implicitly translates to
  MatchAllDocsQuery
 
 Regds
 Pravesh
 


--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-make-effective-search-with-fq-and-q-params-tp3527217p3527654.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to be sure that surround

2011-11-22 Thread Ahmet Arslan
 I have the solr-trunk , but queries
 are running on both (on trunk (4.0) and
 on (3.1) ) . then how i can be sure that what query will
 run by surround
 query parser plugin.
 
 The query i tried :
 http://localhost:8983/solr/select?q=abstracts:99n(flat,panel,display)
 
 http://localhost:8983/solr/select?q=abstracts:(poly*%20NEAR%20*lyst)
 
 The above queries both are running on 3.1 and 4.0
 
 How i can sure that these query are running by Surround
 Plugin.
 

You can use q={!surround df=abstracts}99n(flat,panel,display)

If you append debugQuery=on, it should display some info regarding which query 
parser is used, which Query is constructed etc.
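
When sending that from code, remember the local-params syntax has to be URL-encoded before it goes on the wire. A sketch (the host, port, and field name are taken from the thread and are assumptions about this particular setup):

```java
import java.net.URLEncoder;

public class SurroundCheck {
    public static void main(String[] args) throws Exception {
        // Force the surround parser via local params, and append
        // debugQuery=on so Solr reports which parser handled the query.
        String q = "{!surround df=abstracts}99n(flat,panel,display)";
        String url = "http://localhost:8983/solr/select?q="
                + URLEncoder.encode(q, "UTF-8")
                + "&debugQuery=on";
        System.out.println(url);
    }
}
```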


Re: how to use term proximity queries with apache solr

2011-11-22 Thread Ahmet Arslan

 Do I need to install this separately,
 or is it integrated in Solr 4.0?

You need to install SOLR-1604 separately. But this is easy since it is 
implemented as a solr plugin.


Re: Integrating Surround Query Parser

2011-11-22 Thread Erik Hatcher
The surround query parser is fully wired into Solr trunk/4.0, if that helps.  
See http://wiki.apache.org/solr/SurroundQueryParser and the JIRA issue linked 
there in case you want to patch it into a different version.

Erik

On Jan 21, 2011, at 02:24 , Ahson Iqbal wrote:

 Hi All
 
 I want to integrate the Surround Query Parser with Solr. To do this I have 
 downloaded the jar file from the internet and then pasted it into 
 WEB-INF/lib 
 
 and configured the query parser in solrconfig.xml as 
 <queryParser name="SurroundQParser" 
  class="org.apache.lucene.queryParser.surround.parser.QueryParser"/>
 
 now when i load solr admin page following exception comes
 org.apache.solr.common.SolrException: Error Instantiating QParserPlugin,  
 org.apache.lucene.queryParser.surround.parser.QueryParser is not a  
 org.apache.solr.search.QParserPlugin
 
 I think I didn't get the right plugin. Can anybody guide me on where 
 to get the right plugin for the surround query parser, or how to correctly 
 integrate 
 this plugin with Solr? 
 
 
 thanx
 Ahsan
 
 
 



Re: how to make effective search with fq and q params

2011-11-22 Thread Erik Hatcher
If all you're doing is filtering (browsing by facets perhaps), it's perfectly 
fine to have q=*:*.  MatchAllDocsQuery is fast (and would be cached anyway), so 
use *:* as appropriate without worries.

Erik



On Nov 22, 2011, at 07:18 , pravesh wrote:

 Usually,
 
 Use the 'q' parameter to search for the free text values entered by the
 users (where you might want to parse the query and/or apply
 boosting/phrase-sloppy, minimum match,tie etc )
 
 Use the 'fq' to limit the searches to certain criteria like location,
 date-ranges etc.
 
 Also, avoid using q=*:* as it implicitly translates to MatchAllDocsQuery
 
 Regds
 Pravesh
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/how-to-make-effective-search-with-fq-and-q-params-tp3527217p3527535.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: [ANNOUNCEMENT] Second Edition of the First Book on Solr

2011-11-22 Thread Jan Høydahl
Congratulations!

Feel free to write a shorter version of the announcement text, suitable as a 
news teaser on the Solr site, and we'll try to update the site with new thumb 
and all.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 18. nov. 2011, at 06:17, Smiley, David W. wrote:

 Fellow Solr users,
 
 I am proud to announce that the book Apache Solr 3 Enterprise Search Server 
 is officially published!  This is the second edition of the first book on 
 Solr by me, David Smiley, and my co-author Eric Pugh.  You can find full 
 details about the book, download a free chapter, and purchase it here:
   http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
 It is also available through other channels like Amazon.  You can feel good 
 about the purchase knowing that 5% of each sale goes to support the Apache 
 Software Foundation.  If you buy directly from the publisher, then the basis 
 of the percentage that goes to the ASF (and to me) is higher than if you buy 
 it through other channels.  
 
 This book naturally covers the latest features in Solr as of version 3.4 like 
 Result Grouping and Geospatial, but this is not a small update to the first 
 book.  We have more experience with Solr and we've listened to reader 
 feedback from the first edition.  No chapter was untouched: Faceting gets its 
 own chapter, all search relevancy matters are discussed in one chapter, 
 auto-complete approaches are all discussed together, much of the chapter on 
 integration was rewritten to discuss newer technologies, and the first 
 chapter was greatly streamlined.  Furthermore, each chapter has a tip in the 
 introduction that advises readers in a hurry on what parts should be read now 
 or later.  Finally, we developed a 2-page parameter quick-reference appendix 
 that you will surely find useful printed on your desk.  In summary, we 
 improved the existing content, and added about 25% more by page count.
 
 Software, errata, and other information about this book and the previous 
 edition is on our website:
   http://www.solrenterprisesearchserver.com/
 We've been working hard on this book for the last 10 months and we hope it 
 really helps save you time and improves your search project!
 
   Apache Solr 3 Enterprise Search Server In Detail:
 
 If you are a developer building an app today then you know how important a 
 good search experience is.  Apache Solr, built on Apache Lucene, is a wildly 
 popular open source enterprise search server that easily delivers powerful 
 search and faceted navigation features that are elusive with databases.  Solr 
 supports complex search criteria, faceting, result highlighting, 
 query-completion, query spell-check, relevancy tuning, and more.
 
 Apache Solr 3 Enterprise Search Server is a comprehensive reference guide for 
 every feature Solr has to offer.  It serves the reader right from initiation 
 to development to deployment.  It also comes with complete running examples 
 to demonstrate its use and show how to integrate Solr with other languages 
 and frameworks.
 
 Through using a large set of metadata about artists, releases, and tracks 
 courtesy of the MusicBrainz.org project, you will have a testing ground for 
 Solr, and will learn how to import this data in various ways.  You will then 
 learn how to search this data in different ways, including Solr's rich query 
 syntax and boosting match scores based on record data.  Finally, we'll 
 cover various deployment considerations to include indexing strategies and 
 performance-oriented configuration that will enable you to scale Solr to meet 
 the needs of a high-volume site.
 
 Sincerely,
 
   David Smiley (primary author)   david.w.smi...@gmail.com
   Eric Pugh (co-author)   ep...@opensourceconnections.com
 



Re: date range in solr 3.1

2011-11-22 Thread Jan Høydahl
Hi,

Long shot: Try f.date.facet.range.gap=%2B1DAY instead, in case your + was 
interpreted as space by your browser...
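
The mechanics behind Jan's long shot: a raw '+' in a URL query string is form-decoded as a space, so Solr receives " 1DAY" and DateMathParser fails with "Unrecognized command" on the leading space. A self-contained sketch of both directions:

```java
import java.net.URLDecoder;
import java.net.URLEncoder;

public class PlusInUrls {
    public static void main(String[] args) throws Exception {
        // A raw '+' in application/x-www-form-urlencoded data means "space",
        // which matches the "Unrecognized command:  " (note the space) in the trace.
        System.out.println(URLDecoder.decode("+1DAY", "UTF-8"));  // " 1DAY"
        // Percent-encode the '+' so Solr receives the literal gap:
        System.out.println(URLEncoder.encode("+1DAY", "UTF-8"));  // "%2B1DAY"
    }
}
```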

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 22. nov. 2011, at 12:57, do3do3 wrote:

 i try to use range faceting in solr 3.1 using facet.range=date,
 f.date.facet.range.gap=+1DAY, f.date.facet.range.start=NOW/DAY-5DAYS, and
 f.date.facet.range.end=NOW/DAY
 and i get this exception 
 
 Exception during facet.range of date org.apache.solr.common.SolrException:
 Can't add gap 1DAYS to value Sun Nov 13 00:00:00 UTC 2011 for field: date at
 org.apache.solr.request.SimpleFacets$RangeEndpointCalculator.addGap(SimpleFacets.java:1093)
 at
 org.apache.solr.request.SimpleFacets.getFacetRangeCounts(SimpleFacets.java:873)
 at
 org.apache.solr.request.SimpleFacets.getFacetRangeCounts(SimpleFacets.java:839)
 at
 org.apache.solr.request.SimpleFacets.getFacetRangeCounts(SimpleFacets.java:778)
 at
 org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:178)
 at
 org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:72)
 at
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
 at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360) at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
 at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
 at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
 at
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:240)
 at
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:164)
 at
 org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:462)
 at
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164)
 at
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
 at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:563)
 at
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
 at
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:399)
 at
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:317)
 at
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:204)
 at
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:182)
 at
 org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:311)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662) Caused by:
 java.text.ParseException: Unrecognized command:   at
 org.apache.solr.util.DateMathParser.parseMath(DateMathParser.java:277) at
 org.apache.solr.request.SimpleFacets$DateRangeEndpointCalculator.parseAndAddGap(SimpleFacets.java:1188)
 at
 org.apache.solr.request.SimpleFacets$DateRangeEndpointCalculator.parseAndAddGap(SimpleFacets.java:1160)
 at
 org.apache.solr.request.SimpleFacets$RangeEndpointCalculator.addGap(SimpleFacets.java:1091)
 ... 27 more
 can you help me plz
 thanks in advance :)
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/date-range-in-solr-3-1-tp3527498p3527498.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Matching + and &

2011-11-22 Thread Jan Høydahl
Why do you need spaces in the replacement?

Try pattern="\+" replacement="plus" - it will cause the transformed char stream 
to contain as many tokens as the original and avoid the highlighting crash.
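
The difference in token counts is easy to see with plain regex replacement (a sketch that mirrors what the char filter does to the character stream; it is not Solr itself):

```java
public class PlusReplacement {
    public static void main(String[] args) {
        // Replacement with surrounding spaces splits one source token into
        // several, so highlight offsets drift from the original text:
        System.out.println("c++".replaceAll("\\+", " plus "));  // "c plus  plus "
        // Replacement without spaces keeps a one-to-one token mapping:
        System.out.println("c++".replaceAll("\\+", "plus"));    // "cplusplus"
    }
}
```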

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 22. nov. 2011, at 05:40, Tomasz Wegrzanowski wrote:

 Hi,
 
 I've been trying to match some phrases with + and & (like "c++",
 "google+", "r&d" etc.),
 but the tokenizer gets rid of them before I can do anything with synonym filters.
 
 So I tried using CharFilters like this:
 
 <fieldType name="text" class="solr.TextField"
     positionIncrementGap="100" autoGeneratePhraseQueries="true">
   <analyzer type="index">
     <charFilter class="solr.PatternReplaceCharFilterFactory"
         pattern="\+" replacement=" plus "/>
     <charFilter class="solr.PatternReplaceCharFilterFactory"
         pattern="&amp;" replacement=" and "/>
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory"
         synonyms="synonyms_case_sensitive.txt" ignoreCase="false"
         expand="true"/>
     <filter class="solr.SynonymFilterFactory"
         synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
         words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.WordDelimiterFilterFactory"
         generateWordParts="1" generateNumberParts="1" catenateWords="1"
         catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.KeywordMarkerFilterFactory"
         protected="protwords.txt"/>
     <filter class="solr.SnowballPorterFilterFactory" language="English"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory"
         synonyms="query_synonyms.txt" ignoreCase="true" expand="false"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
         words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.WordDelimiterFilterFactory"
         generateWordParts="1" generateNumberParts="1" catenateWords="0"
         catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.KeywordMarkerFilterFactory"
         protected="protwords.txt"/>
     <filter class="solr.SnowballPorterFilterFactory" language="English"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
 </fieldType>
 
 This mostly works, but for a very small number of documents, mostly
 those with large number of pluses in them,
 highlighter just crashes (and it's the highlighter, since turning it off
 and reissuing the query works just fine; if I replace
 pluses with spaces and reindex, the same query reruns just fine) with
 exception like this:
 
 Nov 21, 2011 11:35:11 PM org.apache.solr.common.SolrException log
 SEVERE: java.lang.StringIndexOutOfBoundsException: String index out of range: 
 -1
   at java.lang.String.substring(String.java:1938)
   at 
 org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:237)
   at 
 org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:462)
   at 
 org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:378)
   at 
 org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:116)
   at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
   at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
   at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:343)
   at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:244)
   at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
   at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
   at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
   at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
   at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
   at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
   at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
   at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
   at 
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
   at 
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
   at 
 org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
   at 

Unexpected CPU load and increased Solr response time

2011-11-22 Thread mikopacz
Hi, we currently have 2 servers running in a JBoss container (master and slave)
with 20 million documents and about 3GB index size.
Java was executed with options:
*-Xms12G -Xmx12G -XX:NewSize=4G -XX:MaxNewSize=4G -XX:MaxPermSize=256m
-Dorg.jboss.resolver.warning=true -Dsun.rmi.dgc.client.gcInterval=360
-Dsun.rmi.dgc.server.gcInterval=360 -XX:+UseCompressedOops -XX:+UseTLAB
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled*

Commit duration on the master is 5 minutes and we use Solr 3.3 (because in 3.4
we have a problem with dataimport:
https://issues.apache.org/jira/browse/SOLR-2804). 

We have a problem that occurs when the server gets about 34 qps. Do you have
any advice on how to fix this problem? I have attached the charts below. The
load and thread count increase between 19:00 and 20:00. At 20:10 we
reduced the number of queries by half.

http://lucene.472066.n3.nabble.com/file/n3527914/solr_users_reqs-day.png 
http://lucene.472066.n3.nabble.com/file/n3527914/load-day.png 
http://lucene.472066.n3.nabble.com/file/n3527914/threads-day.png 
http://lucene.472066.n3.nabble.com/file/n3527914/jboss_threads-day.png 
http://lucene.472066.n3.nabble.com/file/n3527914/cpu-day.png 
http://lucene.472066.n3.nabble.com/file/n3527914/_avg_response_query_time-22-11-2011_15_50_17.png
 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Unexpected-cpu-load-and-Solr-incrase-response-time-tp3527914p3527914.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to make effective search with fq and q params

2011-11-22 Thread Jeff Schmidt
Hi Erik:

When using [e]dismax, does configuring q.alt=*:* and not specifying q affect 
the performance/caching in any way?

As a side note, a while back I configured q.alt=*:*, and the application (via 
SolrJ) still set q=*:* if no user input was provided (faceting). With both of 
them set that way, I got zero results. (Solr 3.4.0)  Interesting.

Thanks,

Jeff

On Nov 22, 2011, at 7:06 AM, Erik Hatcher wrote:

 If all you're doing is filtering (browsing by facets perhaps), it's perfectly 
 fine to have q=*:*.  MatchAllDocsQuery is fast (and would be cached anyway), 
 so use *:* as appropriate without worries.
 
   Erik
 
 
 
 On Nov 22, 2011, at 07:18 , pravesh wrote:
 
 Usually,
 
 Use the 'q' parameter to search for the free text values entered by the
 users (where you might want to parse the query and/or apply
 boosting/phrase-sloppy, minimum match,tie etc )
 
 Use the 'fq' to limit the searches to certain criteria like location,
 date-ranges, etc.
 
 Also, avoid using q=*:* as it implicitly translates to MatchAllDocsQuery
 
 Regds
 Pravesh
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/how-to-make-effective-search-with-fq-and-q-params-tp3527217p3527535.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 



--
Jeff Schmidt
535 Consulting
j...@535consulting.com
http://www.535consulting.com
(650) 423-1068











Re: how to make effective search with fq and q params

2011-11-22 Thread Erik Hatcher

On Nov 22, 2011, at 09:55 , Jeff Schmidt wrote:
 When using [e]dismax, does configuring q.alt=*:* and not specifying q affect 
 the performance/caching in any way?

No different than using q=*:* with the lucene query parser.  
MatchAllDocsQuery is possibly the fastest query out there!  (it simply matches 
documents in index order, all scores are 1.0)

 As a side note, a while back I configured q.alt=*:*, and the application (via 
 SolrJ) still set q=*:* if no user input was provided (faceting). With both of 
 them set that way, I got zero results. (Solr 3.4.0)  Interesting.

Ouch.  Really?  I don't see in the code (looking at my trunk checkout) where 
there's any *:* used in the SolrJ library.  Can you provide some details on how 
you used SolrJ?  It'd be good to track this down as that seems like a bug to me.

Erik


 
 Thanks,
 
 Jeff
 
 On Nov 22, 2011, at 7:06 AM, Erik Hatcher wrote:
 
 If all you're doing is filtering (browsing by facets perhaps), it's 
 perfectly fine to have q=*:*.  MatchAllDocsQuery is fast (and would be 
 cached anyway), so use *:* as appropriate without worries.
 
  Erik
 
 
 
 On Nov 22, 2011, at 07:18 , pravesh wrote:
 
 Usually,
 
 Use the 'q' parameter to search for the free text values entered by the
 users (where you might want to parse the query and/or apply
 boosting/phrase-sloppy, minimum match,tie etc )
 
 Use the 'fq' to limit the searches to certain criteria like location,
 date-ranges, etc.
 
 Also, avoid using q=*:* as it implicitly translates to MatchAllDocsQuery
 
 Regds
 Pravesh
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/how-to-make-effective-search-with-fq-and-q-params-tp3527217p3527535.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 
 --
 Jeff Schmidt
 535 Consulting
 j...@535consulting.com
 http://www.535consulting.com
 (650) 423-1068
 
 
 
 
 
 
 
 
 



How to select all docs of 'today' ?

2011-11-22 Thread Danicela nutch
Hi,

 I have a fetch-time (date) field to know when the documents were fetched.

 I want to make a query to get all documents fetched today.

 I tried : 

 fetch-time:NOW/DAY
but it returns always 0.

 fetch-time:[NOW/DAY TO NOW/DAY]
 (it returns 0)

 fetch-time:[NOW/DAY-1DAY TO NOW/DAY]
but it returns documents fetched yesterday.

 fetch-time:[NOW/DAY-1HOUR TO NOW/DAY]
but it's incorrect too.

 Do you have any idea ?

 Thanks in advance.


Re: Solr real time update

2011-11-22 Thread Nagendra Nagarajayya

Yu:

To get Near Real Time update in Solr 1.4.1 you will need to use Solr 
1.4.1 with RankingAlgorithm. This allows you to update documents in near 
real time. You can download and give this a try from here:


http://solr-ra.tgels.org/

Regards,

- Nagendra Nagarajayya
http://solr-ra.tgels.org/
http://rankingalgorithm.tgels.org/

On 11/21/2011 9:47 PM, yu shen wrote:

Hi All,

After some study, I used the snippet below. It seems the documents are updated,
but it still takes a long time. It feels like the parameter does not take
effect. Any comments?
UpdateRequest req = new UpdateRequest();
req.add(solrDocs);
req.setCommitWithin(5000);
req.setParam("commitWithin", "5000");
req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
req.process(SOLR_SERVER);

2011/11/22 yu shen shenyu...@gmail.com


Hi All,

I am trying to do a 'nearly real time update' to Solr. My Solr version is
1.4.1. I read the Solr CommitWithin wiki
(http://wiki.apache.org/solr/CommitWithin), and a related thread
(http://lucene.472066.n3.nabble.com/Solr-real-time-update-taking-time-td3472709.html),
mostly on the difficulty of doing this.
My issue is I tried the code snippet in the wiki:

UpdateRequest req = new UpdateRequest();
req.add(mySolrInputDocument);
req.setCommitWithin(1);
req.process(server);

But my index did not get updated, unless I call SOLR_SERVER.commit();
explicitly. The latter call will take more than 1 minute on average to
return.

Can I do a real time update on solr 1.4.1? Would someone help to show a
workable code snippet?

Spark





Re: wild card search and lower-casing

2011-11-22 Thread Erick Erickson
No, no, no That's something buried in Lucene, it has nothing to
do with the patch! The patch has NOT yet been applied to any
released code.

You could pull the patch from the JIRA and apply it to trunk locally if
you wanted. But there's no patch for 3.x, I'll probably put that up
over the holiday.

But things have changed a bit (one of the things I'll have to do is
create some documentation). You *should* be able to specify
just legacyMultiTerm=true in your fieldType if you want to
apply the 3.x patch to pre 3.6 code. It would be a good field test
if that worked for you.

But you can't do any of this until the JIRA (SOLR-2438) is
marked Resolution: Fixed.

Don't be fooled by Fix Version. Fix Version simply says
that those are the earliest versions it *could* go in.

Best
Erick


On Tue, Nov 22, 2011 at 6:32 AM, Dmitry Kan dmitry@gmail.com wrote:
 I guess, I have found your comment, thanks.

 For our current needs I have just set:

 setLowercaseExpandedTerms(true); // changed from default false

 in the SolrQueryParser's constructor, and that seems to work so far.

 So as not to start a separate thread on wildcards: is it the case that a
 trailing wildcard requires a minimum of 2 preceding characters for a
 search to happen?

 Dmitry

 On Mon, Nov 21, 2011 at 2:59 PM, Erick Erickson 
 erickerick...@gmail.comwrote:

 It may be. The tricky bit is that there is a constant governing the
 behavior of
 this that restricts it to 3.6 and above. You'll have to change it after
 applying
 the patch for this to work for you. Should be trivial, I'll leave a note
 in the
 code about this, look for SOLR-2438 in the 3x code line for the place
 to change.

 On Mon, Nov 21, 2011 at 2:14 AM, Dmitry Kan dmitry@gmail.com wrote:
  Thanks Erick.
 
  Do you think the patch you are working on will be applicable as well to
 3.4?
 
  Best,
  Dmitry
 
  On Mon, Nov 21, 2011 at 5:06 AM, Erick Erickson erickerick...@gmail.com
 wrote:
 
  As it happens I'm working on SOLR-2438 which should address this. This
  patch
  will provide two things:
 
  The ability to define a new analysis chain in your schema.xml, currently
  called
  multiterm that will be applied to queries of various sorts,
  including wildcard,
  prefix, range. This will be somewhat of an expert thing to make
  yourself...
 
  In the absence of an explicit definition it'll synthesize a multiterm
  analyzer
  out of the query analyzer, taking any char filters, and
  lowercaseFilter (if present),
  and ASCIIFoldingfilter (if present) and putting them in the multiterm
  analyzer along
  with a (hardcoded) WhitespaceTokenizer.
 
  As of 3.6 and 4.0, this will be the default behavior, although you can
  explicitly
  define a field type parameter to specify the current behavior.
 
  The reason it is on 3.6 is that I want it to bake for a while before
  getting into the
  wild, so I have no intention of trying to get it into the 3.5 release.
 
  The patch is up for review now, I'd like another set of eyeballs or
  two on it before
  committing.
 
  The patch that's up there now is against trunk but I hope to have a 3x
  patch that
  I'll apply to the 3x code line after 3.5 RC1 is cut.
 
  Best
  Erick
 
 
  On Fri, Nov 18, 2011 at 12:05 PM, Ahmet Arslan iori...@yahoo.com
 wrote:
  
   You're right:
  
    public SolrQueryParser(IndexSchema schema, String defaultField) {
      ...
      setLowercaseExpandedTerms(false);
      ...
    }
  
    Please note that lowercaseExpandedTerms uses String.toLowerCase() (uses
 (uses
   default Locale) which is a Locale sensitive operation.
  
   In Lucene AnalyzingQueryParser exists for this purposes, but I am not
  sure if it is ported to solr.
  
  
 
 http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html
  
 
 




Re: wild card search and lower-casing

2011-11-22 Thread Dmitry Kan
Thanks, Erick. I was in fact reading the patch (the one attached as a
file to the aforementioned jira) you updated sometime yesterday. I'll
watch the issue, but as said the change of a hard-coded boolean to its
opposite worked just fine for me.

Best,
Dmitry


On 11/22/11, Erick Erickson erickerick...@gmail.com wrote:
 No, no, no That's something buried in Lucene, it has nothing to
 do with the patch! The patch has NOT yet been applied to any
 released code.

 You could pull the patch from the JIRA and apply it to trunk locally if
 you wanted. But there's no patch for 3.x, I'll probably put that up
 over the holiday.

 But things have changed a bit (one of the things I'll have to do is
 create some documentation). You *should* be able to specify
 just legacyMultiTerm=true in your fieldType if you want to
 apply the 3.x patch to pre 3.6 code. It would be a good field test
 if that worked for you.

 But you can't do any of this until the JIRA (SOLR-2438) is
 marked Resolution: Fixed.

 Don't be fooled by Fix Version. Fix Version simply says
 that those are the earliest versions it *could* go in.

 Best
 Erick


 On Tue, Nov 22, 2011 at 6:32 AM, Dmitry Kan dmitry@gmail.com wrote:
 I guess, I have found your comment, thanks.

 For our current needs I have just set:

 setLowercaseExpandedTerms(true); // changed from default false

 in the SolrQueryParser's constructor and that seem to work so far.

 In order not to start a separate thread on wildcards. Is it so, that for
 the trailing wildcard there is a minimum of 2 preceding characters for a
 search to happen?

 Dmitry

 On Mon, Nov 21, 2011 at 2:59 PM, Erick Erickson
 erickerick...@gmail.comwrote:

 It may be. The tricky bit is that there is a constant governing the
 behavior of
 this that restricts it to 3.6 and above. You'll have to change it after
 applying
 the patch for this to work for you. Should be trivial, I'll leave a note
 in the
 code about this, look for SOLR-2438 in the 3x code line for the place
 to change.

 On Mon, Nov 21, 2011 at 2:14 AM, Dmitry Kan dmitry@gmail.com wrote:
  Thanks Erick.
 
  Do you think the patch you are working on will be applicable as well to
 3.4?
 
  Best,
  Dmitry
 
  On Mon, Nov 21, 2011 at 5:06 AM, Erick Erickson
  erickerick...@gmail.com
 wrote:
 
  As it happens I'm working on SOLR-2438 which should address this. This
  patch
  will provide two things:
 
  The ability to define a new analysis chain in your schema.xml,
  currently
  called
  multiterm that will be applied to queries of various sorts,
  including wildcard,
  prefix, range. This will be somewhat of an expert thing to make
  yourself...
 
  In the absence of an explicit definition it'll synthesize a multiterm
  analyzer
   lowercaseFilter (if present),
  lowercaseFilter (if present),
  and ASCIIFoldingfilter (if present) and putting them in the multiterm
  analyzer along
  with a (hardcoded) WhitespaceTokenizer.
 
  As of 3.6 and 4.0, this will be the default behavior, although you can
  explicitly
  define a field type parameter to specify the current behavior.
 
  The reason it is on 3.6 is that I want it to bake for a while before
  getting into the
  wild, so I have no intention of trying to get it into the 3.5 release.
 
  The patch is up for review now, I'd like another set of eyeballs or
  two on it before
  committing.
 
  The patch that's up there now is against trunk but I hope to have a 3x
  patch that
  I'll apply to the 3x code line after 3.5 RC1 is cut.
 
  Best
  Erick
 
 
  On Fri, Nov 18, 2011 at 12:05 PM, Ahmet Arslan iori...@yahoo.com
 wrote:
  
   You're right:
  
    public SolrQueryParser(IndexSchema schema, String defaultField) {
      ...
      setLowercaseExpandedTerms(false);
      ...
    }
  
   Please note that lowercaseExpandedTerms uses String.toLowercase()
 (uses
   default Locale) which is a Locale sensitive operation.
  
   In Lucene AnalyzingQueryParser exists for this purposes, but I am
   not
  sure if it is ported to solr.
  
  
 
 http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html
  
 
 





-- 
Regards,

Dmitry Kan
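As an aside for readers of this thread: the Locale sensitivity of String.toLowerCase() that Ahmet's quoted note mentions is easy to reproduce with plain JDK code. The term below is an arbitrary example, not from the thread; under a Turkish default locale, a capital I lowercases to the dotless ı, which silently changes a wildcard term:

```java
import java.util.Locale;

public class LocaleLowercaseDemo {
    public static void main(String[] args) {
        String term = "TITLE*"; // an arbitrary wildcard term

        // English rules: I -> i
        System.out.println(term.toLowerCase(Locale.ENGLISH));         // title*

        // Turkish rules: I -> dotless i, so the lowercased term no longer
        // matches tokens that were lowercased with English/ASCII rules.
        System.out.println(term.toLowerCase(new Locale("tr", "TR"))); // tıtle*
    }
}
```

This is why lowercasing expanded terms with the JVM's default locale can break wildcard matching on machines configured with certain locales.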


Re: How to select all docs of 'today' ?

2011-11-22 Thread Peters, Sebastian
Hi,

fetch-time:[NOW/DAY TO NOW] should do it.

Best
Sebastian


-----Original Message-----
From: Danicela nutch [mailto:danicela-nu...@mail.com] 
Sent: Tuesday, 22 November 2011 16:08
To: solr-user@lucene.apache.org
Subject: How to select all docs of 'today' ?

Hi,

 I have a fetch-time (date) field to know when the documents were fetched.

 I want to make a query to get all documents fetched today.

 I tried : 

 fetch-time:NOW/DAY
but it returns always 0.

 fetch-time:[NOW/DAY TO NOW/DAY]
 (it returns 0)

 fetch-time:[NOW/DAY-1DAY TO NOW/DAY]
but it returns documents fetched yesterday.

 fetch-time:[NOW/DAY-1HOUR TO NOW/DAY]
but it's incorrect too.

 Do you have any idea ?

 Thanks in advance.


Re: Autocomplete(terms) performance problem

2011-11-22 Thread solr-ra
You should try out the autocomplete component using Solr with
RankingAlgorithm. The performance is less than 3 ms for a 1 million
Wikipedia titles index with very low deviation. You can get more information
about the performance with different indexes of size 3k, 390k, 1m, 10m docs
from here:

http://solr-ra.tgels.org/solr-ra-autocomplete.jsp

- Nagendra Nagarajayya

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Autocomplete-terms-performance-problem-tp3351352p3528112.html
Sent from the Solr - User mailing list archive at Nabble.com.


weird issue with solr and CentOS 5.7

2011-11-22 Thread Boris Quiroz
Hi all,

I'm facing a really weird issue here with Solr (Lucene 3.3) and CentOS
5.7. I have two servers, one running CentOS 5.5 and the other running
CentOS 5.7. Both servers have the same Solr, Java and Tomcat versions;
the only difference between them is the OS version.
I added a custom field to schema.xml: <field name="stream_isPrivate"
type="boolean" indexed="true" stored="true" required="false"/>. When
that type is boolean, indexing Chinese characters works OK on CentOS
5.5, but on CentOS 5.7 I got this exception:

Nov 22, 2011 11:27:11 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select/
params={indent=on&start=0&q=我们从右上角讲起&rows=10&version=2.2} hits=1
status=0 QTime=8
Nov 22, 2011 11:27:11 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.StringIndexOutOfBoundsException: String index out of range: 0
at java.lang.String.charAt(String.java:694)
at org.apache.solr.schema.BoolField.write(BoolField.java:129)
at org.apache.solr.schema.SchemaField.write(SchemaField.java:124)
at org.apache.solr.response.XMLWriter.writeDoc(XMLWriter.java:369)
at org.apache.solr.response.XMLWriter$3.writeDocs(XMLWriter.java:545)
at org.apache.solr.response.XMLWriter.writeDocuments(XMLWriter.java:482)
at org.apache.solr.response.XMLWriter.writeDocList(XMLWriter.java:519)
at org.apache.solr.response.XMLWriter.writeVal(XMLWriter.java:582)
at org.apache.solr.response.XMLWriter.writeResponse(XMLWriter.java:131)
at org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:35)
at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:343)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:210)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875)
at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:685)
at java.lang.Thread.run(Thread.java:636)

That only happens on CentOS 5.7. I also tested on Ubuntu Server, and it
works OK there as well.

solrconfig.xml and everything else is the same on both servers. Any
idea what could be happening? Could it be a CentOS bug?

Regards.
-- 
Boris Quiroz
boris.qui...@menco.it


NullPointerException with distributed facets

2011-11-22 Thread Phil Hoy
Hi,

When doing a distributed query in solr 4.0 (4.0.0.2011.06.25.15.36.22) with 
facet.missing=true and facet.limit=20 I get a NullPointerException. By 
increasing the facet limit to 200 or setting facet.missing to false it seems to 
be fixed. The shards both contain the field, but one shard always has a value and 
one never has a value. Single shard queries work fine on each shard. Does 
anyone know the cause or a fix?

java.lang.NullPointerException
at org.apache.solr.handler.component.FacetComponent.refineFacets(FacetComponent.java:489)
at org.apache.solr.handler.component.FacetComponent.handleResponses(FacetComponent.java:278)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:292)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1452)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:926)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

Phil


Re: how to make effective search with fq and q params

2011-11-22 Thread Jeff Schmidt
Hi Erik:

It's not in the SolrJ library, but rather my use of it:

In my application code:

protected static final String SOLR_ALL_DOCS_QUERY = "*:*";

/*
  * If no search terms provided, then return all neighbors.
  * Results are to be returned in neighbor symbol alphabetical order.
*/

if (searchTerms == null) {
    searchTerms = SOLR_ALL_DOCS_QUERY;
    nodeQuery.addSortField("n_name", SolrQuery.ORDER.asc);
}

So, if no user search terms are provided, I search all documents (there are 
other fqs in effect) and return them in name order.

That worked just fine.  Then I read more about [e]dismax, and went and 
configured:

<str name="q.alt">*:*</str>

Then I would get zero results.  It's not a SolrJ issue though, as this request 
in my browser also resulted in zero results:

http://localhost:8091/solr/ing-content/select/?qt=partner-tmo&fq=type%3Anode&fq=n_neighborof_id%3AING\:afa&q=*:*&rows=5&facet=true&facet.mincount=1&facet.field=n_neighborof_processExact&facet.field=n_neighborof_edge_type

That was due to the q=*:*.  Once I set, say, q=cancer, I got results.  So I 
guess this is a [e]dismax thing?  (partner-tmo is the name of my request 
handler).

I solved my problem by not setting *:* in my application, and left q.alt=*:* in 
place.

Hope this helps.  Again, this is stock Solr 3.4.0, running the Apache war under 
Tomcat 6.

Jeff

On Nov 22, 2011, at 8:05 AM, Erik Hatcher wrote:

 
 On Nov 22, 2011, at 09:55 , Jeff Schmidt wrote:
 When using [e]dismax, does configuring q.alt=*:* and not specifying q affect 
 the performance/caching in any way?
 
 No different than using q=*:* with the lucene query parser.  
 MatchAllDocsQuery is possibly the fastest query out there!  (it simply 
 matches documents in index order, all scores are 1.0)
 
 As a side note, a while back I configured q.alt=*:*, and the application 
 (via SolrJ) still set q=*:* if no user input was provided (faceting). With 
 both of them set that way, I got zero results. (Solr 3.4.0)  Interesting.
 
 Ouch.  Really?  I don't see in the code (looking at my trunk checkout) where 
 there's any *:* used in the SolrJ library.  Can you provide some details on 
 how you used SolrJ?  It'd be good to track this down as that seems like a bug 
 to me.
 
   Erik
 
 
 
 Thanks,
 
 Jeff
 
 On Nov 22, 2011, at 7:06 AM, Erik Hatcher wrote:
 
 If all you're doing is filtering (browsing by facets perhaps), it's 
 perfectly fine to have q=*:*.  MatchAllDocsQuery is fast (and would be 
 cached anyway), so use *:* as appropriate without worries.
 
 Erik
 
 
 
 On Nov 22, 2011, at 07:18 , pravesh wrote:
 
 Usually,
 
 Use the 'q' parameter to search for the free text values entered by the
 users (where you might want to parse the query and/or apply
 boosting/phrase-sloppy, minimum match,tie etc )
 
 Use the 'fq' to limit the searches to certain criteria like location,
 date-ranges, etc.
 
 Also, avoid using q=*:* as it implicitly translates to 
 MatchAllDocsQuery
 
 Regds
 Pravesh
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/how-to-make-effective-search-with-fq-and-q-params-tp3527217p3527535.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 
 --
 Jeff Schmidt
 535 Consulting
 j...@535consulting.com
 http://www.535consulting.com
 (650) 423-1068
 
 
 
 
 
 
 
 
 
 



--
Jeff Schmidt
535 Consulting
j...@535consulting.com
http://www.535consulting.com
(650) 423-1068











Re: How to select all docs of 'today' ?

2011-11-22 Thread Danicela nutch
Thanks it works.

 All this is based on the fact that NOW/DAY means the beginning of the day.

----- Original Message -----
From: sebastian.pet...@tib.uni-hannover.de
Sent: 22.11.11 16:46
To: solr-user@lucene.apache.org
Subject: Re: How to select all docs of 'today' ?

 Hi,

 fetch-time:[NOW/DAY TO NOW] should do it.

 Best
 Sebastian

 -----Original Message-----
 From: Danicela nutch [mailto:danicela-nu...@mail.com]
 Sent: Tuesday, 22 November 2011 16:08
 To: solr-user@lucene.apache.org
 Subject: How to select all docs of 'today' ?

 Hi, I have a fetch-time (date) field to know when the documents were fetched. I
 want to make a query to get all documents fetched today. I tried:
 fetch-time:NOW/DAY but it always returns 0. fetch-time:[NOW/DAY TO NOW/DAY]
 (it returns 0). fetch-time:[NOW/DAY-1DAY TO NOW/DAY] but it returns documents
 fetched yesterday. fetch-time:[NOW/DAY-1HOUR TO NOW/DAY] but it's incorrect
 too. Do you have any idea? Thanks in advance.
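To make the accepted answer concrete: a sketch of building such a query URL with plain JDK code (the fetch-time field name comes from the thread; the host, port, and core path are assumptions, and the Charset overload of URLEncoder.encode requires Java 10+):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class TodayQuery {
    public static void main(String[] args) {
        // NOW/DAY rounds down to midnight UTC and NOW is the current instant,
        // so this range matches every document fetched so far today.
        String fq = "fetch-time:[NOW/DAY TO NOW]";

        // URL-encode the filter query before putting it on the request.
        String url = "http://localhost:8983/solr/select?q=*:*&fq="
                + URLEncoder.encode(fq, StandardCharsets.UTF_8);
        System.out.println(url);
        // prints http://localhost:8983/solr/select?q=*:*&fq=fetch-time%3A%5BNOW%2FDAY+TO+NOW%5D
    }
}
```

The same fq string can of course be set via SolrQuery.addFilterQuery() in SolrJ, which does the encoding for you.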


Re: FunctionQuery score=0

2011-11-22 Thread John
Can this be fixed somehow? I also need the real score.

On Sun, Nov 20, 2011 at 10:44 AM, John fatmanc...@gmail.com wrote:

 After playing some more with this I managed to get what I want, almost.

 My query now looks like:

 q={!frange l=0 incl=false}query({!type=edismax qf='abstract^0.02
 title^0.08 categorysearch^0.05' boost='eqsim(alltokens,xyz)'
 v='+tokens5:xyz'})


 With the above query, I am getting only the results that I want, the ones
 whose score after my FunctionQuery is above 0, but the problem now is that
 the final score for all results is changed to 1, which affects the sorting.

 How can I keep the original score that is calculated by the edismax query?

 Cheers,
 John


 On Fri, Nov 18, 2011 at 10:50 AM, Andre Bois-Crettez 
 andre.b...@kelkoo.com wrote:

 Definitely worked for me, with a classic full text search on ipod and
 such.
 Changing the lower bound changed the number of results.

 Follow Chris advice, and give more details.



 John wrote:

 Doesn't seem to work.
 I thought that FilterQueries work before the search is performed and not
 after... no?

 Debug doesn't include filter query only the below (changed a bit):

 BoostedQuery(boost(+fieldName:**,boostedFunction(ord(**fieldName),query)))


 On Thu, Nov 17, 2011 at 5:04 PM, Andre Bois-Crettez
 andre.b...@kelkoo.comwrote:



 John wrote:



 Some of the results are receiving score=0 in my function and I would
 like
 them not to appear in the search results.




 you can use frange, and filter by score:

 q=ipod&fq={!frange l=0 incl=false}query($q)

 --
 André Bois-Crettez

 Search technology, Kelkoo
 http://www.kelkoo.com/








 --
 André Bois-Crettez

 Search technology, Kelkoo
 http://www.kelkoo.com/





Faceting is not Using Field Value Cache . . ?

2011-11-22 Thread CRB


Seeing something odd going on with faceting . . . we execute facets with 
every query and yet the fieldValueCache is not being used:


name:  fieldValueCache
class:  org.apache.solr.search.FastLRUCache
version:  1.0
description:  Concurrent LRU Cache(maxSize=1, initialSize=10, 
minSize=9000, acceptableSize=9500, cleanupThread=false)

stats: lookups : 0
hits : 0
hitratio : 0.00
inserts : 0
evictions : 0
size : 0
warmupTime : 0
cumulative_lookups : 0
cumulative_hits : 0
cumulative_hitratio : 0.00
cumulative_inserts : 0
cumulative_evictions : 0

I was under the impression the fieldValueCache was an implicit cache 
(if you don't define it, it will still exist).


We are running Solr v3.3 (and NOT using {!cache=false}).

Thoughts?


Re: Problems with AutoSuggest feature(Terms Components)

2011-11-22 Thread mechravi25
Hi Erick,

Thanks for your reply. I would like to know all the options that can be given
under the defaults section and how they can be overridden. Is there any
documentation available in the Solr forum? We tried searching but weren't
able to find it. 

My exact scenario is that I have one master core which has many underlying
shard cores (distributed architecture). I want terms.limit to be
defaulted to 10 in the underlying shard cores. When I hit the master core,
it will in turn hit the underlying shard cores. At this point, the
terms.limit which has been passed to the master core has to be passed to these
underlying shard cores, overriding the default value set. Can you please
suggest the definition of the terms component for the underlying shard
cores?

Regards,
Sivaganesh
 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problems-with-AutoSuggest-feature-Terms-Components-tp3512734p3528597.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: To push the terms.limit parameter from the master core to all the shard cores.

2011-11-22 Thread mechravi25
Hi Mark, 

Thanks for your suggestion. 

My exact scenario is that I have one master core which has many underlying
shard cores (distributed architecture). I want terms.limit to be
defaulted to 10 in the underlying shard cores. When I hit the master core,
it will in turn hit the underlying shard cores. At this point, the
terms.limit which has been passed to the master core has to be passed
dynamically to these underlying shard cores, overriding the default value
set. Can you please suggest the definition of the terms component for the
underlying shard cores?

I would like to know all the options that can be given under the defaults
section and how they can be overridden. Is there any documentation available
in the Solr forum? We tried searching but weren't able to find it. 

Regards, 
Sivaganesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/To-push-the-terms-limit-parameter-from-the-master-core-to-all-the-shard-cores-tp3520609p3528608.html
Sent from the Solr - User mailing list archive at Nabble.com.
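For the two terms.limit questions above, a sketch of what the shard cores' handler configuration might look like; values under defaults apply only when the incoming request does not already carry the parameter, so a terms.limit forwarded by the master core overrides the default. The handler name and the limit of 10 are taken from the thread; everything else is illustrative:

```xml
<!-- Hypothetical shard-core solrconfig.xml fragment -->
<searchComponent name="terms" class="solr.TermsComponent"/>

<requestHandler name="/terms" class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="terms">true</bool>
    <!-- used only when the request (e.g. forwarded from the master core)
         does not supply its own terms.limit -->
    <int name="terms.limit">10</int>
  </lst>
  <arr name="components">
    <str>terms</str>
  </arr>
</requestHandler>
```

Values under an invariants list, by contrast, win over request parameters, so invariants would be the wrong place for a limit the master core is supposed to override.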


Highlight with multi word synonyms

2011-11-22 Thread Brian Gerby
I'm trying to use multi-word synonyms. For example, in my synonyms file I have 
"nhl, national hockey league". If I do this at index time only, a search for nhl 
returns a correct match, but highlights only the first word, "national". Ideally, 
it would highlight "national hockey league" or not highlight at all. If I apply 
the synonyms at both index and query time, it finds the match and does the correct 
highlighting, but I understand it is not ideal to do synonyms at both index and 
query time. I am expanding synonyms and using edismax. Thoughts?
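For reference, index-time-only expansion as described above is usually configured along these lines; the field type name and tokenizer choice are illustrative, not from the thread:

```xml
<!-- Hypothetical schema.xml fragment: expand synonyms at index time only -->
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- synonyms.txt contains: nhl, national hockey league -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this setup the multi-word expansion happens only while indexing, which is exactly the situation that produces the partial highlight described above: the query-side analysis never sees the full "national hockey league" phrase.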


Re: how to make effective search with fq and q params

2011-11-22 Thread Erik Hatcher
I think you're using dismax, not edismax. edismax will take q=*:* just fine as 
it handles all Lucene syntax queries also.  dismax does not.

So, if you're using dismax and there is no actual query (but you want to get 
facets), you set q.alt=*:* and omit q - that's entirely by design.

If there's a non-empty q parameter, q.alt is not considered, so there shouldn't 
be any issue with always having q.alt set if that's what you want.
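In solrconfig.xml terms, that amounts to something like this (a sketch; the handler name is made up):

```xml
<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- consulted only when the request carries no q parameter -->
    <str name="q.alt">*:*</str>
  </lst>
</requestHandler>
```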

Erik


On Nov 22, 2011, at 11:15 , Jeff Schmidt wrote:

 Hi Erik:
 
 It's not in the SolrJ library, but rather my use of it:
 
 In my application code:
 
 protected static final String SOLR_ALL_DOCS_QUERY = "*:*";
 
 /*
  * If no search terms provided, then return all neighbors.
  * Results are to be returned in neighbor symbol alphabetical order.
 */
 
 if (searchTerms == null) {
   searchTerms = SOLR_ALL_DOCS_QUERY;
   nodeQuery.addSortField("n_name", SolrQuery.ORDER.asc);
 }
 
 So, if no user search terms are provided, I search all documents (there are 
 other fqs in effect) and return them in name order.
 
 That worked just fine.  Then I read more about [e]dismax, and went and 
 configured:
 
 <str name="q.alt">*:*</str>
 
 Then I would get zero results.  It's not a SolrJ issue though, as this 
 request in my browser also resulted in zero results:
 
 http://localhost:8091/solr/ing-content/select/?qt=partner-tmo&fq=type%3Anode&fq=n_neighborof_id%3AING\:afa&q=*:*&rows=5&facet=true&facet.mincount=1&facet.field=n_neighborof_processExact&facet.field=n_neighborof_edge_type
 
 That was due to the q=*:*.  Once I set, say, q=cancer, I got results.  So I 
 guess this is a [e]dismax thing?  (partner-tmo is the name of my request 
 handler).
 
 I solved my problem by not setting *:* in my application, and left q.alt=*:* 
 in place.
 
 Hope this helps.  Again, this is stock Solr 3.4.0, running the Apache war 
 under Tomcat 6.
 
 Jeff
 
 On Nov 22, 2011, at 8:05 AM, Erik Hatcher wrote:
 
 
 On Nov 22, 2011, at 09:55 , Jeff Schmidt wrote:
 When using [e]dismax, does configuring q.alt=*:* and not specifying q 
 affect the performance/caching in any way?
 
 No different than using q=*:* with the lucene query parser.  
 MatchAllDocsQuery is possibly the fastest query out there!  (it simply 
 matches documents in index order, all scores are 1.0)
 
 As a side note, a while back I configured q.alt=*:*, and the application 
 (via SolrJ) still set q=*:* if no user input was provided (faceting). With 
 both of them set that way, I got zero results. (Solr 3.4.0)  Interesting.
 
 Ouch.  Really?  I don't see in the code (looking at my trunk checkout) where 
 there's any *:* used in the SolrJ library.  Can you provide some details on 
 how you used SolrJ?  It'd be good to track this down as that seems like a 
 bug to me.
 
  Erik
 
 
 
 Thanks,
 
 Jeff
 
 On Nov 22, 2011, at 7:06 AM, Erik Hatcher wrote:
 
 If all you're doing is filtering (browsing by facets perhaps), it's 
 perfectly fine to have q=*:*.  MatchAllDocsQuery is fast (and would be 
 cached anyway), so use *:* as appropriate without worries.
 
Erik
 
 
 
 On Nov 22, 2011, at 07:18 , pravesh wrote:
 
 Usually,
 
 Use the 'q' parameter to search for the free text values entered by the
 users (where you might want to parse the query and/or apply
 boosting/phrase-sloppy, minimum match,tie etc )
 
 Use the 'fq' to limit the searches to certain criterias like location,
 date-ranges etc.
 
 Also, avoid using q=*:* as it implicitly translates to a 
 MatchAllDocsQuery
 
 Regds
 Pravesh
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/how-to-make-effective-search-with-fq-and-q-params-tp3527217p3527535.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 
 --
 Jeff Schmidt
 535 Consulting
 j...@535consulting.com
 http://www.535consulting.com
 (650) 423-1068
 
 
 
 
 
 
 
 
 
 
 
 
 
 --
 Jeff Schmidt
 535 Consulting
 j...@535consulting.com
 http://www.535consulting.com
 (650) 423-1068
 
 
 
 
 
 
 
 
 



Re: FunctionQuery score=0

2011-11-22 Thread Chris Hostetter

:  q={!frange l=0 incl=false}query({!type=edismax qf=abstract^0.02
:  title^0.08 categorysearch^0.05 boost='eqsim(alltokens,xyz)'
:  v='+tokens5:xyz '})
: 
: 
:  With the above query, I am getting only the results that I want, the ones
:  whose score after my FunctionQuery is above 0, but the problem now is that
:  the final score for all results is changed to 1, which affects the sorting.
: 
:  How can I keep the original score that is calculated by the edismax query?

a) Like i said.  details matter.

In your earlier messages you mentioned that you were wrapping a function 
around a query and wanted the function not to match anything where 
the result was 0 -- the suggestions provided have done that.

this is the first time you mentioned that you needed the values returned 
by the function as the scores of the documents (had you mentioned that, you 
might have gotten different answers)

b) if you look closely at the suggestion from André, you'll see that his 
specific suggestion will actually do what you want if you follow it -- 
express the query you want in the q param (so you get the scores from 
it) and then express an fq that refers to the q query as a variable...

:  q=ipodfq={!frange l=0 incl=false}query($q)

c) Based on the concrete example you've given above, it's not clear to me 
that you actually need any of this -- if the above query is giving you the 
results you want, but you want the scores from the edismax query to be 
used as the final scores of the documents, then there is no need to wrap 
the query in any sort of function at all, or to exclude any 0 values

this should be exactly what you want...

q={!type=edismax qf=abstract^0.02 title^0.08 categorysearch^0.05 
boost='eqsim(alltokens,xyz)' v='+tokens5:xyz '}

...why exactly did you think you needed to wrap that query in a function?

https://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an XY Problem ... that is: you are dealing
with X, you are assuming Y will help you, and you are asking about Y
without giving more details about the X so that we can understand the
full issue.  Perhaps the best solution doesn't involve Y at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341




-Hoss

spellcheck in dismax

2011-11-22 Thread Ruixiang Zhang
I put the following into dismax requestHandler, but no suggestion field is
returned.

<lst name="defaults">
  <str name="spellcheck.onlyMorePopular">true</str>
  <str name="spellcheck.extendedResults">false</str>
  <str name="spellcheck.count">1</str>
</lst>
<arr name="last-components">
  <str>spellcheck</str>
</arr>

But everything works if I put it as a separate requestHandler. Did I miss
something?

Thanks
Richard


Re: spellcheck in dismax

2011-11-22 Thread alxsss

 It seems you forgot this:
<str name="spellcheck">true</str>
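With that added, the whole defaults section would look something like this (a sketch, with the markup restored):

```xml
<lst name="defaults">
  <str name="spellcheck">true</str>
  <str name="spellcheck.onlyMorePopular">true</str>
  <str name="spellcheck.extendedResults">false</str>
  <str name="spellcheck.count">1</str>
</lst>
```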



-Original Message-
From: Ruixiang Zhang rxzh...@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Tue, Nov 22, 2011 11:54 am
Subject: spellcheck in dismax


I put the following into dismax requestHandler, but no suggestion field is
returned.

<lst name="defaults">
  <str name="spellcheck.onlyMorePopular">true</str>
  <str name="spellcheck.extendedResults">false</str>
  <str name="spellcheck.count">1</str>
</lst>
<arr name="last-components">
  <str>spellcheck</str>
</arr>

But everything works if I put it as a separate requestHandler. Did I miss
something?

Thanks
Richard

 


Re: Faceting is not Using Field Value Cache . . ?

2011-11-22 Thread Samuel García Martínez
AFAIK, FieldValueCache is only used for faceting on tokenized fields.
Maybe you are confusing it with FieldCache (
http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/search/FieldCache.html)?
That one is used for ordinary facets (facet.method=fc on non-tokenized
fields).

Does this make sense to you?

On Tue, Nov 22, 2011 at 7:21 PM, CRB sub.scripti...@metaheuristica.com wrote:


 Seeing something odd going on with faceting . . . we execute facets with
 every query and yet the fieldValueCache is not being used:

name:  fieldValueCache
 class:  org.apache.solr.search.FastLRUCache
 version:  1.0
 description:  Concurrent LRU Cache(maxSize=1, initialSize=10,
 minSize=9000, acceptableSize=9500, cleanupThread=false)
 stats: lookups : 0
 hits : 0
 hitratio : 0.00
 inserts : 0
 evictions : 0
 size : 0
 warmupTime : 0
 cumulative_lookups : 0
 cumulative_hits : 0
 cumulative_hitratio : 0.00
 cumulative_inserts : 0
 cumulative_evictions : 0

 I was under the impression the fieldValueCache  was an implicit cache (if
 you don't define it, it will still exist).

 We are running Solr v3.3 (and NOT using {!cache=false}).

 Thoughts?




-- 
Un saludo,
Samuel García.


Re: Solr highlighting isn't work!

2011-11-22 Thread Koji Sekiguchi

(11/11/22 22:30), VladislavLysov wrote:

Hello!!!
   I have a problem with Solr highlighting. I have documents with fields
TYPE, DBID and others. When I send the following request -
https://localhost:8443/solr/myCore/afts?wt=standard&q=TYPE:"cm:content"&indent=on&hl=true&hl.fl=DBID&hl.usePhraseHighlighter=true&fl=DBID
it returns this:
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">3</int>
</lst>
<result name="response" numFound="166" start="0">
  <doc>
    <arr name="DBID">
      <str>892</str>
    </arr>
  </doc>
  <doc>...</doc>
</result>
<lst name="highlighting">
  <lst name="LEAF-892"/>
</lst>
</response>
What is the problem?
Thank you!


What term are you trying to highlight? You queried "cm:content" on the TYPE 
field but asked the highlighter to work on the DBID field. Since the DBID 
field contains only "892", the highlighter cannot create any highlighted 
snippets.

With Solr 3.5 (RC2 is now available) or the trunk version of Solr, you can 
use the hl.q parameter to specify the query used for highlighting.

http://wiki.apache.org/solr/HighlightingParameters#hl.q
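For example, something along these lines (a sketch; the highlighted field name and query term are placeholders, assuming you have an indexed/stored text field to highlight):

```
https://localhost:8443/solr/myCore/afts?q=TYPE:"cm:content"&hl=true&hl.fl=content&hl.q=content:someterm
```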

koji
--
Check out Query Log Visualizer for Apache Solr
http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
http://www.rondhuit.com/en/


Re: Solr real time update

2011-11-22 Thread yu shen
Hi Nagarajayya,

Thanks for your information. Do I need to change any configuration of my
current solr server to integrate your plugin?

Spark


2011/11/22 Nagendra Nagarajayya nnagaraja...@transaxtions.com

 Yu:

 To get Near Real Time update in Solr 1.4.1 you will need to use Solr 1.4.1
 with RankingAlgorithm. This allows you to update documents in near real
 time. You can download and give this a try from here:

 http://solr-ra.tgels.org/

 Regards,

 - Nagendra Nagarajayya
 http://solr-ra.tgels.org/
  http://rankingalgorithm.tgels.org/


 On 11/21/2011 9:47 PM, yu shen wrote:

 Hi All,

 After some study, I used the snippet below. The documents do get updated,
 but it still takes a long time; it feels like the parameter does not take
 effect. Any comments?
 UpdateRequest req = new UpdateRequest();
 req.add(solrDocs);
 req.setCommitWithin(5000);
 req.setParam("commitWithin", "5000");
 req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
 req.process(SOLR_SERVER);

 2011/11/22 yu shen shenyu...@gmail.com

  Hi All,

 I am trying to do a 'nearly real time update' to Solr.  My Solr version is
 1.4.1. I read the Solr CommitWithin wiki
 (http://wiki.apache.org/solr/CommitWithin) and a related thread
 (http://lucene.472066.n3.nabble.com/Solr-real-time-update-taking-time-td3472709.html),
 mostly on the difficulty of doing this.

 My issue is I tried the code snippet in the wiki:

 UpdateRequest req = new UpdateRequest();
 req.add(mySolrInputDocument);
 req.setCommitWithin(1);
 req.process(server);

 But my index did not get updated, unless I call SOLR_SERVER.commit();
 explicitly. The latter call will take more than 1 minute on average to
 return.

 Can I do a real time update on solr 1.4.1? Would someone help to show a
 workable code snippet?

 Spark





If search matches index in the middle of filter chain, will result return?

2011-11-22 Thread Ellery Leung
Hi all

 

I am using Solr 3.4 with Win7 and Jetty.

 

When I do a search on a field, the Solr analysis page shows that the
search string matches the indexed tokens in the middle of the chain.  Here is
the schema:

 

<fieldType name="substring_search" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="../../filters/filter-mappings.txt"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.CommonGramsFilterFactory"
            words="../../filters/stopwords.txt" ignoreCase="true"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="20"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="../../filters/filter-mappings.txt"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

 

I am searching for an email address: off...@officeofficeoffice.com.  If I
search for any substring under 20 characters, results are returned, but when I
search for the whole string, off...@officeofficeoffice.com, no results are returned.

 

As you can see in the index analyzer of the schema, when I search for the whole
string it matches the analysis chain up to the NGramFilterFactory, but after
NGram no match is found.

 

Here are my questions:

-  Is this behavior normal?

-  In order to get off...@officeofficeoffice.com, does it mean
that I have to make the maxGramSize larger (like 70)?

 

Thank you in advance for all your support.  This is a great community.



Separate ACL and document index

2011-11-22 Thread Floyd Wu
Hi there,

Is it possible to separate ACL index and document index and achieve to
search by user role in SOLR?

Currently my implementation is to index ACL with document, but the
document itself change frequently. I have to perform rebuild index
every time when ACL change. It's heavy for whole system due to
document are so many and content are huge.

Do you guys have any solution to solve this problem. I've been read
mailing list for a while. Seem there is not suitable solution for me.

I want user searches result only for him according to his role but I
don't want to re-index document every time when document's ACL change.

To my knowledge, is this possible to perform a join like database to
achieve this? How and possible?
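To make the last question concrete, what I have in mind is something like the query-time join that exists on Solr trunk, with ACLs kept in their own core (a sketch; the core and field names are made up):

```
q=user query
fq={!join fromIndex=acl from=doc_id to=id}role:editor
```

That way only the small acl core would need re-indexing when an ACL changes.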

Thanks

Floyd


Re: If search matches index in the middle of filter chain, will result return?

2011-11-22 Thread Shawn Heisey

On 11/22/2011 7:54 PM, Ellery Leung wrote:

I am searching for an email called: off...@officeofficeoffice.com.  If I
search any text under 20 characters, result will be returned.  But when I
search the whole string: off...@officeofficeoffice.com, no result return.

As you all see in the schema in index part, when I search the whole
string, it will match the index chain before NGramFilterFactory.  But after
NGram, no result found.

Here are my questions:
-  Is this behavior normal?


I'm pretty sure that your query must match after the entire analyzer 
chain is done.  I would expect that behavior to be normal.



-  In order to get off...@officeofficeoffice.com, does it mean
that I have to make the maxGramSize larger (like 70)?


If you were to increase the maxGramSize to 70, you would get a match in 
this case, but your index might get a lot larger, depending on what's in 
your source data.  That's probably not the right approach, though.


In general, you want to have your index and query analyzer chains 
exactly the same.  There are some exceptions, but I don't think the 
NGram filter is one of them.  The synonym filter and WordDelimiterFilter 
are examples where it is expected that your index and query analyzer 
chains will be different.


Add the NGram and CommonGram filters to the query chain, and everything 
should start working.  If you were to go with a single analyzer for both 
like the following, I think it would start working.  You wouldn't even 
need to reindex, since you wouldn't be changing the index analyzer.


<fieldType name="substring_search" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="../../filters/filter-mappings.txt"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.CommonGramsFilterFactory"
            words="../../filters/stopwords.txt" ignoreCase="true"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="20"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Regarding your NGram filter,  I would actually increase the minGramSize 
to at least 2 and decrease the maxGramSize to something like 10 or 15, 
then reindex.


An additional note: CommonGrams may not be all that useful unless you 
are indexing large numbers of huge documents, like entire books.  This 
particular fieldType is not suitable for full text anyway, since it uses 
KeywordTokenizer.  Consider removing CommonGrams from this fieldType and 
reindexing.  Unless you are dealing with large amounts of text, consider 
removing it from the entire schema.  If you do remove it, it's usually 
not a good idea to replace it with a StopFilter.  The index size 
reduction found in stopword removal is not usually worth the potential 
loss of recall.


Be prepared to test all reasonable analyzer combinations, rather than 
taking my word for it.


After reading the Hathi Trust blog, I tried CommonGrams on my own 
index.  It actually made things slower, not faster.  My typical document 
is only a few thousand bytes of metadata.  The Hathi Trust is indexing 
millions of full-length books.


Thanks,
Shawn



Re: Solr real time update

2011-11-22 Thread Nagendra Nagarajayya

Spark:

Solr with RankingAlgorithm is not a plugin but a change of search 
library from Lucene to RankingAlgorithm. Here is more info on the 
changes you will need to make to your solrconfig.xml:


http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search

Regards,

- Nagendra Nagarajayya
http://solr-ra.tgels.org/
http://rankingalgorithm.tgels.org/

On 11/22/2011 5:40 PM, yu shen wrote:

Hi Nagarajayya,

Thanks for your information. Do I need to change any configuration of my
current solr server to integrate your plugin?

Spark


2011/11/22 Nagendra Nagarajayya nnagaraja...@transaxtions.com


Yu:

To get Near Real Time update in Solr 1.4.1 you will need to use Solr 1.4.1
with RankingAlgorithm. This allows you to update documents in near real
time. You can download and give this a try from here:

http://solr-ra.tgels.org/

Regards,

- Nagendra Nagarajayya
http://solr-ra.tgels.org/
http://rankingalgorithm.tgels.org/


On 11/21/2011 9:47 PM, yu shen wrote:


Hi All,

After some study, I used below snippet. Seems the documents is updated,
while still takes a long time. Feels like the parameter does not take
effect. Any comments?
UpdateRequest req = new UpdateRequest();
 req.add(solrDocs);
 req.setCommitWithin(5000);
 req.setParam("commitWithin", "5000");
 req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
 req.process(SOLR_SERVER);

2011/11/22 yu shen shenyu...@gmail.com

  Hi All,

I am trying to do a 'nearly real time update' to Solr.  My Solr version is
1.4.1. I read the Solr CommitWithin wiki
(http://wiki.apache.org/solr/CommitWithin) and a related thread
(http://lucene.472066.n3.nabble.com/Solr-real-time-update-taking-time-td3472709.html),
mostly on the difficulty of doing this.

My issue is I tried the code snippet in the wiki:

UpdateRequest req = new UpdateRequest();
req.add(mySolrInputDocument);
req.setCommitWithin(1);
req.process(server);

But my index did not get updated, unless I call SOLR_SERVER.commit();
explicitly. The latter call will take more than 1 minute on average to
return.

Can I do a real time update on solr 1.4.1? Would someone help to show a
workable code snippet?

Spark






RE: If search matches index in the middle of filter chain, will result return?

2011-11-22 Thread Ellery Leung
Thanks Shawn.  So to recap:

- A match must be found against the output of the entire analyzer chain, not
in the middle of the chain.
- Suggested: index and query chain should be the same.

In my situation, if I make both of them the same, the result may be
misleading because it will also match other records that have the same
partial string.

But your suggestion is wonderful.  Thank you very much.

-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: 2011-11-23 12:04 PM
To: solr-user@lucene.apache.org
Subject: Re: If search matches index in the middle of filter chain, will
result return?

On 11/22/2011 7:54 PM, Ellery Leung wrote:
 I am searching for an email called: off...@officeofficeoffice.com.  If I
 search any text under 20 characters, result will be returned.  But when I
 search the whole string: off...@officeofficeoffice.com, no result return.

 As you all see in the schema in index part, when I search the whole
 string, it will match the index chain before NGramFilterFactory.  But
after
 NGram, no result found.

 Here are my questions:
 -  Is this behavior normal?

I'm pretty sure that your query must match after the entire analyzer 
chain is done.  I would expect that behavior to be normal.

 -  In order to get off...@officeofficeoffice.com, does it mean
 that I have to make the maxGramSize larger (like 70)?

If you were to increase the maxGramSize to 70, you would get a match in 
this case, but your index might get a lot larger, depending on what's in 
your source data.  That's probably not the right approach, though.

In general, you want to have your index and query analyzer chains 
exactly the same.  There are some exceptions, but I don't think the 
NGram filter is one of them.  The synonym filter and WordDelimiterFilter 
are examples where it is expected that your index and query analyzer 
chains will be different.

Add the NGram and CommonGram filters to the query chain, and everything 
should start working.  If you were to go with a single analyzer for both 
like the following, I think it would start working.  You wouldn't even 
need to reindex, since you wouldn't be changing the index analyzer.

<fieldType name="substring_search" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="../../filters/filter-mappings.txt"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.CommonGramsFilterFactory"
            words="../../filters/stopwords.txt" ignoreCase="true"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="20"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Regarding your NGram filter,  I would actually increase the minGramSize 
to at least 2 and decrease the maxGramSize to something like 10 or 15, 
then reindex.

An additional note: CommonGrams may not be all that useful unless you 
are indexing large numbers of huge documents, like entire books.  This 
particular fieldType is not suitable for full text anyway, since it uses 
KeywordTokenizer.  Consider removing CommonGrams from this fieldType and 
reindexing.  Unless you are dealing with large amounts of text, consider 
removing it from the entire schema.  If you do remove it, it's usually 
not a good idea to replace it with a StopFilter.  The index size 
reduction found in stopword removal is not usually worth the potential 
loss of recall.

Be prepared to test all reasonable analyzer combinations, rather than 
taking my word for it.

After reading the Hathi Trust blog, I tried CommonGrams on my own 
index.  It actually made things slower, not faster.  My typical document 
is only a few thousand bytes of metadata.  The Hathi Trust is indexing 
millions of full-length books.

Thanks,
Shawn




Re: Integrating Surround Query Parser

2011-11-22 Thread Rahul Mehta
How do I apply the patch https://issues.apache.org/jira/browse/SOLR-2703 to
Solr 3.1 to install surround as a plugin?

On Tue, Nov 22, 2011 at 7:34 PM, Erik Hatcher erik.hatc...@gmail.com wrote:

 The surround query parser is fully wired into Solr trunk/4.0, if that
 helps.  See http://wiki.apache.org/solr/SurroundQueryParser and the JIRA
 issue linked there in case you want to patch it into a different version.

Erik

 On Jan 21, 2011, at 02:24 , Ahson Iqbal wrote:

  Hi All
 
  I want to integrate Surround Query Parser with solr, To do this i have
  downloaded jar file from the internet and and then pasting that jar file
 in
  web-inf/lib
 
  and configured query parser in solrconfig.xml as
  <queryParser name="SurroundQParser"
  class="org.apache.lucene.queryParser.surround.parser.QueryParser"/>
 
  now when i load solr admin page following exception comes
  org.apache.solr.common.SolrException: Error Instantiating QParserPlugin,
  org.apache.lucene.queryParser.surround.parser.QueryParser is not a
  org.apache.solr.search.QParserPlugin
 
  I think I didn't get the right plugin. Can anybody guide me on where
  to get the right plugin for the surround query parser, or on how to
  integrate this plugin with Solr correctly?
 
 
  thanx
  Ahsan
 
 
 




-- 
Thanks  Regards

Rahul Mehta


Re: FunctionQuery score=0

2011-11-22 Thread John
Hi Hoss,

Thanks for the detailed response.

My XY problem is:

1) I am trying to search for a complex query:
q={!type=edismax qf=abstract^0.02 title^0.08 categorysearch^0.05
boost='eqsim(alltokens,xyz)' v='+tokens5:xyz '}

This answers my query needs. But my boost function actually changes some
of the results to have a score of 0, and I want those excluded from the
result set.

2) This is why I used the frange query to solve the issue with the score 0:
q={!frange l=0 incl=false}query({!type=edismax qf=abstract^0.02 title^0.08
categorysearch^0.05 boost='eqsim(alltokens,xyz)' v='+tokens5:xyz '})

But this time, the remaining results lost their *boosted* scores, and
therefore the sort by score got all mixed up.

3) I assume I could use filter queries, but from my understanding an fq
runs another query in addition to the main one; such queries cost time,
and I would like to avoid that if possible.

Hope this explains a bit more.

Thanks,
Lev

On Tue, Nov 22, 2011 at 9:15 PM, Chris Hostetter
hossman_luc...@fucit.orgwrote:


 :  q={!frange l=0 incl=false}query({!type=edismax qf=abstract^0.02
 :  title^0.08 categorysearch^0.05 boost='eqsim(alltokens,xyz)'
 :  v='+tokens5:xyz '})
 : 
 : 
 :  With the above query, I am getting only the results that I want, the
 ones
 :  whose score after my FunctionQuery is above 0, but the problem now is
 that
 :  the final score for all results is changed to 1, which affects the
 sorting.
 : 
 :  How can I keep the original score that is calculated by the edismax
 query?

  a) Like i said.  details matter.

  In your earlier messages you mentioned that you were wrapping a function
  around a query and wanted the function not to match anything where
  the result was 0 -- the suggestions provided have done that.

  this is the first time you mentioned that you needed the values returned
  by the function as the scores of the documents (had you mentioned that,
  you might have gotten different answers)

 b) if you look closely at the suggestion from André, you'll see that his
 specific suggestion will actually do what you want if you follow it --
 express the query you want in the q param (so you get the scores from
 it) and then express an fq that refers to the q query as a variable...

 :  q=ipodfq={!frange l=0 incl=false}query($q)

  c) Based on the concrete example you've given above, it's not clear to me
 that you actually need any of this -- if the above query is giving you the
 results you want, but you want the scores from the edismax query to be
 used as the final scores of the function, then there is no need to wrap
 the query in any sort of function at all, or exclude any 0 values

 this should be exactly what you want...

 q={!type=edismax qf=abstract^0.02 title^0.08 categorysearch^0.05
 boost='eqsim(alltokens,xyz)' v='+tokens5:xyz '}

 ...why exactly did you think you needed to wrap that query in a function?

 https://people.apache.org/~hossman/#xyproblem
 XY Problem

 Your question appears to be an XY Problem ... that is: you are dealing
 with X, you are assuming Y will help you, and you are asking about Y
 without giving more details about the X so that we can understand the
 full issue.  Perhaps the best solution doesn't involve Y at all?
 See Also: http://www.perlmonks.org/index.pl?node_id=542341




 -Hoss


Re: Integrating Surround Query Parser

2011-11-22 Thread Rahul Mehta
This what i tried:


   - Went to the Solr 3.1 directory, downloaded from here:
   http://www.trieuvan.com/apache//lucene/solr/3.1.0/apache-solr-3.1.0.tgz
   - wget
   https://issues.apache.org/jira/secure/attachment/12493167/SOLR-2703.patch
   - ran: patch -p0 -i SOLR-2703.patch --dry-run
   - got this output, ending in an error:
  - patching file
  core/src/test/org/apache/solr/search/TestSurroundQueryParser.java
  - patching file core/src/test-files/solr/conf/schemasurround.xml
  - patching file core/src/test-files/solr/conf/solrconfigsurround.xml
  - patching file
  core/src/java/org/apache/solr/search/SurroundQParserPlugin.java
  - patching file example/solr/conf/solrconfig.xml
  - Hunk #1 FAILED at 1538.
  - 1 out of 1 hunk FAILED -- saving rejects to file
  example/solr/conf/solrconfig.xml.rej
   - our solrconfig.xml ends at line 1508, so the hunk does not apply.
   - tried sudo find / -name TestSurroundQueryParser.java, which found
   nothing in the directory.
   - when I run svn up it reports Skipped '.' (this is a release tarball,
   not a checkout).

Please suggest what I should do now.
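One thing I am considering (a sketch, assuming the rejected hunk only registers the parser that the patch's SurroundQParserPlugin.java provides) is adding the registration to solrconfig.xml by hand:

```xml
<queryParser name="surround" class="org.apache.solr.search.SurroundQParserPlugin"/>
```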

On Wed, Nov 23, 2011 at 10:39 AM, Rahul Mehta rahul23134...@gmail.com wrote:

 How to apply this patch https://issues.apache.org/jira/browse/SOLR-2703 with
 solr 3.1 to install surround as plugin?


 On Tue, Nov 22, 2011 at 7:34 PM, Erik Hatcher erik.hatc...@gmail.com wrote:

 The surround query parser is fully wired into Solr trunk/4.0, if that
 helps.  See http://wiki.apache.org/solr/SurroundQueryParser and the JIRA
 issue linked there in case you want to patch it into a different version.

Erik

 On Jan 21, 2011, at 02:24 , Ahson Iqbal wrote:

  Hi All
 
  I want to integrate Surround Query Parser with solr, To do this i have
  downloaded jar file from the internet and and then pasting that jar
 file in
  web-inf/lib
 
  and configured query parser in solrconfig.xml as
   <queryParser name="SurroundQParser"
   class="org.apache.lucene.queryParser.surround.parser.QueryParser"/>
 
  now when i load solr admin page following exception comes
  org.apache.solr.common.SolrException: Error Instantiating QParserPlugin,
  org.apache.lucene.queryParser.surround.parser.QueryParser is not a
  org.apache.solr.search.QParserPlugin
 
   I think I didn't get the right plugin. Can anybody guide me on where
   to get the right plugin for the surround query parser, or on how to
   integrate this plugin with Solr correctly?
 
 
  thanx
  Ahsan
 
 
 




 --
 Thanks  Regards

 Rahul Mehta






-- 
Thanks  Regards

Rahul Mehta


Re: Can files be faceted based on their size ?

2011-11-22 Thread neuron005
Thanks for replying.
I tried using Trie types for faceting, but that did not solve the
problem. If I use a Trie type (e.g. tlong), Solr reports a schema
mismatch error, because the FileListEntityProcessor API defines fileSize
as a string. That means we cannot apply facet.range to fileSize.
Am I right?
Thanks
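[Editorial note: since fileSize arrives from the processor as a string, one common workaround is to index the value into a parallel numeric field and range-facet on that. A sketch, untested against this schema; the field names are examples:]

```xml
<!-- schema.xml: keep the string field, add a numeric copy for range faceting -->
<field name="fileSize" type="string" indexed="true" stored="true"/>
<field name="fileSizeNum" type="tlong" indexed="true" stored="false"/>
<copyField source="fileSize" dest="fileSizeNum"/>
```

Then facet on the numeric field, e.g.
&facet=true&facet.range=fileSizeNum&facet.range.start=0&facet.range.end=104857600&facet.range.gap=10485760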

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-files-be-faceted-based-on-their-size-tp3518393p3529923.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Solr Performance/Architecture

2011-11-22 Thread Husain, Yavar
Hi Shawn

That was so great of you to explain the architecture in such detail. I
enjoyed reading it multiple times.

I have a question here:

You mentioned that we can use crc32(DocumentId) % NumServers. I am now using 
that in my data-config.xml, in the SQL query itself, something like:

For Documents to be indexed on Server 1: select DocumentId,PNum,... from Sample 
where crc32(DocumentId)%2=0;
For Documents to be indexed on Server 2: select DocumentId,PNum,... from Sample 
where crc32(DocumentId)%2=1;

Is that the right way to do it? Won't it be a slow query?
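[Editorial note: as a sanity check on the bucketing itself, here is a small Python sketch. Python's zlib.crc32 uses the standard CRC-32 polynomial, which to my knowledge matches MySQL's CRC32(), but verify that against your own server before relying on it.]

```python
import zlib

def shard_for(document_id: str, num_servers: int = 2) -> int:
    # Same bucketing idea as crc32(DocumentId) % NumServers in the SQL above.
    return zlib.crc32(document_id.encode("utf-8")) % num_servers

ids = [str(i) for i in range(10)]
buckets = {n: [d for d in ids if shard_for(d) == n] for n in range(2)}
# Every id lands in exactly one bucket; the split is roughly even.
print(buckets)
```

On the SQL side, because CRC32(DocumentId) % 2 is computed per row, that WHERE clause forces a full table scan; if it turns out to be slow, one option is to store the bucket number in its own indexed column at insert time.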

Thanks once again.



-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Monday, November 21, 2011 7:47 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Performance/Architecture

On 11/21/2011 12:41 AM, Husain, Yavar wrote:
 Number of rows in SQL Table (Indexed till now using Solr): 1 million
 Total Size of Data in the table: 4GB
 Total Index Size: 3.5 GB

 Total Number of Rows that I have to index: 20 Million (approximately 100 GB 
 Data) and growing

 What is the best practices with respect to distributing the index? What I 
 mean to say here is when should I distribute and what is the magic number 
 that I can have for index size per instance?

 For 1 million itself Solr instance running on a VM is taking roughly 2.5 hrs 
 to index for me. So for 20 million roughly it would take 60 -70 hrs. That 
 would be too much.

 What would be the best distributed architecture for my case? It will be great 
 if people may share their best practices and experience.

I have a MySQL database with 66 million rows at the moment, always 
growing.  My Solr index is split into six large shards and a small shard 
with the newest data.  The small shard (incremental) is calculated by 
looking at counts of data in hourly increments between 7 and 3.5 days 
old, and either choosing a boundary that results in less than 500,000 
documents or the 3.5 day boundary.  This index is usually about 1GB in size.

The rest of the documents are split between the other six shards using 
crc32(did) % 6.  The did field is a mysql bigint autoincrement field.  
These large shards are very close to 11 million records and 20GB each.  
By indexing all six at once, I can complete a full index rebuild in 
about 3.5 hours.

Each full index chain lives on two 64GB Dell servers with dual quad-core 
processors.  Each server contains a Solr instance with 8GB of heap, 
running three large shards.  One server contains the incremental index, 
the other server runs the load balancer.  Both servers run an index-free 
Solr core that we call the broker.  Its search handlers have the shards 
parameter in solrconfig.xml, pointed at the appropriate cores for that 
index chain.
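[Editorial note: a broker core like the one Shawn describes is simply a core whose search handler carries a shards default. A sketch; hostnames and core names below are placeholders, not his actual layout:]

```xml
<!-- solrconfig.xml of the index-free broker core -->
<requestHandler name="/select" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="shards">idx1:8983/solr/s1,idx1:8983/solr/s2,idx1:8983/solr/s3,idx2:8983/solr/s4,idx2:8983/solr/s5,idx2:8983/solr/s6,idx2:8983/solr/inc</str>
  </lst>
</requestHandler>
```

Queries sent to the broker are then fanned out to all listed shards and the results merged, so clients only ever talk to one endpoint.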

To keep index size down and search speed up, it's important that your 
index only contain the fields needed for two purposes: Searching 
(indexed fields) and displaying a results grid (stored fields).  Any 
other information should be excluded from your schema.xml and/or DIH 
config.  Full item details should be populated from the database or 
other information store (possibly a filesystem), using the unique 
identifier from the search results.

If you are aggregating data from more than one table, see if you can 
have your database get the information into one SELECT statement with 
JOINs, rather than having more than one entity in your DIH config.  
Alternatively, if your secondary tables are small, try using the 
CachedSQLEntityProcessor on them so they are loaded entirely into RAM 
for the import.  Your database software is usually much better at 
combining tables than Solr, so take advantage of it.

If you have multivalued search fields from secondary entities in DIH, 
you can often get your database software to CONCAT them together into a 
single field, then use an appropriate tokenizer to split them into 
separate terms.  I have one such field that is semicolon separated by a 
database JOIN that's specified in a view, then I use a pattern tokenizer 
that splits it at index time.
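[Editorial note: a field type along these lines would split such a semicolon-joined value at index time. A sketch; the pattern and filter choices are illustrative, not Shawn's exact configuration:]

```xml
<!-- schema.xml: tokenize on semicolons produced by the database JOIN -->
<fieldType name="semicolonDelimited" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern=";\s*"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```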

I hope this is helpful.

Thanks,
Shawn

**
 
This message may contain confidential or proprietary information intended only 
for the use of the 
addressee(s) named above or may contain information that is legally privileged. 
If you are 
not the intended addressee, or the person responsible for delivering it to the 
intended addressee, 
you are hereby notified that reading, disseminating, distributing or copying 
this message is strictly 
prohibited. If you have received this message by mistake, please immediately 
notify us by 
replying to the message and delete the original message and any copies 
immediately thereafter. 

Thank you.- 
**
FAFLD



Solr Search for misspelled search term

2011-11-22 Thread meghana
Hi all,

I need a way for Solr to check for, and return results for, misspelled
search terms.
Does anybody have any idea?

Thank You!!
Meghana
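[Editorial note: Solr's SpellCheckComponent is the usual tool for this. A minimal sketch; the field name and index directory are assumptions, and the component still has to be added to your query handler's last-components list:]

```xml
<!-- solrconfig.xml -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">text</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```

Query with &spellcheck=true&spellcheck.collate=true (and &spellcheck.build=true once to build the dictionary) to get "did you mean" style suggestions back in the response.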

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Search-for-misspelled-search-term-tp3529961p3529961.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr highlighting isn't work!

2011-11-22 Thread VladislavLysov
Thank you for the help! I made the following request and everything works:
https://localhost:8443/solr/alfresco/select?wt=standard&q=TYPE:%22{http://www.test.com/test/test/model/content/0.1}field%22&indent=on&hl=true&hl.fl=TYPE
 
But now I have another problem. I have a field named
@{http://www.test.com/test/eln/model/content/0.1}label.__ with the value
{en}label1, and I make the request
https://localhost:8443/solr/alfresco/select?q=@{http://www.test.com/test/test/model/content/0.1}label.__:{en}label1&wt=xml
but it returns this exception:
HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot
parse
'@{http://www.agilent.com/openlab/eln/model/content/0.1}label.__:{en}label1':
Encountered "}" at line 1, column 54. Was expecting one of: TO ...
RANGEEX_QUOTED ... RANGEEX_GOOP ...
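[Editorial note: the braces in {en}label1 are query-parser syntax (range endpoints), so they need escaping before being placed in q. A small Python sketch of the escaping, similar in spirit to SolrJ's ClientUtils.escapeQueryChars; the exact character set may differ, so check your client library:]

```python
import re

def escape_query_chars(s: str) -> str:
    # Backslash-escape characters the Lucene query parser treats as syntax.
    return re.sub(r'([+\-&|!(){}\[\]^"~*?:\\/])', r'\\\1', s)

print(escape_query_chars('{en}label1'))  # \{en\}label1
```

Note that the @{...}label.__ field name contains the same special characters, so the part to the left of the colon needs escaping too.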

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-highlighting-isn-t-work-tp3527701p3530016.html
Sent from the Solr - User mailing list archive at Nabble.com.