Loading data to solr from mysql
Can anybody suggest a way to load data from MySQL to Solr directly?

--
View this message in context: http://lucene.472066.n3.nabble.com/Loading-data-to-solr-from-mysql-tp2442184p2442184.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Loading data to solr from mysql
http://wiki.apache.org/solr/DataImportHandler

On Mon, Feb 7, 2011 at 11:16 AM, Bagesh Sharma <mail.bag...@gmail.com> wrote:

Can anybody suggest a way to load data from MySQL to Solr directly?
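To make the wiki pointer concrete, here is a minimal data-config.xml sketch for pulling rows out of MySQL with the DataImportHandler. The database name, table, columns, and credentials are all invented for illustration; adjust them to your own schema:

```xml
<!-- data-config.xml: a hypothetical MySQL -> Solr import -->
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/mydb"
              user="solr_reader"
              password="secret"/>
  <document>
    <entity name="product"
            query="SELECT id, name, description FROM products">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
      <field column="description" name="description"/>
    </entity>
  </document>
</dataConfig>
```

The handler itself is registered in solrconfig.xml, and the import is then kicked off with a request such as /dataimport?command=full-import.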
Solr Error
Please share your insights on the error below.

Regards,
Prasad

Exception in thread "Timer-1" java.lang.OutOfMemoryError: Java heap space
        at org.mortbay.util.URIUtil.decodePath(URIUtil.java:285)
        at org.mortbay.jetty.HttpURI.getDecodedPath(HttpURI.java:395)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:486)
        at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
        at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

Exception in thread "Lucene Merge Thread #0" org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: Java heap space
        at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:351)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:315)
Caused by: java.lang.OutOfMemoryError: Java heap space
        at org.apache.lucene.util.UnicodeUtil.UTF16toUTF8(UnicodeUtil.java:236)
        at org.apache.lucene.store.IndexOutput.writeString(IndexOutput.java:103)
        at org.apache.lucene.index.FieldsWriter.writeField(FieldsWriter.java:231)
        at org.apache.lucene.index.FieldsWriter.addDocument(FieldsWriter.java:268)
        at org.apache.lucene.index.SegmentMerger.copyFieldsNoDeletions(SegmentMerger.java:451)
        at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:352)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:153)
        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5112)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4675)
        at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:235)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291)
Re: Solr Error
I have already allocated about 2 GB: -Xmx2048m.

Regards,
Prasad

On 7 February 2011 18:17, Ahmet Arslan <iori...@yahoo.com> wrote:

Pl share your insights on the error.
java.lang.OutOfMemoryError: Java heap space

What happens if you increase the Java heap space? java -Xmx1g -jar start.jar
Re: Possible Memory Leaks / Upgrading to a Later Version of Solr or Lucene
Heap usage can spike after a commit: existing caches are still in use while new caches are being generated and/or autowarmed. Can you confirm this is the case?

On Friday 28 January 2011 00:34:42 Simon Wistow wrote:

On Tue, Jan 25, 2011 at 01:28:16PM +0100, Markus Jelsma said:

Are you sure you need CMS incremental mode? It's only advised when running on a machine with one or two processors. If you have more, you should consider disabling the incremental flags.

I'll test again, but we added those to get better performance - not much, but there did seem to be an improvement. The problem seems to be not in average use, but that occasionally there's a huge spike in load (there doesn't seem to be a particular killer query) and Solr just never recovers.

Thanks,
Simon

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
Re: Solr Error
What is your index size, and how much RAM do you have?

-----
Thanx:
Grijesh
http://lucidimagination.com
Re: Solr Indexing Performance
On Sat, Feb 5, 2011 at 2:06 PM, Darx Oman <darxo...@gmail.com> wrote:

I indexed 1000 pdf files with the same configuration; it completed in about 32 min.

So it seems like your indexing scales at least as well as the number of PDF documents that you have. While this might be good news in your case, it is difficult to estimate an expected indexing rate when indexing from documents.

Regards,
Gora
DIH keeps felling during full-import
I'm receiving the following exception when trying to perform a full-import (~30 hours). Any idea on ways I could fix this? Is there an easy way to use DIH to break apart a full-import into multiple pieces, i.e. 3 mini-imports instead of 1 large import? Thanks.

Feb 7, 2011 5:52:33 AM org.apache.solr.handler.dataimport.JdbcDataSource closeConnection
SEVERE: Ignoring Error when closing connection
com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Communications link failure during rollback(). Transaction resolution unknown.
        at sun.reflect.GeneratedConstructorAccessor27.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:532)
        at com.mysql.jdbc.Util.handleNewInstance(Util.java:407)
        at com.mysql.jdbc.Util.getInstance(Util.java:382)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1013)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:987)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:982)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:927)
        at com.mysql.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:4751)
        at com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4345)
        at com.mysql.jdbc.ConnectionImpl.close(ConnectionImpl.java:1564)
        at org.apache.solr.handler.dataimport.JdbcDataSource.closeConnection(JdbcDataSource.java:399)
        at org.apache.solr.handler.dataimport.JdbcDataSource.close(JdbcDataSource.java:390)
        at org.apache.solr.handler.dataimport.DataConfig$Entity.clearCache(DataConfig.java:174)
        at org.apache.solr.handler.dataimport.DataConfig$Entity.clearCache(DataConfig.java:165)
        at org.apache.solr.handler.dataimport.DataConfig.clearCaches(DataConfig.java:332)
        at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:360)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:391)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)

Feb 7, 2011 5:52:33 AM org.apache.solr.handler.dataimport.JdbcDataSource closeConnection
SEVERE: Ignoring Error when closing connection
java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@1a797305 is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:934)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:931)
        at com.mysql.jdbc.MysqlIO.checkForOutstandingStreamingData(MysqlIO.java:2724)
        at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1895)
        at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2140)
        at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2620)
        at com.mysql.jdbc.ConnectionImpl.rollbackNoChecks(ConnectionImpl.java:4854)
        at com.mysql.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:4737)
        at com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4345)
        at com.mysql.jdbc.ConnectionImpl.close(ConnectionImpl.java:1564)
        at org.apache.solr.handler.dataimport.JdbcDataSource.closeConnection(JdbcDataSource.java:399)
        at org.apache.solr.handler.dataimport.JdbcDataSource.close(JdbcDataSource.java:390)
        at org.apache.solr.handler.dataimport.DataConfig$Entity.clearCache(DataConfig.java:174)
        at org.apache.solr.handler.dataimport.DataConfig.clearCaches(DataConfig.java:332)
        at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:360)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:391)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)

Feb 7, 2011 7:03:29 AM org.apache.solr.handler.dataimport.JdbcDataSource closeConnection
SEVERE: Ignoring Error when closing connection
com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Communications link failure during rollback(). Transaction resolution unknown.
        at sun.reflect.GeneratedConstructorAccessor27.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:532)
        at com.mysql.jdbc.Util.handleNewInstance(Util.java:407)
        at com.mysql.jdbc.Util.getInstance(Util.java:382)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1013)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:987)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:982)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:927)
        at com.mysql.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:4751)
Re: DIH keeps failing during full-import
Typo in subject.

On 2/7/11 7:59 AM, Mark wrote:

I'm receiving the following exception when trying to perform a full-import (~30 hours). Any idea on ways I could fix this? Is there an easy way to use DIH to break apart a full-import into multiple pieces? IE 3 mini-imports instead of 1 large import? Thanks.

Feb 7, 2011 5:52:33 AM org.apache.solr.handler.dataimport.JdbcDataSource closeConnection
SEVERE: Ignoring Error when closing connection
com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Communications link failure during rollback(). Transaction resolution unknown.
[...]
Re: DIH keeps felling during full-import
On Mon, Feb 7, 2011 at 9:29 PM, Mark <static.void@gmail.com> wrote:

I'm receiving the following exception when trying to perform a full-import (~30 hours). Any idea on ways I could fix this? Is there an easy way to use DIH to break apart a full-import into multiple pieces? IE 3 mini-imports instead of 1 large import? Thanks.

Feb 7, 2011 5:52:33 AM org.apache.solr.handler.dataimport.JdbcDataSource closeConnection
SEVERE: Ignoring Error when closing connection
com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Communications link failure during rollback(). Transaction resolution unknown.
[...]

This looks like a network issue, or some other failure in communicating with the MySQL database. Is that a possibility? Also, how many records are you importing, what is the data size, what is the quality of the network connection, etc.?

One way to break up the number of records imported at a time is to shard your data at the database level, but the advisability of this option depends on whether there is a more fundamental issue.

Regards,
Gora
Re: DIH keeps felling during full-import
Full import is around 6M documents, which when completed totals around 30GB in size. I'm guessing it could be a database connectivity problem, because I also see these types of errors on delta-imports, which could be anywhere from 20K to 300K records.

On 2/7/11 8:15 AM, Gora Mohanty wrote:

This looks like a network issue, or some other failure in communicating with the MySQL database. Is that a possibility? Also, how many records are you importing, what is the data size, what is the quality of the network connection, etc.?

One way to break up the number of records imported at a time is to shard your data at the database level, but the advisability of this option depends on whether there is a more fundamental issue.

Regards,
Gora
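One way to split a DIH full-import into smaller runs (a sketch, not something confirmed in this thread): DIH substitutes request parameters into entity queries, so you can bound the SQL with hypothetical startId/endId parameters and run several smaller imports back to back, passing clean=false on all but the first so earlier chunks are not deleted:

```xml
<!-- data-config.xml: entity bounded by request parameters
     (table, columns, and parameter names are our own choices) -->
<entity name="item"
        query="SELECT id, title, body FROM items
               WHERE id &gt;= ${dataimporter.request.startId}
                 AND id &lt; ${dataimporter.request.endId}">
  <field column="id" name="id"/>
  <field column="title" name="title"/>
  <field column="body" name="body"/>
</entity>
```

The first chunk would then be requested as /dataimport?command=full-import&startId=0&endId=2000000, and subsequent chunks with clean=false and the next id range.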
hl.snippets in solr 3.1
hi all, I'm trying to get a result like:

blabla <b>keyword</b> blabla ... blabla <b>keyword</b> blabla ...

so I'd like to show 2 fragments. I've added these settings:

<str name="hl.simple.pre"><![CDATA[<b>]]></str>
<str name="hl.simple.post"><![CDATA[</b>]]></str>
<str name="f.content.hl.fragsize">20</str>
<str name="f.content.hl.snippets">3</str>

but I get only 1 fragment: blabla <b>keyword</b> blabla. Am I trying to do it the right way? Is it something that can be done via changes in the config file? How do I add a separator between fragments (like "..." in this example)? thanks.
Re: HTTP ERROR 400 undefined field: *
Thanks Otis, I'll give that a try.

Jed.

On 02/06/2011 08:06 PM, Otis Gospodnetic wrote:

Yup, here it is, warning about needing to reindex: http://twitter.com/#!/lucene/status/28694113180192768

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message -----
From: Erick Erickson <erickerick...@gmail.com>
To: solr-user@lucene.apache.org
Sent: Sun, February 6, 2011 9:43:00 AM
Subject: Re: HTTP ERROR 400 undefined field: *

I *think* that there was a post a while ago saying that if you were using trunk 3_x, one of the recent changes required re-indexing, but don't quote me on that. Have you tried that?

Best
Erick

On Fri, Feb 4, 2011 at 2:04 PM, Jed Glazner <jglaz...@beyondoblivion.com> wrote:

Sorry for the lack of details. It's all clear in my head.. :)

We checked out the head revision from the 3.x branch a few weeks ago (https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/). We picked up r1058326. We upgraded from a previous checkout (r960098). I am using our customized schema.xml and the solrconfig.xml from the old revision with the new checkout. After upgrading I just copied the data folders from each core into the new checkout (hoping I wouldn't have to re-index the content, as this takes days). Everything seems to work fine, except that now I can't get the score to return. The stack trace is attached.

I also saw this warning in the logs; not sure exactly what it's talking about:

Feb 3, 2011 8:14:10 PM org.apache.solr.core.Config getLuceneVersion
WARNING: the luceneMatchVersion is not specified, defaulting to LUCENE_24 emulation. You should at some point declare and reindex to at least 3.0, because 2.4 emulation is deprecated and will be removed in 4.0. This parameter will be mandatory in 4.0.
Here is my request handler. The actual fields here are different than what is in mine, but I'm a little uncomfortable publishing how our company's search service works to the world:

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="defType">edismax</str>
    <bool name="tv">true</bool>
    <!-- standard fields to query on -->
    <str name="qf">field_a^2 field_b^2 field_c^4</str>
    <!-- automatic phrase boosting! -->
    <str name="pf">field_d^10</str>
    <!-- boost function: we'll comment this out for now because we're passing it to solr as a parameter. Once we finalize the exact function we should move it here and take it out of the query string. -->
    <!-- <str name="bf">log(linear(field_e,0.001,1))^10</str> -->
    <str name="tie">0.1</str>
  </lst>
  <arr name="last-components">
    <str>tvComponent</str>
  </arr>
</requestHandler>

Anyway, hopefully this is enough info; let me know if you need more.

Jed.

On 02/03/2011 10:29 PM, Chris Hostetter wrote:

: I was working on a checkout of the 3.x branch from about 6 months ago.
: Everything was working pretty well, but we decided that we should update and
: get what was at the head. However after upgrading, I am now getting this

FWIW: please be specific. Head of what? The 3x branch? Or trunk? What revision in svn does that correspond to? (the svnversion command will tell you)

: HTTP ERROR 400 undefined field: *
:
: If I clear the fl parameter (default is set to *,score) then it works fine
: with one big problem: no score data. If I try to set fl=score I get the same
: error except it says undefined field: score?!
:
: This works great in the older version; what changed? I've googled for about
: an hour now and I can't seem to find anything.

I can't reproduce this using either trunk (r1067044) or 3x (r1067045); all of these queries work just fine...
http://localhost:8983/solr/select/?q=*
http://localhost:8983/solr/select/?q=solr&fl=*,score
http://localhost:8983/solr/select/?q=solr&fl=score
http://localhost:8983/solr/select/?q=solr

...you'll have to provide us with a *lot* more details to help understand why you might be getting an error (like: what your configs look like, what the request looks like, what the full stack trace of your error is in the logs, etc...)

-Hoss
Re: hl.snippets in solr 3.1
--- On Mon, 2/7/11, alex <alex.alex.alex.9...@gmail.com> wrote:

From: alex <alex.alex.alex.9...@gmail.com>
Subject: hl.snippets in solr 3.1
To: solr-user@lucene.apache.org
Date: Monday, February 7, 2011, 7:38 PM

hi all, I'm trying to get a result like: blabla <b>keyword</b> blabla ... blabla <b>keyword</b> blabla ... so I'd like to show 2 fragments. [...]

These two should be declared under the defaults section of your requestHandler:

<int name="f.content.hl.fragsize">20</int>
<int name="f.content.hl.snippets">3</int>

Where did you define them? Under the highlighting section in solrconfig.xml?
Re: HTTP ERROR 400 undefined field: *
: The stack trace is attached. I also saw this warning in the logs not sure

From your attachment...

SEVERE: org.apache.solr.common.SolrException: undefined field: score
        at org.apache.solr.handler.component.TermVectorComponent.process(TermVectorComponent.java:142)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1357)

...this is one of the key pieces of info that was missing from your earlier email: that you are using the TermVectorComponent. It's likely that something changed in the TVC on 3x between the two versions you were using, and that change freaks out now on * or score in the fl.

You still haven't given us an example of the full URLs you are using that trigger this error. (It's possible there is something slightly off in your syntax - we don't know, because you haven't shown us.)

All in: this sounds like a newly introduced bug in TVC; please post the details into a new Jira issue.

As to the warning you asked about...

: Feb 3, 2011 8:14:10 PM org.apache.solr.core.Config getLuceneVersion
: WARNING: the luceneMatchVersion is not specified, defaulting to LUCENE_24
: emulation. You should at some point declare and reindex to at least 3.0,
: because 2.4 emulation is deprecated and will be removed in 4.0. This parameter
: will be mandatory in 4.0.

If you look at the example configs on the 3x branch it should be explained. It's basically just a new feature that lets you specify which quirks of the underlying lucene code you want (so on upgrading you are in control of whether you eliminate old quirks or not).

-Hoss
Re: hl.snippets in solr 3.1
Ahmet Arslan wrote:

These two should be declared under the defaults section of your requestHandler:
<int name="f.content.hl.fragsize">20</int>
<int name="f.content.hl.snippets">3</int>
Where did you define them? Under the highlighting section in solrconfig.xml?

yes, it's in solrconfig.xml:

<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <int name="rows">10</int>
    <str name="echoParams">explicit</str>
    <str name="qf">content^0.5 title^1.2</str>
    <str name="q.alt">*:*</str>
    <bool name="hl">true</bool>
    <str name="hl.fl">title content url</str>
    <str name="f.content.hl.fragsize">20</str>
    <str name="f.content.hl.snippets">3</str>
    <str name="f.content.hl.alternateField">content</str>
    <str name="f.title.hl.fragsize">0</str>
    <str name="f.title.hl.alternateField">title</str>
    <str name="f.url.hl.fragsize">0</str>
    <str name="f.url.hl.alternateField">url</str>
  </lst>
</requestHandler>

I don't include the whole config, because there are just default values in it. I can see changes if I change fragsize, but no hl.snippets.

And in schema.xml I have:

<fieldType name="text" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true" enablePositionIncrements="true"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true" enablePositionIncrements="true"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

and <field name="content" type="text" stored="true" indexed="true"/>
How to search for special chars like ä from ae?
Hi! I want to search for special chars like "mäcman" by giving similarly worded simple characters like "maecman". I used <filter class="solr.ASCIIFoldingFilterFactory"/> and I'm getting "mäcman" from "macman", but I'm not able to get "mäcman" from "maecman". Can this be done using any other filter?

Thanks,
Anithya
Re: hl.snippets in solr 3.1
I can see changes if I change fragsize, but no hl.snippets.

Maybe your text is too short to generate more than one snippet? What happens when you increase the hl.maxAnalyzedChars parameter? hl.maxAnalyzedChars=2147483647
Re: SolrIndexWriter was not closed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!
: While reloading a core I got this following error, when does this
: occur ? Prior to this exception I do not see anything wrong in the logs.

Well, there are really two distinct types of errors in your log...

: [#|2011-02-01T13:02:36.697-0500|SEVERE|sun-appserver2.1|org.apache.solr.servlet.SolrDispatchFilter|_ThreadID=25;_ThreadName=httpWorkerThread-9001-5;_RequestID=450f6337-1f5c-42bc-a572-f0924de36b56;|org.apache.lucene.store.LockObtainFailedException:
: Lock obtain timed out: NativeFSLock@/data/solr/core/solr-data/index/lucene-7dc773a074342fa21d7d5ba09fc80678-write.lock
: at org.apache.lucene.store.Lock.obtain(Lock.java:85)
: at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1565)
: at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1421)

...this is error #1, indicating that for some reason the IndexWriter Solr was trying to create wasn't able to get a native filesystem lock on your index directory -- is it possible you have two instances of Solr (or two Solr cores) trying to re-use the same data directory? (Diagnosing exactly why you got this error also requires knowing what filesystem you are using.)

: [#|2011-02-01T13:02:40.330-0500|SEVERE|sun-appserver2.1|org.apache.solr.update.SolrIndexWriter|_ThreadID=82;_ThreadName=Finalizer;_RequestID=121fac59-7b08-46b9-acaa-5c5462418dc7;|SolrIndexWriter
: was not closed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE
: LEAK!!!|#]
:
: [#|2011-02-01T13:02:40.330-0500|SEVERE|sun-appserver2.1|org.apache.solr.update.SolrIndexWriter|_ThreadID=82;_ThreadName=Finalizer;_RequestID=121fac59-7b08-46b9-acaa-5c5462418dc7;|SolrIndexWriter
: was not closed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE
: LEAK!!!|#]

...these errors are warning you that something very unexpected was discovered when the Garbage Collector tried to clean up the SolrIndexWriter -- it found that the SolrIndexWriter had never been formally closed.

In normal operation, this might indicate the existence of a bug in code not managing its resources properly -- and in fact, it does indicate the existence of a bug, in that evidently a "Lock obtain timed out" failure doesn't cause the SolrIndexWriter to be closed -- but in your case it's not really something to be worried about: it's just a cascading effect of the first error.

-Hoss
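For context (not part of Hoss's reply): the lock factory that failed above is chosen in solrconfig.xml. A sketch of the relevant settings as they looked in the Solr 1.4/3.x era; the values shown are common defaults and are worth checking against your own version's example config:

```xml
<!-- solrconfig.xml (inside <indexDefaults>/<mainIndex> in older versions) -->
<!-- "native" corresponds to the NativeFSLock in the exception above -->
<lockType>native</lockType>
<!-- clears a stale write lock left by a crashed JVM at startup;
     unsafe if two cores genuinely share the same index directory -->
<unlockOnStartup>false</unlockOnStartup>
```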
Indexing a date from a POJO
Hi,

I would like to know if the code below is correct, because the date is not displayed well in Luke. I have a POJO with a date defined as follows:

public class SolrPositionDTO {
    @Field
    private String address;

    @Field
    private Date beginDate;

And in the schema config file the field is defined as:

<field name="beginDate" type="date" indexed="true" stored="true"/>

Thanks in advance for your help,
JCD

--
Jean-Claude Dauphin
jc.daup...@gmail.com
jc.daup...@afus.unesco.org
http://kenai.com/projects/j-isis/
http://www.unesco.org/isis/
http://www.unesco.org/idams/
http://www.greenstone.org
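One thing worth checking (an observation from outside the thread): Solr's date fields store the canonical ISO-8601 UTC form, so whatever the POJO mapping produces should reach the index looking like the XML below. The address value is invented for illustration:

```xml
<add>
  <doc>
    <field name="address">123 Example Street</field>
    <field name="beginDate">2011-02-07T10:30:00Z</field>
  </doc>
</add>
```

If Luke shows something other than the yyyy-MM-dd'T'HH:mm:ss'Z' shape, the client-side date conversion is the first place to look.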
Re: hl.snippets in solr 3.1
Ahmet Arslan wrote:

I can see changes if I change fragsize, but no hl.snippets.

Maybe your text is too short to generate more than one snippet? What happens when you increase the hl.maxAnalyzedChars parameter? hl.maxAnalyzedChars=2147483647

It's working now. I guess it was a problem with the config file. Thanks!
RE: How to search for special chars like ä from ae?
Hi Anithya,

There is a mapping file for MappingCharFilterFactory that behaves the same as ASCIIFoldingFilterFactory: mapping-FoldToASCII.txt, located in Solr's example conf/ directory in Solr 3.1+. You can rename and then edit this file to map ä to ae, ü to ue, etc. (look for "WITH DIAERESIS" to quickly find characters with umlauts in the mapping file). There is a commented-out example of using MappingCharFilterFactory in Solr's example schema.xml.

If you are using Solr 1.4.X, you can download the mapping-FoldToASCII.txt file here (from the 3.x source tree): http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/example/solr/conf/mapping-FoldToASCII.txt

Please consider donating your work back to Solr if you decide to go this route.

Good luck,
Steve

-----Original Message-----
From: Anithya [mailto:surysha...@gmail.com]
Sent: Monday, February 07, 2011 12:09 PM
To: solr-user@lucene.apache.org
Subject: How to search for special chars like ä from ae?

Hi! I want to search for special chars like "mäcman" by giving similarly worded simple characters like "maecman". I used <filter class="solr.ASCIIFoldingFilterFactory"/> and I'm getting "mäcman" from "macman", but I'm not able to get "mäcman" from "maecman". Can this be done using any other filter?

Thanks,
Anithya
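A minimal sketch of how the pieces fit together, assuming you rename the edited file to something like mapping-german.txt (the field type name and file name below are our own choices, not from the thread):

```xml
<!-- schema.xml: run the char filter before tokenizing,
     so both indexed text and queries are folded the same way -->
<fieldType name="text_folded" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-german.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- mapping-german.txt would contain lines such as:
     "ä" => "ae"
     "ö" => "oe"
     "ü" => "ue"
     "ß" => "ss"
-->
```

Because "mäcman" in a document and "maecman" in a query then both fold to "maecman" before tokenizing, the two match.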
Spatial Solr - Representing a bounding box and searching for it
Hi everyone, I have been looking for a searching solution for spatial data and since I have worked with Solr before, I wanted to give the spatial features a try. 1. What is the default datum used for the LatLong type? Is it WGS 84? 2. What is the best way to represent a region (a bounding box to be exact) and search for it? Spatial metadata records usually contain an element that specifies the region that the record is representing. For example North American Profile (NAP) has the following element: <gmd:EX_GeographicBoundingBox> <gmd:westBoundLongitude> <gco:Decimal>-95.15605</gco:Decimal> </gmd:westBoundLongitude> <gmd:eastBoundLongitude> <gco:Decimal>-74.34407</gco:Decimal> </gmd:eastBoundLongitude> <gmd:southBoundLatitude> <gco:Decimal>41.436108</gco:Decimal> </gmd:southBoundLatitude> <gmd:northBoundLatitude> <gco:Decimal>54.61572</gco:Decimal> </gmd:northBoundLatitude> </gmd:EX_GeographicBoundingBox> which defines the bounding box containing the region. As far as I've seen, spatial fields in Solr are limited to points only. I tried using four LatLong to represent four corners of the region, but I couldn't get the bbox query to return the correct box: adding another sfield to the query had no effect. I also tried to use the fq=store:[45,-94 TO 46,-93] example by changing the store field into multivalue and putting the upper-right and lower-left into my document and using them as the range, but that also didn't work. So any suggestions on how to get this working? Sepehr
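Since point-only spatial fields can't index a region, one common workaround (an assumption here, not something proposed in the thread) is to index the four bounds as separate numeric fields (e.g. minLat, maxLat, minLon, maxLon — hypothetical names) and filter with range queries implementing the standard interval-overlap test. A stdlib-only sketch of that logic:

```java
// Sketch: the interval-overlap test behind a "boxes intersect" filter.
// Field names (minLat, maxLat, minLon, maxLon) are hypothetical; in Solr
// the test becomes range filter queries on those four fields.
public class BBox {
    final double minLat, maxLat, minLon, maxLon;

    BBox(double minLat, double maxLat, double minLon, double maxLon) {
        this.minLat = minLat; this.maxLat = maxLat;
        this.minLon = minLon; this.maxLon = maxLon;
    }

    // Two boxes overlap iff they overlap on both axes; as filter queries this
    // is roughly fq=maxLat:[qMinLat TO *] plus fq=minLat:[* TO qMaxLat], etc.
    static boolean intersects(BBox a, BBox b) {
        return a.minLat <= b.maxLat && b.minLat <= a.maxLat
            && a.minLon <= b.maxLon && b.minLon <= a.maxLon;
    }

    public static void main(String[] args) {
        // the NAP example box from the post, against a small query box inside it
        BBox nap = new BBox(41.436108, 54.61572, -95.15605, -74.34407);
        BBox query = new BBox(45.0, 46.0, -94.0, -93.0);
        System.out.println(intersects(nap, query)); // prints "true"
    }
}
```

The multiValued two-point attempt in the post fails because Solr treats each value independently; four scalar fields keep the corners addressable in a single document.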
Re: dynamic fields revisited
Just so anyone else can know and save themselves 1/2 hour if they spend 4 minutes searching. When putting a dynamic field into a document into an index, the name of the field RETAINS the 'constant' part of the dynamic field name. Example - If a dynamic integer field is named '*_i' in the schema.xml file, __and__ you insert a field named 'my_integer_i', which matches the globbed field name '*_i', __then__ the name of the field will be 'my_integer_i' in the index and in your GETs/(updating)POSTs to the index on that document and __NOT__ 'my_integer' like I was kind of hoping that it would be :-( I.e., the suffix (or prefix if you set it up that way) will NOT be dropped. I was hoping that everything except the globbing character, '*', would just be a flag to the query processor and disappear after being 'noticed'. Not so :-) -- View this message in context: http://lucene.472066.n3.nabble.com/dynamic-fields-revisited-tp2161080p2447814.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: dynamic fields revisited
It would be quite annoying if it behaved as you were hoping for. This way it is possible to use different field types (and analyzers) for the same field value. In faceting, for example, this can be important because you should use analyzed fields for q and fq but unanalyzed fields for facet.field. The same goes for sorting and range queries, where you can use the same field value to end up in different field types, one for sorting and one for a range query. Without the prefix or suffix of the dynamic field, one must statically declare the fields beforehand and lose the dynamic advantage. Just so anyone else can know and save themselves 1/2 hour if they spend 4 minutes searching. When putting a dynamic field into a document into an index, the name of the field RETAINS the 'constant' part of the dynamic field name. Example - If a dynamic integer field is named '*_i' in the schema.xml file, __and__ you insert a field named 'my_integer_i', which matches the globbed field name '*_i', __then__ the name of the field will be 'my_integer_i' in the index and in your GETs/(updating)POSTs to the index on that document and __NOT__ 'my_integer' like I was kind of hoping that it would be :-( I.e., the suffix (or prefix if you set it up that way) will NOT be dropped. I was hoping that everything except the globbing character, '*', would just be a flag to the query processor and disappear after being 'noticed'. Not so :-)
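For anyone following along, a sketch of the schema.xml declarations being discussed (field types assumed from the stock example schema):

```
<!-- one glob per type; a document field my_integer_i matches *_i and
     keeps its full name in the index, as described above -->
<dynamicField name="*_i" type="int"    indexed="true" stored="true"/>
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
```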
Re: DIH keeps failing during full-import
You're probably better off in this instance creating your own process based on SolrJ and your jdbc-driver-of-choice. DIH doesn't provide much in the way of fine-grained control over all aspects of the process, and at 30+ hours I suspect you want some better control. FWIW, SolrJ is not very hard at all to use for this kind of thing. Best Erick On Mon, Feb 7, 2011 at 10:59 AM, Mark static.void@gmail.com wrote: Typo in subject On 2/7/11 7:59 AM, Mark wrote: I'm receiving the following exception when trying to perform a full-import (~30 hours). Any idea on ways I could fix this? Is there an easy way to use DIH to break apart a full-import into multiple pieces? IE 3 mini-imports instead of 1 large import? Thanks. Feb 7, 2011 5:52:33 AM org.apache.solr.handler.dataimport.JdbcDataSource closeConnection SEVERE: Ignoring Error when closing connection com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Communications link failure during rollback(). Transaction resolution unknown. 
at sun.reflect.GeneratedConstructorAccessor27.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:532) at com.mysql.jdbc.Util.handleNewInstance(Util.java:407) at com.mysql.jdbc.Util.getInstance(Util.java:382) at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1013) at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:987) at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:982) at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:927) at com.mysql.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:4751) at com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4345) at com.mysql.jdbc.ConnectionImpl.close(ConnectionImpl.java:1564) at org.apache.solr.handler.dataimport.JdbcDataSource.closeConnection(JdbcDataSource.java:399) at org.apache.solr.handler.dataimport.JdbcDataSource.close(JdbcDataSource.java:390) at org.apache.solr.handler.dataimport.DataConfig$Entity.clearCache(DataConfig.java:174) at org.apache.solr.handler.dataimport.DataConfig$Entity.clearCache(DataConfig.java:165) at org.apache.solr.handler.dataimport.DataConfig.clearCaches(DataConfig.java:332) at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:360) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:391) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370) Feb 7, 2011 5:52:33 AM org.apache.solr.handler.dataimport.JdbcDataSource closeConnection SEVERE: Ignoring Error when closing connection java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@1a797305 is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries. 
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:934) at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:931) at com.mysql.jdbc.MysqlIO.checkForOutstandingStreamingData(MysqlIO.java:2724) at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1895) at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2140) at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2620) at com.mysql.jdbc.ConnectionImpl.rollbackNoChecks(ConnectionImpl.java:4854) at com.mysql.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:4737) at com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4345) at com.mysql.jdbc.ConnectionImpl.close(ConnectionImpl.java:1564) at org.apache.solr.handler.dataimport.JdbcDataSource.closeConnection(JdbcDataSource.java:399) at org.apache.solr.handler.dataimport.JdbcDataSource.close(JdbcDataSource.java:390) at org.apache.solr.handler.dataimport.DataConfig$Entity.clearCache(DataConfig.java:174) at org.apache.solr.handler.dataimport.DataConfig.clearCaches(DataConfig.java:332) at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:360) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:391) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370) Feb 7, 2011 7:03:29 AM org.apache.solr.handler.dataimport.JdbcDataSource closeConnection SEVERE: Ignoring Error when closing connection com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Communications link failure during rollback(). Transaction resolution unknown. at sun.reflect.GeneratedConstructorAccessor27.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:532) at
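Erick's SolrJ suggestion aside, the "3 mini-imports instead of 1" idea is workable by driving each run over a primary-key slice. The slicing itself is plain arithmetic; here is a stdlib-only sketch (how each lo/hi pair reaches the SQL, e.g. as a WHERE id BETWEEN clause fed to a separate import run, is an assumption left out):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split an id range [minId, maxId] into n contiguous chunks,
// each usable as "WHERE id BETWEEN lo AND hi" in a separate import run.
public class RangeSplitter {
    static List<long[]> split(long minId, long maxId, int n) {
        List<long[]> chunks = new ArrayList<long[]>();
        long span = maxId - minId + 1;
        long lo = minId;
        for (int i = 0; i < n; i++) {
            // spread the remainder over the first (span % n) chunks
            long size = span / n + (i < span % n ? 1 : 0);
            if (size == 0) continue;
            chunks.add(new long[] { lo, lo + size - 1 });
            lo += size;
        }
        return chunks;
    }

    public static void main(String[] args) {
        for (long[] c : split(1, 10, 3))
            System.out.println(c[0] + ".." + c[1]); // prints 1..4, 5..7, 8..10
    }
}
```

Smaller slices also mean each JDBC connection stays open for a fraction of the 30 hours, which sidesteps the "communications link failure during rollback" seen above.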
Re: geodist and spacial search
Thanks Bill, much simpler :-) On Sat, Feb 5, 2011 at 3:56 AM, Bill Bell billnb...@gmail.com wrote: Why not just: q=*:* fq={!bbox} sfield=store pt=49.45031,11.077721 d=40 fl=store sort=geodist() asc http://localhost:8983/solr/select?q=*:*&sfield=store&pt=49.45031,11.077721&d=40&fq={!bbox}&sort=geodist%28%29%20asc That will sort, and filter up to 40km. No need for the fq={!func}geodist() sfield=store pt=49.45031,11.077721 Bill On 2/4/11 4:30 AM, Eric Grobler impalah...@googlemail.com wrote: Hi Grant, Thanks for the tip This seems to work: q=*:* fq={!func}geodist() sfield=store pt=49.45031,11.077721 fq={!bbox} sfield=store pt=49.45031,11.077721 d=40 fl=store sort=geodist() asc On Thu, Feb 3, 2011 at 7:46 PM, Grant Ingersoll gsing...@apache.org wrote: Use a filter query? See the {!geofilt} stuff on the wiki page. That gives you your filter to restrict down your result set, then you can sort by exact distance to get your sort of just those docs that make it through the filter. On Feb 3, 2011, at 10:24 AM, Eric Grobler wrote: Hi Erick, Thanks I saw that example, but I am trying to sort by distance AND specify the max distance in 1 query. The reason is: running bbox on 2 million documents with a 20km distance takes only 200ms. Sorting 2 million documents by distance takes over 1.5 seconds! So it will be much faster for solr to first filter the 20km documents and then to sort them. Regards Ericz On Thu, Feb 3, 2011 at 1:27 PM, Erick Erickson erickerick...@gmail.com wrote: Further down that very page G... Here's an example of sorting by distance ascending: - ...q=*:*&sfield=store&pt=45.15,-93.85&sort=geodist() asc http://localhost:8983/solr/select?wt=json&indent=true&fl=name,store&q=*:*&sfield=store&pt=45.15,-93.85&sort=geodist()%20asc The key is just the sort=geodist(), I'm pretty sure that's independent of the bbox, but I could be wrong. 
Best Erick On Wed, Feb 2, 2011 at 11:18 AM, Eric Grobler impalah...@googlemail.com wrote: Hi In http://wiki.apache.org/solr/SpatialSearch there is an example of a bbox filter and a geodist function. Is it possible to do a bbox filter and sort by distance - combine the two? Thanks Ericz -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem docs using Solr/Lucene: http://www.lucidimagination.com/search
Re: dynamic fields revisited
You can change the match to be my* and then insert the name you want. Bill Bell Sent from mobile On Feb 7, 2011, at 4:15 PM, gearond gear...@sbcglobal.net wrote: Just so anyone else can know and save themselves 1/2 hour if they spend 4 minutes searching. When putting a dynamic field into a document into an index, the name of the field RETAINS the 'constant' part of the dynamic field name. Example - If a dynamic integer field is named '*_i' in the schema.xml file, __and__ you insert a field named 'my_integer_i', which matches the globbed field name '*_i', __then__ the name of the field will be 'my_integer_i' in the index and in your GETs/(updating)POSTs to the index on that document and __NOT__ 'my_integer' like I was kind of hoping that it would be :-( I.e., the suffix (or prefix if you set it up that way) will NOT be dropped. I was hoping that everything except the globbing character, '*', would just be a flag to the query processor and disappear after being 'noticed'. Not so :-) -- View this message in context: http://lucene.472066.n3.nabble.com/dynamic-fields-revisited-tp2161080p2447814.html Sent from the Solr - User mailing list archive at Nabble.com.
q.alt=*:* for every request?
Hi, I use the dismax handler with solr 1.4. Sometimes my request comes with q and fq, and other times it doesn't come with q (only fq and q.alt=*:*). Is it OK if I send q.alt=*:* for every request? Does it have side effects on performance? -- Chhorn Chamnap http://chamnapchhorn.blogspot.com/
Re: Possible Memory Leaks / Upgrading to a Later Version of Solr or Lucene
On Mon, Feb 07, 2011 at 02:06:00PM +0100, Markus Jelsma said: Heap usage can spike after a commit. Existing caches are still in use and new caches are being generated and/or auto warmed. Can you confirm this is the case? We see spikes after replication, which I suspect is, as you say, because of the ensuing commit. What we seem to have found is that when we weren't using the concurrent GC, stop-the-world GC runs would kill the app. Now that we're using CMS we occasionally find ourselves in situations where the app still has memory left over but the load on the machine spikes, the GC duty cycle goes to 100 and the app never recovers. Restarting usually helps, but sometimes we have to take the machine out of the load balancer, wait for a number of minutes and then put it back in. We're working on two hypotheses. Firstly, we're CPU bound somehow and at some point we cross some threshold where GC or something else is just unable to keep up. So whilst it looks like instantaneous death of the app, it's actually gradual resource exhaustion where the definition of 'gradual' is 'a very short period of time' (as opposed to some cataclysmic infinite-loop bug somewhere). Either that or ... Secondly, there's some sort of Query Of Death that kills machines. We just haven't found it yet, even when replaying logs. Or some combination of both. Or other things. It's maddeningly frustrating. We've also got to try deploying a custom solr.war and try using the MMapDirectory to see if that helps with anything.
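Not from the thread, but for anyone debugging the same symptoms: JVM options of this era commonly tried alongside CMS, plus GC logging to distinguish gradual exhaustion from a query of death. Heap sizes are placeholders; verify every flag against your own JVM before use:

```
java -Xms4g -Xmx4g \
     -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
     -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly \
     -verbose:gc -XX:+PrintGCDetails -Xloggc:gc.log \
     -jar start.jar
```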
Re: Searching for negative numbers very slow
On Fri, Jan 28, 2011 at 12:29:18PM -0500, Yonik Seeley said: That's odd - there should be nothing special about negative numbers. Here are a couple of ideas: - if you have a really big index and querying by a negative number is much more rare, it could just be that part of the index wasn't cached by the OS and so the query needs to hit the disk. This can happen with any term and a really big index - nothing special for negatives here. - if -1 is a really common value, it can be slower. Is fq=uid:\-2 or other negative numbers really slow also? This was my first thought, but -1 is relatively common and we have other numbers just as common. Interestingly enough, fq=uid:-1 fq=foo:bar fq=alpha:omega is much (4x) slower than q=uid:-1 AND foo:bar AND alpha:omega, but only when searching for that number. I'm going to wave my hands here and say something like 'Maybe something to do with the field caches?'
Re: q.alt=*:* for every request?
There is no measurable performance penalty when setting the parameter, except maybe the execution of the query with a high value for rows. To make things easy, you can define q.alt=*:* as a default in your request handler. No need to specify it in the URL. Hi, I use dismax handler with solr 1.4. Sometimes, my request comes with q and fq, and others doesn't come with q (only fq and q.alt=*:*). It's quite ok if I send q.alt=*:* for every request? Does it have side effects on performance?
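A sketch of that request-handler default in solrconfig.xml (the handler name is an assumption):

```
<!-- solrconfig.xml: make q.alt the default so clients need not send it -->
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="q.alt">*:*</str>
  </lst>
</requestHandler>
```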
Solr Analysis Package
I'd like to use the filter factories in the org.apache.solr.analysis package for tokenizing text in a separate application. I need to chain a couple tokenizers together like Solr does on indexing and query parsing. I have looked into the TokenizerChain class to do this. I have successfully implemented a tokenization chain, but was wondering if there is an established way to do this. I just hacked together something that happened to work. Below is a code snippet. Any advice would be appreciated. Dependencies: solr-core-1.4.0, lucene-core-2.9.3, lucene-snowball-2.9.3. I am not tied to these and could use different versions. P.S. Is this more of a question for the solr-dev mailing list? TokenizerFactory tokenizer = new WhitespaceTokenizerFactory(); Map<String,String> args = new HashMap<String,String>(); SnowballPorterFilterFactory porterFilter = new SnowballPorterFilterFactory(); porterFilter.init(args); args = new HashMap<String,String>(); args.put("generateWordParts", "1"); args.put("generateNumberParts", "1"); args.put("catenateWords", "1"); args.put("catenateNumbers", "1"); args.put("catenateAll", "0"); WordDelimiterFilterFactory wordFilter = new WordDelimiterFilterFactory(); wordFilter.init(args); LowerCaseFilterFactory lowercaseFilter = new LowerCaseFilterFactory(); TokenFilterFactory[] filters = new TokenFilterFactory[] { wordFilter, lowercaseFilter, porterFilter }; TokenizerChain chain = new TokenizerChain(tokenizer, filters); TokenStream stream = chain.tokenStream(null, new StringReader(builder.toString())); TermAttribute tm = (TermAttribute) stream.getAttribute(TermAttribute.class); while (stream.incrementToken()) { System.out.println(tm.term()); }
Re: Performance optimization of Proximity/Wildcard searches
Hi, Yes, assuming you didn't change the index files, say by optimizing the index, the hot portions of the index should remain in the OS cache unless something else kicked them out. Re other thread - I don't think I have those messages any more. Otis --- Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Salman Akram salman.ak...@northbaysolutions.net To: solr-user@lucene.apache.org Sent: Mon, February 7, 2011 2:49:44 AM Subject: Re: Performance optimization of Proximity/Wildcard searches Only a couple of thousand documents are added daily, so the old OS cache should still be useful since old documents remain the same, right? Also can you please comment on my other thread related to Term Vectors? Thanks! On Sat, Feb 5, 2011 at 8:40 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Yes, OS cache mostly remains (obviously index files that are no longer around are going to remain in the OS cache for a while, but will be useless and gradually replaced by new index files). How long warmup takes is not relevant here, but what queries you use to warm up the index and how much you auto-warm the caches. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Salman Akram salman.ak...@northbaysolutions.net To: solr-user@lucene.apache.org Sent: Sat, February 5, 2011 4:06:54 AM Subject: Re: Performance optimization of Proximity/Wildcard searches Correct me if I am wrong. Commit in index flushes SOLR cache but of course OS cache would still be useful? If an index is updated every hour then a warm up that takes less than 5 mins should be more than enough, right? On Sat, Feb 5, 2011 at 7:42 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Salman, Warming up may be useful if your caches are getting decent hit ratios. Plus, you are warming up the OS cache when you warm up. 
Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Salman Akram salman.ak...@northbaysolutions.net To: solr-user@lucene.apache.org Sent: Fri, February 4, 2011 3:33:41 PM Subject: Re: Performance optimization of Proximity/Wildcard searches I know so we are not really using it for regular warm-ups (in any case the index is updated on an hourly basis). Just tried a few times to compare results. The issue is I am not even sure if warming up is useful for such regular updates. On Fri, Feb 4, 2011 at 5:16 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Salman, I only skimmed your email, but wanted to say that this part sounds a little suspicious: Our warm up script currently executes all distinct queries in our logs having count 5. It was run yesterday (with all the indexing update every It sounds like this will make warmup take a long time, assuming you have more than a handful of distinct queries in your logs. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Salman Akram salman.ak...@northbaysolutions.net To: solr-user@lucene.apache.org; t...@statsbiblioteket.dk Sent: Tue, January 25, 2011 6:32:48 AM Subject: Re: Performance optimization of Proximity/Wildcard searches By warmed index you only mean warming the SOLR cache or OS cache? As I said our index is updated every hour so I am not sure how much SOLR cache would be helpful but OS cache should still be helpful, right? I haven't compared the results with a proper script but from manual testing here are some of the observations. 'Recent' queries which are in cache of course return immediately (only if they are exactly the same - even if they took 3-4 mins the first time). I will need to test how many recent queries stay in cache, but still this would work only for very common queries. 
Users can run different queries and I want at least those to be at an 'acceptable' level (5-10 secs) even if not very fast. Our warm-up script currently executes all distinct queries in our logs having count 5. It was run yesterday
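For reference, the "handful of representative queries" approach Otis hints at is usually configured as a searcher event listener rather than an external script. A solrconfig.xml sketch (the query values are placeholders):

```
<!-- solrconfig.xml: static warming queries fired on each new searcher;
     a few representative queries usually beat replaying the whole log -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">some common query</str><str name="fq">category:books</str></lst>
    <lst><str name="q">another common query</str><str name="sort">price asc</str></lst>
  </arr>
</listener>
```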
Re: nested faceting ?
I think what you are trying to achieve is called taxonomy faceting. There is a solution for that. Check the slides for taxonomy faceting: http://www.lucidimagination.com/solutions/webcasts/faceting However, I don't know if you are able to render the hierarchy all at once. The solution I point to is for one hierarchy at a time. devices (100) accessories (1000) If devices is selected/clicked, then show -- Samsung (50) Sharp (50) If Accessories is selected/clicked, then show -- Samsung (500) Apple (500) -- View this message in context: http://lucene.472066.n3.nabble.com/nested-faceting-tp2389841p2449439.html Sent from the Solr - User mailing list archive at Nabble.com.
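One common encoding behind this one-level-at-a-time pattern (an assumption on my part, not taken from the linked slides) is to index every ancestor path with a depth prefix, then facet with facet.prefix set to the selected depth and path. The token-building part is trivial to sketch in plain Java:

```java
// Sketch of the depth-prefixed path encoding often used for taxonomy
// facets: index every ancestor path of a document's category, then query
// with facet.prefix=<depth>/<selected path>/ to count only the children.
public class TaxonomyPath {
    // Tokens to index for one document, e.g. for devices > samsung
    static String[] encode(String... segments) {
        String[] tokens = new String[segments.length];
        StringBuilder path = new StringBuilder();
        for (int i = 0; i < segments.length; i++) {
            path.append('/').append(segments[i]);
            tokens[i] = i + path.toString(); // "0/devices", "1/devices/samsung"
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String t : encode("devices", "samsung"))
            System.out.println(t);
        // clicking "devices" then maps to facet.prefix=1/devices/
    }
}
```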
Solr n00b question: writing a custom QueryComponent
Hi all, Been a solr user for a while now, and now I need to add some functionality to solr for which I'm trying to write a custom QueryComponent. Couldn't get much help from websearch. So, turning to solr-user for help. I'm implementing search functionality for (micro)blog aggregation. We use solr 1.4.1. In the current solr config, the title and content fields are both indexed and stored in solr. Storing takes up a lot of space, even with compression. I'd like to store the title and content fields in MySQL instead of solr and retrieve these fields for results with an id lookup. Using the DataImportHandler won't work because we store just the title and content fields in MySQL. The rest of the fields are in solr itself. I wrote a custom component by extending QueryComponent, overriding only the finishStage(ResponseBuilder) function, where I try to retrieve the necessary records from MySQL. This is how the new QueryComponent is specified in solrconfig.xml: <searchComponent name="query" class="org.apache.solr.handler.component.TestSolr" /> I see that the component is getting loaded from the solr debug output: <lst name="prepare"> <double name="time">1.0</double> <lst name="org.apache.solr.handler.component.TestSolr"> <double name="time">0.0</double> </lst> ... But the strange thing is that the finishStage() function is not being called before returning results. What am I missing? Secondly, members like ResponseBuilder._responseDocs are visible only in the package org.apache.solr.handler.component. How do I access the results from my package? If you folks can give me links to a wiki or some sample custom QueryComponent, that'll be great. -- Thanks in advance. Ishwar. Just another resurrected Neozoic Archosaur comics. http://www.flickr.com/photos/mojosaurus/sets/72157600257724083/
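As an aside on the registration shown above: an alternative to redefining the stock "query" component is to register the custom component under its own name and splice it into a handler's component list. A solrconfig.xml sketch (handler and component names here are assumptions, not from the post):

```
<searchComponent name="testsolr" class="org.apache.solr.handler.component.TestSolr"/>
<requestHandler name="/mysearch" class="solr.SearchHandler">
  <arr name="components">
    <str>query</str>
    <str>testsolr</str>
    <str>debug</str>
  </arr>
</requestHandler>
```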
Re: q.alt=*:* for every request?
To be able to see this well, it would be lovely to have a switch that activates logging of the query expansion result. The Dismax QParserPlugin is particularly powerful there, so it'd be nice to see what's happening. Any logging category I need to activate? paul On 8 Feb 2011, at 03:22, Markus Jelsma wrote: There is no measurable performance penalty when setting the parameter, except maybe the execution of the query with a high value for rows. To make things easy, you can define q.alt=*:* as a default in your request handler. No need to specify it in the URL. Hi, I use dismax handler with solr 1.4. Sometimes, my request comes with q and fq, and others doesn't come with q (only fq and q.alt=*:*). It's quite ok if I send q.alt=*:* for every request? Does it have side effects on performance?