Does DataImportHandler do any sanitizing?
I am pulling some fields from a MySQL database using DataImportHandler, and some of them contain invalid XML. Does DataImportHandler do any kind of filtering/sanitizing to ensure the data will index cleanly, or is it all on me? Example bad data: orphaned ampersands ("Peanut Butter & Jelly"), curly quotes ("we’re").

-jsd-
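For what it's worth, DataImportHandler reads rows over JDBC and hands them to the indexer in-process, so there is no XML serialization step for the field values themselves; escaping mainly matters when a client posts update XML directly, where the predefined entities must be escaped (field names below are illustrative):

```xml
<add>
  <doc>
    <!-- & must be written as &amp; in hand-built update XML -->
    <field name="title">Peanut Butter &amp; Jelly</field>
    <!-- curly quotes are fine as UTF-8, or as character references -->
    <field name="body">we&#8217;re</field>
  </doc>
</add>
```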
Re: Running out of memory
On Sun, Aug 12, 2012 at 12:31 PM, Alexey Serba wrote:
> > It would be vastly preferable if Solr could just exit when it gets a
> > memory error, because we have it running under daemontools, and that
> > would cause an automatic restart.
> -XX:OnOutOfMemoryError="<cmd args>; <cmd args>"
> Run user-defined commands when an OutOfMemoryError is first thrown.
>
> > Does Solr require the entire index to fit in memory at all times?
> No.
>
> But it's hard to say about your particular problem without additional
> information. How often do you commit? Do you use faceting? Do you sort
> by Solr fields and if yes what are those fields? And you should also
> check caches.

I upgraded to solr-3.6.1 and an extra-large Amazon instance (15GB RAM), so we'll see if that helps. So far no out-of-memory errors.
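The suggested flag plus daemontools amounts to a start line like the following (the heap size and kill command are illustrative, not from the thread; the JVM expands %p to its own pid):

```shell
# force the JVM to die on OOM so daemontools/supervise restarts it
java -Xmx2g -XX:OnOutOfMemoryError="kill -9 %p" -jar start.jar
```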
Re: DataImportHandler WARNING: Unable to resolve variable
That column does not allow NULL. It's definitely an empty string, but I'm using MySQL IF() to catch it and make sure it always has something.

On Thu, Aug 9, 2012 at 8:45 PM, Swati Swoboda wrote:
> Ah, my bad. I was incorrect - it was not actually indexing.
>
> @Jon - is there a possibility that your url_type is NULL, but not empty?
> Your if check only checks to see if it is empty, which is not the same as
> checking to see if it is null. If it is null, that's why you'd be having
> those errors - null values are just not accepted, it seems.
>
> Swati
>
> -----Original Message-----
> From: Swati Swoboda [mailto:sswob...@igloosoftware.com]
> Sent: Thursday, August 09, 2012 11:09 PM
> To: solr-user@lucene.apache.org
> Subject: RE: DataImportHandler WARNING: Unable to resolve variable
>
> I am getting a similar issue while using a TemplateTransformer. My
> fields *always* have a value as well - it is getting indexed correctly.
>
> Furthermore, the number of warnings I get seems arbitrary. I imported one
> document (debug mode) and I got roughly ~400 of those warning messages for
> the single field.
>
> -----Original Message-----
> From: Jon Drukman [mailto:jdruk...@gmail.com]
> Sent: Thursday, August 09, 2012 2:38 PM
> To: solr-user@lucene.apache.org
> Subject: DataImportHandler WARNING: Unable to resolve variable
>
> I'm trying to use DataImportHandler's delta-import functionality but I'm
> getting loads of these every time it runs:
>
> WARNING: Unable to resolve variable: article.url_type while parsing
> expression: article:${article.url_type}:${article.id}
>
> The definition looks like:
>
> <entity name="article"
>     query="... irrelevant ..."
>     deltaQuery="select id, 'dummy' as type_id FROM articles WHERE
>         (post_date > '${dataimporter.last_index_time}' OR updated_date >
>         '${dataimporter.last_index_time}') AND post_date <= NOW() AND status = 9"
>     deltaImportQuery="select id, article_seo_title,
>         DATE_FORMAT(post_date,'%Y-%m-%dT%H:%i:%sZ') post_date, subject,
>         body, IF(url_type='', 'article', url_type) url_type,
>         featured_image_url from articles WHERE id = ${dataimporter.delta.id}"
>     transformer="TemplateTransformer,HTMLStripTransformer">
>   <field ... template="article:${article.url_type}:${article.id}" />
> </entity>
>
> As you can see, I am always making sure that article.url_type has some
> value. Why am I getting the warning?
>
> -jsd-
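The column in this case turned out to be NOT NULL, but for the general case the thread raises, an IFNULL wrapper covers both NULL and the empty string (a sketch, not from the thread):

```sql
-- illustrative: map both NULL and '' to 'article'
SELECT IF(IFNULL(url_type, '') = '', 'article', url_type) AS url_type
FROM articles;
```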
Re: Connect to SOLR over socket file
On Fri, Aug 10, 2012 at 2:44 AM, Jason Axelson wrote:
> You're correct that there is an underlying problem I'm trying to
> solve. The underlying problem is that due to the security policies I
> cannot run another service that listens on a TCP port, but a unix
> domain socket would be okay. It looks like I might have to go with
> mysql full-text search or something like metasearch (I'm using Ruby on
> Rails).

MySQL full text search is pretty terrible. You'd be better off using Lucene directly. Who's in charge of your security policies? Can you get dispensation to listen on localhost only?
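If a localhost-only listener would satisfy the policy, the Jetty 6 that ships with Solr 3.x can be bound to the loopback interface by setting Host on the connector in example/etc/jetty.xml (a sketch; adjust to whichever connector class your jetty.xml actually defines):

```xml
<!-- example/etc/jetty.xml: bind Solr to 127.0.0.1 only -->
<Call name="addConnector">
  <Arg>
    <New class="org.mortbay.jetty.bio.SocketConnector">
      <Set name="Host">127.0.0.1</Set>
      <Set name="Port">8983</Set>
    </New>
  </Arg>
</Call>
```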
Re: /solr/admin/stats.jsp null pointer exception
On Wed, Aug 8, 2012 at 3:03 PM, Chris Hostetter wrote:
> I can't reproduce with the example configs -- it looks like you've
> tweaked the logging to use the XML file format. Any way to get the
> stacktrace of the "Caused by" exception so we can see what is null and
> where?

Here is the caused by:

Caused by: java.lang.NullPointerException
	at org.apache.solr.common.util.XML.escape(XML.java:197)
	at org.apache.solr.common.util.XML.escapeCharData(XML.java:79)
	at org.apache.jsp.admin.stats_jsp._jspService(org.apache.jsp.admin.stats_jsp:188)
	at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:109)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
	at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:389)
	... 29 more

> As a workaround, I would suggest switching to
> "/solr/admin/mbeans?stats=true" ... moving forward you'll have to, since
> stats.jsp has been removed in Solr 4.

Good to know. That's not as readable as the old format but it'll do for now. Thanks.

-jsd-
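The suggested workaround endpoint can be made a little easier to scan with the indent and wt parameters (an illustrative invocation, not from the thread):

```shell
curl 'http://localhost:8983/solr/admin/mbeans?stats=true&wt=json&indent=true'
```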
/solr/admin/stats.jsp null pointer exception
New install of Solr 3.6.1, getting a NullPointerException when trying to access admin/stats.jsp:

2012-08-08T17:55:09 138509624 694 org.apache.solr.servlet.SolrDispatchFilter SEVERE org.apache.solr.common.SolrException log 25
org.apache.jasper.JasperException: java.lang.NullPointerException
	at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:418)
	at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:486)
	at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:380)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
	at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
	at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:327)
	at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:283)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
	at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
	at org.mortbay.jetty.Server.handle(Server.java:326)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
	at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
	at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.lang.NullPointerException

Any ideas how to fix this?

-jsd-
Re: Solr always at 100% (or more) CPU
I thought this had to be a joke, but no, you were absolutely right. Fixed it right up! Unbelievable. Thanks so much!

-jsd-

On Mon, Jul 9, 2012 at 10:15 AM, Michael Della Bitta <michael.della.bi...@appinions.com> wrote:
> Are you perhaps being bitten by the leap second bug? Just happened to
> me last week.
>
> http://blog.wpkg.org/2012/07/01/java-leap-second-bug-30-june-1-july-2012-fix/
>
> Michael Della Bitta
>
> Appinions, Inc. -- Where Influence Isn’t a Game.
> http://www.appinions.com
>
> On Mon, Jul 9, 2012 at 1:13 PM, Jon Drukman wrote:
> > I have a very small Solr setup. The index is 32MB and there are only 8
> > fields, most of which are ints. I run a cron job every hour to use
> > DataImportHandler to do a full reimport of a database which has 42,600 rows.
> >
> > There is minimal traffic on the server. Maybe a few dozen queries a
> > minute. Usually way less than 1 per second. They look like this:
> >
> > INFO: [] webapp=/solr path=/select params={sort=add_date+desc&fl=content_id&start=0&q=*:*&wt=json&fq=tag_id:((10)+AND+(8))&rows=180} hits=35937 status=0 QTime=0
> > INFO: [] webapp=/solr path=/select params={sort=add_date+desc&fl=content_id&start=0&q=*:*&wt=json&fq=tag_id:((10)+AND+(791+9))&rows=72} hits=1651 status=0 QTime=6
> > INFO: [] webapp=/solr path=/select params={sort=add_date+desc&fl=content_id&start=0&q=*:*&wt=json&fq=tag_id:((2)+AND+(10)+AND+(20+24+16)+AND+(31+32+33+792+793))&rows=250} hits=6 status=0 QTime=1
> >
> > QTime looks good. That's milliseconds, right?
> >
> > Despite this, solr's java process is constantly using 100% or more CPU.
> > While writing this email I've seen it jump from 53% to 91% to 154%. It's
> > up and down all over the place.
> >
> > I'm worried what might happen if the traffic load actually shot up. This
> > doesn't seem healthy.
> >
> > I'm using the Jetty config from the example directory. Solr 3.5.0 straight
> > from apache.org.
> >
> > # java -version
> > java version "1.6.0_22"
> > OpenJDK Runtime Environment (IcedTea6 1.10.6) (amazon-52.1.10.6.44.amzn1-x86_64)
> > OpenJDK 64-Bit Server VM (build 20.0-b11, mixed mode)
> >
> > Amazon EC2 running Amazon's standard "Amazon Linux" distribution (basically CentOS)
> >
> > Any advice?
> >
> > Thanks
> > -jsd-
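The fix in the linked post boils down to resetting the system clock, which reportedly clears the kernel timer state that sends Java into a spin after the leap second; a commonly cited one-liner (run as root; a sketch, not from the thread):

```shell
date -s "$(LC_ALL=C date)"
```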
Solr always at 100% (or more) CPU
I have a very small Solr setup. The index is 32MB and there are only 8 fields, most of which are ints. I run a cron job every hour to use DataImportHandler to do a full reimport of a database which has 42,600 rows.

There is minimal traffic on the server. Maybe a few dozen queries a minute. Usually way less than 1 per second. They look like this:

INFO: [] webapp=/solr path=/select params={sort=add_date+desc&fl=content_id&start=0&q=*:*&wt=json&fq=tag_id:((10)+AND+(8))&rows=180} hits=35937 status=0 QTime=0
INFO: [] webapp=/solr path=/select params={sort=add_date+desc&fl=content_id&start=0&q=*:*&wt=json&fq=tag_id:((10)+AND+(791+9))&rows=72} hits=1651 status=0 QTime=6
INFO: [] webapp=/solr path=/select params={sort=add_date+desc&fl=content_id&start=0&q=*:*&wt=json&fq=tag_id:((2)+AND+(10)+AND+(20+24+16)+AND+(31+32+33+792+793))&rows=250} hits=6 status=0 QTime=1

QTime looks good. That's milliseconds, right?

Despite this, solr's java process is constantly using 100% or more CPU. While writing this email I've seen it jump from 53% to 91% to 154%. It's up and down all over the place.

I'm worried what might happen if the traffic load actually shot up. This doesn't seem healthy.

I'm using the Jetty config from the example directory. Solr 3.5.0 straight from apache.org.

# java -version
java version "1.6.0_22"
OpenJDK Runtime Environment (IcedTea6 1.10.6) (amazon-52.1.10.6.44.amzn1-x86_64)
OpenJDK 64-Bit Server VM (build 20.0-b11, mixed mode)

Amazon EC2 running Amazon's standard "Amazon Linux" distribution (basically CentOS).

Any advice?

Thanks
-jsd-
Re: Exception in DataImportHandler (stack overflow)
OK, setting the wait_timeout back to its previous value and adding readOnly didn't help; I got the stack overflow again. I re-upped the mysql timeout value again.

-jsd-

On Tue, May 15, 2012 at 2:42 PM, Jon Drukman wrote:
> I fixed it for now by upping the wait_timeout on the mysql server.
> Apparently Solr doesn't like having its connection yanked out from under
> it and/or isn't smart enough to reconnect if the server goes away. I'll
> set it back the way it was and try your readOnly option.
>
> Is there an option with DataImportHandler to have it transmit one or more
> arbitrary SQL statements after connecting? If there was, I could just send
> "SET wait_timeout=86400;" after connecting. That would probably prevent
> this issue.
>
> -jsd-
>
> On Tue, May 15, 2012 at 2:35 PM, Dyer, James wrote:
>> Shot in the dark here, but try adding readOnly="true" to your dataSource
>> tag.
>>
>> This sets autocommit to true and sets the Holdability to
>> ResultSet.CLOSE_CURSORS_AT_COMMIT. DIH does not explicitly close
>> resultsets and maybe if your JDBC driver also manages this poorly you could
>> end up with strange conditions like the one you're getting? It could be a
>> case where your data has grown just over the limit your setup can handle
>> under such an unfortunate circumstance.
>>
>> Let me know if this solves it. If so, we probably should open a bug
>> report and get this fixed in DIH.
>>
>> James Dyer
>> E-Commerce Systems
>> Ingram Content Group
>> (615) 213-4311
>>
>> -----Original Message-----
>> From: Jon Drukman [mailto:jdruk...@gmail.com]
>> Sent: Tuesday, May 15, 2012 4:12 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Exception in DataImportHandler (stack overflow)
>>
>> i don't think so, my config is straightforward:
>>
>> <dataConfig>
>>   <dataSource ... url="jdbc:mysql://x/xx"
>>       user="x" password="x" batchSize="-1" />
>>   <document>
>>     <entity name="content"
>>         query="select content_id, description, title, add_date from
>>             content_solr where active = '1'">
>>       <entity ... query="select tag_id from tags_assoc where content_id =
>>           '${content.content_id}'" />
>>       <entity ... query="select count(1) as likes from votes where content_id =
>>           '${content.content_id}'" />
>>       <entity ... query="select sum(views) as views from media_views mv join
>>           content_media cm USING (media_id) WHERE cm.content_id =
>>           '${content.content_id}'" />
>>     </entity>
>>   </document>
>> </dataConfig>
>>
>> i'm triggering the import with:
>>
>> http://localhost:8983/solr/dataimport?command=full-import&clean=true&commit=true
>>
>> On Tue, May 15, 2012 at 2:07 PM, Michael Della Bitta
>> <michael.della.bi...@appinions.com> wrote:
>> > Hi, Jon:
>> >
>> > Well, you don't see that every day!
>> >
>> > Is it possible that you have something weird going on in your DDL
>> > and/or queries, like a tree schema that now suddenly has a cyclical
>> > reference?
>> >
>> > Michael
>> >
>> > On Tue, May 15, 2012 at 4:33 PM, Jon Drukman wrote:
>> > > I have a machine which does a full update using DataImportHandler every
>> > > hour. It worked up until a little while ago. I did not change the
>> > > dataconfig.xml or version of Solr.
>> > >
>> > > Here is the beginning of the error in the log (the real thing runs for
>> > > thousands of lines):
>> > >
>> > > 2012-05-15 12:44:30.724166500 SEVERE: Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.StackOverflowError
>> > > 2012-05-15 12:44:30.724168500 at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:669)
>> > > 2012-05-15 12:44:30.724169500 at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
>> > > 2012-05-15 12:44:30.724171500 at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
>> > > 2012-05-15 12:44:30.724219500 at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
>> > > 2012-
Re: Exception in DataImportHandler (stack overflow)
I fixed it for now by upping the wait_timeout on the mysql server. Apparently Solr doesn't like having its connection yanked out from under it and/or isn't smart enough to reconnect if the server goes away. I'll set it back the way it was and try your readOnly option.

Is there an option with DataImportHandler to have it transmit one or more arbitrary SQL statements after connecting? If there was, I could just send "SET wait_timeout=86400;" after connecting. That would probably prevent this issue.

-jsd-

On Tue, May 15, 2012 at 2:35 PM, Dyer, James wrote:
> Shot in the dark here, but try adding readOnly="true" to your dataSource
> tag.
>
> This sets autocommit to true and sets the Holdability to
> ResultSet.CLOSE_CURSORS_AT_COMMIT. DIH does not explicitly close
> resultsets and maybe if your JDBC driver also manages this poorly you could
> end up with strange conditions like the one you're getting? It could be a
> case where your data has grown just over the limit your setup can handle
> under such an unfortunate circumstance.
>
> Let me know if this solves it. If so, we probably should open a bug
> report and get this fixed in DIH.
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
>
> -----Original Message-----
> From: Jon Drukman [mailto:jdruk...@gmail.com]
> Sent: Tuesday, May 15, 2012 4:12 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Exception in DataImportHandler (stack overflow)
>
> i don't think so, my config is straightforward:
>
> <dataConfig>
>   <dataSource ... url="jdbc:mysql://x/xx"
>       user="x" password="x" batchSize="-1" />
>   <document>
>     <entity name="content"
>         query="select content_id, description, title, add_date from
>             content_solr where active = '1'">
>       <entity ... query="select tag_id from tags_assoc where content_id =
>           '${content.content_id}'" />
>       <entity ... query="select count(1) as likes from votes where content_id =
>           '${content.content_id}'" />
>       <entity ... query="select sum(views) as views from media_views mv join
>           content_media cm USING (media_id) WHERE cm.content_id =
>           '${content.content_id}'" />
>     </entity>
>   </document>
> </dataConfig>
>
> i'm triggering the import with:
>
> http://localhost:8983/solr/dataimport?command=full-import&clean=true&commit=true
>
> On Tue, May 15, 2012 at 2:07 PM, Michael Della Bitta
> <michael.della.bi...@appinions.com> wrote:
> > Hi, Jon:
> >
> > Well, you don't see that every day!
> >
> > Is it possible that you have something weird going on in your DDL
> > and/or queries, like a tree schema that now suddenly has a cyclical
> > reference?
> >
> > Michael
> >
> > On Tue, May 15, 2012 at 4:33 PM, Jon Drukman wrote:
> > > I have a machine which does a full update using DataImportHandler every
> > > hour. It worked up until a little while ago. I did not change the
> > > dataconfig.xml or version of Solr.
> > >
> > > Here is the beginning of the error in the log (the real thing runs for
> > > thousands of lines):
> > >
> > > 2012-05-15 12:44:30.724166500 SEVERE: Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.StackOverflowError
> > > 2012-05-15 12:44:30.724168500 at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:669)
> > > 2012-05-15 12:44:30.724169500 at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
> > > 2012-05-15 12:44:30.724171500 at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
> > > 2012-05-15 12:44:30.724219500 at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
> > > 2012-05-15 12:44:30.724221500 at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
> > > 2012-05-15 12:44:30.724223500 at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
> > > 2012-05-15 12:44:30.724224500 Caused by: java.lang.StackOverflowError
> > > 2012-05-15 12:44:30.724225500 at java.lang.String.checkBounds(String.java:404)
> > > 2012-05-15 12:44:30.724234500 at java.lang.String.<init>(String.java:450)
> > > 2012-05-15 12:44:30.724235500 at java.lang.String.<init>(String.java:523)
> > > 2012-05-15 12:44:30.724236
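On the question about sending arbitrary SQL after connecting: MySQL Connector/J can set session variables straight from the JDBC URL, which, combined with the readOnly suggestion, would look roughly like this (host and credentials elided as in the thread; assumes a Connector/J version that supports sessionVariables):

```xml
<dataSource type="JdbcDataSource"
    url="jdbc:mysql://x/xx?sessionVariables=wait_timeout=86400&amp;autoReconnect=true"
    user="x" password="x" readOnly="true" batchSize="-1" />
```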
Re: Exception in DataImportHandler (stack overflow)
i don't think so, my config is straightforward:

<dataConfig>
  <dataSource ... url="jdbc:mysql://x/xx" user="x" password="x" batchSize="-1" />
  <document>
    <entity name="content"
        query="select content_id, description, title, add_date from content_solr where active = '1'">
      <entity ... query="select tag_id from tags_assoc where content_id = '${content.content_id}'" />
      <entity ... query="select count(1) as likes from votes where content_id = '${content.content_id}'" />
      <entity ... query="select sum(views) as views from media_views mv join content_media cm USING (media_id) WHERE cm.content_id = '${content.content_id}'" />
    </entity>
  </document>
</dataConfig>

i'm triggering the import with:

http://localhost:8983/solr/dataimport?command=full-import&clean=true&commit=true

On Tue, May 15, 2012 at 2:07 PM, Michael Della Bitta <michael.della.bi...@appinions.com> wrote:
> Hi, Jon:
>
> Well, you don't see that every day!
>
> Is it possible that you have something weird going on in your DDL
> and/or queries, like a tree schema that now suddenly has a cyclical
> reference?
>
> Michael
>
> On Tue, May 15, 2012 at 4:33 PM, Jon Drukman wrote:
> > I have a machine which does a full update using DataImportHandler every
> > hour. It worked up until a little while ago. I did not change the
> > dataconfig.xml or version of Solr.
> >
> > Here is the beginning of the error in the log (the real thing runs for
> > thousands of lines):
> >
> > 2012-05-15 12:44:30.724166500 SEVERE: Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.StackOverflowError
> > 2012-05-15 12:44:30.724168500 at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:669)
> > 2012-05-15 12:44:30.724169500 at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
> > 2012-05-15 12:44:30.724171500 at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
> > 2012-05-15 12:44:30.724219500 at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
> > 2012-05-15 12:44:30.724221500 at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
> > 2012-05-15 12:44:30.724223500 at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
> > 2012-05-15 12:44:30.724224500 Caused by: java.lang.StackOverflowError
> > 2012-05-15 12:44:30.724225500 at java.lang.String.checkBounds(String.java:404)
> > 2012-05-15 12:44:30.724234500 at java.lang.String.<init>(String.java:450)
> > 2012-05-15 12:44:30.724235500 at java.lang.String.<init>(String.java:523)
> > 2012-05-15 12:44:30.724236500 at java.net.SocketOutputStream.socketWrite0(Native Method)
> > 2012-05-15 12:44:30.724238500 at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
> > 2012-05-15 12:44:30.724239500 at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
> > 2012-05-15 12:44:30.724253500 at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
> > 2012-05-15 12:44:30.724254500 at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
> > 2012-05-15 12:44:30.724256500 at com.mysql.jdbc.MysqlIO.send(MysqlIO.java:3345)
> > 2012-05-15 12:44:30.724257500 at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1983)
> > 2012-05-15 12:44:30.724259500 at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2163)
> > 2012-05-15 12:44:30.724267500 at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2618)
> > 2012-05-15 12:44:30.724268500 at com.mysql.jdbc.StatementImpl.executeSimpleNonQuery(StatementImpl.java:1644)
> > 2012-05-15 12:44:30.724270500 at com.mysql.jdbc.RowDataDynamic.close(RowDataDynamic.java:198)
> > 2012-05-15 12:44:30.724271500 at com.mysql.jdbc.ResultSetImpl.realClose(ResultSetImpl.java:7617)
> > 2012-05-15 12:44:30.724273500 at com.mysql.jdbc.ResultSetImpl.close(ResultSetImpl.java:907)
> > 2012-05-15 12:44:30.724280500 at com.mysql.jdbc.StatementImpl.realClose(StatementImpl.java:2478)
> > 2012-05-15 12:44:30.724282500 at com.mysql.jdbc.ConnectionImpl.closeAllOpenStatements(ConnectionImpl.java:1584)
> > 2012-05-15 12:44:30.724283500 at com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4364)
> > 2012-05-15 12:44:30.724285500 at com.mysql.jdbc.ConnectionImpl.cleanup(ConnectionImpl.java:1360)
> > 2012-05-15 12:44:30.724286500 at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2652)
> > 2012-05-15 12:44:30.724321500 at com.mysql.jdbc.StatementImpl.executeSimpleNonQuery(StatementImpl.java:1644)
> > 2012-05-15 12:44:30.724322500 at com.mysql.jdbc.RowDataDynamic.close(RowDataDynamic.java:198)
> > 2012-05-15 12:44:30.724324500 at com.mysql.jdbc.ResultSetImpl.realClose(ResultSetImpl.java:7617)
> > 2012-05-15 12:44:30.724325500 at com.mysql.jdbc.ResultSetImpl.close(ResultSetImpl.java:907)
> > 2012-05-15 12:44:30.724327500 at com.mysql.jd
Facet auto-suggest
I don't even know what to call this feature. Here's a website that shows the problem: http://pulse.audiusanews.com/pulse/index.php Notice that you can end up in a situation where there are no results. For example, in order, press: People, Performance, Technology, Photos. The client wants it so that when you click a button, it disables buttons that would lead to a dead end. In other words, after clicking Technology, the Photos button would be disabled. Can Solr help with this? -jsd-
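Solr's faceting supports this directly: send the current button selections as fq filters and facet on the remaining button fields; any facet value that comes back with a count of 0 under the current filters is a dead end, so its button can be disabled. Field and value names below are invented for illustration:

```shell
curl 'http://localhost:8983/solr/select?q=*:*&fq=section:technology&facet=true&facet.field=section&facet.mincount=0&rows=0&wt=json'
```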
Re: Case insensitive but number sensitive string?
Ahmet Arslan <...@yahoo.com> writes:

> > I want a string field that is case insensitive. This is what I tried:
> >
> > <fieldType ... class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
> >
> > However, it is matching "opengl" for "opengl128". I want exact string
> > matches, but I want them case-insensitive. What did I do wrong?
>
> class="solr.StrField" should be class="solr.TextField"

This is what I ended up with. Seems to work:
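The definition that followed got lost from the archive; the standard recipe matching the advice above, a TextField that keeps the whole input as a single token and lowercases it, looks like this (the type name is an assumption):

```xml
<fieldType name="cistring" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- whole value as one token, then lowercase: exact but case-insensitive matches -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```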
Case insensitive but number sensitive string?
I want a string field that is case insensitive. This is what I tried:

However, it is matching "opengl" for "opengl128". I want exact string matches, but I want them case-insensitive. What did I do wrong?
Sorting - bad performance
The performance factors wiki says: "If you do a lot of field based sorting, it is advantageous to add explicitly warming queries to the "newSearcher" and "firstSearcher" event listeners in your solrconfig which sort on those fields, so the FieldCache is populated prior to any queries being executed by your users."

I've got an index with 24+ million docs of forum posts from users. I want to be able to get a given user's posts sorted by date. It's taking 20 seconds right now. What would I put in newSearcher/firstSearcher to make that quicker? Is there any other general approach I can use to speed up sorting?

The schema looks like this (cistring is a case-insensitive string type I created):
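Acting on the wiki advice for "a given user's posts sorted by date", a warming entry in solrconfig.xml could look like the following (post_date is an assumed field name; a matching entry would go under the firstSearcher listener too):

```xml
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <!-- rows=0 keeps the warm-up cheap; the sort populates the FieldCache -->
      <str name="q">*:*</str>
      <str name="sort">post_date desc</str>
      <str name="rows">0</str>
    </lst>
  </arr>
</listener>
```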
Shutdown hook executing for a long time
2011-02-16 11:32:45.489::INFO: Shutdown hook executing 2011-02-16 11:35:36.002::INFO: Shutdown hook complete The shutdown time seems to be proportional to the amount of time that Solr has been running. If I immediately restart and shut down again, it takes a fraction of a second. What causes it to take so long to shut down and is there anything I can do to make it happen quicker?
DataImportHandler: regex debugging
I am trying to use the regex transformer but it's not returning anything. Either my regex is wrong, or I've done something else wrong in the setup of the entity. Is there any way to debug this? Making a change and waiting 7 minutes to reindex the entity sucks.

This returns columns that are either null, or have some comma-separated strings. I want the bit up to the first comma, if it exists. Ideally I could have it log the query and the input/output of the field statements.
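The two pieces might look like this: a RegexTransformer field that keeps everything before the first comma (entity, table, and column names below are placeholders, not from the message), and DIH's debug mode, which runs a limited import and echoes the queries and transformed rows without a full reindex:

```xml
<entity name="items" transformer="RegexTransformer"
        query="select id, labels_csv from items">
  <!-- group 1 captures the text before the first comma; the whole value if none -->
  <field column="first_label" sourceColName="labels_csv" regex="^([^,]+)" />
</entity>
```

For the waiting problem, hitting /solr/dataimport?command=full-import&debug=on&verbose=true&rows=10&commit=false (or the interactive admin/dataimport.jsp console in 3.x) runs against just a few rows.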
DataImportHandler: no queries when using entity=something
So I'm trying to update a single entity in my index using DataImportHandler:

http://solr:8983/solr/dataimport?command=full-import&entity=games

It ends near-instantaneously without hitting the database at all, apparently. Status shows:

0 0 0 0
Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.
2011-02-02 16:24:13
2011-02-02 16:24:13
0:0:0.20

The query isn't that extreme. It returns 8771 rows in about 3 seconds. How can I debug this?
Re: DataImportHandler: full import of a single entity
Ahmet Arslan <...@yahoo.com> writes:

> > I've got a DataImportHandler set up with 5 entities. I would like to do
> > a full import on just one entity. Is that possible?
>
> Yes, there is a parameter named entity for that.
> solr/dataimport?command=full-import&entity=myEntity

That seems to delete the entire index and replace it with only the contents of that one entity. Is there no way to leave the index alone for the other entities and just redo that one?
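That behavior is the clean parameter's default: full-import implies clean=true, which wipes the index first. Passing clean=false reimports just the named entity and leaves the other entities' documents alone:

```shell
curl 'http://localhost:8983/solr/dataimport?command=full-import&entity=myEntity&clean=false&commit=true'
```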
DataImportHandler: full import of a single entity
I've got a DataImportHandler set up with 5 entities. I would like to do a full import on just one entity. Is that possible? I worked around it temporarily by hand editing the dataimport.properties file and deleting the delta line for that one entity, and kicking off a delta. But for (hopefully) obvious reasons, delta is less efficient than full. -jsd-
Boosting on a document value
I've got a document with a "type" field. If the type is 1, I want to boost the document's relevancy, but type=1 is not a requirement. Types other than 1 should still be returned and scored as normal, just without the boost. How do I do this? -jsd-
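With the dismax handler this is a boost query: documents matching bq get extra score, but nothing is filtered out (the qf fields and the boost factor here are illustrative):

```shell
curl 'http://localhost:8983/solr/select?defType=dismax&qf=title+body&q=peanut+butter&bq=type:1^5'
```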
Re: Searching with AND + OR and spaces
Ahmet Arslan <...@yahoo.com> writes:

> > (title:"Call of Duty" OR subhead:"Call of Duty")
> >
> > No matches, despite the fact that there are many documents
> > that should match.
>
> Field types of title and subhead are important here. Do you use
> StopFilterFactory with enablePositionIncrements?

text is the default type that comes with schema.xml; it has the StopFilterFactory with enablePositionIncrements.

> What is your solr version?

1.4

> > So I left out the quotes, and it seems to work. But now when I try doing
> > things like
> >
> > title:Call of Duty OR subhead:Call of Duty AND type:4
>
> Try using parentheses:
> title:(Call of Duty) OR subhead:(Call of Duty) AND type:4

That seems to work a lot better, thanks!!
Searching with AND + OR and spaces
I want to search two fields for the phrase Call Of Duty. I tried this: (title:"Call of Duty" OR subhead:"Call of Duty") No matches, despite the fact that there are many documents that should match. So I left out the quotes, and it seems to work. But now when I try doing things like title:Call of Duty OR subhead:Call of Duty AND type:4 I get a lot of things like "called it!" and "i'm taking calls" but call of duty doesn't surface. How can I get what I want? -jsd-
Re: SEVERE: Could not start SOLR. Check solr/home property
On 4/27/10 12:04 PM, Chris Hostetter wrote:
> : SEVERE: Could not start SOLR. Check solr/home property
>
> it means something went horribly wrong when starting solr, and since this is
> frequently caused by either an incorrect explicit solr/home or an incorrect
> implicitly guessed solr home, that is mentioned in the error message as
> something to check. it's part of that error message because in cases where
> the solr home is the problem, that may be the only meaningful error message.
> in your case however, you have a much more specific error message...
>
> : java.lang.RuntimeException: java.io.IOException: read past EOF
> :   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068)
> :   at org.apache.solr.core.SolrCore.<init>(SolrCore.java:579)
> :   at ...
>
> ...so something seems to be seriously wrong with your index. farther down
> below this exception in your logs, there should be a more detailed exception
> explaining what file it had problems reading. If you could post that *full*
> error message it might help us track this down ... i would also suggest
> trying to use the CheckIndex tool to see if somehow your index got corrupted.

Yes, the index was corrupted. I don't know how it happened. Like I said, I set the box up months ago and forgot about it. They decided they wanted to use it so I tried to fire it up. After deleting the index, solr started again just fine without any configuration changes. Like I said, I have never explicitly set solr/home in any of my production configs, and it always works.

Thanks
-jsd-
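For reference, the CheckIndex tool mentioned above lives in the Lucene core jar that ships with Solr; an invocation looks roughly like this (jar name and index path vary by release and install):

```shell
java -cp lucene-core-*.jar org.apache.lucene.index.CheckIndex /path/to/solr/data/index
```

Adding -fix drops unreadable segments, and all documents in them, so back the index up first.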
Re: SEVERE: Could not start SOLR. Check solr/home property
On 4/26/10 1:18 PM, Siddhant Goel wrote: Did you by any chance set up multicore? Try passing in the path to the Solr home directory as -Dsolr.solr.home=/path/to/solr/home while you start Solr. Nope, no multicore. I destroyed the index and re-created it from scratch and now it works fine. No idea what was going on there. Luckily it takes < 10 minutes to create and the box is not in production yet.
SEVERE: Could not start SOLR. Check solr/home property
What does this error mean? SEVERE: Could not start SOLR. Check solr/home property I've had this solr installation working before, but I haven't looked at it in a few months. I checked it today and the web side is returning a 500 error, and the log file shows this when starting up: SEVERE: Could not start SOLR. Check solr/home property java.lang.RuntimeException: java.io.IOException: read past EOF at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068) at org.apache.solr.core.SolrCore.(SolrCore.java:579) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83) at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40) at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594) at org.mortbay.jetty.servlet.Context.startContext(Context.java:139) at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218) For the record, I've never explicitly set "solr/home" ever. It always "just worked". -jsd-
Boost documents based on a constant value in a field
I have a very simple schema: two integers and two text fields. I want to do full text searching on the text fields as normal. However, I want to boost all documents where question_source == 3 to the top. So the results should be: all documents where question_source == 3 first, sorted by relevance in the text fields, then all other documents sorted by text-field relevance. How do I achieve this? -jsd-
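One way to sketch this (field names and the ^100 factor are assumptions, not from the mail): with dismax, a boost query (bq) adds score to every document matching it, so a large boost on question_source:3 floats those docs above the rest while text relevance still orders documents within each group.

```python
# Hedged sketch of a dismax request with an additive boost query.
# Field names (qf) and the boost factor are assumptions.
from urllib.parse import urlencode

params = urlencode({
    "q": "user search terms",
    "qt": "dismax",
    "qf": "text_field_1^1.0 text_field_2^1.0",  # assumed field names
    "bq": "question_source:3^100",              # big additive boost
})
print("http://localhost:8983/solr/select?" + params)
```

Because bq is additive rather than a hard sort, the strict "all source-3 docs first" ordering only holds when the boost dwarfs the text scores; crank the factor up if a very relevant non-3 doc still outranks a weak 3.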
DataImportHandler delta-import confusion
First, let me just say that DataImportHandler is fantastic. It got my old mysql-php-xml index rebuild process down from 30 hours to 6 minutes. I'm trying to use the delta-import functionality now but failing miserably. Here's my entity tag (some SELECT statements reduced for readability):

deltaQuery="select moment_id from moments where date_modified > '${dataimporter.last_index_time}'"
deltaImportQuery="select [bunch of stuff] WHERE m.moment_id = '${dataimporter.delta.MOMENTID}'"
pk="MOMENTID"
transformer="TemplateTransformer"

When I look at the MySQL query log I see the date-modified query running fine and returning 3 rows. The deltaImportQuery, however, does not have the proper primary key in the where clause - it's just blank. I also tried changing it to ${moment.MOMENTID}. I don't really get the relation between the pk field and the ${dataimporter.delta.whatever} stuff. Help please! -jsd-
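For anyone else hitting this: as I understand it, the name inside ${dataimporter.delta.X} has to match a column actually returned by deltaQuery, case and all, and pk should name that same column. The deltaQuery above returns a column called moment_id, not MOMENTID, which would explain the blank substitution. A hedged sketch of the wiring (table/column names from the mail, the rest is assumption; the [bunch of stuff] elisions are kept as-is):

```xml
<entity name="moment" pk="moment_id" transformer="TemplateTransformer"
        query="select [bunch of stuff]"
        deltaQuery="select moment_id from moments
                    where date_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="select [bunch of stuff]
                          WHERE m.moment_id = '${dataimporter.delta.moment_id}'">
  <!-- field mappings as before -->
</entity>
```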
Re: stemming (maybe?) question
Yonik Seeley wrote: Not sure... I just took the stock solr example, and it worked fine. I inserted "o'meara" into example/exampledocs/solr.xml Advanced o'meara Full-Text Search Capabilities using Lucene then indexed everything: ./post.sh *.xml Then queried in various ways: q=o'meara q=omeara q=o%20meara All of the queries found the solr doc. i grabbed the original example schema.xml and made my username field use the following definition:

positionIncrementGap="100">
generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>

i removed the stopwords and porter stuff because for proper names i don't want that. seems to work fine now, thanks! -jsd-
Re: stemming (maybe?) question
Yonik Seeley wrote: On Thu, Mar 12, 2009 at 1:36 PM, Jon Drukman wrote: is it possible to make solr think that "omeara" and "o'meara" are the same thing? WordDelimiter would handle it if the document had "o'meara" (but you may or may not want the other stuff that comes with WordDelimiterFilter). You could also use a PatternReplaceFilter to normalize tokens like this. the document does have o'meara in it. i tried creating a new field type based on the wiki information. positionIncrementGap="100"> i reindexed everything but now any search on that field returns zero results. what did i do wrong? -jsd-
stemming (maybe?) question
is it possible to make solr think that "omeara" and "o'meara" are the same thing? -jsd-
Re: exceeded limit of maxWarmingSearchers
Otis Gospodnetic wrote: I'd say: "Make sure you don't commit more frequently than the time it takes for your searcher to warm up", or else you risk searcher overlap and pile-up. cool. i found a place in our code where we were committing the same thing twice in very rapid succession. fingers crossed that fixing that will solve this problem once and for all. thanks -jsd-
Re: exceeded limit of maxWarmingSearchers
Otis Gospodnetic wrote: Jon, If you can, don't commit on every update and that should help or fully solve your problem. is there any sort of heuristic or formula i can apply that can tell me when to commit? put it in a cron job and fire it once per hour? there are certain updates that are critical - we store privacy settings on certain data in the doc. if the user says that document 10 is private, we need to have the update reflected immediately. is there any way to have solr block everything until an update is committed? -jsd-
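On the blocking question: in Solr 1.x the explicit commit message can be told to wait. A hedged sketch of the XML you'd POST to the update handler (attribute defaults may differ across versions):

```xml
<!-- POST to /solr/update: the request returns only after the index is
     flushed and the new searcher is open, so a follow-up query will
     see the change -->
<commit waitFlush="true" waitSearcher="true"/>
```

That gives you read-your-writes for the privacy-critical updates while letting everything else batch up.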
Re: exceeded limit of maxWarmingSearchers
Otis Gospodnetic wrote: That should be fine (but apparently isn't), as long as you don't have some very slow machine or caches that are large and configured to copy a lot of data on commit. this is becoming more and more problematic. we have periods where we get 10 of these exceptions in a 4 second period. how do i diagnose the cause, or alternatively work around it? when you say "copy" are you talking about copyFields or something else? we commit on every update, but each update is very small... just a few hundred bytes on average.
Re: exceeded limit of maxWarmingSearchers
Yonik Seeley wrote: I'd advise setting it to a very low limit (like 2) and committing less often. Once you get too many overlapping searchers, things will slow to a crawl and that will just cause more to pile up. The root cause is simply too many commits in conjunction with warming too long. If you are using a dev version of Solr 1.4, you might try commitWithin instead of explicit commits. (see SOLR-793) Depending how long warming takes, you may want to lower autowarm counts. right now we commit on every update, but that's probably not more than once every few minutes. should i back it off? -jsd-
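Along the lines of Yonik's advice, the knobs live in solrconfig.xml. A hedged sketch (the numbers are assumptions to tune, not recommendations):

```xml
<!-- solrconfig.xml: cap concurrent warming searchers low so commit
     storms fail fast instead of piling up -->
<maxWarmingSearchers>2</maxWarmingSearchers>

<!-- lower autowarmCount so each new searcher finishes warming quickly -->
<filterCache class="solr.LRUCache" size="512"
             initialSize="512" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512"
                  initialSize="512" autowarmCount="0"/>
```

And on the client side, commitWithin (the Solr 1.4 dev feature from SOLR-793) lets each add say "make me visible within N ms" so Solr can coalesce commits instead of opening a searcher per update.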
exceeded limit of maxWarmingSearchers
I am getting hit by a storm of these once a day or so: SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=16, try again later. I keep bumping up maxWarmingSearchers. It's at 32 now. Is there any way to figure out what the "right" value is besides trial and error? Our site gets extremely minimal traffic so I'm really puzzled why the out-of-the-box settings are insufficient. The index has about 61000 documents, very small, and we do less than one query per second. -jsd-
Re: I get SEVERE: Lock obtain timed out
Yonik Seeley wrote: On Thu, Jan 29, 2009 at 1:16 PM, Jon Drukman wrote: Julian, have you had any luck figuring this out? My production instance just started having this problem. It seems to crop up after solr's been running for several hours. Our usage is very light (maybe one query every few seconds). I saw someone else mention an out of memory error - this machine has 8GB of RAM and is running 64bit Linux so it's all available to solr. Our index is very small - under 40MB. the solr process is using around 615MB of RAM according to top. I've only seen failure to remove the lock file either when an OOM exception occurred, or the JVM died or was killed. i guess it's possible that we hit an out of memory error and the followup lock errors just bumped it out of the log file rotation. i was running with multilog's default settings so my log files were getting thrown out very quickly. i just bumped up the JVM's max heap size and told multilog to keep way more log files so if this happens again hopefully i will be able to get more info on what happened. -jsd-
Re: permanently setting log level?
Vannia Rajan wrote: On Thu, Jan 29, 2009 at 11:55 PM, Jon Drukman wrote: if i go to /solr/admin/logging, i can set the "root" log level to WARNING, which is what i want. however, every time solr restarts, it is set back to INFO. Is there a way to get the WARNING level to stick permanently? Hi, You can set permanent logging-level by changing parameters in $CATALINA_HOME/conf/logging.properties Change all INFO to WARNING in the logging.properties where, $CATALINA_HOME is the path of your apache-tomcat. i'm not using tomcat, i'm using the default jetty setup that comes with solr. i grepped through the entire solr installation for 'INFO' but i don't see it. i don't really know anything about jetty other than i have to run java -jar start.jar to get it to run solr.
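In case anyone else hits this with the stock Jetty example: Solr logs through JDK logging (java.util.logging), so one hedged approach (an untested sketch; the file name and location are up to you) is a logging.properties file:

```properties
# logging.properties - set the root logger and console handler to WARNING
.level = WARNING
java.util.logging.ConsoleHandler.level = WARNING
```

Then start Jetty pointing at it: java -Djava.util.logging.config.file=logging.properties -jar start.jar. Grepping the installation for INFO finds nothing because INFO is the JDK's built-in default, not something written in a config file.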
permanently setting log level?
if i go to /solr/admin/logging, i can set the "root" log level to WARNING, which is what i want. however, every time solr restarts, it is set back to INFO. Is there a way to get the WARNING level to stick permanently? -jsd-
Re: I get SEVERE: Lock obtain timed out
Julian Davchev wrote: Hi, Are there any documents or anything I can read on how locks work and how I can control them? When do they occur, etc.? Because the only way I got out of this mess was restarting tomcat. SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: SingleInstanceLock: write.lock Cheers, Julian, have you had any luck figuring this out? My production instance just started having this problem. It seems to crop up after solr's been running for several hours. Our usage is very light (maybe one query every few seconds). I saw someone else mention an out of memory error - this machine has 8GB of RAM and is running 64bit Linux so it's all available to solr. Our index is very small - under 40MB. the solr process is using around 615MB of RAM according to top.
Handling proper names
Is there any way to tell Solr that Stephen is the same as Steven and Steve? Carl and Karl? Bobby/Bob/Robert, and so on... -jsd-
Re: exceeded limit of maxWarmingSearchers
Feak, Todd wrote: Have you looked at how long your warm up is taking? If it's taking longer to warm up a searcher than it does for you to do an update, you will be behind the curve and eventually run into this no matter how big that number. Most of them say warmupTime=0. It ranges from 0 to 37. I hope that is msec and not seconds!! As I said, this server is not even remotely loaded, and the index is very small right now - under 5 MB. -jsd-
exceeded limit of maxWarmingSearchers
I am getting this error quite frequently on my Solr installation: SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=8, try again later. I've done some googling but the common explanation of it being related to autocommit doesn't apply. Our server is not even in public use yet, it's serving maybe one query every second, or less. I don't understand what could be causing this. We do a commit on every update, but updates are very infrequent. One every few minutes, and it's a very small update as well. -jsd-
dismax and stopwords (was Re: dismax and long phrases)
Norberto Meijome wrote: On Tue, 07 Oct 2008 09:27:30 -0700 Jon Drukman <[EMAIL PROTECTED]> wrote: Yep, you can "fake" it by only using fieldsets (qf) that have a consistent set of stopwords. does that mean changing the query or changing the schema? Jon, - you change schema.xml to define which type each field is. The fieldType says whether you have stopwords or not. - you change solrconfig.xml to define which fields dismax will query on. i don't think you should have to change your query. i got it to work. the solution is: add a new field to the schema whose type has no stopword filter, use copyField to copy the stopworded version into that second, non-stopworded field, and add the non-stopword field to the dismax qf and pf lists. in this example, the stopword field is name and the non-stopword field is name_text:

qf: name^1.5 name_text^1.8 description^1.0 tags^0.5 location^0.6 user_name^0.4 misc^0.3 group_name^1.5
pf: name^1.5 name_text^1.8 description^1.0 group_name^1.5

restart solr and reindex everything. it now works. thanks for all the help! -jsd-
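Since the schema XML got eaten by the list software, here is a hedged reconstruction of the pieces described (the field names follow the mail; the type name and analyzer chain are assumptions based on the stock example schema with the StopFilter left out):

```xml
<!-- a text type with no StopFilterFactory in its analyzer -->
<fieldType name="text_nostop" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="name_text" type="text_nostop" indexed="true" stored="false"/>
<copyField source="name" dest="name_text"/>
```

Then name_text goes into the dismax handler's qf and pf lists with whatever boost you like, followed by a restart and full reindex.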
Re: dismax and long phrases
Mike Klaas wrote: On 6-Oct-08, at 11:20 AM, Jon Drukman wrote: Chris Hostetter wrote: It's not a bug in the implementation, it's a side effect of the basic tenet of how dismax works: since it inverts the input and creates a DisjunctionMaxQuery for each "word" in the input, any word that is valid in at least one of the "qf" fields generates a "should" clause that contributes to the MM count. you guys are going way over my head now. is there any way i could 'fake' it by adding a second field without stopwords, or something like that? Yep, you can "fake" it by only using fieldsets (qf) that have a consistent set of stopwords. does that mean changing the query or changing the schema? i'm sorry, this is all new to me. speak slowly and use words of one syllable or less, please. :) -jsd-
Re: dismax and long phrases
Chris Hostetter wrote: It's not a bug in the implementation, it's a side effect of the basic tenet of how dismax works: since it inverts the input and creates a DisjunctionMaxQuery for each "word" in the input, any word that is valid in at least one of the "qf" fields generates a "should" clause that contributes to the MM count. you guys are going way over my head now. is there any way i could 'fake' it by adding a second field without stopwords, or something like that? -jsd-
dismax and long phrases
i have a document with the following field Saying goodbye to Norman if i search for "saying goodbye to norman" with the standard query, it works fine. if i specify dismax, however, it does not match. here's the output of debugQuery, which I don't understand at all: saying goodbye to norman saying goodbye to norman +((DisjunctionMaxQuery((user_name:saying^0.4 | description:say | tags:say^0.5 | misc:say^0.3 | group_name:say^1.5 | location:saying^0.6 | name:say^1.5)~0.01) DisjunctionMaxQuery((user_name:goodbye^0.4 | description:goodby | tags:goodby^0.5 | misc:goodby^0.3 | group_name:goodby^1.5 | location:goodbye^0.6 | name:goodby^1.5)~0.01) DisjunctionMaxQuery((user_name:to^0.4 | location:to^0.6)~0.01) DisjunctionMaxQuery((user_name:norman^0.4 | description:norman | tags:norman^0.5 | misc:norman^0.3 | group_name:norman^1.5 | location:norman^0.6 | name:norman^1.5)~0.01))~4) DisjunctionMaxQuery((description:"say goodby norman"~100 | group_name:"say goodby norman"~100^1.5 | name:"say goodby norman"~100^1.5)~0.01) +(((user_name:saying^0.4 | description:say | tags:say^0.5 | misc:say^0.3 | group_name:say^1.5 | location:saying^0.6 | name:say^1.5)~0.01 (user_name:goodbye^0.4 | description:goodby | tags:goodby^0.5 | misc:goodby^0.3 | group_name:goodby^1.5 | location:goodbye^0.6 | name:goodby^1.5)~0.01 (user_name:to^0.4 | location:to^0.6)~0.01 (user_name:norman^0.4 | description:norman | tags:norman^0.5 | misc:norman^0.3 | group_name:norman^1.5 | location:norman^0.6 | name:norman^1.5)~0.01)~4) (description:"say goodby norman"~100 | group_name:"say goodby norman"~100^1.5 | name:"say goodby norman"~100^1.5)~0.01 it works fine if I search for "say goodbye" or "saying goodbye" or "saying goodbye norman". how can i get it to do exact matches (which should score very high)? -jsd-
Re: help required: how to design a large scale solr system
Martin Iwanowski wrote: How can I setup to run Solr as a service, so I don't need to have a SSH connection open? The advice that I was given on this very list was to use daemontools. I set it up and it is really great - starts when the machine boots, auto-restart on failures, easy to bring up/down on demand. Search the archive for my post on the subject, I explained how to set it up in detail. (I've also had success using launchd to manage Solr on Mac OS X in case anyone wants to try running it on their desktop.) -jsd-
Re: dismax - undefined field exception
Sean Timm wrote: Add echoParams=all to your URL and look for the "cat" field in one of the passed parameters. Specifically, in pf and qf. These can be defaulted in the solrconfig.xml file. i tried that but the exception prevents solr from returning anything. but i did look in solrconfig.xml and i see what you're talking about. looks like that was the ticket. thanks! -jsd-
dismax - undefined field exception
whenever i try to use qt=dismax i get the following error: Sep 22, 2008 11:50:48 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: undefined field cat at org.apache.solr.schema.IndexSchema.getDynamicFieldType(IndexSchema.java:1053) i don't have any dynamic fields in my schema, and there is nothing named 'cat'. my schema looks like this (minus the parts that came with the default schema.xml): type_id name i thought i used to have this working but now i'm not so sure. -jsd-
How to use copyfield with dynamicfield?
I have a dynamicField declaration: I want to copy any *_t's into a text field for searching with dismax. As it is, it appears you can't search dynamicfields this way. I tried adding a copyField: I do have a text field in my schema: However I get 400 errors whenever I try to update a record with entries in the *_t. INFO: /update 0 2 Sep 22, 2008 10:04:40 AM org.apache.solr.core.SolrException log SEVERE: org.apache.solr.core.SolrException: ERROR: multiple values encountered for non multiValued field text: first='Centennial Dr, Oakland, CA' second='' at org.apache.solr.update.DocumentBuilder.addSingleField(DocumentBuilder.java:62) I'm going to guess that the copyField with a wildcard is not allowed. If that is true, how does one deal with the situation where you want to allow new fields AND have them searchable? -jsd-
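The wildcard copyField is actually allowed - the 400 error is the destination field complaining. With *_t matching several source fields, the text destination receives multiple values, so it has to be declared multiValued. A hedged sketch of the three pieces (attribute values are assumptions where the mail's XML was stripped):

```xml
<dynamicField name="*_t" type="text" indexed="true" stored="true"/>

<!-- the destination must be multiValued: several *_t fields copy into it -->
<field name="text" type="text" indexed="true" stored="false"
       multiValued="true"/>

<copyField source="*_t" dest="text"/>
```

With that in place the text field can go into a dismax qf list, making every future *_t field searchable without schema changes.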
Re: Illegal character in xml file
James liu wrote: > first, u should escape some string like (code by php) >
> function escapeChars($string) {
>     $string = str_replace("&", "&amp;", $string);
>     $string = str_replace("<", "&lt;", $string);
>     $string = str_replace(">", "&gt;", $string);
>     $string = str_replace("'", "&apos;", $string);
>     $string = str_replace('"', "&quot;", $string);
>     return $string;
> }
php has this as a built-in function: $string = htmlentities($string); that's what i use to protect my solr input. -jsd-
Re: Dismax + Dynamic fields
Daniel Papasian wrote: Norberto Meijome wrote: Thanks Yonik. ok, that matches what I've seen - if i know the actual name of the field I'm after, I can use it in a query, but i can't use the dynamic_field_name_* (with wildcard) in the config. Is adding support for this something that is desirable / needed (doable??), and is it being worked on? You can use a wildcard with copyField to copy the dynamic fields that match the pattern to another field that you can then query on. It seems like that would cover your needs, no? this is biting me right now and i don't understand how to specify the copyField to do what i want. i have a dynamic field declaration like: in the documents that i'm adding i am specifying location_t and group_t, for example, although i may decide to add more later - obviously that seems like the ideal use case for the dynamicField. however i cannot search these fields unless i specify them explicitly (q=location_t:something) and it doesn't work with dismax. i want all fields searchable, otherwise why would i bother with indexed="true" in the dynamicField? how do i use copyField to search location_t, group_t and any other _t i might decide to add later? -jsd-
Adding a field?
Is there a way to add a field to an existing index without stopping the server, deleting the index, and reloading every document from scratch? -jsd-
Re: Solr won't start under jetty on RHEL5.2
Jon Drukman wrote: I just migrated my solr instance to a new server, running RHEL5.2. I installed java from yum but I suspect it's different from the one I used to use. Turns out my instincts were correct. The version from yum does not work. I installed the official Sun JDK and now it starts fine.

bad:
java version "1.4.2"
gij (GNU libgcj) version 4.1.2 20071124 (Red Hat 4.1.2-42)

good:
java version "1.6.0_07"
Java(TM) SE Runtime Environment (build 1.6.0_07-b06)
Java HotSpot(TM) 64-Bit Server VM (build 10.0-b23, mixed mode)

-jsd-
Solr won't start under jetty on RHEL5.2
I just migrated my solr instance to a new server, running RHEL5.2. I installed java from yum but I suspect it's different from the one I used to use. Anyway, my Solr no longer works. 2008-08-18 18:01:12.079::INFO: Logging to STDERR via org.mortbay.log.StdErrLog 2008-08-18 18:01:12.229::INFO: jetty-6.1.3 2008-08-18 18:01:12.330::INFO: Extract jar:file:/home/apps/solr/solr-1.2.0/webapps/solr.war!/ to /tmp/Jetty_0_0_0_0_8983_solr.war__solr__k1kf17/webapp 2008-08-18 18:01:12.452::INFO: NO JSP Support for /solr, did not find org.apache.jasper.servlet.JspServlet 18-Aug-08 6:01:12 PM org.apache.solr.servlet.SolrDispatchFilter init INFO: SolrDispatchFilter.init() 18-Aug-08 6:01:12 PM org.apache.solr.core.Config getInstanceDir INFO: JNDI not configured for Solr (NoInitialContextEx) 18-Aug-08 6:01:12 PM org.apache.solr.core.Config getInstanceDir INFO: Solr home defaulted to 'null' (could not find system property or JNDI) 18-Aug-08 6:01:12 PM org.apache.solr.core.Config setInstanceDir INFO: Solr home set to 'solr/' 18-Aug-08 6:01:12 PM org.apache.solr.core.SolrConfig initConfig INFO: Loaded SolrConfig: solrconfig.xml 18-Aug-08 6:01:12 PM org.apache.solr.servlet.SolrDispatchFilter init INFO: user.dir=/home/apps/solr/solr-1.2.0 2008-08-18 18:01:12.663::WARN: failed SolrRequestFilter java.lang.NoClassDefFoundError: org.apache.solr.core.SolrCore at java.lang.Class.initializeClass(libgcj.so.7rh) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:75) at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40) at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594) at org.mortbay.jetty.servlet.Context.startContext(Context.java:139) at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218) at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500) at 
org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40) at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147) at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40) at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40) at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117) at org.mortbay.jetty.Server.doStart(Server.java:210) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40) at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929) at java.lang.reflect.Method.invoke(libgcj.so.7rh) at org.mortbay.start.Main.invokeMain(Main.java:183) at org.mortbay.start.Main.start(Main.java:497) at org.mortbay.start.Main.main(Main.java:115) All attempts to load solr pages result in 404 not found errors. I suspect this is a Jetty configuration problem but I know nothing about jetty or servlet containers or anything like that. Could someone explain in words of one syllable or less how to get it to find the installation please? Thanks -jsd-
Re: Administrative questions
Jason Rennie wrote: On Wed, Aug 13, 2008 at 1:52 PM, Jon Drukman <[EMAIL PROTECTED]> wrote: Duh. I should have thought of that. I'm a big fan of djbdns so I'm quite familiar with daemontools. Thanks! :) My pleasure. Was nice to hear recently that DJB is moving toward more flexible licensing terms. For anyone unfamiliar w/ daemontools, here's DJB's explanation of why they rock compared to inittab, ttys, init.d, and rc.local: http://cr.yp.to/daemontools/faq/create.html#why

in case anybody wants to know, here's how to run solr under daemontools.

1. install daemontools
2. create /etc/solr
3. create a user and group called solr
4. create shell script /etc/solr/run (edit to taste, i'm using the default jetty that comes with solr):

#!/bin/sh
exec 2>&1
cd /usr/local/apache-solr-1.2.0/example
exec setuidgid solr java -jar start.jar

5. create /etc/solr/log/run containing:

#!/bin/sh
exec setuidgid solr multilog t ./main

6. ln -s /etc/solr /service/solr

that is all. as long as you've got svscan set to launch when the system boots, solr will run and auto-restart on crashes. logs will be in /service/solr/log/main (auto-rotated). yay. -jsd-
Re: Administrative questions
Jason Rennie wrote: On Tue, Aug 12, 2008 at 8:49 PM, Jon Drukman <[EMAIL PROTECTED]> wrote: 1. How do people deal with having solr start when system reboots, manage the log output, etc. Right now I run it manually under a unix 'screen' command with a wrapper script that takes care of restarts when it crashes. That means that only my user can connect to it, and it can't happen when the system starts up... But I don't see any other way to control the process easily. We use daemontools. Restarts solr whenever it goes down (for whatever reason) and directs output to a set of rotated log files. Very handy for a production environment. A bit tricky to set, but solid once you have it in place. http://cr.yp.to/daemontools.html *facepalm* Duh. I should have thought of that. I'm a big fan of djbdns so I'm quite familiar with daemontools. Thanks! -jsd-
Administrative questions
1. How do people deal with having solr start when system reboots, manage the log output, etc. Right now I run it manually under a unix 'screen' command with a wrapper script that takes care of restarts when it crashes. That means that only my user can connect to it, and it can't happen when the system starts up... But I don't see any other way to control the process easily. 2. Is there any way to modify a schema without stopping the process, destroying the existing index, then restarting and reloading all the data? It doesn't take that long and we're not in production yet, but once we're live I can't see that being feasible. -jsd-
Re: Wildcard search question
Norberto Meijome wrote: ok well let's say that i can live without john/jon in the short term. what i really need today is a case insensitive wildcard search with literal matching (no fancy stemming. bobby is bobby, not bobbi.) what are my options? http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters define your own type (or modify text / string... but I find that it gets confusing to have variations of text / string ...) to perform the operations on the content as needed. There are also other tokenizer/analysers available that *may* help in the partial searches (ngram , edgengram ), but there isn't much documentation on them yet (that I could find) - I am only getting into them myself i'll see how it goes.. thanks, that got me on the right track. i came up with this: now searching for user_name:bobby* works as i wanted. my next question: is there a way that i can score matches that are at the start of the string higher than matches in the middle? for example, if i search for steve, i get kelly stevenson before steve jobs. i'd like steve jobs to come first. -jsd-
Re: Wildcard search question
Erik Hatcher wrote: No, because the original data is Bobby Gaza, so Bobby* would match, but not bobby*. "string" type (in the example schema, to be clear) does effectively no analysis, leaving the original string indexed as-is, case and all. [...] stemming and wildcard term queries aren't quite compatible, as you've found, but it does depend on how much of the prefix is provided. bob* matches "bobbi", for example. ok well let's say that i can live without john/jon in the short term. what i really need today is a case insensitive wildcard search with literal matching (no fancy stemming. bobby is bobby, not bobbi.) what are my options? -jsd-
Re: Wildcard search question
Erik Hatcher wrote: Jon, You provided a lot of nice details, thanks for helping us help you :) The one missing piece is the definition of the "text" field type. In Solr's _example_ schema, "bobby" gets analyzed (stemmed) to "bobbi"[1]. When you query for bobby*, the query parser is not running an analyzer on the wildcard query, thus literally searching for terms that begin with "bobby"[2]. As for "steve" , same story, but it analyzes to "steve", which is found with a "steve*" query. so, what's the solution? if i change the field to string, will it be able to find bobby* ? eventually it would be nice to be able to use fuzzy matching, to find 'jon' from 'john', for example. thanks -jsd-
Wildcard search question
When I search with q=bobby I get the following record: 2008-06-23T07:06:40Z http://farm1.static.flickr.com/117/... 9 Bobby Gaza [EMAIL PROTECTED] When I search with bobby* I get nothing. When I search with steve* I get "Steve Ballmer" and "Steve Jobs"... What's going on? The relevant part of my schema.xml is: type_id name
Best type to use for enum-like behavior
I am going to store two totally different types of documents in a single solr instance. Eventually I may separate them into separate instances but we are a long way from having either the size or traffic to require that. I read somewhere that a good approach is to add a 'type' field to the data and then use a filter query. What data type would you use for the type field? I could just use an integer, but then we have to remember that 1=user, 2=item, and so on. In mysql there's an enum type where you use text labels that are mapped to integers behind the scenes (good performance and user friendly). Is there something similar in solr or should I just use a string? -jsd-
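For what it's worth, the usual pattern is a plain string field plus a filter query; a hedged sketch (the field name is an assumption):

```xml
<!-- schema.xml: a human-readable discriminator field, no enum needed -->
<field name="doc_type" type="string" indexed="true" stored="true"/>
```

Then query with fq=doc_type:user or fq=doc_type:item. Filter queries are cached independently of the main query, so the string-vs-integer choice has no practical performance cost here, and the labels stay readable.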
Re: Newbie Q: searching multiple fields
Yonik Seeley wrote: There is your issue: type "string" indexes the whole field value as a single token. You want type "text" like you have on the name field. yep, i noticed that right after i hit send. things are working now. sorry, i did say i was a newbie! -jsd-
Re: Newbie Q: searching multiple fields
Yonik Seeley wrote: Verify that all the fields you want to search on are indexed. Verify that the query is being correctly built by adding debugQuery=true to the request. here is the debugQuery output. i have no idea how to read it: 0 0 dismax descriptive 1 descriptive descriptive +DisjunctionMaxQuery((tags:descriptive^0.8 | description:descriptive^1.5 | name:descript^2.0)~0.01) DisjunctionMaxQuery((tags:descriptive | description:descriptive^2.0 | name:descript^2.0)~0.01) +(tags:descriptive^0.8 | description:descriptive^1.5 | name:descript^2.0)~0.01 (tags:descriptive | description:descriptive^2.0 | name:descript^2.0)~0.01
Newbie Q: searching multiple fields
I am brand new to Solr. I am trying to get a very simple setup running. I've got just a few fields: name, description, tags. I am only able to search on the default field (name), however. I tried to set up the dismax config to search all the fields, but I never get any results on the other fields. Example doc: 318 Testing the new system Here is the very descriptive description jsd 2008-05-16T05:05:10Z q=system finds this doc. q=descriptive does not. q=descriptive&qt=dismax does not. q=descriptive&qt=dismax&qf=description does not. my dismax config in solrconfig contains: echoParams explicit, tie 0.01, qf name^2 description^1.5 tags^0.8, pf name^2 description^2 tags^1, ps 100, q.alt *:* What am I missing? -jsd-