Does DataImportHandler do any sanitizing?

2012-08-15 Thread Jon Drukman
I am pulling some fields from a mysql database using DataImportHandler and
some of them have invalid XML in them.  Does DataImportHandler do any kind
of filtering/sanitizing to ensure that it will go in OK or is it all on me?

Example bad data:  orphaned ampersands ("Peanut Butter & Jelly"), curly
quotes ("we’re")

-jsd-
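(For what it's worth: if you end up escaping before posting XML updates yourself, the stdlib covers the orphaned-ampersand case; curly quotes are perfectly legal XML characters and only need a correctly declared encoding such as UTF-8. A sketch, not anything DataImportHandler does for you:)

```python
from xml.sax.saxutils import escape

# Escape the XML-reserved characters (& < >) before embedding text in an
# <add><doc> update message. Curly quotes need no escaping, just UTF-8.
def to_xml_field(name: str, value: str) -> str:
    return '<field name="%s">%s</field>' % (name, escape(value))

print(to_xml_field("title", "Peanut Butter & Jelly"))
# -> <field name="title">Peanut Butter &amp; Jelly</field>
```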


Re: Running out of memory

2012-08-13 Thread Jon Drukman
On Sun, Aug 12, 2012 at 12:31 PM, Alexey Serba  wrote:

> > It would be vastly preferable if Solr could just exit when it gets a
> memory
> > error, because we have it running under daemontools, and that would cause
> > an automatic restart.
> -XX:OnOutOfMemoryError="<cmd args>;<cmd args>"
> Run user-defined commands when an OutOfMemoryError is first thrown.
>
> > Does Solr require the entire index to fit in memory at all times?
> No.
>
> But it's hard to say about your particular problem without additional
> information. How often do you commit? Do you use faceting? Do you sort
> by Solr fields and if yes what are those fields? And you should also
> check caches.
>

I upgraded to solr-3.6.1 and an extra large amazon instance (15GB RAM) so
we'll see if that helps.  So far no out of memory errors.
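(Following up on Alexey's suggestion: since daemontools restarts a supervised process when it dies, it should be enough to have the JVM kill itself on OOM. A sketch, untested; `%p` expands to the JVM's own pid:)

```shell
# Make the JVM exit on OutOfMemoryError so daemontools' supervise
# restarts it automatically.
java -Xmx12g -XX:OnOutOfMemoryError="kill -9 %p" -jar start.jar
```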


Re: DataImportHandler WARNING: Unable to resolve variable

2012-08-10 Thread Jon Drukman
That column does not allow NULL.  It's definitely an empty string, but I'm
using MySQL IF() to catch it and make sure it always has something.
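(Swati's point is still worth guarding against: in MySQL, `url_type = ''` evaluates to NULL when `url_type` is NULL, and IF() treats a NULL condition as false, so a NULL would pass straight through. A NULL-proof variant, as a sketch using the column names from this thread:)

```sql
-- IFNULL collapses NULL to '' first, so both NULL and empty string
-- fall back to 'article'.
SELECT IF(IFNULL(url_type, '') = '', 'article', url_type) AS url_type
FROM articles;
```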

On Thu, Aug 9, 2012 at 8:45 PM, Swati Swoboda wrote:

> Ah, my bad. I was incorrect - it was not actually indexing.
>
> @Jon - is there a possibility that your url_type is NULL, but not empty?
> Your if check only checks to see if it is empty, which is not the same as
> checking to see if it is null. If it is null, that's why you'd be having
> those errors - null values are just not accepted, it seems.
>
> Swati
>
> -Original Message-
> From: Swati Swoboda [mailto:sswob...@igloosoftware.com]
> Sent: Thursday, August 09, 2012 11:09 PM
> To: solr-user@lucene.apache.org
> Subject: RE: DataImportHandler WARNING: Unable to resolve variable
>
> I am getting a similar issue while using a Template Transformer. My
> fields *always* have a value as well - it is getting indexed correctly.
>
> Furthermore, the number of warnings I get seems arbitrary. I imported one
> document (debug mode) and I got roughly ~400 of those warning messages for
> the single field.
>
> -Original Message-
> From: Jon Drukman [mailto:jdruk...@gmail.com]
> Sent: Thursday, August 09, 2012 2:38 PM
> To: solr-user@lucene.apache.org
> Subject: DataImportHandler WARNING: Unable to resolve variable
>
> I'm trying to use DataImportHandler's delta-import functionality but I'm
> getting loads of these every time it runs:
>
> WARNING: Unable to resolve variable: article.url_type while parsing
> expression: article:${article.url_type}:${article.id}
>
> The definition looks like:
>
>  query="... irrelevant ..."
>
> deltaQuery="select id,'dummy' as type_id FROM articles WHERE
> (post_date > '${dataimporter.last_index_time}' OR updated_date >
> '${dataimporter.last_index_time}') AND post_date <= NOW() AND status =
> 9"
>
> deltaImportQuery="select id, article_seo_title,
> DATE_FORMAT(post_date,'%Y-%m-%dT%H:%i:%sZ') post_date, subject,
>body, IF(url_type='', 'article', url_type) url_type,
> featured_image_url from articles WHERE id = ${dataimporter.delta.id}"
>transformer="TemplateTransformer,HTMLStripTransformer">
> 
> 
> 
> 
>  template="article:${article.url_type}:${
> article.id}" />
> 
> 
> 
> 
>
> As you can see, I am always making sure that article.url_type has some
> value.  Why am I getting the warning?
>
> -jsd-
>


Re: Connect to SOLR over socket file

2012-08-10 Thread Jon Drukman
On Fri, Aug 10, 2012 at 2:44 AM, Jason Axelson wrote:

> You're correct that there is an underlying problem I'm trying to
> solve. The underlying problem is that due to the security policies I
> cannot run another service that listens on a TCP port, but a unix
> domain socket would be okay. It looks like I might have to go with
> mysql full-text search or something like metasearch (I'm using Ruby on
> Rails).
>
>
MySQL full text search is pretty terrible.  You'd be better off using
Lucene directly.

Who's in charge of your security policies?  Can you get dispensation to
listen on localhost only?


Re: /solr/admin/stats.jsp null pointer exception

2012-08-09 Thread Jon Drukman
On Wed, Aug 8, 2012 at 3:03 PM, Chris Hostetter wrote:

> I can't reproduce with the example configs -- it looks like you've
> tweaked the logging to use the XML file format; any way to get the
> stacktrace of the "Caused by" exception so we can see what is null and
> where?
>

Here is the caused by:

Caused by: java.lang.NullPointerException
at org.apache.solr.common.util.XML.escape(XML.java:197)
at org.apache.solr.common.util.XML.escapeCharData(XML.java:79)
at
org.apache.jsp.admin.stats_jsp._jspService(org.apache.jsp.admin.stats_jsp:188)
at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:109)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at
org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:389)
... 29 more


>
> As a workaround, i would suggest switching to
> "/solr/admin/mbeans?stats=true" ... moving forward you'll have to since
> stats.jsp has been removed in Solr 4.
>

Good to know.  That's not as readable as the old format but it'll do for
now.  Thanks.

-jsd-


/solr/admin/stats.jsp null pointer exception

2012-08-08 Thread Jon Drukman
New install of Solr 3.6.1, getting a Null Pointer Exception when trying to
access admin/stats.jsp:



  2012-08-08T17:55:09
  138509624
  694
  org.apache.solr.servlet.SolrDispatchFilter
  SEVERE
  org.apache.solr.common.SolrException
  log
  25
  org.apache.jasper.JasperException: java.lang.NullPointerException
at
org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:418)
at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:486)
at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:380)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:327)
at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:283)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.lang.NullPointerException



Any ideas how to fix this?

-jsd-


Re: Solr always at 100% (or more) CPU

2012-07-09 Thread Jon Drukman
I thought this had to be a joke, but no, you were absolutely right.  Fixed
it right up!

Unbelievable.

Thanks so much!
-jsd-
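(For anyone hitting this later: the workaround circulated for the 2012 leap-second bug was simply to reset the system clock, which clears the stuck timer state without restarting Java. A sketch; needs root, and stop ntpd first if it's running:)

```shell
# Setting the date -- even to the current time -- clears the kernel's
# stuck hrtimer state that was spinning the JVM at 100% CPU.
date -s "$(date)"
```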


On Mon, Jul 9, 2012 at 10:15 AM, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:

> Are you perhaps being bitten by the leap second bug? Just happened to
> me last week.
>
>
> http://blog.wpkg.org/2012/07/01/java-leap-second-bug-30-june-1-july-2012-fix/
>
> Michael Della Bitta
>
> 
> Appinions, Inc. -- Where Influence Isn’t a Game.
> http://www.appinions.com
>
>
> On Mon, Jul 9, 2012 at 1:13 PM, Jon Drukman  wrote:
> > I have a very small Solr setup.  The index is 32MB and there are only 8
> > fields, most of which are ints.  I run a cron job every hour to use
> > DataImportHandler to do a full reimport of a database which has 42,600
> rows.
> >
> > There is minimal traffic on the server.  Maybe a few dozen queries a
> > minute.  Usually way less than 1 per second.  They look like this:
> >
> > INFO: [] webapp=/solr path=/select
> >
> params={sort=add_date+desc&fl=content_id&start=0&q=*:*&wt=json&fq=tag_id:((10)+AND+(8))&rows=180}
> > hits=35937 status=0 QTime=0
> >
> > INFO: [] webapp=/solr path=/select
> >
> params={sort=add_date+desc&fl=content_id&start=0&q=*:*&wt=json&fq=tag_id:((10)+AND+(791+9))&rows=72}
> > hits=1651 status=0 QTime=6
> >
> > INFO: [] webapp=/solr path=/select
> >
> params={sort=add_date+desc&fl=content_id&start=0&q=*:*&wt=json&fq=tag_id:((2)+AND+(10)+AND+(20+24+16)+AND+(31+32+33+792+793))&rows=250}
> > hits=6 status=0 QTime=1
> >
> > QTime looks good.  That's milliseconds, right?
> >
> > Despite this, solr's java process is constantly using 100% or more CPU.
> >  While writing this email I've seen it jump from 53% to 91% to 154%.
>  It's
> > up and down all over the place.
> >
> > I'm worried what might happen if the traffic load actually shot up.  This
> > doesn't seem healthy.
> >
> > I'm using the Jetty config from the example directory.  Solr 3.5.0
> straight
> > from apache.org.
> >
> > # java -version
> > java version "1.6.0_22"
> > OpenJDK Runtime Environment (IcedTea6 1.10.6)
> > (amazon-52.1.10.6.44.amzn1-x86_64)
> > OpenJDK 64-Bit Server VM (build 20.0-b11, mixed mode)
> >
> > Amazon EC2 running Amazon's standard "Amazon Linux" distribution
> (basically
> > CentOS)
> >
> > Any advice?
> >
> > Thanks
> > -jsd-
>


Solr always at 100% (or more) CPU

2012-07-09 Thread Jon Drukman
I have a very small Solr setup.  The index is 32MB and there are only 8
fields, most of which are ints.  I run a cron job every hour to use
DataImportHandler to do a full reimport of a database which has 42,600 rows.

There is minimal traffic on the server.  Maybe a few dozen queries a
minute.  Usually way less than 1 per second.  They look like this:

INFO: [] webapp=/solr path=/select
params={sort=add_date+desc&fl=content_id&start=0&q=*:*&wt=json&fq=tag_id:((10)+AND+(8))&rows=180}
hits=35937 status=0 QTime=0

INFO: [] webapp=/solr path=/select
params={sort=add_date+desc&fl=content_id&start=0&q=*:*&wt=json&fq=tag_id:((10)+AND+(791+9))&rows=72}
hits=1651 status=0 QTime=6

INFO: [] webapp=/solr path=/select
params={sort=add_date+desc&fl=content_id&start=0&q=*:*&wt=json&fq=tag_id:((2)+AND+(10)+AND+(20+24+16)+AND+(31+32+33+792+793))&rows=250}
hits=6 status=0 QTime=1

QTime looks good.  That's milliseconds, right?

Despite this, solr's java process is constantly using 100% or more CPU.
 While writing this email I've seen it jump from 53% to 91% to 154%.  It's
up and down all over the place.

I'm worried what might happen if the traffic load actually shot up.  This
doesn't seem healthy.

I'm using the Jetty config from the example directory.  Solr 3.5.0 straight
from apache.org.

# java -version
java version "1.6.0_22"
OpenJDK Runtime Environment (IcedTea6 1.10.6)
(amazon-52.1.10.6.44.amzn1-x86_64)
OpenJDK 64-Bit Server VM (build 20.0-b11, mixed mode)

Amazon EC2 running Amazon's standard "Amazon Linux" distribution (basically
CentOS)

Any advice?

Thanks
-jsd-


Re: Exception in DataImportHandler (stack overflow)

2012-05-15 Thread Jon Drukman
OK, setting the wait_timeout back to its previous value and adding readOnly
didn't help, I got the stack overflow again.  I re-upped the mysql timeout
value again.

-jsd-


On Tue, May 15, 2012 at 2:42 PM, Jon Drukman  wrote:

> I fixed it for now by upping the wait_timeout on the mysql server.
>  Apparently Solr doesn't like having its connection yanked out from under
> it and/or isn't smart enough to reconnect if the server goes away.  I'll
> set it back the way it was and try your readOnly option.
>
> Is there an option with DataImportHandler to have it transmit one or more
> arbitrary SQL statements after connecting?  If there was, I could just send
> "SET wait_timeout=86400;" after connecting.  That would probably prevent
> this issue.
>
> -jsd-
>
> On Tue, May 15, 2012 at 2:35 PM, Dyer, James wrote:
>
>> Shot in the dark here, but try adding readOnly="true" to your dataSource
>> tag.
>>
>> 
>>
>> This sets autocommit to true and sets the Holdability to
>> ResultSet.CLOSE_CURSORS_AT_COMMIT.  DIH does not explicitly close
>> resultsets and maybe if your JDBC driver also manages this poorly you could
>> end up with strange conditions like the one you're getting?  It could be a
>> case where your data has grown just over the limit your setup can handle
>> under such an unfortunate circumstance.
>>
>> Let me know if this solves it.  If so, we probably should open a bug
>> report and get this fixed in DIH.
>>
>> James Dyer
>> E-Commerce Systems
>> Ingram Content Group
>> (615) 213-4311
>>
>>
>> -Original Message-
>> From: Jon Drukman [mailto:jdruk...@gmail.com]
>> Sent: Tuesday, May 15, 2012 4:12 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Exception in DataImportHandler (stack overflow)
>>
>> i don't think so, my config is straightforward:
>>
>> 
>>  > url="jdbc:mysql://x/xx"
>> user="x" password="x" batchSize="-1" />
>>  
>>>   query="select content_id, description, title, add_date from
>> content_solr where active = '1'">
>>   >  query="select tag_id from tags_assoc where content_id =
>> '${content.content_id}'" />
>>   >  query="select count(1) as likes from votes where content_id =
>> '${content.content_id}'" />
>>   >  query="select sum(views) as views from media_views mv join
>> content_media cm USING (media_id) WHERE cm.content_id =
>> '${content.content_id}'" />
>>
>>  
>> 
>>
>> i'm triggering the import with:
>>
>> http://localhost:8983/solr/dataimport?command=full-import&clean=true&commit=true
>>
>>
>>
>> On Tue, May 15, 2012 at 2:07 PM, Michael Della Bitta <
>> michael.della.bi...@appinions.com> wrote:
>>
>> > Hi, Jon:
>> >
>> > Well, you don't see that every day!
>> >
>> > Is it possible that you have something weird going on in your DDL
>> > and/or queries, like a tree schema that now suddenly has a cyclical
>> > reference?
>> >
>> > Michael
>> >
>> > On Tue, May 15, 2012 at 4:33 PM, Jon Drukman 
>> wrote:
>> > > I have a machine which does a full update using DataImportHandler
>> every
>> > > hour.  It worked up until a little while ago.  I did not change the
>> > > dataconfig.xml or version of Solr.
>> > >
>> > > Here is the beginning of the error in the log (the real thing runs for
>> > > thousands of lines)
>> > >
>> > > 2012-05-15 12:44:30.724166500 SEVERE: Full Import
>> > > failed:org.apache.solr.handler.dataimport.DataImportHandlerException:
>> > > java.lang.StackOverflowError
>> > > 2012-05-15 12:44:30.724168500 at
>> > >
>> >
>> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:669)
>> > > 2012-05-15 12:44:30.724169500 at
>> > >
>> >
>> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
>> > > 2012-05-15 12:44:30.724171500 at
>> > >
>> >
>> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
>> > > 2012-05-15 12:44:30.724219500 at
>> > >
>> >
>> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
>> > > 2012-

Re: Exception in DataImportHandler (stack overflow)

2012-05-15 Thread Jon Drukman
I fixed it for now by upping the wait_timeout on the mysql server.
 Apparently Solr doesn't like having its connection yanked out from under
it and/or isn't smart enough to reconnect if the server goes away.  I'll
set it back the way it was and try your readOnly option.

Is there an option with DataImportHandler to have it transmit one or more
arbitrary SQL statements after connecting?  If there was, I could just send
"SET wait_timeout=86400;" after connecting.  That would probably prevent
this issue.

-jsd-
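(Answering my own question, tentatively: DIH itself has no post-connect SQL hook that I know of, but MySQL Connector/J can set session variables straight from the JDBC URL, which achieves the same thing. A sketch, with placeholder host/credentials:)

```xml
<!-- Connector/J applies sessionVariables after each connect, so no
     separate SET statement is needed. -->
<dataSource driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://host/db?sessionVariables=wait_timeout=86400"
            user="x" password="x" readOnly="true" batchSize="-1"/>
```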

On Tue, May 15, 2012 at 2:35 PM, Dyer, James wrote:

> Shot in the dark here, but try adding readOnly="true" to your dataSource
> tag.
>
> 
>
> This sets autocommit to true and sets the Holdability to
> ResultSet.CLOSE_CURSORS_AT_COMMIT.  DIH does not explicitly close
> resultsets and maybe if your JDBC driver also manages this poorly you could
> end up with strange conditions like the one you're getting?  It could be a
> case where your data has grown just over the limit your setup can handle
> under such an unfortunate circumstance.
>
> Let me know if this solves it.  If so, we probably should open a bug
> report and get this fixed in DIH.
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
>
>
> -Original Message-
> From: Jon Drukman [mailto:jdruk...@gmail.com]
> Sent: Tuesday, May 15, 2012 4:12 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Exception in DataImportHandler (stack overflow)
>
> i don't think so, my config is straightforward:
>
> 
>   url="jdbc:mysql://x/xx"
> user="x" password="x" batchSize="-1" />
>  
>   query="select content_id, description, title, add_date from
> content_solr where active = '1'">
> query="select tag_id from tags_assoc where content_id =
> '${content.content_id}'" />
> query="select count(1) as likes from votes where content_id =
> '${content.content_id}'" />
> query="select sum(views) as views from media_views mv join
> content_media cm USING (media_id) WHERE cm.content_id =
> '${content.content_id}'" />
>
>  
> 
>
> i'm triggering the import with:
>
> http://localhost:8983/solr/dataimport?command=full-import&clean=true&commit=true
>
>
>
> On Tue, May 15, 2012 at 2:07 PM, Michael Della Bitta <
> michael.della.bi...@appinions.com> wrote:
>
> > Hi, Jon:
> >
> > Well, you don't see that every day!
> >
> > Is it possible that you have something weird going on in your DDL
> > and/or queries, like a tree schema that now suddenly has a cyclical
> > reference?
> >
> > Michael
> >
> > On Tue, May 15, 2012 at 4:33 PM, Jon Drukman  wrote:
> > > I have a machine which does a full update using DataImportHandler every
> > > hour.  It worked up until a little while ago.  I did not change the
> > > dataconfig.xml or version of Solr.
> > >
> > > Here is the beginning of the error in the log (the real thing runs for
> > > thousands of lines)
> > >
> > > 2012-05-15 12:44:30.724166500 SEVERE: Full Import
> > > failed:org.apache.solr.handler.dataimport.DataImportHandlerException:
> > > java.lang.StackOverflowError
> > > 2012-05-15 12:44:30.724168500 at
> > >
> >
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:669)
> > > 2012-05-15 12:44:30.724169500 at
> > >
> >
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
> > > 2012-05-15 12:44:30.724171500 at
> > >
> >
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
> > > 2012-05-15 12:44:30.724219500 at
> > >
> >
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
> > > 2012-05-15 12:44:30.724221500 at
> > >
> >
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
> > > 2012-05-15 12:44:30.724223500 at
> > >
> >
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
> > > 2012-05-15 12:44:30.724224500 Caused by: java.lang.StackOverflowError
> > > 2012-05-15 12:44:30.724225500 at
> > > java.lang.String.checkBounds(String.java:404)
> > > 2012-05-15 12:44:30.724234500 at
> java.lang.String.(String.java:450)
> > > 2012-05-15 12:44:30.724235500 at
> java.lang.String.(String.java:523)
> > > 2012-05-15 12:44:30.724236

Re: Exception in DataImportHandler (stack overflow)

2012-05-15 Thread Jon Drukman
i don't think so, my config is straightforward:


  
  

   
   
   

  


i'm triggering the import with:
http://localhost:8983/solr/dataimport?command=full-import&clean=true&commit=true



On Tue, May 15, 2012 at 2:07 PM, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:

> Hi, Jon:
>
> Well, you don't see that every day!
>
> Is it possible that you have something weird going on in your DDL
> and/or queries, like a tree schema that now suddenly has a cyclical
> reference?
>
> Michael
>
> On Tue, May 15, 2012 at 4:33 PM, Jon Drukman  wrote:
> > I have a machine which does a full update using DataImportHandler every
> > hour.  It worked up until a little while ago.  I did not change the
> > dataconfig.xml or version of Solr.
> >
> > Here is the beginning of the error in the log (the real thing runs for
> > thousands of lines)
> >
> > 2012-05-15 12:44:30.724166500 SEVERE: Full Import
> > failed:org.apache.solr.handler.dataimport.DataImportHandlerException:
> > java.lang.StackOverflowError
> > 2012-05-15 12:44:30.724168500 at
> >
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:669)
> > 2012-05-15 12:44:30.724169500 at
> >
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
> > 2012-05-15 12:44:30.724171500 at
> >
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
> > 2012-05-15 12:44:30.724219500 at
> >
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
> > 2012-05-15 12:44:30.724221500 at
> >
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
> > 2012-05-15 12:44:30.724223500 at
> >
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
> > 2012-05-15 12:44:30.724224500 Caused by: java.lang.StackOverflowError
> > 2012-05-15 12:44:30.724225500 at
> > java.lang.String.checkBounds(String.java:404)
> > 2012-05-15 12:44:30.724234500 at java.lang.String.(String.java:450)
> > 2012-05-15 12:44:30.724235500 at java.lang.String.(String.java:523)
> > 2012-05-15 12:44:30.724236500 at
> > java.net.SocketOutputStream.socketWrite0(Native Method)
> > 2012-05-15 12:44:30.724238500 at
> > java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
> > 2012-05-15 12:44:30.724239500 at
> > java.net.SocketOutputStream.write(SocketOutputStream.java:153)
> > 2012-05-15 12:44:30.724253500 at
> > java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
> > 2012-05-15 12:44:30.724254500 at
> > java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
> > 2012-05-15 12:44:30.724256500 at
> > com.mysql.jdbc.MysqlIO.send(MysqlIO.java:3345)
> > 2012-05-15 12:44:30.724257500 at
> > com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1983)
> > 2012-05-15 12:44:30.724259500 at
> > com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2163)
> > 2012-05-15 12:44:30.724267500 at
> > com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2618)
> > 2012-05-15 12:44:30.724268500 at
> >
> com.mysql.jdbc.StatementImpl.executeSimpleNonQuery(StatementImpl.java:1644)
> > 2012-05-15 12:44:30.724270500 at
> > com.mysql.jdbc.RowDataDynamic.close(RowDataDynamic.java:198)
> > 2012-05-15 12:44:30.724271500 at
> > com.mysql.jdbc.ResultSetImpl.realClose(ResultSetImpl.java:7617)
> > 2012-05-15 12:44:30.724273500 at
> > com.mysql.jdbc.ResultSetImpl.close(ResultSetImpl.java:907)
> > 2012-05-15 12:44:30.724280500 at
> > com.mysql.jdbc.StatementImpl.realClose(StatementImpl.java:2478)
> > 2012-05-15 12:44:30.724282500 at
> >
> com.mysql.jdbc.ConnectionImpl.closeAllOpenStatements(ConnectionImpl.java:1584)
> > 2012-05-15 12:44:30.724283500 at
> > com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4364)
> > 2012-05-15 12:44:30.724285500 at
> > com.mysql.jdbc.ConnectionImpl.cleanup(ConnectionImpl.java:1360)
> > 2012-05-15 12:44:30.724286500 at
> > com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2652)
> > 2012-05-15 12:44:30.724321500 at
> >
> com.mysql.jdbc.StatementImpl.executeSimpleNonQuery(StatementImpl.java:1644)
> > 2012-05-15 12:44:30.724322500 at
> > com.mysql.jdbc.RowDataDynamic.close(RowDataDynamic.java:198)
> > 2012-05-15 12:44:30.724324500 at
> > com.mysql.jdbc.ResultSetImpl.realClose(ResultSetImpl.java:7617)
> > 2012-05-15 12:44:30.724325500 at
> > com.mysql.jdbc.ResultSetImpl.close(ResultSetImpl.java:907)
> > 2012-05-15 12:44:30.724327500 at
> > com.mysql.jd

Facet auto-suggest

2012-01-17 Thread Jon Drukman
I don't even know what to call this feature. Here's a website that shows
the problem:

http://pulse.audiusanews.com/pulse/index.php

Notice that you can end up in a situation where there are no results.
For example,
in order, press: People, Performance, Technology, Photos. The client
wants it so that when you click a button, it disables buttons that would
lead to a dead end. In other words, after clicking Technology, the Photos
button would be disabled.

Can Solr help with this?

-jsd-
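(Sketch of an answer: this is what faceting is for. Re-run the current query with `facet=true&facet.field=tag_id` for the category field, and disable any button whose facet count comes back 0. The client-side half, with hypothetical button names:)

```python
# Given facet counts Solr returned for the *current* filter state,
# a button stays enabled only if clicking it would still yield results.
def disabled_buttons(facet_counts: dict, all_buttons: list) -> list:
    return [b for b in all_buttons if facet_counts.get(b, 0) == 0]

counts = {"people": 12, "performance": 3, "technology": 0, "photos": 0}
print(disabled_buttons(counts, ["people", "performance", "technology", "photos"]))
# -> ['technology', 'photos']
```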



Re: Case insensitive but number sensitive string?

2011-02-25 Thread Jon Drukman
Ahmet Arslan  yahoo.com> writes:

> 
> > I want a string field that is case
> > insensitive.  This is what I tried:
> > 
> >   > sortMissingLast="true"
> > omitNorms="true">
> >         
> >                
> > 
> >         
> >         
> >                
> > 
> >         
> >     
> > 
> > 
> > However, it is matching "opengl" for "opengl128".  I
> > want exact string matches,
> > but I want them case-insensitive.  What did I do
> > wrong?
> > 
> 
> class="solr.StrField" should be class="solr.TextField" 
> 
> 

This is what I ended up with. Seems to work:

 















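(The archive stripped the XML here; the standard recipe for case-insensitive exact matching -- and very likely what the stripped config looked like -- is a TextField that keeps the whole value as a single token and lowercases it. A sketch, not necessarily the literal config:)

```xml
<fieldType name="string_ci" class="solr.TextField"
           sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- KeywordTokenizer emits the whole value as one token, so
         "opengl" no longer matches "opengl128". -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```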

Case insensitive but number sensitive string?

2011-02-25 Thread Jon Drukman
I want a string field that is case insensitive.  This is what I tried:

 









However, it is matching "opengl" for "opengl128".  I want exact string matches,
but I want them case-insensitive.  What did I do wrong?



Sorting - bad performance

2011-02-22 Thread Jon Drukman
The performance factors wiki says:
"If you do a lot of field based sorting, it is advantageous to add explicitly
warming queries to the "newSearcher" and "firstSearcher" event listeners in your
solrconfig which sort on those fields, so the FieldCache is populated prior to
any queries being executed by your users."

I've got an index with 24+ million docs of forum posts from users.  I want to be
able to get a given user's posts sorted by date.  It's taking 20 seconds right
now.  What would I put in the newSearch/firstSearcher to make that quicker?  Is
there any other general approach I can use to speed up sorting?
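(A warming listener along the lines the wiki describes would go in solrconfig.xml; a sketch, with the sort field name assumed:)

```xml
<!-- Warm the FieldCache for the sort field so the first real user
     query doesn't pay the cache-population cost. Add the same entry
     under a firstSearcher listener for cold starts. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str><str name="sort">post_date desc</str></lst>
  </arr>
</listener>
```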

The schema looks like

 
   
   
   
   
   
 

cistring is a case-insensitive string type i created:

   










Shutdown hook executing for a long time

2011-02-16 Thread Jon Drukman
2011-02-16 11:32:45.489::INFO:  Shutdown hook executing
2011-02-16 11:35:36.002::INFO:  Shutdown hook complete

The shutdown time seems to be proportional to the amount of time that Solr has
been running.  If I immediately restart and shut down again, it takes a fraction
of a second.  What causes it to take so long to shut down and is there anything
I can do to make it happen quicker?



DataImportHandler: regex debugging

2011-02-09 Thread Jon Drukman
I am trying to use the regex transformer but it's not returning anything. 
Either my regex is wrong, or I've done something else wrong in the setup of the
entity.  Is there any way to debug this?  Making a change and waiting 7 minutes
to reindex the entity sucks.






This returns columns that are either null, or have some comma-separated strings.
I want the bit up to the first comma, if it exists.

Ideally I could have it log the query and the input/output
of the field statements.
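(One way to shorten the loop: RegexTransformer keeps capture group 1, so the pattern itself can at least be sanity-checked outside DIH before waiting on a reindex. A sketch of the "up to the first comma" pattern:)

```python
import re

# Group 1 captures everything before the first comma, or the whole
# value when there is no comma; NULL/empty columns yield no match.
pattern = re.compile(r"^([^,]+)")

def first_item(value):
    m = pattern.match(value) if value else None
    return m.group(1) if m else None

print(first_item("alpha,beta,gamma"))  # -> alpha
print(first_item("solo"))              # -> solo
print(first_item(None))                # -> None
```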



DataImportHandler: no queries when using entity=something

2011-02-02 Thread Jon Drukman
So I'm trying to update a single entity in my index using DataImportHandler.

http://solr:8983/solr/dataimport?command=full-import&entity=games

It ends near-instantaneously without hitting the database at all, apparently.

Status shows:

0
0
0
0

Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.

2011-02-02 16:24:13
2011-02-02 16:24:13
0:0:0.20

The query isn't that extreme.  It returns 8771 rows in about 3 seconds.

How can I debug this?



Re: DataImportHandler: full import of a single entity

2011-01-18 Thread Jon Drukman
Ahmet Arslan  yahoo.com> writes:

> 
> > I've got a DataImportHandler set up
> > with 5 entities.  I would like to do a full
> > import on just one entity.  Is that possible?
> > 
> 
> Yes, there is a parameter named entity for that. 
> solr/dataimport?command=full-import&entity=myEntity

That seems to delete the entire index and replace it with only the contents of
that one entity.  Is there no way to leave the index alone for the other
entities and just redo that one?
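(There is: full-import wipes the index first because the `clean` parameter defaults to true. Passing `clean=false` leaves the other entities' documents alone and only re-adds the named entity's. A sketch:)

```shell
# clean=false stops full-import from deleting the whole index first;
# only documents produced by the named entity are (re)written.
curl "http://solr:8983/solr/dataimport?command=full-import&entity=myEntity&clean=false&commit=true"
```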



DataImportHandler: full import of a single entity

2011-01-14 Thread Jon Drukman
I've got a DataImportHandler set up with 5 entities.  I would like to do a full
import on just one entity.  Is that possible?

I worked around it temporarily by hand editing the dataimport.properties file
and deleting the delta line for that one entity, and kicking off a delta.  But
for (hopefully) obvious reasons, delta is less efficient than full.

-jsd-



Boosting on a document value

2010-11-15 Thread Jon Drukman
I've got a document with a "type" field.  If the type is 1, I want to boost the
document's relevancy, but type=1 is not a requirement.  Types other than 1
should still be returned and scored as normal, just without the boost.

How do I do this?

-jsd-
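(A sketch of the usual answer: with the dismax parser this is a boost query, which adds score for matching documents without requiring the clause. Field name and boost factor are from the question/illustrative:)

```shell
# bq boosts documents with type:1 but does not filter the others out;
# tune the ^5 factor to taste.
curl "http://localhost:8983/solr/select?defType=dismax&q=foo&bq=type:1^5"
```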




Re: Searching with AND + OR and spaces

2010-11-12 Thread Jon Drukman
Ahmet Arslan  yahoo.com> writes:

> 
> > (title:"Call of Duty" OR subhead:"Call of Duty")
> > 
> > No matches, despite the fact that there are many documents
> > that should match.
> 
> Field types of  title and subhead are important here. Do you use
stopwordfilterfactory with enable
> position increments? 

   
   

text is the default that comes with schema.xml, it has the enable position
increments stopwordfilterfactory.

> What is you solr version?

1.4


> > So I left out the quotes, and it seems to work.  But
> > now when I try doing things
> > like
> > 
> > title:Call of Duty OR subhead:Call of Duty AND type:4
> > 
> 
> Try using parenthesis. 
> title:(Call of Duty) OR subhead:(Call of Duty) AND type:4

that seems to work a lot better, thanks!!



Searching with AND + OR and spaces

2010-11-12 Thread Jon Drukman
I want to search two fields for the phrase Call Of Duty.  I tried this:

(title:"Call of Duty" OR subhead:"Call of Duty")

No matches, despite the fact that there are many documents that should match.

So I left out the quotes, and it seems to work.  But now when I try doing things
like

title:Call of Duty OR subhead:Call of Duty AND type:4

I get a lot of things like "called it!" and "i'm taking calls" but call of duty
doesn't surface.

How can I get what I want?

-jsd-




Re: SEVERE: Could not start SOLR. Check solr/home property

2010-04-28 Thread Jon Drukman

On 4/27/10 12:04 PM, Chris Hostetter wrote:


: SEVERE: Could not start SOLR. Check solr/home property

it means something went horribly wrong when starting solr, and since this
is frequently caused by either an incorrect explicit solr/home or an
incorrect implicitly guessed solr home, that is mentioned in the error
message as something to check.

it's part of that error message because in cases where the solr home is
the problem, that may be the only meaningful error message.

in your case however, you have a much more specific error message...

:  java.lang.RuntimeException: java.io.IOException: read past EOF
:at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068)
:at org.apache.solr.core.SolrCore.<init>(SolrCore.java:579)
:at

...so something seems to be seriously wrong with your index. farther down
below this exception in your logs, there should be a more detailed
execption explaining what file it had problems reading.

If you could post that *full* error message it might help us track this
down ... i would also suggest trying to use the CheckIndex tool to see if
somehow your index got corrupted.


Yes, the index was corrupted.  I don't know how it happened.  Like I 
said, I set the box up months ago and forgot about it.  They decided 
they wanted to use it so I tried to fire it up.  After deleting the 
index, solr started again just fine without any configuration changes. 
Like I said, I have never explicitly set solr/home, in any of my 
production configs, and it always works.


Thanks
-jsd-







Re: SEVERE: Could not start SOLR. Check solr/home property

2010-04-26 Thread Jon Drukman

On 4/26/10 1:18 PM, Siddhant Goel wrote:

Did you by any chance set up multicore? Try passing in the path to the Solr
home directory as -Dsolr.solr.home=/path/to/solr/home while you start Solr.


Nope, no multicore.

I destroyed the index and re-created it from scratch and now it works 
fine.  No idea what was going on there.  Luckily it takes < 10 minutes 
to create and the box is not in production yet.




SEVERE: Could not start SOLR. Check solr/home property

2010-04-26 Thread Jon Drukman

What does this error mean?

SEVERE: Could not start SOLR. Check solr/home property

I've had this solr installation working before, but I haven't looked at 
it in a few months.  I checked it today and the web side is returning a 
500 error, the log file shows this when starting up:



SEVERE: Could not start SOLR. Check solr/home property
 java.lang.RuntimeException: java.io.IOException: read past EOF
   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068)
   at org.apache.solr.core.SolrCore.<init>(SolrCore.java:579)
   at 
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
   at 
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)

   at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
   at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
   at 
org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)

   at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
   at 
org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)


For the record, I've never explictly set "solr/home" ever.  It always 
"just worked".


-jsd-



Boost documents based on a constant value in a field

2010-02-05 Thread Jon Drukman

I have a very simple schema: two integers and two text fields.


   required="true" />

   
   stored="true"/>

   


I want to do full text searching on the text fields as normal.  However, 
I want to boost all documents where question_source == 3 to the top. 
How do I do that?


So the results should be:

All documents where question_source == 3 first, sorted by relevance in 
the text fields.


All other documents sorted by text field relevance.

How do I achieve this?

-jsd-
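Since a plain boost can still be outscored by a very relevant document, a strict "source 3 first" ordering is usually done by indexing a boolean flag and sorting on it ahead of score. This is a sketch, not from the thread; `from_source_3` is a hypothetical field set to true at index time when question_source == 3:

```xml
<!-- schema.xml sketch: hypothetical flag field -->
<field name="from_source_3" type="boolean" indexed="true" stored="false"/>
```

Queries would then add `sort=from_source_3 desc, score desc`, which returns the flagged group first, each group internally ordered by text-field relevance.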



DataImportHandler delta-import confusion

2010-02-01 Thread Jon Drukman
First, let me just say that DataImportHandler is fantastic. It got my 
old mysql-php-xml index rebuild process down from 30 hours to 6 minutes.


I'm trying to use the delta-import functionality now but failing miserably.

Here's my entity tag:  (some SELECT statements reduced to increase 
readability)


  deltaQuery="select moment_id from moments where date_modified > 
'${dataimporter.last_index_time}'"


  deltaImportQuery="select [bunch of stuff]
WHERE m.moment_id = '${dataimporter.delta.MOMENTID}'"

  pk="MOMENTID"

  transformer="TemplateTransformer">

When I look at the MySQL query log I see the date modified query running 
fine and returning 3 rows.  The deltaImportQuery, however, does not have 
the proper primary key in the where clause.  It's just blank.  I also 
tried changing it to ${moment.MOMENTID}.


I don't really get the relation between the pk field and the 
${dataimport.delta.whatever} stuff.


Help please!
-jsd-
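One common pitfall with this setup (a guess, not confirmed in the thread) is that `${dataimporter.delta.X}` is resolved against the column names actually returned by deltaQuery, and the lookup is case-sensitive; MySQL returns `moment_id` in lowercase unless aliased. A sketch that makes the three spellings agree, keeping the thread's elisions:

```xml
<entity name="moment"
        pk="MOMENT_ID"
        query="select [bunch of stuff]"
        deltaQuery="select moment_id as MOMENT_ID from moments
                    where date_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="select [bunch of stuff]
                          where m.moment_id = '${dataimporter.delta.MOMENT_ID}'"
        transformer="TemplateTransformer">
```

The `pk` attribute names the column DIH carries from the deltaQuery into the deltaImportQuery, so the pk, the delta column alias, and the variable reference must all match exactly.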




Re: stemming (maybe?) question

2009-03-17 Thread Jon Drukman

Yonik Seeley wrote:

Not sure... I just took the stock solr example, and it worked fine.

I inserted "o'meara" into example/exampledocs/solr.xml
 <field name="features">Advanced o'meara Full-Text Search
Capabilities using Lucene</field>

the indexed everything:  ./post.sh *.xml

Then queried in various ways:
q=o'meara
q=omeara
q=o%20meara

All of the queries found the solr doc.


i grabbed the original example schema.xml and made my username field use 
the following definition:


<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>



i removed the stopwords and porter stuff because for proper names i 
don't want that.


seems to work fine now, thanks!
-jsd-



Re: stemming (maybe?) question

2009-03-16 Thread Jon Drukman

Yonik Seeley wrote:

On Thu, Mar 12, 2009 at 1:36 PM, Jon Drukman  wrote:

is it possible to make solr think that "omeara" and "o'meara" are the same
thing?


WordDelimiter would handle it if the document had "o'meara" (but you
may or may not want the other stuff that comes with
WordDelimiterFilter).
You could also use a PatternReplaceFilter to normalize tokens like this.


the document does have o'meara in it.  i tried creating a new field type 
based on the wiki information.


positionIncrementGap="100">

  
  
  
  
  
  
  
  
  
  
  




i reindexed everything but now any search on that field returns zero 
results.  what did i do wrong?


-jsd-



stemming (maybe?) question

2009-03-12 Thread Jon Drukman
is it possible to make solr think that "omeara" and "o'meara" are the 
same thing?


-jsd-



Re: exceeded limit of maxWarmingSearchers

2009-02-09 Thread Jon Drukman

Otis Gospodnetic wrote:

I'd say: "Make sure you don't commit more frequently than the time it takes for your 
searcher to warm up", or else you risk searcher overlap and pile-up.


cool.  i found a place in our code where we were committing the same 
thing twice in very rapid succession.  fingers crossed that fixing that 
will solve this problem once and for all.


thanks
-jsd-



Re: exceeded limit of maxWarmingSearchers

2009-02-05 Thread Jon Drukman

Otis Gospodnetic wrote:

Jon,

If you can, don't commit on every update and that should help or fully solve 
your problem.


is there any sort of heuristic or formula i can apply that can tell me 
when to commit?  put it in a cron job and fire it once per hour?


there are certain updates that are critical - we store privacy settings 
on certain data in the doc.  if the user says that document 10 is 
private, we need to have the update reflected immediately.  is there any 
way to have solr block everything until an update is committed?


-jsd-
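On the "block until the update is visible" question: in this vintage of Solr, an explicit XML commit accepts `waitFlush` and `waitSearcher` attributes, and with `waitSearcher="true"` (the default) the HTTP request does not return until the new searcher is registered, so a query issued after the commit returns will see the change. A sketch of the update message:

```xml
<commit waitFlush="true" waitSearcher="true"/>
```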



Re: exceeded limit of maxWarmingSearchers

2009-02-04 Thread Jon Drukman

Otis Gospodnetic wrote:

That should be fine (but apparently isn't), as long as you don't have some very 
slow machine or if your caches are are large and configured to copy a lot of 
data on commit.



this is becoming more and more problematic.  we have periods where we 
get 10 of these exceptions in a 4 second period.  how do i diagnose what 
the cause is, or alternatively work around it?


when you say "copy" are you talking about copyFields or something else?

we commit on every update, but each update is very small... just a few 
hundred bytes on average.




Re: exceeded limit of maxWarmingSearchers

2009-01-30 Thread Jon Drukman

Yonik Seeley wrote:

I'd advise setting it to a very low limit (like 2) and committing less
often.  Once you get too many overlapping searchers, things will slow
to a crawl and that will just cause more to pile up.

The root cause is simply too many commits in conjunction with warming
too long.  If you are using a dev version of Solr 1.4, you might try
commitWithin instead of explicit commits. (see SOLR-793)  Depending
how long warming takes, you may want to lower autowarm counts.


right now we commit on every update, but that's probably not more than 
once every few minutes.  should i back it off?


-jsd-
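Yonik's advice maps onto two solrconfig.xml settings; the numbers here are illustrative, and `autoCommit` availability depends on the Solr version in use:

```xml
<!-- inside <query>: cap concurrent warming searchers -->
<maxWarmingSearchers>2</maxWarmingSearchers>

<!-- inside <updateHandler>: batch client updates into periodic commits -->
<autoCommit>
  <maxDocs>1000</maxDocs>
  <maxTime>60000</maxTime>  <!-- milliseconds -->
</autoCommit>
```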



exceeded limit of maxWarmingSearchers

2009-01-30 Thread Jon Drukman

I am getting hit by a storm of these once a day or so:

SEVERE: org.apache.solr.common.SolrException: Error opening new 
searcher. exceeded limit of maxWarmingSearchers=16, try again later.


I keep bumping up maxWarmingSearchers.  It's at 32 now.  Is there any 
way to figure out what the "right" value is besides trial and error? 
Our site gets extremely minimal traffic so I'm really puzzled why the 
out-of-the-box settings are insufficient.


The index has about 61000 documents, very small, and we do less than one 
query per second.


-jsd-



Re: I get SEVERE: Lock obtain timed out

2009-01-29 Thread Jon Drukman

Yonik Seeley wrote:

On Thu, Jan 29, 2009 at 1:16 PM, Jon Drukman  wrote:

Julian, have you had any luck figuring this out?  My production instance
just started having this problem.  It seems to crop up after solr's been
running for several hours.  Our usage is very light (maybe one query every
few seconds).  I saw someone else mention an out of memory error - this
machine has 8GB of RAM and is running 64bit Linux so it's all available to
solr.  Our index is very small - under 40MB.  the solr process is using
around 615MB of RAM according to top.


I've only seen failure to remove the lock file either when an OOM
exception occured, or the JVM died or was killed.


i guess it's possible that we hit an out of memory error and the 
followup lock errors just bumped it out of the log file rotation.  i was 
running with multilog's default settings so my log files were getting 
thrown out very quickly.  i just bumped up the JVM's max heap size and 
told multilog to keep way more log files so if this happens again 
hopefully i will be able to get more info on what happened.


-jsd-



Re: permanently setting log level?

2009-01-29 Thread Jon Drukman

Vannia Rajan wrote:

On Thu, Jan 29, 2009 at 11:55 PM, Jon Drukman  wrote:


if i go to /solr/admin/logging, i can set the "root" log level to WARNING,
which is what i want.  however, every time solr restarts, it is set back to
INFO.  Is there a way to get the WARNING level to stick permanently?



Hi,
You can set permanent logging-level by changing parameters in
$CATALINA_HOME/conf/logging.properties

Change all INFO to WARNING in the logging.properties

where, $CATALINA_HOME is the path of your apache-tomcat.



i'm not using tomcat, i'm using the default jetty setup that comes with 
solr.  i grepped through the entire solr installation for 'INFO' but i 
don't see it.


i don't really know anything about jetty other than i have to run java 
-jar start.jar to get it to run solr.
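For the record: the bundled Jetty routes Solr's output through java.util.logging (JDK logging), so there is no "INFO" string inside the Solr install to edit. A sketch of a standalone configuration file (the file name and location are your choice):

```
# logging.properties (hypothetical standalone file)
.level = WARNING
```

Then start Jetty with `java -Djava.util.logging.config.file=logging.properties -jar start.jar` so the level survives restarts.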




permanently setting log level?

2009-01-29 Thread Jon Drukman
if i go to /solr/admin/logging, i can set the "root" log level to 
WARNING, which is what i want.  however, every time solr restarts, it is 
set back to INFO.  Is there a way to get the WARNING level to stick 
permanently?


-jsd-



Re: I get SEVERE: Lock obtain timed out

2009-01-29 Thread Jon Drukman

Julian Davchev wrote:

Hi,
Any documents or something I can read on how locks work and how I can
controll it. When do they occur etc.
Cause only way I got out of this mess was restarting tomcat

SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain
timed out: SingleInstanceLock: write.lock


Cheers,



Julian, have you had any luck figuring this out?  My production instance 
just started having this problem.  It seems to crop up after solr's been 
running for several hours.  Our usage is very light (maybe one query 
every few seconds).  I saw someone else mention an out of memory error - 
this machine has 8GB of RAM and is running 64bit Linux so it's all 
available to solr.  Our index is very small - under 40MB.  the solr 
process is using around 615MB of RAM according to top.




Handling proper names

2008-11-07 Thread Jon Drukman
Is there any way to tell Solr that Stephen is the same as Steven and 
Steve?  Carl and Karl?  Bobby/Bob/Robert, and so on...


-jsd-



Re: exceeded limit of maxWarmingSearchers

2008-10-29 Thread Jon Drukman

Feak, Todd wrote:
Have you looked at how long your warm up is taking? 


If it's taking longer to warm up a searcher then it does for you to do
an update, you will be behind the curve and eventually run into this no
matter how big that number.


Most of them say warmupTime=0.  It ranges from 0 to 37.  I hope that is 
msec and not seconds!!


As I said, this server is not even remotely loaded, and the index is 
very small right now - under 5 MB.


-jsd-



exceeded limit of maxWarmingSearchers

2008-10-29 Thread Jon Drukman

I am getting this error quite frequently on my Solr installation:

SEVERE: org.apache.solr.common.SolrException: Error opening new 
searcher. exceeded limit of maxWarmingSearchers=8, try again later.



I've done some googling but the common explanation of it being related 
to autocommit doesn't apply.


Our server is not even in public use yet, it's serving maybe one query 
every second, or less.  I don't understand what could be causing this.


We do a commit on every update, but updates are very infrequent.  One 
every few minutes, and it's a very small update as well.


-jsd-



dismax and stopwords (was Re: dismax and long phrases)

2008-10-09 Thread Jon Drukman

Norberto Meijome wrote:

On Tue, 07 Oct 2008 09:27:30 -0700
Jon Drukman <[EMAIL PROTECTED]> wrote:

Yep, you can "fake" it by only using fieldsets (qf) that have a 
consistent set of stopwords.  

does that mean changing the query or changing the schema?


Jon,
- you change schema.xml to define which type each field is. The fieldType says 
whether you have stopwords or not.
- you change solrconfig.xml to define which fields will dismax query on.

i dont think you should have to change your query.


i got it to work.  the solution is:

add a new field to the schema without stopwords, i use the following type:

  positionIncrementGap="100">

  


  



then use copyField to copy the stopworded version to a second, 
non-stopworded field.  add the non-stopword field to the dismax qf and 
pf fields.  in this example, the stopword field is name and the 
non-stopword field is name_text:


 
<str name="qf">
  name^1.5 name_text^1.8 description^1.0 tags^0.5 location^0.6
  user_name^0.4 misc^0.3 group_name^1.5
</str>
<str name="pf">
  name^1.5 name_text^1.8 description^1.0 group_name^1.5
</str>


restart solr and reindex everything.  it now works.

thanks for all the help!

-jsd-
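The copyField step described above (copying the stopworded `name` field into the stopword-free `name_text` field) is a single schema.xml directive; a sketch, assuming those field names:

```xml
<copyField source="name" dest="name_text"/>
```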



Re: dismax and long phrases

2008-10-07 Thread Jon Drukman

Mike Klaas wrote:


On 6-Oct-08, at 11:20 AM, Jon Drukman wrote:


Chris Hostetter wrote:
It's not a bug in the implementation, it's a side effect of the basic 
tenent of how dismax works since it inverts the input and creates a 
DisjunctionMaxQuery for each "word" in the input, any word that is 
valid in at least one of the "qf" fields generates a "should" clause 
that contributes to the MM count.


you guys are going way over my head now.

is there any way i could 'fake' it by adding a second field without 
stopwords, or something like that?


Yep, you can "fake" it by only using fieldsets (qf) that have a 
consistent set of stopwords.


does that mean changing the query or changing the schema?

i'm sorry, this is all new to me.   speak slowly and use words of one 
syllable or less, please.  :)


-jsd-




Re: dismax and long phrases

2008-10-06 Thread Jon Drukman

Chris Hostetter wrote:
It's not a bug in the implementation, it's a side effect of the basic 
tenent of how dismax works since it inverts the input and creates a 
DisjunctionMaxQuery for each "word" in the input, any word that is valid 
in at least one of the "qf" fields generates a "should" clause that 
contributes to the MM count.  


you guys are going way over my head now.

is there any way i could 'fake' it by adding a second field without 
stopwords, or something like that?


-jsd-



dismax and long phrases

2008-10-03 Thread Jon Drukman

i have a document with the following field

Saying goodbye to Norman

if i search for "saying goodbye to norman" with the standard query, it 
works fine.  if i specify dismax, however, it does not match.  here's 
the output of debugQuery, which I don't understand at all:


saying goodbye to norman
saying goodbye to norman
+((DisjunctionMaxQuery((user_name:saying^0.4 | 
description:say | tags:say^0.5 | misc:say^0.3 | group_name:say^1.5 | 
location:saying^0.6 | name:say^1.5)~0.01) 
DisjunctionMaxQuery((user_name:goodbye^0.4 | description:goodby | 
tags:goodby^0.5 | misc:goodby^0.3 | group_name:goodby^1.5 | 
location:goodbye^0.6 | name:goodby^1.5)~0.01) 
DisjunctionMaxQuery((user_name:to^0.4 | location:to^0.6)~0.01) 
DisjunctionMaxQuery((user_name:norman^0.4 | description:norman | 
tags:norman^0.5 | misc:norman^0.3 | group_name:norman^1.5 | 
location:norman^0.6 | name:norman^1.5)~0.01))~4) 
DisjunctionMaxQuery((description:"say goodby norman"~100 | 
group_name:"say goodby norman"~100^1.5 | name:"say goodby 
norman"~100^1.5)~0.01)
+(((user_name:saying^0.4 | 
description:say | tags:say^0.5 | misc:say^0.3 | group_name:say^1.5 | 
location:saying^0.6 | name:say^1.5)~0.01 (user_name:goodbye^0.4 | 
description:goodby | tags:goodby^0.5 | misc:goodby^0.3 | 
group_name:goodby^1.5 | location:goodbye^0.6 | name:goodby^1.5)~0.01 
(user_name:to^0.4 | location:to^0.6)~0.01 (user_name:norman^0.4 | 
description:norman | tags:norman^0.5 | misc:norman^0.3 | 
group_name:norman^1.5 | location:norman^0.6 | name:norman^1.5)~0.01)~4) 
(description:"say goodby norman"~100 | group_name:"say goodby 
norman"~100^1.5 | name:"say goodby norman"~100^1.5)~0.01




it works fine if I search for "say goodbye" or "saying goodbye" or 
"saying goodbye norman".  how can i get it to do exact matches (which 
should score very high)?



-jsd-



Re: help required: how to design a large scale solr system

2008-09-24 Thread Jon Drukman

Martin Iwanowski wrote:
How can I setup to run Solr as a service, so I don't need to have a SSH 
connection open?


The advice that I was given on this very list was to use daemontools.  I 
set it up and it is really great - starts when the machine boots, 
auto-restart on failures, easy to bring up/down on demand.  Search the 
archive for my post on the subject, I explained how to set it up in detail.


(I've also had success using launchd to manage Solr on Mac OS X in case 
anyone wants to try running it on their desktop.)


-jsd-



Re: dismax - undefined field exception

2008-09-22 Thread Jon Drukman

Sean Timm wrote:
Add echoParams=all to your URL and look for the "cat" field in one of 
the passed parameters.  Specifically, in pf and qf.  These can be 
defaulted in the solrconfig.xml file.


i tried that but the exception prevents solr from returning anything.

but i did look in solrconfig.xml and i see what you're talking about. 
looks like that was the ticket.  thanks!


-jsd-





dismax - undefined field exception

2008-09-22 Thread Jon Drukman

whenever i try to use qt=dismax i get the following error:

Sep 22, 2008 11:50:48 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: undefined field cat
at 
org.apache.solr.schema.IndexSchema.getDynamicFieldType(IndexSchema.java:1053)



i don't have any dynamic fields in my schema, and there is nothing named 
'cat'.


my schema looks like this (minus the parts that came with the default 
schema.xml):


 
   required="true" />

   
   
   
   
   
   
   
   
   
   
   
   
   
 

 <uniqueKey>type_id</uniqueKey>

 <defaultSearchField>name</defaultSearchField>


i thought i used to have this working but now i'm not so sure.

-jsd-



How to use copyfield with dynamicfield?

2008-09-22 Thread Jon Drukman

I have a dynamicField declaration:

<dynamicField name="*_t" type="text" indexed="true" stored="true"/>


I want to copy any *_t's into a text field for searching with dismax. 
As it is, it appears you can't search dynamicfields this way.


I tried adding a copyField:

<copyField source="*_t" dest="text"/>

I do have a text field in my schema:
 


However I get 400 errors whenever I try to update a record with entries 
in the *_t.



INFO: /update  0 2
Sep 22, 2008 10:04:40 AM org.apache.solr.core.SolrException log
SEVERE: org.apache.solr.core.SolrException: ERROR: multiple values 
encountered for non multiValued field text: first='Centennial Dr, 
Oakland, CA' second=''
at 
org.apache.solr.update.DocumentBuilder.addSingleField(DocumentBuilder.java:62)


I'm going to guess that the copyField with a wildcard is not allowed. 
If that is true, how does one deal with the situation where you want to 
allow new fields AND have them searchable?


-jsd-
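The "multiple values encountered for non multiValued field text" error points at a fix worth noting: a catch-all target of a wildcard copyField receives one value per matching source field, so it has to be declared multiValued. A sketch using the field names above (other attributes are assumptions):

```xml
<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
<copyField source="*_t" dest="text"/>
```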



Re: Illegal character in xml file

2008-09-19 Thread Jon Drukman
James liu wrote:
> first, u should escape some string like (code by php)
> 
>> function escapeChars($string) {
>>
>     $string = str_replace("&", "&amp;", $string);
>
>     $string = str_replace("<", "&lt;", $string);
>
>     $string = str_replace(">", "&gt;", $string);
>
>     $string = str_replace("'", "&apos;", $string);
>
>     $string = str_replace('"', "&quot;", $string);
>
>     return $string;
>
> }

php has this as a built in function.

$string = htmlentities($string);

that's what i use to protect my solr input.

-jsd-
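One caveat on htmlentities(): it emits HTML named entities (&eacute; and friends) that are not defined in XML, so htmlspecialchars() or a five-entity escaper like the one quoted above is the safer choice for Solr's XML update format. The same escaping in Python, as a sketch:

```python
from xml.sax.saxutils import escape

def solr_escape(s):
    # escape() handles & < > ; the entity map adds the two quote characters.
    return escape(s, {"'": "&apos;", '"': "&quot;"})

print(solr_escape('Peanut Butter & Jelly <"jam">'))
# Peanut Butter &amp; Jelly &lt;&quot;jam&quot;&gt;
```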



Re: Dismax + Dynamic fields

2008-09-18 Thread Jon Drukman

Daniel Papasian wrote:

Norberto Meijome wrote:

Thanks Yonik. ok, that matches what I've seen - if i know the actual
name of the field I'm after, I can use it in a query it, but i can't
use the dynamic_field_name_* (with wildcard) in the config.

Is adding support for this something that is desirable / needed
(doable??) , and is it being worked on ?


You can use a wildcard with copyField to copy the dynamic fields that
match the pattern to another field that you can then query on. It seems
like that would cover your needs, no?


this is biting me right now and i don't understand how to specify the 
copyField to do what i want.


i have a dynamic field declaration like:

<dynamicField name="*_t" type="text" indexed="true" stored="true"/>

in the documents that i'm adding i am specifying location_t and group_t, 
for example, although i may decide to add more later - obviously that 
seems like the ideal use case for the dynamicField.  however i cannot 
search these fields unless i specify them explicitly 
(q=location_t:something) and it doesn't work with dismax.


i want all fields searchable, otherwise why would i bother with 
indexed="true" in the dynamicField?


how do i use copyField to search location_t, group_t and any other _t i 
might decide to add later?


-jsd-




Adding a field?

2008-08-26 Thread Jon Drukman
Is there a way to add a field to an existing index without stopping the 
server, deleting the index, and reloading every document from scratch?


-jsd-



Re: Solr won't start under jetty on RHEL5.2

2008-08-18 Thread Jon Drukman

Jon Drukman wrote:
I just migrated my solr instance to a new server, running RHEL5.2.  I 
installed java from yum but I suspect it's different from the one I used 
to use.



Turns out my instincts were correct.  The version from yum does not 
work. I installed the official sun jdk and now it starts fine.


bad:

java version "1.4.2"
gij (GNU libgcj) version 4.1.2 20071124 (Red Hat 4.1.2-42)

good:

java version "1.6.0_07"
Java(TM) SE Runtime Environment (build 1.6.0_07-b06)
Java HotSpot(TM) 64-Bit Server VM (build 10.0-b23, mixed mode)


-jsd-



Solr won't start under jetty on RHEL5.2

2008-08-18 Thread Jon Drukman
I just migrated my solr instance to a new server, running RHEL5.2.  I 
installed java from yum but I suspect it's different from the one I used 
to use.


Anyway, my Solr no longer works.

2008-08-18 18:01:12.079::INFO:  Logging to STDERR via 
org.mortbay.log.StdErrLog

2008-08-18 18:01:12.229::INFO:  jetty-6.1.3
2008-08-18 18:01:12.330::INFO:  Extract 
jar:file:/home/apps/solr/solr-1.2.0/webapps/solr.war!/ to 
/tmp/Jetty_0_0_0_0_8983_solr.war__solr__k1kf17/webapp
2008-08-18 18:01:12.452::INFO:  NO JSP Support for /solr, did not find 
org.apache.jasper.servlet.JspServlet

18-Aug-08 6:01:12 PM org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init()
18-Aug-08 6:01:12 PM org.apache.solr.core.Config getInstanceDir
INFO: JNDI not configured for Solr (NoInitialContextEx)
18-Aug-08 6:01:12 PM org.apache.solr.core.Config getInstanceDir
INFO: Solr home defaulted to 'null' (could not find system property or JNDI)
18-Aug-08 6:01:12 PM org.apache.solr.core.Config setInstanceDir
INFO: Solr home set to 'solr/'
18-Aug-08 6:01:12 PM org.apache.solr.core.SolrConfig initConfig
INFO: Loaded SolrConfig: solrconfig.xml
18-Aug-08 6:01:12 PM org.apache.solr.servlet.SolrDispatchFilter init
INFO: user.dir=/home/apps/solr/solr-1.2.0
2008-08-18 18:01:12.663::WARN:  failed SolrRequestFilter
java.lang.NoClassDefFoundError: org.apache.solr.core.SolrCore
   at java.lang.Class.initializeClass(libgcj.so.7rh)
   at 
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:75)

   at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
   at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
   at 
org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)

   at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
   at 
org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
   at 
org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
   at 
org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
   at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
   at 
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
   at 
org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
   at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
   at 
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
   at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
   at 
org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)

   at org.mortbay.jetty.Server.doStart(Server.java:210)
   at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)

   at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
   at java.lang.reflect.Method.invoke(libgcj.so.7rh)
   at org.mortbay.start.Main.invokeMain(Main.java:183)
   at org.mortbay.start.Main.start(Main.java:497)
   at org.mortbay.start.Main.main(Main.java:115)


All attempts to load solr pages result in 404 not found errors.  I 
suspect this is a Jetty configuration problem but I know nothing about 
jetty or servlet containers or anything like that.  Could someone 
explain in words of one syllable or less how to get it to find the 
installation please?


Thanks
-jsd-



Re: Administrative questions

2008-08-15 Thread Jon Drukman

Jason Rennie wrote:

On Wed, Aug 13, 2008 at 1:52 PM, Jon Drukman <[EMAIL PROTECTED]> wrote:


Duh.  I should have thought of that.  I'm a big fan of djbdns so I'm quite
familiar with daemontools.

Thanks!



:)  My pleasure.  Was nice to hear recently that DJB is moving toward more
flexible licensing terms.  For anyone unfamiliar w/ daemontools, here's
DJB's explanation of why they rock compared to inittab, ttys, init.d, and
rc.local:

http://cr.yp.to/daemontools/faq/create.html#why


in case anybody wants to know, here's how to run solr under daemontools.

1. install daemontools
2. create /etc/solr
3. create a user and group called solr
4. create shell script /etc/solr/run  (edit to taste, i'm using the 
default jetty that comes with solr)


#!/bin/sh
exec 2>&1
cd /usr/local/apache-solr-1.2.0/example
exec setuidgid solr java -jar start.jar


5. create /etc/solr/log/run containing:

#!/bin/sh
exec setuidgid solr multilog t ./main

6. ln -s /etc/solr /service/solr

that is all.  as long as you've got svscan set to launch when the system 
boots, solr will run and auto-restart on crashes.  logs will be in 
/service/solr/log/main (auto-rotated).


yay.
-jsd-



Re: Administrative questions

2008-08-13 Thread Jon Drukman

Jason Rennie wrote:

On Tue, Aug 12, 2008 at 8:49 PM, Jon Drukman <[EMAIL PROTECTED]> wrote:


1. How do people deal with having solr start when system reboots, manage
the log output, etc.  Right now I run it manually under a unix 'screen'
command with a wrapper script that takes care of restarts when it crashes.
 That means that only my user can connect to it, and it can't happen when
the system starts up... But I don't see any other way to control the process
easily.



We use daemontools.  Restarts solr whenever it goes down (for whatever
reason) and directs output to a set of rotated log files.  Very handy for a
production environment.  A bit tricky to set, but solid once you have it in
place.

http://cr.yp.to/daemontools.html


*facepalm*

Duh.  I should have thought of that.  I'm a big fan of djbdns so I'm 
quite familiar with daemontools.


Thanks!

-jsd-



Administrative questions

2008-08-12 Thread Jon Drukman
1. How do people deal with having solr start when system reboots, manage 
the log output, etc.  Right now I run it manually under a unix 'screen' 
command with a wrapper script that takes care of restarts when it 
crashes.  That means that only my user can connect to it, and it can't 
happen when the system starts up... But I don't see any other way to 
control the process easily.


2. Is there any way to modify a schema without stopping the process, 
destroying the existing index, then restarting and reloading all the 
data?  It doesn't take that long and we're not in production yet, but 
once we're live I can't see that being feasible.


-jsd-



Re: Wildcard search question

2008-06-24 Thread Jon Drukman

Norberto Meijome wrote:
ok well let's say that i can live without john/jon in the short term. 
what i really need today is a case insensitive wildcard search with 
literal matching (no fancy stemming.  bobby is bobby, not bobbi.)


what are my options?

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

define your own type (or modify text / string... but I find that it gets 
confusing to have variations of text / string ...) to perform the operations on 
the content as needed.

There are also other tokenizer/analysers available that *may* help in the 
partial searches (ngram , edgengram ), but there isn't much documentation on 
them yet (that I could find) - I am only getting into them myself i'll see 
how it goes..


thanks, that got me on the right track.  i came up with this:


  


  
  


  


now searching for user_name:bobby* works as i wanted.

my next question: is there a way that i can score matches that are at 
the start of the string higher than matches in the middle?  for example, 
if i search for steve, i get kelly stevenson before steve jobs.  i'd 
like steve jobs to come first.


-jsd-
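Since wildcard terms skip analysis (the point Erik makes in this thread), the lowercased index terms will never match a mixed-case prefix; the usual workaround is for the client to lowercase the term before appending the wildcard. A sketch:

```python
def prefix_query(field, prefix):
    # Wildcard queries are not analyzed by Solr, so normalize the
    # case client-side to match the lowercased index terms.
    return "%s:%s*" % (field, prefix.lower())

print(prefix_query("user_name", "Bobby"))  # user_name:bobby*
```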



Re: Wildcard search question

2008-06-23 Thread Jon Drukman

Erik Hatcher wrote:
No, because the original data is Bobby Gaza, so 
Bobby* would match, but not bobby*. "string" type (in the example 
schema, to be clear) does effectively no analysis, leaving the original 
string indexed as-is, case and all.

[...]

stemming and wildcard term queries aren't quite compatible, as you've 
found, but it does depend on how much of the prefix is provided.  bob* 
matches "bobbi", for example.
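The mismatch described above can be sketched directly: the index holds stemmed terms, while wildcard queries bypass analysis and match against those raw stored terms. The `stem()` below is a one-rule stand-in for the Porter stemmer used by the example "text" type, not the real algorithm.

```python
def stem(word):
    """Toy Porter-style rule: trailing 'y' after a consonant -> 'i',
    so 'bobby' stems to 'bobbi'."""
    if word.endswith("y") and len(word) > 2 and word[-2] not in "aeiou":
        return word[:-1] + "i"
    return word

# The index contains the *stemmed* forms of the original tokens.
indexed = {stem(w) for w in ["bobby", "gaza"]}  # {'bobbi', 'gaza'}

def wildcard_match(prefix):
    """Wildcard prefixes are matched literally against indexed terms."""
    return {t for t in indexed if t.startswith(prefix)}

print(wildcard_match("bobby"))  # set() -- 'bobbi' doesn't start with 'bobby'
print(wildcard_match("bob"))    # {'bobbi'} -- a shorter prefix still matches
```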


ok well let's say that i can live without john/jon in the short term. 
what i really need today is a case insensitive wildcard search with 
literal matching (no fancy stemming.  bobby is bobby, not bobbi.)


what are my options?
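For reference, one common recipe for case-insensitive literal matching is a field type that does nothing but tokenize and lowercase — a sketch along these lines (the type name is arbitrary; use WhitespaceTokenizerFactory instead of KeywordTokenizerFactory if you want per-word rather than whole-value matching):

```xml
<fieldtype name="string_lc" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- keep the whole field value as one token, just lowercase it -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>
```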

-jsd-



Re: Wildcard search question

2008-06-23 Thread Jon Drukman

Erik Hatcher wrote:

Jon,

You provided a lot of nice details, thanks for helping us help you :)

The one missing piece is the definition of the "text" field type.   In 
Solr's _example_ schema, "bobby" gets analyzed (stemmed) to 
"bobbi"[1].   When you query for bobby*, the query parser is not running 
an analyzer on the wildcard query, thus literally searching for terms 
that begin with "bobby"[2].


As for "steve" , same story, but it analyzes to "steve", which is found 
with a "steve*" query.


so, what's the solution?  if i change the field to string, will it be 
able to find bobby* ?  eventually it would be nice to be able to use 
fuzzy matching, to find 'jon' from 'john', for example.


thanks
-jsd-



Wildcard search question

2008-06-23 Thread Jon Drukman

When I search with q=bobby I get the following record:

<doc>
 <date name="created">2008-06-23T07:06:40Z</date>
 <str name="image">http://farm1.static.flickr.com/117/...</str>
 <str name="id">9</str>
 <str name="user_name">Bobby Gaza</str>
 <str name="email">[EMAIL PROTECTED]</str>
</doc>


When I search with bobby* I get nothing.

When I search with steve* I get "Steve Ballmer" and "Steve Jobs"... 
What's going on?



The relevant part of my schema.xml is:


 <field name="type_id" type="string" indexed="true" stored="true" required="true" />

 <field name="id" type="sint" indexed="true" stored="true" />
 <field name="name" type="text" indexed="true" stored="true" />
 <field name="user_name" type="text" indexed="true" stored="true" />
 <field name="email" type="string" indexed="true" stored="true" />
 <field name="image" type="string" indexed="true" stored="true" />
 <field name="created" type="date" indexed="true" stored="true" />

 <uniqueKey>type_id</uniqueKey>

 <defaultSearchField>name</defaultSearchField>



Best type to use for enum-like behavior

2008-06-12 Thread Jon Drukman
I am going to store two totally different types of documents in a single 
solr instance.  Eventually I may separate them into separate instances 
but we are a long way from having either the size or traffic to require 
that.


I read somewhere that a good approach is to add a 'type' field to the 
data and then use a filter query.  What data type would you use for the 
type field?  I could just use an integer, but then we have to remember that
1=user, 2=item, and so on.  In mysql there's an enum type where you use 
text labels that are mapped to integers behind the scenes (good 
performance and user friendly).  Is there something similar in solr or 
should I just use a string?
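The usual answer on this list is a plain string field with human-readable labels, filtered at query time with fq=type:user — no hidden numeric mapping to remember. A toy model of that approach (field names and documents are made up):

```python
docs = [
    {"id": "user-1", "type": "user", "name": "jsd"},
    {"id": "item-1", "type": "item", "name": "widget"},
]

def search(docs, fq):
    """Mimic a Solr filter query like fq=type:user --
    match the literal string label on the named field."""
    field, _, value = fq.partition(":")
    return [d for d in docs if d.get(field) == value]

print([d["id"] for d in search(docs, "type:user")])  # ['user-1']
```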


-jsd-



Re: Newbie Q: searching multiple fields

2008-06-02 Thread Jon Drukman

Yonik Seeley wrote:

  <field name="description" type="string" indexed="true" stored="true" />
  <field name="tags" type="string" indexed="true" stored="true" />
  <field name="author" type="string" indexed="true" stored="true" />


There is your issue:  type "string" indexes the whole field value as a
single token.
You want type "text" like you have on the name field.


yep, i noticed that right after i hit send.  things are working now.

sorry, i did say i was a newbie!

-jsd-




Re: Newbie Q: searching multiple fields

2008-06-02 Thread Jon Drukman

Yonik Seeley wrote:

Verify all the fields you want to search on indexed
Verify that the query is being correctly built by adding
debugQuery=true to the request


here is the schema.xml extract:

 <field name="id" type="string" indexed="true" stored="true" required="true" />

 <field name="name" type="text" indexed="true" stored="true" />
 <field name="description" type="string" indexed="true" stored="true" />
 <field name="tags" type="string" indexed="true" stored="true" />
 <field name="author" type="string" indexed="true" stored="true" />
 <field name="created" type="date" indexed="true" stored="true" />

here is the debugQuery output.  i have no idea how to read it:

<response>
 <lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">0</int>
  <lst name="params">
   <str name="qt">dismax</str>
   <str name="q">descriptive</str>
   <str name="debugQuery">1</str>
  </lst>
 </lst>
 <lst name="debug">
  <str name="rawquerystring">descriptive</str>
  <str name="querystring">descriptive</str>
  <str name="parsedquery">+DisjunctionMaxQuery((tags:descriptive^0.8 | description:descriptive^1.5 | name:descript^2.0)~0.01) DisjunctionMaxQuery((tags:descriptive | description:descriptive^2.0 | name:descript^2.0)~0.01)</str>
  <str name="parsedquery_toString">+(tags:descriptive^0.8 | description:descriptive^1.5 | name:descript^2.0)~0.01 (tags:descriptive | description:descriptive^2.0 | name:descript^2.0)~0.01</str>
  <lst name="explain"/>
 </lst>
</response>

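For what it's worth, the ~0.01 in the parsed query is the DisMax tie parameter: a document's score for the disjunction is the maximum of its per-field scores, plus tie times the remaining field scores. A toy version of that combination (the field scores are made up):

```python
def dismax_score(field_scores, tie=0.01):
    """DisjunctionMaxQuery combination: max score wins, the other
    fields contribute only a tie-breaking fraction."""
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)

# e.g. description=1.5, tags=0.8, name=0.0 for some document
print(dismax_score([1.5, 0.8, 0.0]))
```

With tie=0.01 the non-maximal fields barely nudge the score; setting tie=1.0 would turn the max into a plain sum across fields.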



Newbie Q: searching multiple fields

2008-06-02 Thread Jon Drukman
I am brand new to Solr.  I am trying to get a very simple setup running. 
 I've got just a few fields: name, description, tags.  I am only able 
to search on the default field (name) however.  I tried to set up the 
dismax config to search all the fields, but I never get any results on 
the other fields.  Example doc:



<doc>
  <str name="id">318</str>
  <str name="name">Testing the new system</str>
  <str name="description">Here is the very descriptive description</str>
  <str name="author">jsd</str>
  <date name="created">2008-05-16T05:05:10Z</date>
</doc>


q=system finds this doc.

q=descriptive does not.

q=descriptive&qt=dismax does not

q=descriptive&qt=dismax&qf=description does not

my solrconfig contains:

 <requestHandler name="dismax" class="solr.DisMaxRequestHandler">
  <lst name="defaults">
   <str name="echoParams">explicit</str>
   <float name="tie">0.01</float>
   <str name="qf">
      name^2 description^1.5 tags^0.8
   </str>
   <str name="pf">
      name^2 description^2 tags^1
   </str>
   <int name="ps">100</int>
   <str name="q.alt">*:*</str>
  </lst>
 </requestHandler>

What am I missing?
-jsd-