Re: Index Solr Logs

2011-06-26 Thread mike anderson
Check out Loggly: http://www.loggly.com/. They use Solr to index all kinds
of logs, Solr's own logs included. It's a paid service, so it may not be what
you're looking for, but I've used it and it works great.

-Mike

On Sun, Jun 26, 2011 at 5:49 AM, Mr Havercamp  wrote:

> I'm interested to know if there is a way to have Solr index its own logs,
> in particular the logging of queries.
>
> One project that showed promise was Sogger, but I believe the developer is
> now working more closely with LogStash, which uses ElasticSearch, so my guess
> is that the Sogger project is no longer being developed.
>
> Has anyone else had experience with this and can share their
> thoughts/findings/solution?
>
> Cheers
>
>
> hayden
>
>
>


Re: Multicore boosting to only 1 core

2011-02-15 Thread mike anderson
Could you make an additional date field, call it date_boost, that gets
populated in all of the cores EXCEPT the one with the newest articles, and
then boost on this field? Then when you move articles from the 'newest' core
to the rest of the cores you copy over the date to the date_boost field. (I
haven't used boosting before so I don't know what happens if you try to
boost a field that's empty)

This would boost documents in each index (locally, as desired). Keep in mind
when you get your results back from a distributed shard query that the IDF
is not distributed so your scores aren't reliable for sorting.

-mike
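The date_boost idea above can be expressed as a dismax "boost function". A minimal sketch, with assumptions flagged: the field names and the recip/ms constants are illustrative (the 3.16e-11 value is the commonly cited "roughly 1 / one year in milliseconds" recipe), not anything specified in this thread.

```python
from urllib.parse import urlencode

# Hypothetical sketch of the per-core date boost: a dismax boost function
# on the suggested `date_boost` field. The recip/ms constants follow the
# commonly cited "3.16e-11 ~ 1/(one year in ms)" recipe -- tune for your
# data. Documents without `date_boost` get only a near-zero value from
# this function (behavior on empty fields is version-dependent; test it).
def date_boost_params(user_query):
    return {
        "q": user_query,
        "defType": "dismax",
        "qf": "title body",                              # assumed field names
        "bf": "recip(ms(NOW,date_boost),3.16e-11,1,1)",  # decays with document age
    }

params = date_boost_params("solar panels")
query_string = urlencode(params)
```

Since each core evaluates the function against its own documents, cores whose documents lack the field are effectively left unboosted, which is the asymmetry the original question asks for.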


On Tue, Feb 15, 2011 at 1:19 PM, Jonathan Rochkind  wrote:

> No. In fact, there's no way to search over multi-cores at once in Solr at
> all, even before you get to your boosting question. Your different cores are
> entirely different Solr indexes, Solr has no built-in way to combine
> searches across multiple Solr instances.
>
> [Well, sort of it can, with sharding. But sharding is unlikely to be a
> solution to your problem either, UNLESS your problem is that your Solr index
> is so big you want to split it across multiple machines for performance.
>  That is the problem sharding is meant to solve. People trying to use it to
> solve other problems run into trouble.]
>
>
> On 2/14/2011 1:59 PM, Tanner Postert wrote:
>
>> I have a multicore system and I am looking to boost results by date, but
>> only for 1 core. Is this at all possible?
>>
>> Basically one of the core's content is very new, and changes all the time,
>> and if I boost everything by date, that core's content will almost always
>> be
>> at the top of the results, so I only want to do the date boosting to the
>> cores that have older content so that their more recent results get
>> boosted
>> over the older content.
>>
>


Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-21 Thread mike anderson
[x] ASF Mirrors (linked in our release announcements or via the Lucene
website)

[] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)

[x] I/we build them from source via an SVN/Git checkout.

[] Other (someone in your company mirrors them internally or via a
downstream project)


On Tue, Jan 18, 2011 at 4:04 PM, Grant Ingersoll wrote:

> As devs of Lucene/Solr, due to the way ASF mirrors, etc. works, we really
> don't have a good sense of how people get Lucene and Solr for use in their
> application.  Because of this, there has been some talk of dropping Maven
> support for Lucene artifacts (or at least make them external).  Before we do
> that, I'd like to conduct an informal poll of actual users out there and see
> how you get Lucene or Solr.
>
> Where do you get your Lucene/Solr downloads from?
>
> [] ASF Mirrors (linked in our release announcements or via the Lucene
> website)
>
> [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
>
> [] I/we build them from source via an SVN/Git checkout.
>
> [] Other (someone in your company mirrors them internally or via a
> downstream project)
>
> Please put an X in the box that applies to you.  Multiple selections are OK
> (for instance, if one project uses a mirror and another uses Maven)
>
> Please do not turn this thread into a discussion on Maven and its
> (de)merits, I simply want to know, informally, where people get their JARs
> from.  In other words, no discussion is necessary (we already have that
> going on d...@lucene.apache.org which you are welcome to join.)
>
> Thanks,
> Grant


Re: Improving Solr performance

2011-01-10 Thread mike anderson
Not sure if this was mentioned yet, but if you are doing slave/master
replication you'll need 2x the RAM at replication time. Just something to
keep in mind.

-mike

On Mon, Jan 10, 2011 at 5:01 PM, Toke Eskildsen wrote:

> On Mon, 2011-01-10 at 21:43 +0100, Paul wrote:
> > > I see from your other messages that these indexes all live on the same
> machine.
> > > You're almost certainly I/O bound, because you don't have enough memory
> for the
> > > OS to cache your index files.  With 100GB of total index size, you'll
> get best
> > > results with between 64GB and 128GB of total RAM.
> >
> > Is that a general rule of thumb? That it is best to have about the
> > same amount of RAM as the size of your index?
>
> It does not seem like there is a clear consensus on hardware to
> handle IO problems. I am firmly in the SSD camp, but as you can see from
> the current thread, other people recommend RAM and/or extra machines.
>
> I can say that our tests with RAM and spinning disks showed us that a
> lot of RAM certainly helps a lot, but also that it takes a considerable
> amount of time to warm the index before the performance is satisfactory.
> It might be helped with disk cache tricks, such as copying the whole
> index to /dev/null before opening it in Solr.
>
> > So, with a 5GB index, I should have between 4GB and 8GB of RAM
> > dedicated to solr?
>
> Not as -Xmx, but free for disk cache, yes. If you follow the RAM ~=
> index size recommendation.
>
>
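The disk-cache trick Toke mentions just forces the OS to pull the index files into the page cache before Solr opens them (the usual shell form is `cat index/* > /dev/null`). A minimal sketch of the same warm-up, assuming a flat index directory:

```python
import os

# Minimal sketch of the disk-cache warming trick mentioned above: read
# every file in the index directory once so the OS page cache is hot
# before Solr opens the index. Returns total bytes read.
def warm_page_cache(index_dir, chunk_size=1 << 20):
    bytes_read = 0
    for name in sorted(os.listdir(index_dir)):
        path = os.path.join(index_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)  # discarded; only the read matters
                if not chunk:
                    break
                bytes_read += len(chunk)
    return bytes_read
```

This only helps if the index actually fits in free RAM; otherwise the early files are evicted before the warm-up finishes.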


Re: Improving Solr performance

2011-01-07 Thread mike anderson
Making sure the index can fit in memory (you don't have to allocate that
much to Solr, just make sure it's available to the OS so it can cache it --
otherwise you are paging the hard drive, which is why you are probably IO
bound) has been the key to our performance. We recently opted to use less
RAM and store the indices on SSDs, we're still evaluating this approach but
so far it seems to be comparable, so I agree with Toke! (We have 18 shards
and over 100GB of index).

On Fri, Jan 7, 2011 at 10:07 AM, Toke Eskildsen wrote:

> On Fri, 2011-01-07 at 10:57 +0100, supersoft wrote:
>
> [5 shards, 100GB, ~20M documents]
>
> ...
>
> [Low performance for concurrent searches]
>
> > Using JConsole for monitoring the server java proccess I checked that
> Heap
> > Memory and the CPU Usages don't reach the upper limits so the server
> > shouldn't perform as overloaded.
>
> If memory and CPU is okay, the culprit is I/O.
>
> Solid state drives have more than proven their worth for random access
> I/O, which is used a lot when searching with Solr/Lucene. SSD's are
> plug-in replacements for harddrives and they virtually eliminate I/O
> performance bottlenecks when searching. This also means shortened warm
> up requirements and less need for disk caching. Expanding RAM capacity
> does not scale well and requires extensive warmup. Adding more machines
> is expensive and often requires architectural changes. With the current
> prices for SSD's, I consider them the generic first suggestion for
> improving search performance.
>
> Extra spinning disks improves the query throughput in general and speeds
> up single queries when the shards are searched in parallel. They do not
> help much for a single sequential searching of shards as the seek time
> for a single I/O request is the same regardless of the number of drives.
> If your current response time for a single user is satisfactory, adding
> drives is a viable solution for you. I'll still recommend the SSD option
> though, as it will also lower the response time for a single query.
>
> Regards,
> Toke Eskildsen
>
>


Re: how well does multicore scale?

2010-10-27 Thread mike anderson
That's a great point. If SSDs are sufficient, then what does the "Index size
vs Response time" curve look like? Since that would dictate the number of
machines needed. I took a look at
http://wiki.apache.org/solr/SolrPerformanceData but only one use case seemed
comparable. We currently have about 25M docs, split into 18 shards, with a
total index size of about 120GB. If index size truly has little impact on
performance then perhaps tagging articles with user IDs is a better way to
approach my use case.

-Mike



On Wed, Oct 27, 2010 at 9:45 AM, Toke Eskildsen wrote:

> On Wed, 2010-10-27 at 14:20 +0200, mike anderson wrote:
> > [...] By my simple math, this would mean that if we want each shard's
> > index to be able to fit in memory, [...]
>
> Might I ask why you're planning on using memory-based sharding? The
> performance gap between memory and SSDs is not very big so using memory
> to get those last queries/second is quite expensive.
>
>


Re: how well does multicore scale?

2010-10-27 Thread mike anderson
Tagging every document with a few hundred thousand 6-character user ids
would increase the document size by two orders of magnitude. I can't
imagine why this wouldn't mean the index would increase by just as much
(though I really don't know much about that file structure). By my simple
math, this would mean that if we want each shard's index to be able to fit
in memory, then (even with some beefy servers) each query would have to go
out to a few thousand shards (as opposed to 21 if we used the MultiCore
approach). This means the typical response time would be much slower.


-mike
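The two-orders-of-magnitude estimate checks out on a quick back-of-envelope calculation. The specific byte counts below are assumptions (raw text only, ignoring index encoding and per-term overhead), chosen to match the "few hundred thousand 6-character ids" framing:

```python
# Back-of-envelope check of the size claim, with assumed numbers.
ids_per_doc = 200_000          # "a few hundred thousand" (assumption)
bytes_per_id = 7               # 6 chars + a separator
assumed_doc_bytes = 20_000     # ~20 KB article body (assumption)

tag_bytes = ids_per_doc * bytes_per_id          # 1.4 MB of tags per doc
growth_factor = tag_bytes / assumed_doc_bytes   # ~70x, i.e. ~2 orders of magnitude
```

Real index growth would differ (term dictionaries share id strings across documents, postings compress well), but the raw-text ratio supports the intuition in the message above.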

On Tue, Oct 26, 2010 at 10:15 AM, Jonathan Rochkind wrote:

> mike anderson wrote:
>
>> I'm really curious if there is a clever solution to the obvious problem
>> with: "So you're better off using a single index with a user id and using
>> a query filter with the user id when fetching data.", i.e. when you have
>> hundreds of thousands of user IDs tagged on each article. That just
>> doesn't
>> sound like it scales very well..
>>
>>
> Actually, I think that design would scale pretty fine, I don't think
> there's an 'obvious' problem. You store your userIDs in a multi-valued field
> (or as multiple terms in a single value, ends up being similar). You fq on
> there with the current userID.   There's one way to find out of course, but
> that doesn't seem a patently ridiculous scenario or anything, that's the
> kind of thing Solr is generally good at, it's what it's built for.   The
> problem might actually be in the time it takes to add such a document to the
> index; but not in query time.
>
> Doesn't mean it's the best solution for your problem though, I can't say.
>
> My impression is that Solr in general isn't really designed to support the
> kind of multi-tenancy use case people are talking about lately.  So trying
> to make it work anyway... if multi-cores work for you, then great, but be
> aware they weren't really designed for that (having thousands of cores) and
> may not work well. If a single index can work for you instead, great, but as
> you've discovered it's not necessarily obvious how to set up the schema to do what
> you need -- really this applies to Solr in general, unlike an rdbms where
> you just third-form-normalize everything and figure it'll work for almost
> any use case that comes up,  in Solr you generally need to custom fit the
> schema for your particular use cases, sometimes being kind of clever to
> figure out the optimal way to do that.
>
> This is, I'd argue/agree, indeed kind of a disadvantage, setting up a Solr
> index takes more intellectual work than setting up an rdbms. The trade off
> is you get speed, and flexible ways to set up relevancy (that still perform
> well). Took a couple decades for rdbms to get as brainless to use as they
> are, maybe in a couple more we'll have figured out ways to make indexing
> engines like solr equally brainless, but not yet -- but it's still pretty
> damn easy for what it is, the lucene/Solr folks have done a remarkable job.
>
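The single-index design Jonathan describes can be sketched concretely. The `user_ids` field name is an assumption; the relevant mechanics are that the field is multi-valued and that `fq` filters are cached independently of the main query, so repeated searches by the same user hit the filter cache:

```python
from urllib.parse import urlencode

# Sketch of the single-index design discussed above: documents carry a
# multi-valued `user_ids` field (name assumed), and each search is
# restricted to the current user's documents with a filter query.
# The fq clause is cached separately and does not affect relevancy scores.
def user_scoped_params(user_query, user_id):
    return {
        "q": user_query,
        "fq": f"user_ids:{user_id}",  # cached filter, unscored
    }

params = user_scoped_params("protein folding", "u12345")
query_string = urlencode(params)
```

As Jonathan notes, the cost of this design shows up at indexing time (re-adding a document with hundreds of thousands of tag values), not at query time.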


Re: how well does multicore scale?

2010-10-26 Thread mike anderson
So I fired up about 100 cores and used JMeter to fire off a few thousand
queries. It looks like the memory usage isn't much worse than running a
single shard. So that's good.

I'm really curious if there is a clever solution to the obvious problem
with: "So you're better off using a single index with a user id and using
a query filter with the user id when fetching data.", i.e. when you have
hundreds of thousands of user IDs tagged on each article. That just doesn't
sound like it scales very well..


Cheers,
Mike


On Fri, Oct 22, 2010 at 10:43 PM, Lance Norskog  wrote:

> http://wiki.apache.org/solr/CoreAdmin
>
> Since Solr 1.3
>
> On Fri, Oct 22, 2010 at 1:40 PM, mike anderson 
> wrote:
> > Thanks for the advice, everyone. I'll take a look at the API mentioned
> and
> > do some benchmarking over the weekend.
> >
> > -Mike
> >
> >
> > On Fri, Oct 22, 2010 at 8:50 AM, Mark Miller 
> wrote:
> >
> >> On 10/22/10 1:44 AM, Tharindu Mathew wrote:
> >> > Hi Mike,
> >> >
> >> > I've also considered using separate cores in a multi-tenant
> >> > application, ie a separate core for each tenant/domain. But the cores
> >> > do not suit that purpose.
> >> >
> >> > If you check out documentation no real API support exists for this so
> >> > it can be done dynamically through SolrJ. And all use cases I found,
> >> > only had users configuring it statically and then using it. That was
> >> > maybe 2 or 3 cores. Please correct me if I'm wrong Solr folks.
> >>
> >> You can dynamically manage cores with solrj. See
> >> org.apache.solr.client.solrj.request.CoreAdminRequest's static methods
> >> for a place to start.
> >>
> >> You probably want to turn solr.xml's persist option on so that your
> >> cores survive restarts.
> >>
> >> >
> >> > So you're better off using a single index with a user id and using a
> >> > query filter with the user id when fetching data.
> >>
> >> Many times this is probably the case - pro's and con's to each depending
> >> on what you are up to.
> >>
> >> - Mark
> >> lucidimagination.com
> >>
> >> >
> >> > On Fri, Oct 22, 2010 at 1:12 AM, Jonathan Rochkind 
> >> wrote:
> >> >> No, it does not seem reasonable.  Why do you think you need a
> separate
> >> core
> >> >> for every user?
> >> >> mike anderson wrote:
> >> >>>
> >> >>> I'm exploring the possibility of using cores as a solution to
> "bookmark
> >> >>> folders" in my solr application. This would mean I'll need tens of
> >> >>> thousands
> >> >>> of cores... does this seem reasonable? I have plenty of CPUs
> available
> >> for
> >> >>> scaling, but I wonder about the memory overhead of adding cores
> (aside
> >> >>> from
> >> >>> needing to fit the new index in memory).
> >> >>>
> >> >>> Thoughts?
> >> >>>
> >> >>> -mike
> >> >>>
> >> >>>
> >> >>
> >> >
> >> >
> >> >
> >>
> >>
> >
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>


Re: how well does multicore scale?

2010-10-22 Thread mike anderson
Thanks for the advice, everyone. I'll take a look at the API mentioned and
do some benchmarking over the weekend.

-Mike


On Fri, Oct 22, 2010 at 8:50 AM, Mark Miller  wrote:

> On 10/22/10 1:44 AM, Tharindu Mathew wrote:
> > Hi Mike,
> >
> > I've also considered using separate cores in a multi-tenant
> > application, ie a separate core for each tenant/domain. But the cores
> > do not suit that purpose.
> >
> > If you check out documentation no real API support exists for this so
> > it can be done dynamically through SolrJ. And all use cases I found,
> > only had users configuring it statically and then using it. That was
> > maybe 2 or 3 cores. Please correct me if I'm wrong Solr folks.
>
> You can dynamically manage cores with solrj. See
> org.apache.solr.client.solrj.request.CoreAdminRequest's static methods
> for a place to start.
>
> You probably want to turn solr.xml's persist option on so that your
> cores survive restarts.
>
> >
> > So you're better off using a single index with a user id and using a
> > query filter with the user id when fetching data.
>
> Many times this is probably the case - pro's and con's to each depending
> on what you are up to.
>
> - Mark
> lucidimagination.com
>
> >
> > On Fri, Oct 22, 2010 at 1:12 AM, Jonathan Rochkind 
> wrote:
> >> No, it does not seem reasonable.  Why do you think you need a separate
> core
> >> for every user?
> >> mike anderson wrote:
> >>>
> >>> I'm exploring the possibility of using cores as a solution to "bookmark
> >>> folders" in my solr application. This would mean I'll need tens of
> >>> thousands
> >>> of cores... does this seem reasonable? I have plenty of CPUs available
> for
> >>> scaling, but I wonder about the memory overhead of adding cores (aside
> >>> from
> >>> needing to fit the new index in memory).
> >>>
> >>> Thoughts?
> >>>
> >>> -mike
> >>>
> >>>
> >>
> >
> >
> >
>
>


how well does multicore scale?

2010-10-21 Thread mike anderson
I'm exploring the possibility of using cores as a solution to "bookmark
folders" in my solr application. This would mean I'll need tens of thousands
of cores... does this seem reasonable? I have plenty of CPUs available for
scaling, but I wonder about the memory overhead of adding cores (aside from
needing to fit the new index in memory).

Thoughts?

-mike


phrase query with autosuggest (SOLR-1316)

2010-10-06 Thread mike anderson
The SOLR-1316 thread seemed a little too long to continue the conversation there.

Is there support for quotes indicating a phrase query? For example, my
autosuggest query for "mike sha" ought to return "mike shaffer", "mike
sharp", etc. Instead I get suggestions for "mike" and for "sha", resulting
in a collated result like "mike r meyer shaw".

Cheers,
Mike


Re: upgrade index from 2.9 to 3.x

2010-09-24 Thread mike anderson
Thanks. I found the jars for Lucene 3.0.2, but for the life of me I can't
figure out how to compile Solr against that version. Is there a parameter I
can pass to ant that tells it which version to use? Should I just dump
everything Lucene-related into the 'lib' folder?

-mike

On Fri, Sep 24, 2010 at 10:33 AM, Markus Jelsma wrote:

> There is a recent thread on this one
> http://www.mail-archive.com/solr-user@lucene.apache.org/msg40491.html
>
>
> On Friday 24 September 2010 16:30:36 mike anderson wrote:
> > What is the right way to upgrade a solr index from Lucene 2.9.1 to 3.x.
> I'm
> > getting the exception:
> >
> > SEVERE: java.lang.RuntimeException:
> > org.apache.lucene.index.IndexFormatTooOldException: Format version is not
> > supported in file '_aw5w.fdx': 1 (needs to be between 2 and 2). This
> >  version of Lucene only supports indexes created with release 3.0 and
> >  later.
> >
> >
> > when I try to start a server with my old index.
> >
> > Thanks in advance,
> > Mike
> >
>
> Markus Jelsma - Technisch Architect - Buyways BV
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
>
>


upgrade index from 2.9 to 3.x

2010-09-24 Thread mike anderson
What is the right way to upgrade a solr index from Lucene 2.9.1 to 3.x. I'm
getting the exception:

SEVERE: java.lang.RuntimeException:
org.apache.lucene.index.IndexFormatTooOldException: Format version is not
supported in file '_aw5w.fdx': 1 (needs to be between 2 and 2). This version
of Lucene only supports indexes created with release 3.0 and later.


when I try to start a server with my old index.

Thanks in advance,
Mike


Re: How to retrieve the full corpus

2010-09-06 Thread mike anderson
You might check out Luke, the Lucene Index Toolbox.

http://www.getopt.org/luke/

I know you can browse the index and get frequency counts, though I'm not
sure if you can export the entire index as a list like what you're looking
for.

Hope this helps,
Mike
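Besides browsing in Luke, Solr 1.4+ also ships a TermsComponent that can return indexed terms with their document frequencies over HTTP (note: document frequency, i.e. how many documents contain the term, not total occurrence counts). A sketch; the `/terms` handler path and the `text` field name are assumptions, so check your solrconfig:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Sketch against Solr's TermsComponent (Solr 1.4+). The /terms handler
# path and the `text` field name are assumptions -- check solrconfig.xml.
def terms_url(base="http://localhost:8983/solr", field="text", limit=100):
    params = {
        "terms.fl": field,
        "terms.limit": limit,  # -1 asks for every term (can be enormous)
        "wt": "json",
    }
    return f"{base}/terms?{urlencode(params)}"

url = terms_url()
# terms = json.load(urlopen(url))  # uncomment against a live Solr instance
```

For a full dump of every term, paging with `terms.lower` in batches is kinder to the server than a single `terms.limit=-1` request.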

On Mon, Sep 6, 2010 at 10:52 AM, Roland Villemoes 
wrote:

> Hi All,
>
> How can I retrieve all words from a Solr core?
> I need a list of all the words and how often they occur in the index.
>
> med venlig hilsen/best regards
>
> Roland Villemoes
> Tel: (+45) 22 69 59 62
> E-Mail: mailto:r...@alpha-solutions.dk
>
> Alpha Solutions A/S
> Borgergade 2, 3.sal, 1300 København K
> Tel: (+45) 70 20 65 38
> Web: http://www.alpha-solutions.dk
>
> ** This message including any attachments may contain confidential and/or
> privileged information intended only for the person or entity to which it is
> addressed. If you are not the intended recipient you should delete this
> message. Any printing, copying, distribution or other use of this message is
> strictly prohibited. If you have received this message in error, please
> notify the sender immediately by telephone, or e-mail and delete all copies
> of this message and any attachments from your system. Thank you.
>
>


Re: SolrCloud in production?

2010-08-01 Thread mike anderson
I'd second the request for more information on the current state of
SolrCloud. I have a 16 shard Solr setup in production running 1.3, and a lot
of the features of SolrCloud would make my life a lot easier.

Cheers,
Mike

On Sat, Jul 24, 2010 at 12:52 PM, Dennis Gearon wrote:

> Boy, if it does what it says it does, it's really a powerful tool.
>
> How is such a thing hosted, I wonder?
>
> Dennis Gearon
>
> Signature Warning
> 
> EARTH has a Right To Life,
>  otherwise we all die.
>
> Read 'Hot, Flat, and Crowded'
> Laugh at http://www.yert.com/film.php
>
>
> --- On Sat, 7/24/10, Andrew Clegg  wrote:
>
> > From: Andrew Clegg 
> > Subject: SolrCloud in production?
> > To: solr-user@lucene.apache.org
> > Date: Saturday, July 24, 2010, 5:18 AM
> >
> > Is anyone using ZooKeeper-based Solr Cloud in production
> > yet? Any war
> > stories? Any problematic missing features?
> >
> > Thanks,
> >
> > Andrew.
> >
> > --
> > View this message in context:
> http://lucene.472066.n3.nabble.com/SolrCloud-in-production-tp991995p991995.html
> > Sent from the Solr - User mailing list archive at
> > Nabble.com.
> >
>


proximity question

2010-07-06 Thread mike anderson
Will quotes do an exact match within a proximity test? For instance

body:""mountain goat" grass"~10

should match:

"the mountain goat went up the hill to eat grass"

but should NOT match

"the mountain where the goat lives is covered in grass"


If not, does anybody know how to accomplish this?


Thanks,
Mike Anderson
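A hedged note: the stock query parser of this era does not, as far as I know, accept a phrase nested inside a proximity clause; Lucene's contrib surround and complex-phrase parsers are the usual pointers for this, but neither was wired into stock Solr at this point. To make the asked-for semantics concrete, here is a plain-Python reference model of the desired matching (not a Solr query, and it ignores analysis/tokenization details):

```python
# Reference model of the semantics above: the exact phrase "mountain goat"
# must occur, with "grass" within `slop` token positions of that phrase.
def phrase_near(text, phrase, word, slop):
    tokens = text.lower().split()
    p = phrase.lower().split()
    for i in range(len(tokens) - len(p) + 1):
        if tokens[i:i + len(p)] != p:        # need an exact, adjacent phrase
            continue
        span = range(i - slop, i + len(p) + slop)
        if any(tokens[j] == word.lower() for j in span if 0 <= j < len(tokens)):
            return True
    return False

a = "the mountain goat went up the hill to eat grass"
b = "the mountain where the goat lives is covered in grass"
```

Under this model, sentence `a` matches and sentence `b` does not, exactly as the question requires.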


Re: solr-ruby with clustering

2010-03-22 Thread mike anderson
False alarm: on the client side I was explicitly setting a shard,
and this was causing my query/solr-ruby/Solr to think it was a
distributed request, which isn't supported by the clustering
component.

cheers,
mike

On Mon, Mar 22, 2010 at 8:53 PM, mike anderson  wrote:
> Has anybody got solr-ruby to return a clustering result? (using the
> clustering component)
>
> I'm almost certain the query is correct (I check the solr logs for the
> query and run it in my browser, get back the cluster output as
> expected). But when I dump the response from my solr-ruby query the
> clustering output is nowhere to be found. I noticed that the
> clustering output has a data type of "Arr", where the response and
> other components have output of type "Lst", could this be the problem?
>
> If anyone can think of some other debugging I could try I'd love to hear it.
>
> Thanks in advance,
> Mike
>


solr-ruby with clustering

2010-03-22 Thread mike anderson
Has anybody got solr-ruby to return a clustering result? (using the
clustering component)

I'm almost certain the query is correct (I check the solr logs for the
query and run it in my browser, get back the cluster output as
expected). But when I dump the response from my solr-ruby query the
clustering output is nowhere to be found. I noticed that the
clustering output has a data type of "Arr", where the response and
other components have output of type "Lst", could this be the problem?

If anyone can think of some other debugging I could try I'd love to hear it.

Thanks in advance,
Mike


Re: Best OCR API for solr

2010-02-04 Thread mike anderson
There might be an OCR plugin for Apache Tika (which does exactly this out of
the box, except for the OCR capability, I believe).

http://lucene.apache.org/tika/

-mike


2010/2/4 Kranti™ K K Parisa 

> Hi,
>
> Can anyone list the best OCR APIs available to use in combination with
> SOLR.
>
> The idea is to take a scanned file (format could be pdf,word,image..etc) as
> input and give OCRd file which could be used to get the contents for the
> SOLR indexing.
>
> Best Regards,
> Kranti K K Parisa
>


Re: solr application for website crawling and indexing html, pdf, word, ... files

2010-01-25 Thread mike anderson
I think you might be looking for Apache Tika.


On Mon, Jan 25, 2010 at 3:55 PM, Frank van Lingen wrote:

> I recently started working with solr and find it easy to setup and tinker
> with.
>
> I now want to scale up my setup and was wondering if there is an
> application/component that can do the following (I was not able to
> find documentation on this on the solr site):
>
> -Can I send solr an xml document with a url (html, pdf, word, ppt,
> etc..) and solr indexes it after analyzing (can it analyze pdf and
> other documents?). Solr would use some generic basic fields like
> header and content when analyzing the files.
>
> -Can I send solr a site url and it indexes the whole site?
>
> If the answer to the above is yes; are there some examples? If the
> answer is no; Is there a simple (basic) extractor for html, pdf, word,
> etc.. files that would translates this in a basic xml document (e.g.
> with field names, url, header and content) that solr can ingest, or
> preferably an application that does this for a whole site?
>
> The idea is to configure solr for generic indexing and search of a website.
>
> Frank.
>


Re: Lock problems: Lock obtain timed out

2010-01-25 Thread mike anderson
I am getting this exception as well, but disk space is not my problem. What
else can I do to debug this? The solr log doesn't appear to lend any other
clues..

Jan 25, 2010 4:02:22 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update params={} status=500 QTime=1990
Jan 25, 2010 4:02:22 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
out: NativeFSLock@
/solr8984/index/lucene-98c1cb272eb9e828b1357f68112231e0-write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:85)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1545)
at org.apache.lucene.index.IndexWriter.(IndexWriter.java:1402)
at org.apache.solr.update.SolrIndexWriter.(SolrIndexWriter.java:190)
at
org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:98)
at
org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:173)
at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:220)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at
org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)


Should I consider changing the lock timeout settings (currently set to
defaults)? If so, I'm not sure what to base these values on.

Thanks in advance,
mike
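For reference, if you do decide to adjust the lock timeout, it lives in solrconfig.xml (Solr 1.4-era layout; the values below are illustrative, not recommendations). A longer timeout usually only masks the real cause, though, which is commonly a stale lock file left behind by a crashed JVM or two writers pointed at the same index directory.

```xml
<!-- solrconfig.xml (Solr 1.4-era layout); values are illustrative -->
<indexDefaults>
  <writeLockTimeout>1000</writeLockTimeout>   <!-- ms to wait for the write lock -->
  <lockType>native</lockType>                 <!-- matches the NativeFSLock in the trace -->
</indexDefaults>
```

If the JVM died uncleanly, deleting the leftover `*-write.lock` file while Solr is stopped is the usual first step before touching timeouts.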


On Wed, Nov 4, 2009 at 8:27 PM, Lance Norskog  wrote:

> This will not ever work reliably. You should have 2x total disk space
> for the index. Optimize, for one, requires this.
>
> On Wed, Nov 4, 2009 at 6:37 AM, Jérôme Etévé 
> wrote:
> > Hi,
> >
> > It seems this situation is caused by some No space left on device
> exeptions:
> > SEVERE: java.io.IOException: No space left on device
> >at java.io.RandomAccessFile.writeBytes(Native Method)
> >at java.io.RandomAccessFile.write(RandomAccessFile.java:466)
> >at
> org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexOutput.flushBuffer(SimpleFSDirectory.java:192)
> >at
> org.apache.lucene.store.BufferedIndexOutput.flushBuffer(BufferedIndexOutput.java:96)
> >
> >
> > I'd better try to set my maxMergeDocs and mergeFactor to more
> > adequates values for my app (I'm indexing ~15 Gb of data on 20Gb
> > device, so I guess there's problem when solr tries to merge the index
> > bits being build.
> >
> > At the moment, they are set to   100 and
> > 2147483647
> >
> > Jerome.
> >
> > --
> > Jerome Eteve.
> > http://www.eteve.net
> > jer...@eteve.net
> >
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>


MLT calculation

2009-12-16 Thread Mike Anderson
How exactly is MLT calculated? I'm trying to gain an intuition for it by
tweaking the parameters MLT.qf, MLT.mintf, and MLT.mindf (mostly the former,
changing boosts), but so far it's a bit counter intuitive. How does
MLT.boost play in?

If anybody could point me to a technical description (equations and what
not) that would be the most helpful. I couldn't find anything on google.


Thanks,
Mike
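A hedged summary, as I understand the Lucene implementation (read MoreLikeThis.java for the authoritative version): terms are pulled from the source document's `mlt.fl` fields, pruned by `mlt.mintf` (min term frequency) and `mlt.mindf` (min document frequency), scored by tf x idf, and the top-scoring terms are OR-ed into a boolean query; with `mlt.boost=true`, each term is additionally boosted in proportion to its score. A toy sketch of that selection step:

```python
import math

# Toy sketch of MoreLikeThis term selection (approximation, not the
# authoritative algorithm): prune by mlt.mintf / mlt.mindf, score the
# survivors by tf * idf, keep the top terms.
def mlt_terms(doc_tf, doc_freq, num_docs, mintf=2, mindf=5, max_terms=25):
    scored = []
    for term, tf in doc_tf.items():
        df = doc_freq.get(term, 0)
        if tf < mintf or df < mindf:
            continue                                 # pruned before scoring
        idf = 1.0 + math.log(num_docs / (df + 1.0))  # Lucene-style idf (approx.)
        scored.append((tf * idf, term))
    scored.sort(reverse=True)
    return scored[:max_terms]

terms = mlt_terms(
    doc_tf={"solr": 5, "index": 3, "rare": 1},       # "rare" fails mintf=2
    doc_freq={"solr": 1_000, "index": 5_000, "rare": 10},
    num_docs=1_000_000,
)
```

This is also why tweaking `mlt.qf` can feel counterintuitive: it reweights fields on the generated query, after the term selection above has already happened.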


content stream/MLT

2009-12-09 Thread Mike Anderson
I'm trying to understand how content stream works with respect to MLT. I did
a regular MLT query using a document ID and specifying two fields to do MLT
on and got back a set of results. I then copied the xml for the document
with the aforementioned ID and pasted it to a text file. Then I made the
query with stream.file=mlt_doc.xml, but my result set was completely
different and didn't really make sense.

Am I not using content streams correctly here? Or does solr not use the
schema when accepting a content stream?

Thanks in advance,
Mike


atypical MLT use-case

2009-12-09 Thread Mike Anderson
This is somewhat of an odd use case for MLT. Basically I'm using it for
near-duplicate detection (I'm not using the built-in dedup detection for a
variety of reasons). While this might sound like an okay idea, the problem
lies in the order in which things happen. Ideally, duplicate detection would
prevent me from adding a document to my index which is already there (or at
least partially there). However, MoreLikeThis only works on documents
which are *already* in the index. Ideally what I would be able to do is:
post an XML document to Solr and receive an MLT response (the same kind of
MLT response I would receive had the document been in Solr already and
queried with id=#{id}&mlt=true).

Is anybody aware of how I could achieve this functionality leveraging
existing handlers? If not I will bump over to solr-dev and see if this is a
tractable problem.

Thanks in advance,
Mike
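As a stopgap, near-duplicate checks can also be done entirely client-side before
posting, e.g. with word-shingle Jaccard overlap against candidate documents you
already hold the text for. This is only an illustrative sketch, not a substitute
for MLT's tf*idf scoring, and it assumes you can fetch the candidate texts:

```python
def shingles(text, k=3):
    """Word k-grams ("shingles") of a document, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets: |A & B| / |A | B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def is_near_duplicate(new_doc, existing_docs, threshold=0.8):
    """True if new_doc's shingles overlap any existing doc above threshold."""
    s = shingles(new_doc)
    return any(jaccard(s, shingles(d)) >= threshold for d in existing_docs)
```

Comparing against every existing document is obviously too slow at 20M docs; in
practice you'd first narrow the candidates with a cheap query (e.g. on title).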


Re: field queries seem slow

2009-11-04 Thread mike anderson
Erik, we are doing a sort by date first, and then by score. I'm not sure
what you mean by readers.

Since we have nearly 6M authors attached to our 20M documents I'm not sure
that autowarming would help that much (especially since we have very little
overlap in what users are searching for). But maybe it would?
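For reference, autowarming queries go in solrconfig.xml as a newSearcher event
listener; a minimal sketch, with illustrative query values:

```xml
<!-- in solrconfig.xml: warm each new searcher with representative queries -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">author:einstein</str>
      <str name="sort">date desc</str>
    </lst>
  </arr>
</listener>
```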

Lance, I was just being a bit lazy. thanks though.

-mike


On Mon, Nov 2, 2009 at 10:27 PM, Lance Norskog  wrote:

> This searches author:albert and (default text field): einstein. This
> may not be what you expect?
>
> On Mon, Nov 2, 2009 at 2:30 PM, Erick Erickson 
> wrote:
> > Hmmm, are you sorting? And have your readers been reopened? Is the
> > second query of that sort also slow? If the answer to this last question
> > is "no", have you tried some autowarming queries?
> >
> > Best
> > Erick
> >
> > On Mon, Nov 2, 2009 at 4:34 PM, mike anderson wrote:
> >
> >> I took a look through my Solr logs this weekend and noticed that the
> >> longest
> >> queries were on particular fields, like "author:albert einstein". Is
> this a
> >> result consistent with other setups out there? If not, is there a trick
> to
> >> make these go faster? I've read up on filter queries and use those when
> >> applicable, but they don't really solve all my problems.
> >>
> >> If anybody wants to take a shot at it but needs to see my solrconfig,
> etc
> >> just let me know.
> >>
> >> Cheers,
> >> Mike
> >>
> >
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>


Re: apply a patch on solr

2009-11-02 Thread mike anderson
You can see which revision the patch was written against at the top of the
patch; it will look like this:

Index: org/apache/solr/handler/MoreLikeThisHandler.java
===
--- org/apache/solr/handler/MoreLikeThisHandler.java (revision 772437)
+++ org/apache/solr/handler/MoreLikeThisHandler.java (working copy)

now check out revision 772437 using the --revision switch in svn, patch
away, and then svn up to make sure everything merges cleanly.  This is a
good guide to follow as well:
http://www.mail-archive.com/solr-user@lucene.apache.org/msg10189.html
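If you're scripting this, pulling the base revision out of the patch header is
straightforward; a small illustrative helper (not part of any Solr tooling):

```python
import re

def patch_revision(patch_text):
    """Extract the base revision from an svn-style patch header,
    i.e. the number in a line like:
    --- path/File.java (revision 772437)"""
    m = re.search(r"^---\s.+\(revision (\d+)\)", patch_text, re.MULTILINE)
    return int(m.group(1)) if m else None
```

From there it's checking out that revision with svn's -r switch, applying the
patch, and running svn up as described above.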

cheers,
-mike

On Mon, Nov 2, 2009 at 3:55 PM, michael8  wrote:

>
> Hi,
>
> First, please pardon my novice question on patching Solr (1.4).  What I'd
> like to know is: given a patch, like the one for field collapsing, how would
> one go about knowing which Solr source that patch is meant for, since this is
> a source-level patch?  Wouldn't the exact versions of the set of Java files
> to be patched be critical for the patch to apply properly?
>
> So far what I have done is to pull the latest collapse field patch down
> from
> http://issues.apache.org/jira/browse/SOLR-236 (field-collapse-5.patch),
> and
> then svn up the latest trunk from
> http://svn.apache.org/repos/asf/lucene/solr/trunk/, then patch and build.
> Intuitively I was thinking I should be doing svn up to a specific
> revision/tag instead of just latest.  So far everything seems fine, but I
> just want to make sure I'm doing the right thing and not just being lucky.
>
> Thanks,
> Michael
> --
> View this message in context:
> http://old.nabble.com/apply-a-patch-on-solr-tp26157827p26157827.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


field queries seem slow

2009-11-02 Thread mike anderson
I took a look through my Solr logs this weekend and noticed that the longest
queries were on particular fields, like "author:albert einstein". Is this a
result consistent with other setups out there? If not, Is there a trick to
make these go faster? I've read up on filter queries and use those when
applicable, but they don't really solve all my problems.

If anybody wants to take a shot at it but needs to see my solrconfig, etc
just let me know.

Cheers,
Mike
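On the filter-query point, the usual pattern is to keep only the scored part in q
and move restricting field clauses into fq, which is cached in the filterCache
and contributes nothing to scoring. A sketch of the two request shapes (endpoint
and field name taken from the example above):

```python
from urllib.parse import urlencode

base = "http://localhost:8983/solr/select"

# Everything in q: the author clause is scored and re-evaluated each time.
slow = base + "?" + urlencode({"q": 'author:"albert einstein"'})

# Author clause as a filter query: cached in the filterCache and reusable
# across queries, which usually helps repeated field restrictions.
fast = base + "?" + urlencode({"q": "*:*", "fq": 'author:"albert einstein"'})
```

With 6M distinct authors and little overlap between user searches, though, the
filterCache hit rate may stay low, which matches the behavior described above.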


Re: benchmarking tools

2009-10-28 Thread mike anderson
Great suggestion; I took a look and it seems pretty useful. As a follow-up
question, did you do anything to disable Solr caching for certain tests?

-mike

On Tue, Oct 27, 2009 at 8:14 PM, Joshua Tuberville <
joshuatubervi...@eharmony.com> wrote:

> Mike,
>
> For response times I would also look at java.net's Faban benchmarking
> framework.  We use it extensively for our acceptance tests and tuning
> exercises.
>
> Joshua
>
> On Oct 27, 2009, at 1:59 PM, Mike Anderson wrote:
>
> > I've been making modifications here and there to the Solr source
> > code in
> > hopes to optimize for my particular setup. My goal now is to
> > establish a
> > decent benchmark toolset so that I can evaluate the observed
> > performance
> > increase before deciding whether to roll it out. So far I've investigated
> > Jmeter and
> > Lucid Gaze, but each seem to have pretty steep learning curves, so I
> > thought
> > I'd ping the group before I sink a good chunk of time into either.
> > My ideal
> > performance metrics aren't so much load testing, but rather response
> > time
> > testing for different query types across different Solr
> > configurations.
> >
> > If anybody has some insight into this kind of project I'd love to
> > get some
> > feedback.
> >
> > Thanks in advance,
> > Mike Anderson
>
>


benchmarking tools

2009-10-27 Thread Mike Anderson
I've been making modifications here and there to the Solr source code in
hopes to optimize for my particular setup. My goal now is to establish a
decent benchmark toolset so that I can evaluate the observed performance
increase before deciding whether to roll it out. So far I've investigated Jmeter and
Lucid Gaze, but each seem to have pretty steep learning curves, so I thought
I'd ping the group before I sink a good chunk of time into either. My ideal
performance metrics aren't so much load testing, but rather response time
testing for different query types across different Solr configurations.
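For pure response-time measurement the harness can be very small: time each query
over a few runs and compare the distributions across configurations. A generic
sketch, where request_fn stands in for whatever issues the actual Solr query:

```python
import time

def benchmark(request_fn, queries, runs=3):
    """Time request_fn(query) for each query, several runs each, and
    report per-query min/median latency in milliseconds."""
    results = {}
    for q in queries:
        samples = []
        for _ in range(runs):
            start = time.perf_counter()
            request_fn(q)
            samples.append((time.perf_counter() - start) * 1000.0)
        samples.sort()
        results[q] = {"min": samples[0], "median": samples[len(samples) // 2]}
    return results
```

Reporting min alongside median also gives a crude view of cache-warming effects,
since the first run of each query is typically the cold one.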

If anybody has some insight into this kind of project I'd love to get some
feedback.

Thanks in advance,
Mike Anderson


Re: stopfilterFactory isn't removing field name

2009-09-15 Thread mike anderson
Could this be related to SOLR-1423?

On Mon, Sep 14, 2009 at 8:51 AM, Yonik Seeley wrote:

> Thanks, I'll see if I can reproduce...
>
> -Yonik
> http://www.lucidimagination.com
>
> On Mon, Sep 14, 2009 at 2:10 AM, mike anderson 
> wrote:
> > Yeah.. that was weird. removing the line "forever,for ever" from my
> synonyms
> > file fixed the problem. In fact, i was having the same problem for every
> > double word like that. I decided I didn't really need the synonym filter
> for
> > that field so I just took it out, but I'd really like to know what the
> > problem is.
> > -mike
> >
> > On Mon, Sep 14, 2009 at 1:10 AM, Yonik Seeley <
> yo...@lucidimagination.com>
> > wrote:
> >>
> >> That's pretty strange... perhaps something to do with your synonyms
> >> file mapping "for" to a zero length token?
> >>
> >> -Yonik
> >> http://www.lucidimagination.com
> >>
> >> On Mon, Sep 14, 2009 at 12:13 AM, mike anderson wrote:
> >> > I'm kind of stumped by this one.. is it something obvious?
> >> > I'm running the latest trunk. In some cases the stopFilterFactory
> isn't
> >> > removing the field name.
> >> >
> >> > Thanks in advance,
> >> >
> >> > -mike
> >> >
> >> > From debugQuery (both words are in the stopwords file):
> >> >
> >> > http://localhost:8983/solr/select?q=citations:for&debugQuery=true
> >> >
> >> > citations:for
> >> > citations:for
> >> > citations:
> >> > citations:
> >> >
> >> >
> >> > http://localhost:8983/solr/select?q=citations:the&debugQuery=true
> >> >
> >> > citations:the
> >> > citations:the
> >> > 
> >> > 
> >> >
> >> >
> >> >
> >> >
> >> > schema analyzer for this field:
> >> > 
> >> >  >> > positionIncrementGap="100">
> >> >  
> >> > 
> >> >  >> > synonyms="substitutions.txt" ignoreCase="true" expand="false"/>
> >> > 
> >> > >> > words="citationstopwords.txt"/>
> >> >
> >> >   
> >> >
> >> >
> >> >  
> >> >  
> >> >  
> >> >>> > synonyms="substitutions.txt" ignoreCase="true" expand="false"/>
> >> >  
> >> >   >> > words="citationstopwords.txt"/>
> >> >  
> >> >
> >> >   
> >> >  
> >> >
> >> >
> >
> >
>


Re: stopfilterFactory isn't removing field name

2009-09-13 Thread mike anderson
Yeah.. that was weird. removing the line "forever,for ever" from my synonyms
file fixed the problem. In fact, i was having the same problem for every
double word like that. I decided I didn't really need the synonym filter for
that field so I just took it out, but I'd really like to know what the
problem is.
-mike

On Mon, Sep 14, 2009 at 1:10 AM, Yonik Seeley wrote:

> That's pretty strange... perhaps something to do with your synonyms
> file mapping "for" to a zero length token?
>
> -Yonik
> http://www.lucidimagination.com
>
> On Mon, Sep 14, 2009 at 12:13 AM, mike anderson 
> wrote:
> > I'm kind of stumped by this one.. is it something obvious?
> > I'm running the latest trunk. In some cases the stopFilterFactory isn't
> > removing the field name.
> >
> > Thanks in advance,
> >
> > -mike
> >
> > From debugQuery (both words are in the stopwords file):
> >
> > http://localhost:8983/solr/select?q=citations:for&debugQuery=true
> >
> > citations:for
> > citations:for
> > citations:
> > citations:
> >
> >
> > http://localhost:8983/solr/select?q=citations:the&debugQuery=true
> >
> > citations:the
> > citations:the
> > 
> > 
> >
> >
> >
> >
> > schema analyzer for this field:
> > 
> >  > positionIncrementGap="100">
> >  
> > 
> >  > synonyms="substitutions.txt" ignoreCase="true" expand="false"/>
> > 
> > > words="citationstopwords.txt"/>
> >
> >   
> >
> >
> >  
> >  
> >  
> >> synonyms="substitutions.txt" ignoreCase="true" expand="false"/>
> >  
> >   > words="citationstopwords.txt"/>
> >  
> >
> >   
> >  
> >
> >
>


stopfilterFactory isn't removing field name

2009-09-13 Thread mike anderson
I'm kind of stumped by this one.. is it something obvious?
I'm running the latest trunk. In some cases the stopFilterFactory isn't
removing the field name.

Thanks in advance,

-mike

From debugQuery (both words are in the stopwords file):

http://localhost:8983/solr/select?q=citations:for&debugQuery=true

citations:for
citations:for
citations:
citations:


http://localhost:8983/solr/select?q=citations:the&debugQuery=true

citations:the
citations:the






schema analyzer for this field:


  

 



   


  
  
  
   
  
  
  

   
  



Re: MoreLikeThis (MLT) in 1.4 distributed

2009-08-18 Thread mike anderson
Perhaps it was something about the way I applied the patch by hand, but
after trying it again (on a later revision, maybe that was the trick), I got
solr to acknowledge I was using MLT when also passing the shards parameter.
However, unlike a query without shards, I get numFound=0 for all results:


Any advice to this end?

My query is still the same:

http://localhost:8983/solr/select?q=graph%20theory&mlt=true&mlt.fl=abstract&mlt.mindf=1&mlt.mintf=1&shards=localhost:8983/solr

thanks in advance,
Mike


On Tue, Aug 18, 2009 at 12:18 PM, mike anderson wrote:

> There doesn't appear to be any related errors in the log. I've included it
> below anyhow (there is a java.lang.NumberFormatException, i'm not sure what
> that is).
> thanks,
> mike
>
> for the query:
>
> http://localhost:8983/solr/select?q=%22theory%20of%20colorful%20graphs%22&mlt=true&mlt.fl=abstract&mlt.mindf=1&mlt.mintf=1&shards=localhost:8983/solr
>
> Aug 18, 2009 12:11:56 PM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/select
> params={mlt.mindf=1&mlt.fl=abstract&shards=localhost:8983/solr&q="theory+of+colorful+graphs"&mlt.mintf=1&mlt=true}
> status=0 QTime=68
> Aug 18, 2009 12:12:08 PM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/select
> params={spellcheck=true&mlt.fl=abstract&spellcheck.extendedResults=false&mlt.mintf=1&mlt=true&spellcheck.collate=true&wt=javabin&spellcheck.onlyMorePopular=false&rows=10&version=2.2&mlt.mindf=1&fl=id,score&start=0&q="theory+of+colorful+graphs"&spellcheck.dictionary=titleCheck&spellcheck.count=1&isShard=true&fsv=true}
> hits=1 status=0 QTime=5
> Aug 18, 2009 12:12:08 PM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/select
> params={spellcheck=true&mlt.fl=abstract&spellcheck.extendedResults=false&ids=2b0b321193c61dfbebe58d35f7d42bcf&mlt.mintf=1&mlt=true&spellcheck.collate=true&wt=javabin&spellcheck.onlyMorePopular=false&version=2.2&mlt.mindf=1&q="theory+of+colorful+graphs"&spellcheck.dictionary=titleCheck&spellcheck.count=1&isShard=true}
> status=0 QTime=5
> Aug 18, 2009 12:12:08 PM
> org.apache.solr.request.BinaryResponseWriter$Resolver getDoc
> WARNING: Error reading a field from document : SolrDocument[{abstract=  The
> theory of colorful graphs can be developed by working in Galois field
> modulo (p), p > 2 and a prime number. The paper proposes a program of
> possible
> conversion of graph theory into a pleasant colorful appearance. We propose
> to
> paint the usual black (indicating presence of an edge) and white
> (indicating
> absence of an edge) edges of graphs using multitude of colors and study
> their
> properties. All colorful graphs considered here are simple, i.e. not having
> any
> multiple edges or self-loops. This paper is an invitation to the program of
> generalizing usual graph theory in this direction.
> , affiliations=, all_authors=Dhananjay P Mehendale, article_date=Sat Apr 28
> 19:59:59 EDT 2007, authors=[Mehendale, Dhananjay P Mehendale, Mehendale
> Dhananjay P, D P Mehendale, Mehendale D P, D Mehendale, Mehendale D,
> Dhananjay Mehendale, Mehendale Dhananjay, DP Mehendale, Mehendale DP],
> created_at=Sat Apr 28 19:59:59 EDT 2007, description=10 pages, doi=, eissn=,
> first_author=[Mehendale, Dhananjay P Mehendale, Mehendale Dhananjay P, D P
> Mehendale, Mehendale D P, D Mehendale, Mehendale D, Dhananjay Mehendale,
> Mehendale Dhananjay, DP Mehendale, Mehendale DP], first_page=,
> id=2b0b321193c61dfbebe58d35f7d42bcf}]
> java.lang.NumberFormatException: For input string: ""
> at
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>  at java.lang.Integer.parseInt(Integer.java:493)
> at java.lang.Integer.valueOf(Integer.java:570)
>  at org.apache.solr.schema.IntField.toObject(IntField.java:71)
> at org.apache.solr.schema.IntField.toObject(IntField.java:32)
>  at
> org.apache.solr.request.BinaryResponseWriter$Resolver.getDoc(BinaryResponseWriter.java:147)
> at
> org.apache.solr.request.BinaryResponseWriter$Resolver.writeDocList(BinaryResponseWriter.java:123)
>  at
> org.apache.solr.request.BinaryResponseWriter$Resolver.resolve(BinaryResponseWriter.java:88)
> at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:142)
>  at
> org.apache.solr.common.util.JavaBinCodec.writeNamedList(JavaBinCodec.java:132)
> at
> org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:220)
>  at
> org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:137)
> at org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.ja

Re: MoreLikeThis (MLT) in 1.4 distributed

2009-08-18 Thread mike anderson
her unsatisfactory theory of metrics
on
sparse graphs. This paper deals mainly with graphs with $o(n^2)$ but
$\omega(n)$ edges: a companion paper [arXiv:0812.2656] will discuss the
(more
problematic still) case of {\em extremely sparse} graphs, with O(n) edges.
, affiliations=[, ], all_authors=[B Bollobas, O Riordan], article_date=Tue
Aug 14 19:59:59 EDT 2007, authors=[Bollobas, B Bollobas, Bollobas B,
Riordan, O Riordan, Riordan O], created_at=Tue Aug 14 19:59:59 EDT 2007,
description=83 pages, 1 figure. References updated and one corrected, doi=,
eissn=, first_author=[Bollobas, B Bollobas, Bollobas B], first_page=,
id=9f6a7fdd34e6641a2dc7d1d8fdd9870f}]
java.lang.NumberFormatException: For input string: ""
at
java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:493)
at java.lang.Integer.valueOf(Integer.java:570)
at org.apache.solr.schema.IntField.toObject(IntField.java:71)
at org.apache.solr.schema.IntField.toObject(IntField.java:32)
at
org.apache.solr.request.BinaryResponseWriter$Resolver.getDoc(BinaryResponseWriter.java:147)
at
org.apache.solr.request.BinaryResponseWriter$Resolver.writeDocList(BinaryResponseWriter.java:123)
at
org.apache.solr.request.BinaryResponseWriter$Resolver.resolve(BinaryResponseWriter.java:88)
at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:142)
at
org.apache.solr.common.util.JavaBinCodec.writeNamedList(JavaBinCodec.java:132)
at
org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:220)
at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:137)
at
org.apache.solr.common.util.JavaBinCodec.writeNamedList(JavaBinCodec.java:132)
at
org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:220)
at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:137)
at org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:86)
at
org.apache.solr.request.BinaryResponseWriter.write(BinaryResponseWriter.java:48)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at
org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Aug 18, 2009 12:12:08 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select
params={mlt.mindf=1&mlt.fl=abstract&shards=localhost:8983/solr&q="theory+of+colorful+graphs"&mlt.mintf=1&mlt=true}
status=0 QTime=164



On Tue, Aug 18, 2009 at 11:30 AM, Grant Ingersoll wrote:

> Are there errors in the logs?
>
> -Grant
>
> On Aug 18, 2009, at 10:42 AM, mike anderson wrote:
>
>  I'm trying to get MLT working in 1.4 distributed mode. I was hoping the
>> patch SOLR-788 would do the trick, but after applying the patch by hand
>> applying the patch by hand to revision 737810 (it kept choking on
>> component/MoreLikeThisComponent.java) I still get nothing. The URL I am
>> using is this:
>>
>> http://localhost:8983/solr/select?q=graph%20theory&mlt=true&mlt.fl=abstract&mlt.mindf=1&mlt.mintf=1&shards=localhost:8983/solr
>>
>> and without the shards param it works fine:
>>
>>
>> http://localhost:8983/solr/select?q=graph%20theory&mlt=true&mlt.fl=abstract&mlt.mindf=1&mlt.mintf=1
>>
>> debugQuery=true shows that the MLT component is being called, is there
>> elsewhere I can check for more debug information? Any advice on getting
>> this
>> to work?
>>
>>
>>
>> Thanks in advance,
>> Mike
>>
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
>
>


MoreLikeThis (MLT) in 1.4 distributed

2009-08-18 Thread mike anderson
I'm trying to get MLT working in 1.4 distributed mode. I was hoping the
patch SOLR-788 would do the trick, but after
applying the patch by hand to revision 737810 (it kept choking on
component/MoreLikeThisComponent.java) I still get nothing. The URL I am
using is this:
http://localhost:8983/solr/select?q=graph%20theory&mlt=true&mlt.fl=abstract&mlt.mindf=1&mlt.mintf=1&shards=localhost:8983/solr

and without the shards param it works fine:

http://localhost:8983/solr/select?q=graph%20theory&mlt=true&mlt.fl=abstract&mlt.mindf=1&mlt.mintf=1

debugQuery=true shows that the MLT component is being called, is there
elsewhere I can check for more debug information? Any advice on getting this
to work?



Thanks in advance,
Mike


ruby client and building spell check dictionary

2009-08-14 Thread Mike Anderson
I set up the spellcheck component with this code in the config file:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">titleCheck</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="spellcheckIndexDir">dictionary</str>
    <str name="accuracy">0.7</str>
  </lst>
</searchComponent>

which works great. I can build the dictionary from my browser
"q="foo"&spellcheck.build=true&spellcheck.name=titleCheck"

and I can also receive the spellcheck response when I make a query via the
ruby client.


What I'm trying to do now though is build the dictionary via the ruby
client. I added this code to "class Solr::Request::Standard <
Solr::Request::Select"

  if @params[:spellcheck]
    hash[:spellcheck] = true
    hash["spellcheck.q"] = @params[:spellcheck][:query]
    hash["spellcheck.build"] = @params[:spellcheck][:build]
  end



and attempt to make a query with spellcheck.build=true (the
spellcheck.name is set in the defaults of select). Unfortunately I am
receiving this exception:

Net::HTTPFatalError: 500
> "javaioFileNotFoundException__cfdx__javalangRuntimeException_javaioFileNotFoundException__cfdx__at_orgapachesolrspellingIndexBasedSpellCheckerbuildIndexBasedSpellCheckerjava92__at_orgapachesolrhandlercomponentSpellCheckComponentprepareSpellCheckComponentjava107__at_orgapachesolrhandlercomponentSearchHandlerhandleRequestBodySearchHandlerjava150__at_orgapachesolrhandlerRequestHandlerBasehandleRequestRequestHandlerBasejava131__at_orgapachesolrcoreSolrCoreexecuteSolrCorejava1333__at_orgapachesolrservletSolrDispatchFilterexecuteSolrDispatchFilterjava303__at_orgapachesolrservletSolrDispatchFilterdoFilterSolrDispatchFilterjava232__at_orgmortbayjettyservletServletHandler$CachedChaindoFilterServletHandlerjava1089__at_orgmortbayjettyservletServletHandlerhandleServletHandlerjava365__at_orgmortbayjettysecuritySecurityHandlerhandleSecurityHandlerjava216__at_orgmortbayjettyservletSessionHandlerhandleSessionHandlerjava181__at_orgmortbayjettyhandlerContextHandlerhandleContextHandlerjava712__at_orgmortbayjettywebappWebAppContexthandleWebAppContextjava405__at_orgmortbayjettyhandlerContextHandlerCollectionhandleContextHandlerCollectionjava211__at_orgmortbayjettyhandlerHandlerCollectionhandleHandlerCollectionjava114__at_orgmortbayjettyhandlerHandlerWrapperhandleHandlerWrapperjava139__at_orgmortbayjettyServerhandleServerjava285__at_orgmortbayjettyHttpConnectionhandleRequestHttpConnectionjava502__at_orgmortbayjettyHttpConnection$RequestHandlercontentHttpConnectionjava835__at_orgmortbayjettyHttpParserparseNextHttpParserjava641__at_orgmortbayjettyHttpParserparseAvailableHttpParserjava208__at_orgmortbayjettyHttpConnectionhandleHttpConnectionjava378__at_orgmortbayjettybioSocketConnector$ConnectionrunSocketConnectorjava226__at_orgmortbaythreadBoundedThreadPool$PoolThreadrunBoundedThreadPooljava442_Caused_by_javaioFileNo"

from /usr/lib/ruby/1.8/net/http.rb:2097:in `error!'
from /var/lib/gems/1.8/gems/solr-ruby-0.0.7/lib/solr/connection.rb:165:in `post'
from /var/lib/gems/1.8/gems/solr-ruby-0.0.7/lib/solr/connection.rb:151:in `send'
from /var/lib/gems/1.8/gems/solr-ruby-0.0.7/lib/solr/connection.rb:174:in `create_and_send_query'
from /var/lib/gems/1.8/gems/solr-ruby-0.0.7/lib/solr/connection.rb:92:in `query'
from /home/mike/code/pubget.rails/app/models/article.rb:695:in `solr_search'
from /home/mike/code/pubget.rails/app/models/article.rb:635:in `solr_build_dictionary'

Any help in understanding the exception would be greatly appreciated.

-Mike


spellcheck component in 1.4 distributed

2009-08-08 Thread mike anderson
Hi all,

I am e-mailing to inquire about the status of the spellchecking component in
1.4 (distributed). I saw SOLR-785, but it is unreleased and appears to be
for 1.5. Any help would be much appreciated.
Thanks in advance,
Mike


Re: How to use key with facet.prefix?

2009-08-08 Thread mike anderson
whoops, sorry guys

On Sat, Aug 8, 2009 at 12:37 PM, mike anderson wrote:

> Hi all,
>
> I am e-mailing to inquire about the status of the spellchecking component
> in 1.4 (distributed). I saw SOLR-785, but it is unreleased and appears to be
> for 1.5. Any help would be much appreciated.
> Thanks in advance,
> Mike
>
>
> (sorry if this sent twice)
>


Re: How to use key with facet.prefix?

2009-08-08 Thread mike anderson
Hi all,

I am e-mailing to inquire about the status of the spellchecking component in
1.4 (distributed). I saw SOLR-785, but it is unreleased and appears to be
for 1.5. Any help would be much appreciated.
Thanks in advance,
Mike


(sorry if this sent twice)


spellcheck component in 1.4 distributed

2009-08-07 Thread mike anderson
I am e-mailing to inquire about the status of the spellchecking component in
1.4 (distributed). I saw SOLR-785, but it is unreleased and for 1.5. Any
help would be much appreciated.
Thanks in advance,
Mike