Solr Cache
We have two servers serving the same index behind a load balancer. The indexes are updated at the same time every day. Occasionally, a search on one server will return different results from the other server, even though the data used to create the index is exactly the same. Is this possibly due to caching? Does the cache reset automatically after a commit? The problem usually resolves itself, by all appearances randomly, but I assume something I don't know about is going on, such as a new searcher starting up at some point in the day. All cache settings are the solrconfig defaults. Thank you ahead of time. Tim Christensen, Director Media & Technology, Vann's Inc., 406-203-4656, [EMAIL PROTECTED], http://www.vanns.com
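For what it's worth, Solr's caches belong to a searcher rather than to the index files: a commit opens a new searcher (optionally autowarmed from the old one), and the old caches are discarded once the new searcher is registered, so a cache by itself should not return results for data its searcher has never seen. A rough way to compare the two boxes is to look at which searcher each one has registered after the daily update; the sketch below assumes default ports and the 1.2-era admin stats page, so the exact path and stat names may differ in your setup:

# hostnames/ports are placeholders; look for the searcher's opened/registered
# times in the output and compare the two servers after the daily update
curl -s "http://solr1:8983/solr/admin/stats.jsp" | grep -i searcher
curl -s "http://solr2:8983/solr/admin/stats.jsp" | grep -i searcher

If one server still reports a searcher opened before the update (or before a commit was issued to it at all), that would explain results diverging until a new searcher is eventually registered.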
failover sharding
Hi, Is there a way to set a timeout, or some other way of ignoring shards that are not there? For instance, I have 4 shards, and their documents overlap for redundancy:

shard 1 = 0->200
shard 2 = 100->400
shard 3 = 300->600
shard 4 = 500->600 & 0->100

This means that if one of my shards goes down, I can still return results. If there were some option that said "wait 1 second and then give up", this would work perfectly for me. -- Regards, Ian Connor
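For reference, the kind of request this applies to is sketched below; hostnames, ports and the query are placeholders matching the example above. As things stand, a dead entry in the shards list fails the whole request instead of being skipped, even when the overlapping shards could still cover its range:

# hypothetical hosts; with the overlapping ranges above, documents in an
# overlapped range stay reachable through another shard if a single shard
# goes down -- but today an unreachable shard errors out the whole query
curl -s "http://shard1:8983/solr/select?q=connor&shards=shard1:8983/solr,shard2:8983/solr,shard3:8983/solr,shard4:8983/solr"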
Re: Administrative questions
Jason Rennie wrote:
> On Wed, Aug 13, 2008 at 1:52 PM, Jon Drukman <[EMAIL PROTECTED]> wrote:
>> Duh. I should have thought of that. I'm a big fan of djbdns so I'm quite familiar with daemontools. Thanks! :)
> My pleasure. Was nice to hear recently that DJB is moving toward more flexible licensing terms. For anyone unfamiliar w/ daemontools, here's DJB's explanation of why they rock compared to inittab, ttys, init.d, and rc.local: http://cr.yp.to/daemontools/faq/create.html#why

In case anybody wants to know, here's how to run solr under daemontools.

1. install daemontools
2. create /etc/solr
3. create a user and group called solr
4. create shell script /etc/solr/run (edit to taste, i'm using the default jetty that comes with solr):

#!/bin/sh
exec 2>&1
cd /usr/local/apache-solr-1.2.0/example
exec setuidgid solr java -jar start.jar

5. create /etc/solr/log/run containing:

#!/bin/sh
exec setuidgid solr multilog t ./main

6. ln -s /etc/solr /service/solr

That is all. As long as you've got svscan set to launch when the system boots, solr will run and auto-restart on crashes. Logs will be in /service/solr/log/main (auto-rotated). yay. -jsd-
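Once that's in place, day-to-day control goes through the usual daemontools commands (service name as set up above):

# is the service up, and for how long?
svstat /service/solr /service/solr/log
# send TERM so supervise restarts solr (e.g. after changing the run script)
svc -t /service/solr
# take it down / bring it back up without touching the symlink
svc -d /service/solr
svc -u /service/solr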
Re: "Auto commit error" and java.io.FileNotFoundException
I've done some more sniffing on the Lucene list, and noticed that Otis made the following comment about a FileNotFoundException problem in late 2005: Are you using Windows and a compound index format (look at your index dir - does it have .cfs file(s))? This may be a bad combination, judging from people who reported this problem so far. (http://www.nabble.com/fnm-file-disappear-td1531775.html#a1531775) Again, a CFS index was indeed involved in my case, but my experience comes almost three years after Otis' message... On Fri, Aug 15, 2008 at 10:35 AM, Chris Harris <[EMAIL PROTECTED]> wrote: > > The following may or may not be relevant: I built the base 3M-ish doc > index on a Windows machine, and it's a compound (.cfs) format index. > (I actually created it not with Solr, but by using the index merging > tool that comes with Lucene in order to merge three different > non-compound format indexes that I'd previously made with Solr into a > single index.) Before I started adding documents, I moved the index to > a Linux machine running a newer version of Solr/Lucene than was on the > Windows machine. The stuff described above all happened on Linux. > > Any thoughts? > > Thanks a bunch, > Chris >
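For anyone else hitting this, whether an index is in compound format is visible straight from the files in the index directory (the path below is a placeholder):

# compound-format indexes store each segment as a single .cfs file rather
# than separate .fdt/.fdx/.tis/... files
ls /path/to/solr/data/index | grep '\.cfs$'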
Re: Can I copy an index built on a Windows system to a Unix/Linux system?
There is a (SOLR-561) feature getting built for doing replication in any platform . The patch works and it is tested. Do not expect it to work with the current trunk because a lot has changed in trunk since the last patch . We will be updating it soon once the dust settles down. - On Fri, Aug 15, 2008 at 7:45 PM, johnwarde <[EMAIL PROTECTED]> wrote: > > Excellent! Many thanks for your help Eric! > > John > > > Erick Erickson wrote: >> >> I've done exactly this many times in straight Lucene. Since Solr is built >> on Lucene, I wouldn't anticipate any problems. >> >> Make sure your transfer is binary mode... >> >> Best >> Erick >> >> On Fri, Aug 15, 2008 at 8:02 AM, johnwarde <[EMAIL PROTECTED]> wrote: >> >>> >>> Hi, >>> >>> Can I copy an index built on a Windows system to a Unix/Linux system and >>> still work? >>> >>> Reason for my question: >>> I have been working with Solr for the last month on a Windows system and >>> I >>> have determined that we need to have a replication solution for our >>> future >>> needs (volume of documents to be indexed and query loads). >>> >>> At this point in time it looks like, from my research, that Solr does not >>> currently provide a reliable/tested replication strategy on Windows. >>> >>> However, I would like to continue to use Solr on Windows for now until >>> the >>> load on the single windows system becomes too great and requires us to >>> implement a replication strategy (one index master, many query slaves). >>> Hopefully, by that time a reliable replication strategy on Windows may >>> present itself but if it doesn't ... >>> >>> Can I make a binary copy of the index files from a windows system to a >>> Unix/Linux system and be read by a Solr on the Unix/Linux system. Would >>> there be any byte order problems? Or would I need to rebuild the index >>> from >>> the original data? >>> >>> Many thanks for your help! >>> >>> John >>> >>> >>> >>> -- >>> View this message in context: >>> http://www.nabble.com/Can-I-copy-an-index-built-on-a-Windows-system-to-a-Unix-Linux-system--tp18997540p18997540.html >>> Sent from the Solr - User mailing list archive at Nabble.com. >>> >>> >> >> > > -- > View this message in context: > http://www.nabble.com/Can-I-copy-an-index-built-on-a-Windows-system-to-a-Unix-Linux-system--tp18997540p18999382.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- --Noble Paul
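For completeness, the replication that exists today is the Unix script collection under solr/bin, which is what SOLR-561 aims to replace with a pure-Java, platform-independent handler. Roughly (the scripts read their settings from conf/scripts.conf, and invocation details vary by setup):

# on the master, after a commit: take a hard-link snapshot of the index
./bin/snapshooter
# on each slave: pull the newest snapshot over rsync, then install it and
# trigger a commit so a new searcher opens on the copied index
./bin/snappuller
./bin/snapinstaller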
"Auto commit error" and java.io.FileNotFoundException
I have an index (different from the ones mentioned yesterday) that was working fine with 3M docs or so, but when I added a bunch more docs, bringing it closer to 4M docs, the index seemed to get corrupted. In particular, now when I start Solr up, or when when my indexing process tries add a document, I get a complaint about missing index files. The error on startup looks like this: 2008-08-15T10:18:54 1218820734592 92 org.apache.solr.core.MultiCore SEVERE org.apache.solr.common.SolrException log 10 java.lang.RuntimeException: java.io.FileNotFoundException: /ssd/solr-/solr/exhibitcore/data/index/_p7.fdt (No such file or directory) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:733) at org.apache.solr.core.SolrCore.(SolrCore.java:387) at org.apache.solr.core.MultiCore.create(MultiCore.java:255) at org.apache.solr.core.MultiCore.load(MultiCore.java:139) at org.apache.solr.servlet.SolrDispatchFilter.initMultiCore(SolrDispatchFilter.java:147) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:75) at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40) at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594) at org.mortbay.jetty.servlet.Context.startContext(Context.java:139) at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218) at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500) at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40) at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147) at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40) at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40) at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117) at org.mortbay.jetty.Server.doStart(Server.java:210) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40) at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.mortbay.start.Main.invokeMain(Main.java:183) at org.mortbay.start.Main.start(Main.java:497) at org.mortbay.start.Main.main(Main.java:115) Caused by: java.io.FileNotFoundException: /ssd/solr-/solr/exhibitcore/data/index/_p7.fdt (No such file or directory) at java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile. (RandomAccessFile.java:233) at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor. (FSDirectory.java:506) at org.apache.lucene.store.FSDirectory$FSIndexInput. (FSDirectory.java:536) at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:445) at org.apache.lucene.index.FieldsReader. (FieldsReader.java:75) at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:308) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:262) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:197) at org.apache.lucene.index.MultiSegmentReader. 
(MultiSegmentReader.java:55) at org.apache.lucene.index.DirectoryIndexReader$1.doBody(DirectoryIndexReader.java:75) at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:636) at org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:63) at org.apache.lucene.index.IndexReader.open(IndexReader.java:209) at org.apache.lucene.index.IndexReader.open(IndexReader.java:173) at org.apache.solr.search.SolrIndexSearcher. (SolrIndexSearcher.java:93) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:724) ... 29 more And the error on doc add looks like this: 2008-08-15T09:51:30 1218819090142 6571937 org.apache.solr.core.SolrCore SEVERE org.apache.solr.common.SolrException log 14 java.io.FileNotFoundException: /ssd/solr-/solr/exhibitcore/data/index/_p7.fdt (No such file or directory) at java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile. (RandomAccessFile.
partialResults, distributed search & SOLR-502
I was going to file a ticket like this: "A SOLR-303 query with &shards=host1,host2,host3 when host3 is down returns an error. One of the advantages of a shard implementation is that data can be stored redundantly across different shards, either as direct copies (e.g. when host1 and host3 are snapshooter'd copies of each other) or where there is some "data RAID" that stripes indexes for redundancy." But then I saw SOLR-502, which appears to be committed. If I have the above scenario (host1,host2,host3 where host3 is not up) and set a timeAllowed, will I still get a 400 or will it come back with "partial" results? If not, can we think of a way to get this to work? It's my understanding already that duplicate docIDs are merged in the SOLR-303 response, so other than building in some "this host isn't working, just move on and report it" and of course the work to index redundantly, we wouldn't need anything to achieve a good redundant shard implementation. B
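To make the question concrete, the request in mind is roughly the following (hosts are placeholders); as I read SOLR-502, timeAllowed caps search time on shards that do respond, and the open part is whether a shard that never answers can simply be dropped from the merge and flagged:

# host3 is the dead shard in this scenario; timeAllowed is in milliseconds
curl -s "http://host1:8983/solr/select?q=foo&shards=host1:8983/solr,host2:8983/solr,host3:8983/solr&timeAllowed=1000"
# hoped-for behaviour: a merged response flagged partialResults=true in the
# header, rather than an error because host3 could not be reached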
Re: Highlighting returns incorrect text on some results?
Thanks Otis. I downloaded the nightly today and reindexed, and it seems that it was a bug that you've worked out since 1.2 as I don't see the issue anymore. Paul Otis Gospodnetic wrote: > > Paul, we had many highlighter-related changes since 1.2, so I suggest you > try the nightly. > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message >> From: pdovyda2 <[EMAIL PROTECTED]> >> To: solr-user@lucene.apache.org >> Sent: Thursday, August 14, 2008 2:56:42 PM >> Subject: Highlighting returns incorrect text on some results? >> >> >> This is kind of a strange issue, but when I submit a query and ask for >> highlighting back, sometimes the highlighted text includes a question >> mark >> at the beginning, although a question mark character does not appear in >> the >> field that the highlighted text is taken from. >> >> I've put some sample XML output on the web at >> http://ucair.cs.uiuc.edu/pdovyda2/problem.xml >> If you look at the first and third highlights, you'll see what I'm >> talking >> about. >> >> Besides looking a bit odd, it is causing my application to break because >> the >> highlighted field is multivalued, and I was doing text matching to >> determine >> which of the values was chosen for highlighting. >> >> Is this actually a bug, or have I just misconfigured something? By the >> way, >> I am using the 1.2 release, I have not yet tried out a nightly build to >> see >> if this is an old problem. >> >> Thanks, >> Paul >> -- >> View this message in context: >> http://www.nabble.com/Highlighting-returns-incorrect-text-on-some-results--tp18987598p18987598.html >> Sent from the Solr - User mailing list archive at Nabble.com. > > > -- View this message in context: http://www.nabble.com/Highlighting-returns-incorrect-text-on-some-results--tp18987598p19002545.html Sent from the Solr - User mailing list archive at Nabble.com.
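For anyone trying to reproduce the original report, a minimal highlighting request looks something like the sketch below (host, query and field names are placeholders); the stray question mark showed up at the start of some returned snippets on a multivalued field:

# ask for highlighted snippets from a (hypothetical) multivalued "content" field
curl -s "http://localhost:8983/solr/select?q=content:einstein&hl=true&hl.fl=content&fl=id"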
Re: Index size vs. number of documents
Here's an example. Consider 2 docs with terms:

doc1: term1, term2, term3
doc2: term4, term5, term6

vs.

doc1: term1, term2, term3
doc2: term1, term1, term6

All other things constant, the former will make the index grow faster because it has more unique terms. Even if your OCR has garbage that makes noise in the form of new unique terms, there will still be some overlap (like that term1 in the second case above). Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Phillip Farber <[EMAIL PROTECTED]> > To: solr-user@lucene.apache.org > Sent: Friday, August 15, 2008 12:22:30 PM > Subject: Re: Index size vs. number of documents > > By "Index size almost never grows linearly with the number of > documents" are you saying it increases more slowly than the number of > documents, i.e. sub-linearly or more rapidly? > > With dirty OCR the number of unique terms is always increasing due to > the garbage "words". > > -Phil > > Chris Hostetter wrote: > > : > I'm surprised, as you are, by the non-linearity. Out of curiosity, what > > is > > > > Unless the data in "stored" fields is significantly greater than "indexed" > > fields the Index size almost never grows linearly with the number of > > documents -- it's the number of unique terms that tends to primarily > > influence the size of the index. > > > > At some point someone on the java-user list who really understood the file > > formats wrote a really great formula for estimating the size of the index > > assuming some ratios of unique terms per doc, but i can't find it now. > > > > > > -Hoss > >
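A rough way to watch that unique-term count as documents are added is the Luke request handler, if it is enabled in your version (URL, port and the exact names in the response are assumptions):

# the index summary reported here includes numDocs and numTerms; if numTerms
# keeps climbing nearly as fast as numDocs (dirty OCR), expect faster growth
curl -s "http://localhost:8983/solr/admin/luke?numTerms=0" | grep -iE "numDocs|numTerms"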
Re: Shard searching clarifications
On Fri, Aug 15, 2008 at 12:34 PM, Phillip Farber <[EMAIL PROTECTED]> wrote: > If I have 2 solr instances (solr1 and solr2) each serving a shard > is it correct I only need to send my query to one of the shards, e.g. > > solr1:8080/select?shards=solr1,solr2 ... > > and that I'll get merged results over both shards returned to me by solr1? Yes. > The other question is: can I query each instance in "non-shard" mode, i.e. > just as > > solr1:8080/select? ... or solr2:8080/select? ... > > if I'm only interested in the documents in one of the shards? Yes. -Yonik
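Spelled out with the hosts from the question (placeholders for whatever you actually run), the two styles of request are:

# distributed: send to either instance; it queries both shards and merges
curl -s "http://solr1:8080/solr/select?q=foo&shards=solr1:8080/solr,solr2:8080/solr"
# single shard: query one instance directly, no shards parameter
curl -s "http://solr1:8080/solr/select?q=foo"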
Shard searching clarifications
Hi, I just want to be clear on how sharding works, so I have two questions. If I have 2 solr instances (solr1 and solr2), each serving a shard, is it correct that I only need to send my query to one of the shards, e.g. solr1:8080/select?shards=solr1,solr2 ... and that I'll get merged results over both shards returned to me by solr1? Or do I have to send my query to both solr1 and solr2 solr1:8080/select?shards=solr1,solr2 solr2:8080/select?shards=solr1,solr2 in which case the responses from both solr1 and solr2 contain my results? Actually, that seems sort of pointless, so I assume the answer is that I only need to send to one shard or the other. The other question is: can I query each instance in "non-shard" mode, i.e. just as solr1:8080/select? ... or solr2:8080/select? ... if I'm only interested in the documents in one of the shards? Thanks, Phil
Re: Index size vs. number of documents
By "Index size almost never grows linearly with the number of documents" are you saying it increases more slowly that the number of documents, i.e. sub-linearly or more rapidly? With dirty OCR the number of unique terms is always increasing due to the garbage "words" -Phil Chris Hostetter wrote: : > I'm surprised, as you are, by the non-linearity. Out of curiosity, what is Unless the data in "stored" fields is significantly greater then "indexed" fields the Index size almost never grows linearly with the number of documents -- it's the number of unique terms that tends to primarily influence the size of the index. At some point someone on the java-user list who really understood the file formats wrote a really great forumla for estimating the size of the index assuming some ratios of unique terms per doc, but i can't find it now. -Hoss
Re: Administrative questions
Jeremy, +1 for the jmx config or at least putting that info on the SolrJMX page. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Jeremy Hinegardner <[EMAIL PROTECTED]> > To: solr-user@lucene.apache.org > Sent: Wednesday, August 13, 2008 8:12:33 PM > Subject: Re: Administrative questions > > On Tue, Aug 12, 2008 at 05:49:32PM -0700, Jon Drukman wrote: > > 1. How do people deal with having solr start when system reboots, manage > > the log output, etc. Right now I run it manually under a unix 'screen' > > command with a wrapper script that takes care of restarts when it crashes. > > That means that only my user can connect to it, and it can't happen when > > the system starts up... But I don't see any other way to control the > > process easily. > > We use a standalone jetty instance for our solr war, and I have that > controlled > with an init.d script for start/stop/restart. I'm actually packaging our solr > server as an rpm with a customized jetty config, the solr war, the solr > configuration, all the solr/bin scripts, and an init.d script, and deploying it > to > servers that way. > > I'd be happy to donate the enhanced jetty configuration (jmx and such), along > with the init.d script to the community if anyone wants it as part of the > example application. > > Or if people are interested in the rpm spec I can make that available as well. > > enjoy, > > -jeremy > > -- > > Jeremy Hinegardner [EMAIL PROTECTED]
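Until something like that lands in the example application, a bare-bones init.d-style wrapper for the bundled Jetty is sketched below; paths, user and log location are assumptions, not the config Jeremy describes:

#!/bin/sh
# Bare-bones init.d-style control script for the example Jetty bundled with Solr.
# Paths, user and log location are assumptions; adjust to your layout.
SOLR_HOME=/usr/local/apache-solr-1.2.0/example
SOLR_USER=solr
SOLR_LOG=/var/log/solr.log

case "$1" in
  start)
    cd "$SOLR_HOME" || exit 1
    # run Jetty as the solr user, detached, appending stdout/stderr to the log
    su -s /bin/sh -c "nohup java -jar start.jar >> $SOLR_LOG 2>&1 &" "$SOLR_USER"
    ;;
  stop)
    # match the java command started above
    pkill -u "$SOLR_USER" -f 'java -jar start.jar'
    ;;
  restart)
    "$0" stop; sleep 2; "$0" start
    ;;
  *)
    echo "Usage: $0 {start|stop|restart}" >&2
    exit 1
    ;;
esac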
Re: Indexing Only Parts of HTML Pages
Hi Nick, Yes, sounds like either custom Nutch parsing code or custom HTML parser that has the logic you described and feeds Solr with docs constructed based on this logic. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Nick Tkach <[EMAIL PROTECTED]> > To: solr-user@lucene.apache.org > Sent: Wednesday, August 13, 2008 12:44:58 PM > Subject: Indexing Only Parts of HTML Pages > > I'm wondering, is there some way ("out of the box") to tell Solr that > we're only interested in indexing certain parts of a page? For example, > let's say I have a bunch of pages in my site that contain some common > navigation elements, roughly like this: > > > > > >Stuff here about parts of my site > > >More stuff about other parts of the site > > A bunch of stuff particular to each individual page... > > > > Is there some way to either tell Solr to not index what's in the two > divs whenever it encounters them (and it will-in nearly every page) or, > failing that, to somehow easily give content in those areas a large > negative score in order to get the same effect? > > FWIW, we are using Nutch to do the crawling, but as I understand it > there's no way to get Nutch to skip only parts of pages without writing > custom code, right?
Re: Can I copy an index built on a Windows system to a Unix/Linux system?
Excellent! Many thanks for your help Eric! John Erick Erickson wrote: > > I've done exactly this many times in straight Lucene. Since Solr is built > on Lucene, I wouldn't anticipate any problems. > > Make sure your transfer is binary mode... > > Best > Erick > > On Fri, Aug 15, 2008 at 8:02 AM, johnwarde <[EMAIL PROTECTED]> wrote: > >> >> Hi, >> >> Can I copy an index built on a Windows system to a Unix/Linux system and >> still work? >> >> Reason for my question: >> I have been working with Solr for the last month on a Windows system and >> I >> have determined that we need to have a replication solution for our >> future >> needs (volume of documents to be indexed and query loads). >> >> At this point in time it looks like, from my research, that Solr does not >> currently provide a reliable/tested replication strategy on Windows. >> >> However, I would like to continue to use Solr on Windows for now until >> the >> load on the single windows system becomes too great and requires us to >> implement a replication strategy (one index master, many query slaves). >> Hopefully, by that time a reliable replication strategy on Windows may >> present itself but if it doesn't ... >> >> Can I make a binary copy of the index files from a windows system to a >> Unix/Linux system and be read by a Solr on the Unix/Linux system. Would >> there be any byte order problems? Or would I need to rebuild the index >> from >> the original data? >> >> Many thanks for your help! >> >> John >> >> >> >> -- >> View this message in context: >> http://www.nabble.com/Can-I-copy-an-index-built-on-a-Windows-system-to-a-Unix-Linux-system--tp18997540p18997540.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > -- View this message in context: http://www.nabble.com/Can-I-copy-an-index-built-on-a-Windows-system-to-a-Unix-Linux-system--tp18997540p18999382.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Can I copy an index built on a Windows system to a Unix/Linux system?
I've done exactly this many times in straight Lucene. Since Solr is built on Lucene, I wouldn't anticipate any problems. Make sure your transfer is binary mode... Best Erick On Fri, Aug 15, 2008 at 8:02 AM, johnwarde <[EMAIL PROTECTED]> wrote: > > Hi, > > Can I copy an index built on a Windows system to a Unix/Linux system and > still work? > > Reason for my question: > I have been working with Solr for the last month on a Windows system and I > have determined that we need to have a replication solution for our future > needs (volume of documents to be indexed and query loads). > > At this point in time it looks like, from my research, that Solr does not > currently provide a reliable/tested replication strategy on Windows. > > However, I would like to continue to use Solr on Windows for now until the > load on the single windows system becomes too great and requires us to > implement a replication strategy (one index master, many query slaves). > Hopefully, by that time a reliable replication strategy on Windows may > present itself but if it doesn't ... > > Can I make a binary copy of the index files from a windows system to a > Unix/Linux system and be read by a Solr on the Unix/Linux system. Would > there be any byte order problems? Or would I need to rebuild the index from > the original data? > > Many thanks for your help! > > John > > > > -- > View this message in context: > http://www.nabble.com/Can-I-copy-an-index-built-on-a-Windows-system-to-a-Unix-Linux-system--tp18997540p18997540.html > Sent from the Solr - User mailing list archive at Nabble.com. > >
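A sketch of what that copy can look like in practice (paths and hostnames are placeholders). Lucene's index format is platform independent, so there is no byte-order concern; the things to get right are avoiding any text/newline translation (binary mode) and copying a quiescent index, i.e. not while a writer is mid-commit:

# stop indexing on the Windows master (or copy from a snapshot), then move
# the whole index directory as-is; both scp and rsync are binary-safe
scp -r /path/to/solr/data/index user@linuxbox:/var/solr/data/
# or
rsync -av /path/to/solr/data/index/ user@linuxbox:/var/solr/data/index/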
Re: IndexOutOfBoundsException
Ignore that error - I think I installed the Sun JVM incorrectly - this seems unrelated to the error. On Fri, Aug 15, 2008 at 9:01 AM, Ian Connor <[EMAIL PROTECTED]> wrote: > I tried it again (rm -rf /solr/index and post all the docs again) but > this time, I get the error (I also switched to the Sun JVM to see if > that helped): > > 15-Aug-08 4:57:08 PM org.apache.solr.core.SolrCore execute > INFO: webapp=/solr path=/update params={} status=500 QTime=4576 > 15-Aug-08 4:57:08 PM org.apache.solr.common.SolrException log > SEVERE: javax.xml.stream.XMLStreamException: required string: "field" > at gnu.xml.stream.XMLParser.error(libgcj.so.8rh) > at gnu.xml.stream.XMLParser.require(libgcj.so.8rh) > at gnu.xml.stream.XMLParser.readEndElement(libgcj.so.8rh) > at gnu.xml.stream.XMLParser.next(libgcj.so.8rh) > at > org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:323) > at > org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:197) > at > org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:125) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:128) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1143) > at > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:272) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089) > at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) > at > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) > at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) > at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) > at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) > at > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211) > at > org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) > at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) > at org.mortbay.jetty.Server.handle(Server.java:285) > at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) > at > org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835) > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641) > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202) > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) > at > org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226) > at > org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442) > > 2008-08-15 16:57:08.440::WARN: EXCEPTION > java.lang.NullPointerException > at org.mortbay.io.bio.SocketEndPoint.getRemoteAddr(SocketEndPoint.java:116) > at org.mortbay.jetty.Request.getRemoteAddr(Request.java:746) > at org.mortbay.jetty.NCSARequestLog.log(NCSARequestLog.java:230) > at > org.mortbay.jetty.handler.RequestLogHandler.handle(RequestLogHandler.java:51) > at > org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) > at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) > at org.mortbay.jetty.Server.handle(Server.java:285) > at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) > at > org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835) > at 
org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641) > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202) > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) > at > org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226) > at > org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442) > > > On Fri, Aug 15, 2008 at 8:26 AM, Doug Steigerwald > <[EMAIL PROTECTED]> wrote: >> We actually have this same exact issue on 5 of our cores. We're just going >> to wipe the index and reindex soon, but it isn't actually causing any >> problems for us. We can update the index just fine, there's just no merging >> going on. >> >> Ours happened when I reloaded all of our cores for a schema change. I don't >> do that any more ;). >> >> Doug >> >> On Aug 14, 2008, at 11:08 PM, Yonik Seeley wrote: >> >>> Since this looks like more of a lucene issue, I've replied in >>> [EMAIL PROTECTED] >>> >>> -Yonik >>> >>> On Thu, Aug 14, 2008 at 10:18 PM, Ian Connor <[EMAIL PROTECTED]> wrote: I seem to be able to reproduce this very easily and the data is medline (so I am sure I can share it if needed with a quick email to check). - I am using fedora: %uname -a Linux ghetto5.projectlounge.com 2.6.23.1-42.fc8 #1 SMP Tue Oct 30
Re: IndexOutOfBoundsException
I tried it again (rm -rf /solr/index and post all the docs again) but this time, I get the error (I also switched to the Sun JVM to see if that helped): 15-Aug-08 4:57:08 PM org.apache.solr.core.SolrCore execute INFO: webapp=/solr path=/update params={} status=500 QTime=4576 15-Aug-08 4:57:08 PM org.apache.solr.common.SolrException log SEVERE: javax.xml.stream.XMLStreamException: required string: "field" at gnu.xml.stream.XMLParser.error(libgcj.so.8rh) at gnu.xml.stream.XMLParser.require(libgcj.so.8rh) at gnu.xml.stream.XMLParser.readEndElement(libgcj.so.8rh) at gnu.xml.stream.XMLParser.next(libgcj.so.8rh) at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:323) at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:197) at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:125) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:128) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1143) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:272) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) at org.mortbay.jetty.Server.handle(Server.java:285) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442) 2008-08-15 16:57:08.440::WARN: EXCEPTION java.lang.NullPointerException at org.mortbay.io.bio.SocketEndPoint.getRemoteAddr(SocketEndPoint.java:116) at org.mortbay.jetty.Request.getRemoteAddr(Request.java:746) at org.mortbay.jetty.NCSARequestLog.log(NCSARequestLog.java:230) at org.mortbay.jetty.handler.RequestLogHandler.handle(RequestLogHandler.java:51) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) at org.mortbay.jetty.Server.handle(Server.java:285) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226) at 
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442) On Fri, Aug 15, 2008 at 8:26 AM, Doug Steigerwald <[EMAIL PROTECTED]> wrote: > We actually have this same exact issue on 5 of our cores. We're just going > to wipe the index and reindex soon, but it isn't actually causing any > problems for us. We can update the index just fine, there's just no merging > going on. > > Ours happened when I reloaded all of our cores for a schema change. I don't > do that any more ;). > > Doug > > On Aug 14, 2008, at 11:08 PM, Yonik Seeley wrote: > >> Since this looks like more of a lucene issue, I've replied in >> [EMAIL PROTECTED] >> >> -Yonik >> >> On Thu, Aug 14, 2008 at 10:18 PM, Ian Connor <[EMAIL PROTECTED]> wrote: >>> >>> I seem to be able to reproduce this very easily and the data is >>> medline (so I am sure I can share it if needed with a quick email to >>> check). >>> >>> - I am using fedora: >>> %uname -a >>> Linux ghetto5.projectlounge.com 2.6.23.1-42.fc8 #1 SMP Tue Oct 30 >>> 13:18:33 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux >>> %java -version >>> java version "1.7.0" >>> IcedTea Runtime Environment (build 1.7.0-b21) >>> IcedTea 64-Bit Server VM (build 1.7.0-b21, mixed mode) >>> - single core (will use shards but each machine just as one HDD so >>> didn't see how cores
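For comparison, a well-formed add is sketched below (host and field names are placeholders). A 'required string: "field"' failure from the XML parser can indicate that the posted update was not well-formed at that point, for example an unescaped & or < inside a field value, so it is worth dumping the generated XML for the document that triggers it; note too that the trace above shows the GNU Classpath (libgcj) StAX parser, so switching to the Sun JVM also changes which parser reads the update.

# minimal well-formed update; note the XML-escaped ampersand in the value
curl "http://localhost:8983/solr/update" -H "Content-Type: text/xml" \
  --data-binary '<add><doc><field name="id">1</field><field name="title">AT&amp;T results</field></doc></add>'
curl "http://localhost:8983/solr/update" -H "Content-Type: text/xml" --data-binary '<commit/>'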
Re: IndexOutOfBoundsException
We actually have this same exact issue on 5 of our cores. We're just going to wipe the index and reindex soon, but it isn't actually causing any problems for us. We can update the index just fine, there's just no merging going on. Ours happened when I reloaded all of our cores for a schema change. I don't do that any more ;). Doug On Aug 14, 2008, at 11:08 PM, Yonik Seeley wrote: Since this looks like more of a lucene issue, I've replied in [EMAIL PROTECTED] -Yonik On Thu, Aug 14, 2008 at 10:18 PM, Ian Connor <[EMAIL PROTECTED]> wrote: I seem to be able to reproduce this very easily and the data is medline (so I am sure I can share it if needed with a quick email to check). - I am using fedora: %uname -a Linux ghetto5.projectlounge.com 2.6.23.1-42.fc8 #1 SMP Tue Oct 30 13:18:33 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux %java -version java version "1.7.0" IcedTea Runtime Environment (build 1.7.0-b21) IcedTea 64-Bit Server VM (build 1.7.0-b21, mixed mode) - single core (will use shards but each machine just as one HDD so didn't see how cores would help but I am new at this) - next run I will keep the output to check for earlier errors - very and I can share code + data if that will help On Thu, Aug 14, 2008 at 4:23 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote: Yikes... not good. This shouldn't be due to anything you did wrong Ian... it looks like a lucene bug. Some questions: - what platform are you running on, and what JVM? - are you using multicore? (I fixed some index locking bugs recently) - are there any exceptions in the log before this? - how reproducible is this? -Yonik On Thu, Aug 14, 2008 at 2:47 PM, Ian Connor <[EMAIL PROTECTED]> wrote: Hi, I have rebuilt my index a few times (it should get up to about 4 Million but around 1 Million it starts to fall apart). Exception in thread "Lucene Merge Thread #0" org.apache.lucene.index.MergePolicy$MergeException: java.lang.IndexOutOfBoundsException: Index: 105, Size: 33 at org .apache .lucene .index .ConcurrentMergeScheduler .handleMergeException(ConcurrentMergeScheduler.java:323) at org.apache.lucene.index.ConcurrentMergeScheduler $MergeThread.run(ConcurrentMergeScheduler.java:300) Caused by: java.lang.IndexOutOfBoundsException: Index: 105, Size: 33 at java.util.ArrayList.rangeCheck(ArrayList.java:572) at java.util.ArrayList.get(ArrayList.java:350) at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:260) at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:188) at org.apache.lucene.index.SegmentReader.document(SegmentReader.java: 670) at org .apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java: 349) at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:134) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java: 3998) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3650) at org .apache .lucene .index .ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java: 214) at org.apache.lucene.index.ConcurrentMergeScheduler $MergeThread.run(ConcurrentMergeScheduler.java:269) When this happens, the disk usage goes right up and the indexing really starts to slow down. I am using a Solr build from about a week ago - so my Lucene is at 2.4 according to the war files. Has anyone seen this error before? Is it possible to tell which Array is too large? Would it be an Array I am sending in or another internal one? Regards, Ian Connor -- Regards, Ian Connor
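If anyone wants to inspect (or salvage) one of the affected indexes rather than reindex from scratch, Lucene's CheckIndex tool will report which segments are unreadable; the jar name and index path below are placeholders, and -fix permanently drops the documents in any corrupt segment, so take a copy of the index first:

# read-only diagnosis
java -cp lucene-core-2.4-dev.jar org.apache.lucene.index.CheckIndex /solr/index
# destructive repair -- removes corrupt segments (and their documents):
# java -cp lucene-core-2.4-dev.jar org.apache.lucene.index.CheckIndex /solr/index -fix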
Can I copy an index built on a Windows system to a Unix/Linux system?
Hi, Can I copy an index built on a Windows system to a Unix/Linux system and have it still work? Reason for my question: I have been working with Solr for the last month on a Windows system and I have determined that we need to have a replication solution for our future needs (volume of documents to be indexed and query loads). At this point in time it looks, from my research, like Solr does not currently provide a reliable/tested replication strategy on Windows. However, I would like to continue to use Solr on Windows for now until the load on the single Windows system becomes too great and requires us to implement a replication strategy (one index master, many query slaves). Hopefully, by that time a reliable replication strategy on Windows may present itself, but if it doesn't ... Can I make a binary copy of the index files from a Windows system to a Unix/Linux system and have it read by Solr on the Unix/Linux system? Would there be any byte order problems? Or would I need to rebuild the index from the original data? Many thanks for your help! John -- View this message in context: http://www.nabble.com/Can-I-copy-an-index-built-on-a-Windows-system-to-a-Unix-Linux-system--tp18997540p18997540.html Sent from the Solr - User mailing list archive at Nabble.com.