Solr Cache

2008-08-15 Thread Tim Christensen
We have two load-balanced servers with the same index. The indexes  
are updated at the same time every day. Occasionally, a search on one  
server will return different results from the other server, even  
though the data used to create the index is exactly the same.


Is this possibly due to caching? Does the cache reset automatically  
after the commit?


The problem usually resolves itself, by all appearances randomly,  
but I assume something I'm not aware of is going on, such as a new  
searcher starting up at some point in the day. All cache settings  
are the solrconfig defaults.
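
If caches are per-searcher and get thrown away when a commit opens a
new searcher, one thing we could try is forcing both servers to open
new searchers together right after the daily update. A sketch, with
stand-in hostnames:

#!/bin/sh
# Send an explicit commit to each load-balanced server after the daily
# index update; each commit registers a fresh searcher, so per-searcher
# caches get rebuilt on both machines at the same time.
for host in solr-a.example.com solr-b.example.com; do
  curl "http://$host:8983/solr/update" \
       -H 'Content-Type: text/xml' \
       --data-binary '<commit/>'
done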


Thank you ahead of time.


Tim Christensen
Director Media & Technology
Vann's Inc.
406-203-4656

[EMAIL PROTECTED]

http://www.vanns.com









failover sharding

2008-08-15 Thread Ian Connor
Hi,

Is there a way to put a timeout on shards, or some way of ignoring shards
that are not there? For instance, I have 4 shards, and their documents
overlap for redundancy.

shard 1 = 0->200
shard 2 = 100->400
shard 3 = 300->600
shard 4 = 500->600 & 0->100

This means if one of my shards goes down, then I can still give
results. If there was some option that said wait 1 second and then
give up, this would work perfectly for me.
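
For reference, the request looks roughly like this (hostnames are
stand-ins), and today it fails outright if any listed shard is down:

curl "http://shard1:8983/solr/select?q=*:*&shards=shard1:8983/solr,shard2:8983/solr,shard3:8983/solr,shard4:8983/solr"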

-- 
Regards,

Ian Connor


Re: Administrative questions

2008-08-15 Thread Jon Drukman

Jason Rennie wrote:

On Wed, Aug 13, 2008 at 1:52 PM, Jon Drukman <[EMAIL PROTECTED]> wrote:


Duh.  I should have thought of that.  I'm a big fan of djbdns so I'm quite
familiar with daemontools.

Thanks!



:)  My pleasure.  Was nice to hear recently that DJB is moving toward more
flexible licensing terms.  For anyone unfamiliar w/ daemontools, here's
DJB's explanation of why they rock compared to inittab, ttys, init.d, and
rc.local:

http://cr.yp.to/daemontools/faq/create.html#why


in case anybody wants to know, here's how to run solr under daemontools.

1. install daemontools
2. create /etc/solr
3. create a user and group called solr
4. create shell script /etc/solr/run  (edit to taste, i'm using the 
default jetty that comes with solr)


#!/bin/sh
exec 2>&1
cd /usr/local/apache-solr-1.2.0/example
exec setuidgid solr java -jar start.jar


5. create /etc/solr/log/run containing:

#!/bin/sh
exec setuidgid solr multilog t ./main

6. ln -s /etc/solr /service/solr

that is all.  as long as you've got svscan set to launch when the system 
boots, solr will run and auto-restart on crashes.  logs will be in 
/service/solr/log/main (auto-rotated).
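
a quick sanity check once the symlink is in place (svstat ships with
daemontools):

svstat /service/solr /service/solr/log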


yay.
-jsd-



Re: "Auto commit error" and java.io.FileNotFoundException

2008-08-15 Thread Chris Harris
I've done some more sniffing on the Lucene list, and noticed that Otis
made the following comment about a FileNotFoundException problem in
late 2005:

Are you using Windows and a compound index format (look at your index
dir - does it have .cfs file(s))?

This may be a bad combination, judging from people who reported this
problem so far.

(http://www.nabble.com/fnm-file-disappear-td1531775.html#a1531775)

Again, a CFS index was indeed involved in my case, but my experience
comes almost three years after Otis' message...

On Fri, Aug 15, 2008 at 10:35 AM, Chris Harris <[EMAIL PROTECTED]> wrote:
>
> The following may or may not be relevant: I built the base 3M-ish doc
> index on a Windows machine, and it's a compound (.cfs) format index.
> (I actually created it not with Solr, but by using the index merging
> tool that comes with Lucene in order to merge three different
> non-compound format indexes that I'd previously made with Solr into a
> single index.) Before I started adding documents, I moved the index to
> a Linux machine running a newer version of Solr/Lucene than was on the
> Windows machine. The stuff described above all happened on Linux.
>
> Any thoughts?
>
> Thanks a bunch,
> Chris
>


Re: Can I copy an index built on a Windows system to a Unix/Linux system?

2008-08-15 Thread Noble Paul നോബിള്‍ नोब्ळ्
There is a feature (SOLR-561) being built to do replication on any
platform. The patch works and is tested. Do not expect it to work
with the current trunk, because a lot has changed in trunk since the
last patch. We will update it soon once the dust settles.

On Fri, Aug 15, 2008 at 7:45 PM, johnwarde <[EMAIL PROTECTED]> wrote:
>
> Excellent! Many thanks for your help Eric!
>
> John
>
>
> Erick Erickson wrote:
>>
>> I've done exactly this many times in straight Lucene. Since Solr is built
>> on Lucene, I wouldn't anticipate any problems.
>>
>> Make sure your transfer is binary mode...
>>
>> Best
>> Erick
>>
>> On Fri, Aug 15, 2008 at 8:02 AM, johnwarde <[EMAIL PROTECTED]> wrote:
>>
>>>
>>> Hi,
>>>
>>> Can I copy an index built on a Windows system to a Unix/Linux system and
>>> have it still work?
>>>
>>> Reason for my question:
>>> I have been working with Solr for the last month on a Windows system and
>>> I
>>> have determined that we need to have a replication solution for our
>>> future
>>> needs (volume of documents to be indexed and query loads).
>>>
>>> At this point in time it looks like, from my research, that Solr does not
>>> currently provide a reliable/tested replication strategy on Windows.
>>>
>>> However, I would like to continue to use Solr on Windows for now until
>>> the
>>> load on the single windows system becomes too great and requires us to
>>> implement a replication strategy (one index master, many query slaves).
>>> Hopefully, by that time a reliable replication strategy on Windows may
>>> present itself but if it doesn't ...
>>>
>>> Can I make a binary copy of the index files from a Windows system to a
>>> Unix/Linux system and have it read by Solr on the Unix/Linux system?
>>> Would there be any byte order problems? Or would I need to rebuild the
>>> index from the original data?
>>>
>>> Many thanks for your help!
>>>
>>> John
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Can-I-copy-an-index-built-on-a-Windows-system-to-a-Unix-Linux-system--tp18997540p18997540.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>
> --
> View this message in context: 
> http://www.nabble.com/Can-I-copy-an-index-built-on-a-Windows-system-to-a-Unix-Linux-system--tp18997540p18999382.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
--Noble Paul


"Auto commit error" and java.io.FileNotFoundException

2008-08-15 Thread Chris Harris
I have an index (different from the ones mentioned yesterday) that was
working fine with 3M docs or so, but when I added a bunch more docs,
bringing it closer to 4M docs, the index seemed to get corrupted. In
particular, now when I start Solr up, or when my indexing process
tries to add a document, I get a complaint about missing index files.
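
One thing I can run against the index is Lucene's CheckIndex tool,
which walks the segments and reports missing or broken files. A
sketch - the jar name is a guess for whatever core jar the build
ships, and newer versions also have a repair mode that drops
unreadable segments:

java -cp lucene-core-2.4-dev.jar org.apache.lucene.index.CheckIndex \
     /ssd/solr-/solr/exhibitcore/data/index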

The error on startup looks like this:


<record>
  <date>2008-08-15T10:18:54</date>
  <millis>1218820734592</millis>
  <sequence>92</sequence>
  <logger>org.apache.solr.core.MultiCore</logger>
  <level>SEVERE</level>
  <class>org.apache.solr.common.SolrException</class>
  <method>log</method>
  <thread>10</thread>
  <message>java.lang.RuntimeException: java.io.FileNotFoundException:
/ssd/solr-/solr/exhibitcore/data/index/_p7.fdt (No such file or
directory)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:733)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:387)
at org.apache.solr.core.MultiCore.create(MultiCore.java:255)
at org.apache.solr.core.MultiCore.load(MultiCore.java:139)
at 
org.apache.solr.servlet.SolrDispatchFilter.initMultiCore(SolrDispatchFilter.java:147)
at 
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:75)
at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at 
org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
at 
org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
at 
org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
at 
org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at 
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at 
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at 
org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
at org.mortbay.jetty.Server.doStart(Server.java:210)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.mortbay.start.Main.invokeMain(Main.java:183)
at org.mortbay.start.Main.start(Main.java:497)
at org.mortbay.start.Main.main(Main.java:115)
Caused by: java.io.FileNotFoundException:
/ssd/solr-/solr/exhibitcore/data/index/_p7.fdt (No such file or
directory)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
at 
org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:506)
at 
org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:536)
at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:445)
at 
org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:75)
at 
org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:308)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:262)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:197)
at 
org.apache.lucene.index.MultiSegmentReader.<init>(MultiSegmentReader.java:55)
at 
org.apache.lucene.index.DirectoryIndexReader$1.doBody(DirectoryIndexReader.java:75)
at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:636)
at 
org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:63)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:209)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:173)
at 
org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:93)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:724)
... 29 more</message>
</record>



And the error on doc add looks like this:


<record>
  <date>2008-08-15T09:51:30</date>
  <millis>1218819090142</millis>
  <sequence>6571937</sequence>
  <logger>org.apache.solr.core.SolrCore</logger>
  <level>SEVERE</level>
  <class>org.apache.solr.common.SolrException</class>
  <method>log</method>
  <thread>14</thread>
  <message>java.io.FileNotFoundException:
/ssd/solr-/solr/exhibitcore/data/index/_p7.fdt (No such file or
directory)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.

partialResults, distributed search & SOLR-502

2008-08-15 Thread Brian Whitman

I was going to file a ticket like this:

"A SOLR-303 query with &shards=host1,host2,host3 when host3 is down  
returns an error. One of the advantages of a shard implementation is  
that data can be stored redundantly across different shards, either as  
direct copies (e.g. when host1 and host3 are snapshooter'd copies of  
each other) or where there is some "data RAID" that stripes indexes  
for redundancy."


But then I saw SOLR-502, which appears to be committed.

If I have the above scenario (host1,host2,host3 where host3 is not up)  
and set a timeAllowed, will I still get a 400 or will it come back  
with "partial" results? If not, can we think of a way to get this to  
work? It's my understanding already that duplicate docIDs are merged  
in the SOLR-303 response, so other than building in some "this host  
isn't working, just move on and report it" logic, and of course the  
work to index redundantly, we wouldn't need anything else to achieve  
a good redundant shard implementation.
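
For reference, the request I mean looks like this (hostnames made up);
my reading of SOLR-502 is that timeAllowed is in milliseconds and that
a partialResults flag shows up in the response header when the limit
trips:

curl "http://host1:8983/solr/select?q=foo&shards=host1:8983/solr,host2:8983/solr,host3:8983/solr&timeAllowed=1000"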


B




Re: Highlighting returns incorrect text on some results?

2008-08-15 Thread pdovyda2

Thanks Otis.  I downloaded the nightly today and reindexed, and it seems that
it was a bug you've worked out since 1.2, as I don't see the issue
anymore.

Paul


Otis Gospodnetic wrote:
> 
> Paul, we had many highlighter-related changes since 1.2, so I suggest you
> try the nightly.
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> 
> 
> - Original Message 
>> From: pdovyda2 <[EMAIL PROTECTED]>
>> To: solr-user@lucene.apache.org
>> Sent: Thursday, August 14, 2008 2:56:42 PM
>> Subject: Highlighting returns incorrect text on some results?
>> 
>> 
>> This is kind of a strange issue, but when I submit a query and ask for
>> highlighting back, sometimes the highlighted text includes a question
>> mark
>> at the beginning, although a question mark character does not appear in
>> the
>> field that the highlighted text is taken from.
>> 
>> I've put some sample XML output on the web at
>> http://ucair.cs.uiuc.edu/pdovyda2/problem.xml
>> If you look at the first and third highlights, you'll see what I'm
>> talking
>> about.  
>> 
>> Besides looking a bit odd, it is causing my application to break because
>> the
>> highlighted field is multivalued, and I was doing text matching to
>> determine
>> which of the values was chosen for highlighting.
>> 
>> Is this actually a bug, or have I just misconfigured something?  By the
>> way,
>> I am using the 1.2 release, I have not yet tried out a nightly build to
>> see
>> if this is an old problem.
>> 
>> Thanks,
>> Paul
>> -- 
>> View this message in context: 
>> http://www.nabble.com/Highlighting-returns-incorrect-text-on-some-results--tp18987598p18987598.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Highlighting-returns-incorrect-text-on-some-results--tp18987598p19002545.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Index size vs. number of documents

2008-08-15 Thread Otis Gospodnetic
Here's an example.
Consider 2 docs with terms:

doc1: term1, term2, term3
doc2: term4, term5, term6

vs.

doc1: term1, term2, term3
doc2: term1, term1, term6

All other things constant, the former will make the index grow faster because 
it has more unique terms.  Even if your OCR has garbage that makes noise in 
the form of new unique terms, there will still be some overlap (like that 
term1 in the second case above).
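
To see the arithmetic, a throwaway shell check (nothing Solr-specific):

printf 'term1 term2 term3\nterm4 term5 term6\n' | tr ' ' '\n' | sort -u | wc -l   # 6 unique terms
printf 'term1 term2 term3\nterm1 term1 term6\n' | tr ' ' '\n' | sort -u | wc -l   # 4 unique terms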

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Phillip Farber <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Friday, August 15, 2008 12:22:30 PM
> Subject: Re: Index size vs. number of documents
> 
> By "Index size almost never grows linearly with the number of
> documents" are you saying it increases more slowly that the number of 
> documents, i.e. sub-linearly or more rapidly?
> 
> With dirty OCR the number of unique terms is always increasing due to 
> the garbage "words"
> 
> -Phil
> 
> Chris Hostetter wrote:
> > : > I'm surprised, as you are, by the non-linearity. Out of curiosity, what 
> > is
> > 
> > Unless the data in "stored" fields is significantly greater than "indexed" 
> > fields, the index size almost never grows linearly with the number of 
> > documents -- it's the number of unique terms that tends to primarily 
> > influence the size of the index.
> > 
> > At some point someone on the java-user list who really understood the file 
> > formats wrote a really great formula for estimating the size of the index 
> > assuming some ratios of unique terms per doc, but i can't find it now.
> > 
> > 
> > -Hoss
> > 



Re: Shard searching clarifications

2008-08-15 Thread Yonik Seeley
On Fri, Aug 15, 2008 at 12:34 PM, Phillip Farber <[EMAIL PROTECTED]> wrote:
> If I have 2 solr instances (solr1 and solr2) each serving a shard
> is it correct I only need to send my query to one of the shards, e.g.
>
> solr1:8080/select?shards=solr1,solr2 ...
>
> and that I'll get merged results over both shards returned to me by solr1?

Yes.

> The other question is: can I query each instance in "non-shard" mode, i.e.
> just as
>
> solr1:8080/select? ... or solr2:8080/select? ...
>
> if I'm only interested in the documents in one of the shards?

Yes.
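
Concretely, with the shard URLs abbreviated as in your question:

# distributed: one request, results merged across both shards
curl "http://solr1:8080/select?q=foo&shards=solr1,solr2"

# single-shard: ask one instance directly; only its own documents
curl "http://solr1:8080/select?q=foo"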

-Yonik


Shard searching clarifications

2008-08-15 Thread Phillip Farber


Hi,

I just want to be clear on how sharding works so I have two questions.

If I have 2 solr instances (solr1 and solr2) each serving a shard
is it correct I only need to send my query to one of the shards, e.g.

solr1:8080/select?shards=solr1,solr2 ...

and that I'll get merged results over both shards returned to me by 
solr1? Or do I have to send my query to both solr1 and solr2



solr1:8080/select?shards=solr1,solr2
solr2:8080/select?shards=solr1,solr2

in which case responses from both solr1 and solr2 have my results? 
Actually that seems sort of pointless, so I assume the answer is that I 
only need to send to one shard or the other.




The other question is: can I query each instance in "non-shard" mode, 
i.e. just as


solr1:8080/select? ... or solr2:8080/select? ...

if I'm only interested in the documents in one of the shards?


Thanks,

Phil



Re: Index size vs. number of documents

2008-08-15 Thread Phillip Farber

By "Index size almost never grows linearly with the number of
documents" are you saying it increases more slowly that the number of 
documents, i.e. sub-linearly or more rapidly?


With dirty OCR the number of unique terms is always increasing due to 
the garbage "words"


-Phil

Chris Hostetter wrote:

: > I'm surprised, as you are, by the non-linearity. Out of curiosity, what is

Unless the data in "stored" fields is significantly greater than "indexed" 
fields, the index size almost never grows linearly with the number of 
documents -- it's the number of unique terms that tends to primarily 
influence the size of the index.


At some point someone on the java-user list who really understood the file 
formats wrote a really great formula for estimating the size of the index 
assuming some ratios of unique terms per doc, but i can't find it now.



-Hoss



Re: Administrative questions

2008-08-15 Thread Otis Gospodnetic
Jeremy, +1 for the jmx config, or at least putting that onto the SolrJMX 
page.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Jeremy Hinegardner <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Wednesday, August 13, 2008 8:12:33 PM
> Subject: Re: Administrative questions
> 
> On Tue, Aug 12, 2008 at 05:49:32PM -0700, Jon Drukman wrote:
> > 1. How do people deal with having solr start when system reboots, manage 
> > the log output, etc.  Right now I run it manually under a unix 'screen' 
> > command with a wrapper script that takes care of restarts when it crashes.  
> > That means that only my user can connect to it, and it can't happen when 
> > the system starts up... But I don't see any other way to control the 
> > process easily.
> 
> We use a standalone jetty instance for our solr war, and I have that 
> controlled
> with an init.d script for start/stop/restart.  I'm actually packaging our
> solr server as an rpm with a customized jetty config, the solr war, the
> solr configuration, all the solr/bin scripts, and an init.d script, and
> deploying it to servers that way.
> 
> I'd be happy to donate the enhanced jetty configuration (jmx and such), along
> with the init.d script to the community if anyone wants it as part of the
> example application.
> 
> Or if people are interested in the rpm spec I can make that available as well.
> 
> enjoy,
> 
> -jeremy
> 
> --
> 
> Jeremy Hinegardner  [EMAIL PROTECTED] 
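
For reference, a minimal init.d-style sketch of the sort of script
Jeremy describes - paths, user, and logging are invented, and a
packaged script would do more (status, pidfiles, etc.):

#!/bin/sh
# Start/stop wrapper for the example jetty that ships with solr.
SOLR_HOME=/usr/local/apache-solr-1.2.0/example
case "$1" in
  start)
    cd "$SOLR_HOME" && su solr -c 'nohup java -jar start.jar >> /var/log/solr.log 2>&1 &'
    ;;
  stop)
    pkill -u solr -f start.jar
    ;;
  restart)
    "$0" stop; sleep 2; "$0" start
    ;;
  *)
    echo "Usage: $0 {start|stop|restart}" >&2
    exit 1
    ;;
esac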



Re: Indexing Only Parts of HTML Pages

2008-08-15 Thread Otis Gospodnetic
Hi Nick,

Yes, it sounds like either custom Nutch parsing code or a custom HTML parser 
that has the logic you described and feeds Solr with docs constructed based 
on that logic.
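
For instance, a crude pre-index filter along those lines - the div ids
are invented for the sketch, and a real HTML parser is safer since
this assumes well-formed, non-nested divs:

#!/bin/sh
# Drop the two boilerplate divs before handing the page to the indexer.
sed -e '/<div id="nav">/,/<\/div>/d' \
    -e '/<div id="footer">/,/<\/div>/d' page.html > stripped.html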

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Nick Tkach <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Wednesday, August 13, 2008 12:44:58 PM
> Subject: Indexing Only Parts of HTML Pages
> 
> I'm wondering, is there some way ("out of the box") to tell Solr that 
> we're only interested in indexing certain parts of a page?  For example, 
> let's say I have a bunch of pages in my site that contain some common 
> navigation elements, roughly like this:
> 
> <html>
>   <body>
>     <div id="nav">
>       Stuff here about parts of my site
>     </div>
>     <div id="footer">
>       More stuff about other parts of the site
>     </div>
>     A bunch of stuff particular to each individual page...
>   </body>
> </html>
> 
> Is there some way to either tell Solr to not index what's in the two 
> divs whenever it encounters them (and it will, in nearly every page) or, 
> failing that, to somehow easily give content in those areas a large 
> negative score in order to get the same effect?
> 
> FWIW, we are using Nutch to do the crawling, but as I understand it 
> there's no way to get Nutch to skip only parts of pages without writing 
> custom code, right?



Re: Can I copy an index built on a Windows system to a Unix/Linux system?

2008-08-15 Thread johnwarde

Excellent! Many thanks for your help Eric!

John


Erick Erickson wrote:
> 
> I've done exactly this many times in straight Lucene. Since Solr is built
> on Lucene, I wouldn't anticipate any problems.
> 
> Make sure your transfer is binary mode...
> 
> Best
> Erick
> 
> On Fri, Aug 15, 2008 at 8:02 AM, johnwarde <[EMAIL PROTECTED]> wrote:
> 
>>
>> Hi,
>>
>> Can I copy an index built on a Windows system to a Unix/Linux system and
>> have it still work?
>>
>> Reason for my question:
>> I have been working with Solr for the last month on a Windows system and
>> I
>> have determined that we need to have a replication solution for our
>> future
>> needs (volume of documents to be indexed and query loads).
>>
>> At this point in time it looks like, from my research, that Solr does not
>> currently provide a reliable/tested replication strategy on Windows.
>>
>> However, I would like to continue to use Solr on Windows for now until
>> the
>> load on the single windows system becomes too great and requires us to
>> implement a replication strategy (one index master, many query slaves).
>> Hopefully, by that time a reliable replication strategy on Windows may
>> present itself but if it doesn't ...
>>
>> Can I make a binary copy of the index files from a Windows system to a
>> Unix/Linux system and have it read by Solr on the Unix/Linux system?
>> Would there be any byte order problems? Or would I need to rebuild the
>> index from the original data?
>>
>> Many thanks for your help!
>>
>> John
>>
>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Can-I-copy-an-index-built-on-a-Windows-system-to-a-Unix-Linux-system--tp18997540p18997540.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Can-I-copy-an-index-built-on-a-Windows-system-to-a-Unix-Linux-system--tp18997540p18999382.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Can I copy an index built on a Windows system to a Unix/Linux system?

2008-08-15 Thread Erick Erickson
I've done exactly this many times in straight Lucene. Since Solr is built
on Lucene, I wouldn't anticipate any problems.

Make sure your transfer is binary mode...
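
For example (paths and host invented), either of these preserves the
bytes exactly:

rsync -av /c/solr/data/index/ user@linuxbox:/var/solr/data/index/
scp -r /c/solr/data/index user@linuxbox:/var/solr/data/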

Best
Erick

On Fri, Aug 15, 2008 at 8:02 AM, johnwarde <[EMAIL PROTECTED]> wrote:

>
> Hi,
>
> Can I copy an index built on a Windows system to a Unix/Linux system and
> have it still work?
>
> Reason for my question:
> I have been working with Solr for the last month on a Windows system and I
> have determined that we need to have a replication solution for our future
> needs (volume of documents to be indexed and query loads).
>
> At this point in time it looks like, from my research, that Solr does not
> currently provide a reliable/tested replication strategy on Windows.
>
> However, I would like to continue to use Solr on Windows for now until the
> load on the single windows system becomes too great and requires us to
> implement a replication strategy (one index master, many query slaves).
> Hopefully, by that time a reliable replication strategy on Windows may
> present itself but if it doesn't ...
>
> Can I make a binary copy of the index files from a Windows system to a
> Unix/Linux system and have it read by Solr on the Unix/Linux system?
> Would there be any byte order problems? Or would I need to rebuild the
> index from the original data?
>
> Many thanks for your help!
>
> John
>
>
>
> --
> View this message in context:
> http://www.nabble.com/Can-I-copy-an-index-built-on-a-Windows-system-to-a-Unix-Linux-system--tp18997540p18997540.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: IndexOutOfBoundsException

2008-08-15 Thread Ian Connor
Ignore that error - I think I installed the Sun JVM incorrectly - this
seems unrelated to the error.

On Fri, Aug 15, 2008 at 9:01 AM, Ian Connor <[EMAIL PROTECTED]> wrote:
> I tried it again (rm -rf /solr/index and post all the docs again) but
> this time, I get the error (I also switched to the Sun JVM to see if
> that helped):
>
> 15-Aug-08 4:57:08 PM org.apache.solr.core.SolrCore execute
> INFO: webapp=/solr path=/update params={} status=500 QTime=4576
> 15-Aug-08 4:57:08 PM org.apache.solr.common.SolrException log
> SEVERE: javax.xml.stream.XMLStreamException: required string: "field"
>   at gnu.xml.stream.XMLParser.error(libgcj.so.8rh)
>   at gnu.xml.stream.XMLParser.require(libgcj.so.8rh)
>   at gnu.xml.stream.XMLParser.readEndElement(libgcj.so.8rh)
>   at gnu.xml.stream.XMLParser.next(libgcj.so.8rh)
>   at 
> org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:323)
>   at 
> org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:197)
>   at 
> org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:125)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:128)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1143)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:272)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
>   at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
>   at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>   at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>   at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
>   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
>   at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
>   at 
> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>   at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
>   at org.mortbay.jetty.Server.handle(Server.java:285)
>   at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
>   at 
> org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
>   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
>   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
>   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
>   at 
> org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
>   at 
> org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
>
> 2008-08-15 16:57:08.440::WARN:  EXCEPTION
> java.lang.NullPointerException
>   at org.mortbay.io.bio.SocketEndPoint.getRemoteAddr(SocketEndPoint.java:116)
>   at org.mortbay.jetty.Request.getRemoteAddr(Request.java:746)
>   at org.mortbay.jetty.NCSARequestLog.log(NCSARequestLog.java:230)
>   at 
> org.mortbay.jetty.handler.RequestLogHandler.handle(RequestLogHandler.java:51)
>   at 
> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>   at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
>   at org.mortbay.jetty.Server.handle(Server.java:285)
>   at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
>   at 
> org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
>   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
>   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
>   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
>   at 
> org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
>   at 
> org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
>
>
> On Fri, Aug 15, 2008 at 8:26 AM, Doug Steigerwald
> <[EMAIL PROTECTED]> wrote:
>> We actually have this same exact issue on 5 of our cores.  We're just going
>> to wipe the index and reindex soon, but it isn't actually causing any
>> problems for us.  We can update the index just fine, there's just no merging
>> going on.
>>
>> Ours happened when I reloaded all of our cores for a schema change.  I don't
>> do that any more ;).
>>
>> Doug
>>
>> On Aug 14, 2008, at 11:08 PM, Yonik Seeley wrote:
>>
>>> Since this looks like more of a lucene issue, I've replied in
>>> [EMAIL PROTECTED]
>>>
>>> -Yonik
>>>
>>> On Thu, Aug 14, 2008 at 10:18 PM, Ian Connor <[EMAIL PROTECTED]> wrote:

 I seem to be able to reproduce this very easily and the data is
 medline (so I am sure I can share it if needed with a quick email to
 check).

 - I am using fedora:
 %uname -a
 Linux ghetto5.projectlounge.com 2.6.23.1-42.fc8 #1 SMP Tue Oct 30

Re: IndexOutOfBoundsException

2008-08-15 Thread Ian Connor
I tried it again (rm -rf /solr/index and post all the docs again) but
this time, I get the error (I also switched to the Sun JVM to see if
that helped):

15-Aug-08 4:57:08 PM org.apache.solr.core.SolrCore execute
INFO: webapp=/solr path=/update params={} status=500 QTime=4576
15-Aug-08 4:57:08 PM org.apache.solr.common.SolrException log
SEVERE: javax.xml.stream.XMLStreamException: required string: "field"
   at gnu.xml.stream.XMLParser.error(libgcj.so.8rh)
   at gnu.xml.stream.XMLParser.require(libgcj.so.8rh)
   at gnu.xml.stream.XMLParser.readEndElement(libgcj.so.8rh)
   at gnu.xml.stream.XMLParser.next(libgcj.so.8rh)
   at 
org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:323)
   at 
org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:197)
   at 
org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:125)
   at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:128)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1143)
   at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
   at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:272)
   at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
   at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
   at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
   at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
   at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
   at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
   at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
   at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
   at org.mortbay.jetty.Server.handle(Server.java:285)
   at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
   at 
org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
   at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
   at 
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

2008-08-15 16:57:08.440::WARN:  EXCEPTION
java.lang.NullPointerException
   at org.mortbay.io.bio.SocketEndPoint.getRemoteAddr(SocketEndPoint.java:116)
   at org.mortbay.jetty.Request.getRemoteAddr(Request.java:746)
   at org.mortbay.jetty.NCSARequestLog.log(NCSARequestLog.java:230)
   at 
org.mortbay.jetty.handler.RequestLogHandler.handle(RequestLogHandler.java:51)
   at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
   at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
   at org.mortbay.jetty.Server.handle(Server.java:285)
   at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
   at 
org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
   at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
   at 
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)


On Fri, Aug 15, 2008 at 8:26 AM, Doug Steigerwald
<[EMAIL PROTECTED]> wrote:
> We actually have this same exact issue on 5 of our cores.  We're just going
> to wipe the index and reindex soon, but it isn't actually causing any
> problems for us.  We can update the index just fine, there's just no merging
> going on.
>
> Ours happened when I reloaded all of our cores for a schema change.  I don't
> do that any more ;).
>
> Doug
>
> On Aug 14, 2008, at 11:08 PM, Yonik Seeley wrote:
>
>> Since this looks like more of a lucene issue, I've replied in
>> [EMAIL PROTECTED]
>>
>> -Yonik
>>
>> On Thu, Aug 14, 2008 at 10:18 PM, Ian Connor <[EMAIL PROTECTED]> wrote:
>>>
>>> I seem to be able to reproduce this very easily and the data is
>>> medline (so I am sure I can share it if needed with a quick email to
>>> check).
>>>
>>> - I am using fedora:
>>> %uname -a
>>> Linux ghetto5.projectlounge.com 2.6.23.1-42.fc8 #1 SMP Tue Oct 30
>>> 13:18:33 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
>>> %java -version
>>> java version "1.7.0"
>>> IcedTea Runtime Environment (build 1.7.0-b21)
>>> IcedTea 64-Bit Server VM (build 1.7.0-b21, mixed mode)
>>> - single core (will use shards but each machine just has one HDD so
>>> didn't see how cores 

Re: IndexOutOfBoundsException

2008-08-15 Thread Doug Steigerwald
We actually have this same exact issue on 5 of our cores.  We're just  
going to wipe the index and reindex soon, but it isn't actually  
causing any problems for us.  We can update the index just fine,  
there's just no merging going on.


Ours happened when I reloaded all of our cores for a schema change.  I  
don't do that any more ;).


Doug

On Aug 14, 2008, at 11:08 PM, Yonik Seeley wrote:


Since this looks like more of a lucene issue, I've replied in
[EMAIL PROTECTED]

-Yonik

On Thu, Aug 14, 2008 at 10:18 PM, Ian Connor <[EMAIL PROTECTED]>  
wrote:

I seem to be able to reproduce this very easily and the data is
medline (so I am sure I can share it if needed with a quick email to
check).

- I am using fedora:
%uname -a
Linux ghetto5.projectlounge.com 2.6.23.1-42.fc8 #1 SMP Tue Oct 30
13:18:33 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
%java -version
java version "1.7.0"
IcedTea Runtime Environment (build 1.7.0-b21)
IcedTea 64-Bit Server VM (build 1.7.0-b21, mixed mode)
- single core (will use shards but each machine just has one HDD so
didn't see how cores would help but I am new at this)
- next run I will keep the output to check for earlier errors
- very reproducible, and I can share code + data if that will help

On Thu, Aug 14, 2008 at 4:23 PM, Yonik Seeley <[EMAIL PROTECTED]>  
wrote:

Yikes... not good.  This shouldn't be due to anything you did wrong
Ian... it looks like a lucene bug.

Some questions:
- what platform are you running on, and what JVM?
- are you using multicore? (I fixed some index locking bugs  
recently)

- are there any exceptions in the log before this?
- how reproducible is this?

-Yonik

On Thu, Aug 14, 2008 at 2:47 PM, Ian Connor <[EMAIL PROTECTED]>  
wrote:

Hi,

I have rebuilt my index a few times (it should get up to about 4
Million but around 1 Million it starts to fall apart).

Exception in thread "Lucene Merge Thread #0"
org.apache.lucene.index.MergePolicy$MergeException:
java.lang.IndexOutOfBoundsException: Index: 105, Size: 33
   at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:323)
   at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:300)
Caused by: java.lang.IndexOutOfBoundsException: Index: 105, Size: 33
   at java.util.ArrayList.rangeCheck(ArrayList.java:572)
   at java.util.ArrayList.get(ArrayList.java:350)
   at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:260)
   at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:188)
   at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:670)
   at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:349)
   at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:134)
   at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3998)
   at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3650)
   at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:214)
   at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:269)



When this happens, the disk usage goes right up and the indexing
really starts to slow down. I am using a Solr build from about a week
ago - so my Lucene is at 2.4 according to the war files.

Has anyone seen this error before? Is it possible to tell which Array
is too large? Would it be an Array I am sending in or another internal
one?

Regards,
Ian Connor







--
Regards,

Ian Connor





Can I copy an index built on a Windows system to a Unix/Linux system?

2008-08-15 Thread johnwarde

Hi,

Can I copy an index built on a Windows system to a Unix/Linux system and
have it still work?

Reason for my question:
I have been working with Solr for the last month on a Windows system and I
have determined that we need to have a replication solution for our future
needs (volume of documents to be indexed and query loads).  

At this point in time it looks like, from my research, that Solr does not
currently provide a reliable/tested replication strategy on Windows.  

However, I would like to continue to use Solr on Windows for now until the
load on the single windows system becomes too great and requires us to
implement a replication strategy (one index master, many query slaves). 
Hopefully, by that time a reliable replication strategy on Windows may
present itself but if it doesn't ... 

Can I make a binary copy of the index files from a Windows system to a
Unix/Linux system and have it read by Solr on the Unix/Linux system?
Would there be any byte order problems? Or would I need to rebuild the
index from the original data?

Many thanks for your help!

John



-- 
View this message in context: 
http://www.nabble.com/Can-I-copy-an-index-built-on-a-Windows-system-to-a-Unix-Linux-system--tp18997540p18997540.html
Sent from the Solr - User mailing list archive at Nabble.com.