Re: IOException: read past EOF during optimize phase

2008-01-16 Thread Kevin Osborn
I did see that bug, which made me suspect Lucene. In my case, I tracked the problem 
down to my own application. I was using Java's FileChannel.transferTo method to copy 
my index from one location to another. One of the files is bigger than 2^31-1 bytes, 
so it was corrupted during the copy because I was doing only a single pass; 
transferTo is not guaranteed to transfer the full requested count in one call. I now 
loop the copy until the entire file is transferred, and everything works fine.
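For anyone hitting the same thing, a minimal sketch of such a copy loop (the class and method names here are illustrative, not from our actual code). FileChannel.transferTo may transfer fewer bytes than requested, so the returned count has to be fed back into the loop:

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

public class CopyFile {

    // Copy src to dst, looping until every byte has been transferred.
    // A single transferTo call may copy fewer bytes than requested
    // (e.g. it can stop short on very large files), so we must check
    // the returned count rather than assume one pass copied everything.
    public static void copy(String src, String dst) throws IOException {
        try (FileInputStream in = new FileInputStream(src);
             FileOutputStream out = new FileOutputStream(dst)) {
            FileChannel inChannel = in.getChannel();
            FileChannel outChannel = out.getChannel();
            long size = inChannel.size();
            long position = 0;
            while (position < size) {
                // transferTo returns the number of bytes actually copied
                position += inChannel.transferTo(position, size - position, outChannel);
            }
        }
    }
}
```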

DOH!


Re: IOException: read past EOF during optimize phase

2008-01-16 Thread Yonik Seeley
This may be a Lucene bug... IIRC, I saw at least one other Lucene user
with a similar stack trace.  I think the latest Lucene version (2.3-dev)
should fix it, if that's the case.

-Yonik



Re: IOException: read past EOF during optimize phase

2008-01-16 Thread Kevin Osborn
Our basic setup is master/slave. We just want to make sure that we are not 
syncing against an index that is in the middle of a large rebuild. But, I think 
these issues are still separate from what I am experiencing.

I also tried this same scenario in a different development environment. No 
problems there.


Re: IOException: read past EOF during optimize phase

2008-01-16 Thread Otis Gospodnetic
Kevin,

Perhaps you want to look at how Solr can be used in a master-slave setup.  This 
will separate your indexing from searching.  Don't have the URL, but it's on 
zee Wiki.

Otis 

--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


Re: IOException: read past EOF during optimize phase

2008-01-16 Thread Kevin Osborn
It is more of a file structure thing for our application. We build in one place 
and do our index syncing in a different place. I doubt it is relevant to this 
issue, but figured I would include this information anyway.


Re: IOException: read past EOF during optimize phase

2008-01-16 Thread Otis Gospodnetic
Kevin,

Don't have the answer to the EOF, but I'm wondering why the index is moving.  You 
don't need to do that as far as Solr is concerned.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


IOException: read past EOF during optimize phase

2008-01-16 Thread Kevin Osborn
I am using the embedded Solr API for my indexing process. I created a brand new 
index with my application without any problem. I then ran my indexer in 
incremental mode. This process copies the working index to a temporary Solr 
location, adds/updates any records, optimizes the index, and then copies it 
back to the working location. No instances of Solr are currently reading this 
index. Also, I commit after every 10 rows. The schema.xml and 
solrconfig.xml files have not changed.

Here is my function call.
protected void optimizeProducts() throws IOException {
    UpdateHandler updateHandler = m_SolrCore.getUpdateHandler();
    CommitUpdateCommand commitCmd = new CommitUpdateCommand(true);
    commitCmd.optimize = true;

    updateHandler.commit(commitCmd);

    log.info("Optimized index");
}

So, during the optimize phase, I get the following stack trace:
java.io.IOException: read past EOF
    at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:89)
    at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:34)
    at org.apache.lucene.store.IndexInput.readChars(IndexInput.java:107)
    at org.apache.lucene.store.IndexInput.readString(IndexInput.java:93)
    at org.apache.lucene.index.FieldsReader.addFieldForMerge(FieldsReader.java:211)
    at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:119)
    at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:323)
    at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:206)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:96)
    at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:1835)
    at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1195)
    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:508)
    at ...

There are no exceptions or anything else that appears incorrect during 
the adds or commits. Afterwards, the index files are still unoptimized.

I know there is not a whole lot to go on here. Anything in particular that I 
should look at?