Re: corrupted index Lucene 4.4

Michael McCandless Wed, 23 Oct 2013 13:03:31 -0700

Hi Chris,

Sorry, I don't know much about Solr cloud; maybe ask on the solr-user
list, and give details about what went wrong?


Mike McCandless

http://blog.mikemccandless.com


On Wed, Oct 23, 2013 at 11:25 AM, Chris <christu...@gmail.com> wrote:
> Wow !!! Thanks a lot for the helpful tips. I will implement this in the
> next two days & report back with my indexing speed... I have one more
> question...
>
> I tried committing to Solr cloud, but then something was not correct,
> as it would not index after a few documents...
>
> Also, there seems to be something wrong in ZooKeeper: when we try to add
> documents using SolrJ, it works fine as long as the insert load is light,
> but once we start doing many inserts, it throws a lot of errors...
>
> I am doing something like -
>
> CloudSolrServer solrCoreCloud = new CloudSolrServer(cloudURL);
> solrCoreCloud.setDefaultCollection("Image");
> UpdateResponse up = solrCoreCloud.addBean(resultItem);
> UpdateResponse upr = solrCoreCloud.commit();
>
> Since I have to reindex, I am wondering whether I need to use SolrCloud
> at all.
>
> On Wed, Oct 23, 2013 at 8:41 PM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> Indexing 100M web pages really should not take months; if you fix
>> committing after every row that should make things much faster.
>>
>> Use multiple index threads, set a highish RAM buffer (~512 MB), use a
>> local disk not a remote mounted fileserver, ideally an SSD, etc.  See
>> http://wiki.apache.org/lucene-java/ImproveIndexingSpeed for more
>> ideas.
>>
>> Only commit periodically, when enough indexing has happened that you
>> would be upset to lose that work since the last commit (e.g. maybe
>> every few hours or something).
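
A minimal Java sketch of the advice above: several indexing threads share one writer, and commits are issued only every so often rather than after each row. DocSink and CountingSink are hypothetical stand-ins invented for illustration, not Lucene or SolrJ types; with SolrJ the add/commit calls would presumably map onto addBean() and commit(). The count-based commit interval is also an assumption made to keep the example small; a time-based interval (every few hours) would follow the same shape.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class BatchedIndexingSketch {

    // Hypothetical stand-in for an index writer (not a Lucene or SolrJ type).
    interface DocSink {
        void add(String doc) throws Exception;  // buffer one document (cheap)
        void commit() throws Exception;         // make buffered work durable (expensive)
    }

    // Counts calls so the sketch can run without a real index behind it.
    static class CountingSink implements DocSink {
        final AtomicInteger adds = new AtomicInteger();
        final AtomicInteger commits = new AtomicInteger();
        public void add(String doc) { adds.incrementAndGet(); }
        public void commit() { commits.incrementAndGet(); }
    }

    // Adds every document using `threads` workers. Commits once per
    // `commitEvery` adds, plus one final commit if any adds are left over.
    static void indexAll(List<String> docs, DocSink sink, int threads, int commitEvery)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger added = new AtomicInteger();
        List<Future<Void>> pending = new ArrayList<>();
        try {
            for (String doc : docs) {
                Callable<Void> task = () -> {
                    sink.add(doc);
                    // incrementAndGet() yields each count exactly once, so
                    // exactly one worker triggers each periodic commit.
                    if (added.incrementAndGet() % commitEvery == 0) {
                        synchronized (sink) {
                            sink.commit();
                        }
                    }
                    return null;
                };
                pending.add(pool.submit(task));
            }
            // Wait for all workers and surface any indexing failure.
            for (Future<Void> f : pending) {
                f.get();
            }
        } finally {
            pool.shutdown();
        }
        // Final commit so the tail of the batch is not lost.
        if (added.get() % commitEvery != 0) {
            sink.commit();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            docs.add("doc-" + i);
        }
        CountingSink sink = new CountingSink();
        indexAll(docs, sink, 4, 4);
        // prints "10 adds, 3 commits"
        System.out.println(sink.adds.get() + " adds, " + sink.commits.get() + " commits");
    }
}
```

With one commit per row, the same run would issue 10 commits instead of 3; at 100M documents that difference is what dominates indexing time.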
>>
>> Also, be sure your IO system is "healthy" / does not disregard fsync,
>> and if the index is really important, back it up to a different
>> storage device every so often.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Wed, Oct 23, 2013 at 10:58 AM, Chris <christu...@gmail.com> wrote:
>> > Actually, it contains about 100 million webpages and was built out of a
>> > web index for NLP processing :(
>> >
>> > I did the indexing & crawling on one small server... and researching
>> > and getting it all to this stage took me this much time... and now my
>> > index is unusable :(
>> >
>> >
>> > On Wed, Oct 23, 2013 at 8:16 PM, Michael McCandless <
>> > luc...@mikemccandless.com> wrote:
>> >
>> >> On Wed, Oct 23, 2013 at 10:33 AM, Chris <christu...@gmail.com> wrote:
>> >> > I am not exactly sure if the commit() was run, as I am inserting each
>> >> > row & doing a commit right away. My Solr will not load the index....
>> >>
>> >> I'm confused: if you are doing a commit right away after every row
>> >> (which is REALLY bad practice: that's incredibly slow and
>> >> unnecessary), then surely you've had many commits succeed?
>> >>
>> >> > Is there any way that I can fix this? I have a huge index & will lose
>> >> > months if I try to reindex :( I didn't know Lucene was not stable; I
>> >> > thought it was.
>> >>
>> >> Sorry, but no.
>> >>
>> >> In theory ... a tool could be created that would try to "reconstitute"
>> >> a segments file by looking at all the various files that exist, but
>> >> this is not in general easy (and may not be possible): the segments
>> >> file has very important metadata, like which codec was used to write
>> >> each segment, etc.
>> >>
>> >> Did it really take months to do this indexing?  That is really way too
>> >> long; how many documents?
>> >>
>> >> Lucene (Solr) is stable, i.e. a successful commit should ensure your
>> >> index survives power loss.  If somehow that was not the case here,
>> >> then we need to figure out why and fix it ...
>> >>
>> >> Mike McCandless
>> >>
>> >> http://blog.mikemccandless.com
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
>> >>
>> >>
>>

