Hi Chris,

Sorry, I don't know much about SolrCloud; maybe ask on the solr-user list, and give details about what went wrong?

Mike McCandless

http://blog.mikemccandless.com

On Wed, Oct 23, 2013 at 11:25 AM, Chris <christu...@gmail.com> wrote:
> Wow! Thanks a lot for the helpful tips. I will implement this in the
> next two days & report back with my indexing speed. I have one more
> question...
>
> I tried committing to SolrCloud, but then something was not correct,
> as it would not index after a few documents...
>
> Also, there seems to be something wrong in ZooKeeper: when we try to add
> documents using SolrJ, it works fine as long as the insert load is light,
> but once we start doing many inserts, it throws a lot of errors...
>
> I am doing something like this:
>
> CloudSolrServer solrCoreCloud = new CloudSolrServer(cloudURL);
> solrCoreCloud.setDefaultCollection("Image");
> UpdateResponse up = solrCoreCloud.addBean(resultItem);
> UpdateResponse upr = solrCoreCloud.commit();
>
> Since I have to reindex, I am wondering whether I need to use SolrCloud
> or not.
>
> On Wed, Oct 23, 2013 at 8:41 PM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> Indexing 100M web pages really should not take months; if you fix
>> committing after every row that should make things much faster.
>>
>> Use multiple index threads, set a highish RAM buffer (~512 MB), use a
>> local disk not a remote mounted fileserver, ideally an SSD, etc. See
>> http://wiki.apache.org/lucene-java/ImproveIndexingSpeed for more
>> ideas.
>>
>> Only commit periodically, when enough indexing has happened that you
>> would be upset to lose that work since the last commit (e.g. maybe
>> every few hours or something).
>>
>> Also, be sure your IO system is "healthy" / does not disregard fsync,
>> and if the index is really important, back it up to a different
>> storage device every so often.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Wed, Oct 23, 2013 at 10:58 AM, Chris <christu...@gmail.com> wrote:
>> > Actually, it contains about 100 million web pages and was built out of
>> > a web index for NLP processing :(
>> >
>> > I did the indexing & crawling on one small server, and researching
>> > and getting it all to this stage took me this much time... and now my
>> > index is unusable :(
>> >
>> > On Wed, Oct 23, 2013 at 8:16 PM, Michael McCandless <
>> > luc...@mikemccandless.com> wrote:
>> >
>> >> On Wed, Oct 23, 2013 at 10:33 AM, Chris <christu...@gmail.com> wrote:
>> >> > I am not exactly sure if the commit() was run, as I am inserting
>> >> > each row & doing a commit right away. My Solr will not load the
>> >> > index....
>> >>
>> >> I'm confused: if you are doing a commit right away after every row
>> >> (which is REALLY bad practice: that's incredibly slow and
>> >> unnecessary), then surely you've had many commits succeed?
>> >>
>> >> > Is there any way that I can fix this? I have a huge index & will
>> >> > lose months if I try to reindex :( I didn't know Lucene was not
>> >> > stable; I thought it was.
>> >>
>> >> Sorry, but no.
>> >>
>> >> In theory ... a tool could be created that would try to "reconstitute"
>> >> a segments file by looking at all the various files that exist, but
>> >> this is not in general easy (and may not be possible): the segments
>> >> file has very important metadata, like which codec was used to write
>> >> each segment, etc.
>> >>
>> >> Did it really take months to do this indexing? That is really way too
>> >> long; how many documents?
>> >>
>> >> Lucene (Solr) is stable, i.e. a successful commit should ensure your
>> >> index survives power loss. If somehow that was not the case here,
>> >> then we need to figure out why and fix it ...
>> >>
>> >> Mike McCandless
>> >>
>> >> http://blog.mikemccandless.com
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
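
For readers of the archive, the advice in this thread (stop committing after every row, commit only periodically) could be sketched roughly as below. This is only an illustration under stated assumptions: PeriodicCommit, Sink, indexAll, batchSize and commitEvery are hypothetical names, and Sink merely stands in for whatever client does the actual indexing; it is not a SolrJ or Lucene type. The thread suggests committing by elapsed time or amount of work at risk; a document count is used here only because it is simple to show.

```java
import java.util.ArrayList;
import java.util.List;

public class PeriodicCommit {

    /** Hypothetical stand-in for an indexing client; not a SolrJ or Lucene type. */
    interface Sink<T> {
        void add(List<T> batch); // send a batch of documents without committing
        void commit();           // make everything added so far durable
    }

    /**
     * Sends docs to the sink in batches and commits only once at least
     * commitEvery documents have been added since the last commit, plus one
     * final commit for any remainder. Returns the number of commits issued.
     */
    static <T> int indexAll(Iterable<T> docs, Sink<T> sink, int batchSize, int commitEvery) {
        if (batchSize <= 0 || commitEvery <= 0) {
            throw new IllegalArgumentException("batchSize and commitEvery must be positive");
        }
        List<T> buffer = new ArrayList<>(batchSize);
        int sinceCommit = 0; // documents added since the last commit
        int commits = 0;
        for (T doc : docs) {
            buffer.add(doc);
            if (buffer.size() == batchSize) {
                sink.add(new ArrayList<>(buffer)); // copy, so clearing the buffer is safe
                sinceCommit += buffer.size();
                buffer.clear();
                if (sinceCommit >= commitEvery) {
                    sink.commit();
                    commits++;
                    sinceCommit = 0;
                }
            }
        }
        if (!buffer.isEmpty()) { // flush the last partial batch
            sink.add(new ArrayList<>(buffer));
            sinceCommit += buffer.size();
        }
        if (sinceCommit > 0) { // final commit for anything still uncommitted
            sink.commit();
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        final int[] added = {0};
        Sink<String> counting = new Sink<String>() {
            public void add(List<String> batch) { added[0] += batch.size(); }
            public void commit() { /* a real client would flush to stable storage here */ }
        };
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            docs.add("doc-" + i);
        }
        int commits = indexAll(docs, counting, 3, 6);
        System.out.println(added[0] + " docs, " + commits + " commits");
    }
}
```

An adapter around a real client would implement Sink and deal with that client's checked exceptions itself; the batch and commit thresholds here are placeholders and would need tuning against how much work one is willing to lose.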