Can you get a stack trace so we can see where the thread is stuck?

Mike McCandless
http://blog.mikemccandless.com

On Tue, Jul 30, 2013 at 11:08 AM, Tom Burton-West <tburt...@umich.edu> wrote:
> Thanks Mike,
>
> Billion not Trillion Doh!
>
> Wasn't thinking it through when I titled the e-mail.... The total number of
> tokens shouldn't be unusual compared to our other indexes since whether we
> index pages or whole docs, the number of tokens shouldn't change
> significantly. The main difference between this and our other indexes is
> the number of documents. Our regular indexes have maybe 800,000 docs
> whereas these have about 82 million.
>
> I'm not sure what is going on but I'm guessing that the CheckIndex program
> has been caught in some GC loop for the last few days. I didn't start it
> up with any GC logging or hooks to attach jconsole. I'm going to kill it
> and maybe try again and give it more memory and maybe turn on GC logging.
>
> Tom
>
>
> On Tue, Jul 30, 2013 at 8:41 AM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> I think that's ~ 110 billion, not trillion, tokens :)
>>
>> Are you certain you don't have any term vectors?
>>
>> Even if your index has no term vectors, CheckIndex goes through all
>> docIDs trying to load them, but that ought to be very fast, and then
>> you should see "test: doc values..." after that.
>>
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Mon, Jul 29, 2013 at 4:30 PM, Tom Burton-West <tburt...@umich.edu>
>> wrote:
>> > We have very large indexes, almost a terabyte for a single index, and
>> > normally it takes overnight to run a CheckIndex. I started a CheckIndex
>> > on Friday and today (Monday) it seems to be stuck testing vectors although
>> > we haven't got vectors turned on. (See below)
>> > The output file was last written Jul 27 02:28,
>> > Note that in this 750 GB segment we have about 83 million docs with about
>> > 2.4 billion unique terms and about 110 trillion tokens.
>> >
>> > Have we hit a new CheckIndex limit?
>> >
>> >
>> > Tom
>> >
>> > -----------------------
>> >
>> >
>> > Opening index @ /htsolr/lss-dev/solrs/4.2/3/core/data/index
>> >
>> > Segments file=segments_e numSegments=2 version=4.2.1 format=
>> > userData={commitTimeMSec=1374712392103}
>> > 1 of 2: name=_bch docCount=82946896
>> > codec=Lucene42
>> > compound=false
>> > numFiles=12
>> > size (MB)=752,005.689
>> > diagnostics = {timestamp=1374657630506, os=Linux,
>> > os.version=2.6.18-348.12.1.el5, mergeFactor=16, source=merge,
>> > lucene.version=4.2.1 1461071 - mark - 2013-03-26 08:23:34, os.arch=amd64,
>> > mergeMaxNumSegments=2, java.version=1.6.0_16, java.vendor=Sun Microsystems
>> > Inc.}
>> > no deletions
>> > test: open reader.........OK
>> > test: fields..............OK [12 fields]
>> > test: field norms.........OK [3 fields]
>> > test: terms, freq, prox...OK [2442919802 terms; 73922320413 terms/docs
>> > pairs; 109976572432 tokens]
>> > test: stored fields.......OK [960417844 total field count; avg 11.579
>> > fields per doc]
>> > test: term vectors........
>> > ~
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
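
On the stack trace requested at the top of this thread: for a CheckIndex process that is already running, the usual route is the JDK's jstack tool against the process ID, or sending the JVM a SIGQUIT (kill -3), which makes HotSpot print a thread dump. If the check is instead launched from your own Java wrapper, something like the sketch below could report thread stacks and cumulative GC activity from inside the JVM, which may help tell a hung thread apart from the suspected GC loop. The class name StuckThreadReport, its method names, and the watchdog thread name are hypothetical and not part of Lucene; only standard java.lang and java.lang.management calls are used.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.Map;

/** Hypothetical helper (not part of Lucene): reports thread stacks and GC totals for this JVM. */
final class StuckThreadReport {

    /** Formats the name, state, and stack frames of every live thread in this JVM. */
    static String threadDump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    /** Summarizes cumulative collection counts and times for each garbage collector. */
    static String gcSummary() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": collections=").append(gc.getCollectionCount())
              .append(" timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }

    /** Starts a daemon thread that prints both reports to stderr every intervalMs milliseconds. */
    static Thread startWatchdog(final long intervalMs) {
        Thread t = new Thread(new Runnable() {
            public void run() {
                try {
                    while (true) {
                        Thread.sleep(intervalMs);
                        System.err.println(gcSummary());
                        System.err.println(threadDump());
                    }
                } catch (InterruptedException ie) {
                    // Interrupted: stop reporting and let the thread exit.
                }
            }
        }, "stuck-thread-watchdog");
        t.setDaemon(true); // do not keep the JVM alive once the real work finishes
        t.start();
        return t;
    }

    public static void main(String[] args) {
        // One-shot report; a wrapper would call startWatchdog(...) before starting the long-running check.
        System.out.println(gcSummary());
        System.out.println(threadDump());
    }
}
```

If the GC collection counts keep climbing between reports while the main thread's frames stay the same, that would be consistent with the GC-loop guess; if the counts are flat, the stack frames show where the thread is actually waiting.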