Thanks Jake. I have around 75 TB data to be indexed. So even though I do the sharding, individual index file size might still be pretty high. And that's why I wanted to find out whether there is any limit as such. And obviously whether such a huge index files can be searched at all.
>From your response it appears that 1 TB of 1 index file is too much. Is there >any guideline to what kind of hardware will be required to handle (10GB, 50GB, >100GB, 500GB etc) size of index file (with sensible search times) --Hrishi -----Original Message----- From: Jake Mannix [mailto:jake.man...@gmail.com] Sent: Friday, October 23, 2009 11:09 AM To: java-user@lucene.apache.org Subject: Re: Maximum index file size On Thu, Oct 22, 2009 at 10:29 PM, Hrishikesh Agashe < hrishikesh_aga...@persistent.co.in> wrote: > Can I create an index file with very large size, like 1 TB or so? Is there > any limit on how large index file one can create? Also, will I be able to > search on this 1 TB index file at all? > Leaving aside the question of hardware or JVM limits on monstrous files, this question (can you search this file) is easier: if you've got say, a ten billion documents in one index, and you have a query which is going to hit maybe even just 0.1% of the documents, you'll need to do scoring of 10 million hits in the course of that query. To do this in under a second means you only have 100 nanoseconds to look at each document. If your query hits 1% of your documents, you're down to 10 ns per document. I've never tried searching a 1TB index, but I'd say that's pushing it. Is there a reason you can't shard your index, and instead put maybe 20 shards of 50GB (or better - 100 shards of 10GB) each on a variety of machines, and just merge results? -jake DISCLAIMER ========== This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails. --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org