Thanks Jake.

I have around 75 TB of data to index. Even with sharding, the individual 
index files might still be quite large, which is why I wanted to find out 
whether there is any such limit, and of course whether index files that 
large can be searched at all.

From your response it appears that 1 TB in a single index file is too much. 
Is there any guideline on what kind of hardware is required to handle index 
files of a given size (10GB, 50GB, 100GB, 500GB, etc.) with sensible search 
times?

--Hrishi

-----Original Message-----
From: Jake Mannix [mailto:jake.man...@gmail.com]
Sent: Friday, October 23, 2009 11:09 AM
To: java-user@lucene.apache.org
Subject: Re: Maximum index file size

On Thu, Oct 22, 2009 at 10:29 PM, Hrishikesh Agashe <
hrishikesh_aga...@persistent.co.in> wrote:

> Can I create an index file with very large size, like 1 TB or so? Is there
> any limit on how large index file one can create? Also, will I be able to
> search on this 1 TB index file at all?
>

Leaving aside the question of hardware or JVM limits on monstrous files,
this question (can you search this file) is easier: if you've got, say, ten
billion documents in one index, and you have a query which is going to hit
maybe even just 0.1% of the documents, you'll need to do scoring of 10
million hits in the course of that query.  To do this in under a second
means you only have 100 nanoseconds to look at each document.  If your query
hits 1% of your documents, you're down to 10 ns per document.  I've never
tried searching a 1TB index, but I'd say that's pushing it.
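
The per-document budget above is plain division: the latency target divided by the number of hits. A minimal sketch of that arithmetic follows, assuming a one-second target and the ten-billion-document index from the example. The class and method names (ScoringBudget, nanosPerHit) are made up for illustration and are not part of Lucene.

```java
public class ScoringBudget {
    /**
     * Nanoseconds available per matching document if the whole query
     * must finish within latencySeconds. Hypothetical helper, not a Lucene API.
     */
    public static double nanosPerHit(long totalDocs, double hitFraction, double latencySeconds) {
        double hits = totalDocs * hitFraction;   // documents that must be scored
        return latencySeconds * 1e9 / hits;      // time budget per scored document
    }

    public static void main(String[] args) {
        long totalDocs = 10_000_000_000L;        // ten billion documents, as in the example
        for (double fraction : new double[] {0.001, 0.01}) {
            System.out.printf("hit fraction %.4f: %.0f ns per document%n",
                    fraction, nanosPerHit(totalDocs, fraction, 1.0));
        }
    }
}
```

The two fractions correspond to the 0.1% and 1% cases, which work out to 100 ns and 10 ns per document respectively.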

Is there a reason you can't shard your index, and instead put maybe 20
shards of 50GB (or better - 100 shards of 10GB) each on a variety of
machines, and just merge results?
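
The "merge results" step can be sketched roughly as below: each shard returns its own top hits and a coordinator keeps the best k overall. This is plain Java, not Lucene code. The Hit class and mergeTopK method are hypothetical names for illustration, and the sketch assumes scores from different shards are directly comparable, which in practice depends on how term statistics are handled across shards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class ShardMerge {
    /** A scored hit as a shard might report it; hypothetical type, not a Lucene class. */
    public static final class Hit {
        public final int shard;
        public final int doc;
        public final float score;

        public Hit(int shard, int doc, float score) {
            this.shard = shard;
            this.doc = doc;
            this.score = score;
        }
    }

    /** Merges per-shard hit lists into one global top-k list, highest score first. */
    public static List<Hit> mergeTopK(List<List<Hit>> perShard, int k) {
        Comparator<Hit> byScore = Comparator.comparingDouble((Hit h) -> h.score);
        // Min-heap on score: the weakest of the current top k sits at the head.
        PriorityQueue<Hit> heap = new PriorityQueue<>(byScore);
        for (List<Hit> hits : perShard) {
            for (Hit h : hits) {
                heap.offer(h);
                if (heap.size() > k) {
                    heap.poll();                 // evict the lowest-scoring hit
                }
            }
        }
        List<Hit> merged = new ArrayList<>(heap);
        merged.sort(byScore.reversed());         // best hit first
        return merged;
    }

    public static void main(String[] args) {
        List<List<Hit>> shards = Arrays.asList(
                Arrays.asList(new Hit(0, 1, 0.9f), new Hit(0, 2, 0.4f)),
                Arrays.asList(new Hit(1, 7, 0.7f), new Hit(1, 8, 0.1f)));
        for (Hit h : mergeTopK(shards, 3)) {
            System.out.println("shard " + h.shard + " doc " + h.doc + " score " + h.score);
        }
    }
}
```

Because each shard only needs to return its own top k, the coordinator handles at most k hits per shard regardless of how large the shards are.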

  -jake
