+1

----- Original Message -----
From: "Kevin Burton" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Tuesday, May 18, 2004 2:43 PM
Subject: Internal full content store within Lucene

> Per the discussion the other day about storing content external to
> Lucene, I think we have an opportunity to improve the Lucene core and
> bring a lot of functionality to future developers.
>
> Right now Lucene allows you to have a 'stored' field which keeps the
> content with a segment along with your inverted index.
>
> While this is flexible for small indexes, in production environments
> it falls down because index merges take FOREVER.
>
> A thread the other day suggested storing just a pointer to a file on
> the filesystem. This got me thinking about a long-term mechanism I
> wanted for our cluster, where we store content outside of the index
> in a high-performance flat-file database.
>
> The Lucene index would only maintain FILENO:OFFSET:LENGTH info within
> the index, and this would allow us to point to our flat-file database.
>
> This would allow Lucene index merges to be FAST, support native field
> storage, and allow the filesystem to optimize contiguous blocks for
> the flat content store. Everyone wins.
>
> This is what the Internet Archive uses:
>
> http://www.archive.org/web/researcher/ArcFileFormat.php
>
> I propose that Lucene support a new form of stored field that allows
> an external storage engine to keep the content in a flat text store.
>
> How much interest is there for this? I have to do this for work and
> will certainly put in the extra effort to make this a standard Lucene
> feature.
>
> I can come up with a requirements doc and a more formal proposal in
> another email if I get enough +1s...
>
> Kevin
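
For concreteness, here is a minimal sketch of the FILENO:OFFSET:LENGTH
pointer scheme proposed above. It is an illustration only, not Lucene
code and not the Internet Archive's ARC format. It is written in Python
for brevity. The class name FlatContentStore, the content-NNNNNN.dat
file naming, and the max_file_bytes rollover limit are all assumptions
made for the example. The idea is that the index would keep only the
short pointer string, so a merge never has to copy the content itself.

```python
import os
import tempfile


class FlatContentStore:
    """Hypothetical append-only content store split across numbered files.

    append() returns a "FILENO:OFFSET:LENGTH" pointer; read() resolves one.
    """

    def __init__(self, directory, max_file_bytes=64 * 1024 * 1024):
        self.directory = directory
        self.max_file_bytes = max_file_bytes  # assumed rollover threshold
        self.fileno = 0
        os.makedirs(directory, exist_ok=True)

    def _path(self, fileno):
        # Assumed naming convention: content-000000.dat, content-000001.dat, ...
        return os.path.join(self.directory, "content-%06d.dat" % fileno)

    def append(self, text):
        data = text.encode("utf-8")
        path = self._path(self.fileno)
        size = os.path.getsize(path) if os.path.exists(path) else 0
        # Start a new data file if this record would push the current
        # non-empty file past the limit.
        if size > 0 and size + len(data) > self.max_file_bytes:
            self.fileno += 1
            path = self._path(self.fileno)
            size = 0
        with open(path, "ab") as f:
            f.write(data)
        # OFFSET is the file size before the write; LENGTH is in bytes.
        return "%d:%d:%d" % (self.fileno, size, len(data))

    def read(self, pointer):
        fileno, offset, length = (int(part) for part in pointer.split(":"))
        with open(self._path(fileno), "rb") as f:
            f.seek(offset)
            return f.read(length).decode("utf-8")


if __name__ == "__main__":
    store = FlatContentStore(tempfile.mkdtemp())
    pointer = store.append("full document text goes here")
    print(pointer, "->", store.read(pointer))
```

The pointer is a few bytes regardless of document size, which is where
the merge-time savings would come from. A real implementation would
also need to handle deletes, compaction, and concurrent writers, none
of which this sketch attempts.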