On Tue, Apr 12, 2011 at 13:41, Gregor Heinrich <gre...@arbylon.net> wrote:
> Hi -- has there been any effort to create a numerical representation of
> Lucene indices. That is, to use the Lucene Directory backend as a large
> term-document matrix at index level. As this would require bijective mapping
> between terms (per-field, as customary in Lucene) and a numerical index
> (integer, monotonous from 0 to numTerms()-1), I guess this requires some
> special modifications to the Lucene core.

A Lucene index already provides a term <-> id mapping in some form.
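
To illustrate the kind of mapping being asked about, here is a minimal sketch in plain Python. It does not use any Lucene API. It assumes only that the terms of one field are kept in sorted order, so a term's position in that order can serve as its id. The toy corpus and the names `docs`, `id_to_term`, `term_to_id` and `matrix` are hypothetical.

```python
from collections import Counter

# Hypothetical toy corpus standing in for one field of an index.
docs = [
    "lucene stores terms in sorted order",
    "terms map to integer ids",
    "sorted terms give stable ids",
]

# Sorted vocabulary: the position in this list is the term id,
# running from 0 to num_terms - 1.
id_to_term = sorted({t for d in docs for t in d.split()})
term_to_id = {t: i for i, t in enumerate(id_to_term)}

# Sparse term-document matrix: (term_id, doc_id) -> term frequency.
matrix = {}
for doc_id, text in enumerate(docs):
    for term, freq in Counter(text.split()).items():
        matrix[(term_to_id[term], doc_id)] = freq
```

The mapping is bijective because `id_to_term[term_to_id[t]] == t` holds for every term. The ids stay valid only while the vocabulary does not change, so adding documents with new terms would shift them.
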

> Another interesting feature would be to use Lucene's Directory backend for
> storage of large dense matrices, for instance to data-mining tasks from
> within Lucene.

Lucene's Directory is a dumb abstraction for random-access named write-once byte streams. It doesn't add /any/ value over mmap.

> Any suggestions?

*troll mode on*
Use numpy/scipy? :)

--
Kirill Zakharenko/Кирилл Захаренко
E-Mail/Jabber: ear...@gmail.com
Phone: +7 (495) 683-567-4
ICQ: 104465785
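
As a rough illustration of the mmap remark above, the sketch below keeps a dense matrix of float64 values in a memory-mapped file using only the Python standard library. The shape, the file name and the `offset` helper are hypothetical, and the row-major layout is an assumption chosen for the example.

```python
import mmap
import os
import struct
import tempfile

ROWS, COLS = 4, 3            # hypothetical matrix shape
ITEM = struct.Struct("<d")   # one little-endian float64 per cell

path = os.path.join(tempfile.mkdtemp(), "matrix.bin")

# Pre-size the file so it can be mapped (an empty file cannot be).
with open(path, "wb") as f:
    f.truncate(ROWS * COLS * ITEM.size)


def offset(row, col):
    """Byte offset of cell (row, col) in row-major layout."""
    return (row * COLS + col) * ITEM.size


with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)                      # map the whole file
    ITEM.pack_into(mm, offset(2, 1), 3.5)              # random-access write
    value = ITEM.unpack_from(mm, offset(2, 1))[0]      # random-access read
    untouched = ITEM.unpack_from(mm, offset(0, 0))[0]  # still zero-filled
    mm.flush()
    mm.close()
```

With NumPy installed, `numpy.memmap` offers the same idea with array semantics, which is probably the more practical route for real matrices.
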