Okay - my fault - I'm not really talking in terms of Lucene. Though even
there I consider it possible - you'd just have to, like, rewrite it :) And
it would likely be pretty slow.
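
For concreteness, here's a rough sketch of the per-document work Jake
describes below - computing a document's tf-idf norm at index time from an
externally maintained idf table, stored as idf^2. This is not Lucene code;
the class and map names are made up for illustration:

import java.util.Map;

public class IndexTimeNorm {

    /**
     * Hypothetical sketch, not a Lucene API: compute the L2 norm of a
     * document's tf-idf vector at index time.
     *
     * termFreqs          term -> raw term frequency within this document
     * externalIdfSquared term -> idf^2, maintained outside the index (the
     *                    "externalized" notion of idf Jake refers to)
     */
    public static double tfIdfNorm(Map<String, Integer> termFreqs,
                                   Map<String, Double> externalIdfSquared) {
        double sumOfSquares = 0.0;
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            Double idfSq = externalIdfSquared.get(e.getKey());
            if (idfSq == null) {
                continue; // no idf estimate for an unseen term - skip it here
            }
            double tf = e.getValue();
            // (tf * idf)^2 == tf^2 * idf^2, so keeping idf^2 around saves
            // squaring the idf for every term
            sumOfSquares += tf * tf * idfSq;
        }
        return Math.sqrt(sumOfSquares);
    }
}

For a 1000-term document that's two multiplies and an add per term - the few
thousand floating point operations Jake mentions. The catch, as he says, is
that the idf table has to exist before the first documents are indexed.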


Jake Mannix wrote:
>
>
> On Fri, Nov 20, 2009 at 4:20 PM, Mark Miller <markrmil...@gmail.com> wrote:
>
>     Mark Miller wrote:
>     >
>     > it looks expensive to me to do both
>     > of them properly.
>     Okay - I guess that somewhat makes sense - you can calculate the
>     magnitude of the doc vectors at index time. How is that impossible
>     with incremental indexing though? Isn't it just expensive? It seems
>     somewhat expensive in the non-incremental case as well - you're just
>     eating it at index time rather than query time - though the same
>     could be done for incremental? The information is all there in
>     either case.
>
>
> The expense, if you have the idfs of all terms in the vocabulary (keep
> them in the form of idf^2 for efficiency at index time), is pretty
> trivial, isn't it?  If you have a document with 1000 terms, it's maybe
> 3000 floating point operations, all CPU, in memory, no disk seeks.
>
> What it does require is knowing, even when you have no documents yet
> on disk, what the idfs of the terms in the first few documents are.
> Where would you get this, in Lucene, if you haven't externalized some
> notion of idf?
>
>   -jake


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
