Hehe... That was what I advocated from the beginning. There is a
cost associated with this, though, i.e. any change in CrawlDatum
size has a significant impact on most operations' performance.
Sure. If you ever had a look at the 0.7 meta data patch, I implemented
things there so that the meta data and the key were written to the file
only when there actually was meta data.
So no meta data means the same file size as before. In general we
need to accept that meta data inflates the files, and with them the
processing and IO load. People doing a complete web index can work
without meta data, and people who need this functionality need to
accept that nothing is for free.
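
To make the "write meta data only when present" idea concrete, here is a minimal sketch in plain Java. It is not the code from the 0.7 patch: the class name, the record layout (status byte plus fetch time) and the use of the top bit of the status byte as a "has meta data" flag are all assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a CrawlDatum-like record; not Nutch code.
final class OptionalMetadataRecord {
    // Assumption: the top bit of the status byte is unused and can flag meta data.
    static final int HAS_META = 0x80;

    // Serializes one record; the meta data section is emitted only if meta is non-empty.
    static byte[] write(byte status, long fetchTime, Map<String, String> meta) {
        boolean hasMeta = meta != null && !meta.isEmpty();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(hasMeta ? (status | HAS_META) : status);
            out.writeLong(fetchTime);
            if (hasMeta) {
                // Entry count, then key/value pairs. Skipped entirely when empty,
                // so a record without meta data keeps its original size.
                out.writeInt(meta.size());
                for (Map.Entry<String, String> entry : meta.entrySet()) {
                    out.writeUTF(entry.getKey());
                    out.writeUTF(entry.getValue());
                }
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        Map<String, String> meta = new TreeMap<>();
        meta.put("lang", "en");
        System.out.println("without meta data: " + write((byte) 2, 1000L, null).length + " bytes");
        System.out.println("with meta data:    " + write((byte) 2, 1000L, meta).length + " bytes");
    }
}
```

With this layout a record without meta data stays at 9 bytes, while the single "lang" entry grows it to 23 bytes, so only the people who use the feature pay for it.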
All solutions I have seen so far load this kind of meta data at
indexing time from a third-party data source (a database) and add
it to the index. This works but is very slow.
Well, maybe it makes sense to store the CrawlDatum and its
"metadata" separately in two MapFiles, so that you can perform some
operations using only the lightweight CrawlDatum, and for other
operations you will need to load the properties too...
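
A rough sketch of that split, using two in-memory sorted maps keyed by URL as stand-ins for the two MapFiles. The class and method names here are hypothetical and nothing below is the Nutch or MapFile API; it only shows which operations would touch which store.

```java
import java.util.Collections;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical illustration: two URL-keyed sorted maps play the role of two MapFiles.
final class SplitCrawlStore {
    // Lightweight, fixed-size part of the record.
    static final class Datum {
        final byte status;
        final long fetchTime;
        Datum(byte status, long fetchTime) {
            this.status = status;
            this.fetchTime = fetchTime;
        }
    }

    private final SortedMap<String, Datum> datums = new TreeMap<>();
    private final SortedMap<String, Map<String, String>> properties = new TreeMap<>();

    // Every URL gets a datum; a properties entry is stored only if there is something to store.
    void put(String url, byte status, long fetchTime, Map<String, String> props) {
        datums.put(url, new Datum(status, fetchTime));
        if (props != null && !props.isEmpty()) {
            properties.put(url, new TreeMap<>(props));
        }
    }

    // Lightweight operation: scans only the datum store, never the properties.
    int countWithStatus(byte status) {
        int count = 0;
        for (Datum d : datums.values()) {
            if (d.status == status) {
                count++;
            }
        }
        return count;
    }

    // Heavier operation: looks up the second store only when properties are needed.
    Map<String, String> propertiesFor(String url) {
        Map<String, String> props = properties.get(url);
        return props == null ? Collections.<String, String>emptyMap() : props;
    }

    // Number of URLs that actually carry properties.
    int propertyEntries() {
        return properties.size();
    }
}
```

Because both stores share the same sorted key, the properties could in principle be merged back in with a sequential pass when an operation needs them, while status-only operations never read them at all.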
Yes, I like this idea, and I remember that Doug had suggested such a
solution as well.
However, first I will focus on the NutchConf issue.
Stefan