On Mar 10, 2007, at 3:27 PM, Michael Busch wrote:
> - Introduce index-level metadata. Preferably in XML format, so it will be human readable. Later on, we can store information about the index format in this file, like the codecs that are used to store the data.
To provoke thought about what index-level metadata might go in this file, the contents of a KS "segments_2.yaml" file, captured immediately after indexing an HTML presentation of the US Constitution, are below.
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

slothbear:~/projects/ks/perl marvin$ cat uscon_invindex/segments_2.yaml
ks_version: 0.20_02
fields:
  title: 'KinoSearch::Schema::FieldSpec'
  url: 'USConSchema::UnIndexedField'
  content: 'KinoSearch::Schema::FieldSpec'
format: 1
generation: 2
seg_counter: 1
segments:
  _1:
    term_list_index:
      skip_interval: 16
      format: 1
      index_interval: 128
      size: 8
      counts:
        title: 1
        content: 8
    posting_list:
      format: 1
    compound_file:
      format: 1
      sub_files:
        _1.tlx2:
          offset: 138575
          length: 93
        _1.p0:
          offset: 138134
          length: 441
        _1.tvx:
          offset: 137718
          length: 416
        _1.tv:
          offset: 73487
          length: 64231
        _1.tl0:
          offset: 73259
          length: 228
        _1.p2:
          offset: 56393
          length: 16866
        _1.ds:
          offset: 7015
          length: 49378
        _1.tl2:
          offset: 421
          length: 6594
        _1.dsx:
          offset: 5
          length: 416
        _1.tlx0:
          offset: 0
          length: 5
    term_vectors:
      format: 1
    term_list:
      skip_interval: 16
      format: 1
      index_interval: 128
      size: 923
      counts:
        title: 41
        content: 923
    doc_storage:
      format: 1
    seg_info:
      seg_name: _1
      doc_count: 52
      field_names:
        - title
        - url
        - content
version: 1173732193033
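
One thing a reader can check from the compound_file entries in the listing is whether the sub-files tile the compound file without gaps or overlaps. The Python sketch below does that check on the offset/length pairs copied from the listing. The helper name check_layout is hypothetical, and the idea that the entries should be contiguous is only an observation about this one file, not a documented KinoSearch guarantee.

```python
# Offset/length pairs copied from the compound_file -> sub_files entries
# in the segments_2.yaml listing.
SUB_FILES = {
    "_1.tlx2": (138575, 93),
    "_1.p0": (138134, 441),
    "_1.tvx": (137718, 416),
    "_1.tv": (73487, 64231),
    "_1.tl0": (73259, 228),
    "_1.p2": (56393, 16866),
    "_1.ds": (7015, 49378),
    "_1.tl2": (421, 6594),
    "_1.dsx": (5, 416),
    "_1.tlx0": (0, 5),
}


def check_layout(sub_files):
    """Hypothetical helper: return the total number of bytes covered.

    Raises ValueError if, walking the entries in offset order, any entry
    does not start exactly where the previous one ended.
    """
    expected = 0
    by_offset = sorted(sub_files.items(), key=lambda item: item[1][0])
    for name, (offset, length) in by_offset:
        if offset != expected:
            raise ValueError(f"{name}: expected offset {expected}, got {offset}")
        expected = offset + length
    return expected


total = check_layout(SUB_FILES)
print(total)  # 138668
```

For this listing the ten entries are contiguous from offset 0, so the check passes and the covered size is 138668 bytes.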