Resending; typo in subject line.

On Mar 27, 2013, at 3:40 PM, Mark Bennett <mark.benn...@lucidworks.com> wrote:
> Disclaimer: I realize you wouldn't want to do this for anything other than
> a toy collection.
>
> Perk: however, this overall discussion might also be useful to people
> wanting to use other codecs by default, for example the faster
> BlockPostingsFormat.
>
> Old info online: Instructions for enabling SimpleText were written back in
> its early days. But in more recent versions of Solr these instructions are
> largely obsolete; you DON'T need to do most of that. You can just add
> postingsFormat="SimpleText" to a <fieldType> tag and get the new behavior.
> I believe it's similar for using the BlockPostingsFormat.
>
> But when you do this (add it to text_general, for example), although your
> text fields reside in the new format, the other files in the index
> directory are still binary. By the time your debugging gets to your text
> field values, some "magic" has already happened via the other files (the
> system already knows about offsets into the file, for example).
>
> Question: Can SimpleText even be used for the other binary files in an
> index? Or is it somehow specific in scope to field tokens?
>
> Question: If it can be used for all the other files, what's the setting for
> that? I had seen a switch -Dtests.codec=SimpleText in the old instructions,
> but clearly that's for unit tests, and I wasn't sure of its scope or
> applicability.
>
> Question: Has anybody tried using BlockPostingsFormat as a default codec
> (for all files)? Did it work? Was it faster than just applying it to your
> text fields?
>
> Other questions...
>
> Or maybe there's some other aspect to all of this that I'm missing, some
> other question I should really be asking? The old posts online seem to
> assume a fairly deep understanding of Lucene & Solr's overall codec
> framework, which was appropriate at that time. But now it's included by
> default, so it's sort of "mainstream", and although I generally understand
> codecs, there are still aspects of it in Solr that I'm a bit hazy on;
> wondering if others have the same feeling?
>
> Examples of things I'm a bit hazy on:
>
> Are there rules about which codecs can be used where?
>
> Can you mix and match codecs? Can you chain them?
>
> I also saw the FilterCodec javadoc. Would I only use that if I want to
> reuse most of an existing codec, but alter just one part of it? I'm a bit
> fuzzy on combining that with other codecs. If there's a java command line
> -D switch that tells the system to use a different (but already existing)
> codec, then I don't think I'd need this at all?
>
> --
> Mark Bennett / LucidWorks: Search & Big Data / mark.benn...@lucidworks.com
> Office: 408-898-4201 / Telecommute: 408-733-0387 / Cell: 408-829-6513
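
For reference, the per-field setting described in the quoted message can be sketched as the Solr 4.x configuration fragment below. The text_general field type and the postingsFormat attribute come from the message. The solr.TextField class and the codecFactory line are assumptions about a stock Solr 4.x setup, where the schema-aware codec factory has to be enabled in solrconfig.xml before postingsFormat on a field type takes effect. Analyzer definitions are omitted.

```xml
<!-- solrconfig.xml: let the schema choose per-field postings formats
     (assumed Solr 4.x setup) -->
<codecFactory class="solr.SchemaCodecFactory"/>

<!-- schema.xml: store postings for this field type as SimpleText
     (toy collections only) -->
<fieldType name="text_general" class="solr.TextField" postingsFormat="SimpleText">
  <!-- existing analyzer configuration stays unchanged -->
</fieldType>
```

This changes only the postings files for fields of that type. The other index files stay in the default codec's binary formats, which matches the behavior described in the message.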