resending, typo in subject line

On Mar 27, 2013, at 3:40 PM, Mark Bennett <mark.benn...@lucidworks.com> wrote:

> Disclaimer: I realize you wouldn't want to do this for anything other than a 
> toy collection.
> 
> Perk: however, this overall discussion might also be useful to people wanting 
> to use other codecs by default, for example the faster BlockPostingsFormat.
> 
> Old info online: Instructions for enabling SimpleText were written back 
> in its early days.  But in more recent versions of Solr these instructions 
> are largely obsolete; you DON'T need to do most of that.  You can just add 
> postingsFormat="SimpleText" to a <fieldType> tag and get the new behavior.  I 
> believe it's similar for using the BlockPostingsFormat.
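> 
> A minimal sketch of what I mean, for illustration only.  The analyzer chain 
> here is a generic placeholder, not my actual schema, and I'm assuming (not 
> stating as fact) that solrconfig.xml also has to declare the schema-aware 
> codec factory for the attribute to be honored:
> 
> ```xml
> <!-- schema.xml: per-field-type postings format (placeholder analyzer) -->
> <fieldType name="text_general" class="solr.TextField"
>            postingsFormat="SimpleText">
>   <analyzer>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
> 
> <!-- solrconfig.xml: assumed prerequisite so the schema setting is read -->
> <codecFactory class="solr.SchemaCodecFactory"/>
> ```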
> 
> But when you do this (add it to text_general for example), although your text 
> fields reside in the new format, the other files in the index directory are 
> still binary. By the time your debugging gets to your text field values, some 
> "magic" has already happened via the other files (the system already knows 
> about offsets into the file, for example).
> 
> Question: Can SimpleText even be used for the other binary files in an index? 
>  Or is it somehow specific in scope to field tokens?
> 
> Question: If it can be used for all the other files, what's the setting for 
> that?  I had seen a switch -Dtests.codec=SimpleText in the old instructions, 
> but clearly that's for unit tests, and I wasn't sure of its scope or 
> applicability.
> 
> Question: Has anybody tried using BlockPostingsFormat as a default codec?  
> (for all files)  Did it work?  Was it faster than just applying it to your 
> text fields?
> 
> Other questions...
> 
> Or maybe there's some other aspect to all of this that I'm missing, some 
> other question I should really be asking?  The old posts online seem to 
> assume fairly deep understanding of Lucene & Solr's overall codec framework, 
> which was appropriate at that time.  But now it's included by default, so 
> it's sort of "mainstream", and although I generally understand codecs, there 
> are still aspects of them in Solr that I'm a bit hazy on; I'm wondering 
> whether others have the same feeling?
> 
> Examples of things I'm a bit hazy on:
> 
> Are there rules about which codecs can be used where?
> 
> Can you mix and match codecs?  Can you chain them?
> 
> I also saw the FilterCodec javadoc.  Would I only use that if I want to reuse 
> most of an existing codec, but alter just one part of it?  I'm a bit fuzzy on 
> combining that with other codecs.  If there's a java command line -D switch 
> that tells the system to use a different (but already existing) codec, then I 
> don't think I'd need this at all?
> 
> --
> Mark Bennett / LucidWorks: Search & Big Data / mark.benn...@lucidworks.com
> Office: 408-898-4201 / Telecommute: 408-733-0387 / Cell: 408-829-6513