Unfortunately, it is all classified data that I cannot share. I will try to debug it myself.

On Sun, Jan 3, 2010 at 4:10 PM, Grant Ingersoll <[email protected]> wrote:

> Is there anyway you could zip up a small document set and your Solr home
> and post somewhere?
>
> On Jan 3, 2010, at 9:08 AM, Bogdan Vatkov wrote:
>
> > Yesterday I had issues with mapping cluster results to dictionary
> > entries - it happened that I was using different dictionary - therefore
> > the result clusters shown really strange results.
> > But once I fixed all the commands, input/output files, etc. I got very
> > good result from clusterization POV (I mean clusters are quite correct
> > having in mind the input documents) but unfortunately the clusters
> > contained mostly words which I would like to stop - and which words I
> > placed in the stopwords.txt in Solr (re-indexed, restarted Solr, etc.).
> >
> > Where do you suggest I debug the vector creation? Seems Solr respects the
> > stopwords but not the vector creation (then clustering).
> >
> > On Sun, Jan 3, 2010 at 4:02 PM, Grant Ingersoll <[email protected]> wrote:
> >
> >>
> >> On Jan 3, 2010, at 8:58 AM, Bogdan Vatkov wrote:
> >>
> >>> I have stopwords.txt file with 1200+ words, i did not understand this
> >>> with the stemming - you mean my stopwords are somehow ignored due to
> >>> some stemming or ?
> >>
> >> No, stopword removal happens before stemming so it is possible that a
> >> word that was not stopped was then stemmed to a stopword.
> >>
> >> I thought you said yesterday you got it straightened out.
> >>
> >>>
> >>> On Sun, Jan 3, 2010 at 3:53 PM, Grant Ingersoll <[email protected]> wrote:
> >>>
> >>>> Are you sure you have stopwords and it is not the result of stemming
> >>>> some other word?
> >>>>
> >>>> On Jan 3, 2010, at 7:57 AM, Bogdan Vatkov wrote:
> >>>>
> >>>>> my Solr config is like the default one:
> >>>>>
> >>>>> <field name="msg_body" type="text" termVectors="true" indexed="true"
> >>>>>        stored="true"/>
> >>>>>
> >>>>> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
> >>>>>   <analyzer type="index">
> >>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>>>>     <filter class="solr.StopFilterFactory"
> >>>>>             ignoreCase="true"
> >>>>>             words="stopwords.txt"
> >>>>>             enablePositionIncrements="true"
> >>>>>             />
> >>>>>     <filter class="solr.WordDelimiterFilterFactory"
> >>>>>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
> >>>>>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> >>>>>     <filter class="solr.LowerCaseFilterFactory"/>
> >>>>>     <filter class="solr.SnowballPorterFilterFactory" language="English"
> >>>>>             protected="protwords.txt"/>
> >>>>>   </analyzer>
> >>>>>   <analyzer type="query">
> >>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>>>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >>>>>             ignoreCase="true" expand="true"/>
> >>>>>     <filter class="solr.StopFilterFactory"
> >>>>>             ignoreCase="true"
> >>>>>             words="stopwords.txt"
> >>>>>             enablePositionIncrements="true"
> >>>>>             />
> >>>>>     <filter class="solr.WordDelimiterFilterFactory"
> >>>>>             generateWordParts="1" generateNumberParts="1" catenateWords="0"
> >>>>>             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> >>>>>     <filter class="solr.LowerCaseFilterFactory"/>
> >>>>>     <filter class="solr.SnowballPorterFilterFactory" language="English"
> >>>>>             protected="protwords.txt"/>
> >>>>>   </analyzer>
> >>>>> </fieldType>
> >>>>
> >>>
> >>> --
> >>> Best regards,
> >>> Bogdan
> >>
> >
> > --
> > Best regards,
> > Bogdan
>

--
Best regards,
Bogdan
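
The ordering effect discussed in the quoted thread (stopword removal runs before stemming, so a word that is not in the stop list can be stemmed into one that is) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_stem` is a hypothetical stand-in that strips a trailing "s", not the Snowball stemmer, and `analyze` and its `stop_after_stem` flag are made-up names, not Solr or Lucene API.

```python
def toy_stem(token):
    """Hypothetical stand-in for a real stemmer: strips one trailing 's'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def analyze(text, stopwords, stop_after_stem=False):
    """Toy analysis chain: whitespace split, stop filter, then stemming.

    The stop filter runs before the stemmer, mirroring the filter order
    in the quoted config. With stop_after_stem=True (an assumption for
    illustration), a second stop pass is applied to the stemmed tokens.
    """
    tokens = text.lower().split()
    tokens = [t for t in tokens if t not in stopwords]  # stop filter first
    tokens = [toy_stem(t) for t in tokens]              # stemming second
    if stop_after_stem:
        tokens = [t for t in tokens if t not in stopwords]
    return tokens


# The stop list holds the base forms only, not the inflected ones.
STOPWORDS = {"thank", "regard"}

# "thanks" and "regards" pass the stop filter, then stem to stopwords.
leaked = analyze("Thanks and regards Bogdan", STOPWORDS)

# A second stop pass after stemming removes the stemmed forms.
cleaned = analyze("Thanks and regards Bogdan", STOPWORDS, stop_after_stem=True)
```

If this is what is happening, the unwanted terms in the clusters would be stemmed forms of words that are not listed verbatim in stopwords.txt, so comparing the cluster terms against the stop list is a quick first check.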
