Modifying a tokenized field entry
Let's assume I have an index structured as below:

name   species  color
max    cat      grey
sam    dog      brown
lucy   cat      white
.      .        .
.      .        .
poe    dog      blond
joe    cat      red
pam    dog      brown

The species and color fields are tokenized, indexed and stored.

Now let's assume that I want to change the term cat to feline and dog to canine. From what I have been reading, I would have to delete each Document (row) and re-add it with the new term. Since the original cat and dog terms are indexed, tokenized and stored, it seems like there should be a way to update just the terms cat and dog to their new values.

Is there already a way to do this? Did I just miss it?
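As a rough illustration of the delete-and-re-add approach described in the question, the sketch below models the index as an in-memory map keyed by the name field. It is plain Java, not Lucene code: the class TermRenameSketch, the helpers sample and renameTerm, and the use of name as a unique key are all assumptions made for illustration. In Lucene itself, the per-document replacement step would typically go through IndexWriter.updateDocument(Term, Document), which deletes the documents containing the given term and then adds the new document.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TermRenameSketch {
    // Hypothetical stand-in for the index: name -> stored fields of that row.
    static Map<String, Map<String, String>> sample() {
        Map<String, Map<String, String>> index = new LinkedHashMap<>();
        add(index, "max", "cat", "grey");
        add(index, "sam", "dog", "brown");
        add(index, "lucy", "cat", "white");
        add(index, "poe", "dog", "blond");
        add(index, "joe", "cat", "red");
        add(index, "pam", "dog", "brown");
        return index;
    }

    static void add(Map<String, Map<String, String>> index,
                    String name, String species, String color) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("species", species);
        doc.put("color", color);
        index.put(name, doc);
    }

    // Rebuild every document whose field equals oldTerm and swap it in.
    // Returns the number of documents that were replaced.
    static int renameTerm(Map<String, Map<String, String>> index,
                          String field, String oldTerm, String newTerm) {
        int changed = 0;
        for (Map.Entry<String, Map<String, String>> e : index.entrySet()) {
            if (oldTerm.equals(e.getValue().get(field))) {
                // Copy the stored fields, change one value, replace the old document.
                Map<String, String> copy = new LinkedHashMap<>(e.getValue());
                copy.put(field, newTerm);
                e.setValue(copy);
                changed++;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> index = sample();
        renameTerm(index, "species", "cat", "feline");
        renameTerm(index, "species", "dog", "canine");
        System.out.println(index);
    }
}
```

The sketch only works because every field of a row can be read back before the row is rebuilt. In a real index that corresponds to all fields being stored, which the question states for species and color but not for name.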
Re: Is there a way to check for field uniqueness when indexing?
But in that case, I assume Solr does a commit per document added. Let's say I wanted to index a collection of 1 million pages: would it take much longer if I committed at each insertion rather than committing at the end?

Daniel Shane

Grant Ingersoll wrote:
> On Aug 13, 2009, at 10:33 AM, Daniel Shane wrote:
>> Does anyone have an idea on how I could check an index that is in the process of being indexed (things added, things deleted) for the uniqueness of a given field *at the time I index a document*?
>
> Solr has de-duplication built-in at indexing time: http://wiki.apache.org/solr/Deduplication
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
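The de-duplication idea mentioned in this thread can be pictured as follows: a signature is computed from the field that must be unique, and a later document with the same signature overwrites the earlier one. The sketch below is plain Java rather than Solr or Lucene code. DedupSketch, signature, add, and commit are hypothetical names, and the commit counter only shows that, in this toy model, the uniqueness check sees uncommitted documents and so does not need a commit after every add. It says nothing about real indexing cost.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

class DedupSketch {
    // signature -> document body, split into uncommitted and committed state
    private final Map<String, String> pending = new LinkedHashMap<>();
    private final Map<String, String> committed = new LinkedHashMap<>();
    int commits = 0;

    // Hex-encoded MD5 of the value that is supposed to be unique.
    static String signature(String uniqueFieldValue) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(uniqueFieldValue.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the digests every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    // Adds or overwrites a document; returns true if it replaced an earlier one.
    boolean add(String uniqueFieldValue, String body) {
        String sig = signature(uniqueFieldValue);
        boolean duplicate = pending.containsKey(sig) || committed.containsKey(sig);
        pending.put(sig, body);
        return duplicate;
    }

    // Makes all pending documents visible in one step.
    void commit() {
        committed.putAll(pending);
        pending.clear();
        commits++;
    }

    int size() {
        return committed.size();
    }

    public static void main(String[] args) {
        DedupSketch index = new DedupSketch();
        index.add("http://example.com/a", "first version");
        index.add("http://example.com/a", "second version"); // overwrites the first
        index.add("http://example.com/b", "other page");
        index.commit(); // a single commit at the end
        System.out.println(index.size() + " documents, " + index.commits + " commit(s)");
    }
}
```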
custom scorer
Hello,

I'm trying to write a custom scorer that only uses the term frequency function from the DefaultSimilarity class. The problem is that documents with lower frequencies are returning with higher scores than documents with higher frequencies. Here's the code:

    searcher.setSimilarity(new DefaultSimilarity() {
        public float lengthNorm(String field, int numTerms) {
            return 1;
        }
        public float idf(int docFreq, int numDocs) {
            return 1;
        }
        public float coord(int overlap, int maxoverlap) {
            return 1;
        }
        public float queryNorm(float sumOfSquaredWeights) {
            return 1;
        }
        public float sloppyFreq(int distance) {
            return 1;
        }
    });

Any idea why this wouldn't be working?

Sincerely,
Chris Salem
Re: custom scorer
Are you setting the Similarity before indexing, too, on the IndexWriter?

On Aug 19, 2009, at 4:20 PM, Chris Salem wrote:
> I'm trying to write a custom scorer that only uses the term frequency function from the DefaultSimilarity class, the problem is that documents with lower frequencies are returning with higher scores than documents with higher frequencies.

--
Grant Ingersoll
http://www.lucidimagination.com/
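One possible reading of this question, offered as a sketch rather than a diagnosis: in the Lucene versions of this era, lengthNorm is applied when a document is indexed and the result is stored in the index as a norm, so overriding it only on the searcher leaves the stored norms untouched. The toy model below is plain Java, not Lucene code. NormSketch and score are hypothetical names, and it assumes DefaultSimilarity's tf = sqrt(freq) and lengthNorm = 1/sqrt(numTerms), while ignoring boosts and the lossy one-byte norm encoding.

```java
class NormSketch {
    // Term-frequency factor as in DefaultSimilarity: sqrt(freq).
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Default index-time length norm: 1 / sqrt(number of terms in the field).
    static double defaultLengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Simplified score with idf, coord and queryNorm all fixed at 1.
    static double score(int freq, double storedNorm) {
        return tf(freq) * storedNorm;
    }

    public static void main(String[] args) {
        // Short document: the term occurs 2 times among 4 terms.
        // Long document: the term occurs 3 times among 100 terms.
        double shortDoc = score(2, defaultLengthNorm(4));
        double longDoc = score(3, defaultLengthNorm(100));
        System.out.println("default norms in index: " + shortDoc + " vs " + longDoc);

        // Same documents if the norm written at index time had been 1.
        System.out.println("norm = 1 in index:      " + score(2, 1.0) + " vs " + score(3, 1.0));
    }
}
```

With the default norms baked in, the short document outranks the long one even though its term frequency is lower. With a norm of 1 the ordering follows term frequency alone. In this model, getting that second behaviour means using the custom Similarity when the index is written and then re-indexing.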