Re: Classifier for query intent?
Hello Wunder,

If you are set on Java, the Stanford Classifier and Weka are both good choices. OpenNLP also has a document classifier.

You could also look beyond Java, for example at Python, and consume the intent as a REST service.

Regards,
Dikshant

On Tue 3 Apr, 2018, 4:48 AM Walter Underwood wrote:

> We are experimenting with a text classifier for determining query intent.
> Anybody have a favorite (or anti-favorite) Java implementation? Speed and
> ease of implementation are important.
>
> Right now, we’re mostly looking at Weka and the Stanford Classifier.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
Re: Clarification on +, and in edismax parser
Hi,

No, + and "and" do not behave the same way. Even "and" and "AND" can behave differently in edismax (this is configurable).

When you put a + before a term, you specify that the term is mandatory. Hence, "+google +india" will get you the same result as "google AND india".

Best Regards,
Dikshant Shahi

On Thu, Mar 10, 2016 at 12:59 PM, Anil <anilk...@gmail.com> wrote:

> "google"+"india" and "india"+"google" are returning different results. Any
> help would be appreciated.
>
> Thanks,
> Anil
>
> On 10 March 2016 at 11:47, Anil <anilk...@gmail.com> wrote:
>
> > Hi,
> >
> > I am using the edismax query parser for my Solr search.
> >
> > I believe '+' and 'and' should work similarly.
> >
> > ex: "google"+"india" and "google" and "india" should return the same
> > number of results.
> >
> > Correct me if I am wrong. Thanks.
> >
> > Regards,
> > Anil
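As an illustration of the two equivalent query forms in the reply above, here is a minimal Python sketch that builds the request parameters for each. The helper name `edismax_params` is hypothetical, and the sketch only constructs query strings; it does not contact Solr. One practical detail it shows: in a URL query string a literal `+` has to be percent-encoded as `%2B`, because an unencoded `+` is decoded as a space.

```python
from urllib.parse import urlencode


def edismax_params(q):
    """Return URL-encoded request parameters for an edismax query.

    Hypothetical helper for illustration; it only builds the query string.
    """
    return urlencode({"defType": "edismax", "q": q})


# Both terms mandatory, written with the + prefix.
plus_form = edismax_params("+google +india")

# Both terms mandatory, written with the uppercase boolean operator.
and_form = edismax_params("google AND india")

print(plus_form)  # defType=edismax&q=%2Bgoogle+%2Bindia
print(and_form)   # defType=edismax&q=google+AND+india
```

If the two forms return different counts against a real index, running both with debug output enabled and comparing the parsed queries is probably the quickest way to see where they diverge.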
Request for Wiki edit rights
Hi,

Could you please give me edit rights for the Wiki pages? My Wiki username is Dikshant.

Thanks,
Dikshant
Re: Request for Wiki edit rights
Thanks Erick! This is good for now.

On Thu, Jul 16, 2015 at 9:54 AM, Erick Erickson <erickerick...@gmail.com> wrote:

> I added you to the Solr Wiki, if you need Lucene Wiki access let us know.
>
> Erick
>
> On Wed, Jul 15, 2015 at 7:59 PM, Dikshant Shahi <contacts...@gmail.com> wrote:
>
> > Hi,
> >
> > Can you please provide me the privilege to edit Wiki pages. My Wiki
> > username is Dikshant.
> >
> > Thanks,
> > Dikshant
Admin extra menu becomes invisible
Hi,

I uncommented the HTML tags in admin-extra.menu-top and admin-extra.menu-bottom. The extra menu shows up fine when I select the core from the dropdown, but once I click on any other tab, such as Replication or Dataimport, it disappears.

I tried this in Solr 4.6.1 and Solr 5.0.0 and the behavior is the same.

I can see there is a fix in JIRA issue SOLR-4405 (https://issues.apache.org/jira/browse/SOLR-4405), but I don't see it working. I am wondering if I am missing something.

Thanks,
Dikshant
Re: Non-Schemaless configuration in solr 5.0
You can create a core as follows:

    solr create -c corename -d sample_techproducts_configs

This will create a core with the full set of Solr features configured. You can refer to the examples there and modify them as needed.

- Dikshant

On Tue, Mar 17, 2015 at 9:38 PM, itzikgili <itzik.g...@gmail.com> wrote:

> I'm trying to implement an autocomplete solution for my website.
> After using Solr 5.0 with the examples provided in it, I wanted to test it
> with my own configuration.
> Using "solr start" and then "solr create -c corename" created a core, as
> asked. It seems that Solr is now running schemaless.
>
> 1. How can I define a schema.xml so that Solr won't be schemaless?
> 2. Is there a way to use copy fields and analyzers with schemaless Solr?
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Non-Schemaless-configuration-in-solr-5-0-tp4193509.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr pattern tokenizer
Why have you created edge n-grams with a minimum size of 3? Do you also want matches in the case of spelling mistakes?

If you want two consecutive tokens to match, you can create shingles. Please refer to:
https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions#FilterDescriptions-ShingleFilter

Thanks,
Dikshant

On Mon, Feb 2, 2015 at 3:26 PM, Nivedita <nivedita.pa...@tcs.com> wrote:

> Hi,
>
> I want to tokenize a query like
> CHQ PAID-INWARD TRAN-HDFC LTD
> in such a way that it returns documents containing HDFC LTD and not
> HDFC MF.
>
> How can I do this?
>
> I have already applied the tokenizers below:
>
> <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.TrimFilterFactory" />
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="25" side="front"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>     <filter class="solr.TrimFilterFactory" />
>   </analyzer>
> </fieldType>
>
> Please help.
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Solr-pattern-tokenizer-tp4183421.html
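To show why shingles help with the HDFC LTD versus HDFC MF case discussed above, here is a small conceptual sketch in Python. It is not Solr's ShingleFilter: the function name `word_shingles` and the tokenization (split on whitespace and hyphens, then lowercase) are assumptions made for illustration only.

```python
import re


def word_shingles(text, size=2):
    """Return the set of word shingles (runs of `size` consecutive tokens)."""
    # Split on whitespace and hyphens, then lowercase. This is a simplification
    # and not the analysis chain Solr would actually apply.
    tokens = [t.lower() for t in re.split(r"[\s\-]+", text) if t]
    return {" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}


query = word_shingles("CHQ PAID-INWARD TRAN-HDFC LTD")
doc_ltd = word_shingles("HDFC LTD")
doc_mf = word_shingles("HDFC MF")

print(sorted(query & doc_ltd))  # ['hdfc ltd']
print(sorted(query & doc_mf))   # []
```

Matching on single tokens would let both documents match through "hdfc" alone; matching on two-word shingles only keeps the document where "hdfc" is followed by "ltd".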
Re: Lucene cosine similarity score for more like this query
Conceptually, your understanding of VSM cosine similarity is correct. In text analysis the range is 0 to 1, because term weights are non-negative and so there is no negative similarity.

The scores from handlers that internally use Lucene's cosine similarity can still go beyond 1. The reason is that these scores are computed for each field and then go through further computation, for example summation or multiplication of the per-field scores, to come up with the final score for the document.

Correct me if my understanding is wrong.

Thanks,
Dikshant

On Tue, Feb 3, 2015 at 2:53 AM, Markus Jelsma <markus.jel...@openindex.io> wrote:

> Hi - MoreLikeThis is not based on cosine similarity. The idea is that rare
> terms - high IDF - are extracted from the source document, and then used
> to build a regular Query(). That query follows the same rules as regular
> queries, the rules of your similarity implementation, which is TFIDF by
> default. So, as suggested, if you enable debugging, you can clearly see
> why scores can be above 1, or even much higher if queryNorm is disabled
> when using BM25 as similarity.
>
> If you really need cosine similarity between documents, you have to enable
> term vectors for the source fields, and use them to calculate the angle.
> The problem is that this does not scale well, you would need to calculate
> angles with virtually all other documents.
>
> M.
>
> -Original message-
> From: Ali Nazemian <alinazem...@gmail.com>
> Sent: Monday 2nd February 2015 21:39
> To: solr-user@lucene.apache.org
> Subject: Re: Lucene cosine similarity score for more like this query
>
> > Dear Erik,
> > Thank you for your response. Would you please tell me why this score
> > could be higher than 1, while cosine similarity cannot be higher than 1?
> >
> > On Feb 2, 2015 7:32 PM, Erik Hatcher <erik.hatc...@gmail.com> wrote:
> >
> > > The scoring is the same as Lucene. To get deeper insight into how a
> > > score is computed, use Solr’s debug=true mode to see the explain
> > > details in the response.
> > >
> > > Erik
> > >
> > > On Feb 2, 2015, at 10:49 AM, Ali Nazemian <alinazem...@gmail.com> wrote:
> > >
> > > > Hi,
> > > > I was wondering what the range of the score returned by a more like
> > > > this query in Solr is? I know that Lucene uses cosine similarity in
> > > > the vector space model for calculating similarity between two
> > > > documents. I also know that cosine similarity is between -1 and 1,
> > > > but what I don't understand is why the score returned by a more like
> > > > this query could be 12, for example?! Would you please explain what
> > > > the calculation process in Solr is?
> > > > Thank you very much.
> > > > Best regards.
> > > >
> > > > --
> > > > A.Nazemian
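For reference, plain cosine similarity between two documents, as discussed in this thread, can be sketched in a few lines of Python. This is a minimal illustration that uses raw term counts as the vector components (no IDF, no field norms, no boosts); it is not Lucene's scoring formula, and the function name `cosine_similarity` is just a label chosen here.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two raw term-frequency vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Only terms present in both documents contribute to the dot product.
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document has no direction to compare
    return dot / (norm_a * norm_b)


print(cosine_similarity("solr lucene search", "solr lucene search"))
print(cosine_similarity("solr search", "solr index"))
print(cosine_similarity("solr", "weka"))
```

Because term counts are never negative, the result stays between 0 and 1 (up to floating-point rounding). A score such as 12 therefore cannot be a raw cosine; it has to come from a different computation, which is what the debug=true explain output shows.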
Re: Stopwords in shingles suggester
Configure a fieldType in schema.xml as below. The StopFilterFactory line, placed before the ShingleFilterFactory, is what removes the stopwords.

<fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="0">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    ..
    ..
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="3" outputUnigrams="false" />
  </analyzer>
</fieldType>

Thanks,
Dikshant

On Mon, Oct 27, 2014 at 6:26 PM, O. Klein <kl...@octoweb.nl> wrote:

> Is there a way in Solr to filter out stopwords in shingles like ES does?
>
> http://www.elasticsearch.org/blog/searching-with-shingles/
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Stopwords-in-shingles-suggester-tp4166057.html
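The intent of the filter order in the configuration above (drop stopwords first, then build shingles of two to three tokens with no unigrams) can be sketched conceptually in Python. This is an assumption-laden illustration, not Solr's implementation: the function name and stopword set are made up, and the sketch ignores token positions. Solr's real filters track positions, and depending on version and settings the shingle filter may emit a filler token where a stopword was removed, so it is worth checking the actual output on the Analysis screen.

```python
def shingles_without_stopwords(text, stopwords, min_size=2, max_size=3):
    """Drop stopwords, then emit shingles of min_size..max_size tokens."""
    # Step 1: lowercase, split on whitespace, and remove stopwords.
    tokens = [t for t in text.lower().split() if t not in stopwords]
    # Step 2: build shingles from the remaining tokens; no unigrams are
    # emitted because min_size is 2.
    result = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                result.append(" ".join(tokens[start:start + size]))
    return result


stopwords = {"the", "of", "a"}  # illustrative stopword list
print(shingles_without_stopwords("the lord of the rings trilogy", stopwords))
# ['lord rings', 'lord rings trilogy', 'rings trilogy']
```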