Re: Classifier for query intent?

2018-04-02 Thread Dikshant Shahi
Hello Wunder,

If you are particular about Java, Stanford and Weka are both good choices.
OpenNLP also has a document classifier.

You could also look beyond Java, for example at Python, and consume the
intent classifier as a REST service.
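To show what such a classifier does, here is a minimal sketch of a multinomial Naive Bayes intent classifier in plain Python with add-one smoothing. It is not the Stanford, Weka, or OpenNLP implementation. The training queries and intent labels are hypothetical. A real setup would train on logged queries and could expose `predict()` behind a REST endpoint.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesIntent:
    """Multinomial Naive Bayes over query words, with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()          # number of training queries per intent
        self.word_counts = defaultdict(Counter)  # word frequencies per intent
        self.vocab = set()

    def fit(self, examples):
        for query, intent in examples:
            words = query.lower().split()
            self.class_counts[intent] += 1
            self.word_counts[intent].update(words)
            self.vocab.update(words)
        return self

    def predict(self, query):
        words = query.lower().split()
        total = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for intent, count in self.class_counts.items():
            n_words = sum(self.word_counts[intent].values())
            # log prior + sum of smoothed log likelihoods
            score = math.log(count / total)
            for w in words:
                score += math.log((self.word_counts[intent][w] + 1) / (n_words + v))
            if score > best_score:
                best, best_score = intent, score
        return best


# Hypothetical training data: (query, intent) pairs.
TRAINING = [
    ("cheap flights to paris", "travel"),
    ("hotel in rome", "travel"),
    ("book a flight", "travel"),
    ("python list sort", "programming"),
    ("java string split", "programming"),
    ("sort array in java", "programming"),
]

clf = NaiveBayesIntent().fit(TRAINING)
print(clf.predict("flights to rome"))    # travel
print(clf.predict("sort list in java"))  # programming
```

Naive Bayes is chosen here only because it is short and fast to train. The libraries above offer stronger models, such as maximum entropy or logistic regression.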

Regards,
Dikshant

On Tue 3 Apr, 2018, 4:48 AM Walter Underwood,  wrote:

> We are experimenting with a text classifier for determining query intent.
> Does anybody have a favorite (or anti-favorite) Java implementation? Speed and
> ease of implementation are important.
>
> Right now, we’re mostly looking at Weka and the Stanford Classifier.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>


Re: Clarification on +, and in edismax parser

2016-03-09 Thread Dikshant Shahi
Hi,

No, + and "and" don't work the same way. Even "and" and "AND" can behave
differently in edismax (this is configurable via the lowercaseOperators parameter).

When you put a + before a term, you specify that it's mandatory. Hence,
"+google +india" will get you the same result as "google AND india".

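There is a related pitfall if these queries are typed straight into a request URL. This is an assumption about how the queries were sent, not something confirmed in this thread. In a URL query string an unencoded + is decoded as a space, so "google"+"india" reaches Solr as "google" "india", and nothing marks the terms as mandatory. The minimal Python sketch below builds a properly encoded edismax request. The host, port, and core name in `SOLR_SELECT` are hypothetical placeholders.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint: host, port and core name are placeholders.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def edismax_url(q: str) -> str:
    """Build a select URL for an edismax query.

    urlencode() quotes values with quote_plus(), so a literal '+' is sent
    as '%2B' and a space is sent as '+'.
    """
    return SOLR_SELECT + "?" + urlencode({"defType": "edismax", "q": q})


url = edismax_url("+google +india")
print(url)
# http://localhost:8983/solr/collection1/select?defType=edismax&q=%2Bgoogle+%2Bindia

# Decoding the query string the way a server does: the mandatory markers survive.
decoded = parse_qs(urlsplit(url).query)["q"][0]
print(decoded)  # +google +india

# Typed by hand, the unencoded '+' between the two phrases becomes a space.
hand_typed = parse_qs('q="google"+"india"')["q"][0]
print(hand_typed)  # "google" "india"
```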
Best Regards,
*Dikshant Shahi*



On Thu, Mar 10, 2016 at 12:59 PM, Anil <anilk...@gmail.com> wrote:

> "google"+"india" and "india"+"google" are returning different results. Any help
> would be appreciated.
>
> Thanks,
> Anil
>
>
> On 10 March 2016 at 11:47, Anil <anilk...@gmail.com> wrote:
>
> > Hi,
> >
> > I am using the edismax query parser for my Solr search.
> >
> > I believe '+' and 'and' should work similarly.
> >
> > For example, "google"+"india" should return the same number of results as
> > "google" and "india".
> >
> > Correct me if I am wrong. Thanks.
> >
> > Regards,
> > Anil
> >
> >
> >
>


Request for Wiki edit rights

2015-07-15 Thread Dikshant Shahi
Hi,

Could you please grant me the privilege to edit Wiki pages?

My Wiki username is Dikshant.

Thanks,
Dikshant


Re: Request for Wiki edit rights

2015-07-15 Thread Dikshant Shahi
Thanks Erick! This is good for now.

On Thu, Jul 16, 2015 at 9:54 AM, Erick Erickson erickerick...@gmail.com
wrote:

 I added you to the Solr Wiki; if you need Lucene Wiki access, let us know.

 Erick

 On Wed, Jul 15, 2015 at 7:59 PM, Dikshant Shahi contacts...@gmail.com
 wrote:
  Hi,
 
  Could you please grant me the privilege to edit Wiki pages?
 
  My Wiki username is Dikshant.
 
  Thanks,
  Dikshant



Admin extra menu becomes invisible

2015-03-17 Thread Dikshant Shahi
Hi,

I uncommented the HTML tags in admin-extra.menu-top and
admin-extra.menu-bottom. The menu shows up fine when I select the core from
the dropdown, but once I click on any other tab, such as Replication or
Dataimport, it disappears.

I tried it in Solr 4.6.1 and Solr 5.0.0, and the behavior is the same.

I can see there is a fix in JIRA issue SOLR-4405
https://issues.apache.org/jira/browse/SOLR-4405 but it doesn't seem to
work for me.

I'm wondering if I am missing something.

Thanks,
Dikshant


Re: Non-Schemaless configuration in solr 5.0

2015-03-17 Thread Dikshant Shahi
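You can create a core as follows:
solr create -c corename -d sample_techproducts_configs

This creates the core from the sample_techproducts_configs configset instead
of the default schemaless one, so you start with the full set of example
features. You can refer to the examples there and modify them as needed.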

- Dikshant

On Tue, Mar 17, 2015 at 9:38 PM, itzikgili itzik.g...@gmail.com wrote:


 I'm trying to implement an autocomplete solution for my website.

 After using Solr 5.0 with the examples provided in it,
 I wanted to test it with my own configuration.

 Using
   solr start
 and then
   solr create -c corename

 I created a core, as instructed.


 It seems that Solr is now running schemaless.


 1. How can I define a schema.xml so that Solr won't be schemaless?
 2. Is there a way to use copy fields and analyzers with schemaless Solr?





 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Non-Schemaless-configuration-in-solr-5-0-tp4193509.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr pattern tokenizer

2015-02-02 Thread Dikshant Shahi
Why have you created edge n-grams with a minimum size of 3? Do you also want
matches in case of spelling mistakes?
If you want two consecutive tokens to match together, you can create shingles.
Please refer to this link:
https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions#FilterDescriptions-ShingleFilter
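Here is a toy illustration of the difference, written in plain Python rather than Solr's analysis chain. The query is lowercased and split on spaces and hyphens, a crude stand-in for the real tokenizer plus WordDelimiterFilterFactory. For simplicity, edge n-grams are generated on both sides. The edge n-grams of "hdfc" are shared with the HDFC MF document. A two-word shingle such as "hdfc ltd" only matches the intended document.

```python
def edge_ngrams(token: str, min_size: int = 3, max_size: int = 25) -> list[str]:
    """Front edge n-grams of one token (like EdgeNGramFilterFactory, side=front)."""
    return [token[:n] for n in range(min_size, min(max_size, len(token)) + 1)]


def shingles(tokens: list[str], size: int = 2) -> list[str]:
    """Runs of exactly `size` consecutive tokens (a simplified shingle filter)."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def tokenize(text: str) -> list[str]:
    # Crude stand-in for tokenizer + word delimiter splitting + lowercasing.
    return text.lower().replace("-", " ").split()


def grams(tokens: list[str]) -> set[str]:
    return {g for t in tokens for g in edge_ngrams(t)}


query = tokenize("CHQ PAID-INWARD TRAN-HDFC LTD")
doc_ltd = tokenize("HDFC LTD")
doc_mf = tokenize("HDFC MF")

# Edge n-grams: "hdf" and "hdfc" from the query also occur in the HDFC MF document.
common_edge = sorted(grams(query) & grams(doc_mf))
print(common_edge)  # ['hdf', 'hdfc']

# Two-word shingles: only the HDFC LTD document shares one with the query.
common_ltd = set(shingles(query)) & set(shingles(doc_ltd))
common_mf = set(shingles(query)) & set(shingles(doc_mf))
print(common_ltd)  # {'hdfc ltd'}
print(common_mf)   # set()
```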

Thanks,
Dikshant

On Mon, Feb 2, 2015 at 3:26 PM, Nivedita nivedita.pa...@tcs.com wrote:

 Hi,

 I want to tokenize a query like CHQ PAID-INWARD TRAN-HDFC LTD in such a
 way that it returns the document containing HDFC LTD and not HDFC MF.

 How can I do this?
 I have already applied the tokenizers below:

 <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.TrimFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
             generateNumberParts="1" catenateWords="0" catenateNumbers="0"
             catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
             ignoreCase="true" expand="true"/>
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="3"
             maxGramSize="25" side="front"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.StopFilterFactory" words="stopwords.txt"
             ignoreCase="true"/>
     <filter class="solr.TrimFilterFactory"/>
   </analyzer>
 </fieldType>


 Please help.






Re: Lucene cosine similarity score for more like this query

2015-02-02 Thread Dikshant Shahi
Conceptually, your understanding of cosine similarity in the VSM (vector
space model) is correct. In text analysis the range is 0 to 1, because term
weights are never negative, so there is no negative similarity.

However, the scores of handlers which internally use Lucene's cosine
similarity can also go beyond 1. The reason is that these scores are
computed for each field and then go through further computation, for
example summation/multiplication of the field scores, to come up with the
final score for the document. Correct me if my understanding is wrong.
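Here is a small Python sketch of that point. It assumes raw term-frequency vectors and a plain sum of per-field scores, which is not Lucene's actual scoring formula. The document and query fields are hypothetical. Each per-field cosine stays within [0, 1], but their sum does not.

```python
import math
from collections import Counter


def tf(text: str) -> Counter:
    """Raw term-frequency vector of a whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two non-negative term vectors, so in [0, 1]."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical two-field document and query.
doc = {"title": "solr more like this", "body": "more like this finds similar documents"}
query = {"title": "more like this", "body": "similar documents in solr"}

per_field = {field: cosine(tf(doc[field]), tf(query[field])) for field in doc}
total = sum(per_field.values())  # combining the per-field scores by summation

print({f: round(s, 3) for f, s in per_field.items()})  # {'title': 0.866, 'body': 0.408}
print(round(total, 3))  # 1.274
```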

Thanks,
Dikshant



On Tue, Feb 3, 2015 at 2:53 AM, Markus Jelsma markus.jel...@openindex.io
wrote:

 Hi - MoreLikeThis is not based on cosine similarity. The idea is that rare
 terms - high IDF - are extracted from the source document, and then used to
 build a regular Query(). That query follows the same rules as regular
 queries, the rules of your similarity implementation, which is TFIDF by
 default. So, as suggested, if you enable debugging, you can clearly see why
 scores can be above 1, or even much higher if queryNorm is disabled when
 using BM25 as similarity.

 If you really need cosine similarity between documents, you have to enable
 term vectors for the source fields, and use them to calculate the angle.
 The problem is that this does not scale well; you would need to calculate
 angles with virtually all other documents.

 M.

 -Original message-
  From:Ali Nazemian alinazem...@gmail.com
  Sent: Monday 2nd February 2015 21:39
  To: solr-user@lucene.apache.org
  Subject: Re: Lucene cosine similarity score for more like this query
 
  Dear Erik,
  Thank you for your response. Could you please tell me why this score can
  be higher than 1, when cosine similarity cannot be higher than 1?
  On Feb 2, 2015 7:32 PM, Erik Hatcher erik.hatc...@gmail.com wrote:
 
   The scoring is the same as Lucene's. To get deeper insight into how a
   score is computed, use Solr’s debug=true mode to see the explain details
   in the response.
  
   Erik
  
On Feb 2, 2015, at 10:49 AM, Ali Nazemian alinazem...@gmail.com wrote:

Hi,
I was wondering what the range of the score returned by a More Like This
query in Solr is. I know that Lucene uses cosine similarity in the vector
space model for calculating the similarity between two documents. I also
know that cosine similarity is between -1 and 1, but what I don't
understand is why the score returned by a More Like This query can be 12,
for example. Would you please explain the calculation process in Solr?
Thank you very much.
   
Best regards.
   
--
A.Nazemian
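Here is a simplified Python sketch that makes the MoreLikeThis description in the reply above concrete. It is not Lucene's MoreLikeThis class, and it ignores thresholds such as a minimum term or document frequency. It ranks the source document's terms by tf × idf, using an idf of the form log(N / (df + 1)) + 1 like Lucene's classic TF-IDF similarity, and keeps the top few as an ordinary OR query. The corpus and source text are hypothetical. The resulting query is then scored like any other query, so its scores are not confined to [0, 1].

```python
import math
from collections import Counter


def idf(doc_freq: int, num_docs: int) -> float:
    """idf of the form used by Lucene's classic TF-IDF similarity."""
    return math.log(num_docs / (doc_freq + 1)) + 1.0


def interesting_terms(source: str, corpus: list[str], max_terms: int = 2) -> list[str]:
    """Rank the source document's terms by tf * idf and keep the top ones."""
    df = Counter(t for d in corpus for t in set(d.lower().split()))
    tf = Counter(source.lower().split())
    weight = {t: tf[t] * idf(df[t], len(corpus)) for t in tf}
    # Highest weight first; ties broken alphabetically for a stable order.
    return sorted(weight, key=lambda t: (-weight[t], t))[:max_terms]


# Hypothetical index contents and source document.
corpus = [
    "solr is a search server",
    "solr uses lucene",
    "lucene computes scores",
    "bm25 replaced tfidf in lucene",
]
source = "solr lucene bm25 bm25"

terms = interesting_terms(source, corpus)
mlt_query = " OR ".join(terms)  # from here on, an ordinary query
print(terms)      # ['bm25', 'solr']
print(mlt_query)  # bm25 OR solr
```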
  
  
 



Re: Stopwords in shingles suggester

2014-10-27 Thread Dikshant Shahi
Configure a fieldType in schema.xml as below, with the StopFilterFactory
placed before the ShingleFilterFactory:

  <fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="0">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      ..
      ..
      <!-- Stopwords are removed before the shingles are built -->
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="3"
              outputUnigrams="false"/>
    </analyzer>
  </fieldType>
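Here is a plain-Python sketch of what this chain does to a phrase. It assumes a hypothetical stopword list and uses whitespace tokenization in place of StandardTokenizer. The sketch simply drops stopwords before shingling. Note that Lucene's StopFilter leaves position gaps, which the real ShingleFilter may fill with a filler token (by default "_"), so actual Solr output can differ.

```python
# Hypothetical contents of stopwords.txt.
STOPWORDS = {"a", "an", "of", "the"}


def shingles(tokens, min_size=2, max_size=3, output_unigrams=False):
    """Word shingles of min_size..max_size tokens starting at each position."""
    out = []
    for i, token in enumerate(tokens):
        if output_unigrams:
            out.append(token)
        for n in range(min_size, max_size + 1):
            if i + n <= len(tokens):
                out.append(" ".join(tokens[i:i + n]))
    return out


def analyze(text, remove_stopwords=True):
    tokens = text.lower().split()
    if remove_stopwords:
        # Stopwords are dropped *before* shingling, mirroring the filter order above.
        tokens = [t for t in tokens if t not in STOPWORDS]
    return shingles(tokens)


print(analyze("The Lord of the Rings", remove_stopwords=False))
# ['the lord', 'the lord of', 'lord of', 'lord of the', 'of the', 'of the rings', 'the rings']
print(analyze("The Lord of the Rings"))
# ['lord rings']
```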

Thanks,
Dikshant

On Mon, Oct 27, 2014 at 6:26 PM, O. Klein kl...@octoweb.nl wrote:

 Is there a way in Solr to filter out stopwords in shingles like ES does?

 http://www.elasticsearch.org/blog/searching-with-shingles/


