Re: About Query Parser

2014-06-27 Thread Vivekanand Ittigi
That's an impressive answer. I actually wanted to know how exactly the query
parser works. I'm supposed to collect some fields, values, and other
related info and build a Solr query. I wanted to know whether I should use this
query parser or Java code to build the Solr query. Anyway, it looks like I have
to go with Java code to build it, and I'm on it.
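
As a rough illustration of building such a query in code (sketched here in Python rather than Java, purely for brevity), the snippet below joins field/value pairs into a `q` string and URL-encodes it. The helper names `escape_term` and `build_select_url` are hypothetical, the list of special characters is a simplified assumption, and the host and core are the ones quoted elsewhere in this thread.

```python
from urllib.parse import urlencode

# Characters with special meaning in the standard Lucene/Solr query syntax.
# (Simplified assumption: "&&" and "||" are covered by escaping "&" and "|".)
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\/')


def escape_term(value):
    """Backslash-escape query syntax characters and whitespace in a value."""
    return "".join(
        "\\" + ch if ch in SPECIAL_CHARS or ch.isspace() else ch
        for ch in value
    )


def build_select_url(base_url, field_values, extra_params=None):
    """Join field:value clauses with AND and encode them as the q parameter."""
    clauses = [f"{field}:{escape_term(str(value))}"
               for field, value in field_values.items()]
    params = {"q": " AND ".join(clauses), "wt": "xml", "indent": "true"}
    params.update(extra_params or {})
    return f"{base_url}/select?{urlencode(params)}"


url = build_select_url(
    "http://localhost:8983/solr/collection1",  # host and core as in the thread
    {"inStock": "false"},
    {"facet": "true", "facet.field": "popularity"},
)
print(url)
```

The string placed in `q` is still query-parser syntax; Solr parses it on the server side, so code like this only assembles the input for the parser.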

Thanks,
Vivek


On Fri, Jun 20, 2014 at 6:06 PM, Daniel Collins 
wrote:

> I would say "*:*" is a human-readable/writable query, as is
> "inStock:false".  The former will be converted by the query parser into a
> MatchAllDocsQuery which is what Lucene understands.  The latter will be
> converted (again by the query parser) into some query.  Now this is where
> *which* query parser you are using is important.  Is "inStock" a word to be
> queried, or a field in your schema?  Probably the latter, but the query
> parser has to determine that using the Solr schema.  So I would expect that
> query to be converted to a TermQuery(Term("inStock", "false")), so a query
> for the value false in the field inStock.
>
> This is all interesting but what are you really trying to find out?  If you
> just want to run queries and see what they translate to, you can use the
> debug options when you send the query in, and then Solr will return to you
> both the raw query (with any other options that the query handler might
> have added to your query) as well as the Lucene Query generated from it.
>
> e.g. from running "*:*" on a Solr instance:
>
> "rawquerystring": "*:*",
> "querystring": "*:*",
> "parsedquery": "MatchAllDocsQuery(*:*)",
> "parsedquery_toString": "*:*",
> "QParser": "LuceneQParser",
>
> Or (this shows the difference between raw query syntax and parsed query
> syntax):
>
> "rawquerystring": "body_en:test AND headline_en:hello",
> "querystring": "body_en:test AND headline_en:hello",
> "parsedquery": "+body_en:test +headline_en:hello",
> "parsedquery_toString": "+body_en:test +headline_en:hello",
> "QParser": "LuceneQParser",

Re: About Query Parser

2014-06-20 Thread Daniel Collins
I would say "*:*" is a human-readable/writable query, as is
"inStock:false".  The former will be converted by the query parser into a
MatchAllDocsQuery which is what Lucene understands.  The latter will be
converted (again by the query parser) into some query.  Now this is where
*which* query parser you are using is important.  Is "inStock" a word to be
queried, or a field in your schema?  Probably the latter, but the query
parser has to determine that using the Solr schema.  So I would expect that
query to be converted to a TermQuery(Term("inStock", "false")), so a query
for the value false in the field inStock.

This is all interesting but what are you really trying to find out?  If you
just want to run queries and see what they translate to, you can use the
debug options when you send the query in, and then Solr will return to you
both the raw query (with any other options that the query handler might
have added to your query) as well as the Lucene Query generated from it.

e.g. from running "*:*" on a Solr instance:

"rawquerystring": "*:*",
"querystring": "*:*",
"parsedquery": "MatchAllDocsQuery(*:*)",
"parsedquery_toString": "*:*",
"QParser": "LuceneQParser",

Or (this shows the difference between raw query syntax and parsed query
syntax):

"rawquerystring": "body_en:test AND headline_en:hello",
"querystring": "body_en:test AND headline_en:hello",
"parsedquery": "+body_en:test +headline_en:hello",
"parsedquery_toString": "+body_en:test +headline_en:hello",
"QParser": "LuceneQParser",
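
A minimal sketch of how that debug information could be requested and read from a script. It assumes the standard `debugQuery=true` and `wt=json` parameters and a top-level `debug` section in the JSON response; the function names are hypothetical, and the response body is a hand-written stand-in built from the values quoted above, not output captured from a live server.

```python
import json
from urllib.parse import urlencode


def build_debug_url(base_url, query):
    """Build a select URL that asks Solr to include debug information."""
    params = {"q": query, "wt": "json", "debugQuery": "true"}
    return f"{base_url}/select?{urlencode(params)}"


def parsed_query(response_text):
    """Pull the parser name and parsed query out of a JSON response body."""
    debug = json.loads(response_text)["debug"]
    return debug["QParser"], debug["parsedquery"]


# Stand-in for a real response body, using the values quoted above.
sample_response = json.dumps({
    "debug": {
        "rawquerystring": "body_en:test AND headline_en:hello",
        "querystring": "body_en:test AND headline_en:hello",
        "parsedquery": "+body_en:test +headline_en:hello",
        "parsedquery_toString": "+body_en:test +headline_en:hello",
        "QParser": "LuceneQParser",
    }
})

print(build_debug_url("http://localhost:8983/solr/collection1", "*:*"))
print(parsed_query(sample_response))
```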



Re: About Query Parser

2014-06-20 Thread Vivekanand Ittigi
All right, let me put it this way:

http://192.168.1.78:8983/solr/collection1/select?q=inStock:false&facet=true&facet.field=popularity&wt=xml&indent=true

I just want to know what form this is. Is it a Lucene query, or does this query
have to go through a query parser to be converted into a Lucene query?


Thanks,
Vivek


On Fri, Jun 20, 2014 at 5:19 PM, Alexandre Rafalovitch 
wrote:

> That's *:* and a special case. There is no scoring here, nor searching.
> Just a dump of documents. Not even filtering or faceting. I sure hope you
> have more interesting examples.
>
> Regards,
> Alex
> On 20/06/2014 6:40 pm, "Vivekanand Ittigi"  wrote:
>
> > Hi Daniel,
> >
> > You said inputs are "human-generated" and outputs are "lucene objects".
> So
> > my question is what does the below query mean. Does this fall under
> > human-generated one or lucene.?
> >
> > http://localhost:8983/solr/collection1/select?q=*%3A*&wt=xml&indent=true
> >
> > Thanks,
> > Vivek
> >
> >
> >
> > On Fri, Jun 20, 2014 at 3:55 PM, Daniel Collins 
> > wrote:
> >
> > > Alexandre's response is very thorough, so I'm really simplifying
> things,
> > I
> > > confess but here's my "query parsers for dummies". :)
> > >
> > > In terms of inputs/outputs, a QueryParser takes a string (generally
> > assumed
> > > to be "human generated" i.e. something a user might type in, so maybe a
> > > sentence, a set of words, the format can vary) and outputs a Lucene
> Query
> > > object (
> > >
> > >
> >
> http://lucene.apache.org/core/4_8_1/core/org/apache/lucene/search/Query.html
> > > ),
> > > which in fact is a kind of "tree" (again, I'm simplifying I know)
> since a
> > > query can contain nested expressions.
> > >
> > > So very loosely its a translator from a human-generated query into the
> > > structure that Lucene can handle.  There are several different query
> > > parsers since they all use different input syntax, and ways of handling
> > > different constructs (to handle A and B, should the user type "+A +B"
> or
> > "A
> > > and B" or just "A B" for example), and have different levels of support
> > for
> > > the various Query structures that Lucene can handle: SpanQuery,
> > FuzzyQuery,
> > > PhraseQuery, etc.
> > >
> > > We for example use an XML-based query parser.  Why (you might well
> ask!),
> > > well we had an already used and supported query syntax of our own,
> which
> > > our users understood, so we couldn't use an off the shelf query parser.
> >  We
> > > could have built our own in Java, but for a variety of reasons we parse
> > our
> > > queries in a front-end system ahead of Solr (which is C++-based), so we
> > > needed an interim format to pass queries to Solr that was as near to a
> > > Lucene Query object as we could get (and there was an existing XML
> parser
> > > to save us starting from square one!).
> > >
> > > As part of that Query construction (but independent of which
> QueryParser
> > > you use), Solr will also make use of a set of Tokenizers and Filters (
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/solr/Understanding+Analyzers,+Tokenizers,+and+Filters
> > > )
> > > but that's more to do with dealing with the terms in the query (so in
> my
> > > examples above, is A a real word, does it need stemming, lowercasing,
> > > removing because its a stopword, etc).
> > >
> >
>


Re: About Query Parser

2014-06-20 Thread Alexandre Rafalovitch
That's *:* and a special case. There is no scoring here, nor searching.
Just a dump of documents. Not even filtering or faceting. I sure hope you
have more interesting examples.
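
To see that the URL in question really does carry `*:*`, here is a small sketch that decodes its query string with Python's standard library. Only the `q` value is handed to the query parser; the other parameters control the response.

```python
from urllib.parse import urlsplit, parse_qs

url = "http://localhost:8983/solr/collection1/select?q=*%3A*&wt=xml&indent=true"

params = parse_qs(urlsplit(url).query)
print(params["q"])   # the string that is handed to the query parser
print(params["wt"])  # response format, not part of the query itself
```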

Regards,
Alex
On 20/06/2014 6:40 pm, "Vivekanand Ittigi"  wrote:

> Hi Daniel,
>
> You said inputs are "human-generated" and outputs are "lucene objects". So
> my question is what does the below query mean. Does this fall under
> human-generated one or lucene.?
>
> http://localhost:8983/solr/collection1/select?q=*%3A*&wt=xml&indent=true
>
> Thanks,
> Vivek
>
>
>
> On Fri, Jun 20, 2014 at 3:55 PM, Daniel Collins 
> wrote:
>
> > Alexandre's response is very thorough, so I'm really simplifying things,
> I
> > confess but here's my "query parsers for dummies". :)
> >
> > In terms of inputs/outputs, a QueryParser takes a string (generally
> assumed
> > to be "human generated" i.e. something a user might type in, so maybe a
> > sentence, a set of words, the format can vary) and outputs a Lucene Query
> > object (
> >
> >
> http://lucene.apache.org/core/4_8_1/core/org/apache/lucene/search/Query.html
> > ),
> > which in fact is a kind of "tree" (again, I'm simplifying I know) since a
> > query can contain nested expressions.
> >
> > So very loosely its a translator from a human-generated query into the
> > structure that Lucene can handle.  There are several different query
> > parsers since they all use different input syntax, and ways of handling
> > different constructs (to handle A and B, should the user type "+A +B" or
> "A
> > and B" or just "A B" for example), and have different levels of support
> for
> > the various Query structures that Lucene can handle: SpanQuery,
> FuzzyQuery,
> > PhraseQuery, etc.
> >
> > We for example use an XML-based query parser.  Why (you might well ask!),
> > well we had an already used and supported query syntax of our own, which
> > our users understood, so we couldn't use an off the shelf query parser.
>  We
> > could have built our own in Java, but for a variety of reasons we parse
> our
> > queries in a front-end system ahead of Solr (which is C++-based), so we
> > needed an interim format to pass queries to Solr that was as near to a
> > Lucene Query object as we could get (and there was an existing XML parser
> > to save us starting from square one!).
> >
> > As part of that Query construction (but independent of which QueryParser
> > you use), Solr will also make use of a set of Tokenizers and Filters (
> >
> >
> https://cwiki.apache.org/confluence/display/solr/Understanding+Analyzers,+Tokenizers,+and+Filters
> > )
> > but that's more to do with dealing with the terms in the query (so in my
> > examples above, is A a real word, does it need stemming, lowercasing,
> > removing because its a stopword, etc).
> >
>


Re: About Query Parser

2014-06-20 Thread Vivekanand Ittigi
Hi Daniel,

You said the inputs are "human-generated" and the outputs are "Lucene objects".
So my question is: what does the query below mean? Does it fall under the
human-generated kind or the Lucene kind?

http://localhost:8983/solr/collection1/select?q=*%3A*&wt=xml&indent=true

Thanks,
Vivek



On Fri, Jun 20, 2014 at 3:55 PM, Daniel Collins 
wrote:

> Alexandre's response is very thorough, so I'm really simplifying things, I
> confess but here's my "query parsers for dummies". :)
>
> In terms of inputs/outputs, a QueryParser takes a string (generally assumed
> to be "human generated" i.e. something a user might type in, so maybe a
> sentence, a set of words, the format can vary) and outputs a Lucene Query
> object (
>
> http://lucene.apache.org/core/4_8_1/core/org/apache/lucene/search/Query.html
> ),
> which in fact is a kind of "tree" (again, I'm simplifying I know) since a
> query can contain nested expressions.
>
> So very loosely its a translator from a human-generated query into the
> structure that Lucene can handle.  There are several different query
> parsers since they all use different input syntax, and ways of handling
> different constructs (to handle A and B, should the user type "+A +B" or "A
> and B" or just "A B" for example), and have different levels of support for
> the various Query structures that Lucene can handle: SpanQuery, FuzzyQuery,
> PhraseQuery, etc.
>
> We for example use an XML-based query parser.  Why (you might well ask!),
> well we had an already used and supported query syntax of our own, which
> our users understood, so we couldn't use an off the shelf query parser.  We
> could have built our own in Java, but for a variety of reasons we parse our
> queries in a front-end system ahead of Solr (which is C++-based), so we
> needed an interim format to pass queries to Solr that was as near to a
> Lucene Query object as we could get (and there was an existing XML parser
> to save us starting from square one!).
>
> As part of that Query construction (but independent of which QueryParser
> you use), Solr will also make use of a set of Tokenizers and Filters (
>
> https://cwiki.apache.org/confluence/display/solr/Understanding+Analyzers,+Tokenizers,+and+Filters
> )
> but that's more to do with dealing with the terms in the query (so in my
> examples above, is A a real word, does it need stemming, lowercasing,
> removing because its a stopword, etc).
>


Re: About Query Parser

2014-06-20 Thread Daniel Collins
Alexandre's response is very thorough, so I confess I'm really simplifying
things, but here's my "query parsers for dummies". :)

In terms of inputs/outputs, a QueryParser takes a string (generally assumed
to be "human generated" i.e. something a user might type in, so maybe a
sentence, a set of words, the format can vary) and outputs a Lucene Query
object (
http://lucene.apache.org/core/4_8_1/core/org/apache/lucene/search/Query.html),
which in fact is a kind of "tree" (again, I'm simplifying I know) since a
query can contain nested expressions.

So, very loosely, it's a translator from a human-generated query into the
structure that Lucene can handle.  There are several different query
parsers because they use different input syntaxes and different ways of
handling particular constructs (to express "A and B", should the user type
"+A +B", "A and B", or just "A B", for example?), and they have different
levels of support for the various Query structures that Lucene can handle:
SpanQuery, FuzzyQuery, PhraseQuery, etc.
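
The point about different syntaxes mapping to one structure can be sketched with three toy parsers. Everything here is hypothetical: `Term` and `AllOf` are stand-ins invented for illustration, not Lucene's TermQuery or BooleanQuery, and the parsers handle only flat two-word inputs, not nested expressions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """Toy stand-in for a single-term query (not Lucene's TermQuery)."""
    text: str


@dataclass(frozen=True)
class AllOf:
    """Toy stand-in for a boolean query whose clauses are all required."""
    clauses: tuple


def parse_plus_syntax(text):
    """Syntax 1: '+A +B', where a leading '+' marks a required term."""
    return AllOf(tuple(Term(tok[1:]) for tok in text.split()
                       if tok.startswith("+")))


def parse_and_syntax(text):
    """Syntax 2: 'A and B', where terms are joined by the keyword 'and'."""
    return AllOf(tuple(Term(tok) for tok in text.split()
                       if tok.lower() != "and"))


def parse_implicit_and(text):
    """Syntax 3: 'A B', where whitespace alone means all terms are required."""
    return AllOf(tuple(Term(tok) for tok in text.split()))


tree = parse_plus_syntax("+A +B")
print(tree)
# All three input syntaxes yield the same query structure.
print(tree == parse_and_syntax("A and B") == parse_implicit_and("A B"))
```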

We, for example, use an XML-based query parser.  Why, you might well ask?
We already had a query syntax of our own in use and supported, which
our users understood, so we couldn't use an off-the-shelf query parser.  We
could have built our own in Java, but for a variety of reasons we parse our
queries in a front-end system (which is C++-based) ahead of Solr, so we
needed an interim format to pass queries to Solr that was as near to a
Lucene Query object as we could get (and there was an existing XML parser
to save us starting from square one!).

As part of that Query construction (but independent of which QueryParser
you use), Solr will also make use of a set of Tokenizers and Filters (
https://cwiki.apache.org/confluence/display/solr/Understanding+Analyzers,+Tokenizers,+and+Filters),
but that's more to do with handling the terms in the query (so in my
examples above: is A a real word, does it need stemming or lowercasing,
should it be removed because it's a stopword, etc.).
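
A toy sketch of what such a chain of term-handling steps might look like. It is not Solr's analyzer API: the stopword list, the whitespace tokenizer, and the "strip a trailing s" stemmer are deliberately crude assumptions, chosen only to show the steps running in sequence.

```python
STOPWORDS = {"a", "an", "and", "the", "of", "is"}  # tiny illustrative list


def tokenize(text):
    """Split on whitespace (a real tokenizer handles punctuation and more)."""
    return text.split()


def lowercase(tokens):
    """Normalize case so 'Lucene' and 'lucene' become the same term."""
    return [tok.lower() for tok in tokens]


def remove_stopwords(tokens):
    """Drop very common words that carry little search value."""
    return [tok for tok in tokens if tok not in STOPWORDS]


def naive_stem(tokens):
    """Strip a trailing 's' from longer tokens; a crude stand-in for stemming."""
    return [tok[:-1] if tok.endswith("s") and len(tok) > 3 else tok
            for tok in tokens]


def analyze(text):
    """Run the tokenizer, then each filter in order."""
    tokens = tokenize(text)
    for step in (lowercase, remove_stopwords, naive_stem):
        tokens = step(tokens)
    return tokens


print(analyze("The Parsers of Lucene"))
```

With this toy stopword list, the input "A and B" is reduced to the single term "b", which is the kind of effect the "is A a real word" question is getting at.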


Re: About Query Parser

2014-06-20 Thread Alexandre Rafalovitch
I am going to have a go at this. Maybe others can add/correct.

When you make a request to Solr, it hits a request handler first, e.g.
the "/select" request handler. That's defined in solrconfig.xml.

The request handler can change your request with some defaults,
required and overriding parameters.
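
In solrconfig.xml these handler-level settings live in sections named defaults, appends, and invariants. The sketch below is a simplified reading of how they combine with the client's parameters; the function name and the sample values are hypothetical, and a real handler has more rules than this.

```python
def apply_handler_params(request, defaults=None, appends=None, invariants=None):
    """Merge request parameters with handler-level settings.

    Every value is a list, because a parameter may appear more than once.
    """
    merged = {name: list(values) for name, values in (defaults or {}).items()}
    # Values sent by the client take precedence over defaults.
    for name, values in request.items():
        merged[name] = list(values)
    # Appended values are added alongside whatever is already there.
    for name, values in (appends or {}).items():
        merged.setdefault(name, []).extend(values)
    # Invariants win over everything, including the client's values.
    for name, values in (invariants or {}).items():
        merged[name] = list(values)
    return merged


effective = apply_handler_params(
    request={"q": ["inStock:false"], "rows": ["50"], "wt": ["xml"]},
    defaults={"rows": ["10"], "df": ["text"]},
    appends={"fq": ["popularity:[1 TO *]"]},
    invariants={"wt": ["json"]},
)
print(effective)
```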

A "solr.SearchHandler" can also define which stack of search components
then processes the actual request. It can define the stack explicitly
(e.g. the "/suggest" request handler), use the default stack, or
append/prepend to the default stack (e.g. the "/spell" request handler).

The default search component stack can be seen in the commented out
section of solrconfig.xml and consists of 6 components: query, facet,
mlt (MoreLikeThis), highlight, stats, and debug.

The query component is the one that actually does the searching and
figures out what the result documents are. It uses query parsers
for that. There are multiple query parsers available. The most common
are "standard/lucene", "dismax" and "edismax", but there are a bunch
more: https://cwiki.apache.org/confluence/display/solr/Query+Syntax+and+Parsing

If you don't have query components, you are not actually searching for
documents, you are doing something else (e.g. spelling).

These parsers transform what you sent in your URL (in the "q"
parameter, but also others) into the Lucene or internal queries that
return documents with some ranking attached.

Then, other components do their own things too. Facet components add
facets, highlight components add highlight sections based on the
already collected information, and so on.

Then, all that gets serialized into one of many supported formats
(XML, JSON, Ruby, etc) and sent back to the client.
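
The overall flow (components run in order, each adding its own section, then the result is serialized) can be sketched as a toy pipeline. None of this is Solr code: the in-memory `DOCS` list, the exact-match "query component", and the function names are all hypothetical, and JSON stands in for whichever response format is requested.

```python
import json

DOCS = [
    {"id": "1", "inStock": False, "popularity": 6},
    {"id": "2", "inStock": True, "popularity": 6},
    {"id": "3", "inStock": False, "popularity": 7},
]


def query_component(params, response):
    """Select matching documents (toy: exact match on one field:value)."""
    field, _, value = params["q"].partition(":")
    response["docs"] = [d for d in DOCS if str(d[field]).lower() == value]


def facet_component(params, response):
    """Count values of facet.field over the documents already selected."""
    field = params.get("facet.field")
    if params.get("facet") == "true" and field:
        counts = {}
        for doc in response["docs"]:
            key = str(doc[field])
            counts[key] = counts.get(key, 0) + 1
        response["facet_counts"] = {field: counts}


def handle(params, components=(query_component, facet_component)):
    """Run each component in order, then serialize the combined response."""
    response = {}
    for component in components:  # each component adds its own section
        component(params, response)
    return json.dumps(response, indent=2)


print(handle({"q": "inStock:false", "facet": "true", "facet.field": "popularity"}))
```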

If you want examples, then just read through solrconfig.xml and
schema.xml and understand how they hang together. That's why they are
so long: so people can see the defaults and examples. If you did not
care for that, your solrconfig.xml could be as small as:
https://github.com/arafalov/solr-indexing-book/blob/master/published/collection1/conf/solrconfig.xml

Regards,
   Alex.
P.s. The interesting question in return is "where are you stuck that
you think that knowing what query parser is will move you further
ahead?"
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Fri, Jun 20, 2014 at 3:55 PM, Vivekanand Ittigi
 wrote:
> Hi,
>
> I think this might be a silly question but i want to make it clear.
>
> What is query parser...? What does it do.? I know its used for converting
> query. But from What to what?what is the input and what is the output of
> query parser. And where exactly this feature can be used?
>
> If possible please explain with the example. It really helps a lot?
>
> Thanks,
> Vivek


About Query Parser

2014-06-20 Thread Vivekanand Ittigi
Hi,

I think this might be a silly question, but I want to make it clear.

What is a query parser? What does it do? I know it's used for converting
a query, but from what to what? What is the input and what is the output of
a query parser? And where exactly can this feature be used?

If possible, please explain with an example. It would really help a lot.

Thanks,
Vivek