Re: query parsing output in analysis page and query page are different

2017-08-07 Thread Erick Erickson
Your problem is probably that the query _parser_ gets in there before
the input reaches the analysis chain. When you use the admin/analysis
page, it's as though the query parser has already broken the query up
and assigned each piece to a field.

Add to that that wildcard queries have their own quirks when parsing
(only the "multi-term aware" parts of an analysis chain are applied to
them) and... it's kind of confusing. Try escaping the asterisk, as it
has special meaning for the query parser.
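For example, something like this compares the two forms with debugQuery (a
rough sketch using the core, field, and port from the post below; %5C is a
URL-encoded backslash, which tells the parser to treat the asterisk as a
literal character rather than a wildcard):

  # wildcard query: only "multi-term aware" analysis components are applied
  curl 'http://localhost:8984/solr/tenant1-core-1/select?q=text:host1-dev*&debugQuery=on'

  # escaped asterisk: the term goes through the full analysis chain
  curl 'http://localhost:8984/solr/tenant1-core-1/select?q=text:host1-dev%5C*&debugQuery=on'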

Best,
Erick

On Mon, Aug 7, 2017 at 6:38 PM, radha krishnan
 wrote:
> Hi,
>
> I created the following fieldType in schema.xml
>
> <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
>     [tokenizer and filter definitions stripped by the list archive]
>   </analyzer>
> </fieldType>
>
>
> mapping.txt contains the following (replacing dot with whitespace):
>
> "." => " "
>
> and I am using the above fieldType in this field definition:
>
> <field name="text" type="..." required="true" stored="false" />
> [remaining attributes stripped by the list archive]
>
>
> 1. in the analysis page on the solr UI
> (http://localhost:8984/solr/#/tenant1-core-1/analysis)
>
>  I entered the following in the query tab: host1-dev*
>
>  I got the following output: host1, dev
>
> 2. I inserted a document where 'text' contains the value host1-dev.eng.abc.com
>
> 3. When i go to the query page,
> (http://localhost:8984/solr/#/tenant1-core-1/query)
>
>    and used this query: text:host1-dev* (with debug enabled).
>
>    I am not getting the row I inserted in step 2.
>
>    I also noticed "parsedquery_toString":"text:host1-dev*" in the debug
>    output.
>
>    It should have been text:host1 and text:dev*.
>
>
> Can you please guide me on how I can make the query work?
>
>
> Query output (with debug enabled)
>
> {
>   "responseHeader":{
> "status":0,
> "QTime":1,
> "params":{
>   "q":"text:host1-dev*",
>   "indent":"on",
>   "wt":"json",
>   "debugQuery":"on",
>   "_":"1502149448777"}},
>   "response":{"numFound":0,"start":0,"docs":[]
>   },
>   "debug":{
> "rawquerystring":"text:host1-dev*",
> "querystring":"text:host1-dev*",
> "parsedquery":"text:host1-dev*",
> "parsedquery_toString":"text:host1-dev*",
> "explain":{},
> "QParser":"LuceneQParser",
> "timing":{
>   "time":1.0,
>   "prepare":{
> "time":0.0,
> "query":{
>   "time":0.0},
> "facet":{
>   "time":0.0},
> "facet_module":{
>   "time":0.0},
> "mlt":{
>   "time":0.0},
> "highlight":{
>   "time":0.0},
> "stats":{
>   "time":0.0},
> "expand":{
>   "time":0.0},
> "terms":{
>   "time":0.0},
> "debug":{
>   "time":0.0}},
>   "process":{
> "time":0.0,
> "query":{
>   "time":0.0},
> "facet":{
>   "time":0.0},
> "facet_module":{
>   "time":0.0},
> "mlt":{
>   "time":0.0},
> "highlight":{
>   "time":0.0},
> "stats":{
>   "time":0.0},
> "expand":{
>   "time":0.0},
> "terms":{
>   "time":0.0},
> "debug":{
>   "time":0.0}
>
>
>
>
>
> Thanks,
>
> D.Radhakrishnan


Re: query parsing

2015-09-28 Thread Alessandro Benedetti
>>> [quoted data-config.xml entity stripped by the list archive; its
>>> deltaQuery was: SELECT posttime AS id FROM eventlogtext
>>> WHERE lastmodtime > '${dataimporter.last_index_time}';]
>>>
>>> Hope this helps!
>>>
>>> Thanks,
>>> Mark
>>>
>>> On 9/24/2015 10:57 AM, Erick Erickson wrote:
>>>
>>>> Geraint:
>>>>
>>>> Good Catch! I totally missed that. So all of our focus on schema.xml has
>>>> been... totally irrelevant. Now that you pointed that out, there's also
>>>> the
>>>> addition: add-unknown-fields-to-the-schema, which indicates you started
>>>> this up in "schemaless" mode.
>>>>
>>>> In short, solr is trying to guess what your field types should be and
>>>> guessing wrong (again and again and again). This is the classic weakness
>>>> of
>>>> schemaless. It's great for indexing stuff fast, but if it guesses wrong
>>>> you're stuck.
>>>>
>>>>
>>>> So to the original problem: I'd start over and either
>>>> 1> use the regular setup, not schemaless
>>>> or
>>>> 2> use the _managed_ schema API to explicitly add fields and fieldTypes
>>>> to
>>>> the managed schema
>>>>
>>>> Best,
>>>> Erick
>>>>
>>>> On Thu, Sep 24, 2015 at 2:02 AM, Duck Geraint (ext) GBJH <
>>>> geraint.d...@syngenta.com> wrote:
>>>>
>>>> Okay, so maybe I'm missing something here (I'm still relatively new to
>>>>> Solr myself), but am I right in thinking the following is still in your
>>>>> solrconfig.xml file:
>>>>>
>>>>> <schemaFactory class="ManagedIndexSchemaFactory">
>>>>>   <bool name="mutable">true</bool>
>>>>>   <str name="managedSchemaResourceName">managed-schema</str>
>>>>> </schemaFactory>
>>>>>
>>>>> If so, wouldn't using a managed schema make several of your field
>>>>> definitions inside the schema.xml file semi-redundant?
>>>>>
>>>>> Regards,
>>>>> Geraint
>>>>>
>>>>>
>>>>> Geraint Duck
>>>>> Data Scientist
>>>>> Toxicology and Health Sciences
>>>>> Syngenta UK
>>>>> Email: geraint.d...@syngenta.com
>>>>>
>>>>>
>>>>> -Original Message-
>>>>> From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com]
>>>>> Sent: 24 September 2015 09:23
>>>>> To: solr-user@lucene.apache.org
>>>>> Subject: Re: query parsing
>>>>>
>>>>> I would focus on this :
>>>>>
>>>>> "
>>>>>
>>>>> 5> now kick off the DIH job and look again.
>>>>>>
>>>>>> Now it shows a histogram, but most of the "terms" are long -- the full
>>>>> texts of (the table.column) eventlogtext.logtext, including the
>>>>> whitespace
>>>>> (with %0A used for newline characters)...  So, it appears it is not
>>>>> being
>>>>> tokenized properly, correct?"
>>>>> Can you open the schema.xml from your Solr UI and show us the
>>>>> snippets
>>>>> for the field that seems not to tokenise?
>>>>> Can you show us (even a screenshot is fine) the related schema
>>>>> browser page?
>>>>> Could it be a problem of encoding?
>>>>> Following Erick's details about the analysis, what are your results?
>>>>>
>>>>> Cheers
>>>>>
>>>>> 2015-09-24 8:04 GMT+01:00 Upayavira <u...@odoko.co.uk>:
>>>>>
>>>>> typically, the index dir is inside the data dir. Delete the index dir
>>>>>> and you should be good. If there is a tlog next to it, you might want
>>>>>> to delete that also.
>>>>>>
>>>>>> If you don't have a data dir, I wonder whether you set the data dir
>>>>>> when creating your core or collection. Typically the instance dir and
>>>>>> data dir aren't needed.
>>>>>>
>>>>>> Upayavira
>>>>>>
>>>>>> On Wed, Sep 23, 2015, at 10:46 PM, Erick Erickson wrote:
>>>>>>
>>>>>>> OK, this is bizarre. You'd have had to set up SolrCloud by
>>>>>>> specifying the -zkRun command when you start Solr or the -z

Re: query parsing

2015-09-27 Thread Mark Fenbers
Geraint Duck
Data Scientist
Toxicology and Health Sciences
Syngenta UK
Email: geraint.d...@syngenta.com


-Original Message-
From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com]
Sent: 24 September 2015 09:23
To: solr-user@lucene.apache.org
Subject: Re: query parsing

I would focus on this :

"


5> now kick off the DIH job and look again.


Now it shows a histogram, but most of the "terms" are long -- the full
texts of (the table.column) eventlogtext.logtext, including the
whitespace
(with %0A used for newline characters)...  So, it appears it is not being
tokenized properly, correct?"
Can you open the schema.xml from your Solr UI and show us the snippets
for the field that seems not to tokenise?
Can you show us (even a screenshot is fine) the related schema browser
page?
Could it be a problem of encoding?
Following Erick's details about the analysis, what are your results?

Cheers

2015-09-24 8:04 GMT+01:00 Upayavira <u...@odoko.co.uk>:


typically, the index dir is inside the data dir. Delete the index dir
and you should be good. If there is a tlog next to it, you might want
to delete that also.

If you don't have a data dir, I wonder whether you set the data dir
when creating your core or collection. Typically the instance dir and
data dir aren't needed.

Upayavira

On Wed, Sep 23, 2015, at 10:46 PM, Erick Erickson wrote:

OK, this is bizarre. You'd have had to set up SolrCloud by
specifying the -zkRun command when you start Solr or the -zkHost;
highly unlikely. On the admin page there would be a "cloud" link on
the left side, I really doubt one's there.

You should have a data directory, it should be the parent of the
index and tlog directories. As a sanity check, try looking at the
analysis page. Type
a bunch of words in the left hand side indexing box and uncheck the
verbose box. As you can tell I'm grasping at straws. I'm still
puzzled why you don't have a "data" directory here, but that
shouldn't really matter. How did you create this index? I don't mean
the data import handler; I mean how did you create the core that you're
indexing to?

Best,
Erick

On Wed, Sep 23, 2015 at 10:16 AM, Mark Fenbers
<mark.fenb...@noaa.gov>
wrote:


On 9/23/2015 12:30 PM, Erick Erickson wrote:


Then my next guess is you're not pointing at the index you think you are
when you 'rm -rf data'

Just ignore the Elall field for now I should think, although get rid of it
if you don't think you need it.

DIH should be irrelevant here.

So let's back up.
1> go ahead and "rm -fr data" (with Solr stopped).


I have no "data" dir.  Did you mean "index" dir?  I removed 3
index directories (2 for spelling):
cd /localapps/dev/eventLog; rm -rfv index solr/spFile solr/spIndex


2> start Solr
3> do NOT re-index.
4> look at your index via the schema-browser. Of course there should be
nothing there!


Correct!  It said "there is no term info :("


5> now kick off the DIH job and look again.


Now it shows a histogram, but most of the "terms" are long -- the
full texts of (the table.column) eventlogtext.logtext, including the
whitespace (with %0A used for newline characters)...  So, it appears it is
not being tokenized properly, correct?


Your logtext field should have only single tokens. The fact that you have
some very long tokens (presumably with whitespace) indicates that you
aren't really blowing the index away between indexing.


Well, I did this time for sure.  I verified that initially,
because it showed there was no term info until I DIH'd again.


Are you perhaps in Solr Cloud with more than one replica?


Not that I know of, but being new to Solr, there could be things going on
that I'm not aware of.  How can I tell?  I certainly didn't set anything up
for solrCloud deliberately.


In that case you
might be getting the index replicated on startup assuming you
didn't blow away all replicas. If you are in SolrCloud, I'd just
delete the collection and start over, after ensuring that you'd
pushed the configset up to Zookeeper.

BTW, I always look at the schema.xml file from the Solr admin window just
as a sanity check in these situations.


Good idea!  But the one shown in the browser is identical to the one I've
been editing!  So that's not an issue.




--
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England








Re: query parsing

2015-09-26 Thread Erick Erickson
No need to re-install Solr, just create a new core, this time it'd probably be
easiest to use the bin/solr create_core command. In the Solr
directory just type bin/solr create_core -help to see the options.
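For example, something like this (a sketch -- the core name is illustrative,
and in 5.x the -d option can point at one of the bundled configsets, e.g.
basic_configs, which does not try to guess field types):

  bin/solr create_core -c eventlog -d basic_configs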

We're pretty much trying to migrate to using bin/solr for all the maintenance
we can, but as always the documentation lags the code.

Yeah, things are a bit ragged. The admin UI/core UI is really a legacy
bit of code that has _always_ been confusing, I'm hoping we can pretty
much remove it at some point since it's as trappy as it is.

Best,
Erick

On Sat, Sep 26, 2015 at 12:49 PM, Mark Fenbers <mark.fenb...@noaa.gov> wrote:
> OK, a lot of dialog while I was gone for two days!  I read the whole thread,
> but I'm a newbie to Solr, so some of the dialog was Greek to me.  I
> understand the words, of course, but applying it so I know exactly what to
> do without screwing something else up is the problem.  After all, that is
> how I got into the mess in the first place.  I'm glad I have good help to
> untangle the knots I've made!
>
> I'd like to start over (option 1 below), but does this mean delete all my
> config and reinstalling Solr??  Maybe that is not a bad idea, but I will at
> least save off my data-config.xml as that is clearly the one thing that is
> probably working right.  However, I did do quite a bit of editing that I
> would have to do again. Please advise...
>
> To be fair, I must answer Erick's question of how I created the data index
> in the first place, because this might be relevant...
>
> The bulk of the data is read from 9000+ text files, where each file was
> manually typed.  Before inserting into the database, I do a little bit of
> processing of the text using "sed" to delete the top few and bottom few
> lines, and to substitute each single-quote character with a pair of
> single-quotes (so PostgreSQL doesn't choke).  Line-feed characters are
> preserved as ASCII 10 (hex 0A), but there shouldn't be (and I am not aware
> of) any characters aside from what is on the keyboard.
>
> Next, I insert it with this command:
> psql -U awips -d OHRFC -c "INSERT INTO EventLogText VALUES('$postDate',
> '$user', '$postDate', '$entryText', '$postCatVal');"
>
> In case you are wondering about my table, it is defined in this way:
> CREATE TABLE eventlogtext (
>   posttime timestamp without time zone NOT NULL, -- Timestamp of this
> entry's original posting
>   username character varying(8), -- username (logname) of the original
> poster
>   lastmodtime timestamp without time zone, -- Last time record was altered
>   logtext text, -- text of the log entry
>   category integer, -- bit-wise category value
>   CONSTRAINT eventlogtext_pkey PRIMARY KEY (posttime)
> )
>
> To do the indexing, I merely use /dataimport?full-import, but it knows what
> to do from my data-config.xml; which is here:
>
> <dataConfig>
>   <dataSource driver="org.postgresql.Driver"
>               url="jdbc:postgresql://dx1f/OHRFC" user="awips" />
>   <document>
>     <entity name="eventlogtext" query="[stripped by the list archive]"
>             deltaQuery="SELECT posttime AS id FROM eventlogtext WHERE
>                         lastmodtime > '${dataimporter.last_index_time}';">
>       [field mappings stripped by the list archive]
>     </entity>
>   </document>
> </dataConfig>
>
> Hope this helps!
>
> Thanks,
> Mark
>
> On 9/24/2015 10:57 AM, Erick Erickson wrote:
>>
>> Geraint:
>>
>> Good Catch! I totally missed that. So all of our focus on schema.xml has
>> been... totally irrelevant. Now that you pointed that out, there's also
>> the
>> addition: add-unknown-fields-to-the-schema, which indicates you started
>> this up in "schemaless" mode.
>>
>> In short, solr is trying to guess what your field types should be and
>> guessing wrong (again and again and again). This is the classic weakness
>> of
>> schemaless. It's great for indexing stuff fast, but if it guesses wrong
>> you're stuck.
>>
>>
>> So to the original problem: I'd start over and either
>> 1> use the regular setup, not schemaless
>> or
>> 2> use the _managed_ schema API to explicitly add fields and fieldTypes to
>> the managed schema
>>
>> Best,
>> Erick
>>
>> On Thu, Sep 24, 2015 at 2:02 AM, Duck Geraint (ext) GBJH <
>> geraint.d...@syngenta.com> wrote:
>>
>>> Okay, so maybe I'm missing something here (I'm still relatively new to
>>> Solr myself), but am I right in thinking the following is still in your
>>> solrconfig.xml file:
>>>
>>> <schemaFactory class="ManagedIndexSchemaFactory">
>>>   <bool name="mutable">true</bool>
>>>   <str name="managedSchemaResourceName">managed-schema</str>
>>> </schemaFactory>
>>>
>>> If so, wouldn't using a managed schema make several of your field
>>> definitions inside the schema.xml file semi-redundant?
>>>

Re: query parsing

2015-09-26 Thread Mark Fenbers
OK, a lot of dialog while I was gone for two days!  I read the whole 
thread, but I'm a newbie to Solr, so some of the dialog was Greek to 
me.  I understand the words, of course, but applying it so I know 
exactly what to do without screwing something else up is the problem.  
After all, that is how I got into the mess in the first place.  I'm glad 
I have good help to untangle the knots I've made!


I'd like to start over (option 1 below), but does this mean delete all 
my config and reinstalling Solr??  Maybe that is not a bad idea, but I 
will at least save off my data-config.xml as that is clearly the one 
thing that is probably working right.  However, I did do quite a bit of 
editing that I would have to do again. Please advise...


To be fair, I must answer Erick's question of how I created the data 
index in the first place, because this might be relevant...


The bulk of the data is read from 9000+ text files, where each file was 
manually typed.  Before inserting into the database, I do a little bit 
of processing of the text using "sed" to delete the top few and bottom 
few lines, and to substitute each single-quote character with a pair of 
single-quotes (so PostgreSQL doesn't choke).  Line-feed characters are 
preserved as ASCII 10 (hex 0A), but there shouldn't be (and I am not 
aware of) any characters aside from what is on the keyboard.
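[A rough sketch of the preprocessing described above; the original commands
were not posted, so the line counts here are purely illustrative:

  # drop the first 2 and last 2 lines, then double any embedded single
  # quotes so the text is safe inside a SQL string literal
  sed '1,2d' file.txt | head -n -2 | sed "s/'/''/g" > cleaned.txt
]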


Next, I insert it with this command:
psql -U awips -d OHRFC -c "INSERT INTO EventLogText VALUES('$postDate', 
'$user', '$postDate', '$entryText', '$postCatVal');"


In case you are wondering about my table, it is defined in this way:
CREATE TABLE eventlogtext (
  posttime timestamp without time zone NOT NULL, -- Timestamp of this 
entry's original posting
  username character varying(8), -- username (logname) of the original 
poster

  lastmodtime timestamp without time zone, -- Last time record was altered
  logtext text, -- text of the log entry
  category integer, -- bit-wise category value
  CONSTRAINT eventlogtext_pkey PRIMARY KEY (posttime)
)

To do the indexing, I merely use /dataimport?full-import, but it knows 
what to do from my data-config.xml; which is here:



<dataConfig>
  <dataSource driver="org.postgresql.Driver"
              url="jdbc:postgresql://dx1f/OHRFC" user="awips" />
  <document>
    <entity name="eventlogtext" query="[stripped by the list archive]"
            deltaQuery="SELECT posttime AS id FROM eventlogtext
                        WHERE lastmodtime > '${dataimporter.last_index_time}';">
      [field mappings stripped by the list archive]
    </entity>
  </document>
</dataConfig>
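[For reference, a DIH full import is normally kicked off with the command
parameter spelled out; a sketch, with the core name taken from this thread:

  curl 'http://localhost:8983/solr/EventLog/dataimport?command=full-import'
  # and to check progress:
  curl 'http://localhost:8983/solr/EventLog/dataimport?command=status'
]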

Hope this helps!

Thanks,
Mark

On 9/24/2015 10:57 AM, Erick Erickson wrote:

Geraint:

Good Catch! I totally missed that. So all of our focus on schema.xml has
been... totally irrelevant. Now that you pointed that out, there's also the
addition: add-unknown-fields-to-the-schema, which indicates you started
this up in "schemaless" mode.

In short, solr is trying to guess what your field types should be and
guessing wrong (again and again and again). This is the classic weakness of
schemaless. It's great for indexing stuff fast, but if it guesses wrong
you're stuck.


So to the original problem: I'd start over and either
1> use the regular setup, not schemaless
or
2> use the _managed_ schema API to explicitly add fields and fieldTypes to
the managed schema

Best,
Erick

On Thu, Sep 24, 2015 at 2:02 AM, Duck Geraint (ext) GBJH <
geraint.d...@syngenta.com> wrote:


Okay, so maybe I'm missing something here (I'm still relatively new to
Solr myself), but am I right in thinking the following is still in your
solrconfig.xml file:

   
   <schemaFactory class="ManagedIndexSchemaFactory">
     <bool name="mutable">true</bool>
     <str name="managedSchemaResourceName">managed-schema</str>
   </schemaFactory>

If so, wouldn't using a managed schema make several of your field
definitions inside the schema.xml file semi-redundant?

Regards,
Geraint


Geraint Duck
Data Scientist
Toxicology and Health Sciences
Syngenta UK
Email: geraint.d...@syngenta.com


-Original Message-
From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com]
Sent: 24 September 2015 09:23
To: solr-user@lucene.apache.org
Subject: Re: query parsing

I would focus on this :

"


5> now kick off the DIH job and look again.


Now it shows a histogram, but most of the "terms" are long -- the full
texts of (the table.column) eventlogtext.logtext, including the whitespace
(with %0A used for newline characters)...  So, it appears it is not being
tokenized properly, correct?"
Can you open the schema.xml from your Solr UI and show us the snippets
for the field that seems not to tokenise?
Can you show us (even a screenshot is fine) the related schema browser
page?
Could it be a problem of encoding?
Following Erick's details about the analysis, what are your results?

Cheers

2015-09-24 8:04 GMT+01:00 Upayavira <u...@odoko.co.uk>:


typically, the index dir is inside the data dir. Delete the index dir
and you should be good. If there is a tlog next to it, you might want
to delete that also.

If you don't have a data dir, I wonder whether you set the data dir
when creating your core or collection. Typically the ins

Re: query parsing

2015-09-24 Thread Upayavira
typically, the index dir is inside the data dir. Delete the index dir
and you should be good. If there is a tlog next to it, you might want to
delete that also.
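On a typical single-core install that might look something like this (a
sketch; the actual paths depend on where your core lives):

  # with Solr stopped
  rm -rf server/solr/<corename>/data/index
  rm -rf server/solr/<corename>/data/tlog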

If you don't have a data dir, I wonder whether you set the data dir when
creating your core or collection. Typically the instance dir and data
dir aren't needed.

Upayavira

On Wed, Sep 23, 2015, at 10:46 PM, Erick Erickson wrote:
> OK, this is bizarre. You'd have had to set up SolrCloud by specifying the
> -zkRun command when you start Solr or the -zkHost; highly unlikely. On
> the
> admin page there would be a "cloud" link on the left side, I really doubt
> one's there.
> 
> You should have a data directory, it should be the parent of the index
> and
> tlog directories. As a sanity check, try looking at the analysis page.
> Type
> a bunch of words in the left hand side indexing box and uncheck the
> verbose
> box. As you can tell I'm grasping at straws. I'm still puzzled why you
> don't have a "data" directory here, but that shouldn't really matter. How
> did you create this index? I don't mean the data import handler; I mean
> how did you create the core that you're indexing to?
> 
> Best,
> Erick
> 
> On Wed, Sep 23, 2015 at 10:16 AM, Mark Fenbers 
> wrote:
> 
> > On 9/23/2015 12:30 PM, Erick Erickson wrote:
> >
> >> Then my next guess is you're not pointing at the index you think you are
> >> when you 'rm -rf data'
> >>
> >> Just ignore the Elall field for now I should think, although get rid of it
> >> if you don't think you need it.
> >>
> >> DIH should be irrelevant here.
> >>
> >> So let's back up.
> >> 1> go ahead and "rm -fr data" (with Solr stopped).
> >>
> > I have no "data" dir.  Did you mean "index" dir?  I removed 3 index
> > directories (2 for spelling):
> > cd /localapps/dev/eventLog; rm -rfv index solr/spFile solr/spIndex
> >
> >> 2> start Solr
> >> 3> do NOT re-index.
> >> 4> look at your index via the schema-browser. Of course there should be
> >> nothing there!
> >>
> > Correct!  It said "there is no term info :("
> >
> >> 5> now kick off the DIH job and look again.
> >>
> > Now it shows a histogram, but most of the "terms" are long -- the full
> > texts of (the table.column) eventlogtext.logtext, including the whitespace
> > (with %0A used for newline characters)...  So, it appears it is not being
> > tokenized properly, correct?
> >
> >> Your logtext field should have only single tokens. The fact that you have
> >> some very
> >> long tokens (presumably with whitespace) indicates that you aren't really
> >> blowing
> >> the index away between indexing.
> >>
> > Well, I did this time for sure.  I verified that initially, because it
> > showed there was no term info until I DIH'd again.
> >
> >> Are you perhaps in Solr Cloud with more than one replica?
> >>
> > Not that I know of, but being new to Solr, there could be things going on
> > that I'm not aware of.  How can I tell?  I certainly didn't set anything up
> > for solrCloud deliberately.
> >
> >> In that case you
> >> might be getting the index replicated on startup assuming you didn't
> >> blow away all replicas. If you are in SolrCloud, I'd just delete the
> >> collection and
> >> start over, after ensuring that you'd pushed the configset up to
> >> Zookeeper.
> >>
> >> BTW, I always look at the schema.xml file from the Solr admin window just
> >> as
> >> a sanity check in these situations.
> >>
> > Good idea!  But the one shown in the browser is identical to the one I've
> > been editing!  So that's not an issue.
> >
> >


Re: query parsing

2015-09-24 Thread Alessandro Benedetti
I would focus on this :

"

> 5> now kick off the DIH job and look again.
>
Now it shows a histogram, but most of the "terms" are long -- the full
texts of (the table.column) eventlogtext.logtext, including the whitespace
(with %0A used for newline characters)...  So, it appears it is not being
tokenized properly, correct?"
Can you open the schema.xml from your Solr UI and show us the snippets
for the field that seems not to tokenise?
Can you show us (even a screenshot is fine) the related schema browser
page?
Could it be a problem of encoding?
Following Erick's details about the analysis, what are your results?

Cheers

2015-09-24 8:04 GMT+01:00 Upayavira :

> typically, the index dir is inside the data dir. Delete the index dir
> and you should be good. If there is a tlog next to it, you might want to
> delete that also.
>
> If you don't have a data dir, I wonder whether you set the data dir when
> creating your core or collection. Typically the instance dir and data
> dir aren't needed.
>
> Upayavira
>
> On Wed, Sep 23, 2015, at 10:46 PM, Erick Erickson wrote:
> > OK, this is bizarre. You'd have had to set up SolrCloud by specifying the
> > -zkRun command when you start Solr or the -zkHost; highly unlikely. On
> > the
> > admin page there would be a "cloud" link on the left side, I really doubt
> > one's there.
> >
> > You should have a data directory, it should be the parent of the index
> > and
> > tlog directories. As a sanity check, try looking at the analysis page.
> > Type
> > a bunch of words in the left hand side indexing box and uncheck the
> > verbose
> > box. As you can tell I'm grasping at straws. I'm still puzzled why you
> > don't have a "data" directory here, but that shouldn't really matter. How
> > did you create this index? I don't mean the data import handler; I mean
> > how did you create the core that you're indexing to?
> >
> > Best,
> > Erick
> >
> > On Wed, Sep 23, 2015 at 10:16 AM, Mark Fenbers 
> > wrote:
> >
> > > On 9/23/2015 12:30 PM, Erick Erickson wrote:
> > >
> > >> Then my next guess is you're not pointing at the index you think you
> > >> are when you 'rm -rf data'
> > >>
> > >> Just ignore the Elall field for now I should think, although get rid
> > >> of it if you don't think you need it.
> > >>
> > >> DIH should be irrelevant here.
> > >>
> > >> So let's back up.
> > >> 1> go ahead and "rm -fr data" (with Solr stopped).
> > >>
> > > I have no "data" dir.  Did you mean "index" dir?  I removed 3 index
> > > directories (2 for spelling):
> > > cd /localapps/dev/eventLog; rm -rfv index solr/spFile solr/spIndex
> > >
> > >> 2> start Solr
> > >> 3> do NOT re-index.
> > >> 4> look at your index via the schema-browser. Of course there should
> > >> be nothing there!
> > >>
> > > Correct!  It said "there is no term info :("
> > >
> > >> 5> now kick off the DIH job and look again.
> > >>
> > > Now it shows a histogram, but most of the "terms" are long -- the full
> > > texts of (the table.column) eventlogtext.logtext, including the
> > > whitespace (with %0A used for newline characters)...  So, it appears it
> > > is not being tokenized properly, correct?
> > >
> > >> Your logtext field should have only single tokens. The fact that you
> > >> have some very long tokens (presumably with whitespace) indicates that
> > >> you aren't really blowing the index away between indexing.
> > >>
> > > Well, I did this time for sure.  I verified that initially, because it
> > > showed there was no term info until I DIH'd again.
> > >
> > >> Are you perhaps in Solr Cloud with more than one replica?
> > >>
> > > Not that I know of, but being new to Solr, there could be things going
> > > on that I'm not aware of.  How can I tell?  I certainly didn't set
> > > anything up for solrCloud deliberately.
> > >
> > >> In that case you
> > >> might be getting the index replicated on startup assuming you didn't
> > >> blow away all replicas. If you are in SolrCloud, I'd just delete the
> > >> collection and
> > >> start over, after ensuring that you'd pushed the configset up to
> > >> Zookeeper.
> > >>
> > >> BTW, I always look at the schema.xml file from the Solr admin window
> > >> just as a sanity check in these situations.
> > >>
> > > Good idea!  But the one shown in the browser is identical to the one
> > > I've been editing!  So that's not an issue.
> > >
> > >
>



-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


RE: query parsing

2015-09-24 Thread Duck Geraint (ext) GBJH
Okay, so maybe I'm missing something here (I'm still relatively new to Solr 
myself), but am I right in thinking the following is still in your 
solrconfig.xml file:

  
  <schemaFactory class="ManagedIndexSchemaFactory">
    <bool name="mutable">true</bool>
    <str name="managedSchemaResourceName">managed-schema</str>
  </schemaFactory>

If so, wouldn't using a managed schema make several of your field definitions 
inside the schema.xml file semi-redundant?
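(For reference, the usual alternative, if you want Solr to read a classic
hand-edited schema.xml instead, is to swap that snippet for the classic
factory -- a one-line sketch:

  <schemaFactory class="ClassicIndexSchemaFactory"/>

after which schema.xml becomes authoritative again.)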

Regards,
Geraint


Geraint Duck
Data Scientist
Toxicology and Health Sciences
Syngenta UK
Email: geraint.d...@syngenta.com


-Original Message-
From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com]
Sent: 24 September 2015 09:23
To: solr-user@lucene.apache.org
Subject: Re: query parsing

I would focus on this :

"

> 5> now kick off the DIH job and look again.
>
Now it shows a histogram, but most of the "terms" are long -- the full texts of 
(the table.column) eventlogtext.logtext, including the whitespace (with %0A 
used for newline characters)...  So, it appears it is not being tokenized 
properly, correct?"
Can you open the schema.xml from your Solr UI and show us the snippets for
the field that seems not to tokenise?
Can you show us (even a screenshot is fine) the related schema browser page?
Could it be a problem of encoding?
Following Erick's details about the analysis, what are your results?

Cheers

2015-09-24 8:04 GMT+01:00 Upayavira <u...@odoko.co.uk>:

> typically, the index dir is inside the data dir. Delete the index dir
> and you should be good. If there is a tlog next to it, you might want
> to delete that also.
>
> If you don't have a data dir, I wonder whether you set the data dir
> when creating your core or collection. Typically the instance dir and
> data dir aren't needed.
>
> Upayavira
>
> On Wed, Sep 23, 2015, at 10:46 PM, Erick Erickson wrote:
> > OK, this is bizarre. You'd have had to set up SolrCloud by
> > specifying the -zkRun command when you start Solr or the -zkHost;
> > highly unlikely. On the admin page there would be a "cloud" link on
> > the left side, I really doubt one's there.
> >
> > You should have a data directory, it should be the parent of the
> > index and tlog directories. As a sanity check, try looking at the
> > analysis page.
> > Type
> > a bunch of words in the left hand side indexing box and uncheck the
> > verbose box. As you can tell I'm grasping at straws. I'm still
> > puzzled why you don't have a "data" directory here, but that
> > shouldn't really matter. How did you create this index? I don't mean
> > the data import handler; I mean how did you create the core that
> > you're indexing to?
> >
> > Best,
> > Erick
> >
> > On Wed, Sep 23, 2015 at 10:16 AM, Mark Fenbers
> > <mark.fenb...@noaa.gov>
> > wrote:
> >
> > > On 9/23/2015 12:30 PM, Erick Erickson wrote:
> > >
> > >> Then my next guess is you're not pointing at the index you think you
> > >> are when you 'rm -rf data'
> > >>
> > >> Just ignore the Elall field for now I should think, although get rid
> > >> of it if you don't think you need it.
> > >>
> > >> DIH should be irrelevant here.
> > >>
> > >> So let's back up.
> > >> 1> go ahead and "rm -fr data" (with Solr stopped).
> > >>
> > > I have no "data" dir.  Did you mean "index" dir?  I removed 3
> > > index directories (2 for spelling):
> > > cd /localapps/dev/eventLog; rm -rfv index solr/spFile solr/spIndex
> > >
> > >> 2> start Solr
> > >> 3> do NOT re-index.
> > >> 4> look at your index via the schema-browser. Of course there should
> > >> be nothing there!
> > >>
> > > Correct!  It said "there is no term info :("
> > >
> > >> 5> now kick off the DIH job and look again.
> > >>
> > > Now it shows a histogram, but most of the "terms" are long -- the
> > > full texts of (the table.column) eventlogtext.logtext, including the
> > > whitespace (with %0A used for newline characters)...  So, it appears
> > > it is not being tokenized properly, correct?
> > >
> > >> Your logtext field should have only single tokens. The fact that you
> > >> have some very long tokens (presumably with whitespace) indicates
> > >> that you aren't really blowing the index away between indexing.
> > >>
> > > Well, I did this time for sure.  I verified that initially,
> > 

Re: query parsing

2015-09-24 Thread Erick Erickson
Geraint:

Good Catch! I totally missed that. So all of our focus on schema.xml has
been... totally irrelevant. Now that you pointed that out, there's also the
addition: add-unknown-fields-to-the-schema, which indicates you started
this up in "schemaless" mode.

In short, Solr is trying to guess what your field types should be and
guessing wrong (again and again and again). This is the classic weakness of
schemaless. It's great for indexing stuff fast, but if it guesses wrong
you're stuck.


So to the original problem: I'd start over and either
1> use the regular setup, not schemaless
or
2> use the _managed_ schema API to explicitly add fields and fieldTypes to
the managed schema
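For option 2, a minimal sketch using the Schema API (the field name and type
are taken from this thread; adjust as needed):

  curl -X POST -H 'Content-type:application/json' --data-binary '{
    "add-field": { "name":"logtext", "type":"text_en",
                   "indexed":true, "stored":true }
  }' http://localhost:8983/solr/EventLog/schema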

Best,
Erick

On Thu, Sep 24, 2015 at 2:02 AM, Duck Geraint (ext) GBJH <
geraint.d...@syngenta.com> wrote:

> Okay, so maybe I'm missing something here (I'm still relatively new to
> Solr myself), but am I right in thinking the following is still in your
> solrconfig.xml file:
>
>   <schemaFactory class="ManagedIndexSchemaFactory">
>     <bool name="mutable">true</bool>
>     <str name="managedSchemaResourceName">managed-schema</str>
>   </schemaFactory>
>
> If so, wouldn't using a managed schema make several of your field
> definitions inside the schema.xml file semi-redundant?
>
> Regards,
> Geraint
>
>
> Geraint Duck
> Data Scientist
> Toxicology and Health Sciences
> Syngenta UK
> Email: geraint.d...@syngenta.com
>
>
> -Original Message-
> From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com]
> Sent: 24 September 2015 09:23
> To: solr-user@lucene.apache.org
> Subject: Re: query parsing
>
> I would focus on this :
>
> "
>
> > 5> now kick off the DIH job and look again.
> >
> Now it shows a histogram, but most of the "terms" are long -- the full
> texts of (the table.column) eventlogtext.logtext, including the whitespace
> (with %0A used for newline characters)...  So, it appears it is not being
> tokenized properly, correct?"
> Can you open the schema.xml from your Solr UI and show us the snippets
> for the field that seems not to tokenise?
> Can you show us (even a screenshot is fine) the related schema browser
> page?
> Could it be a problem of encoding?
> Following Erick's details about the analysis, what are your results?
>
> Cheers
>
> 2015-09-24 8:04 GMT+01:00 Upayavira <u...@odoko.co.uk>:
>
> > typically, the index dir is inside the data dir. Delete the index dir
> > and you should be good. If there is a tlog next to it, you might want
> > to delete that also.
> >
> > If you don't have a data dir, I wonder whether you set the data dir
> > when creating your core or collection. Typically the instance dir and
> > data dir aren't needed.
> >
> > Upayavira
> >
> > On Wed, Sep 23, 2015, at 10:46 PM, Erick Erickson wrote:
> > > OK, this is bizarre. You'd have had to set up SolrCloud by
> > > specifying the -zkRun command when you start Solr or the -zkHost;
> > > highly unlikely. On the admin page there would be a "cloud" link on
> > > the left side, I really doubt one's there.
> > >
> > > You should have a data directory, it should be the parent of the
> > > index and tlog directories. As of sanity check try looking at the
> > > analysis page.
> > > Type
> > > a bunch of words in the left hand side indexing box and uncheck the
> > > verbose box. As you can tell I'm grasping at straws. I'm still
> > > puzzled why you don't have a "data" directory here, but that
> > > shouldn't really matter. How did you create this index? I don't mean
> > > data import handler more how did you create the core that you're
> > > indexing to?
> > >
> > > Best,
> > > Erick
> > >
> > > On Wed, Sep 23, 2015 at 10:16 AM, Mark Fenbers
> > > <mark.fenb...@noaa.gov>
> > > wrote:
> > >
> > > > On 9/23/2015 12:30 PM, Erick Erickson wrote:
> > > >
> > > >> Then my next guess is you're not pointing at the index you think
> > > >> you
> > are
> > > >> when you 'rm -rf data'
> > > >>
> > > >> Just ignore the Elall field for now I should think, although get
> > > >> rid
> > of it
> > > >> if you don't think you need it.
> > > >>
> > > >> DIH should be irrelevant here.
> > > >>
> > > >> So let's back up.
> > > >> 1> go ahead and "rm -fr data" (with Solr stopped).
> > > >>
> > > > I have no "data" dir.  Did you mean "index" dir?  I removed 3
> > > > index directories (2 for spelling):

Re: query parsing

2015-09-23 Thread Alessandro Benedetti
If you go to the Analysis tool, at both indexing and query time, what do
you see for your "deeper" query text and your field content
(using the logtext field)?
Have you verified the current tokens in the index for that field?

I quickly went through your config files, and they look OK, but what is
happening to you is quite weird.
Can you please run the query with debugQuery=true and post the results?

Cheers

2015-09-23 12:57 GMT+01:00 Mark Fenbers :

> When I submit this:
>
> http://localhost:8983/solr/EventLog/select?q=deeper&wt=json&indent=true
>
> then I get these (empty) results:
>   {
>   "responseHeader":{
> "status":0,
> "QTime":1,
> "params":{
>   "q":"deeper",
>   "indent":"true",
>   "wt":"json"}},
>   "response":{"numFound":0,"start":0,"docs":[]
>   }}
>
> However, if I add asterisks before *and *after "deeper", like this:
>
> http://localhost:8983/solr/EventLog/select?q=*deeper*&wt=json&indent=true
>
> then I get the correct set of results (shown below), as I expect. What am
> I doing wrong that the query requires leading and trailing asterisks to
> work correctly?  If I search on existing text in the username field instead
> of the default logtext field, then I don't need to use the asterisks to get
> correct results.  Does this mean I have a problem in my indexing process
> when I used /dataimport? Or does it mean I have something wrong in my query?
>
> Also, notice in the results that category, logtext, and username fields
> are returned as arrays, even though I do not include multiValued="true" in
> the schema.xml definition.  Why?  Attached are my solrconfig.xml and
> schema.xml.  Any insights would be appreciated!
>
> thanks,
> Mark
>
> {
>   "responseHeader":{
> "status":0,
> "QTime":9,
> "params":{
>   "q":"*deeper*",
>   "indent":"true",
>   "wt":"json"}},
>   "response":{"numFound":45,"start":0,"docs":[
>   {
> "id":"2012-07-10 13:23:39.0",
> "category":[16],
> "logtext":["\nHydromet Coordination Message\nOhio River Forecast
> Center, Wilmington, OH\n923 AM EDT Tuesday, July 10, 2012\n\nVery slow
> moving front has sagged down to the southernmost portion of the\nOhio
> Valley. This will keep the axis of convection along or south of the \nTN/KY
> border today and tomorrow, though some very light showers are \npossible in
> the northwest portion of the basin. On Thursday increased \nsoutherly flow
> over the Ohio Valley will begin to draw deeper moisture\nfarther north into
> the basin, but this will mainly be after the 48-hour\nforecast
> cutoff.\n\nDay 1 (8am EDT Tuesday - 8am EDT Wednesday):\nRain is forecast
> in southern Kentucky, southern West Virginia, middle\nTennessee and far
> western Virginia. Basin average amounts increase to the\nsouth with come
> areas approaching an inch. Light amounts less than 0.10 inch\nare expected
> in portions of central Indiana and Ohio. \n\nDay 2 (8am EDT Wednesday - 8am
> EDT Thursday): \nRain is forecast all areas south of the Ohio River as well
> as eastern \nIllinois, southern Indiana and southwest Pennsylvania. Basin
> average amounts\nincrease to the southwest with areas southwest of
> Nashville expecting \nover an inch. \n\nQPF from OHRFC, HPC, et al., can be
> seen at weather.gov/ohrfc/Forecast.php\n$$\nFor
>  critical after-hours
> support, the OHRFC cell number is 937-725-.\nLink Crawford "],
> "username":["crawford"],
> "_version_":1512928764746530816},
>   {
> "id":"2012-07-10 17:39:09.0",
> "category":[16],
> "logtext":["\nHydromet Coordination Message\nOhio River Forecast
> Center, Wilmington, OH\n139 PM EDT Tuesday, July 10, 2012\n\n18Z
> Discussion:\nMade some changes to the first 6-hour period of the QPF, but
> otherwise made\nno changes to the previous issuance.\n\nPrevious Discussion
> (12Z):\nVery slow moving front has sagged down to the southernmost portion
> of the\nOhio Valley. This will keep the axis of convection along or south
> of the \nTN/KY border today and tomorrow, though some very light showers
> are \npossible in the northwest portion of the basin. On Thursday increased
> \nsoutherly flow over the Ohio Valley will begin to draw deeper
> moisture\nfarther north into the basin, but this will mainly be after the
> 48-hour\nforecast cutoff.\n\nDay 1 (8am EDT Tuesday - 8am EDT
> Wednesday):\nRain is forecast in southern Kentucky, southern West Virginia,
> middle\nTennessee and far western Virginia. Basin average amounts increase
> to the\nsouth with come areas approaching an inch. Light amounts less than
> 0.10 inch\nare expected in portions of central Indiana and Ohio. \n\nDay 2
> (8am EDT Wednesday - 8am EDT Thursday): \nRain is forecast all areas south
> of the Ohio River as well as eastern \nIllinois, southern Indiana and
> southwest Pennsylvania. Basin average amounts\nincrease to the southwest
> with areas southwest of Nashville 

Re: query parsing

2015-09-23 Thread Mugeesh Husain
Hi Mark,

Search is not working properly because you have defined the "ELall" field
as a text type which is not defined properly.

You have to modify schema.xml with these changes:

[the suggested field and analyzer definitions were stripped by the list
archive]



Re: query parsing

2015-09-23 Thread Mark Fenbers
Mugeesh, I believe you are on the right path and I was eager to try out 
your suggestion.  So my schema.xml now contains this snippet (changes 
indicated by ~):


[the schema.xml snippet was stripped by the list archive; it showed several
<field> definitions, with the changed lines marked by ~ carrying attributes
such as stored="true", required="true", and multiValued="true", followed by
a new analyzer block]

but my results are the same -- my search yields 0 results unless I
wrap the search word in asterisks.


Alessandro, below are the results (with and without the asterisks) with 
debug turned on.  I don't know what much of the debug info means.  Is it 
giving you more clues?


http://localhost:8983/solr/EventLog/select?q=deeper&wt=json&indent=true&debugQuery=true

{
  "responseHeader":{
"status":0,
"QTime":2,
"params":{
  "q":"deeper",
  "indent":"true",
  "wt":"json",
  "debugQuery":"true"}},
  "response":{"numFound":0,"start":0,"docs":[]
  },
  "debug":{
"rawquerystring":"deeper",
"querystring":"deeper",
"parsedquery":"logtext:deeper",
"parsedquery_toString":"logtext:deeper",
"explain":{},
"QParser":"LuceneQParser",
"timing":{
  "time":1.0,
  "prepare":{
"time":0.0,
"query":{
  "time":0.0},
"facet":{
  "time":0.0},
"facet_module":{
  "time":0.0},
"mlt":{
  "time":0.0},
"highlight":{
  "time":0.0},
"stats":{
  "time":0.0},
"expand":{
  "time":0.0},
"debug":{
  "time":0.0}},
  "process":{
"time":0.0,
"query":{
  "time":0.0},
"facet":{
  "time":0.0},
"facet_module":{
  "time":0.0},
"mlt":{
  "time":0.0},
"highlight":{
  "time":0.0},
"stats":{
  "time":0.0},
"expand":{
  "time":0.0},
"debug":{
  "time":0.0}

http://localhost:8983/solr/EventLog/select?q=*deeper*&wt=json&indent=true&debugQuery=true

{
  "responseHeader":{
"status":0,
"QTime":460,
"params":{
  "q":"*deeper*",
  "indent":"true",
  "wt":"json",
  "debugQuery":"true"}},
  "response":{"numFound":45,"start":0,"docs":[
  {
"id":"2012-07-10 13:23:39.0",
"category":[16],
"logtext":["\nHydromet Coordination Message\nOhio River 
Forecast Center, Wilmington, OH\n923 AM EDT Tuesday, July 10, 
2012\n\nVery slow moving front has sagged down to the southernmost 
portion of the\nOhio Valley. This will keep the axis of convection along 
or south of the \nTN/KY border today and tomorrow, though some very 
light showers are \npossible in the northwest portion of the basin. On 
Thursday increased \nsoutherly flow over the Ohio Valley will begin to 
draw deeper moisture\nfarther north into the basin, but this will mainly 
be after the 48-hour\nforecast cutoff.\n\nDay 1 (8am EDT Tuesday - 8am 
EDT Wednesday):\nRain is forecast in southern Kentucky, southern West 
Virginia, middle\nTennessee and far western Virginia. Basin average 
amounts increase to the\nsouth with come areas approaching an inch. 
Light amounts less than 0.10 inch\nare expected in portions of central 
Indiana and Ohio. \n\nDay 2 (8am EDT Wednesday - 8am EDT Thursday): 
\nRain is forecast all areas south of the Ohio River as well as eastern 
\nIllinois, southern Indiana and southwest Pennsylvania. Basin average 
amounts\nincrease to the southwest with areas southwest of Nashville 
expecting \nover an inch. \n\nQPF from OHRFC, HPC, et al., can be seen 
at weather.gov/ohrfc/Forecast.php\n$$\nFor critical after-hours support, 
the OHRFC cell number is 937-725-.\nLink Crawford "],

"username":["crawford"],
"_version_":1512928764746530816},
  {
"id":"2012-07-10 17:39:09.0",
"category":[16],
"logtext":["\nHydromet Coordination Message\nOhio River 
Forecast Center, Wilmington, OH\n139 PM EDT Tuesday, July 10, 
2012\n\n18Z Discussion:\nMade some changes to the first 6-hour period of 
the QPF, but otherwise made\nno changes to the previous 
issuance.\n\nPrevious Discussion (12Z):\nVery slow moving front has 
sagged down to the southernmost portion of the\nOhio Valley. This will 
keep the axis of convection along or south of the \nTN/KY border today 
and tomorrow, though some very light showers are \npossible in the 
northwest portion of the basin. On Thursday increased \nsoutherly flow 
over the Ohio Valley will begin to draw deeper moisture\nfarther north 
into the basin, but this will mainly be after the 48-hour\nforecast 
cutoff.\n\nDay 1 (8am EDT Tuesday - 8am EDT Wednesday):\nRain is 
forecast in southern Kentucky, southern West Virginia, middle\nTennessee 
and far western Virginia. Basin average amounts increase to the\nsouth 
with come areas approaching an inch. Light amounts less than 0.10 
inch\nare expected in portions of central Indiana and Ohio. \n\nDay 2 
(8am EDT Wednesday - 8am EDT Thursday): \nRain is forecast all areas 
south of 

Re: query parsing

2015-09-23 Thread Alessandro Benedetti
Hmm, so these are the two queries at the minute:

1) logtext:deeper
2) logtext:*deeper*

According to your schema, the logtext field is of type "text_en".
This should be completely fine.
Have you ever changed your schema on the run, without re-indexing your
old docs?
What happens if you use the analysis tool (both query and index time)
with the term "deeper"?

Cheers

2015-09-23 15:10 GMT+01:00 Mark Fenbers :

> Mugeesh, I believe you are on the right path and I was eager to try out
> your suggestion.  So my schema.xml now contains this snippet (changes
> indicated by ~):
>
> [the schema.xml snippet was stripped by the list archive; see Mark's
> message of 2015-09-23 above for the changed lines marked with ~]
>
> but my results are the same -- that my search yields 0 results unless I
> wrap the search word with asterisks.
>
> Alessandro, below are the results (with and without the asterisks) with
> debug turned on.  I don't know what much of the debug info means.  Is it
> giving you more clues?
>
>
> http://localhost:8983/solr/EventLog/select?q=deeper&wt=json&indent=true&debugQuery=true
>
> {
>   "responseHeader":{
> "status":0,
> "QTime":2,
> "params":{
>   "q":"deeper",
>   "indent":"true",
>   "wt":"json",
>   "debugQuery":"true"}},
>   "response":{"numFound":0,"start":0,"docs":[]
>   },
>   "debug":{
> "rawquerystring":"deeper",
> "querystring":"deeper",
> "parsedquery":"logtext:deeper",
> "parsedquery_toString":"logtext:deeper",
> "explain":{},
> "QParser":"LuceneQParser",
> "timing":{
>   "time":1.0,
>   "prepare":{
> "time":0.0,
> "query":{
>   "time":0.0},
> "facet":{
>   "time":0.0},
> "facet_module":{
>   "time":0.0},
> "mlt":{
>   "time":0.0},
> "highlight":{
>   "time":0.0},
> "stats":{
>   "time":0.0},
> "expand":{
>   "time":0.0},
> "debug":{
>   "time":0.0}},
>   "process":{
> "time":0.0,
> "query":{
>   "time":0.0},
> "facet":{
>   "time":0.0},
> "facet_module":{
>   "time":0.0},
> "mlt":{
>   "time":0.0},
> "highlight":{
>   "time":0.0},
> "stats":{
>   "time":0.0},
> "expand":{
>   "time":0.0},
> "debug":{
>   "time":0.0}
>
>
> http://localhost:8983/solr/EventLog/select?q=*deeper*&wt=json&indent=true&debugQuery=true
>
> {
>   "responseHeader":{
> "status":0,
> "QTime":460,
> "params":{
>   "q":"*deeper*",
>   "indent":"true",
>   "wt":"json",
>   "debugQuery":"true"}},
>
>   "response":{"numFound":45,"start":0,"docs":[
>   {
> "id":"2012-07-10 13:23:39.0",
> "category":[16],
> "logtext":["\nHydromet Coordination Message\nOhio River Forecast
> Center, Wilmington, OH\n923 AM EDT Tuesday, July 10, 2012\n\nVery slow
> moving front has sagged down to the southernmost portion of the\nOhio
> Valley. This will keep the axis of convection along or south of the \nTN/KY
> border today and tomorrow, though some very light showers are \npossible in
> the northwest portion of the basin. On Thursday increased \nsoutherly flow
> over the Ohio Valley will begin to draw deeper moisture\nfarther north into
> the basin, but this will mainly be after the 48-hour\nforecast
> cutoff.\n\nDay 1 (8am EDT Tuesday - 8am EDT Wednesday):\nRain is forecast
> in southern Kentucky, southern West Virginia, middle\nTennessee and far
> western Virginia. Basin average amounts increase to the\nsouth with come
> areas approaching an inch. Light amounts less than 0.10 inch\nare expected
> in portions of central Indiana and Ohio. \n\nDay 2 (8am EDT Wednesday - 8am
> EDT Thursday): \nRain is forecast all areas south of the Ohio River as well
> as eastern \nIllinois, southern Indiana and southwest Pennsylvania. Basin
> average amounts\nincrease to the southwest with areas southwest of
> Nashville expecting \nover an inch. \n\nQPF from OHRFC, HPC, et al., can be
> seen at weather.gov/ohrfc/Forecast.php\n$$\nFor
>  critical after-hours
> support, the OHRFC cell number is 937-725-.\nLink Crawford "],
> "username":["crawford"],
> "_version_":1512928764746530816},
>   {
> "id":"2012-07-10 17:39:09.0",
> "category":[16],
> "logtext":["\nHydromet Coordination Message\nOhio River Forecast
> Center, Wilmington, OH\n139 PM EDT Tuesday, July 10, 2012\n\n18Z
> Discussion:\nMade some changes to the first 6-hour period of the QPF, but
> otherwise made\nno changes to the previous issuance.\n\nPrevious Discussion
> (12Z):\nVery slow moving front has sagged down to the southernmost portion
> of the\nOhio Valley. This will keep the axis of convection along or south
> of the \nTN/KY border today and tomorrow, though 

Re: query parsing

2015-09-23 Thread Mark Fenbers

On 9/23/2015 10:21 AM, Alessandro Benedetti wrote:

Hmm, so these are the two queries at the minute:

1) logtext:deeper
2) logtext:*deeper*

According to your schema, the logtext field is of type "text_en".
This should be completely fine.
Have you ever changed your schema on the run, without re-indexing your
old docs?
I might forget sometimes, but usually, when I make changes to 
solrconfig.xml or schema.xml, then I delete the main index and the 
spellchecker indexes, and then restart solr, then do /dataimport again.

What happens if you use the analysis tool (both query and index time)
with the term "deeper"?
Can you clarify what you want me to do here?  What do you want me to put 
in the (Index) text box and in the (Query) text box and what do I select 
in the fieldType drop-list?  When I put "deeper" into both text boxes 
and select text_en from the drop list, I get several results, but I 
don't know what the output means.


thanks,
Mark


Re: query parsing

2015-09-23 Thread Erick Erickson
This is totally weird.

Don't only re-index your old docs, find the data directory and
rm -rf data (with Solr stopped) and re-index.

re: the analysis page Alessandro mentioned.
Go to the Solr admin UI (http://localhost:8983/solr). You'll
see a drop-down on the left that lets you select a core,
select the appropriate one.

Now you'll see a bunch of new choices. The "analysis" section
is what Alessandro is referencing. That shows you _exactly_ what
effects your analysis chain has at index and query time.

On the same page, you'll find "schema browser". Take a look at
your logtext field and hit the "load term info" button. You should
see a bunch of single-word tokens listed. If you see really long ones,
then your index is hosed and you should start by blowing away
the data directory.

Because this symptom is totally explained by searching on a "string"
rather than a "text" type. But your definition is clearly a tokenized text
type so I'm mystified.

The ELall field is a red herring. The debug output shows you're searching
on the logtext field, this line is the relevant one:
"parsedquery_toString":"logtext:deeper",

Best,
Erick

On Wed, Sep 23, 2015 at 8:07 AM, Mark Fenbers  wrote:

> On 9/23/2015 10:21 AM, Alessandro Benedetti wrote:
>
>> m so those 2 are the queries at the minute :
>>
>> 1) logtext:deeper
>> 2) logtext:*deeper*
>>
>> According to your schema, the log text field is of type "text_en".
>> This should be completely fine.
>> Have you ever changed your schema on run ? without re-indexing your old
>> docs ?
>>
> I might forget sometimes, but usually, when I make changes to
> solrconfig.xml or schema.xml, then I delete the main index and the
> spellchecker indexes, and then restart solr, then do /dataimport again.
>
>> What happens if you use your analysis tool ( both query and index time)
>> with the term deeper ?
>>
> Can you clarify what you want me to do here?  What do you want me to put
> in the (Index) text box and in the (Query) text box and what do I select in
> the fieldType drop-list?  When I put "deeper" into both text boxes and
> select text_en from the drop list, I get several results, but I don't know
> what the output means.
>
> thanks,
> Mark
>


Re: query parsing

2015-09-23 Thread Mark Fenbers

On 9/23/2015 12:30 PM, Erick Erickson wrote:

Then my next guess is you're not pointing at the index you think you are
when you 'rm -rf data'

Just ignore the Elall field for now I should think, although get rid of it
if you don't think you need it.

DIH should be irrelevant here.

So let's back up.
1> go ahead and "rm -fr data" (with Solr stopped).
I have no "data" dir.  Did you mean "index" dir?  I removed 3 index 
directories (2 for spelling):

cd /localapps/dev/eventLog; rm -rfv index solr/spFile solr/spIndex

2> start Solr
3> do NOT re-index.
4> look at your index via the schema-browser. Of course there should be
nothing there!

Correct!  It said "there is no term info :("

5> now kick off the DIH job and look again.
Now it shows a histogram, but most of the "terms" are long -- the full 
texts of (the table.column) eventlogtext.logtext, including the 
whitespace (with %0A used for newline characters)...  So, it appears it 
is not being tokenized properly, correct?

Your logtext field should have only single tokens. The fact that you have
some very
long tokens (presumably with whitespace) indicates that you aren't really
blowing
the index away between indexing.
Well, I did this time for sure.  I verified that initially, because it 
showed there was no term info until I DIH'd again.

Are you perhaps in Solr Cloud with more than one replica?
Not that I know of, but being new to Solr, there could be things going 
on that I'm not aware of.  How can I tell?  I certainly didn't set 
anything up for solrCloud deliberately.

In that case you
might be getting the index replicated on startup assuming you didn't
blow away all replicas. If you are in SolrCloud, I'd just delete the
collection and
start over, after ensuring that you'd pushed the configset up to Zookeeper.

BTW, I always look at the schema.xml file from the Solr admin window just as
a sanity check in these situations.
Good idea!  But the one shown in the browser is identical to the one 
I've been editing!  So that's not an issue.




Re: query parsing

2015-09-23 Thread Erick Erickson
Then my next guess is you're not pointing at the index you think you are
when you 'rm -rf data'

Just ignore the Elall field for now I should think, although get rid of it
if you don't think you need it.

DIH should be irrelevant here.

So let's back up.
1> go ahead and "rm -fr data" (with Solr stopped).
2> start Solr
3> do NOT re-index.
4> look at your index via the schema-browser. Of course there should be
nothing there!
5> now kick off the DIH job and look again.

Your logtext field should have only single tokens. The fact that you have
some very
long tokens (presumably with whitespace) indicates that you aren't really
blowing
the index away between indexing.

Are you perhaps in Solr Cloud with more than one replica? In that case you
might be getting the index replicated on startup assuming you didn't
blow away all replicas. If you are in SolrCloud, I'd just delete the
collection and
start over, after insuring that you'd pushed the configset up to Zookeeper.

BTW, I always look at the schema.xml file from the Solr admin window just as
a sanity check in these situations.

Best,
Erick

On Wed, Sep 23, 2015 at 9:22 AM, Mark Fenbers  wrote:

> On 9/23/2015 11:28 AM, Erick Erickson wrote:
>
>> This is totally weird.
>>
>> Don't only re-index your old docs, find the data directory and
>> rm -rf data (with Solr stopped) and re-index.
>>
> I pretty much do that.  The thing is: I don't have a data directory
> anywhere!  Most of my stuff is in /localapps/dev/EventLog/solr/, but I *do*
> have a /localapps/dev/EventLog/index/ directory where the main index
> resides.  I'd like to move that into /localapps/dev/EventLog/solr/ so that
> I can keep all Solr-related files under one parent dir, but I can't find
> where the configuration for that is...
>
> Perhaps I should also share what start command I'm using (in case it is
> wrong!):
>
> /localapps/dev/solr-5.3.0/bin/solr start -s /localapps/dev/EventLog
>
>> re: the analysis page Alessandro mentioned.
>> Go to the Solr admin UI (http://localhost:8983/solr). You'll
>> see a drop-down on the left that lets you select a core,
>> select the appropriate one.
>>
>> Now you'll see a bunch of new choices. The "analysis" section
>> is what Alessandro is referencing. That shows you _exactly_ what
>> effects your analysis chain has at index and query time.
>>
>> On the same page, you'll find "schema browser". Take a look at
>> your logtext field and hit the "load term info" button. You should
>> see a bunch of single-word tokens listed. If you see really long ones,
>> then your index is hosed and you should start by blowing away
>> the data directory
>>
> I wish I could show a screen capture!  But according to your symptoms, my
> index is hosed (I see very few single-word tokens and lots of really long
> ones.)  I have no data directory to blow away, though.  I've blown away
> /localapps/dev/EventLog/index/ before, but that has had no effect on the
> problem.
>
> Am I indexing improperly perhaps?  I'm using /dataimport.  Here is my
> data-config.xml, which hasn't been giving me any obvious trouble.  Import
> seems successful.  And I can get correct search results so long as I wrap
> my search text in asterisks...
>
> 
> 
>  driver="org.postgresql.Driver"/>
> 
>  name="eventlogtext">
>  
> 
> 
> 
>
> Because this symptom is totally explained by searching on a "string"
>> rather than a "text" type. But your definition is clearly a tokenized text
>> type so I'm mystified.
>>
>> The ELall field is a red herring. The debug output shows you're searching
>> on the logtext field, this line is the relevant one:
>> "parsedquery_toString":"logtext:deeper",
>>
> Should I just get rid of "ELall"?  I only created it with the intent to be
> able to search on "fenbers" and get hits if "fenbers" occurred in either
> place, the logtext field or the username field.
>
> thanks,
> Mark
>
>


Re: query parsing

2015-09-23 Thread Mark Fenbers

On 9/23/2015 11:28 AM, Erick Erickson wrote:

This is totally weird.

Don't only re-index your old docs, find the data directory and
rm -rf data (with Solr stopped) and re-index.
I pretty much do that.  The thing is: I don't have a data directory 
anywhere!  Most of my stuff is in /localapps/dev/EventLog/solr/, but I 
*do* have a /localapps/dev/EventLog/index/ directory where the main 
index resides.  I'd like to move that into /localapps/dev/EventLog/solr/ 
so that I can keep all Solr-related files under one parent dir, but I 
can't find where the configuration for that is...


Perhaps I should also share what start command I'm using (in case it is 
wrong!):


/localapps/dev/solr-5.3.0/bin/solr start -s /localapps/dev/EventLog

re: the analysis page Alessandro mentioned.
Go to the Solr admin UI (http://localhost:8983/solr). You'll
see a drop-down on the left that lets you select a core,
select the appropriate one.

Now you'll see a bunch of new choices. The "analysis" section
is what Alessandro is referencing. That shows you _exactly_ what
effects your analysis chain has at index and query time.

On the same page, you'll find "schema browser". Take a look at
your logtext field and hit the "load term info" button. You should
see a bunch of single-word tokens listed. If you see really long ones,
then your index is hosed and you should start by blowing away
the data directory
I wish I could show a screen capture!  But according to your symptoms, 
my index is hosed (I see very few single-word tokens and lots of really 
long ones.)  I have no data directory to blow away, though.  I've blown 
away /localapps/dev/EventLog/index/ before, but that has had no effect 
on the problem.


Am I indexing improperly perhaps?  I'm using /dataimport.  Here is my 
data-config.xml, which hasn't been giving me any obvious trouble.  
Import seems successful.  And I can get correct search results so long 
as I wrap my search text in asterisks...




driver="org.postgresql.Driver"/>


name="eventlogtext">
 






Because this symptom is totally explained by searching on a "string"
rather than a "text" type. But your definition is clearly a tokenized text
type so I'm mystified.

The ELall field is a red herring. The debug output shows you're searching
on the logtext field, this line is the relevant one:
"parsedquery_toString":"logtext:deeper",
Should I just get rid of "ELall"?  I only created it with the intent to 
be able to search on "fenbers" and get hits if "fenbers" occurred in 
either place, the logtext field or the username field.


thanks,
Mark



Re: query parsing

2015-09-23 Thread Erick Erickson
OK, this is bizarre. You'd have had to set up SolrCloud by specifying the
-zkRun command when you start Solr or the -zkHost; highly unlikely. On the
admin page there would be a "cloud" link on the left side, I really doubt
one's there.

You should have a data directory; it should be the parent of the index and
tlog directories. As a sanity check, try looking at the analysis page. Type
a bunch of words in the left hand side indexing box and uncheck the verbose
box. As you can tell, I'm grasping at straws. I'm still puzzled why you
don't have a "data" directory here, but that shouldn't really matter. How
did you create this index? I don't mean the data import handler, more how did
you create the core that you're indexing to?

Best,
Erick

On Wed, Sep 23, 2015 at 10:16 AM, Mark Fenbers 
wrote:

> On 9/23/2015 12:30 PM, Erick Erickson wrote:
>
>> Then my next guess is you're not pointing at the index you think you are
>> when you 'rm -rf data'
>>
>> Just ignore the Elall field for now I should think, although get rid of it
>> if you don't think you need it.
>>
>> DIH should be irrelevant here.
>>
>> So let's back up.
>> 1> go ahead and "rm -fr data" (with Solr stopped).
>>
> I have no "data" dir.  Did you mean "index" dir?  I removed 3 index
> directories (2 for spelling):
> cd /localapps/dev/eventLog; rm -rfv index solr/spFile solr/spIndex
>
>> 2> start Solr
>> 3> do NOT re-index.
>> 4> look at your index via the schema-browser. Of course there should be
>> nothing there!
>>
> Correct!  It said "there is no term info :("
>
>> 5> now kick off the DIH job and look again.
>>
> Now it shows a histogram, but most of the "terms" are long -- the full
> texts of (the table.column) eventlogtext.logtext, including the whitespace
> (with %0A used for newline characters)...  So, it appears it is not being
> tokenized properly, correct?
>
>> Your logtext field should have only single tokens. The fact that you have
>> some very
>> long tokens (presumably with whitespace) indicates that you aren't really
>> blowing
>> the index away between indexing.
>>
> Well, I did this time for sure.  I verified that initially, because it
> showed there was no term info until I DIH'd again.
>
>> Are you perhaps in Solr Cloud with more than one replica?
>>
> Not that I know of, but being new to Solr, there could be things going on
> that I'm not aware of.  How can I tell?  I certainly didn't set anything up
> for solrCloud deliberately.
>
>> In that case you
>> might be getting the index replicated on startup assuming you didn't
>> blow away all replicas. If you are in SolrCloud, I'd just delete the
>> collection and
>> start over, after insuring that you'd pushed the configset up to
>> Zookeeper.
>>
>> BTW, I always look at the schema.xml file from the Solr admin window just
>> as
>> a sanity check in these situations.
>>
> Good idea!  But the one shown in the browser is identical to the one I've
> been editing!  So that's not an issue.
>
>


Re: Query parsing - difference between Analysis and parsedquery_toString output

2014-10-20 Thread Ramzi Alqrainy
 q: manufacture_t:The Hershey Company^100 OR title_t:The Hershey
Company^1000 

First, make sure that manufacture_t and title_t are of type text_general, and
let's use this approach instead of yours:
q=The Hershey Company&q.op=AND&qf=manufacture_t title_t&defType=edismax
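
If you are on SolrJ, a rough sketch of the same request (the URL and core
name below are placeholders, and this is the 4.x-era client API):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class EdismaxExample {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server =
        new HttpSolrServer("http://localhost:8983/solr/collection1");
    SolrQuery q = new SolrQuery("The Hershey Company");
    q.set("defType", "edismax");          // use the edismax parser
    q.set("q.op", "AND");                 // require all terms
    q.set("qf", "manufacture_t title_t"); // search across both fields
    QueryResponse rsp = server.query(q);
    System.out.println(rsp.getResults().getNumFound());
  }
}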







Re: Query parsing - difference between Analysis and parsedquery_toString output [SOLVED]

2014-10-20 Thread tinush
Thanks guys for a quick reply, 

Adding ( ) to query values resolved the issue!

Tanya





Re: Query parsing - difference between Analysis and parsedquery_toString output

2014-10-19 Thread Erick Erickson
This trips _everybody_ up. Analysis doesn't happen until things get
through the query parser. So,
let's assume your query is
q=manufacture_t:The Hershey Company^100 OR title_t:The Hershey
Company^1000

The problem is that the query _parser_ doesn't understand that
your intent is that "the hershey company" be evaluated against
the manufacture_t field and the title_t field. All it sees is
manufacture_t:the and then, as naked tokens, hershey and company.
So, it does the best it can and assumes that hershey and company
should be evaluated against your default text field, in this case text.

You have two choices here:
1> form your query like manufacture_t:"The Hershey Company", or
2> manufacture_t:(The Hershey Company).

The first form requires that the words The, Hershey, and Company
appear in sequence, and the second form just requires that all three
appear somewhere in the field, in any order.

Actually, the second form requires that only one of the terms appears
in the field, assuming your default q.op is OR. If you require all three,
either define the default operator to be AND or enter it as
manufacture_t:(The AND Hershey AND Company).
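
If you want to see the splitting concretely, a quick stand-alone check
against the Lucene classic query parser shows it (rough sketch,
recent-Lucene constructors; 4.x also takes a Version argument, and the
exact output depends on your analyzer and stopwords):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.QueryParser;

public class ParseDemo {
  public static void main(String[] args) throws Exception {
    QueryParser qp = new QueryParser("text", new StandardAnalyzer());
    // unquoted: hershey and company drift to the default "text" field
    System.out.println(qp.parse("manufacture_t:The Hershey Company"));
    // quoted: a single phrase query on manufacture_t
    System.out.println(qp.parse("manufacture_t:\"The Hershey Company\""));
    // parenthesized: every term stays on manufacture_t
    System.out.println(qp.parse("manufacture_t:(The Hershey Company)"));
  }
}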

Best,
Erick

On Sun, Oct 19, 2014 at 4:49 PM, tinush tanya.karpin...@gmail.com wrote:
 Hi,

 I use Solr 4.9 and imported about 20K documents from CSV data.

 In schema there is following definition for text_general field which I want
 to process by tokenization, stop word removal, stemming.

 <fieldType name="text_general" class="solr.TextField"
   positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.WordDelimiterFilterFactory"
       generateWordParts="1" generateNumberParts="1" catenateWords="1"
       catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
       enablePositionIncrements="true"/>
     <filter class="solr.ASCIIFoldingFilterFactory"/>
     <filter class="solr.SnowballPorterFilterFactory" language="English"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
       enablePositionIncrements="true"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
       ignoreCase="true" expand="true"/>
     <filter class="solr.ASCIIFoldingFilterFactory"/>
     <filter class="solr.SnowballPorterFilterFactory" language="English"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>

 Using Solr Admin Analysis for that field type I see that both index and
 query values are processed as expected: Hershey's -> *hershey*, The Hershey's
 Company -> the *hershey* compani

 I expected the same processing for the select query, but it seems that doesn't
 happen, and no result is found in the example below:
  q: manufacture_t:The Hershey Company^100 OR title_t:The Hershey
 Company^1000
  parsedquery_toString: manufacture_t:the text:Hershey text:Company^100.0
 title_t:the text:Hershey text:Company^1000.0,

 indexed document:
 docs: [
    {
      "id": "00010700501806",
      "description_t": [
        "Hershey's Whoppers Carton - 12 Pack"
      ],
      "title_t": [
        "Whoppers Carton - 12 Pack"
      ],
      "manufacture_t": [
        "Hershey's"
      ],

 What am I missing?

 Thanks in advance,
 Tanya









Re: Query parsing issue

2013-03-06 Thread Tomás Fernández Löbbe
It should be easy to extend ExtendedDismaxQParser and do your
pre-processing in the parse() method before calling edismax's parse. Or
maybe you could change the way EDismax is splitting the input query into
clauses by extending the splitIntoClauses method?
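
Something like this, as a rough, untested sketch against the 4.x API (the
class name is made up and preProcess() is whatever rewriting you need):

import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.ExtendedDismaxQParser;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;

public class PreprocessedEdismaxQParserPlugin extends QParserPlugin {
  public void init(NamedList args) {}

  @Override
  public QParser createParser(String qstr, SolrParams localParams,
                              SolrParams params, SolrQueryRequest req) {
    // hand edismax the rewritten query string instead of the raw one
    return new ExtendedDismaxQParser(preProcess(qstr), localParams,
                                     params, req);
  }

  private String preProcess(String q) {
    return q; // TODO: run your custom analysis/rewriting here
  }
}

Register it with a queryParser element in solrconfig.xml and select it via
defType.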

Tomás


On Wed, Mar 6, 2013 at 6:37 AM, Francesco Valentini 
francesco.valent...@altiliagroup.com wrote:

 Hi,



 I’ve written my own analyzer to index and query a set of documents. At
 indexing time everything goes well but

 now I have a problem in  query phase.

 I need to pass  the whole query string to my analyzer before the edismax
 query parser begins its tasks.

 In other words I have to preprocess the raw query string.

 The phrase querying does not fit my needs because I don’t have to match
 the entire set of terms/tokens.

 How can I achieve this?



 Thank you in advance.





 Francesco






Re: Query parsing VS marshalling/unmarshalling

2013-01-16 Thread balaji.gandhi
Hi, 

I am trying to do something similar:- 

Eg. 
Input: (name:John AND name:Doe) 
Output: ((firstName:John OR lastName:John) AND (firstName:Doe OR
lastName:Doe)) 

How can I extract the fields, change them and repackage the query? 

Thanks, 
Balaji





Re: Query parsing VS marshalling/unmarshalling

2012-04-24 Thread Benson Margulies
2012/4/24 Mindaugas Žakšauskas min...@gmail.com:
 Hi,

 I maintain a distributed system which Solr is part of. The data which
 is kept in Solr is permissioned, and permissions are currently
 implemented by taking the original user query and adding certain bits to
 it which make it return less data in the search results. Now I
 am at the point where I need to go over this functionality and try to
 improve it.

 Changing this to send two separate queries (q=...&fq=...) would be the
 first logical thing to do; however, I was thinking of an extra
 improvement. Instead of generating the filter query, converting it into a
 String, and sending it over HTTP just to be parsed by Solr again - would
 it not be better to take the generated Lucene fq query, serialize it using
 Java serialization, convert it to, say, Base64, and then send and
 deserialize it on the Solr end? Has anyone tried doing any performance
 comparisons on this topic?

I'm about to try out a contribution for serializing queries in
Javascript using Jackson. I've previously done this by serializing my
own data structure and putting the JSON into a custom query parameter.



 I am particularly concerned about this because in extreme cases my
 filter queries can be very large (1000s of characters long) and we
 already had to do tweaks as the size of GET requests would exceed
 default limits. And yes, we could move to POST but I would like to
 minimize both the amount of data that is sent over and the time taken
 to parse large queries.

 Thanks in advance.

 m.


Re: Query parsing VS marshalling/unmarshalling

2012-04-24 Thread Mindaugas Žakšauskas
On Tue, Apr 24, 2012 at 3:27 PM, Benson Margulies bimargul...@gmail.com wrote:
 I'm about to try out a contribution for serializing queries in
 Javascript using Jackson. I've previously done this by serializing my
 own data structure and putting the JSON into a custom query parameter.

Thanks for your reply. Appreciate your effort, but I'm not sure if I
fully understand the gain.

Having data in JSON would still require it to be converted into a Lucene
Query at the end, which takes space & CPU effort, right? Or are you
saying that having the query serialized into a structured data blob (JSON
in this case) makes it somehow easier to convert it into a Lucene Query?

I only thought about Java serialization because:
- it's rather close to the in-object format
- the mechanism is rather stable and is an established standard in Java/JVM
- Lucene Queries seem to implement java.io.Serializable (haven't done
a thorough check but looks good on the surface)
- other conversions (e.g. using XStream) are either slow or require
custom annotations. I personally don't see how Lucene/Solr would
include them in their core classes.

Anyway, it would still be interesting to hear if anyone could
elaborate on query parsing complexity.

m.


Re: Query parsing VS marshalling/unmarshalling

2012-04-24 Thread Erick Erickson
In general, query parsing is such a small fraction of the total time that,
almost no matter how complex, it's not worth worrying about. To see
this, attach debugQuery=on to your query and look at the timings
in the prepare and process portions of the response. I'd be
very sure that it was a problem before spending any time trying to make
the transmission of the data across the wire more efficient; my first
reaction is that this is premature optimization.

Second, you could do this on the server side with a custom query
component if you chose. You can freely modify the query
over there and it may make sense in your situation.

Third, consider no cache filters, which were developed for
expensive filter queries, ACL being one of them. See:
https://issues.apache.org/jira/browse/SOLR-2429
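
(If memory serves, the syntax that grew out of that issue looks like
fq={!cache=false cost=150}acl:(...) — cache=false keeps the clause out of
the filterCache, and a cost of 100 or more turns it into a post filter
that's only run against documents matching everything else.)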

Fourth, I'd ask if there's a way to reduce the size of the FQ
clause. Is this on a particular user basis or groups basis?
If you can get this down to a few groups that would help. Although
there's often some outlier who is member of thousands of
groups :(.

Best
Erick


2012/4/24 Mindaugas Žakšauskas min...@gmail.com:
 On Tue, Apr 24, 2012 at 3:27 PM, Benson Margulies bimargul...@gmail.com 
 wrote:
 I'm about to try out a contribution for serializing queries in
 Javascript using Jackson. I've previously done this by serializing my
 own data structure and putting the JSON into a custom query parameter.

 Thanks for your reply. Appreciate your effort, but I'm not sure if I
 fully understand the gain.

 Having data in JSON would still require it to be converted into Lucene
 Query at the end which takes space  CPU effort, right? Or are you
 saying that having query serialized into a structured data blob (JSON
 in this case) makes it somehow easier to convert it into Lucene Query?

 I only thought about Java serialization because:
 - it's rather close to the in-object format
 - the mechanism is rather stable and is an established standard in Java/JVM
 - Lucene Queries seem to implement java.io.Serializable (haven't done
 a thorough check but looks good on the surface)
 - other conversions (e.g. using Xtream) are either slow or require
 custom annotations. I personally don't see how would Lucene/Solr
 include them in their core classes.

 Anyway, it would still be interesting to hear if anyone could
 elaborate on query parsing complexity.

 m.


Re: Query parsing VS marshalling/unmarshalling

2012-04-24 Thread Mindaugas Žakšauskas
Hi Erick,

Thanks for looking into this and for the tips you've sent.

I am leaning towards a custom query component at the moment; the primary
reason for it would be to be able to squeeze the amount of data that
is sent over to Solr. A single round trip within the same datacenter
costs around 0.5 ms [1], and if the query doesn't fit into a single
Ethernet packet, this number effectively has to double/triple/etc.

Regarding cache filters - I was actually thinking the opposite:
caching ACL queries (filter queries) would be beneficial as those tend
to be the same across multiple search requests.

[1] 
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//people/jeff/stanford-295-talk.pdf
, slide 13

m.

On Tue, Apr 24, 2012 at 4:43 PM, Erick Erickson erickerick...@gmail.com wrote:
 In general, query parsing is such a small fraction of the total time that,
 almost no matter how complex, it's not worth worrying about. To see
 this, attach debugQuery=on to your query and look at the timings
 in the pepare and process portions of the response. I'd  be
 very sure that it was a problem before spending any time trying to make
 the transmission of the data across the wire more efficient, my first
 reaction is that this is premature optimization.

 Second, you could do this on the server side with a custom query
 component if you chose. You can freely modify the query
 over there and it may make sense in your situation.

 Third, consider no cache filters, which were developed for
 expensive filter queries, ACL being one of them. See:
 https://issues.apache.org/jira/browse/SOLR-2429

 Fourth, I'd ask if there's a way to reduce the size of the FQ
 clause. Is this on a particular user basis or groups basis?
 If you can get this down to a few groups that would help. Although
 there's often some outlier who is member of thousands of
 groups :(.

 Best
 Erick


 2012/4/24 Mindaugas Žakšauskas min...@gmail.com:
 On Tue, Apr 24, 2012 at 3:27 PM, Benson Margulies bimargul...@gmail.com 
 wrote:
 I'm about to try out a contribution for serializing queries in
 Javascript using Jackson. I've previously done this by serializing my
 own data structure and putting the JSON into a custom query parameter.

 Thanks for your reply. Appreciate your effort, but I'm not sure if I
 fully understand the gain.

 Having data in JSON would still require it to be converted into Lucene
 Query at the end which takes space  CPU effort, right? Or are you
 saying that having query serialized into a structured data blob (JSON
 in this case) makes it somehow easier to convert it into Lucene Query?

 I only thought about Java serialization because:
 - it's rather close to the in-object format
 - the mechanism is rather stable and is an established standard in Java/JVM
 - Lucene Queries seem to implement java.io.Serializable (haven't done
 a thorough check but looks good on the surface)
 - other conversions (e.g. using Xtream) are either slow or require
 custom annotations. I personally don't see how would Lucene/Solr
 include them in their core classes.

 Anyway, it would still be interesting to hear if anyone could
 elaborate on query parsing complexity.

 m.


Re: Query parsing VS marshalling/unmarshalling

2012-04-24 Thread Erick Erickson
If you're assembling an fq clause, this is all done or you, although
you need to take some care to form the fq clause _exactly_
the same way each time. Think of the filterCache as a key/value
map where the key is the raw fq text and the value is the docs
satisfying that query.

So fq=acl:(a OR b) will not, for instance, match
 fq=acl:(b OR a)
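
One cheap way to guarantee identical keys is to canonicalize the clause
before sending it — a minimal sketch (names made up):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class AclFilter {
  /** Build a canonical fq string so equal ACL sets hit the same cache entry. */
  static String aclFilter(List<String> userAcls) {
    List<String> acls = new ArrayList<String>(userAcls);
    Collections.sort(acls); // stable order => stable filterCache key
    StringBuilder sb = new StringBuilder("acl:(");
    for (int i = 0; i < acls.size(); i++) {
      if (i > 0) sb.append(" OR ");
      sb.append(acls.get(i));
    }
    return sb.append(')').toString();
  }
}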

FWIW
Erick

2012/4/24 Mindaugas Žakšauskas min...@gmail.com:
 Hi Erick,

 Thanks for looking into this and for the tips you've sent.

 I am leaning towards custom query component at the moment, the primary
 reason for it would be to be able to squeeze the amount of data that
 is sent over to Solr. A single round trip within the same datacenter
 is worth around 0.5 ms [1] and if query doesn't fit into a single
 ethernet packet, this number effectively has to double/triple/etc.

 Regarding cache filters - I was actually thinking the opposite:
 caching ACL queries (filter queries) would be beneficial as those tend
 to be the same across multiple search requests.

 [1] 
 http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//people/jeff/stanford-295-talk.pdf
 , slide 13

 m.

 On Tue, Apr 24, 2012 at 4:43 PM, Erick Erickson erickerick...@gmail.com 
 wrote:
 In general, query parsing is such a small fraction of the total time that,
 almost no matter how complex, it's not worth worrying about. To see
 this, attach debugQuery=on to your query and look at the timings
 in the pepare and process portions of the response. I'd  be
 very sure that it was a problem before spending any time trying to make
 the transmission of the data across the wire more efficient, my first
 reaction is that this is premature optimization.

 Second, you could do this on the server side with a custom query
 component if you chose. You can freely modify the query
 over there and it may make sense in your situation.

 Third, consider no cache filters, which were developed for
 expensive filter queries, ACL being one of them. See:
 https://issues.apache.org/jira/browse/SOLR-2429

 Fourth, I'd ask if there's a way to reduce the size of the FQ
 clause. Is this on a particular user basis or groups basis?
 If you can get this down to a few groups that would help. Although
 there's often some outlier who is member of thousands of
 groups :(.

 Best
 Erick


 2012/4/24 Mindaugas Žakšauskas min...@gmail.com:
 On Tue, Apr 24, 2012 at 3:27 PM, Benson Margulies bimargul...@gmail.com 
 wrote:
 I'm about to try out a contribution for serializing queries in
 Javascript using Jackson. I've previously done this by serializing my
 own data structure and putting the JSON into a custom query parameter.

 Thanks for your reply. Appreciate your effort, but I'm not sure if I
 fully understand the gain.

 Having data in JSON would still require it to be converted into Lucene
 Query at the end which takes space  CPU effort, right? Or are you
 saying that having query serialized into a structured data blob (JSON
 in this case) makes it somehow easier to convert it into Lucene Query?

 I only thought about Java serialization because:
 - it's rather close to the in-object format
 - the mechanism is rather stable and is an established standard in Java/JVM
 - Lucene Queries seem to implement java.io.Serializable (haven't done
 a thorough check but looks good on the surface)
 - other conversions (e.g. using Xtream) are either slow or require
 custom annotations. I personally don't see how would Lucene/Solr
 include them in their core classes.

 Anyway, it would still be interesting to hear if anyone could
 elaborate on query parsing complexity.

 m.


Re: query parsing - removes a term

2011-06-14 Thread Dmitry Kan
Do you use stop word removal on the text field?
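
(If the query analyzer for your text field contains something along the
lines of <filter class="solr.StopFilterFactory" words="stopwords.txt"/> and
"was" is listed in stopwords.txt — it is in the default English list — the
term gets dropped during query analysis, which would explain the parsed
query you're seeing.)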

Dmitry

On Tue, Jun 14, 2011 at 9:18 PM, Andrea Eakin 
andrea.ea...@systemsbiology.org wrote:

 I am trying to do the following type of query:

 +text:(was wasp) +pub_date_year:[1991 TO 2011]

 When I turn debugQuery=on I find that the parsedquery is only sending in
 the
 +text:(wasp) on parsing, and doesn't use the was value.  Why is it
 removing one of the terms?

 Thanks!
 Andrea




-- 
Regards,

Dmitry Kan


Re: query parsing ( expansion ) in solr

2009-12-23 Thread gudumba l
Hi,
 I have explored DisMaxRequestHandler. It could serve some
of my purposes but not all.
1) It seems we have to decide the alternative field list beforehand
and declare it in the config.xml. But the field list for which
synonyms are to be considered is not definite (at least in the view
of declaring it manually in the xml); it's getting updated frequently
depending upon the indexed fields. Anyway, if the list is too big it's
hard to follow this approach.

2) I have another issue too.. We could mention city, place, town in
the dismax declaration, but what if there is another list of synonyms,
like.. if the query is organisation:xyz, for which I would like
to convert the query to
  organisation:xyz OR company:xyz OR institution:xyz.

   As far as I have explored, it is not possible to link city, organisation
to their corresponding synonyms separately; we can only declare a
set of default field names to be searched.
 If I am wrong at any point, please let me know.
  Any other suggestions?
Thanks.


2009/12/22 AHMET ARSLAN iori...@yahoo.com:

 Hello All,
             I have been
 trying to find out the right place to parse
 the query submitted. To be brief, I need to expand the
 query. For
 example.. let the query be
        city:paris
 then I would like to expand the query as .. follows
     city:paris OR place:paris OR town:paris .

      I guess the synonym support is
 provided only for values but not
 field names.

 Why not use DisMaxRequestHandler?
 ...search for the individual words across several fields...
 http://wiki.apache.org/solr/DisMaxRequestHandler






Re: query parsing ( expansion ) in solr

2009-12-23 Thread AHMET ARSLAN
 Hi,
      I have explored
 DisMaxRequestHandler. It could serve for some
 of my purposes but not all.
 1) It seems we have to decide that alternative field list
 beforehand
 and declare them in the config.xml . But the field list for
 which
 synonyms are to be considered is not definite ( at least in
 the view
 of declaring manually in the xml ), its getting updated
 frequently
 depending upon the indexed fiels. Anyways if the list is
 too big its
 hard to follow this approach.
 
 2) I have another issue too.. We could mention city ,
 place, town in
 the dismax declaration, but what if there is another list
 of synonyms
 like .. if the query is organisation : xyz.. for which I
 would like
 to convert the query to
       organisation:xyz OR company:xyz OR
 institution:xyz .
 
    As far as I explored it is not possible
 to link city, organisation
 to their corresponding synonyms seperately, but we can only
 decalre a
 set of default field names to be searched.
      If I am wrong at any point, please
 let me know.
       Any other suggestions?
 Thanks.

If you want field synonyms handled separately, then you can extend 
org.apache.solr.handler.component.SearchHandler and override

public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
    throws Exception
{
  final String q = req.getParams().get(CommonParams.Q);

  // process the incoming query with string operations,
  // e.g. if (q.startsWith("organisation:")) { expand it }
  String expandedQuery = expand(q); // expand() is whatever rewriting you need

  ModifiableSolrParams solrParams = new ModifiableSolrParams(req.getParams());

  solrParams.set(CommonParams.Q, expandedQuery);
  req.setParams(solrParams);

  super.handleRequestBody(req, rsp);
}

Then register this new request handler in solrconfig.xml and use it.
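
For example, something like:

  <requestHandler name="/expandsearch" class="com.example.ExpandingSearchHandler"/>

(the handler name and class are whatever you named your subclass).
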
Does this approach serve your purposes?





Re: query parsing ( expansion ) in solr

2009-12-23 Thread gudumba l
Hello,
 Thanks. This would absolutely serve. I thought of doing it in the
query parser part which I mentioned in my first mail. But if the query is
a complex one, then it would become a bit complicated. That's why I
wanted to know whether there is any other way, similar to the
second point in my first mail..

---2) I could first pass the incoming query string to a default parser
provided by Solr and then retrieve all the Terms ( and then add
synonym terms ) by calling Query.extractTerms() on the
returned Query object, but I am unable to work out how to recover the
relations among the Terms, like.. whether it's Term1 OR Term2 AND Term3,
 or   Term1 AND Term2 AND Term3 .. or something else.
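
Is walking the parsed query tree the right way to recover them, instead of
flattening it with extractTerms()? A rough sketch of what I mean:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class QueryWalker {
  void walk(Query q) {
    if (q instanceof BooleanQuery) {
      for (BooleanClause c : ((BooleanQuery) q).getClauses()) {
        // c.getOccur() is MUST / SHOULD / MUST_NOT, i.e. the AND/OR/NOT
        // relation of this clause to its siblings
        walk(c.getQuery());
      }
    } else if (q instanceof TermQuery) {
      Term t = ((TermQuery) q).getTerm();
      // t.field() / t.text() — the place to add field-name synonyms
    }
  }
}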

Thanks.

2009/12/23 AHMET ARSLAN iori...@yahoo.com:
 Hi,
      I have explored
 DisMaxRequestHandler. It could serve for some
 of my purposes but not all.
 1) It seems we have to decide that alternative field list
 beforehand
 and declare them in the config.xml . But the field list for
 which
 synonyms are to be considered is not definite ( at least in
 the view
 of declaring manually in the xml ), its getting updated
 frequently
 depending upon the indexed fiels. Anyways if the list is
 too big its
 hard to follow this approach.

 2) I have another issue too.. We could mention city ,
 place, town in
 the dismax declaration, but what if there is another list
 of synonyms
 like .. if the query is organisation : xyz.. for which I
 would like
 to convert the query to
       organisation:xyz OR company:xyz OR
 institution:xyz .

    As far as I explored it is not possible
 to link city, organisation
 to their corresponding synonyms seperately, but we can only
 decalre a
 set of default field names to be searched.
      If I am wrong at any point, please
 let me know.
       Any other suggestions?
 Thanks.

 If you want field synonyms handled separately, then you can extend 
 org.apache.solr.handler.component.SearchHandler and override

 public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
     throws Exception
 {
   final String q = req.getParams().get(CommonParams.Q);

   // process the incoming query with string operations,
   // e.g. if (q.startsWith("organisation:")) { expand it }
   String expandedQuery = expand(q);

   ModifiableSolrParams solrParams = new ModifiableSolrParams(req.getParams());

   solrParams.set(CommonParams.Q, expandedQuery);
   req.setParams(solrParams);

   super.handleRequestBody(req, rsp);
 }

 Then register this new request handler in solrconfig.xml and use it.
 Does this approach serve your purposes?






Re: query parsing ( expansion ) in solr

2009-12-22 Thread AHMET ARSLAN

 Hello All,
             I have been
 trying to find out the right place to parse
 the query submitted. To be brief, I need to expand the
 query. For
 example.. let the query be
        city:paris
 then I would like to expand the query as .. follows
     city:paris OR place:paris OR town:paris .
 
      I guess the synonym support is
 provided only for values but not
 field names.

Why not use DisMaxRequestHandler? 
...search for the individual words across several fields... 
http://wiki.apache.org/solr/DisMaxRequestHandler





Re: Query Parsing in Custom Request Handler

2009-01-16 Thread Hana

Sorry to all, there was a terrible bug in my code.
I should have checked whether the query was changed with
!q.toString().equals(newQuery.toString()) instead of (q != newQuery)!
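
In other words, the guard becomes something like:

  Query newQuery = searchChronological(q);
  if (!q.toString().equals(newQuery.toString()))
  {
    // the query really changed, so replace the q parameter as before
  }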





Hana wrote:
 
 Hi
 
 I need a help with boolean queries in my custom RequestHandler. The
 purpose of the handler is to translate human readable
 date (like January 1990 or 15.2.1983 or 1995) into two date range fields
 using internal date representation.
 
 E.g. simple search 'q=chronological:1942' translates to
 
 '+from:[1942-01-01T00:00:01Z TO 1942-12-31T23:59:59Z]
 +to:[1942-01-01T00:00:01Z TO 1942-12-31T23:59:59Z]'
 
 Everything works fine in the previous search, but when I try more complex
 boolean search it returns no result.
 
 E.g complex search 'q=London AND chronological:1942'
 
 my RequestHandler translates it to 
 
 '+text:london +(+from:[1942-01-01T00:00:01Z TO 1942-12-31T23:59:59Z]
 +to:[1942-01-01T00:00:01Z TO 1942-12-31T23:59:59Z])'
 
 So this query above doesn't work, and I don't see the reason why, because it
 seems to produce the correct query.
 
 
 I have checked it with direct query bellow, it returns correct results: 
 
 'q=London AND (from:[1942-01-01T00:00:00Z TO 1942-12-31T23:59:59Z] AND
 to:[1942-01-01T00:00:00Z TO 1942-12-31T23:59:59Z])'
 
 and the boolean query syntax is:
 
 '+text:london +(+from:[1942-01-01T00:00:00 TO 1942-12-31T23:59:59]
 +to:[1942-01-01T00:00:00 TO 1942-12-31T23:59:59])'
 
 
 So I do not understand why the previous is not working when the
 boolean query is totally the same except for the 'Z' char in the date
 strings. But as the simple query works, that doesn't seem to be the reason
 the complex query fails.
 
 
 Cheers
 
 Hana
 
 
 Here's the code of the RequestHandler:
 
 
 public class CenturyShareRequestHandling extends StandardRequestHandler
 {
 
   public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception
   {
     SolrParams p = req.getParams();
     String query = p.get(CommonParams.Q);
     Query q = QueryParsing.parseQuery(query, req.getSchema());
     Query newQuery = searchChronological(q);
     if (q != newQuery)
     {
       ModifiableSolrParams m = new ModifiableSolrParams(SolrParams.toMultiMap(p.toNamedList()));
       m.remove(CommonParams.Q);
       m.add(CommonParams.Q, newQuery.toString());
       req.setParams(m);
     }
     super.handleRequestBody(req, rsp);
   }
 
   private Query searchChronological(Query q)
   {
     if (q instanceof BooleanQuery)
     {
       BooleanQuery bq = (BooleanQuery) q;
       BooleanClause[] cl = bq.getClauses();
       for (int i = 0; i < cl.length; i++)
       {
         if (cl[i].getQuery() instanceof BooleanQuery)
         {
           searchChronological(cl[i].getQuery());
         }
         else if (cl[i].getQuery() instanceof TermQuery)
         {
           String result = getTemporalTerm((TermQuery) cl[i].getQuery());
           if (result != null)
           {
             Query dateQuery = replaceChronological(result);
             if (dateQuery != null)
               cl[i].setQuery(dateQuery);
           }
         }
       }
     }
     else if (q instanceof TermQuery)
     {
       String result = getTemporalTerm((TermQuery) q);
       if (result != null)
       {
         Query dateQuery = replaceChronological(result);
         if (dateQuery != null)
           q = dateQuery;
       }
     }
     return q;
   }
 
   private String getTemporalTerm(TermQuery tq)
   {
     if ("chronological".equals(tq.getTerm().field()))
       return tq.getTerm().text();
     else
       return null;
   }
 
   private Query replaceChronological(String chronological)
   {
     DateRange r = getDateRange(chronological);
     BooleanQuery query = null;
     if (r.getStartDate() != null && r.getEndDate() != null)
     {
       String startDate = r.getFormatedStartDate();
       String endDate = r.getFormatedEndDate();
       Term start = new Term("from", startDate);
       Term end = new Term("from", endDate);
 
       RangeQuery startQuery = new RangeQuery(start, end, true);
       start = new Term("to", startDate);
       end = new Term("to", endDate);
       RangeQuery endQuery = new RangeQuery(start, end, true);
       query = new BooleanQuery();
       query.add(new BooleanClause(startQuery, BooleanClause.Occur.MUST));
       query.add(new BooleanClause(endQuery, BooleanClause.Occur.MUST));
     }
     return query;
   }
 
   private DateRange getDateRange(String text)
   {
     if (text == null)
       return null;
     else
     {
       DateParser p = new DateParser();
       return p.parseDateRange(text);
     }
   }
 
 }
 
 




Re: query parsing issue + behavior as OR (solr 1.4-dev)

2008-10-20 Thread Norberto Meijome
On Mon, 20 Oct 2008 06:21:06 -0700 (PDT)
Sunil Sarje [EMAIL PROTECTED] wrote:

 I am working with nightly build of Oct 17, 2008  and found the issue that
 something wrong with LuceneQParserPlugin; It takes + as OR

Sunil, please do not hijack the thread :

http://en.wikipedia.org/wiki/Thread_hijacking

thanks,
B

_
{Beto|Norberto|Numard} Meijome

He could be a poster child for retroactive birth control.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: query parsing

2008-08-12 Thread Erik Hatcher
Solr/Lucene QueryParser returns a TermQuery for phrases that end up  
only as a single term.  This could happen, for example, if it was  
using Solr's string field type (which has effectively no analyzer).


I'd guess that you'd want to re-analyze TermQuery's?  (though that
sounds problematic for many cases)  Or possibly use your own
SolrQueryParser subclass and override #getFieldQuery.
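
Something like this, as a rough, untested sketch against the 1.3-era API
(the sentinel field name and the field list are made up):

import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.solr.schema.IndexSchema;
import org.apache.solr.search.SolrQueryParser;

public class AllFieldsQueryParser extends SolrQueryParser {
  private final String[] allFields = { "title", "body" }; // your real fields

  public AllFieldsQueryParser(IndexSchema schema) {
    super(schema, "__impossible__"); // the sentinel default field
  }

  protected Query getFieldQuery(String field, String queryText)
      throws ParseException {
    if (!"__impossible__".equals(field)) {
      return super.getFieldQuery(field, queryText); // explicit field: leave it
    }
    BooleanQuery bq = new BooleanQuery();
    for (int i = 0; i < allFields.length; i++) {
      Query sub = super.getFieldQuery(allFields[i], queryText);
      if (sub != null) {
        bq.add(sub, BooleanClause.Occur.SHOULD);
      }
    }
    return bq;
  }
}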


Erik

On Aug 12, 2008, at 5:26 AM, Stefan Oestreicher wrote:


Hi,

I need to modify the query to search through all fields if no explicit
field has been specified. I know there's the dismax handler but I'd like
to use the standard query syntax.
I implemented that with my own QParserPlugin and QParser and for simple
term queries it works great. I'm using the SolrQueryParser which I get
from the schema to parse the query with an impossible field name as the
default field and then I rewrite the query accordingly.
Unfortunately this doesn't work with phrase queries, the SolrQueryParser
always returns a TermQuery instead of a phrase query.

What am I missing? Is this even a viable approach?

This is a code snippet from a test case (extending AbstractSolrTestCase)
which I used to verify that it's not returning a PhraseQuery:

-8-
SolrQueryParser parser = h.getCore().getSchema().getSolrQueryParser(null);
Query q = parser.parse("baz \"foo bar\"");
assertTrue( q instanceof BooleanQuery );
BooleanQuery bq = (BooleanQuery)q;
BooleanClause[] cl = bq.getClauses();
assertEquals(2, cl.length);
//this assertion fails
assertTrue(cl[1].getQuery() instanceof PhraseQuery);
-8-

I'm using solr 1.3, r685085.

TIA,

Stefan Oestreicher




RE: query parsing

2008-08-12 Thread Stefan Oestreicher
Ah, yes, the FieldType I used was not the one I needed. I completely missed
that. Thank you very much, it's working perfectly now.

thanks,

Stefan Oestreicher

 -Original Message-
 From: Erik Hatcher [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, August 12, 2008 11:46 AM
 To: solr-user@lucene.apache.org
 Subject: Re: query parsing
 
 Solr/Lucene QueryParser returns a TermQuery for phrases 
 that end up only as a single term.  This could happen, for 
 example, if it was using Solr's string field type (which 
 has effectively no analyzer).
 
 I'd guess that you'd want to re-analyze TermQuery's?  (though 
 that sound problematic for many cases)  Or possibly use your 
 own SolrQueryParser subclass and override #getFieldQuery.
 
   Erik
 
 On Aug 12, 2008, at 5:26 AM, Stefan Oestreicher wrote:
 
  Hi,
 
  I need to modify the query to search through all fields if 
 no explicit 
  field has been specified. I know there's the dismax handler but I'd 
  like to use the standard query syntax.
  I implemented that with my own QParserPlugin and QParser and for 
  simple term queries it works great. I'm using the SolrQueryParser 
  which I get from the schema to parse the query with an impossible 
  field name as the default field and then I rewrite the query 
  accordingly.
  Unfortunately this doesn't work with phrase queries, the 
  SolrQueryParser always returns a TermQuery instead of a 
 phrase query.
 
  What am I missing? Is this even a viable approach?
 
  This is a code snippet from a test case (extending
  AbstractSolrTestCase)
  which I used to verify that it's not returning a PhraseQuery:
 
  -8-
  SolrQueryParser parser =
  h.getCore().getSchema().getSolrQueryParser(null);
   Query q = parser.parse("baz \"foo bar\""); assertTrue( q instanceof
  BooleanQuery ); BooleanQuery bq = (BooleanQuery)q; 
 BooleanClause[] cl 
  = bq.getClauses(); assertEquals(2, cl.length); //this 
 assertion fails
  assertTrue(cl[1].getQuery() instanceof PhraseQuery);
  -8-
 
  I'm using solr 1.3, r685085.
 
  TIA,
 
  Stefan Oestreicher
 
 



Re: query parsing wildcards

2007-11-28 Thread Charles Hornberger
I should have Googled better. It seems that my question has been asked
and answered already, and not just once:

  http://www.nabble.com/Using-wildcard-with-accented-words-tf4673239.html
  
http://groups.google.com/group/acts_as_solr/browse_thread/thread/42920dc2dcc5fa88

On Nov 28, 2007 9:42 AM, Charles Hornberger
[EMAIL PROTECTED] wrote:
 I'm confused by some behavior I'm seeing in Solr (i'm using 1.2.0). I
 have a field named description, declared with the following
 fieldType:

 <fieldType name="textTightUnstemmed" class="solr.TextField"
   positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory"
       synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
       words="stopwords.txt"/>
     <filter class="solr.WordDelimiterFilterFactory"
       generateWordParts="0" generateNumberParts="0" catenateWords="1"
       catenateNumbers="1" catenateAll="0"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
 </fieldType>

 The problem I'm having is that when I search for description:deck*, I
 get the results I expect; when I search for description:Deck*, I get
 nothing. I want both queries to return the same result set. (I'm using
 the standard request handler.)

 Interestingly, when I search for description:Deck from the web
 interface, the debug output shows that the query term is converted to
 lowercase:

 <str name="rawquerystring">description:Deck</str>
 <str name="querystring">description:Deck</str>
 <str name="parsedquery">description:deck</str>
 <str name="parsedquery_toString">description:deck</str>

 ... but when I search for description:Deck*, it shows that it is not:

 <str name="rawquerystring">description:Deck*</str>
 <str name="querystring">description:Deck*</str>
 <str name="parsedquery">description:Deck*</str>
 <str name="parsedquery_toString">description:Deck*</str>

 What am I doing wrong here?

 Also, when I use the Field Analysis tool for description:Deck*, it
 shows the following (sorry for the bad copy/paste):

 Query Analyzer
 org.apache.solr.analysis.WhitespaceTokenizerFactory {}
   position=1   text=Deck*   type=word   start,end=0,5
 org.apache.solr.analysis.SynonymFilterFactory {synonyms=synonyms.txt, expand=false, ignoreCase=true}
   position=1   text=Deck*   type=word   start,end=0,5
 org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt, ignoreCase=true}
   position=1   text=Deck*   type=word   start,end=0,5
 org.apache.solr.analysis.WordDelimiterFilterFactory {generateNumberParts=0, catenateWords=1, generateWordParts=0, catenateAll=0, catenateNumbers=1}
   position=1   text=Deck   type=word   start,end=0,4
 org.apache.solr.analysis.LowerCaseFilterFactory {}
   position=1   text=deck   type=word   start,end=0,4
 org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
   position=1   text=deck   type=word   start,end=0,4

 Thanks,
 Charlie



Re: query parsing wildcards

2007-11-28 Thread Chris Hostetter

: I should have Googled better. It seems that my question has been asked
: and answered already, and not just once:

right, wildcard and prefix queries aren't analyzed by the query 
parser (there's more on the why of this in the Lucene-Java FAQ).

To clarify one other part of your question

:  Also, when I use the Field Analysis tool for description:Deck*, it
:  shows the following (sorry for the bad copy/paste):

the analysis tool only shows you the analysis portion of 
indexing/querying ... it knows nothing about which query parser you are 
using, so it doesn't know anything about any special query parser 
characters (like *).  The output it gave you shows you want the 
standard request handler would have done if you'd used the standard 
request handler to search for...
 description:Deck*
or:  description:Deck\*

(where the * character is 'escaped')
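
If all you're after is the lowercasing, the usual workaround is to do it
on the client before the query parser ever sees the string, e.g. (sketch):

  String field = "description";
  String userPrefix = "Deck";
  String q = field + ":" + userPrefix.toLowerCase() + "*"; // description:deck*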



-Hoss