Re: searching only within a date range

2019-06-07 Thread Mark Fenbers - NOAA Federal
Disregard my previous response.  When I reindexed, something went wrong and
so my Lucene database was empty, which explains the immediate return of 0
results.  I reindexed again (properly) and all is working fine now.  Thanks
for the help.
Mark

On Fri, Jun 7, 2019 at 10:40 AM Erick Erickson 
wrote:

> Yeah, it can be opaque…
>
> My first guess is that you may not have a field “posttime” defined in your
> schema and/or documents. For searching it needs “indexed=true” and for
> faceting/grouping/sorting it should have “docValues=true”. That’s what your
> original facet query was telling you, the field isn’t there. Switching to
> an “fq” clause is consistent with there being no “posttime” field since
> Solr is fine with  docs that don’t have a  particular field. So by
> specifying a date range, any doc without a “posttime” field will be omitted
> from the results.
>
> Or it  just is spelled differently ;)
>
> Some things that might help:
>
> 1> Go to the admin UI and select cores>>your_core, then look at the
> “schema” link. There’s a drop-down that lets you select fields that are
> actually in your index and see  some of the values. My bet: “posttime”
> isn’t in the list. If so, you need to add it and re-index the docs  with a
> posttime field. If there is a “posttime”, select it and look at the upper
> right to see how it’s defined. There are two rows, one for what the schema
> thinks the definition is and one for what is actually in the Lucene  index.
>
> 2> add debug=query to your queries, and run them from the admin UI.
> That’ll give you a _lot_ quicker turn-around as well as some good info
> about how  the query was actually executed.
>
> Best,
> Erick
>
> > On Jun 7, 2019, at 7:23 AM, Mark Fenbers - NOAA Federal
>  wrote:
> >
> > So, instead of addDateRangeFacet(), I used:
> > query.setParam("fq", "posttime:[2010-01-01T00:00:00Z TO
> > 2015-01-01T00:00:00Z]");
> >
> > I didn't get any errors, but the query returned immediately with 0
> > results.  Without this constraint, it searches 13,000 records and takes 1
> to
> > 2 minutes and returns 356 records.  So something is not quite right, and
> > I'm too new at this to understand where I went wrong.
> > Mark
> >
> > On Fri, Jun 7, 2019 at 9:52 AM Andrea Gazzarini 
> > wrote:
> >
> >> Hi Mark, you are using a "range facet" which is a "query-shape" feature,
> >> it doesn't have any constraint on the results (i.e. it doesn't filter at
> >> all).
> >> You need to add a filter query [1] with a date range clause (e.g.
> >> fq=field:[<date> TO <date or *>]).
> >>
> >> Best,
> >> Andrea
> >>
> >> [1]
> >>
> >>
> https://lucene.apache.org/solr/guide/6_6/common-query-parameters.html#CommonQueryParameters-Thefq_FilterQuery_Parameter
> >> [2] https://lucene.apache.org/solr/guide/6_6/working-with-dates.html
> >>
> >> On 07/06/2019 14:02, Mark Fenbers - NOAA Federal wrote:
> >>> Hello!
> >>>
> >>> I have a search setup and it works fine.  I search a text field called
> >>> "logtext" in a database table.  My Java code is like this:
> >>>
> >>> SolrQuery query = new SolrQuery();
> >>> query.setQuery(searchWord);
> >>> query.setParam("df", "logtext");
> >>>
> >>> Then I execute the search... and it works just great.  But now I want
> to
> >>> add a constraint to only search for the "searchWord" within a certain
> >> range
> >>> of time -- given timestamps in the column called "posttime".  So, I
> added
> >>> the code in bold below:
> >>>
> >>> SolrQuery query = new SolrQuery();
> >>> query.setQuery(searchWord);
> >>> *query.setFacet(true);*
> >>> *query.addDateRangeFacet("posttime", new
> Date(System.currentTimeMillis()
> >> -
> >>> 1000L * 86400L * 365L), new Date(System.currentTimeMillis()), "+1DAY");
> >> /*
> >>> from 1 year ago to present */*
> >>> query.setParam("df", "logtext");
> >>>
> >>> But this gives me a complaint: *undefined field: "posttime"* so I
> clearly
> >>> do not understand the arguments needed to addDateRangeFacet().  Can
> >> someone
> >>> help me determine the proper code for doing what I want?
> >>>
> >>> Further, I am puzzled about the "gap" argument [last one in
> >>> addDateRangeFacet()].  What does this do?  I used +1DAY, but I really
> >> have
> >>> no idea the purpose of this.  I haven't found any documentation that
> >>> explains this well.
> >>>
> >>> Mark
> >>>
> >>
> >>
>
>


Re: searching only within a date range

2019-06-07 Thread Mark Fenbers - NOAA Federal
I added "posttime" to the schema first thing this morning, but your message
reminded me that I needed to re-index the table, which I did.  My schema
entry:

[field definition stripped by the list archive; per the discussion below,
posttime is a date-typed field]
But my SQL contains "SELECT posttime as id" and so I tried both "posttime"
and "id" in my setParam() function, namely,
query.setParam("fq", "id:[2007-01-01T00:00:00Z TO 2010-01-01T00:00:00Z]");

So, whether I use "id" (string) or "posttime" (date), my results are an
immediate return of zero results.

I did look in the admin interface and *did* see posttime listed as one of
the index items.  The two rows (Index Analyzer and Query Analyzer) show the
same thing: org.apache.solr.schema.FieldType$DefaultAnalyzer, though I'm
not certain of the implications of this.

I have not attempted your debug=query suggestion just yet...
Mark
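
For reference, Erick's debug=query check can also be run from SolrJ rather
than the admin UI; a minimal sketch, assuming the same SolrQuery and
HttpSolrClient objects as in the snippets elsewhere in this thread:

    query.setParam("debug", "query");  // same effect as appending debug=query to the URL
    QueryResponse rsp = client.query(query);
    // "parsedquery" in the debug output shows how Solr actually interpreted q and fq
    System.out.println(rsp.getDebugMap().get("parsedquery"));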

On Fri, Jun 7, 2019 at 10:40 AM Erick Erickson 
wrote:

> Yeah, it can be opaque…
>
> My first guess is that you may not have a field “posttime” defined in your
> schema and/or documents. For searching it needs “indexed=true” and for
> faceting/grouping/sorting it should have “docValues=true”. That’s what your
> original facet query was telling you, the field isn’t there. Switching to
> an “fq” clause is consistent with there being no “posttime” field since
> Solr is fine with  docs that don’t have a  particular field. So by
> specifying a date range, any doc without a “posttime” field will be omitted
> from the results.
>
> Or it  just is spelled differently ;)
>
> Some things that might help:
>
> 1> Go to the admin UI and select cores>>your_core, then look at the
> “schema” link. There’s a drop-down that lets you select fields that are
> actually in your index and see  some of the values. My bet: “posttime”
> isn’t in the list. If so, you need to add it and re-index the docs  with a
> posttime field. If there is a “posttime”, select it and look at the upper
> right to see how it’s defined. There are two rows, one for what the schema
> thinks the definition is and one for what is actually in the Lucene  index.
>
> 2> add debug=query to your queries, and run them from the admin UI.
> That’ll give you a _lot_ quicker turn-around as well as some good info
> about how  the query was actually executed.
>
> Best,
> Erick
>
> > On Jun 7, 2019, at 7:23 AM, Mark Fenbers - NOAA Federal
>  wrote:
> >
> > So, instead of addDateRangeFacet(), I used:
> > query.setParam("fq", "posttime:[2010-01-01T00:00:00Z TO
> > 2015-01-01T00:00:00Z]");
> >
> > I didn't get any errors, but the query returned immediately with 0
> > results.  Without this constraint, it searches 13,000 records and takes 1
> to
> > 2 minutes and returns 356 records.  So something is not quite right, and
> > I'm too new at this to understand where I went wrong.
> > Mark
> >
> > On Fri, Jun 7, 2019 at 9:52 AM Andrea Gazzarini 
> > wrote:
> >
> >> Hi Mark, you are using a "range facet" which is a "query-shape" feature,
> >> it doesn't have any constraint on the results (i.e. it doesn't filter at
> >> all).
> >> You need to add a filter query [1] with a date range clause (e.g.
> >> fq=field:[<date> TO <date or *>]).
> >>
> >> Best,
> >> Andrea
> >>
> >> [1]
> >>
> >>
> https://lucene.apache.org/solr/guide/6_6/common-query-parameters.html#CommonQueryParameters-Thefq_FilterQuery_Parameter
> >> [2] https://lucene.apache.org/solr/guide/6_6/working-with-dates.html
> >>
> >> On 07/06/2019 14:02, Mark Fenbers - NOAA Federal wrote:
> >>> Hello!
> >>>
> >>> I have a search setup and it works fine.  I search a text field called
> >>> "logtext" in a database table.  My Java code is like this:
> >>>
> >>> SolrQuery query = new SolrQuery();
> >>> query.setQuery(searchWord);
> >>> query.setParam("df", "logtext");
> >>>
> >>> Then I execute the search... and it works just great.  But now I want
> to
> >>> add a constraint to only search for the "searchWord" within a certain
> >> range
> >>> of time -- given timestamps in the column called "posttime".  So, I
> added
> >>> the code in bold below:
> >>>
> >>> SolrQuery query = new SolrQuery();
> >>> query.setQuery(searchWord);
> >>> *query.setFacet(true);*
> >>> *query.addDateRangeFacet("posttime", new
> Date(System.currentTimeMillis()
> >> -
> >>> 1000L * 86400L * 365L), new Date(System.currentTimeMillis()), "+1DAY");
> >> /*
> >>> from 1 year ago to present */*
> >>> query.setParam("df", "logtext");
> >>>
> >>> But this gives me a complaint: *undefined field: "posttime"* so I
> clearly
> >>> do not understand the arguments needed to addDateRangeFacet().  Can
> >> someone
> >>> help me determine the proper code for doing what I want?
> >>>
> >>> Further, I am puzzled about the "gap" argument [last one in
> >>> addDateRangeFacet()].  What does this do?  I used +1DAY, but I really
> >> have
> >>> no idea the purpose of this.  I haven't found any documentation that
> >>> explains this well.
> >>>
> >>> Mark
> >>>
> >>
> >>
>
>


Re: searching only within a date range

2019-06-07 Thread Mark Fenbers - NOAA Federal
So, instead of addDateRangeFacet(), I used:
query.setParam("fq", "posttime:[2010-01-01T00:00:00Z TO
2015-01-01T00:00:00Z]");

I didn't get any errors, but the query returned immediately with 0
results.  Without this constraint, it searches 13,000 records and takes 1 to
2 minutes and returns 356 records.  So something is not quite right, and
I'm too new at this to understand where I went wrong.
Mark
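
A quick sanity check, sketched assuming an HttpSolrClient named client as in
the other snippets: count the documents that have any posttime value at all.
If this prints 0, the field is missing from the index and any date-range fq
must return 0 results, which matches the diagnosis Erick gives elsewhere in
this thread.

    SolrQuery check = new SolrQuery("posttime:[* TO *]");  // matches docs with any posttime value
    check.setRows(0);                                      // only the count is wanted
    long n = client.query(check).getResults().getNumFound();
    System.out.println(n + " of the 13,000 docs have a posttime value");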

On Fri, Jun 7, 2019 at 9:52 AM Andrea Gazzarini 
wrote:

> Hi Mark, you are using a "range facet" which is a "query-shape" feature,
> it doesn't have any constraint on the results (i.e. it doesn't filter at
> all).
> You need to add a filter query [1] with a date range clause (e.g.
> fq=field:[<date> TO <date or *>]).
>
> Best,
> Andrea
>
> [1]
>
> https://lucene.apache.org/solr/guide/6_6/common-query-parameters.html#CommonQueryParameters-Thefq_FilterQuery_Parameter
> [2] https://lucene.apache.org/solr/guide/6_6/working-with-dates.html
>
> On 07/06/2019 14:02, Mark Fenbers - NOAA Federal wrote:
> > Hello!
> >
> > I have a search setup and it works fine.  I search a text field called
> > "logtext" in a database table.  My Java code is like this:
> >
> > SolrQuery query = new SolrQuery();
> > query.setQuery(searchWord);
> > query.setParam("df", "logtext");
> >
> > Then I execute the search... and it works just great.  But now I want to
> > add a constraint to only search for the "searchWord" within a certain
> range
> > of time -- given timestamps in the column called "posttime".  So, I added
> > the code in bold below:
> >
> > SolrQuery query = new SolrQuery();
> > query.setQuery(searchWord);
> > *query.setFacet(true);*
> > *query.addDateRangeFacet("posttime", new Date(System.currentTimeMillis()
> -
> > 1000L * 86400L * 365L), new Date(System.currentTimeMillis()), "+1DAY");
> /*
> > from 1 year ago to present */*
> > query.setParam("df", "logtext");
> >
> > But this gives me a complaint: *undefined field: "posttime"* so I clearly
> > do not understand the arguments needed to addDateRangeFacet().  Can
> someone
> > help me determine the proper code for doing what I want?
> >
> > Further, I am puzzled about the "gap" argument [last one in
> > addDateRangeFacet()].  What does this do?  I used +1DAY, but I really
> have
> > no idea the purpose of this.  I haven't found any documentation that
> > explains this well.
> >
> > Mark
> >
>
>


searching only within a date range

2019-06-07 Thread Mark Fenbers - NOAA Federal
Hello!

I have a search setup and it works fine.  I search a text field called
"logtext" in a database table.  My Java code is like this:

SolrQuery query = new SolrQuery();
query.setQuery(searchWord);
query.setParam("df", "logtext");

Then I execute the search... and it works just great.  But now I want to
add a constraint to only search for the "searchWord" within a certain range
of time -- given timestamps in the column called "posttime".  So, I added
the code in bold below:

SolrQuery query = new SolrQuery();
query.setQuery(searchWord);
*query.setFacet(true);*
*query.addDateRangeFacet("posttime", new Date(System.currentTimeMillis() -
1000L * 86400L * 365L), new Date(System.currentTimeMillis()), "+1DAY"); /*
from 1 year ago to present */*
query.setParam("df", "logtext");

But this gives me a complaint: *undefined field: "posttime"* so I clearly
do not understand the arguments needed to addDateRangeFacet().  Can someone
help me determine the proper code for doing what I want?

Further, I am puzzled about the "gap" argument [last one in
addDateRangeFacet()].  What does this do?  I used +1DAY, but I really have
no idea the purpose of this.  I haven't found any documentation that
explains this well.
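
For what it's worth, the gap is a faceting parameter: facet.range.gap sets
the bucket width, so +1DAY buckets the counts one day at a time; a range
facet never filters the result set.  The filtering comes from an fq instead,
as suggested earlier in this thread.  A minimal SolrJ sketch of that
approach, assuming an HttpSolrClient named client and using Solr date math
for "one year ago to now":

    SolrQuery query = new SolrQuery();
    query.setQuery(searchWord);
    query.setParam("df", "logtext");
    // addFilterQuery() is equivalent to query.setParam("fq", ...)
    query.addFilterQuery("posttime:[NOW-1YEAR TO NOW]");
    QueryResponse rsp = client.query(query);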

Mark


Ugh! My term is the entire record

2015-12-16 Thread Mark Fenbers

Greetings,

I had my Solr searching capabilities working for a while.  But today I 
inadvertently "unload"d my core from the Admin Interface. After adding 
it back in, it is not working right. Because Solr was down for a while 
in recent weeks, I have also done a full import with the clean option.  
So now, searching on words like Ohio or forecast (both very popular 
words in the documents) return 0 results.


In Schema Browser, "Show Term Info" now reveals that my terms are the 
*entire* text string record instead of individual words.  I had come 
across this issue before, during initially setting up Solr, but now I 
can't remember what I had done to get it to index each *word* instead of 
the entire String stored in the DB record.


Can someone please point me to the trick that does the proper parsing 
and indexing of *each word* in each record?


thanks!
Mark


Re: Ugh! My term is the entire record

2015-12-16 Thread Mark Fenbers

Yup! That was it!  Thanks!
(I changed "string" to "text_en" in my backup copy, too, so this doesn't 
happen again.)

Mark

On 12/16/2015 10:44 AM, Binoy Dalal wrote:

What is the type of the fields in question?
What you're seeing will happen if a field is of type string. If this is the
case then try changing your field type to text_en or text_general depending
on your requirements.

On Wed, 16 Dec 2015, 19:51 Mark Fenbers <mark.fenb...@noaa.gov> wrote:





logical steps to configuring file-based spell-check

2015-11-01 Thread Mark Fenbers

Greetings!

I want my spell-checker to be based on a file 
(/usr/share/dict/linux.words should suffice).  Word-breaks features 
would also be a benefit.  I have previously indexed my docs for 
searching with minimal alterations to the baseline Solr configuration.  
My "docs" are user-typed text, typically a paragraph or two.  The Solr 
searching feature works very well with my local customization.  With the 
success of using the search feature, I now move on to adding 
spell-checking capabilities to my project.


Though my archive of docs *does* contain many technical terms and coded 
site identifiers, I prefer not to use the index-based spellcheck at this 
time, because the archive has never been previously spell-checked and 
I'm apprehensive that misspelled words will appear in my suggestions.  
But the index-based spell-checker is the baseline configuration, so I 
need to change that to use file-based spell checking.  Intuitively, this 
seems as simple as commenting out the IndexBasedSpellChecker XML section 
and uncommenting the FileBasedSpellChecker XML section in the 
solrconfig.xml file that I've customized.  But in doing that, I have 
gotten quite bizarre results, and though I've had much help from some 
very smart (and patient) contributors on this forum, I still have never 
gotten spell-checking to work in any meaningful way, even using the 
debugger.


So, my question for now is:

Is setting up a file-based spell checker just a matter of starting 
with the baseline solrconfig.xml and commenting out the Index-based 
spell checker and uncommenting the File-based Spell Checker (and 
changing the SourceLocation value), or am I overlooking too much??  But 
my second question is, which "baseline" solrconfig.xml should I use as a 
starting point, because there are several solrconfig.xml file nested in 
the subfolders when I unzip the tarball?  I'm using 5.3.0 in case that 
matters.


Thanks!
Mark




Re: File-based Spelling

2015-10-19 Thread Mark Fenbers
OK.  I removed it, started Solr, and refreshed the query, but my results 
are the same, indicating that queryAnalyzerFieldType has nothing to do 
with my problem.


New ideas??
Mark

On 10/19/2015 4:37 AM, Duck Geraint (ext) GBJH wrote:

"Yet, it claimed it found my misspelled word to be "fenber" without the "s""
I wonder if this is because you seem to be applying a stemmer to your dictionary 
words.

Try removing the queryAnalyzerFieldType "text_en" line from 
your spellcheck search component definition.

Geraint


Geraint Duck
Data Scientist
Toxicology and Health Sciences
Syngenta UK
Email: geraint.d...@syngenta.com


-Original Message-
From: Mark Fenbers [mailto:mark.fenb...@noaa.gov]
Sent: 16 October 2015 19:43
To: solr-user@lucene.apache.org
Subject: Re: File-based Spelling

On 10/13/2015 9:30 AM, Dyer, James wrote:

Mark,

The older spellcheck implementations create an n-gram sidecar index, which is 
why you're seeing your name split into 2-grams like this.  See the IR Book by 
Manning et al, section 3.3.4 for more information.  Based on the results you're 
getting, I think it is loading your file correctly.  You should now try a query 
against this spelling index, using words *not* in the file you loaded that are 
within 1 or 2 edits from something that is in the dictionary.  If it doesn't 
yield suggestions, then post the relevant sections of the solrconfig.xml, 
schema.xml and also the query string you are trying.

James Dyer
Ingram Content Group


James, I've already done this.   My query string was "fenbers". This is
my last name which does *not* occur in the linux.words file.  It is only
1 edit distance from "fenders" which *is* in the linux.words file.  Yet, it claimed it found my 
misspelled word to be "fenber" without the "s"
and it gave me these 8 suggestions:
f en be r
f e nb er
f en b er
f e n be r
f en b e r
f e nb e r
f e n b er
f e n b e r

So I'm attaching the entire solrconfig.xml and schema.xml that is in 
effect.  These are in a single file with all the block comments removed.

I'm also puzzled that you say "older implementations create a sidecar index"... 
because I am using v5.3.0, which was the latest version as of my download a month or two 
ago.  So, with my implementation being recent, why is an n-gram sidecar index still 
(seemingly) being produced?

thanks for the help!
Mark










Re: NullPointerException

2015-10-16 Thread Mark Fenbers
Yes, I'm aware that building an index is expensive and I will remove 
"buildOnStartup" once I have it working.  The field I added was an 
attempt to get it working...


I have attached my latest version of solrconfig.xml and schema.xml (both 
are in the same attachment), except that I have removed all block 
comments for your easier scrutiny.  The source of the correctly spelled 
words is a RedHat baseline file called /usr/share/dict/linux.words.  
(Does this also mean it is the source of the suggestions?)


thanks for the help!

Mark

On 10/13/2015 7:07 AM, Alessandro Benedetti wrote:

Generally it is highly discouraged to build the spellcheck on startup.
In the case of a big suggestion file, you are going to build the suggester
data structures (basically an FST in memory and then on disk) for a long
time, on startup.
You should build your spellchecker only when you change the file source of
the suggestions.

Checking the snippet, first I see you add a field to the FileBased
Spellchecker config, which is useless.
Anyway should not be the problem.
Can you give us the source of suggestions ?
A snippet of the file ?

Cheers

On 13 October 2015 at 10:02, Duck Geraint (ext) GBJH <
geraint.d...@syngenta.com> wrote:





[Attachment: solrconfig.xml and schema.xml in a single file with block
comments removed.  The XML markup was stripped by the list archive.  The
surviving values show a spellcheck searchComponent with
queryAnalyzerFieldType text_en; a WordBreak checker
(solr.WordBreakSolrSpellChecker on field logtext); a FileDict checker
(solr.FileBasedSpellChecker, field logtext, sourceLocation
/usr/share/dict/linux.words, characterEncoding UTF-8, spellcheckIndexDir
/localapps/dev/EventLog/solr/EventLog2/data/spFile, buildOnStartup true)
with what appear to be accuracy 0.5, maxEdits 2, minPrefix 1, maxInspections
5, minQueryLength 4, and maxQueryFrequency 0.01 settings; a spellcheck
request handler defaulting to the FileDict and WordBreak dictionaries; a DIH
handler pointing at
/localapps/dev/EventLog/solr/EventLog2/conf/data-config.xml; and a schema
whose uniqueKey is id.]

Re: File-based Spelling

2015-10-16 Thread Mark Fenbers

On 10/13/2015 9:30 AM, Dyer, James wrote:

Mark,

The older spellcheck implementations create an n-gram sidecar index, which is 
why you're seeing your name split into 2-grams like this.  See the IR Book by 
Manning et al, section 3.3.4 for more information.  Based on the results you're 
getting, I think it is loading your file correctly.  You should now try a query 
against this spelling index, using words *not* in the file you loaded that are 
within 1 or 2 edits from something that is in the dictionary.  If it doesn't 
yield suggestions, then post the relevant sections of the solrconfig.xml, 
schema.xml and also the query string you are trying.

James Dyer
Ingram Content Group

James, I've already done this.   My query string was "fenbers". This is 
my last name which does *not* occur in the linux.words file.  It is only 
1 edit distance from "fenders" which *is* in the linux.words file.  Yet, 
it claimed it found my misspelled word to be "fenber" without the "s" 
and it gave me these 8 suggestions:

f en be r
f e nb er
f en b er
f e n be r
f en b e r
f e nb e r
f e n b er
f e n b e r

So I'm attaching the entire solrconfig.xml and schema.xml that is in 
effect.  These are in a single file with all the block comments removed.


I'm also puzzled that you say "older implementations create a sidecar 
index"... because I am using v5.3.0, which was the latest version as of 
my download a month or two ago.  So, with my implementation being 
recent, why is an n-gram sidecar index still (seemingly) being produced?


thanks for the help!
Mark





[Attachment: the same solrconfig.xml and schema.xml as summarized under the
Re: NullPointerException message above; the XML markup was likewise stripped
by the list archive.]


Re: NullPointerException

2015-10-12 Thread Mark Fenbers

On 10/12/2015 5:38 AM, Duck Geraint (ext) GBJH wrote:

"When I use the Admin UI (v5.3.0), and check the spellcheck.build box"
Out of interest, where is this option within the Admin UI? I can't find 
anything like it in mine...

This is in the expanded options that open up once I put a checkmark in 
the "spellcheck" box.

Do you get the same issue by submitting the build command directly with 
something like this instead:
http://localhost:8983/solr/<core>/ELspell?spellcheck.build=true
?

Yes, I do.

It'll be reasonably obvious if the dictionary has actually built or not by the 
file size of your speller store:
/localapps/dev/EventLog/solr/EventLog2/data/spFile


Otherwise, (temporarily) try adding...
true
...to your spellchecker search component config, you might find it'll log a 
more useful error message that way.
Interesting!  The index builds successfully using this method and I get 
no stacktrace error.  Hurray!  But why??


So now, I tried running a query, so I typed Fenbers into the 
spellcheck.q box, and I get the following 9 suggestions:

fenber
f en be r
f e nb er
f en b er
f e n be r
f en b e r
f e nb e r
f e n b er
f e n b e r

I find this very odd because I commented out all references to the 
wordbreak checker in solrconfig.xml.  What do I configure so that Solr 
will give me sensible suggestions like:

  fenders
  embers
  fenberry
and so on?

Mark



File-based Spelling

2015-10-12 Thread Mark Fenbers

Greetings!

I'm attempting to use a file-based spell checker.  My sourceLocation is 
/usr/share/dict/linux.words, and my spellcheckIndexDir is set to 
./data/spFile.  BuildOnStartup is set to true, and I see nothing to 
suggest any sort of problem/error in solr.log.  However, in my 
./data/spFile/ directory, there are only two files: segments_2 with only 
71 bytes in it, and a zero-byte write.lock file.  For a source 
dictionary having 480,000 words in it, I was expecting a bit more 
substance in the ./data/spFile directory.  Something doesn't seem right 
with this.


Moreover, I ran a query on the word Fenbers, which isn't listed in the 
linux.words file, but there are several similar words.  The results I 
got back were odd, and suggestions included the following:

fenber
f en be r
f e nb er
f en b er
f e n be r
f en b e r
f e nb e r
f e n b er
f e n b e r

But I expected suggestions like fenders, embers, and fenberry, etc. I 
also ran a query on Mark (which IS listed in linux.words) and got back 
two suggestions in a similar format.  I played with configurables like 
changing the fieldType from text_en to string and the characterEncoding 
from UTF-8 to ASCII, etc., but nothing seemed to yield any different 
results.


Can anyone offer suggestions as to what I'm doing wrong?  I've been 
struggling with this for more than 40 hours now!  I'm surprised my 
persistence has lasted this long!


Thanks,
Mark
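
For anyone reproducing this, the query side in SolrJ looks roughly like the
sketch below, assuming an HttpSolrClient named client pointed at the core and
a spellcheck handler named /ELspell as in the build URL quoted elsewhere in
this archive:

    SolrQuery sq = new SolrQuery();
    sq.setRequestHandler("/ELspell");        // assumed handler name
    sq.setParam("spellcheck", "true");
    sq.setParam("spellcheck.q", "Fenbers");
    SpellCheckResponse sc = client.query(sq).getSpellCheckResponse();
    for (SpellCheckResponse.Suggestion s : sc.getSuggestions())
        System.out.println(s.getToken() + " -> " + s.getAlternatives());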


NullPointerException

2015-10-10 Thread Mark Fenbers

Greetings!

I'm new to Solr Spellchecking...  I have yet to get it to work.

Attached is a snippet from my solrconfig.xml pertaining to my spellcheck 
efforts.


When I use the Admin UI (v5.3.0), and check the spellcheck.build box, I 
get a NullPointerException stacktrace.  The actual stacktrace is at the 
bottom of the attachment.  My spellcheck.q is the following:

Solr will yuse suggestions frum both.

The FileBasedSpellChecker.build method is clearly the problem 
(determined from the stack trace), but I cannot figure out why.


Maybe I don't need to do a build on it...(?)  If I don't, the 
spell-checker finds no misspelled words.  Yet, "yuse" and "frum" are not 
stand-alone words in /usr/share/dict/words.


/usr/share/dict/words exists and has global read permissions.  I 
displayed the file and see no issues (i.e., one word per line) although 
some "words" are a string of digits, but that shouldn't matter.


Does my snippet give any clues about why I would get this error? Is my 
stripped down configuration missing something, perhaps?


Mark

  
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_en</str>
  <lst name="spellchecker">
    <str name="classname">solr.FileBasedSpellChecker</str>
    <str name="field">logtext</str>
    <str name="name">FileDict</str>
    <str name="sourceLocation">/usr/share/dict/words</str>
    <str name="characterEncoding">UTF-8</str>
    <str name="spellcheckIndexDir">/localapps/dev/EventLog/solr/EventLog2/data/spFile</str>
  </lst>
</searchComponent>

<requestHandler name="/ELspell" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck.dictionary">FileDict</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.alternativeTermCount">5</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">10</str>
    <str name="spellcheck.maxCollations">5</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>


"trace": "java.lang.NullPointerException\n\tat 
org.apache.lucene.search.spell.SpellChecker.indexDictionary(SpellChecker.java:509)\n\tat
 
org.apache.solr.spelling.FileBasedSpellChecker.build(FileBasedSpellChecker.java:74)\n\tat
 
org.apache.solr.handler.component.SpellCheckComponent.prepare(SpellCheckComponent.java:124)\n\tat
 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:251)\n\tat
 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)\n\tat
 org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)\n\tat 
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)\n\tat 
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)\n\tat 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:210)\n\tat
 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)\n\tat
 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)\n\tat
 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)\n\tat
 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)\n\tat
 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)\n\tat
 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)\n\tat
 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)\n\tat 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat
 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)\n\tat
 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)\n\tat
 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)\n\tat
 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)\n\tat
 org.eclipse.jetty.server.Server.handle(Server.java:499)\n\tat 
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)\n\tat 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)\n\tat
 
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)\n\tat
 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)\n\tat
 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)\n\tat
 java.lang.Thread.run(Thread.java:745)\n",


Re: Solr vs Lucene

2015-10-02 Thread Mark Fenbers
Thanks for the suggestion, but I've looked at aspell and hunspell and 
neither provide a native Java API.  Further, I already use Solr for a 
search engine, too, so why not stick with this infrastructure for 
spelling, too?  I think it will work well for me once I figure out the 
right configuration to get it to do what I want it to.


Mark

On 10/1/2015 4:16 PM, Walter Underwood wrote:

If you want a spell checker, don’t use a search engine. Use a spell checker. 
Something like aspell (http://aspell.net/) will be faster 
and better than Solr.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)






NullPointerException

2015-10-02 Thread Mark Fenbers

Greetings!

Attached is a snippet from solrconfig.xml pertaining to my spellcheck 
efforts.  When I use the Admin UI (v5.3.0), and check the 
spellcheck.build box, I get a NullPointerException stacktrace.  The 
actual stacktrace is at the bottom of the attachment.  The 
FileBasedSpellChecker.build is clearly the problem, but I cannot figure 
out why.  /usr/share/dict/words exists and has global read permissions.  
I displayed the file and see no issues (i.e., one word per line) 
although some "words" are a string of digits, but that shouldn't matter.


Does my snippet give any clues about why I would get this error? Is my 
stripped down configuration missing something, perhaps?


Mark
  
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_en</str>
  <lst name="spellchecker">
    <str name="classname">solr.FileBasedSpellChecker</str>
    <str name="field">logtext</str>
    <str name="name">FileDict</str>
    <str name="sourceLocation">/usr/share/dict/words</str>
    <str name="characterEncoding">UTF-8</str>
    <str name="spellcheckIndexDir">/localapps/dev/EventLog/solr/EventLog2/data/spFile</str>
  </lst>
</searchComponent>

<requestHandler name="/ELspell" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck.dictionary">FileDict</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.alternativeTermCount">5</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">10</str>
    <str name="spellcheck.maxCollations">5</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>


"trace": "java.lang.NullPointerException\n\tat 
org.apache.lucene.search.spell.SpellChecker.indexDictionary(SpellChecker.java:509)\n\tat
 
org.apache.solr.spelling.FileBasedSpellChecker.build(FileBasedSpellChecker.java:74)\n\tat
 
org.apache.solr.handler.component.SpellCheckComponent.prepare(SpellCheckComponent.java:124)\n\tat
 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:251)\n\tat
 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)\n\tat
 org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)\n\tat 
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)\n\tat 
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)\n\tat 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:210)\n\tat
 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)\n\tat
 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)\n\tat
 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)\n\tat
 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)\n\tat
 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)\n\tat
 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)\n\tat
 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)\n\tat 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat
 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)\n\tat
 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)\n\tat
 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)\n\tat
 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)\n\tat
 org.eclipse.jetty.server.Server.handle(Server.java:499)\n\tat 
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)\n\tat 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)\n\tat
 
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)\n\tat
 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)\n\tat
 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)\n\tat
 java.lang.Thread.run(Thread.java:745)\n",


Solr vs Lucene

2015-10-01 Thread Mark Fenbers

Greetings!

Being a newbie, I'm still mostly in the dark regarding where the line is 
between Solr and Lucene.  The following code snippet is -- I think -- 
all Lucene and no Solr.  It is a significantly modified version of some 
example code I found on the net.


Directory dir = FSDirectory.open(FileSystems.getDefault().getPath(
        "/localapps/dev/EventLog/solr/data", "SpellIndex"));

SpellChecker speller = new SpellChecker(dir);
InputStream fis = new FileInputStream("/usr/share/dict/words");
Analyzer analyzer = new StandardAnalyzer();
// index the plain-text word list into the spelling index
speller.indexDictionary(new PlainTextDictionary(fis),
        new IndexWriterConfig(analyzer), false);

// now let's see speller in action...
System.out.println(speller.exist("beez"));  // returns false
System.out.println(speller.exist("bees"));  // returns true

String[] suggestions = speller.suggestSimilar("beez", 10);
for (String suggestion : suggestions)
    System.err.println(suggestion);

(Later in my code, I close what objects need to be...)  This code 
(above) does the following:


1. identifies whether a given word is misspelled or spelled correctly.
2. Gives alternate suggestions to a given word (whether spelled
   correctly or not).
3. I presume, but haven't tested this yet, that I can add a second or
   third word list to the index, say, a site dictionary containing
   names of people or places commonly found in the text.

But this code does not:

1. parse any given text into words, and testing each word.
2. provide markers showing where the misspelled/suspect words are
   within the text.

and so my code will have to provide the latter functionality.  Or does 
Solr provide this capability, such that it would be silly to write my own?


Thanks,

Mark



Re: highlighting

2015-10-01 Thread Mark Fenbers
Yeah, I thought about using markers, but then I'd have to search the 
text for the markers to determine the locations.  This is a clunky way 
of getting the results I want, and it would save two steps if Solr 
merely had an option to return a start/length array (of what should be 
highlighted) in the original string rather than returning an altered 
string with tags inserted.


Mark
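
Still, for the record, a sketch of the marker workaround Upayavira describes
below, assuming hl.fragsize=0 so the whole field comes back (otherwise the
offsets are relative to a snippet) and sentinel characters that cannot occur
in the log text:

    query.setHighlight(true);
    query.setHighlightFragsize(0);             // return the whole field, not a snippet
    query.setHighlightSimplePre("\u0001");     // sentinels instead of <em>...</em>
    query.setHighlightSimplePost("\u0002");
    Map<String, Map<String, List<String>>> hl = client.query(query).getHighlighting();
    String s = hl.get(docId).get("logtext").get(0);
    int start = s.indexOf('\u0001');           // offset of the first hit in the original text
    int len = s.indexOf('\u0002') - start - 1; // length of the highlighted term
    // each earlier marker pair shifts later offsets by two characters

Here docId is assumed to be the document's uniqueKey value.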

On 9/29/2015 7:04 AM, Upayavira wrote:

You can change the strings that are inserted into the text, and could
place markers that you use to identify the start/end of highlighting
elements. Does that work?

Upayavira

On Mon, Sep 28, 2015, at 09:55 PM, Mark Fenbers wrote:

Greetings!

I have highlighting turned on in my Solr searches, but what I get back
is <em> tags surrounding the found term.  Since I use a SWT StyledText
widget to display my search results, what I really want is the offset
and length of each found term, so that I can highlight it in my own way
without HTML.  Is there a way to configure Solr to do that?  I couldn't
find it.  If not, how do I go about posting this as a feature request?

Thanks,
Mark




Re: Solr vs Lucene

2015-10-01 Thread Mark Fenbers
Yes, and I've spent numerous hours configuring and reconfiguring, and 
eventually even starting over, but still have not gotten it to work 
right.  Even now, I'm getting bizarre results.  For example, I query   
"NOTE: This is purely as an example."  and I get back really bizarre 
suggestions, like "n ot e" and "n o te" and "n o t e" for the first word 
which isn't even misspelled!  The same goes for "purely" and "example" 
also!  Moreover, I get extended results showing the frequencies of these 
suggestions being over 2600 occurrences, when I'm not even using an 
indexed spell checker.  I'm only using a file-based spell checker 
(/usr/shar/dict/words), and the wordbreak checker.


At this point, I can't even figure out how to narrow down my confusion 
so that I can post concise questions to the group.  But I'll get there 
eventually, starting with removing the wordbreak checker for the 
time-being.  Your response was encouraging, at least.


Mark


On 10/1/2015 9:45 AM, Alexandre Rafalovitch wrote:

Hi Mark,

Have you gone through a Solr tutorial yet? If/when you do, you will
see you don't need to code any of this. It is configured as part of
the web-facing total offering which are tweaked by XML configuration
files (or REST API calls). And most of the standard pipelines are
already pre-configured, so you don't need to invent them from scratch.

On your specific question, it would be better to ask what _business_
level functionality you are trying to achieve and see if Solr can help
with that. Starting from Lucene code is less useful :-)

Regards,
Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 1 October 2015 at 07:48, Mark Fenbers <mark.fenb...@noaa.gov> wrote:


Re: Solr vs Lucene

2015-10-01 Thread Mark Fenbers
This is with Solr.  The Lucene approach (assuming that is what is in my 
Java code, shared previously) works flawlessly, albeit with fewer 
options, AFAIK.


I'm not sure what you mean by "business case"...  I'm wanting to 
spell-check user-supplied text in my Java app.  The end-user then 
activates the spell-checker on the entire text (presumably, a few 
paragraphs or less).  I can use StyledText's capabilities to highlight 
the misspelled words, and when the user clicks the highlighted word, a 
menu will appear where he can select a suggested spelling.


But so far, I've had trouble:

 * determining which words are misspelled (because Solr often returns
   suggestions for correctly spelled words).
 * getting coherent suggestions (regardless if the query word is
   misspelled or not).

It's been a bit puzzling (and frustrating)!!  it only took me 10 minutes 
to get the Lucene spell checker working, but I agree that Solr would be 
the better way to go, if I can ever get it configured properly...


Mark
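
On the first of those two problems, the spellcheck response itself can say
which tokens are suspect when extended results are enabled; a sketch,
assuming a handler with spellcheck.extendedResults=true as in the configs
later in this archive and the client/query objects from the other snippets:

    SpellCheckResponse sc = client.query(sq).getSpellCheckResponse();
    System.out.println("correctly spelled: " + sc.isCorrectlySpelled());
    for (SpellCheckResponse.Suggestion s : sc.getSuggestions()) {
        // startOffset/endOffset locate the suspect token inside spellcheck.q,
        // which is what a StyledText highlighter needs
        System.out.println(s.getToken() + " [" + s.getStartOffset() + ","
                + s.getEndOffset() + ") -> " + s.getAlternatives());
    }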


On 10/1/2015 12:50 PM, Alexandre Rafalovitch wrote:

Is that with Lucene or with Solr? Because Solr has several different
spell-checker modules you can configure.  I would recommend trying
them first.

And, frankly, I still don't know what your business case is.

Regards,
Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 1 October 2015 at 12:38, Mark Fenbers <mark.fenb...@noaa.gov> wrote:

Yes, and I've spent numerous hours configuring and reconfiguring, and
eventually even starting over, but still have not gotten it to work right.
Even now, I'm getting bizarre results.  For example, I query   "NOTE: This
is purely as an example."  and I get back really bizarre suggestions, like
"n ot e" and "n o te" and "n o t e" for the first word which isn't even
misspelled!  The same goes for "purely" and "example" also!  Moreover, I get
extended results showing the frequencies of these suggestions being over
2600 occurrences, when I'm not even using an indexed spell checker.  I'm
only using a file-based spell checker (/usr/share/dict/words), and the
wordbreak checker.

At this point, I can't even figure out how to narrow down my confusion so
that I can post concise questions to the group.  But I'll get there
eventually, starting with removing the wordbreak checker for the time-being.
Your response was encouraging, at least.

Mark



On 10/1/2015 9:45 AM, Alexandre Rafalovitch wrote:

Hi Mark,

Have you gone through a Solr tutorial yet? If/when you do, you will
see you don't need to code any of this. It is configured as part of
the web-facing total offering which are tweaked by XML configuration
files (or REST API calls). And most of the standard pipelines are
already pre-configured, so you don't need to invent them from scratch.

On your specific question, it would be better to ask what _business_
level functionality you are trying to achieve and see if Solr can help
with that. Starting from Lucene code is less useful :-)

Regards,
 Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 1 October 2015 at 07:48, Mark Fenbers <mark.fenb...@noaa.gov> wrote:




highlighting

2015-09-28 Thread Mark Fenbers

Greetings!

I have highlighting turned on in my Solr searches, but what I get back 
is <em> tags surrounding the found term.  Since I use a SWT StyledText 
widget to display my search results, what I really want is the offset 
and length of each found term, so that I can highlight it in my own way 
without HTML.  Is there a way to configure Solr to do that?  I couldn't 
find it.  If not, how do I go about posting this as a feature request?


Thanks,
Mark


Re: query parsing

2015-09-27 Thread Mark Fenbers
I am delighted to announce that I have it all working again!  Well, not 
all, just the searching!


I deleted my core and created a new one from the command-line (solr 
create_core -c EventLog2) using the basic_configs option. Then I had to 
add my columns to the schema.xml and the dataimport handler to 
solrconfig.xml and tweak a couple of other details. But to make a long 
story short, parsing is working and I can search on terms without 
wrapping asterisks!!  Yay!  Thanks for the help!


Spell-checking still isn't working, though, and I'm apprehensive about 
working with it today.  But I will eventually.  The complaint is it 
can't find ELspell, which I had defined in the old setup that I blew 
away, so I'll have to redefine it at some point!  For now, I'm just 
gonna delight in having searching working again!


Mark

On 9/26/2015 11:05 PM, Erick Erickson wrote:

No need to re-install Solr, just create a new core, this time it'd probably be
easiest to use the bin/solr create_core command. In the Solr
directory just type bin/solr create_core -help to see the options.

We're pretty much trying to migrate to using bin/solr for all the maintenance
we can, but as always the documentation lags the code.

Yeah, things are a bit ragged. The admin UI/core UI is really a legacy
bit of code that has _always_ been confusing, I'm hoping we can pretty
much remove it at some point since it's as trappy as it is.

Best,
Erick

On Sat, Sep 26, 2015 at 12:49 PM, Mark Fenbers <mark.fenb...@noaa.gov> wrote:

OK, a lot of dialog while I was gone for two days!  I read the whole thread,
but I'm a newbie to Solr, so some of the dialog was Greek to me.  I
understand the words, of course, but applying it so I know exactly what to
do without screwing something else up is the problem.  After all, that is
how I got into the mess in the first place.  I'm glad I have good help to
untangle the knots I've made!

I'd like to start over (option 1 below), but does this mean delete all my
config and reinstalling Solr??  Maybe that is not a bad idea, but I will at
least save off my data-config.xml as that is clearly the one thing that is
probably working right.  However, I did do quite a bit of editing that I
would have to do again. Please advise...

To be fair, I must answer Erick's question of how I created the data index
in the first place, because this might be relevant...

The bulk of the data is read from 9000+ text files, where each file was
manually typed.  Before inserting into the database, I do a little bit of
processing of the text using "sed" to delete the top few and bottom few
lines, and to substitute each single-quote character with a pair of
single-quotes (so PostgreSQL doesn't choke).  Line-feed characters are
preserved as ASCII 10 (hex 0A), but there shouldn't be (and I am not aware
of) any characters aside from what is on the keyboard.

Next, I insert it with this command:
psql -U awips -d OHRFC -c "INSERT INTO EventLogText VALUES('$postDate',
'$user', '$postDate', '$entryText', '$postCatVal');"

In case you are wondering about my table, it is defined in this way:
CREATE TABLE eventlogtext (
   posttime timestamp without time zone NOT NULL, -- Timestamp of this
entry's original posting
   username character varying(8), -- username (logname) of the original
poster
   lastmodtime timestamp without time zone, -- Last time record was altered
   logtext text, -- text of the log entry
   category integer, -- bit-wise category value
   CONSTRAINT eventlogtext_pkey PRIMARY KEY (posttime)
)

To do the indexing, I merely use /dataimport?full-import, but it knows what
to do from my data-config.xml, which is here:

[data-config.xml stripped by the list archive]

Hope this helps!

Thanks,
Mark

On 9/24/2015 10:57 AM, Erick Erickson wrote:

Geraint:

Good Catch! I totally missed that. So all of our focus on schema.xml has
been... totally irrelevant. Now that you pointed that out, there's also
the
addition: add-unknown-fields-to-the-schema, which indicates you started
this up in "schemaless" mode.

In short, solr is trying to guess what your field types should be and
guessing wrong (again and again and again). This is the classic weakness
of
schemaless. It's great for indexing stuff fast, but if it guesses wrong
you're stuck.


So to the original problem: I'd start over and either
1> use the regular setup, not schemaless
or
2> use the _managed_ schema API to explicitly add fields and fieldTypes to
the managed schema

Best,
Erick

On Thu, Sep 24, 2015 at 2:02 AM, Duck Geraint (ext) GBJH <
geraint.d...@syngenta.com> wrote:


Okay, so maybe I'm missing something here (I'm still relatively new to
Solr myself), but am I right in thinking the following is still in your
solrconfig.xml file:


<schemaFactory class="ManagedIndexSchemaFactory">
  <bool name="mutable">true</bool>
  <str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>


If so, wouldn't using a managed schema make several of your field
definitions inside the schema.xml file semi-redundant?

Regards,
Geraint


G

position of the search term

2015-09-27 Thread Mark Fenbers
For the brief period that I had spell-checking working, I noticed that 
the results record had the start/end position within the text of the 
misspelled word.  Is there anyway to get the same start/end position 
when doing a search?  I want to be able to highlight the search term in 
the text.  Default config puts <em> tags around the search term, but I'm not 
using an HTML renderer and I don't want characters of any sort inserted 
into the text returned in the result set. rather, I just want the 
start/end position.  How do I configure that?


Mark


Re: New Project setup too clunky

2015-09-27 Thread Mark Fenbers

On 9/27/2015 12:49 PM, Alexandre Rafalovitch wrote:

Mark,

Thank you for your valuable feedback. The newbie's views are always appreciated.

The Admin UI command is designed for creating a collection based on
the configuration you already have. Obviously, it makes that point
somewhat less than obvious.

To create a new collection with configuration files all in place, you
can bootstrap it from a configset. Which is basically what you did
when you run "solr -e", except "-e" also populates the files and does
other tricks.

So, if you go back to the command line and run "solr" you will see a
bunch of options. The one you are looking for is "solr create_core"
which will tell you all the parameters as well as the available
configurations to bootstrap from.

I hope this helps.


Yes!  It does help!  But it took a post and a response on the user-forum for me to learn 
this!  Rather, it should be added to the "Solr Quick Start" document.
Mark
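
For later readers, the command being described looks roughly like this
(Solr 5.x, run from the install directory; the core name is a placeholder):

    bin/solr create_core -help                       # lists options and bundled configsets
    bin/solr create_core -c mycore -d basic_configs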




New Project setup too clunky

2015-09-26 Thread Mark Fenbers

Greetings,

Being a Solr newbie, I've run the examples in the "Solr Quick Start" 
document and got a feel for Solr's capabilities.  Now I want to move on 
and work with my own data and my own Solr server without using the 
example setup (i.e., "solr -e" options).  This is where the 
documentation dries up.  Unless I merely haven't found it.


So, I've been able to launch Solr, and it is running.  Then, I used the 
browser-based web pages ("admin UI", I think it is called) and created 
my new "core".  But it says I have to create the directory and subdirs 
first, which I did (why can't it do that for me?).  Then it complains 
about a missing solrconfig.xml, which I copied in from one of a number 
of places it is provided in the distribution.  Then, it complained about 
the schema.xml, and on and on.  I think it complained about 6 times 
before I resolved them all!  So what's the point of the Admin UI 
creating a new core for you if you have to do so much manual setup 
anyway?  Why can't a simple config.bash script take care of this 
administrivia? Moreover, when it was all done complaining, I was able to 
/dataimport and index, but the searching doesn't work, so I have to 
troubleshoot... (and I've done this with the help of another thread, 
which still isn't resolved).  Had I had more clear instructions to work 
from, I might have it working long ago without bugging this user-group.


My point is that this process is way too clunky for a mature Apache 
project like Solr/Lucene.  So clunky, in fact, that I am highly 
suspicious (convinced perhaps) that I simply am missing something (or 
several things), like a document/tutorial that explains how to move on 
from the "solr -e" examples and setup Solr to work in my own 
environment.  Can someone please point me to the document(s)/tutorial(s) 
that I am missing?


Mark


Re: query parsing

2015-09-26 Thread Mark Fenbers
...instance dir and
data dir aren't needed.

Upayavira

On Wed, Sep 23, 2015, at 10:46 PM, Erick Erickson wrote:

OK, this is bizarre. You'd have had to set up SolrCloud by
specifying the -zkRun command when you start Solr or the -zkHost;
highly unlikely. On the admin page there would be a "cloud" link on
the left side, I really doubt one's there.

You should have a data directory, it should be the parent of the
index and tlog directories. As a sanity check, try looking at the
analysis page.
Type
a bunch of words in the left hand side indexing box and uncheck the
verbose box. As you can tell I'm grasping at straws. I'm still
puzzled why you don't have a "data" directory here, but that
shouldn't really matter. How did you create this index? I don't mean
data import handler more how did you create the core that you're
indexing to?

Best,
Erick

On Wed, Sep 23, 2015 at 10:16 AM, Mark Fenbers
<mark.fenb...@noaa.gov>
wrote:


On 9/23/2015 12:30 PM, Erick Erickson wrote:


Then my next guess is you're not pointing at the index you think
you

are

when you 'rm -rf data'

Just ignore the Elall field for now I should think, although get
rid

of it

if you don't think you need it.

DIH should be irrelevant here.

So let's back up.
1> go ahead and "rm -fr data" (with Solr stopped).


I have no "data" dir.  Did you mean "index" dir?  I removed 3
index directories (2 for spelling):
cd /localapps/dev/eventLog; rm -rfv index solr/spFile solr/spIndex


2> start Solr
3> do NOT re-index.
4> look at your index via the schema-browser. Of course there
4> should

be

nothing there!


Correct!  It said "there is no term info :("


5> now kick off the DIH job and look again.


Now it shows a histogram, but most of the "terms" are long -- the
full texts of (the table.column) eventlogtext.logtext, including
the

whitespace

(with %0A used for newline characters)...  So, it appears it is
not

being

tokenized properly, correct?


Your logtext field should have only single tokens. The fact that
you

have

some very
long tokens (presumably with whitespace) indicates that you aren't

really

blowing
the index away between indexing.


Well, I did this time for sure.  I verified that initially,
because it showed there was no term info until I DIH'd again.


Are you perhaps in Solr Cloud with more than one replica?


Not that I know of, but being new to Solr, there could be things
going

on

that I'm not aware of.  How can I tell?  I certainly didn't set

anything up

for solrCloud deliberately.


In that case you
might be getting the index replicated on startup assuming you
didn't blow away all replicas. If you are in SolrCloud, I'd just
delete the collection and start over, after insuring that you'd
pushed the configset up to Zookeeper.

BTW, I always look at the schema.xml file from the Solr admin
window

just

as
a sanity check in these situations.


Good idea!  But the one shown in the browser is identical to the
one

I've

been editing!  So that's not an issue.





--
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England








query parsing

2015-09-23 Thread Mark Fenbers

When I submit this:

http://localhost:8983/solr/EventLog/select?q=deeper&wt=json&indent=true

then I get these (empty) results:
  {
  "responseHeader":{
"status":0,
"QTime":1,
"params":{
  "q":"deeper",
  "indent":"true",
  "wt":"json"}},
  "response":{"numFound":0,"start":0,"docs":[]
  }}

However, if I add asterisks before *and* after "deeper", like this:

http://localhost:8983/solr/EventLog/select?q=*deeper*&wt=json&indent=true

then I get the correct set of results (shown below), as I expect. What
am I doing wrong that the query requires leading and trailing asterisks
to work correctly?  If I search on existing text in the username field
instead of the default logtext field, then I don't need to use the
asterisks to get correct results.  Does this mean I have a problem in my
indexing process when I used /dataimport?  Or does it mean I have
something wrong in my query?
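
For reference, the SolrJ equivalent of these two queries looks roughly
like this -- a trimmed sketch, assuming a client pointed at the EventLog
core (the class name is mine):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class WildcardProbe {
  public static void main(String[] args) throws Exception {
    SolrClient solr = new HttpSolrClient("http://localhost:8983/solr/EventLog");
    // Plain term query, analyzed against the default field (df=logtext)
    SolrQuery plain = new SolrQuery("deeper");
    System.out.println("plain:    " + solr.query(plain).getResults().getNumFound());  // 0
    // Wildcard query, which bypasses analysis and matches raw indexed terms
    SolrQuery wild = new SolrQuery("*deeper*");
    System.out.println("wildcard: " + solr.query(wild).getResults().getNumFound());   // 45
    solr.close();
  }
}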


Also, notice in the results that category, logtext, and username fields 
are returned as arrays, even though I do not include multiValued="true" 
in the schema.xml definition.  Why?  Attached are my solrconfig.xml and 
schema.xml.  Any insights would be appreciated!


thanks,
Mark

{
  "responseHeader":{
"status":0,
"QTime":9,
"params":{
  "q":"*deeper*",
  "indent":"true",
  "wt":"json"}},
  "response":{"numFound":45,"start":0,"docs":[
  {
"id":"2012-07-10 13:23:39.0",
"category":[16],
"logtext":["\nHydromet Coordination Message\nOhio River 
Forecast Center, Wilmington, OH\n923 AM EDT Tuesday, July 10, 
2012\n\nVery slow moving front has sagged down to the southernmost 
portion of the\nOhio Valley. This will keep the axis of convection along 
or south of the \nTN/KY border today and tomorrow, though some very 
light showers are \npossible in the northwest portion of the basin. On 
Thursday increased \nsoutherly flow over the Ohio Valley will begin to 
draw deeper moisture\nfarther north into the basin, but this will mainly 
be after the 48-hour\nforecast cutoff.\n\nDay 1 (8am EDT Tuesday - 8am 
EDT Wednesday):\nRain is forecast in southern Kentucky, southern West 
Virginia, middle\nTennessee and far western Virginia. Basin average 
amounts increase to the\nsouth with come areas approaching an inch. 
Light amounts less than 0.10 inch\nare expected in portions of central 
Indiana and Ohio. \n\nDay 2 (8am EDT Wednesday - 8am EDT Thursday): 
\nRain is forecast all areas south of the Ohio River as well as eastern 
\nIllinois, southern Indiana and southwest Pennsylvania. Basin average 
amounts\nincrease to the southwest with areas southwest of Nashville 
expecting \nover an inch. \n\nQPF from OHRFC, HPC, et al., can be seen 
at weather.gov/ohrfc/Forecast.php\n$$\nFor critical after-hours support, 
the OHRFC cell number is 937-725-.\nLink Crawford "],

"username":["crawford"],
"_version_":1512928764746530816},
  {
"id":"2012-07-10 17:39:09.0",
"category":[16],
"logtext":["\nHydromet Coordination Message\nOhio River 
Forecast Center, Wilmington, OH\n139 PM EDT Tuesday, July 10, 
2012\n\n18Z Discussion:\nMade some changes to the first 6-hour period of 
the QPF, but otherwise made\nno changes to the previous 
issuance.\n\nPrevious Discussion (12Z):\nVery slow moving front has 
sagged down to the southernmost portion of the\nOhio Valley. This will 
keep the axis of convection along or south of the \nTN/KY border today 
and tomorrow, though some very light showers are \npossible in the 
northwest portion of the basin. On Thursday increased \nsoutherly flow 
over the Ohio Valley will begin to draw deeper moisture\nfarther north 
into the basin, but this will mainly be after the 48-hour\nforecast 
cutoff.\n\nDay 1 (8am EDT Tuesday - 8am EDT Wednesday):\nRain is 
forecast in southern Kentucky, southern West Virginia, middle\nTennessee 
and far western Virginia. Basin average amounts increase to the\nsouth 
with come areas approaching an inch. Light amounts less than 0.10 
inch\nare expected in portions of central Indiana and Ohio. \n\nDay 2 
(8am EDT Wednesday - 8am EDT Thursday): \nRain is forecast all areas 
south of the Ohio River as well as eastern \nIllinois, southern Indiana 
and southwest Pennsylvania. Basin average amounts\nincrease to the 
southwest with areas southwest of Nashville expecting \nover an inch. 
\n\nQPF from OHRFC, HPC, et al., can be seen at 
weather.gov/ohrfc/Forecast.php\n$$\nFor critical after-hours support, 
the OHRFC cell number is 937-725-.\nLink Crawford"],

"username":["crawford"],
"_version_":1512928764769599488},
  {
"id":"2012-07-11 12:39:56.0",
"category":[16],
"logtext":["\nHydromet Coordination Message\nOhio River 
Forecast Center, Wilmington, OH\n839 AM EDT Wednesday, July 11, 
2012\n\nOHRFC QPF Discussion (12Z):\n\nDewpoints in the upper 60's and 
70's will help fuel showers and thunderstorms \nacross the southern 
third of the basin today. This 

Re: query parsing

2015-09-23 Thread Mark Fenbers
Mugeesh, I believe you are on the right path and I was eager to try out 
your suggestion.  So my schema.xml now contains this snippet (changes 
indicated by ~):


required="true" />
 ~ stored="true" required="true" />
required="true" />
required="true" />
~  stored="true" multiValued="true" />




~  
~ 
~
~
~   
~ 

but my results are the same -- that my search yields 0 results unless I 
wrap the search word with asterisks.


Alessandro, below are the results (with and without the asterisks) with 
debug turned on.  I don't know what much of the debug info means.  Is it 
giving you more clues?


http://localhost:8983/solr/EventLog/select?q=deeper&wt=json&indent=true&debugQuery=true

{
  "responseHeader":{
"status":0,
"QTime":2,
"params":{
  "q":"deeper",
  "indent":"true",
  "wt":"json",
  "debugQuery":"true"}},
  "response":{"numFound":0,"start":0,"docs":[]
  },
  "debug":{
"rawquerystring":"deeper",
"querystring":"deeper",
"parsedquery":"logtext:deeper",
"parsedquery_toString":"logtext:deeper",
"explain":{},
"QParser":"LuceneQParser",
"timing":{
  "time":1.0,
  "prepare":{
"time":0.0,
"query":{
  "time":0.0},
"facet":{
  "time":0.0},
"facet_module":{
  "time":0.0},
"mlt":{
  "time":0.0},
"highlight":{
  "time":0.0},
"stats":{
  "time":0.0},
"expand":{
  "time":0.0},
"debug":{
  "time":0.0}},
  "process":{
"time":0.0,
"query":{
  "time":0.0},
"facet":{
  "time":0.0},
"facet_module":{
  "time":0.0},
"mlt":{
  "time":0.0},
"highlight":{
  "time":0.0},
"stats":{
  "time":0.0},
"expand":{
  "time":0.0},
"debug":{
  "time":0.0}

http://localhost:8983/solr/EventLog/select?q=*deeper*&wt=json&indent=true&debugQuery=true

{
  "responseHeader":{
"status":0,
"QTime":460,
"params":{
  "q":"*deeper*",
  "indent":"true",
  "wt":"json",
  "debugQuery":"true"}},
  "response":{"numFound":45,"start":0,"docs":[
  {
"id":"2012-07-10 13:23:39.0",
"category":[16],
"logtext":["\nHydromet Coordination Message\nOhio River 
Forecast Center, Wilmington, OH\n923 AM EDT Tuesday, July 10, 
2012\n\nVery slow moving front has sagged down to the southernmost 
portion of the\nOhio Valley. This will keep the axis of convection along 
or south of the \nTN/KY border today and tomorrow, though some very 
light showers are \npossible in the northwest portion of the basin. On 
Thursday increased \nsoutherly flow over the Ohio Valley will begin to 
draw deeper moisture\nfarther north into the basin, but this will mainly 
be after the 48-hour\nforecast cutoff.\n\nDay 1 (8am EDT Tuesday - 8am 
EDT Wednesday):\nRain is forecast in southern Kentucky, southern West 
Virginia, middle\nTennessee and far western Virginia. Basin average 
amounts increase to the\nsouth with come areas approaching an inch. 
Light amounts less than 0.10 inch\nare expected in portions of central 
Indiana and Ohio. \n\nDay 2 (8am EDT Wednesday - 8am EDT Thursday): 
\nRain is forecast all areas south of the Ohio River as well as eastern 
\nIllinois, southern Indiana and southwest Pennsylvania. Basin average 
amounts\nincrease to the southwest with areas southwest of Nashville 
expecting \nover an inch. \n\nQPF from OHRFC, HPC, et al., can be seen 
at weather.gov/ohrfc/Forecast.php\n$$\nFor critical after-hours support, 
the OHRFC cell number is 937-725-.\nLink Crawford "],

"username":["crawford"],
"_version_":1512928764746530816},
  {
"id":"2012-07-10 17:39:09.0",
"category":[16],
"logtext":["\nHydromet Coordination Message\nOhio River 
Forecast Center, Wilmington, OH\n139 PM EDT Tuesday, July 10, 
2012\n\n18Z Discussion:\nMade some changes to the first 6-hour period of 
the QPF, but otherwise made\nno changes to the previous 
issuance.\n\nPrevious Discussion (12Z):\nVery slow moving front has 
sagged down to the southernmost portion of the\nOhio Valley. This will 
keep the axis of convection along or south of the \nTN/KY border today 
and tomorrow, though some very light showers are \npossible in the 
northwest portion of the basin. On Thursday increased \nsoutherly flow 
over the Ohio Valley will begin to draw deeper moisture\nfarther north 
into the basin, but this will mainly be after the 48-hour\nforecast 
cutoff.\n\nDay 1 (8am EDT Tuesday - 8am EDT Wednesday):\nRain is 
forecast in southern Kentucky, southern West Virginia, middle\nTennessee 
and far western Virginia. Basin average amounts increase to the\nsouth 
with come areas approaching an inch. Light amounts less than 0.10 
inch\nare expected in portions of central Indiana and Ohio. \n\nDay 2 
(8am EDT Wednesday - 8am EDT Thursday): \nRain is forecast all areas 
south of 

Re: query parsing

2015-09-23 Thread Mark Fenbers

On 9/23/2015 10:21 AM, Alessandro Benedetti wrote:

So those 2 are the queries at the minute:

1) logtext:deeper
2) logtext:*deeper*

According to your schema, the logtext field is of type "text_en".
This should be completely fine.
Have you ever changed your schema at runtime, without re-indexing your
old docs?
I might forget sometimes, but usually, when I make changes to 
solrconfig.xml or schema.xml, then I delete the main index and the 
spellchecker indexes, and then restart solr, then do /dataimport again.

What happens if you use your analysis tool ( both query and index time)
with the term deeper ?
Can you clarify what you want me to do here?  What do you want me to put 
in the (Index) text box and in the (Query) text box and what do I select 
in the fieldType drop-list?  When I put "deeper" into both text boxes 
and select text_en from the drop list, I get several results, but I 
don't know what the output means.


thanks,
Mark


Re: query parsing

2015-09-23 Thread Mark Fenbers

On 9/23/2015 12:30 PM, Erick Erickson wrote:

Then my next guess is you're not pointing at the index you think you are
when you 'rm -rf data'

Just ignore the ELall field for now I should think, although get rid of it
if you don't think you need it.

DIH should be irrelevant here.

So let's back up.
1> go ahead and "rm -fr data" (with Solr stopped).
I have no "data" dir.  Did you mean "index" dir?  I removed 3 index 
directories (2 for spelling):

cd /localapps/dev/eventLog; rm -rfv index solr/spFile solr/spIndex

2> start Solr
3> do NOT re-index.
4> look at your index via the schema-browser. Of course there should be
nothing there!

Correct!  It said "there is no term info :("

5> now kick off the DIH job and look again.
Now it shows a histogram, but most of the "terms" are long -- the full 
texts of (the table.column) eventlogtext.logtext, including the 
whitespace (with %0A used for newline characters)...  So, it appears it 
is not being tokenized properly, correct?

Your logtext field should have only single tokens. The fact that you have
some very long tokens (presumably with whitespace) indicates that you
aren't really blowing the index away between indexing runs.
Well, I did this time for sure.  I verified that initially, because it 
showed there was no term info until I DIH'd again.

Are you perhaps in Solr Cloud with more than one replica?
Not that I know of, but being new to Solr, there could be things going 
on that I'm not aware of.  How can I tell?  I certainly didn't set 
anything up for solrCloud deliberately.

In that case you
might be getting the index replicated on startup assuming you didn't
blow away all replicas. If you are in SolrCloud, I'd just delete the
collection and
start over, after ensuring that you'd pushed the configset up to Zookeeper.

BTW, I always look at the schema.xml file from the Solr admin window just as
a sanity check in these situations.
Good idea!  But the one shown in the browser is identical to the one 
I've been editing!  So that's not an issue.
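
P.S. If it helps anyone reading along: rather than guessing at directory
layouts, the core can also be emptied programmatically.  A minimal SolrJ
sketch (core URL assumed; class name is mine):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class ClearCore {
  public static void main(String[] args) throws Exception {
    SolrClient solr = new HttpSolrClient("http://localhost:8983/solr/EventLog");
    solr.deleteByQuery("*:*");  // remove every document in the core
    solr.commit();              // make the deletion visible to searchers
    solr.close();
  }
}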




Re: query parsing

2015-09-23 Thread Mark Fenbers

On 9/23/2015 11:28 AM, Erick Erickson wrote:

This is totally weird.

Don't only re-index your old docs, find the data directory and
rm -rf data (with Solr stopped) and re-index.
I pretty much do that.  The thing is: I don't have a data directory 
anywhere!  Most of my stuff is in /localapps/dev/EventLog/solr/, but I 
*do* have a /localapps/dev/EventLog/index/ directory where the main 
index resides.  I'd like to move that into /localapps/dev/EventLog/solr/ 
so that I can keep all Solr-related files under one parent dir, but I 
can't find where the configuration for that is...


Perhaps I should also share what start command I'm using (in case it is 
wrong!):


/localapps/dev/solr-5.3.0/bin/solr start -s /localapps/dev/EventLog

re: the analysis page Alessandro mentioned.
Go to the Solr admin UI (http://localhost:8983/solr). You'll
see a drop-down on the left that lets you select a core,
select the appropriate one.

Now you'll see a bunch of new choices. The "analysis" section
is what Alessandro is referencing. That shows you _exactly_ what
effects your analysis chain has at index and query time.

On the same page, you'll find "schema browser". Take a look at
your logtext field and hit the "load term info" button. You should
see a bunch of single-word tokens listed. If you see really long ones,
then your index is hosed and you should start by blowing away
the data directory
I wish I could show a screen capture!  But according to your symptoms, 
my index is hosed (I see very few single-word tokens and lots of really 
long ones.)  I have no data directory to blow away, though.  I've blown 
away /localapps/dev/EventLog/index/ before, but that has had no effect 
on the problem.


Am I indexing improperly perhaps?  I'm using /dataimport.  Here is my 
data-config.xml, which hasn't been giving me any obvious trouble.  
Import seems successful.  And I can get correct search results so long 
as I wrap my search text in asterisks...




driver="org.postgresql.Driver"/>


name="eventlogtext">
 






Because this symptom is totally explained by searching on a "string"
rather than a "text" type. But your definition is clearly a tokenized text
type so I'm mystified.

The ELall field is a red herring. The debug output shows you're searching
on the logtext field, this line is the relevant one:
"parsedquery_toString":"logtext:deeper",
Should I just get rid of "ELall"?  I only created it with the intent to 
be able to search on "fenbers" and get hits if "fenbers" occurred in 
either place, the logtext field or the username field.
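
In the meantime, I can get the hits-in-either-field behavior without
ELall by OR-ing the two clauses explicitly.  A rough SolrJ sketch of
what I mean (client setup assumed):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class TwoFieldSearch {
  public static void main(String[] args) throws Exception {
    SolrClient solr = new HttpSolrClient("http://localhost:8983/solr/EventLog");
    // Hit either field in one request; no copyField required
    SolrQuery q = new SolrQuery("logtext:fenbers OR username:fenbers");
    System.out.println(solr.query(q).getResults().getNumFound());
    solr.close();
  }
}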


thanks,
Mark



Re: write.lock

2015-09-22 Thread Mark Fenbers

Mikhail,

Yes, both the Index-based and File-based spell checkers reference the 
same index location.  My understanding is they were supposed to.  I 
didn't realize this was for writing indexes.  Rather, I thought this was 
for reading the main index.  So, I need to make 3 separate locations for 
indexes (main, index-based and File-based)?? Can I make them subdirs of 
the main index (in /localapps/dev/EventLog/index)?  Or would that mess 
up the main index?


Thanks for raising my awareness of these errors!
Mark

On 9/21/2015 5:07 PM, Mikhail Khludnev wrote:

Both of these guys below try to write spell index into the same dir. Don't they?

To make it clear, it's not possible so far.

  
   solr.IndexBasedSpellChecker
   /localapps/dev/EventLog/index



  solr.FileBasedSpellChecker
  /localapps/dev/EventLog/index

Also, can you make sure that this path doesn't lead to main index dir.


On Mon, Sep 21, 2015 at 5:13 PM, Mark Fenbers <mark.fenb...@noaa.gov> wrote:






Re: write.lock

2015-09-22 Thread Mark Fenbers
OK, I gave each of these spellcheckIndexDir tokens distinct location -- 
from each other and from the main index.  This has resolved the 
write.lock problem when I attempt a spellcheck.build!  Thanks for the help!


I looked in the new spellcheckIndexDir location and the directory is 
populated with a few files.  So it seems "spellcheck.build" worked, but 
I am still not getting any hits when I purposefully misspell a word.  
But I'll post this problem with more details in a separate post.
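
For the record, the build request I now issue looks roughly like this in
SolrJ (handler name /ELspell per my solrconfig; error handling trimmed):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class BuildSpellIndex {
  public static void main(String[] args) throws Exception {
    SolrClient solr = new HttpSolrClient("http://localhost:8983/solr/EventLog");
    SolrQuery q = new SolrQuery("Sunday");
    q.setRequestHandler("/ELspell");   // the spellcheck request handler
    q.set("spellcheck", true);
    q.set("spellcheck.build", true);   // build once, not on every request
    solr.query(q);
    solr.close();
  }
}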


Mark

On 9/21/2015 5:07 PM, Mikhail Khludnev wrote:

Both of these guys below try to write spell index into the same dir. Don't they?

To make it clear, it's not possible so far.

  
   solr.IndexBasedSpellChecker
   /localapps/dev/EventLog/index



  solr.FileBasedSpellChecker
  /localapps/dev/EventLog/index

Also, can you make sure that this path doesn't lead to main index dir.


On Mon, Sep 21, 2015 at 5:13 PM, Mark Fenbers <mark.fenb...@noaa.gov> wrote:


A snippet of my solrconfig.xml is attached.  The snippet only contains the
Spell checking sections (for brevity) which should be sufficient for you to
see all the pertinent info you seek.

Thanks!
Mark


On 9/19/2015 3:29 AM, Mikhail Khludnev wrote:


Mark,

What's your solconfig.xml?

On Sat, Sep 19, 2015 at 12:34 AM, Mark Fenbers <mark.fenb...@noaa.gov>
wrote:

Greetings,

Whenever I try to build my spellcheck index
(params.set("spellcheck.build", true); or put a check in the
spellcheck.build box in the web interface) I get the following
stacktrace.
Removing the write.lock file does no good.  The message comes right back
anyway.  I read in a post that increasing writeLockTimeout would help.
It
did not help for me even increasing it to 20,000 msec.  If I don't build,
then my resultset count is always 0, i.e., empty results.  What could be
causing this?

Mark










Re: Zero Query results

2015-09-21 Thread Mark Fenbers
Ok, Erick, you provided useful info to help with my understanding. 
However, I still get zero results when I search on literal text (e.g., 
"Wednesday"), even with making changes that you suggest. However, I 
discovered that if I search on "Wednesday*" (trailing asterisk), then I 
get all the results containing Wednesday that I'm looking for!  Why 
would adding a wildcard token change the results I get back?


In my schema.xml, my customized section now looks like this, based on 
your previous message:



required="true" />
required="true" />
required="true" />


multiValued="true" />




Then I removed the data subdir, did a solr restart, and did a 
/dataimport again.  It successfully processed all 9857 documents. No 
stack traces in solr.log.  It is at this point that searching on 
Wednesday gave zero results (Boo!), but searching on Wednesday* gave 
hundreds of results. (Yay!)  My changes to schema.xml were to make 
logtext be the type "text_en".   Previously, the only line in schema.xml 
was the first one ("id"), and I changed that from type="text" to 
type="date" because it is a Timestamp object in Java and a "timestamp 
without time zone" in PostgreSQL.  But even with these changes, the 
results are the same as before.


Do you have any more ideas why searching on any literal string finds 
zero documents?


Thanks,
Mark


On 9/18/2015 10:30 PM, Erick Erickson wrote:

bq: There is no fieldType defined in my solrconfig.xml, unless you are
referring to this line:

Well, that's because you should be looking in schema.xml ;).

This line from your stacktrace file is very suspicious:
   logtext:Wednesday

It _looks_ like your logtext field is perhaps a "string" type. String
types are totally unanalyzed,
so unless the input matches _exactly_ (and by exactly I mean same case,
same words, same
order, identical punctuation) you won't find the doc. Thus with a
string field type, if the doc had
"my Dog has fleas.", searching for "my" or "My" or "My dog has fleas"
or "my Dog has fleas"
would all not find the doc (this last one has no period).

You usually want one of the text types, text_en or the like. Note that
you will be a _long_ time
figuring out how all that works and affects your searches, the
admin/analysis page is definitely
your friend.

There should be a line similar to


Somewhere else there should be something like:


The fieldType is what determines how the text is handled to search,
how it's broken up
and, in essence, how searches behave.

So what Erik and Shawn were asking is those two definitions.

Do note if you've changed the definitions here, it's usually wise to
'rm -rf /data' and completely re-index from scratch.

Best,
Erick
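
To see for myself what an English text analyzer emits, I put together
this little sketch using the Lucene classes that ship with Solr.  It
only approximates the text_en chain (stopword lists differ); a string
field, by contrast, keeps the whole sentence as one exact token:

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class AnalyzerDemo {
  public static void main(String[] args) throws Exception {
    EnglishAnalyzer analyzer = new EnglishAnalyzer();
    TokenStream ts = analyzer.tokenStream("logtext", "my Dog has fleas.");
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    ts.reset();
    while (ts.incrementToken()) {
      // lower-cased, stemmed tokens; typically: my, dog, ha, flea
      System.out.println(term.toString());
    }
    ts.end();
    ts.close();
    analyzer.close();
  }
}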



Re: Zero Query results

2015-09-21 Thread Mark Fenbers
You were right about finding only the Wednesday occurrences at the 
beginning of the line.  But attached (if it works) is a screen capture 
of my admin UI.  But unlike your suspicion, the index text is being 
parsed properly, it appears.  So I'm uncertain where this leads me.


Also attached is the pertinent schema.xml snippet you asked for.

The logtext column in my table contains merely keyboarded text, with the 
infrequent exception that I add a \uFFFC as a placeholder for images.  
So, should I be using something besides text_en as the fieldType?


Thanks,
Mark

On 9/21/2015 12:12 PM, Erick Erickson wrote:

bq: However, I discovered that if I search on "Wednesday*" (trailing
asterisk), then I get all the results containing Wednesday that I'm
looking for!

This almost always means you're not searching on the field you think
you're searching on and/or the field isn't being analyzed as you think
(i.e. the fieldType isn't what you expect). If you're really searching
on a fieldType of text_en (and you haven't changed the definition),
then there's something very weird here. FieldTypes are totally
mutable, they are composed of various analysis chains that you (or
someone else) can freely alter, so seeing the  definition that
references a type="text_en" is suggestive but not definitive.

I'm going to further guess that when you search on "Wednesday*", all
the matches are at the beginning of the line, and you find docs where
the field has "Wednesday, September" but not "The party was on
Wednesday".

So let's see the  associated with the logtext field. Plus,
the results of adding =true to the query.

But you can get a lot of info a lot faster if you go to the admin UI
screen, select the proper core from the drop-down on the left sied and
go to the "analysis" section. Pick the field (or field type), enter
some text and hit analyze (or uncheck the "verbose" box, that's
largely uninteresting info at this level). That'll show you exactly
how the input document is parsed, exactly how the query is parsed etc.
And be sure to enter something like
"september first was a Wednesday" in the left-hand (index) box, then
just "Wednesday" in the right hand (query) side. My bet: You'll see on
the index side that the input is not broken up, not transformed, etc.

Best,
Erick

Re: write.lock

2015-09-21 Thread Mark Fenbers
A snippet of my solrconfig.xml is attached.  The snippet only contains 
the Spell checking sections (for brevity) which should be sufficient for 
you to see all the pertinent info you seek.


Thanks!
Mark

On 9/19/2015 3:29 AM, Mikhail Khludnev wrote:

Mark,

What's your solconfig.xml?

On Sat, Sep 19, 2015 at 12:34 AM, Mark Fenbers <mark.fenb...@noaa.gov>
wrote:


Greetings,

Whenever I try to build my spellcheck index
(params.set("spellcheck.build", true); or put a check in the
spellcheck.build box in the web interface) I get the following stacktrace.
Removing the write.lock file does no good.  The message comes right back
anyway.  I read in a post that increasing writeLockTimeout would help.  It
did not help for me even increasing it to 20,000 msec.  If I don't build,
then my resultset count is always 0, i.e., empty results.  What could be
causing this?

Mark






 
  

text_en





  index
  logtext
  
  solr.IndexBasedSpellChecker
  /localapps/dev/EventLog/index
  true  
  
  
  
  0.5
  
  2
  
  1
  
  5
  
  4
  
  0.01
  




  wordbreak
  solr.WordBreakSolrSpellChecker
  logtext
  true
  true
  10










   
 solr.FileBasedSpellChecker
logtext 
FileDict
 /usr/share/dict/words
 UTF-8
 /localapps/dev/EventLog/index
  

  
  0.5
  
  2
  
  1
  
  5
  
  4
  
  0.01
  
   
  
  

  
  

  


  FileDict

  on
  true
  10
  5
  5
  true
  true
  10
  5


  spellcheck

  




Zero Query results

2015-09-18 Thread Mark Fenbers

Greetings!

Using the browser interface to run a query on my indexed data, 
specifying "q=logtext:*" gives me all 9800+ documents indexed -- as 
expected.  But if I specify something like "q=logtext:Sunday", then I 
get zero results even though ~1000 documents contain the word Sunday.  
So I'm puzzled as to why this is not working for specific words.


This was working (i.e., returning ~1000 documents containing Sunday), 
until I began working on adding a spell-checking capability.  Now both 
spell-checking or searching gives me zero results, and I don't know what 
I could have done to break searching capabilities in the process.  
Again, searching is not completely broken because it will return all the 
documents with * as the token.


thanks,
Mark


Re: Zero Query results

2015-09-18 Thread Mark Fenbers

On 9/18/2015 8:33 PM, Shawn Heisey wrote:


The "field:*" syntax is something you should not get in the habit of
using.  It is a wildcard search.  What this does under the covers is
looks up all the possible terms in that field across the entire index,
and constructs a Lucene query that actually includes all those terms.
If you execute a search like this on a field that has millions or
billions of terms, Solr will find them all.  It will use a ton of memory
and be quite slow.
Yes.  I only specified * to see if it would return ANY results, because 
searching on a fixed string does not.

For the problem with "Sunday":

What fieldType is used for "logtext"?
There is no fieldType defined in my solrconfig.xml, unless you are 
referring to this line:

text_general

Should I have one??  Defined where?

If you are talking about the fieldType of this column in the PostgreSQL 
database, it is "Text"...

We'll also need the full
definition of that fieldType,

Not sure what you want here...  (Herein may lie my problem...)

and an example of the full text indexed
into that field for a document that should match, but doesn't.
Attached are 2 files: one where I used "*" and so it returned all 
documents, but I only include the top 2 in my attachment.  You can see 
the first document contains the word "Wednesday".  So I replaced "*" 
with "Wednesday" and ran the query again.  This is the second 
attachment, showing zero results.   The "logtext" field is what I search 
on, and this field type is plain text, although I don't think I 
specifically declare this anywhere.


Both attachments were run with debug on.

Thanks,
Mark


You should also check the "debugQuery" box on the Query tab, and give us
the "rawquerystring" and "parsedquery" values from the debug.

Thanks,
Shawn








  0
  1
  
Wednesday
true
xml
true
1442626657279
  




  Wednesday
  Wednesday
  logtext:Wednesday
  logtext:Wednesday
  
  LuceneQParser
  
1.0

  0.0
  
0.0
  
  
0.0
  
  
0.0
  
  
0.0
  
  
0.0
  
  
0.0
  
  
0.0
  
  
0.0
  


  0.0
  
0.0
  
  
0.0
  
  
0.0
  
  
0.0
  
  
0.0
  
  
0.0
  
  
0.0
  
  
0.0
  

  







  0
  2
  
*
true
xml
true
1442626473675
  


  
2007-03-21 12:43:25.0

  16


  ZCZC CRWHCMTIR CES
TTAA00 KTIR DDHHMM

...FOR INTERGOVERNMENTAL AGENCY USE ONLY...

Hydromet Coordination Message
Ohio River Forecast Center, Wilmington, OH
843 AM EDT Wednesday, March 21, 2007

To:   OHRFC WFOs
From: OHRFC

OHRFC QPF Discussion:
Tricky forecast this morning with the warm front moving northward across the 
Ohio 
Valley. Expect this feature to produce scattered showers, mainly across the 
northern 
and western basin. The conveyor of warm, moist air will skirt the western 
basin, but 
most of the really intense precipitation should remain west of the area through 
12Z 
Thursday. The ensuing cold front will bring the heavy rainfall into the Ohio 
Valley 
Thursday into Thursday night. The front then stalls parelleling the Ohio River, 
and 
another wave slides across late Friday, producing another round of moderate to 
heavy precipitation. The front finally washes out by the weekend bringing a
respite from the unsettled weather. 

OHRFC QPF Amounts through 12Z Thursday:
Rainfall will mainly be confined to the northern and western tiers of the
basin. Basin average amounts will generally range from a few hundredths up to 
about a quarter inch, although isolated pockets withing the Maumee and Wabash 
watersheds could see over a quarter inch. Elsewhere, a few showers over the 
Appalachians will produce a few hundredths for the Kanawha. 

$$
Myers



  myers

1512693012333854720
  
2008-04-01 22:50:58.0

  16


  ZCZC CRWHCMTIR CES
TTAA00 KTIR DDHHMM

...FOR INTERGOVERNMENTAL AGENCY USE ONLY...

Hydromet Coordination Message
Ohio River Forecast Center, Wilmington, OH
650 PM EDT Tuesday, April 1, 2008

To:   OHRFC WFOs
From: OHRFC

***The OHRFC will close at the usual time of 10PM this evening***

If there are any hydrologic concerns please call as soon as possible to allow
for sufficient time to run models and issue forecasts.

OHRFC QPF Discussion:
A cold front is currently along the eastern and southern edge of the basin at
this time.  Some light rain showers continue along the front and across the
north with the upper level system.  All amounts are light and will continue to
decrease as high pressure builds into the basin for a dry Wednesday for the
entire region.

OHRFC QPF Amounts:
A few hundredths of an inch across portions of middle TN and across western VA
into NC.  Elsewhere, little or no rainfall is expected.

$$
JEH




Re: Headscratcher 2 of 2

2015-09-18 Thread Mark Fenbers
OK, I understand now!  To view the results before going much farther, I 
simply did a "System.err.println(queryresponse);" which printed the 
results in a JSON-like format.  Instead, I need to use the methods of 
the queryresponse object to view my output.  Apparently, the 
queryresponse.toString() is what is formatting in JSON...  Doink!!


Thanks for the nudge!

Mark

On 9/18/2015 6:15 PM, Upayavira wrote:

What URL are you posting to? Why do you want to use JSON or XML from
SolrJ, which is best using javabin anyway?

Get it right via a URL first, then try to port it over to SolrJ. Then,
look in the Solr logs and you'll see the params that were passed over to
Solr - maybe you'll see what's getting set wrong. Watch for more than
one wt=, I bet Solr is always honouring the first.

Upayavira
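
For the record, if I ever do want XML on the client side, it appears the
response parser is set on the client itself rather than via wt.  A
sketch (assuming HttpSolrClient, as I use elsewhere):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.impl.XMLResponseParser;

public class XmlClient {
  public static void main(String[] args) throws Exception {
    HttpSolrClient solr = new HttpSolrClient("http://localhost:8983/solr/EventLog");
    // SolrJ defaults to the binary javabin wire format; ask for XML instead
    solr.setParser(new XMLResponseParser());
    System.out.println(solr.query(new SolrQuery("deeper")).getResults().getNumFound());
    solr.close();
  }
}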


write.lock

2015-09-18 Thread Mark Fenbers

Greetings,

Whenever I try to build my spellcheck index 
(params.set("spellcheck.build", true); or put a check in the 
spellcheck.build box in the web interface) I get the following 
stacktrace.  Removing the write.lock file does no good.  The message 
comes right back anyway.  I read in a post that increasing 
writeLockTimeout would help.  It did not help for me even increasing it 
to 20,000 msec.  If I don't build, then my resultset count is always 0, 
i.e., empty results.  What could be causing this?


Mark

http://localhost:8983/solr/EventLog/ELspell?df=logtext&wt=xml&indent=true&spellcheck=true&spellcheck.build=true&spellcheck.q=Sunday&spellcheck.extendedResults=true





  500
  42


  Lock held by this virtual machine: 
/localapps/dev/EventLog/index/write.lock
  org.apache.lucene.store.LockObtainFailedException: Lock 
held by this virtual machine: /localapps/dev/EventLog/index/write.lock
at 
org.apache.lucene.store.NativeFSLockFactory.obtainFSLock(NativeFSLockFactory.java:127)
at 
org.apache.lucene.store.FSLockFactory.obtainLock(FSLockFactory.java:41)
at 
org.apache.lucene.store.BaseDirectory.obtainLock(BaseDirectory.java:45)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:775)
at 
org.apache.lucene.search.spell.SpellChecker.clearIndex(SpellChecker.java:455)
at 
org.apache.solr.spelling.FileBasedSpellChecker.build(FileBasedSpellChecker.java:70)
at 
org.apache.solr.handler.component.SpellCheckComponent.prepare(SpellCheckComponent.java:124)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:251)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:210)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:499)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at 
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)

  500






Headscratcher 1 of 2

2015-09-18 Thread Mark Fenbers

Greetings,

Using an Index-based spell-checker, I get some results, but not what I'm 
looking for.  Using a File-based checker, I never get any results, but 
no errors either.  I've trimmed down my configuration to only use one 
spell-checker and named it "default", but still empty results on my text 
that is chock full of misspelled words. Any ideas?  Attached is my 
solrconfig snippet:

Mark


  

text_general





  index
  logtext
  
  solr.IndexBasedSpellChecker
  .
  true  
  
  
  
  0.5
  
  2
  
  1
  
  5
  
  4
  
  0.01
  




  wordbreak
  solr.WordBreakSolrSpellChecker
  logtext
  true
  true
  10










   
 solr.FileBasedSpellChecker
logtext 
default
 /usr/share/dict/words
 UTF-8
 .
   
  
  

  
  

  


  default

  on
  true
  10
  5
  5
  true
  true
  10
  5


  spellcheck

  



Headscratcher 2 of 2

2015-09-18 Thread Mark Fenbers

Greetings!

I cannot seem to configure the spell-checker to return results in XML 
instead of JSON.  I tried programmatically, as in ...


params.set("wt", "xml");
solr.query(params);

... and I tried through the solrconfig.xml.  My problem here is that it 
is not exactly clear (because I've seen no example doing this) where the 
spec

<str name="wt">xml</str>

is supposed to go (although I tried it in a number of places that made 
intuitive sense to me).  In which tag(s) does it go, exactly?  Or do I 
even have the spec syntax right?


Nothing I tried is working -- I get back JSON no matter what I try. Can 
you offer specific advice?


Mark



Re: Google didn't help on this one!

2015-09-16 Thread Mark Fenbers

On 9/15/2015 6:49 PM, Shawn Heisey wrote:


>From the information we have, we cannot tell if this is a problem
request or not.  Do you have a core/collection named "EventLog" on your
Solr server?  It will be case sensitive.  If you do, does that config
have a handler named "spellCheckCompRH" in it (also case sensitive)?

The nc output lets us see everything that your client sent to Solr, so I
have built a test URL for you based on that info.

Try sending the following URL from a browser or a curl command.  If I've
gotten the host wrong, go ahead and replace it with the correct value.
You'll probably be able to see any errors right in the browser or curl
output.  Hopefully this will help you figure out what's happening.  Also
look in your Solr server's logfile for error messages.

http://dell9-tir:8983/solr/EventLog/spellCheckCompRH?qt=%2FspellCheckCompRH&q=Some+more+text+wit+some+missspelled+wordz.&spellcheck=on&spellcheck.build=true&wt=json&indent=true

I notice that you have "spellcheck.build=true" in that URL.  You
probably don't want to do this on every request, assuming that your
spellcheck dictionary even requires building.

Thanks,
Shawn
It wasn't really a problem request, but a follow-up to those who took 
the time to help me.  However, since this error has returned, it is now 
a problem request!  ;-)


I am aware that "spellcheck.build=true" is expensive, but since I 
haven't had my first success yet with spell-checking, I figured it 
wouldn't hurt to have it in there for now.


I ran the URL you gave verbatim (because your assumptions were correct), 
but I got the stacktrace shown below.  This is particularly puzzling 
because I can find nowhere in my code or configuration where I am 
specifying a float value where I shouldn't be.  My solrconfig.xml and 
schema.xml are posted in another thread having a subject "Moving on to 
spelling" if that helps you help me.


Thanks,
Mark

HTTP ERROR 500

Problem accessing /solr/EventLog/spellCheckCompRH. Reason:

{msg=SolrCore 'EventLog' is not available due to init failure: 
java.lang.Float cannot be cast to 
java.lang.String,trace=org.apache.solr.common.SolrException: SolrCore 
'EventLog' is not available due to init failure: java.lang.Float cannot 
be cast to java.lang.String

at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:978)
at org.apache.solr.servlet.HttpSolrCall.init(HttpSolrCall.java:250)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:417)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:210)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)

at org.eclipse.jetty.server.Server.handle(Server.java:499)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at 
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)

at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.common.SolrException: java.lang.Float cannot 
be cast to java.lang.String

at org.apache.solr.core.SolrCore.<init>(SolrCore.java:820)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:659)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:727)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:447)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:438)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:210)
at 

Re: Google didn't help on this one!

2015-09-16 Thread Mark Fenbers

On 9/16/2015 5:24 AM, Alessandro Benedetti wrote:

As a reference I always suggest :
https://cwiki.apache.org/confluence/display/solr/Spell+Checking


I read this doc and have found it moderately helpful to my current 
problem.  But I have at least one question about it, especially given 
that my current error is a ClassCastException from an unknown origin.  
Let's look at some lines I copied from the document:


true
false

and

20
2

Why are some parameters specified as strings (<str> tag) even though 
they are integers or boolean, and others specified as integer (<int> 
tags) and boolean (<bool> tags)?  Does it matter which way I specify them?


thanks,
Mark


Re: Google didn't help on this one!

2015-09-16 Thread Mark Fenbers
Indeed!  <float> should be changed to <str> in the "Spell Checking" 
document 
(https://cwiki.apache.org/confluence/display/solr/Spell+Checking) and in 
all the baseline solrconfig.xml files provided in the distribution.  In 
addition, '<str name="distanceMeasure">internal</str>' should be 
removed/changed in the same document and same solrconfig.xml files 
because "internal" is not defined in AbstractLuceneSpellChecker.java!  
Once I edited these two problems in my own solrconfig.xml, the 
stacktrace errors went away!!  Yay!


But I'm not out of the woods yet!  I'll resume later, after our system 
upgrade today.


Thanks!
Mark

On 9/16/2015 8:03 AM, Mikhail Khludnev wrote:

https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/spelling/AbstractLuceneSpellChecker.java#L97
this means that

<float name="accuracy">0.5</float>

should be replaced with

<str name="accuracy">0.5</str>








Re: Google didn't help on this one!

2015-09-16 Thread Mark Fenbers
Ah ha!!  Exactly my point in the post I sent about the same time you did 
(same Thread)!

Mark

On 9/16/2015 8:03 AM, Mikhail Khludnev wrote:

https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/spelling/AbstractLuceneSpellChecker.java#L97
this means that

<float name="accuracy">0.5</float>

should be replaced with

<str name="accuracy">0.5</str>



On Wed, Sep 16, 2015 at 2:32 PM, Upayavira <u...@odoko.co.uk> wrote:


See this:

Caused by: java.lang.ClassCastException: java.lang.Float cannot be cast
to java.lang.String
  at

org.apache.solr.spelling.AbstractLuceneSpellChecker.init(AbstractLuceneSpellChecker.java:97)

AbstractLuceneSpellChecker is expecting a string, but getting a float.
Can you paste here the config (in solrconfig.xml) for your spellchecker?

Also, a simple way to get spell checking started is to look at the
/browse example that comes with the techproducts sample configs. It has
spellchecking already working, so starting there can be a way to get
something going easily.

Upayavira

On Wed, Sep 16, 2015, at 12:22 PM, Mark Fenbers wrote:

On 9/15/2015 6:49 PM, Shawn Heisey wrote:

>From the information we have, we cannot tell if this is a problem
request or not.  Do you have a core/collection named "EventLog" on your
Solr server?  It will be case sensitive.  If you do, does that config
have a handler named "spellCheckCompRH" in it (also case sensitive)?

The nc output lets us see everything that your client sent to Solr, so

I

have built a test URL for you based on that info.

Try sending the following URL from a browser or a curl command.  If

I've

gotten the host wrong, go ahead and replace it with the correct value.
You'll probably be able to see any errors right in the browser or curl
output.  Hopefully this will help you figure out what's happening.

Also

look in your Solr server's logfile for error messages.



http://dell9-tir:8983/solr/EventLog/spellCheckCompRH?qt=%2FspellCheckCompRH&q=Some+more+text+wit+some+missspelled+wordz.&spellcheck=on&spellcheck.build=true&wt=json&indent=true

I notice that you have "spellcheck.build=true" in that URL.  You
probably don't want to do this on every request, assuming that your
spellcheck dictionary even requires building.

Thanks,
Shawn

It wasn't really a problem request, but a follow-up to those who took
the time to help me.  However, since this error has returned, it is now
a problem request!  ;-)

I am aware that "spellcheck.build=true" is expensive, but since I
haven't had my first success yet with spell-checking, I figured it
wouldn't hurt to have it in there for now.

I ran the URL you gave verbatim (because your assumptions were correct),
but I got the stacktrace shown below.  This is particularly puzzling
because I can find nowhere in my code or configuration where I am
specifying a float value where I shouldn't be.  My solrconfig.xml and
schema.xml are posted in another thread having a subject "Moving on to
spelling" if that helps you help me.

Thanks,
Mark

HTTP ERROR 500

Problem accessing /solr/EventLog/spellCheckCompRH. Reason:

  {msg=SolrCore 'EventLog' is not available due to init failure:
java.lang.Float cannot be cast to
java.lang.String,trace=org.apache.solr.common.SolrException: SolrCore
'EventLog' is not available due to init failure: java.lang.Float cannot
be cast to java.lang.String
  at
  org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:978)
  at org.apache.solr.servlet.HttpSolrCall.init(HttpSolrCall.java:250)
  at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:417)
  at


org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:210)

  at


org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)

  at


org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)

  at


org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)

  at


org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)

  at


org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)

  at


org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)

  at


org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)

  at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
  at


org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)

  at


org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)

  at


org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)

  at


org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)

  at


org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)

  at


org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)


Good ol' Websters

2015-09-16 Thread Mark Fenbers

Greetings!

Mikhail Khludnev, in his post to the thread "Google didn't help on this 
one!", has pointed out one bug in Solr-5.3.0, and I was able to uncover 
another one (which I wrote about in the same thread). Therefore, and 
thankfully, I've been able to get past my configuration issues.


So now I've been able to try spell-checking on my local configuration 
for the first time.  My query string was "Anothr text containig 
missspelled wordz."  Of these 5 words, the only correct one is "text"; 
the others are not spelled correctly.  Yet my query results gave me only 
two suggestions ("test" and "Test") and they were for the one word that 
*is* spelled correctly!  This is the polar opposite of what I expected.


I understand why, though.  Because I am using 
solr.IndexBasedSpellChecker, my data's own index is used as the dictionary.  
This is not entirely a bad thing, because we use a lot of technical 
terms and industry-accepted spellings (like "gage" instead of "gauge").  
But for the most part, I want to use a Webster-like dictionary against 
which to check my spelling. Does this mean I need to find an English 
dictionary file on the web and add Solr's FileBasedSpellChecker??  Or 
does Solr already have what I need and it's a matter of me learning how 
to configure that properly??  (If so, how?)
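
My current thinking is a FileBasedSpellChecker pointed at
/usr/share/dict/words, selected per request by dictionary name.  A rough
sketch (the dictionary name FileDict is my assumption and must match
solrconfig.xml):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class FileDictCheck {
  public static void main(String[] args) throws Exception {
    SolrClient solr = new HttpSolrClient("http://localhost:8983/solr/EventLog");
    SolrQuery q = new SolrQuery();
    q.setRequestHandler("/ELspell");
    q.set("spellcheck", true);
    q.set("spellcheck.q", "Anothr text containig missspelled wordz.");
    q.set("spellcheck.dictionary", "FileDict");  // select the file-based checker
    System.out.println(solr.query(q).getSpellCheckResponse().getSuggestionMap());
    solr.close();
  }
}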


Mark


Re: Google didn't help on this one!

2015-09-15 Thread Mark Fenbers
So I ran "nc -l 8983" then restarted solr, and then ran my app with my 
query.   nc reported the following:


GET 
/solr/EventLog/spellCheckCompRH?qt=%2FspellCheckCompRH&q=Some+more+text+wit+some+missspelled+wordz.&spellcheck=on&spellcheck.build=true&wt=javabin&version=2 
HTTP/1.1

User-Agent: Solr[org.apache.solr.client.solrj.impl.HttpSolrClient] 1.0
Host: dell9-tir:8983
Connection: Keep-Alive

I'm not sure if this is good, or indicates an error of any kind.

Anyway, when I ran my app again, I got a completely different error, 
although I didn't change anything!  So, I guess I get to move on from 
this and see what other hurdles I run into!


Thanks for the help!
Mark


On 9/15/2015 11:13 AM, Yonik Seeley wrote:

On Tue, Sep 15, 2015 at 11:08 AM, Mark Fenbers <mark.fenb...@noaa.gov> wrote:

I'm working with the spellcheck component of Solr for the first time.  I'm
using SolrJ, and when I submit my query, I get a Solr Exception:  "Expected
mime type octet/stream but got text/html."

What in the world is this telling me??

You're probably hitting an endpoint on Solr that doesn't exist and
getting an HTML 404 error page rather than the response (which would
be in binary by default).

An easy way to see what SolrJ is sending is to kill your solr server, then do

nc -l 8983

And then run your SolrJ program to see what it sends... if it look OK,
then try sending the request from curl to Solr.

-Yonik





Google didn't help on this one!

2015-09-15 Thread Mark Fenbers
I'm working with the spellcheck component of Solr for the first time.  
I'm using SolrJ, and when I submit my query, I get a Solr Exception:  
"Expected mime type octet/stream but got text/html."


What in the world is this telling me??  The query object I submitted is 
an entire sentence, not a single word.  Would that matter?


Mark


Moving on to spelling

2015-09-15 Thread Mark Fenbers

Greetings,

In my app, I've successfully implemented full-text searching 
capabilities on a database using Solr.  Now I'm ready to move on to 
using Solr's spell check/suggest capability.  Having succeeded in 
searching, I figured spell-checking would be an easier step. Well, not 
for me!


I'm rather tangled in my configuration, and despite reading several 
documents, and looking at several examples, I don't feel like I'm making 
progress.  I'm about to give up, but I'm attaching stripped-down 
versions of my solrconfig.xml and schema.xml files to see if any really 
smart folks can spot what's wrong.  I've also attached the relative code 
snippet that triggers the error.  I'm so lost that I'm sure I have more 
than one thing configured improperly.


The most recent error is "Expected mime type is octet/stream but got 
text/html", but I've had a variety of errors and they seem to change 
each time I try something different.


My app allows end users to type text (using a StyledText widget) and 
post it to the database.  I'd like to spell-check the text before it is 
posted to the database and indexed for searching.  In my snippet, the 
variable "text" is what the end-user typed. "eventlogtext.logtext" is 
the table.column this text is stored in.


I'm using Solr 5.3.0.  I'm cross-eyed!  Any tips would be appreciated.

Mark




   
   


   
   
   
   
   
   

   
   
   
   

   

   
   
   
   
   
   
   
   
   
   
   
   

   
   
   
   

   

   
   
   
   
   
   
   
   
   
   
   
   
   
   
   

   

   
   
   

   
   
   
   
   

   

   
   

   
 posttime

   
   
   
   
   
   



   
   
   
   
   
   
   
   

   
   
















  

  



  




  
  




  



  



	


  
  




	


  



  






  
  







  



  








  



  




  
  




  



  




  



  


  



  


  



  


  



  
	
  
  
	
  


  
	
  
  
	
  













  5.3.0
  ${solr.data.dir:}
  
  

  
true
managed-schema
  
  
${solr.lock.type:native}
  
  
  
  
  

  ${solr.ulog.dir:}
  ${solr.ulog.numVersionBuckets:65536}



  ${solr.autoCommit.maxTime:15000}
  false



  ${solr.autoSoftCommit.maxTime:-1}


  

  
1024






true
20

200


  
  


  
  

false

2
  
  
  



  

  

  explicit
  10
  logtext 
  
  default
  wordbreak
  on
  true
  10
  5
  5
  true
  true
  10
  5


  spellcheck

  

  

  explicit
  json
  true

  
  
  

/localapps/dev/EventLog/solr/conf/data-config.xml

  

  

  explicit

  

  

  _text_

  

  

  add-unknown-fields-to-the-schema

  

  

  true
  ignored_
  _text_

  

  

  {!xport}
  xsort
  false



  query

  

  

  json
  false

  

  

  

  

  explicit
  true

  

  

text_general


  default
  logtext
  
  solr.IndexBasedSpellChecker
  
  internal
  
  0.5
  
  2
  
  1
  
  5
  
  4
  
  0.01
  




  wordbreak
  solr.WordBreakSolrSpellChecker
  name
  true
  true
  10


  

  
  

  default
  wordbreak
  on
  true
  10
  5
  5
  true
  true
  10
  5


  spellcheck

  

  

  

  true


  tvComponent

  

  

  
  

  true
  false


  terms

  


  
string
elevate.xml
  

  

  explicit


  elevator

  

  

  

  100

  

  

  
  70
  
  0.5
  
  [-\w ,/\n\]{20,200}

  

  

  
  

  

  

  

  

  

  
  

  

  
  

  

  

  10
  .,!? 

  

 

Spell checking: What is left to the programmer?

2015-09-15 Thread Mark Fenbers

Greetings!

My Java app, using SolrJ, now successfully does searches. I've used the 
web interface to do a full-text indexing and for each new entry added 
through my app, I have it add to this index.


But now I want to use SolrJ to also do spell checking.  I have read 
several documents on this and examined a couple of Java examples, but 
one main question still persists.  Let's first assume that I have my 
configuration XML files set up correctly and I can spell-check a word 
through the web interface using something like 
.../spell?q=missspelled&spellcheck=on.  Assume also that the end user 
has typed in a paragraph and is about to submit the text.  In the 
current implementation of my software, using the SolrJ API, the text will 
get parsed into words and the words will be added to the search index.  
For spell-checking, however, I am puzzled.


Is it up to me, the programmer, to parse the text into individual words 
and determine which words are misspelled, then run the query on a 
misspelled word to get a list of suggestions for that misspelled word??  
Or does Solr itself parse the text string into words and run a query on 
every word, thus indicating which words are misspelled by the non-zero 
list of suggestions?  Or is there a third option I haven't thought of 
(like, spell-check as I type)??


I'm just trying to picture the behavior in my head so I know what 
programming approach to take.  Thanks for the help!
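
From what I've read, it looks like the second option: the spellcheck
component tokenizes spellcheck.q itself and returns suggestions keyed by
each misspelled token, so on my side I would just walk the response.  A
sketch of what I have in mind (handler name /spell assumed):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.SpellCheckResponse;

public class CheckParagraph {
  public static void main(String[] args) throws Exception {
    SolrClient solr = new HttpSolrClient("http://localhost:8983/solr/EventLog");
    SolrQuery q = new SolrQuery();
    q.setRequestHandler("/spell");
    q.set("spellcheck", true);
    q.set("spellcheck.q", "Is this paragreph spelled correctlee?");
    SpellCheckResponse spell = solr.query(q).getSpellCheckResponse();
    // one Suggestion per misspelled token, each with its alternatives
    for (SpellCheckResponse.Suggestion s : spell.getSuggestions()) {
      System.out.println(s.getToken() + " -> " + s.getAlternatives());
    }
    solr.close();
  }
}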


Mark


Re: Bug or Operator Error?

2015-09-11 Thread Mark Fenbers
Additional experimenting lead me to the discovery that /dataimport does 
*not* index words with a preceding %20 (a URL-encoded space), or in fact 
*any* preceding %xx encoding.  I can probably replace each %20 with a 
'+' in each record of my database -- the dataimporter/indexer doesn't 
sneeze at those -- but using some sort of encoding is important for 
certain characters such as double and single quotes, because many 
non-alphanumeric characters have special meanings to the shell and/or 
PostgreSQL and need to be escaped.


So now that I know what the issue is, I need to find a work-around. Does 
Solr have any baseline processors that will handle the URL-encoding?  
Being new to Solr, I'm not sure I have the skill to write my own.  Or, 
is there another kind of encoding I can use that Solr doesn't adversely 
react to??
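
One work-around I'm considering is decoding in my own code before
handing the text to Solr, using java.net.URLDecoder.  A sketch (field
names mirror my schema; class name is mine):

import java.net.URLDecoder;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class DecodeAndIndex {
  public static void main(String[] args) throws Exception {
    SolrClient solr = new HttpSolrClient("http://localhost:8983/solr/EventLog");
    String raw = "Very%20slow%20moving%20front";  // as stored in PostgreSQL
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "2012-07-10 13:23:39.0");
    doc.addField("logtext", URLDecoder.decode(raw, "UTF-8"));  // decode first
    solr.add(doc);
    solr.commit();
    solr.close();
  }
}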


Mark

On 9/11/2015 12:11 PM, Erick Erickson wrote:

Several ideas, all shots in the dark because to analyze this we
need the schema definitions and the result of your query with
&debug=true added. In particular you'll see the "parsed query"
section near the bottom, and often the parsed query isn't
quite what you think it is. In particular this is often the issue:
you query q=Drzal. This translates into q=default_search_field:Drzal
where default_search_field is the "df" parameter in your search
handler ("query" or "select" in solrconfig.xml).

Next most frequent thing: Your analysis chain does things you're
not expecting. Simple example is whether the analysis lower-cases
or not. For this kind of problem, the Admin UI>>core>>analysis page
is _really_ your friend.

Best,
Erick



Bug or Operator Error?

2015-09-11 Thread Mark Fenbers

Greetings!

So, I've created my first index and am able to search programmatically 
(through SolrJ) and through the Web interface. (Yay!)  I get non-empty 
results for my searches!


My index was built from database records using 
/dataimport?command=full-import.  I have 9936 records in the table to be 
indexed and the import status indicated it processed all 9936.  However, 
my searches only pull up a subset of the records that I know to contain 
a word.  For example, I know that there are hundreds of records 
containing the word "Friday", yet my results for my "Friday" query only 
contain 17 records (documents) in the Web interface, and only 10 records 
from the SolrJ query.


I figure I must be doing something wrong in my query, or have somehow 
indexed improperly.  This might be a clue: My main text field in the 
database table is URL-encoded.  I wouldn't think that would matter, though.


Another example... In one of the documents returned by the "Friday" 
query results, I noticed in the text the name of a co-worker "Drzal".  
So, I searched on "Drzal" and my results came up with 0 documents.   (!?)


Any ideas where I went wrong??
Mark
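[One SolrJ-side detail worth checking here: exactly 10 documents is 
Solr's default rows page size, and numFound reports the total hit count 
regardless of rows.  A sketch, assuming a client pointed at the 
EventLog core:]

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FridayQuery {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client =
            new HttpSolrClient("http://localhost:8983/solr/EventLog");
        SolrQuery query = new SolrQuery("Friday");
        query.setParam("df", "logtext");  // search the log text field
        query.setRows(100);               // default is 10, hence "only 10 records"
        QueryResponse rsp = client.query(query);
        // numFound is the total match count, independent of rows.
        System.out.println("numFound: " + rsp.getResults().getNumFound());
        client.close();
    }
}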






Re: ghostly config issues

2015-09-10 Thread Mark Fenbers

On 9/7/2015 4:52 PM, Shawn Heisey wrote:


The only files that should be in server/lib are the jetty and servlet jars.
The only files that should be in server/lib/ext are the logging jars (slf4j,
log4j, etc).

In the server/lib directory on Solr 5.3.0:

ext/
javax.servlet-api-3.1.0.jar
jetty-continuation-9.2.11.v20150529.jar
jetty-deploy-9.2.11.v20150529.jar
jetty-http-9.2.11.v20150529.jar
jetty-io-9.2.11.v20150529.jar
jetty-jmx-9.2.11.v20150529.jar
jetty-rewrite-9.2.11.v20150529.jar
jetty-security-9.2.11.v20150529.jar
jetty-server-9.2.11.v20150529.jar
jetty-servlet-9.2.11.v20150529.jar
jetty-servlets-9.2.11.v20150529.jar
jetty-util-9.2.11.v20150529.jar
jetty-webapp-9.2.11.v20150529.jar
jetty-xml-9.2.11.v20150529.jar

In the server/lib/ext directory on Solr 5.3.0:

jcl-over-slf4j-1.7.7.jar
jul-to-slf4j-1.7.7.jar
log4j-1.2.17.jar
slf4j-api-1.7.7.jar
slf4j-log4j12-1.7.7.jar


Excellent!!  Based on this info, I decided to blow away the Solr 
installation and reinstall from the tarball file.  After "tar -xzvf", I 
created a "lib" subdir under /localapps/dev/EventLog and copied my 
postgres jar and the dist/dataImportHandler jar into the "lib".  I 
restarted solr and "Viola!"  All works as designed!  It even indexed my 
entire database on the first try of a full-import! Woohooo!


Thanks for your help.  I would have abandoned this project without your 
persistence.


Mark


Re: ghostly config issues

2015-09-07 Thread Mark Fenbers

On 9/6/2015 4:25 PM, Shawn Heisey wrote:


If we assume that it cannot be a problem with multiple jar versions,
which sounds pretty reasonable, then I think SOLR-6188 is probably to blame.

https://issues.apache.org/jira/browse/SOLR-6188

I think you should try this as a troubleshooting step:  Rename that lib
directory where you have the jars to something like libtest and
add/change the sharedLib setting in your solr.xml to libtest.  The
following line should do it:

   <str name="sharedLib">libtest</str>

See this wiki page for more information about solr.xml:

https://wiki.apache.org/solr/Solr.xml%204.4%20and%20beyond

If this troubleshooting step fixes the problem, then I think it's
definitely SOLR-6188, and you have a viable workaround that should
continue to work even after SOLR-6188 is fixed.

Thanks,
Shawn


I did as you prescribed.  Since my solr.xml did not have a "sharedLib" 
line in it, I added

<str name="sharedLib">libtest</str>

between the <solr> and the <solrcloud> tags because I'm not running in 
cloud mode -- at least, not yet.


I also renamed /localapps/dev/solr-5.3.0/server/lib to 
/localapps/dev/solr-5.3.0/server/libtest, 
then restarted solr.  This time, solr did not start at all and nothing 
was written to solr.log.  In solr.xml, I also tried the following (one 
at a time):

<str name="sharedLib">server/libtest</str>
<str name="sharedLib">/localapps/dev/solr-5.3.0/server/libtest</str>
<str name="sharedLib">${sharedLib:libtest}</str>

...all with the same no-start results.  Instead of removing the 
sharedLib line from solr.xml, I left it like this: 
<str name="sharedLib">${sharedLib:}</str> and renamed my libtest back 
to lib.  Doing so allowed Solr to restart, albeit with the return of 
the DataImportHandler error, as before.


So does this tell us that we are barking up the wrong tree??  Does this 
mean it *is* a multiple jar version problem?  If so, there may be 
multiple jars (caused by my copying things around to try to get this 
working, and my lack of knowledge of where all the places are that it 
looks for jars), but they would be the same versions.


Unfortunately, the piece of Solr that is not working for me 
(DataImportHandler) is the very piece I need for my project. :-((


Mark



Re: ghostly config issues

2015-09-06 Thread Mark Fenbers

On 9/5/2015 10:40 PM, Shawn Heisey wrote:

Your solr home is /localapps/dev/EventLog ... Solr automatically loads
any jar found in the lib directory in the solr home, so it is attempting
to use /localapps/dev/EventLog/lib for the classloader.

For the other things you noticed, I believe I know why that is happening
too.

This SHOULD be the structure of a core instanceDir, if "collection1" is
the name of that directory.  This is highly simplified and missing
things you would likely find in the directory structure:

collection1/
|-core.properties
|-conf/
|--solrconfig.xml
|--schema.xml
|--stopwords.txt
|-data/
|--index/
|---[Lucene index files go here]

The directory with the core.properties file is the instanceDir.  The
instanceDir is supposed to contain a conf directory and a data directory.

It appears that you have this sort of structure in your solr home (leaving a
lot of things out on this one):

solr/
|-conf/
|--core.properties
|--solrconfig.xml
|--schema.xml

This is why it is looking in a path that has "conf" twice -- it is going
to the instanceDir (where it found core.properties) and assuming that it
will find a conf directory there.  The conf directory is not supposed to
be the instanceDir.

Thanks,
Shawn


Yes, indeed.  Moving core.properties up to the parent directory did the 
trick.  I also created a lib subdir and copied the postgres jar file and 
the Solr import handlers into it -- according to the advice from Kevin 
Lee responding to my "Config error mystery" post.  This brought me back 
to my original problem, which is shown in the log data below (java stack 
trace).  Perhaps it is an issue with multiple versions of the same jar, 
as you suggested in your response to "Config error mystery", but the log 
(below) does not seem to indicate that it has found a duplicate jar.   
Though Solr is running, I am not able to create an index from my 
database data.
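
[For reference, the relocated core.properties can stay minimal; a 
sketch consistent with the CoreDescriptor line in the log below 
(standard property names, values taken from that line):

  name=EventLog
  config=solrconfig.xml
  schema=schema.xml
  loadOnStartup=true]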


What do you make of the information in the log file?

Mark

2015-09-06 11:02:30.674 INFO  (main) [   ] o.e.j.u.log Logging 
initialized @853ms
2015-09-06 11:02:31.075 INFO  (main) [   ] o.e.j.s.Server 
jetty-9.2.11.v20150529
2015-09-06 11:02:31.110 WARN  (main) [   ] o.e.j.s.h.RequestLogHandler 
!RequestLog
2015-09-06 11:02:31.115 INFO  (main) [   ] o.e.j.d.p.ScanningAppProvider 
Deployment monitor [file:/localapps/dev/solr-5.3.0/server/contexts/] at 
interval 0
2015-09-06 11:02:32.261 INFO  (main) [   ] 
o.e.j.w.StandardDescriptorProcessor NO JSP Support for /solr, did not 
find org.apache.jasper.servlet.JspServlet
2015-09-06 11:02:32.290 WARN  (main) [   ] o.e.j.s.SecurityHandler 
ServletContext@o.e.j.w.WebAppContext@27cd61b{/solr,file:/localapps/dev/solr-5.3.0/server/solr-webapp/webapp/,STARTING}{/localapps/dev/solr-5.3.0/server/solr-webapp/webapp} 
has uncovered http methods for path: /
2015-09-06 11:02:32.308 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter 
SolrDispatchFilter.init(): WebAppClassLoader=365733477@15cca665
2015-09-06 11:02:32.343 INFO  (main) [   ] o.a.s.c.SolrResourceLoader 
JNDI not configured for solr (NoInitialContextEx)
2015-09-06 11:02:32.344 INFO  (main) [   ] o.a.s.c.SolrResourceLoader 
using system property solr.solr.home: /localapps/dev/EventLog/
2015-09-06 11:02:32.349 INFO  (main) [   ] o.a.s.c.SolrResourceLoader 
new SolrResourceLoader for directory: '/localapps/dev/EventLog/'
2015-09-06 11:02:32.355 INFO  (main) [   ] o.a.s.c.SolrResourceLoader 
Adding 'file:/localapps/dev/EventLog/lib/pg.jar' to classloader
2015-09-06 11:02:32.355 INFO  (main) [   ] o.a.s.c.SolrResourceLoader 
Adding 
'file:/localapps/dev/EventLog/lib/solr-dataimporthandler-5.3.0.jar' to 
classloader
2015-09-06 11:02:32.356 INFO  (main) [   ] o.a.s.c.SolrResourceLoader 
Adding 
'file:/localapps/dev/EventLog/lib/solr-dataimporthandler-extras-5.3.0.jar' 
to classloader
2015-09-06 11:02:32.700 INFO  (main) [   ] o.a.s.c.SolrXmlConfig Loading 
container configuration from /localapps/dev/EventLog/solr.xml
2015-09-06 11:02:32.871 INFO  (main) [   ] o.a.s.c.CoresLocator 
Config-defined core root directory: /localapps/dev/EventLog
2015-09-06 11:02:32.928 INFO  (main) [   ] o.a.s.c.CoreContainer New 
CoreContainer 574322827
2015-09-06 11:02:32.928 INFO  (main) [   ] o.a.s.c.CoreContainer Loading 
cores into CoreContainer [instanceDir=/localapps/dev/EventLog/]
2015-09-06 11:02:32.929 INFO  (main) [   ] o.a.s.c.CoreContainer loading 
shared library: /localapps/dev/EventLog/lib
2015-09-06 11:02:32.931 INFO  (main) [   ] o.a.s.c.SolrResourceLoader 
Adding 'file:/localapps/dev/EventLog/lib/pg.jar' to classloader
2015-09-06 11:02:32.931 INFO  (main) [   ] o.a.s.c.SolrResourceLoader 
Adding 
'file:/localapps/dev/EventLog/lib/solr-dataimporthandler-5.3.0.jar' to 
classloader
2015-09-06 11:02:32.931 INFO  (main) [   ] o.a.s.c.SolrResourceLoader 
Adding 
'file:/localapps/dev/EventLog/lib/solr-dataimporthandler-extras-5.3.0.jar' 
to classloader
2015-09-06 11:02:32.983 INFO  (main) [   ] 
o.a.s.h.c.HttpShardHandlerFactory created with socketTimeout : 

Re: ghostly config issues

2015-09-06 Thread Mark Fenbers

On 9/6/2015 12:00 PM, Shawn Heisey wrote:

It looks like you have jars in the solrhome/lib directory, which is
good.  You probably don't need the dataimporthandler jar for -extras if
you just want to load from a database.

It does appear that you also have <lib> directives in your
solrconfig.xml, which are loading after the dataimport jars load.
Loading jars from multiple locations complicates the classloader, and
has been known to cause a huge number of problems.  I don't know if it
is why you are having a class cast exception, but it would be my first
guess.  Since you are using the solrhome/lib directory, you should put
ALL jars that you need there, remove all jars from any instanceDir/lib
directory, and also remove all <lib> configuration elements from your
solrconfig.xml.

It looks like the jars that are loading are for Solr 5.3.0.  Do you have
any jars from another Solr or Lucene version in a directory that might
somehow be on the classpath -- loaded into the system java directories
via operating system packages or something?

There *might* be a problem with jars loading twice.  That appears to be
causing other problems, described in this issue:

https://issues.apache.org/jira/browse/SOLR-6188

If loading the dataimport jar twice IS causing this problem, then you
could probably fix it by removing the jars from the solrhome/lib
directory and loading them with <lib> elements in solrconfig.xml instead
-- this is the opposite of the advice I gave you earlier in this message.

I do not like to load jars that way, because each core ends up loading
the jars into its own classloader, so for me they would get loaded
multiple times ... but it might be a viable workaround, especially if
you are not going to have a large number of cores.

Thanks,
Shawn


This issue still persists. :-((

I have moved all the jars to <solrhome>/lib.  I have commented out 
all <lib> references in solrconfig.xml.  I have moved all jars to 
<solrhome>/lib and removed them from <instanceDir>/lib.  But nothing has 
changed with any of these steps.
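
[For concreteness, the kind of directive being commented out is the 
stock 5.x <lib> line below; the dir/regex values are the shipped 
defaults, shown only as an illustration:

  <lib dir="${solr.install.dir:../../../..}/dist/" 
       regex="solr-dataimporthandler-.*\.jar" />]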


I only heard of Solr/Lucene about a week ago, and so I downloaded the 
package since then.  If I have multiple versions of things, then it 
would have had to come packaged that way because I only 
downloaded/installed it once.


In my solrconfig.xml, I reference a file for my particular database 
details in a <requestHandler> tag.  In it, I deliberately misspelled the 
class name because I wanted to see if I got a different error.  I did, 
so I know that my issue isn't because it can't find the class.  (I since 
changed it back.)  The contents of my db-data-config.xml file are 
attached (below).  Do you see anything obviously incorrect about my 
config?  Could this be where the DataImportHandler error 
originates?


Thanks!
Mark


url="jdbc:postgresql://dx1f/OHRFC" user="awips" />


deltaQuery="SELECT posttime FROM eventlogtext WHERE 
lastmodtime > '${dataimporter.last_index_time}'">










ghostly config issues

2015-09-05 Thread Mark Fenbers

The log data is from solr.log.  There are a couple of puzzling items.

1. On line 2015-09-05 19:19:56.678, it shows a "lib" subdir
   (/localapps/dev/EventLog/lib) which doesn't exist and isn't
   specified anywhere that I can find (lots of "find | grep"
   commands).  I did, at one point, specify this in a version of
   solrconfig.xml that I was experimenting with, but removed that
   long ago.  The fact that this still appears is strange, as if
   it is using an old cache or something.
2. On line 2015-09-05 19:19:57.455, it shows a .../conf/conf/... and I
   can't figure out where the double "conf" is coming from.
   /localapps/dev/EventLog/solr/conf/conf/solrconfig.xml doesn't exist,
   and solrconfig.xml resides in
   /localapps/dev/EventLog/solr/conf/ (just one conf) where it belongs!

Can someone give me clues how to uncover these ghostly configuration 
settings?

Mark


2015-09-05 19:19:54.416 INFO  (main) [   ] o.e.j.u.log Logging 
initialized @902ms
2015-09-05 19:19:54.815 INFO  (main) [   ] o.e.j.s.Server 
jetty-9.2.11.v20150529
2015-09-05 19:19:54.851 WARN  (main) [   ] o.e.j.s.h.RequestLogHandler 
!RequestLog
2015-09-05 19:19:54.856 INFO  (main) [   ] o.e.j.d.p.ScanningAppProvider 
Deployment monitor [file:/localapps/dev/solr-5.3.0/server/contexts/] at 
interval 0
2015-09-05 19:19:56.027 INFO  (main) [   ] 
o.e.j.w.StandardDescriptorProcessor NO JSP Support for /solr, did not 
find org.apache.jasper.servlet.JspServlet
2015-09-05 19:19:56.057 WARN  (main) [   ] o.e.j.s.SecurityHandler 
ServletContext@o.e.j.w.WebAppContext@51cc87e3{/solr,file:/localapps/dev/solr-5.3.0/server/solr-webapp/webapp/,STARTING}{/localapps/dev/solr-5.3.0/server/solr-webapp/webapp} 
has uncovered http methods for path: /
2015-09-05 19:19:56.076 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter 
SolrDispatchFilter.init(): WebAppClassLoader=784350225@2ec03c11
2015-09-05 19:19:56.110 INFO  (main) [   ] o.a.s.c.SolrResourceLoader 
JNDI not configured for solr (NoInitialContextEx)
2015-09-05 19:19:56.111 INFO  (main) [   ] o.a.s.c.SolrResourceLoader 
using system property solr.solr.home: /localapps/dev/EventLog/
2015-09-05 19:19:56.116 INFO  (main) [   ] o.a.s.c.SolrResourceLoader 
new SolrResourceLoader for directory: '/localapps/dev/EventLog/'
2015-09-05 19:19:56.449 INFO  (main) [   ] o.a.s.c.SolrXmlConfig Loading 
container configuration from /localapps/dev/EventLog/solr.xml
2015-09-05 19:19:56.621 INFO  (main) [   ] o.a.s.c.CoresLocator 
Config-defined core root directory: /localapps/dev/EventLog
2015-09-05 19:19:56.677 INFO  (main) [   ] o.a.s.c.CoreContainer New 
CoreContainer 748968642
2015-09-05 19:19:56.678 INFO  (main) [   ] o.a.s.c.CoreContainer Loading 
cores into CoreContainer [instanceDir=/localapps/dev/EventLog/]
2015-09-05 19:19:56.678 INFO  (main) [   ] o.a.s.c.CoreContainer loading 
shared library: /localapps/dev/EventLog/lib
2015-09-05 19:19:56.678 WARN  (main) [   ] o.a.s.c.SolrResourceLoader 
Can't find (or read) directory to add to classloader: lib (resolved as: 
/localapps/dev/EventLog/lib).
2015-09-05 19:19:56.721 INFO  (main) [   ] 
o.a.s.h.c.HttpShardHandlerFactory created with socketTimeout : 
60,connTimeout : 6,maxConnectionsPerHost : 20,maxConnections : 
1,corePoolSize : 0,maximumPoolSize : 2147483647,maxThreadIdleTime : 
5,sizeOfQueue : -1,fairnessPolicy : false,useRetries : false,
2015-09-05 19:19:57.120 INFO  (main) [   ] o.a.s.u.UpdateShardHandler 
Creating UpdateShardHandler HTTP client with params: 
socketTimeout=60=6=true
2015-09-05 19:19:57.129 INFO  (main) [   ] o.a.s.l.LogWatcher SLF4J impl 
is org.slf4j.impl.Log4jLoggerFactory
2015-09-05 19:19:57.132 INFO  (main) [   ] o.a.s.l.LogWatcher 
Registering Log Listener [Log4j (org.slf4j.impl.Log4jLoggerFactory)]
2015-09-05 19:19:57.139 INFO  (main) [   ] o.a.s.c.CoreContainer 
Security conf doesn't exist. Skipping setup for authorization module.
2015-09-05 19:19:57.140 INFO  (main) [   ] o.a.s.c.CoreContainer No 
authentication plugin used.
2015-09-05 19:19:57.225 INFO  (main) [   ] o.a.s.c.CoresLocator Looking 
for core definitions underneath /localapps/dev/EventLog
2015-09-05 19:19:57.409 INFO  (main) [   ] o.a.s.c.SolrCore Created 
CoreDescriptor: {name=EventLog, config=solrconfig.xml, transient=false, 
schema=schema.xml, loadOnStartup=true, 
configSetProperties=configsetprops.json, 
instanceDir=/localapps/dev/EventLog/solr/conf, 
absoluteInstDir=/localapps/dev/EventLog/solr/conf/, 
coreNodeName=core_node5, dataDir=/localapps/dev/EventLog}
2015-09-05 19:19:57.411 INFO  (main) [   ] o.a.s.c.CoresLocator Found 
core EventLog in /localapps/dev/EventLog/solr/conf/
2015-09-05 19:19:57.415 INFO  (main) [   ] o.a.s.c.CoresLocator Found 1 
core definitions
2015-09-05 19:19:57.425 INFO  (coreLoadExecutor-6-thread-1) [   ] 
o.a.s.c.SolrResourceLoader new SolrResourceLoader for directory: 
'/localapps/dev/EventLog/solr/conf/'
2015-09-05 19:19:57.426 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter 

Re: which solrconfig.xml

2015-09-04 Thread Mark Fenbers

Chris,

The document "Uploading Structured Data Store Data with the Data Import 
Handler" has a number of references to solrconfig.xml, starting on Page 
2 and continuing on page 3 in the section "Configuring solrconfig.xml".  
It also is mentioned on Page 5 in the "Property Writer" and the "Data 
Sources" sections.  And other places in this document as well.


The solrconfig.xml file is also referenced (without a path) in the "Solr 
Quick Start" document, in the Design Overview section and other sections 
as well.  None of these references suggests the location of the 
solrconfig.xml file.  Doing a "find . -name solrconfig.xml" from the 
Solr home directory reveals about a dozen or so of these files in 
various subdirectories.  Thus, my confusion as to which one I need to 
customize...


I feel ready to graduate from the examples in the "Solr Quick Start" 
document, e.g., using bin/solr -e dih to feed in existing files on 
disk.  The tutorial was *excellent* for this part.  But now I want to 
build a "real" index using *my own* data from a database.  In doing 
this, I find the coaching in the tutorial to be rather absent.  For 
example, none of the documents I have found so far explains why one 
might want to use more than one Solr node and more than one shard, or 
what the advantages are of using Solr in cloud mode vs stand-alone 
mode.  As a result, I had to 
improvise/guess/trial-and-error.  I did manage to configure my own data 
source and changed my queries to apply to my own data, but I did 
something wrong somewhere in solrconfig.xml because I get errors when 
running, now.  I solved some of them by copying the *.jar files from the 
./dist directory to the solr/lib directory (a tip I found when I googled 
the error message), but that only helped to a certain point.


I will post more specific questions about my issues when I have a chance 
to re-investigate that (hopefully later today).


I have *not* found specific Java code examples using Solr yet, but I 
haven't exhausted exploring the Solr website yet.  Hopefully, I'll find 
some examples using Solr in Java code...


Mark

On 9/2/2015 9:51 PM, Chris Hostetter wrote:

: various $HOME/solr-5.3.0 subdirectories.  The documents/tutorials say to edit
: the solrconfig.xml file for various configuration details, but they never say
: which one of these dozen to edit.  Moreover, I cannot determine which version

can you please give us specific examples (ie: urls, page numbers &
version of the ref guide, etc...) of documentation that tells you to edit
the solrconfig.xml w/o being explicit about where to find it so that we
can fix the docs?

FWIW: The official "Quick Start" tutorial does not mention editing
solrconfig.xml at all...

http://lucene.apache.org/solr/quickstart.html



-Hoss
http://www.lucidworks.com/





Config error mystery

2015-09-04 Thread Mark Fenbers

Greetings,

I'm moving on from the tutorials and trying to setup an index for my own 
data (from a database).  All I did was add the following to the 
solrconfig.xml (taken verbatim from the example in Solr documentation, 
except for the name="config" pathname) and I get an error in the 
web-based UI.


  class="org.apache.solr.handler.dataimport.DataImportHandler" >


/localapps/dev/EventLog/data-config.xml

  

Because of this error, no /dataimport page is available in the Admin 
user interface; therefore, I cannot visit the page 
http://localhost:8983/solr/dataimport.  The actual error is:


org.apache.solr.common.SolrException: Error Instantiating 
requestHandler, org.apache.solr.handler.dataimport.DataImportHandler 
failed to instantiate org.apache.solr.request.SolrRequestHandler

at org.apache.solr.core.SolrCore.<init>(SolrCore.java:820)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:659)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:727)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:447)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:438)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:210)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.common.SolrException: Error Instantiating 
requestHandler, org.apache.solr.handler.dataimport.DataImportHandler 
failed to instantiate org.apache.solr.request.SolrRequestHandler

at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:588)
at org.apache.solr.core.PluginBag.createPlugin(PluginBag.java:122)
at org.apache.solr.core.PluginBag.init(PluginBag.java:217)
at 
org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:130)

at org.apache.solr.core.SolrCore.<init>(SolrCore.java:773)
... 9 more
Caused by: java.lang.ClassCastException: class 
org.apache.solr.handler.dataimport.DataImportHandler

at java.lang.Class.asSubclass(Class.java:3208)
at 
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:475)
at 
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:422)

at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:567)
... 13 more


If I remove the <requestHandler> section and restart Solr, the error 
goes away.  As best I can tell, the contents of

/localapps/dev/EventLog/data-config.xml look fine, too.  See it here 
(tags stripped by the archive; skeleton restored, with "..." where the 
original text did not survive):

<dataConfig>
  <dataSource ... url="jdbc:postgresql://dx1f/OHRFC" user="awips" />
  <document>
    <entity ... 
        deltaQuery="SELECT posttime FROM eventlogtext WHERE 
                    lastmodtime > '${dataimporter.last_index_time}'">
      ...
    </entity>
  </document>
</dataConfig>
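[A usage note on the delta fields above: .../dataimport?command=full-import 
rebuilds the whole index, while command=delta-import re-indexes only the 
rows the deltaQuery flags as changed.  A delta run normally also needs a 
matching deltaImportQuery that fetches each flagged row by the key the 
deltaQuery returned, e.g. (hypothetical, assuming posttime is the key):

  deltaImportQuery="SELECT * FROM eventlogtext
                    WHERE posttime = '${dataimporter.delta.posttime}'"]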

It seems to me that this problem could be a classpath issue, but I 
copied the appropriate jar file into the solr/lib directory to be sure.  
This made the (slightly different) initial error go away, but now I 
cannot make this one go away.


Any ideas?

Mark





which solrconfig.xml

2015-09-02 Thread Mark Fenbers
Hi,  I've been fiddling with Solr for two whole days since 
downloading/unzipping it.  I've learned a lot by reading 4 documents and 
the web site.  However, there are a dozen or so instances of 
solrconfig.xml in various $HOME/solr-5.3.0 subdirectories.  The 
documents/tutorials say to edit the solrconfig.xml file for various 
configuration details, but they never say which one of these dozen to 
edit.  Moreover, I cannot determine which version is being used once I 
start solr, so that I would know which instance of this file to 
edit/customize.


Can you help??

Thanks!
Mark
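
[A pointer grounded in the "ghostly config issues" thread above: solr.log 
answers this directly.  On startup each core logs a CoreDescriptor line 
-- e.g. instanceDir=/localapps/dev/EventLog/solr/conf, 
config=solrconfig.xml -- and the solrconfig.xml under that instanceDir 
is the one that core actually uses.]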