RE: Configure query parser to handle field name case-insensitive

2017-05-15 Thread Duck Geraint (ext) GBJH
As you're using the extended dismax parser, it has an option to include per 
field aliasing:
https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser

You could include a per-field alias for this in your Solr request handler config,
which would direct a query on ID:1 to instead search id:1.
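
As a sketch of that aliasing (the handler name and surrounding defaults here are illustrative, not from the original message), edismax exposes per-field aliasing through f.&lt;alias&gt;.qf parameters:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- Alias the uppercase field name onto the real lowercase field -->
    <str name="f.ID.qf">id</str>
  </lst>
</requestHandler>
```

With something like this in place, a user query of ID:1 is resolved against the real id field.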

Geraint



-Original Message-
From: Peemöller, Björn [mailto:bjoern.peemoel...@berenberg.de]
Sent: 15 May 2017 14:17
To: 'solr-user@lucene.apache.org' 
Subject: RE: Configure query parser to handle field name case-insensitive

Hi Rick,

thank you for your reply! I really meant field *names*, since our values are 
already processed by a lower-case filter (both index and query). However, our 
users are confused because they can search for "id:1" but not for "ID:1". 
Furthermore, since we employ the EDisMax query parser, they do not even get an 
error message.

Therefore, I thought it may be sufficient to map all field names to lower case 
at the query level so that I do not have to introduce additional fields.

Regards,
Björn

-Original Message-
From: Rick Leir [mailto:rl...@leirtech.com]
Sent: Monday, 15 May 2017 13:48
To: solr-user@lucene.apache.org
Subject: Re: Configure query parser to handle field name case-insensitive

Björn
Field names or values? I assume values. Your analysis chain in schema.xml 
probably downcases characters; if not, that could be your problem.

Field _names_? Then you might have to copyField the field to a new field with 
the desired case. Avoid doing that if you can. Cheers -- Rick
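
If the copyField route were taken anyway, a minimal schema.xml sketch (the field name and type here are illustrative) would be:

```xml
<!-- Uppercase twin of the lowercase field; duplicates index data, so avoid if you can -->
<field name="ID" type="string" indexed="true" stored="false"/>
<copyField source="id" dest="ID"/>
```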

On May 15, 2017 5:48:09 AM EDT, "Peemöller, Björn" 
 wrote:
>Hi all,
>
>I'm fairly new at using Solr and I need to configure our instance to
>accept field names in both uppercase and lowercase (they are defined as
>lowercase in our configuration). Is there a simple way to achieve this?
>
>Thanks in advance,
>Björn
>
>Björn Peemöller
>IT & IT Operations
>
>BERENBERG
>Joh. Berenberg, Gossler & Co. KG
>Neuer Jungfernstieg 20
>20354 Hamburg
>
>Telefon +49 40 350 60-8548
>Telefax +49 40 350 60-900
>E-Mail
>bjoern.peemoel...@berenberg.de
>www.berenberg.de
>
>Sitz: Hamburg - Amtsgericht Hamburg HRA 42659
>
>
>Diese Nachricht einschliesslich etwa beigefuegter Anhaenge ist
>vertraulich und kann dem Bank- und Datengeheimnis unterliegen oder
>sonst rechtlich geschuetzte Daten und Informationen enthalten. Wenn Sie
>nicht der richtige Adressat sind oder diese Nachricht irrtuemlich
>erhalten haben, informieren Sie bitte sofort den Absender über die
>Antwortfunktion. Anschliessend moechten Sie bitte diese Nachricht
>einschliesslich etwa beigefuegter Anhaenge unverzueglich vollstaendig
>loeschen. Das unerlaubte Kopieren oder Speichern dieser Nachricht
>und/oder der ihr etwa beigefuegten Anhaenge sowie die unbefugte
>Weitergabe der darin enthaltenen Daten und Informationen sind nicht
>gestattet. Wir weisen darauf hin, dass rechtsverbindliche Erklaerungen
>namens unseres Hauses grundsaetzlich der Unterschriften zweier
>ausreichend bevollmaechtigter Vertreter unseres Hauses beduerfen. Wir
>verschicken daher keine rechtsverbindlichen Erklaerungen per E-Mail an
>Dritte. Demgemaess nehmen wir per E-Mail auch keine rechtsverbindlichen
>Erklaerungen oder Auftraege von Dritten entgegen.
>Sollten Sie Schwierigkeiten beim Oeffnen dieser E-Mail haben, wenden
>Sie sich bitte an den Absender oder an i...@berenberg.de. Please refer
>to http://www.berenberg.de/my_berenberg/disclaimer_e.html for our
>confidentiality notice.

--
Sorry for being brief. Alternate email is rickleir at yahoo dot com



RE: Wrong highlighting in stripped HTML field

2016-09-08 Thread Duck Geraint (ext) GBJH
As far as I can tell, that is how it's currently set up (it does the same on 
mine, at least). The HTML stripper seems to exclude the pre tag but include the 
post tag when it generates the start and end offsets of each text token. I 
couldn't say why, though... (it may just avoid the need to backtrack).

Play around in the analysis section of the admin ui to verify this.

Geraint


-Original Message-
From: Neumann, Dennis [mailto:neum...@sub.uni-goettingen.de]
Sent: 07 September 2016 18:16
To: solr-user@lucene.apache.org
Subject: RE: Wrong highlighting in stripped HTML field

Hello,
can anyone confirm this behavior of the highlighter? Otherwise my Solr 
installation might be misconfigured or something.
Or does anyone know if this is a known issue? In that case I probably should 
ask on the dev mailing list.

Thanks and cheers,
Dennis



From: Neumann, Dennis [neum...@sub.uni-goettingen.de]
Sent: Monday, 5 September 2016 18:00
To: solr-user@lucene.apache.org
Subject: Wrong highlighting in stripped HTML field

Hi guys

I am having a problem with the standard highlighter. I'm working with Solr 
5.4.1. The problem appears in my project, but it is easy to replicate:

I create a new core with the conf directory from configsets/basic_configs, so 
everything is set to defaults. I add the following in schema.xml:

(The schema.xml snippet was stripped by the mailing list archive; from the 
surrounding discussion, it defined a "testfield" field whose field type applies 
an HTML-stripping char filter, such as HTMLStripCharFilterFactory, before the 
tokenizer.)
Now I add this document (in the admin interface):

{"id":"1","testfield":"bla"}

I search for: testfield:bla
with hl=on&hl.fl=testfield

What I get is a response with an incorrectly formatted HTML snippet:


{
  "response": {
    "numFound": 1,
    "start": 0,
    "docs": [
      {
        "id": "1",
        "testfield": "bla",
        "_version_": 1544645963570741200
      }
    ]
  },
  "highlighting": {
    "1": {
      "testfield": [
        "bla"
      ]
    }
  }
}

Is there a way to tell the highlighter to just enclose the "bla"? I.e., I want 
to get

bla


Best regards
Dennis





Syngenta Limited, Registered in England No 2710846; Registered Office : 
Syngenta, Jealott's Hill International Research Centre, Bracknell, Berkshire, 
RG42 6EY, United Kingdom

 This message may contain confidential information. If you are not the 
designated recipient, please notify the sender immediately, and delete the 
original and any copies. Any use of the message by you is prohibited.


RE: Recursively scan documents for indexing in a folder in SolrJ

2015-10-19 Thread Duck Geraint (ext) GBJH
"The problem with this is that it indexes all the files regardless of format, 
instead of just the formats post.jar handles. So I guess I still have to 
"steal" some code from there to detect the file format?"

If you've not worked it out yourself yet, try something like:
http://docs.oracle.com/javase/7/docs/api/java/io/File.html#listFiles(java.io.FilenameFilter)
http://stackoverflow.com/questions/5751335/using-file-listfiles-with-filenameextensionfilter
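
Building on those links, a sketch of such a filter (the accepted extensions below are an illustrative subset of what post.jar handles, not the authoritative list):

```java
import java.io.File;
import java.io.FilenameFilter;

public class ExtensionFilterDemo {
    // Accept only a chosen set of file extensions, case-insensitively.
    static final FilenameFilter FILTER = (dir, name) -> {
        String lower = name.toLowerCase();
        return lower.endsWith(".pdf") || lower.endsWith(".doc")
            || lower.endsWith(".xml") || lower.endsWith(".json")
            || lower.endsWith(".csv") || lower.endsWith(".txt");
    };

    public static void main(String[] args) {
        // listFiles(FilenameFilter) returns only matching entries, or null on I/O error.
        File dir = new File(args.length > 0 ? args[0] : ".");
        File[] matches = dir.listFiles(FILTER);
        if (matches != null) {
            for (File f : matches) {
                System.out.println("File: " + f.getName());
            }
        }
    }
}
```

The same filter can be passed to listFiles() inside the recursive traversal so only the wanted formats are indexed.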

Geraint

Geraint Duck
Data Scientist
Toxicology and Health Sciences
Syngenta UK
Email: geraint.d...@syngenta.com

-Original Message-
From: Zheng Lin Edwin Yeo [mailto:edwinye...@gmail.com]
Sent: 17 October 2015 00:55
To: solr-user@lucene.apache.org
Subject: Re: Recursively scan documents for indexing in a folder in SolrJ

Thanks for your advice. I also found this method which so far has been able to 
traverse all the documents in the folder and index them in Solr.

public static void showFiles(File[] files) {
    for (File file : files) {
        if (file.isDirectory()) {
            System.out.println("Directory: " + file.getName());
            // Recurse into the subdirectory; listFiles() can return null on I/O error.
            File[] children = file.listFiles();
            if (children != null) {
                showFiles(children);
            }
        } else {
            System.out.println("File: " + file.getName());
        }
    }
}

The problem with this is that it indexes all the files regardless of format, 
instead of just the formats post.jar handles. So I guess I still have to 
"steal" some code from there to detect the file format?

As for files that contain non-English characters (e.g. Chinese characters), it 
currently cannot read them; they all come out as a series of "???". Any idea 
how to solve this problem?
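
One common cause of "???" output is decoding file bytes with the platform default charset rather than the file's actual encoding. A minimal sketch of reading explicitly as UTF-8 (assuming the source files are UTF-8; the class and method names are mine, not from SolrJ):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8ReadDemo {
    // Decode file bytes explicitly as UTF-8 instead of the platform default charset.
    static String readUtf8(Path path) throws IOException {
        byte[] bytes = Files.readAllBytes(path);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Round-trip a non-ASCII string through a temp file to show the decoding is lossless.
        Path tmp = Files.createTempFile("demo", ".txt");
        Files.write(tmp, "héllo wörld".getBytes(StandardCharsets.UTF_8));
        System.out.println(readUtf8(tmp));
        Files.delete(tmp);
    }
}
```

The same applies on the output side: whatever console or index viewer displays the text must also handle UTF-8, or correctly decoded characters will still render as "?".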

Thank you.

Regards,
Edwin


On 16 October 2015 at 21:16, Duck Geraint (ext) GBJH < 
geraint.d...@syngenta.com> wrote:

> Also, check this link for SolrJ example code (including the recursion):
> https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/
>
> Geraint
>
>
> Geraint Duck
> Data Scientist
> Toxicology and Health Sciences
> Syngenta UK
> Email: geraint.d...@syngenta.com
>
> -Original Message-
> From: Jan Høydahl [mailto:jan@cominvent.com]
> Sent: 16 October 2015 12:14
> To: solr-user@lucene.apache.org
> Subject: Re: Recursively scan documents for indexing in a folder in
> SolrJ
>
> SolrJ does not have any file crawler built in.
> But you are free to steal code from SimplePostTool.java related to
> directory traversal, and then index each document found using SolrJ.
>
> Note that SimplePostTool.java tries to be smart with what endpoint to
> post files to, xml, csv and json content will be posted to /update
> while office docs go to /update/extract
>
> --
> Jan Høydahl, search solution architect Cominvent AS -
> www.cominvent.com
>
> > 16. okt. 2015 kl. 05.22 skrev Zheng Lin Edwin Yeo
> ><edwinye...@gmail.com
> >:
> >
> > Hi,
> >
> > I understand that in SimplePostTool (post.jar), there is this
> > command to automatically detect content types in a folder, and
> > recursively scan it for documents for indexing into a collection:
> > bin/post -c gettingstarted afolder/
> >
> > This has been useful for me to do mass indexing of all the files
> > that are in the folder. Now that I'm moving to production and plan
> > to use SolrJ to do the indexing, as it can do more things like
> > robustness checks and retries for indexes that fail.
> >
> > However, I can't seem to find a way to do the same in SolrJ. Is it
> > possible to do this in SolrJ? I'm using Solr 5.3.0.
> >
> > Thank you.
> >
> > Regards,
> > Edwin
>
>
> 
>
>
> Syngenta Limited, Registered in England No 2710846;Registered Office :
> Syngenta Limited, European Regional Centre, Priestley Road, Surrey
> Research Park, Guildford, Surrey, GU2 7YH, United Kingdom
>   This message may contain
> confidential information. If you are not the designated recipient,
> please notify the sender immediately, and delete the original and any
> copies. Any use of the message by you is prohibited.
>





RE: File-based Spelling

2015-10-19 Thread Duck Geraint (ext) GBJH
"Yet, it claimed it found my misspelled word to be "fenber" without the "s""
I wonder if this is because you seem to be applying a stemmer to your 
dictionary words.

Try removing the "text_en" line from 
your spellcheck search component definition.

Geraint


Geraint Duck
Data Scientist
Toxicology and Health Sciences
Syngenta UK
Email: geraint.d...@syngenta.com


-Original Message-
From: Mark Fenbers [mailto:mark.fenb...@noaa.gov]
Sent: 16 October 2015 19:43
To: solr-user@lucene.apache.org
Subject: Re: File-based Spelling

On 10/13/2015 9:30 AM, Dyer, James wrote:
> Mark,
>
> The older spellcheck implementations create an n-gram sidecar index, which is 
> why you're seeing your name split into 2-grams like this.  See the IR Book by 
> Manning et al, section 3.3.4 for more information.  Based on the results 
> you're getting, I think it is loading your file correctly.  You should now 
> try a query against this spelling index, using words *not* in the file you 
> loaded that are within 1 or 2 edits from something that is in the dictionary. 
>  If it doesn't yield suggestions, then post the relevant sections of the 
> solrconfig.xml, schema.xml and also the query string you are trying.
>
> James Dyer
> Ingram Content Group
>
James, I've already done this.   My query string was "fenbers". This is
my last name which does *not* occur in the linux.words file.  It is only
1 edit distance from "fenders" which *is* in the linux.words file.  Yet, it 
claimed it found my misspelled word to be "fenber" without the "s"
and it gave me these 8 suggestions:
f en be r
f e nb er
f en b er
f e n be r
f en b e r
f e nb e r
f e n b er
f e n b e r

So I'm attaching the entire solrconfig.xml and schema.xml that are in effect. 
These are in a single file with all the block comments removed.

I'm also puzzled that you say "older implementations create a sidecar index"... 
because I am using v5.3.0, which was the latest version as of my download a 
month or two ago.  So, with my implementation being recent, why is an n-gram 
sidecar index still (seemingly) being produced?

thanks for the help!
Mark








RE: Recursively scan documents for indexing in a folder in SolrJ

2015-10-16 Thread Duck Geraint (ext) GBJH
Also, check this link for SolrJ example code (including the recursion):
https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/

Geraint


Geraint Duck
Data Scientist
Toxicology and Health Sciences
Syngenta UK
Email: geraint.d...@syngenta.com

-Original Message-
From: Jan Høydahl [mailto:jan@cominvent.com]
Sent: 16 October 2015 12:14
To: solr-user@lucene.apache.org
Subject: Re: Recursively scan documents for indexing in a folder in SolrJ

SolrJ does not have any file crawler built in.
But you are free to steal code from SimplePostTool.java related to directory 
traversal, and then index each document found using SolrJ.

Note that SimplePostTool.java tries to be smart with what endpoint to post 
files to, xml, csv and json content will be posted to /update while office docs 
go to /update/extract

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 16. okt. 2015 kl. 05.22 skrev Zheng Lin Edwin Yeo :
>
> Hi,
>
> I understand that in SimplePostTool (post.jar), there is this command
> to automatically detect content types in a folder, and recursively
> scan it for documents for indexing into a collection:
> bin/post -c gettingstarted afolder/
>
> This has been useful for me to do mass indexing of all the files that
> are in the folder. Now that I'm moving to production and plan to use
> SolrJ to do the indexing, as it can do more things like robustness
> checks and retries for indexes that fail.
>
> However, I can't seem to find a way to do the same in SolrJ. Is it
> possible to do this in SolrJ? I'm using Solr 5.3.0.
>
> Thank you.
>
> Regards,
> Edwin







RE: NullPointerException

2015-10-13 Thread Duck Geraint (ext) GBJH
How odd, though I'm afraid this is reaching the limit of my knowledge at this 
point (and I still can't find where that box is within the Admin UI!).

The only thing I'd say is to check that "logtext" is a defined named field 
within your schema, and to double-check how its field type is defined.

Also, try without the "text_en" definition (I believe this should be implicit 
as the field type of "logtext" above).

Geraint

Geraint Duck
Data Scientist
Toxicology and Health Sciences
Syngenta UK
Email: geraint.d...@syngenta.com


-Original Message-
From: Mark Fenbers [mailto:mark.fenb...@noaa.gov]
Sent: 12 October 2015 12:14
To: solr-user@lucene.apache.org
Subject: Re: NullPointerException

On 10/12/2015 5:38 AM, Duck Geraint (ext) GBJH wrote:
> "When I use the Admin UI (v5.3.0), and check the spellcheck.build box"
> Out of interest, where is this option within the Admin UI? I can't find 
> anything like it in mine...
This is in the expanded options that open up once I put a checkmark in the 
"spellcheck" box.
> Do you get the same issue by submitting the build command directly with 
> something like this instead:
> http://localhost:8983/solr//ELspell?spellcheck.build=true
> ?
Yes, I do.
> It'll be reasonably obvious if the dictionary has actually built or not by 
> the file size of your speller store:
> /localapps/dev/EventLog/solr/EventLog2/data/spFile
>
>
> Otherwise, (temporarily) try adding...
> true
> ...to your spellchecker search component config, you might find it'll log a 
> more useful error message that way.
Interesting!  The index builds successfully using this method and I get no 
stacktrace error.  Hurray!  But why??

So now, I tried running a query, so I typed Fenbers into the spellcheck.q box, 
and I get the following 9 suggestions:
fenber
f en be r
f e nb er
f en b er
f e n be r
f en b e r
f e nb e r
f e n b er
f e n b e r

I find this very odd because I commented out all references to the wordbreak 
checker in solrconfig.xml.  What do I configure so that Solr will give me 
sensible suggestions like:
   fenders
   embers
   fenberry
and so on?

Mark






RE: NullPointerException

2015-10-12 Thread Duck Geraint (ext) GBJH
"When I use the Admin UI (v5.3.0), and check the spellcheck.build box"
Out of interest, where is this option within the Admin UI? I can't find 
anything like it in mine...

Do you get the same issue by submitting the build command directly with 
something like this instead:
http://localhost:8983/solr//ELspell?spellcheck.build=true
?

It'll be reasonably obvious if the dictionary has actually built or not by the 
file size of your speller store:
/localapps/dev/EventLog/solr/EventLog2/data/spFile


Otherwise, (temporarily) try adding...
true
...to your spellchecker search component config, you might find it'll log a 
more useful error message that way.

Geraint

Geraint Duck
Data Scientist
Toxicology and Health Sciences
Syngenta UK
Email: geraint.d...@syngenta.com

-Original Message-
From: Mark Fenbers [mailto:mark.fenb...@noaa.gov]
Sent: 10 October 2015 20:03
To: solr User Group
Subject: NullPointerException

Greetings!

I'm new to Solr Spellchecking...  I have yet to get it to work.

Attached is a snippet from my solrconfig.xml pertaining to my spellcheck 
efforts.

When I use the Admin UI (v5.3.0), and check the spellcheck.build box, I get a 
NullPointerException stacktrace.  The actual stacktrace is at the bottom of the 
attachment.  My spellcheck.q is the following:
Solr will yuse suggestions frum both.

The FileBasedSpellChecker.build method is clearly the problem (determined from 
the stack trace), but I cannot figure out why.

Maybe I don't need to do a build on it...(?)  If I don't, the spell-checker 
finds no misspelled words. Yet, "yuse" and "frum" are not stand-alone words in 
/usr/share/dict/words.

/usr/share/dict/words exists and has global read permissions.  I displayed the 
file and see no issues (i.e., one word per line) although some "words" are a 
string of digits, but that shouldn't matter.

Does my snippet give any clues about why I would get this error? Is my stripped 
down configuration missing something, perhaps?

Mark
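
For comparison, a typical file-based spellchecker component (the names and paths here are illustrative, not Mark's actual config) is defined along these lines in solrconfig.xml:

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">file</str>
    <str name="classname">solr.FileBasedSpellChecker</str>
    <str name="sourceLocation">/usr/share/dict/words</str>
    <str name="characterEncoding">UTF-8</str>
    <str name="spellcheckIndexDir">./spellcheckerFile</str>
  </lst>
</searchComponent>
```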






RE: New Project setup too clunky

2015-09-28 Thread Duck Geraint (ext) GBJH
Huh, strange - I didn't even notice that you could create cores through the UI. 
I suppose it depends what order you read and infer from the documentation.

See "Create a Core":
https://cwiki.apache.org/confluence/display/solr/Running+Solr

I followed the "solr create -help" option to work out how to create a 
non-datadriven core (i.e. solr create_core), but I suppose this could be a 
little more explicit.

Geraint


Geraint Duck
Data Scientist
Toxicology and Health Sciences
Syngenta UK
Email: geraint.d...@syngenta.com

-Original Message-
From: Mark Fenbers [mailto:mark.fenb...@noaa.gov]
Sent: 27 September 2015 19:07
To: solr-user@lucene.apache.org
Subject: Re: New Project setup too clunky

On 9/27/2015 12:49 PM, Alexandre Rafalovitch wrote:
> Mark,
>
> Thank you for your valuable feedback. The newbie's views are always 
> appreciated.
>
> The Admin UI command is designed for creating a collection based on
> the configuration you already have. Obviously, it makes that point
> somewhat less than obvious.
>
> To create a new collection with configuration files all in place, you
> can bootstrap it from a configset. Which is basically what you did
> when you run "solr -e", except "-e" also populates the files and does
> other tricks.
>
> So, if you go back to the command line and run "solr" you will see a
> bunch of options. The one you are looking for is "solr create_core"
> which will tell you all the parameters as well as the available
> configurations to bootstrap from.
>
> I hope this helps.

Yes!  It does help!  But it took a post and a response on the user-forum for me 
to learn this!  Rather, it should be added to the "Solr Quick Start" document.
Mark







RE: query parsing

2015-09-24 Thread Duck Geraint (ext) GBJH
Okay, so maybe I'm missing something here (I'm still relatively new to Solr 
myself), but am I right in thinking the following is still in your 
solrconfig.xml file:

  <schemaFactory class="ManagedIndexSchemaFactory">
    <bool name="mutable">true</bool>
    <str name="managedSchemaResourceName">managed-schema</str>
  </schemaFactory>

If so, wouldn't using a managed schema make several of your field definitions 
inside the schema.xml file semi-redundant?

Regards,
Geraint


Geraint Duck
Data Scientist
Toxicology and Health Sciences
Syngenta UK
Email: geraint.d...@syngenta.com


-Original Message-
From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com]
Sent: 24 September 2015 09:23
To: solr-user@lucene.apache.org
Subject: Re: query parsing

I would focus on this :

"

> 5> now kick off the DIH job and look again.
>
Now it shows a histogram, but most of the "terms" are long -- the full texts of 
(the table.column) eventlogtext.logtext, including the whitespace (with %0A 
used for newline characters)...  So, it appears it is not being tokenized 
properly, correct?"
Can you open the schema.xml from your Solr UI and show us the snippets for the 
field that does not seem to tokenise?
Can you show us (even a screenshot is fine) the related schema browser page?
Could it be an encoding problem?
Following Erick's details about the analysis, what are your results?

Cheers

2015-09-24 8:04 GMT+01:00 Upayavira :

> Typically, the index dir is inside the data dir. Delete the index dir
> and you should be good. If there is a tlog next to it, you might want
> to delete that also.
>
> If you don't have a data dir, I wonder whether you set the data dir
> when creating your core or collection. Typically the instance dir and
> data dir aren't needed.
>
> Upayavira
>
> On Wed, Sep 23, 2015, at 10:46 PM, Erick Erickson wrote:
> > OK, this is bizarre. You'd have had to set up SolrCloud by
> > specifying the -zkRun command when you start Solr or the -zkHost;
> > highly unlikely. On the admin page there would be a "cloud" link on
> > the left side, I really doubt one's there.
> >
> > You should have a data directory; it should be the parent of the
> > index and tlog directories. As a sanity check, try looking at the
> > analysis page. Type a bunch of words into the left-hand indexing
> > box and uncheck the verbose box. As you can tell, I'm grasping at
> > straws. I'm still puzzled why you don't have a "data" directory
> > here, but that shouldn't really matter. How did you create this
> > index? I don't mean the data import handler; rather, how did you
> > create the core that you're indexing to?
> >
> > Best,
> > Erick
> >
> > On Wed, Sep 23, 2015 at 10:16 AM, Mark Fenbers
> > 
> > wrote:
> >
> > > On 9/23/2015 12:30 PM, Erick Erickson wrote:
> > >
> > >> Then my next guess is you're not pointing at the index you think
> > >> you
> are
> > >> when you 'rm -rf data'
> > >>
> > >> Just ignore the Elall field for now I should think, although get
> > >> rid
> of it
> > >> if you don't think you need it.
> > >>
> > >> DIH should be irrelevant here.
> > >>
> > >> So let's back up.
> > >> 1> go ahead and "rm -fr data" (with Solr stopped).
> > >>
> > > I have no "data" dir.  Did you mean "index" dir?  I removed 3
> > > index directories (2 for spelling):
> > > cd /localapps/dev/eventLog; rm -rfv index solr/spFile solr/spIndex
> > >
> > >> 2> start Solr
> > >> 3> do NOT re-index.
> > >> 4> look at your index via the schema-browser. Of course there
> > >> 4> should
> be
> > >> nothing there!
> > >>
> > > Correct!  It said "there is no term info :("
> > >
> > >> 5> now kick off the DIH job and look again.
> > >>
> > > Now it shows a histogram, but most of the "terms" are long -- the
> > > full texts of (the table.column) eventlogtext.logtext, including
> > > the
> whitespace
> > > (with %0A used for newline characters)...  So, it appears it is
> > > not
> being
> > > tokenized properly, correct?
> > >
> > >> Your logtext field should have only single tokens. The fact that
> > >> you
> have
> > >> some very
> > >> long tokens presumably with whitespace) indicates that you aren't
> really
> > >> blowing
> > >> the index away between indexing.
> > >>
> > > Well, I did this time for sure.  I verified that initially,
> > > because it showed there was no term info until I DIH'd again.
> > >
> > >> Are you perhaps in Solr Cloud with more than one replica?
> > >>
> > > Not that I know of, but being new to Solr, there could be things
> > > going
> on
> > > that I'm not aware of.  How can I tell?  I certainly didn't set
> anything up
> > > for solrCloud deliberately.
> > >
> > >> In that case you
> > >> might be getting the index replicated on startup assuming you
> > >> didn't blow away all replicas. If you are in SolrCloud, I'd just
> > >> delete the collection and start over, after insuring that you'd
> > >> pushed the configset up to Zookeeper.
> > >>
> > >> BTW, I always look at the schema.xml file from the Solr admin
> > >> window
> just
> > >> as
> > >> a sanity check in these situations.
> > >>
> > > Good idea!  But