RE: Search for ISBN-like identifiers

2017-01-17 Thread Moenieb Davids
Hi Guys

Just a quick question on search, which is not related to this post:

I have a few cores based on a mainframe extract, one core per extracted file,
each resembling a "DB table". The cores are all linked via one-to-many fields,
with a structure similar to a normal ERD.

Is it possible to return the result of a query that joins, let's say, 3 cores in
the following format:

{
  "core1_id":"XXX",
  "_childDocuments_":[
    {
      "core2_id":"yyy",
      "core_2_fieldx":"ABC",
      "_childDocuments_":[
        {
          "core3_id":"zzz",
          "core_3_fieldx":"ABC",
          "core3_fieldy":"123"
        }
      ],
      "core2_fieldy":"123"
    }
  ]
}

Regards
Moenieb Davids


Re: Search for ISBN-like identifiers

2017-01-05 Thread Josh Lincoln
Sebastian,
You may want to try adding autoGeneratePhraseQueries="true" to the fieldType.
With that setting, a query for 978-3-8052-5094-8 behaves just like the phrase
query "978 3 8052 5094 8" (with the quotes).

A few notes about autoGeneratePhraseQueries:
a) It used to be true by default, but that was changed several years ago.
b) It does NOT require a reindex, so it is very easy to test.
c) It is apparently not recommended for languages that are not
whitespace-delimited (CJK, etc.), but maybe that's not an issue in your use case.
d) I'm unsure how it will affect wildcard queries on that field, e.g. will
978-3-8052* match 978-3-8052-5094-8? At the very least, partial ISBNs (e.g.
978-3-8052) would match the full ISBN without needing the wildcard. I'm just
not sure what happens if the user includes the wildcard.
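To make the difference concrete, here is a toy model in Java. It is not Solr's query parser: it assumes the analyzer simply splits on non-alphanumeric characters (roughly what StandardTokenizer does to this input) and prints strings shaped like debugQuery's parsedquery_toString. AutoPhraseDemo and its methods are made-up names for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not Solr code: how one hyphenated "word" is parsed with and
// without autoGeneratePhraseQueries. Assumption: analysis is simplified to
// splitting on non-alphanumerics.
public class AutoPhraseDemo {

    // Simplified stand-in for the field's analyzer.
    public static List<String> analyze(String input) {
        List<String> tokens = new ArrayList<>();
        for (String part : input.split("[^A-Za-z0-9]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase());
            }
        }
        return tokens;
    }

    // Returns a string shaped like debugQuery's parsedquery_toString.
    public static String parse(String field, String word, boolean autoGeneratePhraseQueries) {
        List<String> tokens = analyze(word);
        if (tokens.isEmpty()) {
            return "";
        }
        if (tokens.size() == 1) {
            return field + ":" + tokens.get(0);
        }
        if (autoGeneratePhraseQueries) {
            // One phrase query: all tokens, adjacent and in order.
            return field + ":\"" + String.join(" ", tokens) + "\"";
        }
        // One optional term clause per token (OR-ed with the default q.op).
        List<String> clauses = new ArrayList<>();
        for (String token : tokens) {
            clauses.add(field + ":" + token);
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        String isbn = "978-3-8052-5094-8";
        System.out.println(parse("text", isbn, false)); // text:978 text:3 text:8052 text:5094 text:8
        System.out.println(parse("text", isbn, true));  // text:"978 3 8052 5094 8"
    }
}
```

The false case reproduces the "text:978 text:3 text:8052 text:5094 text:8" parse from the original post, which explains why that query matched 72 documents; the phrase form only matches documents containing all five tokens in sequence.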

Josh


Re: Search for ISBN-like identifiers

2017-01-05 Thread Sebastian Riemer
Thank you very much for taking the time to help me!

I'll definitely have a look at the link you've posted.

@ShawnHeisey Thanks too for shedding light on the wildcard behaviour!

Allow me one further question:
- Assuming that I define a separate field for storing the ISBNs, using the
awesome analyzer provided by Mr. Bill Dueber: how do I get that field copied
into my general text field, which is used by my QuickSearch input? Won't that
field be processed again by the analyser defined on the text field?
- Should I alternatively add more fields to the q parameter? So far I have
always set q=text:<want_to_search>, but I guess one could try something like
q=text:<want_to_search>+isbnspeciallookupfield:<want_to_search>

I don't really know about that last idea though, since the searches are
probably OR-combined, which is not what I want.

A third option would be to decide in my application where to look in Solr,
of course. I.e. for anything matching a regex of only digits and hyphens with
length 13, don't query the field text; use the field isbnspeciallookupfield
instead.
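The third option could be sketched in Java roughly as below. This is a minimal sketch under stated assumptions: "length 13" is read as 13 digits once hyphens are removed, isbnspeciallookupfield is assumed to hold the hyphen-free form, and IsbnRouter is a hypothetical helper, not part of SolrJ. ISBN-10 values (which can end in X) are not handled.

```java
import java.util.regex.Pattern;

// Hypothetical helper: route ISBN-13-looking input to the exact-match field.
public class IsbnRouter {

    // Only digits and hyphens, starting and ending with a digit.
    private static final Pattern DIGITS_AND_HYPHENS = Pattern.compile("\\d[\\d-]*\\d");

    public static boolean looksLikeIsbn13(String input) {
        String trimmed = input.trim();
        if (!DIGITS_AND_HYPHENS.matcher(trimmed).matches()) {
            return false;
        }
        // Count digits only, so 978-3-8052-5094-8 and 9783805250948 both qualify.
        return trimmed.replace("-", "").length() == 13;
    }

    public static String buildQuery(String input) {
        String trimmed = input.trim();
        if (looksLikeIsbn13(trimmed)) {
            // Assumption: the lookup field stores the hyphen-free form.
            return "isbnspeciallookupfield:" + trimmed.replace("-", "");
        }
        // Everything else still goes to the catch-all field.
        // Real code should escape query syntax in user input here.
        return "text:" + trimmed;
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("978-3-8052-5094-8")); // isbnspeciallookupfield:9783805250948
        System.out.println(buildQuery("Goethe"));            // text:Goethe
    }
}
```

Because the routing happens before Solr sees the query, the text field's analyzer never splits the ISBN, which avoids the OR-combination concern.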


Many thanks again, and have a nice day!
Sebastian



Re: Search for ISBN-like identifiers

2017-01-05 Thread Erik Hatcher
Sebastian -

There’s some precedent out there for ISBNs. Bill Dueber and the
UMICH/code4lib folks have done amazing work; check it out here:

https://github.com/mlibrary/umich_solr_library_filters 


  - Erik





Re: Search for ISBN-like identifiers

2017-01-05 Thread Shawn Heisey
On 1/5/2017 3:08 AM, Sebastian Riemer wrote:
> I now face the problem, that searching for a book with
> text:978-3-8052-5094-8* does not return the single result I expect.
> However searching for text:9783805250948* instead returns a result.
> Note, that I am adding a wildcard at the end automatically, to further
> broaden the resultset. Note also, that it does not seem to matter
> whether I put backslashes in front of the hyphen or not (to be exact,
> when sending via SolrJ from my application, I put in the backslashes,
> but I don't see a difference when using SolrAdmin as I guess SolrAdmin
> automatically inserts backslashes if needed?) 

As soon as you use a wildcard, the query is no longer run through the
analysis chain, which means that it keeps all those hyphens.  That will
never match anything in the index, because the StandardTokenizer has
removed all the hyphens in the tokens that it puts into the index.  The
fact that wildcards skip analysis is a source of major confusion.  I
assume that the analysis skip is required for correct operation,
although I have never delved that deeply into the internals.

A hyphen is only a special character if it's the first character in a
word.  It's generally a good idea to escape the special characters
anyway, but in this case it doesn't matter, which is why you can send it
unescaped.
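For the escaping side, a hand-rolled escaper in Java along the lines of SolrJ's ClientUtils.escapeQueryChars is sketched below. The special-character set is my assumption for the classic query parser; check it against your Solr version, or simply call ClientUtils.escapeQueryChars if SolrJ is already on the classpath. QueryEscaper is a made-up name.

```java
// Hand-rolled escaper, similar in spirit to SolrJ's ClientUtils.escapeQueryChars.
// Assumption: this character set covers the classic query parser's specials.
public class QueryEscaper {

    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    public static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 2);
        for (char c : s.toCharArray()) {
            // Prefix each special character (and whitespace) with a backslash.
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String userInput = "978-3-8052-5094-8";
        // Escape the user's text first, then append the wildcard;
        // otherwise the '*' would become a literal asterisk.
        String q = "text:" + escape(userInput) + "*";
        System.out.println(q); // text:978\-3\-8052\-5094\-8*
    }
}
```

Escaping every hyphen is harmless here; what matters is the order of operations, since escaping after appending the wildcard would disable it.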

If you want to use wildcards, you're going to have to use them on an
untokenized (normally "string") field, or the results will probably not
be what you expect.
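A toy Java model of the two points above, under simplifying assumptions: analysis is reduced to splitting on non-alphanumerics, and a trailing-wildcard query is modelled as a raw, unanalyzed prefix comparison against the indexed terms. The field contents mirror the original post, where each ISBN is stored both with and without hyphens; WildcardDemo is a made-up name.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Toy model, not Lucene code: wildcard prefixes skip analysis.
public class WildcardDemo {

    // Tokenized field: each value is split into terms at index time.
    public static Set<String> indexTokenized(String... values) {
        Set<String> terms = new TreeSet<>();
        for (String value : values) {
            for (String part : value.split("[^A-Za-z0-9]+")) {
                if (!part.isEmpty()) {
                    terms.add(part.toLowerCase());
                }
            }
        }
        return terms;
    }

    // Untokenized (string) field: each value is indexed as one unchanged term.
    public static Set<String> indexString(String... values) {
        return new TreeSet<>(Arrays.asList(values));
    }

    // Trailing-wildcard query: the prefix is not analyzed, only compared.
    public static boolean prefixMatches(Set<String> terms, String prefix) {
        for (String term : terms) {
            if (term.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] values = {"978-3-8052-5094-8", "9783805250948"};
        Set<String> textTerms = indexTokenized(values);
        Set<String> stringTerms = indexString(values);
        System.out.println(prefixMatches(textTerms, "978-3-8052-5094-8")); // false
        System.out.println(prefixMatches(textTerms, "9783805250948"));     // true
        System.out.println(prefixMatches(stringTerms, "978-3-8052"));      // true
    }
}
```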

Thanks,
Shawn



Re: Search for ISBN-like identifiers

2017-01-05 Thread Erick Erickson
bq: How does the left side correlate with the right side?...

You've got it right: the left side shows index-time analysis and the right side shows query-time analysis.

bq: the contents I see In the column text represents the _stored_
value of the field text, right...

Correct

bq: ...are only the tokenized values stored for search

I'll be a bit pedantic here since "stored" is overloaded ;)...

The _indexed_ tokens, i.e. the tokens you search against, are all
that's searchable. For instance, let's say you have "running" in your
text and are stemming: "run" is all that gets into the searchable
portion of your index.

There's no really convenient way to find the tokens associated with a
doc; the inverted index structure doesn't lend itself well to
reconstructing a doc that way. Luke _can_ do this, but it's a lossy
process, as you'll see. It can also be quite lengthy.
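A toy Java model of the stored/indexed distinction (not Lucene's actual data structures): the stored value is kept verbatim per document, searching consults only a term-to-documents map, and "reconstructing" a document means scanning every term. For brevity it splits on punctuation instead of stemming, so the ISBN from this thread stands in for the running/run example; InvertedIndexDemo is a made-up name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model: stored values vs. an inverted index (no positions kept).
public class InvertedIndexDemo {

    // docId -> value exactly as sent (what the admin UI shows for a stored field)
    private final Map<Integer, String> stored = new HashMap<>();
    // term -> ids of the documents containing it (the searchable part)
    private final Map<String, TreeSet<Integer>> postings = new TreeMap<>();

    public void add(int docId, String value) {
        stored.put(docId, value);
        for (String token : value.toLowerCase().split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    public String storedValue(int docId) {
        return stored.get(docId);
    }

    // Searching only ever looks at the postings, never at the stored value.
    public TreeSet<Integer> search(String term) {
        return postings.getOrDefault(term, new TreeSet<>());
    }

    // Rebuilding a doc's tokens means visiting every term in the index; in this
    // model the result has lost the original order, punctuation and repeats.
    public List<String> reconstruct(int docId) {
        List<String> tokens = new ArrayList<>();
        for (Map.Entry<String, TreeSet<Integer>> entry : postings.entrySet()) {
            if (entry.getValue().contains(docId)) {
                tokens.add(entry.getKey());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        InvertedIndexDemo index = new InvertedIndexDemo();
        index.add(1, "978-3-8052-5094-8");
        System.out.println(index.storedValue(1));      // 978-3-8052-5094-8
        System.out.println(index.search("8052"));       // [1]
        System.out.println(index.search("978-3-8052")); // []
        System.out.println(index.reconstruct(1));       // [3, 5094, 8, 8052, 978]
    }
}
```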

bq: One more thing which confuses me:

Oh boy. All I can offer here is that it's less confusing than it was in
"the bad old days". Wildcards are tricky to handle. Here's a writeup:
https://lucidworks.com/blog/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/

The short form is that wildcards are handled "specially" and much of
the analysis chain will be skipped, it depends on the particular
class. Your trailing wildcard example makes sense to a human, but it
turns out to be hard to generalize.

Two possibilities for you to consider, especially since ISBNs are regular:
1> WordDelimiterFilterFactory is designed for this kind of thing. You
can do things like "catenateNumbers", so that both "978-3-8052-5094-8"
and "9783805250948" would be searchable.

2> Do the above yourself in the ETL process. Then just use a
multiValued String field.
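Option 2 might look like this in Java. This is a minimal sketch, assuming the ETL code sees each ISBN as entered and sends the returned list to a multiValued string field. IsbnVariants and its normalization rule (drop hyphens and whitespace, upper-case an ISBN-10 check digit x) are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical ETL helper: values to send to a multiValued string field.
public class IsbnVariants {

    public static List<String> variants(String isbnAsEntered) {
        // LinkedHashSet keeps insertion order and drops duplicates, so an
        // ISBN entered without hyphens yields a single value.
        Set<String> values = new LinkedHashSet<>();
        String original = isbnAsEntered.trim();
        values.add(original);
        // Hyphen-free form; for a valid ISBN, upper-casing only affects an
        // ISBN-10 check digit 'x'.
        values.add(original.replaceAll("[\\s-]", "").toUpperCase());
        return new ArrayList<>(values);
    }

    public static void main(String[] args) {
        System.out.println(variants("978-3-8052-5094-8")); // [978-3-8052-5094-8, 9783805250948]
        System.out.println(variants("9783805250948"));     // [9783805250948]
    }
}
```

Because a string field indexes each value as a single unchanged term, a lookup for either form matches exactly, and a trailing wildcard on that field is compared against the whole value.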

Best,
Erick


Search for ISBN-like identifiers

2017-01-05 Thread Sebastian Riemer
Hi folks,


TL;DR: Is there an easy way to copy ISBNs with hyphens to the general text
field, or rather to configure the analyser on that field, so that a search for
the hyphenated ISBN returns exactly the matching document?

Long version:
I've defined a field "text" of type "text_general" into which I copy all my
other fields, so that I can do a "quick search" by setting q=text

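The definition of the type text_general is as follows:

[fieldType XML lost in the archive; surviving attributes:
 positionIncrementGap="100"
 index analyzer: words="stopwords.txt"
 query analyzer: words="stopwords.txt"; synonyms="synonyms.txt" ignoreCase="true" expand="true"]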


I now face the problem that searching for a book with text:978-3-8052-5094-8*
does not return the single result I expect. However, searching for
text:9783805250948* does return a result. Note that I append a wildcard
automatically to broaden the result set. Note also that it does not seem to
matter whether I put backslashes in front of the hyphens or not (to be exact:
when sending via SolrJ from my application I do add the backslashes, but I
don't see a difference when using the Solr Admin UI, as I guess the Admin UI
automatically inserts backslashes if needed?)

When storing ISBNs, I store them twice, once with hyphens
(978-3-8052-5094-8) and once without (9783805250948). A plain phrase search for
either value also returns the single document.

I learned that the StandardTokenizer splits up field values at index time,
and I've also learned that I can use the Solr Admin analysis screen and
debugQuery to help understand what is going on. On the analysis screen I see
that the value 9783805250948 at index time and 9783805250948* at query time
both end up as the unchanged value 9783805250948.
When I enter 978-3-8052-5094-8 for "Field Value (Index)" and
978-3-8052-5094-8* for "Field Value (Query)", I can see how the ISBN is
tokenized into 5 parts. Again, the values match on both sides (Index and Query).

How does the left side correlate with the right side? My guess: the left side
means "values stored in the field text will be tokenized at index time as shown
here", and the right side means "when querying the field text, I'll tokenize
the entered value like this and see whether I find it in the index". Is this
correct?

Another question: when querying and inspecting the single document in the
Solr Admin UI, the contents I see in the column text represent the _stored_
value of the field text, right?
And am I correct that this actually has nothing to do with what is actually
stored in the index for searching?

When storing the value 978-3-8052-5094-8, are only the tokenized values stored
for search, or is the "whole word" also stored? Is there a way to actually see
all the values that are stored for search?
When searching for text:"978-3-8052-5094-8" I get the single result, so I guess
the value as a whole must also be stored in the index for searching?

One more thing which confuses me:
Searching for text: 978-3-8052-5094-8 gives me 72 results, because it leads to 
searching for "parsedquery_toString":"text:978 text:3 text:8052 text:5094 
text:8",
but searching for text: 978-3-8052-5094-8* gives me 0 results, because it leads to
"parsedquery_toString":"text:978-3-8052-5094-8*",

Why does the appended wildcard change the behaviour so radically? I would rather
have expected something like "parsedquery_toString":"text:978 text:3 text:8052
text:5094 text:8*", and thus even more results.

By the way, I've found and read an interesting blog post about storing ISBNs
and similar identifiers here:
http://robotlibrarian.billdueber.com/2012/03/solr-field-type-for-numericish-ids/
However, I already store my ISBN in a separate field of type string as well,
which works fine when I use that field for searching.

Best regards, sorry for the enormously long question and thank you for 
listening.

Sebastian