
Re: Solr 5.0 - uniqueKey case insensitive ?

2015-05-06 Thread Bruno Mannina


Yes, thanks, it makes sense to me now too.

Daniel, my pn is always in uppercase and I always index it in uppercase.
The problem (solved now after all your answers, thanks) was on the query side:
if users send a request in lowercase, Solr returns no results, which was not good.

But now the problem is solved. In my source file I renamed the pn field to id,
and in my schema I use a copy field named pn, and it works perfectly.

Thanks a lot!
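
The arrangement described above (an exact-match id as the uniqueKey, plus a lowercased copy named pn used only for searching) can be modelled in a few lines of Python. This is only an in-memory sketch of the behaviour, not Solr code: the TinyIndex class, its methods, and the sample values are all hypothetical.

```python
class TinyIndex:
    """Toy model: exact-match unique key plus a lowercased copy for search."""

    def __init__(self):
        self.docs = {}  # exact id -> stored document

    def add(self, doc):
        stored = dict(doc)
        # Stand-in for a copyField from id to a case-insensitive pn field.
        stored["pn"] = doc["id"].lower()
        # The same exact id overwrites the previous document (update, not add).
        self.docs[doc["id"]] = stored

    def search_pn(self, value):
        # The query is lowercased too, as a lowercasing analyzer would do.
        wanted = value.lower()
        return [d for d in self.docs.values() if d["pn"] == wanted]


index = TinyIndex()
index.add({"id": "EP1234567", "title": "first version"})
index.add({"id": "EP1234567", "title": "second version"})
hits = index.search_pn("ep1234567")
print(len(index.docs), hits[0]["title"])  # prints: 1 second version
```

The key point is that the unique key itself is never lowercased; only the search copy is.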

---
This email contains no viruses or malware
because avast! Antivirus protection is active.
http://www.avast.com





Re: Solr 5.0 - uniqueKey case insensitive ?

2015-05-06 Thread Daniel Collins

Ah, I remember seeing this when we first started using Solr (which was 4.0,
because we needed SolrCloud). I never got around to filing an issue for it
(oops!), but we have a note in our schema to leave the key field as a normal
string (like Bruno, we had tried to lowercase it, which failed).
We didn't really know Solr in those days, and hadn't really thought about
it since then, but Hoss' and Erick's explanations make perfect sense now!

Since shard routing is (basically) done on hashes of the unique key, if I
have 2 documents which are the "same" but have the values "HELLO" and "hello",
they might well hash to completely different shards, so the update
logistics would be horrible.

Bruno, why do you need to lowercase at all then?  You said in your example
that your client application always supplies "pn" and that it is always
uppercase, so presumably all adds/updates could be done directly on that
field (as a normal string with no lowercasing).  Where does the case
insensitivity come in? Is that only for searching?  If so, couldn't you add
a search field (called id), and update your app to search using that (or
make that your default search field; I guess it depends on whether your
calling app explicitly uses the pn field name in its searches)?
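
The routing point above can be illustrated with a small Python sketch. It assumes CRC32 as a stand-in hash and a hypothetical shard_for helper; SolrCloud's real router uses a different hash function, so the numbers printed here mean nothing for an actual cluster. The only thing it shows is that the hash is taken over the raw key, so two keys that differ only in case are hashed independently.

```python
import zlib


def key_hash(key: str) -> int:
    # CRC32 is only a stand-in here; it is not the hash SolrCloud uses.
    return zlib.crc32(key.encode("utf-8"))


def shard_for(key: str, num_shards: int) -> int:
    """Route on the raw key, before any lowercasing analysis runs."""
    return key_hash(key) % num_shards


for key in ("HELLO", "hello"):
    print(key, key_hash(key), shard_for(key, 4))
```

Whether the two keys land on different shards depends on the hash and the shard count, but nothing guarantees they land on the same one.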

Re: Solr 5.0 - uniqueKey case insensitive ?

2015-05-05 Thread Erick Erickson

Well, "working fine" may be a bit of an overstatement. That has never
been officially supported, so it "just happened" to work in 3.6.

As Chris points out, if you're using SolrCloud then this will _not_
work, as routing happens early in the process, i.e. before the analysis
chain gets the token, so various copies of the doc will exist on
different shards.

Best,
Erick

Re: Solr 5.0 - uniqueKey case insensitive ?

2015-05-04 Thread Bruno Mannina


Hello Chris,

Yes, I confirm that on my SOLR3.6 it has worked fine for several years: each
doc added with the same code is updated, not added.

To be clearer, I receive docs with a field named "pn"; it is the
uniqueKey, and it is always in uppercase.

So I must define in my schema.xml:

<field ... required="true" stored="true"/>
<field ... indexed="true" stored="false"/>

...
   <uniqueKey>id</uniqueKey>
...

But the application that uses Solr already exists, so it queries the pn
field, not id, and I cannot change that.
And in each doc I receive, there is no id field, just a pn field, and I
cannot change that either.

So there is a problem, no? I must import an id field and query a pn
field, but I only have a pn field at import...




Re: Solr 5.0 - uniqueKey case insensitive ?

2015-05-04 Thread Chris Hostetter


: On SOLR3.6, I defined a string_ci field like this:
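:
: <fieldType ... sortMissingLast="true" omitNorms="true">
: <analyzer>
:   ...
:   ...
: </analyzer>
: </fieldType>
:
: <field ... required="true" stored="true"/>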


I'm really surprised that field would have worked for you (reliably) as a 
uniqueKey field even in Solr 3.6.

The best practice for something like what you describe has always (going 
back to Solr 1.x) been to use a copyField to create a case-insensitive 
copy of your uniqueKey for searching.

If, for some reason, you really want case-insensitive *updates* (so a doc 
with id "foo" overwrites a doc with id "FOO"), then the only reliable way to 
make something like that work is to do the lowercasing in an 
UpdateProcessor to ensure it happens *before* the docs are distributed to 
the correct shard, and so the correct existing doc is overwritten (even if 
you aren't using SolrCloud).



-Hoss
http://www.lucidworks.com/
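
The ordering described above (normalize the key first, then distribute) can be sketched in Python. This is a toy model under stated assumptions, not Solr's UpdateProcessor API: lowercase_id, route, and add are hypothetical helpers, CRC32 stands in for the real routing hash, and each shard is just a dict.

```python
import zlib


def route(key: str, num_shards: int) -> int:
    # Stand-in routing hash; SolrCloud uses a different one.
    return zlib.crc32(key.encode("utf-8")) % num_shards


def lowercase_id(doc: dict) -> dict:
    """Hypothetical pre-routing step, standing in for an UpdateProcessor."""
    out = dict(doc)
    out["id"] = doc["id"].lower()
    return out


def add(shards: list, doc: dict) -> None:
    doc = lowercase_id(doc)                 # 1. normalize the key first
    shard = shards[route(doc["id"], len(shards))]  # 2. then pick the shard
    shard[doc["id"]] = doc                  # 3. same key overwrites the old doc


shards = [{} for _ in range(4)]
add(shards, {"id": "FOO", "title": "first"})
add(shards, {"id": "foo", "title": "second"})

total = sum(len(s) for s in shards)
survivor = next(d for s in shards for d in s.values())
print(total, survivor["title"])  # prints: 1 second
```

Because both documents carry the same key by the time the shard is chosen, the second add replaces the first instead of creating a second copy elsewhere.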

Solr 5.0 - uniqueKey case insensitive ?

2015-05-04 Thread Bruno Mannina


Dear Solr users,

I have a problem with SOLR5.0 (and not with SOLR3.6).

What kind of field can I use for my uniqueKey field named "code" if I
want it to be case insensitive?

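On SOLR3.6, I defined a string_ci field like this:

<fieldType ... sortMissingLast="true" omitNorms="true">
<analyzer>
  ...
  ...
</analyzer>
</fieldType>

<field ... required="true" stored="true"/>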

and it works fine.
- If I add a document with the same code, then the doc is updated.
- If I search for a document in lower or upper case, the doc is found.


But in SOLR5.0, if I use this definition, then:
- I can search in lower/upper case, it's OK
- BUT if I add a doc with the same code, then the doc is added, not updated!?

I read that the problem could be that the field type is tokenized
instead of being a string.

If I change from string_ci to string, then
- I lose the ability to search in lower/upper case
- but updating the doc works fine.

So, could you help me find the right field type to:

- search case-insensitively
- update the old doc if I add a document with the same code

Thanks a lot!

