Re: SolrJ: SolrInputDocument.addField()

2021-02-16 Thread Shawn Heisey

On 2/15/2021 10:17 AM, Steven White wrote:

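> Yes, I have managed schema enabled like so:
>
>   <schemaFactory class="ManagedIndexSchemaFactory">
>     <bool name="mutable">true</bool>
>     <str name="managedSchemaResourceName">cp-schema.xml</str>
>   </schemaFactory>
>
> The reason why I enabled it is so that I can dynamically customize the
> schema based on what's in the DB.  So that I can add fields to the schema
> dynamically.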


A managed/mutable schema is a configuration detail that's separate from 
(and required by) the update processor that guesses unknown fields.  It 
has been the default schema factory in out-of-the-box configurations 
for quite a while.



> I guess a better question, to meet my need, is this: how do I tell Solr, in
> schema-less mode, to use *my* defined field-type whenever it needs to
> create a new field?


The config for that is described here:

https://lucene.apache.org/solr/guide/8_6/schemaless-mode.html#enable-field-class-guessing
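
As a rough sketch of what that section describes: field guessing is driven by the typeMapping entries of the add-schema-fields update processor in solrconfig.xml, so pointing a mapping at your own field type changes what gets created for new fields.  The name my_custom_text below is a placeholder for a field type assumed to be defined in the schema already, and the second mapping only illustrates the shape of the stock entries; check the details against the linked guide before relying on them.

```xml
<updateProcessor class="solr.AddSchemaFieldsUpdateProcessorFactory" name="add-schema-fields">
  <!-- Values that arrive as Java strings get the custom type -->
  <lst name="typeMapping">
    <str name="valueClass">java.lang.String</str>
    <str name="fieldType">my_custom_text</str> <!-- placeholder name -->
    <!-- Also used when no other mapping matches the value's class -->
    <bool name="default">true</bool>
  </lst>
  <!-- Numeric values keep a stock multi-valued point type -->
  <lst name="typeMapping">
    <str name="valueClass">java.lang.Long</str>
    <str name="valueClass">java.lang.Integer</str>
    <str name="fieldType">plongs</str>
  </lst>
</updateProcessor>
```

Even with a mapping like this, the choice is still made per Java value class, not per field, so two string fields that need different analysis cannot be told apart.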

It is a bad idea to rely on field guessing for a production index.  Even 
the most carefully designed configuration cannot get it right every 
time.  You're very likely to run into situations where the software's 
best guess turns out to be wrong for your needs.  And then you're forced 
into what you should have done in the first place -- manually fixing the 
definition for that field, which usually also requires reindexing from 
scratch.


One counter-argument to what I stated in the last paragraph that 
frequently comes up is "my data is very well curated and consistent." 
But if that is the case, then you will know what fields and types are 
required *in advance* and you can easily construct a schema yourself 
before sending any data for indexing -- no guessing required.
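
For the two fields from the original question, explicit definitions might look like the sketch below.  The type names text_general and pint come from Solr's default configset and are assumptions here; substitute the names of your own custom field types.

```xml
<!-- Declared up front, so nothing has to be guessed at index time -->
<field name="MyFieldOne" type="text_general" indexed="true" stored="true"/>
<field name="MyFieldTwo" type="pint" indexed="true" stored="true"/>
```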


Thanks,
Shawn


Re: SolrJ: SolrInputDocument.addField()

2021-02-16 Thread Jimi Hullegård
Hi Steven,

Just a thought, from someone who has never used schema-less mode: Have you 
considered using a regular schema file, with a bunch of dynamicField 
definitions? Then you can for example define a dynamic boolean field like this:

  <dynamicField name="*_b" type="boolean" indexed="true" stored="true"/>



Then, when you index the data, you can append "_b" to the field name for all 
boolean values. So if, for example, you want to index searchable: true, you 
send that data with the field name "searchable_b" and Solr will index it as a 
boolean field.
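
On the client side, the suffix convention can be kept in one place. The following is a minimal sketch, not part of SolrJ: the class DynamicFieldNames and its method are hypothetical, and every suffix other than "_b" is an assumption borrowed from the dynamicField rules in Solr's default configset, so adjust them to whatever your schema actually declares.

```java
import java.util.Date;

/** Hypothetical helper: picks a dynamicField suffix from the Java type of a value. */
final class DynamicFieldNames {

    private DynamicFieldNames() {}

    /** Returns baseName plus a suffix that a matching dynamicField rule would catch. */
    static String fieldNameFor(String baseName, Object value) {
        if (value instanceof Boolean) return baseName + "_b";
        if (value instanceof Integer) return baseName + "_i";  // assumed suffix
        if (value instanceof Long)    return baseName + "_l";  // assumed suffix
        if (value instanceof Float)   return baseName + "_f";  // assumed suffix
        if (value instanceof Double)  return baseName + "_d";  // assumed suffix
        if (value instanceof Date)    return baseName + "_dt"; // assumed suffix
        return baseName + "_s"; // anything else is treated as a plain string
    }

    public static void main(String[] args) {
        System.out.println(fieldNameFor("searchable", true));
        System.out.println(fieldNameFor("MyFieldTwo", 100));
        System.out.println(fieldNameFor("MyFieldOne", "some data"));
    }
}
```

With SolrJ, the result would be passed straight to addField(), for example doc.addField(DynamicFieldNames.fieldNameFor("searchable", true), true).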

/Jimi

Steven White wrote:
>
> Hi Shawn,
>
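> Yes, I have managed schema enabled like so:
>
>   <schemaFactory class="ManagedIndexSchemaFactory">
>     <bool name="mutable">true</bool>
>     <str name="managedSchemaResourceName">cp-schema.xml</str>
>   </schemaFactory>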
>
> The reason why I enabled it is so that I can dynamically customize the schema 
> based on what's in the DB.  So that I can add fields to the schema 
> dynamically.
>
> I didn't know about the field "guessing" part.  Now that I know I see this in 
> my solrconfig.xml file:
>
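>   <updateRequestProcessorChain name="add-unknown-fields-to-the-schema"
>       default="${update.autoCreateFields:true}"
>       processor="uuid,remove-blank,field-name-mutating,parse-boolean,parse-long,parse-double,parse-date,add-schema-fields">
>     <processor class="solr.LogUpdateProcessorFactory"/>
>     <processor class="solr.DistributedUpdateProcessorFactory"/>
>     <processor class="solr.RunUpdateProcessorFactory"/>
>   </updateRequestProcessorChain>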
>
> If I remove this block, what will happen?
>
> I guess a better question, to meet my need, is this: how do I tell Solr, in 
> schema-less mode, to use *my* defined field-type whenever it needs to create 
> a new field?
>
> I'm on Solr 8.6.1 and the link at
> https://lucene.apache.org/solr/guide/8_6/schema-factory-definition-in-solrconfig.html#schema-factory-definition-in-solrconfig
> doesn't offer much help.
>
> Thanks
>
> Steven


Re: SolrJ: SolrInputDocument.addField()

2021-02-15 Thread Steven White
Hi Shawn,

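Yes, I have managed schema enabled like so:

  <schemaFactory class="ManagedIndexSchemaFactory">
    <bool name="mutable">true</bool>
    <str name="managedSchemaResourceName">cp-schema.xml</str>
  </schemaFactory>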

The reason why I enabled it is so that I can dynamically customize the
schema based on what's in the DB.  So that I can add fields to the schema
dynamically.

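I didn't know about the field "guessing" part.  Now that I know, I see this
in my solrconfig.xml file:

  <updateRequestProcessorChain name="add-unknown-fields-to-the-schema"
      default="${update.autoCreateFields:true}"
      processor="uuid,remove-blank,field-name-mutating,parse-boolean,parse-long,parse-double,parse-date,add-schema-fields">
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.DistributedUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>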

If I remove this block, what will happen?

I guess a better question, to meet my need, is this: how do I tell Solr, in
schema-less mode, to use *my* defined field-type whenever it needs to
create a new field?

I'm on Solr 8.6.1 and the link at
https://lucene.apache.org/solr/guide/8_6/schema-factory-definition-in-solrconfig.html#schema-factory-definition-in-solrconfig
doesn't offer much help.

Thanks

Steven


On Mon, Feb 15, 2021 at 11:09 AM Shawn Heisey wrote:

> On 2/15/2021 6:52 AM, Steven White wrote:
> > It looks to me that SolrInputDocument.addField() is either misnamed or
> > isn't well implemented.
> >
> > When it is called on a field that doesn't exist in the schema, it will
> > create that field and give it a type based on the data.  Not only that,
> > it will set default values.  For example, this call
> >
> >  SolrInputDocument doc = new SolrInputDocument();
> >  doc.addField("Company", "ACM company");
> >
> > Will create the following:
> >
> >  
> >  
>
> That SolrJ code does not make those changes to your schema.  At least
> not in the way you're thinking.
>
> It sounds to me like your solrconfig.xml includes what we call
> "schemaless mode" -- an update processor that adds unknown fields when
> they are indexed.  You should disable it.  We strongly recommend never
> using it in production, because it can make the wrong guess about which
> fieldType is required.  The fieldType chosen has very little to do with
> the SolrJ code.  It is controlled by what's in solrconfig.xml.
>
> Thanks,
> Shawn
>


Re: SolrJ: SolrInputDocument.addField()

2021-02-15 Thread Shawn Heisey

On 2/15/2021 6:52 AM, Steven White wrote:

> It looks to me that SolrInputDocument.addField() is either misnamed or
> isn't well implemented.
>
> When it is called on a field that doesn't exist in the schema, it will
> create that field and give it a type based on the data.  Not only that, it
> will set default values.  For example, this call
>
>  SolrInputDocument doc = new SolrInputDocument();
>  doc.addField("Company", "ACM company");
>
> Will create the following:

 
 


That SolrJ code does not make those changes to your schema.  At least 
not in the way you're thinking.


It sounds to me like your solrconfig.xml includes what we call 
"schemaless mode" -- an update processor that adds unknown fields when 
they are indexed.  You should disable it.  We strongly recommend never 
using it in production, because it can make the wrong guess about which 
fieldType is required.  The fieldType chosen has very little to do with 
the SolrJ code.  It is controlled by what's in solrconfig.xml.
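
One way to switch it off, as a sketch based on the stock solrconfig.xml: the chain is the default only while the update.autoCreateFields property resolves to true, so flipping the fallback value (or setting that property to false) stops unknown fields from being added.  Documents that contain an undefined field are then rejected with an error instead of being indexed.  The chain name below is the one shipped in the default configset and is an assumption about your file.

```xml
<updateRequestProcessorChain name="add-unknown-fields-to-the-schema"
    default="${update.autoCreateFields:false}"
    processor="uuid,remove-blank,field-name-mutating,parse-boolean,parse-long,parse-double,parse-date,add-schema-fields">
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```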


Thanks,
Shawn


Re: SolrJ: SolrInputDocument.addField()

2021-02-15 Thread Steven White
Thanks Shawn.

It looks to me that SolrInputDocument.addField() is either misnamed or
isn't well implemented.

When it is called on a field that doesn't exist in the schema, it will
create that field and give it a type based on the data.  Not only that, it
will set default values.  For example, this call

SolrInputDocument doc = new SolrInputDocument();
doc.addField("Company", "ACM company");

Will create the following:




Since this happens without the caller knowing it (it is not documented),
it leads to search issues; to name one, the intended analyzer will not be
used.

It looks to me that the only way I can fix this is to create the field
type first and then call addField(), passing it the field I just
created.  To that end, I cannot find a SolrJ API to create a field
type.  Is that the case?

Steven.


On Sun, Feb 14, 2021 at 7:17 PM Shawn Heisey wrote:

> On 2/14/2021 9:00 AM, Steven White wrote:
> > It looks like I'm misusing the SolrJ API SolrInputDocument.addField(), so
> > I need clarification.
> >
> > Here is an example of what I have in my code:
> >
> >  SolrInputDocument doc = new SolrInputDocument();
> >  doc.addField("MyFieldOne", "some data");
> >  doc.addField("MyFieldTwo", 100);
> >
> > The above code is creating 2 fields for me (if they don't exist already)
> > and then indexing the data to those fields.  The data is "some data" and
> > the number 100.  However, when the field is created, it is not using the
> > custom field type that I created in my schema.  My question is, how do I
> > tell addField() to use my custom field type?
>
> There is no way in SolrJ code to control which fieldType is used.  That
> is controlled solely by the server-side schema definition.
>
> How do you know that Solr is not using the correct fieldType?  If you
> are looking at the documents returned by a search and aren't seeing the
> transformations described in the schema, you're looking in the wrong place.
>
> Solr search results always return what was originally sent in for
> indexing.  Only Update Processors (defined in solrconfig.xml, not the
> schema) can affect what gets returned in results; fieldType definitions
> NEVER affect data returned in search results.
>
> Thanks,
> Shawn
>


Re: SolrJ: SolrInputDocument.addField()

2021-02-14 Thread Shawn Heisey

On 2/14/2021 9:00 AM, Steven White wrote:

> It looks like I'm misusing the SolrJ API SolrInputDocument.addField(), so
> I need clarification.
>
> Here is an example of what I have in my code:
>
>  SolrInputDocument doc = new SolrInputDocument();
>  doc.addField("MyFieldOne", "some data");
>  doc.addField("MyFieldTwo", 100);
>
> The above code is creating 2 fields for me (if they don't exist already)
> and then indexing the data to those fields.  The data is "some data" and
> the number 100.  However, when the field is created, it is not using the
> custom field type that I created in my schema.  My question is, how do I
> tell addField() to use my custom field type?


There is no way in SolrJ code to control which fieldType is used.  That 
is controlled solely by the server-side schema definition.


How do you know that Solr is not using the correct fieldType?  If you 
are looking at the documents returned by a search and aren't seeing the 
transformations described in the schema, you're looking in the wrong place.


Solr search results always return what was originally sent in for 
indexing.  Only Update Processors (defined in solrconfig.xml, not the 
schema) can affect what gets returned in results; fieldType definitions 
NEVER affect data returned in search results.
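
The distinction can be pictured with a toy model.  This is not Solr code: ToyIndex and its whitespace-and-lowercase analyze() step are invented stand-ins for an index and a fieldType's analyzer, meant only to show that analysis changes the terms used for matching while the stored value comes back exactly as it was sent.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy model (not Solr code): analysis shapes the indexed terms, never the stored value. */
final class ToyIndex {
    private final Map<Integer, String> stored = new HashMap<>();        // docId -> original value
    private final Map<String, Set<Integer>> inverted = new HashMap<>(); // term -> docIds

    /** Stand-in for a fieldType's analyzer: split on whitespace and lowercase. */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    /** Stores the value exactly as sent and indexes its analyzed terms. */
    void add(int docId, String value) {
        stored.put(docId, value);
        for (String term : analyze(value)) {
            inverted.computeIfAbsent(term, t -> new LinkedHashSet<>()).add(docId);
        }
    }

    /** Matches against analyzed terms but returns the untouched stored values. */
    List<String> search(String query) {
        List<String> hits = new ArrayList<>();
        for (String term : analyze(query)) {
            Set<Integer> docIds = inverted.getOrDefault(term, Collections.emptySet());
            for (int docId : docIds) {
                String value = stored.get(docId);
                if (!hits.contains(value)) {
                    hits.add(value);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add(1, "ACM company");
        // The lowercase query matches, yet the result keeps the original casing.
        System.out.println(index.search("acm"));
    }
}
```

To see what analysis actually did to a value, look at the indexed terms (for example through the Analysis screen in the admin UI), not at the documents in the response.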


Thanks,
Shawn


SolrJ: SolrInputDocument.addField()

2021-02-14 Thread Steven White
Hi everyone,

It looks like I'm misusing the SolrJ API SolrInputDocument.addField(), so
I need clarification.

Here is an example of what I have in my code:

SolrInputDocument doc = new SolrInputDocument();
doc.addField("MyFieldOne", "some data");
doc.addField("MyFieldTwo", 100);

The above code is creating 2 fields for me (if they don't exist already)
and then indexing the data to those fields.  The data is "some data" and
the number 100.  However, when the field is created, it is not using the
custom field type that I created in my schema.  My question is, how do I
tell addField() to use my custom field type?

I _think_ I have to first call SolrInputDocument.createField() and then call
SolrInputDocument.addField()?  Or is the process of indexing data into a
field done via some other API I overlooked?

I need some guidance to make sure I get the logic right.

Thanks.

Steven