[ 
https://issues.apache.org/jira/browse/SOLR-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16075116#comment-16075116
 ] 

Steve Rowe edited comment on SOLR-9526 at 7/5/17 5:36 PM:
----------------------------------------------------------

Attaching patch brought up to date with master (in particular, collapsing of 
{{data_driven_schema_configs}} and {{basic_configs}} into {{_default}}) - note 
that your original patch only modified {{solrconfig.xml}} on one of these and 
{{managed_schema}} on the other - I assume you had/have local changes that 
didn't make it into the patch [~janhoy]?  I made a couple of other changes; 
details below.

{quote}
See new NOCOMMIT comments. I was using the ManagedIndexSchema method
{code}
public ManagedIndexSchema addCopyFields(String source, Collection<String> destinations, int maxChars)
{code}
which does not have a {{persist=true/false}} argument, so calling it leaves the 
schema not persisted. Then I could not find a way to explicitly persist it 
since method
{{boolean persistManagedSchema(boolean createOnly)}}
was not public. In this patch I've made it public and done a hacky instanceof 
check in AddSchemaFieldsUpdateProcessorFactory
{code}
if (newSchema instanceof ManagedIndexSchema) {
  // NOCOMMIT: Hack to avoid persisting schema once after addFields and then once after each copyField
  ((ManagedIndexSchema)newSchema).persistManagedSchema(false);
}
{code}
Steve Rowe, you wrote the {{addCopyFields()}} method a while ago, is there a 
cleaner way to make sure schema is persisted after adding a copyField?
{quote}

The design of {{ManagedIndexSchema}}'s API was in support of the Schema REST 
API, where each resource was modifiable one at a time; "bulk" modifications 
weren't possible.  In the new bulk schema API, though, the ordinary case 
involves multiple modifications; in this case, it is counter-productive to 
persist in the middle of a set of operations.

SOLR-6476 (introducing schema "bulk" mode) added the option to *not* persist 
the schema after an operation; previously every operation was automatically 
persisted.  This was added as an option because at the time, bulk and REST 
modes co-existed.   SOLR-7682 added the ability to specify maxChars for 
copyField directives, and I intentionally left off the {{persist}} option of 
the new {{addCopyFields()}} method, because there was (intentionally) no way to 
invoke this capability via the (now deprecated) schema REST API, and the bulk 
schema API didn't need the {{persist}} option.

Long story short: I think making {{persistManagedSchema()}} public is a natural 
consequence of the bulk schema API (and in support of bulk operations from 
other sources, e.g. this issue).  It's just that nobody had gotten around to it 
yet.  

In {{AddSchemaFieldsUpdateProcessorFactory.processAdd()}} in my patch I removed 
the {{instanceof ManagedIndexSchema}} check wrapping the call to 
{{persistManagedSchema()}}, as well as the {{NOCOMMIT}} comments, since the check 
{{if ( ! cmd.getReq().getSchema().isMutable())}} at the beginning of the method 
already ensures that we're dealing with a {{ManagedIndexSchema}}.
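
To illustrate the pattern described above (guard on mutability first, apply 
several non-persisting modifications, then persist once at the end), here is a 
minimal toy sketch. None of these types are Solr classes: {{ToySchema}}, 
{{ToyManagedSchema}}, {{ToyReadOnlySchema}}, the {{catchall}} destination and 
this {{processAdd()}} are hypothetical stand-ins, and the assumption that the 
only mutable implementation is the managed one is built into the toy model 
rather than taken from the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only: hypothetical stand-ins, not the Solr classes discussed above. */
public class BulkPersistSketch {

    interface ToySchema {
        boolean isMutable();
    }

    /** Immutable schema: the guard in processAdd() rejects it. */
    static final class ToyReadOnlySchema implements ToySchema {
        public boolean isMutable() { return false; }
    }

    /** Mutable schema with copy-on-write edits and an explicit persist step. */
    static final class ToyManagedSchema implements ToySchema {
        final List<String> entries;
        final List<Integer> persistLog; // shared across copies: schema size at each persist

        ToyManagedSchema(List<String> entries, List<Integer> persistLog) {
            this.entries = entries;
            this.persistLog = persistLog;
        }

        public boolean isMutable() { return true; }

        /** Returns a modified copy; deliberately does not persist. */
        ToyManagedSchema addField(String name) {
            List<String> copy = new ArrayList<>(entries);
            copy.add("field:" + name);
            return new ToyManagedSchema(copy, persistLog);
        }

        /** Returns a modified copy; deliberately does not persist. */
        ToyManagedSchema addCopyField(String source, String destination) {
            List<String> copy = new ArrayList<>(entries);
            copy.add("copyField:" + source + "->" + destination);
            return new ToyManagedSchema(copy, persistLog);
        }

        /** Explicit persist, called once by the caller after a batch of edits. */
        void persist() {
            persistLog.add(entries.size());
        }
    }

    /** Returns the number of persist calls recorded so far (0 if the schema is immutable). */
    static int processAdd(ToySchema schema, List<String> unknownFields) {
        if (!schema.isMutable()) {
            return 0; // guard: nothing can be added to an immutable schema
        }
        // In this toy model the only mutable implementation is ToyManagedSchema,
        // so no instanceof check is needed after the guard.
        ToyManagedSchema managed = (ToyManagedSchema) schema;
        for (String name : unknownFields) {
            managed = managed.addField(name).addCopyField(name, "catchall");
        }
        managed.persist(); // one persist for the whole batch
        return managed.persistLog.size();
    }

    public static void main(String[] args) {
        ToySchema schema = new ToyManagedSchema(new ArrayList<>(), new ArrayList<>());
        int persists = processAdd(schema, Arrays.asList("title", "body"));
        System.out.println("persist calls: " + persists);
    }
}
```

The point of the sketch is only the ordering: every modification returns a new 
schema object without writing anything, and the caller decides when the single 
persist happens.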

I also removed the following {{typeMapping}} that was added in your patch from 
URP chains {{add-fields-no-run-processor}} and {{parse-and-add-fields}} in 
{{solrconfig-add-schema-fields-update-processor-chains.xml}} - I'm assuming 
this is a vestige from an earlier concept of removing {{<defaultTypeMapping>}}, 
since these chains have {{<str name="defaultFieldType">text</str>}}?  
{{AddSchemaFieldsUpdateProcessorFactoryTest}} passes with my change:

{code:xml}
<lst name="typeMapping">
  <str name="valueClass">java.lang.String</str>
  <str name="fieldType">text</str>
</lst>
{code}


> data_driven configs defaults to "strings" for unmapped fields, makes most 
> fields containing "textual content" unsearchable, breaks tutorial examples
> ----------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-9526
>                 URL: https://issues.apache.org/jira/browse/SOLR-9526
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: UpdateRequestProcessors
>            Reporter: Hoss Man
>            Assignee: Jan Høydahl
>              Labels: dynamic-schema
>             Fix For: 7.0
>
>         Attachments: SOLR-9526.patch, SOLR-9526.patch, SOLR-9526.patch, 
> SOLR-9526.patch, SOLR-9526.patch
>
>
> James Pritchett pointed out on the solr-user list that this sample query from 
> the quick start tutorial matched no docs (even though the tutorial text says 
> "The above request returns only one document")...
> http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=name:foundation
> The root problem seems to be that the add-unknown-fields-to-the-schema chain 
> in data_driven_schema_configs is configured with...
> {code}
> <str name="defaultFieldType">strings</str>
> {code}
> ...and the "strings" type uses StrField and is not tokenized.
> ----
> Original thread: 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201609.mbox/%3ccac-n2zrpsspfnk43agecspchc5b-0ff25xlfnzogyuvyg2d...@mail.gmail.com%3E


