[jira] [Commented] (SOLR-3250) Dynamic Field capabilities based on value not name

Steve Rowe (JIRA) Mon, 03 Jun 2013 08:15:22 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673197#comment-13673197
 ]


Steve Rowe commented on SOLR-3250:
----------------------------------

Now that we have the ability to dynamically add schema fields (SOLR-3251), I 
want to push forward on this issue.

Value-based dynamic field capabilities for document updates - which I'll 
sometimes refer to as schemaless mode - will a) determine the type for field 
names that don’t match explicit or dynamic fields in the schema; b) add these 
field names to the schema with their determined types; and c) complete the 
document update request as normal.  This process should apply equally to new 
doc additions, atomic updates, and regular updates.

In a conversation with [[email protected]] about this feature, he 
suggested that configuration for parsing/converting {{String}}-typed field 
values into the appropriate Java objects could be separated from configuration 
of mappings from Java object types to schema field types.  In this way, 
components built for schemaless mode could be reused for other purposes.

JSON and Javabin content streams already carry some type information for their 
field values.  The {{ContentStreamLoader}}-s corresponding to these, 
{{JsonLoader}} and {{JavabinLoader}}, should set field value object types in 
the {{SolrInputDocument}} according to the content stream's data types.  
(Currently {{JavabinLoader}} does this correctly, but {{JsonLoader}} stores 
everything as {{String}}-s; this will need to be fixed.)  As a result, for the 
Java object types supported by these content streams and their loaders (as well 
as other update processors, etc. that set field values' Java object types), 
{{String}} parsing/conversion won't be required, and only the Java object type 
-> schema field type mappings will be necessary to determine the schema field 
type for new fields.
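
To make the Java object type -> schema field type idea concrete, here is a minimal sketch in plain Java.  It is not Solr code: the class {{TypeMappingSketch}}, its methods, and the field type names ({{tlong}}, {{tdouble}}, {{tdate}}, {{boolean}}, {{text_general}}) are assumptions chosen for illustration, and the real configuration may look quite different.

```java
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; this class is not part of Solr.
final class TypeMappingSketch {
    // Insertion-ordered, so mappings are checked in the order they were added.
    private final Map<Class<?>, String> mappings = new LinkedHashMap<>();
    private final String defaultFieldType;

    TypeMappingSketch(String defaultFieldType) {
        this.defaultFieldType = defaultFieldType;
    }

    TypeMappingSketch map(Class<?> valueClass, String fieldTypeName) {
        mappings.put(valueClass, fieldTypeName);
        return this;
    }

    // Returns the first mapped field type whose class matches the value,
    // or the default field type when nothing matches.
    String fieldTypeFor(Object value) {
        for (Map.Entry<Class<?>, String> entry : mappings.entrySet()) {
            if (entry.getKey().isInstance(value)) {
                return entry.getValue();
            }
        }
        return defaultFieldType;
    }

    // Assumed field type names; substitute whatever the schema actually defines.
    static TypeMappingSketch exampleMappings() {
        return new TypeMappingSketch("text_general")
            .map(Boolean.class, "boolean")
            .map(Date.class, "tdate")
            .map(Long.class, "tlong")
            .map(Double.class, "tdouble");
    }
}
```

With these assumed mappings, a {{Long}} value resolves to {{tlong}}, while a value with no mapping, such as a {{String}}, falls back to the default field type.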

[On SOLR-2802|https://issues.apache.org/jira/browse/SOLR-2802?focusedCommentId=13117911&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13117911],
 Hoss wrote that {{FieldMutatingUpdateProcessor}}-s that parsed dates, numbers 
and booleans would be generally useful.  I plan on going that route to 
implement {{String}}-typed field value parsing.  These field value parsing 
update processors should operate on {{String}}-valued fields that either a) are 
not in the schema, or b) have a schema field type with an appropriate 
{{typeClass}}.
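
The selection rule described above can be sketched as a simple predicate.  This is plain Java, not the Solr API: {{FieldSelectorSketch}} and its methods are hypothetical, and the schema is simplified to a map from field name to the Java class of its values, standing in for the real {{typeClass}} check.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a schema lookup; not Solr's schema API.
final class FieldSelectorSketch {
    // Assumed simplification: field name -> Java class of the field's values.
    private final Map<String, Class<?>> schemaFieldClasses = new HashMap<>();
    // The Java class this (imaginary) parsing processor produces, e.g. Long.class.
    private final Class<?> targetClass;

    FieldSelectorSketch(Class<?> targetClass) {
        this.targetClass = targetClass;
    }

    void declare(String fieldName, Class<?> valueClass) {
        schemaFieldClasses.put(fieldName, valueClass);
    }

    // Only String values are candidates.  A field is selected when it is
    // a) not in the schema, or b) declared with a compatible class.
    boolean shouldParse(String fieldName, Object value) {
        if (!(value instanceof String)) {
            return false;
        }
        Class<?> declared = schemaFieldClasses.get(fieldName);
        return declared == null || targetClass.isAssignableFrom(declared);
    }
}
```

Under these assumptions, a {{Long}}-parsing selector would accept the {{String}} value of an unknown field or of a field declared as {{Long}}, and would skip a field declared as {{String}} as well as any value that is already a non-{{String}} object.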

After the new parsing update processors detect and convert field values to the 
appropriate Java object types, an update processor that adds fields to the 
schema as needed can be configured with a mapping from Java object type to 
schema field type.

Here is the list of things I think need to happen - I plan on making JIRA 
issues for each of these:

# Fix {{JsonLoader}} to create field values using the JSON-supplied type, 
rather than making everything a {{String}}.
# Add a new field update processor selector that will configure the processor 
to select fields that match any schema field, or that match no schema field, 
depending on its boolean parameter: {{<bool name="fieldNameMatchesSchemaField">}}
# Add new {{FieldMutatingUpdateProcessorFactory}} subclasses 
{{ParseFooUpdateProcessorFactory}}, where {{Foo}} includes {{Date}}, 
{{Double}}, {{Long}}, and {{Boolean}}.  If a processor sees a field value that 
is not {{String}}-valued, or that it can't parse, it will leave the value as 
is.  For multi-valued fields, parsing should be all-or-nothing: either every 
value is converted or none is.
# Add a new {{AddSchemaFieldsUpdateProcessorFactory}}, with configurable 
mappings from Java object type to schema field type, that will dynamically add 
fields to the schema, as needed.
# Add a new example config set for schemaless mode.
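
As a rough illustration of the all-or-nothing behavior proposed for the parsing processors in item 3, here is a minimal sketch for the {{Long}} case.  It is plain Java rather than a {{FieldMutatingUpdateProcessorFactory}} subclass, and {{ParseLongSketch}} and {{parseAllOrNothing}} are hypothetical names; trimming whitespace before parsing is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of all-or-nothing parsing; not an actual Solr processor.
final class ParseLongSketch {
    // Returns a new list of Long values when every input value is a String
    // that parses as a long; otherwise returns the input list unchanged.
    static List<Object> parseAllOrNothing(List<Object> values) {
        List<Object> converted = new ArrayList<>(values.size());
        for (Object value : values) {
            if (!(value instanceof String)) {
                return values; // non-String value: leave the whole field as is
            }
            try {
                converted.add(Long.parseLong(((String) value).trim()));
            } catch (NumberFormatException e) {
                return values; // one unparseable value: leave the whole field as is
            }
        }
        return converted;
    }
}
```

Returning the original list untouched on any failure keeps a multi-valued field from ending up with a mix of {{Long}} and {{String}} values, which would make the later Java object type -> schema field type lookup ambiguous.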

                
> Dynamic Field capabilities based on value not name
> --------------------------------------------------
>
>                 Key: SOLR-3250
>                 URL: https://issues.apache.org/jira/browse/SOLR-3250
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Grant Ingersoll
>
> In some situations, one already knows the schema of their content, so having 
> to declare a schema in Solr becomes cumbersome.  For 
> instance, if you have all your content in JSON (or can easily generate it) or 
> other typed serializations, then you already have a schema defined.  It would 
> be nice if we could have support for dynamic fields that used whatever name 
> was passed in, but then picked the appropriate FieldType for that field based 
> on the value of the content.  So, for instance, if the input is a number, it 
> would select the appropriate numeric type.  If it is a plain text string, it 
> would pick the appropriate text field (you could even add in language 
> detection here).  If it is comma separated, it would treat them as keywords, 
> etc.  Also, we could likely send in a hint as to the type too.
> With this approach, you of course have a "first in wins" situation, but 
> assuming you have this schema defined elsewhere, it is likely fine.
> Supporting such cases would allow us to be schemaless when appropriate, while 
> offering the benefits of schemas when appropriate.  Naturally, one could mix 
> and match these too.

