[jira] [Commented] (SOLR-10229) See what it would take to shift many of our one-off schemas used for testing to managed schema and construct them as part of the tests

Amrit Sarkar (JIRA) Sat, 08 Apr 2017 00:47:01 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15961727#comment-15961727
 ]


Amrit Sarkar commented on SOLR-10229:
-------------------------------------

Updated patch:

Refined the builder methods for the framework. Changed TestKeywordTokenizer.java
(from LUCENE-7705) to use the framework; the test can now be implemented
entirely with declarative builder methods.
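For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of what a declarative field-type builder can look like. FieldTypeBuilder and its behavior are hypothetical stand-ins that only mirror the withName/withClassName/withAttribute calls; unlike the patch, build() here takes no core and simply returns the collected definition as an unmodifiable map.

{code:java}
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a declarative field-type builder; not the patch's actual API. */
public class FieldTypeBuilderSketch {

  static final class FieldTypeBuilder {
    // Insertion order is kept so the definition reads the way it was declared.
    private final Map<String, Object> definition = new LinkedHashMap<>();

    FieldTypeBuilder withName(String name) {
      definition.put("name", name);
      return this;
    }

    FieldTypeBuilder withClassName(String className) {
      definition.put("class", className);
      return this;
    }

    FieldTypeBuilder withAttribute(String key, Object value) {
      definition.put(key, value);
      return this;
    }

    /** Validates and returns a read-only copy; the real framework would apply it to a core. */
    Map<String, Object> build() {
      if (!definition.containsKey("name") || !definition.containsKey("class")) {
        throw new IllegalStateException("a field type needs both a name and a class");
      }
      return Collections.unmodifiableMap(new LinkedHashMap<>(definition));
    }
  }

  public static void main(String[] args) {
    Map<String, Object> type = new FieldTypeBuilder()
        .withName("keywordType")
        .withClassName("solr.TextField")
        .withAttribute("positionIncrementGap", "100")
        .build();
    System.out.println(type);
    // prints {name=keywordType, class=solr.TextField, positionIncrementGap=100}
  }
}
{code}

Returning a copy from build() means a builder can be reused or extended without changing definitions that were already built.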

Please note one thing regarding building FieldTypes:
{code}
framework.createNewFieldType()
    .withName("keywordType")
    .withClassName("solr.TextField")
    .withAttribute("positionIncrementGap", "100")
    // The analyzer is a nested map: analyzer -> tokenizer -> {class, maxTokenLen}
    .withAttribute("analyzer",
        map("tokenizer", map("class", "solr.KeywordTokenizerFactory", "maxTokenLen", "3")))
    .build(h.getCore());
{code}

To define the analyzer, the attributes are declared as nested maps. I believe
this is the correct way to do it, but I'd welcome suggestions on whether we
should handle them better.
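To show why nested maps fit: the Schema API's add-field-type command expects the analyzer as nested JSON objects (analyzer, then tokenizer, then its class and parameters), so nested Java maps translate one-to-one. The sketch below assumes a hypothetical map(Object...) helper that takes alternating keys and values, plus a deliberately tiny JSON renderer that handles only strings and maps and does no escaping; neither is the patch's code.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

public class NestedMapSketch {

  /** Hypothetical helper: builds an insertion-ordered map from alternating key/value arguments. */
  static Map<String, Object> map(Object... keyValues) {
    if (keyValues.length % 2 != 0) {
      throw new IllegalArgumentException("expected an even number of key/value arguments");
    }
    Map<String, Object> result = new LinkedHashMap<>();
    for (int i = 0; i < keyValues.length; i += 2) {
      // Keys are assumed to be Strings; anything else fails with ClassCastException.
      result.put((String) keyValues[i], keyValues[i + 1]);
    }
    return result;
  }

  /** Minimal JSON rendering for strings and nested maps only (no escaping; illustration only). */
  static String toJson(Object value) {
    if (value instanceof Map) {
      StringBuilder sb = new StringBuilder("{");
      boolean first = true;
      for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
        if (!first) {
          sb.append(',');
        }
        first = false;
        sb.append('"').append(e.getKey()).append("\":").append(toJson(e.getValue()));
      }
      return sb.append('}').toString();
    }
    return "\"" + value + "\"";
  }

  public static void main(String[] args) {
    Map<String, Object> fieldType = map(
        "name", "keywordType",
        "class", "solr.TextField",
        "positionIncrementGap", "100",
        "analyzer", map("tokenizer",
            map("class", "solr.KeywordTokenizerFactory", "maxTokenLen", "3")));
    System.out.println(toJson(map("add-field-type", fieldType)));
    // prints:
    // {"add-field-type":{"name":"keywordType","class":"solr.TextField","positionIncrementGap":"100","analyzer":{"tokenizer":{"class":"solr.KeywordTokenizerFactory","maxTokenLen":"3"}}}}
  }
}
{code}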

I am struggling to load the mother schema in the framework. These are the
challenges, and I'd appreciate advice on them:
1. I am trying to use *_ManagedSchemaFactory.create()_* to load the mother
schema, but it needs a live SolrConfig object. If I pass a dummy one, I can
create the schema, but when a test then loads a core with an empty schema, the
mother schema gets replaced by the empty one.
2. The access levels of the relevant methods and constructors are restrictive.
3. I dug down to *readSchema(InputSource is)*, which actually reads the schema
and fills the fields, fieldTypes, copyFields, ... lists into the core. If I
use IndexSchema directly to call *readSchema(InputSource is)*, the resulting
schema is immutable, so the Schema API operations don't apply to it. Also,
_readSchema_ needs the _SolrResourceLoader_ from _SolrConfig_, which should be
a one-off thing at the time of framework creation.
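One possible way around points 1 and 3, sketched under assumptions: parse the mother schema exactly once into plain, read-only definitions, and hand each test its own mutable copy, so nothing a test adds can replace or alter the mother. MotherSchema, motherSchema() and newWorkingCopy() are hypothetical names, and the Supplier stands in for whatever actually does the parsing (readSchema or otherwise); the result is plain data, not a live IndexSchema.

{code:java}
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

public class MotherSchemaSketch {

  /** Read-only snapshot of field-type definitions, built once from a parsed mother schema. */
  static final class MotherSchema {
    private final Map<String, Map<String, Object>> fieldTypes;

    MotherSchema(Map<String, Map<String, Object>> parsed) {
      Map<String, Map<String, Object>> snapshot = new LinkedHashMap<>();
      // Copy every definition so later edits to the parser's output cannot change the snapshot.
      parsed.forEach((name, def) ->
          snapshot.put(name, Collections.unmodifiableMap(new LinkedHashMap<>(def))));
      this.fieldTypes = Collections.unmodifiableMap(snapshot);
    }

    /** Each test gets its own mutable copy, so test-local additions never reach the mother. */
    Map<String, Map<String, Object>> newWorkingCopy() {
      Map<String, Map<String, Object>> copy = new LinkedHashMap<>();
      fieldTypes.forEach((name, def) -> copy.put(name, new LinkedHashMap<>(def)));
      return copy;
    }
  }

  private static MotherSchema cached;

  /** Runs the parser at most once per JVM: the "one-off at framework creation" step. */
  static synchronized MotherSchema motherSchema(Supplier<Map<String, Map<String, Object>>> parser) {
    if (cached == null) {
      cached = new MotherSchema(parser.get());
    }
    return cached;
  }

  public static void main(String[] args) {
    int[] parseCount = {0};
    Supplier<Map<String, Map<String, Object>>> parser = () -> {
      parseCount[0]++;
      Map<String, Object> string = new LinkedHashMap<>();
      string.put("class", "solr.StrField");
      Map<String, Map<String, Object>> types = new LinkedHashMap<>();
      types.put("string", string);
      return types;
    };

    Map<String, Map<String, Object>> testA = motherSchema(parser).newWorkingCopy();
    testA.put("keywordType", new LinkedHashMap<>()); // test-local addition
    Map<String, Map<String, Object>> testB = motherSchema(parser).newWorkingCopy();

    System.out.println(parseCount[0]);                    // prints 1: parsed only once
    System.out.println(testB.containsKey("keywordType")); // prints false: A's addition did not leak
  }
}
{code}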

In the patch I commented out the loading of the mother schema while I try out
different combinations and techniques for loading it. I am sure there is a
way, and I'd welcome some pointers. I was also thinking about reading the
schema with a plain XML reader, though I'm not sure that is a good approach.
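For reference, the plain-XML option needs nothing beyond the JDK's DOM parser. This minimal sketch (PlainXmlSchemaReader and attributePairs are hypothetical names) pulls name/class pairs for fieldType elements and name/type pairs for field elements. It yields plain data rather than a live IndexSchema, ignores analyzers, copyFields and dynamic fields, and matches only the exact element name it is given.

{code:java}
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PlainXmlSchemaReader {

  /**
   * Collects keyAttr -> valueAttr for every element named tag, in document order.
   * getElementsByTagName matches at any depth, so fieldTypes nested inside a
   * {@code <types>} wrapper are found as well.
   */
  static Map<String, String> attributePairs(String schemaXml, String tag,
      String keyAttr, String valueAttr) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new InputSource(new StringReader(schemaXml)));
    NodeList nodes = doc.getElementsByTagName(tag);
    Map<String, String> pairs = new LinkedHashMap<>();
    for (int i = 0; i < nodes.getLength(); i++) {
      Element element = (Element) nodes.item(i);
      pairs.put(element.getAttribute(keyAttr), element.getAttribute(valueAttr));
    }
    return pairs;
  }

  public static void main(String[] args) throws Exception {
    String schemaXml = "<schema name=\"mother\" version=\"1.6\">"
        + "<fieldType name=\"string\" class=\"solr.StrField\"/>"
        + "<fieldType name=\"int\" class=\"solr.TrieIntField\"/>"
        + "<field name=\"id\" type=\"string\" indexed=\"true\" stored=\"true\"/>"
        + "</schema>";
    System.out.println(attributePairs(schemaXml, "fieldType", "name", "class"));
    // prints {string=solr.StrField, int=solr.TrieIntField}
    System.out.println(attributePairs(schemaXml, "field", "name", "type"));
    // prints {id=string}
  }
}
{code}

The trade-off is that anything read this way still has to be pushed into a core (for example through the Schema API) before tests can use it.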

> See what it would take to shift many of our one-off schemas used for testing 
> to managed schema and construct them as part of the tests
> --------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-10229
>                 URL: https://issues.apache.org/jira/browse/SOLR-10229
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>            Priority: Minor
>         Attachments: SOLR-10229.patch, SOLR-10229.patch
>
>
> The test schema files are intimidating. There are about a zillion of them, 
> and making a change in any of them risks breaking some _other_ test. That 
> leaves people three choices:
> 1> add what they need to some existing schema. Which makes schemas bigger and 
> bigger and bigger.
> 2> create a new schema file, adding to the proliferation thereof.
> 3> Look through all the existing tests to see if they have something that 
> works.
> The recent work on LUCENE-7705 is a case in point. We're adding a maxLen 
> parameter to some tokenizers. Putting those parameters into any of the 
> existing schemas, especially to test < 255 char tokens is virtually 
> guaranteed to break other tests, so the only safe thing to do is make another 
> schema file. Adding to the multiplication of files.
> As part of SOLR-5260 I tried creating the schema on the fly rather than 
> creating a new static schema file and it's not hard. WDYT about making this 
> into some better thought-out utility? 
> At present, this is pretty fuzzy, I wanted to get some reactions before 
> putting much effort into it. I expect that the utility methods would 
> eventually get a bunch of canned types. It's reasonably straightforward for 
> primitive types, if lengthy. But when you get into solr.TextField-based types 
> it gets less straightforward.
> We could manage to just move the "intimidation" from the plethora of schema 
> files to a zillion fieldTypes in the utility to choose from...
> Also, forcing every test to define the fields up-front is arguably less 
> convenient than just having _some_ canned schemas we can use. And erroneous 
> schemas to test failure modes are probably not very good fits for any such 
> framework.
> [~steve_rowe] and [~hossman_luc...@fucit.org] in particular might have 
> something to say.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
