[jira] [Comment Edited] (SOLR-10229) See what it would take to shift many of our one-off schemas used for testing to managed schema and construct them as part of the tests

Erick Erickson (JIRA) Mon, 13 Mar 2017 09:27:59 -0700

    [ https://issues.apache.org/jira/browse/SOLR-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15907767#comment-15907767 ]


Erick Erickson edited comment on SOLR-10229 at 3/13/17 4:26 PM:
----------------------------------------------------------------

Amrit and I were chatting offline. Using the managed-schema API to create 
fieldTypes inline is certainly possible, but I find the resulting code very 
hard to understand at a glance. And the goal here is to provide something 
that's not hard to write, especially if you're new.

Any interface that allows us to add fieldTypes immediately gets complex since 
fieldTypes are much more complex structurally than fields.

So what do you think about providing a "translator" method that would take the 
XML definition (a static string in the test case if needed), transform it 
into a managed schema definition, and submit it? You'd have something like 
(yeah, the internal double quotes are not escaped):
{code}
static String newField = "<fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>";

// then somewhere in the test:
FieldUtility.addFieldType(newField);
{code}
The string could possibly contain multiple entries. I'd find this much easier 
to understand than the raw addSchema code.
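
For illustration, here is a minimal sketch of what such a translator might look 
like. It is not existing Solr code: the class name FieldTypeTranslator and the 
method toAddFieldTypeJson are hypothetical, and it assumes the target is the 
JSON body of a Schema API "add-field-type" command. It only handles attributes 
on the fieldType plus charFilter/tokenizer/filter chains (untyped, "index" or 
"query" analyzers), emits every value as a JSON string, and accepts a single 
fieldType per string. Actually submitting the command is left out.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper (not an existing Solr class): translates one
// schema.xml-style <fieldType> snippet into an "add-field-type" JSON command.
final class FieldTypeTranslator {

  static String toAddFieldTypeJson(String fieldTypeXml) {
    Element root;
    try {
      root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(fieldTypeXml)))
          .getDocumentElement();
    } catch (Exception e) {
      throw new IllegalArgumentException("not well-formed XML: " + fieldTypeXml, e);
    }
    if (!"fieldType".equals(root.getTagName())) {
      throw new IllegalArgumentException("expected <fieldType>, got <" + root.getTagName() + ">");
    }
    Map<String, Object> type = attributes(root);
    for (Element analyzer : children(root, "analyzer")) {
      Map<String, Object> chain = attributes(analyzer);
      Object kind = chain.remove("type"); // "index", "query", or absent
      if (kind != null && !kind.equals("index") && !kind.equals("query")) {
        throw new IllegalArgumentException("unsupported analyzer type: " + kind);
      }
      List<Map<String, Object>> charFilters = components(analyzer, "charFilter");
      List<Map<String, Object>> tokenizers = components(analyzer, "tokenizer");
      List<Map<String, Object>> filters = components(analyzer, "filter");
      if (!charFilters.isEmpty()) chain.put("charFilters", charFilters);
      if (!tokenizers.isEmpty()) chain.put("tokenizer", tokenizers.get(0));
      if (!filters.isEmpty()) chain.put("filters", filters);
      type.put(kind == null ? "analyzer" : kind + "Analyzer", chain);
    }
    Map<String, Object> command = new LinkedHashMap<>();
    command.put("add-field-type", type);
    StringBuilder out = new StringBuilder();
    write(command, out);
    return out.toString();
  }

  // Direct child elements with the given tag name, in document order.
  private static List<Element> children(Element parent, String tag) {
    List<Element> result = new ArrayList<>();
    for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
      if (n instanceof Element && tag.equals(((Element) n).getTagName())) {
        result.add((Element) n);
      }
    }
    return result;
  }

  // One attribute map per matching child element.
  private static List<Map<String, Object>> components(Element parent, String tag) {
    List<Map<String, Object>> result = new ArrayList<>();
    for (Element child : children(parent, tag)) {
      result.add(attributes(child));
    }
    return result;
  }

  // XML attributes as a map, sorted by name so the output is deterministic.
  private static Map<String, Object> attributes(Element e) {
    Map<String, Object> sorted = new TreeMap<>();
    NamedNodeMap attrs = e.getAttributes();
    for (int i = 0; i < attrs.getLength(); i++) {
      sorted.put(attrs.item(i).getNodeName(), attrs.item(i).getNodeValue());
    }
    return new LinkedHashMap<>(sorted);
  }

  // Tiny JSON writer for nested maps, lists and strings (all scalars are quoted).
  private static void write(Object value, StringBuilder out) {
    if (value instanceof Map) {
      out.append('{');
      String sep = "";
      for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
        out.append(sep);
        write(entry.getKey().toString(), out);
        out.append(':');
        write(entry.getValue(), out);
        sep = ",";
      }
      out.append('}');
    } else if (value instanceof List) {
      out.append('[');
      String sep = "";
      for (Object item : (List<?>) value) {
        out.append(sep);
        write(item, out);
        sep = ",";
      }
      out.append(']');
    } else {
      out.append('"');
      for (char c : value.toString().toCharArray()) {
        if (c == '"' || c == '\\') out.append('\\').append(c);
        else if (c < 0x20) out.append(String.format("\\u%04x", (int) c));
        else out.append(c);
      }
      out.append('"');
    }
  }
}
```

A test would call FieldTypeTranslator.toAddFieldTypeJson(newField) with the 
properly escaped XML string and hand the result to whatever submits Schema API 
commands. A utility like the FieldUtility.addFieldType above could wrap both 
steps, which keeps the readable XML in the test and the translation in one place.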

So that leaves us with a "regular" schema file with a series of pre-defined 
base types. I'm thinking the primitive types, int, tint, string, date, boolean 
and the like. Also it'd have a few "common" text-based field types. One 
challenge would be to keep the text-based types from sprawling in the "one base 
schema". We'd still have all the bad-schema files, I'd guess, and maybe 
the minimal version(s). For the rest we could remove them and put special 
fieldTypes into the respective test files.

Mostly throwing this out there for discussion. I agree it's an extra step to 
have a parser take the string and turn it into the addSchema commands, but 
it also makes someone else's code much easier to scan.


> See what it would take to shift many of our one-off schemas used for testing 
> to managed schema and construct them as part of the tests
> --------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-10229
>                 URL: https://issues.apache.org/jira/browse/SOLR-10229
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Erick Erickson
>            Priority: Minor
>
> The test schema files are intimidating. There are about a zillion of them, 
> and making a change in any of them risks breaking some _other_ test. That 
> leaves people three choices:
> 1> add what they need to some existing schema. Which makes schemas bigger and 
> bigger and bigger.
> 2> create a new schema file, adding to the proliferation thereof.
> 3> Look through all the existing tests to see if they have something that 
> works.
> The recent work on LUCENE-7705 is a case in point. We're adding a maxLen 
> parameter to some tokenizers. Putting those parameters into any of the 
> existing schemas, especially to test < 255 char tokens is virtually 
> guaranteed to break other tests, so the only safe thing to do is make another 
> schema file. Adding to the multiplication of files.
> As part of SOLR-5260 I tried creating the schema on the fly rather than 
> creating a new static schema file and it's not hard. WDYT about making this 
> into some better thought-out utility? 
> At present, this is pretty fuzzy, I wanted to get some reactions before 
> putting much effort into it. I expect that the utility methods would 
> eventually get a bunch of canned types. It's reasonably straightforward for 
> primitive types, if lengthy. But when you get into solr.TextField-based types 
> it gets less straight-forward.
> We could manage to just move the "intimidation" from the plethora of schema 
> files to a zillion fieldTypes in the utility to choose from...
> Also, forcing every test to define the fields up-front is arguably less 
> convenient than just having _some_ canned schemas we can use. And erroneous 
> schemas to test failure modes are probably not very good fits for any such 
> framework.
> [~steve_rowe] and [~hossman_luc...@fucit.org] in particular might have 
> something to say.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
