[jira] Commented: (SOLR-1803) ExtractingRequestHandler does not propagate multiple values to a multi-valued field
[ https://issues.apache.org/jira/browse/SOLR-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845656#action_12845656 ]

Lance Norskog commented on SOLR-1803:
-------------------------------------

Actually the problem is that the effect of combining params and generated values is not defined well. I suggest that the semantics should be: a param is treated exactly like a generated field. Under this theory, these are the test cases:

literal.single_s=abc and no generated single_s data:
<str name="single_s">abc</str>

literal.single_s=abc and generated data def:
<str name="single_s">abc def</str>

literal.multi_s=abc and generated data def:
<arr name="multi_s">
  <str>abc</str>
  <str>def</str>
</arr>

Is this a coherent and useful semantics?

ExtractingRequestHandler does not propagate multiple values to a multi-valued field
-----------------------------------------------------------------------------------

Key: SOLR-1803
URL: https://issues.apache.org/jira/browse/SOLR-1803
Project: Solr
Issue Type: Bug
Components: contrib - Solr Cell (Tika extraction)
Reporter: Lance Norskog
Priority: Minor
Attachments: display-extracting-bug.patch

When multiple values for one field are extracted from a document, only the last value is stored in the document. If one or more values are given as parameters, those values are all stored.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
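Editorial sketch (not part of the JIRA comment, and not Solr code): the semantics Lance proposes in the comment above can be modeled in a few lines of Python. The function name combine_values, its parameters, and the single-space separator for single-valued fields are assumptions read off the "abc def" test case; the actual ExtractingRequestHandler may behave differently.

```python
from typing import List, Optional, Union

FieldValue = Union[str, List[str]]


def combine_values(literals: List[str],
                   generated: List[str],
                   multi_valued: bool) -> Optional[FieldValue]:
    """Toy model of the proposed rule: a literal param is treated
    exactly like a generated (extracted) value."""
    # Literals and generated values go into one pool, literals first.
    values = list(literals) + list(generated)
    if not values:
        return None  # nothing to store for this field
    if multi_valued:
        # Multi-valued field: every value is stored separately.
        return values
    # Single-valued field: values are joined into one string
    # (the space separator is an assumption from the "abc def" case).
    return " ".join(values)


if __name__ == "__main__":
    print(combine_values(["abc"], [], multi_valued=False))
    print(combine_values(["abc"], ["def"], multi_valued=False))
    print(combine_values(["abc"], ["def"], multi_valued=True))
```

The three calls correspond, in order, to the three test cases listed in the comment.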
[jira] Commented: (SOLR-1803) ExtractingRequestHandler does not propagate multiple values to a multi-valued field
[ https://issues.apache.org/jira/browse/SOLR-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845691#action_12845691 ]

Hoss Man commented on SOLR-1803:
--------------------------------

Lance: I agree that the current semantics are either poorly defined or not very useful, but your suggestion seems to overlook what are probably the two most common cases:

* to have literal values that overwrite/replace extracted values
* to have literal values that act as defaults unless extracted values are found

...those seem like they should both be possible for single and multivalued fields.
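Editorial sketch (not part of the JIRA comment, and not Solr code): the two cases Hoss describes, plus the "treat literals like extracted values" rule from the earlier comment, can be written as three resolution modes. The function resolve_values and the mode names "override", "default", and "append" are hypothetical labels chosen for illustration; they are not Solr parameters.

```python
from typing import List


def resolve_values(literals: List[str],
                   extracted: List[str],
                   mode: str) -> List[str]:
    """Toy model of how literal params could interact with extracted values.

    mode is a hypothetical switch:
      "override" - literals replace any extracted values
      "default"  - literals are used only if nothing was extracted
      "append"   - literals and extracted values are both kept
    """
    if mode == "override":
        return list(literals) if literals else list(extracted)
    if mode == "default":
        return list(extracted) if extracted else list(literals)
    if mode == "append":
        return list(literals) + list(extracted)
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    for mode in ("override", "default", "append"):
        print(mode, resolve_values(["abc"], ["def"], mode))
```

Because the function returns a list in every mode, the same rule applies to single-valued and multi-valued fields; how a single-valued field then stores more than one value is a separate decision.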
[jira] Commented: (SOLR-1803) ExtractingRequestHandler does not propagate multiple values to a multi-valued field
[ https://issues.apache.org/jira/browse/SOLR-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845697#action_12845697 ]

Mark Miller commented on SOLR-1803:
-----------------------------------

bq. Actually the problem is that the effect of combining params and generated values is not defined well.

Your tests and summary don't appear to try to cover this ... should we update the Title and Description?

bq. I suggest that the semantics should be, a param is treated exactly like a generated field.

Have you tested that this is not the case? When I look at the code, it appears to me that it does what your proposed semantics say - params are treated like generated fields when adding multiple fields or concatenating - I have not tested this, but that's what the code looks like it's doing ...
[jira] Commented: (SOLR-1803) ExtractingRequestHandler does not propagate multiple values to a multi-valued field
[ https://issues.apache.org/jira/browse/SOLR-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12843367#action_12843367 ]

Mark Miller commented on SOLR-1803:
-----------------------------------

A few comments:

1. If you want to test multi-valued field behavior, you really need to use a field that is multi-valued. multi_s is not, as far as I can see.

2. You search for multi_s:value1 and multi_s:value2, but where do you ever add them? It seems valid that they are not found.

3. Your test (if/when written correctly) will pass whether the field is multi-valued or not - i.e. it won't test whether the params were really added as a multi-valued field. Solr Cell does add multiple literals as multiple values when the field is actually multi-valued, but when it's not, it just concatenates the values - so those searches would still pass.
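Editorial sketch (not part of the JIRA comment, and not Solr code): point 3 above can be illustrated with a toy model. It assumes a field whose values are split on whitespace at search time, which is a simplification and not Solr's actual analysis chain. The helper names term_matches and stored_value_count are hypothetical.

```python
from typing import List


def term_matches(stored_values: List[str], term: str) -> bool:
    """Toy term search: a document matches if the term equals any
    whitespace-separated token of any stored value."""
    return any(term in value.split() for value in stored_values)


def stored_value_count(stored_values: List[str]) -> int:
    """Number of distinct stored values for the field."""
    return len(stored_values)


# Field stored as two separate values (true multi-valued behavior).
multi_valued = ["value1", "value2"]
# Field stored as one concatenated value (single-valued behavior).
concatenated = ["value1 value2"]

if __name__ == "__main__":
    for name, stored in (("multi-valued", multi_valued),
                         ("concatenated", concatenated)):
        print(name,
              term_matches(stored, "value1"),
              term_matches(stored, "value2"),
              stored_value_count(stored))
```

In this model both storage forms match the two term searches, so a search-only test cannot tell them apart. A test that also checks the number of stored values (or inspects the stored field in the response) does distinguish them.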
[jira] Commented: (SOLR-1803) ExtractingRequestHandler does not propagate multiple values to a multi-valued field
[ https://issues.apache.org/jira/browse/SOLR-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12843372#action_12843372 ]

Mark Miller commented on SOLR-1803:
-----------------------------------

Strike number one - I didn't realize that the test schema for extraction is version 1.0, where fields are multi-valued by default.

So I'd address 2 and 3: add the values value1 and value2. That will get the test passing. You'd still need to devise a test that tells you it's making a multi-valued field rather than concatenating as well.
[jira] Commented: (SOLR-1803) ExtractingRequestHandler does not propagate multiple values to a multi-valued field
[ https://issues.apache.org/jira/browse/SOLR-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840472#action_12840472 ]

Lance Norskog commented on SOLR-1803:
-------------------------------------

The attached patch does not fix the bug. It causes the ExtractingRequestHandler test program to illustrate the bug. Line 90 in TestExtractingRequestHandler.java should succeed and does not.

[SOLR-1633|http://issues.apache.org/jira/browse/SOLR-1633] comments on a related behavior. This test patch also checks for that behavior. Since both this issue and SOLR-1633 are in the same area, the work should be combined.