[jira] Commented: (SOLR-1803) ExtractingRequestHandler does not propagate multiple values to a multi-valued field

2010-03-15 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12845656#action_12845656
 ] 

Lance Norskog commented on SOLR-1803:
-

Actually, the problem is that the effect of combining params and generated 
values is not well defined. I suggest that the semantics should be: a param is 
treated exactly like a generated field.

Under this theory, these are the test cases:

literal.single_s=abc and no generated single_s data:
<str name="single_s">abc</str>

literal.single_s=abc and generated data def:
<str name="single_s">abc def</str>

literal.multi_s=abc and generated data def:
<arr name="multi_s">
  <str>abc</str>
  <str>def</str>
</arr>

Is this a coherent and useful semantics? 
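The three cases above can be captured in a small Python sketch of the proposed rule. The function `merge_as_generated` and its arguments are hypothetical; it models the suggested semantics only and is not the ExtractingRequestHandler code. Putting literal values ahead of generated ones is an assumption read off the examples.

```python
def merge_as_generated(literal_values, generated_values, multi_valued):
    """Proposed semantics (sketch): a literal param behaves exactly like a generated value."""
    # Assumption: literal values come first, followed by values generated by extraction.
    values = list(literal_values) + list(generated_values)
    if not values:
        return None
    if multi_valued:
        # Multi-valued field: every value becomes a separate entry.
        return values
    # Single-valued field: all values are concatenated with a space.
    return " ".join(values)


print(merge_as_generated(["abc"], [], multi_valued=False))       # abc
print(merge_as_generated(["abc"], ["def"], multi_valued=False))  # abc def
print(merge_as_generated(["abc"], ["def"], multi_valued=True))   # ['abc', 'def']
```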

 ExtractingRequestHandler does not propagate multiple values to a multi-valued 
 field
 ---

 Key: SOLR-1803
 URL: https://issues.apache.org/jira/browse/SOLR-1803
 Project: Solr
  Issue Type: Bug
  Components: contrib - Solr Cell (Tika extraction)
Reporter: Lance Norskog
Priority: Minor
 Attachments: display-extracting-bug.patch


 When multiple values for one field are extracted from a document, only the 
 last value is stored in the document. If one or more values are given as 
 parameters, those values are all stored.
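A minimal Python sketch of the reported symptom, assuming extracted values are gathered into a per-field map: a plain assignment keeps only the last value for a field, while appending to a list keeps them all. Both function names are hypothetical and are not taken from the Solr Cell source.

```python
from collections import defaultdict


def collect_overwriting(pairs):
    # Buggy pattern: each new value replaces the previous one for the same
    # field, so only the last extracted value survives.
    fields = {}
    for name, value in pairs:
        fields[name] = value
    return fields


def collect_accumulating(pairs):
    # Expected pattern for a multi-valued field: every extracted value is
    # kept, in the order it was seen.
    fields = defaultdict(list)
    for name, value in pairs:
        fields[name].append(value)
    return dict(fields)


extracted = [("author", "Alice"), ("author", "Bob")]
print(collect_overwriting(extracted))   # {'author': 'Bob'}
print(collect_accumulating(extracted))  # {'author': ['Alice', 'Bob']}
```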

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1803) ExtractingRequestHandler does not propagate multiple values to a multi-valued field

2010-03-15 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12845691#action_12845691
 ] 

Hoss Man commented on SOLR-1803:


Lance: I agree that the current semantics are either poorly defined or not 
very useful, but your suggestion seems to overlook what are probably the 
two most common cases:
 * to have literal values that overwrite/replace extracted values
 * to have literal values that act as defaults unless extracted values are 
found
...those seem like they should both be possible for single-valued and 
multivalued fields.
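The two cases above, alongside the treat-like-generated behavior, could be expressed as a merge mode. This is a Python sketch only: the `mode` values "append", "override" and "default" are hypothetical names, not existing Solr Cell parameters, and the rule that "override" falls back to extracted values when no literal is given is an assumption.

```python
def merge_literal(literal_values, extracted_values, multi_valued, mode):
    """Combine literal and extracted values under a hypothetical merge mode.

    mode: "append"   - literals are treated like extracted values
          "override" - literals replace any extracted values
          "default"  - literals are used only when nothing was extracted
    """
    literal_values = list(literal_values)
    extracted_values = list(extracted_values)
    if mode == "append":
        values = literal_values + extracted_values
    elif mode == "override":
        # Assumption: with no literal given, the extracted values stand.
        values = literal_values if literal_values else extracted_values
    elif mode == "default":
        values = extracted_values if extracted_values else literal_values
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    if not values:
        return None
    # Multi-valued fields keep separate entries; single-valued fields concatenate.
    return values if multi_valued else " ".join(values)


print(merge_literal(["abc"], ["def"], False, "override"))  # abc
print(merge_literal(["abc"], ["def"], True, "default"))    # ['def']
print(merge_literal(["abc"], [], True, "default"))         # ['abc']
```

Both of the requested behaviors then work the same way for single-valued and multivalued fields; only the final step differs.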




[jira] Commented: (SOLR-1803) ExtractingRequestHandler does not propagate multiple values to a multi-valued field

2010-03-15 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12845697#action_12845697
 ] 

Mark Miller commented on SOLR-1803:
---

bq. Actually the problem is that the effect of combining params and generated 
values is not defined well.

Your tests and summary don't appear to try and cover this ... should we update 
the Title and Description?

bq.  I suggest that the semantics should be, a param is treated exactly like a 
generated field.

Have you tested that this is not the case? When I look at the code, it appears 
to me that it already does what your proposed semantics say: params are 
treated like generated fields, both when adding multiple fields and when 
concatenating. I have not tested this, but that's what the code looks like 
it's doing ...




[jira] Commented: (SOLR-1803) ExtractingRequestHandler does not propagate multiple values to a multi-valued field

2010-03-09 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12843367#action_12843367
 ] 

Mark Miller commented on SOLR-1803:
---

A few comments:

1. If you want to test multi-valued field stuff, you really need to use a field 
that is multi-valued. multi_s is not, as far as I can see.

2. You search for multi_s:value1 and multi_s:value2, but where do you ever add 
them? It seems valid that they are not found.

3. Your test (if/when written correctly) will pass whether the field is 
multivalued or not, i.e. it won't test whether the params were really added as 
multiple values. Solr Cell does add multiple literals as separate values when 
the field is actually multi-valued, but when it's not, it just concatenates the 
values, so those searches would still pass.
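Point 3 means a query such as multi_s:value1 cannot tell the two outcomes apart; the test has to inspect the stored value itself. A Python sketch, assuming the document is fetched with Solr's JSON response writer (wt=json), which returns the stored values of a multi-valued field as a JSON array. The helper name and the sample documents are made up for illustration.

```python
import json


def stored_as_separate_values(doc, field, expected):
    """Return True only if `field` holds exactly `expected` as separate values."""
    value = doc.get(field)
    # Separate values come back as a JSON array; a concatenated value is a
    # single string such as "value1 value2" (or a one-element array of it).
    return isinstance(value, list) and value == list(expected)


multi = json.loads('{"id": "one", "multi_s": ["value1", "value2"]}')
concatenated = json.loads('{"id": "one", "multi_s": "value1 value2"}')
print(stored_as_separate_values(multi, "multi_s", ["value1", "value2"]))         # True
print(stored_as_separate_values(concatenated, "multi_s", ["value1", "value2"]))  # False
```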




[jira] Commented: (SOLR-1803) ExtractingRequestHandler does not propagate multiple values to a multi-valued field

2010-03-09 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12843372#action_12843372
 ] 

Mark Miller commented on SOLR-1803:
---

Strike number one: I didn't realize that the test schema for extraction is 
version 1.0, where fields are multivalued by default.

So I'd address 2 and 3:

add the values value1 and value2.

That will get the test passing. You'd still need to devise a test that tells you 
whether it's making multiple values rather than concatenating.
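Adding both values means the test request has to carry the same literal twice. A Python sketch of the query string, assuming the standard Solr Cell `literal.<fieldname>` parameter repeated once per value; `doc1` is a placeholder id, and the handler it is sent to (typically mapped at `/update/extract`) is not shown.

```python
from urllib.parse import urlencode

# Repeating literal.<fieldname> supplies several literal values for one field.
# "doc1" is a placeholder document id.
params = [
    ("literal.id", "doc1"),
    ("literal.multi_s", "value1"),
    ("literal.multi_s", "value2"),
]
query = urlencode(params)
print(query)  # literal.id=doc1&literal.multi_s=value1&literal.multi_s=value2
```

Passing a list of pairs rather than a dict is what lets the same key appear twice.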




[jira] Commented: (SOLR-1803) ExtractingRequestHandler does not propagate multiple values to a multi-valued field

2010-03-02 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12840472#action_12840472
 ] 

Lance Norskog commented on SOLR-1803:
-

The attached patch does not fix the bug. It causes the ExtractingRequestHandler 
test program to illustrate the bug: line 90 in 
TestExtractingRequestHandler.java should succeed and does not.

[SOLR-1633|http://issues.apache.org/jira/browse/SOLR-1633] comments on a 
related behavior. This test patch also checks for that behavior. 

Since this issue and SOLR-1633 are in the same area, the work should be combined.
