[jira] [Updated] (SOLR-445) XmlUpdateRequestHandler bad documents mid batch aborts rest of batch

2011-04-17 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-445:


Attachment: SOLR-445.patch

So, Grant, how do you feel about refactorings <G>?

I got bitten by this problem again, so I decided to dust off the patch and re-create it. 
This one shouldn't have the gratuitous re-formatting. But after I added the bookkeeping, 
the method got even more unwieldy, so I extracted some of the code into methods in 
XMLLoader. I also have the un-refactored version if this one is too painful.

This patch incorporates the changes you suggested months ago. I'm a little uncertain 
whether UpdateParams.java was the right place for the new constant, but it seemed to 
follow the pattern used for other parameters.

One minor issue: the behavior here is the same as it used to be if you don't start the 
packet with <add>: an NPE is thrown. That's because the addCmd variable isn't initialized 
until the <add> tag is encountered, and the NPE comes from using addCmd later (I think I 
was seeing it at line 118). I think it would be better to fail explicitly if the first 
element isn't an <add> element, rather than failing because that just happens to cause an NPE.
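
Just to illustrate what I mean, here's a self-contained sketch of the fail-fast idea (not 
the actual XMLLoader code; the class name is made up and addCmd below is a stand-in for 
the real AddUpdateCommand):

import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class FailFastSketch {
  public static void main(String[] args) throws Exception {
    String packet = "<doc><field name=\"id\">1</field></doc>"; // no enclosing <add>
    XMLStreamReader parser =
        XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(packet));
    Object addCmd = null; // stand-in for the real AddUpdateCommand
    while (parser.hasNext()) {
      if (parser.next() != XMLStreamConstants.START_ELEMENT) continue;
      String tag = parser.getLocalName();
      if ("add".equals(tag)) {
        addCmd = new Object(); // the real code creates the AddUpdateCommand here
      } else if ("doc".equals(tag) && addCmd == null) {
        // Explicit, descriptive failure instead of an NPE further down.
        throw new IllegalArgumentException(
            "<doc> encountered before <add>; the packet must start with <add>");
      }
    }
  }
}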

While I'm at it, though, what do you think about making this robust enough to ignore 
<?xml?> and/or <!DOCTYPE> entries? Or is that just not worth the bother?

Erick

 XmlUpdateRequestHandler bad documents mid batch aborts rest of batch
 

 Key: SOLR-445
 URL: https://issues.apache.org/jira/browse/SOLR-445
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 1.3
Reporter: Will Johnson
Assignee: Grant Ingersoll
 Fix For: Next

 Attachments: SOLR-445-3_x.patch, SOLR-445.patch, SOLR-445.patch, 
 SOLR-445.patch, SOLR-445.patch, SOLR-445_3x.patch, solr-445.xml


 Has anyone run into the problem of handling bad documents / failures mid 
 batch.  Ie:
 <add>
   <doc>
     <field name="id">1</field>
   </doc>
   <doc>
     <field name="id">2</field>
     <field name="myDateField">I_AM_A_BAD_DATE</field>
   </doc>
   <doc>
     <field name="id">3</field>
   </doc>
 </add>
 Right now solr adds the first doc and then aborts.  It would seem like it 
 should either fail the entire batch or log a message/return a code and then 
 continue on to add doc 3.  Option 1 would seem to be much harder to 
 accomplish and possibly require more memory while Option 2 would require more 
 information to come back from the API.  I'm about to dig into this but I 
 thought I'd ask to see if anyone had any suggestions, thoughts or comments.   
  



[jira] Updated: (SOLR-445) XmlUpdateRequestHandler bad documents mid batch aborts rest of batch

2011-01-23 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-445:


Attachment: SOLR-445_3x.patch
SOLR-445.patch

OK, I think this is ready to go if someone wants to take a look and commit.

This patch adds the ability to keep processing documents after the first failure, per 
Erik H's comments. The default is the old behavior of stopping at the first error.

I also changed the example solrconfig.xml to include the new parameter, set to false 
(mimicking the old behavior), in both 3x and trunk.
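
For anyone who wants to try it from SolrJ, passing the flag per request would look 
something like the sketch below. Note that the parameter name here is just a placeholder; 
the real name is whatever the UpdateParams constant in the patch defines.

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

public class ContinueOnErrorSketch {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "1");

    UpdateRequest req = new UpdateRequest();
    req.add(doc);
    // Placeholder name -- use the constant from UpdateParams in the patch instead.
    req.setParam("update.abortOnFirstError", "false");
    req.process(server);
    server.commit();
  }
}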



 XmlUpdateRequestHandler bad documents mid batch aborts rest of batch
 

 Key: SOLR-445
 URL: https://issues.apache.org/jira/browse/SOLR-445
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 1.3
Reporter: Will Johnson
Assignee: Erick Erickson
 Fix For: Next

 Attachments: SOLR-445-3_x.patch, SOLR-445.patch, SOLR-445.patch, 
 SOLR-445.patch, solr-445.xml, SOLR-445_3x.patch


 Has anyone run into the problem of handling bad documents / failures mid 
 batch.  Ie:
 <add>
   <doc>
     <field name="id">1</field>
   </doc>
   <doc>
     <field name="id">2</field>
     <field name="myDateField">I_AM_A_BAD_DATE</field>
   </doc>
   <doc>
     <field name="id">3</field>
   </doc>
 </add>
 Right now solr adds the first doc and then aborts.  It would seem like it 
 should either fail the entire batch or log a message/return a code and then 
 continue on to add doc 3.  Option 1 would seem to be much harder to 
 accomplish and possibly require more memory while Option 2 would require more 
 information to come back from the API.  I'm about to dig into this but I 
 thought I'd ask to see if anyone had any suggestions, thoughts or comments.   
  



[jira] Updated: (SOLR-445) XmlUpdateRequestHandler bad documents mid batch aborts rest of batch

2011-01-18 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-445:


Attachment: SOLR-445-3_x.patch
SOLR-445.patch

I think it's ready for review, both trunk and 3_x. Would someone look this over 
and commit it if they think it's ready?

Note to self: do NOT call initCore in a test case just because you need a 
different schema.

The problem I was having with running the tests was that I needed a schema file with a 
required field, so I naively called initCore with schema11.xml in spite of the fact that 
@BeforeClass had already called it with just schema.xml. That apparently does bad things 
to the state of *something* and caused other tests to fail: I can get 
TestDistributedSearch to fail on unchanged source code simply by calling initCore with 
schema11.xml, and doing nothing else, in a new test case in BasicFunctionalityTest. So I 
put my new tests that require schema11.xml in a new file instead.
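
For the record, that "new file" looks roughly like this (the class and test names are 
made up here, not what's in the patch):

import org.apache.solr.SolrTestCaseJ4;
import org.junit.BeforeClass;
import org.junit.Test;

public class BadDocsInBatchTest extends SolrTestCaseJ4 {
  @BeforeClass
  public static void beforeClass() throws Exception {
    // Initialize once, for this class only, with the schema that has a required field.
    initCore("solrconfig.xml", "schema11.xml");
  }

  @Test
  public void testCoreComesUpWithSchema11() {
    // Placeholder; the real tests exercise the new error bookkeeping.
    assertU(commit());
  }
}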

The attached XML file is not intended to be committed; it's just a convenience for anyone 
checking out this patch, to run against a Solr instance and see what is returned.

This seems to return the data in the SolrJ case as well.

NOTE: This does change the behavior of Solr. Without this patch, the first incorrect 
document stops processing; now it merrily continues on, adding whatever documents it can. 
Is this desirable behavior? It would be easy to abort on the first error if that's the 
consensus, and I could take some tedious record-keeping out. I don't think there's a big 
problem with continuing on, since the state of committed documents is already 
indeterminate when errors occur, so worrying about this should be part of a bigger issue.

 XmlUpdateRequestHandler bad documents mid batch aborts rest of batch
 

 Key: SOLR-445
 URL: https://issues.apache.org/jira/browse/SOLR-445
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 1.3
Reporter: Will Johnson
Assignee: Erick Erickson
 Fix For: Next

 Attachments: SOLR-445-3_x.patch, SOLR-445.patch, SOLR-445.patch, 
 solr-445.xml


 Has anyone run into the problem of handling bad documents / failures mid 
 batch.  Ie:
 <add>
   <doc>
     <field name="id">1</field>
   </doc>
   <doc>
     <field name="id">2</field>
     <field name="myDateField">I_AM_A_BAD_DATE</field>
   </doc>
   <doc>
     <field name="id">3</field>
   </doc>
 </add>
 Right now solr adds the first doc and then aborts.  It would seem like it 
 should either fail the entire batch or log a message/return a code and then 
 continue on to add doc 3.  Option 1 would seem to be much harder to 
 accomplish and possibly require more memory while Option 2 would require more 
 information to come back from the API.  I'm about to dig into this but I 
 thought I'd ask to see if anyone had any suggestions, thoughts or comments.   
  



[jira] Updated: (SOLR-445) XmlUpdateRequestHandler bad documents mid batch aborts rest of batch

2011-01-12 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-445:


Attachment: solr-445.xml
SOLR-445.patch

Here's a cut at an improvement at least.

The attached XML file contains an <add> packet with a number of documents illustrating a 
number of errors. The XML file can be POSTed to Solr for indexing via the post.jar file, 
so you can see the output.

This patch attempts to report back to the user the following for each document that failed:
1. the ordinal position in the file where the error occurred (e.g. the first, second, etc. <doc> tag).
2. the uniqueKey, if available.
3. the error.

The general idea is to accrue the errors in a StringBuilder and eventually 
re-throw the error after processing as far as possible.
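
In other words, something like the following standalone sketch (plain Java, not the patch 
code itself; the real bookkeeping also records the ordinal position and uniqueKey 
mentioned above):

import java.util.Arrays;
import java.util.List;

public class AccrueErrorsSketch {
  public static void main(String[] args) {
    List<String> docs = Arrays.asList("good-1", "bad-2", "good-3", "bad-4");
    StringBuilder errors = new StringBuilder();
    int position = 0;
    for (String doc : docs) {
      position++;
      try {
        index(doc);
      } catch (RuntimeException e) {
        // Keep going, but remember which doc failed and why (pipe-delimited).
        errors.append("doc #").append(position)
              .append(" (id=").append(doc).append("): ")
              .append(e.getMessage()).append(" | ");
      }
    }
    if (errors.length() > 0) {
      // Re-throw once, after processing as far as possible.
      throw new RuntimeException("Some documents failed: " + errors);
    }
  }

  private static void index(String doc) {
    if (doc.startsWith("bad")) {
      throw new RuntimeException("bad date value");
    }
  }
}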

Issues:
1. The reported format in the log file is kind of hard to read. I pipe-delimited the 
various <doc> entries, but they run together in a Windows DOS window; what happens on 
Unix I'm not quite sure. Suggestions welcome.
2. As noted in the original post, rolling this back will be tricky. Very tricky. The 
autocommit feature makes it indeterminate what's been committed to the index, so I don't 
know how to even approach rolling back everything.
3. The intent here is to give the user a clue where to start when figuring out what 
document(s) failed, so they don't have to guess.
4. Tests fail, but I have no clue why. A freshly checked-out copy of trunk fails as well, 
so I don't think this patch is the cause of the errors. But let's not commit this until 
we can be sure.
5. What do you think about limiting the number of docs that can fail before quitting? One 
could imagine some ratio (say 10%) that has to be exceeded before quitting, with some 
safeguards, like not bothering to calculate the ratio until 20 docs have been processed. 
Or an absolute number. Should this be a parameter? Or hard-coded? The assumption is that 
if 10 (or 100, or...) docs fail, something is pretty fundamentally wrong and it's a waste 
to keep going. I don't have any strong feelings here; I can argue it either way. See the 
sketch after this list.
6. Sorry, all, but I reflexively hit the reformat keystrokes, so the raw patch may be 
hard to read. But I'm pretty firmly in the camp that you *have* to reformat as you go or 
the code will be held hostage to the last person who *didn't* format properly. I'm pretty 
sure I'm using the right codestyle.xml file, but let me know if not.
7. I doubt that this has any bearing on, say, SolrJ indexing. Should that be another bug 
(or is there one already)? Anybody got a clue where I'd look for that, since I'm in the 
area anyway?
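
Here's the kind of cutoff I'm picturing for 5 (just a sketch using the example numbers 
above; nothing in the patch does this yet):

public class FailureRatioSketch {
  // Abort only after a minimum number of docs has been seen and the failure
  // rate climbs above the threshold; both numbers would presumably be configurable.
  static boolean shouldAbort(int processed, int failed) {
    if (processed < 20) {
      return false;                             // safeguard: too few docs to judge
    }
    return failed / (double) processed > 0.10;  // quit once more than 10% have failed
  }

  public static void main(String[] args) {
    System.out.println(shouldAbort(10, 5));  // false: under the 20-doc safeguard
    System.out.println(shouldAbort(50, 3));  // false: 6% failure rate
    System.out.println(shouldAbort(50, 8));  // true: 16% failure rate
  }
}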

Erick

 XmlUpdateRequestHandler bad documents mid batch aborts rest of batch
 

 Key: SOLR-445
 URL: https://issues.apache.org/jira/browse/SOLR-445
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 1.3
Reporter: Will Johnson
Assignee: Erick Erickson
 Fix For: Next

 Attachments: SOLR-445.patch, solr-445.xml


 Has anyone run into the problem of handling bad documents / failures mid 
 batch.  Ie:
 <add>
   <doc>
     <field name="id">1</field>
   </doc>
   <doc>
     <field name="id">2</field>
     <field name="myDateField">I_AM_A_BAD_DATE</field>
   </doc>
   <doc>
     <field name="id">3</field>
   </doc>
 </add>
 Right now solr adds the first doc and then aborts.  It would seem like it 
 should either fail the entire batch or log a message/return a code and then 
 continue on to add doc 3.  Option 1 would seem to be much harder to 
 accomplish and possibly require more memory while Option 2 would require more 
 information to come back from the API.  I'm about to dig into this but I 
 thought I'd ask to see if anyone had any suggestions, thoughts or comments.   
  



[jira] Updated: (SOLR-445) XmlUpdateRequestHandler bad documents mid batch aborts rest of batch

2008-08-19 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-445:
---

Fix Version/s: 1.4

 XmlUpdateRequestHandler bad documents mid batch aborts rest of batch
 

 Key: SOLR-445
 URL: https://issues.apache.org/jira/browse/SOLR-445
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 1.3
Reporter: Will Johnson
Assignee: Grant Ingersoll
 Fix For: 1.4


 Has anyone run into the problem of handling bad documents / failures mid 
 batch.  Ie:
 <add>
   <doc>
     <field name="id">1</field>
   </doc>
   <doc>
     <field name="id">2</field>
     <field name="myDateField">I_AM_A_BAD_DATE</field>
   </doc>
   <doc>
     <field name="id">3</field>
   </doc>
 </add>
 Right now solr adds the first doc and then aborts.  It would seem like it 
 should either fail the entire batch or log a message/return a code and then 
 continue on to add doc 3.  Option 1 would seem to be much harder to 
 accomplish and possibly require more memory while Option 2 would require more 
 information to come back from the API.  I'm about to dig into this but I 
 thought I'd ask to see if anyone had any suggestions, thoughts or comments.   
  
