[jira] [Updated] (SOLR-445) XmlUpdateRequestHandler bad documents mid batch aborts rest of batch
[ https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-445:

    Attachment: SOLR-445.patch

So, Grant, how do you feel about refactorings? I got bitten by this problem again, so I decided to dust off the patch, and I re-created it. This one shouldn't have the gratuitous re-formatting. But after I added the bookkeeping, the method got even more unwieldy, so I extracted some of the code into methods in XMLLoader. I also have the un-refactored version if this one is too painful.

This patch incorporates the changes you suggested months ago. I'm a little uncertain whether putting a constant in UpdateParams.java was the correct place, but it seemed like the pattern used for other parameters.

One minor issue: the behavior is the same as it used to be if you don't start the packet with <add>: an NPE is thrown. That's because the addCmd variable isn't initialized until the <add> tag is encountered, and the NPE results from using addCmd later (I think I was seeing it at line 118). I think it would be better to fail explicitly if the first element isn't an <add> element, rather than failing only because that happens to cause an NPE.

While I'm at it, though, what do you think about making this robust enough to ignore <?xml ...?> and/or <!DOCTYPE ...> entries? Or is that just not worth the bother?

Erick

XmlUpdateRequestHandler bad documents mid batch aborts rest of batch

Key: SOLR-445
URL: https://issues.apache.org/jira/browse/SOLR-445
Project: Solr
Issue Type: Bug
Components: update
Affects Versions: 1.3
Reporter: Will Johnson
Assignee: Grant Ingersoll
Fix For: Next
Attachments: SOLR-445-3_x.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445_3x.patch, solr-445.xml

Has anyone run into the problem of handling bad documents / failures mid batch? I.e.:

<add>
  <doc>
    <field name="id">1</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="myDateField">I_AM_A_BAD_DATE</field>
  </doc>
  <doc>
    <field name="id">3</field>
  </doc>
</add>

Right now Solr adds the first doc and then aborts. It would seem like it should either fail the entire batch, or log a message/return a code and then continue on to add doc 3. Option 1 would seem to be much harder to accomplish and would possibly require more memory, while Option 2 would require more information to come back from the API. I'm about to dig into this, but I thought I'd ask to see if anyone had any suggestions, thoughts or comments.

--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
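The fail-fast behavior Erick proposes for a packet that doesn't start with <add> could look roughly like the sketch below. This is a hypothetical illustration, not the actual XMLLoader code; the class and method names here are invented, and the real patch works inside a larger StAX parsing loop.

```java
// Hypothetical sketch: reject an update packet whose first element is not
// <add>, instead of letting an uninitialized addCmd surface later as an NPE.
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;

public class UpdateStartCheck {

    // Returns the root element's local name, throwing a descriptive error
    // when it is anything other than "add" (or when the packet is empty).
    public static String requireAddRoot(String xml) {
        try {
            XMLStreamReader parser = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (parser.hasNext()) {
                if (parser.next() == XMLStreamConstants.START_ELEMENT) {
                    String name = parser.getLocalName();
                    if (!"add".equals(name)) {
                        throw new IllegalArgumentException(
                            "Update packet must start with <add>, found <" + name + ">");
                    }
                    return name;
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed update packet", e);
        }
        throw new IllegalArgumentException("Empty update packet");
    }
}
```

A check like this also gives a natural place to skip over <?xml ...?> prologs, since StAX reports those as separate events before the first START_ELEMENT.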
[jira] Updated: (SOLR-445) XmlUpdateRequestHandler bad documents mid batch aborts rest of batch
[ https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-445:

    Attachment: SOLR-445_3x.patch
                SOLR-445.patch

OK, I think this is ready to go if someone wants to take a look and commit. This patch includes the ability to turn on continuing to process documents after the first failure, as per Erik H's comments. The default is the old behavior of stopping at the first error. I changed the example solrconfig.xml to include the new parameter set to false (mimicking the old behavior) in both 3x and trunk.
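The solrconfig.xml change described above might look something like the fragment below. This is only a sketch: the parameter name `continueOnError` is a placeholder, since the real name is whatever constant the patch added to UpdateParams.java.

```xml
<!-- Hypothetical sketch of the example-config change described above.
     The parameter name is illustrative, not necessarily what the patch uses;
     false mimics the old behavior of aborting the batch on the first error. -->
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <bool name="continueOnError">false</bool>
  </lst>
</requestHandler>
```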
[jira] Updated: (SOLR-445) XmlUpdateRequestHandler bad documents mid batch aborts rest of batch
[ https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-445:

    Attachment: SOLR-445-3_x.patch
                SOLR-445.patch

I think it's ready for review, both trunk and 3_x. Would someone look this over and commit it if they think it's ready?

Note to self: do NOT call initCore in a test case just because you need a different schema. The problem I was having with running tests was that I needed a schema file with a required field, so I naively called initCore with schema11.xml in spite of the fact that @BeforeClass had already called it with just schema.xml. Which apparently does bad things to the state of *something* and caused other tests to fail... I can get TestDistributedSearch to fail on unchanged source code simply by calling initCore with schema11.xml and doing nothing else in a new test case in BasicFunctionalityTest. So I put my new tests that require schema11 in a new file instead.

The attached XML file is not intended to be committed; it is just a convenience for anyone checking out this patch to run against a Solr instance to see what is returned. This seems to return the data in the SolrJ case as well.

NOTE: This does change the behavior of Solr. Without this patch, the first incorrect document stops processing. Now it continues merrily on, adding documents as it can. Is this desirable behavior? It would be easy to abort on the first error if that's the consensus, and I could take some tedious record-keeping out. I think there's no big problem with continuing on, since the state of committed documents is already indeterminate when errors occur, so worrying about that should be part of a bigger issue.
[jira] Updated: (SOLR-445) XmlUpdateRequestHandler bad documents mid batch aborts rest of batch
[ https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-445:

    Attachment: solr-445.xml
                SOLR-445.patch

Here's a cut at an improvement, at least. The attached XML file contains an add packet with a number of documents illustrating a number of errors. The XML file can be POSTed to Solr for indexing via the post.jar file so you can see the output.

This patch attempts to report back to the user the following for each document that failed:

1. the ordinal position in the file where the error occurred (e.g. the first, second, etc. <doc> tag);
2. the uniqueKey, if available;
3. the error.

The general idea is to accrue the errors in a StringBuilder and eventually re-throw the error after processing as far as possible.

Issues:

1. The reported format in the log file is kind of hard to read. I pipe-delimited the various doc entries, but they run together in a Windows DOS window. What happens on Unix I'm not quite sure. Suggestions welcome.

2. From the original post, rolling this back will be tricky. Very tricky. The autocommit feature makes it indeterminate what's been committed to the index, so I don't know how to even approach rolling back everything.

3. The intent here is to give the user a clue where to start when figuring out which document(s) failed, so they don't have to guess.

4. Tests fail, but I have no clue why. I checked out a new copy of trunk and that fails as well, so I don't think this patch is the cause of the errors. But let's not commit this until we can be sure.

5. What do you think about limiting the number of docs that fail before quitting? One could imagine some ratio (say 10%) that have to fail before quitting (with some safeguards, like not bothering to calculate the ratio until 20 docs had been processed, or...). Or an absolute number. Should this be a parameter? Or hard-coded? The assumption here is that if 10 (or 100 or...) docs fail, there's something pretty fundamentally wrong and it's a waste to keep on. I don't have any strong feeling here; I can argue it either way.

6. Sorry, all, but I reflexively hit the reformat keystrokes, so the raw patch may be hard to read. But I'm pretty well in the camp that you *have* to reformat as you go or the code will be held hostage to the last person who *didn't* format properly. I'm pretty sure I'm using the right codestyle.xml file, but let me know if not.

7. I doubt that this has any bearing on, say, SolrJ indexing. Should that be another bug (or is there one already)? Anybody got a clue where I'd look for that, since I'm in the area anyway?

Erick
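The accrue-and-rethrow approach described in the comment can be sketched roughly as follows. This is a simplified stand-in, not the patch itself: the class name, the plain String documents, and the RuntimeException are all placeholders for the real Solr types, but the shape (record ordinal position plus id for each failure, keep going, throw one combined pipe-delimited error at the end) matches what the comment describes.

```java
// Hypothetical sketch of the accrue-errors-then-rethrow pattern described above.
import java.util.List;
import java.util.function.Consumer;

public class TolerantBatch {

    // Feeds each document to the indexer; failures are recorded with their
    // ordinal position and id, and a single combined error is thrown at the
    // end so the successful documents are still added.
    public static void addAll(List<String> docs, Consumer<String> indexer) {
        StringBuilder errors = new StringBuilder();
        int ordinal = 0;
        for (String doc : docs) {
            ordinal++;
            try {
                indexer.accept(doc);
            } catch (RuntimeException e) {
                errors.append("doc #").append(ordinal)
                      .append(" (id=").append(doc).append("): ")
                      .append(e.getMessage()).append(" | ");
            }
        }
        if (errors.length() > 0) {
            throw new RuntimeException("Some documents failed: " + errors);
        }
    }
}
```

The pipe delimiter here mirrors the formatting concern in issue 1 above; swapping it for a newline per failed document would likely read better in both DOS and Unix terminals.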
[jira] Updated: (SOLR-445) XmlUpdateRequestHandler bad documents mid batch aborts rest of batch
[ https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-445:
---------------------------------------
    Fix Version/s: 1.4