[ https://issues.apache.org/jira/browse/PIG-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ahmed Eldawy updated PIG-3373:
------------------------------
    Affects Version/s: site
         Release Note: I added a new patch that fixes this bug. It turns out that the bug occurs only when the input file is .bz2 compressed and the non-matching tag spans two file splits in the compressed file. Because the compressed layout is effectively unpredictable, it is almost impossible to hand-craft an input that triggers the bug, so I included a random generator that produces this test case. Because I don't like relying on a randomly generated file to expose a bug (it is non-deterministic by definition), I also attached the test file for reference. The fix is the same as in the previous patch, but this time the test fails without it.
               Status: Patch Available  (was: Open)

> XMLLoader returns non-matching nodes when a tag name spans through the block boundary
> -------------------------------------------------------------------------------------
>
>                 Key: PIG-3373
>                 URL: https://issues.apache.org/jira/browse/PIG-3373
>             Project: Pig
>          Issue Type: Bug
>          Components: piggybank
>    Affects Versions: site
>            Reporter: Ahmed Eldawy
>            Assignee: Ahmed Eldawy
>              Labels: patch
>         Attachments: PIG3373.patch, PIG3373_1.patch, bad-file.xml.bz2
>
>
> When a node's start tag spans two blocks, the tag is returned even if it is not of the requested type.
> Example: For the following input file
> <event id="3423">
> <ev
> -------- BLOCK BOUNDARY
> entually id="dfasd">
> XMLLoader with tag type 'event' should return only the first one, but it actually returns both of them.
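
The expected behaviour from the issue description can be illustrated with a small sketch. This is not the piggybank XMLLoader code or the attached patch; the function name matching_start_tags and the set of delimiter characters are assumptions made only for this illustration. The idea is that a start tag counts as a match only when the full tag name is followed by a delimiter, even if that delimiter lies in the next block.

```python
from itertools import chain

# Characters assumed to end a tag name (illustrative choice, not taken from the patch).
TAG_END = set(" \t\r\n>/")


def matching_start_tags(blocks, tag):
    """Return offsets, in the concatenated stream, of start tags named `tag`.

    `blocks` is a list of strings standing in for consecutive file blocks.
    Characters are read as one continuous stream, so a tag name that is cut
    by a block boundary is still compared in full.
    """
    stream = chain.from_iterable(blocks)
    needle = "<" + tag
    width = len(needle) + 1          # needle plus the character that follows it
    window = ""
    offsets = []
    for offset, ch in enumerate(stream):
        window = (window + ch)[-width:]
        # Accept only "<tag" immediately followed by a delimiter.
        if len(window) == width and window[:-1] == needle and ch in TAG_END:
            offsets.append(offset - len(needle))
    return offsets


# The example from the issue: the second tag is split by the block boundary.
blocks = ['<event id="3423">\n<ev', 'entually id="dfasd">\n']
print(matching_start_tags(blocks, "event"))  # [0]
```

Only the first tag is reported: "<eventually" starts with "<event", but the character after that prefix is "u" rather than a delimiter, so it is rejected even though the name is split across the two blocks.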