[ https://issues.apache.org/jira/browse/PIG-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887929#comment-13887929 ]
Ahmed Eldawy commented on PIG-3373:
-----------------------------------

By the way, the attached file bad-file.xml.bz2 is the smallest file that can reveal this bug. Reproducing it requires a compressed XML file that spans at least two BZ2 blocks, and the minimum BZ2 block size is 100KB (by design).

> XMLLoader returns non-matching nodes when a tag name spans through the block boundary
> -------------------------------------------------------------------------------------
>
>                 Key: PIG-3373
>                 URL: https://issues.apache.org/jira/browse/PIG-3373
>             Project: Pig
>          Issue Type: Bug
>          Components: piggybank
>    Affects Versions: site
>            Reporter: Ahmed Eldawy
>            Assignee: Ahmed Eldawy
>              Labels: patch
>         Attachments: PIG3373.patch, PIG3373_1.patch, PIG3373_2.patch, bad-file.xml.bz2
>
>
> When a node's start tag spans two blocks, the tag is returned even if it is not of the requested type.
> Example: For the following input file
> <event id="3423">
> <ev
> -------- BLOCK BOUNDARY
> entually id="dfasd">
> XMLLoader with tag type 'event' should return only the first one, but it actually returns both of them.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
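For illustration only, here is a minimal sketch of the matching rule the issue description implies: a start tag counts as a match only if the byte right after the tag name is a delimiter (whitespace, `/`, or `>`), and that check has to survive a split between two chunks. This is not the piggybank XMLLoader code and not the attached patch. It is written in Python for brevity, the name `find_start_tags` is hypothetical, and plain byte chunks stand in for BZ2 blocks.

```python
# Hypothetical sketch: NOT the XMLLoader implementation or the PIG-3373 patch.
# Byte chunks stand in for decompressed BZ2 blocks.

TAG_DELIMITERS = b" \t\r\n/>"  # bytes that may legally follow a tag name


def find_start_tags(chunks, tag):
    """Return stream offsets of start tags named exactly `tag`.

    `chunks` is an iterable of bytes objects that together form the stream.
    A candidate is accepted only when the byte after `<tag` is a delimiter,
    so `<eventually` is not reported when searching for `event`.
    """
    needle = b"<" + tag.encode("ascii")
    carry = b""   # undecided tail of the previous chunk(s)
    base = 0      # stream offset of carry[0]
    offsets = []
    for chunk in chunks:
        buf = carry + chunk
        # Positions >= limit cannot be decided yet: the byte after the
        # tag name has not arrived, so they are carried to the next chunk.
        limit = len(buf) - len(needle)
        pos = 0
        while True:
            i = buf.find(needle, pos)
            if i < 0 or i >= limit:
                break
            if buf[i + len(needle)] in TAG_DELIMITERS:
                offsets.append(base + i)
            pos = i + 1
        keep = min(len(buf), len(needle))
        carry = buf[len(buf) - keep:]
        base += len(buf) - keep
    return offsets


# The example from the issue, split at the reported block boundary.
blocks = [b'<event id="3423">\n<ev', b'entually id="dfasd">\n']
print(find_start_tags(blocks, "event"))  # [0]
```

Carrying over the last `len(needle)` bytes is what keeps the result independent of where the boundary falls: a candidate is judged only once the byte after the tag name is available. The sketch ignores comments, CDATA sections, and end tags, which a real loader would also need to handle.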