[ 
https://issues.apache.org/jira/browse/PIG-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Eldawy updated PIG-3373:
------------------------------

    Affects Version/s: site
         Release Note: 
I added a new patch that fixes this bug. It turned out that the bug happens 
only when the input file is .bz2 compressed and the non-matching tag spans two 
file splits in the compressed file. Because the compressed output is 
practically unpredictable, it is almost impossible to hand-craft an example 
that triggers the bug, so I included a random generator that produces this 
test case.
I don't like the idea of discovering a bug with a randomly generated file, 
since by definition it's non-deterministic, so I also attached the test file 
for reference.
The fix is the same as in the previous patch, but this time the test fails 
without the fix.
               Status: Patch Available  (was: Open)
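
The concern above about a non-deterministic test input can be eased by seeding 
the generator, so that every run produces the same file. The following is only 
a minimal sketch of that idea in Python, not the generator from the patch: the 
function name generate_xml, the tag list, and the record layout are all 
assumptions made for illustration. Whether a split boundary actually lands 
inside a tag name still depends on the compressor and the split size, which 
this sketch does not control.

```python
import bz2
import random


def generate_xml(seed, n_records, tags=("event", "eventually", "ev", "events")):
    """Build a repeatable XML-like text and count the exact 'event' records.

    Hypothetical helper: the tag names deliberately share the prefix 'ev' or
    'event' so that a loader matching on a prefix would over-report.
    """
    rng = random.Random(seed)  # private, seeded generator: same seed, same file
    lines = []
    expected = 0
    for i in range(n_records):
        tag = rng.choice(tags)
        if tag == "event":
            expected += 1
        lines.append('<%s id="%d">%d</%s>' % (tag, i, rng.randrange(10**6), tag))
    return "\n".join(lines) + "\n", expected


if __name__ == "__main__":
    text, expected = generate_xml(seed=3373, n_records=1000)
    compressed = bz2.compress(text.encode("utf-8"))
    print(len(compressed), "compressed bytes;", expected, "records tagged 'event'")
```

A test built this way can compare the number of records the loader returns 
against the expected count, and it fails the same way on every run.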

> XMLLoader returns non-matching nodes when a tag name spans through the block 
> boundary
> -------------------------------------------------------------------------------------
>
>                 Key: PIG-3373
>                 URL: https://issues.apache.org/jira/browse/PIG-3373
>             Project: Pig
>          Issue Type: Bug
>          Components: piggybank
>    Affects Versions: site
>            Reporter: Ahmed Eldawy
>            Assignee: Ahmed Eldawy
>              Labels: patch
>         Attachments: PIG3373.patch, PIG3373_1.patch, bad-file.xml.bz2
>
>
> When a node's start tag spans two blocks, the tag is returned even if it is 
> not of the requested type.
> Example: For the following input file
> <event id="3423">
> <ev
> -------- BLOCK BOUNDARY
> entually id="dfasd">
> XMLLoader with tag type 'event' should return only the first one, but it 
> actually returns both of them.
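
The example above can be reproduced outside Pig with a small boundary-safe 
matcher. This is a minimal sketch in Python, not the piggybank XMLLoader code 
or the attached patch: the function name count_start_tags and the chunked 
input are assumptions made for illustration. The idea is that a candidate 
match is accepted only once the character after the tag name is known to be a 
delimiter, and that the tail of each chunk is carried over so the decision can 
be made in the next one.

```python
DELIMS = " \t\r\n/>"  # characters that may legally follow a start-tag name


def count_start_tags(chunks, tag):
    """Count start tags named exactly `tag` in text delivered as chunks."""
    needle = "<" + tag
    count = 0
    carry = ""  # tail of the previous chunk that may hold an undecided match
    for chunk in chunks:
        data = carry + chunk
        pos = data.find(needle)
        while pos != -1:
            end = pos + len(needle)
            if end == len(data):
                # The deciding character is in the next chunk: do not guess.
                break
            if data[end] in DELIMS:
                count += 1
            pos = data.find(needle, end)
        # A counted match needs len(needle) + 1 characters, so it can never
        # start inside this tail; nothing is counted twice.
        carry = data[-len(needle):]
    return count


if __name__ == "__main__":
    blocks = ['<event id="3423">\n<ev', 'entually id="dfasd">\n']
    print(count_start_tags(blocks, "event"))
```

With the two blocks from the example, only the first tag is counted, wherever 
the boundary falls.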



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
