[
https://issues.apache.org/jira/browse/PIG-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Daniel Dai updated PIG-4340:
----------------------------
Component/s: impl
Fix Version/s: 0.15.0
Assignee: Daniel Dai
> PigStorage fails parsing empty map.
> -----------------------------------
>
> Key: PIG-4340
> URL: https://issues.apache.org/jira/browse/PIG-4340
> Project: Pig
> Issue Type: Bug
> Components: impl
> Reporter: Akira Murashita
> Assignee: Daniel Dai
> Priority: Minor
> Fix For: 0.15.0
>
>
> I've found that PigStorage doesn't parse empty maps properly.
> I'm using pig-0.11.0-cdh4.4.0, but from reading the source code, the problem
> should be reproducible in later versions as well.
> An empty map in a tuple field is parsed as null.
> {code:title=test.txt}
> empty []
> nonempty [foo#bar]
> {code}
> {code:title=test.pig}
> A = LOAD '/tmp/test.txt' USING PigStorage(' ') AS (a:chararray,
> b:map[chararray]);
> DUMP A;
> {code}
> {code}
> $ pig test.pig
> ...
> (empty,)
> (nonempty,[foo#bar])
> {code}
> Moreover, if the empty map is nested in a parent field, the entire field is
> interpreted as null.
> {code:title=test-nested.txt}
> empty (f1,[])
> nonempty (f1,[foo#bar])
> {code}
> {code:title=test.pig}
> A = LOAD '/tmp/test-nested.txt' USING PigStorage(' ') AS (a:chararray, (b:chararray,
> c:map[chararray]));
> DUMP A;
> {code}
> {code}
> $ pig test.pig
> ...
> (empty,)
> (nonempty,(f1,[foo#bar]))
> {code}
> Investigating this, I've found that it happens because
> {{Utf8StorageConverter#consumeMap}} throws {{IOException}} when it receives
> an empty map as the string '[]'. It seems to always assume that the map has
> content, more specifically a '#' character.
> {code:java}
> private Map<String, Object> consumeMap(PushbackInputStream in,
>         ResourceFieldSchema fieldSchema) throws IOException {
>     int buf;
>
>     while ((buf=in.read())!='[') {
>         if (buf==-1) {
>             throw new IOException("Unexpect end of map");
>         }
>     }
>     HashMap<String, Object> m = new HashMap<String, Object>();
>     ByteArrayOutputStream mOut = new ByteArrayOutputStream(BUFFER_SIZE);
>     while (true) {
>         // Read key (assume key can not contains special character such
>         // as #, (, [, {, }, ], )
>         while ((buf=in.read())!='#') {
>             if (buf==-1) {
>                 throw new IOException("Unexpect end of map");
>             }
>             mOut.write(buf);
>         }
>         String key = bytesToCharArray(mOut.toByteArray());
>         if (key.length()==0)
>             throw new IOException("Map key can not be null");
> {code}
> I would appreciate it if you could fix this problem.
> Thanks.
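One possible way to handle the case described above is to peek at the byte right after '[' and return an empty map when it is ']'. The following is only a minimal stand-alone sketch under that assumption, not the actual Utf8StorageConverter code or the committed fix: the class EmptyMapSketch and its parseMap methods are hypothetical names, and values are treated as plain strings up to ',' or ']' instead of being parsed against a field schema.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; names and the simplified value handling are assumptions.
class EmptyMapSketch {

    // Parses "[k#v,k#v]" or "[]"; values are kept as raw strings (a simplification).
    static Map<String, String> parseMap(PushbackInputStream in) throws IOException {
        int buf;
        // Skip forward to the opening bracket, as the quoted excerpt does.
        while ((buf = in.read()) != '[') {
            if (buf == -1) {
                throw new IOException("Unexpected end of map");
            }
        }
        Map<String, String> m = new HashMap<>();

        // Peek one byte: ']' directly after '[' means the map is empty.
        buf = in.read();
        if (buf == ']') {
            return m;
        }
        if (buf == -1) {
            throw new IOException("Unexpected end of map");
        }
        in.unread(buf); // not empty: push the byte back and parse entries

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (true) {
            // Read the key up to '#'.
            while ((buf = in.read()) != '#') {
                if (buf == -1) {
                    throw new IOException("Unexpected end of map");
                }
                out.write(buf);
            }
            String key = new String(out.toByteArray(), StandardCharsets.UTF_8);
            out.reset();
            if (key.isEmpty()) {
                throw new IOException("Map key can not be null");
            }
            // Read the value up to ',' (next entry) or ']' (end of map).
            while ((buf = in.read()) != ',' && buf != ']') {
                if (buf == -1) {
                    throw new IOException("Unexpected end of map");
                }
                out.write(buf);
            }
            m.put(key, new String(out.toByteArray(), StandardCharsets.UTF_8));
            out.reset();
            if (buf == ']') {
                return m;
            }
        }
    }

    // Convenience overload for strings; wraps IOException so callers need no try/catch.
    static Map<String, String> parseMap(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        try {
            return parseMap(new PushbackInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseMap("[]"));        // prints {}
        System.out.println(parseMap("[foo#bar]")); // prints {foo=bar}
    }
}
```

Without the peek, the key loop would keep reading past ']' looking for '#' and hit the end of the input, which matches the IOException described in the report.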