[ https://issues.apache.org/jira/browse/HIVE-15491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mithun Radhakrishnan updated HIVE-15491: ---------------------------------------- Affects Version/s: 2.1.1 > Failures are masked/swallowed in GenericUDTFJSONTuple::process(). > ----------------------------------------------------------------- > > Key: HIVE-15491 > URL: https://issues.apache.org/jira/browse/HIVE-15491 > Project: Hive > Issue Type: Bug > Affects Versions: 2.1.1 > Reporter: Mithun Radhakrishnan > > I draw your attention to the following piece of code in > {{GenericUDTFJSONTuple::process()}}: > {code:java} > @Override > public void process(Object[] o) throws HiveException { > ... > for (int i = 0; i < numCols; ++i) { > if (retCols[i] == null) { > retCols[i] = cols[i]; // use the object pool rather than creating a > new object > } > Object extractObject = ((Map<String, Object>)jsonObj).get(paths[i]); > if (extractObject instanceof Map || extractObject instanceof List) { > retCols[i].set(MAPPER.writeValueAsString(extractObject)); > } else if (extractObject != null) { > retCols[i].set(extractObject.toString()); > } else { > retCols[i] = null; > } > } > forward(retCols); > return; > } catch (Throwable e) { <================= Yikes. > LOG.error("JSON parsing/evaluation exception" + e); > forward(nullCols); > } > } > {code} > The error-handling here seems suspect. Judging from the error message, the > intention here seems to be to catch JSON-specific errors arising from > {{MAPPER.readValue()}} and {{MAPPER.writeValueAsString()}}. By catching > {{Throwable}}, this code masks the errors that arise from the call to > {{forward(retCols)}}. > I just ran into this in production. A user with a nearly exhausted HDFS quota > attempted to use {{json_tuple}} to extract fields from json strings in his > data. The data turned out to have large record counts and the query used over > 25K mappers. Every task failed to create a {{RecordWriter}}, thanks to the > exhausted quota. But the thrown exception was swallowed in the code above. > {{process()}} ignored the failure for the record and proceeded to the next > one. Eventually, this resulted in DDoS-ing the name-node. > I'll have a patch for this shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)