Hello Team,
We are processing huge XML files.
We are currently using Apache Flume to ingest the raw XML data into HDFS,
pointing a Hive external table at the files, and then writing XPath queries against them.
First, is our approach reasonable, or can we do it better?
Second, when I run XPath queries in Hive, I am getting errors.
Query that I ran:
select xpath(xml_load, "//book[@id='470']") from Muthu_XML;
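In case the nesting of quotes is the problem, one variant I was planning to try keeps the Hive string literal in single quotes and double-quotes the XPath attribute value (just a sketch, I have not confirmed it behaves differently):
-- Same lookup, but the attribute value is double-quoted inside the XPath
-- so it does not clash with the quoting of the Hive string literal.
select xpath(xml_load, '//book[@id="470"]') from Muthu_XML;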
Table:
CREATE EXTERNAL TABLE IF NOT EXISTS Muthu_XML(XML_Load STRING)
LOCATION "/user/Muthu";
Sample XML:
<book id="471><author>Gambardella, Matthew</author><title>XML Developer's
Guide</title><genre>Computer</genre><price>227</price></book>
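To check whether the expression style itself is the problem, independent of our data, I also put together a minimal test that runs the same kind of predicate against an inline XML literal (the values below are made up):
-- If this returns ["t"], the predicate syntax is fine and the issue is in our data or quoting.
select xpath('<book id="470"><title>t</title></book>',
             '//book[@id="470"]/title/text()')
from Muthu_XML limit 1;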
Error:
Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"xml_load":"<book id=\"0><author>Gambardella, Matthew</author><title>XML Developer's Guide</title><genre>Computer</genre><price>44.95</price></book>"}
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:195)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"xml_load":"<book id=\"0><author>Gambardella, Matthew</author><title>XML Developer's Guide</title><genre>Computer</genre><price>44.95</price></book>"}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
    ... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating array ('xml_load',''//book[@id='470']'')
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
    ... 9 more
Caused by: java.lang.RuntimeException: Invalid expression '//book[@id='470']'
    at org.apache.hadoop.hive.ql.udf.xml.UDFXPathUtil.eval(UDFXPathUtil.java:74)
    at org.apache.hadoop.hive.ql.udf.xml.UDFXPathUtil.evalNodeList(UDFXPathUtil.java:95)
    at org.apache.hadoop.hive.ql.udf.xml.GenericUDFXPath.eval(GenericUDFXPath.java:76)
    at org.apache.hadoop.hive.ql.udf.xml.GenericUDFXPath.evaluate(GenericUDFXPath.java:97)
    at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:166)
    at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
    at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:79)
    ... 13 more
Can anyone help me out with this? We are using
apache-hive-0.13.1-bin/lib/hive-common-0.13.1.jar
and:
[root@10 ~]# hadoop -version
java version "1.6.0_51"
Java(TM) SE Runtime Environment (build 1.6.0_51-b11)
Java HotSpot(TM) 64-Bit Server VM (build 20.51-b01, mixed mode)
Thanks
Muthu
408-329-0704