Hello Team,

We are processing huge XML files.

We are currently using Apache Flume to ingest the raw XML data into HDFS, then point an external Hive table at the files and write XPath queries against them.
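
For context, the Flume side is roughly the following shape (a simplified sketch with placeholder agent, channel, and directory names, not our exact config):

# Flume agent sketch: spool a local directory of XML files into HDFS
agent1.sources  = xmlSpool
agent1.channels = memChannel
agent1.sinks    = hdfsSink

agent1.sources.xmlSpool.type     = spooldir
agent1.sources.xmlSpool.spoolDir = /data/incoming/xml
agent1.sources.xmlSpool.channels = memChannel

agent1.channels.memChannel.type     = memory
agent1.channels.memChannel.capacity = 10000

agent1.sinks.hdfsSink.type             = hdfs
agent1.sinks.hdfsSink.channel          = memChannel
agent1.sinks.hdfsSink.hdfs.path        = /user/Muthu
agent1.sinks.hdfsSink.hdfs.fileType    = DataStream
agent1.sinks.hdfsSink.hdfs.writeFormat = Text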

First, is our approach good, or can we do it better?

Second, when running XPath queries with Hive, I am getting errors.

Query that I ran:

select xpath(xml_load, "//book[@id='470']") from Muthu_XML;

Table:

CREATE EXTERNAL TABLE IF NOT EXISTS Muthu_XML(XML_Load STRING)
LOCATION "/user/Muthu";
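
Since the table is a single STRING column over the default text format, each line of the files under /user/Muthu should come back as one row of xml_load. For reference, a trivial way to eyeball a few raw rows:

select xml_load from Muthu_XML limit 5;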

 

Sample XML:

<book id="471><author>Gambardella, Matthew</author><title>XML Developer's Guide</title><genre>Computer</genre><price>227</price></book>
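
For what it's worth, this is the kind of standalone check I would expect to work against a well-formed literal (just a sketch to show the intent, not our real data):

-- xpath_string() returns the text of the first matching node,
-- xpath() returns an array of the matching text nodes.
select xpath_string("<book id='470'><title>XML Guide</title></book>", "//book[@id='470']/title");
select xpath("<book id='470'><title>XML Guide</title></book>", "//book[@id='470']/title/text()");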

 

Error:

Diagnostic Messages for this Task:

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"xml_load":"<book id=\"0><author>Gambardella, Matthew</author><title>XML Developer's Guide</title><genre>Computer</genre><price>44.95</price></book>"}
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:195)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"xml_load":"<book id=\"0><author>Gambardella, Matthew</author><title>XML Developer's Guide</title><genre>Computer</genre><price>44.95</price></book>"}
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
        ... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating array ('xml_load',''//book[@id='470']'')
        at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
        ... 9 more
Caused by: java.lang.RuntimeException: Invalid expression '//book[@id='470']'
        at org.apache.hadoop.hive.ql.udf.xml.UDFXPathUtil.eval(UDFXPathUtil.java:74)
        at org.apache.hadoop.hive.ql.udf.xml.UDFXPathUtil.evalNodeList(UDFXPathUtil.java:95)
        at org.apache.hadoop.hive.ql.udf.xml.GenericUDFXPath.eval(GenericUDFXPath.java:76)
        at org.apache.hadoop.hive.ql.udf.xml.GenericUDFXPath.evaluate(GenericUDFXPath.java:97)
        at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:166)
        at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
        at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:79)
        ... 13 more

Can anyone help me out on this? We are using apache-hive-0.13.1-bin/lib/hive-common-0.13.1.jar, and:

[root@10 ~]# hadoop -version
java version "1.6.0_51"
Java(TM) SE Runtime Environment (build 1.6.0_51-b11)
Java HotSpot(TM) 64-Bit Server VM (build 20.51-b01, mixed mode)

Thanks

Muthu

408-329-0704
