[ https://issues.apache.org/jira/browse/BIGTOP-1450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177972#comment-14177972 ]
Roman Shaposhnik commented on BIGTOP-1450:
------------------------------------------

I was one of the dudes who implemented the Hive tests in Bigtop. What we did was simple: we took the existing tests from Hive and stuck them into Bigtop. That's it. Of course, in Hive the tests get maintained, and they seem to have bitrotted in Bigtop. I actually do like the suggestion of gutting them out and replacing them with more representative smoke tests. But the proof is in the puddin'^H^H^H^H^Hpatch ;-) Anyway, another thing that would be super cool is to somehow collaborate on making the Hive tests from the Apache Hive project itself able to execute against a real cluster. That's what Pig lets us do, for example. Any takers?

> hive smoke test : possibly out of sync, need review, and hard to debug
> ----------------------------------------------------------------------
>
>                 Key: BIGTOP-1450
>                 URL: https://issues.apache.org/jira/browse/BIGTOP-1450
>             Project: Bigtop
>          Issue Type: Improvement
>          Components: tests
>            Reporter: jay vyas
>            Assignee: jay vyas
>             Fix For: 0.9.0
>
>
> *Overall: The hive tests in {{test-artifacts}} are prone to failures from missing data sets and generally need a thorough review.*
> When testing the Bigtop 0.8.0 release candidate, I found that I got some errors:
> {noformat}
> [--- /dev/fd/63 2014-09-16 10:12:54.579647323 +0000, +++ /dev/fd/62 2014-09-16 10:12:54.579647323 +0000,
> @@ -14,4 +14,4 @@, INSERT OVERWRITE DIRECTORY '/tmp/count', SELECT COUNT(1) FROM u_data,
> dfs -cat /tmp/count/*, -0, +100000]
> err=[14/09/16 10:12:17 WARN mapred.JobConf: The variable mapred.child.ulimit is no longer used., ,
> Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties, OK,
> Time taken: 2.609 seconds, OK, Time taken: 0.284 seconds, Total jobs = 1, Launching Job 1 out of 1,
> Number of reduce tasks determined at compile time: 1,
> In order to change the average load for a reducer (in bytes):, set hive.exec.reducers.bytes.per.reducer=<number>,
> In order to limit the maximum number of reducers:, set hive.exec.reducers.max=<number>,
> In order to set a constant number of reducers:, set mapreduce.job.reduces=<number>,
> Starting Job = job_1410830363557_0019,
> Tracking URL = http://bigtop1.vagrant:20888/proxy/application_1410830363557_0019/,
> Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1410830363557_0019,
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1,
> 2014-09-16 10:12:38,870 Stage-1 map = 0%, reduce = 0%,
> 2014-09-16 10:12:45,516 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.81 sec,
> 2014-09-16 10:12:53,036 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.73 sec,
> MapReduce Total cumulative CPU time: 1 seconds 730 msec, Ended Job = job_1410830363557_0019,
> Moving data to: /tmp/count, MapReduce Jobs Launched: ,
> Job 0: Map: 1 Reduce: 1 Cumulative CPU: 1.73 sec HDFS Read: 272 HDFS Write: 2 SUCCESS,
> Total MapReduce CPU Time Spent: 1 seconds 730 msec, OK, Time taken: 24.594 seconds
> {noformat}
> I know there is a diff error in here - some kind of diff is being run, but I forgot how the actual output and the filter work together.
> In any case, I think these tests can be simplified to just grep for an output string and check the error code, or else at least add some very clear assertions about what the failures may be.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
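The "grep for an output string and check the error code" idea from the issue description could look roughly like the sketch below. This is only an illustration of the pattern, not a drop-in replacement for the Bigtop test artifacts (which are Groovy): the helper name `smoke_check` is made up, the query and the expected `100000` are taken from the log above, and since no Hive CLI is assumed to be available here, the runnable demo at the bottom substitutes `echo` for `hive`.

```python
import subprocess

def smoke_check(cmd, expected_output):
    """Run a command, verify its exit code is zero, and grep its
    stdout for an expected string -- the simplified assertion style
    suggested in the issue. Returns True only if both checks pass."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        print("FAIL: non-zero exit code %d; stderr: %s"
              % (result.returncode, result.stderr.strip()))
        return False
    if expected_output not in result.stdout:
        print("FAIL: expected %r somewhere in output, got %r"
              % (expected_output, result.stdout.strip()))
        return False
    return True

# On a real cluster this would wrap something like:
#   smoke_check(["hive", "-e", "SELECT COUNT(1) FROM u_data"], "100000")
# (hypothetical invocation; u_data and 100000 come from the log above).
# Here a placeholder command stands in for the hive CLI:
assert smoke_check(["echo", "100000"], "100000")
```

Either outcome prints a one-line reason on failure, which addresses the "hard to debug" part of this ticket: the assertion message says exactly which of the two checks broke.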