[ https://issues.apache.org/jira/browse/HIVE-5273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thejas M Nair resolved HIVE-5273.
---------------------------------
Resolution: Cannot Reproduce
I am unable to reproduce this issue with Hive trunk or branch 0.12. Please
let me know if I am not following the right steps here.
By local task tracker, I assume you meant the local-mode jobtracker. To run in
local mode, I used:
echo $HIVE_OPTS
-hiveconf mapred.job.tracker=local -hiveconf fs.default.name=file:///tmp
-hiveconf hive.metastore.warehouse.dir=file:///tmp/warehouse -hiveconf
javax.jdo.option.ConnectionURL=jdbc:derby:;databaseName=/tmp/metastore_db;create=true
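For anyone setting this up from scratch, here is a minimal sketch of how that variable could be exported. It assumes the four settings are exactly the ones in the echo output above, joined onto one line. The single quotes are my addition; they are there because the Derby URL contains semicolons, which an unquoted shell assignment would treat as command separators.

```shell
# Sketch only: the same four -hiveconf settings as in the echo output above,
# written on one line. Single quotes keep the semicolons in the Derby
# connection URL from being parsed as shell command separators.
export HIVE_OPTS='-hiveconf mapred.job.tracker=local -hiveconf fs.default.name=file:///tmp -hiveconf hive.metastore.warehouse.dir=file:///tmp/warehouse -hiveconf javax.jdo.option.ConnectionURL=jdbc:derby:;databaseName=/tmp/metastore_db;create=true'

# Print it back to confirm the value survived quoting intact.
echo "$HIVE_OPTS"
```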
This is what I tried:
//create table
{code}
hive> create table ts(s string);
OK
Time taken: 0.02 seconds
hive> select s from ts limit 5;
{code}
//adding data to table
{code}
$ perl -e 'for (my $i=0; $i<100000000; $i++){ print
"asdfasdfasdfasdfasdfasdfasdfasdfasd\n";}' > /tmp/warehouse/ts/input
$ du -hs /tmp/warehouse/ts/input
3.4G /tmp/warehouse/ts/input
{code}
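If a 3.4G file is inconvenient, a scaled-down variant of the same generator might look like the sketch below. ROWS, LINE, OUT and BYTES_PER_ROW are illustrative names that are not part of the steps above, and the output goes to a temporary file rather than the warehouse directory. It also prints the projected size at the full 100000000 rows, which can be compared with the du figure above.

```shell
# Sketch only: write a small sample with the same row content as the perl
# one-liner above. ROWS, LINE, OUT and BYTES_PER_ROW are hypothetical names.
ROWS=1000
LINE="asdfasdfasdfasdfasdfasdfasdfasdfasd"
OUT="$(mktemp)"

# Repeat the row ROWS times, one row per line.
yes "$LINE" | head -n "$ROWS" > "$OUT"

# Bytes per row, including the trailing newline.
BYTES_PER_ROW=$(printf '%s\n' "$LINE" | wc -c)

echo "rows written: $(wc -l < "$OUT")"
echo "bytes per row: $BYTES_PER_ROW"
echo "projected size at 100000000 rows: $((100000000 * BYTES_PER_ROW)) bytes"
```

To use the sample with the table above, the file would be copied to /tmp/warehouse/ts/input. Whether a file this small still exercises the reported problem is not something I have checked.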
//running the test
{code}
hive> select s from ts limit 5;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Execution log at: /tmp/thejas/.log
Job running in-process (local Hadoop)
Hadoop job information for null: number of mappers: 0; number of reducers: 0
2013-09-25 09:47:25,276 null map = 0%, reduce = 0%
2013-09-25 09:47:28,278 null map = 100%, reduce = 0%
Ended Job = job_local_0001
Execution completed successfully
Mapred Local Task Succeeded . Convert the Join into MapJoin
OK
asdfasdfasdfasdfasdfasdfasdfasdfasd
asdfasdfasdfasdfasdfasdfasdfasdfasd
asdfasdfasdfasdfasdfasdfasdfasdfasd
asdfasdfasdfasdfasdfasdfasdfasdfasd
asdfasdfasdfasdfasdfasdfasdfasdfasd
Time taken: 14.622 seconds, Fetched: 5 row(s)
hive> select s from ts limit 5;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Execution log at: /tmp/thejas/.log
Job running in-process (local Hadoop)
Hadoop job information for null: number of mappers: 0; number of reducers: 0
2013-09-25 09:58:00,492 null map = 0%, reduce = 0%
2013-09-25 09:58:03,493 null map = 100%, reduce = 0%
Ended Job = job_local_0001
Execution completed successfully
Mapred Local Task Succeeded . Convert the Join into MapJoin
OK
asdfasdfasdfasdfasdfasdfasdfasdfasd
asdfasdfasdfasdfasdfasdfasdfasdfasd
asdfasdfasdfasdfasdfasdfasdfasdfasd
asdfasdfasdfasdfasdfasdfasdfasdfasd
asdfasdfasdfasdfasdfasdfasdfasdfasd
Time taken: 11.825 seconds, Fetched: 5 row(s)
{code}
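The comparison above was done by eye. A rough sketch of how it could be scripted follows. check_repeat is a hypothetical helper, not part of Hive. It assumes the combined output of several runs of the same query arrives on stdin with one result row per line and nothing else, which is what I would expect from the CLI in silent mode (-S), but I have not verified that for every version.

```shell
# Sketch only: check_repeat is a hypothetical helper.
# $1 = rows expected per run, $2 = number of runs.
# Reads the combined query output on stdin and counts non-empty lines.
check_repeat() {
  expected=$(( $1 * $2 ))
  # grep -c exits non-zero when the count is 0, so ignore its exit status.
  actual=$(grep -c . || true)
  if [ "$actual" -eq "$expected" ]; then
    echo "ok: $actual rows"
  else
    echo "mismatch: expected $expected rows, got $actual"
    return 1
  fi
}

# Intended use against a real installation (not run here):
#   hive -S -e 'select s from ts limit 5; select s from ts limit 5;' | check_repeat 5 2

# Self-contained demonstration with stand-in output for two one-row runs.
printf 'row\nrow\n' | check_repeat 1 2
```

Both statements are passed to a single hive invocation so that they run in the same session, since the report is about the second query in a session returning nothing.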
> Subsequent use of Mapper yields 0 results
> -----------------------------------------
>
> Key: HIVE-5273
> URL: https://issues.apache.org/jira/browse/HIVE-5273
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.12.0, 0.13.0
> Reporter: Mike Lewis
> Priority: Blocker
>
> First noticed this when using local task tracker (and is easiest to reproduce
> with it).
> Created a table with one column (uuid). Ran
> {code}
> SELECT uuid FROM test_foo LIMIT 5;
> {code}
> Results are as expected:
> {code}
> ace7265d-49bf-4c11-af67-0cd0a33c690e
> ace7265d-49bf-4c11-af67-0cd0a33c690e
> ace7265d-49bf-4c11-af67-0cd0a33c690e
> ace7265d-49bf-4c11-af67-0cd0a33c690e
> ace7265d-49bf-4c11-af67-0cd0a33c690e
> Time taken: 40.172 seconds, Fetched: 5 row(s)
> {code}
> Then I run it again.
> The results are not as expected:
> {code}
> Time taken: 55.498 seconds
> {code}
> The table I am querying is
> {code}
> hive> describe extended test_foo;
> OK
> uuid string None
>
> Detailed Table Information Table(tableName:test_foo, dbName:default,
> owner:lewis, createTime:1378934838, lastAccessTime:0, retention:0,
> sd:StorageDescriptor(cols:[FieldSchema(name:uuid, type:string,
> comment:null)],
> location:hdfs://gun1.sjc1c.square:8020/user/hive/warehouse/test_foo,
> inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat,
> outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat,
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,
> parameters:{serialization.format=1}), bucketCols:[], sortCols:[],
> parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[],
> skewedColValueLocationMaps:{}), storedAsSubDirectories:false),
> partitionKeys:[], parameters:{numPartitions=0, numFiles=37,
> transient_lastDdlTime=1378934838, numRows=0, totalSize=44600654909,
> rawDataSize=0}, viewOriginalText:null, viewExpandedText:null,
> tableType:MANAGED_TABLE)
> {code}
> With non-local tasktracker subsequent queries work, but when doing a
> {{count(* )}} over a large data set, 0.12.0 returns only a subset of results
> that 0.10.0 returns.