[jira] [Resolved] (HIVE-5273) Subsequent use of Mapper yields 0 results

Thejas M Nair (JIRA) Wed, 25 Sep 2013 10:02:46 -0700

     [ https://issues.apache.org/jira/browse/HIVE-5273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Thejas M Nair resolved HIVE-5273.
---------------------------------

    Resolution: Cannot Reproduce

I am unable to reproduce this issue with Hive trunk or branch 0.12. Please 
let me know if I am not following the right steps here.

By local task tracker, I assume you meant the local-mode jobtracker. To run in 
local mode, I used these settings:
$ echo $HIVE_OPTS
-hiveconf mapred.job.tracker=local -hiveconf fs.default.name=file:///tmp -hiveconf hive.metastore.warehouse.dir=file:///tmp/warehouse -hiveconf javax.jdo.option.ConnectionURL=jdbc:derby:;databaseName=/tmp/metastore_db;create=true
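
For anyone trying to repeat this, the options above could be set roughly as follows. This is only a sketch: the /tmp paths are the ones quoted in this message, and building the value in several steps is just an assumption made for readability, not how it was done here.

```shell
# Sketch: build HIVE_OPTS with the local-mode settings quoted in this message.
# The /tmp paths come from the message; adjust them for your own machine.
HIVE_OPTS="-hiveconf mapred.job.tracker=local"
HIVE_OPTS="$HIVE_OPTS -hiveconf fs.default.name=file:///tmp"
HIVE_OPTS="$HIVE_OPTS -hiveconf hive.metastore.warehouse.dir=file:///tmp/warehouse"
# The JDBC URL contains ';', so the value must stay inside double quotes.
HIVE_OPTS="$HIVE_OPTS -hiveconf javax.jdo.option.ConnectionURL=jdbc:derby:;databaseName=/tmp/metastore_db;create=true"
export HIVE_OPTS

# Print the result to check it matches the line shown above.
echo "$HIVE_OPTS"
```

With the variable exported, starting the hive CLI from the same shell should pick up these settings.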


This is what I tried:
//create table
{code}
hive> create table ts(s string);
OK
Time taken: 0.02 seconds
hive> select s from ts limit 5;
{code}

//adding data to table
{code}
$ perl -e 'for (my $i=0; $i<100000000; $i++){ print "asdfasdfasdfasdfasdfasdfasdfasdfasd\n";}'  >  /tmp/warehouse/ts/input
$ du -hs /tmp/warehouse/ts/input
3.4G    /tmp/warehouse/ts/input

{code}
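
If perl is not available, an equivalent input file can probably be produced with yes and head. The sketch below is an alternative to the command above, not what was actually run; gen_rows is a hypothetical helper name introduced only for illustration.

```shell
# gen_rows COUNT FILE: write COUNT copies of the test string to FILE, one per line.
# (gen_rows is a hypothetical helper, not part of Hive or Hadoop.)
gen_rows() {
  yes "asdfasdfasdfasdfasdfasdfasdfasdfasd" | head -n "$1" > "$2"
}

# Usage with the path from this message (the full-size file is about 3.4G):
#   gen_rows 100000000 /tmp/warehouse/ts/input
```

A much smaller COUNT is handy for a first sanity check before generating the full-size file.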


//running the test
{code}
hive> select s from ts limit 5;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Execution log at: /tmp/thejas/.log
Job running in-process (local Hadoop)
Hadoop job information for null: number of mappers: 0; number of reducers: 0
2013-09-25 09:47:25,276 null map = 0%,  reduce = 0%
2013-09-25 09:47:28,278 null map = 100%,  reduce = 0%
Ended Job = job_local_0001
Execution completed successfully
Mapred Local Task Succeeded . Convert the Join into MapJoin
OK
asdfasdfasdfasdfasdfasdfasdfasdfasd
asdfasdfasdfasdfasdfasdfasdfasdfasd
asdfasdfasdfasdfasdfasdfasdfasdfasd
asdfasdfasdfasdfasdfasdfasdfasdfasd
asdfasdfasdfasdfasdfasdfasdfasdfasd
Time taken: 14.622 seconds, Fetched: 5 row(s)


hive> select s from ts limit 5;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Execution log at: /tmp/thejas/.log
Job running in-process (local Hadoop)
Hadoop job information for null: number of mappers: 0; number of reducers: 0
2013-09-25 09:58:00,492 null map = 0%,  reduce = 0%
2013-09-25 09:58:03,493 null map = 100%,  reduce = 0%
Ended Job = job_local_0001
Execution completed successfully
Mapred Local Task Succeeded . Convert the Join into MapJoin
OK
asdfasdfasdfasdfasdfasdfasdfasdfasd
asdfasdfasdfasdfasdfasdfasdfasdfasd
asdfasdfasdfasdfasdfasdfasdfasdfasd
asdfasdfasdfasdfasdfasdfasdfasdfasd
asdfasdfasdfasdfasdfasdfasdfasdfasd
Time taken: 11.825 seconds, Fetched: 5 row(s)

{code}
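
To check the reported symptom (rows on the first run, none on the second) without reading the output by eye, the same query can be run twice and the row counts compared. This is a rough sketch: rows_twice is a hypothetical helper, and it assumes the command prints one result row per line on stdout.

```shell
# rows_twice CMD [ARGS...]: run the same command twice and print both stdout line counts.
# Returns 0 when both runs print the same number of lines, 1 otherwise.
# (rows_twice is a hypothetical helper introduced for illustration.)
rows_twice() {
  first=$("$@" | wc -l | tr -d ' ')
  second=$("$@" | wc -l | tr -d ' ')
  echo "first=$first second=$second"
  [ "$first" -eq "$second" ]
}

# Usage with the Hive CLI in silent mode (-S) and an inline query (-e):
#   rows_twice hive -S -e 'select s from ts limit 5'
```

In the scenario described in the report, the second count would come back as 0 and the function would return 1.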




                
> Subsequent use of Mapper yields 0 results
> -----------------------------------------
>
>                 Key: HIVE-5273
>                 URL: https://issues.apache.org/jira/browse/HIVE-5273
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.12.0, 0.13.0
>            Reporter: Mike Lewis
>            Priority: Blocker
>
> First noticed this when using local task tracker (and is easiest to reproduce 
> with it).
> Created a table with one column (uuid).  Ran
> {code}
> SELECT uuid FROM test_foo LIMIT 5;
> {code}
> Results are as expected:
> {code}
> ace7265d-49bf-4c11-af67-0cd0a33c690e
> ace7265d-49bf-4c11-af67-0cd0a33c690e
> ace7265d-49bf-4c11-af67-0cd0a33c690e
> ace7265d-49bf-4c11-af67-0cd0a33c690e
> ace7265d-49bf-4c11-af67-0cd0a33c690e
> Time taken: 40.172 seconds, Fetched: 5 row(s)
> {code}
> Then I run it again.
> The results are not as expected:
> {code}
> Time taken: 55.498 seconds
> {code}
> The table I am querying is
> {code}
> hive> describe extended test_foo;
> OK
> uuid                  string                  None                
>                
> Detailed Table Information    Table(tableName:test_foo, dbName:default, 
> owner:lewis, createTime:1378934838, lastAccessTime:0, retention:0, 
> sd:StorageDescriptor(cols:[FieldSchema(name:uuid, type:string, 
> comment:null)], 
> location:hdfs://gun1.sjc1c.square:8020/user/hive/warehouse/test_foo, 
> inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat, 
> outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat, 
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
> parameters:{serialization.format=1}), bucketCols:[], sortCols:[], 
> parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], 
> skewedColValueLocationMaps:{}), storedAsSubDirectories:false), 
> partitionKeys:[], parameters:{numPartitions=0, numFiles=37, 
> transient_lastDdlTime=1378934838, numRows=0, totalSize=44600654909, 
> rawDataSize=0}, viewOriginalText:null, viewExpandedText:null, 
> tableType:MANAGED_TABLE) 
> {code}
> With non-local tasktracker subsequent queries work, but when doing a 
> {{count(* )}} over a large data set, 0.12.0 returns only a subset of results 
> that 0.10.0 returns.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
