Daniel Papp created HIVE-17487:
----------------------------------
Summary: Example fails on the Hive Getting started page
Key: HIVE-17487
URL: https://issues.apache.org/jira/browse/HIVE-17487
Project: Hive
Issue Type: Bug
Reporter: Daniel Papp
Priority: Trivial
There is an example on the [Hive Getting
Started|https://cwiki.apache.org/confluence/display/Hive/GettingStarted] page
that uses the MovieLens100k dataset. The mapper is defined as a Python script
as follows:
{code}
import sys
import datetime
for line in sys.stdin:
    line = line.strip()
    userid, movieid, rating, unixtime = line.split('\t')
    weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
    print '\t'.join([userid, movieid, rating, str(weekday)])
{code}
This works only with the Python 2 series; under Python 3 the print statement
is a syntax error. The following code works with both the 2 and 3 series:
{code}
from __future__ import print_function
import sys
import datetime
for line in sys.stdin:
    line = line.strip()
    userid, movieid, rating, unixtime = line.split('\t')
    weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
    print('\t'.join([userid, movieid, rating, str(weekday)]))
{code}
I think the example on the wiki page should be replaced with this version.
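For anyone who wants to check the mapper outside of Hive, here is a minimal sketch that pulls the per-line logic into a function so it can be run on a single row under either Python series. The function name map_line and the sample row are only illustrative and are not part of the wiki example. Note that fromtimestamp uses the local time zone, so the weekday value can differ between machines.

```python
from __future__ import print_function
import datetime


def map_line(line):
    """Convert one tab-separated rating row into userid, movieid, rating, weekday."""
    userid, movieid, rating, unixtime = line.strip().split('\t')
    # isoweekday() returns 1 (Monday) through 7 (Sunday), in local time.
    weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
    return '\t'.join([userid, movieid, rating, str(weekday)])


# Illustrative sample row in the userid/movieid/rating/unixtime layout.
sample = '196\t242\t3\t881250949\n'
print(map_line(sample))
```

Running this with both python2 and python3 should print the same four tab-separated fields, which is a quick way to confirm the script is version-independent before using it in a TRANSFORM query.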
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)