Attached are the task logs of one of the tasks.
Saurabh.
On Wed, Dec 30, 2009 at 12:16 PM, Zheng Shao <[email protected]> wrote:
> This should be compiled into a single map-only job.
> Can you take a look at the progress and the task logs of the job?
>
> We are not aware of any changes that might cause this problem.
>
> Zheng
>
> On Tue, Dec 29, 2009 at 10:35 PM, Saurabh Nanda <[email protected]>
> wrote:
> > Picking up data from the 'raw' table, filtering the unwanted lines and
> > inserting into 'raw_compressed' table which is stored as sequencefile:
> >
> > insert overwrite table raw_compressed partition(dt='2009-04-01') select line
> > from raw where dt='2009-04-01' and lower(line) rlike
> > '.*get .*/confirmation.*http.*' and not lower(line) rlike
> > '(/images.*?|/styles.*?|/javascripts.*?|/adserver.*?|.*?favicon.*?|/includes/thwarte-logo.html.*)';
> >
> > Saurabh.
> >
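A quick way to sanity-check the filter predicates in the query above (a sketch reusing the same table, partition, and patterns from the thread) is to count the surviving rows before running the full INSERT OVERWRITE:

```sql
-- Sketch: count rows that pass the same predicates, to gauge
-- filter selectivity before writing anything to raw_compressed.
SELECT count(1)
FROM raw
WHERE dt = '2009-04-01'
  AND lower(line) RLIKE '.*get .*/confirmation.*http.*'
  AND NOT lower(line) RLIKE '(/images.*?|/styles.*?|/javascripts.*?|/adserver.*?|.*?favicon.*?|/includes/thwarte-logo.html.*)';
```

If this count finishes quickly, the slowness is more likely in the write path; if it crawls, the per-row regex evaluation is the bottleneck.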
> > On Wed, Dec 30, 2009 at 11:59 AM, Zheng Shao <[email protected]> wrote:
> >>
> >> What is the import query? Do you mean "load data"?
> >> Can you give an example?
> >>
> >> Zheng
> >>
> >> On Tue, Dec 29, 2009 at 10:22 PM, Saurabh Nanda <[email protected]>
> >> wrote:
> >> > Also, has something changed drastically in Hive over the last 2-3
> >> > months? A simple import query seems to be taking forever now!
> >> >
> >> > Saurabh.
> >> >
> >> > On Wed, Dec 30, 2009 at 11:48 AM, Saurabh Nanda <[email protected]>
> >> > wrote:
> >> >>
> >> >> I'm taking a look at the HDFS directories through the web interface
> >> >> and I can see only 5 files there, not 6. I tried creating the
> >> >> partition using the ADD PARTITION command. After that all 6 files
> >> >> get imported successfully.
> >> >>
> >> >> Saurabh.
> >> >>
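The explicit-partition workaround described above can be sketched as follows. The table name and partition value are taken from the thread's query; the HDFS input path is hypothetical, for illustration only:

```sql
-- Create the partition metadata explicitly, then load files into it.
-- '/logs/2009-04-01' is a hypothetical path, not from the thread.
ALTER TABLE raw ADD PARTITION (dt = '2009-04-01');
LOAD DATA INPATH '/logs/2009-04-01' INTO TABLE raw PARTITION (dt = '2009-04-01');
```

LOAD DATA with a PARTITION clause is normally expected to create the partition implicitly, which is why the missing-file behavior reported here looks like a regression rather than intended behavior.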
> >> >> On Wed, Dec 30, 2009 at 11:39 AM, Zheng Shao <[email protected]>
> >> >> wrote:
> >> >>>
> >> >>> Can you list the HDFS directories? Are the files in the
> >> >>> corresponding directories yet?
> >> >>>
> >> >>>
> >> >>> Zheng
> >> >>>
> >> >>> On Tue, Dec 29, 2009 at 9:57 PM, Saurabh Nanda
> >> >>> <[email protected]>
> >> >>> wrote:
> >> >>> > Hi,
> >> >>> >
> >> >>> > I'm revisiting Hive after a long hiatus, so I may not be aware of
> >> >>> > any new developments. I had written a script some time back to
> >> >>> > import webserver logs for a day into a new partition. The same
> >> >>> > script now running on the latest version of Hive (r894548
> >> >>> > compiled off trunk) seems to be misbehaving.
> >> >>> >
> >> >>> > I'm importing about 6 files into each partition. However, after
> >> >>> > the script ends, only 5 files show up in each partition. Do I
> >> >>> > need to explicitly issue the ADD PARTITION command before loading
> >> >>> > data? Isn't the partition implicitly created?
> >> >>> >
> >> >>> > Saurabh.
> >> >>> > --
> >> >>> > http://nandz.blogspot.com
> >> >>> > http://foodieforlife.blogspot.com
> >> >>> >
> >> >>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> Yours,
> >> >>> Zheng
> >> >>
> >> >
> >> >
> >>
> >>
> >>
> >
> >
> >
> >
>
>
>
>
--
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
2009-12-30 12:10:34,318 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=MAP, sessionId=
2009-12-30 12:10:34,413 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2009-12-30 12:10:34,636 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded
the native-hadoop library
2009-12-30 12:10:34,638 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory:
Successfully loaded & initialized native-zlib library
2009-12-30 12:10:34,937 INFO ExecMapper: maximum memory = 416219136
2009-12-30 12:10:34,937 INFO ExecMapper: conf classpath =
[file:/home/ct-admin/hadoop/hadoop-0.18.3/conf/,
file:/usr/lib/jvm/java-6-sun-1.6.0.10/lib/tools.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/,
file:/home/ct-admin/hadoop/hadoop-0.18.3/hadoop-0.18.3-core.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/commons-cli-2.0-SNAPSHOT.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/commons-codec-1.3.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/commons-httpclient-3.0.1.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/commons-logging-1.0.4.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/commons-logging-api-1.0.4.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/commons-net-1.4.1.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/jets3t-0.6.0.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/jetty-5.1.4.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/junit-3.8.1.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/kfs-0.1.3.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/log4j-1.2.15.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/oro-2.0.8.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/servlet-api.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/slf4j-api-1.4.3.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/slf4j-log4j12-1.4.3.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/xmlenc-0.52.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/jetty-ext/commons-el.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/jetty-ext/jasper-compiler.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/jetty-ext/jasper-runtime.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/jetty-ext/jsp-api.jar,
file:/home/ct-admin/hadoop/tmp/mapred/local/taskTracker/jobcache/job_200912291955_0022/attempt_200912291955_0022_m_000000_0/work/,
file:/home/ct-admin/hadoop/tmp/mapred/local/taskTracker/jobcache/job_200912291955_0022/jars/classes,
file:/home/ct-admin/hadoop/tmp/mapred/local/taskTracker/jobcache/job_200912291955_0022/jars/,
file:/home/ct-admin/hadoop/tmp/mapred/local/taskTracker/jobcache/job_200912291955_0022/attempt_200912291955_0022_m_000000_0/work/]
2009-12-30 12:10:34,938 INFO ExecMapper: thread classpath =
[file:/home/ct-admin/hadoop/hadoop-0.18.3/conf/,
file:/usr/lib/jvm/java-6-sun-1.6.0.10/lib/tools.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/,
file:/home/ct-admin/hadoop/hadoop-0.18.3/hadoop-0.18.3-core.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/commons-cli-2.0-SNAPSHOT.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/commons-codec-1.3.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/commons-httpclient-3.0.1.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/commons-logging-1.0.4.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/commons-logging-api-1.0.4.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/commons-net-1.4.1.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/jets3t-0.6.0.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/jetty-5.1.4.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/junit-3.8.1.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/kfs-0.1.3.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/log4j-1.2.15.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/oro-2.0.8.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/servlet-api.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/slf4j-api-1.4.3.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/slf4j-log4j12-1.4.3.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/xmlenc-0.52.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/jetty-ext/commons-el.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/jetty-ext/jasper-compiler.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/jetty-ext/jasper-runtime.jar,
file:/home/ct-admin/hadoop/hadoop-0.18.3/lib/jetty-ext/jsp-api.jar,
file:/home/ct-admin/hadoop/tmp/mapred/local/taskTracker/jobcache/job_200912291955_0022/attempt_200912291955_0022_m_000000_0/work/,
file:/home/ct-admin/hadoop/tmp/mapred/local/taskTracker/jobcache/job_200912291955_0022/jars/classes,
file:/home/ct-admin/hadoop/tmp/mapred/local/taskTracker/jobcache/job_200912291955_0022/jars/,
file:/home/ct-admin/hadoop/tmp/mapred/local/taskTracker/jobcache/job_200912291955_0022/attempt_200912291955_0022_m_000000_0/work/]
2009-12-30 12:10:35,006 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Adding
alias raw to work list for file
/user/hive/warehouse/raw/dt=2009-04-01/20090402000000-172.16.1.61-access.log.gz
2009-12-30 12:10:35,009 INFO org.apache.hadoop.hive.ql.exec.MapOperator: dump
TS struct<line:string,dt:string>
2009-12-30 12:10:35,009 INFO ExecMapper:
<MAP>Id =5
<Children>
<TS>Id =0
<Children>
<FIL>Id =1
<Children>
<FIL>Id =2
<Children>
<SEL>Id =3
<Children>
<FS>Id =4
<Parent>Id = 3 null<\Parent>
<\FS>
<\Children>
<Parent>Id = 2 null<\Parent>
<\SEL>
<\Children>
<Parent>Id = 1 null<\Parent>
<\FIL>
<\Children>
<Parent>Id = 0 null<\Parent>
<\FIL>
<\Children>
<Parent>Id = 5 null<\Parent>
<\TS>
<\Children>
<\MAP>
2009-12-30 12:10:35,009 INFO org.apache.hadoop.hive.ql.exec.MapOperator:
Initializing Self 5 MAP
2009-12-30 12:10:35,010 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator:
Initializing Self 0 TS
2009-12-30 12:10:35,010 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator:
Operator 0 TS initialized
2009-12-30 12:10:35,010 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator:
Initializing children of 0 TS
2009-12-30 12:10:35,010 INFO org.apache.hadoop.hive.ql.exec.FilterOperator:
Initializing child 1 FIL
2009-12-30 12:10:35,010 INFO org.apache.hadoop.hive.ql.exec.FilterOperator:
Initializing Self 1 FIL
2009-12-30 12:10:35,015 INFO org.apache.hadoop.hive.ql.exec.FilterOperator:
Operator 1 FIL initialized
2009-12-30 12:10:35,015 INFO org.apache.hadoop.hive.ql.exec.FilterOperator:
Initializing children of 1 FIL
2009-12-30 12:10:35,015 INFO org.apache.hadoop.hive.ql.exec.FilterOperator:
Initializing child 2 FIL
2009-12-30 12:10:35,015 INFO org.apache.hadoop.hive.ql.exec.FilterOperator:
Initializing Self 2 FIL
2009-12-30 12:10:35,015 INFO org.apache.hadoop.hive.ql.exec.FilterOperator:
Operator 2 FIL initialized
2009-12-30 12:10:35,015 INFO org.apache.hadoop.hive.ql.exec.FilterOperator:
Initializing children of 2 FIL
2009-12-30 12:10:35,015 INFO org.apache.hadoop.hive.ql.exec.SelectOperator:
Initializing child 3 SEL
2009-12-30 12:10:35,015 INFO org.apache.hadoop.hive.ql.exec.SelectOperator:
Initializing Self 3 SEL
2009-12-30 12:10:35,015 INFO org.apache.hadoop.hive.ql.exec.SelectOperator:
SELECT struct<line:string,dt:string>
2009-12-30 12:10:35,015 INFO org.apache.hadoop.hive.ql.exec.SelectOperator:
Operator 3 SEL initialized
2009-12-30 12:10:35,015 INFO org.apache.hadoop.hive.ql.exec.SelectOperator:
Initializing children of 3 SEL
2009-12-30 12:10:35,015 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator:
Initializing child 4 FS
2009-12-30 12:10:35,015 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator:
Initializing Self 4 FS
2009-12-30 12:10:35,020 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator:
Writing to temp file: FS
hdfs://master-hadoop:8020/tmp/hive-ct-admin/297834503/_tmp.10002/_tmp.attempt_200912291955_0022_m_000000_0
2009-12-30 12:10:35,072 INFO org.apache.hadoop.io.compress.CodecPool: Got
brand-new compressor
2009-12-30 12:10:35,095 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator:
Operator 4 FS initialized
2009-12-30 12:10:35,096 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator:
Initialization Done 4 FS
2009-12-30 12:10:35,096 INFO org.apache.hadoop.hive.ql.exec.SelectOperator:
Initialization Done 3 SEL
2009-12-30 12:10:35,096 INFO org.apache.hadoop.hive.ql.exec.FilterOperator:
Initialization Done 2 FIL
2009-12-30 12:10:35,096 INFO org.apache.hadoop.hive.ql.exec.FilterOperator:
Initialization Done 1 FIL
2009-12-30 12:10:35,096 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator:
Initialization Done 0 TS
2009-12-30 12:10:35,096 INFO org.apache.hadoop.hive.ql.exec.MapOperator:
Initialization Done 5 MAP
2009-12-30 12:10:35,116 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 5
forwarding 1 rows
2009-12-30 12:10:35,116 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator:
0 forwarding 1 rows
2009-12-30 12:10:35,208 INFO ExecMapper: ExecMapper: processing 1 rows: used
memory = 2567664
2009-12-30 12:10:35,662 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 5
forwarding 10 rows
2009-12-30 12:10:35,662 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator:
0 forwarding 10 rows
2009-12-30 12:10:35,741 INFO ExecMapper: ExecMapper: processing 10 rows: used
memory = 2044328
2009-12-30 12:10:40,223 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 5
forwarding 100 rows
2009-12-30 12:10:40,223 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator:
0 forwarding 100 rows
2009-12-30 12:10:40,290 INFO ExecMapper: ExecMapper: processing 100 rows: used
memory = 2440872
2009-12-30 12:10:44,215 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 1
forwarding 1 rows
2009-12-30 12:10:44,247 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2
forwarding 1 rows
2009-12-30 12:10:44,247 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3
forwarding 1 rows
2009-12-30 12:11:21,327 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 1
forwarding 10 rows
2009-12-30 12:11:21,378 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2
forwarding 10 rows
2009-12-30 12:11:21,378 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3
forwarding 10 rows
2009-12-30 12:11:43,087 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 5
forwarding 1000 rows
2009-12-30 12:11:43,087 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator:
0 forwarding 1000 rows
2009-12-30 12:11:43,210 INFO ExecMapper: ExecMapper: processing 1000 rows: used
memory = 2469904
2009-12-30 12:20:30,437 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 5
forwarding 10000 rows
2009-12-30 12:20:30,437 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator:
0 forwarding 10000 rows
2009-12-30 12:20:30,477 INFO ExecMapper: ExecMapper: processing 10000 rows:
used memory = 2358968
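The widening gaps between the row-count milestones above (1,000 rows forwarded at 12:11 vs 10,000 at 12:20) suggest most of the time is going into the per-row RLIKE evaluations rather than I/O. Since Hive's RLIKE matches any substring of the input, the leading and trailing `.*` wildcards are redundant and can make the Java regex engine do extra backtracking. A hedged rewrite of the filter is sketched below; it is intended to preserve the original query's filtering behavior, but that equivalence is an assumption and should be verified on a sample of the data:

```sql
-- Sketch: the same filters without redundant leading/trailing '.*'
-- (Hive RLIKE already matches any substring). Assumed, not verified,
-- to be semantically equivalent to the original predicates.
insert overwrite table raw_compressed partition(dt='2009-04-01')
select line
from raw
where dt = '2009-04-01'
  and lower(line) rlike 'get .*/confirmation.*http'
  and not lower(line) rlike '/images|/styles|/javascripts|/adserver|favicon|/includes/thwarte-logo\\.html';
```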