JOIN + LATERAL VIEW works, but with MAPJOIN I no longer get any results

2012-05-22 Thread Ruben de Vries
Okay, first off: I know JOIN + LATERAL VIEW together isn't working, so I moved my
JOIN into a subquery and that makes the query work properly.

However, when I add a MAPJOIN hint to the JOIN in the subquery, the main query
also stops running its reducer!
This only happens when there's a LATERAL VIEW in there; if I remove the
LATERAL VIEW, the main query still gets a reducer to do the grouping.
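
For readers without the gist handy, here is a minimal sketch of the query shape being described; the table and column names (events, users, tags) are made up for illustration:

select tag, count(*) as cnt
from (
    select /*+ MAPJOIN(u) */ e.tags
    from events e
    join users u on (e.user_id = u.id)
) joined
lateral view explode(joined.tags) t as tag
group by tag;

With the MAPJOIN hint present, the outer group-by reportedly loses its reduce stage; without the hint (or without the LATERAL VIEW) it runs normally.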

Here's a gist containing the queries and a PHP script you can run to execute
the test case I'm using: https://gist.github.com/2499436 It does the following:
* set up a database called hive_mapjoin
* set up the tables
* load some test data
* run the selects

You'll need json-serde-1.1-jar-with-dependencies.jar from
https://github.com/rcongiu/Hive-JSON-Serde/downloads and will have to change
the path to it in the script.
Looking at the queries, you guys can probably figure out a better test case,
but maybe it's helpful.

Not sure if this is a bug or me doing something that just isn't supposed to
work, but I can't seem to find any pointers that this wouldn't be
supported...

Here's another gist with the plan.xml: https://gist.github.com/2499658

I've also created a ticket in JIRA, but it doesn't seem to be getting any
attention: https://issues.apache.org/jira/browse/HIVE-2992


Greetz, Ruben de Vries



start hive cli error

2012-05-22 Thread Dimboo Zhu
hi there,

I got the following stack trace when starting up the Hive CLI. It worked
well last week when I first installed it.
Can anybody help? Thanks,

Dianbau

[dzhu@bbdw-194 bin]$ ./hive
Logging initialized using configuration in
jar:file:/local/dzhu/hadoop/hive-0.8.1-bin/lib/hive-common-0.8.1.jar!/hive-log4j.properties
Hive history file=/tmp/dzhu/hive_job_log_dzhu_201205181300_1495802688.txt
Exception in thread "main" java.io.UnsupportedEncodingException: GB2312
at sun.nio.cs.StreamEncoder.forOutputStreamWriter(StreamEncoder.java:42)
at java.io.OutputStreamWriter.<init>(OutputStreamWriter.java:83)
at jline.ConsoleReader.<init>(ConsoleReader.java:174)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:649)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:554)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)


Re: start hive cli error

2012-05-22 Thread Nitin Pawar
The error is due to the default encoding.
Hive supports UTF-8 based encoding, but somehow your Hive setup is picking
up GB2312.

Can you provide the output of the locale command?

Thanks,
Nitin

On Tue, May 22, 2012 at 3:17 PM, Dimboo Zhu dianbo@gmail.com wrote:


-- 
Nitin Pawar


from-insert-select trouble

2012-05-22 Thread Avdeev V . M .
Hello!

I'm very new to the world of Hadoop and Hive, so I cannot solve a problem that
I've encountered.

Hadoop has been deployed on a single node in pseudo-distributed mode.
I'm trying to copy data from one table to another. The source table was created
by Sqoop; the destination table was created by this query:



create table if not exists rev0.operation_list (
id bigint,
id_paper bigint,

lgot_code int,
id_region int,
id_tarif_type int,
id_annulate int,
id_from int,
id_to int,
id_train int,
id_emitent int,
id_carriage int,
id_place int,
id_ticket_type int,

sell_date string,
trip_date string,

amount int,
cash int,
ticket_count int,
price_tarif_place int,
price_tarif_transfer int,
km float,
passengers int,
pkm float)
PARTITIONED BY(id_sell_date string)
stored as RCFILE;



The source table contains about 23 000 000 rows. When I try to execute



set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

from rev0.operation_list_temp
insert overwrite table rev0.operation_list PARTITION(id_sell_date)
select
id,
id_paper,
lgot_code,
id_region,
id_tarif_type,
id_annulate,
id_from,
id_to,
id_train,
id_emitent,
id_carriage,
id_place,
id_ticket_type,

sell_date,
trip_date,

amount,
cash,
ticket_count,
price_tarif_place,
price_tarif_transfer,
km,
passengers,
pkm,

to_date(sell_date) id_sell_date;



I see a strange progress report:



Hive history file=/tmp/user/hive_job_log_user_201205221419_1856534995.txt
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201205191141_0110, Tracking URL = 
http://localhost:50030/jobdetails.jsp?jobid=job_201205191141_0110
Kill Command = /usr/lib/hadoop/bin/hadoop job  
-Dmapred.job.tracker=localhost:8021 -kill job_201205191141_0110
2012-05-22 14:19:59,092 Stage-1 map = 0%,  reduce = 0%
2012-05-22 14:21:00,000 Stage-1 map = 0%,  reduce = 0%
2012-05-22 14:21:46,527 Stage-1 map = 13%,  reduce = 0%
2012-05-22 14:21:52,664 Stage-1 map = 41%,  reduce = 0%
2012-05-22 14:22:53,357 Stage-1 map = 41%,  reduce = 0%
2012-05-22 14:23:06,747 Stage-1 map = 63%,  reduce = 0%
2012-05-22 14:23:28,409 Stage-1 map = 75%,  reduce = 0%
2012-05-22 14:24:29,322 Stage-1 map = 75%,  reduce = 0%
2012-05-22 14:25:28,276 Stage-1 map = 88%,  reduce = 0%
2012-05-22 14:25:31,397 Stage-1 map = 50%,  reduce = 0% -- my comment: 88% drops to 50%!
2012-05-22 14:26:32,332 Stage-1 map = 50%,  reduce = 0%
2012-05-22 14:27:02,701 Stage-1 map = 63%,  reduce = 0%
2012-05-22 14:28:03,314 Stage-1 map = 63%,  reduce = 0%
2012-05-22 14:28:21,919 Stage-1 map = 75%,  reduce = 0%
2012-05-22 14:29:22,023 Stage-1 map = 75%,  reduce = 0%
2012-05-22 14:30:22,081 Stage-1 map = 75%,  reduce = 0%
2012-05-22 14:30:32,182 Stage-1 map = 88%,  reduce = 0%
2012-05-22 14:30:34,227 Stage-1 map = 50%,  reduce = 0% -- my comment: again!
2012-05-22 14:31:34,948 Stage-1 map = 50%,  reduce = 0%
2012-05-22 14:32:01,198 Stage-1 map = 63%,  reduce = 0%
2012-05-22 14:33:01,904 Stage-1 map = 63%,  reduce = 0%
2012-05-22 14:33:20,150 Stage-1 map = 75%,  reduce = 0%
2012-05-22 14:34:21,127 Stage-1 map = 75%,  reduce = 0%
2012-05-22 14:35:22,018 Stage-1 map = 75%,  reduce = 0%
2012-05-22 14:35:33,295 Stage-1 map = 88%,  reduce = 0%
2012-05-22 14:35:43,137 Stage-1 map = 50%,  reduce = 0% -- my comment: and again!
2012-05-22 14:36:44,057 Stage-1 map = 50%,  reduce = 0%
2012-05-22 14:37:17,486 Stage-1 map = 63%,  reduce = 0%
2012-05-22 14:38:18,116 Stage-1 map = 63%,  reduce = 0%
2012-05-22 14:38:36,327 Stage-1 map = 75%,  reduce = 0%
2012-05-22 14:39:36,936 Stage-1 map = 75%,  reduce = 0%
2012-05-22 14:40:37,660 Stage-1 map = 75%,  reduce = 0%
2012-05-22 14:40:41,731 Stage-1 map = 88%,  reduce = 0%
2012-05-22 14:40:43,759 Stage-1 map = 50%,  reduce = 0% -- my comment: the last one!
2012-05-22 14:40:47,815 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201205191141_0110 with errors
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.MapRedTask



I cannot understand why the process completed with an error, or why the
progress of the map phase is so strange.

I have found two workarounds:
1) Split the original query in two by adding 'WHERE to_date(sell_date) <
to_date(border_date)' and 'WHERE to_date(sell_date) >= to_date(border_date)'.
As a result each query handles about 11 500 000 rows, and the copying process
completes without errors.
2) On the other hand, changing 'stored as RCFILE' to 'stored as SEQUENCEFILE',
with no WHERE predicate, also makes the query complete without errors.
I have no idea about this behavior. Maybe I have not enough knowledge to
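
Both workarounds reduce per-task memory pressure: with dynamic partitions, each map task keeps one buffering RCFile writer open for every partition value it encounters. For reference, here is a sketch of another variant often suggested for this pattern (not taken from this thread): add DISTRIBUTE BY on the partition expression so all rows of a partition are routed to a single reducer, limiting how many writers are open at once.

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

from rev0.operation_list_temp
insert overwrite table rev0.operation_list PARTITION(id_sell_date)
select
    id, id_paper,
    lgot_code, id_region, id_tarif_type, id_annulate,
    id_from, id_to, id_train, id_emitent,
    id_carriage, id_place, id_ticket_type,
    sell_date, trip_date,
    amount, cash, ticket_count,
    price_tarif_place, price_tarif_transfer,
    km, passengers, pkm,
    to_date(sell_date) id_sell_date
distribute by to_date(sell_date);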

Re: Re: start hive cli error

2012-05-22 Thread dianbo . zhu
Hi Nitin,
I reinstalled and did not modify anything, but it still does not work. It worked
well when I first ran it months ago.

The output of the locale command is below:
LANG=zh_CN
LC_CTYPE=zh_CN
LC_NUMERIC=zh_CN
LC_TIME=zh_CN
LC_COLLATE=zh_CN
LC_MONETARY=zh_CN
LC_MESSAGES=zh_CN
LC_PAPER=zh_CN
LC_NAME=zh_CN
LC_ADDRESS=zh_CN
LC_TELEPHONE=zh_CN
LC_MEASUREMENT=zh_CN
LC_IDENTIFICATION=zh_CN
LC_ALL=

Thanks very much.




dianbo.zhu


AbstractMethodError while using serde

2012-05-22 Thread Sumit Kumar
Hi all,

I'm using the csv-serde code (https://github.com/ogrodnek/csv-serde) with
Hadoop 0.20.205 and Hive 0.7.1, and I'm running into the following issue:

2012-05-22 15:51:44,354 WARN org.apache.hadoop.mapred.Child: Error running
child
java.lang.RuntimeException: java.lang.AbstractMethodError:
com.bizo.hive.serde.csv.CSVSerde.getSerDeStats()Lorg/apache/hadoop/hive/serde2/SerDeStats;
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.AbstractMethodError:
com.bizo.hive.serde.csv.CSVSerde.getSerDeStats()Lorg/apache/hadoop/hive/serde2/SerDeStats;
at
org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:574)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:529)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
... 8 more

Any idea what could be going wrong? Have you ever faced this?
Surprisingly, this works fine on Amazon's EMR infrastructure but fails on my
local setup, which I built entirely from Apache releases. Any help would
be appreciated.

Regards,
-Sumit


Re[2]: from-insert-select trouble

2012-05-22 Thread Avdeev V . M .
Found.

2012-05-22 17:52:47,117 FATAL org.apache.hadoop.mapred.Child: Error running 
child : java.lang.OutOfMemoryError: Java heap space
 at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$Packet.<init>(DFSClient.java:2790)
 at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:3733)
 at 
org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:150)
 at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:100)
 at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
 at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
 at java.io.DataOutputStream.write(DataOutputStream.java:90)
 at org.apache.hadoop.hive.ql.io.RCFile$ValueBuffer.write(RCFile.java:450)
 at org.apache.hadoop.hive.ql.io.RCFile$Writer.flushRecords(RCFile.java:867)
 at org.apache.hadoop.hive.ql.io.RCFile$Writer.close(RCFile.java:884)
 at 
org.apache.hadoop.hive.ql.io.RCFileOutputFormat$2.close(RCFileOutputFormat.java:147)
 at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.abortWriters(FileSinkOperator.java:196)
 at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:653)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
 at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
 at org.apache.hadoop.mapred.Child.main(Child.java:264)

I will experiment with a smaller data set!
Thank you, Bejoy!

Tue, 22 May 2012 03:40:20 -0700 (PDT), from Bejoy Ks bejoy...@yahoo.com:

Hi Vyacheslav
Can you post the error log from your failed MapReduce tasks? You can
get the error logs from the JobTracker web UI by drilling down to the task
level. Those errors will give you a better understanding of what could be
going wrong here.

Regards
Bejoy

Re: Re[2]: from-insert-select trouble

2012-05-22 Thread Bejoy KS
Great, good catch. There isn't enough child heap available to process your data
volume. If you have free memory available, just increase the child JVM heap
(mapred.child.java.opts) and it may pass through as well.

Regards
Bejoy KS

Sent from handheld, please excuse typos.
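
For reference, a minimal sketch of that knob (the 1 GB value is only an assumption; size it to the memory actually free on the node, then re-run the insert):

set mapred.child.java.opts=-Xmx1024m;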


Re: Hive + Cassandra

2012-05-22 Thread Edward Capriolo
There are actually several storage handlers floating around in the wild
for Mongo, Cassandra, HyperTable, etc. Your best bet is referencing
the Cassandra ticket and building/using the Brisk/DSE code. Eventually
the code will make its way into Cassandra. It is a shame that the code has
been in production at numerous sites for over a year now, and is the
most-voted-for Hive issue, but has not landed in a release, thus
confusing everyone.

Look here for the latest code.

https://issues.apache.org/jira/browse/CASSANDRA-4131

On Tue, May 22, 2012 at 4:19 AM, Szymon Dąbrowski
szymon.dabrow...@gmail.com wrote:
 I've been trying to see if it's possible to combine Hive with
 Cassandra. I've noticed some issues about integration, but it seems
 to me there's some mix-up about who is to patch things up. There are
 two issues: one in the Hive project [1] and one in Cassandra [2]. The first
 is open, but it ends with a comment saying the patch has been sent
 to the second issue. The second issue is marked as closed, because
 it's marked as a duplicate of the first, the Hive issue. So I guess no one
 is going to look at the closed issue in the Cassandra project (and no one
 will apply the patch), and no one is going to do anything about it on the
 Hive side (because the patch has been submitted to Cassandra). Anyway,
 is there a chance this feature will be available soon?

 [1] - https://issues.apache.org/jira/browse/HIVE-1434
 [2] - https://issues.apache.org/jira/browse/CASSANDRA-913


 By the way, I am hoping to achieve lower Hive latency by using
 Cassandra, so that I get an online processing tool. Is there a chance
 that would be possible?


 --
 Szymon


Condition for doing a sort merge bucket map join

2012-05-22 Thread Bruce Bian
Hi,
I've got 7 large tables to join (each ~10G in size) into one table, all with
the same 2 join keys. I've read some documents on sort merge bucket map
join, but failed to trigger it.
I've bucketed all 7 tables into 20 buckets and sorted them by one of the
join keys, and set the following parameters while doing the join:
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
set
hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
What else am I missing? Do I have to bucket on both of the join keys (I'm
currently trying this)? And does each bucket file have to be smaller than
one HDFS block?
Thanks a lot.
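
For what it's worth, a sketch of the table setup that sort merge bucket joins generally expect: bucketing and sorting must cover the join keys actually used in the query (so for a two-key join, bucket and sort on both), the tables should have matching bucket counts, and they must be populated with bucketing and sorting enforced. Table and column names here are made up:

create table t1 (k1 bigint, k2 bigint, v string)
clustered by (k1, k2) sorted by (k1, k2) into 20 buckets
stored as rcfile;

set hive.enforce.bucketing = true;
set hive.enforce.sorting = true;
insert overwrite table t1 select k1, k2, v from t1_staging;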


Re: Condition for doing a sort merge bucket map join

2012-05-22 Thread Mark Grover
Hi Bruce,
Instead of joining 7 tables in the query, can you please start with 2
tables and see if that works? If it doesn't, feel free to paste your table
definitions and join query, along with any properties you are setting, and
folks on the mailing list can take a jab at it.


Mark
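
A sketch of the kind of two-table probe suggested above, reusing the hypothetical tables from the setup sketch in the previous message (t2 defined like t1):

set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

select /*+ MAPJOIN(b) */ a.k1, a.k2, a.v, b.v
from t1 a join t2 b on (a.k1 = b.k1 and a.k2 = b.k2);

If EXPLAIN shows a sorted merge bucket map join operator rather than a plain map join, the optimization fired.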



RCFile and UDF

2012-05-22 Thread Mohit Anchlia
I am new to Hive. Currently I am trying out a use case where we
write XML files into a sequence file. We then read the sequence file and
convert it into a more structured row/column format using a Pig UDF. This is
currently stored with Snappy compression.

Now what I want to do is use Hive to query the data and do a self join. But my
problem is that the file I need to query is in Snappy format, and Hive
deserializes the entire row, which I am trying to avoid. Is there a way I can
store the file in RCFile format when I store it using Pig?
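
One hedged route, if writing RCFile directly from Pig turns out not to be an option: define a Hive table over the existing files and rewrite the data once into an RCFile-backed table, so later queries deserialize only the columns they touch. The table and column names below are assumptions, and the external table definition would need whatever SerDe matches the actual file layout:

create external table events_seq (id string, payload string)
stored as sequencefile
location '/data/events_seq';

create table events_rc (id string, payload string)
stored as rcfile;

insert overwrite table events_rc
select id, payload from events_seq;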


protobuf 2.4.1 and ObjectInspector

2012-05-22 Thread kulkarni.swar...@gmail.com
I am trying to use the ReflectionStructObjectInspector to extract fields
from a protobuf class generated by the 2.4.1 compiler. I am seeing that
reflection fails to extract fields out of the generated protobuf class.
Specifically, this code snippet:

public static Field[] getDeclaredNonStaticFields(Class<?> c) {

    Field[] f = c.getDeclaredFields(); // This returns back the correct number of fields

    ArrayList<Field> af = new ArrayList<Field>();

    for (int i = 0; i < f.length; ++i) {

      // The logic here falls flat, as it is looking only for the
      // non-static fields and all generated fields seem to be static

      if (!Modifier.isStatic(f[i].getModifiers())) {
        af.add(f[i]);
      }
    }

    Field[] r = new Field[af.size()];

    for (int i = 0; i < af.size(); ++i) {
      r[i] = af.get(i);
    }

    return r;
}

This causes the whole ObjectInspector to fail. Has anyone else seen this
issue too?


Map side aggregations

2012-05-22 Thread Raghunath, Ranjith
I have the parameter hive.map.aggr set to true. However, when I look at the
counters associated with the map tasks, I notice "Combine input
records: 0". I am interpreting this as a failure to perform the map-side
aggregation. Is that accurate? Is this option not working in Hive 0.7.1?

Thanks,
Ranjith
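
One note that may help here: Hive's map-side aggregation is an in-memory hash table inside the map operator tree, not a MapReduce combiner, so "Combine input records: 0" does not by itself mean the aggregation was skipped. The query plan is a better place to check; a minimal sketch against a hypothetical table t:

set hive.map.aggr = true;
explain select col, count(*) from t group by col;
-- look for a map-stage Group By Operator with mode: hash

If the map stage shows a hash-mode Group By Operator, map-side aggregation is active.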





RE: Re: start hive cli error

2012-05-22 Thread Hezhiqiang (Ransom)
Could this be a problem with your Linux console?
Did you change the SecureCRT or PuTTY charset to "GB2312"?

Best regards
Ransom.
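
If the terminal charset checks out, a common workaround is to launch the CLI under a UTF-8 locale. A sketch (the exact locale name is an assumption; any *.UTF-8 locale installed on the machine should do):

[dzhu@bbdw-194 bin]$ export LANG=zh_CN.UTF-8
[dzhu@bbdw-194 bin]$ export LC_ALL=zh_CN.UTF-8
[dzhu@bbdw-194 bin]$ ./hive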



Re: Map side aggregations

2012-05-22 Thread Tucker, Matt
Try setting hive.auto.convert.join to true.  The CLI will have a local task 
before it starts a map-reduce job on the cluster.

Matt



On May 22, 2012, at 8:43 PM, Raghunath, Ranjith 
ranjith.raghuna...@usaa.commailto:ranjith.raghuna...@usaa.com wrote:






Want to give a short talk at the next Hive User Group meetup?

2012-05-22 Thread Carl Steinbach
Hi,

I just wanted to remind everyone that the next Hive User Group meetup is
happening on June 12th (the day before the Hadoop Summit) in San Jose. More
details about the meetup can be found on the Hive User Group page located
here:

http://www.meetup.com/Hive-User-Group-Meeting/events/62458462/

I also wanted to remind everyone that I'm looking for speakers for this
event. Our plan is to have people give short 15-minute talks on topics that
are relevant to the Hive community, and at this point I still have a
couple of slots left to fill. Please send me an email with your proposed topic
if you're interested in speaking.

Thanks.

Carl


Re: Map side aggregations

2012-05-22 Thread Ranjith
Thanks, Matt. I am not performing a join, so does that matter? What does this
local task do?

Thanks,
Ranjith

On May 22, 2012, at 8:17 PM, Tucker, Matt matt.tuc...@disney.com wrote:
