Re: Help with TimeSeries serde
Thanks, I've got it working. I am making a chart UDF to convert the BigInts from count(*) into strings of asterisks.

Now I am trying to find a way to select TimeSeries(count(*)) and group by day, but this results in an error:

select to_date(dt) as total, TimeSeries(CAST(count(*) AS INT)) as stars, count(*) as total
from from_to
group by to_date(dt)
order by to_date(dt);

FAILED: Error in semantic analysis: Line 5:16 Invalid table alias or column reference 'total': (possible column names are: from_address, to_address, dt)

I can't seem to select a to_date(dt) value and also group AND order by it. I need to order by it to get my time series :) Any ideas?

On Wed, May 30, 2012 at 9:58 PM, Aniket Mokashi aniket...@gmail.com wrote:

If this is a UDF, you will need hive-exec.jar to compile it. I am not sure what the use of this UDF is. A SerDe has the following interface:

public interface SerDe extends Deserializer, Serializer

~Aniket

On Wed, May 30, 2012 at 9:51 PM, Russell Jurney russell.jur...@gmail.com wrote:

I tried to make a simple SerDe that converts ints to * for a time series/distribution chart. https://github.com/rjurney/timeseriesserde

I get this error: https://gist.github.com/2841090

-- Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com

-- ...:::Aniket:::... Quetzalco@tl

-- Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com
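For what it's worth, the chart idea behind the thread — turning per-day counts into rows of asterisks, ordered by day — can be sketched outside Hive. This is a toy Python illustration with made-up daily counts, not the actual TimeSeries UDF:

```python
# Toy illustration of the "counts to asterisks" chart idea. The daily counts
# here are hypothetical, not data from the thread.
def bar(count, scale=1):
    """Render an integer count as a row of '*' characters."""
    return "*" * max(0, count // scale)

daily_counts = {"2012-05-28": 3, "2012-05-29": 7, "2012-05-30": 5}

# Ordering by day is what turns per-day counts into a time series.
for day in sorted(daily_counts):
    print(day, bar(daily_counts[day]))
```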
Re: How to execute query with timestamp type (Hbase/Hive integration)
Hi Mark, thanks for your response. I tried with other data types; it seems the issue occurs only when querying the timestamp field. Not sure how the timestamp mapping works... From HBase, I can seek the data, and the value of the timestamp (event_time) is: Wed May 30 16:15:06 CST 2012. Should I store the value as a long type?

Regards, Peyton

----- Original Message ----- From: Mark Grover Sent: Thursday, May 31, 2012 11:14 AM To: user@hive.apache.org Subject: Re: How to execute query with timestamp type (Hbase/Hive integration)

Hi Peyton, It seems like something to do with timestamp mapping. What happens if you change your Hive table definition to have the event_time as int or string? Mark

----- Original Message ----- From: Peyton Peng pengp...@yunyou.tv To: user@hive.apache.org Sent: Wednesday, May 30, 2012 5:54:20 AM Subject: Re: How to execute query with timestamp type (Hbase/Hive integration)

Actually I can execute the first sql and it works well; all the libs you specified are under the hive lib folder. I suspect the issue is caused by the timestamp mapping between hbase and hive..

Regards, Peyton

From: shashwat shriparv Sent: Wednesday, May 30, 2012 5:26 PM To: user@hive.apache.org Subject: Re: How to execute query with timestamp type (Hbase/Hive integration)

Add these files to the hive lib folder and then try:

hadoop-0.20-core.jar
hive/lib/hive-exec-0.7.1.jar
hive/lib/hive-jdbc-0.7.1.jar
hive/lib/hive-metastore-0.7.1.jar
hive/lib/hive-service-0.7.1.jar
hive/lib/libfb303.jar
lib/commons-logging-1.0.4.jar
slf4j-api-1.6.1.jar
slf4j-log4j12-1.6.1.jar

On Wed, May 30, 2012 at 2:23 PM, Peyton Peng pengp...@yunyou.tv wrote:

Hi, I built a hive table mapped to an hbase table:

CREATE TABLE http_access(key string, client_ip string, client_port int, request_method string, event_time timestamp)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
hbase.columns.mapping = :key,client:ip,client:port,request:method,request:event_time
);

The data is stored with the hbase client.
I get an issue while querying with hive (hbase/hive integration). When I execute the sql select ip, port, request_method from http_access it works with no problem, but when I execute select ip, port, event_time from http_access I get the exception below. The only difference between the two sqls is that event_time is of timestamp type. I can scan the corresponding hbase table and see the value of event_time is:

1338365792142 column=request:event_time, timestamp=1338365739818, value=Wed May 30 16:15:06 CST 2012

Anyone who knows what the issue is? (Not sure if I made a wrong mapping, or should I just store the timestamp value as long in hbase? Currently I store the value as java.util.Date.) Thank you very much.

Regards, Peyton

Exception tracking:

Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
hdfs://Server:9000/user/hive/warehouse/http_access
Starting Job = job_201205291421_0008, Tracking URL = http://Server:50030/jobdetails.jsp?jobid=job_201205291421_0008
Kill Command = /here is the hadoop_home/libexec/../bin/hadoop job -Dmapred.job.tracker=Server:9001 -kill job_201205291421_0008
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2012-05-30 16:28:01,572 Stage-1 map = 0%, reduce = 0%
2012-05-30 16:28:34,707 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201205291421_0008 with errors
Error during job, obtaining debugging information...
Examining task ID: task_201205291421_0008_m_02 (and more) from job job_201205291421_0008
Exception in thread Thread-211 java.lang.RuntimeException: Error while reading from task log url
at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:130)
at org.apache.hadoop.hive.ql.exec.JobDebugger.showJobFailDebugInfo(JobDebugger.java:211)
at org.apache.hadoop.hive.ql.exec.JobDebugger.run(JobDebugger.java:81)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: Server returned HTTP response code: 400 for URL: http://Server:50060/tasklog?taskid=attempt_201205291421_0008_m_00_1&start=-8193
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1305)
at java.net.URL.openStream(URL.java:1009)
at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:120)
... 3 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec

-- ∞ Shashwat Shriparv
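A side note on the format mismatch visible in the scan output above: the HBase cell holds Java's Date.toString() text ("Wed May 30 16:15:06 CST 2012"), while Hive's TIMESTAMP expects 'yyyy-MM-dd HH:mm:ss' text or an epoch-based numeric. A minimal Python sketch of converting the epoch-millisecond value from the scan into a Hive-parsable string (the conversion is an illustration, not code from the thread):

```python
from datetime import datetime, timezone

# Convert the epoch-millisecond value seen in the HBase scan into the
# 'yyyy-MM-dd HH:mm:ss' text form that Hive's TIMESTAMP can parse.
millis = 1338365792142  # epoch milliseconds, from the scan output above

hive_ts = datetime.fromtimestamp(millis / 1000, tz=timezone.utc).strftime(
    "%Y-%m-%d %H:%M:%S"
)
print(hive_ts)  # the same instant rendered in UTC
```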
Hive partition problem
hi, We set up a new Hive 0.9 client and found some SQL did not work. For example:

hive> create table t1(a int) partitioned by ( dt int );
OK
Time taken: 0.097 seconds
hive> load data local inpath '/tmp/t' into table t1 partition (dt=111);
Copying data from file:/tmp/t
Copying file: file:/tmp/t
Loading data to table default.t1 partition (dt=111)
OK
Time taken: 0.375 seconds
hive> show partitions t1;
OK
dt=111
Time taken: 0.108 seconds
hive> alter table t1 drop partition ( dt=111 );
FAILED: Error in semantic analysis: Partition not found dt = 111
hive> alter table t1 drop partition ( dt='111' );
FAILED: Error in semantic analysis: Partition not found dt = '111'

So, how to delete the partition?
Re: Hive partition problem
Hello wd, Try this one... I am not sure about this:

ALTER TABLE t1 DROP PARTITION(dt = '111')

-- Regards, Bhavesh Shah

On Thu, May 31, 2012 at 12:49 PM, wd w...@wdicc.com wrote:
> hi, We setup a new hive 0.9 client, Found some sql did not work ... So, how to delete the partition?
Re: Hive partition problem
Sorry for the previous reply. Actually you had already tried that one; I didn't see it. Really sorry.

On Thu, May 31, 2012 at 12:53 PM, Bhavesh Shah bhavesh25s...@gmail.com wrote:
> Hello wd, Try this one... I am not sure about this ALTER TABLE t1 DROP PARTITION(dt = '111') ...

-- Regards, Bhavesh Shah
Re: Hive partition problem
No problem, thanks for your reply. I'm very curious why this didn't work; this SQL comes from the Hive wiki. The metadata is stored in Postgres, does it matter?

On Thu, May 31, 2012 at 3:26 PM, Bhavesh Shah bhavesh25s...@gmail.com wrote:
> Sorry for previous reply. Actually you had try that one, I didn't saw that. Really Sorry. ...
Re: Hive partition problem
You should look at the Hive log and find the exact exception. That will give you a hint.

On Thu, May 31, 2012 at 12:33 AM, wd w...@wdicc.com wrote:
> No problem, thanks for your reply. I'm very curious why this didn't work, this sql come from hive wiki. The metadata is store in postgres, does it matter? ...

-- ...:::Aniket:::... Quetzalco@tl
Re: Hive partition problem
Nothing is output in the hive history file; is there another log file, or an option to output a detailed log?

On Thu, May 31, 2012 at 3:34 PM, Aniket Mokashi aniket...@gmail.com wrote:
> You should look at hive log and find exact exception. That will give you a hint. ...
Hive in IntelliJ
Hi, has anyone managed to get Hive properly set up in IntelliJ? I've tried but so far I've failed to get it running with Ivy and its dependencies. Pointers would be very appreciated. Cheers, Lars
Re: Hive partition problem
-hiveconf hive.root.logger=DEBUG,FA \
-hiveconf hive.log.dir=$logDir \
-hiveconf hive.log.file=$logFile \

On Thu, May 31, 2012 at 12:41 AM, wd w...@wdicc.com wrote:
> Nothing output in hive history file, is there an other log file or an option to output detail log ? ...

-- ...:::Aniket:::... Quetzalco@tl
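Assembled into a full invocation, the flags above would look something like this (a sketch: the log directory and file name are arbitrary placeholders, and the appender name after DEBUG must match one defined in your hive-log4j.properties):

```shell
# Hypothetical invocation with debug logging enabled; /tmp/hive-logs and
# hive-debug.log stand in for $logDir and $logFile.
hive -hiveconf hive.root.logger=DEBUG,FA \
     -hiveconf hive.log.dir=/tmp/hive-logs \
     -hiveconf hive.log.file=hive-debug.log
```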
Re: Hive partition problem
Is this a bug? Should I report a bug to hive?

On Thu, May 31, 2012 at 3:56 PM, wd w...@wdicc.com wrote:
> Still no useful output. But we found the problem. When the partition col type is int, it can't be droped. After change it to string, it can be droped.

On Thu, May 31, 2012 at 3:48 PM, Aniket Mokashi aniket...@gmail.com wrote:
> -hiveconf hive.root.logger=DEBUG,FA \ -hiveconf hive.log.dir=$logDir \ -hiveconf hive.log.file=$logFile \ ...

-- ...:::Aniket:::... Quetzalco@tl
Error in Hive execution on cluster : Wrong FS: hdfs
Hi, I am facing the below error when I fire any query in Hive. My Hive setup is present on the master node in my cluster. Hadoop is configured using IP addresses in the configuration xmls and in the master and slave files, and it is running fine. The error only arises when a hive query is executed that has its table location on HDFS. It seems hive is expecting the configuration to be done using hostnames. Please help me in configuring hive such that it can understand an IP-address-based configuration. I am using Hadoop 0.20.2 and Hive 0.7.1.

Regards, Puneet

hive> show tables;
OK
test_page_view
test_page_view_stg
Time taken: 131.309 seconds
hive> select * from test_page_view_stg;
FAILED: Hive Internal Error: java.lang.RuntimeException(Error while making MR scratch directory - check filesystem config (null))
java.lang.RuntimeException: Error while making MR scratch directory - check filesystem config (null)
at org.apache.hadoop.hive.ql.Context.getMRScratchDir(Context.java:196)
at org.apache.hadoop.hive.ql.Context.getMRTmpFileURI(Context.java:247)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:900)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:6594)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:340)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:736)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.IllegalArgumentException: Wrong FS: hdfs://IP_ADDRESS:9100/tmp/hive-hadoop/hive_2012-05-31_11-29-19_844_3368974040204630542, expected: hdfs://hostname.local:9100
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:310)
at org.apache.hadoop.hdfs.DistributedFileSystem.checkPath(DistributedFileSystem.java:99)
at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:222)
at org.apache.hadoop.hdfs.DistributedFileSystem.makeQualified(DistributedFileSystem.java:116)
at org.apache.hadoop.hive.ql.Context.getScratchDir(Context.java:146)
at org.apache.hadoop.hive.ql.Context.getMRScratchDir(Context.java:190)
... 14 more
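The exception itself points at the mismatch: Hive built a scratch path under hdfs://IP_ADDRESS:9100 while the filesystem client was initialized as hdfs://hostname.local:9100. One plausible fix, sketched below, is to make the filesystem URI authority consistent on every node (the hostname shown is a placeholder; fs.default.name is the key used by Hadoop 0.20-era configs):

```xml
<!-- Hypothetical core-site.xml fragment: use one consistent authority
     (either the hostname or the IP, but the same everywhere), matching
     what the NameNode reports. -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://hostname.local:9100</value>
</property>
```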
Hadoop Russia user group
Hi everyone, I've created a group on Linkedin for Russian-speaking folks. It is about Hadoop and its components (Hive, Pig, etc). http://www.linkedin.com/groups/Hadoop-Russia-4468740?gid=4468740 Thanks, Ruslan Al-Fakikh
RE: Hive partition problem
Hi, We used to use integer partitioning for 2 columns, year_month_day and hour. When it came to dropping one partition for an hour on a particular day, it seemed to drop all of the hour partitions for the day. IMO it's a bug. We just moved to using text.

Malc

-----Original Message----- From: wd [mailto:w...@wdicc.com] Sent: 31 May 2012 09:32 To: user@hive.apache.org Subject: Re: Hive partition problem

Is this a bug? Should I report a bug to hive?

On Thu, May 31, 2012 at 3:56 PM, wd w...@wdicc.com wrote:
> Still no useful output. But we found the problem. When the partition col type is int, it can't be droped. After change it to string, it can be droped. ...
Re: Hive partition problem
It's too bad. I've filed an issue in the Hive JIRA: https://issues.apache.org/jira/browse/HIVE-3069

On Thu, May 31, 2012 at 5:16 PM, Malcolm Tye malcolm@btinternet.com wrote:
> Hi, We used to use integer partitioning for 2 columns, year_month_day and hour. When it came to dropping 1 partition for an hour for a particular day, it seemed to drop all of the hour partitions for the day. IMO it's a bug. We just moved to using text. Malc ...
Re: confused on different behavior of Bucketized tables do not support INSERT INTO
I'm using hive 0.9.0

On Thursday, May 31, 2012, Bruce Bian wrote:

Hi, I've got a table vt_new_data which is defined as follows:

CREATE TABLE VT_NEW_DATA (
V_ACCOUNT_NUM string
,V_ACCOUNT_MODIFIER_NUM string
,V_DEPOSIT_TYPE_CD string
,V_DEPOSIT_TERM int
,V_LEDGER_SUBJECT_ID string
,V_ACCOUNTING_ORG_CD string
,V_OPEN_DT string
,V_CLOSE_DT string
,V_CURRENCY_CD string
,V_ACCOUNT_BAL float
,V_INNER_MONTH_DELAY_ACCUM float
)
CLUSTERED BY (V_ACCOUNT_NUM,V_ACCOUNT_MODIFIER_NUM)
SORTED BY (V_ACCOUNT_NUM,V_ACCOUNT_MODIFIER_NUM ASC)
INTO 256 BUCKETS
STORED AS RCFile;

When I execute the following query (this is just a test):

explain insert into table vt_new_data select * from vt_new_data limit 1;

a "FAILED: Error in semantic analysis: Bucketized tables do not support INSERT INTO: Table: vt_new_data" error occurs. But when I execute this query:

explain insert into table vt_new_data
select /*+ MAPJOIN(T4) */
t1.account_num as v_account_num
,t1.account_modifier_num as v_account_modifier_num
,'3006' as v_deposit_type_cd
,0 as v_deposit_term
,'23201000' v_ledger_subject_id
,coalesce(t2.party_id,'') as v_accounting_org_cd
,coalesce(t3.card_begin_dt,'19000101') as v_open_dt
,coalesce(t3.card_live_dt,'19000101') as v_close_dt
,coalesce(t4.currency_cd,substr(t1.account_modifier_num,3,3)) as v_currency_cd
,coalesce(t4.agt_amt,0) as v_account_bal
,0 as v_inner_month_delay_accum
from t03_e_cash_bucket t1
left outer join t03_agt_amount_h_bucket t4
on t1.account_num=t4.account_num and t1.account_modifier_num=t4.account_modifier_num
and t4.agt_amt_type_cd = '001' and t4.start_date <= '$TXNDATE' and t4.end_date > '$TXNDATE'
left outer join t01_party_card_rela_h_bucket t2
on t1.card_no = t2.card_no and t2.party_card_rela_type_cd = '01'
and t2.start_date <= '$TXNDATE' and t2.end_date > '$TXNDATE'
left outer join t03_card_bucket t3
on t1.card_no = t3.card_no;

the execution plan is generated successfully and triggers an SMB Map Join, which is great. But I don't see the difference here? As both are inserting into a bucketized and sorted table?
Re: confused on different behavior of Bucketized tables do not support INSERT INTO
So I did another test on this. hive create table test(foo int,bar string) clustered by(foo) sorted by (foo asc) into 2 buckets; OK Time taken: 0.097 seconds hive create table test2 (foo int,bar string) clustered by(foo) sorted by (foo asc) into 2 buckets; OK hive LOAD DATA LOCAL INPATH 'hive/examples/files/kv1.txt' OVERWRITE INTO TABLE test; hive set hive.enforce.bucketing=true; hive set hive.enforce.sorting=true; hive insert into table test2 select * from test; Total MapReduce jobs = 1 Launching Job 1 out of 1 …… hive insert into table test2 select * from test2; FAILED: Error in semantic analysis: Bucketized tables do not support INSERT INTO: Table: test2 Seems like the errorFAILED: Error in semantic analysis: Bucketized tables do not support INSERT INTO: Table: vt_new_data error is occurred is only thrown when insert into a bucketized table from the same table? And when insert into a bucketized table multi-times, it will create a original_file_copy_n under the same bucket. -rw-r--r-- 3 wbian supergroup 2856 2012-05-31 22:03 /user/hive/warehouse/test2/00_0 -rw-r--r-- 3 wbian supergroup 2856 2012-05-31 22:04 /user/hive/warehouse/test2/00_0_copy_1 -rw-r--r-- 3 wbian supergroup 2956 2012-05-31 22:03 /user/hive/warehouse/test2/01_0 -rw-r--r-- 3 wbian supergroup 2956 2012-05-31 22:04 /user/hive/warehouse/test2/01_0_copy_1 And since what I want to do is SMB Map Join, the following triggered the SMB Map Join successfully set hive.optimize.bucketmapjoin= true; set hive.optimize.bucketmapjoin.sortedmerge = true; set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat; select /*+mapjoin(test)*/ * from pokes join test on pokes.foo=test.foo; So what's the reason for throwing that error(i mean why not support insert into a bucketized table from the same table)?And isn't that error message kind of misleading? 
On Thu, May 31, 2012 at 6:43 PM, Bruce Bian weidong@gmail.com wrote: I'm using hive 0.9.0 On Thursday, May 31, 2012, Bruce Bian wrote: Hi, I've got a table vt_new_data which is defined as follows: CREATE TABLE VT_NEW_DATA ( V_ACCOUNT_NUM string ,V_ACCOUNT_MODIFIER_NUM string ,V_DEPOSIT_TYPE_CD string ,V_DEPOSIT_TERM int ,V_LEDGER_SUBJECT_ID string ,V_ACCOUNTING_ORG_CD string ,V_OPEN_DT string ,V_CLOSE_DT string ,V_CURRENCY_CD string ,V_ACCOUNT_BAL float ,V_INNER_MONTH_DELAY_ACCUM float ) CLUSTERED BY (V_ACCOUNT_NUM,V_ACCOUNT_MODIFIER_NUM) SORTED BY (V_ACCOUNT_NUM,V_ACCOUNT_MODIFIER_NUM ASC) INTO 256 BUCKETS STORED AS RCFile; when I execute the following query explain insert into table vt_new_data select * from vt_new_data limit 1; (this is just a test) an FAILED: Error in semantic analysis: Bucketized tables do not support INSERT INTO: Table: vt_new_data error is occurred but when I execute the query: explain insert into table vt_new_data select /*+ MAPJOIN(T4) */ t1.account_num as v_account_num ,t1.account_modifier_num as v_account_modifier_num ,'3006' as v_deposit_type_cd ,0 as v_deposit_term ,'23201000' v_ledger_subject_id ,coalesce(t2.party_id,'') as v_accounting_org_cd ,coalesce(t3.card_begin_dt,'19000101') as v_open_dt ,coalesce(t3.card_live_dt,'19000101') as v_close_dt ,coalesce(t4.currency_cd,substr(t1.account_modifier_num,3,3)) as v_currency_cd ,coalesce(t4.agt_amt,0) as v_account_bal ,0 as v_inner_month_delay_accum from t03_e_cash_bucket t1 left outer join t03_agt_amount_h_bucket t4 on t1.account_num=t4.account_num and t1.account_modifier_num=t4.account_modifier_num and t4.agt_amt_type_cd = '001' and t4.start_date='$TXNDATE' and t4.end_date'$TXNDATE' left outer join t01_party_card_rela_h_bucket t2 on t1.card_no = t2.card_no and t2.party_card_rela_type_cd = '01' and t2.start_date='$TXNDATE' and t2.end_date'$TXNDATE' left outer join t03_card_bucket t3 on t1.card_no = t3.card_no; the execution plan is generated successfully and triggered an SMB Map Join, 
which is great. But I don't see the difference here, as both are inserting into a bucketized and sorted table?
Parse Error with '-' in Hive
Hi there, I'm looking for help with the following error: I keep receiving 'Parse Error: line 40:4 cannot recognize input near '-' 'Platform' STRING column type.' It looks to be a problem with my JSON deserializer OR Hive not recognizing the - in the field names. So far I've tried commenting it out with / and //, and tried wrapping single and double quotes around the field name. Nothing has worked so far. My code looks like this:

add jar s3://sa/hive-json-serde-0.3.jar;

DROP TABLE IF EXISTS viewstream;
CREATE EXTERNAL TABLE IF NOT EXISTS viewstream (
  cl STRING,
  pid STRING,
  vp STRING,
  ar STRING,
  UArm STRING,
  dc STRING,
  host STRING,
  Blocked STRING,
  UA-Platform STRING,
  UA-Browser STRING,
  UA-Version STRING,
  UA-Language STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
LOCATION 'hdfs:///repo/viewstream/';

Anybody run into similar errors? I can't find anything on this problem online, strangely enough. Thanks all, Alex
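Hive identifiers can't contain a hyphen, so the parser stops at UA-Platform. A minimal sketch of the usual workaround, assuming underscore names (note the caveat about how the serde finds JSON keys):

```sql
-- Sketch: replace the hyphenated names with legal Hive identifiers.
-- Caveat: a serde that looks up JSON keys by column name will then find
-- no "ua_platform" key in the data and return NULL for these columns,
-- unless it supports an explicit column-to-key mapping.
CREATE EXTERNAL TABLE IF NOT EXISTS viewstream (
  cl          STRING,
  pid         STRING,
  ua_platform STRING,
  ua_browser  STRING,
  ua_version  STRING,
  ua_language STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
LOCATION 'hdfs:///repo/viewstream/';
```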
Re: Parse Error with '-' in Hive
I believe Hive column names can't have '-' in them. From what I know, this JSON serde uses column names as JSON expressions, which means that if you renamed the column you would end up with a null value for it. You might want to try a different JSON serde (e.g. the one Amazon uses in the EMR examples allows the JSON expression for a column to be passed in as a serde parameter). thanks, Shrikanth

On May 31, 2012, at 8:35 AM, Alex Barrera wrote:
Hi there, I'm looking for help with the following error: I keep receiving 'Parse Error: line 40:4 cannot recognize input near '-' 'Platform' STRING column type.' It looks to be a problem with my JSON deserializer OR Hive not recognizing the - in the field names. So far I've tried commenting it out with / and //, and tried wrapping single and double quotes around the field name. Nothing has worked so far. My code looks like this:

add jar s3://sa/hive-json-serde-0.3.jar;

DROP TABLE IF EXISTS viewstream;
CREATE EXTERNAL TABLE IF NOT EXISTS viewstream (
  cl STRING,
  pid STRING,
  vp STRING,
  ar STRING,
  UArm STRING,
  dc STRING,
  host STRING,
  Blocked STRING,
  UA-Platform STRING,
  UA-Browser STRING,
  UA-Version STRING,
  UA-Language STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
LOCATION 'hdfs:///repo/viewstream/';

Anybody run into similar errors? I can't find anything on this problem online, strangely enough. Thanks all, Alex
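The per-column mapping Shrikanth describes would look roughly like the sketch below. The property names and serde class here are purely illustrative (not the documented API of any specific serde); the point is that the table keeps legal underscore identifiers while the serde is told which hyphenated JSON key each column reads from:

```sql
-- Hypothetical sketch: a serde accepting per-column JSON paths through
-- SERDEPROPERTIES. "com.example.JsonSerde" and the "mapping.*" property
-- names are placeholders, not a real serde's API.
CREATE EXTERNAL TABLE viewstream (
  ua_platform STRING,
  ua_browser  STRING
)
ROW FORMAT SERDE 'com.example.JsonSerde'
WITH SERDEPROPERTIES (
  "mapping.ua_platform" = "UA-Platform",
  "mapping.ua_browser"  = "UA-Browser"
)
LOCATION 'hdfs:///repo/viewstream/';
```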
Re: Hive in IntelliJ
On Thu, May 31, 2012 at 12:45 AM, Lars Francke lars.fran...@gmail.comwrote: Hi, has anyone managed to get Hive properly set up in IntelliJ? I've tried but so far I've failed to get it running with Ivy and its dependencies. I managed, but it wasn't easy. I let it do the original import, but then had to fix up the dependencies between modules to get everything compiling. The big problem was the stubs for the hadoop versions: I had everything depend on hadoop-0.20, the version-specific stubs put their own hadoop version above that, and no other modules depended on the version-specific stubs. It would be great to move Hive over to maven, which makes the intellij integration much easier. -- Owen
Re: Hive in IntelliJ
Hive is in maven but it always forks a hadoop to run stuff. The approach I took is to make fetching and extracting hadoop a one-time install target. https://github.com/edwardcapriolo/hive_test/ I know it's not the right way, but it runs unit tests lightning fast: about 15 sec vs. 1:20, or however long it takes to run ant test in hive. On 5/31/12, Owen O'Malley omal...@apache.org wrote: On Thu, May 31, 2012 at 12:45 AM, Lars Francke lars.fran...@gmail.comwrote: Hi, has anyone managed to get Hive properly set up in IntelliJ? I've tried but so far I've failed to get it running with Ivy and its dependencies. I managed, but it wasn't easy. I let it do the original import, but then had to fix up the dependencies between modules to get everything compiling. The big problem was the stubs for the hadoop versions. I had everything depend on hadoop-0.20, the version specific stubs put their own hadoop version above that, no other modules depended on the version specific stubs. It would be great to move Hive over to maven, which makes the intellij integration much easier. -- Owen
Re: Hive in IntelliJ
On Thu, May 31, 2012 at 9:35 AM, Edward Capriolo edlinuxg...@gmail.comwrote: Hive is in maven I meant that Hive is built with Ant. Intellij has better support for importing projects built by Maven. Hive's jars are published in Maven central, but that is a different thing. -- Owen
Re: Hive in IntelliJ
I agree, even if we had a decent ant script most IDEs could deal with it, but maven would be better. I understand why Hive's ant build infrastructure is the way it is: generating thrift, building a hive-exec.jar fat jar, getting hadoop in the old days before it was in maven, a templating engine to generate .q files and diff them against .out files. But I still think we should gut it. Most of its complexity is self-imposed, because common is its own subant, cli is its own subant, serde is its own subant. Then each subant usually has to hack up the classpath somehow to work, or hack up something else to test. And then support other hacks like this, which get replicated to every subant:

<path>
  <fileset dir="${hadoop.root}">
    <!-- below is for 0.23 onwards -->
    <include name="share/hadoop/common/lib/*.jar" />
    <exclude name="share/hadoop/common/lib/hadoop-mapreduce-*.jar" />
    <exclude name="share/hadoop/common/lib/hadoop-yarn-*.jar" />
  </fileset>
</path>

Not all of it is Hive's fault, because one build of hive works with like 40 versions of hadoop, and a majority of the hackery is dealing with rather large upstream changes. On the other hand, maven is terrible at dealing with any of the outside-the-box stuff hive needs to do. Each ugly ant hack hive has to do to deal with hadoop 20-20s-0.20.2.205-2.0 would likely have to be written as a maven plugin, which would not really leave us in a better state: ugly one-off ant hacks vs. ugly one-off maven plugins. The way I look at it, 33% of the problem is upstream changes, 33% is hive's desire to fork hadoop to run hive, and 33% is the every-component-is-a-subant thing. On 5/31/12, Owen O'Malley omal...@apache.org wrote: On Thu, May 31, 2012 at 9:35 AM, Edward Capriolo edlinuxg...@gmail.comwrote: Hive is in maven I meant that Hive is built with Ant. Intellij has better support for importing projects built by Maven. Hive's jars are published in Maven central, but that is a different thing. -- Owen
Re: Setup for contributing to Hive from MAC
https://cwiki.apache.org/confluence/display/Hive/HowToContribute will help you get started. Thanks, Ashutosh On Thu, May 31, 2012 at 8:55 AM, Mapred Learn mapred.le...@gmail.comwrote: Hi, Does anyone know how to set up apache hive on MAC to contribute and fix JIRAs ? Thanks, JJ Sent from my iPhone
Re: Hive in IntelliJ
Hi Owen ( Edward), has anyone managed to get Hive properly set up in IntelliJ? I've tried but so far I've failed to get it running with Ivy and its dependencies. I managed, but it wasn't easy. I let it do the original import, but then had to fix up the dependencies between modules to get everything compiling. The big problem was the stubs for the hadoop versions. I had everything depend on hadoop-0.20, the version specific stubs put their own hadoop version above that, no other modules depended on the version specific stubs. oh god yeah that sounds painful. I had hoped to somehow get around that. On the other hand it makes me happy that it wasn't just stupidity on my part. It would be great to move Hive over to maven, which makes the intellij integration much easier. Yeah that'd be awesome (fully realizing that this wouldn't be an easy feat either). Cheers, Lars
Re: Error in Hive execution on cluster : Wrong FS: hdfs
Can you check the configuration in your Hadoop core-site.xml file? Also, are you able to run an fs command from hadoop successfully using the fully qualified directory structure? Seems a problem in your file system config. Regards, Abhishek

On Thu, May 31, 2012 at 1:35 AM, Puneet Khatod puneet.kha...@tavant.comwrote:
Hi, I am facing the below error when I fire any query in Hive. My Hive setup is present on the master node in my cluster. Hadoop is configured using IP addresses in the configuration xmls and in the master and slave files, and it is running fine. The error only arises when a hive query is executed that has a table location on HDFS. It seems hive is expecting configurations to be done using hostnames.

Please help me in configuring hive such that it can understand IP-address-based configuration. I am using Hadoop 0.20.2 and Hive 0.7.1.

Regards, Puneet

hive> show tables;
OK
test_page_view
test_page_view_stg
Time taken: 131.309 seconds
hive> select * from test_page_view_stg;
FAILED: Hive Internal Error: java.lang.RuntimeException(Error while making MR scratch directory - check filesystem config (null))
java.lang.RuntimeException: Error while making MR scratch directory - check filesystem config (null)
at org.apache.hadoop.hive.ql.Context.getMRScratchDir(Context.java:196)
at org.apache.hadoop.hive.ql.Context.getMRTmpFileURI(Context.java:247)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:900)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:6594)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:340)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:736)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.IllegalArgumentException: Wrong FS: hdfs://IP_ADDRESS:9100/tmp/hive-hadoop/hive_2012-05-31_11-29-19_844_3368974040204630542, expected: hdfs://hostname.local:9100
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:310)
at org.apache.hadoop.hdfs.DistributedFileSystem.checkPath(DistributedFileSystem.java:99)
at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:222)
at org.apache.hadoop.hdfs.DistributedFileSystem.makeQualified(DistributedFileSystem.java:116)
at org.apache.hadoop.hive.ql.Context.getScratchDir(Context.java:146)
at org.apache.hadoop.hive.ql.Context.getMRScratchDir(Context.java:190)
... 14 more
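The "Wrong FS ... expected: ..." check compares the scheme and authority of a path against the configured default filesystem, so the NameNode must be referred to by the same form (IP address or hostname) everywhere. A minimal sketch of the relevant core-site.xml property, with the hostname and port taken from the error message above:

```xml
<!-- Sketch: on Hadoop 0.20.x the default filesystem is fs.default.name.
     The authority here must match the one used in Hive table LOCATIONs
     and in the cluster configuration; mixing an IP address and a
     hostname for the same NameNode triggers the "Wrong FS" check. -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://hostname.local:9100</value>
</property>
```

Note that table locations already recorded in the Hive metastore keep whatever URI they were created with, so tables created while the configuration used IP addresses may also need their locations updated.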
Re: confused on different behavior of Bucketized tables do not support INSERT INTO
Hi Bruce, It seems to me that your issue may be two-fold. 1) The JIRA that introduced insert into table (https://issues.apache.org/jira/browse/HIVE-306) in Hive 0.8 only works for non-bucketed partitions or tables. This is because appending to a table (or its partition, if it's partitioned) works by adding a new file within the folder on HDFS. Since bucketing in a partition is implemented as files within a folder, this would break bucketing. To allow insert into support for bucketed tables, we would probably have to regenerate the entire bucket(s) within the partition. Not sure if this is being done or has already been done in a newer release. 2) It seems like there was a minor bug in the implementation, as mentioned in this ticket (https://issues.apache.org/jira/browse/HIVE-3064), where insert into table works as insert overwrite table if the table name in the insert statement has upper case characters (black magic, eh?). Regardless of whether your table was created using upper or lower case letters, this bug manifests itself if you use upper case characters for the table name in your insert into table query. To summarize, you shouldn't be calling insert into on bucketed tables for the reasons stated above. Also, if you are using the insert into command, try to restrict your table names to lower case letters. I have also verified that "insert into table dummy select * from dummy" works when dummy is a non-bucketed table. As a matter of fact, "insert overwrite table dummy select * from dummy" works too. I would have expected your query "insert into table test2 select * from test;" to fail since test2 is bucketed as well. However, it doesn't. This seems to be a bug and I have created a JIRA for this (https://issues.apache.org/jira/browse/HIVE-3077).
Hope that helps, Mark

- Original Message -
From: Bruce Bian weidong@gmail.com
To: user@hive.apache.org
Sent: Thursday, May 31, 2012 10:16:03 AM
Subject: Re: confused on different behavior of Bucketized tables do not support INSERT INTO

So I did another test on this.

hive> create table test(foo int, bar string) clustered by (foo) sorted by (foo asc) into 2 buckets;
OK
Time taken: 0.097 seconds
hive> create table test2(foo int, bar string) clustered by (foo) sorted by (foo asc) into 2 buckets;
OK
hive> LOAD DATA LOCAL INPATH 'hive/examples/files/kv1.txt' OVERWRITE INTO TABLE test;
hive> set hive.enforce.bucketing=true;
hive> set hive.enforce.sorting=true;
hive> insert into table test2 select * from test;
Total MapReduce jobs = 1
Launching Job 1 out of 1
……
hive> insert into table test2 select * from test2;
FAILED: Error in semantic analysis: Bucketized tables do not support INSERT INTO: Table: test2

So it seems the "FAILED: Error in semantic analysis: Bucketized tables do not support INSERT INTO" error is only thrown when inserting into a bucketized table from that same table? And when inserting into a bucketized table multiple times, it creates an original_file_copy_n under the same bucket:
-rw-r--r-- 3 wbian supergroup 2856 2012-05-31 22:03 /user/hive/warehouse/test2/00_0
-rw-r--r-- 3 wbian supergroup 2856 2012-05-31 22:04 /user/hive/warehouse/test2/00_0_copy_1
-rw-r--r-- 3 wbian supergroup 2956 2012-05-31 22:03 /user/hive/warehouse/test2/01_0
-rw-r--r-- 3 wbian supergroup 2956 2012-05-31 22:04 /user/hive/warehouse/test2/01_0_copy_1

And since what I want to do is an SMB Map Join, the following triggered the SMB Map Join successfully:

set hive.optimize.bucketmapjoin=true;
set hive.optimize.bucketmapjoin.sortedmerge=true;
set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
select /*+ mapjoin(test) */ * from pokes join test on pokes.foo=test.foo;

So what's the reason for throwing that error (I mean, why not support inserting into a bucketized table from the same table)? And isn't that error message kind of misleading?

On Thu, May 31, 2012 at 6:43 PM, Bruce Bian weidong@gmail.com wrote:
I'm using hive 0.9.0

On Thursday, May 31, 2012, Bruce Bian wrote:
Hi, I've got a table vt_new_data which is defined as follows:

CREATE TABLE VT_NEW_DATA (
 V_ACCOUNT_NUM string
,V_ACCOUNT_MODIFIER_NUM string
,V_DEPOSIT_TYPE_CD string
,V_DEPOSIT_TERM int
,V_LEDGER_SUBJECT_ID string
,V_ACCOUNTING_ORG_CD string
,V_OPEN_DT string
,V_CLOSE_DT string
,V_CURRENCY_CD string
,V_ACCOUNT_BAL float
,V_INNER_MONTH_DELAY_ACCUM float
)
CLUSTERED BY (V_ACCOUNT_NUM, V_ACCOUNT_MODIFIER_NUM)
SORTED BY (V_ACCOUNT_NUM, V_ACCOUNT_MODIFIER_NUM ASC)
INTO 256 BUCKETS
STORED AS RCFile;

When I execute the following query (this is just a test): explain insert into table vt_new_data select * from vt_new_data limit 1; a "FAILED: Error in semantic analysis: Bucketized tables do not support INSERT INTO: Table: vt_new_data" error is thrown, but when I execute the query: explain insert into table vt_new_data
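Given Mark's explanation that appending a file would break the bucket layout, one workaround consistent with the thread is to regenerate the bucketed table in a single INSERT OVERWRITE rather than appending. This is a sketch using the test/test2 tables from the thread; whether Hive 0.9 permits the target table to also appear in the source of an OVERWRITE for bucketed tables would need to be verified:

```sql
-- Sketch: rebuild the bucketed table in one pass so every bucket file
-- is rewritten whole, instead of adding a _copy_n file per INSERT INTO.
set hive.enforce.bucketing=true;
set hive.enforce.sorting=true;

INSERT OVERWRITE TABLE test2
SELECT foo, bar FROM (
  SELECT foo, bar FROM test2   -- existing rows
  UNION ALL
  SELECT foo, bar FROM test    -- rows to append
) u;
```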
Behavior of Hive 2837: insert into external tables should not be allowed
Hi folks, I have a question regarding HIVE-2837 (https://issues.apache.org/jira/browse/HIVE-2837), which deals with disallowing insert into queries on external tables. From looking at the JIRA, it seems like it applies to external tables on HDFS as well. Technically, insert into should be ok for external tables on HDFS (and S3 as well). It seems like whether insert into is supported should be specified and implemented at the storage/filesystem level. Historically, there hasn't been any real difference between creating an external table on HDFS and creating a managed one. However, if we disallow insert into on external tables, that would mean that folks with external tables on HDFS wouldn't be able to make use of insert into functionality even though they should be able to. Do we want to allow insert into on HDFS tables regardless of whether they are external or not? Mark
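To illustrate the case under discussion, a minimal sketch (table names and the path below are illustrative, not from the thread):

```sql
-- Sketch: an external table on HDFS. Appending here just means adding a
-- new file under the LOCATION folder, exactly as for a managed table,
-- which is why rejecting the INSERT INTO below for external tables is
-- being questioned.
CREATE EXTERNAL TABLE page_views (viewer STRING, url STRING)
LOCATION 'hdfs:///data/page_views';

INSERT INTO TABLE page_views
SELECT viewer, url FROM page_views_staging;
```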