Re: Help with TimeSeries serde

2012-05-31 Thread Russell Jurney
Thanks, I've got it working.  I am making a chart UDF to convert BigInts
from count(*) to strings of *'s.

I am trying to find a way to select TimeSeries(count(*)) and group by day,
but this results in an error:

select to_date(dt) as total,
   TimeSeries(CAST(count(*) AS INT)) as stars,
   count(*) as total
   from from_to
   group by to_date(dt)
   order by to_date(dt);

FAILED: Error in semantic analysis: Line 5:16 Invalid table alias or column
reference 'total': (possible column names are: from_address, to_address, dt)

I can't seem to select a to_date(dt) value and also group AND order by it.
 I need to order by it to get my time series :)

Any ideas?
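One workaround to try (a sketch only — it assumes TimeSeries is simply a scalar UDF that renders an int as a bar of asterisks, and note the original query aliases two different columns as `total`, which may itself be confusing the semantic analyzer) is to aggregate per day in a subquery so the outer query can order by the computed alias:

```sql
-- Hypothetical rework: aggregate per day in a subquery, then order by the
-- alias in the outer query instead of repeating to_date(dt).
SELECT day, TimeSeries(CAST(cnt AS INT)) AS stars, cnt AS total
FROM (
  SELECT to_date(dt) AS day, COUNT(*) AS cnt
  FROM from_to
  GROUP BY to_date(dt)
) t
ORDER BY day;
```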

On Wed, May 30, 2012 at 9:58 PM, Aniket Mokashi aniket...@gmail.com wrote:

 If this is a UDF, you will need hive-exec.jar to compile it. I am not sure
 what the use of this UDF is.

 SerDe has the following interface:

 public interface SerDe extends Deserializer, Serializer

 ~Aniket

 On Wed, May 30, 2012 at 9:51 PM, Russell Jurney russell.jur...@gmail.com wrote:

 I tried to make a simple SerDe that converts ints to *'s for a time
 series/distribution chart.

 https://github.com/rjurney/timeseriesserde

 I get this error: https://gist.github.com/2841090

 --
 Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com




 --
 ...:::Aniket:::... Quetzalco@tl




-- 
Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com


Re: How to execute query with timestamp type (Hbase/Hive integration)

2012-05-31 Thread Peyton Peng

Hi Mark,

Thanks for your response. I tried other data types; the issue seems to occur
only when querying the timestamp field. I am not sure how the timestamp
mapping works...


From HBase, I can seek the data, and the value of the timestamp (event_time)
is: Wed May 30 16:15:06 CST 2012. Should I store the value as a long type?

Regards,
Peyton
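
A commonly suggested approach for this kind of mismatch (a sketch only, not verified against this setup — the table name `http_access_str` is hypothetical) is to write the HBase cell as epoch milliseconds, map the column as a string in Hive, and convert at query time:

```sql
-- Hypothetical alternative mapping: event_time held as a string of epoch millis
-- written by the HBase client, instead of java.util.Date.toString().
CREATE TABLE http_access_str(key string, client_ip string, client_port int,
                             request_method string, event_time string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" =
  ":key,client:ip,client:port,request:method,request:event_time"
);

-- Convert the millisecond value to a readable timestamp at query time.
SELECT client_ip, client_port,
       from_unixtime(CAST(CAST(event_time AS BIGINT) / 1000 AS BIGINT)) AS event_time
FROM http_access_str;
```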

-Original Message-
From: Mark Grover

Sent: Thursday, May 31, 2012 11:14 AM
To: user@hive.apache.org
Subject: Re: How to execute query with timestamp type (Hbase/Hive integration)


Hi Peyton,
It seems like something to do with timestamp mapping.
What happens if you change your Hive table definition to have the event_time 
as int or string?


Mark

- Original Message -
From: Peyton Peng pengp...@yunyou.tv
To: user@hive.apache.org
Sent: Wednesday, May 30, 2012 5:54:20 AM
Subject: Re: How to execute query with timestamp type (Hbase/Hive integration)





Actually I can execute the first SQL and it works well; all the libs you
specified are under the Hive lib folder.


I suspect the issue is caused by the timestamp mapping between HBase and
Hive.


Regards,
Peyton





From: shashwat shriparv
Sent: Wednesday, May 30, 2012 5:26 PM
To: user@hive.apache.org
Subject: Re: How to execute query with timestamp type (Hbase/Hive integration)


Add these files to the Hive lib folder:


hadoop-0.20-core.jar
hive/lib/hive-exec-0.7.1.jar
hive/lib/hive-jdbc-0.7.1.jar
hive/lib/hive-metastore-0.7.1.jar
hive/lib/hive-service-0.7.1.jar
hive/lib/libfb303.jar
lib/commons-logging-1.0.4.jar
slf4j-api-1.6.1.jar
slf4j-log4j12-1.6.1.jar


and then try

On Wed, May 30, 2012 at 2:23 PM, Peyton Peng pengp...@yunyou.tv wrote:






Hi,

I built the Hive table mapped to an HBase table:

CREATE TABLE http_access(key string, client_ip string, client_port int,
request_method string, event_time timestamp)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" =
  ":key,client:ip,client:port,request:method,request:event_time"
);

The data is stored with the HBase client.

I hit an issue while querying with Hive (HBase/Hive integration). When I
execute the SQL: select ip, port, request_method from http_access, it works
well with no problem,

but when I execute this SQL: select ip, port, event_time from http_access,
I get the exception below.


The only difference between the two SQL statements is that event_time is a
timestamp type. I can scan the corresponding HBase table and see the value
of event_time:
1338365792142 column=request:event_time, timestamp=1338365739818, value=Wed
May 30 16:15:06 CST 2012

Does anyone know what the issue is? (I am not sure whether I made a wrong
mapping, or whether I should just store the timestamp value as a long in
HBase; currently I store the value as java.util.Date.)


Thank you very much

Regards,
Peyton

Exception tracking:

Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
hdfs://Server:9000/user/hive/warehouse/http_access
Starting Job = job_201205291421_0008, Tracking URL = 
http://Server:50030/jobdetails.jsp?jobid=job_201205291421_0008
Kill Command = /here is the hadoop_home/libexec/../bin/hadoop 
job -Dmapred.job.tracker=Server:9001 -kill job_201205291421_0008
Hadoop job information for Stage-1: number of mappers: 1; number of 
reducers: 0

2012-05-30 16:28:01,572 Stage-1 map = 0%, reduce = 0%
2012-05-30 16:28:34,707 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201205291421_0008 with errors
Error during job, obtaining debugging information...
Examining task ID: task_201205291421_0008_m_02 (and more) from job 
job_201205291421_0008
Exception in thread "Thread-211" java.lang.RuntimeException: Error while 
reading from task log url
at 
org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:130)
at 
org.apache.hadoop.hive.ql.exec.JobDebugger.showJobFailDebugInfo(JobDebugger.java:211)

at org.apache.hadoop.hive.ql.exec.JobDebugger.run(JobDebugger.java:81)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: Server returned HTTP response code: 400 for 
URL: 
http://Server:50060/tasklog?taskid=attempt_201205291421_0008_m_00_1&start=-8193
at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1305)

at java.net.URL.openStream(URL.java:1009)
at 
org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:120)

... 3 more
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.MapRedTask

MapReduce Jobs Launched:
Job 0: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec





--


∞ Shashwat Shriparv




Hive partition problem

2012-05-31 Thread wd
hi,

We set up a new Hive 0.9 client and found that some SQL did not work. For example:

hive> create table t1(a int) partitioned by ( dt int );
OK
Time taken: 0.097 seconds
hive> load data local inpath '/tmp/t' into table t1 partition (dt=111);
Copying data from file:/tmp/t
Copying file: file:/tmp/t
Loading data to table default.t1 partition (dt=111)
OK
Time taken: 0.375 seconds
hive> show partitions t1;
OK
dt=111
Time taken: 0.108 seconds
hive> alter table t1 drop partition ( dt=111 );
FAILED: Error in semantic analysis: Partition not found dt = 111
hive> alter table t1 drop partition ( dt='111' );
FAILED: Error in semantic analysis: Partition not found dt = '111'


So, how to delete the partition?
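
One workaround to try (a sketch only, assuming the int-typed partition column is the culprit; the table name `t1_str` is hypothetical) is to declare the partition column as a string instead:

```sql
-- Hypothetical workaround: a string partition column instead of int.
CREATE TABLE t1_str (a int) PARTITIONED BY (dt string);
LOAD DATA LOCAL INPATH '/tmp/t' INTO TABLE t1_str PARTITION (dt = '111');
ALTER TABLE t1_str DROP PARTITION (dt = '111');
```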


Re: Hive partition problem

2012-05-31 Thread Bhavesh Shah
Hello wd,
Try this one... I am not sure about this
ALTER TABLE t1 DROP PARTITION(dt = '111')


-- 
Regards,
Bhavesh Shah





Re: Hive partition problem

2012-05-31 Thread Bhavesh Shah
Sorry for my previous reply. Actually you had already tried that one; I
didn't see it. Really sorry.


-- 
Regards,
Bhavesh Shah


Re: Hive partition problem

2012-05-31 Thread wd
No problem, thanks for your reply.
I'm very curious why this didn't work; this SQL comes from the Hive wiki.
The metadata is stored in Postgres; does that matter?

On Thu, May 31, 2012 at 3:26 PM, Bhavesh Shah bhavesh25s...@gmail.com wrote:
 Sorry for previous reply. Actually you had try that one, I didn't saw that.
 Really Sorry.




Re: Hive partition problem

2012-05-31 Thread Aniket Mokashi
You should look at the Hive log and find the exact exception. That will give
you a hint.

On Thu, May 31, 2012 at 12:33 AM, wd w...@wdicc.com wrote:

 No problem, thanks for your reply.
 I'm very curious why this didn't work, this sql come from hive wiki.
 The metadata is store in postgres, does it matter?





-- 
...:::Aniket:::... Quetzalco@tl


Re: Hive partition problem

2012-05-31 Thread wd
Nothing is output in the Hive history file. Is there another log file, or an
option to output a more detailed log?

On Thu, May 31, 2012 at 3:34 PM, Aniket Mokashi aniket...@gmail.com wrote:
 You should look at hive log and find exact exception. That will give you a
 hint.


 On Thu, May 31, 2012 at 12:33 AM, wd w...@wdicc.com wrote:

 No problem, thanks for your reply.
 I'm very curious why this didn't work, this sql come from hive wiki.
 The metadata is store in postgres, does it matter?

 On Thu, May 31, 2012 at 3:26 PM, Bhavesh Shah bhavesh25s...@gmail.com
 wrote:
  Sorry for previous reply. Actually you had try that one, I didn't saw
  that.
  Really Sorry.
 
  On Thu, May 31, 2012 at 12:53 PM, Bhavesh Shah bhavesh25s...@gmail.com
  wrote:
 
  Hello wd,
  Try this one... I am not sure about this
  ALTER TABLE t1 DROP PARTITION(dt = '111')
 
 
  --
  Regards,
  Bhavesh Shah
 
 
  On Thu, May 31, 2012 at 12:49 PM, wd w...@wdicc.com wrote:
 
  hi,
 
  We setup a new hive 0.9 client, Found some sql did not work, for
  example
 
  hive create table t1(a int) partitioned by ( dt int );
  OK
  Time taken: 0.097 seconds
  hive load data local inpath '/tmp/t' into table t1 partition
  (dt=111);
  Copying data from file:/tmp/t
  Copying file: file:/tmp/t
  Loading data to table default.t1 partition (dt=111)
  OK
  Time taken: 0.375 seconds
  hive show partitions t1;
  OK
  dt=111
  Time taken: 0.108 seconds
  hive alter table t1 drop partition ( dt=111 );
  FAILED: Error in semantic analysis: Partition not found dt = 111
  hive alter table t1 drop partition ( dt='111' );
  FAILED: Error in semantic analysis: Partition not found dt = '111'
 
 
  So, how to delete the partition?
 
 
 
 
 
 
 
 
  --
  Regards,
  Bhavesh Shah
 




 --
 ...:::Aniket:::... Quetzalco@tl


Hive in IntelliJ

2012-05-31 Thread Lars Francke
Hi,

has anyone managed to get Hive properly set up in IntelliJ? I've tried
but so far I've failed to get it running with Ivy and its
dependencies.

Pointers would be very appreciated.

Cheers,
Lars


Re: Hive partition problem

2012-05-31 Thread Aniket Mokashi
-hiveconf hive.root.logger=DEBUG,FA \
-hiveconf hive.log.dir=$logDir \
-hiveconf hive.log.file=$logFile \

On Thu, May 31, 2012 at 12:41 AM, wd w...@wdicc.com wrote:

 Nothing output in hive history file, is there an other log file or an
 option to output detail log ?





-- 
...:::Aniket:::... Quetzalco@tl


Re: Hive partition problem

2012-05-31 Thread wd
Is this a bug? Should I report a bug to hive?

On Thu, May 31, 2012 at 3:56 PM, wd w...@wdicc.com wrote:
 Still no useful output. But we found the problem.

 When the partition column type is int, it can't be dropped. After changing
 it to string, it can be dropped.



Error in Hive execution on cluster : Wrong FS: hdfs

2012-05-31 Thread Puneet Khatod
Hi,

I am facing the error below when I fire any query in Hive.
My Hive setup is on the master node of my cluster. Hadoop is configured
using IP addresses in the configuration XMLs and in the master and slave
files, and it is running fine. The error only arises when a Hive query is
executed against a table whose location is on HDFS. It seems Hive expects
the configuration to use hostnames.

Please help me configure Hive so that it understands an IP-address-based
configuration. I am using Hadoop 0.20.2 and Hive 0.7.1.

Regards,
Puneet


hive> show tables;
OK
test_page_view
test_page_view_stg
Time taken: 131.309 seconds
hive> select * from test_page_view_stg;
FAILED: Hive Internal Error: java.lang.RuntimeException(Error while making MR 
scratch directory - check filesystem config (null))
java.lang.RuntimeException: Error while making MR scratch directory - check 
filesystem config (null)
at org.apache.hadoop.hive.ql.Context.getMRScratchDir(Context.java:196)
at org.apache.hadoop.hive.ql.Context.getMRTmpFileURI(Context.java:247)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:900)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:6594)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:340)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:736)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.IllegalArgumentException: Wrong FS: 
hdfs://IP_ADDRESS:9100/tmp/hive-hadoop/hive_2012-05-31_11-29-19_844_3368974040204630542,
 expected: hdfs://hostname.local:9100
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:310)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.checkPath(DistributedFileSystem.java:99)
at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:222)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.makeQualified(DistributedFileSystem.java:116)
at org.apache.hadoop.hive.ql.Context.getScratchDir(Context.java:146)
at org.apache.hadoop.hive.ql.Context.getMRScratchDir(Context.java:190)
... 14 more
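
A common fix for this class of "Wrong FS" error (a sketch only — it assumes the mismatch is between the client's fs.default.name and the hostname-based URI the Hive metastore recorded for the table) is to make the filesystem URI identical everywhere, for example in core-site.xml:

```xml
<!-- core-site.xml: hypothetical example. The value must match the URI that
     the metastore/table location uses (here the hostname, not the raw IP),
     on every node that runs Hive or Hadoop clients. -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://hostname.local:9100</value>
</property>
```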



Hadoop Russia user group

2012-05-31 Thread Ruslan Al-Fakikh
Hi everyone,

I've created a group on Linkedin for Russian-speaking folks. It is
about Hadoop and its components (Hive, Pig, etc).

http://www.linkedin.com/groups/Hadoop-Russia-4468740?gid=4468740

Thanks,
Ruslan Al-Fakikh


RE: Hive partition problem

2012-05-31 Thread Malcolm Tye
Hi,
We used to use integer partitioning for two columns, year_month_day
and hour.

When it came to dropping one partition for one hour of a particular day, it
seemed to drop all of the hour partitions for that day.

IMO it's a bug. We just moved to using text.


Malc


-Original Message-
From: wd [mailto:w...@wdicc.com] 
Sent: 31 May 2012 09:32
To: user@hive.apache.org
Subject: Re: Hive partition problem

Is this a bug? Should I report a bug to hive?




Re: Hive partition problem

2012-05-31 Thread wd
That's too bad.
I've filed an issue in the Hive JIRA: https://issues.apache.org/jira/browse/HIVE-3069.

On Thu, May 31, 2012 at 5:16 PM, Malcolm Tye malcolm@btinternet.com wrote:
 Hi,
        We used to use integer partitioning for 2 columns, year_month_day
 and hour

 When it came to dropping 1 partition for an hour for a particular day, it
 seemed to drop all of the hour partitions for the day.

 IMO it's a bug. We just moved to using text


 Malc





Re: confused on different behavior of Bucketized tables do not support INSERT INTO

2012-05-31 Thread Bruce Bian
I'm using hive 0.9.0

On Thursday, May 31, 2012, Bruce Bian wrote:

 Hi,
 I've got a table vt_new_data which is defined as follows:
 CREATE TABLE VT_NEW_DATA
 (
  V_ACCOUNT_NUM string
 ,V_ACCOUNT_MODIFIER_NUM string
 ,V_DEPOSIT_TYPE_CD string
 ,V_DEPOSIT_TERM int
 ,V_LEDGER_SUBJECT_ID string
 ,V_ACCOUNTING_ORG_CD string
 ,V_OPEN_DT string
 ,V_CLOSE_DT string
 ,V_CURRENCY_CD string
 ,V_ACCOUNT_BAL float
 ,V_INNER_MONTH_DELAY_ACCUM float
 ) CLUSTERED BY (V_ACCOUNT_NUM,V_ACCOUNT_MODIFIER_NUM) SORTED BY
 (V_ACCOUNT_NUM,V_ACCOUNT_MODIFIER_NUM ASC) INTO 256 BUCKETS STORED AS
 RCFile;

 when I execute the following query
 explain insert into table vt_new_data select * from vt_new_data limit 1;
 (this is just a test)
 a "FAILED: Error in semantic analysis: Bucketized tables do not support
 INSERT INTO: Table: vt_new_data" error occurs

 but when I execute the query:
 explain insert into table vt_new_data
 select /*+ MAPJOIN(T4) */
  t1.account_num as v_account_num
 ,t1.account_modifier_num as v_account_modifier_num
 ,'3006' as v_deposit_type_cd
 ,0 as  v_deposit_term
 ,'23201000' v_ledger_subject_id
 ,coalesce(t2.party_id,'')  as v_accounting_org_cd
 ,coalesce(t3.card_begin_dt,'19000101') as v_open_dt
 ,coalesce(t3.card_live_dt,'19000101') as v_close_dt
 ,coalesce(t4.currency_cd,substr(t1.account_modifier_num,3,3)) as
 v_currency_cd
 ,coalesce(t4.agt_amt,0) as v_account_bal
 ,0 as v_inner_month_delay_accum
 from t03_e_cash_bucket t1
 left outer join t03_agt_amount_h_bucket t4
 on t1.account_num=t4.account_num
 and t1.account_modifier_num=t4.account_modifier_num
 and t4.agt_amt_type_cd = '001'
 and t4.start_date <= '$TXNDATE'
 and t4.end_date > '$TXNDATE'
 left outer join t01_party_card_rela_h_bucket t2
 on  t1.card_no = t2.card_no
 and t2.party_card_rela_type_cd = '01'
 and t2.start_date <= '$TXNDATE'
 and t2.end_date > '$TXNDATE'
 left outer join t03_card_bucket t3
 on t1.card_no = t3.card_no;

 the execution plan is generated successfully and triggers an SMB Map
 Join, which is great.

 But I don't see the difference here; both insert into a bucketized and
 sorted table.



Re: confused on different behavior of Bucketized tables do not support INSERT INTO

2012-05-31 Thread Bruce Bian
So I did another test on this.
hive> create table test(foo int,bar string) clustered by(foo) sorted by
(foo asc) into 2 buckets;
OK
Time taken: 0.097 seconds
hive> create table test2 (foo int,bar string) clustered by(foo) sorted by
(foo asc) into 2 buckets;
OK
hive> LOAD DATA LOCAL INPATH 'hive/examples/files/kv1.txt' OVERWRITE INTO
TABLE test;
hive> set hive.enforce.bucketing=true;
hive> set hive.enforce.sorting=true;
hive> insert into table test2 select * from test;
Total MapReduce jobs = 1
Launching Job 1 out of 1
...
hive> insert into table test2 select * from test2;
FAILED: Error in semantic analysis: Bucketized tables do not support INSERT
INTO: Table: test2

It seems the error "FAILED: Error in semantic analysis: Bucketized tables
do not support INSERT INTO" is only thrown when inserting into a bucketized
table from the same table?
Also, when inserting into a bucketized table multiple times, it creates an
original_file_copy_n under the same bucket.

-rw-r--r--   3 wbian supergroup   2856 2012-05-31 22:03
/user/hive/warehouse/test2/00_0
-rw-r--r--   3 wbian supergroup   2856 2012-05-31 22:04
/user/hive/warehouse/test2/00_0_copy_1
-rw-r--r--   3 wbian supergroup   2956 2012-05-31 22:03
/user/hive/warehouse/test2/01_0
-rw-r--r--   3 wbian supergroup   2956 2012-05-31 22:04
/user/hive/warehouse/test2/01_0_copy_1

And since what I want to do is SMB Map Join, the following triggered the
SMB Map Join successfully
set hive.optimize.bucketmapjoin= true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
set
hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
select /*+mapjoin(test)*/ * from pokes join test on pokes.foo=test.foo;

So what's the reason for throwing that error (I mean, why not support insert
into a bucketized table from the same table)? And isn't that error message
kind of misleading?



On Thu, May 31, 2012 at 6:43 PM, Bruce Bian weidong@gmail.com wrote:

 I'm using hive 0.9.0

 On Thursday, May 31, 2012, Bruce Bian wrote:

 Hi,
 I've got a table vt_new_data which is defined as follows:
 CREATE TABLE VT_NEW_DATA
 (
  V_ACCOUNT_NUM string
 ,V_ACCOUNT_MODIFIER_NUM string
 ,V_DEPOSIT_TYPE_CD string
 ,V_DEPOSIT_TERM int
 ,V_LEDGER_SUBJECT_ID string
 ,V_ACCOUNTING_ORG_CD string
 ,V_OPEN_DT string
 ,V_CLOSE_DT string
 ,V_CURRENCY_CD string
 ,V_ACCOUNT_BAL float
 ,V_INNER_MONTH_DELAY_ACCUM float
 ) CLUSTERED BY (V_ACCOUNT_NUM,V_ACCOUNT_MODIFIER_NUM) SORTED BY
 (V_ACCOUNT_NUM,V_ACCOUNT_MODIFIER_NUM ASC) INTO 256 BUCKETS STORED AS
 RCFile;

 when I execute the following query
 explain insert into table vt_new_data select * from vt_new_data limit 1;
 (this is just a test)
 an error "FAILED: Error in semantic analysis: Bucketized tables do not support
 INSERT INTO: Table: vt_new_data" occurred

 but when I execute the query:
 explain insert into table vt_new_data
 select /*+ MAPJOIN(T4) */
  t1.account_num as v_account_num
 ,t1.account_modifier_num as v_account_modifier_num
 ,'3006' as v_deposit_type_cd
 ,0 as  v_deposit_term
 ,'23201000' v_ledger_subject_id
 ,coalesce(t2.party_id,'')  as v_accounting_org_cd
 ,coalesce(t3.card_begin_dt,'19000101') as v_open_dt
 ,coalesce(t3.card_live_dt,'19000101') as v_close_dt
 ,coalesce(t4.currency_cd,substr(t1.account_modifier_num,3,3)) as
 v_currency_cd
 ,coalesce(t4.agt_amt,0) as v_account_bal
 ,0 as v_inner_month_delay_accum
 from t03_e_cash_bucket t1
 left outer join t03_agt_amount_h_bucket t4
 on t1.account_num=t4.account_num
 and t1.account_modifier_num=t4.account_modifier_num
 and t4.agt_amt_type_cd = '001'
  and t4.start_date <= '$TXNDATE'
  and t4.end_date > '$TXNDATE'
 left outer join t01_party_card_rela_h_bucket t2
 on  t1.card_no = t2.card_no
 and t2.party_card_rela_type_cd = '01'
  and t2.start_date <= '$TXNDATE'
  and t2.end_date > '$TXNDATE'
 left outer join t03_card_bucket t3
 on t1.card_no = t3.card_no;

 the execution plan is generated successfully and triggered an SMB Map
 Join, which is great.

 But I don't see the difference here, as both are inserting into a
 bucketized and sorted table?




Parse Error with '-' in Hive

2012-05-31 Thread Alex Barrera
Hi there,

I'm looking for help with the following error: I keep receiving an error
'Parse Error: line 40:4 cannot recognize input near '-' 'Platform' STRING
column type. '

It looks to be a problem with my JSON deserializer OR Hive not recognizing
the - in the field names. So far I've tried commenting it out with / and
//, and tried wrapping single and double quotes around the field name.
Nothing has worked so far.

My code looks like this:
add jar s3://sa/hive-json-serde-0.3.jar;
DROP TABLE IF EXISTS viewstream;
CREATE EXTERNAL TABLE IF NOT EXISTS viewstream(
 cl STRING,
 pid STRING,
 vp STRING,
 ar STRING,
 UArm STRING,
 dc STRING,
 host STRING,
 Blocked STRING,
 UA-Platform STRING,
 UA-Browser STRING,
 UA-Version STRING,
 UA-Language STRING
)

ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
  LOCATION 'hdfs:///repo/viewstream/';

Anybody run into similar errors? I can't find anything on this problem
online, strangely enough.

Thanks all,
Alex


Re: Parse Error with '-' in Hive

2012-05-31 Thread shrikanth shankar
I believe Hive column names can't have '-' in them. From what I know, this
JSON serde uses column names as JSON expressions. This means that if you
renamed the column, you would end up with a null value for that column. You
might want to try a different JSON serde (e.g. the one Amazon uses in its EMR
examples allows the JSON expression for a column to be passed in as a SerDe
parameter).
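As a sketch of that suggestion: some JSON serdes accept an explicit
column-to-key mapping as a SerDe property, so the Hive column keeps a legal
name while the serde resolves the hyphenated JSON key. The mapping.* property
names below follow the openx/rcongiu JsonSerDe convention and are an
assumption here; verify against the docs of whichever serde you pick:

```sql
-- Sketch only: assumes a JSON serde that supports per-column key mappings
-- ("mapping.<hive_column>" = "<json_key>"), e.g. org.openx.data's JsonSerDe.
CREATE EXTERNAL TABLE IF NOT EXISTS viewstream (
  cl          STRING,
  ua_platform STRING,
  ua_browser  STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  "mapping.ua_platform" = "UA-Platform",
  "mapping.ua_browser"  = "UA-Browser"
)
LOCATION 'hdfs:///repo/viewstream/';
```

The Hive-side names stay parseable, and only the serde ever sees the
hyphenated keys.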

thanks,
Shrikanth
On May 31, 2012, at 8:35 AM, Alex Barrera wrote:

 Hi there,
 
 I'm looking for help with the following error: I keep receiving an error 
 'Parse Error: line 40:4 cannot recognize input near '-' 'Platform' STRING 
 column type. ' 
 
 It looks to be a problem with my JSON deserializer OR Hive not recognizing 
 the - in the field names. So far I've tried commenting it out with / and //, 
 and tried wrapping single and double quotes around the field name. Nothing 
 has worked so far. 
 
 My code looks like this: 
 add jar s3://sa/hive-json-serde-0.3.jar;
 DROP TABLE IF EXISTS viewstream;
 CREATE EXTERNAL TABLE IF NOT EXISTS viewstream(
  cl STRING,
  pid STRING,
  vp STRING,
  ar STRING,
  UArm STRING,
  dc STRING,
  host STRING,
  Blocked STRING,
  UA-Platform STRING,
  UA-Browser STRING,
  UA-Version STRING,
  UA-Language STRING
 )
 
 ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
   LOCATION 'hdfs:///repo/viewstream/';
 
 Anybody run into similar errors? I can't find anything on this problem 
 online, strangely enough. 
 
 Thanks all,
 Alex



Re: Hive in IntelliJ

2012-05-31 Thread Owen O'Malley
On Thu, May 31, 2012 at 12:45 AM, Lars Francke lars.fran...@gmail.com wrote:

 Hi,

 has anyone managed to get Hive properly set up in IntelliJ? I've tried
 but so far I've failed to get it running with Ivy and its
 dependencies.


I managed, but it wasn't easy. I let it do the original import, but then
had to fix up the dependencies between modules to get everything compiling.
The big problem was the stubs for the hadoop versions. I had everything
depend on hadoop-0.20, the version specific stubs put their own hadoop
version above that, no other modules depended on the version specific stubs.

It would be great to move Hive over to maven, which makes the intellij
integration much easier.

-- Owen


Re: Hive in IntelliJ

2012-05-31 Thread Edward Capriolo
Hive is in maven, but it always forks a hadoop to run stuff. The
approach I took is to make fetching and extracting hadoop a one-time
install target.

https://github.com/edwardcapriolo/hive_test/

I know it's not the right way, but it runs unit tests lightning fast,
like 15 sec vs 1:20 or however long it takes to run ant test in
hive.



On 5/31/12, Owen O'Malley omal...@apache.org wrote:
 On Thu, May 31, 2012 at 12:45 AM, Lars Francke
  lars.fran...@gmail.com wrote:

 Hi,

 has anyone managed to get Hive properly set up in IntelliJ? I've tried
 but so far I've failed to get it running with Ivy and its
 dependencies.


 I managed, but it wasn't easy. I let it do the original import, but then
 had to fix up the dependencies between modules to get everything compiling.
 The big problem was the stubs for the hadoop versions. I had everything
 depend on hadoop-0.20, the version specific stubs put their own hadoop
 version above that, no other modules depended on the version specific
 stubs.

 It would be great to move Hive over to maven, which makes the intellij
 integration much easier.

 -- Owen



Re: Hive in IntelliJ

2012-05-31 Thread Owen O'Malley
On Thu, May 31, 2012 at 9:35 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

 Hive is in maven


I meant that Hive is built with Ant. Intellij has better support for
importing projects built by Maven. Hive's jars are published in Maven
central, but that is a different thing.

-- Owen


Re: Hive in IntelliJ

2012-05-31 Thread Edward Capriolo
I agree, even if we had a decent ant script most IDEs could deal with
it, but maven would be better.

I understand why Hive's ant build infrastructure is the way it is.
generating thrift, building a hive-exec.jar fat jar, getting hadoop in
the old days before it was in maven, templating engine to generate .q
files and diff them against .out files.

But I still think we should gut it. Most of its complexity is self
imposed: common is its own subant, cli is its own subant, and serde is
its own subant. Then each subant usually has to hack up the classpath
somehow to work, or hack up something else to test.

And then there are other hacks like this, which get replicated to every subant.

<fileset dir="${hadoop.root}">
  <!-- below is for 0.23 onwards -->
  <include name="share/hadoop/common/lib/*.jar" />
  <exclude name="share/hadoop/common/lib/hadoop-mapreduce-*.jar" />
  <exclude name="share/hadoop/common/lib/hadoop-yarn-*.jar" />
</fileset>
</path>

Not all of it is hive's fault, because one build of hive works with
like 40 versions of hadoop, and a majority of the hackery is dealing
with rather large upstream changes.

On the other hand, maven is terrible at dealing with any of the
outside-the-box stuff hive needs to do. Each ugly ant hack hive has to
do to deal with hadoop 20-20s-0.20.2.205-2.0 would likely have to be
written as a maven plugin, which would not really leave us in a better
state. Ugly one-off ants vs ugly one-off maven plugins.

The way I look at it, 33% of the problem is upstream changes, 33% is
hive's desire to fork hadoop to run hive, and 33% is the
every-component-is-a-subant thing.


On 5/31/12, Owen O'Malley omal...@apache.org wrote:
 On Thu, May 31, 2012 at 9:35 AM, Edward Capriolo
  edlinuxg...@gmail.com wrote:

 Hive is in maven


 I meant that Hive is built with Ant. Intellij has better support for
 importing projects built by Maven. Hive's jars are published in Maven
 central, but that is a different thing.

 -- Owen



Re: Setup for contributing to Hive from MAC

2012-05-31 Thread Ashutosh Chauhan
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
will help you get started.

Thanks,
Ashutosh
On Thu, May 31, 2012 at 8:55 AM, Mapred Learn mapred.le...@gmail.com wrote:

 Hi,
 Does anyone know how to set up apache hive on MAC to contribute and fix
 JIRAs?

 Thanks,
 JJ

 Sent from my iPhone


Re: Hive in IntelliJ

2012-05-31 Thread Lars Francke
Hi Owen (& Edward),

 has anyone managed to get Hive properly set up in IntelliJ? I've tried
 but so far I've failed to get it running with Ivy and its
 dependencies.

 I managed, but it wasn't easy. I let it do the original import, but then had
 to fix up the dependencies between modules to get everything compiling. The
 big problem was the stubs for the hadoop versions. I had everything depend
 on hadoop-0.20, the version specific stubs put their own hadoop version
 above that, no other modules depended on the version specific stubs.

oh god yeah that sounds painful. I had hoped to somehow get around
that. On the other hand it makes me happy that it wasn't just
stupidity on my part.

 It would be great to move Hive over to maven, which makes the intellij
 integration much easier.

Yeah that'd be awesome (fully realizing that this wouldn't be an easy
feat either).

Cheers,
Lars


Re: Error in Hive execution on cluster : Wrong FS: hdfs

2012-05-31 Thread Abhishek Pratap Singh
Can you check the configuration in your Hadoop core-site.xml file?
Also, are you able to run an fs command from hadoop successfully using
the fully qualified directory structure?

Seems a problem in your file system config.
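
For illustration, the usual fix for this particular "Wrong FS" message is to
make the URI form consistent: the stack trace shows Hive building an IP-based
URI while the filesystem expects a hostname-based one. A hypothetical
core-site.xml fragment (host and port are placeholders copied from the
"expected:" side of the error, not a verified config):

```xml
<!-- core-site.xml: illustrative fragment only. Use the same form
     (hostname or IP) here and everywhere else Hadoop/Hive is configured,
     so the URI Hive builds matches the default filesystem URI. -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://hostname.local:9100</value>
</property>
```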


Regards,
Abhishek

On Thu, May 31, 2012 at 1:35 AM, Puneet Khatod puneet.kha...@tavant.com wrote:

  Hi,


 I am facing below error when I am firing any query in Hive.

 My Hive setup is present on the master node in my cluster. Hadoop is
 configured using IP_addresses in configuration xmls and in master, slave
 files and it is running fine. The error only arises when hive query is
 executed which had table location on HDFS. It seems hive is expecting
 configurations to be done using hostnames.


 Please help me in configuring hive such that it can understand IP address
 based configuration. I am using Hadoop 0.20.2 and Hive 0.7.1.


 Regards,

 Puneet


 hive> show tables;

 OK

 test_page_view

 test_page_view_stg

 Time taken: 131.309 seconds

 hive> select * from test_page_view_stg;

 FAILED: Hive Internal Error: java.lang.RuntimeException(Error while making
 MR scratch directory - check filesystem config (null))

 java.lang.RuntimeException: Error while making MR scratch directory -
 check filesystem config (null)

 at
 org.apache.hadoop.hive.ql.Context.getMRScratchDir(Context.java:196)

 at
 org.apache.hadoop.hive.ql.Context.getMRTmpFileURI(Context.java:247)

 at
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:900)
 

 at
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:6594)
 

 at
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
 

 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:340)

 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:736)

 at
 org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)

 at
 org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)

 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)

 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 

 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 

 at java.lang.reflect.Method.invoke(Method.java:601)

 at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

 Caused by: java.lang.IllegalArgumentException: Wrong FS:
 hdfs://IP_ADDRESS:9100/tmp/hive-hadoop/hive_2012-05-31_11-29-19_844_3368974040204630542,
 expected: hdfs://hostname.local:9100

 at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:310)

 at
 org.apache.hadoop.hdfs.DistributedFileSystem.checkPath(DistributedFileSystem.java:99)
 

 at
 org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:222)

 at
 org.apache.hadoop.hdfs.DistributedFileSystem.makeQualified(DistributedFileSystem.java:116)
 

 at
 org.apache.hadoop.hive.ql.Context.getScratchDir(Context.java:146)

 at
 org.apache.hadoop.hive.ql.Context.getMRScratchDir(Context.java:190)

 ... 14 more


 Any comments or statements made in this email are not necessarily those of
 Tavant Technologies.
 The information transmitted is intended only for the person or entity to
 which it is addressed and may
 contain confidential and/or privileged material. If you have received this
 in error, please contact the
 sender and delete the material from any computer. All e-mails sent from or
 to Tavant Technologies
 may be subject to our monitoring procedures.



Re: confused on different behavior of Bucketized tables do not support INSERT INTO

2012-05-31 Thread Mark Grover
Hi Bruce,
It seems to me that your issue may be two-fold.

1) The JIRA that introduced insert into table 
(https://issues.apache.org/jira/browse/HIVE-306) in Hive 0.8 only works for 
non-bucketed partitions or tables. This is because appending to a table (or 
its partition, if it's partitioned) works by adding a new file within the 
folder on HDFS. Since bucketing in a partition is implemented as files within 
a folder, this would break bucketing. To allow insert into support for 
bucketed tables, we would probably have to regenerate the entire bucket(s) 
within the partition. Not sure if this is being done or has already been done 
in a newer release.
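
To make the append-vs-bucketing conflict concrete, here is a sketch (not an
official recipe) of how one could append to a bucketed table without breaking
the one-file-per-bucket invariant: rewrite the whole table so every bucket is
regenerated. The new_rows staging table is hypothetical:

```sql
SET hive.enforce.bucketing=true;
SET hive.enforce.sorting=true;

-- Regenerate every bucket instead of appending extra files, so that
-- hash(key) % num_buckets still maps each key to exactly one file.
INSERT OVERWRITE TABLE test2
SELECT foo, bar FROM (
  SELECT foo, bar FROM test2      -- existing rows
  UNION ALL
  SELECT foo, bar FROM new_rows   -- hypothetical staging table
) u;
```

This trades the cost of rewriting the table for preserving the layout that
SMB map joins rely on.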

2) It seems like there was a minor bug in the implementation. As mentioned in 
this ticket (https://issues.apache.org/jira/browse/HIVE-3064), "insert into 
table" works as "insert overwrite table" if the table name in the insert 
statement has upper case characters (black magic, eh?). Regardless of whether 
your table was created using upper or lower case letters, this bug manifests 
itself if you use upper case characters for the table name in your insert 
into table query.

To summarize, you shouldn't be calling insert into on bucketed tables for the 
reasons stated above. Also, if you are using insert into command, try to 
restrict your table names to lower case letters.
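
A sketch of the HIVE-3064 gotcha described above (table names are
illustrative, and the behavior is as described in the ticket for affected
versions, not verified here):

```sql
-- Both statements target the same table; only the case of the name differs.
INSERT INTO TABLE dummy SELECT * FROM src;   -- appends, as documented
INSERT INTO TABLE DUMMY SELECT * FROM src;   -- per HIVE-3064, silently
                                             -- behaves like INSERT OVERWRITE
```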

I have also verified that insert into table dummy select * from dummy works 
when dummy is a non-bucketed table. As a matter of fact, insert overwrite 
table dummy select * from dummy works too.

I would have expected your query insert into table test2 select * from test; 
to fail since test2 is bucketed as well. However, it doesn't. This seems to be 
a bug and I have created a JIRA for this 
(https://issues.apache.org/jira/browse/HIVE-3077).

Hope that helps,
Mark

- Original Message -
From: Bruce Bian weidong@gmail.com
To: user@hive.apache.org
Sent: Thursday, May 31, 2012 10:16:03 AM
Subject: Re: confused on different behavior of Bucketized tables do not support 
INSERT INTO

So I did another test on this. 

hive> create table test(foo int,bar string) clustered by(foo) sorted by (foo 
asc) into 2 buckets; 
OK 
Time taken: 0.097 seconds 
hive> create table test2 (foo int,bar string) clustered by(foo) sorted by (foo 
asc) into 2 buckets; 
OK 

hive> LOAD DATA LOCAL INPATH 'hive/examples/files/kv1.txt' OVERWRITE INTO TABLE 
test; 

hive> set hive.enforce.bucketing=true; 
hive> set hive.enforce.sorting=true; 

hive> insert into table test2 select * from test; 
Total MapReduce jobs = 1 
Launching Job 1 out of 1 
…… 

hive> insert into table test2 select * from test2; 
FAILED: Error in semantic analysis: Bucketized tables do not support INSERT 
INTO: Table: test2 


Seems like the error "FAILED: Error in semantic analysis: Bucketized tables do 
not support INSERT INTO: Table: vt_new_data" is only thrown when inserting 
into a bucketized table from the same table? And when inserting into a 
bucketized table multiple times, it will create an original_file_copy_n under 
the same bucket. 



-rw-r--r-- 3 wbian supergroup 2856 2012-05-31 22:03 
/user/hive/warehouse/test2/00_0 
-rw-r--r-- 3 wbian supergroup 2856 2012-05-31 22:04 
/user/hive/warehouse/test2/00_0_copy_1 
-rw-r--r-- 3 wbian supergroup 2956 2012-05-31 22:03 
/user/hive/warehouse/test2/01_0 
-rw-r--r-- 3 wbian supergroup 2956 2012-05-31 22:04 
/user/hive/warehouse/test2/01_0_copy_1 


And since what I want to do is SMB Map Join, the following triggered the SMB 
Map Join successfully 

set hive.optimize.bucketmapjoin= true; 
set hive.optimize.bucketmapjoin.sortedmerge = true; 
set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat; 
select /*+mapjoin(test)*/ * from pokes join test on pokes.foo=test.foo; 


So what's the reason for throwing that error (I mean, why not support insert 
into a bucketized table from the same table)? And isn't that error message 
kind of misleading? 








On Thu, May 31, 2012 at 6:43 PM, Bruce Bian  weidong@gmail.com  wrote: 


I'm using hive 0.9.0 



On Thursday, May 31, 2012, Bruce Bian wrote: 


Hi, 
I've got a table vt_new_data which is defined as follows: 

CREATE TABLE VT_NEW_DATA 
( 
V_ACCOUNT_NUM string 
,V_ACCOUNT_MODIFIER_NUM string 
,V_DEPOSIT_TYPE_CD string 
,V_DEPOSIT_TERM int 
,V_LEDGER_SUBJECT_ID string 
,V_ACCOUNTING_ORG_CD string 
,V_OPEN_DT string 
,V_CLOSE_DT string 
,V_CURRENCY_CD string 
,V_ACCOUNT_BAL float 
,V_INNER_MONTH_DELAY_ACCUM float 
) CLUSTERED BY (V_ACCOUNT_NUM,V_ACCOUNT_MODIFIER_NUM) SORTED BY 
(V_ACCOUNT_NUM,V_ACCOUNT_MODIFIER_NUM ASC) INTO 256 BUCKETS STORED AS RCFile; 


when I execute the following query 
explain insert into table vt_new_data select * from vt_new_data limit 1; (this 
is just a test) 
an error "FAILED: Error in semantic analysis: Bucketized tables do not support 
INSERT INTO: Table: vt_new_data" occurred 


but when I execute the query: 

explain insert into table vt_new_data 

Behavior of Hive 2837: insert into external tables should not be allowed

2012-05-31 Thread Mark Grover
Hi folks,
I have a question regarding HIVE 2837(
https://issues.apache.org/jira/browse/HIVE-2837) that deals with
disallowing external table from using insert into queries.

From looking at the JIRA, it seems like it applies to external tables on
HDFS as well. Technically, insert into should be OK for external tables on
HDFS (and S3 as well). It seems like a storage/file-system-level decision
whether insert into is supported, and the implementation should live there.

Historically, there hasn't been any real difference between creating an
external table on HDFS vs creating a managed one. However, if we disallow
insert into on external tables, that would mean that folks with external
tables on HDFS wouldn't be able to make use of insert into functionality
even though they should be able to. Do we want to allow insert into on HDFS
tables regardless of whether they are external or not?
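
For reference, a sketch of the case under discussion (paths and table names
are illustrative, not from the JIRA): an external, non-bucketed table on HDFS
that, going by HIVE-306, ought to be able to accept appends:

```sql
CREATE EXTERNAL TABLE page_views (
  dt  STRING,
  url STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 'hdfs:///data/page_views/';

-- Appends a new file under /data/page_views/ without touching existing
-- ones; HIVE-2837 would reject this solely because the table is EXTERNAL.
INSERT INTO TABLE page_views
SELECT dt, url FROM staging_page_views;  -- hypothetical source table
```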

Mark