Re: Create external table with s3 location error

2018-11-07 Thread Suresh Kumar Sethuramaswamy
> at
> org.apache.hadoop.hive.ql.session.SessionState.applyAuthorizationPolicy(SessionState.java:1745)
>
> at
> org.apache.hive.service.cli.CLIService.applyAuthorizationConfigPolicy(CLIService.java:125)
>
> at org.apache.hive.service.cli.CLIService.init(CLIService.java:111)
>
> at
> org.apache.hive.service.CompositeService.init(CompositeService.java:59)
>
> at
> org.apache.hive.service.server.HiveServer2.init(HiveServer2.java:125)
>
> at
> org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:542)
>
> at
> org.apache.hive.service.server.HiveServer2.access$700(HiveServer2.java:89)
>
> at
> org.apache.hive.service.server.HiveServer2$StartOptionExecutor.execute(HiveServer2.java:793)
>
> at
> org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:666)
>
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>
> at java.lang.reflect.Method.invoke(Method.java:498)
>
> at org.apache.hadoop.util.RunJar.run(RunJar.java:226)
>
> at org.apache.hadoop.util.RunJar.main(RunJar.java:141)
>
> Caused by: java.net.ConnectException: Connection refused (Connection
> refused)
>
> at java.net.PlainSocketImpl.socketConnect(Native Method)
>
> at
> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
>
> at
> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
>
> at
> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
>
> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>
> at java.net.Socket.connect(Socket.java:589)
>
> at org.apache.thrift.transport.TSocket.open(TSocket.java:221)
>
> ... 42 more
>
> )
>
> at
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:512)
>
> at
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:244)
>
> at
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
>
>
>
> *From: *Suresh Kumar Sethuramaswamy 
> *Reply-To: *"user@hive.apache.org" 
> *Date: *Wednesday, November 7, 2018 at 2:50 PM
> *To: *"user@hive.apache.org" 
> *Subject: *Re: Create external table with s3 location error
>
>
>
> Are you using EMR or open-source Apache Hadoop?
>
> Can you share your Hive metastore logs?
>
>
>
>
>
> On Wed, Nov 7, 2018, 2:19 PM Garry Chen wrote:
> Hi All,
>
> I am trying to create an external table using S3 as the location,
> but it failed.  I added my access key and secret key in hive-site.xml and
> rebooted the server.  Any suggestions?
>
>
>
> hive>  create external table kv (key int, values string)  location
> 's3://cu-iclick/test';
>
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.DDLTask.
> MetaException(message:java.lang.NullPointerException)
>
>
>
> Garry
>
>


Re: Create external table with s3 location error

2018-11-07 Thread Garry Chen

Yes, it works.  Thank you very much,
Garry
From: Suresh Kumar Sethuramaswamy 
Reply-To: "user@hive.apache.org" 
Date: Wednesday, November 7, 2018 at 3:10 PM
To: "user@hive.apache.org" 
Subject: Re: Create external table with s3 location error

Thanks for the logs. A couple of things here:

1. Based on the logs, HiveServer2 seems to be down. This is an issue if you 
use Hue/Beeline.
2. If you are using the Hive CLI, you can still test it, but you have to add the 
S3 keys (fs.s3a.access.key, fs.s3a.secret.key) in the HDFS advanced configuration 
snippet for core-site.xml and hdfs-site.xml via CM. Also, the location should 
be s3a://, not s3://.
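
For reference, a minimal sketch of the configuration snippet described above, 
assuming the s3a connector is available on the cluster (the key values are 
placeholders, not real credentials):

  <property>
    <name>fs.s3a.access.key</name>
    <value>YOUR_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>YOUR_SECRET_KEY</value>
  </property>

With that in place, the DDL would use the s3a scheme:

  create external table kv (key int, values string) location 's3a://cu-iclick/test';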


On Wed, Nov 7, 2018, 2:58 PM Garry Chen <g...@cornell.edu> wrote:
Hi Suresh,
I am using Hive 1.1.0-cdh5.14.4; the HiveServer2 log is below.

2018-11-07 19:43:16,581 WARN  [main]: server.HiveServer2 
(HiveServer2.java:startHiveServer2(581)) - Error starting HiveServer2 on 
attempt 1, will retry in 60000ms
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.RuntimeException: Unable to instantiate 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at 
org.apache.hadoop.hive.ql.session.SessionState.setupAuth(SessionState.java:840)
at 
org.apache.hadoop.hive.ql.session.SessionState.getAuthorizationMode(SessionState.java:1686)
at 
org.apache.hadoop.hive.ql.session.SessionState.isAuthorizationModeV2(SessionState.java:1697)
at 
org.apache.hadoop.hive.ql.session.SessionState.applyAuthorizationPolicy(SessionState.java:1745)
at 
org.apache.hive.service.cli.CLIService.applyAuthorizationConfigPolicy(CLIService.java:125)
at org.apache.hive.service.cli.CLIService.init(CLIService.java:111)
at 
org.apache.hive.service.CompositeService.init(CompositeService.java:59)
at org.apache.hive.service.server.HiveServer2.init(HiveServer2.java:125)
at 
org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:542)
at 
org.apache.hive.service.server.HiveServer2.access$700(HiveServer2.java:89)
at 
org.apache.hive.service.server.HiveServer2$StartOptionExecutor.execute(HiveServer2.java:793)
at org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:666)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:226)
at org.apache.hadoop.util.RunJar.main(RunJar.java:141)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.RuntimeException: Unable to instantiate 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at 
org.apache.hadoop.hive.ql.metadata.HiveUtils.getAuthorizeProviderManager(HiveUtils.java:391)
at 
org.apache.hadoop.hive.ql.session.SessionState.setupAuth(SessionState.java:817)
... 17 more
Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: 
Unable to instantiate 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at 
org.apache.hadoop.hive.ql.security.authorization.HiveAuthorizationProviderBase.setConf(HiveAuthorizationProviderBase.java:114)
at 
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at 
org.apache.hadoop.hive.ql.metadata.HiveUtils.getAuthorizeProviderManager(HiveUtils.java:388)
... 18 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.RuntimeException: Unable to instantiate 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at 
org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:220)
at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:338)
at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:299)
at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:274)
at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:256)
at 
org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider.init(DefaultHiveAuthorizationProvider.java:29)
at 
org.apache.hadoop.hive.ql.security.authorization.HiveAuthorizationProviderBase.setConf(HiveAuthorizationProviderBase.java:112)
... 21 more
Caused by: java.lang.RuntimeException: Unable to instantiate 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtil

Re: Create external table with s3 location error

2018-11-07 Thread Garry Chen
)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.thrift.transport.TSocket.open(TSocket.java:221)
... 42 more
)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:512)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:244)
at 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
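
The root cause in this trace is the "Connection refused" at TSocket.open: 
HiveServer2 could not reach the Hive Metastore thrift service. A quick 
reachability check, as a sketch assuming the default metastore port 9083 and 
a placeholder host name:

  nc -z metastore-host.example.com 9083 && echo "metastore thrift port open"

If the port is closed, the metastore service needs to be started before 
retrying the DDL.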

From: Suresh Kumar Sethuramaswamy 
Reply-To: "user@hive.apache.org" 
Date: Wednesday, November 7, 2018 at 2:50 PM
To: "user@hive.apache.org" 
Subject: Re: Create external table with s3 location error

Are you using EMR or open-source Apache Hadoop?
Can you share your Hive metastore logs?


On Wed, Nov 7, 2018, 2:19 PM Garry Chen <g...@cornell.edu> wrote:
Hi All,
        I am trying to create an external table using S3 as the location, but 
it failed.  I added my access key and secret key in hive-site.xml and rebooted 
the server.  Any suggestions?

hive>  create external table kv (key int, values string)  location 
's3://cu-iclick/test';
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask. 
MetaException(message:java.lang.NullPointerException)

Garry


Re: Create external table with s3 location error

2018-11-07 Thread Suresh Kumar Sethuramaswamy
Are you using EMR or open-source Apache Hadoop?
Can you share your Hive metastore logs?


On Wed, Nov 7, 2018, 2:19 PM Garry Chen wrote:
> Hi All,
>
> I am trying to create an external table using S3 as the location,
> but it failed.  I added my access key and secret key in hive-site.xml and
> rebooted the server.  Any suggestions?
>
>
>
> hive>  create external table kv (key int, values string)  location
> 's3://cu-iclick/test';
>
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.DDLTask.
> MetaException(message:java.lang.NullPointerException)
>
>
>
> Garry
>


Re: Create external table

2016-05-13 Thread Margus Roo

I have kerberos enabled in my cluster.

In case I create an external table using beeline, I see from the HDFS namenode 
log that it does a Kerberos auth check for every single file, I guess.


That may be the reason why creating an external Hive table fails when I 
have loads of directories and files under them.
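
One way to verify that guess from the NameNode side is to count audit events 
for the table directory. A sketch, assuming HDFS audit logging is enabled and 
using a placeholder log path for your distribution:

  grep 'cmd=getfileinfo' /var/log/hadoop/hdfs/hdfs-audit.log | grep '/tmp/files_10k' | wc -l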


Margus (margusja) Roo
http://margus.roo.ee
skype: margusja
+372 51 48 780

On 12/05/16 10:41, Margus Roo wrote:


Now I got closer and discovered that my problem is related to 
permissions.


For example:

drwxr-xr-x   - margusja  hdfs  0 2016-05-12 03:33 /tmp/files_10k

...

-rw-r--r--   3 margusja hdfs  5 2016-05-12 02:01 
/tmp/files_10k/f1959.txt
-rw-r--r--   3 margusja hdfs  4 2016-05-12 02:01 
/tmp/files_10k/f196.txt
-rw-r--r--   3 margusja hdfs  5 2016-05-12 02:01 
/tmp/files_10k/f1960.txt
-rw-r--r--   3 margusja hdfs  5 2016-05-12 02:01 
/tmp/files_10k/f1961.txt
-rw-r--r--   3 margusja hdfs  5 2016-05-12 02:01 
/tmp/files_10k/f1962.txt
-rw-r--r--   3 margusja hdfs  5 2016-05-12 02:01 
/tmp/files_10k/f1963.txt
-rw-r--r--   3 margusja hdfs  5 2016-05-12 02:01 
/tmp/files_10k/f1964.txt
-rw-r--r--   3 margusja hdfs  5 2016-05-12 02:01 
/tmp/files_10k/f1965.txt
-rw-r--r--   3 margusja hdfs  5 2016-05-12 02:01 
/tmp/files_10k/f1966.txt


...

Connected to: Apache Hive (version 1.2.1.2.3.4.0-3485)
Driver: Hive JDBC (version 1.2.1.2.3.4.0-3485)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://bigdata29.webmedia.int:1/> create external table 
files_10k (i int) row format delimited fields terminated by '\t' 
location '/tmp/files_10k';

No rows affected (3.184 seconds)
0: jdbc:hive2://bigdata29.webmedia.int:1/>


Now I change the owner to flume, for example:

drwxr-xr-x   - flume hdfs  0 2016-05-12 03:33 /tmp/files_10k

...

-rw-r--r--   3 flume hdfs  5 2016-05-12 02:01 
/tmp/files_10k/f1968.txt
-rw-r--r--   3 flume hdfs  5 2016-05-12 02:01 
/tmp/files_10k/f1969.txt
-rw-r--r--   3 flume hdfs  4 2016-05-12 02:01 
/tmp/files_10k/f197.txt
-rw-r--r--   3 flume hdfs  5 2016-05-12 02:01 
/tmp/files_10k/f1970.txt
-rw-r--r--   3 flume hdfs  5 2016-05-12 02:01 
/tmp/files_10k/f1971.txt
-rw-r--r--   3 flume hdfs  5 2016-05-12 02:01 
/tmp/files_10k/f1972.txt


...

Others can read. For example, user margusja can read:

[margusja@bigdata29 ~]$ hdfs dfs -ls /tmp/files_10k
Found 1112 items
-rw-r--r--   3 flume hdfs  2 2016-05-12 01:59 
/tmp/files_10k/f1.txt
-rw-r--r--   3 flume hdfs  3 2016-05-12 01:59 
/tmp/files_10k/f10.txt
-rw-r--r--   3 flume hdfs  4 2016-05-12 01:59 
/tmp/files_10k/f100.txt
-rw-r--r--   3 flume hdfs  5 2016-05-12 01:59 
/tmp/files_10k/f1000.txt
-rw-r--r--   3 flume hdfs  6 2016-05-12 01:59 
/tmp/files_10k/f10000.txt


Now I try to create the table:

0: jdbc:hive2://bigdata29.webmedia.int:1/> create external table 
files_10k (i int) row format delimited fields terminated by '\t' 
location '/tmp/files_10k';
Error: Error while compiling statement: FAILED: 
HiveAccessControlException Permission denied: user [margusja] does not 
have [READ] privilege on [hdfs://mycluster/tmp/files_10k] 
(state=42000,code=4)

0: jdbc:hive2://bigdata29.webmedia.int:1/>

In Hiveserver2.log:

2016-05-12 03:38:58,111 INFO  [HiveServer2-Handler-Pool: Thread-69]: 
parse.ParseDriver (ParseDriver.java:parse(185)) - Parsing command: 
create external table files_10k (i int) row format delimited fields 
terminated by '\t' location '/tmp/files_10k'
2016-05-12 03:38:58,112 INFO  [HiveServer2-Handler-Pool: Thread-69]: 
parse.ParseDriver (ParseDriver.java:parse(209)) - Parse Completed
2016-05-12 03:38:58,112 INFO  [HiveServer2-Handler-Pool: Thread-69]: 
log.PerfLogger (PerfLogger.java:PerfLogEnd(162)) - </PERFLOG method=parse start=1463038738111 end=1463038738112 duration=1 
from=org.apache.hadoop.hive.ql.Driver>
2016-05-12 03:38:58,112 INFO  [HiveServer2-Handler-Pool: Thread-69]: 
log.PerfLogger (PerfLogger.java:PerfLogBegin(135)) - <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
2016-05-12 03:38:58,112 INFO  [HiveServer2-Handler-Pool: Thread-69]: 
parse.CalcitePlanner (SemanticAnalyzer.java:analyzeInternal(10114)) - 
Starting Semantic Analysis
2016-05-12 03:38:58,113 INFO  [HiveServer2-Handler-Pool: Thread-69]: 
parse.CalcitePlanner (SemanticAnalyzer.java:analyzeCreateTable(10776)) 
- Creating table default.files_10k position=22
2016-05-12 03:38:58,113 INFO  [HiveServer2-Handler-Pool: Thread-69]: 
metastore.HiveMetaStore (HiveMetaStore.java:logInfo(747)) - 1: 
get_database: default
2016-05-12 03:38:58,113 INFO  [HiveServer2-Handler-Pool: Thread-69]: 
HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(372)) - 
ugi=hive/bigdata29.webmedia@testhadoop.com ip=unknown-ip-addr  
cmd=get_database: default
2016-05-12 03:38:58,118 INFO  [HiveServer2-Handler-Pool: Thread-69]: 
ql.Driver (Driver.java:compile(466)) - Semantic Analysis Completed

Re: Create external table

2016-05-12 Thread Margus Roo

Now I got closer and discovered that my problem is related to permissions.

For example:

drwxr-xr-x   - margusja  hdfs  0 2016-05-12 03:33 /tmp/files_10k

...

-rw-r--r--   3 margusja hdfs  5 2016-05-12 02:01 
/tmp/files_10k/f1959.txt
-rw-r--r--   3 margusja hdfs  4 2016-05-12 02:01 
/tmp/files_10k/f196.txt
-rw-r--r--   3 margusja hdfs  5 2016-05-12 02:01 
/tmp/files_10k/f1960.txt
-rw-r--r--   3 margusja hdfs  5 2016-05-12 02:01 
/tmp/files_10k/f1961.txt
-rw-r--r--   3 margusja hdfs  5 2016-05-12 02:01 
/tmp/files_10k/f1962.txt
-rw-r--r--   3 margusja hdfs  5 2016-05-12 02:01 
/tmp/files_10k/f1963.txt
-rw-r--r--   3 margusja hdfs  5 2016-05-12 02:01 
/tmp/files_10k/f1964.txt
-rw-r--r--   3 margusja hdfs  5 2016-05-12 02:01 
/tmp/files_10k/f1965.txt
-rw-r--r--   3 margusja hdfs  5 2016-05-12 02:01 
/tmp/files_10k/f1966.txt


...

Connected to: Apache Hive (version 1.2.1.2.3.4.0-3485)
Driver: Hive JDBC (version 1.2.1.2.3.4.0-3485)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://bigdata29.webmedia.int:1/> create external table 
files_10k (i int) row format delimited fields terminated by '\t' 
location '/tmp/files_10k';

No rows affected (3.184 seconds)
0: jdbc:hive2://bigdata29.webmedia.int:1/>


Now I change the owner to flume, for example:

drwxr-xr-x   - flume hdfs  0 2016-05-12 03:33 /tmp/files_10k

...

-rw-r--r--   3 flume hdfs  5 2016-05-12 02:01 
/tmp/files_10k/f1968.txt
-rw-r--r--   3 flume hdfs  5 2016-05-12 02:01 
/tmp/files_10k/f1969.txt
-rw-r--r--   3 flume hdfs  4 2016-05-12 02:01 
/tmp/files_10k/f197.txt
-rw-r--r--   3 flume hdfs  5 2016-05-12 02:01 
/tmp/files_10k/f1970.txt
-rw-r--r--   3 flume hdfs  5 2016-05-12 02:01 
/tmp/files_10k/f1971.txt
-rw-r--r--   3 flume hdfs  5 2016-05-12 02:01 
/tmp/files_10k/f1972.txt


...

Others can read. For example, user margusja can read:

[margusja@bigdata29 ~]$ hdfs dfs -ls /tmp/files_10k
Found 1112 items
-rw-r--r--   3 flume hdfs  2 2016-05-12 01:59 /tmp/files_10k/f1.txt
-rw-r--r--   3 flume hdfs  3 2016-05-12 01:59 /tmp/files_10k/f10.txt
-rw-r--r--   3 flume hdfs  4 2016-05-12 01:59 
/tmp/files_10k/f100.txt
-rw-r--r--   3 flume hdfs  5 2016-05-12 01:59 
/tmp/files_10k/f1000.txt
-rw-r--r--   3 flume hdfs  6 2016-05-12 01:59 
/tmp/files_10k/f10000.txt


Now I try to create the table:

0: jdbc:hive2://bigdata29.webmedia.int:1/> create external table 
files_10k (i int) row format delimited fields terminated by '\t' 
location '/tmp/files_10k';
Error: Error while compiling statement: FAILED: 
HiveAccessControlException Permission denied: user [margusja] does not 
have [READ] privilege on [hdfs://mycluster/tmp/files_10k] 
(state=42000,code=4)

0: jdbc:hive2://bigdata29.webmedia.int:1/>

In Hiveserver2.log:

2016-05-12 03:38:58,111 INFO  [HiveServer2-Handler-Pool: Thread-69]: 
parse.ParseDriver (ParseDriver.java:parse(185)) - Parsing command: 
create external table files_10k (i int) row format delimited fields 
terminated by '\t' location '/tmp/files_10k'
2016-05-12 03:38:58,112 INFO  [HiveServer2-Handler-Pool: Thread-69]: 
parse.ParseDriver (ParseDriver.java:parse(209)) - Parse Completed
2016-05-12 03:38:58,112 INFO  [HiveServer2-Handler-Pool: Thread-69]: 
log.PerfLogger (PerfLogger.java:PerfLogEnd(162)) - </PERFLOG method=parse start=1463038738111 end=1463038738112 duration=1 
from=org.apache.hadoop.hive.ql.Driver>
2016-05-12 03:38:58,112 INFO  [HiveServer2-Handler-Pool: Thread-69]: 
log.PerfLogger (PerfLogger.java:PerfLogBegin(135)) - <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
2016-05-12 03:38:58,112 INFO  [HiveServer2-Handler-Pool: Thread-69]: 
parse.CalcitePlanner (SemanticAnalyzer.java:analyzeInternal(10114)) - 
Starting Semantic Analysis
2016-05-12 03:38:58,113 INFO  [HiveServer2-Handler-Pool: Thread-69]: 
parse.CalcitePlanner (SemanticAnalyzer.java:analyzeCreateTable(10776)) - 
Creating table default.files_10k position=22
2016-05-12 03:38:58,113 INFO  [HiveServer2-Handler-Pool: Thread-69]: 
metastore.HiveMetaStore (HiveMetaStore.java:logInfo(747)) - 1: 
get_database: default
2016-05-12 03:38:58,113 INFO  [HiveServer2-Handler-Pool: Thread-69]: 
HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(372)) - 
ugi=hive/bigdata29.webmedia@testhadoop.com ip=unknown-ip-addr  
cmd=get_database: default
2016-05-12 03:38:58,118 INFO  [HiveServer2-Handler-Pool: Thread-69]: 
ql.Driver (Driver.java:compile(466)) - Semantic Analysis Completed
2016-05-12 03:38:58,118 INFO  [HiveServer2-Handler-Pool: Thread-69]: 
log.PerfLogger (PerfLogger.java:PerfLogEnd(162)) - </PERFLOG method=semanticAnalyze start=1463038738112 end=1463038738118 duration=6 
from=org.apache.hadoop.hive.ql.Driver>
2016-05-12 03:38:58,118 INFO  [HiveServer2-Handler-Pool: Thread-69]: 
ql.Driver (Driver.java:getSchema(246)) - Returning Hive schema: 
Schema(fieldSchemas:null, properties:null)
2016
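
A sketch of one way around the HiveAccessControlException earlier in this 
message, assuming HDFS ACLs are enabled (dfs.namenode.acls.enabled=true): grant 
the querying user read access explicitly instead of changing the owner:

  hdfs dfs -setfacl -R -m user:margusja:r-x /tmp/files_10k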

Re: Create external table

2016-05-11 Thread Margus Roo

One more example:

[hdfs@hadoopnn1 ~]$ hdfs dfs -count -h /user/margusja/files_10k/
   19.8 K 47.7 K /user/margusja/files_10k
[hdfs@hadoopnn1 ~]$ hdfs dfs -count -h /datasource/dealgate/
  537.9 K  8.5 G /datasource/dealgate

2: jdbc:hive2://hadoopnn1.estpak.ee:1/def> create external table 
files_10k (i int) row format delimited fields terminated by '\t' 
location '/user/margusja/files_10k';

No rows affected (0.197 seconds)
2: jdbc:hive2://hadoopnn1.estpak.ee:1/def> drop table files_10k;
No rows affected (0.078 seconds)
2: jdbc:hive2://hadoopnn1.estpak.ee:1/def> create external table 
files_10k (i int) row format delimited fields terminated by '\t' 
location '/datasource/dealgate';

Error: org.apache.thrift.transport.TTransportException (state=08S01,code=0)
2: jdbc:hive2://hadoopnn1.estpak.ee:1/def>


So from my point of view, beeline for some reason looks at the data and the 
old hive client does not.


Margus (margusja) Roo
http://margus.roo.ee
skype: margusja
+372 51 48 780

On 11/05/16 13:35, Margus Roo wrote:


More information:

2016-05-11 13:31:17,086 INFO  [HiveServer2-Handler-Pool: Thread-5867]: 
parse.ParseDriver (ParseDriver.java:parse(185)) - Parsing command: 
create external table files_10k (i int) row format delimited fields 
terminated by '\t' location '/user/margusja/files_10k'
2016-05-11 13:31:17,089 INFO  [HiveServer2-Handler-Pool: Thread-5867]: 
parse.ParseDriver (ParseDriver.java:parse(209)) - Parse Completed
2016-05-11 13:31:17,089 INFO  [HiveServer2-Handler-Pool: Thread-5867]: 
log.PerfLogger (PerfLogger.java:PerfLogEnd(162)) - </PERFLOG method=parse start=1462962677086 end=1462962677089 duration=3 
from=org.apache.hadoop.hive.ql.Driver>
2016-05-11 13:31:17,089 INFO  [HiveServer2-Handler-Pool: Thread-5867]: 
log.PerfLogger (PerfLogger.java:PerfLogBegin(135)) - <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
2016-05-11 13:31:17,090 INFO  [HiveServer2-Handler-Pool: Thread-5867]: 
parse.CalcitePlanner (SemanticAnalyzer.java:analyzeInternal(10114)) - 
Starting Semantic Analysis
2016-05-11 13:31:17,093 INFO  [HiveServer2-Handler-Pool: Thread-5867]: 
parse.CalcitePlanner (SemanticAnalyzer.java:analyzeCreateTable(10776)) 
- Creating table default.files_10k position=22
2016-05-11 13:31:17,094 INFO  [HiveServer2-Handler-Pool: Thread-5867]: 
metastore.HiveMetaStore (HiveMetaStore.java:logInfo(747)) - 2: 
get_database: default
2016-05-11 13:31:17,094 INFO  [HiveServer2-Handler-Pool: Thread-5867]: 
HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(372)) - 
ugi=hive/hadoopnn1.estpak...@testhadoop.com ip=unknown-ip-addr  
cmd=get_database: default
2016-05-11 13:31:17,098 WARN  [HiveServer2-Handler-Pool: Thread-5867]: 
security.UserGroupInformation 
(UserGroupInformation.java:getGroupNames(1521)) - No groups available 
for user hive
2016-05-11 13:31:17,098 WARN  [HiveServer2-Handler-Pool: Thread-5867]: 
security.UserGroupInformation 
(UserGroupInformation.java:getGroupNames(1521)) - No groups available 
for user hive
2016-05-11 13:31:17,099 WARN  [HiveServer2-Handler-Pool: Thread-5867]: 
security.UserGroupInformation 
(UserGroupInformation.java:getGroupNames(1521)) - No groups available 
for user hive
2016-05-11 13:31:17,099 WARN  [HiveServer2-Handler-Pool: Thread-5867]: 
security.UserGroupInformation 
(UserGroupInformation.java:getGroupNames(1521)) - No groups available 
for user hive
2016-05-11 13:31:17,099 INFO  [HiveServer2-Handler-Pool: Thread-5867]: 
metadata.HiveUtils 
(HiveUtils.java:getMetaStoreAuthorizeProviderManagers(353)) - Adding 
metastore authorization provider: 
org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider
2016-05-11 13:31:17,102 WARN  [HiveServer2-Handler-Pool: Thread-5867]: 
security.UserGroupInformation 
(UserGroupInformation.java:getGroupNames(1521)) - No groups available 
for user hive
2016-05-11 13:31:17,102 WARN  [HiveServer2-Handler-Pool: Thread-5867]: 
security.UserGroupInformation 
(UserGroupInformation.java:getGroupNames(1521)) - No groups available 
for user hive
2016-05-11 13:31:17,106 INFO  [HiveServer2-Handler-Pool: Thread-5867]: 
ql.Driver (Driver.java:compile(466)) - Semantic Analysis Completed
2016-05-11 13:31:17,106 INFO  [HiveServer2-Handler-Pool: Thread-5867]: 
log.PerfLogger (PerfLogger.java:PerfLogEnd(162)) - </PERFLOG method=semanticAnalyze start=1462962677089 end=1462962677106 
duration=17 from=org.apache.hadoop.hive.ql.Driver>
2016-05-11 13:31:17,106 INFO  [HiveServer2-Handler-Pool: Thread-5867]: 
ql.Driver (Driver.java:getSchema(246)) - Returning Hive schema: 
Schema(fieldSchemas:null, properties:null)
2016-05-11 13:31:17,106 INFO  [HiveServer2-Handler-Pool: Thread-5867]: 
log.PerfLogger (PerfLogger.java:PerfLogBegin(135)) - <PERFLOG method=doAuthorization from=org.apache.hadoop.hive.ql.Driver>
2016-05-11 13:31:17,107 WARN  [HiveServer2-Handler-Pool: Thread-5867]: 
security.UserGroupInformation 
(UserGroupInformation.java:getGroupNames(1521)) - No groups available 
fo

Re: Create external table

2016-05-11 Thread Margus Roo

More information:

2016-05-11 13:31:17,086 INFO  [HiveServer2-Handler-Pool: Thread-5867]: 
parse.ParseDriver (ParseDriver.java:parse(185)) - Parsing command: 
create external table files_10k (i int) row format delimited fields 
terminated by '\t' location '/user/margusja/files_10k'
2016-05-11 13:31:17,089 INFO  [HiveServer2-Handler-Pool: Thread-5867]: 
parse.ParseDriver (ParseDriver.java:parse(209)) - Parse Completed
2016-05-11 13:31:17,089 INFO  [HiveServer2-Handler-Pool: Thread-5867]: 
log.PerfLogger (PerfLogger.java:PerfLogEnd(162)) - </PERFLOG method=parse start=1462962677086 end=1462962677089 duration=3 
from=org.apache.hadoop.hive.ql.Driver>
2016-05-11 13:31:17,089 INFO  [HiveServer2-Handler-Pool: Thread-5867]: 
log.PerfLogger (PerfLogger.java:PerfLogBegin(135)) - <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
2016-05-11 13:31:17,090 INFO  [HiveServer2-Handler-Pool: Thread-5867]: 
parse.CalcitePlanner (SemanticAnalyzer.java:analyzeInternal(10114)) - 
Starting Semantic Analysis
2016-05-11 13:31:17,093 INFO  [HiveServer2-Handler-Pool: Thread-5867]: 
parse.CalcitePlanner (SemanticAnalyzer.java:analyzeCreateTable(10776)) - 
Creating table default.files_10k position=22
2016-05-11 13:31:17,094 INFO  [HiveServer2-Handler-Pool: Thread-5867]: 
metastore.HiveMetaStore (HiveMetaStore.java:logInfo(747)) - 2: 
get_database: default
2016-05-11 13:31:17,094 INFO  [HiveServer2-Handler-Pool: Thread-5867]: 
HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(372)) - 
ugi=hive/hadoopnn1.estpak...@testhadoop.com ip=unknown-ip-addr  
cmd=get_database: default
2016-05-11 13:31:17,098 WARN  [HiveServer2-Handler-Pool: Thread-5867]: 
security.UserGroupInformation 
(UserGroupInformation.java:getGroupNames(1521)) - No groups available 
for user hive
2016-05-11 13:31:17,098 WARN  [HiveServer2-Handler-Pool: Thread-5867]: 
security.UserGroupInformation 
(UserGroupInformation.java:getGroupNames(1521)) - No groups available 
for user hive
2016-05-11 13:31:17,099 WARN  [HiveServer2-Handler-Pool: Thread-5867]: 
security.UserGroupInformation 
(UserGroupInformation.java:getGroupNames(1521)) - No groups available 
for user hive
2016-05-11 13:31:17,099 WARN  [HiveServer2-Handler-Pool: Thread-5867]: 
security.UserGroupInformation 
(UserGroupInformation.java:getGroupNames(1521)) - No groups available 
for user hive
2016-05-11 13:31:17,099 INFO  [HiveServer2-Handler-Pool: Thread-5867]: 
metadata.HiveUtils 
(HiveUtils.java:getMetaStoreAuthorizeProviderManagers(353)) - Adding 
metastore authorization provider: 
org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider
2016-05-11 13:31:17,102 WARN  [HiveServer2-Handler-Pool: Thread-5867]: 
security.UserGroupInformation 
(UserGroupInformation.java:getGroupNames(1521)) - No groups available 
for user hive
2016-05-11 13:31:17,102 WARN  [HiveServer2-Handler-Pool: Thread-5867]: 
security.UserGroupInformation 
(UserGroupInformation.java:getGroupNames(1521)) - No groups available 
for user hive
2016-05-11 13:31:17,106 INFO  [HiveServer2-Handler-Pool: Thread-5867]: 
ql.Driver (Driver.java:compile(466)) - Semantic Analysis Completed
2016-05-11 13:31:17,106 INFO  [HiveServer2-Handler-Pool: Thread-5867]: 
log.PerfLogger (PerfLogger.java:PerfLogEnd(162)) - </PERFLOG method=semanticAnalyze start=1462962677089 end=1462962677106 duration=17 
from=org.apache.hadoop.hive.ql.Driver>
2016-05-11 13:31:17,106 INFO  [HiveServer2-Handler-Pool: Thread-5867]: 
ql.Driver (Driver.java:getSchema(246)) - Returning Hive schema: 
Schema(fieldSchemas:null, properties:null)
2016-05-11 13:31:17,106 INFO  [HiveServer2-Handler-Pool: Thread-5867]: 
log.PerfLogger (PerfLogger.java:PerfLogBegin(135)) - <PERFLOG method=doAuthorization from=org.apache.hadoop.hive.ql.Driver>
2016-05-11 13:31:17,107 WARN  [HiveServer2-Handler-Pool: Thread-5867]: 
security.UserGroupInformation 
(UserGroupInformation.java:getGroupNames(1521)) - No groups available 
for user margusja
2016-05-11 13:31:18,289 INFO 
[org.apache.hadoop.util.JvmPauseMonitor$Monitor@59f45950]: 
util.JvmPauseMonitor (JvmPauseMonitor.java:run(195)) - Detected pause in 
JVM or host machine (eg GC): pause of approximately 1092ms
2016-05-11 13:31:29,547 INFO  [HiveServer2-Handler-Pool: Thread-5867]: 
retry.RetryInvocationHandler (RetryInvocationHandler.java:invoke(144)) - 
Exception while invoking getListing of class 
ClientNamenodeProtocolTranslatorPB over 
hadoopnn1.estpak.ee/88.196.164.42:8020. Trying to fail over immediately.
java.io.IOException: com.google.protobuf.ServiceException: 
java.lang.OutOfMemoryError: GC overhead limit exceeded
at 
org.apache.hadoop.ipc.ProtobufHelper.getRemoteException(ProtobufHelper.java:47)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:580)

at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.
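
The "GC overhead limit exceeded" above means HiveServer2 itself ran out of heap 
while listing the large directory. A minimal sketch of one mitigation, assuming 
HiveServer2 reads HADOOP_CLIENT_OPTS from hive-env.sh (the 8g value is 
illustrative only, not a recommendation):

  # in hive-env.sh
  if [ "$SERVICE" = "hiveserver2" ]; then
    export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Xmx8g"
  fi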

Re: Create external table

2016-05-11 Thread Margus Roo

What do you mean?

Margus (margusja) Roo
http://margus.roo.ee
skype: margusja
+372 51 48 780

On 11/05/16 08:21, Mich Talebzadeh wrote:

Yes, but the table then exists, correct? I mean the second time.

did you try


use default;
drop table if exists trips;

it is still within Hive metadata registered as an existing table.

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw


http://talebzadehmich.wordpress.com 


On 11 May 2016 at 06:16, Margus Roo wrote:


Hi

Thanks for your answer.

---

At first I create an empty hdfs directory (if the directory is empty, I
did not have problems before either).

[margusja@hadoopnn1 ~]$ hdfs dfs -mkdir /user/margusja/trips

[margusja@hadoopnn1 ~]$ beeline -f create_externat_table_trips.hql
-u

"jdbc:hive2://hadoopnn1.example.com:2181,hadoopnn2.example.com:2181,hadoopdn1.example.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_h...@testhadoop.com"


--verbose
WARNING: Use "yarn jar" to launch YARN applications.
issuing: !connect jdbc:hive2://hadoopnn1.example.com:2181
,hadoopnn2.example.com:2181

,hadoopdn1.example.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_h...@testhadoop.com


'' [passwd stripped]
Connecting to jdbc:hive2://hadoopnn1.example.com:2181
,hadoopnn2.example.com:2181

,hadoopdn1.example.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_h...@testhadoop.com


Connected to: Apache Hive (version 1.2.1.2.3.4.0-3485)
Driver: Hive JDBC (version 1.2.1.2.3.4.0-3485)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> use default;
Getting log thread is interrupted, since query is done!
No rows affected (1.225 seconds)
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> drop table if exists trips;
Getting log thread is interrupted, since query is done!
No rows affected (0.159 seconds)
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> CREATE EXTERNAL TABLE `TRIPS`(
0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   `bike_nr` string,
0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   `duration` int,
0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   `start_date` string,
0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   `start_station` string,
0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   `end_station` string)
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> PARTITIONED BY (
0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   `year` int,
0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   `month` string)
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> ROW FORMAT DELIMITED
0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   FIELDS TERMINATED BY ','
0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   LINES TERMINATED BY '\n'
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> STORED AS INPUTFORMAT
0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   'org.apache.hadoop.mapred.TextInputFormat'
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> OUTPUTFORMAT
0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> LOCATION
0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   '/user/margusja/trips';
Getting log thread is interrupted

Re: Create external table

2016-05-11 Thread Margus Roo

Sadly, in our environment:


I generated files like you did.

Connected to: Apache Hive (version 1.2.1.2.3.4.0-3485)
Driver: Hive JDBC (version 1.2.1.2.3.4.0-3485)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://hadoopnn1.estpak.ee:2181,hado> create external table 
files_10k (i int) row format delimited fields terminated by '\t' 
location '/user/margusja/files_10k';

Error: Shutdown in progress, cannot remove a shutdownHook (state=,code=0)
0: jdbc:hive2://hadoopnn1.estpak.ee:2181,hado>

Using just hive:

[margusja@hadoopnn1 ~]$ hive
WARNING: Use "yarn jar" to launch YARN applications.
log4j:WARN No such property [maxBackupIndex] in 
org.apache.log4j.DailyRollingFileAppender.


Logging initialized using configuration in 
file:/etc/hive/2.3.4.0-3485/0/hive-log4j.properties
hive> create external table files_10k (i int) row format delimited 
fields terminated by '\t' location '/user/margusja/files_10k';

OK
Time taken: 1.255 seconds
hive>


Margus (margusja) Roo
http://margus.roo.ee
skype: margusja
+372 51 48 780

On 11/05/16 10:16, Markovitz, Dudu wrote:
create external table files_10k (i int) row format delimited fields 
terminated by '\t' location '/tmp/files_10k';




RE: Create external table

2016-05-11 Thread Markovitz, Dudu
Could not reproduce that issue on the Cloudera quickstart VM.

I’ve created an HDFS directory with 10,000 files.
I’ve created an external table from within beeline.
The creation was immediate.

Dudu

---
bash
---
mkdir files_10k
awk 'BEGIN{for (i=1;i<=10000;++i){print i>"./files_10k/f"i".txt"}}'
hdfs dfs -put files_10k /tmp

---
beeline
---
> create external table files_10k (i int) row format delimited fields 
> terminated by '\t' location '/tmp/files_10k';
No rows affected (0.282 seconds)
> select * from files_10k;
10,000 rows selected (27.986 seconds)

From: Margus Roo [mailto:mar...@roo.ee]
Sent: Tuesday, May 10, 2016 11:26 PM
To: user@hive.apache.org
Subject: Re: Create external table


Hi again

I opened hive (an old client)

And exactly the same "create external table ... location [path in hdfs to a place 
where there are loads of files]" works, and the same DDL does not work via beeline.

Margus (margusja) Roo

http://margus.roo.ee

skype: margusja

+372 51 48 780
On 10/05/16 23:03, Margus Roo wrote:

Hi

Can someone explain or provide documentation on how Hive creates external tables?

I have a problem creating an external table when I point the location to an HDFS 
directory containing loads of files. Beeline just hangs, or there are other errors.

If I point the location to an empty directory, then Hive creates the table.



So does Hive look into the files while creating an external table?

I cannot find any documentation explaining it.

--

Margus (margusja) Roo

http://margus.roo.ee

skype: margusja

+372 51 48 780



Re: Create external table

2016-05-10 Thread Mich Talebzadeh
Yes, but the table then exists, correct? I mean the second time.

did you try


use default;

drop table if exists trips;


it is still within Hive metadata registered as an existing table.





Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 11 May 2016 at 06:16, Margus Roo  wrote:

> Hi
>
> Thanks for your answer.
>
> ---
>
> At first I create an empty hdfs directory (if the directory is empty, I did not
> have problems before either).
>
> [margusja@hadoopnn1 ~]$ hdfs dfs -mkdir /user/margusja/trips
>
> [margusja@hadoopnn1 ~]$ beeline -f create_externat_table_trips.hql -u
> "jdbc:hive2://hadoopnn1.example.com:2181,hadoopnn2.example.com:2181,hadoopdn1.example.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_h...@testhadoop.com"
> 
> --verbose
> WARNING: Use "yarn jar" to launch YARN applications.
> issuing: !connect jdbc:hive2://hadoopnn1.example.com:2181,
> hadoopnn2.example.com:2181,
> hadoopdn1.example.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_h...@testhadoop.com
> '' [passwd stripped]
> Connecting to jdbc:hive2://hadoopnn1.example.com:2181,
> hadoopnn2.example.com:2181,
> hadoopdn1.example.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_h...@testhadoop.com
> Connected to: Apache Hive (version 1.2.1.2.3.4.0-3485)
> Driver: Hive JDBC (version 1.2.1.2.3.4.0-3485)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> 0: jdbc:hive2://hadoopnn1.example.com:2181,hado> use default;
> Getting log thread is interrupted, since query is done!
> No rows affected (1.225 seconds)
> 0: jdbc:hive2://hadoopnn1.example.com:2181,hado> drop table if exists
> trips;
> Getting log thread is interrupted, since query is done!
> No rows affected (0.159 seconds)
> 0: jdbc:hive2://hadoopnn1.example.com:2181,hado> CREATE EXTERNAL TABLE
> `TRIPS`(
> 0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   `bike_nr` string,
> 0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   `duration` int,
> 0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   `start_date` string,
> 0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   `start_station` string,
> 0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   `end_station` string)
> 0: jdbc:hive2://hadoopnn1.example.com:2181,hado> PARTITIONED BY (
> 0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   `year` int,
> 0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   `month` string)
> 0: jdbc:hive2://hadoopnn1.example.com:2181,hado> ROW FORMAT DELIMITED
> 0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   FIELDS TERMINATED BY
> ','
> 0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   LINES TERMINATED BY
> '\n'
> 0: jdbc:hive2://hadoopnn1.example.com:2181,hado> STORED AS INPUTFORMAT
> 0: jdbc:hive2://hadoopnn1.example.com:2181,hado>
> 'org.apache.hadoop.mapred.TextInputFormat'
> 0: jdbc:hive2://hadoopnn1.example.com:2181,hado> OUTPUTFORMAT
> 0: jdbc:hive2://hadoopnn1.example.com:2181,hado>
> 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> 0: jdbc:hive2://hadoopnn1.example.com:2181,hado> LOCATION
> 0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   '/user/margusja/trips';
> Getting log thread is interrupted, since query is done!
> No rows affected (0.067 seconds)
> 0: jdbc:hive2://hadoopnn1.example.com:2181,hado> desc trips;
> Getting log thread is interrupted, since query is done!
>
> +--+---+---+--+
> | col_name | data_type | comment |
> +--+---+---+--+
> | bike_nr  | string |  |
> | duration | int |  |
> | start_date | string |  |
> | start_station | string |  |
> | end_station | string |  |
> | year | int |  |
> | month | string |  |
> |  | NULL | NULL |
> | # Partition Information | NULL | NULL |
> | # col_name | data_type | comment |
> |  | NULL | NULL |
> | year | int |  |
> | month | string |  |
> +--+---+---+--+
> 13 rows selected (0.46 seconds)
> 0: jdbc:hive2://hadoopnn1.example.com:2181,hado> !exit
> Closing: 0: jdbc:hive2://hadoopnn1.example.com:2181,
> hadoopnn2.example.com:2181,
> hadoopdn1.host.ee:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_h...@testhadoop.com
>
>
> But in case I try:
>

Re: Create external table

2016-05-10 Thread Margus Roo

Hi

Thanks for your answer.

---

At first I create an empty hdfs directory (if the directory is empty, I did 
not have problems before either).


[margusja@hadoopnn1 ~]$ hdfs dfs -mkdir /user/margusja/trips

[margusja@hadoopnn1 ~]$ beeline -f create_externat_table_trips.hql -u 
"jdbc:hive2://hadoopnn1.example.com:2181,hadoopnn2.example.com:2181,hadoopdn1.example.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_h...@testhadoop.com" 
--verbose

WARNING: Use "yarn jar" to launch YARN applications.
issuing: !connect 
jdbc:hive2://hadoopnn1.example.com:2181,hadoopnn2.example.com:2181,hadoopdn1.example.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_h...@testhadoop.com 
'' [passwd stripped]
Connecting to 
jdbc:hive2://hadoopnn1.example.com:2181,hadoopnn2.example.com:2181,hadoopdn1.example.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_h...@testhadoop.com

Connected to: Apache Hive (version 1.2.1.2.3.4.0-3485)
Driver: Hive JDBC (version 1.2.1.2.3.4.0-3485)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> use default;
Getting log thread is interrupted, since query is done!
No rows affected (1.225 seconds)
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> drop table if exists trips;
Getting log thread is interrupted, since query is done!
No rows affected (0.159 seconds)
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> CREATE EXTERNAL TABLE 
`TRIPS`(

0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   `bike_nr` string,
0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   `duration` int,
0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   `start_date` string,
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> `start_station` string,
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> `end_station` string)
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> PARTITIONED BY (
0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   `year` int,
0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   `month` string)
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> ROW FORMAT DELIMITED
0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   FIELDS TERMINATED BY ','
0: jdbc:hive2://hadoopnn1.example.com:2181,hado>   LINES TERMINATED BY '\n'
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> STORED AS INPUTFORMAT
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> 
'org.apache.hadoop.mapred.TextInputFormat'

0: jdbc:hive2://hadoopnn1.example.com:2181,hado> OUTPUTFORMAT
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> 
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

0: jdbc:hive2://hadoopnn1.example.com:2181,hado> LOCATION
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> '/user/margusja/trips';
Getting log thread is interrupted, since query is done!
No rows affected (0.067 seconds)
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> desc trips;
Getting log thread is interrupted, since query is done!
+--+---+---+--+
| col_name |   data_type   | comment|
+--+---+---+--+
| bike_nr  | string |   |
| duration | int |   |
| start_date   | string |   |
| start_station| string |   |
| end_station  | string |   |
| year | int |   |
| month| string |   |
|  | NULL  | NULL  |
| # Partition Information  | NULL  | NULL  |
| # col_name   | data_type | comment   |
|  | NULL  | NULL  |
| year | int |   |
| month| string |   |
+--+---+---+--+
13 rows selected (0.46 seconds)
0: jdbc:hive2://hadoopnn1.example.com:2181,hado> !exit
Closing: 0: 
jdbc:hive2://hadoopnn1.example.com:2181,hadoopnn2.example.com:2181,hadoopdn1.host.ee:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_h...@testhadoop.com



But in case I try:

 CREATE EXTERNAL TABLE default.st1_test_margusja (
  original STRING,
  rsyslog_timestamp STRING,
  rsyslog_url STRING,
  rsyslog_appname STRING,
  rsyslog_pos5 STRING,
  ts STRING,
  url STRING,
  username STRING,
ip_address STRING,
log_level STRING,
content STRING
  )
  COMMENT 'Dealgate logs raw data'
  PARTITIONED BY (year STRING, month STRING, day STRING)
  ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
  WITH SERDEPROPERTIES ('input.regex'='([^\\s]+ ([^\\s]+) ([^\\s]+) 
([^\\s]+) ([^\\s]+) [^\\s]+ 
[^\

Re: Create external table

2016-05-10 Thread Mich Talebzadeh
Try this simple external table creation in beeline (check first that it
connects OK):

use default;
drop table if exists trips;
CREATE EXTERNAL TABLE `TRIPS`(
  `bike_nr` string,
  `duration` int,
  `start_date` string,
  `start_station` string,
  `end_station` string)
PARTITIONED BY (
  `year` int,
  `month` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  '/test/text/trips';
desc trips;
!exit

run the code
bl -f create_externat_table_trips.hql

No rows affected (0.018 seconds)
CREATE EXTERNAL TABLE `TRIPS`(
  `bike_nr` string,
  `duration` int,
  `start_date` string,
  `start_station` string,
  `end_station` string)
PARTITIONED BY (
  `year` int,
  `month` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  '/test/text/trips'
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling
command(queryId=hduser_20160511001257_b2b612dd-d007-406d-a49c-83e8352a6ac1);
Time taken: 0.009 seconds
INFO  : Executing
command(queryId=hduser_20160511001257_b2b612dd-d007-406d-a49c-83e8352a6ac1):
CREATE EXTERNAL TABLE `TRIPS`(
  `bike_nr` string,
  `duration` int,
  `start_date` string,
  `start_station` string,
  `end_station` string)
PARTITIONED BY (
  `year` int,
  `month` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  '/test/text/trips'
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing
command(queryId=hduser_20160511001257_b2b612dd-d007-406d-a49c-83e8352a6ac1);
Time taken: 0.028 seconds
INFO  : OK
+--+---+---+--+
| col_name |   data_type   |comment|
+--+---+---+--+
| bike_nr  | string|   |
| duration | int   |   |
| start_date   | string|   |
| start_station| string|   |
| end_station  | string|   |
| year | int   |   |
| month| string|   |
|  | NULL  | NULL  |
| # Partition Information  | NULL  | NULL  |
| # col_name   | data_type | comment   |
|  | NULL  | NULL  |
| year | int   |   |
| month| string|   |
+--+---+---+--+
13 rows selected (0.13 seconds)
0: jdbc:hive2://rhes564:10010/default> Closing: 0:
jdbc:hive2://rhes564:10010/default


HTH

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 10 May 2016 at 21:25, Margus Roo  wrote:

> Hi again
>
> I opened hive (an old client)
>
> And exactly the same "create external table ... location [path in hdfs to a
> place where there are loads of files]" works, and the same DDL does not work via
> beeline.
>
> Margus (margusja) Roo
> http://margus.roo.ee
> skype: margusja
> +372 51 48 780
>
> On 10/05/16 23:03, Margus Roo wrote:
>
> Hi
>
> Can someone explain or provide documentation on how Hive creates external
> tables?
>
> I have a problem creating an external table when I point the location to an
> HDFS directory containing loads of files. Beeline just hangs, or there are
> other errors.
>
> If I point the location to an empty directory, then Hive creates the table.
>
>
> So does Hive look into the files while creating an external table?
>
> I cannot find any documentation explaining it.
>
> --
> Margus (margusja) Roo
> http://margus.roo.ee
> skype: margusja
> +372 51 48 780
>
>
>


Re: Create external table

2016-05-10 Thread Margus Roo

Hi again

I opened hive (an old client)

And exactly the same "create external table ... location [path in hdfs 
to a place where there are loads of files]" works, and the same DDL does not 
work via beeline.


Margus (margusja) Roo
http://margus.roo.ee
skype: margusja
+372 51 48 780

On 10/05/16 23:03, Margus Roo wrote:


Hi

Can someone explain or provide documentation on how Hive creates external 
tables?


I have a problem creating an external table when I point the 
location to an HDFS directory containing loads of files. Beeline 
just hangs, or there are other errors.


If I point the location to an empty directory, then Hive creates 
the table.



So does Hive look into the files while creating an external table?

I cannot find any documentation explaining it.

--
Margus (margusja) Roo
http://margus.roo.ee
skype: margusja
+372 51 48 780




Re: "Create external table" nulling data from source table

2016-01-28 Thread Gopal Vijayaraghavan
> And again: the same row is correct if I export a small set of data, and
>incorrect if I export a large set - so I think that file/data size has
>something to do with this.

My Phoenix vs LLAP benchmark hit size-related issues in ETL.


In my case, the tipping point was >1 hdfs block per CSV file.

Generating CSV files compressed with SNAPPY was how I prevented the
old-style MapReduce splitters from arbitrarily chopping up those files on
block boundaries while loading.

>I just tested and if I take the orc table, copy it to a sequence file,
>and then copy to a csv "file", everything looks good.
...
> So, my (not-very-educated) guess is that this has to do with ORC files.

Yes, though somewhat indirectly. Check the output file sizes between those
two.

ORC -> SequenceFile -> Text

will produce smaller text files (more of them) than

ORC -> Text.
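
A quick way to compare them, as a sketch with placeholder warehouse paths:

  hdfs dfs -du -h /apps/hive/warehouse/text_from_seq
  hdfs dfs -du -h /apps/hive/warehouse/text_from_orc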

Cheers,
Gopal




RE: "Create external table" nulling data from source table

2016-01-28 Thread Riesland, Zack
Thanks Nick,

I don't think there should be any newline chars - the columns that are turning 
null are mostly Decimal, and none of my strings are longer than one word (no 
spaces).

And again: the same row is correct if I export a small set of data, and 
incorrect if I export a large set - so I think that file/data size has 
something to do with this.

That said, I think you have the right idea in general.

I just tested and if I take the orc table, copy it to a sequence file, and then 
copy to a csv "file", everything looks good.

Is that going to be my best workaround for this?
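
For reference, a minimal sketch of that two-step copy, with hypothetical table 
names:

  CREATE TABLE tmp_seq STORED AS SEQUENCEFILE AS SELECT * FROM src_orc;
  CREATE TABLE export_csv ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE AS SELECT * FROM tmp_seq;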

-Original Message-
From: Nicholas Hakobian [mailto:nicholas.hakob...@rallyhealth.com] 
Sent: Thursday, January 28, 2016 3:15 PM
To: user@hive.apache.org
Subject: Re: "Create external table" nulling data from source table

Do you have any fields with embedded newline characters? If so, certain hive 
output formats will parse the newline character as the end of row, and when 
importing, chances are the missing fields (now part of the next row) will be 
padded with nulls. This happens in Hive as well if you are using a TextFile 
intermediate format (SequenceFile does not have this problem).

-Nick

Nicholas Szandor Hakobian
Data Scientist
Rally Health
nicholas.hakob...@rallyhealth.com



On Thu, Jan 28, 2016 at 12:05 PM, Riesland, Zack  
wrote:
> First time posting to this list. Please forgive me if I break 
> etiquette. I’m looking for some help with getting data from hive to hbase.
>
>
>
> I’m using HDP 2.2.8.
>
>
>
> I have a compressed (zlib), orc-based hive table with 12 columns and 
> billions of rows.
>
> In order to get the data into hbase, I have to create a copy of the 
> table as an "external" table, backed by CSV files (unless someone 
> knows a better way).
>
>
>
> Then, I use the CsvBulkLoad mapreduce job to create hfiles from the 
> csv files backing the external table.
>
> I’ve been doing this for almost a year, and MOST of the data ends up 
> correct, but if I export a large amount of data, I end up with nulls 
> where I shouldn't.
>
>
>
> If I run the exact same query on the source table (compressed orc) and 
> destination table (external text) I get null values in the results of 
> the latter, but not the former.
>
>
>
> However, if I only copy a small subset of the data to the text-based 
> table, all the data is correct.
>
>
>
> I also noticed that if I use an uncompressed source table, and then 
> copy to an external text-based table, it happens much more often.
>
>
>
> So, my (not-very-educated) guess is that this has to do with ORC files.
>
>
>
> I know that there are alternatives to ORC, but Hortonworks strongly 
> encourages us to use ORC for everything. I’m not even sure whether 
> Parquet works with HDP.
>
>
>
> Anyways, Is this a known bug?
>
> Any ideas on how I can get around it without chopping up my data into 
> multiple tables?


Re: "Create external table" nulling data from source table

2016-01-28 Thread Nicholas Hakobian
Do you have any fields with embedded newline characters? If so,
certain hive output formats will parse the newline character as the
end of row, and when importing, chances are the missing fields (now
part of the next row) will be padded with nulls. This happens in Hive
as well if you are using a TextFile intermediate format (SequenceFile
does not have this problem).
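
A sketch of one defensive step for that case: strip embedded line breaks from
string columns during the copy (regexp_replace is a standard Hive UDF; the
table and column names here are hypothetical):

  INSERT OVERWRITE TABLE export_csv
  SELECT id, regexp_replace(free_text_col, '\n|\r', ' ')
  FROM src_orc;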

-Nick

Nicholas Szandor Hakobian
Data Scientist
Rally Health
nicholas.hakob...@rallyhealth.com



On Thu, Jan 28, 2016 at 12:05 PM, Riesland, Zack
 wrote:
> First time posting to this list. Please forgive me if I break etiquette. I’m
> looking for some help with getting data from hive to hbase.
>
>
>
> I’m using HDP 2.2.8.
>
>
>
> I have a compressed (zlib), orc-based hive table with 12 columns and
> billions of rows.
>
> In order to get the data into hbase, I have to create a copy of the table as
> an "external" table, backed by CSV files (unless someone knows a better
> way).
>
>
>
> Then, I use the CsvBulkLoad mapreduce job to create hfiles from the csv
> files backing the external table.
>
> I’ve been doing this for almost a year, and MOST of the data ends up
> correct, but if I export a large amount of data, I end up with nulls where I
> shouldn't.
>
>
>
> If I run the exact same query on the source table (compressed orc) and
> destination table (external text) I get null values in the results of the
> latter, but not the former.
>
>
>
> However, if I only copy a small subset of the data to the text-based table,
> all the data is correct.
>
>
>
> I also noticed that if I use an uncompressed source table, and then copy to
> an external text-based table, it happens much more often.
>
>
>
> So, my (not-very-educated) guess is that this has to do with ORC files.
>
>
>
> I know that there are alternatives to ORC, but Hortonworks strongly
> encourages us to use ORC for everything. I’m not even sure whether Parquet
> works with HDP.
>
>
>
> Anyways, Is this a known bug?
>
> Any ideas on how I can get around it without chopping up my data into
> multiple tables?


Re: CREATE EXTERNAL TABLE Fails on Some Directories

2013-02-15 Thread bejoy_ks

Hi Joseph

There are differences in the following ls commands

[cloudera@localhost data]$ hdfs dfs -ls /715

This would list out all the contents in /715 in hdfs, if it is a dir

Found 1 items
-rw-r--r--   1 cloudera supergroup    7853975 2013-02-14 17:03 /715

The output clearly shows it is a file, as the 'd' is missing as the first character.

[cloudera@localhost data]$ hdfs dfs -ls 715

This lists the dir 715 in your user's HDFS home dir. If your user is cloudera, 
then your home dir is usually /user/cloudera, so in effect the dir 
listed is /user/cloudera/715.
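
A sketch of the same steps with absolute paths, which avoids the relative-home 
ambiguity (the /data parent directory and table name are illustrative):

  hdfs dfs -mkdir -p /data/715
  hdfs dfs -put myfile.csv /data/715

and in Hive, pointing LOCATION at the directory rather than a file:

  create external table table_715 (col1 string, col2 string)
  row format delimited fields terminated by ','
  lines terminated by '\n'
  stored as textfile
  location '/data/715';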


Regards 
Bejoy KS

Sent from remote device, Please excuse typos

-Original Message-
From: Joseph D Antoni 
Date: Fri, 15 Feb 2013 08:55:50 
To: user@hive.apache.org
Reply-To: user@hive.apache.org
Subject: Re: CREATE EXTERNAL TABLE Fails on Some Directories

Not sure--I just truncated the file list from the ls--that was the first file 
(just obfuscated the name)

The command I used to create the directories was:

hdfs dfs -mkdir 715 
then 
hdfs dfs -put myfile.csv 715

[cloudera@localhost data]$ hdfs dfs -ls /715
Found 1 items
-rw-r--r--   1 cloudera supergroup    7853975 2013-02-14 17:03 /715
[cloudera@localhost data]$ hdfs dfs -ls 715
Found 13 items
-rw-r--r--   1 cloudera cloudera    7853975 2013-02-15 00:41 715/40-file.csv

Thanks






 From: Dean Wampler 
To: user@hive.apache.org; Joseph D Antoni  
Sent: Friday, February 15, 2013 11:50 AM
Subject: Re: CREATE EXTERNAL TABLE Fails on Some Directories
 

Something's odd about this output; why is there no / in front of 715? I always 
get the full path when I run a -ls command. I would expect either:

/715/file.csv
or
/user//715/file.csv

Or is that what you meant by "(didn't leave rest of ls results)"?

dean


On Fri, Feb 15, 2013 at 10:45 AM, Joseph D Antoni  wrote:

[cloudera@localhost data]$ hdfs dfs -ls 715
>Found 13 items
>-rw-r--r--   1 cloudera cloudera    7853975 2013-02-15 00:41 715/file.csv 
>(didn't leave rest of ls results)
>
>
>Thanks on the directory--wasn't clear on that..
>
>Joey
>
>
>
>
>
>
>
>
>
> From: Dean Wampler 
>To: user@hive.apache.org; Joseph D Antoni  
>Sent: Friday, February 15, 2013 11:37 AM
>Subject: Re: CREATE EXTERNAL TABLE Fails on Some Directories
> 
>
>
>You confirmed that 715 is an actual directory? It didn't become a file by 
>accident?
>
>
>By the way, you don't need to include the file name in the LOCATION. It will 
>read all the files in the directory.
>
>
>dean
>
>
>On Fri, Feb 15, 2013 at 10:29 AM, Joseph D Antoni  wrote:
>
>I'm trying to create a series of external tables for a time series of data 
>(using the prebuilt Cloudera VM).
>>
>>
>>The directory structure in HDFS is as such:
>>
>>
>>/711
>>/712
>>/713
>>/714
>>/715
>>/716
>>/717
>>
>>
>>Each directory contains the same set of files, from a different day. They 
>>were all put into HDFS using the following script:
>>
>>
>>for i in *;do hdfs dfs -put $i in $dir;done
>>
>>
>>They all show up with the same ownership/perms in HDFS.
>>
>>
>>Going into Hive to build the tables, I built a set of scripts to do the 
>>loads--then did a sed (changing 711 to 712,713, etc) to a file for each day. 
>>All of my loads work, EXCEPT for 715 and 716. 
>>
>>
>>Script is as follows:
>>
>>
>>create external table 715_table_name
>>(col1 string,
>>col2 string)
>>row format
>>delimited fields terminated by ','
>>lines terminated by '\n'
>>stored as textfile
>>location '/715/file.csv';
>>
>>
>>This is failing with:
>>
>>
>>Error in Metadata MetaException(message:Got except: 
>>org.apache.hadoop.fs.FileAlreadyExistsException Parent Path is not a 
>>directory: /715 715...
>>
>>
>>Like I mentioned it works for all of the other directories, except 715 and 
>>716. Thoughts on troubleshooting path?
>>
>>
>>Thanks
>>
>>
>>Joey D'Antoni
>
>
>
>-- 
>Dean Wampler, Ph.D.
>thinkbiganalytics.com
>+1-312-339-1330
>
>
>
>


-- 
Dean Wampler, Ph.D.
thinkbiganalytics.com
+1-312-339-1330


Re: CREATE EXTERNAL TABLE Fails on Some Directories

2013-02-15 Thread Joseph D Antoni
Not sure--I just truncated the file list from the ls--that was the first file 
(just obfuscated the name)

The command I used to create the directories was:

hdfs dfs -mkdir 715 
then 
hdfs dfs -put myfile.csv 715

[cloudera@localhost data]$ hdfs dfs -ls /715
Found 1 items
-rw-r--r--   1 cloudera supergroup    7853975 2013-02-14 17:03 /715
[cloudera@localhost data]$ hdfs dfs -ls 715
Found 13 items
-rw-r--r--   1 cloudera cloudera    7853975 2013-02-15 00:41 715/40-file.csv

Thanks






 From: Dean Wampler 
To: user@hive.apache.org; Joseph D Antoni  
Sent: Friday, February 15, 2013 11:50 AM
Subject: Re: CREATE EXTERNAL TABLE Fails on Some Directories
 

Something's odd about this output; why is there no / in front of 715? I always 
get the full path when I run a -ls command. I would expect either:

/715/file.csv
or
/user//715/file.csv

Or is that what you meant by "(didn't leave rest of ls results)"?

dean


On Fri, Feb 15, 2013 at 10:45 AM, Joseph D Antoni  wrote:

[cloudera@localhost data]$ hdfs dfs -ls 715
>Found 13 items
>-rw-r--r--   1 cloudera cloudera    7853975 2013-02-15 00:41 715/file.csv 
>(didn't leave rest of ls results)
>
>
>Thanks on the directory--wasn't clear on that..
>
>Joey
>
>
>
>
>
>
>
>
>
> From: Dean Wampler 
>To: user@hive.apache.org; Joseph D Antoni  
>Sent: Friday, February 15, 2013 11:37 AM
>Subject: Re: CREATE EXTERNAL TABLE Fails on Some Directories
> 
>
>
>You confirmed that 715 is an actual directory? It didn't become a file by 
>accident?
>
>
>By the way, you don't need to include the file name in the LOCATION. It will 
>read all the files in the directory.
>
>
>dean
>
>
>On Fri, Feb 15, 2013 at 10:29 AM, Joseph D Antoni  wrote:
>
>I'm trying to create a series of external tables for a time series of data 
>(using the prebuilt Cloudera VM).
>>
>>
>>The directory structure in HDFS is as such:
>>
>>
>>/711
>>/712
>>/713
>>/714
>>/715
>>/716
>>/717
>>
>>
>>Each directory contains the same set of files, from a different day. They 
>>were all put into HDFS using the following script:
>>
>>
>>for i in *;do hdfs dfs -put $i in $dir;done
>>
>>
>>They all show up with the same ownership/perms in HDFS.
>>
>>
>>Going into Hive to build the tables, I built a set of scripts to do the 
>>loads--then did a sed (changing 711 to 712,713, etc) to a file for each day. 
>>All of my loads work, EXCEPT for 715 and 716. 
>>
>>
>>Script is as follows:
>>
>>
>>create external table 715_table_name
>>(col1 string,
>>col2 string)
>>row format
>>delimited fields terminated by ','
>>lines terminated by '\n'
>>stored as textfile
>>location '/715/file.csv';
>>
>>
>>This is failing with:
>>
>>
>>Error in Metadata MetaException(message:Got except: 
>>org.apache.hadoop.fs.FileAlreadyExistsException Parent Path is not a 
>>directory: /715 715...
>>
>>
>>Like I mentioned it works for all of the other directories, except 715 and 
>>716. Thoughts on troubleshooting path?
>>
>>
>>Thanks
>>
>>
>>Joey D'Antoni
>
>
>
>-- 
>Dean Wampler, Ph.D.
>thinkbiganalytics.com
>+1-312-339-1330
>
>
>
>


-- 
Dean Wampler, Ph.D.
thinkbiganalytics.com
+1-312-339-1330

Re: CREATE EXTERNAL TABLE Fails on Some Directories

2013-02-15 Thread Dean Wampler
Something's odd about this output; why is there no / in front of 715? I
always get the full path when I run a -ls command. I would expect either:

/715/file.csv
or
/user//715/file.csv

Or is that what you meant by "(didn't leave rest of ls results)"?

dean

On Fri, Feb 15, 2013 at 10:45 AM, Joseph D Antoni wrote:

> [cloudera@localhost data]$ hdfs dfs -ls 715
> Found 13 items
> -rw-r--r--   1 cloudera cloudera7853975 2013-02-15 00:41 715/file.csv
> (didn't leave rest of ls results)
>
>
> Thanks on the directory--wasn't clear on that..
>
> Joey
>
>
>
>   --
> *From:* Dean Wampler 
> *To:* user@hive.apache.org; Joseph D Antoni 
> *Sent:* Friday, February 15, 2013 11:37 AM
> *Subject:* Re: CREATE EXTERNAL TABLE Fails on Some Directories
>
> You confirmed that 715 is an actual directory? It didn't become a file by
> accident?
>
> By the way, you don't need to include the file name in the LOCATION. It
> will read all the files in the directory.
>
> dean
>
> On Fri, Feb 15, 2013 at 10:29 AM, Joseph D Antoni wrote:
>
> I'm trying to create a series of external tables for a time series of data
> (using the prebuilt Cloudera VM).
>
> The directory structure in HDFS is as such:
>
> /711
> /712
> /713
> /714
> /715
> /716
> /717
>
> Each directory contains the same set of files, from a different day. They
> were all put into HDFS using the following script:
>
> for i in *;do hdfs dfs -put $i in $dir;done
>
> They all show up with the same ownership/perms in HDFS.
>
> Going into Hive to build the tables, I built a set of scripts to do the
> loads--then did a sed (changing 711 to 712,713, etc) to a file for each
> day. All of my loads work, EXCEPT for 715 and 716.
>
> Script is as follows:
>
> create external table 715_table_name
> (col1 string,
> col2 string)
> row format
> delimited fields terminated by ','
> lines terminated by '\n'
> stored as textfile
> location '/715/file.csv';
>
> This is failing with:
>
> Error in Metadata MetaException(message:Got except:
> org.apache.hadoop.fs.FileAlreadyExistsException Parent Path is not a
> directory: /715 715...
>
> Like I mentioned it works for all of the other directories, except 715 and
> 716. Thoughts on troubleshooting path?
>
> Thanks
>
> Joey D'Antoni
>
>
>
>
> --
> *Dean Wampler, Ph.D.*
> thinkbiganalytics.com
> +1-312-339-1330
>
>
>
>


-- 
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330


Re: CREATE EXTERNAL TABLE Fails on Some Directories

2013-02-15 Thread Joseph D Antoni
[cloudera@localhost data]$ hdfs dfs -ls 715
Found 13 items
-rw-r--r--   1 cloudera cloudera    7853975 2013-02-15 00:41 715/file.csv 
(didn't leave rest of ls results)


Thanks on the directory--wasn't clear on that..

Joey






 From: Dean Wampler 
To: user@hive.apache.org; Joseph D Antoni  
Sent: Friday, February 15, 2013 11:37 AM
Subject: Re: CREATE EXTERNAL TABLE Fails on Some Directories
 

You confirmed that 715 is an actual directory? It didn't become a file by 
accident?

By the way, you don't need to include the file name in the LOCATION. It will 
read all the files in the directory.

dean


On Fri, Feb 15, 2013 at 10:29 AM, Joseph D Antoni  wrote:

I'm trying to create a series of external tables for a time series of data 
(using the prebuilt Cloudera VM).
>
>
>The directory structure in HDFS is as such:
>
>
>/711
>/712
>/713
>/714
>/715
>/716
>/717
>
>
>Each directory contains the same set of files, from a different day. They were 
>all put into HDFS using the following script:
>
>
>for i in *;do hdfs dfs -put $i in $dir;done
>
>
>They all show up with the same ownership/perms in HDFS.
>
>
>Going into Hive to build the tables, I built a set of scripts to do the 
>loads--then did a sed (changing 711 to 712,713, etc) to a file for each day. 
>All of my loads work, EXCEPT for 715 and 716. 
>
>
>Script is as follows:
>
>
>create external table 715_table_name
>(col1 string,
>col2 string)
>row format
>delimited fields terminated by ','
>lines terminated by '\n'
>stored as textfile
>location '/715/file.csv';
>
>
>This is failing with:
>
>
>Error in Metadata MetaException(message:Got except: 
>org.apache.hadoop.fs.FileAlreadyExistsException Parent Path is not a 
>directory: /715 715...
>
>
>Like I mentioned it works for all of the other directories, except 715 and 
>716. Thoughts on troubleshooting path?
>
>
>Thanks
>
>
>Joey D'Antoni


-- 
Dean Wampler, Ph.D.
thinkbiganalytics.com
+1-312-339-1330

Re: CREATE EXTERNAL TABLE Fails on Some Directories

2013-02-15 Thread Dean Wampler
You confirmed that 715 is an actual directory? It didn't become a file by
accident?

By the way, you don't need to include the file name in the LOCATION. It
will read all the files in the directory.
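
So the corrected DDL would look something like this (a sketch, assuming /715
really is a directory in HDFS):

create external table 715_table_name
(col1 string,
col2 string)
row format
delimited fields terminated by ','
lines terminated by '\n'
stored as textfile
location '/715';   -- point at the directory, not at /715/file.csv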

dean

On Fri, Feb 15, 2013 at 10:29 AM, Joseph D Antoni wrote:

> I'm trying to create a series of external tables for a time series of data
> (using the prebuilt Cloudera VM).
>
> The directory structure in HDFS is as such:
>
> /711
> /712
> /713
> /714
> /715
> /716
> /717
>
> Each directory contains the same set of files, from a different day. They
> were all put into HDFS using the following script:
>
> for i in *;do hdfs dfs -put $i in $dir;done
>
> They all show up with the same ownership/perms in HDFS.
>
> Going into Hive to build the tables, I built a set of scripts to do the
> loads--then did a sed (changing 711 to 712,713, etc) to a file for each
> day. All of my loads work, EXCEPT for 715 and 716.
>
> Script is as follows:
>
> create external table 715_table_name
> (col1 string,
> col2 string)
> row format
> delimited fields terminated by ','
> lines terminated by '\n'
> stored as textfile
> location '/715/file.csv';
>
> This is failing with:
>
> Error in Metadata MetaException(message:Got except:
> org.apache.hadoop.fs.FileAlreadyExistsException Parent Path is not a
> directory: /715 715...
>
> Like I mentioned it works for all of the other directories, except 715 and
> 716. Thoughts on troubleshooting path?
>
> Thanks
>
> Joey D'Antoni
>



-- 
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330


Re: Create external table like.

2012-07-27 Thread Vidhya Venkataraman
Thanks for the reference Bejoy.

V


On Fri, Jul 27, 2012 at 12:36 PM, Bejoy Ks  wrote:

> Hi Vidhya
>
> This bug was reported and fixed in a later version of Hive, Hive 0.8. An
> upgrade would set things in place.
>
> https://issues.apache.org/jira/browse/HIVE-2888
>
> Regards,
> Bejoy KS
>
>   --
> *From:* Vidhya Venkataraman 
> *To:* user@hive.apache.org
> *Sent:* Friday, July 27, 2012 10:21 PM
> *Subject:* Create external table like.
>
> Hi
>   I am using Hive 0.7.x on my dev machine (yeah we will be upgrading soon
> :) )
>
>   I used the statement indicated in the subject to create an external
> table:
>
> *create external table ext_sample_v1 like sample_v1 location
> '/hive/warehouse/sample_v1/';*
>
> Since sample_v1 had partitions, I added some of them to ext_sample_v1 as
> well:
>
> *alter table ext_sample_v1 add partition (dt='8/1/2012');*
>
> Select * on ext_sample_v1 yields the correct result.
>
> One odd thing is:
> *hive > describe extended ext_sample_v1;*
> OK
> key_index int
> userid bigint
> otherid bigint
> dt string
>
> Detailed Table Information Table(tableName:ext_sample_v1, dbName:default,
> owner:vidhya, createTime:1343406972, lastAccessTime:0, retention:0,
> sd:StorageDescriptor(cols:[FieldSchema(name:key_index, type:int,
> comment:null), FieldSchema(name:userid, type:bigint, comment:null),
> FieldSchema(name:otherid, type:bigint, comment:null), FieldSchema(name:dt,
> type:string, comment:null)],
> location:hdfs://localhost:9000/hive/warehouse/sample_v1,
> inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat,
> outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat,
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,
> parameters:{serialization.format=1}), bucketCols:[], sortCols:[],
> parameters:{}), partitionKeys:[FieldSchema(name:dt, type:string,
> comment:null)], parameters:{last_modified_by=vidhya, external=true,
> last_modified_time=1343407191, transient_lastDdlTime=1343407191},
> viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
> *
>
>
> Notice the MANAGED_TABLE value! So, it doesn't treat this table as an
> external table. In fact when I drop ext_sample_v1, it also deletes the data
> associated with it. I didn't look into Hive source, but is this expected?
> Is this corrected in a future version?
>
> V
>
>
>
>


Re: Create external table like.

2012-07-27 Thread Bejoy Ks
Hi Vidhya

This bug was reported and fixed in a later version of Hive, Hive 0.8. An
upgrade would set things in place.

https://issues.apache.org/jira/browse/HIVE-2888
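
Until you upgrade, one workaround sketch (column types taken from the DESCRIBE
output below) is to spell out the full external DDL instead of using
CREATE ... LIKE, and then verify the table type:

create external table ext_sample_v1 (
key_index int,
userid bigint,
otherid bigint
)
partitioned by (dt string)
stored as sequencefile
location '/hive/warehouse/sample_v1/';

describe extended ext_sample_v1;   -- tableType should now read EXTERNAL_TABLE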

 
Regards,
Bejoy KS



 From: Vidhya Venkataraman 
To: user@hive.apache.org 
Sent: Friday, July 27, 2012 10:21 PM
Subject: Create external table like.
 

Hi 
  I am using Hive 0.7.x on my dev machine (yeah we will be upgrading soon :) )
  I used the statement indicated in the subject to create an external table:


create external table ext_sample_v1 like sample_v1 location 
'/hive/warehouse/sample_v1/';

Since sample_v1 had partitions, I added some of them to ext_sample_v1 as well:

alter table ext_sample_v1 add partition (dt='8/1/2012');


Select * on ext_sample_v1 yields the correct result. 

One odd thing is: 
hive > describe extended ext_sample_v1;
OK
key_index   int
userid      bigint
otherid     bigint
dt          string
  
Detailed Table InformationTable(tableName:ext_sample_v1, dbName:default, 
owner:vidhya, createTime:1343406972, lastAccessTime:0, retention:0, 
sd:StorageDescriptor(cols:[FieldSchema(name:key_index, type:int, comment:null), 
FieldSchema(name:userid, type:bigint, comment:null), FieldSchema(name:otherid, 
type:bigint, comment:null), FieldSchema(name:dt, type:string, comment:null)], 
location:hdfs://localhost:9000/hive/warehouse/sample_v1, 
inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat, 
outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat, 
compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
parameters:{serialization.format=1}), bucketCols:[], sortCols:[], 
parameters:{}), partitionKeys:[FieldSchema(name:dt, type:string, 
comment:null)], parameters:{last_modified_by=vidhya, external=true, 
last_modified_time=1343407191, transient_lastDdlTime=1343407191},
 viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)


Notice the MANAGED_TABLE value! So, it doesn't treat this table as an external 
table. In fact when I drop ext_sample_v1, it also deletes the data associated 
with it. I didn't look into Hive source, but is this expected? Is this 
corrected in a future version? 

V

Re: create external table on existing hive partitioned table ?

2012-06-19 Thread Mark Grover
In hindsight, I have learned to create external tables so that if I get into a
situation like this, I don't have to look up the appropriate ALTER TABLE
statement (if it exists at all); I can just drop the external table and
re-create it :-)

Mark

- Original Message -
From: "Sai Naga" 
To: user@hive.apache.org
Sent: Tuesday, June 19, 2012 6:16:31 PM
Subject: Re: create external table on existing hive partitioned table ?

Thanks Mark, 


The reason to create the 2nd table is that one of the columns is defined as string in
the first table, and I wanted to read the string into a Map data type.




i.e 


Existing table. 


{"UY": 2, "BR": 1} 
{"LV": 1, "BR": 1} 



To 


Country map<string,int>

Thanks 
Gopi 




On Tue, Jun 19, 2012 at 1:37 PM, Mark Grover < mgro...@oanda.com > wrote: 


Sai, 
Maybe I don't understand your question properly, but creating an external table
on a partitioned table is no different than creating an external table on a
non-partitioned one.

Your syntax looks right. After table creation, you would have to add all 
existing partitions of the table so that the new table's metadata is properly 
populated. 
You can do that using a command like "ALTER TABLE page_access_m RECOVER 
PARTITIONS" on Amazon EMR Hive or using msck on Apache Hive (more details at: 
https://issues.apache.org/jira/browse/HIVE-874 ) 

Does that answer your question? 

On a related note, having two tables over the same data could be troublesome.
For example, if one query adds a partition to one table, you would have to
remember to recover partitions on the other table before you access your
data through it. If the schema for both the tables is the same (except the fact
that one is managed while the other is external), is there any particular reason
you'd like to create a new table?

Mark 



- Original Message - 
From: "Sai Naga" < freshair1...@gmail.com > 
To: user@hive.apache.org 
Sent: Tuesday, June 19, 2012 4:19:25 PM 
Subject: create external table on existing hive partitioned table ? 


Is it possible to create an external table on an existing Hive table which is
partitioned?


I have an existing Hive table which is partitioned by dt and group like below




desc page_access; 
page string 
country string 
dt string ( Partitioned column ) 
group string ( Partitioned column ) 






dfs -tail file /user/hive/warehouse/page_access/dt=2012-06-01/group=a/03_0 




PX1 {"UY": 2, "BR": 1} 
PX2 {"LV": 1, "BR": 1} 




Is it possible to define an external table something like...


create external table page_access_m
(
page string,
country map<string,int>
) PARTITIONED BY (dt STRING, group STRING)
location '/user/hive/warehouse/page_access';


currently I'm using Hive 0.6






Thanks 
Sai 



Re: create external table on existing hive partitioned table ?

2012-06-19 Thread Mark Grover
Hi Sai,
You might want to look into ALTER TABLE statements to see if the type of the
column can be modified. You may need to upgrade Hive if that functionality is
not available in your version.

Mark

- Original Message -
From: "Sai Naga" 
To: user@hive.apache.org
Sent: Tuesday, June 19, 2012 6:16:31 PM
Subject: Re: create external table on existing hive partitioned table ?

Thanks Mark, 


The reason to create the 2nd table is that one of the columns is defined as string in
the first table, and I wanted to read the string into a Map data type.




i.e 


Existing table. 


{"UY": 2, "BR": 1} 
{"LV": 1, "BR": 1} 



To 


Country map<string,int>

Thanks 
Gopi 




On Tue, Jun 19, 2012 at 1:37 PM, Mark Grover < mgro...@oanda.com > wrote: 


Sai, 
Maybe I don't understand your question properly, but creating an external table
on a partitioned table is no different than creating an external table on a
non-partitioned one.

Your syntax looks right. After table creation, you would have to add all 
existing partitions of the table so that the new table's metadata is properly 
populated. 
You can do that using a command like "ALTER TABLE page_access_m RECOVER 
PARTITIONS" on Amazon EMR Hive or using msck on Apache Hive (more details at: 
https://issues.apache.org/jira/browse/HIVE-874 ) 

Does that answer your question? 

On a related note, having two tables over the same data could be troublesome.
For example, if one query adds a partition to one table, you would have to
remember to recover partitions on the other table before you access your
data through it. If the schema for both the tables is the same (except the fact
that one is managed while the other is external), is there any particular reason
you'd like to create a new table?

Mark 



- Original Message - 
From: "Sai Naga" < freshair1...@gmail.com > 
To: user@hive.apache.org 
Sent: Tuesday, June 19, 2012 4:19:25 PM 
Subject: create external table on existing hive partitioned table ? 


Is it possible to create an external table on an existing Hive table which is
partitioned?


I have an existing Hive table which is partitioned by dt and group like below




desc page_access; 
page string 
country string 
dt string ( Partitioned column ) 
group string ( Partitioned column ) 






dfs -tail file /user/hive/warehouse/page_access/dt=2012-06-01/group=a/03_0 




PX1 {"UY": 2, "BR": 1} 
PX2 {"LV": 1, "BR": 1} 




Is it possible to define an external table something like...


create external table page_access_m
(
page string,
country map<string,int>
) PARTITIONED BY (dt STRING, group STRING)
location '/user/hive/warehouse/page_access';


currently I'm using Hive 0.6






Thanks 
Sai 



Re: create external table on existing hive partitioned table ?

2012-06-19 Thread Sai Naga
Thanks Mark,

The reason to create the 2nd table is that one of the columns is defined as
string in the first table, and I wanted to read the string into a Map data type.


i.e

Existing table.

{"UY": 2, "BR": 1}
{"LV": 1, "BR": 1}

To

Country map<string,int>

Thanks
Gopi
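
(Note the stored values are JSON strings, so a plain delimited map type may not
parse them as-is; one alternative sketch that works on the existing string
column is the built-in get_json_object UDF, here with a hypothetical dt value:)

SELECT page,
       get_json_object(country, '$.UY') AS uy_count
FROM page_access
WHERE dt = '2012-06-01';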


On Tue, Jun 19, 2012 at 1:37 PM, Mark Grover  wrote:

> Sai,
> Maybe I don't understand your question properly, but creating an external
> table on a partitioned table is no different than creating an external table
> on a non-partitioned one.
>
> Your syntax looks right. After table creation, you would have to add all
> existing partitions of the table so that the new table's metadata is
> properly populated.
> You can do that using a command like "ALTER TABLE page_access_m RECOVER
> PARTITIONS" on Amazon EMR Hive or using msck on Apache Hive (more details
> at: https://issues.apache.org/jira/browse/HIVE-874)
>
> Does that answer your question?
>
> On a related note, having two tables over the same data could be
> troublesome. For example, if one query adds a partition to one table, you
> would have to remember to recover partitions on the other table
> before you access your data through it. If the schema for both the tables
> is the same (except the fact that one is managed while the other is external),
> is there any particular reason you'd like to create a new table?
>
> Mark
>
> - Original Message -
> From: "Sai Naga" 
> To: user@hive.apache.org
> Sent: Tuesday, June 19, 2012 4:19:25 PM
> Subject: create external table on existing hive partitioned table ?
>
>
> Is it possible to create an external table on an existing Hive table which is
> partitioned?
>
>
> I have an existing Hive table which is partitioned by dt and group like
> below
>
>
>
>
> desc page_access;
> page string
> country string
> dt string ( Partitioned column )
> group string ( Partitioned column )
>
>
>
>
>
>
> dfs -tail file
> /user/hive/warehouse/page_access/dt=2012-06-01/group=a/03_0
>
>
>
>
> PX1 {"UY": 2, "BR": 1}
> PX2 {"LV": 1, "BR": 1}
>
>
>
>
> Is it possible to define an external table something like...
>
>
> create external table page_access_m
> (
> page string,
> country map<string,int>
> ) PARTITIONED BY (dt STRING, group STRING)
> location '/user/hive/warehouse/page_access';
>
>
> currently I'm using Hive 0.6
>
>
>
>
>
>
> Thanks
> Sai
>


Re: create external table on existing hive partitioned table ?

2012-06-19 Thread Mark Grover
Sai,
Maybe I don't understand your question properly, but creating an external table
on a partitioned table is no different than creating an external table on a
non-partitioned one.

Your syntax looks right. After table creation, you would have to add all 
existing partitions of the table so that the new table's metadata is properly 
populated.
You can do that using a command like "ALTER TABLE page_access_m RECOVER 
PARTITIONS" on Amazon EMR Hive or using msck on Apache Hive (more details at: 
https://issues.apache.org/jira/browse/HIVE-874)
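
As a sketch against the page_access_m table from the question (partition values
hypothetical; group is backtick-quoted since it is a reserved word):

-- add one known partition explicitly:
ALTER TABLE page_access_m ADD PARTITION (dt='2012-06-01', `group`='a')
LOCATION '/user/hive/warehouse/page_access/dt=2012-06-01/group=a';

-- or pick up everything under the table location on Apache Hive:
MSCK REPAIR TABLE page_access_m;

-- or on Amazon EMR Hive:
ALTER TABLE page_access_m RECOVER PARTITIONS;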

Does that answer your question?

On a related note, having two tables over the same data could be troublesome.
For example, if one query adds a partition to one table, you would have to
remember to recover partitions on the other table before you access your
data through it. If the schema for both the tables is the same (except the fact
that one is managed while the other is external), is there any particular reason
you'd like to create a new table?

Mark

- Original Message -
From: "Sai Naga" 
To: user@hive.apache.org
Sent: Tuesday, June 19, 2012 4:19:25 PM
Subject: create external table on existing hive partitioned table ?


Is it possible to create an external table on an existing Hive table which is
partitioned?


I have an existing Hive table which is partitioned by dt and group like below




desc page_access; 
page string 
country string 
dt string ( Partitioned column ) 
group string ( Partitioned column ) 






dfs -tail file /user/hive/warehouse/page_access/dt=2012-06-01/group=a/03_0 




PX1 {"UY": 2, "BR": 1} 
PX2 {"LV": 1, "BR": 1} 




Is it possible to define an external table something like...


create external table page_access_m
(
page string,
country map<string,int>
) PARTITIONED BY (dt STRING, group STRING)
location '/user/hive/warehouse/page_access';


currently I'm using Hive 0.6






Thanks 
Sai