sqoop issue

2017-02-21 Thread Raj hadoop
I'm using Hadoop 2.5.1 and Sqoop 1.4.6.

I am using Sqoop import to import a table from a MySQL database for use with
Hadoop. It is showing the following error:

Exception in thread "main" java.lang.NoSuchMethodError:
org.apache.hadoop.fs.FSOutputSummer
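
A NoSuchMethodError on org.apache.hadoop.fs.FSOutputSummer usually indicates two
different Hadoop versions meeting on the classpath, e.g. Sqoop picking up (or
bundling) a hadoop-common jar older than the one the cluster runs. A hedged sketch
of the first checks; the lib path is a placeholder for wherever Sqoop is installed:

sqoop version                          # which Hadoop build Sqoop reports it was compiled against
hadoop version                         # what the cluster actually runs
ls $SQOOP_HOME/lib | grep -i hadoop    # hedged: look for a stray or duplicate hadoop-common jar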


How to handle RAW data type of oracle in SQOOP import

2017-02-21 Thread Raj hadoop
How do I handle the Oracle RAW data type in a Sqoop import?
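
Sqoop has no default type mapping for Oracle's RAW columns, so the commonly cited
workaround is to map the column explicitly with --map-column-java / --map-column-hive.
A hedged sketch in which the connection string, table name, and RAW_COL are placeholders:

sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott -P \
  --table MYTABLE \
  --map-column-java RAW_COL=String \
  --hive-import --hive-table mytable \
  --map-column-hive RAW_COL=STRING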


Re: Permission denied: user=root, access=WRITE, inode="/user/root":hdfs:hdfs:drwxr-xr-x

2016-10-05 Thread Raj hadoop
Seems it's already present, Amrit:

hdpmaster001:~ # useradd -G hdfs root
useradd: Account `root' already exists.
hdpmaster001:~ #
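
The denial in the stack trace is coming from HDFS, not from the local OS, so the
missing piece is usually root's home directory in HDFS rather than the root account
itself. A hedged sketch of the usual fix, run as the hdfs superuser - the path
follows the inode named in the error:

sudo -u hdfs hdfs dfs -mkdir -p /user/root
sudo -u hdfs hdfs dfs -chown root:root /user/root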


On Wed, Oct 5, 2016 at 2:46 PM, Amrit Jangid 
wrote:

> Hi Raj
>
> Do add root user into hdfs group.
> Run this command on your NameNode server.
>
>
> useradd -G hdfs root
>
> On Wed, Oct 5, 2016 at 2:07 PM, Raj hadoop  wrote:
>
>> I'm getting it when I'm trying to start Hive:
>>
>> hdpmaster001:~ # hive
>> WARNING: Use "yarn jar" to launch YARN applications.
>>
>> How can I execute the same?
>> Thanks,
>> Raj.
>>
>> On Wed, Oct 5, 2016 at 1:56 PM, Raj hadoop  wrote:
>>
>>> Hi All,
>>>
>>> Could someone help me solve this issue?
>>>
>>> Logging initialized using configuration in file:/etc/hive/2.4.2.0-258/0/h
>>> ive-log4j.properties
>>> Exception in thread "main" java.lang.RuntimeException:
>>> org.apache.hadoop.security.AccessControlException: Permission denied:
>>> user=root, access=WRITE, inode="/user/root":hdfs:hdfs:drwxr-xr-x
>>> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.c
>>> heck(FSPermissionChecker.java:319)
>>> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.c
>>> heck(FSPermissionChecker.java:292)
>>> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.c
>>> heckPermission(FSPermissionChecker.java:213)
>>> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.c
>>> heckPermission(FSPermissionChecker.java:190)
>>> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPerm
>>> ission(FSDirectory.java:1780)
>>> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPerm
>>> ission(FSDirectory.java:1764)
>>> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAnce
>>> storAccess(FSDirectory.java:1747)
>>> at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(F
>>> SDirMkdirOp.java:71)
>>> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(F
>>> SNamesystem.java:3972)
>>> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkd
>>> irs(NameNodeRpcServer.java:1081)
>>> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServ
>>> erSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTr
>>> anslatorPB.java:630)
>>> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocol
>>> Protos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNam
>>> enodeProtocolProtos.java)
>>> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcIn
>>> voker.call(ProtobufRpcEngine.java:616)
>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2206)
>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2202)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at javax.security.auth.Subject.doAs(Subject.java:422)
>>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGro
>>> upInformation.java:1709)
>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2200)
>>>
>>> at org.apache.hadoop.hive.ql.session.SessionState.start(Session
>>> State.java:516)
>>> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:680)
>>> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:624)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcce
>>> ssorImpl.java:62)
>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMe
>>> thodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:498)
>>> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>>> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
>>> Caused by: org.apache.hadoop.security.AccessControlException:
>>> Permission denied: user=root, access=WRITE, inode="/user/root":hdfs:hdfs:d
>>> rwxr-xr-x
>>> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.c
>>> heck(FSPermissionChecker.java:319)
>>> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.c
>>> heck(FSPermissionChecker.java:292)
>>> at org.apache

Re: Permission denied: user=root, access=WRITE, inode="/user/root":hdfs:hdfs:drwxr-xr-x

2016-10-05 Thread Raj hadoop
I'm getting it when I'm trying to start Hive:

hdpmaster001:~ # hive
WARNING: Use "yarn jar" to launch YARN applications.

How can I execute the same?
Thanks,
Raj.

On Wed, Oct 5, 2016 at 1:56 PM, Raj hadoop  wrote:

> Hi All,
>
> Could someone help me solve this issue?
>
> Logging initialized using configuration in file:/etc/hive/2.4.2.0-258/0/
> hive-log4j.properties
> Exception in thread "main" java.lang.RuntimeException:
> org.apache.hadoop.security.AccessControlException: Permission denied:
> user=root, access=WRITE, inode="/user/root":hdfs:hdfs:drwxr-xr-x
> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.
> check(FSPermissionChecker.java:319)
> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.
> check(FSPermissionChecker.java:292)
> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.
> checkPermission(FSPermissionChecker.java:213)
> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.
> checkPermission(FSPermissionChecker.java:190)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.
> checkPermission(FSDirectory.java:1780)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.
> checkPermission(FSDirectory.java:1764)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.
> checkAncestorAccess(FSDirectory.java:1747)
> at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(
> FSDirMkdirOp.java:71)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(
> FSNamesystem.java:3972)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.
> mkdirs(NameNodeRpcServer.java:1081)
> at org.apache.hadoop.hdfs.protocolPB.
> ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(
> ClientNamenodeProtocolServerSideTranslatorPB.java:630)
> at org.apache.hadoop.hdfs.protocol.proto.
> ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(
> ClientNamenodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$
> ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2206)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2202)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(
> UserGroupInformation.java:1709)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2200)
>
> at org.apache.hadoop.hive.ql.session.SessionState.start(
> SessionState.java:516)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:680)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:624)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: org.apache.hadoop.security.AccessControlException: Permission
> denied: user=root, access=WRITE, inode="/user/root":hdfs:hdfs:drwxr-xr-x
> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.
> check(FSPermissionChecker.java:319)
> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.
> check(FSPermissionChecker.java:292)
> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.
> checkPermission(FSPermissionChecker.java:213)
> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.
> checkPermission(FSPermissionChecker.java:190)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.
> checkPermission(FSDirectory.java:1780)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.
> checkPermission(FSDirectory.java:1764)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.
> checkAncestorAccess(FSDirectory.java:1747)
> at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(
> FSDirMkdirOp.java:71)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(
> FSNamesystem.java:3972)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.
> mkdirs(NameNodeRpcServer.java:1081)
> at org.apache.hadoop.hdfs.protocolPB.
> ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(
> ClientNamenodePro

Permission denied: user=root, access=WRITE, inode="/user/root":hdfs:hdfs:drwxr-xr-x

2016-10-05 Thread Raj hadoop
Hi All,

Could someone help me solve this issue?

Logging initialized using configuration in
file:/etc/hive/2.4.2.0-258/0/hive-log4j.properties
Exception in thread "main" java.lang.RuntimeException:
org.apache.hadoop.security.AccessControlException: Permission denied:
user=root, access=WRITE, inode="/user/root":hdfs:hdfs:drwxr-xr-x
at
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
at
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:292)
at
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:213)
at
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
at
org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1780)
at
org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1764)
at
org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1747)
at
org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:71)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3972)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1081)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:630)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2206)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2202)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2200)

at
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:516)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:680)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:624)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.hadoop.security.AccessControlException: Permission
denied: user=root, access=WRITE, inode="/user/root":hdfs:hdfs:drwxr-xr-x
at
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
at
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:292)
at
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:213)
at
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
at
org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1780)
at
org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1764)
at
org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1747)
at
org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:71)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3972)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1081)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:630)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2206)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2202)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInfo

Re: hive concurrency not working

2016-08-04 Thread Raj hadoop
Thanks, everyone.

We are raising a case with Hortonworks.

On Wed, Aug 3, 2016 at 6:44 PM, Raj hadoop  wrote:

> Dear All,
>
> In need of your help,
>
> We have a Hortonworks 4-node cluster, and the problem is that Hive allows
> only one user at a time.
>
> If a second user tries to log in, Hive does not work.
>
> could someone please help me in this
>
> Thanks,
> Rajesh
>


hive concurrency not working

2016-08-03 Thread Raj hadoop
Dear All,

In need of your help,

We have a Hortonworks 4-node cluster, and the problem is that Hive allows
only one user at a time.

If a second user tries to log in, Hive does not work.

could someone please help me in this

Thanks,
Rajesh
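
One commonly cited cause of this single-user symptom is a Hive metastore backed by
the embedded Derby database, which accepts only one connection at a time. A hedged
hive-site.xml sketch pointing the metastore at a shared MySQL database instead;
host, database name, and credentials are placeholders:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://metastore-host/hivemeta?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepassword</value>
</property>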


Re: Unable to start Hive CLI after install

2016-04-04 Thread Raj Hadoop
Hi Mich - I did all those steps. Somehow I am not able to find out what the
issue is. Can you suggest any debugging tips?

Regards,
Rajendra

 

On Monday, April 4, 2016 12:16 PM, Mich Talebzadeh 
 wrote:
 

 Hi Raj,
Hive 2 is good to go :) Check this.
I see that you are using Oracle DB as your metastore. Mine is Oracle as well
  
    javax.jdo.option.ConnectionURL
    jdbc:oracle:thin:@rhes564:1521:mydb
    JDBC connect string for a JDBC metastore
  
Also need username/password for your metastore
  
    javax.jdo.option.ConnectionUserName
    hiveuser
    Username to use against metastore database
  
  
    javax.jdo.option.ConnectionPassword
    xxx
    password to use against metastore database
  

Now you also need to put the Oracle JDBC jar, ojdbc6.jar, in $HIVE_HOME/lib, otherwise
you won't be able to connect.
HTH

Dr Mich Talebzadeh LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
 http://talebzadehmich.wordpress.com 
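
For reference, a hedged version of the metastore-related properties from the
hive-site.xml quoted below, with two likely fixes: the Oracle thin-driver class is
oracle.jdbc.OracleDriver rather than com.oracle.jdbc.Driver, and hive.metastore.local
is no longer a recognized HiveConf property (hence the warning), so it can simply be
dropped. Host, SID, and credentials are the ones from the question:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:oracle:thin:@//z4:1521/xe</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <!-- assumption: the standard Oracle thin JDBC driver class -->
  <value>oracle.jdbc.OracleDriver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
</property>

Separately, $HIVE_HOME is most likely not expanded inside hive-site.xml, so the
$HIVE_HOME/iotmp paths further down would need to be literal directories.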
On 4 April 2016 at 20:02, Raj Hadoop  wrote:

Sorry for the typo in your name - Mich.
 

On Monday, April 4, 2016 12:01 PM, Raj Hadoop  wrote:
 

 Thanks Mike. If Hive 2.0 is stable, I would definitely go for it. But let me
troubleshoot the 1.1.1 issues I am facing now.
Here is my hive-site.xml. Can you please let me know if I am missing anything?


hive.exec.scratchdir
/tmp/hive




hive.metastore.local
false




hive.metastore.warehouse.dir
hdfs://z1:8899/user/hive/warehouse



javax.jdo.option.ConnectionURL
jdbc:oracle:thin:@//z4:1521/xe

 

javax.jdo.option.ConnectionDriverName
com.oracle.jdbc.Driver

 

javax.jdo.option.ConnectionUserName
hive

 

javax.jdo.option.ConnectionPassword
hive

 


    hive.querylog.location
    $HIVE_HOME/iotmp
    Location of Hive run time structured log file
  

  
    hive.exec.local.scratchdir
    $HIVE_HOME/iotmp
    Local scratch space for Hive jobs
  

  
    hive.downloaded.resources.dir
    $HIVE_HOME/iotmp
    Temporary local directory for added resources in the remote 
file system.
  



  

On Monday, April 4, 2016 11:46 AM, Mich Talebzadeh 
 wrote:
 

 Interesting - why did you not download Hive 2.0, which is out now?
The error says:
 HiveConf of name hive.metastore.local does not exist
In your hive-site.xml, how have you configured the hive.metastore parameters?
HTH

Dr Mich Talebzadeh LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
 http://talebzadehmich.wordpress.com 
On 4 April 2016 at 18:25, Raj Hadoop  wrote:

Hi,
I have downloaded Apache Hive 1.1.1 and am trying to set up a Hive environment in my
Hadoop cluster.
I installed Hive on one of the nodes, and when I set all the variables and
environment I get the following error. Please advise.

[hadoop@z1 bin]$ hive
2016-04-04 10:12:45,686 WARN  [main] conf.HiveConf 
(HiveConf.java:initialize(2605)) - HiveConf of name hive.metastore.local does 
not exist

Logging initialized using configuration in 
jar:file:/home/hadoop/hive/hive111/lib/hive-common-1.1.1.jar!/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/home/hadoop/hadoop262/hadoop262/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/home/hadoop/hive/hive111/lib/hive-jdbc-1.1.1-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.RuntimeException: 
java.lang.RuntimeException: Unable to instantiate 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at 
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:472)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.RuntimeException: Unable to instantiate 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1485)
    at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:64)
    at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:74)
    at 
org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2841)
    at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2860)
    at 
org.apache.hadoop.hive.ql.session.Ses

Re: Unable to start Hive CLI after install

2016-04-04 Thread Raj Hadoop
Thanks Mike. If Hive 2.0 is stable, I would definitely go for it. But let me
troubleshoot the 1.1.1 issues I am facing now.
Here is my hive-site.xml. Can you please let me know if I am missing anything?


hive.exec.scratchdir
/tmp/hive




hive.metastore.local
false




hive.metastore.warehouse.dir
hdfs://z1:8899/user/hive/warehouse



javax.jdo.option.ConnectionURL
jdbc:oracle:thin:@//z4:1521/xe

 

javax.jdo.option.ConnectionDriverName
com.oracle.jdbc.Driver

 

javax.jdo.option.ConnectionUserName
hive

 

javax.jdo.option.ConnectionPassword
hive

 


    hive.querylog.location
    $HIVE_HOME/iotmp
    Location of Hive run time structured log file
  

  
    hive.exec.local.scratchdir
    $HIVE_HOME/iotmp
    Local scratch space for Hive jobs
  

  
    hive.downloaded.resources.dir
    $HIVE_HOME/iotmp
    Temporary local directory for added resources in the remote 
file system.
  



  

On Monday, April 4, 2016 11:46 AM, Mich Talebzadeh 
 wrote:
 

 Interesting - why did you not download Hive 2.0, which is out now?
The error says:
 HiveConf of name hive.metastore.local does not exist
In your hive-site.xml, how have you configured the hive.metastore parameters?
HTH

Dr Mich Talebzadeh LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
 http://talebzadehmich.wordpress.com 
On 4 April 2016 at 18:25, Raj Hadoop  wrote:

Hi,
I have downloaded Apache Hive 1.1.1 and am trying to set up a Hive environment in my
Hadoop cluster.
I installed Hive on one of the nodes, and when I set all the variables and
environment I get the following error. Please advise.

[hadoop@z1 bin]$ hive
2016-04-04 10:12:45,686 WARN  [main] conf.HiveConf 
(HiveConf.java:initialize(2605)) - HiveConf of name hive.metastore.local does 
not exist

Logging initialized using configuration in 
jar:file:/home/hadoop/hive/hive111/lib/hive-common-1.1.1.jar!/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/home/hadoop/hadoop262/hadoop262/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/home/hadoop/hive/hive111/lib/hive-jdbc-1.1.1-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.RuntimeException: 
java.lang.RuntimeException: Unable to instantiate 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at 
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:472)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.RuntimeException: Unable to instantiate 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1485)
    at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:64)
    at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:74)
    at 
org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2841)
    at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2860)
    at 
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:453)
    ... 8 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1483)
    ... 13 more
Caused by: javax.jdo.JDOFatalInternalException: Error creating transactional 
connection factory
NestedThrowables:
java.lang.reflect.InvocationTargetException
    at 
org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:587)
    at 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:78


Regards,Raj





  

Re: Unable to start Hive CLI after install

2016-04-04 Thread Raj Hadoop
Sorry for the typo in your name - Mich.
 

On Monday, April 4, 2016 12:01 PM, Raj Hadoop  wrote:
 

 Thanks Mike. If Hive 2.0 is stable, I would definitely go for it. But let me
troubleshoot the 1.1.1 issues I am facing now.
Here is my hive-site.xml. Can you please let me know if I am missing anything?


hive.exec.scratchdir
/tmp/hive




hive.metastore.local
false




hive.metastore.warehouse.dir
hdfs://z1:8899/user/hive/warehouse



javax.jdo.option.ConnectionURL
jdbc:oracle:thin:@//z4:1521/xe

 

javax.jdo.option.ConnectionDriverName
com.oracle.jdbc.Driver

 

javax.jdo.option.ConnectionUserName
hive

 

javax.jdo.option.ConnectionPassword
hive

 


    hive.querylog.location
    $HIVE_HOME/iotmp
    Location of Hive run time structured log file
  

  
    hive.exec.local.scratchdir
    $HIVE_HOME/iotmp
    Local scratch space for Hive jobs
  

  
    hive.downloaded.resources.dir
    $HIVE_HOME/iotmp
    Temporary local directory for added resources in the remote 
file system.
  



  

On Monday, April 4, 2016 11:46 AM, Mich Talebzadeh 
 wrote:
 

 Interesting - why did you not download Hive 2.0, which is out now?
The error says:
 HiveConf of name hive.metastore.local does not exist
In your hive-site.xml, how have you configured the hive.metastore parameters?
HTH

Dr Mich Talebzadeh LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
 http://talebzadehmich.wordpress.com 
On 4 April 2016 at 18:25, Raj Hadoop  wrote:

Hi,
I have downloaded Apache Hive 1.1.1 and am trying to set up a Hive environment in my
Hadoop cluster.
I installed Hive on one of the nodes, and when I set all the variables and
environment I get the following error. Please advise.

[hadoop@z1 bin]$ hive
2016-04-04 10:12:45,686 WARN  [main] conf.HiveConf 
(HiveConf.java:initialize(2605)) - HiveConf of name hive.metastore.local does 
not exist

Logging initialized using configuration in 
jar:file:/home/hadoop/hive/hive111/lib/hive-common-1.1.1.jar!/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/home/hadoop/hadoop262/hadoop262/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/home/hadoop/hive/hive111/lib/hive-jdbc-1.1.1-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.RuntimeException: 
java.lang.RuntimeException: Unable to instantiate 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at 
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:472)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.RuntimeException: Unable to instantiate 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1485)
    at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:64)
    at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:74)
    at 
org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2841)
    at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2860)
    at 
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:453)
    ... 8 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1483)
    ... 13 more
Caused by: javax.jdo.JDOFatalInternalException: Error creating transactional 
connection factory
NestedThrowables:
java.lang.reflect.InvocationTargetException
    at 
org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:587)
    at 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:78


Regards,Raj





   

  

Unable to start Hive CLI after install

2016-04-04 Thread Raj Hadoop
Hi,
I have downloaded Apache Hive 1.1.1 and am trying to set up a Hive environment in my
Hadoop cluster.
I installed Hive on one of the nodes, and when I set all the variables and
environment I get the following error. Please advise.

[hadoop@z1 bin]$ hive
2016-04-04 10:12:45,686 WARN  [main] conf.HiveConf 
(HiveConf.java:initialize(2605)) - HiveConf of name hive.metastore.local does 
not exist

Logging initialized using configuration in 
jar:file:/home/hadoop/hive/hive111/lib/hive-common-1.1.1.jar!/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/home/hadoop/hadoop262/hadoop262/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/home/hadoop/hive/hive111/lib/hive-jdbc-1.1.1-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.RuntimeException: 
java.lang.RuntimeException: Unable to instantiate 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at 
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:472)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.RuntimeException: Unable to instantiate 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1485)
    at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:64)
    at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:74)
    at 
org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2841)
    at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2860)
    at 
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:453)
    ... 8 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1483)
    ... 13 more
Caused by: javax.jdo.JDOFatalInternalException: Error creating transactional 
connection factory
NestedThrowables:
java.lang.reflect.InvocationTargetException
    at 
org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:587)
    at 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:78


Regards,Raj



HCatStorer error

2015-11-24 Thread Raj hadoop
We are facing the below-mentioned error when storing a dataset using HCatStorer. Can
someone please help us?



STORE F INTO 'default.CONTENT_SVC_USED' using
org.apache.hive.hcatalog.pig.HCatStorer();



ERROR hive.log - Got exception: java.net.URISyntaxException Malformed
escape pair at index 9: thrift://%HOSTGROUP::host_group_master1%:9933

java.net.URISyntaxException: Malformed escape pair at index 9:
thrift://%HOSTGROUP::host_group_master1%:9933


Thanks,

Raj
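
The thrift URI in that exception is an unresolved Ambari blueprint placeholder
(%HOSTGROUP::...%) rather than a real hostname, which is what trips the URI parser.
A hedged sketch of the usual fix - set hive.metastore.uris to the actual metastore
host in the hive-site.xml the Pig/HCatalog job reads (hostname and port are placeholders):

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://metastore-host.example.com:9083</value>
</property>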


select * from table and select column from table in hive

2014-10-20 Thread Raj Hadoop
I am able to see the data in the table for all the columns when I issue the 
following - 

SELECT * FROM t1 WHERE dt1='2013-11-20' 


But I am unable to see the column data when I issue the following - 

SELECT cust_num FROM t1 WHERE dt1='2013-11-20' 

The above shows null values. 

How should I debug this ?
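
A hedged debugging sketch: per-column NULLs in an otherwise readable table usually
point at a mismatch between the declared schema/SerDe and what is actually in the
files, so compare the two. The PARTITION form assumes dt1 is a partition column:

DESCRIBE FORMATTED t1;                               -- columns, SerDe, field delimiter, location
DESCRIBE FORMATTED t1 PARTITION (dt1='2013-11-20');  -- per-partition schema, if dt1 is a partition
-- then eyeball a few raw rows under the location reported above, e.g.:
-- hadoop fs -cat /path/from/describe/dt1=2013-11-20/* | head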


Re: Remove duplicate records in Hive

2014-09-11 Thread Raj Hadoop
Thank you all for your suggestions. This group is the best.

I am working with the different options you guys suggested.

One big question I have is -

I am good at writing Oracle SQL queries. But the syntax with Hive is different. 
Especially, writing multiple SELECT statements in a single Hive query has 
become a challenge. Can the group suggest any good tutorial that explains the 
basics of "Syntax to develop complex queries in Hive".

Regards,
Rajendra





On Thursday, September 11, 2014 2:48 AM, vivek thakre  
wrote:
 


Considering that the records only differ by one column, i.e. if the first two 
columns are unique (distinct), then you can simply use GROUP BY with MAX as the 
aggregation function to eliminate duplicates, e.g.:

select cno, sqno, max (date) 
from table 
group by cno, sqno

If the above assumption is not true i.e if cno and sqno are not unique and for 
a particular cno, you want to get sqno with latest date, then you can do inner 
join with max select query something like

select a.cno, a.sqno, a.date
from table a 
join (select cno, max(date)  as max_date from table group by cno) b
on a.cno=b.cno
and a.date = b.max_date



On Wed, Sep 10, 2014 at 3:39 PM, Nishant Kelkar  wrote:

Try something like this then:
>
>
>SELECT A.cno, A.sqno, A.sorted_dates[A.size-1] AS latest_date
>FROM 
>(
>SELECT cno, sqno,
>SORT_ARRAY(COLLECT_SET(date)) AS sorted_dates, SIZE(COLLECT_SET(date)) AS size 
>FROM table GROUP BY cno, sqno
>) A;
>
>
>
>There are better ways of doing this, but this one's quick and dirty :)
>
>
>Best Regards,
>Nishant Kelkar
>
>
>On Wed, Sep 10, 2014 at 12:48 PM, Raj Hadoop  wrote:
>
>sort_array returns in ascending order. so the first element cannot be the 
>largest date. the last element is the largest date.
>>
>>
>>
>>
>>On Wednesday, September 10, 2014 3:38 PM, Nishant Kelkar 
>> wrote:
>> 
>>
>>
>>Hi Raj,
>>
>>
>>You'll have to change the format of your date to something like YYYY-MM-DD. 
>>For example, for "2-oct-2013" it will be 2013-10-02.
>>
>>
>>Best Regards,
>>Nishant Kelkar
>>
>>
>>
>>
>>
>>On Wed, Sep 10, 2014 at 11:48 AM, Raj Hadoop  wrote:
>>
>>The
>>>
>>>SORT_ARRAY(COLLECT_SET(date))[0] AS latest_date
>>>
>>>is returning the lowest date. I need the largest date.
>>>
>>>
>>>
>>>
>>>On Wed, 9/10/14, Raj Hadoop  wrote:
>>>
>>> Subject: Re: Remove duplicate records in Hive
>>> To: user@hive.apache.org
>>> Date: Wednesday, September 10, 2014, 2:41 PM
>>>
>>>
>>> Thanks. I will try it.
>>> 
>>> On Wed, 9/10/14, Nishant Kelkar 
>>> wrote:
>>>
>>>  Subject: Re: Remove
>>> duplicate records in Hive
>>>  To: user@hive.apache.org,
>>> hadoop...@yahoo.com
>>>  Date: Wednesday, September 10, 2014, 1:59
>>> PM
>>>
>>>  Hi
>>>
>>> Raj, 
>>>  You can do something
>>>  along these lines: 
>>>
>>>  SELECT
>>>  cno, sqno,
>>> SORT_ARRAY(COLLECT_SET(date))[0] AS latest_date
>>>  FROM table GROUP BY cno, sqno;
>>>  However, you have to make sure your
>>>  date format is such that sorting it gives you
>>> the most
>>>  recent date. The best way to do
>>> that is to have it in
>>>  format:
>>> YYYY-MM-DD.
>>>  Hope this helps.
>>>  Best Regards,Nishant
>>>
>>> Kelkar
>>>  On Wed, Sep 10, 2014 at
>>>  10:04 AM, Raj Hadoop 
>>>  wrote:
>>>
>>>
>>>  Hi,
>>>
>>>
>>>
>>>  I have a requirement in Hive
>>> to remove duplicate records (
>>>  they differ
>>> only by one column i.e a date column) and keep
>>>  the latest date record.
>>>
>>>
>>>
>>>  Sample
>>> :
>>>
>>>  Hive Table :
>>>
>>>   d2 is a higher
>>>
>>>  cno,sqno,date
>>>
>>>
>>>
>>>  100 1 1-oct-2013
>>>
>>>  101 2 1-oct-2013
>>>
>>>  100 1 2-oct-2013
>>>
>>>  102 2 2-oct-2013
>>>
>>>
>>>
>>>
>>>
>>>  Output needed:
>>>
>>>
>>>
>>>  100 1 2-oct-2013
>>>
>>>  101 2 1-oct-2013
>>>
>>>  102 2 2-oct-2013
>>>
>>>
>>>
>>>  I am using
>>> Hive 0.11
>>>
>>>
>>>
>>>  Any suggestions please ?
>>>
>>>
>>>
>>>  Regards,
>>>
>>>
>>> Raj
>>>
>>>
>>>
>>>
>>
>>
>>
>

Re: Remove duplicate records in Hive

2014-09-10 Thread Raj Hadoop
sort_array returns in ascending order, so the first element cannot be the 
largest date; the last element is the largest date.



On Wednesday, September 10, 2014 3:38 PM, Nishant Kelkar 
 wrote:
 


Hi Raj,

You'll have to change the format of your date to something like YYYY-MM-DD. For 
example, for "2-oct-2013" it will be 2013-10-02.

Best Regards,
Nishant Kelkar





On Wed, Sep 10, 2014 at 11:48 AM, Raj Hadoop  wrote:

The
>
>SORT_ARRAY(COLLECT_SET(date))[0] AS latest_date
>
>is returning the lowest date. I need the largest date.
>
>
>
>--------
>On Wed, 9/10/14, Raj Hadoop  wrote:
>
> Subject: Re: Remove duplicate records in Hive
> To: user@hive.apache.org
> Date: Wednesday, September 10, 2014, 2:41 PM
>
>
> Thanks. I will try it.
> 
> On Wed, 9/10/14, Nishant Kelkar 
> wrote:
>
>  Subject: Re: Remove
> duplicate records in Hive
>  To: user@hive.apache.org,
> hadoop...@yahoo.com
>  Date: Wednesday, September 10, 2014, 1:59
> PM
>
>  Hi
>
> Raj, 
>  You can do something
>  along these lines: 
>
>  SELECT
>  cno, sqno,
> SORT_ARRAY(COLLECT_SET(date))[0] AS latest_date
>  FROM table GROUP BY cno, sqno;
>  However, you have to make sure your
>  date format is such that sorting it gives you
> the most
>  recent date. The best way to do
> that is to have it in
>  format:
> YYYY-MM-DD.
>  Hope this helps.
>  Best Regards,Nishant
>
> Kelkar
>  On Wed, Sep 10, 2014 at
>  10:04 AM, Raj Hadoop 
>  wrote:
>
>
>  Hi,
>
>
>
>  I have a requirement in Hive
> to remove duplicate records (
>  they differ
> only by one column i.e a date column) and keep
>  the latest date record.
>
>
>
>  Sample
> :
>
>  Hive Table :
>
>   d2 is a higher
>
>  cno,sqno,date
>
>
>
>  100 1 1-oct-2013
>
>  101 2 1-oct-2013
>
>  100 1 2-oct-2013
>
>  102 2 2-oct-2013
>
>
>
>
>
>  Output needed:
>
>
>
>  100 1 2-oct-2013
>
>  101 2 1-oct-2013
>
>  102 2 2-oct-2013
>
>
>
>  I am using
> Hive 0.11
>
>
>
>  Any suggestions please ?
>
>
>
>  Regards,
>
>
> Raj
>
>
>
>

Re: Remove duplicate records in Hive

2014-09-10 Thread Raj Hadoop
The

SORT_ARRAY(COLLECT_SET(date))[0] AS latest_date

is returning the lowest date. I need the largest date.




On Wed, 9/10/14, Raj Hadoop  wrote:

 Subject: Re: Remove duplicate records in Hive
 To: user@hive.apache.org
 Date: Wednesday, September 10, 2014, 2:41 PM
 
 Thanks. I will try it.
 
 On Wed, 9/10/14, Nishant Kelkar 
 wrote:
 
  Subject: Re: Remove
 duplicate records in Hive
  To: user@hive.apache.org,
 hadoop...@yahoo.com
  Date: Wednesday, September 10, 2014, 1:59
 PM
  
  Hi
 
 Raj, 
  You can do something
  along these lines: 
  
  SELECT
  cno, sqno,
 SORT_ARRAY(COLLECT_SET(date))[0] AS latest_date
  FROM table GROUP BY cno, sqno;
  However, you have to make sure your
  date format is such that sorting it gives you
 the most
  recent date. The best way to do
 that is to have it in
  format:
 YYYY-MM-DD.
  Hope this helps.
  Best Regards,Nishant
 
 Kelkar
  On Wed, Sep 10, 2014 at
  10:04 AM, Raj Hadoop 
  wrote:
  
  
  Hi,
  
  
  
  I have a requirement in Hive
 to remove duplicate records (
  they differ
 only by one column i.e a date column) and keep
  the latest date record.
  
  
  
  Sample
 :
  
  Hive Table :
  
   d2 is a higher
  
  cno,sqno,date
  
  
  
  100 1 1-oct-2013
  
  101 2 1-oct-2013
  
  100 1 2-oct-2013
  
  102 2 2-oct-2013
  
  
  
  
  
  Output needed:
  
  
  
  100 1 2-oct-2013
  
  101 2 1-oct-2013
  
  102 2 2-oct-2013
  
  
  
  I am using
 Hive 0.11
  
  
  
  Any suggestions please ?
  
  
  
  Regards,
  
 
 Raj
  
  



Re: Remove duplicate records in Hive

2014-09-10 Thread Raj Hadoop
Thanks. I will try it.

On Wed, 9/10/14, Nishant Kelkar  wrote:

 Subject: Re: Remove duplicate records in Hive
 To: user@hive.apache.org, hadoop...@yahoo.com
 Date: Wednesday, September 10, 2014, 1:59 PM
 
 Hi
 Raj, 
 You can do something
 along these lines: 
 
 SELECT
 cno, sqno, SORT_ARRAY(COLLECT_SET(date))[0] AS latest_date
 FROM table GROUP BY cno, sqno;
 However, you have to make sure your
 date format is such that sorting it gives you the most
 recent date. The best way to do that is to have it in
 format: YYYY-MM-DD.
 Hope this helps.
 Best Regards,Nishant
 Kelkar
 On Wed, Sep 10, 2014 at
 10:04 AM, Raj Hadoop 
 wrote:
 
 
 Hi,
 
 
 
 I have a requirement in Hive to remove duplicate records (
 they differ only by one column i.e a date column) and keep
 the latest date record.
 
 
 
 Sample :
 
 Hive Table :
 
  d2 is a higher
 
 cno,sqno,date
 
 
 
 100 1 1-oct-2013
 
 101 2 1-oct-2013
 
 100 1 2-oct-2013
 
 102 2 2-oct-2013
 
 
 
 
 
 Output needed:
 
 
 
 100 1 2-oct-2013
 
 101 2 1-oct-2013
 
 102 2 2-oct-2013
 
 
 
 I am using Hive 0.11
 
 
 
 Any suggestions please ?
 
 
 
 Regards,
 
 Raj
 
 



Remove duplicate records in Hive

2014-09-10 Thread Raj Hadoop

Hi,

I have a requirement in Hive to remove duplicate records ( they differ only by 
one column i.e a date column) and keep the latest date record.

Sample :
Hive Table :
 d2 is a higher 
cno,sqno,date

100 1 1-oct-2013
101 2 1-oct-2013
100 1 2-oct-2013
102 2 2-oct-2013


Output needed:

100 1 2-oct-2013
101 2 1-oct-2013
102 2 2-oct-2013

I am using Hive 0.11

Any suggestions please ?

Regards,
Raj
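
Hive 0.11 also introduced windowing functions, so a hedged alternative sketch that
keeps only the latest row per (cno, sqno). The table name t and column name dt are
placeholders for the real table and its date column; the inner query parses the
d-MMM-yyyy strings with unix_timestamp() so the window can order by a plain column:

SELECT cno, sqno, dt
FROM (
  SELECT cno, sqno, dt,
         ROW_NUMBER() OVER (PARTITION BY cno, sqno ORDER BY dt_ts DESC) AS rn
  FROM (
    SELECT cno, sqno, dt, unix_timestamp(dt, 'd-MMM-yyyy') AS dt_ts
    FROM t
  ) parsed
) ranked
WHERE rn = 1;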


Can I update just one row in Hive table using Hive INSERT OVERWRITE

2014-04-04 Thread Raj Hadoop

Can I update (delete and insert, kind of) just one row, keeping the remaining 
rows intact, in a Hive table using Hive INSERT OVERWRITE? There is no partition in 
the Hive table.



INSERT OVERWRITE TABLE tablename SELECT col1,col2,col3 from tabx where 
col2='abc';

Does the above work ? Please advise.
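
Without partitions or ACID tables, INSERT OVERWRITE always rewrites the whole table,
so the statement above on its own would leave only the selected rows and drop
everything else. A hedged sketch of the usual pattern - overwrite the table with
every row you want to keep plus the replacement row(s), reusing the column names
from the question:

INSERT OVERWRITE TABLE tablename
SELECT * FROM (
  SELECT col1, col2, col3 FROM tablename WHERE col2 <> 'abc'   -- rows to keep as-is
  UNION ALL
  SELECT col1, col2, col3 FROM tabx WHERE col2 = 'abc'         -- replacement row(s)
) merged;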

Re: HiveThrift Service Issue

2014-03-20 Thread Raj Hadoop
Hi Szehon,

It is not showing on the http://xyzserver:50030/jobtracker.jsp.


I checked this log, and it shows:


/tmp/root/hive.log
 
 
 exec.ExecDriver (ExecDriver.java:addInputPaths(853)) - Processing
alias table_emp
exec.ExecDriver (ExecDriver.java:addInputPaths(871)) - Adding input
file hdfs://xyzserver:8020/user/hive/warehouse/table_emp 
2014-03-20 11:57:26,352
INFO  exec.ExecDriver (ExecDriver.java:createTmpDirs(221)) - Making Temp
Directory: 

hdfs://xyzserver:8020:8020/tmp/hive-root/hi
ve_2014-03-20_11-57-25_822_1668300320164798948-3/-ext-10001
2014-03-20 11:57:26,377 ERROR
security.UserGroupInformation (UserGroupInformation.java:doAs(1411)) -
PriviledgedActionException as:root (auth:SIMPLE) 

cause:org.apache.hadoop.sec
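
The log excerpt cuts off right at the cause (org.apache.hadoop.sec...), which hints
at a security/permission exception. A hedged way to surface the full error is to run
the same query from the Hive CLI, as the same user the Thrift service runs as, with
logging sent to the console:

hive --hiveconf hive.root.logger=INFO,console -e "select count(*) from table_emp;"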





On Thursday, March 20, 2014 3:53 PM, Szehon Ho  wrote:
 
Hi Raj,

There are map-reduce job logs generated if the MapRedTask fails, those might 
give some clue.

Thanks,
Szehon




On Thu, Mar 20, 2014 at 12:29 PM, Raj Hadoop  wrote:

I am struggling with this one. Can anyone throw some pointers on how to 
troubleshoot this issue, please?
>
>
>
>
>On Thursday, March 20, 2014 3:09 PM, Raj Hadoop  wrote:
> 
>Hello everyone,
>
>
>The HiveThrift Service was started successfully.
>
>
>
>netstat -nl | grep 1
>
>
>
>
>tcp    0  0 0.0.0.0:1   0.0.0.0:*   
>LISTEN 
>
>
>
>
>
>I am able to read tables from Hive through Tableau. When executing queries 
>through Tableau I am getting the following error -
>
>
>Query returned non-zero code: 1, cause: FAILED: Execution Error, return code 1 
>from org.apache.hadoop.hive.ql.exec.MapRedTask
>
>
>Can any one suggest what the problem is ?
>
>
>Regards,
>Raj
>
>
>

Re: HiveThrift Service Issue

2014-03-20 Thread Raj Hadoop
I am struggling with this one. Can anyone throw some pointers on how to 
troubleshoot this issue, please?




On Thursday, March 20, 2014 3:09 PM, Raj Hadoop  wrote:
 
Hello everyone,

The HiveThrift Service was started successfully.


netstat -nl | grep 1


tcp    0  0 0.0.0.0:1   0.0.0.0:*   
LISTEN 



I am able to read tables from Hive through Tableau. When executing queries 
through Tableau I am getting the following error -

Query returned non-zero code: 1, cause: FAILED: Execution Error, return code 1 
from org.apache.hadoop.hive.ql.exec.MapRedTask

Can any one suggest what the problem is ?

Regards,
Raj

HiveThrift Service Issue

2014-03-20 Thread Raj Hadoop
Hello everyone,

The HiveThrift Service was started successfully.


netstat -nl | grep 1


tcp    0  0 0.0.0.0:1   0.0.0.0:*   
LISTEN 



I am able to read tables from Hive through Tableau. When executing queries 
through Tableau I am getting the following error -

Query returned non-zero code: 1, cause: FAILED: Execution Error, return code 1 
from org.apache.hadoop.hive.ql.exec.MapRedTask

Can any one suggest what the problem is ?

Regards,
Raj


Re: Hive append

2014-03-06 Thread Raj hadoop
Hi Nitin,

existing records should remain same and the new records should get inserted
into the table


On Thu, Mar 6, 2014 at 2:11 PM, Nitin Pawar  wrote:

> are you talking about adding new records to tables or updating records in
> already existing table?
>
>
> On Thu, Mar 6, 2014 at 1:59 PM, Raj hadoop  wrote:
>
>> Query in HIVE
>>
>>
>>
>> I tried merge kind of operation in Hive to retain the existing records
>> and append the new records instead of dropping the table and populating it
>> again.
>>
>>
>>
>> If anyone can come help with any other approach other than this or the
>> approach to perform merge operation
>>
>>
>>
>> will be great help
>>
>
>
>
> --
> Nitin Pawar
>


Hive append

2014-03-06 Thread Raj hadoop
Query in HIVE



I tried a merge-like operation in Hive to retain the existing records and
append the new records, instead of dropping the table and repopulating it
again.

If anyone can help with any other approach, or with a way to perform the
merge operation itself, it will be a great help.
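
For a plain append - keep the existing rows, add the new ones - Hive's INSERT INTO
(as opposed to INSERT OVERWRITE) adds to the existing data. A hedged sketch, with
final_table and staging_table as placeholder names:

-- land the new records in a staging table (or any readable location), then:
INSERT INTO TABLE final_table
SELECT * FROM staging_table;

-- or, for an EXTERNAL table, simply add the new data file to its HDFS directory:
-- hadoop fs -put new_records.tsv /path/to/table/location/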


Re: Merge records in hive

2014-03-05 Thread Raj hadoop
Thanks a lot Sanjay,

Any more thoughts on this?

I am OK with any alternative to the merge concept in Hive.


On Thu, Mar 6, 2014 at 2:02 AM, Subramanian, Sanjay (HQP) <
sanjay.subraman...@roberthalf.com> wrote:

>  Hey Raj
>
>  Maybe I am misunderstanding the question but u don't really have to do
> anything fancy to merge
>
>  ONE TIME
> 
> CREATE EXTERNAL TABLE employee (
>  empnoBIGINT,
>  ename STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ;
>
>  ALTER TABLE employee SET LOCATION
> 'hdfs://path/to/dir/on/hdfs/containing/files'
>
>  Or if u r using AMAZON EMR :
> ALTER TABLE employee SET LOCATION
> 's3://bucketname/path/to/subfolder/containing/files'
>
>  Now if u keep putting files into this HDFS dir
>  'hdfs://path/to/dir/on/hdfs/containing/files'
> U should not have to do anything
>
>  Thanks
>
> Warm Regards
>
>
>  Sanjay
>
> linkedin:http://www.linkedin.com/in/subramaniansanjay
>
>   From: Raj hadoop 
> Reply-To: "user@hive.apache.org" 
> Date: Wednesday, March 5, 2014 at 4:16 AM
> To: "user@hive.apache.org" 
> Subject: Merge records in hive
>
>   Hi,
>
>
>
> Help required to merge data in hive,
>
>
>
> Ex:
>
> Today file
>
> -
>
> Empno  ename
>
> 1  abc
>
> 2  def
>
> 3  ghi
>
>
>
> Tomorrow file
>
> -
>
> Empno  ename
>
> 5  abcd
>
> 6  defg
>
> 7  ghij
>
>
>
>
>
> Reg: should not drop the hive table and then create it,what I actually
> require is as shown in the example we have to merge the data,
>
>
>
> Thanks,
>
> Raj
>


Merge records in hive

2014-03-05 Thread Raj hadoop
Hi,



Help required to merge data in hive,



Ex:

Today file

-

Empno  ename

1  abc

2  def

3  ghi



Tomorrow file

-

Empno  ename

5  abcd

6  defg

7  ghij





Note: we should not drop the Hive table and then recreate it; what I actually
require, as shown in the example, is to merge the data.



Thanks,

Raj


Re: Sqoop import to HDFS and then Hive table - Issue with data type

2014-03-04 Thread Raj Hadoop
I am surprised. So you mean to say I should keep on running the same command 
multiple times?

Alter table

It was a timestamp, and then I later changed it to string.

Please advise.





On Tuesday, March 4, 2014 5:30 PM, Edward Capriolo  
wrote:
 
No - Hive metadata does not change how the data is stored. Just keep changing it 
until you get it right :)




On Tue, Mar 4, 2014 at 5:23 PM, Raj Hadoop  wrote:

All,
>
>
>I loaded data from an Oracle query through Sqoop to HDFS file. This is bzip 
>compressed files partitioned by one column date.
>
>
>I created a Hive table to point to the above location.
>
>
>After loading lot of data , I realized the data type of one of the column was 
>wrongly given.
>
>
>When I changed the data type of column using ALTER to the new type, it is 
>still showing NULL values.
>
>
>How should I resolve this?
>
>
>Do I need to recreate the table? If so, I have loaded lot of data and I should 
>not loose the data. This is an External table.
>
>
>Please advise.
>
>
>
>
>
>Regards,
>Raj

Sqoop import to HDFS and then Hive table - Issue with data type

2014-03-04 Thread Raj Hadoop
All,

I loaded data from an Oracle query through Sqoop into HDFS files. These are 
bzip-compressed files, partitioned by one date column.

I created a Hive table to point to the above location.

After loading a lot of data, I realized the data type of one of the columns was 
wrongly given.

When I changed the column's data type to the new type using ALTER, it is still 
showing NULL values.

How should I resolve this?

Do I need to recreate the table? If so, note that I have loaded a lot of data and I 
should not lose the data. This is an external table.

Please advise.



Regards,
Raj
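
Two hedged notes on the symptom above: on a partitioned table, ALTER TABLE ... CHANGE
COLUMN updates only the table-level schema and existing partitions can keep their old
column type, which matches the "still showing NULL" behaviour; and because the table
is EXTERNAL, dropping it removes only the metadata, not the HDFS files, so it can be
recreated with the corrected type over the same location. A sketch with placeholder
names, columns, and paths:

-- metadata only; the bzip files under LOCATION stay in place for an EXTERNAL table
DROP TABLE mytable;

CREATE EXTERNAL TABLE mytable (
  id     BIGINT,
  ts_col STRING                                  -- the column whose type was wrong, now corrected
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','    -- assumption: match whatever the original DDL used
LOCATION '/user/raj/mytable';

MSCK REPAIR TABLE mytable;                       -- re-register the existing dt=... partition directories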


Connection refused error - Getting repeatedly

2014-02-26 Thread Raj Hadoop
All, 

I have a 3-node Hadoop cluster (CDH 4.4), and every few days, whenever I load 
some data through Sqoop or query through Hive, I sometimes get the following 
error:


Call From <> to <> failed on connection exception: 
java.net.ConnectException: Connection refused

This has become very frequent. What can the reasons be, and how should I 
troubleshoot this? Could the hardware or the network be the most common 
problem/issue with this kind of error? Please advise.

Regards,
Raj
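
A hedged first-pass checklist for intermittent "Connection refused": confirm the
daemon named in the full error is actually up and listening on the expected
host/port, and that the client node can reach it. Host and port below are
placeholders taken from whatever the complete message reports:

jps                              # on the target node: are the NameNode / DataNode / JobTracker processes up?
netstat -tlnp | grep <port>      # is anything listening on the port from the error?
telnet <target-host> <port>      # is that port reachable from the client node?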


Re: part-m-00000 files and their size - Hive table

2014-02-25 Thread Raj Hadoop
Thanks for the detailed explanation Yong. It helps.

Regards,
Raj





On Tuesday, February 25, 2014 9:18 PM, java8964  wrote:
 
Yes, it is good that the file sizes are evenly close, but not very important, 
unless there are files very small (compared to the block size).

The reasons are:

Your files should be splitable to be used in Hadoop (Or in Hive, it is the same 
thing). If they are splitable, then 1G file will use 10 blocks (assume the 
block size is 128M), and 256M file will take 2 blocks. So these 2 files will 
generate 12 mapper tasks, and will be equally run in your cluster. From 
performance point of view, you have 12 mapper tasks, and they are equally 
processed in the cluster. So one 1G file plus one 256M file are not big deal. 
But if you have one file are very small, like 10M, that one file will also 
consume one mapper task, and that is kind of bad for performance, as hadoop 
starting one mapper task only consuming 10M data, which is bad, because 
starting/stop tasks is using quite some resource, but only processing 10M data.

The reason you see unevenly file size of the output of sqoop is that it is hard 
for sqoop to split your source data evenly. For example, if you dump table A 
from DB to hive, sqoop will do the following:

1) Identify the primary/unique keys of the table.
2) Find out the min/max value of the keys, let say they are (1 to 1,000,000)
3) Based on # of your mapper task, split them. If you run sqoop with 4 mappers, 
then the data will be split into 4 groups (1, 250,000) (250,001, 500,000) 
(500,001, 750,000) (750,001, 1,000,000). As you can image, your data most 
likely are not even distributed by the primary keys in that 4 groups, then you 
will get unevenly output as part-m-xxx files.

Keep in mind that it is not required to use primary keys or unique keys as the 
split column. So you can choose whatever column in your table makes sense. Pick 
whichever one makes the split more even.

Yong
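
A hedged sketch of steering that split from the Sqoop side: --split-by chooses the
column used for the range split and -m the number of mappers. Connection details,
table, and column names are placeholders:

sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott -P \
  --table A \
  --split-by evenly_distributed_col \
  -m 8 \
  --target-dir /user/raj/table_a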




Date: Tue, 25 Feb 2014 17:42:20 -0800
From: hadoop...@yahoo.com
Subject: part-m-00000 files and their size - Hive table
To: user@hive.apache.org


Hi,

I am loading data to HDFS files through sqoop and creating a Hive table to 
point to these files.

The mapper files generated through the Sqoop example look like the below.

part-m-00000

part-m-00001

part-m-00002

My question is -
1) For Hive query performance , how important or significant is the 
distribution of the file sizes above.

part_m_0 say 1 GB
part_m_1 say 3 GB
part_m_2 say 0.25 GB

Vs

part_m_0 say 1.4 GB
part_m_1 say 1.4 GB
part_m_2 say 1.45 GB


NOTE : The size and no of files is just for sample. The real numbers are far 
bigger.


I am assuming the uniform distribution has a performance benefit .

If so, what is the reason and can I know the technical details. 

part-m-00000 files and their size - Hive table

2014-02-25 Thread Raj Hadoop
Hi,

I am loading data to HDFS files through sqoop and creating a Hive table to 
point to these files.

The mapper files generated through the Sqoop example look like the below.

part-m-00000

part-m-00001

part-m-00002

My question is -
1) For Hive query performance , how important or significant is the 
distribution of the file sizes above.

part_m_0 say 1 GB
part_m_1 say 3 GB
part_m_2 say 0.25 GB

Vs

part_m_0 say 1.4 GB
part_m_1 say 1.4 GB
part_m_2 say 1.45 GB


NOTE : The size and no of files is just for sample. The real numbers are far 
bigger.


I am assuming the uniform distribution has a performance benefit .

If so, what is the reason and can I know the technical details. 


Re: Can a hive partition contain a string like 'tr_date=2014-01-01'

2014-02-25 Thread Raj Hadoop
Thanks. Will try it.





On Tuesday, February 25, 2014 8:23 PM, Kuldeep Dhole  
wrote:
 
Probably you should use tr_date='2014-01-01'
Considering tr_date partition is there

On Tuesday, February 25, 2014, Raj Hadoop  wrote:

I am trying to create a Hive partition like 'tr_date=2014-01-01'
>
>
>FAILED: ParseException line 1:58 mismatched input '-' expecting ) near '2014' 
>in add partition statement
>
>hive_ret_val:  64
>Errors while executing Hive for bksd table for 2014-01-01
>
>
>Are hyphen's not allowed in the partition directory. ?
>
>
>Please advise.
>
>
>Regards.
>Raj
>

Can a hive partition contain a string like 'tr_date=2014-01-01'

2014-02-25 Thread Raj Hadoop
I am trying to create a Hive partition like 'tr_date=2014-01-01'

FAILED: ParseException line 1:58 mismatched input '-' expecting ) near '2014' 
in add partition statement

hive_ret_val:  64
Errors while executing Hive for bksd table for 2014-01-01

Are hyphens not allowed in the partition directory?

Please advise.

Regards.
Raj


Partition column on an Alpha Numeric Column

2014-02-11 Thread Raj Hadoop
All,

One of the primary key columns in a Relational table has alpha numberic of 6 
characters - varchar(6).

The first three characters have this pattern -
1st one - 1 to 9
2nd one - 1 to 9 or a-z
3rd one - 1 to 9 or a-z

Is this a good idea for performing queries ( can be any queries based on other 
columns of the table )

Partition the data based on the first three characters, summing up to a total of 
10 * 36 * 36 which is 12,960 partitions.

12,960 partitions - Is it too much? Impossible or never heard of? Or can we 
consider this design?

I know NameNode should have a powerful RAM. But how much ? How do we determine 
the limitation of the number of files a Name Node can handle?

Thanks,
Raj

Add few record(s) to a Hive table or a HDFS file on a daily basis

2014-02-09 Thread Raj Hadoop



Hi,

My requirement is a typical Datawarehouse and ETL requirement. I need to 
accomplish

1) Daily Insert transaction records to a Hive table or a HDFS file. This table 
or file is not a big table ( approximately 10 records per day). I don't want to 
Partition the table / file.


I am reading a few articles on this. It was being mentioned that we need to 
load to a staging table in Hive, and then insert like the below:

insert overwrite table finaltable select * from staging;


I am not getting this logic. How should I populate the staging table daily.

Thanks,
Raj
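
A minimal sketch of that staging pattern, assuming a tab-delimited daily file and 
made-up table/column names:

-- one-time: a staging table matching the incoming file layout
CREATE TABLE txn_staging (c1 STRING, c2 STRING, c3 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- daily: replace the staging contents with today's file, then append to the final table
LOAD DATA LOCAL INPATH '/data/incoming/txn_20140209.tsv' OVERWRITE INTO TABLE txn_staging;
INSERT INTO TABLE finaltable SELECT * FROM txn_staging;

INSERT INTO (Hive 0.8+) appends to the target; INSERT OVERWRITE, as in the snippet 
quoted above, replaces it, which is usually not what a daily incremental load wants.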

Finding Hive and Hadoop version from command line

2014-02-09 Thread Raj Hadoop
All,

Is there any way from the command prompt I can find which hive version I am 
using and Hadoop version too?


Thanks in advance.

Regards,
Raj
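
On reasonably recent releases the commands below work; on older builds that lack 
--version, the version is visible in the jar names under $HIVE_HOME/lib:

hive --version
hadoop version
ls $HIVE_HOME/lib/hive-exec-*.jar    # e.g. hive-exec-0.12.0.jar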

How can I just find out the physical location of a partitioned table in Hive

2014-02-06 Thread Raj Hadoop
Hi,

How can I just find out the physical location of a partitioned table in Hive.


Show partitions 

gives me just the partition column info.

I want the location of the hdfs directory / files where the table is created.

Please advise.

Thanks,
Raj
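
DESCRIBE with the FORMATTED (or EXTENDED) keyword prints a Location: field; the 
table and partition names below are made up:

DESCRIBE FORMATTED mytable;                               -- table-level HDFS location
DESCRIBE FORMATTED mytable PARTITION (dt='2014-02-06');   -- location of a single partition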

Hive Query Error

2014-02-05 Thread Raj Hadoop
I am trying to create a Hive sequence file from another table by running the 
following -

Your query has the following error(s):
OK
FAILED: ParseException line 5:0 cannot recognize input near 'STORED' 'STORED' 
'AS' in constant click the Error Log tab above for details 
1
CREATE TABLE temp_xyz as
2
SELECT prop1,prop2,prop3,prop4,prop5
3
FROM hitdata
4
WHERE dateoflog=20130101 and prop1='785-ou'
5
STORED AS SEQUENCEFILE;
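
The clause order is the problem: in a CREATE TABLE ... AS SELECT, the STORED AS 
clause belongs before the AS SELECT, not after the WHERE. A corrected sketch of 
the same statement:

CREATE TABLE temp_xyz
STORED AS SEQUENCEFILE
AS
SELECT prop1, prop2, prop3, prop4, prop5
FROM hitdata
WHERE dateoflog = 20130101 AND prop1 = '785-ou';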

Re: GenericUDF Testing in Hive

2014-02-04 Thread Raj Hadoop

I want to do a simple test like this - but not working -

select ComplexUDFExample(List("a", "b", "c"), "b") from table1 limit 10;


FAILED: SemanticException [Error 10011]: Line 1:25 Invalid function 'List'






On Tuesday, February 4, 2014 2:34 PM, Raj Hadoop  wrote:
 
How to test a Hive GenericUDF which accepts two parameters List<T>, T?

List<T> -> Can it be the output of a collect_set? Please advise.

I have a generic udf which takes List<T>, T. I want to test how it works 
through Hive. 





On Monday, January 20, 2014 5:19 PM, Raj Hadoop  wrote:
 
 
The following is a an example for a GenericUDF. I wanted to test this through a 
Hive query. Basically want to pass parameters some thing like "select 
ComplexUDFExample('a','b','c') from employees limit 10".


 
 
https://github.com/rathboma/hive-extension-examples/blob/master/src/main/java/com/matthewrathbone/example/ComplexUDFExample.java
 
 
 
class ComplexUDFExample extends GenericUDF {
  ListObjectInspector listOI;
  StringObjectInspector elementOI;

  @Override
  public String getDisplayString(String[] arg0) {
    return "arrayContainsExample()"; // this should probably be better
  }

  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
    if (arguments.length != 2) {
      throw new UDFArgumentLengthException("arrayContainsExample only takes 2 arguments: List<T>, T");
    }
    // 1. Check we received the right object types.
    ObjectInspector a = arguments[0];
    ObjectInspector b = arguments[1];
    if (!(a instanceof ListObjectInspector) || !(b instanceof StringObjectInspector)) {
      throw new UDFArgumentException("first argument must be a list / array, second argument must be a string");
    }
    this.listOI = (ListObjectInspector) a;
    this.elementOI = (StringObjectInspector) b;

    // 2. Check that the list contains strings
    if (!(listOI.getListElementObjectInspector() instanceof StringObjectInspector)) {
      throw new UDFArgumentException("first argument must be a list of strings");
    }

    // the return type of our function is a boolean, so we provide the correct object inspector
    return PrimitiveObjectInspectorFactory.javaBooleanObjectInspector;
  }

  @Override
  public Object evaluate(DeferredObject[] arguments) throws HiveException {

    // get the list and string from the deferred objects using the object inspectors
    List<String> list = (List<String>) this.listOI.getList(arguments[0].get());
    String arg = elementOI.getPrimitiveJavaObject(arguments[1].get());

    // check for nulls
    if (list == null || arg == null) {
      return null;
    }

    // see if our list contains the value we need
    for (String s : list) {
      if (arg.equals(s)) return new Boolean(true);
    }
    return new Boolean(false);
  }
}
 
 
hive> select ComplexUDFExample('a','b','c') from email_list_1 limit 10;
FAILED: SemanticException [Error 10015]: Line 1:7 Arguments length mismatch 
''c'': arrayContainsExample only takes 2 arguments: List<T>, T
 
--
 
How to test this example in Hive query. I know I am invoking it wrong. But how 
can I invoke it correctly.
 
My requirement is to pass a String of arrays as first argument and another 
string as second argument in Hive like below.
 
 
Select col1, ComplexUDFExample( collectset(col2) , 'xyz')
from 
Employees
Group By col1;
 
How do i do that?
 
Thanks in advance.
 
Regards,
Raj
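
A sketch of how a GenericUDF like this is typically registered and exercised (the 
jar path is made up; the class name follows the GitHub link above; collect_set, 
with an underscore, is the built-in aggregate):

ADD JAR /home/hadoop/udfs/complex-udf-example.jar;
CREATE TEMPORARY FUNCTION array_contains_example AS 'com.matthewrathbone.example.ComplexUDFExample';

-- quick smoke test: build the first argument with array() rather than List(...)
SELECT array_contains_example(array('a','b','c'), 'b') FROM table1 LIMIT 10;

-- the grouped use case: pass a collect_set() result as the first argument
SELECT col1, array_contains_example(collect_set(col2), 'xyz')
FROM Employees
GROUP BY col1;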

Re: GenericUDF Testing in Hive

2014-02-04 Thread Raj Hadoop
How to test a Hive GenericUDF which accepts two parameters List<T>, T?

List<T> -> Can it be the output of a collect_set? Please advise.

I have a generic udf which takes List<T>, T. I want to test how it works 
through Hive. 





On Monday, January 20, 2014 5:19 PM, Raj Hadoop  wrote:
 
 
The following is a an example for a GenericUDF. I wanted to test this through a 
Hive query. Basically want to pass parameters some thing like "select 
ComplexUDFExample('a','b','c') from employees limit 10".


 
 
https://github.com/rathboma/hive-extension-examples/blob/master/src/main/java/com/matthewrathbone/example/ComplexUDFExample.java
 
 
 
class ComplexUDFExample extends GenericUDF {
  ListObjectInspector listOI;
  StringObjectInspector elementOI;

  @Override
  public String getDisplayString(String[] arg0) {
    return "arrayContainsExample()"; // this should probably be better
  }

  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
    if (arguments.length != 2) {
      throw new UDFArgumentLengthException("arrayContainsExample only takes 2 arguments: List<T>, T");
    }
    // 1. Check we received the right object types.
    ObjectInspector a = arguments[0];
    ObjectInspector b = arguments[1];
    if (!(a instanceof ListObjectInspector) || !(b instanceof StringObjectInspector)) {
      throw new UDFArgumentException("first argument must be a list / array, second argument must be a string");
    }
    this.listOI = (ListObjectInspector) a;
    this.elementOI = (StringObjectInspector) b;

    // 2. Check that the list contains strings
    if (!(listOI.getListElementObjectInspector() instanceof StringObjectInspector)) {
      throw new UDFArgumentException("first argument must be a list of strings");
    }

    // the return type of our function is a boolean, so we provide the correct object inspector
    return PrimitiveObjectInspectorFactory.javaBooleanObjectInspector;
  }

  @Override
  public Object evaluate(DeferredObject[] arguments) throws HiveException {

    // get the list and string from the deferred objects using the object inspectors
    List<String> list = (List<String>) this.listOI.getList(arguments[0].get());
    String arg = elementOI.getPrimitiveJavaObject(arguments[1].get());

    // check for nulls
    if (list == null || arg == null) {
      return null;
    }

    // see if our list contains the value we need
    for (String s : list) {
      if (arg.equals(s)) return new Boolean(true);
    }
    return new Boolean(false);
  }
}
 
 
hive> select ComplexUDFExample('a','b','c') from email_list_1 limit 10;
FAILED: SemanticException [Error 10015]: Line 1:7 Arguments length mismatch 
''c'': arrayContainsExample only takes 2 arguments: List<T>, T
 
--
 
How to test this example in Hive query. I know I am invoking it wrong. But how 
can I invoke it correctly.
 
My requirement is to pass a String of arrays as first argument and another 
string as second argument in Hive like below.
 
 
Select col1, ComplexUDFExample( collectset(col2) , 'xyz')
from 
Employees
Group By col1;
 
How do i do that?
 
Thanks in advance.
 
Regards,
Raj

Find a date that is in the range of any array dates in Hive

2014-01-31 Thread Raj Hadoop
Hi,


I have the following requirement from a Hive table below.

CustNum   ActivityDates                     Rates
100       10-Aug-13,12-Aug-13,20-Aug-13     10,15,20

The data above says that

From 10 Aug to 11 Aug the rate is 10.
From 12 Aug to 19 Aug the rate is 15.

From 20-Aug to till date the rate is 20.

Note : The order is maintained in 'ActivityDates' and 'Rates'.

From the above table , I need to find the rate on say a given date 15-Aug-13. 
In the above case , the rate for 15-Aug-13 is 15.

How should I get this result in Hive.

I was reading about a Generic UDF and was thinking to write one like this.
The Generic UDF takes two inputs (an input date, an array of input dates). The 
output should be an int returning the element number in the array. 

In the above case,
GenericUDF(15-Aug-13, [10-Aug-13,12-Aug-13,20-Aug-13]) should return the 2nd 
element in the array - 2.



Please advise if there is an alternative solution or if the above solution 
works. I have never written a UDF or Generic UDF and would need some help from 
the forum members. Please advise.


Regards,
Raj
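
If writing a UDF turns out to be too much, here is an untested sketch using 
posexplode (available only in newer Hive releases), assuming the dates are stored 
in a sortable form such as yyyy-MM-dd and the two columns are real arrays:

SELECT a.custnum, a.rates[b.maxpos] AS rate_on_date
FROM mytable a
JOIN (
  -- position of the last activity date on or before the target date
  SELECT t.custnum, max(d.pos) AS maxpos
  FROM mytable t
  LATERAL VIEW posexplode(t.activitydates) d AS pos, adate
  WHERE d.adate <= '2013-08-15'
  GROUP BY t.custnum
) b ON (a.custnum = b.custnum);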

Re: delete duplicate records in Hive table

2014-01-30 Thread Raj hadoop
Hi Nitin,

Thanks a ton for quick response,

Could you please share if any sql syntax for this

Thanks,
Raj.


On Thu, Jan 30, 2014 at 3:29 PM, Nitin Pawar wrote:

> easiest way to do is .. write it in a temp table and then select uniq of
> each column and writing to real table
>
>
> On Thu, Jan 30, 2014 at 3:19 PM, Raj hadoop  wrote:
>
>> Hi,
>>
>> Can someone help me how to delete duplicate records in Hive table,
>>
>> I know that delete and update are not supported by hive but still,
>>
>> if someone knows some alternative that can help me with this
>>
>> Thanks,
>> Raj.
>>
>
>
>
> --
> Nitin Pawar
>
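
A sketch of the temp-table route (made-up names, and assuming whole-row duplicates):

-- stage the distinct rows, then overwrite the original table with them
CREATE TABLE mytable_dedup AS
SELECT col1, col2, col3
FROM mytable
GROUP BY col1, col2, col3;

INSERT OVERWRITE TABLE mytable
SELECT col1, col2, col3 FROM mytable_dedup;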


delete duplicate records in Hive table

2014-01-30 Thread Raj hadoop
Hi,

Can someone help me how to delete duplicate records in Hive table,

I know that delete and update are not supported by Hive, but still,

if someone knows some alternative that can help me with this

Thanks,
Raj.


GenericUDF Testing in Hive

2014-01-20 Thread Raj Hadoop
 
The following is a an example for a GenericUDF. I wanted to test this through a 
Hive query. Basically want to pass parameters some thing like "select 
ComplexUDFExample('a','b','c') from employees limit 10".


 
 
https://github.com/rathboma/hive-extension-examples/blob/master/src/main/java/com/matthewrathbone/example/ComplexUDFExample.java
 
 
 
class ComplexUDFExample extends GenericUDF {
  ListObjectInspector listOI;
  StringObjectInspector elementOI;

  @Override
  public String getDisplayString(String[] arg0) {
    return "arrayContainsExample()"; // this should probably be better
  }

  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
    if (arguments.length != 2) {
      throw new UDFArgumentLengthException("arrayContainsExample only takes 2 arguments: List<T>, T");
    }
    // 1. Check we received the right object types.
    ObjectInspector a = arguments[0];
    ObjectInspector b = arguments[1];
    if (!(a instanceof ListObjectInspector) || !(b instanceof StringObjectInspector)) {
      throw new UDFArgumentException("first argument must be a list / array, second argument must be a string");
    }
    this.listOI = (ListObjectInspector) a;
    this.elementOI = (StringObjectInspector) b;

    // 2. Check that the list contains strings
    if (!(listOI.getListElementObjectInspector() instanceof StringObjectInspector)) {
      throw new UDFArgumentException("first argument must be a list of strings");
    }

    // the return type of our function is a boolean, so we provide the correct object inspector
    return PrimitiveObjectInspectorFactory.javaBooleanObjectInspector;
  }

  @Override
  public Object evaluate(DeferredObject[] arguments) throws HiveException {

    // get the list and string from the deferred objects using the object inspectors
    List<String> list = (List<String>) this.listOI.getList(arguments[0].get());
    String arg = elementOI.getPrimitiveJavaObject(arguments[1].get());

    // check for nulls
    if (list == null || arg == null) {
      return null;
    }

    // see if our list contains the value we need
    for (String s : list) {
      if (arg.equals(s)) return new Boolean(true);
    }
    return new Boolean(false);
  }
}
 
 
hive> select ComplexUDFExample('a','b','c') from email_list_1 limit 10;
FAILED: SemanticException [Error 10015]: Line 1:7 Arguments length mismatch 
''c'': arrayContainsExample only takes 2 arguments: List<T>, T
 
--
 
How to test this example in Hive query. I know I am invoking it wrong. But how 
can I invoke it correctly.
 
My requirement is to pass a String of arrays as first argument and another 
string as second argument in Hive like below.
 
 
Select col1, ComplexUDFExample( collectset(col2) , 'xyz')
from 
Employees
Group By col1;
 
How do i do that?
 
Thanks in advance.
 
Regards,
Raj

Re: Basic UDF in Hive - How to setup

2014-01-17 Thread Raj Hadoop
Ok. I just figured out. I have to set classpath with EXPORT. Its working now.





On Friday, January 17, 2014 3:37 PM, Raj Hadoop  wrote:
 
Hi,

I am trying to compile a basic hive UDF java file. I am using all the jar files 
in my classpath but I am not able to compile it and getting the following 
error. I am using CDH4. Can any one advise please?

$ javac HelloWorld.java
HelloWorld.java:3: package org.apache.hadoop.hive.ql.exec does not exist
import org.apache.hadoop.hive.ql.exec.Description;
 ^
HelloWorld.java:4: package org.apache.hadoop.hive.ql.exec does not exist
import
 org.apache.hadoop.hive.ql.exec.UDF;
 ^
HelloWorld.java:5: package org.apache.hadoop.hive.ql.udf does not exist
import org.apache.hadoop.hive.ql.udf.UDFType;
    ^
HelloWorld.java:8: cannot find symbol
symbol: class UDF
public class HelloWorld extends UDF
    ^
4 errors
$ echo
 $CLASSPATH
/usr/lib/hive/lib/hive-beeline-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-builtins-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-cli-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-common-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-contrib-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-exec-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-hbase-handler-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-hwi-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-jdbc-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-metastore-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-pdk-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-serde-0.10.0-cdh4.4.0.jar::/usr/lib/hive/lib/hive-service-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-shims-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/parquet-hive-1.0.jar:/usr/lib/hive/lib/sentry-binding-hive-1.1.0-cdh4.4.0.jar:/usr/lib/hadoop/hadoop-annotations-2.0.0-cdh4.4.0.jar:/usr/lib/hadoop/hadoop-annotations.jar:/usr/lib/hadoop/hadoop-auth-2.0.0-cdh4.4.0.jar:/usr/lib/hadoop
/hadoop-auth.jar:/usr/lib/hadoop/hadoop-common-2.0.0-cdh4




Thanks,
Raj
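
For anyone hitting the same thing, the missing piece is the export, since javac 
runs as a child process and never sees an unexported variable. A sketch using the 
jar paths from the paste above:

export CLASSPATH=/usr/lib/hive/lib/hive-exec-0.10.0-cdh4.4.0.jar:/usr/lib/hadoop/hadoop-common-2.0.0-cdh4.4.0.jar:.
javac HelloWorld.java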

Basic UDF in Hive - How to setup

2014-01-17 Thread Raj Hadoop
Hi,

I am trying to compile a basic hive UDF java file. I am using all the jar files 
in my classpath but I am not able to compile it and getting the following 
error. I am using CDH4. Can any one advise please?

$ javac HelloWorld.java
HelloWorld.java:3: package org.apache.hadoop.hive.ql.exec does not exist
import org.apache.hadoop.hive.ql.exec.Description;
 ^
HelloWorld.java:4: package org.apache.hadoop.hive.ql.exec does not exist
import org.apache.hadoop.hive.ql.exec.UDF;
 ^
HelloWorld.java:5: package org.apache.hadoop.hive.ql.udf does not exist
import org.apache.hadoop.hive.ql.udf.UDFType;
    ^
HelloWorld.java:8: cannot find symbol
symbol: class UDF
public class HelloWorld extends UDF
    ^
4 errors
$ echo $CLASSPATH
/usr/lib/hive/lib/hive-beeline-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-builtins-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-cli-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-common-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-contrib-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-exec-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-hbase-handler-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-hwi-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-jdbc-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-metastore-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-pdk-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-serde-0.10.0-cdh4.4.0.jar::/usr/lib/hive/lib/hive-service-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-shims-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/parquet-hive-1.0.jar:/usr/lib/hive/lib/sentry-binding-hive-1.1.0-cdh4.4.0.jar:/usr/lib/hadoop/hadoop-annotations-2.0.0-cdh4.4.0.jar:/usr/lib/hadoop/hadoop-annotations.jar:/usr/lib/hadoop/hadoop-auth-2.0.0-cdh4.4.0.jar:/usr/lib/hadoop/hadoop-auth.ja
r:/usr/lib/hadoop/hadoop-common-2.0.0-cdh4




Thanks,
Raj

Re: JSON data to HIVE table

2014-01-07 Thread Raj Hadoop
 
All,
 
If I have to load JSON data to a Hive table (default record format while 
creating the table) - is that a requirement to convert each JSON record into 
one line.
 
How would I do this ?
 
 
Thanks,
Raj



From: Rok Kralj 
To: user@hive.apache.org 
Sent: Tuesday, January 7, 2014 3:54 AM
Subject: Re: JSON data to HIVE table



Also, if you have large or dynamic schemas which are a pain to write by hand, 
you can use this simple tool: 

https://github.com/strelec/hive-serde-gen




2014/1/7 Roberto Congiu 

Also https://github.com/rcongiu/Hive-JSON-Serde ;)
>
>
>
>On Mon, Jan 6, 2014 at 12:00 PM, Russell Jurney  
>wrote:
>
>Check these out: 
>>
>>
>>http://hortonworks.com/blog/discovering-hive-schema-in-collections-of-json-documents/
>>
>>http://hortonworks.com/blog/howto-use-hive-to-sqlize-your-own-tweets-part-two-loading-hive-sql-queries/
>>
>>https://github.com/kevinweil/elephant-bird
>>
>>
>>
>>
>>On Mon, Jan 6, 2014 at 9:36 AM, Raj Hadoop  wrote:
>>
>>Hi,
>>>
>>>I am trying to load a data that is in JSON format to Hive table. Can any one 
>>>suggest what is the method I need to follow?
>>>
>>>Thanks,
>>>Raj
>>
>>
>>
>>-- 
>>Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com 
>
>
>
>-- 
>-- 
>Good judgement comes with experience.
>Experience comes with bad judgement.
>
>--
>
>Roberto Congiu - Data Engineer - OpenX
>tel: +1 626 466 1141


-- 
eMail: rok.kr...@gmail.com 
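
A minimal sketch using the SerDe from the rcongiu link above (the jar path and 
columns are made up). Note it expects exactly one JSON document per line, which is 
why multi-line JSON has to be flattened first:

ADD JAR /home/hadoop/jars/json-serde-jar-with-dependencies.jar;

CREATE EXTERNAL TABLE events_json (
  user_id STRING,
  ts      STRING,
  action  STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/data/events_json';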

JSON data to HIVE table

2014-01-06 Thread Raj Hadoop
Hi,
 
I am trying to load a data that is in JSON format to Hive table. Can any one 
suggest what is the method I need to follow?
 
Thanks,
Raj

Re: Dynamic columns in Hive Table - Best Design for the problem

2013-12-29 Thread Raj Hadoop
Matt,

Thanks for the suggestion. Can you please provide more details on what type of 
UDAF should I develop ? I have never worked on a UDAF earlier. But would like 
to explore it. Any tips on how to proceed.

Thanks,
Raj



On Saturday, December 28, 2013 2:47 PM, Matt Tucker  
wrote:
 
It looks like you're essentially doing a pivot function. Your best bet is to 
write a custom UDAF or look at the windowing functions available in recent 
releases.
Matt
On Dec 28, 2013 12:57 PM, "Raj Hadoop"  wrote:

Dear All Hive Group Members,
>
>
>I have the following requirement.
>
>
>Input:
>
>
>Ticket#|Date of booking|Price
>100|20-Oct-13|54
>
>100|21-Oct-13|56
>100|22-Oct-13|54
>100|23-Oct-13|55
>100|27-Oct-13|60
>100|30-Oct-13|47
>
>
>101|10-Sep-13|12
>101|13-Sep-13|14
>101|20-Oct-13|6
>
>
>
>
>Expected Output:
>
>
>Ticket#|Initial|Delta1|Delta2|Delta3|Delta4|Delta5
>100|20-Oct-13,54|21-Oct-13,2|22-Oct-13,0|23-Oct-3,1|27-Oct-13,6|30-Oct-13,-7
>101|10-Sep-13,12|13-Sep-13,2|20-Oct-13,-6|||
>
>
>The number of columns in the expected output is a dynamic list depending on 
>the number of price changes of a ticket.
>
>
>1) What is the best design to solve the above problem in Hive? 
>2) How do we implement it?
>
>
>Please advise.
>
>
>Regards,
>Raj
>
>
>
>
>
>
>
>
>
>
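
On the windowing route, a sketch of the per-row delta with LAG (Hive 0.11+; 
made-up column names, and assuming the booking date is stored in a sortable form). 
Pivoting the deltas into one wide row per ticket would still need a UDAF or 
post-processing:

SELECT ticket,
       booking_date,
       price,
       price - LAG(price) OVER (PARTITION BY ticket ORDER BY booking_date) AS delta
FROM bookings;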

Dynamic columns in Hive Table - Best Design for the problem

2013-12-28 Thread Raj Hadoop
Dear All Hive Group Members,

I have the following requirement.

Input:

Ticket#|Date of booking|Price
100|20-Oct-13|54

100|21-Oct-13|56
100|22-Oct-13|54
100|23-Oct-13|55
100|27-Oct-13|60
100|30-Oct-13|47

101|10-Sep-13|12
101|13-Sep-13|14
101|20-Oct-13|6


Expected Output:

Ticket#|Initial|Delta1|Delta2|Delta3|Delta4|Delta5
100|20-Oct-13,54|21-Oct-13,2|22-Oct-13,0|23-Oct-3,1|27-Oct-13,6|30-Oct-13,-7
101|10-Sep-13,12|13-Sep-13,2|20-Oct-13,-6|||

The number of columns in the expected output is a dynamic list depending on the 
number of price changes of a ticket.

1) What is the best design to solve the above problem in Hive? 
2) How do we implement it?

Please advise.

Regards,
Raj

How to compress the text file - LZO utility ?

2013-12-09 Thread Raj Hadoop
Hi,

I have a large set of text files. I have created a Hive table pointing to each 
of these text files. I am looking to compress the files to save storage.

1) How should I compress the file to use LZO compression.

2) How to know whether LZO compression utility (command ?) is installed on the 
Hadoop cluster?

3) Should the Hive table definition be modified as a Sequence File if I 
compress the text file?

Please advise.

Thanks,
Raj
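
A rough sketch, assuming the hadoop-lzo package is installed (paths and jar names 
are made up). If com.hadoop.compression.lzo.LzoCodec is not listed under 
io.compression.codecs in core-site.xml, LZO is not set up on the cluster. The 
table can stay a plain TEXTFILE table; switching to sequence files is not required 
just for compression:

# 1) check whether the LZO codec is configured
grep -A2 io.compression.codecs /etc/hadoop/conf/core-site.xml

# 2) compress locally, upload, and index so the .lzo file becomes splittable
lzop weblog_20131209.txt
hdfs dfs -put weblog_20131209.txt.lzo /data/weblogs/
hadoop jar /usr/lib/hadoop/lib/hadoop-lzo.jar com.hadoop.compression.lzo.DistributedLzoIndexer /data/weblogs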

Re: how to find number of elements in an array in Hive

2013-12-02 Thread Raj Hadoop


Thanks Brad



On Monday, December 2, 2013 5:09 PM, Brad Ruderman  
wrote:
 
Check out

size

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF


Thanks,
Brad




On Mon, Dec 2, 2013 at 5:05 PM, Raj Hadoop  wrote:

hi,
>
>
>how to find number of elements in an array in Hive table?
>
>
>thanks,
>Raj
>
>
>
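
For example, with size() (column name made up):

SELECT size(tags) FROM mytable;   -- number of elements in the array column tags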

how to find number of elements in an array in Hive

2013-12-02 Thread Raj Hadoop
hi,

how to find number of elements in an array in Hive table?

thanks,
Raj

Compression for a HDFS text file - Hive External Partition Table

2013-11-13 Thread Raj Hadoop
Hi ,
  
1)  My requirement is to load a file ( a tar.gz file which has multiple tab 
separated values files and one file is the main file which has huge data – 
about 10 GB per day) to an externally partitioned hive table.
 
2)  What I am doing is I have automated the process by extracting the 
tar.gz file and get the main data file (10GB text file) and then loading to a 
hdfs file as text file.
 
3)  I want to compress the files. What is the procedure for it?
 
4)  Do I need to use any utility to compress the hit data file before 
loading to HDFS? And also should I define an Input Structure for HDFS File 
format through a Java Program?
 
Regards,
Raj
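
One low-effort option is to gzip the extracted text file before putting it into 
the partition directory (paths below are made up). Hive reads .gz text files 
transparently with a plain TEXTFILE table, so the definition does not have to 
change to a sequence file; the trade-off is that gzip is not splittable, so one 
large .gz file is processed by a single mapper:

gzip main_data_20131113.tsv
hdfs dfs -put main_data_20131113.tsv.gz /data/main/dt=2013-11-13/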

How to load a web log file (text format) to Hive with compression

2013-11-11 Thread Raj Hadoop
Hi,

I have a web log files (text format). I want to load these files to a Hive 
table in compressed format. How do I do it ?

Should I compress the text file (using any Linux utilities) and then create the 
Hive table?

Can any one provide me the Hive syntax for loading the compressed file?

Thanks,
Raj

Re: Hive external table partitions with less than symbol ?

2013-11-04 Thread Raj Hadoop
Hi -

I have this doubt.

Why do i need to use an INSERT INTO .

can I just create hdfs directories and map it to a hive external table setting 
the location of the hdfs directories.

will this work ? please advise.

Thanks,
Raj







On Monday, November 4, 2013 8:34 AM, Matouk IFTISSEN 
 wrote:
 
Yes it is possible:
hadoop fs -mkdir /hdfs_path/'cust_id>1000'

I tested it and works, then you can store data in this directory .

for concat function you do simple:

insert into your_table_partionned PARTITION (path_xxx)
select attr,id,  concat ('/data1/customer/', id) as path_xxx  from your_table
where  id <1000
......


Cdt.





2013/11/4 Raj Hadoop 

How can i use concat function? I did not get it. Can you please elaborate. 
>
>
>My requirement is to create a HDFS directory like 
>(cust_id>1000 and cust_id<2000)
>
>
>
>and map this to a Hive External table.
>
>
>can i do that?
>
>
>
>On Monday, November 4, 2013 3:34 AM, Matouk IFTISSEN 
> wrote:
> 
>Hello
>You can use concat function or case to do this like:
>Concat ('/data1/customer/', id) 
>.
>Where id <1000 
>Etc..
>Hope this help you ;)
>Le 3 nov. 2013 23:51, "Raj Hadoop"  a écrit :
>
>All,
>>
>>
>>I want to create partitions like the below and create a hive external table. 
>>How can i do that ?
>>
>>
>>/data1/customer/id<1000
>>/data1/customer/id>1000 and id < 2000
>>
>>/data1/customer/id >2000
>>
>>
>>
>>Is this possible ( < and > symbols in folders ?)
>>
>>
>>My requirement is to partition the hive table based on some customer id's.
>>
>>
>>Thanks,
>>Raj
>
>


-- 

Matouk IFTISSEN | Consultant BI & Big Data
 
24 rue du sentier - 75002 Paris - www.ysance.com
Mob : +33 6 78 51 18 69 || Fax : +33 1 73 72 97 26 
Ysance sur :Twitter | Facebook | Google+ | LinkedIn | Newsletter
Nos autres sites : ys4you | labdecisionnel | decrypt
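
Since the partition value itself cannot usefully carry a '<' or '>' condition, a 
common workaround is to compute a range label and partition on that. A sketch with 
made-up names, using a dynamic partition insert:

CREATE TABLE customer_part (cust_id INT, name STRING)
PARTITIONED BY (id_range STRING);

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE customer_part PARTITION (id_range)
SELECT cust_id, name,
       CASE WHEN cust_id < 1000 THEN 'lt_1000'
            WHEN cust_id < 2000 THEN '1000_2000'
            ELSE 'ge_2000' END AS id_range
FROM customer_src;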

Re: Hive external table partitions with less than symbol ?

2013-11-04 Thread Raj Hadoop
How can i use concat function? I did not get it. Can you please elaborate. 

My requirement is to create a HDFS directory like 
(cust_id>1000 and cust_id<2000)


and map this to a Hive External table.

can i do that?



On Monday, November 4, 2013 3:34 AM, Matouk IFTISSEN 
 wrote:
 
Hello
You can use concat function or case to do this like:
Concat ('/data1/customer/', id) 
.
Where id <1000 
Etc..
Hope this help you ;)
Le 3 nov. 2013 23:51, "Raj Hadoop"  a écrit :

All,
>
>
>I want to create partitions like the below and create a hive external table. 
>How can i do that ?
>
>
>/data1/customer/id<1000
>/data1/customer/id>1000 and id < 2000
>
>/data1/customer/id >2000
>
>
>
>Is this possible ( < and > symbols in folders ?)
>
>
>My requirement is to partition the hive table based on some customer id's.
>
>
>Thanks,
>Raj

Hive external table partitions with less than symbol ?

2013-11-03 Thread Raj Hadoop
All,

I want to create partitions like the below and create a hive external table. 
How can i do that ?


/data1/customer/id<1000
/data1/customer/id>1000 and id < 2000

/data1/customer/id >2000


Is this possible ( < and > symbols in folders ?)

My requirement is to partition the hive table based on some customer id's.

Thanks,
Raj

Re: Oracle to HDFS through Sqoop and a Hive External Table

2013-11-03 Thread Raj Hadoop
Manish,

Thanks for reply.


1. Load to Hdfs, beware of Sqoop error handling, as its a mapreduce based 
framework, so if 1 mapper fails it might happen that you get partial data.
So do you say that - if I can handle errors in Sqoop, going for 100 HDFS 
folders/files - is it OK ?

2. Create partition based on date and hour, if customer table has some date or 
timestamp column.
I cannot rely on date or timestamp column. So can I go with Customer ID ?

3. Think about file format also, as that will affect the load and query time.
Can you please suggest a file format that I have to use ?

4. Think about compression as well before hand, as that will govern the data 
split, and performance of your queries as well.
Does compression increase or reduce performance? Isn't the advantage of 
compression the saving in storage? 

- Raj



On Sunday, November 3, 2013 11:03 AM, manish.hadoop.work 
 wrote:
 
1. Load to Hdfs, beware of Sqoop error handling, as its a mapreduce based 
framework, so if 1 mapper fails it might happen that you get partial data.

2. Create partition based on date and hour, if customer table has some date or 
timestamp column.

3. Think about file format also, as that will affect the load and query time.

4. Think about compression as well before hand, as that will govern the data 
split, and performance of your queries as well.

Regards,
Manish



Sent from my T-Mobile 4G LTE Device


 Original message 
From: Raj Hadoop  
Date: 11/03/2013  7:39 AM  (GMT-08:00) 
To: Hive ,Sqoop ,User 
 
Subject: Oracle to HDFS through Sqoop and a Hive External Table 



Hi,

I am sending this to the three dist-lists of Hadoop, Hive and Sqoop as this 
question is closely related to all the three areas.

I have this requirement.

I have a big table in Oracle (about 60 million rows - Primary Key Customer Id). 
I want to bring this to HDFS and then create
a Hive external table. My requirement is running queries on this Hive table (at 
this time i do not know what queries i would be running).

Is the following a good design for the above problem ? Any pros and cons of 
this.


1) Load the table to HDFS using Sqoop into multiple folders (divide Customer 
Id's into 100 segments).
2) Create Hive external partition table based on the above 100 HDFS directories.


Thanks,
Raj

Oracle to HDFS through Sqoop and a Hive External Table

2013-11-03 Thread Raj Hadoop
Hi,

I am sending this to the three dist-lists of Hadoop, Hive and Sqoop as this 
question is closely related to all the three areas.

I have this requirement.

I have a big table in Oracle (about 60 million rows - Primary Key Customer Id). 
I want to bring this to HDFS and then create
a Hive external table. My requirement is running queries on this Hive table (at 
this time i do not know what queries i would be running).

Is the following a good design for the above problem ? Any pros and cons of 
this.


1) Load the table to HDFS using Sqoop into multiple folders (divide Customer 
Id's into 100 segments).
2) Create Hive external partition table based on the above 100 HDFS directories.


Thanks,
Raj

Re: External Partition Table

2013-10-31 Thread Raj Hadoop


Thanks Tim. I am using a String column for the partition column. 



On Thursday, October 31, 2013 6:49 PM, Timothy Potter  
wrote:
 
Hi Raj,
This seems like a matter of style vs. any performance benefit / cost ... if 
you're going to do a lot of queries just based on month or year, then #2 might 
be easier, e.g.

select * from foo where year = 2013 seems a little cleaner than select * from 
foo where date >= 20130101 and date <= 20131231 (not sure how you're encoding 
dates into a INT but I think you get the idea)

I do something similar but my partition fields are strings, like 
2013-10-31_ (which has the nice property of lexically sorting the same as 
numeric sort).

I'm assuming they will both have the same performance because Hive is still 
selecting the same number of input paths in both scenarios, one just happens to 
be a little deeper.

Cheers,
Tim



On Thu, Oct 31, 2013 at 4:34 PM, Raj Hadoop  wrote:

Hi,
>
>
>I am planning for a Hive External Partition Table based on a date.
>
>
>Which one of the below yields a better performance or both have the same 
>performance?
>
>
>1) Partition based on one folder per day
>LIKE date INT
>2) Partition based on one folder per year / month / day ( So it has three 
>folders) 
>LIKE year INT, month INT, day INT
>
>
>Thanks,
>Raj
>
>

Re: External Partition Table

2013-10-31 Thread Raj Hadoop
Hi Brad,

Thanks for the quick response.

I have about a 10 GB file per day (web logs), and I am creating a 
folder (partition) for each day. Is that something uncommon?

I do not know at this juncture what kind of queries I would be executing upon 
on this table. But just wanted to know whether this is something normal or not 
at all a normal thing.

Thanks,
Raj



On Thursday, October 31, 2013 6:39 PM, Brad Ruderman  
wrote:
 
Wow that question won't be answerable. It all depends on the amount of data per 
partition and the queries you are going to be executing on it, as well as the 
structure of the data. In general in hive (depending on your cluster size) you 
need to balance the number of files with the size, smaller number of files is 
typically preferred but partitions will help when date restricting.

Thx,
Brad



On Thu, Oct 31, 2013 at 3:34 PM, Raj Hadoop  wrote:

Hi,
>
>
>I am planning for a Hive External Partition Table based on a date.
>
>
>Which one of the below yields a better performance or both have the same 
>performance?
>
>
>1) Partition based on one folder per day
>LIKE date INT
>2) Partition based on one folder per year / month / day ( So it has three 
>folders) 
>LIKE year INT, month INT, day INT
>
>
>Thanks,
>Raj
>
>

External Partition Table

2013-10-31 Thread Raj Hadoop
Hi,

I am planning for a Hive External Partition Table based on a date.

Which one of the below yields a better performance or both have the same 
performance?

1) Partition based on one folder per day
LIKE date INT
2) Partition based on one folder per year / month / day ( So it has three 
folders) 
LIKE year INT, month INT, day INT

Thanks,
Raj
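
For reference, the two layouts in DDL form (a sketch; names are made up):

-- option 1: one partition column per day
CREATE EXTERNAL TABLE weblog_a (line STRING)
PARTITIONED BY (log_date INT)
LOCATION '/data/weblogs_a';

-- option 2: nested year/month/day partition columns
CREATE EXTERNAL TABLE weblog_b (line STRING)
PARTITIONED BY (year INT, month INT, day INT)
LOCATION '/data/weblogs_b';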


Re: Hive Query Questions - is null in WHERE

2013-10-17 Thread Raj Hadoop
 
Thanks. It worked for me now when i use it as an empty string.



From: Krishnan K 
To: "user@hive.apache.org" ; Raj Hadoop 
 
Sent: Thursday, October 17, 2013 11:11 AM
Subject: Re: Hive Query Questions - is null in WHERE



For string columns, null will be interpreted as an empty string and for others, 
it will be interpreted as null...

On Wednesday, October 16, 2013, Raj Hadoop wrote:

All,
>
>When a query is executed like the below
>
>select field1 from table1 where field1 is null;
>
>I am getting the results which have empty values or nulls in field1. How does 
>is null work in Hive queries.
>
>Thanks,
>Raj
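
So for a string column the two cases usually have to be checked separately, e.g.:

SELECT field1 FROM table1 WHERE field1 IS NULL OR field1 = '';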

Hive Query Questions - is null in WHERE

2013-10-16 Thread Raj Hadoop
All,
 
When a query is executed like the below
 
select field1 from table1 where field1 is null;
 
I am getting the results which have empty values or nulls in field1. How does 
is null work in Hive queries.
 
Thanks,
Raj

Re: How to load /t /n file to Hive

2013-10-07 Thread Raj Hadoop


Yes, I have it.

Thanks,
Raj


 From: Sonal Goyal 
To: "user@hive.apache.org" ; Raj Hadoop 
 
Sent: Monday, October 7, 2013 1:38 AM
Subject: Re: How to load /t /n file to Hive
 


Do you have the option to escape your tabs and newlines in your base file? 


Best Regards,
Sonal
Nube Technologies 







On Sat, Sep 21, 2013 at 12:34 AM, Raj Hadoop  wrote:

Hi,
> 
>I have a file which is delimted by a tab. Also, there are some fields in the 
>file which has a tab /t character and a new line /n character in some fields.
> 
>Is there any way to load this file using Hive load command? Or do i have to 
>use a Custom Map Reduce (custom) Input format with java ? Please advise.
> 
>Thanks,
>Raj

Re: How to load /t /n file to Hive

2013-09-20 Thread Raj Hadoop
Hi Gabo,

Are you suggesting to use java.net.URLEncoder ? Can you be more specific ? I 
have lot of fields in the file which are not only URL related but some text 
fields which has new line characters.

Thanks,
Raj



 From: Gabriel Eisbruch 
To: "user@hive.apache.org" ; Raj Hadoop 
 
Sent: Friday, September 20, 2013 4:43 PM
Subject: Re: How to load /t /n file to Hive
 


Hi 
 One way that we used to solve that problem it's to transform the data when you 
are creating/loading it, for example we've applied UrlEncode to each field on 
create time.

Thanks,
Gabo.



2013/9/20 Raj Hadoop 

Hi Nitin,
> 
>Thanks for the reply. I have a huge file in unix.
> 
>As per the file definition, the file is a tab separated file of fields. But I 
>am sure that within some field's I have some new line character. 
> 
>How should I find a record? It is a huge file. Is there some command?
> 
>Thanks,
> 
>
>
>From: Nitin Pawar 
>To: "user@hive.apache.org" ; Raj Hadoop 
> 
>Sent: Friday, September 20, 2013 3:15 PM
>Subject: Re: How to load /t /n file to Hive
>
>
>
>If your data contains new line chars, its better you write a custom map reduce 
>job and convert the data into a single line removing all unwanted chars in 
>column separator as well just having single new line char per line 
>
>
>
>On Sat, Sep 21, 2013 at 12:38 AM, Raj Hadoop  wrote:
>
>Please note that there is an escape chacter in the fields where the /t and /n 
>are present.
>>
>>
>>
>>From: Raj Hadoop 
>>To: Hive  
>>Sent: Friday, September 20, 2013 3:04 PM
>>Subject: How to load /t /n file to Hive
>>
>>
>>
>>Hi,
>> 
>>I have a file which is delimted by a tab. Also, there are some fields in the 
>>file which has a tab /t character and a new line /n character in some fields.
>> 
>>Is there any way to load this file using Hive load command? Or do i have to 
>>use a Custom Map Reduce (custom) Input format with java ? Please advise.
>> 
>>Thanks,
>>Raj
>>
>>
>
>
>
>-- 
>Nitin Pawar
>
>
>

Re: How to load /t /n file to Hive

2013-09-20 Thread Raj Hadoop
Hi Nitin,
 
Thanks for the reply. I have a huge file in unix.
 
As per the file definition, the file is a tab separated file of fields. But I 
am sure that within some field's I have some new line character. 
 
How should I find a record? It is a huge file. Is there some command?
 
Thanks,
 



From: Nitin Pawar 
To: "user@hive.apache.org" ; Raj Hadoop 
 
Sent: Friday, September 20, 2013 3:15 PM
Subject: Re: How to load /t /n file to Hive



If your data contains new line chars, its better you write a custom map reduce 
job and convert the data into a single line removing all unwanted chars in 
column separator as well just having single new line char per line 



On Sat, Sep 21, 2013 at 12:38 AM, Raj Hadoop  wrote:

Please note that there is an escape chacter in the fields where the /t and /n 
are present.
>
>
>
>From: Raj Hadoop 
>To: Hive  
>Sent: Friday, September 20, 2013 3:04 PM
>Subject: How to load /t /n file to Hive
>
>
>
>Hi,
>
>I have a file which is delimted by a tab. Also, there are some fields in the 
>file which has a tab /t character and a new line /n character in some fields.
>
>Is there any way to load this file using Hive load command? Or do i have to 
>use a Custom Map Reduce (custom) Input format with java ? Please advise.
>
>Thanks,
>Raj
>
>


-- 
Nitin Pawar

Re: How to load /t /n file to Hive

2013-09-20 Thread Raj Hadoop
Please note that there is an escape chacter in the fields where the /t and /n 
are present.




From: Raj Hadoop 
To: Hive  
Sent: Friday, September 20, 2013 3:04 PM
Subject: How to load /t /n file to Hive



Hi,

I have a file which is delimted by a tab. Also, there are some fields in the 
file which has a tab /t character and a new line /n character in some fields.

Is there any way to load this file using Hive load command? Or do i have to use 
a Custom Map Reduce (custom) Input format with java ? Please advise.

Thanks,
Raj

How to load /t /n file to Hive

2013-09-20 Thread Raj Hadoop
Hi,
 
I have a file which is delimted by a tab. Also, there are some fields in the 
file which has a tab /t character and a new line /n character in some fields.
 
Is there any way to load this file using Hive load command? Or do i have to use 
a Custom Map Reduce (custom) Input format with java ? Please advise.
 
Thanks,
Raj

Hive Thrift Service - Not Running Continously

2013-08-05 Thread Raj Hadoop
Hi,
 
 
The hive thrift service is not running continously. I had to execute  the 
command (hive --service hiveserver &) very frequently . Can any one help me on 
this?
 
Thanks,
Raj
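
One common way to keep it alive after the login shell exits is to detach it with 
nohup (the log path is made up):

nohup hive --service hiveserver > /var/log/hive/hiveserver.out 2>&1 &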

Re: Help in debugging Hive Query

2013-07-25 Thread Raj Hadoop
Hi Sanjay,
 
Thanks for taking the time to write all the details. I did a silly mistake. The 
data type for visit_page_num, i created it as string. The string was causing 
issues when I am using the max function. A type cast to int in the query worked 
for me.
 
Regards,
Raj
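
For reference, the fix amounts to casting inside the aggregate; max() on a STRING 
column compares lexically ("9" > "14"), hence the cast:

select visid_high, visid_low, max(cast(visit_page_num as int)) as max_visit_page_num
from omniture_web_data
group by visid_high, visid_low;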



From: Sanjay Subramanian 
To: "user@hive.apache.org"  
Sent: Thursday, July 25, 2013 1:41 PM
Subject: Re: Help in debugging Hive Query



The query is correct but since u r creating a managed table , that is possibly 
creating some issue and the records are not all getting created

This is what I would propose

CHECKPOINT  1 : Is this query running at all ?
===
Use this option in BOLD and run the QUERY ONLY (without any table creation) to 
log errors and pipe to a log file by using nohup or some other way that u prefer
hive -hiveconf hive.root.logger=INFO,console -e

select a.evar23,sum(b.max_visit_page_num) from
(select distinct visid_high,visid_low,evar23 from web.omniture_web_data) a
JOIN
(select visid_high,visid_low,max(visit_page_num) as max_visit_page_num from 
omniture_web_data group by visid_high,visid_low) b
where a.visid_high=b.visid_high and a.visid_low=b.visid_low
group by a.evar23;


CHECKPOINT 2 : Run the query (using the CREATE TABLE option) with these 
additional options
===
Required params:

SET mapreduce.job.maps=500; 
SET mapreduce.job.reduces=8; 
SET mapreduce.tasktracker.map.tasks.maximum=12; 
SET mapreduce.tasktracker.reduce.tasks.maximum=8; 
SET 
mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec; 
SET mapreduce.map.output.compress=true; 


Optional params:
---
If u r using compression in output , use the following ; u can change the 
LzoCodec to whatever u r using for compression 
SET hive.exec.compress.intermediate=true; 
SET hive.exec.compress.output=true;
SET 
mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec;
 
SET mapreduce.output.fileoutputformat.compress=true; 


Thanks

Sanjay

From: Raj Hadoop 
Reply-To: "user@hive.apache.org" , Raj Hadoop 

Date: Thursday, July 25, 2013 5:00 AM
To: Hive 
Subject: Help in debugging Hive Query


All,

I am trying to determine visits for customer from omniture weblog file using 
Hive.

Table: omniture_web_data
Columns: visid_high,visid_low,evar23,visit_page_num

Sample Data:
visid_high,visid_low,evar23,visit_page_num
999,888,1003,10
999,888,1003,14
999,888,1003,6
999,777,1003,12
999,777,1003,20

I want to calculate for each Customer Number ( evar23 is  Customer Number ) , 
total visits. visid_high and visid_low determines a unique visit.
For each distinct visitor, calculate sum of maximum visit_page_num. In above 
example

14 + 20 = 34 should be the total visits for the customer 1003.

I am trying to run the following queries - Method 1 is almost the same as 
Method 2, except in Method 1 I only choose a particular customer number, 1003. 
In Method 2, I generalized to all.

In Method 1, I am getting the accurate result. In Method 2, I am not getting 
the same result as Method 1. 

Any suggestions on how to troubleshoot? Also, any alternative approaches?

// Method 1
select a.evar23,sum(b.max_visit_page_num) from
(select distinct visid_high,visid_low,evar23 from web.omniture_web_data where 
evar23='1003') a
JOIN
(select visid_high,visid_low,max(visit_page_num) as max_visit_page_num from 
omniture_web_data where evar23='1003' group by visid_high,visid_low) b
where a.visid_high=b.visid_high and a.visid_low=b.visid_low
group by a.evar23;

/ Result of Method 1

1003    34

// Method 2

create table temp123 as
select a.evar23,sum(b.max_visit_page_num) from
(select distinct visid_high,visid_low,evar23 from web.omniture_web_data) a
JOIN
(select visid_high,visid_low,max(visit_page_num) as max_visit_page_num from 
omniture_web_data group by visid_high,visid_low) b
where a.visid_high=b.visid_high and a.visid_low=b.visid_low
group by a.evar23;

select * from temp123 where evar23='1003';

// The Result of Method 2 is not the same as Method 1. It is showing a 
different number.



Thanks,
Raj

 

CONFIDENTIALITY NOTICE
==
This email message and any attachments are for the exclusive use of the 
intended recipient(s) and may contain confidential and privileged information. 
Any unauthorized review, use, disclosure or distribution is prohibited. If you 
are not the intended recipient, please contact the sender by reply email and 
destroy all copies of the original message along with any attachments, from 
your computer system. If you are the intended recipient, please be advised that 
the content of this message is subject to access, review and disclosure by the 
sender's Email System Administrator.

Help in debugging Hive Query

2013-07-25 Thread Raj Hadoop
All,
 
I am trying to determine visits for customer from omniture weblog file using 
Hive.
 
Table: omniture_web_data
Columns: visid_high,visid_low,evar23,visit_page_num
 
Sample Data:
visid_high,visid_low,evar23,visit_page_num
999,888,1003,10
999,888,1003,14
999,888,1003,6
999,777,1003,12
999,777,1003,20
 
I want to calculate for each Customer Number ( evar23 is  Customer Number ) , 
total visits. visid_high and visid_low determines a unique visit.
For each distinct visitor, calculate sum of maximum visit_page_num. In above 
example
 
14 + 20 = 34 should be the total visits for the customer 1003.
 
I am trying to run the following queries - Method 1 is almost the same as 
Method 2, except in Method 1 I only choose a particular customer number, 1003. 
In Method 2, I generalized to all.

In Method 1, I am getting the accurate result. In Method 2, I am not getting 
the same result as Method 1. 

Any suggestions on how to troubleshoot? Also, any alternative approaches?
 
// Method 1
select a.evar23,sum(b.max_visit_page_num) from
(select distinct visid_high,visid_low,evar23 from web.omniture_web_data where 
evar23='1003') a
JOIN
(select visid_high,visid_low,max(visit_page_num) as max_visit_page_num from 
omniture_web_data where evar23='1003' group by visid_high,visid_low) b
where a.visid_high=b.visid_high and a.visid_low=b.visid_low
group by a.evar23;
 
/ Result of Method 1
 
1003    34
 
// Method 2

create table temp123 as
select a.evar23,sum(b.max_visit_page_num) from
(select distinct visid_high,visid_low,evar23 from web.omniture_web_data) a
JOIN
(select visid_high,visid_low,max(visit_page_num) as max_visit_page_num from 
omniture_web_data group by visid_high,visid_low) b
where a.visid_high=b.visid_high and a.visid_low=b.visid_low
group by a.evar23;
 
select * from temp123 where evar23='1003';
 
// The Result of Method 2 is not the same as Method 1. It is showing a 
different number.
 
 
 
Thanks,
Raj

Oracle to Hive

2013-07-10 Thread Raj Hadoop
 
All,
 
Can anyone give me tips on how to convert the following Oracle SQL to a Hive 
query.
 
 
SELECT a.c100, a.c300, b.c400
  FROM t1 a JOIN t2 b ON a.c200 = b.c200
 WHERE a.c100 IN (SELECT DISTINCT a.c100
 FROM t1 a JOIN t2 b ON a.c200 = b.c200
    WHERE b.c400 >= SYSDATE - 1)
   AND b.c400 >= SYSDATE - 1
   AND a.c300 = 0
 
 
The SYSDATE can be replaced by 
date_sub(FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyy-MM-dd'), 1) in Hive.
 
But I wanted to know the rest of the query. Any pointers or tips so that I can 
start on my own.
 
Thanks in advance.
 
Regards,
Raj
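
Hive of that era does not accept IN (SELECT ...) in the WHERE clause, so the usual 
rewrite is a LEFT SEMI JOIN. An untested sketch of the shape, assuming c400 is 
comparable to a yyyy-MM-dd string:

SELECT a.c100, a.c300, b.c400
FROM t1 a
JOIN t2 b ON (a.c200 = b.c200)
LEFT SEMI JOIN (
  SELECT DISTINCT a.c100 AS c100
  FROM t1 a JOIN t2 b ON (a.c200 = b.c200)
  WHERE b.c400 >= date_sub(FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyy-MM-dd'), 1)
) f ON (a.c100 = f.c100)
WHERE b.c400 >= date_sub(FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyy-MM-dd'), 1)
  AND a.c300 = 0;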

Special characters in web log file causing issues

2013-07-08 Thread Raj Hadoop


Hi ,
 
The log file that I am trying to load throuh Hive has some special characters 
 
The field is shown below and the special characters ¿¿ are also shown:

Shockwave Flash
in;Motive ManagementPlug-in;Google Update;Java(TM)Platform SE 7U21;McAfee 
SiteAdvisor;McAfee Virtual Technician;Windows Live¿¿ Photo Gallery;McAfee 
SecurityCenter;Silverlig;Chrome Remote Desktop Viewer;NativeClient;Chrome PDF 
Viewer;Adobe Acrobat;Microsoft Office 2010;Motive Plug-

The above is causing the record to be terminated and another line to be loaded. 
How can I avoid this type of issue and load the proper data? Any suggestions 
please.

Thanks,
Raj

Re: Loading a flat file + one additional field to a Hive table

2013-07-05 Thread Raj Hadoop
Thanks Sanjay. I will look into this.

Also - one more question.

When I am trying to load log file to Hive and comparing the counts like this

select count(*) from <>

Versus

wc -l <>

I see a few hundred records greater in <>. How should I debug it? Any 
tips please.



 From: Sanjay Subramanian 
To: "user@hive.apache.org" ; Raj Hadoop 
 
Sent: Saturday, July 6, 2013 4:32 AM
Subject: Re: Loading a flat file + one additional field to a Hive table
 


How about this ?

Assume you have a log file called 
oompaloompa.log

TIMESTAMP=$(date +%Y_%m_%d_T%H_%M_%S);mv oompaloopa.log 
oompaloopa.log.${TIMESTAMP};cat oompaloopa.log.${TIMESTAMP}| hdfs dfs -put - 
/user/sasubramanian/oompaloopa.log.${TIMESTAMP}

This will directly put the file on HDFS and u can put it to the LOCATION 
specified by your HIVE TABLE definition

sanjay
 
From: "manishbh...@rocketmail.com" 
Reply-To: "user@hive.apache.org" 
Date: Friday, July 5, 2013 10:39 AM
To: Raj Hadoop , Hive 
Subject: Re: Loading a flat file + one additional field to a Hive table


Raj,

You should dump the data in a temp table first and then move the data into 
final table with select query.
Select date(), c1,c2. From temp table.
Reason: we should avoid custom operation in load unless it is necessary.


Sent via Rocket from my HTC 

- Reply message -
From: "Raj Hadoop" 
To: "Hive" 
Subject: Loading a flat file + one additional field to a Hive table
Date: Fri, Jul 5, 2013 10:30 PM


Hi,
 
Can any one please suggest the best way to do the following in Hive?
 
Load 'todays date stamp' + << ALL FIELDS C1,C2,C3,C4 IN A FILE F1 >> to a Hive 
table  T1 ( D1,C1,C2,C3,C4) 
 
Can the following command be modified in some way to acheive the above
hive > load data local inpath '/software/home/hadoop/dat_files/' into table 
T1; 
 
My requirement is to append a date stamp to a Web log file and then load it to 
Hive table.
 
Thanks,
Raj 

CONFIDENTIALITY NOTICE
==
This email message and any attachments are for the exclusive use of the 
intended recipient(s) and may contain confidential and privileged information. 
Any unauthorized review, use, disclosure or distribution is prohibited. If you 
are not the intended recipient,
 please contact the sender by reply email and destroy all copies of the 
original message along with any attachments, from your computer system. If you 
are the intended recipient, please be advised that the content of this message 
is subject to access, review
 and disclosure by the sender's Email System Administrator.

Loading a flat file + one additional field to a Hive table

2013-07-05 Thread Raj Hadoop
Hi,
 
Can any one please suggest the best way to do the following in Hive?
 
Load 'todays date stamp' + << ALL FIELDS C1,C2,C3,C4 IN A FILE F1 >> to a Hive 
table  T1 ( D1,C1,C2,C3,C4) 
 
Can the following command be modified in some way to acheive the above
hive > load data local inpath '/software/home/hadoop/dat_files/' into table 
T1; 
 
My requirement is to append a date stamp to a Web log file and then load it to 
Hive table.
 
Thanks,
Raj
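
A sketch of the staging approach from the replies (the file layout is made up; the 
target is T1(D1,C1,C2,C3,C4) as described):

CREATE TABLE weblog_staging (c1 STRING, c2 STRING, c3 STRING, c4 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

LOAD DATA LOCAL INPATH '/software/home/hadoop/dat_files/weblog.tsv'
OVERWRITE INTO TABLE weblog_staging;

INSERT INTO TABLE T1
SELECT from_unixtime(unix_timestamp(), 'yyyy-MM-dd') AS d1, c1, c2, c3, c4
FROM weblog_staging;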

Re: How Can I store the Hive query result in one file ?

2013-07-04 Thread Raj Hadoop


Adding to that

- Multiple files can be concatenated from the directory like
Example:  cat 0-0 00-1 0-2 > final




 From: Raj Hadoop 
To: "user@hive.apache.org" ; "matouk.iftis...@ysance.com" 
 
Sent: Friday, July 5, 2013 12:17 AM
Subject: Re: How Can I store the Hive query result in one file ?
 


 

 hive > set hive.io.output.fileformat=CSVTextFile;
 hive > insert overwrite local directory '/usr/home/hadoop/da1/' select * from 
customers

*** customers is a Hive table



 From: Edward Capriolo 
To: "user@hive.apache.org"  
Sent: Friday, July 5, 2013 12:10 AM
Subject: Re: How Can I store the Hive query result in one file ?
 


Normally if use set mapred.reduce.tasks=1 you get one output file. You can also 
look at
hive.merge.mapfiles, mapred.reduce.tasks, hive.merge.reducefiles also you can 
use a separate tool https://github.com/edwardcapriolo/filecrush




On Thu, Jul 4, 2013 at 6:38 AM, Nitin Pawar  wrote:

will hive -e "query" > filename  or hive -f query.q > filename will do ? 
>
>
>you specially want it to write into a named file on hdfs only? 
>
>
>
>On Thu, Jul 4, 2013 at 3:12 PM, Matouk IFTISSEN  
>wrote:
>
>Hello Hive users,
>>Is there a manner to store the Hive  query result (SELECT *.) in a 
>>specfique  and alone file (given the file name) like (INSERT OVERWRITE LOCAL 
>>DIRECTORY '/directory_path_name/')?
>>Thanks for your answers
>>
>>
>>
>
>
>
>-- 
>Nitin Pawar
>

Re: How Can I store the Hive query result in one file ?

2013-07-04 Thread Raj Hadoop
 

 hive > set hive.io.output.fileformat=CSVTextFile;
 hive > insert overwrite local directory '/usr/home/hadoop/da1/' select * from 
customers

*** customers is a Hive table



 From: Edward Capriolo 
To: "user@hive.apache.org"  
Sent: Friday, July 5, 2013 12:10 AM
Subject: Re: How Can I store the Hive query result in one file ?
 


Normally if use set mapred.reduce.tasks=1 you get one output file. You can also 
look at
hive.merge.mapfiles, mapred.reduce.tasks, hive.merge.reducefiles also you can 
use a separate tool https://github.com/edwardcapriolo/filecrush




On Thu, Jul 4, 2013 at 6:38 AM, Nitin Pawar  wrote:

will hive -e "query" > filename  or hive -f query.q > filename will do ? 
>
>
>you specially want it to write into a named file on hdfs only? 
>
>
>
>On Thu, Jul 4, 2013 at 3:12 PM, Matouk IFTISSEN  
>wrote:
>
>Hello Hive users,
>>Is there a manner to store the Hive  query result (SELECT *.) in a 
>>specfique  and alone file (given the file name) like (INSERT OVERWRITE LOCAL 
>>DIRECTORY '/directory_path_name/')?
>>Thanks for your answers
>>
>>
>>
>
>
>
>-- 
>Nitin Pawar
>

Issue with Oracle Hive Metastore (SEQUENCE_TABLE)

2013-07-03 Thread Raj Hadoop
Hi,
 
When I installed Hive earlier on my machine I used a oracle hive meta script. 
Please find attached the script. HIVE worked fine for me on this box with no 
issues.
 
I am trying to install Hive on another machine in a different Oracle metastore. 
I executed the meta script but I am having issues with my hive on second box.
 
$ hive
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use 
org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
Logging initialized using configuration in 
jar:file:/software/hadoop/hive/hive-0.9.0/lib/hive-common-0.9.0.jar!/hive-log4j.properties
Hive history file=/tmp/hadoop/hive_job_log_hadoop_201307031616_605717324.txt
hive> show tables;
FAILED: Error in metadata: javax.jdo.JDOException: Couldnt obtain a new 
sequence (unique id) : ORA-00942: table or view does not exist
NestedThrowables:
java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask

I found the difference between the two metastores and one table is missing in 
it. The table is SEQUENCE_TABLE. I do not know whether this table will be 
created automatically by Hive or whether it should be in the script. I don't 
remember what I did earlier and I am assuming I used the same script. Has 
anyone had this issue earlier? Please advise.
 
Also, Where to get the hive 0.9 oracle meta script?
 
Thanks,
Raj

hive-schema-0.9.0.oracle.sql
Description: Binary data
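
If the table is simply missing, its shape in the stock Oracle metastore scripts is 
roughly the following (from memory, so check a known-good 
hive-schema-0.9.0.oracle.sql before running anything); alternatively, setting 
datanucleus.autoCreateSchema=true in hive-site.xml lets DataNucleus create missing 
tables on first use:

CREATE TABLE SEQUENCE_TABLE (
  SEQUENCE_NAME VARCHAR2(255) NOT NULL,
  NEXT_VAL NUMBER NOT NULL
);
ALTER TABLE SEQUENCE_TABLE ADD CONSTRAINT SEQUENCE_TABLE_PK PRIMARY KEY (SEQUENCE_NAME);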


Re: Hive Table to CSV file

2013-07-01 Thread Raj Hadoop
Sorry. Its my bad.  I see the files now. I was looking in a different directory 
earlier.





 From: Mohammad Tariq 
To: user  
Sent: Monday, July 1, 2013 8:26 PM
Subject: Re: Hive Table to CSV file
 


Do you have permissions to write to this path?And make sure you are looking 
into the local FS, as Stephen has specified.


Warm Regards,
Tariq
cloudfront.blogspot.com



On Tue, Jul 2, 2013 at 5:25 AM, Stephen Sprague  wrote:

you gotta admit that's kinda funny.  Your stderr output shows not only once but 
three times where it put the output and in fact how many rows it put there.  
and to top it off it reported 'SUCCESS'.
>
>but you're saying there's nothing there? 
>
>now. call me crazy but i would tend to believe hive over you - but that's just 
>me. :)
>
>are you looking at the local filesystem on the same box you ran hive?
>
>
>
>
>On Mon, Jul 1, 2013 at 4:01 PM, Raj Hadoop  wrote:
>
>Hi,
>>
>>My requirement is to load data from a (one column) Hive view to a CSV file. 
>>After loading it, I dont see any file generated.
>>
>>I used the following commands to load data to file from a view v_june1
>>
>>
>>hive > set hive.io.output.fileformat=CSVTextFile;
>> hive > insert overwrite local directory '/usr/home/hadoop/da1/' select * 
>>from v_june1_pgnum 
>>
>>The output at the console is like the below. 
>>
>>
>>
>>MapReduce Total cumulative CPU time: 4 minutes 15 seconds 590 msec
>>Ended Job = job_201306141336_0113
>>Copying data to local directory /usr/home/hadoop/da1
>>Copying data to local directory /usr/home/hadoop/da1
>>3281 Rows loaded to /usr/home/hadoop/da1
>>MapReduce Jobs Launched:
>>Job 0: Map: 21  Reduce: 6   Cumulative CPU: 255.59 sec   HDFS Read: 
>>5373722496 HDFS Write: 389069 SUCCESS
>>Total MapReduce CPU Time Spent: 4 minutes 15 seconds 590 msec
>>OK
Time taken: 148.764 second
>>
>>
>>
>>My Question : I do not see any files created under /usr/home/hadoop/da1. 
>>Where are the files created?
>>
>>Thanks,
>>Raj
>>
>>
>>
>>
>

Hive Table to CSV file

2013-07-01 Thread Raj Hadoop
Hi,

My requirement is to load data from a (one-column) Hive view to a CSV file. 
After loading it, I don't see any file generated.

I used the following commands to load data to file from a view v_june1


hive > set hive.io.output.fileformat=CSVTextFile;
 hive > insert overwrite local directory '/usr/home/hadoop/da1/' select * from 
v_june1_pgnum 

The output at the console is like the below. 



MapReduce Total cumulative CPU time: 4 minutes 15 seconds 590 msec
Ended Job = job_201306141336_0113
Copying data to local directory /usr/home/hadoop/da1
Copying data to local directory /usr/home/hadoop/da1
3281 Rows loaded to /usr/home/hadoop/da1
MapReduce Jobs Launched:
Job 0: Map: 21  Reduce: 6   Cumulative CPU: 255.59 sec   HDFS Read: 5373722496 
HDFS Write: 389069 SUCCESS
Total MapReduce CPU Time Spent: 4 minutes 15 seconds 590 msec
OK
Time taken: 148.764 second



My Question : I do not see any files created under /usr/home/hadoop/da1. Where 
are the files created?

Thanks,
Raj
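
For anyone hitting the same confusion, a hedged recap of where the output lands (the view name and path are the ones from the question; the numbered file names are how Hive typically writes directory exports, but verify on your version):

-- writes one or more files named like 000000_0 into the local directory
INSERT OVERWRITE LOCAL DIRECTORY '/usr/home/hadoop/da1/'
SELECT * FROM v_june1_pgnum;

-- the "local" directory is created on the machine running the Hive CLI, so check there:
ls -l /usr/home/hadoop/da1/
cat /usr/home/hadoop/da1/0* > /usr/home/hadoop/da1/v_june1.csv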

TempStatsStore derby.log

2013-06-21 Thread Raj Hadoop
Hi,
 
I have Hive metastore created in an Oracle database. 
 
But when I execute my Hive queries, I see the following directory and file created:
TempStatsStore  (directory)
derby.log
 
What are these? Can anyone suggest why a Derby log is created even though my 
javax.jdo.option.ConnectionURL is pointing to Oracle?
 
Thanks,
Raj
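
If memory serves (worth verifying against the Hive configuration docs for your release), these come from Hive's statistics collection rather than from the metastore: hive.stats.dbclass defaults to a temporary Derby store, independent of javax.jdo.option.ConnectionURL. A hedged hive-site.xml sketch to avoid it:

<!-- sketch: turn off automatic stats gathering so Hive does not create the
     local Derby-backed TempStatsStore; hive.stats.dbclass can instead be
     repointed at another store if stats are still wanted -->
<property>
  <name>hive.stats.autogather</name>
  <value>false</value>
</property>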

Sqoop Oracle Import to Hive Table - Error in metadata: InvalidObjectException

2013-05-25 Thread Raj Hadoop
Hi,

I am trying to run the following to load an Oracle table to Hive table using 
Sqoop,


sqoop import --connect jdbc:oracle:thin:@//inferri.dm.com:1521/DBRM25 --table 
DS12.CREDITS --username UPX1 --password piiwer --hive-import

Note: DS12 is a schema and UPX1 is the user through which the schema and the 
table in it are accessed. I was able to access the table through the SQL*Plus 
client tool.


I am getting the following error. Can anyone identify the issue and let me 
know, please?

ERROR exec.Task (SessionState.java:printError(400)) - FAILED: Error in 
metadata: InvalidObjectException(message:There is no database named ds12)
org.apache.hadoop.hive.ql.metadata.HiveException: 
InvalidObjectException(message:There is no database named ds12)
    at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:544)
    at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:3305)
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:242)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:134)
    at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1326)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1118)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:951)
    at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:341)
    at 
org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:439)
    at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:449)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:647)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: InvalidObjectException(message:There is no database named dw)
    at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table(HiveMetaStore.java:852)
    at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:402)
    at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:538)
    ... 20 more

2013-05-25 17:37:14,276 ERROR ql.Driver (SessionState.java:printError(400)) - 
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask


Thanks,
Raj
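
One hedged guess at a workaround (not verified against this setup): Sqoop treats the schema prefix DS12 as the target Hive database name, so either create that database in Hive first or point --hive-table at an existing one. The connection details below are simply the placeholders from the original command:

# option 1: create the database Hive is complaining about
hive -e "CREATE DATABASE IF NOT EXISTS ds12;"

# option 2: import into the default database under an explicit table name
sqoop import --connect jdbc:oracle:thin:@//inferri.dm.com:1521/DBRM25 \
  --table DS12.CREDITS --username UPX1 --password piiwer \
  --hive-import --hive-table default.credits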

Re: Apache Flume Properties File

2013-05-24 Thread Raj Hadoop
Hi,
When I read material on the internet about Flume, most of it covers the 
CDH distribution. I am aware that Flume is Cloudera's contribution, but I am 
using a plain Apache version in my research work. While reading all this, 
I wanted to check with the forum whether Apache Flume has any known issues with 
installation, etc.

That is the reason I sent it to the distribution lists. My intention is 
not to be handed a solution on a silver platter; I am not expecting that. Anyway, sorry for 
the inconvenience.

Thanks,
Raj







 From: Stephen Sprague 
To: user@hive.apache.org; Raj Hadoop  
Sent: Friday, May 24, 2013 6:32 PM
Subject: Re: Apache Flume Properties File
 


so you spammed three big lists there, eh? with a general question for somebody 
to serve up a solution on a silver platter for you -- all before you even read 
any documentation on the subject matter?

nice job and good luck to you.




On Fri, May 24, 2013 at 2:13 PM, Raj Hadoop  wrote:

Hi,
> 
>I just installed Apache Flume 1.3.1 and am trying to run a small example to test it. 
>Can anyone suggest how I can do this? I am going through the documentation 
>right now.
> 
>Thanks,
>Raj

Apache Flume Properties File

2013-05-24 Thread Raj Hadoop
Hi,
 
I just installed Apache Flume 1.3.1 and am trying to run a small example to test it. 
Can anyone suggest how I can do this? I am going through the documentation 
right now.
 
Thanks,
Raj
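
For a first test, the single-node netcat-to-logger example from the Flume user guide is the usual starting point; a sketch (agent name, file name and port are arbitrary choices here):

# example.conf: one source, one channel, one sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# listen on a local TCP port
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# buffer events in memory
a1.channels.c1.type = memory

# print events to the log
a1.sinks.k1.type = logger

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Then run:
bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console
and send test lines with telnet localhost 44444.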

Sqoop Import Oracle Error - Attempted to generate class with no columns!

2013-05-22 Thread Raj Hadoop
Hi,
 
I just finished setting up Apache Sqoop 1.4.3. I am trying to test a basic Sqoop 
import from Oracle.
 
sqoop import --connect jdbc:oracle:thin:@//intelli.dmn.com:1521/DBT --table 
usr1.testonetwo --username usr123 --password passwd123
 
 
I am getting the following error: 
13/05/22 17:18:16 INFO manager.SqlManager: Executing SQL statement: SELECT t.* 
FROM usr1.testonetwo t WHERE 1=0
13/05/22 17:18:16 ERROR tool.ImportTool: Imported Failed: Attempted to generate 
class with no columns!
 
I checked the database, and the query runs fine from the Oracle SQL*Plus client and 
Toad.
 
Thanks,
Raj
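
A common cause of this with the Oracle connector is case sensitivity: table and schema names generally need to be given in upper case on the Sqoop command line. A hedged retry using the same placeholders as above:

sqoop import --connect jdbc:oracle:thin:@//intelli.dmn.com:1521/DBT \
  --table USR1.TESTONETWO --username usr123 --password passwd123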

Hive tmp logs

2013-05-22 Thread Raj Hadoop
Hi,
 
My Hive job logs are being written to the /tmp/hadoop directory. I want to change 
this to a different location, i.e. a subdirectory somewhere under the 'hadoop' 
user's home directory.
How do I change it?
 
Thanks,
Raj
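
A hedged sketch of the relevant setting (the target path is only an example and must exist and be writable by the hadoop user; hive.log.dir in conf/hive-log4j.properties controls hive.log separately):

<!-- hive-site.xml: move the per-session query/history logs out of /tmp/hadoop -->
<property>
  <name>hive.querylog.location</name>
  <value>/software/home/hadoop/hive-logs</value>
</property>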

ORA-01950: no privileges on tablespace

2013-05-21 Thread Raj Hadoop
 
I am setting up a metastore on Oracle for Hive. I executed the 
hive-schema-0.9.0 Oracle script successfully, too.
 
When I ran this:
hive > show tables;
 
I am getting the following error.
 
ORA-01950: no privileges on tablespace
 
What Oracle privileges (quota-wise) are required for the Hive metastore user in 
Oracle? Please advise.
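
The usual fix, as a hedged sketch (run as a DBA; the user and tablespace names are placeholders to substitute with your own):

-- grant the Hive metastore user quota on its default tablespace
ALTER USER hiveuser QUOTA UNLIMITED ON users;
-- or, more narrowly, a fixed quota
ALTER USER hiveuser QUOTA 500M ON users;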

Re: Where to get Oracle scripts for Hive Metastore

2013-05-21 Thread Raj Hadoop
Sanjay -
 
This is the first location I tried, but Apache Hive 0.9.0 doesn't have an oracle 
folder. It only has mysql and derby.
 
Thanks,
Raj



From: Sanjay Subramanian 
To: "u...@hadoop.apache.org" ; Raj Hadoop 
; Hive  
Sent: Tuesday, May 21, 2013 3:12 PM
Subject: Re: Where to get Oracle scripts for Hive Metastore



Raj

The correct location of the script is where you extracted the Hive tar. 

For example 
/usr/lib/hive/scripts/metastore/upgrade/oracle

You will find a file in this directory called hive-schema-0.9.0.oracle.sql

Use this

sanjay
From: Raj Hadoop 
Reply-To: "u...@hadoop.apache.org" , Raj Hadoop 

Date: Tuesday, May 21, 2013 12:08 PM
To: Hive , User 
Subject: Where to get Oracle scripts for Hive Metastore


I am trying to get Oracle scripts for Hive Metastore.

http://mail-archives.apache.org/mod_mbox/hive-commits/201204.mbox/%3c20120423201303.9742b2388...@eris.apache.org%3E

The scripts in the above link have a + at the beginning of each line. How am I 
supposed to execute scripts like this through Oracle SQL*Plus?

+CREATE TABLE PART_COL_PRIVS
+(
+    PART_COLUMN_GRANT_ID NUMBER NOT NULL,
+    "COLUMN_NAME" VARCHAR2(128) NULL,
+    CREATE_TIME NUMBER (10) NOT NULL,
+    GRANT_OPTION NUMBER (5) NOT NULL,
+    GRANTOR VARCHAR2(128) NULL,
+    GRANTOR_TYPE VARCHAR2(128) NULL,
+    PART_ID NUMBER NULL,
+    PRINCIPAL_NAME VARCHAR2(128) NULL,
+    PRINCIPAL_TYPE VARCHAR2(128) NULL,
+    PART_COL_PRIV VARCHAR2(128) NULL
+);
+





Re: Where to get Oracle scripts for Hive Metastore

2013-05-21 Thread Raj Hadoop
I got it. This is the link.
 
http://svn.apache.org/viewvc/hive/trunk/metastore/scripts/upgrade/oracle/hive-schema-0.9.0.oracle.sql?revision=1329416&view=co&pathrev=1329416


____
From: Raj Hadoop 
To: Hive ; User  
Sent: Tuesday, May 21, 2013 3:08 PM
Subject: Where to get Oracle scripts for Hive Metastore



I am trying to get Oracle scripts for Hive Metastore.

http://mail-archives.apache.org/mod_mbox/hive-commits/201204.mbox/%3c20120423201303.9742b2388...@eris.apache.org%3E

The scripts in the above link have a + at the beginning of each line. How am I 
supposed to execute scripts like this through Oracle SQL*Plus?

+CREATE TABLE PART_COL_PRIVS
+(
+    PART_COLUMN_GRANT_ID NUMBER NOT NULL,
+    "COLUMN_NAME" VARCHAR2(128) NULL,
+    CREATE_TIME NUMBER (10) NOT NULL,
+    GRANT_OPTION NUMBER (5) NOT NULL,
+    GRANTOR VARCHAR2(128) NULL,
+    GRANTOR_TYPE VARCHAR2(128) NULL,
+    PART_ID NUMBER NULL,
+    PRINCIPAL_NAME VARCHAR2(128) NULL,
+    PRINCIPAL_TYPE VARCHAR2(128) NULL,
+    PART_COL_PRIV VARCHAR2(128) NULL
+);
+

Where to get Oracle scripts for Hive Metastore

2013-05-21 Thread Raj Hadoop
I am trying to get Oracle scripts for Hive Metastore.
 
http://mail-archives.apache.org/mod_mbox/hive-commits/201204.mbox/%3c20120423201303.9742b2388...@eris.apache.org%3E
 
The scripts in the above link have a + at the beginning of each line. How am I 
supposed to execute scripts like this through Oracle SQL*Plus?
 
+CREATE TABLE PART_COL_PRIVS
+(
+    PART_COLUMN_GRANT_ID NUMBER NOT NULL,
+    "COLUMN_NAME" VARCHAR2(128) NULL,
+    CREATE_TIME NUMBER (10) NOT NULL,
+    GRANT_OPTION NUMBER (5) NOT NULL,
+    GRANTOR VARCHAR2(128) NULL,
+    GRANTOR_TYPE VARCHAR2(128) NULL,
+    PART_ID NUMBER NULL,
+    PRINCIPAL_NAME VARCHAR2(128) NULL,
+    PRINCIPAL_TYPE VARCHAR2(128) NULL,
+    PART_COL_PRIV VARCHAR2(128) NULL
+);
+
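
Those leading plus signs are just patch markers from the commit mail; a hedged sketch for stripping them before running the file in SQL*Plus (file names and credentials are placeholders):

# remove the leading '+' from every line
sed 's/^+//' hive-schema-0.9.0.oracle.from-mail.sql > hive-schema-0.9.0.oracle.sql

# then run the cleaned script against the metastore user
sqlplus hiveuser/hivepasswd @hive-schema-0.9.0.oracle.sql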

Re: hive.metastore.warehouse.dir - Should it point to a physical directory

2013-05-21 Thread Raj Hadoop
Thanks Sanjay




From: Sanjay Subramanian 
To: bharath vissapragada ; 
"user@hive.apache.org" ; Raj Hadoop  
Cc: User  
Sent: Tuesday, May 21, 2013 2:27 PM
Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical 
directory



Hi Raj

http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Quick-Start/cdh4qs_topic_3.html

Installing CDH4 on a Single Linux Node in Pseudo-distributed Mode

On the left panel of the page u will find info on Hive installation etc.

I suggest the CDH4 distribution only because it helps you get started quickly… as a 
developer I love to install from individual tarballs, but sometimes there is 
little time to learn and execute

There are some great notes here 

sanjay

From: bharath vissapragada 
Date: Tuesday, May 21, 2013 11:12 AM
To: "user@hive.apache.org" , Raj Hadoop 

Cc: Sanjay Subramanian , User 

Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical 
directory




Yes !

On Tue, May 21, 2013 at 11:41 PM, Raj Hadoop  wrote:

So that means I need to create an HDFS directory (not an OS physical directory) 
under Hadoop, which is then used in the Hive config file for this 
property. Right?
>
>
>
>From: Dean Wampler 
>To: Raj Hadoop  
>Cc: Sanjay Subramanian ; 
>"user@hive.apache.org" ; User  
>Sent: Tuesday, May 21, 2013 2:06 PM 
>
>Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical 
>directory
>
>
>
>No, you only need a directory in HDFS, which will be "virtually located" 
>somewhere in your cluster automatically by HDFS. 
>
>
>Also there's a typo in your hive.xml:
>
>  <value></value>
>
>Should be
>
>  <value>/correct/path/in/hdfs/to/your/warehouse/directory</value>
>
>
>On Tue, May 21, 2013 at 1:04 PM, Raj Hadoop  wrote:
>
>Thanks Sanjay.
>> 
>>My environment is  like this.
>> 
>>$ echo $HADOOP_HOME
>>/software/home/hadoop/hadoop/hadoop-1.1.2
>> 
>>$ echo $HIVE_HOME
>>/software/home/hadoop/hive/hive-0.9.0
>>
>>$ id
>>uid=50052(hadoop) gid=600(apps) groups=600(apps)
>>
>> 
>>So can i do like this:
>> 
>>$pwd
>>/software/home/hadoop/hive/hive-0.9.0
>> 
>>$mkdir warehouse
>> 
>>$cd /software/home/hadoop/hive/hive-0.9.0/warehouse
>> 
>>$ in hive-site.xml:
>><property>
>>  <name>hive.metastore.warehouse.dir</name>
>>  <value></value>
>>  <description>location of default database for the warehouse</description>
>></property>
>> 
>>Where should I create the HDFS directory ?
>> 
>>
>>From: Sanjay Subramanian 
>>To: "user@hive.apache.org" ; Raj Hadoop 
>>; Dean Wampler  
>>Cc: User  
>>Sent: Tuesday, May 21, 2013 1:53 PM 
>>
>>Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical 
>>directory
>>
>>
>>
>>Notes below
>>
>>From: Raj Hadoop 
>>Reply-To: "user@hive.apache.org" , Raj Hadoop 
>>
>>Date: Tuesday, May 21, 2013 10:49 AM
>>To: Dean Wampler , "user@hive.apache.org" 
>>
>>Cc: User 
>>Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical 
>>directory
>>
>>
>>
>>Ok.I got it. My questions -
>> 
>>1) Should a local physical directory be created before using this property?
>>I created a directory in HDFS during Hive installation
>>/user/hive/warehouse
>>
>>
>>My hive-site.xml has the following property defined:
>>
>><property>
>>  <name>hive.metastore.warehouse.dir</name>
>>  <value>/user/hive/warehouse</value>
>>  <description>location of default database for the warehouse</description>
>></property>
>>
>>2) Should a HDFS file directory be created from Hadoop before using this 
>>property?
>>hdfs dfs -mkdir /user/hive/warehouse
>>Change the owner:group to hive:hive 
>> 
>>
>>
>>From: Dean Wampler 
>>To: user@hive.apache.org; Raj Hadoop  
>>Cc: User  
>>Sent: Tuesday, May 21, 2013 1:44 PM
>>Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical 
>>directory
>>
>>
>>
>>The name is misleading; this is the directory within HDFS where Hive stores 
>>the data, by default. (External tables can go elsewhere). It doesn't really 
>>have anything to do with the metastore. 
>>
>>
>>dean
>>
>>
>>On Tue, May 21, 2013 at 12:42 PM, Raj Hadoop  wrote:
>>
>>Can some one help me on this ? I am stuck installing and configuring Hive 
>>with Oracle. Your timely help is really aprreciated.
>>>
>>>
>>>
>>>From: Raj Hadoop 
>>>To: Hive ; User  
>>>Sent: Tuesday, May 21, 2013 1:08 PM
>>>S

Re: hive.metastore.warehouse.dir - Should it point to a physical directory

2013-05-21 Thread Raj Hadoop
So that means I need to create an HDFS directory (not an OS physical directory) 
under Hadoop, which is then used in the Hive config file for this 
property. Right?




From: Dean Wampler 
To: Raj Hadoop  
Cc: Sanjay Subramanian ; 
"user@hive.apache.org" ; User  
Sent: Tuesday, May 21, 2013 2:06 PM
Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical 
directory



No, you only need a directory in HDFS, which will be "virtually located" 
somewhere in your cluster automatically by HDFS. 

Also there's a typo in your hive.xml:

  <value></value>

Should be

  <value>/correct/path/in/hdfs/to/your/warehouse/directory</value>

On Tue, May 21, 2013 at 1:04 PM, Raj Hadoop  wrote:

Thanks Sanjay.
> 
>My environment is  like this.
> 
>$ echo $HADOOP_HOME
>/software/home/hadoop/hadoop/hadoop-1.1.2
> 
>$ echo $HIVE_HOME
>/software/home/hadoop/hive/hive-0.9.0
>
>$ id
>uid=50052(hadoop) gid=600(apps) groups=600(apps)
>
> 
>So can i do like this:
> 
>$pwd
>/software/home/hadoop/hive/hive-0.9.0
> 
>$mkdir warehouse
> 
>$cd /software/home/hadoop/hive/hive-0.9.0/warehouse
> 
>$ in hive-site.xml:
><property>
>  <name>hive.metastore.warehouse.dir</name>
>  <value></value>
>  <description>location of default database for the warehouse</description>
></property>
> 
>Where should I create the HDFS directory ?
> 
>
>From: Sanjay Subramanian 
>To: "user@hive.apache.org" ; Raj Hadoop 
>; Dean Wampler  
>Cc: User  
>Sent: Tuesday, May 21, 2013 1:53 PM 
>
>Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical 
>directory
>
>
>
>Notes below
>
>From: Raj Hadoop 
>Reply-To: "user@hive.apache.org" , Raj Hadoop 
>
>Date: Tuesday, May 21, 2013 10:49 AM
>To: Dean Wampler , "user@hive.apache.org" 
>
>Cc: User 
>Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical 
>directory
>
>
>
>Ok.I got it. My questions -
> 
>1) Should a local physical directory be created before using this property?
>I created a directory in HDFS during Hive installation
>/user/hive/warehouse
>
>
>My hive-site.xml has the following property defined:
>
><property>
>  <name>hive.metastore.warehouse.dir</name>
>  <value>/user/hive/warehouse</value>
>  <description>location of default database for the warehouse</description>
></property>
>
>2) Should a HDFS file directory be created from Hadoop before using this 
>property?
>hdfs dfs -mkdir /user/hive/warehouse
>Change the owner:group to hive:hive 
> 
>
>
>From: Dean Wampler 
>To: user@hive.apache.org; Raj Hadoop  
>Cc: User  
>Sent: Tuesday, May 21, 2013 1:44 PM
>Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical 
>directory
>
>
>
>The name is misleading; this is the directory within HDFS where Hive stores 
>the data, by default. (External tables can go elsewhere). It doesn't really 
>have anything to do with the metastore. 
>
>
>dean
>
>
>On Tue, May 21, 2013 at 12:42 PM, Raj Hadoop  wrote:
>
>Can some one help me on this ? I am stuck installing and configuring Hive with 
>Oracle. Your timely help is really aprreciated.
>>
>>
>>
>>From: Raj Hadoop 
>>To: Hive ; User  
>>Sent: Tuesday, May 21, 2013 1:08 PM
>>Subject: hive.metastore.warehouse.dir - Should it point to a physical 
>>directory
>>
>>
>>
>>Hi,
>>
>>I am configurinig Hive. I ahve a question on the property 
>>hive.metastore.warehouse.dir.
>>
>>Should this point to a physical directory. I am guessing it is a logical 
>>directory under Hadoop fs.default.name. Please advise whether I need to 
>>create any directory for the variable hive.metastore.warehouse.dir
>>
>>Thanks,
>>Raj
>>
>>
>
>
>
>-- 
>Dean Wampler, Ph.D.
>@deanwampler
>http://polyglotprogramming.com/
>
>
>
>
>
>


-- 
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com/ 

Re: hive.metastore.warehouse.dir - Should it point to a physical directory

2013-05-21 Thread Raj Hadoop
Yes, that's what I meant: a local physical directory. Thanks.




From: bharath vissapragada 
To: user@hive.apache.org; Raj Hadoop  
Cc: User  
Sent: Tuesday, May 21, 2013 1:59 PM
Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical 
directory



Hi, 

If by "local physical directory" you mean a directory in the underlying OS file 
system, then No. You just need to create a directory in HDFS and ad it to that 
xml config file.

Thanks,



On Tue, May 21, 2013 at 11:19 PM, Raj Hadoop  wrote:

Ok.I got it. My questions -
> 
>1) Should a local physical directory be created before using this property?
>2) Should a HDFS file directory be created from Hadoop before using this 
>property?
> 
> 
>
>
>From: Dean Wampler 
>To: user@hive.apache.org; Raj Hadoop  
>Cc: User  
>Sent: Tuesday, May 21, 2013 1:44 PM
>Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical 
>directory
>
>
>
>The name is misleading; this is the directory within HDFS where Hive stores 
>the data, by default. (External tables can go elsewhere). It doesn't really 
>have anything to do with the metastore. 
>
>
>dean
>
>
>On Tue, May 21, 2013 at 12:42 PM, Raj Hadoop  wrote:
>
>Can some one help me on this ? I am stuck installing and configuring Hive with 
>Oracle. Your timely help is really aprreciated.
>>
>>
>>
>>From: Raj Hadoop 
>>To: Hive ; User  
>>Sent: Tuesday, May 21, 2013 1:08 PM
>>Subject: hive.metastore.warehouse.dir - Should it point to a physical 
>>directory
>>
>>
>>
>>Hi,
>>
>>I am configurinig Hive. I ahve a question on the property 
>>hive.metastore.warehouse.dir.
>>
>>Should this point to a physical directory. I am guessing it is a logical 
>>directory under Hadoop fs.default.name. Please advise whether I need to 
>>create any directory for the variable hive.metastore.warehouse.dir
>>
>>Thanks,
>>Raj
>>
>>
>
>
>
>-- 
>Dean Wampler, Ph.D.
>@deanwampler
>http://polyglotprogramming.com/ 
>
>

Re: hive.metastore.warehouse.dir - Should it point to a physical directory

2013-05-21 Thread Raj Hadoop
Thanks Sanjay.
 
My environment is  like this.
 
$ echo $HADOOP_HOME
/software/home/hadoop/hadoop/hadoop-1.1.2
 
$ echo $HIVE_HOME
/software/home/hadoop/hive/hive-0.9.0

$ id
uid=50052(hadoop) gid=600(apps) groups=600(apps)

 
So can i do like this:
 
$pwd
/software/home/hadoop/hive/hive-0.9.0
 
$mkdir warehouse
 
$cd /software/home/hadoop/hive/hive-0.9.0/warehouse
 
$ in hive-site.xml:
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value></value>
  <description>location of default database for the warehouse</description>
</property>

 
Where should I create the HDFS directory?
 
 


From: Sanjay Subramanian 
To: "user@hive.apache.org" ; Raj Hadoop 
; Dean Wampler  
Cc: User  
Sent: Tuesday, May 21, 2013 1:53 PM
Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical 
directory



Notes below
From: Raj Hadoop 
Reply-To: "user@hive.apache.org" , Raj Hadoop 

Date: Tuesday, May 21, 2013 10:49 AM
To: Dean Wampler , "user@hive.apache.org" 

Cc: User 
Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical 
directory


OK, I got it. My questions:
 
1) Should a local physical directory be created before using this property?
I created a directory in HDFS during Hive installation
/user/hive/warehouse

My hive-site.xml has the following property defined:

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>

2) Should a HDFS file directory be created from Hadoop before using this 
property?
hdfs dfs -mkdir /user/hive/warehouse
Change the owner:group to hive:hive 
 



From: Dean Wampler 
To: user@hive.apache.org; Raj Hadoop  
Cc: User  
Sent: Tuesday, May 21, 2013 1:44 PM
Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical 
directory



The name is misleading; this is the directory within HDFS where Hive stores the 
data, by default. (External tables can go elsewhere). It doesn't really have 
anything to do with the metastore. 

dean


On Tue, May 21, 2013 at 12:42 PM, Raj Hadoop  wrote:

Can some one help me on this ? I am stuck installing and configuring Hive with 
Oracle. Your timely help is really aprreciated.
>
>
>
>From: Raj Hadoop 
>To: Hive ; User  
>Sent: Tuesday, May 21, 2013 1:08 PM
>Subject: hive.metastore.warehouse.dir - Should it point to a physical directory
>
>
>
>Hi,
>
>I am configurinig Hive. I ahve a question on the property 
>hive.metastore.warehouse.dir.
>
>Should this point to a physical directory. I am guessing it is a logical 
>directory under Hadoop fs.default.name. Please advise whether I need to create 
>any directory for the variable hive.metastore.warehouse.dir
>
>Thanks,
>Raj
>
>


-- 
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com/
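
Pulling the thread's advice together, a hedged sketch of the two steps (paths and permissions are illustrative; on this Hadoop 1.1.2 setup the commands would run as the hadoop user):

# create the warehouse directory in HDFS, not on the local file system
hadoop fs -mkdir /user/hive/warehouse
hadoop fs -chmod g+w /user/hive/warehouse

<!-- hive-site.xml: point Hive at that HDFS directory -->
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>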




Re: hive.metastore.warehouse.dir - Should it point to a physical directory

2013-05-21 Thread Raj Hadoop
OK, I got it. My questions:
 
1) Should a local physical directory be created before using this property?
2) Should a HDFS file directory be created from Hadoop before using this 
property?
 
 



From: Dean Wampler 
To: user@hive.apache.org; Raj Hadoop  
Cc: User  
Sent: Tuesday, May 21, 2013 1:44 PM
Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical 
directory



The name is misleading; this is the directory within HDFS where Hive stores the 
data, by default. (External tables can go elsewhere). It doesn't really have 
anything to do with the metastore. 

dean


On Tue, May 21, 2013 at 12:42 PM, Raj Hadoop  wrote:

Can some one help me on this ? I am stuck installing and configuring Hive with 
Oracle. Your timely help is really aprreciated.
>
>
>
>From: Raj Hadoop 
>To: Hive ; User  
>Sent: Tuesday, May 21, 2013 1:08 PM
>Subject: hive.metastore.warehouse.dir - Should it point to a physical directory
>
>
>
>Hi,
>
>I am configurinig Hive. I ahve a question on the property 
>hive.metastore.warehouse.dir.
>
>Should this point to a physical directory. I am guessing it is a logical 
>directory under Hadoop fs.default.name. Please advise whether I need to create 
>any directory for the variable hive.metastore.warehouse.dir
>
>Thanks,
>Raj
>
>


-- 
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com/ 

Re: hive.metastore.warehouse.dir - Should it point to a physical directory

2013-05-21 Thread Raj Hadoop
Can someone help me with this? I am stuck installing and configuring Hive with 
Oracle. Your timely help is really appreciated.




From: Raj Hadoop 
To: Hive ; User  
Sent: Tuesday, May 21, 2013 1:08 PM
Subject: hive.metastore.warehouse.dir - Should it point to a physical directory



Hi,

I am configuring Hive. I have a question on the property 
hive.metastore.warehouse.dir.

Should this point to a physical directory? I am guessing it is a logical 
directory under Hadoop fs.default.name. Please advise whether I need to create 
any directory for the variable hive.metastore.warehouse.dir.

Thanks,
Raj

hive.metastore.warehouse.dir - Should it point to a physical directory

2013-05-21 Thread Raj Hadoop
Hi,
 
I am configuring Hive. I have a question on the property 
hive.metastore.warehouse.dir.
 
Should this point to a physical directory? I am guessing it is a logical 
directory under Hadoop fs.default.name. Please advise whether I need to create 
any directory for the variable hive.metastore.warehouse.dir.
 
Thanks,
Raj
