sqoop issue
I'm using Hadoop 2.5.1 and Sqoop 1.4.6. I am using sqoop import to import a table from a MySQL database for use with Hadoop. It fails with the following error:

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.fs.FSOutputSummer
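No reply appears in the archive, but a NoSuchMethodError on a Hadoop class at startup almost always means mixed Hadoop versions on the classpath, e.g. Sqoop resolving jars from a different Hadoop release than the running cluster. Two quick hedged checks:

  echo $HADOOP_COMMON_HOME $HADOOP_MAPRED_HOME   # should point at the Hadoop 2.5.1 install Sqoop runs against
  ls $SQOOP_HOME/lib | grep hadoop               # look for stray hadoop-* jars from another release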
How to handle RAW data type of oracle in SQOOP import
How to handle RAW data type of oracle in SQOOP import
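No answer is recorded for this entry; a commonly suggested approach is to override Sqoop's type mapping so the RAW column lands as a string (RAW_COL and the connection details are placeholders):

  sqoop import \
    --connect jdbc:oracle:thin:@//dbhost:1521/orcl \
    --username scott -P \
    --table MYTABLE \
    --map-column-java RAW_COL=String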
Re: Permission denied: user=root, access=WRITE, inode="/user/root":hdfs:hdfs:drwxr-xr-x
Seems it's already present, Amrit:

hdpmaster001:~ # useradd -G hdfs root
useradd: Account `root' already exists.
hdpmaster001:~ #

On Wed, Oct 5, 2016 at 2:46 PM, Amrit Jangid wrote:
> Hi Raj
> Do add the root user to the hdfs group. Run this command on your NameNode server:
> useradd -G hdfs root

On Wed, Oct 5, 2016 at 2:07 PM, Raj hadoop wrote:
>> I'm getting it when I'm trying to start Hive:
>> hdpmaster001:~ # hive
>> WARNING: Use "yarn jar" to launch YARN applications.
>> How can I execute the same?
>> Thanks, Raj.

On Wed, Oct 5, 2016 at 1:56 PM, Raj hadoop wrote:
>>> Hi All,
>>> Could someone help me solve this issue?
>>> Logging initialized using configuration in file:/etc/hive/2.4.2.0-258/0/hive-log4j.properties
>>> Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE, inode="/user/root":hdfs:hdfs:drwxr-xr-x
>>> [snip]
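Two notes on the fix attempts above (sketches; the thread itself never confirms a resolution): for an account that already exists, group membership is changed with usermod rather than useradd, and the error itself only says that root has no writable home directory in HDFS, which the hdfs superuser can create directly:

  usermod -a -G hdfs root

  sudo -u hdfs hdfs dfs -mkdir -p /user/root
  sudo -u hdfs hdfs dfs -chown root:root /user/root

Hive needs /user/<username> to exist and be writable by the user starting the CLI. Adding root to the hdfs group does not by itself grant write access, since the directory is drwxr-xr-x (no group write bit).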
Re: Permission denied: user=root, access=WRITE, inode="/user/root":hdfs:hdfs:drwxr-xr-x
I'm getting it when I'm trying to start Hive:

hdpmaster001:~ # hive
WARNING: Use "yarn jar" to launch YARN applications.

How can I execute the same?
Thanks, Raj.

On Wed, Oct 5, 2016 at 1:56 PM, Raj hadoop wrote:
> Hi All,
> Could someone help me solve this issue?
> Logging initialized using configuration in file:/etc/hive/2.4.2.0-258/0/hive-log4j.properties
> Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE, inode="/user/root":hdfs:hdfs:drwxr-xr-x
> [snip]
Permission denied: user=root, access=WRITE, inode="/user/root":hdfs:hdfs:drwxr-xr-x
Hi All,

Could someone help me solve this issue?

Logging initialized using configuration in file:/etc/hive/2.4.2.0-258/0/hive-log4j.properties
Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE, inode="/user/root":hdfs:hdfs:drwxr-xr-x
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:292)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:213)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1780)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1764)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1747)
        at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:71)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3972)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1081)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:630)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2206)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2202)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2200)

        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:516)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:680)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:624)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE, inode="/user/root":hdfs:hdfs:drwxr-xr-x
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:292)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:213)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1780)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1764)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1747)
        at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:71)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3972)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1081)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:630)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2206)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2202)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
Re: hive concurrency not working
Thanks everyone, we are raising a case with Hortonworks.

On Wed, Aug 3, 2016 at 6:44 PM, Raj hadoop wrote:
> Dear All,
> In need of your help:
> We have a Hortonworks 4-node cluster, and the problem is that Hive is allowing only one user at a time; if a second user needs to log in, Hive does not work.
> Could someone please help me with this?
> Thanks,
> Rajesh
hive concurrency not working
Dear All,

In need of your help: we have a Hortonworks 4-node cluster, and the problem is that Hive is allowing only one user at a time. If a second user needs to log in, Hive does not work. Could someone please help me with this?

Thanks,
Rajesh
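The thread never reports a root cause, but the classic reason a Hive deployment serves only one session at a time is a metastore backed by the embedded Derby database, which accepts a single connection. A sketch of the usual remedy, pointing clients at a shared metastore service (host and port assumed):

  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host:9083</value>
  </property>

If the goal is concurrent queries with locking, hive.support.concurrency=true together with a ZooKeeper quorum (hive.zookeeper.quorum) is the usual companion setting.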
Re: Unable to start Hive CLI after install
Hi Mich - I did all those steps. Somehow I am not able to find out what the issue is. Can you suggest any debugging tips?

Regards,
Rajendra

On Monday, April 4, 2016 12:16 PM, Mich Talebzadeh wrote:
> Hi Raj,
> Hive 2 is good to go :) Check this: I see that you are using Oracle DB as your metastore. Mine is Oracle as well:
>   javax.jdo.option.ConnectionURL = jdbc:oracle:thin:@rhes564:1521:mydb  (JDBC connect string for a JDBC metastore)
> You also need the username/password for your metastore:
>   javax.jdo.option.ConnectionUserName = hiveuser  (username to use against metastore database)
>   javax.jdo.option.ConnectionPassword = xxx  (password to use against metastore database)
> Now you also need to put the Oracle jar file ojdbc6.jar in $HIVE_HOME/lib, otherwise you won't be able to connect.
> HTH
> Dr Mich Talebzadeh
> LinkedIn https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> http://talebzadehmich.wordpress.com

On 4 April 2016 at 20:02, Raj Hadoop wrote:
>> Sorry for the typo with your name - Mich.

On Monday, April 4, 2016 12:01 PM, Raj Hadoop wrote:
>>> Thanks Mike. If Hive 2.0 is stable, I would definitely go for it. But let me troubleshoot the 1.1.1 issues I am facing now. Here is my hive-site.xml. [snip]
Re: Unable to start Hive CLI after install
Thanks Mike. If Hive 2.0 is stable, I would definitely go for it. But let me troubleshoot the 1.1.1 issues I am facing now. Here is my hive-site.xml; can you please let me know if I am missing anything?

<property><name>hive.exec.scratchdir</name><value>/tmp/hive</value></property>
<property><name>hive.metastore.local</name><value>false</value></property>
<property><name>hive.metastore.warehouse.dir</name><value>hdfs://z1:8899/user/hive/warehouse</value></property>
<property><name>javax.jdo.option.ConnectionURL</name><value>jdbc:oracle:thin:@//z4:1521/xe</value></property>
<property><name>javax.jdo.option.ConnectionDriverName</name><value>com.oracle.jdbc.Driver</value></property>
<property><name>javax.jdo.option.ConnectionUserName</name><value>hive</value></property>
<property><name>javax.jdo.option.ConnectionPassword</name><value>hive</value></property>
<property><name>hive.querylog.location</name><value>$HIVE_HOME/iotmp</value><description>Location of Hive run time structured log file</description></property>
<property><name>hive.exec.local.scratchdir</name><value>$HIVE_HOME/iotmp</value><description>Local scratch space for Hive jobs</description></property>
<property><name>hive.downloaded.resources.dir</name><value>$HIVE_HOME/iotmp</value><description>Temporary local directory for added resources in the remote file system.</description></property>

On Monday, April 4, 2016 11:46 AM, Mich Talebzadeh wrote:
> Interesting. Why did you not download Hive 2.0, which is out now? The error says:
>   HiveConf of name hive.metastore.local does not exist
> In your hive-site.xml, how have you configured the hive.metastore parameters?
> HTH
> Dr Mich Talebzadeh
> LinkedIn https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> http://talebzadehmich.wordpress.com

On 4 April 2016 at 18:25, Raj Hadoop wrote:
>> Hi,
>> I have downloaded Apache Hive 1.1.1 and am trying to set up the Hive environment in my Hadoop cluster. [snip]

Regards,
Raj
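One detail worth flagging in the config above (hedged, since the thread never resolves it): javax.jdo.option.ConnectionDriverName is set to com.oracle.jdbc.Driver, but the Oracle thin JDBC driver class shipped in ojdbc6.jar is oracle.jdbc.OracleDriver (historically oracle.jdbc.driver.OracleDriver). A wrong driver class is one classic cause of "Error creating transactional connection factory":

  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>oracle.jdbc.OracleDriver</value>
  </property>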
Re: Unable to start Hive CLI after install
Sorry for the typo with your name - Mich.

On Monday, April 4, 2016 12:01 PM, Raj Hadoop wrote:
> Thanks Mike. If Hive 2.0 is stable, I would definitely go for it. But let me troubleshoot the 1.1.1 issues I am facing now. Here is my hive-site.xml; can you please let me know if I am missing anything? [snip]

On Monday, April 4, 2016 11:46 AM, Mich Talebzadeh wrote:
>> Interesting. Why did you not download Hive 2.0, which is out now? The error says:
>>   HiveConf of name hive.metastore.local does not exist
>> In your hive-site.xml, how have you configured the hive.metastore parameters? [snip]
Unable to start Hive CLI after install
Hi,

I have downloaded Apache Hive 1.1.1 and am trying to set up the Hive environment in my Hadoop cluster. I installed Hive on one of the nodes, and when I set all the variables and the environment I get the following error. Please advise.

[hadoop@z1 bin]$ hive
2016-04-04 10:12:45,686 WARN [main] conf.HiveConf (HiveConf.java:initialize(2605)) - HiveConf of name hive.metastore.local does not exist
Logging initialized using configuration in jar:file:/home/hadoop/hive/hive111/lib/hive-common-1.1.1.jar!/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop262/hadoop262/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hive/hive111/lib/hive-jdbc-1.1.1-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:472)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
        at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1485)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:64)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:74)
        at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2841)
        at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2860)
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:453)
        ... 8 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1483)
        ... 13 more
Caused by: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
NestedThrowables:
java.lang.reflect.InvocationTargetException
        at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:587)
        at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:78

Regards,
Raj
HCatStorer error
We are facing the below error when storing a dataset using HCatStorer; can someone please help us?

STORE F INTO 'default.CONTENT_SVC_USED' USING org.apache.hive.hcatalog.pig.HCatStorer();

ERROR hive.log - Got exception: java.net.URISyntaxException Malformed escape pair at index 9: thrift://%HOSTGROUP::host_group_master1%:9933
java.net.URISyntaxException: Malformed escape pair at index 9: thrift://%HOSTGROUP::host_group_master1%:9933

Thanks,
Raj
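The %HOSTGROUP::...% token is Ambari blueprint placeholder syntax that should have been substituted with a real hostname at deploy time; java.net.URI rejects the literal % characters. A hedged fix is to set the metastore URI to the actual host in hive-site.xml (the hostname below is a placeholder):

  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://master1.example.com:9933</value>
  </property>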
select * from table and select column from table in hive
I am able to see the data in the table for all the columns when I issue the following:

SELECT * FROM t1 WHERE dt1='2013-11-20';

But I am unable to see the column data when I issue the following:

SELECT cust_num FROM t1 WHERE dt1='2013-11-20';

The above shows NULL values. How should I debug this?
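Two hedged checks, since the post alone does not pin the cause: compare the declared schema and SerDe with the actual file layout, and look at the raw data for that partition. A column that reads back as NULL usually means the declared delimiter, column order, or column type does not line up with the bytes in the files.

  DESCRIBE FORMATTED t1;
  DESCRIBE FORMATTED t1 PARTITION (dt1='2013-11-20');

Then inspect one raw file under the partition directory (the path is a placeholder):

  hdfs dfs -cat /user/hive/warehouse/t1/dt1=2013-11-20/000000_0 | head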
Re: Remove duplicate records in Hive
Thank you all for your suggestions. This group is the best. I am working with the different options you suggested. One big question I have: I am good at writing Oracle SQL queries, but the syntax in Hive is different. In particular, writing multiple SELECT statements in a single Hive query has become a challenge. Can the group suggest any good tutorial that explains the basics of writing complex queries in Hive?

Regards,
Rajendra

On Thursday, September 11, 2014 2:48 AM, vivek thakre wrote:
> Considering that the records differ only by one column, i.e. if the first two columns are unique (distinct), then you can simply use GROUP BY with MAX as the aggregation function to eliminate duplicates:
>
>   select cno, sqno, max(date) from table group by cno, sqno;
>
> If the above assumption is not true, i.e. if cno and sqno are not unique and for a particular cno you want the sqno with the latest date, then you can do an inner join with a max select query, something like:
>
>   select a.cno, a.sqno, a.date
>   from table a
>   join (select cno, max(date) as max_date from table group by cno) b
>   on a.cno = b.cno and a.date = b.max_date;

On Wed, Sep 10, 2014 at 3:39 PM, Nishant Kelkar wrote:
>> Try something like this then:
>>
>>   SELECT A.cno, A.sqno, A.sorted_dates[A.size-1] AS latest_date
>>   FROM (
>>     SELECT cno, sqno,
>>            SORT_ARRAY(COLLECT_SET(date)) AS sorted_dates,
>>            SIZE(COLLECT_SET(date)) AS size
>>     FROM table GROUP BY cno, sqno
>>   ) A;
>>
>> There are better ways of doing this, but this one's quick and dirty :)
>> Best Regards, Nishant Kelkar

On Wed, Sep 10, 2014 at 12:48 PM, Raj Hadoop wrote:
>>> sort_array returns in ascending order, so the first element cannot be the largest date; the last element is the largest date. [snip]
Re: Remove duplicate records in Hive
sort_array returns in ascending order, so the first element cannot be the largest date; the last element is the largest date.

On Wednesday, September 10, 2014 3:38 PM, Nishant Kelkar wrote:
> Hi Raj,
> You'll have to change the format of your date to something like YYYY-MM-DD. For example, "2-oct-2013" becomes 2013-10-02.
> Best Regards, Nishant Kelkar

On Wed, Sep 10, 2014 at 11:48 AM, Raj Hadoop wrote:
>> The SORT_ARRAY(COLLECT_SET(date))[0] AS latest_date is returning the lowest date. I need the largest date. [snip]
Re: Remove duplicate records in Hive
The SORT_ARRAY(COLLECT_SET(date))[0] AS latest_date is returning the lowest date. I need the largest date.

On Wed, 9/10/14, Raj Hadoop wrote:
> Thanks. I will try it.

On Wed, 9/10/14, Nishant Kelkar wrote:
>> Hi Raj,
>> You can do something along these lines:
>>
>>   SELECT cno, sqno, SORT_ARRAY(COLLECT_SET(date))[0] AS latest_date
>>   FROM table GROUP BY cno, sqno;
>>
>> However, you have to make sure your date format is such that sorting it gives you the most recent date. The best way to do that is to have it in the format YYYY-MM-DD.
>> Hope this helps.
>> Best Regards, Nishant Kelkar

On Wed, Sep 10, 2014 at 10:04 AM, Raj Hadoop wrote:
>>> Hi, I have a requirement in Hive to remove duplicate records (they differ only by one column, i.e. a date column) and keep the latest date record. [snip]
Re: Remove duplicate records in Hive
Thanks. I will try it.

On Wed, 9/10/14, Nishant Kelkar wrote:
> Hi Raj,
> You can do something along these lines:
>
>   SELECT cno, sqno, SORT_ARRAY(COLLECT_SET(date))[0] AS latest_date
>   FROM table GROUP BY cno, sqno;
>
> However, you have to make sure your date format is such that sorting it gives you the most recent date. The best way to do that is to have it in the format YYYY-MM-DD.
> Hope this helps.
> Best Regards, Nishant Kelkar

On Wed, Sep 10, 2014 at 10:04 AM, Raj Hadoop wrote:
>> Hi, I have a requirement in Hive to remove duplicate records (they differ only by one column, i.e. a date column) and keep the latest date record. [snip]
Remove duplicate records in Hive
Hi,

I have a requirement in Hive to remove duplicate records (they differ only by one column, i.e. a date column) and keep the latest-date record.

Sample:

Hive Table: d2 is a higher

cno  sqno  date
100  1     1-oct-2013
101  2     1-oct-2013
100  1     2-oct-2013
102  2     2-oct-2013

Output needed:

100  1     2-oct-2013
101  2     1-oct-2013
102  2     2-oct-2013

I am using Hive 0.11.

Any suggestions please?

Regards,
Raj
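Hive 0.11 also introduced windowing functions, so an alternative to the suggestions above (a sketch; `t` is a placeholder table name, and the 'd-MMM-yyyy' pattern is assumed to match dates like '1-oct-2013') is to rank the rows per key by parsed date and keep the newest:

  SELECT cno, sqno, date
  FROM (
    SELECT cno, sqno, date,
           ROW_NUMBER() OVER (PARTITION BY cno, sqno
                              ORDER BY unix_timestamp(date, 'd-MMM-yyyy') DESC) AS rn
    FROM t
  ) x
  WHERE rn = 1;

Parsing the date inside ORDER BY sidesteps the YYYY-MM-DD reformatting discussed in the thread.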
Can I update just one row in Hive table using Hive INSERT OVERWRITE
Can I update (delete and insert, kind of) just one row, keeping the remaining rows intact, in a Hive table using Hive INSERT OVERWRITE? There is no partition in the Hive table.

INSERT OVERWRITE TABLE tablename SELECT col1, col2, col3 FROM tabx WHERE col2='abc';

Does the above work? Please advise.
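For the record: INSERT OVERWRITE on an unpartitioned table replaces the table's entire contents, so the statement above would leave only the rows matching col2='abc'. Before Hive's ACID support, the usual workaround was to rewrite the whole table, selecting the untouched rows plus the modified version of the target row. A sketch with hypothetical key/values:

  INSERT OVERWRITE TABLE tablename
  SELECT u.col1, u.col2, u.col3
  FROM (
    SELECT col1, col2, col3 FROM tablename WHERE col1 <> 'k1'
    UNION ALL
    SELECT col1, 'new_value' AS col2, col3 FROM tablename WHERE col1 = 'k1'
  ) u;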
Re: HiveThrift Service Issue
Hi Szehon,

It is not showing on http://xyzserver:50030/jobtracker.jsp. I checked this log (/tmp/root/hive.log) and it shows:

exec.ExecDriver (ExecDriver.java:addInputPaths(853)) - Processing alias table_emp
exec.ExecDriver (ExecDriver.java:addInputPaths(871)) - Adding input file hdfs://xyzserver:8020/user/hive/warehouse/table_emp
2014-03-20 11:57:26,352 INFO exec.ExecDriver (ExecDriver.java:createTmpDirs(221)) - Making Temp Directory: hdfs://xyzserver:8020/tmp/hive-root/hive_2014-03-20_11-57-25_822_1668300320164798948-3/-ext-10001
2014-03-20 11:57:26,377 ERROR security.UserGroupInformation (UserGroupInformation.java:doAs(1411)) - PriviledgedActionException as:root (auth:SIMPLE) cause:org.apache.hadoop.sec

On Thursday, March 20, 2014 3:53 PM, Szehon Ho wrote:
> Hi Raj,
> There are map-reduce job logs generated if the MapRedTask fails; those might give some clue.
> Thanks, Szehon

On Thu, Mar 20, 2014 at 12:29 PM, Raj Hadoop wrote:
>> I am struggling on this one. Can anyone throw some pointers on how to troubleshoot this issue, please?

On Thursday, March 20, 2014 3:09 PM, Raj Hadoop wrote:
>>> Hello everyone,
>>> The HiveThrift service was started successfully. [snip]
Re: HiveThrift Service Issue
I am struggling on this one. Can anyone throw some pointers on how to troubleshoot this issue, please?

On Thursday, March 20, 2014 3:09 PM, Raj Hadoop wrote:
> Hello everyone,
> The HiveThrift service was started successfully.
>
>   netstat -nl | grep 10000
>   tcp 0 0 0.0.0.0:10000 0.0.0.0:* LISTEN
>
> I am able to read tables from Hive through Tableau. When executing queries through Tableau I am getting the following error:
>   Query returned non-zero code: 1, cause: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
> Can anyone suggest what the problem is?
> Regards, Raj
HiveThrift Service Issue
Hello everyone,

The HiveThrift service was started successfully:

netstat -nl | grep 10000
tcp 0 0 0.0.0.0:10000 0.0.0.0:* LISTEN

I am able to read tables from Hive through Tableau. When executing queries through Tableau I am getting the following error:

Query returned non-zero code: 1, cause: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask

Can anyone suggest what the problem is?

Regards,
Raj
Re: Hive append
Hi Nitin, existing records should remain the same and the new records should get inserted into the table.

On Thu, Mar 6, 2014 at 2:11 PM, Nitin Pawar wrote:
> Are you talking about adding new records to tables, or updating records in an already existing table?

On Thu, Mar 6, 2014 at 1:59 PM, Raj hadoop wrote:
>> Query in Hive: I tried a merge kind of operation in Hive to retain the existing records and append the new records, instead of dropping the table and populating it again. If anyone can help with any other approach, or with the approach to perform a merge operation, it would be a great help.
Hive append
Query in Hive: I tried a merge kind of operation in Hive to retain the existing records and append the new records, instead of dropping the table and populating it again. If anyone can help with any other approach, or with the approach to perform a merge operation, it would be a great help.
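One approach the thread doesn't spell out: since Hive 0.8, INSERT INTO (as opposed to INSERT OVERWRITE) appends rows while keeping existing ones, which is exactly retain-and-append. A sketch with placeholder table names:

  INSERT INTO TABLE target_table
  SELECT * FROM daily_staging;

For an external table, dropping the day's files into the table's HDFS directory achieves the same effect without any Hive statement.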
Re: Merge records in hive
Thanks a lot Sanjay. Any more thoughts on this? I'm OK with any alternative to the merge concept in Hive.

On Thu, Mar 6, 2014 at 2:02 AM, Subramanian, Sanjay (HQP) <sanjay.subraman...@roberthalf.com> wrote:
> Hey Raj
> Maybe I am misunderstanding the question, but you don't really have to do anything fancy to merge.
>
> ONE TIME:
>
>   CREATE EXTERNAL TABLE employee (
>     empno BIGINT,
>     ename STRING)
>   ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
>
>   ALTER TABLE employee SET LOCATION 'hdfs://path/to/dir/on/hdfs/containing/files';
>
> Or if you are using Amazon EMR:
>
>   ALTER TABLE employee SET LOCATION 's3://bucketname/path/to/subfolder/containing/files';
>
> Now if you keep putting files into this HDFS dir 'hdfs://path/to/dir/on/hdfs/containing/files', you should not have to do anything.
> Thanks
> Warm Regards
> Sanjay
> linkedin: http://www.linkedin.com/in/subramaniansanjay

From: Raj hadoop
Reply-To: "user@hive.apache.org"
Date: Wednesday, March 5, 2014 at 4:16 AM
To: "user@hive.apache.org"
Subject: Merge records in hive

>> Hi,
>> Help required to merge data in Hive. [snip]
Merge records in hive
Hi,

Help required to merge data in Hive.

Ex:

Today's file:
Empno  ename
1      abc
2      def
3      ghi

Tomorrow's file:
Empno  ename
5      abcd
6      defg
7      ghij

Reg: we should not drop the Hive table and then create it; what I actually require is, as shown in the example, to merge the data.

Thanks,
Raj
Re: Sqoop import to HDFS and then Hive table - Issue with data type
I am surprised. So you mean to say I should keep running the same ALTER TABLE command multiple times? It was a timestamp, and I later changed it to string. Please advise.

On Tuesday, March 4, 2014 5:30 PM, Edward Capriolo wrote:
> No. Hive metadata does not change how the data is stored. Just keep changing until you get it right :)

On Tue, Mar 4, 2014 at 5:23 PM, Raj Hadoop wrote:
>> All,
>> I loaded data from an Oracle query through Sqoop to HDFS files. These are bzip-compressed files partitioned by one date column.
>> I created a Hive table to point to the above location.
>> After loading a lot of data, I realized the data type of one of the columns was wrongly given. [snip]
Sqoop import to HDFS and then Hive table - Issue with data type
All,

I loaded data from an Oracle query through Sqoop to HDFS files. These are bzip-compressed files partitioned by one date column.

I created a Hive table to point to the above location.

After loading a lot of data, I realized the data type of one of the columns was wrongly given.

When I changed the data type of the column using ALTER to the new type, it is still showing NULL values.

How should I resolve this? Do I need to recreate the table? If so, I have loaded a lot of data and I should not lose the data. This is an external table.

Please advise.

Regards,
Raj
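Worth knowing for the recreate question: dropping an EXTERNAL table removes only the metastore entry and leaves the HDFS files intact, so the table can be recreated with corrected column types without data loss. A sketch with hypothetical names and layout:

  DROP TABLE mytable;  -- external table: metadata only, files under LOCATION remain
  CREATE EXTERNAL TABLE mytable (
    id BIGINT,
    created STRING  -- the corrected column type
  )
  PARTITIONED BY (dt STRING)
  LOCATION '/path/to/existing/data';
  MSCK REPAIR TABLE mytable;  -- re-register the existing date partitions

If the column still reads as NULL afterwards, the stored text itself does not parse as the declared type, which ALTER TABLE can never fix since it rewrites no data.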
Connection refused error - Getting repeatedly
All,

I have a 3-node Hadoop cluster (CDH 4.4), and every few days, or whenever I load some data through Sqoop or query through Hive, I get the following error:

Call From <> to <> failed on connection exception: java.net.ConnectException: Connection refused

This has become very frequent. What can the reasons be, and how should I troubleshoot this? Is hardware or the network the most common problem/issue with this kind of error?

Please advise.

Regards,
Raj
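Some first checks for "Connection refused" (a sketch, assuming default CDH4 ports): the error means nothing was listening at the destination, so confirm the daemon is actually up and bound before suspecting hardware or network.

  jps                          # on the NameNode host: is the NameNode process running?
  netstat -tlnp | grep 8020    # is the NameNode RPC port listening?
  hdfs dfsadmin -report        # are the DataNodes registered?

Intermittent refusals often point to a daemon that is restarting or crashing, so the daemon's own log is the next place to look.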
Re: part-m-00000 files and their size - Hive table
Thanks for the detailed explanation, Yong. It helps.

Regards,
Raj

On Tuesday, February 25, 2014 9:18 PM, java8964 wrote:
> Yes, it is good that the file sizes are evenly close, but it is not very important, unless there are files that are very small (compared to the block size). The reasons are:
>
> Your files should be splittable to be used in Hadoop (or in Hive; it is the same thing). If they are splittable, then a 1G file will use 8 blocks (assuming the block size is 128M), and a 256M file will take 2 blocks. So these 2 files will generate 10 mapper tasks, which will be run equally in your cluster. From a performance point of view, you have 10 mapper tasks processed equally in the cluster, so one 1G file plus one 256M file is no big deal.
>
> But if you have one file that is very small, like 10M, that file will also consume one mapper task, and that is bad for performance: starting/stopping a task uses quite some resource, yet the task only processes 10M of data.
>
> The reason you see unevenly sized output files from Sqoop is that it is hard for Sqoop to split your source data evenly. For example, if you dump table A from the DB to Hive, Sqoop will do the following:
> 1) Identify the primary/unique keys of the table.
> 2) Find out the min/max value of the keys; let's say they are (1 to 1,000,000).
> 3) Based on the # of your mapper tasks, split them. If you run Sqoop with 4 mappers, then the data will be split into 4 groups: (1, 250,000) (250,001, 500,000) (500,001, 750,000) (750,001, 1,000,000). As you can imagine, your data most likely is not evenly distributed by primary key across those 4 groups, so you get unevenly sized part-m-xxxxx output files.
>
> Keep in mind that it is not required to use a primary or unique key as the split column, so you can choose whatever column in your table makes sense. Pick whatever makes the split more even.
>
> Yong

Date: Tue, 25 Feb 2014 17:42:20 -0800
From: hadoop...@yahoo.com
Subject: part-m-00000 files and their size - Hive table
To: user@hive.apache.org

>> Hi,
>> I am loading data to HDFS files through Sqoop and creating a Hive table to point to these files. [snip]
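The split column from step 1) above can be overridden on the Sqoop command line with --split-by; a sketch (the connect string, table, and column are placeholders):

  sqoop import \
    --connect jdbc:oracle:thin:@//dbhost:1521/orcl \
    --username scott -P \
    --table ORDERS \
    --split-by ORDER_ID \
    --num-mappers 4 \
    --target-dir /data/orders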
part-m-00000 files and their size - Hive table
Hi,

I am loading data to HDFS files through Sqoop and creating a Hive table to point to these files. The mapper files from the Sqoop run are generated like this:

part-m-00000
part-m-00001
part-m-00002

My question is:

1) For Hive query performance, how important or significant is the distribution of the file sizes above?

part-m-00000 say 1 GB
part-m-00001 say 3 GB
part-m-00002 say 0.25 GB

vs.

part-m-00000 say 1.4 GB
part-m-00001 say 1.4 GB
part-m-00002 say 1.45 GB

NOTE: the sizes and the number of files are just a sample; the real numbers are far bigger.

I am assuming the uniform distribution has a performance benefit. If so, what is the reason, and can I know the technical details?
Re: Can a hive partition contain a string like 'tr_date=2014-01-01'
Thanks. Will try it.

On Tuesday, February 25, 2014 8:23 PM, Kuldeep Dhole wrote:
> Probably you should use tr_date='2014-01-01' (considering the tr_date partition is there).

On Tuesday, February 25, 2014, Raj Hadoop wrote:
>> I am trying to create a Hive partition like 'tr_date=2014-01-01':
>> FAILED: ParseException line 1:58 mismatched input '-' expecting ) near '2014' in add partition statement
>> hive_ret_val: 64
>> Errors while executing Hive for bksd table for 2014-01-01
>> Are hyphens not allowed in the partition directory?
>> Please advise.
>> Regards, Raj
Can a hive partition contain a string like 'tr_date=2014-01-01'
I am trying to create a Hive partition like 'tr_date=2014-01-01':

FAILED: ParseException line 1:58 mismatched input '-' expecting ) near '2014' in add partition statement
hive_ret_val: 64
Errors while executing Hive for bksd table for 2014-01-01

Are hyphens not allowed in the partition directory?

Please advise.

Regards,
Raj
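The parse error indicates the partition value was not quoted; hyphens are fine inside a quoted string value. A sketch against the bksd table mentioned above, assuming tr_date is a string partition column:

  ALTER TABLE bksd ADD PARTITION (tr_date='2014-01-01');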
Partition column on an Alpha Numeric Column
All,

One of the primary-key columns in a relational table is alphanumeric, 6 characters: varchar(6). The first three characters have this pattern:

1st one - 1 to 9
2nd one - 1 to 9 or a-z
3rd one - 1 to 9 or a-z

Is this a good idea for performing queries (can be any queries, based on other columns of the table): partition the data based on the first three characters, multiplying out to a total of 10 * 36 * 36, which is 12,960 partitions?

12,960 partitions: is it too much? Impossible, or never heard of? Or can we consider this design?

I know the NameNode should have powerful RAM. But how much? How do we determine the limit on the number of files a NameNode can handle?

Thanks,
Raj
Add few record(s) to a Hive table or a HDFS file on a daily basis
Hi,

My requirement is a typical data warehouse and ETL requirement. I need to accomplish:

1) Daily inserts of transaction records into a Hive table or an HDFS file. This table or file is not big (approximately 10 records per day). I don't want to partition the table/file.

I am reading a few articles on this. It was mentioned that we need to load into a staging table in Hive, and then insert like the below:

insert overwrite table finaltable select * from staging;

I am not getting this logic. How should I populate the staging table daily?

Thanks,
Raj
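One common shape for the daily staging load (a sketch; the file path and table names are placeholders): overwrite the staging table from each day's file, then append to the final table.

  LOAD DATA INPATH '/landing/txn_20140214' OVERWRITE INTO TABLE staging;
  INSERT INTO TABLE finaltable SELECT * FROM staging;

The OVERWRITE on LOAD DATA clears yesterday's staging rows, while INSERT INTO (available since Hive 0.8) appends to finaltable instead of replacing it, so for a pure append no INSERT OVERWRITE of the final table is needed.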
Finding Hive and Hadoop version from command line
All,

Is there any way, from the command prompt, to find which Hive version I am using, and the Hadoop version too?

Thanks in advance.

Regards,
Raj
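Both stacks expose this, with the caveat that the Hive flag depends on the release:

  hadoop version                      # prints the Hadoop build and version
  hive --version                      # supported in recent Hive releases
  ls $HIVE_HOME/lib | grep hive-exec  # fallback: the version is embedded in the jar name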
How can I just find out the physical location of a partitioned table in Hive
Hi,

How can I find the physical location of a partitioned table in Hive? SHOW PARTITIONS gives me just the partition column info; I want the location of the HDFS directory/files where the table is created.

Please advise.

Thanks,
Raj
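DESCRIBE with the FORMATTED (or EXTENDED) keyword reports the storage location; a sketch with placeholder names:

  DESCRIBE FORMATTED mytable;                              -- the Location: field is the table's HDFS directory
  DESCRIBE FORMATTED mytable PARTITION (dt='2014-01-01');  -- location of a single partition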
Hive Query Error
I am trying to create a Hive sequence file from another table by running the following:

Your query has the following error(s):
OK FAILED: ParseException line 5:0 cannot recognize input near 'STORED' 'STORED' 'AS' in constant
click the Error Log tab above for details

1 CREATE TABLE temp_xyz as
2 SELECT prop1,prop2,prop3,prop4,prop5
3 FROM hitdata
4 WHERE dateoflog=20130101 and prop1='785-ou'
5 STORED AS SEQUENCEFILE;
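In Hive's CTAS syntax the storage clause comes before AS SELECT, not after the WHERE; a corrected version of the same statement:

  CREATE TABLE temp_xyz
  STORED AS SEQUENCEFILE
  AS
  SELECT prop1, prop2, prop3, prop4, prop5
  FROM hitdata
  WHERE dateoflog=20130101 AND prop1='785-ou';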
Re: GenericUDF Testing in Hive
I want to do a simple test like this, but it is not working:

select ComplexUDFExample(List("a", "b", "c"), "b") from table1 limit 10;
FAILED: SemanticException [Error 10011]: Line 1:25 Invalid function 'List'

On Tuesday, February 4, 2014 2:34 PM, Raj Hadoop wrote:
> How do I test a Hive GenericUDF which accepts two parameters, List<T> and T? Can the List<T> be the output of a collect_set? Please advise. I have a GenericUDF which takes (List<T>, T); I want to test how it works through Hive.

On Monday, January 20, 2014 5:19 PM, Raj Hadoop wrote:
>> The following is an example of a GenericUDF. I wanted to test this through a Hive query. [snip]
Re: GenericUDF Testing in Hive
How to test a Hive GenericUDF which accepts two parameters List, T List -> Can it be the output of a collect set. Please advise. I have a generic udf which takes List, T. I want to test it how it works through Hive. On Monday, January 20, 2014 5:19 PM, Raj Hadoop wrote: The following is a an example for a GenericUDF. I wanted to test this through a Hive query. Basically want to pass parameters some thing like "select ComplexUDFExample('a','b','c') from employees limit 10". https://github.com/rathboma/hive-extension-examples/blob/master/src/main/java/com/matthewrathbone/example/ComplexUDFExample.java class ComplexUDFExample extends GenericUDF { ListObjectInspector listOI; StringObjectInspector elementOI; @Override public String getDisplayString(String[] arg0) { return "arrayContainsExample()"; // this should probably be better } @Override public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException { if (arguments.length != 2) { throw new UDFArgumentLengthException("arrayContainsExample only takes 2 arguments: List, T"); } // 1. Check we received the right object types. ObjectInspector a = arguments[0]; ObjectInspector b = arguments[1]; if (!(a instanceof ListObjectInspector) || !(b instanceof StringObjectInspector)) { throw new UDFArgumentException("first argument must be a list / array, second argument must be a string"); } this.listOI = (ListObjectInspector) a; this.elementOI = (StringObjectInspector) b; // 2. Check that the list contains strings if(!(listOI.getListElementObjectInspector() instanceof StringObjectInspector)) { throw new UDFArgumentException("first argument must be a list of strings"); } // the return type of our function is a boolean, so we provide the correct object inspector return PrimitiveObjectInspectorFactory.javaBooleanObjectInspector; } @Override public Object evaluate(DeferredObject[] arguments) throws HiveException { // get the list and string from the deferred objects using the object inspectors List list = (List) this.listOI.getList(arguments[0].get()); String arg = elementOI.getPrimitiveJavaObject(arguments[1].get()); // check for nulls if (list == null || arg == null) { return null; } // see if our list contains the value we need for(String s: list) { if (arg.equals(s)) return new Boolean(true); } return new Boolean(false); } } hive> select ComplexUDFExample('a','b','c') from email_list_1 limit 10; FAILED: SemanticException [Error 10015]: Line 1:7 Arguments length mismatch ''c'': arrayContainsExample only takes 2 arguments: List, T -- How to test this example in Hive query. I know I am invoking it wrong. But how can I invoke it correctly. My requirement is to pass a String of arrays as first argument and another string as second argument in Hive like below. Select col1, ComplexUDFExample( collectset(col2) , 'xyz') from Employees Group By col1; How do i do that? Thanks in advance. Regards, Raj
Find a date that is in the range of any array dates in Hive
Hi, I have the following requirement from a Hive table below.

CustNum | ActivityDates                 | Rates
100     | 10-Aug-13,12-Aug-13,20-Aug-13 | 10,15,20

The data above says that from 10 Aug to 11 Aug the rate is 10, from 12 Aug to 19 Aug the rate is 15, and from 20 Aug onwards the rate is 20. Note: the order is maintained between 'ActivityDates' and 'Rates'. From the above table, I need to find the rate on a given date, say 15-Aug-13. In the above case, the rate for 15-Aug-13 is 15. How should I get this result in Hive? I was reading about Generic UDFs and was thinking of writing one like this: the Generic UDF takes two inputs (an input date, and an array of dates), and the output should be an int - the element number in the array. In the above case GenericUDF(15-Aug-13, [10-Aug-13,12-Aug-13,20-Aug-13]) should return the 2nd element in the array - 2. Please advise if there is an alternative solution or if the above solution works. I have never written a UDF or Generic UDF and would need some help from the forum members. Please advise. Regards, Raj
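One alternative that avoids writing a UDF, sketched below under loud assumptions: it needs a Hive version with posexplode (0.13+), a table rates_t(custnum INT, activitydates ARRAY<STRING>, rates ARRAY<INT>) - all hypothetical names - and dates stored in a sortable format such as 'yyyy-MM-dd' (the 'dd-MMM-yy' strings above do not compare correctly as plain strings):

SELECT t.custnum, t.rates[m.maxpos] AS rate_on_date
FROM rates_t t
JOIN (
  -- position of the latest activity date on or before the lookup date
  SELECT custnum, MAX(pos) AS maxpos
  FROM rates_t
  LATERAL VIEW posexplode(activitydates) d AS pos, dt
  WHERE dt <= '2013-08-15'
  GROUP BY custnum
) m ON t.custnum = m.custnum;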
Re: delete duplicate records in Hive table
Hi Nitin, Thanks a ton for the quick response. Could you please share the SQL syntax for this? Thanks, Raj. On Thu, Jan 30, 2014 at 3:29 PM, Nitin Pawar wrote: > easiest way to do is .. write it in a temp table and then select uniq of > each column and writing to real table > > > On Thu, Jan 30, 2014 at 3:19 PM, Raj hadoop wrote: > >> Hi, >> >> Can someone help me how to delete duplicate records in Hive table, >> >> I know that delete and update are not supported by hive but still, >> >> if some know's some alternative can help me in this >> >> Thanks, >> Raj. >> > > > > -- > Nitin Pawar >
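A minimal sketch of the temp-table pattern Nitin describes (table and column names are hypothetical); since Hive here has no DELETE, the deduplicated rows are written back with INSERT OVERWRITE:

-- stage the distinct rows, then overwrite the original table
CREATE TABLE t1_dedup AS
SELECT DISTINCT col1, col2, col3 FROM t1;

INSERT OVERWRITE TABLE t1
SELECT col1, col2, col3 FROM t1_dedup;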
delete duplicate records in Hive table
Hi, Can someone help me with how to delete duplicate records in a Hive table? I know that delete and update are not supported by Hive, but still, if someone knows an alternative it would help me. Thanks, Raj.
GenericUDF Testing in Hive
The following is an example of a GenericUDF. I wanted to test this through a Hive query. Basically I want to pass parameters something like "select ComplexUDFExample('a','b','c') from employees limit 10". https://github.com/rathboma/hive-extension-examples/blob/master/src/main/java/com/matthewrathbone/example/ComplexUDFExample.java

class ComplexUDFExample extends GenericUDF {

  ListObjectInspector listOI;
  StringObjectInspector elementOI;

  @Override
  public String getDisplayString(String[] arg0) {
    return "arrayContainsExample()"; // this should probably be better
  }

  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
    if (arguments.length != 2) {
      throw new UDFArgumentLengthException("arrayContainsExample only takes 2 arguments: List<T>, T");
    }
    // 1. Check we received the right object types.
    ObjectInspector a = arguments[0];
    ObjectInspector b = arguments[1];
    if (!(a instanceof ListObjectInspector) || !(b instanceof StringObjectInspector)) {
      throw new UDFArgumentException("first argument must be a list / array, second argument must be a string");
    }
    this.listOI = (ListObjectInspector) a;
    this.elementOI = (StringObjectInspector) b;

    // 2. Check that the list contains strings
    if (!(listOI.getListElementObjectInspector() instanceof StringObjectInspector)) {
      throw new UDFArgumentException("first argument must be a list of strings");
    }

    // the return type of our function is a boolean, so we provide the correct object inspector
    return PrimitiveObjectInspectorFactory.javaBooleanObjectInspector;
  }

  @Override
  public Object evaluate(DeferredObject[] arguments) throws HiveException {
    // get the list and string from the deferred objects using the object inspectors
    List<String> list = (List<String>) this.listOI.getList(arguments[0].get());
    String arg = elementOI.getPrimitiveJavaObject(arguments[1].get());

    // check for nulls
    if (list == null || arg == null) {
      return null;
    }

    // see if our list contains the value we need
    for (String s : list) {
      if (arg.equals(s)) return new Boolean(true);
    }
    return new Boolean(false);
  }
}

hive> select ComplexUDFExample('a','b','c') from email_list_1 limit 10;
FAILED: SemanticException [Error 10015]: Line 1:7 Arguments length mismatch ''c'': arrayContainsExample only takes 2 arguments: List<T>, T

How do I test this example in a Hive query? I know I am invoking it wrong, but how can I invoke it correctly? My requirement is to pass an array of strings as the first argument and another string as the second argument in Hive, like below:

Select col1, ComplexUDFExample( collect_set(col2) , 'xyz') from Employees Group By col1;

How do I do that? Thanks in advance. Regards, Raj
Re: Basic UDF in Hive - How to setup
Ok. I just figured out. I have to set classpath with EXPORT. Its working now. On Friday, January 17, 2014 3:37 PM, Raj Hadoop wrote: Hi, I am trying to compile a basic hive UDF java file. I am using all the jar files in my classpath but I am not able to compile it and getting the following error. I am using CDH4. Can any one advise please? $ javac HelloWorld.java HelloWorld.java:3: package org.apache.hadoop.hive.ql.exec does not exist import org.apache.hadoop.hive.ql.exec.Description; ^ HelloWorld.java:4: package org.apache.hadoop.hive.ql.exec does not exist import org.apache.hadoop.hive.ql.exec.UDF; ^ HelloWorld.java:5: package org.apache.hadoop.hive.ql.udf does not exist import org.apache.hadoop.hive.ql.udf.UDFType; ^ HelloWorld.java:8: cannot find symbol symbol: class UDF public class HelloWorld extends UDF ^ 4 errors $ echo $CLASSPATH /usr/lib/hive/lib/hive-beeline-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-builtins-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-cli-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-common-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-contrib-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-exec-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-hbase-handler-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-hwi-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-jdbc-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-metastore-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-pdk-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-serde-0.10.0-cdh4.4.0.jar::/usr/lib/hive/lib/hive-service-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-shims-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/parquet-hive-1.0.jar:/usr/lib/hive/lib/sentry-binding-hive-1.1.0-cdh4.4.0.jar:/usr/lib/hadoop/hadoop-annotations-2.0.0-cdh4.4.0.jar:/usr/lib/hadoop/hadoop-annotations.jar:/usr/lib/hadoop/hadoop-auth-2.0.0-cdh4.4.0.jar:/usr/lib/hadoop /hadoop-auth.jar:/usr/lib/hadoop/hadoop-common-2.0.0-cdh4 Thanks, Raj
Basic UDF in Hive - How to setup
Hi, I am trying to compile a basic hive UDF java file. I am using all the jar files in my classpath but I am not able to compile it and getting the following error. I am using CDH4. Can any one advise please? $ javac HelloWorld.java HelloWorld.java:3: package org.apache.hadoop.hive.ql.exec does not exist import org.apache.hadoop.hive.ql.exec.Description; ^ HelloWorld.java:4: package org.apache.hadoop.hive.ql.exec does not exist import org.apache.hadoop.hive.ql.exec.UDF; ^ HelloWorld.java:5: package org.apache.hadoop.hive.ql.udf does not exist import org.apache.hadoop.hive.ql.udf.UDFType; ^ HelloWorld.java:8: cannot find symbol symbol: class UDF public class HelloWorld extends UDF ^ 4 errors $ echo $CLASSPATH /usr/lib/hive/lib/hive-beeline-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-builtins-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-cli-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-common-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-contrib-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-exec-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-hbase-handler-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-hwi-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-jdbc-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-metastore-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-pdk-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-serde-0.10.0-cdh4.4.0.jar::/usr/lib/hive/lib/hive-service-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/hive-shims-0.10.0-cdh4.4.0.jar:/usr/lib/hive/lib/parquet-hive-1.0.jar:/usr/lib/hive/lib/sentry-binding-hive-1.1.0-cdh4.4.0.jar:/usr/lib/hadoop/hadoop-annotations-2.0.0-cdh4.4.0.jar:/usr/lib/hadoop/hadoop-annotations.jar:/usr/lib/hadoop/hadoop-auth-2.0.0-cdh4.4.0.jar:/usr/lib/hadoop/hadoop-auth.ja r:/usr/lib/hadoop/hadoop-common-2.0.0-cdh4 Thanks, Raj
Re: JSON data to HIVE table
All, If I have to load JSON data to a Hive table (default record format while creating the table) - is that a requirement to convert each JSON record into one line. How would I do this ? Thanks, Raj From: Rok Kralj To: user@hive.apache.org Sent: Tuesday, January 7, 2014 3:54 AM Subject: Re: JSON data to HIVE table Also, if you have large or dynamic schemas which are a pain to write by hand, you can use this simple tool: https://github.com/strelec/hive-serde-gen 2014/1/7 Roberto Congiu Also https://github.com/rcongiu/Hive-JSON-Serde ;) > > > >On Mon, Jan 6, 2014 at 12:00 PM, Russell Jurney >wrote: > >Check these out: >> >> >>http://hortonworks.com/blog/discovering-hive-schema-in-collections-of-json-documents/ >> >>http://hortonworks.com/blog/howto-use-hive-to-sqlize-your-own-tweets-part-two-loading-hive-sql-queries/ >> >>https://github.com/kevinweil/elephant-bird >> >> >> >> >>On Mon, Jan 6, 2014 at 9:36 AM, Raj Hadoop wrote: >> >>Hi, >>> >>>I am trying to load a data that is in JSON format to Hive table. Can any one >>>suggest what is the method I need to follow? >>> >>>Thanks, >>>Raj >> >> >> >>-- >>Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com > > > >-- >-- >Good judgement comes with experience. >Experience comes with bad judgement. > >-- > >Roberto Congiu - Data Engineer - OpenX >tel: +1 626 466 1141 -- eMail: rok.kr...@gmail.com
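To the one-record-per-line question: yes - with a text-based table, each JSON document has to sit on a single line, since the input format splits records on newlines. A minimal sketch using the Hive-JSON-Serde linked above (the jar path, table, and columns are hypothetical; the SerDe class name is from that project):

ADD JAR /path/to/json-serde-jar-with-dependencies.jar;  -- hypothetical path

CREATE TABLE json_events (
  user_id STRING,
  action STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';

LOAD DATA LOCAL INPATH '/data/events.json' INTO TABLE json_events;  -- hypothetical path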
JSON data to HIVE table
Hi, I am trying to load data that is in JSON format to a Hive table. Can anyone suggest what method I need to follow? Thanks, Raj
Re: Dynamic columns in Hive Table - Best Design for the problem
Matt, Thanks for the suggestion. Can you please provide more details on what type of UDAF should I develop ? I have never worked on a UDAF earlier. But would like to explore it. Any tips on how to proceed. Thanks, Raj On Saturday, December 28, 2013 2:47 PM, Matt Tucker wrote: It looks like you're essentially doing a pivot function. Your best bet is to write a custom UDAF or look at the windowing functions available in recent releases. Matt On Dec 28, 2013 12:57 PM, "Raj Hadoop" wrote: Dear All Hive Group Members, > > >I have the following requirement. > > >Input: > > >Ticket#|Date of booking|Price >100|20-Oct-13|54 > >100|21-Oct-13|56 >100|22-Oct-13|54 >100|23-Oct-13|55 >100|27-Oct-13|60 >100|30-Oct-13|47 > > >101|10-Sep-13|12 >101|13-Sep-13|14 >101|20-Oct-13|6 > > > > >Expected Output: > > >Ticket#|Initial|Delta1|Delta2|Delta3|Delta4|Delta5 >100|20-Oct-13,54|21-Oct-13,2|22-Oct-13,0|23-Oct-3,1|27-Oct-13,6|30-Oct-13,-7 >101|10-Sep-13,12|13-Sep-13,2|20-Oct-13,-6||| > > >The number of columns in the expected output is a dynamic list depending on >the number of price changes of a ticket. > > >1) What is the best design to solve the above problem in Hive? >2) How do we implement it? > > >Please advise. > > >Regards, >Raj > > > > > > > > > >
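As a starting point for Matt's windowing suggestion, Hive 0.11+ can compute the per-booking deltas in long (row) form with LAG - a sketch, assuming a table tickets(ticket INT, booking_date STRING, price INT) with sortable dates; pivoting the variable-length delta list into columns would still need a custom UDAF or post-processing:

SELECT ticket,
       booking_date,
       price,
       price - LAG(price) OVER (PARTITION BY ticket ORDER BY booking_date) AS delta
FROM tickets;

-- the first row per ticket has delta NULL; its raw price is the "Initial" value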
Dynamic columns in Hive Table - Best Design for the problem
Dear All Hive Group Members, I have the following requirement. Input: Ticket#|Date of booking|Price 100|20-Oct-13|54 100|21-Oct-13|56 100|22-Oct-13|54 100|23-Oct-13|55 100|27-Oct-13|60 100|30-Oct-13|47 101|10-Sep-13|12 101|13-Sep-13|14 101|20-Oct-13|6 Expected Output: Ticket#|Initial|Delta1|Delta2|Delta3|Delta4|Delta5 100|20-Oct-13,54|21-Oct-13,2|22-Oct-13,0|23-Oct-3,1|27-Oct-13,6|30-Oct-13,-7 101|10-Sep-13,12|13-Sep-13,2|20-Oct-13,-6||| The number of columns in the expected output is a dynamic list depending on the number of price changes of a ticket. 1) What is the best design to solve the above problem in Hive? 2) How do we implement it? Please advise. Regards, Raj
How to compress the text file - LZO utility ?
Hi, I have a large set of text files. I have created a Hive table pointing to each of these text files. I am looking to compress the files to save storage.

1) How should I compress the files to use LZO compression?
2) How do I know whether the LZO compression utility (command?) is installed on the Hadoop cluster?
3) Should the Hive table definition be modified to a sequence file if I compress the text files?

Please advise. Thanks, Raj
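A hedged sketch for 1) and 3): LZO needs the separate hadoop-lzo package on the cluster (checking io.compression.codecs in core-site.xml shows whether its codec is registered), and the table can stay TEXTFILE, because Hive reads .lzo text transparently. Rewriting a table into LZO output could look roughly like this (table names hypothetical; the codec class is from the hadoop-lzo project):

SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;

INSERT OVERWRITE TABLE logs_lzo
SELECT * FROM logs_text;

-- large .lzo files should also be indexed (hadoop-lzo's indexer) so they split across mappers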
Re: how to find number of elements in an array in Hive
Thanks Brad On Monday, December 2, 2013 5:09 PM, Brad Ruderman wrote: Check out size https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF Thanks, Brad On Mon, Dec 2, 2013 at 5:05 PM, Raj Hadoop wrote: hi, > > >how to find number of elements in an array in Hive table? > > >thanks, >Raj > > >
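For completeness, the size() function Brad points to, with hypothetical names:

SELECT size(my_array_col) FROM my_table;  -- returns the number of elements in the array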
how to find number of elements in an array in Hive
Hi, How do I find the number of elements in an array in a Hive table? Thanks, Raj
Compression for a HDFS text file - Hive External Partition Table
Hi,

1) My requirement is to load a file (a tar.gz file which has multiple tab-separated-values files; one file is the main file, which has huge data - about 10 GB per day) to an externally partitioned Hive table.
2) What I am doing: I have automated the process by extracting the tar.gz file, getting the main data file (the 10 GB text file) and then loading it to an HDFS file as a text file.
3) I want to compress the files. What is the procedure for it?
4) Do I need to use any utility to compress the hit-data file before loading to HDFS? And also, should I define an input structure for the HDFS file format through a Java program?

Regards, Raj
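One hedged option: gzip the extracted main file before the HDFS load - Hive's text input decompresses .gz transparently, so no custom Java input format is needed (the trade-off is that gzip is not splittable, so each file is read by a single mapper). A sketch with hypothetical names and paths:

CREATE EXTERNAL TABLE hitdata_ext (
  col1 STRING,
  col2 STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/hitdata';

ALTER TABLE hitdata_ext ADD PARTITION (dt='2013-11-25')
LOCATION '/data/hitdata/dt=2013-11-25';  -- directory holding the gzipped day file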
How to load a web log file (text format) to Hive with compression
Hi, I have web log files (text format). I want to load these files to a Hive table in compressed format. How do I do it? Should I compress the text files (using any Linux utilities) and then create the Hive table? Can anyone provide me the Hive syntax for loading the compressed file? Thanks, Raj
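The syntax is the ordinary LOAD DATA; if the file is gzipped first, Hive decompresses it at read time (gzip is not splittable, though). A minimal sketch, file path and table name hypothetical:

LOAD DATA LOCAL INPATH '/logs/access_log.gz' INTO TABLE weblogs;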
Re: Hive external table partitions with less than symbol ?
Hi - I have this doubt. Why do i need to use an INSERT INTO . can I just create hdfs directories and map it to a hive external table setting the location of the hdfs directories. will this work ? please advise. Thanks, Raj On Monday, November 4, 2013 8:34 AM, Matouk IFTISSEN wrote: Yes it is possible: hadoop fs -mkdir /hdfs_path/'cust_id>1000' I tested it and works, then you can store data in this directory . for concat function you do simple: insert into your_table_partionned PARTITION (path_xxx) select attr,id, concat ('/data1/customer/', id) as path_xxx from your_table where id <1000 ...... Cdt. 2013/11/4 Raj Hadoop How can i use concat function? I did not get it. Can you please elaborate. > > >My requirement is to create a HDFS directory like >(cust_id>1000 and cust_id<2000) > > > >and map this to a Hive External table. > > >can i do that? > > > >On Monday, November 4, 2013 3:34 AM, Matouk IFTISSEN > wrote: > >Hello >You can use concat function or case to do this like: >Concat ('/data1/customer/', id) >. >Where id <1000 >Etc.. >Hope this help you ;) >Le 3 nov. 2013 23:51, "Raj Hadoop" a écrit : > >All, >> >> >>I want to create partitions like the below and create a hive external table. >>How can i do that ? >> >> >>/data1/customer/id<1000 >>/data1/customer/id>1000 and id < 2000 >> >>/data1/customer/id >2000 >> >> >> >>Is this possible ( < and > symbols in folders ?) >> >> >>My requirement is to partition the hive table based on some customer id's. >> >> >>Thanks, >>Raj > > -- Matouk IFTISSEN | Consultant BI & Big Data 24 rue du sentier - 75002 Paris - www.ysance.com Mob : +33 6 78 51 18 69 || Fax : +33 1 73 72 97 26 Ysance sur :Twitter | Facebook | Google+ | LinkedIn | Newsletter Nos autres sites : ys4you | labdecisionnel | decrypt
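Expanding Matouk's suggestion without the '<' and '>' characters in directory names: compute a plain partition label with CASE and let dynamic partitioning create the directories - a sketch, all names hypothetical:

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE customer_part PARTITION (id_range)
SELECT attr,
       cust_id,
       CASE WHEN cust_id < 1000 THEN 'lt_1000'
            WHEN cust_id < 2000 THEN '1000_to_1999'
            ELSE 'ge_2000' END AS id_range
FROM customer;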
Re: Hive external table partitions with less than symbol ?
How can i use concat function? I did not get it. Can you please elaborate. My requirement is to create a HDFS directory like (cust_id>1000 and cust_id<2000) and map this to a Hive External table. can i do that? On Monday, November 4, 2013 3:34 AM, Matouk IFTISSEN wrote: Hello You can use concat function or case to do this like: Concat ('/data1/customer/', id) . Where id <1000 Etc.. Hope this help you ;) Le 3 nov. 2013 23:51, "Raj Hadoop" a écrit : All, > > >I want to create partitions like the below and create a hive external table. >How can i do that ? > > >/data1/customer/id<1000 >/data1/customer/id>1000 and id < 2000 > >/data1/customer/id >2000 > > > >Is this possible ( < and > symbols in folders ?) > > >My requirement is to partition the hive table based on some customer id's. > > >Thanks, >Raj
Hive external table partitions with less than symbol ?
All, I want to create partitions like the below and create a hive external table. How can i do that ? /data1/customer/id<1000 /data1/customer/id>1000 and id < 2000 /data1/customer/id >2000 Is this possible ( < and > symbols in folders ?) My requirement is to partition the hive table based on some customer id's. Thanks, Raj
Re: Oracle to HDFS through Sqoop and a Hive External Table
Manish, Thanks for reply. 1. Load to Hdfs, beware of Sqoop error handling, as its a mapreduce based framework, so if 1 mapper fails it might happen that you get partial data. So do you say that - if I can handle errors in Sqoop, going for 100 HDFS folders/files - is it OK ? 2. Create partition based on date and hour, if customer table has some date or timestamp column. I cannot rely on date or timestamp column. So can I go with Customer ID ? 3. Think about file format also, as that will affect the load and query time. Can you please suggest a file format that I have to use ? 4. Think about compression as well before hand, as that will govern the data split, and performance of your queries as well. Does compression increases or reduces performance ? Isn't the compression advantage is saving in storage? - Raj On Sunday, November 3, 2013 11:03 AM, manish.hadoop.work wrote: 1. Load to Hdfs, beware of Sqoop error handling, as its a mapreduce based framework, so if 1 mapper fails it might happen that you get partial data. 2. Create partition based on date and hour, if customer table has some date or timestamp column. 3. Think about file format also, as that will affect the load and query time. 4. Think about compression as well before hand, as that will govern the data split, and performance of your queries as well. Regards, Manish Sent from my T-Mobile 4G LTE Device Original message From: Raj Hadoop Date: 11/03/2013 7:39 AM (GMT-08:00) To: Hive ,Sqoop ,User Subject: Oracle to HDFS through Sqoop and a Hive External Table Hi, I am sending this to the three dist-lists of Hadoop, Hive and Sqoop as this question is closely related to all the three areas. I have this requirement. I have a big table in Oracle (about 60 million rows - Primary Key Customer Id). I want to bring this to HDFS and then create a Hive external table. My requirement is running queries on this Hive table (at this time i do not know what queries i would be running). Is the following a good design for the above problem ? Any pros and cons of this. 1) Load the table to HDFS using Sqoop into multiple folders (divide Customer Id's into 100 segments). 2) Create Hive external partition table based on the above 100 HDFS directories. Thanks, Raj
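For step 2 of the design, the external table over the 100 segment directories might look roughly like this (columns, paths, and the delimiter are hypothetical; one ADD PARTITION per Sqoop target directory, which is easy to script):

CREATE EXTERNAL TABLE customers_ext (
  cust_id BIGINT,
  name STRING
)
PARTITIONED BY (seg INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/customers';

ALTER TABLE customers_ext ADD PARTITION (seg=0) LOCATION '/data/customers/seg=0';
ALTER TABLE customers_ext ADD PARTITION (seg=1) LOCATION '/data/customers/seg=1';
-- ... and so on up to seg=99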
Oracle to HDFS through Sqoop and a Hive External Table
Hi, I am sending this to the three dist-lists of Hadoop, Hive and Sqoop as this question is closely related to all the three areas. I have this requirement. I have a big table in Oracle (about 60 million rows - Primary Key Customer Id). I want to bring this to HDFS and then create a Hive external table. My requirement is running queries on this Hive table (at this time i do not know what queries i would be running). Is the following a good design for the above problem ? Any pros and cons of this. 1) Load the table to HDFS using Sqoop into multiple folders (divide Customer Id's into 100 segments). 2) Create Hive external partition table based on the above 100 HDFS directories. Thanks, Raj
Re: External Partition Table
Thanks Tim. I am using a String column for the partition column. On Thursday, October 31, 2013 6:49 PM, Timothy Potter wrote: Hi Raj, This seems like a matter of style vs. any performance benefit / cost ... if you're going to do a lot of queries just based on month or year, then #2 might be easier, e.g. select * from foo where year = 2013 seems a little cleaner than select * from foo where date >= 20130101 and date <= 20131231 (not sure how you're encoding dates into a INT but I think you get the idea) I do something similar but my partition fields are strings, like 2013-10-31_ (which has the nice property of lexically sorting the same as numeric sort). I'm assuming they will both have the same performance because Hive is still selecting the same number of input paths in both scenarios, one just happens to be a little deeper. Cheers, Tim On Thu, Oct 31, 2013 at 4:34 PM, Raj Hadoop wrote: Hi, > > >I am planning for a Hive External Partition Table based on a date. > > >Which one of the below yields a better performance or both have the same >performance? > > >1) Partition based on one folder per day >LIKE date INT >2) Partition based on one folder per year / month / day ( So it has three >folders) >LIKE year INT, month INT, day INT > > >Thanks, >Raj > >
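The two layouts from the question, as DDL sketches (names hypothetical) together with the equivalent date restrictions Tim compares:

CREATE EXTERNAL TABLE logs_flat (line STRING)
PARTITIONED BY (log_date INT);                   -- one folder per day: log_date=20131031

CREATE EXTERNAL TABLE logs_nested (line STRING)
PARTITIONED BY (year INT, month INT, day INT);   -- year=2013/month=10/day=31

-- whole-year scans, respectively:
-- SELECT * FROM logs_flat   WHERE log_date >= 20130101 AND log_date <= 20131231;
-- SELECT * FROM logs_nested WHERE year = 2013;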
Re: External Partition Table
Hi Brad, Thanks for the quick response. I have about 10 GB file per day (web logs). And I am creating a folder(partition) per each day. Is it something uncommon ? I do not know at this juncture what kind of queries I would be executing upon on this table. But just wanted to know whether this is something normal or not at all a normal thing. Thanks, Raj On Thursday, October 31, 2013 6:39 PM, Brad Ruderman wrote: Wow that question won't be answerable. It all depends on the amount of data per partition and the queries you are going to be executing on it, as well as the structure of the data. In general in hive (depending on your cluster size) you need to balance the number of files with the size, smaller number of files is typically preferred but partitions will help when date restricting. Thx, Brad On Thu, Oct 31, 2013 at 3:34 PM, Raj Hadoop wrote: Hi, > > >I am planning for a Hive External Partition Table based on a date. > > >Which one of the below yields a better performance or both have the same >performance? > > >1) Partition based on one folder per day >LIKE date INT >2) Partition based on one folder per year / month / day ( So it has three >folders) >LIKE year INT, month INT, day INT > > >Thanks, >Raj > >
External Partition Table
Hi, I am planning for a Hive External Partition Table based on a date. Which one of the below yields a better performance or both have the same performance? 1) Partition based on one folder per day LIKE date INT 2) Partition based on one folder per year / month / day ( So it has three folders) LIKE year INT, month INT, day INT Thanks, Raj
Re: Hive Query Questions - is null in WHERE
Thanks. It worked for me now when i use it as an empty string. From: Krishnan K To: "user@hive.apache.org" ; Raj Hadoop Sent: Thursday, October 17, 2013 11:11 AM Subject: Re: Hive Query Questions - is null in WHERE For string columns, null will be interpreted as an empty string and for others, it will be interpreted as null... On Wednesday, October 16, 2013, Raj Hadoop wrote: All, > >When a query is executed like the below > >select field1 from table1 where field1 is null; > >I am getting the results which have empty values or nulls in field1. How does >is null work in Hive queries. > >Thanks, >Raj
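The underlying detail: for text tables Hive writes real NULLs as '\N' by default, and an empty string is a distinct value. Two hedged ways to make empty fields behave as NULL (table and column names from the thread):

-- tell the SerDe to read empty fields as NULL:
ALTER TABLE table1 SET SERDEPROPERTIES ('serialization.null.format' = '');

-- or match both cases explicitly in the query:
SELECT field1 FROM table1 WHERE field1 IS NULL OR field1 = '';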
Hive Query Questions - is null in WHERE
All, When a query is executed like the below

select field1 from table1 where field1 is null;

I am getting results which have empty values or nulls in field1. How does IS NULL work in Hive queries? Thanks, Raj
Re: How to load /t /n file to Hive
Yes, I have it. Thanks, Raj From: Sonal Goyal To: "user@hive.apache.org" ; Raj Hadoop Sent: Monday, October 7, 2013 1:38 AM Subject: Re: How to load /t /n file to Hive Do you have the option to escape your tabs and newlines in your base file? Best Regards, Sonal Nube Technologies On Sat, Sep 21, 2013 at 12:34 AM, Raj Hadoop wrote: Hi, > >I have a file which is delimted by a tab. Also, there are some fields in the >file which has a tab /t character and a new line /n character in some fields. > >Is there any way to load this file using Hive load command? Or do i have to >use a Custom Map Reduce (custom) Input format with java ? Please advise. > >Thanks, >Raj
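Since the file already escapes the embedded characters, the delimited SerDe's ESCAPED BY clause may be enough for the tabs - a sketch with hypothetical names; note that embedded literal newlines are still record boundaries for the text input format, so those generally need the preprocessing Nitin describes:

CREATE TABLE raw_logs (
  c1 STRING,
  c2 STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  ESCAPED BY '\\'
LINES TERMINATED BY '\n';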
Re: How to load /t /n file to Hive
Hi Gabo, Are you suggesting to use java.net.URLEncoder ? Can you be more specific ? I have lot of fields in the file which are not only URL related but some text fields which has new line characters. Thanks, Raj From: Gabriel Eisbruch To: "user@hive.apache.org" ; Raj Hadoop Sent: Friday, September 20, 2013 4:43 PM Subject: Re: How to load /t /n file to Hive Hi One way that we used to solve that problem it's to transform the data when you are creating/loading it, for example we've applied UrlEncode to each field on create time. Thanks, Gabo. 2013/9/20 Raj Hadoop Hi Nitin, > >Thanks for the reply. I have a huge file in unix. > >As per the file definition, the file is a tab separated file of fields. But I >am sure that within some field's I have some new line character. > >How should I find a record? It is a huge file. Is there some command? > >Thanks, > > > >From: Nitin Pawar >To: "user@hive.apache.org" ; Raj Hadoop > >Sent: Friday, September 20, 2013 3:15 PM >Subject: Re: How to load /t /n file to Hive > > > >If your data contains new line chars, its better you write a custom map reduce >job and convert the data into a single line removing all unwanted chars in >column separator as well just having single new line char per line > > > >On Sat, Sep 21, 2013 at 12:38 AM, Raj Hadoop wrote: > >Please note that there is an escape chacter in the fields where the /t and /n >are present. >> >> >> >>From: Raj Hadoop >>To: Hive >>Sent: Friday, September 20, 2013 3:04 PM >>Subject: How to load /t /n file to Hive >> >> >> >>Hi, >> >>I have a file which is delimted by a tab. Also, there are some fields in the >>file which has a tab /t character and a new line /n character in some fields. >> >>Is there any way to load this file using Hive load command? Or do i have to >>use a Custom Map Reduce (custom) Input format with java ? Please advise. >> >>Thanks, >>Raj >> >> > > > >-- >Nitin Pawar > > >
Re: How to load /t /n file to Hive
Hi Nitin, Thanks for the reply. I have a huge file in unix. As per the file definition, the file is a tab separated file of fields. But I am sure that within some field's I have some new line character. How should I find a record? It is a huge file. Is there some command? Thanks, From: Nitin Pawar To: "user@hive.apache.org" ; Raj Hadoop Sent: Friday, September 20, 2013 3:15 PM Subject: Re: How to load /t /n file to Hive If your data contains new line chars, its better you write a custom map reduce job and convert the data into a single line removing all unwanted chars in column separator as well just having single new line char per line On Sat, Sep 21, 2013 at 12:38 AM, Raj Hadoop wrote: Please note that there is an escape chacter in the fields where the /t and /n are present. > > > >From: Raj Hadoop >To: Hive >Sent: Friday, September 20, 2013 3:04 PM >Subject: How to load /t /n file to Hive > > > >Hi, > >I have a file which is delimted by a tab. Also, there are some fields in the >file which has a tab /t character and a new line /n character in some fields. > >Is there any way to load this file using Hive load command? Or do i have to >use a Custom Map Reduce (custom) Input format with java ? Please advise. > >Thanks, >Raj > > -- Nitin Pawar
Re: How to load /t /n file to Hive
Please note that there is an escape character in the fields where the /t and /n are present. From: Raj Hadoop To: Hive Sent: Friday, September 20, 2013 3:04 PM Subject: How to load /t /n file to Hive Hi, I have a file which is delimted by a tab. Also, there are some fields in the file which has a tab /t character and a new line /n character in some fields. Is there any way to load this file using Hive load command? Or do i have to use a Custom Map Reduce (custom) Input format with java ? Please advise. Thanks, Raj
How to load /t /n file to Hive
Hi, I have a file which is delimited by a tab. Also, there are some fields in the file which have a tab /t character and a new line /n character in some fields. Is there any way to load this file using the Hive load command? Or do I have to use a Custom Map Reduce (custom) Input format with Java? Please advise. Thanks, Raj
Hive Thrift Service - Not Running Continuously
Hi, The hive thrift service is not running continuously. I had to execute the command (hive --service hiveserver &) very frequently. Can anyone help me with this? Thanks, Raj
Re: Help in debugging Hive Query
Hi Sanjay, Thanks for taking the time to write all the details. I did a silly mistake. The data type for visit_page_num, i created it as string. The string was causing issues when I am using the max function. A type cast to int in the query worked for me. Regards, Raj From: Sanjay Subramanian To: "user@hive.apache.org" Sent: Thursday, July 25, 2013 1:41 PM Subject: Re: Help in debugging Hive Query The query is correct but since u r creating a managed table , that is possibly creating some issue and the records are not all getting created This is what I would propose CHECKPOINT 1 : Is this query running at all ? === Use this option in BOLD and run the QUERY ONLY (without any table creation) to log errors and pipe to a log file by using nohup or some other way that u prefer hive -hiveconf hive.root.logger=INFO,console -e select a.evar23,sum(b.max_visit_page_num) from (select distinct visid_high,visid_low,evar23 from web.omniture_web_data) a JOIN (select visid_high,visid_low,max(visit_page_num) as max_visit_page_num from omniture_web_data group by visid_high,visid_low) b where a.visid_high=b.visid_high and a.visid_low=b.visid_low group by a.evar23; CHECKPOINT 2 : Run the query (using the CREATE TABLE option) with these additional options === Required params: SET mapreduce.job.maps=500; SET mapreduce.job.reduces=8; SET mapreduce.tasktracker.map.tasks.maximum=12; SET mapreduce.tasktracker.reduce.tasks.maximum=8; SET mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec; SET mapreduce.map.output.compress=true; Optional params: --- If u r using compression in output , use the following ; u can change the LzoCodec to whatever u r using for compression SET hive.exec.compress.intermediate=true; SET hive.exec.compress.output=true; SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec; SET mapreduce.output.fileoutputformat.compress=true; Thanks Sanjay From: Raj Hadoop Reply-To: "user@hive.apache.org" , Raj Hadoop Date: Thursday, July 25, 2013 5:00 AM To: Hive Subject: Help in debugging Hive Query All, I am trying to determine visits for customer from omniture weblog file using Hive. Table: omniture_web_data Columns: visid_high,visid_low,evar23,visit_page_num Sample Data: visid_high,visid_low,evar23,visit_page_num 999,888,1003,10 999,888,1003,14 999,888,1003,6 999,777,1003,12 999,777,1003,20 I want to calculate for each Customer Number ( evar23 is Customer Number ) , total visits. visid_high and visid_low determines a unique visit. For each distinct visitor, calculate sum of maximum visit_page_num. In above example 14 + 20 = 34 should be the total visits for the customer 1003. I am trying to run the following queries - Method 1 is almost the same as Method 2. Except in Method 1 I only choose a particualr customer number 1003. In method 2 , i generalized to all. In Method 1 , I am getting the accurate result. In metnhod 2 , I am not getting the same result as Method 1. Any suggestions on how to trouble shoot. ALso, any alternative approaches. 
// Method 1
select a.evar23,sum(b.max_visit_page_num) from
(select distinct visid_high,visid_low,evar23 from web.omniture_web_data where evar23='1003') a
JOIN
(select visid_high,visid_low,max(visit_page_num) as max_visit_page_num from omniture_web_data where evar23='1003' group by visid_high,visid_low) b
where a.visid_high=b.visid_high and a.visid_low=b.visid_low
group by a.evar23;

// Result of Method 1
1003 34

// Method 2
create table temp123 as
select a.evar23,sum(b.max_visit_page_num) from
(select distinct visid_high,visid_low,evar23 from web.omniture_web_data) a
JOIN
(select visid_high,visid_low,max(visit_page_num) as max_visit_page_num from omniture_web_data group by visid_high,visid_low) b
where a.visid_high=b.visid_high and a.visid_low=b.visid_low
group by a.evar23;

select * from temp123 where evar23='1003';

// The Result of Method 2 is not the same as Method 1. It is showing a different number.

Thanks, Raj
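The fix Raj describes above, sketched: visit_page_num was created as a STRING, so max() compared values lexicographically; casting restores numeric comparison:

select visid_high, visid_low,
       max(cast(visit_page_num as int)) as max_visit_page_num
from omniture_web_data
group by visid_high, visid_low;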
Help in debugging Hive Query
All, I am trying to determine visits per customer from an omniture weblog file using Hive.

Table: omniture_web_data
Columns: visid_high,visid_low,evar23,visit_page_num

Sample Data:
visid_high,visid_low,evar23,visit_page_num
999,888,1003,10
999,888,1003,14
999,888,1003,6
999,777,1003,12
999,777,1003,20

I want to calculate, for each Customer Number (evar23 is the Customer Number), total visits. visid_high and visid_low determine a unique visit. For each distinct visitor, calculate the sum of the maximum visit_page_num. In the above example 14 + 20 = 34 should be the total visits for customer 1003. I am trying to run the following queries - Method 1 is almost the same as Method 2, except in Method 1 I only choose a particular customer number, 1003; in Method 2 I generalized to all. In Method 1 I am getting the accurate result. In Method 2 I am not getting the same result as Method 1. Any suggestions on how to troubleshoot? Also, any alternative approaches?

// Method 1
select a.evar23,sum(b.max_visit_page_num) from
(select distinct visid_high,visid_low,evar23 from web.omniture_web_data where evar23='1003') a
JOIN
(select visid_high,visid_low,max(visit_page_num) as max_visit_page_num from omniture_web_data where evar23='1003' group by visid_high,visid_low) b
where a.visid_high=b.visid_high and a.visid_low=b.visid_low
group by a.evar23;

// Result of Method 1
1003 34

// Method 2
create table temp123 as
select a.evar23,sum(b.max_visit_page_num) from
(select distinct visid_high,visid_low,evar23 from web.omniture_web_data) a
JOIN
(select visid_high,visid_low,max(visit_page_num) as max_visit_page_num from omniture_web_data group by visid_high,visid_low) b
where a.visid_high=b.visid_high and a.visid_low=b.visid_low
group by a.evar23;

select * from temp123 where evar23='1003';

// The Result of Method 2 is not the same as Method 1. It is showing a different number.

Thanks, Raj
Oracle to Hive
All, Can anyone give me tips on how to convert the following Oracle SQL to a Hive query.

SELECT a.c100, a.c300, b.c400
FROM t1 a JOIN t2 b ON a.c200 = b.c200
WHERE a.c100 IN (SELECT DISTINCT a.c100
                 FROM t1 a JOIN t2 b ON a.c200 = b.c200
                 WHERE b.c400 >= SYSDATE - 1)
  AND b.c400 >= SYSDATE - 1
  AND a.c300 = 0

The SYSDATE can be replaced by date_sub(from_unixtime(unix_timestamp(),'yyyy-MM-dd'), 1) in Hive. But I wanted to know about the rest of the query. Any pointers or tips so that I can start on my own? Thanks in advance. Regards, Raj
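A hedged starting point: Hive of this era does not support IN subqueries, but the filter can usually be rewritten as a LEFT SEMI JOIN (the subquery's columns may then only appear in the ON clause). A sketch, assuming c400 holds sortable 'yyyy-MM-dd' strings:

SELECT a.c100, a.c300, b.c400
FROM t1 a
JOIN t2 b ON a.c200 = b.c200
LEFT SEMI JOIN (
  SELECT a2.c100
  FROM t1 a2 JOIN t2 b2 ON a2.c200 = b2.c200
  WHERE b2.c400 >= date_sub(from_unixtime(unix_timestamp(), 'yyyy-MM-dd'), 1)
) q ON a.c100 = q.c100
WHERE b.c400 >= date_sub(from_unixtime(unix_timestamp(), 'yyyy-MM-dd'), 1)
  AND a.c300 = 0;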
Special characters in web log file causing issues
Hi, The log file that I am trying to load through Hive has some special characters. The field is shown below and the special characters ¿¿ are also shown.

Shockwave Flash in;Motive ManagementPlug-in;Google Update;Java(TM)Platform SE 7U21;McAfee SiteAdvisor;McAfee Virtual Technician;Windows Live¿¿ Photo Gallery;McAfee SecurityCenter;Silverlig

The above is causing the record to be terminated and loading another line. How can I avoid this type of issue and how do I load the proper data? Any suggestions please. Thanks, Raj
Re: Loading a flat file + one additional field to a Hive table
Thanks Sanjay. I will look into this. Also - one more question. When I am trying to load log file to Hive and comparing the counts like this select count(*) from <> Versus wc -l <> I see a few hundred records greater in <>. How should I debug it? Any tips please. From: Sanjay Subramanian To: "user@hive.apache.org" ; Raj Hadoop Sent: Saturday, July 6, 2013 4:32 AM Subject: Re: Loading a flat file + one additional field to a Hive table How about this ? Assume you have a log file called oompaloompa.log TIMESTAMP=$(date +%Y_%m_%d_T%H_%M_%S);mv oompaloopa.log oompaloopa.log.${TIMESTAMP};cat oompaloopa.log.${TIMESTAMP}| hdfs dfs -put - /user/sasubramanian/oompaloopa.log.${TIMESTAMP} This will directly put the file on HDFS and u can put it to the LOCATION specified by your HIVE TABLE definition sanjay From: "manishbh...@rocketmail.com" Reply-To: "user@hive.apache.org" Date: Friday, July 5, 2013 10:39 AM To: Raj Hadoop , Hive Subject: Re: Loading a flat file + one additional field to a Hive table Raj, You should dump the data in a temp table first and then move the data into final table with select query. Select date(), c1,c2. From temp table. Reason: we should avoid custom operation in load unless it is necessary. Sent via Rocket from my HTC - Reply message - From: "Raj Hadoop" To: "Hive" Subject: Loading a flat file + one additional field to a Hive table Date: Fri, Jul 5, 2013 10:30 PM Hi, Can any one please suggest the best way to do the following in Hive? Load 'todays date stamp' + << ALL FIELDS C1,C2,C3,C4 IN A FILE F1 >> to a Hive table T1 ( D1,C1,C2,C3,C4) Can the following command be modified in some way to acheive the above hive > load data local inpath '/software/home/hadoop/dat_files/' into table T1; My requirement is to append a date stamp to a Web log file and then load it to Hive table. Thanks, Raj
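Manish's staging-table suggestion, sketched with the names from the original question (T1 with D1,C1..C4; the staging table t1_staging is hypothetical):

-- 1) load the raw file into a staging table with columns c1..c4
load data local inpath '/software/home/hadoop/dat_files/' into table t1_staging;

-- 2) insert into the final table, prepending today's date stamp
insert into table t1
select from_unixtime(unix_timestamp(), 'yyyy-MM-dd') as d1, c1, c2, c3, c4
from t1_staging;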
Loading a flat file + one additional field to a Hive table
Hi, Can anyone please suggest the best way to do the following in Hive? Load 'today's date stamp' + << ALL FIELDS C1,C2,C3,C4 IN A FILE F1 >> to a Hive table T1 ( D1,C1,C2,C3,C4) Can the following command be modified in some way to achieve the above? hive > load data local inpath '/software/home/hadoop/dat_files/' into table T1; My requirement is to append a date stamp to a Web log file and then load it to a Hive table. Thanks, Raj
Re: How Can I store the Hive query result in one file ?
Adding to that - Multiple files can be concatenated from the directory like Example: cat 000000_0 000001_0 000002_0 > final From: Raj Hadoop To: "user@hive.apache.org" ; "matouk.iftis...@ysance.com" Sent: Friday, July 5, 2013 12:17 AM Subject: Re: How Can I store the Hive query result in one file ? hive > set hive.io.output.fileformat=CSVTextFile; hive > insert overwrite local directory '/usr/home/hadoop/da1/' select * from customers *** customers is a Hive table From: Edward Capriolo To: "user@hive.apache.org" Sent: Friday, July 5, 2013 12:10 AM Subject: Re: How Can I store the Hive query result in one file ? Normally if use set mapred.reduce.tasks=1 you get one output file. You can also look at hive.merge.mapfiles, mapred.reduce.tasks, hive.merge.reducefiles also you can use a separate tool https://github.com/edwardcapriolo/filecrush On Thu, Jul 4, 2013 at 6:38 AM, Nitin Pawar wrote: will hive -e "query" > filename or hive -f query.q > filename will do ? > > >you specially want it to write into a named file on hdfs only? > > > >On Thu, Jul 4, 2013 at 3:12 PM, Matouk IFTISSEN >wrote: > >Hello Hive users, >>Is there a manner to store the Hive query result (SELECT *.) in a >>specfique and alone file (given the file name) like (INSERT OVERWRITE LOCAL >>DIRECTORY '/directory_path_name/')? >>Thanks for your answers >> >> >> > > > >-- >Nitin Pawar >
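Consolidating the thread into one hedged recipe: forcing the final stage down to a single reducer yields a single output file (an ORDER BY guarantees a reduce stage with one reducer; cust_id is a hypothetical column):

set mapred.reduce.tasks=1;

insert overwrite local directory '/usr/home/hadoop/da1/'
select * from customers
order by cust_id;  -- ORDER BY runs in a single reducer, so one output file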
Re: How Can I store the Hive query result in one file ?
hive > set hive.io.output.fileformat=CSVTextFile; hive > insert overwrite local directory '/usr/home/hadoop/da1/' select * from customers *** customers is a Hive table From: Edward Capriolo To: "user@hive.apache.org" Sent: Friday, July 5, 2013 12:10 AM Subject: Re: How Can I store the Hive query result in one file ? Normally if use set mapred.reduce.tasks=1 you get one output file. You can also look at hive.merge.mapfiles, mapred.reduce.tasks, hive.merge.reducefiles also you can use a separate tool https://github.com/edwardcapriolo/filecrush On Thu, Jul 4, 2013 at 6:38 AM, Nitin Pawar wrote: will hive -e "query" > filename or hive -f query.q > filename will do ? > > >you specially want it to write into a named file on hdfs only? > > > >On Thu, Jul 4, 2013 at 3:12 PM, Matouk IFTISSEN >wrote: > >Hello Hive users, >>Is there a manner to store the Hive query result (SELECT *.) in a >>specfique and alone file (given the file name) like (INSERT OVERWRITE LOCAL >>DIRECTORY '/directory_path_name/')? >>Thanks for your answers >> >> >> > > > >-- >Nitin Pawar >
Issue with Oracle Hive Metastore (SEQUENCE_TABLE)
Hi, When I installed Hive earlier on my machine I used an Oracle hive meta script. Please find attached the script. HIVE worked fine for me on this box with no issues. I am trying to install Hive on another machine with a different Oracle metastore. I executed the meta script but I am having issues with my hive on the second box. $ hive WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files. Logging initialized using configuration in jar:file:/software/hadoop/hive/hive-0.9.0/lib/hive-common-0.9.0.jar!/hive-log4j.properties Hive history file=/tmp/hadoop/hive_job_log_hadoop_201307031616_605717324.txt hive> show tables; FAILED: Error in metadata: javax.jdo.JDOException: Couldnt obtain a new sequence (unique id) : ORA-00942: table or view does not exist NestedThrowables: java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask I found the difference between the two metastores and one table is missing: SEQUENCE_TABLE. I do not know whether this table will be created automatically by Hive or whether it should be in the script. I don't remember what I did earlier and I am assuming I used the same script. Has anyone had this issue earlier? Please advise. Also, where can I get the Hive 0.9 Oracle meta script? Thanks, Raj

[Attachment: hive-schema-0.9.0.oracle.sql]
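SEQUENCE_TABLE is created by the schema script, not by Hive at runtime (unless schema auto-creation via datanucleus.autoCreateSchema is enabled), which matches the ORA-00942 here. For reference, the 0.9.0 Oracle script defines it roughly as:

CREATE TABLE SEQUENCE_TABLE
(
   SEQUENCE_NAME VARCHAR2(255) NOT NULL,
   NEXT_VAL NUMBER NOT NULL
);

ALTER TABLE SEQUENCE_TABLE ADD CONSTRAINT SEQUENCE_TABLE_PK PRIMARY KEY (SEQUENCE_NAME);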
Re: Hive Table to CSV file
Sorry. Its my bad. I see the files now. I was looking in a different directory earlier. From: Mohammad Tariq To: user Sent: Monday, July 1, 2013 8:26 PM Subject: Re: Hive Table to CSV file Do you have permissions to write to this path?And make sure you are looking into the local FS, as Stephen has specified. Warm Regards, Tariq cloudfront.blogspot.com On Tue, Jul 2, 2013 at 5:25 AM, Stephen Sprague wrote: you gotta admit that's kinda funny. Your stderr output shows not only once but three times where it put the output and in fact how many rows it put there. and to top it off it reported 'SUCCESS'. > >but you're saying there's nothing there? > >now. call me crazy but i would tend to believe hive over you - but that's just >me. :) > >are you looking at the local filesystem on the same box you ran hive? > > > > >On Mon, Jul 1, 2013 at 4:01 PM, Raj Hadoop wrote: > >Hi, >> >>My requirement is to load data from a (one column) Hive view to a CSV file. >>After loading it, I dont see any file generated. >> >>I used the following commands to load data to file from a view v_june1 >> >> >>hive > set hive.io.output.fileformat=CSVTextFile; >> hive > insert overwrite local directory '/usr/home/hadoop/da1/' select * >>from v_june1_pgnum >> >>.The output at console is like the below. >> >> >> >>MapReduce Total cumulative CPU time: 4 minutes 15 seconds 590 msec >>Ended Job = job_201306141336_0113 >>Copying data to local directory /usr/home/hadoop/da1 >>Copying data to local directory /usr/home/hadoop/da1 >>3281 Rows loaded to /usr/home/hadoop/da1 >>MapReduce Jobs Launched: >>Job 0: Map: 21 Reduce: 6 Cumulative CPU: 255.59 sec HDFS Read: >>5373722496 HDFS Write: 389069 SUCCESS >>Total MapReduce CPU Time Spent: 4 minutes 15 seconds 590 msec >>OK Time taken: 148.764 second >> >> >> >>My Question : I do not see any files created under /usr/home/hadoop/da1. >>Where are the files created? >> >>Thanks, >>Raj >> >> >> >> >
Hive Table to CSV file
Hi, My requirement is to load data from a (one column) Hive view to a CSV file. After loading it, I don't see any file generated. I used the following commands to load data to a file from a view v_june1:

hive > set hive.io.output.fileformat=CSVTextFile;
hive > insert overwrite local directory '/usr/home/hadoop/da1/' select * from v_june1_pgnum

The output at the console is like the below:

MapReduce Total cumulative CPU time: 4 minutes 15 seconds 590 msec
Ended Job = job_201306141336_0113
Copying data to local directory /usr/home/hadoop/da1
Copying data to local directory /usr/home/hadoop/da1
3281 Rows loaded to /usr/home/hadoop/da1
MapReduce Jobs Launched:
Job 0: Map: 21 Reduce: 6 Cumulative CPU: 255.59 sec HDFS Read: 5373722496 HDFS Write: 389069 SUCCESS
Total MapReduce CPU Time Spent: 4 minutes 15 seconds 590 msec
OK Time taken: 148.764 seconds

My Question: I do not see any files created under /usr/home/hadoop/da1. Where are the files created?

Thanks, Raj
TempStatsStore derby.log
Hi, I have my Hive metastore created in an Oracle database. But when I execute my Hive queries, I see the following directory and file created:

TempStatsStore (directory)
derby.log

What are these? Can anyone suggest why a Derby log is created even though my javax.jdo.option.ConnectionURL is pointing to Oracle? Thanks, Raj
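A likely explanation, hedged: these come from Hive's temporary statistics store, not from the metastore - in this era hive.stats.dbclass defaults to jdbc:derby, so stats collection spins up an embedded Derby database (TempStatsStore, derby.log) regardless of where the metastore lives. Disabling automatic stats gathering avoids it:

SET hive.stats.autogather=false;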
Sqoop Oracle Import to Hive Table - Error in metadata: InvalidObjectException
Hi, I am trying to run the following to load an Oracle table to Hive table using Sqoop, sqoop import --connect jdbc:oracle:thin:@//inferri.dm.com:1521/DBRM25 --table DS12.CREDITS --username UPX1 --password piiwer --hive-import Note: DS12 is a schema and UPX1 is the user through which the schema and the table in the schema is accessed. I was able to access the table through sqlplus client tool. I am getting the following error. Can any one identify the issue and let me know please? ERROR exec.Task (SessionState.java:printError(400)) - FAILED: Error in metadata: InvalidObjectException(message:There is no database named ds12) org.apache.hadoop.hive.ql.metadata.HiveException: InvalidObjectException(message:There is no database named ds12) at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:544) at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:3305) at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:242) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:134) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1326) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1118) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:951) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:341) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:439) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:449) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:647) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Caused by: InvalidObjectException(message:There is no database named dw) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table(HiveMetaStore.java:852) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:402) at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:538) ... 20 more 2013-05-25 17:37:14,276 ERROR ql.Driver (SessionState.java:printError(400)) - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask Thanks, Raj
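One hedged workaround: with --hive-import, Sqoop carries the Oracle schema prefix over as a Hive database name, and 'ds12' does not exist on the Hive side. Either create it first, or use Sqoop's --hive-table option to pick an existing database.table target:

CREATE DATABASE IF NOT EXISTS ds12;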
Re: Apache Flume Properties File
Hi, When I am reading all the stuff on internet on Flume, everything is mostly on CDH distribution. I am aware that Flume is Cloudera's contribution but I am using a strict Apache version in my research work. When I was reading all this, I want to make sure from the forum that Apache flume if had any issues with install etc., So that is the reason why I had to sent it to the dist lists. My intention is not to get a silver platter. I am not expecting that. Anyways - sorry for inconvenience. Thanks, Raj From: Stephen Sprague To: user@hive.apache.org; Raj Hadoop Sent: Friday, May 24, 2013 6:32 PM Subject: Re: Apache Flume Properties File so you spammed three big lists there, eh? with a general question for somebody to serve up a solution on a silver platter for you -- all before you even read any documentation on the subject matter? nice job and good luck to you. On Fri, May 24, 2013 at 2:13 PM, Raj Hadoop wrote: Hi, > >I just installed Apache Flume 1.3.1 and trying to run a small example to test. >Can any one suggest me how can I do this? I am going through the documentation >right now. > >Thanks, >Raj
Apache Flume Properties File
Hi, I just installed Apache Flume 1.3.1 and am trying to run a small example to test it. Can anyone suggest how I can do this? I am going through the documentation right now. Thanks, Raj
Sqoop Import Oracle Error - Attempted to generate class with no columns!
Hi, I just finished setting up Apache sqoop 1.4.3. I am trying to test basic sqoop import on Oracle. sqoop import --connect jdbc:oracle:thin:@//intelli.dmn.com:1521/DBT --table usr1.testonetwo --username usr123 --password passwd123 I am getting the error as 13/05/22 17:18:16 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM usr1.testonetwo t WHERE 1=0 13/05/22 17:18:16 ERROR tool.ImportTool: Imported Failed: Attempted to generate class with no columns! I checked the database and the query runs fine from Oracle sqlplus client and Toad. Thanks, Raj
Hive tmp logs
Hi, My Hive job logs are being written to the /tmp/hadoop directory. I want to change it to a different location, i.e. a sub-directory somewhere under the 'hadoop' user home directory. How do I change it? Thanks, Raj
ORA-01950: no privileges on tablespace
I am setting up a metastore on Oracle for Hive. I executed the hive-schema-0.9.0 Oracle SQL script successfully too. When I ran this:

hive > show tables;

I am getting the following error:

ORA-01950: no privileges on tablespace

What kind of Oracle privileges (quota-wise) are required for the Hive metastore user? Please advise.
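ORA-01950 means the metastore user has no quota on its default tablespace, so Hive cannot insert rows there. A hedged one-liner to run as a DBA (user and tablespace names are hypothetical):

ALTER USER hiveuser QUOTA UNLIMITED ON users;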
Re: Where to get Oracle scripts for Hive Metastore
Sanjay - This is the first location I tried. But Apache Hive 0.9.0 doesn't have an oracle folder. It only had mysql and derby. Thanks, Raj From: Sanjay Subramanian To: "u...@hadoop.apache.org" ; Raj Hadoop ; Hive Sent: Tuesday, May 21, 2013 3:12 PM Subject: Re: Where to get Oracle scripts for Hive Metastore Raj The correct location of the script is where u deflated the hive tar For example /usr/lib/hive/scripts/metastore/upgrade/oracle You will find a file in this directory called hive-schema-0.9.0.oracle.sql Use this sanjay From: Raj Hadoop Reply-To: "u...@hadoop.apache.org" , Raj Hadoop Date: Tuesday, May 21, 2013 12:08 PM To: Hive , User Subject: Where to get Oracle scripts for Hive Metastore I am trying to get Oracle scripts for Hive Metastore. http://mail-archives.apache.org/mod_mbox/hive-commits/201204.mbox/%3c20120423201303.9742b2388...@eris.apache.org%3E The scripts in the above link has a + at the begining of each line. How should I supposed to execute scripts like this through Oracle sqlplus. +CREATE TABLE PART_COL_PRIVS +( + PART_COLUMN_GRANT_ID NUMBER NOT NULL, + "COLUMN_NAME" VARCHAR2(128) NULL, + CREATE_TIME NUMBER (10) NOT NULL, + GRANT_OPTION NUMBER (5) NOT NULL, + GRANTOR VARCHAR2(128) NULL, + GRANTOR_TYPE VARCHAR2(128) NULL, + PART_ID NUMBER NULL, + PRINCIPAL_NAME VARCHAR2(128) NULL, + PRINCIPAL_TYPE VARCHAR2(128) NULL, + PART_COL_PRIV VARCHAR2(128) NULL +); +
Re: Where to get Oracle scripts for Hive Metastore
I got it. This is the link: http://svn.apache.org/viewvc/hive/trunk/metastore/scripts/upgrade/oracle/hive-schema-0.9.0.oracle.sql?revision=1329416&view=co&pathrev=1329416 From: Raj Hadoop To: Hive ; User Sent: Tuesday, May 21, 2013 3:08 PM Subject: Where to get Oracle scripts for Hive Metastore I am trying to get the Oracle scripts for the Hive metastore. http://mail-archives.apache.org/mod_mbox/hive-commits/201204.mbox/%3c20120423201303.9742b2388...@eris.apache.org%3E The scripts in the above link have a + at the beginning of each line. How am I supposed to execute scripts like this through Oracle sqlplus? +CREATE TABLE PART_COL_PRIVS +( + PART_COLUMN_GRANT_ID NUMBER NOT NULL, + "COLUMN_NAME" VARCHAR2(128) NULL, + CREATE_TIME NUMBER (10) NOT NULL, + GRANT_OPTION NUMBER (5) NOT NULL, + GRANTOR VARCHAR2(128) NULL, + GRANTOR_TYPE VARCHAR2(128) NULL, + PART_ID NUMBER NULL, + PRINCIPAL_NAME VARCHAR2(128) NULL, + PRINCIPAL_TYPE VARCHAR2(128) NULL, + PART_COL_PRIV VARCHAR2(128) NULL +);
Where to get Oracle scripts for Hive Metastore
I am trying to get the Oracle scripts for the Hive metastore. http://mail-archives.apache.org/mod_mbox/hive-commits/201204.mbox/%3c20120423201303.9742b2388...@eris.apache.org%3E The scripts in the above link have a + at the beginning of each line. How am I supposed to execute scripts like this through Oracle sqlplus? +CREATE TABLE PART_COL_PRIVS +( + PART_COLUMN_GRANT_ID NUMBER NOT NULL, + "COLUMN_NAME" VARCHAR2(128) NULL, + CREATE_TIME NUMBER (10) NOT NULL, + GRANT_OPTION NUMBER (5) NOT NULL, + GRANTOR VARCHAR2(128) NULL, + GRANTOR_TYPE VARCHAR2(128) NULL, + PART_ID NUMBER NULL, + PRINCIPAL_NAME VARCHAR2(128) NULL, + PRINCIPAL_TYPE VARCHAR2(128) NULL, + PART_COL_PRIV VARCHAR2(128) NULL +);
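The leading + on every line is commit-mail diff markup, not part of the DDL, so the script as copied from the archive will not run in sqlplus. Either fetch the raw file from the repository (as the follow-up above does) or strip the prefix. A sketch; the saved file names and credentials here are illustrative assumptions:

# Sketch: remove the leading '+' diff marker from each line, then run the script.
sed 's/^+//' hive-schema-0.9.0.oracle.sql.txt > hive-schema-0.9.0.oracle.sql
sqlplus hiveuser/hivepass @hive-schema-0.9.0.oracle.sql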
Re: hive.metastore.warehouse.dir - Should it point to a physical directory
Thanks Sanjay From: Sanjay Subramanian To: bharath vissapragada ; "user@hive.apache.org" ; Raj Hadoop Cc: User Sent: Tuesday, May 21, 2013 2:27 PM Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical directory Hi Raj http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Quick-Start/cdh4qs_topic_3.html Installing CDH4 on a Single Linux Node in Pseudo-distributed Mode On the left panel of the page u will find info on Hive installation etc. I suggest the CDH4 distribution only because it helps u to get started quickly…as a developer I love to install from individual tar balls but sometimes there is little time to learn and execute There are some great notes here sanjay From: bharath vissapragada Date: Tuesday, May 21, 2013 11:12 AM To: "user@hive.apache.org" , Raj Hadoop Cc: Sanjay Subramanian , User Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical directory Yes ! On Tue, May 21, 2013 at 11:41 PM, Raj Hadoop wrote: So that means I need to create an HDFS directory ( not an OS physical directory ) under Hadoop that needs to be used in the Hive config file for this property. Right? > > > >From: Dean Wampler >To: Raj Hadoop >Cc: Sanjay Subramanian ; >"user@hive.apache.org" ; User >Sent: Tuesday, May 21, 2013 2:06 PM > >Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical >directory > > > >No, you only need a directory in HDFS, which will be "virtually located" >somewhere in your cluster automatically by HDFS. > > >Also there's a typo in your hive.xml: > ><value></value> > >Should be > ><value>/correct/path/in/hdfs/to/your/warehouse/directory</value> > > >On Tue, May 21, 2013 at 1:04 PM, Raj Hadoop wrote: > >Thanks Sanjay. >> >>My environment is like this. >> >>$ echo $HADOOP_HOME >>/software/home/hadoop/hadoop/hadoop-1.1.2 >> >>$ echo $HIVE_HOME >>/software/home/hadoop/hive/hive-0.9.0 >> >>$ id >>uid=50052(hadoop) gid=600(apps) groups=600(apps) >> >> >>So can I do it like this: >> >>$pwd >>/software/home/hadoop/hive/hive-0.9.0 >> >>$mkdir warehouse >> >>$cd /software/home/hadoop/hive/hive-0.9.0/warehouse >> >>$ in hive-site.xml >> <property> >> <name>hive.metastore.warehouse.dir</name> >> <value></value> >> <description>location of default database for the warehouse</description> >> </property> >> >>Where should I create the HDFS directory ? >> >> >>From: Sanjay Subramanian >>To: "user@hive.apache.org" ; Raj Hadoop >>; Dean Wampler >>Cc: User >>Sent: Tuesday, May 21, 2013 1:53 PM >> >>Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical >>directory >> >> >> >>Notes below >> >>From: Raj Hadoop >>Reply-To: "user@hive.apache.org" , Raj Hadoop >> >>Date: Tuesday, May 21, 2013 10:49 AM >>To: Dean Wampler , "user@hive.apache.org" >> >>Cc: User >>Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical >>directory >> >> >> >>Ok. I got it. My questions - >> >>1) Should a local physical directory be created before using this property? >>I created a directory in HDFS during Hive installation >>/user/hive/warehouse >> >> >>My hive-site.xml has the following property defined >> >> <property> >> <name>hive.metastore.warehouse.dir</name> >> <value>/user/hive/warehouse</value> >> <description>location of default database for the warehouse</description> >> </property> >> >>2) Should an HDFS file directory be created from Hadoop before using this >>property?
>>hdfs dfs -mkdir /user/hive/warehouse >>Change the owner:group to hive:hive >> >> >> >>From: Dean Wampler >>To: user@hive.apache.org; Raj Hadoop >>Cc: User >>Sent: Tuesday, May 21, 2013 1:44 PM >>Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical >>directory >> >> >> >>The name is misleading; this is the directory within HDFS where Hive stores >>the data, by default. (External tables can go elsewhere.) It doesn't really >>have anything to do with the metastore. >> >> >>dean >> >> >>On Tue, May 21, 2013 at 12:42 PM, Raj Hadoop wrote: >> >>Can someone help me on this ? I am stuck installing and configuring Hive >>with Oracle. Your timely help is really appreciated. >>> >>> >>> >>>From: Raj Hadoop >>>To: Hive ; User >>>Sent: Tuesday, May 21, 2013 1:08 PM >>>Subject: hive.metastore.warehouse.dir - Should it point to a physical >>>directory
Re: hive.metastore.warehouse.dir - Should it point to a physical directory
So that means I need to create an HDFS directory ( not an OS physical directory ) under Hadoop that needs to be used in the Hive config file for this property. Right? From: Dean Wampler To: Raj Hadoop Cc: Sanjay Subramanian ; "user@hive.apache.org" ; User Sent: Tuesday, May 21, 2013 2:06 PM Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical directory No, you only need a directory in HDFS, which will be "virtually located" somewhere in your cluster automatically by HDFS. Also there's a typo in your hive.xml: <value></value> Should be <value>/correct/path/in/hdfs/to/your/warehouse/directory</value> On Tue, May 21, 2013 at 1:04 PM, Raj Hadoop wrote: Thanks Sanjay. > >My environment is like this. > >$ echo $HADOOP_HOME >/software/home/hadoop/hadoop/hadoop-1.1.2 > >$ echo $HIVE_HOME >/software/home/hadoop/hive/hive-0.9.0 > >$ id >uid=50052(hadoop) gid=600(apps) groups=600(apps) > > >So can I do it like this: > >$pwd >/software/home/hadoop/hive/hive-0.9.0 > >$mkdir warehouse > >$cd /software/home/hadoop/hive/hive-0.9.0/warehouse > >$ in hive-site.xml > <property> > <name>hive.metastore.warehouse.dir</name> > <value></value> > <description>location of default database for the warehouse</description> > </property> > >Where should I create the HDFS directory ? > > >From: Sanjay Subramanian >To: "user@hive.apache.org" ; Raj Hadoop >; Dean Wampler >Cc: User >Sent: Tuesday, May 21, 2013 1:53 PM > >Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical >directory > > > >Notes below > >From: Raj Hadoop >Reply-To: "user@hive.apache.org" , Raj Hadoop > >Date: Tuesday, May 21, 2013 10:49 AM >To: Dean Wampler , "user@hive.apache.org" > >Cc: User >Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical >directory > > > >Ok. I got it. My questions - > >1) Should a local physical directory be created before using this property? >I created a directory in HDFS during Hive installation >/user/hive/warehouse > > >My hive-site.xml has the following property defined > > <property> > <name>hive.metastore.warehouse.dir</name> > <value>/user/hive/warehouse</value> > <description>location of default database for the warehouse</description> > </property> > >2) Should an HDFS file directory be created from Hadoop before using this >property? >hdfs dfs -mkdir /user/hive/warehouse >Change the owner:group to hive:hive > > > >From: Dean Wampler >To: user@hive.apache.org; Raj Hadoop >Cc: User >Sent: Tuesday, May 21, 2013 1:44 PM >Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical >directory > > > >The name is misleading; this is the directory within HDFS where Hive stores >the data, by default. (External tables can go elsewhere.) It doesn't really >have anything to do with the metastore. > > >dean > > >On Tue, May 21, 2013 at 12:42 PM, Raj Hadoop wrote: > >Can someone help me on this ? I am stuck installing and configuring Hive with >Oracle. Your timely help is really appreciated. >> >> >> >>From: Raj Hadoop >>To: Hive ; User >>Sent: Tuesday, May 21, 2013 1:08 PM >>Subject: hive.metastore.warehouse.dir - Should it point to a physical >>directory >> >> >> >>Hi, >> >>I am configuring Hive. I have a question on the property >>hive.metastore.warehouse.dir. >> >>Should this point to a physical directory? I am guessing it is a logical >>directory under the Hadoop fs.default.name. Please advise whether I need to >>create any directory for the variable hive.metastore.warehouse.dir. >> >>Thanks, >>Raj >> >> > > > >-- >Dean Wampler, Ph.D.
>@deanwampler >http://polyglotprogramming.com/ -- Dean Wampler, Ph.D. @deanwampler http://polyglotprogramming.com/
Re: hive.metastore.warehouse.dir - Should it point to a physical directory
Yes, that's what I meant - a local physical directory. Thanks. From: bharath vissapragada To: user@hive.apache.org; Raj Hadoop Cc: User Sent: Tuesday, May 21, 2013 1:59 PM Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical directory Hi, If by "local physical directory" you mean a directory in the underlying OS file system, then no. You just need to create a directory in HDFS and add it to that xml config file. Thanks, On Tue, May 21, 2013 at 11:19 PM, Raj Hadoop wrote: Ok. I got it. My questions - > >1) Should a local physical directory be created before using this property? >2) Should an HDFS file directory be created from Hadoop before using this >property? > > > > >From: Dean Wampler >To: user@hive.apache.org; Raj Hadoop >Cc: User >Sent: Tuesday, May 21, 2013 1:44 PM >Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical >directory > > > >The name is misleading; this is the directory within HDFS where Hive stores >the data, by default. (External tables can go elsewhere.) It doesn't really >have anything to do with the metastore. > > >dean > > >On Tue, May 21, 2013 at 12:42 PM, Raj Hadoop wrote: > >Can someone help me on this ? I am stuck installing and configuring Hive with >Oracle. Your timely help is really appreciated. >> >> >> >>From: Raj Hadoop >>To: Hive ; User >>Sent: Tuesday, May 21, 2013 1:08 PM >>Subject: hive.metastore.warehouse.dir - Should it point to a physical >>directory >> >> >> >>Hi, >> >>I am configuring Hive. I have a question on the property >>hive.metastore.warehouse.dir. >> >>Should this point to a physical directory? I am guessing it is a logical >>directory under the Hadoop fs.default.name. Please advise whether I need to >>create any directory for the variable hive.metastore.warehouse.dir. >> >>Thanks, >>Raj >> >> > > > >-- >Dean Wampler, Ph.D. >@deanwampler >http://polyglotprogramming.com/ > >
Re: hive.metastore.warehouse.dir - Should it point to a physical directory
Thanks Sanjay. My environment is like this. $ echo $HADOOP_HOME /software/home/hadoop/hadoop/hadoop-1.1.2 $ echo $HIVE_HOME /software/home/hadoop/hive/hive-0.9.0 $ id uid=50052(hadoop) gid=600(apps) groups=600(apps) So can I do it like this: $pwd /software/home/hadoop/hive/hive-0.9.0 $mkdir warehouse $cd /software/home/hadoop/hive/hive-0.9.0/warehouse $ in hive-site.xml <property> <name>hive.metastore.warehouse.dir</name> <value></value> <description>location of default database for the warehouse</description> </property> Where should I create the HDFS directory ? From: Sanjay Subramanian To: "user@hive.apache.org" ; Raj Hadoop ; Dean Wampler Cc: User Sent: Tuesday, May 21, 2013 1:53 PM Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical directory Notes below From: Raj Hadoop Reply-To: "user@hive.apache.org" , Raj Hadoop Date: Tuesday, May 21, 2013 10:49 AM To: Dean Wampler , "user@hive.apache.org" Cc: User Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical directory Ok. I got it. My questions - 1) Should a local physical directory be created before using this property? I created a directory in HDFS during Hive installation /user/hive/warehouse My hive-site.xml has the following property defined <property> <name>hive.metastore.warehouse.dir</name> <value>/user/hive/warehouse</value> <description>location of default database for the warehouse</description> </property> 2) Should an HDFS file directory be created from Hadoop before using this property? hdfs dfs -mkdir /user/hive/warehouse Change the owner:group to hive:hive From: Dean Wampler To: user@hive.apache.org; Raj Hadoop Cc: User Sent: Tuesday, May 21, 2013 1:44 PM Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical directory The name is misleading; this is the directory within HDFS where Hive stores the data, by default. (External tables can go elsewhere.) It doesn't really have anything to do with the metastore. dean On Tue, May 21, 2013 at 12:42 PM, Raj Hadoop wrote: Can someone help me on this ? I am stuck installing and configuring Hive with Oracle. Your timely help is really appreciated. > > > >From: Raj Hadoop >To: Hive ; User >Sent: Tuesday, May 21, 2013 1:08 PM >Subject: hive.metastore.warehouse.dir - Should it point to a physical directory > > > >Hi, > >I am configuring Hive. I have a question on the property >hive.metastore.warehouse.dir. > >Should this point to a physical directory? I am guessing it is a logical >directory under the Hadoop fs.default.name. Please advise whether I need to create >any directory for the variable hive.metastore.warehouse.dir. > >Thanks, >Raj > > -- Dean Wampler, Ph.D. @deanwampler http://polyglotprogramming.com/
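Pulling the thread's answers together: hive.metastore.warehouse.dir names a path in HDFS, so the directory is created with the Hadoop filesystem client rather than a local mkdir, and the same HDFS path goes into hive-site.xml. A sketch against the hadoop-1.1.2 / hive-0.9.0 setup described above, using the /user/hive/warehouse path from the thread:

# Sketch: create the warehouse directory in HDFS (hadoop-1.1.2 client syntax)
# and make it group-writable so Hive can create databases under it.
hadoop fs -mkdir /user/hive/warehouse
hadoop fs -chmod g+w /user/hive/warehouse

Then in hive-site.xml:

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>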
Re: hive.metastore.warehouse.dir - Should it point to a physical directory
Ok. I got it. My questions - 1) Should a local physical directory be created before using this property? 2) Should an HDFS file directory be created from Hadoop before using this property? From: Dean Wampler To: user@hive.apache.org; Raj Hadoop Cc: User Sent: Tuesday, May 21, 2013 1:44 PM Subject: Re: hive.metastore.warehouse.dir - Should it point to a physical directory The name is misleading; this is the directory within HDFS where Hive stores the data, by default. (External tables can go elsewhere.) It doesn't really have anything to do with the metastore. dean On Tue, May 21, 2013 at 12:42 PM, Raj Hadoop wrote: Can someone help me on this ? I am stuck installing and configuring Hive with Oracle. Your timely help is really appreciated. > > > >From: Raj Hadoop >To: Hive ; User >Sent: Tuesday, May 21, 2013 1:08 PM >Subject: hive.metastore.warehouse.dir - Should it point to a physical directory > > > >Hi, > >I am configuring Hive. I have a question on the property >hive.metastore.warehouse.dir. > >Should this point to a physical directory? I am guessing it is a logical >directory under the Hadoop fs.default.name. Please advise whether I need to create >any directory for the variable hive.metastore.warehouse.dir. > >Thanks, >Raj > > -- Dean Wampler, Ph.D. @deanwampler http://polyglotprogramming.com/
Re: hive.metastore.warehouse.dir - Should it point to a physical directory
Can someone help me on this ? I am stuck installing and configuring Hive with Oracle. Your timely help is really appreciated. From: Raj Hadoop To: Hive ; User Sent: Tuesday, May 21, 2013 1:08 PM Subject: hive.metastore.warehouse.dir - Should it point to a physical directory Hi, I am configuring Hive. I have a question on the property hive.metastore.warehouse.dir. Should this point to a physical directory? I am guessing it is a logical directory under the Hadoop fs.default.name. Please advise whether I need to create any directory for the variable hive.metastore.warehouse.dir. Thanks, Raj
hive.metastore.warehouse.dir - Should it point to a physical directory
Hi, I am configuring Hive. I have a question on the property hive.metastore.warehouse.dir. Should this point to a physical directory? I am guessing it is a logical directory under the Hadoop fs.default.name. Please advise whether I need to create any directory for the variable hive.metastore.warehouse.dir. Thanks, Raj