Loading data in Hive 0.11 - permission issue

2013-08-05 Thread Sachin Sudarshana
Hi,

I'm using Hive 0.11, downloaded the tarball from Apache's website.

I have a Linux user called admin, and I invoke the Hive CLI as this user.

In the hive terminal I created a table as follows:

hive> create table ptest (pkey INT, skey INT, fkey INT, rkey INT, units INT)
    > row format delimited fields terminated by ',' lines terminated by '\n'
    > stored as textfile;
OK
Time taken: 0.241 seconds

When I try to load data into the table, I get the following error:

hive> LOAD DATA LOCAL INPATH '/home/admin/sample.csv' OVERWRITE INTO TABLE ptest;
Copying data from file:/home/admin/sample.csv
Copying file: file:/home/admin/sample.csv
Loading data to table default.ptest
rmr: DEPRECATED: Please use 'rm -r' instead.
rmr: Permission denied: user=admin, access=ALL, inode="/user/hive_0.11/warehouse/ptest":root:hive:drwxr-xr-x
Failed with exception Permission denied: user=admin, access=ALL, inode="/user/hive_0.11/warehouse/ptest":root:hive:drwxr-xr-x
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:205)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkSubAccess(FSPermissionChecker.java:174)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:144)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4684)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:2794)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:2757)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2740)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:621)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:406)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44094)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask


When I looked into the warehouse directory:

hive> dfs -ls /user/hive_0.11/warehouse;
Found 1 item
drwxr-xr-x   - root hive          0 2013-08-05 17:04 /user/hive_0.11/warehouse/ptest

It seems the directory is owned by root, even though the table was created from the Hive CLI invoked as the user admin.

I'm unable to figure out why the owner of the table has been assigned as root. Could anyone please help me out?
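
For what it's worth, a workaround I could try (a sketch, not a confirmed fix: it assumes the hdfs superuser account is available on the node and that the warehouse directory ought to belong to the user running the CLI) is to inspect and, if appropriate, reassign ownership from HDFS:

  # check who currently owns the table directory
  sudo -u hdfs hadoop fs -ls /user/hive_0.11/warehouse

  # hand the directory to the user invoking Hive (admin here; the hive group is illustrative)
  sudo -u hdfs hadoop fs -chown -R admin:hive /user/hive_0.11/warehouse/ptest

After that, LOAD DATA ... OVERWRITE should be able to delete and rewrite the directory contents as admin.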

Thank you,
Sachin


Sequence file compression in Hive

2013-06-10 Thread Sachin Sudarshana
Hi,

I have a table stored as a SEQUENCEFILE in Hive 0.10, facts520_normal_seq.

Now I wish to create another table, also stored as a SEQUENCEFILE, but compressed using the Gzip codec.

So, I set the compression codec and the compression type (BLOCK), and then executed the following query:

SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
SET mapred.output.compression.type=BLOCK;

create table test1facts520_gzip_seq as select * from facts520_normal_seq;

The table got created and was compressed as well.

[root@aana1 comp_data]# sudo -u hdfs hadoop fs -ls /user/hive/warehouse/facts_520.db/test1facts520_gzip_seq
Found 5 items
-rw-r--r--   3 admin supergroup   38099145 2013-06-10 17:56 /user/hive/warehouse/facts_520.db/test1facts520_gzip_seq/00_0.gz
-rw-r--r--   3 admin supergroup   31450189 2013-06-10 17:56 /user/hive/warehouse/facts_520.db/test1facts520_gzip_seq/01_0.gz
-rw-r--r--   3 admin supergroup   20764259 2013-06-10 17:56 /user/hive/warehouse/facts_520.db/test1facts520_gzip_seq/02_0.gz
-rw-r--r--   3 admin supergroup   21107597 2013-06-10 17:56 /user/hive/warehouse/facts_520.db/test1facts520_gzip_seq/03_0.gz
-rw-r--r--   3 admin supergroup   12202692 2013-06-10 17:56 /user/hive/warehouse/facts_520.db/test1facts520_gzip_seq/04_0.gz

However, when I checked the table properties, I was surprised to see that the table had been stored as a textfile!

hive> show create table test1facts520_gzip_seq;
OK
CREATE  TABLE test1facts520_gzip_seq(
  fact_key bigint,
  products_key int,
  retailers_key int,
  suppliers_key int,
  time_key int,
  units int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://aana1.ird.com/user/hive/warehouse/facts_520.db/test1facts520_gzip_seq'
TBLPROPERTIES (
  'numPartitions'='0',
  'numFiles'='5',
  'transient_lastDdlTime'='1370867198',
  'numRows'='0',
  'totalSize'='123623882',
  'rawDataSize'='0')
Time taken: 0.15 seconds

So, I tried adding the STORED AS clause to my earlier create table statement and created a new table:

create table test3facts520_gzip_seq STORED AS SEQUENCEFILE as select * from facts520_normal_seq;

This time, the output table got stored as a SEQUENCEFILE:

hive> show create table test3facts520_gzip_seq;
OK
CREATE  TABLE test3facts520_gzip_seq(
  fact_key bigint,
  products_key int,
  retailers_key int,
  suppliers_key int,
  time_key int,
  units int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.SequenceFileInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION
  'hdfs://aana1.ird.com/user/hive/warehouse/facts_520.db/test3facts520_gzip_seq'
TBLPROPERTIES (
  'numPartitions'='0',
  'numFiles'='5',
  'transient_lastDdlTime'='137086',
  'numRows'='0',
  'totalSize'='129811519',
  'rawDataSize'='0')
Time taken: 0.135 seconds

But, the compression itself did not happen!

[root@aana1 comp_data]# sudo -u hdfs hadoop fs -ls /user/hive/warehouse/facts_520.db/test3facts520_gzip_seq
Found 5 items
-rw-r--r--   3 admin supergroup   40006368 2013-06-10 18:06 /user/hive/warehouse/facts_520.db/test3facts520_gzip_seq/00_0
-rw-r--r--   3 admin supergroup   33026961 2013-06-10 18:06 /user/hive/warehouse/facts_520.db/test3facts520_gzip_seq/01_0
-rw-r--r--   3 admin supergroup   21797242 2013-06-10 18:05 /user/hive/warehouse/facts_520.db/test3facts520_gzip_seq/02_0
-rw-r--r--   3 admin supergroup   22171637 2013-06-10 18:05 /user/hive/warehouse/facts_520.db/test3facts520_gzip_seq/03_0
-rw-r--r--   3 admin supergroup   12809311 2013-06-10 18:05 /user/hive/warehouse/facts_520.db/test3facts520_gzip_seq/04_0

Is there anything I have done wrong, or have I missed something?
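
For reference, a combined session along these lines might behave as intended (a sketch, assuming the SET commands are issued in the same CLI session as the CTAS, since they are session-scoped, and that the missing piece in the first attempt was the STORED AS clause, which otherwise falls back to hive.default.fileformat, i.e. TextFile):

  -- compression settings only affect the current session
  SET hive.exec.compress.output=true;
  SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
  SET mapred.output.compression.type=BLOCK;

  -- name the file format explicitly; test4facts520_gzip_seq is an illustrative name
  create table test4facts520_gzip_seq
  STORED AS SEQUENCEFILE
  as select * from facts520_normal_seq;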

Any help would be greatly appreciated!

Thank you,
Sachin


Compression in Hive

2013-06-09 Thread Sachin Sudarshana
Hi,

I have been testing the usefulness of compression in Hive, and I have a general question.

I would like to know whether there are particular cases where compression in Hive actually proves useful while running MR jobs.
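
One concrete case worth measuring (a sketch, not a benchmark claim: compressing the intermediate data between MR stages tends to pay off for shuffle-heavy queries on I/O-bound clusters) is intermediate compression:

  -- compress the map output written between the stages of a multi-stage query
  SET hive.exec.compress.intermediate=true;
  SET mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;

  -- then run a shuffle-heavy query (table name borrowed from the other threads here)
  -- and compare the job's shuffle/HDFS counters with and without the settings
  select products_key, count(*) from facts520_normal_seq group by products_key;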

Any pointers/examples would really be useful!

Thank you,
Sachin


Compression in Hive using different file formats

2013-06-09 Thread Sachin Sudarshana
Hi,

I was testing compression in Hive using different file formats.

I have a table stored as a sequence file, facts_normal_seq.

Now I wish to create another table, facts_snappy_seq, using the Snappy compression codec.

Is this the correct way to do this?

CREATE TABLE facts_snappy_seq (<column definitions>) ROW FORMAT <row format>
STORED AS SEQUENCEFILE;

SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;

INSERT OVERWRITE TABLE facts_snappy_seq SELECT * FROM facts_normal_seq;

When I populate the table in this manner, the files in HDFS do not seem to have the .snappy extension.
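
From what I have read, that may actually be expected (an assumption based on how sequence files work: the codec is recorded inside the file header, so compressed sequence files keep plain names, unlike compressed text output, which gets a .gz or .snappy suffix). One rough check is to look for the codec class name in the file header:

  # the header of a compressed sequence file names its codec; the path is illustrative
  hadoop fs -cat /user/hive/warehouse/facts_snappy_seq/000000_0 | head -c 200

If the output begins with SEQ and mentions org.apache.hadoop.io.compress.SnappyCodec, the data is Snappy-compressed despite the plain file name.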

Any pointers in this regard would really be helpful!

Thank you,
Sachin


Re: Textfile compression using Gzip codec

2013-06-06 Thread Sachin Sudarshana
Hi Stephen,

Thank you for your reply.

But it's the silliest of errors on my side: a typo!

The codec is org.apache.hadoop.io.compress.GzipCodec, and not org.apache.hadoop.io.compress.GZipCodec.

I regret making that mistake.
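
For anyone who lands on the same stack trace, the settings from this thread with the corrected class name are:

  SET hive.exec.compress.output=true;
  SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
  SET mapred.output.compression.type=BLOCK;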

Thank you,
Sachin


On Thu, Jun 6, 2013 at 10:07 PM, Stephen Sprague  wrote:

> Hi Sachin,
> Like you say, it looks like something to do with the GZipCodec all right. And
> that would make sense given your original problem.
>
> Yeah, one would think it'd be in there by default, but for whatever reason
> it's not finding it. At least the problem is now identified.
>
> Now _my guess_ is that maybe your hadoop core-site.xml file might need to
> list the codecs available under the property name:
> "io.compression.codecs".  Can you chase that up as a possibility and let us
> know what you find out?
>
>
>
>
> On Thu, Jun 6, 2013 at 4:02 AM, Sachin Sudarshana wrote:
>
>> Hi Stephen,
>>
>> hive> show create table facts520_normal_text;
>> OK
>> CREATE  TABLE facts520_normal_text(
>>   fact_key bigint,
>>   products_key int,
>>   retailers_key int,
>>   suppliers_key int,
>>   time_key int,
>>   units int)
>> ROW FORMAT DELIMITED
>>   FIELDS TERMINATED BY ','
>>   LINES TERMINATED BY '\n'
>> STORED AS INPUTFORMAT
>>   'org.apache.hadoop.mapred.TextInputFormat'
>> OUTPUTFORMAT
>>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
>> LOCATION
>>   'hdfs://aana1.ird.com/user/hive/warehouse/facts_520.db/facts520_normal_text'
>> TBLPROPERTIES (
>>   'numPartitions'='0',
>>   'numFiles'='1',
>>   'transient_lastDdlTime'='1369395430',
>>   'numRows'='0',
>>   'totalSize'='545216508',
>>   'rawDataSize'='0')
>> Time taken: 0.353 seconds
>>
>>
>> The task's error log shows this:
>>
>> java.lang.IllegalArgumentException: Compression codec org.apache.hadoop.io.compress.GZipCodec was not found.
>>     at org.apache.hadoop.mapred.FileOutputFormat.getOutputCompressorClass(FileOutputFormat.java:85)
>>     at org.apache.hadoop.hive.ql.exec.Utilities.getFileExtension(Utilities.java:934)
>>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:469)
>>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:543)
>>     at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
>>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
>>     at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
>>     at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
>>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
>>     at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
>>     at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
>>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
>>     at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:546)
>>     at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
>>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:262)
>> Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.io.compress.GZipCodec not found
>>     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1493)
>>     at org.apache.hadoop.mapred.FileOutputFormat.getOutputCompressorClass(FileOutputFormat.java:82)
>>     ... 21 more
>> java.lang.IllegalArgumentException: Compression codec org.apache.hadoop.io.compress.GZipCodec was not found.
>>     at org.apache.hadoop.mapred.FileOutputFormat.getOutputCompressorClass(FileOutputFormat.java:8

Re: Textfile compression using Gzip codec

2013-06-06 Thread Sachin Sudarshana
...was not found.
    at org.apache.hadoop.mapred.FileOutputFormat.getOutputCompressorClass(FileOutputFormat.java:85)
    at org.apache.hadoop.hive.ql.exec.Utilities.getFileExtension(Utilities.java:934)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:469)
    ... 14 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.io.compress.GZipCodec not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1493)
    at org.apache.hadoop.mapred.FileOutputFormat.getOutputCompressorClass(FileOutputFormat.java:82)
    ... 16 more

It says that GZipCodec is not found.
Aren't the Snappy, Gzip and Bzip2 codecs available in Hadoop by default?

Thank you,
Sachin





On Wed, Jun 5, 2013 at 11:58 PM, Stephen Sprague  wrote:

> well...   the hiveException has the word "metadata" in it.  maybe that's a
> hint or a red-herring. :)    Let's try the following:
>
> 1.  show create table facts520_normal_text;
>
> 2.  anything useful at this URL?
> http://aana1.ird.com:50030/taskdetails.jsp?jobid=job_201306051948_0010&tipid=task_201306051948_0010_m_02
> or is it just the same stack dump?
>
>
> On Wed, Jun 5, 2013 at 3:17 AM, Sachin Sudarshana wrote:
>
>> Hi,
>>
>> I have Hive 0.10 (+ CDH 4.2.1 patches) installed on my cluster.
>>
>> I have a table facts520_normal_text stored as a textfile. I'm trying to
>> create a compressed table from this table using the GZip codec.
>>
>> hive> SET hive.exec.compress.output=true;
>> hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GZipCodec;
>> hive> SET mapred.output.compression.type=BLOCK;
>>
>> hive>
>>     > Create table facts520_gzip_text
>>     > (fact_key BIGINT,
>>     > products_key INT,
>>     > retailers_key INT,
>>     > suppliers_key INT,
>>     > time_key INT,
>>     > units INT)
>>     > ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
>>     > LINES TERMINATED BY '\n'
>>     > STORED AS TEXTFILE;
>>
>> hive> INSERT OVERWRITE TABLE facts520_gzip_text SELECT * from facts520_normal_text;
>>
>> When I run the above queries, the MR job fails.
>>
>> The error that the Hive CLI itself shows is the following:
>>
>> Total MapReduce jobs = 3
>> Launching Job 1 out of 3
>> Number of reduce tasks is set to 0 since there's no reduce operator
>> Starting Job = job_201306051948_0010, Tracking URL =
>> http://aana1.ird.com:50030/jobdetails.jsp?jobid=job_201306051948_0010
>> Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_201306051948_0010
>> Hadoop job information for Stage-1: number of mappers: 3; number of
>> reducers: 0
>> 2013-06-05 21:09:42,281 Stage-1 map = 0%,  reduce = 0%
>> 2013-06-05 21:10:11,446 Stage-1 map = 100%,  reduce = 100%
>> Ended Job = job_201306051948_0010 with errors
>> Error during job, obtaining debugging information...
>> Job Tracking URL:
>> http://aana1.ird.com:50030/jobdetails.jsp?jobid=job_201306051948_0010
>> Examining task ID: task_201306051948_0010_m_04 (and more) from job
>> job_201306051948_0010
>> Examining task ID: task_201306051948_0010_m_01 (and more) from job
>> job_201306051948_0010
>>
>> Task with the most failures(4):
>> -----
>> Task ID:
>>   task_201306051948_0010_m_02
>>
>> URL:
>>   http://aana1.ird.com:50030/taskdetails.jsp?jobid=job_201306051948_0010&tipid=task_201306051948_0010_m_02
>> -----
>> Diagnostic Messages for this Task:
>> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while
>> processing row
>> {"fact_key":7549094,"products_key":205,"retailers_key":304,"suppliers_key":402,"time_key":103,"units":23}
>>     at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
>>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>     at
>

Textfile compression using Gzip codec

2013-06-05 Thread Sachin Sudarshana
Hi,

I have Hive 0.10 (+ CDH 4.2.1 patches) installed on my cluster.

I have a table facts520_normal_text stored as a textfile. I'm trying to create a compressed table from this table using the GZip codec.

hive> SET hive.exec.compress.output=true;
hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GZipCodec;
hive> SET mapred.output.compression.type=BLOCK;

hive>
    > Create table facts520_gzip_text
    > (fact_key BIGINT,
    > products_key INT,
    > retailers_key INT,
    > suppliers_key INT,
    > time_key INT,
    > units INT)
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    > LINES TERMINATED BY '\n'
    > STORED AS TEXTFILE;

hive> INSERT OVERWRITE TABLE facts520_gzip_text SELECT * from facts520_normal_text;


When I run the above queries, the MR job fails.

The error that the Hive CLI itself shows is the following:

Total MapReduce jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201306051948_0010, Tracking URL =
http://aana1.ird.com:50030/jobdetails.jsp?jobid=job_201306051948_0010
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_201306051948_0010
Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 0
2013-06-05 21:09:42,281 Stage-1 map = 0%,  reduce = 0%
2013-06-05 21:10:11,446 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201306051948_0010 with errors
Error during job, obtaining debugging information...
Job Tracking URL:
http://aana1.ird.com:50030/jobdetails.jsp?jobid=job_201306051948_0010
Examining task ID: task_201306051948_0010_m_04 (and more) from job job_201306051948_0010
Examining task ID: task_201306051948_0010_m_01 (and more) from job job_201306051948_0010

Task with the most failures(4):
-----
Task ID:
  task_201306051948_0010_m_02

URL:
  http://aana1.ird.com:50030/taskdetails.jsp?jobid=job_201306051948_0010&tipid=task_201306051948_0010_m_02
-----
Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while
processing row
{"fact_key":7549094,"products_key":205,"retailers_key":304,"suppliers_key":402,"time_key":103,"units":23}
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while
processing row
{"fact_key":7549094,"products_key":205,"retailers_key":304,"suppliers_key":402,"time_key":103,"units":23}
    at org.apach

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 3   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec


I'm unable to figure out why this is happening. It looks like the data cannot be written out properly.
Or is it that the GZip codec is not supported for textfiles?
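
(As it turned out in the replies archived above, the failure came down to a typo in the codec class name. A corrected session, a sketch in which only the class name changes, would be:

  -- note the spelling: GzipCodec, not GZipCodec
  SET hive.exec.compress.output=true;
  SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
  SET mapred.output.compression.type=BLOCK;

  INSERT OVERWRITE TABLE facts520_gzip_text SELECT * from facts520_normal_text;)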

Any help in this issue is greatly appreciated!

Thank you,
Sachin


Re: io.compression.codecs not found

2013-05-23 Thread Sachin Sudarshana
Hi Bejoy,

Thanks for the reply.
I would like to know which codecs are available by default in the Hadoop system, from which I can choose what to set in core-site.xml.

For example, the LZO compression codecs are not available by default, and we have to install the required libraries for them.
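
For a future reader, the property can also be inspected or set per session from the Hive CLI (the codec list below is illustrative, not authoritative: the stock set varies by Hadoop version and distribution, so check your own core-site.xml):

  -- print the configured value; an unset property is reported as undefined
  SET io.compression.codecs;

  -- or set it for the session; this list is an assumption about common defaults
  SET io.compression.codecs=org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec;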

Thank you,
Sachin


On Thu, May 23, 2013 at 7:55 PM,  wrote:

>
> Go to $HADOOP_HOME/config open and edit core-site.xml
>
> Add a new property 'io.compression.codecs' and assign the required
> compression codecs as its value.
> Regards
> Bejoy KS
>
> Sent from remote device, Please excuse typos
> --
> *From: * Sachin Sudarshana 
> *Date: *Thu, 23 May 2013 19:46:37 +0530
> *To: *
> *ReplyTo: * user@hive.apache.org
> *Subject: *Re: io.compression.codecs not found
>
> Hi,
>
> I'm not using CM. I have installed CDH 4.2.1 using Linux packages.
>
> Thank you,
> Sachin
>
>
> On Thu, May 23, 2013 at 7:13 PM, Sanjay Subramanian <
> sanjay.subraman...@wizecommerce.com> wrote:
>
>> This property needs to be set in core-site.xml. If you are using Cloudera
>> Manager then ping me and I will tell you how to set it there. Out of the
>> box, Hive works beautifully with gzip and snappy. And if you are using LZO,
>> it needs some plumbing. Depending on what your use case is, I can provide guidance.
>>
>> Regards
>> Sanjay
>>
>> Sent from my iPhone
>>
>> On May 23, 2013, at 3:33 AM, "Sachin Sudarshana" 
>> wrote:
>>
>> > Hi,
>> >
>> > I'm trying to run some queries on compressed tables in Hive 0.10. I
>> > wish to know which compression codecs are available for me to use.
>> > However, when I run set io.compression.codecs in the Hive CLI, it
>> > throws an error saying io.compression.codecs is not found.
>> >
>> > I'm unable to figure out why it's happening. Has it (the hiveconf
>> > property) been removed in 0.10?
>> >
>> > Any help is greatly appreciated!
>> >
>> > Thank you,
>> > Sachin
>> >
>>
>


Re: io.compression.codecs not found

2013-05-23 Thread Sachin Sudarshana
Hi,

I'm not using CM. I have installed CDH 4.2.1 using Linux packages.

Thank you,
Sachin


On Thu, May 23, 2013 at 7:13 PM, Sanjay Subramanian <
sanjay.subraman...@wizecommerce.com> wrote:

> This property needs to be set in core-site.xml. If you are using Cloudera
> Manager then ping me and I will tell you how to set it there. Out of the
> box, Hive works beautifully with gzip and snappy. And if you are using LZO,
> it needs some plumbing. Depending on what your use case is, I can provide guidance.
>
> Regards
> Sanjay
>
> Sent from my iPhone
>
> On May 23, 2013, at 3:33 AM, "Sachin Sudarshana" 
> wrote:
>
> > Hi,
> >
> > I'm trying to run some queries on compressed tables in Hive 0.10. I wish
> > to know which compression codecs are available for me to use.
> > However, when I run set io.compression.codecs in the Hive CLI, it throws
> > an error saying io.compression.codecs is not found.
> >
> > I'm unable to figure out why it's happening. Has it (the hiveconf
> > property) been removed in 0.10?
> >
> > Any help is greatly appreciated!
> >
> > Thank you,
> > Sachin
> >
>


io.compression.codecs not found

2013-05-23 Thread Sachin Sudarshana
Hi,

I'm trying to run some queries on compressed tables in Hive 0.10, and I wish to know which compression codecs are available for me to use.
However, when I run set io.compression.codecs in the Hive CLI, it throws an error saying io.compression.codecs is not found.

I'm unable to figure out why this is happening. Has it (the hiveconf property) been removed in 0.10?

Any help is greatly appreciated!

Thank you,
Sachin


Re: Finding maximum across a row

2013-03-01 Thread Sachin Sudarshana
Hi Bejoy,

I am new to UDFs in Hive. Could you send me any links/tutorials from which I can learn how to write a UDF?

Thanks!
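
From what I gather so far, the registration side of a custom UDF is short (a sketch with hypothetical names: row_max.jar and the RowMax class are illustrative, and the class itself still has to be written in Java by extending org.apache.hadoop.hive.ql.exec.UDF):

  -- make the jar containing the UDF class visible to this session
  ADD JAR /path/to/row_max.jar;

  -- bind a SQL name to the implementing class
  CREATE TEMPORARY FUNCTION row_max AS 'com.example.hive.udf.RowMax';

  -- then call it like any built-in function
  select row_max(ColA, ColB, ColC) from my_table;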

On Fri, Mar 1, 2013 at 10:22 PM,  wrote:

>
> Hi Sachin
>
> AFAIK there isn't one at the moment. But you can easily achieve this using
> a custom UDF.
> Regards
> Bejoy KS
>
> Sent from remote device, Please excuse typos
> ----------
> *From: * Sachin Sudarshana 
> *Date: *Fri, 1 Mar 2013 22:16:37 +0530
> *To: *
> *ReplyTo: * user@hive.apache.org
> *Subject: *Finding maximum across a row
>
> Hi,
>
> Is there any function/method to find the maximum across a row in Hive?
>
> Suppose I have a table like this:
>
> ColA   ColB   ColC
> 2  5  7
> 3  2  1
>
> I want the function to return
>
> 7
> 1
>
>
> It's urgently required. Any help would be greatly appreciated!
>
>
>
> --
> Thanks and Regards,
> Sachin Sudarshana
>



-- 
Thanks and Regards,
Sachin Sudarshana


Finding maximum across a row

2013-03-01 Thread Sachin Sudarshana
Hi,

Is there any function/method to find the maximum across a row in Hive?

Suppose I have a table like this:

ColA   ColB   ColC
2  5  7
3  2  1

I want the function to return

7
1


It's urgently required. Any help would be greatly appreciated!
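
For what it's worth, a plain HiveQL workaround (a sketch using only a CASE expression; my_table and the column names follow the example above, and non-NULL values are assumed) avoids a UDF entirely:

  select case
           when ColA >= ColB and ColA >= ColC then ColA
           when ColB >= ColC then ColB
           else ColC
         end as row_max
  from my_table;

Hive of this vintage has no built-in greatest() for this (that arrived in later releases), so the CASE form or a custom UDF are the usual options.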



-- 
Thanks and Regards,
Sachin Sudarshana


Request to add me onto the list

2013-03-01 Thread Sachin Sudarshana
Hi,

I request that you kindly add me to this list.

-- 
Thanks and Regards,
Sachin Sudarshana


Re: Security for Hive

2013-02-22 Thread Sachin Sudarshana
Hi,
I have read about roles, user privileges, group privileges, etc.
But these roles can be created by any user for any database/table. I would
like to know if there is a specific 'administrator' for Hive who can log on
with his credentials and who alone is entitled to create roles, grant
privileges, etc.
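
To make the concern concrete (a sketch in the syntax of the authorization wiki page linked in the reply below; the role, table, and user names are hypothetical): in this version of Hive, any CLI user can run statements like the following, with no built-in superuser check in front of them:

  CREATE ROLE analysts;
  GRANT SELECT ON TABLE some_table TO ROLE analysts;
  GRANT ROLE analysts TO USER some_user;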

Thank you.

On Fri, Feb 22, 2013 at 4:19 PM, Jagat Singh  wrote:

> You might want to read this
>
> https://cwiki.apache.org/Hive/languagemanual-auth.html
>
>
>
>
> On Fri, Feb 22, 2013 at 9:44 PM, Sachin Sudarshana <
> sachin.sudarsh...@gmail.com> wrote:
>
>> Hi,
>>
>> I have just started learning about Hive.
>> I have configured Hive to use MySQL as the metastore instead of Derby.
>> If I wish to use the GRANT and REVOKE commands, I can use them as any user. A
>> user can issue GRANT or REVOKE commands on any other user's table, since
>> all users' tables are present in the same warehouse.
>>
>> Isn't there a concept of superuser/admin in hive who alone has the
>> authority to issue these commands ?
>>
>> Any answer is greatly appreciated!
>>
>> --
>> Thanks and Regards,
>> Sachin Sudarshana
>>
>
>


-- 
Thanks and Regards,
Sachin Sudarshana