Re: Add few record(s) to a Hive table or a HDFS file on a daily basis

2014-02-09 Thread pandees waran
Why not use INSERT INTO for appending new records?

a) Load the new records into a staging table.
b) INSERT INTO the final table from the staging table.
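A minimal sketch of that flow (the table names and the file path are
illustrative; INSERT INTO is available from Hive 0.8 onward):

load data local inpath '/tmp/daily_records.tsv' overwrite into table staging;
insert into table finaltable select * from staging;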
On 10-Feb-2014 8:16 am, Raj Hadoop hadoop...@yahoo.com wrote:



 Hi,

 My requirement is a typical data warehouse and ETL requirement. I need to
 accomplish:

 1) Daily insert of transaction records into a Hive table or an HDFS file. This
 table or file is not big (approximately 10 records per day), and I
 don't want to partition the table / file.


 I have been reading a few articles on this. It was mentioned that we need
 to load into a staging table in Hive and then insert like the below:

 insert overwrite table finaltable select * from staging;

 I am not getting this logic. How should I populate the staging table daily?

 Thanks,
 Raj





Hive equivalent of dump() in Oracle

2014-02-02 Thread pandees waran
Hi,

In Oracle, DUMP returns a VARCHAR2 value containing the datatype code,
length in bytes, and internal representation of an expression.

SELECT DUMP('abc', 1016)
   FROM DUAL;

DUMP('ABC',1016)
--
Typ=96 Len=3 CharacterSet=WE8DEC: 61,62,63


Do we have any equivalent function in Hive?

If it's not already present, can I create a JIRA for this? I feel it would be
very useful while analyzing data issues.
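
In the meantime, a very partial approximation is possible with existing
built-ins (a sketch: hex() and length() are built-in functions, but nothing
in Hive exposes the datatype code or character set):

select hex('abc'), length('abc');
-- returns 616263 and 3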



-- 
Thanks,
Pandeeswaran


Formatting hive queries

2014-01-21 Thread pandees waran
Hi,

I would like to write a utility that automatically formats HQL
files, because formatting is a tedious task.
Please let me know whether any utilities already exist for
formatting Hive queries.

-- 
Thanks,
Pandeeswaran


Special characters support in column names

2013-08-22 Thread pandees waran
Hi,

Currently Hive doesn't support special characters (e.g. %, $, etc.) in table names.
Is there any request for adding this feature?
Please let me know what you think about this.

-- 
Thanks,
Pandeeswaran


Interpreting explain plan in hive

2013-08-22 Thread pandees waran
Hi,

What are the key areas we need to check in an explain plan generated by Hive?
I have checked the documentation, but it doesn't go into detail on this question.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain
A similar question was asked on this forum earlier and went unanswered.
http://mail-archives.apache.org/mod_mbox/hive-user/201107.mbox/%3CCAAG3+BGHadR65FnR5udmGP9=QcriHuubnR8WR-VbxczdOhA=e...@mail.gmail.com%3E

In summary, how can we distinguish a good plan from a bad one?
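
For context, a plan is generated like this (the EXTENDED keyword gives the
most detail; the query itself is illustrative):

explain extended
select count(*) from some_table where dt = '2013-08-22';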

Thanks for your help.
-- 
Thanks,
Pandeeswaran


Re: only one mapper

2013-08-21 Thread pandees waran
Hi Edward,

Could you please explain this?

  Snappy + SequenceFile is a better option than LZO.

Thanks,
Pandeeswaran

—
Sent from Mailbox for iPad

On Wed, Aug 21, 2013 at 11:13 PM, Edward Capriolo edlinuxg...@gmail.com
wrote:

 LZO files are only splittable if you index them. Sequence files compressed
 with LZO are splittable without being indexed.
 Snappy + SequenceFile is a better option than LZO.
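 A sketch of that setup (these are the MR1-era property names; the table
 names are illustrative):

 set hive.exec.compress.output=true;
 set mapred.output.compression.type=BLOCK;
 set mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
 create table t_seq stored as sequencefile as select * from t_text;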
 On Wed, Aug 21, 2013 at 1:39 PM, Igor Tatarinov i...@decide.com wrote:
 LZO files are combinable so check your max split setting.
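 For example (the value is illustrative):

 set mapred.max.split.size=268435456;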

 http://mail-archives.apache.org/mod_mbox/hive-user/201107.mbox/%3c4e328964.7000...@gmail.com%3E

 igor
 decide.com



 On Wed, Aug 21, 2013 at 2:17 AM, 闫昆 yankunhad...@gmail.com wrote:

 Hi all, when I use Hive, the job launches only one mapper even though my
 file splits into 18 blocks; my block size is 128MB and the data size is 2GB.
 I use LZO compression, and created file.lzo along with the index file.lzo.index.
 I use Hive 0.10.0.

 Total MapReduce jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks is set to 0 since there's no reduce operator
 Cannot run job locally: Input Size (= 2304560827) is larger than
 hive.exec.mode.local.auto.inputbytes.max (= 134217728)
 Starting Job = job_1377071515613_0003, Tracking URL =
 http://hydra0001:8088/proxy/application_1377071515613_0003/
 Kill Command = /opt/module/hadoop-2.0.0-cdh4.3.0/bin/hadoop job  -kill
 job_1377071515613_0003
 Hadoop job information for Stage-1: number of mappers: 1; number of
 reducers: 0
 2013-08-21 16:44:30,237 Stage-1 map = 0%,  reduce = 0%
 2013-08-21 16:44:40,495 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU
 6.81 sec
 2013-08-21 16:44:41,710 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU
 6.81 sec
 2013-08-21 16:44:42,919 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU
 6.81 sec
 2013-08-21 16:44:44,117 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU
 9.95 sec
 2013-08-21 16:44:45,333 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU
 9.95 sec
 2013-08-21 16:44:46,530 Stage-1 map = 5%,  reduce = 0%, Cumulative CPU
 13.0 sec

 --

 In the Hadoop world, I am just a novice exploring the entire Hadoop
 ecosystem. I hope one day I can contribute code of my own.

 YanBit
 yankunhad...@gmail.com




LAG throws exceptions

2013-08-18 Thread pandees waran
Hi,

I am executing the below query using LAG on Hive 0.11 in an Amazon EMR cluster.

SELECT
  id,
  MARKET_ID,
  city,
  product_id,
  SALE_DAY,
  isbn,
  seller_ID,
  currency,
  lag(quantity,1,0) over (partition by
isbn,ID,MARKET_ID,city,seller_ID,currency order by SALE_DAY)  AS
start_quantity
FROM test_table

This simple query ended with the below exceptions:

Exception in thread Thread-758 java.lang.ClassFormatError:
org/apache/hadoop/mapred/TaskLogServlet
at 
org.apache.hadoop.hive.shims.Hadoop20SShims.getTaskAttemptLogUrl(Hadoop20SShims.java:49)
at 
org.apache.hadoop.hive.ql.exec.JobDebugger$TaskInfoGrabber.getTaskInfos(JobDebugger.java:190)
at 
org.apache.hadoop.hive.ql.exec.JobDebugger$TaskInfoGrabber.run(JobDebugger.java:146)
at java.lang.Thread.run(Thread.java:724)
Counters:
FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.MapRedTask

Any thoughts on this? Please let me know if I am doing something wrong.
-- 
Thanks,
Pandeeswaran


Hive cli Vs beeline cli

2013-08-14 Thread pandees waran
Hi pros,
Based on your experience with the Beeline CLI, could you please share your
thoughts on the advantages of using the Beeline CLI over the default Hive CLI?
Please share any useful links you find on this topic.

Thanks
Pandeeswaran


Re: Numbers display in Hive CLI

2013-08-13 Thread pandees waran
Sure, let me explore the Hive Beeline client.
—
Sent from Mailbox for iPad

On Tue, Aug 13, 2013 at 11:24 PM, Stephen Sprague sprag...@gmail.com
wrote:

 Yeah. I would think it'd be a useful feature to have in the client - but
 probably not the Hive CLI client. The Hive client seems pretty bare bones
 and my guess is it'll probably stay that way.   The Beeline client, however,
 looks to be where these kinds of bells and whistles could/should
 be added.  Check that app out and see if you agree.  (Search for hive beeline.)
 On Tue, Aug 13, 2013 at 9:47 AM, pandees waran pande...@gmail.com wrote:
 Thanks Stephen! I shall check this. My requirement is controlling the
 formatting at the session level using some property settings. It looks like
 there's no such option as of now. Would this be a good feature in the Hive CLI?
 If many people think so, then I can file a feature request.
 —
  Sent from Mailbox for iPad


 On Tue, Aug 13, 2013 at 8:11 PM, Stephen Sprague sprag...@gmail.com wrote:

  well... a good ol' search (let's not use the word google) for hive udf
  finds this:

  https://cwiki.apache.org/Hive/languagemanual-udf.html#LanguageManualUDF-StringFunctions

  and there's a reference to a function called format_number().
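  For example (the pattern is illustrative):

  select format_number(sum(double_column), 2) from some_table;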

  or did you really want the *Hive CLI* to format the number? if that's the
  case then no, there is no option for that in the hive client.


 On Mon, Aug 12, 2013 at 11:30 PM, pandees waran pande...@gmail.com wrote:

  Hi,

  I see that SUM(double_column) displays its result in scientific notation
  in the Hive CLI. Is there any way to customize the number display in the
  Hive CLI?

 --
 Thanks,
 Pandeeswaran





ORC vs TEXT file

2013-08-12 Thread pandees waran
Hi,

Currently we use the TEXTFILE format in Hive 0.8 when creating
external tables for intermediate processing.
I have read about ORC in 0.11, and I have created the same table in 0.11
with the ORC format.
Without any compression, the ORC table (3 files in total) occupied about
twice the space of the TEXTFILE (a single file).
Also, when I query the data from ORC:

Select count(*) from orc_table

it takes more time than the same query against the textfile,
though I see the cumulative CPU time is lower for ORC than for the text file.

What sort of queries will benefit if we use ORC?
In which cases is TEXTFILE preferable to ORC?

Thanks.


Re: ORC vs TEXT file

2013-08-12 Thread pandees waran
Thanks Edward.  I shall try compression with ORC and let you know. Also,
it looks like the CPU usage is lower while querying ORC rather than the
text file, but the total time taken by the query is slightly higher for
ORC than for text.  Could you please explain the difference between
cumulative CPU time and the total time taken (usually the last line, in
seconds)? Which one should we give preference to?
On Aug 12, 2013 7:01 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

 Columnar formats do not always beat row-wise storage. Many times gzip plus
 block storage will compress something better than columnar storage,
 especially when you have repeated data in different columns.

 Based on what you are saying, it could be possible that you missed a
 setting and the ORC files are not compressed.


 On Monday, August 12, 2013, pandees waran pande...@gmail.com wrote:
  Hi,
 
  Currently we use the TEXTFILE format in Hive 0.8 when creating
  external tables for intermediate processing.
  I have read about ORC in 0.11, and I have created the same table in 0.11
  with the ORC format.
  Without any compression, the ORC table (3 files in total) occupied about
  twice the space of the TEXTFILE (a single file).
  Also, when I query the data from ORC:
  Select count(*) from orc_table
 
  it takes more time than the same query against the textfile,
  though I see the cumulative CPU time is lower for ORC than for the text file.
 
  What sort of queries will benefit if we use ORC?
  In which cases is TEXTFILE preferable to ORC?
 
  Thanks.
 


Re: ORC vs TEXT file

2013-08-12 Thread pandees waran
Hi Owen,

Thanks for your response.

My structure is as follows:

a) Textfile:
CREATE EXTERNAL TABLE test_textfile (
COL1 BIGINT,
COL2 STRING,
COL3 BIGINT,
COL4 STRING,
COL5 STRING,
COL6 BIGINT,
COL7 BIGINT,
COL8 BIGINT,
COL9 BIGINT,
COl10 BIGINT,
COl11 BIGINT,
COL12 STRING,
COl13 STRING,
COl14 STRING,
COl15 BIGINT,
COl16 STRING,
COL17 DOUBLE,
COl18 DOUBLE,
COl19 DOUBLE,
COl20 DOUBLE,
COl21 DOUBLE,
COL22 DOUBLE,
COl23 DOUBLE,
COL24 DOUBLE,
COl25 DOUBLE,
COL26 DOUBLE,
COl27 DOUBLE,
COL28 DOUBLE,
COL29 DOUBLE,
COl30 DOUBLE,
COl31 DOUBLE,
COL32 DOUBLE,
COL33 STRING,
COl34 STRING,
COl35 DOUBLE,
COL36 DOUBLE,
COl37 DOUBLE,
COL38 DOUBLE,
COl39 DOUBLE,
COL40 DOUBLE,
COl41 DOUBLE,
COL42 DOUBLE,
COL43 DOUBLE,
COl44 DOUBLE,
COl45 DOUBLE,
COL46 DOUBLE,
COL47 DOUBLE,
COl48 DOUBLE,
COl49 DOUBLE,
COL50 DOUBLE,
COL51 DOUBLE,
COl52 DOUBLE,
COl53 DOUBLE,
COl54 DOUBLE,
COL55 DOUBLE,
COL56 STRING,
COL57 DOUBLE,
COL58 DOUBLE,
COL59 DOUBLE,
COl60 DOUBLE,
COl61 STRING,
COL62 STRING,
COL63 STRING,
COL64 STRING,
COl65 STRING,
COl66 STRING,
COl67 STRING,
COL68 STRING,
Col69 STRING,
COL70 STRING,
COL71 STRING,
COl72 STRING,
COl73 STRING,
COL74  STRING
) PARTITIONED BY (
COL75 STRING,
COL76 STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
STORED AS TEXTFILE LOCATION 's3://test/textfile/';
Using block-level compression and BZip2Codec for the output.

b) With the above set of columns, I just changed the clause to STORED AS ORC
for creating the ORC table, without using any compression option.

c) Inserted 7256852 records into both tables.

d) Space occupied in S3:

Stored as ORC (3 files): 153.4MB * 3 = 460.2MB
TEXT (single file in bz2 format): 306MB

I need to check ORC with compression enabled.
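
A sketch of what I plan to try (the orc.compress table property as documented
for ORC in 0.11; the column list is illustrative):

CREATE TABLE test_orc_zlib (col1 BIGINT, col2 STRING)
STORED AS ORC TBLPROPERTIES ("orc.compress"="ZLIB");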

Please let me know if I missed anything.

Thanks,




On Mon, Aug 12, 2013 at 8:50 PM, Owen O'Malley omal...@apache.org wrote:

 Pandees,
   I've never seen a table that was larger with ORC than with text. Can you
 share your text file's schema with us? Is the table very small? How many
 rows and GB are the tables? The overhead for ORC is typically small, but as
 Ed says it is possible in rare cases for the overhead to dominate the data
 size itself.

 -- Owen


 On Mon, Aug 12, 2013 at 6:52 AM, pandees waran pande...@gmail.com wrote:

 Thanks Edward.  I shall try compression with ORC and let you know. Also,
 it looks like the CPU usage is lower while querying ORC rather than the
 text file, but the total time taken by the query is slightly higher for
 ORC than for text.  Could you please explain the difference between
 cumulative CPU time and the total time taken (usually the last line, in
 seconds)? Which one should we give preference to?
 On Aug 12, 2013 7:01 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

 Columnar formats do not always beat row-wise storage. Many times gzip
 plus block storage will compress something better than columnar storage,
 especially when you have repeated data in different columns.

 Based on what you are saying, it could be possible that you missed a
 setting and the ORC files are not compressed.


 On Monday, August 12, 2013, pandees waran pande...@gmail.com wrote:
  Hi,
 
  Currently we use the TEXTFILE format in Hive 0.8 when creating
  external tables for intermediate processing.
  I have read about ORC in 0.11, and I have created the same table in 0.11
  with the ORC format.
  Without any compression, the ORC table (3 files in total) occupied about
  twice the space of the TEXTFILE (a single file).
  Also, when I query the data from ORC:
  Select count(*) from orc_table
 
  it takes more time than the same query against the textfile,
  though I see the cumulative CPU time is lower for ORC than for the text file.
 
  What sort of queries will benefit if we use ORC?
  In which cases is TEXTFILE preferable to ORC?
 
  Thanks.
 





-- 
Thanks,
Pandeeswaran


Re: Join issue in 0.11

2013-08-10 Thread pandees waran
Hi,

Can someone try to reproduce this and confirm whether it is an issue in 0.11:

a) Create a view with some UDAF in the definition (I tried with
https://github.com/scribd/hive-udaf-maxrow)
b) Join this view with some other table
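
A minimal repro sketch (the table, view, and column names are hypothetical,
and maxrow() stands in for the custom UDAF from the link above):

CREATE VIEW view1 AS
SELECT col1, maxrow(col2, col3) AS m FROM base_table GROUP BY col1;

SELECT v.* FROM view1 v JOIN table1 t ON t.col1 = v.col1;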

I am getting the below exception:

Examining task ID: task_201308070831_0010_m_52 (and more) from job
job_201308070831_0010
Exception in thread Thread-98 java.lang.ClassFormatError: Absent
Code attribute in method that is not native or abstract in class file
javax/servlet/http/HttpServlet
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
at
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
at
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at
org.apache.hadoop.hive.shims.Hadoop20SShims.getTaskAttemptLogUrl(Hadoop20SShims.java:49)
at
org.apache.hadoop.hive.ql.exec.JobDebugger$TaskInfoGrabber.getTaskInfos(JobDebugger.java:190)
at
org.apache.hadoop.hive.ql.exec.JobDebugger$TaskInfoGrabber.run(JobDebugger.java:146)
at java.lang.Thread.run(Thread.java:662)
Counters:
FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 67  Reduce: 1   Cumulative CPU: 4172.93 sec   HDFS Read:
43334 HDFS Write: 12982162918 SUCCESS
Job 1: Map: 51   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 days 1 hours 9 minutes 32 seconds 930 msec

The same works fine in 0.8.1.6. After creating the view, I am able to query
the view successfully.
Only when I join it with another table does it throw the above exceptions.

Thanks,
Pandeeswaran


On Fri, Aug 9, 2013 at 4:37 PM, pandees waran pande...@gmail.com wrote:

 Hi Nitin,

 I have executed a few test cases and here are my observations.

 a) I am not using any utilities for upgrading to 0.11, just executing the
 same HQL that works in 0.8.1.6 on 0.11.

 b) In my join I am using a view which has a UDAF (
 https://github.com/scribd/hive-udaf-maxrow).
 When I try to join this view (with the UDAF) to another table, I am
 getting the below errors:


 java.lang.InstantiationException:
 org.apache.hadoop.hive.ql.parse.ASTNodeOrigin
 Continuing ...
 java.lang.RuntimeException: failed to evaluate: unbound=Class.new();
 Continuing ...
 java.lang.InstantiationException:
 org.apache.hadoop.hive.ql.parse.ASTNodeOrigin
 Continuing ...
 java.lang.RuntimeException: failed to evaluate: unbound=Class.new();
 Continuing ...
 java.lang.InstantiationException:
 org.apache.hadoop.hive.ql.parse.ASTNodeOrigin
 Continuing ...
 java.lang.RuntimeException: failed to evaluate: unbound=Class.new();
 Continuing ...
 java.lang.InstantiationException:
 org.apache.hadoop.hive.ql.parse.ASTNodeOrigin
 Continuing ...
 java.lang.RuntimeException: failed to evaluate: unbound=Class.new();

 

 My query looks like:

 select v.* from view1 v join table1 t  on t.col1=v.col1

 The same query works in 0.8.1.6 without any issues.
 This query works in 0.11 if I remove the UDAF from the view.

 Do I need to rebuild the UDAF separately for 0.11?
 In general, I expect HQL that works in 0.8.1.6 to work in 0.11
 without any code changes. Please correct me if my assumption is
 incorrect.

 Thanks,
 Pandeeswaran



 On Wed, Aug 7, 2013 at 9:00 PM, Nitin Pawar nitinpawar...@gmail.comwrote:

 Will it be possible for you to share your query? And if you are using
 any custom UDF, then the Java code for the same?

 how

Re: Join issue in 0.11

2013-08-09 Thread pandees waran
Hi Nitin,

I have executed a few test cases and here are my observations.

a) I am not using any utilities for upgrading to 0.11, just executing the
same HQL that works in 0.8.1.6 on 0.11.

b) In my join I am using a view which has a UDAF (
https://github.com/scribd/hive-udaf-maxrow).
When I try to join this view (with the UDAF) to another table, I am getting
the below errors:


java.lang.InstantiationException:
org.apache.hadoop.hive.ql.parse.ASTNodeOrigin
Continuing ...
java.lang.RuntimeException: failed to evaluate: unbound=Class.new();
Continuing ...
java.lang.InstantiationException:
org.apache.hadoop.hive.ql.parse.ASTNodeOrigin
Continuing ...
java.lang.RuntimeException: failed to evaluate: unbound=Class.new();
Continuing ...
java.lang.InstantiationException:
org.apache.hadoop.hive.ql.parse.ASTNodeOrigin
Continuing ...
java.lang.RuntimeException: failed to evaluate: unbound=Class.new();
Continuing ...
java.lang.InstantiationException:
org.apache.hadoop.hive.ql.parse.ASTNodeOrigin
Continuing ...
java.lang.RuntimeException: failed to evaluate: unbound=Class.new();


My query looks like:

select v.* from view1 v join table1 t  on t.col1=v.col1

The same query works in 0.8.1.6 without any issues.
This query works in 0.11 if I remove the UDAF from the view.

Do I need to rebuild the UDAF separately for 0.11?
In general, I expect HQL that works in 0.8.1.6 to work in 0.11
without any code changes. Please correct me if my assumption is
incorrect.

Thanks,
Pandeeswaran



On Wed, Aug 7, 2013 at 9:00 PM, Nitin Pawar nitinpawar...@gmail.com wrote:

 Will it be possible for you to share your query? And if you are using any
 custom UDF, then the Java code for the same?

 How are you upgrading from hive-0.8 to hive-0.11?

 AWS announced that EMR supports Hive 0.11 just 4 days ago. Can
 you check whether you need to change something on the EMR side?


 On Wed, Aug 7, 2013 at 8:28 PM, pandees waran pande...@gmail.com wrote:

 Hi Nitin,

 Nope! It ended up with the below error messages:

 Examining task ID: task_201308070831_0010_m_52 (and more) from job
 job_201308070831_0010
 Exception in thread Thread-98 java.lang.ClassFormatError: Absent
 Code attribute in method that is not native or abstract in class file
 javax/servlet/http/HttpServlet
 at java.lang.ClassLoader.defineClass1(Native Method)
 at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
 at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
 at
 java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
 at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
 at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
 at java.lang.ClassLoader.defineClass1(Native Method)
 at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
 at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
 at
 java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
 at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
 at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
 at
 org.apache.hadoop.hive.shims.Hadoop20SShims.getTaskAttemptLogUrl(Hadoop20SShims.java:49)
 at
 org.apache.hadoop.hive.ql.exec.JobDebugger$TaskInfoGrabber.getTaskInfos(JobDebugger.java:190)
 at
 org.apache.hadoop.hive.ql.exec.JobDebugger$TaskInfoGrabber.run(JobDebugger.java:146)
 at java.lang.Thread.run(Thread.java:662)
 Counters:
 FAILED: Execution Error, return code 2 from
 org.apache.hadoop.hive.ql.exec.MapRedTask
 MapReduce Jobs Launched:
 Job 0: Map: 67  Reduce: 1   Cumulative CPU: 4172.93 sec   HDFS Read:
 43334 HDFS Write: 12982162918 SUCCESS
 Job 1: Map: 51   HDFS Read: 0 HDFS Write: 0 FAIL
 Total MapReduce CPU Time Spent: 0 days 1 hours 9 minutes 32 seconds 930
 msec


 But the same query works fine in Hive 0.8.1.6 without any issues.
 I am working on the 0.11 upgrade and facing this issue.

 Thanks,
 Pandeeswaran

 On 8/7/13, Nitin Pawar nitinpawar

Join issue in 0.11

2013-08-07 Thread pandees waran
Hi,

I am facing the same issue as mentioned in the below JIRA:

https://issues.apache.org/jira/browse/HIVE-3872

I am using Amazon EMR with Hive 0.11.

Do I need to apply any patch on top of 0.11 to fix this NPE issue?
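
As a temporary measure, I am considering disabling automatic map-join
conversion (assuming the failure is in the map-join path, as the JIRA
discussion suggests):

set hive.auto.convert.join=false;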

-- 
Thanks,
Pandeeswaran


Re: Join issue in 0.11

2013-08-07 Thread pandees waran
Hi Nitin,

Nope! It ended up with the below error messages:

Examining task ID: task_201308070831_0010_m_52 (and more) from job
job_201308070831_0010
Exception in thread Thread-98 java.lang.ClassFormatError: Absent
Code attribute in method that is not native or abstract in class file
javax/servlet/http/HttpServlet
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at 
org.apache.hadoop.hive.shims.Hadoop20SShims.getTaskAttemptLogUrl(Hadoop20SShims.java:49)
at 
org.apache.hadoop.hive.ql.exec.JobDebugger$TaskInfoGrabber.getTaskInfos(JobDebugger.java:190)
at 
org.apache.hadoop.hive.ql.exec.JobDebugger$TaskInfoGrabber.run(JobDebugger.java:146)
at java.lang.Thread.run(Thread.java:662)
Counters:
FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 67  Reduce: 1   Cumulative CPU: 4172.93 sec   HDFS Read:
43334 HDFS Write: 12982162918 SUCCESS
Job 1: Map: 51   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 days 1 hours 9 minutes 32 seconds 930 msec


But the same query works fine in Hive 0.8.1.6 without any issues.
I am working on the 0.11 upgrade and facing this issue.

Thanks,
Pandeeswaran

On 8/7/13, Nitin Pawar nitinpawar...@gmail.com wrote:
 before applying a patch,

 can you confirm that the map join query worked fine and gave the results you
 wanted?


 On Wed, Aug 7, 2013 at 6:46 PM, Sathya Narayanan K ksat...@live.com
 wrote:

 Hi,


 I am also facing the same issue. Could anyone please suggest whether we
 can apply any patch?


 Thanks,

 Sathya Narayanan 


 *From:* pandees waran [mailto:pande...@gmail.com]
 *Sent:* Wednesday, August 07, 2013 6:39 PM
 *To:* user@hive.apache.org
 *Subject:* Join issue in 0.11


 Hi,

 I am facing the same issue as mentioned in the below JIRA:

 https://issues.apache.org/jira/browse/HIVE-3872

 I am using Amazon EMR with Hive 0.11.

 Do I need to apply any patch on top of 0.11 to fix this NPE issue?


 -- 

 Thanks,

 Pandeeswaran




 --
 Nitin Pawar



-- 
Thanks,
Pandeeswaran


Re: Prevent users from killing each other's jobs

2013-07-30 Thread pandees waran
Hi Mikhail,

Could you please explain how we can track all the kill requests for a job?
Is there any feature available in the Hadoop stack for this? Or do we need to
track this at the OS layer by capturing signals?

Thanks,
Pandeesh
On Jul 31, 2013 12:03 AM, Mikhail Antonov olorinb...@gmail.com wrote:

 In addition to using job ACLs you could have a more brutal scheme. Track
 all requests to kill jobs, and if any request is coming from a user
 who shouldn't be trying to kill this particular job, then ssh from the
 script to his client machine and forcibly reboot it :)


 2013/7/30 Edward Capriolo edlinuxg...@gmail.com

 Honestly tell your users to stop being jerks. People know if they kill my
 query there is going to be hell to pay :)


 On Tue, Jul 30, 2013 at 2:25 PM, Vinod Kumar Vavilapalli 
 vino...@apache.org wrote:


 You need to set up Job ACLs. See
 http://hadoop.apache.org/docs/stable/mapred_tutorial.html#Job+Authorization.

 It is a per-job configuration, and you can provide defaults. If the job
 owner wishes to give others access, he/she can do so.
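
 A sketch of the per-job properties (the mapreduce.job.acl-* names as in the
 job-authorization docs; the user lists are illustrative):

 <property>
   <name>mapreduce.job.acl-view-job</name>
   <value>alice,bob</value>
 </property>
 <property>
   <name>mapreduce.job.acl-modify-job</name>
   <value>alice,bob</value>
 </property>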

  Thanks,
 +Vinod Kumar Vavilapalli
 Hortonworks Inc.
 http://hortonworks.com/

 On Jul 30, 2013, at 11:21 AM, Murat Odabasi wrote:

 Hi there,

 I am trying to introduce some sort of security to prevent different
 people using the cluster from interfering with each other's jobs.

 Following the instructions at
 http://hadoop.apache.org/docs/stable/cluster_setup.html and

 https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-9/security
 , this is what I put in my mapred-site.xml:

 <property>
   <name>mapred.task.tracker.task-controller</name>
   <value>org.apache.hadoop.mapred.LinuxTaskController</value>
 </property>

 <property>
   <name>mapred.acls.enabled</name>
   <value>true</value>
 </property>

 I can see the configuration parameters in the job configuration when I
 run a hive query, but the users are still able to kill each other's
 jobs.

 Any ideas about what I may be missing?
 Any alternative approaches I can adopt?

 Thanks.






 --
 Thanks,
 Michael Antonov



Wildcard support in specifying file location

2013-07-22 Thread pandees waran
Hi,

I am a newbie to Hive. While creating external tables, can we use a wildcard
to specify the file location?
e.g.:

STORED AS TEXTFILE LOCATION 's3://root/*/date*/'

Is the above specification valid in hive 0.7.1?

Thanks