Re: Add few record(s) to a Hive table or a HDFS file on a daily basis
Why not use INSERT INTO for appending new records? a) Load the new records into a staging table. b) INSERT INTO the final table from the staging table.

On 10-Feb-2014 8:16 am, Raj Hadoop hadoop...@yahoo.com wrote: Hi, my requirement is a typical data warehouse and ETL requirement. I need to accomplish: 1) Daily insert of transaction records into a Hive table or an HDFS file. This table or file is not big (approximately 10 records per day), and I don't want to partition the table/file. I have been reading a few articles on this. It was mentioned that we need to load into a staging table in Hive, and then insert like the below: insert overwrite table finaltable select * from staging; I am not getting this logic. How should I populate the staging table daily? Thanks, Raj
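The staging pattern suggested above can be sketched as follows; the file path and table names are hypothetical:

```sql
-- Step a) load the day's extract into the staging table, replacing
-- yesterday's staged rows.
LOAD DATA INPATH '/etl/daily/records.tsv' OVERWRITE INTO TABLE staging;

-- Step b) append the staged rows to the final table. INSERT INTO keeps
-- the existing data, unlike INSERT OVERWRITE, which replaces it.
INSERT INTO TABLE finaltable
SELECT * FROM staging;
```

Note that INSERT INTO (append) requires Hive 0.8 or later. With only ~10 rows a day, appends accumulate many small files over time, so an occasional compaction of the final table may be worth scheduling.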
Hive equivalent of dump() in Oracle
Hi, in Oracle, DUMP returns a VARCHAR2 value containing the datatype code, length in bytes, and internal representation of expr. SELECT DUMP('abc', 1016) FROM DUAL; DUMP('ABC',1016) -- Typ=96 Len=3 CharacterSet=WE8DEC: 61,62,63 Do we have any equivalent function in Hive? If it's not present, can I create a JIRA for this? I feel it would be very useful while analyzing data issues. -- Thanks, Pandeeswaran
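Hive does not appear to have a direct DUMP equivalent, but for string data the built-in hex() and length() functions can approximate the byte-level view; a rough sketch (the table name is hypothetical, since older Hive versions require a FROM clause):

```sql
-- A rough stand-in for Oracle's DUMP on string data: hex() renders the
-- UTF-8 bytes, and length(hex(...)) / 2 gives the length in bytes.
SELECT hex(col1)             AS bytes_hex,  -- e.g. 'abc' -> 616263
       length(hex(col1)) / 2 AS byte_len    -- e.g. 'abc' -> 3
FROM some_table
LIMIT 1;
```

This only covers the internal-representation part of DUMP; the datatype code and character-set details have no obvious counterpart in Hive.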
Formatting hive queries
Hi, I would like to come up with code which automatically formats your HQL files, because formatting is a tedious task and I would like to build a utility for it. Please let me know whether any utilities already exist for formatting Hive queries. -- Thanks, Pandeeswaran
Special characters support in column names
Hi, currently Hive doesn't support special characters (e.g. %, $, etc.) in table names. Is there any request for adding this feature? Please let me know what you think about this. -- Thanks, Pandeeswaran
Interpreting explain plan in hive
Hi, what are the key areas we need to check in the explain plan generated by Hive? I have checked the documentation, but it does not go into detail on this question. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain A similar question was asked on this forum and went unanswered. http://mail-archives.apache.org/mod_mbox/hive-user/201107.mbox/%3CCAAG3+BGHadR65FnR5udmGP9=QcriHuubnR8WR-VbxczdOhA=e...@mail.gmail.com%3E In summary, how can we distinguish a good plan from a bad one? Thanks for your help. -- Thanks, Pandeeswaran
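For reference, the plan is produced by prefixing any query with EXPLAIN; a small sketch (table and column names are hypothetical):

```sql
-- EXPLAIN prints the stage graph (map/reduce stages and their operators);
-- EXPLAIN EXTENDED adds file paths, serdes, and other physical details.
EXPLAIN
SELECT col1, count(*)
FROM sample_table
WHERE col2 > 100
GROUP BY col1;
```

Things commonly worth checking in the output: the number of MapReduce stages, whether filter predicates appear close to the TableScan operators, and which join strategy was chosen.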
Re: only one mapper
Hi Edward, could you please explain this? "Snappy + SequenceFile is a better option than LZO." Thanks, Pandeeswaran — Sent from Mailbox for iPad

On Wed, Aug 21, 2013 at 11:13 PM, Edward Capriolo edlinuxg...@gmail.com wrote: LZO files are only splittable if you index them. Sequence files compressed with LZO are splittable without being indexed. Snappy + SequenceFile is a better option than LZO.

On Wed, Aug 21, 2013 at 1:39 PM, Igor Tatarinov i...@decide.com wrote: LZO files are combinable, so check your max split setting. http://mail-archives.apache.org/mod_mbox/hive-user/201107.mbox/%3c4e328964.7000...@gmail.com%3E igor decide.com

On Wed, Aug 21, 2013 at 2:17 AM, 闫昆 yankunhad...@gmail.com wrote: Hi all, when I use Hive, the job makes only one mapper, although my file actually splits into 18 blocks; my block size is 128MB and the data size is 2GB. I use LZO compression: I created file.lzo and made an index file.lzo.index. I use Hive 0.10.0. Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Cannot run job locally: Input Size (= 2304560827) is larger than hive.exec.mode.local.auto.inputbytes.max (= 134217728) Starting Job = job_1377071515613_0003, Tracking URL = http://hydra0001:8088/proxy/application_1377071515613_0003/ Kill Command = /opt/module/hadoop-2.0.0-cdh4.3.0/bin/hadoop job -kill job_1377071515613_0003 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0 2013-08-21 16:44:30,237 Stage-1 map = 0%, reduce = 0% 2013-08-21 16:44:40,495 Stage-1 map = 2%, reduce = 0%, Cumulative CPU 6.81 sec 2013-08-21 16:44:41,710 Stage-1 map = 2%, reduce = 0%, Cumulative CPU 6.81 sec 2013-08-21 16:44:42,919 Stage-1 map = 2%, reduce = 0%, Cumulative CPU 6.81 sec 2013-08-21 16:44:44,117 Stage-1 map = 3%, reduce = 0%, Cumulative CPU 9.95 sec 2013-08-21 16:44:45,333 Stage-1 map = 3%, reduce = 0%, Cumulative CPU 9.95 sec 2013-08-21 16:44:46,530 Stage-1 map = 5%, reduce = 0%, Cumulative CPU 13.0 sec -- In the Hadoop world, I am just a novice exploring the entire Hadoop ecosystem; I hope one day I can contribute my own code. YanBit yankunhad...@gmail.com
LAG throws exceptions
Hi, I am executing the below query using lag in 0.11 on an Amazon EMR cluster. SELECT id, MARKET_ID, city, product_id, SALE_DAY, isbn, seller_ID, currency, lag(quantity,1,0) over (partition by isbn, ID, MARKET_ID, city, seller_ID, currency order by SALE_DAY) AS start_quantity FROM test_table This simple query ended with the below exceptions: Exception in thread Thread-758 java.lang.ClassFormatError: org/apache/hadoop/mapred/TaskLogServlet at org.apache.hadoop.hive.shims.Hadoop20SShims.getTaskAttemptLogUrl(Hadoop20SShims.java:49) at org.apache.hadoop.hive.ql.exec.JobDebugger$TaskInfoGrabber.getTaskInfos(JobDebugger.java:190) at org.apache.hadoop.hive.ql.exec.JobDebugger$TaskInfoGrabber.run(JobDebugger.java:146) at java.lang.Thread.run(Thread.java:724) Counters: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask Any thoughts on this? Please let me know if I am doing something wrong. -- Thanks, Pandeeswaran
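For readability, here is the same query reformatted; lag(quantity, 1, 0) returns the previous row's quantity within each partition, ordered by SALE_DAY, with the default 0 used when there is no prior row:

```sql
SELECT id, market_id, city, product_id, sale_day, isbn,
       seller_id, currency,
       -- previous quantity within each (isbn, id, market_id, city,
       -- seller_id, currency) group, in SALE_DAY order; 0 for the
       -- first row of a partition
       lag(quantity, 1, 0) OVER (
         PARTITION BY isbn, id, market_id, city, seller_id, currency
         ORDER BY sale_day
       ) AS start_quantity
FROM test_table;
```

Note the stack trace above is raised by Hive's job debugger while fetching task logs, so the underlying task failure may have a different cause than the query text itself.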
Hive cli Vs beeline cli
Hi pros, based on your experience with the Beeline CLI, could you please share your thoughts on the advantages of using Beeline over the default Hive CLI? Please also share any useful links on this. Thanks, Pandeeswaran
Re: Numbers display in Hive CLI
Sure, let me explore the Hive Beeline client. — Sent from Mailbox for iPad

On Tue, Aug 13, 2013 at 11:24 PM, Stephen Sprague sprag...@gmail.com wrote: Yeah. I would think it'd be a useful feature to have in the client - but probably not the Hive CLI client. The Hive client seems pretty bare bones and my guess is it'll probably stay that way. The Beeline client, however, looks to be where these kinds of bells and whistles probably could/should be added. Check that app out and see if you agree. (search hive beeline).

On Tue, Aug 13, 2013 at 9:47 AM, pandees waran pande...@gmail.com wrote: Thanks Stephen! I shall check this. My requirement is controlling the formatting at session level using some property settings. Looks like there's no such thing as of now. Would this be a good feature in the Hive CLI? If many people think so, then I can file a feature request. — Sent from Mailbox for iPad

On Tue, Aug 13, 2013 at 8:11 PM, Stephen Sprague sprag...@gmail.com wrote: well... a good 'ol search (let's not use the word google) of hive udf finds this: https://cwiki.apache.org/Hive/languagemanual-udf.html#LanguageManualUDF-StringFunctions and there's a reference to a function called format_number(). or did you really want the *hive CLI* to format the number? if that's the case then no, there is no option for that in the hive client.

On Mon, Aug 12, 2013 at 11:30 PM, pandees waran pande...@gmail.com wrote: Hi, I see that SUM(double_column) displays the result in scientific notation in the Hive CLI. Is there any way to customize the number display in the Hive CLI? -- Thanks, Pandeeswaran
ORC vs TEXT file
Hi, currently we use the TEXTFILE format in Hive 0.8 while creating external tables for intermediate processing. I have read about ORC in 0.11 and have created the same table in 0.11 with the ORC format. Without any compression, the ORC files (three in total) occupied twice the space of the TEXTFILE (only one file). Even when I query the data from ORC: Select count(*) from orc_table it took more time than the same query against the text file, but I see the cumulative CPU time is lower for ORC than for the text file. What sort of queries will benefit if we use ORC? In which cases will TEXTFILE be preferred over ORC? Thanks.
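One thing worth checking is whether the ORC table was created with compression enabled; a minimal sketch (the column names and the ZLIB choice are assumptions):

```sql
-- In Hive 0.11, ORC compression is controlled per table via the
-- orc.compress property; supported values include NONE, ZLIB, and SNAPPY.
CREATE TABLE orc_table_compressed (
  col1 BIGINT,
  col2 STRING
)
STORED AS ORC
TBLPROPERTIES ('orc.compress' = 'ZLIB');
```

ORC tends to pay off for queries that read only a subset of columns or can skip stripes via min/max statistics; a full-scan count(*) over a small table is unlikely to show the benefit.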
Re: ORC vs TEXT file
Thanks Edward. I shall try compression with ORC and let you know. Also, it looks like CPU usage is lower when querying ORC rather than the text file, but the total time taken by the query is slightly more for ORC than for the text file. Could you please explain the difference between cumulative CPU time and the total time taken (usually the last line, in seconds)? Which one should we give preference to?

On Aug 12, 2013 7:01 PM, Edward Capriolo edlinuxg...@gmail.com wrote: Columnar formats do not always beat row-wise storage. Many times gzip plus block storage will compress something better than columnar storage, especially when you have repeated data in different columns. Based on what you are saying, it could be possible that you missed a setting and the ORC files are not compressed.

On Monday, August 12, 2013, pandees waran pande...@gmail.com wrote: Hi, currently we use the TEXTFILE format in Hive 0.8 while creating external tables for intermediate processing. I have read about ORC in 0.11 and have created the same table in 0.11 with the ORC format. Without any compression, the ORC files (three in total) occupied twice the space of the TEXTFILE (only one file). Even when I query the data from ORC: Select count(*) from orc_table it took more time than the same query against the text file, but I see the cumulative CPU time is lower for ORC than for the text file. What sort of queries will benefit if we use ORC? In which cases will TEXTFILE be preferred over ORC? Thanks.
Re: ORC vs TEXT file
Hi Owen, Thanks for your response. My structure is like: a)Textfile: CREATE EXTERNAL TABLE test_textfile ( COL1 BIGINT, COL2 STRING, COL3 BIGINT, COL4 STRING, COL5 STRING, COL6 BIGINT, COL7 BIGINT, COL8 BIGINT, COL9 BIGINT, COl10 BIGINT, COl11 BIGINT, COL12 STRING, COl13 STRING, COl14 STRING, COl15 BIGINT, COl16 STRING, COL17 DOUBLE, COl18 DOUBLE, COl19 DOUBLE, COl20 DOUBLE, COl21 DOUBLE, COL22 DOUBLE, COl23 DOUBLE, COL24 DOUBLE, COl25 DOUBLE, COL26 DOUBLE, COl27 DOUBLE, COL28 DOUBLE, COL29 DOUBLE, COl30 DOUBLE, COl31 DOUBLE, COL32 DOUBLE, COL33 STRING, COl34 STRING, COl35 DOUBLE, COL36 DOUBLE, COl37 DOUBLE, COL38 DOUBLE, COl39 DOUBLE, COL40 DOUBLE, COl41 DOUBLE, COL42 DOUBLE, COL43 DOUBLE, COl44 DOUBLE, COl45 DOUBLE, COL46 DOUBLE, COL47 DOUBLE, COl48 DOUBLE, COl49 DOUBLE, COL50 DOUBLE, COL51 DOUBLE, COl52 DOUBLE, COl53 DOUBLE, COl54 DOUBLE, COL55 DOUBLE, COL56 STRING, COL57 DOUBLE, COL58 DOUBLE, COL59 DOUBLE, COl60 DOUBLE, COl61 STRING, COL62 STRING, COL63 STRING, COL64 STRING, COl65 STRING, COl66 STRING, COl67 STRING, COL68 STRING, Col69 STRING, COL70 STRING, COL71 STRING, COl72 STRING, COl73 STRING, COL74 STRING ) PARTITIONED BY ( COL75 STRING, COL76 STRING ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS TEXTFILE LOCATION 's3://test/textfile/'; Using block level compression and bzip2codec for output. b) With the above set of columns, just i have changed as STORED AS ORC for creating ORC. Not using any compression option c)Inserted 7256852 records in both the tables d)Space occupied in S3: Storing as ORC(3 files):153.4MB *3=460.2MB TEXT(single file in bz2 format)=306MB I need to check ORC with compression enabled. Please let me know, if i miss anything. Thanks, On Mon, Aug 12, 2013 at 8:50 PM, Owen O'Malley omal...@apache.org wrote: Pandees, I've never seen a table that was larger with ORC than with text. Can you share your text's file schema with us? Is the table very small? How many rows and GB are the tables? 
The overhead for ORC is typically small, but as Ed says it is possible for rare cases for the overhead to dominate the data size itself. -- Owen On Mon, Aug 12, 2013 at 6:52 AM, pandees waran pande...@gmail.com wrote: Thanks Edward. I shall try compression besides orc and let you know. And also, it looks like the cpu usage is lesser while querying orc rather than text file. But the total time taken by the query time is slightly more in orc than text file. Could you please explain the difference between cumulative cpu time and the total time taken (usually in last line in terms or secs)? Which one should we give preference? On Aug 12, 2013 7:01 PM, Edward Capriolo edlinuxg...@gmail.com wrote: Colmnar formats do not always beat row wise storage. Many times gzip plus block storage will compress something better then columnar storage especially when you have repeated data in different columns. Based on what you are saying it could be possible that you missed a setting and the ocr are not compressed. On Monday, August 12, 2013, pandees waran pande...@gmail.com wrote: Hi, Currently, we use TEXTFILE format in hive 0.8 ,while creating the external tables in intermediate processing . I have read about ORC in 0.11. I have created the same table in 0.11 with ORC format. Without any compression, the ORC file(totally 3 files) occupied the space twice more than the TEXTFILE(only one file). Even, when i query the data from ORC: Select count(*) from orc_table It took more time than the same query against textfile. But, i see cumulative CPU time is lesser in ORC than the text file. What sort of queries will benefit, if we use ORC? In which cases TEXTFILE will be preferred more than ORC? Thanks. -- Thanks, Pandeeswaran
Re: Join issue in 0.11
Hi, Can someone try to reproduce and confirm whether this is an issue in 0.11 a)Create a view with some UDAF in the definition(i tried with https://github.com/scribd/hive-udaf-maxrow) b)Join this view with some other table I am getting the below exception: Examining task ID: task_201308070831_0010_m_52 (and more) from job job_201308070831_0010 Exception in thread Thread-98 java.lang.ClassFormatError: Absent Code attribute in method that is not native or abstract in class file javax/servlet/http/HttpServlet at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631) at java.lang.ClassLoader.defineClass(ClassLoader.java:615) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141) at java.net.URLClassLoader.defineClass(URLClassLoader.java:283) at java.net.URLClassLoader.access$000(URLClassLoader.java:58) at java.net.URLClassLoader$1.run(URLClassLoader.java:197) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631) at java.lang.ClassLoader.defineClass(ClassLoader.java:615) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141) at java.net.URLClassLoader.defineClass(URLClassLoader.java:283) at java.net.URLClassLoader.access$000(URLClassLoader.java:58) at java.net.URLClassLoader$1.run(URLClassLoader.java:197) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) at 
org.apache.hadoop.hive.shims.Hadoop20SShims.getTaskAttemptLogUrl(Hadoop20SShims.java:49) at org.apache.hadoop.hive.ql.exec.JobDebugger$TaskInfoGrabber.getTaskInfos(JobDebugger.java:190) at org.apache.hadoop.hive.ql.exec.JobDebugger$TaskInfoGrabber.run(JobDebugger.java:146) at java.lang.Thread.run(Thread.java:662) Counters: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask MapReduce Jobs Launched: Job 0: Map: 67 Reduce: 1 Cumulative CPU: 4172.93 sec HDFS Read: 43334 HDFS Write: 12982162918 SUCCESS Job 1: Map: 51 HDFS Read: 0 HDFS Write: 0 FAIL Total MapReduce CPU Time Spent: 0 days 1 hours 9 minutes 32 seconds 930 msec The same works fine in 0.8.1.6. After creating the view, I am able to query the view successfully; only when I join it with another table does it throw the above exceptions. Thanks, Pandeeswaran

On Fri, Aug 9, 2013 at 4:37 PM, pandees waran pande...@gmail.com wrote: Hi Nitin, I have executed a few test cases and here are my observations. a) I am not using any utilities for upgrading to 0.11, just executing the same HQL which works in 0.8.1.6 in 0.11. b) My join involves a view which uses a UDAF (https://github.com/scribd/hive-udaf-maxrow). When I try to join this view (with the UDAF) with another table, I am getting the below errors: java.lang.InstantiationException: org.apache.hadoop.hive.ql.parse.ASTNodeOrigin Continuing ... java.lang.RuntimeException: failed to evaluate: unbound=Class.new(); Continuing ... java.lang.InstantiationException: org.apache.hadoop.hive.ql.parse.ASTNodeOrigin Continuing ... java.lang.RuntimeException: failed to evaluate: unbound=Class.new(); Continuing ... java.lang.InstantiationException: org.apache.hadoop.hive.ql.parse.ASTNodeOrigin Continuing ... java.lang.RuntimeException: failed to evaluate: unbound=Class.new(); Continuing ... java.lang.InstantiationException: org.apache.hadoop.hive.ql.parse.ASTNodeOrigin Continuing ... java.lang.RuntimeException: failed to evaluate: unbound=Class.new(); My query looks like: select v.* from view1 v join table1 t on t.col1=v.col1 The same query works in 0.8.1.6 without any issues. This query works in 0.11 if I remove the UDAF from the view. Do I need to rebuild the UDAF separately for 0.11? In general, I expect HQL which works in 0.8.1.6 to work in 0.11 without any code changes; please correct me if my assumption is incorrect. Thanks, Pandeeswaran

On Wed, Aug 7, 2013 at 9:00 PM, Nitin Pawar nitinpawar...@gmail.com wrote: Will it be possible for you to share your query? and if you are using any custom udf then the java code for the same? how
Re: Join issue in 0.11
Hi Nitin, I have executed a few test cases and here are my observations. a) I am not using any utilities for upgrading to 0.11, just executing the same HQL which works in 0.8.1.6 in 0.11. b) My join involves a view which uses a UDAF (https://github.com/scribd/hive-udaf-maxrow). When I try to join this view (with the UDAF) with another table, I am getting the below errors: java.lang.InstantiationException: org.apache.hadoop.hive.ql.parse.ASTNodeOrigin Continuing ... java.lang.RuntimeException: failed to evaluate: unbound=Class.new(); Continuing ... java.lang.InstantiationException: org.apache.hadoop.hive.ql.parse.ASTNodeOrigin Continuing ... java.lang.RuntimeException: failed to evaluate: unbound=Class.new(); Continuing ... java.lang.InstantiationException: org.apache.hadoop.hive.ql.parse.ASTNodeOrigin Continuing ... java.lang.RuntimeException: failed to evaluate: unbound=Class.new(); Continuing ... java.lang.InstantiationException: org.apache.hadoop.hive.ql.parse.ASTNodeOrigin Continuing ... java.lang.RuntimeException: failed to evaluate: unbound=Class.new(); My query looks like: select v.* from view1 v join table1 t on t.col1=v.col1 The same query works in 0.8.1.6 without any issues. This query works in 0.11 if I remove the UDAF from the view. Do I need to rebuild the UDAF separately for 0.11? In general, I expect HQL which works in 0.8.1.6 to work in 0.11 without any code changes; please correct me if my assumption is incorrect. Thanks, Pandeeswaran

On Wed, Aug 7, 2013 at 9:00 PM, Nitin Pawar nitinpawar...@gmail.com wrote: Will it be possible for you to share your query? and if you are using any custom udf then the java code for the same? how are you upgrading from hive-0.8 to hive-0.11? aws announced that EMR supports hive 0.11 and that was 4 days ago. Can you check if you need to change something on the EMR side? On Wed, Aug 7, 2013 at 8:28 PM, pandees waran pande...@gmail.com wrote: Hi Nitin, Nope! 
it ended up with below error messages: Examining task ID: task_201308070831_0010_m_52 (and more) from job job_201308070831_0010 Exception in thread Thread-98 java.lang.ClassFormatError: Absent Code attribute in method that is not native or abstract in class file javax/servlet/http/HttpServlet at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631) at java.lang.ClassLoader.defineClass(ClassLoader.java:615) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141) at java.net.URLClassLoader.defineClass(URLClassLoader.java:283) at java.net.URLClassLoader.access$000(URLClassLoader.java:58) at java.net.URLClassLoader$1.run(URLClassLoader.java:197) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631) at java.lang.ClassLoader.defineClass(ClassLoader.java:615) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141) at java.net.URLClassLoader.defineClass(URLClassLoader.java:283) at java.net.URLClassLoader.access$000(URLClassLoader.java:58) at java.net.URLClassLoader$1.run(URLClassLoader.java:197) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) at org.apache.hadoop.hive.shims.Hadoop20SShims.getTaskAttemptLogUrl(Hadoop20SShims.java:49) at org.apache.hadoop.hive.ql.exec.JobDebugger$TaskInfoGrabber.getTaskInfos(JobDebugger.java:190) at 
org.apache.hadoop.hive.ql.exec.JobDebugger$TaskInfoGrabber.run(JobDebugger.java:146) at java.lang.Thread.run(Thread.java:662) Counters: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask MapReduce Jobs Launched: Job 0: Map: 67 Reduce: 1 Cumulative CPU: 4172.93 sec HDFS Read: 43334 HDFS Write: 12982162918 SUCCESS Job 1: Map: 51 HDFS Read: 0 HDFS Write: 0 FAIL Total MapReduce CPU Time Spent: 0 days 1 hours 9 minutes 32 seconds 930 msec But, the same query works fine in hive 0.8.1.6 without any issues. i am working on the 0.11 upgrade and facing this issue. Thanks, Pandeeswaran On 8/7/13, Nitin Pawar nitinpawar
Join issue in 0.11
Hi, I am facing the same issue as mentioned in the below JIRA: https://issues.apache.org/jira/browse/HIVE-3872 I am using Amazon EMR with Hive 0.11. Do I need to apply any patch on top of 0.11 to fix this NPE issue? -- Thanks, Pandeeswaran
Re: Join issue in 0.11
Hi Nitin, Nope! it ended up with below error messages: Examining task ID: task_201308070831_0010_m_52 (and more) from job job_201308070831_0010 Exception in thread Thread-98 java.lang.ClassFormatError: Absent Code attribute in method that is not native or abstract in class file javax/servlet/http/HttpServlet at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631) at java.lang.ClassLoader.defineClass(ClassLoader.java:615) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141) at java.net.URLClassLoader.defineClass(URLClassLoader.java:283) at java.net.URLClassLoader.access$000(URLClassLoader.java:58) at java.net.URLClassLoader$1.run(URLClassLoader.java:197) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631) at java.lang.ClassLoader.defineClass(ClassLoader.java:615) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141) at java.net.URLClassLoader.defineClass(URLClassLoader.java:283) at java.net.URLClassLoader.access$000(URLClassLoader.java:58) at java.net.URLClassLoader$1.run(URLClassLoader.java:197) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) at org.apache.hadoop.hive.shims.Hadoop20SShims.getTaskAttemptLogUrl(Hadoop20SShims.java:49) at org.apache.hadoop.hive.ql.exec.JobDebugger$TaskInfoGrabber.getTaskInfos(JobDebugger.java:190) at 
org.apache.hadoop.hive.ql.exec.JobDebugger$TaskInfoGrabber.run(JobDebugger.java:146) at java.lang.Thread.run(Thread.java:662) Counters: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask MapReduce Jobs Launched: Job 0: Map: 67 Reduce: 1 Cumulative CPU: 4172.93 sec HDFS Read: 43334 HDFS Write: 12982162918 SUCCESS Job 1: Map: 51 HDFS Read: 0 HDFS Write: 0 FAIL Total MapReduce CPU Time Spent: 0 days 1 hours 9 minutes 32 seconds 930 msec But the same query works fine in Hive 0.8.1.6 without any issues. I am working on the 0.11 upgrade and facing this issue. Thanks, Pandeeswaran

On 8/7/13, Nitin Pawar nitinpawar...@gmail.com wrote: before applying the patch, can you confirm that the map join query worked fine and gave the results you wanted?

On Wed, Aug 7, 2013 at 6:46 PM, Sathya Narayanan K ksat...@live.com wrote: Hi, I am also facing the same issue. Could anyone please suggest whether we can apply any patch? Thanks, Sathya Narayanan

From: pandees waran [mailto:pande...@gmail.com] Sent: Wednesday, August 07, 2013 6:39 PM To: user@hive.apache.org Subject: Join issue in 0.11 Hi, I am facing the same issue as mentioned in the below JIRA: https://issues.apache.org/jira/browse/HIVE-3872 I am using Amazon EMR with Hive 0.11. Do I need to apply any patch on top of 0.11 to fix this NPE issue? -- Thanks, Pandeeswaran -- Nitin Pawar -- Thanks, Pandeeswaran
Re: Prevent users from killing each other's jobs
Hi Mikhail, could you please explain how we can track all the kill requests for a job? Is there any feature available in the Hadoop stack for this, or do we need to track this at the OS layer by capturing signals? Thanks, Pandeesh

On Jul 31, 2013 12:03 AM, Mikhail Antonov olorinb...@gmail.com wrote: In addition to using the job's ACLs you could have a more brutal scheme: track all requests to kill jobs, and if any request comes from a user who shouldn't be trying to kill this particular job, then ssh from the script to his client machine and forcibly reboot it :)

2013/7/30 Edward Capriolo edlinuxg...@gmail.com Honestly, tell your users to stop being jerks. People know if they kill my query there is going to be hell to pay :)

On Tue, Jul 30, 2013 at 2:25 PM, Vinod Kumar Vavilapalli vino...@apache.org wrote: You need to set up job ACLs. See http://hadoop.apache.org/docs/stable/mapred_tutorial.html#Job+Authorization. It is a per-job configuration, and you can provide defaults. If the job owner wishes to give others access, he/she can do so. Thanks, +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/

On Jul 30, 2013, at 11:21 AM, Murat Odabasi wrote: Hi there, I am trying to introduce some sort of security to prevent different people using the cluster from interfering with each other's jobs. Following the instructions at http://hadoop.apache.org/docs/stable/cluster_setup.html and https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-9/security, this is what I put in my mapred-site.xml:

<property>
  <name>mapred.task.tracker.task-controller</name>
  <value>org.apache.hadoop.mapred.LinuxTaskController</value>
</property>
<property>
  <name>mapred.acls.enabled</name>
  <value>true</value>
</property>

I can see the configuration parameters in the job configuration when I run a Hive query, but the users are still able to kill each other's jobs. Any ideas about what I may be missing? Any alternative approaches I can adopt? Thanks. -- Thanks, Michael Antonov
Wildcard support in specifying file location
Hi, I am a newbie to Hive. While creating external tables, can we use a wildcard to specify the file location? i.e.: STORED AS TEXTFILE LOCATION 's3://root/*/date*/' Is the above specification valid in Hive 0.7.1? Thanks