Request for edit permission to hive wiki
Hi, could anyone grant me edit permission for the Hive wiki? My Confluence user name is lirui. Thanks a lot! Cheers, Rui Li
Re: doubt about locking mechanism in Hive
Hi, We also encountered this in Hive 0.13. We need to enable concurrency in our daily ETL workflows (to prevent a child ETL job from starting to read its parent job's output while the parent is still running). We found that in Hive 0.13, sometimes when you open the Hive CLI it outputs the message "conflicting lock present for default mode EXCLUSIVE" and waits for some locks to be released. We hadn't encountered this in Hive 0.11 and are still trying to figure it out.

2014-08-25 15:21 GMT+08:00 Sourygna Luangsay sluang...@pragsis.com: Many thanks Edward for this complete answer. So the main idea is simply to disable concurrency in Hive, if I get you. My doubt now is: is this something most Hive users do by default? Can somebody else share their own experience? Regards, *Sourygna Luangsay*

*From:* Edward Capriolo [mailto:edlinuxg...@gmail.com] *Sent:* Friday, August 22, 2014 16:07 *To:* user@hive.apache.org *Subject:* Re: doubt about locking mechanism in Hive

IMHO locking support should be turned off by default. I would argue that if you require this feature often, you may be designing your systems improperly. You really should not have many situations where you need locking in a write-(mostly)-once file system. The only time I have ever used it is when I had a process completely rewriting the contents of a table and I needed downstream jobs not to select from that table while it was in an inconsistent state. Having it on by default is a bad idea. You have pointed out a case where a simple select query attempts to acquire locks it does not need. That puts strain on more systems and creates more chances for issues. One of the big design-philosophy issues I tend to have with Hive lately is that we have this pool of users (like myself) who use Hive for its original purpose: to query write-once text files and create aggregations. Then there are other groups attempting to implement very complicated semantics around streaming, transactions, locking, and so on.
Then you have tools like Cloudera Manager giving configuration warnings such as: "Hive: Hive is not configured with a ZooKeeper service. As a result, hive-site will not contain hive.zookeeper.quorum, which can lead to corruption in concurrency scenarios." I think this statement is incorrect AND is BAD advice. Then users such as yourself conclude "I should turn on locking", because no one would ever assume that !!!SELECTING 1 ROW FROM A TABLE WOULD CAUSE 1100 LOCKS TO BE ACQUIRED ::rant over:: I am not saying that Hive locking is bad, but I am saying I leave it off and turn it on when I need it, on a per-query basis.

On Fri, Aug 22, 2014 at 8:48 AM, Sourygna Luangsay sluang...@pragsis.com wrote: Hi, I have some trouble with the locking/concurrency mechanism of Hive when doing a large select and trying to create a table at the same time. My version of Hive is 0.13. What I am trying to do is the following:

1) In a Hive shell: use mydatabase; select * from competence limit 1; # This table has 1100 partitions, so with hive.support.concurrency=true it needs at least 90 s to execute. (I know this is a silly query: I should rather do a select restricted to one partition. The purpose of this query is to reproduce the problem easily with a query that takes a long time to execute.)

2) In another Hive shell, while the first query is executing: use mydatabase; create table probsourygna (foo string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;

The problem is that the "create table" does not execute until the first query (select) has finished, and we see messages of the following type: conflicting lock present for mydatabase mode EXCLUSIVE (one line every 60 s). It seems to me that the first query takes a shared lock at the database (mydatabase) level; then the second query tries to acquire an exclusive lock at the database level, fails, and retries every 60 s. Am I right?
(When I look at the documentation at https://cwiki.apache.org/confluence/display/Hive/Locking , it says nothing about locks at the database level.) Is there any solution to my problem, i.e. avoiding a long "select" blocking a "create" query, without turning off Hive's concurrency support? Regards, *Sourygna Luangsay*
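Edward's "leave it off globally, turn it on per query" approach amounts to something like the following. This is a sketch only: it assumes a lock manager is already configured in hive-site.xml, and the table names are illustrative, not from the thread.

```sql
-- hive-site.xml keeps hive.support.concurrency=false globally.
-- In the one session that rewrites a shared table, enable locking explicitly:
SET hive.support.concurrency=true;
SET hive.lock.numretries=10;              -- attempts before giving up on a conflicting lock
SET hive.lock.sleep.between.retries=60;   -- seconds between attempts

INSERT OVERWRITE TABLE shared_table
SELECT * FROM staging_table;
```

Sessions that never rewrite shared tables then pay no locking overhead, and a long SELECT cannot block an unrelated CREATE TABLE.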
Re: Nested types in ORC
Thanks Prasanth. Does it also mean that a query reading the nested.k column will invariably read nested.v as well, even if the nested.v column is not used in the query?

On Mon, Sep 8, 2014 at 11:29 PM, Prasanth Jayachandran pjayachand...@hortonworks.com wrote: Hi, ORC stores nested fields as separate columns. For example, the following table: create table orc_nested (key string, nested struct<k:string,v:string>, zip bigint) stored as orc; will be flattened and stored as separate columns like below: key, nested, nested.k, nested.v, zip. You can have a look at the structure of ORC files using the "hive --orcfiledump" utility. With regard to your next question, predicate pushdown is not supported for complex types at this point. There is already a JIRA for supporting it: https://issues.apache.org/jira/browse/HIVE-7214 At this point, schema 2 will let you enable predicate pushdown. The performance difference depends mainly on the data layout and whether the column is sorted or not. Thanks, Prasanth Jayachandran

On Sep 8, 2014, at 6:16 AM, Abhishek Agarwal abhishc...@gmail.com wrote: Hi all, I have a few questions regarding nested columns in Hive. How does ORC internally store complex types such as a struct? Are the nested fields stored as separate columns, or is the whole struct serialized as one column? Is predicate pushdown supported for queries which access nested columns? In general, is there a significant performance difference between the following schemas with regard to query execution and storage? Schema 1: { string a; struct b { string b1; string b2; } } Schema 2: { string a; string b.b1; string b.b2; } -- Regards, Abhishek Agarwal
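Prasanth's flattening description can be seen directly with a table like the one in the example (reconstructed here with the angle brackets the mail client dropped; the file path passed to orcfiledump is illustrative):

```sql
CREATE TABLE orc_nested (
  key    string,
  nested struct<k:string, v:string>,
  zip    bigint
) STORED AS ORC;

-- Columns are flattened and stored as: key, nested, nested.k, nested.v, zip.
-- Inspect the physical layout of a written file with:
--   hive --orcfiledump /user/hive/warehouse/orc_nested/000000_0
```

Because each leaf of the struct is its own column stream, readers that project only `nested.k` still pay no decoding cost for `key` or `zip`.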
Re: Nested types in ORC
Yes, it does now. Thanks, Prasanth Jayachandran

On Sep 9, 2014, at 12:30 AM, Abhishek Agarwal abhishc...@gmail.com wrote: Thanks Prasanth. Does it also mean that a query reading the nested.k column will invariably read nested.v as well, even if the nested.v column is not used in the query?
Re: UDTF KryoException: unable create/find class error in hive 0.13
Hi, I think I encountered this kind of serialization problem when writing UDFs. Usually, marking every field of the UDF as *transient* does the trick. I guess the error means that Kryo tries to serialize the UDF class and everything inside it; by marking the fields as transient you ensure that it will not, and that they will be instantiated in the default constructor or during the call to initialize(). Please keep me informed whether it works or not. Regards, Furcy

2014-09-09 1:44 GMT+02:00 Echo Li echo...@gmail.com: I wrote a UDTF in Hive 0.13. The function parses a column which is a JSON string and returns a table. The function compiles successfully after adding hive-exec-0.13.0.2.1.2.1-471.jar to the classpath. However, when the jar is added to Hive, a function is created using the jar, and I try to run a query using that function, I get the error: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: class_name. I went through all the same steps in a lower version of Hive (0.10) and everything works fine. I searched around and it seems this is caused by the Kryo serde, so my question is: is there a fix, and where can I find it? Thank you.
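A minimal sketch of the transient-field pattern Furcy describes, using the GenericUDF API (it needs hive-exec on the classpath to compile; the class name and bodies are illustrative, not the original poster's UDTF):

```java
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class MyJsonUdf extends GenericUDF {

    // transient: Kryo skips this when the query plan is serialized to the
    // tasks, so it must be rebuilt in initialize(), which runs on every task.
    private transient ObjectInspector inputOI;

    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        inputOI = arguments[0];
        return PrimitiveObjectInspectorFactory.javaStringObjectInspector;
    }

    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        Object raw = arguments[0].get();
        // Parse the JSON here; returning the raw value keeps the sketch short.
        return raw == null ? null : raw.toString();
    }

    @Override
    public String getDisplayString(String[] children) {
        return "my_json_udf(" + children[0] + ")";
    }
}
```

Any non-serializable helper (parsers, caches, object inspectors) held in a non-transient field is what typically triggers the KryoException at plan-serialization time.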
Re: Dynamic Partitioning- Partition_Naming
You cannot modify or rename the paths of partitions created by dynamic partitioning; having column=value in the path is the default implementation for partitions.

On Tue, Sep 9, 2014 at 5:18 AM, anusha Mangina anusha.mang...@gmail.com wrote: I need a table partitioned by country and then city. I created a table and INSERTed data from another table using dynamic partitioning: CREATE TABLE invoice_details_hive_partitioned(Invoice_Id double, Invoice_Date string, Invoice_Amount double, Paid_Date string) PARTITIONED BY(pay_country STRING, pay_location STRING); Everything worked fine. Partitions by default are named like pay_country=INDIA and pay_city=DELHI in ../hive/warehouse/invoice_details_hive_partitioned/pay_country=INDIA/pay_city=DELHI. Can I get the partition name as just the column value, INDIA and DELHI, without the column name, like /hive/warehouse/invoice_details_hive_partitioned/INDIA/DELHI? Thanks in advance. -- Nitin Pawar
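Dynamic partitioning always writes column=value directories, but statically added partitions can point at any location. So one possible (if manual) workaround is to add partitions with explicit LOCATION clauses; a sketch using the table from the question, with illustrative paths:

```sql
ALTER TABLE invoice_details_hive_partitioned
  ADD PARTITION (pay_country = 'INDIA', pay_location = 'DELHI')
  LOCATION '/hive/warehouse/invoice_details_hive_partitioned/INDIA/DELHI';
```

The metastore then maps the partition key to the custom directory, at the cost of issuing one ALTER TABLE per partition instead of letting a single INSERT create them all.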
Re: doubt about locking mechanism in Hive
We use our own library: simple constructions like files in HDFS that work like pid/lock files. A file like /flags/tablea/process1 could mean "hey, I'm working on table a, leave it alone." It accomplishes the exact same thing with less fuss, and it is also much easier for an external process/scheduler/shell script to integrate with this system. I doubt many use Hive locking as flow control for a scheduling system.

On Tue, Sep 9, 2014 at 3:25 AM, wzc wzc1...@gmail.com wrote: Hi, We also encountered this in Hive 0.13. We need to enable concurrency in our daily ETL workflows (to prevent a child ETL job from starting to read its parent job's output while the parent is still running). We found that in Hive 0.13, sometimes when you open the Hive CLI it outputs the message "conflicting lock present for default mode EXCLUSIVE" and waits for some locks to be released. We hadn't encountered this in Hive 0.11 and are still trying to figure it out.

2014-08-25 15:21 GMT+08:00 Sourygna Luangsay sluang...@pragsis.com: Many thanks Edward for this complete answer. So the main idea is simply to disable concurrency in Hive, if I get you. My doubt now is: is this something most Hive users do by default? Can somebody else share their own experience? Regards, *Sourygna Luangsay*

*From:* Edward Capriolo [mailto:edlinuxg...@gmail.com] *Sent:* Friday, August 22, 2014 16:07 *To:* user@hive.apache.org *Subject:* Re: doubt about locking mechanism in Hive

IMHO locking support should be turned off by default. I would argue that if you require this feature often, you may be designing your systems improperly. You really should not have many situations where you need locking in a write-(mostly)-once file system. The only time I have ever used it is when I had a process completely rewriting the contents of a table and I needed downstream jobs not to select from that table while it was in an inconsistent state. Having it on by default is a bad idea.
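The flag-file pattern Edward describes can be sketched in a few lines; a minimal sketch in Python using the local filesystem as a stand-in for HDFS (class, table, and path names are illustrative):

```python
import os


class FlagLock:
    """A coarse-grained lock implemented as a flag file, mimicking the
    /flags/<table>/<process> convention: the file's presence means
    'I'm working on this table, leave it alone'."""

    def __init__(self, base_dir, table, process):
        self.path = os.path.join(base_dir, table, process)

    def acquire(self):
        # Create the flag file; external schedulers/shell scripts can
        # check for it with a plain existence test.
        os.makedirs(os.path.dirname(self.path), exist_ok=True)
        with open(self.path, "w") as f:
            f.write("locked")

    def release(self):
        os.remove(self.path)

    @staticmethod
    def table_busy(base_dir, table):
        # Any flag file under the table's directory means a writer is active.
        d = os.path.join(base_dir, table)
        return os.path.isdir(d) and len(os.listdir(d)) > 0
```

On HDFS the same idea uses `hadoop fs -touchz` / `-test -e` / `-rm`, which is what makes the scheme easy for non-Hive tooling to cooperate with.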
Output File Path- Directory Structure
My table has dynamic partitions and creates the file path as s3://some-bucket/pageviews/dt=20120311/key=ACME1234/site=example.com/Output-file-1. Is there something I can do so the path is always s3://some-bucket/pageviews/20120311/ACME1234/example.com/Output-file-1? Please help me out, guys.
Weird Error on Inserting in Table [ORC, MESOS, HIVE]
I am doing a dynamic partition load in Hive 0.13 using ORC files. This has always worked in the past, both with MapReduce v1 and YARN. I am working with Mesos now, and trying to troubleshoot this weird error: Failed with exception AlreadyExistsException(message:Partition already exists. What's odd is that my insert is an insert (without Overwrite), so it's like two different reducers have data to go into the same partition, and then there is a collision of some sort? Perhaps there is a situation where the partition doesn't exist prior to the run, but when two reducers have data, they both think they should be the one to create the partition? If a partition already exists, shouldn't the reducer just copy its file into the partition? I am struggling to see why this would be an issue with Mesos but not on YARN or MRv1. Any thoughts would be welcome. John
Re: Weird Error on Inserting in Table [ORC, MESOS, HIVE]
I ran with debug logging, and this is interesting: there was a loss of connection to the metastore client RIGHT before the partition error mentioned above, as data was being moved around. I wonder if the timing on that is bad?

14/09/09 12:47:37 [main]: INFO exec.MoveTask: Partition is: {day=null, source=null} 14/09/09 12:47:38 [main]: INFO metadata.Hive: Renaming src:maprfs:/user/hive/scratch/hive-mapr/hive_2014-09-09_12-38-30_860_3555291990145206535-1/-ext-1/day=2012-11-30/source=20121119_SWAirlines_Spam/04_0;dest: maprfs:/user/hive/warehouse/intel_flow.db/pcaps/day=2012-11-30/source=20121119_SWAirlines_Spam/04_0;Status:true 14/09/09 12:48:02 [main]: WARN metastore.RetryingMetaStoreClient: MetaStoreClient lost connection. Attempting to reconnect. org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)

On Tue, Sep 9, 2014 at 11:02 AM, John Omernik j...@omernik.com wrote: I am doing a dynamic partition load in Hive 0.13 using ORC files. [...]
RE: Indexes vs Partitions in hive
Lefty, that’s the single best description of indexes/partitions I’ve yet encountered. Stealing it. Nice ☺

From: Lefty Leverenz [mailto:leftylever...@gmail.com] Sent: Tuesday, September 09, 2014 2:28 PM To: user@hive.apache.org Subject: Re: Indexes vs Partitions in hive

Others can give technical explanations, but I'll give you a simple analogy: a book might have an index as well as chapters. Both help you find information more quickly. The index directs you to particular information, and chapters partition the book into smaller pieces that are organized around a common theme. To stretch the analogy, a book can only have one set of chapters, but it can have multiple indexes (topic index, scientific name index, poem title index, poem author index, and so on). -- Lefty

On Mon, Sep 8, 2014 at 6:26 AM, Chhaya Vishwakarma chhaya.vishwaka...@lntinfotech.com wrote: Hi all, how are indexes in Hive different from partitions? Both improve query performance as far as I know, so in what way do they differ? In which situations would I use indexing or partitioning? Can I use them together? Kindly suggest. Regards, Chhaya Vishwakarma
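On the "can I use them together?" question: yes, an index can be built on a partitioned table, and it is maintained per partition. A sketch using Hive 0.13 syntax (table and index names are illustrative):

```sql
CREATE TABLE pageviews (url STRING, hits INT)
PARTITIONED BY (dt STRING);

-- A compact index over a non-partition column.
CREATE INDEX pageviews_url_idx
ON TABLE pageviews (url)
AS 'COMPACT' WITH DEFERRED REBUILD;

-- Rebuild after loading data (can also be done PARTITION by PARTITION).
ALTER INDEX pageviews_url_idx ON pageviews REBUILD;
```

Partition pruning then narrows the scan to the matching dt directories, while the index helps locate rows by url within them.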
Re: Weird Error on Inserting in Table [ORC, MESOS, HIVE]
Well, here is me talking to myself, but in case someone else runs across this: I changed the Hive metastore connect timeout to 600 seconds (per the JIRA below, fixed for Hive 0.14) and now my problem has gone away. It looks like the timeout was causing some craziness. https://issues.apache.org/jira/browse/HIVE-7140

On Tue, Sep 9, 2014 at 1:00 PM, John Omernik j...@omernik.com wrote: I ran with debug logging, and this is interesting: there was a loss of connection to the metastore client RIGHT before the partition error mentioned above, as data was being moved around. I wonder if the timing on that is bad? 14/09/09 12:47:37 [main]: INFO exec.MoveTask: Partition is: {day=null, source=null} 14/09/09 12:47:38 [main]: INFO metadata.Hive: Renaming src:maprfs:/user/hive/scratch/hive-mapr/hive_2014-09-09_12-38-30_860_3555291990145206535-1/-ext-1/day=2012-11-30/source=20121119_SWAirlines_Spam/04_0;dest: maprfs:/user/hive/warehouse/intel_flow.db/pcaps/day=2012-11-30/source=20121119_SWAirlines_Spam/04_0;Status:true 14/09/09 12:48:02 [main]: WARN metastore.RetryingMetaStoreClient: MetaStoreClient lost connection. Attempting to reconnect. org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129) On Tue, Sep 9, 2014 at 11:02 AM, John Omernik j...@omernik.com wrote: I am doing a dynamic partition load in Hive 0.13 using ORC files. This has always worked in the past, both with MapReduce v1 and YARN. I am working with Mesos now, and trying to troubleshoot this weird error: Failed with exception AlreadyExistsException(message:Partition already exists. What's odd is that my insert is an insert (without Overwrite), so it's like two different reducers have data to go into the same partition, and then there is a collision of some sort?
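For reference, the timeout John raised can be set in hive-site.xml; this sketch assumes the Hive 0.13 property name and treats the value as seconds:

```xml
<!-- hive-site.xml -->
<property>
  <name>hive.metastore.client.socket.timeout</name>
  <!-- seconds; the default was much lower before HIVE-7140 raised it -->
  <value>600</value>
</property>
```

With the short default, a slow partition move can outlast the Thrift socket read, the client reconnects and retries the add-partition call, and the retry then fails with AlreadyExistsException even though the first attempt succeeded.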
Re: Hive Index and ORC
On 9/6/14, 9:36 AM, Alain Petrus wrote: I am wondering whether it is possible to use a Hive index with the ORC format. Does it make sense?

ORC maintains its own indexes within the file: one index record every 10,000 rows (orc.row.index.stride / orc.create.index). You can take advantage of it during scan+filter with the following option: hive> set hive.optimize.index.filter=true; A recent IBM paper had some detailed analysis of ORC's indexing performance; it is relatively free because there is no step other than inserting into an ORC table. Where ORC helps a lot is if you then run ANALYZE TABLE to build the information required to make query plans better, because it will read the stats off the single index record at the bottom of each ORC file (the partial-scan mode). Cheers, Gopal
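Gopal's two suggestions together, as session commands; a sketch only (the table and partition names are illustrative):

```sql
-- Let the reader use ORC's built-in row-group indexes to skip
-- stripes and row groups during scan+filter:
SET hive.optimize.index.filter=true;

-- Build table/partition stats cheaply by reading each ORC file's
-- footer rather than the data (the partial-scan mode):
ANALYZE TABLE orc_table PARTITION (dt = '2014-09-09')
COMPUTE STATISTICS PARTIALSCAN;
```

Neither step requires creating a separate Hive index object; the ORC file format carries the index and the stats itself.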
PIG heart beat freeze using hue + cdh 5.1
Hi, I have only 604 rows in the Hive table. While using: A = LOAD 'revenue' USING org.apache.hcatalog.pig.HCatLoader(); DUMP A; it starts spouting "Heart beat" repeatedly and never leaves this state. Can someone please help? I am getting the following output: 2014-09-09 17:27:45,844 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Kind: RM_DELEGATION_TOKEN, Service: 10.215.204.182:8032, Ident: (owner=cloudera, renewer=oozie mr token, realUser=oozie, issueDate=1410301632571, maxDate=1410906432571, sequenceNumber=14, masterKeyId=2) 2014-09-09 17:27:46,709 [JobControl] WARN org.apache.hadoop.mapreduce.v2.util.MRApps - cache file (mapreduce.job.cache.files) hdfs://txwlcloud2:8020/user/oozie/share/lib/lib_20140820161455/pig/commons-httpclient-3.1.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://txwlcloud2:8020/user/oozie/share/lib/lib_20140820161455/hcatalog/commons-httpclient-3.1.jar This will be an error in Hadoop 2.0 2014-09-09 17:27:46,712 [JobControl] WARN org.apache.hadoop.mapreduce.v2.util.MRApps - cache file (mapreduce.job.cache.files) hdfs://txwlcloud2:8020/user/oozie/share/lib/lib_20140820161455/pig/commons-io-2.1.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://txwlcloud2:8020/user/oozie/share/lib/lib_20140820161455/hcatalog/commons-io-2.1.jar This will be an error in Hadoop 2.0 2014-09-09 17:27:46,894 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1410291186220_0006 2014-09-09 17:27:46,968 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://txwlcloud2:8088/proxy/application_1410291186220_0006/ 2014-09-09 17:27:46,969 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1410291186220_0006 2014-09-09 17:27:46,969 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A 2014-09-09 17:27:46,969 [main] INFO 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[1,4] C: R: 2014-09-09 17:27:46,969 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://txwlcloud2:50030/jobdetails.jsp?jobid=job_1410291186220_0006 2014-09-09 17:27:47,019 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete Heart beat Heart beat Heart beat Heart beat Heart beat
Re: Pig jobs run forever with PigEditor in Hue
Hi, could anyone please let me know how to increase the MapReduce slots? I am getting an infinite heartbeat when I run a Pig script from Hue (Cloudera CDH 5.1). Thanks, Amit
Increase mapreduce slots
Hi, could anyone please let me know how to increase the MapReduce slots? I am getting an infinite heartbeat when I run a Pig script from Hue (Cloudera CDH 5.1). Thanks, Amit
Re: PIG heart beat freeze using hue + cdh 5.1
It uses YARN now. You need to set your container resource memory and CPU, then set the MapReduce physical memory and CPU cores; the number of mappers and reducers is calculated from the resources you give to each mapper and reducer. Pengcheng. Sent from my iPhone

On Sep 9, 2014, at 7:55 PM, Amit Dutta amitkrdu...@outlook.com wrote: I think one of the issues is the number of MapReduce slots for the cluster. Can anyone please let me know how I increase the MapReduce slots? From: amitkrdu...@outlook.com To: user@hive.apache.org Subject: PIG heart beat freeze using hue + cdh 5.1 Date: Tue, 9 Sep 2014 17:55:01 -0500 Hi, I have only 604 rows in the Hive table. While using: A = LOAD 'revenue' USING org.apache.hcatalog.pig.HCatLoader(); DUMP A; it starts spouting "Heart beat" repeatedly and never leaves this state. Can someone please help? I am getting the following output: 2014-09-09 17:27:45,844 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Kind: RM_DELEGATION_TOKEN, Service: 10.215.204.182:8032, Ident: (owner=cloudera, renewer=oozie mr token, realUser=oozie, issueDate=1410301632571, maxDate=1410906432571, sequenceNumber=14, masterKeyId=2) 2014-09-09 17:27:46,709 [JobControl] WARN org.apache.hadoop.mapreduce.v2.util.MRApps - cache file (mapreduce.job.cache.files) hdfs://txwlcloud2:8020/user/oozie/share/lib/lib_20140820161455/pig/commons-httpclient-3.1.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://txwlcloud2:8020/user/oozie/share/lib/lib_20140820161455/hcatalog/commons-httpclient-3.1.jar This will be an error in Hadoop 2.0 2014-09-09 17:27:46,712 [JobControl] WARN org.apache.hadoop.mapreduce.v2.util.MRApps - cache file (mapreduce.job.cache.files) hdfs://txwlcloud2:8020/user/oozie/share/lib/lib_20140820161455/pig/commons-io-2.1.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://txwlcloud2:8020/user/oozie/share/lib/lib_20140820161455/hcatalog/commons-io-2.1.jar This will be an error in Hadoop 2.0 2014-09-09 17:27:46,894 [JobControl] 
INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1410291186220_0006 2014-09-09 17:27:46,968 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://txwlcloud2:8088/proxy/application_1410291186220_0006/ 2014-09-09 17:27:46,969 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1410291186220_0006 2014-09-09 17:27:46,969 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A 2014-09-09 17:27:46,969 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[1,4] C: R: 2014-09-09 17:27:46,969 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://txwlcloud2:50030/jobdetails.jsp?jobid=job_1410291186220_0006 2014-09-09 17:27:47,019 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete Heart beat Heart beat Heart beat Heart beat Heart beat
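Pengcheng's advice maps onto these YARN/MapReduce properties; the values below are illustrative and should be sized to the actual nodes:

```xml
<!-- yarn-site.xml: total resources each NodeManager may hand out -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>4</value>
</property>

<!-- mapred-site.xml: per-container size of each map/reduce task -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>2048</value>
</property>
```

The old MRv1 notion of fixed map/reduce "slots" does not exist under YARN: the scheduler derives how many containers fit on each node from the node totals divided by the per-task container sizes, so an endless "Heart beat" usually means no container could be allocated with the current numbers.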
Re: Indexes vs Partitions in hive
Thanks very much Nick, it's yours for the taking. -- Lefty

On Tue, Sep 9, 2014 at 2:37 PM, Martin, Nick nimar...@pssd.com wrote: Lefty, that’s the single best description of indexes/partitions I’ve yet encountered. Stealing it. Nice ☺ [...]