Re: Indexes vs Partitions in hive

2014-09-09 Thread Lefty Leverenz
Thanks very much Nick, it's yours for the taking. -- Lefty On Tue, Sep 9, 2014 at 2:37 PM, Martin, Nick wrote: > Lefty, that’s the single best description of indexes/partitions I’ve yet > encountered. Stealing it. > > > > Nice ☺ > > > > *From:* Lefty Leverenz [mailto:leftylever...@gmail.com] >

RE: PIG heart beat freeze using hue + cdh 5.1

2014-09-09 Thread Amit Dutta
Thanks a lot for your reply. I changed the following parameters from Cloudera Manager: mapred.tasktracker.map.tasks.maximum = 2 (it was 1 before), mapred.tasktracker.reduce.tasks.maximum = 2 (it was 1 before). Could you please mention which parameters to set and how I change them ... Regards,Am

Re: PIG heart beat freeze using hue + cdh 5.1

2014-09-09 Thread Zenonlpc
It uses YARN now. You need to set your container resource memory and CPU, then set the MapReduce physical memory and CPU cores. The number of mappers and reducers is calculated based on the resources you gave to your mapper and reducer. Pengcheng Sent from my iPhone > On Sep 9, 2014, at 7:55 PM, Amit
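
The calculation described in this reply can be sketched roughly as follows. This is a simplified illustration, not YARN's actual scheduler logic: the class name and the example numbers are hypothetical, and the four inputs are assumed to correspond to settings such as yarn.nodemanager.resource.memory-mb, yarn.nodemanager.resource.cpu-vcores, mapreduce.map.memory.mb and mapreduce.map.cpu.vcores. Depending on the scheduler's resource calculator, CPU may not be enforced at all, and rounding to the minimum allocation is ignored here.

```java
// Hypothetical helper: estimates how many tasks one node can run at once.
final class ContainerEstimate {
    /** Containers that fit on one node when both memory and vcores are counted. */
    static int containersPerNode(int nodeMemMb, int nodeVcores, int taskMemMb, int taskVcores) {
        int byMemory = nodeMemMb / taskMemMb; // limit imposed by memory
        int byCpu = nodeVcores / taskVcores;  // limit imposed by CPU
        return Math.min(byMemory, byCpu);     // the scarcer resource wins
    }

    public static void main(String[] args) {
        // Example values only; they are not taken from the thread.
        int mappers = containersPerNode(8192, 4, 1024, 1);
        int reducers = containersPerNode(8192, 4, 2048, 1);
        System.out.println("concurrent map tasks per node: " + mappers);
        System.out.println("concurrent reduce tasks per node: " + reducers);
    }
}
```

In other words, there is no fixed slot count to raise; giving each task less memory, or giving the node more, changes how many tasks run concurrently.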

RE: PIG heart beat freeze using hue + cdh 5.1

2014-09-09 Thread Amit Dutta
I think one of the issues is the number of MapReduce slots for the cluster... Can anyone please let me know how to increase the MapReduce slots? From: amitkrdu...@outlook.com To: user@hive.apache.org Subject: PIG heart beat freeze using hue + cdh 5.1 Date: Tue, 9 Sep 2014 17:55:01 -0500 Hi I have

Increase mapreduce slots

2014-09-09 Thread Amit Dutta
Hi, can anyone please let me know how to increase the MapReduce slots? I am getting an infinite heartbeat when I run a Pig script from Hue on Cloudera CDH 5.1. Thanks, Amit

Re: Pig jobs run forever with PigEditor in Hue

2014-09-09 Thread Amit Dutta
Hi, can anyone please let me know how to increase the MapReduce slots? I am getting an infinite heartbeat when I run a Pig script from Hue on Cloudera CDH 5.1. Thanks, Amit

PIG heart beat freeze using hue + cdh 5.1

2014-09-09 Thread Amit Dutta
Hi, I have only 604 rows in the Hive table. When I run A = LOAD 'revenue' USING org.apache.hcatalog.pig.HCatLoader(); DUMP A; it starts emitting heartbeats repeatedly and does not leave this state. Can someone please help? I am getting the following exception: 2014-09-09 17:27:45,844 [JobControl] IN

Re: UDTF "KryoException: unable create/find class" error in hive 0.13

2014-09-09 Thread Echo Li
Hi Furcy, Thanks for sharing. I modified my code to mark the map variables "transient" but still got the same error. This is the code: public class fun_name extends GenericUDTF { private PrimitiveObjectInspector stringOI = null; transient Map> mapObject; transient Map eventDetails;

Re: Hive Index and ORC

2014-09-09 Thread Gopal V
On 9/6/14, 9:36 AM, Alain Petrus wrote: I am wondering whether it is possible to use a Hive index with the ORC format. Does it make sense? ORC maintains its own indexes within the file - one index record every 10,000 rows (orc.row.index.stride / orc.create.index). You can take advantage of it du
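
To illustrate why a per-stride index helps, here is a toy model of min/max statistics kept for every block of rows. It is a sketch of the general idea only, not ORC's reader code: the class and method names are hypothetical, and the statistics are computed on the fly, whereas ORC records them when the file is written.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a stride-based min/max index; not ORC's actual implementation.
final class StrideIndexSketch {
    /** Returns the starting row of every stride whose [min, max] range could contain target. */
    static List<Integer> stridesToRead(long[] values, int stride, long target) {
        List<Integer> starts = new ArrayList<>();
        for (int start = 0; start < values.length; start += stride) {
            int end = Math.min(start + stride, values.length);
            long min = values[start];
            long max = values[start];
            for (int i = start + 1; i < end; i++) {
                min = Math.min(min, values[i]);
                max = Math.max(max, values[i]);
            }
            // A stride whose range excludes the target can be skipped entirely.
            if (target >= min && target <= max) {
                starts.add(start);
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        long[] values = new long[30_000];
        for (int i = 0; i < values.length; i++) {
            values[i] = i; // a sorted column, the best case for skipping
        }
        // Only the stride starting at row 10000 can contain 12345.
        System.out.println(stridesToRead(values, 10_000, 12_345L)); // prints [10000]
    }
}
```

The skipping only pays off when the data is at least roughly clustered on the filtered column; on randomly ordered data nearly every stride's range covers the target.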

Re: Weird Error on Inserting in Table [ORC, MESOS, HIVE]

2014-09-09 Thread John Omernik
Well, here is me talking to myself: but in case someone else runs across this, I changed the hive metastore connect timeout to 600 seconds (per the JIRA below for Hive 0.14) and now my problem has gone away. It looks like the timeout was causing some craziness. https://issues.apache.org/jira/brows

RE: Indexes vs Partitions in hive

2014-09-09 Thread Martin, Nick
Lefty, that’s the single best description of indexes/partitions I’ve yet encountered. Stealing it. Nice ☺ From: Lefty Leverenz [mailto:leftylever...@gmail.com] Sent: Tuesday, September 09, 2014 2:28 PM To: user@hive.apache.org Subject: Re: Indexes vs Partitions in hive Others can give technical

Re: Indexes vs Partitions in hive

2014-09-09 Thread Lefty Leverenz
Others can give technical explanations, but I'll give you a simple analogy: a book might have an index as well as chapters. Both help you find information more quickly. The index directs you to particular information, and chapters partition the book into smaller pieces that are organized around

Re: Output File Path- Directory Structure

2014-09-09 Thread Nishant Kelkar
Hi Anusha, 1. Well, not quite. What my solution gives you is only a way to move your data from 's3://some-bucket/pageviews/dt=20120311/key=ACME1234/site=example.com/Output-file-1' to 's3://some-bucket/pageviews/20120311/ACME1234/example.com/Output-file-1'. You could actually do this via the linu

Re: Weird Error on Inserting in Table [ORC, MESOS, HIVE]

2014-09-09 Thread John Omernik
I ran with debug logging, and this is interesting: there was a loss of connection to the metastore client RIGHT before the partition mentioned above... as data was looking to be moved around... I wonder if the timing on that is bad? 14/09/09 12:47:37 [main]: INFO exec.MoveTask: Partition is: {day=nu

Re: Output File Path- Directory Structure

2014-09-09 Thread anusha Mangina
Thanks Nishant. I got thousands of records inserted into dynamically partitioned tables. 1) Do you think converting the path for every record is the ideal solution, or did I misunderstand your answer? 2) Is there any way we can set things up so the initial path is formed as we need (only with Column value

Re: Output File Path- Directory Structure

2014-09-09 Thread Nishant Kelkar
You can use a regex to solve this. If you're using this file path in Java, you could try something like the following: String s = "s3://some-bucket/pageviews/dt=20120311/key=ACME1234/site=example.com/Output-file-1"; System.out.println(s.replaceAll("[a-z]{2,4}=", "")); If you'd
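
As a self-contained variant of this regex approach, the sketch below strips the "key=" prefix from each path segment. The class and method names are hypothetical, and the pattern is slightly stricter than the one in the post: it only matches a lowercase key that directly follows a "/", which is an assumption about how the partition directories are named.

```java
// Hypothetical helper illustrating the regex approach from the post.
final class PartitionPathCleaner {
    /** Removes "name=" prefixes from path segments, e.g. "/dt=20120311/" becomes "/20120311/". */
    static String stripPartitionKeys(String path) {
        // The lookbehind (?<=/) requires a '/' before the key without consuming it.
        return path.replaceAll("(?<=/)[a-z]+=", "");
    }

    public static void main(String[] args) {
        String s = "s3://some-bucket/pageviews/dt=20120311/key=ACME1234/site=example.com/Output-file-1";
        // prints s3://some-bucket/pageviews/20120311/ACME1234/example.com/Output-file-1
        System.out.println(stripPartitionKeys(s));
    }
}
```

Note that java.util.regex rejects a pattern that begins with a bare `*`, so the quantifier must always follow something it can repeat.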

Weird Error on Inserting in Table [ORC, MESOS, HIVE]

2014-09-09 Thread John Omernik
I am doing a dynamic partition load in Hive 0.13 using ORC files. This has always worked in the past both with MapReduce V1 and YARN. I am working with Mesos now, and trying to troubleshoot this weird error: Failed with exception AlreadyExistsException(message:Partition already exists What's

Output File Path- Directory Structure

2014-09-09 Thread anusha Mangina
My table has dynamic partitions and creates the file path as s3://some-bucket/pageviews/dt=20120311/key=ACME1234/site=example.com/Output-file-1 Is there something I can do so the path is always s3://some-bucket/pageviews/20120311/ACME1234/example.com/Output-file-1 Please help me

Re: doubt about locking mechanism in Hive

2014-09-09 Thread Edward Capriolo
We use our own library, simple constructions like files in hdfs that work like pid/lock files. a file like /flags/tablea/process1 could mean "hey i'm working on table a leave it alone". Accomplishes the exact same thing with less fuss, it is also much easier for an external process/scheduler/shell
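
A minimal sketch of the flag-file idea, assuming a local filesystem as a stand-in for HDFS. The class name, method names and temp-directory layout are hypothetical and are not the library mentioned in the post; a real version would go through the Hadoop filesystem API and would also need a policy for stale flags left behind by crashed processes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Local-filesystem stand-in for flag files such as /flags/tablea/process1.
final class FlagLock {
    /** Tries to claim the flag; returns false if it is already held. */
    static boolean tryAcquire(Path flag) {
        try {
            Files.createDirectories(flag.getParent());
            Files.createFile(flag); // existence check and creation are one atomic step
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Releases the flag so other processes may work on the table again. */
    static void release(Path flag) {
        try {
            Files.deleteIfExists(flag);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path flag = Paths.get(System.getProperty("java.io.tmpdir"),
                "flags-" + System.nanoTime(), "tablea", "process1");
        System.out.println(tryAcquire(flag)); // true: flag claimed
        System.out.println(tryAcquire(flag)); // false: already held
        release(flag);
    }
}
```

Because the flag is just a file, a scheduler or shell script can check or clear it without talking to Hive at all.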

Re: Dynamic Partitioning- Partition_Naming

2014-09-09 Thread Nitin Pawar
You cannot modify or rename the paths of partitions created by dynamic partitioning. Having column=value in the path is the default implementation for partitions. On Tue, Sep 9, 2014 at 5:18 AM, anusha Mangina wrote: > > I need a table partitioned by country and then city . I created

Re: UDTF "KryoException: unable create/find class" error in hive 0.13

2014-09-09 Thread Furcy Pin
Hi, I think I encountered this kind of serialization problem when writing UDFs. Usually, marking every field of the UDF as *transient* does the trick. I guess the error means that Kryo tries to serialize the UDF class and everything that is inside, and by marking them as transient you ensure tha
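
The effect of the transient keyword can be seen in a small stand-alone example. Plain Java serialization is used here as a stand-in for Kryo, which is an assumption made to keep the sketch free of dependencies; the class and field names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical class; java.io serialization stands in for Kryo here.
final class TransientDemo implements Serializable {
    private static final long serialVersionUID = 1L;

    String name = "fun_name";
    transient Map<String, String> cache = new HashMap<>(); // skipped when serializing

    /** Serializes the object to memory and reads it back. */
    static TransientDemo roundTrip(TransientDemo in) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(in);
            }
            try (ObjectInputStream is =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (TransientDemo) is.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        TransientDemo copy = roundTrip(new TransientDemo());
        System.out.println(copy.name);          // fun_name
        System.out.println(copy.cache == null); // true: the transient field was not restored
    }
}
```

The flip side is that a transient field comes back empty, so a UDF has to rebuild such state itself after deserialization, typically in its initialize method, before using it.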

Re: Nested types in ORC

2014-09-09 Thread Prasanth Jayachandran
Yes. It does now. Thanks Prasanth Jayachandran On Sep 9, 2014, at 12:30 AM, Abhishek Agarwal wrote: > Thanks Prasanth. Does it also mean that a query reading nested.k column will > invariably read nested.v as well even if nested.v column is not used in the > query? > > On Mon, Sep 8, 2014

Re: Nested types in ORC

2014-09-09 Thread Abhishek Agarwal
Thanks Prasanth. Does it also mean that a query reading nested.k column will invariably read nested.v as well even if nested.v column is not used in the query? On Mon, Sep 8, 2014 at 11:29 PM, Prasanth Jayachandran < pjayachand...@hortonworks.com> wrote: > Hi > > ORC stores nested fields as separ

Re: doubt about locking mechanism in Hive

2014-09-09 Thread wzc
Hi, We also encounter this in Hive 0.13. We need to enable concurrency in daily ETL workflows (to avoid a sub-ETL starting to read its parent ETL's output while it's still running). We found that in Hive 0.13, sometimes when you open the Hive CLI shell it outputs the message "conflicting lock present for defa