Re: Hive insert into RCFILE issue with timestamp columns

2013-03-06 Thread Prasad Mujumdar
Dilip, Looks like you are using the data from the original schema for this new table that has a single timestamp column. When I tried with just the timestamp from your data, the query ran fine. I guess the original issue you hit was on the data that didn't have a fractional part (1969-12-31 19:00:00,

Re: Hive insert into RCFILE issue with timestamp columns

2013-03-06 Thread Sékine Coulibaly
Prasad, Isn't the fractional part of the TIMESTAMP type supposed to be optional, as per the error message: Failed with exception java.io.IOException:java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff] Shall we understand that 9 digits for the fractional part are
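A minimal sketch of the format in question, assuming a throwaway table with a single TIMESTAMP column (the table and column names are illustrative, not from the thread):

    -- Per the documented format yyyy-mm-dd hh:mm:ss[.fffffffff],
    -- the fractional part (up to 9 digits) should be optional:
    CREATE TABLE ts_test (ts TIMESTAMP) STORED AS RCFILE;
    SELECT CAST('1969-12-31 19:00:00' AS TIMESTAMP),
           CAST('1969-12-31 19:00:00.123456789' AS TIMESTAMP)
    FROM ts_test;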

Re: Read map value from a table

2013-03-06 Thread Sai Sai
Here is my data in a file which I have successfully loaded into a table test, and I can successfully get the data back with: Select * from test;
Name     ph     category
Name1    ph1    {type:1000,color:200,shape:610}
Name2    ph2    {type:2000,color:200,shape:150}
Name3    ph3
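If the category column really is a Hive map (say, map<string,int>), individual values can be pulled out with bracket syntax; a sketch assuming those names and types (they are not confirmed in the thread):

    -- Assuming something like:
    -- CREATE TABLE test (name STRING, ph STRING, category MAP<STRING,INT>)
    --   ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    --   COLLECTION ITEMS TERMINATED BY ',' MAP KEYS TERMINATED BY ':';
    SELECT name, category['type'], category['color'], category['shape']
    FROM test;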

Oozie - using with datastax hadoop - cassandra file system

2013-03-06 Thread shreehari padaki
Hi All, We are using DataStax for Hadoop with Cassandra, and now we are trying to run a job through Oozie, but while running the workflow job we are getting the below error:
java.io.IOException: No FileSystem for scheme: cfs
We have added the property <property><name>oozie.filesystems.supported</name>

Re: Oozie - using with datastax hadoop - cassandra file system

2013-03-06 Thread Viral Bajaria
Though I'd love to hear the rationale behind using the DataStax Hadoop, we can do that off the list (I will email you separately for that). But this list is for Hive-related questions, and since the error is in Oozie you will be better off asking this question on the Oozie mailing list. -Viral On

Re: Best table storage for analytical use case

2013-03-06 Thread Sékine Coulibaly
Hi Dean, Indeed, switching from RCFiles to SequenceFiles brought query duration down 35% (82 secs down to 53 secs)! I also added Snappy/Gzip block compression. Things are getting better, down to 30 secs (SequenceFile + Snappy). Yes, most requests have a WHERE clause with a time range, will have
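For reference, a hedged sketch of the kind of session settings that produce block-compressed SequenceFile output (the property names are the Hadoop-1-era ones; the table names are made up):

    SET hive.exec.compress.output=true;
    SET mapred.output.compression.type=BLOCK;
    SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;

    CREATE TABLE events_seq STORED AS SEQUENCEFILE
    AS SELECT * FROM events_rc;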

Re: Best table storage for analytical use case

2013-03-06 Thread Dean Wampler
MapReduce is very coarse-grained. It might seem that more cores is better, but once the data sizes get well below the block threshold in size, the overhead of starting JVM processes and all the other background work becomes a significant percentage of the overall runtime. So, you quickly reach the

Re: Where is the location of hive queries

2013-03-06 Thread Sai Sai
After we run a query in the hive shell such as: Select * from myTable; are these results saved to any file apart from the console/terminal display? If so, where is the location of the results? Thanks, Sai

Re: Where is the location of hive queries

2013-03-06 Thread Nitin Pawar
The results are not stored to any file; they are available on the console only. If you want to save the results, then execute your query like: hive -e 'query' > file On Wed, Mar 6, 2013 at 9:32 PM, Sai Sai saigr...@yahoo.in wrote: After we run a query in hive shell as: Select * from
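In other words, run the query non-interactively and redirect stdout to a file; a minimal sketch (the file names are arbitrary):

    hive -e 'SELECT * FROM myTable;' > /tmp/results.txt
    # or keep the query in a script file and redirect the same way
    hive -f myquery.hql > /tmp/results.txt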

Re: Data mismatch when importing data from Oracle to Hive through Sqoop without an error

2013-03-06 Thread Venkat Ranganathan
Hi Ajit, Do you know if the rest of the columns are also null when the three non-null columns are null? Venkat On Wed, Mar 6, 2013 at 12:35 AM, Ajit Kumar Shreevastava ajit.shreevast...@hcl.com wrote: Hi Abhijeet, Thanks for your response. If values that don't fit in double must be getting

Re: Where is the location of hive queries

2013-03-06 Thread Dean Wampler
Or use a variant of the INSERT statement to write to a directory or a table. On Wed, Mar 6, 2013 at 10:05 AM, Nitin Pawar nitinpawar...@gmail.com wrote: The results are not stored to any file; they are available on the console only. If you want to save the results, then execute your
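A sketch of the two INSERT variants Dean mentions (the directory path and table names are illustrative, not from the thread):

    -- Write query results to a local directory (drop LOCAL to write into HDFS)
    INSERT OVERWRITE LOCAL DIRECTORY '/tmp/mytable_dump'
    SELECT * FROM myTable;

    -- Or materialize the results into another (pre-existing) table
    INSERT OVERWRITE TABLE myTable_results
    SELECT * FROM myTable;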

Re: Data mismatch when importing data from Oracle to Hive through Sqoop without an error

2013-03-06 Thread Jarek Jarcec Cecho
Hi Ajit, I've seen a similar issue many times. Does your table have textual data? If so, can it happen that your textual data contains Hive delimiters like newline characters? Because if so, then Sqoop might create two lines for one single row in the table, which will consequently be seen as two
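If embedded newlines do turn out to be the cause, one common mitigation is to strip or replace Hive delimiters at import time; a hedged sketch of a Sqoop invocation (the connection string, user, and table are placeholders):

    sqoop import \
      --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
      --username scott -P \
      --table MYTABLE \
      --hive-import \
      --hive-drop-import-delims   # or: --hive-delims-replacement ' '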

Combine two overlapping schema?

2013-03-06 Thread Keith Wiley
I have two tables which have overlapping but nonidentical schemas. I want to create a new table that unions them, leaving nulls in any given row where a column name doesn't occur in the other table:
SCHEMA 1: { a, b, c, Y }  row: { 1, 2, 3, 4 }
SCHEMA 2: { a, b, c, Z }  row: { 5, 6,

Re: Combine two overlapping schema?

2013-03-06 Thread Dean Wampler
Off the top of my head, I think UNION ALL should work if you explicitly project out the missing columns with NULL or other values, e.g. using nested SELECTs, something like SELECT * FROM ( SELECT a, b, c, Y, NULL AS Z FROM table1 UNION ALL SELECT a, b, c, NULL AS Y, Z FROM table2 ) table12; On
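Spelled out a little more readably, the same idea (projecting NULL for whichever column each table lacks) looks roughly like this; column and table names follow Keith's example. Depending on the Hive version, the NULL columns may need an explicit CAST so both branches of the UNION ALL agree on types:

    SELECT * FROM (
      SELECT a, b, c, Y, NULL AS Z FROM table1
      UNION ALL
      SELECT a, b, c, NULL AS Y, Z FROM table2
    ) table12;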

Re: Combine two overlapping schema?

2013-03-06 Thread Keith Wiley
Ah. I was stuck on the requirement that the two schemas match, but I see your point. I'll see if that works. On Mar 6, 2013, at 10:11, Dean Wampler wrote: Off the top of my head, I think UNION ALL should work if you explicitly project out the missing columns with NULL or other values, e.g.

Hadoop cluster hangs on big hive job

2013-03-06 Thread Daning Wang
We have a 5-node cluster (Hadoop 1.0.4). It hung a couple of times while running big Hive jobs (hive-0.8.1). Basically all the nodes are dead; from the tasktracker's log it looks like it went into some kind of loop forever. All the log entries look like this when the problem happened. Any idea how to debug the

RE: Hadoop cluster hangs on big hive job

2013-03-06 Thread Chalcy Raja
You could try breaking up the hive query to return smaller datasets. I have noticed this behavior when the hive query has 'in' in the where clause. Thanks, Chalcy From: Daning Wang [mailto:dan...@netseer.com] Sent: Wednesday, March 06, 2013 3:08 PM To: user@hive.apache.org Subject: Hadoop cluster
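One hedged way to break such a query up, assuming a very long IN (...) list is the culprit (the table and column names here are made up for illustration):

    -- Instead of: SELECT ... FROM events WHERE user_id IN (1, 2, 3, ... thousands of ids ...)
    -- load the wanted keys into a small table and join against it:
    CREATE TABLE wanted_ids (user_id BIGINT);
    -- LOAD DATA ... INTO TABLE wanted_ids;
    SELECT e.*
    FROM events e
    JOIN wanted_ids w ON (e.user_id = w.user_id);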

Re: Error while exporting table data from hive to Oracle through Sqoop

2013-03-06 Thread Jarek Jarcec Cecho
Hi Ajit, would you mind upgrading to Sqoop 1.4.3 RC 0 [1]? It has already been voted to be released as the final 1.4.3, so it should be safe to use. One of the improvements in 1.4.3 is SQOOP-720 [2], which significantly improves the error message in this scenario. Jarcec Links: 1:

Re: Hadoop cluster hangs on big hive job

2013-03-06 Thread Daning Wang
Thanks Chalcy! But the Hadoop cluster should not hang in any case; is that a bug? On Wed, Mar 6, 2013 at 12:33 PM, Chalcy Raja chalcy.r...@careerbuilder.com wrote: You could try breaking up the hive query to return smaller datasets. I have noticed this behavior when the hive query has 'in' in

Re: Variable Substitution

2013-03-06 Thread Dean Wampler
Even newer versions of Hive do this. Any reason you don't want to provide a definition for all of them? You could argue that an undefined variable is a bug and leaving the literal text in place makes it easier to notice. Although, Unix shells would insert an empty string, so never mind ;) On Wed,
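For concreteness, a small sketch of the behavior being discussed, assuming a script run as hive --hiveconf var1=2013-03-06 -f query.hql (the variable and table names are invented):

    SELECT * FROM myTable
    WHERE dt   = '${hiveconf:var1}'    -- defined: substituted with 2013-03-06
      AND note = '${hiveconf:var2}';   -- undefined: the literal ${hiveconf:var2} text is left in the query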

Re: Variable Substitution

2013-03-06 Thread Edward Capriolo
It was done like this in Hive because that is what Hadoop's variable substitution does, namely: if it does not understand the variable, it does not replace it. On Wed, Mar 6, 2013 at 4:30 PM, Dean Wampler dean.wamp...@thinkbiganalytics.com wrote: Even newer versions of Hive do this. Any reason

Re: Variable Substitution

2013-03-06 Thread Matt Tucker
I'm fine with the variable placeholder not being removed in cases where the variable is not defined (until I change my mind). When I define var2 and var3, though, their placeholders aren't swapped for their values. My reasoning for this was that I'm moving from one execution script that

RE: Hadoop cluster hangs on big hive job

2013-03-06 Thread Chalcy Raja
In my case, it was not a bug. The temp data was filling up the data space and it appeared to be hanging, but the last reducer job was still running, trying to move data. Once there is absolutely no space for data, the cluster goes into safe mode and it hangs. In my case it did not get to the
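A couple of hedged commands (Hadoop 1.x flavor) for checking whether disk space or safe mode is the issue while a job appears hung:

    hadoop dfsadmin -report         # per-datanode capacity, used and remaining space
    hadoop dfsadmin -safemode get   # whether the namenode has entered safe mode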