We cannot insert data to external table right?

2012-03-13 Thread Lu, Wei
Hi , Can we insert data to external hive tables? 1)Create an external table create external table binary_tbl_local(byt TINYINT, bl boolean, it int, lng BIGINT, flt float, dbl double, shrt SMALLINT, str string) row format serde 'org.apache.hadoop.hive.contrib.serde2.TypedBtesSerDe' stored a

Re: csv boolean type

2012-03-13 Thread Keith Wiley
I obviously intended '1', not '0' as an example of a true value. Keith Wiley kwi...@keithwiley.com keithwiley.commusic.keithwiley.com "The easy confidence with which I know another man's religion is folly

csv boolean type

2012-03-13 Thread Keith Wiley
What string values in a csv field are parsable by Hive as booleans? If I indicate that a column is of type boolean when wrapping an external table around a csv file, what are the legal values? I can imagine numerous possibilities, for example (for the true values): 0 t T true True TRUE y Y yes

Re: order by having no effect?!

2012-03-13 Thread Keith Wiley
On Mar 13, 2012, at 13:57 , Igor Tatarinov wrote: > You have attributevalue in quotes which makes it a constant literal. > > igor > decide.com Argh! You are correct good sir! Keith Wiley kwi...@keithwiley.c

Re: order by having no effect?!

2012-03-13 Thread Edward Capriolo
This syntax is wrong for both hive and SQL: hive> select * from stringmap where attributename='foo' order by 'attributevalue'; This is right. hive> select * from stringmap where attributename='foo' order by attributevalue; On Tue, Mar 13, 2012 at 4:54 PM, Keith Wiley wrote: > Um, this is weird.

Re: order by having no effect?!

2012-03-13 Thread Mark Grover
Hi Keith, Hive has 2 sort of (well, not really) similar clauses: sort by and order by. For order by to work, you need to guarantee 1 reducer. Read up more here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy Mark Mark Grover, Business Intelligence Analyst OANDA Corporat

Re: order by having no effect?!

2012-03-13 Thread Igor Tatarinov
You have attributevalue in quotes which makes it a constant literal. igor decide.com On Tue, Mar 13, 2012 at 1:54 PM, Keith Wiley wrote: > Um, this is weird. It simply isn't modifying the order of the returned > rows at all. I get the same result with no 'order by' clause as with one. > Addi

order by having no effect?!

2012-03-13 Thread Keith Wiley
Um, this is weird. It simply isn't modifying the order of the returned rows at all. I get the same result with no 'order by' clause as with one. Adding a limit or specifying 'asc' has no effect. Using 'sort by' also has no effect. The column used for ordering is type INT. In the example be

Re: order by date

2012-03-13 Thread Keith Wiley
Thanks for all the feedback on this. I'll take a look at all the suggestions I received. Cheers! Keith Wiley kwi...@keithwiley.com keithwiley.commusic.keithwiley.com "Yet mark his perfect self-contentm

RE: order by date

2012-03-13 Thread Tucker, Matt
If you don't want to modify your CSV files, I would suggest doing the conversion as part of the query. For that, you can either include the conversion in each query, or you can create a view of your table that includes a column with the converted date. Either way, you may want to try from_unix

Re: order by date

2012-03-13 Thread Keith Wiley
Okay, what are the differences and similarities with use a date function for the same purpose? Is there a reason to go one route or the other? On Mar 13, 2012, at 11:29 , Mark Grover wrote: > Hi Keith, > You should also consider writing you own UDF that takes in the date in > "American" format

Re: How to import extremely "wide" csv tables

2012-03-13 Thread Edward Capriolo
You could do something like that. However you can structure the table as: CREATE TABLE X ( Map stuff) CREATE TABLE X ( List stuff) You can then define a viww over these structures that allow you to cherry pick the fields you want. Edward On Tue, Mar 13, 2012 at 1:03 PM, Keith Wiley wrote: > Wra

Re: order by date

2012-03-13 Thread Mark Grover
Hi Keith, You should also consider writing you own UDF that takes in the date in "American" format and spits out a lexicographical string. That way you don't have to modify your base data, just use this newly created from_american_date(String date) UDF to get your new date string in Mark Grover,

Re: order by date

2012-03-13 Thread Keith Wiley
I see how I misled you, sorry. I wasn't implying that my csv data is cleanly represented in -MM-dd format. I was just asking syntactically how to use date functions in HiveQL because I hadn't found any examples and I used -MM-dd in my example. The dates in my csv tables are often in "

RE: order by date

2012-03-13 Thread Tucker, Matt
I'm a bit confused. It sounds like you're already storing your dates as strings in a '-mm-dd' format. In that case, you can just sort by dateColName. There's no issue with using UNIX_TIMESTAMP() in the order by clause, as it outputs integer values. Most of the date functions in hive take

Re: non-equality joins

2012-03-13 Thread Keith Wiley
Sounds like Matt possesses the proper combination of expertise in both databases and MapReduce to assist you. I'm bowing out as I honestly don't know advanced database concepts at all. In addition, hive offers hive-specific tools like Matt suggested (map-side joins) to help out, which I'm too

Re: Reduce the number of map/reduce jobs during join

2012-03-13 Thread shule ney
Do the joins share the same key? 2012/3/13 Bruce Bian > Yes,it's in my hive-default.xml and Hive figured to use one reducer only, > so I thought increase it to 5 might help,which doesn't. > Anyway, to scan the largest table 6 times isn't efficient hence my > question. > > > On Wed, Mar 14, 2012

RE: non-equality joins

2012-03-13 Thread Tucker, Matt
For theta joins, you'll have to convert the query to an equi-join, and then filter for non-equality in the WHERE clause. Depending upon the size of each table, you might consider looking at map-side joins, which will allow for doing non-equality filters during a join before it's passed to the r

How to import extremely "wide" csv tables

2012-03-13 Thread Keith Wiley
Wrapping hive around existing csv files consists of manually naming and typing every column during the creation command. I have several csv tables and some of them have a ton of columns. I would love a way to create hive tables which automatically infers the column types by attempting various

Re: non-equality joins

2012-03-13 Thread mahsa mofidpoor
Hi Keith, Do you know exactly how an algorithm should be in order to fit in the MapReduce framework? Could you refer me to some references? Thanks and Regards, Mahsa On Tue, Mar 13, 2012 at 12:49 PM, Keith Wiley wrote: > https://cwiki.apache.org/Hive/languagemanual-joins.html > > "Hive does

Re: order by date

2012-03-13 Thread Keith Wiley
Is see, you store the date-time as a lexicographically sortable string. That's fine, but I'm operating on existing csv tables. I guess I could whip up a hadoop job to convert all the date-time columns to lexicographic strings and then wrap hive around the resulting converted tables. I was jus

Reduce the number of map/reduce jobs during join

2012-03-13 Thread Bruce Bian
Yes,it's in my hive-default.xml and Hive figured to use one reducer only, so I thought increase it to 5 might help,which doesn't. Anyway, to scan the largest table 6 times isn't efficient hence my question. On Wed, Mar 14, 2012 at 12:37 AM, Jagat wrote: > > Hello Weidong Bian > > Did you see the

RE: order by date

2012-03-13 Thread Tucker, Matt
Hi Keith, We generally store date columns as a string in a similar format to ISO 8601 (-mm-dd hh:MM:ss). This way, when we put the date column in the ORDER BY clause, it will be sorted chronologically. It also saves us the trouble of whipping out a unix timestamp calculator to figure out

Re: non-equality joins

2012-03-13 Thread Keith Wiley
https://cwiki.apache.org/Hive/languagemanual-joins.html "Hive does not support join conditions that are not equality conditions as it is very difficult to express such conditions as a map/reduce job." I admit, that isn't a very detailed answer, but it gives some indication of the reason for the

order by date

2012-03-13 Thread Keith Wiley
I realize that hive doesn't have a date type for the columns and I realize that hive *does* have various date functions. I just haven't found a concrete example of how these two issues are brought together. Ordering the results of a sql query is done by adding "order by dateColName" to the que

Re: Reduce the number of map/reduce jobs during join

2012-03-13 Thread Jagat
Hello Weidong Bian Did you see the following configuration properties in conf directory mapred.reduce.tasks -1 The default number of reduce tasks per job. Typically set to a prime close to the number of available hosts. Ignored when mapred.job.tracker is "local". Hadoop set this t

Re: need some Clarification about tmp folder(urgent)

2012-03-13 Thread Nitin Pawar
to my knowledge this is all temporary data. All the data related to your tables is stored on the location which you can get with "desc formatted " this is a temporary hive storage place. If you kill the job in between, this data in /tmp/ is left as stale data On Tue, Mar 13, 2012 at 11:35 AM, ha