Hi,
Can we insert data into external Hive tables?
1) Create an external table
create external table binary_tbl_local(byt TINYINT, bl boolean, it int, lng
BIGINT, flt float, dbl double, shrt SMALLINT, str string) row format serde
'org.apache.hadoop.hive.contrib.serde2.TypedBytesSerDe' stored a
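On the original question: Hive can generally write to external tables with INSERT just as with managed ones (whether a given custom SerDe supports writing is a separate question). A minimal sketch, where 'some_source' is a hypothetical table with matching column types:

```sql
-- Sketch: populating an external table from an existing table.
INSERT OVERWRITE TABLE binary_tbl_local
SELECT byt, bl, it, lng, flt, dbl, shrt, str
FROM some_source;
```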
I obviously intended '1', not '0' as an example of a true value.
Keith Wiley kwi...@keithwiley.com keithwiley.com music.keithwiley.com
"The easy confidence with which I know another man's religion is folly
What string values in a csv field are parsable by Hive as booleans? If I
indicate that a column is of type boolean when wrapping an external table
around a csv file, what are the legal values? I can imagine numerous
possibilities, for example (for the true values):
0
t
T
true
True
TRUE
y
Y
yes
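For what it's worth, my understanding is that with Hive's default text SerDe (LazySimpleSerDe), only the strings TRUE and FALSE, case-insensitively, parse as booleans; anything else (1, 0, t, y, yes, ...) comes back as NULL. A quick way to probe this against your own file (table name and path are hypothetical):

```sql
-- Wrap a one-column CSV of candidate values and see what survives parsing.
CREATE EXTERNAL TABLE bool_probe (flag BOOLEAN)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/tmp/bool_probe';          -- hypothetical HDFS directory

-- Values that didn't parse show up under flag = NULL.
SELECT flag, count(*) FROM bool_probe GROUP BY flag;
```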
On Mar 13, 2012, at 13:57 , Igor Tatarinov wrote:
> You have attributevalue in quotes which makes it a constant literal.
>
> igor
> decide.com
Argh! You are correct good sir!
This syntax is wrong for both hive and SQL:
hive> select * from stringmap where attributename='foo' order by
'attributevalue';
This is right.
hive> select * from stringmap where attributename='foo' order by attributevalue;
On Tue, Mar 13, 2012 at 4:54 PM, Keith Wiley wrote:
> Um, this is weird.
Hi Keith,
Hive has 2 sort of (well, not really) similar clauses: sort by and order by.
For order by to work, you need to guarantee 1 reducer.
Read up more here:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy
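Roughly, using the query from earlier in the thread, the difference looks like this:

```sql
-- ORDER BY: total order, but Hive funnels everything through one reducer.
SELECT * FROM stringmap
WHERE attributename = 'foo'
ORDER BY attributevalue;     -- note: column name, not a quoted literal

-- SORT BY: sorts within each reducer only, so with N reducers you get
-- N locally sorted chunks, not one globally sorted result.
SELECT * FROM stringmap
WHERE attributename = 'foo'
SORT BY attributevalue;
```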
Mark
Mark Grover, Business Intelligence Analyst
OANDA Corporat
You have attributevalue in quotes which makes it a constant literal.
igor
decide.com
On Tue, Mar 13, 2012 at 1:54 PM, Keith Wiley wrote:
> Um, this is weird. It simply isn't modifying the order of the returned
> rows at all. I get the same result with no 'order by' clause as with one.
> Addi
Um, this is weird. It simply isn't modifying the order of the returned rows at
all. I get the same result with no 'order by' clause as with one. Adding a
limit or specifying 'asc' has no effect. Using 'sort by' also has no effect.
The column used for ordering is type INT. In the example be
Thanks for all the feedback on this. I'll take a look at all the suggestions I
received.
Cheers!
If you don't want to modify your CSV files, I would suggest doing the
conversion as part of the query. For that, you can either include the
conversion in each query, or you can create a view of your table that includes
a column with the converted date.
Either way, you may want to try from_unix
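Presumably that was heading toward unix_timestamp/from_unixtime. A sketch of the view approach, with hypothetical table and column names, assuming the raw dates look like 'MM/dd/yyyy':

```sql
-- View that adds a lexicographically sortable date column.
CREATE VIEW mytable_iso AS
SELECT t.*,
       from_unixtime(unix_timestamp(american_date, 'MM/dd/yyyy'),
                     'yyyy-MM-dd') AS iso_date
FROM mytable t;

SELECT * FROM mytable_iso ORDER BY iso_date;
```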
Okay, what are the differences and similarities with using a date function for
the same purpose? Is there a reason to go one route or the other?
On Mar 13, 2012, at 11:29 , Mark Grover wrote:
> Hi Keith,
> You should also consider writing your own UDF that takes in the date in
> "American" format
You could do something like that. However you can structure the table as:
CREATE TABLE X ( Map stuff)
CREATE TABLE X ( List stuff)
You can then define a view over these structures that allows you to
cherry-pick the fields you want.
Edward
On Tue, Mar 13, 2012 at 1:03 PM, Keith Wiley wrote:
> Wra
Hi Keith,
You should also consider writing your own UDF that takes in the date in
"American" format and spits out a lexicographical string.
That way you don't have to modify your base data, just use this newly created
from_american_date(String date) UDF to get your new date string in
Mark Grover,
I see how I misled you, sorry. I wasn't implying that my csv data is cleanly
represented in yyyy-MM-dd format. I was just asking syntactically how to use
date functions in HiveQL because I hadn't found any examples and I used
yyyy-MM-dd in my example. The dates in my csv tables are often in "
I'm a bit confused. It sounds like you're already storing your dates as
strings in a 'yyyy-MM-dd' format. In that case, you can just sort by
dateColName. There's no issue with using UNIX_TIMESTAMP() in the order by
clause, as it outputs integer values.
Most of the date functions in hive take
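For the inline version, something like this (names hypothetical; assumes the source strings actually match the pattern):

```sql
-- unix_timestamp returns a BIGINT, so the sort is numeric/chronological.
SELECT *
FROM mytable
ORDER BY unix_timestamp(dateColName, 'MM/dd/yyyy');
```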
Sounds like Matt possesses the proper combination of expertise in both
databases and MapReduce to assist you. I'm bowing out as I honestly don't know
advanced database concepts at all. In addition, hive offers hive-specific
tools like Matt suggested (map-side joins) to help out, which I'm too
Do the joins share the same key?
2012/3/13 Bruce Bian
> Yes, it's in my hive-default.xml and Hive figured to use one reducer only,
> so I thought increasing it to 5 might help, which it doesn't.
> Anyway, to scan the largest table 6 times isn't efficient, hence my
> question.
>
>
> On Wed, Mar 14, 2012
For theta joins, you'll have to convert the query to an equi-join, and then
filter for non-equality in the WHERE clause. Depending upon the size of each
table, you might consider looking at map-side joins, which will allow for doing
non-equality filters during a join before it's passed to the r
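A sketch of that rewrite, with hypothetical tables a and b sharing a join key k:

```sql
-- Theta join expressed as an equi-join plus a WHERE filter.
-- The MAPJOIN hint asks Hive to load b into the mappers
-- (only sensible if b is small enough to fit in memory).
SELECT /*+ MAPJOIN(b) */ a.id, b.id
FROM a JOIN b ON (a.k = b.k)
WHERE a.val > b.val;    -- the non-equality condition lives here
```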
Wrapping hive around existing csv files consists of manually naming and typing
every column during the creation command. I have several csv tables and some
of them have a ton of columns. I would love a way to create hive tables which
automatically infers the column types by attempting various
Hi Keith,
Do you know exactly how an algorithm should be structured in order to fit the
MapReduce framework? Could you refer me to some references?
Thanks and Regards,
Mahsa
On Tue, Mar 13, 2012 at 12:49 PM, Keith Wiley wrote:
> https://cwiki.apache.org/Hive/languagemanual-joins.html
>
> "Hive does
I see, you store the date-time as a lexicographically sortable string. That's
fine, but I'm operating on existing csv tables. I guess I could whip up a
hadoop job to convert all the date-time columns to lexicographic strings and
then wrap hive around the resulting converted tables. I was jus
Yes, it's in my hive-default.xml and Hive figured to use one reducer only,
so I thought increasing it to 5 might help, which it doesn't.
Anyway, to scan the largest table 6 times isn't efficient, hence my question.
On Wed, Mar 14, 2012 at 12:37 AM, Jagat wrote:
>
> Hello Weidong Bian
>
> Did you see the
Hi Keith,
We generally store date columns as a string in a similar format to ISO 8601
(yyyy-MM-dd HH:mm:ss). This way, when we put the date column in the ORDER BY
clause, it will be sorted chronologically. It also saves us the trouble of
whipping out a unix timestamp calculator to figure out
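So with that convention, plain string comparison is all ORDER BY needs (table and column names hypothetical):

```sql
-- 'yyyy-MM-dd HH:mm:ss' strings compare lexicographically in the same
-- order as the timestamps they represent.
SELECT * FROM events
ORDER BY event_time;
```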
https://cwiki.apache.org/Hive/languagemanual-joins.html
"Hive does not support join conditions that are not equality conditions as it
is very difficult to express such conditions as a map/reduce job."
I admit, that isn't a very detailed answer, but it gives some indication of the
reason for the
I realize that hive doesn't have a date type for the columns and I realize that
hive *does* have various date functions. I just haven't found a concrete
example of how these two issues are brought together. Ordering the results of
a sql query is done by adding "order by dateColName" to the que
Hello Weidong Bian
Did you see the following configuration properties in conf directory
  <property>
    <name>mapred.reduce.tasks</name>
    <value>-1</value>
    <description>The default number of reduce tasks per job. Typically set
    to a prime close to the number of available hosts. Ignored when
    mapred.job.tracker is "local". Hadoop set this t</description>
  </property>
To my knowledge this is all temporary data.
All the data related to your tables is stored on the location which you can
get with "desc formatted "
this is a temporary hive storage place. If you kill the job in between,
this data in /tmp/ is left as stale data
On Tue, Mar 13, 2012 at 11:35 AM, ha