Diagnostic Messages for this Task:
Error: java.io.IOException: cannot find class
com.hadoop.mapred.DeprecatedLzoTextInputFormat
I added hadoop-lzoxx.jar to $HIVE_HOME/lib but it has no effect; it still shows
me the same exception. Can anyone tell me why, and what should I do?
Thanks for your help.
--
hi all, when I use Hive the job makes only one mapper, although my file splits
into 18 blocks: my block size is 128 MB and the data size is 2 GB.
I use LZO compression, created file.lzo, and made the index file.lzo.index.
I use Hive 0.10.0.
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set
Hi all ,
I want to join two tables
I have table_A:
id1  var1  var2
1    a     b
2    c     d
Table_B:
id2  var3  var4
3    e     f
4    g     h
Expected output is:
id1  var1  var2  id2  var3  var4
1    a     b     3    e     f
2    c     d     4    g     h
Thanks in advance.
--
--Regards
Sandeep Nemuri
We really should precompile the JSPs. There is a JIRA on this somewhere.
On Tuesday, August 20, 2013, Bing Li sarah.lib...@gmail.com wrote:
Hi, Eric et al
Did you resolve this failure?
I'm using Hive-0.11.0, and get the same error when accessing HWI via a
browser.
I already set the following
Hi -
I just started to get my feet wet with Hive and have a question that I have
not been able to find an answer to.
Suppose I have 2 CSV files:
cat Schema1.csv
Name, Address, Phone
Chris, address1, 999-999-
and
cat Schema2.csv
Id, Name, Address, Gender, Phone
13, Tom, address2, male,
hi igor,
lots of ideas there! I can't speak for them all but let me confirm first
that cluster by X into 1 bucket didn't work? I would have thought that
would have done it.
On Tue, Aug 20, 2013 at 2:29 PM, Igor Tatarinov i...@decide.com wrote:
What's the best way to enforce a single output
Sanjay gets some love after all! :)
On Tue, Aug 20, 2013 at 4:00 PM, Sanjay Subramanian
sanjay.subraman...@wizecommerce.com wrote:
Thanks Ashutosh
From: Ashutosh Chauhan hashut...@apache.org
Reply-To: user@hive.apache.org
I'm not sure if you'd call that a join. That just looks like two tables
side by side in some random order.
The only way to get that (that I can see) is if there is some kind of
function between the two ids in the two tables. That way you could join
on A.id1 = function(B.id2); otherwise the only
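To make that concrete, here is a hedged sketch of such a function-based join, using the hypothetical mapping id2 = id1 + 2 that happens to fit the sample rows (1→3, 2→4); table and column names come from the earlier post:

```sql
-- Hypothetical: only valid if B.id2 really is derivable from A.id1.
-- The sample data (1->3, 2->4) suggests id2 = id1 + 2.
SELECT a.id1, a.var1, a.var2, b.id2, b.var3, b.var4
FROM table_A a
JOIN Table_B b
  ON b.id2 = a.id1 + 2;
```

Without some such function there is no join key at all, which is the point being made above.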
yeah. database design is always subjective so everybody has an opinion
about it. but if you're just starting out i would recommend you kinda
follow the rules as you would in a traditional relational database system.
so two different datasets would mean two different tables in both Hive and
an Rdb
LZO files are combinable so check your max split setting.
http://mail-archives.apache.org/mod_mbox/hive-user/201107.mbox/%3c4e328964.7000...@gmail.com%3E
igor
decide.com
On Wed, Aug 21, 2013 at 2:17 AM, 闫昆 yankunhad...@gmail.com wrote:
hi all when i use hive
hive job make only one mapper
LZO files are only splittable if you index them. Sequence files compressed
with LZO are splittable without being indexed.
Snappy + SequenceFile is a better option than LZO.
On Wed, Aug 21, 2013 at 1:39 PM, Igor Tatarinov i...@decide.com wrote:
LZO files are combinable so check your max split
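A hedged sketch of what writing Snappy-compressed SequenceFiles from Hive could look like (property names are the Hadoop-1-era ones in use around Hive 0.10/0.11; verify against your version, and the source table name is illustrative):

```sql
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;

-- 'clicks' is a hypothetical source table
CREATE TABLE clicks_seq STORED AS SEQUENCEFILE AS
SELECT * FROM clicks;
```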
Hi Edward,
Could you please explain this?
Snappy + SequenceFile is a better option than LZO.
Thanks,
Pandeeswaran
—
Sent from Mailbox for iPad
On Wed, Aug 21, 2013
Using a single bucket per partition seems to create a single reducer which
is too slow.
I've tried enforcing small files merge but that didn't work. I still got
multiple output files.
Creating a temp table and then combining the multiple files into one
using a simple select * is the only option
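For reference, "enforcing small files merge" usually means settings like these (Hive 0.10/0.11 property names; the sizes are example values, and per the post above they did not help in this case):

```sql
SET hive.merge.mapfiles=true;        -- merge outputs of map-only jobs
SET hive.merge.mapredfiles=true;     -- merge outputs of map-reduce jobs
SET hive.merge.size.per.task=256000000;
SET hive.merge.smallfiles.avgsize=128000000;
```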
Can you provide details on what you want to do?
You may be able to express this by stacking queries: execute npath in a subquery
in the from clause and then do windowing in an outer select.
Also you get the 'path' object back from npath, so you can apply array indexing
on it.
regards,
Harish.
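A hedged sketch of the stacking Harish describes, reusing the npath invocation shape quoted elsewhere in this thread (column names and the windowing expression are illustrative, not from the original posts):

```sql
-- Illustrative only: run npath in a subquery, then window in the outer select
SELECT search_terms, productid,
       COUNT(*) OVER (PARTITION BY productid) AS hits
FROM (
  SELECT search_terms, productid
  FROM npath( ON clicks
              DISTRIBUTE BY sessionid SORT BY timestamp
              -- arg1..argN as in your npath call
            )
) sub;
```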
Some ideas to get u started
CREATE EXTERNAL TABLE IF NOT EXISTS names (
  fullname STRING,
  address STRING,
  phone STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
CREATE EXTERNAL TABLE IF NOT EXISTS names_detail (
  id BIGINT,
  fullname STRING,
  address STRING,
  gender STRING,
  phone STRING
) ROW FORMAT
I see. I'll have to punt then. However, there is an after-the-fact file
crusher Ed Capriolo wrote a while back here:
https://github.com/edwardcapriolo/filecrush YMMV
On Wed, Aug 21, 2013 at 11:12 AM, Igor Tatarinov i...@decide.com wrote:
Using a single bucket per partition seems to create a
Hi
Try this setting in your hive query
SET mapreduce.input.fileinputformat.split.maxsize=some bytes;
If u set this value low then the MR job will use this size to split the input
LZO files and u will get multiple mappers (and make sure the input LZO files
are indexed, i.e. .lzo.index files are
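For example, to aim for roughly one mapper per 128 MB block (134217728 bytes, matching the block size mentioned earlier in the thread):

```sql
SET mapreduce.input.fileinputformat.split.maxsize=134217728;
```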
Hi
I tried file crusher with LZO but it does not work... I have LZO correctly
configured in production and my jobs are running daily using LZO compression.
I like Crusher so I will see why it's not working... Thanks to Edward, the code
is there to tweak :-) and test locally
sanjay
From: Stephen
Assuming click stream type of data I want to get the search terms from the
first search request, and return the product id that was eventually viewed
and the number of clicks to the product. So something like this
select search_terms, productid, clicks_to_product from npath ( on clicks
Actually, using a temp table doesn't work either. Apparently, a single
mapper can read from multiple partitions (and output multiple files). There
is no way to force a single mapper per partition.
On Wed, Aug 21, 2013 at 11:12 AM, Igor Tatarinov i...@decide.com wrote:
Using a single bucket per
Can I also get the edit privilege for the wiki please?
I'd like to add some details about LDAP authentication.
Mikhail
2013/8/21 Stephen Sprague sprag...@gmail.com
Sanjay gets some love after all! :)
On Tue, Aug 20, 2013 at 4:00 PM, Sanjay Subramanian
sanjay.subraman...@wizecommerce.com
Not able to find this id in cwiki. Did you create an account on
cwiki.apache.org?
On Wed, Aug 21, 2013 at 2:59 PM, Mikhail Antonov olorinb...@gmail.com wrote:
mantonov
Can you try this:
select search_terms, productid, clicks_to_product from npath ( on clicks
distributed by sessionid sort by timestamp
arg1('SEARCH.NOTPRODUCT*.PRODUCT'),
arg2('SEARCH'), arg3(page = 'SEARCH'),
arg4('PRODUCT'),
I believe I tried that, both in the return argument and the outer query. If
memory serves me, I got an error about the array index needing to be a
constant value.
I will try again when I get back to a computer.
Sent from my iPhone
On Aug 21, 2013, at 6:48 PM, Harish Butani
In Hive I use SET mapreduce.input.fileinputformat.split.maxsize=134217728;
but it has no effect, and I found that when I use
LOAD DATA INPATH '/data_split/data_rowkey.lzo'
OVERWRITE INTO TABLE data_zh
the HDFS data is moved into the Hive directory of the table I created with
CREATE EXTERNAL TABLE, but the issue is that data_rowkey.lzo.index
is also
Hi all,
I executed a query but got an error. Does anyone know what happened? BTW, I use
the YARN framework.
2013-08-22 09:47:09,893 Stage-1 map = 28%, reduce = 1%, Cumulative CPU
4140.64 sec
2013-08-22 09:47:10,952 Stage-1 map = 28%, reduce = 1%, Cumulative CPU
4140.72 sec
2013-08-22 09:47:12,008 Stage-1 map =
hi all,
when I exec a Hive query it throws an exception as follows.
I don't know where the error log is; I found $HIVE_HOME/logs does not exist.
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 3
In order to change the average load for a reducer (in
By default, hive.log should exist in /tmp/user_name.
Also, it could be set in $HIVE_HOME/conf/hive-log4j.properties and
hive-exec-log4j.properties
- hive.log.dir
- hive.log.file
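For reference, the defaults in hive-log4j.properties look roughly like this (exact contents vary by release):

```
hive.log.dir=/tmp/${user.name}
hive.log.file=hive.log
```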
2013/8/22 闫昆 yankunhad...@gmail.com
hi all,
when I exec a Hive query it throws an exception as follows.
I don't know where
Create the LZO index after moving the file to the Hive directory (i.e. after
executing your LOAD DATA* statement). The index file is needed only during job
execution, and if it's not present in the same directory, the large file will
not be split.
On Thu, Aug 22, 2013 at 7:11 AM, 闫昆
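A sketch of that indexing step with the hadoop-lzo indexer (the jar path and warehouse path are illustrative; the class name is from the hadoop-lzo project):

```
hadoop jar /path/to/hadoop-lzo.jar \
  com.hadoop.compression.lzo.DistributedLzoIndexer \
  /user/hive/warehouse/data_zh/data_rowkey.lzo
```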
hive> select page_url,
  concat_ws('', map_keys(UNION_MAP(MAP(first_category, 'fcategory')))) as fcategorys,
  token,
  concat_ws('', map_keys(UNION_MAP(MAP(concat(original_category, ',', weight), 'dummy')))) as r
from media_visit_info group by page_url, token;
# jstat -gcutil 28409 1000 10
S0 S1 E
thanks all, I moved the LZO index to the Hive directory and it works fine.
thanks
2013/8/22 Rajesh Balamohan rajesh.balamo...@gmail.com
Create the LZO index after moving the file to hive directory (i.e after
executing your LOAD DATA* statement). Index file is needed only during job
execution and if its
hi all,
I exec:
hive (zh_site) select id,
longitude,
latitude,
month,
round(avg(sd), 2) as avgsd
from data_zh
where id between '5' and '6'
and month = 1
Good to hear that.
On Thu, Aug 22, 2013 at 9:02 AM, 闫昆 yankunhad...@gmail.com wrote:
thanks all, I moved the LZO index to the Hive directory and it works fine.
thanks
2013/8/22 Rajesh Balamohan rajesh.balamo...@gmail.com
Create the LZO index after moving the file to hive directory (i.e after
thanks Bing I found it
2013/8/22 Bing Li sarah.lib...@gmail.com
By default, hive.log should exist in /tmp/user_name.
Also, it could be set in $HIVE_HOME/conf/hive-log4j.properties and
hive-exec-log4j.properties
- hive.log.dir
- hive.log.file
2013/8/22 闫昆 yankunhad...@gmail.com
hi all