cannot find class com.hadoop.mapred.DeprecatedLzoTextInputFormat

2013-08-21 Thread 闫昆
Diagnostic Messages for this Task: Error: java.io.IOException: cannot find class com.hadoop.mapred.DeprecatedLzoTextInputFormat. I added hadoop-lzoxx.jar to $HIVE_HOME/lib, but it had no effect and I still get the same exception. Can anyone tell me why, and what I should do? Thanks for your help. -- In the

only one mapper

2013-08-21 Thread 闫昆
Hi all, when I use Hive, the job creates only one mapper, although my file is split into 18 blocks. My block size is 128MB and the data size is 2GB. I use LZO compression: I created file.lzo and built the index file.lzo.index. I use Hive 0.10.0. Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set

joining 2 tables

2013-08-21 Thread Sandeep Nemuri
Hi all, I want to join two tables. table_A (id1 var1 var2): 1 a b; 2 c d. Table_B (id2 var3 var4): 3 e f; 4 g h. Expected output (id1 var1 var2 id2 var3 var4): 1 a b 3 e f; 2 c d 4 g h. Thanks in advance. Regards, Sandeep Nemuri

Re: No java compiler available exception for HWI

2013-08-21 Thread Edward Capriolo
We really should precompile the JSPs. There is a JIRA on this somewhere. On Tuesday, August 20, 2013, Bing Li sarah.lib...@gmail.com wrote: Hi Eric et al., did you resolve this failure? I'm using Hive 0.11.0 and get the same error when accessing HWI via a browser. I already set the following

using hive with multiple schemas

2013-08-21 Thread Chris Driscol
Hi - I just started to get my feet wet with Hive and have a question that I have not been able to find an answer to. Suppose I have 2 CSV files: Schema1.csv (Name, Address, Phone; e.g. Chris, address1, 999-999-) and Schema2.csv (Id, Name, Address, Gender, Phone; e.g. 13, Tom, address2, male,

Re: single output file per partition?

2013-08-21 Thread Stephen Sprague
Hi Igor, lots of ideas there! I can't speak to all of them, but let me first confirm: clustering by X into 1 bucket didn't work? I would have thought that would do it. On Tue, Aug 20, 2013 at 2:29 PM, Igor Tatarinov i...@decide.com wrote: What's the best way to enforce a single output

Re: Last time request for cwiki update privileges

2013-08-21 Thread Stephen Sprague
Sanjay gets some love after all! :) On Tue, Aug 20, 2013 at 4:00 PM, Sanjay Subramanian sanjay.subraman...@wizecommerce.com wrote: Thanks Ashutosh From: Ashutosh Chauhan hashut...@apache.org Reply-To: user@hive.apache.org

Re: joining 2 tables

2013-08-21 Thread Stephen Sprague
I'm not sure you'd call that a join. That just looks like two tables side by side in some arbitrary order. The only way to get that (that I can see) is if there is some kind of function relating the ids in the two tables. That way you could join on A.id1 = function(B.id2); otherwise the only
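Stephen's point can be illustrated outside Hive. If some function maps each id2 onto an id1, the "side by side" output becomes an ordinary equi-join on A.id1 = f(B.id2). The following is a minimal Python sketch that uses the sample rows from Sandeep's message and assumes, purely hypothetically, that f(id2) = id2 - 2. The name join_on_function is illustrative and is not a Hive API.

```python
from collections import defaultdict


def join_on_function(table_a, table_b, f):
    """Hash-join rows of table_a and table_b where a[0] == f(b[0]).

    Rows are tuples whose first element is the id (id1 for A, id2 for B).
    """
    # Build side: index B rows by the derived key f(id2).
    by_key = defaultdict(list)
    for b in table_b:
        by_key[f(b[0])].append(b)
    # Probe side: emit A's columns followed by each matching B row.
    return [a + b for a in table_a for b in by_key.get(a[0], [])]


table_a = [(1, "a", "b"), (2, "c", "d")]  # id1, var1, var2
table_b = [(3, "e", "f"), (4, "g", "h")]  # id2, var3, var4

# Hypothetical relationship between the ids: id1 = id2 - 2.
for row in join_on_function(table_a, table_b, lambda id2: id2 - 2):
    print(*row)
```

Without such a relationship, no key pairs row 1 of A with row 1 of B, and the "expected output" depends only on row order.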

Re: using hive with multiple schemas

2013-08-21 Thread Stephen Sprague
Yeah. Database design is always subjective, so everybody has an opinion about it. But if you're just starting out, I would recommend you follow the rules as you would in a traditional relational database system. So two different datasets would mean two different tables in both Hive and an Rdb

Re: only one mapper

2013-08-21 Thread Igor Tatarinov
LZO files are combinable, so check your max split setting. http://mail-archives.apache.org/mod_mbox/hive-user/201107.mbox/%3c4e328964.7000...@gmail.com%3E igor decide.com On Wed, Aug 21, 2013 at 2:17 AM, 闫昆 yankunhad...@gmail.com wrote: hi all when i use hive hive job make only one mapper

Re: only one mapper

2013-08-21 Thread Edward Capriolo
LZO files are only splittable if you index them. Sequence files compressed with LZO are splittable without being indexed. Snappy + SequenceFile is a better option than LZO. On Wed, Aug 21, 2013 at 1:39 PM, Igor Tatarinov i...@decide.com wrote: LZO files are combinable so check your max split

Re: only one mapper

2013-08-21 Thread pandees waran
Hi Edward, could you please explain this? "Snappy + SequenceFile is a better option than LZO." Thanks, Pandeeswaran -- Sent from Mailbox for iPad On Wed, Aug 21, 2013

Re: single output file per partition?

2013-08-21 Thread Igor Tatarinov
Using a single bucket per partition seems to create a single reducer, which is too slow. I've tried enforcing the small-files merge, but that didn't work; I still got multiple output files. Creating a temp table and then combining the multiple files into one using a simple select * is the only option

Re: First/last in npath

2013-08-21 Thread Harish Butani
Can you provide details on what you want to do? You may be able to express this by stacking queries: execute npath in a subquery in the FROM clause and then do windowing in an outer select. Also, you get the 'path' object back from npath, so you can apply array indexing on it. Regards, Harish.

Re: using hive with multiple schemas

2013-08-21 Thread Sanjay Subramanian
Some ideas to get you started: CREATE EXTERNAL TABLE IF NOT EXISTS names(fullname STRING, address STRING, phone STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' CREATE EXTERNAL TABLE IF NOT EXISTS names_detail(id BIGINT, fullname STRING, address STRING, gender STRING, phone STRING) ROW FORMAT

Re: single output file per partition?

2013-08-21 Thread Stephen Sprague
I see. I'll have to punt then. However, there is an after-the-fact file crusher that Ed Capriolo wrote a while back: https://github.com/edwardcapriolo/filecrush YMMV On Wed, Aug 21, 2013 at 11:12 AM, Igor Tatarinov i...@decide.com wrote: Using a single bucket per partition seems to create a

Re: only one mapper

2013-08-21 Thread Sanjay Subramanian
Hi, try this setting in your Hive query: SET mapreduce.input.fileinputformat.split.maxsize=<some bytes>; If you set this value low, the MR job will use this size to split the input LZO files and you will get multiple mappers (and make sure the input LZO files are indexed, i.e. .LZO.INDEX files are
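The following is a rough Python model of how Hadoop's new-API FileInputFormat sizes splits. It shows why the maximum split size matters. The rule is splitSize = max(minSize, min(maxSize, blockSize)), and the last split may be up to 10% larger (SPLIT_SLOP = 1.1). This is a simplification. It ignores Hive's CombineHiveInputFormat, which can merge splits. Following Edward's note that LZO files are only splittable once indexed, it models an unindexed LZO file as one split. count_splits is a hypothetical helper, not a Hadoop API.

```python
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size


def compute_split_size(block_size, min_size, max_size):
    # Same formula as FileInputFormat: max(minSize, min(maxSize, blockSize)).
    return max(min_size, min(max_size, block_size))


def count_splits(file_size, block_size, min_size=1, max_size=2**63 - 1,
                 splittable=True):
    """Approximate how many input splits (and thus mappers) one file yields."""
    if not splittable:
        # e.g. an LZO file without a .lzo.index: one mapper reads it all.
        return 1
    split_size = compute_split_size(block_size, min_size, max_size)
    splits, remaining = 0, file_size
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    if remaining > 0:
        splits += 1  # the (possibly slightly larger) tail split
    return splits


MiB, GiB = 1024 ** 2, 1024 ** 3
print(count_splits(2 * GiB, 128 * MiB, splittable=False))   # 1
print(count_splits(2 * GiB, 128 * MiB))                     # 16
print(count_splits(2 * GiB, 128 * MiB, max_size=64 * MiB))  # 32
```

Under this model, lowering the max split size below the block size only adds mappers once the file is splittable, which for LZO means indexed.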

Re: single output file per partition?

2013-08-21 Thread Sanjay Subramanian
Hi, I tried File Crusher with LZO but it does not work. I have LZO correctly configured in production, and my jobs run daily using LZO compression. I like Crusher, so I will see why it's not working. Thanks to Edward, the code is there to tweak :-) and test locally. sanjay From: Stephen

Re: First/last in npath

2013-08-21 Thread Justin Workman
Assuming clickstream-type data, I want to get the search terms from the first search request, and return the product id that was eventually viewed and the number of clicks to the product. So something like this: select search_terms, productid, clicks_to_product from npath ( on clicks

Re: single output file per partition?

2013-08-21 Thread Igor Tatarinov
Actually, using a temp table doesn't work either. Apparently, a single mapper can read from multiple partitions (and output multiple files). There is no way to force a single mapper per partition. On Wed, Aug 21, 2013 at 11:12 AM, Igor Tatarinov i...@decide.com wrote: Using a single bucket per

Re: Last time request for cwiki update privileges

2013-08-21 Thread Mikhail Antonov
Can I also get edit privileges for the wiki, please? I'd like to add some details about LDAP authentication. Mikhail 2013/8/21 Stephen Sprague sprag...@gmail.com Sanjay gets some love after all! :) On Tue, Aug 20, 2013 at 4:00 PM, Sanjay Subramanian sanjay.subraman...@wizecommerce.com

Re: Last time request for cwiki update privileges

2013-08-21 Thread Ashutosh Chauhan
Not able to find this id in cwiki. Did you create an account on cwiki.apache.org? On Wed, Aug 21, 2013 at 2:59 PM, Mikhail Antonov olorinb...@gmail.com wrote: mantonov

Re: First/last in npath

2013-08-21 Thread Harish Butani
Can you try this: select search_terms, productid, clicks_to_product from npath ( on clicks distributed by sessionid sort by timestamp arg1('SEARCH.NOTPRODUCT*.PRODUCT'), arg2('SEARCH'), arg3(page = 'SEARCH'), arg4('PRODUCT'),
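npath matches a regular-expression-like pattern over symbols. Each symbol is a predicate on one row of the partition (here, distributed by sessionid and sorted by timestamp). The Python sketch below reimplements that idea for a single session, assuming the rows are already grouped and sorted. It is not Hive's npath implementation. Because everything from arg4 onward is truncated above, the symbol definitions are assumptions. SEARCH and PRODUCT test the page column, and NOTPRODUCT is taken to mean any page other than PRODUCT. The row layout (page, search_terms, productid) and the meaning of clicks_to_product (clicks after the first search) are also hypothetical.

```python
import re


def symbol(row):
    # Hypothetical symbol encoding: SEARCH -> "S", PRODUCT -> "P", else "N".
    if row["page"] == "SEARCH":
        return "S"
    if row["page"] == "PRODUCT":
        return "P"
    return "N"


# SEARCH.NOTPRODUCT*.PRODUCT, assuming NOTPRODUCT means page != 'PRODUCT',
# so a SEARCH row also satisfies NOTPRODUCT.
PATTERN = re.compile(r"S[SN]*P")


def first_search_to_product(session):
    """session: rows of one sessionid, already sorted by timestamp.

    Returns (search_terms, productid, clicks_to_product) for the leftmost
    match, or None if no PRODUCT page follows a search.
    """
    symbols = "".join(symbol(row) for row in session)
    match = PATTERN.search(symbols)
    if match is None:
        return None
    path = session[match.start():match.end()]
    # clicks_to_product is assumed to count the clicks after the first search.
    return path[0]["search_terms"], path[-1]["productid"], len(path) - 1


session = [
    {"page": "HOME"},
    {"page": "SEARCH", "search_terms": "shoes"},
    {"page": "CATEGORY"},
    {"page": "SEARCH", "search_terms": "red shoes"},
    {"page": "PRODUCT", "productid": 42},
]
print(first_search_to_product(session))  # ('shoes', 42, 3)
```

Using [SN]* rather than N* reflects the NOTPRODUCT assumption. If NOTPRODUCT excluded SEARCH pages, the pattern would be SN*P, and the match would start at the last search before the product instead of the first.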

Re: First/last in npath

2013-08-21 Thread Justin Workman
I believe I tried that, both in the return argument and the outer query. If memory serves me, I got an error about the array index needing to be a constant value. I will try again when I get back to a computer. Sent from my iPhone On Aug 21, 2013, at 6:48 PM, Harish Butani

Re: only one mapper

2013-08-21 Thread 闫昆
In Hive I used SET mapreduce.input.fileinputformat.split.maxsize=134217728; but it had no effect. I also found that when I use LOAD DATA INPATH '/data_split/data_rowkey.lzo' OVERWRITE INTO TABLE data_zh, the HDFS data is moved to the Hive directory. I created an EXTERNAL TABLE, but the issue is data_rowkey.lzo.index is also

execute hive query error Error: GC overhead limit exceeded

2013-08-21 Thread ch huang
Hi all: I executed a query but got an error. Does anyone know what happened? BTW, I use the YARN framework. 2013-08-22 09:47:09,893 Stage-1 map = 28%, reduce = 1%, Cumulative CPU 4140.64 sec 2013-08-22 09:47:10,952 Stage-1 map = 28%, reduce = 1%, Cumulative CPU 4140.72 sec 2013-08-22 09:47:12,008 Stage-1 map =

hive query error

2013-08-21 Thread 闫昆
Hi all, when I execute a Hive query it throws the exception below. I don't know where the error log is; I found that $HIVE_HOME/logs does not exist. Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks not specified. Estimated from input data size: 3 In order to change the average load for a reducer (in

Re: hive query error

2013-08-21 Thread Bing Li
By default, hive.log should exist in /tmp/user_name. Its location can also be set in $HIVE_HOME/conf/hive-log4j.properties and hive-exec-log4j.properties through the properties hive.log.dir and hive.log.file. 2013/8/22 闫昆 yankunhad...@gmail.com hi all when exec hive query throw exception as follow I donnot know where is
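The following is a small Python sketch for finding the log Bing describes. It starts from the default /tmp/<user>/hive.log and lets hive.log.dir or hive.log.file in hive-log4j.properties override it. It is an approximation. It parses only plain key=value lines, expands only ${user.name}, and does not read hive-exec-log4j.properties. Real log4j property files also allow other separators and variables. hive_log_path is a hypothetical helper.

```python
import getpass
import os

# Default described in the reply above: hive.log under /tmp/<user_name>.
DEFAULTS = {"hive.log.dir": "/tmp/${user.name}", "hive.log.file": "hive.log"}


def hive_log_path(conf_dir=None, user=None):
    """Best-effort guess at the hive.log location (simplified parsing)."""
    user = user or getpass.getuser()
    props = dict(DEFAULTS)
    if conf_dir:
        path = os.path.join(conf_dir, "hive-log4j.properties")
        if os.path.isfile(path):
            with open(path) as fh:
                for raw in fh:
                    line = raw.strip()
                    # Only plain key=value lines; skip blanks and comments.
                    if not line or line.startswith(("#", "!")) or "=" not in line:
                        continue
                    key, value = (part.strip() for part in line.split("=", 1))
                    if key in props:
                        props[key] = value

    def expand(value):
        # Only ${user.name} is substituted in this sketch.
        return value.replace("${user.name}", user)

    return os.path.join(expand(props["hive.log.dir"]), expand(props["hive.log.file"]))
```

For example, call hive_log_path(os.path.join(os.environ["HIVE_HOME"], "conf")) on the machine where the Hive CLI runs.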

Re: only one mapper

2013-08-21 Thread Rajesh Balamohan
Create the LZO index after moving the file to the Hive directory (i.e. after executing your LOAD DATA statement). The index file is needed only during job execution, and if it's not present in the same directory, the large file will not be split.
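Rajesh's point is that each .lzo.index file must sit next to its .lzo file in the directory the job actually reads. That is the table's directory after LOAD DATA has moved the data. The following is a minimal Python sketch that takes the file names from one directory and reports the .lzo files that lack that sibling index. For HDFS, the names could come from a listing such as hadoop fs -ls of the table directory. The index itself is built by the hadoop-lzo indexer (e.g. com.hadoop.compression.lzo.DistributedLzoIndexer). missing_lzo_indexes is a hypothetical helper.

```python
def missing_lzo_indexes(names):
    """Given file names from one directory, return .lzo files lacking an index.

    Splitting needs <file>.lzo.index next to <file>.lzo in the same
    directory; otherwise the whole file goes to a single mapper.
    """
    present = set(names)
    return sorted(
        name for name in present
        if name.endswith(".lzo") and name + ".index" not in present
    )
```

For example, missing_lzo_indexes(os.listdir(some_local_dir)) works for a local directory. An empty result means every .lzo file in that listing has its index alongside it.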

high gc time ,how do tuning?

2013-08-21 Thread ch huang
hive> select page_url,concat_ws('', map_keys(UNION_MAP(MAP(first_category,'fcategory' as fcategorys , token,concat_ws('', map_keys(UNION_MAP(MAP(concat(original_category,',',weight),'dummy' as r from media_visit_info group by page_url,token ; # jstat -gcutil 28409 1000 10 S0 S1 E

Re: only one mapper

2013-08-21 Thread 闫昆
Thanks all, I moved the LZO index to the Hive directory and it works fine. Thanks 2013/8/22 Rajesh Balamohan rajesh.balamo...@gmail.com Create the LZO index after moving the file to hive directory (i.e after executing your LOAD DATA* statement). Index file is needed only during job execution and if its

FileNotFoundException

2013-08-21 Thread 闫昆
Hi all, I executed: hive (zh_site)> select id, longitude, latitude, month, round(avg(sd), 2) as avgsd from data_zh where id between '5' and '6' and month = 1

Re: only one mapper

2013-08-21 Thread Rajesh Balamohan
Good to hear that. On Thu, Aug 22, 2013 at 9:02 AM, 闫昆 yankunhad...@gmail.com wrote: thanks all i move lzo index to hive directory is work fine . thanks 2013/8/22 Rajesh Balamohan rajesh.balamo...@gmail.com Create the LZO index after moving the file to hive directory (i.e after

Re: hive query error

2013-08-21 Thread 闫昆
Thanks Bing, I found it. 2013/8/22 Bing Li sarah.lib...@gmail.com By default, hive.log should exist in /tmp/user_name. Also, it could be set in $HIVE_HOME/conf/hive-log4j.properties and hive-exec-log4j.properties - hive.log.dir - hive.log.file 2013/8/22 闫昆 yankunhad...@gmail.com hi all