Re: hive load data from remote system
Thank you, Jone. I will try that.

On Tue, Mar 4, 2014 at 1:23 PM, Jone Lura wrote:
> Hi Vishnu,
>
> I have a similar scenario, and I came across the WebHDFS project, which is
> a REST service interface to the Hadoop file system.
>
> http://hadoop.apache.org/docs/r1.2.1/webhdfs.html
>
> I'm not sure if this makes it simpler than using FTP, but it is an
> alternative worth considering.
>
> Best regards,
> Jone
>
> On 04 Mar 2014, at 06:03, Vishnu Viswanath wrote:
>
> > Hi All,
> >
> > I have a scenario where I have to load data into a Hive table from a
> > CSV file. But the CSV file is on a remote system.
> >
> > I am planning to use FTP to transfer the file to the system where Hive
> > is running and load the data into the table. Is there a better way of
> > doing this?
> >
> > Regards
Re: hive load data from remote system
Hi Vishnu,

I have a similar scenario, and I came across the WebHDFS project, which is a REST service interface to the Hadoop file system.

http://hadoop.apache.org/docs/r1.2.1/webhdfs.html

I'm not sure if this makes it simpler than using FTP, but it is an alternative worth considering.

Best regards,
Jone

On 04 Mar 2014, at 06:03, Vishnu Viswanath wrote:
> Hi All,
>
> I have a scenario where I have to load data into a Hive table from a CSV file.
> But the CSV file is on a remote system.
>
> I am planning to use FTP to transfer the file to the system where Hive is
> running and load the data into the table. Is there a better way of doing
> this?
>
> Regards
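For what it's worth, the WebHDFS write path described in the r1.2.1 docs is a two-step PUT: the namenode answers the first request with a 307 redirect, and the second request (to the datanode URL from the Location header) carries the file body. A rough sketch with curl; hostnames, ports, paths and the user name are all placeholders:

```sh
# 1) Ask the namenode where to write; it replies with HTTP 307 and a
#    Location header pointing at a datanode (no file body sent yet).
curl -i -X PUT \
  "http://namenode.example.com:50070/webhdfs/v1/user/hive/staging/data.csv?op=CREATE&user.name=hive"

# 2) PUT the actual file to the datanode URL returned in Location.
curl -i -X PUT -T data.csv \
  "http://datanode.example.com:50075/webhdfs/v1/user/hive/staging/data.csv?op=CREATE&user.name=hive"
```

After step 2 the file is already in HDFS, so a plain LOAD DATA INPATH (without LOCAL) can move it into the table.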
HIVE QUERY HELP:: HOW TO IMPLEMENT THIS CASE
Hello All,

I have a use case in an RDBMS query which I have implemented in Hive as follows.

1.1) Update statement in the RDBMS:

update TABLE1
set Age = case when isnull(age,'') = '' then 'A=Current' else '240+ Days' end,
    Prev_Age = case when isnull(prev_age,'') = '' then 'A=Current' else '240+ Days' end;

1.2) The same statement in Hive:

create table TABLE2 as
select a.*,
       case when coalesce(a.age,'') = '' then 'A=Current' else '240+ Days' end as Age,
       case when coalesce(a.prev_age,'') = '' then 'A=Current' else '240+ Days' end as Prev_age
from TABLE1 a;

Now I have a case statement that involves a join condition.

2) Join-based update in the RDBMS:

update TABLE1
set a.Age = c.sys_buscd_item_desc1
from TABLE1 a
join TABLE2 c on c.sys_type_cd = 'AGE'
where isnull(a.age,'00') = c.sys_item;
commit;

How can I implement this query in Hive? Please help and suggest.

Thanks in advance,
Yogesh Kumar
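Not an authoritative answer, but since Hive at these versions has no UPDATE statement, one way to follow the same CTAS pattern as 1.2 is a left outer join against the filtered lookup rows. This is an untested sketch: only the column and table names come from the question above; everything else is a placeholder.

```sql
-- Sketch only: emulates the RDBMS join-update via CTAS + LEFT OUTER JOIN.
create table TABLE1_updated as
select
    a.col1, a.col2,                              -- list TABLE1's other columns explicitly
    coalesce(c.sys_buscd_item_desc1, a.age) as Age
from TABLE1 a
left outer join (
    select sys_item, sys_buscd_item_desc1
    from TABLE2
    where sys_type_cd = 'AGE'
) c
on coalesce(a.age, '00') = c.sys_item;
```

The coalesce keeps the original age for rows with no match in TABLE2, which mirrors the RDBMS join update (unmatched rows are left untouched). Listing TABLE1's columns explicitly avoids the duplicate Age column that a.* plus an alias would produce.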
Re: Hive hbase handler composite key - hbase full scan on key
https://issues.apache.org/jira/browse/HIVE-6411 covers exactly this case. The bad news is that it does not seem to be included even in 0.13.0, so you would have to implement your own predicate analyzer.

Thanks,
Navis

2014-03-03 20:52 GMT+09:00 Juraj jiv:
> Hello,
> I'm currently testing HBase integration in Hive. I want to use fast HBase
> key lookups from Hive, but my HBase key is composite.
> I found a way to create the table with the HBase key as a struct, which
> works fine:
>
> CREATE EXTERNAL TABLE table_tst(
>   key struct<...>,
>   ...)
> ROW FORMAT DELIMITED
> COLLECTION ITEMS TERMINATED BY '_'
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' ...
>
> But if I use this select in Hive:
>
> select * from table_tst where key.a = '1407273705';
>
> it takes about 860 seconds to print 2 records, so it does a full scan :/
>
> If I use a similar select from the Java HBase API:
>
> Scan scan = new Scan();
> scan.setStartRow("1407273705".getBytes());
> scan.setStopRow("1407273705~".getBytes());
>
> (Note: "~" is an end character for me since it has a high byte value; my
> composite key delimiter is "_".)
>
> this returns the 2 records in 2 seconds.
>
> How can I tell Hive to run a start/stop scan over this key.a value?
>
> JV
Re: Limited capabilities of a custom input format
You might be interested in https://issues.apache.org/jira/browse/HIVE-1662, which uses a predicate on the file-name virtual column to filter out inputs. For example:

select key, INPUT__FILE__NAME from srcbucket2
where INPUT__FILE__NAME rlike '.*/srcbucket2[03].txt'

But it's not committed yet.

Thanks,

2014-03-03 23:14 GMT+09:00 Petter von Dolwitz (Hem) <petter.von.dolw...@gmail.com>:
> Hi,
>
> I have implemented a few custom input formats in Hive. It seems like only
> the getRecordReader() method of these input formats is being called, i.e.
> there is no way of overriding the listStatus() method and providing a
> custom input filter. The only way I can set a file filter is through the
> mapred.input.pathFilter.class property, which leaves me using the same
> filter for all input formats. I would like a way to specify a filter per
> input format. Is there a way around this limitation?
>
> I am on Hive 0.10. I think I have seen that, when running jobs locally,
> the listStatus() method of my input formats is called, but not when
> handing the job over to a Hadoop cluster. It seems listStatus() is called
> on Hadoop's CombineFileInputFormat instead.
>
> Thanks,
> Petter
hive load data from remote system
Hi All,

I have a scenario where I have to load data into a Hive table from a CSV file, but the CSV file is on a remote system.

I am planning to use FTP to transfer the file to the system where Hive is running and then load the data into the table. Is there a better way of doing this?

Regards
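For the record, the transfer-then-load approach is only two commands once the transfer tool is chosen. A sketch where hostnames, paths and the table name are placeholders, with scp standing in for whatever transfer mechanism you use:

```sh
# Copy the CSV from the remote machine to the box where Hive runs.
scp user@remote.example.com:/exports/data.csv /tmp/data.csv

# LOCAL tells Hive to read from the local filesystem and copy the file
# into the table's HDFS location.
hive -e "LOAD DATA LOCAL INPATH '/tmp/data.csv' INTO TABLE my_table;"
```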
Re: [ANNOUNCE] New Hive PMC Member - Xuefu Zhang
Congrats Xuefu!

On Mar 1, 2014, at 8:29 PM, Navis류승우 wrote:
> Congratulations, Xuefu!
>
> 2014-03-01 14:38 GMT+09:00 Lefty Leverenz:
> Congrats Xuefu!
>
> -- Lefty
>
> On Fri, Feb 28, 2014 at 2:52 PM, Eric Hanson (BIG DATA) wrote:
> Congratulations Xuefu!
>
> -----Original Message-----
> From: Remus Rusanu [mailto:rem...@microsoft.com]
> Sent: Friday, February 28, 2014 11:43 AM
> To: d...@hive.apache.org; user@hive.apache.org
> Cc: Xuefu Zhang
> Subject: RE: [ANNOUNCE] New Hive PMC Member - Xuefu Zhang
>
> Grats!
>
> From: Prasanth Jayachandran
> Sent: Friday, February 28, 2014 9:11 PM
> To: d...@hive.apache.org
> Cc: user@hive.apache.org; Xuefu Zhang
> Subject: Re: [ANNOUNCE] New Hive PMC Member - Xuefu Zhang
>
> Congratulations Xuefu!
>
> Thanks
> Prasanth Jayachandran
>
> On Feb 28, 2014, at 11:04 AM, Vaibhav Gumashta wrote:
> > Congrats Xuefu!
> >
> > On Fri, Feb 28, 2014 at 9:20 AM, Prasad Mujumdar wrote:
> >> Congratulations Xuefu !!
> >>
> >> thanks
> >> Prasad
> >>
> >> On Fri, Feb 28, 2014 at 1:20 AM, Carl Steinbach wrote:
> >>> I am pleased to announce that Xuefu Zhang has been elected to the
> >>> Hive Project Management Committee. Please join me in congratulating
> >>> Xuefu!
> >>>
> >>> Thanks.
> >>>
> >>> Carl
> >
> > --
> > CONFIDENTIALITY NOTICE
> > NOTICE: This message is intended for the use of the individual or
> > entity to which it is addressed and may contain information that is
> > confidential, privileged and exempt from disclosure under applicable
> > law. If the reader of this message is not the intended recipient, you
> > are hereby notified that any printing, copying, dissemination,
> > distribution, disclosure or forwarding of this communication is
> > strictly prohibited. If you have received this communication in error,
> > please contact the sender immediately and delete it from your system.
> > Thank You.
HDFS Storage Locations and Hive
Given the direction HDFS is going with storage locations, as identified in https://issues.apache.org/jira/browse/HDFS-2832 and https://issues.apache.org/jira/secure/attachment/12597860/20130813-HeterogeneousStorage.pdf, is now the right time to toss out some suggestions for the Hive project on incorporating some of these features?

I don't see it being a hard thing: we could have new partitions created in a managed table targeted to a certain storage type (say SSD), and then a very simple built-in command that could be run to move partitions older than a certain date onto slower storage (say HDD). Basically, the ideal from a user/administrator perspective is managed tables with partition locations on various storage types, seamless to the user running a query.

Say I have 5 partitions:

day='2014-03-03'
day='2014-03-02'
day='2014-03-01'
day='2014-02-28'
day='2014-02-27'

March 2 and 3 would be assigned to SSD storage, and Feb 27 through March 1 would be on HDD. At some point in the early morning of March 4 (note that March 4 automatically goes onto SSD), a command could be run:

ALTER TABLE mytable SET part_location = /user/hive/fast_data/mytable/day='2014-02-27' WHERE day='2014-02-27'

This command would do a few things:

1. Ensure the new location doesn't exist (if it does, it will fail; perhaps this is controllable in the command, i.e. copy data and don't fail if the directory exists)
2. Create the new part_location
3. Copy the data from the old location to the new location (and verify this completes)
4. Update the metadata in Hive to point the managed partition location at the new location
5. Remove the old data/location

This would allow a simple command to do all the work of moving things. Yes, it would still be manual (i.e. Hive would not automatically age out older partitions), but that is something that should be managed by the admin anyhow. This allows a single command to do all the work, including moving the data and updating the metadata.
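For reference, steps 3 to 5 can already be approximated by hand with existing Hive syntax; the proposal above would just fold them into one safe command. A sketch, with the path as a placeholder:

```sql
-- ALTER ... SET LOCATION only changes metastore metadata, so the partition
-- directory must be copied first (e.g. with hadoop distcp) and the old
-- directory removed afterwards by hand.
ALTER TABLE mytable PARTITION (day='2014-02-27')
SET LOCATION 'hdfs:///user/hive/slow_data/mytable/day=2014-02-27';
```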
Heck, you could even add a feature here. Instead of just copying the files from the old location, the "archival" process could run an INSERT OVERWRITE-type command: if lots of smaller files had been appended to the original location, it would use MapReduce to reorganize them into larger files for better compression and storage. Maybe add a "WITH DEFRAG" option to the ALTER statement, where WITH DEFRAG triggers a MapReduce job rather than just a file copy. If this job fails, obviously the metadata isn't updated and the old data isn't deleted.

Thoughts?
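The DEFRAG idea can already be approximated with existing features: rewriting a partition through an INSERT OVERWRITE merges small output files when Hive's merge settings are on. A sketch, where the table, partition and column names are placeholders:

```sql
-- Merge small files produced at job end (existing settings).
SET hive.merge.mapfiles=true;
SET hive.merge.mapredfiles=true;

-- Rewrite the partition in place; Hive stages the output and swaps it in,
-- so the net effect is fewer, larger files in the partition directory.
INSERT OVERWRITE TABLE mytable PARTITION (day='2014-02-27')
SELECT col1, col2
FROM mytable
WHERE day='2014-02-27';
```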
Limited capabilities of a custom input format
Hi,

I have implemented a few custom input formats in Hive. It seems like only the getRecordReader() method of these input formats is being called, i.e. there is no way of overriding the listStatus() method and providing a custom input filter. The only way I can set a file filter is through the mapred.input.pathFilter.class property, which leaves me using the same filter for all input formats. I would like a way to specify a filter per input format. Is there a way around this limitation?

I am on Hive 0.10. I think I have seen that, when running jobs locally, the listStatus() method of my input formats is called, but not when handing the job over to a Hadoop cluster. It seems listStatus() is called on Hadoop's CombineFileInputFormat instead.

Thanks,
Petter
Hive hbase handler composite key - hbase full scan on key
Hello,

I'm currently testing HBase integration in Hive. I want to use fast HBase key lookups from Hive, but my HBase key is composite. I found a way to create the table with the HBase key as a struct, which works fine:

CREATE EXTERNAL TABLE table_tst(
  key struct<...>,
  ...)
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY '_'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' ...

But if I use this select in Hive:

select * from table_tst where key.a = '1407273705';

it takes about 860 seconds to print 2 records, so it does a full scan :/

If I use a similar select from the Java HBase API:

Scan scan = new Scan();
scan.setStartRow("1407273705".getBytes());
scan.setStopRow("1407273705~".getBytes());

(Note: "~" is an end character for me since it has a high byte value; my composite key delimiter is "_".)

this returns the 2 records in 2 seconds.

How can I tell Hive to run a start/stop scan over this key.a value?

JV
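A side note on the stop key: the "~" trick only works while every byte that can follow the prefix sorts below 0x7E. A more general stop row is obtained by incrementing the last non-0xFF byte of the prefix, which is the usual HBase prefix-scan construction. A small illustration in Python; the function name is mine:

```python
def prefix_stop_row(prefix: bytes) -> bytes:
    """Smallest row key strictly greater than every key starting with `prefix`.

    Returns b"" (meaning "scan to the end of the table") when no such key
    exists, i.e. the prefix is all 0xFF bytes.
    """
    p = bytearray(prefix)
    # Walk backwards, increment the last byte that can be incremented,
    # and drop everything after it.
    for i in reversed(range(len(p))):
        if p[i] != 0xFF:
            p[i] += 1
            return bytes(p[: i + 1])
    return b""

# Example matching the scan above:
start = b"1407273705"
stop = prefix_stop_row(start)  # b"1407273706"
```

The same start/stop pair could then be fed to Scan.setStartRow/setStopRow in the Java API, independent of which delimiter or byte values appear in the rest of the key.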