Re: hive load data from remote system

2014-03-03 Thread Vishnu Viswanath
Thank you Jone. I will try that.


On Tue, Mar 4, 2014 at 1:23 PM, Jone Lura wrote:

> Hi Vishnu,
>
> I have a similar scenario, and I came across the WebHDFS project, which is a
> REST service interface to the Hadoop file system.
>
> http://hadoop.apache.org/docs/r1.2.1/webhdfs.html
>
> I'm not sure whether it is simpler than using FTP, but it is an alternative
> worth considering.
>
> Best regards,
>
> Jone
>
>
> On 04 Mar 2014, at 06:03, Vishnu Viswanath wrote:
>
> > Hi All,
> >
> > I have a scenario where I have to load data into a Hive table from a CSV
> > file. But the CSV file is on a remote system.
> >
> > I am planning to use FTP to transfer the file to the system where Hive
> > is running and then load the data into the table. Is there a better way
> > of doing this?
> >
> > Regards
>
>


Re: hive load data from remote system

2014-03-03 Thread Jone Lura
Hi Vishnu,

I have a similar scenario, and I came across the WebHDFS project, which is a
REST service interface to the Hadoop file system.

http://hadoop.apache.org/docs/r1.2.1/webhdfs.html

I'm not sure whether it is simpler than using FTP, but it is an alternative
worth considering.
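
For illustration, the upload step could look like this, using the two-step
CREATE flow from the linked docs (host names, ports, and paths below are
hypothetical):

  # Step 1: ask the NameNode where to write; it answers with a 307 redirect
  # whose Location header points at a DataNode.
  curl -i -X PUT "http://namenode:50070/webhdfs/v1/user/hive/staging/data.csv?op=CREATE"

  # Step 2: PUT the file body to the DataNode URL from the Location header.
  curl -i -X PUT -T data.csv "http://datanode:50075/webhdfs/v1/user/hive/staging/data.csv?op=CREATE"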

Best regards,

Jone


On 04 Mar 2014, at 06:03, Vishnu Viswanath wrote:

> Hi All,
> 
> I have a scenario where I have to load data into a Hive table from a CSV file.
> But the CSV file is on a remote system.
>
> I am planning to use FTP to transfer the file to the system where Hive is
> running and then load the data into the table. Is there a better way of
> doing this?
> 
> Regards 



HIVE QUERY HELP:: HOW TO IMPLEMENT THIS CASE

2014-03-03 Thread yogesh dhari
Hello All,

I have a use case written as an RDBMS query, which I have implemented in
Hive as follows.



*1.1) Update statement in RDBMS*

update TABLE1
set
  Age      = case when isnull(age, '')      = '' then 'A=Current' else '240+ Days' end,
  Prev_Age = case when isnull(prev_age, '') = '' then 'A=Current' else '240+ Days' end;

*1.2) Equivalent statement in Hive*

create table TABLE2 as
select
  a.*,
  case when coalesce(a.age, '')      = '' then 'A=Current' else '240+ Days' end as Age,
  case when coalesce(a.prev_age, '') = '' then 'A=Current' else '240+ Days' end as Prev_age
from TABLE1 a;





*Now I have a case statement that involves a join condition.*



*2) Join in RDBMS*
update  TABLE1
set a.Age = c.sys_buscd_item_desc1
from  TABLE1 a
join  TABLE2 c
on c.sys_type_cd='AGE'
where isnull(a.age,'00')=c.sys_item;
commit;





How can I implement this query in Hive? Please help and suggest.
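
One possible translation, following the same CTAS pattern as 1.2 (a sketch
only: table1_updated and the id column stand in for the real target name and
the remaining TABLE1 columns, and it assumes sys_buscd_item_desc1 is
non-null for matching rows):

create table table1_updated as
select
  a.id,                                           -- carry the other TABLE1 columns through here
  coalesce(c.sys_buscd_item_desc1, a.age) as age  -- keep the old value when there is no match
from
  (select t.*, coalesce(t.age, '00') as age_key from TABLE1 t) a
left outer join
  (select sys_item, sys_buscd_item_desc1 from TABLE2
    where sys_type_cd = 'AGE') c
on (a.age_key = c.sys_item);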



Thanks In Advance

Yogesh Kumar


Re: Hive hbase handler composite key - hbase full scan on key

2014-03-03 Thread Navis류승우
https://issues.apache.org/jira/browse/HIVE-6411 is exactly for such cases.

The bad news is that it does not seem to have made it even into 0.13.0, so
you would have to implement your own predicate analyzer.

Thanks,
Navis


2014-03-03 20:52 GMT+09:00 Juraj jiv:

> Hello,
> I'm currently testing HBase integration with Hive. I want to use fast HBase
> key lookups from Hive, but my HBase key is composite.
> I found a way to create the table with the HBase key as a struct, which
> works fine:
>
> CREATE EXTERNAL TABLE table_tst(
> key struct<a:string, ...>,
> ROW FORMAT DELIMITED
> COLLECTION ITEMS TERMINATED BY '_'
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' ...
>
> But if I use this select in Hive:
> select * from table_tst where key.a = '1407273705';
> it takes about 860 seconds to print 2 records, so it does a full scan :/
>
> If I use a similar select through the Java HBase API:
> Scan scan = new Scan();
> scan.setStartRow("1407273705".getBytes());
> scan.setStopRow("1407273705~".getBytes());
>
> Note: "~" is an end character for me - it has a high byte value; my
> composite key delimiter is "_".
> This returns 2 records in 2 seconds.
>
> How can I tell Hive to run a start/stop row scan over this key.a value?
>
> JV
>


Re: Limited capabilities of a custom input format

2014-03-03 Thread Navis류승우
You might be interested in https://issues.apache.org/jira/browse/HIVE-1662,
which uses a predicate on the file-name virtual column to filter out inputs.
For example,

select key,INPUT__FILE__NAME from srcbucket2 where INPUT__FILE__NAME rlike
'.*/srcbucket2[03].txt'

But it's not committed yet.

Thanks,



2014-03-03 23:14 GMT+09:00 Petter von Dolwitz (Hem) <petter.von.dolw...@gmail.com>:

> Hi,
>
> I have implemented a few custom input formats in Hive. It seems like only
> the getRecordReader() method of these input formats is being called,
> though, i.e. there is no way to override the listStatus() method and
> provide a custom input filter. The only way I can set a file filter is via
> the mapred.input.pathFilter.class property, which leaves me using the same
> filter for all input formats. I would like a way to specify a filter per
> input format. Is there a way around this limitation?
>
> I am on Hive 0.10. I think I have seen that, when running jobs locally,
> the listStatus() method of my input formats is called, but not when
> handing the job over to a Hadoop cluster. It seems listStatus() is called
> on Hadoop's CombineFileInputFormat instead.
>
> Thanks,
> Petter
>


hive load data from remote system

2014-03-03 Thread Vishnu Viswanath
Hi All,

I have a scenario where I have to load data into a Hive table from a CSV
file. But the CSV file is on a remote system.

I am planning to use FTP to transfer the file to the system where Hive is
running and then load the data into the table. Is there a better way of
doing this?

Regards
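
For reference, once the CSV is on the machine running Hive (or already in
HDFS), the load itself is a one-liner; a minimal sketch, with the table
layout and paths hypothetical:

create table my_csv_table (col1 string, col2 string)
row format delimited fields terminated by ',';

-- from the local filesystem of the machine running Hive:
load data local inpath '/tmp/data.csv' into table my_csv_table;

-- or, if the file was pushed straight into HDFS:
-- load data inpath '/user/hive/staging/data.csv' into table my_csv_table;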


Re: [ANNOUNCE] New Hive PMC Member - Xuefu Zhang

2014-03-03 Thread Vikram Dixit
Congrats Xuefu!

On Mar 1, 2014, at 8:29 PM, Navis류승우 wrote:

> Congratulations, Xuefu!
> 
> 
> 2014-03-01 14:38 GMT+09:00 Lefty Leverenz:
> Congrats Xuefu!
> 
> -- Lefty
> 
> 
> On Fri, Feb 28, 2014 at 2:52 PM, Eric Hanson (BIG DATA) wrote:
> Congratulations Xuefu!
> 
> -Original Message-
> From: Remus Rusanu [mailto:rem...@microsoft.com]
> Sent: Friday, February 28, 2014 11:43 AM
> To: d...@hive.apache.org; user@hive.apache.org
> Cc: Xuefu Zhang
> Subject: RE: [ANNOUNCE] New Hive PMC Member - Xuefu Zhang
> 
> Grats!
> 
> From: Prasanth Jayachandran 
> Sent: Friday, February 28, 2014 9:11 PM
> To: d...@hive.apache.org
> Cc: user@hive.apache.org; Xuefu Zhang
> Subject: Re: [ANNOUNCE] New Hive PMC Member - Xuefu Zhang
> 
> Congratulations Xuefu!
> 
> Thanks
> Prasanth Jayachandran
> 
> On Feb 28, 2014, at 11:04 AM, Vaibhav Gumashta wrote:
> 
> > Congrats Xuefu!
> >
> >
> > On Fri, Feb 28, 2014 at 9:20 AM, Prasad Mujumdar wrote:
> >
> >>   Congratulations Xuefu !!
> >>
> >> thanks
> >> Prasad
> >>
> >>
> >>
> >> On Fri, Feb 28, 2014 at 1:20 AM, Carl Steinbach wrote:
> >>
> >>> I am pleased to announce that Xuefu Zhang has been elected to the
> >>> Hive Project Management Committee. Please join me in congratulating Xuefu!
> >>>
> >>> Thanks.
> >>>
> >>> Carl
> >>>
> >>>
> >>
> >
> 
> 
> 
> 




HDFS Storage Locations and Hive

2014-03-03 Thread John Omernik
Given the direction HDFS is going with storage locations, as described in

https://issues.apache.org/jira/browse/HDFS-2832

and

https://issues.apache.org/jira/secure/attachment/12597860/20130813-HeterogeneousStorage.pdf

is now the right time to toss out some suggestions for how the Hive project
could incorporate some of these features?

I don't see it being a hard thing: new partitions in a managed table could
be targeted to a certain storage type (say SSD), and a very simple built-in
command could then move partitions older than a certain date onto slower
storage (say HDD).


Basically, the ideal from a user/administrator perspective is managed
tables with partition locations on various storage tiers, seamless to the
user running a query.

Say I have 5 partitions
day='2014-03-03'
day='2014-03-02'
day='2014-03-01'
day='2014-02-28'
day='2014-02-27'

March 2 and 3 would be assigned to SSD storage, and Feb 27 through March 1
would be on HDD. Then, at some point in the early morning of March 4 (note
that March 4 automatically goes onto SSD), a command could be run:

ALTER TABLE mytable SET part_location =
/user/hive/fast_data/mytable/day='2014-02-27' WHERE day='2014-02-27'

This command would do a few things:

- Ensure the new location doesn't exist (if it does, it will fail; perhaps
  this is controllable in the command, i.e. copy the data and don't fail if
  the directory exists)
- Create the new partition location
- Copy the data from the old location to the new location (and verify the
  copy completed)
- Update the Hive metadata to point the managed partition location at the
  new location
- Remove the old data/location

This would provide a single command that does all the work of moving
things, including moving the data and updating the metadata. Yes, it would
still be manual (i.e. it's not built into Hive to auto-age older
partitions), but that is something that should be managed by the admin
anyhow. Heck, you could even add a feature here: instead of just copying
the files in the old location, do an INSERT OVERWRITE type command, so that
if lots of smaller files were appended to the original older location, the
"archival" process uses MapReduce to reorganize the files into larger files
for better compression and storage. Maybe add a "WITH DEFRAG" option to the
ALTER statement; WITH DEFRAG could trigger a MapReduce job rather than a
plain file copy. If this MapReduce job fails, the metadata isn't updated
and the old data isn't deleted.
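
For comparison, Hive today already covers the metadata half of this:
ALTER TABLE ... PARTITION ... SET LOCATION repoints a partition without
moving any data, so the copy, verify, and cleanup steps are what the
proposed command would add (the path below is hypothetical):

ALTER TABLE mytable PARTITION (day='2014-02-27')
SET LOCATION '/user/hive/slow_data/mytable/day=2014-02-27';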



Thoughts?


Limited capabilities of a custom input format

2014-03-03 Thread Petter von Dolwitz (Hem)
Hi,

I have implemented a few custom input formats in Hive. It seems like only
the getRecordReader() method of these input formats is being called,
though, i.e. there is no way to override the listStatus() method and
provide a custom input filter. The only way I can set a file filter is via
the mapred.input.pathFilter.class property, which leaves me using the same
filter for all input formats. I would like a way to specify a filter per
input format. Is there a way around this limitation?

I am on Hive 0.10. I think I have seen that, when running jobs locally, the
listStatus() method of my input formats is called, but not when handing the
job over to a Hadoop cluster. It seems listStatus() is called on Hadoop's
CombineFileInputFormat instead.

Thanks,
Petter
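
For reference, a minimal sketch of the single global filter that the
mapred.input.pathFilter.class route provides (the package, class name, and
suffix rule here are made up for illustration):

package com.example;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

// Hypothetical global input filter: skip temporary files, accept the rest.
// Enabled with: set mapred.input.pathFilter.class=com.example.SkipTmpFilter;
public class SkipTmpFilter implements PathFilter {
  @Override
  public boolean accept(Path path) {
    // Applied to every candidate input path, files and directories alike.
    return !path.getName().endsWith(".tmp");
  }
}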


Hive hbase handler composite key - hbase full scan on key

2014-03-03 Thread Juraj jiv
Hello,
I'm currently testing HBase integration with Hive. I want to use fast HBase
key lookups from Hive, but my HBase key is composite.
I found a way to create the table with the HBase key as a struct, which
works fine:

CREATE EXTERNAL TABLE table_tst(
key struct<a:string, ...>,
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY '_'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' ...

But if I use this select in Hive:
select * from table_tst where key.a = '1407273705';
it takes about 860 seconds to print 2 records, so it does a full scan :/

If I use a similar select through the Java HBase API:
Scan scan = new Scan();
scan.setStartRow("1407273705".getBytes());
scan.setStopRow("1407273705~".getBytes());

Note: "~" is an end character for me - it has a high byte value; my
composite key delimiter is "_".
This returns 2 records in 2 seconds.

How can I tell Hive to run a start/stop row scan over this key.a value?

JV