Re: Hive FAQ

2012-01-25 Thread Nicolas Lalevée

On 24 Jan 2012 at 22:48, Carl Steinbach wrote:

 Hi Nicolas,
 
 Thanks for pointing this out. If you would like to provide answers I can give 
 you edit access to the wiki. Please create an account and send me your 
 username.

It feels a little strange that someone wrote all these questions in a FAQ 
without writing any answers. Why bother writing them at all? At least it made 
me laugh :)

But never mind, I would be glad to help, my id is 'hibou'. I can at least 
answer one of them.

cheers,
Nicolas

 
 Thanks.
 
 Carl
 
 2012/1/24 Nicolas Lalevée nicolas.lale...@hibnet.org
 It seems to be there:
 https://cwiki.apache.org/confluence/display/Hive/User+FAQ
 
 But hmm... how should I phrase it: shouldn't questions be followed 
 by answers? :D
 
 Nicolas
 
 



Re: Important Question

2012-01-25 Thread bejoy_ks
Real time? Definitely not Hive. Go with HBase, but don't expect HBase to be 
as flexible as an RDBMS. You need to choose your row key and column families 
wisely, according to your requirements.
For data mining and analytics, you can mount a Hive table over the corresponding 
HBase table and run SQL-like queries on it.



Regards
Bejoy K S

-Original Message-
From: Dalia Sobhy dalia.mohso...@hotmail.com
Date: Wed, 25 Jan 2012 17:01:08 
To: u...@hbase.apache.org; user@hive.apache.org
Reply-To: user@hive.apache.org
Subject: Important Question


Dear all,
I am developing an API for medical use, i.e. hospital admissions and everything 
about patients, so transactions, queries, and real-time data are all important here.
Both real-time and analytical processing are therefore a must.
Which best suits my application: HBase, Hive, or another method?
Please reply quickly because this is critical. Thanks a million ;)
  


Re: Important Question

2012-01-25 Thread Dalia Sobhy
So what about HBQL?
And if I had complex queries, would I get stuck with HBase?

Also, can anyone provide me with examples of an RDBMS table transformed into 
an HBase table, with real-time queries and analytical processing?

Sent from my iPhone




Re: Important Question

2012-01-25 Thread Bejoy Ks
Hi Dalia
    If by complex queries you mean joins across multiple tables and so on, 
HBase doesn't support joins. To get the effect of an RDBMS join over multiple 
tables, you should, based on your requirements, choose suitable column families 
and qualifiers in a single HBase table that accommodate those RDBMS tables. I 
haven't played much with HBQL, but if you are developing an API you can rely on 
the HBase Java API internally for storage and retrieval of records. In HBase, 
query (retrieval) time largely depends on how you design the row key and column 
families (HBase stores each column family together, and keeps row keys sorted 
and distributed across regions). If you want SQL-like querying for an HBase 
table, you have to mount it as a Hive table.
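
As a rough illustration of mounting a Hive table over an HBase table, the sketch below uses Hive's HBase storage handler. The HBase table name patients, the column families info and visit, and all column names are hypothetical, chosen only to match the patient example in this thread; adjust them to the actual schema.

```sql
-- Hypothetical layout: one HBase table 'patients' keyed by patient ID,
-- with column families 'info' (demographics) and 'visit' (diagnosis).
-- In the mapping, :key binds the first Hive column to the HBase row key.
CREATE EXTERNAL TABLE patients_hive (
  patient_id STRING,
  name       STRING,
  city       STRING,
  diagnosis  STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,info:name,info:city,visit:diagnosis"
)
TBLPROPERTIES ("hbase.table.name" = "patients");

-- Analytical query of the kind that would run as a batch job, not in real time.
SELECT COUNT(*)
FROM patients_hive
WHERE city = 'Alexandria' AND diagnosis = 'measles';
```

A query like this runs as a MapReduce job over the HBase table, so it suits reporting rather than interactive lookups; lookups by row key would still go through the HBase API.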

    In my personal experience, I have used HBase tables for real-time data 
storage and retrieval in a Hadoop enterprise application. Scheduled MapReduce 
jobs ran during off-peak hours and dumped the required data (formatted and 
filtered) from the HBase table into HDFS, and from there Hive consumed the data 
for analytical purposes. We had a good number of analytical jobs and didn't 
want to choke the HBase servers during peak hours, so the mining and analytics 
part was moved completely to Hive.

Regards
Bejoy.K.S





Re: Hive query result in sequence file

2012-01-25 Thread jingjung Ng
Thanks Aniket.

I am pretty new to Hive. Is there any Java example (SerDe) for achieving this?

-Andrew

On Wed, Jan 25, 2012 at 12:12 AM, Aniket Mokashi aniket...@gmail.com wrote:

 You will have to write your own SerDe.

 Hive can write a sequence file, but the value will be Text with a NULL
 (BytesWritable) key.

 Thanks,
 Aniket
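
 A minimal sketch of the built-in route described above, for comparison: a table declared STORED AS SEQUENCEFILE receives the query result as a sequence file, with all columns in the value and no usable key. The table name contacts_seq, the column types, the choice of which table supplies each column, and the join condition are all assumptions, since the original query is pseudocode.

```sql
-- Hypothetical target table; Hive writes its data files as sequence files.
CREATE TABLE contacts_seq (name STRING, address STRING, phone STRING)
STORED AS SEQUENCEFILE;

-- The join condition below is assumed; the original pseudo query omits it.
INSERT OVERWRITE TABLE contacts_seq
SELECT t1.name, t1.address, t2.phone
FROM t1 JOIN t2 ON (t1.name = t2.name);
```

 Putting name in the key and address and phone in the value would still need custom code, as noted above.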


 On Tue, Jan 24, 2012 at 11:41 PM, jingjung Ng jingjun...@gmail.com wrote:

 Hi,

 I have the following Hive query (pseudo Hive query code):

  select name, address, phone from t1 join t2

 Executing the above query produces a file stored in name, address, phone
 format on the file system (HDFS or local).

 However, I'd like to write the result to a sequence file (key: name,
 value: address and phone).

 Is this possible? If so, how could I do this?


 Thank you.

 JingJung




 --
 ...:::Aniket:::... Quetzalco@tl



Re: Important Question

2012-01-25 Thread Stephen Boesch
Dalia
 your requirements appear to be transaction oriented, so OLTP systems
- i.e. regular relational databases - are more likely to be suitable than a
Hive (/Hadoop) based solution.  Hive is aimed more at business intelligence and
certainly involves latencies that, given that you say 'realtime', would likely
not be acceptable for your application.

stephenb

2012/1/25 Dalia Sobhy dalia.mohso...@hotmail.com

  I will explain more, Mike.

 I am building a Service Oriented Architecture. I want my API to provide
 services such as adding/deleting patients, searching for a patient by name/ID,
 and counting the number of people suffering from measles in Alexandria,
 Egypt.

 Something like that, so I am wondering which best suits my API?

  To: dalia.mohso...@hotmail.com
  CC: u...@hbase.apache.org; user@hive.apache.org
  Subject: Re: Important Question
  From: mspre...@us.ibm.com
  Date: Wed, 25 Jan 2012 12:05:39 -0500

 
  BTW, what do you mean by realtime? Do you mean you want to run some
  non-trivial query quickly enough for some sort of interactive use? Can
  you give us a feel for the sort of queries that interest you?
 
  Thanks,
  Mike
 
 
 



Reading compressed files (external tables) from hive using DeprecatedLzoTextInputFormat

2012-01-25 Thread Sam William

I have some data generated by a Pig script that is LZO compressed. No indexer 
has been run on this data. I created an external table in Hive on top of this 
data. Here is the create table script.



CREATE EXTERNAL TABLE tmp_hive(domain string, url string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/tmp/test2';

However, when I try to query this table, I get this error:

Failed with exception java.io.IOException:java.io.IOException: No LZO codec 
found, cannot run.


What am I missing?  Any help is appreciated.


Thanks,
Sam William
sa...@stumbleupon.com
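
One possible cause, offered as a guess rather than a confirmed diagnosis: the LZO codec classes from the hadoop-lzo library are not registered in the Hadoop configuration that Hive sees, or the data files lack the .lzo extension that Hadoop uses to pick the codec. The sketch below registers the codecs for a single Hive session. It assumes the hadoop-lzo jar and native libraries are already installed on the cluster, and the codec list shown is only an example.

```sql
-- Assumption: hadoop-lzo is installed. These are the property names it documents
-- for core-site.xml, set here per session instead.
SET io.compression.codecs=org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec;
SET io.compression.codec.lzo.class=com.hadoop.compression.lzo.LzoCodec;

-- Retry the query against the table from the message above.
SELECT * FROM tmp_hive LIMIT 10;
```

If this helps, the same properties would normally be set cluster-wide in core-site.xml rather than in each session.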





Re: Wiki Write Access

2012-01-25 Thread Carl Steinbach
Hi Aniket,

I added you to the ACL. Thanks for your help with the wiki.

Carl

On Tue, Jan 24, 2012 at 11:26 PM, Aniket Mokashi aniket...@gmail.com wrote:

 Hi Carl,

 It would be helpful for me too.
 My wiki username: aniket486.

 Thanks,
 Aniket


 On Tue, Jan 24, 2012 at 11:57 AM, Carl Steinbach c...@cloudera.com wrote:

 Hi Matt,

 Great!

 Please sign up for a wiki account here:
 https://cwiki.apache.org/confluence/signup.action

 Then email me your wiki username and I will add you to the Hive wiki ACL.

 Thanks.

 Carl


 On Tue, Jan 24, 2012 at 7:10 AM, Tucker, Matt matt.tuc...@disney.com wrote:

 Hi,

 I would like to get write access to the Hive wiki, so that I can add
 documentation on existing UDFs.

 Thanks

 Matt Tucker
 Associate eBusiness Analyst
 Walt Disney Parks and Resorts Online
 Ph: 407-566-2545
 Tie: 8-296-2545





 --
 ...:::Aniket:::... Quetzalco@tl



Re: rainstor

2012-01-25 Thread Sam Wilson
Google?

Sent from my iPhone

On Jan 25, 2012, at 7:34 PM, Dalia Sobhy dalia.mohso...@hotmail.com wrote:

 Does anyone have any idea about RainStor?
 
 Is it open source? How do I download it? How do I use it? What is its performance like?