Re: How to apply data mining on Hive?

2012-06-08 Thread Sukhendu Chakraborty
If you are interested, you can also look at Apache Hama, which provides an
MPI-like interface on top of Hadoop MapReduce.

http://incubator.apache.org/hama/
On Jun 8, 2012 4:55 PM, "Mark Grover" wrote:

> Hi Jason,
> Hive does expose a JDBC interface which can be used by tools and applications. You
> would need to check out individual tools to see if they support Hadoop (I use the
> word Hadoop and not Hive since an application doesn't need Hive to run Map
> Reduce jobs on data in HDFS).
>
> Apache Mahout, as Sreenath mentioned, is also an interesting open source
> project which combines canonical machine learning algorithms with the power
> of Hadoop. That might fit your bill too.
>
> Good luck,
> Mark
>
> On Fri, Jun 8, 2012 at 1:25 AM, jason Yang wrote:
>
>> Hi, Mark.
>>
>> Thank you for your reply.
>>
>> I have read the User Guide, but I'm still wondering what I can do in the
>> following scenario:
>>
>> 1. Suppose I have a table t_customer_info in Hive, which includes lots
>> of information about our customers.
>> 2. Now I would like to cluster those customers into different groups so
>> that customers within a group have high similarity, but are very dissimilar
>> to customers in other groups.
>> 3. This is a classic clustering problem in the data mining field. I think
>> such a job cannot be done with a query language; it needs data mining
>> algorithms instead.
>>
>>
>> When we look "back" at traditional DBMSs, there are lots of data mining
>> tools or BI tools which can connect to the DBMS and apply canonical
>> algorithms to the data in it. So I started to wonder: are there similar
>> tools for Hive?
>>
>> If not, what's the most used way to do data mining over Hadoop?
>>
>> 2012/6/8 Mark Grover 
>>
>>> Hi Jason,
>>> Hive is a data warehouse system that sits on top of Hadoop. The key
>>> selling point here is that it allows users to write SQL-like queries to
>>> query their large scale data. These queries get compiled into Map Reduce
>>> jobs, which are then run on the Hadoop cluster just like any other Map Reduce job.
>>>
>>> Hadoop does all the parallel processing for you. All you have to do is
>>> set up a Hadoop cluster, install Hive on the cluster and run your Hive
>>> queries. All underlying processing will happen in parallel where possible.
>>>
>>> This is a good place to get started and learn more about Hive:
>>> https://cwiki.apache.org/confluence/display/Hive/GettingStarted
>>>
>>> Welcome and good luck!
>>>
>>> Mark
>>>
>>>
>>> On Thu, Jun 7, 2012 at 10:10 PM, jason Yang wrote:
>>>
>>>> Hi, dear friends.
>>>>
>>>> I was wondering what's the popular way to do data mining on Hive?
>>>>
>>>> Since the data in Hive is distributed over the cluster, is there any
>>>> tool or solution that could parallelize the data mining?
>>>>
>>>> Any suggestion would be appreciated.
>>>>
>>>> --
>>>> YANG, Lin
>>>>
>>>>
>>>
>>
>>
>> --
>> YANG, Lin
>>
>>
>
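
For the clustering scenario Jason describes (grouping the rows of
t_customer_info by similarity), here is a minimal single-machine sketch of
k-means in Python. It only illustrates the kind of algorithm that projects
such as Mahout run on Hadoop; the feature vectors, the value of k, and the
function name kmeans are assumptions made up for illustration, not data or
APIs from this thread.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct starting points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        for i, p in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids

# Hypothetical customer features: (age, monthly_spend)
customers = [(25, 40.0), (27, 42.0), (26, 38.0),
             (60, 300.0), (62, 310.0), (58, 295.0)]
labels, centers = kmeans(customers, k=2)
print(labels)
```

In practice the feature rows would first be exported from Hive (for example
through the JDBC interface Mark mentions) before being handed to a clustering
library.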


Re: NaNs and Infinity support in HIVE?

2012-05-17 Thread Sukhendu Chakraborty
Thanks Mark. Yes, my findings were similar. It looks like HIVE does
not distinguish between Infinity and NaNs.

-Sukhendu

On Thu, May 17, 2012 at 6:50 PM, Mark Grover wrote:
> I did a test for this.
>
> If a NaN is inserted into a string column, the value is serialized as 
> "Infinity" in HDFS. However, if it's inserted into an Integer column, it's 
> serialized as 2147483647 or -2147483648 depending on whether the output is 
> +infinity or -infinity.
>
> Mark
>
> - Original Message -
> From: "Nanda Vijaydev" 
> To: user@hive.apache.org
> Sent: Thursday, May 17, 2012 8:10:45 PM
> Subject: Re: NaNs and Infinity support in HIVE?
>
> Can you paste a sample line of your data on HDFS and which column you are 
> trying to query?
>
>
> Thanks
> Nanda Vijaydev
>
>
> On Mon, May 14, 2012 at 11:40 AM, Sukhendu Chakraborty < 
> sukhendu.chakrabo...@gmail.com > wrote:
>
>
> Are NaNs and/or Infinity supported in HIVE? If yes, I wanted to know
> how NaNs and Infinity values should be represented in HDFS files so that
> they are interpreted correctly in Hive.
>
> When I do 'select 1/0 from tab', I get a text value, "Infinity".
> However, when I enter "Infinity" in my HDFS file represented by the
> HIVE table (the column datatype of the table is int), and then do a
> select, I get 'NULL' (and not Infinity).
>
> I am working with HIVE 0.7.1.
>
> Thanks,
> -Sukhendu
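
The integer values Mark reports (2147483647 and -2147483648) are the same
values that Java's narrowing conversion from double to int produces for
positive and negative infinity. Below is a minimal Python sketch of that
conversion rule as defined by the Java language. Whether Hive 0.7.1 actually
goes through this cast is an assumption, and java_double_to_int is a
hypothetical helper name.

```python
import math

INT_MAX = 2**31 - 1   # 2147483647
INT_MIN = -2**31      # -2147483648

def java_double_to_int(x: float) -> int:
    """Emulate Java's (int) cast of a double: NaN -> 0, out of range saturates."""
    if math.isnan(x):
        return 0
    if x >= INT_MAX:
        return INT_MAX
    if x <= INT_MIN:
        return INT_MIN
    return int(x)  # truncates toward zero, like the Java cast

print(java_double_to_int(float("inf")))
print(java_double_to_int(float("-inf")))
print(java_double_to_int(float("nan")))
```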


is anybody looking at case-sensitivity for HIVE object names?

2012-05-17 Thread Sukhendu Chakraborty
https://issues.apache.org/jira/browse/HIVE-912 talks about
case-sensitivity, but it does not fix the issue.

I would like HIVE to enforce case-sensitivity for HIVE
object names. For example, Oracle achieves this by enclosing the name in
double quotes: "". This effort would also take care of object names with
special characters (e.g. '.', '$', etc.)
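
A minimal sketch of the quoting rule proposed above, in Python: unquoted
names are folded to a single case (lowercase here, which is an assumption
about the desired behavior), while names in double quotes keep their exact
spelling, including special characters. normalize_identifier is a
hypothetical helper, not a Hive API.

```python
def normalize_identifier(name: str) -> str:
    """Return the catalog name for an identifier as written in a query."""
    if len(name) >= 2 and name[0] == '"' and name[-1] == '"':
        # Quoted: strip the quotes, keep case and special characters as-is.
        return name[1:-1]
    # Unquoted: case-insensitive, so fold to one canonical case.
    return name.lower()

print(normalize_identifier('MyTab'))
print(normalize_identifier('"MyTab"'))
print(normalize_identifier('"my.tab$"'))
```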


Re: Order by Sort by partitioned columns

2012-05-15 Thread Sukhendu Chakraborty
I think in HIVE the partitioned columns are virtual. They are not physical
data columns but directory names corresponding to the partition key values,
which facilitate partition pruning during select.
On May 14, 2012 6:29 AM, "Mark Grover" wrote:

> Hi Shin,
> If you could list the query that failed and the query used to create the
> tables in question, that would be very helpful.
>
> Mark
>
> - Original Message -
> From: "Shin Chan" 
> To: "HIVE User" 
> Sent: Monday, May 14, 2012 2:28:06 AM
> Subject: Order by Sort by partitioned columns
>
> Hi All
>
> Just curious if it's possible to Order by or Sort by partitioned columns.
>
> I tried it, but it fails because it is not able to find those columns. They are
> present as partitions.
>
> Any help would be appreciated.
>
> Thanks and Regards,
>
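
To illustrate the point about partition columns being directory names: Hive
lays out partitions as key=value directories under the table location, so
the partition values can be recovered from a file path alone. A minimal
sketch in Python; the example path and the helper name partition_values are
hypothetical.

```python
def partition_values(path: str) -> dict:
    """Collect key=value directory components from a partition file path."""
    values = {}
    for part in path.strip("/").split("/"):
        if "=" in part:
            key, _, value = part.partition("=")
            values[key] = value
    return values

# Hypothetical table "sales" partitioned by dt and country
example = "/user/hive/warehouse/sales/dt=2012-05-14/country=US/000000_0"
print(partition_values(example))
```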


NaNs and Infinity support in HIVE?

2012-05-14 Thread Sukhendu Chakraborty
Are NaNs and/or Infinity supported in HIVE? If yes, I wanted to know
how NaNs and Infinity values should be represented in HDFS files so that
they are interpreted correctly in Hive.

When I do 'select 1/0 from tab', I get a text value, "Infinity".
However, when I enter "Infinity" in my HDFS file represented by the
HIVE table (the column datatype of the table is int), and then do a
select, I get 'NULL' (and not Infinity).

I am working with HIVE 0.7.1.

Thanks,
-Sukhendu


Re: removing hdfs table data directory does not throw error in hive

2012-04-23 Thread Sukhendu Chakraborty
Thanks Nitin. I am aware of what Hive is doing. The question is, is it
okay not to return an error/warning when no data is found, since the
metadata for the table also contains the data location set when you create
the table (which creates the hdfs directory as well)? So, if somebody
erroneously removes the hive directory corresponding to the table, at least a
warning on select might be a good idea.

-Sukhendu

On Mon, Apr 23, 2012 at 8:28 PM, Nitin Pawar wrote:
> Hive table metadata is stored in a metadata store, which retains the
> table structure and other meta info even if you delete the hdfs table
> directory, because that information lives in the metadata store db.
>
> When you do a select * from table;
> 1) hive checks that the table exists in the metadata store
> 2) if the table exists, it checks the location of the data
> 3) if data is available at that location, it processes the data; otherwise
> it returns OK without doing anything
>
> It is not an error case because the hive job did not fail.
>
>
> On Tue, Apr 24, 2012 at 6:25 AM, Sukhendu Chakraborty
>  wrote:
>>
>> I have a hive table tab3 with two columns (c1 int, c2 int)
>>
>> hive> load data local inpath '/tmp/orhc466fb981' into table tab3;
>> Copying data from file:/tmp/orhc466fb981
>> Copying file: file:/tmp/orhc466fb981
>> Loading data to table default.tab3
>> OK
>> Time taken: 3.907 seconds
>> hive> select * from tab3;
>> OK
>> 4       2
>> 4       10
>> 7       4
>> 7       22
>> .
>> //remove the tab3 directory from hdfs
>> [schakrab@diy-1-2 orch]$ hadoop fs -rmr /user/hive/warehouse/tab3;
>> Deleted hdfs://localhost:9000/user/hive/warehouse/tab3
>> [schakrab@diy-1-2 orch]$ hive
>> Hive history
>> file=/tmp/schakrab/hive_job_log_schakrab_201204231748_1985146177.txt
>> //no error thrown!
>> hive> select * from tab3;
>> OK
>> Time taken: 3.68 seconds
>> // of course. metadata still exists.
>> hive> desc tab3;
>> OK
>> c1      int
>> c2      int
>> Time taken: 0.127 seconds
>>
>> // doing another load recreates the directory tab3
>>
>> Shouldn't the select * query return an error when the underlying table
>> file is removed ?
>>
>> -Sukhendu
>
>
>
>
> --
> Nitin Pawar
>
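
The three steps Nitin lists, plus the warning suggested in the reply, can be
sketched as a toy model in Python. It uses a dict as the metadata store and
the local filesystem in place of HDFS; select_all, the dict layout, and the
warning text are all assumptions for illustration and do not describe how
Hive itself behaves.

```python
import os
import tempfile
import warnings

def select_all(metastore: dict, table: str) -> list:
    """Return the rows of `table`, warning if its data directory is gone."""
    if table not in metastore:                  # 1) the table must exist
        raise KeyError(f"Table not found: {table}")
    location = metastore[table]["location"]     # 2) look up the data location
    if not os.path.isdir(location):             # 3) data missing: warn, return nothing
        warnings.warn(f"Data directory missing for {table}: {location}")
        return []
    rows = []
    for name in sorted(os.listdir(location)):
        with open(os.path.join(location, name)) as f:
            rows.extend(line.rstrip("\n").split("\t") for line in f)
    return rows

# Simulate a table whose directory was removed after creation.
missing = tempfile.mkdtemp()
os.rmdir(missing)
metastore = {"tab3": {"location": missing, "columns": ["c1", "c2"]}}
print(select_all(metastore, "tab3"))
```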


Re: WITH clause support in HIVE

2012-04-17 Thread Sukhendu Chakraborty
We are looking to build an abstraction layer on top of hive where
objects store Hive queries. These queries get executed when those
objects are accessed by the user. There can be derived objects, e.g.
y = op(x), where y is another Hive query built on top of the query
represented by x depending on the operator 'op'.

We thought of using the WITH clause to achieve these extensions easily.
However, since Hive does not support the WITH clause yet, we will look
into the options you suggested.

Thanks,
-Sukhendu

On Tue, Apr 17, 2012 at 11:46 AM, Mark Grover wrote:
> Sukhendu,
> Perhaps you can use a view or derived table to accomplish the same.
>
> What's your use case?
>
> Mark Grover, Business Intelligence Analyst
> OANDA Corporation
>
> www: oanda.com www: fxtrade.com
> e: mgro...@oanda.com
>
> "Best Trading Platform" - World Finance's Forex Awards 2009.
> "The One to Watch" - Treasury Today's Adam Smith Awards 2009.
>
>
> - Original Message -
> From: "Sukhendu Chakraborty" 
> To: user@hive.apache.org
> Sent: Tuesday, April 17, 2012 2:23:33 PM
> Subject: Re: WITH clause support in HIVE
>
> Never mind. It's not implemented yet:
> https://issues.apache.org/jira/browse/HIVE-1180
>
> On Tue, Apr 17, 2012 at 11:02 AM, Sukhendu Chakraborty
>  wrote:
>> Resending, as I forgot to change the subject line in my last post. Pls
>> respond to this email.
>>
>> Does HIVE support the SQL 99 'WITH' clause yet? I did not find a definitive
>> answer on the web. I am in the process of installing HIVE locally, but
>> wanted to check with you before trying it.
>>
>> Thanks,
>> -Sukhendu
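
The abstraction layer described at the top of this message (objects that
hold Hive queries, with derived objects y = op(x)) can be sketched without
WITH by nesting each parent query as a derived table in the FROM clause, as
Mark suggests. A minimal Python sketch follows; the class HiveQuery, its
derive method, and the example query are hypothetical, and nothing here is
executed against Hive.

```python
import itertools

class HiveQuery:
    """Holds a query string; derived objects wrap it as a subquery."""
    _alias_ids = itertools.count(1)  # shared counter for subquery aliases

    def __init__(self, sql: str):
        self.sql = sql

    def derive(self, select: str = "*", where: str = "") -> "HiveQuery":
        # A subquery in FROM needs an alias, so generate a fresh one.
        alias = f"t{next(self._alias_ids)}"
        sql = f"SELECT {select} FROM ({self.sql}) {alias}"
        if where:
            sql += f" WHERE {where}"
        return HiveQuery(sql)

x = HiveQuery("SELECT c1, c2 FROM tab3")
y = x.derive(select="c1", where="c2 > 4")  # y = op(x)
print(y.sql)
```

Because each derived object only stores a string, no query runs until the
layer decides to submit y.sql.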


Re: WITH clause support in HIVE

2012-04-17 Thread Sukhendu Chakraborty
Never mind. It's not implemented yet:
https://issues.apache.org/jira/browse/HIVE-1180

On Tue, Apr 17, 2012 at 11:02 AM, Sukhendu Chakraborty
 wrote:
> Resending, as I forgot to change the subject line in my last post. Pls
> respond to this email.
>
> Does HIVE support the SQL 99 'WITH' clause yet? I did not find a definitive
> answer on the web. I am in the process of installing HIVE locally, but
> wanted to check with you before trying it.
>
> Thanks,
> -Sukhendu


WITH clause support in HIVE

2012-04-17 Thread Sukhendu Chakraborty
Resending, as I forgot to change the subject line in my last post. Pls
respond to this email.

Does HIVE support the SQL 99 'WITH' clause yet? I did not find a definitive
answer on the web. I am in the process of installing HIVE locally, but
wanted to check with you before trying it.

Thanks,
-Sukhendu


Re: Creating Hbase table with pre-splitted regions using Hive QL

2012-04-17 Thread Sukhendu Chakraborty
Does HIVE support the SQL 99 'WITH' clause yet? I did not find a definitive
answer on the web. I am in the process of installing HIVE locally, but
wanted to check with you before trying it.

Thanks,
-Sukhendu