Re: How to apply data mining on Hive?
If you are interested, you can also look at Apache Hama, which provides an MPI-like interface on top of Hadoop MapReduce.
http://incubator.apache.org/hama/

On Jun 8, 2012 4:55 PM, "Mark Grover" wrote:
> Hi Jason,
> Hive does expose a JDBC interface which can be used by tools and applications. You
> would check out individual tools to see if they support Hadoop (I use the
> word Hadoop and not Hive since an application doesn't need Hive to run Map
> Reduce jobs on data in HDFS).
>
> Apache Mahout, as Sreenath mentioned, is also an interesting open source
> project which combines canonical machine learning algorithms with the power
> of Hadoop. That might fit your bill too.
>
> Good luck,
> Mark
>
> On Fri, Jun 8, 2012 at 1:25 AM, jason Yang wrote:
>
>> Hi, Mark.
>>
>> Thank you for your reply.
>>
>> I have read the User Guide, but I'm still wondering what I can do for the
>> following scenario:
>>
>> 1. Suppose I have a table t_customer_info in Hive, which includes lots
>> of information about our customers.
>> 2. Now I would like to cluster those customers into different groups so
>> that customers within a group have high similarity, but are very dissimilar
>> to customers in other groups.
>> 3. This is a classical clustering problem in the data mining field. I thought
>> such a job cannot be done with a query language and needs some data mining
>> algorithms instead.
>>
>>
>> When we look "back" to the traditional DBMS, there are lots of data mining
>> tools or BI tools which can connect to the DBMS and apply some canonical
>> algorithms to the data in the DBMS. So I started to wonder: are there similar
>> tools for Hive?
>>
>> If not, what's the most used way to do data mining over Hadoop?
>>
>> 2012/6/8 Mark Grover
>>
>>> Hi Jason,
>>> Hive is a data warehouse system that sits on top of Hadoop. The key
>>> selling point here is that it allows users to write SQL-like queries to
>>> query their large scale data. These queries get compiled into Map Reduce
>>> jobs, which are then run on the Hadoop cluster just like any other Map
>>> Reduce job.
>>>
>>> Hadoop does all the parallel processing for you. All you have to do is
>>> set up a Hadoop cluster, install Hive on the cluster and run your Hive
>>> queries. All underlying processing will happen in parallel where possible.
>>>
>>> This is a good place to get started and learn more about Hive:
>>> https://cwiki.apache.org/confluence/display/Hive/GettingStarted
>>>
>>> Welcome and good luck!
>>>
>>> Mark
>>>
>>>
>>> On Thu, Jun 7, 2012 at 10:10 PM, jason Yang wrote:
>>>
>>>> Hi, dear friends.
>>>>
>>>> I was wondering what's the popular way to do data mining on Hive?
>>>>
>>>> Since the data in Hive is distributed over the cluster, is there any tool
>>>> or solution that could parallelize the data mining?
>>>>
>>>> Any suggestion would be appreciated.
>>>>
>>>> --
>>>> YANG, Lin
>>>
>>
>> --
>> YANG, Lin
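For readers unfamiliar with the clustering scenario described in this thread, here is a minimal single-machine k-means sketch. It is only an illustration of what "grouping similar customers" means, not how Mahout or Hama implement it. The feature tuples, the function names, and the choice of the first k rows as starting centroids are all assumptions made for this example; in practice the rows would first be exported from a table such as t_customer_info.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=10):
    """Toy k-means: returns (cluster index per point, final centroids).

    Assumption for this sketch: the first k points seed the centroids,
    which keeps the result deterministic.
    """
    centroids = [tuple(p) for p in points[:k]]
    assignment = []
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return assignment, centroids


# Hypothetical customer features, e.g. (visits per month, average spend).
customers = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0),
             (8.0, 9.0), (2.0, 1.0), (9.0, 8.0)]
groups, centers = kmeans(customers, 2)
```

The assignment step is independent per row, which is the part that frameworks built on Hadoop can run in parallel across the cluster.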
Re: NaNs and Infinity support in HIVE?
Thanks Mark. Yes, my findings were similar. It looks like HIVE does not distinguish between Infinity and NaNs.

-Sukhendu

On Thu, May 17, 2012 at 6:50 PM, Mark Grover wrote:
> I did a test for this.
>
> If a NaN is inserted into a string column, the value is serialized as
> "Infinity" in HDFS. However, if it's inserted into an Integer column, it's
> serialized as 2147483647 or -2147483648 depending on whether the output is
> +infinity or -infinity.
>
> Mark
>
> - Original Message -
> From: "Nanda Vijaydev"
> To: user@hive.apache.org
> Sent: Thursday, May 17, 2012 8:10:45 PM
> Subject: Re: NaNs and Infinity support in HIVE?
>
> Can you paste a sample line of your data on HDFS and which column you are
> trying to query?
>
>
> Thanks
> Nanda Vijaydev
>
>
> On Mon, May 14, 2012 at 11:40 AM, Sukhendu Chakraborty <
> sukhendu.chakrabo...@gmail.com > wrote:
>
>
> Are NaNs and/or Infinity supported in HIVE? If yes, I wanted to know
> how NaNs and Infinity values are represented in HDFS files so that they
> are interpreted correctly in Hive.
>
> When I do 'select 1/0 from tab', I get a text value, "Infinity".
> However, when I enter "Infinity" in my HDFS file represented by the
> HIVE table (the column datatype of the table is int), and then do a
> select, I get 'NULL' (and not Infinity).
>
> I am working with HIVE 0.7.1.
>
> Thanks,
> -Sukhendu
is anybody looking at case-sensitivity for HIVE object names?
https://issues.apache.org/jira/browse/HIVE-912 talks about case-sensitivity, but it does not fix the issue. I would like HIVE to enforce case-sensitivity for HIVE object names. For example, Oracle achieves this by enclosing the name in double quotes: "". This effort would also take care of object names with special characters (e.g. '.', '$', etc.).
Re: Order by Sort by partitioned columns
I think in HIVE the partitioned columns are virtual. They are not physical data columns but directory names corresponding to the partition key values, which facilitates partition pruning during select.

On May 14, 2012 6:29 AM, "Mark Grover" wrote:
> Hi Shin,
> If you could list the query that failed and the query used to create the
> tables in question, that would be very helpful.
>
> Mark
>
> - Original Message -
> From: "Shin Chan"
> To: "HIVE User"
> Sent: Monday, May 14, 2012 2:28:06 AM
> Subject: Order by Sort by partitioned columns
>
> Hi All
>
> Just curious if it's possible to Order by or Sort by partitioned columns.
>
> I tried it, and it fails as it is not able to find those columns. They are
> present as partitions.
>
> Any help would be appreciated.
>
> Thanks and Regards,
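To make the "directory name per partition key value" point concrete, here is a small sketch that recovers partition values from a data file path. The table name `sales`, the partition keys `dt` and `country`, and the helper name are hypothetical; the key=value directory layout under the warehouse directory is the convention being illustrated, and escaping of special characters in values is ignored here.

```python
def partition_values(path):
    """Return the key=value directory components of a path as a dict.

    This is an illustrative parser, not Hive code: any path component
    containing '=' with a non-empty key is treated as a partition.
    """
    values = {}
    for part in path.strip("/").split("/"):
        key, sep, value = part.partition("=")
        if sep and key:
            values[key] = value
    return values


# Hypothetical file belonging to a table partitioned by (dt, country).
example = partition_values(
    "/user/hive/warehouse/sales/dt=2012-05-14/country=US/part-00000"
)
```

Because the values live in the path and not in the data file itself, a filter on `dt` only needs to select matching directories, which is what partition pruning relies on.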
NaNs and Infinity support in HIVE?
Are NaNs and/or Infinity supported in HIVE? If yes, I wanted to know how NaNs and Infinity values are represented in HDFS files so that they are interpreted correctly in Hive.

When I do 'select 1/0 from tab', I get a text value, "Infinity". However, when I enter "Infinity" in my HDFS file represented by the HIVE table (the column datatype of the table is int), and then do a select, I get 'NULL' (and not Infinity).

I am working with HIVE 0.7.1.

Thanks,
-Sukhendu
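One way to picture the NULL result: an int column has no representation for Infinity, so text that cannot be parsed as an integer comes back as NULL. The sketch below is only a model of that observed behaviour written with Python's own parsers; it is not Hive's deserializer, and the function names are made up for the example. Whether a double column in a given Hive version accepts the text "Infinity" is an assumption that should be verified on the cluster.

```python
import math


def read_int_field(text):
    """Model of reading a text field into an int column: None stands for NULL."""
    try:
        return int(text)
    except ValueError:
        return None


def read_double_field(text):
    """Model of reading a text field into a double column."""
    try:
        return float(text)
    except ValueError:
        return None


as_int = read_int_field("Infinity")        # no integer form exists
as_double = read_double_field("Infinity")  # floating point can hold it
nan_ok = math.isnan(read_double_field("NaN"))
```

Under this model, storing such values in a double column is the option worth testing if they need to survive a round trip.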
Re: removing hdfs table data directory does not throw error in hive
Thanks Nitin. I am aware of what Hive is doing. The question is: is it okay not to return an error or warning when no data is found, given that the metadata for the table also contains the data location recorded when you create the table (which creates the HDFS directory as well)? So, if somebody erroneously removes the Hive directory corresponding to the table, at least a warning on select might be a good idea.

-Sukhendu

On Mon, Apr 23, 2012 at 8:28 PM, Nitin Pawar wrote:
> Hive table metadata is stored in a metadata store, which will retain the
> table structure and other meta info even if you delete the HDFS table
> directory, as it's stored in the metadata store DB.
>
> When you do a select * from table;
> 1) Hive checks whether the table exists in the metadata store
> 2) if the table exists, it then checks the location of the data
> 3) if data is available in the location, it processes the data; else it
> returns OK without doing anything
>
> It is not an error case because the Hive job did not fail.
>
>
> On Tue, Apr 24, 2012 at 6:25 AM, Sukhendu Chakraborty
> wrote:
>>
>> I have a hive table tab3 with two columns (c1 int, c2 int)
>>
>> hive> load data local inpath '/tmp/orhc466fb981' into table tab3;
>> Copying data from file:/tmp/orhc466fb981
>> Copying file: file:/tmp/orhc466fb981
>> Loading data to table default.tab3
>> OK
>> Time taken: 3.907 seconds
>> hive> select * from tab3;
>> OK
>> 4 2
>> 4 10
>> 7 4
>> 7 22
>> .
>> //remove the tab3 directory from hdfs
>> [schakrab@diy-1-2 orch]$ hadoop fs -rmr /user/hive/warehouse/tab3;
>> Deleted hdfs://localhost:9000/user/hive/warehouse/tab3
>> [schakrab@diy-1-2 orch]$ hive
>> Hive history
>> file=/tmp/schakrab/hive_job_log_schakrab_201204231748_1985146177.txt
>> //no error thrown!
>> hive> select * from tab3;
>> OK
>> Time taken: 3.68 seconds
>> // of course. metadata still exists.
>> hive> desc tab3;
>> OK
>> c1 int
>> c2 int
>> Time taken: 0.127 seconds
>>
>> // doing another load recreates the directory tab3
>>
>> Shouldn't the select * query return an error when the underlying table
>> file is removed?
>>
>> -Sukhendu
>
>
> --
> Nitin Pawar
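Until Hive itself warns about this, a client-side check is one possible workaround. The sketch below wraps `hadoop fs -test -d`, which exits with status 0 when the path is a directory. The function names and the warning text are hypothetical, the table location is assumed to be known to the caller, and the `run` parameter exists only so the check can be exercised without a cluster.

```python
import subprocess


def table_dir_exists(path, run=subprocess.call):
    """True if `hadoop fs -test -d PATH` reports that PATH is a directory."""
    return run(["hadoop", "fs", "-test", "-d", path]) == 0


def check_before_select(table, path, run=subprocess.call):
    """Return a warning string if the table's data directory is missing.

    Returns None when the directory is present. This does not query the
    Hive metastore; the caller supplies the expected location.
    """
    if not table_dir_exists(path, run):
        return "WARNING: data directory %s for table %s is missing" % (path, table)
    return None


# Example with stand-in runners instead of a real hadoop binary.
missing = check_before_select("tab3", "/user/hive/warehouse/tab3",
                              run=lambda cmd: 1)
present = check_before_select("tab3", "/user/hive/warehouse/tab3",
                              run=lambda cmd: 0)
```

With the default `run`, the check shells out to the `hadoop` command on the PATH, so it should be called before issuing the select from the same client machine.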
Re: WITH clause support in HIVE
We are looking to build an abstraction layer on top of Hive where objects store Hive queries. These queries get executed when those objects are accessed by the user. There can be derived objects, e.g. y = op(x), where y is another Hive query built on top of the query represented by x, depending on the operator 'op'. We thought of using the WITH clause to achieve these extensions easily. However, since Hive does not support the WITH clause yet, we will look into the options you suggested.

Thanks,
-Sukhendu

On Tue, Apr 17, 2012 at 11:46 AM, Mark Grover wrote:
> Sukhendu,
> Perhaps you can use a view or derived table to accomplish the same.
>
> What's your use case?
>
> Mark Grover, Business Intelligence Analyst
> OANDA Corporation
>
> www: oanda.com www: fxtrade.com
> e: mgro...@oanda.com
>
> "Best Trading Platform" - World Finance's Forex Awards 2009.
> "The One to Watch" - Treasury Today's Adam Smith Awards 2009.
>
>
> - Original Message -
> From: "Sukhendu Chakraborty"
> To: user@hive.apache.org
> Sent: Tuesday, April 17, 2012 2:23:33 PM
> Subject: Re: WITH clause support in HIVE
>
> Never mind. It's not implemented yet:
> https://issues.apache.org/jira/browse/HIVE-1180
>
> On Tue, Apr 17, 2012 at 11:02 AM, Sukhendu Chakraborty
> wrote:
>> Resending, as I forgot to change the subject line in my last post. Pls
>> respond to this email.
>>
>> Does HIVE support the SQL 99 'WITH' clause yet? I did not find a definitive
>> answer on the web. I am in the process of installing HIVE locally, but
>> wanted to check with you before trying it.
>>
>> Thanks,
>> -Sukhendu
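The derived-table suggestion can be sketched as follows: each object holds a query string, and applying an operator wraps that query as an aliased subquery in the FROM clause, which stands in for WITH. The class name `HiveQuery`, the `apply` method, and the alias scheme are hypothetical and only illustrate the composition idea; nothing here executes against Hive.

```python
class HiveQuery:
    """Holds a query string; derived queries nest it as a FROM subquery."""

    def __init__(self, sql, depth=0):
        self.sql = sql
        self.depth = depth  # nesting level, used to generate distinct aliases

    def apply(self, select_list="*", where=None):
        """Build y = op(x): a new query selecting from this one."""
        alias = "q%d" % self.depth
        sql = "SELECT %s FROM (%s) %s" % (select_list, self.sql, alias)
        if where is not None:
            sql += " WHERE " + where
        return HiveQuery(sql, self.depth + 1)


# tab3(c1 int, c2 int) is borrowed from another thread purely as sample input.
x = HiveQuery("SELECT c1, c2 FROM tab3")
y = x.apply("c1", where="c2 > 4")
z = y.apply("count(*)")
```

Nothing runs until the final string is handed to Hive, which matches the "executed when the object is accessed" behaviour described above. A view would serve the same purpose when the intermediate query needs a stable name.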
Re: WITH clause support in HIVE
Never mind. It's not implemented yet:
https://issues.apache.org/jira/browse/HIVE-1180

On Tue, Apr 17, 2012 at 11:02 AM, Sukhendu Chakraborty wrote:
> Resending, as I forgot to change the subject line in my last post. Pls
> respond to this email.
>
> Does HIVE support the SQL 99 'WITH' clause yet? I did not find a definitive
> answer on the web. I am in the process of installing HIVE locally, but
> wanted to check with you before trying it.
>
> Thanks,
> -Sukhendu
WITH clause support in HIVE
Resending, as I forgot to change the subject line in my last post. Pls respond to this email.

Does HIVE support the SQL 99 'WITH' clause yet? I did not find a definitive answer on the web. I am in the process of installing HIVE locally, but wanted to check with you before trying it.

Thanks,
-Sukhendu
Re: Creating Hbase table with pre-splitted regions using Hive QL
Does HIVE support the SQL 99 'WITH' clause yet? I did not find a definitive answer on the web. I am in the process of installing HIVE locally, but wanted to check with you before trying it.

Thanks,
-Sukhendu