Hi,

Hadoop runs on a Linux box (mostly) and can run as a standalone installation for testing only. If you decide to use Hadoop with Hive or HBase, you have to face many more tasks:
- installation (Whirr and Amazon EC2, for example)
- write your own MapReduce job, or use Hive / HBase
- set up Sqoop with the Teradata driver

You can easily set up parts 1 and 2 with Amazon's EC2; I think you can also book Windows servers there. For a single query, that is the best option, I think, before you install your own Hadoop cluster.

best,
Alex

--
Alexander Lorenz
http://mapredit.blogspot.com

On Feb 6, 2012, at 8:11 AM, Ali Jooan Rizvi wrote:

> Hi,
>
> I would like to know if Hadoop will be of help to me? Let me explain
> my scenario:
>
> I have a Windows Server based single machine with 16 cores and 48 GB
> of physical memory. In addition, I have 120 GB of virtual memory.
>
> I am running a query with a statistical calculation on a large data set
> of over 1 billion rows, on SAS. In this case, SAS is acting like a
> database on which both the source and target tables reside. For storage,
> I can keep the source and target data on Teradata as well, but the query
> containing a patent can only be run on the SAS interface.
>
> The problem is that SAS is taking many days (25 days) to run it (a
> single query with a statistical function), and not all cores were used
> all the time; rather, merely 5% CPU was utilized on average. However,
> memory utilization was very high, and that's why the large virtual
> memory was used.
>
> Can I have a Hadoop interface in place to do it all, so that I may end
> up running the query in less time, say 1 or 2 days? Anything squeezing
> my run time will be very helpful.
>
> Thanks
>
> Ali Jooan Rizvi
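PS: To give a feel for the "write your own MapReduce job" step, here is a minimal Hadoop Streaming style sketch in Python. It assumes tab-separated rows with a group key in the first column and a numeric value in the second, and uses a per-group mean as a stand-in statistic (the actual patented SAS calculation is unknown, so this is only an illustration of the map/shuffle/reduce shape):

```python
from itertools import groupby

def map_line(line):
    """Mapper: emit (group_key, value) from one tab-separated input row.
    The column layout is an assumption -- adjust it to the real data."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], float(fields[1])

def reduce_group(key, values):
    """Reducer: compute a per-group mean. The real statistical function
    from the SAS query would replace this body."""
    vals = list(values)
    return key, sum(vals) / len(vals)

def run_local(lines):
    """Local simulation of the map -> shuffle (sort) -> reduce cycle,
    useful for testing the logic before submitting to a cluster."""
    pairs = sorted(map_line(line) for line in lines)
    return [reduce_group(key, (v for _, v in grp))
            for key, grp in groupby(pairs, key=lambda kv: kv[0])]
```

On a real cluster the mapper and reducer would read stdin and write stdout and be passed to the `hadoop jar hadoop-streaming.jar` command; the local driver above just makes the logic testable on a laptop first.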