Use Pig or Hive.  There is a lot of overlap and some differences, and both 
projects' roadmaps seem to point toward even more overlap, though I haven't 
heard any mention of convergence or merging.
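
For the user + time-range case in the original question, the query could look 
roughly like the sketch below. This is an untested illustration, not a recipe: 
the table name, column names, and the JDBC route are assumptions (the Hive CLI 
accepts the same SELECT statement).

// Rough sketch: running an ad hoc log query through Hive's JDBC driver.
// The table/column names (logs, user_id, ts, line) are made up for illustration.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class AdhocLogQuery {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
    Connection con =
        DriverManager.getConnection("jdbc:hive://localhost:10000/default", "", "");
    Statement stmt = con.createStatement();
    // Hive compiles this into MapReduce job(s) behind the scenes.
    ResultSet res = stmt.executeQuery(
        "SELECT ts, line FROM logs " +
        "WHERE user_id = 'u123' AND ts >= 1254400000 AND ts <= 1254500000");
    while (res.next()) {
      System.out.println(res.getString(1) + "\t" + res.getString(2));
    }
  }
}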

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



----- Original Message ----
> From: Amandeep Khurana <ama...@gmail.com>
> To: common-user@hadoop.apache.org
> Sent: Friday, October 2, 2009 6:28:51 PM
> Subject: Re: indexing log files for adhoc queries - suggestions?
> 
> Hive is an SQL-like abstraction over MapReduce. It lets you run
> SQL-like queries over your data without writing the MR job yourself;
> under the hood it compiles each query into a MapReduce job.
> 
> HBase might be what you are looking for. You can put your logs into
> HBase and query them, as well as run MR jobs over them...
> 
> On 10/1/09, Mayuran Yogarajah wrote:
> > ishwar ramani wrote:
> >> Hi,
> >>
> >> I have a setup where logs are periodically bundled up and dumped into
> >> hadoop dfs as large sequence file.
> >>
> >> It works fine for all my map reduce jobs.
> >>
> >> Now I need to handle ad hoc queries that pull out logs by user
> >> and time range.
> >>
> >> I really don't need a full indexer (like Lucene) for this purpose.
> >>
> >> My first thought is to run a periodic MapReduce job to generate a large
> >> text file sorted by user id.
> >>
> >> The text file will have (sequence file name, offset) pairs for retrieving the logs
> >> ....
> >>
> >>
> >> I am guessing many of you have run into similar requirements... Any
> >> suggestions on doing this better?
> >>
> >> ishwar
> >>
> > Have you looked into Hive? It's perfect for ad hoc queries.
> >
> > M
> >
> 
> 
> -- 
> 
> 
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
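
To make the HBase suggestion above concrete: if each log line is keyed by user 
id plus timestamp, the user + time-range lookup becomes a plain row scan. A 
minimal sketch follows; the table name, column family, row-key layout, and the 
sample values are all assumptions, and the client API shown is the 0.20-era one 
(details vary by HBase version).

// Rough sketch, HBase 0.20-style client API; names and key layout are assumptions.
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class UserLogStore {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(new HBaseConfiguration(), "userlogs");

    // Write: row key = userId + '/' + zero-padded timestamp, raw line in one cell.
    Put put = new Put(Bytes.toBytes("u123/0000001254450000"));
    put.add(Bytes.toBytes("raw"), Bytes.toBytes("line"),
            Bytes.toBytes("2009-10-02 18:28:51 GET /index.html 200"));
    table.put(put);

    // Read: all rows for user u123 between two timestamps become one row scan.
    Scan scan = new Scan(Bytes.toBytes("u123/0000001254400000"),
                         Bytes.toBytes("u123/0000001254500000"));
    ResultScanner scanner = table.getScanner(scan);
    for (Result r : scanner) {
      System.out.println(Bytes.toString(
          r.getValue(Bytes.toBytes("raw"), Bytes.toBytes("line"))));
    }
    scanner.close();
  }
}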
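
And for the (sequence file name, offset) index ishwar described in the original 
question, a bare-bones sketch of how the offsets could be collected is below. 
The Text key/value types and the userFromLogLine() helper are placeholders, the 
output lines would still need to be sorted by user id (e.g. with a small 
MapReduce job), and block-compressed sequence files would need sync points 
rather than raw record offsets.

// Sketch: walk a sequence file and emit "userId <TAB> fileName <TAB> byteOffset"
// for every record; the sorted output becomes the lookup file.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class LogIndexer {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path path = new Path(args[0]);
    FileSystem fs = path.getFileSystem(conf);

    // Assumes the sequence file was written with Text keys and Text values.
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
    Text key = new Text();
    Text value = new Text();
    long offset = reader.getPosition();   // position of the record about to be read
    while (reader.next(key, value)) {
      String user = userFromLogLine(value.toString());
      System.out.println(user + "\t" + path.getName() + "\t" + offset);
      offset = reader.getPosition();      // later: SequenceFile.Reader.seek(offset)
    }
    reader.close();
  }

  // Placeholder: extract the user id however it is actually encoded in the line.
  private static String userFromLogLine(String line) {
    return line.split("\\s+")[0];
  }
}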
