Use Pig or Hive. There's a lot of overlap and some differences, and both projects' roadmaps point toward even more overlap, though I haven't heard any mention of the two converging or merging.
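For the ad hoc case in the original mail, something along these lines would be a minimal sketch, assuming the bundled logs can be exposed to Hive as a delimited table; the table name, columns, delimiter, and HDFS location below are made up for illustration, not the real log schema:

    -- Hypothetical external table over the bundled log data; column names,
    -- field layout, and location are assumptions, not the actual schema.
    CREATE EXTERNAL TABLE logs (
      user_id    STRING,
      event_time BIGINT,
      message    STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS SEQUENCEFILE
    LOCATION '/logs/bundled';

    -- Ad hoc query: one user's log lines within a time range (example values).
    -- Hive compiles this into a MapReduce job, so nothing has to be written by hand.
    SELECT message
    FROM   logs
    WHERE  user_id = 'user42'
      AND  event_time >= 1254355200
      AND  event_time <= 1254441600;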
Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR

----- Original Message ----
> From: Amandeep Khurana <ama...@gmail.com>
> To: common-user@hadoop.apache.org
> Sent: Friday, October 2, 2009 6:28:51 PM
> Subject: Re: indexing log files for adhoc queries - suggestions?
>
> Hive is an SQL-like abstraction over MapReduce. It just enables you
> to execute SQL-like queries over data without actually having to write
> the MR job. However, it converts the query into a job behind the scenes.
>
> HBase might be what you are looking for. You can put your logs into
> HBase and query them as well as run MR jobs over them...
>
> On 10/1/09, Mayuran Yogarajah wrote:
> > ishwar ramani wrote:
> >> Hi,
> >>
> >> I have a setup where logs are periodically bundled up and dumped into
> >> hadoop dfs as a large sequence file.
> >>
> >> It works fine for all my map reduce jobs.
> >>
> >> Now I need to handle ad hoc queries for pulling out logs based on user
> >> and time range.
> >>
> >> I really don't need a full indexer (like Lucene) for this purpose.
> >>
> >> My first thought is to run a periodic MapReduce job to generate a large
> >> text file sorted by user id.
> >>
> >> The text file will have (sequence file name, offset) to retrieve the logs
> >> ....
> >>
> >> I am guessing many of you ran into similar requirements... Any
> >> suggestions on doing this better?
> >>
> >> ishwar
> >>
> > Have you looked into Hive? It's perfect for ad hoc queries..
> >
> > M
>
>
> --
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
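If a periodically rebuilt, user-sorted extract is still wanted (the plan ishwar sketched above), Hive can materialize something similar. This is only a rough analogue of that idea, not the (sequence file name, offset) index itself, and every table and column name in it is made up:

    -- Hypothetical target table for a periodic, user-sorted extract of the logs.
    CREATE TABLE logs_by_user (
      user_id    STRING,
      event_time BIGINT,
      message    STRING
    )
    STORED AS SEQUENCEFILE;

    -- Rebuild the extract periodically; rows are grouped and sorted by user id
    -- so per-user, per-time-range pulls stay cheap.
    INSERT OVERWRITE TABLE logs_by_user
    SELECT user_id, event_time, message
    FROM   logs
    DISTRIBUTE BY user_id
    SORT BY user_id, event_time;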