Re: log4j format logs in Hive table

2011-12-06 Thread alo alt
Hi, I hope I understood your question correctly: did you describe your table? For example: CREATE TABLE YOURTABLE (row1 STRING, row2 STRING, row3 STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY 'YOUR TERMINATOR' STORED AS TEXTFILE; row* = a name of your choice; for the data types, see the documentation. After

Re: log4j format logs in Hive table

2011-12-06 Thread sangeetha k
Hi, Thanks for the response. Yes, you understood my question. An example of one of my log message lines is below: [2011-10-17 16:30:57,281] [ INFO] [33157362@qtp-28456974-0] [net.hp.tr.webservice.referenceimplcustomer.resource.CustomersResource] [Organization: Travelocity] [Client: AA] [Location
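
One way to make a line like the sample above loadable into a table declared with ROW FORMAT DELIMITED is to rewrite it as tab-separated fields before putting it on HDFS. The following is only a minimal sketch, not anything proposed in the thread: the function name log_line_to_tsv is hypothetical, the sample contains only the complete bracketed fields visible in the message (the trailing field is cut off in the archive and is left out), and it assumes no field contains a tab or a closing bracket.

```python
import re

# Matches one complete "[...]" group; an unterminated trailing group is ignored.
BRACKETED = re.compile(r"\[([^\]]*)\]")


def log_line_to_tsv(line: str) -> str:
    """Return the bracketed fields of one log line, stripped and joined by tabs.

    Hypothetical helper: assumes fields contain neither tabs nor ']'.
    """
    fields = [field.strip() for field in BRACKETED.findall(line)]
    return "\t".join(fields)


# Only the fields that are fully visible in the archived message.
sample = (
    "[2011-10-17 16:30:57,281] [ INFO] [33157362@qtp-28456974-0] "
    "[net.hp.tr.webservice.referenceimplcustomer.resource.CustomersResource] "
    "[Organization: Travelocity] [Client: AA]"
)

print(log_line_to_tsv(sample))
```

A table reading this output would use FIELDS TERMINATED BY '\t' with one STRING column per field.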

Hive query taking too much time

2011-12-06 Thread Savant, Keshav
Hi All, My setup is hadoop-0.20.203.0 and hive-0.7.1. I have a 5-node cluster in total: 4 data nodes and 1 namenode (which also acts as the secondary namenode). On the namenode I have set up Hive with HiveDerbyServerMode to support multiple Hive server connections. I have inserted plain text

Re: Hive Reducers hanging - interesting problem - skew ?

2011-12-06 Thread john smith
Hi Mark, Thanks for your response. I tried skew optimization and I also saw the video by Lin and Namit. From what I understand about skew join, instead of doing it in a single go, they divide it into 2 stages. Stage 1: join the non-skew pairs and write the skew pairs into temporary files on HDFS. Stage 2: do a

Re: Hive query taking too much time

2011-12-06 Thread Wojciech Langiewicz
Hi, In your case the total file size isn't the main factor that reduces performance; the number of files is. To test this, try merging those 2000+ files into one (or a few) big files, then upload the result to HDFS and test Hive performance (it should definitely be higher). If this works you should think about
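
The merge test suggested above could be sketched on the local filesystem roughly as follows, before uploading the result to HDFS. This is an illustrative sketch only: the function name merge_small_files, the "*.txt" pattern, and the paths are hypothetical, and it assumes the inputs are line-oriented text files small enough to read into memory one at a time.

```python
from pathlib import Path


def merge_small_files(source_dir: str, merged_path: str, pattern: str = "*.txt") -> int:
    """Concatenate every file in source_dir matching pattern into merged_path.

    Files are appended in sorted name order. A newline is added after any
    non-empty file that does not already end with one, so the last record
    of one file never runs into the first record of the next.
    Returns the number of files merged.
    """
    # The source list is taken before the output file is created, so a new
    # output file written inside source_dir is not merged into itself.
    sources = sorted(Path(source_dir).glob(pattern))
    with open(merged_path, "wb") as merged:
        for source in sources:
            data = source.read_bytes()
            merged.write(data)
            if data and not data.endswith(b"\n"):
                merged.write(b"\n")
    return len(sources)
```

For example, merge_small_files("/data/logs", "/data/merged/logs.txt") would leave a single file that can then be uploaded with hadoop fs -put and queried to compare timings against the many-files layout.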

Re: Hive query taking too much time

2011-12-06 Thread Mohit Gupta
Hi Paul, I am having the same problem. Do you know any efficient way of merging the files? -Mohit On Tue, Dec 6, 2011 at 8:14 PM, Paul Mackles pmack...@adobe.com wrote: How much time is it spending in the map/reduce phases, respectively? The large number of files could be creating a lot of

Hive web console - schema is empty

2011-12-06 Thread sangeetha k
Hi, I opened the web console for Hive using http://localhost:/hwi In the Browse Schema option, I can see only the default Hive table list name and description, and I am not able to view the tables. What could be the issue? I have created 2 tables under the default schema, but I am not able to see

Re: Hive web console - schema is empty

2011-12-06 Thread sangeetha k
I get this error message in the console: 11/12/06 08:14:50 INFO DataNucleus.MetaData: Registering listener for metadata initialisation 11/12/06 08:14:50 INFO metastore.ObjectStore: Initialized ObjectStore 11/12/06 08:14:50 WARN DataNucleus.MetaData: MetaData Parser encountered an error in

Re: log4j format logs in Hive table

2011-12-06 Thread Mark Grover
Hi Sangeetha, Hive uses SerDe (Serializer/Deserializer) for reading data from and writing to HDFS. You have many options for choosing the SerDe for your table. For example, if your file contains tab delimited fields, you could use the default SerDe (by not specifying any SerDe) and specify the

Re: log4j format logs in Hive table

2011-12-06 Thread alo alt
Hi Sangeetha, sorry, I was on the road and the answer took a while. As Mark wrote, SerDe will be a good start. If it's useful for you, take a look at http://code.google.com/p/hive-json-serde/wiki/GettingStarted. - alex On Tue, Dec 6, 2011 at 10:26 AM, sangeetha k get2sa...@yahoo.com wrote: Hi,

Re: Hive Reducers hanging - interesting problem - skew ?

2011-12-06 Thread Aaron Sun
Can you try "from B join A"? One simple rule for joins in Hive is: largest table last. The smaller tables can then be buffered into distributed cache for fast retrieval and comparison. Thanks Aaron On Tue, Dec 6, 2011 at 4:01 AM, john smith js1987.sm...@gmail.com wrote: Hi Mark, Thanks for your

Re: How to see the intermediate results between AST and optimized logical query plan.

2011-12-06 Thread Mohit Gupta
Hi, I am trying to understand the output of Hive's EXPLAIN command. I found the documentation provided ( https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain ) to be of little help. Is there any other place where I can find detailed documentation on this? Hiroyuki, were you

Re: log4j format logs in Hive table

2011-12-06 Thread Aniket Mokashi
Pig has a Log loader in Piggybank. You can use that to generate the columns of that table and make the table point to it. Take a look-- https://github.com/apache/pig/tree/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/apachelog Thanks, Aniket On Tue, Dec 6, 2011 at

Re: Hive query taking too much time

2011-12-06 Thread Ayon Sinha
How about a simple Pig script with a load and a store statement? Set the max # of reducers to, say, 20 or 30; that way you will only have 20-30 files as output. Then put these files in the Hive dir. Make sure to match the delimiters in Hive and Pig. -Ayon