Re: log4j format logs in Hive table
Pig has a log loader in Piggybank. You can use it to generate the columns for that table and then point the table at its output. Take a look: https://github.com/apache/pig/tree/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/apachelog

Thanks,
Aniket

(In reply to Abhishek Pratap Singh, Tue, Dec 6, 2011 at 10:19 AM.)
Re: log4j format logs in Hive table
Hi Sangeetha,

One easier option is to use Flume decorators to insert a delimiter into your stream of data and then load the delimited data into the table.

For example, the line below can be converted to, say, pipe-delimited data (you can code for any delimiter) by using Flume decorators:

[2011-10-17 16:30:57,281] [ INFO] [33157362@qtp-28456974-0] [net.hp.tr.webservice.referenceimplcustomer.resource.CustomersResource] [Organization: Travelocity] [Client: AA] [Location of device: DFW] [User: 550393] [user_role: ] [CorelationId: 248] [Component: Crossplane] [Server: server01] [Request: seats=5] [Response: yes] [Status: pass] - Entering Method = getKey()

Pipe-delimited:

2011-10-17 16:30:57,281 | INFO |33157362@qtp-28456974-0|net.hp.tr.webservice.referenceimplcustomer.resource.CustomersResource|Organization: Travelocity|Client: AA|Location of device: DFW|User: 550393|user_role: |CorelationId: 248|Component: Crossplane|Server: server01|Request: seats=5|Response: yes|Status: pass| - Entering Method = getKey()

Once you have this pipe-delimited data, you can create a table with a pipe delimiter and load the file.

You can choose any delimiter, and you can also drop some of the data in the Flume decorator, then load the result into a Hive table with the same schema and delimiter. Hope it helps.

~Abhishek P Singh

(In reply to alo alt, Tue, Dec 6, 2011 at 7:58 AM.)
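The bracket-to-pipe conversion described above can be sketched outside Flume to check what the delimited output would look like. This is only an illustration in Python, not a Flume decorator: the function name `to_pipe_delimited` is hypothetical, and it assumes every field before the free-text message is wrapped in square brackets, as in the sample line. Unlike the hand-written sample, it trims whitespace inside each field and drops the dash before the message.

```python
import re

# One leading "[...]" field, optionally preceded by whitespace.
LEADING_FIELD = re.compile(r"\s*\[([^\]]*)\]")


def to_pipe_delimited(line, delimiter="|"):
    """Turn '[a] [b] ... - message' into 'a|b|...|message'.

    Assumption: all structured fields come first and are bracketed;
    whatever follows the last leading bracket is the message text.
    """
    fields = []
    pos = 0
    while True:
        match = LEADING_FIELD.match(line, pos)
        if match is None:
            break
        fields.append(match.group(1).strip())
        pos = match.end()

    # The remainder is the free-text message, e.g. "- Entering Method = ...".
    message = line[pos:].strip()
    if message.startswith("-"):
        message = message[1:].strip()
    fields.append(message)

    # Caveat: a field that itself contains the delimiter would break the
    # column layout; pick a delimiter that cannot occur in the data.
    return delimiter.join(fields)
```

For the sample line this yields 16 fields, which would correspond to a 16-column table declared with FIELDS TERMINATED BY '|'.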
Re: log4j format logs in Hive table
Hi Sangeetha,

Sorry, I was on the road and the answer took a while.

As Mark wrote, a SerDe is a good place to start. If it is useful for you, take a look at http://code.google.com/p/hive-json-serde/wiki/GettingStarted.

- alex

--
Alexander Lorenz
http://mapredit.blogspot.com

(In reply to sangeetha k, Tue, Dec 6, 2011 at 10:26 AM.)
Re: log4j format logs in Hive table
Hi Sangeetha,

Hive uses a SerDe (Serializer/Deserializer) to read data from and write data to HDFS. You have many options when choosing the SerDe for your table.

For example, if your file contains tab-delimited fields, you could use the default SerDe (by not specifying any SerDe) and specify the delimiter with FIELDS TERMINATED BY '\t' in your CREATE TABLE statement.

If you prefer, you could use the Regex SerDe (albeit with some performance overhead) by putting something like the following in your CREATE TABLE statement:

    ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
    WITH SERDEPROPERTIES (
      "input.regex" = ".*time:([^,]*)",
      "output.format.string" = "time:%1$s"
    )

As you get more familiar with Hive, you might find that you need to write your own UDF to parse the data.

Here is the link to the Hive wiki page for CREATE TABLE: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create%2FDropTable

Here is the link for UDFs: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

Welcome and good luck!

Mark

Mark Grover, Business Intelligence Analyst
OANDA Corporation

(In reply to sangeetha k, Tuesday, December 6, 2011 4:26:03 AM.)
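Before putting a pattern into "input.regex", it can help to try it against a real log line outside Hive. The sketch below does that in Python for the sample line posted in this thread. The pattern and the name `extract_columns` are assumptions made for illustration, not something from Mark's message: it captures the first four bracketed fields as separate columns and leaves the rest of the line in a fifth. As far as I know, the Regex SerDe needs the pattern to match the whole line, which `fullmatch` approximates here. Java and Python regex syntax agree for a pattern this simple, but backslashes would likely need to be doubled inside a HiveQL string literal.

```python
import re

# Hypothetical pattern for the sample log line in this thread.
# Each capture group would map to one STRING column in the table.
LOG_PATTERN = re.compile(
    r"\[([^\]]*)\] "     # group 1: timestamp
    r"\[\s*([^\]]*)\] "  # group 2: level (leading padding skipped)
    r"\[([^\]]*)\] "     # group 3: thread
    r"\[([^\]]*)\] "     # group 4: class name
    r"(.*)"              # group 5: remaining fields and message
)


def extract_columns(line):
    """Return the captured groups as a tuple, or None if the line
    does not match the pattern from start to end."""
    match = LOG_PATTERN.fullmatch(line)
    if match is None:
        return None
    return match.groups()
```

A line that does not match returns None here; checking a handful of real lines this way shows quickly whether the pattern is too strict.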
Re: log4j format logs in Hive table
Hi,

Thanks for the response. Yes, you understood my question.

An example line from my log looks like this:

[2011-10-17 16:30:57,281] [ INFO] [33157362@qtp-28456974-0] [net.hp.tr.webservice.referenceimplcustomer.resource.CustomersResource] [Organization: Travelocity] [Client: AA] [Location of device: DFW] [User: 550393] [user_role: ] [CorelationId: 248] [Component: Crossplane] [Server: server01] [Request: seats=5] [Response: yes] [Status: pass] - Entering Method = getKey()

How do I specify the delimiter when describing the table?

Thanks,
Sangeetha

(In reply to alo alt, Tuesday, December 6, 2011 2:01 PM.)
Re: log4j format logs in Hive table
Hi,

I hope I understood your question correctly. Did you describe your table? For example:

create TABLE YOURTABLE (row1 STRING, row2 STRING, row3 STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY 'YOUR TERMINATOR' STORED AS TEXTFILE;

row1, row2 and row3 are column names of your choice; for the data types, see the documentation.

After that, import the data via "insert (overwrite) table YOURTABLE".

- alex

--
Alexander Lorenz
http://mapredit.blogspot.com

(In reply to sangeetha k, Tue, Dec 6, 2011 at 8:56 AM.)
log4j format logs in Hive table
Hi,

I am new to Hive.

I am using a Flume agent to collect log4j logs and send them to HDFS. Now I want to load the log4j-format logs from HDFS into Hive tables. Each attribute in a log statement (timestamp, level, class name, etc.) should be loaded into a separate column in the Hive table.

I tried creating a table in Hive and loading the entire log line into one column, but I don't know how to load the attributes into separate columns.

Please send me your suggestions and any links or tutorials on this.

Thanks,
Sangeetha