Re: log4j format logs in Hive table

2011-12-06 Thread Aniket Mokashi
Pig has a log loader in Piggybank. You can use it to generate the columns
of that table and then point the table at the result.

Take a look:
https://github.com/apache/pig/tree/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/apachelog

Thanks,
Aniket



-- 
"...:::Aniket:::... Quetzalco@tl"


Re: log4j format logs in Hive table

2011-12-06 Thread Abhishek Pratap Singh
Hi Sangeetha,

An easier option is to use Flume decorators to insert a delimiter into
your stream of data, and then load the data into the table.

For example, the data below can be converted to, say, pipe-delimited data
(you can code for any delimiter) by using Flume decorators.

[2011-10-17 16:30:57,281] [ INFO] [33157362@qtp-28456974-0]
[net.hp.tr.webservice.referenceimplcustomer.resource.CustomersResource]
[Organization: Travelocity] [Client: AA] [Location of device: DFW] [User:
550393] [user_role: ] [CorelationId: 248] [Component: Crossplane] [Server:
server01] [Request: seats=5] [Response: yes] [Status: pass] - Entering
Method = getKey()

Pipe-delimited:
2011-10-17 16:30:57,281 |  INFO
|33157362@qtp-28456974-0|net.hp.tr.webservice.referenceimplcustomer.resource.CustomersResource|Organization:
Travelocity|Client: AA|Location of device: DFW|User: 550393|user_role:
|CorelationId: 248|Component: Crossplane|Server: server01|Request:
seats=5|Response: yes|Status: pass| - Entering Method = getKey()

Once you have this pipe-delimited data, you can create a table with a
pipe delimiter and load the file into it.
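
To make the transformation concrete, here is a minimal Python sketch of the same conversion. It is not a Flume decorator and does not use any Flume API; the function name to_delimited is made up for illustration, and it assumes each log event arrives as a single line with the bracket groups at the start. Unlike the hand-written example above, it also trims the padding inside the brackets.

```python
import re

# One "[...]" group; the capture is the text between the brackets.
BRACKETED = re.compile(r"\[([^\]]*)\]")

# The sample line from this thread, joined back into a single line.
SAMPLE = (
    "[2011-10-17 16:30:57,281] [ INFO] [33157362@qtp-28456974-0] "
    "[net.hp.tr.webservice.referenceimplcustomer.resource.CustomersResource] "
    "[Organization: Travelocity] [Client: AA] [Location of device: DFW] "
    "[User: 550393] [user_role: ] [CorelationId: 248] [Component: Crossplane] "
    "[Server: server01] [Request: seats=5] [Response: yes] [Status: pass] "
    "- Entering Method = getKey()"
)


def to_delimited(line, delimiter="|"):
    """Turn '[a] [b] ... - message' into 'a|b|...|message' (hypothetical helper)."""
    fields = []
    pos = 0
    while True:
        # Only accept a bracket group that starts exactly at the current position.
        match = BRACKETED.match(line, pos)
        if match is None:
            break
        fields.append(match.group(1).strip())
        pos = match.end()
        # Skip the whitespace that separates two bracket groups.
        while pos < len(line) and line[pos].isspace():
            pos += 1
    # Whatever follows the last leading bracket group is the free-text message.
    fields.append(line[pos:].lstrip("- ").strip())
    return delimiter.join(fields)


print(to_delimited(SAMPLE))
```

Note that this only works if the chosen delimiter never appears inside a field; otherwise pick a different delimiter.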

You can choose any delimiter, and you can also drop some of the data in the
Flume decorator, before finally loading into a Hive table with the same
schema and delimiter.
Hope it helps.

~Abhishek P Singh



Re: log4j format logs in Hive table

2011-12-06 Thread alo alt
Hi Sangeetha,

Sorry, I was on the road, so the answer took a while.

As Mark wrote, a SerDe is a good start. If it's useful for you, take a
look at http://code.google.com/p/hive-json-serde/wiki/GettingStarted.

- alex




-- 
Alexander Lorenz
http://mapredit.blogspot.com

Think of the environment: please don't print this email unless you
really need to.


Re: log4j format logs in Hive table

2011-12-06 Thread Mark Grover
Hi Sangeetha,
Hive uses a SerDe (Serializer/Deserializer) for reading data from and
writing data to HDFS. You have many options when choosing the SerDe for
your table.
For example, if your file contains tab-delimited fields, you could use the
default SerDe (by not specifying any SerDe) and specify the delimiter with
FIELDS TERMINATED BY '\t'
in your create table statement.

If you prefer, you could use the RegexSerDe (albeit with some performance
overhead), using something like:

ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = ".*time:([^,]*)",
"output.format.string" = "time:%1$s")

in your create table statement.
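
Before putting a pattern into "input.regex", it can help to try it outside Hive first. The Python sketch below does that for the sample log line from this thread. The pattern, the column names and the helper parse_line are assumptions made up for illustration, not something from Hive itself. As far as I know, RegexSerDe maps each capture group to one column in order and gives NULL columns for a line that does not match as a whole, which the sketch imitates with None. Inside the DDL string the backslashes would most likely need to be doubled.

```python
import re

# Hypothetical pattern: the first four bracket groups become columns,
# everything after them is kept as one "rest" column.
LOG_PATTERN = re.compile(
    r"\[([^\]]*)\] \[ *([^\]]*)\] \[([^\]]*)\] \[([^\]]*)\] (.*)"
)
COLUMNS = ("ts", "level", "thread", "class_name", "rest")

# The sample line from this thread, joined back into a single line.
SAMPLE = (
    "[2011-10-17 16:30:57,281] [ INFO] [33157362@qtp-28456974-0] "
    "[net.hp.tr.webservice.referenceimplcustomer.resource.CustomersResource] "
    "[Organization: Travelocity] [Client: AA] [Location of device: DFW] "
    "[User: 550393] [user_role: ] [CorelationId: 248] [Component: Crossplane] "
    "[Server: server01] [Request: seats=5] [Response: yes] [Status: pass] "
    "- Entering Method = getKey()"
)


def parse_line(line):
    """Return a column -> value dict; all values are None if the line does not match."""
    match = LOG_PATTERN.fullmatch(line)  # the whole line has to match
    if match is None:
        return dict.fromkeys(COLUMNS)
    return dict(zip(COLUMNS, match.groups()))


row = parse_line(SAMPLE)
for name in COLUMNS:
    print(name, "=", row[name])
```

The remaining "[Key: value]" groups could get their own capture groups in the same way, at the cost of a longer pattern.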

As you get more familiar with Hive, you may find that you need to write
your own UDF to parse the data.

Here is the link to the Hive wiki for Create Table:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create%2FDropTable

Here is the link for UDFs:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF


Welcome and good luck!
Mark

Mark Grover, Business Intelligence Analyst
OANDA Corporation 




Re: log4j format logs in Hive table

2011-12-06 Thread sangeetha k
Hi,
 
Thanks for the response.
Yes, you understood my question.

An example of one of my log lines is below:
 
[2011-10-17 16:30:57,281] [ INFO] [33157362@qtp-28456974-0] 
[net.hp.tr.webservice.referenceimplcustomer.resource.CustomersResource] 
[Organization: Travelocity] [Client: AA] [Location of device: DFW] [User: 
550393] [user_role: ] [CorelationId: 248] [Component: Crossplane] [Server: 
server01] [Request: seats=5] [Response: yes] [Status: pass] - Entering Method = 
getKey() 
 
How do I specify the delimiter when defining the table?
 
Thanks,
Sangeetha




Re: log4j format logs in Hive table

2011-12-06 Thread alo alt
Hi,

I hope I understood your question correctly. Did you define your table?
For example:
"create TABLE YOURTABLE (row1 STRING, row2 STRING, row3 STRING) ROW FORMAT
DELIMITED FIELDS TERMINATED BY 'YOUR TERMINATOR' STORED AS TEXTFILE;"

row* = a name of your choice; for the data types, see the documentation.

Afterwards, import the data via "insert (overwrite) table YOURTABLE".
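
To illustrate what FIELDS TERMINATED BY does with each line, here is a small Python sketch. It is only a model of my understanding of the default text SerDe, not Hive code: the line is split on the delimiter, values are assigned to the declared columns in order, missing trailing fields become NULL (None here), and extra fields are ignored. The helper name split_row is made up for the example.

```python
def split_row(line, columns, delimiter="|"):
    """Map one delimited text line onto the declared columns (hypothetical helper)."""
    values = line.rstrip("\n").split(delimiter)
    # Drop fields beyond the declared columns, pad missing ones with None.
    values = values[:len(columns)] + [None] * (len(columns) - len(values))
    return dict(zip(columns, values))


columns = ("row1", "row2", "row3")
print(split_row("2011-10-17 16:30:57,281|INFO|getKey()", columns))
print(split_row("only-one-field", columns))
```

So the delimiter in the table definition has to be exactly the one used in the file, and the column order has to follow the field order.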

- alex





-- 
Alexander Lorenz
http://mapredit.blogspot.com



log4j format logs in Hive table

2011-12-05 Thread sangeetha k
Hi,
 
I am new to Hive.
 
I am using a Flume agent to collect log4j logs and send them to HDFS.
Now I want to load the log4j-format logs from HDFS into Hive tables.
Each of the attributes in the log statements (timestamp, level, classname,
etc.) should be loaded into a separate column in the Hive tables.

I tried creating a table in Hive and loaded the entire log into one column, but
I don't know how to load the above-mentioned data into separate columns.

Please send me your suggestions and any links or tutorials on this.
 
Thanks,
Sangeetha