use the request column in apache access.log as the source of the Hadoop table

2010-11-23 Thread liad livnat
Hi All

I'm facing a problem and need your help.

*I would like to use the request column in apache access.log as the source
of the Hadoop table.*

I was able to insert the entire log file into a table, but I would like to
insert a *specific request into a specific table*. *The question is*: is this
possible without an additional script? If so, how?

The following example should demonstrate what we are looking for:



1.   Suppose we have the following log file

a.   XXX.16.3.221 - - [22/Nov/2010:23:57:09 -0800] "GET
/includes/Entity1.ent?ClientID=1189272&DayOfWeek=2&Sent=OK&WeekStart=31%2000:00:00
HTTP/1.1" 200 1150 "-" "-"

2.   And the following appropriate table

CREATE TABLE Entity1 (
  Id INT,
  DayOfWeek INT,
  Sent STRING,
  WeekStart INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

3.   The following query: select * from Entity1 - should return:
1189272, 2, OK, 31
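For comparison with a script-free answer, here is a sketch of the scripted route: Hive can pipe rows through an external script with its TRANSFORM clause, or the log can be pre-processed before loading. Assuming the parameter names from the example above (ClientID, DayOfWeek, Sent, WeekStart) and the usual `&` separators in the query string (the archive appears to have stripped them), a small Python filter could emit the comma-delimited rows the Entity1 table expects:

```python
import re
import sys

# Matches the query parameters of a request line such as:
#   GET /includes/Entity1.ent?ClientID=1189272&DayOfWeek=2&Sent=OK&... HTTP/1.1
# Parameter names come from the example above; the '&' separators are assumed.
PATTERN = re.compile(
    r'ClientID=(?P<id>\d+)&DayOfWeek=(?P<dow>\d+)'
    r'&Sent=(?P<sent>\w+)&WeekStart=(?P<ws>\d+)'
)

def extract_row(log_line):
    """Return 'Id,DayOfWeek,Sent,WeekStart' for one log line, or None."""
    m = PATTERN.search(log_line)
    if m is None:
        return None
    return ','.join(m.group('id', 'dow', 'sent', 'ws'))

if __name__ == '__main__':
    # Read raw log lines on stdin, write delimited rows on stdout, so the
    # script can sit behind Hive's TRANSFORM(...) USING clause or run as a
    # pre-processing step before LOAD DATA.
    for line in sys.stdin:
        row = extract_row(line)
        if row is not None:
            print(row)
```

Lines whose request does not carry the expected parameters (e.g. a favicon request) are simply skipped.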



1.   Have you done something like this before?

2.   Suppose the request string was encoded with base64; is there a way to
decode it, or do we need to use a Python script for that?

3.   One last question: can you give an example of how you use Python, i.e.,
what do you use it for?
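On the base64 question (2): the decoding itself does not require anything Hive-specific; if a script is acceptable, Python's standard library covers it. A minimal sketch, assuming the value is plain base64 over UTF-8 text:

```python
import base64

def decode_request(encoded):
    """Decode a base64-encoded request string back to text.

    Assumes plain base64 over UTF-8 text, which may not match how the
    poster's logs were actually encoded.
    """
    return base64.b64decode(encoded).decode('utf-8')
```

A round trip illustrates it: `decode_request(base64.b64encode(b'GET /x').decode('ascii'))` returns `'GET /x'`.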



Thanks in advance,

Liad.


Example of an automatic insertion process from apache access.log to a Hadoop table using Hive

2010-11-23 Thread liad livnat
Hi,
1. Can someone provide me with an example of an automatic insertion process
from apache access.log to a Hadoop table using Hive?
2. Can someone explain whether there is a way to point a Hadoop table directly
at a directory that serves as its data source (e.g., copying a file into the
directory, so that when I run a select, Hive automatically refers to the
directory and searches all the files in it)?
Thanks,
Liad.
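One way to sketch an answer to question 2: a Hive table is backed by a directory, and a SELECT scans every file under it, so "automatic insertion" can be as simple as copying each new access.log into that directory (via `LOAD DATA INPATH` in Hive, or `hadoop fs -put` on the command line). A minimal Python wrapper, with a hypothetical warehouse path:

```python
import subprocess

# Hypothetical table directory: adjust to the actual warehouse path or the
# EXTERNAL TABLE ... LOCATION path on your cluster.
TABLE_DIR = '/user/hive/warehouse/entity1'

def build_put_command(local_path, table_dir=TABLE_DIR):
    """Build the 'hadoop fs -put' command that copies a log file into the
    table's directory; every file there is visible to subsequent SELECTs."""
    return ['hadoop', 'fs', '-put', local_path, table_dir]

def load_log(local_path, table_dir=TABLE_DIR):
    """Copy one local log file into the table's HDFS directory."""
    subprocess.check_call(build_put_command(local_path, table_dir))
```

Running this from cron (or whatever triggers log rotation) would give the "automatic" behavior asked about, since Hive re-scans the directory on each query.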

