[jira] [Updated] (PIG-3215) [piggybank] Add LTSVLoader to load LTSV (Labeled Tab-separated Values) files

2013-04-16 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3215:


Status: Open  (was: Patch Available)

> [piggybank] Add LTSVLoader to load LTSV (Labeled Tab-separated Values) files
> 
>
> Key: PIG-3215
> URL: https://issues.apache.org/jira/browse/PIG-3215
> Project: Pig
> Issue Type: New Feature
> Components: piggybank
> Reporter: MIYAKAWA Taku
> Assignee: MIYAKAWA Taku
> Labels: piggybank
> Attachments: LTSVLoader-6.html, LTSVLoader.html, PIG-3215-6.patch, PIG-3215.patch
>
>
> LTSV (Labeled Tab-separated Values) is becoming popular in Japan as a format 
> for log files, especially web server logs. The goal of this issue is to add 
> an LTSVLoader to PiggyBank for loading LTSV files.
> LTSV is based on TSV, so columns are separated by tab characters. 
> In addition, each column consists of a label and a value, separated by a ":" 
> character.
> Read more about LTSV at http://ltsv.org/.
> h4. Example LTSV file (access.log)
> Columns are separated by tab characters.
> {noformat}
> host:host1.example.org	req:GET /index.html	ua:Opera/9.80
> host:host1.example.org	req:GET /favicon.ico	ua:Opera/9.80
> host:pc.example.com	req:GET /news.html	ua:Mozilla/5.0
> {noformat}
> h4. Usage 1: Extract fields from each line
> Users can specify an input schema and get columns as Pig fields.
> This example loads the LTSV file shown in the previous section.
> {code}
> -- Parses the access log and counts the number of lines
> -- for each pair of the host column and the ua column.
> access = LOAD 'access.log' USING 
> org.apache.pig.piggybank.storage.LTSVLoader('host:chararray, ua:chararray');
> grouped_access = GROUP access BY (host, ua);
> count_for_host_ua = FOREACH grouped_access GENERATE group.host, group.ua, 
> COUNT(access);
> DUMP count_for_host_ua;
> {code}
> The following output is printed:
> {noformat}
> (host1.example.org,Opera/9.80,2)
> (pc.example.com,Mozilla/5.0,1)
> {noformat}
> h4. Usage 2: Extract a map from each line
> Users can also load each LTSV line as a map. Each key of the map is the 
> label of an LTSV column, and the corresponding value is the text after ":" 
> in that column.
> {code}
> -- Parses the access log and projects the user agent field.
> access = LOAD 'access.log' USING 
> org.apache.pig.piggybank.storage.LTSVLoader() AS (m:map[]);
> user_agent = FOREACH access GENERATE m#'ua' AS ua;
> DUMP user_agent;
> {code}
> The following output is printed:
> {noformat}
> (Opera/9.80)
> (Opera/9.80)
> (Mozilla/5.0)
> {noformat}
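
For readers unfamiliar with the format, the parsing rule described above (tab-separated columns, each split into a label and a value at ":") can be sketched in a few lines of Python. This is only an illustration of the format, not the LTSVLoader implementation from the patch. The function name parse_ltsv_line is hypothetical, and two behaviours are assumptions rather than anything stated in the issue: the label is taken to end at the first ":", and columns without a ":" are skipped silently.

```python
from collections import Counter


def parse_ltsv_line(line):
    """Split one LTSV line into a {label: value} dict (illustrative helper)."""
    record = {}
    for column in line.rstrip("\r\n").split("\t"):
        if not column:
            continue  # ignore empty columns, e.g. from a trailing tab
        # Assumption: the label ends at the first ":"; the value may contain ":".
        label, sep, value = column.partition(":")
        if not sep:
            continue  # assumption: skip columns that have no ":" at all
        record[label] = value
    return record


# The three lines of the example access.log, with explicit tab separators.
SAMPLE = (
    "host:host1.example.org\treq:GET /index.html\tua:Opera/9.80\n"
    "host:host1.example.org\treq:GET /favicon.ico\tua:Opera/9.80\n"
    "host:pc.example.com\treq:GET /news.html\tua:Mozilla/5.0\n"
)

records = [parse_ltsv_line(line) for line in SAMPLE.splitlines()]

# Analogue of Usage 1: count lines for each (host, ua) pair.
counts = Counter((r.get("host"), r.get("ua")) for r in records)

# Analogue of Usage 2: treat each line as a map and project the "ua" key.
user_agents = [r.get("ua") for r in records]
```

Here counts plays the role of the GROUP/COUNT script in Usage 1, and user_agents corresponds to the m#'ua' projection in Usage 2.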

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3215) [piggybank] Add LTSVLoader to load LTSV (Labeled Tab-separated Values) files

2013-03-09 Thread MIYAKAWA Taku (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

MIYAKAWA Taku updated PIG-3215:
---

Attachment: LTSVLoader-6.html
PIG-3215-6.patch



[jira] [Updated] (PIG-3215) [piggybank] Add LTSVLoader to load LTSV (Labeled Tab-separated Values) files

2013-02-24 Thread MIYAKAWA Taku (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

MIYAKAWA Taku updated PIG-3215:
---

Status: Patch Available  (was: Open)

> [piggybank] Add LTSVLoader to load LTSV (Labeled Tab-separated Values) files
> 
>
> Key: PIG-3215
> URL: https://issues.apache.org/jira/browse/PIG-3215
> Project: Pig
> Issue Type: New Feature
> Components: piggybank
> Reporter: MIYAKAWA Taku
> Labels: piggybank
> Attachments: LTSVLoader.html, PIG-3215.patch
>
>


[jira] [Updated] (PIG-3215) [piggybank] Add LTSVLoader to load LTSV (Labeled Tab-separated Values) files

2013-02-24 Thread MIYAKAWA Taku (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

MIYAKAWA Taku updated PIG-3215:
---

Attachment: LTSVLoader.html
PIG-3215.patch
