date:20100327

[jira] Commented: (PIG-1306) [zebra] Support of locally sorted input splits

2010-03-27 Thread Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/PIG-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850484#action_12850484 ]

Hadoop QA commented on PIG-1306:

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12439937/PIG-1306.patch
against trunk revision 928080.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 35 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/252/testReport/
Findbugs warnings:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/252/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/252/console

This message is automatically generated.

[zebra] Support of locally sorted input splits
--

Key: PIG-1306
URL: https://issues.apache.org/jira/browse/PIG-1306
Project: Pig
Issue Type: Improvement
Reporter: Yan Zhou
Assignee: Yan Zhou
Fix For: 0.7.0

Attachments: PIG-1306.patch, PIG-1306.patch, PIG-1306.patch,
PIG-1306.patch, PIG-1306.patch

Currently, Zebra supports sorted or unsorted input splits on sorted tables or
unions of sorted tables. The sorted input splits are based on non-overlapping
key ranges. These splits are effectively globally sorted: each split is
locally sorted, and the key ranges of different splits do not overlap.
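The difference between globally and locally sorted splits can be made concrete with a small check. This is a minimal sketch, assuming a split is modeled simply as the list of keys it returns in read order; `SplitOrderCheck`, `isLocallySorted`, and `isGloballySorted` are hypothetical names, not Zebra classes. It needs Java 9 or later for `List.of`.

```java
import java.util.List;

/**
 * Hypothetical model: a split is the list of keys it returns, in read order.
 * Illustration only, not Zebra's split class.
 */
class SplitOrderCheck {

    /** True if the keys of a single split are in non-decreasing order. */
    static boolean isLocallySorted(List<String> keys) {
        for (int i = 1; i < keys.size(); i++) {
            if (keys.get(i - 1).compareTo(keys.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    /**
     * True if every split is locally sorted and consecutive splits have
     * non-overlapping key ranges (last key of one split < first key of the next).
     */
    static boolean isGloballySorted(List<List<String>> splits) {
        String previousLast = null;
        for (List<String> split : splits) {
            if (!isLocallySorted(split)) {
                return false;
            }
            if (split.isEmpty()) {
                continue; // an empty split has no key range
            }
            if (previousLast != null && previousLast.compareTo(split.get(0)) >= 0) {
                return false; // key ranges overlap or share a key
            }
            previousLast = split.get(split.size() - 1);
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<String>> keyRange = List.of(List.of("a", "b"), List.of("c", "d"));
        List<List<String>> overlapping = List.of(List.of("a", "c"), List.of("c", "d"));
        System.out.println(isGloballySorted(keyRange));    // true
        System.out.println(isGloballySorted(overlapping)); // false: "c" is in both splits
    }
}
```

In the second case each split is still locally sorted on its own; only the cross-split condition fails.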
The biggest problem with key-range splits is the performance hit suffered
when data skew is present, particularly when a key range contains only a
single, duplicated key. Such a chunk of duplicate keys is virtually
unsplittable regardless of how many mappers are available: it has to be
processed by a single mapper.
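Why a skewed key defeats key-range splitting can be sketched as follows. The hypothetical splitter below cuts a sorted key sequence into roughly equal chunks but moves each boundary forward until it no longer separates two equal keys, which is what keeps the key ranges disjoint. If the input is one repeated key, every boundary is pushed to the end and only one split comes out. `KeyRangeSplitter` and `keyRangeSplitSizes` are invented for illustration and are not Zebra's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical key-range splitter: roughly equal chunks, but a run of equal
 * keys is never divided between two splits, so key ranges stay disjoint.
 */
class KeyRangeSplitter {

    /** Returns the size of each split produced from sortedKeys. */
    static List<Integer> keyRangeSplitSizes(List<String> sortedKeys, int targetSplits) {
        if (targetSplits <= 0) {
            throw new IllegalArgumentException("targetSplits must be positive");
        }
        List<Integer> sizes = new ArrayList<>();
        int n = sortedKeys.size();
        int chunk = Math.max(1, (n + targetSplits - 1) / targetSplits); // ceiling division
        int start = 0;
        while (start < n) {
            int end = Math.min(start + chunk, n);
            // Push the boundary forward while it would separate two equal keys.
            while (end < n && sortedKeys.get(end).equals(sortedKeys.get(end - 1))) {
                end++;
            }
            sizes.add(end - start);
            start = end;
        }
        return sizes;
    }

    public static void main(String[] args) {
        List<String> uniform = List.of("a", "b", "c", "d", "e", "f", "g", "h");
        List<String> skewed = List.of("k", "k", "k", "k", "k", "k", "k", "k");
        System.out.println(keyRangeSplitSizes(uniform, 4)); // [2, 2, 2, 2]
        System.out.println(keyRangeSplitSizes(skewed, 4));  // [8]
    }
}
```

With eight copies of one key and four requested splits, the sketch returns a single split of size 8, which is the single-mapper case described above.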
On the other hand, there are scenarios where globally sorted splits are
overkill and locally sorted splits are good enough. One example is the use of
a Zebra sorted table as the probe table in a map-side merge inner join.
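By contrast, a locally sorted split only has to be sorted within itself. The hypothetical splitter below is one way such splits could be cut, offered as an illustration rather than Zebra's method: it takes fixed-size chunks even in the middle of a run of equal keys. Each chunk is a contiguous slice of sorted data and therefore locally sorted, but adjacent chunks may share a key, so the result is not globally sorted. `LocalSplitter` and `locallySortedSplits` are invented names, and the sketch does not model the merge join itself.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical locally sorted splitter: fixed-size chunks of sorted input,
 * cut without regard to key boundaries.
 */
class LocalSplitter {

    static List<List<String>> locallySortedSplits(List<String> sortedKeys, int targetSplits) {
        if (targetSplits <= 0) {
            throw new IllegalArgumentException("targetSplits must be positive");
        }
        List<List<String>> splits = new ArrayList<>();
        int n = sortedKeys.size();
        int chunk = Math.max(1, (n + targetSplits - 1) / targetSplits); // ceiling division
        for (int start = 0; start < n; start += chunk) {
            // subList is a view of the input; copy it so each split is independent.
            splits.add(new ArrayList<>(sortedKeys.subList(start, Math.min(start + chunk, n))));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> skewed = List.of("k", "k", "k", "k", "k", "k", "k", "k");
        List<List<String>> splits = locallySortedSplits(skewed, 4);
        System.out.println(splits.size()); // 4
        System.out.println(splits);        // [[k, k], [k, k], [k, k], [k, k]]
    }
}
```

The same skewed input that collapses into one key-range split is spread over four splits here, at the cost of giving up non-overlapping key ranges.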

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (PIG-1331) Owl Hadoop Table Management Service

2010-03-27 Thread Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/PIG-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850485#action_12850485 ]

Hadoop QA commented on PIG-1331:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12439938/owl.contrib.3.tgz
against trunk revision 928080.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified
tests.
Please justify why no tests are needed for this patch.

-1 patch. The patch command could not apply the patch.

Console output:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/253/console

Owl Hadoop Table Management Service
---

Key: PIG-1331
URL: https://issues.apache.org/jira/browse/PIG-1331
Project: Pig
Issue Type: New Feature
Affects Versions: 0.8.0
Reporter: Jay Tang
Attachments: owl.contrib.3.tgz

This JIRA is a proposal to create a Hadoop table management service: Owl.
Today, MapReduce and Pig applications interact directly with HDFS
directories and files and must deal with low-level data management issues
such as storage format, serialization/compression schemes, data layout, and
efficient data access, often with different solutions. Owl aims to provide a
standard way to address these issues and to abstract away the complexities
of reading/writing huge amounts of data from/to HDFS.
Owl has a data access API that is modeled after the traditional Hadoop
InputFormat and a management API to manipulate Owl objects. This JIRA is
related to PIG-823 (Hadoop Metadata Service), as Owl has an internal metadata
store. Owl integrates with different storage modules, such as Zebra, through a
pluggable architecture.
Initially, the proposal is to submit Owl as a Pig contrib project. Over
time, it makes sense to move it to a Hadoop subproject.

[jira] Commented: (PIG-1331) Owl Hadoop Table Management Service

2010-03-27 Thread Jay Tang (JIRA)

[ https://issues.apache.org/jira/browse/PIG-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850638#action_12850638 ]

Jay Tang commented on PIG-1331:
---

Owl's data access API, OwlInputFormat, provides a uniform API to access data
stored in different storage formats, such as Zebra, RCFile, and SequenceFile.
It's a single data access abstraction on top of disparate data.
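The idea of one data access abstraction over several storage formats can be sketched as a registry of pluggable readers. This is a hypothetical illustration only: `StorageDriver`, `UniformReader`, the format names used as registry keys, and the canned records are assumptions, not OwlInputFormat's actual API, and the stand-in drivers do not decode real Zebra or RCFile data.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of one data-access entry point over several storage
 * formats. All names here are invented; this is not Owl's API.
 */
class UniformAccessSketch {

    /** A pluggable module that knows how to read one storage format. */
    interface StorageDriver {
        List<String> readRecords(String location);
    }

    /** Callers read through this class and never touch format-specific code. */
    static class UniformReader {
        private final Map<String, StorageDriver> drivers = new HashMap<>();

        void register(String format, StorageDriver driver) {
            drivers.put(format, driver);
        }

        List<String> read(String format, String location) {
            StorageDriver driver = drivers.get(format);
            if (driver == null) {
                throw new IllegalArgumentException("no driver registered for format " + format);
            }
            return driver.readRecords(location);
        }
    }

    public static void main(String[] args) {
        UniformReader reader = new UniformReader();
        // Stand-in drivers that return canned records instead of decoding files.
        reader.register("zebra", location -> List.of("zebra:" + location));
        reader.register("rcfile", location -> List.of("rcfile:" + location));
        System.out.println(reader.read("zebra", "/data/t1"));  // [zebra:/data/t1]
        System.out.println(reader.read("rcfile", "/data/t2")); // [rcfile:/data/t2]
    }
}
```

The caller names a format and a location and gets records back; supporting another format means registering another driver rather than changing caller code.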

[jira] Commented: (PIG-1331) Owl Hadoop Table Management Service

2010-03-27 Thread Carl Steinbach (JIRA)

[ https://issues.apache.org/jira/browse/PIG-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850639#action_12850639 ]

Carl Steinbach commented on PIG-1331:
-

bq. Owl's data access API, OwlInputFormat, provides a uniform API to access
data stored in different storage formats, such as Zebra, RCFile, and SequenceFile.
It's a single data access abstraction on top of disparate data.

This sounds like Hive's SerDe interface. Are there any differences?
