[jira] Commented: (PIG-1375) [Zebra] To support writing multiple Zebra tables through Pig

Xuefu Zhang (JIRA) Fri, 16 Apr 2010 18:05:48 -0700

    [ 
https://issues.apache.org/jira/browse/PIG-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858065#action_12858065
 ]


Xuefu Zhang commented on PIG-1375:
----------------------------------

Overall, the patch looks okay. However, I suggest the following changes:

1. Don't make unnecessary formatting changes (such as indentation) unless you're 
changing that part of the code. I think this is in the Apache coding guidelines.

2. The following if-else can be made clearer.

+         // If this is a sorted table and key is null (Pig's call path);
+    if (sortColIndices != null && key == null) {
+      for (int i =0; i < sortColIndices.length;++i) {
+        t.set(i, value.get(sortColIndices[i]));
+      }
+      key = builder.generateKey(t);
+    } else if (key == null) { // for unsorted table;
+      key = KEY0;
+    }

It can be:

    if (key == null) {
      if (sortColIndices != null) {
        ...
      } else {
        ...
      }
    }

They are equivalent, but the latter is a little easier to understand.
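
For illustration, here is a minimal, self-contained sketch of the nested form with both branches filled in from the patch snippet. The types are stand-ins so the example compiles on its own: keys are plain Strings, the row is a List, and generateKey() is a hypothetical placeholder for builder.generateKey(t), not Zebra's actual implementation.

```java
import java.util.Arrays;
import java.util.List;

class KeySelectionSketch {
    // Stand-in for KEY0 in the patch (the key used for unsorted tables).
    static final String KEY0 = "";

    // Hypothetical placeholder for builder.generateKey(t).
    static String generateKey(List<?> sortValues) {
        return String.valueOf(sortValues);
    }

    // The single outer null check makes it obvious that a caller-supplied
    // key is never touched; only the null case is split further.
    static String resolveKey(String key, int[] sortColIndices, List<?> value) {
        if (key == null) {
            if (sortColIndices != null) {
                // Sorted table: build the key from the sort columns.
                Object[] t = new Object[sortColIndices.length];
                for (int i = 0; i < sortColIndices.length; ++i) {
                    t[i] = value.get(sortColIndices[i]);
                }
                key = generateKey(Arrays.asList(t));
            } else {
                // Unsorted table: fall back to the constant key.
                key = KEY0;
            }
        }
        return key;
    }

    public static void main(String[] args) {
        System.out.println(resolveKey(null, new int[] {1, 0}, Arrays.asList("a", "b")));
    }
}
```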

3. However, the row-level if-else check above should be avoided if possible. I 
think conf should contain a flag indicating whether key generation is required 
(so the flag is set only in Pig's path).
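
One possible way to do this, sketched under assumptions: read the flag once at writer setup and pick a key strategy there, so that nothing is re-decided per row. Everything named here is hypothetical: the property name "example.zebra.generate.key" is made up, java.util.Properties stands in for the job conf, and rows and keys are simplified to Strings.

```java
import java.util.Properties;
import java.util.function.BiFunction;

class KeyStrategySketch {
    // Hypothetical flag name; the patch does not define one.
    static final String GENERATE_KEY_FLAG = "example.zebra.generate.key";

    // Stand-in for KEY0 in the patch.
    static final String KEY0 = "";

    // Called once at setup. The returned function maps
    // (caller-supplied key, row) to the key that is actually written.
    static BiFunction<String, String[], String> chooseKeyStrategy(
            Properties conf, int[] sortColIndices) {
        boolean generate =
            Boolean.parseBoolean(conf.getProperty(GENERATE_KEY_FLAG, "false"));
        if (!generate) {
            // Flag not set (map/reduce path): the caller supplies the key.
            return (key, row) -> key;
        }
        if (sortColIndices == null) {
            // Pig path, unsorted table: constant key.
            return (key, row) -> KEY0;
        }
        // Pig path, sorted table: derive the key from the sort columns.
        return (key, row) -> {
            StringBuilder sb = new StringBuilder();
            for (int idx : sortColIndices) {
                sb.append(row[idx]).append('|');
            }
            return sb.toString();
        };
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(GENERATE_KEY_FLAG, "true"); // set only in Pig's path
        BiFunction<String, String[], String> keyFor =
            chooseKeyStrategy(conf, new int[] {2, 0});
        System.out.println(keyFor.apply(null, new String[] {"a", "b", "c"}));
    }
}
```

The per-row work is then a single call to the chosen function, with the sorted/unsorted and Pig/non-Pig decisions made once.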

4. The following object creation is unnecessary. The array should be used directly.

List<Path> paths = new ArrayList<Path>(outputs.length);
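
As a rough sketch of what using the array directly could look like: the surrounding patch code is not shown here, so this assumes outputs is a String[] of output locations and that a Path array is what is ultimately needed. java.nio.file.Path is used only as a stand-in for Hadoop's Path so that the example is self-contained.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class OutputPathsSketch {
    // Assumption: the locations arrive as one comma-separated string,
    // as in the proposed store statement ('loc1,loc2,loc3').
    static Path[] toPaths(String commaSeparatedLocations) {
        String[] outputs = commaSeparatedLocations.split(",");
        // The size is known up front, so the array is filled in place
        // with no intermediate ArrayList.
        Path[] paths = new Path[outputs.length];
        for (int i = 0; i < outputs.length; ++i) {
            paths[i] = Paths.get(outputs[i].trim());
        }
        return paths;
    }

    public static void main(String[] args) {
        for (Path p : toPaths("loc1,loc2,loc3")) {
            System.out.println(p);
        }
    }
}
```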



> [Zebra] To support writing multiple Zebra tables through Pig
> ------------------------------------------------------------
>
>                 Key: PIG-1375
>                 URL: https://issues.apache.org/jira/browse/PIG-1375
>             Project: Pig
>          Issue Type: New Feature
>    Affects Versions: 0.7.0
>            Reporter: Chao Wang
>            Assignee: Chao Wang
>             Fix For: 0.8.0
>
>         Attachments: PIG-1375.patch, PIG-1375.patch
>
>
> In Zebra, we already have multiple outputs support for map/reduce.  But we do 
> not support this feature if users use Zebra through Pig.
> This jira is to address this issue. We plan to support writing to multiple 
> output tables through Pig as well.
> We propose to support the following Pig store statements with multiple 
> outputs:
> store relation into 'loc1,loc2,loc3....' using 
> org.apache.hadoop.zebra.pig.TableStorer('storagehint_string',
> 'complete name of your custom partition class', 'some arguments to partition 
> class'); /* if partition class arguments are needed */
> store relation into 'loc1,loc2,loc3....' using 
> org.apache.hadoop.zebra.pig.TableStorer('storagehint_string',
> 'complete name of your custom partition class'); /* if no partition class 
> arguments are needed */
> Note that users need to specify up to three arguments - storage hint string, 
> complete name of partition class and partition class arguments string.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
