[jira] [Created] (PIG-2011) Speed up TestTypedMap.java

2011-04-21 Thread Dmitriy V. Ryaboy (JIRA)
Speed up TestTypedMap.java 
---

 Key: PIG-2011
 URL: https://issues.apache.org/jira/browse/PIG-2011
 Project: Pig
  Issue Type: Improvement
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.10


TestTypedMap runs in MapReduce mode and takes 7 minutes.
This can be improved.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-2010) Bundle registered jars via distributed cache

2011-04-21 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-2010:
---

Attachment: pig-2010.patch

This patch by Travis Crawford implements the change.

Note that, as written, the patch depends on Hadoop's MAPREDUCE-787, which is 
available in CDH3 and Hadoop 0.21. 

It looks like MAPREDUCE-787 was also rolled into the Yahoo! distribution, so it 
may be available in the security branch?

> Bundle registered jars via distributed cache
> 
>
> Key: PIG-2010
> URL: https://issues.apache.org/jira/browse/PIG-2010
> Project: Pig
>  Issue Type: Improvement
>Reporter: Dmitriy V. Ryaboy
> Attachments: pig-2010.patch
>
>
> Currently, registered jars get collapsed into a single job megajar that is 
> submitted to Hadoop.
> A better pattern would be to take advantage of the distributed cache.



[jira] [Created] (PIG-2010) Bundle registered jars via distributed cache

2011-04-21 Thread Dmitriy V. Ryaboy (JIRA)
Bundle registered jars via distributed cache


 Key: PIG-2010
 URL: https://issues.apache.org/jira/browse/PIG-2010
 Project: Pig
  Issue Type: Improvement
Reporter: Dmitriy V. Ryaboy


Currently, registered jars get collapsed into a single job megajar that is 
submitted to Hadoop.
A better pattern would be to take advantage of the distributed cache.



[jira] [Updated] (PIG-1827) When passing a parameter to Pig, if the value contains $ it has to be escaped for no apparent reason

2011-04-21 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1827:
--

Attachment: PIG-1827-1.patch

> When passing a parameter to Pig, if the value contains $ it has to be escaped 
> for no apparent reason
> 
>
> Key: PIG-1827
> URL: https://issues.apache.org/jira/browse/PIG-1827
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0
>Reporter: Julien Le Dem
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-1827-1.patch
>
>




[jira] [Commented] (PIG-1827) When passing a parameter to Pig, if the value contains $ it has to be escaped for no apparent reason

2011-04-21 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023034#comment-13023034
 ] 

Richard Ding commented on PIG-1827:
---

Looked into it a little more. Actually, embedded Pig just uses parameter 
substitution to perform parameter binding, and it doesn't support recursive 
substitution. So it should be OK to remove the requirement that $ in parameter 
values be escaped.
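To see why non-recursive substitution makes the escaping unnecessary, here is a minimal Java sketch of single-pass `$name` substitution. `SinglePassSubstitution` and its identifier regex are hypothetical, not Pig's parameter-substitution code. The point is only that a bound value is inserted literally and never rescanned, so a `$` inside it cannot trigger another substitution.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch (not Pig's implementation): one left-to-right pass that
// replaces each $name reference with its bound value. Inserted values are
// never rescanned, so a '$' inside a value needs no escaping.
final class SinglePassSubstitution {
    // A named parameter is '$' followed by an identifier; positional
    // references such as $0 do not match and are left alone.
    private static final Pattern PARAM =
            Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

    static String substitute(String script, Map<String, String> params) {
        Matcher m = PARAM.matcher(script);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = params.get(m.group(1));
            // Leave unbound references exactly as they appear.
            String replacement = (value != null) ? value : m.group();
            // quoteReplacement makes '$' and '\' in the value literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Binding `pattern` to `a$b` and substituting it into `'$pattern'` yields `'a$b'`. This holds even if a parameter named `b` is also bound, because the inserted text is never scanned a second time.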

> When passing a parameter to Pig, if the value contains $ it has to be escaped 
> for no apparent reason
> 
>
> Key: PIG-1827
> URL: https://issues.apache.org/jira/browse/PIG-1827
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0
>Reporter: Julien Le Dem
>Assignee: Richard Ding
> Fix For: 0.9.0
>
>




[jira] [Created] (PIG-2009) Better MergeForEach rule

2011-04-21 Thread Daniel Dai (JIRA)
Better MergeForEach rule


 Key: PIG-2009
 URL: https://issues.apache.org/jira/browse/PIG-2009
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.9.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.10


The MergeForEach rule will not merge two consecutive ForEach operators if the 
second ForEach has an inner relational plan. This prevents some optimizations. 
For example:
{code}
A = LOAD 'input1' AS (a0, a1, a2);
B = LOAD 'input2' AS (b0, b1, b2);
C = cogroup A by a0, B by b0;
D = foreach C {
    E = limit A 10;
    F = E.a1;
    G = DISTINCT F;
    generate group, COUNT(G);
};
explain D;
{code}
We add a ForEach after the cogroup to prune B; however, we cannot merge this 
ForEach with D. As a result, secondary key optimization is disabled for this query.



[jira] [Updated] (PIG-1985) Utils.getSchemaFromString does not use the new parser, and thus fails to parse valid schema

2011-04-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1985:


Attachment: PIG-1985-1.patch

> Utils.getSchemaFromString does not use the new parser, and thus fails to 
> parse valid schema
> ---
>
> Key: PIG-1985
> URL: https://issues.apache.org/jira/browse/PIG-1985
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Woody Anderson
>Assignee: Daniel Dai
> Fix For: 0.9.0, 0.10
>
> Attachments: PIG-1985-1.patch
>
>
> I've been told this is because Utils.getSchemaFromString does not use the new 
> parser to parse the schema, so we should update the impl to use the new 
> parser:
> {code}
> Utils.getSchemaFromString("f: map[]")
> {code}
> results in: (org.apache.pig.impl.logicalLayer.schema.Schema) {f: map[]}
> {code}
> Utils.getSchemaFromString("f: map[int]")
> {code}
> results in: An exception occurred: 
> org.apache.pig.impl.logicalLayer.parser.ParseException
> ..
> org.apache.pig.impl.logicalLayer.parser.ParseException: Encountered " "map" 
> "map "" at line 1, column 4.
> Was expecting one of:
> "int" ...
> "long" ...
> "float" ...
> "double" ...
> "chararray" ...
> "bytearray" ...
> "int" ...
> "long" ...
> "float" ...
> "double" ...
> "chararray" ...
> "bytearray" ...



[jira] [Commented] (PIG-1985) Utils.getSchemaFromString does not use the new parser, and thus fails to parse valid schema

2011-04-21 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022975#comment-13022975
 ] 

Daniel Dai commented on PIG-1985:
-

The patch migrates the typed schema parser back to the old logical plan, so that 
Utils.getSchemaFromString can use it.

> Utils.getSchemaFromString does not use the new parser, and thus fails to 
> parse valid schema
> ---
>
> Key: PIG-1985
> URL: https://issues.apache.org/jira/browse/PIG-1985
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Woody Anderson
>Assignee: Daniel Dai
> Fix For: 0.9.0, 0.10
>
> Attachments: PIG-1985-1.patch
>
>
> I've been told this is because Utils.getSchemaFromString does not use the new 
> parser to parse the schema, so we should update the impl to use the new 
> parser:
> {code}
> Utils.getSchemaFromString("f: map[]")
> {code}
> results in: (org.apache.pig.impl.logicalLayer.schema.Schema) {f: map[]}
> {code}
> Utils.getSchemaFromString("f: map[int]")
> {code}
> results in: An exception occurred: 
> org.apache.pig.impl.logicalLayer.parser.ParseException
> ..
> org.apache.pig.impl.logicalLayer.parser.ParseException: Encountered " "map" 
> "map "" at line 1, column 4.
> Was expecting one of:
> "int" ...
> "long" ...
> "float" ...
> "double" ...
> "chararray" ...
> "bytearray" ...
> "int" ...
> "long" ...
> "float" ...
> "double" ...
> "chararray" ...
> "bytearray" ...



Jenkins build is back to normal : Pig-trunk #987

2011-04-21 Thread Apache Jenkins Server
See 




[jira] [Assigned] (PIG-1985) Utils.getSchemaFromString does not use the new parser, and thus fails to parse valid schema

2011-04-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reassigned PIG-1985:
---

Assignee: Daniel Dai

> Utils.getSchemaFromString does not use the new parser, and thus fails to 
> parse valid schema
> ---
>
> Key: PIG-1985
> URL: https://issues.apache.org/jira/browse/PIG-1985
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Woody Anderson
>Assignee: Daniel Dai
> Fix For: 0.9.0, 0.10
>
>
> I've been told this is because Utils.getSchemaFromString does not use the new 
> parser to parse the schema, so we should update the impl to use the new 
> parser:
> {code}
> Utils.getSchemaFromString("f: map[]")
> {code}
> results in: (org.apache.pig.impl.logicalLayer.schema.Schema) {f: map[]}
> {code}
> Utils.getSchemaFromString("f: map[int]")
> {code}
> results in: An exception occurred: 
> org.apache.pig.impl.logicalLayer.parser.ParseException
> ..
> org.apache.pig.impl.logicalLayer.parser.ParseException: Encountered " "map" 
> "map "" at line 1, column 4.
> Was expecting one of:
> "int" ...
> "long" ...
> "float" ...
> "double" ...
> "chararray" ...
> "bytearray" ...
> "int" ...
> "long" ...
> "float" ...
> "double" ...
> "chararray" ...
> "bytearray" ...



[jira] [Updated] (PIG-1981) LoadPushDown.pushProjection should pass alias in addition to position

2011-04-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1981:


Attachment: PIG-1981-1.patch

> LoadPushDown.pushProjection should pass alias in addition to position
> -
>
> Key: PIG-1981
> URL: https://issues.apache.org/jira/browse/PIG-1981
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.8.0, 0.9.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.9.0
>
> Attachments: PIG-1981-1.patch
>
>
> Currently, in
> pushProjection(RequiredFieldList requiredFieldList)
> requiredFieldList only contains field positions. It would be better to also 
> provide the alias whenever one is available.
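Here is a minimal Java sketch of what the alias could buy a loader. Take a format whose stored column order may differ from the declared schema. Given a name, such a loader can look the column up by name, and it can fall back to the position when no alias is available. `RequiredFieldSketch` and `AliasAwareLoaderSketch` are hypothetical stand-ins, not Pig's `RequiredField` class or any real loader.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical required-field descriptor carrying the alias (when known)
// in addition to the position.
final class RequiredFieldSketch {
    final int index;     // position in the declared schema
    final String alias;  // null when no alias is available

    RequiredFieldSketch(int index, String alias) {
        this.index = index;
        this.alias = alias;
    }
}

final class AliasAwareLoaderSketch {
    // Maps each required field to a physical column of the stored data:
    // by name when an alias is present and found, otherwise by position.
    static List<Integer> resolveColumns(List<RequiredFieldSketch> required,
                                        List<String> storedColumns) {
        List<Integer> result = new ArrayList<>();
        for (RequiredFieldSketch f : required) {
            int pos = (f.alias != null) ? storedColumns.indexOf(f.alias) : -1;
            result.add(pos >= 0 ? pos : f.index);
        }
        return result;
    }
}
```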



[jira] [Commented] (PIG-1973) UDFContext.getUDFContext has a thread race condition around its ThreadLocal

2011-04-21 Thread Woody Anderson (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022939#comment-13022939
 ] 

Woody Anderson commented on PIG-1973:
-

OK, I agree: it's not a bug.
However, I still find the code misleading, in that it doesn't use the easy, 
concise form, and at least to me it "looks wrong" on first and second inspection.
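Each thread only ever sees its own `ThreadLocal` value, so a get, check for null, then set sequence on a `ThreadLocal` cannot race with other threads. That agrees with the conclusion that this is not a bug. The "easy concise form" is presumably the one where `ThreadLocal` creates the initial value itself. In this minimal Java sketch, `PerThreadContext` is a hypothetical stand-in for `UDFContext`. `ThreadLocal.withInitial` requires Java 8; on older JDKs the same effect comes from overriding `initialValue()` in an anonymous subclass.

```java
// Hypothetical per-thread context object standing in for UDFContext.
final class PerThreadContext {
    // ThreadLocal invokes the supplier lazily, once per thread, on that
    // thread's first get(); no explicit null check or set() is needed.
    private static final ThreadLocal<PerThreadContext> CONTEXT =
            ThreadLocal.withInitial(PerThreadContext::new);

    private PerThreadContext() {
    }

    static PerThreadContext get() {
        return CONTEXT.get();
    }
}
```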

> UDFContext.getUDFContext has a thread race condition around its ThreadLocal
> 
>
> Key: PIG-1973
> URL: https://issues.apache.org/jira/browse/PIG-1973
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0, 0.9.0
>Reporter: Woody Anderson
>Assignee: Woody Anderson
>Priority: Minor
> Fix For: 0.9.0
>
> Attachments: 1973.patch
>
>
> This probably isn't manifesting anywhere, but it's an incorrect use of the 
> ThreadLocal pattern.



[jira] [Updated] (PIG-1973) UDFContext.getUDFContext has a thread race condition around its ThreadLocal

2011-04-21 Thread Woody Anderson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Woody Anderson updated PIG-1973:


Issue Type: Improvement  (was: Bug)

> UDFContext.getUDFContext has a thread race condition around its ThreadLocal
> 
>
> Key: PIG-1973
> URL: https://issues.apache.org/jira/browse/PIG-1973
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.8.0, 0.9.0
>Reporter: Woody Anderson
>Assignee: Woody Anderson
>Priority: Minor
> Fix For: 0.9.0
>
> Attachments: 1973.patch
>
>
> This probably isn't manifesting anywhere, but it's an incorrect use of the 
> ThreadLocal pattern.



[jira] [Updated] (PIG-1693) support project-range expression. (was: There needs to be a way in foreach to indicate "and all the rest of the fields" )

2011-04-21 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1693:
---

Release Note: 

Project-range ('..') can be used to project a range of columns from the input. 
For example:
.. $x  : projects columns $0 through $x, inclusive

$x ..  : projects columns $x through the last column, inclusive

$x .. $y : projects columns $x through $y, inclusive

If the input relation has a schema, you can also use column aliases instead of 
referring to columns by position. You can also combine aliases and column 
positions in a project-range expression (i.e., "col1 .. $5" is valid).
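The three forms can be modeled on column positions with a small Java sketch. `ProjectRangeSketch` is hypothetical and does not show how Pig expands project-range internally. It ignores aliases and assumes the number of input columns is known, which is what the open-ended form needs in this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of project-range on column positions. A null bound
// means "open" on that side; both bounds are inclusive:
//   .. $y    -> expand(null, y, n)
//   $x ..    -> expand(x, null, n)
//   $x .. $y -> expand(x, y, n)
final class ProjectRangeSketch {
    static List<Integer> expand(Integer start, Integer end, int numColumns) {
        int from = (start == null) ? 0 : start;
        // The open-ended form can only be resolved when numColumns is known.
        int to = (end == null) ? numColumns - 1 : end;
        if (from < 0 || to >= numColumns || from > to) {
            throw new IllegalArgumentException("invalid range " + from + " .. " + to);
        }
        List<Integer> cols = new ArrayList<>();
        for (int i = from; i <= to; i++) {
            cols.add(i);
        }
        return cols;
    }
}
```

With five input columns, `$3 ..` expands to columns 3 and 4, and `.. $2` expands to columns 0, 1 and 2.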


This expression can be used in all cases where the use of '*' (project-star) is 
allowed, except as a UDF argument. Support for that use case will be added in 
PIG-1938.

It can be used in the following statements:
- foreach 
- join
- order (also when it is within a nested foreach block)
- group/cogroup

Examples:
{code}
grunt> F = foreach IN generate (int)col0, col1 .. col3;  
grunt> describe F;   
F: {col0: int,col1: bytearray,col2: bytearray,col3: bytearray}
{code}
{code}
grunt> SORT = order IN by col2 .. col3, col0, col4 ..;
{code}
{code}
J = join IN1 by  $0 .. $3,  IN2 by $0 .. $3;
{code}
{code}
g = group l1 by  b .. c;
{code}

Limitations:
There are some restrictions on the use of the project-to-end form of project-range 
(e.g. "$x ..") when the input schema is null (unknown). These are also the cases 
where the use of project-star ('*') is restricted.

1. In cogroup/group statements, the project-to-end form of project-range is only 
allowed if the input has a schema.

2. In order-by statements, if the input schema is null, the project-to-end form 
of project-range is supported only as the last sort column.
Example:
{code}
grunt> describe IN;
Schema for IN unknown.

-- The following statement is supported
SORT = order IN by $2 .. $3, $6 ..;

-- The following statement is NOT supported
SORT = order IN by $6 .., $2 .. $3;
{code}



  was:

Project-range ('..') can be used to project a range of columns from the input. 
For example:
.. $x  : projects columns $0 through $x, inclusive

$x ..  : projects columns $x through the last column, inclusive

$x .. $y : projects columns $x through $y, inclusive

If the input relation has a schema, you can also use column aliases instead of 
referring to columns by position. You can also combine aliases and column 
positions in a project-range expression (i.e., "col1 .. $5" is valid).


This expression can be used in all cases where the use of '*' (project-star) is 
allowed, except as a UDF argument. Support for that use case will be added in 
PIG-1938.

It can be used in the following statements:
- foreach 
- join
- order (also when it is within a nested foreach block)
- group/cogroup

Examples:
{code}
grunt> F = foreach IN generate (int)col0, col1 .. col3;  
grunt> describe F;   
F: {col0: int,col1: bytearray,col2: bytearray,col3: bytearray}
{code}
{code}
grunt> SORT = order IN by col2 .. col3, col0, col4 ..;
{code}
{code}
J = join IN1 by  $0 .. $3,  IN2 by $0 .. $3;
{code}
{code}
g = group l1 by  b .. c;
{code}

Limitations:
There are some restrictions on the use of the project-to-end form of project-range 
(e.g. "$x ..") when the input schema is null (unknown). These are also the cases 
where the use of project-star ('*') is restricted.

1. In cogroup/group statements, the project-to-end form of project-range is only 
allowed if the input has a schema.

2. In order-by statements, if the input schema is null, the project-to-end form 
of project-range is supported only as the last sort column.
Note: because of bug PIG-1939, this use is also restricted when a schema is 
present. That should be fixed soon.
Example:
{code}
grunt> describe IN;
Schema for IN unknown.

-- The following statement is supported
SORT = order IN by $2 .. $3, $6 ..;

-- The following statement is NOT supported
SORT = order IN by $6 .., $2 .. $3;
{code}




> support project-range expression. (was: There needs to be a way in foreach to 
> indicate "and all the rest of the fields" )
> -
>
> Key: PIG-1693
> URL: https://issues.apache.org/jira/browse/PIG-1693
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Alan Gates
>Assignee: Thejas M Nair
> Fix For: 0.9.0
>
> Attachments: PIG-1693.1.patch, PIG-1693.2.patch
>
>
> A common use case we see in Pig is that people have many columns in their data 
> but only want to operate on a few of them. Consider, for example, a user who 
> wants to perform a cast on one column before storing data with ten columns:
> {code}
> ...
> Z = foreach Y generate (int)firstcol, secondcol, thridcol, forth

[jira] [Updated] (PIG-1827) When passing a parameter to Pig, if the value contains $ it has to be escaped for no apparent reason

2011-04-21 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1827:
--

Fix Version/s: 0.9.0
 Assignee: Richard Ding

> When passing a parameter to Pig, if the value contains $ it has to be escaped 
> for no apparent reason
> 
>
> Key: PIG-1827
> URL: https://issues.apache.org/jira/browse/PIG-1827
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0
>Reporter: Julien Le Dem
>Assignee: Richard Ding
> Fix For: 0.9.0
>
>




RE: Next Pig Developer Meeting

2011-04-21 Thread Dmitriy Ryaboy
Missed my train; will be late. 4:30 if I'm lucky.

-Original Message-
From: "Julien Le Dem" 
To: "dev@pig.apache.org" 
Sent: 4/21/2011 9:28 AM
Subject: Re: Next Pig Developer Meeting

Hi guys,
I won't be able to attend the meeting today. Sorry about that.
(2) It would help to have much faster unit tests. MiniCluster is way too slow 
for unit tests (10 hours to run ant test). Also, it seems that individual tests 
can't be run through the Eclipse plugin.

Julien

On 4/18/11 10:48 AM, "Olga Natkovich"  wrote:

Hi,

Just a reminder that we will be holding the next meeting this Thursday, 4/21, 
4-6 pm at Yahoo.

The address is 701 First Ave. Building E Sunnyvale, CA 94089. Please ask for me 
or Alan Gates at reception.

Thanks,

Olga

-Original Message-
From: Olga Natkovich [mailto:ol...@yahoo-inc.com]
Sent: Wednesday, April 06, 2011 4:42 PM
To: dev@pig.apache.org
Subject: Next Pig Developer Meeting

Hi guys,

It has been a while, and I think we have a few topics to discuss. Here are the 
ones I would like to propose:


(1) What is the process, and what are the criteria, for changing existing 
releases? For instance
[truncated by sender]


[jira] [Updated] (PIG-1976) One more TwoLevelAccess to remove

2011-04-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1976:


Attachment: PIG-1976-1.patch

> One more TwoLevelAccess to remove
> -
>
> Key: PIG-1976
> URL: https://issues.apache.org/jira/browse/PIG-1976
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.9.0
>
> Attachments: PIG-1976-1.patch
>
>
> We removed two-level access in PIG-847. However, there is another occurrence 
> we missed in ResourceSchema.java:
> {code}
> if (type == DataType.BAG && fieldSchema.schema != null
>         && !fieldSchema.schema.isTwoLevelAccessRequired()) {
>     log.info("Insert two-level access to Resource Schema");
>     FieldSchema fs = new FieldSchema("t", fieldSchema.schema);
>     inner = new Schema(fs);
> }
> {code}
> Although schema.isTwoLevelAccessRequired is false by default, we should not use 
> this flag in Pig, since a user could set it in a legacy UDF.
> Thanks to Woody for uncovering this.



[jira] [Commented] (PIG-2005) Discrepancy in the way dry run handles semicolon in macro definition

2011-04-21 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022923#comment-13022923
 ] 

Thejas M Nair commented on PIG-2005:


+1

> Discrepancy in the way dry run handles semicolon in macro definition
> 
>
> Key: PIG-2005
> URL: https://issues.apache.org/jira/browse/PIG-2005
> Project: Pig
>  Issue Type: Bug
>  Components: grunt
>Affects Versions: 0.9.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-2005_1.patch
>
>
> A macro definition requires a semicolon to mark its end. For example:
> {code}
> define mymacro(x) returns y {... ...};
> {code}
> However, when invoked through the command line, macro definitions without a 
> semicolon also work, except in the case of dry run. This discrepancy is due to 
> GruntParser automatically appending a semicolon to Pig statements if one is 
> absent at the end. The dry-run GruntParser should do the same.
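Here is a minimal Java sketch of that normalization step: append `;` unless the statement already ends with one, ignoring trailing whitespace. `StatementTerminator.ensureSemicolon` is a hypothetical helper, not GruntParser's API. The sketch also assumes the whole macro definition is handed over as a single statement string.

```java
// Hypothetical sketch (not GruntParser's code): terminate a statement with
// ';' if it does not already end with one.
final class StatementTerminator {
    static String ensureSemicolon(String statement) {
        // Drop trailing whitespace so "stmt;  " counts as already terminated.
        String trimmed = statement.replaceAll("\\s+$", "");
        if (trimmed.isEmpty() || trimmed.endsWith(";")) {
            return trimmed;
        }
        return trimmed + ";";
    }
}
```

Applied to `define mymacro(x) returns y { $y = distinct $x; }`, it returns the same text followed by `;`, which matches the documented form of a macro definition.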



[jira] [Updated] (PIG-1814) mapred.output.compress in SET statement does not work

2011-04-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1814:


Attachment: PIG-1814-1.patch

> mapred.output.compress in SET statement does not work
> -
>
> Key: PIG-1814
> URL: https://issues.apache.org/jira/browse/PIG-1814
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.9.0
>
> Attachments: PIG-1814-1.patch
>
>
> Setting output compression using "SET" in the script does not work:
> SET mapred.output.compress true;
> SET mapred.output.compression.codec org.apache.hadoop.io.compress.GzipCodec;
> We used a trick to make individual compression settings work for multistore. 
> Instead of the above parameters, the following works:
> SET output.compression.enabled true;
> SET output.compression.codec org.apache.hadoop.io.compress.GzipCodec;
> However, this is counterintuitive. We should use 
> mapred.output.compress/mapred.output.compression.codec.
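One hypothetical way to honor the standard Hadoop keys is to read each setting from the Hadoop key first and fall back to the Pig-specific key. That also keeps existing scripts that use the Pig-specific keys working. The Java sketch below models the job configuration as a `java.util.Properties`. The class name and the precedence order are assumptions for illustration, not what PIG-1814-1.patch does.

```java
import java.util.Properties;

// Hypothetical resolution of the effective output-compression settings.
// Assumption: the standard Hadoop keys win, and the Pig-specific keys
// remain as a fallback for scripts that already use them.
final class CompressionSettingsSketch {
    static boolean compressionEnabled(Properties props) {
        String v = props.getProperty("mapred.output.compress");
        if (v == null) {
            v = props.getProperty("output.compression.enabled", "false");
        }
        return Boolean.parseBoolean(v.trim());
    }

    // Returns null when no codec is configured under either key.
    static String codec(Properties props) {
        String c = props.getProperty("mapred.output.compression.codec");
        return (c != null) ? c : props.getProperty("output.compression.codec");
    }
}
```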



[jira] [Updated] (PIG-2008) Cache outputFormat in HBaseStorage

2011-04-21 Thread Jacob Perkins (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Perkins updated PIG-2008:
---

Attachment: patch_file.txt

> Cache outputFormat in HBaseStorage
> --
>
> Key: PIG-2008
> URL: https://issues.apache.org/jira/browse/PIG-2008
> Project: Pig
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.8.0
>Reporter: Jacob Perkins
>Priority: Minor
> Fix For: 0.8.0
>
> Attachments: patch_file.txt
>
>   Original Estimate: 10m
>  Remaining Estimate: 10m
>
> getOutputFormat is called more than once on a StoreFunc. Modify HBaseStorage 
> to create its TableOutputFormat instance only once, rather than on every call 
> as it does now, since each instance creates a new HTable connection.



[jira] [Created] (PIG-2008) Cache outputFormat in HBaseStorage

2011-04-21 Thread Jacob Perkins (JIRA)
Cache outputFormat in HBaseStorage
--

 Key: PIG-2008
 URL: https://issues.apache.org/jira/browse/PIG-2008
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.8.0
Reporter: Jacob Perkins
Priority: Minor
 Fix For: 0.8.0


getOutputFormat is called more than once on a StoreFunc. Modify HBaseStorage to 
create its TableOutputFormat instance only once, rather than on every call as it 
does now, since each instance creates a new HTable connection.
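The proposed change amounts to a lazily initialized, cached field. In this minimal Java sketch, `ExpensiveOutputFormat` is a hypothetical stand-in for `TableOutputFormat` that only counts how often it is constructed. The sketch also assumes `getOutputFormat` is called from a single thread, so it does no synchronization.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an output format that is expensive to create
// (per the issue, each TableOutputFormat opens a new HTable connection).
final class ExpensiveOutputFormat {
    static final AtomicInteger INSTANCES = new AtomicInteger();

    ExpensiveOutputFormat() {
        INSTANCES.incrementAndGet(); // count constructions to show the cache works
    }
}

// Sketch of the fix: create the output format on the first call and return
// the same instance on every later call (single-threaded use assumed).
final class CachingStoreFuncSketch {
    private ExpensiveOutputFormat outputFormat;

    ExpensiveOutputFormat getOutputFormat() {
        if (outputFormat == null) {
            outputFormat = new ExpensiveOutputFormat();
        }
        return outputFormat;
    }
}
```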



[jira] [Resolved] (PIG-2000) Pig gives incorrect error message dealing with scalar projection

2011-04-21 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang resolved PIG-2000.
--

Resolution: Fixed

Unit tests passed. The patch has been committed to trunk, and the issue is closed.

> Pig gives incorrect error message dealing with scalar projection
> 
>
> Key: PIG-2000
> URL: https://issues.apache.org/jira/browse/PIG-2000
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0, 0.9.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
> Attachments: PIG-2000.patch
>
>
> For the following query:
> A = load 'x' as (u:tuple(x,y),v);
> B = load 'y';
> C = foreach B generate $0, A.u.x;
> Error message in 0.8:
> ERROR 1000: Error during parsing. Invalid alias: x in {u: (x: bytearray,y: 
> bytearray),v: bytearray}
> Error message in 0.9:
> ERROR 1200: Pig script failed to parse:  Invalid scalar 
> projection: A
> Neither message is clear enough. Scalar projection supports only one level, 
> that is, the R.f form.



Re: Next Pig Developer Meeting

2011-04-21 Thread Julien Le Dem
Hi guys,
I won't be able to attend the meeting today. Sorry about that.
(2) It would help to have much faster unit tests. MiniCluster is way too slow 
for unit tests (10 hours to run ant test). Also, it seems that individual tests 
can't be run through the Eclipse plugin.

Julien

On 4/18/11 10:48 AM, "Olga Natkovich"  wrote:

Hi,

Just a reminder that we will be holding the next meeting this Thursday, 4/21, 
4-6 pm at Yahoo.

The address is 701 First Ave. Building E Sunnyvale, CA 94089. Please ask for me 
or Alan Gates at reception.

Thanks,

Olga

-Original Message-
From: Olga Natkovich [mailto:ol...@yahoo-inc.com]
Sent: Wednesday, April 06, 2011 4:42 PM
To: dev@pig.apache.org
Subject: Next Pig Developer Meeting

Hi guys,

It has been a while, and I think we have a few topics to discuss. Here are the 
ones I would like to propose:


(1) What is the process, and what are the criteria, for changing existing 
releases? For instance, which changes are OK for existing branches vs. which ones 
should go to the trunk or to another branch?

(2) How do we improve our testing? We have found a large number of problems 
since the 0.8 release. It is pretty clear that our testing is not keeping up with 
the pace of the changes we introduce. We need to find a better testing strategy 
that goes beyond unit tests.

As always, feel free to suggest additional topics for discussion.

Would April 21st work? We at Yahoo would be happy to host but are also open 
to other suggestions/locations.

Olga



[jira] [Commented] (PIG-1998) Allow macro to return void

2011-04-21 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022813#comment-13022813
 ] 

Richard Ding commented on PIG-1998:
---

Unit tests pass.

> Allow macro to return void
> --
>
> Key: PIG-1998
> URL: https://issues.apache.org/jira/browse/PIG-1998
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-1998_1.patch
>
>
> A Pig macro is allowed to have no output alias, but this property isn't clear 
> from either the macro definition or the macro invocation (macro inline). Here we 
> propose to make it explicit:
> 1. If a macro doesn't output any alias, it must specify void as its return 
> value. For example:
> {code}  
> define mymacro(...) returns void {
>... ...
> };
> {code}
> 2. If a macro doesn't output any alias, it must be invoked without a return 
> value. For example, to invoke the above macro, just specify:
> {code}
> mymacro(...);
> {code}
> 3. Any non-void return alias in the macro definition must exist in the macro 
> body and be prefixed with $. For example:
> {code}  
> define mymacro(...) returns B {
>... ...
>$B = filter ...;
> };
> {code}
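Rule 3 can be checked mechanically. This minimal Java sketch assumes the macro body is available as plain text. `MacroReturnCheckSketch` is hypothetical and does a simple textual search rather than parsing Pig Latin, so a `$B` inside a comment or string literal would also count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical check for rule 3: every declared (non-void) return alias must
// appear in the macro body prefixed with '$'. A void macro declares no return
// aliases, so nothing is required of its body.
final class MacroReturnCheckSketch {
    static List<String> missingReturnAliases(List<String> returnAliases, String body) {
        List<String> missing = new ArrayList<>();
        for (String alias : returnAliases) {
            // "$alias" must not be followed by another identifier character,
            // so "$B" does not match "$Bx".
            Pattern ref = Pattern.compile("\\$" + Pattern.quote(alias) + "(?![A-Za-z0-9_])");
            if (!ref.matcher(body).find()) {
                missing.add(alias);
            }
        }
        return missing;
    }
}
```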



[jira] [Updated] (PIG-2007) Parsing error when map key referred directly from udf in nested foreach

2011-04-21 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-2007:


  Description: 
The script below fails with a parsing error when executed with version 0.9.

{code}
 ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. 
 mismatched input '{' expecting GENERATE
{code}

Script1
{code}
register myudf.jar;
A = load 'test.txt' using PigStorage() as (a:int,b:chararray);
B1 = foreach A {
C = test.TOMAP('key1',$1)#'key1';
generate C as C;
}
{code}

This happens when, in a nested foreach, I refer to a map key directly from a 
UDF result.

The same works if executed without the nested foreach:
{code}
register myudf.jar;
A = load 'test.txt' using PigStorage() as (a:int,b:chararray);
B1 = foreach A generate test.TOMAP('key1',$1)#'key1';
dump B1;
{code}

Script1 works well with 0.8.


  was:

The script below fails with a parsing error when executed with version 0.9.

{code}
 ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. 
 mismatched input '{' expecting GENERATE
{code}

Script1
{code}
register myudf.jar;
A = load 'test.txt' using PigStorage() as (a:int,b:chararray);
B1 = foreach A {
C = test.TOMAP('key1',$1)#'key1';
generate C as C;
}
{code}

This happens when, in a nested foreach, I refer to a map key directly from a 
UDF result.

The same works if executed without the nested foreach:
{code}
register myudf.jar;
A = load 'test.txt' using PigStorage() as (a:int,b:chararray);
B1 = foreach A generate test.TOMAP('key1',$1)#'key1';
dump B1;
{code}

Script1 works well with 0.8.


Fix Version/s: 0.9.0
 Assignee: Xuefu Zhang

> Parsing error when map key referred directly from udf in nested foreach 
> 
>
> Key: PIG-2007
> URL: https://issues.apache.org/jira/browse/PIG-2007
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Anitha Raju
>Assignee: Xuefu Zhang
> Fix For: 0.9.0
>
>
> The script below fails with a parsing error when executed with version 0.9.
> {code}
>  ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. 
>  mismatched input '{' expecting GENERATE
> {code}
> Script1
> {code}
> register myudf.jar;
> A = load 'test.txt' using PigStorage() as (a:int,b:chararray);
> B1 = foreach A {
> C = test.TOMAP('key1',$1)#'key1';
> generate C as C;
> }
> {code}
> This happens when, in a nested foreach, I refer to a map key directly 
> from a UDF result.
> The same works if executed without the nested foreach:
> {code}
> register myudf.jar;
> A = load 'test.txt' using PigStorage() as (a:int,b:chararray);
> B1 = foreach A generate test.TOMAP('key1',$1)#'key1';
> dump B1;
> {code}
> Script1 works well with 0.8.



[jira] [Created] (PIG-2007) Parsing error when map key referred directly from udf in nested foreach

2011-04-21 Thread Anitha Raju (JIRA)
Parsing error when map key referred directly from udf in nested foreach 


 Key: PIG-2007
 URL: https://issues.apache.org/jira/browse/PIG-2007
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Anitha Raju



The script below fails with a parsing error when executed with version 0.9.

{code}
 ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. 
 mismatched input '{' expecting GENERATE
{code}

Script1
{code}
register myudf.jar;
A = load 'test.txt' using PigStorage() as (a:int,b:chararray);
B1 = foreach A {
C = test.TOMAP('key1',$1)#'key1';
generate C as C;
}
{code}

This happens when, in a nested foreach, I refer to a map key directly from a 
UDF result.

The same works if executed without the nested foreach:
{code}
register myudf.jar;
A = load 'test.txt' using PigStorage() as (a:int,b:chararray);
B1 = foreach A generate test.TOMAP('key1',$1)#'key1';
dump B1;
{code}

Script1 works well with 0.8.

