[jira] Created: (PIG-1270) Push limit into loader

2010-03-02 Thread Daniel Dai (JIRA)
Push limit into loader
--

 Key: PIG-1270
 URL: https://issues.apache.org/jira/browse/PIG-1270
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai


We can optimize the limit operation by stopping early in PigRecordReader. In 
general, we need a way to communicate between PigRecordReader and the execution 
pipeline: POLimit could tell PigRecordReader that it already has enough 
records, so the reader can stop feeding more data.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1271) Provide a more flexible data format to load complex field (bag/tuple/map) in PigStorage

2010-03-02 Thread Daniel Dai (JIRA)
Provide a more flexible data format to load complex field (bag/tuple/map) in 
PigStorage
---

 Key: PIG-1271
 URL: https://issues.apache.org/jira/browse/PIG-1271
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai


With [PIG-613|https://issues.apache.org/jira/browse/PIG-613], we are able to 
load text files containing complex data types (map/bag/tuple) according to the 
schema. However, the format of complex data fields is very strict. Users have 
to use predetermined special characters to mark the beginning and end of each 
field, and those special characters cannot be used in the content. The goals 
of this issue are:

1. Provide a way for users to escape special characters
2. Make it easy for users to customize Utf8StorageConverter when they have 
their own data format
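One possible shape for goal 1, sketched under assumptions: a backslash escapes the next character inside a flat tuple field such as `(a,b\,c)`. The escape character, the `(`/`)`/`,` delimiters and the `EscapedTupleSplitter` class are hypothetical illustrations, not PigStorage's or Utf8StorageConverter's actual format or API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical splitter for a flat tuple literal such as "(a,b\,c)".
// Assumption: a backslash escapes the next character, so an escaped comma
// or parenthesis is taken literally instead of acting as a delimiter.
final class EscapedTupleSplitter {
    static List<String> split(String text) {
        if (text.length() < 2 || text.charAt(0) != '(' || text.charAt(text.length() - 1) != ')') {
            throw new IllegalArgumentException("not a tuple literal: " + text);
        }
        List<String> fields = new ArrayList<>();
        if (text.length() == 2) {
            return fields; // "()" is an empty tuple
        }
        StringBuilder current = new StringBuilder();
        boolean escaped = false;
        // Scan the characters between the outer parentheses.
        for (int i = 1; i < text.length() - 1; i++) {
            char c = text.charAt(i);
            if (escaped) {
                current.append(c); // escaped character is plain content
                escaped = false;
            } else if (c == '\\') {
                escaped = true;    // the next character is literal
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (escaped) {
            // The final ')' was escaped, so the literal is never closed.
            throw new IllegalArgumentException("unterminated escape in: " + text);
        }
        fields.add(current.toString());
        return fields;
    }
}
```

For example, `(a,b\,c)` splits into `a` and `b,c`. Nested bags, tuples and maps would need a recursive version; this only shows the escape handling that a pluggable converter (goal 2) could own.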






[jira] Updated: (PIG-1272) Column pruner causes wrong results

2010-03-02 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1272:


Attachment: PIG-1272-1.patch

 Column pruner causes wrong results
 --

 Key: PIG-1272
 URL: https://issues.apache.org/jira/browse/PIG-1272
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: PIG-1272-1.patch


 For a simple script, the column pruner optimization removes certain columns 
 from the original relation, which produces wrong results.
 Input file kv contains the following columns (tab separated)
 {code}
 a   1
 a   2
 a   3
 b   4
 c   5
 c   6
 b   7
 d   8
 {code}
 Now running this script in Pig 0.6 produces
 {code}
 kv = load 'kv' as (k,v);
 keys= foreach kv generate k;
 keys = distinct keys; 
 keys = limit keys 2;
 rejoin = join keys by k, kv by k;
 dump rejoin;
 {code}
 (a,a)
 (a,a)
 (a,a)
 (b,b)
 (b,b)
 Running this in Pig 0.5, which has no column pruner, results in:
 (a,a,1)
 (a,a,2)
 (a,a,3)
 (b,b,4)
 (b,b,7)
 When we disable the ColumnPruner optimization, it gives the right results.
 Viraj
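To make the expected output concrete, here is a minimal sketch that recomputes the script's intended result in plain Java (not Pig) over the sample `kv` data. It assumes `distinct` emits keys in sorted order, so `limit keys 2` keeps `a` and `b` as in the Pig 0.5 output; in general Pig does not promise which rows a `limit` keeps. `Pig1272Expected` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Reference computation of the reported script on the sample input,
// written in plain Java to show what the join should return.
final class Pig1272Expected {
    // The sample 'kv' relation from the report: (k, v) pairs.
    static final String[][] KV = {
        {"a", "1"}, {"a", "2"}, {"a", "3"}, {"b", "4"},
        {"c", "5"}, {"c", "6"}, {"b", "7"}, {"d", "8"}
    };

    static List<String> expected() {
        // keys = foreach kv generate k; keys = distinct keys;
        // Assumption: distinct yields keys in sorted order.
        TreeSet<String> distinct = new TreeSet<>();
        for (String[] row : KV) {
            distinct.add(row[0]);
        }

        // keys = limit keys 2;
        List<String> limited = new ArrayList<>();
        for (String k : distinct) {
            if (limited.size() == 2) {
                break;
            }
            limited.add(k);
        }

        // rejoin = join keys by k, kv by k;  -> (keys::k, kv::k, kv::v)
        List<String> result = new ArrayList<>();
        for (String k : limited) {
            for (String[] row : KV) {
                if (row[0].equals(k)) {
                    result.add("(" + k + "," + row[0] + "," + row[1] + ")");
                }
            }
        }
        return result;
    }
}
```

Every joined row carries three fields (`keys::k`, `kv::k`, `kv::v`); the 0.6 output above has only two per row, which is consistent with the pruner having dropped `v`.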




[jira] Updated: (PIG-1262) Additional findbugs and javac warnings

2010-02-26 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1262:


Status: Patch Available  (was: Open)

Hudson is not working; resubmitting.

 Additional findbugs and javac warnings
 --

 Key: PIG-1262
 URL: https://issues.apache.org/jira/browse/PIG-1262
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: PIG-1262-1.patch


 Over time, we have introduced some new findbugs and javacc warnings. We will 
 fix them in this Jira.




[jira] Updated: (PIG-1262) Additional findbugs and javac warnings

2010-02-26 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1262:


Status: Open  (was: Patch Available)

 Additional findbugs and javac warnings
 --

 Key: PIG-1262
 URL: https://issues.apache.org/jira/browse/PIG-1262
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: PIG-1262-1.patch





[jira] Commented: (PIG-1251) Move SortInfo calculation earlier in compilation

2010-02-26 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838977#action_12838977
 ] 

Daniel Dai commented on PIG-1251:
-

+1 for the patch. Please resync with trunk, uncomment testLocalModeNegative2 
and testMapReduceModeInputNegative2 in TestInputOutputFileValidator, then 
commit.

 Move SortInfo calculation earlier in compilation 
 -

 Key: PIG-1251
 URL: https://issues.apache.org/jira/browse/PIG-1251
 Project: Pig
  Issue Type: Bug
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: pig-1251.patch


 In LSR, Pig does input/output validation by calling Hadoop's checkSpecs(). A 
 StoreFunc might need the schema to do such validation, so we should call 
 checkSchema() before doing the validation. checkSchema() in turn requires 
 SortInfo, which is calculated later in the compilation phase. We need to move 
 that calculation earlier in the compilation phase.




[jira] Commented: (PIG-1259) ResourceFieldSchema.setSchema should not allow a bag field without a Tuple as its only sub field (the tuple itself can have a schema with 1 subfields)

2010-02-26 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12839008#action_12839008
 ] 

Daniel Dai commented on PIG-1259:
-

+1

 ResourceFieldSchema.setSchema should not allow a bag field without a Tuple as 
 its only sub field  (the tuple itself can have a schema with  1 subfields)
 -

 Key: PIG-1259
 URL: https://issues.apache.org/jira/browse/PIG-1259
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.7.0

 Attachments: PIG-1259-2.patch, PIG-1259.patch


 Currently Schema.getPigSchema(ResourceSchema) does not allow a bag field in 
 the ResourceSchema with a subschema containing anything other than a tuple. 
 The tuple itself can have a schema with more than one subfield. This check 
 should also be enforced in ResourceFieldSchema.setSchema().
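A sketch of the check described above, using a hypothetical, simplified schema model. `SimpleFieldSchema` and `validateBagSchema` are illustrative names, not the real `ResourceFieldSchema.setSchema()` code, and the sketch assumes that a bag whose subschema is absent (unknown) is left alone.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified field-schema model; not Pig's ResourceFieldSchema.
final class SimpleFieldSchema {
    enum Type { INT, CHARARRAY, TUPLE, BAG, MAP }

    final String name;
    final Type type;
    final List<SimpleFieldSchema> subFields;

    SimpleFieldSchema(String name, Type type, SimpleFieldSchema... subFields) {
        this.name = name;
        this.type = type;
        this.subFields = Collections.unmodifiableList(Arrays.asList(subFields));
    }

    // Sketch of the rule: if a bag has a subschema, it must consist of
    // exactly one tuple field. The tuple may itself hold several subfields.
    // Assumption: a bag with no subschema (unknown schema) is accepted.
    static void validateBagSchema(SimpleFieldSchema field) {
        if (field.type != Type.BAG || field.subFields.isEmpty()) {
            return;
        }
        if (field.subFields.size() != 1 || field.subFields.get(0).type != Type.TUPLE) {
            throw new IllegalArgumentException(
                "bag field '" + field.name + "' must have a tuple as its only subfield");
        }
    }
}
```

A bag `{(x:int, y:chararray)}` passes, while a bag whose only subfield is a bare `int` is rejected.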




[jira] Created: (PIG-1262) Additional findbugs and javac warnings

2010-02-25 Thread Daniel Dai (JIRA)
Additional findbugs and javac warnings
--

 Key: PIG-1262
 URL: https://issues.apache.org/jira/browse/PIG-1262
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0





[jira] Updated: (PIG-1262) Additional findbugs and javac warnings

2010-02-25 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1262:


Status: Patch Available  (was: Open)

 Additional findbugs and javac warnings
 --

 Key: PIG-1262
 URL: https://issues.apache.org/jira/browse/PIG-1262
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: PIG-1262-1.patch





[jira] Updated: (PIG-1262) Additional findbugs and javac warnings

2010-02-25 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1262:


Attachment: PIG-1262-1.patch

 Additional findbugs and javac warnings
 --

 Key: PIG-1262
 URL: https://issues.apache.org/jira/browse/PIG-1262
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: PIG-1262-1.patch





[jira] Updated: (PIG-613) Casting elements inside a tuple does not take effect

2010-02-24 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-613:
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Patch committed.

 Casting elements inside a tuple does not take effect
 

 Key: PIG-613
 URL: https://issues.apache.org/jira/browse/PIG-613
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
Reporter: Viraj Bhat
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: myfloatdata.txt, PIG-613-1.patch, PIG-613-2.patch, 
 SQUARE.java


 Consider the following Pig script, which casts the return values of the SQUARE 
 UDF (tuples of doubles) to long. The describe output of B shows the field is 
 long; however, the result is still double.
 {code}
 register statistics.jar;
 A = load 'myfloatdata.txt' using PigStorage() as (doublecol:double);
 B = foreach A generate (tuple(long))statistics.SQUARE(doublecol) as 
 squares:(loadtimesq);
 describe B;
 explain B;
 dump B;
 {code}
 ===
 Describe output of B:
 B: {squares: (loadtimesq: long)}
 ===
 Sample output of B:
 ((7885.44))
 ((792098.2200010001))
 ((1497360.9268889998))
 ((50023.7956))
 ((0.972196))
 ((0.30980356))
 ((9.9760144E-7))
 ===
 Cause: The cast for Tuples has not been implemented in POCast.java
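The missing piece is a cast that walks into the tuple and converts each element to the type the target schema declares. The sketch below shows that idea on plain `java.util.List` values; `TupleCastSketch.castTupleToLongs` is a hypothetical helper, not the actual `POCast` code, and it assumes Java's narrowing conversion (truncation toward zero) for double-to-long.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating an element-wise tuple cast,
// e.g. (tuple(long)) applied to a tuple holding doubles.
final class TupleCastSketch {
    static List<Object> castTupleToLongs(List<?> tuple) {
        List<Object> out = new ArrayList<>(tuple.size());
        for (Object field : tuple) {
            if (field == null) {
                out.add(null);                         // nulls stay null
            } else if (field instanceof Number) {
                out.add(((Number) field).longValue()); // narrows, truncating toward zero
            } else {
                throw new ClassCastException(
                    "cannot cast " + field.getClass().getName() + " to long");
            }
        }
        return out;
    }
}
```

Under that truncation assumption, the first sample row `((7885.44))` would become `((7885))`, matching the type that `describe B` already reports.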




[jira] Updated: (PIG-1255) Tiny code cleanup for serialization code for PigSplit

2010-02-24 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1255:


  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

No test is included since this patch does not add any new feature. Patch 
committed.

 Tiny code cleanup for serialization code for PigSplit
 -

 Key: PIG-1255
 URL: https://issues.apache.org/jira/browse/PIG-1255
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: PIG-1255-1.patch, PIG-1255-2.patch


 A bug that closes the output stream during serialization.
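A `write` method that closes the stream it was handed breaks any later writes the caller makes to that stream (for example, a file or socket stream). The sketch below shows the intended pattern with a hypothetical `SplitInfo` class standing in for `PigSplit`: serialize into the caller's `DataOutput` and leave closing to the owner of the stream. It illustrates the pattern only; it is not the patch's code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in for a serializable split; not Pig's PigSplit.
final class SplitInfo {
    String path;
    long length;

    SplitInfo() {}

    SplitInfo(String path, long length) {
        this.path = path;
        this.length = length;
    }

    // Serialize into the caller's stream. The stream is NOT closed here:
    // the caller owns it and may write further objects after this one.
    void write(DataOutput out) throws IOException {
        out.writeUTF(path);
        out.writeLong(length);
    }

    void readFields(DataInput in) throws IOException {
        path = in.readUTF();
        length = in.readLong();
    }

    // Writes two splits to one stream, closes it once (as its owner),
    // then reads both back.
    static SplitInfo[] roundTrip(SplitInfo a, SplitInfo b) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            a.write(out);
            b.write(out);
        }
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        SplitInfo ra = new SplitInfo();
        SplitInfo rb = new SplitInfo();
        ra.readFields(in);
        rb.readFields(in);
        return new SplitInfo[] {ra, rb};
    }
}
```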




[jira] Updated: (PIG-613) Casting complex type(tuple/bag/map) does not take effect

2010-02-24 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-613:
---

Summary: Casting complex type(tuple/bag/map) does not take effect  (was: 
Casting elements inside a tuple does not take effect)

 Casting complex type(tuple/bag/map) does not take effect
 

 Key: PIG-613
 URL: https://issues.apache.org/jira/browse/PIG-613
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
Reporter: Viraj Bhat
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: myfloatdata.txt, PIG-613-1.patch, PIG-613-2.patch, 
 SQUARE.java





[jira] Updated: (PIG-1255) Tiny code cleanup for serialization code for PigSplit

2010-02-23 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1255:


Status: Patch Available  (was: Open)

 Tiny code cleanup for serialization code for PigSplit
 -

 Key: PIG-1255
 URL: https://issues.apache.org/jira/browse/PIG-1255
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: PIG-1255-1.patch





[jira] Updated: (PIG-613) Casting elements inside a tuple does not take effect

2010-02-23 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-613:
---

Status: Patch Available  (was: Open)

 Casting elements inside a tuple does not take effect
 

 Key: PIG-613
 URL: https://issues.apache.org/jira/browse/PIG-613
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
Reporter: Viraj Bhat
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: myfloatdata.txt, PIG-613-1.patch, PIG-613-2.patch, 
 SQUARE.java





[jira] Commented: (PIG-1016) Reading in map data seems broken

2010-02-23 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12837574#action_12837574
 ] 

Daniel Dai commented on PIG-1016:
-

Hi, busy,
I checked your code; it seems your patch assumes PIG-1016.patch is checked in. 
If I understand correctly, there is an inconsistency in this approach. Your 
code allows a map value to be any type. However, internally Pig always assumes 
a map value is a bytearray, so Pig will choose PigBytesRawComparator, and you 
further modify PigBytesRawComparator to handle all data types. This logic is 
very confusing. Further, TextDataParser itself is bogus, since it guesses the 
data type based on the content.

In PIG-613, we reiterate that a map value is a bytearray. However, we fixed the 
code so that it can cast a bytearray to map/tuple/bag correctly. I verified the 
test case you gave, and it works.

{code}
A= load '9.txt' as (data:map[]);
B= foreach A generate (int)(data#'a'), 
(chararray)(data#'b'),(tuple(map[]))(data#'c');
C= order B by $0;
dump C;
{code}
Result:
(1,'a',(1,2,3))
(2,'d',(1,2,3))
(3,'c',(1,2,3))

{code}
D= order B by $1;
dump D;
{code}
Result:
(1,'a',(1,2,3))
(3,'c',(1,2,3))
(2,'d',(1,2,3))

{code}
describe B;
{code}
Result:
B: {int,chararray,(map[ ])}

Do you have other use cases which PIG-613 cannot address?
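The position in this comment (with PigStorage, a map value stays a raw bytearray and is converted only when the script casts it) can be sketched as lazy conversion. `LazyMapValues` and its methods are hypothetical names, not Utf8StorageConverter's real logic; returning null for text that does not parse is a simplifying assumption of the sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: map values are stored as raw bytes at load
// time and converted only when a cast asks for a concrete type.
final class LazyMapValues {
    private final Map<String, byte[]> raw = new HashMap<>();

    // What a text loader would keep for data#'key': just the bytes.
    void put(String key, String text) {
        raw.put(key, text.getBytes(StandardCharsets.UTF_8));
    }

    byte[] getBytes(String key) {
        return raw.get(key);
    }

    // Roughly analogous to (int)(data#'key'). Returns null for a missing
    // key or for text that does not parse as an int.
    Integer castToInt(String key) {
        byte[] b = raw.get(key);
        if (b == null) {
            return null;
        }
        try {
            return Integer.valueOf(new String(b, StandardCharsets.UTF_8).trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Roughly analogous to (chararray)(data#'key').
    String castToChararray(String key) {
        byte[] b = raw.get(key);
        return b == null ? null : new String(b, StandardCharsets.UTF_8);
    }
}
```

A custom loader that wants typed map values, the use case raised in the follow-up comment, would instead store already-typed objects rather than bytes.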

 Reading in map data seems broken
 

 Key: PIG-1016
 URL: https://issues.apache.org/jira/browse/PIG-1016
 Project: Pig
  Issue Type: Improvement
  Components: data
Affects Versions: 0.4.0
Reporter: hc busy
 Attachments: PIG-1016.patch


 Hi, I'm trying to load a map that has a tuple for a value. The read fails in 
 0.4.0 because of a misconfiguration in the parser, whereas almost all 
 documentation states that the value of a map can be any type.
 I've attached a patch that allows us to read in complex objects as values, as 
 documented. I've done simple verification of loading maps with tuple/map 
 values and writing them back out using LOAD and STORE. All seems to work fine.




[jira] Commented: (PIG-1016) Reading in map data seems broken

2010-02-23 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12837643#action_12837643
 ] 

Daniel Dai commented on PIG-1016:
-

Hi, busy,
I think I finally understand what you mean. You want to write a loader that 
can put values of any type into the map, right? Then I think it is a valid use 
case. What I was talking about is that if you use PigStorage to load data, the 
map value is always a bytearray.

 Reading in map data seems broken
 

 Key: PIG-1016
 URL: https://issues.apache.org/jira/browse/PIG-1016
 Project: Pig
  Issue Type: Improvement
  Components: data
Affects Versions: 0.4.0
Reporter: hc busy
 Attachments: PIG-1016.patch





[jira] Updated: (PIG-613) Casting elements inside a tuple does not take effect

2010-02-22 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-613:
---

Attachment: PIG-613-1.patch

 Casting elements inside a tuple does not take effect
 

 Key: PIG-613
 URL: https://issues.apache.org/jira/browse/PIG-613
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
Reporter: Viraj Bhat
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: myfloatdata.txt, PIG-613-1.patch, SQUARE.java





[jira] Updated: (PIG-613) Casting elements inside a tuple does not take effect

2010-02-22 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-613:
---

Status: Patch Available  (was: Open)

 Casting elements inside a tuple does not take effect
 

 Key: PIG-613
 URL: https://issues.apache.org/jira/browse/PIG-613
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
Reporter: Viraj Bhat
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: myfloatdata.txt, PIG-613-1.patch, SQUARE.java





[jira] Updated: (PIG-613) Casting elements inside a tuple does not take effect

2010-02-22 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-613:
---

Attachment: PIG-613-1.patch

 Casting elements inside a tuple does not take effect
 

 Key: PIG-613
 URL: https://issues.apache.org/jira/browse/PIG-613
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
Reporter: Viraj Bhat
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: myfloatdata.txt, PIG-613-1.patch, SQUARE.java





[jira] Updated: (PIG-613) Casting elements inside a tuple does not take effect

2010-02-22 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-613:
---

Status: Open  (was: Patch Available)

 Casting elements inside a tuple does not take effect
 

 Key: PIG-613
 URL: https://issues.apache.org/jira/browse/PIG-613
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
Reporter: Viraj Bhat
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: myfloatdata.txt, PIG-613-1.patch, SQUARE.java





[jira] Updated: (PIG-613) Casting elements inside a tuple does not take effect

2010-02-22 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-613:
---

Attachment: (was: PIG-613-1.patch)

 Casting elements inside a tuple does not take effect
 

 Key: PIG-613
 URL: https://issues.apache.org/jira/browse/PIG-613
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
Reporter: Viraj Bhat
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: myfloatdata.txt, PIG-613-1.patch, SQUARE.java





[jira] Updated: (PIG-613) Casting elements inside a tuple does not take effect

2010-02-22 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-613:
---

Status: Patch Available  (was: Open)

 Casting elements inside a tuple does not take effect
 

 Key: PIG-613
 URL: https://issues.apache.org/jira/browse/PIG-613
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
Reporter: Viraj Bhat
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: myfloatdata.txt, PIG-613-1.patch, SQUARE.java





[jira] Commented: (PIG-1016) Reading in map data seems broken

2010-02-22 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12837106#action_12837106
 ] 

Daniel Dai commented on PIG-1016:
-

This issue should be fixed as part of the effort in 
[PIG-613|https://issues.apache.org/jira/browse/PIG-613]. hc busy, can you check 
if that patch addresses your issue?

 Reading in map data seems broken
 

 Key: PIG-1016
 URL: https://issues.apache.org/jira/browse/PIG-1016
 Project: Pig
  Issue Type: Improvement
  Components: data
Affects Versions: 0.4.0
Reporter: hc busy
 Attachments: PIG-1016.patch





[jira] Commented: (PIG-1247) Error Number makes it hard to debug: ERROR 2999: Unexpected internal error. org.apache.pig.backend.datastorage.DataStorageException cannot be cast to java.lang.Error

2010-02-19 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12836107#action_12836107
 ] 

Daniel Dai commented on PIG-1247:
-

This error-handling code is hard-coded by javacc. It seems we currently have 
no way to work around it.
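The failure mode can be reproduced in a few lines. The catch block below resembles the pattern JJTree-generated parsers use for node scopes: rethrow `RuntimeException` and the parser's `ParseException`, and cast anything else to `Error`. A checked exception such as `DataStorageException` therefore surfaces as a `ClassCastException`. `ParseExceptionStandIn`, `DataStorageExceptionStandIn` and `GeneratedCatchDemo` are hypothetical local stand-ins, not Pig's classes.

```java
// Local stand-ins for the real exception types (hypothetical names).
class ParseExceptionStandIn extends Exception {
    ParseExceptionStandIn(String message) { super(message); }
}

class DataStorageExceptionStandIn extends Exception {
    DataStorageExceptionStandIn(String message) { super(message); }
}

final class GeneratedCatchDemo {
    // Hypothetical store-clause action that fails a permission check.
    static void checkOutputLocation() throws DataStorageExceptionStandIn {
        throw new DataStorageExceptionStandIn("Permission denied: /user/secure/pigtest");
    }

    // Mimics the generated handler: anything that is neither a
    // RuntimeException nor a parse exception is cast to Error.
    static void storeClause() throws ParseExceptionStandIn {
        try {
            checkOutputLocation();
        } catch (Throwable t) {
            if (t instanceof RuntimeException) {
                throw (RuntimeException) t;
            }
            if (t instanceof ParseExceptionStandIn) {
                throw (ParseExceptionStandIn) t;
            }
            throw (Error) t; // a checked exception here causes ClassCastException
        }
    }

    // Reports which exception type actually escapes storeClause().
    static String escapingExceptionType() {
        try {
            storeClause();
            return "none";
        } catch (Throwable t) {
            return t.getClass().getName();
        }
    }
}
```

The original permission error is lost; only the `ClassCastException` escapes, which matches the stack trace in the report below.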

 Error Number makes it hard to debug: ERROR 2999: Unexpected internal error. 
 org.apache.pig.backend.datastorage.DataStorageException cannot be cast to 
 java.lang.Error
 -

 Key: PIG-1247
 URL: https://issues.apache.org/jira/browse/PIG-1247
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat
 Fix For: 0.7.0


 I have a large script with intermediate store statements; one 
 of them writes to a directory I do not have permission to write to. 
 The stack trace I get from Pig is this:
 2010-02-20 02:16:32,055 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. org.apache.pig.backend.datastorage.DataStorageException cannot be cast to java.lang.Error
 Details at logfile: /home/viraj/pig_1266632145355.log
 Pig Stack Trace
 ---
 ERROR 2999: Unexpected internal error. org.apache.pig.backend.datastorage.DataStorageException cannot be cast to java.lang.Error
 java.lang.ClassCastException: org.apache.pig.backend.datastorage.DataStorageException cannot be cast to java.lang.Error
     at org.apache.pig.impl.logicalLayer.parser.QueryParser.StoreClause(QueryParser.java:3583)
     at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1407)
     at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:949)
     at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:762)
     at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
     at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1036)
     at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:986)
     at org.apache.pig.PigServer.registerQuery(PigServer.java:386)
     at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:720)
     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
     at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
     at org.apache.pig.Main.main(Main.java:386)
 
 The only way to find the error was to look at the javacc generated 
 QueryParser.java code and do a System.out.println()
 Here is a script to reproduce the problem:
 {code}
 A = load '/user/viraj/three.txt' using PigStorage();
 B = foreach A generate ['a'#'12'] as b:map[] ;
 store B into '/user/secure/pigtest' using PigStorage();
 {code}
 three.txt has 3 lines which contain nothing but the number 1.
 {code}
 $ hadoop fs -ls /user/secure/
 ls: could not get get listing for 'hdfs://mynamenode/user/secure' : 
 org.apache.hadoop.security.AccessControlException: Permission denied: 
 user=viraj, access=READ_EXECUTE, inode=secure:secure:users:rwx--
 {code}
 Viraj
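The trace fails on a cast inside QueryParser.StoreClause. JJTree-generated parsers typically end their catch-all blocks by rethrowing RuntimeException and ParseException and then doing `throw (Error) t` for anything else, so any other checked exception becomes a ClassCastException that hides the original message. Whether Pig's generated code follows exactly this pattern is an assumption here. The sketch below reproduces it with hypothetical stand-ins (StorageException, ParseException, checkOutputLocation) and contrasts it with wrapping the checked exception so the permission error reaches the user.

```java
/**
 * Sketch with hypothetical names. Mimics the catch-all rethrow pattern that
 * JJTree-generated parsers typically emit, to show why a checked exception
 * surfaces as "X cannot be cast to java.lang.Error".
 */
class CastMaskingDemo {

    /** Stand-in for a checked storage exception such as DataStorageException. */
    static class StorageException extends Exception {
        StorageException(String msg) { super(msg); }
    }

    /** Stand-in for the parser's own checked exception type. */
    static class ParseException extends Exception {
        ParseException(String msg) { super(msg); }
    }

    /** Hypothetical permission check: every path under /user/secure is unwritable. */
    static void checkOutputLocation(String path) throws StorageException {
        if (path.startsWith("/user/secure")) {
            throw new StorageException("Permission denied: " + path);
        }
    }

    /** Generated-code style: anything that is neither a RuntimeException nor a
     *  ParseException is assumed to be an Error, so the cast itself fails. */
    static void storeClauseGenerated(String path) throws ParseException {
        try {
            checkOutputLocation(path);
        } catch (Throwable t) {
            if (t instanceof RuntimeException) throw (RuntimeException) t;
            if (t instanceof ParseException) throw (ParseException) t;
            throw (Error) t; // StorageException -> ClassCastException; original message lost
        }
    }

    /** Safer style: convert the checked exception into the declared exception
     *  type and keep the original as the cause. */
    static void storeClauseSafe(String path) throws ParseException {
        try {
            checkOutputLocation(path);
        } catch (StorageException e) {
            ParseException pe = new ParseException("Cannot store into " + path + ": " + e.getMessage());
            pe.initCause(e);
            throw pe;
        }
    }
}
```

With the safer style, the store into '/user/secure/pigtest' would report "Permission denied" instead of ERROR 2999.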

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1169) Top-N queries produce incorrect results when a store statement is added between order by and limit statement

2010-02-17 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834966#action_12834966
 ] 

Daniel Dai commented on PIG-1169:
-

+1

 Top-N queries produce incorrect results when a store statement is added 
 between order by and limit statement
 

 Key: PIG-1169
 URL: https://issues.apache.org/jira/browse/PIG-1169
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.7.0

 Attachments: PIG-1169.patch


 ??We tried to get top N results after a groupby and sort, and got different 
 results with or without storing the full sorted results. Here is a skeleton 
 of our pig script.??
 {code}
 raw_data = Load 'input_files' AS (f1, f2, ..., fn);
 grouped = group raw_data by (f1, f2);
 data = foreach grouped generate FLATTEN(group), SUM(raw_data.fk) as value;
 ordered = order data by value DESC parallel 10;
 topn = limit ordered 10;
 store ordered into 'outputdir/full';
 store topn into 'outputdir/topn';
 {code}
 ??With the statement 'store ordered ...', the top N results are incorrect, but 
 without it the results are correct. Has anyone seen this before? I 
 know a similar bug was fixed in the multi-query release. We are on Pig 
 0.4 and Hadoop 0.20.1.??
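To tell which of the two outputs is wrong, it can help to compute the expected top N independently of Pig. A minimal sketch, assuming each input row has already been reduced to three string fields {f1, f2, fk} (a hypothetical layout) and following the same group, SUM, order DESC, limit steps as the script. Rows with equal sums may legitimately come out in either order, and a tie at the Nth position can change which rows are included, so ties should be ignored when comparing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Reference top-N for: group by (f1, f2) -> SUM(fk) -> order by value DESC -> limit n. */
class TopNReference {

    /** rows are {f1, f2, fk}; returns at most n (key, sum) pairs, largest sum first. */
    static List<Map.Entry<String, Long>> topN(List<String[]> rows, int n) {
        // group by (f1, f2) and SUM(fk); the key joins f1 and f2 with a tab
        Map<String, Long> sums = new HashMap<>();
        for (String[] r : rows) {
            sums.merge(r[0] + "\t" + r[1], Long.parseLong(r[2]), Long::sum);
        }
        // order by value DESC, then keep the first n entries
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(sums.entrySet());
        sorted.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }
}
```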




[jira] Commented: (PIG-1238) Dump does not respect the schema

2010-02-17 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12835086#action_12835086
 ] 

Daniel Dai commented on PIG-1238:
-

Running explain, the last limit job is:

MapReduce node 1-99
Map Plan
Local Rearrange[tuple]{double}(false) - 1-103
|   |
|   Project[double][1] - 1-102
|
|---Limit - 1-101
    |
    |---Load(file:/tmp/temp-513510662/tmp1311900615:org.apache.pig.builtin.BinStorage) - 1-100
Reduce Plan
Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-109
|
|---Limit - 1-108
    |
    |---New For Each(true)[bag] - 1-107
        |   |
        |   Project[tuple][1] - 1-106
        |
        |---Package[tuple]{double} - 1-105
Global sort: false

The project in the map plan is wrong.

 Dump does not respect the schema
 

 Key: PIG-1238
 URL: https://issues.apache.org/jira/browse/PIG-1238
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Ankur

 For complex data types and certain sequences of operations, dump produces 
 results containing a field that does not exist in the relation.




[jira] Commented: (PIG-1238) Dump does not respect the schema

2010-02-16 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834422#action_12834422
 ] 

Daniel Dai commented on PIG-1238:
-

Hi Ankur, I get a syntax error on this statement: B = FOREACH A GENERATE 'a'#'12' as 
b:map[], ['b'#'c'#'12'] as mapFields; Can you verify the script?

 Dump does not respect the schema
 

 Key: PIG-1238
 URL: https://issues.apache.org/jira/browse/PIG-1238
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Ankur





[jira] Commented: (PIG-1231) Default DataBagIterator.hasNext() should be idempotent in all cases

2010-02-12 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833129#action_12833129
 ] 

Daniel Dai commented on PIG-1231:
-

Hudson does not seem to be running the tests at all. Manual tests pass on 
both trunk and the 0.6 branch. No new test case is included since this is a fix to 
an existing test case.

 Default DataBagIterator.hasNext() should be idempotent in all cases
 ---

 Key: PIG-1231
 URL: https://issues.apache.org/jira/browse/PIG-1231
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1231-1.patch, PIG-1231-2.patch


 DefaultDataBagIterator.hasNext() is not repeatable when both of the following 
 conditions are met:
 1. There are no more tuples in the last spill file.
 2. There are no tuples in memory (all contents have been spilled to files).
 This is not acceptable because the name hasNext() implies that it is idempotent. 
 In BagFormat, we misuse DataBagIterator.hasNext() by assuming that 
 hasNext() is always idempotent, which leads to some mysterious errors. 
 Condition 2 seems very restrictive, but when the data bag is very large and 
 memory can hold fewer than a couple of tuples, the chance of hitting it is 
 high enough to matter.
 Here is one error we saw:
 Caused by: java.io.IOException: Stream closed
 at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:145)
 at java.io.BufferedInputStream.fill(BufferedInputStream.java:189)
 at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
 at java.io.DataInputStream.readByte(DataInputStream.java:248)
 at org.apache.pig.data.DefaultTuple.readFields(DefaultTuple.java:278)
 at org.apache.pig.data.DefaultDataBag$DefaultDataBagIterator.readFromFile(DefaultDataBag.java:237)
 ... 20 more
 This happens because we call hasNext(), which reaches EOF and closes the 
 file. We then call hasNext() again on the assumption that it is idempotent, 
 but the stream is already closed, so we get this error.
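One way to make hasNext() idempotent is to buffer the next record and remember EOF in a flag, so a repeated call never reads from a stream that has already been closed. This is a minimal sketch, not the committed PIG-1231 patch: SpillFileIterator is a hypothetical class that reads ints from a DataInputStream in place of Pig tuples.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Iterator over records in a spill stream whose hasNext() can be called any number of times. */
class SpillFileIterator implements Iterator<Integer> {
    private final DataInputStream in;
    private Integer buffered;   // next record: already read, not yet returned
    private boolean exhausted;  // set once EOF was seen and the stream closed

    SpillFileIterator(InputStream stream) {
        this.in = new DataInputStream(stream);
    }

    @Override
    public boolean hasNext() {
        if (buffered != null) return true;  // already looked ahead
        if (exhausted) return false;        // EOF seen earlier: never touch the closed stream
        try {
            buffered = in.readInt();
            return true;
        } catch (EOFException eof) {
            exhausted = true;
            closeQuietly();
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public Integer next() {
        if (!hasNext()) throw new NoSuchElementException();
        Integer result = buffered;
        buffered = null; // consume the buffered record
        return result;
    }

    private void closeQuietly() {
        try {
            in.close();
        } catch (IOException ignored) {
            // nothing useful to do on close failure in this sketch
        }
    }
}
```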




[jira] Updated: (PIG-1231) Default DataBagIterator.hasNext() should be idempotent in all cases

2010-02-12 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1231:


  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Patch committed to both trunk and the 0.6 branch.

 Default DataBagIterator.hasNext() should be idempotent in all cases
 ---

 Key: PIG-1231
 URL: https://issues.apache.org/jira/browse/PIG-1231
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1231-1.patch, PIG-1231-2.patch






[jira] Updated: (PIG-1231) Default DataBagIterator.hasNext() should be idempotent in all cases

2010-02-11 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1231:


Hadoop Flags:   (was: [Reviewed])
  Status: Patch Available  (was: Reopened)

 Default DataBagIterator.hasNext() should be idempotent in all cases
 ---

 Key: PIG-1231
 URL: https://issues.apache.org/jira/browse/PIG-1231
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1231-1.patch, PIG-1231-2.patch






[jira] Updated: (PIG-1231) Default DataBagIterator.hasNext() should be idempotent in all cases

2010-02-11 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1231:


Attachment: PIG-1231-2.patch

 Default DataBagIterator.hasNext() should be idempotent in all cases
 ---

 Key: PIG-1231
 URL: https://issues.apache.org/jira/browse/PIG-1231
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1231-1.patch, PIG-1231-2.patch






[jira] Reopened: (PIG-1231) Default DataBagIterator.hasNext() should be idempotent in all cases

2010-02-11 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reopened PIG-1231:
-


There is a unit test failure in the 0.6 branch.

 Default DataBagIterator.hasNext() should be idempotent in all cases
 ---

 Key: PIG-1231
 URL: https://issues.apache.org/jira/browse/PIG-1231
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1231-1.patch, PIG-1231-2.patch






[jira] Commented: (PIG-1190) Handling of quoted strings in pig-latin/grunt commands

2010-02-11 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832805#action_12832805
 ] 

Daniel Dai commented on PIG-1190:
-

+1 for the new change.

 Handling of quoted strings in pig-latin/grunt commands
 --

 Key: PIG-1190
 URL: https://issues.apache.org/jira/browse/PIG-1190
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Ashutosh Chauhan
 Fix For: 0.7.0

 Attachments: correct-testcase.patch, pig-1190.patch, pig-1190_1.patch


 There is some inconsistency in the way quoted strings are handled in 
 Pig Latin.
 In load/store and define-ship commands, files are specified in quoted strings, 
 and the file name is the content within the quotes. But for register, 
 set, and file system commands, if a string is specified in quotes, the 
 quotes are included as part of the string. This is not only 
 inconsistent, it is also unintuitive. 
 It is also inconsistent with the way the HDFS command line (or the bash shell) 
 interprets file names.
 For example, currently with the command
 set job.name 'job123'
 the job name is set to 'job123' (including the quotes), not job123.
 This needs to be fixed: the above command should be considered equivalent to
 set job.name job123
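A minimal sketch of the proposed behavior, using a hypothetical helper (QuotedArg.unquote) rather than Pig's actual parser: strip one pair of surrounding single quotes so that `set job.name 'job123'` and `set job.name job123` yield the same value. Treating `\'` inside the quotes as an escaped quote is an assumption; the issue does not specify an escaping rule.

```java
/** Hypothetical helper: normalizes an argument of register, set, or a file system command. */
class QuotedArg {
    static String unquote(String token) {
        if (token.length() >= 2 && token.startsWith("'") && token.endsWith("'")) {
            String inner = token.substring(1, token.length() - 1);
            // Assumed escaping rule: \' stands for a literal single quote.
            return inner.replace("\\'", "'");
        }
        return token; // unquoted tokens pass through unchanged
    }
}
```

Applying the same helper wherever load/store already strip quotes would keep the two code paths consistent.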




[jira] Assigned: (PIG-1169) Top-N queries produce incorrect results when a store statement is added between order by and limit statement

2010-02-10 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reassigned PIG-1169:
---

Assignee: Richard Ding  (was: Daniel Dai)

 Top-N queries produce incorrect results when a store statement is added 
 between order by and limit statement
 

 Key: PIG-1169
 URL: https://issues.apache.org/jira/browse/PIG-1169
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.7.0






[jira] Commented: (PIG-1231) Default DataBagIterator.hasNext() should be idempotent in all cases

2010-02-09 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831558#action_12831558
 ] 

Daniel Dai commented on PIG-1231:
-

testCompressed1 fails with java.lang.IllegalArgumentException: port out of range:-1. 
This is not a real problem; the test passes when run manually.

 Default DataBagIterator.hasNext() should be idempotent in all cases
 ---

 Key: PIG-1231
 URL: https://issues.apache.org/jira/browse/PIG-1231
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1231-1.patch






[jira] Updated: (PIG-1231) Default DataBagIterator.hasNext() should be idempotent in all cases

2010-02-09 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1231:


  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Patch committed to both trunk and the 0.6 branch.

 Default DataBagIterator.hasNext() should be idempotent in all cases
 ---

 Key: PIG-1231
 URL: https://issues.apache.org/jira/browse/PIG-1231
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1231-1.patch






[jira] Created: (PIG-1231) DataBagIterator.hasNext() should be idempotent

2010-02-08 Thread Daniel Dai (JIRA)
DataBagIterator.hasNext() should be idempotent
--

 Key: PIG-1231
 URL: https://issues.apache.org/jira/browse/PIG-1231
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0


The current implementation of DataBagIterator.hasNext() actually fetches the 
next tuple every time. So if we call hasNext() several times in a row, more than one 
tuple is fetched. This is confusing because the name hasNext() implies that 
it is idempotent. In BagFormat, we misuse DataBagIterator.hasNext() because 
of this, which leads to some mysterious errors. Here is one error we saw:

Caused by: java.io.IOException: Stream closed
at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:145)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:189)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at java.io.DataInputStream.readByte(DataInputStream.java:248)
at org.apache.pig.data.DefaultTuple.readFields(DefaultTuple.java:278)
at org.apache.pig.data.DefaultDataBag$DefaultDataBagIterator.readFromFile(DefaultDataBag.java:237)
... 20 more

This happens because we call hasNext(), which reaches EOF and closes the file. 
We then call hasNext() again on the assumption that it is idempotent, but 
the stream is already closed, so we get this error.

The fix will go into DefaultDataBagIterator, DistinctDataBagIterator, 
CachedBagIterator, and SortedDataBagIterator. 
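Since the fix touches several iterator classes, a generic check that calls hasNext() repeatedly before each next(), and again after exhaustion, can exercise all of them the same way. This is a sketch with a hypothetical helper, not part of Pig's test suite: it throws if consecutive hasNext() calls disagree and returns the drained elements so the caller can compare them with the expected bag contents (an iterator whose hasNext() consumes input may also show up as missing elements).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical test helper for checking that hasNext() is idempotent. */
class HasNextIdempotencyCheck {

    /** Drains it, calling hasNext() `repeats` times before every next() and at the end. */
    static <T> List<T> drainWithRepeatedHasNext(Iterator<T> it, int repeats) {
        List<T> out = new ArrayList<>();
        while (true) {
            boolean first = it.hasNext();
            for (int i = 1; i < repeats; i++) {
                if (it.hasNext() != first) {
                    throw new IllegalStateException("hasNext() changed between consecutive calls");
                }
            }
            if (!first) return out; // exhausted: the repeated false calls were checked too
            out.add(it.next());
        }
    }
}
```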




[jira] Updated: (PIG-1231) DataBagIterator.hasNext() should be idempotent

2010-02-08 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1231:


Status: Patch Available  (was: Open)

 DataBagIterator.hasNext() should be idempotent
 --

 Key: PIG-1231
 URL: https://issues.apache.org/jira/browse/PIG-1231
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1231-1.patch


 DataBagIterator.hasNext() is not repeatable in some situations. This is not 
 acceptable because the name hasNext() implies that it is idempotent. While 
 hasNext() returns true, it is repeatable, but once hasNext() returns false, it 
 is not. In BagFormat, we misuse DataBagIterator.hasNext() by assuming 
 that hasNext() is always idempotent, which leads to some 
 mysterious errors. Here is one error we saw:
 Caused by: java.io.IOException: Stream closed
 at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:145)
 at java.io.BufferedInputStream.fill(BufferedInputStream.java:189)
 at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
 at java.io.DataInputStream.readByte(DataInputStream.java:248)
 at org.apache.pig.data.DefaultTuple.readFields(DefaultTuple.java:278)
 at org.apache.pig.data.DefaultDataBag$DefaultDataBagIterator.readFromFile(DefaultDataBag.java:237)
 ... 20 more
 This happens because we call hasNext(), which reaches EOF and closes the 
 file. We then call hasNext() again on the assumption that it is idempotent, 
 but the stream is already closed, so we get this error.
 The fix will go into DefaultDataBagIterator, DistinctDataBagIterator, 
 CachedBagIterator, and SortedDataBagIterator. 




[jira] Updated: (PIG-1231) DataBagIterator.hasNext() should be idempotent

2010-02-08 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1231:


Attachment: PIG-1231-1.patch

DefaultDataBagIterator is the only DataBag iterator that has this problem. The other 
data bags handle this through different mechanisms. 

 DataBagIterator.hasNext() should be idempotent
 --

 Key: PIG-1231
 URL: https://issues.apache.org/jira/browse/PIG-1231
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1231-1.patch






[jira] Updated: (PIG-1231) Default DataBagIterator.hasNext() should be idempotent in all cases

2010-02-08 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1231:


Description: 
DefaultDataBagIterator.hasNext() is not repeatable when both of the following 
conditions are met:
1. There are no more tuples in the last spill file.
2. There are no tuples in memory (all contents have been spilled to files).

This is not acceptable because the name hasNext() implies that it is idempotent. In 
BagFormat, we misuse DataBagIterator.hasNext() by assuming that 
hasNext() is always idempotent, which leads to some mysterious errors. 

Condition 2 seems very restrictive, but when the data bag is very large and 
memory can hold fewer than a couple of tuples, the chance of hitting it is high 
enough to matter.

Here is one error we saw:

Caused by: java.io.IOException: Stream closed
at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:145)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:189)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at java.io.DataInputStream.readByte(DataInputStream.java:248)
at org.apache.pig.data.DefaultTuple.readFields(DefaultTuple.java:278)
at org.apache.pig.data.DefaultDataBag$DefaultDataBagIterator.readFromFile(DefaultDataBag.java:237)
... 20 more

This happens because we call hasNext(), which reaches EOF and closes the file. 
We then call hasNext() again on the assumption that it is idempotent, but 
the stream is already closed, so we get this error.

  was:
DataBagIterator.hasNext() is not repeatable in some situations. This is not 
acceptable because the name hasNext() implies that it is idempotent. While 
hasNext() returns true, it is repeatable, but once hasNext() returns false, it is 
not. In BagFormat, we misuse DataBagIterator.hasNext() by assuming 
that hasNext() is always idempotent, which leads to some mysterious 
errors. Here is one error we saw:

Caused by: java.io.IOException: Stream closed
at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:145)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:189)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at java.io.DataInputStream.readByte(DataInputStream.java:248)
at org.apache.pig.data.DefaultTuple.readFields(DefaultTuple.java:278)
at org.apache.pig.data.DefaultDataBag$DefaultDataBagIterator.readFromFile(DefaultDataBag.java:237)
... 20 more

This happens because we call hasNext(), which reaches EOF and closes the file. 
We then call hasNext() again on the assumption that it is idempotent, but 
the stream is already closed, so we get this error.

The fix will go into DefaultDataBagIterator, DistinctDataBagIterator, 
CachedBagIterator, and SortedDataBagIterator. 

Summary: Default DataBagIterator.hasNext() should be idempotent in all 
cases  (was: DataBagIterator.hasNext() should be idempotent)

 Default DataBagIterator.hasNext() should be idempotent in all cases
 ---

 Key: PIG-1231
 URL: https://issues.apache.org/jira/browse/PIG-1231
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1231-1.patch


 DefaultDataBagIterator.hasNext() is not repeatable when both of the following
 conditions are met:
 1. There are no more tuples in the last spill file.
 2. There are no tuples in memory (all contents have been spilled to files).
 This is not acceptable because the name hasNext() implies that it is
 idempotent. In BagFormat, we misuse DataBagIterator.hasNext() by assuming
 that hasNext() is always idempotent, which leads to some mysterious errors.
 Condition 2 seems very restrictive, but when the databag is really big, the
 memory can hold fewer than a couple of tuples, so the chance of hitting
 condition 2 is high enough.
 Here is one error we saw:
 Caused by: java.io.IOException: Stream closed
 at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:145)
 at java.io.BufferedInputStream.fill(BufferedInputStream.java:189)
 at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
 at java.io.DataInputStream.readByte(DataInputStream.java:248)
 at org.apache.pig.data.DefaultTuple.readFields(DefaultTuple.java:278)
 at org.apache.pig.data.DefaultDataBag$DefaultDataBagIterator.readFromFile(DefaultDataBag.java:237)
 ... 20 more
 This happens because we call hasNext(), which reaches EOF and closes the
 file. We then call hasNext() again on the assumption that it is idempotent.
 However, the stream is already closed, so we get this error.
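A minimal sketch of an idempotent hasNext(), assuming the usual read-ahead pattern: the iterator reads at most one element ahead and remembers that EOF was already reached, so a repeated hasNext() returns false instead of reading from a closed stream. SpillFileIterator is a hypothetical, simplified stand-in for DefaultDataBagIterator (it reads ints from one stream rather than tuples from spill files); it is not the code in PIG-1231-1.patch.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical, simplified stand-in for a spill-file iterator: reads ints,
// not tuples. hasNext() reads ahead at most once and remembers the outcome.
class SpillFileIterator implements Iterator<Integer> {
    private final DataInputStream in;
    private Integer buffered;   // element read ahead by hasNext(), if any
    private boolean exhausted;  // set once EOF is seen and the stream is closed

    SpillFileIterator(InputStream raw) {
        this.in = new DataInputStream(raw);
    }

    @Override
    public boolean hasNext() {
        if (buffered != null) return true;  // already read ahead, do not read again
        if (exhausted) return false;        // EOF seen earlier, stream already closed
        try {
            buffered = in.readInt();
            return true;
        } catch (EOFException eof) {
            exhausted = true;
            closeQuietly();                 // close exactly once and remember it
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public Integer next() {
        if (!hasNext()) throw new NoSuchElementException();
        Integer result = buffered;
        buffered = null;                    // the next hasNext() reads a fresh element
        return result;
    }

    private void closeQuietly() {
        try {
            in.close();
        } catch (IOException ignored) {
            // nothing useful to do if close fails at EOF
        }
    }
}

class SpillFileIteratorDemo {
    public static void main(String[] args) {
        byte[] data = {0, 0, 0, 7, 0, 0, 0, 42};  // two big-endian ints: 7 and 42
        SpillFileIterator it = new SpillFileIterator(
                new BufferedInputStream(new ByteArrayInputStream(data)));
        while (it.hasNext()) {
            System.out.println(it.next());
        }
        // Another hasNext() after EOF returns false instead of hitting a closed stream.
        System.out.println(it.hasNext());
    }
}
```

The BufferedInputStream in the demo matters: once closed, it throws "Stream closed" on any further read, which is the failure in the stack trace above. The exhausted flag is what keeps the second call away from it.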

[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2010-02-05 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1090:


Attachment: PIG-1090-22.patch

Two bug fixes:
1. A cycle appears in the plan if the load and store locations are the same.
2. relToAbsPathForStoreLocation is not called when using the Pig API directly
rather than through Grunt.

 Update sources to reflect recent changes in load-store interfaces
 -

 Key: PIG-1090
 URL: https://issues.apache.org/jira/browse/PIG-1090
 Project: Pig
  Issue Type: Sub-task
Affects Versions: 0.7.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.7.0

 Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, 
 PIG-1090-13.patch, PIG-1090-14.patch, PIG-1090-15.patch, PIG-1090-16.patch, 
 PIG-1090-17.patch, PIG-1090-18.patch, PIG-1090-19.patch, PIG-1090-2.patch, 
 PIG-1090-20.patch, PIG-1090-21.patch, PIG-1090-22.patch, PIG-1090-3.patch, 
 PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, 
 PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch


 There have been some changes (as recorded in the Changes Section, Nov 2 2009 
 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the 
 load/store interfaces - this jira is to track the task of making those 
 changes under src. Changes under test will be addressed in a different jira.

[jira] Updated: (PIG-1219) Extra listStatus call to the namenode in WeightedRangePartitioner

2010-02-05 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1219:


Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

We will go with the distributed cache approach
(https://issues.apache.org/jira/browse/PIG-1218) instead, so this patch is no
longer needed.

 Extra listStatus call to the namenode in WeightedRangePartitioner
 -

 Key: PIG-1219
 URL: https://issues.apache.org/jira/browse/PIG-1219
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1219-1.patch, PIG-1219-2.patch, PIG-1219-3.patch


 We call FileLocalizer.openDFSFile in WeightedRangePartitioner to open the
 quantile file. openDFSFile internally checks whether the quantile file
 exists, which puts extra load on the HDFS namenode. We should remove this
 extra check.

[jira] Updated: (PIG-1219) Extra listStatus call to the namenode in WeightedRangePartitioner

2010-02-04 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1219:


Status: Open  (was: Patch Available)

 Extra listStatus call to the namenode in WeightedRangePartitioner
 -

 Key: PIG-1219
 URL: https://issues.apache.org/jira/browse/PIG-1219
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1219-1.patch, PIG-1219-2.patch


 We call FileLocalizer.openDFSFile in WeightedRangePartitioner to open the
 quantile file. openDFSFile internally checks whether the quantile file
 exists, which puts extra load on the HDFS namenode. We should remove this
 extra check.

[jira] Updated: (PIG-1219) Extra listStatus call to the namenode in WeightedRangePartitioner

2010-02-04 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1219:


Attachment: PIG-1219-3.patch

The test failure is caused by the way we test it, not by the core code. We now
require the quantile file to be created before we run JobControlCompiler. In
our test case, we invoke the methods of JobControlCompiler directly without
actually running the job, so the quantile file does not exist when we get into
JobControlCompiler. I changed the test case to force-create the quantile file.

 Extra listStatus call to the namenode in WeightedRangePartitioner
 -

 Key: PIG-1219
 URL: https://issues.apache.org/jira/browse/PIG-1219
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1219-1.patch, PIG-1219-2.patch, PIG-1219-3.patch


 We call FileLocalizer.openDFSFile in WeightedRangePartitioner to open the
 quantile file. openDFSFile internally checks whether the quantile file
 exists, which puts extra load on the HDFS namenode. We should remove this
 extra check.

[jira] Created: (PIG-1219) Extra call to the namenode in WeightedRangePartitioner

2010-02-03 Thread Daniel Dai (JIRA)
Extra call to the namenode in WeightedRangePartitioner
--

 Key: PIG-1219
 URL: https://issues.apache.org/jira/browse/PIG-1219
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0


We call FileLocalizer.openDFSFile in WeightedRangePartitioner to open the quantile
file. openDFSFile internally checks whether the quantile file exists, which puts
extra load on the HDFS namenode. We should remove this extra check.
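A minimal sketch of the difference between check-then-open and opening directly. It uses java.nio.file on the local filesystem as an analogy only: each Files call stands in, roughly, for one metadata request to the namenode. QuantileFileOpener and its methods are hypothetical names, not Pig's FileLocalizer API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Local-filesystem analogy; class and method names are hypothetical.
class QuantileFileOpener {

    // Check-then-open: a separate existence check followed by the open,
    // i.e. two metadata lookups for one file.
    static InputStream openWithCheck(Path quantileFile) throws IOException {
        if (!Files.exists(quantileFile)) {
            throw new IOException("Quantile file not found: " + quantileFile);
        }
        return Files.newInputStream(quantileFile);
    }

    // Open directly and let the open call itself report a missing file,
    // which drops the separate existence check.
    static InputStream openDirect(Path quantileFile) throws IOException {
        try {
            return Files.newInputStream(quantileFile);
        } catch (NoSuchFileException e) {
            throw new IOException("Quantile file not found: " + quantileFile, e);
        }
    }
}
```

Both methods report a missing file, but openDirect does so from the single open call. As later updates in this thread note, PIG-1219 itself was closed as Won't Fix in favor of the distributed cache approach in PIG-1218.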

[jira] Updated: (PIG-1219) Extra listStatus call to the namenode in WeightedRangePartitioner

2010-02-03 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1219:


Summary: Extra listStatus call to the namenode in WeightedRangePartitioner  
(was: Extra call to the namenode in WeightedRangePartitioner)

 Extra listStatus call to the namenode in WeightedRangePartitioner
 -

 Key: PIG-1219
 URL: https://issues.apache.org/jira/browse/PIG-1219
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0


 We call FileLocalizer.openDFSFile in WeightedRangePartitioner to open the
 quantile file. openDFSFile internally checks whether the quantile file
 exists, which puts extra load on the HDFS namenode. We should remove this
 extra check.

[jira] Updated: (PIG-1219) Extra listStatus call to the namenode in WeightedRangePartitioner

2010-02-03 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1219:


Attachment: PIG-1219-1.patch

I am still testing the patch, but I am attaching it now so other committers
can review it.

 Extra listStatus call to the namenode in WeightedRangePartitioner
 -

 Key: PIG-1219
 URL: https://issues.apache.org/jira/browse/PIG-1219
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1219-1.patch


 We call FileLocalizer.openDFSFile in WeightedRangePartitioner to open the
 quantile file. openDFSFile internally checks whether the quantile file
 exists, which puts extra load on the HDFS namenode. We should remove this
 extra check.

[jira] Updated: (PIG-1219) Extra listStatus call to the namenode in WeightedRangePartitioner

2010-02-03 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1219:


Status: Patch Available  (was: Open)

 Extra listStatus call to the namenode in WeightedRangePartitioner
 -

 Key: PIG-1219
 URL: https://issues.apache.org/jira/browse/PIG-1219
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1219-1.patch


 We call FileLocalizer.openDFSFile in WeightedRangePartitioner to open the
 quantile file. openDFSFile internally checks whether the quantile file
 exists, which puts extra load on the HDFS namenode. We should remove this
 extra check.

[jira] Updated: (PIG-1219) Extra listStatus call to the namenode in WeightedRangePartitioner

2010-02-03 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1219:


Status: Open  (was: Patch Available)

 Extra listStatus call to the namenode in WeightedRangePartitioner
 -

 Key: PIG-1219
 URL: https://issues.apache.org/jira/browse/PIG-1219
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1219-1.patch, PIG-1219-2.patch


 We call FileLocalizer.openDFSFile in WeightedRangePartitioner to open the
 quantile file. openDFSFile internally checks whether the quantile file
 exists, which puts extra load on the HDFS namenode. We should remove this
 extra check.

[jira] Updated: (PIG-1219) Extra listStatus call to the namenode in WeightedRangePartitioner

2010-02-03 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1219:


Attachment: PIG-1219-2.patch

Thanks, Richard. Posting an updated patch.

 Extra listStatus call to the namenode in WeightedRangePartitioner
 -

 Key: PIG-1219
 URL: https://issues.apache.org/jira/browse/PIG-1219
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1219-1.patch, PIG-1219-2.patch


 We call FileLocalizer.openDFSFile in WeightedRangePartitioner to open the
 quantile file. openDFSFile internally checks whether the quantile file
 exists, which puts extra load on the HDFS namenode. We should remove this
 extra check.

[jira] Updated: (PIG-1219) Extra listStatus call to the namenode in WeightedRangePartitioner

2010-02-03 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1219:


Status: Patch Available  (was: Open)

 Extra listStatus call to the namenode in WeightedRangePartitioner
 -

 Key: PIG-1219
 URL: https://issues.apache.org/jira/browse/PIG-1219
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1219-1.patch, PIG-1219-2.patch


 We call FileLocalizer.openDFSFile in WeightedRangePartitioner to open the
 quantile file. openDFSFile internally checks whether the quantile file
 exists, which puts extra load on the HDFS namenode. We should remove this
 extra check.

[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2010-02-02 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1090:


Attachment: PIG-1090-20.patch

Fix a bug in MergeJoin when the index has only one entry.

 Update sources to reflect recent changes in load-store interfaces
 -

 Key: PIG-1090
 URL: https://issues.apache.org/jira/browse/PIG-1090
 Project: Pig
  Issue Type: Sub-task
Affects Versions: 0.7.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.7.0

 Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, 
 PIG-1090-13.patch, PIG-1090-14.patch, PIG-1090-15.patch, PIG-1090-16.patch, 
 PIG-1090-17.patch, PIG-1090-18.patch, PIG-1090-19.patch, PIG-1090-2.patch, 
 PIG-1090-20.patch, PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090-6.patch, 
 PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, PIG-1090.patch, 
 PIG-1190-5.patch


 There have been some changes (as recorded in the Changes Section, Nov 2 2009 
 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the 
 load/store interfaces - this jira is to track the task of making those 
 changes under src. Changes under test will be addressed in a different jira.

[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2010-02-02 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1090:


Attachment: (was: PIG-1090-20.patch)

 Update sources to reflect recent changes in load-store interfaces
 -

 Key: PIG-1090
 URL: https://issues.apache.org/jira/browse/PIG-1090
 Project: Pig
  Issue Type: Sub-task
Affects Versions: 0.7.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.7.0

 Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, 
 PIG-1090-13.patch, PIG-1090-14.patch, PIG-1090-15.patch, PIG-1090-16.patch, 
 PIG-1090-17.patch, PIG-1090-18.patch, PIG-1090-19.patch, PIG-1090-2.patch, 
 PIG-1090-20.patch, PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090-6.patch, 
 PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, PIG-1090.patch, 
 PIG-1190-5.patch


 There have been some changes (as recorded in the Changes Section, Nov 2 2009 
 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the 
 load/store interfaces - this jira is to track the task of making those 
 changes under src. Changes under test will be addressed in a different jira.

[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2010-02-02 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1090:


Attachment: PIG-1090-20.patch

 Update sources to reflect recent changes in load-store interfaces
 -

 Key: PIG-1090
 URL: https://issues.apache.org/jira/browse/PIG-1090
 Project: Pig
  Issue Type: Sub-task
Affects Versions: 0.7.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.7.0

 Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, 
 PIG-1090-13.patch, PIG-1090-14.patch, PIG-1090-15.patch, PIG-1090-16.patch, 
 PIG-1090-17.patch, PIG-1090-18.patch, PIG-1090-19.patch, PIG-1090-2.patch, 
 PIG-1090-20.patch, PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090-6.patch, 
 PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, PIG-1090.patch, 
 PIG-1190-5.patch


 There have been some changes (as recorded in the Changes Section, Nov 2 2009 
 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the 
 load/store interfaces - this jira is to track the task of making those 
 changes under src. Changes under test will be addressed in a different jira.

[jira] Updated: (PIG-1212) LogicalPlan.replaceAndAddSucessors produce wrong result when successors are null

2010-01-29 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1212:


Status: Patch Available  (was: Open)

 LogicalPlan.replaceAndAddSucessors produce wrong result when successors are 
 null
 

 Key: PIG-1212
 URL: https://issues.apache.org/jira/browse/PIG-1212
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: PIG-1212-1.patch


 The following script throws an NPE:
 a = load '1.txt' as (a0:chararray);
 b = load '2.txt' as (b0:chararray);
 c = join a by a0, b by b0;
 d = filter c by a0 == 'a';
 explain d;
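The ticket title points at successor lists that are null. A minimal sketch of a graph "replace node" operation that guards against that case, assuming a plan whose getSuccessors() returns null for a node with no successors. MiniPlan and replace() are hypothetical names; this is not Pig's LogicalPlan and not the code in the PIG-1212 patches.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal plan graph. getSuccessors() returns null for a node
// with no successors, which is the case the replace logic must tolerate.
class MiniPlan<N> {
    private final Map<N, List<N>> succs = new HashMap<>();
    private final Map<N, List<N>> preds = new HashMap<>();

    void connect(N from, N to) {
        succs.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        preds.computeIfAbsent(to, k -> new ArrayList<>()).add(from);
    }

    List<N> getSuccessors(N node) { return succs.get(node); }    // may be null
    List<N> getPredecessors(N node) { return preds.get(node); }  // may be null

    // Replaces oldNode with newNode and rewires edges in both directions.
    // A null edge list (e.g. a leaf with no successors) is skipped rather
    // than dereferenced.
    void replace(N oldNode, N newNode) {
        List<N> outs = succs.remove(oldNode);
        if (outs != null) {
            for (N s : outs) {
                List<N> predsOfS = preds.get(s);
                predsOfS.set(predsOfS.indexOf(oldNode), newNode);
            }
            succs.put(newNode, outs);
        }
        List<N> ins = preds.remove(oldNode);
        if (ins != null) {
            for (N p : ins) {
                List<N> succsOfP = succs.get(p);
                succsOfP.set(succsOfP.indexOf(oldNode), newNode);
            }
            preds.put(newNode, ins);
        }
    }
}
```

For example, after connect("load", "filter"), calling replace("filter", "filter2") works even though "filter" is a leaf with no successor list.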

[jira] Updated: (PIG-1212) LogicalPlan.replaceAndAddSucessors produce wrong result when successors are null

2010-01-29 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1212:


Attachment: PIG-1212-1.patch

 LogicalPlan.replaceAndAddSucessors produce wrong result when successors are 
 null
 

 Key: PIG-1212
 URL: https://issues.apache.org/jira/browse/PIG-1212
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: PIG-1212-1.patch


 The following script throws an NPE:
 a = load '1.txt' as (a0:chararray);
 b = load '2.txt' as (b0:chararray);
 c = join a by a0, b by b0;
 d = filter c by a0 == 'a';
 explain d;

[jira] Updated: (PIG-1210) fieldsToRead send the same fields more than once in some cases

2010-01-29 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1210:


Status: Patch Available  (was: Open)

Attaching patch with a test case.

 fieldsToRead send the same fields more than once in some cases
 --

 Key: PIG-1210
 URL: https://issues.apache.org/jira/browse/PIG-1210
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1210-1.patch, PIG-1210-2.patch


 This bug occurs when both of the following conditions are met:
 1. The LoadFunc is sensitive to duplicated fields in fieldsToRead. The only
 such LoadFunc we know of so far is Zebra.
 2. The first item in the FOREACH statement references the same input more
 than once.
 For example, the following script will be affected:
 a = load '11' using org.apache.hadoop.zebra.pig.TableLoader('a0');
 b = foreach a generate a0+a0;

[jira] Updated: (PIG-1210) fieldsToRead send the same fields more than once in some cases

2010-01-29 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1210:


Status: Open  (was: Patch Available)

 fieldsToRead send the same fields more than once in some cases
 --

 Key: PIG-1210
 URL: https://issues.apache.org/jira/browse/PIG-1210
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1210-1.patch, PIG-1210-2.patch


 This bug occurs when both of the following conditions are met:
 1. The LoadFunc is sensitive to duplicated fields in fieldsToRead. The only
 such LoadFunc we know of so far is Zebra.
 2. The first item in the FOREACH statement references the same input more
 than once.
 For example, the following script will be affected:
 a = load '11' using org.apache.hadoop.zebra.pig.TableLoader('a0');
 b = foreach a generate a0+a0;

[jira] Commented: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2010-01-29 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12806468#action_12806468
 ] 

Daniel Dai commented on PIG-1090:
-

PIG-1090-19.patch committed.

 Update sources to reflect recent changes in load-store interfaces
 -

 Key: PIG-1090
 URL: https://issues.apache.org/jira/browse/PIG-1090
 Project: Pig
  Issue Type: Sub-task
Affects Versions: 0.7.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.7.0

 Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, 
 PIG-1090-13.patch, PIG-1090-14.patch, PIG-1090-15.patch, PIG-1090-16.patch, 
 PIG-1090-17.patch, PIG-1090-18.patch, PIG-1090-19.patch, PIG-1090-2.patch, 
 PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, 
 PIG-1090-8.patch, PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch


 There have been some changes (as recorded in the Changes Section, Nov 2 2009 
 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the 
 load/store interfaces - this jira is to track the task of making those 
 changes under src. Changes under test will be addressed in a different jira.

[jira] Updated: (PIG-1212) LogicalPlan.replaceAndAddSucessors produce wrong result when successors are null

2010-01-29 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1212:


Attachment: PIG-1212-2.patch

Address Richard's comment.

 LogicalPlan.replaceAndAddSucessors produce wrong result when successors are 
 null
 

 Key: PIG-1212
 URL: https://issues.apache.org/jira/browse/PIG-1212
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: PIG-1212-1.patch, PIG-1212-2.patch


 The following script throws an NPE:
 a = load '1.txt' as (a0:chararray);
 b = load '2.txt' as (b0:chararray);
 c = join a by a0, b by b0;
 d = filter c by a0 == 'a';
 explain d;

[jira] Commented: (PIG-1213) Schema serialization is broken

2010-01-29 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12806540#action_12806540
 ] 

Daniel Dai commented on PIG-1213:
-

+1. Please commit once Hudson has reviewed it.

 Schema serialization is broken
 --

 Key: PIG-1213
 URL: https://issues.apache.org/jira/browse/PIG-1213
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.6.0

 Attachments: PIG-1213.patch


 Consider a UDF that needs to know the schema of its input in the backend
 while executing. To achieve this, the UDF needs to store the schema in the
 UDFContext. Internally, the UDFContext will serialize the schema into the
 jobconf. However, this is currently broken and throws a serialization exception.
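A minimal sketch of the round trip that has to work: an object is serialized into a string-valued configuration entry on the frontend and read back on the backend. java.util.Properties stands in for the jobconf, and Java serialization plus Base64 is an assumed encoding, not necessarily what UDFContext uses. SimpleSchema and ConfSerializer are hypothetical; the sketch does not reproduce or diagnose the PIG-1213 bug.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.Properties;

// Hypothetical stand-in for a schema. Every object reachable from it must also
// be Serializable, or writeObject throws java.io.NotSerializableException.
class SimpleSchema implements Serializable {
    private static final long serialVersionUID = 1L;
    private final ArrayList<String> fieldNames;

    SimpleSchema(List<String> fieldNames) {
        this.fieldNames = new ArrayList<>(fieldNames);
    }

    List<String> fieldNames() {
        return fieldNames;
    }
}

// Stores an object as a Base64 string under a key; Properties plays the jobconf.
class ConfSerializer {
    static void store(Properties conf, String key, Serializable value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        conf.setProperty(key, Base64.getEncoder().encodeToString(bytes.toByteArray()));
    }

    static Object load(Properties conf, String key) throws IOException, ClassNotFoundException {
        String encoded = conf.getProperty(key);
        if (encoded == null) {
            return null;  // nothing stored under this key
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(Base64.getDecoder().decode(encoded)))) {
            return in.readObject();
        }
    }
}
```

A class that implements Serializable can still fail here if one of its fields holds a non-serializable object, because Java serialization walks the whole object graph.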

[jira] Commented: (PIG-1210) fieldsToRead send the same fields more than once in some cases

2010-01-29 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12806557#action_12806557
 ] 

Daniel Dai commented on PIG-1210:
-

The test failure is due to java.lang.IllegalArgumentException: port out of
range:-1. It should be a transient one.

 fieldsToRead send the same fields more than once in some cases
 --

 Key: PIG-1210
 URL: https://issues.apache.org/jira/browse/PIG-1210
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1210-1.patch, PIG-1210-2.patch


 This bug occurs when both of the following conditions are met:
 1. The LoadFunc is sensitive to duplicated fields in fieldsToRead. The only
 such LoadFunc we know of so far is Zebra.
 2. The first item in the FOREACH statement references the same input more
 than once.
 For example, the following script will be affected:
 a = load '11' using org.apache.hadoop.zebra.pig.TableLoader('a0');
 b = foreach a generate a0+a0;

[jira] Commented: (PIG-1210) fieldsToRead send the same fields more than once in some cases

2010-01-29 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12806610#action_12806610
 ] 

Daniel Dai commented on PIG-1210:
-

List is the data structure required to construct RequiredFields. Yes, we
could use a Set, but we would need to check whether any of our code assumes an
order within the list, since a Set loses that order. We can revisit this in
the new logical plan.
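The List-versus-Set trade-off above has a standard middle ground, sketched here as an assumption rather than what the committed patch does: java.util.LinkedHashSet removes duplicates while preserving insertion order, so the result can be copied back into a List. Modeling required fields as plain column indexes, and the RequiredFieldsDedup name, are hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper: required fields are modeled as column indexes.
class RequiredFieldsDedup {
    // LinkedHashSet drops duplicates but keeps first-seen (insertion) order,
    // so code that relies on List ordering still sees the same sequence.
    static List<Integer> dedupePreservingOrder(List<Integer> columns) {
        return new ArrayList<>(new LinkedHashSet<>(columns));
    }

    public static void main(String[] args) {
        // "b = foreach a generate a0+a0;" references column 0 (a0) twice.
        System.out.println(dedupePreservingOrder(List.of(0, 0)));           // [0]
        System.out.println(dedupePreservingOrder(List.of(2, 0, 2, 1, 0)));  // [2, 0, 1]
    }
}
```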

 fieldsToRead send the same fields more than once in some cases
 --

 Key: PIG-1210
 URL: https://issues.apache.org/jira/browse/PIG-1210
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1210-1.patch, PIG-1210-2.patch


 This bug occurs when both of the following conditions are met:
 1. The LoadFunc is sensitive to duplicated fields in fieldsToRead. The only
 such LoadFunc we know of so far is Zebra.
 2. The first item in the FOREACH statement references the same input more
 than once.
 For example, the following script will be affected:
 a = load '11' using org.apache.hadoop.zebra.pig.TableLoader('a0');
 b = foreach a generate a0+a0;

[jira] Updated: (PIG-1210) fieldsToRead send the same fields more than once in some cases

2010-01-29 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1210:


  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Patch committed to both trunk and 0.6 branch.

 fieldsToRead send the same fields more than once in some cases
 --

 Key: PIG-1210
 URL: https://issues.apache.org/jira/browse/PIG-1210
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1210-1.patch, PIG-1210-2.patch


 This bug occurs when both of the following conditions are met:
 1. The LoadFunc is sensitive to duplicated fields in fieldsToRead. The only
 such LoadFunc we know of so far is Zebra.
 2. The first item in the FOREACH statement references the same input more
 than once.
 For example, the following script will be affected:
 a = load '11' using org.apache.hadoop.zebra.pig.TableLoader('a0');
 b = foreach a generate a0+a0;

[jira] Updated: (PIG-1212) LogicalPlan.replaceAndAddSucessors produce wrong result when successors are null

2010-01-29 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1212:


  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Patch committed. Thanks Richard!

 LogicalPlan.replaceAndAddSucessors produce wrong result when successors are 
 null
 

 Key: PIG-1212
 URL: https://issues.apache.org/jira/browse/PIG-1212
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: PIG-1212-1.patch, PIG-1212-2.patch


 The following script throws an NPE:
 a = load '1.txt' as (a0:chararray);
 b = load '2.txt' as (b0:chararray);
 c = join a by a0, b by b0;
 d = filter c by a0 == 'a';
 explain d;

[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2010-01-28 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1090:


Attachment: PIG-1090-17.patch

 Update sources to reflect recent changes in load-store interfaces
 -

 Key: PIG-1090
 URL: https://issues.apache.org/jira/browse/PIG-1090
 Project: Pig
  Issue Type: Sub-task
Affects Versions: 0.7.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.7.0

 Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, 
 PIG-1090-13.patch, PIG-1090-14.patch, PIG-1090-15.patch, PIG-1090-16.patch, 
 PIG-1090-17.patch, PIG-1090-2.patch, PIG-1090-3.patch, PIG-1090-4.patch, 
 PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, 
 PIG-1090.patch, PIG-1190-5.patch


 There have been some changes (as recorded in the Changes Section, Nov 2 2009 
 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the 
 load/store interfaces - this jira is to track the task of making those 
 changes under src. Changes under test will be addressed in a different jira.




[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2010-01-28 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1090:


Attachment: PIG-1090-17.patch

 Update sources to reflect recent changes in load-store interfaces
 -

 Key: PIG-1090
 URL: https://issues.apache.org/jira/browse/PIG-1090
 Project: Pig
  Issue Type: Sub-task
Affects Versions: 0.7.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.7.0

 Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, 
 PIG-1090-13.patch, PIG-1090-14.patch, PIG-1090-15.patch, PIG-1090-16.patch, 
 PIG-1090-17.patch, PIG-1090-18.patch, PIG-1090-2.patch, PIG-1090-3.patch, 
 PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, 
 PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch


 There have been some changes (as recorded in the Changes Section, Nov 2 2009 
 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the 
 load/store interfaces - this jira is to track the task of making those 
 changes under src. Changes under test will be addressed in a different jira.




[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2010-01-28 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1090:


Attachment: (was: PIG-1090-17.patch)

 Update sources to reflect recent changes in load-store interfaces
 -

 Key: PIG-1090
 URL: https://issues.apache.org/jira/browse/PIG-1090
 Project: Pig
  Issue Type: Sub-task
Affects Versions: 0.7.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.7.0

 Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, 
 PIG-1090-13.patch, PIG-1090-14.patch, PIG-1090-15.patch, PIG-1090-16.patch, 
 PIG-1090-17.patch, PIG-1090-18.patch, PIG-1090-2.patch, PIG-1090-3.patch, 
 PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, 
 PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch


 There have been some changes (as recorded in the Changes Section, Nov 2 2009 
 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the 
 load/store interfaces - this jira is to track the task of making those 
 changes under src. Changes under test will be addressed in a different jira.




[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2010-01-28 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1090:


Attachment: PIG-1090-17.patch

 Update sources to reflect recent changes in load-store interfaces
 -

 Key: PIG-1090
 URL: https://issues.apache.org/jira/browse/PIG-1090
 Project: Pig
  Issue Type: Sub-task
Affects Versions: 0.7.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.7.0

 Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, 
 PIG-1090-13.patch, PIG-1090-14.patch, PIG-1090-15.patch, PIG-1090-16.patch, 
 PIG-1090-17.patch, PIG-1090-18.patch, PIG-1090-2.patch, PIG-1090-3.patch, 
 PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, 
 PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch


 There have been some changes (as recorded in the Changes Section, Nov 2 2009 
 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the 
 load/store interfaces - this jira is to track the task of making those 
 changes under src. Changes under test will be addressed in a different jira.




[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2010-01-28 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1090:


Attachment: (was: PIG-1090-17.patch)

 Update sources to reflect recent changes in load-store interfaces
 -

 Key: PIG-1090
 URL: https://issues.apache.org/jira/browse/PIG-1090
 Project: Pig
  Issue Type: Sub-task
Affects Versions: 0.7.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.7.0

 Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, 
 PIG-1090-13.patch, PIG-1090-14.patch, PIG-1090-15.patch, PIG-1090-16.patch, 
 PIG-1090-17.patch, PIG-1090-18.patch, PIG-1090-2.patch, PIG-1090-3.patch, 
 PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, 
 PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch


 There have been some changes (as recorded in the Changes Section, Nov 2 2009 
 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the 
 load/store interfaces - this jira is to track the task of making those 
 changes under src. Changes under test will be addressed in a different jira.




[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2010-01-28 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1090:


Attachment: PIG-1090-17.patch

Resubmit PIG-1090-17.patch to address Pradeep's comments.

 Update sources to reflect recent changes in load-store interfaces
 -

 Key: PIG-1090
 URL: https://issues.apache.org/jira/browse/PIG-1090
 Project: Pig
  Issue Type: Sub-task
Affects Versions: 0.7.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.7.0

 Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, 
 PIG-1090-13.patch, PIG-1090-14.patch, PIG-1090-15.patch, PIG-1090-16.patch, 
 PIG-1090-17.patch, PIG-1090-18.patch, PIG-1090-2.patch, PIG-1090-3.patch, 
 PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, 
 PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch


 There have been some changes (as recorded in the Changes Section, Nov 2 2009 
 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the 
 load/store interfaces - this jira is to track the task of making those 
 changes under src. Changes under test will be addressed in a different jira.




[jira] Created: (PIG-1210) fieldsToRead send the same fields more than once in some cases

2010-01-28 Thread Daniel Dai (JIRA)
fieldsToRead send the same fields more than once in some cases
--

 Key: PIG-1210
 URL: https://issues.apache.org/jira/browse/PIG-1210
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0


This bug occurs when both of the following conditions are met:
1. The LoadFunc is susceptible to duplicated fields in fieldsToRead. The only 
such LoadFunc we have noticed so far is Zebra.
2. The first item in the FOREACH statement references the same input field 
more than once.

For example, the following script will be affected:
a = load '11' using org.apache.hadoop.zebra.pig.TableLoader('a0');
b = foreach a generate a0+a0;
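A minimal sketch of the kind of fix this calls for, assuming the required columns reach the loader as a list of integer column indices. The class and method names are hypothetical and not part of Pig's API. The idea is to deduplicate the list while preserving first-reference order, so that a0+a0 produces a single request for column 0.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper, not part of Pig: removes repeated column indices
// from a required-fields list while keeping first-occurrence order, so a
// loader such as Zebra never sees the same field twice.
public final class FieldDedup {
    private FieldDedup() {}

    public static List<Integer> dedupe(List<Integer> fieldsToRead) {
        // LinkedHashSet drops duplicates but remembers insertion order.
        return new ArrayList<>(new LinkedHashSet<>(fieldsToRead));
    }

    public static void main(String[] args) {
        // "foreach a generate a0+a0" references column 0 twice.
        List<Integer> requested = List.of(0, 0);
        System.out.println(dedupe(requested)); // prints [0]
    }
}
```

An order-preserving set keeps the column order stable for the loader. A plain HashSet would also remove duplicates, but its iteration order is unspecified.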





[jira] Updated: (PIG-1210) fieldsToRead send the same fields more than once in some cases

2010-01-28 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1210:


Attachment: PIG-1210-1.patch

 fieldsToRead send the same fields more than once in some cases
 --

 Key: PIG-1210
 URL: https://issues.apache.org/jira/browse/PIG-1210
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1210-1.patch


 This bug occurs when both of the following conditions are met:
 1. The LoadFunc is susceptible to duplicated fields in fieldsToRead. The only 
 such LoadFunc we have noticed so far is Zebra.
 2. The first item in the FOREACH statement references the same input field 
 more than once.
 For example, the following script will be affected:
 a = load '11' using org.apache.hadoop.zebra.pig.TableLoader('a0');
 b = foreach a generate a0+a0;




[jira] Updated: (PIG-1210) fieldsToRead send the same fields more than once in some cases

2010-01-28 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1210:


Fix Version/s: (was: 0.7.0)
   0.6.0
   Status: Patch Available  (was: Open)

 fieldsToRead send the same fields more than once in some cases
 --

 Key: PIG-1210
 URL: https://issues.apache.org/jira/browse/PIG-1210
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1210-1.patch


 This bug occurs when both of the following conditions are met:
 1. The LoadFunc is susceptible to duplicated fields in fieldsToRead. The only 
 such LoadFunc we have noticed so far is Zebra.
 2. The first item in the FOREACH statement references the same input field 
 more than once.
 For example, the following script will be affected:
 a = load '11' using org.apache.hadoop.zebra.pig.TableLoader('a0');
 b = foreach a generate a0+a0;




[jira] Created: (PIG-1212) LogicalPlan.replaceAndAddSucessors produce wrong result when successors are null

2010-01-28 Thread Daniel Dai (JIRA)
LogicalPlan.replaceAndAddSucessors produce wrong result when successors are null


 Key: PIG-1212
 URL: https://issues.apache.org/jira/browse/PIG-1212
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0


The following script throws an NPE:

a = load '1.txt' as (a0:chararray);
b = load '2.txt' as (b0:chararray);
c = join a by a0, b by b0;
d = filter c by a0 == 'a';
explain d;




[jira] Updated: (PIG-1212) LogicalPlan.replaceAndAddSucessors produce wrong result when successors are null

2010-01-28 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1212:


Description: 
The following script throws an NPE:

a = load '1.txt' as (a0:chararray);
b = load '2.txt' as (b0:chararray);
c = join a by a0, b by b0;
d = filter c by a0 == 'a';
explain d;

  was:
The following script through a NPE:

a = load '1.txt' as (a0:chararray);
b = load '2.txt' as (b0:chararray);
c = join a by a0, b by b0;
d = filter c by a0 == 'a';
explain d;


 LogicalPlan.replaceAndAddSucessors produce wrong result when successors are 
 null
 

 Key: PIG-1212
 URL: https://issues.apache.org/jira/browse/PIG-1212
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0


 The following script throws an NPE:
 a = load '1.txt' as (a0:chararray);
 b = load '2.txt' as (b0:chararray);
 c = join a by a0, b by b0;
 d = filter c by a0 == 'a';
 explain d;




[jira] Created: (PIG-1203) Temporarily disable failed unit test in load-store-redesign branch which have external dependency

2010-01-26 Thread Daniel Dai (JIRA)
Temporarily disable failed unit test in load-store-redesign branch which have 
external dependency
-

 Key: PIG-1203
 URL: https://issues.apache.org/jira/browse/PIG-1203
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0


In the load-store-redesign branch, two test suites, TestHBaseStorage and 
TestCounters, always fail. TestHBaseStorage depends on 
https://issues.apache.org/jira/browse/PIG-1200, and TestCounters depends on a future 
version of Hadoop. We disable these two test suites temporarily and will re-enable 
them once the dependent issues are resolved.




[jira] Updated: (PIG-1203) Temporarily disable failed unit test in load-store-redesign branch which have external dependency

2010-01-26 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1203:


Issue Type: Sub-task  (was: Bug)
Parent: PIG-966

 Temporarily disable failed unit test in load-store-redesign branch which have 
 external dependency
 -

 Key: PIG-1203
 URL: https://issues.apache.org/jira/browse/PIG-1203
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0


 In the load-store-redesign branch, two test suites, TestHBaseStorage and 
 TestCounters, always fail. TestHBaseStorage depends on 
 https://issues.apache.org/jira/browse/PIG-1200, and TestCounters depends on a 
 future version of Hadoop. We disable these two test suites temporarily and 
 will re-enable them once the dependent issues are resolved.




[jira] Updated: (PIG-1203) Temporarily disable failed unit test in load-store-redesign branch which have external dependency

2010-01-26 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1203:


Attachment: PIG-1203-1.patch

Patch for the load-store-redesign branch

 Temporarily disable failed unit test in load-store-redesign branch which have 
 external dependency
 -

 Key: PIG-1203
 URL: https://issues.apache.org/jira/browse/PIG-1203
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: PIG-1203-1.patch


 In the load-store-redesign branch, two test suites, TestHBaseStorage and 
 TestCounters, always fail. TestHBaseStorage depends on 
 https://issues.apache.org/jira/browse/PIG-1200, and TestCounters depends on a 
 future version of Hadoop. We disable these two test suites temporarily and 
 will re-enable them once the dependent issues are resolved.




[jira] Updated: (PIG-613) Casting elements inside a tuple does not take effect

2010-01-26 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-613:
---

Fix Version/s: 0.7.0
 Assignee: Daniel Dai

 Casting elements inside a tuple does not take effect
 

 Key: PIG-613
 URL: https://issues.apache.org/jira/browse/PIG-613
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
Reporter: Viraj Bhat
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: myfloatdata.txt, SQUARE.java


 Consider the following Pig script, which casts the return values of the SQUARE 
 UDF (tuples of doubles) to long. The describe output of B shows the type as 
 long; however, the result is still double.
 {code}
 register statistics.jar;
 A = load 'myfloatdata.txt' using PigStorage() as (doublecol:double);
 B = foreach A generate (tuple(long))statistics.SQUARE(doublecol) as 
 squares:(loadtimesq);
 describe B;
 explain B;
 dump B;
 {code}
 ===
 Describe output of B:
 B: {squares: (loadtimesq: long)}
 ===
 Sample output of B:
 ((7885.44))
 ((792098.2200010001))
 ((1497360.9268889998))
 ((50023.7956))
 ((0.972196))
 ((0.30980356))
 ((9.9760144E-7))
 ===
 Cause: The cast for Tuples has not been implemented in POCast.java
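For reference, here is a minimal sketch of an element-wise tuple cast, the kind of step that the missing tuple support in POCast would need to perform. It makes three assumptions. First, a tuple is modeled as a java.util.List<Object>. Second, numeric values convert through Java's narrowing conversion, so doubles truncate toward zero. Third, non-numeric fields are simply rejected. Pig's real Tuple API and cast rules differ; for example, Pig also parses chararrays. The class name TupleCast is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not Pig's POCast: casts every element of a tuple to
// Long so that (tuple(long)) changes the runtime values, not just the schema.
public final class TupleCast {
    private TupleCast() {}

    public static List<Object> toLongTuple(List<Object> tuple) {
        List<Object> result = new ArrayList<>(tuple.size());
        for (Object field : tuple) {
            if (field == null) {
                result.add(null); // nulls stay null
            } else if (field instanceof Number) {
                // Number.longValue() on a Double truncates toward zero.
                result.add(((Number) field).longValue());
            } else {
                throw new IllegalArgumentException(
                        "Cannot cast " + field.getClass().getSimpleName() + " to long");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Object> squares = new ArrayList<>();
        squares.add(7885.44); // first sample value from the report
        System.out.println(toLongTuple(squares)); // prints [7885]
    }
}
```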




[jira] Updated: (PIG-1184) PruneColumns optimization does not handle the case of foreach flatten correctly if flattened bag is not used later

2010-01-25 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1184:


  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Patch committed.

 PruneColumns optimization does not handle the case of foreach flatten 
 correctly if flattened bag is not used later
 --

 Key: PIG-1184
 URL: https://issues.apache.org/jira/browse/PIG-1184
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Pradeep Kamath
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: PIG-1184-1.patch, PIG-1184-2.patch


 The following script:
 {noformat}
 -e a = load 'input.txt' as (f1:chararray, f2:chararray, 
 f3:bag{t:tuple(id:chararray)}, f4:bag{t:tuple(loc:chararray)}); b = foreach a 
 generate f1, f2, flatten(f3), flatten(f4), 10; b = foreach b generate f1, f2, 
 \$4; dump b;
 {noformat}
 gives the following result:
 (oiue,M,10)
 {noformat}
 cat input.txt:
 oiue    M    {(3),(4)}    {(toronto),(montreal)}
 {noformat}
 If the PruneColumns optimization is disabled, we get the correct result:
 (oiue,M,10)
 (oiue,M,10)
 (oiue,M,10)
 (oiue,M,10)
 The flatten produces 4 records, so the output should contain 4 records.
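The expected count follows from how FLATTEN works. Flattening two bags in the same FOREACH yields the cross product of their contents, so 2 ids × 2 locations gives 4 rows. Projecting away the flattened columns afterwards must not change that count. Below is a minimal Java sketch of the expected behavior. It models bags as plain lists, which is a simplification: this is not Pig's DataBag or Tuple API, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of "generate f1, f2, flatten(f3), flatten(f4), 10":
// each input row expands into one output row per (f3 item, f4 item) pair.
// Bags are plain Lists here; this is not Pig's DataBag/Tuple API.
public final class FlattenCrossProduct {
    private FlattenCrossProduct() {}

    public static List<List<Object>> flatten(String f1, String f2,
                                             List<String> f3, List<String> f4) {
        List<List<Object>> out = new ArrayList<>();
        for (String id : f3) {
            for (String loc : f4) {
                out.add(List.<Object>of(f1, f2, id, loc, 10));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Object>> rows = flatten("oiue", "M",
                List.of("3", "4"), List.of("toronto", "montreal"));
        // Project f1, f2, $4 as in the second foreach: still 4 rows.
        for (List<Object> row : rows) {
            System.out.println("(" + row.get(0) + "," + row.get(1) + "," + row.get(4) + ")");
        }
    }
}
```

With the sample input, this prints (oiue,M,10) four times. That matches the result obtained with PruneColumns disabled.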




[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2010-01-25 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1090:


Attachment: PIG-1090-15.patch

 Update sources to reflect recent changes in load-store interfaces
 -

 Key: PIG-1090
 URL: https://issues.apache.org/jira/browse/PIG-1090
 Project: Pig
  Issue Type: Sub-task
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, 
 PIG-1090-13.patch, PIG-1090-14.patch, PIG-1090-15.patch, PIG-1090-2.patch, 
 PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, 
 PIG-1090-8.patch, PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch


 There have been some changes (as recorded in the Changes Section, Nov 2 2009 
 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the 
 load/store interfaces - this jira is to track the task of making those 
 changes under src. Changes under test will be addressed in a different jira.




[jira] Updated: (PIG-1189) StoreFunc UDF should ship to the backend automatically without register

2010-01-25 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1189:


  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Patch committed.

 StoreFunc UDF should ship to the backend automatically without register
 -

 Key: PIG-1189
 URL: https://issues.apache.org/jira/browse/PIG-1189
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: multimapstore.pig, multireducestore.pig, 
 PIG-1189-1.patch, PIG-1189-2.patch, PIG-1189-3.patch, singlemapstore.pig, 
 singlereducestore.pig


 Pig should ship the store UDF to the backend even if the user does not 
 register it. The prerequisite is that the UDF is in the classpath on the 
 frontend. We made that work for load UDFs in 
 [PIG-881|https://issues.apache.org/jira/browse/PIG-881]; we should do the same for store UDFs.




[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2010-01-25 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1090:


Attachment: (was: PIG-1090-15.patch)

 Update sources to reflect recent changes in load-store interfaces
 -

 Key: PIG-1090
 URL: https://issues.apache.org/jira/browse/PIG-1090
 Project: Pig
  Issue Type: Sub-task
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, 
 PIG-1090-13.patch, PIG-1090-14.patch, PIG-1090-15.patch, PIG-1090-2.patch, 
 PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, 
 PIG-1090-8.patch, PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch


 There have been some changes (as recorded in the Changes Section, Nov 2 2009 
 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the 
 load/store interfaces - this jira is to track the task of making those 
 changes under src. Changes under test will be addressed in a different jira.




[jira] Updated: (PIG-1189) StoreFunc UDF should ship to the backend automatically without register

2010-01-24 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1189:


Attachment: PIG-1189-2.patch

Attaching a patch to address the unit test failures. These failures occur because we 
add the storeFunc UDF to the udf array in MapReduceOper, so the return value of 
MapReduceOper.name() changes. Instead of fixing the golden file, I changed the way 
we generate POStore in TestMRCompiler, because it was inconsistent with the way 
we generate POLoad. I also changed MapReduceOper.udf from a List to a Set, which 
I feel is more appropriate.
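Here is a minimal illustration of why a Set may suit MapReduceOper.udf better than a List. It assumes the field simply collects the UDF class names to ship to the backend. The class below is a hypothetical stand-in, not Pig's MapReduceOper. When the same class serves as both the load function and the store function, a List records it twice. A LinkedHashSet keeps one entry and a deterministic iteration order.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a per-job collection of UDF class names to
// ship to the backend; not Pig's actual MapReduceOper.
public final class UdfCollection {
    private UdfCollection() {}

    // Collects UDF class names in insertion order, ignoring repeats.
    public static Set<String> collect(String... udfClassNames) {
        Set<String> udfs = new LinkedHashSet<>();
        for (String name : udfClassNames) {
            udfs.add(name);
        }
        return udfs;
    }

    public static void main(String[] args) {
        String storage = "org.apache.pig.builtin.PigStorage";
        // Same class used as the load function and the store function.
        List<String> asList = Arrays.asList(storage, storage);
        Set<String> asSet = collect(storage, storage);
        System.out.println(asList.size()); // prints 2
        System.out.println(asSet.size());  // prints 1
    }
}
```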

 StoreFunc UDF should ship to the backend automatically without register
 -

 Key: PIG-1189
 URL: https://issues.apache.org/jira/browse/PIG-1189
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: multimapstore.pig, multireducestore.pig, 
 PIG-1189-1.patch, PIG-1189-2.patch, singlemapstore.pig, singlereducestore.pig


 Pig should ship the store UDF to the backend even if the user does not 
 register it. The prerequisite is that the UDF is in the classpath on the 
 frontend. We made that work for load UDFs in 
 [PIG-881|https://issues.apache.org/jira/browse/PIG-881]; we should do the same for store UDFs.




[jira] Updated: (PIG-1189) StoreFunc UDF should ship to the backend automatically without register

2010-01-24 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1189:


Status: Open  (was: Patch Available)

 StoreFunc UDF should ship to the backend automatically without register
 -

 Key: PIG-1189
 URL: https://issues.apache.org/jira/browse/PIG-1189
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: multimapstore.pig, multireducestore.pig, 
 PIG-1189-1.patch, PIG-1189-2.patch, singlemapstore.pig, singlereducestore.pig


 Pig should ship the store UDF to the backend even if the user does not 
 register it. The prerequisite is that the UDF is in the classpath on the 
 frontend. We made that work for load UDFs in 
 [PIG-881|https://issues.apache.org/jira/browse/PIG-881]; we should do the same for store UDFs.




[jira] Updated: (PIG-1189) StoreFunc UDF should ship to the backend automatically without register

2010-01-24 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1189:


Status: Patch Available  (was: Open)

 StoreFunc UDF should ship to the backend automatically without register
 -

 Key: PIG-1189
 URL: https://issues.apache.org/jira/browse/PIG-1189
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: multimapstore.pig, multireducestore.pig, 
 PIG-1189-1.patch, PIG-1189-2.patch, singlemapstore.pig, singlereducestore.pig


 Pig should ship the store UDF to the backend even if the user does not 
 register it. The prerequisite is that the UDF is in the classpath on the 
 frontend. We made that work for load UDFs in 
 [PIG-881|https://issues.apache.org/jira/browse/PIG-881]; we should do the same for store UDFs.




[jira] Updated: (PIG-1184) PruneColumns optimization does not handle the case of foreach flatten correctly if flattened bag is not used later

2010-01-24 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1184:


Hadoop Flags:   (was: [Reviewed])
  Status: Patch Available  (was: Reopened)

 PruneColumns optimization does not handle the case of foreach flatten 
 correctly if flattened bag is not used later
 --

 Key: PIG-1184
 URL: https://issues.apache.org/jira/browse/PIG-1184
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Pradeep Kamath
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: PIG-1184-1.patch, PIG-1184-2.patch


 The following script:
 {noformat}
 -e a = load 'input.txt' as (f1:chararray, f2:chararray, 
 f3:bag{t:tuple(id:chararray)}, f4:bag{t:tuple(loc:chararray)}); b = foreach a 
 generate f1, f2, flatten(f3), flatten(f4), 10; b = foreach b generate f1, f2, 
 \$4; dump b;
 {noformat}
 gives the following result:
 (oiue,M,10)
 {noformat}
 cat input.txt:
 oiue    M    {(3),(4)}    {(toronto),(montreal)}
 {noformat}
 If the PruneColumns optimization is disabled, we get the correct result:
 (oiue,M,10)
 (oiue,M,10)
 (oiue,M,10)
 (oiue,M,10)
 The flatten produces 4 records, so the output should contain 4 records.




[jira] Updated: (PIG-1184) PruneColumns optimization does not handle the case of foreach flatten correctly if flattened bag is not used later

2010-01-24 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1184:


Attachment: PIG-1184-2.patch

Fix unit test failures

 PruneColumns optimization does not handle the case of foreach flatten 
 correctly if flattened bag is not used later
 --

 Key: PIG-1184
 URL: https://issues.apache.org/jira/browse/PIG-1184
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Pradeep Kamath
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: PIG-1184-1.patch, PIG-1184-2.patch


 The following script:
 {noformat}
 -e a = load 'input.txt' as (f1:chararray, f2:chararray, 
 f3:bag{t:tuple(id:chararray)}, f4:bag{t:tuple(loc:chararray)}); b = foreach a 
 generate f1, f2, flatten(f3), flatten(f4), 10; b = foreach b generate f1, f2, 
 \$4; dump b;
 {noformat}
 gives the following result:
 (oiue,M,10)
 {noformat}
 cat input.txt:
 oiue    M    {(3),(4)}    {(toronto),(montreal)}
 {noformat}
 If the PruneColumns optimization is disabled, we get the correct result:
 (oiue,M,10)
 (oiue,M,10)
 (oiue,M,10)
 (oiue,M,10)
 The flatten produces 4 records, so the output should contain 4 records.




[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2010-01-24 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1090:


Attachment: PIG-1090-15.patch

 Update sources to reflect recent changes in load-store interfaces
 -

 Key: PIG-1090
 URL: https://issues.apache.org/jira/browse/PIG-1090
 Project: Pig
  Issue Type: Sub-task
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, 
 PIG-1090-13.patch, PIG-1090-14.patch, PIG-1090-15.patch, PIG-1090-2.patch, 
 PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, 
 PIG-1090-8.patch, PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch


 There have been some changes (as recorded in the Changes Section, Nov 2 2009 
 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the 
 load/store interfaces - this jira is to track the task of making those 
 changes under src. Changes under test will be addressed in a different jira.




[jira] Commented: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2010-01-24 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12804378#action_12804378
 ] 

Daniel Dai commented on PIG-1090:
-

PIG-1090-15.patch contains the store-side changes related to StoreMetadata.storeSchema.

 Update sources to reflect recent changes in load-store interfaces
 -

 Key: PIG-1090
 URL: https://issues.apache.org/jira/browse/PIG-1090
 Project: Pig
  Issue Type: Sub-task
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, 
 PIG-1090-13.patch, PIG-1090-14.patch, PIG-1090-15.patch, PIG-1090-2.patch, 
 PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, 
 PIG-1090-8.patch, PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch


 There have been some changes (as recorded in the Changes Section, Nov 2 2009 
 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the 
 load/store interfaces - this jira is to track the task of making those 
 changes under src. Changes under test will be addressed in a different jira.




[jira] Updated: (PIG-1189) StoreFunc UDF should ship to the backend automatically without register

2010-01-24 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1189:


Attachment: PIG-1189-3.patch

Address javac warnings.

 StoreFunc UDF should ship to the backend automatically without register
 -

 Key: PIG-1189
 URL: https://issues.apache.org/jira/browse/PIG-1189
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: multimapstore.pig, multireducestore.pig, 
 PIG-1189-1.patch, PIG-1189-2.patch, PIG-1189-3.patch, singlemapstore.pig, 
 singlereducestore.pig


 Pig should ship a store UDF to the backend even if the user does not register 
 it. The prerequisite is that the UDF must be on the classpath on the frontend. 
 We made this work for load UDFs in [PIG-881|https://issues.apache.org/jira/browse/PIG-881]; 
 we should do the same for store UDFs.
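To illustrate the prerequisite above, the following hypothetical sketch shows one common way a frontend can find the jar that a UDF class was loaded from, so that the jar could be added to the files shipped with the job. It asks the class's loader for the `.class` resource and, if that resource came from a `jar:` URL, recovers the local jar path. This is not Pig's actual code. The class and method names (`FindContainingJar`, `findContainingJar`, `stripJarUrlPath`) are made up for illustration, and the sketch only handles jars on the local filesystem.

```java
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.net.URL;
import java.net.URLDecoder;
import java.util.Enumeration;

public class FindContainingJar {
    // Hypothetical helper: returns the local path of the jar the class was loaded
    // from, or null if it did not come from a jar (JDK classes, plain directories).
    static String findContainingJar(Class<?> clazz) {
        ClassLoader loader = clazz.getClassLoader();
        if (loader == null) {
            return null; // bootstrap-loaded class: nothing to ship
        }
        String classFile = clazz.getName().replace('.', '/') + ".class";
        try {
            Enumeration<URL> urls = loader.getResources(classFile);
            while (urls.hasMoreElements()) {
                URL url = urls.nextElement();
                if ("jar".equals(url.getProtocol())) {
                    return stripJarUrlPath(url.getPath());
                }
            }
        } catch (IOException e) {
            return null; // sketch only: treat lookup failures as "not found"
        }
        return null;
    }

    // Turns "file:/dir/udfs.jar!/com/example/MyStore.class" into "/dir/udfs.jar".
    // Note: URLDecoder also turns '+' into a space.
    static String stripJarUrlPath(String urlPath) {
        String path = urlPath;
        if (path.startsWith("file:")) {
            path = path.substring("file:".length());
        }
        int bang = path.indexOf('!');
        if (bang >= 0) {
            path = path.substring(0, bang);
        }
        try {
            return URLDecoder.decode(path, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        System.out.println(findContainingJar(String.class));            // JDK class
        System.out.println(findContainingJar(FindContainingJar.class)); // jar or null
    }
}
```

In a real frontend the returned path would then be added to the job's list of jars to ship. A null result means the class is either provided by the JDK or not packaged in a jar.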




[jira] Updated: (PIG-1189) StoreFunc UDF should ship to the backend automatically without register

2010-01-24 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1189:


Status: Open  (was: Patch Available)

 StoreFunc UDF should ship to the backend automatically without register
 -

 Key: PIG-1189
 URL: https://issues.apache.org/jira/browse/PIG-1189
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: multimapstore.pig, multireducestore.pig, 
 PIG-1189-1.patch, PIG-1189-2.patch, PIG-1189-3.patch, singlemapstore.pig, 
 singlereducestore.pig






[jira] Updated: (PIG-1189) StoreFunc UDF should ship to the backend automatically without register

2010-01-24 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1189:


Status: Patch Available  (was: Open)

 StoreFunc UDF should ship to the backend automatically without register
 -

 Key: PIG-1189
 URL: https://issues.apache.org/jira/browse/PIG-1189
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: multimapstore.pig, multireducestore.pig, 
 PIG-1189-1.patch, PIG-1189-2.patch, PIG-1189-3.patch, singlemapstore.pig, 
 singlereducestore.pig






[jira] Updated: (PIG-1189) StoreFunc UDF should ship to the backend automatically without register

2010-01-22 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1189:


Attachment: (was: PIG-1189-1.patch)

 StoreFunc UDF should ship to the backend automatically without register
 -

 Key: PIG-1189
 URL: https://issues.apache.org/jira/browse/PIG-1189
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: multimapstore.pig, multireducestore.pig, 
 singlemapstore.pig, singlereducestore.pig






[jira] Updated: (PIG-1189) StoreFunc UDF should ship to the backend automatically without register

2010-01-22 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1189:


Attachment: PIG-1189-1.patch

Hudson applied one of the *.pig attachments as the patch. Reattaching the patch 
file to wake up Hudson with the right patch.

 StoreFunc UDF should ship to the backend automatically without register
 -

 Key: PIG-1189
 URL: https://issues.apache.org/jira/browse/PIG-1189
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: multimapstore.pig, multireducestore.pig, 
 PIG-1189-1.patch, singlemapstore.pig, singlereducestore.pig






[jira] Updated: (PIG-1189) StoreFunc UDF should ship to the backend automatically without register

2010-01-22 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1189:


Status: Open  (was: Patch Available)

 StoreFunc UDF should ship to the backend automatically without register
 -

 Key: PIG-1189
 URL: https://issues.apache.org/jira/browse/PIG-1189
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: multimapstore.pig, multireducestore.pig, 
 PIG-1189-1.patch, singlemapstore.pig, singlereducestore.pig






[jira] Updated: (PIG-1189) StoreFunc UDF should ship to the backend automatically without register

2010-01-22 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1189:


Status: Patch Available  (was: Open)

 StoreFunc UDF should ship to the backend automatically without register
 -

 Key: PIG-1189
 URL: https://issues.apache.org/jira/browse/PIG-1189
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: multimapstore.pig, multireducestore.pig, 
 PIG-1189-1.patch, singlemapstore.pig, singlereducestore.pig






[jira] Reopened: (PIG-1184) PruneColumns optimization does not handle the case of foreach flatten correctly if flattened bag is not used later

2010-01-22 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reopened PIG-1184:
-


Still need to address core test failures.

 PruneColumns optimization does not handle the case of foreach flatten 
 correctly if flattened bag is not used later
 --

 Key: PIG-1184
 URL: https://issues.apache.org/jira/browse/PIG-1184
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Pradeep Kamath
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: PIG-1184-1.patch


 The following script:
 {noformat}
 -e a = load 'input.txt' as (f1:chararray, f2:chararray, 
 f3:bag{t:tuple(id:chararray)}, f4:bag{t:tuple(loc:chararray)}); b = foreach a 
 generate f1, f2, flatten(f3), flatten(f4), 10; b = foreach b generate f1, f2, 
 \$4; dump b;
 {noformat}
 gives the following result:
 {noformat}
 (oiue,M,10)
 {noformat}
 for this input:
 {noformat}
 cat input.txt:
 oiue	M	{(3),(4)}	{(toronto),(montreal)}
 {noformat}
 If the PruneColumns optimization is disabled, we get the correct result:
 {noformat}
 (oiue,M,10)
 (oiue,M,10)
 (oiue,M,10)
 (oiue,M,10)
 {noformat}
 Flattening f3 and f4 (two tuples each) in the same foreach produces their cross 
 product, i.e. 2 x 2 = 4 records, so the output should contain 4 records.
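The following hypothetical Java model of the two foreach statements shows why 4 records are expected. It is not Pig's implementation, and the names `FlattenSketch`, `flattenTwoBags`, `project` and `prunedGenerate` are illustrative. Flattening two bags in one generate emits one row per combination of their tuples. The later projection to f1, f2, $4 keeps every one of those rows, even though the flattened columns are no longer referenced. A plan that drops the unused flattened bags entirely also drops that multiplication and emits a single row, which would be consistent with the single (oiue,M,10) reported above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FlattenSketch {
    // Models: b = foreach a generate f1, f2, flatten(f3), flatten(f4), 10;
    // Flattening two bags in the same generate yields their cross product.
    static List<List<Object>> flattenTwoBags(String f1, String f2,
                                             List<String> f3, List<String> f4) {
        List<List<Object>> rows = new ArrayList<>();
        for (String id : f3) {
            for (String loc : f4) {
                rows.add(Arrays.<Object>asList(f1, f2, id, loc, 10));
            }
        }
        return rows;
    }

    // Models: b = foreach b generate f1, f2, $4;
    // Projection keeps every input row, even though id and loc are dropped.
    static List<List<Object>> project(List<List<Object>> rows) {
        List<List<Object>> out = new ArrayList<>();
        for (List<Object> row : rows) {
            out.add(Arrays.<Object>asList(row.get(0), row.get(1), row.get(4)));
        }
        return out;
    }

    // Models the faulty plan: with the unused flattened bags pruned away,
    // the generate degenerates to (f1, f2, 10) and emits a single row.
    static List<List<Object>> prunedGenerate(String f1, String f2) {
        List<List<Object>> out = new ArrayList<>();
        out.add(Arrays.<Object>asList(f1, f2, 10));
        return out;
    }

    public static void main(String[] args) {
        List<String> f3 = Arrays.asList("3", "4");
        List<String> f4 = Arrays.asList("toronto", "montreal");
        System.out.println("expected: " + project(flattenTwoBags("oiue", "M", f3, f4)));
        System.out.println("pruned:   " + prunedGenerate("oiue", "M"));
    }
}
```

In the model, a correct pruner may drop the id and loc values from each row. It must not drop the flatten operations themselves, because they determine how many rows exist.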



