[jira] Created: (PIG-1270) Push limit into loader
Push limit into loader -- Key: PIG-1270 URL: https://issues.apache.org/jira/browse/PIG-1270 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Daniel Dai We can optimize the limit operation by stopping early in PigRecordReader. In general, we need a way for PigRecordReader and the execution pipeline to communicate. POLimit could tell PigRecordReader that it already has enough records, so that the reader stops feeding more data. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
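The idea in PIG-1270 can be illustrated with a minimal sketch. It assumes a shared stop flag between a limit operator and a reader; `StoppableReader`, `LimitOperator`, and `LimitPushdownSketch` are hypothetical names for illustration only and are not Pig's PigRecordReader or POLimit APIs.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a record reader; NOT Pig's PigRecordReader.
class StoppableReader {
    private final Iterator<String> source;
    private final AtomicBoolean stop;   // shared flag raised by the limit operator
    int recordsRead = 0;                // how many records were actually pulled from the source

    StoppableReader(Iterator<String> source, AtomicBoolean stop) {
        this.source = source;
        this.stop = stop;
    }

    // Returns the next record, or null once the input is exhausted or a stop was requested.
    String next() {
        if (stop.get() || !source.hasNext()) {
            return null;
        }
        recordsRead++;
        return source.next();
    }
}

// Hypothetical stand-in for a limit operator; NOT Pig's POLimit.
class LimitOperator {
    private final long limit;
    private final AtomicBoolean stop;
    private long seen = 0;

    LimitOperator(long limit, AtomicBoolean stop) {
        this.limit = limit;
        this.stop = stop;
        if (limit <= 0) {
            stop.set(true);             // nothing is wanted, so stop before the first read
        }
    }

    // Emits the record if the limit allows it and raises the stop flag once the limit is reached.
    void accept(String record, List<String> out) {
        if (seen >= limit) {
            return;
        }
        out.add(record);
        seen++;
        if (seen >= limit) {
            stop.set(true);
        }
    }
}

class LimitPushdownSketch {
    // Runs reader -> limit and returns the reader so callers can inspect recordsRead.
    static StoppableReader run(List<String> input, long limit, List<String> out) {
        AtomicBoolean stop = new AtomicBoolean(false);
        StoppableReader reader = new StoppableReader(input.iterator(), stop);
        LimitOperator limitOp = new LimitOperator(limit, stop);
        String record;
        while ((record = reader.next()) != null) {
            limitOp.accept(record, out);
        }
        return reader;
    }
}
```

With five input records and a limit of 2, the reader in this sketch pulls only two records from its source; without the flag it would scan all five and the limit operator would discard the rest.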
[jira] Created: (PIG-1271) Provide a more flexible data format to load complex field (bag/tuple/map) in PigStorage
Provide a more flexible data format to load complex field (bag/tuple/map) in PigStorage --- Key: PIG-1271 URL: https://issues.apache.org/jira/browse/PIG-1271 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Daniel Dai With [PIG-613|https://issues.apache.org/jira/browse/PIG-613], we are able to load text files containing complex data types (map/bag/tuple) according to a schema. However, the format of a complex data field is very strict. Users have to use predetermined special characters to mark the beginning and end of each field, and those special characters cannot be used in the content. The goals of this issue are: 1. Provide a way for users to escape special characters. 2. Make it easy for users to customize Utf8StorageConverter when they have their own data format. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
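One way to read the first goal of PIG-1271 is a backslash-style escape for delimiter characters. The sketch below is only an illustration under that assumption: `EscapedFieldSplitter` is a hypothetical helper, the choice of escape character is a guess, and none of this is taken from Utf8StorageConverter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Pig: splits a field list while honoring an escape character.
class EscapedFieldSplitter {
    // Splits text on delim; escape followed by any character yields that character literally.
    static List<String> split(String text, char delim, char escape) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == escape && i + 1 < text.length()) {
                current.append(text.charAt(++i));   // keep the escaped character as content
            } else if (c == delim) {
                fields.add(current.toString());     // an unescaped delimiter ends the field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());             // last field (possibly empty)
        return fields;
    }
}
```

Under these assumptions, the input `a\,b,c` with `,` as delimiter and `\` as escape yields the two fields `a,b` and `c`, so a comma can appear inside field content.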
[jira] Updated: (PIG-1272) Column pruner causes wrong results
[ https://issues.apache.org/jira/browse/PIG-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1272: Attachment: PIG-1272-1.patch
Column pruner causes wrong results -- Key: PIG-1272 URL: https://issues.apache.org/jira/browse/PIG-1272 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Assignee: Daniel Dai Fix For: 0.7.0 Attachments: PIG-1272-1.patch
For a simple script the column pruner optimization removes certain columns from the original relation, which results in wrong results. Input file kv contains the following columns (tab separated)
{code}
a	1
a	2
a	3
b	4
c	5
c	6
b	7
d	8
{code}
Now running this script in Pig 0.6 produces
{code}
kv = load 'kv' as (k,v);
keys= foreach kv generate k;
keys = distinct keys;
keys = limit keys 2;
rejoin = join keys by k, kv by k;
dump rejoin;
{code}
(a,a)
(a,a)
(a,a)
(b,b)
(b,b)
Running this in Pig 0.5 version without column pruner results in:
(a,a,1)
(a,a,2)
(a,a,3)
(b,b,4)
(b,b,7)
When we disable the ColumnPruner optimization it gives right results.
Viraj
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1262) Additional findbugs and javac warnings
[ https://issues.apache.org/jira/browse/PIG-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1262: Status: Patch Available (was: Open) Hudson is not working, resubmit. Additional findbugs and javac warnings -- Key: PIG-1262 URL: https://issues.apache.org/jira/browse/PIG-1262 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.7.0 Attachments: PIG-1262-1.patch After a while, we have introduced some new findbugs and javacc warnings. Will fix them in this Jira. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1262) Additional findbugs and javac warnings
[ https://issues.apache.org/jira/browse/PIG-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1262: Status: Open (was: Patch Available) Additional findbugs and javac warnings -- Key: PIG-1262 URL: https://issues.apache.org/jira/browse/PIG-1262 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.7.0 Attachments: PIG-1262-1.patch After a while, we have introduced some new findbugs and javacc warnings. Will fix them in this Jira. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1251) Move SortInfo calculation earlier in compilation
[ https://issues.apache.org/jira/browse/PIG-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838977#action_12838977 ] Daniel Dai commented on PIG-1251: - +1 for the patch. Please resync with trunk, uncomment testLocalModeNegative2 and testMapReduceModeInputNegative2 in TestInputOutputFileValidator, then commit. Move SortInfo calculation earlier in compilation - Key: PIG-1251 URL: https://issues.apache.org/jira/browse/PIG-1251 Project: Pig Issue Type: Bug Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: pig-1251.patch In LSR, Pig does input/output validation by calling Hadoop's checkSpecs(). A storefunc might need the schema to do such a validation, so we should call checkSchema() before doing the validation. checkSchema() in turn requires SortInfo, which is calculated later in the compilation phase. We need to move it earlier in the compilation phase. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1259) ResourceFieldSchema.setSchema should not allow a bag field without a Tuple as its only sub field (the tuple itself can have a schema with 1 subfields)
[ https://issues.apache.org/jira/browse/PIG-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12839008#action_12839008 ] Daniel Dai commented on PIG-1259: - +1 ResourceFieldSchema.setSchema should not allow a bag field without a Tuple as its only sub field (the tuple itself can have a schema with 1 subfields) - Key: PIG-1259 URL: https://issues.apache.org/jira/browse/PIG-1259 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1259-2.patch, PIG-1259.patch Currently Schema.getPigSchema(ResourceSchema) does not allow a bag field in the ResourceSchema with a subschema containing anything other than a tuple. The tuple itself can have a schema with 1 subfields. This check should also be enforced in ResourceFieldSchema.setSchema() -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1262) Additional findbugs and javac warnings
Additional findbugs and javac warnings -- Key: PIG-1262 URL: https://issues.apache.org/jira/browse/PIG-1262 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.7.0 After a while, we have introduced some new findbugs and javacc warnings. Will fix them in this Jira. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1262) Additional findbugs and javac warnings
[ https://issues.apache.org/jira/browse/PIG-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1262: Status: Patch Available (was: Open) Additional findbugs and javac warnings -- Key: PIG-1262 URL: https://issues.apache.org/jira/browse/PIG-1262 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.7.0 Attachments: PIG-1262-1.patch After a while, we have introduced some new findbugs and javacc warnings. Will fix them in this Jira. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1262) Additional findbugs and javac warnings
[ https://issues.apache.org/jira/browse/PIG-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1262: Attachment: PIG-1262-1.patch Additional findbugs and javac warnings -- Key: PIG-1262 URL: https://issues.apache.org/jira/browse/PIG-1262 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.7.0 Attachments: PIG-1262-1.patch After a while, we have introduced some new findbugs and javacc warnings. Will fix them in this Jira. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-613) Casting elements inside a tuple does not take effect
[ https://issues.apache.org/jira/browse/PIG-613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-613: --- Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Patch committed. Casting elements inside a tuple does not take effect Key: PIG-613 URL: https://issues.apache.org/jira/browse/PIG-613 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.2.0 Reporter: Viraj Bhat Assignee: Daniel Dai Fix For: 0.7.0 Attachments: myfloatdata.txt, PIG-613-1.patch, PIG-613-2.patch, SQUARE.java Consider the following Pig script which casts return values of the SQUARE UDF which are tuples of doubles to long. The describe output of B shows it is long, however the result is still double. {code} register statistics.jar; A = load 'myfloatdata.txt' using PigStorage() as (doublecol:double); B = foreach A generate (tuple(long))statistics.SQUARE(doublecol) as squares:(loadtimesq); describe B; explain B; dump B; {code} === Describe output of B: B: {squares: (loadtimesq: long)} === Sample output of B: ((7885.44)) ((792098.2200010001)) ((1497360.9268889998)) ((50023.7956)) ((0.972196)) ((0.30980356)) ((9.9760144E-7)) === Cause: The cast for Tuples has not been implemented in POCast.java -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
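The behavior that PIG-613 asks for, a cast applied to each element inside a tuple, can be sketched as follows. This is an illustration only: the tuple is modeled as a plain `List<Object>` rather than Pig's Tuple class, `TupleCastSketch` is a hypothetical name, and it assumes the double-to-long cast truncates the fractional part as Java's own conversion does. It is not the code from the committed patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; a tuple is modeled as List<Object>, not Pig's Tuple.
class TupleCastSketch {
    // Casts every numeric field to Long, dropping any fractional part; nulls pass through.
    static List<Object> castFieldsToLong(List<Object> tuple) {
        List<Object> result = new ArrayList<>(tuple.size());
        for (Object field : tuple) {
            if (field == null) {
                result.add(null);
            } else if (field instanceof Number) {
                result.add(((Number) field).longValue());
            } else {
                throw new IllegalArgumentException("Cannot cast to long: " + field);
            }
        }
        return result;
    }
}
```

Applied to a one-field tuple holding 7885.44, the sketch produces a tuple holding the long 7885, which is what the describe output `(loadtimesq: long)` in the report would lead a user to expect.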
[jira] Updated: (PIG-1255) Tiny code cleanup for serialization code for PigSplit
[ https://issues.apache.org/jira/browse/PIG-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1255: Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) No test included since this patch does not include any new feature. Patch committed. Tiny code cleanup for serialization code for PigSplit - Key: PIG-1255 URL: https://issues.apache.org/jira/browse/PIG-1255 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.7.0 Attachments: PIG-1255-1.patch, PIG-1255-2.patch A bug that closes the output stream during serialization. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
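The class of bug described in PIG-1255 can be sketched in general terms. The example assumes the problem is a helper stream whose close() propagates to a stream owned by the caller; the actual PigSplit change is not shown in this thread, and `SplitSerializationSketch` and `CloseTrackingStream` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Test double that records whether close() was ever called on the caller's stream.
class CloseTrackingStream extends ByteArrayOutputStream {
    boolean closed = false;

    @Override
    public void close() {
        closed = true;
    }
}

// Hypothetical sketch, not Pig's PigSplit code.
class SplitSerializationSketch {
    // Serializes value into a private buffer, then copies the bytes to out with a length prefix.
    // Only the private buffer is closed; the caller keeps ownership of out.
    static void writeObject(Serializable value, DataOutputStream out) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(buffer)) {
                oos.writeObject(value);
            }
            byte[] bytes = buffer.toByteArray();
            out.writeInt(bytes.length);
            out.write(bytes);
            out.flush();                // flush, but never close, the caller's stream
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Wrapping the caller's stream directly in an ObjectOutputStream and closing that wrapper would also close the underlying stream, so a second write would fail or be lost; buffering privately avoids that at the cost of one extra copy.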
[jira] Updated: (PIG-613) Casting complex type(tuple/bag/map) does not take effect
[ https://issues.apache.org/jira/browse/PIG-613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-613: --- Summary: Casting complex type(tuple/bag/map) does not take effect (was: Casting elements inside a tuple does not take effect) Casting complex type(tuple/bag/map) does not take effect Key: PIG-613 URL: https://issues.apache.org/jira/browse/PIG-613 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.2.0 Reporter: Viraj Bhat Assignee: Daniel Dai Fix For: 0.7.0 Attachments: myfloatdata.txt, PIG-613-1.patch, PIG-613-2.patch, SQUARE.java Consider the following Pig script which casts return values of the SQUARE UDF which are tuples of doubles to long. The describe output of B shows it is long, however the result is still double. {code} register statistics.jar; A = load 'myfloatdata.txt' using PigStorage() as (doublecol:double); B = foreach A generate (tuple(long))statistics.SQUARE(doublecol) as squares:(loadtimesq); describe B; explain B; dump B; {code} === Describe output of B: B: {squares: (loadtimesq: long)} === Sample output of B: ((7885.44)) ((792098.2200010001)) ((1497360.9268889998)) ((50023.7956)) ((0.972196)) ((0.30980356)) ((9.9760144E-7)) === Cause: The cast for Tuples has not been implemented in POCast.java -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1255) Tiny code cleanup for serialization code for PigSplit
[ https://issues.apache.org/jira/browse/PIG-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1255: Status: Patch Available (was: Open) Tiny code cleanup for serialization code for PigSplit - Key: PIG-1255 URL: https://issues.apache.org/jira/browse/PIG-1255 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.7.0 Attachments: PIG-1255-1.patch A bug that closes the output stream during serialization. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-613) Casting elements inside a tuple does not take effect
[ https://issues.apache.org/jira/browse/PIG-613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-613: --- Status: Patch Available (was: Open) Casting elements inside a tuple does not take effect Key: PIG-613 URL: https://issues.apache.org/jira/browse/PIG-613 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.2.0 Reporter: Viraj Bhat Assignee: Daniel Dai Fix For: 0.7.0 Attachments: myfloatdata.txt, PIG-613-1.patch, PIG-613-2.patch, SQUARE.java Consider the following Pig script which casts return values of the SQUARE UDF which are tuples of doubles to long. The describe output of B shows it is long, however the result is still double. {code} register statistics.jar; A = load 'myfloatdata.txt' using PigStorage() as (doublecol:double); B = foreach A generate (tuple(long))statistics.SQUARE(doublecol) as squares:(loadtimesq); describe B; explain B; dump B; {code} === Describe output of B: B: {squares: (loadtimesq: long)} === Sample output of B: ((7885.44)) ((792098.2200010001)) ((1497360.9268889998)) ((50023.7956)) ((0.972196)) ((0.30980356)) ((9.9760144E-7)) === Cause: The cast for Tuples has not been implemented in POCast.java -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1016) Reading in map data seems broken
[ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837574#action_12837574 ] Daniel Dai commented on PIG-1016: -
Hi, busy, I checked your code; it seems your patch assumes PIG-1016.patch is checked in. If I understand correctly, there is an inconsistency in this approach. In your code, you allow a map value to be any type. However, internally Pig always assumes a map value to be bytearray, so Pig will choose to use PigBytesRawComparator. You then further modify PigBytesRawComparator to handle all data types. This logic is very confusing. Further, TextDataParser itself is bogus since it guesses the data type based on the content. In PIG-613, we reiterate that a map value is bytearray. However, we fixed the code so that it can cast bytearray to map/tuple/bag correctly. I verified the test case you gave, and it works.
{code}
A= load '9.txt' as (data:map[]);
B= foreach A generate (int)(data#'a'), (chararray)(data#'b'),(tuple(map[]))(data#'c');
C= order B by $0;
dump C;
{code}
Result:
(1,'a',(1,2,3))
(2,'d',(1,2,3))
(3,'c',(1,2,3))
{code}
D= order B by $1;
dump D;
{code}
Result:
(1,'a',(1,2,3))
(3,'c',(1,2,3))
(2,'d',(1,2,3))
{code}
describe B;
{code}
Result:
B: {int,chararray,(map[ ])}
Do you have other use cases which PIG-613 cannot address?
Reading in map data seems broken Key: PIG-1016 URL: https://issues.apache.org/jira/browse/PIG-1016 Project: Pig Issue Type: Improvement Components: data Affects Versions: 0.4.0 Reporter: hc busy Attachments: PIG-1016.patch
Hi, I'm trying to load a map that has a tuple for value. The read fails in 0.4.0 because of a misconfiguration in the parser, whereas almost all documentation states that the value of the map can be any type. I've attached a patch that allows us to read in complex objects as value as documented. I've done simple verification of loading in maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1016) Reading in map data seems broken
[ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837643#action_12837643 ] Daniel Dai commented on PIG-1016: - Hi, busy, I think I finally understand what you mean. You want to write a loader, and in that loader you want to put values of any type into the map, right? Then I think it is a valid use case. What I am talking about is that if you use PigStorage to load data, the map value is always bytearray. Reading in map data seems broken Key: PIG-1016 URL: https://issues.apache.org/jira/browse/PIG-1016 Project: Pig Issue Type: Improvement Components: data Affects Versions: 0.4.0 Reporter: hc busy Attachments: PIG-1016.patch Hi, I'm trying to load a map that has a tuple for value. The read fails in 0.4.0 because of a misconfiguration in the parser, whereas almost all documentation states that the value of the map can be any type. I've attached a patch that allows us to read in complex objects as value as documented. I've done simple verification of loading in maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-613) Casting elements inside a tuple does not take effect
[ https://issues.apache.org/jira/browse/PIG-613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-613: --- Attachment: PIG-613-1.patch Casting elements inside a tuple does not take effect Key: PIG-613 URL: https://issues.apache.org/jira/browse/PIG-613 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.2.0 Reporter: Viraj Bhat Assignee: Daniel Dai Fix For: 0.7.0 Attachments: myfloatdata.txt, PIG-613-1.patch, SQUARE.java Consider the following Pig script which casts return values of the SQUARE UDF which are tuples of doubles to long. The describe output of B shows it is long, however the result is still double. {code} register statistics.jar; A = load 'myfloatdata.txt' using PigStorage() as (doublecol:double); B = foreach A generate (tuple(long))statistics.SQUARE(doublecol) as squares:(loadtimesq); describe B; explain B; dump B; {code} === Describe output of B: B: {squares: (loadtimesq: long)} === Sample output of B: ((7885.44)) ((792098.2200010001)) ((1497360.9268889998)) ((50023.7956)) ((0.972196)) ((0.30980356)) ((9.9760144E-7)) === Cause: The cast for Tuples has not been implemented in POCast.java -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-613) Casting elements inside a tuple does not take effect
[ https://issues.apache.org/jira/browse/PIG-613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-613: --- Status: Patch Available (was: Open) Casting elements inside a tuple does not take effect Key: PIG-613 URL: https://issues.apache.org/jira/browse/PIG-613 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.2.0 Reporter: Viraj Bhat Assignee: Daniel Dai Fix For: 0.7.0 Attachments: myfloatdata.txt, PIG-613-1.patch, SQUARE.java Consider the following Pig script which casts return values of the SQUARE UDF which are tuples of doubles to long. The describe output of B shows it is long, however the result is still double. {code} register statistics.jar; A = load 'myfloatdata.txt' using PigStorage() as (doublecol:double); B = foreach A generate (tuple(long))statistics.SQUARE(doublecol) as squares:(loadtimesq); describe B; explain B; dump B; {code} === Describe output of B: B: {squares: (loadtimesq: long)} === Sample output of B: ((7885.44)) ((792098.2200010001)) ((1497360.9268889998)) ((50023.7956)) ((0.972196)) ((0.30980356)) ((9.9760144E-7)) === Cause: The cast for Tuples has not been implemented in POCast.java -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-613) Casting elements inside a tuple does not take effect
[ https://issues.apache.org/jira/browse/PIG-613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-613: --- Attachment: PIG-613-1.patch Casting elements inside a tuple does not take effect Key: PIG-613 URL: https://issues.apache.org/jira/browse/PIG-613 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.2.0 Reporter: Viraj Bhat Assignee: Daniel Dai Fix For: 0.7.0 Attachments: myfloatdata.txt, PIG-613-1.patch, SQUARE.java Consider the following Pig script which casts return values of the SQUARE UDF which are tuples of doubles to long. The describe output of B shows it is long, however the result is still double. {code} register statistics.jar; A = load 'myfloatdata.txt' using PigStorage() as (doublecol:double); B = foreach A generate (tuple(long))statistics.SQUARE(doublecol) as squares:(loadtimesq); describe B; explain B; dump B; {code} === Describe output of B: B: {squares: (loadtimesq: long)} === Sample output of B: ((7885.44)) ((792098.2200010001)) ((1497360.9268889998)) ((50023.7956)) ((0.972196)) ((0.30980356)) ((9.9760144E-7)) === Cause: The cast for Tuples has not been implemented in POCast.java -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-613) Casting elements inside a tuple does not take effect
[ https://issues.apache.org/jira/browse/PIG-613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-613: --- Status: Open (was: Patch Available) Casting elements inside a tuple does not take effect Key: PIG-613 URL: https://issues.apache.org/jira/browse/PIG-613 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.2.0 Reporter: Viraj Bhat Assignee: Daniel Dai Fix For: 0.7.0 Attachments: myfloatdata.txt, PIG-613-1.patch, SQUARE.java Consider the following Pig script which casts return values of the SQUARE UDF which are tuples of doubles to long. The describe output of B shows it is long, however the result is still double. {code} register statistics.jar; A = load 'myfloatdata.txt' using PigStorage() as (doublecol:double); B = foreach A generate (tuple(long))statistics.SQUARE(doublecol) as squares:(loadtimesq); describe B; explain B; dump B; {code} === Describe output of B: B: {squares: (loadtimesq: long)} === Sample output of B: ((7885.44)) ((792098.2200010001)) ((1497360.9268889998)) ((50023.7956)) ((0.972196)) ((0.30980356)) ((9.9760144E-7)) === Cause: The cast for Tuples has not been implemented in POCast.java -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-613) Casting elements inside a tuple does not take effect
[ https://issues.apache.org/jira/browse/PIG-613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-613: --- Attachment: (was: PIG-613-1.patch) Casting elements inside a tuple does not take effect Key: PIG-613 URL: https://issues.apache.org/jira/browse/PIG-613 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.2.0 Reporter: Viraj Bhat Assignee: Daniel Dai Fix For: 0.7.0 Attachments: myfloatdata.txt, PIG-613-1.patch, SQUARE.java Consider the following Pig script which casts return values of the SQUARE UDF which are tuples of doubles to long. The describe output of B shows it is long, however the result is still double. {code} register statistics.jar; A = load 'myfloatdata.txt' using PigStorage() as (doublecol:double); B = foreach A generate (tuple(long))statistics.SQUARE(doublecol) as squares:(loadtimesq); describe B; explain B; dump B; {code} === Describe output of B: B: {squares: (loadtimesq: long)} === Sample output of B: ((7885.44)) ((792098.2200010001)) ((1497360.9268889998)) ((50023.7956)) ((0.972196)) ((0.30980356)) ((9.9760144E-7)) === Cause: The cast for Tuples has not been implemented in POCast.java -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-613) Casting elements inside a tuple does not take effect
[ https://issues.apache.org/jira/browse/PIG-613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-613: --- Status: Patch Available (was: Open) Casting elements inside a tuple does not take effect Key: PIG-613 URL: https://issues.apache.org/jira/browse/PIG-613 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.2.0 Reporter: Viraj Bhat Assignee: Daniel Dai Fix For: 0.7.0 Attachments: myfloatdata.txt, PIG-613-1.patch, SQUARE.java Consider the following Pig script which casts return values of the SQUARE UDF which are tuples of doubles to long. The describe output of B shows it is long, however the result is still double. {code} register statistics.jar; A = load 'myfloatdata.txt' using PigStorage() as (doublecol:double); B = foreach A generate (tuple(long))statistics.SQUARE(doublecol) as squares:(loadtimesq); describe B; explain B; dump B; {code} === Describe output of B: B: {squares: (loadtimesq: long)} === Sample output of B: ((7885.44)) ((792098.2200010001)) ((1497360.9268889998)) ((50023.7956)) ((0.972196)) ((0.30980356)) ((9.9760144E-7)) === Cause: The cast for Tuples has not been implemented in POCast.java -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1016) Reading in map data seems broken
[ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837106#action_12837106 ] Daniel Dai commented on PIG-1016: - This issue should be fixed as part of the effort in [PIG-613|https://issues.apache.org/jira/browse/PIG-613]. hc busy, can you check whether that patch addresses your issue? Reading in map data seems broken Key: PIG-1016 URL: https://issues.apache.org/jira/browse/PIG-1016 Project: Pig Issue Type: Improvement Components: data Affects Versions: 0.4.0 Reporter: hc busy Attachments: PIG-1016.patch Hi, I'm trying to load a map that has a tuple for value. The read fails in 0.4.0 because of a misconfiguration in the parser, whereas almost all documentation states that the value of the map can be any type. I've attached a patch that allows us to read in complex objects as value as documented. I've done simple verification of loading in maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1247) Error Number makes it hard to debug: ERROR 2999: Unexpected internal error. org.apache.pig.backend.datastorage.DataStorageException cannot be cast to java.lang.Error
[ https://issues.apache.org/jira/browse/PIG-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836107#action_12836107 ] Daniel Dai commented on PIG-1247: -
This error handling code is hard-coded by javacc. It seems we currently have no way to work around it.
Error Number makes it hard to debug: ERROR 2999: Unexpected internal error. org.apache.pig.backend.datastorage.DataStorageException cannot be cast to java.lang.Error - Key: PIG-1247 URL: https://issues.apache.org/jira/browse/PIG-1247 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Fix For: 0.7.0
I have a large script in which there are intermediate store statements; one of them writes to a directory I do not have permission to write to. The stack trace I get from Pig is this:
2010-02-20 02:16:32,055 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. org.apache.pig.backend.datastorage.DataStorageException cannot be cast to java.lang.Error
Details at logfile: /home/viraj/pig_1266632145355.log
Pig Stack Trace
---
ERROR 2999: Unexpected internal error. org.apache.pig.backend.datastorage.DataStorageException cannot be cast to java.lang.Error
java.lang.ClassCastException: org.apache.pig.backend.datastorage.DataStorageException cannot be cast to java.lang.Error
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.StoreClause(QueryParser.java:3583)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1407)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:949)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:762)
        at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
        at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1036)
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:986)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:386)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:720)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
        at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
        at org.apache.pig.Main.main(Main.java:386)
The only way to find the error was to look at the javacc generated QueryParser.java code and do a System.out.println()
Here is a script to reproduce the problem:
{code}
A = load '/user/viraj/three.txt' using PigStorage();
B = foreach A generate ['a'#'12'] as b:map[] ;
store B into '/user/secure/pigtest' using PigStorage();
{code}
three.txt has 3 lines which contain nothing but the number 1.
{code}
$ hadoop fs -ls /user/secure/
ls: could not get get listing for 'hdfs://mynamenode/user/secure' : org.apache.hadoop.security.AccessControlException: Permission denied: user=viraj, access=READ_EXECUTE, inode=secure:secure:users:rwx--
{code}
Viraj
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
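The ClassCastException in the PIG-1247 trace is what one would expect from a rethrow helper that casts any non-runtime throwable to Error. The sketch below reproduces that failure mode and shows one possible alternative. It assumes the generated parser code follows this cast pattern, which the thread does not show, and `StorageException` is a hypothetical stand-in for DataStorageException.

```java
// Hypothetical checked exception standing in for DataStorageException.
class StorageException extends Exception {
    StorageException(String message) {
        super(message);
    }
}

class RethrowSketch {
    // Failing pattern: anything that is not a RuntimeException is assumed to be an Error.
    static void rethrowWithCast(Throwable t) {
        if (t instanceof RuntimeException) {
            throw (RuntimeException) t;
        }
        throw (Error) t;    // ClassCastException when t is a checked exception
    }

    // Alternative: wrap unknown throwables so the original message and cause survive.
    static void rethrowWrapped(Throwable t) {
        if (t instanceof RuntimeException) {
            throw (RuntimeException) t;
        }
        if (t instanceof Error) {
            throw (Error) t;
        }
        throw new RuntimeException(t.getMessage(), t);
    }

    // Runs the action and returns whatever it threw, or null if it completed normally.
    static Throwable capture(Runnable action) {
        try {
            action.run();
            return null;
        } catch (Throwable t) {
            return t;
        }
    }
}
```

With the cast pattern, a permission failure surfaces as a ClassCastException and the real message is lost; with the wrapping variant, the caller still sees the original message and can inspect the cause.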
[jira] Commented: (PIG-1169) Top-N queries produce incorrect results when a store statement is added between order by and limit statement
[ https://issues.apache.org/jira/browse/PIG-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834966#action_12834966 ] Daniel Dai commented on PIG-1169: - +1
Top-N queries produce incorrect results when a store statement is added between order by and limit statement Key: PIG-1169 URL: https://issues.apache.org/jira/browse/PIG-1169 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.7.0 Attachments: PIG-1169.patch
??We tried to get top N results after a groupby and sort, and got different results with or without storing the full sorted results. Here is a skeleton of our pig script.??
{code}
raw_data = Load 'input_files' AS (f1, f2, ..., fn);
grouped = group raw_data by (f1, f2);
data = foreach grouped generate FLATTEN(group), SUM(raw_data.fk) as value;
ordered = order data by value DESC parallel 10;
topn = limit ordered 10;
store ordered into 'outputdir/full';
store topn into 'outputdir/topn';
{code}
??With the statement 'store ordered ...', top N results are incorrect, but without the statement, results are correct. Has anyone seen this before? I know a similar bug has been fixed in the multi-query release. We are on pig .4 and hadoop .20.1.??
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1238) Dump does not respect the schema
[ https://issues.apache.org/jira/browse/PIG-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835086#action_12835086 ] Daniel Dai commented on PIG-1238: - Do an explain, the last limit job is:
MapReduce node 1-99
Map Plan
Local Rearrange[tuple]{double}(false) - 1-103
|   |
|   Project[double][1] - 1-102
|
|---Limit - 1-101
    |
    |---Load(file:/tmp/temp-513510662/tmp1311900615:org.apache.pig.builtin.BinStorage) - 1-100
Reduce Plan
Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-109
|
|---Limit - 1-108
    |
    |---New For Each(true)[bag] - 1-107
        |   |
        |   Project[tuple][1] - 1-106
        |
        |---Package[tuple]{double} - 1-105
Global sort: false
The project in the map plan is wrong.
Dump does not respect the schema Key: PIG-1238 URL: https://issues.apache.org/jira/browse/PIG-1238 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Ankur For complex data type and certain sequence of operations dump produces results with non-existent field in the relation.
[jira] Commented: (PIG-1238) Dump does not respect the schema
[ https://issues.apache.org/jira/browse/PIG-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834422#action_12834422 ] Daniel Dai commented on PIG-1238: - Hi, Ankur, I encounter a syntax error in B = FOREACH A GENERATE 'a'#'12' as b:map[], ['b'#'c'#'12'] as mapFields;. Can you verify the script? Dump does not respect the schema Key: PIG-1238 URL: https://issues.apache.org/jira/browse/PIG-1238 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Ankur For complex data type and certain sequence of operations dump produces results with non-existent field in the relation.
[jira] Commented: (PIG-1231) Default DataBagIterator.hasNext() should be idempotent in all cases
[ https://issues.apache.org/jira/browse/PIG-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833129#action_12833129 ] Daniel Dai commented on PIG-1231: - Seems hudson is not running the testing process at all. Manual test success in both trunk and 0.6 branch. Did not include new testcase since it is a fix to existing testcase. Default DataBagIterator.hasNext() should be idempotent in all cases --- Key: PIG-1231 URL: https://issues.apache.org/jira/browse/PIG-1231 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.6.0 Attachments: PIG-1231-1.patch, PIG-1231-2.patch DefaultDataBagIterator.hasNext() is not repeatable when the below conditions met: 1. There is no more tuple in the last spill file 2. There is no tuples in memory (all contents are spilled to files) This is not acceptable cuz the name hasNext() implies that it is idempotent. In BagFormat, we do misuse DataBagIterator.hasNext() because of the assumption that hasNext() is always idempotent, which leads to some mysterious errors. Condition 2 seems to be very restrictive, but when the databag is really big, the memory can hold less than a couple of tuples, the chance to hit 2. is high enough. Here is one error we saw: Caused by: java.io.IOException: Stream closed at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:145) at java.io.BufferedInputStream.fill(BufferedInputStream.java:189) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at java.io.DataInputStream.readByte(DataInputStream.java:248) at org.apache.pig.data.DefaultTuple.readFields(DefaultTuple.java:278) at org.apache.pig.data.DefaultDataBag$DefaultDataBagIterator.readFromFile(DefaultDataBag.java:237) ... 20 more This happens because: we call hasNext(), which reach EOF and we close the file. Then we call hasNext() again in the assumption that it is idempotent. 
However, the stream is closed so we get this error message. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
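The failure mode described above (a second hasNext() call touching a stream that the first call already closed at EOF) can be avoided by caching a lookahead record and remembering exhaustion. A minimal sketch, assuming a hypothetical IdempotentSpillIterator that reads ints from an in-memory stream; it is not Pig's DefaultDataBagIterator and not the attached patch:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a spill-file iterator; this is not Pig's DefaultDataBagIterator. */
public class IdempotentSpillIterator implements Iterator<Integer> {
    private final DataInputStream in;
    private Integer lookahead;   // record read by hasNext() but not yet handed out
    private boolean exhausted;   // set once EOF is seen, so the closed stream is never read again

    public IdempotentSpillIterator(DataInputStream in) {
        this.in = in;
    }

    /** Builds an iterator over an in-memory "spill file" holding the given ints. */
    public static IdempotentSpillIterator of(int... values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (int v : values) {
                out.writeInt(v);
            }
            out.flush();
            return new IdempotentSpillIterator(
                    new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        if (lookahead != null) {
            return true;          // repeated calls reuse the cached record
        }
        if (exhausted) {
            return false;         // repeated calls after EOF do no I/O
        }
        try {
            lookahead = in.readInt();
            return true;
        } catch (EOFException eof) {
            exhausted = true;
            closeQuietly();
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        Integer result = lookahead;
        lookahead = null;
        return result;
    }

    private void closeQuietly() {
        try {
            in.close();
        } catch (IOException ignored) {
            // Nothing useful to do if closing fails after EOF.
        }
    }

    public static void main(String[] args) {
        IdempotentSpillIterator it = IdempotentSpillIterator.of(7, 8);
        while (it.hasNext() && it.hasNext()) {   // double call on purpose
            System.out.println(it.next());
        }
        System.out.println(it.hasNext() || it.hasNext());
    }
}
```

The two flags are the design point: lookahead makes hasNext() repeatable while records remain, and exhausted makes it repeatable after EOF.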
[jira] Updated: (PIG-1231) Default DataBagIterator.hasNext() should be idempotent in all cases
[ https://issues.apache.org/jira/browse/PIG-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1231: Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Patch committed to both trunk and 0.6 branch. Default DataBagIterator.hasNext() should be idempotent in all cases --- Key: PIG-1231 URL: https://issues.apache.org/jira/browse/PIG-1231 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.6.0 Attachments: PIG-1231-1.patch, PIG-1231-2.patch DefaultDataBagIterator.hasNext() is not repeatable when the below conditions met: 1. There is no more tuple in the last spill file 2. There is no tuples in memory (all contents are spilled to files) This is not acceptable cuz the name hasNext() implies that it is idempotent. In BagFormat, we do misuse DataBagIterator.hasNext() because of the assumption that hasNext() is always idempotent, which leads to some mysterious errors. Condition 2 seems to be very restrictive, but when the databag is really big, the memory can hold less than a couple of tuples, the chance to hit 2. is high enough. Here is one error we saw: Caused by: java.io.IOException: Stream closed at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:145) at java.io.BufferedInputStream.fill(BufferedInputStream.java:189) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at java.io.DataInputStream.readByte(DataInputStream.java:248) at org.apache.pig.data.DefaultTuple.readFields(DefaultTuple.java:278) at org.apache.pig.data.DefaultDataBag$DefaultDataBagIterator.readFromFile(DefaultDataBag.java:237) ... 20 more This happens because: we call hasNext(), which reach EOF and we close the file. Then we call hasNext() again in the assumption that it is idempotent. However, the stream is closed so we get this error message. -- This message is automatically generated by JIRA. 
[jira] Updated: (PIG-1231) Default DataBagIterator.hasNext() should be idempotent in all cases
[ https://issues.apache.org/jira/browse/PIG-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1231: Hadoop Flags: (was: [Reviewed]) Status: Patch Available (was: Reopened) Default DataBagIterator.hasNext() should be idempotent in all cases --- Key: PIG-1231 URL: https://issues.apache.org/jira/browse/PIG-1231 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.6.0 Attachments: PIG-1231-1.patch, PIG-1231-2.patch DefaultDataBagIterator.hasNext() is not repeatable when the below conditions met: 1. There is no more tuple in the last spill file 2. There is no tuples in memory (all contents are spilled to files) This is not acceptable cuz the name hasNext() implies that it is idempotent. In BagFormat, we do misuse DataBagIterator.hasNext() because of the assumption that hasNext() is always idempotent, which leads to some mysterious errors. Condition 2 seems to be very restrictive, but when the databag is really big, the memory can hold less than a couple of tuples, the chance to hit 2. is high enough. Here is one error we saw: Caused by: java.io.IOException: Stream closed at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:145) at java.io.BufferedInputStream.fill(BufferedInputStream.java:189) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at java.io.DataInputStream.readByte(DataInputStream.java:248) at org.apache.pig.data.DefaultTuple.readFields(DefaultTuple.java:278) at org.apache.pig.data.DefaultDataBag$DefaultDataBagIterator.readFromFile(DefaultDataBag.java:237) ... 20 more This happens because: we call hasNext(), which reach EOF and we close the file. Then we call hasNext() again in the assumption that it is idempotent. However, the stream is closed so we get this error message. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1231) Default DataBagIterator.hasNext() should be idempotent in all cases
[ https://issues.apache.org/jira/browse/PIG-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1231: Attachment: PIG-1231-2.patch Default DataBagIterator.hasNext() should be idempotent in all cases --- Key: PIG-1231 URL: https://issues.apache.org/jira/browse/PIG-1231 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.6.0 Attachments: PIG-1231-1.patch, PIG-1231-2.patch DefaultDataBagIterator.hasNext() is not repeatable when the below conditions met: 1. There is no more tuple in the last spill file 2. There is no tuples in memory (all contents are spilled to files) This is not acceptable cuz the name hasNext() implies that it is idempotent. In BagFormat, we do misuse DataBagIterator.hasNext() because of the assumption that hasNext() is always idempotent, which leads to some mysterious errors. Condition 2 seems to be very restrictive, but when the databag is really big, the memory can hold less than a couple of tuples, the chance to hit 2. is high enough. Here is one error we saw: Caused by: java.io.IOException: Stream closed at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:145) at java.io.BufferedInputStream.fill(BufferedInputStream.java:189) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at java.io.DataInputStream.readByte(DataInputStream.java:248) at org.apache.pig.data.DefaultTuple.readFields(DefaultTuple.java:278) at org.apache.pig.data.DefaultDataBag$DefaultDataBagIterator.readFromFile(DefaultDataBag.java:237) ... 20 more This happens because: we call hasNext(), which reach EOF and we close the file. Then we call hasNext() again in the assumption that it is idempotent. However, the stream is closed so we get this error message. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Reopened: (PIG-1231) Default DataBagIterator.hasNext() should be idempotent in all cases
[ https://issues.apache.org/jira/browse/PIG-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai reopened PIG-1231: - There is unit case failure in 0.6 branch. Default DataBagIterator.hasNext() should be idempotent in all cases --- Key: PIG-1231 URL: https://issues.apache.org/jira/browse/PIG-1231 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.6.0 Attachments: PIG-1231-1.patch, PIG-1231-2.patch DefaultDataBagIterator.hasNext() is not repeatable when the below conditions met: 1. There is no more tuple in the last spill file 2. There is no tuples in memory (all contents are spilled to files) This is not acceptable cuz the name hasNext() implies that it is idempotent. In BagFormat, we do misuse DataBagIterator.hasNext() because of the assumption that hasNext() is always idempotent, which leads to some mysterious errors. Condition 2 seems to be very restrictive, but when the databag is really big, the memory can hold less than a couple of tuples, the chance to hit 2. is high enough. Here is one error we saw: Caused by: java.io.IOException: Stream closed at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:145) at java.io.BufferedInputStream.fill(BufferedInputStream.java:189) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at java.io.DataInputStream.readByte(DataInputStream.java:248) at org.apache.pig.data.DefaultTuple.readFields(DefaultTuple.java:278) at org.apache.pig.data.DefaultDataBag$DefaultDataBagIterator.readFromFile(DefaultDataBag.java:237) ... 20 more This happens because: we call hasNext(), which reach EOF and we close the file. Then we call hasNext() again in the assumption that it is idempotent. However, the stream is closed so we get this error message. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1190) Handling of quoted strings in pig-latin/grunt commands
[ https://issues.apache.org/jira/browse/PIG-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832805#action_12832805 ] Daniel Dai commented on PIG-1190: - +1 for the new change. Handling of quoted strings in pig-latin/grunt commands -- Key: PIG-1190 URL: https://issues.apache.org/jira/browse/PIG-1190 Project: Pig Issue Type: Bug Reporter: Thejas M Nair Assignee: Ashutosh Chauhan Fix For: 0.7.0 Attachments: correct-testcase.patch, pig-1190.patch, pig-1190_1.patch There is some inconsistency in the way quoted strings are used/handled in pig-latin . In load/store and define-ship commands, files are specified in quoted strings , and the file name is the content within the quotes. But in case of register, set, and file system commands , if string is specified in quotes, the quotes are also included as part of the string. This is not only inconsistent , it is also unintuitive. This is also inconsistent with the way hdfs commandline (or bash shell) interpret file names. For example, currently with the command - set job.name 'job123' The job name set set to 'job123' (including the quotes) not job123 . This needs to be fixed, and above command should be considered equivalent to - set job.name job123. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
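The behaviour requested above, where set job.name 'job123' is treated the same as set job.name job123, amounts to stripping one pair of enclosing quotes from the argument. A minimal sketch, assuming a hypothetical helper named QuotedArgument; it is not the GruntParser change from the attached patches:

```java
/** Hypothetical helper; not the actual change attached to PIG-1190. */
public class QuotedArgument {

    /** Strips one pair of matching single quotes, leaving unquoted input untouched. */
    public static String unquote(String raw) {
        String s = raw.trim();
        boolean quoted = s.length() >= 2
                && s.charAt(0) == '\''
                && s.charAt(s.length() - 1) == '\'';
        return quoted ? s.substring(1, s.length() - 1) : s;
    }

    public static void main(String[] args) {
        System.out.println(unquote("'job123'"));
        System.out.println(unquote("job123"));
    }
}
```

Both calls in main yield the same job name, which is the equivalence the issue asks for.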
[jira] Assigned: (PIG-1169) Top-N queries produce incorrect results when a store statement is added between order by and limit statement
[ https://issues.apache.org/jira/browse/PIG-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai reassigned PIG-1169: --- Assignee: Richard Ding (was: Daniel Dai) Top-N queries produce incorrect results when a store statement is added between order by and limit statement Key: PIG-1169 URL: https://issues.apache.org/jira/browse/PIG-1169 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.7.0 ??We tried to get top N results after a groupby and sort, and got different results with or without storing the full sorted results. Here is a skeleton of our pig script.?? {code} raw_data = Load 'input_files' AS (f1, f2, ..., fn); grouped = group raw_data by (f1, f2); data = foreach grouped generate FLATTEN(group), SUM(raw_data.fk) as value; ordered = order data by value DESC parallel 10; topn = limit ordered 10; store ordered into 'outputdir/full'; store topn into 'outputdir/topn'; {code} ??With the statement 'store ordered ...', top N results are incorrect, but without the statement, results are correct. Has anyone seen this before? I know a similar bug has been fixed in the multi-query release. We are on pig .4 and hadoop .20.1.??
[jira] Commented: (PIG-1231) Default DataBagIterator.hasNext() should be idempotent in all cases
[ https://issues.apache.org/jira/browse/PIG-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831558#action_12831558 ] Daniel Dai commented on PIG-1231: - testCompressed1: java.lang.IllegalArgumentException: port out of range:-1. Not a real problem. Manual test passes. Default DataBagIterator.hasNext() should be idempotent in all cases --- Key: PIG-1231 URL: https://issues.apache.org/jira/browse/PIG-1231 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.6.0 Attachments: PIG-1231-1.patch DefaultDataBagIterator.hasNext() is not repeatable when the below conditions met: 1. There is no more tuple in the last spill file 2. There is no tuples in memory (all contents are spilled to files) This is not acceptable cuz the name hasNext() implies that it is idempotent. In BagFormat, we do misuse DataBagIterator.hasNext() because of the assumption that hasNext() is always idempotent, which leads to some mysterious errors. Condition 2 seems to be very restrictive, but when the databag is really big, the memory can hold less than a couple of tuples, the chance to hit 2. is high enough. Here is one error we saw: Caused by: java.io.IOException: Stream closed at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:145) at java.io.BufferedInputStream.fill(BufferedInputStream.java:189) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at java.io.DataInputStream.readByte(DataInputStream.java:248) at org.apache.pig.data.DefaultTuple.readFields(DefaultTuple.java:278) at org.apache.pig.data.DefaultDataBag$DefaultDataBagIterator.readFromFile(DefaultDataBag.java:237) ... 20 more This happens because: we call hasNext(), which reach EOF and we close the file. Then we call hasNext() again in the assumption that it is idempotent. However, the stream is closed so we get this error message. -- This message is automatically generated by JIRA. 
[jira] Updated: (PIG-1231) Default DataBagIterator.hasNext() should be idempotent in all cases
[ https://issues.apache.org/jira/browse/PIG-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1231: Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Patch committed to both trunk and 0.6 branch. Default DataBagIterator.hasNext() should be idempotent in all cases --- Key: PIG-1231 URL: https://issues.apache.org/jira/browse/PIG-1231 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.6.0 Attachments: PIG-1231-1.patch DefaultDataBagIterator.hasNext() is not repeatable when the below conditions met: 1. There is no more tuple in the last spill file 2. There is no tuples in memory (all contents are spilled to files) This is not acceptable cuz the name hasNext() implies that it is idempotent. In BagFormat, we do misuse DataBagIterator.hasNext() because of the assumption that hasNext() is always idempotent, which leads to some mysterious errors. Condition 2 seems to be very restrictive, but when the databag is really big, the memory can hold less than a couple of tuples, the chance to hit 2. is high enough. Here is one error we saw: Caused by: java.io.IOException: Stream closed at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:145) at java.io.BufferedInputStream.fill(BufferedInputStream.java:189) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at java.io.DataInputStream.readByte(DataInputStream.java:248) at org.apache.pig.data.DefaultTuple.readFields(DefaultTuple.java:278) at org.apache.pig.data.DefaultDataBag$DefaultDataBagIterator.readFromFile(DefaultDataBag.java:237) ... 20 more This happens because: we call hasNext(), which reach EOF and we close the file. Then we call hasNext() again in the assumption that it is idempotent. However, the stream is closed so we get this error message. -- This message is automatically generated by JIRA. 
[jira] Created: (PIG-1231) DataBagIterator.hasNext() should be idempotent
DataBagIterator.hasNext() should be idempotent -- Key: PIG-1231 URL: https://issues.apache.org/jira/browse/PIG-1231 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.6.0 Current implementation of DataBagIterator.hasNext() will actually fetch the next tuple every time. So if we call hasNext() consecutively, more than 1 tuples will be fetched. This is confusing cuz the name hasNext() implies that it is idempotent. In BagFormat, we do misuse DataBagIterator.hasNext() because of this, which leads to some mysterious errors. Here is one error we saw: Caused by: java.io.IOException: Stream closed at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:145) at java.io.BufferedInputStream.fill(BufferedInputStream.java:189) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at java.io.DataInputStream.readByte(DataInputStream.java:248) at org.apache.pig.data.DefaultTuple.readFields(DefaultTuple.java:278) at org.apache.pig.data.DefaultDataBag$DefaultDataBagIterator.readFromFile(DefaultDataBag.java:237) ... 20 more This happens because: we call hasNext(), which reach EOF and we close the file. Then we call hasNext() again in the assumption that it is idempotent. However, the stream is closed so we get this error message. This fix will go to DefaultDataBagIterator, DistinctDataBagIterator, CachedBagIterator, SortedDataBagIterator. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1231) DataBagIterator.hasNext() should be idempotent
[ https://issues.apache.org/jira/browse/PIG-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1231: Status: Patch Available (was: Open) DataBagIterator.hasNext() should be idempotent -- Key: PIG-1231 URL: https://issues.apache.org/jira/browse/PIG-1231 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.6.0 Attachments: PIG-1231-1.patch DataBagIterator.hasNext() is not repeatable in some situations. This is not acceptable cuz the name hasNext() implies that it is idempotent. While hasNext() returns true, it is repeatable, but if hasNext() returns false, it is not. In BagFormat, we do misuse DataBagIterator.hasNext() because of the assumption that hasNext() is always idempotent, which leads to some mysterious errors. Here is one error we saw: Caused by: java.io.IOException: Stream closed at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:145) at java.io.BufferedInputStream.fill(BufferedInputStream.java:189) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at java.io.DataInputStream.readByte(DataInputStream.java:248) at org.apache.pig.data.DefaultTuple.readFields(DefaultTuple.java:278) at org.apache.pig.data.DefaultDataBag$DefaultDataBagIterator.readFromFile(DefaultDataBag.java:237) ... 20 more This happens because: we call hasNext(), which reach EOF and we close the file. Then we call hasNext() again in the assumption that it is idempotent. However, the stream is closed so we get this error message. This fix will go to DefaultDataBagIterator, DistinctDataBagIterator, CachedBagIterator, SortedDataBagIterator. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1231) DataBagIterator.hasNext() should be idempotent
[ https://issues.apache.org/jira/browse/PIG-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1231: Attachment: PIG-1231-1.patch DefaultDataBagIterator is the only DataBag iterator that has this problem. The other data bags handle this through different mechanisms. DataBagIterator.hasNext() should be idempotent -- Key: PIG-1231 URL: https://issues.apache.org/jira/browse/PIG-1231 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.6.0 Attachments: PIG-1231-1.patch DataBagIterator.hasNext() is not repeatable in some situations. This is not acceptable cuz the name hasNext() implies that it is idempotent. While hasNext() returns true, it is repeatable, but if hasNext() returns false, it is not. In BagFormat, we do misuse DataBagIterator.hasNext() because of the assumption that hasNext() is always idempotent, which leads to some mysterious errors. Here is one error we saw: Caused by: java.io.IOException: Stream closed at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:145) at java.io.BufferedInputStream.fill(BufferedInputStream.java:189) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at java.io.DataInputStream.readByte(DataInputStream.java:248) at org.apache.pig.data.DefaultTuple.readFields(DefaultTuple.java:278) at org.apache.pig.data.DefaultDataBag$DefaultDataBagIterator.readFromFile(DefaultDataBag.java:237) ... 20 more This happens because: we call hasNext(), which reaches EOF and we close the file. Then we call hasNext() again on the assumption that it is idempotent. However, the stream is closed so we get this error message. This fix will go to DefaultDataBagIterator, DistinctDataBagIterator, CachedBagIterator, SortedDataBagIterator.
[jira] Updated: (PIG-1231) Default DataBagIterator.hasNext() should be idempotent in all cases
[ https://issues.apache.org/jira/browse/PIG-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1231: Description: DefaultDataBagIterator.hasNext() is not repeatable when the below conditions met: 1. There is no more tuple in the last spill file 2. There is no tuples in memory (all contents are spilled to files) This is not acceptable cuz the name hasNext() implies that it is idempotent. In BagFormat, we do misuse DataBagIterator.hasNext() because of the assumption that hasNext() is always idempotent, which leads to some mysterious errors. Condition 2 seems to be very restrictive, but when the databag is really big, the memory can hold less than a couple of tuples, the chance to hit 2. is high enough. Here is one error we saw: Caused by: java.io.IOException: Stream closed at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:145) at java.io.BufferedInputStream.fill(BufferedInputStream.java:189) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at java.io.DataInputStream.readByte(DataInputStream.java:248) at org.apache.pig.data.DefaultTuple.readFields(DefaultTuple.java:278) at org.apache.pig.data.DefaultDataBag$DefaultDataBagIterator.readFromFile(DefaultDataBag.java:237) ... 20 more This happens because: we call hasNext(), which reach EOF and we close the file. Then we call hasNext() again in the assumption that it is idempotent. However, the stream is closed so we get this error message. was: DataBagIterator.hasNext() is not repeatable in some situations. This is not acceptable cuz the name hasNext() implies that it is idempotent. While hasNext() returns true, it is repeatable, but if hasNext() returns false, it is not. In BagFormat, we do misuse DataBagIterator.hasNext() because of the assumption that hasNext() is always idempotent, which leads to some mysterious errors. 
Here is one error we saw: Caused by: java.io.IOException: Stream closed at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:145) at java.io.BufferedInputStream.fill(BufferedInputStream.java:189) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at java.io.DataInputStream.readByte(DataInputStream.java:248) at org.apache.pig.data.DefaultTuple.readFields(DefaultTuple.java:278) at org.apache.pig.data.DefaultDataBag$DefaultDataBagIterator.readFromFile(DefaultDataBag.java:237) ... 20 more This happens because: we call hasNext(), which reach EOF and we close the file. Then we call hasNext() again in the assumption that it is idempotent. However, the stream is closed so we get this error message. This fix will go to DefaultDataBagIterator, DistinctDataBagIterator, CachedBagIterator, SortedDataBagIterator. Summary: Default DataBagIterator.hasNext() should be idempotent in all cases (was: DataBagIterator.hasNext() should be idempotent) Default DataBagIterator.hasNext() should be idempotent in all cases --- Key: PIG-1231 URL: https://issues.apache.org/jira/browse/PIG-1231 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.6.0 Attachments: PIG-1231-1.patch DefaultDataBagIterator.hasNext() is not repeatable when the below conditions met: 1. There is no more tuple in the last spill file 2. There is no tuples in memory (all contents are spilled to files) This is not acceptable cuz the name hasNext() implies that it is idempotent. In BagFormat, we do misuse DataBagIterator.hasNext() because of the assumption that hasNext() is always idempotent, which leads to some mysterious errors. Condition 2 seems to be very restrictive, but when the databag is really big, the memory can hold less than a couple of tuples, the chance to hit 2. is high enough. 
Here is one error we saw: Caused by: java.io.IOException: Stream closed at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:145) at java.io.BufferedInputStream.fill(BufferedInputStream.java:189) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at java.io.DataInputStream.readByte(DataInputStream.java:248) at org.apache.pig.data.DefaultTuple.readFields(DefaultTuple.java:278) at org.apache.pig.data.DefaultDataBag$DefaultDataBagIterator.readFromFile(DefaultDataBag.java:237) ... 20 more This happens because: we call hasNext(), which reach EOF and we close the file. Then we call hasNext() again in the assumption that it is idempotent. However, the stream is closed so we get this error message. -- This message is automatically generated
[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1090: Attachment: PIG-1090-22.patch Two bug fixes: 1. Cycle in plan if load and store location are the same 2. relToAbsPathForStoreLocation is not called when using the Pig API directly rather than Grunt. Update sources to reflect recent changes in load-store interfaces - Key: PIG-1090 URL: https://issues.apache.org/jira/browse/PIG-1090 Project: Pig Issue Type: Sub-task Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, PIG-1090-13.patch, PIG-1090-14.patch, PIG-1090-15.patch, PIG-1090-16.patch, PIG-1090-17.patch, PIG-1090-18.patch, PIG-1090-19.patch, PIG-1090-2.patch, PIG-1090-20.patch, PIG-1090-21.patch, PIG-1090-22.patch, PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch There have been some changes (as recorded in the Changes Section, Nov 2 2009 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the load/store interfaces - this jira is to track the task of making those changes under src. Changes under test will be addressed in a different jira.
[jira] Updated: (PIG-1219) Extra listStatus call to the namenode in WeightedRangePartitioner
[ https://issues.apache.org/jira/browse/PIG-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1219: Resolution: Won't Fix Status: Resolved (was: Patch Available) Will go for distributed cache approach (https://issues.apache.org/jira/browse/PIG-1218). This patch is no longer needed then. Extra listStatus call to the namenode in WeightedRangePartitioner - Key: PIG-1219 URL: https://issues.apache.org/jira/browse/PIG-1219 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.6.0 Attachments: PIG-1219-1.patch, PIG-1219-2.patch, PIG-1219-3.patch We call FileLocalizer.openDFSFile in WeightedRangePartitioner to open quantile file. openDFSFile internally will check the existence of the quantile file, which adds burden to hdfs namenode. We shall remove this extra check. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1219) Extra listStatus call to the namenode in WeightedRangePartitioner
[ https://issues.apache.org/jira/browse/PIG-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1219: Status: Open (was: Patch Available) Extra listStatus call to the namenode in WeightedRangePartitioner - Key: PIG-1219 URL: https://issues.apache.org/jira/browse/PIG-1219 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.6.0 Attachments: PIG-1219-1.patch, PIG-1219-2.patch WeightedRangePartitioner calls FileLocalizer.openDFSFile to open the quantile file. openDFSFile internally checks that the quantile file exists, which puts extra load on the HDFS namenode. We should remove this extra check.
[jira] Updated: (PIG-1219) Extra listStatus call to the namenode in WeightedRangePartitioner
[ https://issues.apache.org/jira/browse/PIG-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1219: Attachment: PIG-1219-3.patch The test failure is caused by the way we test, not by the core code. We now require the quantile file to exist before JobControlCompiler runs. Our test case invokes JobControlCompiler methods directly without actually running the job, so the quantile file does not exist when we reach JobControlCompiler. The test case is changed to force creation of the quantile file. Extra listStatus call to the namenode in WeightedRangePartitioner - Key: PIG-1219 URL: https://issues.apache.org/jira/browse/PIG-1219 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.6.0 Attachments: PIG-1219-1.patch, PIG-1219-2.patch, PIG-1219-3.patch WeightedRangePartitioner calls FileLocalizer.openDFSFile to open the quantile file. openDFSFile internally checks that the quantile file exists, which puts extra load on the HDFS namenode. We should remove this extra check.
[jira] Created: (PIG-1219) Extra call to the namenode in WeightedRangePartitioner
Extra call to the namenode in WeightedRangePartitioner -- Key: PIG-1219 URL: https://issues.apache.org/jira/browse/PIG-1219 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.6.0 WeightedRangePartitioner calls FileLocalizer.openDFSFile to open the quantile file. openDFSFile internally checks that the quantile file exists, which puts extra load on the HDFS namenode. We should remove this extra check.
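The idea in PIG-1219 (skip the separate existence probe and let the open call itself report a missing file) can be illustrated with a minimal sketch. This is only an analogy on the local filesystem using java.nio; it does not use Hadoop or FileLocalizer.openDFSFile, and the class and method names (OpenWithoutProbe, readQuantiles) are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Hypothetical illustration, not Pig code: read a file without first asking
// the filesystem whether it exists.
class OpenWithoutProbe {

    // The open/read call already fails with NoSuchFileException when the file
    // is missing, so a preceding Files.exists(p) would only add a second
    // metadata round trip.
    static byte[] readQuantiles(Path p) throws IOException {
        try {
            return Files.readAllBytes(p);
        } catch (NoSuchFileException e) {
            throw new IOException("quantile file not found: " + p, e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("quantiles", ".bin");
        Files.write(p, new byte[] {1, 2, 3});
        System.out.println(readQuantiles(p).length);
        Files.delete(p);
    }
}
```

On HDFS the saving would be one namenode call per open; whether that matters depends on how many tasks open the file, which this thread does not quantify.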
[jira] Updated: (PIG-1219) Extra listStatus call to the namenode in WeightedRangePartitioner
[ https://issues.apache.org/jira/browse/PIG-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1219: Summary: Extra listStatus call to the namenode in WeightedRangePartitioner (was: Extra call to the namenode in WeightedRangePartitioner) Extra listStatus call to the namenode in WeightedRangePartitioner - Key: PIG-1219 URL: https://issues.apache.org/jira/browse/PIG-1219 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.6.0 WeightedRangePartitioner calls FileLocalizer.openDFSFile to open the quantile file. openDFSFile internally checks that the quantile file exists, which puts extra load on the HDFS namenode. We should remove this extra check.
[jira] Updated: (PIG-1219) Extra listStatus call to the namenode in WeightedRangePartitioner
[ https://issues.apache.org/jira/browse/PIG-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1219: Attachment: PIG-1219-1.patch I am still testing the patch. Attaching it now so other committers can review it. Extra listStatus call to the namenode in WeightedRangePartitioner - Key: PIG-1219 URL: https://issues.apache.org/jira/browse/PIG-1219 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.6.0 Attachments: PIG-1219-1.patch WeightedRangePartitioner calls FileLocalizer.openDFSFile to open the quantile file. openDFSFile internally checks that the quantile file exists, which puts extra load on the HDFS namenode. We should remove this extra check.
[jira] Updated: (PIG-1219) Extra listStatus call to the namenode in WeightedRangePartitioner
[ https://issues.apache.org/jira/browse/PIG-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1219: Status: Patch Available (was: Open) Extra listStatus call to the namenode in WeightedRangePartitioner - Key: PIG-1219 URL: https://issues.apache.org/jira/browse/PIG-1219 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.6.0 Attachments: PIG-1219-1.patch WeightedRangePartitioner calls FileLocalizer.openDFSFile to open the quantile file. openDFSFile internally checks that the quantile file exists, which puts extra load on the HDFS namenode. We should remove this extra check.
[jira] Updated: (PIG-1219) Extra listStatus call to the namenode in WeightedRangePartitioner
[ https://issues.apache.org/jira/browse/PIG-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1219: Status: Open (was: Patch Available) Extra listStatus call to the namenode in WeightedRangePartitioner - Key: PIG-1219 URL: https://issues.apache.org/jira/browse/PIG-1219 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.6.0 Attachments: PIG-1219-1.patch, PIG-1219-2.patch WeightedRangePartitioner calls FileLocalizer.openDFSFile to open the quantile file. openDFSFile internally checks that the quantile file exists, which puts extra load on the HDFS namenode. We should remove this extra check.
[jira] Updated: (PIG-1219) Extra listStatus call to the namenode in WeightedRangePartitioner
[ https://issues.apache.org/jira/browse/PIG-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1219: Attachment: PIG-1219-2.patch Thanks, Richard. Posting the updated patch. Extra listStatus call to the namenode in WeightedRangePartitioner - Key: PIG-1219 URL: https://issues.apache.org/jira/browse/PIG-1219 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.6.0 Attachments: PIG-1219-1.patch, PIG-1219-2.patch WeightedRangePartitioner calls FileLocalizer.openDFSFile to open the quantile file. openDFSFile internally checks that the quantile file exists, which puts extra load on the HDFS namenode. We should remove this extra check.
[jira] Updated: (PIG-1219) Extra listStatus call to the namenode in WeightedRangePartitioner
[ https://issues.apache.org/jira/browse/PIG-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1219: Status: Patch Available (was: Open) Extra listStatus call to the namenode in WeightedRangePartitioner - Key: PIG-1219 URL: https://issues.apache.org/jira/browse/PIG-1219 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.6.0 Attachments: PIG-1219-1.patch, PIG-1219-2.patch WeightedRangePartitioner calls FileLocalizer.openDFSFile to open the quantile file. openDFSFile internally checks that the quantile file exists, which puts extra load on the HDFS namenode. We should remove this extra check.
[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1090: Attachment: PIG-1090-20.patch Fixes a bug in MergeJoin when the index has only one entry. Update sources to reflect recent changes in load-store interfaces - Key: PIG-1090 URL: https://issues.apache.org/jira/browse/PIG-1090 Project: Pig Issue Type: Sub-task Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, PIG-1090-13.patch, PIG-1090-14.patch, PIG-1090-15.patch, PIG-1090-16.patch, PIG-1090-17.patch, PIG-1090-18.patch, PIG-1090-19.patch, PIG-1090-2.patch, PIG-1090-20.patch, PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch There have been some changes to the load/store interfaces (recorded in the Changes section, Nov 2 2009 subsection, of http://wiki.apache.org/pig/LoadStoreRedesignProposal). This jira tracks the task of making those changes under src. Changes under test will be addressed in a different jira.
[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1090: Attachment: (was: PIG-1090-20.patch) Update sources to reflect recent changes in load-store interfaces - Key: PIG-1090 URL: https://issues.apache.org/jira/browse/PIG-1090 Project: Pig Issue Type: Sub-task Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, PIG-1090-13.patch, PIG-1090-14.patch, PIG-1090-15.patch, PIG-1090-16.patch, PIG-1090-17.patch, PIG-1090-18.patch, PIG-1090-19.patch, PIG-1090-2.patch, PIG-1090-20.patch, PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch There have been some changes to the load/store interfaces (recorded in the Changes section, Nov 2 2009 subsection, of http://wiki.apache.org/pig/LoadStoreRedesignProposal). This jira tracks the task of making those changes under src. Changes under test will be addressed in a different jira.
[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1090: Attachment: PIG-1090-20.patch Update sources to reflect recent changes in load-store interfaces - Key: PIG-1090 URL: https://issues.apache.org/jira/browse/PIG-1090 Project: Pig Issue Type: Sub-task Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, PIG-1090-13.patch, PIG-1090-14.patch, PIG-1090-15.patch, PIG-1090-16.patch, PIG-1090-17.patch, PIG-1090-18.patch, PIG-1090-19.patch, PIG-1090-2.patch, PIG-1090-20.patch, PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch There have been some changes to the load/store interfaces (recorded in the Changes section, Nov 2 2009 subsection, of http://wiki.apache.org/pig/LoadStoreRedesignProposal). This jira tracks the task of making those changes under src. Changes under test will be addressed in a different jira.
[jira] Updated: (PIG-1212) LogicalPlan.replaceAndAddSucessors produce wrong result when successors are null
[ https://issues.apache.org/jira/browse/PIG-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1212: Status: Patch Available (was: Open) LogicalPlan.replaceAndAddSucessors produce wrong result when successors are null Key: PIG-1212 URL: https://issues.apache.org/jira/browse/PIG-1212 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.7.0 Attachments: PIG-1212-1.patch The following script throws an NPE: a = load '1.txt' as (a0:chararray); b = load '2.txt' as (b0:chararray); c = join a by a0, b by b0; d = filter c by a0 == 'a'; explain d;
[jira] Updated: (PIG-1212) LogicalPlan.replaceAndAddSucessors produce wrong result when successors are null
[ https://issues.apache.org/jira/browse/PIG-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1212: Attachment: PIG-1212-1.patch LogicalPlan.replaceAndAddSucessors produce wrong result when successors are null Key: PIG-1212 URL: https://issues.apache.org/jira/browse/PIG-1212 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.7.0 Attachments: PIG-1212-1.patch The following script throws an NPE: a = load '1.txt' as (a0:chararray); b = load '2.txt' as (b0:chararray); c = join a by a0, b by b0; d = filter c by a0 == 'a'; explain d;
[jira] Updated: (PIG-1210) fieldsToRead send the same fields more than once in some cases
[ https://issues.apache.org/jira/browse/PIG-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1210: Status: Patch Available (was: Open) Attaching the patch with a test case. fieldsToRead send the same fields more than once in some cases -- Key: PIG-1210 URL: https://issues.apache.org/jira/browse/PIG-1210 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.6.0 Attachments: PIG-1210-1.patch, PIG-1210-2.patch This bug occurs when both of the following conditions are met: 1. The LoadFunc is susceptible to duplicated fields in fieldsToRead. The only such LoadFunc we know of so far is Zebra. 2. The first item in the FOREACH statement references the same input more than once. For example, the following script is affected: a = load '11' using org.apache.hadoop.zebra.pig.TableLoader('a0'); b = foreach a generate a0+a0;
[jira] Updated: (PIG-1210) fieldsToRead send the same fields more than once in some cases
[ https://issues.apache.org/jira/browse/PIG-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1210: Status: Open (was: Patch Available) fieldsToRead send the same fields more than once in some cases -- Key: PIG-1210 URL: https://issues.apache.org/jira/browse/PIG-1210 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.6.0 Attachments: PIG-1210-1.patch, PIG-1210-2.patch This bug occurs when both of the following conditions are met: 1. The LoadFunc is susceptible to duplicated fields in fieldsToRead. The only such LoadFunc we know of so far is Zebra. 2. The first item in the FOREACH statement references the same input more than once. For example, the following script is affected: a = load '11' using org.apache.hadoop.zebra.pig.TableLoader('a0'); b = foreach a generate a0+a0;
[jira] Commented: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806468#action_12806468 ] Daniel Dai commented on PIG-1090: - PIG-1090-19.patch committed. Update sources to reflect recent changes in load-store interfaces - Key: PIG-1090 URL: https://issues.apache.org/jira/browse/PIG-1090 Project: Pig Issue Type: Sub-task Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, PIG-1090-13.patch, PIG-1090-14.patch, PIG-1090-15.patch, PIG-1090-16.patch, PIG-1090-17.patch, PIG-1090-18.patch, PIG-1090-19.patch, PIG-1090-2.patch, PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch There have been some changes to the load/store interfaces (recorded in the Changes section, Nov 2 2009 subsection, of http://wiki.apache.org/pig/LoadStoreRedesignProposal). This jira tracks the task of making those changes under src. Changes under test will be addressed in a different jira.
[jira] Updated: (PIG-1212) LogicalPlan.replaceAndAddSucessors produce wrong result when successors are null
[ https://issues.apache.org/jira/browse/PIG-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1212: Attachment: PIG-1212-2.patch Addresses Richard's comment. LogicalPlan.replaceAndAddSucessors produce wrong result when successors are null Key: PIG-1212 URL: https://issues.apache.org/jira/browse/PIG-1212 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.7.0 Attachments: PIG-1212-1.patch, PIG-1212-2.patch The following script throws an NPE: a = load '1.txt' as (a0:chararray); b = load '2.txt' as (b0:chararray); c = join a by a0, b by b0; d = filter c by a0 == 'a'; explain d;
[jira] Commented: (PIG-1213) Schema serialization is broken
[ https://issues.apache.org/jira/browse/PIG-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806540#action_12806540 ] Daniel Dai commented on PIG-1213: - +1. Please commit once Hudson has reviewed it. Schema serialization is broken -- Key: PIG-1213 URL: https://issues.apache.org/jira/browse/PIG-1213 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.6.0 Attachments: PIG-1213.patch Consider a UDF that needs to know the schema of its input while executing in the backend. To achieve this, the UDF needs to store the schema in the UDFContext. Internally, the UDFContext serializes the schema into the jobconf. However, this is currently broken and throws a serialization exception.
[jira] Commented: (PIG-1210) fieldsToRead send the same fields more than once in some cases
[ https://issues.apache.org/jira/browse/PIG-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806557#action_12806557 ] Daniel Dai commented on PIG-1210: - The test failure is due to java.lang.IllegalArgumentException: port out of range:-1. It should be a temporary failure. fieldsToRead send the same fields more than once in some cases -- Key: PIG-1210 URL: https://issues.apache.org/jira/browse/PIG-1210 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.6.0 Attachments: PIG-1210-1.patch, PIG-1210-2.patch This bug occurs when both of the following conditions are met: 1. The LoadFunc is susceptible to duplicated fields in fieldsToRead. The only such LoadFunc we know of so far is Zebra. 2. The first item in the FOREACH statement references the same input more than once. For example, the following script is affected: a = load '11' using org.apache.hadoop.zebra.pig.TableLoader('a0'); b = foreach a generate a0+a0;
[jira] Commented: (PIG-1210) fieldsToRead send the same fields more than once in some cases
[ https://issues.apache.org/jira/browse/PIG-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806610#action_12806610 ] Daniel Dai commented on PIG-1210: - A List is the data structure needed to construct RequiredFields. Yes, we could use a Set, but we would first need to check whether any of our code assumes the order within the list, since a Set loses that order. We can think about that in the new logical plan. fieldsToRead send the same fields more than once in some cases -- Key: PIG-1210 URL: https://issues.apache.org/jira/browse/PIG-1210 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.6.0 Attachments: PIG-1210-1.patch, PIG-1210-2.patch This bug occurs when both of the following conditions are met: 1. The LoadFunc is susceptible to duplicated fields in fieldsToRead. The only such LoadFunc we know of so far is Zebra. 2. The first item in the FOREACH statement references the same input more than once. For example, the following script is affected: a = load '11' using org.apache.hadoop.zebra.pig.TableLoader('a0'); b = foreach a generate a0+a0;
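The List-versus-Set trade-off in the comment above can be sketched in plain Java. This is a hypothetical helper, not the PIG-1210 patch and not Pig's RequiredFields code: it shows one way to drop duplicate column indexes while keeping first-seen order, using java.util.LinkedHashSet. The class and method names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical illustration, not Pig code.
class FieldDedup {

    // LinkedHashSet rejects duplicates but, unlike HashSet, iterates in
    // insertion order, so the first appearance of each index is kept.
    static List<Integer> dedupePreservingOrder(List<Integer> fields) {
        return new ArrayList<>(new LinkedHashSet<>(fields));
    }

    public static void main(String[] args) {
        // An expression like a0+a0 would request column 0 twice.
        List<Integer> requested = Arrays.asList(0, 0, 2, 1, 2);
        System.out.println(dedupePreservingOrder(requested)); // prints [0, 2, 1]
    }
}
```

Code that relies on positions in the original list (for example, index i of the list mapping to the i-th reference in the expression) would still break after deduplication, which is the check the comment says would be needed first.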
[jira] Updated: (PIG-1210) fieldsToRead send the same fields more than once in some cases
[ https://issues.apache.org/jira/browse/PIG-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1210: Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Patch committed to both trunk and the 0.6 branch. fieldsToRead send the same fields more than once in some cases -- Key: PIG-1210 URL: https://issues.apache.org/jira/browse/PIG-1210 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.6.0 Attachments: PIG-1210-1.patch, PIG-1210-2.patch This bug occurs when both of the following conditions are met: 1. The LoadFunc is susceptible to duplicated fields in fieldsToRead. The only such LoadFunc we know of so far is Zebra. 2. The first item in the FOREACH statement references the same input more than once. For example, the following script is affected: a = load '11' using org.apache.hadoop.zebra.pig.TableLoader('a0'); b = foreach a generate a0+a0;
[jira] Updated: (PIG-1212) LogicalPlan.replaceAndAddSucessors produce wrong result when successors are null
[ https://issues.apache.org/jira/browse/PIG-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1212: Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Patch committed. Thanks, Richard! LogicalPlan.replaceAndAddSucessors produce wrong result when successors are null Key: PIG-1212 URL: https://issues.apache.org/jira/browse/PIG-1212 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.7.0 Attachments: PIG-1212-1.patch, PIG-1212-2.patch The following script throws an NPE: a = load '1.txt' as (a0:chararray); b = load '2.txt' as (b0:chararray); c = join a by a0, b by b0; d = filter c by a0 == 'a'; explain d;
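The PIG-1212 patch itself is not shown in this thread. As a rough illustration of the failure mode in the issue title (a replace operation that misbehaves when a node has no successors), here is a toy plan graph in Java. ToyPlan and its methods are hypothetical and are not Pig's LogicalPlan; the sketch only assumes that a leaf node's successor lookup returns null.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not Pig code.
class ToyPlan {
    // node -> successors; leaf nodes have no entry, so lookups return null.
    private final Map<String, List<String>> edges = new HashMap<>();

    void connect(String from, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    List<String> getSuccessors(String node) {
        return edges.get(node); // null for a leaf
    }

    // Replace oldNode with newNode and hand oldNode's successors to newNode.
    void replaceAndAddSuccessors(String oldNode, String newNode) {
        List<String> succs = edges.remove(oldNode);
        if (succs == null) {
            // Treat "no successors" as an empty list instead of iterating null.
            succs = Collections.emptyList();
        }
        for (String s : succs) {
            connect(newNode, s);
        }
        // Re-point every predecessor of oldNode at newNode.
        for (List<String> targets : edges.values()) {
            targets.replaceAll(t -> t.equals(oldNode) ? newNode : t);
        }
    }

    public static void main(String[] args) {
        ToyPlan plan = new ToyPlan();
        plan.connect("join", "filter");                    // filter is a leaf
        plan.replaceAndAddSuccessors("filter", "filter2"); // must not throw
        System.out.println(plan.getSuccessors("join"));
    }
}
```

Without the null guard, the for loop over succs would throw a NullPointerException whenever the replaced node is the last operator in the plan, which is the shape of the filter in the script above.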
[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1090: Attachment: PIG-1090-17.patch Update sources to reflect recent changes in load-store interfaces - Key: PIG-1090 URL: https://issues.apache.org/jira/browse/PIG-1090 Project: Pig Issue Type: Sub-task Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, PIG-1090-13.patch, PIG-1090-14.patch, PIG-1090-15.patch, PIG-1090-16.patch, PIG-1090-17.patch, PIG-1090-2.patch, PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch There have been some changes to the load/store interfaces (recorded in the Changes section, Nov 2 2009 subsection, of http://wiki.apache.org/pig/LoadStoreRedesignProposal). This jira tracks the task of making those changes under src. Changes under test will be addressed in a different jira.
[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1090: Attachment: PIG-1090-17.patch Update sources to reflect recent changes in load-store interfaces - Key: PIG-1090 URL: https://issues.apache.org/jira/browse/PIG-1090 Project: Pig Issue Type: Sub-task Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, PIG-1090-13.patch, PIG-1090-14.patch, PIG-1090-15.patch, PIG-1090-16.patch, PIG-1090-17.patch, PIG-1090-18.patch, PIG-1090-2.patch, PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch There have been some changes to the load/store interfaces (recorded in the Changes section, Nov 2 2009 subsection, of http://wiki.apache.org/pig/LoadStoreRedesignProposal). This jira tracks the task of making those changes under src. Changes under test will be addressed in a different jira.
[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1090: Attachment: (was: PIG-1090-17.patch) Update sources to reflect recent changes in load-store interfaces - Key: PIG-1090 URL: https://issues.apache.org/jira/browse/PIG-1090 Project: Pig Issue Type: Sub-task Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, PIG-1090-13.patch, PIG-1090-14.patch, PIG-1090-15.patch, PIG-1090-16.patch, PIG-1090-17.patch, PIG-1090-18.patch, PIG-1090-2.patch, PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch There have been some changes to the load/store interfaces (recorded in the Changes section, Nov 2 2009 subsection, of http://wiki.apache.org/pig/LoadStoreRedesignProposal). This jira tracks the task of making those changes under src. Changes under test will be addressed in a different jira.
[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1090: Attachment: PIG-1090-17.patch Update sources to reflect recent changes in load-store interfaces - Key: PIG-1090 URL: https://issues.apache.org/jira/browse/PIG-1090 Project: Pig Issue Type: Sub-task Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, PIG-1090-13.patch, PIG-1090-14.patch, PIG-1090-15.patch, PIG-1090-16.patch, PIG-1090-17.patch, PIG-1090-18.patch, PIG-1090-2.patch, PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch There have been some changes to the load/store interfaces (recorded in the Changes section, Nov 2 2009 subsection, of http://wiki.apache.org/pig/LoadStoreRedesignProposal). This jira tracks the task of making those changes under src. Changes under test will be addressed in a different jira.
[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1090: Attachment: (was: PIG-1090-17.patch) Update sources to reflect recent changes in load-store interfaces - Key: PIG-1090 URL: https://issues.apache.org/jira/browse/PIG-1090 Project: Pig Issue Type: Sub-task Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, PIG-1090-13.patch, PIG-1090-14.patch, PIG-1090-15.patch, PIG-1090-16.patch, PIG-1090-17.patch, PIG-1090-18.patch, PIG-1090-2.patch, PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch There have been some changes to the load/store interfaces (recorded in the Changes section, Nov 2 2009 subsection, of http://wiki.apache.org/pig/LoadStoreRedesignProposal). This jira tracks the task of making those changes under src. Changes under test will be addressed in a different jira.
[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1090: Attachment: PIG-1090-17.patch Resubmitting PIG-1090-17.patch to address Pradeep's comments. Update sources to reflect recent changes in load-store interfaces - Key: PIG-1090 URL: https://issues.apache.org/jira/browse/PIG-1090 Project: Pig Issue Type: Sub-task Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, PIG-1090-13.patch, PIG-1090-14.patch, PIG-1090-15.patch, PIG-1090-16.patch, PIG-1090-17.patch, PIG-1090-18.patch, PIG-1090-2.patch, PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch There have been some changes to the load/store interfaces (recorded in the Changes section, Nov 2 2009 subsection, of http://wiki.apache.org/pig/LoadStoreRedesignProposal). This jira tracks the task of making those changes under src. Changes under test will be addressed in a different jira.
[jira] Created: (PIG-1210) fieldsToRead send the same fields more than once in some cases
fieldsToRead send the same fields more than once in some cases
    Key: PIG-1210
    URL: https://issues.apache.org/jira/browse/PIG-1210
    Project: Pig
    Issue Type: Bug
    Affects Versions: 0.6.0
    Reporter: Daniel Dai
    Assignee: Daniel Dai
    Fix For: 0.7.0
This bug occurs when both of the following conditions are met:
1. The LoadFunc is susceptible to duplicated fields in fieldsToRead. The only such LoadFunc we have noticed so far is Zebra.
2. The first item in the FOREACH statement references the same input more than once.
For example, the following script is affected:
    a = load '11' using org.apache.hadoop.zebra.pig.TableLoader('a0');
    b = foreach a generate a0+a0;
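The second condition can be illustrated with a small sketch. This is not Pig's code: the function name `dedupe_fields` and the representation of requested fields as (input, column) pairs are assumptions made purely for illustration. It shows how a request built from `a0+a0` can contain a repeated entry, and one order-preserving way to collapse it before it reaches a loader.

```python
def dedupe_fields(fields):
    """Return the fields with duplicates removed, keeping first-seen order."""
    # dict preserves insertion order, so the first occurrence of each field wins.
    return list(dict.fromkeys(fields))


# Hypothetical required-field list for `b = foreach a generate a0+a0;`:
# column 0 of input 0 is referenced twice.
requested = [(0, 0), (0, 0)]
unique = dedupe_fields(requested)
print(unique)  # [(0, 0)]
```

Keeping the original order matters in a sketch like this because a loader may rely on the position of each requested column.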
[jira] Updated: (PIG-1210) fieldsToRead send the same fields more than once in some cases
[ https://issues.apache.org/jira/browse/PIG-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1210:
    Attachment: PIG-1210-1.patch
fieldsToRead send the same fields more than once in some cases
    Key: PIG-1210
    URL: https://issues.apache.org/jira/browse/PIG-1210
    Project: Pig
    Issue Type: Bug
    Affects Versions: 0.6.0
    Reporter: Daniel Dai
    Assignee: Daniel Dai
    Fix For: 0.6.0
    Attachments: PIG-1210-1.patch
This bug occurs when both of the following conditions are met:
1. The LoadFunc is susceptible to duplicated fields in fieldsToRead. The only such LoadFunc we have noticed so far is Zebra.
2. The first item in the FOREACH statement references the same input more than once.
For example, the following script is affected:
    a = load '11' using org.apache.hadoop.zebra.pig.TableLoader('a0');
    b = foreach a generate a0+a0;
[jira] Updated: (PIG-1210) fieldsToRead send the same fields more than once in some cases
[ https://issues.apache.org/jira/browse/PIG-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1210:
    Fix Version/s: 0.6.0 (was: 0.7.0)
    Status: Patch Available (was: Open)
fieldsToRead send the same fields more than once in some cases
    Key: PIG-1210
    URL: https://issues.apache.org/jira/browse/PIG-1210
    Project: Pig
    Issue Type: Bug
    Affects Versions: 0.6.0
    Reporter: Daniel Dai
    Assignee: Daniel Dai
    Fix For: 0.6.0
    Attachments: PIG-1210-1.patch
This bug occurs when both of the following conditions are met:
1. The LoadFunc is susceptible to duplicated fields in fieldsToRead. The only such LoadFunc we have noticed so far is Zebra.
2. The first item in the FOREACH statement references the same input more than once.
For example, the following script is affected:
    a = load '11' using org.apache.hadoop.zebra.pig.TableLoader('a0');
    b = foreach a generate a0+a0;
[jira] Created: (PIG-1212) LogicalPlan.replaceAndAddSucessors produce wrong result when successors are null
LogicalPlan.replaceAndAddSucessors produce wrong result when successors are null
    Key: PIG-1212
    URL: https://issues.apache.org/jira/browse/PIG-1212
    Project: Pig
    Issue Type: Bug
    Components: impl
    Affects Versions: 0.6.0
    Reporter: Daniel Dai
    Assignee: Daniel Dai
    Fix For: 0.7.0
The following script through a NPE:
    a = load '1.txt' as (a0:chararray);
    b = load '2.txt' as (b0:chararray);
    c = join a by a0, b by b0;
    d = filter c by a0 == 'a';
    explain d;
[jira] Updated: (PIG-1212) LogicalPlan.replaceAndAddSucessors produce wrong result when successors are null
[ https://issues.apache.org/jira/browse/PIG-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1212:
    Description:
    The following script throw a NPE:
        a = load '1.txt' as (a0:chararray);
        b = load '2.txt' as (b0:chararray);
        c = join a by a0, b by b0;
        d = filter c by a0 == 'a';
        explain d;
    was:
    The following script through a NPE:
        a = load '1.txt' as (a0:chararray);
        b = load '2.txt' as (b0:chararray);
        c = join a by a0, b by b0;
        d = filter c by a0 == 'a';
        explain d;
LogicalPlan.replaceAndAddSucessors produce wrong result when successors are null
    Key: PIG-1212
    URL: https://issues.apache.org/jira/browse/PIG-1212
    Project: Pig
    Issue Type: Bug
    Components: impl
    Affects Versions: 0.6.0
    Reporter: Daniel Dai
    Assignee: Daniel Dai
    Fix For: 0.7.0
The following script throw a NPE:
    a = load '1.txt' as (a0:chararray);
    b = load '2.txt' as (b0:chararray);
    c = join a by a0, b by b0;
    d = filter c by a0 == 'a';
    explain d;
[jira] Created: (PIG-1203) Temporarily disable failed unit test in load-store-redesign branch which have external dependency
Temporarily disable failed unit test in load-store-redesign branch which have external dependency
    Key: PIG-1203
    URL: https://issues.apache.org/jira/browse/PIG-1203
    Project: Pig
    Issue Type: Bug
    Components: impl
    Affects Versions: 0.7.0
    Reporter: Daniel Dai
    Assignee: Daniel Dai
    Fix For: 0.7.0
In the load-store-redesign branch, two test suites, TestHBaseStorage and TestCounters, always fail. TestHBaseStorage depends on https://issues.apache.org/jira/browse/PIG-1200, and TestCounters depends on a future version of Hadoop. We disable these two test suites temporarily and will re-enable them once the dependent issues are resolved.
[jira] Updated: (PIG-1203) Temporarily disable failed unit test in load-store-redesign branch which have external dependency
[ https://issues.apache.org/jira/browse/PIG-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1203:
    Issue Type: Sub-task (was: Bug)
    Parent: PIG-966
Temporarily disable failed unit test in load-store-redesign branch which have external dependency
    Key: PIG-1203
    URL: https://issues.apache.org/jira/browse/PIG-1203
    Project: Pig
    Issue Type: Sub-task
    Components: impl
    Affects Versions: 0.7.0
    Reporter: Daniel Dai
    Assignee: Daniel Dai
    Fix For: 0.7.0
In the load-store-redesign branch, two test suites, TestHBaseStorage and TestCounters, always fail. TestHBaseStorage depends on https://issues.apache.org/jira/browse/PIG-1200, and TestCounters depends on a future version of Hadoop. We disable these two test suites temporarily and will re-enable them once the dependent issues are resolved.
[jira] Updated: (PIG-1203) Temporarily disable failed unit test in load-store-redesign branch which have external dependency
[ https://issues.apache.org/jira/browse/PIG-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1203:
    Attachment: PIG-1203-1.patch
Patch for the load-store-redesign branch
Temporarily disable failed unit test in load-store-redesign branch which have external dependency
    Key: PIG-1203
    URL: https://issues.apache.org/jira/browse/PIG-1203
    Project: Pig
    Issue Type: Sub-task
    Components: impl
    Affects Versions: 0.7.0
    Reporter: Daniel Dai
    Assignee: Daniel Dai
    Fix For: 0.7.0
    Attachments: PIG-1203-1.patch
In the load-store-redesign branch, two test suites, TestHBaseStorage and TestCounters, always fail. TestHBaseStorage depends on https://issues.apache.org/jira/browse/PIG-1200, and TestCounters depends on a future version of Hadoop. We disable these two test suites temporarily and will re-enable them once the dependent issues are resolved.
[jira] Updated: (PIG-613) Casting elements inside a tuple does not take effect
[ https://issues.apache.org/jira/browse/PIG-613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-613:
    Fix Version/s: 0.7.0
    Assignee: Daniel Dai
Casting elements inside a tuple does not take effect
    Key: PIG-613
    URL: https://issues.apache.org/jira/browse/PIG-613
    Project: Pig
    Issue Type: Bug
    Components: impl
    Affects Versions: 0.2.0
    Reporter: Viraj Bhat
    Assignee: Daniel Dai
    Fix For: 0.7.0
    Attachments: myfloatdata.txt, SQUARE.java
Consider the following Pig script which casts return values of the SQUARE UDF which are tuples of doubles to long. The describe output of B shows it is long, however the result is still double.
{code}
register statistics.jar;
A = load 'myfloatdata.txt' using PigStorage() as (doublecol:double);
B = foreach A generate (tuple(long))statistics.SQUARE(doublecol) as squares:(loadtimesq);
describe B;
explain B;
dump B;
{code}
===
Describe output of B:
B: {squares: (loadtimesq: long)}
===
Sample output of B:
((7885.44))
((792098.2200010001))
((1497360.9268889998))
((50023.7956))
((0.972196))
((0.30980356))
((9.9760144E-7))
===
Cause: The cast for Tuples has not been implemented in POCast.java
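The behavior the report expects can be sketched outside Pig. The snippet below is only an illustration, not the POCast implementation: `cast_tuple_to_long` is a hypothetical helper, and it assumes that a double-to-long cast truncates toward zero, as a Java narrowing conversion does for values within range. It applies the cast to each element of a tuple rather than to the tuple as a whole.

```python
def cast_tuple_to_long(t):
    """Cast every element of a tuple to an integer, truncating toward zero."""
    # int() truncates toward zero; used here as a stand-in for a
    # double-to-long cast (an assumption about the intended semantics).
    return tuple(int(x) for x in t)


# A few of the sample values from the report, as single-element tuples.
sample = [(7885.44,), (792098.2200010001,), (0.972196,), (9.9760144e-7,)]
for row in sample:
    print(cast_tuple_to_long(row))
```

Under that assumption the first sample tuple would come out as (7885,) instead of (7885.44,).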
[jira] Updated: (PIG-1184) PruneColumns optimization does not handle the case of foreach flatten correctly if flattened bag is not used later
[ https://issues.apache.org/jira/browse/PIG-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1184:
    Resolution: Fixed
    Hadoop Flags: [Reviewed]
    Status: Resolved (was: Patch Available)
Patch committed.
PruneColumns optimization does not handle the case of foreach flatten correctly if flattened bag is not used later
    Key: PIG-1184
    URL: https://issues.apache.org/jira/browse/PIG-1184
    Project: Pig
    Issue Type: Bug
    Affects Versions: 0.6.0
    Reporter: Pradeep Kamath
    Assignee: Daniel Dai
    Fix For: 0.7.0
    Attachments: PIG-1184-1.patch, PIG-1184-2.patch
The following script:
{noformat}
-e a = load 'input.txt' as (f1:chararray, f2:chararray, f3:bag{t:tuple(id:chararray)}, f4:bag{t:tuple(loc:chararray)});
b = foreach a generate f1, f2, flatten(f3), flatten(f4), 10;
b = foreach b generate f1, f2, \$4;
dump b;
{noformat}
gives the following result:
(oiue,M,10)
{noformat}
cat input.txt:
oiue M {(3),(4)} {(toronto),(montreal)}
{noformat}
If the PruneColumns optimization is disabled, we get the right result:
(oiue,M,10)
(oiue,M,10)
(oiue,M,10)
(oiue,M,10)
The flatten results in 4 records, so the output should contain 4 records.
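The expected count of 4 follows from the way two flattened bags in one generate expand into their cross product. A minimal sketch of that arithmetic, assuming the input row has f1 = oiue and f2 = M as the reported output suggests; `flatten_row` is a hypothetical helper and does not reflect Pig's internals.

```python
from itertools import product


def flatten_row(f1, f2, f3, f4, const):
    """Expand two bags into their cross product, one output row per pair."""
    return [(f1, f2, a[0], b[0], const) for a, b in product(f3, f4)]


# The single input row from the report: two bags with two tuples each.
row = ("oiue", "M", [("3",), ("4",)], [("toronto",), ("montreal",)])
b = flatten_row(*row, 10)

# Second foreach: keep f1, f2 and the constant, drop the flattened columns.
projected = [(r[0], r[1], r[4]) for r in b]
print(len(projected))  # 4
print(projected[0])    # ('oiue', 'M', 10)
```

Dropping the flattened columns afterwards must not change the row count, which is why removing the flatten inputs during pruning produces a single record instead of four.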
[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1090:
    Attachment: PIG-1090-15.patch
Update sources to reflect recent changes in load-store interfaces
    Key: PIG-1090
    URL: https://issues.apache.org/jira/browse/PIG-1090
    Project: Pig
    Issue Type: Sub-task
    Reporter: Pradeep Kamath
    Assignee: Pradeep Kamath
    Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, PIG-1090-13.patch, PIG-1090-14.patch, PIG-1090-15.patch, PIG-1090-2.patch, PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch
There have been some changes to the load/store interfaces (recorded in the Changes section, Nov 2 2009 subsection, of http://wiki.apache.org/pig/LoadStoreRedesignProposal). This JIRA tracks the task of making those changes under src. Changes under test will be addressed in a different JIRA.
[jira] Updated: (PIG-1189) StoreFunc UDF should ship to the backend automatically without register
[ https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1189:
    Resolution: Fixed
    Hadoop Flags: [Reviewed]
    Status: Resolved (was: Patch Available)
Patch committed.
StoreFunc UDF should ship to the backend automatically without register
    Key: PIG-1189
    URL: https://issues.apache.org/jira/browse/PIG-1189
    Project: Pig
    Issue Type: Bug
    Components: impl
    Affects Versions: 0.6.0
    Reporter: Daniel Dai
    Assignee: Daniel Dai
    Fix For: 0.7.0
    Attachments: multimapstore.pig, multireducestore.pig, PIG-1189-1.patch, PIG-1189-2.patch, PIG-1189-3.patch, singlemapstore.pig, singlereducestore.pig
Pig should ship the store UDF to the backend even if the user does not use register. The prerequisite is that the UDF is on the classpath on the frontend. We made that work for load UDFs in [PIG-881|https://issues.apache.org/jira/browse/PIG-881]; we should do the same for store UDFs.
[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1090:
    Attachment: (was: PIG-1090-15.patch)
Update sources to reflect recent changes in load-store interfaces
    Key: PIG-1090
    URL: https://issues.apache.org/jira/browse/PIG-1090
    Project: Pig
    Issue Type: Sub-task
    Reporter: Pradeep Kamath
    Assignee: Pradeep Kamath
    Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, PIG-1090-13.patch, PIG-1090-14.patch, PIG-1090-15.patch, PIG-1090-2.patch, PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch
There have been some changes to the load/store interfaces (recorded in the Changes section, Nov 2 2009 subsection, of http://wiki.apache.org/pig/LoadStoreRedesignProposal). This JIRA tracks the task of making those changes under src. Changes under test will be addressed in a different JIRA.
[jira] Updated: (PIG-1189) StoreFunc UDF should ship to the backend automatically without register
[ https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1189:
    Attachment: PIG-1189-2.patch
Attached a patch to address the unit test failures. The failures occur because we now add the StoreFunc UDF to the udf list in MapReduceOper, which changes the return value of MapReduceOper.name(). Instead of fixing the golden file, I changed the way we generate POStore in TestMRCompiler, because it was inconsistent with the way we generate POLoad. I also changed MapReduceOper.udf from a List to a Set, which I feel is more appropriate.
StoreFunc UDF should ship to the backend automatically without register
    Key: PIG-1189
    URL: https://issues.apache.org/jira/browse/PIG-1189
    Project: Pig
    Issue Type: Bug
    Components: impl
    Affects Versions: 0.6.0
    Reporter: Daniel Dai
    Assignee: Daniel Dai
    Fix For: 0.7.0
    Attachments: multimapstore.pig, multireducestore.pig, PIG-1189-1.patch, PIG-1189-2.patch, singlemapstore.pig, singlereducestore.pig
Pig should ship the store UDF to the backend even if the user does not use register. The prerequisite is that the UDF is on the classpath on the frontend. We made that work for load UDFs in [PIG-881|https://issues.apache.org/jira/browse/PIG-881]; we should do the same for store UDFs.
[jira] Updated: (PIG-1189) StoreFunc UDF should ship to the backend automatically without register
[ https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1189:
    Status: Open (was: Patch Available)
StoreFunc UDF should ship to the backend automatically without register
    Key: PIG-1189
    URL: https://issues.apache.org/jira/browse/PIG-1189
    Project: Pig
    Issue Type: Bug
    Components: impl
    Affects Versions: 0.6.0
    Reporter: Daniel Dai
    Assignee: Daniel Dai
    Fix For: 0.7.0
    Attachments: multimapstore.pig, multireducestore.pig, PIG-1189-1.patch, PIG-1189-2.patch, singlemapstore.pig, singlereducestore.pig
Pig should ship the store UDF to the backend even if the user does not use register. The prerequisite is that the UDF is on the classpath on the frontend. We made that work for load UDFs in [PIG-881|https://issues.apache.org/jira/browse/PIG-881]; we should do the same for store UDFs.
[jira] Updated: (PIG-1189) StoreFunc UDF should ship to the backend automatically without register
[ https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1189:
    Status: Patch Available (was: Open)
StoreFunc UDF should ship to the backend automatically without register
    Key: PIG-1189
    URL: https://issues.apache.org/jira/browse/PIG-1189
    Project: Pig
    Issue Type: Bug
    Components: impl
    Affects Versions: 0.6.0
    Reporter: Daniel Dai
    Assignee: Daniel Dai
    Fix For: 0.7.0
    Attachments: multimapstore.pig, multireducestore.pig, PIG-1189-1.patch, PIG-1189-2.patch, singlemapstore.pig, singlereducestore.pig
Pig should ship the store UDF to the backend even if the user does not use register. The prerequisite is that the UDF is on the classpath on the frontend. We made that work for load UDFs in [PIG-881|https://issues.apache.org/jira/browse/PIG-881]; we should do the same for store UDFs.
[jira] Updated: (PIG-1184) PruneColumns optimization does not handle the case of foreach flatten correctly if flattened bag is not used later
[ https://issues.apache.org/jira/browse/PIG-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1184:
    Hadoop Flags: (was: [Reviewed])
    Status: Patch Available (was: Reopened)
PruneColumns optimization does not handle the case of foreach flatten correctly if flattened bag is not used later
    Key: PIG-1184
    URL: https://issues.apache.org/jira/browse/PIG-1184
    Project: Pig
    Issue Type: Bug
    Affects Versions: 0.6.0
    Reporter: Pradeep Kamath
    Assignee: Daniel Dai
    Fix For: 0.7.0
    Attachments: PIG-1184-1.patch, PIG-1184-2.patch
The following script:
{noformat}
-e a = load 'input.txt' as (f1:chararray, f2:chararray, f3:bag{t:tuple(id:chararray)}, f4:bag{t:tuple(loc:chararray)});
b = foreach a generate f1, f2, flatten(f3), flatten(f4), 10;
b = foreach b generate f1, f2, \$4;
dump b;
{noformat}
gives the following result:
(oiue,M,10)
{noformat}
cat input.txt:
oiue M {(3),(4)} {(toronto),(montreal)}
{noformat}
If the PruneColumns optimization is disabled, we get the right result:
(oiue,M,10)
(oiue,M,10)
(oiue,M,10)
(oiue,M,10)
The flatten results in 4 records, so the output should contain 4 records.
[jira] Updated: (PIG-1184) PruneColumns optimization does not handle the case of foreach flatten correctly if flattened bag is not used later
[ https://issues.apache.org/jira/browse/PIG-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1184:
    Attachment: PIG-1184-2.patch
Fix unit test failures
PruneColumns optimization does not handle the case of foreach flatten correctly if flattened bag is not used later
    Key: PIG-1184
    URL: https://issues.apache.org/jira/browse/PIG-1184
    Project: Pig
    Issue Type: Bug
    Affects Versions: 0.6.0
    Reporter: Pradeep Kamath
    Assignee: Daniel Dai
    Fix For: 0.7.0
    Attachments: PIG-1184-1.patch, PIG-1184-2.patch
The following script:
{noformat}
-e a = load 'input.txt' as (f1:chararray, f2:chararray, f3:bag{t:tuple(id:chararray)}, f4:bag{t:tuple(loc:chararray)});
b = foreach a generate f1, f2, flatten(f3), flatten(f4), 10;
b = foreach b generate f1, f2, \$4;
dump b;
{noformat}
gives the following result:
(oiue,M,10)
{noformat}
cat input.txt:
oiue M {(3),(4)} {(toronto),(montreal)}
{noformat}
If the PruneColumns optimization is disabled, we get the right result:
(oiue,M,10)
(oiue,M,10)
(oiue,M,10)
(oiue,M,10)
The flatten results in 4 records, so the output should contain 4 records.
[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1090:
    Attachment: PIG-1090-15.patch
Update sources to reflect recent changes in load-store interfaces
    Key: PIG-1090
    URL: https://issues.apache.org/jira/browse/PIG-1090
    Project: Pig
    Issue Type: Sub-task
    Reporter: Pradeep Kamath
    Assignee: Pradeep Kamath
    Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, PIG-1090-13.patch, PIG-1090-14.patch, PIG-1090-15.patch, PIG-1090-2.patch, PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch
There have been some changes to the load/store interfaces (recorded in the Changes section, Nov 2 2009 subsection, of http://wiki.apache.org/pig/LoadStoreRedesignProposal). This JIRA tracks the task of making those changes under src. Changes under test will be addressed in a different JIRA.
[jira] Commented: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804378#action_12804378 ]
Daniel Dai commented on PIG-1090:
PIG-1090-15.patch contains the store-side changes for StoreMetadata.storeSchema.
Update sources to reflect recent changes in load-store interfaces
    Key: PIG-1090
    URL: https://issues.apache.org/jira/browse/PIG-1090
    Project: Pig
    Issue Type: Sub-task
    Reporter: Pradeep Kamath
    Assignee: Pradeep Kamath
    Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, PIG-1090-13.patch, PIG-1090-14.patch, PIG-1090-15.patch, PIG-1090-2.patch, PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch
There have been some changes to the load/store interfaces (recorded in the Changes section, Nov 2 2009 subsection, of http://wiki.apache.org/pig/LoadStoreRedesignProposal). This JIRA tracks the task of making those changes under src. Changes under test will be addressed in a different JIRA.
[jira] Updated: (PIG-1189) StoreFunc UDF should ship to the backend automatically without register
[ https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1189:
    Attachment: PIG-1189-3.patch
Address javac warnings.
StoreFunc UDF should ship to the backend automatically without register
    Key: PIG-1189
    URL: https://issues.apache.org/jira/browse/PIG-1189
    Project: Pig
    Issue Type: Bug
    Components: impl
    Affects Versions: 0.6.0
    Reporter: Daniel Dai
    Assignee: Daniel Dai
    Fix For: 0.7.0
    Attachments: multimapstore.pig, multireducestore.pig, PIG-1189-1.patch, PIG-1189-2.patch, PIG-1189-3.patch, singlemapstore.pig, singlereducestore.pig
Pig should ship the store UDF to the backend even if the user does not use register. The prerequisite is that the UDF is on the classpath on the frontend. We made that work for load UDFs in [PIG-881|https://issues.apache.org/jira/browse/PIG-881]; we should do the same for store UDFs.
[jira] Updated: (PIG-1189) StoreFunc UDF should ship to the backend automatically without register
[ https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1189:
    Status: Open (was: Patch Available)
StoreFunc UDF should ship to the backend automatically without register
    Key: PIG-1189
    URL: https://issues.apache.org/jira/browse/PIG-1189
    Project: Pig
    Issue Type: Bug
    Components: impl
    Affects Versions: 0.6.0
    Reporter: Daniel Dai
    Assignee: Daniel Dai
    Fix For: 0.7.0
    Attachments: multimapstore.pig, multireducestore.pig, PIG-1189-1.patch, PIG-1189-2.patch, PIG-1189-3.patch, singlemapstore.pig, singlereducestore.pig
Pig should ship the store UDF to the backend even if the user does not use register. The prerequisite is that the UDF is on the classpath on the frontend. We made that work for load UDFs in [PIG-881|https://issues.apache.org/jira/browse/PIG-881]; we should do the same for store UDFs.
[jira] Updated: (PIG-1189) StoreFunc UDF should ship to the backend automatically without register
[ https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1189:
    Status: Patch Available (was: Open)
StoreFunc UDF should ship to the backend automatically without register
    Key: PIG-1189
    URL: https://issues.apache.org/jira/browse/PIG-1189
    Project: Pig
    Issue Type: Bug
    Components: impl
    Affects Versions: 0.6.0
    Reporter: Daniel Dai
    Assignee: Daniel Dai
    Fix For: 0.7.0
    Attachments: multimapstore.pig, multireducestore.pig, PIG-1189-1.patch, PIG-1189-2.patch, PIG-1189-3.patch, singlemapstore.pig, singlereducestore.pig
Pig should ship the store UDF to the backend even if the user does not use register. The prerequisite is that the UDF is on the classpath on the frontend. We made that work for load UDFs in [PIG-881|https://issues.apache.org/jira/browse/PIG-881]; we should do the same for store UDFs.
[jira] Updated: (PIG-1189) StoreFunc UDF should ship to the backend automatically without register
[ https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1189:
    Attachment: (was: PIG-1189-1.patch)
StoreFunc UDF should ship to the backend automatically without register
    Key: PIG-1189
    URL: https://issues.apache.org/jira/browse/PIG-1189
    Project: Pig
    Issue Type: Bug
    Components: impl
    Affects Versions: 0.6.0
    Reporter: Daniel Dai
    Assignee: Daniel Dai
    Fix For: 0.7.0
    Attachments: multimapstore.pig, multireducestore.pig, singlemapstore.pig, singlereducestore.pig
Pig should ship the store UDF to the backend even if the user does not use register. The prerequisite is that the UDF is on the classpath on the frontend. We made that work for load UDFs in [PIG-881|https://issues.apache.org/jira/browse/PIG-881]; we should do the same for store UDFs.
[jira] Updated: (PIG-1189) StoreFunc UDF should ship to the backend automatically without register
[ https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1189:
    Attachment: PIG-1189-1.patch
Hudson applied a *.pig file as the patch. Reattaching to wake up Hudson with the right patch file.
StoreFunc UDF should ship to the backend automatically without register
    Key: PIG-1189
    URL: https://issues.apache.org/jira/browse/PIG-1189
    Project: Pig
    Issue Type: Bug
    Components: impl
    Affects Versions: 0.6.0
    Reporter: Daniel Dai
    Assignee: Daniel Dai
    Fix For: 0.7.0
    Attachments: multimapstore.pig, multireducestore.pig, PIG-1189-1.patch, singlemapstore.pig, singlereducestore.pig
Pig should ship the store UDF to the backend even if the user does not use register. The prerequisite is that the UDF is on the classpath on the frontend. We made that work for load UDFs in [PIG-881|https://issues.apache.org/jira/browse/PIG-881]; we should do the same for store UDFs.
[jira] Updated: (PIG-1189) StoreFunc UDF should ship to the backend automatically without register
[ https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1189:
    Status: Open (was: Patch Available)
StoreFunc UDF should ship to the backend automatically without register
    Key: PIG-1189
    URL: https://issues.apache.org/jira/browse/PIG-1189
    Project: Pig
    Issue Type: Bug
    Components: impl
    Affects Versions: 0.6.0
    Reporter: Daniel Dai
    Assignee: Daniel Dai
    Fix For: 0.7.0
    Attachments: multimapstore.pig, multireducestore.pig, PIG-1189-1.patch, singlemapstore.pig, singlereducestore.pig
Pig should ship the store UDF to the backend even if the user does not use register. The prerequisite is that the UDF is on the classpath on the frontend. We made that work for load UDFs in [PIG-881|https://issues.apache.org/jira/browse/PIG-881]; we should do the same for store UDFs.
[jira] Updated: (PIG-1189) StoreFunc UDF should ship to the backend automatically without register
[ https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1189:
    Status: Patch Available (was: Open)
StoreFunc UDF should ship to the backend automatically without register
    Key: PIG-1189
    URL: https://issues.apache.org/jira/browse/PIG-1189
    Project: Pig
    Issue Type: Bug
    Components: impl
    Affects Versions: 0.6.0
    Reporter: Daniel Dai
    Assignee: Daniel Dai
    Fix For: 0.7.0
    Attachments: multimapstore.pig, multireducestore.pig, PIG-1189-1.patch, singlemapstore.pig, singlereducestore.pig
Pig should ship the store UDF to the backend even if the user does not use register. The prerequisite is that the UDF is on the classpath on the frontend. We made that work for load UDFs in [PIG-881|https://issues.apache.org/jira/browse/PIG-881]; we should do the same for store UDFs.
[jira] Reopened: (PIG-1184) PruneColumns optimization does not handle the case of foreach flatten correctly if flattened bag is not used later
[ https://issues.apache.org/jira/browse/PIG-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai reopened PIG-1184:
Still need to address core test failures.
PruneColumns optimization does not handle the case of foreach flatten correctly if flattened bag is not used later
    Key: PIG-1184
    URL: https://issues.apache.org/jira/browse/PIG-1184
    Project: Pig
    Issue Type: Bug
    Affects Versions: 0.6.0
    Reporter: Pradeep Kamath
    Assignee: Daniel Dai
    Fix For: 0.7.0
    Attachments: PIG-1184-1.patch
The following script:
{noformat}
-e a = load 'input.txt' as (f1:chararray, f2:chararray, f3:bag{t:tuple(id:chararray)}, f4:bag{t:tuple(loc:chararray)});
b = foreach a generate f1, f2, flatten(f3), flatten(f4), 10;
b = foreach b generate f1, f2, \$4;
dump b;
{noformat}
gives the following result:
(oiue,M,10)
{noformat}
cat input.txt:
oiue M {(3),(4)} {(toronto),(montreal)}
{noformat}
If the PruneColumns optimization is disabled, we get the right result:
(oiue,M,10)
(oiue,M,10)
(oiue,M,10)
(oiue,M,10)
The flatten results in 4 records, so the output should contain 4 records.