[jira] Closed: (PIG-1293) pig wrapper script tends to fail if pig is in the path and PIG_HOME isn't set
[ https://issues.apache.org/jira/browse/PIG-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1293.
---------------------------

> pig wrapper script tends to fail if pig is in the path and PIG_HOME isn't set
> -----------------------------------------------------------------------------
>
>                 Key: PIG-1293
>                 URL: https://issues.apache.org/jira/browse/PIG-1293
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.6.0
>            Reporter: Allen Wittenauer
>            Assignee: Allen Wittenauer
>             Fix For: 0.7.0
>
>         Attachments: PIG-1293.txt
>
>
> If PIG_HOME isn't set and pig is in the path, the pig wrapper script can't
> find its home. Setting PIG_HOME makes it hard to support multiple versions
> of pig.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1284) pig UDF is lacking XMLLoader. Plan to add the XMLLoader
[ https://issues.apache.org/jira/browse/PIG-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1284.
---------------------------

> pig UDF is lacking XMLLoader. Plan to add the XMLLoader
> -------------------------------------------------------
>
>                 Key: PIG-1284
>                 URL: https://issues.apache.org/jira/browse/PIG-1284
>             Project: Pig
>          Issue Type: New Feature
>    Affects Versions: 0.7.0
>            Reporter: Alok Singh
>            Assignee: Alok Singh
>             Fix For: 0.7.0
>
>         Attachments: pigudf_xmlLoader.patch, pigudf_xmlLoader.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Hi All,
> We are planning to add the XMLLoader UDF in the piggybank repository.
> Here is the proposal with the user docs:
> The load function to load the XML file.
> This will implement the LoadFunc interface, which is used to parse records
> from a dataset. It takes an xmlTag as the argument, which it uses to split
> the input dataset into multiple records.
> For example, suppose the input xml (input.xml) contains one <property>
> element whose content spans the lines "foobar" and "barfoo", an element
> with some other tag containing "foo", and a second <property> element
> containing "justname" (the original markup was stripped from this message).
> And your pig script is like this:
> --load the jar files
> register loader.jar;
> -- load the dataset using XMLLoader
> -- A is the bag containing the tuple which contains one atom i.e. doc, see output
> A = load '/user/aloks/pig/input.xml' using loader.XMLLoader('property') as
> (doc:chararray);
> --dump the result
> dump A;
> Then you will get the output
> (
> foobar
> barfoo
> )
> (
> justname
> )
> where each () indicates one record.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1291) [zebra] Zebra needs to support the virtual column 'source_table' for unsorted table unions as well
[ https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1291.
---------------------------

> [zebra] Zebra needs to support the virtual column 'source_table' for
> unsorted table unions as well
> --------------------------------------------------------------------
>
>                 Key: PIG-1291
>                 URL: https://issues.apache.org/jira/browse/PIG-1291
>             Project: Pig
>          Issue Type: New Feature
>    Affects Versions: 0.7.0, 0.8.0
>            Reporter: Alok Singh
>            Assignee: Yan Zhou
>             Fix For: 0.7.0, 0.8.0
>
>         Attachments: PIG-1291.patch, PIG-1291.patch, PIG-1291.patch
>
>
> In the Pig contrib project Zebra, when users do a union of sorted tables,
> the resulting table contains a virtual column called 'source_table', which
> lets users know which original table each row of the result table came from.
> This feature is also very useful when the input tables are not sorted.
> Based on the discussion with the Zebra dev team, it should be easy to
> implement. I am filing this enhancement jira for Zebra.
> Alok

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
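A hedged sketch of how the virtual column is consumed today for sorted unions; the table paths, column name c1, and the two-argument TableLoader('projection', 'sorted') form are assumptions for illustration. This issue asks for the same 'source_table' projection to work when the union is unsorted:

{code}
-- union of two Zebra tables; project a data column plus the virtual
-- 'source_table' column, which carries the index of the source table
-- each row came from
A = LOAD 'table1,table2' USING org.apache.hadoop.zebra.pig.TableLoader('c1, source_table', 'sorted');
B = FOREACH A GENERATE c1, source_table;
dump B;
{code}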
[jira] Closed: (PIG-1292) Interface Refinements
[ https://issues.apache.org/jira/browse/PIG-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1292.
---------------------------

> Interface Refinements
> ---------------------
>
>                 Key: PIG-1292
>                 URL: https://issues.apache.org/jira/browse/PIG-1292
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>             Fix For: 0.7.0
>
>         Attachments: pig-1292.patch, pig-interfaces.patch
>
>
> A loader can't implement both OrderedLoadFunc and IndexableLoadFunc, as both
> are abstract classes instead of being interfaces.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1276) [Zebra] Changes required for Zebra due to PIG-1259 changes
[ https://issues.apache.org/jira/browse/PIG-1276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1276.
---------------------------

> [Zebra] Changes required for Zebra due to PIG-1259 changes
> ----------------------------------------------------------
>
>                 Key: PIG-1276
>                 URL: https://issues.apache.org/jira/browse/PIG-1276
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Xuefu Zhang
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: zebra.0304
>
>
> The Pig resource schema interface changed, so Zebra needs to catch
> exceptions thrown from the new interfaces.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1282) [zebra] make Zebra's pig test cases run on real cluster
[ https://issues.apache.org/jira/browse/PIG-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1282.
---------------------------

> [zebra] make Zebra's pig test cases run on real cluster
> -------------------------------------------------------
>
>                 Key: PIG-1282
>                 URL: https://issues.apache.org/jira/browse/PIG-1282
>             Project: Pig
>          Issue Type: Task
>    Affects Versions: 0.6.0
>            Reporter: Chao Wang
>            Assignee: Chao Wang
>             Fix For: 0.7.0
>
>         Attachments: PIG-1282.patch
>
>
> The goal of this task is to make Zebra's pig test cases run on a real
> cluster. Currently Zebra's pig test cases are mostly tested using
> MiniCluster. We want to use a real hadoop cluster to test them.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1287) Use hadoop-0.20.2 with pig 0.7.0 release
[ https://issues.apache.org/jira/browse/PIG-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1287.
---------------------------

> Use hadoop-0.20.2 with pig 0.7.0 release
> ----------------------------------------
>
>                 Key: PIG-1287
>                 URL: https://issues.apache.org/jira/browse/PIG-1287
>             Project: Pig
>          Issue Type: Task
>    Affects Versions: 0.7.0
>            Reporter: Pradeep Kamath
>            Assignee: Pradeep Kamath
>             Fix For: 0.7.0
>
>         Attachments: hadoop20.jar, PIG-1287-2.patch, PIG-1287.patch
>
>
> Use hadoop-0.20.2 with pig 0.7.0 release

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1272) Column pruner causes wrong results
[ https://issues.apache.org/jira/browse/PIG-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1272.
---------------------------

> Column pruner causes wrong results
> ----------------------------------
>
>                 Key: PIG-1272
>                 URL: https://issues.apache.org/jira/browse/PIG-1272
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.6.0
>            Reporter: Viraj Bhat
>            Assignee: Daniel Dai
>             Fix For: 0.7.0
>
>         Attachments: PIG-1272-1.patch, PIG-1272-2.patch
>
>
> For a simple script the column pruner optimization removes certain columns
> from the original relation, which results in wrong results.
> Input file "kv" contains the following columns (tab separated)
> {code}
> a 1
> a 2
> a 3
> b 4
> c 5
> c 6
> b 7
> d 8
> {code}
> Now running this script in Pig 0.6 produces
> {code}
> kv = load 'kv' as (k,v);
> keys= foreach kv generate k;
> keys = distinct keys;
> keys = limit keys 2;
> rejoin = join keys by k, kv by k;
> dump rejoin;
> {code}
> (a,a)
> (a,a)
> (a,a)
> (b,b)
> (b,b)
> Running this in the Pig 0.5 version without the column pruner results in:
> (a,a,1)
> (a,a,2)
> (a,a,3)
> (b,b,4)
> (b,b,7)
> When we disable the "ColumnPruner" optimization it gives right results.
> Viraj

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1267) Problems with partition filter optimizer
[ https://issues.apache.org/jira/browse/PIG-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1267.
---------------------------

> Problems with partition filter optimizer
> ----------------------------------------
>
>                 Key: PIG-1267
>                 URL: https://issues.apache.org/jira/browse/PIG-1267
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Richard Ding
>            Assignee: Richard Ding
>             Fix For: 0.7.0
>
>         Attachments: PIG-1267.patch
>
>
> There are a couple of problems with the current partition filter optimizer:
> 1. When a partition filter is removed from the logical plan, the input index
> of the following join/cogroup operator may change, which in turn changes the
> ordering of the fields in the schema and results in compile-time errors.
> 2. At most one partition filter can be removed per plan, while multiple
> partition filters can exist in the cases of joins.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
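A minimal sketch of the second problem; the loader name MyPartitionedLoader, the partition column datestamp, and the file names are hypothetical. Two pushed-down partition filters feed a join, but only one filter could be removed per plan, and removing it shifted the join's input index:

{code}
-- MyPartitionedLoader is a hypothetical loader implementing LoadMetadata,
-- so filters on the partition column 'datestamp' can be pushed into it
a  = LOAD '/data/logs1' USING MyPartitionedLoader();
b  = LOAD '/data/logs2' USING MyPartitionedLoader();
a1 = FILTER a BY datestamp == '20100301';  -- partition filter 1
b1 = FILTER b BY datestamp == '20100301';  -- partition filter 2
j  = JOIN a1 BY id, b1 BY id;  -- removing a filter changes the join's input index
dump j;
{code}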
[jira] Closed: (PIG-1263) Script producing varying number of records when COGROUPing value of map data type with and without types
[ https://issues.apache.org/jira/browse/PIG-1263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1263.
---------------------------

> Script producing varying number of records when COGROUPing value of map data
> type with and without types
> -----------------------------------------------------------------------------
>
>                 Key: PIG-1263
>                 URL: https://issues.apache.org/jira/browse/PIG-1263
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.6.0
>            Reporter: Viraj Bhat
>            Assignee: Daniel Dai
>             Fix For: 0.7.0
>
>
> I have a Pig script which I am experimenting upon. [[Albeit this is not
> optimized and can be done in variety of ways]] I get different record counts
> by placing load store pairs in the script.
> Case 1: Returns 424329 records
> Case 2: Returns 5859 records
> Case 3: Returns 5859 records
> Case 4: Returns 5578 records
> I am wondering what the correct result is?
> Here are the scripts.
> Case 1:
> {code}
> register udf.jar
> A = LOAD '/user/viraj/data/20100203' USING MapLoader() AS (s, m, l);
> B = FOREACH A GENERATE
> s#'key1' as key1,
> s#'key2' as key2;
> C = FOREACH B generate key2;
> D = filter C by (key2 IS NOT null);
> E = distinct D;
> store E into 'unique_key_list' using PigStorage('\u0001');
> F = Foreach E generate key2, MapGenerate(key2) as m;
> G = FILTER F by (m IS NOT null);
> H = foreach G generate key2, m#'id1' as id1, m#'id2' as id2, m#'id3' as id3,
> m#'id4' as id4, m#'id5' as id5, m#'id6' as id6, m#'id7' as id7, m#'id8' as
> id8, m#'id9' as id9, m#'id10' as id10, m#'id11' as id11, m#'id12' as id12;
> I = GROUP H BY (id1, id2, id3, id4, id5, id6, id7, id8, id9, id10, id11,
> id12);
> J = Foreach I generate group.id1 as id1, group.id2 as id2, group.id3 as id3,
> group.id4 as id4, group.id5 as id5, group.id6 as id6, group.id7 as id7,
> group.id8 as id8, group.id9 as id9, group.id10 as id10, group.id11 as id11,
> group.id12 as id12;
> --load previous days data
> K = LOAD '/user/viraj/data/20100202' USING PigStorage('\u0001') as (id1, id2,
> id3, id4, id5, id6, id7, id8, id9, id10, id11, id12);
> L = COGROUP K by (id1, id2, id3, id4, id5, id6, id7, id8, id9, id10, id11,
> id12) OUTER,
>     J by (id1, id2, id3, id4, id5, id6, id7, id8, id9, id10, id11,
> id12) OUTER;
> M = filter L by IsEmpty(K);
> store M into 'cogroupNoTypes' using PigStorage();
> {code}
> Case 2: Storing and loading intermediate results in J
> {code}
> register udf.jar
> A = LOAD '/user/viraj/data/20100203' USING MapLoader() AS (s, m, l);
> B = FOREACH A GENERATE
> s#'key1' as key1,
> s#'key2' as key2;
> C = FOREACH B generate key2;
> D = filter C by (key2 IS NOT null);
> E = distinct D;
> store E into 'unique_key_list' using PigStorage('\u0001');
> F = Foreach E generate key2, MapGenerate(key2) as m;
> G = FILTER F by (m IS NOT null);
> H = foreach G generate key2, m#'id1' as id1, m#'id2' as id2, m#'id3' as id3,
> m#'id4' as id4, m#'id5' as id5, m#'id6' as id6, m#'id7' as id7, m#'id8' as
> id8, m#'id9' as id9, m#'id10' as id10, m#'id11' as id11, m#'id12' as id12;
> I = GROUP H BY (id1, id2, id3, id4, id5, id6, id7, id8, id9, id10, id11,
> id12);
> J = Foreach I generate group.id1 as id1, group.id2 as id2, group.id3 as id3,
> group.id4 as id4, group.id5 as id5, group.id6 as id6, group.id7 as id7,
> group.id8 as id8, group.id9 as id9, group.id10 as id10, group.id11 as id11,
> group.id12 as id12;
> --store intermediate data to HDFS and re-read
> store J into 'output/20100203/J' using PigStorage('\u0001');
> --load previous days data
> K = LOAD '/user/viraj/data/20100202' USING PigStorage('\u0001') as (id1, id2,
> id3, id4, id5, id6, id7, id8, id9, id10, id11, id12);
> --read J into K1
> K1 = LOAD 'output/20100203/J' using PigStorage('\u0001') as (id1, id2, id3,
> id4, id5, id6, id7, id8, id9, id10, id11, id12);
> L = COGROUP K by (id1, id2, id3, id4, id5, id6, id7, id8, id9, id10, id11,
> id12) OUTER,
>     K1 by (id1, id2, id3, id4, id5, id6, id7, id8, id9, id10, id11,
> id12) OUTER;
> M = filter L by IsEmpty(K);
> store M into 'cogroupNoTypesIntStore' using PigStorage();
> {code}
> Case 3: Types information specified but no intermediate store of J
> {code}
> register udf.jar
> A = LOAD '/user/viraj/data/20100203' USING MapLoader() AS (s, m, l);
> B = FOREACH A GENERATE
> s#'key1' as key1,
> s#'key2' as key2;
> C = FOREACH B generate key2;
> D = filter C by (key2 IS NOT null);
> E = distinct D;
> store E into 'unique_key_list' using PigStorage('\u0001');
> F = Foreach E generate key2, MapGenerate(key2) as m;
> G = FILTER F by (m IS NOT null);
> H = foreach G generate key2, (long)m#'id1' as id1, (long)m#'id2' as id2,
> (long)m#'id3' as id3, (long)m#'id4' as id4, (long)m#'id5' as id5,
> (long)m#'id6' as id6, (long)m#'id7' as id
[jira] Closed: (PIG-1273) Skewed join throws error
[ https://issues.apache.org/jira/browse/PIG-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1273.
---------------------------

> Skewed join throws error
> ------------------------
>
>                 Key: PIG-1273
>                 URL: https://issues.apache.org/jira/browse/PIG-1273
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.6.0
>            Reporter: Ankur
>            Assignee: Richard Ding
>             Fix For: 0.7.0
>
>         Attachments: PIG-1273.patch
>
>
> When the sampled relation is too small or empty then skewed join fails.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
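A minimal sketch of the failing shape; the input file names and schema are hypothetical:

{code}
-- 'tiny_input' is empty or has very few rows; sampling it to build the
-- skew distribution is what caused the skewed join to fail
a = LOAD 'tiny_input' AS (k, v);
b = LOAD 'big_input' AS (k, w);
j = JOIN a BY k, b BY k USING "skewed";
dump j;
{code}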
[jira] Closed: (PIG-1275) empty bag in PigStorage read as null
[ https://issues.apache.org/jira/browse/PIG-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1275.
---------------------------

> empty bag in PigStorage read as null
> ------------------------------------
>
>                 Key: PIG-1275
>                 URL: https://issues.apache.org/jira/browse/PIG-1275
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Thejas M Nair
>            Assignee: Daniel Dai
>             Fix For: 0.7.0
>
>         Attachments: PIG-1275-1.patch, PIG-1275-2.patch
>
>
> This seems to be introduced after changes in PIG-613.
> grunt> cat /tmp/students.txt
> qwer F {(1),(2)}
> zxldf M {}
> grunt> l = load '/tmp/students.txt' as (n : chararray, s : chararray, b: {t :
> (i : int)} );
> grunt> dump l;
> (qwer,F,{(1),(2)})
> (zxldf,M,)
> grunt>

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1268) [Zebra] Need an ant target that runs all pig-related tests in Zebra
[ https://issues.apache.org/jira/browse/PIG-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1268.
---------------------------

> [Zebra] Need an ant target that runs all pig-related tests in Zebra
> -------------------------------------------------------------------
>
>                 Key: PIG-1268
>                 URL: https://issues.apache.org/jira/browse/PIG-1268
>             Project: Pig
>          Issue Type: Test
>          Components: build
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: zebra.0303
>
>
> Currently Pig checkins don't run any Zebra tests to make sure that Zebra is
> not broken. To make this happen, the Zebra build needs a test target that
> only runs pig-related tests. With this, Pig committers need to run "ant pig"
> for Zebra as part of the before-checkin sanity check. Ideally, this target
> should be triggered as part of Hudson.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1269) [Zebra] Restrict schema definition for collection
[ https://issues.apache.org/jira/browse/PIG-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1269.
---------------------------

> [Zebra] Restrict schema definition for collection
> -------------------------------------------------
>
>                 Key: PIG-1269
>                 URL: https://issues.apache.org/jira/browse/PIG-1269
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>             Fix For: 0.7.0
>
>         Attachments: zebra.0310
>
>
> Currently the Zebra grammar for the schema definition of a collection field
> allows many types of definition. To reduce complexity and remove ambiguity,
> and more importantly, to make the metadata more representative of the actual
> data instances, the grammar rules need to be changed. Only a record type is
> allowed and required for a collection definition. Thus,
> fieldName:collection(record(c1:int, c2:string)) is legal, while
> fieldName:collection(c1:int, c2:string),
> fieldName:collection(f:record(c1:int, c2:string)),
> fieldName:collection(c1:int), or fieldName:collection(int) is illegal.
> This will have some impact on existing Zebra M/R programs or Pig scripts
> that use Zebra. Schemas acceptable in the previous release may now become
> illegal because of this change. This should be clearly documented.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
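The rule above, restated as a sketch of Zebra schema strings (the field and column names are the issue's own examples; the comment markers are annotations, not part of the schema syntax):

{code}
-- legal: a collection must wrap exactly one record
fieldName:collection(record(c1:int, c2:string))

-- illegal after this change:
fieldName:collection(c1:int, c2:string)
fieldName:collection(f:record(c1:int, c2:string))
fieldName:collection(c1:int)
fieldName:collection(int)
{code}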
[jira] Closed: (PIG-1262) Additional findbugs and javac warnings
[ https://issues.apache.org/jira/browse/PIG-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1262.
---------------------------

> Additional findbugs and javac warnings
> --------------------------------------
>
>                 Key: PIG-1262
>                 URL: https://issues.apache.org/jira/browse/PIG-1262
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.7.0
>
>         Attachments: PIG-1262-1.patch, PIG-1262-2.patch
>
>
> Over time we have introduced some new findbugs and javac warnings. They
> will be fixed in this Jira.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1265) Change LoadMetadata and StoreMetadata to use Job instead of Configuration and add a cleanupOnFailure method to StoreFuncInterface
[ https://issues.apache.org/jira/browse/PIG-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1265.
---------------------------

> Change LoadMetadata and StoreMetadata to use Job instead of Configuration
> and add a cleanupOnFailure method to StoreFuncInterface
> -------------------------------------------------------------------------
>
>                 Key: PIG-1265
>                 URL: https://issues.apache.org/jira/browse/PIG-1265
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Pradeep Kamath
>            Assignee: Pradeep Kamath
>             Fix For: 0.7.0
>
>         Attachments: PIG-1265-2.patch, PIG-1265.patch
>
>
> Speaking to the hadoop team folks, the direction in hadoop is to use Job
> instead of Configuration - for example, InputFormat/OutputFormat
> implementations use Job to store the input/output location. So pig should
> also do the same in LoadMetadata and StoreMetadata to be closer to hadoop.
> Currently, when a job fails, pig assumes the output locations (corresponding
> to the stores in the job) are hdfs locations and attempts to delete them.
> Since output locations could be non-hdfs locations, this cleanup should be
> delegated to the StoreFuncInterface implementation - hence a new method,
> cleanupOnFailure(), should be introduced in StoreFuncInterface and a default
> implementation should be provided in the StoreFunc abstract class which
> checks if the location exists on hdfs and deletes it if so.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1260) Param Substitution results in parser error if there is no EOL after last line in script
[ https://issues.apache.org/jira/browse/PIG-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1260.
---------------------------

> Param Substitution results in parser error if there is no EOL after last
> line in script
> ------------------------------------------------------------------------
>
>                 Key: PIG-1260
>                 URL: https://issues.apache.org/jira/browse/PIG-1260
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.7.0
>            Reporter: Ashutosh Chauhan
>            Assignee: Richard Ding
>             Fix For: 0.7.0
>
>         Attachments: PIG-1260.patch
>
>
> {noformat}
> A = load '$INPUT' using PigStorage(':');
> B = foreach A generate $0 as id;
> store B into '$OUTPUT' USING PigStorage();
> {noformat}
> Invoking the above script, which contains no EOL in its last line, as
> follows:
> {noformat}
> pig -param INPUT=mydata/input -param OUTPUT=mydata/output myscript.pig
> {noformat}
> results in a parser error:
> {noformat}
> [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during
> parsing. Lexical error at line 3, column 42. Encountered: <EOF> after : ""
> {noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1259) ResourceFieldSchema.setSchema should not allow a bag field without a Tuple as its only sub field (the tuple itself can have a schema with > 1 subfields)
[ https://issues.apache.org/jira/browse/PIG-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1259.
---------------------------

> ResourceFieldSchema.setSchema should not allow a bag field without a Tuple
> as its only sub field (the tuple itself can have a schema with > 1 subfields)
> --------------------------------------------------------------------------
>
>                 Key: PIG-1259
>                 URL: https://issues.apache.org/jira/browse/PIG-1259
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Pradeep Kamath
>            Assignee: Pradeep Kamath
>             Fix For: 0.7.0
>
>         Attachments: PIG-1259-2.patch, PIG-1259.patch
>
>
> Currently Schema.getPigSchema(ResourceSchema) does not allow a bag field in
> the ResourceSchema with a subschema containing anything other than a tuple.
> The tuple itself can have a schema with > 1 subfields. This check should
> also be enforced in ResourceFieldSchema.setSchema().

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
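A sketch of the constraint in Pig schema syntax; the aliases and file name are hypothetical. The bag's subschema must be a single tuple, though that tuple may itself carry more than one field:

{code}
-- accepted: bag b contains one tuple t, which itself has two fields
a = LOAD 'data' AS (b: {t: (x: int, y: chararray)});

-- should be rejected by ResourceFieldSchema.setSchema(): a bag whose
-- subschema is an int rather than a single tuple
-- a = LOAD 'data' AS (b: {x: int});
{code}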
[jira] Closed: (PIG-1266) Show spill count on the pig console at the end of the job
[ https://issues.apache.org/jira/browse/PIG-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1266.
---------------------------

> Show spill count on the pig console at the end of the job
> ---------------------------------------------------------
>
>                 Key: PIG-1266
>                 URL: https://issues.apache.org/jira/browse/PIG-1266
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Sriranjan Manjunath
>            Assignee: Sriranjan Manjunath
>             Fix For: 0.7.0
>
>         Attachments: PIG_1266.patch
>
>
> Currently the spill count is displayed only in the job tracker log. It
> should be displayed on the console as well.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1264) Skewed join sampler misses out the key with the highest frequency
[ https://issues.apache.org/jira/browse/PIG-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1264.
---------------------------

> Skewed join sampler misses out the key with the highest frequency
> -----------------------------------------------------------------
>
>                 Key: PIG-1264
>                 URL: https://issues.apache.org/jira/browse/PIG-1264
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Sriranjan Manjunath
>            Assignee: Richard Ding
>             Fix For: 0.7.0
>
>
> I am noticing two issues with the sampler used in skewed join:
> 1. It does not allocate multiple reducers to the key with the highest
> frequency.
> 2. It seems to be allocating the same number of reducers to every key (8 in
> this case).
> Query:
> a = load 'studenttab10k' using PigStorage() as (name, age, gpa);
> b = load 'votertab10k' as (name, age, registration, contributions);
> e = join a by name right, b by name using "skewed" parallel 8;
> store e into 'SkewedJoin_9.out';

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files
[ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1257.
---------------------------

> PigStorage per the new load-store redesign should support splitting of bzip
> files
> ---------------------------------------------------------------------------
>
>                 Key: PIG-1257
>                 URL: https://issues.apache.org/jira/browse/PIG-1257
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Pradeep Kamath
>            Assignee: Pradeep Kamath
>             Fix For: 0.7.0
>
>         Attachments: blockEndingInCR.txt.bz2,
> blockHeaderEndsAt136500.txt.bz2, PIG-1257-2.patch, PIG-1257-3.patch,
> PIG-1257.patch, recordLossblockHeaderEndsAt136500.txt.bz2
>
>
> PigStorage implemented per the new load-store redesign (PIG-966) is based on
> TextInputFormat for reading data. TextInputFormat has support for reading
> bzip data but without support for splitting bzip files. In pig 0.6, splitting
> was enabled for bzip files - we should attempt to enable that feature.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1261) PigStorageSchema broke after changes to ResourceSchema
[ https://issues.apache.org/jira/browse/PIG-1261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1261.
---------------------------

> PigStorageSchema broke after changes to ResourceSchema
> ------------------------------------------------------
>
>                 Key: PIG-1261
>                 URL: https://issues.apache.org/jira/browse/PIG-1261
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.7.0
>            Reporter: Daniel Dai
>            Assignee: Dmitriy V. Ryaboy
>             Fix For: 0.7.0
>
>         Attachments: PIG_1261.diff, TestPigStorageSchema.java
>
>
> After we added a new method "getCastString" to ResourceSchema,
> TestPigStorageSchema began to fail. It seems PigStorageSchema tries to
> serialize the cast string into the schema. If I change the name of the
> method from "getCastString" to "genCastString", the error message goes
> away. Since Dmitriy is the author of TestPigStorageSchema, I need his help
> to check whether this is the right approach to fix it.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1255) Tiny code cleanup for serialization code for PigSplit
[ https://issues.apache.org/jira/browse/PIG-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1255.
---------------------------

> Tiny code cleanup for serialization code for PigSplit
> -----------------------------------------------------
>
>                 Key: PIG-1255
>                 URL: https://issues.apache.org/jira/browse/PIG-1255
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.6.0
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.7.0
>
>         Attachments: PIG-1255-1.patch, PIG-1255-2.patch
>
>
> A bug which closes the output stream during serialization.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1256) [Zebra] Bag field should always contain a tuple type as the field schema in ResourceSchema object converted from Zebra Schema
[ https://issues.apache.org/jira/browse/PIG-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1256.
---------------------------

> [Zebra] Bag field should always contain a tuple type as the field schema in
> ResourceSchema object converted from Zebra Schema
> ---------------------------------------------------------------------------
>
>                 Key: PIG-1256
>                 URL: https://issues.apache.org/jira/browse/PIG-1256
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.7.0
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: patch.0223
>
>
> In the 0.7 release, Pig now requires that a schema converted from a Zebra
> schema contains the Tuple type as the field schema. Zebra needs to take
> care of all cases where Record is not explicitly specified.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1258) [zebra] Number of sorted input splits is unusually high
[ https://issues.apache.org/jira/browse/PIG-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1258.
---------------------------

> [zebra] Number of sorted input splits is unusually high
> -------------------------------------------------------
>
>                 Key: PIG-1258
>                 URL: https://issues.apache.org/jira/browse/PIG-1258
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.6.0
>            Reporter: Yan Zhou
>             Fix For: 0.7.0
>
>         Attachments: PIG-1258.patch
>
>
> The number of sorted input splits is unusually high if the projections are
> on multiple column groups, or a union of tables, or column group(s) that
> hold many small tfiles. In one test, the number is about 100 times bigger
> than that from unsorted input splits on the same input tables.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1251) Move SortInfo calculation earlier in compilation
[ https://issues.apache.org/jira/browse/PIG-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1251.
---------------------------

> Move SortInfo calculation earlier in compilation
> ------------------------------------------------
>
>                 Key: PIG-1251
>                 URL: https://issues.apache.org/jira/browse/PIG-1251
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>             Fix For: 0.7.0
>
>         Attachments: pig-1251.patch, pig-1251_1.patch
>
>
> In LSR, Pig does input/output validation by calling hadoop's checkSpecs().
> A storefunc might need the schema to do such a validation, so we should
> call checkSchema() before doing the validation. checkSchema() in turn
> requires SortInfo, which is calculated later in the compilation phase. We
> need to move it earlier in the compilation phase.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1252) Diamond splitter does not generate correct results when using Multi-query optimization
[ https://issues.apache.org/jira/browse/PIG-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1252.
---------------------------

> Diamond splitter does not generate correct results when using Multi-query
> optimization
> -------------------------------------------------------------------------
>
>                 Key: PIG-1252
>                 URL: https://issues.apache.org/jira/browse/PIG-1252
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.6.0
>            Reporter: Viraj Bhat
>            Assignee: Richard Ding
>             Fix For: 0.7.0
>
>         Attachments: PIG-1252-2.patch, PIG-1252.patch
>
>
> I have a script which uses SPLIT but does not use one of the split branches.
> The skeleton of the script is as follows
> {code}
> loadData = load '/user/viraj/zebradata' using
> org.apache.hadoop.zebra.pig.TableLoader('col1,col2, col3, col4, col5, col6,
> col7');
> prjData = FOREACH loadData GENERATE (chararray) col1, (chararray) col2,
> (chararray) col3, (chararray) ((col4 is not null and col4 != '') ? col4 :
> ((col5 is not null) ? col5 : '')) as splitcond, (chararray) (col6 == 'c' ? 1
> : IS_VALID ('200', '0', '0', 'input.txt')) as validRec;
> SPLIT prjData INTO trueDataTmp IF (validRec == '1' AND splitcond != ''),
> falseDataTmp IF (validRec == '1' AND splitcond == '');
> grpData = GROUP trueDataTmp BY splitcond;
> finalData = FOREACH grpData {
>    orderedData = ORDER trueDataTmp BY col1,col2;
>    GENERATE FLATTEN ( MYUDF (orderedData, 60,
> 1800, 'input.txt', 'input.dat','20100222','5', 'debug_on')) as (s,m,l);
> }
> dump finalData;
> {code}
> You can see that "falseDataTmp" is untouched.
> When I run this script with the no-Multiquery (-M) option I get the right
> result. This could be the result of complex BinConds in the POLoad. We can
> get rid of this error by using FILTER instead of SPLIT.
> Viraj

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
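The workaround the report suggests, sketched against the script's own aliases: replace the SPLIT with two FILTER statements, so the unused branch is simply never defined:

{code}
-- equivalent to the SPLIT in the script above, but behaves correctly
-- under multi-query optimization
trueDataTmp  = FILTER prjData BY (validRec == '1' AND splitcond != '');
falseDataTmp = FILTER prjData BY (validRec == '1' AND splitcond == '');
{code}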
[jira] Closed: (PIG-1253) [zebra] make map/reduce test cases run on real cluster
[ https://issues.apache.org/jira/browse/PIG-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1253. --- > [zebra] make map/reduce test cases run on real cluster > -- > > Key: PIG-1253 > URL: https://issues.apache.org/jira/browse/PIG-1253 > Project: Pig > Issue Type: Task >Affects Versions: 0.6.0 >Reporter: Chao Wang >Assignee: Chao Wang > Fix For: 0.7.0 > > Attachments: PIG-1253-0.6.patch, PIG-1253.patch, PIG-1253.patch > > > The goal of this task is to make map/reduce test cases run on real cluster. > Currently map/reduce test cases are mostly tested under local mode. When > running on real cluster, all involved jars have to be manually deployed in > advance which is not desired. > The major change here is to support -libjars option to be able to ship user > jars to backend automatically. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1250) Make StoreFunc an abstract class and create a mirror interface called StoreFuncInterface
[ https://issues.apache.org/jira/browse/PIG-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1250. --- > Make StoreFunc an abstract class and create a mirror interface called > StoreFuncInterface > > > Key: PIG-1250 > URL: https://issues.apache.org/jira/browse/PIG-1250 > Project: Pig > Issue Type: Sub-task >Affects Versions: 0.7.0 >Reporter: Pradeep Kamath >Assignee: Pradeep Kamath > Fix For: 0.7.0 > > Attachments: PIG-1250-2.patch, PIG-1250.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1238) Dump does not respect the schema
[ https://issues.apache.org/jira/browse/PIG-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1238. --- > Dump does not respect the schema > > > Key: PIG-1238 > URL: https://issues.apache.org/jira/browse/PIG-1238 > Project: Pig > Issue Type: Bug >Affects Versions: 0.6.0 >Reporter: Ankur >Assignee: Richard Ding > Fix For: 0.7.0 > > Attachments: PIG-1238.patch > > > For complex data type and certain sequence of operations dump produces > results with non-existent field in the relation. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1243) Passing Complex map types to and from streaming causes a problem
[ https://issues.apache.org/jira/browse/PIG-1243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1243. --- > Passing Complex map types to and from streaming causes a problem > > > Key: PIG-1243 > URL: https://issues.apache.org/jira/browse/PIG-1243 > Project: Pig > Issue Type: Bug >Affects Versions: 0.6.0 >Reporter: Viraj Bhat >Assignee: Richard Ding > Fix For: 0.7.0 > > > I have a program which generates different types of Map fields and stores them > into PigStorage. > {code} > A = load '/user/viraj/three.txt' using PigStorage(); > B = foreach A generate ['a'#'12'] as b:map[], ['b'#['c'#'12']] as c, > ['c'#{(['d'#'15']),(['e'#'16'])}] as d; > store B into '/user/viraj/pigtest' using PigStorage(); > {code} > Now I test the previous output in the below script to make sure I have the > right results. I also pass this data to a Perl script and I observe that the > complex Map types I have generated are lost when I get the result back. > {code} > DEFINE CMD `simple.pl` SHIP('simple.pl'); > A = load '/user/viraj/pigtest' using PigStorage() as (simpleFields, > mapFields, mapListFields); > B = foreach A generate $0, $1, $2; > dump B; > C = foreach A generate (chararray)simpleFields#'a' as value, $0,$1,$2; > D = stream C through CMD as (a0:map[], a1:map[], a2:map[]); > dump D; > {code} > dumping B results in: > ([a#12],[b#[c#12]],[c#{([d#15]),([e#16])}]) > ([a#12],[b#[c#12]],[c#{([d#15]),([e#16])}]) > ([a#12],[b#[c#12]],[c#{([d#15]),([e#16])}]) > dumping D results in: > ([a#12],,) > ([a#12],,) > ([a#12],,) > The Perl script used here is: > {code} > #!/usr/local/bin/perl > use warnings; > use strict; > while(<>) { > my($bc,$s,$m,$l)=split/\t/; > print("$s\t$m\t$l"); > } > {code} > Is there an issue with handling of complex Map fields within streaming? How > can I fix this to obtain the right result? > Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1240) [Zebra] suggestion to have zebra manifest file contain version and svn-revision etc.
[ https://issues.apache.org/jira/browse/PIG-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1240. --- > [Zebra] suggestion to have zebra manifest file contain version and > svn-revision etc. > - > > Key: PIG-1240 > URL: https://issues.apache.org/jira/browse/PIG-1240 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.7.0 >Reporter: Gaurav Jain >Assignee: Gaurav Jain >Priority: Minor > Fix For: 0.7.0 > > Attachments: PIG-1240.patch > > > Zebra jars' manifest file should contain version and > svn-revision etc. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1226) Need to be able to register jars on the command line
[ https://issues.apache.org/jira/browse/PIG-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1226. --- > Need to be able to register jars on the command line > > > Key: PIG-1226 > URL: https://issues.apache.org/jira/browse/PIG-1226 > Project: Pig > Issue Type: Bug >Reporter: Alan Gates >Assignee: Thejas M Nair > Fix For: 0.7.0 > > Attachments: PIG-1126.patch > > > Currently 'register' can only be done inside a Pig Latin script. Users often > run their scripts in different environments, so jar locations or versions may > change. But they don't want to edit their script to fit each environment. > Instead they could register on the command line, something like: > pig -Dpig.additional.jars=my.jar:your.jar script.pig > These would not override registers in the Pig Latin script itself. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
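The colon-separated property proposed above (`pig -Dpig.additional.jars=my.jar:your.jar script.pig`) can be handled with ordinary string splitting; a minimal sketch in Java, where `parseAdditionalJars` is a hypothetical helper, not Pig's actual implementation:

```java
import java.util.Arrays;
import java.util.List;

public class AdditionalJarsDemo {
    // Illustrative only: split a colon-separated -Dpig.additional.jars
    // value into individual jar paths, as the issue proposes.
    static List<String> parseAdditionalJars(String propertyValue) {
        if (propertyValue == null || propertyValue.isEmpty()) {
            return Arrays.asList();
        }
        return Arrays.asList(propertyValue.split(":"));
    }

    public static void main(String[] args) {
        // e.g. the value passed via -Dpig.additional.jars=my.jar:your.jar
        System.out.println(parseAdditionalJars("my.jar:your.jar"));
    }
}
```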
[jira] Closed: (PIG-1248) [piggybank] useful String functions
[ https://issues.apache.org/jira/browse/PIG-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1248. --- > [piggybank] useful String functions > --- > > Key: PIG-1248 > URL: https://issues.apache.org/jira/browse/PIG-1248 > Project: Pig > Issue Type: New Feature >Reporter: Dmitriy V. Ryaboy >Assignee: Dmitriy V. Ryaboy > Fix For: 0.7.0 > > Attachments: PIG_1248.diff, PIG_1248.diff, PIG_1248.diff > > > Pig ships with very few evalFuncs for working with strings. This jira is for > adding a few more. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1233) NullPointerException in AVG
[ https://issues.apache.org/jira/browse/PIG-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1233. --- > NullPointerException in AVG > > > Key: PIG-1233 > URL: https://issues.apache.org/jira/browse/PIG-1233 > Project: Pig > Issue Type: Bug >Affects Versions: 0.6.0 >Reporter: Ankur >Assignee: Ankur > Fix For: 0.7.0 > > Attachments: jira-1233.patch > > > The overridden method - getValue() in AVG throws a null pointer exception in > case accumulate() is not called, leaving the variable 'intermediateCount' > initialized to null. This causes Java to throw the exception when it tries to > 'unbox' the value for numeric comparison. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
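The failure mode described here — auto-unboxing a still-null boxed counter during a numeric comparison — can be reproduced in plain Java. A minimal sketch (this is not Pig's actual AVG code; the field and method names only mirror the ones named in the issue):

```java
public class UnboxNpeDemo {
    // Mirrors the bug: intermediateCount stays null when accumulate()
    // was never called.
    static Long intermediateCount = null;

    static double getValueUnsafe() {
        // Implicit unboxing of a null Long in the comparison below
        // throws NullPointerException.
        return intermediateCount > 0 ? 1.0 : 0.0;
    }

    static Double getValueSafe() {
        // Null-guard before any numeric comparison avoids the NPE.
        if (intermediateCount == null || intermediateCount == 0) {
            return null;
        }
        return 1.0;
    }

    public static void main(String[] args) {
        System.out.println(getValueSafe()); // null: no data accumulated
        try {
            getValueUnsafe();
        } catch (NullPointerException e) {
            System.out.println("NPE from unboxing null Long");
        }
    }
}
```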
[jira] Closed: (PIG-1220) Document unknown keywords as missing or to do in future
[ https://issues.apache.org/jira/browse/PIG-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1220. --- > Document unknown keywords as missing or to do in future > --- > > Key: PIG-1220 > URL: https://issues.apache.org/jira/browse/PIG-1220 > Project: Pig > Issue Type: Bug > Components: documentation >Affects Versions: 0.6.0 >Reporter: Viraj Bhat > Fix For: 0.7.0 > > > To get help at the grunt shell I do the following: > grunt>touchz > 010-02-04 00:59:28,714 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR > 1000: Error during parsing. Encountered " "touchz "" at line 1, > column 1. > Was expecting one of: > > "cat" ... > "fs" ... > "cd" ... > "cp" ... > "copyFromLocal" ... > "copyToLocal" ... > "dump" ... > "describe" ... > "aliases" ... > "explain" ... > "help" ... > "kill" ... > "ls" ... > "mv" ... > "mkdir" ... > "pwd" ... > "quit" ... > "register" ... > "rm" ... > "rmf" ... > "set" ... > "illustrate" ... > "run" ... > "exec" ... > "scriptDone" ... > "" ... > ... > ";" ... > I looked at the code and found that we do nothing at: > "scriptDone": Is there some future value of that command. > Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1241) Accumulator is turned on when a map is used with a non-accumulative UDF
[ https://issues.apache.org/jira/browse/PIG-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1241. --- > Accumulator is turned on when a map is used with a non-accumulative UDF > --- > > Key: PIG-1241 > URL: https://issues.apache.org/jira/browse/PIG-1241 > Project: Pig > Issue Type: Bug >Affects Versions: 0.6.0 >Reporter: Ying He >Assignee: Ying He > Fix For: 0.7.0 > > Attachments: accum.patch > > > Exception is thrown for a script like the following: > register /homes/yinghe/owl/string.jar; > a = load 'a.txt' as (id, url); > b = group a by (id, url); > c = foreach b generate COUNT(a), (CHARARRAY) > string.URLPARSE(group.url)#'url'; > dump c; > In this query, URLPARSE() is not accumulative, and it returns a map. > The accumulator optimizer failed to check UDF in this case, and tries to run > the job in accumulative mode. ClassCastException is thrown when trying to > cast UDF into Accumulator interface. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
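The ClassCastException above comes from casting a non-accumulative UDF to the Accumulator interface; the guard is an instanceof check before the cast. A self-contained sketch with stand-in types (Pig's real types are org.apache.pig.Accumulator and EvalFunc; everything below is simplified for illustration):

```java
// Stand-in for Pig's Accumulator interface.
interface Accumulator { void accumulate(Object input); }

// A UDF that is NOT accumulative (like URLPARSE in the report).
class UrlParseFunc { }

// A UDF that genuinely supports accumulative mode.
class AccumCountFunc implements Accumulator {
    public void accumulate(Object input) { /* fold input in */ }
}

public class AccumulatorCheckDemo {
    // The optimizer must verify the cast is legal before deciding to
    // run the job in accumulative mode; otherwise the later cast
    // throws ClassCastException, as described in this issue.
    static boolean isAccumulative(Object udf) {
        return udf instanceof Accumulator;
    }

    public static void main(String[] args) {
        System.out.println(isAccumulative(new AccumCountFunc())); // true
        System.out.println(isAccumulative(new UrlParseFunc()));   // false
    }
}
```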
[jira] Closed: (PIG-1224) Collected group should change to use new (internal) bag
[ https://issues.apache.org/jira/browse/PIG-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1224. --- > Collected group should change to use new (internal) bag > --- > > Key: PIG-1224 > URL: https://issues.apache.org/jira/browse/PIG-1224 > Project: Pig > Issue Type: Improvement >Reporter: Olga Natkovich >Assignee: Ashutosh Chauhan > Fix For: 0.7.0 > > Attachments: pig-1224.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1218) Use distributed cache to store samples
[ https://issues.apache.org/jira/browse/PIG-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1218. --- > Use distributed cache to store samples > -- > > Key: PIG-1218 > URL: https://issues.apache.org/jira/browse/PIG-1218 > Project: Pig > Issue Type: Improvement >Reporter: Olga Natkovich >Assignee: Richard Ding > Fix For: 0.7.0 > > Attachments: PIG-1218.patch, PIG-1218_2.patch, PIG-1218_3.patch > > > Currently, in the case of skew join and order by, we use a sample that is just > written to the dfs (not the distributed cache) and, as a result, it gets opened and > copied around more than necessary. This impacts query performance and also > places unnecessary load on the name node. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1234) Unable to create input slice for har:// files
[ https://issues.apache.org/jira/browse/PIG-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1234. --- > Unable to create input slice for har:// files > - > > Key: PIG-1234 > URL: https://issues.apache.org/jira/browse/PIG-1234 > Project: Pig > Issue Type: Bug >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Pradeep Kamath > Fix For: 0.7.0 > > Attachments: PIG-1234.patch > > > Tried to load har:// files > {noformat} > grunt> a = LOAD 'har://hdfs-namenode/user/tsz/t20.har/t20' USING > PigStorage('\n') AS (line); > grunt> dump > {noformat} > but pig says > {noformat} > 2010-02-10 18:42:20,750 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR > 2118: > Unable to create input slice for: har://hdfs-namenode/user/tsz/t20.har/t20 > {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1230) Streaming input in POJoinPackage should use nonspillable bag to collect tuples
[ https://issues.apache.org/jira/browse/PIG-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1230. --- > Streaming input in POJoinPackage should use nonspillable bag to collect tuples > -- > > Key: PIG-1230 > URL: https://issues.apache.org/jira/browse/PIG-1230 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.6.0 >Reporter: Ashutosh Chauhan >Assignee: Ashutosh Chauhan > Fix For: 0.7.0 > > Attachments: pig-1230.patch, pig-1230_1.patch, pig-1230_2.patch > > > The last table of a join statement is streamed through instead of collecting all > its tuples in a bag. As a further optimization of that, tuples of that > relation are collected in chunks in a bag. Since we don't want to spill the > tuples from this bag, NonSpillableBag should be used to hold tuples for this > relation. Initially, DefaultDataBag was used, which was later changed to > InternalCachedBag as a part of PIG-1209. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1216) New load store design does not allow Pig to validate inputs and outputs up front
[ https://issues.apache.org/jira/browse/PIG-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1216. --- > New load store design does not allow Pig to validate inputs and outputs up > front > > > Key: PIG-1216 > URL: https://issues.apache.org/jira/browse/PIG-1216 > Project: Pig > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Alan Gates >Assignee: Ashutosh Chauhan > Fix For: 0.7.0 > > Attachments: pig-1216.patch, pig-1216_1.patch > > > In Pig 0.6 and before, Pig attempts to verify existence of inputs and > non-existence of outputs during parsing to avoid run time failures when > inputs don't exist or outputs can't be overwritten. The downside to this was > that Pig assumed all inputs and outputs were HDFS files, which made > implementation harder for non-HDFS based load and store functions. In the > load store redesign (PIG-966) this was delegated to InputFormats and > OutputFormats to avoid this problem and to make use of the checks already > being done in those implementations. Unfortunately, for Pig Latin scripts > that run more than one MR job, this does not work well. MR does not do > input/output verification on all the jobs at once. It does them one at a > time. So if a Pig Latin script results in 10 MR jobs and the file to store > to at the end already exists, the first 9 jobs will be run before the 10th > job discovers that the whole thing was doomed from the beginning. > To avoid this a validate call needs to be added to the new LoadFunc and > StoreFunc interfaces. Pig needs to pass this method enough information that > the load function implementer can delegate to InputFormat.getSplits() and the > store function implementer to OutputFormat.checkOutputSpecs() if s/he decides > to.
Since 90% of all load and store functions use HDFS and PigStorage will > also need to, the Pig team should implement a default file existence check on > HDFS and make it available as a static method to other Load/Store function > implementers. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1217) [piggybank] evaluation.util.Top is broken
[ https://issues.apache.org/jira/browse/PIG-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1217. --- > [piggybank] evaluation.util.Top is broken > - > > Key: PIG-1217 > URL: https://issues.apache.org/jira/browse/PIG-1217 > Project: Pig > Issue Type: Bug >Affects Versions: 0.3.0, 0.4.0, site, 0.5.0, 0.6.0, 0.7.0 >Reporter: Dmitriy V. Ryaboy >Assignee: Dmitriy V. Ryaboy >Priority: Minor > Fix For: 0.7.0 > > Attachments: fix_top_udf.diff, fix_top_udf.diff, fix_top_udf.diff > > > The Top udf has been broken for a while, due to an incorrect implementation > of getArgToFuncMapping. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1215) Make Hadoop jobId more prominent in the client log
[ https://issues.apache.org/jira/browse/PIG-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1215. --- > Make Hadoop jobId more prominent in the client log > -- > > Key: PIG-1215 > URL: https://issues.apache.org/jira/browse/PIG-1215 > Project: Pig > Issue Type: Improvement >Reporter: Olga Natkovich >Assignee: Ashutosh Chauhan > Fix For: 0.7.0 > > Attachments: pig-1215.patch, pig-1215.patch, pig-1215_1.patch, > pig-1215_3.patch, pig-1215_4.patch > > > This is a request from applications that want to be able to programmatically > parse client logs to find Hadoop job IDs. > They would like to see each job id on a separate line in the following format: > hadoopJobId: job_123456789 > They would also like to see the jobs in the order they are executed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
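With one id per line in the requested `hadoopJobId: job_...` format, a downstream application can pull the ids out with a simple multiline regex. A sketch of such a consumer (the class and method names are hypothetical, not part of Pig):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JobIdLogParser {
    // Matches the one-id-per-line format requested in this issue:
    //   hadoopJobId: job_123456789
    private static final Pattern JOB_LINE =
            Pattern.compile("^hadoopJobId: (job_\\S+)$", Pattern.MULTILINE);

    static List<String> extractJobIds(String clientLog) {
        List<String> ids = new ArrayList<>();
        Matcher m = JOB_LINE.matcher(clientLog);
        while (m.find()) {
            ids.add(m.group(1)); // ids come back in execution order
        }
        return ids;
    }

    public static void main(String[] args) {
        String log = "setting up job\n"
                   + "hadoopJobId: job_123456789\n"
                   + "hadoopJobId: job_987654321\n"
                   + "done\n";
        System.out.println(extractJobIds(log));
    }
}
```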
[jira] Closed: (PIG-1207) [zebra] Data sanity check should be performed at the end of writing instead of later at query time
[ https://issues.apache.org/jira/browse/PIG-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1207. --- > [zebra] Data sanity check should be performed at the end of writing instead > of later at query time > --- > > Key: PIG-1207 > URL: https://issues.apache.org/jira/browse/PIG-1207 > Project: Pig > Issue Type: Improvement >Reporter: Yan Zhou >Assignee: Yan Zhou > Fix For: 0.7.0 > > Attachments: PIG-1207.patch, PIG-1207.patch > > > Currently the equality check of the number of rows across different column groups > is performed by the query. The error info is sketchy and only emits a > "Column groups are not evenly distributed", or worse, throws an > IndexOutOfBounds exception from CGScanner.getCGValue, since BasicTable.atEnd > and BasicTable.getKey, which are called just before BasicTable.getValue, only > check the first column group in the projection; any discrepancy in the number > of rows per file across multiple column groups in the projection could have > BasicTable.atEnd return false and BasicTable.getKey return a key normally > while another column group has already exhausted its current file, so the call to its > CGScanner.getCGValue throws the exception. > This check should also be performed at the end of writing and the error info > should be more informative. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1209) Port POJoinPackage to proactively spill
[ https://issues.apache.org/jira/browse/PIG-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1209. --- > Port POJoinPackage to proactively spill > --- > > Key: PIG-1209 > URL: https://issues.apache.org/jira/browse/PIG-1209 > Project: Pig > Issue Type: Bug >Reporter: Sriranjan Manjunath >Assignee: Ashutosh Chauhan > Fix For: 0.7.0 > > Attachments: pig-1209.patch > > > POPackage proactively spills the bag whereas POJoinPackage still uses the > SpillableMemoryManager. We should port this to use InternalCacheBag which > proactively spills. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1212) LogicalPlan.replaceAndAddSucessors produce wrong result when successors are null
[ https://issues.apache.org/jira/browse/PIG-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1212. --- > LogicalPlan.replaceAndAddSucessors produce wrong result when successors are > null > > > Key: PIG-1212 > URL: https://issues.apache.org/jira/browse/PIG-1212 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.6.0 >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.7.0 > > Attachments: PIG-1212-1.patch, PIG-1212-2.patch > > > The following script throw a NPE: > a = load '1.txt' as (a0:chararray); > b = load '2.txt' as (b0:chararray); > c = join a by a0, b by b0; > d = filter c by a0 == 'a'; > explain d; -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1204) Pig hangs when joining two streaming relations in local mode
[ https://issues.apache.org/jira/browse/PIG-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1204. --- > Pig hangs when joining two streaming relations in local mode > > > Key: PIG-1204 > URL: https://issues.apache.org/jira/browse/PIG-1204 > Project: Pig > Issue Type: Bug >Affects Versions: 0.6.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.7.0 > > Attachments: PIG-1204.patch > > > The following script hangs running in local mode when input files contain > many lines (e.g. 10K). The same script works when running in MR mode. > {code} > A = load 'input1' as (a0, a1, a2); > B = stream A through `head -1` as (a0, a1, a2); > C = load 'input2' as (a0, a1, a2); > D = stream C through `head -1` as (a0, a1, a2); > E = join B by a0, D by a0; > dump E > {code} > Here is one stack trace: > "Thread-13" prio=10 tid=0x09938400 nid=0x1232 in Object.wait() > [0x8fffe000..0x8030] >java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > - waiting on <0x9b8e0a40> (a > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStream) > at java.lang.Object.wait(Object.java:485) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStream.getNextHelper(POStream.java:291) > - locked <0x9b8e0a40> (a > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStream) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStream.getNext(POStream.java:214) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:272) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:162) > at >
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:232) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:227) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:52) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) > at > org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:176) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1203) Temporarily disable failed unit test in load-store-redesign branch which have external dependency
[ https://issues.apache.org/jira/browse/PIG-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1203. --- > Temporarily disable failed unit test in load-store-redesign branch which have > external dependency > - > > Key: PIG-1203 > URL: https://issues.apache.org/jira/browse/PIG-1203 > Project: Pig > Issue Type: Sub-task > Components: impl >Affects Versions: 0.7.0 >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.7.0 > > Attachments: PIG-1203-1.patch > > > In load-store-redesign branch, two test suits, TestHBaseStorage and > TestCounters always fail. TestHBaseStorage depends on > https://issues.apache.org/jira/browse/PIG-1200, TestCounters depends on > future version of hadoop. We disable these two test suits temporarily, and > will enable them once the dependent issues are solved. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1198) [zebra] performance improvements
[ https://issues.apache.org/jira/browse/PIG-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1198. --- > [zebra] performance improvements > > > Key: PIG-1198 > URL: https://issues.apache.org/jira/browse/PIG-1198 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.6.0 >Reporter: Yan Zhou >Assignee: Yan Zhou > Fix For: 0.7.0 > > Attachments: PIG-1198.patch, PIG-1198.patch, PIG-1198.patch > > > Current input split generation is row-based split on individual TFiles. This > leaves the undesired fact that even for TFiles smaller than one block, one split > is still generated for each. Consequently, there will be many mappers, and > many waves, needed to handle the many small TFiles generated by as many > mappers/reducers that wrote the data. This issue can be addressed by > generating input splits that can include multiple TFiles. > For sorted tables, key distribution generation by table, which is used to > generate proper input splits, includes key distributions from column groups > even if they are not in the projection. This incurs extra cost to perform > unnecessary computations and, more inappropriately, creates unreasonable > results on input split generation; > For unsorted tables, when a row split is generated on a union of tables, the > FileSplits are generated for each table and then lumped together to form the > final list of splits for Map/Reduce. This has the undesirable consequence that the number > of splits is subject to the number of tables in the table union and not just > controlled by the number of splits used by the Map/Reduce framework; > The input split's goal size is calculated on all column groups even if some > of them are not in the projection; > For input splits of multiple files in one column group, all files are opened > at startup. This is unnecessary and needlessly takes resources from start > to end. The files should be opened when needed and closed when not. -- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1190) Handling of quoted strings in pig-latin/grunt commands
[ https://issues.apache.org/jira/browse/PIG-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1190. --- > Handling of quoted strings in pig-latin/grunt commands > -- > > Key: PIG-1190 > URL: https://issues.apache.org/jira/browse/PIG-1190 > Project: Pig > Issue Type: Bug >Reporter: Thejas M Nair >Assignee: Ashutosh Chauhan > Fix For: 0.7.0 > > Attachments: correct-testcase.patch, pig-1190.patch, pig-1190_1.patch > > > There is some inconsistency in the way quoted strings are used/handled in > pig-latin. > In load/store and define-ship commands, files are specified in quoted strings, > and the file name is the content within the quotes. But in case of > register, set, and file system commands, if a string is specified in quotes, > the quotes are also included as part of the string. This is not only > inconsistent, it is also unintuitive. > This is also inconsistent with the way the hdfs command line (or bash shell) > interprets file names. > For example, currently with the command - > set job.name 'job123' > the job name is set to 'job123' (including the quotes), not job123. > This needs to be fixed, and the above command should be considered equivalent > to - set job.name job123. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
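The requested behavior — treating `set job.name 'job123'` the same as `set job.name job123` — amounts to stripping a matched pair of surrounding quotes from the argument. A minimal sketch; `unquote` is a hypothetical helper, not Pig's actual grunt parser:

```java
public class QuoteStripDemo {
    // Strip one pair of matching single quotes surrounding an argument,
    // so a quoted and an unquoted argument become equivalent, as the
    // issue asks. Unmatched or absent quotes are left untouched.
    static String unquote(String arg) {
        if (arg.length() >= 2 && arg.startsWith("'") && arg.endsWith("'")) {
            return arg.substring(1, arg.length() - 1);
        }
        return arg;
    }

    public static void main(String[] args) {
        // set job.name 'job123'  and  set job.name job123  both yield job123
        System.out.println(unquote("'job123'")); // job123
        System.out.println(unquote("job123"));   // job123
    }
}
```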
[jira] Closed: (PIG-1200) Using TableInputFormat in HBaseStorage
[ https://issues.apache.org/jira/browse/PIG-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1200. --- > Using TableInputFormat in HBaseStorage > -- > > Key: PIG-1200 > URL: https://issues.apache.org/jira/browse/PIG-1200 > Project: Pig > Issue Type: Sub-task >Affects Versions: 0.7.0 >Reporter: Jeff Zhang >Assignee: Jeff Zhang > Fix For: 0.7.0 > > Attachments: Pig_1200.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1194) ERROR 2055: Received Error while processing the map plan
[ https://issues.apache.org/jira/browse/PIG-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1194. --- > ERROR 2055: Received Error while processing the map plan > > > Key: PIG-1194 > URL: https://issues.apache.org/jira/browse/PIG-1194 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.5.0, 0.6.0 >Reporter: Viraj Bhat >Assignee: Richard Ding > Fix For: 0.7.0 > > Attachments: inputdata.txt, PIG-1194.patch, PIG-1294_1.patch > > > I have a simple Pig script which takes 3 columns out of which one is null. > {code} > input = load 'inputdata.txt' using PigStorage() as (col1, col2, col3); > a = GROUP input BY (((double) col3)/((double) col2) > .001 OR col1 < 11 ? > col1 : -1); > b = FOREACH a GENERATE group as col1, SUM(input.col2) as col2, > SUM(input.col3) as col3; > store b into 'finalresult'; > {code} > When I run this script I get the following error: > ERROR 2055: Received Error while processing the map plan. > org.apache.pig.backend.executionengine.ExecException: ERROR 2055: Received > Error while processing the map plan. > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:277) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93) > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) > > A more useful error message for the purpose of debugging would be helpful. > Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1173) pig cannot be built without an internet connection
[ https://issues.apache.org/jira/browse/PIG-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1173. --- > pig cannot be built without an internet connection > -- > > Key: PIG-1173 > URL: https://issues.apache.org/jira/browse/PIG-1173 > Project: Pig > Issue Type: Bug >Reporter: Jeff Hodges >Assignee: Jeff Hodges >Priority: Minor > Fix For: 0.7.0 > > Attachments: offlinebuild-v2.patch, offlinebuild.patch > > > Pig's build.xml does not allow for offline building even when it's been built > before. This is because the ivy-download target has no conditional > associated with it to turn it off. Hadoop seems to be adding an > unless="offline" attribute to the ivy-download target. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1184) PruneColumns optimization does not handle the case of foreach flatten correctly if flattened bag is not used later
[ https://issues.apache.org/jira/browse/PIG-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1184. --- > PruneColumns optimization does not handle the case of foreach flatten > correctly if flattened bag is not used later > -- > > Key: PIG-1184 > URL: https://issues.apache.org/jira/browse/PIG-1184 > Project: Pig > Issue Type: Bug >Affects Versions: 0.6.0 >Reporter: Pradeep Kamath >Assignee: Daniel Dai > Fix For: 0.7.0 > > Attachments: PIG-1184-1.patch, PIG-1184-2.patch > > > The following script: > {noformat} > -e "a = load 'input.txt' as (f1:chararray, f2:chararray, > f3:bag{t:tuple(id:chararray)}, f4:bag{t:tuple(loc:chararray)}); b = foreach a > generate f1, f2, flatten(f3), flatten(f4), 10; b = foreach b generate f1, f2, > \$4; dump b;" > {noformat} > gives the following result: > (oiue,M,10) > {noformat} > cat input.txt: > oiue M {(3),(4)} {(toronto),(montreal)} > {noformat} > If the PruneColumns optimization is disabled, we get the right result: > (oiue,M,10) > (oiue,M,10) > (oiue,M,10) > (oiue,M,10) > The flatten results in 4 records - so the output should contain 4 records. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
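Why four records are expected: flattening two bags in one FOREACH produces their cross product. A small Python sketch of that semantics, with `itertools.product` standing in for the nested FLATTEN:

```python
from itertools import product

# One input record with two bags, as in input.txt:
f1, f2 = "oiue", "M"
f3 = [("3",), ("4",)]                  # bag f3
f4 = [("toronto",), ("montreal",)]     # bag f4

# FLATTEN(f3), FLATTEN(f4) expands to the cross product of the two bags,
# so 2 x 2 = 4 records; projecting f1, f2 and the constant 10 afterwards
# must still yield 4 records, not 1.
records = [(f1, f2, 10) for _ in product(f3, f4)]
assert len(records) == 4
```

The bug was that pruning the unused flattened columns collapsed the cross product back to a single record.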
[jira] Closed: (PIG-1189) StoreFunc UDF should ship to the backend automatically without "register"
[ https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1189. --- > StoreFunc UDF should ship to the backend automatically without "register" > - > > Key: PIG-1189 > URL: https://issues.apache.org/jira/browse/PIG-1189 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.6.0 >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.7.0 > > Attachments: multimapstore.pig, multireducestore.pig, > PIG-1189-1.patch, PIG-1189-2.patch, PIG-1189-3.patch, singlemapstore.pig, > singlereducestore.pig > > > Pig should ship the store UDF to the backend even if the user does not use "register". The > prerequisite is that the UDF is in the classpath on the frontend. We made that > work for load UDFs in [PIG-881|https://issues.apache.org/jira/browse/PIG-881]; > we shall do the same thing for store UDFs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1171) Top-N queries produce incorrect results when followed by a cross statement
[ https://issues.apache.org/jira/browse/PIG-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1171. --- > Top-N queries produce incorrect results when followed by a cross statement > -- > > Key: PIG-1171 > URL: https://issues.apache.org/jira/browse/PIG-1171 > Project: Pig > Issue Type: Bug >Affects Versions: 0.6.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.7.0 > > Attachments: PIG-1171.patch > > > ??I am not sure if this is a bug, or something more subtle, but here is the > problem that I am having.?? > ??When I LOAD a dataset, change it with an ORDER, LIMIT it, then CROSS it > with itself, the results are not correct. I expect to see the cross of the > limited, ordered dataset, but instead I see the cross of the limited dataset. > Effectively, it's like the LIMIT is being excluded.?? > ??Example code follows:?? > {code} > A = load 'foo' as (f1:int, f2:int, f3:int); B = load 'foo' as (f1:int, > f2:int, f3:int); > a = ORDER A BY f1 DESC; > b = ORDER B BY f1 DESC; > aa = LIMIT a 1; > bb = LIMIT b 1; > C = CROSS aa, bb; > DUMP C; > {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
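The semantics the reporter expects can be sketched in Python; the three-row dataset is assumed for illustration, not taken from the issue:

```python
from itertools import product

# Toy dataset standing in for 'foo' (assumed, not from the issue).
foo = [(1, 10, 100), (3, 30, 300), (2, 20, 200)]

# ORDER ... BY f1 DESC, then LIMIT 1 must take the top row of the
# *sorted* data, not of the unsorted input.
a = sorted(foo, key=lambda t: t[0], reverse=True)
aa = a[:1]          # LIMIT a 1
bb = a[:1]          # LIMIT b 1

# CROSS aa, bb should therefore produce a single row built from the
# ordered-and-limited relations.
c = [l + r for l, r in product(aa, bb)]
assert c == [(3, 30, 300, 3, 30, 300)]
```

The bug was that the CROSS effectively saw the pre-ORDER data, as if the LIMIT had been applied without the sort.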
[jira] Closed: (PIG-1164) [zebra]smoke test
[ https://issues.apache.org/jira/browse/PIG-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1164. --- > [zebra]smoke test > - > > Key: PIG-1164 > URL: https://issues.apache.org/jira/browse/PIG-1164 > Project: Pig > Issue Type: Test >Affects Versions: 0.6.0 >Reporter: Jing Huang > Fix For: 0.7.0 > > Attachments: PIG-1164.patch, PIG-SMOKE.patch, smoke.patch > > > Change the zebra build.xml file to add a smoke target, > and add env.sh and a run script under the zebra/src/test/smoke dir. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1170) [zebra] end to end test and stress test
[ https://issues.apache.org/jira/browse/PIG-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1170. --- > [zebra] end to end test and stress test > --- > > Key: PIG-1170 > URL: https://issues.apache.org/jira/browse/PIG-1170 > Project: Pig > Issue Type: Test >Affects Versions: 0.6.0 >Reporter: Jing Huang > Fix For: 0.7.0 > > Attachments: e2eStress.patch > > > Add test cases for the zebra end-to-end test, stress test and stress test > verification tool. > No unit test is needed for this jira. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1187) UTF-8 (international code) breaks with loader when load with schema is specified
[ https://issues.apache.org/jira/browse/PIG-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1187. --- > UTF-8 (international code) breaks with loader when load with schema is > specified > > > Key: PIG-1187 > URL: https://issues.apache.org/jira/browse/PIG-1187 > Project: Pig > Issue Type: Bug >Affects Versions: 0.6.0 >Reporter: Viraj Bhat >Assignee: Ashutosh Chauhan > Fix For: 0.7.0 > > > I have a set of Pig statements which dump an international dataset. > {code} > INPUT_OBJECT = load 'internationalcode'; > describe INPUT_OBJECT; > dump INPUT_OBJECT; > {code} > Sample output: > (756a6196-ebcd-4789-ad2f-175e5df65d55,{(labelAaÂâÀ),(labelあいうえお1),(labelஜார்க2),(labeladfadf)}) > It works and dumps results, but when I use a schema for loading, it fails. > {code} > INPUT_OBJECT = load 'internationalcode' AS (object_id:chararray, labels: bag > {T: tuple(label:chararray)}); > describe INPUT_OBJECT; > {code} > The error message is as follows: 2010-01-14 02:23:27,320 FATAL > org.apache.hadoop.mapred.Child: Error running child : > org.apache.pig.data.parser.TokenMgrError: Error: Bailing out of infinite loop > caused by repeated empty string matches at line 1, column 21.
> at > org.apache.pig.data.parser.TextDataParserTokenManager.TokenLexicalActions(TextDataParserTokenManager.java:620) > at > org.apache.pig.data.parser.TextDataParserTokenManager.getNextToken(TextDataParserTokenManager.java:569) > at > org.apache.pig.data.parser.TextDataParser.jj_ntk(TextDataParser.java:651) > at > org.apache.pig.data.parser.TextDataParser.Tuple(TextDataParser.java:152) > at > org.apache.pig.data.parser.TextDataParser.Bag(TextDataParser.java:100) > at > org.apache.pig.data.parser.TextDataParser.Datum(TextDataParser.java:382) > at > org.apache.pig.data.parser.TextDataParser.Parse(TextDataParser.java:42) > at > org.apache.pig.builtin.Utf8StorageConverter.parseFromBytes(Utf8StorageConverter.java:68) > at > org.apache.pig.builtin.Utf8StorageConverter.bytesToBag(Utf8StorageConverter.java:76) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:845) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:250) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65) > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) > at org.apache.hadoop.mapred.Child.main(Child.java:159) > Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1176) Column Pruner issues in union of loader with and without schema
[ https://issues.apache.org/jira/browse/PIG-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1176. --- > Column Pruner issues in union of loader with and without schema > --- > > Key: PIG-1176 > URL: https://issues.apache.org/jira/browse/PIG-1176 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.6.0 >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.7.0 > > Attachments: PIG-1176-1.patch > > > Column pruning for union could fail if one source of the union has a schema and > the other does not. For example, the following script fails: > {code} > a = load '1.txt' as (a0, a1, a2); > b = foreach a generate a0; > c = load '2.txt'; > d = foreach c generate $0; > e = union b, d; > dump e; > {code} > However, this issue is in trunk only and is not applicable to the 0.6 branch. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1158) pig command line -M option doesn't support table union correctly (comma separated paths)
[ https://issues.apache.org/jira/browse/PIG-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1158. --- > pig command line -M option doesn't support table union correctly (comma > separated paths) > > > Key: PIG-1158 > URL: https://issues.apache.org/jira/browse/PIG-1158 > Project: Pig > Issue Type: Bug >Affects Versions: 0.6.0 >Reporter: Jing Huang >Assignee: Richard Ding > Fix For: 0.7.0 > > Attachments: PIG-1158.patch > > > For example, load (1.txt,2.txt) USING > org.apache.hadoop.zebra.pig.TableLoader() > I see this error on stdout: > [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2100: > hdfs://gsbl90380.blue.ygrid.yahoo.com/user/hadoopqa/1.txt,2.txt does not > exist. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1156) Add aliases to ExecJobs and PhysicalOperators
[ https://issues.apache.org/jira/browse/PIG-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1156. --- > Add aliases to ExecJobs and PhysicalOperators > - > > Key: PIG-1156 > URL: https://issues.apache.org/jira/browse/PIG-1156 > Project: Pig > Issue Type: Improvement >Reporter: Dmitriy V. Ryaboy >Assignee: Dmitriy V. Ryaboy > Fix For: 0.7.0 > > Attachments: pig_batchAliases.patch > > > Currently, the way to use multi-query from Java is as follows: > 1. pigServer.setBatchOn(); > 2. register your queries with pigServer > 3. List jobs = pigServer.executeBatch(); > 4. for (ExecJob job : jobs) { Iterator results = job.getResults(); } > This will cause all stores to get evaluated in a single batch. However, there > is no way to identify which of the ExecJobs corresponds to which store. We > should add aliases by which the stored relations are known to ExecJob in > order to allow the user to identify what the jobs correspond to. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1161) Add missing apache headers to a few classes
[ https://issues.apache.org/jira/browse/PIG-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1161. --- > Add missing apache headers to a few classes > --- > > Key: PIG-1161 > URL: https://issues.apache.org/jira/browse/PIG-1161 > Project: Pig > Issue Type: Task >Reporter: Dmitriy V. Ryaboy >Assignee: Dmitriy V. Ryaboy >Priority: Trivial > Fix For: 0.7.0 > > Attachments: pig_missing_licenses.patch > > > The following java classes are missing Apache License headers: > StoreConfig > MapRedUtil > SchemaUtil > TestDataBagAccess > TestNullConstant > TestSchemaUtil > We should add the missing headers. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1159) merge join right side table does not support comma separated paths
[ https://issues.apache.org/jira/browse/PIG-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1159. --- > merge join right side table does not support comma separated paths > -- > > Key: PIG-1159 > URL: https://issues.apache.org/jira/browse/PIG-1159 > Project: Pig > Issue Type: Bug >Affects Versions: 0.6.0 >Reporter: Jing Huang >Assignee: Richard Ding > Fix For: 0.7.0 > > Attachments: PIG-1159.patch > > > For example, this is my script (join_jira1.pig): > register /grid/0/dev/hadoopqa/jars/zebra.jar; > --a1 = load '1.txt' as (a:int, > b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]); > --a2 = load '2.txt' as (a:int, > b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]); > --sort1 = order a1 by a parallel 6; > --sort2 = order a2 by a parallel 5; > --store sort1 into 'asort1' using > org.apache.hadoop.zebra.pig.TableStorer('[a,b,c,d]'); > --store sort2 into 'asort2' using > org.apache.hadoop.zebra.pig.TableStorer('[a,b,c,d]'); > --store sort1 into 'asort3' using > org.apache.hadoop.zebra.pig.TableStorer('[a,b,c,d]'); > --store sort2 into 'asort4' using > org.apache.hadoop.zebra.pig.TableStorer('[a,b,c,d]'); > joinl = LOAD 'asort1,asort2' USING > org.apache.hadoop.zebra.pig.TableLoader('a,b,c,d', 'sorted'); > joinr = LOAD 'asort3,asort4' USING > org.apache.hadoop.zebra.pig.TableLoader('a,b,c,d', 'sorted'); > joina = join joinl by a, joinr by a using "merge" ; > dump joina; > == > here is the log: > Backend error message > - > java.lang.IllegalArgumentException: Pathname > /user/hadoopqa/asort3,hdfs:/gsbl90380.blue.ygrid.yahoo.com/user/hadoopqa/asort4 > from > hdfs://gsbl90380.blue.ygrid.yahoo.com/user/hadoopqa/asort3,hdfs:/gsbl90380.blue.ygrid.yahoo.com/user/hadoopqa/asort4 > is not a valid DFS filename.
> at > org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:158) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453) > at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:648) > at > org.apache.pig.backend.hadoop.datastorage.HDataStorage.isContainer(HDataStorage.java:203) > at > org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:131) > at > org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:147) > at > org.apache.pig.impl.io.FileLocalizer.fullPath(FileLocalizer.java:534) > at org.apache.pig.impl.io.FileLocalizer.open(FileLocalizer.java:338) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.seekInRightStream(POMergeJoin.java:398) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNext(POMergeJoin.java:184) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:244) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65) > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) > at org.apache.hadoop.mapred.Child.main(Child.java:159) > Pig Stack Trace > --- > ERROR 6015: During execution, encountered a Hadoop error.
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to > open iterator for alias joina > at org.apache.pig.PigServer.openIterator(PigServer.java:482) > at > org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:539) > at > org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241) > at > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168) > at > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144) > at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89) > at org.apache.pig.Main.main(Main.java:386) > Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 6015: > During execution, encountered a Hadoop error. > at > org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:158) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453) > at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:648) at > org.apache.pig.backend.hadoop.datastorage.HDa
[jira] Closed: (PIG-1169) Top-N queries produce incorrect results when a store statement is added between order by and limit statement
[ https://issues.apache.org/jira/browse/PIG-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1169. --- > Top-N queries produce incorrect results when a store statement is added > between order by and limit statement > > > Key: PIG-1169 > URL: https://issues.apache.org/jira/browse/PIG-1169 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.7.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.7.0 > > Attachments: PIG-1169.patch > > > ??We tried to get top N results after a groupby and sort, and got different > results with or without storing the full sorted results. Here is a skeleton > of our pig script.?? > {code} > raw_data = Load '' AS (f1, f2, ..., fn); > grouped = group raw_data by (f1, f2); > data = foreach grouped generate FLATTEN(group). SUM(raw_data.fk) as value; > ordered = order data by value DESC parallel 10; > topn = limit ordered 10; > store ordered into 'outputdir/full'; > store topn into 'outputdir/topn'; > {code} > ??With the statement 'store ordered ...', top N results are incorrect, but > without the statement, results are correct. Has anyone seen this before? I > know a similar bug has been fixed in the multi-query release. We are on pig > .4 and hadoop .20.1.?? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1157) Successive replicated joins do not generate a Map Reduce plan and fail due to OOM
[ https://issues.apache.org/jira/browse/PIG-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1157. --- > Successive replicated joins do not generate a Map Reduce plan and fail due to > OOM > --- > > Key: PIG-1157 > URL: https://issues.apache.org/jira/browse/PIG-1157 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.6.0 >Reporter: Viraj Bhat >Assignee: Richard Ding > Fix For: 0.7.0 > > Attachments: oomreplicatedjoin.pig, PIG-1157.patch, PIG-1157.patch, > replicatedjoinexplain.log > > > Hi all, > I have a script which does 2 replicated joins in succession. Please note > that the inputs do not exist on the HDFS. > {code} > A = LOAD '/tmp/abc' USING PigStorage('\u0001') AS (a:long, b, c); > A1 = FOREACH A GENERATE a; > B = GROUP A1 BY a; > C = LOAD '/tmp/xyz' USING PigStorage('\u0001') AS (x:long, y); > D = JOIN C BY x, B BY group USING "replicated"; > E = JOIN A BY a, D by x USING "replicated"; > dump E; > {code} > 2009-12-16 19:12:00,253 [main] INFO > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer > - MR plan size before optimization: 4 > 2009-12-16 19:12:00,254 [main] INFO > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer > - Merged 1 map-only splittees. > 2009-12-16 19:12:00,254 [main] INFO > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer > - Merged 1 map-reduce splittees. > 2009-12-16 19:12:00,254 [main] INFO > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer > - Merged 2 out of total 2 splittees. > 2009-12-16 19:12:00,254 [main] INFO > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer > - MR plan size after optimization: 2 > 2009-12-16 19:12:00,713 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR > 2998: Unhandled internal error.
unable to create new native thread > Details at logfile: pig_1260990666148.log > Looking at the log file: > Pig Stack Trace > --- > ERROR 2998: Unhandled internal error. unable to create new native thread > java.lang.OutOfMemoryError: unable to create new native thread > at java.lang.Thread.start0(Native Method) > at java.lang.Thread.start(Thread.java:597) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:131) > at > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265) > at > org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:773) > at org.apache.pig.PigServer.store(PigServer.java:522) > at org.apache.pig.PigServer.openIterator(PigServer.java:458) > at > org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:532) > at > org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190) > at > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166) > at > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142) > at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89) > at org.apache.pig.Main.main(Main.java:397) > > If we want to look at the explain output, we find that there is no Map Reduce > plan that is generated. > Why is the M/R plan not generated? > Attaching the script and explain output. > Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1148) Move splitable logic from pig latin to InputFormat
[ https://issues.apache.org/jira/browse/PIG-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1148. --- > Move splitable logic from pig latin to InputFormat > -- > > Key: PIG-1148 > URL: https://issues.apache.org/jira/browse/PIG-1148 > Project: Pig > Issue Type: Sub-task >Reporter: Jeff Zhang >Assignee: Jeff Zhang > Fix For: 0.7.0 > > Attachments: PIG-1148.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1153) [zebra] splitting columns at different levels in a complex record column into different column groups throws exception
[ https://issues.apache.org/jira/browse/PIG-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1153. --- > [zebra] splitting columns at different levels in a complex record column into > different column groups throws exception > - > > Key: PIG-1153 > URL: https://issues.apache.org/jira/browse/PIG-1153 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.7.0 >Reporter: Xuefu Zhang >Assignee: Yan Zhou > Fix For: 0.7.0 > > Attachments: PIG-1153.patch, PIG-1153.patch > > > The following code sample: > String strSch = "r1:record(f1:int, f2:int), r2:record(f5:int, > r3:record(f3:float, f4))"; > String strStorage = "[r1.f1, r2.r3.f3, r2.f5]; [r1.f2, r2.r3.f4]"; > Partition p = new Partition(schema.toString(), strStorage, null); > gives the following exception: > org.apache.hadoop.zebra.parser.ParseException: Different Split Types Set > on the same field: r2.f5 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1141) Make streaming work with the new load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1141. --- > Make streaming work with the new load-store interfaces > --- > > Key: PIG-1141 > URL: https://issues.apache.org/jira/browse/PIG-1141 > Project: Pig > Issue Type: Sub-task >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.7.0 > > Attachments: PIG-1141.patch, PIG-1141.patch, PIG-1141.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1154) local mode fails when hadoop config directory is specified in classpath
[ https://issues.apache.org/jira/browse/PIG-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1154. --- > local mode fails when hadoop config directory is specified in classpath > --- > > Key: PIG-1154 > URL: https://issues.apache.org/jira/browse/PIG-1154 > Project: Pig > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Thejas M Nair >Assignee: Ankit Modi > Fix For: 0.7.0 > > Attachments: pig_1154.patch > > > In local mode, the hadoop configuration should not be taken from the > classpath . -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1124) Unable to set Custom Job Name using the -Dmapred.job.name parameter
[ https://issues.apache.org/jira/browse/PIG-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1124. --- > Unable to set Custom Job Name using the -Dmapred.job.name parameter > --- > > Key: PIG-1124 > URL: https://issues.apache.org/jira/browse/PIG-1124 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.6.0 >Reporter: Viraj Bhat >Assignee: Ashutosh Chauhan >Priority: Minor > Fix For: 0.7.0 > > Attachments: pig-1124.patch > > > As a Hadoop user I want to control the Job name for my analysis via the > command line using the following construct: > java -cp pig.jar:$HADOOP_HOME/conf -Dmapred.job.name=hadoop_junkie > org.apache.pig.Main broken.pig > -Dmapred.job.name should normally set my Hadoop Job name, but somehow during > the formation of the job.xml in Pig this information is lost and the job name > turns out to be: > "PigLatin:broken.pig" > The current workaround seems to be wiring it in the script itself, using the > following (or using parameter substitution): > set job.name 'my job' > Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1149) Allow instantiation of SampleLoaders with parametrized LoadFuncs
[ https://issues.apache.org/jira/browse/PIG-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1149. --- > Allow instantiation of SampleLoaders with parametrized LoadFuncs > > > Key: PIG-1149 > URL: https://issues.apache.org/jira/browse/PIG-1149 > Project: Pig > Issue Type: Bug >Reporter: Dmitriy V. Ryaboy >Assignee: Dmitriy V. Ryaboy >Priority: Minor > Fix For: 0.7.0 > > Attachments: pig_1149.patch, pig_1149_lsr-branch.patch > > > Currently, it is not possible to instantiate a SampleLoader with something > like PigStorage(':'). We should allow passing parameters to the loaders > being sampled. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1136) [zebra] Map Split of Storage info does not allow for leading underscore char '_'
[ https://issues.apache.org/jira/browse/PIG-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1136. --- > [zebra] Map Split of Storage info does not allow for leading underscore char '_' > -- > > Key: PIG-1136 > URL: https://issues.apache.org/jira/browse/PIG-1136 > Project: Pig > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Yan Zhou >Priority: Minor > Fix For: 0.7.0 > > Attachments: pig-1136-xuefu-new.patch > > > There is a user need to support that type of map key. Pig's column does > not allow for a leading underscore, but apparently no restriction is placed on > the map key. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1131) Pig simple join does not work when it contains empty lines
[ https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1131. --- > Pig simple join does not work when it contains empty lines > -- > > Key: PIG-1131 > URL: https://issues.apache.org/jira/browse/PIG-1131 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.7.0 >Reporter: Viraj Bhat >Assignee: Ashutosh Chauhan > Fix For: 0.7.0 > > Attachments: junk1.txt, junk2.txt, pig-1131.patch, pig-1131.patch, > simplejoinscript.pig > > > I have a simple script, which does a JOIN. > {code} > input1 = load '/user/viraj/junk1.txt' using PigStorage(' '); > describe input1; > input2 = load '/user/viraj/junk2.txt' using PigStorage('\u0001'); > describe input2; > joineddata = JOIN input1 by $0, input2 by $0; > describe joineddata; > store joineddata into 'result'; > {code} > The input data contains empty lines. > The join fails in the Map phase with the following error in > POLocalRearrange.java: > java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 > at java.util.ArrayList.RangeCheck(ArrayList.java:547) > at java.util.ArrayList.get(ArrayList.java:322) > at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:143) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.constructLROutput(POLocalRearrange.java:464) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:360) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:162) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:244) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:94) > at
org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) > at org.apache.hadoop.mapred.Child.main(Child.java:159) > I am surprised that the test cases did not detect this error. Could we add > this data which contains empty lines to the testcases? > Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1140) [zebra] Use of Hadoop 2.0 APIs
[ https://issues.apache.org/jira/browse/PIG-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1140. --- > [zebra] Use of Hadoop 2.0 APIs > > > Key: PIG-1140 > URL: https://issues.apache.org/jira/browse/PIG-1140 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.6.0 >Reporter: Yan Zhou >Assignee: Xuefu Zhang > Fix For: 0.7.0 > > Attachments: zebra.0209, zebra.0211, zebra.0212, zebra.0213 > > > Currently, Zebra is still using already deprecated Hadoop 1.8 APIs. Need to > upgrade to its 2.0 APIs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1146) Inconsistent column pruning in LOUnion
[ https://issues.apache.org/jira/browse/PIG-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1146. --- > Inconsistent column pruning in LOUnion > -- > > Key: PIG-1146 > URL: https://issues.apache.org/jira/browse/PIG-1146 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.6.0 >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.7.0 > > Attachments: PIG-1146-1.patch, PIG-1146-2.patch > > > This happens when we do a union on two relations: one column comes from a > loader, the other matching column comes from a constant, and this column gets > pruned. We prune the one from the loader but do not prune the constant. This > leaves the union in an inconsistent state. Here is a script: > {code} > a = load '1.txt' as (a0, a1:chararray, a2); > b = load '2.txt' as (b0, b2); > c = foreach b generate b0, 'hello', b2; > d = union a, c; > e = foreach d generate $0, $2; > dump e; > {code} > 1.txt: > {code} > ulysses thompson 64 1.90 > katie carson 25 3.65 > {code} > 2.txt: > {code} > luke king 0.73 > holly davidson 2.43 > {code} > expected output: > (ulysses thompson,1.90) > (katie carson,3.65) > (luke king,0.73) > (holly davidson,2.43) > real output: > (ulysses thompson,) > (katie carson,) > (luke king,0.73) > (holly davidson,2.43) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
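A Python sketch of the expected behaviour, with the data reconstructed from the expected output in the report: pruning must treat column 1 of both union branches consistently, so column $2 survives for every record.

```python
# 1.txt rows (a0, a1, a2) and 2.txt rows (b0, b2), per the issue.
a = [("ulysses thompson", "64", "1.90"), ("katie carson", "25", "3.65")]
b = [("luke king", "0.73"), ("holly davidson", "2.43")]

# c = foreach b generate b0, 'hello', b2  -> inject the constant column.
c = [(b0, "hello", b2) for b0, b2 in b]

# d = union a, c; e = foreach d generate $0, $2.
# Pruning column 1 must be applied to both branches (loader column a1
# and constant 'hello' alike), leaving columns 0 and 2 intact.
e = [(t[0], t[2]) for t in a + c]
```

The bug was that only the loader branch was pruned, shifting the loaded rows left and emptying their last field, as shown in the "real output" above.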
[jira] Closed: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1117. --- > Pig reading hive columnar rc tables > --- > > Key: PIG-1117 > URL: https://issues.apache.org/jira/browse/PIG-1117 > Project: Pig > Issue Type: New Feature >Affects Versions: 0.7.0 >Reporter: Gerrit Jansen van Vuuren >Assignee: Gerrit Jansen van Vuuren > Fix For: 0.7.0 > > Attachments: HiveColumnarLoader.patch, HiveColumnarLoaderTest.patch, > PIG-1117-0.7.0-new.patch, PIG-1117-0.7.0-reviewed.patch, > PIG-1117-0.7.0-reviewed.patch, PIG-1117.patch, PIG-117-v.0.6.0.patch, > PIG-117-v.0.7.0.patch > > > I've coded a LoadFunc implementation that can read from Hive Columnar RC > tables; this is needed for a project that I'm working on because all our data > is stored using the Hive thrift-serialized Columnar RC format. I have looked > at the piggybank but did not find any implementation that could do this. > We've been running it on our cluster for the last week and have worked out > most bugs. > > There are still some improvements to be done, like setting > the number of mappers based on date partitioning. It has been optimized to > read only specific columns, and it can churn through a data set almost 8 times > faster with this improvement because not all column data is read. > I would like to contribute the class to the piggybank; can you guide me on > what I need to do? > I've used Hive-specific classes to implement this; is it possible to add them > to the piggybank build ivy for automatic download of the dependencies? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
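For context, a usage sketch of how such a loader would be invoked from Pig Latin. The class name follows the attached patch (HiveColumnarLoader), but the package path, table path, and the schema-string constructor argument are assumptions for illustration, not taken from the patch:

{code}
-- hypothetical usage; package path and constructor argument are assumed
register piggybank.jar;
A = load '/user/data/hive/warehouse/mytable'
    using org.apache.pig.piggybank.storage.HiveColumnarLoader('f1 string,f2 int');
describe A;
{code}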
[jira] Closed: (PIG-1122) [zebra] Zebra build.xml still uses 0.6 version
[ https://issues.apache.org/jira/browse/PIG-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1122. --- > [zebra] Zebra build.xml still uses 0.6 version > -- > > Key: PIG-1122 > URL: https://issues.apache.org/jira/browse/PIG-1122 > Project: Pig > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Yan Zhou >Assignee: Yan Zhou > Fix For: 0.7.0 > > Attachments: PIG-1122.patch > > > Zebra still uses pig-0.6.0-dev-core.jar in build-contrib.xml. It should be > changed to pig-0.7.0-dev-core.jar on APACHE trunk only. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1115) [zebra] temp files are not cleaned.
[ https://issues.apache.org/jira/browse/PIG-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1115. --- > [zebra] temp files are not cleaned. > --- > > Key: PIG-1115 > URL: https://issues.apache.org/jira/browse/PIG-1115 > Project: Pig > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Hong Tang >Assignee: Gaurav Jain > Fix For: 0.7.0 > > Attachments: PIG-1115.patch > > > Temp files created by zebra during table creation are not cleaned up when there > is a task failure, which results in wasted disk space. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1103) refactor test-commit
[ https://issues.apache.org/jira/browse/PIG-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1103. --- > refactor test-commit > > > Key: PIG-1103 > URL: https://issues.apache.org/jira/browse/PIG-1103 > Project: Pig > Issue Type: Task >Reporter: Olga Natkovich >Assignee: Olga Natkovich > Fix For: 0.7.0 > > Attachments: PIG-1103.patch > > > Due to the changes to the local mode, many tests are now taking longer. Need > to make sure that test-commit still finishes within 10 minutes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1110) Handle compressed file formats -- Gz, BZip with the new proposal
[ https://issues.apache.org/jira/browse/PIG-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1110. --- > Handle compressed file formats -- Gz, BZip with the new proposal > > > Key: PIG-1110 > URL: https://issues.apache.org/jira/browse/PIG-1110 > Project: Pig > Issue Type: Sub-task >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.7.0 > > Attachments: PIG-1110.patch, PIG-1110.patch, PIG_1110_Jeff.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1099) [zebra] version on APACHE trunk should be 0.7.0 to be in pace with PIG
[ https://issues.apache.org/jira/browse/PIG-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1099. --- > [zebra] version on APACHE trunk should be 0.7.0 to be in pace with PIG > -- > > Key: PIG-1099 > URL: https://issues.apache.org/jira/browse/PIG-1099 > Project: Pig > Issue Type: Bug >Reporter: Yan Zhou >Assignee: Yan Zhou >Priority: Trivial > Fix For: 0.7.0 > > Attachments: PIG_1099.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1102) Collect number of spills per job
[ https://issues.apache.org/jira/browse/PIG-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1102. --- > Collect number of spills per job > > > Key: PIG-1102 > URL: https://issues.apache.org/jira/browse/PIG-1102 > Project: Pig > Issue Type: Improvement >Reporter: Olga Natkovich >Assignee: Sriranjan Manjunath > Fix For: 0.7.0 > > Attachments: PIG_1102.patch, PIG_1102.patch.1 > > > Memory shortage is one of the main performance issues in Pig. Knowing when we > spill to disk is useful for understanding query performance and also for > seeing how certain changes in Pig affect that. > Other interesting stats to collect would be average CPU usage and max memory > usage, but I am not sure if this information is easily retrievable. > Using Hadoop counters for this would make sense. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1101) Pig parser does not recognize its own data type in LIMIT statement
[ https://issues.apache.org/jira/browse/PIG-1101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1101. --- > Pig parser does not recognize its own data type in LIMIT statement > -- > > Key: PIG-1101 > URL: https://issues.apache.org/jira/browse/PIG-1101 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.6.0 >Reporter: Viraj Bhat >Assignee: Ashutosh Chauhan >Priority: Minor > Fix For: 0.7.0 > > Attachments: pig-1101.patch > > > I have a Pig script in which I specify the number of records to limit as a > long type. > {code} > A = LOAD '/user/viraj/echo.txt' AS (txt:chararray); > B = LIMIT A 10L; > DUMP B; > {code} > I get a parser error: > 2009-11-21 02:25:51,100 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR > 1000: Error during parsing. Encountered " "10L "" at line 3, > column 13. > Was expecting: > ... > at > org.apache.pig.impl.logicalLayer.parser.QueryParser.generateParseException(QueryParser.java:8963) > at > org.apache.pig.impl.logicalLayer.parser.QueryParser.jj_consume_token(QueryParser.java:8839) > at > org.apache.pig.impl.logicalLayer.parser.QueryParser.LimitClause(QueryParser.java:1656) > at > org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1280) > at > org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:893) > at > org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:682) > at > org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63) > at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1017) > In fact 10L seems to work in the foreach generate construct. > Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
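A workaround consistent with the report above: the parser rejects the long literal only in the LIMIT clause, so dropping the `L` suffix lets the script parse (a sketch, using the same path as in the report):

{code}
A = LOAD '/user/viraj/echo.txt' AS (txt:chararray);
B = LIMIT A 10;  -- plain integer literal; '10L' triggers the parser error
DUMP B;
{code}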
[jira] Closed: (PIG-1088) change merge join and merge join indexer to work with new LoadFunc interface
[ https://issues.apache.org/jira/browse/PIG-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1088. --- > change merge join and merge join indexer to work with new LoadFunc interface > > > Key: PIG-1088 > URL: https://issues.apache.org/jira/browse/PIG-1088 > Project: Pig > Issue Type: Sub-task >Reporter: Thejas M Nair >Assignee: Thejas M Nair > Fix For: 0.7.0 > > Attachments: PIG-1088.1.patch, PIG-1088.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1106) FR join should not spill
[ https://issues.apache.org/jira/browse/PIG-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1106. --- > FR join should not spill > > > Key: PIG-1106 > URL: https://issues.apache.org/jira/browse/PIG-1106 > Project: Pig > Issue Type: Bug >Reporter: Olga Natkovich >Assignee: Ankit Modi > Fix For: 0.7.0 > > Attachments: frjoin-nonspill.patch > > > Currently, the values for the replicated side of the data are placed in a > spillable bag (POFRJoin near line 275). This does not make sense because the > whole point of the optimization is that the data on one side fits into > memory. We already have a non-spillable bag implemented > (NonSpillableDataBag.java) and we need to change the FRJoin code to use it. And > of course we need to do lots of testing to make sure that we don't spill but fail > instead when we run out of memory. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
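For reference, the fragment-replicate (FR) join the issue describes is requested with the 'replicated' keyword; the relation and file names here are illustrative. The relation listed after the keyword's first input is the one replicated into memory on each map task, which is why a non-spillable bag is the right container for it:

{code}
big = load 'big.txt' as (k, v);
small = load 'small.txt' as (k, w);
-- 'small' is the replicated side and must fit in memory
C = join big by k, small by k using 'replicated';
{code}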
[jira] Closed: (PIG-1062) load-store-redesign branch: change SampleLoader and subclasses to work with new LoadFunc interface
[ https://issues.apache.org/jira/browse/PIG-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1062. --- > load-store-redesign branch: change SampleLoader and subclasses to work with > new LoadFunc interface > --- > > Key: PIG-1062 > URL: https://issues.apache.org/jira/browse/PIG-1062 > Project: Pig > Issue Type: Sub-task >Reporter: Thejas M Nair >Assignee: Thejas M Nair > Fix For: 0.7.0 > > Attachments: PIG-1062.5.patch, PIG-1062.patch, PIG-1062.patch.3 > > > This is part of the effort to implement new load store interfaces as laid out > in http://wiki.apache.org/pig/LoadStoreRedesignProposal . > PigStorage and BinStorage are now working. > SampleLoader and subclasses -RandomSampleLoader, PoissonSampleLoader need to > be changed to work with new LoadFunc interface. > Fixing SampleLoader and RandomSampleLoader will get order-by queries working. > PoissonSampleLoader is used by skew join. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1094) Fix unit tests corresponding to source changes so far
[ https://issues.apache.org/jira/browse/PIG-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1094. --- > Fix unit tests corresponding to source changes so far > - > > Key: PIG-1094 > URL: https://issues.apache.org/jira/browse/PIG-1094 > Project: Pig > Issue Type: Sub-task >Reporter: Pradeep Kamath >Assignee: Pradeep Kamath > Fix For: 0.7.0 > > Attachments: PIG-1094.patch, PIG-1094_2.patch, PIG-1094_3.patch, > PIG-1094_4.patch, PIG-1094_5.patch, PIG-1094_6.patch, PIG-1094_7.patch > > > The check-ins so far on the load-store-redesign branch have not addressed unit > test failures due to interface changes. This jira is to track the task of > making the common-case unit tests work with the new interfaces. Some aspects > of the new proposal, like using the LoadCaster interface for casting and making local > mode work, have not been completed yet. Tests which are failing for those > reasons will not be fixed in this jira and will be addressed in the jiras > corresponding to those tasks. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1075) Error in Cogroup when key fields types don't match
[ https://issues.apache.org/jira/browse/PIG-1075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1075. --- > Error in Cogroup when key fields types don't match > -- > > Key: PIG-1075 > URL: https://issues.apache.org/jira/browse/PIG-1075 > Project: Pig > Issue Type: Bug >Affects Versions: 0.5.0 >Reporter: Ankur >Assignee: Richard Ding > Fix For: 0.7.0 > > Attachments: PIG-1075.patch > > > When Cogrouping 2 relations on multiple key fields, pig throws an error if > the corresponding types don't match. > Consider the following script:- > A = LOAD 'data' USING PigStorage() as (a:chararray, b:int, c:int); > B = LOAD 'data' USING PigStorage() as (a:chararray, b:chararray, c:int); > C = CoGROUP A BY (a,b,c), B BY (a,b,c); > D = FOREACH C GENERATE FLATTEN(A), FLATTEN(B); > describe D; > dump D; > The complete stack trace of the error thrown is > Pig Stack Trace > --- > ERROR 1051: Cannot cast to Unknown > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1001: Unable to > describe schema for alias D > at org.apache.pig.PigServer.dumpSchema(PigServer.java:436) > at > org.apache.pig.tools.grunt.GruntParser.processDescribe(GruntParser.java:233) > at > org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:253) > at > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168) > at > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144) > at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89) > at org.apache.pig.Main.main(Main.java:397) > Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 0: An > unexpected exception caused the validation to stop > at > org.apache.pig.impl.plan.PlanValidator.validateSkipCollectException(PlanValidator.java:104) > at > org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:40) > at > 
org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:30) > at > org.apache.pig.impl.logicalLayer.validators.LogicalPlanValidationExecutor.validate(LogicalPlanValidationExecutor.java:83) > at org.apache.pig.PigServer.compileLp(PigServer.java:821) > at org.apache.pig.PigServer.dumpSchema(PigServer.java:428) > ... 6 more > Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: > ERROR 1060: Cannot resolve COGroup output schema > at > org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:2463) > at > org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:372) > at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:45) > at > org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:69) > at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51) > at > org.apache.pig.impl.plan.PlanValidator.validateSkipCollectException(PlanValidator.java:101) > ... 11 more > Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: > ERROR 1051: Cannot cast to Unknown > at > org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.insertAtomicCastForCOGroupInnerPlan(TypeCheckingVisitor.java:2552) > at > org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:2451) > ... 16 more > The error message does not help the user in identifying the issue clearly > especially if the pig script is large and complex. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1082) Modify Comparator to work with a typed textual Storage
[ https://issues.apache.org/jira/browse/PIG-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1082. --- > Modify Comparator to work with a typed textual Storage > -- > > Key: PIG-1082 > URL: https://issues.apache.org/jira/browse/PIG-1082 > Project: Pig > Issue Type: Sub-task >Affects Versions: 0.4.0 >Reporter: hc busy > Fix For: 0.7.0 > > Original Estimate: 5h > Remaining Estimate: 5h > > See parent bug. This ticket is for just the comparator change, which needs to > be made in order for the nested data structures to sort right -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1090. --- > Update sources to reflect recent changes in load-store interfaces > - > > Key: PIG-1090 > URL: https://issues.apache.org/jira/browse/PIG-1090 > Project: Pig > Issue Type: Sub-task >Affects Versions: 0.7.0 >Reporter: Pradeep Kamath >Assignee: Pradeep Kamath > Fix For: 0.7.0 > > Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, > PIG-1090-13.patch, PIG-1090-14.patch, PIG-1090-15.patch, PIG-1090-16.patch, > PIG-1090-17.patch, PIG-1090-18.patch, PIG-1090-19.patch, PIG-1090-2.patch, > PIG-1090-20.patch, PIG-1090-21.patch, PIG-1090-22.patch, PIG-1090-3.patch, > PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, > PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch > > > There have been some changes (as recorded in the Changes section, Nov 2 2009 > subsection of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the > load/store interfaces - this jira is to track the task of making those > changes under src. Changes under test will be addressed in a different jira. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1093) pig.properties file is missing from distributions
[ https://issues.apache.org/jira/browse/PIG-1093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1093. --- > pig.properties file is missing from distributions > - > > Key: PIG-1093 > URL: https://issues.apache.org/jira/browse/PIG-1093 > Project: Pig > Issue Type: Bug > Components: build >Affects Versions: 0.5.0, 0.6.0 >Reporter: Alan Gates >Assignee: Alan Gates > Fix For: 0.7.0 > > Attachments: PIG-1093.patch > > > pig.properties (in fact the entire conf directory) is not included in the > jars distributed as part of the 0.5 release. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1086) Nested sort by * throw exception
[ https://issues.apache.org/jira/browse/PIG-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1086. --- > Nested sort by * throw exception > > > Key: PIG-1086 > URL: https://issues.apache.org/jira/browse/PIG-1086 > Project: Pig > Issue Type: Bug >Affects Versions: 0.5.0 >Reporter: Daniel Dai >Assignee: Richard Ding > Fix For: 0.7.0 > > Attachments: PIG-1086.patch > > > The following script fails: > A = load '1.txt' as (a0, a1, a2); > B = group A by a0; > C = foreach B { D = order A by *; generate group, D;}; > explain C; > Here is the stack: > Caused by: java.lang.ArrayIndexOutOfBoundsException: -1 > at java.util.ArrayList.get(ArrayList.java:324) > at > org.apache.pig.impl.logicalLayer.schema.Schema.getField(Schema.java:752) > at > org.apache.pig.impl.logicalLayer.LOSort.getSortInfo(LOSort.java:332) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:1365) > at org.apache.pig.impl.logicalLayer.LOSort.visit(LOSort.java:176) > at org.apache.pig.impl.logicalLayer.LOSort.visit(LOSort.java:43) > at > org.apache.pig.impl.plan.DependencyOrderWalkerWOSeenChk.walk(DependencyOrderWalkerWOSeenChk.java:69) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:1274) > at > org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:130) > at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:45) > at > org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:69) > at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51) > at > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:234) > at org.apache.pig.PigServer.compilePp(PigServer.java:864) > at org.apache.pig.PigServer.explain(PigServer.java:583) > ... 8 more -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1072) ReversibleLoadStoreFunc interface should be removed to enable different load and store implementation classes to be used in a reversible manner
[ https://issues.apache.org/jira/browse/PIG-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1072. --- > ReversibleLoadStoreFunc interface should be removed to enable different load > and store implementation classes to be used in a reversible manner > --- > > Key: PIG-1072 > URL: https://issues.apache.org/jira/browse/PIG-1072 > Project: Pig > Issue Type: Sub-task >Reporter: Pradeep Kamath >Assignee: Richard Ding > Fix For: 0.7.0 > > Attachments: PIG-1072.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1079) Modify merge join to use distributed cache to maintain the index
[ https://issues.apache.org/jira/browse/PIG-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1079. --- > Modify merge join to use distributed cache to maintain the index > > > Key: PIG-1079 > URL: https://issues.apache.org/jira/browse/PIG-1079 > Project: Pig > Issue Type: Bug >Reporter: Sriranjan Manjunath >Assignee: Richard Ding > Fix For: 0.7.0 > > Attachments: PIG-1079.patch, PIG-1079.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1053) Consider moving to Hadoop for local mode
[ https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1053. --- > Consider moving to Hadoop for local mode > > > Key: PIG-1053 > URL: https://issues.apache.org/jira/browse/PIG-1053 > Project: Pig > Issue Type: Improvement >Reporter: Alan Gates >Assignee: Ankit Modi > Fix For: 0.7.0 > > Attachments: hadoopLocal.patch > > > We need to consider moving Pig to use Hadoop's local mode instead of its own. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1022) optimizer pushes filter before the foreach that generates column used by filter
[ https://issues.apache.org/jira/browse/PIG-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1022. --- > optimizer pushes filter before the foreach that generates column used by > filter > --- > > Key: PIG-1022 > URL: https://issues.apache.org/jira/browse/PIG-1022 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.4.0 >Reporter: Thejas M Nair >Assignee: Daniel Dai > Fix For: 0.7.0 > > Attachments: PIG-1022-1.patch > > > grunt> l = load 'students.txt' using PigStorage() as (name:chararray, > gender:chararray, age:chararray, score:chararray); > grunt> f = foreach l generate name, gender, age, score, '200' as > gid:chararray; > grunt> g = group f by (name, gid); > grunt> f2 = foreach g generate group.name as name: chararray, group.gid as > gid: chararray; > grunt> filt = filter f2 by gid == '200'; > grunt> explain filt; > In the generated plan, filt is pushed up after the load and before the first > foreach, even though the filter is on gid, which is generated in the first foreach. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1045) Integration with Hadoop 20 New API
[ https://issues.apache.org/jira/browse/PIG-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1045. --- > Integration with Hadoop 20 New API > -- > > Key: PIG-1045 > URL: https://issues.apache.org/jira/browse/PIG-1045 > Project: Pig > Issue Type: New Feature >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.7.0 > > Attachments: PIG-1045.patch, PIG-1045.patch > > > Hadoop 21 is not yet released, but we know that a switch to the new MR API is coming > there. This JIRA is for early integration with the portion of this API that > has been implemented in Hadoop 20. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1046) join algorithm specification is within double quotes
[ https://issues.apache.org/jira/browse/PIG-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1046. --- > join algorithm specification is within double quotes > > > Key: PIG-1046 > URL: https://issues.apache.org/jira/browse/PIG-1046 > Project: Pig > Issue Type: Bug >Reporter: Thejas M Nair >Assignee: Ashutosh Chauhan > Fix For: 0.7.0 > > Attachments: pig-1046.patch, pig-1046_1.patch, pig-1046_2.patch, > pig-1046_3.patch, pig-1046_4.patch > > > This fails - > j = join l1 by $0, l2 by $0 using 'skewed'; > This works - > j = join l1 by $0, l2 by $0 using "skewed"; > String constants are single-quoted in Pig Latin. If the algorithm > specification is supposed to be a string, specifying it within single quotes > should be supported. > Alternatively, we should be using identifiers here; since these are > pre-defined in Pig, users will not be specifying arbitrary values that might > not be valid identifiers. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.