[jira] Closed: (PIG-1293) pig wrapper script tends to fail if pig is in the path and PIG_HOME isn't set
[ https://issues.apache.org/jira/browse/PIG-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1293.
---------------------------

> pig wrapper script tends to fail if pig is in the path and PIG_HOME isn't set
> -----------------------------------------------------------------------------
>
>                 Key: PIG-1293
>                 URL: https://issues.apache.org/jira/browse/PIG-1293
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.6.0
>            Reporter: Allen Wittenauer
>            Assignee: Allen Wittenauer
>             Fix For: 0.7.0
>
>         Attachments: PIG-1293.txt
>
>
> If PIG_HOME isn't set and pig is in the path, the pig wrapper script can't
> find its home. Setting PIG_HOME makes it hard to support multiple versions
> of pig.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1284) pig UDF is lacking XMLLoader. Plan to add the XMLLoader
[ https://issues.apache.org/jira/browse/PIG-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1284.
---------------------------

> pig UDF is lacking XMLLoader. Plan to add the XMLLoader
> -------------------------------------------------------
>
>                 Key: PIG-1284
>                 URL: https://issues.apache.org/jira/browse/PIG-1284
>             Project: Pig
>          Issue Type: New Feature
>    Affects Versions: 0.7.0
>            Reporter: Alok Singh
>            Assignee: Alok Singh
>             Fix For: 0.7.0
>
>         Attachments: pigudf_xmlLoader.patch, pigudf_xmlLoader.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Hi All,
> We are planning to add the XMLLoader UDF in the piggybank repository.
> Here is the proposal with the user docs:
> The load function to load the XML file.
> This will implement the LoadFunc interface, which is used to parse records
> from a dataset. It takes an xmlTag as the argument, which it uses to split
> the input dataset into multiple records.
> For example, suppose the input xml (input.xml) contains one <property>
> element whose content spans the lines "foobar" and "barfoo", an element
> with some other tag containing "foo", and a second <property> element
> containing "justname" (the original markup was stripped from this message).
> And your pig script is like this:
> --load the jar files
> register loader.jar;
> -- load the dataset using XMLLoader
> -- A is the bag containing the tuple which contains one atom i.e. doc, see output
> A = load '/user/aloks/pig/input.xml' using loader.XMLLoader('property') as
> (doc:chararray);
> --dump the result
> dump A;
> Then you will get the output
> (
> foobar
> barfoo
> )
> (
> justname
> )
> where each () indicates one record.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1291) [zebra] Zebra needs to support the virtual column 'source_table' for unsorted table unions as well
[ https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1291.
---------------------------

> [zebra] Zebra needs to support the virtual column 'source_table' for
> unsorted table unions as well
> --------------------------------------------------------------------
>
>                 Key: PIG-1291
>                 URL: https://issues.apache.org/jira/browse/PIG-1291
>             Project: Pig
>          Issue Type: New Feature
>    Affects Versions: 0.7.0, 0.8.0
>            Reporter: Alok Singh
>            Assignee: Yan Zhou
>             Fix For: 0.7.0, 0.8.0
>
>         Attachments: PIG-1291.patch, PIG-1291.patch, PIG-1291.patch
>
>
> In the Pig contrib project Zebra, when users do a union of sorted tables,
> the resulting table contains a virtual column called 'source_table', which
> lets users know which original table each row of the result table came from.
> This feature is also very useful when the input tables are not sorted.
> Based on the discussion with the Zebra dev team, it should be easy to
> implement. I am filing this enhancement jira for Zebra.
> Alok

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
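A hedged sketch of how the virtual column is consumed today for sorted unions; the table paths, column name c1, and the two-argument TableLoader('projection', 'sorted') form are assumptions for illustration. This issue asks for the same 'source_table' projection to work when the union is unsorted:

{code}
-- union of two Zebra tables; project a data column plus the virtual
-- 'source_table' column, which carries the index of the source table
-- each row came from
A = LOAD 'table1,table2' USING org.apache.hadoop.zebra.pig.TableLoader('c1, source_table', 'sorted');
B = FOREACH A GENERATE c1, source_table;
dump B;
{code}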
[jira] Closed: (PIG-1292) Interface Refinements
[ https://issues.apache.org/jira/browse/PIG-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1292.
---------------------------

> Interface Refinements
> ---------------------
>
>                 Key: PIG-1292
>                 URL: https://issues.apache.org/jira/browse/PIG-1292
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>             Fix For: 0.7.0
>
>         Attachments: pig-1292.patch, pig-interfaces.patch
>
>
> A loader can't implement both OrderedLoadFunc and IndexableLoadFunc, as both
> are abstract classes instead of being interfaces.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1276) [Zebra] Changes required for Zebra due to PIG-1259 changes
[ https://issues.apache.org/jira/browse/PIG-1276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1276.
---------------------------

> [Zebra] Changes required for Zebra due to PIG-1259 changes
> ----------------------------------------------------------
>
>                 Key: PIG-1276
>                 URL: https://issues.apache.org/jira/browse/PIG-1276
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Xuefu Zhang
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: zebra.0304
>
>
> The Pig resource schema interface changed, so Zebra needs to catch
> exceptions thrown from the new interfaces.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1282) [zebra] make Zebra's pig test cases run on real cluster
[ https://issues.apache.org/jira/browse/PIG-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1282.
---------------------------

> [zebra] make Zebra's pig test cases run on real cluster
> -------------------------------------------------------
>
>                 Key: PIG-1282
>                 URL: https://issues.apache.org/jira/browse/PIG-1282
>             Project: Pig
>          Issue Type: Task
>    Affects Versions: 0.6.0
>            Reporter: Chao Wang
>            Assignee: Chao Wang
>             Fix For: 0.7.0
>
>         Attachments: PIG-1282.patch
>
>
> The goal of this task is to make Zebra's pig test cases run on a real
> cluster. Currently Zebra's pig test cases are mostly tested using
> MiniCluster. We want to use a real hadoop cluster to test them.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1287) Use hadoop-0.20.2 with pig 0.7.0 release
[ https://issues.apache.org/jira/browse/PIG-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1287.
---------------------------

> Use hadoop-0.20.2 with pig 0.7.0 release
> ----------------------------------------
>
>                 Key: PIG-1287
>                 URL: https://issues.apache.org/jira/browse/PIG-1287
>             Project: Pig
>          Issue Type: Task
>    Affects Versions: 0.7.0
>            Reporter: Pradeep Kamath
>            Assignee: Pradeep Kamath
>             Fix For: 0.7.0
>
>         Attachments: hadoop20.jar, PIG-1287-2.patch, PIG-1287.patch
>
>
> Use hadoop-0.20.2 with pig 0.7.0 release

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1272) Column pruner causes wrong results
[ https://issues.apache.org/jira/browse/PIG-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1272.
---------------------------

> Column pruner causes wrong results
> ----------------------------------
>
>                 Key: PIG-1272
>                 URL: https://issues.apache.org/jira/browse/PIG-1272
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.6.0
>            Reporter: Viraj Bhat
>            Assignee: Daniel Dai
>             Fix For: 0.7.0
>
>         Attachments: PIG-1272-1.patch, PIG-1272-2.patch
>
>
> For a simple script the column pruner optimization removes certain columns
> from the original relation, which results in wrong results.
> Input file "kv" contains the following columns (tab separated)
> {code}
> a 1
> a 2
> a 3
> b 4
> c 5
> c 6
> b 7
> d 8
> {code}
> Now running this script in Pig 0.6 produces
> {code}
> kv = load 'kv' as (k,v);
> keys= foreach kv generate k;
> keys = distinct keys;
> keys = limit keys 2;
> rejoin = join keys by k, kv by k;
> dump rejoin;
> {code}
> (a,a)
> (a,a)
> (a,a)
> (b,b)
> (b,b)
> Running this in the Pig 0.5 version without the column pruner results in:
> (a,a,1)
> (a,a,2)
> (a,a,3)
> (b,b,4)
> (b,b,7)
> When we disable the "ColumnPruner" optimization it gives right results.
> Viraj

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1267) Problems with partition filter optimizer
[ https://issues.apache.org/jira/browse/PIG-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1267.
---------------------------

> Problems with partition filter optimizer
> ----------------------------------------
>
>                 Key: PIG-1267
>                 URL: https://issues.apache.org/jira/browse/PIG-1267
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Richard Ding
>            Assignee: Richard Ding
>             Fix For: 0.7.0
>
>         Attachments: PIG-1267.patch
>
>
> There are a couple of problems with the current partition filter optimizer:
> 1. When a partition filter is removed from the logical plan, the input index
> of the following join/cogroup operator may change, which in turn changes the
> ordering of the fields in the schema and results in compile-time errors.
> 2. At most one partition filter can be removed per plan, while multiple
> partition filters can exist in the cases of joins.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
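A minimal sketch of the second problem; the loader name MyPartitionedLoader, the partition column datestamp, and the file names are hypothetical. Two pushed-down partition filters feed a join, but only one filter could be removed per plan, and removing it shifted the join's input index:

{code}
-- MyPartitionedLoader is a hypothetical loader implementing LoadMetadata,
-- so filters on the partition column 'datestamp' can be pushed into it
a  = LOAD '/data/logs1' USING MyPartitionedLoader();
b  = LOAD '/data/logs2' USING MyPartitionedLoader();
a1 = FILTER a BY datestamp == '20100301';  -- partition filter 1
b1 = FILTER b BY datestamp == '20100301';  -- partition filter 2
j  = JOIN a1 BY id, b1 BY id;  -- removing a filter changes the join's input index
dump j;
{code}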
[jira] Closed: (PIG-1263) Script producing varying number of records when COGROUPing value of map data type with and without types
[ https://issues.apache.org/jira/browse/PIG-1263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1263.
---------------------------

> Script producing varying number of records when COGROUPing value of map data
> type with and without types
> -----------------------------------------------------------------------------
>
>                 Key: PIG-1263
>                 URL: https://issues.apache.org/jira/browse/PIG-1263
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.6.0
>            Reporter: Viraj Bhat
>            Assignee: Daniel Dai
>             Fix For: 0.7.0
>
>
> I have a Pig script which I am experimenting upon. [[Albeit this is not
> optimized and can be done in variety of ways]] I get different record counts
> by placing load store pairs in the script.
> Case 1: Returns 424329 records
> Case 2: Returns 5859 records
> Case 3: Returns 5859 records
> Case 4: Returns 5578 records
> I am wondering what the correct result is?
> Here are the scripts.
> Case 1:
> {code}
> register udf.jar
> A = LOAD '/user/viraj/data/20100203' USING MapLoader() AS (s, m, l);
> B = FOREACH A GENERATE
> s#'key1' as key1,
> s#'key2' as key2;
> C = FOREACH B generate key2;
> D = filter C by (key2 IS NOT null);
> E = distinct D;
> store E into 'unique_key_list' using PigStorage('\u0001');
> F = Foreach E generate key2, MapGenerate(key2) as m;
> G = FILTER F by (m IS NOT null);
> H = foreach G generate key2, m#'id1' as id1, m#'id2' as id2, m#'id3' as id3,
> m#'id4' as id4, m#'id5' as id5, m#'id6' as id6, m#'id7' as id7, m#'id8' as
> id8, m#'id9' as id9, m#'id10' as id10, m#'id11' as id11, m#'id12' as id12;
> I = GROUP H BY (id1, id2, id3, id4, id5, id6, id7, id8, id9, id10, id11,
> id12);
> J = Foreach I generate group.id1 as id1, group.id2 as id2, group.id3 as id3,
> group.id4 as id4, group.id5 as id5, group.id6 as id6, group.id7 as id7,
> group.id8 as id8, group.id9 as id9, group.id10 as id10, group.id11 as id11,
> group.id12 as id12;
> --load previous days data
> K = LOAD '/user/viraj/data/20100202' USING PigStorage('\u0001') as (id1, id2,
> id3, id4, id5, id6, id7, id8, id9, id10, id11, id12);
> L = COGROUP K by (id1, id2, id3, id4, id5, id6, id7, id8, id9, id10, id11,
> id12) OUTER,
>     J by (id1, id2, id3, id4, id5, id6, id7, id8, id9, id10, id11,
> id12) OUTER;
> M = filter L by IsEmpty(K);
> store M into 'cogroupNoTypes' using PigStorage();
> {code}
> Case 2: Storing and loading intermediate results in J
> {code}
> register udf.jar
> A = LOAD '/user/viraj/data/20100203' USING MapLoader() AS (s, m, l);
> B = FOREACH A GENERATE
> s#'key1' as key1,
> s#'key2' as key2;
> C = FOREACH B generate key2;
> D = filter C by (key2 IS NOT null);
> E = distinct D;
> store E into 'unique_key_list' using PigStorage('\u0001');
> F = Foreach E generate key2, MapGenerate(key2) as m;
> G = FILTER F by (m IS NOT null);
> H = foreach G generate key2, m#'id1' as id1, m#'id2' as id2, m#'id3' as id3,
> m#'id4' as id4, m#'id5' as id5, m#'id6' as id6, m#'id7' as id7, m#'id8' as
> id8, m#'id9' as id9, m#'id10' as id10, m#'id11' as id11, m#'id12' as id12;
> I = GROUP H BY (id1, id2, id3, id4, id5, id6, id7, id8, id9, id10, id11,
> id12);
> J = Foreach I generate group.id1 as id1, group.id2 as id2, group.id3 as id3,
> group.id4 as id4, group.id5 as id5, group.id6 as id6, group.id7 as id7,
> group.id8 as id8, group.id9 as id9, group.id10 as id10, group.id11 as id11,
> group.id12 as id12;
> --store intermediate data to HDFS and re-read
> store J into 'output/20100203/J' using PigStorage('\u0001');
> --load previous days data
> K = LOAD '/user/viraj/data/20100202' USING PigStorage('\u0001') as (id1, id2,
> id3, id4, id5, id6, id7, id8, id9, id10, id11, id12);
> --read J into K1
> K1 = LOAD 'output/20100203/J' using PigStorage('\u0001') as (id1, id2, id3,
> id4, id5, id6, id7, id8, id9, id10, id11, id12);
> L = COGROUP K by (id1, id2, id3, id4, id5, id6, id7, id8, id9, id10, id11,
> id12) OUTER,
>     K1 by (id1, id2, id3, id4, id5, id6, id7, id8, id9, id10, id11,
> id12) OUTER;
> M = filter L by IsEmpty(K);
> store M into 'cogroupNoTypesIntStore' using PigStorage();
> {code}
> Case 3: Types information specified but no intermediate store of J
> {code}
> register udf.jar
> A = LOAD '/user/viraj/data/20100203' USING MapLoader() AS (s, m, l);
> B = FOREACH A GENERATE
> s#'key1' as key1,
> s#'key2' as key2;
> C = FOREACH B generate key2;
> D = filter C by (key2 IS NOT null);
> E = distinct D;
> store E into 'unique_key_list' using PigStorage('\u0001');
> F = Foreach E generate key2, MapGenerate(key2) as m;
> G = FILTER F by (m IS NOT null);
> H = foreach G generate key2, (long)m#'id1' as id1, (long)m#'id2' as id2,
> (long)m#'id3' as id3, (long)m#'id4' as id4, (long)m#'id5' as id5,
> (long)m#'id6' as id6, (long)m#'id7' as id
[jira] Closed: (PIG-1273) Skewed join throws error
[ https://issues.apache.org/jira/browse/PIG-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1273.
---------------------------

> Skewed join throws error
> ------------------------
>
>                 Key: PIG-1273
>                 URL: https://issues.apache.org/jira/browse/PIG-1273
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.6.0
>            Reporter: Ankur
>            Assignee: Richard Ding
>             Fix For: 0.7.0
>
>         Attachments: PIG-1273.patch
>
>
> When the sampled relation is too small or empty then skewed join fails.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
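A minimal sketch of the failing shape; the input file names and schema are hypothetical:

{code}
-- 'tiny_input' is empty or has very few rows; sampling it to build the
-- skew distribution is what caused the skewed join to fail
a = LOAD 'tiny_input' AS (k, v);
b = LOAD 'big_input' AS (k, w);
j = JOIN a BY k, b BY k USING "skewed";
dump j;
{code}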
[jira] Closed: (PIG-1275) empty bag in PigStorage read as null
[ https://issues.apache.org/jira/browse/PIG-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1275.
---------------------------

> empty bag in PigStorage read as null
> ------------------------------------
>
>                 Key: PIG-1275
>                 URL: https://issues.apache.org/jira/browse/PIG-1275
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Thejas M Nair
>            Assignee: Daniel Dai
>             Fix For: 0.7.0
>
>         Attachments: PIG-1275-1.patch, PIG-1275-2.patch
>
>
> This seems to be introduced after changes in PIG-613.
> grunt> cat /tmp/students.txt
> qwer F {(1),(2)}
> zxldf M {}
> grunt> l = load '/tmp/students.txt' as (n : chararray, s : chararray, b: {t :
> (i : int)} );
> grunt> dump l;
> (qwer,F,{(1),(2)})
> (zxldf,M,)
> grunt>

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1268) [Zebra] Need an ant target that runs all pig-related tests in Zebra
[ https://issues.apache.org/jira/browse/PIG-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1268.
---------------------------

> [Zebra] Need an ant target that runs all pig-related tests in Zebra
> -------------------------------------------------------------------
>
>                 Key: PIG-1268
>                 URL: https://issues.apache.org/jira/browse/PIG-1268
>             Project: Pig
>          Issue Type: Test
>          Components: build
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: zebra.0303
>
>
> Currently Pig checkins don't run any Zebra tests to make sure that Zebra is
> not broken. To make this happen, the Zebra build needs a test target that
> only runs pig-related tests. With this, Pig committers need to run "ant pig"
> for Zebra as part of the before-checkin sanity check. Ideally, this target
> should be triggered as part of Hudson.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1269) [Zebra] Restrict schema definition for collection
[ https://issues.apache.org/jira/browse/PIG-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1269.
---------------------------

> [Zebra] Restrict schema definition for collection
> -------------------------------------------------
>
>                 Key: PIG-1269
>                 URL: https://issues.apache.org/jira/browse/PIG-1269
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>             Fix For: 0.7.0
>
>         Attachments: zebra.0310
>
>
> Currently the Zebra grammar for the schema definition of a collection field
> allows many types of definition. To reduce complexity and remove ambiguity,
> and more importantly, to make the metadata more representative of the actual
> data instances, the grammar rules need to be changed. Only a record type is
> allowed and required for a collection definition. Thus,
> fieldName:collection(record(c1:int, c2:string)) is legal, while
> fieldName:collection(c1:int, c2:string),
> fieldName:collection(f:record(c1:int, c2:string)),
> fieldName:collection(c1:int), or fieldName:collection(int) is illegal.
> This will have some impact on existing Zebra M/R programs or Pig scripts
> that use Zebra. Schemas acceptable in the previous release may now become
> illegal because of this change. This should be clearly documented.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
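The rule above, restated as a sketch of Zebra schema strings (the field and column names are the issue's own examples; the comment markers are annotations, not part of the schema syntax):

{code}
-- legal: a collection must wrap exactly one record
fieldName:collection(record(c1:int, c2:string))

-- illegal after this change:
fieldName:collection(c1:int, c2:string)
fieldName:collection(f:record(c1:int, c2:string))
fieldName:collection(c1:int)
fieldName:collection(int)
{code}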
[jira] Closed: (PIG-1262) Additional findbugs and javac warnings
[ https://issues.apache.org/jira/browse/PIG-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1262.
---------------------------

> Additional findbugs and javac warnings
> --------------------------------------
>
>                 Key: PIG-1262
>                 URL: https://issues.apache.org/jira/browse/PIG-1262
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.7.0
>
>         Attachments: PIG-1262-1.patch, PIG-1262-2.patch
>
>
> Over time we have introduced some new findbugs and javac warnings. They
> will be fixed in this Jira.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1265) Change LoadMetadata and StoreMetadata to use Job instead of Configuration and add a cleanupOnFailure method to StoreFuncInterface
[ https://issues.apache.org/jira/browse/PIG-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1265.
---------------------------

> Change LoadMetadata and StoreMetadata to use Job instead of Configuration
> and add a cleanupOnFailure method to StoreFuncInterface
> -------------------------------------------------------------------------
>
>                 Key: PIG-1265
>                 URL: https://issues.apache.org/jira/browse/PIG-1265
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Pradeep Kamath
>            Assignee: Pradeep Kamath
>             Fix For: 0.7.0
>
>         Attachments: PIG-1265-2.patch, PIG-1265.patch
>
>
> Speaking to the hadoop team folks, the direction in hadoop is to use Job
> instead of Configuration - for example, InputFormat/OutputFormat
> implementations use Job to store the input/output location. So pig should
> also do the same in LoadMetadata and StoreMetadata to be closer to hadoop.
> Currently, when a job fails, pig assumes the output locations (corresponding
> to the stores in the job) are hdfs locations and attempts to delete them.
> Since output locations could be non-hdfs locations, this cleanup should be
> delegated to the StoreFuncInterface implementation - hence a new method,
> cleanupOnFailure(), should be introduced in StoreFuncInterface and a default
> implementation should be provided in the StoreFunc abstract class which
> checks if the location exists on hdfs and deletes it if so.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1260) Param Substitution results in parser error if there is no EOL after last line in script
[ https://issues.apache.org/jira/browse/PIG-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1260.
---------------------------

> Param Substitution results in parser error if there is no EOL after last
> line in script
> ------------------------------------------------------------------------
>
>                 Key: PIG-1260
>                 URL: https://issues.apache.org/jira/browse/PIG-1260
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.7.0
>            Reporter: Ashutosh Chauhan
>            Assignee: Richard Ding
>             Fix For: 0.7.0
>
>         Attachments: PIG-1260.patch
>
>
> {noformat}
> A = load '$INPUT' using PigStorage(':');
> B = foreach A generate $0 as id;
> store B into '$OUTPUT' USING PigStorage();
> {noformat}
> Invoking the above script, which contains no EOL in its last line, as
> follows:
> {noformat}
> pig -param INPUT=mydata/input -param OUTPUT=mydata/output myscript.pig
> {noformat}
> results in a parser error:
> {noformat}
> [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during
> parsing. Lexical error at line 3, column 42. Encountered: <EOF> after : ""
> {noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1259) ResourceFieldSchema.setSchema should not allow a bag field without a Tuple as its only sub field (the tuple itself can have a schema with > 1 subfields)
[ https://issues.apache.org/jira/browse/PIG-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1259.
---------------------------

> ResourceFieldSchema.setSchema should not allow a bag field without a Tuple
> as its only sub field (the tuple itself can have a schema with > 1 subfields)
> --------------------------------------------------------------------------
>
>                 Key: PIG-1259
>                 URL: https://issues.apache.org/jira/browse/PIG-1259
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Pradeep Kamath
>            Assignee: Pradeep Kamath
>             Fix For: 0.7.0
>
>         Attachments: PIG-1259-2.patch, PIG-1259.patch
>
>
> Currently Schema.getPigSchema(ResourceSchema) does not allow a bag field in
> the ResourceSchema with a subschema containing anything other than a tuple.
> The tuple itself can have a schema with > 1 subfields. This check should
> also be enforced in ResourceFieldSchema.setSchema().

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
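A sketch of the constraint in Pig schema syntax; the aliases and file name are hypothetical. The bag's subschema must be a single tuple, though that tuple may itself carry more than one field:

{code}
-- accepted: bag b contains one tuple t, which itself has two fields
a = LOAD 'data' AS (b: {t: (x: int, y: chararray)});

-- should be rejected by ResourceFieldSchema.setSchema(): a bag whose
-- subschema is an int rather than a single tuple
-- a = LOAD 'data' AS (b: {x: int});
{code}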
[jira] Closed: (PIG-1266) Show spill count on the pig console at the end of the job
[ https://issues.apache.org/jira/browse/PIG-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1266.
---------------------------

> Show spill count on the pig console at the end of the job
> ---------------------------------------------------------
>
>                 Key: PIG-1266
>                 URL: https://issues.apache.org/jira/browse/PIG-1266
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Sriranjan Manjunath
>            Assignee: Sriranjan Manjunath
>             Fix For: 0.7.0
>
>         Attachments: PIG_1266.patch
>
>
> Currently the spill count is displayed only in the job tracker log. It
> should be displayed on the console as well.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1264) Skewed join sampler misses out the key with the highest frequency
[ https://issues.apache.org/jira/browse/PIG-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1264.
---------------------------

> Skewed join sampler misses out the key with the highest frequency
> -----------------------------------------------------------------
>
>                 Key: PIG-1264
>                 URL: https://issues.apache.org/jira/browse/PIG-1264
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Sriranjan Manjunath
>            Assignee: Richard Ding
>             Fix For: 0.7.0
>
>
> I am noticing two issues with the sampler used in skewed join:
> 1. It does not allocate multiple reducers to the key with the highest
> frequency.
> 2. It seems to be allocating the same number of reducers to every key (8 in
> this case).
> Query:
> a = load 'studenttab10k' using PigStorage() as (name, age, gpa);
> b = load 'votertab10k' as (name, age, registration, contributions);
> e = join a by name right, b by name using "skewed" parallel 8;
> store e into 'SkewedJoin_9.out';

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files
[ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1257.
---------------------------

> PigStorage per the new load-store redesign should support splitting of bzip
> files
> ---------------------------------------------------------------------------
>
>                 Key: PIG-1257
>                 URL: https://issues.apache.org/jira/browse/PIG-1257
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Pradeep Kamath
>            Assignee: Pradeep Kamath
>             Fix For: 0.7.0
>
>         Attachments: blockEndingInCR.txt.bz2,
> blockHeaderEndsAt136500.txt.bz2, PIG-1257-2.patch, PIG-1257-3.patch,
> PIG-1257.patch, recordLossblockHeaderEndsAt136500.txt.bz2
>
>
> PigStorage implemented per the new load-store redesign (PIG-966) is based on
> TextInputFormat for reading data. TextInputFormat has support for reading
> bzip data but without support for splitting bzip files. In pig 0.6, splitting
> was enabled for bzip files - we should attempt to enable that feature.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1261) PigStorageSchema broke after changes to ResourceSchema
[ https://issues.apache.org/jira/browse/PIG-1261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1261.
---------------------------

> PigStorageSchema broke after changes to ResourceSchema
> ------------------------------------------------------
>
>                 Key: PIG-1261
>                 URL: https://issues.apache.org/jira/browse/PIG-1261
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.7.0
>            Reporter: Daniel Dai
>            Assignee: Dmitriy V. Ryaboy
>             Fix For: 0.7.0
>
>         Attachments: PIG_1261.diff, TestPigStorageSchema.java
>
>
> After we added a new method "getCastString" to ResourceSchema,
> TestPigStorageSchema began to fail. It seems PigStorageSchema tries to
> serialize the cast string into the schema. If I change the name of the
> method from "getCastString" to "genCastString", the error message goes
> away. Since Dmitriy is the author of TestPigStorageSchema, I need his help
> to check whether this is the right approach to fix it.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1255) Tiny code cleanup for serialization code for PigSplit
[ https://issues.apache.org/jira/browse/PIG-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1255.
---------------------------

> Tiny code cleanup for serialization code for PigSplit
> -----------------------------------------------------
>
>                 Key: PIG-1255
>                 URL: https://issues.apache.org/jira/browse/PIG-1255
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.6.0
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.7.0
>
>         Attachments: PIG-1255-1.patch, PIG-1255-2.patch
>
>
> A bug which closes the output stream during serialization.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1256) [Zebra] Bag field should always contain a tuple type as the field schema in ResourceSchema object converted from Zebra Schema
[ https://issues.apache.org/jira/browse/PIG-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1256.
---------------------------

> [Zebra] Bag field should always contain a tuple type as the field schema in
> ResourceSchema object converted from Zebra Schema
> ---------------------------------------------------------------------------
>
>                 Key: PIG-1256
>                 URL: https://issues.apache.org/jira/browse/PIG-1256
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.7.0
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: patch.0223
>
>
> In the 0.7 release, Pig now requires that a schema converted from a Zebra
> schema contains the Tuple type as the field schema. Zebra needs to take
> care of all cases where Record is not explicitly specified.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1258) [zebra] Number of sorted input splits is unusually high
[ https://issues.apache.org/jira/browse/PIG-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1258.
---------------------------

> [zebra] Number of sorted input splits is unusually high
> -------------------------------------------------------
>
>                 Key: PIG-1258
>                 URL: https://issues.apache.org/jira/browse/PIG-1258
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.6.0
>            Reporter: Yan Zhou
>             Fix For: 0.7.0
>
>         Attachments: PIG-1258.patch
>
>
> The number of sorted input splits is unusually high if the projections are
> on multiple column groups, or a union of tables, or column group(s) that
> hold many small tfiles. In one test, the number is about 100 times bigger
> than that from unsorted input splits on the same input tables.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1251) Move SortInfo calculation earlier in compilation
[ https://issues.apache.org/jira/browse/PIG-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1251.
---------------------------

> Move SortInfo calculation earlier in compilation
> ------------------------------------------------
>
>                 Key: PIG-1251
>                 URL: https://issues.apache.org/jira/browse/PIG-1251
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>             Fix For: 0.7.0
>
>         Attachments: pig-1251.patch, pig-1251_1.patch
>
>
> In LSR, Pig does input/output validation by calling hadoop's checkSpecs().
> A storefunc might need the schema to do such a validation, so we should
> call checkSchema() before doing the validation. checkSchema() in turn
> requires SortInfo, which is calculated later in the compilation phase. We
> need to move it earlier in the compilation phase.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1252) Diamond splitter does not generate correct results when using Multi-query optimization
[ https://issues.apache.org/jira/browse/PIG-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1252.
---------------------------

> Diamond splitter does not generate correct results when using Multi-query
> optimization
> -------------------------------------------------------------------------
>
>                 Key: PIG-1252
>                 URL: https://issues.apache.org/jira/browse/PIG-1252
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.6.0
>            Reporter: Viraj Bhat
>            Assignee: Richard Ding
>             Fix For: 0.7.0
>
>         Attachments: PIG-1252-2.patch, PIG-1252.patch
>
>
> I have a script which uses SPLIT but does not use one of the split branches.
> The skeleton of the script is as follows
> {code}
> loadData = load '/user/viraj/zebradata' using
> org.apache.hadoop.zebra.pig.TableLoader('col1,col2, col3, col4, col5, col6,
> col7');
> prjData = FOREACH loadData GENERATE (chararray) col1, (chararray) col2,
> (chararray) col3, (chararray) ((col4 is not null and col4 != '') ? col4 :
> ((col5 is not null) ? col5 : '')) as splitcond, (chararray) (col6 == 'c' ? 1
> : IS_VALID ('200', '0', '0', 'input.txt')) as validRec;
> SPLIT prjData INTO trueDataTmp IF (validRec == '1' AND splitcond != ''),
> falseDataTmp IF (validRec == '1' AND splitcond == '');
> grpData = GROUP trueDataTmp BY splitcond;
> finalData = FOREACH grpData {
>    orderedData = ORDER trueDataTmp BY col1,col2;
>    GENERATE FLATTEN ( MYUDF (orderedData, 60,
> 1800, 'input.txt', 'input.dat','20100222','5', 'debug_on')) as (s,m,l);
> }
> dump finalData;
> {code}
> You can see that "falseDataTmp" is untouched.
> When I run this script with the no-Multiquery (-M) option I get the right
> result. This could be the result of complex BinConds in the POLoad. We can
> get rid of this error by using FILTER instead of SPLIT.
> Viraj

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
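The workaround the report suggests, sketched against the script's own aliases: replace the SPLIT with two FILTER statements, so the unused branch is simply never defined:

{code}
-- equivalent to the SPLIT in the script above, but behaves correctly
-- under multi-query optimization
trueDataTmp  = FILTER prjData BY (validRec == '1' AND splitcond != '');
falseDataTmp = FILTER prjData BY (validRec == '1' AND splitcond == '');
{code}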
[jira] Closed: (PIG-1253) [zebra] make map/reduce test cases run on real cluster
[ https://issues.apache.org/jira/browse/PIG-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1253. --- > [zebra] make map/reduce test cases run on real cluster > -- > > Key: PIG-1253 > URL: https://issues.apache.org/jira/browse/PIG-1253 > Project: Pig > Issue Type: Task >Affects Versions: 0.6.0 >Reporter: Chao Wang >Assignee: Chao Wang > Fix For: 0.7.0 > > Attachments: PIG-1253-0.6.patch, PIG-1253.patch, PIG-1253.patch > > > The goal of this task is to make map/reduce test cases run on real cluster. > Currently map/reduce test cases are mostly tested under local mode. When > running on real cluster, all involved jars have to be manually deployed in > advance which is not desired. > The major change here is to support -libjars option to be able to ship user > jars to backend automatically. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1250) Make StoreFunc an abstract class and create a mirror interface called StoreFuncInterface
[ https://issues.apache.org/jira/browse/PIG-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1250. --- > Make StoreFunc an abstract class and create a mirror interface called > StoreFuncInterface > > > Key: PIG-1250 > URL: https://issues.apache.org/jira/browse/PIG-1250 > Project: Pig > Issue Type: Sub-task >Affects Versions: 0.7.0 >Reporter: Pradeep Kamath >Assignee: Pradeep Kamath > Fix For: 0.7.0 > > Attachments: PIG-1250-2.patch, PIG-1250.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1238) Dump does not respect the schema
[ https://issues.apache.org/jira/browse/PIG-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1238. --- > Dump does not respect the schema > > > Key: PIG-1238 > URL: https://issues.apache.org/jira/browse/PIG-1238 > Project: Pig > Issue Type: Bug >Affects Versions: 0.6.0 >Reporter: Ankur >Assignee: Richard Ding > Fix For: 0.7.0 > > Attachments: PIG-1238.patch > > > For complex data type and certain sequence of operations dump produces > results with non-existent field in the relation. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1243) Passing Complex map types to and from streaming causes a problem
[ https://issues.apache.org/jira/browse/PIG-1243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1243. --- > Passing Complex map types to and from streaming causes a problem > > > Key: PIG-1243 > URL: https://issues.apache.org/jira/browse/PIG-1243 > Project: Pig > Issue Type: Bug >Affects Versions: 0.6.0 >Reporter: Viraj Bhat >Assignee: Richard Ding > Fix For: 0.7.0 > > > I have a program which generates different types of Map fields and stores them > into PigStorage. > {code} > A = load '/user/viraj/three.txt' using PigStorage(); > B = foreach A generate ['a'#'12'] as b:map[], ['b'#['c'#'12']] as c, > ['c'#{(['d'#'15']),(['e'#'16'])}] as d; > store B into '/user/viraj/pigtest' using PigStorage(); > {code} > Now I test the previous output in the below script to make sure I have the > right results. I also pass this data to a Perl script and I observe that the > complex Map types I have generated are lost when I get the result back. > {code} > DEFINE CMD `simple.pl` SHIP('simple.pl'); > A = load '/user/viraj/pigtest' using PigStorage() as (simpleFields, > mapFields, mapListFields); > B = foreach A generate $0, $1, $2; > dump B; > C = foreach A generate (chararray)simpleFields#'a' as value, $0,$1,$2; > D = stream C through CMD as (a0:map[], a1:map[], a2:map[]); > dump D; > {code} > dumping B results in: > ([a#12],[b#[c#12]],[c#{([d#15]),([e#16])}]) > ([a#12],[b#[c#12]],[c#{([d#15]),([e#16])}]) > ([a#12],[b#[c#12]],[c#{([d#15]),([e#16])}]) > dumping D results in: > ([a#12],,) > ([a#12],,) > ([a#12],,) > The Perl script used here is: > {code} > #!/usr/local/bin/perl > use warnings; > use strict; > while(<>) { > my($bc,$s,$m,$l)=split/\t/; > print("$s\t$m\t$l"); > } > {code} > Is there an issue with handling of complex Map fields within streaming? How > can I fix this to obtain the right result? > Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1240) [Zebra] suggestion to have zebra manifest file contain version and svn-revision etc.
[ https://issues.apache.org/jira/browse/PIG-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1240. --- > [Zebra] suggestion to have zebra manifest file contain version and > svn-revision etc. > - > > Key: PIG-1240 > URL: https://issues.apache.org/jira/browse/PIG-1240 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.7.0 >Reporter: Gaurav Jain >Assignee: Gaurav Jain >Priority: Minor > Fix For: 0.7.0 > > Attachments: PIG-1240.patch > > > Zebra jars' manifest file should contain version and > svn-revision etc. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1226) Need to be able to register jars on the command line
[ https://issues.apache.org/jira/browse/PIG-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1226. --- > Need to be able to register jars on the command line > > > Key: PIG-1226 > URL: https://issues.apache.org/jira/browse/PIG-1226 > Project: Pig > Issue Type: Bug >Reporter: Alan Gates >Assignee: Thejas M Nair > Fix For: 0.7.0 > > Attachments: PIG-1126.patch > > > Currently 'register' can only be done inside a Pig Latin script. Users often > run their scripts in different environments, so jar locations or versions may > change. But they don't want to edit their script to fit each environment. > Instead they could register on the command line, something like: > pig -Dpig.additional.jars=my.jar:your.jar script.pig > These would not override registers in the Pig Latin script itself. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
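The colon-separated property proposed above (`pig -Dpig.additional.jars=my.jar:your.jar script.pig`) can be handled with ordinary string splitting; a minimal sketch in Java, where `parseAdditionalJars` is a hypothetical helper, not Pig's actual implementation:

```java
import java.util.Arrays;
import java.util.List;

public class AdditionalJarsDemo {
    // Illustrative only: split a colon-separated -Dpig.additional.jars
    // value into individual jar paths, as the issue proposes.
    static List<String> parseAdditionalJars(String propertyValue) {
        if (propertyValue == null || propertyValue.isEmpty()) {
            return Arrays.asList();
        }
        return Arrays.asList(propertyValue.split(":"));
    }

    public static void main(String[] args) {
        // e.g. the value passed via -Dpig.additional.jars=my.jar:your.jar
        System.out.println(parseAdditionalJars("my.jar:your.jar"));
    }
}
```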
[jira] Closed: (PIG-1248) [piggybank] useful String functions
[ https://issues.apache.org/jira/browse/PIG-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1248. --- > [piggybank] useful String functions > --- > > Key: PIG-1248 > URL: https://issues.apache.org/jira/browse/PIG-1248 > Project: Pig > Issue Type: New Feature >Reporter: Dmitriy V. Ryaboy >Assignee: Dmitriy V. Ryaboy > Fix For: 0.7.0 > > Attachments: PIG_1248.diff, PIG_1248.diff, PIG_1248.diff > > > Pig ships with very few evalFuncs for working with strings. This jira is for > adding a few more. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1233) NullPointerException in AVG
[ https://issues.apache.org/jira/browse/PIG-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1233. --- > NullPointerException in AVG > > > Key: PIG-1233 > URL: https://issues.apache.org/jira/browse/PIG-1233 > Project: Pig > Issue Type: Bug >Affects Versions: 0.6.0 >Reporter: Ankur >Assignee: Ankur > Fix For: 0.7.0 > > Attachments: jira-1233.patch > > > The overridden method - getValue() in AVG throws a null pointer exception in > case accumulate() is not called, leaving the variable 'intermediateCount' > initialized to null. This causes Java to throw the exception when it tries to > 'unbox' the value for numeric comparison. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
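The failure mode described here — auto-unboxing a still-null boxed counter during a numeric comparison — can be reproduced in plain Java. A minimal sketch (this is not Pig's actual AVG code; the field and method names only mirror the ones named in the issue):

```java
public class UnboxNpeDemo {
    // Mirrors the bug: intermediateCount stays null when accumulate()
    // was never called.
    static Long intermediateCount = null;

    static double getValueUnsafe() {
        // Implicit unboxing of a null Long in the comparison below
        // throws NullPointerException.
        return intermediateCount > 0 ? 1.0 : 0.0;
    }

    static Double getValueSafe() {
        // Null-guard before any numeric comparison avoids the NPE.
        if (intermediateCount == null || intermediateCount == 0) {
            return null;
        }
        return 1.0;
    }

    public static void main(String[] args) {
        System.out.println(getValueSafe()); // null: no data accumulated
        try {
            getValueUnsafe();
        } catch (NullPointerException e) {
            System.out.println("NPE from unboxing null Long");
        }
    }
}
```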
[jira] Closed: (PIG-1220) Document unknown keywords as missing or to do in future
[ https://issues.apache.org/jira/browse/PIG-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1220. --- > Document unknown keywords as missing or to do in future > --- > > Key: PIG-1220 > URL: https://issues.apache.org/jira/browse/PIG-1220 > Project: Pig > Issue Type: Bug > Components: documentation >Affects Versions: 0.6.0 >Reporter: Viraj Bhat > Fix For: 0.7.0 > > > To get help at the grunt shell I do the following: > grunt>touchz > 010-02-04 00:59:28,714 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR > 1000: Error during parsing. Encountered " "touchz "" at line 1, > column 1. > Was expecting one of: > > "cat" ... > "fs" ... > "cd" ... > "cp" ... > "copyFromLocal" ... > "copyToLocal" ... > "dump" ... > "describe" ... > "aliases" ... > "explain" ... > "help" ... > "kill" ... > "ls" ... > "mv" ... > "mkdir" ... > "pwd" ... > "quit" ... > "register" ... > "rm" ... > "rmf" ... > "set" ... > "illustrate" ... > "run" ... > "exec" ... > "scriptDone" ... > "" ... > ... > ";" ... > I looked at the code and found that we do nothing at: > "scriptDone": Is there some future value of that command. > Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1241) Accumulator is turned on when a map is used with a non-accumulative UDF
[ https://issues.apache.org/jira/browse/PIG-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1241. --- > Accumulator is turned on when a map is used with a non-accumulative UDF > --- > > Key: PIG-1241 > URL: https://issues.apache.org/jira/browse/PIG-1241 > Project: Pig > Issue Type: Bug >Affects Versions: 0.6.0 >Reporter: Ying He >Assignee: Ying He > Fix For: 0.7.0 > > Attachments: accum.patch > > > Exception is thrown for a script like the following: > register /homes/yinghe/owl/string.jar; > a = load 'a.txt' as (id, url); > b = group a by (id, url); > c = foreach b generate COUNT(a), (CHARARRAY) > string.URLPARSE(group.url)#'url'; > dump c; > In this query, URLPARSE() is not accumulative, and it returns a map. > The accumulator optimizer failed to check UDF in this case, and tries to run > the job in accumulative mode. ClassCastException is thrown when trying to > cast UDF into Accumulator interface. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
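The ClassCastException above comes from casting a non-accumulative UDF to the Accumulator interface; the guard is an instanceof check before the cast. A self-contained sketch with stand-in types (Pig's real types are org.apache.pig.Accumulator and EvalFunc; everything below is simplified for illustration):

```java
// Stand-in for Pig's Accumulator interface.
interface Accumulator { void accumulate(Object input); }

// A UDF that is NOT accumulative (like URLPARSE in the report).
class UrlParseFunc { }

// A UDF that genuinely supports accumulative mode.
class AccumCountFunc implements Accumulator {
    public void accumulate(Object input) { /* fold input in */ }
}

public class AccumulatorCheckDemo {
    // The optimizer must verify the cast is legal before deciding to
    // run the job in accumulative mode; otherwise the later cast
    // throws ClassCastException, as described in this issue.
    static boolean isAccumulative(Object udf) {
        return udf instanceof Accumulator;
    }

    public static void main(String[] args) {
        System.out.println(isAccumulative(new AccumCountFunc())); // true
        System.out.println(isAccumulative(new UrlParseFunc()));   // false
    }
}
```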
[jira] Closed: (PIG-1224) Collected group should change to use new (internal) bag
[ https://issues.apache.org/jira/browse/PIG-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1224. --- > Collected group should change to use new (internal) bag > --- > > Key: PIG-1224 > URL: https://issues.apache.org/jira/browse/PIG-1224 > Project: Pig > Issue Type: Improvement >Reporter: Olga Natkovich >Assignee: Ashutosh Chauhan > Fix For: 0.7.0 > > Attachments: pig-1224.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1218) Use distributed cache to store samples
[ https://issues.apache.org/jira/browse/PIG-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1218. --- > Use distributed cache to store samples > -- > > Key: PIG-1218 > URL: https://issues.apache.org/jira/browse/PIG-1218 > Project: Pig > Issue Type: Improvement >Reporter: Olga Natkovich >Assignee: Richard Ding > Fix For: 0.7.0 > > Attachments: PIG-1218.patch, PIG-1218_2.patch, PIG-1218_3.patch > > > Currently, in the case of skew join and order by, we use a sample that is just > written to the dfs (not the distributed cache) and, as a result, it gets opened and > copied around more than necessary. This impacts query performance and also > places unnecessary load on the name node. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1234) Unable to create input slice for har:// files
[ https://issues.apache.org/jira/browse/PIG-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1234. --- > Unable to create input slice for har:// files > - > > Key: PIG-1234 > URL: https://issues.apache.org/jira/browse/PIG-1234 > Project: Pig > Issue Type: Bug >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Pradeep Kamath > Fix For: 0.7.0 > > Attachments: PIG-1234.patch > > > Tried to load har:// files > {noformat} > grunt> a = LOAD 'har://hdfs-namenode/user/tsz/t20.har/t20' USING > PigStorage('\n') AS (line); > grunt> dump > {noformat} > but pig says > {noformat} > 2010-02-10 18:42:20,750 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR > 2118: > Unable to create input slice for: har://hdfs-namenode/user/tsz/t20.har/t20 > {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1230) Streaming input in POJoinPackage should use nonspillable bag to collect tuples
[ https://issues.apache.org/jira/browse/PIG-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1230. --- > Streaming input in POJoinPackage should use nonspillable bag to collect tuples > -- > > Key: PIG-1230 > URL: https://issues.apache.org/jira/browse/PIG-1230 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.6.0 >Reporter: Ashutosh Chauhan >Assignee: Ashutosh Chauhan > Fix For: 0.7.0 > > Attachments: pig-1230.patch, pig-1230_1.patch, pig-1230_2.patch > > > The last table of a join statement is streamed through instead of collecting all > its tuples in a bag. As a further optimization of that, tuples of that > relation are collected in chunks in a bag. Since we don't want to spill the > tuples from this bag, NonSpillableBag should be used to hold tuples for this > relation. Initially, DefaultDataBag was used, which was later changed to > InternalCachedBag as a part of PIG-1209. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1216) New load store design does not allow Pig to validate inputs and outputs up front
[ https://issues.apache.org/jira/browse/PIG-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1216. --- > New load store design does not allow Pig to validate inputs and outputs up > front > > > Key: PIG-1216 > URL: https://issues.apache.org/jira/browse/PIG-1216 > Project: Pig > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Alan Gates >Assignee: Ashutosh Chauhan > Fix For: 0.7.0 > > Attachments: pig-1216.patch, pig-1216_1.patch > > > In Pig 0.6 and before, Pig attempts to verify existence of inputs and > non-existence of outputs during parsing to avoid run time failures when > inputs don't exist or outputs can't be overwritten. The downside to this was > that Pig assumed all inputs and outputs were HDFS files, which made > implementation harder for non-HDFS based load and store functions. In the > load store redesign (PIG-966) this was delegated to InputFormats and > OutputFormats to avoid this problem and to make use of the checks already > being done in those implementations. Unfortunately, for Pig Latin scripts > that run more than one MR job, this does not work well. MR does not do > input/output verification on all the jobs at once. It does them one at a > time. So if a Pig Latin script results in 10 MR jobs and the file to store > to at the end already exists, the first 9 jobs will be run before the 10th > job discovers that the whole thing was doomed from the beginning. > To avoid this a validate call needs to be added to the new LoadFunc and > StoreFunc interfaces. Pig needs to pass this method enough information that > the load function implementer can delegate to InputFormat.getSplits() and the > store function implementer to OutputFormat.checkOutputSpecs() if s/he decides > to.
Since 90% of all load and store functions use HDFS and PigStorage will > also need to, the Pig team should implement a default file existence check on > HDFS and make it available as a static method to other Load/Store function > implementers. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1217) [piggybank] evaluation.util.Top is broken
[ https://issues.apache.org/jira/browse/PIG-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1217. --- > [piggybank] evaluation.util.Top is broken > - > > Key: PIG-1217 > URL: https://issues.apache.org/jira/browse/PIG-1217 > Project: Pig > Issue Type: Bug >Affects Versions: 0.3.0, 0.4.0, site, 0.5.0, 0.6.0, 0.7.0 >Reporter: Dmitriy V. Ryaboy >Assignee: Dmitriy V. Ryaboy >Priority: Minor > Fix For: 0.7.0 > > Attachments: fix_top_udf.diff, fix_top_udf.diff, fix_top_udf.diff > > > The Top udf has been broken for a while, due to an incorrect implementation > of getArgToFuncMapping. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1215) Make Hadoop jobId more prominent in the client log
[ https://issues.apache.org/jira/browse/PIG-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1215. --- > Make Hadoop jobId more prominent in the client log > -- > > Key: PIG-1215 > URL: https://issues.apache.org/jira/browse/PIG-1215 > Project: Pig > Issue Type: Improvement >Reporter: Olga Natkovich >Assignee: Ashutosh Chauhan > Fix For: 0.7.0 > > Attachments: pig-1215.patch, pig-1215.patch, pig-1215_1.patch, > pig-1215_3.patch, pig-1215_4.patch > > > This is a request from applications that want to be able to programmatically > parse client logs to find Hadoop job IDs. > They would like to see each job id on a separate line in the following format: > hadoopJobId: job_123456789 > They would also like to see the jobs in the order they are executed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
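With one id per line in the requested `hadoopJobId: job_...` format, a downstream application can pull the ids out with a simple multiline regex. A sketch of such a consumer (the class and method names are hypothetical, not part of Pig):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JobIdLogParser {
    // Matches the one-id-per-line format requested in this issue:
    //   hadoopJobId: job_123456789
    private static final Pattern JOB_LINE =
            Pattern.compile("^hadoopJobId: (job_\\S+)$", Pattern.MULTILINE);

    static List<String> extractJobIds(String clientLog) {
        List<String> ids = new ArrayList<>();
        Matcher m = JOB_LINE.matcher(clientLog);
        while (m.find()) {
            ids.add(m.group(1)); // ids come back in execution order
        }
        return ids;
    }

    public static void main(String[] args) {
        String log = "setting up job\n"
                   + "hadoopJobId: job_123456789\n"
                   + "hadoopJobId: job_987654321\n"
                   + "done\n";
        System.out.println(extractJobIds(log));
    }
}
```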
[jira] Closed: (PIG-1207) [zebra] Data sanity check should be performed at the end of writing instead of later at query time
[ https://issues.apache.org/jira/browse/PIG-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1207. --- > [zebra] Data sanity check should be performed at the end of writing instead > of later at query time > --- > > Key: PIG-1207 > URL: https://issues.apache.org/jira/browse/PIG-1207 > Project: Pig > Issue Type: Improvement >Reporter: Yan Zhou >Assignee: Yan Zhou > Fix For: 0.7.0 > > Attachments: PIG-1207.patch, PIG-1207.patch > > > Currently the equality check of the number of rows across different column groups > is performed by the query. The error info is sketchy and only emits a > "Column groups are not evenly distributed", or worse, throws an > IndexOutOfBounds exception from CGScanner.getCGValue, since BasicTable.atEnd > and BasicTable.getKey, which are called just before BasicTable.getValue, only > check the first column group in the projection; any discrepancy in the number > of rows per file across multiple column groups in the projection could have > BasicTable.atEnd return false and BasicTable.getKey return a key normally > while another column group has already exhausted its current file, so the call to its > CGScanner.getCGValue throws the exception. > This check should also be performed at the end of writing and the error info > should be more informative. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1209) Port POJoinPackage to proactively spill
[ https://issues.apache.org/jira/browse/PIG-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1209. --- > Port POJoinPackage to proactively spill > --- > > Key: PIG-1209 > URL: https://issues.apache.org/jira/browse/PIG-1209 > Project: Pig > Issue Type: Bug >Reporter: Sriranjan Manjunath >Assignee: Ashutosh Chauhan > Fix For: 0.7.0 > > Attachments: pig-1209.patch > > > POPackage proactively spills the bag whereas POJoinPackage still uses the > SpillableMemoryManager. We should port this to use InternalCacheBag which > proactively spills. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1212) LogicalPlan.replaceAndAddSucessors produce wrong result when successors are null
[ https://issues.apache.org/jira/browse/PIG-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1212. --- > LogicalPlan.replaceAndAddSucessors produce wrong result when successors are > null > > > Key: PIG-1212 > URL: https://issues.apache.org/jira/browse/PIG-1212 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.6.0 >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.7.0 > > Attachments: PIG-1212-1.patch, PIG-1212-2.patch > > > The following script throw a NPE: > a = load '1.txt' as (a0:chararray); > b = load '2.txt' as (b0:chararray); > c = join a by a0, b by b0; > d = filter c by a0 == 'a'; > explain d; -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1204) Pig hangs when joining two streaming relations in local mode
[ https://issues.apache.org/jira/browse/PIG-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1204. --- > Pig hangs when joining two streaming relations in local mode > > > Key: PIG-1204 > URL: https://issues.apache.org/jira/browse/PIG-1204 > Project: Pig > Issue Type: Bug >Affects Versions: 0.6.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.7.0 > > Attachments: PIG-1204.patch > > > The following script hangs running in local mode when input files contain > many lines (e.g. 10K). The same script works when running in MR mode. > {code} > A = load 'input1' as (a0, a1, a2); > B = stream A through `head -1` as (a0, a1, a2); > C = load 'input2' as (a0, a1, a2); > D = stream C through `head -1` as (a0, a1, a2); > E = join B by a0, D by a0; > dump E > {code} > Here is one stack trace: > "Thread-13" prio=10 tid=0x09938400 nid=0x1232 in Object.wait() > [0x8fffe000..0x8030] >java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > - waiting on <0x9b8e0a40> (a > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStream) > at java.lang.Object.wait(Object.java:485) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStream.getNextHelper(POStream.java:291) > - locked <0x9b8e0a40> (a > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStream) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStream.getNext(POStream.java:214) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:272) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:162) > at >
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:232) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:227) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:52) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) > at > org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:176) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1203) Temporarily disable failed unit test in load-store-redesign branch which have external dependency
[ https://issues.apache.org/jira/browse/PIG-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1203. --- > Temporarily disable failed unit test in load-store-redesign branch which have > external dependency > - > > Key: PIG-1203 > URL: https://issues.apache.org/jira/browse/PIG-1203 > Project: Pig > Issue Type: Sub-task > Components: impl >Affects Versions: 0.7.0 >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.7.0 > > Attachments: PIG-1203-1.patch > > > In load-store-redesign branch, two test suits, TestHBaseStorage and > TestCounters always fail. TestHBaseStorage depends on > https://issues.apache.org/jira/browse/PIG-1200, TestCounters depends on > future version of hadoop. We disable these two test suits temporarily, and > will enable them once the dependent issues are solved. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1198) [zebra] performance improvements
[ https://issues.apache.org/jira/browse/PIG-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1198. --- > [zebra] performance improvements > > > Key: PIG-1198 > URL: https://issues.apache.org/jira/browse/PIG-1198 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.6.0 >Reporter: Yan Zhou >Assignee: Yan Zhou > Fix For: 0.7.0 > > Attachments: PIG-1198.patch, PIG-1198.patch, PIG-1198.patch > > > Current input split generation is row-based split on individual TFiles. This > leaves the undesired fact that even for TFiles smaller than one block, one split > is still generated for each. Consequently, there will be many mappers, and > many waves, needed to handle the many small TFiles generated by as many > mappers/reducers that wrote the data. This issue can be addressed by > generating input splits that can include multiple TFiles. > For sorted tables, key distribution generation by table, which is used to > generate proper input splits, includes key distributions from column groups > even if they are not in the projection. This incurs extra cost to perform > unnecessary computations and, more inappropriately, creates unreasonable > results on input split generation; > For unsorted tables, when a row split is generated on a union of tables, the > FileSplits are generated for each table and then lumped together to form the > final list of splits for Map/Reduce. This has the undesirable consequence that the number > of splits is subject to the number of tables in the table union and not just > controlled by the number of splits used by the Map/Reduce framework; > The input split's goal size is calculated on all column groups even if some > of them are not in the projection; > For input splits of multiple files in one column group, all files are opened > at startup. This is unnecessary and needlessly takes resources from start > to end. The files should be opened when needed and closed when not. -- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1190) Handling of quoted strings in pig-latin/grunt commands
[ https://issues.apache.org/jira/browse/PIG-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1190. --- > Handling of quoted strings in pig-latin/grunt commands > -- > > Key: PIG-1190 > URL: https://issues.apache.org/jira/browse/PIG-1190 > Project: Pig > Issue Type: Bug >Reporter: Thejas M Nair >Assignee: Ashutosh Chauhan > Fix For: 0.7.0 > > Attachments: correct-testcase.patch, pig-1190.patch, pig-1190_1.patch > > > There is some inconsistency in the way quoted strings are used/handled in > pig-latin. > In load/store and define-ship commands, files are specified in quoted strings, > and the file name is the content within the quotes. But in case of > register, set, and file system commands, if a string is specified in quotes, > the quotes are also included as part of the string. This is not only > inconsistent, it is also unintuitive. > This is also inconsistent with the way the hdfs command line (or bash shell) > interprets file names. > For example, currently with the command - > set job.name 'job123' > the job name is set to 'job123' (including the quotes), not job123. > This needs to be fixed, and the above command should be considered equivalent > to - set job.name job123. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
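The requested behavior — treating `set job.name 'job123'` the same as `set job.name job123` — amounts to stripping a matched pair of surrounding quotes from the argument. A minimal sketch; `unquote` is a hypothetical helper, not Pig's actual grunt parser:

```java
public class QuoteStripDemo {
    // Strip one pair of matching single quotes surrounding an argument,
    // so a quoted and an unquoted argument become equivalent, as the
    // issue asks. Unmatched or absent quotes are left untouched.
    static String unquote(String arg) {
        if (arg.length() >= 2 && arg.startsWith("'") && arg.endsWith("'")) {
            return arg.substring(1, arg.length() - 1);
        }
        return arg;
    }

    public static void main(String[] args) {
        // set job.name 'job123'  and  set job.name job123  both yield job123
        System.out.println(unquote("'job123'")); // job123
        System.out.println(unquote("job123"));   // job123
    }
}
```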
[jira] Closed: (PIG-1200) Using TableInputFormat in HBaseStorage
[ https://issues.apache.org/jira/browse/PIG-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1200. --- > Using TableInputFormat in HBaseStorage > -- > > Key: PIG-1200 > URL: https://issues.apache.org/jira/browse/PIG-1200 > Project: Pig > Issue Type: Sub-task >Affects Versions: 0.7.0 >Reporter: Jeff Zhang >Assignee: Jeff Zhang > Fix For: 0.7.0 > > Attachments: Pig_1200.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1194) ERROR 2055: Received Error while processing the map plan
[ https://issues.apache.org/jira/browse/PIG-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1194. --- > ERROR 2055: Received Error while processing the map plan > > > Key: PIG-1194 > URL: https://issues.apache.org/jira/browse/PIG-1194 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.5.0, 0.6.0 >Reporter: Viraj Bhat >Assignee: Richard Ding > Fix For: 0.7.0 > > Attachments: inputdata.txt, PIG-1194.patch, PIG-1294_1.patch > > > I have a simple Pig script which takes 3 columns out of which one is null. > {code} > input = load 'inputdata.txt' using PigStorage() as (col1, col2, col3); > a = GROUP input BY (((double) col3)/((double) col2) > .001 OR col1 < 11 ? > col1 : -1); > b = FOREACH a GENERATE group as col1, SUM(input.col2) as col2, > SUM(input.col3) as col3; > store b into 'finalresult'; > {code} > When I run this script I get the following error: > ERROR 2055: Received Error while processing the map plan. > org.apache.pig.backend.executionengine.ExecException: ERROR 2055: Received > Error while processing the map plan. > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:277) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93) > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) > > A more useful error message for the purpose of debugging would be helpful. > Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1173) pig cannot be built without an internet connection
[ https://issues.apache.org/jira/browse/PIG-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1173. --- > pig cannot be built without an internet connection > -- > > Key: PIG-1173 > URL: https://issues.apache.org/jira/browse/PIG-1173 > Project: Pig > Issue Type: Bug >Reporter: Jeff Hodges >Assignee: Jeff Hodges >Priority: Minor > Fix For: 0.7.0 > > Attachments: offlinebuild-v2.patch, offlinebuild.patch > > > Pig's build.xml does not allow for offline building even when it's been built > before. This is because the ivy-download target has no conditional > associated with it to turn it off. Hadoop seems to be adding an > unless="offline" attribute to the ivy-download target. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1184) PruneColumns optimization does not handle the case of foreach flatten correctly if flattened bag is not used later
[ https://issues.apache.org/jira/browse/PIG-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1184. --- > PruneColumns optimization does not handle the case of foreach flatten > correctly if flattened bag is not used later > -- > > Key: PIG-1184 > URL: https://issues.apache.org/jira/browse/PIG-1184 > Project: Pig > Issue Type: Bug >Affects Versions: 0.6.0 >Reporter: Pradeep Kamath >Assignee: Daniel Dai > Fix For: 0.7.0 > > Attachments: PIG-1184-1.patch, PIG-1184-2.patch > > > The following script: > {noformat} > -e "a = load 'input.txt' as (f1:chararray, f2:chararray, > f3:bag{t:tuple(id:chararray)}, f4:bag{t:tuple(loc:chararray)}); b = foreach a > generate f1, f2, flatten(f3), flatten(f4), 10; b = foreach b generate f1, f2, > \$4; dump b;" > {noformat} > gives the following result: > (oiue,M,10) > {noformat} > cat input.txt: > oiue M {(3),(4)} {(toronto),(montreal)} > {noformat} > If the PruneColumns optimization is disabled, we get the right result: > (oiue,M,10) > (oiue,M,10) > (oiue,M,10) > (oiue,M,10) > The flatten results in 4 records - so the output should contain 4 records. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
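Why four records are expected: flattening two bags in one FOREACH produces their cross product. A small Python sketch of that semantics, with `itertools.product` standing in for the nested FLATTEN:

```python
from itertools import product

# One input record with two bags, as in input.txt:
f1, f2 = "oiue", "M"
f3 = [("3",), ("4",)]                  # bag f3
f4 = [("toronto",), ("montreal",)]     # bag f4

# FLATTEN(f3), FLATTEN(f4) expands to the cross product of the two bags,
# so 2 x 2 = 4 records; projecting f1, f2 and the constant 10 afterwards
# must still yield 4 records, not 1.
records = [(f1, f2, 10) for _ in product(f3, f4)]
assert len(records) == 4
```

The bug was that pruning the unused flattened columns collapsed the cross product back to a single record.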
[jira] Closed: (PIG-1189) StoreFunc UDF should ship to the backend automatically without "register"
[ https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1189. --- > StoreFunc UDF should ship to the backend automatically without "register" > - > > Key: PIG-1189 > URL: https://issues.apache.org/jira/browse/PIG-1189 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.6.0 >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.7.0 > > Attachments: multimapstore.pig, multireducestore.pig, > PIG-1189-1.patch, PIG-1189-2.patch, PIG-1189-3.patch, singlemapstore.pig, > singlereducestore.pig > > > Pig should ship the store UDF to the backend even if the user does not use "register". The > prerequisite is that the UDF is in the classpath on the frontend. We made that > work for load UDFs in [PIG-881|https://issues.apache.org/jira/browse/PIG-881]; > we shall do the same thing for store UDFs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1171) Top-N queries produce incorrect results when followed by a cross statement
[ https://issues.apache.org/jira/browse/PIG-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1171. --- > Top-N queries produce incorrect results when followed by a cross statement > -- > > Key: PIG-1171 > URL: https://issues.apache.org/jira/browse/PIG-1171 > Project: Pig > Issue Type: Bug >Affects Versions: 0.6.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.7.0 > > Attachments: PIG-1171.patch > > > ??I am not sure if this is a bug, or something more subtle, but here is the > problem that I am having.?? > ??When I LOAD a dataset, change it with an ORDER, LIMIT it, then CROSS it > with itself, the results are not correct. I expect to see the cross of the > limited, ordered dataset, but instead I see the cross of the limited dataset. > Effectively, it's like the LIMIT is being excluded.?? > ??Example code follows:?? > {code} > A = load 'foo' as (f1:int, f2:int, f3:int); B = load 'foo' as (f1:int, > f2:int, f3:int); > a = ORDER A BY f1 DESC; > b = ORDER B BY f1 DESC; > aa = LIMIT a 1; > bb = LIMIT b 1; > C = CROSS aa, bb; > DUMP C; > {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
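The semantics the reporter expects can be sketched in Python; the three-row dataset is assumed for illustration, not taken from the issue:

```python
from itertools import product

# Toy dataset standing in for 'foo' (assumed, not from the issue).
foo = [(1, 10, 100), (3, 30, 300), (2, 20, 200)]

# ORDER ... BY f1 DESC, then LIMIT 1 must take the top row of the
# *sorted* data, not of the unsorted input.
a = sorted(foo, key=lambda t: t[0], reverse=True)
aa = a[:1]          # LIMIT a 1
bb = a[:1]          # LIMIT b 1

# CROSS aa, bb should therefore produce a single row built from the
# ordered-and-limited relations.
c = [l + r for l, r in product(aa, bb)]
assert c == [(3, 30, 300, 3, 30, 300)]
```

The bug was that the CROSS effectively saw the pre-ORDER data, as if the LIMIT had been applied without the sort.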
[jira] Closed: (PIG-1164) [zebra]smoke test
[ https://issues.apache.org/jira/browse/PIG-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1164. --- > [zebra]smoke test > - > > Key: PIG-1164 > URL: https://issues.apache.org/jira/browse/PIG-1164 > Project: Pig > Issue Type: Test >Affects Versions: 0.6.0 >Reporter: Jing Huang > Fix For: 0.7.0 > > Attachments: PIG-1164.patch, PIG-SMOKE.patch, smoke.patch > > > Change the zebra build.xml file to add a smoke target, > and add env.sh and a run script under the zebra/src/test/smoke dir. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1170) [zebra] end to end test and stress test
[ https://issues.apache.org/jira/browse/PIG-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1170. --- > [zebra] end to end test and stress test > --- > > Key: PIG-1170 > URL: https://issues.apache.org/jira/browse/PIG-1170 > Project: Pig > Issue Type: Test >Affects Versions: 0.6.0 >Reporter: Jing Huang > Fix For: 0.7.0 > > Attachments: e2eStress.patch > > > Add test cases for the zebra end-to-end test, stress test and stress test > verification tool. > No unit test is needed for this jira. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1187) UTF-8 (international code) breaks with loader when load with schema is specified
[ https://issues.apache.org/jira/browse/PIG-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1187. --- > UTF-8 (international code) breaks with loader when load with schema is > specified > > > Key: PIG-1187 > URL: https://issues.apache.org/jira/browse/PIG-1187 > Project: Pig > Issue Type: Bug >Affects Versions: 0.6.0 >Reporter: Viraj Bhat >Assignee: Ashutosh Chauhan > Fix For: 0.7.0 > > > I have a set of Pig statements which dump an international dataset. > {code} > INPUT_OBJECT = load 'internationalcode'; > describe INPUT_OBJECT; > dump INPUT_OBJECT; > {code} > Sample output: > (756a6196-ebcd-4789-ad2f-175e5df65d55,{(labelAaÂâÀ),(labelあいうえお1),(labelஜார்க2),(labeladfadf)}) > It works and dumps results, but when I use a schema for loading, it fails. > {code} > INPUT_OBJECT = load 'internationalcode' AS (object_id:chararray, labels: bag > {T: tuple(label:chararray)}); > describe INPUT_OBJECT; > {code} > The error message is as follows: 2010-01-14 02:23:27,320 FATAL > org.apache.hadoop.mapred.Child: Error running child : > org.apache.pig.data.parser.TokenMgrError: Error: Bailing out of infinite loop > caused by repeated empty string matches at line 1, column 21.
> at > org.apache.pig.data.parser.TextDataParserTokenManager.TokenLexicalActions(TextDataParserTokenManager.java:620) > at > org.apache.pig.data.parser.TextDataParserTokenManager.getNextToken(TextDataParserTokenManager.java:569) > at > org.apache.pig.data.parser.TextDataParser.jj_ntk(TextDataParser.java:651) > at > org.apache.pig.data.parser.TextDataParser.Tuple(TextDataParser.java:152) > at > org.apache.pig.data.parser.TextDataParser.Bag(TextDataParser.java:100) > at > org.apache.pig.data.parser.TextDataParser.Datum(TextDataParser.java:382) > at > org.apache.pig.data.parser.TextDataParser.Parse(TextDataParser.java:42) > at > org.apache.pig.builtin.Utf8StorageConverter.parseFromBytes(Utf8StorageConverter.java:68) > at > org.apache.pig.builtin.Utf8StorageConverter.bytesToBag(Utf8StorageConverter.java:76) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:845) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:250) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65) > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) > at org.apache.hadoop.mapred.Child.main(Child.java:159) > Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1176) Column Pruner issues in union of loader with and without schema
[ https://issues.apache.org/jira/browse/PIG-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1176. --- > Column Pruner issues in union of loader with and without schema > --- > > Key: PIG-1176 > URL: https://issues.apache.org/jira/browse/PIG-1176 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.6.0 >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.7.0 > > Attachments: PIG-1176-1.patch > > > Column pruning for union could fail if one source of the union has a schema and > the other does not. For example, the following script fails: > {code} > a = load '1.txt' as (a0, a1, a2); > b = foreach a generate a0; > c = load '2.txt'; > d = foreach c generate $0; > e = union b, d; > dump e; > {code} > However, this issue is in trunk only and is not applicable to the 0.6 branch. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1158) pig command line -M option doesn't support table union correctly (comma separated paths)
[ https://issues.apache.org/jira/browse/PIG-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1158. --- > pig command line -M option doesn't support table union correctly (comma > separated paths) > > > Key: PIG-1158 > URL: https://issues.apache.org/jira/browse/PIG-1158 > Project: Pig > Issue Type: Bug >Affects Versions: 0.6.0 >Reporter: Jing Huang >Assignee: Richard Ding > Fix For: 0.7.0 > > Attachments: PIG-1158.patch > > > For example, load (1.txt,2.txt) USING > org.apache.hadoop.zebra.pig.TableLoader() > I see this error on stdout: > [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2100: > hdfs://gsbl90380.blue.ygrid.yahoo.com/user/hadoopqa/1.txt,2.txt does not > exist. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1156) Add aliases to ExecJobs and PhysicalOperators
[ https://issues.apache.org/jira/browse/PIG-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1156. --- > Add aliases to ExecJobs and PhysicalOperators > - > > Key: PIG-1156 > URL: https://issues.apache.org/jira/browse/PIG-1156 > Project: Pig > Issue Type: Improvement >Reporter: Dmitriy V. Ryaboy >Assignee: Dmitriy V. Ryaboy > Fix For: 0.7.0 > > Attachments: pig_batchAliases.patch > > > Currently, the way to use multi-query from Java is as follows: > 1. pigServer.setBatchOn(); > 2. register your queries with pigServer > 3. List jobs = pigServer.executeBatch(); > 4. for (ExecJob job : jobs) { Iterator results = job.getResults(); } > This will cause all stores to get evaluated in a single batch. However, there > is no way to identify which of the ExecJobs corresponds to which store. We > should add aliases by which the stored relations are known to ExecJob in > order to allow the user to identify what the jobs correspond to. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1161) Add missing apache headers to a few classes
[ https://issues.apache.org/jira/browse/PIG-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1161. --- > Add missing apache headers to a few classes > --- > > Key: PIG-1161 > URL: https://issues.apache.org/jira/browse/PIG-1161 > Project: Pig > Issue Type: Task >Reporter: Dmitriy V. Ryaboy >Assignee: Dmitriy V. Ryaboy >Priority: Trivial > Fix For: 0.7.0 > > Attachments: pig_missing_licenses.patch > > > The following java classes are missing Apache License headers: > StoreConfig > MapRedUtil > SchemaUtil > TestDataBagAccess > TestNullConstant > TestSchemaUtil > We should add the missing headers. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1159) merge join right side table does not support comma separated paths
[ https://issues.apache.org/jira/browse/PIG-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1159. --- > merge join right side table does not support comma separated paths > -- > > Key: PIG-1159 > URL: https://issues.apache.org/jira/browse/PIG-1159 > Project: Pig > Issue Type: Bug >Affects Versions: 0.6.0 >Reporter: Jing Huang >Assignee: Richard Ding > Fix For: 0.7.0 > > Attachments: PIG-1159.patch > > > For example, this is my script (join_jira1.pig): > register /grid/0/dev/hadoopqa/jars/zebra.jar; > --a1 = load '1.txt' as (a:int, > b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]); > --a2 = load '2.txt' as (a:int, > b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]); > --sort1 = order a1 by a parallel 6; > --sort2 = order a2 by a parallel 5; > --store sort1 into 'asort1' using > org.apache.hadoop.zebra.pig.TableStorer('[a,b,c,d]'); > --store sort2 into 'asort2' using > org.apache.hadoop.zebra.pig.TableStorer('[a,b,c,d]'); > --store sort1 into 'asort3' using > org.apache.hadoop.zebra.pig.TableStorer('[a,b,c,d]'); > --store sort2 into 'asort4' using > org.apache.hadoop.zebra.pig.TableStorer('[a,b,c,d]'); > joinl = LOAD 'asort1,asort2' USING > org.apache.hadoop.zebra.pig.TableLoader('a,b,c,d', 'sorted'); > joinr = LOAD 'asort3,asort4' USING > org.apache.hadoop.zebra.pig.TableLoader('a,b,c,d', 'sorted'); > joina = join joinl by a, joinr by a using "merge" ; > dump joina; > == > here is the log: > Backend error message > - > java.lang.IllegalArgumentException: Pathname > /user/hadoopqa/asort3,hdfs:/gsbl90380.blue.ygrid.yahoo.com/user/hadoopqa/asort4 > from > hdfs://gsbl90380.blue.ygrid.yahoo.com/user/hadoopqa/asort3,hdfs:/gsbl90380.blue.ygrid.yahoo.com/user/hadoopqa/asort4 > is not a valid DFS filename.
> at > org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:158) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453) > at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:648) > at > org.apache.pig.backend.hadoop.datastorage.HDataStorage.isContainer(HDataStorage.java:203) > at > org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:131) > at > org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:147) > at > org.apache.pig.impl.io.FileLocalizer.fullPath(FileLocalizer.java:534) > at org.apache.pig.impl.io.FileLocalizer.open(FileLocalizer.java:338) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.seekInRightStream(POMergeJoin.java:398) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNext(POMergeJoin.java:184) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:244) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65) > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) > at org.apache.hadoop.mapred.Child.main(Child.java:159) > Pig Stack Trace > --- > ERROR 6015: During execution, encountered a Hadoop error.
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to > open iterator for alias joina > at org.apache.pig.PigServer.openIterator(PigServer.java:482) > at > org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:539) > at > org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241) > at > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168) > at > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144) > at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89) > at org.apache.pig.Main.main(Main.java:386) > Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 6015: > During execution, encountered a Hadoop error. > at > org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:158) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453) > at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:648) at > org.apache.pig.backend.hadoop.datastorage.HDa
[jira] Closed: (PIG-1169) Top-N queries produce incorrect results when a store statement is added between order by and limit statement
[ https://issues.apache.org/jira/browse/PIG-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1169. --- > Top-N queries produce incorrect results when a store statement is added > between order by and limit statement > > > Key: PIG-1169 > URL: https://issues.apache.org/jira/browse/PIG-1169 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.7.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.7.0 > > Attachments: PIG-1169.patch > > > ??We tried to get top N results after a groupby and sort, and got different > results with or without storing the full sorted results. Here is a skeleton > of our pig script.?? > {code} > raw_data = Load '' AS (f1, f2, ..., fn); > grouped = group raw_data by (f1, f2); > data = foreach grouped generate FLATTEN(group). SUM(raw_data.fk) as value; > ordered = order data by value DESC parallel 10; > topn = limit ordered 10; > store ordered into 'outputdir/full'; > store topn into 'outputdir/topn'; > {code} > ??With the statement 'store ordered ...', top N results are incorrect, but > without the statement, results are correct. Has anyone seen this before? I > know a similar bug has been fixed in the multi-query release. We are on pig > .4 and hadoop .20.1.?? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1157) Successive replicated joins do not generate a Map Reduce plan and fail due to OOM
[ https://issues.apache.org/jira/browse/PIG-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1157. --- > Successive replicated joins do not generate a Map Reduce plan and fail due to > OOM > --- > > Key: PIG-1157 > URL: https://issues.apache.org/jira/browse/PIG-1157 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.6.0 >Reporter: Viraj Bhat >Assignee: Richard Ding > Fix For: 0.7.0 > > Attachments: oomreplicatedjoin.pig, PIG-1157.patch, PIG-1157.patch, > replicatedjoinexplain.log > > > Hi all, > I have a script which does 2 replicated joins in succession. Please note > that the inputs do not exist on the HDFS. > {code} > A = LOAD '/tmp/abc' USING PigStorage('\u0001') AS (a:long, b, c); > A1 = FOREACH A GENERATE a; > B = GROUP A1 BY a; > C = LOAD '/tmp/xyz' USING PigStorage('\u0001') AS (x:long, y); > D = JOIN C BY x, B BY group USING "replicated"; > E = JOIN A BY a, D by x USING "replicated"; > dump E; > {code} > 2009-12-16 19:12:00,253 [main] INFO > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer > - MR plan size before optimization: 4 > 2009-12-16 19:12:00,254 [main] INFO > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer > - Merged 1 map-only splittees. > 2009-12-16 19:12:00,254 [main] INFO > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer > - Merged 1 map-reduce splittees. > 2009-12-16 19:12:00,254 [main] INFO > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer > - Merged 2 out of total 2 splittees. > 2009-12-16 19:12:00,254 [main] INFO > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer > - MR plan size after optimization: 2 > 2009-12-16 19:12:00,713 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR > 2998: Unhandled internal error.
unable to create new native thread > Details at logfile: pig_1260990666148.log > Looking at the log file: > Pig Stack Trace > --- > ERROR 2998: Unhandled internal error. unable to create new native thread > java.lang.OutOfMemoryError: unable to create new native thread > at java.lang.Thread.start0(Native Method) > at java.lang.Thread.start(Thread.java:597) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:131) > at > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265) > at > org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:773) > at org.apache.pig.PigServer.store(PigServer.java:522) > at org.apache.pig.PigServer.openIterator(PigServer.java:458) > at > org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:532) > at > org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190) > at > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166) > at > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142) > at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89) > at org.apache.pig.Main.main(Main.java:397) > > If we want to look at the explain output, we find that there is no Map Reduce > plan that is generated. > Why is the M/R plan not generated? > Attaching the script and explain output. > Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1148) Move splitable logic from pig latin to InputFormat
[ https://issues.apache.org/jira/browse/PIG-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1148. --- > Move splitable logic from pig latin to InputFormat > -- > > Key: PIG-1148 > URL: https://issues.apache.org/jira/browse/PIG-1148 > Project: Pig > Issue Type: Sub-task >Reporter: Jeff Zhang >Assignee: Jeff Zhang > Fix For: 0.7.0 > > Attachments: PIG-1148.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1153) [zebra] splitting columns at different levels in a complex record column into different column groups throws exception
[ https://issues.apache.org/jira/browse/PIG-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1153. --- > [zebra] splitting columns at different levels in a complex record column into > different column groups throws exception > - > > Key: PIG-1153 > URL: https://issues.apache.org/jira/browse/PIG-1153 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.7.0 >Reporter: Xuefu Zhang >Assignee: Yan Zhou > Fix For: 0.7.0 > > Attachments: PIG-1153.patch, PIG-1153.patch > > > The following code sample: > String strSch = "r1:record(f1:int, f2:int), r2:record(f5:int, > r3:record(f3:float, f4))"; > String strStorage = "[r1.f1, r2.r3.f3, r2.f5]; [r1.f2, r2.r3.f4]"; > Partition p = new Partition(schema.toString(), strStorage, null); > gives the following exception: > org.apache.hadoop.zebra.parser.ParseException: Different Split Types Set > on the same field: r2.f5 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1141) Make streaming work with the new load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1141. --- > Make streaming work with the new load-store interfaces > --- > > Key: PIG-1141 > URL: https://issues.apache.org/jira/browse/PIG-1141 > Project: Pig > Issue Type: Sub-task >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.7.0 > > Attachments: PIG-1141.patch, PIG-1141.patch, PIG-1141.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1154) local mode fails when hadoop config directory is specified in classpath
[ https://issues.apache.org/jira/browse/PIG-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1154. --- > local mode fails when hadoop config directory is specified in classpath > --- > > Key: PIG-1154 > URL: https://issues.apache.org/jira/browse/PIG-1154 > Project: Pig > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Thejas M Nair >Assignee: Ankit Modi > Fix For: 0.7.0 > > Attachments: pig_1154.patch > > > In local mode, the hadoop configuration should not be taken from the > classpath . -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1124) Unable to set Custom Job Name using the -Dmapred.job.name parameter
[ https://issues.apache.org/jira/browse/PIG-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1124. --- > Unable to set Custom Job Name using the -Dmapred.job.name parameter > --- > > Key: PIG-1124 > URL: https://issues.apache.org/jira/browse/PIG-1124 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.6.0 >Reporter: Viraj Bhat >Assignee: Ashutosh Chauhan >Priority: Minor > Fix For: 0.7.0 > > Attachments: pig-1124.patch > > > As a Hadoop user I want to control the Job name for my analysis via the > command line using the following construct: > java -cp pig.jar:$HADOOP_HOME/conf -Dmapred.job.name=hadoop_junkie > org.apache.pig.Main broken.pig > -Dmapred.job.name should normally set my Hadoop Job name, but somehow during > the formation of the job.xml in Pig this information is lost and the job name > turns out to be: > "PigLatin:broken.pig" > The current workaround seems to be wiring it in the script itself, using the > following (or using parameter substitution): > set job.name 'my job' > Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1149) Allow instantiation of SampleLoaders with parametrized LoadFuncs
[ https://issues.apache.org/jira/browse/PIG-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1149. --- > Allow instantiation of SampleLoaders with parametrized LoadFuncs > > > Key: PIG-1149 > URL: https://issues.apache.org/jira/browse/PIG-1149 > Project: Pig > Issue Type: Bug >Reporter: Dmitriy V. Ryaboy >Assignee: Dmitriy V. Ryaboy >Priority: Minor > Fix For: 0.7.0 > > Attachments: pig_1149.patch, pig_1149_lsr-branch.patch > > > Currently, it is not possible to instantiate a SampleLoader with something > like PigStorage(':'). We should allow passing parameters to the loaders > being sampled. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1136) [zebra] Map Split of Storage info does not allow for leading underscore char '_'
[ https://issues.apache.org/jira/browse/PIG-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1136. --- > [zebra] Map Split of Storage info does not allow for leading underscore char '_' > -- > > Key: PIG-1136 > URL: https://issues.apache.org/jira/browse/PIG-1136 > Project: Pig > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Yan Zhou >Priority: Minor > Fix For: 0.7.0 > > Attachments: pig-1136-xuefu-new.patch > > > There is a user need to support that type of map key. Pig's column does > not allow for a leading underscore, but apparently no restriction is placed on > the map key. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1131) Pig simple join does not work when it contains empty lines
[ https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1131. --- > Pig simple join does not work when it contains empty lines > -- > > Key: PIG-1131 > URL: https://issues.apache.org/jira/browse/PIG-1131 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.7.0 >Reporter: Viraj Bhat >Assignee: Ashutosh Chauhan > Fix For: 0.7.0 > > Attachments: junk1.txt, junk2.txt, pig-1131.patch, pig-1131.patch, > simplejoinscript.pig > > > I have a simple script, which does a JOIN. > {code} > input1 = load '/user/viraj/junk1.txt' using PigStorage(' '); > describe input1; > input2 = load '/user/viraj/junk2.txt' using PigStorage('\u0001'); > describe input2; > joineddata = JOIN input1 by $0, input2 by $0; > describe joineddata; > store joineddata into 'result'; > {code} > The input data contains empty lines. > The join fails in the Map phase with the following error in > POLocalRearrange.java: > java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 > at java.util.ArrayList.RangeCheck(ArrayList.java:547) > at java.util.ArrayList.get(ArrayList.java:322) > at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:143) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.constructLROutput(POLocalRearrange.java:464) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:360) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:162) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:244) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:94) > at
org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) > at org.apache.hadoop.mapred.Child.main(Child.java:159) > I am surprised that the test cases did not detect this error. Could we add > this data which contains empty lines to the testcases? > Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1140) [zebra] Use of Hadoop 2.0 APIs
[ https://issues.apache.org/jira/browse/PIG-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1140. --- > [zebra] Use of Hadoop 2.0 APIs > > > Key: PIG-1140 > URL: https://issues.apache.org/jira/browse/PIG-1140 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.6.0 >Reporter: Yan Zhou >Assignee: Xuefu Zhang > Fix For: 0.7.0 > > Attachments: zebra.0209, zebra.0211, zebra.0212, zebra.0213 > > > Currently, Zebra is still using already deprecated Hadoop 1.8 APIs. Need to > upgrade to its 2.0 APIs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1146) Inconsistent column pruning in LOUnion
[ https://issues.apache.org/jira/browse/PIG-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1146. --- > Inconsistent column pruning in LOUnion > -- > > Key: PIG-1146 > URL: https://issues.apache.org/jira/browse/PIG-1146 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.6.0 >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.7.0 > > Attachments: PIG-1146-1.patch, PIG-1146-2.patch > > > This happens when we do a union on two relations: one column comes from a > loader, the other matching column comes from a constant, and this column gets > pruned. We prune the one from the loader but do not prune the constant. This > leaves the union in an inconsistent state. Here is a script: > {code} > a = load '1.txt' as (a0, a1:chararray, a2); > b = load '2.txt' as (b0, b2); > c = foreach b generate b0, 'hello', b2; > d = union a, c; > e = foreach d generate $0, $2; > dump e; > {code} > 1.txt: > {code} > ulysses thompson 64 1.90 > katie carson 25 3.65 > {code} > 2.txt: > {code} > luke king 0.73 > holly davidson 2.43 > {code} > expected output: > (ulysses thompson,1.90) > (katie carson,3.65) > (luke king,0.73) > (holly davidson,2.43) > real output: > (ulysses thompson,) > (katie carson,) > (luke king,0.73) > (holly davidson,2.43) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
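A Python sketch of the expected behaviour, with the data reconstructed from the expected output in the report: pruning must treat column 1 of both union branches consistently, so column $2 survives for every record.

```python
# 1.txt rows (a0, a1, a2) and 2.txt rows (b0, b2), per the issue.
a = [("ulysses thompson", "64", "1.90"), ("katie carson", "25", "3.65")]
b = [("luke king", "0.73"), ("holly davidson", "2.43")]

# c = foreach b generate b0, 'hello', b2  -> inject the constant column.
c = [(b0, "hello", b2) for b0, b2 in b]

# d = union a, c; e = foreach d generate $0, $2.
# Pruning column 1 must be applied to both branches (loader column a1
# and constant 'hello' alike), leaving columns 0 and 2 intact.
e = [(t[0], t[2]) for t in a + c]
```

The bug was that only the loader branch was pruned, shifting the loaded rows left and emptying their last field, as shown in the "real output" above.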
[jira] Closed: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1117. --- > Pig reading hive columnar rc tables > --- > > Key: PIG-1117 > URL: https://issues.apache.org/jira/browse/PIG-1117 > Project: Pig > Issue Type: New Feature >Affects Versions: 0.7.0 >Reporter: Gerrit Jansen van Vuuren >Assignee: Gerrit Jansen van Vuuren > Fix For: 0.7.0 > > Attachments: HiveColumnarLoader.patch, HiveColumnarLoaderTest.patch, > PIG-1117-0.7.0-new.patch, PIG-1117-0.7.0-reviewed.patch, > PIG-1117-0.7.0-reviewed.patch, PIG-1117.patch, PIG-117-v.0.6.0.patch, > PIG-117-v.0.7.0.patch > > > I've coded a LoadFunc implementation that can read from Hive Columnar RC > tables; this is needed for a project that I'm working on because all our data > is stored using the Hive thrift-serialized Columnar RC format. I have looked > at the piggybank but did not find any implementation that could do this. > We've been running it on our cluster for the last week and have worked out > most bugs. > > There are still some improvements to be done, like setting > the number of mappers based on date partitioning. It has been optimized to > read only specific columns, and it can churn through a data set almost 8 times > faster with this improvement because not all column data is read. > I would like to contribute the class to the piggybank; can you guide me on > what I need to do? > I've used Hive-specific classes to implement this; is it possible to add them > to the piggybank build ivy for automatic download of the dependencies? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
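For context, a usage sketch of how such a loader would be invoked from Pig Latin. The class name follows the attached patch (HiveColumnarLoader), but the package path, table path, and the schema-string constructor argument are assumptions for illustration, not taken from the patch:

{code}
-- hypothetical usage; package path and constructor argument are assumed
register piggybank.jar;
A = load '/user/data/hive/warehouse/mytable'
    using org.apache.pig.piggybank.storage.HiveColumnarLoader('f1 string,f2 int');
describe A;
{code}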
[jira] Closed: (PIG-1122) [zebra] Zebra build.xml still uses 0.6 version
[ https://issues.apache.org/jira/browse/PIG-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1122. --- > [zebra] Zebra build.xml still uses 0.6 version > -- > > Key: PIG-1122 > URL: https://issues.apache.org/jira/browse/PIG-1122 > Project: Pig > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Yan Zhou >Assignee: Yan Zhou > Fix For: 0.7.0 > > Attachments: PIG-1122.patch > > > Zebra still uses pig-0.6.0-dev-core.jar in build-contrib.xml. It should be > changed to pig-0.7.0-dev-core.jar on APACHE trunk only. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1115) [zebra] temp files are not cleaned.
[ https://issues.apache.org/jira/browse/PIG-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1115. --- > [zebra] temp files are not cleaned. > --- > > Key: PIG-1115 > URL: https://issues.apache.org/jira/browse/PIG-1115 > Project: Pig > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Hong Tang >Assignee: Gaurav Jain > Fix For: 0.7.0 > > Attachments: PIG-1115.patch > > > Temp files created by zebra during table creation are not cleaned up when there > is a task failure, which results in wasted disk space. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1103) refactor test-commit
[ https://issues.apache.org/jira/browse/PIG-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1103. --- > refactor test-commit > > > Key: PIG-1103 > URL: https://issues.apache.org/jira/browse/PIG-1103 > Project: Pig > Issue Type: Task >Reporter: Olga Natkovich >Assignee: Olga Natkovich > Fix For: 0.7.0 > > Attachments: PIG-1103.patch > > > Due to the changes to the local mode, many tests are now taking longer. Need > to make sure that test-commit still finishes within 10 minutes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1110) Handle compressed file formats -- Gz, BZip with the new proposal
[ https://issues.apache.org/jira/browse/PIG-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1110. --- > Handle compressed file formats -- Gz, BZip with the new proposal > > > Key: PIG-1110 > URL: https://issues.apache.org/jira/browse/PIG-1110 > Project: Pig > Issue Type: Sub-task >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.7.0 > > Attachments: PIG-1110.patch, PIG-1110.patch, PIG_1110_Jeff.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1099) [zebra] version on APACHE trunk should be 0.7.0 to be in pace with PIG
[ https://issues.apache.org/jira/browse/PIG-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1099. --- > [zebra] version on APACHE trunk should be 0.7.0 to be in pace with PIG > -- > > Key: PIG-1099 > URL: https://issues.apache.org/jira/browse/PIG-1099 > Project: Pig > Issue Type: Bug >Reporter: Yan Zhou >Assignee: Yan Zhou >Priority: Trivial > Fix For: 0.7.0 > > Attachments: PIG_1099.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1102) Collect number of spills per job
[ https://issues.apache.org/jira/browse/PIG-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1102. --- > Collect number of spills per job > > > Key: PIG-1102 > URL: https://issues.apache.org/jira/browse/PIG-1102 > Project: Pig > Issue Type: Improvement >Reporter: Olga Natkovich >Assignee: Sriranjan Manjunath > Fix For: 0.7.0 > > Attachments: PIG_1102.patch, PIG_1102.patch.1 > > > Memory shortage is one of the main performance issues in Pig. Knowing when we > spill to disk is useful for understanding query performance and also for > seeing how certain changes in Pig affect that. > Other interesting stats to collect would be average CPU usage and max memory > usage, but I am not sure if this information is easily retrievable. > Using Hadoop counters for this would make sense. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1101) Pig parser does not recognize its own data type in LIMIT statement
[ https://issues.apache.org/jira/browse/PIG-1101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1101. --- > Pig parser does not recognize its own data type in LIMIT statement > -- > > Key: PIG-1101 > URL: https://issues.apache.org/jira/browse/PIG-1101 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.6.0 >Reporter: Viraj Bhat >Assignee: Ashutosh Chauhan >Priority: Minor > Fix For: 0.7.0 > > Attachments: pig-1101.patch > > > I have a Pig script in which I specify the number of records to limit as a > long type. > {code} > A = LOAD '/user/viraj/echo.txt' AS (txt:chararray); > B = LIMIT A 10L; > DUMP B; > {code} > I get a parser error: > 2009-11-21 02:25:51,100 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR > 1000: Error during parsing. Encountered " "10L "" at line 3, > column 13. > Was expecting: > ... > at > org.apache.pig.impl.logicalLayer.parser.QueryParser.generateParseException(QueryParser.java:8963) > at > org.apache.pig.impl.logicalLayer.parser.QueryParser.jj_consume_token(QueryParser.java:8839) > at > org.apache.pig.impl.logicalLayer.parser.QueryParser.LimitClause(QueryParser.java:1656) > at > org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1280) > at > org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:893) > at > org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:682) > at > org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63) > at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1017) > In fact 10L seems to work in the foreach generate construct. > Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
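A workaround consistent with the report above: the parser rejects the long literal only in the LIMIT clause, so dropping the `L` suffix lets the script parse (a sketch, using the same path as in the report):

{code}
A = LOAD '/user/viraj/echo.txt' AS (txt:chararray);
B = LIMIT A 10;  -- plain integer literal; '10L' triggers the parser error
DUMP B;
{code}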
[jira] Closed: (PIG-1088) change merge join and merge join indexer to work with new LoadFunc interface
[ https://issues.apache.org/jira/browse/PIG-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1088. --- > change merge join and merge join indexer to work with new LoadFunc interface > > > Key: PIG-1088 > URL: https://issues.apache.org/jira/browse/PIG-1088 > Project: Pig > Issue Type: Sub-task >Reporter: Thejas M Nair >Assignee: Thejas M Nair > Fix For: 0.7.0 > > Attachments: PIG-1088.1.patch, PIG-1088.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1106) FR join should not spill
[ https://issues.apache.org/jira/browse/PIG-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1106. --- > FR join should not spill > > > Key: PIG-1106 > URL: https://issues.apache.org/jira/browse/PIG-1106 > Project: Pig > Issue Type: Bug >Reporter: Olga Natkovich >Assignee: Ankit Modi > Fix For: 0.7.0 > > Attachments: frjoin-nonspill.patch > > > Currently, the values for the replicated side of the data are placed in a > spillable bag (POFRJoin near line 275). This does not make sense because the > whole point of the optimization is that the data on one side fits into > memory. We already have a non-spillable bag implemented > (NonSpillableDataBag.java) and we need to change the FRJoin code to use it. And > of course we need to do lots of testing to make sure that we don't spill but fail > instead when we run out of memory. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
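For reference, the fragment-replicate (FR) join the issue describes is requested with the 'replicated' keyword; the relation and file names here are illustrative. The relation listed after the keyword's first input is the one replicated into memory on each map task, which is why a non-spillable bag is the right container for it:

{code}
big = load 'big.txt' as (k, v);
small = load 'small.txt' as (k, w);
-- 'small' is the replicated side and must fit in memory
C = join big by k, small by k using 'replicated';
{code}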
[jira] Closed: (PIG-1062) load-store-redesign branch: change SampleLoader and subclasses to work with new LoadFunc interface
[ https://issues.apache.org/jira/browse/PIG-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1062. --- > load-store-redesign branch: change SampleLoader and subclasses to work with > new LoadFunc interface > --- > > Key: PIG-1062 > URL: https://issues.apache.org/jira/browse/PIG-1062 > Project: Pig > Issue Type: Sub-task >Reporter: Thejas M Nair >Assignee: Thejas M Nair > Fix For: 0.7.0 > > Attachments: PIG-1062.5.patch, PIG-1062.patch, PIG-1062.patch.3 > > > This is part of the effort to implement new load store interfaces as laid out > in http://wiki.apache.org/pig/LoadStoreRedesignProposal . > PigStorage and BinStorage are now working. > SampleLoader and subclasses -RandomSampleLoader, PoissonSampleLoader need to > be changed to work with new LoadFunc interface. > Fixing SampleLoader and RandomSampleLoader will get order-by queries working. > PoissonSampleLoader is used by skew join. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1094) Fix unit tests corresponding to source changes so far
[ https://issues.apache.org/jira/browse/PIG-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1094. --- > Fix unit tests corresponding to source changes so far > - > > Key: PIG-1094 > URL: https://issues.apache.org/jira/browse/PIG-1094 > Project: Pig > Issue Type: Sub-task >Reporter: Pradeep Kamath >Assignee: Pradeep Kamath > Fix For: 0.7.0 > > Attachments: PIG-1094.patch, PIG-1094_2.patch, PIG-1094_3.patch, > PIG-1094_4.patch, PIG-1094_5.patch, PIG-1094_6.patch, PIG-1094_7.patch > > > The check-ins so far on the load-store-redesign branch have not addressed unit > test failures due to interface changes. This jira is to track the task of > making the common-case unit tests work with the new interfaces. Some aspects > of the new proposal, like using the LoadCaster interface for casting and making local > mode work, have not been completed yet. Tests which are failing for those > reasons will not be fixed in this jira and will be addressed in the jiras > corresponding to those tasks. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1075) Error in Cogroup when key fields types don't match
[ https://issues.apache.org/jira/browse/PIG-1075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1075. --- > Error in Cogroup when key fields types don't match > -- > > Key: PIG-1075 > URL: https://issues.apache.org/jira/browse/PIG-1075 > Project: Pig > Issue Type: Bug >Affects Versions: 0.5.0 >Reporter: Ankur >Assignee: Richard Ding > Fix For: 0.7.0 > > Attachments: PIG-1075.patch > > > When Cogrouping 2 relations on multiple key fields, pig throws an error if > the corresponding types don't match. > Consider the following script:- > A = LOAD 'data' USING PigStorage() as (a:chararray, b:int, c:int); > B = LOAD 'data' USING PigStorage() as (a:chararray, b:chararray, c:int); > C = CoGROUP A BY (a,b,c), B BY (a,b,c); > D = FOREACH C GENERATE FLATTEN(A), FLATTEN(B); > describe D; > dump D; > The complete stack trace of the error thrown is > Pig Stack Trace > --- > ERROR 1051: Cannot cast to Unknown > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1001: Unable to > describe schema for alias D > at org.apache.pig.PigServer.dumpSchema(PigServer.java:436) > at > org.apache.pig.tools.grunt.GruntParser.processDescribe(GruntParser.java:233) > at > org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:253) > at > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168) > at > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144) > at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89) > at org.apache.pig.Main.main(Main.java:397) > Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 0: An > unexpected exception caused the validation to stop > at > org.apache.pig.impl.plan.PlanValidator.validateSkipCollectException(PlanValidator.java:104) > at > org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:40) > at > 
org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:30) > at > org.apache.pig.impl.logicalLayer.validators.LogicalPlanValidationExecutor.validate(LogicalPlanValidationExecutor.java:83) > at org.apache.pig.PigServer.compileLp(PigServer.java:821) > at org.apache.pig.PigServer.dumpSchema(PigServer.java:428) > ... 6 more > Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: > ERROR 1060: Cannot resolve COGroup output schema > at > org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:2463) > at > org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:372) > at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:45) > at > org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:69) > at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51) > at > org.apache.pig.impl.plan.PlanValidator.validateSkipCollectException(PlanValidator.java:101) > ... 11 more > Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: > ERROR 1051: Cannot cast to Unknown > at > org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.insertAtomicCastForCOGroupInnerPlan(TypeCheckingVisitor.java:2552) > at > org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:2451) > ... 16 more > The error message does not help the user in identifying the issue clearly > especially if the pig script is large and complex. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1082) Modify Comparator to work with a typed textual Storage
[ https://issues.apache.org/jira/browse/PIG-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1082. --- > Modify Comparator to work with a typed textual Storage > -- > > Key: PIG-1082 > URL: https://issues.apache.org/jira/browse/PIG-1082 > Project: Pig > Issue Type: Sub-task >Affects Versions: 0.4.0 >Reporter: hc busy > Fix For: 0.7.0 > > Original Estimate: 5h > Remaining Estimate: 5h > > See parent bug. This ticket is for just the comparator change, which needs to > be made in order for the nested data structures to sort right -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1090. --- > Update sources to reflect recent changes in load-store interfaces > - > > Key: PIG-1090 > URL: https://issues.apache.org/jira/browse/PIG-1090 > Project: Pig > Issue Type: Sub-task >Affects Versions: 0.7.0 >Reporter: Pradeep Kamath >Assignee: Pradeep Kamath > Fix For: 0.7.0 > > Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, > PIG-1090-13.patch, PIG-1090-14.patch, PIG-1090-15.patch, PIG-1090-16.patch, > PIG-1090-17.patch, PIG-1090-18.patch, PIG-1090-19.patch, PIG-1090-2.patch, > PIG-1090-20.patch, PIG-1090-21.patch, PIG-1090-22.patch, PIG-1090-3.patch, > PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, > PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch > > > There have been some changes (as recorded in the Changes section, Nov 2 2009 > subsection of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the > load/store interfaces - this jira is to track the task of making those > changes under src. Changes under test will be addressed in a different jira. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1093) pig.properties file is missing from distributions
[ https://issues.apache.org/jira/browse/PIG-1093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1093. --- > pig.properties file is missing from distributions > - > > Key: PIG-1093 > URL: https://issues.apache.org/jira/browse/PIG-1093 > Project: Pig > Issue Type: Bug > Components: build >Affects Versions: 0.5.0, 0.6.0 >Reporter: Alan Gates >Assignee: Alan Gates > Fix For: 0.7.0 > > Attachments: PIG-1093.patch > > > pig.properties (in fact the entire conf directory) is not included in the > jars distributed as part of the 0.5 release. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1086) Nested sort by * throw exception
[ https://issues.apache.org/jira/browse/PIG-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1086. --- > Nested sort by * throw exception > > > Key: PIG-1086 > URL: https://issues.apache.org/jira/browse/PIG-1086 > Project: Pig > Issue Type: Bug >Affects Versions: 0.5.0 >Reporter: Daniel Dai >Assignee: Richard Ding > Fix For: 0.7.0 > > Attachments: PIG-1086.patch > > > The following script fails: > A = load '1.txt' as (a0, a1, a2); > B = group A by a0; > C = foreach B { D = order A by *; generate group, D;}; > explain C; > Here is the stack: > Caused by: java.lang.ArrayIndexOutOfBoundsException: -1 > at java.util.ArrayList.get(ArrayList.java:324) > at > org.apache.pig.impl.logicalLayer.schema.Schema.getField(Schema.java:752) > at > org.apache.pig.impl.logicalLayer.LOSort.getSortInfo(LOSort.java:332) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:1365) > at org.apache.pig.impl.logicalLayer.LOSort.visit(LOSort.java:176) > at org.apache.pig.impl.logicalLayer.LOSort.visit(LOSort.java:43) > at > org.apache.pig.impl.plan.DependencyOrderWalkerWOSeenChk.walk(DependencyOrderWalkerWOSeenChk.java:69) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:1274) > at > org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:130) > at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:45) > at > org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:69) > at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51) > at > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:234) > at org.apache.pig.PigServer.compilePp(PigServer.java:864) > at org.apache.pig.PigServer.explain(PigServer.java:583) > ... 8 more -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1072) ReversibleLoadStoreFunc interface should be removed to enable different load and store implementation classes to be used in a reversible manner
[ https://issues.apache.org/jira/browse/PIG-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1072. --- > ReversibleLoadStoreFunc interface should be removed to enable different load > and store implementation classes to be used in a reversible manner > --- > > Key: PIG-1072 > URL: https://issues.apache.org/jira/browse/PIG-1072 > Project: Pig > Issue Type: Sub-task >Reporter: Pradeep Kamath >Assignee: Richard Ding > Fix For: 0.7.0 > > Attachments: PIG-1072.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1079) Modify merge join to use distributed cache to maintain the index
[ https://issues.apache.org/jira/browse/PIG-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1079. --- > Modify merge join to use distributed cache to maintain the index > > > Key: PIG-1079 > URL: https://issues.apache.org/jira/browse/PIG-1079 > Project: Pig > Issue Type: Bug >Reporter: Sriranjan Manjunath >Assignee: Richard Ding > Fix For: 0.7.0 > > Attachments: PIG-1079.patch, PIG-1079.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1053) Consider moving to Hadoop for local mode
[ https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1053. --- > Consider moving to Hadoop for local mode > > > Key: PIG-1053 > URL: https://issues.apache.org/jira/browse/PIG-1053 > Project: Pig > Issue Type: Improvement >Reporter: Alan Gates >Assignee: Ankit Modi > Fix For: 0.7.0 > > Attachments: hadoopLocal.patch > > > We need to consider moving Pig to use Hadoop's local mode instead of its own. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1022) optimizer pushes filter before the foreach that generates column used by filter
[ https://issues.apache.org/jira/browse/PIG-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1022. --- > optimizer pushes filter before the foreach that generates column used by > filter > --- > > Key: PIG-1022 > URL: https://issues.apache.org/jira/browse/PIG-1022 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.4.0 >Reporter: Thejas M Nair >Assignee: Daniel Dai > Fix For: 0.7.0 > > Attachments: PIG-1022-1.patch > > > grunt> l = load 'students.txt' using PigStorage() as (name:chararray, > gender:chararray, age:chararray, score:chararray); > grunt> f = foreach l generate name, gender, age, score, '200' as > gid:chararray; > grunt> g = group f by (name, gid); > grunt> f2 = foreach g generate group.name as name: chararray, group.gid as > gid: chararray; > grunt> filt = filter f2 by gid == '200'; > grunt> explain filt; > In the generated plan, filt is pushed up after the load and before the first > foreach, even though the filter is on gid, which is generated in the first foreach. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1045) Integration with Hadoop 20 New API
[ https://issues.apache.org/jira/browse/PIG-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1045. --- > Integration with Hadoop 20 New API > -- > > Key: PIG-1045 > URL: https://issues.apache.org/jira/browse/PIG-1045 > Project: Pig > Issue Type: New Feature >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.7.0 > > Attachments: PIG-1045.patch, PIG-1045.patch > > > Hadoop 21 is not yet released, but we know that a switch to the new MR API is coming > there. This JIRA is for early integration with the portion of this API that > has been implemented in Hadoop 20. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (PIG-1046) join algorithm specification is within double quotes
[ https://issues.apache.org/jira/browse/PIG-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai closed PIG-1046. --- > join algorithm specification is within double quotes > > > Key: PIG-1046 > URL: https://issues.apache.org/jira/browse/PIG-1046 > Project: Pig > Issue Type: Bug >Reporter: Thejas M Nair >Assignee: Ashutosh Chauhan > Fix For: 0.7.0 > > Attachments: pig-1046.patch, pig-1046_1.patch, pig-1046_2.patch, > pig-1046_3.patch, pig-1046_4.patch > > > This fails - > j = join l1 by $0, l2 by $0 using 'skewed'; > This works - > j = join l1 by $0, l2 by $0 using "skewed"; > String constants are single-quoted in Pig Latin. If the algorithm > specification is supposed to be a string, specifying it within single quotes > should be supported. > Alternatively, we should be using identifiers here; since these are > pre-defined in Pig, users will not be specifying arbitrary values that might > not be valid identifiers. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.