[jira] Commented: (PIG-979) Acummulator Interface for UDFs
[ https://issues.apache.org/jira/browse/PIG-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759848#action_12759848 ] Jeff Hammerbacher commented on PIG-979: --- One could also cite the SOSP paper from MSR this year comparing the iterator to the accumulator interface, though I have a hard time concisely stating their conclusions: http://sigops.org/sosp/sosp09/papers/yu-sosp09.pdf. > Acummulator Interface for UDFs > -- > > Key: PIG-979 > URL: https://issues.apache.org/jira/browse/PIG-979 > Project: Pig > Issue Type: New Feature >Reporter: Alan Gates >Assignee: Ying He > > Add an accumulator interface for UDFs that would allow them to take a set > number of records at a time instead of the entire bag. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-980) Optimizing nested order bys
[ https://issues.apache.org/jira/browse/PIG-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759815#action_12759815 ] Alan Gates commented on PIG-980: A common pattern for Pig Latin scripts is:
{code}
A = load 'bla';
B = group A by $0;
C = foreach B {
    D = order A by $1;
    ...
}
{code}
Currently Pig executes this by using POSort on the reduce side, which collects all of the records out of the bag produced by POPackage into a SortedBag. If this bag is large, it will spill both while POPackage collects it and while POSort sorts it. None of this is necessary, however. Hadoop allows users to specify a sort order for data going to the reducer in addition to a partition key. This is done by defining the job's Comparator to compare all the fields you want sorted, and the Partitioner to look only at the field you want to partition on. So in this case the partitioner would be set to look at $0, and the comparator at $0 and $1. Beyond avoiding unnecessary sorts and spills, this will also allow us to use the proposed Accumulator interface (see PIG-979) for these types of scripts. > Optimizing nested order bys > --- > > Key: PIG-980 > URL: https://issues.apache.org/jira/browse/PIG-980 > Project: Pig > Issue Type: Improvement > Reporter: Alan Gates > Assignee: Ying He > > Pig needs to take advantage of secondary sort in Hadoop to optimize nested > order bys. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
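The partitioner/comparator split described above can be sketched without a Hadoop dependency. This is an illustrative simulation, not Pig's actual implementation: plain int[] records stand in for Hadoop keys, partition() hashes only $0, and KEY_ORDER sorts by $0 then $1, so all records for one group land on the same reducer and arrive already sorted on $1.

```java
import java.util.*;

// Sketch only: simulating Hadoop's secondary sort with plain arrays.
// Field 0 plays the role of $0 (grouping key), field 1 the role of $1.
class SecondarySortSketch {
    // Partitioner: route on the grouping field ($0) only.
    static int partition(int[] record, int numReducers) {
        return Math.floorMod(Integer.hashCode(record[0]), numReducers);
    }

    // Sort comparator: order by $0, breaking ties on $1,
    // mimicking a job Comparator that covers both fields.
    static final Comparator<int[]> KEY_ORDER =
        Comparator.<int[]>comparingInt(r -> r[0]).thenComparingInt(r -> r[1]);

    public static void main(String[] args) {
        List<int[]> records = new ArrayList<>(Arrays.asList(
            new int[]{2, 9}, new int[]{1, 5}, new int[]{2, 3}, new int[]{1, 7}));
        records.sort(KEY_ORDER);   // what the shuffle's sort phase would do
        for (int[] r : records)
            System.out.println(r[0] + "\t" + r[1] + "\t-> reducer " + partition(r, 2));
    }
}
```

Because the comparator covers $1 but the partitioner ignores it, the nested order-by comes for free from the shuffle and no SortedBag is needed on the reduce side.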
[jira] Commented: (PIG-979) Acummulator Interface for UDFs
[ https://issues.apache.org/jira/browse/PIG-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759813#action_12759813 ] David Ciemiewicz commented on PIG-979: -- This JIRA doesn't quite get at why I believe the Accumulator interface is of interest. It isn't just about performance and avoiding repeated passes over the same data. It is also about providing an interface to support CUMULATIVE_SUM, RANK, and other functions of their ilk. A better code example to justify this would be:
{code}
A = load 'data' using PigStorage() as ( query: chararray, count: int );
B = order A by count desc parallel 1;
C = foreach B generate query, count, CUMULATIVE_SUM(count) as cumulative_count, RANK(count) as rank;
{code}
The functions RANK and CUMULATIVE_SUM would have persistent state and yet would emit one value per value or tuple passed in. Bags would not be appropriate as coded. Additionally, a reason for the Accumulator interface is to avoid multiple passes over the same data. For instance, consider the example:
{code}
A = load 'data' using PigStorage() as ( query: chararray, count: int );
B = group A all;
C = foreach B generate group, SUM(A.count), AVG(A.count), VAR(A.count), STDEV(A.count), MIN(A.count), MAX(A.count), MEDIAN(A.count);
{code}
Repeatedly shuffling the same values just isn't an optimal way to process data. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
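The multi-aggregate foreach in the second example could be evaluated in a single pass by a UDF that keeps running state. Below is a minimal sketch under stated assumptions: OnePassStats and its methods are hypothetical (not Pig API), VAR uses Welford's online algorithm for numerical stability, and MEDIAN is deliberately omitted because it cannot be computed exactly in one pass with bounded memory.

```java
// Hypothetical sketch: one pass over the values yields SUM, AVG, VAR,
// STDEV, MIN, and MAX simultaneously, the way an accumulator-style
// UDF could, instead of re-scanning the bag once per function.
class OnePassStats {
    long n;
    double sum, mean, m2;
    double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;

    void accumulate(double x) {
        n++;
        sum += x;
        min = Math.min(min, x);
        max = Math.max(max, x);
        double delta = x - mean;       // Welford update for running variance
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    double avg()   { return mean; }
    double var()   { return m2 / n; }  // population variance
    double stdev() { return Math.sqrt(var()); }
}
```

Only a handful of doubles are retained per key, so no bag needs to be materialized or re-shuffled.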
[jira] Created: (PIG-980) Optimizing nested order bys
Optimizing nested order bys --- Key: PIG-980 URL: https://issues.apache.org/jira/browse/PIG-980 Project: Pig Issue Type: Improvement Reporter: Alan Gates Assignee: Ying He Pig needs to take advantage of secondary sort in Hadoop to optimize nested order bys. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-979) Acummulator Interface for UDFs
[ https://issues.apache.org/jira/browse/PIG-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759804#action_12759804 ] Alan Gates commented on PIG-979: Consider a Pig script like the following:
{code}
A = load 'bla';
B = group A by $0;
C = foreach B {
    D = order A by $1;
    generate CUMULATIVE_SUM(D);
}
{code}
Because the UDF needs to see this data in an ordered fashion, it cannot be done using Pig's Algebraic interface. But it does not need to see all the contents of the bag together. One way to address this is to add an Accumulator interface that UDFs could implement:
{code}
interface Accumulator<T> {
    /**
     * Pass tuples to the UDF. The passed-in bag will contain only records from
     * one key, but it may not contain all the records for that key. This method
     * will be called repeatedly until all records for the key have been
     * provided to the UDF.
     * @param b a bag of one or more tuples, all sharing the same key.
     */
    void accumulate(Bag b);

    /**
     * Called once all records for a key have been passed to accumulate.
     * @return the value of the UDF for this key.
     */
    T getValue();
}
{code}
In cases where all UDFs in a given foreach implement this Accumulator interface, Pig could choose to use this method to push records to the UDFs. It would then not need to read all records from the reduce iterator and cache them in memory or on disk. Before we commit to adding this new level of complexity to the language, we should performance test it. Given that we have recently made a change aimed at addressing Pig's problem of dying during large non-algebraic group bys (see PIG-975), this needs to perform significantly better than that change to justify adding it.
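To make the proposed contract concrete, here is a sketch of a UDF implementing an interface of this shape. It is an illustration only: Pig's Bag and Tuple types are stood in for by List&lt;Long&gt; to keep the example self-contained, and LongSum is a hypothetical UDF, not part of Pig.

```java
import java.util.List;

// Sketch of the proposed accumulator contract. accumulate() may be called
// many times with partial batches for one key; getValue() is called once
// at the end of the key. List<Long> stands in for Pig's Bag of Tuples.
interface Accumulator<T> {
    void accumulate(List<Long> bag);  // a partial batch of values for one key
    T getValue();                     // the final result once the key is exhausted
}

// Hypothetical UDF: only the running total is retained between calls,
// so the full bag never has to be cached in memory or spilled to disk.
class LongSum implements Accumulator<Long> {
    private long sum;

    @Override
    public void accumulate(List<Long> bag) {
        for (long v : bag) sum += v;
    }

    @Override
    public Long getValue() {
        return sum;
    }
}
```

Pushing batches through such a UDF replaces materializing the whole bag, which is exactly the saving the comment above describes.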
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-979) Acummulator Interface for UDFs
Acummulator Interface for UDFs -- Key: PIG-979 URL: https://issues.apache.org/jira/browse/PIG-979 Project: Pig Issue Type: New Feature Reporter: Alan Gates Assignee: Ying He Add an accumulator interface for UDFs that would allow them to take a set number of records at a time instead of the entire bag. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-978) ERROR 2100 (hdfs://localhost/tmp/temp175740929/tmp-1126214010 does not exist) and ERROR 2999: (Unexpected internal error. null) when using Multi-Query optimization
ERROR 2100 (hdfs://localhost/tmp/temp175740929/tmp-1126214010 does not exist) and ERROR 2999: (Unexpected internal error. null) when using Multi-Query optimization --- Key: PIG-978 URL: https://issues.apache.org/jira/browse/PIG-978 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Viraj Bhat Fix For: 0.6.0 I have a Pig script of this form, which I execute using Multi-query optimization:
{code}
A = load '/user/viraj/firstinput' using PigStorage();
B = group
C = ..aggregation function
store C into '/user/viraj/firstinputtempresult/days1';
..
Atab = load '/user/viraj/secondinput' using PigStorage();
Btab = group
Ctab = ..aggregation function
store Ctab into '/user/viraj/secondinputtempresult/days1';
..
E = load '/user/viraj/firstinputtempresult/' using PigStorage();
F = group
G = aggregation function
store G into '/user/viraj/finalresult1';
Etab = load '/user/viraj/secondinputtempresult/' using PigStorage();
Ftab = group
Gtab = aggregation function
store Gtab into '/user/viraj/finalresult2';
{code}
2009-07-20 22:05:44,507 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2100: hdfs://localhost/tmp/temp175740929/tmp-1126214010 does not exist. Details at logfile: /homes/viraj/pigscripts/pig_1248127173601.log. The error is due to a mismatch between the store and load commands. The script first stores files into the 'days1' directory (store C into '/user/viraj/firstinputtempresult/days1' using PigStorage();), but it later loads from the top-level directory (E = load '/user/viraj/firstinputtempresult/' using PigStorage()) instead of the original directory (/user/viraj/firstinputtempresult/days1). The current multi-query optimizer can't resolve the dependency between these two commands because they have different load file paths, so the jobs run concurrently and produce the errors. The solution is to add an 'exec' or 'run' command after the first two stores. This forces the first two store commands to run before the remaining commands.
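A sketch of the suggested workaround follows. The relation names and aggregation are placeholders taken from the abbreviated script above, and the load path is also corrected to point at the 'days1' directory so that it matches the preceding store:

{code}
A = load '/user/viraj/firstinput' using PigStorage();
B = group A by $0;
C = foreach B generate group, COUNT(A);  -- placeholder aggregation
store C into '/user/viraj/firstinputtempresult/days1';
exec;  -- forces the store above to complete before the statements below are planned
E = load '/user/viraj/firstinputtempresult/days1' using PigStorage();
...
{code}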
It would be nice to see this fixed as part of an enhancement to the multi-query optimization. We could either disable multi-query in this case or throw a warning/error message so that the user can correct the load/store statements. Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour
[ https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759789#action_12759789 ] Raghu Angadi commented on PIG-949: -- I just committed this. Thanks Yan for the fix and Jing for the test! > Zebra Bug: splitting map into multiple column group using storage hint causes > unexpected behaviour > -- > > Key: PIG-949 > URL: https://issues.apache.org/jira/browse/PIG-949 > Project: Pig > Issue Type: Bug > Affects Versions: 0.4.0 > Environment: linux > Reporter: Alok Singh > Assignee: Yan Zhou > Fix For: 0.5.0 > > Attachments: Pig_949.patch, Pig_949.patch, Pig_949.patch > > > Hi, > The storage hint specification plays an important part in whether the output > table is readable or not. > Say we have the map 'map'. One can split the map into a column group using > [map#{k1}, map#{k2}...]; the remaining map fields will automatically be added > to the default column group. > If the user tries to create a new column group for the remaining fields, as in > [map#{k1}, map#{k2}, ..][map], i.e. creates a separate column group, > the table writer will create the table.
> However, if one then tries to load the created table via Pig or via map reduce > using TableInputFormat, the reader has problems reading the map. > We get the following stack trace: > 09/09/09 00:09:45 INFO mapred.JobClient: Task Id : > attempt_200908191538_33939_m_21_2, Status : FAILED > java.io.IOException: getValue() failed: null > at > org.apache.hadoop.zebra.io.BasicTable$Reader$BTScanner.getValue(BasicTable.java:775) > at > org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:717) > at > org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:651) > at > org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:191) > at > org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:175) > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) > at org.apache.hadoop.mapred.Child.main(Child.java:170) > Alok -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour
[ https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi updated PIG-949: - Resolution: Fixed Fix Version/s: (was: 0.4.0) Status: Resolved (was: Patch Available) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour
[ https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Wang updated PIG-949: -- Reviewed the patch. +1 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively
[ https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ying He updated PIG-975: Attachment: PIG-975.patch4 Added a switch back to the old bag. Setting the property pig.cachedbag.type=default switches to the old default bag; if the property is not specified, InternalCachedBag is used. > Need a databag that does not register with SpillableMemoryManager and spill > data pro-actively > - > > Key: PIG-975 > URL: https://issues.apache.org/jira/browse/PIG-975 > Project: Pig > Issue Type: Improvement > Affects Versions: 0.2.0 > Reporter: Ying He > Assignee: Ying He > Fix For: 0.2.0 > > Attachments: internalbag.xls, PIG-975.patch, PIG-975.patch2, > PIG-975.patch3, PIG-975.patch4 > > > POPackage uses DefaultDataBag during the reduce process to hold data. It is > registered with SpillableMemoryManager and prone to OutOfMemoryException. > It is better to proactively manage memory usage: the bag fills memory up to a > specified amount and spills the rest to disk. The amount of memory used to > hold tuples is configurable. This avoids out-of-memory errors. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-958) Splitting output data on key field
[ https://issues.apache.org/jira/browse/PIG-958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759742#action_12759742 ] Pradeep Kamath commented on PIG-958: I think the release audit warning is related to a missing Apache license header. Can you add the header by pasting it from another source file in svn? Every file needs to have the Apache header as a comment at the beginning, so you will need to add it to the beginning of both the source file and the test file. Also, if you agree with any of the review comments, you can incorporate those changes when you submit the next version of the patch. > Splitting output data on key field > -- > > Key: PIG-958 > URL: https://issues.apache.org/jira/browse/PIG-958 > Project: Pig > Issue Type: Bug > Affects Versions: 0.4.0 > Reporter: Ankur > Attachments: 958.v2.patch > > > Pig users often face the need to split the output records into a bunch of > files and directories depending on the type of record. Pig's SPLIT operator > is useful when record types are few and known in advance. In cases where type > is not directly known but is derived dynamically from values of a key field > in the output tuple, a custom store function is a better solution. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-977) exit status does not account for JOB_STATUS.TERMINATED
[ https://issues.apache.org/jira/browse/PIG-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759739#action_12759739 ] Pradeep Kamath commented on PIG-977: It does look like we only use COMPLETED and FAILED - +1 to remove the other unused states - we can add them back when the need arises. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-970) Support of HBase 0.20.0
[ https://issues.apache.org/jira/browse/PIG-970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-970: --- Attachment: TEST-org.apache.pig.test.TestHBaseStorage.txt pig-hbase-20-v2.patch The issue was the missing Zookeeper lib. I added that, and now I get what looks like a real hbase error. I have no idea what it means, so I'll let you take a look. I've attached both a new patch (with the changes to build.xml to pick up the right libs) and the error log from the test run. > Support of HBase 0.20.0 > --- > > Key: PIG-970 > URL: https://issues.apache.org/jira/browse/PIG-970 > Project: Pig > Issue Type: Improvement > Components: impl >Affects Versions: 0.3.0 >Reporter: Vincent BARAT > Attachments: build.xml.path, pig-hbase-0.20.0-support.patch, > pig-hbase-20-v2.patch, TEST-org.apache.pig.test.TestHBaseStorage.txt > > > The support of HBase is currently very limited and restricted to HBase 0.18.0. > Because the next releases of PIG will support Hadoop 0.20.0, they should also > support HBase 0.20.0. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-958) Splitting output data on key field
[ https://issues.apache.org/jira/browse/PIG-958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759733#action_12759733 ] Hadoop QA commented on PIG-958: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12420264/958.v2.patch against trunk revision 818929. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. -1 release audit. The applied patch generated 281 release audit warnings (more than the trunk's current 279 warnings). +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/46/testReport/ Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/46/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/46/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/46/console This message is automatically generated. > Splitting output data on key field > -- > > Key: PIG-958 > URL: https://issues.apache.org/jira/browse/PIG-958 > Project: Pig > Issue Type: Bug >Affects Versions: 0.4.0 >Reporter: Ankur > Attachments: 958.v2.patch > > > Pig users often face the need to split the output records into a bunch of > files and directories depending on the type of record. Pig's SPLIT operator > is useful when record types are few and known in advance. 
In cases where type > is not directly known but is derived dynamically from values of a key field > in the output tuple, a custom store function is a better solution. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-970) Support of HBase 0.20.0
[ https://issues.apache.org/jira/browse/PIG-970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vincent BARAT updated PIG-970: -- Attachment: build.xml.path To better show what I did on the jar-file side, here is the patch I made to the build.xml file. > Support of HBase 0.20.0 > --- > > Key: PIG-970 > URL: https://issues.apache.org/jira/browse/PIG-970 > Project: Pig > Issue Type: Improvement > Components: impl > Affects Versions: 0.3.0 > Reporter: Vincent BARAT > Attachments: build.xml.path, pig-hbase-0.20.0-support.patch > > > The support of HBase is currently very limited and restricted to HBase 0.18.0. > Because the next releases of PIG will support Hadoop 0.20.0, they should also > support HBase 0.20.0. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-970) Support of HBase 0.20.0
[ https://issues.apache.org/jira/browse/PIG-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759727#action_12759727 ] Vincent BARAT commented on PIG-970: --- Yes, but I was unable to make TestHBaseStorage work. I guess it was just a matter of environment, since the errors were related to classes not found. I didn't spend too much time on that, actually... I will try again. > Support of HBase 0.20.0 > --- > > Key: PIG-970 > URL: https://issues.apache.org/jira/browse/PIG-970 > Project: Pig > Issue Type: Improvement > Components: impl > Affects Versions: 0.3.0 > Reporter: Vincent BARAT > Attachments: pig-hbase-0.20.0-support.patch > > > The support of HBase is currently very limited and restricted to HBase 0.18.0. > Because the next releases of PIG will support Hadoop 0.20.0, they should also > support HBase 0.20.0. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-977) exit status does not account for JOB_STATUS.TERMINATED
exit status does not account for JOB_STATUS.TERMINATED -- Key: PIG-977 URL: https://issues.apache.org/jira/browse/PIG-977 Project: Pig Issue Type: Bug Reporter: Thejas M Nair For determining the exit status of a Pig query, only JOB_STATUS.FAILED is used, and the status TERMINATED is ignored. I think the reason for this is that in ExecJob.JOB_STATUS only FAILED and COMPLETED are used anywhere; the rest are unused. I think we should either comment out the unused values for now to indicate that, or fix the code that determines success/failure in GruntParser.executeBatch (the relevant line is marked with ==> below):
{code}
public enum JOB_STATUS {
    QUEUED,
    RUNNING,
    SUSPENDED,
    TERMINATED,
    FAILED,
    COMPLETED,
}
{code}
{code}
private void executeBatch() throws IOException {
    if (mPigServer.isBatchOn()) {
        if (mExplain != null) {
            explainCurrentBatch();
        }
        if (!mLoadOnly) {
            List<ExecJob> jobs = mPigServer.executeBatch();
            for (ExecJob job : jobs) {
==>             if (job.getStatus() == ExecJob.JOB_STATUS.FAILED) {
                    mNumFailedJobs++;
                    if (job.getException() != null) {
                        LogUtils.writeLog(
                            job.getException(),
                            mPigServer.getPigContext().getProperties().getProperty("pig.logfile"),
                            log,
                            "true".equalsIgnoreCase(mPigServer.getPigContext().getProperties().getProperty("verbose")),
                            "Pig Stack Trace");
                    }
                } else {
                    mNumSucceededJobs++;
                }
            }
        }
    }
}
{code}
Any opinions? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
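The second option above (fixing the success/failure check rather than pruning the enum) might look like the following sketch. BatchTally and countFailed are hypothetical stand-ins for the loop in GruntParser.executeBatch; the JOB_STATUS enum is reproduced from the snippet.

```java
import java.util.List;

// Sketch: tally batch results treating TERMINATED like FAILED,
// so a killed job no longer counts toward the success total.
class BatchTally {
    enum JOB_STATUS { QUEUED, RUNNING, SUSPENDED, TERMINATED, FAILED, COMPLETED }

    static int countFailed(List<JOB_STATUS> jobs) {
        int failed = 0;
        for (JOB_STATUS s : jobs)
            if (s == JOB_STATUS.FAILED || s == JOB_STATUS.TERMINATED)
                failed++;   // TERMINATED is no longer silently a success
        return failed;
    }
}
```

With this check, a batch containing any TERMINATED job would produce a non-zero failure count and hence a failing exit status.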
RE: [VOTE] Release Pig 0.4.0 (candidate 2)
With 3 +1s from Hadoop PMC (Alan Gates, Raghu Angadi, and Olga Natkovich) and no -1s, the release passed the vote. I will be working on rolling it out next. Olga -Original Message- From: Raghu Angadi [mailto:rang...@yahoo-inc.com] Sent: Tuesday, September 22, 2009 4:12 PM To: priv...@hadoop.apache.org Cc: pig-dev@hadoop.apache.org Subject: Re: [VOTE] Release Pig 0.4.0 (candidate 2) +1. ran 'ant test-core'. contrib/zebra: 'ant test' passed after following directions as suggested : got a patch from PIG-660, and hadoop20.jar from PIG-833. For clarity we might attach patch suitable for PIG-660 for 0.4. Raghu. Olga Natkovich wrote: > Hi, > > The new version is available in > http://people.apache.org/~olga/pig-0.4.0-candidate-2/. > > I see one failure in a unit test in piggybank (contrib.) but it is not > related to the functions themselves but seems to be an issue with > MiniCluster and I don't feel we need to chase this down. I made sure > that the same test runs ok with Hadoop 20. > > Please, vote by end of day on Thursday, 9/24. > > Olga > > -Original Message- > From: Olga Natkovich [mailto:ol...@yahoo-inc.com] > Sent: Thursday, September 17, 2009 12:09 PM > To: pig-dev@hadoop.apache.org; priv...@hadoop.apache.org > Subject: [VOTE] Release Pig 0.4.0 (candidate 1) > > Hi, > > I have fixed the issue causing the failure that Alan reported. > > Please test the new release: > http://people.apache.org/~olga/pig-0.4.0-candidate-1/. > > Vote closes on Tuesday, 9/22. > > Olga > > > -Original Message- > From: Olga Natkovich [mailto:ol...@yahoo-inc.com] > Sent: Monday, September 14, 2009 2:06 PM > To: pig-dev@hadoop.apache.org; priv...@hadoop.apache.org > Subject: [VOTE] Release Pig 0.4.0 (candidate 0) > > Hi, > > > > I created a candidate build for Pig 0.4.0 release. 
The highlights of > this release are > > > > - Performance improvements especially in the area of JOIN > support where we introduced two new join types: skew join to deal with > data skew and sort merge join to take advantage of the sorted data sets. > > - Support for Outer join. > > - Works with Hadoop 18 > > > > I ran the release audit and rat report looked fine. The relevant part is > attached below. > > > > Keys used to sign the release are available at > http://svn.apache.org/viewvc/hadoop/pig/trunk/KEYS?view=markup. > > > > Please download the release and try it out: > http://people.apache.org/~olga/pig-0.4.0-candidate-0. > > > > Should we release this? Vote closes on Thursday, 9/17. > > > > Olga > > > > > > [java] !? > /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/contrib/CHANGES.txt > [java] !? > /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/contrib/zebra/CHANG > ES.txt > [java] !? > /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/broken-links.x > ml > [java] !? > /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/cookbook.html > [java] !? > /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/index.html > [java] !? > /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/linkmap.html > [java] !? > /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/piglatin_refer > ence.html > [java] !? > /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/piglatin_users > .html > [java] !? > /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/setup.html > [java] !? > /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/tutorial.html > [java] !? > /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/udf.html > [java] !? > /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/api/package-li > st > [java] !? > /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes. > html > [java] !? > /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/missingS > inces.txt > [java] !? 
> /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/user_com > ments_for_pig_0.3.1_to_pig_0.5.0-dev.xml > [java] !? > /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/ > alldiffs_index_additions.html > [java] !? > /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/ > alldiffs_index_all.html > [java] !? > /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/ > alldiffs_index_changes.html > [java] !? > /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/ > alldiffs_index_removals.html > [java] !? > /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/ > changes-summary.html > [java] !? > /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/ > classes_index_additions.html > [java] !? > /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jd
[jira] Commented: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively
[ https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759689#action_12759689 ] Olga Natkovich commented on PIG-975: Ying, what Pradeep is asking for is more like a safety switch - to give users a way to go back to the old implementation if they run into problems with the new one. Once we verify that the new code is as stable as the old, we would remove the switch. We would also not expose it to users unless they do run into trouble. > Need a databag that does not register with SpillableMemoryManager and spill > data pro-actively > - > > Key: PIG-975 > URL: https://issues.apache.org/jira/browse/PIG-975 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.2.0 >Reporter: Ying He >Assignee: Ying He > Fix For: 0.2.0 > > Attachments: internalbag.xls, PIG-975.patch, PIG-975.patch2, > PIG-975.patch3 > > > POPackage uses DefaultDataBag during the reduce process to hold data. It is > registered with SpillableMemoryManager and prone to OutOfMemoryException. > It's better to pro-actively manage the memory usage: the bag fills > memory up to a specified amount and dumps the rest to disk. The amount of > memory used to hold tuples is configurable. This can avoid out-of-memory errors. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively
[ https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759681#action_12759681 ] Ying He commented on PIG-975: - I think this is too implementation-specific to expose to end users. Frankly, I don't think users care which class we use for the data bags. > Need a databag that does not register with SpillableMemoryManager and spill > data pro-actively > - > > Key: PIG-975 > URL: https://issues.apache.org/jira/browse/PIG-975 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.2.0 >Reporter: Ying He >Assignee: Ying He > Fix For: 0.2.0 > > Attachments: internalbag.xls, PIG-975.patch, PIG-975.patch2, > PIG-975.patch3 > > > POPackage uses DefaultDataBag during the reduce process to hold data. It is > registered with SpillableMemoryManager and prone to OutOfMemoryException. > It's better to pro-actively manage the memory usage: the bag fills > memory up to a specified amount and dumps the rest to disk. The amount of > memory used to hold tuples is configurable. This can avoid out-of-memory errors. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-942) Maps are not implicitly casted
[ https://issues.apache.org/jira/browse/PIG-942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-942: --- Resolution: Fixed Status: Resolved (was: Patch Available) Unit test was present in the original patch. Patch committed to trunk. > Maps are not implicitly casted > -- > > Key: PIG-942 > URL: https://issues.apache.org/jira/browse/PIG-942 > Project: Pig > Issue Type: Bug >Reporter: Sriranjan Manjunath >Assignee: Pradeep Kamath > Fix For: 0.6.0 > > Attachments: PIG-942-2.patch, PIG-942.patch > > > A = load 'foo' as (m) throws the following exception when foo has maps. > java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be > cast to java.util.Map > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POMapLookUp.getNext(POMapLookUp.java:98) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POMapLookUp.getNext(POMapLookUp.java:115) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:612) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:278) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:240) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93) > at 
org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227) > at > org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198) > The same works if I explicitly cast m to a map: A = load 'foo' as (m:[]) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
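The failure above boils down to an unconditional cast on a field whose runtime type is still the loader's raw byte wrapper: without a declared type, the loaded field stays as bytes, and the map lookup casts it blindly. Below is a minimal, purely illustrative Java sketch of the problem and the guarded alternative - the class and method names (ByteWrapper, lookupUnsafe, lookupSafe) are stand-ins, not Pig's actual DataByteArray or POMapLookUp code:

```java
import java.util.HashMap;
import java.util.Map;

public class MapCastSketch {
    // Stand-in for a raw-bytes wrapper like org.apache.pig.data.DataByteArray.
    static final class ByteWrapper {
        final byte[] bytes;
        ByteWrapper(byte[] b) { bytes = b; }
    }

    // Blind cast, mirroring the failing path: throws ClassCastException
    // when the field was never converted to a Map.
    static Object lookupUnsafe(Object field, String key) {
        return ((Map<?, ?>) field).get(key);
    }

    // Guarded version: convert the raw bytes first (here a dummy conversion),
    // which is roughly what an implicit cast on maps would have to do.
    static Object lookupSafe(Object field, String key) {
        if (field instanceof ByteWrapper) {
            // A real implementation would deserialize the bytes; we fake it.
            Map<String, Object> m = new HashMap<>();
            m.put(key, "converted");
            field = m;
        }
        return ((Map<?, ?>) field).get(key);
    }

    public static void main(String[] args) {
        Object raw = new ByteWrapper("k#v".getBytes());
        try {
            lookupUnsafe(raw, "k");
        } catch (ClassCastException e) {
            System.out.println("ClassCastException, as in PIG-942");
        }
        System.out.println(lookupSafe(raw, "k"));
    }
}
```

Declaring the schema as (m:[]) sidesteps the blind cast because the field arrives already typed as a map, which is why the explicit cast in the report works.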
[jira] Updated: (PIG-958) Splitting output data on key field
[ https://issues.apache.org/jira/browse/PIG-958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-958: --- Status: Open (was: Patch Available) > Splitting output data on key field > -- > > Key: PIG-958 > URL: https://issues.apache.org/jira/browse/PIG-958 > Project: Pig > Issue Type: Bug >Affects Versions: 0.4.0 >Reporter: Ankur > Attachments: 958.v2.patch > > > Pig users often face the need to split the output records into a bunch of > files and directories depending on the type of record. Pig's SPLIT operator > is useful when record types are few and known in advance. In cases where type > is not directly known but is derived dynamically from values of a key field > in the output tuple, a custom store function is a better solution. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
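The core idea behind such a custom store function can be sketched generically: keep one writer per distinct key value and route each record to the writer for its key. The sketch below is an illustration only - KeyedWriter and its in-memory StringWriters are hypothetical stand-ins, not Pig's StoreFunc API or the patch attached to this issue; a real store function would open part files under an output directory instead:

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;

public class KeyedWriter {
    // One writer per distinct key value seen in the output.
    private final Map<String, Writer> writers = new HashMap<>();

    // Lazily create a writer the first time a key is seen.
    Writer writerFor(String key) {
        return writers.computeIfAbsent(key, k -> new StringWriter());
    }

    // Route a record to the writer for its key field.
    void write(String key, String record) throws IOException {
        writerFor(key).write(record + "\n");
    }

    // Inspect what was routed to a given key ("" if the key never occurred).
    String contentsFor(String key) {
        Writer w = writers.get(key);
        return w == null ? "" : w.toString();
    }

    public static void main(String[] args) throws IOException {
        KeyedWriter kw = new KeyedWriter();
        kw.write("news", "a\t1");
        kw.write("sports", "b\t2");
        kw.write("news", "c\t3");
        System.out.println(kw.contentsFor("news"));   // both "news" records
        System.out.println(kw.contentsFor("sports")); // the one "sports" record
    }
}
```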
[jira] Updated: (PIG-958) Splitting output data on key field
[ https://issues.apache.org/jira/browse/PIG-958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-958: --- Status: Patch Available (was: Open) > Splitting output data on key field > -- > > Key: PIG-958 > URL: https://issues.apache.org/jira/browse/PIG-958 > Project: Pig > Issue Type: Bug >Affects Versions: 0.4.0 >Reporter: Ankur > Attachments: 958.v2.patch > > > Pig users often face the need to split the output records into a bunch of > files and directories depending on the type of record. Pig's SPLIT operator > is useful when record types are few and known in advance. In cases where type > is not directly known but is derived dynamically from values of a key field > in the output tuple, a custom store function is a better solution. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively
[ https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759645#action_12759645 ] Pradeep Kamath commented on PIG-975: I think it might be a good idea to have a config parameter (maybe a java -D property) which allows users to choose between spillableBagForReduce and NonSpillableBagForReduce, with the non-spillable one being the default. This way, if for some reason users find the spillable bag better for their query, they can use it. > Need a databag that does not register with SpillableMemoryManager and spill > data pro-actively > - > > Key: PIG-975 > URL: https://issues.apache.org/jira/browse/PIG-975 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.2.0 >Reporter: Ying He >Assignee: Ying He > Fix For: 0.2.0 > > Attachments: internalbag.xls, PIG-975.patch, PIG-975.patch2, > PIG-975.patch3 > > > POPackage uses DefaultDataBag during the reduce process to hold data. It is > registered with SpillableMemoryManager and prone to OutOfMemoryException. > It's better to pro-actively manage the memory usage: the bag fills > memory up to a specified amount and dumps the rest to disk. The amount of > memory used to hold tuples is configurable. This can avoid out-of-memory errors. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
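A -D-based safety switch of the kind proposed here can be sketched as follows. The property name pig.reduce.bag and the bag classes below are hypothetical, chosen only to illustrate the default-to-new, fall-back-to-old behavior - they are not the names Pig actually uses:

```java
public class BagSwitchSketch {
    // Minimal stand-ins for the two bag implementations under discussion.
    interface Bag { String name(); }
    static final class SpillableBag implements Bag {
        public String name() { return "spillable"; }
    }
    static final class NonSpillableBag implements Bag {
        public String name() { return "non-spillable"; }
    }

    // Selected via e.g. java -Dpig.reduce.bag=spillable (illustrative flag);
    // the new non-spillable bag is the default, as proposed in the comment.
    static Bag newReduceBag() {
        String choice = System.getProperty("pig.reduce.bag", "non-spillable");
        return "spillable".equals(choice) ? new SpillableBag() : new NonSpillableBag();
    }

    public static void main(String[] args) {
        System.out.println(newReduceBag().name());
        System.setProperty("pig.reduce.bag", "spillable");
        System.out.println(newReduceBag().name());
    }
}
```

Once the new implementation is verified stable, removing the switch is a one-line change back to an unconditional constructor call, which matches the "safety switch" framing in the comments above.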
[jira] Commented: (PIG-970) Support of HBase 0.20.0
[ https://issues.apache.org/jira/browse/PIG-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759626#action_12759626 ] Alan Gates commented on PIG-970: In addition to adding hbase-0.20.0.jar to the lib directory did you add hbase-0.20.0-test? > Support of HBase 0.20.0 > --- > > Key: PIG-970 > URL: https://issues.apache.org/jira/browse/PIG-970 > Project: Pig > Issue Type: Improvement > Components: impl >Affects Versions: 0.3.0 >Reporter: Vincent BARAT > Attachments: pig-hbase-0.20.0-support.patch > > > The support of HBase is currently very limited and restricted to HBase 0.18.0. > Because the next releases of PIG will support Hadoop 0.20.0, they should also > support HBase 0.20.0. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively
[ https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ying He updated PIG-975: Attachment: internalbag.xls performance numbers > Need a databag that does not register with SpillableMemoryManager and spill > data pro-actively > - > > Key: PIG-975 > URL: https://issues.apache.org/jira/browse/PIG-975 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.2.0 >Reporter: Ying He >Assignee: Ying He > Fix For: 0.2.0 > > Attachments: internalbag.xls, PIG-975.patch, PIG-975.patch2, > PIG-975.patch3 > > > POPackage uses DefaultDataBag during the reduce process to hold data. It is > registered with SpillableMemoryManager and prone to OutOfMemoryException. > It's better to pro-actively manage the memory usage: the bag fills > memory up to a specified amount and dumps the rest to disk. The amount of > memory used to hold tuples is configurable. This can avoid out-of-memory errors. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively
[ https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ying He updated PIG-975: Attachment: PIG-975.patch3 remove synchronization > Need a databag that does not register with SpillableMemoryManager and spill > data pro-actively > - > > Key: PIG-975 > URL: https://issues.apache.org/jira/browse/PIG-975 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.2.0 >Reporter: Ying He >Assignee: Ying He > Fix For: 0.2.0 > > Attachments: PIG-975.patch, PIG-975.patch2, PIG-975.patch3 > > > POPackage uses DefaultDataBag during the reduce process to hold data. It is > registered with SpillableMemoryManager and prone to OutOfMemoryException. > It's better to pro-actively manage the memory usage: the bag fills > memory up to a specified amount and dumps the rest to disk. The amount of > memory used to hold tuples is configurable. This can avoid out-of-memory errors. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-942) Maps are not implicitly casted
[ https://issues.apache.org/jira/browse/PIG-942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759509#action_12759509 ] Hadoop QA commented on PIG-942: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12420393/PIG-942-2.patch against trunk revision 818175. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/45/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/45/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/45/console This message is automatically generated. > Maps are not implicitly casted > -- > > Key: PIG-942 > URL: https://issues.apache.org/jira/browse/PIG-942 > Project: Pig > Issue Type: Bug >Reporter: Sriranjan Manjunath >Assignee: Pradeep Kamath > Fix For: 0.6.0 > > Attachments: PIG-942-2.patch, PIG-942.patch > > > A = load 'foo' as (m) throws the following exception when foo has maps. 
> java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be > cast to java.util.Map > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POMapLookUp.getNext(POMapLookUp.java:98) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POMapLookUp.getNext(POMapLookUp.java:115) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:612) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:278) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:240) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93) > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227) > at > org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198) > The same works if I explicitly cast m to a map: A = load 'foo' as (m:[]) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-976) Multi-query optimization throws ClassCastException
Multi-query optimization throws ClassCastException -- Key: PIG-976 URL: https://issues.apache.org/jira/browse/PIG-976 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.4.0 Reporter: Ankur Multi-query optimization fails to merge two branches when one is the result of a Group By ALL and the other is the result of a Group By field1, where field1 is of type long. Here is the script that fails with multi-query on. data = LOAD 'test' USING PigStorage('\t') AS (a:long, b:double, c:double); A = GROUP data ALL; B = FOREACH A GENERATE SUM(data.b) AS sum1, SUM(data.c) AS sum2; C = FOREACH B GENERATE (sum1/sum2) AS rate; STORE C INTO 'result1'; D = GROUP data BY a; E = FOREACH D GENERATE group AS a, SUM(data.b), SUM(data.c); STORE E into 'result2'; Here is the exception from the logs: java.lang.ClassCastException: org.apache.pig.data.DefaultTuple cannot be cast to org.apache.pig.data.DataBag at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:399) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:180) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:145) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:197) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:235) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:254) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231) at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:240) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.runPipeline(PODemux.java:264) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.getNext(PODemux.java:254) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:196) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:174) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:63) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:906) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:786) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:228) at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2206) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-942) Maps are not implicitly casted
[ https://issues.apache.org/jira/browse/PIG-942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giridharan Kesavan updated PIG-942: --- Status: Patch Available (was: Open) > Maps are not implicitly casted > -- > > Key: PIG-942 > URL: https://issues.apache.org/jira/browse/PIG-942 > Project: Pig > Issue Type: Bug >Reporter: Sriranjan Manjunath >Assignee: Pradeep Kamath > Fix For: 0.6.0 > > Attachments: PIG-942-2.patch, PIG-942.patch > > > A = load 'foo' as (m) throws the following exception when foo has maps. > java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be > cast to java.util.Map > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POMapLookUp.getNext(POMapLookUp.java:98) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POMapLookUp.getNext(POMapLookUp.java:115) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:612) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:278) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:240) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93) > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227) > at > 
org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198) > The same works if I explicitly cast m to a map: A = load 'foo' as (m:[]) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-942) Maps are not implicitly casted
[ https://issues.apache.org/jira/browse/PIG-942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giridharan Kesavan updated PIG-942: --- Status: Open (was: Patch Available) > Maps are not implicitly casted > -- > > Key: PIG-942 > URL: https://issues.apache.org/jira/browse/PIG-942 > Project: Pig > Issue Type: Bug >Reporter: Sriranjan Manjunath >Assignee: Pradeep Kamath > Fix For: 0.6.0 > > Attachments: PIG-942-2.patch, PIG-942.patch > > > A = load 'foo' as (m) throws the following exception when foo has maps. > java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be > cast to java.util.Map > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POMapLookUp.getNext(POMapLookUp.java:98) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POMapLookUp.getNext(POMapLookUp.java:115) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:612) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:278) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:240) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93) > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227) > at > 
org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198) > The same works if I explicitly cast m to a map: A = load 'foo' as (m:[]) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.