A question regarding Storages, dependency order and java.lang.StackOverflowError
Hello,

Tested with Pig 0.12.1 and Pig 0.14.0.

I am writing here without much hope, but maybe I will get lucky and someone knows how to solve this :)

I am writing a Storage for Gora, and if I use an outer bag inside a FOREACH when storing, I get a java.lang.StackOverflowError. Exactly this:

Pig Stack Trace
---
ERROR 2998: Unhandled internal error. null

java.lang.StackOverflowError
    at sun.reflect.GeneratedConstructorAccessor4.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at java.lang.Class.newInstance(Class.java:379)
    at org.apache.pig.impl.util.Utils.mergeCollection(Utils.java:441)
    at org.apache.pig.newplan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:84)
    at org.apache.pig.newplan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:88)
    at org.apache.pig.newplan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:88)
    (this last line repeats for about 1030 lines of the log)

When doing a DUMP or storing with PigStorage everything works perfectly, so the problem is surely in my Storage implementation.

The script is as follows:

    borrar_areas_table = LOAD '.' USING org.apache.gora.pig.GoraStorage(
        'java.lang.String',
        'es.indra.innovationlabs.celtic.generated.BorrarAreas',
        'nombre') ;
    borrar_areas = FOREACH borrar_areas_table GENERATE key ;
    borrar_areas_bag = GROUP borrar_areas ALL ;

    -- [2] - Delete from webpage:
    -- experta: map area - record = hashmap,
    -- and areas: array areas = bag
    webpage = LOAD '.' USING org.apache.gora.pig.GoraStorage(
        'java.lang.String',
        'org.apache.nutch.storage.WebPage',
        'experta, areas') ;

    -- Select the pages whose areas contain any of the areas to delete (in borrar_areas_bag.borrar_areas)
    webpage_match = FILTER webpage BY bagContainsFB(areas, borrar_areas_bag.borrar_areas) ;

    -- Delete the areas (bag) and the keys in experta (map)
    webpage_fix = FOREACH webpage_match GENERATE
        key,
        deleteMapKeys(experta, borrar_areas_bag.borrar_areas) as experta,
        SUBTRACT(areas, borrar_areas_bag.borrar_areas) as areas ;

    STORE webpage_fix INTO '.' USING org.apache.gora.pig.GoraStorage(
        'java.lang.String',
        'org.apache.nutch.storage.WebPage',
        'experta, areas') ;

As a workaround I avoid borrar_areas_bag.borrar_areas and use a CROSS instead, but the execution is noticeably slower:

    borrar_areas_table = LOAD '.' USING org.apache.gora.pig.GoraStorage(
        'java.lang.String',
        'es.indra.innovationlabs.celtic.generated.BorrarAreas',
        'nombre') ;
    borrar_areas = FOREACH borrar_areas_table GENERATE key ;
    borrar_areas_bag = GROUP borrar_areas ALL ;

    -- [2] - Delete from webpage: experta: map area - record = hashmap, and areas: array areas = bag
    webpage = LOAD '.' USING org.apache.gora.pig.GoraStorage(
        'java.lang.String',
        'org.apache.nutch.storage.WebPage',
        'experta, areas') ;

    webpage_cross_areas = CROSS webpage, borrar_areas_bag ;

    -- Select the pages whose areas contain any of the areas to delete (in borrar_areas_bag::borrar_areas)
    webpage_match = FILTER webpage_cross_areas BY bagContainsFB(webpage::areas, borrar_areas_bag::borrar_areas) ;

    -- Delete the areas (bag) and the keys in experta (map)
    webpage_fix = FOREACH webpage_match GENERATE
        webpage::key AS key,
        deleteMapKeys(experta, borrar_areas_bag::borrar_areas) as experta,
        SUBTRACT(areas, borrar_areas_bag::borrar_areas) as areas:{(chararray)} ;

    STORE webpage_fix INTO '.' USING org.apache.gora.pig.GoraStorage(
        'java.lang.String',
        'org.apache.nutch.storage.WebPage',
        'experta, areas') ;

The actual question is: does this case ring a bell for anyone (outer bag in a FOREACH, Storage, dependencies, ...)? Is there some method I should implement? Is it related to some schema?

I know it is a rather vague question, so I don't expect much :( but thanks! :)

Regards,

Alfonso Nishikawa
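
Editorial note: the trace shows doAllPredecessors calling itself until the stack runs out. One possible reading (not confirmed anywhere in this thread) is that the walker keeps following predecessor links that never terminate, for example because the plan contains a cycle. The sketch below is not Pig's DependencyOrderWalker; it is a simplified, hypothetical stand-in (the class name, method names and operator names are all made up for illustration) that walks predecessors first and reports a cycle instead of overflowing the stack.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a dependency-order walk; NOT Pig's implementation.
public class DependencyWalkSketch {

    // Visit every predecessor of 'node' before 'node' itself.
    // 'inProgress' holds the nodes on the current recursion path, so a
    // repeated node means the plan has a cycle.
    static void visitPredecessors(String node, Map<String, List<String>> preds,
                                  Set<String> done, Set<String> inProgress,
                                  List<String> order) {
        if (done.contains(node)) {
            return;
        }
        if (!inProgress.add(node)) {
            // Without this guard the recursion would never end.
            throw new IllegalStateException("cycle detected at " + node);
        }
        for (String p : preds.getOrDefault(node, Collections.<String>emptyList())) {
            visitPredecessors(p, preds, done, inProgress, order);
        }
        inProgress.remove(node);
        done.add(node);
        order.add(node);
    }

    // Returns the operators so that each one appears after all its predecessors.
    static List<String> dependencyOrder(Map<String, List<String>> preds) {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        // Sorted iteration only makes the output deterministic.
        for (String node : new TreeSet<>(preds.keySet())) {
            visitPredecessors(node, preds, done, inProgress, order);
        }
        return order;
    }

    public static void main(String[] args) {
        // Made-up plan loosely shaped like the script above.
        Map<String, List<String>> plan = new HashMap<>();
        plan.put("store", Arrays.asList("foreach"));
        plan.put("foreach", Arrays.asList("filter", "group"));
        plan.put("filter", Arrays.asList("load_webpage"));
        plan.put("group", Arrays.asList("load_areas"));
        System.out.println(dependencyOrder(plan));
    }
}
```

If something like this is what happens, comparing the logical plan printed by EXPLAIN for the PigStorage version and for the custom Storage version might show where the extra dependency comes from; that is only a suggestion for narrowing it down.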
[jira] [Created] (PIG-4513) Lines dropped in delimited text when they begin with null/no-data
Madhan Sundararajan Devaki created PIG-4513:
---

Summary: Lines dropped in delimited text when they begin with null/no-data
Key: PIG-4513
URL: https://issues.apache.org/jira/browse/PIG-4513
Project: Pig
Issue Type: Bug
Components: parser, piggybank
Affects Versions: 0.12.0
Environment: CDH5.2.x, CDH5.3.x
Reporter: Madhan Sundararajan Devaki
Priority: Blocker

When Pig (0.12) is used to process delimited text files (| delimited), lines that do not contain data in the first column are dropped.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
Build failed in Jenkins: Pig-trunk-commit #2104
See https://builds.apache.org/job/Pig-trunk-commit/2104/

--
[...truncated 4407 lines...]
[junit] Running org.apache.pig.test.TestNewPlanListener
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.344 sec
[junit] Running org.apache.pig.test.TestNewPlanLogToPhyTranslationVisitor
[junit] Tests run: 27, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.269 sec
[junit] Running org.apache.pig.test.TestNewPlanLogicalOptimizer
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.247 sec
[junit] Running org.apache.pig.test.TestNewPlanOperatorPlan
[junit] Tests run: 47, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.597 sec
[junit] Running org.apache.pig.test.TestNewPlanPruneMapKeys
[junit] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.697 sec
[junit] Running org.apache.pig.test.TestNewPlanPushDownForeachFlatten
[junit] Tests run: 45, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.3 sec
[junit] Running org.apache.pig.test.TestNewPlanPushUpFilter
[junit] Tests run: 46, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.243 sec
[junit] Running org.apache.pig.test.TestNewPlanRule
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.307 sec
[junit] Running org.apache.pig.test.TestNotEqualTo
[junit] Tests run: 28, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.421 sec
[junit] Running org.apache.pig.test.TestNull
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.42 sec
[junit] Running org.apache.pig.test.TestNullConstant
[junit] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 23.026 sec
[junit] Running org.apache.pig.test.TestNumberOfReducers
[junit] Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 486.471 sec
[junit] Running org.apache.pig.test.TestOptimizeLimit
[junit] Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.115 sec
[junit] Running org.apache.pig.test.TestOrderBy3
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 13.967 sec
[junit] Running org.apache.pig.test.TestPOBinCond
[junit] Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.443 sec
[junit] Running org.apache.pig.test.TestPOCast
[junit] Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.865 sec
[junit] Running org.apache.pig.test.TestPODistinct
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.371 sec
[junit] Running org.apache.pig.test.TestPOGenerate
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.342 sec
[junit] Running org.apache.pig.test.TestPOMapLookUp
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.332 sec
[junit] Running org.apache.pig.test.TestPONegative
[junit] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.49 sec
[junit] Running org.apache.pig.test.TestPOPartialAgg
[junit] Tests run: 16, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.823 sec
[junit] Running org.apache.pig.test.TestPOPartialAggPlan
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0.224 sec
[junit] Running org.apache.pig.test.TestPORegexp
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.353 sec
[junit] Running org.apache.pig.test.TestPOSort
[junit] Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.378 sec
[junit] Running org.apache.pig.test.TestPOSplit
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.34 sec
[junit] Running org.apache.pig.test.TestPOUserFunc
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.377 sec
[junit] Running org.apache.pig.test.TestPackage
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.773 sec
[junit] Running org.apache.pig.test.TestParamSubPreproc
[junit] Tests run: 36, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.721 sec
[junit] Running org.apache.pig.test.TestParser
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.966 sec
[junit] Running org.apache.pig.test.TestPhyOp
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.415 sec
[junit] Running org.apache.pig.test.TestPhyPatternMatch
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.324 sec
[junit] Running org.apache.pig.test.TestPigContext
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 78.338 sec
[junit] Running org.apache.pig.test.TestPigContextClassCache
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.322 sec
[junit] Running org.apache.pig.test.TestPigException
[junit] Tests
[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (30 issues)
Subscriber: pigdaily

Key         Summary
PIG-4496    Fix CBZip2InputStream to close underlying stream
            https://issues.apache.org/jira/browse/PIG-4496
PIG-4494    Pig's htrace version conflicts with that of hadoop 2.6.0
            https://issues.apache.org/jira/browse/PIG-4494
PIG-4490    MIN/MAX builtin UDFs return wrong results when accumulating for strings
            https://issues.apache.org/jira/browse/PIG-4490
PIG-4481    e2e tests ComputeSpec_1, ComputeSpec_2, StreamingPerformance_3 and StreamingPerformance_4 produce different result on Windows
            https://issues.apache.org/jira/browse/PIG-4481
PIG-4468    Pig's jackson version conflicts with that of hadoop 2.6.0
            https://issues.apache.org/jira/browse/PIG-4468
PIG-4455    Should use DependencyOrderWalker instead of DepthFirstWalker in MRPrinter
            https://issues.apache.org/jira/browse/PIG-4455
PIG-4452    Embedded SQL using SQL instead of sql fails with string index out of range: -1 error
            https://issues.apache.org/jira/browse/PIG-4452
PIG-4422    Implement visitMergeJoin in SparkCompiler
            https://issues.apache.org/jira/browse/PIG-4422
PIG-4418    NullPointerException in JVMReuseImpl
            https://issues.apache.org/jira/browse/PIG-4418
PIG-4417    Pig's register command should support automatic fetching of jars from repo.
            https://issues.apache.org/jira/browse/PIG-4417
PIG-4377    Skewed outer join produce wrong result in some cases
            https://issues.apache.org/jira/browse/PIG-4377
PIG-4365    TOP udf should implement Accumulator interface
            https://issues.apache.org/jira/browse/PIG-4365
PIG-4341    Add CMX support to pig.tmpfilecompression.codec
            https://issues.apache.org/jira/browse/PIG-4341
PIG-4323    PackageConverter hanging in Spark
            https://issues.apache.org/jira/browse/PIG-4323
PIG-4313    StackOverflowError in LIMIT operation on Spark
            https://issues.apache.org/jira/browse/PIG-4313
PIG-4276    Fix ordering related failures in TestEvalPipeline for Spark
            https://issues.apache.org/jira/browse/PIG-4276
PIG-4251    Pig on Storm
            https://issues.apache.org/jira/browse/PIG-4251
PIG-4193    Make collected group work with Spark
            https://issues.apache.org/jira/browse/PIG-4193
PIG-4111    Make Pig compiles with avro-1.7.7
            https://issues.apache.org/jira/browse/PIG-4111
PIG-4004    Upgrade the Pigmix queries from the (old) mapred API to mapreduce
            https://issues.apache.org/jira/browse/PIG-4004
PIG-4002    Disable combiner when map-side aggregation is used
            https://issues.apache.org/jira/browse/PIG-4002
PIG-3952    PigStorage accepts '-tagSplit' to return full split information
            https://issues.apache.org/jira/browse/PIG-3952
PIG-3911    Define unique fields with @OutputSchema
            https://issues.apache.org/jira/browse/PIG-3911
PIG-3877    Getting Geo Latitude/Longitude from Address Lines
            https://issues.apache.org/jira/browse/PIG-3877
PIG-3873    Geo distance calculation using Haversine
            https://issues.apache.org/jira/browse/PIG-3873
PIG-3866    Create ThreadLocal classloader per PigContext
            https://issues.apache.org/jira/browse/PIG-3866
PIG-3851    Upgrade jline to 2.11
            https://issues.apache.org/jira/browse/PIG-3851
PIG-3668    COR built-in function when atleast one of the coefficient values is NaN
            https://issues.apache.org/jira/browse/PIG-3668
PIG-3635    Fix e2e tests for Hadoop 2.X on Windows
            https://issues.apache.org/jira/browse/PIG-3635
PIG-3587    add functionality for rolling over dates
            https://issues.apache.org/jira/browse/PIG-3587

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=16328&filterId=12322384
[jira] [Commented] (PIG-4365) TOP udf should implement Accumulator interface
[ https://issues.apache.org/jira/browse/PIG-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507782#comment-14507782 ]

Rohini Palaniswamy commented on PIG-4365:
---

[~eyal],
Where it returned null before, it now returns an empty bag. That needs to be fixed. Could you also add a test with an actual Pig script and a small batch size, so that the full code path is exercised in the test? Refer to TestAccumulator for an example. Can you also post the new patch on the review board (reviews.apache.org)?

TOP udf should implement Accumulator interface
---

Key: PIG-4365
URL: https://issues.apache.org/jira/browse/PIG-4365
Project: Pig
Issue Type: Task
Affects Versions: 0.15.0
Reporter: Rohini Palaniswamy
Assignee: Eyal Allweil
Labels: newbie
Fix For: 0.15.0
Attachments: PIG-4365.patch

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (PIG-4514) pig trunk compilation is broken - VertexManagerPluginContext.reconfigureVertex change
Thejas M Nair created PIG-4514:
---

Summary: pig trunk compilation is broken - VertexManagerPluginContext.reconfigureVertex change
Key: PIG-4514
URL: https://issues.apache.org/jira/browse/PIG-4514
Project: Pig
Issue Type: Bug
Affects Versions: 0.15.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Fix For: 0.15.0

{code}
src/org/apache/pig/backend/hadoop/executionengine/tez/runtime/PigGraceShuffleVertexManager.java:173: error: exception TezException is never thrown in body of corresponding try statement
[javac] } catch (TezException e) {
[javac] ^
{code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PIG-4514) pig trunk compilation is broken - VertexManagerPluginContext.reconfigureVertex change
[ https://issues.apache.org/jira/browse/PIG-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated PIG-4514:
---
Attachment: PIG-4514.1.patch

pig trunk compilation is broken - VertexManagerPluginContext.reconfigureVertex change
---

Key: PIG-4514
URL: https://issues.apache.org/jira/browse/PIG-4514
Project: Pig
Issue Type: Bug
Affects Versions: 0.15.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Fix For: 0.15.0
Attachments: PIG-4514.1.patch

{code}
src/org/apache/pig/backend/hadoop/executionengine/tez/runtime/PigGraceShuffleVertexManager.java:173: error: exception TezException is never thrown in body of corresponding try statement
[javac] } catch (TezException e) {
[javac] ^
{code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (PIG-4295) Enable unit test TestPigContext for spark
[ https://issues.apache.org/jira/browse/PIG-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liyunzhang_intel reassigned PIG-4295:
---
Assignee: liyunzhang_intel

Enable unit test TestPigContext for spark
---

Key: PIG-4295
URL: https://issues.apache.org/jira/browse/PIG-4295
Project: Pig
Issue Type: Sub-task
Components: spark
Reporter: liyunzhang_intel
Assignee: liyunzhang_intel
Fix For: spark-branch
Attachments: TEST-org.apache.pig.test.TestPigContext.txt

The error log is attached.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PIG-4513) Lines dropped in delimited text when they begin with null/no-data
[ https://issues.apache.org/jira/browse/PIG-4513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-4513:
---
Fix Version/s: 0.15.0

Lines dropped in delimited text when they begin with null/no-data
---

Key: PIG-4513
URL: https://issues.apache.org/jira/browse/PIG-4513
Project: Pig
Issue Type: Bug
Components: parser, piggybank
Affects Versions: 0.12.0
Environment: CDH5.2.x, CDH5.3.x
Reporter: Madhan Sundararajan Devaki
Priority: Blocker
Fix For: 0.15.0

When Pig (0.12) is used to process delimited text files (| delimited), lines that do not contain data in the first column are dropped.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PIG-4490) MIN/MAX builtin UDFs return wrong results when accumulating for strings
[ https://issues.apache.org/jira/browse/PIG-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-4490:
---
Fix Version/s: 0.15.0
Assignee: xplenty

[~opensou...@xplenty.com],
What problems do you have with compiling? Please ensure that the test cases you have added fail without the fix and pass with it. Let me know if you need any help or clarification, as I would like to get this patch into Pig 0.15.

MIN/MAX builtin UDFs return wrong results when accumulating for strings
---

Key: PIG-4490
URL: https://issues.apache.org/jira/browse/PIG-4490
Project: Pig
Issue Type: Bug
Components: internal-udfs
Affects Versions: 0.12.0, 0.13.0, 0.14.0
Reporter: xplenty
Assignee: xplenty
Fix For: 0.15.0
Attachments: fix-min-max-test.patch, fix-min-max.patch

When the MIN/MAX UDFs are used with strings in a job that uses the accumulator interface, the results are wrong: the UDF does not return the correct MIN/MAX value. This is caused by a reversed GreaterThan/SmallerThan comparison sign in the accumulate() function of both the StringMin and StringMax UDFs.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
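
Editorial note: to illustrate the kind of bug described in PIG-4490, here is a minimal sketch of an accumulating string maximum. It is not the code of Pig's StringMax and does not use Pig's Accumulator interface or DataBag types; the class name, the List<String> batches and the method bodies are assumptions made only to show how the direction of one comparison decides between MAX and MIN.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified accumulator; NOT Pig's StringMax implementation.
public class StringMaxAccumulatorSketch {

    // Running maximum across all batches seen so far; null means "no value yet".
    private String intermediateMax = null;

    // Called once per batch of values, in the style of an accumulating UDF.
    public void accumulate(List<String> batch) {
        for (String s : batch) {
            if (s == null) {
                continue; // ignore nulls in this sketch
            }
            // Replace the running value only when s is strictly greater.
            // Writing "< 0" here instead would silently compute the minimum,
            // which is the kind of reversed sign the issue describes.
            if (intermediateMax == null || s.compareTo(intermediateMax) > 0) {
                intermediateMax = s;
            }
        }
    }

    // Result after all batches have been accumulated.
    public String getValue() {
        return intermediateMax;
    }

    // Reset the state before the next group.
    public void cleanup() {
        intermediateMax = null;
    }

    public static void main(String[] args) {
        StringMaxAccumulatorSketch max = new StringMaxAccumulatorSketch();
        max.accumulate(Arrays.asList("pear", "apple"));
        max.accumulate(Arrays.asList("zebra", "mango"));
        System.out.println(max.getValue());
    }
}
```

Feeding the values in more than one batch matters: a test that passes everything in a single call may not exercise the comparison against a previously stored intermediate value.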
Re: Branching Pig 0.15
+1 for frequent releases.

On Fri, Apr 17, 2015 at 8:38 PM, Daniel Dai da...@hortonworks.com wrote:

It has been almost 5 months since Pig 0.14.0 was released, and we have added Hive UDF support, Tez grace parallelism, numerous Tez fixes and quite a few other patches. I would like to branch 0.15 by Wednesday next week. We can continue to check important bug fixes into 0.15 after branching. Any objections?

Thanks,
Daniel
[jira] [Commented] (PIG-4513) Lines dropped in delimited text when they begin with null/no-data
[ https://issues.apache.org/jira/browse/PIG-4513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508038#comment-14508038 ]

Rohini Palaniswamy commented on PIG-4513:
---

This sounds bad. Can you add a reproducible script with data to the jira?

Lines dropped in delimited text when they begin with null/no-data
---

Key: PIG-4513
URL: https://issues.apache.org/jira/browse/PIG-4513
Project: Pig
Issue Type: Bug
Components: parser, piggybank
Affects Versions: 0.12.0
Environment: CDH5.2.x, CDH5.3.x
Reporter: Madhan Sundararajan Devaki
Priority: Blocker

When Pig (0.12) is used to process delimited text files (| delimited), lines that do not contain data in the first column are dropped.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
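
Editorial note: the report does not include sample data, so the following is only a sketch of what a reproduction input and the expected row count might look like. The sample lines, the class name and the convention of mapping an empty field to null are assumptions, and the code is plain Java string splitting, not Pig's loader. It shows that a line whose first column is empty should still yield a row with the same number of fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical reference parser for delimited text; NOT Pig's PigStorage.
public class LeadingEmptyFieldSketch {

    // Splits each line on the delimiter and keeps every line, including
    // those whose first column is empty. Empty fields become null.
    static List<String[]> parse(List<String> lines, char delimiter) {
        // Quote the delimiter because '|' is a regex metacharacter.
        String regex = Pattern.quote(String.valueOf(delimiter));
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            // Limit -1 keeps trailing empty fields as well.
            String[] fields = line.split(regex, -1);
            for (int i = 0; i < fields.length; i++) {
                if (fields[i].isEmpty()) {
                    fields[i] = null;
                }
            }
            rows.add(fields);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample data: the second and third lines start with no data.
        List<String> sample = Arrays.asList("a|b|c", "|b|c", "||c");
        for (String[] row : parse(sample, '|')) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Comparing the number of rows a Pig script returns for such a file with the number of input lines would give the kind of reproducible case requested in the comment above.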