Hello,

Tested with Pig 0.12.1 and Pig 0.14.0.

I am writing here without much hope, but maybe I will get lucky and someone knows how to solve this :) I am writing a Storage for Gora, and if I use an outer bag inside a FOREACH when storing, I get a java.lang.StackOverflowError. Exactly this:

Pig Stack Trace
---------------
ERROR 2998: Unhandled internal error. null

java.lang.StackOverflowError
    at sun.reflect.GeneratedConstructorAccessor4.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at java.lang.Class.newInstance(Class.java:379)
    at org.apache.pig.impl.util.Utils.mergeCollection(Utils.java:441)
    at org.apache.pig.newplan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:84)
    at org.apache.pig.newplan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:88)
    at org.apache.pig.newplan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:88)
    (this last line repeats to fill 1030 lines of the log)

When doing a DUMP or using PigStorage everything works perfectly, so the problem is surely in my Storage implementation. The script is as follows:

borrar_areas_table = LOAD '.'
    USING org.apache.gora.pig.GoraStorage(
        'java.lang.String',
        'es.indra.innovationlabs.celtic.generated.BorrarAreas',
        'nombre') ;

borrar_areas = FOREACH borrar_areas_table GENERATE key ;
borrar_areas_bag = GROUP borrar_areas ALL ;

-- [2] - Delete from webpage:
--   experta: map <area> -> record = hashmap,
--   and areas: array <areas> = bag
webpage = LOAD '.'
    USING org.apache.gora.pig.GoraStorage(
        'java.lang.String',
        'org.apache.nutch.storage.WebPage',
        'experta, areas') ;

-- Select the pages whose <areas> contains any of the areas to delete (in borrar_areas_bag.borrar_areas)
webpage_match = FILTER webpage BY bagContainsFB(areas, borrar_areas_bag.borrar_areas) ;

-- Delete the areas (bag) and the keys in experta (map)
webpage_fix = FOREACH webpage_match GENERATE
    key,
    deleteMapKeys(experta, borrar_areas_bag.borrar_areas) as experta,
    SUBTRACT(areas, borrar_areas_bag.borrar_areas) as areas ;

STORE webpage_fix INTO '.'
    USING org.apache.gora.pig.GoraStorage(
        'java.lang.String',
        'org.apache.nutch.storage.WebPage',
        'experta, areas') ;

I had to use a workaround to get things done, avoiding borrar_areas_bag.borrar_areas and using a CROSS instead, but the execution is noticeably slower:

borrar_areas_table = LOAD '.'
    USING org.apache.gora.pig.GoraStorage(
        'java.lang.String',
        'es.indra.innovationlabs.celtic.generated.BorrarAreas',
        'nombre') ;

borrar_areas = FOREACH borrar_areas_table GENERATE key ;
borrar_areas_bag = GROUP borrar_areas ALL ;

-- [2] - Delete from webpage: experta: map <area> -> record = hashmap, and areas: array <areas> = bag
webpage = LOAD '.'
    USING org.apache.gora.pig.GoraStorage(
        'java.lang.String',
        'org.apache.nutch.storage.WebPage',
        'experta, areas') ;

webpage_cross_areas = CROSS webpage, borrar_areas_bag ;

-- Select the pages whose <areas> contains any of the areas to delete (in borrar_areas_bag::borrar_areas)
webpage_match = FILTER webpage_cross_areas BY bagContainsFB(webpage::areas, borrar_areas_bag::borrar_areas) ;

-- Delete the areas (bag) and the keys in experta (map)
webpage_fix = FOREACH webpage_match GENERATE
    webpage::key AS key,
    deleteMapKeys(experta, borrar_areas_bag::borrar_areas) as experta,
    SUBTRACT(areas, borrar_areas_bag::borrar_areas) as areas:{(chararray)} ;

STORE webpage_fix INTO '.'
    USING org.apache.gora.pig.GoraStorage(
        'java.lang.String',
        'org.apache.nutch.storage.WebPage',
        'experta, areas') ;

The actual question is: does this case ring a bell for anyone? An outer bag in a FOREACH, Storage, dependencies, ... Is there any method that I should implement? Is it related to some schema?

I know it is a bit of a nonsense question, so I don't expect any ideas :( but thanks! :)

Regards,

Alfonso Nishikawa