Hello,

Tested with Pig 0.12.1 and Pig 0.14.0.

I am writing here without much hope, but maybe I will get lucky and someone
knows how to solve this :)

I am writing a Storage for Gora, and if I use an outer bag inside a
foreach when storing, I get a java.lang.StackOverflowError.

Exactly this:

Pig Stack Trace
---------------
ERROR 2998: Unhandled internal error. null

java.lang.StackOverflowError
        at sun.reflect.GeneratedConstructorAccessor4.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at java.lang.Class.newInstance(Class.java:379)
        at org.apache.pig.impl.util.Utils.mergeCollection(Utils.java:441)
        at org.apache.pig.newplan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:84)
        at org.apache.pig.newplan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:88)
        at org.apache.pig.newplan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:88)
(this last line fills another 1030 lines of the log)
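
For illustration only: the repeated doAllPredecessors frame is the shape of a recursive predecessor walk that never reaches its end. The sketch below is not Pig's source code; the class PredecessorWalkSketch, its methods, and the node names are all made up, and whether a cycle in the plan is what actually happens here is an assumption. It only shows that a walk which marks a node as seen after visiting its predecessors terminates on an acyclic graph and overflows the stack on a cyclic one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration, NOT Pig's DependencyOrderWalker.
public class PredecessorWalkSketch {

    // Visit all predecessors of a node before the node itself.
    // The node is marked as seen only AFTER its predecessors are done,
    // so a cycle is never detected and the recursion does not terminate.
    static void walk(Map<String, List<String>> preds, String node,
                     Set<String> seen, List<String> order) {
        if (seen.contains(node)) {
            return;
        }
        for (String p : preds.getOrDefault(node, Collections.<String>emptyList())) {
            walk(preds, p, seen, order);
        }
        seen.add(node);
        order.add(node);
    }

    // Returns true if walking from start ends in a StackOverflowError.
    static boolean overflows(Map<String, List<String>> preds, String start) {
        try {
            walk(preds, start, new HashSet<String>(), new ArrayList<String>());
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Acyclic chain (assumed names): store <- foreach <- load
        Map<String, List<String>> acyclic = new HashMap<>();
        acyclic.put("store", Arrays.asList("foreach"));
        acyclic.put("foreach", Arrays.asList("load"));
        System.out.println("acyclic overflows: " + overflows(acyclic, "store"));

        // Same chain, plus an assumed extra link from load back to store.
        Map<String, List<String>> cyclic = new HashMap<>(acyclic);
        cyclic.put("load", Arrays.asList("store"));
        System.out.println("cyclic overflows: " + overflows(cyclic, "store"));
    }
}
```

If the real plan does contain such a loop, the place to look would be whatever adds the extra dependency when the outer bag is referenced inside the foreach, but that is a guess.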

With a DUMP or with PigStorage everything works perfectly, so the problem
is surely in my Storage implementation.

The script is as follows:


borrar_areas_table = LOAD '.'
               USING org.apache.gora.pig.GoraStorage(
                  'java.lang.String',
                  'es.indra.innovationlabs.celtic.generated.BorrarAreas',
                  'nombre') ;

borrar_areas = FOREACH borrar_areas_table GENERATE key ;
borrar_areas_bag = GROUP borrar_areas ALL ;

-- [2] - Delete from webpage:
--          experta: map <area> -> record = hashmap,
--        and areas: array <areas> = bag
webpage = LOAD '.'
          USING org.apache.gora.pig.GoraStorage(
                  'java.lang.String',
                  'org.apache.nutch.storage.WebPage',
                  'experta, areas') ;

    -- Select the pages whose <areas> contains any of the areas to delete
    -- (in borrar_areas_bag.borrar_areas)
    webpage_match = FILTER webpage
                    BY bagContainsFB(areas, borrar_areas_bag.borrar_areas) ;
    -- Delete the areas (bag) and the keys in experta (map)
    webpage_fix   = FOREACH webpage_match
                    GENERATE key,
                             deleteMapKeys(experta, borrar_areas_bag.borrar_areas) as experta,
                             SUBTRACT(areas, borrar_areas_bag.borrar_areas) as areas ;

    STORE webpage_fix INTO '.' USING org.apache.gora.pig.GoraStorage(
                         'java.lang.String',
                         'org.apache.nutch.storage.WebPage',
                         'experta, areas') ;

To get things done I had to work around it by avoiding
borrar_areas_bag.borrar_areas and using a CROSS instead, but execution
is noticeably slower:

borrar_areas_table = LOAD '.'
               USING org.apache.gora.pig.GoraStorage(
                  'java.lang.String',
                  'es.indra.innovationlabs.celtic.generated.BorrarAreas',
                  'nombre') ;

borrar_areas = FOREACH borrar_areas_table GENERATE key ;
borrar_areas_bag = GROUP borrar_areas ALL ;

-- [2] - Delete from webpage: experta: map <area> -> record = hashmap,
--       and areas: array <areas> = bag
webpage = LOAD '.'
          USING org.apache.gora.pig.GoraStorage(
                  'java.lang.String',
                  'org.apache.nutch.storage.WebPage',
                  'experta, areas') ;

    webpage_cross_areas = CROSS webpage, borrar_areas_bag ;
    -- Select the pages whose <areas> contains any of the areas to delete
    -- (in borrar_areas_bag::borrar_areas)
    webpage_match = FILTER webpage_cross_areas
                    BY bagContainsFB(webpage::areas, borrar_areas_bag::borrar_areas) ;
    -- Delete the areas (bag) and the keys in experta (map)
    webpage_fix   = FOREACH webpage_match
                    GENERATE webpage::key AS key,
                             deleteMapKeys(experta, borrar_areas_bag::borrar_areas) as experta,
                             SUBTRACT(areas, borrar_areas_bag::borrar_areas) as areas:{(chararray)} ;

    STORE webpage_fix INTO '.' USING org.apache.gora.pig.GoraStorage(
                         'java.lang.String',
                         'org.apache.nutch.storage.WebPage',
                         'experta, areas') ;


The actual question is: does this case ring a bell for anyone? Outer bag in
a foreach, Storage, dependencies, ...
Is there any method that I should implement? Is it related to some schema?

I know it is quite a nonsensical question, so I don't expect any ideas :( but
thanks! :)

Regards,

Alfonso Nishikawa
