[ 
https://issues.apache.org/jira/browse/PIG-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12571825#action_12571825
 ] 

Pi Song commented on PIG-114:
-----------------------------

Even if both StoreFunc and LoadFunc exist in the custom storage class, there is 
no guarantee that 
{noformat}
LoadFunc(StoreFunc(x)) = x
{noformat}
because the implementation is left to users.
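
To make the property concrete, here is a minimal sketch. The two "store" 
functions and the "load" function are hypothetical stand-ins written as plain 
java.util.function.Function values; they are not Pig's real StoreFunc/LoadFunc 
interfaces, and the class name RoundTripDemo is invented for illustration.

```java
import java.util.Locale;
import java.util.function.Function;

// Hypothetical stand-ins for a user-written StoreFunc/LoadFunc pair. These are
// NOT Pig's real interfaces, only functions from a value to its stored text
// and back.
final class RoundTripDemo {
    // A "store" that rounds to two decimal places when writing.
    static final Function<Double, String> LOSSY_STORE =
            x -> String.format(Locale.ROOT, "%.2f", x);

    // A "store" that writes the full value.
    static final Function<Double, String> EXACT_STORE = x -> Double.toString(x);

    // A single "load" used with both stores.
    static final Function<String, Double> LOAD = Double::parseDouble;

    // True when load(store(x)) equals x for the given value.
    static boolean roundTrips(Function<Double, String> store,
                              Function<String, Double> load, double x) {
        return load.apply(store.apply(x)).equals(x);
    }

    public static void main(String[] args) {
        double x = 3.14159;
        System.out.println("lossy: " + roundTrips(LOSSY_STORE, LOAD, x));
        System.out.println("exact: " + roundTrips(EXACT_STORE, LOAD, x));
    }
}
```

With the lossy store the value comes back as 3.14, so the check fails. With 
the exact store the value survives the round trip.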

By the definition of an optimization (here we are only interested in the 
output), the output must be the same whether or not the optimization is applied.

Reading back the output of the "Store" operator is therefore "unsafe" as an 
optimization.

My suggestions would be:
1. By default, go back to the intermediate result before "Store", since this 
relies only on the StoreFunc and LoadFunc of PigStorage (assuming the execution 
plan is not merely load-and-then-store). This incurs some performance hit, 
though, because the output of the MapReduce run associated with the "Store" 
operator cannot be reused.
2. Provide a way for users implementing storage to tell the execution engine 
that their LoadFunc is truly the inverse of their StoreFunc, so that the 
execution engine can take advantage of that and doesn't have to go back to the 
intermediate result before "Store".
3. All built-in storage implementations should be truly reversible.
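
One possible shape for suggestions 1 and 2 is an opt-in marker interface. The 
sketch below is only an assumption about how that could look: 
ReversibleStorage, CustomStorage, BuiltinStorage and ReusePlanner are 
hypothetical names and do not exist in Pig.

```java
// Hypothetical sketch: none of these types exist in Pig under these names.

// Marker a storage author would opt into to promise load(store(x)) == x.
interface ReversibleStorage {}

// Stand-in for a user storage class that makes no such promise.
final class CustomStorage {}

// Stand-in for a built-in storage that is reversible by design.
final class BuiltinStorage implements ReversibleStorage {}

final class ReusePlanner {
    enum Source { STORED_OUTPUT, INTERMEDIATE_RESULT }

    // Pick where a later plan should read from after an alias was stored.
    // Default is the safe choice; stored output is reused only on opt-in.
    static Source chooseSource(Object storage) {
        return (storage instanceof ReversibleStorage)
                ? Source.STORED_OUTPUT
                : Source.INTERMEDIATE_RESULT;
    }

    public static void main(String[] args) {
        System.out.println(chooseSource(new CustomStorage()));
        System.out.println(chooseSource(new BuiltinStorage()));
    }
}
```

The default stays safe: storage that does not opt in falls back to the 
intermediate result, and only storage that declares itself reversible has its 
stored output reused.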

> store one alias/logicalPlan twice leads to instantiation of StoreFunc as 
> LoadFunc
> ---------------------------------------------------------------------------------
>
>                 Key: PIG-114
>                 URL: https://issues.apache.org/jira/browse/PIG-114
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Johannes Zillmann
>         Attachments: pigPatch-storeTwice-620665.patch
>
>
> Calling PigServer#store() twice for an alias results in the following exception:
> {noformat}
> java.lang.RuntimeException: java.lang.ClassCastException: org.apache.pig.test.DummyStoreFunc cannot be cast to org.apache.pig.LoadFunc
>       at org.apache.pig.backend.local.executionengine.POLoad.<init>(POLoad.java:59)
>       at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.doCompile(LocalExecutionEngine.java:167)
>       at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.doCompile(LocalExecutionEngine.java:184)
>       at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.doCompile(LocalExecutionEngine.java:184)
>       at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.compile(LocalExecutionEngine.java:111)
>       at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.compile(LocalExecutionEngine.java:90)
>       at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.compile(LocalExecutionEngine.java:1)
>       at org.apache.pig.PigServer.store(PigServer.java:330)
>       at org.apache.pig.PigServer.store(PigServer.java:317)
>       at org.apache.pig.test.StoreTwiceTest.testIt(StoreTwiceTest.java:31)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:589)
>       at junit.framework.TestCase.runTest(TestCase.java:164)
>       at junit.framework.TestCase.runBare(TestCase.java:130)
>       at junit.framework.TestResult$1.protect(TestResult.java:110)
>       at junit.framework.TestResult.runProtected(TestResult.java:128)
>       at junit.framework.TestResult.run(TestResult.java:113)
>       at junit.framework.TestCase.run(TestCase.java:120)
>       at junit.framework.TestSuite.runTest(TestSuite.java:228)
>       at junit.framework.TestSuite.run(TestSuite.java:223)
>       at org.junit.internal.runners.OldTestClassRunner.run(OldTestClassRunner.java:35)
>       at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:45)
>       at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
>       at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)
>       at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)
>       at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)
>       at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)
> Caused by: java.lang.ClassCastException: org.apache.pig.test.DummyStoreFunc cannot be cast to org.apache.pig.LoadFunc
>       at org.apache.pig.backend.local.executionengine.POLoad.<init>(POLoad.java:57)
>       ... 28 more
> {noformat}
> I will attach a patch with a test scenario for this. Basically the code is as 
> follows:
> {noformat}
> PigServer pig = new PigServer(ExecType.LOCAL);
> pig.registerQuery("A = LOAD 'test/org/apache/pig/test/StoreTwiceTest.java' USING "
>         + DummyLoadFunc.class.getName() + "();");
> pig.registerQuery("B = FOREACH A GENERATE * ;");
> File outputFile = new File("/tmp/testPigOutput");
> outputFile.delete();
> pig.store("A", outputFile.getAbsolutePath(), DummyStoreFunc.class.getName() + "()");
> outputFile.delete();
> pig.store("B", outputFile.getAbsolutePath(), DummyStoreFunc.class.getName() + "()");
> outputFile.delete();
> assertEquals(2, _storedTuples.size());
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.