[ 
https://issues.apache.org/jira/browse/PIG-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039583#comment-15039583
 ] 

liyunzhang_intel commented on PIG-4594:
---------------------------------------

[~mohitsabharwal] and [~kexianda]:
Let me explain why we need to add the forceConnect method to PhysicalPlan.java:
{quote}
 If we want to connect x to y and z, the MR implementation clones x into two 
copies, x1 and x2. x1 will be connected to y, and x2 will be connected to z.
{quote}
I agree with Xianda. Here is an example:
{code}
cat bin/testMultiQueryJiraPig983_2.pig
a = load './passwd' using PigStorage(':') as (uname:chararray, passwd:chararray, uid:int, gid:int);
b = filter a by uid < 5;
c = filter a by uid >= 5;
d = join b by uname, c by uname;
store d into './testMultiQueryJiraPig983_2.out'; 
{code}   
As the following result shows, after multiquery optimization scope-57 and 
scope-67 are actually the same. In the MR implementation, scope-67 is a copy of 
scope-57, made to avoid the exception 
"This operator does not support multiple outputs". We need to do the load 
*twice* even though both loads are the *same*.
{code}
before multiquery optimization:
#--------------------------------------------------
# Map Reduce Plan                                 
#--------------------------------------------------
MapReduce node scope-39
Map Plan
Store(hdfs://zly1.sh.intel.com:8020/tmp/temp45078980/tmp-1782295863:org.apache.pig.impl.io.InterStorage) - scope-40
|
|---a: New For Each(false,false,false,false)[bag] - scope-13
    |   |
    |   Cast[chararray] - scope-2
    |   |
    |   |---Project[bytearray][0] - scope-1
    |   |
    |   Cast[chararray] - scope-5
    |   |
    |   |---Project[bytearray][1] - scope-4
    |   |
    |   Cast[int] - scope-8
    |   |
    |   |---Project[bytearray][2] - scope-7
    |   |
    |   Cast[int] - scope-11
    |   |
    |   |---Project[bytearray][3] - scope-10
    |
    |---a: Load(hdfs://zly1.sh.intel.com:8020/user/root/passwd:PigStorage(':')) - scope-0--------
Global sort: false
----------------

MapReduce node scope-45
Map Plan
Union[tuple] - scope-46
|
|---d: Local Rearrange[tuple]{chararray}(false) - scope-31
|   |   |
|   |   Project[chararray][0] - scope-32
|   |
|   |---b: Filter[bag] - scope-17
|       |   |
|       |   Less Than[boolean] - scope-20
|       |   |
|       |   |---Project[int][2] - scope-18
|       |   |
|       |   |---Constant(5) - scope-19
|       |
|       |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp45078980/tmp-1782295863:org.apache.pig.impl.io.InterStorage) - scope-41
|
|---d: Local Rearrange[tuple]{chararray}(false) - scope-33
    |   |
    |   Project[chararray][0] - scope-34
    |
    |---c: Filter[bag] - scope-23
        |   |
        |   Greater Than or Equal[boolean] - scope-26
        |   |
        |   |---Project[int][2] - scope-24
        |   |
        |   |---Constant(5) - scope-25
        |
        |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp45078980/tmp-1782295863:org.apache.pig.impl.io.InterStorage) - scope-43--------
Reduce Plan
d: Store(hdfs://zly1.sh.intel.com:8020/user/root/testMultiQueryJiraPig983_2.out:org.apache.pig.builtin.PigStorage) - scope-38
|
|---d: Package(JoinPackager(true,true))[tuple]{chararray} - scope-30--------
Global sort: false
----------------

after multiquery optimization:
#--------------------------------------------------
# Map Reduce Plan                                 
#--------------------------------------------------
MapReduce node scope-45
Map Plan
Union[tuple] - scope-46
|
|---d: Local Rearrange[tuple]{chararray}(false) - scope-31
|   |   |
|   |   Project[chararray][0] - scope-32
|   |
|   |---b: Filter[bag] - scope-17
|       |   |
|       |   Less Than[boolean] - scope-20
|       |   |
|       |   |---Project[int][2] - scope-18
|       |   |
|       |   |---Constant(5) - scope-19
|       |
|       |---a: New For Each(false,false,false,false)[bag] - scope-67
|           |   |
|           |   Cast[chararray] - scope-60
|           |   |
|           |   |---Project[bytearray][0] - scope-59
|           |   |
|           |   Cast[chararray] - scope-62
|           |   |
|           |   |---Project[bytearray][1] - scope-61
|           |   |
|           |   Cast[int] - scope-64
|           |   |
|           |   |---Project[bytearray][2] - scope-63
|           |   |
|           |   Cast[int] - scope-66
|           |   |
|           |   |---Project[bytearray][3] - scope-65
|           |
|           |---a: Load(hdfs://zly1.sh.intel.com:8020/user/root/passwd:PigStorage(':')) - scope-58
|
|---d: Local Rearrange[tuple]{chararray}(false) - scope-33
    |   |
    |   Project[chararray][0] - scope-34
    |
    |---c: Filter[bag] - scope-23
        |   |
        |   Greater Than or Equal[boolean] - scope-26
        |   |
        |   |---Project[int][2] - scope-24
        |   |
        |   |---Constant(5) - scope-25
        |
        |---a: New For Each(false,false,false,false)[bag] - scope-57
            |   |
            |   Cast[chararray] - scope-50
            |   |
            |   |---Project[bytearray][0] - scope-49
            |   |
            |   Cast[chararray] - scope-52
            |   |
            |   |---Project[bytearray][1] - scope-51
            |   |
            |   Cast[int] - scope-54
            |   |
            |   |---Project[bytearray][2] - scope-53
            |   |
            |   Cast[int] - scope-56
            |   |
            |   |---Project[bytearray][3] - scope-55
            |
            |---a: Load(hdfs://zly1.sh.intel.com:8020/user/root/passwd:PigStorage(':')) - scope-48--------
Reduce Plan
d: Store(hdfs://zly1.sh.intel.com:8020/user/root/testMultiQueryJiraPig983_2.out:org.apache.pig.builtin.PigStorage) - scope-38
|
|---d: Package(JoinPackager(true,true))[tuple]{chararray} - scope-30--------
Global sort: false
----------------                  
{code}
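The cloning that shows up in the MR plan above can be illustrated with a small Java sketch. It is only a toy model under stated assumptions: `CloneSketch`, `Op`, `Plan` and `copy` are hypothetical names, not Pig classes, and the check in `connect` merely imitates the rule behind the "This operator does not support multiple outputs" exception.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a plan; hypothetical, not Pig's PhysicalPlan. */
final class CloneSketch {

    /** A toy operator; the flag says whether it may feed more than one successor. */
    static final class Op {
        final String name;
        final boolean supportsMultipleOutputs;

        Op(String name, boolean supportsMultipleOutputs) {
            this.name = name;
            this.supportsMultipleOutputs = supportsMultipleOutputs;
        }

        /** Returns an independent copy, like x1 and x2 in the quoted explanation. */
        Op copy(String newName) {
            return new Op(newName, supportsMultipleOutputs);
        }
    }

    static final class Plan {
        // Operators are keyed by identity because Op does not override equals/hashCode.
        final Map<Op, List<Op>> successors = new LinkedHashMap<>();

        /** Adds an edge, rejecting a second output on a single-output operator. */
        void connect(Op from, Op to) {
            List<Op> out = successors.computeIfAbsent(from, k -> new ArrayList<>());
            if (!out.isEmpty() && !from.supportsMultipleOutputs) {
                throw new IllegalStateException(
                        "This operator does not support multiple outputs: " + from.name);
            }
            out.add(to);
        }
    }

    public static void main(String[] args) {
        Op x = new Op("x", false);
        Op y = new Op("y", false);
        Op z = new Op("z", false);

        // MR-style workaround: clone x so that each copy has exactly one output.
        Plan plan = new Plan();
        plan.connect(x.copy("x1"), y);
        plan.connect(x.copy("x2"), z);

        // Every key in the map is a separate copy of x that must be evaluated.
        System.out.println("copies of x in the plan: " + plan.successors.size());
    }
}
```

In this model each copy of x is evaluated separately, which corresponds to the two identical loads (scope-48 and scope-58) in the MR plan after multiquery optimization.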
*But* with Spark multiquery optimization we *don't* need to do the load 
*twice* (there is only one load, *scope-0*).
The result in Spark mode:
{code}
before multiquery optimization:
scope-39->scope-45
scope-45
#--------------------------------------------------
# Spark Plan                                 
#--------------------------------------------------

Spark node scope-39
Store(hdfs://zly1.sh.intel.com:8020/tmp/temp1918416213/tmp1819996690:org.apache.pig.impl.io.InterStorage) - scope-40
|
|---a: New For Each(false,false,false,false)[bag] - scope-13
    |   |
    |   Cast[chararray] - scope-2
    |   |
    |   |---Project[bytearray][0] - scope-1
    |   |
    |   Cast[chararray] - scope-5
    |   |
    |   |---Project[bytearray][1] - scope-4
    |   |
    |   Cast[int] - scope-8
    |   |
    |   |---Project[bytearray][2] - scope-7
    |   |
    |   Cast[int] - scope-11
    |   |
    |   |---Project[bytearray][3] - scope-10
    |
    |---a: Load(hdfs://zly1.sh.intel.com:8020/user/root/passwd:PigStorage(':')) - scope-0--------

Spark node scope-45
d: Store(hdfs://zly1.sh.intel.com:8020/user/root/testMultiQueryJiraPig983_2.out:org.apache.pig.builtin.PigStorage) - scope-38
|
|---d: New For Each(true,true)[tuple] - scope-37
    |   |
    |   Project[bag][1] - scope-35
    |   |
    |   Project[bag][2] - scope-36
    |
    |---d: Package(Packager)[tuple]{chararray} - scope-30
        |
        |---d: Global Rearrange[tuple] - scope-29
            |
            |---d: Local Rearrange[tuple]{chararray}(false) - scope-31
            |   |   |
            |   |   Project[chararray][0] - scope-32
            |   |
            |   |---b: Filter[bag] - scope-17
            |       |   |
            |       |   Less Than[boolean] - scope-20
            |       |   |
            |       |   |---Project[int][2] - scope-18
            |       |   |
            |       |   |---Constant(5) - scope-19
            |       |
            |       |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp1918416213/tmp1819996690:org.apache.pig.impl.io.InterStorage) - scope-41
            |
            |---d: Local Rearrange[tuple]{chararray}(false) - scope-33
                |   |
                |   Project[chararray][0] - scope-34
                |
                |---c: Filter[bag] - scope-23
                    |   |
                    |   Greater Than or Equal[boolean] - scope-26
                    |   |
                    |   |---Project[int][2] - scope-24
                    |   |
                    |   |---Constant(5) - scope-25
                    |
                    |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp1918416213/tmp1819996690:org.apache.pig.impl.io.InterStorage) - scope-43--------
after multiquery optimization:
scope-39
#--------------------------------------------------
# Spark Plan                                 
#--------------------------------------------------

Spark node scope-39
d: Store(hdfs://zly1.sh.intel.com:8020/user/root/testMultiQueryJiraPig983_2.out:org.apache.pig.builtin.PigStorage) - scope-38
|
|---d: New For Each(true,true)[tuple] - scope-37
    |   |
    |   Project[bag][1] - scope-35
    |   |
    |   Project[bag][2] - scope-36
    |
    |---d: Package(Packager)[tuple]{chararray} - scope-30
        |
        |---d: Global Rearrange[tuple] - scope-29
            |
            |---d: Local Rearrange[tuple]{chararray}(false) - scope-31
            |   |   |
            |   |   Project[chararray][0] - scope-32
            |   |
            |   |---b: Filter[bag] - scope-17
            |       |   |
            |       |   Less Than[boolean] - scope-20
            |       |   |
            |       |   |---Project[int][2] - scope-18
            |       |   |
            |       |   |---Constant(5) - scope-19
            |       |
            |       |---a: New For Each(false,false,false,false)[bag] - scope-13
            |           |   |
            |           |   Cast[chararray] - scope-2
            |           |   |
            |           |   |---Project[bytearray][0] - scope-1
            |           |   |
            |           |   Cast[chararray] - scope-5
            |           |   |
            |           |   |---Project[bytearray][1] - scope-4
            |           |   |
            |           |   Cast[int] - scope-8
            |           |   |
            |           |   |---Project[bytearray][2] - scope-7
            |           |   |
            |           |   Cast[int] - scope-11
            |           |   |
            |           |   |---Project[bytearray][3] - scope-10
            |           |
            |           |---a: Load(hdfs://zly1.sh.intel.com:8020/user/root/passwd:PigStorage(':')) - scope-0
            |
            |---d: Local Rearrange[tuple]{chararray}(false) - scope-33
                |   |
                |   Project[chararray][0] - scope-34
                |
                |---c: Filter[bag] - scope-23
                    |   |
                    |   Greater Than or Equal[boolean] - scope-26
                    |   |
                    |   |---Project[int][2] - scope-24
                    |   |
                    |   |---Constant(5) - scope-25
                    |
                    |---a: New For Each(false,false,false,false)[bag] - scope-13
                        |   |
                        |   Cast[chararray] - scope-2
                        |   |
                        |   |---Project[bytearray][0] - scope-1
                        |   |
                        |   Cast[chararray] - scope-5
                        |   |
                        |   |---Project[bytearray][1] - scope-4
                        |   |
                        |   Cast[int] - scope-8
                        |   |
                        |   |---Project[bytearray][2] - scope-7
                        |   |
                        |   Cast[int] - scope-11
                        |   |
                        |   |---Project[bytearray][3] - scope-10
                        |
                        |---a: Load(hdfs://zly1.sh.intel.com:8020/user/root/passwd:PigStorage(':')) - scope-0--------
    {code}                         

The *forceConnect* method I added in PIG-4594.patch connects y and z to x 
even though x does not support multiple outputs, because it *removes* the 
check for whether the operator supports multiple outputs.
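
To make the difference concrete, here is a minimal Java sketch of a checked connect next to an unchecked forceConnect. It is an assumption-laden toy model, not the code from PIG-4594.patch: `ForceConnectSketch`, `Op`, `Plan` and `successorsOf` are hypothetical names, and only the idea of skipping the multiple-outputs check is taken from the description above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a plan; hypothetical, not Pig's PhysicalPlan. */
final class ForceConnectSketch {

    /** A toy operator; the flag says whether it may feed more than one successor. */
    static final class Op {
        final String name;
        final boolean supportsMultipleOutputs;

        Op(String name, boolean supportsMultipleOutputs) {
            this.name = name;
            this.supportsMultipleOutputs = supportsMultipleOutputs;
        }
    }

    static final class Plan {
        // Operators are keyed by identity because Op does not override equals/hashCode.
        private final Map<Op, List<Op>> successors = new LinkedHashMap<>();

        /** Checked variant: refuses a second output on a single-output operator. */
        void connect(Op from, Op to) {
            if (!from.supportsMultipleOutputs && !successorsOf(from).isEmpty()) {
                throw new IllegalStateException(
                        "This operator does not support multiple outputs: " + from.name);
            }
            forceConnect(from, to);
        }

        /** Unchecked variant: adds the edge without looking at the flag. */
        void forceConnect(Op from, Op to) {
            successors.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        }

        List<Op> successorsOf(Op op) {
            return successors.getOrDefault(op, new ArrayList<>());
        }
    }

    public static void main(String[] args) {
        Op x = new Op("x", false);
        Op y = new Op("y", false);
        Op z = new Op("z", false);

        Plan plan = new Plan();
        plan.connect(x, y);       // first output passes the check
        plan.forceConnect(x, z);  // second output is added without the check

        // A single x now feeds both y and z, so no clone of x is needed.
        System.out.println("x feeds " + plan.successorsOf(x).size() + " operators");
    }
}
```

In this model a single instance of x is shared by y and z, which mirrors the single load (scope-0) in the Spark plan after multiquery optimization.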

> Enable "TestMultiQuery" in spark mode
> -------------------------------------
>
>                 Key: PIG-4594
>                 URL: https://issues.apache.org/jira/browse/PIG-4594
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>             Fix For: spark-branch
>
>         Attachments: PIG-4594-3.patch, PIG-4594.patch, PIG-4594_1.patch, 
> PIG-4594_2.patch
>
>
> https://builds.apache.org/job/Pig-spark/211/#showFailuresLink shows 
> that the following unit tests fail:
> org.apache.pig.test.TestMultiQuery.testMultiQueryJiraPig1068
> org.apache.pig.test.TestMultiQuery.testMultiQueryJiraPig1157
> org.apache.pig.test.TestMultiQuery.testMultiQueryJiraPig1252
> org.apache.pig.test.TestMultiQuery.testMultiQueryJiraPig1438



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
