[ 
https://issues.apache.org/jira/browse/PIG-4610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated PIG-4610:
----------------------------------
    Attachment: PIG-4610.patch

[~mohitsabharwal],[~kexianda],[~praveenr019],[~xuefuz]: 
PIG-4610.patch fixes the following unit test failures:
org.apache.pig.builtin.TestOrcStorage.testJoinWithPruning
org.apache.pig.builtin.TestOrcStorage.testLoadStoreMoreDataType
org.apache.pig.builtin.TestOrcStorage.testMultiStore


Here is an example that explains why the tests failed before the patch.
testOrcStorage.tmp.pig (orc-file-11-format.orc is found at 
$PIG_HOME/test/org/apache/pig/builtin/orc/orc-file-11-format.orc):
{code}
A = load './orc-file-11-format.orc' using OrcStorage();
B = foreach A generate int1,string1;
D = limit B 10;
store D into './testOrcStorage.tmp.out';
{code}

The result in Spark mode:
{code}
false   1
false   1
false   1
false   1
false   1
false   1
false   1
false   1
false   1
false   1
{code}

The result in MR mode:
{code}
65536   hi
65536   bye
65536   hi
65536   bye
65536   hi
65536   bye
65536   hi
65536   bye
65536   hi
65536   bye
{code}
A row in orc-file-11-format.orc looks like the following. The required columns 
(int1 and string1) are the 4th and 9th (this info is stored in orc-file-11-format.orc):
{code}
{true, 100, 2048, 65536, 9223372036854775807, 2.0, -5.0, , bye, {[{1, bye}, {2, sigh}]}, [{100000000, cat}, {-100000, in}, {1234, hat}], {chani={5, chani}, mauddib={1, mauddib}}, 2000-03-12 15:00:01, 12345678.6547457}
{code}

The difference between Spark and MR arises because 
[{{OrcStorage#mRequiredColumns}}|https://github.com/apache/pig/blob/trunk/src/org/apache/pig/builtin/OrcStorage.java#L298] 
is not initialized: 
[{{UDFContext.getUDFContext().isFrontend()}}|https://github.com/apache/pig/blob/trunk/src/org/apache/pig/builtin/OrcStorage.java#L296] 
returns true, so the initialization is skipped. {{UDFContext.getUDFContext().isFrontend()}} 
returns true because 
[{{jconf.get(MRConfiguration.JOB_APPLICATION_ATTEMPT_ID)}}|https://github.com/apache/pig/blob/trunk/src/org/apache/pig/impl/util/UDFContext.java#L238] 
is null. PIG-4610.patch sets {{MRConfiguration.JOB_APPLICATION_ATTEMPT_ID}} 
in SparkUtil#newJobConf.
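
To illustrate the chain described above, here is a minimal, self-contained sketch. It is not the actual Pig code: a plain {{Map}} stands in for the job configuration, the class and method names ({{FrontendCheckSketch}}, {{requiredColumns}}) are hypothetical, the key string is my assumption of what {{MRConfiguration.JOB_APPLICATION_ATTEMPT_ID}} resolves to, and the value "1" is an arbitrary placeholder.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class FrontendCheckSketch {
    // Assumed key name; stands in for MRConfiguration.JOB_APPLICATION_ATTEMPT_ID.
    static final String ATTEMPT_ID_KEY = "mapreduce.job.application.attempt.id";

    // Mirrors the check described above: no attempt ID in the conf means "frontend".
    static boolean isFrontend(Map<String, String> jconf) {
        return jconf == null || jconf.get(ATTEMPT_ID_KEY) == null;
    }

    // Hypothetical loader step: the column projection is set up only on the backend.
    // Returns null when no projection is set up.
    static boolean[] requiredColumns(Map<String, String> jconf, int numColumns, int... wanted) {
        if (isFrontend(jconf)) {
            return null;
        }
        boolean[] required = new boolean[numColumns];
        for (int index : wanted) {
            required[index] = true;
        }
        return required;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Attempt ID missing: treated as frontend, so the projection is skipped.
        System.out.println(Arrays.toString(requiredColumns(conf, 14, 3, 8)));
        // Attempt ID present (placeholder value): the backend path runs and
        // marks the 4th and 9th columns (0-based indices 3 and 8).
        conf.put(ATTEMPT_ID_KEY, "1");
        System.out.println(Arrays.toString(requiredColumns(conf, 14, 3, 8)));
    }
}
```

In this sketch the first call returns no projection because the attempt ID is absent, which corresponds to the uninitialized {{mRequiredColumns}} seen in Spark mode; once the key is set, the 4th and 9th columns are marked as required.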


> Enable "TestOrcStorage" unit test in spark mode
> -----------------------------------------------
>
>                 Key: PIG-4610
>                 URL: https://issues.apache.org/jira/browse/PIG-4610
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>             Fix For: spark-branch
>
>         Attachments: PIG-4610.patch
>
>
> https://builds.apache.org/job/Pig-spark/222/#showFailuresLink shows the 
> following unit test failures in "TestOrcStorage":
> org.apache.pig.builtin.TestOrcStorage.testJoinWithPruning
> org.apache.pig.builtin.TestOrcStorage.testLoadStoreMoreDataType
> org.apache.pig.builtin.TestOrcStorage.testMultiStore



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
