[ https://issues.apache.org/jira/browse/PIG-4610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
liyunzhang_intel updated PIG-4610:
----------------------------------
    Attachment: PIG-4610.patch

[~mohitsabharwal],[~kexianda],[~praveenr019],[~xuefuz]:
PIG-4610.patch fixes the following unit test failures:
org.apache.pig.builtin.TestOrcStorage.testJoinWithPruning
org.apache.pig.builtin.TestOrcStorage.testLoadStoreMoreDataType
org.apache.pig.builtin.TestOrcStorage.testMultiStore

Here is an example that shows why these tests failed before the patch.
testOrcStorage.tmp.pig (orc-file-11-format.orc is found at $PIG_HOME/test/org/apache/pig/builtin/orc/orc-file-11-format.orc):
{code}
A = load './orc-file-11-format.orc' using OrcStorage();
B = foreach A generate int1,string1;
D = limit B 10;
store D into './testOrcStorage.tmp.out';
{code}
The result in Spark mode:
{code}
false 1
false 1
false 1
false 1
false 1
false 1
false 1
false 1
false 1
false 1
{code}
The result in MR mode:
{code}
65536 hi
65536 bye
65536 hi
65536 bye
65536 hi
65536 bye
65536 hi
65536 bye
65536 hi
65536 bye
{code}
A row in orc-file-11-format.orc looks like the one below. The required columns are the 4th and the 9th (this information is stored in orc-file-11-format.orc):
{code}
{true, 100, 2048, 65536, 9223372036854775807, 2.0, -5.0, , bye, {[{1, bye}, {2, sigh}]}, [{100000000, cat}, {-100000, in}, {1234, hat}], {chani={5, chani}, mauddib={1, mauddib}}, 2000-03-12 15:00:01, 12345678.6547457}
{code}
Spark and MR differ because [{{OrcStorage#mRequiredColumns}}|https://github.com/apache/pig/blob/trunk/src/org/apache/pig/builtin/OrcStorage.java#L298] is not initialized in Spark mode: it is only set when [{{UDFContext.getUDFContext().isFrontend()}}|https://github.com/apache/pig/blob/trunk/src/org/apache/pig/builtin/OrcStorage.java#L296] is false, and in Spark mode it is true. {{UDFContext.getUDFContext().isFrontend()}} is true because [{{jconf.get(MRConfiguration.JOB_APPLICATION_ATTEMPT_ID)}}|https://github.com/apache/pig/blob/trunk/src/org/apache/pig/impl/util/UDFContext.java#L238] is null. PIG-4610.patch sets {{MRConfiguration.JOB_APPLICATION_ATTEMPT_ID}} in SparkUtil#newJobConf.
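
The frontend check described above can be sketched in isolation. This is a simplified model, not the Pig source: the class {{FrontendCheckSketch}}, the {{Map}}-based configuration, the helper {{restoreRequiredColumns}}, the key string and the attempt id value "1" are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone model of the check; it does not use Pig or Hadoop classes.
class FrontendCheckSketch {
    // Assumed to match the value of MRConfiguration.JOB_APPLICATION_ATTEMPT_ID.
    static final String JOB_APPLICATION_ATTEMPT_ID = "mapreduce.job.application.attempt.id";

    // Models UDFContext#isFrontend as described in the comment:
    // no configuration, or no attempt id in it, means "frontend".
    static boolean isFrontend(Map<String, String> jconf) {
        return jconf == null || jconf.get(JOB_APPLICATION_ATTEMPT_ID) == null;
    }

    // Models the loader: the stored required-columns mask is only
    // restored on the backend, otherwise it stays uninitialized (null).
    static boolean[] restoreRequiredColumns(Map<String, String> jconf, boolean[] stored) {
        return isFrontend(jconf) ? null : stored;
    }

    public static void main(String[] args) {
        boolean[] stored = new boolean[14];
        stored[3] = true; // 4th column (int1)
        stored[8] = true; // 9th column (string1)

        // Job conf without the attempt id, as in Spark mode before the patch.
        Map<String, String> conf = new HashMap<>();
        System.out.println("before: mask restored = "
                + (restoreRequiredColumns(conf, stored) != null));

        // After setting the attempt id (the value "1" is a placeholder).
        conf.put(JOB_APPLICATION_ATTEMPT_ID, "1");
        System.out.println("after:  mask restored = "
                + (restoreRequiredColumns(conf, stored) != null));
    }
}
```

In this model the mask is only restored once the attempt id is present in the job conf, which is the effect the patch aims for by setting it in SparkUtil#newJobConf.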
> Enable "TestOrcStorage" unit test in spark mode
> -----------------------------------------------
>
>                 Key: PIG-4610
>                 URL: https://issues.apache.org/jira/browse/PIG-4610
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>             Fix For: spark-branch
>
>         Attachments: PIG-4610.patch
>
>
> In https://builds.apache.org/job/Pig-spark/222/#showFailuresLink, it shows
> following unit test failures about "TestOrcStorage":
> org.apache.pig.builtin.TestOrcStorage.testJoinWithPruning
> org.apache.pig.builtin.TestOrcStorage.testLoadStoreMoreDataType
> org.apache.pig.builtin.TestOrcStorage.testMultiStore

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)