[
https://issues.apache.org/jira/browse/PIG-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615915#comment-14615915
]
Mohit Sabharwal commented on PIG-4611:
--------------------------------------
Thanks for the explanation and addressing this issue, [~kellyzly]!!!
Let me know if I understand this correctly:
1) Spark Executor will serialize all objects referenced in supplied closures.
Since UDFContext.getUDFContext() is not initialized (because Spark does not
expose a setup() interface like MR), we always default defaultCaster to
STRING_CASTER.
2) However, later on, in the *same* Executor thread, the record reader creation
will correctly deserialize the UDFContext from JobConf
(PigInputFormatSpark.createRecordReader->PigInputFormat.createRecordReader->MapRedUtil.setupUDFContext->UDFContext.deserialize)
3) Next, in the same Executor thread, when HBaseStorage is initialized by the
load function, it will find a correctly populated UDFContext.
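
To make the ordering in 1) to 3) concrete, here is a minimal sketch of the timing issue as I understand it. It does not use Pig's real classes: {{CONTEXT}} is a hypothetical stand-in for the per-thread UDFContext, {{Storage}} is a hypothetical stand-in for HBaseStorage, and {{BINARY_CASTER}} is just a placeholder name for "whatever caster was configured". Only the STRING_CASTER default comes from the discussion above.

```java
import java.util.HashMap;
import java.util.Map;

public class LazyCasterSketch {
    // Hypothetical stand-in for a per-thread UDF context; NOT Pig's UDFContext.
    static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    // Hypothetical stand-in for a storage class that picks its caster at construction time.
    static class Storage {
        private final String caster;

        Storage() {
            // Falls back to the default when the context has not been populated yet.
            this.caster = CONTEXT.get().getOrDefault("caster", "STRING_CASTER");
        }

        String caster() {
            return caster;
        }
    }

    public static void main(String[] args) {
        // Step 1: the object is built before the context is set up, so it sees the default.
        Storage early = new Storage();
        System.out.println(early.caster()); // prints STRING_CASTER

        // Step 2: the record reader setup populates the context on the same thread.
        CONTEXT.get().put("caster", "BINARY_CASTER");

        // Step 3: an instance created afterwards finds the populated context.
        Storage late = new Storage();
        System.out.println(late.caster()); // prints BINARY_CASTER
    }
}
```

The instance built in step 1 keeps the default it captured, which is why the value only comes out right for an instance that is initialized after the context has been deserialized.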
This sounds reasonable to me. Since this is a core change, could you please add
comments to HBaseStorage.java explaining why we are handling this as a special
case for Spark?
I assume it is a typo, but the -Dexectype argument needs to be {{spark}}, not
{{TestHBaseStorage}}, when running TestHBaseStorage:
{code}
ant test -Dhadoopversion=23 -Dtestcase=TestHBaseStorage -Dexectype=spark -DdebugPort=9999
{code}
> Fix remaining unit test failures about "TestHBaseStorage"
> ---------------------------------------------------------
>
> Key: PIG-4611
> URL: https://issues.apache.org/jira/browse/PIG-4611
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: liyunzhang_intel
> Assignee: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: PIG-4611.patch
>
>
> https://builds.apache.org/job/Pig-spark/lastCompletedBuild/testReport/ shows
> the following unit test failures in TestHBaseStorage:
> org.apache.pig.test.TestHBaseStorage.testStoreToHBase_1_with_delete
> org.apache.pig.test.TestHBaseStorage.testLoadWithProjection_1
> org.apache.pig.test.TestHBaseStorage.testLoadWithProjection_2
> org.apache.pig.test.TestHBaseStorage.testStoreToHBase_2_with_projection
> org.apache.pig.test.TestHBaseStorage.testCollectedGroup
> org.apache.pig.test.TestHBaseStorage.testHeterogeneousScans
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)