-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57996/
-----------------------------------------------------------
Review request for pig, Daniel Dai, liyun zhang, Rohini Palaniswamy, and Xuefu
Zhang.
Bugs: PIG-5186
https://issues.apache.org/jira/browse/PIG-5186
Repository: pig-git
Description
-------
Aggregate warnings are not yet supported in Spark mode (hence the e2e Warning
test case failures). This patch enables them.
In MR/Tez we use counters, and in Spark we rely on Accumulators (a means to
support distributed counters).
Pig has some builtin warning enums in PigWarning, and also supports custom
warnings for user defined functions.
The latter is problematic with Spark because you cannot register new
accumulators on the backend and read their values later in the driver.
As a workaround, my patch defines Map-typed Accumulators (besides the Long
type we already use): one for the builtin warnings and one for the custom
ones. These are passed from the driver to the backend, where the executors can
create entries in the maps or increment existing values.
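To illustrate the idea (this is not the code from the patch), here is a minimal
plain-Java sketch of the map semantics described above. It deliberately does
not use the Spark Accumulator API; the class name, method names, and warning
names are all hypothetical. Each executor creates or increments entries in its
own map, and the driver merges the partial maps by summing counts per key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: models the value of a Map-typed accumulator
// (warning name -> count) without depending on Spark.
public class WarningMapAccumulatorSketch {

    // The "zero" value a Map-typed accumulator would start from.
    static Map<String, Long> newZero() {
        return new HashMap<>();
    }

    // Executor side: create the entry if missing, otherwise add to it.
    static void increment(Map<String, Long> local, String warning, long delta) {
        local.merge(warning, delta, Long::sum);
    }

    // Driver side: fold one executor's partial map into the running total.
    static Map<String, Long> mergeInto(Map<String, Long> total,
                                       Map<String, Long> partial) {
        for (Map.Entry<String, Long> e : partial.entrySet()) {
            total.merge(e.getKey(), e.getValue(), Long::sum);
        }
        return total;
    }

    public static void main(String[] args) {
        // Two simulated executors, each with its own local map.
        Map<String, Long> executor1 = newZero();
        increment(executor1, "DIVIDE_BY_ZERO", 1);   // illustrative builtin name
        increment(executor1, "MY_UDF_WARNING", 2);   // illustrative custom name

        Map<String, Long> executor2 = newZero();
        increment(executor2, "MY_UDF_WARNING", 3);

        // The driver merges the partial results.
        Map<String, Long> total = newZero();
        mergeInto(total, executor1);
        mergeInto(total, executor2);

        // TreeMap only to get a deterministic print order.
        System.out.println(new TreeMap<>(total));
        // prints {DIVIDE_BY_ZERO=1, MY_UDF_WARNING=5}
    }
}
```

Because the merge is a per-key sum, it is commutative and associative, so the
order in which executor results reach the driver does not matter. New warning
names need no registration up front; they simply appear as new keys.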
This patch also updates DummyContextUDF, which will help fix the HiveUDF_7 e2e
test case on Spark. Previously it used org.apache.hadoop.mapred.Reporter; it
now has to use PigHadoopLogger, which supports Spark too.
Diffs
-----
src/org/apache/pig/PigWarning.java fcda1145f4e7c16940a540222ac7cc5370e3db33
src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigHadoopLogger.java
255650edb519acc452812a5d67f3ac2376c278c2
src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java
36813b27be1090b04d577829080e4b931c5eb950
src/org/apache/pig/backend/hadoop/executionengine/spark/running/PigInputFormatSpark.java
8cf6513d3e5425e974d27d74c92592a6f0ed2cf2
src/org/apache/pig/tools/pigstats/PigStatusReporter.java
5396535301b0e90dc5d3be2064cfe0bdf488bf6a
src/org/apache/pig/tools/pigstats/PigWarnCounterIncrementable.java
PRE-CREATION
src/org/apache/pig/tools/pigstats/spark/SparkCounter.java
2411f875ec996fedb870c1b709b99e949803ed50
src/org/apache/pig/tools/pigstats/spark/SparkCounterGroup.java
c23624dfcd2e11429fd8355497d184b155450c1f
src/org/apache/pig/tools/pigstats/spark/SparkCounters.java
5ca077ca519ad766c8f6a23ef5b69cd02f3abe99
src/org/apache/pig/tools/pigstats/spark/SparkJobStats.java
808c3deb47bc8d9a212a701c81e0c9c6abe88f37
src/org/apache/pig/tools/pigstats/spark/SparkPigStats.java
699219d30519c7db56a4b39c7690fa20d62df44d
src/org/apache/pig/tools/pigstats/spark/SparkStatsUtil.java
2945c80dba4a23b07ee3d8a613b6a7e9319622ba
test/e2e/pig/udfs/java/org/apache/pig/test/udf/evalfunc/DummyContextUDF.java
d5eb9ae660f94a444a17f2171107b6ff7e81819b
Diff: https://reviews.apache.org/r/57996/diff/1/
Testing
-------
With this patch, the Warning e2e tests pass on Spark.
Thanks,
Adam Szita