-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57996/
-----------------------------------------------------------
(Updated May 15, 2017, 1:27 p.m.)
Review request for pig, Daniel Dai, liyun zhang, Rohini Palaniswamy, and Xuefu
Zhang.
Bugs: PIG-5186
https://issues.apache.org/jira/browse/PIG-5186
Repository: pig-git
Description
-------
Aggregate warnings are not yet supported in Spark mode, which is why the e2e
Warning test cases fail. This patch enables them.
In MR/Tez we use counters, and in Spark we rely on Accumulators (a means to
support distributed counters).
Pig has some builtin warning enums in PigWarning, and also supports custom
warnings for user defined functions.
The latter is problematic with Spark because you cannot register new
accumulators on the backend and read their values later in the driver.
As a workaround, this patch defines Map-typed Accumulators (besides the Long
type we already use): one for the builtin warnings and one for the custom
ones. These are passed from the driver to the backend, where the executors can
create entries in the maps or increment existing values.
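To illustrate the idea behind the map-based workaround, here is a minimal
plain-Java sketch. It is not the code from the patch and it does not use the
Spark API. The class and method names (WarningMapSketch, warn, merge) and the
warning names are hypothetical and exist only for this example. It assumes
each executor keeps a partial map of warning name to count, and that the
driver folds those partial maps together by summing the values per key.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only: not a class from Pig or Spark.
final class WarningMapSketch {

    // Executor side: create the entry on first use, otherwise increment it.
    static void warn(Map<String, Long> warnings, String name, long delta) {
        warnings.merge(name, delta, Long::sum);
    }

    // Driver side: fold one executor's partial map into the running total.
    // Returns a new map and leaves both inputs untouched.
    static Map<String, Long> merge(Map<String, Long> total,
                                   Map<String, Long> partial) {
        Map<String, Long> result = new HashMap<>(total);
        partial.forEach((name, count) -> result.merge(name, count, Long::sum));
        return result;
    }

    public static void main(String[] args) {
        // Two executors report warnings independently.
        Map<String, Long> executor1 = new HashMap<>();
        warn(executor1, "MY_UDF_WARNING", 1L);
        warn(executor1, "MY_UDF_WARNING", 1L);

        Map<String, Long> executor2 = new HashMap<>();
        warn(executor2, "MY_UDF_WARNING", 3L);
        warn(executor2, "OTHER_WARNING", 1L);

        // The driver combines the partial maps into one aggregate view.
        Map<String, Long> total =
            merge(merge(new HashMap<>(), executor1), executor2);
        System.out.println(total);
    }
}
```

In this sketch the combined map holds 5 for MY_UDF_WARNING and 1 for
OTHER_WARNING. The point is that a warning name unknown to the driver up front
only needs a new map key, not a newly registered accumulator.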
This patch also updates DummyContextUDF, which will help fix the HiveUDF_7 e2e
test case on Spark. It previously used org.apache.hadoop.mapred.Reporter; it
has to use PigHadoopLogger instead, which supports Spark too.
Diffs (updated)
-----
src/org/apache/pig/PigWarning.java fcda1145f4e7c16940a540222ac7cc5370e3db33
src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigHadoopLogger.java
255650edb519acc452812a5d67f3ac2376c278c2
src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java
85e3dc2f80c22b358560c716151d0ba931c78e4e
src/org/apache/pig/backend/hadoop/executionengine/spark/running/PigInputFormatSpark.java
a22c90dd8dd9b6463f22597effd884bc0f12bb73
src/org/apache/pig/tools/pigstats/PigStatusReporter.java
5396535301b0e90dc5d3be2064cfe0bdf488bf6a
src/org/apache/pig/tools/pigstats/PigWarnCounter.java PRE-CREATION
src/org/apache/pig/tools/pigstats/spark/SparkCounter.java
2411f875ec996fedb870c1b709b99e949803ed50
src/org/apache/pig/tools/pigstats/spark/SparkCounterGroup.java
c23624dfcd2e11429fd8355497d184b155450c1f
src/org/apache/pig/tools/pigstats/spark/SparkCounters.java
5ca077ca519ad766c8f6a23ef5b69cd02f3abe99
src/org/apache/pig/tools/pigstats/spark/SparkJobStats.java
4c81a414be071d5e08689fde2a5abd847a5fd0b6
src/org/apache/pig/tools/pigstats/spark/SparkPigStats.java
4e3644209817eeffabcec13d8aac5179bbad1c62
src/org/apache/pig/tools/pigstats/spark/SparkStatsUtil.java
8f3bf6d736de08ae963152553e975e865f875ab2
test/e2e/pig/tests/nightly.conf 2048740b6a1b73862694805cc07bdd084c66a300
Diff: https://reviews.apache.org/r/57996/diff/2/
Changes: https://reviews.apache.org/r/57996/diff/1-2/
Testing
-------
With this patch, the Warning e2e tests pass on Spark.
Thanks,
Adam Szita