[
https://issues.apache.org/jira/browse/PIG-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cheolsoo Park updated PIG-3645:
-------------------------------
Attachment: PIG-3645-1.patch
Attaching an initial patch that includes regenerated gold files.
I will run the full suite of unit tests.
> Replace Random with UUID in FileLocalizer.getTemporaryPath()
> ------------------------------------------------------------
>
> Key: PIG-3645
> URL: https://issues.apache.org/jira/browse/PIG-3645
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Reporter: Cheolsoo Park
> Assignee: Cheolsoo Park
> Priority: Minor
> Fix For: 0.13.0
>
> Attachments: PIG-3645-1.patch
>
>
> Currently, temporary paths are generated by FileLocalizer using
> Random.nextInt(). To provide strong randomness, MapReduceLauncher resets the
> Random object every time when compiling physical plan to MR plan:
> {code}
> MRCompiler comp = new MRCompiler(php, pc);
> comp.randomizeFileLocalizer(); // This in turn calls FileLocalizer.setR(new
> Random()).
> {code}
> Besides, there are a couple of places calling FileLocalizer.setR() (e.g.
> MRCompiler) with some random seed.
> I think-
> # Randomizing Random seed is unnecessary if we switch to UUID.
> # Setting Random objects in code like this is error-prone because it can be
> easily broken by having or missing a FileLocalizer.setR() somewhere else. See
> an example [here|http://search-hadoop.com/m/2nxTzQXfHw1].
> So I propose that we remove all this "randomizing Random seed" code and use
> UUID instead in temporary paths.
> For unit tests that compare the results against gold files, we should still
> allow to set Random seed through FileLocalizer.setR(). But this method will
> be annotated as "VisibleForTesting" to ensure it is not used nowhere else
> other than in unit tests.
> Regarding the existing gold files, they can be easily regenerated by
> TestMRCompiler as follows-
> {code}
> FileOutputStream fos = new FileOutputStream(expectedFile + "_new");
> PrintWriter pw = new PrintWriter(fos);
> pw.write(compiledPlan);
> {code}
> I assume there won't be any kind of regressions due to this change. But
> please let me know if I am wrong.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)