[ 
https://issues.apache.org/jira/browse/PIG-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3645:
-------------------------------

    Attachment: PIG-3645-1.patch

Attaching an initial patch that includes regenerated gold files.

I will run the full suite of unit tests.

> Replace Random with UUID in FileLocalizer.getTemporaryPath()
> ------------------------------------------------------------
>
>                 Key: PIG-3645
>                 URL: https://issues.apache.org/jira/browse/PIG-3645
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>            Priority: Minor
>             Fix For: 0.13.0
>
>         Attachments: PIG-3645-1.patch
>
>
> Currently, temporary paths are generated by FileLocalizer using 
> Random.nextInt(). To provide strong randomness, MapReduceLauncher resets the 
> Random object every time when compiling physical plan to MR plan:
> {code}
> MRCompiler comp = new MRCompiler(php, pc); 
> comp.randomizeFileLocalizer(); // This in turn calls FileLocalizer.setR(new 
> Random()).
> {code}
> Besides, there are a couple of places calling FileLocalizer.setR() (e.g. 
> MRCompiler) with some random seed.
> I think-
> # Randomizing Random seed is unnecessary if we switch to UUID.
> # Setting Random objects in code like this is error-prone because it can be 
> easily broken by having or missing a FileLocalizer.setR() somewhere else. See 
> an example [here|http://search-hadoop.com/m/2nxTzQXfHw1].
> So I propose that we remove all this "randomizing Random seed" code and use 
> UUID instead in temporary paths.
> For unit tests that compare the results against gold files, we should still 
> allow to set Random seed through FileLocalizer.setR(). But this method will 
> be annotated as "VisibleForTesting" to ensure it is not used nowhere else 
> other than in unit tests.
> Regarding the existing gold files, they can be easily regenerated by 
> TestMRCompiler as follows-
> {code}
> FileOutputStream fos = new FileOutputStream(expectedFile + "_new");
> PrintWriter pw = new PrintWriter(fos);
> pw.write(compiledPlan);
> {code}
> I assume there won't be any kind of regressions due to this change. But 
> please let me know if I am wrong.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Reply via email to