> On May 12, 2016, 7:02 a.m., Mohit Sabharwal wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/Driver.java, line 532
> > <https://reviews.apache.org/r/47040/diff/5/?file=1379887#file1379887line532>
> >
> >     Let's rename this to setJobQueueForUser()
> >     
> >     Any reason to return the config ? Return value should be void.
> >     
> >     configuration -> conf for consistency

Yeah, what you're saying makes sense. The intent here is that I really do want 
to consolidate the scheduling code that's floating around the codebase. Right 
now all this only handles a very specific behavior (sets the job queue if Yarn 
fair scheduling is configured and impersonation is off) - it doesn't even 
handle every Yarn scheduling case, but it really should. This is just me 
setting down a flag and saying "let's consolidate pre-sumbit job scheduler 
stuff here".


> On May 12, 2016, 7:02 a.m., Mohit Sabharwal wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/session/YarnFairScheduling.java, line 
> > 50
> > <https://reviews.apache.org/r/47040/diff/5/?file=1379889#file1379889line50>
> >
> >     Remove this check.
> >     
> >     I'd just inline the setJobQueueForUser method in the caller and get rid 
> > of this entire class altogether. It only seems to be offering 
> > usingNonImpersonationModeWithFairScheduling() method which is a utility 
> > method.

For this issue yes. But we should consolidate all yarn specific behavior in one 
place with a straighforward interface for clients. The alternative is to 
duplicate this:

    // In non-impersonation mode, map scheduler queue to current user
    // if fair scheduler is configured.
    if (! sessionConf.getBoolVar(ConfVars.HIVE_SERVER2_ENABLE_DOAS) &&
      sessionConf.getBoolVar(ConfVars.HIVE_SERVER2_MAP_FAIR_SCHEDULER_QUEUE)) {
      ShimLoader.getHadoopShims().refreshDefaultQueue(sessionConf, username);
    }
    
every time we want to set up a user's job queue in non-impersonation mode. 
Furthermore, I'm absolutely certain there's more Yarn logic than just 'schedule 
a queue in non-impersonation mode' floating around the codebase, but it's hard 
to know exactly what because it's not easily identifiable (see above). A single 
interface for configuring yarn stuff would make it much easier to identify and 
enforce correct behavior.


> On May 12, 2016, 7:02 a.m., Mohit Sabharwal wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/session/YarnFairScheduling.java, line 
> > 64
> > <https://reviews.apache.org/r/47040/diff/5/?file=1379889#file1379889line64>
> >
> >     What validation we are doing in this method ? I'd just call this 
> > setJobQueueForUser or something

You're correct. The validation should be more rigorous/well-defined than it is 
currently. The long and short is that the user's job queue should be validated 
before jobs are submitted, but it turns out that that's a bit more complicated 
than what we have time for at the moment so in the interim we just check to see 
if the most recent set of rules from fair-scheduler.xml allow you to set the 
same queue as you're already using. So validation *really is* different from 
default configuration, it's just that validation is weakly defined at the 
moment.


> On May 12, 2016, 7:02 a.m., Mohit Sabharwal wrote:
> > shims/common/src/main/java/org/apache/hadoop/fs/FileSystemWatcher.java, 
> > line 169
> > <https://reviews.apache.org/r/47040/diff/5/?file=1379891#file1379891line169>
> >
> >     can you elaborate on this comment? Which OSs does it not block ?
> >     
> >     If it's async in those OSs, is 5 seconds always sufficient...  seems 
> > hacky.

Yeah, this is actually false. I was testing it wrong previously. Thanks for the 
catch.


> On May 12, 2016, 7:02 a.m., Mohit Sabharwal wrote:
> > shims/scheduler/src/main/java/org/apache/hadoop/hive/schshim/FairSchedulerShim.java,
> >  line 119
> > <https://reviews.apache.org/r/47040/diff/5/?file=1379895#file1379895line119>
> >
> >     I'd rename this method setJobQueueForUser()
> >     
> >     Because that's what this method is doing, ... i.e. either setting the 
> > queue config by user or setting default queue config.

Same as above.


> On May 12, 2016, 7:02 a.m., Mohit Sabharwal wrote:
> > shims/scheduler/src/main/java/org/apache/hadoop/hive/schshim/FairSchedulerShim.java,
> >  line 120
> > <https://reviews.apache.org/r/47040/diff/5/?file=1379895#file1379895line120>
> >
> >     Is there any advantage of using "defaultString()" when it returns an 
> > empty string vs just letting conf.get() return null -- and doing a null 
> > check instead of isEmpty() ?
> >     
> >     The pattern in Hive code is to just check if conf.get() returns null. 
> > Unless there is a advantage in this use case, let's maintain the pattern. 
> > (Just like the queueConfig != null check elsewhere)
> >     
> >     Same for other places where defaultString() is used.

Coming from using F# nulls don't make sense to me, heh. Changed though.


> On May 12, 2016, 7:02 a.m., Mohit Sabharwal wrote:
> > shims/scheduler/src/main/java/org/apache/hadoop/hive/schshim/FairSchedulerShim.java,
> >  line 124
> > <https://reviews.apache.org/r/47040/diff/5/?file=1379895#file1379895line124>
> >
> >     wait, refreshDefaultQueue() just calls attemptSetScheduleForUser().
> >     
> >     Can you just inline attemptSetScheduleForUser here with 
> > YarnConfiguration.DEFAULT_QUEUE_NAME param ? It's cleaner without the extra 
> > indirection.

Same point as above. Currently the validation is weak/inorrectly-defined, but 
it still needs to exist. Will need to redefine the validation later (by 
modifying this function). It's important though to keep the two concepts 
separate for callers.


> On May 12, 2016, 7:02 a.m., Mohit Sabharwal wrote:
> > shims/scheduler/src/main/java/org/apache/hadoop/hive/schshim/FairSchedulerShim.java,
> >  line 137
> > <https://reviews.apache.org/r/47040/diff/5/?file=1379895#file1379895line137>
> >
> >     We should make this LOG.info. There is nothing warn about here ...

Oops, thanks!


> On May 12, 2016, 7:02 a.m., Mohit Sabharwal wrote:
> > shims/scheduler/src/main/java/org/apache/hadoop/hive/schshim/FairSchedulerShim.java,
> >  line 132
> > <https://reviews.apache.org/r/47040/diff/5/?file=1379895#file1379895line132>
> >
> >     nit: cleaner with
> >     if (queueConfig == null)
> >       return;

Hmm, don't like it :/

Not against returning mid function, but it seems odd to change the cadence 
(return on one null check, but not on others). Man nulls are so weird, heh. 
'Options' would be way better.


- Reuben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47040/#review132819
-----------------------------------------------------------


On May 11, 2016, 8:02 p.m., Reuben Kuhnert wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/47040/
> -----------------------------------------------------------
> 
> (Updated May 11, 2016, 8:02 p.m.)
> 
> 
> Review request for hive, Lenni Kuff, Mohit Sabharwal, and Sergio Pena.
> 
> 
> Bugs: HIVE-13696
>     https://issues.apache.org/jira/browse/HIVE-13696
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Ensure that jobs sent to YARN with impersonation off are correctly routed to 
> the proper queue based on fair-scheduler.xml. Monitor this file for changes 
> and validate that jobs can only be sent to queues authorized for the user.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java 
> 3fecc5c4ca2a06a031c0c4a711fb49e757c49062 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java 
> 926f6e883030b5a01d025994bd02c67f0f5a275c 
>   ql/src/java/org/apache/hadoop/hive/ql/session/YarnFairScheduling.java 
> PRE-CREATION 
>   service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 
> a0015ebc655931f241b28c53fbb94cfe172841b1 
>   shims/common/src/main/java/org/apache/hadoop/fs/FileSystemWatcher.java 
> PRE-CREATION 
>   shims/common/src/main/java/org/apache/hadoop/hive/shims/SchedulerShim.java 
> 63803b8b0752745bd2fedaccc5d100befd97093b 
>   shims/scheduler/pom.xml b36c12325c588cdb609c6200b1edef73a2f79552 
>   
> shims/scheduler/src/main/java/org/apache/hadoop/hive/schshim/FairSchedulerQueueAllocator.java
>  PRE-CREATION 
>   
> shims/scheduler/src/main/java/org/apache/hadoop/hive/schshim/FairSchedulerShim.java
>  372244dc3c989d2a3ae2eb2bfb8cd0a235705e18 
>   
> shims/scheduler/src/main/java/org/apache/hadoop/hive/schshim/QueueAllocator.java
>  PRE-CREATION 
>   
> shims/scheduler/src/test/java/org/apache/hadoop/hive/schshim/TestFairScheduler.java
>  PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/47040/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Reuben Kuhnert
> 
>

Reply via email to