> On May 12, 2016, 7:02 a.m., Mohit Sabharwal wrote: > > ql/src/java/org/apache/hadoop/hive/ql/Driver.java, line 532 > > <https://reviews.apache.org/r/47040/diff/5/?file=1379887#file1379887line532> > > > > Let's rename this to setJobQueueForUser() > > > > Any reason to return the config ? Return value should be void. > > > > configuration -> conf for consistency
Yeah, what you're saying makes sense. The intent here is that I really do want to consolidate the scheduling code that's floating around the codebase. Right now all this only handles a very specific behavior (sets the job queue if Yarn fair scheduling is configured and impersonation is off) - it doesn't even handle every Yarn scheduling case, but it really should. This is just me setting down a flag and saying "let's consolidate pre-sumbit job scheduler stuff here". > On May 12, 2016, 7:02 a.m., Mohit Sabharwal wrote: > > ql/src/java/org/apache/hadoop/hive/ql/session/YarnFairScheduling.java, line > > 50 > > <https://reviews.apache.org/r/47040/diff/5/?file=1379889#file1379889line50> > > > > Remove this check. > > > > I'd just inline the setJobQueueForUser method in the caller and get rid > > of this entire class altogether. It only seems to be offering > > usingNonImpersonationModeWithFairScheduling() method which is a utility > > method. For this issue yes. But we should consolidate all yarn specific behavior in one place with a straighforward interface for clients. The alternative is to duplicate this: // In non-impersonation mode, map scheduler queue to current user // if fair scheduler is configured. if (! sessionConf.getBoolVar(ConfVars.HIVE_SERVER2_ENABLE_DOAS) && sessionConf.getBoolVar(ConfVars.HIVE_SERVER2_MAP_FAIR_SCHEDULER_QUEUE)) { ShimLoader.getHadoopShims().refreshDefaultQueue(sessionConf, username); } every time we want to set up a user's job queue in non-impersonation mode. Furthermore, I'm absolutely certain there's more Yarn logic than just 'schedule a queue in non-impersonation mode' floating around the codebase, but it's hard to know exactly what because it's not easily identifiable (see above). A single interface for configuring yarn stuff would make it much easier to identify and enforce correct behavior. > On May 12, 2016, 7:02 a.m., Mohit Sabharwal wrote: > > ql/src/java/org/apache/hadoop/hive/ql/session/YarnFairScheduling.java, line > > 64 > > <https://reviews.apache.org/r/47040/diff/5/?file=1379889#file1379889line64> > > > > What validation we are doing in this method ? I'd just call this > > setJobQueueForUser or something You're correct. The validation should be more rigorous/well-defined than it is currently. The long and short is that the user's job queue should be validated before jobs are submitted, but it turns out that that's a bit more complicated than what we have time for at the moment so in the interim we just check to see if the most recent set of rules from fair-scheduler.xml allow you to set the same queue as you're already using. So validation *really is* different from default configuration, it's just that validation is weakly defined at the moment. > On May 12, 2016, 7:02 a.m., Mohit Sabharwal wrote: > > shims/common/src/main/java/org/apache/hadoop/fs/FileSystemWatcher.java, > > line 169 > > <https://reviews.apache.org/r/47040/diff/5/?file=1379891#file1379891line169> > > > > can you elaborate on this comment? Which OSs does it not block ? > > > > If it's async in those OSs, is 5 seconds always sufficient... seems > > hacky. Yeah, this is actually false. I was testing it wrong previously. Thanks for the catch. > On May 12, 2016, 7:02 a.m., Mohit Sabharwal wrote: > > shims/scheduler/src/main/java/org/apache/hadoop/hive/schshim/FairSchedulerShim.java, > > line 119 > > <https://reviews.apache.org/r/47040/diff/5/?file=1379895#file1379895line119> > > > > I'd rename this method setJobQueueForUser() > > > > Because that's what this method is doing, ... i.e. either setting the > > queue config by user or setting default queue config. Same as above. > On May 12, 2016, 7:02 a.m., Mohit Sabharwal wrote: > > shims/scheduler/src/main/java/org/apache/hadoop/hive/schshim/FairSchedulerShim.java, > > line 120 > > <https://reviews.apache.org/r/47040/diff/5/?file=1379895#file1379895line120> > > > > Is there any advantage of using "defaultString()" when it returns an > > empty string vs just letting conf.get() return null -- and doing a null > > check instead of isEmpty() ? > > > > The pattern in Hive code is to just check if conf.get() returns null. > > Unless there is a advantage in this use case, let's maintain the pattern. > > (Just like the queueConfig != null check elsewhere) > > > > Same for other places where defaultString() is used. Coming from using F# nulls don't make sense to me, heh. Changed though. > On May 12, 2016, 7:02 a.m., Mohit Sabharwal wrote: > > shims/scheduler/src/main/java/org/apache/hadoop/hive/schshim/FairSchedulerShim.java, > > line 124 > > <https://reviews.apache.org/r/47040/diff/5/?file=1379895#file1379895line124> > > > > wait, refreshDefaultQueue() just calls attemptSetScheduleForUser(). > > > > Can you just inline attemptSetScheduleForUser here with > > YarnConfiguration.DEFAULT_QUEUE_NAME param ? It's cleaner without the extra > > indirection. Same point as above. Currently the validation is weak/inorrectly-defined, but it still needs to exist. Will need to redefine the validation later (by modifying this function). It's important though to keep the two concepts separate for callers. > On May 12, 2016, 7:02 a.m., Mohit Sabharwal wrote: > > shims/scheduler/src/main/java/org/apache/hadoop/hive/schshim/FairSchedulerShim.java, > > line 137 > > <https://reviews.apache.org/r/47040/diff/5/?file=1379895#file1379895line137> > > > > We should make this LOG.info. There is nothing warn about here ... Oops, thanks! > On May 12, 2016, 7:02 a.m., Mohit Sabharwal wrote: > > shims/scheduler/src/main/java/org/apache/hadoop/hive/schshim/FairSchedulerShim.java, > > line 132 > > <https://reviews.apache.org/r/47040/diff/5/?file=1379895#file1379895line132> > > > > nit: cleaner with > > if (queueConfig == null) > > return; Hmm, don't like it :/ Not against returning mid function, but it seems odd to change the cadence (return on one null check, but not on others). Man nulls are so weird, heh. 'Options' would be way better. - Reuben ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/47040/#review132819 ----------------------------------------------------------- On May 11, 2016, 8:02 p.m., Reuben Kuhnert wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/47040/ > ----------------------------------------------------------- > > (Updated May 11, 2016, 8:02 p.m.) > > > Review request for hive, Lenni Kuff, Mohit Sabharwal, and Sergio Pena. > > > Bugs: HIVE-13696 > https://issues.apache.org/jira/browse/HIVE-13696 > > > Repository: hive-git > > > Description > ------- > > Ensure that jobs sent to YARN with impersonation off are correctly routed to > the proper queue based on fair-scheduler.xml. Monitor this file for changes > and validate that jobs can only be sent to queues authorized for the user. > > > Diffs > ----- > > ql/src/java/org/apache/hadoop/hive/ql/Driver.java > 3fecc5c4ca2a06a031c0c4a711fb49e757c49062 > ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java > 926f6e883030b5a01d025994bd02c67f0f5a275c > ql/src/java/org/apache/hadoop/hive/ql/session/YarnFairScheduling.java > PRE-CREATION > service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java > a0015ebc655931f241b28c53fbb94cfe172841b1 > shims/common/src/main/java/org/apache/hadoop/fs/FileSystemWatcher.java > PRE-CREATION > shims/common/src/main/java/org/apache/hadoop/hive/shims/SchedulerShim.java > 63803b8b0752745bd2fedaccc5d100befd97093b > shims/scheduler/pom.xml b36c12325c588cdb609c6200b1edef73a2f79552 > > shims/scheduler/src/main/java/org/apache/hadoop/hive/schshim/FairSchedulerQueueAllocator.java > PRE-CREATION > > shims/scheduler/src/main/java/org/apache/hadoop/hive/schshim/FairSchedulerShim.java > 372244dc3c989d2a3ae2eb2bfb8cd0a235705e18 > > shims/scheduler/src/main/java/org/apache/hadoop/hive/schshim/QueueAllocator.java > PRE-CREATION > > shims/scheduler/src/test/java/org/apache/hadoop/hive/schshim/TestFairScheduler.java > PRE-CREATION > > Diff: https://reviews.apache.org/r/47040/diff/ > > > Testing > ------- > > > Thanks, > > Reuben Kuhnert > >