Re: Dynamically change logging levels for loggers
See the “LogFixture” class for how I got this to work (for tests) using Logback. If we combine the Logback details with the prior Apex framework, we might have most of what we need.

BTW: another huge help would be a way to “poke” Drill and have it dump the state of some of its internal data structures, such as the memory allocator, the list of running fragments, etc. We’ve seen cases where, after Drill runs for a long time, “something bad” happens, but it is hard to see that internal state to figure out what’s what.

Thanks,
- Paul

> On Aug 25, 2017, at 11:22 PM, Vlad Rozov wrote:
>
> +1. Even though it was done for log4j in Apache Apex, I am pretty sure that the same can be done for Logback. The only thing to consider is that all such functionality is specific to the logging provider in use. I am quite familiar with how it was done in Apex and can help if necessary.
>
> Thank you,
>
> Vlad
>
> On 8/25/17 14:22, Timothy Farkas wrote:
>> +1 for exploring adding this feature. We had a feature to dynamically change log levels at runtime through the REST API in Apache Apex and it was very helpful with debugging things.
>>
>> From: Paul Rogers
>> Sent: Friday, August 25, 2017 11:01:29 AM
>> To: dev@drill.apache.org
>> Subject: Re: Dynamically change logging levels for loggers
>>
>> Hi Kunal,
>>
>> Don’t know about rereading the config file, but I have had luck in the unit test framework with adjusting log levels programmatically. (Tests turn on interesting log levels for the duration of a single test.) We might be able to use that capability (provided by Logback) to make adjustments at run time.
>>
>> - Paul
>>
>>> On Aug 25, 2017, at 10:55 AM, Kunal Khatua wrote:
>>>
>>> I figured this is a rarely modified piece of code but one most frequently used across all components. I am hoping that someone who might have worked on logging can share some insight from their experience in general, if not within Drill.
>>>
>>> I was wondering if changes to Drill's logback.xml can be picked up dynamically, i.e. without restarting the Drillbit, change the logging level of specific classes within the Drillbit.
>>>
>>> I ask this because sometimes a Drillbit needs to go through a warmup phase where the JVM optimizes the functions frequently in use. Changing the logging from something like INFO to DEBUG level would then allow me to correctly capture specific log messages without losing all those optimizations due to a restart (for the DEBUG to take effect).
>>>
>>> Is it something worth having?
>>>
>>> ~ Kunal
>>
>
> Thank you,
>
> Vlad
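[Editor's note] For the programmatic approach Paul describes, Logback lets you cast the SLF4J logger to `ch.qos.logback.classic.Logger` and call `setLevel()` at run time. Since Logback itself is not needed to show the idea, here is a dependency-free sketch of the same technique using the JDK's own `java.util.logging`; the logger name is only an example:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class DynamicLevelSketch {
    public static void main(String[] args) {
        // Grab the logger for a subsystem (example name).
        Logger log = Logger.getLogger("org.apache.drill.exec");

        log.setLevel(Level.INFO);
        System.out.println(log.isLoggable(Level.FINE)); // false: FINE is below INFO

        // Flip the level at run time -- no process restart needed,
        // so JIT warmup is preserved.
        log.setLevel(Level.FINE);
        System.out.println(log.isLoggable(Level.FINE)); // true after the change
    }
}
```

With Logback the equivalent call is `((ch.qos.logback.classic.Logger) LoggerFactory.getLogger("org.apache.drill.exec")).setLevel(Level.DEBUG);`, which is essentially what a test fixture like LogFixture can do for the duration of a test.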
Re: Drill developer guide or code organization
Hi Aditya,

Drill does not have a good overview at present. The Wiki pages that Muhammad pointed out are about all that we can offer. Some general guidelines: almost everything you’ll want to explore is in the “java-exec” package. This includes the planner, the networking layer, the execution framework, etc. The planner is a bit hard to follow unless you learn Apache Calcite: Drill’s code is just a series of extensions to Calcite. Drill is columnar. Value Vectors are the internal representation, and are defined (via code generation) in the “vector” project. A number of storage and format plugins exist in the “contrib” projects.

Please post specific questions here and we can help you. Then I’ll adapt the answers to extend my own Wiki pages (the first item on the list below). BTW: we want to move some of the more “fully baked” posts into Apache Drill at some point, perhaps in the Apache Drill wiki or as markdown files within a new Maven project. Also, as you learn about Drill, please consider creating your own summary of what you learn to benefit others. We can eventually pull that material into Drill as well.

Finally, Muhammad, what challenges are you facing with the test framework? It is supposed to be easy, so if it is not, we’d sure like to learn about the challenges and fix them (or add better documentation).

Thanks,
- Paul

> On Aug 26, 2017, at 6:47 AM, Muhammad Gelbana wrote:
>
> I agree with that. Having documentation guiding potential committers through the code can help many achieve their tasks and grow the community. I myself am struggling a bit with the test case framework, but I'm not giving it my full time, though.
>
> Anyway, here is a list of all the GitHub wikis for Drill forks:
>
> https://github.com/paul-rogers/drill/wiki
> https://github.com/parthchandra/drill/wiki
> https://github.com/kkhatua/drill/wiki
> https://github.com/bitblender/drill/wiki
> https://github.com/chunhui-shi/drill/wiki
> https://github.com/xiaom/drill/wiki
> https://github.com/jacques-n/drill/wiki
> https://github.com/XingCloud/incubator-drill/wiki (Chinese)
>
> Thanks,
> Gelbana
>
> On Sat, Aug 26, 2017 at 3:07 PM, Aditya Allamraju <aditya.allamr...@gmail.com> wrote:
>
>> Team,
>>
>> Is there a place where we have documented the different code components of Drill? What I am looking for is something similar to https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide (mainly the part with code organization). I looked at the Apache docs but could not find the above info under "developer information".
>>
>> I request the active members of the group to share such info. If it is not yet there, can someone please put up a doc for a start, briefly mentioning the different components and the problems they solve. Such information will greatly help newcomers to this community.
>>
>> Appreciate all the efforts going on in this group.
>>
>> Thanks
>> Aditya
Re: Drill developer guide or code organization
I agree with that. Having documentation guiding potential committers through the code can help many achieve their tasks and grow the community. I myself am struggling a bit with the test case framework, but I'm not giving it my full time, though.

Anyway, here is a list of all the GitHub wikis for Drill forks:

https://github.com/paul-rogers/drill/wiki
https://github.com/parthchandra/drill/wiki
https://github.com/kkhatua/drill/wiki
https://github.com/bitblender/drill/wiki
https://github.com/chunhui-shi/drill/wiki
https://github.com/xiaom/drill/wiki
https://github.com/jacques-n/drill/wiki
https://github.com/XingCloud/incubator-drill/wiki (Chinese)

Thanks,
Gelbana

On Sat, Aug 26, 2017 at 3:07 PM, Aditya Allamraju <aditya.allamr...@gmail.com> wrote:

> Team,
>
> Is there a place where we have documented the different code components of Drill? What I am looking for is something similar to https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide (mainly the part with code organization). I looked at the Apache docs but could not find the above info under "developer information".
>
> I request the active members of the group to share such info. If it is not yet there, can someone please put up a doc for a start, briefly mentioning the different components and the problems they solve. Such information will greatly help newcomers to this community.
>
> Appreciate all the efforts going on in this group.
>
> Thanks
> Aditya
Drill developer guide or code organization
Team,

Is there a place where we have documented the different code components of Drill? What I am looking for is something similar to https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide (mainly the part with code organization). I looked at the Apache docs but could not find the above info under "developer information".

I request the active members of the group to share such info. If it is not yet there, can someone please put up a doc for a start, briefly mentioning the different components and the problems they solve. Such information will greatly help newcomers to this community.

Appreciate all the efforts going on in this group.

Thanks
Aditya
[GitHub] drill pull request #910: DRILL-5726: Support Impersonation without authentic...
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/910#discussion_r135385935

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/DrillRestServer.java ---
@@ -230,6 +230,27 @@ public WebUserConnection provide() {
       public void dispose(WebUserConnection instance) {
       }
+
+    /**
+     * Creates session user principal. If impersonation is enabled without authentication and User-Name header is present and valid,
+     * will create session user principal with provided user name, otherwise anonymous user name will be used.
+     * In both cases session user principal will have admin rights.
+     *
+     * @param config drill config
+     * @param request client request
+     * @return session user principal
+     */
+    private Principal createSessionUserPrincipal(DrillConfig config, HttpServletRequest request) {
+      final boolean checkForUserName = !config.getBoolean(ExecConstants.USER_AUTHENTICATION_ENABLED) && config.getBoolean(ExecConstants.IMPERSONATION_ENABLED);
+      if (checkForUserName) {
--- End diff --

Done.

---

If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.

---
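[Editor's note] The rule the Javadoc in the diff describes can be sketched as plain logic. The method name, the boolean parameters, and the "anonymous" fallback string below are illustrative only; the real method takes Drill's DrillConfig and HttpServletRequest and returns a Principal:

```java
public class SessionPrincipalSketch {
    // Illustrative stand-in for the header/config check described in the Javadoc.
    static String sessionUserName(boolean authEnabled, boolean impersonationEnabled,
                                  String userNameHeader) {
        // Only trust the User-Name header when authentication is off
        // but impersonation is on.
        boolean checkForUserName = !authEnabled && impersonationEnabled;
        if (checkForUserName && userNameHeader != null && !userNameHeader.isEmpty()) {
            return userNameHeader;  // impersonate the supplied user
        }
        return "anonymous";         // otherwise fall back to an anonymous user
    }
}
```

Either way, the resulting session principal gets admin rights, per the Javadoc above.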
Re: Dynamically change logging levels for loggers
+1. Even though it was done for log4j in Apache Apex, I am pretty sure that the same can be done for Logback. The only thing to consider is that all such functionality is specific to the logging provider in use. I am quite familiar with how it was done in Apex and can help if necessary.

Thank you,

Vlad

On 8/25/17 14:22, Timothy Farkas wrote:

+1 for exploring adding this feature. We had a feature to dynamically change log levels at runtime through the REST API in Apache Apex and it was very helpful with debugging things.

From: Paul Rogers
Sent: Friday, August 25, 2017 11:01:29 AM
To: dev@drill.apache.org
Subject: Re: Dynamically change logging levels for loggers

Hi Kunal,

Don’t know about rereading the config file, but I have had luck in the unit test framework with adjusting log levels programmatically. (Tests turn on interesting log levels for the duration of a single test.) We might be able to use that capability (provided by Logback) to make adjustments at run time.

- Paul

On Aug 25, 2017, at 10:55 AM, Kunal Khatua wrote:

I figured this is a rarely modified piece of code but one most frequently used across all components. I am hoping that someone who might have worked on logging can share some insight from their experience in general, if not within Drill.

I was wondering if changes to Drill's logback.xml can be picked up dynamically, i.e. without restarting the Drillbit, change the logging level of specific classes within the Drillbit.

I ask this because sometimes a Drillbit needs to go through a warmup phase where the JVM optimizes the functions frequently in use. Changing the logging from something like INFO to DEBUG level would then allow me to correctly capture specific log messages without losing all those optimizations due to a restart (for the DEBUG to take effect).

Is it something worth having?

~ Kunal

Thank you,

Vlad
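[Editor's note] On Kunal's original question: Logback can in fact re-read its configuration file on a timer via the `scan` attribute, so edits to logback.xml are picked up without restarting the Drillbit. A minimal sketch, where the scan period and logger name are arbitrary examples:

```xml
<!-- logback.xml: scan="true" makes Logback poll this file for changes;
     scanPeriod is illustrative -- pick what suits the deployment. -->
<configuration scan="true" scanPeriod="30 seconds">
  <!-- Change this level on disk (e.g. INFO -> DEBUG) and Logback applies it
       at the next scan, without a restart, so JIT warmup is preserved. -->
  <logger name="org.apache.drill.exec" level="INFO"/>
</configuration>
```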
[GitHub] drill pull request #906: DRILL-5546: Handle schema change exception failure ...
Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/906#discussion_r135382803

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java ---
@@ -152,97 +157,75 @@ public void kill(boolean sendUpstream) {
     }
   }

-  private void releaseAssets() {
-    container.zeroVectors();
-  }
-
-  private void clearFieldVectorMap() {
-    for (final ValueVector v : mutator.fieldVectorMap().values()) {
-      v.clear();
-    }
-  }
-
   @Override
   public IterOutcome next() {
     if (done) {
       return IterOutcome.NONE;
     }
     oContext.getStats().startProcessing();
     try {
-      try {
-        injector.injectChecked(context.getExecutionControls(), "next-allocate", OutOfMemoryException.class);
-        currentReader.allocate(mutator.fieldVectorMap());
-      } catch (OutOfMemoryException e) {
-        clearFieldVectorMap();
-        throw UserException.memoryError(e).build(logger);
-      }
-      while ((recordCount = currentReader.next()) == 0) {
+      while (true) {
         try {
-          if (!readers.hasNext()) {
-            // We're on the last reader, and it has no (more) rows.
-            currentReader.close();
-            releaseAssets();
-            done = true; // have any future call to next() return NONE
-
-            if (mutator.isNewSchema()) {
-              // This last reader has a new schema (e.g., we have a zero-row
-              // file or other source). (Note that some sources have a non-
-              // null/non-trivial schema even when there are no rows.)
+          injector.injectChecked(context.getExecutionControls(), "next-allocate", OutOfMemoryException.class);
--- End diff --

This patch tries to decouple the logic of the record reader and ScanBatch:

- The record reader is responsible for adding vectors to the batch (via Mutator) and populating data.
- ScanBatch is responsible for interpreting the output of the record reader, checking rowCount and Mutator.isNewSchema() to decide whether to return OK_NEW_SCHEMA, OK, or NONE.

> What happens on the first reader? There is no schema, so any schema is a new schema. Suppose the file is JSON and the schema is built on the fly. Does the code handle the case that we have no schema (first reader), and that reader adds no columns?

It's not true that "any schema is a new schema". If the first reader has no schema and adds no columns, then Mutator.isNewSchema() should return false. Mutator.isNewSchema() returns true only if, since the last call, one or more of the following has happened:

- a new top-level field is added,
- a field is added within a nested field,
- an existing field's type is changed.

You may argue that a more appropriate way to represent an empty JSON file is an empty schema. However, such an idea would lead to various schema conflicts in the down-stream operator if another scan thread has non-empty JSON files. This is exactly what happened before this patch. The proposal in this patch is to **ignore** empty JSON files, since 1) rowCount = 0 and 2) no new columns were added to the batch.

- If all the record readers for a scan thread return with rowCount = 0 and produce no new schema, then this scan thread should return NONE directly, without returning OK_NEW_SCHEMA.
- If at least one reader returns either with > 0 rows or a new schema, then the scan thread will return a batch with the new schema.
- If all scan threads return NONE directly, implying the entire table has no data/schema, this is what Project.handleFastNone() will deal with.

> But, if the input is CSV, then we always have a schema. If the file has column headers, then we know that the schema is, say, (a, b, c) because those are the headers. Or, if the file has no headers, the schema is always the columns array. So, should we send that schema downstream? If so, should it include the implicit columns?

If CSV always adds columns (either _a, b, c_, or _columns_), then ScanBatch will produce a batch with (a, b, c), or columns. It does not make sense to ignore those schemas.

- In the case of a file with a header, a file with _a,b,c_ will lead to a batch with (a,b,c), while a file with _a,b,c,d_ will lead to a batch with (a,b,c,d). Those two files will cause a schema change, which is expected behavior.
- In the case of a file without a header, all files will produce a batch with _columns_, which means there would be no schema change across different files, regardless of whether they have row = 0 or row > 0.
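[Editor's note] The per-scan rule described above -- ignore readers that produce zero rows and no new columns, and return NONE directly only when every reader was such an empty source -- can be sketched as follows. ReaderResult and returnsNoneDirectly are illustrative names for this note, not Drill's actual API:

```java
import java.util.List;

public class EmptyJsonScanSketch {
    // One reader's outcome: rows produced, and whether it added/changed columns
    // (i.e. what Mutator.isNewSchema() would report for it).
    record ReaderResult(int rowCount, boolean newSchema) {}

    // True when the scan thread should return NONE directly, without ever
    // emitting OK_NEW_SCHEMA: every reader was an "empty" source such as a
    // zero-row JSON file that added no columns.
    static boolean returnsNoneDirectly(List<ReaderResult> readers) {
        return readers.stream().noneMatch(r -> r.rowCount() > 0 || r.newSchema());
    }
}
```

Under this sketch, a scan over two empty JSON files yields NONE directly, while a scan where even one reader contributes rows or a schema emits OK_NEW_SCHEMA as before.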