Re: Dynamically change logging levels for loggers

2017-08-26 Thread Paul Rogers
See the “LogFixture” class for how I got this to work (for tests) using 
Logback. If we combine the Logback details with the prior Apex framework, we 
might have most of what we need.
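For reference, the core of what a fixture like this relies on can be sketched in a few lines: Logback's `Logger` implementation exposes `setLevel()`, so casting the SLF4J logger lets you change levels at runtime. A minimal illustration (class and logger names are mine, not Drill's), assuming SLF4J and Logback classic are on the classpath:

```java
import org.slf4j.LoggerFactory;

import ch.qos.logback.classic.Level;
import ch.qos.logback.classic.Logger;

public class LogLevelDemo {

  // Cast the SLF4J logger to Logback's implementation to change its level at runtime.
  static void setLevel(String loggerName, Level level) {
    ((Logger) LoggerFactory.getLogger(loggerName)).setLevel(level);
  }

  public static void main(String[] args) {
    setLevel("org.apache.drill", Level.DEBUG);
    Logger logger = (Logger) LoggerFactory.getLogger("org.apache.drill");
    System.out.println(logger.getLevel()); // DEBUG
  }
}
```

A test could set a level this way before the interesting code runs and restore the old level afterward, which is roughly the per-test pattern described above.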

BTW: another huge help would be a way to “poke” Drill and have it dump the 
state of some of its internal data structures, such as the memory allocator, 
the list of running fragments, etc. We’ve seen cases where, after Drill runs 
for a long time, “something bad” happens. But it is hard to see that internal 
state to figure out what’s what.

Thanks,

- Paul

> On Aug 25, 2017, at 11:22 PM, Vlad Rozov  wrote:
> 
> +1. Even though it was done for log4j in Apache Apex, I am pretty sure that 
> the same can be done for Logback. The only thing to consider is that all such 
> functionality is specific to the logging provider in use. I am quite familiar 
> with how it was done in Apex and can help if necessary.
> 
> Thank you,
> 
> Vlad
> 
> On 8/25/17 14:22, Timothy Farkas wrote:
>> +1 for exploring adding this feature. We had a feature to dynamically change 
>> log levels at runtime through the REST API in Apache Apex, and it was very 
>> helpful when debugging.
>> 
>> 
>> From: Paul Rogers 
>> Sent: Friday, August 25, 2017 11:01:29 AM
>> To: dev@drill.apache.org
>> Subject: Re: Dynamically change logging levels for loggers
>> 
>> Hi Kunal,
>> 
>> I don’t know about rereading the config file, but I have had luck in the unit 
>> test framework with adjusting log levels programmatically. (Tests turn on 
>> interesting log levels for the duration of a single test.) We might be able 
>> to use that capability (provided by Logback) to make adjustments at run time.
>> 
>> - Paul
>> 
>>> On Aug 25, 2017, at 10:55 AM, Kunal Khatua  wrote:
>>> 
>>> I figured this is a rarely modified piece of code but most frequently used 
>>> across all components. Hoping that someone who might have worked on logging 
>>> can share some insight from their experience in general, if not within 
>>> Drill.
>>> 
>>> I was wondering if changes to Drill's logback.xml can be picked up 
>>> dynamically.
>>> 
>>> i.e. without restarting the Drillbit, change the logging level of specific 
>>> classes within the Drillbit.
>>> 
>>> I ask this because sometimes, a Drillbit needs to go through a warmup phase 
>>> where the JVM optimizes the functions frequently in use. Changing the 
>>> logging from something like an INFO to a DEBUG level would then allow me to 
>>> correctly capture specific log messages without having to lose all those 
>>> optimizations due to a restart (for the DEBUG to take effect).
>>> 
>>> Is this something worth having?
>>> 
>>> ~ Kunal
>> 



Re: Drill developer guide or code organization

2017-08-26 Thread Paul Rogers
Hi Aditya,

Drill does not have a good overview at present. The Wiki pages that Muhammad 
pointed out are about all that we can offer.

Some general guidelines: almost everything you’ll want to explore is in the 
“java-exec” package. This includes the planner, the networking layer, the 
execution framework, etc.

The planner is a bit hard to follow unless you learn Apache Calcite: Drill’s 
code is just a series of extensions to Calcite.

Drill is columnar. Value Vectors are the internal representation, and are 
defined (via code generation) in the “vector” project.

A number of storage and format plugins exist in the “contrib” projects.

Please post specific questions here and we can help you. Then, I’ll adapt the 
answers to extend my own Wiki pages (the first item on the list below).

BTW: We want to move some of the more “fully baked” posts into Apache Drill at 
some point, perhaps in the Apache Drill wiki or as markdown files within a new 
Maven project.

Also, as you learn about Drill, please consider creating your own summary of 
what you learn to benefit others. We can eventually pull that material into 
Drill as well.

Finally, Muhammad, what challenges are you facing with the test framework? It 
is supposed to be easy, so if it is not, we’d sure like to learn about the 
challenges and fix them (or add better documentation.)

Thanks,

- Paul


> On Aug 26, 2017, at 6:47 AM, Muhammad Gelbana  wrote:
> 
> I agree with that. Having documentation guiding potential committers
> through the code can help many achieve their tasks and grow the community.
> I myself am struggling a bit with the test framework, though I'm not
> working on it full time.
> 
> Anyway, here is a list of all the GitHub wikis for Drill forks:
> 
> https://github.com/paul-rogers/drill/wiki
> https://github.com/parthchandra/drill/wiki
> https://github.com/kkhatua/drill/wiki
> https://github.com/bitblender/drill/wiki
> https://github.com/chunhui-shi/drill/wiki
> https://github.com/xiaom/drill/wiki
> https://github.com/jacques-n/drill/wiki
> https://github.com/XingCloud/incubator-drill/wiki (Chinese)
> 
> Thanks,
> Gelbana
> 
> On Sat, Aug 26, 2017 at 3:07 PM, Aditya Allamraju <
> aditya.allamr...@gmail.com> wrote:
> 
>> Team,
>> 
>> Is there a place where we have documented the different code components of
>> Drill?
>> What I am looking for is something similar to
>> https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide (mainly the
>> part with code organization).
>> I looked at the Apache docs but could not find the above info in "developer
>> information".
>> 
>> I request the active members of the group to share such info. If it is not
>> yet there, can someone please put up a doc for a start, briefly describing
>> the different components and the problems they solve.
>> Such information will greatly help the newcomers to this community.
>> 
>> Appreciate all the efforts going on in this group.
>> 
>> Thanks
>> Aditya
>> 



Re: Drill developer guide or code organization

2017-08-26 Thread Muhammad Gelbana
I agree with that. Having documentation guiding potential committers
through the code can help many achieve their tasks and grow the community.
I myself am struggling a bit with the test framework, though I'm not
working on it full time.

Anyway, here is a list of all the GitHub wikis for Drill forks:

https://github.com/paul-rogers/drill/wiki
https://github.com/parthchandra/drill/wiki
https://github.com/kkhatua/drill/wiki
https://github.com/bitblender/drill/wiki
https://github.com/chunhui-shi/drill/wiki
https://github.com/xiaom/drill/wiki
https://github.com/jacques-n/drill/wiki
https://github.com/XingCloud/incubator-drill/wiki (Chinese)

Thanks,
Gelbana

On Sat, Aug 26, 2017 at 3:07 PM, Aditya Allamraju <
aditya.allamr...@gmail.com> wrote:

> Team,
>
> Is there a place where we have documented the different code components of
> Drill?
> What I am looking for is something similar to
> https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide (mainly the
> part with code organization).
> I looked at the Apache docs but could not find the above info in "developer
> information".
>
> I request the active members of the group to share such info. If it is not
> yet there, can someone please put up a doc for a start, briefly describing
> the different components and the problems they solve.
> Such information will greatly help the newcomers to this community.
>
> Appreciate all the efforts going on in this group.
>
> Thanks
> Aditya
>


Drill developer guide or code organization

2017-08-26 Thread Aditya Allamraju
Team,

Is there a place where we have documented the different code components of
Drill?
What I am looking for is something similar to
https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide (mainly the
part with code organization).
I looked at the Apache docs but could not find the above info in "developer
information".

I request the active members of the group to share such info. If it is not
yet there, can someone please put up a doc for a start, briefly describing
the different components and the problems they solve.
Such information will greatly help the newcomers to this community.

Appreciate all the efforts going on in this group.

Thanks
Aditya


[GitHub] drill pull request #910: DRILL-5726: Support Impersonation without authentic...

2017-08-26 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/910#discussion_r135385935
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/DrillRestServer.java ---
@@ -230,6 +230,27 @@ public WebUserConnection provide() {
     public void dispose(WebUserConnection instance) {
 
     }
+
+    /**
+     * Creates session user principal. If impersonation is enabled without authentication and User-Name header is present and valid,
+     * will create session user principal with provided user name, otherwise anonymous user name will be used.
+     * In both cases session user principal will have admin rights.
+     *
+     * @param config drill config
+     * @param request client request
+     * @return session user principal
+     */
+    private Principal createSessionUserPrincipal(DrillConfig config, HttpServletRequest request) {
+      final boolean checkForUserName = !config.getBoolean(ExecConstants.USER_AUTHENTICATION_ENABLED) && config.getBoolean(ExecConstants.IMPERSONATION_ENABLED);
+      if (checkForUserName) {
--- End diff --

Done.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: Dynamically change logging levels for loggers

2017-08-26 Thread Vlad Rozov
+1. Even though it was done for log4j in Apache Apex, I am pretty sure 
that the same can be done for Logback. The only thing to consider is 
that all such functionality is specific to the logging provider in use. I 
am quite familiar with how it was done in Apex and can help if necessary.


Thank you,

Vlad

On 8/25/17 14:22, Timothy Farkas wrote:

+1 for exploring adding this feature. We had a feature to dynamically change 
log levels at runtime through the REST API in Apache Apex, and it was very 
helpful when debugging.
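To give a concrete feel for such an endpoint, here is a dependency-free sketch using the JDK's built-in `HttpServer`, with `java.util.logging` standing in for Logback. Everything here (the path, port, and parameter names) is invented for illustration; it is not Apex's or Drill's actual API:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.util.logging.Level;
import java.util.logging.Logger;

public class LogLevelEndpoint {

  // Parse "logger=NAME&level=LEVEL" and apply it; returns a confirmation message.
  // (java.util.logging stands in for Logback to keep the sketch self-contained.)
  static String applyLogLevel(String query) {
    String loggerName = null, levelName = null;
    for (String kv : query.split("&")) {
      String[] parts = kv.split("=", 2);
      if (parts[0].equals("logger")) loggerName = parts[1];
      else if (parts[0].equals("level")) levelName = parts[1];
    }
    Logger.getLogger(loggerName).setLevel(Level.parse(levelName));
    return "set " + loggerName + " to " + levelName;
  }

  public static void main(String[] args) throws Exception {
    // e.g. curl "http://localhost:8047/logLevel?logger=org.apache.drill&level=FINE"
    HttpServer server = HttpServer.create(new InetSocketAddress(8047), 0);
    server.createContext("/logLevel", exchange -> {
      byte[] body = applyLogLevel(exchange.getRequestURI().getQuery()).getBytes();
      exchange.sendResponseHeaders(200, body.length);
      try (OutputStream os = exchange.getResponseBody()) {
        os.write(body);
      }
    });
    server.start();
  }
}
```

In Drill, such a handler would presumably live in the existing web server (DrillRestServer) and call into Logback rather than java.util.logging.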


From: Paul Rogers 
Sent: Friday, August 25, 2017 11:01:29 AM
To: dev@drill.apache.org
Subject: Re: Dynamically change logging levels for loggers

Hi Kunal,

I don’t know about rereading the config file, but I have had luck in the unit 
test framework with adjusting log levels programmatically. (Tests turn on 
interesting log levels for the duration of a single test.) We might be able to 
use that capability (provided by Logback) to make adjustments at run time.

- Paul


On Aug 25, 2017, at 10:55 AM, Kunal Khatua  wrote:

I figured this is a rarely modified piece of code but most frequently used 
across all components. Hoping that someone who might have worked on logging can 
share some insight from their experience in general, if not within Drill.

I was wondering if changes to Drill's logback.xml can be picked up dynamically.

i.e. without restarting the Drillbit, change the logging level of specific 
classes within the Drillbit.
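As it happens, Logback supports exactly this out of the box: with `scan="true"`, it re-reads the configuration file on a timer, so level changes in logback.xml take effect without a Drillbit restart. A hedged sketch (the appender, pattern, and logger entries below are illustrative, not Drill's actual configuration):

```xml
<!-- scan="true" makes Logback re-read this file every scanPeriod,
     so changing a level here takes effect without restarting the JVM. -->
<configuration scan="true" scanPeriod="30 seconds">
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>

  <!-- Edit this level (e.g. INFO -> DEBUG) while the process is running. -->
  <logger name="org.apache.drill" level="INFO"/>

  <root level="ERROR">
    <appender-ref ref="STDOUT"/>
  </root>
</configuration>
```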

I ask this because sometimes, a Drillbit needs to go through a warmup phase 
where the JVM optimizes the functions frequently in use. Changing the logging 
from something like an INFO to a DEBUG level would then allow me to correctly 
capture specific log messages without having to lose all those optimizations 
due to a restart (for the DEBUG to take effect).

Is this something worth having?

~ Kunal







[GitHub] drill pull request #906: DRILL-5546: Handle schema change exception failure ...

2017-08-26 Thread jinfengni
Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/906#discussion_r135382803
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java ---
@@ -152,97 +157,75 @@ public void kill(boolean sendUpstream) {
     }
   }
 
-  private void releaseAssets() {
-    container.zeroVectors();
-  }
-
-  private void clearFieldVectorMap() {
-    for (final ValueVector v : mutator.fieldVectorMap().values()) {
-      v.clear();
-    }
-  }
-
   @Override
   public IterOutcome next() {
     if (done) {
       return IterOutcome.NONE;
     }
     oContext.getStats().startProcessing();
     try {
-      try {
-        injector.injectChecked(context.getExecutionControls(), "next-allocate", OutOfMemoryException.class);
-
-        currentReader.allocate(mutator.fieldVectorMap());
-      } catch (OutOfMemoryException e) {
-        clearFieldVectorMap();
-        throw UserException.memoryError(e).build(logger);
-      }
-      while ((recordCount = currentReader.next()) == 0) {
+      while (true) {
         try {
-          if (!readers.hasNext()) {
-            // We're on the last reader, and it has no (more) rows.
-            currentReader.close();
-            releaseAssets();
-            done = true;  // have any future call to next() return NONE
-
-            if (mutator.isNewSchema()) {
-              // This last reader has a new schema (e.g., we have a zero-row
-              // file or other source).  (Note that some sources have a non-
-              // null/non-trivial schema even when there are no rows.)
+          injector.injectChecked(context.getExecutionControls(), "next-allocate", OutOfMemoryException.class);
--- End diff --

This patch tries to decouple the logic of the record reader and ScanBatch:
 - The record reader is responsible for adding vectors to the batch (via Mutator) and 
populating data.
 - ScanBatch is responsible for interpreting the output of the record reader, by 
checking rowCount and Mutator.isNewSchema() to decide whether to return 
OK_NEW_SCHEMA, OK, or NONE. 

> What happens on the first reader? There is no schema, so any schema is a 
new schema. Suppose the file is JSON and the schema is built on the fly. Does 
the code handle the case that we have no schema (first reader), and that reader 
adds no columns?

It's not true that "any schema is a new schema". If the first reader has no 
schema and adds no columns, then Mutator.isNewSchema() should return false. 
Mutator.isNewSchema() returns true only when, since the last call, one or more of the following has happened:

- a new top-level field is added, 
- a field in a nested field is added, 
- an existing field's type is changed.

You may argue that a more appropriate way to represent an empty JSON file is an 
empty schema. However, such an approach would lead to various schema conflicts in 
downstream operators if another scan thread has non-empty JSON files. This is 
exactly what happened before this patch. 

The proposal in this patch is to **ignore** empty JSON files, since 1) rowCount = 0 
and 2) no new columns were added to the batch. 
 - If all the record readers for a scan thread return with rowCount = 0 
and produce no new schema, then that scan thread should return NONE directly, 
without returning OK_NEW_SCHEMA. 
 - If at least one reader returns either with more than 0 rows or a new schema, then 
the scan thread will return a batch with the new schema.
 - If all scan threads return NONE directly, implying the entire table 
does not have data/schema, that is what Project.handleFastNone() will deal with.
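The decision rule described above can be sketched as a small function. This is a hypothetical simplification for illustration, not the actual ScanBatch code; the enum constants mirror Drill's IterOutcome names:

```java
public class ScanOutcomeSketch {

  enum IterOutcome { OK_NEW_SCHEMA, OK, NONE }

  // Simplified decision: a reader's result is interpreted purely from its
  // row count and whether the Mutator observed a schema change.
  static IterOutcome interpret(int rowCount, boolean isNewSchema) {
    if (isNewSchema) {
      return IterOutcome.OK_NEW_SCHEMA; // schema must go downstream, even with 0 rows
    }
    if (rowCount > 0) {
      return IterOutcome.OK;            // more rows, same schema
    }
    return IterOutcome.NONE;            // empty reader, no schema change: ignored (e.g. empty JSON)
  }

  public static void main(String[] args) {
    System.out.println(interpret(0, false)); // NONE
    System.out.println(interpret(0, true));  // OK_NEW_SCHEMA
    System.out.println(interpret(10, false)); // OK
  }
}
```

Under this rule an empty JSON file (0 rows, no columns added) contributes nothing, which is exactly why it no longer forces a spurious OK_NEW_SCHEMA downstream.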

>But, if the input is CSV, then we always have a schema. If the file has 
column headers, then we know that the schema is, say, (a, b, c) because those 
are the headers. Or, if the file has no headers, the schema is always the 
columns array. So, should we send that schema downstream? If so, should it 
include the implicit columns?

If CSV always adds columns (either _a,b,c_ or _columns_), then ScanBatch 
will produce a batch with (a, b, c) or columns. It does not make sense to 
ignore that schema.  
  - In the case of a file with a header, a file with _a,b,c_ will lead to a 
batch with (a,b,c), while a file with _a,b,c,d_ will lead to a batch with 
(a,b,c,d). Those two files will cause a schema change, which is the expected 
behavior.
  - In the case of files without headers, all files will produce a batch with 
columns, which means there would be no schema change across different files, 
regardless of whether they have 0 rows or more.


