Apply for JIRA permission

2019-04-02 Thread Hefei Li
Hi guys,

I want to contribute to Apache Drill.

Would you please grant me contributor permissions?

My JIRA username is *lhfei*.




===
Best Regards
Hefei Li
MP: +86  18701581473
MSN: lh...@live.cn
===


Re: [DISCUSS]: Additional Formats for Drill

2019-04-02 Thread Paul Rogers
Hi All,

Daffodil is an interesting project as is the DFDLSchemas project. Thanks for 
sharing!

An interesting challenge is how these libraries expose data: what is their 
internal format, and what API does an application use to consume the data? 
I found this for Daffodil: it will "parse data into an infoset represented as XML 
or JSON."

Drill is part of the "big data" ecosystem. Converting a 100GB file, say, into 
XML, then into Drill would be a bit cumbersome. It would be better if the 
libraries provided an API that Drill could implement to receive the data and 
write it to vectors using, say, the new row set framework that we've just added 
for CSV and will soon add for JSON. Both the JSON and XML (SAX) parsers work this 
way: the application provides an implementation that the parser calls as it 
consumes input. Drill uses this approach to parse JSON.
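
To illustrate, a push-style callback API of this kind might look roughly like
the sketch below. The names (RecordListener, onField) are invented for
illustration; they are not a real Daffodil or Drill API.

// Hypothetical interface, for illustration only.
public interface RecordListener {
  void startRecord();
  void onField(String name, Object value); // the parser pushes each parsed field
  void endRecord();                        // a consumer could flush a row to vectors here
}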

Another issue is file splits: to store a large file on HDFS (yes, HDFS is old; 
everyone uses S3 now), we want Drill to read each file block separately. This 
means the file must be "splittable": there must be some well-defined token that 
the scanner can search for at block boundaries. It is not clear whether these 
parsers are designed for this big data model.
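
For example, assuming newline-delimited records, a reader assigned the byte
range [start, end) could skip to the first delimiter after start, so that
every record is read by exactly one split. A minimal sketch of that idea:

import java.io.IOException;
import java.io.RandomAccessFile;

public class SplitBoundary {
  // Advance to the first record boundary at or after 'start'. The partial
  // record spanning the boundary is read by the previous split's reader.
  static long seekToRecordStart(RandomAccessFile file, long start) throws IOException {
    if (start == 0) {
      return 0; // the first split always begins at a record boundary
    }
    file.seek(start - 1);
    int b;
    do {
      b = file.read();
    } while (b != -1 && b != '\n');
    return file.getFilePointer();
  }
}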

For both projects, it would be good to read data into Arrow. Ideally, we'd get a 
volunteer to port the row set mechanism to Arrow so that the same API can write 
to both Arrow and Drill vectors (saving the entire world from having to write 
their own vector-writing mechanisms).

Thanks,
- Paul

 

On Tuesday, April 2, 2019, 1:06:53 PM PDT, Ted Dunning 
 wrote:  
 
 I have no idea how much uptake these would have, but if the library can
give all the formats all at once for modest effort, that would be great.

On Tue, Apr 2, 2019 at 9:22 AM Charles Givre  wrote:

> Hello everyone,
> I recently presented a talk at the ASF DC Roadshow (shameless plug[1] )
> but heard a really good talk by a PMC member for the Apache Daffodil
> (incubating) project.  At its core, Daffodil is a collection of parsers
> which convert various data formats to a standard structure which can then
> be ingested into other tools.  Some of these formats Drill can already
> ingest natively, such as PCAP and CSV; however, many it cannot, such as NACHA (bulk
> financial transactions), vCard, Shapefile, and many more.  Here is a brief
> presentation about Daffodil [2].
>
> The DFDLSchemas github has a handful of DFDL schemas that are pretty good
> open source examples[3].
>
> On a related note, I stumbled on the Kaitai struct library[4] which is
> another library which performs a similar function to Daffodil.  Would it be
> of interest for the community to incorporate these libraries into Drill?
> My thought is that it would greatly increase the types of data that Drill
> can natively query and hence seriously increase Drill’s usefulness.  If
> there is interest, (and honestly even if there isn’t) I can start working
> on this for the next release of Drill.
>
>
> [1]:
> https://www.slideshare.net/cgivre/drilling-cyber-security-data-with-apache-drill
> <
> https://www.slideshare.net/cgivre/drilling-cyber-security-data-with-apache-drill
> >
> [2]:
> https://www.slideshare.net/mbeckerle/tresys-dfdl-data-format-description-language-daffodil-open-source-public-overview-100432615
> <
> https://www.slideshare.net/mbeckerle/tresys-dfdl-data-format-description-language-daffodil-open-source-public-overview-100432615
> >
> [3]: https://github.com/DFDLSchemas 
> [4]: http://formats.kaitai.io 
>
>  

Re: Drill not compiling after rebase!!

2019-04-02 Thread Charles Givre
All, 
I submitted the following PR which fixes this: 
https://github.com/apache/drill/pull/1731. @vvysotskyi, can you review and 
commit?
Thanks,
— C


> On Apr 2, 2019, at 17:25, hanu mapr  wrote:
> 
> Hello Vova,
> 
> Option 2 makes sense to me. I have tried with type casting the accept
> method and it worked.
> 
> Thanks,
> -Hanu
> 
> On Tue, Apr 2, 2019 at 9:23 AM Arina Yelchiyeva 
> wrote:
> 
>> I would go with the second approach, since it would have less impact on
>> the project.
>> 
>> Kind regards,
>> Arina
>> 
>>> On Apr 2, 2019, at 6:53 PM, Vova Vysotskyi  wrote:
>>> 
>>> Hi,
>>> 
>>> For now, I see two ways of solving this issue:
>>> 
>>> 1. Find minimum JDK build version where this issue is fixed and specify
>> it
>>> in requireJavaVersion tag in maven-enforcer-plugin:
>>> https://maven.apache.org/enforcer/enforcer-rules/requireJavaVersion.html
>>> So build will fail with a clear error message instead of compilation
>> error.
>>> 
>>> 2. Explicitly specify RuntimeException type in generic for method, add
>> the
>>> corresponding comment and suppression to avoid warnings in IDE. (If
>> someone
>>> has checked that it works. See my previous email.)
>>> 
>>> Kind regards,
>>> Volodymyr Vysotskyi
>>> 
>>> 
>>> On Tue, Apr 2, 2019 at 6:35 PM Charles Givre  wrote:
>>> 
 All,
 I tried this on another machine with a higher version of Java and it
 worked without the changes below.  So it would seem that this probably
>> is a
 bug in JDK. How do we proceed?
 
> On Apr 1, 2019, at 17:12, Sorabh Hamirwasia 
 wrote:
> 
> I am not seeing any issue with latest maven and with below java
>> version.
 As
> Vova suggested this could be a JDK bug.
> 
> *# mvn --version*
> *Java HotSpot(TM) 64-Bit Server VM warning: ignoring option
> MaxPermSize=256m; support was removed in 8.0*
> *Apache Maven 3.6.0 (97c98ec64a1fdfee7767ce5ffb20918da4f719f3;
> 2018-10-24T11:41:47-07:00)*
> *Maven home: /opt/maven/apache-maven-3.6.0*
> *Java version: 1.8.0_131, vendor: Oracle Corporation, runtime:
> /opt/jdk1.8.0_131/jre*
> 
> @Charles/Hanu,
> Can you upgrade your JDK version and try once ?
> 
> Thanks,
> Sorabh
> 
> On Mon, Apr 1, 2019 at 1:53 PM hanu mapr  wrote:
> 
>> Hello Vova,
>> 
>> Here is the java version on my laptop.
>> 
>> HMADURI-E597:drill hmaduri$ java -version
>> java version "1.8.0_91"
>> Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
>> Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)
>> HMADURI-E597:drill hmaduri$ javac -version
>> javac 1.8.0_91
>> 
>> Thanks,
>> -Hanu
>> 
>> On Mon, Apr 1, 2019 at 1:45 PM Charles Givre 
>> wrote:
>> 
>>> Hi Volodymyr,
>>> I’m on macOS Mojave, java version 1.8.0_65, maven version 3.6.0.
>>> 
>>> In order to get Drill to build I had to make the following changes:
>>> 
>>> org/apache/drill/exec/store/parquet/TestParquetFilterPushDown.java
>> (add
>>> try/catch)
>>> 
>>> private void
>>> testParquetRowGroupFilterEval(MetadataBase.ParquetTableMetadataBase
>> footer,
>>> final int rowGroupIndex, final LogicalExpression filterExpr,
>> RowsMatch
>>> canDropExpected) {
>>> try {
>>> RowsMatch canDrop = FilterEvaluatorUtils.evalFilter(filterExpr,
>> footer,
>>> rowGroupIndex, fragContext.getOptions(), fragContext);
>>> Assert.assertEquals(canDropExpected, canDrop);
>>> } catch (Exception e) {
>>> fail();
>>> }
>>> }
>>> 
>>> and
>>> 
>>> org/apache/drill/exec/store/parquet/FilterEvaluatorUtils.java
>>> 
>>> public static RowsMatch evalFilter(LogicalExpression expr,
>>> MetadataBase.ParquetTableMetadataBase footer,
>>>int rowGroupIndex, OptionManager
>>> options, FragmentContext fragmentContext) throws Exception {
>>> 
>>> where I added throws Exception.
>>> 
>>> 
 On Apr 1, 2019, at 16:11, Vova Vysotskyi  wrote:
 
 Hi all,
 
 Looking into the code, I don't see a reason for compilation failure,
>>> since
 the exception type should be inferred from *FieldReferenceFinder*,
>> which
 contains *RuntimeException*.
 
 Perhaps it may be JDK bug, something like this
 https://bugs.openjdk.java.net/browse/JDK-8066974.
 Charles, Hanu, could you please share your JDK versions? On my
 machine 1.8.0_191 everything works fine.
 
 Also, could you please check whether specifying types explicitly
>> will
>>> help:
 *expr.accept(new FieldReferenceFinder(), null)* *->*
>>> *expr.<Set<SchemaPath>, Void, RuntimeException>accept(new FieldReferenceFinder(), null)*
 
 Kind regards,
 Volodymyr Vysotskyi
 
 
 On Mon, Apr 1,

[GitHub] [drill] cgivre opened a new pull request #1731: DRILL-7153: Drill Fails to Build using JDK 1.8.0_65

2019-04-02 Thread GitBox
cgivre opened a new pull request #1731: DRILL-7153: Drill Fails to Build using 
JDK 1.8.0_65
URL: https://github.com/apache/drill/pull/1731
 
 
   This PR fixes a bug in which building Drill using JDK 1.8.0_65 results in 
the following error. 
   
   ```
   [ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.8.0:compile (default-compile) 
on project drill-java-exec: Compilation failure
   [ERROR] 
/Users/cgivre/github/drill-dev/drill/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/FilterEvaluatorUtils.java:[59,68]
 error: unreported exception E; must be caught or declared to be thrown
   [ERROR]   where E,T,V are type-variables:
   [ERROR] E extends Exception declared in method <T,V,E>accept(ExprVisitor<T,V,E>,V)
   [ERROR] T extends Object declared in method <T,V,E>accept(ExprVisitor<T,V,E>,V)
   [ERROR] V extends Object declared in method <T,V,E>accept(ExprVisitor<T,V,E>,V)
   [ERROR]
   [ERROR] -> [Help 1]
   [ERROR]
   [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
   [ERROR] Re-run Maven using the -X switch to enable full debug logging.
   [ERROR]
   [ERROR] For more information about the errors and possible solutions, please 
read the following articles:
   [ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
   [ERROR]
   [ERROR] After correcting the problems, you can resume the build with the 
command
   [ERROR]   mvn <args> -rf :drill-java-exec
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (DRILL-7153) Drill Fails to Build using JDK 1.8.0_65

2019-04-02 Thread Charles Givre (JIRA)
Charles Givre created DRILL-7153:


 Summary: Drill Fails to Build using JDK 1.8.0_65
 Key: DRILL-7153
 URL: https://issues.apache.org/jira/browse/DRILL-7153
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.16.0
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 1.16.0


Drill fails to build when using Java 1.8.0_65. It throws the following error:

{noformat}
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.8.0:compile (default-compile) on project drill-java-exec: Compilation failure
[ERROR] /Users/cgivre/github/drill-dev/drill/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/FilterEvaluatorUtils.java:[59,68] error: unreported exception E; must be caught or declared to be thrown
[ERROR]   where E,T,V are type-variables:
[ERROR] E extends Exception declared in method <T,V,E>accept(ExprVisitor<T,V,E>,V)
[ERROR] T extends Object declared in method <T,V,E>accept(ExprVisitor<T,V,E>,V)
[ERROR] V extends Object declared in method <T,V,E>accept(ExprVisitor<T,V,E>,V)
[ERROR]
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <args> -rf :drill-java-exec
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [apache|drill] What is the Memory per Large Query?

2019-04-02 Thread Paul Rogers
Hi,

The queue documentation can be a bit hard to find, but it is available at [1]. 
However, it appears that either a) this information is out of date, or b) the 
feature has changed. About 18 months ago we added additional options to make it 
easier to tune the queues, but that information is not in the documentation 
(that I could find.)

The basic rules are:

1. Choose the total memory per Drillbit in drill-env.sh [2].
2. Choose a ratio of large to small query size using exec.queue.memory_ratio.
3. Choose a number of concurrent small queries with exec.queue.small and a 
number of concurrent large queries with exec.queue.large.
4. Enable queues with exec.queue.enable=true.

The system will work out the memory available to each small and large query. 
Suppose we have:

* Memory = 9GB
* exec.queue.memory_ratio=5
* exec.queue.small=4
* exec.queue.large=1

Total "memory units" is exec.queue.large * exec.queue.memory_ratio + 
exec.queue.small = 1 * 5 + 4 = 9
We have 9 GB total, so each small query gets 1 GB and each big query gets 5 GB.
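
In code, the arithmetic works out like this (variable names are illustrative
only, not Drill APIs):

public class QueueMemoryMath {
  public static void main(String[] args) {
    long totalMemoryGb = 9; // total memory per Drillbit
    int memoryRatio = 5;    // exec.queue.memory_ratio
    int smallSlots = 4;     // exec.queue.small
    int largeSlots = 1;     // exec.queue.large

    int memoryUnits = largeSlots * memoryRatio + smallSlots; // 1 * 5 + 4 = 9
    long smallQueryGb = totalMemoryGb / memoryUnits;         // 9 / 9 = 1 GB
    long largeQueryGb = smallQueryGb * memoryRatio;          // 1 * 5 = 5 GB
    System.out.println(smallQueryGb + " GB small, " + largeQueryGb + " GB large");
  }
}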

If you adjust total memory, the memory-per-query is automatically adjusted. If 
you change the ratio, or the number of queries per queue, the memory is also 
adjusted.

Then, empirically figure out, for your workload, how much memory an average 
"small" and "large" query need. You can run the math backward to figure out how 
many queries you can have in each queue for a given total memory, or how much 
total memory you need to run a certain number of queries.

I believe the team is working on a new system. Still, until that is available, 
it would be great to document the above in the current documentation (or point us 
to where the info is hiding...)

Thanks,
- Paul

[1] http://drill.apache.org/docs/enabling-query-queuing/


[2] http://drill.apache.org/docs/configuring-drill-memory/
 

On Tuesday, April 2, 2019, 2:45:04 PM PDT, groobym...@qq.com 
 wrote:  
 
 Hi, I am new to Drill. What is the "Memory per Large Query" and how do I 
configure the large queue size? Thanks
  

[jira] [Resolved] (DRILL-7152) Histogram creation throws exception for all nulls column

2019-04-02 Thread Aman Sinha (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha resolved DRILL-7152.
---
Resolution: Fixed

Fixed in 54384a9. 

> Histogram creation throws exception for all nulls column
> 
>
> Key: DRILL-7152
> URL: https://issues.apache.org/jira/browse/DRILL-7152
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.16.0
>
>
> ANALYZE command fails when creating the histogram for a table with 1 column 
> with all NULLs. 
> Analyze table `table_stats/parquet_col_nulls` compute statistics;
> {noformat}
> Error: SYSTEM ERROR: NullPointerException
>   (org.apache.drill.common.exceptions.DrillRuntimeException) Failed to get 
> TDigest output
> 
> org.apache.drill.exec.test.generated.StreamingAggregatorGen32.outputRecordValues():1085
> 
> org.apache.drill.exec.test.generated.StreamingAggregatorGen32.outputToBatchPrev():492
> org.apache.drill.exec.test.generated.StreamingAggregatorGen32.doWork():224
> 
> org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.innerNext():288
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.record.AbstractRecordBatch.next():126
> org.apache.drill.exec.record.AbstractRecordBatch.next():116
> 
> org.apache.drill.exec.physical.impl.statistics.StatisticsMergeBatch.innerNext():358
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.record.AbstractRecordBatch.next():126
> org.apache.drill.exec.record.AbstractRecordBatch.next():116
> 
> org.apache.drill.exec.physical.impl.unpivot.UnpivotMapsRecordBatch.innerNext():106
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.record.AbstractRecordBatch.next():126
> org.apache.drill.exec.record.AbstractRecordBatch.next():116
> 
> org.apache.drill.exec.physical.impl.StatisticsWriterRecordBatch.innerNext():96
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.record.AbstractRecordBatch.next():126
> org.apache.drill.exec.record.AbstractRecordBatch.next():116
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1669
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> java.lang.Thread.run():748
> {noformat}
> This table has 1 column with all NULL values:
> {noformat}
> apache drill (dfs.drilltestdir)> select * from 
> `table_stats/parquet_col_nulls` limit 20;
> +------+------+
> | col1 | col2 |
> +------+------+
> | 0    | null |
> | 1    | null |
> | 2    | null |
> | 3    | null |
> | 4    | null |
> | 5    | null |
> | 6    | null |
> | 7    | null |
> | 8    | null |
> | 9    | null |
> | 10   | null |
> +------+------+
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] [drill] amansinha100 merged pull request #1730: DRILL-7152: During histogram creation handle the case when all values…

2019-04-02 Thread GitBox
amansinha100 merged pull request #1730: DRILL-7152: During histogram creation 
handle the case when all values…
URL: https://github.com/apache/drill/pull/1730
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] paul-rogers commented on a change in pull request #1726: DRILL-7143: Support default value for empty columns

2019-04-02 Thread GitBox
paul-rogers commented on a change in pull request #1726: DRILL-7143: Support 
default value for empty columns
URL: https://github.com/apache/drill/pull/1726#discussion_r271557119
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/impl/VectorPrinter.java
 ##
 @@ -33,7 +32,10 @@
   public static void printOffsets(UInt4Vector vector, int start, int length) {
 header(vector, start, length);
 for (int i = start, j = 0; j < length; i++, j++) {
-  if (j > 0) {
+  if (j % 40 == 0) {
 
 Review comment:
   Before this change, I had a vector of 1000 items all on one line. After this 
change, the output is 40 elements per line. Note that this code is used only 
during debugging.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] paul-rogers commented on a change in pull request #1726: DRILL-7143: Support default value for empty columns

2019-04-02 Thread GitBox
paul-rogers commented on a change in pull request #1726: DRILL-7143: Support 
default value for empty columns
URL: https://github.com/apache/drill/pull/1726#discussion_r271557253
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/writer/AbstractFixedWidthWriter.java
 ##
 @@ -93,17 +112,62 @@ protected final int prepareWrite(int writeIndex) {
 @Override
 protected final void fillEmpties(final int writeIndex) {
   final int width = width();
-  final int stride = ZERO_BUF.length / width;
+  final int stride = emptyValue.length / width;
   int dest = lastWriteIndex + 1;
   while (dest < writeIndex) {
 int length = writeIndex - dest;
 length = Math.min(length, stride);
-drillBuf.setBytes(dest * width, ZERO_BUF, 0, length * width);
+drillBuf.setBytes(dest * width, emptyValue, 0, length * width);
 dest += length;
   }
 }
   }
 
+  /**
+   * Base class for writers that use the Java int type as their native
+   * type. Handles common implicit conversions from other types to int.
+   */
+  public static abstract class BaseIntWriter extends BaseFixedWidthWriter {
+
+@Override
+public final void setLong(final long value) {
+  try {
+// Catches int overflow. Does not catch overflow for smaller types.
+setInt(Math.toIntExact(value));
+  } catch (final ArithmeticException e) {
+throw InvalidConversionError.writeError(schema(), value, e);
+  }
+}
+
+@Override
+public final void setDouble(final double value) {
 
 Review comment:
   Yes, just as setInt() covers TinyInt, SmallInt, Int, UInt1, and UInt2.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] amansinha100 commented on a change in pull request #1729: DRILL-7150: Fix timezone conversion for timestamp from maprdb after the transition from PDT to PST

2019-04-02 Thread GitBox
amansinha100 commented on a change in pull request #1729: DRILL-7150: Fix 
timezone conversion for timestamp from maprdb after the transition from PDT to 
PST
URL: https://github.com/apache/drill/pull/1729#discussion_r271548171
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/MaprDBJsonRecordReader.java
 ##
 @@ -357,7 +357,8 @@ protected void writeTimeStamp(MapOrListWriterImpl writer, 
String fieldName, Docu
* @param reader document reader
*/
   private void writeTimestampWithLocalZoneOffset(MapOrListWriterImpl writer, 
String fieldName, DocumentReader reader) {
-long timestamp = reader.getTimestampLong() + 
DateUtility.TIMEZONE_OFFSET_MILLIS;
+long timestamp = 
Instant.ofEpochMilli(reader.getTimestampLong()).atZone(ZoneId.systemDefault())
 
 Review comment:
   Same as above. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] amansinha100 commented on a change in pull request #1729: DRILL-7150: Fix timezone conversion for timestamp from maprdb after the transition from PDT to PST

2019-04-02 Thread GitBox
amansinha100 commented on a change in pull request #1729: DRILL-7150: Fix 
timezone conversion for timestamp from maprdb after the transition from PDT to 
PST
URL: https://github.com/apache/drill/pull/1729#discussion_r271548124
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/CompareFunctionsProcessor.java
 ##
 @@ -93,7 +95,9 @@ public static CompareFunctionsProcessor 
processWithTimeZoneOffset(FunctionCall c
   protected boolean visitTimestampExpr(SchemaPath path, 
TimeStampExpression valueArg) {
 // converts timestamp value from local time zone to UTC since the 
record reader
 // reads the timestamp in local timezone if the 
readTimestampWithZoneOffset flag is enabled
-long timeStamp = valueArg.getTimeStamp() - 
DateUtility.TIMEZONE_OFFSET_MILLIS;
+long timeStamp = 
Instant.ofEpochMilli(valueArg.getTimeStamp()).atZone(ZoneId.of("UTC"))
 
 Review comment:
   This is a long chain of functions... could you split it into a couple of 
statements? That helps both readability and debugging.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] gparai commented on issue #1730: DRILL-7152: During histogram creation handle the case when all values…

2019-04-02 Thread GitBox
gparai commented on issue #1730: DRILL-7152: During histogram creation handle 
the case when all values…
URL: https://github.com/apache/drill/pull/1730#issuecomment-479236474
 
 
   @amansinha100 please take a look at the Travis failure. Otherwise, changes 
LGTM.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] amansinha100 commented on issue #1730: DRILL-7152: During histogram creation handle the case when all values…

2019-04-02 Thread GitBox
amansinha100 commented on issue #1730: DRILL-7152: During histogram creation 
handle the case when all values…
URL: https://github.com/apache/drill/pull/1730#issuecomment-479218730
 
 
   @gparai could you please review? Thanks. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] amansinha100 opened a new pull request #1730: DRILL-7152: During histogram creation handle the case when all values…

2019-04-02 Thread GitBox
amansinha100 opened a new pull request #1730: DRILL-7152: During histogram 
creation handle the case when all values…
URL: https://github.com/apache/drill/pull/1730
 
 
   … of a column are NULLs.
   
   Please see [DRILL-7152](https://issues.apache.org/jira/browse/DRILL-7152) 
for a description of the issue. The failure occurred because all of the column's 
values are NULL, and the t-digest code-gen functions tried to generate output for 
an empty t-digest, since the digest does not store NULL values. The fix is to 
check the t-digest size() before trying to create the output. 
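
   A rough sketch of that guard, assuming the t-digest library's TDigest API
(this is an illustration, not the actual generated code):

   ```java
   import com.tdunning.math.stats.TDigest;
   import java.nio.ByteBuffer;

   // An empty digest means every input value was NULL, so emit null output
   // instead of failing while serializing the digest.
   class HistogramOutput {
     static byte[] buildOutput(TDigest digest) {
       if (digest.size() == 0) {
         return null; // nothing was ever added to the digest
       }
       ByteBuffer buf = ByteBuffer.allocate(digest.smallByteSize());
       digest.asSmallBytes(buf);
       return buf.array();
     }
   }
   ```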


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[apache|drill] What is the Memory per Large Query?

2019-04-02 Thread groobyming
Hi, I am new to Drill. What is the "Memory per Large Query" and how do I 
configure the large queue size? Thanks



[GitHub] [drill] dvjyothsna commented on a change in pull request #1723: DRILL-7063: Seperate metadata cache file into summary, file metadata

2019-04-02 Thread GitBox
dvjyothsna commented on a change in pull request #1723: DRILL-7063: Seperate 
metadata cache file into summary, file metadata
URL: https://github.com/apache/drill/pull/1723#discussion_r271507937
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/RefreshMetadataHandler.java
 ##
 @@ -161,7 +161,7 @@ public PhysicalPlan getPlan(SqlNode sqlNode) throws 
ForemanSetupException {
*/
   private SqlNodeList getColumnList(final SqlRefreshMetadata 
sqlrefreshMetadata) {
 SqlNodeList columnList = sqlrefreshMetadata.getFieldList();
-if (columnList == null || !SqlNodeList.isEmptyList(columnList)) {
 
 Review comment:
   Removed the extra check.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


Re: Drill not compiling after rebase!!

2019-04-02 Thread hanu mapr
Hello Vova,

Option 2 makes sense to me. I have tried type-casting the accept
method, and it worked.

Thanks,
-Hanu

On Tue, Apr 2, 2019 at 9:23 AM Arina Yelchiyeva 
wrote:

> I would go with the second approach, since it would have less impact on
> the project.
>
> Kind regards,
> Arina
>
> > On Apr 2, 2019, at 6:53 PM, Vova Vysotskyi  wrote:
> >
> > Hi,
> >
> > For now, I see two ways of solving this issue:
> >
> > 1. Find minimum JDK build version where this issue is fixed and specify
> it
> > in requireJavaVersion tag in maven-enforcer-plugin:
> > https://maven.apache.org/enforcer/enforcer-rules/requireJavaVersion.html
> > So build will fail with a clear error message instead of compilation
> error.
> >
> > 2. Explicitly specify RuntimeException type in generic for method, add
> the
> > corresponding comment and suppression to avoid warnings in IDE. (If
> someone
> > has checked that it works. See my previous email.)
> >
> > Kind regards,
> > Volodymyr Vysotskyi
> >
> >
> > On Tue, Apr 2, 2019 at 6:35 PM Charles Givre  wrote:
> >
> >> All,
> >> I tried this on another machine with a higher version of Java and it
> >> worked without the changes below.  So it would seem that this probably
> is a
> >> bug in JDK. How do we proceed?
> >>
> >>> On Apr 1, 2019, at 17:12, Sorabh Hamirwasia 
> >> wrote:
> >>>
> >>> I am not seeing any issue with latest maven and with below java
> version.
> >> As
> >>> Vova suggested this could be a JDK bug.
> >>>
> >>> *# mvn --version*
> >>> *Java HotSpot(TM) 64-Bit Server VM warning: ignoring option
> >>> MaxPermSize=256m; support was removed in 8.0*
> >>> *Apache Maven 3.6.0 (97c98ec64a1fdfee7767ce5ffb20918da4f719f3;
> >>> 2018-10-24T11:41:47-07:00)*
> >>> *Maven home: /opt/maven/apache-maven-3.6.0*
> >>> *Java version: 1.8.0_131, vendor: Oracle Corporation, runtime:
> >>> /opt/jdk1.8.0_131/jre*
> >>>
> >>> @Charles/Hanu,
> >>> Can you upgrade your JDK version and try once ?
> >>>
> >>> Thanks,
> >>> Sorabh
> >>>
> >>> On Mon, Apr 1, 2019 at 1:53 PM hanu mapr  wrote:
> >>>
>  Hello Vova,
> 
>  Here is the java version on my laptop.
> 
>  HMADURI-E597:drill hmaduri$ java -version
>  java version "1.8.0_91"
>  Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
>  Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)
>  HMADURI-E597:drill hmaduri$ javac -version
>  javac 1.8.0_91
> 
>  Thanks,
>  -Hanu
> 
>  On Mon, Apr 1, 2019 at 1:45 PM Charles Givre 
> wrote:
> 
> > Hi Volodymyr,
> > I’m on macOS Mojave, java version 1.8.0_65, maven version 3.6.0.
> >
> > In order to get Drill to build I had to make the following changes:
> >
> > org/apache/drill/exec/store/parquet/TestParquetFilterPushDown.java
> (add
> > try/catch)
> >
> > private void
> > testParquetRowGroupFilterEval(MetadataBase.ParquetTableMetadataBase
>  footer,
> > final int rowGroupIndex, final LogicalExpression filterExpr,
> RowsMatch
> > canDropExpected) {
> > try {
> >  RowsMatch canDrop = FilterEvaluatorUtils.evalFilter(filterExpr,
>  footer,
> > rowGroupIndex, fragContext.getOptions(), fragContext);
> >  Assert.assertEquals(canDropExpected, canDrop);
> > } catch (Exception e) {
> >  fail();
> > }
> > }
> >
> > and
> >
> > org/apache/drill/exec/store/parquet/FilterEvaluatorUtils.java
> >
> > public static RowsMatch evalFilter(LogicalExpression expr,
> > MetadataBase.ParquetTableMetadataBase footer,
> > int rowGroupIndex, OptionManager
> > options, FragmentContext fragmentContext) throws Exception {
> >
> > where I added throws Exception.
> >
> >
> >> On Apr 1, 2019, at 16:11, Vova Vysotskyi  wrote:
> >>
> >> Hi all,
> >>
> >> Looking into the code, I don't see a reason for compilation failure,
> > since
> >> the exception type should be inferred from *FieldReferenceFinder*,
>  which
> >> contains *RuntimeException*.
> >>
> >> Perhaps it may be JDK bug, something like this
> >> https://bugs.openjdk.java.net/browse/JDK-8066974.
> >> Charles, Hanu, could you please share your JDK versions? On my
> >> machine 1.8.0_191 everything works fine.
> >>
> >> Also, could you please check whether specifying types explicitly
> will
> > help:
> >> *expr.accept(new FieldReferenceFinder(), null)* *->*
> >> *expr.<Set<SchemaPath>, Void, RuntimeException>accept(new FieldReferenceFinder(), null)*
> >>
> >> Kind regards,
> >> Volodymyr Vysotskyi
> >>
> >>
> >> On Mon, Apr 1, 2019 at 10:40 PM Charles Givre 
>  wrote:
> >>
> >>> Hi Hanu,
> >>> I posted code that fixed this to the list.  Once I did that, it
> >> worked
> >>> fine.
> >>> —C
> >>>
>  On Apr 1, 2019, at 15:39, hanu mapr  wrote:
> 
>  Hello All,
> 
>  The exact

[jira] [Created] (DRILL-7152) Histogram creation throws exception for all nulls column

2019-04-02 Thread Aman Sinha (JIRA)
Aman Sinha created DRILL-7152:
-

 Summary: Histogram creation throws exception for all nulls column
 Key: DRILL-7152
 URL: https://issues.apache.org/jira/browse/DRILL-7152
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Reporter: Aman Sinha
Assignee: Aman Sinha
 Fix For: 1.16.0


ANALYZE command fails when creating the histogram for a table with 1 column 
with all NULLs. 

Analyze table `table_stats/parquet_col_nulls` compute statistics;

{noformat}
Error: SYSTEM ERROR: NullPointerException
  (org.apache.drill.common.exceptions.DrillRuntimeException) Failed to get 
TDigest output

org.apache.drill.exec.test.generated.StreamingAggregatorGen32.outputRecordValues():1085

org.apache.drill.exec.test.generated.StreamingAggregatorGen32.outputToBatchPrev():492
org.apache.drill.exec.test.generated.StreamingAggregatorGen32.doWork():224

org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.innerNext():288
org.apache.drill.exec.record.AbstractRecordBatch.next():186
org.apache.drill.exec.record.AbstractRecordBatch.next():126
org.apache.drill.exec.record.AbstractRecordBatch.next():116

org.apache.drill.exec.physical.impl.statistics.StatisticsMergeBatch.innerNext():358
org.apache.drill.exec.record.AbstractRecordBatch.next():186
org.apache.drill.exec.record.AbstractRecordBatch.next():126
org.apache.drill.exec.record.AbstractRecordBatch.next():116

org.apache.drill.exec.physical.impl.unpivot.UnpivotMapsRecordBatch.innerNext():106
org.apache.drill.exec.record.AbstractRecordBatch.next():186
org.apache.drill.exec.record.AbstractRecordBatch.next():126
org.apache.drill.exec.record.AbstractRecordBatch.next():116

org.apache.drill.exec.physical.impl.StatisticsWriterRecordBatch.innerNext():96
org.apache.drill.exec.record.AbstractRecordBatch.next():186
org.apache.drill.exec.record.AbstractRecordBatch.next():126
org.apache.drill.exec.record.AbstractRecordBatch.next():116
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63

org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141
org.apache.drill.exec.record.AbstractRecordBatch.next():186
org.apache.drill.exec.physical.impl.BaseRootExec.next():104
org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
org.apache.drill.exec.physical.impl.BaseRootExec.next():94
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1669
org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1149
java.util.concurrent.ThreadPoolExecutor$Worker.run():624
java.lang.Thread.run():748
{noformat}

This table has 1 column with all NULL values:

{noformat}
apache drill (dfs.drilltestdir)> select * from `table_stats/parquet_col_nulls` 
limit 20;
+------+------+
| col1 | col2 |
+------+------+
| 0    | null |
| 1    | null |
| 2    | null |
| 3    | null |
| 4    | null |
| 5    | null |
| 6    | null |
| 7    | null |
| 8    | null |
| 9    | null |
| 10   | null |
+------+------+
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] [drill] sohami commented on issue #1671: DRILL-7045 UDF string_binary java.lang.IndexOutOfBoundsException

2019-04-02 Thread GitBox
sohami commented on issue #1671: DRILL-7045 UDF string_binary 
java.lang.IndexOutOfBoundsException
URL: https://github.com/apache/drill/pull/1671#issuecomment-479186883
 
 
   @jcmcote - I have addressed @KazydubB's comment in this commit and rebased on 
the latest apache master. Can you please make the change or pull in this commit 
so that we can close this PR? 
https://github.com/sohami/drill/commit/7aaef8691a4a594442464301035ea3aefd7497dd


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


Re: [DISCUSS]: Additional Formats for Drill

2019-04-02 Thread Ted Dunning
I have no idea how much uptake these would have, but if the library can
give all the formats all at once for modest effort, that would be great.

On Tue, Apr 2, 2019 at 9:22 AM Charles Givre  wrote:

> Hello everyone,
> I recently presented a talk at the ASF DC Roadshow (shameless plug[1] )
> but heard a really good talk by a PMC member for the Apache Daffodil
> (incubating) project.  At its core, Daffodil is a collection of parsers
> which convert various data formats to a standard structure which can then
> be ingested into other tools.  Some of these formats Drill can already
> ingest natively, such as PCAP and CSV; however, many it cannot, such as NACHA (bulk
> financial transactions), vCard, Shapefile, and many more.  Here is a brief
> presentation about Daffodil [2].
>
> The DFDLSchemas github has a handful of DFDL schemas that are pretty good
> open source examples[3].
>
> On a related note, I stumbled on the Kaitai struct library[4] which is
> another library which performs a similar function to Daffodil.  Would it be
> of interest for the community to incorporate these libraries into Drill?
> My thought is that it would greatly increase the types of data that Drill
> can natively query and hence seriously increase Drill’s usefulness.  If
> there is interest, (and honestly even if there isn’t) I can start working
> on this for the next release of Drill.
>
>
> [1]:
> https://www.slideshare.net/cgivre/drilling-cyber-security-data-with-apache-drill
> <
> https://www.slideshare.net/cgivre/drilling-cyber-security-data-with-apache-drill
> >
> [2]:
> https://www.slideshare.net/mbeckerle/tresys-dfdl-data-format-description-language-daffodil-open-source-public-overview-100432615
> <
> https://www.slideshare.net/mbeckerle/tresys-dfdl-data-format-description-language-daffodil-open-source-public-overview-100432615
> >
> [3]: https://github.com/DFDLSchemas 
> [4]: http://formats.kaitai.io 
>
>


[GitHub] [drill] kkhatua edited a comment on issue #1714: DRILL-7048: Implement JDBC Statement.setMaxRows() with System Option

2019-04-02 Thread GitBox
kkhatua edited a comment on issue #1714: DRILL-7048: Implement JDBC 
Statement.setMaxRows() with System Option
URL: https://github.com/apache/drill/pull/1714#issuecomment-479152740
 
 
   @vvysotskyi , @ihuzenko 
   I've done the changes and verified the tests. If everything is fine, I'll 
rebase on the latest master (there are small conflicts due to new commits on 
master introducing additional system options)
   
   I've also included a trim for the values, so an input of `100 `  will be 
treated as valid.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] kkhatua commented on issue #1714: DRILL-7048: Implement JDBC Statement.setMaxRows() with System Option

2019-04-02 Thread GitBox
kkhatua commented on issue #1714: DRILL-7048: Implement JDBC 
Statement.setMaxRows() with System Option
URL: https://github.com/apache/drill/pull/1714#issuecomment-479152740
 
 
   @vvysotskyi , @ihuzenko 
   I've done the changes and verified the tests. If everything is fine, I'll 
rebase on the latest master (there are small conflicts due to new commits on 
master introducing additional system options)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] kkhatua commented on a change in pull request #1714: DRILL-7048: Implement JDBC Statement.setMaxRows() with System Option

2019-04-02 Thread GitBox
kkhatua commented on a change in pull request #1714: DRILL-7048: Implement JDBC 
Statement.setMaxRows() with System Option
URL: https://github.com/apache/drill/pull/1714#discussion_r271452501
 
 

 ##
 File path: 
exec/jdbc/src/test/java/org/apache/drill/jdbc/PreparedStatementTest.java
 ##
 @@ -462,4 +618,25 @@ public void 
testParamSettingWhenUnsupportedTypeSaysUnsupported() throws SQLExcep
 }
   }
 
+
+  // Sets the SystemMaxRows option
+  private void setSystemMaxRows(int sysValueToSet) throws SQLException {
 
 Review comment:
   As per our chat, I've introduced `@Before` and `@After` methods for 
synchronizing the `system` level modifications to the options.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] HanumathRao commented on a change in pull request #1725: DRILL-7146: Query failing with NPE when ZK queue is enabled.

2019-04-02 Thread GitBox
HanumathRao commented on a change in pull request #1725: DRILL-7146: Query 
failing with NPE when ZK queue is enabled.
URL: https://github.com/apache/drill/pull/1725#discussion_r271437936
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/planner/rm/TestMemoryCalculator.java
 ##
 @@ -59,6 +59,7 @@
 
   private static final long DEFAULT_SLICE_TARGET = 10L;
   private static final long DEFAULT_BATCH_SIZE = 16*1024*1024;
+  private static final String ENABLE_QUEUE = 
"drill.exec.queue.embedded.enable";
 
 Review comment:
   I have updated the test case.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Resolved] (DRILL-6377) typeof() does not return DECIMAL scale, precision

2019-04-02 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva resolved DRILL-6377.
-
   Resolution: Fixed
Fix Version/s: 1.16.0

> typeof() does not return DECIMAL scale, precision
> -
>
> Key: DRILL-6377
> URL: https://issues.apache.org/jira/browse/DRILL-6377
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Priority: Minor
> Fix For: 1.16.0
>
>
> The {{typeof()}} function returns the type of a column:
> {noformat}
> SELECT typeof(CAST(a AS DOUBLE)) FROM (VALUES (1)) AS T(a);
> +---------+
> | EXPR$0  |
> +---------+
> | FLOAT8  |
> +---------+
> {noformat}
> In Drill, the {{DECIMAL}} type is parameterized with scale and precision. 
> However, {{typeof()}} does not return this information:
> {noformat}
> ALTER SESSION SET `planner.enable_decimal_data_type` = true;
> SELECT typeof(CAST(a AS DECIMAL)) FROM (VALUES (1)) AS T(a);
> +------------------+
> |      EXPR$0      |
> +------------------+
> | DECIMAL38SPARSE  |
> +------------------+
> SELECT typeof(CAST(a AS DECIMAL(6, 3))) FROM (VALUES (1)) AS T(a);
> +-----------+
> |  EXPR$0   |
> +-----------+
> | DECIMAL9  |
> +-----------+
> {noformat}
> Expected something of the form {{DECIMAL(, )}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] [drill] ihuzenko commented on issue #1706: DRILL-7115: Improve Hive schema show tables performance

2019-04-02 Thread GitBox
ihuzenko commented on issue #1706: DRILL-7115: Improve Hive schema show tables 
performance
URL: https://github.com/apache/drill/pull/1706#issuecomment-479094218
 
 
   @vdiravka , I've addressed comments. 
   
   I totally agree with you that refactoring is better put into separate 
commits, and I'll use this approach in the future. 
   
   For the show tables authorization improvement, the 
[DRILL-7151](https://issues.apache.org/jira/browse/DRILL-7151) ticket was created. 
   
   For the caches, the type of ```tableNamesCache``` was changed to 
```LoadingCache<…>``` (previously only 
table names were cached here); also, all work with the Guava caches was unified 
under the ```HiveMetadataCache``` facade. 
   
   For the Drill Hive SASL (Kerberos) connection I didn't introduce changes; the 
related code in ```DrillHiveMetaStoreClientFactory``` was previously in 
```DrillHiveMetaStoreClient```.
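
   For illustration, the general Guava LoadingCache pattern looks like the 
sketch below; the key and value types are placeholders, not the actual 
```HiveMetadataCache``` internals.

   ```java
   import com.google.common.cache.CacheBuilder;
   import com.google.common.cache.CacheLoader;
   import com.google.common.cache.LoadingCache;
   import com.google.common.collect.ImmutableList;
   import java.util.List;
   import java.util.concurrent.TimeUnit;

   // Placeholder loader: a real implementation would call the Hive metastore.
   class TableNamesCacheSketch {
     private final LoadingCache<String, List<String>> tableNamesCache =
         CacheBuilder.newBuilder()
             .expireAfterAccess(1, TimeUnit.MINUTES) // evict stale metadata
             .build(new CacheLoader<String, List<String>>() {
               @Override
               public List<String> load(String dbName) {
                 return ImmutableList.of("table_a", "table_b"); // stand-in data
               }
             });

     List<String> tableNames(String dbName) {
       return tableNamesCache.getUnchecked(dbName); // loads on first access
     }
   }
   ```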


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[DISCUSS]: Hadoop 3

2019-04-02 Thread Vitalii Diravka
Hi devs!

I am working on updating the Hadoop libraries to version 3.2.0 [1].
I found an issue in *hadoop-common* related to several loggers in the
project [2], [3].
So, to update the version of the Hadoop libraries in Drill, it is necessary to
remove *commons-logging* from the banned dependencies [4].
After doing so, I didn't find conflicts between the two logger libraries in Drill.

Is this solution acceptable?
It can be temporary until [3] is fixed.



[1] https://issues.apache.org/jira/browse/DRILL-6540
[2]
https://issues.apache.org/jira/browse/DRILL-6540?focusedCommentId=16606306&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16606306
[3] https://issues.apache.org/jira/browse/HADOOP-15749
[4] https://github.com/apache/drill/blob/master/pom.xml#L522


Kind regards
Vitalii


[GitHub] [drill] vvysotskyi commented on issue #1729: DRILL-7150: Fix timezone conversion for timestamp from maprdb after the transition from PDT to PST

2019-04-02 Thread GitBox
vvysotskyi commented on issue #1729: DRILL-7150: Fix timezone conversion for 
timestamp from maprdb after the transition from PDT to PST
URL: https://github.com/apache/drill/pull/1729#issuecomment-479083498
 
 
   @amansinha100, since you have reviewed the original PR, could you please 
review this one?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi opened a new pull request #1729: DRILL-7150: Fix timezone conversion for timestamp from maprdb after the transition from PDT to PST

2019-04-02 Thread GitBox
vvysotskyi opened a new pull request #1729: DRILL-7150: Fix timezone conversion 
for timestamp from maprdb after the transition from PDT to PST
URL: https://github.com/apache/drill/pull/1729
 
 
   Used JDK classes to convert the timestamp from one timezone to another 
instead of adding a fixed number of milliseconds corresponding to the offset.
   
   For the problem description please see 
[DRILL-7150](https://issues.apache.org/jira/browse/DRILL-7150).
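
   A self-contained sketch of the java.time technique (hedged: the exact call 
chain used in the PR is abbreviated in the diff):

   ```java
   import java.time.Instant;
   import java.time.ZoneId;
   import java.time.ZonedDateTime;

   // Reattach the same wall-clock time to a different zone, so the offset
   // applied is the one in effect at that date (PST vs. PDT), rather than
   // a constant captured once at startup.
   class TimestampShift {
     static long shiftUtcToZone(long epochMillis, ZoneId target) {
       ZonedDateTime utc = Instant.ofEpochMilli(epochMillis).atZone(ZoneId.of("UTC"));
       return utc.withZoneSameLocal(target).toInstant().toEpochMilli();
     }

     public static void main(String[] args) {
       ZoneId la = ZoneId.of("America/Los_Angeles");
       // 2019-03-09 and 2019-03-11 straddle the US DST transition,
       // so the applied offsets differ by one hour (8h vs. 7h).
       System.out.println(shiftUtcToZone(1552089600000L, la) - 1552089600000L); // 28800000
       System.out.println(shiftUtcToZone(1552262400000L, la) - 1552262400000L); // 25200000
     }
   }
   ```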


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


Re: Drill not compiling after rebase!!

2019-04-02 Thread Arina Yelchiyeva
I would go with the second approach, since it would have less impact on the 
project.

Kind regards,
Arina

> On Apr 2, 2019, at 6:53 PM, Vova Vysotskyi  wrote:
> 
> Hi,
> 
> For now, I see two ways of solving this issue:
> 
> 1. Find minimum JDK build version where this issue is fixed and specify it
> in requireJavaVersion tag in maven-enforcer-plugin:
> https://maven.apache.org/enforcer/enforcer-rules/requireJavaVersion.html
> So build will fail with a clear error message instead of compilation error.
> 
> 2. Explicitly specify RuntimeException type in generic for method, add the
> corresponding comment and suppression to avoid warnings in IDE. (If someone
> has checked that it works. See my previous email.)
> 
> Kind regards,
> Volodymyr Vysotskyi
> 
> 
> On Tue, Apr 2, 2019 at 6:35 PM Charles Givre  wrote:
> 
>> All,
>> I tried this on another machine with a higher version of Java and it
>> worked without the changes below.  So it would seem that this probably is a
>> bug in JDK. How do we proceed?
>> 
>>> On Apr 1, 2019, at 17:12, Sorabh Hamirwasia 
>> wrote:
>>> 
>>> I am not seeing any issue with latest maven and with below java version.
>> As
>>> Vova suggested this could be a JDK bug.
>>> 
>>> *# mvn --version*
>>> *Java HotSpot(TM) 64-Bit Server VM warning: ignoring option
>>> MaxPermSize=256m; support was removed in 8.0*
>>> *Apache Maven 3.6.0 (97c98ec64a1fdfee7767ce5ffb20918da4f719f3;
>>> 2018-10-24T11:41:47-07:00)*
>>> *Maven home: /opt/maven/apache-maven-3.6.0*
>>> *Java version: 1.8.0_131, vendor: Oracle Corporation, runtime:
>>> /opt/jdk1.8.0_131/jre*
>>> 
>>> @Charles/Hanu,
>>> Can you upgrade your JDK version and try once ?
>>> 
>>> Thanks,
>>> Sorabh
>>> 
>>> On Mon, Apr 1, 2019 at 1:53 PM hanu mapr  wrote:
>>> 
 Hello Vova,
 
 Here is the java version on my laptop.
 
 HMADURI-E597:drill hmaduri$ java -version
 java version "1.8.0_91"
 Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
 Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)
 HMADURI-E597:drill hmaduri$ javac -version
 javac 1.8.0_91
 
 Thanks,
 -Hanu
 
 On Mon, Apr 1, 2019 at 1:45 PM Charles Givre  wrote:
 
> Hi Volodymyr,
> I’m on macOS Mojave, java version 1.8.0_65, maven version 3.6.0.
> 
> In order to get Drill to build I had to make the following changes:
> 
> org/apache/drill/exec/store/parquet/TestParquetFilterPushDown.java (add
> try/catch)
> 
> private void
> testParquetRowGroupFilterEval(MetadataBase.ParquetTableMetadataBase
 footer,
> final int rowGroupIndex, final LogicalExpression filterExpr, RowsMatch
> canDropExpected) {
> try {
>  RowsMatch canDrop = FilterEvaluatorUtils.evalFilter(filterExpr,
 footer,
> rowGroupIndex, fragContext.getOptions(), fragContext);
>  Assert.assertEquals(canDropExpected, canDrop);
> } catch (Exception e) {
>  fail();
> }
> }
> 
> and
> 
> org/apache/drill/exec/store/parquet/FilterEvaluatorUtils.java
> 
> public static RowsMatch evalFilter(LogicalExpression expr,
> MetadataBase.ParquetTableMetadataBase footer,
> int rowGroupIndex, OptionManager
> options, FragmentContext fragmentContext) throws Exception {
> 
> where I added throws Exception.
> 
> 
>> On Apr 1, 2019, at 16:11, Vova Vysotskyi  wrote:
>> 
>> Hi all,
>> 
>> Looking into the code, I don't see a reason for compilation failure,
> since
>> the exception type should be inferred from *FieldReferenceFinder*,
 which
>> contains *RuntimeException*.
>> 
>> Perhaps it may be JDK bug, something like this
>> https://bugs.openjdk.java.net/browse/JDK-8066974.
>> Charles, Hanu, could you please share your JDK versions? On my
>> machine 1.8.0_191 everything works fine.
>> 
>> Also, could you please check whether specifying types explicitly will
> help:
>> *expr.accept(new FieldReferenceFinder(), null)* *->*
>> *expr.<Set<SchemaPath>, Void, RuntimeException>accept(new FieldReferenceFinder(), null)*
>> 
>> Kind regards,
>> Volodymyr Vysotskyi
>> 
>> 
>> On Mon, Apr 1, 2019 at 10:40 PM Charles Givre 
 wrote:
>> 
>>> Hi Hanu,
>>> I posted code that fixed this to the list.  Once I did that, it
>> worked
>>> fine.
>>> —C
>>> 
 On Apr 1, 2019, at 15:39, hanu mapr  wrote:
 
 Hello All,
 
 The exact function which is causing this error is the following.
 
 public static RowsMatch evalFilter(LogicalExpression expr,
 MetadataBase.ParquetTableMetadataBase footer,
int rowGroupIndex, OptionManager
 options, FragmentContext fragmentContext) throws Exception {
 
 and also for the caller functions in TestParquetFilterPushDown all
> along.
 
>>

[DISCUSS]: Additional Formats for Drill

2019-04-02 Thread Charles Givre
Hello everyone, 
I recently presented a talk at the ASF DC Roadshow (shameless plug[1] ) but 
heard a really good talk by a PMC member for the Apache Daffodil (incubating) 
project.  At its core, Daffodil is a collection of parsers which convert 
various data formats to a standard structure which can then be ingested into 
other tools. Some of these formats Drill can already ingest natively, such as 
PCAP and CSV; however, many it cannot, such as NACHA (bulk financial transactions), 
vCard, Shapefile, and many more. Here is a brief presentation about Daffodil 
[2].  

The DFDLSchemas github has a handful of DFDL schemas that are pretty good open 
source examples[3].  

On a related note, I stumbled on the Kaitai Struct library[4], another 
library that performs a similar function to Daffodil.  Would it be of interest 
for the community to incorporate these libraries into Drill?  My thought is 
that it would greatly increase the types of data that Drill can natively query 
and hence seriously increase Drill’s usefulness.  If there is interest, (and 
honestly even if there isn’t) I can start working on this for the next release 
of Drill.


[1]: 
https://www.slideshare.net/cgivre/drilling-cyber-security-data-with-apache-drill
 

[2]: 
https://www.slideshare.net/mbeckerle/tresys-dfdl-data-format-description-language-daffodil-open-source-public-overview-100432615
 

[3]: https://github.com/DFDLSchemas 
[4]: http://formats.kaitai.io 



Re: Drill not compiling after rebase!!

2019-04-02 Thread Vova Vysotskyi
Hi,

For now, I see two ways of solving this issue:

1. Find the minimum JDK build version where this issue is fixed and specify it
in the requireJavaVersion tag in maven-enforcer-plugin:
https://maven.apache.org/enforcer/enforcer-rules/requireJavaVersion.html
That way the build will fail with a clear error message instead of a compilation error.

2. Explicitly specify the RuntimeException type in the generic for the method, and add
the corresponding comment and suppression to avoid warnings in the IDE. (If someone
has checked that it works. See my previous email.)
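
To make option 2 concrete, here is a minimal standalone sketch. These are
illustrative classes, not Drill's actual ExprVisitor; they only show how
explicit type arguments pin the exception type parameter that javac
otherwise has to infer:

class TypeWitnessDemo {
  interface Visitor<T, V, E extends Exception> {
    T visit(V value) throws E;
  }

  static <T, V, E extends Exception> T accept(Visitor<T, V, E> visitor, V value) throws E {
    return visitor.visit(value);
  }

  public static void main(String[] args) {
    Visitor<Integer, String, RuntimeException> finder = String::length;
    // The explicit type arguments pin E to RuntimeException,
    // so the call site needs no try/catch or throws clause:
    int length = TypeWitnessDemo.<Integer, String, RuntimeException>accept(finder, "abc");
    System.out.println(length);
  }
}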

Kind regards,
Volodymyr Vysotskyi


On Tue, Apr 2, 2019 at 6:35 PM Charles Givre  wrote:

> All,
> I tried this on another machine with a higher version of Java and it
> worked without the changes below.  So it would seem that this probably is a
> bug in JDK. How do we proceed?
>
> > On Apr 1, 2019, at 17:12, Sorabh Hamirwasia 
> wrote:
> >
> > I am not seeing any issue with latest maven and with below java version.
> As
> > Vova suggested this could be a JDK bug.
> >
> > *# mvn --version*
> > *Java HotSpot(TM) 64-Bit Server VM warning: ignoring option
> > MaxPermSize=256m; support was removed in 8.0*
> > *Apache Maven 3.6.0 (97c98ec64a1fdfee7767ce5ffb20918da4f719f3;
> > 2018-10-24T11:41:47-07:00)*
> > *Maven home: /opt/maven/apache-maven-3.6.0*
> > *Java version: 1.8.0_131, vendor: Oracle Corporation, runtime:
> > /opt/jdk1.8.0_131/jre*
> >
> > @Charles/Hanu,
> > Can you upgrade your JDK version and try once ?
> >
> > Thanks,
> > Sorabh
> >
> > On Mon, Apr 1, 2019 at 1:53 PM hanu mapr  wrote:
> >
> >> Hello Vova,
> >>
> >> Here is the java version on my laptop.
> >>
> >> HMADURI-E597:drill hmaduri$ java -version
> >> java version "1.8.0_91"
> >> Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
> >> Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)
> >> HMADURI-E597:drill hmaduri$ javac -version
> >> javac 1.8.0_91
> >>
> >> Thanks,
> >> -Hanu
> >>
> >> On Mon, Apr 1, 2019 at 1:45 PM Charles Givre  wrote:
> >>
> >>> Hi Volodymyr,
> >>> I’m on macOS Mojave, java version 1.8.0_65, maven version 3.6.0.
> >>>
> >>> In order to get Drill to build I had to make the following changes:
> >>>
> >>> org/apache/drill/exec/store/parquet/TestParquetFilterPushDown.java (add
> >>> try/catch)
> >>>
> >>> private void
> >>> testParquetRowGroupFilterEval(MetadataBase.ParquetTableMetadataBase
> >> footer,
> >>> final int rowGroupIndex, final LogicalExpression filterExpr, RowsMatch
> >>> canDropExpected) {
> >>> try {
> >>>   RowsMatch canDrop = FilterEvaluatorUtils.evalFilter(filterExpr,
> >> footer,
> >>> rowGroupIndex, fragContext.getOptions(), fragContext);
> >>>   Assert.assertEquals(canDropExpected, canDrop);
> >>> } catch (Exception e) {
> >>>   fail();
> >>> }
> >>> }
> >>>
> >>> and
> >>>
> >>> org/apache/drill/exec/store/parquet/FilterEvaluatorUtils.java
> >>>
> >>> public static RowsMatch evalFilter(LogicalExpression expr,
> >>> MetadataBase.ParquetTableMetadataBase footer,
> >>>  int rowGroupIndex, OptionManager
> >>> options, FragmentContext fragmentContext) throws Exception {
> >>>
> >>> where I added throws Exception.
> >>>
> >>>
>  On Apr 1, 2019, at 16:11, Vova Vysotskyi  wrote:
> 
>  Hi all,
> 
>  Looking into the code, I don't see a reason for compilation failure,
> >>> since
>  the exception type should be inferred from *FieldReferenceFinder*,
> >> which
>  contains *RuntimeException*.
> 
>  Perhaps it may be JDK bug, something like this
>  https://bugs.openjdk.java.net/browse/JDK-8066974.
>  Charles, Hanu, could you please share your JDK versions; on my
>  machine (1.8.0_191) everything works fine.
> 
>  Also, could you please check whether specifying types explicitly will
> >>> help:
>  *expr.accept(new FieldReferenceFinder(), null)* *->*
> >>> *expr.<Set<SchemaPath>,
>  Void, RuntimeException>accept(new FieldReferenceFinder(), null)*
> 
>  Kind regards,
>  Volodymyr Vysotskyi
> 
> 
>  On Mon, Apr 1, 2019 at 10:40 PM Charles Givre 
> >> wrote:
> 
> > Hi Hanu,
> > I posted code that fixed this to the list.  Once I did that, it
> worked
> > fine.
> > —C
> >
> >> On Apr 1, 2019, at 15:39, hanu mapr  wrote:
> >>
> >> Hello All,
> >>
> >> The exact function which is causing this error is the following.
> >>
> >> public static RowsMatch evalFilter(LogicalExpression expr,
> >> MetadataBase.ParquetTableMetadataBase footer,
> >> int rowGroupIndex, OptionManager
> >> options, FragmentContext fragmentContext) throws Exception {
> >>
> >> and also for the caller functions in TestParquetFilterPushDown all
> >>> along.
> >>
> >> I think evalFilter needs to catch the Exception or throw an
> >> Exception.
> >> I just tried this, didn't put much thought into it. So I think this
> >> Exception needs to be handled properly.
> >>
> >>
> >> Thanks,

Re: Drill not compiling after rebase!!

2019-04-02 Thread Charles Givre
All, 
I tried this on another machine with a higher version of Java and it worked 
without the changes below.  So it would seem that this is probably a bug in 
the JDK. How do we proceed?

> On Apr 1, 2019, at 17:12, Sorabh Hamirwasia  wrote:
> 
> I am not seeing any issue with the latest Maven and the Java version below. As
> Vova suggested, this could be a JDK bug.
> 
> *# mvn --version*
> *Java HotSpot(TM) 64-Bit Server VM warning: ignoring option
> MaxPermSize=256m; support was removed in 8.0*
> *Apache Maven 3.6.0 (97c98ec64a1fdfee7767ce5ffb20918da4f719f3;
> 2018-10-24T11:41:47-07:00)*
> *Maven home: /opt/maven/apache-maven-3.6.0*
> *Java version: 1.8.0_131, vendor: Oracle Corporation, runtime:
> /opt/jdk1.8.0_131/jre*
> 
> @Charles/Hanu,
> Can you upgrade your JDK version and try once?
> 
> Thanks,
> Sorabh
> 
> On Mon, Apr 1, 2019 at 1:53 PM hanu mapr  wrote:
> 
>> Hello Vova,
>> 
>> Here is the java version on my laptop.
>> 
>> HMADURI-E597:drill hmaduri$ java -version
>> java version "1.8.0_91"
>> Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
>> Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)
>> HMADURI-E597:drill hmaduri$ javac -version
>> javac 1.8.0_91
>> 
>> Thanks,
>> -Hanu
>> 
>> On Mon, Apr 1, 2019 at 1:45 PM Charles Givre  wrote:
>> 
>>> Hi Volodmyr,
>>> I’m on macOS Mojave, java version 1.8.0_65, Maven version 3.6.0.
>>> 
>>> In order to get Drill to build I had to make the following changes:
>>> 
>>> org/apache/drill/exec/store/parquet/TestParquetFilterPushDown.java (add
>>> try/catch)
>>> 
>>> private void
>>> testParquetRowGroupFilterEval(MetadataBase.ParquetTableMetadataBase
>> footer,
>>> final int rowGroupIndex, final LogicalExpression filterExpr, RowsMatch
>>> canDropExpected) {
>>> try {
>>>   RowsMatch canDrop = FilterEvaluatorUtils.evalFilter(filterExpr,
>> footer,
>>> rowGroupIndex, fragContext.getOptions(), fragContext);
>>>   Assert.assertEquals(canDropExpected, canDrop);
>>> } catch (Exception e) {
>>>   fail();
>>> }
>>> }
>>> 
>>> and
>>> 
>>> org/apache/drill/exec/store/parquet/FilterEvaluatorUtils.java
>>> 
>>> public static RowsMatch evalFilter(LogicalExpression expr,
>>> MetadataBase.ParquetTableMetadataBase footer,
>>>  int rowGroupIndex, OptionManager
>>> options, FragmentContext fragmentContext) throws Exception {
>>> 
>>> where I added throws Exception.
>>> 
>>> 
 On Apr 1, 2019, at 16:11, Vova Vysotskyi  wrote:
 
 Hi all,
 
 Looking into the code, I don't see a reason for compilation failure,
>>> since
 the exception type should be inferred from *FieldReferenceFinder*,
>> which
 contains *RuntimeException*.
 
 Perhaps it may be JDK bug, something like this
 https://bugs.openjdk.java.net/browse/JDK-8066974.
 Charles, Hanu, could you please share your JDK versions; on my
 machine (1.8.0_191) everything works fine.
 
 Also, could you please check whether specifying types explicitly will
>>> help:
 *expr.accept(new FieldReferenceFinder(), null)* *->*
>>> *expr.<Set<SchemaPath>,
 Void, RuntimeException>accept(new FieldReferenceFinder(), null)*
 
 Kind regards,
 Volodymyr Vysotskyi
 
 
 On Mon, Apr 1, 2019 at 10:40 PM Charles Givre 
>> wrote:
 
> Hi Hanu,
> I posted code that fixed this to the list.  Once I did that, it worked
> fine.
> —C
> 
>> On Apr 1, 2019, at 15:39, hanu mapr  wrote:
>> 
>> Hello All,
>> 
>> The exact function which is causing this error is the following.
>> 
>> public static RowsMatch evalFilter(LogicalExpression expr,
>> MetadataBase.ParquetTableMetadataBase footer,
>> int rowGroupIndex, OptionManager
>> options, FragmentContext fragmentContext) throws Exception {
>> 
>> and also for the caller functions in TestParquetFilterPushDown all
>>> along.
>> 
>> I think evalFilter needs to catch the Exception or throw an
>> Exception.
>> I just tried this, didn't put much thought into it. So I think this
>> Exception needs to be handled properly.
>> 
>> 
>> Thanks,
>> 
>> -Hanu
>> 
>> 
>> On Mon, Apr 1, 2019 at 12:20 PM hanu mapr 
>> wrote:
>> 
>>> Hello All,
>>> 
>>> I am also getting the same error which Charles got on compilation of
>>> the
>>> latest build.
>>> 
>>> 
>>> Here is the message which I got.
>>> 
>>> [ERROR]
>>> 
> 
>>> 
>> /Users/hmaduri/contribs/APACHE/drill/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/FilterEvaluatorUtils.java:[59,68]
>>> error: unreported exception E; must be caught or declared to be
>> thrown
>>> where E,T,V are type-variables:
>>>  E extends Exception declared in method <T,V,E>accept(ExprVisitor<T,V,E>,V)
>>>  T extends Object declared in method <T,V,E>accept(ExprVisitor<T,V,E>,V)
>>>  V extends Object declared in method <T,V,E>accept(ExprVisitor<T,V,E>,V)
>>> 
>>> Thanks
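
A minimal standalone sketch of the inference question discussed in this thread
(hypothetical names, not Drill's actual classes):

import java.util.Collections;
import java.util.Set;

public class InferenceDemo {

  interface ExprVisitor<T, V, E extends Exception> {
    T visit(V arg) throws E;
  }

  // Analogous to FieldReferenceFinder: E is fixed to RuntimeException.
  static class Finder implements ExprVisitor<Set<String>, Void, RuntimeException> {
    @Override
    public Set<String> visit(Void arg) {
      return Collections.emptySet();
    }
  }

  static <T, V, E extends Exception> T accept(ExprVisitor<T, V, E> visitor, V arg) throws E {
    return visitor.visit(arg);
  }

  public static void main(String[] args) {
    // E should be inferred as RuntimeException, so no throws clause is needed;
    // an affected javac reports "unreported exception E" here instead.
    Set<String> refs = accept(new Finder(), null);

    // The workaround discussed above: pass the type arguments explicitly.
    Set<String> refs2 =
        InferenceDemo.<Set<String>, Void, RuntimeException>accept(new Finder(), null);

    System.out.println(refs.size() + " " + refs2.size());
  }
}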

[jira] [Created] (DRILL-7151) Show only accessible tables when Hive authorization enabled

2019-04-02 Thread Igor Guzenko (JIRA)
Igor Guzenko created DRILL-7151:
---

 Summary: Show only accessible tables when Hive authorization 
enabled
 Key: DRILL-7151
 URL: https://issues.apache.org/jira/browse/DRILL-7151
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Igor Guzenko
Assignee: Igor Guzenko


The SHOW TABLES command for Hive has worked inconsistently for a very long time.

Before the changes introduced by DRILL-7115, only accessible tables were shown 
when Hive Storage Based Authorization was enabled, but with SQL Standard Based 
Authorization all tables were shown to the user ([related 
discussion|https://github.com/apache/drill/pull/461#discussion_r58753354]). 

In the scope of DRILL-7115, the accessible-only restriction for Storage Based 
Authorization was weakened in order to improve query performance.

There is still a need to improve the security of the Hive SHOW TABLES query 
while not violating the performance requirements. 

For SQL Standard Based Authorization this can be done by asking 
```HiveAuthorizationHelper.authorizerV2``` for the table's 'SELECT' permission, 
as sketched below.

For Storage Based Authorization no performance-acceptable approach is known 
yet; one idea is to try using the appropriate Hive storage based authorizer 
class for this purpose. 
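
A rough sketch of the SQL Standard Based Authorization check (Hive 2.x 
authorizer API; the surrounding variable names are illustrative, not actual 
Drill code):

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.drill.common.exceptions.DrillRuntimeException;
import org.apache.hadoop.hive.ql.security.authorization.plugin.HiveAccessControlException;
import org.apache.hadoop.hive.ql.security.authorization.plugin.HiveAuthzPluginException;
import org.apache.hadoop.hive.ql.security.authorization.plugin.HiveOperationType;
import org.apache.hadoop.hive.ql.security.authorization.plugin.HivePrivilegeObject;
import org.apache.hadoop.hive.ql.security.authorization.plugin.HivePrivilegeObject.HivePrivilegeObjectType;

// Keep only the table names the current user may SELECT from.
// checkPrivileges throws HiveAccessControlException when access is denied.
List<String> accessibleTables = new ArrayList<>();
for (String tableName : allTableNames) {
  List<HivePrivilegeObject> inputs = Collections.singletonList(
      new HivePrivilegeObject(HivePrivilegeObjectType.TABLE_OR_VIEW, dbName, tableName));
  try {
    authorizerV2.checkPrivileges(HiveOperationType.QUERY, inputs, null, null);
    accessibleTables.add(tableName);
  } catch (HiveAccessControlException e) {
    // no SELECT permission: hide the table
  } catch (HiveAuthzPluginException e) {
    throw new DrillRuntimeException("Hive authorization check failed", e);
  }
}
{code}

Hive's authorizer also exposes {{filterListCmdObjects}}, which is designed for 
exactly this kind of listing filtration and may be cheaper than per-table 
{{checkPrivileges}} calls.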





DoY and Kerberos

2019-04-02 Thread Charles Givre
Hello all, 
I opened a JIRA about this (https://issues.apache.org/jira/browse/DRILL-7149). 
My company is trying to deploy Drill to our cluster, and we found that DoY 
seems to lack any support for Kerberos.  Is this in fact the case, or is 
anyone looking at this?  
Thanks,
— C

[jira] [Created] (DRILL-7150) Fix timezone conversion for timestamp from maprdb after the transition from PDT to PST

2019-04-02 Thread Volodymyr Vysotskyi (JIRA)
Volodymyr Vysotskyi created DRILL-7150:
--

 Summary: Fix timezone conversion for timestamp from maprdb after 
the transition from PDT to PST
 Key: DRILL-7150
 URL: https://issues.apache.org/jira/browse/DRILL-7150
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - MapRDB
Affects Versions: 1.16.0
Reporter: Volodymyr Vysotskyi
Assignee: Volodymyr Vysotskyi
 Fix For: 1.16.0


Steps to reproduce:
0. Set PST timezone and date {{date +%Y%m%d -s "20190329"}}
1. Create the table in MaprDB shell:
{noformat}
create /tmp/testtimestamp
insert /tmp/testtimestamp --value 
'{"_id":"eot","str":"-01-01T23:59:59.999","ts":{"$date":"-01-02T07:59:59.999Z"}}'
insert /tmp/testtimestamp --value 
'{"_id":"pdt","str":"2019-04-01T23:59:59.999","ts":{"$date":"2019-04-02T06:59:59.999Z"}}'
insert /tmp/testtimestamp --value 
'{"_id":"pst","str":"2019-01-01T23:59:59.999","ts":{"$date":"2019-01-02T07:59:59.999Z"}}'
insert /tmp/testtimestamp --value 
'{"_id":"unk","str":"2017-07-08T20:01:49.885","ts":{"$date":"2017-07-09T03:01:49.885Z"}}'
{noformat}
2. Create a hive table:
{code:sql}
CREATE EXTERNAL TABLE default.timeTest
(`_id` string,
`str` string,
`ts` timestamp)
ROW FORMAT SERDE 'org.apache.hadoop.hive.maprdb.json.serde.MapRDBSerDe'  
STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler'  
TBLPROPERTIES ( 'maprdb.column.id'='_id', 'maprdb.table.name'='/tmp/testtimestamp')
{code}
3. Enable native reader and timezone conversion for maprdb timestamp:
{code:sql}
alter session set store.hive.maprdb_json.optimize_scan_with_native_reader=true;
alter session set store.hive.maprdb_json.read_timestamp_with_timezone_offset=true;
{code}
4. Run the query on the table from Drill using hive plugin:
{code}
0: jdbc:drill:drillbit=ldevdmhn005:31010> select * from hive.default.timeTest;
+--+--+--+
| _id  |   str|ts|
+--+--+--+
| eot  | -01-01T23:59:59.999  | -01-02 00:59:59.999  |
| pdt  | 2019-04-01T23:59:59.999  | 2019-04-01 23:59:59.999  |
| pst  | 2019-01-01T23:59:59.999  | 2019-01-02 00:59:59.999  |
| unk  | 2017-07-08T20:01:49.885  | 2017-07-08 20:01:49.885  |
+--+--+--+
4 rows selected (0.343 seconds)
{code}

Please note that the results for the {{eot}} and {{pst}} values are wrong.
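
For reference, the wrong rows are off by exactly one hour, which is what 
happens when the winter (PST) instants are converted with the summer (PDT) 
offset instead of with the zone rules. A small java.time illustration (not 
Drill code):

{code:java}
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class PdtPstDemo {
  public static void main(String[] args) {
    ZoneId la = ZoneId.of("America/Los_Angeles");

    Instant winter = Instant.parse("2019-01-02T07:59:59.999Z"); // PST, UTC-8
    Instant summer = Instant.parse("2019-04-02T06:59:59.999Z"); // PDT, UTC-7

    // Correct: the zone rules pick the offset valid at each instant.
    System.out.println(LocalDateTime.ofInstant(winter, la)); // 2019-01-01T23:59:59.999
    System.out.println(LocalDateTime.ofInstant(summer, la)); // 2019-04-01T23:59:59.999

    // Buggy pattern: applying the current (PDT) offset to a winter instant
    // reproduces the shifted value shown for the "pst" row above.
    System.out.println(LocalDateTime.ofInstant(winter, ZoneOffset.ofHours(-7))); // 2019-01-02T00:59:59.999
  }
}
{code}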






[GitHub] [drill] ihuzenko commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance

2019-04-02 Thread GitBox
ihuzenko commented on a change in pull request #1706: DRILL-7115: Improve Hive 
schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r271339849
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
 ##
 @@ -920,46 +920,11 @@ public void dropTable(String table) {
 }
 
 @Override
-public List<Pair<String, TableType>> getTableNamesAndTypes(boolean 
bulkLoad, int bulkSize) {
-  final List<Pair<String, TableType>> tableNamesAndTypes = 
Lists.newArrayList();
-
-  // Look for raw tables first
-  if (!tables.isEmpty()) {
-for (Map.Entry<TableInstance, DrillTable> tableEntry : 
tables.entrySet()) {
-  tableNamesAndTypes
-  .add(Pair.of(tableEntry.getKey().sig.name, 
tableEntry.getValue().getJdbcTableType()));
-}
-  }
-  // Then look for files that start with this name and end in .drill.
-  List<DotDrillFile> files = Collections.emptyList();
-  try {
-files = DotDrillUtil.getDotDrills(getFS(), new 
Path(config.getLocation()), DotDrillType.VIEW);
-  } catch (AccessControlException e) {
-if (!schemaConfig.getIgnoreAuthErrors()) {
-  logger.debug(e.getMessage());
-  throw UserException.permissionError(e)
-  .message("Not authorized to list or query tables in schema 
[%s]", getFullSchemaName())
-  .build(logger);
-}
-  } catch (IOException e) {
-logger.warn("Failure while trying to list view tables in workspace 
[{}]", getFullSchemaName(), e);
-  } catch (UnsupportedOperationException e) {
-// the file system (e.g. the classpath filesystem) may not support 
listing
-// of files. But see getViews(), it ignores the exception and continues
-logger.debug("Failure while trying to list view tables in workspace 
[{}]", getFullSchemaName(), e);
-  }
-
-  try {
-for (DotDrillFile f : files) {
-  if (f.getType() == DotDrillType.VIEW) {
-tableNamesAndTypes.add(Pair.of(f.getBaseName(), TableType.VIEW));
-  }
-}
-  } catch (UnsupportedOperationException e) {
-logger.debug("The filesystem for this workspace does not support this 
operation.", e);
 
 Review comment:
   This deleted code mostly duplicated the body of the existing ```getViews()``` 
method. This logging statement is also present in that method. 




[jira] [Created] (DRILL-7149) Kerberos Code Missing from Drill on YARN

2019-04-02 Thread Charles Givre (JIRA)
Charles Givre created DRILL-7149:


 Summary: Kerberos Code Missing from Drill on YARN
 Key: DRILL-7149
 URL: https://issues.apache.org/jira/browse/DRILL-7149
 Project: Apache Drill
  Issue Type: Bug
  Components: Security
Affects Versions: 1.14.0
Reporter: Charles Givre


My company is trying to deploy Drill using Drill-on-YARN (DoY), and we have 
run into the issue that DoY does not seem to support passing the Kerberos 
credentials needed to interact with HDFS. 

Upon checking the source code available in Git 
(https://github.com/apache/drill/blob/1.14.0/drill-yarn/src/main/java/org/apache/drill/yarn/core/)
 and referring to the Apache YARN documentation 
(https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YarnApplicationSecurity.html)
, we saw no section for passing the security credentials needed by the 
application to interact with any Hadoop cluster services and applications. 

We feel this needs to be added to the source code so that delegation tokens can 
be passed inside the container, allowing the process to access the Drill 
archive on HDFS and start. It should probably be added to the 
ContainerLaunchContext within the ApplicationSubmissionContext for DoY, as 
suggested in the Apache documentation (a rough sketch follows below).
 
We tried the same DoY utility on a non-kerberized cluster and the process 
started well, although we ran into a different issue there of hosts getting 
blacklisted.
We tested with the Single Principal per cluster option.
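
For illustration, a rough sketch of attaching HDFS delegation tokens to the 
container launch context using standard Hadoop/YARN client APIs (the renewer 
principal and the surrounding variables are assumptions, not existing DoY code):

{code:java}
import java.nio.ByteBuffer;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.DataOutputBuffer;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;

// Collect HDFS delegation tokens and attach them to the AM container,
// so the launched process can read the Drill archive from HDFS.
Credentials credentials = new Credentials();
FileSystem fs = FileSystem.get(conf);
// The renewer is typically the RM principal (yarn.resourcemanager.principal).
fs.addDelegationTokens(renewerPrincipal, credentials);

DataOutputBuffer dob = new DataOutputBuffer();
credentials.writeTokenStorageToStream(dob);
ByteBuffer fsTokens = ByteBuffer.wrap(dob.getData(), 0, dob.getLength());

ContainerLaunchContext amContainer = ContainerLaunchContext.newInstance(
    localResources, environment, commands, null /* serviceData */,
    fsTokens, null /* acls */);
appContext.setAMContainerSpec(amContainer);
{code}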
 





[GitHub] [drill] ihuzenko commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance

2019-04-02 Thread GitBox
ihuzenko commented on a change in pull request #1706: DRILL-7115: Improve Hive 
schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r271335079
 
 

 ##
 File path: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/client/TableEntryCacheLoader.java
 ##
 @@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.hive.client;
+
+import java.util.List;
+import java.util.stream.Collectors;
+
+import org.apache.drill.common.AutoCloseables;
+import org.apache.drill.exec.store.hive.ColumnListsCache;
+import org.apache.drill.exec.store.hive.HiveReadEntry;
+import org.apache.drill.exec.store.hive.HiveTableWithColumnCache;
+import org.apache.drill.exec.store.hive.HiveTableWrapper;
+import org.apache.drill.exec.store.hive.HiveUtilities;
+import org.apache.drill.shaded.guava.com.google.common.cache.CacheLoader;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
+import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.api.UnknownTableException;
+import org.apache.thrift.TException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * CacheLoader that synchronized on client and tries to reconnect when
+ * client fails. Used by {@link HiveMetadataCache}.
+ */
+final class TableEntryCacheLoader extends CacheLoader<TableName, HiveReadEntry> {
+
+  private static final Logger logger = 
LoggerFactory.getLogger(TableNameLoader.class);
+
+  private final DrillHiveMetaStoreClient client;
+
+  TableEntryCacheLoader(DrillHiveMetaStoreClient client) {
+this.client = client;
+  }
+
+
+  @Override
+  @SuppressWarnings("NullableProblems")
+  public HiveReadEntry load(TableName key) throws Exception {
+Table table;
+List<Partition> partitions;
+synchronized (client) {
+  table = getTable(key);
+  partitions = getPartitions(key);
+}
+HiveTableWithColumnCache hiveTable = new HiveTableWithColumnCache(table, 
new ColumnListsCache(table));
+List<HiveTableWrapper.HivePartitionWrapper> partitionWrappers = 
partitions.isEmpty()
+? null
 
 Review comment:
   I've considered the possibility of using empty lists and concluded that doing 
this would break backward compatibility, because ```HiveReadEntry``` is part of 
the JSON-serializable ```HiveScan``` operator, and deserializing empty lists on 
an older drillbit that expects a null list may break null-dependent checks. 




[GitHub] [drill] vvysotskyi commented on issue #1728: DRILL-7089: Implement caching for TableMetadataProvider at query level and adapt statistics to use Drill metastore API

2019-04-02 Thread GitBox
vvysotskyi commented on issue #1728: DRILL-7089: Implement caching for 
TableMetadataProvider at query level and adapt statistics to use Drill 
metastore API
URL: https://github.com/apache/drill/pull/1728#issuecomment-479008647
 
 
   @amansinha100, could you please take a look?




[GitHub] [drill] vvysotskyi commented on issue #1728: DRILL-7089: Implement caching for TableMetadataProvider at query level and adapt statistics to use Drill metastore API

2019-04-02 Thread GitBox
vvysotskyi commented on issue #1728: DRILL-7089: Implement caching for 
TableMetadataProvider at query level and adapt statistics to use Drill 
metastore API
URL: https://github.com/apache/drill/pull/1728#issuecomment-479008544
 
 
   Diagrams of the classes introduced in this PR: 
https://docs.google.com/presentation/d/1XG_xgR4okzXaJ3Z7HFHfzCwlM5VkNfre8GFEAd2Zo8k/edit?usp=sharing




[GitHub] [drill] vvysotskyi opened a new pull request #1728: DRILL-7089: Implement caching for TableMetadataProvider at query level and adapt statistics to use Drill metastore API

2019-04-02 Thread GitBox
vvysotskyi opened a new pull request #1728: DRILL-7089: Implement caching for 
TableMetadataProvider at query level and adapt statistics to use Drill 
metastore API
URL: https://github.com/apache/drill/pull/1728
 
 
   In the scope of this PR, caching of table metadata (schema and 
statistics) at the query level is introduced.
   Introduced `MetadataProviderManager`, which holds `SchemaProvider`, 
`DrillStatsTable`, and `TableMetadataProvider` if it was already created.
   A `MetadataProviderManager` instance will be cached and used for every 
`DrillTable` which corresponds to the same table.
   Such an approach was used to preserve lazy initialization of group scan and 
`TableMetadataProvider` instances: once the first instance of 
`TableMetadataProvider` is created, it will be stored in the 
`MetadataProviderManager` and its metadata will be reused for all further 
`TableMetadataProvider` instances (a condensed sketch follows below).
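   
   A condensed sketch of that caching pattern (names mirror this description; 
the actual classes in the PR may differ):
   
   ```java
   import java.util.function.Supplier;
   
   // Placeholder interface standing in for the PR's TableMetadataProvider.
   interface TableMetadataProvider {}
   
   final class MetadataProviderManager {
     private TableMetadataProvider provider; // created by the first group scan
   
     TableMetadataProvider provider(Supplier<TableMetadataProvider> factory) {
       if (provider == null) {
         provider = factory.get(); // lazy creation, reused by later scans
       }
       return provider;
     }
   }
   ```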
   
   Another part of this PR is connected with adapting statistics to use the 
Drill Metastore API. The logic that distinguishes exact and estimated metadata 
was enhanced, and `TableMetadata` is now used for obtaining statistics.
   
   Will create and attach a class diagram later.
   
   Also, tests should be run for this PR, so for now, I'll leave it in draft 
state.




[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1726: DRILL-7143: Support default value for empty columns

2019-04-02 Thread GitBox
arina-ielchiieva commented on a change in pull request #1726: DRILL-7143: 
Support default value for empty columns
URL: https://github.com/apache/drill/pull/1726#discussion_r271265638
 
 

 ##
 File path: common/src/main/java/org/apache/drill/common/types/Types.java
 ##
 @@ -463,23 +462,29 @@ public static boolean usesHolderForGet(final MajorType 
type) {
 default:
   return true;
 }
-
   }
 
   public static boolean isFixedWidthType(final MajorType type) {
-switch(type.getMinorType()) {
+return isFixedWidthType(type.getMinorType());
+  }
+
+  public static boolean isFixedWidthType(final MinorType type) {
+return ! isVarWidthType(type);
+  }
+
+  public static boolean isVarWidthType(final MinorType type) {
+switch(type) {
 
 Review comment:
   ```suggestion
   switch (type) {
   ```




[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1726: DRILL-7143: Support default value for empty columns

2019-04-02 Thread GitBox
arina-ielchiieva commented on a change in pull request #1726: DRILL-7143: 
Support default value for empty columns
URL: https://github.com/apache/drill/pull/1726#discussion_r271270255
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/writer/NullableScalarWriter.java
 ##
 @@ -278,4 +278,9 @@ public void dump(HierarchicalFormatter format) {
 baseWriter.dump(format);
 format.endObject();
   }
+
+  @Override
+  public void setDefaultValue(Object value) {
+throw new UnsupportedOperationException("Default values not supported for 
nullable types");
 
 Review comment:
   Maybe include `value` in the error message?




[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1726: DRILL-7143: Support default value for empty columns

2019-04-02 Thread GitBox
arina-ielchiieva commented on a change in pull request #1726: DRILL-7143: 
Support default value for empty columns
URL: https://github.com/apache/drill/pull/1726#discussion_r271268972
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/convert/AbstractWriteConverter.java
 ##
 @@ -68,6 +68,11 @@ public ColumnMetadata schema() {
 return baseWriter.schema();
   }
 
+  @Override
+  public void setDefaultValue(Object value) {
+throw new IllegalStateException("Cannot set a default value through a 
shim; types conflict.");
 
 Review comment:
   Should we include `value` in the error message?




[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1726: DRILL-7143: Support default value for empty columns

2019-04-02 Thread GitBox
arina-ielchiieva commented on a change in pull request #1726: DRILL-7143: 
Support default value for empty columns
URL: https://github.com/apache/drill/pull/1726#discussion_r271269796
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/writer/AbstractFixedWidthWriter.java
 ##
 @@ -93,17 +112,62 @@ protected final int prepareWrite(int writeIndex) {
 @Override
 protected final void fillEmpties(final int writeIndex) {
   final int width = width();
-  final int stride = ZERO_BUF.length / width;
+  final int stride = emptyValue.length / width;
   int dest = lastWriteIndex + 1;
   while (dest < writeIndex) {
 int length = writeIndex - dest;
 length = Math.min(length, stride);
-drillBuf.setBytes(dest * width, ZERO_BUF, 0, length * width);
+drillBuf.setBytes(dest * width, emptyValue, 0, length * width);
 dest += length;
   }
 }
   }
 
+  /**
+   * Base class for writers that use the Java int type as their native
+   * type. Handles common implicit conversions from other types to int.
+   */
+  public static abstract class BaseIntWriter extends BaseFixedWidthWriter {
+
+@Override
+public final void setLong(final long value) {
+  try {
+// Catches int overflow. Does not catch overflow for smaller types.
+setInt(Math.toIntExact(value));
+  } catch (final ArithmeticException e) {
+throw InvalidConversionError.writeError(schema(), value, e);
+  }
+}
+
+@Override
+public final void setDouble(final double value) {
 
 Review comment:
   Does Double cover Float as well?




[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1726: DRILL-7143: Support default value for empty columns

2019-04-02 Thread GitBox
arina-ielchiieva commented on a change in pull request #1726: DRILL-7143: 
Support default value for empty columns
URL: https://github.com/apache/drill/pull/1726#discussion_r271270612
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/writer/OffsetVectorWriterImpl.java
 ##
 @@ -302,4 +312,9 @@ public void dump(HierarchicalFormatter format) {
   .attribute("nextOffset", nextOffset)
   .endObject();
   }
+
+  @Override
+  public void setDefaultValue(Object value) {
+throw new UnsupportedOperationException("Encoding not supported for offset 
vectors");
 
 Review comment:
   Same here.




[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1726: DRILL-7143: Support default value for empty columns

2019-04-02 Thread GitBox
arina-ielchiieva commented on a change in pull request #1726: DRILL-7143: 
Support default value for empty columns
URL: https://github.com/apache/drill/pull/1726#discussion_r271268522
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/ScalarReader.java
 ##
 @@ -86,4 +87,10 @@
   LocalDate getDate();
   LocalTime getTime();
   Instant getTimestamp();
+
+  /**
+   * Return the value of the object using the extended type.
+   * @return
 
 Review comment:
   Please add the description to the `@return` tag to avoid warnings in the IDE 
(just move the upper line to `@return`).




[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1726: DRILL-7143: Support default value for empty columns

2019-04-02 Thread GitBox
arina-ielchiieva commented on a change in pull request #1726: DRILL-7143: 
Support default value for empty columns
URL: https://github.com/apache/drill/pull/1726#discussion_r271269107
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/impl/VectorPrinter.java
 ##
 @@ -33,7 +32,10 @@
   public static void printOffsets(UInt4Vector vector, int start, int length) {
 header(vector, start, length);
 for (int i = start, j = 0; j < length; i++, j++) {
-  if (j > 0) {
+  if (j % 40 == 0) {
 
 Review comment:
   How will this look after the change?




[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1727: DRILL-7145: Exceptions happened during retrieving values from ValueVe…

2019-04-02 Thread GitBox
arina-ielchiieva commented on a change in pull request #1727: DRILL-7145: 
Exceptions happened during retrieving values from ValueVe…
URL: https://github.com/apache/drill/pull/1727#discussion_r271258333
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/WebUserConnection.java
 ##
 @@ -151,7 +151,7 @@ public void sendData(RpcOutcomeListener listener, 
QueryWritableBatch result
 loader.clear();
   }
 } catch (Exception e) {
-  exception = UserException.systemError(e).build(logger);
+  throw UserException.systemError(e).build(logger);
 
 Review comment:
   I don't think we should throw an exception here. We should stick to the 
original approach and store it, but just add a method `getException()` in the 
`AbstractDisposableUserClientConnection` class, similar to `getError()`, and 
then handle both properly (a sketch follows below).
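   
   A sketch of that pattern (hypothetical shape, not the actual Drill code):
   
   ```java
   public abstract class AbstractDisposableUserClientConnection {
     protected volatile Throwable exception; // stored instead of rethrown
   
     public Throwable getException() {
       return exception;
     }
   }
   
   // in sendData():
   // } catch (Exception e) {
   //   exception = UserException.systemError(e).build(logger); // store, don't throw
   // }
   ```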




[GitHub] [drill] arina-ielchiieva commented on a change in pull request #1727: DRILL-7145: Exceptions happened during retrieving values from ValueVe…

2019-04-02 Thread GitBox
arina-ielchiieva commented on a change in pull request #1727: DRILL-7145: Exceptions 
happened during retrieving values from ValueVe…
URL: https://github.com/apache/drill/pull/1727#discussion_r271257957
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/WebUserConnection.java
 ##
 @@ -151,7 +151,7 @@ public void sendData(RpcOutcomeListener listener, 
QueryWritableBatch result
 loader.clear();
   }
 } catch (Exception e) {
-  exception = UserException.systemError(e).build(logger);
+  throw UserException.systemError(e).build(logger);
 
 Review comment:
   I don't think we should throw an exception here. We should stick to the 
original approach and store it, but just add a method `getException()` in the 
`AbstractDisposableUserClientConnection` class, similar to `getError()`, and 
then handle both properly.




[GitHub] [drill] agozhiy opened a new pull request #1727: DRILL-7145: Exceptions happened during retrieving values from ValueVe…

2019-04-02 Thread GitBox
agozhiy opened a new pull request #1727: DRILL-7145: Exceptions happened during 
retrieving values from ValueVe…
URL: https://github.com/apache/drill/pull/1727
 
 
   …ctor are not being displayed at the Drill Web UI




[GitHub] [drill] ihuzenko commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance

2019-04-02 Thread GitBox
ihuzenko commented on a change in pull request #1706: DRILL-7115: Improve Hive 
schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r271222800
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/ischema/InfoSchemaFilter.java
 ##
 @@ -206,11 +203,11 @@ private Result evaluateHelperFunction(Map recordValues, Function
 
 for(ExprNode arg : exprNode.args) {
   Result exprResult = evaluateHelper(recordValues, arg);
-  if (exprResult == Result.FALSE) {
-return exprResult;
-  }
-  if (exprResult == Result.INCONCLUSIVE) {
-result = Result.INCONCLUSIVE;
+  switch (exprResult) {
 
 Review comment:
   The suggested change will break the logic: this is a loop, and when an 
invocation of ```evaluateHelper(recordValues, arg)``` returns 
```Result.INCONCLUSIVE``` once, there is still a chance that a later iteration 
will return ```Result.FALSE```. Previously the chunk here was:
   ```java
   for(ExprNode arg : exprNode.args) {
 Result exprResult = evaluateHelper(recordValues, arg);
 if (exprResult == Result.FALSE) {
   return exprResult;
 }
 if (exprResult == Result.INCONCLUSIVE) {
   result = Result.INCONCLUSIVE;
 }
   }
   ```
   I see that my change made it more confusing, so I'll revert it back. 
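   
   For the record, a switch form that would keep the same semantics (sketch):
   
   ```java
   for (ExprNode arg : exprNode.args) {
     switch (evaluateHelper(recordValues, arg)) {
       case FALSE:
         return Result.FALSE;          // one FALSE decides the conjunction
       case INCONCLUSIVE:
         result = Result.INCONCLUSIVE; // remember it, but keep scanning
         break;
       default:
         break;                        // TRUE does not change the running result
     }
   }
   ```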




[GitHub] [drill] ihuzenko commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance

2019-04-02 Thread GitBox
ihuzenko commented on a change in pull request #1706: DRILL-7115: Improve Hive 
schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r271215506
 
 

 ##
 File path: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java
 ##
 @@ -63,89 +58,38 @@ public Table getTable(String tableName) {
 return hiveSchema.getDrillTable(this.name, tableName);
   }
 
+  @Override
+  public Collection<Map.Entry<String, TableType>> getTableNamesAndTypes() {
+ensureInitTables();
+return tables.entrySet();
+  }
+
   @Override
   public Set<String> getTableNames() {
+ensureInitTables();
+return tables.keySet();
+  }
+
+  private void ensureInitTables() {
 if (tables == null) {
   try {
-tables = Sets.newHashSet(mClient.getTableNames(this.name, 
schemaConfig.getIgnoreAuthErrors()));
-  } catch (final TException e) {
-logger.warn("Failure while attempting to access HiveDatabase '{}'.", 
this.name, e.getCause());
-tables = Sets.newHashSet(); // empty set.
+tables = mClient.getTableNamesAndTypes(this.name, 
schemaConfig.getIgnoreAuthErrors());
+  } catch (TException e) {
+logger.warn(String.format(
 
 Review comment:
   It's an invocation of ```warn(String msg, Throwable t)```, which means the 
stack trace won't be missing from the logs. Using a string with ```{}``` 
placeholders and ```warn(String format, Object... arguments)``` would most 
probably just call ```toString()``` on the exception object, so the stack 
trace details wouldn't be shown. 
   
 




[GitHub] [drill] ihuzenko commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance

2019-04-02 Thread GitBox
ihuzenko commented on a change in pull request #1706: DRILL-7115: Improve Hive 
schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r271203347
 
 

 ##
 File path: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/client/TableEntryCacheLoader.java
 ##
 @@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.hive.client;
+
+import java.util.List;
+import java.util.stream.Collectors;
+
+import org.apache.drill.common.AutoCloseables;
+import org.apache.drill.exec.store.hive.ColumnListsCache;
+import org.apache.drill.exec.store.hive.HiveReadEntry;
+import org.apache.drill.exec.store.hive.HiveTableWithColumnCache;
+import org.apache.drill.exec.store.hive.HiveTableWrapper;
+import org.apache.drill.exec.store.hive.HiveUtilities;
+import org.apache.drill.shaded.guava.com.google.common.cache.CacheLoader;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
+import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.api.UnknownTableException;
+import org.apache.thrift.TException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * CacheLoader that synchronized on client and tries to reconnect when
+ * client fails. Used by {@link HiveMetadataCache}.
+ */
+final class TableEntryCacheLoader extends CacheLoader<TableName, HiveReadEntry> {
+
+  private static final Logger logger = 
LoggerFactory.getLogger(TableNameLoader.class);
+
+  private final DrillHiveMetaStoreClient client;
+
+  TableEntryCacheLoader(DrillHiveMetaStoreClient client) {
+this.client = client;
+  }
+
+
+  @Override
+  @SuppressWarnings("NullableProblems")
+  public HiveReadEntry load(TableName key) throws Exception {
+Table table;
+List<Partition> partitions;
+synchronized (client) {
+  table = getTable(key);
+  partitions = getPartitions(key);
+}
+HiveTableWithColumnCache hiveTable = new HiveTableWithColumnCache(table, 
new ColumnListsCache(table));
+List<HiveTableWrapper.HivePartitionWrapper> partitionWrappers = 
partitions.isEmpty()
+? null
 
 Review comment:
   Good catch, this logic was here previously, when the class was a static 
nested one. I extracted it and preserved the existing logic, but I'll try to 
use an empty list, and maybe a redundant null check elsewhere can be removed 
too.  




[GitHub] [drill] vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance

2019-04-02 Thread GitBox
vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive 
schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r271150560
 
 

 ##
 File path: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/client/TableEntryCacheLoader.java
 ##
 @@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.hive.client;
+
+import java.util.List;
+import java.util.stream.Collectors;
+
+import org.apache.drill.common.AutoCloseables;
+import org.apache.drill.exec.store.hive.ColumnListsCache;
+import org.apache.drill.exec.store.hive.HiveReadEntry;
+import org.apache.drill.exec.store.hive.HiveTableWithColumnCache;
+import org.apache.drill.exec.store.hive.HiveTableWrapper;
+import org.apache.drill.exec.store.hive.HiveUtilities;
+import org.apache.drill.shaded.guava.com.google.common.cache.CacheLoader;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
+import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.api.UnknownTableException;
+import org.apache.thrift.TException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * CacheLoader that synchronized on client and tries to reconnect when
+ * client fails. Used by {@link HiveMetadataCache}.
+ */
+final class TableEntryCacheLoader extends CacheLoader<TableName, HiveReadEntry> {
+
+  private static final Logger logger = 
LoggerFactory.getLogger(TableNameLoader.class);
+
+  private final DrillHiveMetaStoreClient client;
+
+  TableEntryCacheLoader(DrillHiveMetaStoreClient client) {
+this.client = client;
+  }
+
+
+  @Override
+  @SuppressWarnings("NullableProblems")
+  public HiveReadEntry load(TableName key) throws Exception {
+Table table;
+List<Partition> partitions;
+synchronized (client) {
+  table = getTable(key);
+  partitions = getPartitions(key);
+}
+HiveTableWithColumnCache hiveTable = new HiveTableWithColumnCache(table, 
new ColumnListsCache(table));
+List<HiveTableWrapper.HivePartitionWrapper> partitionWrappers = 
partitions.isEmpty()
+? null
 
 Review comment:
   Why not an empty list instead of null in the case of an empty partitions list?
   Depending on the answer above, you could use `Optional` or `Stream` with 
`filter(Objects::nonNull)` for better stream chaining. You can ignore this if 
you added the `if` condition intentionally to avoid the creation of `Optional` 
or `Stream` objects.




[GitHub] [drill] vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance

2019-04-02 Thread GitBox
vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive 
schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r271145356
 
 

 ##
 File path: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/client/TableNameLoader.java
 ##
 @@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.hive.client;
+
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.function.Function;
+
+import org.apache.calcite.schema.Schema.TableType;
+import org.apache.drill.common.AutoCloseables;
+import org.apache.drill.shaded.guava.com.google.common.cache.CacheLoader;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import static java.util.stream.Collectors.toMap;
+import static org.apache.hadoop.hive.metastore.TableType.VIRTUAL_VIEW;
+
+/**
+ * CacheLoader that synchronized on client and tries to reconnect when
+ * client fails. Used by {@link HiveMetadataCache}.
+ */
+final class TableNameLoader extends CacheLoader<String, Map<String, TableType>> {
+
+  private static final Logger logger = 
LoggerFactory.getLogger(TableNameLoader.class);
+
+  private final DrillHiveMetaStoreClient client;
+
+  TableNameLoader(DrillHiveMetaStoreClient client) {
+this.client = client;
+  }
+
+  @Override
+  @SuppressWarnings("NullableProblems")
+  public Map<String, TableType> load(String dbName) throws Exception {
+List<String> tableAndViewNames;
+final Set<String> viewNames = new HashSet<>();
+synchronized (client) {
+  try {
+tableAndViewNames = client.getAllTables(dbName);
+viewNames.addAll(client.getTables(dbName, "*", VIRTUAL_VIEW));
+  } catch (MetaException e) {
+  /*
+ HiveMetaStoreClient is encapsulating both the 
MetaException/TExceptions inside MetaException.
+ Since we don't have good way to differentiate, we will close older 
connection and retry once.
+ This is only applicable for getAllTables and getAllDatabases method 
since other methods are
+ properly throwing correct exceptions.
+  */
+logger.warn("Failure while attempting to get hive tables. Retries 
once.", e);
+AutoCloseables.closeSilently(client::close);
+client.reconnect();
+tableAndViewNames = client.getAllTables(dbName);
+viewNames.addAll(client.getTables(dbName, "*", VIRTUAL_VIEW));
+  }
+}
+Function<String, TableType> valueMapper = viewNames.isEmpty()
+? tableName -> TableType.TABLE
+: tableOrViewName -> viewNames.contains(tableOrViewName) ? 
TableType.VIEW : TableType.TABLE;
+return Collections.unmodifiableMap(tableAndViewNames.stream()
+.collect(toMap(Function.identity(), valueMapper)));
 
 Review comment:
   please follow the common convention of qualifying the static method with the 
`Collectors` class name (i.e. `Collectors.toMap`).




[GitHub] [drill] vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance

2019-04-02 Thread GitBox
vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive 
schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r271177669
 
 

 ##
 File path: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/client/TableName.java
 ##
 @@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.hive.client;
+
+import java.util.Objects;
+
+/**
+ * Combination of dbName and tableName fields used
 
 Review comment:
   ```suggestion
* Combination of database and table names used
   ```




[GitHub] [drill] vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance

2019-04-02 Thread GitBox
vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive 
schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r271145021
 
 

 ##
 File path: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/client/TableNameLoader.java
 ##
 @@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.hive.client;
+
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.function.Function;
+
+import org.apache.calcite.schema.Schema.TableType;
+import org.apache.drill.common.AutoCloseables;
+import org.apache.drill.shaded.guava.com.google.common.cache.CacheLoader;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import static java.util.stream.Collectors.toMap;
+import static org.apache.hadoop.hive.metastore.TableType.VIRTUAL_VIEW;
+
+/**
+ * CacheLoader that synchronized on client and tries to reconnect when
+ * client fails. Used by {@link HiveMetadataCache}.
+ */
+final class TableNameLoader extends CacheLoader<String, Map<String, TableType>> {
+
+  private static final Logger logger = 
LoggerFactory.getLogger(TableNameLoader.class);
+
+  private final DrillHiveMetaStoreClient client;
+
+  TableNameLoader(DrillHiveMetaStoreClient client) {
+this.client = client;
+  }
+
+  @Override
+  @SuppressWarnings("NullableProblems")
+  public Map<String, TableType> load(String dbName) throws Exception {
+List<String> tableAndViewNames;
+final Set<String> viewNames = new HashSet<>();
+synchronized (client) {
+  try {
+tableAndViewNames = client.getAllTables(dbName);
+viewNames.addAll(client.getTables(dbName, "*", VIRTUAL_VIEW));
+  } catch (MetaException e) {
+  /*
+ HiveMetaStoreClient is encapsulating both the 
MetaException/TExceptions inside MetaException.
+ Since we don't have good way to differentiate, we will close older 
connection and retry once.
+ This is only applicable for getAllTables and getAllDatabases method 
since other methods are
+ properly throwing correct exceptions.
+  */
+logger.warn("Failure while attempting to get hive tables. Retries 
once.", e);
+AutoCloseables.closeSilently(client::close);
+client.reconnect();
+tableAndViewNames = client.getAllTables(dbName);
+viewNames.addAll(client.getTables(dbName, "*", VIRTUAL_VIEW));
+  }
+}
+Function valueMapper = viewNames.isEmpty()
+? tableName -> TableType.TABLE
 
 Review comment:
   Please replace the two-level ternary operator; we try to avoid nested ternaries in Drill for readability. A possible rewrite is sketched below.
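   A minimal sketch of how the nested ternary could be unfolded, reusing the names from the quoted `TableNameLoader` hunk (illustrative only, not the actual patch):

```java
// Sketch only: same behavior as the nested ternary, spelled out as if/else.
Function<String, TableType> valueMapper;
if (viewNames.isEmpty()) {
  // No views in this database: every name maps to TABLE.
  valueMapper = tableName -> TableType.TABLE;
} else {
  // Otherwise classify each name by membership in the view-name set.
  valueMapper = name -> viewNames.contains(name) ? TableType.VIEW : TableType.TABLE;
}
```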
   




[GitHub] [drill] vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance

2019-04-02 Thread GitBox
vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive 
schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r271179071
 
 

 ##
 File path: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java
 ##
 @@ -63,89 +58,38 @@ public Table getTable(String tableName) {
     return hiveSchema.getDrillTable(this.name, tableName);
   }
 
+  @Override
+  public Collection<Map.Entry<String, TableType>> getTableNamesAndTypes() {
+    ensureInitTables();
+    return tables.entrySet();
+  }
+
   @Override
   public Set<String> getTableNames() {
+    ensureInitTables();
+    return tables.keySet();
+  }
+
+  private void ensureInitTables() {
     if (tables == null) {
       try {
-        tables = Sets.newHashSet(mClient.getTableNames(this.name, schemaConfig.getIgnoreAuthErrors()));
-      } catch (final TException e) {
-        logger.warn("Failure while attempting to access HiveDatabase '{}'.", this.name, e.getCause());
-        tables = Sets.newHashSet(); // empty set.
+        tables = mClient.getTableNamesAndTypes(this.name, schemaConfig.getIgnoreAuthErrors());
+      } catch (TException e) {
+        logger.warn(String.format(
 
 Review comment:
   Why `String.format`?
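   Presumably the point is that slf4j's `{}` placeholders make the `String.format` wrapper redundant; a sketch of the two styles (the message text here is hypothetical):

```java
// Eager formatting: the string is built even when WARN is disabled.
logger.warn(String.format("Failure while accessing HiveDatabase '%s'.", dbName), e);

// Parameterized logging: slf4j fills {} lazily and still records the exception.
logger.warn("Failure while accessing HiveDatabase '{}'.", dbName, e);
```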




[GitHub] [drill] vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance

2019-04-02 Thread GitBox
vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive 
schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r271160776
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
 ##
 @@ -920,46 +920,11 @@ public void dropTable(String table) {
 }
 
 @Override
-    public List<Pair<String, TableType>> getTableNamesAndTypes(boolean bulkLoad, int bulkSize) {
-      final List<Pair<String, TableType>> tableNamesAndTypes = Lists.newArrayList();
-
-      // Look for raw tables first
-      if (!tables.isEmpty()) {
-        for (Map.Entry<TableInstance, DrillTable> tableEntry : tables.entrySet()) {
-          tableNamesAndTypes
-              .add(Pair.of(tableEntry.getKey().sig.name, tableEntry.getValue().getJdbcTableType()));
-        }
-      }
-      // Then look for files that start with this name and end in .drill.
-      List<DotDrillFile> files = Collections.emptyList();
-      try {
-        files = DotDrillUtil.getDotDrills(getFS(), new Path(config.getLocation()), DotDrillType.VIEW);
-      } catch (AccessControlException e) {
-        if (!schemaConfig.getIgnoreAuthErrors()) {
-          logger.debug(e.getMessage());
-          throw UserException.permissionError(e)
-              .message("Not authorized to list or query tables in schema [%s]", getFullSchemaName())
-              .build(logger);
-        }
-      } catch (IOException e) {
-        logger.warn("Failure while trying to list view tables in workspace [{}]", getFullSchemaName(), e);
-      } catch (UnsupportedOperationException e) {
-        // the file system (e.g. the classpath filesystem) may not support listing
-        // of files. But see getViews(), it ignores the exception and continues
-        logger.debug("Failure while trying to list view tables in workspace [{}]", getFullSchemaName(), e);
-      }
-
-      try {
-        for (DotDrillFile f : files) {
-          if (f.getType() == DotDrillType.VIEW) {
-            tableNamesAndTypes.add(Pair.of(f.getBaseName(), TableType.VIEW));
-          }
-        }
-      } catch (UnsupportedOperationException e) {
-        logger.debug("The filesystem for this workspace does not support this operation.", e);
 
 Review comment:
   What about logging?




[GitHub] [drill] vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance

2019-04-02 Thread GitBox
vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive 
schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r271164002
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/ischema/InfoSchemaRecordGenerator.java
 ##
 @@ -266,8 +266,7 @@ private void scanSchema(String schemaPath, SchemaPlus schema) {
    */
   public void visitTables(String schemaPath, SchemaPlus schema) {
     final AbstractSchema drillSchema = schema.unwrap(AbstractSchema.class);
-    final List<String> tableNames = Lists.newArrayList(schema.getTableNames());
-    for(Pair<String, ? extends Table> tableNameToTable : drillSchema.getTablesByNames(tableNames)) {
+    for(Pair<String, ? extends Table> tableNameToTable : drillSchema.getTablesByNames(schema.getTableNames())) {
 
 Review comment:
   ```suggestion
    for (Pair<String, ? extends Table> tableNameToTable : drillSchema.getTablesByNames(schema.getTableNames())) {
   ```




[GitHub] [drill] vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance

2019-04-02 Thread GitBox
vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive 
schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r271177349
 
 

 ##
 File path: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/client/TableName.java
 ##
 @@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.hive.client;
+
+import java.util.Objects;
+
+/**
+ * Combination of dbName and tableName fields used
+ * to represent key for getting table data from cache.
+ */
+final class TableName {
+
+  private final String dbName;
+
 
 Review comment:
   ```suggestion
   ```
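   For reference, a cache key like this typically just pairs the two names with value-based `equals`/`hashCode`. The hunk is cut off at the review anchor, so the `tableName` field and the method bodies below are assumptions, not the actual file:

```java
import java.util.Objects;

// Hypothetical completion of the truncated TableName hunk.
final class TableName {

  private final String dbName;
  private final String tableName; // assumed second field

  TableName(String dbName, String tableName) {
    this.dbName = dbName;
    this.tableName = tableName;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof TableName)) {
      return false;
    }
    TableName that = (TableName) o;
    return Objects.equals(dbName, that.dbName)
        && Objects.equals(tableName, that.tableName);
  }

  @Override
  public int hashCode() {
    return Objects.hash(dbName, tableName);
  }
}
```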




[GitHub] [drill] vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance

2019-04-02 Thread GitBox
vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive 
schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r271160223
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
 ##
 @@ -67,23 +66,24 @@
 import org.apache.drill.exec.store.AbstractSchema;
 import org.apache.drill.exec.store.PartitionNotFoundException;
 import org.apache.drill.exec.store.SchemaConfig;
-import org.apache.drill.exec.util.DrillFileSystemUtil;
 import org.apache.drill.exec.store.StorageStrategy;
 import org.apache.drill.exec.store.easy.json.JSONFormatPlugin;
+import org.apache.drill.exec.util.DrillFileSystemUtil;
 import org.apache.drill.exec.util.ImpersonationUtil;
+import org.apache.drill.shaded.guava.com.google.common.base.Joiner;
+import org.apache.drill.shaded.guava.com.google.common.base.Strings;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.shaded.guava.com.google.common.collect.Sets;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileStatus;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.fs.permission.FsAction;
 import org.apache.hadoop.fs.permission.FsPermission;
 import org.apache.hadoop.security.AccessControlException;
 
-import com.fasterxml.jackson.databind.ObjectMapper;
-import org.apache.drill.shaded.guava.com.google.common.base.Joiner;
-import org.apache.drill.shaded.guava.com.google.common.base.Strings;
-import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
-import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
-import org.apache.drill.shaded.guava.com.google.common.collect.Sets;
+import static java.util.Collections.unmodifiableList;
 
 Review comment:
   Usually we don't touch import ordering, since different IDEs can change it across a lot of classes.
   But it is OK here.




[GitHub] [drill] vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance

2019-04-02 Thread GitBox
vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive 
schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r271144225
 
 

 ##
 File path: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/client/TableNameLoader.java
 ##
 @@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.hive.client;
+
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.function.Function;
+
+import org.apache.calcite.schema.Schema.TableType;
+import org.apache.drill.common.AutoCloseables;
+import org.apache.drill.shaded.guava.com.google.common.cache.CacheLoader;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import static java.util.stream.Collectors.toMap;
+import static org.apache.hadoop.hive.metastore.TableType.VIRTUAL_VIEW;
+
+/**
+ * CacheLoader that synchronized on client and tries to reconnect when
+ * client fails. Used by {@link HiveMetadataCache}.
+ */
+final class TableNameLoader extends CacheLoader<String, Map<String, TableType>> {
+
+  private static final Logger logger = LoggerFactory.getLogger(TableNameLoader.class);
+
+  private final DrillHiveMetaStoreClient client;
+
+  TableNameLoader(DrillHiveMetaStoreClient client) {
+    this.client = client;
+  }
+
+  @Override
+  @SuppressWarnings("NullableProblems")
+  public Map<String, TableType> load(String dbName) throws Exception {
+    List<String> tableAndViewNames;
+    final Set<String> viewNames = new HashSet<>();
+    synchronized (client) {
+      try {
+        tableAndViewNames = client.getAllTables(dbName);
+        viewNames.addAll(client.getTables(dbName, "*", VIRTUAL_VIEW));
+      } catch (MetaException e) {
+        /*
+         HiveMetaStoreClient is encapsulating both the MetaException/TExceptions inside MetaException.
+         Since we don't have good way to differentiate, we will close older connection and retry once.
+         This is only applicable for getAllTables and getAllDatabases method since other methods are
+         properly throwing correct exceptions.
+        */
+        logger.warn("Failure while attempting to get hive tables. Retries once.", e);
+        AutoCloseables.closeSilently(client::close);
+        client.reconnect();
+        tableAndViewNames = client.getAllTables(dbName);
+        viewNames.addAll(client.getTables(dbName, "*", VIRTUAL_VIEW));
+      }
+    }
+    Function<String, TableType> valueMapper = viewNames.isEmpty()
+        ? tableName -> TableType.TABLE
+        : tableOrViewName -> viewNames.contains(tableOrViewName) ? TableType.VIEW : TableType.TABLE;
+    return Collections.unmodifiableMap(tableAndViewNames.stream()
+        .collect(toMap(Function.identity(), valueMapper)));
+  }
+
+}
 
 Review comment:
   ```suggestion
   }
   
   ```
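   For context, a `CacheLoader` like this is normally handed to a Guava `LoadingCache`; a minimal sketch of the wiring (the expiry value is made up, not what `HiveMetadataCache` actually configures):

```java
import java.util.Map;
import java.util.concurrent.TimeUnit;

import org.apache.calcite.schema.Schema.TableType;
import org.apache.drill.shaded.guava.com.google.common.cache.CacheBuilder;
import org.apache.drill.shaded.guava.com.google.common.cache.LoadingCache;

class TableNameCacheSketch {

  // Misses on a db name invoke TableNameLoader.load(dbName); hits return the cached map.
  static LoadingCache<String, Map<String, TableType>> buildCache(DrillHiveMetaStoreClient client) {
    return CacheBuilder.newBuilder()
        .expireAfterAccess(1, TimeUnit.MINUTES) // assumed expiry
        .build(new TableNameLoader(client));
  }
}
```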




[GitHub] [drill] vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance

2019-04-02 Thread GitBox
vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive 
schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r271161918
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/ischema/InfoSchemaFilter.java
 ##
 @@ -206,11 +203,11 @@ private Result evaluateHelperFunction(Map<String, String> recordValues, FunctionExprNode exprNode) {
 
     for(ExprNode arg : exprNode.args) {
 
 Review comment:
   ```suggestion
    for (ExprNode arg : exprNode.args) {
   ```
   Please edit the 3 other cases in this class as well.




[GitHub] [drill] vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive schema show tables performance

2019-04-02 Thread GitBox
vdiravka commented on a change in pull request #1706: DRILL-7115: Improve Hive 
schema show tables performance
URL: https://github.com/apache/drill/pull/1706#discussion_r271163913
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/ischema/InfoSchemaFilter.java
 ##
 @@ -206,11 +203,11 @@ private Result evaluateHelperFunction(Map<String, String> recordValues, FunctionExprNode exprNode) {
 
     for(ExprNode arg : exprNode.args) {
       Result exprResult = evaluateHelper(recordValues, arg);
-      if (exprResult == Result.FALSE) {
-        return exprResult;
-      }
-      if (exprResult == Result.INCONCLUSIVE) {
-        result = Result.INCONCLUSIVE;
+      switch (exprResult) {
 
 Review comment:
   Consider
   ```
   if (exprResult == Result.FALSE || exprResult == Result.INCONCLUSIVE) {
 return exprResult;
   }
   ```
   I find it simpler.




[GitHub] [drill] paul-rogers commented on issue #1726: DRILL-7143: Support default value for empty columns

2019-04-02 Thread GitBox
paul-rogers commented on issue #1726: DRILL-7143: Support default value for 
empty columns
URL: https://github.com/apache/drill/pull/1726#issuecomment-478882176
 
 
   @arina-ielchiieva, here is a first cut at the improved default values. I have tested selected
mechanisms and CSV with schema, but have not yet run the full set of unit tests. Consider this a
"preview" to begin the code review in parallel with the remaining busy-work needed to complete the PR.




[GitHub] [drill] paul-rogers opened a new pull request #1726: DRILL-7143: Support default value for empty columns

2019-04-02 Thread GitBox
paul-rogers opened a new pull request #1726: DRILL-7143: Support default value 
for empty columns
URL: https://github.com/apache/drill/pull/1726
 
 
   Modifies the prior work to add default values for columns. The prior work added defaults
   when the entire column is missing from a reader (the old Nullable Int column). The Row
   Set mechanism now will also "fill empty" slots with the default value.
   
   Added default support for the column writers. The writers automatically obtain the
   default value from the column schema. The default can also be set explicitly on
   the column writer.
   
   Updated the null column mechanism to use this feature rather than the ad-hoc
   implementation in the prior commit.
   
   Semantics changed a bit. Only required columns take a default. The default value
   is ignored for nullable columns since nullable columns already have a default: NULL.
   
   Updated the CSV-with-schema tests to illustrate the new behavior.
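
   A hedged sketch of the semantics described above, using Drill's schema-metadata classes; the `setDefaultValue` setter name is an assumption rather than the PR's verified API:

```java
import org.apache.drill.common.types.TypeProtos.MinorType;
import org.apache.drill.exec.record.metadata.SchemaBuilder;
import org.apache.drill.exec.record.metadata.TupleMetadata;

class DefaultValueSketch {

  static TupleMetadata schemaWithDefault() {
    TupleMetadata schema = new SchemaBuilder()
        .add("id", MinorType.INT)                  // required: default applies
        .addNullable("comment", MinorType.VARCHAR) // nullable: default ignored, NULL used
        .buildSchema();
    // Assumed setter: a reader that leaves "id" empty would get 10 instead of 0.
    schema.metadata("id").setDefaultValue("10");
    return schema;
  }
}
```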

