[jira] [Resolved] (DRILL-4848) Minor correction in UDF's documentation

2016-10-04 Thread Bridget Bevens (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-4848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bridget Bevens resolved DRILL-4848.
---
Resolution: Fixed

Changed step 3 to the following:
The drill-module.conf file should contain the packages to scan for functions:
drill.classpath.scanning.packages+= "com.mydomain.drill.fn"
Separate package names with a comma.

Setting bug status to resolved.

> Minor correction in UDF's documentation
> ---
>
> Key: DRILL-4848
> URL: https://issues.apache.org/jira/browse/DRILL-4848
> Project: Apache Drill
> Issue Type: Bug
> Components: Documentation
> Affects Versions: 1.2.0, 1.4.0, 1.6.0
> Reporter: Rahul Challapalli
> Assignee: Bridget Bevens
>
> https://drill.apache.org/docs/adding-custom-functions-to-drill/
> We should change the 3rd point in the custom functions documentation page (1) 
> from
> {code}
> drill.classpath.scanning.package+= "com.mydomain.drill.fn"
> {code}
> to
> {code}
> drill.classpath.scanning.packages+= "com.mydomain.drill.fn"
> {code}
> (1) https://drill.apache.org/docs/adding-custom-functions-to-drill/
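
The one-letter difference between the two keys is easy to miss in a config file. As a minimal sketch (Python; the function name and the sample text are hypothetical, and only the two key spellings come from the issue above), a check for the misspelled singular key could look like this:

```python
import re

# Matches the singular key from the old documentation. The plural
# "packages" is the spelling the issue reports as correct.
MISSPELLED_KEY = re.compile(r"^\s*drill\.classpath\.scanning\.package\s*\+?=")


def find_misspelled_keys(conf_text):
    """Return the 1-based numbers of lines that use the singular key."""
    return [
        number
        for number, line in enumerate(conf_text.splitlines(), start=1)
        if MISSPELLED_KEY.match(line)
    ]


# Hypothetical file content holding both spellings.
sample = (
    'drill.classpath.scanning.package+= "com.mydomain.drill.fn"\n'
    'drill.classpath.scanning.packages+= "com.mydomain.drill.fn"\n'
)
print(find_misspelled_keys(sample))
```

Only the first sample line is reported, because the pattern requires the assignment operator to follow "package" directly.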



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Drill Hangout Minutes - 10/4/16

2016-10-04 Thread Zelaine Fong
Attendees - Roman, Vitalii, Sorabh, Sudheesh, Kunal, Anil Kumar, Arina,
Padma, Laurent, Paul, Khurram, Gautam, Zelaine

1) Laurent - Client side changes to support metadata queries

Laurent indicated that the server side changes corresponding to these
client side changes were previously committed by Venki.  They've
worked with Simba on the ODBC driver changes to take advantage of the new
APIs, and have tested against Tableau.  Laurent said he didn't have
performance numbers for these improvements.  Paul asked if there's a
writeup corresponding to the improvements.  Laurent indicated that he felt the
information documented in the Jira and pull request should cover this.

2) Anil Kumar - Kafka plugin

Anil was looking for guidance on how to build this plugin.  He will start
work on this and hopes to have something in about a month.  After that, he
wants to work on a Cassandra plugin.  He'll probably start with the work
previously done by Yash (DRILL-92).


[jira] [Created] (DRILL-4930) Metadata results are not sorted

2016-10-04 Thread Laurent Goujon (JIRA)
Laurent Goujon created DRILL-4930:
-

 Summary: Metadata results are not sorted
 Key: DRILL-4930
 URL: https://issues.apache.org/jira/browse/DRILL-4930
 Project: Apache Drill
  Issue Type: Bug
  Components: Metadata
Reporter: Laurent Goujon
Priority: Minor


According to JDBC and ODBC specs, metadata results should be ordered. 
Currently, results are unordered.
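
As an illustration of the expected ordering: the JDBC documentation for DatabaseMetaData.getTables describes its rows as ordered by TABLE_TYPE, TABLE_CAT, TABLE_SCHEM and TABLE_NAME. The sketch below (Python, not Drill code; the function name and the sample rows are hypothetical) sorts rows on that key. Treating a missing value as an empty string is an assumption of the sketch, not something the issue specifies.

```python
# Column order documented for JDBC DatabaseMetaData.getTables results.
GET_TABLES_ORDER = ("TABLE_TYPE", "TABLE_CAT", "TABLE_SCHEM", "TABLE_NAME")


def metadata_sort_key(row):
    """Build a sort key for one row; None sorts as an empty string (assumption)."""
    return tuple(row.get(column) or "" for column in GET_TABLES_ORDER)


# Hypothetical, unordered rows as a server might return them.
rows = [
    {"TABLE_TYPE": "VIEW", "TABLE_CAT": "DRILL",
     "TABLE_SCHEM": "dfs.tmp", "TABLE_NAME": "v1"},
    {"TABLE_TYPE": "TABLE", "TABLE_CAT": "DRILL",
     "TABLE_SCHEM": "sys", "TABLE_NAME": "options"},
    {"TABLE_TYPE": "TABLE", "TABLE_CAT": "DRILL",
     "TABLE_SCHEM": "INFORMATION_SCHEMA", "TABLE_NAME": "TABLES"},
]

ordered = sorted(rows, key=metadata_sort_key)
ordered_names = [row["TABLE_NAME"] for row in ordered]
print(ordered_names)
```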





[jira] [Created] (DRILL-4929) Drill unable to propagate selectivity/distinctrowcount through RelSubset

2016-10-04 Thread Gautam Kumar Parai (JIRA)
Gautam Kumar Parai created DRILL-4929:
-

 Summary: Drill unable to propagate selectivity/distinctrowcount 
through RelSubset
 Key: DRILL-4929
 URL: https://issues.apache.org/jira/browse/DRILL-4929
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Gautam Kumar Parai
Assignee: Gautam Kumar Parai


Drill only has access to the best alternative plan. Calcite needs to expose the 
set.rel within RelSubset which can be utilized during Drill logical planning.





Re: [HANGOUT] Topics for 10/04/16

2016-10-04 Thread Jacques Nadeau
Both the C++ and the JDBC changes are updates that leverage a number of
pre-existing APIs already on the server. In our initial evaluations, we have
already seen substantially improved BI tool performance with the proposed
changes (with no additional server side changes). Are you seeing something
different? If you haven't yet looked at the changes in that light, I
suggest you do.

If anything, I'm more concerned about client feature proposals that don't
cover both the C++ and Java client. For example, I think we should be
cautious about merging something like DRILL-4280. We should be cautious
about introducing new server APIs unless there is a concrete plan around
support in all clients.

So I agree with the spirit of your ask: change proposals should be
"complete". However, I don't think it reasonably applies to the changes
proposed by Laurent. His changes "complete" the already introduced metadata
and prepare APIs the server exposes. They provide an improved BI user
experience. They also introduce unit tests in the C++ client, something that
was previously sorely missing.



--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Tue, Oct 4, 2016 at 9:47 AM, Parth Chandra  wrote:

> Hi guys,
>
>   I won't be able to join the hangout but it would be good to discuss the
> plan for the related backend changes.
>
>   As I mentioned before I would like to see a concrete proposal for the
> backend that will accompany these changes. Without that, I feel there is no
> point to adding so much new code.
>
> Thanks
>
> Parth
>
>
> On Mon, Oct 3, 2016 at 7:52 PM, Laurent Goujon  wrote:
>
> > Hi,
> >
> > I'm currently working on improving metadata support for both the JDBC
> > driver and the C++ connector, more specifically the following JIRAs:
> >
> > DRILL-4853: Update C++ protobuf source files
> > DRILL-4420: Server-side metadata and prepared-statement support for C++
> > connector
> > DRILL-4880: Support JDBC driver registration using ServiceLoader
> > DRILL-4925: Add tableType filter to GetTables metadata query
> > DRILL-4730: Update JDBC DatabaseMetaData implementation to use new
> > Metadata APIs
> >
> > I already opened multiple pull requests for those (the list is available
> > at https://github.com/apache/drill/pulls/laurentgo)
> >
> > I'm planning to join tomorrow's hangout in case people have questions about
> > those.
> >
> > Cheers,
> >
> > Laurent
> >
> > On Mon, Oct 3, 2016 at 10:28 AM, Subbu Srinivasan <ssriniva...@zscaler.com>
> > wrote:
> >
> > > Can we close on https://github.com/apache/drill/pull/518 ?
> > >
> > > On Mon, Oct 3, 2016 at 10:27 AM, Sudheesh Katkam 
> > > wrote:
> > >
> > > > Hi drillers,
> > > >
> > > > Our bi-weekly hangout is tomorrow (10/04/16, 10 AM PT). If you have any
> > > > suggestions for hangout topics, you can add them to this thread. We will
> > > > also ask around at the beginning of the hangout for topics.
> > > >
> > > > Thank you,
> > > > Sudheesh
> > > >
> > >
> >
>


Hangout starting now..

2016-10-04 Thread Sudheesh Katkam
Link: https://hangouts.google.com/hangouts/_/maprtech.com/drillbi-weeklyhangout 


Thank you,
Sudheesh

Re: [HANGOUT] Topics for 10/04/16

2016-10-04 Thread Sudheesh Katkam
Join us at this link:

https://hangouts.google.com/hangouts/_/maprtech.com/drillbi-weeklyhangout 


> On Oct 3, 2016, at 9:26 PM, Shadi Khalifa  wrote:
> 
> Hi,
> I have been working on integrating WEKA into Drill to support building and 
> scoring classification models. I have been successful in supporting all WEKA 
> classifiers and making them run in a distributed fashion over Drill 1.2. The 
> classifier accuracy is not affected by running in a distributed fashion and 
> the training and scoring times are getting a huge boost using Drill. A paper 
> on this has been published in the IEEE symposium on Big Data in June 2016 
> [available: 
> http://cs.queensu.ca/~khalifa/qdrill/QDrill_20160212IEEE_CameraReady.pdf] and 
> we are now in the process of publishing another paper in which QDrill 
> supports all WEKA algorithms. FYI, this can be easily extended to support 
> clustering and other types of WEKA algorithms. The architecture also allows 
> supporting other data mining libraries.
> The QDrill project website is http://cs.queensu.ca/~khalifa/qdrill. The 
> downloadable version there is a little old, but I'm planning to 
> upload an updated stable version within the next 10 days. I'm also using 
> an SVN repository and planning to move the project to GitHub to make it 
> easier to get the latest Drill versions and maybe to integrate with Drill at 
> some point. 
> Unfortunately, I have another meeting tomorrow at the same time as the 
> hangout, but I would love to know your opinion and to discuss the process of 
> evaluating this extension and maybe integrating it with Drill at some point. 
> Regards
> Shadi Khalifa
> PhD Candidate
> School of Computing
> Queen's University Canada
> I'm just a neuron in the society collective brain
> 
> 
> 
> 



Re: [HANGOUT] Topics for 10/04/16

2016-10-04 Thread Parth Chandra
Hi guys,

  I won't be able to join the hangout but it would be good to discuss the
plan for the related backend changes.

  As I mentioned before I would like to see a concrete proposal for the
backend that will accompany these changes. Without that, I feel there is no
point to adding so much new code.

Thanks

Parth




[jira] [Created] (DRILL-4928) Error o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication.

2016-10-04 Thread evan912 (JIRA)
evan912 created DRILL-4928:
--

 Summary: Error o.a.d.exec.rpc.RpcExceptionHandler - Exception in 
RPC communication. 
 Key: DRILL-4928
 URL: https://issues.apache.org/jira/browse/DRILL-4928
 Project: Apache Drill
  Issue Type: Bug
Reporter: evan912


2016-10-04 23:39:22,129 [Client-1] ERROR o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication.  Connection: /127.0.0.1:50632 <--> kelvin-/127.0.1.1:31010 (user client).  Closing connection.
io.netty.handler.codec.EncoderException: java.lang.IllegalAccessError: tried to access field io.netty.buffer.PooledByteBufAllocator.threadCache from class io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator
    at io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:107) ~[spark-1.6.2-yarn-shuffle.jar:1.6.2]
    at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633) ~[spark-1.6.2-yarn-shuffle.jar:1.6.2]
    at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:691) ~[spark-1.6.2-yarn-shuffle.jar:1.6.2]
    at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:626) ~[spark-1.6.2-yarn-shuffle.jar:1.6.2]
    at io.netty.handler.timeout.IdleStateHandler.write(IdleStateHandler.java:284) ~[spark-1.6.2-yarn-shuffle.jar:1.6.2]
    at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633) ~[spark-1.6.2-yarn-shuffle.jar:1.6.2]
    at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:691) ~[spark-1.6.2-yarn-shuffle.jar:1.6.2]
    at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:681) ~[spark-1.6.2-yarn-shuffle.jar:1.6.2]
    at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:716) ~[spark-1.6.2-yarn-shuffle.jar:1.6.2]
    at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:954) ~[spark-1.6.2-yarn-shuffle.jar:1.6.2]
    at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:244) ~[spark-1.6.2-yarn-shuffle.jar:1.6.2]
    at org.apache.drill.exec.rpc.RpcBus.send(RpcBus.java:121) ~[drill-rpc-1.8.0.jar:1.8.0]
    at org.apache.drill.exec.rpc.BasicClient$ConnectionMultiListener$ConnectionHandler.operationComplete(BasicClient.java:216) [drill-rpc-1.8.0.jar:1.8.0]
    at org.apache.drill.exec.rpc.BasicClient$ConnectionMultiListener$ConnectionHandler.operationComplete(BasicClient.java:196) [drill-rpc-1.8.0.jar:1.8.0]
    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) [spark-1.6.2-yarn-shuffle.jar:1.6.2]
    at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603) [spark-1.6.2-yarn-shuffle.jar:1.6.2]
    at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563) [spark-1.6.2-yarn-shuffle.jar:1.6.2]
    at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:406) [spark-1.6.2-yarn-shuffle.jar:1.6.2]
    at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82) [spark-1.6.2-yarn-shuffle.jar:1.6.2]
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:255) [spark-1.6.2-yarn-shuffle.jar:1.6.2]
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:290) [spark-1.6.2-yarn-shuffle.jar:1.6.2]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528) [spark-1.6.2-yarn-shuffle.jar:1.6.2]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) [spark-1.6.2-yarn-shuffle.jar:1.6.2]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) [spark-1.6.2-yarn-shuffle.jar:1.6.2]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) [spark-1.6.2-yarn-shuffle.jar:1.6.2]
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) [spark-1.6.2-yarn-shuffle.jar:1.6.2]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111]
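
The bracketed suffix on each frame names the jar the class was loaded from. In the trace above the io.netty frames come from spark-1.6.2-yarn-shuffle.jar while the Drill frames come from drill-rpc-1.8.0.jar, which may point to a classpath conflict, although the report does not confirm this. As a rough sketch (Python; the function name is hypothetical, and the sample holds four frames copied from the trace), the jars can be tallied like this:

```python
import re
from collections import Counter

# Captures the jar name from suffixes such as "~[drill-rpc-1.8.0.jar:1.8.0]".
# Entries without a jar, such as "[na:1.7.0_111]", are ignored.
JAR_MARKER = re.compile(r"\[([^\[\]:]+\.jar):[^\]]*\]")


def jars_in_trace(trace_text):
    """Count how many stack frames were loaded from each jar."""
    return Counter(JAR_MARKER.findall(trace_text))


sample_trace = """\
at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:244) ~[spark-1.6.2-yarn-shuffle.jar:1.6.2]
at org.apache.drill.exec.rpc.RpcBus.send(RpcBus.java:121) ~[drill-rpc-1.8.0.jar:1.8.0]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) [spark-1.6.2-yarn-shuffle.jar:1.6.2]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111]
"""

for jar, count in jars_in_trace(sample_trace).most_common():
    print(jar, count)
```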





[GitHub] drill pull request #595: DRILL-4203: Parquet File. Date is stored wrongly

2016-10-04 Thread vdiravka
Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/595#discussion_r81736781
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java ---
@@ -918,18 +916,22 @@ public void setMax(Object max) {
 @JsonProperty public ConcurrentHashMap columnTypeInfo;
 @JsonProperty List files;
 @JsonProperty List directories;
-@JsonProperty String drillVersion;
--- End diff --

I agree regarding similar issues in the future, so I returned this field to 
`ParquetTableMetadataBase` in the new commit. 
But to determine that a parquet file is new and definitely has correct 
date values, we now have a flag in the parquet metadata: "is.date.correct = true". 
This makes it easier to detect corrupted date values, and there is no need 
to wait for the end of the release to merge these changes.
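
A minimal sketch of how such a flag might be consulted (Python, not the Drill implementation; only the key name "is.date.correct" comes from the comment above, while the function name, the returned labels and the dictionary representation of the metadata are hypothetical):

```python
DATE_FLAG = "is.date.correct"  # key name taken from the comment above


def date_status(key_value_metadata):
    """Classify a file from its key/value metadata (hypothetical helper).

    A file that carries the flag set to "true" is treated as having
    correct dates. Anything else still needs corruption detection.
    """
    value = key_value_metadata.get(DATE_FLAG)
    if value is not None and value.strip().lower() == "true":
        return "correct"
    return "needs_check"


print(date_status({"is.date.correct": "true"}))
print(date_status({}))
```

With the flag present, a reader can skip any version-based or value-based heuristics for that file.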


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (DRILL-4927) Add support for Null Equality Joins

2016-10-04 Thread Roman (JIRA)
Roman created DRILL-4927:


 Summary: Add support for Null Equality Joins
 Key: DRILL-4927
 URL: https://issues.apache.org/jira/browse/DRILL-4927
 Project: Apache Drill
  Issue Type: Improvement
  Components: Query Planning & Optimization
Affects Versions: 1.8.0
Reporter: Roman
Assignee: Roman


A join with an equality condition that also allows null = null fails. For example, 
either of these queries:

{code:sql}
select ... FROM t1, t2 WHERE t1.c1 = t2.c2 OR (t1.c1 IS NULL AND t2.c2 IS NULL);

select ... FROM t1 INNER JOIN t2 ON t1.c1 = t2.c2 OR (t1.c1 IS NULL AND t2.c2 IS NULL);
{code}

returns "UNSUPPORTED_OPERATION ERROR". We should add support for this kind of join condition.
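
To make the intended semantics concrete, here is a small sketch that runs the same join condition on SQLite through Python's sqlite3 module. SQLite is only a stand-in engine and the sample rows are made up; the sketch shows what result a null-equality join is expected to produce, not how Drill would plan it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (c1 INTEGER);
    CREATE TABLE t2 (c2 INTEGER);
    INSERT INTO t1 VALUES (1), (2), (NULL);
    INSERT INTO t2 VALUES (2), (3), (NULL);
""")

# Plain equality: NULL = NULL is not true, so the NULL rows never pair up.
plain = conn.execute(
    "SELECT t1.c1, t2.c2 FROM t1 INNER JOIN t2 "
    "ON t1.c1 = t2.c2 ORDER BY t1.c1"
).fetchall()

# Null-equality join: the extra predicate also pairs NULL with NULL.
null_equal = conn.execute(
    "SELECT t1.c1, t2.c2 FROM t1 INNER JOIN t2 "
    "ON t1.c1 = t2.c2 OR (t1.c1 IS NULL AND t2.c2 IS NULL) ORDER BY t1.c1"
).fetchall()

print(plain)
print(null_equal)
```

The second query returns one more row than the first: the pair in which both sides are NULL.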





[jira] [Created] (DRILL-4926) Add support for Null Equality Joins

2016-10-04 Thread Roman (JIRA)
Roman created DRILL-4926:


 Summary: Add support for Null Equality Joins
 Key: DRILL-4926
 URL: https://issues.apache.org/jira/browse/DRILL-4926
 Project: Apache Drill
  Issue Type: Improvement
  Components: Query Planning & Optimization
Affects Versions: 1.8.0
Reporter: Roman
Assignee: Roman


A join with an equality condition that also allows null = null fails. For example, 
either of these queries:

{code:sql}
select ... FROM t1, t2 WHERE t1.c1 = t2.c2 OR (t1.c1 IS NULL AND t2.c2 IS NULL);

select ... FROM t1 INNER JOIN t2 ON t1.c1 = t2.c2 OR (t1.c1 IS NULL AND t2.c2 IS NULL);
{code}

returns "UNSUPPORTED_OPERATION ERROR". We should add support for this kind of join condition.





[GitHub] drill pull request #595: DRILL-4203: Parquet File. Date is stored wrongly

2016-10-04 Thread vdiravka
Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/595#discussion_r81722838
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetScanBatchCreator.java ---
@@ -104,6 +104,9 @@ public ScanBatch getBatch(FragmentContext context, ParquetRowGroupScan rowGroupS
   logger.trace("ParquetTrace,Read Footer,{},{},{},{},{},{},{}", "", e.getPath(), "", 0, 0, 0, timeToRead);
   footers.put(e.getPath(), footer );
 }
+boolean autoCorrectCorruptDates = rowGroupScan.formatConfig.autoCorrectCorruptDates;
--- End diff --

We go there only to detect corrupt dates, according to the option in the 
parquet format config. 
I added a log message below regarding DateCorruptionStatus, and also in 
the other places where we `detectCorruptDates`.

