[jira] [Created] (DRILL-4622) Need a better error message

2016-04-20 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-4622:
-

 Summary: Need a better error message
 Key: DRILL-4622
 URL: https://issues.apache.org/jira/browse/DRILL-4622
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.6.0
Reporter: Khurram Faraaz
Priority: Minor


Need a better error message.

{noformat}
0: jdbc:drill:schema=dfs.tmp> select id from (values(values(10))) tbl(id);
Error: SYSTEM ERROR: AssertionError: Internal error: Conversion to relational 
algebra failed to preserve datatypes:
validated type:
RecordType(INTEGER id) NOT NULL
converted type:
RecordType(INTEGER NOT NULL id) NOT NULL
rel:
LogicalProject(id=[$0])
  LogicalValues(tuples=[[{ 10 }]])



[Error Id: cc1f141e-97b5-43fe-a039-709d92dcacaf on centos-03.qa.lab:31010] 
(state=,code=0)

{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Operator unit test framework merged

2016-04-20 Thread Jacques Nadeau
Great Jason, thanks for pulling this together!

Jacques
On Apr 20, 2016 9:24 AM, "Jason Altekruse"  wrote:

> Hello all,
>
> I finally got a chance to do some final minor fixes and merge the operator
> unit test framework I posted a while back; thanks again to Parth for doing a
> review on it. There are still some enhancements I would like to add to make
> the tests more flexible, but for examples of what can be done with the
> current version please check out the tests that were included with the
> patch [1]. Please don't hesitate to ask questions or suggest improvements.
> I think that writing tests in smaller units like this could go a long way
> in improving our coverage and ensuring that we can write tests that
> consistently cover a particular execution path, independent of the query
> planner.
>
> For anyone looking to get more familiar with how Drill executes operations,
> these tests might be an easier way to start getting acquainted with
> the internals of Drill. The tests mock a number of the more complex parts
> of the system and try to produce a minimal environment where a single
> operation can run.
>
> [1] -
>
> https://github.com/apache/drill/blob/d93a3633815ed1c7efd6660eae62b7351a2c9739/exec/java-exec/src/test/java/org/apache/drill/exec/physical/unit/BasicPhysicalOpUnitTest.java
>
> Jason Altekruse
> Software Engineer at Dremio
> Apache Drill Committer
>
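
As a toy-scale illustration of the pattern described above (hand-built input batches fed straight to an operator, with no parser, planner, or server startup), consider the following self-contained Java sketch. The `Operator` and `FilterOperator` types here are hypothetical stand-ins, not Drill's actual interfaces; the real framework at [1] wires mocked allocators and fragment contexts instead.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for an operator interface: consumes one batch of
// values, produces one batch. Drill's real operators work on RecordBatches.
interface Operator {
    List<Integer> processBatch(List<Integer> batch);
}

// A trivial "filter" operator: keeps values above a threshold.
class FilterOperator implements Operator {
    private final int threshold;
    FilterOperator(int threshold) { this.threshold = threshold; }
    public List<Integer> processBatch(List<Integer> batch) {
        return batch.stream().filter(v -> v > threshold).collect(Collectors.toList());
    }
}

public class OperatorUnitTestSketch {
    public static void main(String[] args) {
        // The unit test: construct the operator directly, feed it a
        // hand-built batch, and assert on the output batch.
        Operator op = new FilterOperator(5);
        List<Integer> out = op.processBatch(Arrays.asList(1, 7, 3, 9));
        if (!out.equals(Arrays.asList(7, 9))) throw new AssertionError(out);
        System.out.println("filter operator test passed: " + out);
    }
}
```

Because the test never touches the planner, it exercises one execution path deterministically, which is the coverage benefit described in the message.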


[GitHub] drill pull request: DRILL-3317: when ProtobufLengthDecoder couldn'...

2016-04-20 Thread jacques-n
Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/446#issuecomment-212587912
  
Thanks for trying to put together a test.

Can you include the fix to correct the root allocator issue as 
well? I haven't had a chance to look at that, but it would explain why those 
numbers have been as low as they have been.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: Getting back on Calcite master: only a few steps left

2016-04-20 Thread Julian Hyde
Regarding https://issues.apache.org/jira/browse/CALCITE-1150, "Create a new 
DynamicRecordType, avoiding star expansion when working with this type". This 
feature will be useful, and as I have said to Jacques it would fit well within 
Calcite, but it's a bit shapeless to me right now. I would like to see some 
validator tests (and maybe one or two sql-to-rel converter tests) so I get a 
feel for how it would work. When you have some tests, can you post them in the 
JIRA case? I don't think you should charge ahead with the implementation, 
because I might not agree with the specification (i.e. the test cases).

I'll make the same comment in the JIRA case, and let's continue the discussion 
there.

Julian



> On Apr 19, 2016, at 9:51 AM, Jinfeng Ni  wrote:
> 
> @Jacques,
> 
> Sorry for the delay. Let me spend a couple of days this week to get
> CALCITE-1150 back on track. Initially, I encountered one conflicting
> change in Calcite master, and broke a couple of unit tests. If I cannot
> get them solved, I'll ping you or Minji for a discussion.
> 
> 
> 
> On Mon, Apr 18, 2016 at 5:31 PM, Jacques Nadeau  wrote:
>> Hey All,
>> 
>> Following up to get a status update. We made some good initial progress but
>> it seems like people may have hit some challenges (or distractions). Can
>> everyone report on how they are doing?
>> 
>> Jinfeng, how are tests for CALCITE-1150 going? Can Minji help get together
>> test cases for CALCITE-1150? Maybe you could provide guidance on the set of
>> queries to test?
>> 
>> thanks,
>> Jacques
>> 
>> 
>> --
>> Jacques Nadeau
>> CTO and Co-Founder, Dremio
>> 
>> On Thu, Mar 31, 2016 at 4:19 PM, Julian Hyde  wrote:
>> 
>>> I’ve closed 1149, if we don’t need the feature.
>>> 
>>> Yes, we need a unit test for 1151. I offered a suggestion how.
>>> 
 On Mar 31, 2016, at 11:59 AM, Sudheesh Katkam 
>>> wrote:
 
 I submitted a patch for CALCITE-1151 <
>>> https://issues.apache.org/jira/browse/CALCITE-1151> (with changes to
>>> resolve a checkstyle error). I am waiting for comments regarding the unit
>>> test.
 
 I added a comment to CALCITE-1149 <
>>> https://issues.apache.org/jira/browse/CALCITE-1149> with the workaround
>>> being used.
 
 Thank you,
 Sudheesh
 
> On Mar 16, 2016, at 5:19 PM, Jacques Nadeau  wrote:
> 
> Yes, I'm trying to work through the failing unit tests.
> 
> I merged your change.
> 
> In the future you can pick compare & create pull request on your branch
>>> and
> then change the target repo from apache to mine.
> 
> thanks,
> Jacques
> 
> 
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
> 
> On Wed, Mar 16, 2016 at 4:39 PM, Aman Sinha 
>>> wrote:
> 
>> Jacques, I wasn't sure how to create a pull request against your
>>> branch;
>> for  CALCITE-1108 you can cherry-pick from here:
>> 
>> 
>>> https://github.com/amansinha100/incubator-calcite/commits/calcite-drill-2
>> 
>> BTW, there are unit test failures on your branch, which I assume are
>> expected for now?
>> 
>> On Tue, Mar 15, 2016 at 6:56 PM, Jacques Nadeau 
>> wrote:
>> 
>>> Why don't you guys propose patches for my branch and I'll incorporate
>> until
>>> we get to a good state. Once we feel good about it, I'll clean up the
>>> revision history.
>>> 
>>> --
>>> Jacques Nadeau
>>> CTO and Co-Founder, Dremio
>>> 
>>> On Tue, Mar 15, 2016 at 11:01 AM, Jinfeng Ni 
>>> wrote:
>>> 
 I'll add test for CALCITE-1150.
 
 
 
 On Tue, Mar 15, 2016 at 9:45 AM, Sudheesh Katkam <
>>> skat...@maprtech.com
>>> 
 wrote:
> CALCITE-1149 [Extend CALCITE-845] <
 
>>> 
>> 
>>> https://github.com/mapr/incubator-calcite/commit/bd73728a8297e15331ae956096eab0e15b3f
 
 does not need to be committed into Calcite. DRILL-4372 <
 https://issues.apache.org/jira/browse/DRILL-4372> supersedes that
>> patch.
> 
> I will add a test case for CALCITE-1151.
> 
> Thank you,
> Sudheesh
> 
>> On Mar 15, 2016, at 9:04 AM, Aman Sinha 
>> wrote:
>> 
>> I'll add a test for CALCITE-1108.   For 1105 I am not yet sure but
>>> will
>> look through the old drill commits to see what test was added
>>> there.
>> 
>> On Sun, Mar 13, 2016 at 11:15 PM, Minji Kim 
>> wrote:
>> 
>>> I will add more test cases to CALCITE-1148 in addition to the ones
 already
>>> there.  I noticed a few more problems while testing the patch
>> against
 drill
>>> master.  I am still working through these issues, so I 

Re: Drill v1.6 and s3n connection

2016-04-20 Thread Jason Altekruse
Thanks Bridget, let me know if you need any other info from me or want me
to review the changes.

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Wed, Apr 20, 2016 at 11:28 AM, Bridget Bevens 
wrote:

> Created DRILL-4621  to
> track doc change request.
>
> Thanks,
> Bridget
>
> On Wed, Apr 20, 2016 at 10:10 AM, Jason Altekruse 
> wrote:
>
> > It looks like a number of doc pages can be improved by referencing some
> > changes made recently.
> >
> > With the inclusion of the needed jars for s3a with Drill, there is no
> > longer a need to download jets3t [1]. In addition to setting your
> > credentials, this option for allowing more concurrent connections
> > (necessary to allow reads of wider parquet files) can also be set in this
> > block instead of a core-site.xml file [2].
> >
> > This config block can actually be used to set any filesystem properties.
> > Some of these are custom to a particular filesystem like S3, but a number
> > of them are used by a variety of implementations of the HDFS interface.
> Any
> > properties like these [3] should be able to be set in this config block.
> >
> > [1] -
> >
> https://drill.apache.org/blog/2014/12/09/running-sql-queries-on-amazon-s3/
> > [2] -
> >
> >
> https://drill.apache.org/docs/s3-storage-plugin/#quering-parquet-format-files-on-s3
> > [3] -
> >
> >
> https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/core-default.xml
> >
> > Jason Altekruse
> > Software Engineer at Dremio
> > Apache Drill Committer
> >
> > On Wed, Apr 20, 2016 at 9:52 AM, Abhishek Girish <
> > abhishek.gir...@gmail.com>
> > wrote:
> >
> > > Thanks Jason! I hadn't noticed the config property for S3. I tried this
> > out
> > > now, and feel it is a lot easier now.
> > >
> > > And yes, we should definitely update the docs. There have been quite a
> > few
> > > threads related to S3 config.
> > >
> > > On Wed, Apr 20, 2016 at 8:19 AM, Jason Altekruse 
> > wrote:
> > >
> > > > I don't believe there is any way in which a particular bucket has a
> > > > property of being s3, s3n or s3a. As I understand it, this only
> changes
> > > the
> > > > client library that is used to interface with S3. We have included
> the
> > > jars
> > > > necessary for s3a with Drill, which is the newest and most performant
> > > > option available.
> > > >
> > > > I need to open a doc JIRA for this, but there is one way in which the
> > s3
> > > > experience was improved recently to prevent the need to restart Drill
> > to
> > > > add your S3 credentials. When you create a connection to an S3
> bucket,
> > > you
> > > > can now specify your credentials in a property named "config" in the
> > > > storage plugin. This allows you to set any filesystem properties,
> which
> > > > previously could only be set with a core-site.xml file on the
> > > > classpath when starting Drill.
> > > >
> > > > Example:
> > > > {
> > > >   "type": "file",
> > > >   "enabled": true,
> > > >   "connection": "s3a://address.of.your.bucket/",
> > > >   "config": {
> > > > "fs.s3a.access.key": "",
> > > > "fs.s3a.secret.key": ""
> > > >   },
> > > >   "workspaces": {
> > > > "root": {
> > > >   "location": "/",
> > > >   "writable": false,
> > > >   "defaultInputFormat": null
> > > > }
> > > >   },
> > > >   "formats": {
> > > > "psv": {
> > > >   "type": "text",
> > > >   "extensions": [
> > > > "tbl"
> > > >   ],
> > > >   "delimiter": "|"
> > > > }, ...
> > > >
> > > >
> > > > Jason Altekruse
> > > > Software Engineer at Dremio
> > > > Apache Drill Committer
> > > >
> > > > On Wed, Apr 20, 2016 at 7:40 AM, Nick Monetta 
> wrote:
> > > >
> > > > > Hi,
> > > > > Does Drill v1.6 still support s3n connections or just s3a?
> > > > >
> > > > > I have an s3n S3 bucket that I'm trying to connect to and it will
> not
> > > > work.
> > > > > My config is:
> > > > >
> > > > > {
> > > > >   "type": "file",
> > > > >   "enabled": true,
> > > > >   "connection": "s3n://inrixprod-tapp/",
> > > > >   "workspaces": {
> > > > > "root": {
> > > > >   "location": "/",
> > > > >   "writable": false,
> > > > >   "defaultInputFormat": null
> > > > > },
> > > > >
> > > > > Nick Monetta | INRIX |ni...@inrix.com |Movement Intelligence |
> > > > > www.inrix.com  | mobile +1 646-248-4105 |
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
>


Re: Drill v1.6 and s3n connection

2016-04-20 Thread Bridget Bevens
Created DRILL-4621  to
track doc change request.

Thanks,
Bridget

On Wed, Apr 20, 2016 at 10:10 AM, Jason Altekruse  wrote:

> It looks like a number of doc pages can be improved by referencing some
> changes made recently.
>
> With the inclusion of the needed jars for s3a with Drill, there is no
> longer a need to download jets3t [1]. In addition to setting your
> credentials, this option for allowing more concurrent connections
> (necessary to allow reads of wider parquet files) can also be set in this
> block instead of a core-site.xml file [2].
>
> This config block can actually be used to set any filesystem properties.
> Some of these are custom to a particular filesystem like S3, but a number
> of them are used by a variety of implementations of the HDFS interface. Any
> properties like these [3] should be able to be set in this config block.
>
> [1] -
> https://drill.apache.org/blog/2014/12/09/running-sql-queries-on-amazon-s3/
> [2] -
>
> https://drill.apache.org/docs/s3-storage-plugin/#quering-parquet-format-files-on-s3
> [3] -
>
> https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/core-default.xml
>
> Jason Altekruse
> Software Engineer at Dremio
> Apache Drill Committer
>
> On Wed, Apr 20, 2016 at 9:52 AM, Abhishek Girish <
> abhishek.gir...@gmail.com>
> wrote:
>
> > Thanks Jason! I hadn't noticed the config property for S3. I tried this
> out
> > now, and feel it is a lot easier now.
> >
> > And yes, we should definitely update the docs. There have been quite a
> few
> > threads related to S3 config.
> >
> > On Wed, Apr 20, 2016 at 8:19 AM, Jason Altekruse 
> wrote:
> >
> > > I don't believe there is any way in which a particular bucket has a
> > > property of being s3, s3n or s3a. As I understand it, this only changes
> > the
> > > client library that is used to interface with S3. We have included the
> > jars
> > > necessary for s3a with Drill, which is the newest and most performant
> > > option available.
> > >
> > > I need to open a doc JIRA for this, but there is one way in which the
> s3
> > > experience was improved recently to prevent the need to restart Drill
> to
> > > add your S3 credentials. When you create a connection to an S3 bucket,
> > you
> > > can now specify your credentials in a property named "config" in the
> > > storage plugin. This allows you to set any filesystem properties, which
> > > previously could only be set with a core-site.xml file on the
> > > classpath when starting Drill.
> > >
> > > Example:
> > > {
> > >   "type": "file",
> > >   "enabled": true,
> > >   "connection": "s3a://address.of.your.bucket/",
> > >   "config": {
> > > "fs.s3a.access.key": "",
> > > "fs.s3a.secret.key": ""
> > >   },
> > >   "workspaces": {
> > > "root": {
> > >   "location": "/",
> > >   "writable": false,
> > >   "defaultInputFormat": null
> > > }
> > >   },
> > >   "formats": {
> > > "psv": {
> > >   "type": "text",
> > >   "extensions": [
> > > "tbl"
> > >   ],
> > >   "delimiter": "|"
> > > }, ...
> > >
> > >
> > > Jason Altekruse
> > > Software Engineer at Dremio
> > > Apache Drill Committer
> > >
> > > On Wed, Apr 20, 2016 at 7:40 AM, Nick Monetta  wrote:
> > >
> > > > Hi,
> > > > Does Drill v1.6 still support s3n connections or just s3a?
> > > >
> > > > I have an s3n S3 bucket that I'm trying to connect to and it will not
> > > work.
> > > > My config is:
> > > >
> > > > {
> > > >   "type": "file",
> > > >   "enabled": true,
> > > >   "connection": "s3n://inrixprod-tapp/",
> > > >   "workspaces": {
> > > > "root": {
> > > >   "location": "/",
> > > >   "writable": false,
> > > >   "defaultInputFormat": null
> > > > },
> > > >
> > > > Nick Monetta | INRIX |ni...@inrix.com |Movement Intelligence |
> > > > www.inrix.com  | mobile +1 646-248-4105 |
> > > >
> > > >
> > > >
> > >
> >
>


[jira] [Created] (DRILL-4621) Drill v1.6 and s3n connection

2016-04-20 Thread Bridget Bevens (JIRA)
Bridget Bevens created DRILL-4621:
-

 Summary: Drill v1.6 and s3n connection 
 Key: DRILL-4621
 URL: https://issues.apache.org/jira/browse/DRILL-4621
 Project: Apache Drill
  Issue Type: Improvement
  Components: Documentation
Reporter: Bridget Bevens
Assignee: Bridget Bevens


It looks like a number of doc pages can be improved by referencing some
changes made recently.

With the inclusion of the needed jars for s3a with Drill, there is no
longer a need to download jets3t [1]. In addition to setting your
credentials, this option for allowing more concurrent connections
(necessary to allow reads of wider parquet files) can also be set in this
block instead of a core-site.xml file [2].

This config block can actually be used to set any filesystem properties.
Some of these are custom to a particular filesystem like S3, but a number
of them are used by a variety of implementations of the HDFS interface. Any
properties like these [3] should be able to be set in this config block.

[1] -
https://drill.apache.org/blog/2014/12/09/running-sql-queries-on-amazon-s3/
[2] -
https://drill.apache.org/docs/s3-storage-plugin/#quering-parquet-format-files-on-s3
[3] -
https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/core-default.xml

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer


See email thread: Drill v1.6 and s3n connection 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-4445) Remove extra code to work around mixture of arrays and Lists used in Logical and Physical query plan nodes

2016-04-20 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4445.

Resolution: Fixed

Fixed in d24205d4e795a1aab54b64708dde1e7deeca668b

> Remove extra code to work around mixture of arrays and Lists used in Logical 
> and Physical query plan nodes
> --
>
> Key: DRILL-4445
> URL: https://issues.apache.org/jira/browse/DRILL-4445
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jason Altekruse
>Assignee: Jason Altekruse
>
> The physical plan node classes for all of the operators currently use a mix 
> of arrays and Lists to refer to lists of incoming operators, expressions, and 
> other operator properties. This has led to the introduction of several 
> utility methods for translating between the two representations; examples can 
> be seen in common/logical/data/Abstractbuilder.
> This isn't a major problem, but the new operator test framework uses these 
> classes as a primary interface for setting up the tests. It seemed worthwhile 
> to just refactor the classes to be consistent so that the tests would all be 
> similar. There are a few changes to execution code, but they are all just 
> trivial changes to use the list based interfaces (length vs size(), set() 
> instead of arr[i] = foo, etc.) as Jackson just transparently handles both 
> types the same (which is why this hasn't really been a problem).
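
As a hedged illustration of the duplication being removed, the following standalone Java snippet shows the same logical operations written against both representations. It is a generic sketch, not code from the Drill tree; the point is that every operation has two spellings, which the refactoring eliminates by standardizing on List-based interfaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ArrayVsListSketch {
    public static void main(String[] args) {
        // The same logical plan-node contents, held two ways.
        String[] arr = {"scan", "filter"};
        List<String> list = new ArrayList<>(Arrays.asList(arr));

        int arrLen = arr.length;          // length vs size()
        int listLen = list.size();

        arr[1] = "project";               // arr[i] = foo vs set()
        list.set(1, "project");

        // Jackson deserializes a JSON array into either form transparently,
        // which is why the mix never surfaced as a functional bug.
        if (arrLen != listLen || !Arrays.asList(arr).equals(list)) {
            throw new AssertionError("representations diverged");
        }
        System.out.println("array and list forms agree: " + list);
    }
}
```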



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-4437) Implement framework for testing operators in isolation

2016-04-20 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4437.

Resolution: Fixed

Fixed in d93a3633815ed1c7efd6660eae62b7351a2c9739

> Implement framework for testing operators in isolation
> --
>
> Key: DRILL-4437
> URL: https://issues.apache.org/jira/browse/DRILL-4437
> Project: Apache Drill
>  Issue Type: Test
>  Components: Tools, Build & Test
>Reporter: Jason Altekruse
>Assignee: Jason Altekruse
> Fix For: 1.7.0
>
>
> Most of the tests written for Drill are end-to-end. We spin up a full 
> instance of the server, submit one or more SQL queries and check the results.
> While integration tests like this are useful for ensuring that features 
> do not break end-user functionality, overuse of this approach 
> has caused a number of pain points.
> Overall the tests end up running a lot of the exact same code, parsing and 
> planning many similar queries.
> Creating consistent reproductions of issues, especially edge cases found in 
> clustered environments, can be extremely difficult. Even the simpler task of 
> testing whether operators can handle a particular series of 
> incoming batches of records has required hacks like generating files large 
> enough that the scanners happen to break them up into separate batches. 
> These tests are brittle, as they make assumptions about how the scanners will 
> work in the future. As an example of how this could break: a performance 
> evaluation might lead us to produce larger batches in some cases, and 
> existing tests that try to exercise multiple batches by producing a few 
> more records than the current batch-size threshold would no longer test 
> the same code paths.
> We need to make more parts of the system testable without initializing the 
> entire Drill server, as well as making the different internal settings and 
> state of the server configurable for tests.
> This is a first effort to enable testing the physical operators in Drill by 
> mocking the components of the system necessary to enable operators to 
> initialize and execute.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-4592) Explain plan statement should show plan in WebUi

2016-04-20 Thread Jinfeng Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinfeng Ni resolved DRILL-4592.
---
Resolution: Fixed

Fixed in commit: 9f4fff800d128878094ae70b454201f79976135d

> Explain plan statement should show plan in WebUi
> 
>
> Key: DRILL-4592
> URL: https://issues.apache.org/jira/browse/DRILL-4592
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
> Fix For: 1.7.0
>
>
> When an explain plan statement is run, the physical plan is generated and 
> returned. However, the plan is not put in the profile and does not show up in 
> the physical plan / visual plan tab in the WebUI. If someone wants to look at 
> the visual plan, the only way is to execute the query, which sometimes 
> requires a long execution time. This makes it a bit hard to analyze the plan 
> for a problematic query.  
> As with regular queries and CTAS statements, we should store the plan for an 
> EXPLAIN PLAN statement and display it properly in the WebUI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-4589) Reduce planning time for file system partition pruning by reducing filter evaluation overhead

2016-04-20 Thread Jinfeng Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinfeng Ni resolved DRILL-4589.
---
   Resolution: Fixed
Fix Version/s: 1.7.0

Fixed in commit: dbf4b15eda14f55462ff0872266bf61c13bdb1bc

> Reduce planning time for file system partition pruning by reducing filter 
> evaluation overhead
> -
>
> Key: DRILL-4589
> URL: https://issues.apache.org/jira/browse/DRILL-4589
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
> Fix For: 1.7.0
>
>
> When Drill is used to query hundreds of thousands, or even millions, of files 
> organized into multi-level directories, users typically provide a 
> partition filter like: dir0 = something and dir1 = something2 and ...  
> For such queries, we saw that query planning time could be unacceptably long, 
> due to three main overheads: 1) expanding and getting the list of files, 2) 
> evaluating the partition filter, and 3) getting the metadata, in the case of 
> parquet files for which no metadata cache file is available. 
> DRILL-2517 targets the 3rd overhead. As a follow-up to DRILL-2517, we plan 
> to reduce the filter evaluation overhead. Currently, the partition filter is 
> evaluated at the file level. In many cases, we saw that the number of leaf 
> subdirectories is significantly lower than the number of files. Since all the 
> files under the same leaf subdirectory share the same directory metadata, we 
> should apply the filter evaluation at the leaf subdirectory level. By doing 
> that, we can reduce both the CPU overhead of evaluating the filter and the 
> memory overhead.
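
The directory-level strategy can be sketched in plain Java. This is a toy model with made-up paths, not Drill's planner code: group files by leaf directory, evaluate the partition filter once per directory, and keep the files of the directories that survive.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class DirectoryPruningSketch {
    public static void main(String[] args) {
        // Four files across two leaf directories (hypothetical layout).
        List<String> files = Arrays.asList(
            "/data/2015/q1/a.parquet", "/data/2015/q1/b.parquet",
            "/data/2016/q2/c.parquet", "/data/2016/q2/d.parquet");

        // Partition filter on dir0: keep only the 2016 partition.
        Predicate<String> dirFilter = dir -> dir.contains("/2016/");

        // Group files by leaf directory, so the filter runs once per
        // directory rather than once per file.
        Map<String, List<String>> byDir = files.stream().collect(
            Collectors.groupingBy(f -> f.substring(0, f.lastIndexOf('/'))));

        List<String> kept = byDir.entrySet().stream()
            .filter(e -> dirFilter.test(e.getKey()))
            .flatMap(e -> e.getValue().stream())
            .sorted()
            .collect(Collectors.toList());

        if (!kept.equals(Arrays.asList(
                "/data/2016/q2/c.parquet", "/data/2016/q2/d.parquet")))
            throw new AssertionError(kept);
        System.out.println("evaluated filter on " + byDir.size()
            + " directories, kept " + kept.size() + " files");
    }
}
```

With two directories holding four files, the filter is evaluated twice instead of four times; with millions of files sharing far fewer leaf directories, the saving is proportionally larger.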



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] drill pull request: DRILL-4387: GroupScan or ScanBatchCreator shou...

2016-04-20 Thread jinfengni
Github user jinfengni closed the pull request at:

https://github.com/apache/drill/pull/379


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request: DRILL-4387: GroupScan or ScanBatchCreator shou...

2016-04-20 Thread jaltekruse
Github user jaltekruse commented on the pull request:

https://github.com/apache/drill/pull/379#issuecomment-212534349
  
@jinfengni looks like this was merged, can you close the PR?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: Drill v1.6 and s3n connection

2016-04-20 Thread Jason Altekruse
It looks like a number of doc pages can be improved by referencing some
changes made recently.

With the inclusion of the needed jars for s3a with Drill, there is no
longer a need to download jets3t [1]. In addition to setting your
credentials, this option for allowing more concurrent connections
(necessary to allow reads of wider parquet files) can also be set in this
block instead of a core-site.xml file [2].

This config block can actually be used to set any filesystem properties.
Some of these are custom to a particular filesystem like S3, but a number
of them are used by a variety of implementations of the HDFS interface. Any
properties like these [3] should be able to be set in this config block.

[1] -
https://drill.apache.org/blog/2014/12/09/running-sql-queries-on-amazon-s3/
[2] -
https://drill.apache.org/docs/s3-storage-plugin/#quering-parquet-format-files-on-s3
[3] -
https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/core-default.xml
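
For comparison, the older approach mentioned above put the same properties in a core-site.xml file on Drill's classpath. The property names below are the real Hadoop S3A keys; the values are placeholders:

```xml
<!-- core-site.xml: pre-1.6 approach; the same keys can now go in the
     storage plugin's "config" block instead, without restarting Drill. -->
<configuration>
  <property>
    <name>fs.s3a.access.key</name>
    <value>YOUR_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>YOUR_SECRET_KEY</value>
  </property>
</configuration>
```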

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Wed, Apr 20, 2016 at 9:52 AM, Abhishek Girish 
wrote:

> Thanks Jason! I hadn't noticed the config property for S3. I tried this out
> now, and feel it is a lot easier now.
>
> And yes, we should definitely update the docs. There have been quite a few
> threads related to S3 config.
>
> On Wed, Apr 20, 2016 at 8:19 AM, Jason Altekruse  wrote:
>
> > I don't believe there is any way in which a particular bucket has a
> > property of being s3, s3n or s3a. As I understand it, this only changes
> the
> > client library that is used to interface with S3. We have included the
> jars
> > necessary for s3a with Drill, which is the newest and most performant
> > option available.
> >
> > I need to open a doc JIRA for this, but there is one way in which the s3
> > experience was improved recently to prevent the need to restart Drill to
> > add your S3 credentials. When you create a connection to an S3 bucket,
> you
> > can now specify your credentials in a property named "config" in the
> > storage plugin. This allows you to set any filesystem properties, which
> > previously could only be set with a core-site.xml file on the
> > classpath when starting Drill.
> >
> > Example:
> > {
> >   "type": "file",
> >   "enabled": true,
> >   "connection": "s3a://address.of.your.bucket/",
> >   "config": {
> > "fs.s3a.access.key": "",
> > "fs.s3a.secret.key": ""
> >   },
> >   "workspaces": {
> > "root": {
> >   "location": "/",
> >   "writable": false,
> >   "defaultInputFormat": null
> > }
> >   },
> >   "formats": {
> > "psv": {
> >   "type": "text",
> >   "extensions": [
> > "tbl"
> >   ],
> >   "delimiter": "|"
> > }, ...
> >
> >
> > Jason Altekruse
> > Software Engineer at Dremio
> > Apache Drill Committer
> >
> > On Wed, Apr 20, 2016 at 7:40 AM, Nick Monetta  wrote:
> >
> > > Hi,
> > > Does Drill v1.6 still support s3n connections or just s3a?
> > >
> > > I have an s3n S3 bucket that I'm trying to connect to and it will not
> > work.
> > > My config is:
> > >
> > > {
> > >   "type": "file",
> > >   "enabled": true,
> > >   "connection": "s3n://inrixprod-tapp/",
> > >   "workspaces": {
> > > "root": {
> > >   "location": "/",
> > >   "writable": false,
> > >   "defaultInputFormat": null
> > > },
> > >
> > > Nick Monetta | INRIX |ni...@inrix.com |Movement Intelligence |
> > > www.inrix.com  | mobile +1 646-248-4105 |
> > >
> > >
> > >
> >
>


[GitHub] drill pull request: DRILL-2100: Added deleting temporary spill dir...

2016-04-20 Thread adeneche
Github user adeneche commented on a diff in the pull request:

https://github.com/apache/drill/pull/454#discussion_r60447425
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/ExternalSortBatch.java
 ---
@@ -223,7 +227,18 @@ public void close() {
     if (mSorter != null) {
       mSorter.clear();
     }
-
+    for (Iterator iter = this.currSpillDirs.iterator(); iter.hasNext(); iter.remove()) {
+      Path path = (Path) iter.next();
+      try {
+        if (fs != null && path != null && fs.exists(path)) {
+          if (fs.delete(path, true)) {
+            fs.cancelDeleteOnExit(path);
+          }
+        }
+      } catch (IOException e) {
+        throw new RuntimeException(e);
--- End diff --

same concern here: does it make sense to fail a query if we fail to delete 
one of the spill directories?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request: DRILL-2100: Added deleting temporary spill dir...

2016-04-20 Thread adeneche
Github user adeneche commented on a diff in the pull request:

https://github.com/apache/drill/pull/454#discussion_r60447085
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/ExternalSortBatch.java
 ---
@@ -550,7 +565,15 @@ public BatchGroup mergeAndSpill(LinkedList batchGroups) throws Schem
     c1.buildSchema(BatchSchema.SelectionVectorMode.NONE);
     c1.setRecordCount(count);

-    String outputFile = Joiner.on("/").join(dirs.next(), fileName, spillCount++);
+    String spillDir = dirs.next();
+    Path currSpillPath = new Path(Joiner.on("/").join(spillDir, fileName));
+    currSpillDirs.add(currSpillPath);
+    String outputFile = Joiner.on("/").join(currSpillPath, spillCount++);
+    try {
+      fs.deleteOnExit(currSpillPath);
+    } catch (IOException e) {
+      throw new RuntimeException(e);
--- End diff --

I have some concerns about throwing an exception here:

First, does it make sense to fail the query when `deleteOnExit()` fails? 
Shouldn't we just log a warning? After all, if all goes well, this folder will 
get deleted in the close method.

Second, if this exception is thrown, we'll leak memory, because we'll skip 
clearing batchGroupList and hyperBatch.
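
A best-effort alternative along the lines of the comment might look like the following standalone sketch. It uses java.nio and java.util.logging as stand-ins for the Hadoop FileSystem and Drill's logger; the shape of the idea is that a failed delete is logged, not rethrown, so a query never fails over a leftover spill directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.logging.Logger;

public class SpillCleanupSketch {
    private static final Logger logger = Logger.getLogger("spill-cleanup");

    // Best-effort cleanup: returns how many spill directories were deleted;
    // IOExceptions are downgraded to warnings instead of failing the caller.
    static int cleanupSpillDirs(List<Path> spillDirs) {
        int deleted = 0;
        for (Path dir : spillDirs) {
            try {
                if (Files.deleteIfExists(dir)) deleted++;
            } catch (IOException e) {
                logger.warning("could not delete spill dir " + dir + ": " + e);
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("drill_spill");
        int n = cleanupSpillDirs(List.of(dir));
        System.out.println("deleted " + n + " spill dirs");
    }
}
```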




Re: Drill v1.6 and s3n connection

2016-04-20 Thread Abhishek Girish
Thanks Jason! I hadn't noticed the config property for S3. I tried this out
now, and feel it is a lot easier.

And yes, we should definitely update the docs. There have been quite a few
threads related to S3 config.

On Wed, Apr 20, 2016 at 8:19 AM, Jason Altekruse  wrote:

> I don't believe there is any way in which a particular bucket has a
> property of being s3, s3n or s3a. As I understand it, this only changes the
> client library that is used to interface with S3. We have included the jars
> necessary for s3a with Drill, which is the newest and most performant
> option available.
>
> I need to open a doc JIRA for this, but there is one way in which the s3
> experience was improved recently to prevent the need to restart Drill to
> add your S3 credentials. When you create a connection to an S3 bucket, you
> can now specify your credentials in a property named "config" in the
> storage plugin. This allows you to set any filesystem properties, which we
> previously was only possible to set with a core-site.xml file on the
> classpath when starting Drill.
>
> Example:
> {
>   "type": "file",
>   "enabled": true,
>   "connection": "s3a://address.of.your.bucket/",
>   "config": {
> "fs.s3a.access.key": "",
> "fs.s3a.secret.key": ""
>   },
>   "workspaces": {
> "root": {
>   "location": "/",
>   "writable": false,
>   "defaultInputFormat": null
> }
>   },
>   "formats": {
> "psv": {
>   "type": "text",
>   "extensions": [
> "tbl"
>   ],
>   "delimiter": "|"
> }, ...
>
>
> Jason Altekruse
> Software Engineer at Dremio
> Apache Drill Committer
>
> On Wed, Apr 20, 2016 at 7:40 AM, Nick Monetta  wrote:
>
> > Hi,
> > Does Drill v1.6 still support s3n connections or just s3a?
> >
> > I have a s3n S3 bucket that I'm trying to connect to and it will not
> work.
> > My config is:
> >
> > {
> >   "type": "file",
> >   "enabled": true,
> >   "connection": "s3n://inrixprod-tapp/",
> >   "workspaces": {
> > "root": {
> >   "location": "/",
> >   "writable": false,
> >   "defaultInputFormat": null
> > },
> >
> > Nick Monetta | INRIX |ni...@inrix.com |Movement Intelligence |
> > www.inrix.com  | mobile +1 646-248-4105 |
> >
> >
> >
>


[GitHub] drill pull request: DRILL-2100: Added deleting temporary spill dir...

2016-04-20 Thread adeneche
Github user adeneche commented on a diff in the pull request:

https://github.com/apache/drill/pull/454#discussion_r60445363
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/ExternalSortBatch.java
 ---
@@ -116,6 +119,7 @@
   private boolean first = true;
   private int targetRecordCount;
   private final String fileName;
+  private Set currSpillDirs = new TreeSet();
--- End diff --

IntelliJ complains about an "unchecked assignment" here. Can you change 
this to `new TreeSet<>()` instead?
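For illustration, a small standalone example of the diamond operator being requested: the typed declaration lets the compiler infer and check the element type, silencing the unchecked-assignment warning.

```java
import java.util.Set;
import java.util.TreeSet;

public class DiamondDemo {
  public static void main(String[] args) {
    // `new TreeSet()` (raw type) compiles with an "unchecked assignment"
    // warning because the element type is erased; the diamond operator
    // infers <String> from the declaration on the left:
    Set<String> currSpillDirs = new TreeSet<>();
    currSpillDirs.add("/tmp/drill/spill/1");
    currSpillDirs.add("/tmp/drill/spill/0");
    // A TreeSet iterates in sorted order
    System.out.println(currSpillDirs);
  }
}
```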




Re: Operator unit test framework merged

2016-04-20 Thread Abdel Hakim Deneche
Great job Jason, this is much needed indeed.

On Wed, Apr 20, 2016 at 9:34 AM, Jason Altekruse  wrote:

> small correction: thank you Parth* for the review
>
> Jason Altekruse
> Software Engineer at Dremio
> Apache Drill Committer
>
> On Wed, Apr 20, 2016 at 9:23 AM, Jason Altekruse  wrote:
>
> > Hello all,
> >
> > I finally got a chance to do some final minor fixes and merge the
> operator
> > unit test framework I posted a while back, thanks again to Path for
> doing a
> > review on it. There are still some enhancements I would like to add to
> make
> > the tests more flexible, but for examples of what can be done with the
> > current version please check out the tests that were included with the
> > patch [1]. Please don't hesitate to ask questions or suggest
> improvements.
> > I think that writing tests in smaller units like this could go a long way
> > in improving our coverage and ensure that we can write tests that
> > consistently cover a particular execution path, independent of the query
> > planner.
> >
> > For anyone looking to get more familiar with how Drill executes
> > operations, these tests might be an easier way to start getting
> > acquainted with the internals of Drill. The tests mock a number of the
> more
> > complex parts of the system and try to produce a minimal environment
> where
> > a single operation can run.
> >
> > [1] -
> >
> https://github.com/apache/drill/blob/d93a3633815ed1c7efd6660eae62b7351a2c9739/exec/java-exec/src/test/java/org/apache/drill/exec/physical/unit/BasicPhysicalOpUnitTest.java
> >
> > Jason Altekruse
> > Software Engineer at Dremio
> > Apache Drill Committer
> >
>



-- 

Abdelhakim Deneche

Software Engineer

  





Re: Operator unit test framework merged

2016-04-20 Thread Jason Altekruse
small correction: thank you Parth* for the review

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Wed, Apr 20, 2016 at 9:23 AM, Jason Altekruse  wrote:

> Hello all,
>
> I finally got a chance to do some final minor fixes and merge the operator
> unit test framework I posted a while back, thanks again to Path for doing a
> review on it. There are still some enhancements I would like to add to make
> the tests more flexible, but for examples of what can be done with the
> current version please check out the tests that were included with the
> patch [1]. Please don't hesitate to ask questions or suggest improvements.
> I think that writing tests in smaller units like this could go a long way
> in improving our coverage and ensure that we can write tests that
> consistently cover a particular execution path, independent of the query
> planner.
>
> For anyone looking to get more familiar with how Drill executes
> operations, these tests might be an easier way to start getting
> acquainted with the internals of Drill. The tests mock a number of the more
> complex parts of the system and try to produce a minimal environment where
> a single operation can run.
>
> [1] -
> https://github.com/apache/drill/blob/d93a3633815ed1c7efd6660eae62b7351a2c9739/exec/java-exec/src/test/java/org/apache/drill/exec/physical/unit/BasicPhysicalOpUnitTest.java
>
> Jason Altekruse
> Software Engineer at Dremio
> Apache Drill Committer
>


Operator unit test framework merged

2016-04-20 Thread Jason Altekruse
Hello all,

I finally got a chance to do some final minor fixes and merge the operator
unit test framework I posted a while back, thanks again to Path for doing a
review on it. There are still some enhancements I would like to add to make
the tests more flexible, but for examples of what can be done with the
current version please check out the tests that were included with the
patch [1]. Please don't hesitate to ask questions or suggest improvements.
I think that writing tests in smaller units like this could go a long way
in improving our coverage and ensure that we can write tests that
consistently cover a particular execution path, independent of the query
planner.

For anyone looking to get more familiar with how Drill executes operations,
these tests might be an easier way to start getting acquainted with
the internals of Drill. The tests mock a number of the more complex parts
of the system and try to produce a minimal environment where a single
operation can run.

[1] -
https://github.com/apache/drill/blob/d93a3633815ed1c7efd6660eae62b7351a2c9739/exec/java-exec/src/test/java/org/apache/drill/exec/physical/unit/BasicPhysicalOpUnitTest.java

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer


[jira] [Created] (DRILL-4620) Drill query Hbase table got base64 encoded results while Hbase Shell show table content correctly

2016-04-20 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-4620:
--

 Summary: Drill query Hbase table got base64 encoded results while 
Hbase Shell show table content correctly 
 Key: DRILL-4620
 URL: https://issues.apache.org/jira/browse/DRILL-4620
 Project: Apache Drill
  Issue Type: Bug
Reporter: Chunhui Shi


Created a table using the HBase shell, following the steps in 
https://www.mapr.com/blog/secondary-indexing-mapr-db-using-elasticsearch. 
However, querying the generated table in Drill shows base64 encoded results 
rather than the correct plaintext, as shown below:

[root@atsqa4-128 ~]# hbase shell
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 0.98.12-mapr-1602, rcf7a299d9b0a24150d4a13cbce7fc9eac9b2404d, Tue Mar  
1 19:32:45 UTC 2016

Not all HBase shell commands are applicable to MapR tables.
Consult MapR documentation for the list of supported commands.

hbase(main):001:0> scan '/user/person'
ROW  COLUMN+CELL
 1   column=details:address, timestamp=1461110148447, value=350 Holger Way
 1   column=details:fname, timestamp=1461110112541, value=Tom
 1   column=details:lname, timestamp=1461110121828, value=John
 2   column=details:address, timestamp=1461110227143, value=340 Holger Way
 2   column=details:fname, timestamp=1461110171622, value=David
 2   column=details:lname, timestamp=1461110189721, value=Robert
 3   column=details:address, timestamp=1461110282174, value=310 Holger Way
 3   column=details:fname, timestamp=1461110248477, value=Samuel
 3   column=details:lname, timestamp=1461110268460, value=Trump
 4   column=details:address, timestamp=1461110355548, value=100 Zanker Ave
 4   column=details:fname, timestamp=1461110307194, value=Christina
 4   column=details:lname, timestamp=1461110332695, value=Rogers
4 row(s) in 0.1380 seconds

hbase(main):002:0> exit
[root@atsqa4-128 ~]# /opt/mapr/drill/drill-1.7.0/bin/sqlline -u 
"jdbc:drill:zk=10.10.88.125:5181"
apache drill 1.7.0-SNAPSHOT 
"what ever the mind of man can conceive and believe, drill can query"
0: 

[GitHub] drill pull request: DRILL-4437: Operator unit tests

2016-04-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/394




Re: Drill v1.6 and s3n connection

2016-04-20 Thread Oscar Morante

I think you need to set up jets3t if you want to use s3n.

On Wed, Apr 20, 2016 at 02:40:29PM +, Nick Monetta wrote:

Hi,
Does Drill v1.6 still support s3n connections or just s3a?

I have a s3n S3 bucket that I'm trying to connect to and it will not work. My 
config is:

{
 "type": "file",
 "enabled": true,
 "connection": "s3n://inrixprod-tapp/",
 "workspaces": {
   "root": {
 "location": "/",
 "writable": false,
 "defaultInputFormat": null
   },

Nick Monetta | INRIX |ni...@inrix.com |Movement Intelligence | www.inrix.com  | 
mobile +1 646-248-4105 |




--
Oscar Morante
"Self-education is, I firmly believe, the only kind of education there is."
 -- Isaac Asimov.




Re: Drill v1.6 and s3n connection

2016-04-20 Thread Jason Altekruse
I don't believe there is any way in which a particular bucket has a
property of being s3, s3n or s3a. As I understand it, this only changes the
client library that is used to interface with S3. We have included the jars
necessary for s3a with Drill, which is the newest and most performant
option available.

I need to open a doc JIRA for this, but there is one way in which the s3
experience was improved recently to prevent the need to restart Drill to
add your S3 credentials. When you create a connection to an S3 bucket, you
can now specify your credentials in a property named "config" in the
storage plugin. This allows you to set any filesystem properties, which we
previously was only possible to set with a core-site.xml file on the
classpath when starting Drill.

Example:
{
  "type": "file",
  "enabled": true,
  "connection": "s3a://address.of.your.bucket/",
  "config": {
"fs.s3a.access.key": "",
"fs.s3a.secret.key": ""
  },
  "workspaces": {
"root": {
  "location": "/",
  "writable": false,
  "defaultInputFormat": null
}
  },
  "formats": {
"psv": {
  "type": "text",
  "extensions": [
"tbl"
  ],
  "delimiter": "|"
}, ...


Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Wed, Apr 20, 2016 at 7:40 AM, Nick Monetta  wrote:

> Hi,
> Does Drill v1.6 still support s3n connections or just s3a?
>
> I have a s3n S3 bucket that I'm trying to connect to and it will not work.
> My config is:
>
> {
>   "type": "file",
>   "enabled": true,
>   "connection": "s3n://inrixprod-tapp/",
>   "workspaces": {
> "root": {
>   "location": "/",
>   "writable": false,
>   "defaultInputFormat": null
> },
>
> Nick Monetta | INRIX |ni...@inrix.com |Movement Intelligence |
> www.inrix.com  | mobile +1 646-248-4105 |
>
>
>

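Conceptually, the `config` map in the storage plugin above plays the role that `core-site.xml` entries used to: each key/value pair is applied as a filesystem property. A rough sketch under that assumption; `MiniConf` is a hypothetical stand-in for Hadoop's `org.apache.hadoop.conf.Configuration`, and the credential values are placeholders.

```java
import java.util.HashMap;
import java.util.Map;

public class PluginConfigDemo {
  // Hypothetical stand-in for org.apache.hadoop.conf.Configuration
  static final class MiniConf {
    private final Map<String, String> props = new HashMap<>();
    void set(String k, String v) { props.put(k, v); }
    String get(String k) { return props.get(k); }
  }

  // Apply every entry of the plugin's "config" map as a filesystem
  // property, mimicking what core-site.xml entries used to provide.
  static MiniConf applyPluginConfig(Map<String, String> pluginConfig) {
    MiniConf conf = new MiniConf();
    for (Map.Entry<String, String> e : pluginConfig.entrySet()) {
      conf.set(e.getKey(), e.getValue());
    }
    return conf;
  }

  public static void main(String[] args) {
    Map<String, String> config = Map.of(
        "fs.s3a.access.key", "EXAMPLEKEY",     // placeholder credential
        "fs.s3a.secret.key", "EXAMPLESECRET"); // placeholder credential
    System.out.println(applyPluginConfig(config).get("fs.s3a.access.key"));
  }
}
```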

Drill v1.6 and s3n connection

2016-04-20 Thread Nick Monetta
Hi,
Does Drill v1.6 still support s3n connections or just s3a?

I have a s3n S3 bucket that I'm trying to connect to and it will not work. My 
config is:

{
  "type": "file",
  "enabled": true,
  "connection": "s3n://inrixprod-tapp/",
  "workspaces": {
"root": {
  "location": "/",
  "writable": false,
  "defaultInputFormat": null
},

Nick Monetta | INRIX |ni...@inrix.com |Movement Intelligence | www.inrix.com  | 
mobile +1 646-248-4105 |




Storing the output from the query

2016-04-20 Thread Fayaz Basha. Shaik
Hi Team,

Is it possible to store the output of a query from Drill, for example via some 
export option?

Need: I would like to get the strings in double quotes.

I used the following configuration and steps:
Configuration:
"csv": {
  "type": "text",
  "extensions": [
"csv"
  ],
  "quote": "\u",
  "delimiter": ","
},


use dfs.tmp;

alter session set `store.format`='csv';

create table dfs.tmp.mytab as select cast(tb1.id as varchar(4)), 
flatten(tb1.batters.batter) FROM dfs.`C:\Users\Desktop\Drill\donuts.json` as 
tb1;

The output in the file looks like:
EXPR$0|EXPR$1
0001|{"id":"1001","type":"Regular"}
0001|{"id":"1002","type":"Chocolate"}

I would like to get it as
"0001"|"{"id":"1001","type":"Regular"}"
"0001"|"{"id":"1002","type":"Chocolate"}"

Basically, the output of string columns should be represented in double quotes. 
Is there any configuration option available for the same?


Regards,
Fayaz
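The text-format configuration shown above does not appear to expose such an option, but as a workaround sketch (not a Drill feature) the exported file could be post-processed to wrap every field in double quotes. This assumes a `|` delimiter and no delimiter characters embedded inside field values.

```java
public class QuoteFields {
  // Naive quoting: wraps each field in double quotes; it does not escape
  // quotes already inside a field and assumes the delimiter never occurs
  // inside field values.
  static String quoteLine(String line, char delim) {
    StringBuilder out = new StringBuilder();
    for (String field : line.split("\\" + delim, -1)) {
      if (out.length() > 0) out.append(delim);
      out.append('"').append(field).append('"');
    }
    return out.toString();
  }

  public static void main(String[] args) {
    // One of the output lines from the question above:
    System.out.println(quoteLine("0001|{\"id\":\"1001\",\"type\":\"Regular\"}", '|'));
  }
}
```

Applied line by line, this produces exactly the quoted form asked for.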



[jira] [Resolved] (DRILL-4459) SchemaChangeException while querying hive json table

2016-04-20 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka resolved DRILL-4459.

Resolution: Fixed

> SchemaChangeException while querying hive json table
> 
>
> Key: DRILL-4459
> URL: https://issues.apache.org/jira/browse/DRILL-4459
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill, Functions - Hive
>Affects Versions: 1.4.0
> Environment: MapR-Drill 1.4.0
> Hive-1.2.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
> Fix For: 1.7.0
>
>
> Getting a SchemaChangeException while querying JSON documents stored in a 
> Hive table.
> {noformat}
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to 
> materialize incoming schema.  Errors:
>  
> Error in expression at index -1.  Error: Missing function implementation: 
> [castBIT(VAR16CHAR-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..
> {noformat}
> minimum reproduce
> {noformat}
> created sample json documents using the attached script(randomdata.sh)
> hive>create table simplejson(json string);
> hive>load data local inpath '/tmp/simple.json' into table simplejson;
> now query it through Drill.
> Drill Version
> select * from sys.version;
> +---++-+-++
> | commit_id | commit_message | commit_time | build_email | build_time |
> +---++-+-++
> | eafe0a245a0d4c0234bfbead10c6b2d7c8ef413d | DRILL-3901:  Don't do early 
> expansion of directory in the non-metadata-cache case because it already 
> happens during ParquetGroupScan's metadata gathering operation. | 07.10.2015 
> @ 17:12:57 UTC | Unknown | 07.10.2015 @ 17:36:16 UTC |
> +---++-+-++
> 0: jdbc:drill:zk=> select * from hive.`default`.simplejson where 
> GET_JSON_OBJECT(simplejson.json, '$.DocId') = 'DocId2759947' limit 1;
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to 
> materialize incoming schema.  Errors:
>  
> Error in expression at index -1.  Error: Missing function implementation: 
> [castBIT(VAR16CHAR-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..
> Fragment 1:1
> [Error Id: 74f054a8-6f1d-4ddd-9064-3939fcc82647 on ip-10-0-0-233:31010] 
> (state=,code=0)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-4237) Skew in hash distribution

2016-04-20 Thread Chunhui Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi resolved DRILL-4237.

   Resolution: Fixed
 Reviewer: Aman Sinha
Fix Version/s: 1.7.0

> Skew in hash distribution
> -
>
> Key: DRILL-4237
> URL: https://issues.apache.org/jira/browse/DRILL-4237
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.4.0
>Reporter: Aman Sinha
>Assignee: Chunhui Shi
> Fix For: 1.7.0
>
>
> Apparently, the fix in DRILL-4119 did not fully resolve the data skew issue.  
> It worked fine on the smaller sample of the data set but on another sample of 
> the same data set, it still produced skewed values; see the hash values 
> below, which are all odd numbers. 
> {noformat}
> 0: jdbc:drill:zk=local> select columns[0], hash32(columns[0]) from `test.csv` 
> limit 10;
> +---+--+
> |  EXPR$0   |EXPR$1|
> +---+--+
> | f71aaddec3316ae18d43cb1467e88a41  | 1506011089   |
> | 3f3a13bb45618542b5ac9d9536704d3a  | 1105719049   |
> | 6935afd0c693c67bba482cedb7a2919b  | -18137557|
> | ca2a938d6d7e57bda40501578f98c2a8  | -1372666789  |
> | fab7f08402c8836563b0a5c94dbf0aec  | -1930778239  |
> | 9eb4620dcb68a84d17209da279236431  | -970026001   |
> | 16eed4a4e801b98550b4ff504242961e  | 356133757|
> | a46f7935fea578ce61d8dd45bfbc2b3d  | -94010449|
> | 7fdf5344536080c15deb2b5a2975a2b7  | -141361507   |
> | b82560a06e2e51b461c9fe134a8211bd  | -375376717   |
> +---+--+
> {noformat}
> This indicates an underlying issue with the XXHash64 java implementation, 
> which is Drill's implementation of the C version.  One of the key differences, 
> as pointed out by [~jnadeau], was the use of unsigned int64 in the C version 
> compared to the Java version which uses (signed) long.  I created an XXHash 
> version using com.google.common.primitives.UnsignedLong.  However, 
> UnsignedLong does not have bit-wise operations that are needed for XXHash 
> such as rotateLeft(),  XOR etc.  One could write wrappers for these but at 
> this point, the question is: should we think of an alternative hash function 
> ? 
> The alternative approach could be the murmur hash for numeric data types that 
> we were using earlier and the Mahout version of hash function for string 
> types 
> (https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/HashHelper.java#L28).
>   As a test, I reverted to this function and was getting good hash 
> distribution for the test data. 
> I could not find any performance comparisons of our perf tests (TPC-H or DS) 
> with the original and newer (XXHash) hash functions.  If performance is 
> comparable, should we revert to the original function ?  
> As an aside, I would like to remove the hash64 versions of the functions 
> since these are not used anywhere. 
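To see why all-odd hash values are a problem, note that rows are typically assigned to receivers with something like `hash mod numReceivers`: with an even receiver count, odd-only hashes leave half the buckets empty. A small illustration using the sample hash values reported above (the receiver count of 2 is hypothetical):

```java
public class HashSkewDemo {
  public static void main(String[] args) {
    // hash32() values reported in this issue -- note every one is odd
    long[] hashes = {1506011089L, 1105719049L, -18137557L, -1372666789L,
        -1930778239L, -970026001L, 356133757L, -94010449L, -141361507L, -375376717L};
    int receivers = 2; // hypothetical even receiver count
    int[] counts = new int[receivers];
    for (long h : hashes) {
      // floorMod keeps the bucket index non-negative for negative hashes
      counts[(int) Math.floorMod(h, (long) receivers)]++;
    }
    // With all-odd hashes, bucket 0 stays completely empty.
    System.out.println("bucket0=" + counts[0] + " bucket1=" + counts[1]);
  }
}
```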



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-4478) binary_string cannot convert buffer that were not start from 0 correctly

2016-04-20 Thread Chunhui Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi resolved DRILL-4478.

   Resolution: Fixed
 Reviewer: Aman Sinha
Fix Version/s: 1.7.0

> binary_string cannot convert buffer that were not start from 0 correctly
> 
>
> Key: DRILL-4478
> URL: https://issues.apache.org/jira/browse/DRILL-4478
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Codegen
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
> Fix For: 1.7.0
>
>
> When binary_string is called multiple times, it only converts the first 
> call correctly, where the drillbuf starts from 0. For the second and 
> subsequent calls the drillbuf does not start from 0, so 
> DrillStringUtils.parseBinaryString cannot do the conversion correctly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)