[jira] [Commented] (DRILL-1851) Need some samples for RANK(), ROW_NUMBER(), SubQuery in SELECTStatement - Apache Drill

2015-10-28 Thread Tom Barber (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978282#comment-14978282
 ] 

Tom Barber commented on DRILL-1851:
---

Hello folks, ROW_NUMBER works for me in 1.2, so I don't know how much of this 
ticket remains open:

select ROW_NUMBER() OVER (ORDER BY columns[0]), columns[0] from 
dfs.`/home/bugg/tmp/hads/` limit 10;
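
For the RANK() part of the original request, an analogous query should work in 
the same style (untested sketch against the same sample path):

{code}
select RANK() OVER (ORDER BY columns[0]) as rnk, columns[0]
from dfs.`/home/bugg/tmp/hads/` limit 10;
{code}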

Just thought I'd let those watching know!

Tom

> Need some samples for RANK(), ROW_NUMBER(), SubQuery in SELECTStatement - 
> Apache Drill
> --
>
> Key: DRILL-1851
> URL: https://issues.apache.org/jira/browse/DRILL-1851
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Functions - Drill
>Affects Versions: 0.6.0
> Environment: Drill SQL
>Reporter: Chandru
>Priority: Critical
> Fix For: Future
>
> Attachments: Issue_Hugefiles-Drill.jpg
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Provide some sample queries for the below scenarios,
> 1.RANK( ) function
> rank() over (ORDER BY columns[0] DESC)  as rowcount
> 2. ROW_NUMBER( ) function
> row_number() over (PARTITION BY  columns[0] ORDER BY columns[1] DESC) as 
> rowcount
> 3. SubQuery in Select Statement.
> SELECT paccnt.R_NAME, 
> CAST((SELECT N_NATIONKEY FROM 
> dfs.`/home/user/drill/drill-0.6.0/sample-data/nation.parquet`) AS CHAR) AS 
> NUMB
> from dfs.`/home/user/drill/drill-0.6.0/sample-data/region.parquet` as paccnt;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-3990) Create a sys.fragments table

2015-10-28 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated DRILL-3990:
--
Description: Similar to DRILL-3989, we should create a table which lists 
all the currently executing fragments. This could include the query they are 
associated with, the start and stop time, the node they are executing on and 
maybe a couple of key metrics (e.g. records consumed, records produced, current 
and peak memory consumed). This could also be  modeled after the sys.threads 
and sys.memory tables.  (was: Similar to DRILL-3988, we should create a table 
which lists all the currently executing fragments. This could include the query 
they are associated with, the start and stop time, the node they are executing 
on and maybe a couple of key metrics (e.g. records consumed, records produced, 
current and peak memory consumed). This could also be  modeled after the 
sys.threads and sys.memory tables.)

> Create a sys.fragments table
> 
>
> Key: DRILL-3990
> URL: https://issues.apache.org/jira/browse/DRILL-3990
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Metadata
>Reporter: Jacques Nadeau
>  Labels: newbie
>
> Similar to DRILL-3989, we should create a table which lists all the currently 
> executing fragments. This could include the query they are associated with, 
> the start and stop time, the node they are executing on and maybe a couple of 
> key metrics (e.g. records consumed, records produced, current and peak memory 
> consumed). This could also be  modeled after the sys.threads and sys.memory 
> tables.





[jira] [Created] (DRILL-3990) Create a sys.fragments table

2015-10-28 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created DRILL-3990:
-

 Summary: Create a sys.fragments table
 Key: DRILL-3990
 URL: https://issues.apache.org/jira/browse/DRILL-3990
 Project: Apache Drill
  Issue Type: Improvement
  Components: Metadata
Reporter: Jacques Nadeau


Similar to DRILL-3988, we should create a table which lists all the currently 
executing fragments. This could include the query they are associated with, the 
start and stop time, the node they are executing on and maybe a couple of key 
metrics (e.g. records consumed, records produced, current and peak memory 
consumed). This could also be  modeled after the sys.threads and sys.memory 
tables.
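
As a sketch of how such a table might be used (all column names here are 
hypothetical, for illustration only):

{code}
select query_id, fragment_id, hostname, start_time, records_produced
from sys.fragments
order by start_time;
{code}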





[jira] [Updated] (DRILL-3988) Create a sys.functions table to expose available Drill functions

2015-10-28 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated DRILL-3988:
--
Labels: newbie  (was: )

> Create a sys.functions table to expose available Drill functions
> 
>
> Key: DRILL-3988
> URL: https://issues.apache.org/jira/browse/DRILL-3988
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Metadata
>Reporter: Jacques Nadeau
>  Labels: newbie
>
> Create a new sys.functions table that returns a list of all available 
> functions.
> Key considerations: 
> - one row per name or one per argument set. I'm inclined to latter so people 
> can use queries to get to data.
> - we need to create a delineation between user functions and internal 
> functions and only show user functions. 'CastInt' isn't something the user 
> should be able to see (or run).
> - should we add a description annotation that could be included in the 
> sys.functions table?
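
Assuming one row per argument set, a user could then discover overloads with a 
query along these lines (sketch only; column names are hypothetical):

{code}
select name, argument_types, return_type
from sys.functions
where name = 'substr';
{code}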





[jira] [Updated] (DRILL-3750) Not able to connect to HDFS and/or Hive

2015-10-28 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated DRILL-3750:
--
Labels:   (was: newbie)

> Not able to connect to HDFS and/or Hive
> ---
>
> Key: DRILL-3750
> URL: https://issues.apache.org/jira/browse/DRILL-3750
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill, Metadata, Storage - Hive, Storage - 
> Text & CSV
>Affects Versions: 1.1.0
> Environment: apache hadoop and apache drill
>Reporter: ravi ranjan kumar
> Fix For: Future
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I am not able to connect to, or fetch data from, Hive storage and HDFS 
> storage using select queries.
> hive storage config - 
> {
>   "type": "hive",
>   "enabled": true,
>   "configProps": {
> "hive.metastore.uris": "thrift://192.168.146.138:9083",
> "javax.jdo.option.ConnectionURL": 
> "jdbc:derby:;databaseName=/home/ravi/bigdata/hive-1.0.1/metastore_db;create=true",
> "hive.metastore.warehouse.dir": "/tmp/hive",
> "fs.default.name": "hdfs://192.168.146.136:9000/",
> "hive.metastore.sasl.enabled": "false"
>   }
> }
> Query -  select * from hive.`customers`
> ERROR - org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
> EOFException [Error Id: 29746c1e-90fc-41f6-8263-ce943354b07e on ubuntu:31010]
> HDFS storage config -
> {
>   "type": "file",
>   "enabled": true,
>   "connection": "hdfs://192.168.146.136:9000/",
>   "workspaces": {
> "root": {
>   "location": "/",
>   "writable": false,
>   "defaultInputFormat": null
> },
> "tmp": {
>   "location": "/tmp",
>   "writable": true,
>   "defaultInputFormat": null
> }
>   },
>   "formats": {
> "psv": {
>   "type": "text",
>   "extensions": [
> "tbl"
>   ],
>   "delimiter": "|"
> },
> "csv": {
>   "type": "text",
>   "extensions": [
> "csv"
>   ],
>   "delimiter": ","
> },
> "tsv": {
>   "type": "text",
>   "extensions": [
> "tsv"
>   ],
>   "delimiter": "\t"
> },
> "parquet": {
>   "type": "parquet"
> },
> "json": {
>   "type": "json"
> },
> "avro": {
>   "type": "avro"
> }
>   }
> }
> Query  - select * from hdfs.`/customers.csv`
> ERROR - org.apache.drill.common.exceptions.UserRemoteException: PARSE ERROR: 
> From line 1, column 15 to line 1, column 18: Table 'hdfs./customers.csv' not 
> found [Error Id: 13df2ccb-01bd-480f-966c-ceda7e1503a8 on ubuntu:31010]
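
The parse error suggests Drill is not resolving `hdfs` as a storage plugin at 
all. A common first check (sketch; the plugin and workspace names must match 
your configuration) is to confirm the plugin is registered and to qualify the 
workspace explicitly:

{code}
show databases;  -- should list hdfs.root and hdfs.tmp if the plugin is enabled
select * from hdfs.root.`/customers.csv`;
{code}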





[jira] [Created] (DRILL-3989) Create a sys.queries table

2015-10-28 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created DRILL-3989:
-

 Summary: Create a sys.queries table
 Key: DRILL-3989
 URL: https://issues.apache.org/jira/browse/DRILL-3989
 Project: Apache Drill
  Issue Type: Bug
  Components: Metadata
Reporter: Jacques Nadeau


We should create a sys.queries table that provides a clusterwide view of active 
queries. It could include the following columns:

queryid, user, sql, current status, number of nodes involved, number of total 
fragments, number of fragments completed, start time

This should be a pretty straightforward task as we should be able to leverage 
the capabilities around required affinity. A great model to build off of are 
the sys.memory and sys.threads tables.
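
For example, a DBA might find long-running queries with something like this 
(sketch only; the column names are hypothetical):

{code}
select query_id, `user`, start_time, status
from sys.queries
where status = 'RUNNING'
order by start_time;
{code}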







[jira] [Commented] (DRILL-3937) We are not pruning when we have a metadata cache and auto partitioned data in some cases

2015-10-28 Thread Aman Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978457#comment-14978457
 ] 

Aman Sinha commented on DRILL-3937:
---

I found 2 more issues during testing. I will upload a new PR for this along 
with unit tests. 

> We are not pruning when we have a metadata cache and auto partitioned data in 
> some cases
> 
>
> Key: DRILL-3937
> URL: https://issues.apache.org/jira/browse/DRILL-3937
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Reporter: Rahul Challapalli
>Assignee: Aman Sinha
> Attachments: 1_0_9998.parquet, 1_0_.parquet
>
>
> git.commit.id.abbrev=2736412
> The below plan indicates that we are not pruning
> {code}
> explain plan for select count(*) from dfs.`/drill/comscore/orders2` where 
> o_clerk='Clerk#79443';
> +--+--+
> | text | json |
> +--+--+
> | 00-00Screen
> 00-01  Project(EXPR$0=[$0])
> 00-02StreamAgg(group=[{}], EXPR$0=[COUNT()])
> 00-03  Project($f0=[0])
> 00-04SelectionVectorRemover
> 00-05  Filter(condition=[=($0, 'Clerk#79443')])
> 00-06Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath 
> [path=maprfs:///drill/comscore/orders2/1_0_.parquet], ReadEntryWithPath 
> [path=maprfs:///drill/comscore/orders2/1_0_9998.parquet]], 
> selectionRoot=/drill/comscore/orders2, numFiles=2, usedMetadataFile=true, 
> columns=[`o_clerk`]]])
> {code}
> Error from the logs
> {code}
> 2015-10-15 01:24:28,467 [29e0ffb4-1c91-f40a-8bf0-5e3665dcf107:foreman] WARN  
> o.a.d.e.p.l.partition.PruneScanRule - Exception while trying to prune 
> partition.
> java.lang.ClassCastException: java.util.LinkedHashMap cannot be cast to 
> parquet.io.api.Binary
> at 
> org.apache.drill.exec.store.parquet.ParquetGroupScan.populatePruningVector(ParquetGroupScan.java:414)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.planner.ParquetPartitionDescriptor.populatePartitionVectors(ParquetPartitionDescriptor.java:96)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:212)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.planner.logical.partition.ParquetPruneScanRule$2.onMatch(ParquetPruneScanRule.java:87)
>  [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228)
>  [calcite-core-1.4.0-drill-r6.jar:1.4.0-drill-r6]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:808)
>  [calcite-core-1.4.0-drill-r6.jar:1.4.0-drill-r6]
> at 
> org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:303) 
> [calcite-core-1.4.0-drill-r6.jar:1.4.0-drill-r6]
> at 
> org.apache.calcite.prepare.PlannerImpl.transform(PlannerImpl.java:303) 
> [calcite-core-1.4.0-drill-r6.jar:1.4.0-drill-r6]
> at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.logicalPlanningVolcanoAndLopt(DefaultSqlHandler.java:545)
>  [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:213)
>  [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:248)
>  [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.planner.sql.handlers.ExplainHandler.getPlan(ExplainHandler.java:61)
>  [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:178)
>  [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:905) 
> [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:244) 
> [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_71]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_71]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> {code}
> The partition column type in this case is binary which could be causing the 
> issue. 
> Partition pruning seems to be working when we have Metadata Caching + Auto 
> Partitioned Files with integer partition column 





[jira] [Updated] (DRILL-3951) Lexical Errors in ODBC Queries

2015-10-28 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated DRILL-3951:
--
Labels:   (was: newbie)

> Lexical Errors in ODBC Queries
> --
>
> Key: DRILL-3951
> URL: https://issues.apache.org/jira/browse/DRILL-3951
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - ODBC
>Affects Versions: 1.1.0, 1.2.0
> Environment: Mac OS 10.11, Apache Drill v. 1.2, Python 3.4, 
>Reporter: Charles Givre
>
> I followed the instructions to install the latest version of Apache Drill 
> and the MapR ODBC drivers, but when I attempt to query a data source via 
> ODBC, I get the following errors:
> Error: ('HY000', '[HY000] [MapR][Drill] (1040) Drill failed to execute the 
> query: `\n[30027]Query execution error. Details:[ \nPARSE 
> ERROR: Lexical error at line 1, column 1.  Encountered: "\\ufffd" (65533), 
> after : ""\n\n\n[Error Id: 8e1f4049-f3e9-477f-9e3f-5df62c (1040) 
> (SQLExecDirectW)')
> Here is the code which generates the errors:
> import pyodbc
> import pandas as pd
> MY_DSN = 
> "DRIVER=/opt/mapr/drillodbc/lib/universal/libmaprdrillodbc.dylib;Host=localhost;Port=31010;ConnectionType=Direct;Catalog=Drill;Schema=mfs.views;AuthenticationType=No
>  Authentication"
> conn = pyodbc.connect(MY_DSN, autocommit=True)
> cursor = conn.cursor()
> employee_query = "SELECT * FROM dfs.`employee.json`"
> data = pd.read_sql( employee_query, conn )





[jira] [Updated] (DRILL-3925) Implementing and Configuring a Custom Authenticator

2015-10-28 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated DRILL-3925:
--
Labels:   (was: newbie)

> Implementing and Configuring a Custom Authenticator
> ---
>
> Key: DRILL-3925
> URL: https://issues.apache.org/jira/browse/DRILL-3925
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Documentation, Execution - RPC, Functions - Drill
>Affects Versions: 1.1.0
> Environment: MacOSX, Linux
>Reporter: Tri Dung Le
>
> https://drill.apache.org/docs/configuring-user-authentication/#implementing-and-configuring-a-custom-authenticator
> I have been reading this tutorial to implement a custom authenticator for 
> Apache Drill, but I get an error. Please help me figure out the problem. 
> Full details below:
> {quote}
> Error: Failure in starting embedded Drillbit: 
> org.apache.drill.exec.exception.DrillbitStartupException: Failed to find the 
> implementation of '{}' for type '{}' (state=,code=0)
> java.sql.SQLException: Failure in starting embedded Drillbit: 
> org.apache.drill.exec.exception.DrillbitStartupException: Failed to find the 
> implementation of '{}' for type '{}'
>   at 
> org.apache.drill.jdbc.impl.DrillConnectionImpl.<init>(DrillConnectionImpl.java:109)
>   at 
> org.apache.drill.jdbc.impl.DrillJdbc41Factory.newDrillConnection(DrillJdbc41Factory.java:66)
>   at 
> org.apache.drill.jdbc.impl.DrillFactory.newConnection(DrillFactory.java:69)
>   at 
> net.hydromatic.avatica.UnregisteredDriver.connect(UnregisteredDriver.java:126)
>   at org.apache.drill.jdbc.Driver.connect(Driver.java:78)
>   at sqlline.DatabaseConnection.connect(DatabaseConnection.java:167)
>   at sqlline.DatabaseConnection.getConnection(DatabaseConnection.java:213)
>   at sqlline.Commands.connect(Commands.java:1083)
>   at sqlline.Commands.connect(Commands.java:1015)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:36)
>   at sqlline.SqlLine.dispatch(SqlLine.java:734)
>   at sqlline.SqlLine.initArgs(SqlLine.java:519)
>   at sqlline.SqlLine.begin(SqlLine.java:587)
>   at sqlline.SqlLine.start(SqlLine.java:366)
>   at sqlline.SqlLine.main(SqlLine.java:259)
> Caused by: org.apache.drill.exec.exception.DrillbitStartupException: Failed 
> to find the implementation of '{}' for type '{}'
>   at 
> org.apache.drill.exec.rpc.user.security.UserAuthenticatorFactory.createAuthenticator(UserAuthenticatorFactory.java:104)
>   at org.apache.drill.exec.rpc.user.UserServer.<init>(UserServer.java:75)
>   at 
> org.apache.drill.exec.service.ServiceEngine.<init>(ServiceEngine.java:57)
>   at org.apache.drill.exec.server.Drillbit.<init>(Drillbit.java:184)
>   at 
> org.apache.drill.jdbc.impl.DrillConnectionImpl.<init>(DrillConnectionImpl.java:99)
>   ... 18 more
> apache drill 1.0.0 
> "a drill is a terrible thing to waste"
> {quote}





[jira] [Updated] (DRILL-3933) Error execute select command line sqlline -u -q

2015-10-28 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated DRILL-3933:
--
Labels: bug  (was: bug newbie)

> Error execute select command line sqlline -u -q
> ---
>
> Key: DRILL-3933
> URL: https://issues.apache.org/jira/browse/DRILL-3933
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Jon
>  Labels: bug
>
> I'm a newbie with Drill and JIRA, so sorry if this is not the correct place.
> When I query : "sqlline -u 'jdbc:drill:drillbit=localhost' -q 'select * from 
> hive.database.table;' " return: 
> "select anaconda-ks.cfg build.out install.log install.log.syslog 
> ranger_tutorial sandbox.info start_ambari.sh start_hbase.sh start_solr.sh 
> stop_solr.sh from hive.database.table;"
> Error: PARSE ERROR: Encountered "." at line 1, column 29.
> Was expecting one of:
> "FROM" ...
> "," ...
> So, to fix this, I would have to type out all the column names to make it work.
> But if I use the UI at localhost:8047/query, the query works. Drill is 
> connected to Hive with the plugin, of course. Is this a bug or a 
> configuration problem?





[jira] [Commented] (DRILL-3987) Create a POC VV extraction

2015-10-28 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978633#comment-14978633
 ] 

Jacques Nadeau commented on DRILL-3987:
---

One of the things I'm looking at here is the right separation between 
describing a schema and Drill's concepts of MaterializedField and SchemaPath. 
It seems like we need a simplified MaterializedField in the vector classes and 
then a specialization that supports things like Drill's logical expressions in 
the Drill codebase. What do you think [~hgunes]?

> Create a POC VV extraction
> --
>
> Key: DRILL-3987
> URL: https://issues.apache.org/jira/browse/DRILL-3987
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
>
> I'd like to start by looking at an extraction that pulls out the base 
> concepts of:
> buffer allocation, value vectors and complexwriter/fieldreader.
> I need to figure out how to resolve some of the cross-dependency issues (such 
> as the jdbc accessor connections).





[jira] [Updated] (DRILL-1491) Support for JDK 8

2015-10-28 Thread Patrick Wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wong updated DRILL-1491:

Attachment: DRILL-1491.1.patch.txt

DRILL-1491.1.patch.txt - allow JDK 8

> Support for JDK 8
> -
>
> Key: DRILL-1491
> URL: https://issues.apache.org/jira/browse/DRILL-1491
> Project: Apache Drill
>  Issue Type: Task
>  Components: Tools, Build & Test
>Reporter: Aditya Kishore
> Fix For: Future
>
> Attachments: DRILL-1491.1.patch.txt
>
>
> This will be the umbrella JIRA used to track and fix issues with JDK 8 
> support.





[jira] [Commented] (DRILL-3929) Support the ability to query database tables using external indices

2015-10-28 Thread Aman Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978820#comment-14978820
 ] 

Aman Sinha commented on DRILL-3929:
---

Discussed the Phoenix integration with Calcite more on the dev list and through 
a Google hangout. See the discussion on the Drill dev list: 
http://mail-archives.apache.org/mod_mbox/drill-dev/201510.mbox/%3ccajrw0orh+wfa2gfzgbglbrkqk9m6y4_aor5kh_rxhaag0cb...@mail.gmail.com%3e

The approach of using projections and relying on materialized view rewrite in 
Calcite is predicated on exactly how Calcite does the MV matching and 
rewrites. There are at least 2 pending JIRAs, CALCITE-772 and CALCITE-773, 
that are known prerequisites for the Phoenix integration.

However, I think even if they are addressed, the basic idea of converting each 
index column predicate into a join would not work well for Drill. It would add 
substantially to the join planning cost, which is not needed since we can do 
the secondary index planning during the physical planning stage rather than 
logical planning.

> Support the ability to query database tables using external indices   
> --
>
> Key: DRILL-3929
> URL: https://issues.apache.org/jira/browse/DRILL-3929
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - Relational Operators, Query Planning & 
> Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>
> This is a placeholder for adding support in Drill to query database tables 
> using external indices.  I will add more details about the use case and a 
> preliminary design proposal.  





[jira] [Commented] (DRILL-3989) Create a sys.queries table

2015-10-28 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978892#comment-14978892
 ] 

Jacques Nadeau commented on DRILL-3989:
---

Initially I was focused on running queries. Completed queries are a much larger 
set; I'm not sure they should be in the same table. (One other note: someone 
can currently query the query log for this information.)

> Create a sys.queries table
> --
>
> Key: DRILL-3989
> URL: https://issues.apache.org/jira/browse/DRILL-3989
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Reporter: Jacques Nadeau
>  Labels: newbie
>
> We should create a sys.queries table that provides a clusterwide view of 
> active queries. It could include the following columns:
> queryid, user, sql, current status, number of nodes involved, number of total 
> fragments, number of fragments completed, start time
> This should be a pretty straightforward task as we should be able to leverage 
> the capabilities around required affinity. A great model to build off of are 
> the sys.memory and sys.threads tables.





[jira] [Commented] (DRILL-3989) Create a sys.queries table

2015-10-28 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978979#comment-14978979
 ] 

Julian Hyde commented on DRILL-3989:


I don’t think Oracle has a clear answer. Their nearest equivalent is v$sql, but 
they also have v$process.

JDBC defines the terminology for most people, and it calls it a statement, 
albeit a JDBC statement can be executed multiple times. I don't know whether 
Drill gives each execution a new id, or uses the same statement id for each.

MySQL gets it mixed up: “KILL QUERY terminates the statement the connection is 
currently executing, but leaves the connection itself intact.” 
https://dev.mysql.com/doc/refman/5.0/en/kill.html

I'd define it as "things that are running that a DBA would like to kill". This 
includes SELECT queries, DML and DDL statements; collectively, statements. 
Certainly, INSERT and CREATE TABLE AS SELECT can potentially take as much time 
& resources as queries.

> Create a sys.queries table
> --
>
> Key: DRILL-3989
> URL: https://issues.apache.org/jira/browse/DRILL-3989
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Reporter: Jacques Nadeau
>  Labels: newbie
>
> We should create a sys.queries table that provides a clusterwide view of 
> active queries. It could include the following columns:
> queryid, user, sql, current status, number of nodes involved, number of total 
> fragments, number of fragments completed, start time
> This should be a pretty straightforward task as we should be able to leverage 
> the capabilities around required affinity. A great model to build off of are 
> the sys.memory and sys.threads tables.





[jira] [Commented] (DRILL-3989) Create a sys.queries table

2015-10-28 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978874#comment-14978874
 ] 

Julian Hyde commented on DRILL-3989:


You should call it "statements". You never know...

> Create a sys.queries table
> --
>
> Key: DRILL-3989
> URL: https://issues.apache.org/jira/browse/DRILL-3989
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Reporter: Jacques Nadeau
>  Labels: newbie
>
> We should create a sys.queries table that provides a clusterwide view of 
> active queries. It could include the following columns:
> queryid, user, sql, current status, number of nodes involved, number of total 
> fragments, number of fragments completed, start time
> This should be a pretty straightforward task as we should be able to leverage 
> the capabilities around required affinity. A great model to build off of are 
> the sys.memory and sys.threads tables.





[jira] [Commented] (DRILL-3987) Create a POC VV extraction

2015-10-28 Thread Hanifi Gunes (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978974#comment-14978974
 ] 

Hanifi Gunes commented on DRILL-3987:
-

For the points above,

i) 

> Create a POC VV extraction
> --
>
> Key: DRILL-3987
> URL: https://issues.apache.org/jira/browse/DRILL-3987
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
>
> I'd like to start by looking at an extraction that pulls out the base 
> concepts of:
> buffer allocation, value vectors and complexwriter/fieldreader.
> I need to figure out how to resolve some of the cross-dependency issues (such 
> as the jdbc accessor connections).





[jira] [Commented] (DRILL-3987) Create a POC VV extraction

2015-10-28 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979006#comment-14979006
 ] 

Jacques Nadeau commented on DRILL-3987:
---

Some additional thoughts:

packages (both codegen and code)
org.apache.drill.common.types.TypeProtos (type stuff from protoc)
org.apache.drill.exec.vector
org.apache.drill.exec.vector.complex
org.apache.drill.exec.vector.complex.impl
org.apache.drill.exec.vector.complex.reader
org.apache.drill.exec.vector.complex.writer

Classes 
org.apache.drill.exec.proto.SchemaUserBitShared.SerializedField
org.apache.drill.exec.util.CallBack
org.apache.drill.exec.memory.BufferAllocator
io.netty.buffer.DrillBuf

Need a Basic version of VectorContainer (probably without the VectorWrapper and 
VectorAccessible)
Need to subdivide SchemaPath/MaterializedField.
Need to extract OutOfMemory and some other exceptions.
What to do with holders...

We probably need to extract some DrillBuf concepts externally (e.g. 
buffermanager)






> Create a POC VV extraction
> --
>
> Key: DRILL-3987
> URL: https://issues.apache.org/jira/browse/DRILL-3987
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
>
> I'd like to start by looking at an extraction that pulls out the base 
> concepts of:
> buffer allocation, value vectors and complexwriter/fieldreader.
> I need to figure out how to resolve some of the cross-dependency issues (such 
> as the jdbc accessor connections).





[jira] [Commented] (DRILL-2175) Provide an option to not display the list of files in the physical plan

2015-10-28 Thread Rahul Challapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978793#comment-14978793
 ] 

Rahul Challapalli commented on DRILL-2175:
--

We are currently using the text plan in the extended tests for testing 
partition pruning. Whenever the order of files changes, we might run into test 
failures, and some effort is needed to fix the tests. So getting rid of the 
list of files scanned would be helpful. We should also make sure that this 
applies across all scans (JSON, Parquet, Hive, etc.).

Also, HiveScan currently does not display the "numFiles" attribute in the 
scan. Without this attribute we cannot test Hive partition pruning if we end 
up getting rid of the list of scanned files/partitions:

https://issues.apache.org/jira/browse/DRILL-3634

> Provide an option to not display the list of files in the physical plan
> ---
>
> Key: DRILL-2175
> URL: https://issues.apache.org/jira/browse/DRILL-2175
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Reporter: Aman Sinha
> Fix For: Future
>
>
> The physical plan shown through explain (both the text and JSON versions) 
> shows all the files to be read by the Scan node. This creates a problem 
> when the number of files is large (e.g. hundreds) - I am unable to see the 
> entire plan even after raising the sqlline maxwidth to 500K (the default of 
> 10K is too small). This is a usability issue.  
> We could provide an option - either through another version of Explain or 
> through a session option - to not display the entire list of files. Another 
> option is to show the parent directory and the number of files it contains.  
> The total number of files is shown already. 





[jira] [Commented] (DRILL-3989) Create a sys.queries table

2015-10-28 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978872#comment-14978872
 ] 

Khurram Faraaz commented on DRILL-3989:
---

Should we also have a column that holds the total execution time of a 
query that has completed execution?

> Create a sys.queries table
> --
>
> Key: DRILL-3989
> URL: https://issues.apache.org/jira/browse/DRILL-3989
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Reporter: Jacques Nadeau
>  Labels: newbie
>
> We should create a sys.queries table that provides a clusterwide view of 
> active queries. It could include the following columns:
> queryid, user, sql, current status, number of nodes involved, number of total 
> fragments, number of fragments completed, start time
> This should be a pretty straightforward task as we should be able to leverage 
> the capabilities around required affinity. A great model to build off of are 
> the sys.memory and sys.threads tables.





[jira] [Commented] (DRILL-3989) Create a sys.queries table

2015-10-28 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978894#comment-14978894
 ] 

Jacques Nadeau commented on DRILL-3989:
---

Do you mean naming the table e.g. sys.statements?

I can see that. Is that what Oracle calls it?

> Create a sys.queries table
> --
>
> Key: DRILL-3989
> URL: https://issues.apache.org/jira/browse/DRILL-3989
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Reporter: Jacques Nadeau
>  Labels: newbie
>
> We should create a sys.queries table that provides a clusterwide view of 
> active queries. It could include the following columns:
> queryid, user, sql, current status, number of nodes involved, number of total 
> fragments, number of fragments completed, start time
> This should be a pretty straightforward task as we should be able to leverage 
> the capabilities around required affinity. A great model to build off of are 
> the sys.memory and sys.threads tables.





[jira] [Updated] (DRILL-3991) Support schema changes in hash join operator

2015-10-28 Thread amit hadke (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

amit hadke updated DRILL-3991:
--
Description: 
Hash join should be able to support schema changes during execution.
It should resolve edge cases when join columns are missing.

Example:

Table A:
| k1  | v1  |
|-----|-----|
| 1   | "a" |
| 2   | "b" |
| 2.0 | "b" |
| 3   | "c" |

Table B:
| k2  | v2  |
|-----|-----|
| "2" | "b" |
| 1   | "a" |
| 2.0 | "b" |

A INNER JOIN B on A.k1=B.k2:
| k1  | v1  | k2  | v2  |
|-----|-----|-----|-----|
| 1   | "a" | 1   | "a" |
| 2   | "b" | 2.0 | "b" |
| 2.0 | "b" | 2.0 | "b" |

Where in output:
k1 is a union type (INTEGER, DOUBLE)
k2 is a union type (INTEGER, DOUBLE, VARCHAR)

  was:
Hash join should be able to support schema changes during execution.
It should resolve edge cases when join columns are missing.

Example:
   Table A  Table B
   k1   v1k2  v2
   1 "a"   "2"  "b"
   2 "b"1"a"
   2.0  "b"  2.0  "b"
   3 "c" 

A inner join B on A.key=B.key
  k1   v1 k2v2
  1 "a" 1 "a"
  2 "b" 2.0   "b"
 2.0   "b" 2.0   "b"

Where in output
   k1 is a union type (INTEGER, DOUBLE)
   k2 is a union type (INTEGER, DOUBLE, VARCHAR)


> Support schema changes in hash join operator
> 
>
> Key: DRILL-3991
> URL: https://issues.apache.org/jira/browse/DRILL-3991
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: amit hadke
>
> Hash join should be able to support schema changes during execution.
> It should resolve edge cases when join columns are missing.
> Example:
> Table A:
> | k1  | v1  |
> |-----|-----|
> | 1   | "a" |
> | 2   | "b" |
> | 2.0 | "b" |
> | 3   | "c" |
>
> Table B:
> | k2  | v2  |
> |-----|-----|
> | "2" | "b" |
> | 1   | "a" |
> | 2.0 | "b" |
>
> A INNER JOIN B on A.k1=B.k2:
> | k1  | v1  | k2  | v2  |
> |-----|-----|-----|-----|
> | 1   | "a" | 1   | "a" |
> | 2   | "b" | 2.0 | "b" |
> | 2.0 | "b" | 2.0 | "b" |
>
> Where in output:
> k1 is a union type (INTEGER, DOUBLE)
> k2 is a union type (INTEGER, DOUBLE, VARCHAR)
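The join semantics in the description above can be sketched in Python. This is an illustrative model only, not Drill's hash join implementation: numeric keys of different types (INTEGER vs. DOUBLE) compare by value, so 2 matches 2.0, while the VARCHAR key "2" never matches a number.

```python
# Illustrative model of the example above (not Drill's implementation):
# numeric keys of different types (INTEGER, DOUBLE) compare by value,
# while a VARCHAR key like "2" never matches a number.
def inner_join(left, right):
    """Nested-loop inner join of (key, value) rows on numeric key equality."""
    out = []
    for k1, v1 in left:
        for k2, v2 in right:
            numeric = isinstance(k1, (int, float)) and isinstance(k2, (int, float))
            if numeric and k1 == k2:
                out.append((k1, v1, k2, v2))
    return out

table_a = [(1, "a"), (2, "b"), (2.0, "b"), (3, "c")]
table_b = [("2", "b"), (1, "a"), (2.0, "b")]
print(inner_join(table_a, table_b))
# -> [(1, 'a', 1, 'a'), (2, 'b', 2.0, 'b'), (2.0, 'b', 2.0, 'b')]
```

This reproduces the result table from the description, which is why the output k1 and k2 columns need union types.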





[jira] [Updated] (DRILL-3991) Support schema changes in hash join operator

2015-10-28 Thread amit hadke (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

amit hadke updated DRILL-3991:
--
Description: 
Hash join should be able to support schema changes during execution.
It should resolve edge cases when join columns are missing.

Example:

Table A:
| k1  | v1  |
|-----|-----|
| 1   | "a" |
| 2   | "b" |
| 2.0 | "b" |
| 3   | "c" |

Table B:
| k2  | v2  |
|-----|-----|
| "2" | "b" |
| 1   | "a" |
| 2.0 | "b" |

A INNER JOIN B on A.k1=B.k2:
| k1  | v1  | k2  | v2  |
|-----|-----|-----|-----|
| 1   | "a" | 1   | "a" |
| 2   | "b" | 2.0 | "b" |
| 2.0 | "b" | 2.0 | "b" |

Where in output:
k1 is of union type (INTEGER, DOUBLE)
k2 is of union type (INTEGER, DOUBLE, VARCHAR)

  was:
Hash join should be able to support schema changes during execution.
It should resolve edge cases when join columns are missing.

Example:

|Table A | Table B|
| k1 v1 | k2  v2|
| 1   "a" | "2"  "b"|
| 2  "b" | 1"a"|
| 2.0 "b" | 2.0  "b"|
| 3 "c" | |
   

A INNER JOIN B on A.k1=B.k2
|k1 |  v1  | k2|v2|
| 1 | "a" | 1 | "a" | 
| 2  | "b" | 2.0 | "b" |
| 2.0 | "b" | 2.0 | "b" |

Where in output

k1 is a union type (INTEGER, DOUBLE)
k2 is a union type (INTEGER, DOUBLE, VARCHAR)


> Support schema changes in hash join operator
> 
>
> Key: DRILL-3991
> URL: https://issues.apache.org/jira/browse/DRILL-3991
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: amit hadke
>
> Hash join should be able to support schema changes during execution.
> It should resolve edge cases when join columns are missing.
> Example:
> Table A:
> | k1  | v1  |
> |-----|-----|
> | 1   | "a" |
> | 2   | "b" |
> | 2.0 | "b" |
> | 3   | "c" |
>
> Table B:
> | k2  | v2  |
> |-----|-----|
> | "2" | "b" |
> | 1   | "a" |
> | 2.0 | "b" |
>
> A INNER JOIN B on A.k1=B.k2:
> | k1  | v1  | k2  | v2  |
> |-----|-----|-----|-----|
> | 1   | "a" | 1   | "a" |
> | 2   | "b" | 2.0 | "b" |
> | 2.0 | "b" | 2.0 | "b" |
>
> Where in output:
> k1 is a union type (INTEGER, DOUBLE)
> k2 is a union type (INTEGER, DOUBLE, VARCHAR)





[jira] [Assigned] (DRILL-2123) Order of columns in the Web UI is wrong when columns are explicitly specified in projection list

2015-10-28 Thread Sudheesh Katkam (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudheesh Katkam reassigned DRILL-2123:
--

Assignee: Sudheesh Katkam

> Order of columns in the Web UI is wrong when columns are explicitly specified 
> in projection list
> 
>
> Key: DRILL-2123
> URL: https://issues.apache.org/jira/browse/DRILL-2123
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP
>Affects Versions: 0.8.0
>Reporter: Victoria Markman
>Assignee: Sudheesh Katkam
>Priority: Critical
> Fix For: Future
>
> Attachments: Screen Shot 2015-01-29 at 4.08.06 PM.png
>
>
> I'm running query:
> {code}
> select  c_integer, 
>c_bigint, 
>nullif(c_integer, c_bigint) 
> from   `dfs.aggregation`.t1 
> order by c_integer
> {code}
> In sqlline I get correct order of columns:
> {code}
> 0: jdbc:drill:schema=dfs> select c_integer, c_bigint, nullif(c_integer, 
> c_bigint) from `dfs.aggregation`.t1;
> ++++
> | c_integer  |  c_bigint  |   EXPR$2   |
> ++++
> | 451237400  | -3477884857818808320 | 451237400  |
> {code}
> In Web UI - columns are sorted in alphabetical order. 
> Screenshot is attached.





[jira] [Updated] (DRILL-3991) Support schema changes in hash join operator

2015-10-28 Thread amit hadke (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

amit hadke updated DRILL-3991:
--
Description: 
Hash join should be able to support schema changes during execution.
It should resolve edge cases when join columns are missing.

Example:

Table A:
| k1  | v1  |
|-----|-----|
| 1   | "a" |
| 2   | "b" |
| 2.0 | "b" |
| 3   | "c" |

Table B:
| k2  | v2  |
|-----|-----|
| "2" | "b" |
| 1   | "a" |
| 2.0 | "b" |

A INNER JOIN B on A.k1=B.k2:
| k1  | v1  | k2  | v2  |
|-----|-----|-----|-----|
| 1   | "a" | 1   | "a" |
| 2   | "b" | 2.0 | "b" |
| 2.0 | "b" | 2.0 | "b" |

Where in output:
k1 is a union type (INTEGER, DOUBLE)
k2 is a union type (INTEGER, DOUBLE, VARCHAR)

  was:
Hash join should be able to support schema changes during execution.
It should resolve edge cases when join columns are missing.

Example:

|Table A | Table B|
|--|:---:|
| k1 v1 | k2  v2|
| 1   "a" | "2"  "b"|
| 2  "b" | 1"a"|
| 2.0 "b" | 2.0  "b"|
| 3 "c" | |
   

A INNER JOIN B on A.k1=B.k2
|k1 |  v1  | k2|v2|
|---|::|--:|--:|
| 1 | "a" | 1 | "a" | 
| 2  | "b" | 2.0 | "b" |
| 2.0 | "b" | 2.0 | "b" |

Where in output

k1 is a union type (INTEGER, DOUBLE)
k2 is a union type (INTEGER, DOUBLE, VARCHAR)


> Support schema changes in hash join operator
> 
>
> Key: DRILL-3991
> URL: https://issues.apache.org/jira/browse/DRILL-3991
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: amit hadke
>
> Hash join should be able to support schema changes during execution.
> It should resolve edge cases when join columns are missing.
> Example:
> Table A:
> | k1  | v1  |
> |-----|-----|
> | 1   | "a" |
> | 2   | "b" |
> | 2.0 | "b" |
> | 3   | "c" |
>
> Table B:
> | k2  | v2  |
> |-----|-----|
> | "2" | "b" |
> | 1   | "a" |
> | 2.0 | "b" |
>
> A INNER JOIN B on A.k1=B.k2:
> | k1  | v1  | k2  | v2  |
> |-----|-----|-----|-----|
> | 1   | "a" | 1   | "a" |
> | 2   | "b" | 2.0 | "b" |
> | 2.0 | "b" | 2.0 | "b" |
>
> Where in output:
> k1 is a union type (INTEGER, DOUBLE)
> k2 is a union type (INTEGER, DOUBLE, VARCHAR)





[jira] [Assigned] (DRILL-3991) Support schema changes in hash join operator

2015-10-28 Thread amit hadke (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

amit hadke reassigned DRILL-3991:
-

Assignee: amit hadke

> Support schema changes in hash join operator
> 
>
> Key: DRILL-3991
> URL: https://issues.apache.org/jira/browse/DRILL-3991
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: amit hadke
>Assignee: amit hadke
>
> Hash join should be able to support schema changes during execution.
> It should resolve edge cases when join columns are missing.
> Example:
> Table A:
> | k1  | v1  |
> |-----|-----|
> | 1   | "a" |
> | 2   | "b" |
> | 2.0 | "b" |
> | 3   | "c" |
>
> Table B:
> | k2  | v2  |
> |-----|-----|
> | "2" | "b" |
> | 1   | "a" |
> | 2.0 | "b" |
>
> A INNER JOIN B on A.k1=B.k2:
> | k1  | v1  | k2  | v2  |
> |-----|-----|-----|-----|
> | 1   | "a" | 1   | "a" |
> | 2   | "b" | 2.0 | "b" |
> | 2.0 | "b" | 2.0 | "b" |
>
> Where in output:
> k1 is of union type (INTEGER, DOUBLE)
> k2 is of union type (INTEGER, DOUBLE, VARCHAR)





[jira] [Comment Edited] (DRILL-3987) Create a POC VV extraction

2015-10-28 Thread Hanifi Gunes (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978974#comment-14978974
 ] 

Hanifi Gunes edited comment on DRILL-3987 at 10/28/15 7:47 PM:
---

Vectors should store specific types of values, supporting append-only writes 
and random reads, as well as exporting convenience functions for zero-copy 
buffer transfer and for accessing vector metadata such as buffer size, schema, etc.

So for the points above,

we need to export:
i) a purified ByteBuf sub-interface. DrillBuf seems overly convoluted with 
operator, fragment-context and similar operations.
ii) a subset of Drill's BufferAllocator, removing Drill-specific logic like 
getFragmentLimit.
iii) builders to instantiate vectors, writers to support append-only writes, 
and readers to make random reads.
iv) Involving RPC-related machinery in the base library sounds out of scope; I 
would model transfers as happening between vectors.

v) You can export a vector into a metadata & composite buffer; it would be 
really nice if you could build it back again. Exporting convenience 
classes/methods like VectorContainers and RecordBatchLoader (which will need a 
better name here :) would be really complementary.
vi) I would also propose revisiting the design to abstract out a 
ListVector and remove the Repeated* types.
vii) [~jnadeau] we had a lot of difficulty in the past due to the 
serialized/materialized mix, especially with computing hash codes and 
materialized fields mismatching complex VV instances. At this point, I would 
think that having an immutable vector descriptor along with an immutable schema 
descriptor built lazily on demand (see BaseVV#getMetadataBuilder) would make 
sense. To me a barebones vector descriptor is as simple as a path/name + type 
(all immutable). We should be able to create a vector using just these two. We 
can still keep MField for carrying metadata info.

Will look at this more as PoC gets a shape.
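The "barebones vector descriptor" idea above can be sketched as an immutable name + type pair with stable value equality and hashing. All class and field names here are illustrative assumptions, not Drill's actual API.

```python
# Hedged sketch of an immutable vector descriptor: just a path/name plus a
# type, with value equality and a stable hash. Names are illustrative only.
from dataclasses import dataclass

@dataclass(frozen=True)
class VectorDescriptor:
    name: str   # path/name of the field
    type: str   # e.g. "BIGINT" or "UNION(INTEGER, DOUBLE)"

d1 = VectorDescriptor("k1", "UNION(INTEGER, DOUBLE)")
d2 = VectorDescriptor("k1", "UNION(INTEGER, DOUBLE)")
assert d1 == d2 and hash(d1) == hash(d2)  # safe to use as a cache/map key
try:
    d1.name = "k2"   # frozen dataclass: mutation raises an exception
    mutated = True
except Exception:
    mutated = False
assert not mutated
```

Immutability is what avoids the hash-code and field-mismatch problems mentioned above: a descriptor used as a map key can never change underneath the map.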




was (Author: hgunes):
For the points above,

i) 

> Create a POC VV extraction
> --
>
> Key: DRILL-3987
> URL: https://issues.apache.org/jira/browse/DRILL-3987
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
>
> I'd like to start by looking at an extraction that pulls out the base 
> concepts of:
> buffer allocation, value vectors and complexwriter/fieldreader.
> I need to figure out how to resolve some of the cross-dependency issues (such 
> as the jdbc accessor connections).





[jira] [Commented] (DRILL-3994) Build Fails on Windows after DRILL-3742

2015-10-28 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979704#comment-14979704
 ] 

Jacques Nadeau commented on DRILL-3994:
---

Hey [~julienledem], sounds like this patch may have created a regression in the 
Windows build. Can you take a look?

> Build Fails on Windows after DRILL-3742
> ---
>
> Key: DRILL-3994
> URL: https://issues.apache.org/jira/browse/DRILL-3994
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Reporter: Sudheesh Katkam
>Assignee: Julien Le Dem
>Priority: Critical
>
> Build fails on Windows on the latest master:
> {code}
> c:\drill> mvn clean install -DskipTests 
> ...
> [INFO] Rat check: Summary of files. Unapproved: 0 unknown: 0 generated: 0 
> approved: 169 licence.
> [INFO] 
> [INFO] <<< exec-maven-plugin:1.2.1:java (default) < validate @ drill-common 
> <<<
> [INFO] 
> [INFO] --- exec-maven-plugin:1.2.1:java (default) @ drill-common ---
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See 
> http://www.slf4j.org/codes.html#StaticLoggerBinder
>  for further details.
> Scanning: C:\drill\common\target\classes
> [WARNING] 
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:297)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalArgumentException: 
> file:C:/drill/common/target/classes/ not in 
> [file:/C:/drill/common/target/classes/]
>   at 
> org.apache.drill.common.scanner.BuildTimeScan.main(BuildTimeScan.java:129)
>   ... 6 more
> [INFO] 
> 
> [INFO] Reactor Summary:
> [INFO] 
> [INFO] Apache Drill Root POM .. SUCCESS [ 10.016 
> s]
> [INFO] tools/Parent Pom ... SUCCESS [  1.062 
> s]
> [INFO] tools/freemarker codegen tooling ... SUCCESS [  6.922 
> s]
> [INFO] Drill Protocol . SUCCESS [ 10.062 
> s]
> [INFO] Common (Logical Plan, Base expressions)  FAILURE [  9.954 
> s]
> [INFO] contrib/Parent Pom . SKIPPED
> [INFO] contrib/data/Parent Pom  SKIPPED
> [INFO] contrib/data/tpch-sample-data .. SKIPPED
> [INFO] exec/Parent Pom  SKIPPED
> [INFO] exec/Java Execution Engine . SKIPPED
> [INFO] exec/JDBC Driver using dependencies  SKIPPED
> [INFO] JDBC JAR with all dependencies . SKIPPED
> [INFO] contrib/mongo-storage-plugin ... SKIPPED
> [INFO] contrib/hbase-storage-plugin ... SKIPPED
> [INFO] contrib/jdbc-storage-plugin  SKIPPED
> [INFO] contrib/hive-storage-plugin/Parent Pom . SKIPPED
> [INFO] contrib/hive-storage-plugin/hive-exec-shaded ... SKIPPED
> [INFO] contrib/hive-storage-plugin/core ... SKIPPED
> [INFO] contrib/drill-gis-plugin ... SKIPPED
> [INFO] Packaging and Distribution Assembly  SKIPPED
> [INFO] contrib/sqlline  SKIPPED
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 38.813 s
> [INFO] Finished at: 2015-10-28T12:17:19-07:00
> [INFO] Final Memory: 67M/466M
> [INFO] 
> 
> [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2.1:java 
> (default) on project drill-common: An exception occured while executing the 
> Java class. null: InvocationTargetException: 
> file:C:/drill/common/target/classes/ not in 
> [file:/C:/drill/common/target/classes/] -> [Help 1]
> [ERROR] 
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR] 
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
> [ERROR] 
> [ERROR] After correcting the problems, you can resume the build 

[jira] [Updated] (DRILL-3994) Build Fails on Windows after DRILL-3742

2015-10-28 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated DRILL-3994:
--
Assignee: Julien Le Dem

> Build Fails on Windows after DRILL-3742
> ---
>
> Key: DRILL-3994
> URL: https://issues.apache.org/jira/browse/DRILL-3994
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Reporter: Sudheesh Katkam
>Assignee: Julien Le Dem
>Priority: Critical
>
> Build fails on Windows on the latest master:
> {code}
> c:\drill> mvn clean install -DskipTests 
> ...
> [INFO] Rat check: Summary of files. Unapproved: 0 unknown: 0 generated: 0 
> approved: 169 licence.
> [INFO] 
> [INFO] <<< exec-maven-plugin:1.2.1:java (default) < validate @ drill-common 
> <<<
> [INFO] 
> [INFO] --- exec-maven-plugin:1.2.1:java (default) @ drill-common ---
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See 
> http://www.slf4j.org/codes.html#StaticLoggerBinder
>  for further details.
> Scanning: C:\drill\common\target\classes
> [WARNING] 
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:297)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalArgumentException: 
> file:C:/drill/common/target/classes/ not in 
> [file:/C:/drill/common/target/classes/]
>   at 
> org.apache.drill.common.scanner.BuildTimeScan.main(BuildTimeScan.java:129)
>   ... 6 more
> [INFO] 
> 
> [INFO] Reactor Summary:
> [INFO] 
> [INFO] Apache Drill Root POM .. SUCCESS [ 10.016 
> s]
> [INFO] tools/Parent Pom ... SUCCESS [  1.062 
> s]
> [INFO] tools/freemarker codegen tooling ... SUCCESS [  6.922 
> s]
> [INFO] Drill Protocol . SUCCESS [ 10.062 
> s]
> [INFO] Common (Logical Plan, Base expressions)  FAILURE [  9.954 
> s]
> [INFO] contrib/Parent Pom . SKIPPED
> [INFO] contrib/data/Parent Pom  SKIPPED
> [INFO] contrib/data/tpch-sample-data .. SKIPPED
> [INFO] exec/Parent Pom  SKIPPED
> [INFO] exec/Java Execution Engine . SKIPPED
> [INFO] exec/JDBC Driver using dependencies  SKIPPED
> [INFO] JDBC JAR with all dependencies . SKIPPED
> [INFO] contrib/mongo-storage-plugin ... SKIPPED
> [INFO] contrib/hbase-storage-plugin ... SKIPPED
> [INFO] contrib/jdbc-storage-plugin  SKIPPED
> [INFO] contrib/hive-storage-plugin/Parent Pom . SKIPPED
> [INFO] contrib/hive-storage-plugin/hive-exec-shaded ... SKIPPED
> [INFO] contrib/hive-storage-plugin/core ... SKIPPED
> [INFO] contrib/drill-gis-plugin ... SKIPPED
> [INFO] Packaging and Distribution Assembly  SKIPPED
> [INFO] contrib/sqlline  SKIPPED
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 38.813 s
> [INFO] Finished at: 2015-10-28T12:17:19-07:00
> [INFO] Final Memory: 67M/466M
> [INFO] 
> 
> [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2.1:java 
> (default) on project drill-common: An exception occured while executing the 
> Java class. null: InvocationTargetException: 
> file:C:/drill/common/target/classes/ not in 
> [file:/C:/drill/common/target/classes/] -> [Help 1]
> [ERROR] 
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR] 
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
> [ERROR] 
> [ERROR] After correcting the problems, you can resume the build with the 
> command
> [ERROR]   mvn  -rf :drill-common
> {code}
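The failure in the log above boils down to two spellings of the same Windows classpath URL: the scanned entry surfaces as "file:C:/..." while the URL built from the classpath is "file:/C:/...", so a raw membership test fails. A minimal sketch of the mismatch and one way to normalize it (illustrative only, not the actual fix in Drill's build-time scanner):

```python
# Sketch of the mismatch in the log above: the same Windows classpath entry
# can surface as "file:C:/..." or "file:/C:/...". A naive membership test
# fails; stripping the scheme and any leading slash makes them compare equal.
def normalize(url):
    """Drop the 'file:' scheme and leading slashes before the drive letter."""
    path = url[len("file:"):] if url.startswith("file:") else url
    return path.lstrip("/")

scanned = "file:C:/drill/common/target/classes/"
classpath = ["file:/C:/drill/common/target/classes/"]

assert scanned not in classpath                                 # raw compare fails
assert normalize(scanned) in {normalize(u) for u in classpath}  # normalized compare passes
```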





[jira] [Created] (DRILL-3995) Scalar replacement bug with Common Subexpression Elimination

2015-10-28 Thread Steven Phillips (JIRA)
Steven Phillips created DRILL-3995:
--

 Summary: Scalar replacement bug with Common Subexpression 
Elimination
 Key: DRILL-3995
 URL: https://issues.apache.org/jira/browse/DRILL-3995
 Project: Apache Drill
  Issue Type: Bug
Reporter: Steven Phillips


The following query:
 {code}
select t1.full_name from cp.`employee.json` t1, cp.`department.json` t2 where 
t1.department_id = t2.department_id and t1.position_id = t2.department_id
{code}

fails with the following:

org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
RuntimeException: Error at instruction 43: Expected an object reference, but 
found . setValue(II)V
0 R I I . . . .  :  :L0
1 R I I . . . .  :  : LINENUMBER 249 L0
2 R I I . . . .  :  : ICONST_0
3 R I I . . . .  : I  : ISTORE 3
4 R I I I . . .  :  : LCONST_0
5 R I I I . . .  : J  : LSTORE 4
6 R I I I J . .  :  :L1
7 R I I I J . .  :  : LINENUMBER 251 L1
8 R I I I J . .  :  : ALOAD 0
9 R I I I J . .  : R  : GETFIELD 
org/apache/drill/exec/test/generated/HashTableGen2$BatchHolder.vv20 : 
Lorg/apache/drill/exec/vector/NullableBigIntVector;
00010 R I I I J . .  : R  : INVOKEVIRTUAL 
org/apache/drill/exec/vector/NullableBigIntVector.getAccessor 
()Lorg/apache/drill/exec/vector/NullableBigIntVector$Accessor;
00011 R I I I J . .  : R  : ILOAD 1
00012 R I I I J . .  : R I  : INVOKEVIRTUAL 
org/apache/drill/exec/vector/NullableBigIntVector$Accessor.isSet (I)I
00013 R I I I J . .  : I  : ISTORE 3
00014 R I I I J . .  :  :L2
00015 R I I I J . .  :  : LINENUMBER 252 L2
00016 R I I I J . .  :  : ILOAD 3
00017 R I I I J . .  : I  : ICONST_1
00018 R I I I J . .  : I I  : IF_ICMPNE L3
00019 R I I I J . .  :  :L4
00020 ? : LINENUMBER 253 L4
00021 ? : ALOAD 0
00022 ? : GETFIELD 
org/apache/drill/exec/test/generated/HashTableGen2$BatchHolder.vv20 : 
Lorg/apache/drill/exec/vector/NullableBigIntVector;
00023 ? : INVOKEVIRTUAL 
org/apache/drill/exec/vector/NullableBigIntVector.getAccessor 
()Lorg/apache/drill/exec/vector/NullableBigIntVector$Accessor;
00024 ? : ILOAD 1
00025 ? : INVOKEVIRTUAL 
org/apache/drill/exec/vector/NullableBigIntVector$Accessor.get (I)J
00026 ? : LSTORE 4
00027 R I I I J . .  :  :L3
00028 R I I I J . .  :  : LINENUMBER 256 L3
00029 R I I I J . .  :  : ILOAD 3
00030 R I I I J . .  : I  : ICONST_0
00031 R I I I J . .  : I I  : IF_ICMPEQ L5
00032 R I I I J . .  :  :L6
00033 ? : LINENUMBER 257 L6
00034 ? : ALOAD 0
00035 ? : GETFIELD 
org/apache/drill/exec/test/generated/HashTableGen2$BatchHolder.vv24 : 
Lorg/apache/drill/exec/vector/NullableBigIntVector;
00036 ? : INVOKEVIRTUAL 
org/apache/drill/exec/vector/NullableBigIntVector.getMutator 
()Lorg/apache/drill/exec/vector/NullableBigIntVector$Mutator;
00037 ? : ILOAD 2
00038 ? : ILOAD 3
00039 ? : LLOAD 4
00040 ? : INVOKEVIRTUAL 
org/apache/drill/exec/vector/NullableBigIntVector$Mutator.set (IIJ)V
00041 R I I I J . .  :  :L5
00042 R I I I J . .  :  : LINENUMBER 259 L5
00043 R I I I J . .  :  : ALOAD 6
00044 ? : GETFIELD 
org/apache/drill/exec/expr/holders/NullableBigIntHolder.isSet : I
00045 ? : ICONST_0
00046 ? : IF_ICMPEQ L7
00047 ? :L8
00048 ? : LINENUMBER 260 L8
00049 ? : ALOAD 0
00050 ? : GETFIELD 
org/apache/drill/exec/test/generated/HashTableGen2$BatchHolder.vv27 : 
Lorg/apache/drill/exec/vector/NullableBigIntVector;
00051 ? : INVOKEVIRTUAL 
org/apache/drill/exec/vector/NullableBigIntVector.getMutator 
()Lorg/apache/drill/exec/vector/NullableBigIntVector$Mutator;
00052 ? : ILOAD 2
00053 ? : ALOAD 6
00054 ? : GETFIELD 
org/apache/drill/exec/expr/holders/NullableBigIntHolder.isSet : I
00055 ? : ALOAD 6
00056 ? : GETFIELD 
org/apache/drill/exec/expr/holders/NullableBigIntHolder.value : J
00057 ? : INVOKEVIRTUAL 
org/apache/drill/exec/vector/NullableBigIntVector$Mutator.set (IIJ)V
00058 ? :L7
00059 ? : LINENUMBER 245 L7
00060 ? : RETURN
00061 ? :L9

when common subexpressions are eliminated (see DRILL-3912).





[jira] [Commented] (DRILL-3912) Common subexpression elimination in code generation

2015-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979747#comment-14979747
 ] 

ASF GitHub Bot commented on DRILL-3912:
---

Github user StevenMPhillips commented on the pull request:

https://github.com/apache/drill/pull/189#issuecomment-152066327
  
@jinfengni I updated the PR. Could you take a look?


> Common subexpression elimination in code generation
> ---
>
> Key: DRILL-3912
> URL: https://issues.apache.org/jira/browse/DRILL-3912
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Steven Phillips
>Assignee: Jinfeng Ni
>
> Drill currently will evaluate the full expression tree, even if there are 
> redundant subtrees. Many of these redundant evaluations can be eliminated by 
> reusing the results from previously evaluated expression trees.
> For example,
> {code}
> select a + 1, (a + 1)* (a - 1) from t
> {code}
> Will compute the entire (a + 1) expression twice. With CSE, it will only be 
> evaluated once.
> The benefit will be reducing the work done when evaluating expressions, as 
> well as reducing the amount of code that is generated, which could also lead 
> to better JIT optimization.
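The optimization described above can be sketched by memoizing results per expression subtree, so a repeated subtree like (a + 1) is evaluated only once. The representation and names are illustrative, not Drill's code generator.

```python
# Minimal sketch of common subexpression elimination: memoize results per
# expression subtree so a repeated subtree like (a + 1) is evaluated once.
def evaluate(expr, env, cache, counter):
    if isinstance(expr, str):        # column reference
        return env[expr]
    if not isinstance(expr, tuple):  # literal
        return expr
    if expr in cache:                # CSE hit: reuse the prior result
        return cache[expr]
    op, left, right = expr
    counter[0] += 1                  # count real operator evaluations
    l = evaluate(left, env, cache, counter)
    r = evaluate(right, env, cache, counter)
    result = {"+": l + r, "-": l - r, "*": l * r}[op]
    cache[expr] = result
    return result

# SELECT a + 1, (a + 1) * (a - 1): the (a + 1) subtree occurs twice but runs once.
a_plus_1 = ("+", "a", 1)
cache, counter = {}, [0]
results = [evaluate(e, {"a": 5}, cache, counter)
           for e in (a_plus_1, ("*", a_plus_1, ("-", "a", 1)))]
print(results, counter[0])  # [6, 24] with 3 operator evaluations instead of 4
```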





[jira] [Commented] (DRILL-3993) Rebase Drill on Calcite 1.5.0 release

2015-10-28 Thread Sudheesh Katkam (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979636#comment-14979636
 ] 

Sudheesh Katkam commented on DRILL-3993:


[~julianhyde] I completely agree with you. I think my question could have been 
phrased better. I wanted to know if there are any documented steps that we take 
every time to catch up.

And thanks to the Calcite community for allowing Drillers to check for 
regressions before a release; that has been very helpful :)

> Rebase Drill on Calcite 1.5.0 release
> -
>
> Key: DRILL-3993
> URL: https://issues.apache.org/jira/browse/DRILL-3993
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.2.0
>Reporter: Sudheesh Katkam
>
> Calcite keeps moving, and now we need to catch up to Calcite 1.5, and ensure 
> there are no regressions.
> Also, how do we resolve this 'catching up' issue in the long term?





[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979707#comment-14979707
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/193#issuecomment-152059989
  
Just to add to my comment above, if you want to do a quick call or hangout 
to discuss I'm more than happy to. As I said above, it is possible I am 
misunderstanding. If so, I'll definitely revise my objection.


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Jinfeng Ni
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;





[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979632#comment-14979632
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/193#issuecomment-152047328
  
Interesting. Can you explain where the time is coming from? It isn't clear 
to me why this will have a big impact over what we had before. While you're 
pushing the limit down to just above the scan nodes, we already had an 
optimization which avoided parallelization. Since we're pipelined this really 
shouldn't matter much. Is limit zero not working right in the limit operator? 
It should terminate upon receiving schema, not wait until a batch of actual 
records (I'm wondering if it is doing the latter). Is sending zero records 
through causing operators to skip compilation? In what cases was this change 
taking something from hundreds of seconds to a few seconds? I'm asking these 
questions so I can better understand as I want to make sure there isn't a bug 
somewhere else. Thanks!


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Jinfeng Ni
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;





[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979671#comment-14979671
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user sudheeshkatkam commented on the pull request:

https://github.com/apache/drill/pull/193#issuecomment-152053724
  
I think I see the source of confusion (sorry); this patch does not address 
that query in the JIRA, which is why Jinfeng asked me to change the title in 
one of his comments. Regarding that query, DRILL-3921 helps avoid most of the 
execution time, but we still incur the planning time. My initial approach 
addressed this issue but, as mentioned above, it is blocked by DRILL-2288 and 
other things.

The new approach actually addresses any query that has a limit 0 above a 
blocking operator that consumes all records. And avoiding parallelization made 
the query much worse. (Actually, was fast-schema supposed to still kick in? Did 
not seem like it from my experiments.)

I tested against a query like `SELECT * FROM (SELECT COUNT(DISTINCT a), 
COUNT(DISTINCT b), COUNT(DISTINCT c) FROM very_large_table) T LIMIT 0` and this 
completed two orders of magnitude faster.




[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979701#comment-14979701
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/193#issuecomment-152056732
  
I'm sorry to say that I'm -1 on this change.

It seems to be adding a planning rewrite rule where there should be a 
simple fix to an execution bug. Let's just fix the execution bug. 

Limit 0 should complete its execution the moment it receives a schema (as 
part of fast schema). It doesn't need to receive any records. You just 
described a situation where it is waiting for records from a blocking operator. 
That shouldn't be the case. If there is some other real benefit to this change 
after that execution bug is fixed, let's revisit in that light.

If you think I'm misunderstanding your description of the execution 
behavior or the dynamics involved, please help me to better understand. 




[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979198#comment-14979198
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/193#discussion_r43315255
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/FindLimit0Visitor.java
 ---
@@ -46,6 +51,32 @@ public static boolean containsLimit0(RelNode rel) {
 return visitor.isContains();
   }
 
+  public static DrillRel addLimitOnTopOfLeafNodes(final DrillRel rel) {
+final RelShuttleImpl shuttle = new RelShuttleImpl() {
+
+  private RelNode addLimitAsParent(RelNode node) {
+final RexBuilder builder = node.getCluster().getRexBuilder();
+final RexLiteral offset = 
builder.makeExactLiteral(BigDecimal.ZERO);
+final RexLiteral fetch = builder.makeExactLiteral(BigDecimal.ZERO);
+return new DrillLimitRel(node.getCluster(), node.getTraitSet(), 
node, offset, fetch);
--- End diff --

I understand that in your case you only put DrillLimitRel. But you may want to 
make this visitor more general, so that it can create any kind of LimitRel, 
including DrillLimitRel, LogicalLimitRel, DrillLimitPrel, etc. You can do that 
by defining a LimitFactory and passing it to this visitor. This is similar to 
what other Calcite rules do, and makes the code more general. 
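The factory idea above can be sketched with self-contained toy classes (these are hypothetical stand-ins, not Calcite's or Drill's actual RelNode APIs): a shuttle-style rewrite walks the plan tree, wraps every leaf scan in a LIMIT 0 node, and takes the limit constructor as a factory so the same walk could emit logical or physical limit variants.

```java
import java.util.function.UnaryOperator;

// Toy plan nodes standing in for Calcite RelNodes (hypothetical, not Drill's API).
interface Rel { String describe(); }

class Scan implements Rel {
    final String table;
    Scan(String table) { this.table = table; }
    public String describe() { return "Scan(" + table + ")"; }
}

class Limit implements Rel {
    final Rel input; final long offset; final long fetch;
    Limit(Rel input, long offset, long fetch) {
        this.input = input; this.offset = offset; this.fetch = fetch;
    }
    public String describe() {
        return "Limit[" + offset + "," + fetch + "](" + input.describe() + ")";
    }
}

class Project implements Rel {
    final Rel input; final String exprs;
    Project(Rel input, String exprs) { this.input = input; this.exprs = exprs; }
    public String describe() {
        return "Project[" + exprs + "](" + input.describe() + ")";
    }
}

public class Limit0Shuttle {
    // The "LimitFactory" suggested in the review: the caller chooses which
    // limit variant to create, without changing the tree walk itself.
    static Rel addLimitOnTopOfLeaves(Rel rel, UnaryOperator<Rel> limitFactory) {
        if (rel instanceof Scan) {
            return limitFactory.apply(rel);                  // wrap every leaf
        } else if (rel instanceof Project) {
            Project p = (Project) rel;
            return new Project(addLimitOnTopOfLeaves(p.input, limitFactory), p.exprs);
        } else if (rel instanceof Limit) {
            Limit l = (Limit) rel;
            return new Limit(addLimitOnTopOfLeaves(l.input, limitFactory), l.offset, l.fetch);
        }
        return rel;
    }

    public static void main(String[] args) {
        Rel plan = new Limit(new Project(new Scan("t"), "a,b"), 0, 0);
        Rel rewritten = addLimitOnTopOfLeaves(plan, leaf -> new Limit(leaf, 0, 0));
        System.out.println(rewritten.describe());
        // Limit[0,0](Project[a,b](Limit[0,0](Scan(t))))
    }
}
```

Passing a different lambda (say, one that builds a physical limit node) reuses the same traversal, which is the generality the review asks for.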




[jira] [Created] (DRILL-3992) Unable to query Oracle

2015-10-28 Thread Eric Roma (JIRA)
Eric Roma created DRILL-3992:


 Summary: Unable to query Oracle 
 Key: DRILL-3992
 URL: https://issues.apache.org/jira/browse/DRILL-3992
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.2.0
 Environment: Windows 7 Enterprise 64-bit, Oracle 10g, Teradata 15.00
Reporter: Eric Roma
Priority: Minor
 Fix For: 1.2.0


*See External Issue URL for Stack Overflow Post*
*Appears to be similar issue at 
http://stackoverflow.com/questions/33370438/apache-drill-1-2-and-sql-server-jdbc*

Using Apache Drill v1.2 and Oracle Database 10g Enterprise Edition Release 
10.2.0.4.0 - 64bit in embedded mode.

I'm curious if anyone has had any success connecting Apache Drill to an Oracle 
DB. I've updated the drill-override.conf with the following configurations (per 
documents):

drill.exec: {
  cluster-id: "drillbits1",
  zk.connect: "localhost:2181",
  drill.exec.sys.store.provider.local.path = "/mypath"
}
and placed the ojdbc6.jar in \apache-drill-1.2.0\jars\3rdparty. I can 
successfully create the storage plug-in:

{
  "type": "jdbc",
  "driver": "oracle.jdbc.driver.OracleDriver",
  "url": "jdbc:oracle:thin:@::",
  "username": "USERNAME",
  "password": "PASSWORD",
  "enabled": true
}
but when I issue a query such as:

select * from ..`dual`; 
I get the following error:

Query Failed: An Error Occurred
org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR: From 
line 1, column 15 to line 1, column 20: Table 
'..dual' not found [Error Id: 
57a4153c-6378-4026-b90c-9bb727e131ae on :].
I've tried to query other schema/tables and get a similar result. I've also 
tried connecting to Teradata and get the same error.





[jira] [Updated] (DRILL-3992) Unable to query Oracle DB using JDBC Storage Plug-In

2015-10-28 Thread Eric Roma (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Roma updated DRILL-3992:
-
Description: 
*See External Issue URL for Stack Overflow Post*
*Appears to be similar issue at 
http://stackoverflow.com/questions/33370438/apache-drill-1-2-and-sql-server-jdbc*

Using Apache Drill v1.2 and Oracle Database 10g Enterprise Edition Release 
10.2.0.4.0 - 64bit in embedded mode.

I'm curious if anyone has had any success connecting Apache Drill to an Oracle 
DB. I've updated the drill-override.conf with the following configurations (per 
documents):

drill.exec: {
  cluster-id: "drillbits1",
  zk.connect: "localhost:2181",
  drill.exec.sys.store.provider.local.path = "/mypath"
}
and placed the ojdbc6.jar in \apache-drill-1.2.0\jars\3rdparty. I can 
successfully create the storage plug-in:

{
  "type": "jdbc",
  "driver": "oracle.jdbc.driver.OracleDriver",
  "url": "jdbc:oracle:thin:@::",
  "username": "USERNAME",
  "password": "PASSWORD",
  "enabled": true
}
but when I issue a query such as:

select * from ..`dual`; 
I get the following error:

Query Failed: An Error Occurred
org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR: From 
line 1, column 15 to line 1, column 20: Table 
'..dual' not found [Error Id: 
57a4153c-6378-4026-b90c-9bb727e131ae on :].
I've tried to query other schema/tables and get a similar result. I've also 
tried connecting to Teradata and get the same error.

  was:
*See External Issue URL for Stack Overflow Post*
*Appears to be similar issue at 
http://stackoverflow.com/questions/33370438/apache-drill-1-2-and-sql-server-jdbc**

Using Apache Drill v1.2 and Oracle Database 10g Enterprise Edition Release 
10.2.0.4.0 - 64bit in embedded mode.

I'm curious if anyone has had any success connecting Apache Drill to an Oracle 
DB. I've updated the drill-override.conf with the following configurations (per 
documents):

drill.exec: {
  cluster-id: "drillbits1",
  zk.connect: "localhost:2181",
  drill.exec.sys.store.provider.local.path = "/mypath"
}
and placed the ojdbc6.jar in \apache-drill-1.2.0\jars\3rdparty. I can 
successfully create the storage plug-in:

{
  "type": "jdbc",
  "driver": "oracle.jdbc.driver.OracleDriver",
  "url": "jdbc:oracle:thin:@::",
  "username": "USERNAME",
  "password": "PASSWORD",
  "enabled": true
}
but when I issue a query such as:

select * from ..`dual`; 
I get the following error:

Query Failed: An Error Occurred
org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR: From 
line 1, column 15 to line 1, column 20: Table 
'..dual' not found [Error Id: 
57a4153c-6378-4026-b90c-9bb727e131ae on :].
I've tried to query other schema/tables and get a similar result. I've also 
tried connecting to Teradata and get the same error.

Summary: Unable to query Oracle DB using JDBC Storage Plug-In  (was: 
Unable to query Oracle )

> Unable to query Oracle DB using JDBC Storage Plug-In
> 
>
> Key: DRILL-3992
> URL: https://issues.apache.org/jira/browse/DRILL-3992
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.2.0
> Environment: Windows 7 Enterprise 64-bit, Oracle 10g, Teradata 15.00
>Reporter: Eric Roma
>Priority: Minor
>  Labels: newbie
> Fix For: 1.2.0
>
>
> *See External Issue URL for Stack Overflow Post*
> *Appears to be similar issue at 
> http://stackoverflow.com/questions/33370438/apache-drill-1-2-and-sql-server-jdbc*
> Using Apache Drill v1.2 and Oracle Database 10g Enterprise Edition Release 
> 10.2.0.4.0 - 64bit in embedded mode.
> I'm curious if anyone has had any success connecting Apache Drill to an 
> Oracle DB. I've updated the drill-override.conf with the following 
> configurations (per documents):
> drill.exec: {
>   cluster-id: "drillbits1",
>   zk.connect: "localhost:2181",
>   drill.exec.sys.store.provider.local.path = "/mypath"
> }
> and placed the ojdbc6.jar in \apache-drill-1.2.0\jars\3rdparty. I can 
> successfully create the storage plug-in:
> {
>   "type": "jdbc",
>   "driver": "oracle.jdbc.driver.OracleDriver",
>   "url": "jdbc:oracle:thin:@::",
>   "username": "USERNAME",
>   "password": "PASSWORD",
>   "enabled": true
> }
> but when I issue a query such as:
> select * from ..`dual`; 
> I get the following error:
> Query Failed: An Error Occurred
> org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR: 
> From line 1, column 15 to line 1, column 20: Table 
> '..dual' not found [Error Id: 
> 57a4153c-6378-4026-b90c-9bb727e131ae on :].
> I've tried to query other schema/tables and get a similar result. I've also 
> tried connecting to Teradata and get the same error.
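For context on the VALIDATION ERROR above: with Drill's JDBC storage plugin, tables must be fully qualified as plugin.schema.table. A hedged example, assuming the plugin was saved under the name `oracle` and the target schema is `SCOTT` (both names hypothetical; the actual qualifiers in the report were lost by the mail archiver):

```sql
SELECT * FROM oracle.SCOTT.`dual`;
```

If either qualifier is missing or does not match a registered plugin/schema, Drill's validator reports a "Table ... not found" error like the one quoted above.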




[jira] [Commented] (DRILL-3983) Small test improvements

2015-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979289#comment-14979289
 ] 

ASF GitHub Bot commented on DRILL-3983:
---

Github user julienledem commented on the pull request:

https://github.com/apache/drill/pull/221#issuecomment-152001036
  
@adeneche Please see last commit. I made the output printing configurable 
so that it is less verbose in tests. 
https://github.com/apache/drill/commit/9b40f93122eb22055e9ebec287e5a5ebfa65a2fe


> Small test improvements
> ---
>
> Key: DRILL-3983
> URL: https://issues.apache.org/jira/browse/DRILL-3983
> Project: Apache Drill
>  Issue Type: Test
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
>






[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979209#comment-14979209
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user julianhyde commented on a diff in the pull request:

https://github.com/apache/drill/pull/193#discussion_r43316183
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/FindLimit0Visitor.java
 ---
@@ -46,6 +51,32 @@ public static boolean containsLimit0(RelNode rel) {
 return visitor.isContains();
   }
 
+  public static DrillRel addLimitOnTopOfLeafNodes(final DrillRel rel) {
+final RelShuttleImpl shuttle = new RelShuttleImpl() {
+
+  private RelNode addLimitAsParent(RelNode node) {
+final RexBuilder builder = node.getCluster().getRexBuilder();
+final RexLiteral offset = 
builder.makeExactLiteral(BigDecimal.ZERO);
+final RexLiteral fetch = builder.makeExactLiteral(BigDecimal.ZERO);
+return new DrillLimitRel(node.getCluster(), node.getTraitSet(), 
node, offset, fetch);
--- End diff --

Agree with @jinfengni. In more recent versions of Calcite, use 
RelBuilder.limit() or .sortLimit(). The RelBuilder will be configured to create 
the appropriate Drill variants of all RelNodes. It might also do some useful 
canonization/optimization. We recommend using RelBuilder for most tasks 
involving creating RelNodes.




[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979213#comment-14979213
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jinfengni commented on the pull request:

https://github.com/apache/drill/pull/193#issuecomment-151987787
  
Please modify the title of JIRA DRILL-3623, since the new pull request is 
using a completely different approach to address the performance issue for 
"LIMIT 0".




[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979292#comment-14979292
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/193#issuecomment-152001143
  
What happened to the original strategy of short-circuiting on schema'd 
files? This approach still means we have to pay for all the operator 
compilations for no reason.




[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979189#comment-14979189
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/193#discussion_r43314473
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/FindLimit0Visitor.java
 ---
@@ -46,6 +51,32 @@ public static boolean containsLimit0(RelNode rel) {
 return visitor.isContains();
   }
 
+  public static DrillRel addLimitOnTopOfLeafNodes(final DrillRel rel) {
+final RelShuttleImpl shuttle = new RelShuttleImpl() {
+
+  private RelNode addLimitAsParent(RelNode node) {
+final RexBuilder builder = node.getCluster().getRexBuilder();
+final RexLiteral offset = 
builder.makeExactLiteral(BigDecimal.ZERO);
+final RexLiteral fetch = builder.makeExactLiteral(BigDecimal.ZERO);
+return new DrillLimitRel(node.getCluster(), node.getTraitSet(), 
node, offset, fetch);
+  }
+
+  @Override
+  public RelNode visit(TableScan scan) {
--- End diff --

You also need to override visit(Values), since Values can be a leaf operator as 
well, in addition to TableScan. 




[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979348#comment-14979348
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jinfengni commented on the pull request:

https://github.com/apache/drill/pull/193#issuecomment-152013964
  
The original approach (skipping the execution phase for limit 0 
completely) could actually have issues in some cases, due to differences 
between Calcite's rules and Drill's execution rules in how types are 
determined.

For example, sum(int) in Calcite is resolved to int, while in Drill 
execution we change it to bigint. Another case is implicit casts. Currently, 
there are some small differences between Calcite and Drill execution. That 
means that if we skip execution for limit 0, the types resolved by Calcite 
could differ from the types the query would have if it went through Drill 
execution. For a BI tool like Tableau, the type returned from the "limit 0" 
query and the type from a second query without "limit 0" could then differ.

If we want to avoid the above issues, we have to detect all those cases, 
which is painful. That's why Sudheesh and I are now more inclined toward this 
new approach.
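As a concrete illustration of the mismatch described above (using a hypothetical table `t` with an INT column `a`):

```sql
-- A BI tool probes the result schema with a LIMIT 0 query:
SELECT SUM(a) FROM t LIMIT 0;
-- Calcite's validator types SUM over an INT column as INT, while Drill's
-- execution widens it to BIGINT; answering the probe from the planner alone
-- could therefore report a different type than the real query would return.
```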
 




[jira] [Assigned] (DRILL-3871) Exception on inner join when join predicate is int96 field generated by impala

2015-10-28 Thread Deneche A. Hakim (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deneche A. Hakim reassigned DRILL-3871:
---

Assignee: Deneche A. Hakim  (was: Parth Chandra)

> Exception on inner join when join predicate is int96 field generated by impala
> --
>
> Key: DRILL-3871
> URL: https://issues.apache.org/jira/browse/DRILL-3871
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.2.0
>Reporter: Victoria Markman
>Assignee: Deneche A. Hakim
>Priority: Critical
>  Labels: int96
> Fix For: 1.3.0
>
> Attachments: tables.tar
>
>
> Both tables in the join where created by impala, with column c_timestamp 
> being parquet int96. 
> {code}
> 0: jdbc:drill:schema=dfs> select
> . . . . . . . . . . . . > max(t1.c_timestamp),
> . . . . . . . . . . . . > min(t1.c_timestamp),
> . . . . . . . . . . . . > count(t1.c_timestamp)
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . > imp_t1 t1
> . . . . . . . . . . . . > inner join
> . . . . . . . . . . . . > imp_t2 t2
> . . . . . . . . . . . . > on  (t1.c_timestamp = t2.c_timestamp)
> . . . . . . . . . . . . > ;
> java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR: 
> TProtocolException: Required field 'uncompressed_page_size' was not found in 
> serialized data! Struct: PageHeader(type:null, uncompressed_page_size:0, 
> compressed_page_size:0)
> Fragment 0:0
> [Error Id: eb6a5df8-fc59-409b-957a-59cb1079b5b8 on atsqa4-133.qa.lab:31010]
> at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
> at 
> sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
> at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
> at sqlline.SqlLine.print(SqlLine.java:1583)
> at sqlline.Commands.execute(Commands.java:852)
> at sqlline.Commands.sql(Commands.java:751)
> at sqlline.SqlLine.dispatch(SqlLine.java:738)
> at sqlline.SqlLine.begin(SqlLine.java:612)
> at sqlline.SqlLine.start(SqlLine.java:366)
> at sqlline.SqlLine.main(SqlLine.java:259)
> {code}
> drillbit.log
> {code}
> 2015-09-30 21:15:45,710 [29f3aefe-3209-a6e6-0418-500dac60a339:foreman] INFO  
> o.a.d.exec.store.parquet.Metadata - Took 0 ms to get file statuses
> 2015-09-30 21:15:45,712 [29f3aefe-3209-a6e6-0418-500dac60a339:foreman] INFO  
> o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata: Executed 1 out of 
> 1 using 1 threads. Time: 1ms total, 1.645381ms avg, 1ms max.
> 2015-09-30 21:15:45,712 [29f3aefe-3209-a6e6-0418-500dac60a339:foreman] INFO  
> o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata: Executed 1 out of 
> 1 using 1 threads. Earliest start: 1.332000 μs, Latest start: 1.332000 μs, 
> Average start: 1.332000 μs .
> 2015-09-30 21:15:45,830 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 29f3aefe-3209-a6e6-0418-500dac60a339:0:0: State change requested 
> AWAITING_ALLOCATION --> RUNNING
> 2015-09-30 21:15:45,830 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] INFO  
> o.a.d.e.w.f.FragmentStatusReporter - 
> 29f3aefe-3209-a6e6-0418-500dac60a339:0:0: State to report: RUNNING
> 2015-09-30 21:15:45,925 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 29f3aefe-3209-a6e6-0418-500dac60a339:0:0: State change requested RUNNING --> 
> FAILED
> 2015-09-30 21:15:45,930 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 29f3aefe-3209-a6e6-0418-500dac60a339:0:0: State change requested FAILED --> 
> FINISHED
> 2015-09-30 21:15:45,931 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] ERROR 
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: TProtocolException: 
> Required field 'uncompressed_page_size' was not found in serialized data! 
> Struct: PageHeader(type:null, uncompressed_page_size:0, 
> compressed_page_size:0)
> Fragment 0:0
> [Error Id: eb6a5df8-fc59-409b-957a-59cb1079b5b8 on atsqa4-133.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> TProtocolException: Required field 'uncompressed_page_size' was not found in 
> serialized data! Struct: PageHeader(type:null, uncompressed_page_size:0, 
> compressed_page_size:0)
> Fragment 0:0
> [Error Id: eb6a5df8-fc59-409b-957a-59cb1079b5b8 on atsqa4-133.qa.lab:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534)
>  ~[drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:323)
>  

[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979395#comment-14979395
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/193#issuecomment-152018974
  
Got it. Thanks for the explanation. So this is a hack until we can solve 
those issues.

I think we have to do this work, however. A 1-2 second response to a limit 
0 query is too much. We should open up jiras for all of these inconsistency 
issues and then get Calcite and Drill into alignment. 

What do you think we're talking about: aggregation outputs, implicit 
casting. What else? 




[jira] [Created] (DRILL-3993) Rebase Drill on Calcite 1.5.0 release

2015-10-28 Thread Sudheesh Katkam (JIRA)
Sudheesh Katkam created DRILL-3993:
--

 Summary: Rebase Drill on Calcite 1.5.0 release
 Key: DRILL-3993
 URL: https://issues.apache.org/jira/browse/DRILL-3993
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.2.0
Reporter: Sudheesh Katkam


Calcite keeps moving, and now we need to catch up to Calcite 1.5, and ensure 
there are no regressions.

Also, how do we resolve this 'catching up' issue in the long term?





[jira] [Commented] (DRILL-3993) Rebase Drill on Calcite 1.5.0 release

2015-10-28 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979474#comment-14979474
 ] 

Jacques Nadeau commented on DRILL-3993:
---

We just need to get off the fork. [~jni], can you outline the three 
main issues? Maybe [~sudheeshkatkam] can help resolve them.


> Rebase Drill on Calcite 1.5.0 release
> -
>
> Key: DRILL-3993
> URL: https://issues.apache.org/jira/browse/DRILL-3993
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.2.0
>Reporter: Sudheesh Katkam
>
> Calcite keeps moving, and now we need to catch up to Calcite 1.5, and ensure 
> there are no regressions.
> Also, how do we resolve this 'catching up' issue in the long term?





[jira] [Comment Edited] (DRILL-3993) Rebase Drill on Calcite 1.5.0 release

2015-10-28 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979477#comment-14979477
 ] 

Jacques Nadeau edited comment on DRILL-3993 at 10/28/15 11:32 PM:
--

Actually, I think I remembered:

* Schema Caching
* * Validator
* AbstractConverter (e.g. trait pull-up)


was (Author: jnadeau):
Actually, I think I remembered:

Schema Caching
* Validator
AbstractConverter (e.g. trait pull-up)



[jira] [Commented] (DRILL-3993) Rebase Drill on Calcite 1.5.0 release

2015-10-28 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979477#comment-14979477
 ] 

Jacques Nadeau commented on DRILL-3993:
---

Actually, I think I remembered:

Schema Caching
* Validator
AbstractConverter (e.g. trait pull-up)

> Rebase Drill on Calcite 1.5.0 release
> -
>
> Key: DRILL-3993
> URL: https://issues.apache.org/jira/browse/DRILL-3993
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.2.0
>Reporter: Sudheesh Katkam
>
> Calcite keeps moving, and now we need to catch up to Calcite 1.5, and ensure 
> there are no regressions.
> Also, how do we resolve this 'catching up' issue in the long term?





[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979502#comment-14979502
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user sudheeshkatkam commented on the pull request:

https://github.com/apache/drill/pull/193#issuecomment-152034356
  
Also, on the execution side, I was actually hitting 
[DRILL-2288](https://issues.apache.org/jira/browse/DRILL-2288), where sending 
exactly one batch with schema and without data is not handled correctly by 
various RecordBatches. With a fix for that issue, we could add further 
optimization for schemaed tables (i.e. add the previous implementation) with 
this implementation as the fallback.


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Jinfeng Ni
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;





[jira] [Commented] (DRILL-3963) Read raw key value bytes from sequence files

2015-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979526#comment-14979526
 ] 

ASF GitHub Bot commented on DRILL-3963:
---

Github user amithadke commented on the pull request:

https://github.com/apache/drill/pull/214#issuecomment-152036324
  
@sudheeshkatkam I've added changes and tests for sequence file and avro. 
They both use Hadoop's API to create a RecordReader. Thanks for helping out with 
the test.


> Read raw key value bytes from sequence files
> 
>
> Key: DRILL-3963
> URL: https://issues.apache.org/jira/browse/DRILL-3963
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: amit hadke
>Assignee: amit hadke
>
> Sequence files store list of key-value pairs. Keys/values are of type hadoop 
> writable.
> Provide a format plugin that reads raw bytes out of sequence files which can 
> be further deserialized by a udf(from hadoop writable -> drill type)





[jira] [Commented] (DRILL-3993) Rebase Drill on Calcite 1.5.0 release

2015-10-28 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979538#comment-14979538
 ] 

Julian Hyde commented on DRILL-3993:


[~sudheeshkatkam], "Catching up" is always necessary when you separate two 
components into modules and version them separately. Changes in module A don't 
break module B's nightly builds, but B needs to periodically sync up, at a time 
of its choosing.

I think that we put a lot of valuable features into Calcite that benefit Drill 
(some of them contributed by people who are also Drill committers), and I think 
we do a pretty good job at controlling change, so that things that do not 
directly benefit Drill at least do not break it. For example, we follow 
semantic versioning and do not remove APIs except in a major release.

We have discovered with other projects that asking the downstream projects to 
kick the tires of a Calcite release in the run-up to a release is an effective 
way to find problems, and efficient in terms of time and effort for both 
projects.

If there is anything else we can do in Calcite to make the process more 
efficient for Drill, let me know.

> Rebase Drill on Calcite 1.5.0 release
> -
>
> Key: DRILL-3993
> URL: https://issues.apache.org/jira/browse/DRILL-3993
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.2.0
>Reporter: Sudheesh Katkam
>
> Calcite keeps moving, and now we need to catch up to Calcite 1.5, and ensure 
> there are no regressions.
> Also, how do we resolve this 'catching up' issue in the long term?





[jira] [Commented] (DRILL-3993) Rebase Drill on Calcite 1.5.0 release

2015-10-28 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979492#comment-14979492
 ] 

Jinfeng Ni commented on DRILL-3993:
---

[~jnadeau], those are the three major differences between Drill's forked Calcite 
and Calcite master. I'm working on the first one.

The third one, AbstractConverter, is also one of the reasons for Drill's long 
physical planning time in some cases. If we remove AbstractConverter and find a 
way to do trait pull-up, I believe planning time would be reduced 
significantly.

> Rebase Drill on Calcite 1.5.0 release
> -
>
> Key: DRILL-3993
> URL: https://issues.apache.org/jira/browse/DRILL-3993
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.2.0
>Reporter: Sudheesh Katkam
>
> Calcite keeps moving, and now we need to catch up to Calcite 1.5, and ensure 
> there are no regressions.
> Also, how do we resolve this 'catching up' issue in the long term?





[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979500#comment-14979500
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jinfengni commented on the pull request:

https://github.com/apache/drill/pull/193#issuecomment-152034091
  
Sudheesh and I feel this new approach is a big optimization step 
towards solving the performance issue for "limit 0" queries, rather than a hack 
solution: 1) it shows a significant reduction in query time, from 
hundreds of seconds to a couple of seconds in some cases, which is a big 
improvement; 2) it benefits not only schema-based queries but also 
schema-less queries, while the original approach would only apply to 
schema-based queries. 

I agree we should continue to optimize "limit 0" queries. But for now, I 
think this new approach has its own merits.

Aggregation and implicit casting are the two things that I can think of, 
if we go with the schema-based approach. 



> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Jinfeng Ni
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;





[jira] [Commented] (DRILL-3987) Create a POC VV extraction

2015-10-28 Thread Parth Chandra (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979562#comment-14979562
 ] 

Parth Chandra commented on DRILL-3987:
--

I was thinking that we would start with something equivalent to the 
Parquet-Format project that fixes the format in an implementation (and 
language) independent way. That way, we can update the C++ implementation to 
keep in sync with the Java implementation as well.
Also, agree with Hanifi's comment (vii), above. A single immutable vector 
descriptor and a lazily built schema descriptor would be just right.

> Create a POC VV extraction
> --
>
> Key: DRILL-3987
> URL: https://issues.apache.org/jira/browse/DRILL-3987
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
>
> I'd like to start by looking at an extraction that pulls out the base 
> concepts of:
> buffer allocation, value vectors and complexwriter/fieldreader.
> I need to figure out how to resolve some of the cross-dependency issues (such 
> as the jdbc accessor connections).





[jira] [Created] (DRILL-3994) Build Fails on Windows after DRILL-3742

2015-10-28 Thread Sudheesh Katkam (JIRA)
Sudheesh Katkam created DRILL-3994:
--

 Summary: Build Fails on Windows after DRILL-3742
 Key: DRILL-3994
 URL: https://issues.apache.org/jira/browse/DRILL-3994
 Project: Apache Drill
  Issue Type: Bug
  Components: Tools, Build & Test
Reporter: Sudheesh Katkam
Priority: Critical


Build fails on Windows on the latest master:

{code}
c:\drill> mvn clean install -DskipTests 
...
[INFO] Rat check: Summary of files. Unapproved: 0 unknown: 0 generated: 0 
approved: 169 licence.
[INFO] 
[INFO] <<< exec-maven-plugin:1.2.1:java (default) < validate @ drill-common <<<
[INFO] 
[INFO] --- exec-maven-plugin:1.2.1:java (default) @ drill-common ---
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See 
http://www.slf4j.org/codes.html#StaticLoggerBinder
 for further details.
Scanning: C:\drill\common\target\classes
[WARNING] 
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:297)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: 
file:C:/drill/common/target/classes/ not in 
[file:/C:/drill/common/target/classes/]
at 
org.apache.drill.common.scanner.BuildTimeScan.main(BuildTimeScan.java:129)
... 6 more
[INFO] 
[INFO] Reactor Summary:
[INFO] 
[INFO] Apache Drill Root POM .. SUCCESS [ 10.016 s]
[INFO] tools/Parent Pom ... SUCCESS [  1.062 s]
[INFO] tools/freemarker codegen tooling ... SUCCESS [  6.922 s]
[INFO] Drill Protocol . SUCCESS [ 10.062 s]
[INFO] Common (Logical Plan, Base expressions)  FAILURE [  9.954 s]
[INFO] contrib/Parent Pom . SKIPPED
[INFO] contrib/data/Parent Pom  SKIPPED
[INFO] contrib/data/tpch-sample-data .. SKIPPED
[INFO] exec/Parent Pom  SKIPPED
[INFO] exec/Java Execution Engine . SKIPPED
[INFO] exec/JDBC Driver using dependencies  SKIPPED
[INFO] JDBC JAR with all dependencies . SKIPPED
[INFO] contrib/mongo-storage-plugin ... SKIPPED
[INFO] contrib/hbase-storage-plugin ... SKIPPED
[INFO] contrib/jdbc-storage-plugin  SKIPPED
[INFO] contrib/hive-storage-plugin/Parent Pom . SKIPPED
[INFO] contrib/hive-storage-plugin/hive-exec-shaded ... SKIPPED
[INFO] contrib/hive-storage-plugin/core ... SKIPPED
[INFO] contrib/drill-gis-plugin ... SKIPPED
[INFO] Packaging and Distribution Assembly  SKIPPED
[INFO] contrib/sqlline  SKIPPED
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 38.813 s
[INFO] Finished at: 2015-10-28T12:17:19-07:00
[INFO] Final Memory: 67M/466M
[INFO] 
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2.1:java 
(default) on project drill-common: An exception occured while executing the 
Java class. null: InvocationTargetException: 
file:C:/drill/common/target/classes/ not in 
[file:/C:/drill/common/target/classes/] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException

[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :drill-common
{code}
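The error text points at a URL-normalization mismatch specific to Windows drive letters: the scanner derives `file:C:/...` from the classpath while the normalized URL form is `file:/C:/...`. A minimal sketch of the string comparison that goes wrong (paths are illustrative, not Drill code):

```python
path = "C:/drill/common/target/classes/"

# What the scanner appears to derive from the classpath entry (no leading
# slash before the drive letter) versus the normalized file-URL spelling.
unnormalized = "file:" + path      # "file:C:/..."
normalized = "file:/" + path       # "file:/C:/..."

# The two spellings refer to the same directory but compare unequal, which
# is enough to make a prefix/containment check fail and raise the
# "not in" IllegalArgumentException seen in the build log.
mismatch = not unnormalized.startswith(normalized)
```

The fix would presumably be to normalize both sides to the same file-URL form before comparing.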





[jira] [Closed] (DRILL-3756) Consider loosening up the Maven checkstyle audit

2015-10-28 Thread Edmon Begoli (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edmon Begoli closed DRILL-3756.
---
Resolution: Fixed

We agreed that this style check can stay.

> Consider loosening up the Maven checkstyle audit
> 
>
> Key: DRILL-3756
> URL: https://issues.apache.org/jira/browse/DRILL-3756
> Project: Apache Drill
>  Issue Type: Wish
>  Components: Tools, Build & Test
>Affects Versions: 1.1.0
> Environment: Maven build on any platform.
>Reporter: Edmon Begoli
>Priority: Minor
> Fix For: Future
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> A space in javadoc before the end of line causes Maven build to fail on 
> checkstyle audit.
> [INFO] --- maven-checkstyle-plugin:2.12.1:check (checkstyle-validation) @ 
> drill-java-exec ---
> [INFO] Starting audit...
> for example
> /drill/exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePlugin.java:30:
>  Line matches the illegal pattern '\s+$'.





[jira] [Updated] (DRILL-3726) Drill is not properly interpreting CRLF (0d0a). CR gets read as content.

2015-10-28 Thread Edmon Begoli (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edmon Begoli updated DRILL-3726:

Description: 
  When we query the last attribute of a text file, we get missing characters.  
Looking at the row through Drill, a \r is included at the end of the last 
attribute.  
Looking in a text editor, it's not embedded into that attribute.

I'm thinking that Drill is not interpreting CRLF (0d0a) as a new line, only the 
LF, resulting in the CR becoming part of the last attribute.

  was:
 When we query the last attribute of a text file, we get missing characters.  
Looking at the row through Drill, a \r is included at the end of the last 
attribute.  
Looking in a text editor, it's not embedded into that attribute.

I'm thinking that Drill is not interpreting CRLF (0d0a) as a new line, only the 
LF, resulting in the CR becoming part of the last attribute.
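The described behavior can be reproduced with a minimal sketch (plain Python, not Drill code): splitting input on LF alone leaves the CR attached to the last attribute of each row.

```python
# A two-row CSV with Windows (CRLF) line endings.
data = "a,b\r\nc,d\r\n"

# Splitting on LF only, as the report describes, keeps the CR in the row
# text, so it ends up glued to the last field of each row.
rows = [r for r in data.split("\n") if r]
last_fields = [r.split(",")[-1] for r in rows]

# Stripping a trailing CR per row recovers the clean values.
clean = [f.rstrip("\r") for f in last_fields]
```

Treating CRLF as a single line terminator (or trimming a trailing `\r` after the LF split) would avoid the corrupted last attribute.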


> Drill is not properly interpreting CRLF (0d0a). CR gets read as content.
> 
>
> Key: DRILL-3726
> URL: https://issues.apache.org/jira/browse/DRILL-3726
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Text & CSV
>Affects Versions: 1.1.0
> Environment: Linux RHEL 6.6, OSX 10.9
>Reporter: Edmon Begoli
> Fix For: Future
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
>   When we query the last attribute of a text file, we get missing characters. 
>  Looking at the row through Drill, a \r is included at the end of the last 
> attribute.  
> Looking in a text editor, it's not embedded into that attribute.
> I'm thinking that Drill is not interpreting CRLF (0d0a) as a new line, only 
> the LF, resulting in the CR becoming part of the last attribute.





[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979610#comment-14979610
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/193#discussion_r43339305
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/FindLimit0Visitor.java
 ---
@@ -46,6 +51,32 @@ public static boolean containsLimit0(RelNode rel) {
 return visitor.isContains();
   }
 
+  public static DrillRel addLimitOnTopOfLeafNodes(final DrillRel rel) {
+final RelShuttleImpl shuttle = new RelShuttleImpl() {
+
+  private RelNode addLimitAsParent(RelNode node) {
+final RexBuilder builder = node.getCluster().getRexBuilder();
+final RexLiteral offset = 
builder.makeExactLiteral(BigDecimal.ZERO);
+final RexLiteral fetch = builder.makeExactLiteral(BigDecimal.ZERO);
+return new DrillLimitRel(node.getCluster(), node.getTraitSet(), 
node, offset, fetch);
--- End diff --

Thank you Julian, RelBuilder seems perfect for this case.

Jinfeng, for now, making this visitor more general and using RelBuilder 
needs bigger changes, so I am adding a TODO(DRILL-3993).
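The idea in the diff above can be sketched language-neutrally (plain Python; `Node`, `Limit(0)`, and `Scan` are illustrative stand-ins, not Drill or Calcite classes): walk the plan tree and wrap every leaf node in a LIMIT 0, so the schema still propagates upward while no data is read.

```python
class Node:
    """Toy relational-plan node: a name plus child nodes."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

def add_limit_on_leaves(node):
    """Wrap each leaf (scan) in a LIMIT 0 parent, mirroring
    addLimitOnTopOfLeafNodes in spirit."""
    if not node.children:
        return Node("Limit(0)", [node])
    node.children = [add_limit_on_leaves(c) for c in node.children]
    return node

# Project -> Join -> (Scan(a), Scan(b))
tree = Node("Project", [Node("Join", [Node("Scan(a)"), Node("Scan(b)")])])
out = add_limit_on_leaves(tree)
```

After the rewrite, each scan sits beneath a `Limit(0)` node while the rest of the plan shape is unchanged.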


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Jinfeng Ni
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;





[jira] [Commented] (DRILL-3992) Unable to query Oracle DB using JDBC Storage Plug-In

2015-10-28 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979625#comment-14979625
 ] 

Jacques Nadeau commented on DRILL-3992:
---

Note that for reference, here are example configurations I was using to do 
initial testing:

https://github.com/jacques-n/drill/blob/DRILL-3992/contrib/storage-jdbc/src/test/resources/bootstrap-storage-plugins.json

> Unable to query Oracle DB using JDBC Storage Plug-In
> 
>
> Key: DRILL-3992
> URL: https://issues.apache.org/jira/browse/DRILL-3992
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.2.0
> Environment: Windows 7 Enterprise 64-bit, Oracle 10g, Teradata 15.00
>Reporter: Eric Roma
>Priority: Minor
>  Labels: newbie
> Fix For: 1.2.0
>
>
> *See External Issue URL for Stack Overflow Post*
> *Appears to be similar issue at 
> http://stackoverflow.com/questions/33370438/apache-drill-1-2-and-sql-server-jdbc*
> Using Apache Drill v1.2 and Oracle Database 10g Enterprise Edition Release 
> 10.2.0.4.0 - 64bit in embedded mode.
> I'm curious if anyone has had any success connecting Apache Drill to an 
> Oracle DB. I've updated the drill-override.conf with the following 
> configurations (per documents):
> drill.exec: {
>   cluster-id: "drillbits1",
>   zk.connect: "localhost:2181",
>   drill.exec.sys.store.provider.local.path = "/mypath"
> }
> and placed the ojdbc6.jar in \apache-drill-1.2.0\jars\3rdparty. I can 
> successfully create the storage plug-in:
> {
>   "type": "jdbc",
>   "driver": "oracle.jdbc.driver.OracleDriver",
>   "url": "jdbc:oracle:thin:@::",
>   "username": "USERNAME",
>   "password": "PASSWORD",
>   "enabled": true
> }
> but when I issue a query such as:
> select * from ..`dual`; 
> I get the following error:
> Query Failed: An Error Occurred
> org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR: 
> From line 1, column 15 to line 1, column 20: Table 
> '..dual' not found [Error Id: 
> 57a4153c-6378-4026-b90c-9bb727e131ae on :].
> I've tried to query other schema/tables and get a similar result. I've also 
> tried connecting to Teradata and get the same error.





[jira] [Commented] (DRILL-3992) Unable to query Oracle DB using JDBC Storage Plug-In

2015-10-28 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979623#comment-14979623
 ] 

Jacques Nadeau commented on DRILL-3992:
---

I have a fix which I believe resolves this issue. 

You can try it out by checking out the following commit and building Drill.

https://github.com/jacques-n/drill/commit/b6a502652c8a8273802b79061b761d866871959b

Let me know if this resolves your problem.

> Unable to query Oracle DB using JDBC Storage Plug-In
> 
>
> Key: DRILL-3992
> URL: https://issues.apache.org/jira/browse/DRILL-3992
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.2.0
> Environment: Windows 7 Enterprise 64-bit, Oracle 10g, Teradata 15.00
>Reporter: Eric Roma
>Priority: Minor
>  Labels: newbie
> Fix For: 1.2.0
>
>
> *See External Issue URL for Stack Overflow Post*
> *Appears to be similar issue at 
> http://stackoverflow.com/questions/33370438/apache-drill-1-2-and-sql-server-jdbc*
> Using Apache Drill v1.2 and Oracle Database 10g Enterprise Edition Release 
> 10.2.0.4.0 - 64bit in embedded mode.
> I'm curious if anyone has had any success connecting Apache Drill to an 
> Oracle DB. I've updated the drill-override.conf with the following 
> configurations (per documents):
> drill.exec: {
>   cluster-id: "drillbits1",
>   zk.connect: "localhost:2181",
>   drill.exec.sys.store.provider.local.path = "/mypath"
> }
> and placed the ojdbc6.jar in \apache-drill-1.2.0\jars\3rdparty. I can 
> successfully create the storage plug-in:
> {
>   "type": "jdbc",
>   "driver": "oracle.jdbc.driver.OracleDriver",
>   "url": "jdbc:oracle:thin:@::",
>   "username": "USERNAME",
>   "password": "PASSWORD",
>   "enabled": true
> }
> but when I issue a query such as:
> select * from ..`dual`; 
> I get the following error:
> Query Failed: An Error Occurred
> org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR: 
> From line 1, column 15 to line 1, column 20: Table 
> '..dual' not found [Error Id: 
> 57a4153c-6378-4026-b90c-9bb727e131ae on :].
> I've tried to query other schema/tables and get a similar result. I've also 
> tried connecting to Teradata and get the same error.


