[jira] [Updated] (DRILL-6076) Reduce the default memory from a total of 13GB to 5GB

2018-01-10 Thread Kunal Khatua (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-6076:

Labels:   (was: ready-to-commit)

> Reduce the default memory from a total of 13GB to 5GB
> -
>
> Key: DRILL-6076
> URL: https://issues.apache.org/jira/browse/DRILL-6076
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Critical
> Fix For: 1.13.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, the default memory requirements for Drill are about 13GB, with the 
> following allocations:
> * 4GB Heap
> * 8GB Direct Memory
> * 1GB CodeCache
> * 512MB MaxPermSize
> Also, with Drill 1.12.0, the recommendation is to move to JDK 8, which makes 
> MaxPermSize irrelevant.
> With that, the default requirements total 13GB, which is rather high. This 
> is especially a problem for scenarios where people are trying out Drill in a 
> development environment where 13GB is simply too much.
> When using the public [test 
> framework|https://github.com/mapr/drill-test-framework/] for Apache Drill, it 
> was observed that the framework's functional and unit tests passed 
> successfully with as little as 5GB of memory, based on the following allocation:
> * 1GB Heap
> * 3GB Direct Memory
> * 512MB CodeCache
> * 512MB MaxPermSize
> Based on this finding, the proposal is to reduce the defaults from the 
> current settings to the values listed above. The drill-env.sh file already 
> documents these settings in its comments, along with the recommended values 
> that reflect the original 13GB defaults.
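
For reference, the proposed 5GB allocation corresponds roughly to launching the 
Drillbit JVM with flags like the following (illustrative only; the actual 
values are configured via drill-env.sh):

{noformat}
-Xmx1G -XX:MaxDirectMemorySize=3G -XX:ReservedCodeCacheSize=512M -XX:MaxPermSize=512M
{noformat}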



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (DRILL-5994) Cannot start web server on a machine with more than 200 cores

2018-01-10 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker reassigned DRILL-5994:


Assignee: Mitchel Labonte

> Cannot start web server on a machine with more than 200 cores
> -
>
> Key: DRILL-5994
> URL: https://issues.apache.org/jira/browse/DRILL-5994
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.11.0
>Reporter: Mitchel Labonte
>Assignee: Mitchel Labonte
>Priority: Minor
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.13.0
>
>
> If the WebServer is launched on a machine that has more than 200 cores, you 
> get the following stack trace:
> {noformat}
> Exception in thread "main" 
> org.apache.drill.exec.exception.DrillStartupException: Failure during initial 
> startup of Drillbit:
> at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:313)
> at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:289)
> at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:285)
> Caused by: java.lang.IllegalStateException: Insufficient max threads in 
> ThreadPool: max=200 < needed=206
> at org.eclipse.jetty.server.Server.doStart(Server.java:321)
> at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
> at org.eclipse.drill.exec.server.rest.WebServer.start(WebServer.java:197)
> at org.eclipse.drill.exec.server.Drillbit.run(Drillbit.java:140)
> at org.eclipse.drill.exec.server.Drillbit.start(Drillbit.java:309)
> ... 2 more
> {noformat}
> The cause of this is that in the WebServer start method, a Server instance is 
> created with the default constructor, which initializes a QueuedThreadPool 
> with a default maxThreads value of 200, and there is no way to configure this 
> value.
> *For documentation*
> New config option - drill.exec.web_server.thread_pool_max.
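
A minimal sketch of the documented fix, assuming the new option 
drill.exec.web_server.thread_pool_max from the description above supplies the 
value below; the actual wiring inside WebServer.java may differ:

{code:java}
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.util.thread.QueuedThreadPool;

// Sketch: size Jetty's thread pool from configuration instead of relying on
// the Server() default constructor, which creates a QueuedThreadPool with
// maxThreads = 200.
public class WebServerThreadPoolSketch {
  public static Server createServer(int threadPoolMax) {
    QueuedThreadPool threadPool = new QueuedThreadPool(threadPoolMax);
    threadPool.setName("drill-web-server");
    return new Server(threadPool); // Server(ThreadPool) uses the supplied pool
  }
}
{code}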



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-6002) Avoid memory copy from direct buffer to heap while spilling to local disk

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321490#comment-16321490
 ] 

ASF GitHub Bot commented on DRILL-6002:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1058#discussion_r160841030
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/spill/SpillSet.java
 ---
@@ -107,7 +107,7 @@
  * nodes provide insufficient local disk space)
  */
 
-private static final int TRANSFER_SIZE = 32 * 1024;
+private static final int TRANSFER_SIZE = 1024 * 1024;
--- End diff --

Is a 1MB buffer excessive? The point of a buffer is to ensure we write in 
units of a disk block. For the local file system, experience showed no gain 
after 32K. In the MapR FS, each write is in units of 1 MB. Does Hadoop have a 
preferred size?

Given this variation, if we need large buffers, should we choose a buffer 
size based on the underlying file system? For example, is there a preferred 
size for S3?

32K didn't seem large enough to worry about, even if we had 1000 fragments 
busily spilling. But 1MB? 1000 * 1 MB = 1GB, which starts becoming significant, 
especially in light of our efforts to reduce heap usage. Should we worry?
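
If the buffer size were to vary by file system, one hedged sketch (the helper 
name and the per-scheme values here are hypothetical, not part of the patch) 
could key off the spill target's URI scheme:

{code:java}
// Hypothetical helper: choose a spill transfer-buffer size from the target
// file system's URI scheme. Values are illustrative, not measured optima.
static int transferSizeFor(String scheme) {
  switch (scheme) {
    case "file":   return 32 * 1024;    // local FS: no gain observed past 32K
    case "maprfs": return 1024 * 1024;  // MapR FS writes in units of 1 MB
    default:       return 256 * 1024;   // middle ground for HDFS, S3, etc.
  }
}
{code}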


> Avoid memory copy from direct buffer to heap while spilling to local disk
> -
>
> Key: DRILL-6002
> URL: https://issues.apache.org/jira/browse/DRILL-6002
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Vlad Rozov
>Assignee: Vlad Rozov
>
> When spilling to a local disk, or to any file system that supports 
> WritableByteChannel, it is preferable to avoid copying from off-heap memory 
> to the Java heap, since WritableByteChannel can work directly with off-heap 
> memory.
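
For illustration, the NIO pattern the description refers to: FileChannel 
implements WritableByteChannel and accepts a direct ByteBuffer, so the write 
goes straight from off-heap memory to the OS with no intermediate heap array 
(a standalone example, not Drill's SpillSet code):

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class DirectSpillWrite {
  public static void main(String[] args) throws IOException {
    // Off-heap (direct) memory, analogous to what backs a DrillBuf.
    ByteBuffer direct = ByteBuffer.allocateDirect(1024 * 1024);
    direct.putLong(42L);
    direct.flip(); // prepare the buffer for reading by the channel

    try (FileChannel channel = FileChannel.open(Paths.get("/tmp/spill.bin"),
        StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
      channel.write(direct); // no copy into a Java heap byte[]
    }
  }
}
{code}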



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5994) Cannot start web server on a machine with more than 200 cores

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321461#comment-16321461
 ] 

ASF GitHub Bot commented on DRILL-5994:
---

Github user vrozov commented on the issue:

https://github.com/apache/drill/pull/1069
  
@MitchelLabonte My proposal is to limit the number of acceptors instead of 
allowing the maximum number of threads in the thread pool to be increased. If 
the number of acceptors is limited, the number of "needed" threads will be 
limited too, and the exception will not be raised.


> Cannot start web server on a machine with more than 200 cores
> -
>
> Key: DRILL-5994
> URL: https://issues.apache.org/jira/browse/DRILL-5994
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.11.0
>Reporter: Mitchel Labonte
>Priority: Minor
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.13.0
>
>
> If the WebServer is launched on a machine that has more than 200 cores, you 
> get the following stack trace:
> {noformat}
> Exception in thread "main" 
> org.apache.drill.exec.exception.DrillStartupException: Failure during initial 
> startup of Drillbit:
> at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:313)
> at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:289)
> at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:285)
> Caused by: java.lang.IllegalStateException: Insufficient max threads in 
> ThreadPool: max=200 < needed=206
> at org.eclipse.jetty.server.Server.doStart(Server.java:321)
> at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
> at org.eclipse.drill.exec.server.rest.WebServer.start(WebServer.java:197)
> at org.eclipse.drill.exec.server.Drillbit.run(Drillbit.java:140)
> at org.eclipse.drill.exec.server.Drillbit.start(Drillbit.java:309)
> ... 2 more
> {noformat}
> The cause of this is that in the WebServer start method, a Server instance is 
> created with the default constructor, which initializes a QueuedThreadPool 
> with a default maxThreads value of 200, and there is no way to configure this 
> value.
> *For documentation*
> New config option - drill.exec.web_server.thread_pool_max.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5994) Cannot start web server on a machine with more than 200 cores

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321413#comment-16321413
 ] 

ASF GitHub Bot commented on DRILL-5994:
---

Github user MitchelLabonte commented on the issue:

https://github.com/apache/drill/pull/1069
  
@vrozov the Web server definitely doesn't need this many threads, but Drill 
won't launch on a machine with more than 200 cores and throws an exception:

> Exception in thread "main" org.apache.drill.exec.exception.DrillStartupException: Failure during initial startup of Drillbit:
> at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:313)
> at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:289)
> at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:285)
> Caused by: java.lang.IllegalStateException: Insufficient max threads in ThreadPool: max=200 < needed=206
> at org.eclipse.jetty.server.Server.doStart(Server.java:321)
> at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
> at org.eclipse.drill.exec.server.rest.WebServer.start(WebServer.java:197)
> at org.eclipse.drill.exec.server.Drillbit.run(Drillbit.java:140)
> at org.eclipse.drill.exec.server.Drillbit.start(Drillbit.java:309)
> ... 2 more

The configuration option is there to avoid hitting this exception on startup 
on a machine with more than 200 cores.


> Cannot start web server on a machine with more than 200 cores
> -
>
> Key: DRILL-5994
> URL: https://issues.apache.org/jira/browse/DRILL-5994
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.11.0
>Reporter: Mitchel Labonte
>Priority: Minor
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.13.0
>
>
> If the WebServer is launched on a machine that has more than 200 cores, you 
> get the following stack trace:
> {noformat}
> Exception in thread "main" 
> org.apache.drill.exec.exception.DrillStartupException: Failure during initial 
> startup of Drillbit:
> at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:313)
> at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:289)
> at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:285)
> Caused by: java.lang.IllegalStateException: Insufficient max threads in 
> ThreadPool: max=200 < needed=206
> at org.eclipse.jetty.server.Server.doStart(Server.java:321)
> at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
> at org.eclipse.drill.exec.server.rest.WebServer.start(WebServer.java:197)
> at org.eclipse.drill.exec.server.Drillbit.run(Drillbit.java:140)
> at org.eclipse.drill.exec.server.Drillbit.start(Drillbit.java:309)
> ... 2 more
> {noformat}
> The cause of this is that in the WebServer start method, a Server instance is 
> created with the default constructor, which initializes a QueuedThreadPool 
> with a default maxThreads value of 200, and there is no way to configure this 
> value.
> *For documentation*
> New config option - drill.exec.web_server.thread_pool_max.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DRILL-6080) Sort incorrectly limits batch size to 65535 records rather than 65536

2018-01-10 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-6080:
--

 Summary: Sort incorrectly limits batch size to 65535 records 
rather than 65536
 Key: DRILL-6080
 URL: https://issues.apache.org/jira/browse/DRILL-6080
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.12.0
Reporter: Paul Rogers
Assignee: Paul Rogers
Priority: Minor
 Fix For: 1.13.0


Drill places an upper limit of 64K (65,536 decimal) on the number of rows in a 
batch. When we index records, the indexes therefore run from 0 to 64K-1, that 
is, 0 to 65,535.

The sort code incorrectly uses {{Character.MAX_VALUE}} (65,535) as the maximum 
row count. So, if an incoming batch uses the full 64K size, sort ends up 
splitting batches unnecessarily.

The fix is to instead use the correct constant {{ValueVector.MAX_ROW_COUNT}}.
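
A quick illustration of the off-by-one (taking the 64K batch limit from the 
description above):

{code:java}
// Character.MAX_VALUE is the unsigned 16-bit maximum: 65,535 = 64K - 1.
int sortLimit   = Character.MAX_VALUE; // 65535 -- one row short
int maxRowCount = 64 * 1024;           // 65536 -- the actual batch row limit
// A full 65,536-row incoming batch exceeds sortLimit, so the sort splits it
// even though it is a perfectly legal batch.
{code}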



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-6025) Execution time of a running query shown as 'NOT AVAILABLE'

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321167#comment-16321167
 ] 

ASF GitHub Bot commented on DRILL-6025:
---

Github user prasadns14 commented on the issue:

https://github.com/apache/drill/pull/1074
  
@arina-ielchiieva, resolved merge conflicts


> Execution time of a running query shown as 'NOT AVAILABLE'
> --
>
> Key: DRILL-6025
> URL: https://issues.apache.org/jira/browse/DRILL-6025
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - HTTP
>Affects Versions: 1.11.0
>Reporter: Prasad Nagaraj Subramanya
>Assignee: Prasad Nagaraj Subramanya
>
> When a query is in the 'RUNNING' state, the execution time is shown as 'NOT 
> AVAILABLE'.
> We could instead show the execution duration up to the current time.
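
A minimal sketch of the suggestion (the accessor below is hypothetical, 
standing in for the query profile's recorded start time):

{code:java}
// While the query is RUNNING, report the duration so far instead of
// 'NOT AVAILABLE'.
long startTimeMillis = profile.getStartTime(); // hypothetical accessor
long executionMillis = System.currentTimeMillis() - startTimeMillis;
{code}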



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5994) Cannot start web server on a machine with more than 200 cores

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321148#comment-16321148
 ] 

ASF GitHub Bot commented on DRILL-5994:
---

Github user vrozov commented on the issue:

https://github.com/apache/drill/pull/1069
  
@MitchelLabonte @arina-ielchiieva I don't think that Drill needs that many 
threads/acceptors to handle HTTP(S) requests, as it is not a real web (REST 
API) server. For proper resource utilization, it would be better to limit the 
number of acceptors to a small value (say, 2 or 4 by default) instead of the 
current default, which uses the number of available processors and can be huge 
on machines with lots of cores (32 or more). @paul-rogers What is your take on 
this?
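
For reference, the stock Jetty API already allows this; a hedged sketch of the 
alternative (values illustrative):

{code:java}
import org.eclipse.jetty.server.HttpConnectionFactory;
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.ServerConnector;

// Sketch: cap acceptor threads at a small constant instead of a default that
// scales with the number of available cores.
public class LimitedAcceptorsSketch {
  public static Server createServer(int port) {
    Server server = new Server();
    int acceptors = 2;  // proposed small default (2 or 4)
    int selectors = -1; // -1 lets Jetty choose the selector count
    ServerConnector connector =
        new ServerConnector(server, acceptors, selectors, new HttpConnectionFactory());
    connector.setPort(port);
    server.addConnector(connector);
    return server;
  }
}
{code}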


> Cannot start web server on a machine with more than 200 cores
> -
>
> Key: DRILL-5994
> URL: https://issues.apache.org/jira/browse/DRILL-5994
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.11.0
>Reporter: Mitchel Labonte
>Priority: Minor
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.13.0
>
>
> If the WebServer is launched on a machine that has more than 200 cores, you 
> get the following stack trace:
> {noformat}
> Exception in thread "main" 
> org.apache.drill.exec.exception.DrillStartupException: Failure during initial 
> startup of Drillbit:
> at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:313)
> at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:289)
> at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:285)
> Caused by: java.lang.IllegalStateException: Insufficient max threads in 
> ThreadPool: max=200 < needed=206
> at org.eclipse.jetty.server.Server.doStart(Server.java:321)
> at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
> at org.eclipse.drill.exec.server.rest.WebServer.start(WebServer.java:197)
> at org.eclipse.drill.exec.server.Drillbit.run(Drillbit.java:140)
> at org.eclipse.drill.exec.server.Drillbit.start(Drillbit.java:309)
> ... 2 more
> {noformat}
> The cause of this is that in the WebServer start method, a Server instance is 
> created with the default constructor, which initializes a QueuedThreadPool 
> with a default maxThreads value of 200, and there is no way to configure this 
> value.
> *For documentation*
> New config option - drill.exec.web_server.thread_pool_max.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5741) Automatically manage memory allocations during startup

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321142#comment-16321142
 ] 

ASF GitHub Bot commented on DRILL-5741:
---

Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/1082#discussion_r160803438
  
--- Diff: distribution/src/resources/drill-config.sh ---
@@ -180,18 +251,46 @@ else
   fi
 fi
 
-# Default memory settings if none provided by the environment or
+# Checking if being executed in context of Drillbit and not SQLLine
+if [ "$DRILLBIT_CONTEXT" == "1" ]; then
+  # *-auto.sh allows for distrib/user specific checks to be done
+  distribAuto="$DRILL_CONF_DIR/distrib-auto.sh"
+  if [ ! -r "$distribAuto" ]; then distribAuto="$DRILL_HOME/conf/distrib-auto.sh"; fi
+  if [ ! -r "$distribAuto" ]; then distribAuto=""; fi
+  drillAuto="$DRILL_CONF_DIR/drill-auto.sh"
+  if [ ! -r "$drillAuto" ]; then drillAuto="$DRILL_HOME/conf/drill-auto.sh"; fi
+  if [ ! -r "$drillAuto" ]; then drillAuto=""; fi
+
+  # Enforcing checks in order (distrib-auto.sh , drill-auto.sh)
+  # (NOTE: A script is executed only if it has relevant executable lines)
+  if [ -n "$distribAuto" ] && [ $(executableLineCount $distribAuto) -gt 0 ]; then
+    . "$distribAuto"
+    if [ $? -gt 0 ]; then fatal_error "Aborting Drill Startup due failed checks from $distribAuto"; fi
+  fi
+  if [ -n "$drillAuto" ] && [ $(executableLineCount $drillAuto) -gt 0 ]; then
--- End diff --

Passed the checks for the file to the renamed function: 
`checkExecutableLineCount`


> Automatically manage memory allocations during startup
> --
>
> Key: DRILL-5741
> URL: https://issues.apache.org/jira/browse/DRILL-5741
> Project: Apache Drill
>  Issue Type: Improvement
>  Components:  Server
>Affects Versions: 1.11.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
> Fix For: 1.13.0
>
> Attachments: Auto Mem Allocation Proposal - Computation Logic.pdf, 
> Auto Mem Allocation Proposal - Scenarios.pdf
>
>
> Currently, during startup, a Drillbit can be assigned large values for the 
> following:
> * Xmx (Heap)
> * XX:MaxDirectMemorySize
> * XX:ReservedCodeCacheSize
> * XX:MaxPermSize
> All of this, potentially, can exceed the available memory on a system when a 
> Drillbit is under heavy load. It would be good to have the Drillbit ensure 
> during startup itself that the cumulative value of these parameters does not 
> exceed a pre-defined upper limit for the Drill process.
> This JIRA is a *proposal* to allow for automatic configuration (based on 
> configuration patterns observed in production Drill clusters). It leverages 
> the capability of providing distribution (and user-specific) checks during 
> Drill Startup from DRILL-6068.
> The idea is to remove the need for a user to worry about managing the tuning 
> parameters, by providing optimal values. It also allows the memory allocation 
> to be implicitly managed by simply providing the Drill process with a single 
> value for the total process memory (either as an absolute value, or as a 
> percentage of the total system memory), while {{distrib-auto.sh}} provides 
> the individual allocations.
> This allocation is then partitioned into allocations for Heap and Direct 
> Memory, with a small portion allocated for the Generated Java CodeCache as 
> well. If any of the individual allocations are also specified (via 
> {{distrib-env.sh}} or {{drill-env.sh}}), the remaining unspecified 
> allocations are adjusted to stay +within the limits+ of the total memory 
> allocation.
> The *details* of the proposal are here:
> https://docs.google.com/spreadsheets/d/1N6VYlQFiPoTV4iD46XbkIrvEQesiGFUU9-GWXYsAPXs/edit#gid=0
> For those unable to access the Google Document, PDFs are attached:
> * [^Auto Mem Allocation Proposal - Computation Logic.pdf] - Provides the 
> equation used for computing the heap, direct and code cache allocations for a 
> given input
> * [^Auto Mem Allocation Proposal - Scenarios.pdf] - Describes the various 
> inputs, and their expected allocations
> The variables that are (_optionally_) defined (in memory, {{distrib-env.sh}} 
> or {{drill-env.sh}} ) are:
> * {{DRILLBIT_MAX_PROC_MEM}} : Total Process Memory
> * {{DRILL_HEAP}} : JVM Max Heap Size
> * {{DRILL_MAX_DIRECT_MEMORY}} : JVM Max Direct Memory Size
> * {{DRILLBIT_CODE_CACHE_SIZE}} : JVM Code Cache Size
> Note: _With JDK8, MaxPermSize is no longer supported, so we do not account 
> for this any more, and will unset the variable if JDK8 or higher is detected._
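
The actual equations are in the attached PDFs; purely to illustrate the 
adjust-to-fit idea, here is a sketch with invented ratios (these are NOT the 
proposal's numbers):

{code:java}
// Hypothetical illustration only: given a total budget (MB) and any explicitly
// configured pieces, size the remaining allocations to stay within the budget.
static long[] partition(long totalMB, Long heapMB, Long codeCacheMB) {
  long codeCache = codeCacheMB != null ? codeCacheMB : Math.min(1024, totalMB / 8);
  long heap      = heapMB != null ? heapMB : Math.min(4096, (totalMB - codeCache) / 4);
  long direct    = totalMB - heap - codeCache; // remainder goes to direct memory
  return new long[] { heap, direct, codeCache };
}
{code}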



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5741) Automatically manage memory allocations during startup

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321141#comment-16321141
 ] 

ASF GitHub Bot commented on DRILL-5741:
---

Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/1082#discussion_r160802962
  
--- Diff: distribution/src/assemble/bin.xml ---
@@ -345,6 +345,16 @@
   0755
   conf
 
+
+  src/resources/drill-auto.sh
--- End diff --

Modifying the base commit (#1081) to reflect the name change from 
`[distrib/drill]-auto.sh` to `[distrib/drill]-setup.sh`


> Automatically manage memory allocations during startup
> --
>
> Key: DRILL-5741
> URL: https://issues.apache.org/jira/browse/DRILL-5741
> Project: Apache Drill
>  Issue Type: Improvement
>  Components:  Server
>Affects Versions: 1.11.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
> Fix For: 1.13.0
>
> Attachments: Auto Mem Allocation Proposal - Computation Logic.pdf, 
> Auto Mem Allocation Proposal - Scenarios.pdf
>
>
> Currently, during startup, a Drillbit can be assigned large values for the 
> following:
> * Xmx (Heap)
> * XX:MaxDirectMemorySize
> * XX:ReservedCodeCacheSize
> * XX:MaxPermSize
> All of this, potentially, can exceed the available memory on a system when a 
> Drillbit is under heavy load. It would be good to have the Drillbit ensure 
> during startup itself that the cumulative value of these parameters does not 
> exceed a pre-defined upper limit for the Drill process.
> This JIRA is a *proposal* to allow for automatic configuration (based on 
> configuration patterns observed in production Drill clusters). It leverages 
> the capability of providing distribution (and user-specific) checks during 
> Drill Startup from DRILL-6068.
> The idea is to remove the need for a user to worry about managing the tuning 
> parameters, by providing optimal values. It also allows the memory allocation 
> to be implicitly managed by simply providing the Drill process with a single 
> value for the total process memory (either as an absolute value, or as a 
> percentage of the total system memory), while {{distrib-auto.sh}} provides 
> the individual allocations.
> This allocation is then partitioned into allocations for Heap and Direct 
> Memory, with a small portion allocated for the Generated Java CodeCache as 
> well. If any of the individual allocations are also specified (via 
> {{distrib-env.sh}} or {{drill-env.sh}}), the remaining unspecified 
> allocations are adjusted to stay +within the limits+ of the total memory 
> allocation.
> The *details* of the proposal are here:
> https://docs.google.com/spreadsheets/d/1N6VYlQFiPoTV4iD46XbkIrvEQesiGFUU9-GWXYsAPXs/edit#gid=0
> For those unable to access the Google Document, PDFs are attached:
> * [^Auto Mem Allocation Proposal - Computation Logic.pdf] - Provides the 
> equation used for computing the heap, direct and code cache allocations for a 
> given input
> * [^Auto Mem Allocation Proposal - Scenarios.pdf] - Describes the various 
> inputs, and their expected allocations
> The variables that are (_optionally_) defined (in memory, {{distrib-env.sh}} 
> or {{drill-env.sh}} ) are:
> * {{DRILLBIT_MAX_PROC_MEM}} : Total Process Memory
> * {{DRILL_HEAP}} : JVM Max Heap Size
> * {{DRILL_MAX_DIRECT_MEMORY}} : JVM Max Direct Memory Size
> * {{DRILLBIT_CODE_CACHE_SIZE}} : JVM Code Cache Size
> Note: _With JDK8, MaxPermSize is no longer supported, so we do not account 
> for this any more, and will unset the variable if JDK8 or higher is detected._



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5879) Optimize "Like" operator

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321124#comment-16321124
 ] 

ASF GitHub Bot commented on DRILL-5879:
---

Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1072#discussion_r160753323
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/SqlPatternContainsMatcher.java ---
@@ -19,44 +19,283 @@
 
 import io.netty.buffer.DrillBuf;
 
-public class SqlPatternContainsMatcher extends AbstractSqlPatternMatcher {
+/** SQL Pattern Contains implementation */
+public final class SqlPatternContainsMatcher extends AbstractSqlPatternMatcher {
+  private final MatcherFcn matcherFcn;
 
   public SqlPatternContainsMatcher(String patternString) {
     super(patternString);
+
+    // Pattern matching is 1) a CPU intensive operation and 2) pattern and input dependent. The conclusion is
+    // that there is no single implementation that can do it all well. So, we use multiple implementations
+    // chosen based on the pattern length.
+    if (patternLength == 1) {
+      matcherFcn = new Matcher1();
+    } else if (patternLength == 2) {
+      matcherFcn = new Matcher2();
+    } else if (patternLength == 3) {
+      matcherFcn = new Matcher3();
+    } else if (patternLength < 10) {
+      matcherFcn = new MatcherN();
+    } else {
+      matcherFcn = new BoyerMooreMatcher();
+    }
   }
 
   @Override
   public int match(int start, int end, DrillBuf drillBuf) {
+    return matcherFcn.match(start, end, drillBuf);
+  }
+
+  // --------------------------------------------------------------------------
+  // Inner Data Structure
+  // --------------------------------------------------------------------------
+
+  /** Abstract matcher class to allow us pick the most efficient implementation */
+  private abstract class MatcherFcn {
+    protected final byte[] patternArray;
+
+    protected MatcherFcn() {
+      assert patternByteBuffer.hasArray();
+
+      patternArray = patternByteBuffer.array();
+    }
+
+    /**
+     * @return 1 if the pattern was matched; 0 otherwise
+     */
+    protected abstract int match(int start, int end, DrillBuf drillBuf);
+  }
+
+  /** Handles patterns with length one */
+  private final class Matcher1 extends MatcherFcn {
 
-    if (patternLength == 0) { // Everything should match for null pattern string
-      return 1;
+    private Matcher1() {
+      super();
     }
 
-    final int txtLength = end - start;
+    /** {@inheritDoc} */
+    @Override
+    protected final int match(int start, int end, DrillBuf drillBuf) {
+      final int lengthToProcess = end - start;
+      final byte firstPattByte  = patternArray[0];
 
-    // no match if input string length is less than pattern length
-    if (txtLength < patternLength) {
+      // simplePattern string has meta characters i.e % and _ and escape characters removed.
+      // so, we can just directly compare.
+      for (int idx = 0; idx < lengthToProcess; idx++) {
+        byte inputByte = drillBuf.getByte(start + idx);
+
+        if (firstPattByte != inputByte) {
+          continue;
+        }
+        return 1;
+      }
       return 0;
     }
+  }
 
+  /** Handles patterns with length two */
+  private final class Matcher2 extends MatcherFcn {
 
-    final int outerEnd = txtLength - patternLength;
+    private Matcher2() {
+      super();
+    }
 
-    outer:
-    for (int txtIndex = 0; txtIndex <= outerEnd; txtIndex++) {
+    /** {@inheritDoc} */
+    @Override
+    protected final int match(int start, int end, DrillBuf drillBuf) {
+      final int lengthToProcess = end - start - 1;
+      final byte firstPattByte  = patternArray[0];
+      final byte secondPattByte = patternArray[1];
 
       // simplePattern string has meta characters i.e % and _ and escape characters removed.
       // so, we can just directly compare.
-      for (int patternIndex = 0; patternIndex < patternLength; patternIndex++) {
-        if (patternByteBuffer.get(patternIndex) != drillBuf.getByte(start + txtIndex + patternIndex)) {
-          continue outer;
+      for (int idx = 0; idx < lengthToProcess; idx++) {
+        final byte firstInByte = drillBuf.getByte(start + idx);
+
+        if (firstPattByte != firstInByte) {
+          continue;
+        } else {
+          final byte secondInByte = drillBuf.getByte(start + idx +1);
+
+          if (secondInByte == secondPattByte) {
+

[jira] [Commented] (DRILL-5879) Optimize "Like" operator

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321122#comment-16321122
 ] 

ASF GitHub Bot commented on DRILL-5879:
---

Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1072#discussion_r160754694
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/SqlPatternContainsMatcher.java ---
@@ -19,44 +19,283 @@
 
 import io.netty.buffer.DrillBuf;
 
-public class SqlPatternContainsMatcher extends AbstractSqlPatternMatcher {
+/** SQL Pattern Contains implementation */
+public final class SqlPatternContainsMatcher extends AbstractSqlPatternMatcher {
+  private final MatcherFcn matcherFcn;
 
   public SqlPatternContainsMatcher(String patternString) {
     super(patternString);
+
+    // Pattern matching is 1) a CPU intensive operation and 2) pattern and input dependent. The conclusion is
+    // that there is no single implementation that can do it all well. So, we use multiple implementations
+    // chosen based on the pattern length.
+    if (patternLength == 1) {
+      matcherFcn = new Matcher1();
+    } else if (patternLength == 2) {
+      matcherFcn = new Matcher2();
+    } else if (patternLength == 3) {
--- End diff --

Good point Paul!


> Optimize "Like" operator
> 
>
> Key: DRILL-5879
> URL: https://issues.apache.org/jira/browse/DRILL-5879
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
> Environment: * 
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Minor
> Fix For: 1.13.0
>
>
> Query: select  from  where colA like '%a%' or colA like 
> '%xyz%';
> Improvement Opportunities
> # Avoid isAscii computation (full access of the input string) since we're 
> dealing with the same column twice
> # Optimize the "contains" for-loop 
> Implementation Details
> 1)
> * Added a new integer variable "asciiMode" to the VarCharHolder class
> * The default value is -1 which indicates this info is not known
> * Otherwise this value will be set to either 1 or 0 based on the string being 
> in ASCII mode or Unicode
> * The execution plan already shares the same VarCharHolder instance for all 
> evaluations of the same column value
> * The asciiMode will be correctly set during the first LIKE evaluation and 
> will be reused across other LIKE evaluations
> 2) 
> * The "Contains" LIKE operation is quite expensive as the code needs to 
> access the input string to perform character based comparisons
> * Created 4 versions of the same for-loop to a) make the loop simpler to 
> optimize (Vectorization) and b) minimize comparisons
> Benchmarks
> * Lineitem table 100GB
> * Query: select l_returnflag, count(*) from dfs.`` where l_comment 
> not like '%a%' or l_comment like '%the%' group by l_returnflag
> * Before changes: 33sec
> * After changes: 27sec
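
For readers following the review thread, the long-pattern strategy named there 
(BoyerMooreMatcher) is the classic Boyer-Moore family; below is a 
self-contained Boyer-Moore-Horspool "contains" over bytes as a general 
illustration, not Drill's exact implementation:

{code:java}
import java.util.Arrays;

// Boyer-Moore-Horspool "contains": precompute, per byte value, how far the
// pattern may shift on a mismatch, then compare right-to-left at each position.
public class BmhContains {
  private final byte[] pattern;
  private final int[] skip = new int[256];

  public BmhContains(byte[] pattern) {
    this.pattern = pattern;
    Arrays.fill(skip, pattern.length); // default shift: whole pattern length
    // Bytes occurring in the pattern (except the last) allow smaller shifts.
    for (int i = 0; i < pattern.length - 1; i++) {
      skip[pattern[i] & 0xFF] = pattern.length - 1 - i;
    }
  }

  public boolean matches(byte[] input) {
    final int n = input.length, m = pattern.length;
    for (int pos = 0; pos + m <= n; pos += skip[input[pos + m - 1] & 0xFF]) {
      int i = m - 1;
      while (i >= 0 && pattern[i] == input[pos + i]) { i--; }
      if (i < 0) { return true; }
    }
    return false;
  }
}
{code}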



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5879) Optimize "Like" operator

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321123#comment-16321123
 ] 

ASF GitHub Bot commented on DRILL-5879:
---

Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1072#discussion_r160763490
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/SqlPatternContainsMatcher.java ---

[jira] [Commented] (DRILL-5879) Optimize "Like" operator

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321121#comment-16321121
 ] 

ASF GitHub Bot commented on DRILL-5879:
---

Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1072#discussion_r160758370
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/SqlPatternContainsMatcher.java ---

[jira] [Commented] (DRILL-5879) Optimize "Like" operator

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321118#comment-16321118
 ] 

ASF GitHub Bot commented on DRILL-5879:
---

Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1072#discussion_r160765882
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/expr/fn/impl/TestSqlPatterns.java ---
@@ -446,6 +446,61 @@ public void testSqlPatternComplex() {
     assertEquals(1, sqlPatternComplex.match(0, byteBuffer.limit(), drillBuf)); // should match
   }
 
+  @Test
+  public void testSqlPatternContainsMUltipleMatchers() {
--- End diff --

Paul, the test-suite already has many other tests for SQL matching. But I 
agree, they now might hit only a few of these matchers. I will add more tests.


> Optimize "Like" operator
> 
>
> Key: DRILL-5879
> URL: https://issues.apache.org/jira/browse/DRILL-5879
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
> Environment: * 
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Minor
> Fix For: 1.13.0
>
>
> Query: select  from  where colA like '%a%' or colA like 
> '%xyz%';
> Improvement Opportunities
> # Avoid isAscii computation (full access of the input string) since we're 
> dealing with the same column twice
> # Optimize the "contains" for-loop 
> Implementation Details
> 1)
> * Added a new integer variable "asciiMode" to the VarCharHolder class
> * The default value is -1 which indicates this info is not known
> * Otherwise this value will be set to either 1 or 0 based on the string being 
> in ASCII mode or Unicode
> * The execution plan already shares the same VarCharHolder instance for all 
> evaluations of the same column value
> * The asciiMode will be correctly set during the first LIKE evaluation and 
> will be reused across other LIKE evaluations
> 2) 
> * The "Contains" LIKE operation is quite expensive as the code needs to 
> access the input string to perform character based comparisons
> * Created 4 versions of the same for-loop to a) make the loop simpler to 
> optimize (Vectorization) and b) minimize comparisons
> Benchmarks
> * Lineitem table 100GB
> * Query: select l_returnflag, count(*) from dfs.`` where l_comment 
> not like '%a%' or l_comment like '%the%' group by l_returnflag
> * Before changes: 33sec
> * After changes: 27sec



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5879) Optimize "Like" operator

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321125#comment-16321125
 ] 

ASF GitHub Bot commented on DRILL-5879:
---

Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1072#discussion_r160764681
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/SqlPatternContainsMatcher.java ---

[jira] [Commented] (DRILL-5879) Optimize "Like" operator

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321120#comment-16321120
 ] 

ASF GitHub Bot commented on DRILL-5879:
---

Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1072#discussion_r160755097
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/SqlPatternContainsMatcher.java ---

[jira] [Commented] (DRILL-5879) Optimize "Like" operator

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321119#comment-16321119
 ] 

ASF GitHub Bot commented on DRILL-5879:
---

Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1072#discussion_r160759375
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/SqlPatternContainsMatcher.java ---

[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321054#comment-16321054
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user daveoshinsky commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160792114
  
--- Diff: exec/vector/src/main/java/org/apache/drill/exec/util/DecimalUtility.java ---
@@ -159,9 +159,20 @@ public static BigDecimal getBigDecimalFromSparse(DrillBuf data, int startIndex,
 }
 
 public static BigDecimal getBigDecimalFromDrillBuf(DrillBuf bytebuf, int start, int length, int scale) {
+  if (length <= 0) {
+    // if the length is somehow non-positive, interpret this as zero
+    //System.out.println("getBigDecimal forces 0 with start " + start + " len " + length);
+    try {
+      throw new Exception("hi there");
--- End diff --

Yes, I'll remove the friendly "hi there".  It was for debugging, I guess.


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
> Fix For: Future
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be cast to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321049#comment-16321049
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user daveoshinsky commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160791971
  
--- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java ---
@@ -539,7 +553,12 @@ public void setValueLengthSafe(int index, int length) {
 }
 
 
-public void setSafe(int index, int start, int end, DrillBuf buffer){
+<#if type.minor == "VarDecimal">
--- End diff --

Is this what you're talking about?  It looks OK to me.
<#if type.minor == "VarDecimal">
public void setSafe(int index, int start, int end, DrillBuf buffer, int scale)
<#else>
public void setSafe(int index, int start, int end, DrillBuf buffer)
</#if> <#-- type.minor -->


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
> Fix For: Future
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321043#comment-16321043
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user daveoshinsky commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160791435
  
--- Diff: exec/vector/src/main/codegen/templates/NullableValueVectors.java ---
@@ -327,13 +327,17 @@ public Mutator getMutator(){
 return v;
   }
 
-  public void copyFrom(int fromIndex, int thisIndex, Nullable${minor.class}Vector from){
+  protected void copyFromUnsafe(int fromIndex, int thisIndex, Nullable${minor.class}Vector from){
--- End diff --

I added a copyFromSafe() method elsewhere; it is a "safe copy" that should not throw an exception from overwriting the end of the buffer. Having done that, and realizing that copyFrom() is actually UNSAFE (it can throw the exception that the new copyFromSafe() is designed to avoid), I renamed this function for code clarity. It has been over a year since I wrote this code, and that is what I recall now.
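
A toy illustration of that naming distinction (not Drill's vector code; array-backed for brevity): the unsafe copy assumes capacity and can overrun, while the safe one grows the destination first.

{code}
import java.util.Arrays;

final class CopySketch {
  private byte[] data = new byte[16];
  private int writeIndex;

  /** "Unsafe": assumes capacity; can throw ArrayIndexOutOfBoundsException. */
  void copyFromUnsafe(byte[] src) {
    System.arraycopy(src, 0, data, writeIndex, src.length);
    writeIndex += src.length;
  }

  /** "Safe": reallocates before writing, so it never overruns. */
  void copyFromSafe(byte[] src) {
    if (writeIndex + src.length > data.length) {
      data = Arrays.copyOf(data, Math.max(data.length * 2, writeIndex + src.length));
    }
    copyFromUnsafe(src);
  }
}
{code}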


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
> Fix For: Future
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321036#comment-16321036
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user daveoshinsky commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160790684
  
--- Diff: exec/vector/src/main/codegen/templates/ComplexWriters.java ---
@@ -99,7 +99,7 @@ public void write(Nullable${minor.class?cap_first}Holder h) {
 
   <#if !(minor.class == "Decimal9" || minor.class == "Decimal18" || minor.class == "Decimal28Sparse" || minor.class == "Decimal38Sparse" || minor.class == "Decimal28Dense" || minor.class == "Decimal38Dense")>
   public void write${minor.class}(<#list fields as field>${field.type} ${field.name}<#if field_has_next>, </#if></#list>) {
-mutator.addSafe(idx(), <#list fields as field>${field.name}<#if field_has_next>, </#if></#list>);
+mutator.addSafe(idx(), <#list fields as field><#if field.name == "scale"><#break></#if>${field.name}<#if field_has_next && fields[field_index+1].name != "scale" >, </#if></#list>);
--- End diff --

I had a hard time understanding how this codegen stuff works, so some of the code I wrote reflects my struggle to understand how it is SUPPOSED to work. As I recall (it was quite a while ago), the generation needed to "break" out just after the "scale" argument to avoid compilation failures in the generated code, and that is why I coded it this way. What exact change are you suggesting? To AND the field.name == "scale" check with a check that minor.class == "VarDecimal"?
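
If it helps, here is roughly what the generated writer would look like under that reading (the shape below is an assumption for illustration, not the actual generated source): the writer still accepts the scale, but the template's <#break> keeps it out of the addSafe argument list.

{code}
// Illustrative only: hypothetical shape of the VarDecimal writer after codegen.
interface MutatorSketch {
  void addSafe(int index, int start, int end, Object buffer);
}

final class VarDecimalWriterSketch {
  private final MutatorSketch mutator;
  private final int idx;

  VarDecimalWriterSketch(MutatorSketch mutator, int idx) {
    this.mutator = mutator;
    this.idx = idx;
  }

  public void writeVarDecimal(int start, int end, Object buffer, int scale) {
    // "scale" is accepted but deliberately excluded from the addSafe call
    mutator.addSafe(idx, start, end, buffer);
  }
}
{code}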


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
> Fix For: Future
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321001#comment-16321001
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user daveoshinsky commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160786416
  
--- Diff: exec/java-exec/src/main/codegen/templates/StringOutputRecordWriter.java ---
@@ -146,7 +146,7 @@ public void writeField() throws IOException {
 // TODO: error check
 addField(fieldId, reader.readObject().toString());
 
-  <#elseif minor.class == "VarChar" || minor.class == "Var16Char" || minor.class == "VarBinary">
+  <#elseif minor.class == "VarChar" || minor.class == "Var16Char" || minor.class == "VarBinary" || minor.class == "VarDecimal">
--- End diff --

Yes, that works


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
> Fix For: Future
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320989#comment-16320989
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user daveoshinsky commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160785721
  
--- Diff: exec/java-exec/src/main/codegen/templates/SqlAccessors.java ---
@@ -127,6 +127,25 @@ public String getString(int index) {
 }
   <#break>
 
+<#case "VarDecimal">
+
+@Override
+public String getString(int index){
+<#if mode=="Nullable">
+if(ac.isNull(index)){
+  return null;
+}
+</#if>
+try {
--- End diff --

The getString method can throw NumberFormatException for BigDecimal, according to the javadoc: https://docs.oracle.com/javase/7/docs/api/java/math/BigDecimal.html. Should it just throw the exception? If yes, I'll remove the try/catch.
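
A sketch of the accessor with the try/catch removed, under that reading (names and byte layout assumed; a bad byte sequence then surfaces as an unchecked NumberFormatException rather than being swallowed):

{code}
import java.math.BigDecimal;
import java.math.BigInteger;

final class VarDecimalAccessorSketch {
  static String getString(byte[] unscaledBytes, int scale) {
    if (unscaledBytes == null) {
      return null;  // the Nullable-mode short circuit
    }
    // BigInteger(byte[]) throws NumberFormatException on an empty array;
    // with no try/catch it simply propagates to the caller.
    return new BigDecimal(new BigInteger(unscaledBytes), scale).toString();
  }
}
{code}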


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
> Fix For: Future
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320945#comment-16320945
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user daveoshinsky commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160779892
  
--- Diff: exec/java-exec/src/main/codegen/templates/Decimal/DecimalFunctions.java ---
@@ -102,7 +111,578 @@
 <#-- For each DECIMAL... type (in DecimalTypes.tdd) ... -->
 <#list comparisonTypesDecimal.decimalTypes as type>
 
-<#if type.name.endsWith("Sparse")>
+<#if type.name.endsWith("VarDecimal")>
+
+<@pp.changeOutputFile name="/org/apache/drill/exec/expr/fn/impl/${type.name}Functions.java" />
+
+<#include "/@includes/license.ftl" />
+
+package org.apache.drill.exec.expr.fn.impl;
+
+<#include "/@includes/vv_imports.ftl" />
+
+import org.apache.drill.exec.expr.DrillSimpleFunc;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate.NullHandling;
+import org.apache.drill.exec.expr.annotations.Output;
+import org.apache.drill.exec.expr.annotations.Param;
+import org.apache.drill.exec.expr.annotations.Workspace;
+import org.apache.drill.exec.expr.fn.FunctionGenerationHelper;
+import org.apache.drill.exec.expr.holders.*;
+import org.apache.drill.exec.record.RecordBatch;
+
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.DrillBuf;
+
+import java.nio.ByteBuffer;
+
+@SuppressWarnings("unused")
+public class ${type.name}Functions {
+private static void initBuffer(DrillBuf buffer) {
+// for VarDecimal, this method of setting initial size is actually only a very rough heuristic.
+int size = (${type.storage} * (org.apache.drill.exec.util.DecimalUtility.INTEGER_SIZE));
+buffer = buffer.reallocIfNeeded(size);
+ }
+
+@FunctionTemplate(name = "subtract", scope = FunctionTemplate.FunctionScope.DECIMAL_ADD_SCALE, nulls = NullHandling.NULL_IF_NULL)
--- End diff --

I will try adding that with checkPrecision=false (one size fits all precisions), e.g.:
@FunctionTemplate(name = "add",
scope = FunctionTemplate.FunctionScope.SIMPLE,
returnType = FunctionTemplate.ReturnType.DECIMAL_ADD_SCALE,
nulls = NullHandling.NULL_IF_NULL,
checkPrecisionRange = false)
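
The DECIMAL_ADD_SCALE behavior being relied on here can be checked in plain java.math.BigDecimal, which is the "one size fits all precisions" idea: add/subtract produce a result at scale max(s1, s2) and need no precision-range check.

{code}
import java.math.BigDecimal;

public class AddScaleSketch {
  public static void main(String[] args) {
    BigDecimal a = new BigDecimal("1.25"); // scale 2
    BigDecimal b = new BigDecimal("3.4");  // scale 1
    System.out.println(a.add(b));          // 4.65  (scale max(2, 1) = 2)
    System.out.println(a.subtract(b));     // -2.15
  }
}
{code}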



> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
> Fix For: Future
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320932#comment-16320932
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user daveoshinsky commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160777281
  
--- Diff: exec/java-exec/src/main/codegen/templates/Decimal/CastIntDecimal.java ---
@@ -68,15 +68,31 @@ public void setup() {
 
 public void eval() {
 out.scale = (int) scale.value;
+
+<#if !type.to.endsWith("VarDecimal")>
 out.precision = (int) precision.value;
+</#if>
 
-<#if type.to == "Decimal9" || type.to == "Decimal18">
+<#if type.to.endsWith("VarDecimal")>
+out.start = 0;
+out.buffer = buffer;
+String s = Long.toString((long)in.value);
+for (int i = 0; i < out.scale; ++i) {  // add 0's to get unscaled integer
+s += "0";
--- End diff --

Agreed
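
Assuming the agreement is to replace the string-append loop, the arithmetic equivalent is a single multiply by 10^scale (a sketch, not the committed code):

{code}
import java.math.BigDecimal;
import java.math.BigInteger;

public class CastIntDecimalSketch {
  // unscaled value for an integer cast to a decimal with the given scale
  static BigInteger unscaled(long value, int scale) {
    return BigInteger.valueOf(value).multiply(BigInteger.TEN.pow(scale));
  }

  public static void main(String[] args) {
    // 42 cast to scale 3: unscaled 42000, i.e. 42.000
    System.out.println(new BigDecimal(unscaled(42, 3), 3));
  }
}
{code}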


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
> Fix For: Future
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320930#comment-16320930
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

Github user daveoshinsky commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160776544
  
--- Diff: exec/java-exec/src/main/codegen/templates/Decimal/DecimalAggrTypeFunctions2.java ---
@@ -108,9 +108,12 @@ public void output() {
 out.buffer = buffer;
 out.start  = 0;
 out.scale = outputScale.value;
-out.precision = 38;
java.math.BigDecimal average = ((java.math.BigDecimal)(value.obj)).divide(java.math.BigDecimal.valueOf(count.value, 0), out.scale, java.math.BigDecimal.ROUND_HALF_UP);
+<#if type.inputType.contains("VarDecimal")>
--- End diff --

Yes, I will try "<#if !type.inputType.contains("VarDecimal")>"
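
The divide in the diff above, restated in isolation: the running sum is divided by the row count at the requested output scale with HALF_UP rounding (plain-BigDecimal sketch):

{code}
import java.math.BigDecimal;
import java.math.RoundingMode;

public class DecimalAvgSketch {
  public static void main(String[] args) {
    BigDecimal sum = new BigDecimal("10.00");
    long count = 3;
    int outScale = 4;
    BigDecimal average = sum.divide(BigDecimal.valueOf(count, 0), outScale, RoundingMode.HALF_UP);
    System.out.println(average); // 3.3333
  }
}
{code}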


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
> Fix For: Future
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be casted to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-6076) Reduce the default memory from a total of 13GB to 5GB

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320836#comment-16320836
 ] 

ASF GitHub Bot commented on DRILL-6076:
---

Github user priteshm commented on the issue:

https://github.com/apache/drill/pull/1086
  
@paul-rogers, @parthchandra, can you review/comment on this change?


> Reduce the default memory from a total of 13GB to 5GB
> -
>
> Key: DRILL-6076
> URL: https://issues.apache.org/jira/browse/DRILL-6076
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Critical
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, the default memory requirements for Drill are about 13GB, with the 
> following allocations:
> * 4GB Heap
> * 8GB Direct Memory
> * 1GB CodeCache
> * 512MB MaxPermSize
> Also, with Drill 1.12.0, the recommendation is to move to JDK8, which makes 
> the MaxPermSize as irrelevant.
> With that, the default requirements total to 13GB, which is rather high. This 
> is especially a problem for scenarios where people are trying out Drill and 
> might be using this in a development environment where 13GB is too high.
> When using the public [test 
> framework|https://github.com/mapr/drill-test-framework/] for Apache Drill, it 
> was observed that the framework's functional and unit tests passed 
> successfully with memory as little as 5GB; based on the following allocation:
> * 1GB Heap
> * 3GB Direct Memory
> * 512MB CodeCache
> * 512MB MaxPermSize
> Based on this finding, the proposal is to reduce the defaults from the 
> current settings to the values just mentioned above. The drill-env.sh file 
> already has details in the comments, along with the recommended values that 
> reflect the original 13GB defaults.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (DRILL-5833) Parquet reader fails with assertion error for Decimal9, Decimal18 types

2018-01-10 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers resolved DRILL-5833.

Resolution: Fixed

> Parquet reader fails with assertion error for Decimal9, Decimal18 types
> ---
>
> Key: DRILL-5833
> URL: https://issues.apache.org/jira/browse/DRILL-5833
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.13.0
>
>
> The {{TestParquetWriter.testDecimal()}} test recently failed. As it turns 
> out, this test never ran properly before against the "old" Parquet reader. 
> Because the {{store.parquet.use_new_reader}} was left at a previous value, 
> sometimes the test would run against the "new" reader (and succeed) or 
> against the "old" reader (and fail.)
> Once the test was forced to run against the "old" reader, it fails deep in 
> the Parquet record reader in 
> {{DrillParquetGroupConverter.getConverterForType()}}.
> The code attempts to create a Decimal9 writer by calling 
> {{SingleMapWriter.decimal9(String name)}} to create the writer. However, the 
> code in this method says:
> {code}
>   public Decimal9Writer decimal9(String name) {
> // returns existing writer
> final FieldWriter writer = fields.get(name.toLowerCase());
> assert writer != null;
> return writer;
>   }
> {code}
> And, indeed, the assertion is triggered.
> As it turns out, the code for Decimal28 shows the proper solution:
> {code}
> mapWriter.decimal28Sparse(name, metadata.getScale(), metadata.getPrecision())
> {code}
> That is, pass the scale and precision to this form of the method which 
> actually creates the writer:
> {code}
>   public Decimal9Writer decimal9(String name, int scale, int precision) {
> {code}
> Applying the same pattern for the Parquet Decimal9 and Decimal18 types 
> allows the above test to get past the asserts. Given this error, it is clear 
> that this test could never have run, and so the error in the Parquet reader 
> was never detected.
> It also turns out that the test itself is wrong, reversing the validation and 
> test queries:
> {code}
>   public void runTestAndValidate(String selection, String 
> validationSelection, String inputTable, String outputFile) throws Exception {
> try {
>   deleteTableIfExists(outputFile);
>   ...
>   // Query reads from the input (JSON) table
>   String query = String.format("SELECT %s FROM %s", selection, 
> inputTable);
>   String create = "CREATE TABLE " + outputFile + " AS " + query;
>   // validate query reads from the output (Parquet) table
>   String validateQuery = String.format("SELECT %s FROM " + outputFile, 
> validationSelection);
>   test(create);
>   testBuilder()
>   .unOrdered()
>   .sqlQuery(query) // Query under test is input query
>   .sqlBaselineQuery(validateQuery) // Baseline query is output query
>   .go();
> {code}
> Given this, it is the Parquet data that is wrong, not the baseline.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5833) Parquet reader fails with assertion error for Decimal9, Decimal18 types

2018-01-10 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320687#comment-16320687
 ] 

Paul Rogers commented on DRILL-5833:


Marked as resolved.

> Parquet reader fails with assertion error for Decimal9, Decimal18 types
> ---
>
> Key: DRILL-5833
> URL: https://issues.apache.org/jira/browse/DRILL-5833
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.13.0
>
>
> The {{TestParquetWriter.testDecimal()}} test recently failed. As it turns 
> out, this test never ran properly before against the "old" Parquet reader. 
> Because the {{store.parquet.use_new_reader}} was left at a previous value, 
> sometimes the test would run against the "new" reader (and succeed) or 
> against the "old" reader (and fail.)
> Once the test was forced to run against the "old" reader, it fails deep in 
> the Parquet record reader in 
> {{DrillParquetGroupConverter.getConverterForType()}}.
> The code attempts to create a Decimal9 writer by calling 
> {{SingleMapWriter.decimal9(String name)}} to create the writer. However, the 
> code in this method says:
> {code}
>   public Decimal9Writer decimal9(String name) {
> // returns existing writer
> final FieldWriter writer = fields.get(name.toLowerCase());
> assert writer != null;
> return writer;
>   }
> {code}
> And, indeed, the assertion is triggered.
> As it turns out, the code for Decimal28 shows the proper solution:
> {code}
> mapWriter.decimal28Sparse(name, metadata.getScale(), metadata.getPrecision())
> {code}
> That is, pass the scale and precision to this form of the method which 
> actually creates the writer:
> {code}
>   public Decimal9Writer decimal9(String name, int scale, int precision) {
> {code}
> Applying the same pattern for the Parquet Decimal9 and Decimal18 types 
> allows the above test to get past the asserts. Given this error, it is clear 
> that this test could never have run, and so the error in the Parquet reader 
> was never detected.
> It also turns out that the test itself is wrong, reversing the validation and 
> test queries:
> {code}
>   public void runTestAndValidate(String selection, String 
> validationSelection, String inputTable, String outputFile) throws Exception {
> try {
>   deleteTableIfExists(outputFile);
>   ...
>   // Query reads from the input (JSON) table
>   String query = String.format("SELECT %s FROM %s", selection, 
> inputTable);
>   String create = "CREATE TABLE " + outputFile + " AS " + query;
>   // validate query reads from the output (Parquet) table
>   String validateQuery = String.format("SELECT %s FROM " + outputFile, 
> validationSelection);
>   test(create);
>   testBuilder()
>   .unOrdered()
>   .sqlQuery(query) // Query under test is input query
>   .sqlBaselineQuery(validateQuery) // Baseline query is output query
>   .go();
> {code}
> Given this, it is the Parquet data that is wrong, not the baseline.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-6054) Issues in FindPartitionConditions

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320662#comment-16320662
 ] 

ASF GitHub Bot commented on DRILL-6054:
---

Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1078#discussion_r160741806
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/FindPartitionConditions.java ---
@@ -195,8 +195,16 @@ private void popOpStackAndBuildFilter() {
  * For all other operators we clear the children if one of the
  * children is a no push.
  */
-assert currentOp.getOp().getKind() == SqlKind.AND;
-newFilter = currentOp.getChildren().get(0);
+if (currentOp.getOp().getKind() == SqlKind.AND) {
+  newFilter = currentOp.getChildren().get(0);
+  for(OpState opState : opStack) {
--- End diff --

done.


> Issues in FindPartitionConditions
> -
>
> Key: DRILL-6054
> URL: https://issues.apache.org/jira/browse/DRILL-6054
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
> Fix For: 1.13.0
>
>
> When the condition is these cases, partition is not done correctly: 
> b = 3 OR (dir0 = 1 and a = 2)
> not (dir0 = 1 AND b = 2)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-6054) Issues in FindPartitionConditions

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320661#comment-16320661
 ] 

ASF GitHub Bot commented on DRILL-6054:
---

Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1078#discussion_r160741771
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/FindPartitionConditions.java ---
@@ -228,13 +236,16 @@ private boolean isHolisticExpression(RexCall call) {
 return false;
   }
 
+  protected boolean inputRefToPush(RexInputRef inputRef) {
--- End diff --

This is intentionally made 'protected' for future extension. Right now, FindPartitionConditions uses a position-based inputRef policy (a BitSet of dirs) to mark which inputRefs should be pushed. In the future, we may use a name-based policy to decide which ones to push.
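
A sketch of the position-based policy as described (the class shape and field names are assumed; only the BitSet-of-ordinals idea comes from the comment above):

{code}
import java.util.BitSet;

import org.apache.calcite.rex.RexInputRef;

abstract class PartitionPushPolicySketch {
  protected final BitSet dirs; // ordinals of partition (directory) columns

  PartitionPushPolicySketch(BitSet dirs) {
    this.dirs = dirs;
  }

  /** Position-based: push a reference only if its ordinal is marked. */
  protected boolean inputRefToPush(RexInputRef inputRef) {
    return dirs.get(inputRef.getIndex());
  }
}
{code}

A name-based subclass would override inputRefToPush and consult column names instead of ordinals, which is presumably the future extension meant here.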


> Issues in FindPartitionConditions
> -
>
> Key: DRILL-6054
> URL: https://issues.apache.org/jira/browse/DRILL-6054
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
> Fix For: 1.13.0
>
>
> When the condition is these cases, partition is not done correctly: 
> b = 3 OR (dir0 = 1 and a = 2)
> not (dir0 = 1 AND b = 2)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5879) Optimize "Like" operator

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320652#comment-16320652
 ] 

ASF GitHub Bot commented on DRILL-5879:
---

Github user sachouche closed the pull request at:

https://github.com/apache/drill/pull/1001


> Optimize "Like" operator
> 
>
> Key: DRILL-5879
> URL: https://issues.apache.org/jira/browse/DRILL-5879
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
> Environment: * 
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Minor
> Fix For: 1.13.0
>
>
> Query: select  from  where colA like '%a%' or colA like 
> '%xyz%';
> Improvement Opportunities
> # Avoid isAscii computation (full access of the input string) since we're 
> dealing with the same column twice
> # Optimize the "contains" for-loop 
> Implementation Details
> 1)
> * Added a new integer variable "asciiMode" to the VarCharHolder class
> * The default value is -1 which indicates this info is not known
> * Otherwise this value will be set to either 1 or 0 based on the string being 
> in ASCII mode or Unicode
> * The execution plan already shares the same VarCharHolder instance for all 
> evaluations of the same column value
> * The asciiMode will be correctly set during the first LIKE evaluation and 
> will be reused across other LIKE evaluations
> 2) 
> * The "Contains" LIKE operation is quite expensive as the code needs to 
> access the input string to perform character based comparisons
> * Created 4 versions of the same for-loop to a) make the loop simpler to 
> optimize (Vectorization) and b) minimize comparisons
> Benchmarks
> * Lineitem table 100GB
> * Query: select l_returnflag, count(*) from dfs.`` where l_comment 
> not like '%a%' or l_comment like '%the%' group by l_returnflag
> * Before changes: 33sec
> * After changes: 27sec



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5879) Optimize "Like" operator

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320651#comment-16320651
 ] 

ASF GitHub Bot commented on DRILL-5879:
---

Github user sachouche commented on the issue:

https://github.com/apache/drill/pull/1001
  
Created another pull request, #1072, to merge my changes with Padma's.


> Optimize "Like" operator
> 
>
> Key: DRILL-5879
> URL: https://issues.apache.org/jira/browse/DRILL-5879
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
> Environment: * 
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Minor
> Fix For: 1.13.0
>
>
> Query: select  from  where colA like '%a%' or colA like 
> '%xyz%';
> Improvement Opportunities
> # Avoid isAscii computation (full access of the input string) since we're 
> dealing with the same column twice
> # Optimize the "contains" for-loop 
> Implementation Details
> 1)
> * Added a new integer variable "asciiMode" to the VarCharHolder class
> * The default value is -1 which indicates this info is not known
> * Otherwise this value will be set to either 1 or 0 based on the string being 
> in ASCII mode or Unicode
> * The execution plan already shares the same VarCharHolder instance for all 
> evaluations of the same column value
> * The asciiMode will be correctly set during the first LIKE evaluation and 
> will be reused across other LIKE evaluations
> 2) 
> * The "Contains" LIKE operation is quite expensive as the code needs to 
> access the input string to perform character based comparisons
> * Created 4 versions of the same for-loop to a) make the loop simpler to 
> optimize (Vectorization) and b) minimize comparisons
> Benchmarks
> * Lineitem table 100GB
> * Query: select l_returnflag, count(*) from dfs.`` where l_comment 
> not like '%a%' or l_comment like '%the%' group by l_returnflag
> * Before changes: 33sec
> * After changes: 27sec



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-6079) Memory leak caused by ParquetRowGroupScan

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320624#comment-16320624
 ] 

ASF GitHub Bot commented on DRILL-6079:
---

Github user sachouche commented on the issue:

https://github.com/apache/drill/pull/1087
  
Thank you, Arina, for catching this; I created the commit for QA before vacation so that they could verify the fix. At that time, I didn't have an Apache JIRA. I have now updated the comment to reflect the JIRA id DRILL-6079, which is also the name of this remote branch.


> Memory leak caused by ParquetRowGroupScan
> -
>
> Key: DRILL-6079
> URL: https://issues.apache.org/jira/browse/DRILL-6079
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Minor
> Fix For: 1.13.0
>
>
> Concurrency tests with assertion enabled indicate a memory leak in the 
> Parquet scanner code:
> 2017-10-25 17:28:52,149 [260ed3bc-dbdf-8b4a-66d6-b2bd804e8c74:frag:11:63] 
> ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: 
> IllegalStateException: Memory was leaked by query. Memory leaked: (2097152)
> Allocator(op:11:63:6:ParquetRowGroupScan) 100/2097152/7348224/100 
> (res/actual/peak/limit)
> Fragment 11:63
> [Error Id: 5927becb-f000-43db-95b5-93b33064a6fd on mperf113.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Memory was leaked by query. Memory leaked: (2097152)
> Allocator(op:11:63:6:ParquetRowGroupScan) 100/2097152/7348224/100 
> (res/actual/peak/limit)
> Fragment 11:63
> [Error Id: 5927becb-f000-43db-95b5-93b33064a6fd on mperf113.qa.lab:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:586)
>  ~[drill-common-1.11.0-mapr.jar:1.11.0-mapr]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:301)
>  [drill-java-exec-1.11.0-mapr.jar:1.11.0-mapr]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160)
>  [drill-java-exec-1.11.0-mapr.jar:1.11.0-mapr]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:267)
>  [drill-java-exec-1.11.0-mapr.jar:1.11.0-mapr]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.11.0-mapr.jar:1.11.0-mapr]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_121]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_121]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]
> Caused by: java.lang.IllegalStateException: Memory was leaked by query. 
> Memory leaked: (2097152)
> Allocator(op:11:63:6:ParquetRowGroupScan) 100/2097152/7348224/100 
> (res/actual/peak/limit)
> at 
> org.apache.drill.exec.memory.BaseAllocator.close(BaseAllocator.java:520) 
> ~[drill-memory-base-1.11.0-mapr.jar:1.11.0-mapr]
> at 
> org.apache.drill.exec.ops.AbstractOperatorExecContext.close(AbstractOperatorExecContext.java:86)
>  ~[drill-java-exec-1.11.0-mapr.jar:1.11.0-mapr]
> at 
> org.apache.drill.exec.ops.OperatorContextImpl.close(OperatorContextImpl.java:108)
>  ~[drill-java-exec-1.11.0-mapr.jar:1.11.0-mapr]
> at 
> org.apache.drill.exec.ops.FragmentContext.suppressingClose(FragmentContext.java:435)
>  ~[drill-java-exec-1.11.0-mapr.jar:1.11.0-mapr]
> at 
> org.apache.drill.exec.ops.FragmentContext.close(FragmentContext.java:424) 
> ~[drill-java-exec-1.11.0-mapr.jar:1.11.0-mapr]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:324)
>  [drill-java-exec-1.11.0-mapr.jar:1.11.0-mapr]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:155)
>  [drill-java-exec-1.11.0-mapr.jar:1.11.0-mapr]
> ... 5 common frames omitted
> 2017-10-25 17:28:52,149 [260ed3bc-dbdf-8b4a-66d6-b2bd804e8c74:frag:6:0] ERROR 
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: 
> Memory was leaked by query. Memory leaked: (2097152)
> Allocator(op:6:0:5:ParquetRowGroupScan) 100/2097152/19947520/100 
> (res/actual/peak/limit)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)



[jira] [Assigned] (DRILL-4807) ORDER BY aggregate function in window definition results in AssertionError: Internal error: invariant violated: conversion result not null

2018-01-10 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-4807:
---

Assignee: Volodymyr Tkach

> ORDER BY aggregate function in window definition results in AssertionError: 
> Internal error: invariant violated: conversion result not null
> --
>
> Key: DRILL-4807
> URL: https://issues.apache.org/jira/browse/DRILL-4807
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.8.0, 1.10.0
>Reporter: Khurram Faraaz
>Assignee: Volodymyr Tkach
>  Labels: window_function
>
> This seems to be a problem when regular window function queries, when 
> aggregate function is used in ORDER BY clause inside the window definition.
> MapR Drill 1.8.0 commit ID : 34ca63ba
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> SELECT col0, SUM(col0) OVER ( PARTITION BY col7 
> ORDER BY MIN(col8)) avg_col0, col7 FROM `allTypsUniq.parquet` GROUP BY 
> col0,col8,col7;
> Error: SYSTEM ERROR: AssertionError: Internal error: invariant violated: 
> conversion result not null
> [Error Id: 19a3eced--4e83-ae0f-6b8ea21b2afd on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> SELECT col0, AVG(col0) OVER ( PARTITION BY col7 
> ORDER BY MIN(col8)) avg_col0, col7 FROM `allTypsUniq.parquet` GROUP BY 
> col0,col8,col7;
> Error: SYSTEM ERROR: AssertionError: Internal error: invariant violated: 
> conversion result not null
> [Error Id: c9b7ebf2-6097-41d8-bb73-d57da4ace8ad on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2016-07-26 09:26:16,717 [2868d347-3124-0c58-89ff-19e4ee891031:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 2868d347-3124-0c58-89ff-19e4ee891031: SELECT col0, AVG(col0) OVER ( PARTITION 
> BY col7 ORDER BY MIN(col8)) avg_col0, col7 FROM `allTypsUniq.parquet` GROUP 
> BY col0,col8,col7
> 2016-07-26 09:26:16,751 [2868d347-3124-0c58-89ff-19e4ee891031:foreman] ERROR 
> o.a.drill.exec.work.foreman.Foreman - SYSTEM ERROR: AssertionError: Internal 
> error: invariant violated: conversion result not null
> [Error Id: c9b7ebf2-6097-41d8-bb73-d57da4ace8ad on centos-01.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> AssertionError: Internal error: invariant violated: conversion result not null
> [Error Id: c9b7ebf2-6097-41d8-bb73-d57da4ace8ad on centos-01.qa.lab:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
>  ~[drill-common-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:791)
>  [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:901) 
> [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:271) 
> [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_101]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_101]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_101]
> Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected 
> exception during fragment initialization: Internal error: invariant violated: 
> conversion result not null
> ... 4 common frames omitted
> Caused by: java.lang.AssertionError: Internal error: invariant violated: 
> conversion result not null
> at org.apache.calcite.util.Util.newInternal(Util.java:777) 
> ~[calcite-core-1.4.0-drill-r14.jar:1.4.0-drill-r14]
> at org.apache.calcite.util.Util.permAssert(Util.java:885) 
> ~[calcite-core-1.4.0-drill-r14.jar:1.4.0-drill-r14]
> at 
> org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertExpression(SqlToRelConverter.java:4063)
>  ~[calcite-core-1.4.0-drill-r14.jar:1.4.0-drill-r14]
> at 
> org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertSortExpression(SqlToRelConverter.java:4080)
>  ~[calcite-core-1.4.0-drill-r14.jar:1.4.0-drill-r14]
> at 
> org.apache.calcite.sql2rel.SqlToRelConverter.convertOver(SqlToRelConverter.java:1783)
>  ~[calcite-core-1.4.0-drill-r14.jar:1.4.0-drill-r14]
> at 
> org.apache.calcite.sql2rel.SqlToRelConverter.access$1100(SqlToRelConverter.java:185)
>  ~[calcite-core-1.4.0-drill-r14.jar:1.4.0-drill-r14]
> at 
> 

[jira] [Commented] (DRILL-4185) UNION ALL involving empty directory on any side of union all results in Failed query

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320483#comment-16320483
 ] 

ASF GitHub Bot commented on DRILL-4185:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1083#discussion_r160712431
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSelection.java ---
@@ -424,6 +429,23 @@ public MetadataContext getMetaContext() {
 return metaContext;
   }
 
+  /**
+   * @return true if this file selectionRoot points to an empty directory, false otherwise
+   */
+  public boolean isEmptyDirectory() {
+return emptyDirectory;
+  }
+
+  /**
+   * Setting this as true allows to identify this as empty directory file selection
+   *
+   * @param emptyDirectory empty directory flag
+   */
+  public void setEmptyDirectory(boolean emptyDirectory) {
--- End diff --

Please use a setEmptyDirectory() setter without the flag parameter.
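
The requested change, as I read it (sketch): callers only ever mark a selection as empty, so the boolean parameter can be dropped.

{code}
final class FileSelectionSketch {
  private boolean emptyDirectory;

  /** Marks this selection as pointing at an empty directory. */
  public void setEmptyDirectory() {
    this.emptyDirectory = true;
  }

  public boolean isEmptyDirectory() {
    return emptyDirectory;
  }
}
{code}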


> UNION ALL involving empty directory on any side of union all results in 
> Failed query
> 
>
> Key: DRILL-4185
> URL: https://issues.apache.org/jira/browse/DRILL-4185
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.4.0
>Reporter: Khurram Faraaz
>Assignee: Vitalii Diravka
>
> UNION ALL query that involves an empty directory on either side of UNION ALL 
> operator results in FAILED query. We should return the results for the 
> non-empty side (input) of UNION ALL.
> Note that empty_DIR is an empty directory, the directory exists, but it has 
> no files in it. 
> Drill 1.4 git.commit.id=b9068117
> 4 node cluster on CentOS
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select columns[0] from empty_DIR UNION ALL 
> select cast(columns[0] as int) c1 from `testWindow.csv`;
> Error: VALIDATION ERROR: From line 1, column 24 to line 1, column 32: Table 
> 'empty_DIR' not found
> [Error Id: 5c024786-6703-4107-8a4a-16c96097be08 on centos-01.qa.lab:31010] 
> (state=,code=0)
> 0: jdbc:drill:schema=dfs.tmp> select cast(columns[0] as int) c1 from 
> `testWindow.csv` UNION ALL select columns[0] from empty_DIR;
> Error: VALIDATION ERROR: From line 1, column 90 to line 1, column 98: Table 
> 'empty_DIR' not found
> [Error Id: 58c98bc4-99df-425c-aa07-c8c5faec4748 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4185) UNION ALL involving empty directory on any side of union all results in Failed query

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320482#comment-16320482
 ] 

ASF GitHub Bot commented on DRILL-4185:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1083#discussion_r160711830
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/SchemalessScan.java ---
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.base;
+
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import com.google.common.base.Preconditions;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.physical.PhysicalOperatorSetupException;
+import org.apache.drill.exec.planner.logical.DynamicDrillTable;
+import org.apache.drill.exec.proto.CoordinationProtos;
+
+import java.util.List;
+
+/**
+ *  The type of scan operator, which allows to scan schemaless tables 
({@link DynamicDrillTable} with null selection)
+ */
+@JsonTypeName("schemaless-scan")
+public class SchemalessScan extends AbstractGroupScan implements SubScan {
+
+  private String selectionRoot;
--- End diff --

final


> UNION ALL involving empty directory on any side of union all results in 
> Failed query
> 
>
> Key: DRILL-4185
> URL: https://issues.apache.org/jira/browse/DRILL-4185
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.4.0
>Reporter: Khurram Faraaz
>Assignee: Vitalii Diravka
>
> UNION ALL query that involves an empty directory on either side of UNION ALL 
> operator results in FAILED query. We should return the results for the 
> non-empty side (input) of UNION ALL.
> Note that empty_DIR is an empty directory, the directory exists, but it has 
> no files in it. 
> Drill 1.4 git.commit.id=b9068117
> 4 node cluster on CentOS
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select columns[0] from empty_DIR UNION ALL 
> select cast(columns[0] as int) c1 from `testWindow.csv`;
> Error: VALIDATION ERROR: From line 1, column 24 to line 1, column 32: Table 
> 'empty_DIR' not found
> [Error Id: 5c024786-6703-4107-8a4a-16c96097be08 on centos-01.qa.lab:31010] 
> (state=,code=0)
> 0: jdbc:drill:schema=dfs.tmp> select cast(columns[0] as int) c1 from 
> `testWindow.csv` UNION ALL select columns[0] from empty_DIR;
> Error: VALIDATION ERROR: From line 1, column 90 to line 1, column 98: Table 
> 'empty_DIR' not found
> [Error Id: 58c98bc4-99df-425c-aa07-c8c5faec4748 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4185) UNION ALL involving empty directory on any side of union all results in Failed query

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320481#comment-16320481
 ] 

ASF GitHub Bot commented on DRILL-4185:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1083#discussion_r160717181
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatPlugin.java
 ---
@@ -250,20 +250,12 @@ private boolean metaDataFileExists(FileSystem fs, 
FileStatus dir) throws IOExcep
 }
 
 boolean isDirReadable(DrillFileSystem fs, FileStatus dir) {
-  Path p = new Path(dir.getPath(), 
ParquetFileWriter.PARQUET_METADATA_FILE);
   try {
-if (fs.exists(p)) {
-  return true;
-} else {
-
-  if (metaDataFileExists(fs, dir)) {
-return true;
-  }
-  List<FileStatus> statuses = DrillFileSystemUtil.listFiles(fs, dir.getPath(), false);
-  return !statuses.isEmpty() && super.isFileReadable(fs, statuses.get(0));
-}
+// There should be at least one file, which is readable by Drill
+List<FileStatus> statuses = DrillFileSystemUtil.listFiles(fs, dir.getPath(), false);
+return !statuses.isEmpty() && super.isFileReadable(fs, statuses.get(0));
--- End diff --

1. How do we know that we have a file in a nested directory if the filter checks 
with the recursive flag set to false?
2. The `isReadable` method uses the implicit assumption that if a metadata file 
exists, there is some data in the folder. Plus, the comment in this file also points 
out that the same logic is used in `isDirReadable`. Since you are removing this 
logic from `isDirReadable`, please make sure the logic is synced between the two 
methods, or at least that the comment is updated to reflect the new approach.



> UNION ALL involving empty directory on any side of union all results in 
> Failed query
> 
>
> Key: DRILL-4185
> URL: https://issues.apache.org/jira/browse/DRILL-4185
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.4.0
>Reporter: Khurram Faraaz
>Assignee: Vitalii Diravka
>
> UNION ALL query that involves an empty directory on either side of UNION ALL 
> operator results in FAILED query. We should return the results for the 
> non-empty side (input) of UNION ALL.
> Note that empty_DIR is an empty directory, the directory exists, but it has 
> no files in it. 
> Drill 1.4 git.commit.id=b9068117
> 4 node cluster on CentOS
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select columns[0] from empty_DIR UNION ALL 
> select cast(columns[0] as int) c1 from `testWindow.csv`;
> Error: VALIDATION ERROR: From line 1, column 24 to line 1, column 32: Table 
> 'empty_DIR' not found
> [Error Id: 5c024786-6703-4107-8a4a-16c96097be08 on centos-01.qa.lab:31010] 
> (state=,code=0)
> 0: jdbc:drill:schema=dfs.tmp> select cast(columns[0] as int) c1 from 
> `testWindow.csv` UNION ALL select columns[0] from empty_DIR;
> Error: VALIDATION ERROR: From line 1, column 90 to line 1, column 98: Table 
> 'empty_DIR' not found
> [Error Id: 58c98bc4-99df-425c-aa07-c8c5faec4748 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4185) UNION ALL involving empty directory on any side of union all results in Failed query

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320286#comment-16320286
 ] 

ASF GitHub Bot commented on DRILL-4185:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/1083#discussion_r160666274
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/BaseTestQuery.java ---
@@ -119,6 +125,15 @@ public static void setupDefaultTestCluster() throws 
Exception {
 // turns on the verbose errors in tests
// server side stacktraces are added to the message before sending back 
to the client
 test("ALTER SESSION SET `exec.errors.verbose` = true");
+emptyDirCreating();
+  }
+
+  /**
+   * Creates an empty directory under dfs.root schema.
+   */
+  private static void emptyDirCreating() {
--- End diff --

the method is deleted due to the previous comment


> UNION ALL involving empty directory on any side of union all results in 
> Failed query
> 
>
> Key: DRILL-4185
> URL: https://issues.apache.org/jira/browse/DRILL-4185
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.4.0
>Reporter: Khurram Faraaz
>Assignee: Vitalii Diravka
>
> UNION ALL query that involves an empty directory on either side of UNION ALL 
> operator results in FAILED query. We should return the results for the 
> non-empty side (input) of UNION ALL.
> Note that empty_DIR is an empty directory, the directory exists, but it has 
> no files in it. 
> Drill 1.4 git.commit.id=b9068117
> 4 node cluster on CentOS
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select columns[0] from empty_DIR UNION ALL 
> select cast(columns[0] as int) c1 from `testWindow.csv`;
> Error: VALIDATION ERROR: From line 1, column 24 to line 1, column 32: Table 
> 'empty_DIR' not found
> [Error Id: 5c024786-6703-4107-8a4a-16c96097be08 on centos-01.qa.lab:31010] 
> (state=,code=0)
> 0: jdbc:drill:schema=dfs.tmp> select cast(columns[0] as int) c1 from 
> `testWindow.csv` UNION ALL select columns[0] from empty_DIR;
> Error: VALIDATION ERROR: From line 1, column 90 to line 1, column 98: Table 
> 'empty_DIR' not found
> [Error Id: 58c98bc4-99df-425c-aa07-c8c5faec4748 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4185) UNION ALL involving empty directory on any side of union all results in Failed query

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320287#comment-16320287
 ] 

ASF GitHub Bot commented on DRILL-4185:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/1083#discussion_r160663899
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/TestUnionAll.java 
---
@@ -1197,4 +1197,64 @@ public void testFieldWithDots() throws Exception {
   .baselineValues("1", "2", "1", null, "a")
   .go();
   }
-}
\ No newline at end of file
+
+  @Test
+  public void testUnionAllRightEmptyDir() throws Exception {
--- End diff --

1. Union works fine too. I have added similar tests to the TestUnionDistinct.java 
class. Thanks.
2. Test case is added. The result is the same as for querying a single empty 
dir.
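
For reference, a minimal sketch of what such a test can look like with the test 
builder used elsewhere in this PR (the table name and baseline value here are 
illustrative, not the actual test data):
```java
@Test
public void testUnionAllRightEmptyDir() throws Exception {
  testBuilder()
      .sqlQuery("SELECT key FROM cp.`sample.json` UNION ALL SELECT key FROM dfs.`%s`", EMPTY_DIR_NAME)
      .unOrdered()
      .baselineColumns("key")
      .baselineValues(1L)   // only the non-empty side contributes rows
      .go();
}
```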


> UNION ALL involving empty directory on any side of union all results in 
> Failed query
> 
>
> Key: DRILL-4185
> URL: https://issues.apache.org/jira/browse/DRILL-4185
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.4.0
>Reporter: Khurram Faraaz
>Assignee: Vitalii Diravka
>
> UNION ALL query that involves an empty directory on either side of UNION ALL 
> operator results in FAILED query. We should return the results for the 
> non-empty side (input) of UNION ALL.
> Note that empty_DIR is an empty directory, the directory exists, but it has 
> no files in it. 
> Drill 1.4 git.commit.id=b9068117
> 4 node cluster on CentOS
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select columns[0] from empty_DIR UNION ALL 
> select cast(columns[0] as int) c1 from `testWindow.csv`;
> Error: VALIDATION ERROR: From line 1, column 24 to line 1, column 32: Table 
> 'empty_DIR' not found
> [Error Id: 5c024786-6703-4107-8a4a-16c96097be08 on centos-01.qa.lab:31010] 
> (state=,code=0)
> 0: jdbc:drill:schema=dfs.tmp> select cast(columns[0] as int) c1 from 
> `testWindow.csv` UNION ALL select columns[0] from empty_DIR;
> Error: VALIDATION ERROR: From line 1, column 90 to line 1, column 98: Table 
> 'empty_DIR' not found
> [Error Id: 58c98bc4-99df-425c-aa07-c8c5faec4748 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4185) UNION ALL involving empty directory on any side of union all results in Failed query

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320284#comment-16320284
 ] 

ASF GitHub Bot commented on DRILL-4185:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/1083#discussion_r160663507
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/RefreshMetadataHandler.java
 ---
@@ -78,19 +78,24 @@ public PhysicalPlan getPlan(SqlNode sqlNode) throws 
ValidationException, RelConv
 
   final Table table = schema.getTable(tableName);
 
-  if(table == null){
+  if(table == null) {
--- End diff --

Done


> UNION ALL involving empty directory on any side of union all results in 
> Failed query
> 
>
> Key: DRILL-4185
> URL: https://issues.apache.org/jira/browse/DRILL-4185
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.4.0
>Reporter: Khurram Faraaz
>Assignee: Vitalii Diravka
>
> UNION ALL query that involves an empty directory on either side of UNION ALL 
> operator results in FAILED query. We should return the results for the 
> non-empty side (input) of UNION ALL.
> Note that empty_DIR is an empty directory, the directory exists, but it has 
> no files in it. 
> Drill 1.4 git.commit.id=b9068117
> 4 node cluster on CentOS
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select columns[0] from empty_DIR UNION ALL 
> select cast(columns[0] as int) c1 from `testWindow.csv`;
> Error: VALIDATION ERROR: From line 1, column 24 to line 1, column 32: Table 
> 'empty_DIR' not found
> [Error Id: 5c024786-6703-4107-8a4a-16c96097be08 on centos-01.qa.lab:31010] 
> (state=,code=0)
> 0: jdbc:drill:schema=dfs.tmp> select cast(columns[0] as int) c1 from 
> `testWindow.csv` UNION ALL select columns[0] from empty_DIR;
> Error: VALIDATION ERROR: From line 1, column 90 to line 1, column 98: Table 
> 'empty_DIR' not found
> [Error Id: 58c98bc4-99df-425c-aa07-c8c5faec4748 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4185) UNION ALL involving empty directory on any side of union all results in Failed query

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320289#comment-16320289
 ] 

ASF GitHub Bot commented on DRILL-4185:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/1083#discussion_r160666020
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/BaseTestQuery.java ---
@@ -119,6 +125,15 @@ public static void setupDefaultTestCluster() throws 
Exception {
 // turns on the verbose errors in tests
// server side stacktraces are added to the message before sending back 
to the client
 test("ALTER SESSION SET `exec.errors.verbose` = true");
+emptyDirCreating();
--- End diff --

Agree.
I have changed it. The empty directory is now created in the scope of a single 
test. But to avoid duplicating code, when a class has several test cases that 
use an empty dir, it is created once for the entire class.
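
A minimal JUnit sketch of the once-per-class setup described above (the class 
and path names are illustrative, not the actual BaseTestQuery members):
```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.junit.BeforeClass;

public class TestEmptyDirQueries {
  private static Path emptyDir;

  // The empty directory is created once for the entire test class,
  // rather than inside every individual test method.
  @BeforeClass
  public static void createEmptyDir() throws IOException {
    emptyDir = Files.createDirectories(Paths.get("target", "empty_dir"));
  }
}
```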


> UNION ALL involving empty directory on any side of union all results in 
> Failed query
> 
>
> Key: DRILL-4185
> URL: https://issues.apache.org/jira/browse/DRILL-4185
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.4.0
>Reporter: Khurram Faraaz
>Assignee: Vitalii Diravka
>
> UNION ALL query that involves an empty directory on either side of UNION ALL 
> operator results in FAILED query. We should return the results for the 
> non-empty side (input) of UNION ALL.
> Note that empty_DIR is an empty directory, the directory exists, but it has 
> no files in it. 
> Drill 1.4 git.commit.id=b9068117
> 4 node cluster on CentOS
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select columns[0] from empty_DIR UNION ALL 
> select cast(columns[0] as int) c1 from `testWindow.csv`;
> Error: VALIDATION ERROR: From line 1, column 24 to line 1, column 32: Table 
> 'empty_DIR' not found
> [Error Id: 5c024786-6703-4107-8a4a-16c96097be08 on centos-01.qa.lab:31010] 
> (state=,code=0)
> 0: jdbc:drill:schema=dfs.tmp> select cast(columns[0] as int) c1 from 
> `testWindow.csv` UNION ALL select columns[0] from empty_DIR;
> Error: VALIDATION ERROR: From line 1, column 90 to line 1, column 98: Table 
> 'empty_DIR' not found
> [Error Id: 58c98bc4-99df-425c-aa07-c8c5faec4748 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4185) UNION ALL involving empty directory on any side of union all results in Failed query

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320285#comment-16320285
 ] 

ASF GitHub Bot commented on DRILL-4185:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/1083#discussion_r160663414
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/SchemalessScan.java
 ---
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.base;
+
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import com.google.common.base.Preconditions;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.physical.PhysicalOperatorSetupException;
+import org.apache.drill.exec.planner.logical.DynamicDrillTable;
+import org.apache.drill.exec.proto.CoordinationProtos;
+
+import java.util.List;
+
+/**
+ *  The type of scan operator, which allows to scan schemaless tables 
({@link DynamicDrillTable} with null selection)
+ */
+@JsonTypeName("schemaless-scan")
+public class SchemalessScan extends AbstractGroupScan implements SubScan {
+
+  public SchemalessScan(String userName) {
+super(userName);
+  }
+
+  public SchemalessScan(AbstractGroupScan that) {
+super(that);
+  }
+
+  @Override
+  public void applyAssignments(List<CoordinationProtos.DrillbitEndpoint> endpoints) throws PhysicalOperatorSetupException {
+  }
+
+  @Override
+  public SubScan getSpecificScan(int minorFragmentId) throws 
ExecutionSetupException {
+return this;
+  }
+
+  @Override
+  public int getMaxParallelizationWidth() {
+return 1;
+  }
+
+  @Override
+  public String getDigest() {
+return toString();
--- End diff --

Thanks. Forgot to override this.

For now toString() is overridden. And to show the selectionRoot I have replaced 
the null file selection with the current FileSelection (with the `emptyDirectory` 
flag, which indicates whether this FileSelection is an empty directory).

Profiles -> Physical Plan shows the following info:
`Scan(groupscan=[SchemalessScan [selectionRoot = 
file:/home/vitalii/Documents/parquet_for_union/folder2]]) : rowType = 
(DrillRecordRow[*, value]): rowcount = 1.0, cumulative cost = {0.0 rows, 0.0 
cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 334`
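
A minimal sketch of the SchemalessScan overrides being described, inferred from 
the plan output above (the exact string format is an assumption, not copied from 
the patch):
```java
@Override
public String getDigest() {
  // Delegate to toString() so the digest carries the selectionRoot.
  return toString();
}

@Override
public String toString() {
  // Format assumed from the quoted plan output above.
  return "SchemalessScan [selectionRoot = " + selectionRoot + "]";
}
```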


> UNION ALL involving empty directory on any side of union all results in 
> Failed query
> 
>
> Key: DRILL-4185
> URL: https://issues.apache.org/jira/browse/DRILL-4185
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.4.0
>Reporter: Khurram Faraaz
>Assignee: Vitalii Diravka
>
> UNION ALL query that involves an empty directory on either side of UNION ALL 
> operator results in FAILED query. We should return the results for the 
> non-empty side (input) of UNION ALL.
> Note that empty_DIR is an empty directory, the directory exists, but it has 
> no files in it. 
> Drill 1.4 git.commit.id=b9068117
> 4 node cluster on CentOS
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select columns[0] from empty_DIR UNION ALL 
> select cast(columns[0] as int) c1 from `testWindow.csv`;
> Error: VALIDATION ERROR: From line 1, column 24 to line 1, column 32: Table 
> 'empty_DIR' not found
> [Error Id: 5c024786-6703-4107-8a4a-16c96097be08 on centos-01.qa.lab:31010] 
> (state=,code=0)
> 0: jdbc:drill:schema=dfs.tmp> select cast(columns[0] as int) c1 from 
> `testWindow.csv` UNION ALL select columns[0] from empty_DIR;
> Error: VALIDATION ERROR: From line 1, column 90 to line 1, column 98: Table 
> 'empty_DIR' not found
> [Error Id: 58c98bc4-99df-425c-aa07-c8c5faec4748 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DRILL-4185) UNION ALL involving empty directory on any side of union all results in Failed query

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320290#comment-16320290
 ] 

ASF GitHub Bot commented on DRILL-4185:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/1083#discussion_r160685002
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatPlugin.java
 ---
@@ -250,20 +250,12 @@ private boolean metaDataFileExists(FileSystem fs, 
FileStatus dir) throws IOExcep
 }
 
 boolean isDirReadable(DrillFileSystem fs, FileStatus dir) {
-  Path p = new Path(dir.getPath(), 
ParquetFileWriter.PARQUET_METADATA_FILE);
   try {
-if (fs.exists(p)) {
-  return true;
-} else {
-
-  if (metaDataFileExists(fs, dir)) {
-return true;
-  }
-  List<FileStatus> statuses = DrillFileSystemUtil.listFiles(fs, dir.getPath(), false);
-  return !statuses.isEmpty() && super.isFileReadable(fs, statuses.get(0));
-}
+// There should be at least one file, which is readable by Drill
+List<FileStatus> statuses = DrillFileSystemUtil.listFiles(fs, dir.getPath(), false);
+return !statuses.isEmpty() && super.isFileReadable(fs, statuses.get(0));
--- End diff --

I did it on purpose. With the old logic of the isDirReadable() method, an empty 
directory which contains parquet metadata files will be processed with 
ParquetGroupScan as a Parquet table. That leads to an exception:

https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java#L878

To process such a table with SchemalessScan, the isReadable method should return 
false for that case. In other words, it shouldn't check the availability of 
metadata cache files, but only files that are really readable by Drill.
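
For readers of the thread, a hedged reconstruction of the new check inside the 
Parquet format matcher (the try/catch closure is assumed, since the diff hunk 
above is truncated):
```java
boolean isDirReadable(DrillFileSystem fs, FileStatus dir) {
  try {
    // Count only files Drill can actually read. A directory holding nothing
    // but metadata cache files thus looks empty and is routed to
    // SchemalessScan instead of ParquetGroupScan.
    List<FileStatus> statuses = DrillFileSystemUtil.listFiles(fs, dir.getPath(), false);
    return !statuses.isEmpty() && super.isFileReadable(fs, statuses.get(0));
  } catch (IOException e) {
    return false;
  }
}
```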


> UNION ALL involving empty directory on any side of union all results in 
> Failed query
> 
>
> Key: DRILL-4185
> URL: https://issues.apache.org/jira/browse/DRILL-4185
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.4.0
>Reporter: Khurram Faraaz
>Assignee: Vitalii Diravka
>
> UNION ALL query that involves an empty directory on either side of UNION ALL 
> operator results in FAILED query. We should return the results for the 
> non-empty side (input) of UNION ALL.
> Note that empty_DIR is an empty directory, the directory exists, but it has 
> no files in it. 
> Drill 1.4 git.commit.id=b9068117
> 4 node cluster on CentOS
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select columns[0] from empty_DIR UNION ALL 
> select cast(columns[0] as int) c1 from `testWindow.csv`;
> Error: VALIDATION ERROR: From line 1, column 24 to line 1, column 32: Table 
> 'empty_DIR' not found
> [Error Id: 5c024786-6703-4107-8a4a-16c96097be08 on centos-01.qa.lab:31010] 
> (state=,code=0)
> 0: jdbc:drill:schema=dfs.tmp> select cast(columns[0] as int) c1 from 
> `testWindow.csv` UNION ALL select columns[0] from empty_DIR;
> Error: VALIDATION ERROR: From line 1, column 90 to line 1, column 98: Table 
> 'empty_DIR' not found
> [Error Id: 58c98bc4-99df-425c-aa07-c8c5faec4748 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4185) UNION ALL involving empty directory on any side of union all results in Failed query

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320283#comment-16320283
 ] 

ASF GitHub Bot commented on DRILL-4185:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/1083#discussion_r160663989
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/TestEmptyInputSql.java ---
@@ -177,4 +177,33 @@ public void testQueryEmptyCsv() throws Exception {
 .run();
   }
 
+  @Test
+  public void testEmptyDirectory() throws Exception {
+final BatchSchema expectedSchema = new SchemaBuilder().build();
+
+testBuilder()
+.sqlQuery("select * from dfs.`%s`", EMPTY_DIR_NAME)
+.schemaBaseLine(expectedSchema)
+.build()
+.run();
+  }
+
+  @Test
+  public void testEmptyDirectoryAndFieldInQuery() throws Exception {
+final List<Pair<SchemaPath, TypeProtos.MajorType>> expectedSchema = Lists.newArrayList();
+final TypeProtos.MajorType majorType = TypeProtos.MajorType.newBuilder()
+.setMinorType(TypeProtos.MinorType.INT) // field "key" is absent in schema-less table
--- End diff --

Done


> UNION ALL involving empty directory on any side of union all results in 
> Failed query
> 
>
> Key: DRILL-4185
> URL: https://issues.apache.org/jira/browse/DRILL-4185
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.4.0
>Reporter: Khurram Faraaz
>Assignee: Vitalii Diravka
>
> UNION ALL query that involves an empty directory on either side of UNION ALL 
> operator results in FAILED query. We should return the results for the 
> non-empty side (input) of UNION ALL.
> Note that empty_DIR is an empty directory, the directory exists, but it has 
> no files in it. 
> Drill 1.4 git.commit.id=b9068117
> 4 node cluster on CentOS
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select columns[0] from empty_DIR UNION ALL 
> select cast(columns[0] as int) c1 from `testWindow.csv`;
> Error: VALIDATION ERROR: From line 1, column 24 to line 1, column 32: Table 
> 'empty_DIR' not found
> [Error Id: 5c024786-6703-4107-8a4a-16c96097be08 on centos-01.qa.lab:31010] 
> (state=,code=0)
> 0: jdbc:drill:schema=dfs.tmp> select cast(columns[0] as int) c1 from 
> `testWindow.csv` UNION ALL select columns[0] from empty_DIR;
> Error: VALIDATION ERROR: From line 1, column 90 to line 1, column 98: Table 
> 'empty_DIR' not found
> [Error Id: 58c98bc4-99df-425c-aa07-c8c5faec4748 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4185) UNION ALL involving empty directory on any side of union all results in Failed query

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320288#comment-16320288
 ] 

ASF GitHub Bot commented on DRILL-4185:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/1083#discussion_r160664191
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetGroupScan.java
 ---
@@ -56,65 +56,50 @@ private void prepareTables(final String tableName, 
boolean refreshMetadata) thro
   public void testFix4376() throws Exception {
 prepareTables("4376_1", true);
 
-testBuilder()
-  .sqlQuery("SELECT COUNT(*) AS `count` FROM dfs.tmp.`4376_1/60*`")
-  .ordered()
-  .baselineColumns("count").baselineValues(1984L)
-  .go();
+int actualRecordCount = testSql("SELECT * FROM dfs.tmp.`4376_1/60*`");
+int expectedRecordCount = 1984;
+assertEquals(String.format("Received unexpected number of rows in 
output: expected=%d, received=%s",
--- End diff --

Done


> UNION ALL involving empty directory on any side of union all results in 
> Failed query
> 
>
> Key: DRILL-4185
> URL: https://issues.apache.org/jira/browse/DRILL-4185
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.4.0
>Reporter: Khurram Faraaz
>Assignee: Vitalii Diravka
>
> UNION ALL query that involves an empty directory on either side of UNION ALL 
> operator results in FAILED query. We should return the results for the 
> non-empty side (input) of UNION ALL.
> Note that empty_DIR is an empty directory, the directory exists, but it has 
> no files in it. 
> Drill 1.4 git.commit.id=b9068117
> 4 node cluster on CentOS
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select columns[0] from empty_DIR UNION ALL 
> select cast(columns[0] as int) c1 from `testWindow.csv`;
> Error: VALIDATION ERROR: From line 1, column 24 to line 1, column 32: Table 
> 'empty_DIR' not found
> [Error Id: 5c024786-6703-4107-8a4a-16c96097be08 on centos-01.qa.lab:31010] 
> (state=,code=0)
> 0: jdbc:drill:schema=dfs.tmp> select cast(columns[0] as int) c1 from 
> `testWindow.csv` UNION ALL select columns[0] from empty_DIR;
> Error: VALIDATION ERROR: From line 1, column 90 to line 1, column 98: Table 
> 'empty_DIR' not found
> [Error Id: 58c98bc4-99df-425c-aa07-c8c5faec4748 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (DRILL-6054) Issues in FindPartitionConditions

2018-01-10 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-6054:
---

Assignee: Arina Ielchiieva  (was: Chunhui Shi)

> Issues in FindPartitionConditions
> -
>
> Key: DRILL-6054
> URL: https://issues.apache.org/jira/browse/DRILL-6054
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Chunhui Shi
>Assignee: Arina Ielchiieva
> Fix For: 1.13.0
>
>
> When the condition is these cases, partition is not done correctly: 
> b = 3 OR (dir0 = 1 and a = 2)
> not (dir0 = 1 AND b = 2)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-1491) Support for JDK 8

2018-01-10 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-1491:

Fix Version/s: (was: Future)
   1.13.0

> Support for JDK 8
> -
>
> Key: DRILL-1491
> URL: https://issues.apache.org/jira/browse/DRILL-1491
> Project: Apache Drill
>  Issue Type: Task
>  Components: Tools, Build & Test
>Reporter: Aditya Kishore
> Fix For: 1.13.0
>
> Attachments: DRILL-1491.1.patch.txt
>
>
> This will be the umbrella JIRA used to track and fix issues with JDK 8 
> support.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-1491) Support for JDK 8

2018-01-10 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-1491:

Priority: Blocker  (was: Major)

> Support for JDK 8
> -
>
> Key: DRILL-1491
> URL: https://issues.apache.org/jira/browse/DRILL-1491
> Project: Apache Drill
>  Issue Type: Task
>  Components: Tools, Build & Test
>Reporter: Aditya Kishore
>Priority: Blocker
> Fix For: 1.13.0
>
> Attachments: DRILL-1491.1.patch.txt
>
>
> This will be the umbrella JIRA used to track and fix issues with JDK 8 
> support.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-6025) Execution time of a running query shown as 'NOT AVAILABLE'

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320184#comment-16320184
 ] 

ASF GitHub Bot commented on DRILL-6025:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1074
  
@prasadns14 please resolve the conflicts.


> Execution time of a running query shown as 'NOT AVAILABLE'
> --
>
> Key: DRILL-6025
> URL: https://issues.apache.org/jira/browse/DRILL-6025
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - HTTP
>Affects Versions: 1.11.0
>Reporter: Prasad Nagaraj Subramanya
>Assignee: Prasad Nagaraj Subramanya
>
> When a query is in 'RUNNING' state, the execution time is shown as 'NOT 
> AVAILABLE'
> We could show the execution duration till the current time



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-6054) Issues in FindPartitionConditions

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320180#comment-16320180
 ] 

ASF GitHub Bot commented on DRILL-6054:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1078#discussion_r160665449
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/FindPartitionConditions.java
 ---
@@ -195,8 +195,16 @@ private void popOpStackAndBuildFilter() {
  * For all other operators we clear the children if one of the
  * children is a no push.
  */
-assert currentOp.getOp().getKind() == SqlKind.AND;
-newFilter = currentOp.getChildren().get(0);
+if (currentOp.getOp().getKind() == SqlKind.AND) {
+  newFilter = currentOp.getChildren().get(0);
+  for(OpState opState : opStack) {
--- End diff --

Please add space: `for (`


> Issues in FindPartitionConditions
> -
>
> Key: DRILL-6054
> URL: https://issues.apache.org/jira/browse/DRILL-6054
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
> Fix For: 1.13.0
>
>
> When the condition is these cases, partition is not done correctly: 
> b = 3 OR (dir0 = 1 and a = 2)
> not (dir0 = 1 AND b = 2)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-6054) Issues in FindPartitionConditions

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320179#comment-16320179
 ] 

ASF GitHub Bot commented on DRILL-6054:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1078#discussion_r160664549
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/FindPartitionConditions.java
 ---
@@ -228,13 +236,16 @@ private boolean isHolisticExpression(RexCall call) {
 return false;
   }
 
+  protected boolean inputRefToPush(RexInputRef inputRef) {
--- End diff --

Can be made private. Should be placed under public methods.


> Issues in FindPartitionConditions
> -
>
> Key: DRILL-6054
> URL: https://issues.apache.org/jira/browse/DRILL-6054
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
> Fix For: 1.13.0
>
>
> When the condition is these cases, partition is not done correctly: 
> b = 3 OR (dir0 = 1 and a = 2)
> not (dir0 = 1 AND b = 2)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-6054) Issues in FindPartitionConditions

2018-01-10 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6054:

Labels:   (was: ready-to-commit)

> Issues in FindPartitionConditions
> -
>
> Key: DRILL-6054
> URL: https://issues.apache.org/jira/browse/DRILL-6054
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
> Fix For: 1.13.0
>
>
> When the condition is these cases, partition is not done correctly: 
> b = 3 OR (dir0 = 1 and a = 2)
> not (dir0 = 1 AND b = 2)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-6054) Issues in FindPartitionConditions

2018-01-10 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6054:

Reviewer: Arina Ielchiieva

> Issues in FindPartitionConditions
> -
>
> Key: DRILL-6054
> URL: https://issues.apache.org/jira/browse/DRILL-6054
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
> Fix For: 1.13.0
>
>
> When the condition is these cases, partition is not done correctly: 
> b = 3 OR (dir0 = 1 and a = 2)
> not (dir0 = 1 AND b = 2)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-6054) Issues in FindPartitionConditions

2018-01-10 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6054:

Affects Version/s: 1.12.0

> Issues in FindPartitionConditions
> -
>
> Key: DRILL-6054
> URL: https://issues.apache.org/jira/browse/DRILL-6054
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> When the condition is these cases, partition is not done correctly: 
> b = 3 OR (dir0 = 1 and a = 2)
> not (dir0 = 1 AND b = 2)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-6054) Issues in FindPartitionConditions

2018-01-10 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6054:

Fix Version/s: 1.13.0

> Issues in FindPartitionConditions
> -
>
> Key: DRILL-6054
> URL: https://issues.apache.org/jira/browse/DRILL-6054
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> When the condition is these cases, partition is not done correctly: 
> b = 3 OR (dir0 = 1 and a = 2)
> not (dir0 = 1 AND b = 2)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-6054) Issues in FindPartitionConditions

2018-01-10 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6054:

Labels: ready-to-commit  (was: )

> Issues in FindPartitionConditions
> -
>
> Key: DRILL-6054
> URL: https://issues.apache.org/jira/browse/DRILL-6054
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> When the condition is these cases, partition is not done correctly: 
> b = 3 OR (dir0 = 1 and a = 2)
> not (dir0 = 1 AND b = 2)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-6076) Reduce the default memory from a total of 13GB to 5GB

2018-01-10 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6076:

Issue Type: Task  (was: Bug)

> Reduce the default memory from a total of 13GB to 5GB
> -
>
> Key: DRILL-6076
> URL: https://issues.apache.org/jira/browse/DRILL-6076
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Critical
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, the default memory requirements for Drill are about 13GB, with the 
> following allocations:
> * 4GB Heap
> * 8GB Direct Memory
> * 1GB CodeCache
> * 512MB MaxPermSize
> Also, with Drill 1.12.0, the recommendation is to move to JDK8, which makes 
> the MaxPermSize as irrelevant.
> With that, the default requirements total to 13GB, which is rather high. This 
> is especially a problem for scenarios where people are trying out Drill and 
> might be using this in a development environment where 13GB is too high.
> When using the public [test 
> framework|https://github.com/mapr/drill-test-framework/] for Apache Drill, it 
> was observed that the framework's functional and unit tests passed 
> successfully with memory as little as 5GB; based on the following allocation:
> * 1GB Heap
> * 3GB Direct Memory
> * 512MB CodeCache
> * 512MB MaxPermSize
> Based on this finding, the proposal is to reduce the defaults from the 
> current settings to the values just mentioned above. The drill-env.sh file 
> already has details in the comments, along with the recommended values that 
> reflect the original 13GB defaults.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-6076) Reduce the default memory from a total of 13GB to 5GB

2018-01-10 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6076:

Labels: ready-to-commit  (was: )

> Reduce the default memory from a total of 13GB to 5GB
> -
>
> Key: DRILL-6076
> URL: https://issues.apache.org/jira/browse/DRILL-6076
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Critical
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, the default memory requirements for Drill are about 13GB, with the 
> following allocations:
> * 4GB Heap
> * 8GB Direct Memory
> * 1GB CodeCache
> * 512MB MaxPermSize
> Also, with Drill 1.12.0, the recommendation is to move to JDK8, which makes 
> the MaxPermSize as irrelevant.
> With that, the default requirements total to 13GB, which is rather high. This 
> is especially a problem for scenarios where people are trying out Drill and 
> might be using this in a development environment where 13GB is too high.
> When using the public [test 
> framework|https://github.com/mapr/drill-test-framework/] for Apache Drill, it 
> was observed that the framework's functional and unit tests passed 
> successfully with memory as little as 5GB; based on the following allocation:
> * 1GB Heap
> * 3GB Direct Memory
> * 512MB CodeCache
> * 512MB MaxPermSize
> Based on this finding, the proposal is to reduce the defaults from the 
> current settings to the values just mentioned above. The drill-env.sh file 
> already has details in the comments, along with the recommended values that 
> reflect the original 13GB defaults.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-6076) Reduce the default memory from a total of 13GB to 5GB

2018-01-10 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6076:

Reviewer: Abhishek Girish

> Reduce the default memory from a total of 13GB to 5GB
> -
>
> Key: DRILL-6076
> URL: https://issues.apache.org/jira/browse/DRILL-6076
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Critical
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, the default memory requirements for Drill are about 13GB, with the 
> following allocations:
> * 4GB Heap
> * 8GB Direct Memory
> * 1GB CodeCache
> * 512MB MaxPermSize
> Also, with Drill 1.12.0, the recommendation is to move to JDK8, which makes 
> the MaxPermSize as irrelevant.
> With that, the default requirements total to 13GB, which is rather high. This 
> is especially a problem for scenarios where people are trying out Drill and 
> might be using this in a development environment where 13GB is too high.
> When using the public [test 
> framework|https://github.com/mapr/drill-test-framework/] for Apache Drill, it 
> was observed that the framework's functional and unit tests passed 
> successfully with memory as little as 5GB; based on the following allocation:
> * 1GB Heap
> * 3GB Direct Memory
> * 512MB CodeCache
> * 512MB MaxPermSize
> Based on this finding, the proposal is to reduce the defaults from the 
> current settings to the values just mentioned above. The drill-env.sh file 
> already has details in the comments, along with the recommended values that 
> reflect the original 13GB defaults.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5868) Support SQL syntax highlighting of queries

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320113#comment-16320113
 ] 

ASF GitHub Bot commented on DRILL-5868:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1084#discussion_r160650090
  
--- Diff: 
exec/java-exec/src/main/resources/rest/static/js/ace-code-editor/snippets/sql.js
 ---
@@ -0,0 +1,46 @@
+/**
+ * Drill SQL Syntax Snippets
+ */
+
+ace.define("ace/snippets/sql",["require","exports","module"], 
function(require, exports, module) {
+"use strict";
+
+exports.snippetText = "snippet info\n\
+   select * from INFORMATION_SCHEMA.${1:};\n\
+snippet sysmem\n\
--- End diff --

Turns out that in my Chrome it works when using Ctrl + Space.
Could you please update the Jira with the list of available snippets and how 
to enable them?


> Support SQL syntax highlighting of queries
> --
>
> Key: DRILL-5868
> URL: https://issues.apache.org/jira/browse/DRILL-5868
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Web Server
>Affects Versions: 1.12.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Minor
>  Labels: doc-impacting
> Fix For: 1.13.0
>
>
> It would be nice to have the Query Editor support syntax highlighting.
> An autocomplete would be even better as new functions are introduced in Drill



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5868) Support SQL syntax highlighting of queries

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320115#comment-16320115
 ] 

ASF GitHub Bot commented on DRILL-5868:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1084#discussion_r160650903
  
--- Diff: 
exec/java-exec/src/main/resources/rest/static/js/ace-code-editor/snippets/sql.js
 ---
@@ -0,0 +1,46 @@
+/**
+ * Drill SQL Syntax Snippets
+ */
+
+ace.define("ace/snippets/sql",["require","exports","module"], 
function(require, exports, module) {
+"use strict";
+
+exports.snippetText = "snippet info\n\
+   select * from INFORMATION_SCHEMA.${1:};\n\
+snippet sysmem\n\
+   select * from sys.mem;\n\
+snippet sysopt\n\
+   select * from sys.opt;\n\
+snippet sysbit\n\
+   select * from sys.bit;\n\
+snippet sysconn\n\
+   select * from sys.conn;\n\
+snippet sysprof\n\
+   select * from sys.prof;\n\
+snippet cview\n\
--- End diff --

I am not sure that the snippets for sys tables are correct. Please see below 
the list of current tables present in the sys schema:
```
0: jdbc:drill:drillbit=localhost> show tables;
+---------------+-----------------------+
| TABLE_SCHEMA  |      TABLE_NAME       |
+---------------+-----------------------+
| sys           | profiles_json         |
| sys           | drillbits             |
| sys           | boot                  |
| sys           | internal_options      |
| sys           | threads               |
| sys           | options_val           |
| sys           | profiles              |
| sys           | connections           |
| sys           | internal_options_val  |
| sys           | memory                |
| sys           | version               |
| sys           | options               |
+---------------+-----------------------+
```


> Support SQL syntax highlighting of queries
> --
>
> Key: DRILL-5868
> URL: https://issues.apache.org/jira/browse/DRILL-5868
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Web Server
>Affects Versions: 1.12.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Minor
>  Labels: doc-impacting
> Fix For: 1.13.0
>
>
> It would be nice to have the Query Editor support syntax highlighting.
> An autocomplete would be even better as new functions are introduced in Drill



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5868) Support SQL syntax highlighting of queries

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320114#comment-16320114
 ] 

ASF GitHub Bot commented on DRILL-5868:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1084#discussion_r160651815
  
--- Diff: 
exec/java-exec/src/main/resources/rest/static/js/ace-code-editor/snippets/sql.js
 ---
@@ -0,0 +1,46 @@
+/**
+ * Drill SQL Syntax Snippets
+ */
+
+ace.define("ace/snippets/sql",["require","exports","module"], 
function(require, exports, module) {
+"use strict";
+
+exports.snippetText = "snippet info\n\
+   select * from INFORMATION_SCHEMA.${1:};\n\
+snippet sysmem\n\
+   select * from sys.mem;\n\
+snippet sysopt\n\
+   select * from sys.opt;\n\
+snippet sysbit\n\
+   select * from sys.bit;\n\
+snippet sysconn\n\
+   select * from sys.conn;\n\
+snippet sysprof\n\
+   select * from sys.prof;\n\
+snippet cview\n\
+   create view ${1:[workspace]}.${2:} ( ${3:} )  as 
\n\
+   ${4:};\n\
+snippet ctas\n\
+   create table ${1:} ( ${2:} )  as \n\
+   ${3:};\n\
+snippet ctemp\n\
+   create temporary table ${1:} ( ${2:} )  as \n\
--- End diff --

Could you please also add a snippet for create function: `CREATE FUNCTION 
USING JAR '<jar_name>.jar';`?


> Support SQL syntax highlighting of queries
> --
>
> Key: DRILL-5868
> URL: https://issues.apache.org/jira/browse/DRILL-5868
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Web Server
>Affects Versions: 1.12.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Minor
>  Labels: doc-impacting
> Fix For: 1.13.0
>
>
> It would be nice to have the Query Editor support syntax highlighting.
> An autocomplete would be even better as new functions are introduced in Drill



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5868) Support SQL syntax highlighting of queries

2018-01-10 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-5868:

Labels: doc-impacting  (was: )

> Support SQL syntax highlighting of queries
> --
>
> Key: DRILL-5868
> URL: https://issues.apache.org/jira/browse/DRILL-5868
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Web Server
>Affects Versions: 1.12.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Minor
>  Labels: doc-impacting
> Fix For: 1.13.0
>
>
> It would be nice to have the Query Editor support syntax highlighting.
> An autocomplete would be even better as new functions are introduced in Drill



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5868) Support SQL syntax highlighting of queries

2018-01-10 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-5868:

Affects Version/s: 1.12.0

> Support SQL syntax highlighting of queries
> --
>
> Key: DRILL-5868
> URL: https://issues.apache.org/jira/browse/DRILL-5868
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Web Server
>Affects Versions: 1.12.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Minor
>  Labels: doc-impacting
> Fix For: 1.13.0
>
>
> It would be nice to have the Query Editor support syntax highlighting.
> An autocomplete would be even better as new functions are introduced in Drill



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (DRILL-5037) NPE in Parquet Decimal Converter with the complex parquet reader

2018-01-10 Thread Volodymyr Vysotskyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi resolved DRILL-5037.

   Resolution: Fixed
Fix Version/s: 1.12.0

Fixed in 
[42fc11e|https://github.com/apache/drill/commit/42fc11e53557477ac01c7dd31c3aa93e22fb4384]

>  NPE in Parquet Decimal Converter with the complex parquet reader
> -
>
> Key: DRILL-5037
> URL: https://issues.apache.org/jira/browse/DRILL-5037
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.9.0
>Reporter: Rahul Challapalli
> Fix For: 1.12.0
>
> Attachments: 
> 0001-Fix-the-DecimalX-writer-invocation-in-DrillParquetGr.patch, 
> drill5037.parquet
>
>
> git.commit.id.abbrev=4b1902c
> The below query fails when we enable the new parquet reader
> Query :
> {code}
> alter session set `store.parquet.use_new_reader` = true;
>  select
>  count(*) as count_star,
>   sum(a.d18)  as sum_d18,
>   --round(avg(a.d18)) as round_avg_d18,
>   cast(avg(a.d18) as bigint)  as round_avg_d18,
>   --trunc(avg(a.d18)) as trunc_avg_d18,
>   cast(avg(a.d18) as bigint)  as trunc_avg_d18,
>   --sum(case when a.d18 = 0 then 100 else round(a.d18/12) end) as 
> case_in_sum_d18,
>   cast(sum(case when a.d18 = 0 then 100 else round(a.d18/12) end) 
> as bigint) as case_in_sum_d18,
>   --coalesce(sum(case when a.d18 = 0 then 100 else 
> round(a.d18/12) end), 0) as case_in_sum_d18
>   cast(coalesce(sum(case when a.d18 = 0 then 100 else 
> round(a.d18/12) end), 0) as bigint) as case_in_sum_d18
>  
> from
>   alltypes_with_nulls a
>   left outer join alltypes_with_nulls b on (a.c_integer = 
> b.c_integer)
>   left outer join alltypes_with_nulls c on (b.c_integer = 
> c.c_integer)
> group by
>   a.c_varchar
>   ,b.c_varchar
>   ,c.c_varchar
>   ,a.c_integer
>   ,b.c_integer
>   ,c.c_integer
>   ,a.d9
>   ,b.d9
>   ,c.d9
>   ,a.d18
>   ,b.d18
>   ,c.d18
>   ,a.d28
>   ,b.d28
>   ,c.d28
>   ,a.d38
>   ,b.d38
>   ,c.d38
>   ,a.c_date
>   ,b.c_date
>   ,c.c_date
>   ,a.c_date
>   ,b.c_date
>   ,c.c_date
>   ,a.c_time
>  order by
>   a.c_varchar
>   ,b.c_varchar
>   ,c.c_varchar
>   ,a.c_integer
>   ,b.c_integer
>   ,c.c_integer
>   ,a.d9
>   ,b.d9
>   ,c.d9
>   ,a.d18
>   ,b.d18
>   ,c.d18
>   ,a.d28
>   ,b.d28
>   ,c.d28
>   ,a.d38
>   ,b.d38
>   ,c.d38
>   ,a.c_date
>   ,b.c_date
>   ,c.c_date
>   ,a.c_date
>   ,b.c_date
>   ,c.c_date
>   ,a.c_time
> {code}
> I attached the data set and error from the log file



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-4456) Hive translate function is not working

2018-01-10 Thread Volodymyr Vysotskyi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16319993#comment-16319993
 ] 

Volodymyr Vysotskyi commented on DRILL-4456:


After rebasing onto Calcite 1.15, this function may be used, but Drill throws 
another exception:
{noformat}
Caused By (org.apache.calcite.sql.validate.SqlValidatorException) No match 
found for function signature TRANSLATE3(<CHARACTER>, <CHARACTER>, <CHARACTER>)
{noformat}
This error appears because Calcite renames the function from {{translate}} to 
{{translate3}}, but Hive only has a function named {{translate}}.
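
To illustrate the mismatch (the query is the one from the issue description; the 
TRANSLATE3 rewrite is the behavior described above):
{code}
-- Parses after the Calcite 1.15 rebase:
select translate(name, 'A', 'B') from hive.`users`;
-- But Calcite validates the call as TRANSLATE3(name, 'A', 'B'),
-- while Hive registers the function only under the name translate.
{code}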

> Hive translate function is not working
> --
>
> Key: DRILL-4456
> URL: https://issues.apache.org/jira/browse/DRILL-4456
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Hive
>Affects Versions: 1.5.0
>Reporter: Arina Ielchiieva
> Fix For: Future
>
>
> In Hive "select translate(name, 'A', 'B') from users" works fine.
> But in Drill "select translate(name, 'A', 'B') from hive.`users`" returns the 
> following error:
> org.apache.drill.common.exceptions.UserRemoteException: PARSE ERROR: 
> Encountered "," at line 1, column 22. Was expecting one of: "USING" ... "NOT" 
> ... "IN" ... "BETWEEN" ... "LIKE" ... "SIMILAR" ... "=" ... ">" ... "<" ... 
> "<=" ... ">=" ... "<>" ... "+" ... "-" ... "*" ... "/" ... "||" ... "AND" ... 
> "OR" ... "IS" ... "MEMBER" ... "SUBMULTISET" ... "MULTISET" ... "[" ... "." 
> ... "(" ... while parsing SQL query: select translate(name, 'A', 'B') from 
> hive.users ^ [Error Id: ba21956b-3285-4544-b3b2-fab68b95be1f on 
> localhost:31010]
> Root cause:
> Calcite follows the standard SQL reference.
> SQL reference,  ISO/IEC 9075-2:2011(E), section 6.30
> <character transliteration> ::=
>   TRANSLATE <left paren> <character value expression> USING <transliteration name> <right paren>
> To fix:
> 1. add support to translate (expression, from_string, to_string) alternative 
> syntax
> 2. add unit test in org.apache.drill.exec.fn.hive.TestInbuiltHiveUDFs
> Changes can be made directly in Calcite and then upgrade to appropriate 
> Calcite version. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5833) Parquet reader fails with assertion error for Decimal9, Decimal18 types

2018-01-10 Thread Volodymyr Vysotskyi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16319973#comment-16319973
 ] 

Volodymyr Vysotskyi commented on DRILL-5833:


[~Paul.Rogers] since this issue was fixed in the scope of DRILL-5832, may we 
resolve this Jira?

> Parquet reader fails with assertion error for Decimal9, Decimal18 types
> ---
>
> Key: DRILL-5833
> URL: https://issues.apache.org/jira/browse/DRILL-5833
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.13.0
>
>
> The {{TestParquetWriter.testDecimal()}} test recently failed. As it turns 
> out, this test never ran properly before against the "old" Parquet reader. 
> Because the {{store.parquet.use_new_reader}} was left at a previous value, 
> sometimes the test would run against the "new" reader (and succeed) or 
> against the "old" reader (and fail.)
> Once the test was forced to run against the "old" reader, it fails deep in 
> the Parquet record reader in 
> {{DrillParquetGroupConverter.getConverterForType()}}.
> The code attempts to create a Decimal9 writer by calling 
> {{SingleMapWriter.decimal9(String name)}} to create the writer. However, the 
> code in this method says:
> {code}
>   public Decimal9Writer decimal9(String name) {
> // returns existing writer
> final FieldWriter writer = fields.get(name.toLowerCase());
> assert writer != null;
> return writer;
>   }
> {code}
> And, indeed, the assertion is triggered.
> As it turns out, the code for Decimal28 shows the proper solution:
> {code}
> mapWriter.decimal28Sparse(name, metadata.getScale(), metadata.getPrecision())
> {code}
> That is, pass the scale and precision to this form of the method which 
> actually creates the writer:
> {code}
>   public Decimal9Writer decimal9(String name, int scale, int precision) {
> {code}
> Applying the same pattern to for the Parquet Decimal9 and Decimal18 types 
> allows the above test to get past the asserts. Given this error, it is clear 
> that this test could never have run, and so the error in the Parquet reader 
> was never detected.
> It also turns out that the test itself is wrong, reversing the validation and 
> test queries:
> {code}
>   public void runTestAndValidate(String selection, String 
> validationSelection, String inputTable, String outputFile) throws Exception {
> try {
>   deleteTableIfExists(outputFile);
>   ...
>   // Query reads from the input (JSON) table
>   String query = String.format("SELECT %s FROM %s", selection, 
> inputTable);
>   String create = "CREATE TABLE " + outputFile + " AS " + query;
>   // validate query reads from the output (Parquet) table
>   String validateQuery = String.format("SELECT %s FROM " + outputFile, 
> validationSelection);
>   test(create);
>   testBuilder()
>   .unOrdered()
>   .sqlQuery(query) // Query under test is input query
>   .sqlBaselineQuery(validateQuery) // Baseline query is output query
>   .go();
> {code}
> Given this, it is the Parquet data that is wrong, not the baseline.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)