[jira] [Created] (DRILL-4279) The plan is either confusing or could lead to execution problem, when no columns is required from SCAN

Jinfeng Ni (JIRA) Sun, 17 Jan 2016 21:16:54 -0800

Jinfeng Ni created DRILL-4279:
---------------------------------

             Summary: The plan is either confusing or could lead to execution 
problem, when no columns is required from SCAN
                 Key: DRILL-4279
                 URL: https://issues.apache.org/jira/browse/DRILL-4279
             Project: Apache Drill
          Issue Type: Bug
          Components: Query Planning & Optimization
            Reporter: Jinfeng Ni



When a query does not require any specific column to be returned from SCAN, for 
instance,

{code}
Q1:  select count(*) from T1;
Q2:  select 1 + 100 from T1;
Q3:  select  1.0 + random() from T1; 
{code}

Drill's planner would use a ColumnList with the * column, plus a SKIP_ALL mode. 
However, the MODE is not serialized / deserialized. This leads to two problems.
1) The EXPLAIN plan is confusing, since there is no way to distinguish a 
"SELECT *" query from this SKIP_ALL mode. 
For instance, 
{code}
explain plan for select count(*) from dfs.`/Users/jni/work/data/yelp/t1`;
00-03          Project($f0=[0])
00-04            Scan(groupscan=[EasyGroupScan 
[selectionRoot=file:/Users/jni/work/data/yelp/t1, numFiles=2, columns=[`*`], 
files= ... 
{code} 

2) If the query is executed in distributed / parallel mode, the missing 
serialization of the mode means that some Fragments fetch all the columns, 
while other Fragments skip all the columns. That will cause an execution 
error.

For instance, when slice_target is changed to force the query to be executed in 
multiple fragments, it hits an execution error. 

{code}
select count(*) from dfs.`/Users/jni/work/data/yelp/t1`;
org.apache.drill.common.exceptions.UserRemoteException: DATA_READ ERROR: Error 
parsing JSON - You tried to start when you are using a ValueWriter of type 
NullableBitWriterImpl.
{code}

Directory "t1" just contains two yelp JSON files. 
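
The serialization gap described above can be illustrated with a toy model. This is only a sketch and not Drill's code: the names ScanSpec, READ_ALL, to_json_lossy and to_json_with_mode are hypothetical, and the use of JSON for the round trip is an assumption made for illustration.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    READ_ALL = "READ_ALL"  # hypothetical name for the default "fetch everything" mode
    SKIP_ALL = "SKIP_ALL"  # mode named in the issue: no column is needed


@dataclass
class ScanSpec:
    """Toy stand-in for a scan's column selection; not Drill's actual class."""
    columns: list
    mode: Mode = Mode.READ_ALL

    def to_json_lossy(self) -> str:
        # Mimics the reported problem: only the column list is written out.
        return json.dumps({"columns": self.columns})

    def to_json_with_mode(self) -> str:
        # Writes the mode as well, so the receiver can restore it.
        return json.dumps({"columns": self.columns, "mode": self.mode.value})

    @staticmethod
    def from_json(text: str) -> "ScanSpec":
        data = json.loads(text)
        # A missing mode silently falls back to READ_ALL.
        mode = Mode(data.get("mode", Mode.READ_ALL.value))
        return ScanSpec(columns=data["columns"], mode=mode)


# What the planner intended: star column, but skip every column.
planned = ScanSpec(columns=["*"], mode=Mode.SKIP_ALL)

# What a receiver reconstructs from each serialized form.
remote_lossy = ScanSpec.from_json(planned.to_json_lossy())
remote_full = ScanSpec.from_json(planned.to_json_with_mode())
```

In this model, remote_lossy ends up with the star column in READ_ALL mode, which is indistinguishable from a plain "SELECT *" scan, while remote_full matches what was planned.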

Ideally, I think when no columns are required from SCAN, the explain plan should 
show an empty column list. The MODE of SKIP_ALL together with the star * column 
seems to be confusing and error prone. 







--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
