[ 
https://issues.apache.org/jira/browse/DERBY-3872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633967#action_12633967
 ] 

Bryan Pendleton commented on DERBY-3872:
----------------------------------------

Hi Mamta, thanks for all the hard work you've been doing to investigate this!

I think your observation about the "if" statement in SelectNode.preprocess is 
very interesting.

I believe that the idea behind that "if" statement is that some queries that 
are written
as GROUP BY queries can be replaced (rewritten) to be DISTINCT queries instead.

As an example, consider this very simple query:

   select last_name from employee group by last_name

That query can be re-written (internally) to be:

  select distinct last_name from employee

as the two queries have the same meaning (we only want 1 row in the
output for each different value of the LAST_NAME column).

This transformation is only valid if no aggregate functions (MIN MAX COUNT SUM 
AVG)
are used, because if the aggregate functions are used then GROUP BY does
something different than DISTINCT.

Now, in the particular case of your problematic query, there are in fact no
aggregate functions being used, which is why this "if" statement is relevant.

I think that what removing the (havingClause == null) check from the "if" 
statement
did was to give SelectNode.preprocess permission to convert your query:

  select q1."DEPTNO" from DEPTTAB q1, EMPTAB q2 -- DERBY-PROPERTIES 
joinStrategy = HASH
    where ( q2."DEPT_DEPTNO" = q1."DEPTNO") 
    GROUP BY q1."DEPTNO" HAVING 1 =
          (  select q3."SALARY" from EMPTAB q3 where ( q3."EMPID" = 
q1."DEPTNO") ) 

into the simpler, but (I think) equivalent query:

select distinct q1."DEPTNO" from DEPTTAB q1, EMPTAB q2 
where ( q2."DEPT_DEPTNO" = q1."DEPTNO"  and 
   1 =  ( select q3."SALARY" from EMPTAB q3 where ( q3."EMPID" = q1."DEPTNO") ) 

  (I'm not totally sure the above is valid syntax but hopefully it's close)

That is, the "having" clause is pretty much like the "where" clause, but it is
applied *after* the GROUP BY is performed, whereas the where clause is
applied *before* the GROUP BY.  By transforming the query internally to
remove the GROUP BY, the HAVING clause is essentially collapsed into the where.

Now, regarding why this is so relevant to your case, and why the "sort 
avoidance"
question is important, here's a few more ideas:

GROUP BY grouping can basically be performed in two fundamental ways:
 - sorting
 - hashing

Either one of these techniques clusters the "like" values together so that they
can be recognized as belonging to the same group and be collapsed.

In Derby, I think, we *only* use the sorting techinque for group by processing,
*not* the hashing technique (but please don't trust me on this, instead verify
this yourself). However, there is a weird little optimization where the 
sorting-based
group by processing can either happen:
 - in the sorter, during a sort
 - or in the GROUP BY result set
Regardless of where it happens, the idea is that the rows are sorted on the
values that we're grouping by, then as we process them we compare the
current row to the previous row, and if they have the same "key" then they
go into the same "group" in the result.

Sorting the data can be *very* expensive, which is why the sort avoidance
algorithm is an important optimization. If the compilation / optimization 
processing
can determine that the rows are going to appear in sorted order by the group by
column(s), then we can skip the sort, so we do so.

Being able to skip the sort is usually because the column that we are
grouping on is a column which is used in an index, and by doing an
index-to-base-table access for that data, we can ensure that we get
the data via the index, and hence it is in sorted order.

It sounds like the bug may have something to do with *thinking* that the rows
are going to appear in sorted order, so we can skip the sort, but then the 
query plan gets changed around late in optimization and the actual rows
do not appear in sorted order, so the assumption that we could group them
without needing to sort them was wrong.

And the extra result set that you were seeing in the one case (11 result sets
versus 10) may have to do with the result set that would have been present
to perform the sorting, or it may have to do with an index-to-base traversal
that was expected to occur in order to ensure that the rows were in sorted 
order.

I hope some of this is helpful; you are definitely going in the right direction 
with
trying to track this problem down; these problems are really tough so don't
get discouraged just keep it up!


> NullPoinerException thrown when INTEGER function used as a predicate in a 
> WHERE clause of a SELECT .. GROUP BY ..  HAVING statement
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-3872
>                 URL: https://issues.apache.org/jira/browse/DERBY-3872
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.3.3.0
>            Reporter: Stan Bradbury
>            Assignee: Mamta A. Satoor
>         Attachments: NPE_Reproduction.sql, QueryWithoutTruePred.out, 
> QueryWithTruePred.out
>
>
> Use attached SQL script to create two tables , execute the following SQL and 
> throw the exception and stack trace below.  NOTE:  removing the 'always true' 
> clause '.. ( integer (1.1) = 1) .." from the SQL and the query does not fail. 
>  Releated??
> select  q1."DEPTNO" from DEPTTAB q1, EMPTAB q2 where  ( integer (1.1) = 1)  
> and  ( q2."DEPT_DEPTNO" =  q1."DEPTNO")  
> GROUP BY q1."DEPTNO" 
> HAVING  max( q2."SALARY") >=  ( select  q3."SALARY" from EMPTAB q3 where  ( 
> q3."EMPID" =  q1."DEPTNO") ) 
> ERROR 38000: The exception 'java.lang.NullPointerException' was thrown while 
> evaluating an expression.
> . . .derby.iapi.error.StandardException.newException
> . . .derby.iapi.error.StandardException.unexpectedUserException
> . . .derby.impl.services.reflect.DirectCall.invoke
> . . .derby.impl.sql.execute.ProjectRestrictResultSet.getNextRowCore
> . . .derby.impl.sql.execute.BasicNoPutResultSetImpl.getNextRow
> . . .derby.impl.jdbc.EmbedResultSet.movePosition
> . . .derby.impl.jdbc.EmbedResultSet.next
> . . .derby.tools.JDBCDisplayUtil.indent_DisplayResults
> . . .derby.tools.JDBCDisplayUtil.indent_DisplayResults
> . . .derby.tools.JDBCDisplayUtil.indent_DisplayResults
> . . .derby.tools.JDBCDisplayUtil.DisplayResults
> . . .derby.impl.tools.ij.utilMain.displayResult
> . . .derby.impl.tools.ij.utilMain.doCatch
> . . .derby.impl.tools.ij.utilMain.runScriptGuts
> . . .derby.impl.tools.ij.utilMain.go
> . . .derby.impl.tools.ij.Main.go
> . . .derby.impl.tools.ij.Main.mainCore
> . . .derby.impl.tools.ij.Main14.main
> . . .derby.tools.ij.main
> Caused by: java.lang.NullPointerException
> . . .derby.iapi.types.NumberDataType.compare
> . . .derby.impl.store.access.btree.ControlRow.compareIndexRowFromPageToKey
> . . .derby.impl.store.access.btree.ControlRow.searchForEntry
> . . .derby.impl.store.access.btree.LeafControlRow.search
> . . .derby.impl.store.access.btree.BTreeScan.positionAtStartForForwardScan
> . . .derby.impl.store.access.btree.BTreeForwardScan.positionAtStartPosition
> . . .derby.impl.store.access.btree.BTreeForwardScan.fetchRows
> . . .derby.impl.store.access.btree.BTreeScan.fetchNext
> . . .derby.impl.sql.execute.TableScanResultSet.getNextRowCore
> . . .derby.impl.sql.execute.IndexRowToBaseRowResultSet.getNextRowCore
> . . .derby.impl.sql.execute.ProjectRestrictResultSet.getNextRowCore
> . . .derby.impl.sql.execute.OnceResultSet.getNextRowCore
> . . .derby.exe.ac601a400fx011cx480cx5eacx00000010d8100.g0
> . . .derby.exe.ac601a400fx011cx480cx5eacx00000010d8100.e6
>       ... 17 more
> ============= begin nested exception, level (1) ===========
> java.lang.NullPointerException
> . . .derby.iapi.types.NumberDataType.compare
> . . .derby.impl.store.access.btree.ControlRow.compareIndexRowFromPageToKey
> . . .derby.impl.store.access.btree.ControlRow.searchForEntry
> . . .derby.impl.store.access.btree.LeafControlRow.search
> . . .derby.impl.store.access.btree.BTreeScan.positionAtStartForForwardScan
> . . .derby.impl.store.access.btree.BTreeForwardScan.positionAtStartPosition
> . . .derby.impl.store.access.btree.BTreeForwardScan.fetchRows
> . . .derby.impl.store.access.btree.BTreeScan.fetchNext
> . . .derby.impl.sql.execute.TableScanResultSet.getNextRowCore
> . . .derby.impl.sql.execute.IndexRowToBaseRowResultSet.getNextRowCore
> . . .derby.impl.sql.execute.ProjectRestrictResultSet.getNextRowCore
> . . .derby.impl.sql.execute.OnceResultSet.getNextRowCore
> . . .derby.exe.ac601a400fx011cx480cx5eacx00000010d8100.g0
> . . .derby.exe.ac601a400fx011cx480cx5eacx00000010d8100.e6
> . . .derby.impl.services.reflect.DirectCall.invoke
> . . .derby.impl.sql.execute.ProjectRestrictResultSet.getNextRowCore
> . . .derby.impl.sql.execute.BasicNoPutResultSetImpl.getNextRow
> . . .derby.impl.jdbc.EmbedResultSet.movePosition
> . . .derby.impl.jdbc.EmbedResultSet.next
> . . .derby.tools.JDBCDisplayUtil.indent_DisplayResults
> . . .derby.tools.JDBCDisplayUtil.indent_DisplayResults
> . . .derby.tools.JDBCDisplayUtil.indent_DisplayResults
> . . .derby.tools.JDBCDisplayUtil.DisplayResults
> . . .derby.impl.tools.ij.utilMain.displayResult
> . . .derby.impl.tools.ij.utilMain.doCatch
> . . .derby.impl.tools.ij.utilMain.runScriptGuts
> . . .derby.impl.tools.ij.utilMain.go
> . . .derby.impl.tools.ij.Main.go
> . . .derby.impl.tools.ij.Main.mainCore
> . . .derby.impl.tools.ij.Main14.main
> . . .derby.tools.ij.main
> ============= end nested exception, level (1) ===========
> Cleanup action completed

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to