[jira] [Commented] (DRILL-5740) hash agg fail to read spill file

2017-08-25 Thread Boaz Ben-Zvi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16142545#comment-16142545
 ] 

Boaz Ben-Zvi commented on DRILL-5740:
-

[~paul-rogers] and I brainstormed and have a good guess at the cause of this 
bug: when concurrent queries spill, the one that terminates first deletes the 
common subdirectory "10.10.30.168-31010", which the others are still using.
Possible solution: make this name part of the "per minor fragment" 
subdirectory name (e.g. "265f91f9-78d2-78a6-68ad-4709674efe0a_HashAgg_1-4-34") 
instead of a shared parent directory.
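
To illustrate the suspected failure mode, here is a minimal sketch using plain java.nio on a local temp directory. It is not Drill's spill code: the class name, the method names, and the "queryA"/"queryB" labels are hypothetical, and it only models the guess above (cleanup of one query removing a parent directory shared with another).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SharedSpillDirRace {

    // Recursively delete a directory tree, deepest entries first.
    public static void deleteTree(Path root) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    // Returns true if query B's spill file still exists after query A cleans up.
    public static boolean survivesCleanup(boolean sharedParent) {
        try {
            Path base = Files.createTempDirectory("spill");
            String node = "10.10.30.168-31010";
            Path dirA;
            Path dirB;
            Path cleanupRootA;
            if (sharedParent) {
                // Node ID is a parent directory shared by both queries.
                Path nodeDir = base.resolve(node);
                dirA = nodeDir.resolve("queryA_HashAgg_1-4-34");
                dirB = nodeDir.resolve("queryB_HashAgg_1-4-34");
                cleanupRootA = nodeDir; // assumed bug: cleanup removes the shared parent
            } else {
                // Node ID is folded into each per-fragment directory name.
                dirA = base.resolve(node + "_queryA_HashAgg_1-4-34");
                dirB = base.resolve(node + "_queryB_HashAgg_1-4-34");
                cleanupRootA = dirA;
            }
            Files.createDirectories(dirA);
            Files.createDirectories(dirB);
            Files.createFile(dirA.resolve("spill0"));
            Path spillB = Files.createFile(dirB.resolve("spill3"));

            deleteTree(cleanupRootA); // query A terminates first
            boolean survived = Files.exists(spillB);
            deleteTree(base); // remove the temp directory used by this sketch
            return survived;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("shared parent: " + survivesCleanup(true));  // false
        System.out.println("flat names:    " + survivesCleanup(false)); // true
    }
}
```

With a shared parent, query B's "spill3" is gone once query A cleans up, which would surface as the FileNotFoundException in the report; with the node ID folded into the directory name, each query only ever deletes its own directory.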


> hash agg fail to read spill file
> 
>
> Key: DRILL-5740
> URL: https://issues.apache.org/jira/browse/DRILL-5740
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Drill
> Affects Versions: 1.12.0
> Reporter: Chun Chang
> Assignee: Boaz Ben-Zvi
> Priority: Critical
>
> -Build: | 1.12.0-SNAPSHOT  | 11008d029bafa36279e3045c4ed1a64366080620
> -Multi-node drill cluster
> Running a query that causes hash agg to spill fails with the following 
> error. This seems to be a regression.
> {noformat}
> Execution Failures:
> /root/drill-test-framework/framework/resources/Advanced/hash-agg/spill/hagg5.q
> Query:
> select gby_date, gby_int32_rand, sum(int32_field), avg(float_field), min(boolean_field), count(double_rand) from dfs.`/drill/testdata/hagg/PARQUET-500M.parquet` group by gby_date, gby_int32_rand order by gby_date, gby_int32_rand limit 30
> Failed with exception
> java.sql.SQLException: SYSTEM ERROR: FileNotFoundException: File /tmp/drill/spill/10.10.30.168-31010/265f91f9-78d2-78a6-68ad-4709674efe0a_HashAgg_1-4-34/spill3 does not exist
> Fragment 1:34
> [Error Id: 291a79f8-9b7a-485d-9404-e7b7fe1d8f1e on 10.10.30.168:31010]
>   (java.lang.RuntimeException) java.io.FileNotFoundException: File /tmp/drill/spill/10.10.30.168-31010/265f91f9-78d2-78a6-68ad-4709674efe0a_HashAgg_1-4-34/spill3 does not exist
> org.apache.drill.exec.physical.impl.aggregate.SpilledRecordbatch.<init>():67
> org.apache.drill.exec.test.generated.HashAggregatorGen1891.outputCurrentBatch():980
> org.apache.drill.exec.test.generated.HashAggregatorGen1891.doWork():617
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():168
> org.apache.drill.exec.record.AbstractRecordBatch.next():164
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():133
> org.apache.drill.exec.record.AbstractRecordBatch.next():164
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.physical.impl.TopN.TopNBatch.innerNext():191
> org.apache.drill.exec.record.AbstractRecordBatch.next():164
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
> org.apache.drill.exec.record.AbstractRecordBatch.next():164
> org.apache.drill.exec.physical.impl.BaseRootExec.next():105
> org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92
> org.apache.drill.exec.physical.impl.BaseRootExec.next():95
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():234
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():227
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():415
> org.apache.hadoop.security.UserGroupInformation.doAs():1595
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():227
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> java.lang.Thread.run():745
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5740) hash agg fail to read spill file

2017-08-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16142611#comment-16142611
 ] 

ASF GitHub Bot commented on DRILL-5740:
---

GitHub user paul-rogers opened a pull request:

https://github.com/apache/drill/pull/924

DRILL-5740: Ensure spill directories are unique

A recent change added the node name and port to make the spill path
unique. It turns out we need to add this information to the name of the
single spill directory itself. The previous change used the node ID as a
parent directory, which turns out not to work well in practice.
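
As a rough illustration of the two naming schemes described above (not the code in the pull request), the sketch below builds both paths from the same parts. The class and method names are hypothetical, and the "1-4-34" suffix is treated as an opaque fragment tag taken from the path in the report.

```java
public class SpillDirNames {

    // Earlier scheme: node ID ("host-port") is a parent directory.
    public static String nestedLayout(String nodeId, String queryId,
                                      String operator, String fragmentTag) {
        return nodeId + "/" + queryId + "_" + operator + "_" + fragmentTag;
    }

    // Scheme described in this pull request: node ID is part of the
    // single spill directory name, so no parent directory is shared.
    public static String flatLayout(String nodeId, String queryId,
                                    String operator, String fragmentTag) {
        return nodeId + "_" + queryId + "_" + operator + "_" + fragmentTag;
    }

    public static void main(String[] args) {
        String nodeId = "10.10.30.168-31010";
        String queryId = "265f91f9-78d2-78a6-68ad-4709674efe0a";
        System.out.println(nestedLayout(nodeId, queryId, "HashAgg", "1-4-34"));
        System.out.println(flatLayout(nodeId, queryId, "HashAgg", "1-4-34"));
    }
}
```

In the flat layout every spill directory sits directly under the spill root, so removing one directory cannot take another fragment's files with it.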

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/paul-rogers/drill DRILL-5740

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/924.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #924


commit 042ec53cc4300484404d6663bd2e84ec429d945f
Author: Paul Rogers 
Date:   2017-08-26T04:28:29Z

DRILL-5740: Ensure spill directories are unique





> hash agg fail to read spill file
> 
>
> Key: DRILL-5740
> URL: https://issues.apache.org/jira/browse/DRILL-5740
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Drill
> Affects Versions: 1.12.0
> Reporter: Chun Chang
> Assignee: Paul Rogers
> Priority: Critical
>

[jira] [Commented] (DRILL-5740) hash agg fail to read spill file

2017-08-29 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146091#comment-16146091
 ] 

Paul Rogers commented on DRILL-5740:


[~jni], can you do a quick review? If you give a +1 then I can commit the fix.

> hash agg fail to read spill file
> 
>
> Key: DRILL-5740
> URL: https://issues.apache.org/jira/browse/DRILL-5740
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Drill
> Affects Versions: 1.12.0
> Reporter: Chun Chang
> Assignee: Paul Rogers
> Priority: Blocker
>


[jira] [Commented] (DRILL-5740) hash agg fail to read spill file

2017-08-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146200#comment-16146200
 ] 

ASF GitHub Bot commented on DRILL-5740:
---

Github user jinfengni commented on the issue:

https://github.com/apache/drill/pull/924
  
+1

LGTM.




[jira] [Commented] (DRILL-5740) hash agg fail to read spill file

2017-08-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146372#comment-16146372
 ] 

ASF GitHub Bot commented on DRILL-5740:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/924




[jira] [Commented] (DRILL-5740) hash agg fail to read spill file

2017-08-30 Thread Chun Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16148144#comment-16148144
 ] 

Chun Chang commented on DRILL-5740:
---

| 1.12.0-SNAPSHOT  | d1a6134b441aebd7b05c9c7bc9ef4780707a41e3  | DRILL-5740: Ensure spill directories are unique

With the fix, I still hit a similar issue:

oadd.org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: FileNotFoundException: File /tmp/drill/spill/10.10.30.169-31010_2658c9b0-e4c9-8997-bad3-20e4ae72b694_HashAgg_1-4-21/spill2 does not exist

Fragment 1:21

[Error Id: ffb265e2-c218-4e1e-8416-b6f92a985422 on 10.10.30.169:31010]

  (java.lang.RuntimeException) java.io.FileNotFoundException: File /tmp/drill/spill/10.10.30.169-31010_2658c9b0-e4c9-8997-bad3-20e4ae72b694_HashAgg_1-4-21/spill2 does not exist
org.apache.drill.exec.physical.impl.aggregate.SpilledRecordbatch.<init>():67
org.apache.drill.exec.test.generated.HashAggregatorGen61.outputCurrentBatch():980
org.apache.drill.exec.test.generated.HashAggregatorGen61.doWork():617



[jira] [Commented] (DRILL-5740) hash agg fail to read spill file

2017-08-30 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16148168#comment-16148168
 ] 

Paul Rogers commented on DRILL-5740:


Thanks. Our original guess as to the cause appears to have been wrong. This may 
be an issue in the Hash Agg code path, since (knock on wood) we've not seen 
anything similar in the external sort tests. [~ben-zvi], can you take another 
look?
