[
https://issues.apache.org/jira/browse/ASTERIXDB-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Till updated ASTERIXDB-2326:
----------------------------
Labels: triaged (was: )
> Cannot run aggregation functions when the external dataset size grows too
> large
> -------------------------------------------------------------------------------
>
> Key: ASTERIXDB-2326
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-2326
> Project: Apache AsterixDB
> Issue Type: Bug
> Components: EXT - External data, FUN - Functions
> Reporter: James Fang
> Assignee: Murtadha Hubail
> Priority: Major
> Labels: triaged
>
> I was testing aggregation functions on external data and found that they
> do not work at all at 100 million tuples. At 10 million tuples, the
> aggregates worked. Neither the existing aggregates nor the aggregates I am
> adding work at 100 million tuples.
> DDL:
> DROP DATAVERSE AGG_TEST IF EXISTS;
> CREATE DATAVERSE AGG_TEST;
> USE AGG_TEST;
> CREATE TYPE Data AS {
> id: int,
> val: double
> };
> create external dataset dataval(Data) using
> localfs((`path`=`127.0.0.1://Users/name/Documents/100000000.txt`),(`format`=`adm`));
>
> Query:
> USE AGG_TEST;
> {"average":coll_avg((select element x.val from dataval as x))};
>
> Error:
> 11:55:25.603 [Executor-3:ClusterController] INFO
> org.apache.asterix.runtime.utils.ClusterStateManager - Cluster State is now
> ACTIVE
> 11:55:30.447 [Worker:ClusterController] INFO
> org.apache.hyracks.control.common.work.WorkQueue - Executing:
> GetDatasetDirectoryServiceInfo
> 11:55:30.917 [Worker:ClusterController] INFO
> org.apache.hyracks.control.common.work.WorkQueue - Executing:
> GetNodeControllersInfo
> 11:55:31.345 [Worker:ClusterController] INFO
> org.apache.hyracks.control.common.work.WorkQueue - Executing: JobStart
> 11:55:31.379 [Worker:ClusterController] INFO
> org.apache.hyracks.control.cc.dataset.DatasetDirectoryService -
> DatasetDirectoryService notified of new job JID:0.1
> 11:55:31.382 [Worker:ClusterController] INFO
> org.apache.asterix.app.active.ActiveNotificationHandler -
> notifyJobCreation(JobId jobId, JobSpecification jobSpecification) was called
> with jobId = JID:0.1
> 11:55:31.382 [Worker:ClusterController] INFO
> org.apache.asterix.app.active.ActiveNotificationHandler - Job is not of type
> active job. property found to be: null
> 11:55:31.393 [Worker:ClusterController] INFO
> org.apache.hyracks.control.cc.executor.ActivityClusterPlanner - Plan for
> org.apache.hyracks.api.job.ActivityCluster@1264c6ff
> 11:55:31.393 [Worker:ClusterController] INFO
> org.apache.hyracks.control.cc.executor.ActivityClusterPlanner - Built 1 Task
> Clusters
> 11:55:31.393 [Worker:ClusterController] INFO
> org.apache.hyracks.control.cc.executor.ActivityClusterPlanner - Tasks:
> [TID:ANID:ODID:0:0:0, TID:ANID:ODID:2:0:0]
> 11:55:31.394 [Worker:ClusterController] INFO
> org.apache.hyracks.control.cc.executor.JobExecutor - Runnable TC roots:
> [TC:[TID:ANID:ODID:0:0:0, TID:ANID:ODID:2:0:0]], inProgressTaskClusters: []
> 11:55:31.412 [Worker:ClusterController] INFO
> org.apache.hyracks.control.common.work.WorkQueue - Executing:
> WaitForJobCompletion
> 11:55:31.412 [Worker:asterix_nc1] INFO
> org.apache.hyracks.control.common.work.WorkQueue - Executing: StartTasks
> 11:55:31.423 [Worker:asterix_nc1] INFO
> org.apache.hyracks.control.nc.work.StartTasksWork - Initializing
> TAID:TID:ANID:ODID:0:0:0:0 ->
> [org.apache.asterix.external.operators.ExternalScanOperatorDescriptor@74fb82e0,
> AlgebricksMeta [assign [1] :=
> [org.apache.asterix.runtime.evaluators.functions.records.FieldAccessByIndexEvalFactory$_EvaluatorFactoryGen@30d487a5],
> stream-project [1], assign
> [org.apache.asterix.runtime.aggregates.std.LocalAvgAggregateDescriptor$2@6594e4ce]]]
> for JID:0.1
> 11:55:31.450 [Worker:asterix_nc1] INFO
> org.apache.hyracks.control.nc.work.StartTasksWork - input: 0: CDID:1
> 11:55:31.453 [Worker:asterix_nc1] INFO
> org.apache.hyracks.control.nc.work.StartTasksWork - Initializing
> TAID:TID:ANID:ODID:2:0:0:0 ->
> [org.apache.hyracks.dataflow.std.result.ResultWriterOperatorDescriptor@71b17102,
> AlgebricksMeta [assign
> [org.apache.asterix.runtime.aggregates.std.GlobalAvgAggregateDescriptor$2@11121dfc],
> assign [1] :=
> [org.apache.asterix.runtime.evaluators.common.ClosedRecordConstructorEvalFactory@443a919b],
> stream-project [1]]] for JID:0.1
> 11:55:31.480 [Worker:asterix_nc1] INFO
> org.apache.hyracks.control.nc.work.StartTasksWork - input: 0: CDID:1
> 11:55:31.517
> [org.apache.hyracks.api.rewriter.runtime.SuperActivity:TAID:TID:ANID:ODID:2:0:0:0:0]
> INFO org.apache.hyracks.control.nc.dataset.DatasetPartitionWriter - open(0)
> 12:00:57.342 [Worker:asterix_nc1] INFO
> org.apache.hyracks.control.common.work.WorkQueue - Executing:
> NotifyTaskCompleteWork:TAID:TID:ANID:ODID:0:0:0:0
> 12:00:57.351 [Worker:ClusterController] INFO
> org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete:
> [asterix_nc1[JID:0.1:TAID:TID:ANID:ODID:0:0:0:0]
> 12:00:57.365 [Worker:ClusterController] INFO
> org.apache.hyracks.control.common.work.WorkQueue - Executing:
> RegisterResultPartitionLocation: JobId@JID:0.1 ResultSetId@RSID:0 Partition@0
> NPartitions@1
> [[email protected]:49695|http://[email protected]:49695/]
> OrderedResult@true EmptyResult@false
> 12:00:57.368
> [org.apache.hyracks.api.rewriter.runtime.SuperActivity:TAID:TID:ANID:ODID:2:0:0:0:0]
> INFO org.apache.hyracks.control.nc.dataset.DatasetPartitionWriter - close(0)
> 12:00:57.373 [Worker:asterix_nc1] INFO
> org.apache.hyracks.control.common.work.WorkQueue - Executing:
> NotifyTaskCompleteWork:TAID:TID:ANID:ODID:2:0:0:0
> 12:00:57.377 [Worker:ClusterController] WARN
> org.apache.hyracks.control.cc.work.RegisterResultPartitionLocationWork -
> Failed to register partition location
> org.apache.hyracks.api.exceptions.HyracksDataException: HYR0024: No result
> set for job JID:0.1
> at
> org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:55)
> ~[classes/:?]
> at
> org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.getNonNullDatasetJobRecord(DatasetDirectoryService.java:105)
> ~[classes/:?]
> at
> org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.registerResultPartitionLocation(DatasetDirectoryService.java:114)
> ~[classes/:?]
> at
> org.apache.hyracks.control.cc.work.RegisterResultPartitionLocationWork.run(RegisterResultPartitionLocationWork.java:71)
> [classes/:?]
> at
> org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127)
> [classes/:?]
> 12:00:57.393 [Worker:ClusterController] INFO
> org.apache.hyracks.control.cc.executor.JobExecutor - Abort map for job:
> JID:0.1: {asterix_nc1=[TAID:TID:ANID:ODID:2:0:0:0]}
> 12:00:57.394 [Worker:ClusterController] INFO
> org.apache.hyracks.control.cc.executor.JobExecutor - Aborting:
> [TAID:TID:ANID:ODID:2:0:0:0] at asterix_nc1
> 12:00:57.400 [Worker:ClusterController] INFO
> org.apache.hyracks.control.cc.partitions.PartitionMatchMaker - Removing
> uncommitted partitions: []
> 12:00:57.405 [Worker:ClusterController] INFO
> org.apache.hyracks.control.cc.partitions.PartitionMatchMaker - Removing
> partition requests: []
> 12:00:57.407 [Worker:ClusterController] INFO
> org.apache.hyracks.control.common.work.WorkQueue - Executing:
> ReportResultPartitionWriteCompletion: JobId@JID:0.1 ResultSetId@RSID:0
> Partition@0
> 12:00:57.407 [Worker:asterix_nc1] INFO
> org.apache.hyracks.control.common.work.WorkQueue - Executing: AbortTasks
> 12:00:57.407 [Worker:asterix_nc1] INFO
> org.apache.hyracks.control.nc.work.AbortTasksWork - Aborting Tasks:
> JID:0.1:[TAID:TID:ANID:ODID:2:0:0:0]
> 12:00:57.407 [Worker:ClusterController] WARN
> org.apache.hyracks.control.common.work.WorkQueue - Exception while executing
> ReportResultPartitionWriteCompletion: JobId@JID:0.1 ResultSetId@RSID:0
> Partition@0
> java.lang.RuntimeException:
> org.apache.hyracks.api.exceptions.HyracksDataException: HYR0024: No result
> set for job JID:0.1
> at
> org.apache.hyracks.control.cc.work.ReportResultPartitionWriteCompletionWork.run(ReportResultPartitionWriteCompletionWork.java:49)
> ~[classes/:?]
> at
> org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127)
> [classes/:?]
> Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: HYR0024:
> No result set for job JID:0.1
> at
> org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:55)
> ~[classes/:?]
> at
> org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.getNonNullDatasetJobRecord(DatasetDirectoryService.java:105)
> ~[classes/:?]
> at
> org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.reportResultPartitionWriteCompletion(DatasetDirectoryService.java:141)
> ~[classes/:?]
> at
> org.apache.hyracks.control.cc.work.ReportResultPartitionWriteCompletionWork.run(ReportResultPartitionWriteCompletionWork.java:47)
> ~[classes/:?]
> ... 1 more
> 12:00:57.408 [Worker:ClusterController] INFO
> org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete:
> [asterix_nc1[JID:0.1:TAID:TID:ANID:ODID:2:0:0:0]
> 12:00:57.409 [Worker:ClusterController] WARN
> org.apache.hyracks.control.cc.executor.JobExecutor - Spurious task complete
> notification: TAID:TID:ANID:ODID:2:0:0:0 Current state = ABORTED
> 12:00:57.409 [Worker:ClusterController] INFO
> org.apache.hyracks.control.common.work.WorkQueue - Executing: JobCleanup:
> JobId@JID:0.1 Status@FAILURE
> Exceptions@[org.apache.hyracks.api.exceptions.HyracksDataException: HYR0024:
> No result set for job JID:0.1]
> 12:00:57.409 [Worker:ClusterController] INFO
> org.apache.hyracks.control.cc.work.JobCleanupWork - Cleanup for JobRun with
> id: JID:0.1
> 12:00:57.412 [Worker:asterix_nc1] INFO
> org.apache.hyracks.control.common.work.WorkQueue - Executing: CleanupJoblet
> 12:00:57.413 [Worker:asterix_nc1] INFO
> org.apache.hyracks.control.nc.work.CleanupJobletWork - Cleaning up after job:
> JID:0.1
> 12:00:57.416 [Worker:asterix_nc1] INFO org.apache.hyracks.control.nc.Joblet
> - Freeing leaked 294912 bytes
> 12:00:57.421 [Worker:ClusterController] INFO
> org.apache.hyracks.control.common.work.WorkQueue - Executing:
> JobletCleanupNotification
> 12:00:57.421 [Worker:ClusterController] INFO
> org.apache.asterix.app.active.ActiveNotificationHandler - Getting notified of
> job finish for JobId: JID:0.1
> 12:00:57.421 [Worker:ClusterController] INFO
> org.apache.asterix.app.active.ActiveNotificationHandler - NO NEED TO NOTIFY
> JOB FINISH!
> 12:00:57.430 [IPC Network Listener Thread [/0:0:0:0:0:0:0:0:49684]] INFO
> org.apache.hyracks.ipc.impl.IPCSystem - Exception in message
> org.apache.hyracks.api.exceptions.HyracksDataException: HYR0024: No result
> set for job JID:0.1
> at
> org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:55)
> ~[classes/:?]
> at
> org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.getNonNullDatasetJobRecord(DatasetDirectoryService.java:105)
> ~[classes/:?]
> at
> org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.registerResultPartitionLocation(DatasetDirectoryService.java:114)
> ~[classes/:?]
> at
> org.apache.hyracks.control.cc.work.RegisterResultPartitionLocationWork.run(RegisterResultPartitionLocationWork.java:71)
> ~[classes/:?]
> at
> org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127)
> ~[classes/:?]
> 12:00:57.436 [HttpExecutor(port:19001)-0] ERROR org.apache.asterix - HYR0024:
> No result set for job JID:0.1
> org.apache.hyracks.api.exceptions.HyracksDataException: HYR0024: No result
> set for job JID:0.1
> at
> org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:55)
> ~[classes/:?]
> at
> org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.getNonNullDatasetJobRecord(DatasetDirectoryService.java:105)
> ~[classes/:?]
> at
> org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.registerResultPartitionLocation(DatasetDirectoryService.java:114)
> ~[classes/:?]
> at
> org.apache.hyracks.control.cc.work.RegisterResultPartitionLocationWork.run(RegisterResultPartitionLocationWork.java:71)
> ~[classes/:?]
> at
> org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127)
> ~[classes/:?]
> 12:00:57.442 [Worker:ClusterController] WARN
> org.apache.hyracks.control.common.work.WorkQueue - Work
> JobletCleanupNotification waited 0 times (~0ms), blocked 1 times (~0ms)
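
To reproduce the report at both scales (10 million and 100 million tuples), an input file matching the `Data` type is needed. The report does not say how the file was generated, so the following is only a minimal sketch: the function names, the value range, and the fixed-point formatting are all assumptions, and it assumes that one record per line written as `{"id": <int>, "val": <decimal>}` is accepted by the `adm` format for this type.

```python
import math
import random
from typing import Iterator


def adm_values(count: int, seed: int = 0) -> Iterator[str]:
    """Yield `count` pseudo-random values as fixed-point decimal text."""
    rng = random.Random(seed)
    for _ in range(count):
        # Fixed-point text avoids exponent notation such as 1e-05.
        yield "%.6f" % rng.uniform(0.0, 1000.0)


def adm_records(count: int, seed: int = 0) -> Iterator[str]:
    """Yield one record per tuple with the fields of the Data type (id, val)."""
    for i, text in enumerate(adm_values(count, seed)):
        yield '{"id": %d, "val": %s}' % (i, text)


def write_adm_file(path: str, count: int, seed: int = 0) -> None:
    """Stream the records to `path`, one per line, without holding them in memory."""
    with open(path, "w", encoding="utf-8") as out:
        for record in adm_records(count, seed):
            out.write(record + "\n")


def expected_average(count: int, seed: int = 0) -> float:
    """Average of the values exactly as written, for comparison with coll_avg."""
    if count <= 0:
        raise ValueError("count must be positive")
    return math.fsum(float(text) for text in adm_values(count, seed)) / count
```

Calling `write_adm_file("100000000.txt", 100_000_000)` would produce a file of the size named in the DDL, and `expected_average(100_000_000)` gives a reference value to compare against the query result once the job completes. Both calls take a while at that scale.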
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)