[jira] [Assigned] (IMPALA-7653) Improve accuracy of compute incremental stats cardinality estimation

2019-03-05 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar reassigned IMPALA-7653:


Assignee: Paul Rogers  (was: Pooja Nilangekar)

> Improve accuracy of compute incremental stats cardinality estimation
> 
>
> Key: IMPALA-7653
> URL: https://issues.apache.org/jira/browse/IMPALA-7653
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Balazs Jeszenszky
>Assignee: Paul Rogers
>Priority: Major
>  Labels: resource-management
>
> Currently, the operators of a compute [incremental] stats' subquery rely on 
> combined selectivities - as usual - to estimate cardinality, e.g. during 
> aggregation. For example, note the expected cardinality of the aggregation on 
> this subquery:
> {code}
> F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=4
> Per-Host Resources: mem-estimate=305.20GB mem-reservation=136.00MB
> 01:AGGREGATE [STREAMING]
> |  output: [...]
> |  group by: col_a, col_b, col_c
> |  mem-estimate=76.21GB mem-reservation=34.00MB spill-buffer=2.00MB
> |  tuple-ids=1 row-size=104.83KB cardinality=693000
> |
> 00:SCAN HDFS [default.test, RANDOM]
>partitions=1/554 files=1 size=109.65MB
>stats-rows=1506374 extrapolated-rows=disabled
>table stats: rows=821958291 size=unavailable
>column stats: all
>mem-estimate=88.00MB mem-reservation=0B
>tuple-ids=0 row-size=2.06KB cardinality=1506374
> {code}
> This was generated as a result of compute incremental stats on a single 
> partition, so the output of that aggregation is a single row. Due to the 
> width of the intermediate rows, such overestimations lead to bloated memory 
> estimates. Since the number of partitions to be updated is known at 
> plan-time, Impala could use that to set the aggregation's cardinality.
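
Sketching that proposal: a minimal, illustrative C++ fragment of capping the
aggregation's cardinality by the number of partitions being updated. The real
change would live in the Java frontend planner; the function and parameter
names here are hypothetical.
{code}
// Illustrative only: the number of partitions touched by COMPUTE INCREMENTAL
// STATS is known at plan time, so the aggregation cannot produce more rows
// than that. 'num_updated_partitions' is a hypothetical plan-time value.
#include <algorithm>
#include <cstdint>

int64_t CapIncrementalStatsAggCardinality(int64_t ndv_based_estimate,
    int64_t num_updated_partitions) {
  if (num_updated_partitions <= 0) return ndv_based_estimate;  // unknown: keep estimate
  return std::min(ndv_based_estimate, num_updated_partitions);
}
{code}
In the plan above, a single updated partition would cap the estimate at one
output row rather than 693000, shrinking the memory estimate accordingly.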



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-7604) In AggregationNode.computeStats, handle cardinality overflow better

2019-03-05 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar reassigned IMPALA-7604:


Assignee: Paul Rogers  (was: Pooja Nilangekar)

> In AggregationNode.computeStats, handle cardinality overflow better
> ---
>
> Key: IMPALA-7604
> URL: https://issues.apache.org/jira/browse/IMPALA-7604
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.12.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> Consider the cardinality overflow logic in 
> [{{AggregationNode.computeStats()}}|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/AggregationNode.java].
>  Current code:
> {noformat}
> // if we ended up with an overflow, the estimate is certain to be wrong
> if (cardinality_ < 0) cardinality_ = -1;
> {noformat}
> This code has a number of issues.
> * The check is done after looping over all conjuncts. It could be that, as a 
> result, the number overflowed twice. The check should be done after each 
> multiplication.
> * Since we know that the number overflowed, a better estimate of the total 
> count is {{Long.MAX_VALUE}}.
> * The code later checks for the -1 value and, if found, uses the cardinality 
> of the first child. This is a worse estimate than using the max value, since 
> the first child might have a low cardinality (it could be the later children 
> that caused the overflow).
> * If we really do expect overflow, then we are dealing with very large 
> numbers. Being accurate to the row is not needed. Better to use a {{double}} 
> which can handle the large values.
> Since overflow probably seldom occurs, this is not an urgent issue. Still, 
> if overflow does occur, the query is huge, and having at least some estimate 
> of its size is better than none. Also, it seems this code evolved over time; 
> this newbie is looking at it fresh and seeing that the accumulated fixes 
> could be tidied up.
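
The per-multiplication clamping suggested above could look roughly like the
sketch below. The actual code lives in the Java frontend
(AggregationNode.computeStats()), so this standalone C++ fragment is only an
illustration of the saturating-multiply idea, not the committed fix.
{code}
// Saturating multiply for cardinality estimates: detect overflow after each
// multiplication and clamp to the maximum value instead of letting the result
// go negative. Inputs are assumed non-negative, as cardinalities are.
#include <cstdint>
#include <limits>

int64_t SaturatingMultiply(int64_t a, int64_t b) {
  if (a == 0 || b == 0) return 0;
  if (a > std::numeric_limits<int64_t>::max() / b) {
    return std::numeric_limits<int64_t>::max();  // would overflow: clamp
  }
  return a * b;
}
{code}
Applying such a check after every multiplication avoids both the double-overflow
case and the fallback to the first child's cardinality noted above.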



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-6544) Lack of S3 consistency leads to rare test failures

2019-03-05 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar reassigned IMPALA-6544:


Assignee: Joe McDonnell  (was: Pooja Nilangekar)

> Lack of S3 consistency leads to rare test failures
> --
>
> Key: IMPALA-6544
> URL: https://issues.apache.org/jira/browse/IMPALA-6544
> Project: IMPALA
>  Issue Type: Task
>  Components: Frontend
>Affects Versions: Impala 2.8.0
>Reporter: Sailesh Mukil
>Assignee: Joe McDonnell
>Priority: Major
>  Labels: S3, broken-build, consistency, flaky, test-framework
>
> Every now and then, we hit a flaky test on S3 runs due to files missing when 
> they should be present, and vice versa. We could consider running our tests 
> (or a subset of our tests) with S3Guard to avoid these problems, however rare 
> they are.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-8189) TestParquet.test_resolution_by_name fails on S3 because 'hadoop fs -cp' fails

2019-03-04 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-8189.
--
Resolution: Fixed

> TestParquet.test_resolution_by_name fails on S3 because 'hadoop fs -cp'  fails
> --
>
> Key: IMPALA-8189
> URL: https://issues.apache.org/jira/browse/IMPALA-8189
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Andrew Sherman
>Assignee: Pooja Nilangekar
>Priority: Critical
>  Labels: broken-build, flaky-test
>
> In parquet-resolution-by-name.test a parquet file is copied. 
> {quote}
>  SHELL
> hadoop fs -cp 
> $FILESYSTEM_PREFIX/test-warehouse/complextypestbl_parquet/nullable.parq \
> $FILESYSTEM_PREFIX/test-warehouse/$DATABASE.db/nested_resolution_by_name_test/
> hadoop fs -cp 
> $FILESYSTEM_PREFIX/test-warehouse/complextypestbl_parquet/nonnullable.parq \
> $FILESYSTEM_PREFIX/test-warehouse/$DATABASE.db/nested_resolution_by_name_test/
> {quote}
> The first copy succeeds, but the second fails. In the DEBUG output (below) 
> you can see the copy writing data to an intermediate file 
> test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
>  and then after the stream is closed, the copy cannot find the file.
> {quote}
> 19/02/12 05:33:13 DEBUG s3a.S3AFileSystem: Getting path status for 
> s3a://impala-test-uswest2-1/test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
>   
> (test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_)
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_metadata_requests += 
> 1  ->  7
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_metadata_requests += 
> 1  ->  8
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_list_requests += 1  
> ->  3
> 19/02/12 05:33:13 DEBUG s3a.S3AFileSystem: Not Found: 
> s3a://impala-test-uswest2-1/test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: op_create += 1  ->  1
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: op_get_file_status += 1  -> 
>  6
> 19/02/12 05:33:13 DEBUG s3a.S3AFileSystem: Getting path status for 
> s3a://impala-test-uswest2-1/test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
>   
> (test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_)
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_metadata_requests += 
> 1  ->  9
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_metadata_requests += 
> 1  ->  10
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_list_requests += 1  
> ->  4
> 19/02/12 05:33:13 DEBUG s3a.S3AFileSystem: Not Found: 
> s3a://impala-test-uswest2-1/test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
> 19/02/12 05:33:13 DEBUG s3a.S3ABlockOutputStream: Initialized 
> S3ABlockOutputStream for 
> test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
>  output to FileBlock{index=1, 
> destFile=/tmp/hadoop-jenkins/s3a/s3ablock-0001-1315190405959387081.tmp, 
> state=Writing, dataSize=0, limit=104857600}
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: op_get_file_status += 1  -> 
>  7
> 19/02/12 05:33:13 DEBUG s3a.S3AFileSystem: Getting path status for 
> s3a://impala-test-uswest2-1/test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
>   
> (test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_)
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_metadata_requests += 
> 1  ->  11
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_metadata_requests += 
> 1  ->  12
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_list_requests += 1  
> ->  5
> 19/02/12 05:33:13 DEBUG s3a.S3AFileSystem: Not Found: 
> s3a://impala-test-uswest2-1/test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
> 19/02/12 05:33:13 DEBUG s3a.S3AInputStream: 
> reopen(s3a://impala-test-uswest2-1/test-warehouse/complextypestbl_parquet/nonnullable.parq)
>  for read from new offset range[0-3186], length=4096, streamPosition=0, 
> nextReadPosition=0, policy=normal
> 19/02/12 05:33:13 DEBUG s3a.S3ABlockOutputStream: 
> S3ABlockOutputStream{WriteOperationHelper {bucket=impala-test-uswest2-1}, 
> blockSize=104857600, activeBlock=FileBlock{index=1, 
> 

[jira] [Resolved] (IMPALA-5397) Set "End Time" earlier rather than on unregistration.

2019-03-01 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-5397.
--
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> Set "End Time" earlier rather than on unregistration.
> -
>
> Key: IMPALA-5397
> URL: https://issues.apache.org/jira/browse/IMPALA-5397
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.9.0
>Reporter: Mostafa Mokhtar
>Assignee: Pooja Nilangekar
>Priority: Major
>  Labels: admission-control, query-lifecycle
> Fix For: Impala 3.2.0
>
>
> When queries are executed from Hue and hit the idle query timeout then the 
> query duration keeps going up even though the query was cancelled and it is 
> not actually doing any more work. The end time is only set when the query is 
> actually unregistered.
> Queries below finished in 1s640ms while the reported time is much longer. 
> |User||Default Db||Statement||Query Type||Start Time||Waiting 
> Time||Duration||Scan Progress||State||Last Event||# rows fetched||Resource 
> Pool||Details||Action|
> |hue/va1026.halxg.cloudera@halxg.cloudera.com|tpcds_1000_parquet|select 
> count(*) from tpcds_1000_parquet.inventory|QUERY|2017-05-31 
> 09:38:20.472804000|4m27s|4m32s|261 / 261 ( 100%)|FINISHED|First row 
> fetched|1|root.default|Details|Close|
> |hue/va1026.halxg.cloudera@halxg.cloudera.com|tpcds_1000_parquet|select 
> count(*) from tpcds_1000_parquet.inventory|QUERY|2017-05-31 
> 08:38:52.780237000|2017-05-31 09:38:20.289582000|59m27s|261 / 261 ( 
> 100%)|FINISHED|1|root.default|Details|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)



[jira] [Resolved] (IMPALA-8245) Add hostname to timeout error message in HdfsMonitoredOps

2019-02-26 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-8245.
--
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> Add hostname to timeout error message in HdfsMonitoredOps
> -
>
> Key: IMPALA-8245
> URL: https://issues.apache.org/jira/browse/IMPALA-8245
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: Joe McDonnell
>Assignee: Pooja Nilangekar
>Priority: Major
> Fix For: Impala 3.2.0
>
>
> If a DiskIo operation times out, it generates a 
> TErrorCode::THREAD_POOL_TASK_TIMED_OUT or 
> TErrorCode::THREAD_POOL_SUBMIT_FAILED error code. These call 
> GetDescription() to get DiskIo related context. That information should 
> include the hostname where the error occurred to allow tracking down a 
> problematic host that is seeing DiskIo timeouts.
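
A standalone illustration of the request (not Impala's actual code, which has
its own hostname and error-reporting utilities): include the local hostname in
the timeout message so a problematic host can be identified from the error
alone.
{code}
// Hypothetical sketch: prefix a DiskIo timeout description with the hostname.
// Uses POSIX gethostname() for self-containment; Impala would use its own
// network utilities instead.
#include <unistd.h>
#include <string>

std::string AddHostnameToTimeoutError(const std::string& op_description) {
  char name[256] = {0};
  if (gethostname(name, sizeof(name) - 1) != 0) name[0] = '\0';
  return std::string(name) + ": " + op_description + " timed out";
}
{code}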



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)



[jira] [Assigned] (IMPALA-8245) Add hostname to timeout error message in HdfsMonitoredOps

2019-02-25 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar reassigned IMPALA-8245:


Assignee: Pooja Nilangekar

> Add hostname to timeout error message in HdfsMonitoredOps
> -
>
> Key: IMPALA-8245
> URL: https://issues.apache.org/jira/browse/IMPALA-8245
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: Joe McDonnell
>Assignee: Pooja Nilangekar
>Priority: Major
>
> If a DiskIo operation times out, it generates a 
> TErrorCode::THREAD_POOL_TASK_TIMED_OUT or 
> TErrorCode::THREAD_POOL_SUBMIT_FAILED error code. These call 
> GetDescription() to get DiskIo related context. That information should 
> include the hostname where the error occurred to allow tracking down a 
> problematic host that is seeing DiskIo timeouts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-8064) test_min_max_filters is flaky

2019-02-25 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-8064.
--
Resolution: Fixed

> test_min_max_filters is flaky 
> --
>
> Key: IMPALA-8064
> URL: https://issues.apache.org/jira/browse/IMPALA-8064
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Pooja Nilangekar
>Assignee: Pooja Nilangekar
>Priority: Blocker
>  Labels: broken-build, flaky-test
> Fix For: Impala 3.2.0
>
> Attachments: profile.txt
>
>
> The following configuration of the test_min_max_filters:
> {code:java}
> query_test.test_runtime_filters.TestMinMaxFilters.test_min_max_filters[protocol:
>  beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 
> 0} | table_format: kudu/none]{code}
> It produces a higher aggregation of sum over the proberows than expected:
> {code:java}
> query_test/test_runtime_filters.py:113: in test_min_max_filters 
> self.run_test_case('QueryTest/min_max_filters', vector) 
> common/impala_test_suite.py:518: in run_test_case 
> update_section=pytest.config.option.update_results) 
> common/test_result_verifier.py:612: in verify_runtime_profile % 
> (function, field, expected_value, actual_value, actual)) 
> E   AssertionError: Aggregation of SUM over ProbeRows did not match expected 
> results. 
> E   EXPECTED VALUE: 
> E   619 
> E   ACTUAL VALUE: 
> E   652
> {code}
> This test was introduced in the patch for IMPALA-6533. The failure occurred 
> during an ASAN build. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8064) test_min_max_filters is flaky

2019-02-19 Thread Pooja Nilangekar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16772500#comment-16772500
 ] 

Pooja Nilangekar commented on IMPALA-8064:
--

[~tarmstrong] I looked at a few other failed instances. In all cases, the 
codegen time for F00 is about 2 minutes. In the worst case, the total codegen 
time spent was 2m36s with CodegenInvoluntaryContextSwitches: 289.66K. Also, I 
looked at the ASAN code: each allocation and free grabs a spinlock, so the 
slowdown is most likely caused by ASAN. Would it be okay to bump the wait 
time? 

> test_min_max_filters is flaky 
> --
>
> Key: IMPALA-8064
> URL: https://issues.apache.org/jira/browse/IMPALA-8064
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Pooja Nilangekar
>Assignee: Pooja Nilangekar
>Priority: Blocker
>  Labels: broken-build, flaky-test
> Fix For: Impala 3.2.0
>
> Attachments: profile.txt
>
>
> The following configuration of the test_min_max_filters:
> {code:java}
> query_test.test_runtime_filters.TestMinMaxFilters.test_min_max_filters[protocol:
>  beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 
> 0} | table_format: kudu/none]{code}
> It produces a higher aggregation of sum over the proberows than expected:
> {code:java}
> query_test/test_runtime_filters.py:113: in test_min_max_filters 
> self.run_test_case('QueryTest/min_max_filters', vector) 
> common/impala_test_suite.py:518: in run_test_case 
> update_section=pytest.config.option.update_results) 
> common/test_result_verifier.py:612: in verify_runtime_profile % 
> (function, field, expected_value, actual_value, actual)) 
> E   AssertionError: Aggregation of SUM over ProbeRows did not match expected 
> results. 
> E   EXPECTED VALUE: 
> E   619 
> E   ACTUAL VALUE: 
> E   652
> {code}
> This test was introduced in the patch for IMPALA-6533. The failure occurred 
> during an ASAN build. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8064) test_min_max_filters is flaky

2019-02-19 Thread Pooja Nilangekar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16772342#comment-16772342
 ] 

Pooja Nilangekar commented on IMPALA-8064:
--

I had a look at the most recent failure. The runtime filters didn't arrive in 
the specified 10 ms limit. Instead, on one of the fragment instances, the 
arrival time was 1m58s (this node was probably lucky because 
ScanNode::WaitForRuntimeFilters() was called later). For two other fragment 
instances, the filters had not arrived after 1m40s. I think it might help to 
increase this wait time. Or could there be other workarounds? 

> test_min_max_filters is flaky 
> --
>
> Key: IMPALA-8064
> URL: https://issues.apache.org/jira/browse/IMPALA-8064
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Pooja Nilangekar
>Assignee: Pooja Nilangekar
>Priority: Blocker
>  Labels: broken-build, flaky-test
> Fix For: Impala 3.2.0
>
> Attachments: profile.txt
>
>
> The following configuration of the test_min_max_filters:
> {code:java}
> query_test.test_runtime_filters.TestMinMaxFilters.test_min_max_filters[protocol:
>  beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 
> 0} | table_format: kudu/none]{code}
> It produces a higher aggregation of sum over the proberows than expected:
> {code:java}
> query_test/test_runtime_filters.py:113: in test_min_max_filters 
> self.run_test_case('QueryTest/min_max_filters', vector) 
> common/impala_test_suite.py:518: in run_test_case 
> update_section=pytest.config.option.update_results) 
> common/test_result_verifier.py:612: in verify_runtime_profile % 
> (function, field, expected_value, actual_value, actual)) 
> E   AssertionError: Aggregation of SUM over ProbeRows did not match expected 
> results. 
> E   EXPECTED VALUE: 
> E   619 
> E   ACTUAL VALUE: 
> E   652
> {code}
> This test was introduced in the patch for IMPALA-6533. The failure occurred 
> during an ASAN build. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Closed] (IMPALA-8201) How to generate server-key.pem and server-cert.pem when executing the impala ssl test cases?

2019-02-14 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar closed IMPALA-8201.

Resolution: Information Provided

> How to generate server-key.pem and server-cert.pem when executing the impala 
> ssl test cases?
> ---
>
> Key: IMPALA-8201
> URL: https://issues.apache.org/jira/browse/IMPALA-8201
> Project: IMPALA
>  Issue Type: Test
>  Components: Security
>Affects Versions: Impala 2.10.0
>Reporter: Donghui Xu
>Priority: Minor
>
> When executing the test case in webserver-test.cc, it was found to use 
> be/src/testutil/server-cert.pem and be/src/testutil/server-key.pem.
> How are the above files generated?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8201) How to generate server-key.pem and server-cert.pem when executing the impala ssl test cases?

2019-02-14 Thread Pooja Nilangekar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768755#comment-16768755
 ] 

Pooja Nilangekar commented on IMPALA-8201:
--

[~davidxdh] The instructions to generate the key can be found in 
be/src/testutil/certificates-info.txt. 

> How to generate server-key.pem and server-cert.pem when executing the impala 
> ssl test cases?
> ---
>
> Key: IMPALA-8201
> URL: https://issues.apache.org/jira/browse/IMPALA-8201
> Project: IMPALA
>  Issue Type: Test
>  Components: Security
>Affects Versions: Impala 2.10.0
>Reporter: Donghui Xu
>Priority: Minor
>
> When executing the test case in webserver-test.cc, it was found to use 
> be/src/testutil/server-cert.pem and be/src/testutil/server-key.pem.
> How are the above files generated?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Commented] (IMPALA-4268) Rework coordinator buffering to buffer more data

2019-02-12 Thread Pooja Nilangekar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-4268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766330#comment-16766330
 ] 

Pooja Nilangekar commented on IMPALA-4268:
--

Here is an update on the proposed solution: 

There could be a ResultsBuffer (or any better class name) which would be shared 
between the Coordinator and the PlanRootSink. Initially, it would be of a fixed 
size (~8MB). The PlanRootSink::Send() function would be modified to write into 
the ResultsBuffer. There would be no direct consumer calling into 
PlanRootSink::GetNext(). Instead, the fetch calls on the coordinator would now 
read from the ResultsBuffer. 
To update the query state cleanly: since the entire tree would be able to 
continually write into the ResultsBuffer, once the last row is added to the 
buffer, the fragment instance would call DoneExecuting(), and hence the 
coordinator would be able to release all admission control resources apart 
from the ResultsBuffer. This transition would not be observable to the client, 
since all fetch calls would continue to be served from the ResultsBuffer via 
the coordinator. 
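
A rough sketch of the kind of shared buffer described above, under the stated
assumptions (fixed byte budget, producer appends from PlanRootSink::Send(),
coordinator fetches read from it, DoneExecuting() marks the end of the
stream). The class shape and names are illustrative, not a committed design.
{code}
// Rough sketch only: a bounded, thread-safe buffer shared by the fragment
// thread (producer) and the coordinator fetch path (consumer).
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <vector>

class ResultsBuffer {
 public:
  explicit ResultsBuffer(size_t max_bytes) : max_bytes_(max_bytes) {}

  // Producer: blocks while the buffer is full, then appends one row.
  void Append(std::string row) {
    std::unique_lock<std::mutex> l(lock_);
    producer_cv_.wait(l,
        [&] { return rows_.empty() || bytes_ + row.size() <= max_bytes_; });
    bytes_ += row.size();
    rows_.push_back(std::move(row));
    consumer_cv_.notify_one();
  }

  // Producer: signals that the last row has been added, so the coordinator
  // can release other resources while fetches continue from this buffer.
  void DoneExecuting() {
    std::lock_guard<std::mutex> l(lock_);
    done_ = true;
    consumer_cv_.notify_all();
  }

  // Consumer: returns up to 'max_rows' rows; an empty result means end of stream.
  std::vector<std::string> Fetch(size_t max_rows) {
    std::unique_lock<std::mutex> l(lock_);
    consumer_cv_.wait(l, [&] { return !rows_.empty() || done_; });
    std::vector<std::string> out;
    while (!rows_.empty() && out.size() < max_rows) {
      bytes_ -= rows_.front().size();
      out.push_back(std::move(rows_.front()));
      rows_.pop_front();
    }
    producer_cv_.notify_all();
    return out;
  }

 private:
  const size_t max_bytes_;
  std::mutex lock_;
  std::condition_variable producer_cv_;
  std::condition_variable consumer_cv_;
  std::deque<std::string> rows_;
  size_t bytes_ = 0;
  bool done_ = false;
};
{code}
With a buffer like this, the fragment thread can keep producing without a
context switch per batch, and the coordinator can release other resources as
soon as DoneExecuting() is called.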

> Rework coordinator buffering to buffer more data
> 
>
> Key: IMPALA-4268
> URL: https://issues.apache.org/jira/browse/IMPALA-4268
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.8.0
>Reporter: Henry Robinson
>Assignee: Pooja Nilangekar
>Priority: Major
>  Labels: query-lifecycle, resource-management
> Attachments: rows-produced-histogram.png
>
>
> {{PlanRootSink}} executes the producer thread (the coordinator fragment 
> execution thread) in a separate thread to the consumer (i.e. the thread 
> handling the fetch RPC), which calls {{GetNext()}} to retrieve the rows. The 
> implementation was simplified by handing off a single batch at a time from 
> the producers to consumer.
> This decision causes some problems:
> * Many context switches for the sender. Adding buffering would allow the 
> sender to append to the buffer and continue progress without a context switch.
> * Query execution can't release resources until the client has fetched the 
> final batch, because the coordinator fragment thread is still running and 
> potentially producing backpressure all the way down the plan tree.
> * The consumer can't fulfil fetch requests greater than Impala's internal 
> BATCH_SIZE, because it is only given one batch at a time.
> The tricky part is managing the mismatch between the size of the row batches 
> processed in {{Send()}} and the size of the fetch result asked for by the 
> client without impacting performance too badly. The sender materializes 
> output rows in a {{QueryResultSet}} that is owned by the coordinator. That is 
> not, currently, a splittable object - instead it contains the actual RPC 
> response struct that will hit the wire when the RPC completes. The 
> asynchronous sender does not know the batch size, because it can in theory 
> change on every fetch call (although most reasonable clients will not 
> randomly change the fetch size).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8189) TestParquet.test_resolution_by_name fails on S3 because 'hadoop fs -cp' fails

2019-02-12 Thread Pooja Nilangekar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766289#comment-16766289
 ] 

Pooja Nilangekar commented on IMPALA-8189:
--

I'll take a look at this; it looks like it is due to S3 eventual 
consistency issues. 

> TestParquet.test_resolution_by_name fails on S3 because 'hadoop fs -cp'  fails
> --
>
> Key: IMPALA-8189
> URL: https://issues.apache.org/jira/browse/IMPALA-8189
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Andrew Sherman
>Assignee: Pooja Nilangekar
>Priority: Major
>
> In parquet-resolution-by-name.test a parquet file is copied. 
> {quote}
>  SHELL
> hadoop fs -cp 
> $FILESYSTEM_PREFIX/test-warehouse/complextypestbl_parquet/nullable.parq \
> $FILESYSTEM_PREFIX/test-warehouse/$DATABASE.db/nested_resolution_by_name_test/
> hadoop fs -cp 
> $FILESYSTEM_PREFIX/test-warehouse/complextypestbl_parquet/nonnullable.parq \
> $FILESYSTEM_PREFIX/test-warehouse/$DATABASE.db/nested_resolution_by_name_test/
> {quote}
> The first copy succeeds, but the second fails. In the DEBUG output (below) 
> you can see the copy writing data to an intermediate file 
> test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
>  and then after the stream is closed, the copy cannot find the file.
> {quote}
> 19/02/12 05:33:13 DEBUG s3a.S3AFileSystem: Getting path status for 
> s3a://impala-test-uswest2-1/test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
>   
> (test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_)
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_metadata_requests += 
> 1  ->  7
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_metadata_requests += 
> 1  ->  8
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_list_requests += 1  
> ->  3
> 19/02/12 05:33:13 DEBUG s3a.S3AFileSystem: Not Found: 
> s3a://impala-test-uswest2-1/test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: op_create += 1  ->  1
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: op_get_file_status += 1  -> 
>  6
> 19/02/12 05:33:13 DEBUG s3a.S3AFileSystem: Getting path status for 
> s3a://impala-test-uswest2-1/test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
>   
> (test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_)
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_metadata_requests += 
> 1  ->  9
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_metadata_requests += 
> 1  ->  10
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_list_requests += 1  
> ->  4
> 19/02/12 05:33:13 DEBUG s3a.S3AFileSystem: Not Found: 
> s3a://impala-test-uswest2-1/test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
> 19/02/12 05:33:13 DEBUG s3a.S3ABlockOutputStream: Initialized 
> S3ABlockOutputStream for 
> test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
>  output to FileBlock{index=1, 
> destFile=/tmp/hadoop-jenkins/s3a/s3ablock-0001-1315190405959387081.tmp, 
> state=Writing, dataSize=0, limit=104857600}
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: op_get_file_status += 1  -> 
>  7
> 19/02/12 05:33:13 DEBUG s3a.S3AFileSystem: Getting path status for 
> s3a://impala-test-uswest2-1/test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
>   
> (test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_)
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_metadata_requests += 
> 1  ->  11
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_metadata_requests += 
> 1  ->  12
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_list_requests += 1  
> ->  5
> 19/02/12 05:33:13 DEBUG s3a.S3AFileSystem: Not Found: 
> s3a://impala-test-uswest2-1/test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
> 19/02/12 05:33:13 DEBUG s3a.S3AInputStream: 
> reopen(s3a://impala-test-uswest2-1/test-warehouse/complextypestbl_parquet/nonnullable.parq)
>  for read from new offset range[0-3186], length=4096, streamPosition=0, 
> nextReadPosition=0, policy=normal
> 19/02/12 05:33:13 DEBUG s3a.S3ABlockOutputStream: 
> S3ABlockOutputStream{WriteOperationHelper {bucket=impala-test-uswest2-1}, 
> blockSize=104857600, activeBlock=FileBlock{index=1, 
> 

[jira] [Assigned] (IMPALA-8189) TestParquet.test_resolution_by_name fails on S3 because 'hadoop fs -cp' fails

2019-02-12 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar reassigned IMPALA-8189:


Assignee: Pooja Nilangekar

> TestParquet.test_resolution_by_name fails on S3 because 'hadoop fs -cp'  fails
> --
>
> Key: IMPALA-8189
> URL: https://issues.apache.org/jira/browse/IMPALA-8189
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Andrew Sherman
>Assignee: Pooja Nilangekar
>Priority: Major
>
> In parquet-resolution-by-name.test a parquet file is copied. 
> {quote}
>  SHELL
> hadoop fs -cp 
> $FILESYSTEM_PREFIX/test-warehouse/complextypestbl_parquet/nullable.parq \
> $FILESYSTEM_PREFIX/test-warehouse/$DATABASE.db/nested_resolution_by_name_test/
> hadoop fs -cp 
> $FILESYSTEM_PREFIX/test-warehouse/complextypestbl_parquet/nonnullable.parq \
> $FILESYSTEM_PREFIX/test-warehouse/$DATABASE.db/nested_resolution_by_name_test/
> {quote}
> The first copy succeeds, but the second fails. In the DEBUG output (below) 
> you can see the copy writing data to an intermediate file 
> test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
>  and then after the stream is closed, the copy cannot find the file.
> {quote}
> 19/02/12 05:33:13 DEBUG s3a.S3AFileSystem: Getting path status for 
> s3a://impala-test-uswest2-1/test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
>   
> (test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_)
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_metadata_requests += 
> 1  ->  7
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_metadata_requests += 
> 1  ->  8
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_list_requests += 1  
> ->  3
> 19/02/12 05:33:13 DEBUG s3a.S3AFileSystem: Not Found: 
> s3a://impala-test-uswest2-1/test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: op_create += 1  ->  1
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: op_get_file_status += 1  -> 
>  6
> 19/02/12 05:33:13 DEBUG s3a.S3AFileSystem: Getting path status for 
> s3a://impala-test-uswest2-1/test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
>   
> (test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_)
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_metadata_requests += 
> 1  ->  9
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_metadata_requests += 
> 1  ->  10
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_list_requests += 1  
> ->  4
> 19/02/12 05:33:13 DEBUG s3a.S3AFileSystem: Not Found: 
> s3a://impala-test-uswest2-1/test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
> 19/02/12 05:33:13 DEBUG s3a.S3ABlockOutputStream: Initialized 
> S3ABlockOutputStream for 
> test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
>  output to FileBlock{index=1, 
> destFile=/tmp/hadoop-jenkins/s3a/s3ablock-0001-1315190405959387081.tmp, 
> state=Writing, dataSize=0, limit=104857600}
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: op_get_file_status += 1  -> 
>  7
> 19/02/12 05:33:13 DEBUG s3a.S3AFileSystem: Getting path status for 
> s3a://impala-test-uswest2-1/test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
>   
> (test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_)
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_metadata_requests += 
> 1  ->  11
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_metadata_requests += 
> 1  ->  12
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_list_requests += 1  
> ->  5
> 19/02/12 05:33:13 DEBUG s3a.S3AFileSystem: Not Found: 
> s3a://impala-test-uswest2-1/test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
> 19/02/12 05:33:13 DEBUG s3a.S3AInputStream: 
> reopen(s3a://impala-test-uswest2-1/test-warehouse/complextypestbl_parquet/nonnullable.parq)
>  for read from new offset range[0-3186], length=4096, streamPosition=0, 
> nextReadPosition=0, policy=normal
> 19/02/12 05:33:13 DEBUG s3a.S3ABlockOutputStream: 
> S3ABlockOutputStream{WriteOperationHelper {bucket=impala-test-uswest2-1}, 
> blockSize=104857600, activeBlock=FileBlock{index=1, 
> destFile=/tmp/hadoop-jenkins/s3a/s3ablock-0001-1315190405959387081.tmp, 
> state=Writing, dataSize=3186, 

[jira] [Resolved] (IMPALA-8096) Limit on #rows returned from query

2019-02-07 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-8096.
--
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> Limit on #rows returned from query
> --
>
> Key: IMPALA-8096
> URL: https://issues.apache.org/jira/browse/IMPALA-8096
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Pooja Nilangekar
>Priority: Major
>  Labels: resource-management
> Fix For: Impala 3.2.0
>
>
> Sometimes users accidentally run queries that return a large number of rows, 
> e.g.
> {code}
> SELECT * FROM table
> {code}
> When they really only need to look at a subset of the rows. It would be 
> useful to have a guardrail to fail queries that return more rows than a 
> particular limit. Maybe it would make sense to integrate with IMPALA-4268 so 
> that the query is failed when the buffer fills up, but it may also be useful 
> to have an easier-to-understand option based on #rows.
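
As a hedged illustration of the row-count guardrail described above (not the
shipped implementation; Impala reports errors through its own Status mechanism
rather than exceptions, and the class name here is hypothetical):
{code}
// Illustrative guardrail sketch: track rows returned to the client and fail
// the query once a configured limit is crossed.
#include <cstdint>
#include <stdexcept>
#include <string>

class RowsReturnedGuard {
 public:
  explicit RowsReturnedGuard(int64_t limit) : limit_(limit) {}

  // Called after materializing each fetch batch; fails once the limit is hit.
  void AddRows(int64_t num_rows) {
    returned_ += num_rows;
    if (limit_ > 0 && returned_ > limit_) {
      throw std::runtime_error("Query returned " + std::to_string(returned_) +
          " rows, exceeding the configured limit of " + std::to_string(limit_));
    }
  }

 private:
  const int64_t limit_;   // <= 0 means no limit
  int64_t returned_ = 0;
};
{code}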



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8064) test_min_max_filters is flaky

2019-02-06 Thread Pooja Nilangekar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762214#comment-16762214
 ] 

Pooja Nilangekar commented on IMPALA-8064:
--

Sure, taking a look at it right now. 

> test_min_max_filters is flaky 
> --
>
> Key: IMPALA-8064
> URL: https://issues.apache.org/jira/browse/IMPALA-8064
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Pooja Nilangekar
>Assignee: Pooja Nilangekar
>Priority: Blocker
>  Labels: broken-build, flaky-test
> Fix For: Impala 3.2.0
>
> Attachments: profile.txt
>
>
> The following configuration of the test_min_max_filters:
> {code:java}
> query_test.test_runtime_filters.TestMinMaxFilters.test_min_max_filters[protocol:
>  beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 
> 0} | table_format: kudu/none]{code}
> It produces a higher aggregation of sum over the proberows than expected:
> {code:java}
> query_test/test_runtime_filters.py:113: in test_min_max_filters 
> self.run_test_case('QueryTest/min_max_filters', vector) 
> common/impala_test_suite.py:518: in run_test_case 
> update_section=pytest.config.option.update_results) 
> common/test_result_verifier.py:612: in verify_runtime_profile % 
> (function, field, expected_value, actual_value, actual)) 
> E   AssertionError: Aggregation of SUM over ProbeRows did not match expected 
> results. 
> E   EXPECTED VALUE: 
> E   619 
> E   ACTUAL VALUE: 
> E   652
> {code}
> This test was introduced in the patch for IMPALA-6533. The failure occurred 
> during an ASAN build. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-8151) HiveUdfCall assumes StringValue is 16 bytes

2019-02-06 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-8151.
--
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> HiveUdfCall assumes StringValue is 16 bytes
> ---
>
> Key: IMPALA-8151
> URL: https://issues.apache.org/jira/browse/IMPALA-8151
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: Tim Armstrong
>Assignee: Pooja Nilangekar
>Priority: Blocker
>  Labels: crash
> Fix For: Impala 3.2.0
>
>
> HiveUdfCall has the sizes of internal types hardcoded as magic numbers:
> {code}
>   switch (GetChild(i)->type().type) {
> case TYPE_BOOLEAN:
> case TYPE_TINYINT:
>   // Using explicit sizes helps the compiler unroll memcpy
>   memcpy(input_ptr, v, 1);
>   break;
> case TYPE_SMALLINT:
>   memcpy(input_ptr, v, 2);
>   break;
> case TYPE_INT:
> case TYPE_FLOAT:
>   memcpy(input_ptr, v, 4);
>   break;
> case TYPE_BIGINT:
> case TYPE_DOUBLE:
>   memcpy(input_ptr, v, 8);
>   break;
> case TYPE_TIMESTAMP:
> case TYPE_STRING:
> case TYPE_VARCHAR:
>   memcpy(input_ptr, v, 16);
>   break;
> default:
>   DCHECK(false) << "NYI";
>   }
> {code}
> STRING and VARCHAR were only 16 bytes because of padding. This padding is 
> removed by IMPALA-7367, so this will read past the end of the actual value. 
> This could in theory lead to a crash.
> We need to change the value, but we should probably also switch to 
> sizeof(StringValue) so that it doesn't get broken by similar changes in 
> future.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)



[jira] [Commented] (IMPALA-8151) HiveUdfCall assumes StringValue is 16 bytes

2019-02-01 Thread Pooja Nilangekar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758629#comment-16758629
 ] 

Pooja Nilangekar commented on IMPALA-8151:
--

I agree. I believe it would make sense to use sizeof() for all other datatypes 
as well, since datatypes like TIMESTAMP may be modified in the future. Or would 
it be too much overhead? 
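
For illustration, a self-contained sketch of the sizeof()-based direction
discussed here; the struct definitions are simplified stand-ins, not Impala's
real backend types, and this is not the committed fix.
{code}
// Derive copy sizes from the value structs with sizeof() so layout changes
// (e.g. the IMPALA-7367 padding removal) can't silently break the copy.
#include <cstdint>
#include <cstring>

struct StringValue { char* ptr; int len; };                     // stand-in
struct TimestampValue { int64_t time_of_day; int32_t date; };   // stand-in

void CopyStringValue(void* input_ptr, const void* v) {
  memcpy(input_ptr, v, sizeof(StringValue));  // tracks the real struct size
}

void CopyTimestampValue(void* input_ptr, const void* v) {
  memcpy(input_ptr, v, sizeof(TimestampValue));
}
{code}
Since sizeof() is evaluated at compile time, this adds no runtime overhead over
the hard-coded constants.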

> HiveUdfCall assumes StringValue is 16 bytes
> ---
>
> Key: IMPALA-8151
> URL: https://issues.apache.org/jira/browse/IMPALA-8151
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: Tim Armstrong
>Assignee: Pooja Nilangekar
>Priority: Blocker
>  Labels: crash
>
> HiveUdfCall has the sizes of internal types hardcoded as magic numbers:
> {code}
>   switch (GetChild(i)->type().type) {
> case TYPE_BOOLEAN:
> case TYPE_TINYINT:
>   // Using explicit sizes helps the compiler unroll memcpy
>   memcpy(input_ptr, v, 1);
>   break;
> case TYPE_SMALLINT:
>   memcpy(input_ptr, v, 2);
>   break;
> case TYPE_INT:
> case TYPE_FLOAT:
>   memcpy(input_ptr, v, 4);
>   break;
> case TYPE_BIGINT:
> case TYPE_DOUBLE:
>   memcpy(input_ptr, v, 8);
>   break;
> case TYPE_TIMESTAMP:
> case TYPE_STRING:
> case TYPE_VARCHAR:
>   memcpy(input_ptr, v, 16);
>   break;
> default:
>   DCHECK(false) << "NYI";
>   }
> {code}
> STRING and VARCHAR were only 16 bytes because of padding. This padding is 
> removed by IMPALA-7367, so this will read past the end of the actual value. 
> This could in theory lead to a crash.
> We need to change the value, but we should probably also switch to 
> sizeof(StringValue) so that it doesn't get broken by similar changes in 
> future.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-6932) Simple LIMIT 1 query can be really slow on many-filed sequence datasets

2019-01-31 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-6932.
--
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> Simple LIMIT 1 query can be really slow on many-filed sequence datasets
> ---
>
> Key: IMPALA-6932
> URL: https://issues.apache.org/jira/browse/IMPALA-6932
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend
>Reporter: Philip Zeyliger
>Assignee: Pooja Nilangekar
>Priority: Critical
> Fix For: Impala 3.2.0
>
>
> I recently ran across really slow behavior with the trivial {{SELECT * FROM 
> table LIMIT 1}} query. The table used Avro as a file format and had about 
> 45,000 files across about 250 partitions. An optimization kicked in to set 
> NUM_NODES to 1.
> The query ran for about an hour, and the profile indicated that it was 
> opening files:
>   - TotalRawHdfsOpenFileTime(*): 1.0h (3622833666032)
> I took a single minidump while this query was running, and I suspect the 
> query was here:
> {code:java}
> 1 impalad!impala::ScannerContext::Stream::GetNextBuffer(long) 
> [scanner-context.cc : 115 + 0x13]
> 2 impalad!impala::ScannerContext::Stream::GetBytesInternal(long, unsigned 
> char**, bool, long*) [scanner-context.cc : 241 + 0x5]
> 3 impalad!impala::HdfsAvroScanner::ReadFileHeader() [scanner-context.inline.h 
> : 54 + 0x1f]
> 4 impalad!impala::BaseSequenceScanner::GetNextInternal(impala::RowBatch*) 
> [base-sequence-scanner.cc : 157 + 0x13]
> 5 impalad!impala::HdfsScanner::ProcessSplit() [hdfs-scanner.cc : 129 + 0xc]
> 6 
> impalad!impala::HdfsScanNode::ProcessSplit(std::vector std::allocator > const&, impala::MemPool*, 
> impala::io::ScanRange*) [hdfs-scan-node.cc : 527 + 0x17]
> 7 impalad!impala::HdfsScanNode::ScannerThread() [hdfs-scan-node.cc : 437 + 
> 0x1c]
> 8 impalad!impala::Thread::SuperviseThread(std::string const&, std::string 
> const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*) [function_template.hpp : 767 + 0x7]{code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-8064) test_min_max_filters is flaky

2019-01-25 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-8064.
--
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> test_min_max_filters is flaky 
> --
>
> Key: IMPALA-8064
> URL: https://issues.apache.org/jira/browse/IMPALA-8064
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Pooja Nilangekar
>Assignee: Pooja Nilangekar
>Priority: Blocker
>  Labels: broken-build, flaky-test
> Fix For: Impala 3.2.0
>
> Attachments: profile.txt
>
>
> The following configuration of the test_min_max_filters:
> {code:java}
> query_test.test_runtime_filters.TestMinMaxFilters.test_min_max_filters[protocol:
>  beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 
> 0} | table_format: kudu/none]{code}
> It produces a higher aggregated SUM over ProbeRows than expected:
> {code:java}
> query_test/test_runtime_filters.py:113: in test_min_max_filters 
> self.run_test_case('QueryTest/min_max_filters', vector) 
> common/impala_test_suite.py:518: in run_test_case 
> update_section=pytest.config.option.update_results) 
> common/test_result_verifier.py:612: in verify_runtime_profile % 
> (function, field, expected_value, actual_value, actual)) 
> E   AssertionError: Aggregation of SUM over ProbeRows did not match expected 
> results. 
> E   EXPECTED VALUE: E   619 
> EACTUAL VALUE: E   652
> {code}
> This test was introduced in the patch for IMPALA-6533. The failure occurred 
> during an ASAN build. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-8096) Limit on #rows returned from query

2019-01-22 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar reassigned IMPALA-8096:


Assignee: Pooja Nilangekar

> Limit on #rows returned from query
> --
>
> Key: IMPALA-8096
> URL: https://issues.apache.org/jira/browse/IMPALA-8096
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Pooja Nilangekar
>Priority: Major
>  Labels: resource-management
>
> Sometimes users accidentally run queries that return a large number of rows, 
> e.g.
> {code}
> SELECT * FROM table
> {code}
> When they really only need to look at a subset of the rows. It would be 
> useful to have a guardrail to fail queries the return more rows than a 
> particular limit. Maybe it would make sense to integrate with IMPALA-4268 so 
> that the query is failed when the buffer fills up, but it may also be useful 
> to have an easier-to-understand option based on #rows.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-8007) test_slow_subscriber is flaky

2019-01-12 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-8007.
--
Resolution: Fixed

> test_slow_subscriber is flaky
> -
>
> Key: IMPALA-8007
> URL: https://issues.apache.org/jira/browse/IMPALA-8007
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: bharath v
>Assignee: Pooja Nilangekar
>Priority: Major
>  Labels: broken-build, flaky
> Fix For: Impala 3.2.0
>
>
> We have hit both the asserts in the test.
> *Exhaustive:*
> {noformat}
> statestore/test_statestore.py:574: in test_slow_subscriber assert 
> (secs_since_heartbeat < float(sleep_time + 1.0)) E   assert 
> 8.8043 < 6.0 E+  where 6.0 = float((5 + 1.0))
> Stacktrace
> statestore/test_statestore.py:574: in test_slow_subscriber
> assert (secs_since_heartbeat < float(sleep_time + 1.0))
> E   assert 8.8043 < 6.0
> E+  where 6.0 = float((5 + 1.0))
> {noformat}
> *ASAN*
> {noformat}
> Error Message
> statestore/test_statestore.py:573: in t assert (secs_since_heartbeat > 
> float(sleep_time - 1.0)) E   assert 4.995 > 5.0 E+  where 5.0 = float((6 
> - 1.0))
> Stacktrace
> statestore/test_statestore.py:573: in test_slow_subscriber
> assert (secs_since_heartbeat > float(sleep_time - 1.0))
> E   assert 4.995 > 5.0
> E+  where 5.0 = float((6 - 1.0))
> {noformat}
> I only noticed this happen twice (the above two instances) since the patch is 
> committed. So, looks like a racy bug.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-8064) test_min_max_filters is flaky

2019-01-10 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar updated IMPALA-8064:
-
Description: 
The following configuration of the test_min_max_filters:
{code:java}
query_test.test_runtime_filters.TestMinMaxFilters.test_min_max_filters[protocol:
 beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 
0} | table_format: kudu/none]{code}
It produces a higher aggregated SUM over ProbeRows than expected:
{code:java}
query_test/test_runtime_filters.py:113: in test_min_max_filters 
self.run_test_case('QueryTest/min_max_filters', vector) 
common/impala_test_suite.py:518: in run_test_case 
update_section=pytest.config.option.update_results) 
common/test_result_verifier.py:612: in verify_runtime_profile % (function, 
field, expected_value, actual_value, actual)) 
E   AssertionError: Aggregation of SUM over ProbeRows did not match expected 
results. 
E   EXPECTED VALUE: E   619 
EACTUAL VALUE: E   652
{code}

This test was introduced in the patch for IMPALA-6533. The failure occurred 
during an ASAN build. 

  was:
The following configuration of the test_min_max_filters:
{code:java}
query_test.test_runtime_filters.TestMinMaxFilters.test_min_max_filters[protocol:
 beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 
0} | table_format: kudu/none]{code}
It produces a higher aggregated SUM over ProbeRows than expected:
{code:java}
query_test/test_runtime_filters.py:113: in test_min_max_filters 
self.run_test_case('QueryTest/min_max_filters', vector) 
common/impala_test_suite.py:518: in run_test_case 
update_section=pytest.config.option.update_results) 
common/test_result_verifier.py:612: in verify_runtime_profile % (function, 
field, expected_value, actual_value, actual)) 
E   AssertionError: Aggregation of SUM over ProbeRows did not match expected 
results. 
E   EXPECTED VALUE: E   619 
EACTUAL VALUE: E   652
{code}

This test was introduced in the patch for IMPALA-6533


> test_min_max_filters is flaky 
> --
>
> Key: IMPALA-8064
> URL: https://issues.apache.org/jira/browse/IMPALA-8064
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Pooja Nilangekar
>Assignee: Janaki Lahorani
>Priority: Blocker
>  Labels: broken-build, flaky-test
>
> The following configuration of the test_min_max_filters:
> {code:java}
> query_test.test_runtime_filters.TestMinMaxFilters.test_min_max_filters[protocol:
>  beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 
> 0} | table_format: kudu/none]{code}
> It produces a higher aggregated SUM over ProbeRows than expected:
> {code:java}
> query_test/test_runtime_filters.py:113: in test_min_max_filters 
> self.run_test_case('QueryTest/min_max_filters', vector) 
> common/impala_test_suite.py:518: in run_test_case 
> update_section=pytest.config.option.update_results) 
> common/test_result_verifier.py:612: in verify_runtime_profile % 
> (function, field, expected_value, actual_value, actual)) 
> E   AssertionError: Aggregation of SUM over ProbeRows did not match expected 
> results. 
> E   EXPECTED VALUE: E   619 
> EACTUAL VALUE: E   652
> {code}
> This test was introduced in the patch for IMPALA-6533. The failure occurred 
> during an ASAN build. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-8064) test_min_max_filters is flaky

2019-01-10 Thread Pooja Nilangekar (JIRA)
Pooja Nilangekar created IMPALA-8064:


 Summary: test_min_max_filters is flaky 
 Key: IMPALA-8064
 URL: https://issues.apache.org/jira/browse/IMPALA-8064
 Project: IMPALA
  Issue Type: Bug
Reporter: Pooja Nilangekar
Assignee: Janaki Lahorani


The following configuration of the test_min_max_filters:
{code:java}
query_test.test_runtime_filters.TestMinMaxFilters.test_min_max_filters[protocol:
 beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 
0} | table_format: kudu/none]{code}
It produces a higher aggregated SUM over ProbeRows than expected:
{code:java}
query_test/test_runtime_filters.py:113: in test_min_max_filters 
self.run_test_case('QueryTest/min_max_filters', vector) 
common/impala_test_suite.py:518: in run_test_case 
update_section=pytest.config.option.update_results) 
common/test_result_verifier.py:612: in verify_runtime_profile % (function, 
field, expected_value, actual_value, actual)) 
E   AssertionError: Aggregation of SUM over ProbeRows did not match expected 
results. 
E   EXPECTED VALUE: E   619 
EACTUAL VALUE: E   652
{code}

This test was introduced in the patch for IMPALA-6533



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8063) Excessive logging from BeeswaxConnection::get_state() bloats JUnitXML output

2019-01-09 Thread Pooja Nilangekar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16739018#comment-16739018
 ] 

Pooja Nilangekar commented on IMPALA-8063:
--

The sleep alone won't help, since Python's sleep function doesn't guarantee 
much. I read this in the docs while working on a fix for IMPALA-8007:
 _Also, the suspension time may be longer than requested by an arbitrary amount 
because of the scheduling of other activity in the system._

Actually, those tests are currently disabled (IMPALA-8059); they're marked as 
XFAIL. So would it also make sense to lower the timeout value? 100 seconds is 
a long time to be spinning in a loop for a test that is currently disabled.

> Excessive logging from BeeswaxConnection::get_state() bloats JUnitXML output
> 
>
> Key: IMPALA-8063
> URL: https://issues.apache.org/jira/browse/IMPALA-8063
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.2.0
>Reporter: Joe McDonnell
>Priority: Blocker
>  Labels: broken-build
>
> BeeswaxConnection has logging for each call of get_state:
>  
> {code:java}
> def get_state(self, operation_handle):
>   LOG.info("-- getting state for operation: %s" % operation_handle)
>   return self.__beeswax_client.get_state(operation_handle.get_handle())
> {code}
> With IMPALA-7625, ImpalaTestSuite::wait_for_state() calls this more 
> frequently:
>  
>  
> {code:java}
> def wait_for_state(self, handle, expected_state, timeout):
>   """Waits for the given 'query_handle' to reach the 'expected_state'. If it 
> does not
>   reach the given state within 'timeout' seconds, the method throws an 
> AssertionError.
>   """
>   start_time = time.time()
>   actual_state = self.client.get_state(handle)
>   while actual_state != expected_state and time.time() - start_time < timeout:
> actual_state = self.client.get_state(handle)
>   if actual_state != expected_state:
> raise Timeout("query '%s' did not reach expected state '%s', last known 
> state '%s'"
>   % (handle.get_handle().id, expected_state, actual_state))
> {code}
> When running our tests in exhaustive mode, that increases the size of the 
> logging significantly. For example:
>  
> {noformat}
> Before this change:
> $ ls -l TEST-impala-parallel.xml
> -rw-rw-r-- 1 joe joe 34254745 Jan 6 23:23 TEST-impala-parallel.xml
> $ grep "getting state for operation" TEST-impala-parallel.xml | wc -l
> 1044
> After this change:
> $ ls -l TEST-impala-parallel.xml
> -rw-rw-r-- 1 joe joe 159187682 Jan 9 14:51 TEST-impala-parallel.xml
> $ grep "getting state for operation" TEST-impala-parallel.xml | wc -l
> 1167084
> {noformat}
>  
>  
> We should reduce this logging. Bloated JUnitXML files add burden to developer 
> workstations and any Jenkins infrastructure parsing them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-6544) Lack of S3 consistency leads to rare test failures

2019-01-09 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar reassigned IMPALA-6544:


Assignee: Pooja Nilangekar

> Lack of S3 consistency leads to rare test failures
> --
>
> Key: IMPALA-6544
> URL: https://issues.apache.org/jira/browse/IMPALA-6544
> Project: IMPALA
>  Issue Type: Task
>  Components: Frontend
>Affects Versions: Impala 2.8.0
>Reporter: Sailesh Mukil
>Assignee: Pooja Nilangekar
>Priority: Major
>  Labels: S3, broken-build, consistency, flaky, test-framework
>
> Every now and then, we hit a flaky test on S3 runs due to files missing when 
> they should be present, and vice versa. We could consider running our tests 
> (or a subset of our tests) with S3Guard to avoid these problems, however rare 
> they are.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-8059) TestWebPage::test_backend_states is flaky

2019-01-08 Thread Pooja Nilangekar (JIRA)
Pooja Nilangekar created IMPALA-8059:


 Summary: TestWebPage::test_backend_states is flaky
 Key: IMPALA-8059
 URL: https://issues.apache.org/jira/browse/IMPALA-8059
 Project: IMPALA
  Issue Type: Bug
Reporter: Pooja Nilangekar


test_backend_states is flaky. The query reaches the _"FINISHED"_ state before 
its state is verified by the Python test. Here is the relevant log: 

{code:java}
07:33:45 - Captured stderr call 
-
07:33:45 -- executing async: localhost:21000
07:33:45 select sleep(1) from functional.alltypes limit 1;
07:33:45 
07:33:45 -- 2019-01-08 07:31:57,952 INFO MainThread: Started query 
7f46f15ed4d6d0f6:4d58cdbc
07:33:45 -- getting state for operation: 

{code}


This bug was introduced by IMPALA-7625.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8007) test_slow_subscriber is flaky

2019-01-07 Thread Pooja Nilangekar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16736730#comment-16736730
 ] 

Pooja Nilangekar commented on IMPALA-8007:
--

I was looking for reasons for secs_since_heartbeat to be lower than the sleep 
time and I found the following in the Python docs:
{code:java}
The actual suspension time may be less than that requested because any caught 
signal will terminate the sleep() following execution of that signal’s catching 
routine. Also, the suspension time may be longer than requested by an arbitrary 
amount because of the scheduling of other activity in the system.

Changed in version 3.5: The function now sleeps at least secs even if the sleep 
is interrupted by a signal, except if the signal handler raises an exception 
(see PEP 475 for the rationale).
{code}
As per [~tarmstrong]'s suggestion I modified the test to check that 
secs_since_heartbeat is always greater than the previous value. This test also 
fails because the statestore's web UI reports the duration with millisecond 
precision, while the sleep function can wake up in less than one millisecond. So 
there are two options here:
 # Check for a monotonically non-decreasing duration instead of a strictly 
increasing one. This would fix the test, but I am not sure it would actually 
validate the correctness of the monitoring thread. (A thread could return the 
exact same value for several seconds or minutes and that would still be 
accepted by the test.)
 # Update the Python version used to 3.5 or higher and then check for a 
strictly increasing duration since heartbeat. This might affect other places 
where "time" is imported, but it would actually validate the heartbeat 
monitoring thread. 

Tim, what do you suggest? 

> test_slow_subscriber is flaky
> -
>
> Key: IMPALA-8007
> URL: https://issues.apache.org/jira/browse/IMPALA-8007
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: bharath v
>Assignee: Pooja Nilangekar
>Priority: Major
>  Labels: broken-build, flaky
> Fix For: Impala 3.2.0
>
>
> We have hit both the asserts in the test.
> *Exhaustive:*
> {noformat}
> statestore/test_statestore.py:574: in test_slow_subscriber assert 
> (secs_since_heartbeat < float(sleep_time + 1.0)) E   assert 
> 8.8043 < 6.0 E+  where 6.0 = float((5 + 1.0))
> Stacktrace
> statestore/test_statestore.py:574: in test_slow_subscriber
> assert (secs_since_heartbeat < float(sleep_time + 1.0))
> E   assert 8.8043 < 6.0
> E+  where 6.0 = float((5 + 1.0))
> {noformat}
> *ASAN*
> {noformat}
> Error Message
> statestore/test_statestore.py:573: in t assert (secs_since_heartbeat > 
> float(sleep_time - 1.0)) E   assert 4.995 > 5.0 E+  where 5.0 = float((6 
> - 1.0))
> Stacktrace
> statestore/test_statestore.py:573: in test_slow_subscriber
> assert (secs_since_heartbeat > float(sleep_time - 1.0))
> E   assert 4.995 > 5.0
> E+  where 5.0 = float((6 - 1.0))
> {noformat}
> I only noticed this happen twice (the above two instances) since the patch is 
> committed. So, looks like a racy bug.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-7882) ASAN failure in llvm-codegen-test

2018-11-27 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-7882.
--
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> ASAN failure in llvm-codegen-test
> -
>
> Key: IMPALA-7882
> URL: https://issues.apache.org/jira/browse/IMPALA-7882
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: Joe McDonnell
>Assignee: Pooja Nilangekar
>Priority: Blocker
>  Labels: broken-build
> Fix For: Impala 3.2.0
>
>
> The llvm-codegen-test backend test is failing under ASAN with the following 
> output:
> {noformat}
> 18:12:34 [ RUN  ] LlvmCodeGenTest.StringValue
> 18:12:34 =
> 18:12:34 ==124917==ERROR: AddressSanitizer: stack-buffer-overflow on address 
> 0x7ffc0f39e86c at pc 0x017ea479 bp 0x7ffc0f39e550 sp 0x7ffc0f39e548
> 18:12:34 READ of size 4 at 0x7ffc0f39e86c thread T0
> 18:12:34 #0 0x17ea478 in testing::AssertionResult 
> testing::internal::CmpHelperEQ(char const*, char const*, int 
> const&, int const&) 
> /data/jenkins/workspace/impala-asf-master-core-asan/Impala-Toolchain/gtest-1.6.0/include/gtest/gtest.h:1316:19
> 18:12:34 #1 0x17d3a8d in 
> _ZN7testing8internal8EqHelperILb1EE7CompareIiiEENS_15AssertionResultEPKcS6_RKT_RKT0_PNS0_8EnableIfIXntsr10is_pointerISA_EE5valueEE4typeE
>  
> /data/jenkins/workspace/impala-asf-master-core-asan/Impala-Toolchain/gtest-1.6.0/include/gtest/gtest.h:1392:12
> 18:12:34 #2 0x17c656b in 
> impala::LlvmCodeGenTest_StringValue_Test::TestBody() 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/codegen/llvm-codegen-test.cc:379:3
> 18:12:34 #3 0x4d55af2 in void 
> testing::internal::HandleExceptionsInMethodIfSupported void>(testing::Test*, void (testing::Test::*)(), char const*) 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/codegen/llvm-codegen-test+0x4d55af2)
> 18:12:34 #4 0x4d4c669 in testing::Test::Run() 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/codegen/llvm-codegen-test+0x4d4c669)
> 18:12:34 #5 0x4d4c7b7 in testing::TestInfo::Run() 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/codegen/llvm-codegen-test+0x4d4c7b7)
> 18:12:34 #6 0x4d4c894 in testing::TestCase::Run() 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/codegen/llvm-codegen-test+0x4d4c894)
> 18:12:34 #7 0x4d4db17 in testing::internal::UnitTestImpl::RunAllTests() 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/codegen/llvm-codegen-test+0x4d4db17)
> 18:12:34 #8 0x4d4ddf2 in testing::UnitTest::Run() 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/codegen/llvm-codegen-test+0x4d4ddf2)
> 18:12:34 #9 0x17ce16e in main 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/codegen/llvm-codegen-test.cc:569:10
> 18:12:34 #10 0x7fc221bd5c04 in __libc_start_main 
> (/lib64/libc.so.6+0x21c04)
> 18:12:34 #11 0x16b63c6 in _start 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/codegen/llvm-codegen-test+0x16b63c6)
> 18:12:34 
> 18:12:34 Address 0x7ffc0f39e86c is located in stack of thread T0 at offset 
> 492 in frame
> 18:12:34 #0 0x17c567f in 
> impala::LlvmCodeGenTest_StringValue_Test::TestBody() 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/codegen/llvm-codegen-test.cc:343
> 18:12:34 
> 18:12:34   This frame has 57 object(s):
> 18:12:34 [32, 40) 'codegen' (line 344)
> 18:12:34 [64, 72) 'ref.tmp' (line 345)
> 18:12:34 [96, 104) 'ref.tmp2' (line 345)
> 18:12:34 [128, 129) 'ref.tmp3' (line 345)
> 18:12:34 [144, 160) 'gtest_ar_' (line 345)
> 18:12:34 [176, 184) 'temp.lvalue'
> 18:12:34 [208, 216) 'ref.tmp6' (line 345)
> 18:12:34 [240, 248) 'temp.lvalue8'
> 18:12:34 [272, 288) 'ref.tmp9' (line 345)
> 18:12:34 [304, 320) 'gtest_ar_12' (line 346)
> 18:12:34 [336, 344) 'ref.tmp15' (line 346)
> 18:12:34 [368, 376) 'temp.lvalue16'
> 18:12:34 [400, 416) 'ref.tmp17' (line 346)
> 18:12:34 [432, 440) 'str' (line 348)
> 18:12:34 [464, 465) 'ref.tmp19' (line 348)
> 18:12:34 [480, 492) 'str_val' (line 350) <== Memory access at offset 492 
> overflows this variable
> 18:12:34 [512, 528) 'gtest_ar_24' (line 357)
> 18:12:34 [544, 552) 'ref.tmp27' (line 357)
> 18:12:34 [576, 584) 'temp.lvalue28'
> 18:12:34 [608, 624) 'ref.tmp29' (line 357)
> 18:12:34 [640, 648) 'jitted_fn' (line 360)
> 18:12:34 [672, 680) 'ref.tmp33' (line 362)
> 18:12:34 [704, 720) 'gtest_ar_35' 

[jira] [Created] (IMPALA-7878) Bad SQL generated by compute incremental stats

2018-11-21 Thread Pooja Nilangekar (JIRA)
Pooja Nilangekar created IMPALA-7878:


 Summary: Bad SQL generated by compute incremental stats 
 Key: IMPALA-7878
 URL: https://issues.apache.org/jira/browse/IMPALA-7878
 Project: IMPALA
  Issue Type: Improvement
  Components: Frontend
Reporter: Pooja Nilangekar
Assignee: Paul Rogers


Computing incremental stats on partitions generates bad SQL. For instance, 
for a table foo partitioned by column bar, the compute stats statement:

{code:java}
compute incremental stats foo partition (bar = 1); 
{code}

would generate the following query: 

{code:java}
SELECT COUNT(*), month FROM foo WHERE (bar=1) GROUP BY bar;
{code}

If this were to be rewritten as follows, it would produce fewer fragments and 
hence also reduce query memory by avoiding a hash aggregation node. 

{code:java}
SELECT COUNT(*), 1 FROM foo WHERE bar=1; 
{code}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IMPALA-7878) Bad SQL generated by compute incremental stats

2018-11-21 Thread Pooja Nilangekar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695190#comment-16695190
 ] 

Pooja Nilangekar commented on IMPALA-7878:
--

[~Paul.Rogers] Assigning it to you because you've been working on the 
analysis/rewrite part of the front end. You could add to the list if you think 
this is related. Please reassign/reword if necessary. 

> Bad SQL generated by compute incremental stats 
> ---
>
> Key: IMPALA-7878
> URL: https://issues.apache.org/jira/browse/IMPALA-7878
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Pooja Nilangekar
>Assignee: Paul Rogers
>Priority: Major
>
> Computing incremental stats on partitions generates bad SQL. For instance, 
> for a table foo partitioned by column bar, the compute stats statement:
> {code:java}
> compute incremental stats foo partition (bar = 1); 
> {code}
> would generate the following query: 
> {code:java}
> SELECT COUNT(*), month FROM foo WHERE (bar=1) GROUP BY bar;
> {code}
> If this were to be rewritten as follows, it would produce fewer fragments and 
> hence also reduce query memory by avoiding a hash aggregation node. 
> {code:java}
> SELECT COUNT(*), 1 FROM foo WHERE bar=1; 
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-7873) TestExchangeMemUsage.test_exchange_mem_usage_scaling doesn't hit the memory limit

2018-11-20 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-7873.
--
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> TestExchangeMemUsage.test_exchange_mem_usage_scaling doesn't hit the memory 
> limit
> -
>
> Key: IMPALA-7873
> URL: https://issues.apache.org/jira/browse/IMPALA-7873
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: Joe McDonnell
>Assignee: Pooja Nilangekar
>Priority: Blocker
>  Labels: broken-build
> Fix For: Impala 3.2.0
>
>
> This is failing to hit a memory exceeded on the last two core runs:
> {noformat}
> query_test/test_mem_usage_scaling.py:386: in test_exchange_mem_usage_scaling
> self.run_test_case('QueryTest/exchange-mem-scaling', vector)
> common/impala_test_suite.py:482: in run_test_case
> assert False, "Expected exception: %s" % expected_str
> E   AssertionError: Expected exception: Memory limit exceeded{noformat}
> It might be that the limit needs to be adjusted. 
> There were two changes since the last successful run: IMPALA-7367 
> (2a4835cfba7597362cc1e72e21315868c5c75d0a) and IMPALA-5031 
> (53ce6bb571cd9ae07ba5255197d35aa852a6f97c)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7856) test_exchange_mem_usage_scaling failing, not hitting expected OOM

2018-11-20 Thread Pooja Nilangekar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693727#comment-16693727
 ] 

Pooja Nilangekar commented on IMPALA-7856:
--

I guess this occurred before IMPALA-7367 went in. 

> test_exchange_mem_usage_scaling failing, not hitting expected OOM
> -
>
> Key: IMPALA-7856
> URL: https://issues.apache.org/jira/browse/IMPALA-7856
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.2.0
>Reporter: Bikramjeet Vig
>Assignee: Bikramjeet Vig
>Priority: Critical
>  Labels: broken-build, flaky-test
> Fix For: Not Applicable
>
>
> {noformat}
> query_test/test_mem_usage_scaling.py:386: in test_exchange_mem_usage_scaling
> self.run_test_case('QueryTest/exchange-mem-scaling', vector)
> common/impala_test_suite.py:482: in run_test_case
> assert False, "Expected exception: %s" % expected_str
> E   AssertionError: Expected exception: Memory limit exceeded
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7873) TestExchangeMemUsage.test_exchange_mem_usage_scaling doesn't hit the memory limit

2018-11-19 Thread Pooja Nilangekar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16692521#comment-16692521
 ] 

Pooja Nilangekar commented on IMPALA-7873:
--

Yes, I hit the issue while testing IMPALA-7367. I think the limit I set may not 
be accurate. I will take a look at this. 

> TestExchangeMemUsage.test_exchange_mem_usage_scaling doesn't hit the memory 
> limit
> -
>
> Key: IMPALA-7873
> URL: https://issues.apache.org/jira/browse/IMPALA-7873
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: Joe McDonnell
>Priority: Blocker
>  Labels: broken-build
>
> This is failing to hit a memory exceeded on the last two core runs:
> {noformat}
> query_test/test_mem_usage_scaling.py:386: in test_exchange_mem_usage_scaling
> self.run_test_case('QueryTest/exchange-mem-scaling', vector)
> common/impala_test_suite.py:482: in run_test_case
> assert False, "Expected exception: %s" % expected_str
> E   AssertionError: Expected exception: Memory limit exceeded{noformat}
> It might be that the limit needs to be adjusted. 
> There were two changes since the last successful run: IMPALA-7367 
> (2a4835cfba7597362cc1e72e21315868c5c75d0a) and IMPALA-5031 
> (53ce6bb571cd9ae07ba5255197d35aa852a6f97c)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-7873) TestExchangeMemUsage.test_exchange_mem_usage_scaling doesn't hit the memory limit

2018-11-19 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar reassigned IMPALA-7873:


Assignee: Pooja Nilangekar

> TestExchangeMemUsage.test_exchange_mem_usage_scaling doesn't hit the memory 
> limit
> -
>
> Key: IMPALA-7873
> URL: https://issues.apache.org/jira/browse/IMPALA-7873
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: Joe McDonnell
>Assignee: Pooja Nilangekar
>Priority: Blocker
>  Labels: broken-build
>
> This is failing to hit a memory exceeded on the last two core runs:
> {noformat}
> query_test/test_mem_usage_scaling.py:386: in test_exchange_mem_usage_scaling
> self.run_test_case('QueryTest/exchange-mem-scaling', vector)
> common/impala_test_suite.py:482: in run_test_case
> assert False, "Expected exception: %s" % expected_str
> E   AssertionError: Expected exception: Memory limit exceeded{noformat}
> It might be that the limit needs to be adjusted. 
> There were two changes since the last successful run: IMPALA-7367 
> (2a4835cfba7597362cc1e72e21315868c5c75d0a) and IMPALA-5031 
> (53ce6bb571cd9ae07ba5255197d35aa852a6f97c)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-7367) Pack StringValue, CollectionValue and TimestampValue slots

2018-11-19 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-7367.
--
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> Pack StringValue, CollectionValue and TimestampValue slots
> --
>
> Key: IMPALA-7367
> URL: https://issues.apache.org/jira/browse/IMPALA-7367
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Pooja Nilangekar
>Priority: Major
>  Labels: perfomance
> Fix For: Impala 3.2.0
>
> Attachments: 0001-WIP.patch
>
>
> This is a follow-on to finish up the work from IMPALA-2789. IMPALA-2789 
> didn't actually fully pack the memory layout because StringValue, 
> TimestampValue and CollectionValue still occupy 16 bytes but only have 12 
> bytes of actual data. This results in a higher memory footprint, which leads 
> to higher memory requirements and worse performance. We don't get any benefit 
> from the padding since the majority of tuples are not actually aligned in 
> memory anyway.
> I did a quick version of the change for StringValue only which improves TPC-H 
> performance.
> {noformat}
> Report Generated on 2018-07-30
> Run Description: "b5608264b4552e44eb73ded1e232a8775c3dba6b vs 
> f1e401505ac20c0400eec819b9196f7f506fb927"
> Cluster Name: UNKNOWN
> Lab Run Info: UNKNOWN
> Impala Version:  impalad version 3.1.0-SNAPSHOT RELEASE ()
> Baseline Impala Version: impalad version 3.1.0-SNAPSHOT RELEASE (2018-07-27)
> +--+---+-++++
> | Workload | File Format   | Avg (s) | Delta(Avg) | GeoMean(s) | 
> Delta(GeoMean) |
> +--+---+-++++
> | TPCH(10) | parquet / none / none | 2.69| -4.78% | 2.09   | 
> -3.11% |
> +--+---+-++++
> +--+--+---++-++++-+---+
> | Workload | Query| File Format   | Avg(s) | Base Avg(s) | 
> Delta(Avg) | StdDev(%)  | Base StdDev(%) | Num Clients | Iters |
> +--+--+---++-++++-+---+
> | TPCH(10) | TPCH-Q22 | parquet / none / none | 0.94   | 0.93|   
> +0.75%   |   3.37%|   2.84%| 1   | 30|
> | TPCH(10) | TPCH-Q13 | parquet / none / none | 3.32   | 3.32|   
> +0.13%   |   1.74%|   2.09%| 1   | 30|
> | TPCH(10) | TPCH-Q11 | parquet / none / none | 0.99   | 0.99|   
> -0.02%   |   3.74%|   3.16%| 1   | 30|
> | TPCH(10) | TPCH-Q5  | parquet / none / none | 2.30   | 2.33|   
> -0.96%   |   2.15%|   2.45%| 1   | 30|
> | TPCH(10) | TPCH-Q2  | parquet / none / none | 1.55   | 1.57|   
> -1.45%   |   1.65%|   1.49%| 1   | 30|
> | TPCH(10) | TPCH-Q8  | parquet / none / none | 2.89   | 2.93|   
> -1.51%   |   2.69%|   1.34%| 1   | 30|
> | TPCH(10) | TPCH-Q9  | parquet / none / none | 5.96   | 6.06|   
> -1.63%   |   1.34%|   1.82%| 1   | 30|
> | TPCH(10) | TPCH-Q20 | parquet / none / none | 1.58   | 1.61|   
> -1.85%   |   2.28%|   2.16%| 1   | 30|
> | TPCH(10) | TPCH-Q16 | parquet / none / none | 1.18   | 1.21|   
> -2.11%   |   3.68%|   4.72%| 1   | 30|
> | TPCH(10) | TPCH-Q3  | parquet / none / none | 2.13   | 2.18|   
> -2.31%   |   2.09%|   1.92%| 1   | 30|
> | TPCH(10) | TPCH-Q15 | parquet / none / none | 1.86   | 1.90|   
> -2.52%   |   2.06%|   2.22%| 1   | 30|
> | TPCH(10) | TPCH-Q17 | parquet / none / none | 1.85   | 1.90|   
> -2.86%   |   10.00%   |   8.02%| 1   | 30|
> | TPCH(10) | TPCH-Q10 | parquet / none / none | 2.58   | 2.66|   
> -2.93%   |   1.68%|   6.49%| 1   | 30|
> | TPCH(10) | TPCH-Q14 | parquet / none / none | 1.37   | 1.42|   
> -3.22%   |   3.35%|   6.24%| 1   | 30|
> | TPCH(10) | TPCH-Q18 | parquet / none / none | 4.99   | 5.17|   
> -3.38%   |   1.75%|   3.82%| 1   | 30|
> | TPCH(10) | TPCH-Q6  | parquet / none / none | 0.66   | 0.69|   
> -3.73%   |   5.04%|   4.12%| 1   | 30|
> | TPCH(10) | TPCH-Q4  | parquet / none / none | 1.07   | 1.12|   
> -3.97%   |   1.79%|   2.85%| 1   

[jira] [Resolved] (IMPALA-7791) Aggregation Node memory estimates don't account for number of fragment instances

2018-11-08 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-7791.
--
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> Aggregation Node memory estimates don't account for number of fragment 
> instances
> 
>
> Key: IMPALA-7791
> URL: https://issues.apache.org/jira/browse/IMPALA-7791
> Project: IMPALA
>  Issue Type: Sub-task
>Affects Versions: Impala 3.1.0
>Reporter: Pooja Nilangekar
>Assignee: Pooja Nilangekar
>Priority: Blocker
> Fix For: Impala 3.1.0
>
>
> AggregationNode's memory estimates are calculated based on the input 
> cardinality of the node, without accounting for the division of input data 
> across fragment instances. This results in very high memory estimates. In 
> reality, the nodes often use only a part of this memory.   
> Example query:
> {code:java}
> [localhost:21000] default> select distinct * from tpch.lineitem limit 5; 
> {code}
> Summary: 
> {code:java}
> +--------------+--------+----------+----------+-------+------------+-----------+---------------+---------------+
> | Operator     | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail        |
> +--------------+--------+----------+----------+-------+------------+-----------+---------------+---------------+
> | 04:EXCHANGE  | 1      | 21.24us  | 21.24us  | 5     | 5          | 48.00 KB  | 16.00 KB      | UNPARTITIONED |
> | 03:AGGREGATE | 3      | 5.11s    | 5.15s    | 15    | 5          | 576.21 MB | 1.62 GB       | FINALIZE      |
> | 02:EXCHANGE  | 3      | 709.75ms | 728.91ms | 6.00M | 6.00M      | 5.46 MB   | 10.78 MB      | HASH(tpch.lineitem.l_orderkey,tpch.lineitem.l_partkey,tpch.lineitem.l_suppkey,tpch.lineitem.l_linenumber,tpch.lineitem.l_quantity,tpch.lineitem.l_extendedprice,tpch.lineitem.l_discount,tpch.lineitem.l_tax,tpch.lineitem.l_returnflag,tpch.lineitem.l_linestatus,tpch.lineitem.l_shipdate,tpch.lineitem.l_commitdate,tpch.lineitem.l_receiptdate,tpch.lineitem.l_shipinstruct,tpch.lineitem.l_shipmode,tpch.lineitem.l_comment) |
> | 01:AGGREGATE | 3      | 4.37s    | 4.70s    | 6.00M | 6.00M      | 36.77 MB  | 1.62 GB       | STREAMING     |
> | 00:SCAN HDFS | 3      | 437.14ms | 480.60ms | 6.00M |


[jira] [Assigned] (IMPALA-7814) AggregationNode's memory estimate should be based on NDV only for non-grouping aggs

2018-11-05 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar reassigned IMPALA-7814:


Assignee: Pooja Nilangekar

> AggregationNode's memory estimate should be based on NDV only for 
> non-grouping aggs 
> 
>
> Key: IMPALA-7814
> URL: https://issues.apache.org/jira/browse/IMPALA-7814
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Pooja Nilangekar
>Assignee: Pooja Nilangekar
>Priority: Major
>
> Currently, the AggregationNode always computes the NDV to estimate the number 
> of rows. However, for grouping aggregates, the entire input has to be 
> consumed before the output can be produced, hence its memory estimate should 
> not consider the NDV.  This is acceptable for non-grouping aggregates because 
> they only need to store the value expression during the build phase, instead of 
> the entire tuple. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-7814) AggregationNode's memory estimate should be based on NDV only for non-grouping aggs

2018-11-05 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar updated IMPALA-7814:
-
Description: Currently, the AggregationNode always computes the NDV to 
estimate the number of rows. However, for grouping aggregates, the entire input 
has to be consumed before the output can be produced, hence its memory estimate 
should not consider the NDV.  This is acceptable for non-grouping aggregates 
because they only need to store the value expression during the build phase, 
instead of the entire tuple. 
Summary: AggregationNode's memory estimate should be based on NDV only 
for non-grouping aggs   (was: Aggregation Node's memory estimate should be 
based on NDV only for non-grouping aggs )

> AggregationNode's memory estimate should be based on NDV only for 
> non-grouping aggs 
> 
>
> Key: IMPALA-7814
> URL: https://issues.apache.org/jira/browse/IMPALA-7814
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Pooja Nilangekar
>Priority: Major
>
> Currently, the AggregationNode always computes the NDV to estimate the number 
> of rows. However, for grouping aggregates, the entire input has to be 
> consumed before the output can be produced, hence its memory estimate should 
> not consider the NDV.  This is acceptable for non-grouping aggregates because 
> they only need to store the value expression during the build phase, instead of 
> the entire tuple. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-7814) Aggregation Node's memory estimate should be based on NDV only for non-grouping aggs

2018-11-05 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar updated IMPALA-7814:
-
Summary: Aggregation Node's memory estimate should be based on NDV only for 
non-grouping aggs   (was: Aggregation Node)

> Aggregation Node's memory estimate should be based on NDV only for 
> non-grouping aggs 
> -
>
> Key: IMPALA-7814
> URL: https://issues.apache.org/jira/browse/IMPALA-7814
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Pooja Nilangekar
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-7814) Aggregation Node

2018-11-05 Thread Pooja Nilangekar (JIRA)
Pooja Nilangekar created IMPALA-7814:


 Summary: Aggregation Node
 Key: IMPALA-7814
 URL: https://issues.apache.org/jira/browse/IMPALA-7814
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Pooja Nilangekar






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Updated] (IMPALA-7791) Aggregation Node memory estimates don't account for number of fragment instances

2018-11-01 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar updated IMPALA-7791:
-
Priority: Blocker  (was: Major)

> Aggregation Node memory estimates don't account for number of fragment 
> instances
> 
>
> Key: IMPALA-7791
> URL: https://issues.apache.org/jira/browse/IMPALA-7791
> Project: IMPALA
>  Issue Type: Sub-task
>Affects Versions: Impala 3.1.0
>Reporter: Pooja Nilangekar
>Assignee: Pooja Nilangekar
>Priority: Blocker
>
> AggregationNode's memory estimates are calculated based on the input 
> cardinality of the node, without accounting for the division of input data 
> across fragment instances. This results in very high memory estimates. In 
> reality, the nodes often use only a part of this memory.   
> Example query:
> {code:java}
> [localhost:21000] default> select distinct * from tpch.lineitem limit 5; 
> {code}
> Summary: 
> {code:java}
> +--------------+--------+----------+----------+-------+------------+-----------+---------------+---------------+
> | Operator     | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail        |
> +--------------+--------+----------+----------+-------+------------+-----------+---------------+---------------+
> | 04:EXCHANGE  | 1      | 21.24us  | 21.24us  | 5     | 5          | 48.00 KB  | 16.00 KB      | UNPARTITIONED |
> | 03:AGGREGATE | 3      | 5.11s    | 5.15s    | 15    | 5          | 576.21 MB | 1.62 GB       | FINALIZE      |
> | 02:EXCHANGE  | 3      | 709.75ms | 728.91ms | 6.00M | 6.00M      | 5.46 MB   | 10.78 MB      | HASH(tpch.lineitem.l_orderkey,tpch.lineitem.l_partkey,tpch.lineitem.l_suppkey,tpch.lineitem.l_linenumber,tpch.lineitem.l_quantity,tpch.lineitem.l_extendedprice,tpch.lineitem.l_discount,tpch.lineitem.l_tax,tpch.lineitem.l_returnflag,tpch.lineitem.l_linestatus,tpch.lineitem.l_shipdate,tpch.lineitem.l_commitdate,tpch.lineitem.l_receiptdate,tpch.lineitem.l_shipinstruct,tpch.lineitem.l_shipmode,tpch.lineitem.l_comment) |
> | 01:AGGREGATE | 3      | 4.37s    | 4.70s    | 6.00M | 6.00M      | 36.77 MB  | 1.62 GB       | STREAMING     |
> | 00:SCAN HDFS | 3      | 437.14ms | 480.60ms | 6.00M | 6.00M      | 65.51 MB  | 264.00 MB     | tpch.lineitem |

[jira] [Resolved] (IMPALA-7363) Spurious error generated by sequence file scanner with weird scan range length

2018-11-01 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-7363.
--
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> Spurious error generated by sequence file scanner with weird scan range length
> --
>
> Key: IMPALA-7363
> URL: https://issues.apache.org/jira/browse/IMPALA-7363
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Tim Armstrong
>Assignee: Pooja Nilangekar
>Priority: Critical
>  Labels: avro
> Fix For: Impala 3.1.0
>
>
> Repro on master
> {noformat}
> tarmstrong@tarmstrong-box:~/Impala/incubator-impala$ impala-shell.sh
> Starting Impala Shell without Kerberos authentication
> Connected to localhost:21000
> Server version: impalad version 3.1.0-SNAPSHOT DEBUG (build 
> cec33fa0ae75392668273d40b5a1bc4bbd7e9e2e)
> ***
> Welcome to the Impala shell.
> (Impala Shell v3.1.0-SNAPSHOT (cec33fa) built on Thu Jul 26 09:50:10 PDT 2018)
> To see a summary of a query's progress that updates in real-time, run 'set
> LIVE_PROGRESS=1;'.
> ***
> [localhost:21000] default> use tpch_seq_snap;
> Query: use tpch_seq_snap
> [localhost:21000] tpch_seq_snap> SET max_scan_range_length=5377;
> MAX_SCAN_RANGE_LENGTH set to 5377
> [localhost:21000] tpch_seq_snap> select count(*)
>> from lineitem;
> Query: select count(*)
> from lineitem
> Query submitted at: 2018-07-26 14:10:18 (Coordinator: 
> http://tarmstrong-box:25000)
> Query progress can be monitored at: 
> http://tarmstrong-box:25000/query_plan?query_id=e9428efe173ad2f4:84b66bdb
> +--+
> | count(*) |
> +--+
> | 5993651  |
> +--+
> WARNINGS: SkipText: length is negative
> Problem parsing file 
> hdfs://localhost:20500/test-warehouse/tpch.lineitem_seq_snap/00_0 at 
> 36472193
> {noformat}
> Found while adding a test for IMPALA-7360



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Assigned] (IMPALA-7791) Aggregation Node memory estimates don't account for number of fragment instances

2018-10-30 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar reassigned IMPALA-7791:


Assignee: Pooja Nilangekar

> Aggregation Node memory estimates don't account for number of fragment 
> instances
> 
>
> Key: IMPALA-7791
> URL: https://issues.apache.org/jira/browse/IMPALA-7791
> Project: IMPALA
>  Issue Type: Sub-task
>Affects Versions: Impala 3.1.0
>Reporter: Pooja Nilangekar
>Assignee: Pooja Nilangekar
>Priority: Major
>
> AggregationNode's memory estimates are calculated based on the input 
> cardinality of the node, without accounting for the division of input data 
> across fragment instances. This results in very high memory estimates. In 
> reality, the nodes often use only a part of this memory.   
> Example query:
> {code:java}
> [localhost:21000] default> select distinct * from tpch.lineitem limit 5; 
> {code}
> Summary: 
> {code:java}
> +--------------+--------+----------+----------+-------+------------+-----------+---------------+---------------+
> | Operator     | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail        |
> +--------------+--------+----------+----------+-------+------------+-----------+---------------+---------------+
> | 04:EXCHANGE  | 1      | 21.24us  | 21.24us  | 5     | 5          | 48.00 KB  | 16.00 KB      | UNPARTITIONED |
> | 03:AGGREGATE | 3      | 5.11s    | 5.15s    | 15    | 5          | 576.21 MB | 1.62 GB       | FINALIZE      |
> | 02:EXCHANGE  | 3      | 709.75ms | 728.91ms | 6.00M | 6.00M      | 5.46 MB   | 10.78 MB      | HASH(tpch.lineitem.l_orderkey,tpch.lineitem.l_partkey,tpch.lineitem.l_suppkey,tpch.lineitem.l_linenumber,tpch.lineitem.l_quantity,tpch.lineitem.l_extendedprice,tpch.lineitem.l_discount,tpch.lineitem.l_tax,tpch.lineitem.l_returnflag,tpch.lineitem.l_linestatus,tpch.lineitem.l_shipdate,tpch.lineitem.l_commitdate,tpch.lineitem.l_receiptdate,tpch.lineitem.l_shipinstruct,tpch.lineitem.l_shipmode,tpch.lineitem.l_comment) |
> | 01:AGGREGATE | 3      | 4.37s    | 4.70s    | 6.00M | 6.00M      | 36.77 MB  | 1.62 GB       | STREAMING     |
> | 00:SCAN HDFS | 3      | 437.14ms | 480.60ms | 6.00M | 6.00M      | 65.51 MB  | 264.00 MB     | tpch.lineitem |

[jira] [Updated] (IMPALA-7791) Aggregation Node memory estimates don't account for number of fragment instances

2018-10-30 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar updated IMPALA-7791:
-
Epic Color: ghx-label-7  (was: ghx-label-5)

> Aggregation Node memory estimates don't account for number of fragment 
> instances
> 
>
> Key: IMPALA-7791
> URL: https://issues.apache.org/jira/browse/IMPALA-7791
> Project: IMPALA
>  Issue Type: Sub-task
>Affects Versions: Impala 3.1.0
>Reporter: Pooja Nilangekar
>Priority: Major
>
> AggregationNode's memory estimates are calculated based on the input 
> cardinality of the node, without accounting for the division of input data 
> across fragment instances. This results in very high memory estimates. In 
> reality, the nodes often use only a part of this memory.   
> Example query:
> {code:java}
> [localhost:21000] default> select distinct * from tpch.lineitem limit 5; 
> {code}
> Summary: 
> {code:java}
> +--------------+--------+----------+----------+-------+------------+-----------+---------------+---------------+
> | Operator     | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail        |
> +--------------+--------+----------+----------+-------+------------+-----------+---------------+---------------+
> | 04:EXCHANGE  | 1      | 21.24us  | 21.24us  | 5     | 5          | 48.00 KB  | 16.00 KB      | UNPARTITIONED |
> | 03:AGGREGATE | 3      | 5.11s    | 5.15s    | 15    | 5          | 576.21 MB | 1.62 GB       | FINALIZE      |
> | 02:EXCHANGE  | 3      | 709.75ms | 728.91ms | 6.00M | 6.00M      | 5.46 MB   | 10.78 MB      | HASH(tpch.lineitem.l_orderkey,tpch.lineitem.l_partkey,tpch.lineitem.l_suppkey,tpch.lineitem.l_linenumber,tpch.lineitem.l_quantity,tpch.lineitem.l_extendedprice,tpch.lineitem.l_discount,tpch.lineitem.l_tax,tpch.lineitem.l_returnflag,tpch.lineitem.l_linestatus,tpch.lineitem.l_shipdate,tpch.lineitem.l_commitdate,tpch.lineitem.l_receiptdate,tpch.lineitem.l_shipinstruct,tpch.lineitem.l_shipmode,tpch.lineitem.l_comment) |
> | 01:AGGREGATE | 3      | 4.37s    | 4.70s    | 6.00M | 6.00M      | 36.77 MB  | 1.62 GB       | STREAMING     |
> | 00:SCAN HDFS | 3      | 437.14ms | 480.60ms | 6.00M | 6.00M      | 65.51 MB  | 264.00 MB     | tpch.lineitem |

[jira] [Created] (IMPALA-7791) Aggregation Node memory estimates don't account for number of fragment instances

2018-10-30 Thread Pooja Nilangekar (JIRA)
Pooja Nilangekar created IMPALA-7791:


 Summary: Aggregation Node memory estimates don't account for 
number of fragment instances
 Key: IMPALA-7791
 URL: https://issues.apache.org/jira/browse/IMPALA-7791
 Project: IMPALA
  Issue Type: Sub-task
Affects Versions: Impala 3.1.0
Reporter: Pooja Nilangekar


AggregationNode's memory estimates are calculated based on the input 
cardinality of the node, without accounting for the division of input data 
across fragment instances. This results in very high memory estimates. In 
reality, the nodes often use only a part of this memory.   
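A hedged sketch of the missing step; Impala's real estimate is computed in the Java frontend planner and also factors in NDV and minimum reservations, so this only illustrates dividing the input cardinality across fragment instances before sizing a single instance's hash table. The example query and plan follow below.

{code:cpp}
#include <cstdint>

// Sketch only, not Impala's planner code. Assumes input rows are spread
// roughly evenly across fragment instances, so each instance's hash table
// holds about 1/num_instances of the aggregated data.
int64_t PerInstanceMemEstimate(int64_t input_cardinality, int64_t avg_row_size_bytes,
                               int64_t num_instances) {
  int64_t per_instance_rows =
      (input_cardinality + num_instances - 1) / num_instances;  // ceiling division
  return per_instance_rows * avg_row_size_bytes;
}

int main() {
  // Hypothetical numbers in the spirit of the plan below: 6M input rows,
  // 3 fragment instances, and an assumed 256-byte average row.
  return PerInstanceMemEstimate(6000000, 256, 3) > 0 ? 0 : 1;
}
{code}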

Example query:

{code:java}
[localhost:21000] default> select distinct * from tpch.lineitem limit 5; 
{code}

Summary: 

{code:java}
+--------------+--------+----------+----------+-------+------------+-----------+---------------+---------------+
| Operator     | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail        |
+--------------+--------+----------+----------+-------+------------+-----------+---------------+---------------+
| 04:EXCHANGE  | 1      | 21.24us  | 21.24us  | 5     | 5          | 48.00 KB  | 16.00 KB      | UNPARTITIONED |
| 03:AGGREGATE | 3      | 5.11s    | 5.15s    | 15    | 5          | 576.21 MB | 1.62 GB       | FINALIZE      |
| 02:EXCHANGE  | 3      | 709.75ms | 728.91ms | 6.00M | 6.00M      | 5.46 MB   | 10.78 MB      | HASH(tpch.lineitem.l_orderkey,tpch.lineitem.l_partkey,tpch.lineitem.l_suppkey,tpch.lineitem.l_linenumber,tpch.lineitem.l_quantity,tpch.lineitem.l_extendedprice,tpch.lineitem.l_discount,tpch.lineitem.l_tax,tpch.lineitem.l_returnflag,tpch.lineitem.l_linestatus,tpch.lineitem.l_shipdate,tpch.lineitem.l_commitdate,tpch.lineitem.l_receiptdate,tpch.lineitem.l_shipinstruct,tpch.lineitem.l_shipmode,tpch.lineitem.l_comment) |
| 01:AGGREGATE | 3      | 4.37s    | 4.70s    | 6.00M | 6.00M      | 36.77 MB  | 1.62 GB       | STREAMING     |
| 00:SCAN HDFS | 3      | 437.14ms | 480.60ms | 6.00M | 6.00M      | 65.51 MB  | 264.00 MB     | tpch.lineitem |


[jira] [Comment Edited] (IMPALA-7363) Spurious error generated by sequence file scanner with weird scan range length

2018-10-30 Thread Pooja Nilangekar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16669233#comment-16669233
 ] 

Pooja Nilangekar edited comment on IMPALA-7363 at 10/30/18 8:41 PM:


This seems to be a non-deterministic bug, executing the same query multiple 
times produces different results. A stream on the same file, at the same 
file_offset() and same bytes_left() value, (i.e., These two streams are exactly 
identical locations) return different long values. In case of the error, the 
ScannerContext::ReadVLong() function returns -10434 while in the non-error case 
it returns 10433. However, the bytes are the exact same in each case. 0x8e for 
the firstbyte and the value is always 10433. -The bug is somewhere in the 
ReadWriteUtil::IsNegativeVInt(). Not sure how "return byte < -120 || (byte >= 
-112 && byte < 0);" can be non-deterministic. Ideally, it should always return 
false for byte = -114 (0x8e). -

So it looks like the firstbyte value is overwritten during subsequent calls to 
GetBytes because the output buffer is owned by the stream and can be cleared to 
read more bytes. 


was (Author: poojanilangekar):
This seems to be a non-deterministic bug, executing the same query multiple 
times produces different results. A stream on the same file, at the same 
file_offset() and same bytes_left() value, (i.e., These two streams are exactly 
identical locations) return different long values. In case of the error, the 
ScannerContext::ReadVLong() function returns -10434 while in the non-error case 
it returns 10433. However, the bytes are the exact same in each case. 0x8e for 
the firstbyte and the value is always 10433. -The bug is somewhere in the 
ReadWriteUtil::IsNegativeVInt(). Not sure how "return byte < -120 || (byte >= 
-112 && byte < 0);" can be non-deterministic. Ideally, it should always return 
false for byte = -114 (0x8e). -

So it looks like the firstbyte value is overwritten during subsequent calls to 
GetBytes because the output buffer is owned by the stream and can be cleared to 
read more bytes. 
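A hedged C++ sketch of the Hadoop-style VInt first-byte decoding, mirroring the expression quoted above rather than Impala's actual ReadWriteUtil code: for a first byte of 0x8e (-114) the sign check must return false and the total encoded length is three bytes, so a flipped sign at runtime points at the first byte being overwritten in the stream-owned buffer, not at the decoding logic itself.

{code:cpp}
#include <cstdint>
#include <cstdio>

// Sketch of Hadoop-style variable-length int decoding; not Impala's code.
static bool IsNegativeVInt(int8_t byte) {
  return byte < -120 || (byte >= -112 && byte < 0);
}

static int DecodeVIntSize(int8_t byte) {
  if (byte >= -112) return 1;           // value stored inline in the first byte
  if (byte < -120) return -119 - byte;  // negative value, 2..9 bytes total
  return -111 - byte;                   // positive value, 2..9 bytes total
}

int main() {
  // 0x8e is -114 as a signed byte: a positive multi-byte VInt, 3 bytes total.
  int8_t first_byte = static_cast<int8_t>(0x8e);
  printf("negative=%d size=%d\n", IsNegativeVInt(first_byte),
         DecodeVIntSize(first_byte));
  return 0;
}
{code}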

> Spurious error generated by sequence file scanner with weird scan range length
> --
>
> Key: IMPALA-7363
> URL: https://issues.apache.org/jira/browse/IMPALA-7363
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Tim Armstrong
>Assignee: Pooja Nilangekar
>Priority: Major
>  Labels: avro
>
> Repro on master
> {noformat}
> tarmstrong@tarmstrong-box:~/Impala/incubator-impala$ impala-shell.sh
> Starting Impala Shell without Kerberos authentication
> Connected to localhost:21000
> Server version: impalad version 3.1.0-SNAPSHOT DEBUG (build 
> cec33fa0ae75392668273d40b5a1bc4bbd7e9e2e)
> ***
> Welcome to the Impala shell.
> (Impala Shell v3.1.0-SNAPSHOT (cec33fa) built on Thu Jul 26 09:50:10 PDT 2018)
> To see a summary of a query's progress that updates in real-time, run 'set
> LIVE_PROGRESS=1;'.
> ***
> [localhost:21000] default> use tpch_seq_snap;
> Query: use tpch_seq_snap
> [localhost:21000] tpch_seq_snap> SET max_scan_range_length=5377;
> MAX_SCAN_RANGE_LENGTH set to 5377
> [localhost:21000] tpch_seq_snap> select count(*)
>> from lineitem;
> Query: select count(*)
> from lineitem
> Query submitted at: 2018-07-26 14:10:18 (Coordinator: 
> http://tarmstrong-box:25000)
> Query progress can be monitored at: 
> http://tarmstrong-box:25000/query_plan?query_id=e9428efe173ad2f4:84b66bdb
> +--+
> | count(*) |
> +--+
> | 5993651  |
> +--+
> WARNINGS: SkipText: length is negative
> Problem parsing file 
> hdfs://localhost:20500/test-warehouse/tpch.lineitem_seq_snap/00_0 at 
> 36472193
> {noformat}
> Found while adding a test for IMPALA-7360



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-7363) Spurious error generated by sequence file scanner with weird scan range length

2018-10-30 Thread Pooja Nilangekar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16669233#comment-16669233
 ] 

Pooja Nilangekar edited comment on IMPALA-7363 at 10/30/18 8:41 PM:


This seems to be a non-deterministic bug, executing the same query multiple 
times produces different results. A stream on the same file, at the same 
file_offset() and same bytes_left() value, (i.e., These two streams are exactly 
identical locations) return different long values. In case of the error, the 
ScannerContext::ReadVLong() function returns -10434 while in the non-error case 
it returns 10433. However, the bytes are the exact same in each case. 0x8e for 
the firstbyte and the value is always 10433. -The bug is somewhere in the 
ReadWriteUtil::IsNegativeVInt(). Not sure how "return byte < -120 || (byte >= 
-112 && byte < 0);" can be non-deterministic. Ideally, it should always return 
false for byte = -114 (0x8e). -

So it looks like the firstbyte value is overwritten during subsequent calls to 
GetBytes because the output buffer is owned by the stream and can be cleared to 
read more bytes. 


was (Author: poojanilangekar):
This seems to be a non-deterministic bug, executing the same query multiple 
times produces different results. A stream on the same file, at the same 
file_offset() and same bytes_left() value, (i.e., These two streams are exactly 
identical locations) return different long values. In case of the error, the 
ScannerContext::ReadVLong() function returns -10434 while in the non-error case 
it returns 10433. However, the bytes are the exact same in each case. 0x8e for 
the firstbyte and the value is always 10433. The bug is somewhere in the 
ReadWriteUtil::IsNegativeVInt(). Not sure how "return byte < -120 || (byte >= 
-112 && byte < 0);" can be non-deterministic. Ideally, it should always return 
false for byte = -114 (0x8e). 

[~tarmstrong] Do you have any idea what might be going on here?


> Spurious error generated by sequence file scanner with weird scan range length
> --
>
> Key: IMPALA-7363
> URL: https://issues.apache.org/jira/browse/IMPALA-7363
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Tim Armstrong
>Assignee: Pooja Nilangekar
>Priority: Major
>  Labels: avro
>
> Repro on master
> {noformat}
> tarmstrong@tarmstrong-box:~/Impala/incubator-impala$ impala-shell.sh
> Starting Impala Shell without Kerberos authentication
> Connected to localhost:21000
> Server version: impalad version 3.1.0-SNAPSHOT DEBUG (build 
> cec33fa0ae75392668273d40b5a1bc4bbd7e9e2e)
> ***
> Welcome to the Impala shell.
> (Impala Shell v3.1.0-SNAPSHOT (cec33fa) built on Thu Jul 26 09:50:10 PDT 2018)
> To see a summary of a query's progress that updates in real-time, run 'set
> LIVE_PROGRESS=1;'.
> ***
> [localhost:21000] default> use tpch_seq_snap;
> Query: use tpch_seq_snap
> [localhost:21000] tpch_seq_snap> SET max_scan_range_length=5377;
> MAX_SCAN_RANGE_LENGTH set to 5377
> [localhost:21000] tpch_seq_snap> select count(*)
>> from lineitem;
> Query: select count(*)
> from lineitem
> Query submitted at: 2018-07-26 14:10:18 (Coordinator: 
> http://tarmstrong-box:25000)
> Query progress can be monitored at: 
> http://tarmstrong-box:25000/query_plan?query_id=e9428efe173ad2f4:84b66bdb
> +--+
> | count(*) |
> +--+
> | 5993651  |
> +--+
> WARNINGS: SkipText: length is negative
> Problem parsing file 
> hdfs://localhost:20500/test-warehouse/tpch.lineitem_seq_snap/00_0 at 
> 36472193
> {noformat}
> Found while adding a test for IMPALA-7360



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7363) Spurious error generated by sequence file scanner with weird scan range length

2018-10-30 Thread Pooja Nilangekar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16669233#comment-16669233
 ] 

Pooja Nilangekar commented on IMPALA-7363:
--

This seems to be a non-deterministic bug, executing the same query multiple 
times produces different results. A stream on the same file, at the same 
file_offset() and same bytes_left() value, (i.e., These two streams are exactly 
identical locations) return different long values. In case of the error, the 
ScannerContext::ReadVLong() function returns -10434 while in the non-error case 
it returns 10433. However, the bytes are the exact same in each case. 0x8e for 
the firstbyte and the value is always 10433. The bug is somewhere in the 
ReadWriteUtil::IsNegativeVInt(). Not sure how "return byte < -120 || (byte >= 
-112 && byte < 0);" can be non-deterministic. Ideally, it should always return 
false for byte = -114 (0x8e). 

[~tarmstrong] Do you have any idea what might be going on here?


> Spurious error generated by sequence file scanner with weird scan range length
> --
>
> Key: IMPALA-7363
> URL: https://issues.apache.org/jira/browse/IMPALA-7363
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Tim Armstrong
>Assignee: Pooja Nilangekar
>Priority: Major
>  Labels: avro
>
> Repro on master
> {noformat}
> tarmstrong@tarmstrong-box:~/Impala/incubator-impala$ impala-shell.sh
> Starting Impala Shell without Kerberos authentication
> Connected to localhost:21000
> Server version: impalad version 3.1.0-SNAPSHOT DEBUG (build 
> cec33fa0ae75392668273d40b5a1bc4bbd7e9e2e)
> ***
> Welcome to the Impala shell.
> (Impala Shell v3.1.0-SNAPSHOT (cec33fa) built on Thu Jul 26 09:50:10 PDT 2018)
> To see a summary of a query's progress that updates in real-time, run 'set
> LIVE_PROGRESS=1;'.
> ***
> [localhost:21000] default> use tpch_seq_snap;
> Query: use tpch_seq_snap
> [localhost:21000] tpch_seq_snap> SET max_scan_range_length=5377;
> MAX_SCAN_RANGE_LENGTH set to 5377
> [localhost:21000] tpch_seq_snap> select count(*)
>> from lineitem;
> Query: select count(*)
> from lineitem
> Query submitted at: 2018-07-26 14:10:18 (Coordinator: 
> http://tarmstrong-box:25000)
> Query progress can be monitored at: 
> http://tarmstrong-box:25000/query_plan?query_id=e9428efe173ad2f4:84b66bdb
> +--+
> | count(*) |
> +--+
> | 5993651  |
> +--+
> WARNINGS: SkipText: length is negative
> Problem parsing file 
> hdfs://localhost:20500/test-warehouse/tpch.lineitem_seq_snap/00_0 at 
> 36472193
> {noformat}
> Found while adding a test for IMPALA-7360



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-7749) Merge aggregation node memory estimate is incorrectly influenced by limit

2018-10-29 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-7749.
--
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> Merge aggregation node memory estimate is incorrectly influenced by limit
> -
>
> Key: IMPALA-7749
> URL: https://issues.apache.org/jira/browse/IMPALA-7749
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Affects Versions: Impala 2.11.0, Impala 3.0, Impala 2.12.0, Impala 3.1.0
>Reporter: Tim Armstrong
>Assignee: Pooja Nilangekar
>Priority: Critical
> Fix For: Impala 3.1.0
>
>
> In the below query the estimate for node ID 3 is too low. If you remove the 
> limit it is correct. 
> {noformat}
> [localhost:21000] default> set explain_level=2; explain select l_orderkey, 
> l_partkey, l_linenumber, count(*) from tpch.lineitem group by 1, 2, 3 limit 5;
> EXPLAIN_LEVEL set to 2
> Query: explain select l_orderkey, l_partkey, l_linenumber, count(*) from 
> tpch.lineitem group by 1, 2, 3 limit 5
> Max Per-Host Resource Reservation: Memory=43.94MB Threads=4
> Per-Host Resource Estimates: Memory=450MB
> 
> F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
> |  Per-Host Resources: mem-estimate=0B mem-reservation=0B thread-reservation=1
> PLAN-ROOT SINK
> |  mem-estimate=0B mem-reservation=0B thread-reservation=0
> |
> 04:EXCHANGE [UNPARTITIONED]
> |  limit: 5
> |  mem-estimate=0B mem-reservation=0B thread-reservation=0
> |  tuple-ids=1 row-size=28B cardinality=5
> |  in pipelines: 03(GETNEXT)
> |
> F01:PLAN FRAGMENT [HASH(l_orderkey,l_partkey,l_linenumber)] hosts=3 instances=3
> Per-Host Resources: mem-estimate=10.00MB mem-reservation=1.94MB thread-reservation=1
> 03:AGGREGATE [FINALIZE]
> |  output: count:merge(*)
> |  group by: l_orderkey, l_partkey, l_linenumber
> |  limit: 5
> |  mem-estimate=10.00MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0
> |  tuple-ids=1 row-size=28B cardinality=5
> |  in pipelines: 03(GETNEXT), 00(OPEN)
> |
> 02:EXCHANGE [HASH(l_orderkey,l_partkey,l_linenumber)]
> |  mem-estimate=0B mem-reservation=0B thread-reservation=0
> |  tuple-ids=1 row-size=28B cardinality=6001215
> |  in pipelines: 00(GETNEXT)
> |
> F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
> Per-Host Resources: mem-estimate=440.27MB mem-reservation=42.00MB thread-reservation=2
> 01:AGGREGATE [STREAMING]
> |  output: count(*)
> |  group by: l_orderkey, l_partkey, l_linenumber
> |  mem-estimate=176.27MB mem-reservation=34.00MB spill-buffer=2.00MB
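A hedged C++ sketch of the estimation mistake in the plan above (the real logic lives in Impala's Java frontend and is more involved): if the LIMIT is applied to the cardinality before the merge aggregation is sized, the node looks nearly free, even though its hash table still has to hold every group because the limit only caps the rows returned.

{code:cpp}
#include <algorithm>
#include <cstdint>

// Sketch only, not Impala's planner. Row size and group count are taken from
// the plan above (28-byte rows, ~6M distinct groups).
int64_t AggMemEstimate(int64_t num_groups, int64_t row_size_bytes) {
  return num_groups * row_size_bytes;  // the hash table holds every group
}

int main() {
  const int64_t num_groups = 6001215;
  const int64_t row_size_bytes = 28;
  const int64_t limit = 5;

  // Wrong: cardinality already capped by the limit, only a few hundred bytes.
  int64_t influenced_by_limit = AggMemEstimate(std::min(num_groups, limit), row_size_bytes);
  // Right: sized from the pre-limit group count, roughly 160 MiB.
  int64_t correct = AggMemEstimate(num_groups, row_size_bytes);
  return influenced_by_limit < correct ? 0 : 1;
}
{code}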


[jira] [Commented] (IMPALA-7780) Rebase PlannerTest expected output for estimates, errors

2018-10-29 Thread Pooja Nilangekar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667720#comment-16667720
 ] 

Pooja Nilangekar commented on IMPALA-7780:
--

I have run into the issue of differing estimates before. After asking around and 
poking through the data load, I found that the file sizes, and hence the estimates, 
can vary between different runs of data loading: you might end up using slightly 
more or less disk space to load the exact same data on the same machine or on 
another machine with identical specs. So I am not sure a rebase would actually 
solve this issue. 

> Rebase PlannerTest expected output for estimates, errors
> 
>
> Key: IMPALA-7780
> URL: https://issues.apache.org/jira/browse/IMPALA-7780
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Paul Rogers
>Priority: Trivial
>
> The front-end includes the {{PlannerTest}} test which works by running a 
> query, writing the plan to a file, comparing selected parts of the file to 
> expected results, and flagging if the results differ.
> A plan includes some things we test (operators) and some we do not (text of 
> error messages, value of memory estimates). Over time the expected and actual 
> files have drifted apart. Example:
> {noformat}
> Expected:partitions=1/1 files=2 size=54.20MB
> Actual:  partitions=1/1 files=2 size=54.21MB
> {noformat}
> While the tests still pass (because we ignore the parts which have drifted), 
> it is a pain to track down issues because we must learn to manually ignore 
> "unimportant" differences.
> This ticket asks to "rebase" planner tests on the latest results, copying 
> into the expected results file the current "noise" values from the actual 
> results.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-7749) Merge aggregation node memory estimate is incorrectly influenced by limit

2018-10-24 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar reassigned IMPALA-7749:


Assignee: Pooja Nilangekar  (was: Bikramjeet Vig)

> Merge aggregation node memory estimate is incorrectly influenced by limit
> -
>
> Key: IMPALA-7749
> URL: https://issues.apache.org/jira/browse/IMPALA-7749
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Affects Versions: Impala 2.11.0, Impala 3.0, Impala 2.12.0, Impala 3.1.0
>Reporter: Tim Armstrong
>Assignee: Pooja Nilangekar
>Priority: Critical
>
> In the below query the estimate for node ID 3 is too low. If you remove the 
> limit it is correct. 
> {noformat}
> [localhost:21000] default> set explain_level=2; explain select l_orderkey, 
> l_partkey, l_linenumber, count(*) from tpch.lineitem group by 1, 2, 3 limit 5;
> EXPLAIN_LEVEL set to 2
> Query: explain select l_orderkey, l_partkey, l_linenumber, count(*) from 
> tpch.lineitem group by 1, 2, 3 limit 5
> Max Per-Host Resource Reservation: Memory=43.94MB Threads=4
> Per-Host Resource Estimates: Memory=450MB
> 
> F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
> |  Per-Host Resources: mem-estimate=0B mem-reservation=0B thread-reservation=1
> PLAN-ROOT SINK
> |  mem-estimate=0B mem-reservation=0B thread-reservation=0
> |
> 04:EXCHANGE [UNPARTITIONED]
> |  limit: 5
> |  mem-estimate=0B mem-reservation=0B thread-reservation=0
> |  tuple-ids=1 row-size=28B cardinality=5
> |  in pipelines: 03(GETNEXT)
> |
> F01:PLAN FRAGMENT [HASH(l_orderkey,l_partkey,l_linenumber)] hosts=3 instances=3
> Per-Host Resources: mem-estimate=10.00MB mem-reservation=1.94MB thread-reservation=1
> 03:AGGREGATE [FINALIZE]
> |  output: count:merge(*)
> |  group by: l_orderkey, l_partkey, l_linenumber
> |  limit: 5
> |  mem-estimate=10.00MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0
> |  tuple-ids=1 row-size=28B cardinality=5
> |  in pipelines: 03(GETNEXT), 00(OPEN)
> |
> 02:EXCHANGE [HASH(l_orderkey,l_partkey,l_linenumber)]
> |  mem-estimate=0B mem-reservation=0B thread-reservation=0
> |  tuple-ids=1 row-size=28B cardinality=6001215
> |  in pipelines: 00(GETNEXT)
> |
> F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
> Per-Host Resources: mem-estimate=440.27MB mem-reservation=42.00MB thread-reservation=2
> 01:AGGREGATE [STREAMING]
> |  output: count(*)
> |  group by: l_orderkey, l_partkey, l_linenumber
> |  mem-estimate=176.27MB mem-reservation=34.00MB spill-buffer=2.00MB thread-reservation=0
> |

[jira] [Resolved] (IMPALA-7545) Add admission control status to query log

2018-10-17 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-7545.
--
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> Add admission control status to query log
> -
>
> Key: IMPALA-7545
> URL: https://issues.apache.org/jira/browse/IMPALA-7545
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Pooja Nilangekar
>Priority: Critical
>  Labels: admission-control, observability
> Fix For: Impala 3.1.0
>
>
> We already include the query progress in the HS2 GetLog() response (although 
> for some reason we don't do the same for beeswax) so we should include 
> admission control progress. We should definitely include it if the query is 
> currently queued; it's probably too noisy to include once the query has been 
> admitted.
> We should also do the same for beeswax/impala-shell so that 
> live_progress/live_summary is useful if the query is queued. We should look 
> at the live_progress/live_summary mechanisms and extend those to include the 
> required information to report admission control state.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (IMPALA-7699) TestSpillingNoDebugActionDimensions fails earlier than expected

2018-10-12 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar reassigned IMPALA-7699:


Assignee: Bikramjeet Vig  (was: Tim Armstrong)

> TestSpillingNoDebugActionDimensions fails earlier than expected 
> 
>
> Key: IMPALA-7699
> URL: https://issues.apache.org/jira/browse/IMPALA-7699
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Pooja Nilangekar
>Assignee: Bikramjeet Vig
>Priority: Critical
>  Labels: broken-build
>
> In some of the recent runs, the query fails due to insufficient memory; however, 
> the failure is in the HDFS scan node rather than the hash join node. Here are 
> the corresponding logs: 
> Stacktrace:
> {code:java}
> query_test/test_spilling.py:113: in test_spilling_no_debug_action 
> self.run_test_case('QueryTest/spilling-no-debug-action', vector) 
> common/impala_test_suite.py:466: in run_test_case 
> self.__verify_exceptions(test_section['CATCH'], str(e), use_db) 
> common/impala_test_suite.py:319: in __verify_exceptions (expected_str, 
> actual_str) E AssertionError: Unexpected exception string. Expected: 
> row_regex:.*Cannot perform hash join at node with id .*. Repartitioning did 
> not reduce the size of a spilled partition.* E Not found in actual: 
> ImpalaBeeswaxException: Query aborted:Memory limit exceeded: Failed to 
> allocate tuple bufferHDFS_SCAN_NODE (id=1) could not allocate 190.00 KB 
> without exceeding limit.Error occurred on backend localhost:22001 by fragment 
> 2e4f0f944d373848:9ae1d7e20002
> {code}
>  
> Here are the impalad logs: 
> {code:java}
> I1010 18:31:30.721693  7270 coordinator.cc:498] ExecState: query 
> id=2e4f0f944d373848:9ae1d7e2 
> finstance=2e4f0f944d373848:9ae1d7e20002 on host=localhost:22001 
> (EXECUTING -> ERROR) status=Memory limit exceeded: Failed to allocate tuple 
> buffer
> HDFS_SCAN_NODE (id=1) could not allocate 190.00 KB without exceeding limit.
> Error occurred on backend localhost:22001 by fragment 
> 2e4f0f944d373848:9ae1d7e20002
> Memory left in process limit: 9.19 GB
> Memory left in query limit: 157.62 KB
> Query(2e4f0f944d373848:9ae1d7e2): Limit=150.00 MB Reservation=117.25 
> MB ReservationLimit=118.00 MB OtherMemory=32.60 MB Total=149.85 MB 
> Peak=149.85 MB
>   Unclaimed reservations: Reservation=5.75 MB OtherMemory=0 Total=5.75 MB 
> Peak=55.75 MB
>   Fragment 2e4f0f944d373848:9ae1d7e20003: Reservation=2.00 MB 
> OtherMemory=22.20 MB Total=24.20 MB Peak=24.20 MB
> Runtime Filter Bank: Reservation=2.00 MB ReservationLimit=2.00 MB 
> OtherMemory=0 Total=2.00 MB Peak=2.00 MB
> SORT_NODE (id=3): Total=0 Peak=0
> HASH_JOIN_NODE (id=2): Total=42.25 KB Peak=42.25 KB
>   Exprs: Total=13.12 KB Peak=13.12 KB
>   Hash Join Builder (join_node_id=2): Total=13.12 KB Peak=13.12 KB
> Hash Join Builder (join_node_id=2) Exprs: Total=13.12 KB Peak=13.12 KB
> HDFS_SCAN_NODE (id=0): Total=0 Peak=0
> EXCHANGE_NODE (id=4): Reservation=18.79 MB OtherMemory=235.89 KB 
> Total=19.02 MB Peak=19.02 MB
>   KrpcDeferredRpcs: Total=235.89 KB Peak=235.89 KB
> KrpcDataStreamSender (dst_id=5): Total=480.00 B Peak=480.00 B
> CodeGen: Total=3.13 MB Peak=3.13 MB
>   Fragment 2e4f0f944d373848:9ae1d7e20002: Reservation=109.50 MB 
> OtherMemory=10.39 MB Total=119.89 MB Peak=119.89 MB
> HDFS_SCAN_NODE (id=1): Reservation=109.50 MB OtherMemory=10.20 MB 
> Total=119.70 MB Peak=119.70 MB
>   Queued Batches: Total=6.12 MB Peak=6.12 MB
> KrpcDataStreamSender (dst_id=4): Total=688.00 B Peak=688.00 B
> CodeGen: Total=488.00 B Peak=51.00 KB
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-7700) test_shell_commandline.TestImpalaShell.test_cancellation failure

2018-10-11 Thread Pooja Nilangekar (JIRA)
Pooja Nilangekar created IMPALA-7700:


 Summary: test_shell_commandline.TestImpalaShell.test_cancellation  
 failure
 Key: IMPALA-7700
 URL: https://issues.apache.org/jira/browse/IMPALA-7700
 Project: IMPALA
  Issue Type: Bug
Reporter: Pooja Nilangekar
Assignee: Thomas Tauber-Marshall


The query is not getting cancelled as expected. Here are the logs: 


{code:java}
Error Message

/data/jenkins/workspace/impala-cdh6.0.x-core/repos/Impala/tests/shell/test_shell_commandline.py:328:
 in test_cancellation result = p.get_result() shell/util.py:154: in 
get_result result.stdout, result.stderr = 
self.shell_process.communicate(input=stdin_input) 
/usr/lib64/python2.7/subprocess.py:800: in communicate return 
self._communicate(input) /usr/lib64/python2.7/subprocess.py:1401: in 
_communicate stdout, stderr = self._communicate_with_poll(input) 
/usr/lib64/python2.7/subprocess.py:1455: in _communicate_with_poll ready = 
poller.poll() E   Failed: Timeout >7200s
{code}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-7697) Query gets erased before reporting ExecSummary

2018-10-11 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar reassigned IMPALA-7697:


Assignee: Pooja Nilangekar

> Query gets erased before reporting ExecSummary
> --
>
> Key: IMPALA-7697
> URL: https://issues.apache.org/jira/browse/IMPALA-7697
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Pooja Nilangekar
>Assignee: Pooja Nilangekar
>Priority: Blocker
>  Labels: broken-build
>
> In a recent build, certain queries went missing from the 
> ImpalaServer::query_log_index_ before the ImpalaServer::GetExecSummary 
> function was invoked. Hence the test case failed. An easy (intermediate) fix 
> would be to increase FLAGS_query_log_size. However, ideally the query 
> shouldn't get erased before the ExecSummary has been reported to the client 
> via the beeswax/hs2 servers. 
> Here are the test logs:
> {code:java}
> Error Message
> query_test/test_resource_limits.py:45: in test_resource_limits 
> self.run_test_case('QueryTest/query-resource-limits', vector) 
> common/impala_test_suite.py:478: in run_test_case assert False, "Expected 
> exception: %s" % expected_str E   AssertionError: Expected exception: 
> row_regex:.*expired due to execution time limit of 2s000ms.*
> Stacktrace
> query_test/test_resource_limits.py:45: in test_resource_limits
> self.run_test_case('QueryTest/query-resource-limits', vector)
> common/impala_test_suite.py:478: in run_test_case
> assert False, "Expected exception: %s" % expected_str
> E   AssertionError: Expected exception: row_regex:.*expired due to execution 
> time limit of 2s000ms.*
> Standard Error
> -- executing against localhost:21000
> SET SCAN_BYTES_LIMIT="0";
> -- 2018-10-10 22:38:29,826 INFO MainThread: Started query 
> 8e45a13bc999749e:58175e16
> {code}
> Here are the impalad logs: 
> {code:java}
> impalad.INFO.20181010-191824.5460:I1010 22:38:29.825745 31580 
> impala-server.cc:1060] Registered query 
> query_id=8e45a13bc999749e:58175e16 
> session_id=43434de5f83010f9:7e0750ad7ad86b80
> impalad.INFO.20181010-191824.5460:I1010 22:38:29.826026 31580 
> impala-server.cc:1115] Query 8e45a13bc999749e:58175e16 has scan bytes 
> limit of 100.00 GB
> impalad.INFO.20181010-191824.5460:I1010 22:38:29.826328 31580 
> impala-beeswax-server.cc:197] get_results_metadata(): 
> query_id=8e45a13bc999749e:58175e16
> impalad..INFO.20181010-191824.5460:I1010 22:38:29.826584 31580 
> impala-server.cc:776] Query id 8e45a13bc999749e:58175e16 not found.
> impalad.INFO.20181010-191824.5460:I1010 22:38:29.826858 31580 
> impala-beeswax-server.cc:239] close(): 
> query_id=8e45a13bc999749e:58175e16
> impalad.INFO.20181010-191824.5460:I1010 22:38:29.826861 31580 
> impala-server.cc:1127] UnregisterQuery(): 
> query_id=8e45a13bc999749e:58175e16
> impalad.INFO.20181010-191824.5460:I1010 22:38:29.826864 31580 
> impala-server.cc:1238] Cancel(): query_id=8e45a13bc999749e:58175e16
> {code}
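A toy C++ sketch of the failure mode, assuming a simple bounded, evict-oldest-on-insert query log (this is not ImpalaServer's actual data structure; FLAGS_query_log_size is the flag mentioned above): once the log reaches capacity, registering new queries can evict a record before the client ever fetches its exec summary.

{code:cpp}
#include <cassert>
#include <deque>
#include <string>
#include <unordered_map>

// Toy sketch, not ImpalaServer: a bounded query log that evicts the oldest
// record on insert once it reaches its capacity.
class QueryLog {
 public:
  explicit QueryLog(size_t capacity) : capacity_(capacity) {}

  void Add(const std::string& query_id, const std::string& summary) {
    if (order_.size() == capacity_) {
      entries_.erase(order_.front());
      order_.pop_front();
    }
    order_.push_back(query_id);
    entries_[query_id] = summary;
  }

  // Returns nullptr if the record was already evicted, which is what the
  // failed lookup in the logs above corresponds to.
  const std::string* Find(const std::string& query_id) const {
    auto it = entries_.find(query_id);
    return it == entries_.end() ? nullptr : &it->second;
  }

 private:
  size_t capacity_;
  std::deque<std::string> order_;
  std::unordered_map<std::string, std::string> entries_;
};

int main() {
  QueryLog log(2);
  log.Add("q1", "summary1");
  log.Add("q2", "summary2");
  log.Add("q3", "summary3");          // evicts q1
  assert(log.Find("q1") == nullptr);  // q1's exec summary is already gone
  return 0;
}
{code}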



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-7697) Query gets erased before reporting ExecSummary

2018-10-11 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar reassigned IMPALA-7697:


Assignee: Bikramjeet Vig  (was: Pooja Nilangekar)

> Query gets erased before reporting ExecSummary
> --
>
> Key: IMPALA-7697
> URL: https://issues.apache.org/jira/browse/IMPALA-7697
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Pooja Nilangekar
>Assignee: Bikramjeet Vig
>Priority: Blocker
>  Labels: broken-build
>
> In a recent build, certain queries went missing from the 
> ImpalaServer::query_log_index_ before the ImpalaServer::GetExecSummary 
> function was invoked. Hence the test case failed. An easy (intermediate) fix 
> would be to increase FLAGS_query_log_size. However, ideally the query 
> shouldn't get erased before the ExecSummary has been reported to the client 
> via the beeswax/hs2 servers. 
> Here are the test logs:
> {code:java}
> Error Message
> query_test/test_resource_limits.py:45: in test_resource_limits 
> self.run_test_case('QueryTest/query-resource-limits', vector) 
> common/impala_test_suite.py:478: in run_test_case assert False, "Expected 
> exception: %s" % expected_str E   AssertionError: Expected exception: 
> row_regex:.*expired due to execution time limit of 2s000ms.*
> Stacktrace
> query_test/test_resource_limits.py:45: in test_resource_limits
> self.run_test_case('QueryTest/query-resource-limits', vector)
> common/impala_test_suite.py:478: in run_test_case
> assert False, "Expected exception: %s" % expected_str
> E   AssertionError: Expected exception: row_regex:.*expired due to execution 
> time limit of 2s000ms.*
> Standard Error
> -- executing against localhost:21000
> SET SCAN_BYTES_LIMIT="0";
> -- 2018-10-10 22:38:29,826 INFO MainThread: Started query 
> 8e45a13bc999749e:58175e16
> {code}
> Here are the impalad logs: 
> {code:java}
> impalad.INFO.20181010-191824.5460:I1010 22:38:29.825745 31580 
> impala-server.cc:1060] Registered query 
> query_id=8e45a13bc999749e:58175e16 
> session_id=43434de5f83010f9:7e0750ad7ad86b80
> impalad.INFO.20181010-191824.5460:I1010 22:38:29.826026 31580 
> impala-server.cc:1115] Query 8e45a13bc999749e:58175e16 has scan bytes 
> limit of 100.00 GB
> impalad.INFO.20181010-191824.5460:I1010 22:38:29.826328 31580 
> impala-beeswax-server.cc:197] get_results_metadata(): 
> query_id=8e45a13bc999749e:58175e16
> impalad..INFO.20181010-191824.5460:I1010 22:38:29.826584 31580 
> impala-server.cc:776] Query id 8e45a13bc999749e:58175e16 not found.
> impalad.INFO.20181010-191824.5460:I1010 22:38:29.826858 31580 
> impala-beeswax-server.cc:239] close(): 
> query_id=8e45a13bc999749e:58175e16
> impalad.INFO.20181010-191824.5460:I1010 22:38:29.826861 31580 
> impala-server.cc:1127] UnregisterQuery(): 
> query_id=8e45a13bc999749e:58175e16
> impalad.INFO.20181010-191824.5460:I1010 22:38:29.826864 31580 
> impala-server.cc:1238] Cancel(): query_id=8e45a13bc999749e:58175e16
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-7699) TestSpillingNoDebugActionDimensions fails earlier than expected

2018-10-11 Thread Pooja Nilangekar (JIRA)
Pooja Nilangekar created IMPALA-7699:


 Summary: TestSpillingNoDebugActionDimensions fails earlier than 
expected 
 Key: IMPALA-7699
 URL: https://issues.apache.org/jira/browse/IMPALA-7699
 Project: IMPALA
  Issue Type: Bug
Reporter: Pooja Nilangekar
Assignee: Tim Armstrong


In some of the recent runs, the query fails due to insufficient memory; however, 
the failure occurs in the HDFS scan node rather than the hash join node. Here are 
the corresponding logs: 

Stacktrace:
{code:java}
query_test/test_spilling.py:113: in test_spilling_no_debug_action 
self.run_test_case('QueryTest/spilling-no-debug-action', vector) 
common/impala_test_suite.py:466: in run_test_case 
self.__verify_exceptions(test_section['CATCH'], str(e), use_db) 
common/impala_test_suite.py:319: in __verify_exceptions (expected_str, 
actual_str) E AssertionError: Unexpected exception string. Expected: 
row_regex:.*Cannot perform hash join at node with id .*. Repartitioning did not 
reduce the size of a spilled partition.* E Not found in actual: 
ImpalaBeeswaxException: Query aborted:Memory limit exceeded: Failed to allocate 
tuple bufferHDFS_SCAN_NODE (id=1) could not allocate 190.00 KB without 
exceeding limit.Error occurred on backend localhost:22001 by fragment 
2e4f0f944d373848:9ae1d7e20002
{code}
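
The expected CATCH pattern simply does not match the error that was actually raised. 
A rough illustration of the mismatch (plain Python re, not the harness's actual 
__verify_exceptions logic; strings abbreviated from the output above):

{code:python}
import re

expected = (r"Cannot perform hash join at node with id .*. "
            r"Repartitioning did not reduce the size of a spilled partition.*")
actual = ("Memory limit exceeded: Failed to allocate tuple buffer "
          "HDFS_SCAN_NODE (id=1) could not allocate 190.00 KB without exceeding limit.")

# None -> the CATCH assertion in the test fails with "Unexpected exception string"
print(re.search(expected, actual))
{code}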
 

Here are the impalad logs: 
{code:java}
I1010 18:31:30.721693  7270 coordinator.cc:498] ExecState: query 
id=2e4f0f944d373848:9ae1d7e2 
finstance=2e4f0f944d373848:9ae1d7e20002 on host=localhost:22001 (EXECUTING 
-> ERROR) status=Memory limit exceeded: Failed to allocate tuple buffer
HDFS_SCAN_NODE (id=1) could not allocate 190.00 KB without exceeding limit.
Error occurred on backend localhost:22001 by fragment 
2e4f0f944d373848:9ae1d7e20002
Memory left in process limit: 9.19 GB
Memory left in query limit: 157.62 KB
Query(2e4f0f944d373848:9ae1d7e2): Limit=150.00 MB Reservation=117.25 MB 
ReservationLimit=118.00 MB OtherMemory=32.60 MB Total=149.85 MB Peak=149.85 MB
  Unclaimed reservations: Reservation=5.75 MB OtherMemory=0 Total=5.75 MB 
Peak=55.75 MB
  Fragment 2e4f0f944d373848:9ae1d7e20003: Reservation=2.00 MB 
OtherMemory=22.20 MB Total=24.20 MB Peak=24.20 MB
Runtime Filter Bank: Reservation=2.00 MB ReservationLimit=2.00 MB 
OtherMemory=0 Total=2.00 MB Peak=2.00 MB
SORT_NODE (id=3): Total=0 Peak=0
HASH_JOIN_NODE (id=2): Total=42.25 KB Peak=42.25 KB
  Exprs: Total=13.12 KB Peak=13.12 KB
  Hash Join Builder (join_node_id=2): Total=13.12 KB Peak=13.12 KB
Hash Join Builder (join_node_id=2) Exprs: Total=13.12 KB Peak=13.12 KB
HDFS_SCAN_NODE (id=0): Total=0 Peak=0
EXCHANGE_NODE (id=4): Reservation=18.79 MB OtherMemory=235.89 KB 
Total=19.02 MB Peak=19.02 MB
  KrpcDeferredRpcs: Total=235.89 KB Peak=235.89 KB
KrpcDataStreamSender (dst_id=5): Total=480.00 B Peak=480.00 B
CodeGen: Total=3.13 MB Peak=3.13 MB
  Fragment 2e4f0f944d373848:9ae1d7e20002: Reservation=109.50 MB 
OtherMemory=10.39 MB Total=119.89 MB Peak=119.89 MB
HDFS_SCAN_NODE (id=1): Reservation=109.50 MB OtherMemory=10.20 MB 
Total=119.70 MB Peak=119.70 MB
  Queued Batches: Total=6.12 MB Peak=6.12 MB
KrpcDataStreamSender (dst_id=4): Total=688.00 B Peak=688.00 B
CodeGen: Total=488.00 B Peak=51.00 KB
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-7697) Query gets erased before reporting ExecSummary

2018-10-11 Thread Pooja Nilangekar (JIRA)
Pooja Nilangekar created IMPALA-7697:


 Summary: Query gets erased before reporting ExecSummary
 Key: IMPALA-7697
 URL: https://issues.apache.org/jira/browse/IMPALA-7697
 Project: IMPALA
  Issue Type: Bug
Reporter: Pooja Nilangekar


In a recent build, certain queries went missing from the 
ImpalaServer::query_log_index_ before the ImpalaServer::GetExecSummary function 
was invoked. Hence the test case failed. An easy (interim) fix would be to 
increase FLAGS_query_log_size. However, ideally the query shouldn't get erased 
before the ExecSummary has been reported to the client via the beeswax/hs2 
servers. 
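
To illustrate the race (a minimal Python sketch of a size-bounded query log, not 
ImpalaServer's actual C++ data structure; the class and method names are hypothetical), 
registering new queries can evict an older entry before its summary is ever fetched:

{code:python}
from collections import OrderedDict

class ToyQueryLog(object):
    """Toy size-bounded query log: oldest entries are evicted first."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.entries = OrderedDict()  # query_id -> exec summary

    def add(self, query_id, summary):
        self.entries[query_id] = summary
        if len(self.entries) > self.max_size:
            self.entries.popitem(last=False)  # evict the oldest query

    def get_exec_summary(self, query_id):
        if query_id not in self.entries:
            raise KeyError("Query id %s not found." % query_id)
        return self.entries[query_id]
{code}

With a small maximum size, a burst of registrations can evict a query before the client 
asks for its summary, which matches the "Query id ... not found." line in the impalad 
logs below. Raising FLAGS_query_log_size only widens the window; it does not remove it.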

Here are the test logs:

{code:java}
Error Message

query_test/test_resource_limits.py:45: in test_resource_limits 
self.run_test_case('QueryTest/query-resource-limits', vector) 
common/impala_test_suite.py:478: in run_test_case assert False, "Expected 
exception: %s" % expected_str E   AssertionError: Expected exception: 
row_regex:.*expired due to execution time limit of 2s000ms.*

Stacktrace

query_test/test_resource_limits.py:45: in test_resource_limits
self.run_test_case('QueryTest/query-resource-limits', vector)
common/impala_test_suite.py:478: in run_test_case
assert False, "Expected exception: %s" % expected_str
E   AssertionError: Expected exception: row_regex:.*expired due to execution 
time limit of 2s000ms.*

Standard Error
-- executing against localhost:21000
SET SCAN_BYTES_LIMIT="0";

-- 2018-10-10 22:38:29,826 INFO MainThread: Started query 
8e45a13bc999749e:58175e16
{code}


Here are the impalad logs: 

{code:java}
impalad.INFO.20181010-191824.5460:I1010 22:38:29.825745 31580 
impala-server.cc:1060] Registered query 
query_id=8e45a13bc999749e:58175e16 
session_id=43434de5f83010f9:7e0750ad7ad86b80
impalad.INFO.20181010-191824.5460:I1010 22:38:29.826026 31580 
impala-server.cc:1115] Query 8e45a13bc999749e:58175e16 has scan bytes 
limit of 100.00 GB
impalad.INFO.20181010-191824.5460:I1010 22:38:29.826328 31580 
impala-beeswax-server.cc:197] get_results_metadata(): 
query_id=8e45a13bc999749e:58175e16
impalad..INFO.20181010-191824.5460:I1010 22:38:29.826584 31580 
impala-server.cc:776] Query id 8e45a13bc999749e:58175e16 not found.
impalad.INFO.20181010-191824.5460:I1010 22:38:29.826858 31580 
impala-beeswax-server.cc:239] close(): 
query_id=8e45a13bc999749e:58175e16
impalad.INFO.20181010-191824.5460:I1010 22:38:29.826861 31580 
impala-server.cc:1127] UnregisterQuery(): 
query_id=8e45a13bc999749e:58175e16
impalad.INFO.20181010-191824.5460:I1010 22:38:29.826864 31580 
impala-server.cc:1238] Cancel(): query_id=8e45a13bc999749e:58175e16

{code}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Created] (IMPALA-7690) TestAdmissionController.test_pool_config_change_while_queued fails on centos6

2018-10-10 Thread Pooja Nilangekar (JIRA)
Pooja Nilangekar created IMPALA-7690:


 Summary: 
TestAdmissionController.test_pool_config_change_while_queued fails on centos6
 Key: IMPALA-7690
 URL: https://issues.apache.org/jira/browse/IMPALA-7690
 Project: IMPALA
  Issue Type: Bug
Affects Versions: Impala 3.1.0
Reporter: Pooja Nilangekar
Assignee: Bikramjeet Vig


TestAdmissionController.test_pool_config_change_while_queued fails on CentOS 6 
because Python 2.6 does not support iter() on {{xml.etree.ElementTree}} elements.

 

Here are the logs from the test failure:

 
{code:java}
custom_cluster/test_admission_controller.py:767: in 
test_pool_config_change_while_queued
config.set_config_value(pool_name, config_str, 1)
common/resource_pool_config.py:43: in set_config_value
node = self.__find_xml_node(self.root, pool_name, config_str)
common/resource_pool_config.py:86: in __find_xml_node
for property in xml_root.iter('property'):
E   AttributeError: _ElementInterface instance has no attribute 'iter'
{code}
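
One possible workaround (a sketch only, with a hypothetical helper name; the actual fix 
may instead raise the Python requirement or rewrite the lookup) is to fall back to the 
older getiterator() API when iter() is unavailable:

{code:python}
def iter_properties(xml_root):
    # Element.iter() was added in Python 2.7; Python 2.6's ElementTree
    # only provides the (later deprecated) getiterator().
    if hasattr(xml_root, 'iter'):
        return xml_root.iter('property')
    return xml_root.getiterator('property')
{code}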
 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-7678) Revert IMPALA-7660

2018-10-08 Thread Pooja Nilangekar (JIRA)
Pooja Nilangekar created IMPALA-7678:


 Summary: Revert IMPALA-7660
 Key: IMPALA-7678
 URL: https://issues.apache.org/jira/browse/IMPALA-7678
 Project: IMPALA
  Issue Type: Bug
Reporter: Pooja Nilangekar


After merging IMPALA-7660, impala server starts up but start-impala-cluster.py 
can't contact the debug webpage on RHEL builds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Commented] (IMPALA-7638) Lower default timeout for connection setup

2018-09-27 Thread Pooja Nilangekar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631201#comment-16631201
 ] 

Pooja Nilangekar commented on IMPALA-7638:
--

While reading the review for IMPALA-5394 to understand the reason behind 
setting the default timeout to 5 minutes, I came across this comment: 

We would also want the client connection timeout to default to a pretty high 
number since on large clusters, we've seen Kerberos negotiations take up to a 
few minutes. I would prefer keeping the timeout to 5 minutes. It's not ideal, 
however, we would rather not see queries fail because of timed out negotiations 
vs. optimize for an even worse case of clients hung for 5 minutes (which is 
configurable by a flag if the user chooses to do so). This is the same reason 
we keep the internal connection timeout so high, since we'd rather see progress 
than a failed query due to one timed out connection. 

[~jfs]'s initial patch set the default timeout to 5 seconds, so I thought 
this comment might be useful while addressing this issue. 

> Lower default timeout for connection setup
> --
>
> Key: IMPALA-7638
> URL: https://issues.apache.org/jira/browse/IMPALA-7638
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Lars Volker
>Priority: Major
> Fix For: Impala 2.11.0
>
>
> IMPALA-5394 added the sasl_connect_tcp_timeout_ms flag with a default timeout 
> of 5 minutes. This seems too long as broken clients will prevent new clients 
> from establishing connections for this time. In addition to increasing the 
> acceptor thread pool size (IMPALA-7565) we should lower this timeout 
> considerably, e.g. to 5 seconds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-7352) HdfsTableSink doesn't take into account insert clustering

2018-09-25 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-7352.
--
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> HdfsTableSink doesn't take into account insert clustering
> -
>
> Key: IMPALA-7352
> URL: https://issues.apache.org/jira/browse/IMPALA-7352
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Reporter: Tim Armstrong
>Assignee: Pooja Nilangekar
>Priority: Major
>  Labels: resource-management
> Fix For: Impala 3.1.0
>
>
> I noticed that the code doesn't check whether the insert is clustered, which 
> would mean it only produces a single partition at a time.
> {code}
>   @Override
>   public void computeResourceProfile(TQueryOptions queryOptions) {
> HdfsTable table = (HdfsTable) targetTable_;
> // TODO: Estimate the memory requirements more accurately by partition 
> type.
> HdfsFileFormat format = table.getMajorityFormat();
> PlanNode inputNode = fragment_.getPlanRoot();
> int numInstances = fragment_.getNumInstances(queryOptions.getMt_dop());
> // Compute the per-instance number of partitions, taking the number of 
> nodes
> // and the data partition of the fragment executing this sink into 
> account.
> long numPartitionsPerInstance =
> fragment_.getPerInstanceNdv(queryOptions.getMt_dop(), 
> partitionKeyExprs_);
> if (numPartitionsPerInstance == -1) {
>   numPartitionsPerInstance = DEFAULT_NUM_PARTITIONS;
> }
> long perPartitionMemReq = getPerPartitionMemReq(format);
> long perInstanceMemEstimate;
> // The estimate is based purely on the per-partition mem req if the input 
> cardinality_
> // or the avg row size is unknown.
> if (inputNode.getCardinality() == -1 || inputNode.getAvgRowSize() == -1) {
>   perInstanceMemEstimate = numPartitionsPerInstance * perPartitionMemReq;
> } else {
>   // The per-partition estimate may be higher than the memory required to 
> buffer
>   // the entire input data.
>   long perInstanceInputCardinality =
>   Math.max(1L, inputNode.getCardinality() / numInstances);
>   long perInstanceInputBytes =
>   (long) Math.ceil(perInstanceInputCardinality * 
> inputNode.getAvgRowSize());
>   long perInstanceMemReq =
>   PlanNode.checkedMultiply(numPartitionsPerInstance, 
> perPartitionMemReq);
>   perInstanceMemEstimate = Math.min(perInstanceInputBytes, 
> perInstanceMemReq);
> }
> resourceProfile_ = ResourceProfile.noReservation(perInstanceMemEstimate);
>   }
> {code}
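
The gist of the fix (an illustrative Python restatement of the estimate quoted above, not 
the actual frontend change; the clustered_insert flag is a hypothetical stand-in for the 
planner's clustering check) is that a clustered insert keeps only one partition open at a 
time, so the partition count used in the estimate can drop to one:

{code:python}
def per_instance_mem_estimate(num_partitions_per_instance, per_partition_mem_req,
                              input_cardinality, avg_row_size, num_instances,
                              clustered_insert):
    if clustered_insert:
        num_partitions_per_instance = 1  # only one partition is open at a time
    if input_cardinality == -1 or avg_row_size == -1:
        # Unknown input size: fall back to the purely per-partition requirement.
        return num_partitions_per_instance * per_partition_mem_req
    per_instance_input_bytes = max(1, input_cardinality // num_instances) * avg_row_size
    return min(per_instance_input_bytes,
               num_partitions_per_instance * per_partition_mem_req)
{code}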



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Commented] (IMPALA-7367) Pack StringValue, CollectionValue and TimestampValue slots

2018-09-24 Thread Pooja Nilangekar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626314#comment-16626314
 ] 

Pooja Nilangekar commented on IMPALA-7367:
--

From what I understood, you're suggesting packing CollectionValue for now and 
creating JIRAs for StringValue (due to Kudu) and TimestampValue (due to 
boost). Is that correct?

In that case, we would only be getting the performance improvements from 
packing CollectionValue. (We can pack CollectionValue structs because Kudu does 
not support collection types.)

> Pack StringValue, CollectionValue and TimestampValue slots
> --
>
> Key: IMPALA-7367
> URL: https://issues.apache.org/jira/browse/IMPALA-7367
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Pooja Nilangekar
>Priority: Major
>  Labels: perfomance
> Attachments: 0001-WIP.patch
>
>
> This is a follow-on to finish up the work from IMPALA-2789. IMPALA-2789 
> didn't actually fully pack the memory layout because StringValue, 
> TimestampValue and CollectionValue still occupy 16 bytes but only have 12 
> bytes of actual data. This results in a higher memory footprint, which leads 
> to higher memory requirements and worse performance. We don't get any benefit 
> from the padding since the majority of tuples are not actually aligned in 
> memory anyway.
> I did a quick version of the change for StringValue only which improves TPC-H 
> performance.
> {noformat}
> Report Generated on 2018-07-30
> Run Description: "b5608264b4552e44eb73ded1e232a8775c3dba6b vs 
> f1e401505ac20c0400eec819b9196f7f506fb927"
> Cluster Name: UNKNOWN
> Lab Run Info: UNKNOWN
> Impala Version:  impalad version 3.1.0-SNAPSHOT RELEASE ()
> Baseline Impala Version: impalad version 3.1.0-SNAPSHOT RELEASE (2018-07-27)
> +--+---+-++++
> | Workload | File Format   | Avg (s) | Delta(Avg) | GeoMean(s) | 
> Delta(GeoMean) |
> +--+---+-++++
> | TPCH(10) | parquet / none / none | 2.69| -4.78% | 2.09   | 
> -3.11% |
> +--+---+-++++
> +--+--+---++-++++-+---+
> | Workload | Query| File Format   | Avg(s) | Base Avg(s) | 
> Delta(Avg) | StdDev(%)  | Base StdDev(%) | Num Clients | Iters |
> +--+--+---++-++++-+---+
> | TPCH(10) | TPCH-Q22 | parquet / none / none | 0.94   | 0.93|   
> +0.75%   |   3.37%|   2.84%| 1   | 30|
> | TPCH(10) | TPCH-Q13 | parquet / none / none | 3.32   | 3.32|   
> +0.13%   |   1.74%|   2.09%| 1   | 30|
> | TPCH(10) | TPCH-Q11 | parquet / none / none | 0.99   | 0.99|   
> -0.02%   |   3.74%|   3.16%| 1   | 30|
> | TPCH(10) | TPCH-Q5  | parquet / none / none | 2.30   | 2.33|   
> -0.96%   |   2.15%|   2.45%| 1   | 30|
> | TPCH(10) | TPCH-Q2  | parquet / none / none | 1.55   | 1.57|   
> -1.45%   |   1.65%|   1.49%| 1   | 30|
> | TPCH(10) | TPCH-Q8  | parquet / none / none | 2.89   | 2.93|   
> -1.51%   |   2.69%|   1.34%| 1   | 30|
> | TPCH(10) | TPCH-Q9  | parquet / none / none | 5.96   | 6.06|   
> -1.63%   |   1.34%|   1.82%| 1   | 30|
> | TPCH(10) | TPCH-Q20 | parquet / none / none | 1.58   | 1.61|   
> -1.85%   |   2.28%|   2.16%| 1   | 30|
> | TPCH(10) | TPCH-Q16 | parquet / none / none | 1.18   | 1.21|   
> -2.11%   |   3.68%|   4.72%| 1   | 30|
> | TPCH(10) | TPCH-Q3  | parquet / none / none | 2.13   | 2.18|   
> -2.31%   |   2.09%|   1.92%| 1   | 30|
> | TPCH(10) | TPCH-Q15 | parquet / none / none | 1.86   | 1.90|   
> -2.52%   |   2.06%|   2.22%| 1   | 30|
> | TPCH(10) | TPCH-Q17 | parquet / none / none | 1.85   | 1.90|   
> -2.86%   |   10.00%   |   8.02%| 1   | 30|
> | TPCH(10) | TPCH-Q10 | parquet / none / none | 2.58   | 2.66|   
> -2.93%   |   1.68%|   6.49%| 1   | 30|
> | TPCH(10) | TPCH-Q14 | parquet / none / none | 1.37   | 1.42|   
> -3.22%   |   3.35%|   6.24%| 1   | 30|
> | TPCH(10) | TPCH-Q18 | parquet / none / none | 4.99   | 5.17|   
> -3.38%   |   1.75%

[jira] [Assigned] (IMPALA-7352) HdfsTableSink doesn't take into account insert clustering

2018-09-24 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar reassigned IMPALA-7352:


Assignee: Pooja Nilangekar  (was: Bikramjeet Vig)

> HdfsTableSink doesn't take into account insert clustering
> -
>
> Key: IMPALA-7352
> URL: https://issues.apache.org/jira/browse/IMPALA-7352
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Reporter: Tim Armstrong
>Assignee: Pooja Nilangekar
>Priority: Major
>  Labels: resource-management
>
> I noticed that the code doesn't check whether the insert is clustered, which 
> would mean it only produces a single partition at a time.
> {code}
>   @Override
>   public void computeResourceProfile(TQueryOptions queryOptions) {
> HdfsTable table = (HdfsTable) targetTable_;
> // TODO: Estimate the memory requirements more accurately by partition 
> type.
> HdfsFileFormat format = table.getMajorityFormat();
> PlanNode inputNode = fragment_.getPlanRoot();
> int numInstances = fragment_.getNumInstances(queryOptions.getMt_dop());
> // Compute the per-instance number of partitions, taking the number of 
> nodes
> // and the data partition of the fragment executing this sink into 
> account.
> long numPartitionsPerInstance =
> fragment_.getPerInstanceNdv(queryOptions.getMt_dop(), 
> partitionKeyExprs_);
> if (numPartitionsPerInstance == -1) {
>   numPartitionsPerInstance = DEFAULT_NUM_PARTITIONS;
> }
> long perPartitionMemReq = getPerPartitionMemReq(format);
> long perInstanceMemEstimate;
> // The estimate is based purely on the per-partition mem req if the input 
> cardinality_
> // or the avg row size is unknown.
> if (inputNode.getCardinality() == -1 || inputNode.getAvgRowSize() == -1) {
>   perInstanceMemEstimate = numPartitionsPerInstance * perPartitionMemReq;
> } else {
>   // The per-partition estimate may be higher than the memory required to 
> buffer
>   // the entire input data.
>   long perInstanceInputCardinality =
>   Math.max(1L, inputNode.getCardinality() / numInstances);
>   long perInstanceInputBytes =
>   (long) Math.ceil(perInstanceInputCardinality * 
> inputNode.getAvgRowSize());
>   long perInstanceMemReq =
>   PlanNode.checkedMultiply(numPartitionsPerInstance, 
> perPartitionMemReq);
>   perInstanceMemEstimate = Math.min(perInstanceInputBytes, 
> perInstanceMemReq);
> }
> resourceProfile_ = ResourceProfile.noReservation(perInstanceMemEstimate);
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7367) Pack StringValue, CollectionValue and TimestampValue slots

2018-09-21 Thread Pooja Nilangekar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624368#comment-16624368
 ] 

Pooja Nilangekar commented on IMPALA-7367:
--

[~tarmstrong] I still need to fix issues with KuduScanners. The other scanners 
work fine because we read from the file and populate the Tuple directly. 
However, in the case of Kudu, we carry out a memcpy from the Kudu tuple because 
all the slot descriptors are the same (except TimestampValue, which is handled 
because Kudu adds sufficient padding). I spoke to [~bikramjeet.vig] about this, 
and he explained that replacing the single memcpy per tuple with some mechanism 
to handle strings (leading to multiple memcpy calls) would cause a significant 
regression for Kudu tables.

> Pack StringValue, CollectionValue and TimestampValue slots
> --
>
> Key: IMPALA-7367
> URL: https://issues.apache.org/jira/browse/IMPALA-7367
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Pooja Nilangekar
>Priority: Major
>  Labels: perfomance
> Attachments: 0001-WIP.patch
>
>
> This is a follow-on to finish up the work from IMPALA-2789. IMPALA-2789 
> didn't actually fully pack the memory layout because StringValue, 
> TimestampValue and CollectionValue still occupy 16 bytes but only have 12 
> bytes of actual data. This results in a higher memory footprint, which leads 
> to higher memory requirements and worse performance. We don't get any benefit 
> from the padding since the majority of tuples are not actually aligned in 
> memory anyway.
> I did a quick version of the change for StringValue only which improves TPC-H 
> performance.
> {noformat}
> Report Generated on 2018-07-30
> Run Description: "b5608264b4552e44eb73ded1e232a8775c3dba6b vs 
> f1e401505ac20c0400eec819b9196f7f506fb927"
> Cluster Name: UNKNOWN
> Lab Run Info: UNKNOWN
> Impala Version:  impalad version 3.1.0-SNAPSHOT RELEASE ()
> Baseline Impala Version: impalad version 3.1.0-SNAPSHOT RELEASE (2018-07-27)
> +--+---+-++++
> | Workload | File Format   | Avg (s) | Delta(Avg) | GeoMean(s) | 
> Delta(GeoMean) |
> +--+---+-++++
> | TPCH(10) | parquet / none / none | 2.69| -4.78% | 2.09   | 
> -3.11% |
> +--+---+-++++
> +--+--+---++-++++-+---+
> | Workload | Query| File Format   | Avg(s) | Base Avg(s) | 
> Delta(Avg) | StdDev(%)  | Base StdDev(%) | Num Clients | Iters |
> +--+--+---++-++++-+---+
> | TPCH(10) | TPCH-Q22 | parquet / none / none | 0.94   | 0.93|   
> +0.75%   |   3.37%|   2.84%| 1   | 30|
> | TPCH(10) | TPCH-Q13 | parquet / none / none | 3.32   | 3.32|   
> +0.13%   |   1.74%|   2.09%| 1   | 30|
> | TPCH(10) | TPCH-Q11 | parquet / none / none | 0.99   | 0.99|   
> -0.02%   |   3.74%|   3.16%| 1   | 30|
> | TPCH(10) | TPCH-Q5  | parquet / none / none | 2.30   | 2.33|   
> -0.96%   |   2.15%|   2.45%| 1   | 30|
> | TPCH(10) | TPCH-Q2  | parquet / none / none | 1.55   | 1.57|   
> -1.45%   |   1.65%|   1.49%| 1   | 30|
> | TPCH(10) | TPCH-Q8  | parquet / none / none | 2.89   | 2.93|   
> -1.51%   |   2.69%|   1.34%| 1   | 30|
> | TPCH(10) | TPCH-Q9  | parquet / none / none | 5.96   | 6.06|   
> -1.63%   |   1.34%|   1.82%| 1   | 30|
> | TPCH(10) | TPCH-Q20 | parquet / none / none | 1.58   | 1.61|   
> -1.85%   |   2.28%|   2.16%| 1   | 30|
> | TPCH(10) | TPCH-Q16 | parquet / none / none | 1.18   | 1.21|   
> -2.11%   |   3.68%|   4.72%| 1   | 30|
> | TPCH(10) | TPCH-Q3  | parquet / none / none | 2.13   | 2.18|   
> -2.31%   |   2.09%|   1.92%| 1   | 30|
> | TPCH(10) | TPCH-Q15 | parquet / none / none | 1.86   | 1.90|   
> -2.52%   |   2.06%|   2.22%| 1   | 30|
> | TPCH(10) | TPCH-Q17 | parquet / none / none | 1.85   | 1.90|   
> -2.86%   |   10.00%   |   8.02%| 1   | 30|
> | TPCH(10) | TPCH-Q10 | parquet / none / none | 2.58   | 2.66|   
> -2.93%   |   1.68%|   6.49%| 1   | 30|
> | TPCH(10) 

[jira] [Assigned] (IMPALA-7545) Add admission control status to query log

2018-09-21 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar reassigned IMPALA-7545:


Assignee: Pooja Nilangekar  (was: Tim Armstrong)

> Add admission control status to query log
> -
>
> Key: IMPALA-7545
> URL: https://issues.apache.org/jira/browse/IMPALA-7545
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Pooja Nilangekar
>Priority: Major
>  Labels: admission-control, observability
>
> We already include the query progress in the HS2 GetLog() response (although 
> for some reason we don't do the same for beeswax) so we should include 
> admission control progress. We should definitely include it if the query is 
> currently queued; it's probably too noisy to include once the query has been 
> admitted.
> We should also do the same for beeswax/impala-shell so that 
> live_progress/live_summary is useful if the query is queued. We should look 
> at the live_progress/live_summary mechanisms and extend those to include the 
> required information to report admission control state.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-7335) Assertion Failure - test_corrupt_files

2018-09-21 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-7335.
--
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> Assertion Failure - test_corrupt_files
> --
>
> Key: IMPALA-7335
> URL: https://issues.apache.org/jira/browse/IMPALA-7335
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.1.0
>Reporter: nithya
>Assignee: Pooja Nilangekar
>Priority: Blocker
>  Labels: broken-build
> Fix For: Impala 3.1.0
>
>
> test_corrupt_files fails 
>  
> query_test.test_scanners.TestParquet.test_corrupt_files[exec_option: 
> \\{'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] (from 
> pytest)
>  
> {code:java}
> Error Message
> query_test/test_scanners.py:300: in test_corrupt_files     
> self.run_test_case('QueryTest/parquet-abort-on-error', vector) 
> common/impala_test_suite.py:420: in run_test_case     assert False, "Expected 
> exception: %s" % expected_str E   AssertionError: Expected exception: Column 
> metadata states there are 11 values, but read 10 values from column id.
> STACKTRACE
> query_test/test_scanners.py:300: in test_corrupt_files
>     self.run_test_case('QueryTest/parquet-abort-on-error', vector)
> common/impala_test_suite.py:420: in run_test_case
>     assert False, "Expected exception: %s" % expected_str
> E   AssertionError: Expected exception: Column metadata states there are 11 
> values, but read 10 values from column id.
> Standard Error
> -- executing against localhost:21000
> use functional_parquet;
> SET batch_size=0;
> SET num_nodes=0;
> SET disable_codegen_rows_threshold=0;
> SET disable_codegen=False;
> SET abort_on_error=0;
> SET exec_single_node_rows_threshold=0;
> -- executing against localhost:21000
> set num_nodes=1;
> -- executing against localhost:21000
> set num_scanner_threads=1;
> -- executing against localhost:21000
> select id, cnt from bad_column_metadata t, (select count(*) cnt from 
> t.int_array) v;
> -- executing against localhost:21000
> SET NUM_NODES="0";
> -- executing against localhost:21000
> SET NUM_SCANNER_THREADS="0";
> -- executing against localhost:21000
> set num_nodes=1;
> -- executing against localhost:21000
> set num_scanner_threads=1;
> -- executing against localhost:21000
> select id from bad_column_metadata;
> -- executing against localhost:21000
> SET NUM_NODES="0";
> -- executing against localhost:21000
> SET NUM_SCANNER_THREADS="0";
> -- executing against localhost:21000
> SELECT * from bad_parquet_strings_negative_len;
> -- executing against localhost:21000
> SELECT * from bad_parquet_strings_out_of_bounds;
> -- executing against localhost:21000
> use functional_parquet;
> SET batch_size=0;
> SET num_nodes=0;
> SET disable_codegen_rows_threshold=0;
> SET disable_codegen=False;
> SET abort_on_error=1;
> SET exec_single_node_rows_threshold=0;
> -- executing against localhost:21000
> set num_nodes=1;
> -- executing against localhost:21000
> set num_scanner_threads=1;
> -- executing against localhost:21000
> select id, cnt from bad_column_metadata t, (select count(*) cnt from 
> t.int_array) v;
> -- executing against localhost:21000
> SET NUM_NODES="0";
> -- executing against localhost:21000
> SET NUM_SCANNER_THREADS="0";
> -- executing against localhost:21000
> set num_nodes=1;
> -- executing against localhost:21000
> set num_scanner_threads=1;
> -- executing against localhost:21000
> select id from bad_column_metadata;
> -- executing against localhost:21000
> SET NUM_NODES="0";
> -- executing against localhost:21000
> SET NUM_SCANNER_THREADS="0";
> {code}
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Resolved] (IMPALA-7430) Remove the log added to HdfsScanNode::ScannerThread

2018-09-21 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-7430.
--
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> Remove the log added to HdfsScanNode::ScannerThread
> ---
>
> Key: IMPALA-7430
> URL: https://issues.apache.org/jira/browse/IMPALA-7430
> Project: IMPALA
>  Issue Type: Task
>Affects Versions: Impala 3.1.0
>Reporter: Pooja Nilangekar
>Assignee: Pooja Nilangekar
>Priority: Blocker
> Fix For: Impala 3.1.0
>
>
> Logs were added to the HdfsScanNode in order to debug IMPALA-7335 and 
> IMPALA-7418. These need to be removed once the cause of these bugs is 
> established and we find a way to reproduce them locally. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-7367) Pack StringValue, CollectionValue and TimestampValue slots

2018-09-20 Thread Pooja Nilangekar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16622879#comment-16622879
 ] 

Pooja Nilangekar edited comment on IMPALA-7367 at 9/20/18 11:49 PM:


I ran TPCH with a scale factor of 60 on a minicluster with a patch for 
StringValue and CollectionValue slots. Here is the summary of results: 

{noformat}
+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(60) | parquet / none / none | 12.45   | -29.84%    | 8.63       | -11.30%        |
+----------+-----------------------+---------+------------+------------+----------------+
{noformat}


The queries which showed a significant performance gain did use strings or 
timestamps stored as strings. I can understand that we should see an 
improvement; however, I am not sure about the magnitude. 

Also, there were only 2 queries which showed a regression > 1%. In both cases, 
the absolute difference was less than 5 ms while the query took a few seconds to 
run, so this could just be system noise. 


was (Author: poojanilangekar):
I ran TPCH with a scale factor of 60 on a minicluster with a patch for 
StringValue and CollectionValue slots. Here is the summary of results: 

+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(60) | parquet / none / none | 12.45   | -29.84%    | 8.63       | -11.30%        |
+----------+-----------------------+---------+------------+------------+----------------+

The queries which showed significant performance gain did use strings or 
timestamps stored as strings. I can understand that we should see an 
improvement, however I am not sure about the magnitude. 

Also there were only 2 queries which showed a regression > 1 %. In both cases, 
the absolute difference was less than 5ms while the query took a few seconds to 
run. So this could just be system noise. 

> Pack StringValue, CollectionValue and TimestampValue slots
> --
>
> Key: IMPALA-7367
> URL: https://issues.apache.org/jira/browse/IMPALA-7367
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Pooja Nilangekar
>Priority: Major
>  Labels: perfomance
> Attachments: 0001-WIP.patch
>
>
> This is a follow-on to finish up the work from IMPALA-2789. IMPALA-2789 
> didn't actually fully pack the memory layout because StringValue, 
> TimestampValue and CollectionValue still occupy 16 bytes but only have 12 
> bytes of actual data. This results in a higher memory footprint, which leads 
> to higher memory requirements and worse performance. We don't get any benefit 
> from the padding since the majority of tuples are not actually aligned in 
> memory anyway.
> I did a quick version of the change for StringValue only which improves TPC-H 
> performance.
> {noformat}
> Report Generated on 2018-07-30
> Run Description: "b5608264b4552e44eb73ded1e232a8775c3dba6b vs 
> f1e401505ac20c0400eec819b9196f7f506fb927"
> Cluster Name: UNKNOWN
> Lab Run Info: UNKNOWN
> Impala Version:  impalad version 3.1.0-SNAPSHOT RELEASE ()
> Baseline Impala Version: impalad version 3.1.0-SNAPSHOT RELEASE (2018-07-27)
> +--+---+-++++
> | Workload | File Format   | Avg (s) | Delta(Avg) | GeoMean(s) | 
> Delta(GeoMean) |
> +--+---+-++++
> | TPCH(10) | parquet / none / none | 2.69| -4.78% | 2.09   | 
> -3.11% |
> +--+---+-++++
> +--+--+---++-++++-+---+
> | Workload | Query| File Format   | Avg(s) | Base Avg(s) | 
> Delta(Avg) | StdDev(%)  | Base StdDev(%) | Num Clients | Iters |
> +--+--+---++-++++-+---+
> | TPCH(10) | TPCH-Q22 | parquet / none / none | 0.94   | 0.93|   
> +0.75%   |   3.37%|   2.84%| 1

[jira] [Commented] (IMPALA-7367) Pack StringValue, CollectionValue and TimestampValue slots

2018-09-20 Thread Pooja Nilangekar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16622879#comment-16622879
 ] 

Pooja Nilangekar commented on IMPALA-7367:
--

I ran TPCH with a scale factor of 60 on a minicluster with a patch for 
StringValue and CollectionValue slots. Here is the summary of results: 

+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(60) | parquet / none / none | 12.45   | -29.84%    | 8.63       | -11.30%        |
+----------+-----------------------+---------+------------+------------+----------------+

The queries which showed a significant performance gain did use strings or 
timestamps stored as strings. I can understand that we should see an 
improvement; however, I am not sure about the magnitude. 

Also, there were only 2 queries which showed a regression > 1%. In both cases, 
the absolute difference was less than 5 ms while the query took a few seconds to 
run, so this could just be system noise. 

> Pack StringValue, CollectionValue and TimestampValue slots
> --
>
> Key: IMPALA-7367
> URL: https://issues.apache.org/jira/browse/IMPALA-7367
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Pooja Nilangekar
>Priority: Major
>  Labels: perfomance
> Attachments: 0001-WIP.patch
>
>
> This is a follow-on to finish up the work from IMPALA-2789. IMPALA-2789 
> didn't actually fully pack the memory layout because StringValue, 
> TimestampValue and CollectionValue still occupy 16 bytes but only have 12 
> bytes of actual data. This results in a higher memory footprint, which leads 
> to higher memory requirements and worse performance. We don't get any benefit 
> from the padding since the majority of tuples are not actually aligned in 
> memory anyway.
> I did a quick version of the change for StringValue only which improves TPC-H 
> performance.
> {noformat}
> Report Generated on 2018-07-30
> Run Description: "b5608264b4552e44eb73ded1e232a8775c3dba6b vs 
> f1e401505ac20c0400eec819b9196f7f506fb927"
> Cluster Name: UNKNOWN
> Lab Run Info: UNKNOWN
> Impala Version:  impalad version 3.1.0-SNAPSHOT RELEASE ()
> Baseline Impala Version: impalad version 3.1.0-SNAPSHOT RELEASE (2018-07-27)
> +--+---+-++++
> | Workload | File Format   | Avg (s) | Delta(Avg) | GeoMean(s) | 
> Delta(GeoMean) |
> +--+---+-++++
> | TPCH(10) | parquet / none / none | 2.69| -4.78% | 2.09   | 
> -3.11% |
> +--+---+-++++
> +--+--+---++-++++-+---+
> | Workload | Query| File Format   | Avg(s) | Base Avg(s) | 
> Delta(Avg) | StdDev(%)  | Base StdDev(%) | Num Clients | Iters |
> +--+--+---++-++++-+---+
> | TPCH(10) | TPCH-Q22 | parquet / none / none | 0.94   | 0.93|   
> +0.75%   |   3.37%|   2.84%| 1   | 30|
> | TPCH(10) | TPCH-Q13 | parquet / none / none | 3.32   | 3.32|   
> +0.13%   |   1.74%|   2.09%| 1   | 30|
> | TPCH(10) | TPCH-Q11 | parquet / none / none | 0.99   | 0.99|   
> -0.02%   |   3.74%|   3.16%| 1   | 30|
> | TPCH(10) | TPCH-Q5  | parquet / none / none | 2.30   | 2.33|   
> -0.96%   |   2.15%|   2.45%| 1   | 30|
> | TPCH(10) | TPCH-Q2  | parquet / none / none | 1.55   | 1.57|   
> -1.45%   |   1.65%|   1.49%| 1   | 30|
> | TPCH(10) | TPCH-Q8  | parquet / none / none | 2.89   | 2.93|   
> -1.51%   |   2.69%|   1.34%| 1   | 30|
> | TPCH(10) | TPCH-Q9  | parquet / none / none | 5.96   | 6.06|   
> -1.63%   |   1.34%|   1.82%| 1   | 30|
> | TPCH(10) | TPCH-Q20 | parquet / none / none | 1.58   | 1.61|   
> -1.85%   |   2.28%|   2.16%| 1   | 30|
> | TPCH(10) | TPCH-Q16 | parquet / none / none | 1.18   | 1.21|   
> -2.11%   |   3.68%|   4.72%| 1   | 30|
> | TPCH(10) | TPCH-Q3  | parquet / none / none | 2.13   | 2.18 

[jira] [Updated] (IMPALA-7430) Remove the log added to HdfsScanNode::ScannerThread

2018-09-14 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar updated IMPALA-7430:
-
Affects Version/s: Impala 3.1.0

> Remove the log added to HdfsScanNode::ScannerThread
> ---
>
> Key: IMPALA-7430
> URL: https://issues.apache.org/jira/browse/IMPALA-7430
> Project: IMPALA
>  Issue Type: Task
>Affects Versions: Impala 3.1.0
>Reporter: Pooja Nilangekar
>Assignee: Pooja Nilangekar
>Priority: Blocker
>
> Logs were added to the HdfsScanNode in order to debug IMPALA-7335 and 
> IMPALA-7418. These need to be removed once the cause of these bugs is 
> established and we find a way to reproduce them locally. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7367) Pack StringValue, CollectionValue and TimestampValue slots

2018-09-13 Thread Pooja Nilangekar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614152#comment-16614152
 ] 

Pooja Nilangekar commented on IMPALA-7367:
--

[~csringhofer] Agreed. We should be moving away from boost, but I am not sure 
about replacing those with uint64_t/int32_t, since the TimestampValue class 
requires some functions which could utilize existing libraries/wrappers. Hence, 
I thought one approach here would be to capture the benefits of using fewer 
bytes for StringValue and CollectionValue in this Jira and then handle 
TimestampValue separately, since that structure can't be packed as is. 
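
For reference, a small ctypes toy (illustrative only; Impala's slot structs are C++ and 
the field names here are made up) showing where the extra 4 bytes per 12-byte payload 
come from on a 64-bit build:

{code:python}
import ctypes

class PaddedValue(ctypes.Structure):
    # 8-byte pointer + 4-byte length, rounded up to 16 bytes by C alignment rules.
    _fields_ = [("ptr", ctypes.c_void_p), ("len", ctypes.c_int32)]

class PackedValue(ctypes.Structure):
    _pack_ = 1  # drop the trailing alignment padding
    _fields_ = [("ptr", ctypes.c_void_p), ("len", ctypes.c_int32)]

print(ctypes.sizeof(PaddedValue))  # 16
print(ctypes.sizeof(PackedValue))  # 12
{code}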

> Pack StringValue, CollectionValue and TimestampValue slots
> --
>
> Key: IMPALA-7367
> URL: https://issues.apache.org/jira/browse/IMPALA-7367
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Pooja Nilangekar
>Priority: Major
>  Labels: perfomance
> Attachments: 0001-WIP.patch
>
>
> This is a follow-on to finish up the work from IMPALA-2789. IMPALA-2789 
> didn't actually fully pack the memory layout because StringValue, 
> TimestampValue and CollectionValue still occupy 16 bytes but only have 12 
> bytes of actual data. This results in a higher memory footprint, which leads 
> to higher memory requirements and worse performance. We don't get any benefit 
> from the padding since the majority of tuples are not actually aligned in 
> memory anyway.
> I did a quick version of the change for StringValue only which improves TPC-H 
> performance.
> {noformat}
> Report Generated on 2018-07-30
> Run Description: "b5608264b4552e44eb73ded1e232a8775c3dba6b vs 
> f1e401505ac20c0400eec819b9196f7f506fb927"
> Cluster Name: UNKNOWN
> Lab Run Info: UNKNOWN
> Impala Version:  impalad version 3.1.0-SNAPSHOT RELEASE ()
> Baseline Impala Version: impalad version 3.1.0-SNAPSHOT RELEASE (2018-07-27)
> +----------+-----------------------+---------+------------+------------+----------------+
> | Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
> +----------+-----------------------+---------+------------+------------+----------------+
> | TPCH(10) | parquet / none / none | 2.69    | -4.78%     | 2.09       | -3.11%         |
> +----------+-----------------------+---------+------------+------------+----------------+
> +----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
> | Workload | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters |
> +----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
> | TPCH(10) | TPCH-Q22 | parquet / none / none | 0.94   | 0.93        | +0.75%     | 3.37%     | 2.84%          | 1           | 30    |
> | TPCH(10) | TPCH-Q13 | parquet / none / none | 3.32   | 3.32        | +0.13%     | 1.74%     | 2.09%          | 1           | 30    |
> | TPCH(10) | TPCH-Q11 | parquet / none / none | 0.99   | 0.99        | -0.02%     | 3.74%     | 3.16%          | 1           | 30    |
> | TPCH(10) | TPCH-Q5  | parquet / none / none | 2.30   | 2.33        | -0.96%     | 2.15%     | 2.45%          | 1           | 30    |
> | TPCH(10) | TPCH-Q2  | parquet / none / none | 1.55   | 1.57        | -1.45%     | 1.65%     | 1.49%          | 1           | 30    |
> | TPCH(10) | TPCH-Q8  | parquet / none / none | 2.89   | 2.93        | -1.51%     | 2.69%     | 1.34%          | 1           | 30    |
> | TPCH(10) | TPCH-Q9  | parquet / none / none | 5.96   | 6.06        | -1.63%     | 1.34%     | 1.82%          | 1           | 30    |
> | TPCH(10) | TPCH-Q20 | parquet / none / none | 1.58   | 1.61        | -1.85%     | 2.28%     | 2.16%          | 1           | 30    |
> | TPCH(10) | TPCH-Q16 | parquet / none / none | 1.18   | 1.21        | -2.11%     | 3.68%     | 4.72%          | 1           | 30    |
> | TPCH(10) | TPCH-Q3  | parquet / none / none | 2.13   | 2.18        | -2.31%     | 2.09%     | 1.92%          | 1           | 30    |
> | TPCH(10) | TPCH-Q15 | parquet / none / none | 1.86   | 1.90        | -2.52%     | 2.06%     | 2.22%          | 1           | 30    |
> | TPCH(10) | TPCH-Q17 | parquet / none / none | 1.85   | 1.90        | -2.86%     | 10.00%    | 8.02%          | 1           | 30    |
> | TPCH(10) | TPCH-Q10 | parquet / none / none | 2.58   | 2.66        | -2.93%     | 1.68%     | 6.49%          | 1           | 30    |
> | TPCH(10) | TPCH-Q14 | parquet / none / none | 1.37   | 1.42        | -3.22%     | 3.35%     | 6.24%          | 1           | 30    |
> | 

[jira] [Created] (IMPALA-7551) Inaccurate timeline for "Row Available"

2018-09-10 Thread Pooja Nilangekar (JIRA)
Pooja Nilangekar created IMPALA-7551:


 Summary: Inaccurate timeline for "Row Available" 
 Key: IMPALA-7551
 URL: https://issues.apache.org/jira/browse/IMPALA-7551
 Project: IMPALA
  Issue Type: Improvement
Reporter: Pooja Nilangekar


While debugging IMPALA-6932, it was noticed that the "Rows Available" metric in 
the query profile showed a short duration (~1 second) for a long-running limit 1 
query (~1 hour).

Currently, the event is recorded when Open() of the top-most node in the plan 
returns, not when the first row is actually produced. This can be misleading. A 
better timeline would mark the event when the first non-empty batch is added to 
the PlanRootSink. 
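
A minimal sketch of that suggestion, assuming a hypothetical sink and timeline API (the class and method names below are illustrative, not Impala's actual PlanRootSink or profile interfaces): the event is recorded on the first batch that actually carries rows, rather than when the plan's Open() returns.

{code}
#include <string>
#include <vector>

struct RowBatch { int num_rows; };

// Stand-in for a query-wide event timeline ("Ready to start", "Rows available", ...).
struct EventSequence {
  std::vector<std::string> events;
  void MarkEvent(const std::string& label) { events.push_back(label); }
};

class PlanRootSink {
 public:
  explicit PlanRootSink(EventSequence* timeline) : timeline_(timeline) {}

  // Called for every batch produced by the root of the plan.
  void Send(const RowBatch& batch) {
    // Record "Rows available" once, and only when a batch actually carries
    // rows -- not when Open() of the top-most plan node happens to return.
    if (!rows_available_marked_ && batch.num_rows > 0) {
      timeline_->MarkEvent("Rows available");
      rows_available_marked_ = true;
    }
    // ... hand the batch off to the client fetch path ...
  }

 private:
  EventSequence* timeline_ = nullptr;
  bool rows_available_marked_ = false;
};

int main() {
  EventSequence timeline;
  PlanRootSink sink(&timeline);
  sink.Send(RowBatch{0});    // empty batch: no event recorded yet
  sink.Send(RowBatch{128});  // first non-empty batch: event recorded here
  return timeline.events.size() == 1 ? 0 : 1;
}
{code}

For the limit 1 case above, the event would then reflect when the first row actually reached the sink rather than when Open() returned.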



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org


