date:20190823

[jira] [Closed] (IMPALA-8818) Replace deque queue with spillable queue in BufferedPlanRootSink

2019-08-23 Thread Sahil Takiar (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar closed IMPALA-8818.

Fix Version/s: Impala 3.4.0
   Resolution: Fixed

> Replace deque queue with spillable queue in BufferedPlanRootSink
> 
>
> Key: IMPALA-8818
> URL: https://issues.apache.org/jira/browse/IMPALA-8818
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 3.4.0
>
>
> Add a {{SpillableRowBatchQueue}} to replace the {{DequeRowBatchQueue}} in 
> {{BufferedPlanRootSink}}. The {{SpillableRowBatchQueue}} will wrap a 
> {{BufferedTupleStream}} and take in a {{TBackendResourceProfile}} created by 
> {{PlanRootSink#computeResourceProfile}}.
> *BufferedTupleStream Usage*:
> The wrapped {{BufferedTupleStream}} should be created in 'attach_on_read' 
> mode so that pages are attached to the output {{RowBatch}} in 
> {{BufferedTupleStream::GetNext}}. The BTS should start off as pinned (e.g. 
> all pages are pinned). If a call to {{BufferedTupleStream::AddRow}} returns 
> false (it returns false if "the unused reservation was not sufficient to add 
> a new page to the stream large enough to fit 'row' and the stream could not 
> increase the reservation to get enough unused reservation"), it should unpin 
> the stream ({{BufferedTupleStream::UnpinStream}}) and then add the row (if 
> the row still could not be added, then an error must have occurred, perhaps 
> an IO error, in which case return the error and fail the query).
> *Constraining Resources*:
> When result spooling is disabled, a user can run a {{select * from 
> [massive-fact-table]}} and scroll through the results without affecting the 
> health of the Impala cluster (assuming they close the query promptly). 
> Impala will stream the results one batch at a time to the user.
> With result spooling, a naive implementation might try to buffer the entire 
> fact table, and end up spilling all the contents to disk, which can 
> potentially take up a large amount of space. So there needs to be 
> restrictions on the memory and disk space used by the {{BufferedTupleStream}} 
> in order to ensure a scan of a massive table does not consume all the memory 
> or disk space of the Impala coordinator.
> This problem can be solved by placing a max size on the amount of unpinned 
> memory (perhaps through a new config option 
> {{MAX_UNPINNED_RESULT_SPOOLING_MEMORY}}, maybe set to a few GBs by default). 
> The max amount of pinned memory should already be constrained by the 
> reservation (see next paragraph). NUM_ROWS_PRODUCED_LIMIT already limits the 
> number of rows returned by a query, and so it should limit the number of rows 
> buffered by the BTS as well (although it is set to 0 by default). 
> SCRATCH_LIMIT already limits the amount of disk space used for spilling 
> (although it is set to -1 by default).
> The {{PlanRootSink}} should attempt to accurately estimate how much memory it 
> needs to buffer all results in memory. This requires setting an accurate 
> value of {{ResourceProfile#memEstimateBytes_}} in 
> {{PlanRootSink#computeResourceProfile}}. If statistics are available, the 
> estimate can be based on the number of estimated rows returned multiplied by 
> the size of the rows returned. The min reservation should account for a read 
> and write page for the {{BufferedTupleStream}}.
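
The sizing rule in the last paragraph of the description (estimated rows multiplied by row size, plus a minimum reservation of one read page and one write page) can be sketched as below. This is a hypothetical Python model, not the planner code in {{PlanRootSink#computeResourceProfile}}: the function names, the 2 MiB page size and the sample numbers are all assumptions made for illustration.

```python
from typing import Optional


def estimate_result_bytes(est_num_rows: Optional[int],
                          avg_row_bytes: float) -> Optional[int]:
    """Estimated rows times row size; None when no statistics are available."""
    if est_num_rows is None or est_num_rows < 0:
        return None
    return int(est_num_rows * avg_row_bytes)


def min_reservation_bytes(page_bytes: int) -> int:
    """One read page plus one write page for the BufferedTupleStream."""
    return 2 * page_bytes


PAGE_BYTES = 2 * 1024 * 1024  # assumed page size, not taken from the issue

print(estimate_result_bytes(1_500_000, 100.0))  # estimate with stats
print(estimate_result_bytes(None, 100.0))       # no stats: no estimate
print(min_reservation_bytes(PAGE_BYTES))
```

Without statistics the sketch returns no estimate at all; what the planner should fall back to in that case is not stated in the issue.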



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

[jira] [Commented] (IMPALA-8891) concat_ws() null handling is non-standard

2019-08-23 Thread Greg Rahn (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-8891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914757#comment-16914757
 ] 

Greg Rahn commented on IMPALA-8891:
---

{noformat}
MariaDB [(none)]> select version();
+----------------+
| version()      |
+----------------+
| 10.4.6-MariaDB |
+----------------+

MariaDB [(none)]> select concat_ws('-','foo',null,'bar') as expr1;
+---------+
| expr1   |
+---------+
| foo-bar |
+---------+
{noformat}

{noformat}
impala> select version();
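+-----------------------------------------------------------------------------------------+
| version()                                                                               |
+-----------------------------------------------------------------------------------------+
| impalad version 3.3.0-SNAPSHOT RELEASE (build df3e7c051e2641524fc53a0cd07c2a14decd55f7) |
| Built on Thu Aug 22 19:28:57 UTC 2019                                                   |
+-----------------------------------------------------------------------------------------+

impala> select concat_ws('-','foo',null,'bar') as expr1;
+-------+
| expr1 |
+-------+
| NULL  |
+-------+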
{noformat}

{noformat}
hive> select version();
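+----------------------------------------------------------------+
| _c0                                                            |
+----------------------------------------------------------------+
| 3.1.2000.7.0.0.0-463 r7db8023511683e2b30c31bcb6ad5b372b1876eab |
+----------------------------------------------------------------+

hive> select concat_ws('-','foo',null,'bar') as expr1;
+----------+
|  expr1   |
+----------+
| foo-bar  |
+----------+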
{noformat}
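
The two behaviours shown in the outputs above can be modelled side by side. This is only a Python sketch of the semantics, not code from any of the three engines; both function names are made up for illustration, and returning NULL for a NULL separator is an assumption based on MySQL's documented behaviour rather than something shown in the outputs.

```python
from typing import Optional


def concat_ws_skip_nulls(sep: Optional[str], *args: Optional[str]) -> Optional[str]:
    """MariaDB/Hive behaviour shown above: NULL arguments are skipped."""
    if sep is None:
        return None  # assumption: a NULL separator yields NULL
    return sep.join(a for a in args if a is not None)


def concat_ws_null_propagating(sep: Optional[str], *args: Optional[str]) -> Optional[str]:
    """Impala 3.3.0-SNAPSHOT behaviour shown above: any NULL yields NULL."""
    if sep is None or any(a is None for a in args):
        return None
    return sep.join(args)


print(concat_ws_skip_nulls('-', 'foo', None, 'bar'))
print(concat_ws_null_propagating('-', 'foo', None, 'bar'))
```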



> concat_ws() null handling is non-standard
> -
>
> Key: IMPALA-8891
> URL: https://issues.apache.org/jira/browse/IMPALA-8891
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0, Impala 3.3.0
>Reporter: Tim Armstrong
>Priority: Major
>  Labels: newbie
>
> [~grahn] reports
> {quote}Looks like Impala’s CONCAT_WS() does not behave correctly if an 
> argument is NULL — it returns NULL and it should not.  Mismatch between 
> Hive/MySQL and Impala (and apologies for not filing a bug)
> {quote}




-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

[jira] [Commented] (IMPALA-8885) Improve parquet version metadata error

2019-08-23 Thread Tim Armstrong (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-8885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914750#comment-16914750
 ] 

Tim Armstrong commented on IMPALA-8885:
---

The new error template is:
{code}
  ("PARQUET_BAD_VERSION_NUMBER", 60, "File '$0' has an invalid Parquet version number: "
   "$1\\n. Please check that it is a valid Parquet file. "
   "This error can also occur due to stale metadata. "
   "If you believe this is a valid Parquet file, try running \\\"refresh $2\\\"."),
{code}
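
To see what a user would read, the template can be expanded by hand. The sketch below assumes that {{$0}}, {{$1}} and {{$2}} are positional placeholders and that the doubled backslashes collapse to a plain newline and plain double quotes once the template is compiled; {{substitute}} is a hypothetical helper, not Impala's own substitution routine. The sample arguments are taken from the old error message quoted in the issue description.

```python
import re

# Template text with the escaping already resolved (an assumption).
TEMPLATE = ("File '$0' has an invalid Parquet version number: $1\n. "
            "Please check that it is a valid Parquet file. "
            "This error can also occur due to stale metadata. "
            "If you believe this is a valid Parquet file, "
            "try running \"refresh $2\".")


def substitute(template: str, *args: str) -> str:
    """Replace each $N with the N-th positional argument."""
    return re.sub(r"\$(\d+)", lambda m: args[int(m.group(1))], template)


msg = substitute(TEMPLATE, "hdfs://nn/books/books1G.csv", ".99", "s3db.books_s3")
print(msg)
```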

> Improve parquet version metadata error
> --
>
> Key: IMPALA-8885
> URL: https://issues.apache.org/jira/browse/IMPALA-8885
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: supportability
> Fix For: Impala 3.4.0
>
>
> The error looks like this now:
> {noformat}
> File 'hdfs://nn/books/books1G.csv' has an invalid version number: .99 This 
> could be due to stale metadata. Try running "refresh s3db.books_s3".
> {noformat}
> It seems to be reasonably common that this happens because a non-parquet file 
> is being queried in a parquet table.
> The error message should say something like "File 
> 'hdfs://nn/books/books1G.csv' has an invalid Parquet version number: .99 and 
> does not appear to be a valid Parquet file. This could be due to stale 
> metadata. Try running "refresh s3db.books_s3"




[jira] [Resolved] (IMPALA-8885) Improve parquet version metadata error

2019-08-23 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8885.
---
Fix Version/s: Impala 3.4.0
   Resolution: Fixed

> Improve parquet version metadata error
> --
>
> Key: IMPALA-8885
> URL: https://issues.apache.org/jira/browse/IMPALA-8885
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: supportability
> Fix For: Impala 3.4.0
>
>
> The error looks like this now:
> {noformat}
> File 'hdfs://nn/books/books1G.csv' has an invalid version number: .99 This 
> could be due to stale metadata. Try running "refresh s3db.books_s3".
> {noformat}
> It seems to be reasonably common that this happens because a non-parquet file 
> is being queried in a parquet table.
> The error message should say something like "File 
> 'hdfs://nn/books/books1G.csv' has an invalid Parquet version number: .99 and 
> does not appear to be a valid Parquet file. This could be due to stale 
> metadata. Try running "refresh s3db.books_s3"




[jira] [Commented] (IMPALA-8885) Improve parquet version metadata error

2019-08-23 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-8885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914747#comment-16914747
 ] 

ASF subversion and git services commented on IMPALA-8885:
-

Commit af0e04f33bbf2e93b7676ed7768c335c49b195f2 in impala's branch 
refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=af0e04f ]

IMPALA-8885: Improve Parquet version metadata error

Update the error message to make it more obvious that
the error could occur by trying to parse a non-Parquet
file as Parquet

Updated tests that depended on the error text.

Change-Id: I2b36586dba14a31a613d79a0e28efc9a5173e75d
Reviewed-on: http://gerrit.cloudera.org:8080/14126
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Improve parquet version metadata error
> --
>
> Key: IMPALA-8885
> URL: https://issues.apache.org/jira/browse/IMPALA-8885
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: supportability
>
> The error looks like this now:
> {noformat}
> File 'hdfs://nn/books/books1G.csv' has an invalid version number: .99 This 
> could be due to stale metadata. Try running "refresh s3db.books_s3".
> {noformat}
> It seems to be reasonably common that this happens because a non-parquet file 
> is being queried in a parquet table.
> The error message should say something like "File 
> 'hdfs://nn/books/books1G.csv' has an invalid Parquet version number: .99 and 
> does not appear to be a valid Parquet file. This could be due to stale 
> metadata. Try running "refresh s3db.books_s3"




[jira] [Commented] (IMPALA-8825) Add additional counters to PlanRootSink

2019-08-23 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-8825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914749#comment-16914749
 ] 

ASF subversion and git services commented on IMPALA-8825:
-

Commit d037ac8304b43f6e4bb4c6ba2eb1910a9e921c24 in impala's branch 
refs/heads/master from Sahil Takiar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=d037ac8 ]

IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Replaces DequeRowBatchQueue with SpillableRowBatchQueue in
BufferedPlanRootSink. A few changes to BufferedPlanRootSink were
necessary for it to work with the spillable queue; however, all the
synchronization logic is the same.

SpillableRowBatchQueue is a wrapper around a BufferedTupleStream and
a ReservationManager. It takes in a TBackendResourceProfile that
specifies the max / min memory reservation the BufferedTupleStream can
use to buffer rows. The 'max_unpinned_bytes' parameter limits the max
number of bytes that can be unpinned in the BufferedTupleStream. The
limit is a 'soft' limit because calls to AddBatch may push the amount of
unpinned memory over the limit. The queue is non-blocking and not thread
safe. It provides AddBatch and GetBatch methods. Calls to AddBatch spill
if the BufferedTupleStream does not have enough reservation to fit the
entire RowBatch.
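
The queue behaviour described in the paragraph above (spill when the reservation is exhausted, with a soft cap on unpinned bytes) can be illustrated with a toy model. This is a Python sketch under stated assumptions, not the C++ class: {{SpillableQueueModel}}, its fields and the byte counts are invented, and checking the limit only before a batch is added is an assumption about why the limit is 'soft'.

```python
class SpillableQueueModel:
    """Toy model of a spillable row batch queue (all names hypothetical)."""

    def __init__(self, max_reservation: int, max_unpinned_bytes: int):
        self.max_reservation = max_reservation
        self.max_unpinned_bytes = max_unpinned_bytes
        self.pinned = True
        self.pinned_bytes = 0
        self.unpinned_bytes = 0

    def is_full(self) -> bool:
        # Soft limit: only consulted before a batch is added.
        return (not self.pinned
                and self.unpinned_bytes >= self.max_unpinned_bytes)

    def add_batch(self, batch_bytes: int) -> None:
        if self.is_full():
            raise RuntimeError("queue is full")
        if self.pinned and self.pinned_bytes + batch_bytes > self.max_reservation:
            # Not enough reservation for the batch: unpin the stream so that
            # its pages can be spilled, then keep adding in unpinned mode.
            self.unpinned_bytes += self.pinned_bytes
            self.pinned_bytes = 0
            self.pinned = False
        if self.pinned:
            self.pinned_bytes += batch_bytes
        else:
            self.unpinned_bytes += batch_bytes


q = SpillableQueueModel(max_reservation=100, max_unpinned_bytes=150)
for _ in range(3):
    q.add_batch(60)
print(q.pinned, q.unpinned_bytes, q.is_full())
```

In this run the third batch pushes the unpinned total past the cap, which is the overshoot the commit message refers to; only the next call would be rejected.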

Adds two new query options: 'MAX_PINNED_RESULT_SPOOLING_MEMORY' and
'MAX_UNPINNED_RESULT_SPOOLING_MEMORY', which bound the amount of pinned
and unpinned memory that a query can use for spooling, respectively.
MAX_PINNED_RESULT_SPOOLING_MEMORY must be <=
MAX_UNPINNED_RESULT_SPOOLING_MEMORY in order to allow all the pinned
data in the BufferedTupleStream to be unpinned. This is enforced in a
new method in QueryOptions called 'ValidateQueryOptions'.
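
The constraint between the two options amounts to a single comparison. The sketch below is a Python stand-in for the check, not the C++ 'ValidateQueryOptions' method: the function name, the error type and the message are assumptions, and any special values the real options may accept (such as a value meaning "unlimited") are deliberately not modelled.

```python
def validate_result_spooling_options(max_pinned: int, max_unpinned: int) -> None:
    """Reject settings where the pinned data could not all be unpinned."""
    if max_pinned > max_unpinned:
        raise ValueError(
            "MAX_PINNED_RESULT_SPOOLING_MEMORY (%d) must be <= "
            "MAX_UNPINNED_RESULT_SPOOLING_MEMORY (%d)" % (max_pinned, max_unpinned))


GIB = 1024 ** 3
validate_result_spooling_options(1 * GIB, 4 * GIB)  # accepted
try:
    validate_result_spooling_options(4 * GIB, 1 * GIB)
except ValueError as err:
    print(err)
```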

Planner Changes:

PlanRootSink.java now computes a full ResourceProfile if result spooling
is enabled. The min mem reservation is bounded by the size of the read and
write pages used by the BufferedTupleStream. The max mem reservation is
bounded by 'MAX_PINNED_RESULT_SPOOLING_MEMORY'. The mem estimate is
computed by estimating the size of the result set using stats.

BufferedTupleStream Re-Factoring:

For the most part, using a BufferedTupleStream outside an ExecNode works
properly. However, some changes were necessary:
* The message for the MAX_ROW_SIZE error is ExecNode specific. In order to
fix this, this patch introduces the concept of an ExecNode 'label' which
is a more generic version of an ExecNode 'id'.
* The definition of TBackendResourceProfile lived in PlanNodes.thrift,
it was moved to its own file so it can be used by DataSinks.thrift.
* Modified BufferedTupleStream so it internally tracks how many bytes
are unpinned (necessary for 'MAX_UNPINNED_RESULT_SPOOLING_MEMORY').

Metrics:
* Added a few of the metrics mentioned in IMPALA-8825 to
BufferedPlanRootSink. Specifically, added timers to track how much time
is spent waiting in the BufferedPlanRootSink 'Send' and 'GetNext'
methods.
* The BufferedTupleStream in the SpillableRowBatchQueue exposes several
BufferPool metrics such as number of reserved and unpinned bytes.

Bug Fixes:
* Fixed a bug in BufferedPlanRootSink where the MemPool used by the
expression evaluators was not being cleared incrementally.
* Fixed a bug where the inactive timer was not being properly updated in
BufferedPlanRootSink.
* Fixed a bug where RowBatch memory was not freed if
BufferedPlanRootSink::GetNext terminated early because it could not
handle requests where num_results < BATCH_SIZE.

Testing:
* Added new tests to test_result_spooling.py.
* Updated errors thrown in spilling-large-rows.test.
* Ran exhaustive tests.

Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Reviewed-on: http://gerrit.cloudera.org:8080/14039
Reviewed-by: Sahil Takiar 
Tested-by: Impala Public Jenkins 


> Add additional counters to PlanRootSink
> ---
>
> Key: IMPALA-8825
> URL: https://issues.apache.org/jira/browse/IMPALA-8825
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> The current entry in the runtime profile for {{PLAN_ROOT_SINK}} does not 
> contain much useful information:
> {code:java}
> PLAN_ROOT_SINK:(Total: 234.996ms, non-child: 234.996ms, % non-child: 100.00%)
> - PeakMemoryUsage: 0{code}
> There are several additional counters we could add to the {{PlanRootSink}} 
> (either the {{BufferedPlanRootSink}} or {{BlockingPlanRootSink}}):
>  * Amount of time spent blocking inside the {{PlanRootSink}} - both the time 
> spent by the client thread waiting for rows to become available and the time 
> spent by the impala thread waiting for the client to consume rows
>  ** So similar to the {{RowBatchQueueGetWaitTime}} and 
> {{RowBatchQueuePutWaitTime}} inside the

[jira] [Commented] (IMPALA-8818) Replace deque queue with spillable queue in BufferedPlanRootSink

2019-08-23 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-8818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914748#comment-16914748
 ] 

ASF subversion and git services commented on IMPALA-8818:
-

Commit d037ac8304b43f6e4bb4c6ba2eb1910a9e921c24 in impala's branch 
refs/heads/master from Sahil Takiar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=d037ac8 ]

IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Reviewed-on: http://gerrit.cloudera.org:8080/14039
Reviewed-by: Sahil Takiar 
Tested-by: Impala Public Jenkins 


> Replace deque queue with spillable queue in BufferedPlanRootSink
> 
>
> Key: IMPALA-8818
> URL: https://issues.apache.org/jira/browse/IMPALA-8818
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> Add a {{SpillableRowBatchQueue}} to replace the {{DequeRowBatchQueue}} in 
> {{BufferedPlanRootSink}}. The {{SpillableRowBatchQueue}} will wrap a 
> {{BufferedTupleStream}} and take in a {{TBackendResourceProfile}} created by 
> {{PlanRootSink#computeResourceProfile}}.
> *BufferedTupleStream Usage*:
> The wrapped {{BufferedTupleStream}} should be created in 'attach_on_read' 
> mode so that pages are attached to the output {{RowBatch}} in 
> {{BufferedTupleStream::GetNext}}. The BTS should start off as pinned (e.g. 
> all pages are pinned). If a call to {{BufferedTupleStream::AddRow}} returns 
> false (it returns false if "the unused reservation was not

[jira] [Commented] (IMPALA-8890) DCHECK(!page->attached_to_output_batch) in SpillableRowBatchQueue::AddBatch

2019-08-23 Thread Tim Armstrong (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-8890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914745#comment-16914745
 ] 

Tim Armstrong commented on IMPALA-8890:
---

Also, I guess the reason that this is all so complicated is the need to manage 
the buffer reservation when iterating over a read/write stream, and handle the 
various pinned and unpinned states.

The case of transitioning from having the read & write iterators point to the 
same page to having them point to different pages was complicated because we 
had to keep extra reservation on hand. There were just a lot of states and 
state transitions. The logic around ExpectedPinCount() was intended to make 
this simpler in a way: instead of trying to handle each state transition 
separately, it computes the expected pin count in the new state and then pins 
or unpins things accordingly.
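
The "compute the expected state, then reconcile" idea can be shown in a few lines. This is a simplified Python sketch, not the BufferedTupleStream code: the rule that a page is pinned exactly when the stream is pinned or the page is the current read or write page is an assumption, and {{reconcile_pins}} is a hypothetical helper.

```python
from typing import List, Tuple


def expected_pin_count(stream_pinned: bool, is_read_page: bool,
                       is_write_page: bool) -> int:
    """Assumed rule: pinned stream, read page or write page means one pin."""
    return 1 if (stream_pinned or is_read_page or is_write_page) else 0


def reconcile_pins(pin_counts: List[int], stream_pinned: bool,
                   read_idx: int, write_idx: int) -> List[Tuple[str, int]]:
    """Bring every page to its expected pin count; return the actions taken."""
    actions = []
    for i, current in enumerate(pin_counts):
        want = expected_pin_count(stream_pinned, i == read_idx, i == write_idx)
        if current < want:
            actions.append(("pin", i))
        elif current > want:
            actions.append(("unpin", i))
        pin_counts[i] = want
    return actions


pins = [1, 1, 1, 1]  # fully pinned stream with four pages
print(reconcile_pins(pins, stream_pinned=False, read_idx=0, write_idx=3))
print(pins)
```

The same function handles every transition, since it only looks at the target state and never at how the stream got there.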

> DCHECK(!page->attached_to_output_batch) in SpillableRowBatchQueue::AddBatch 
> 
>
> Key: IMPALA-8890
> URL: https://issues.apache.org/jira/browse/IMPALA-8890
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Affects Versions: Impala 3.4.0
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Blocker
> Attachments: impalad.INFO, resolved.txt
>
>
> Full stack:
> {code}
> F0823 13:19:42.262142 60340 buffered-tuple-stream.cc:291] 6a4941285b46788d:68021ec6] Check failed: !page->attached_to_output_batch
> *** Check failure stack trace: ***
> @  0x4c987cc  google::LogMessage::Fail()
> @  0x4c9a071  google::LogMessage::SendToLog()
> @  0x4c981a6  google::LogMessage::Flush()
> @  0x4c9b76d  google::LogMessageFatal::~LogMessageFatal()
> @  0x2917f78  impala::BufferedTupleStream::ExpectedPinCount()
> @  0x29181ec  impala::BufferedTupleStream::UnpinPageIfNeeded()
> @  0x291b27b  impala::BufferedTupleStream::UnpinStream()
> @  0x297d429  impala::SpillableRowBatchQueue::AddBatch()
> @  0x25d5537  impala::BufferedPlanRootSink::Send()
> @  0x207e94c  impala::FragmentInstanceState::ExecInternal()
> @  0x207afac  impala::FragmentInstanceState::Exec()
> @  0x208e854  impala::QueryState::ExecFInstance()
> @  0x208cb21  _ZZN6impala10QueryState15StartFInstancesEvENKUlvE_clEv
> @  0x2090536  _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala10QueryState15StartFInstancesEvEUlvE_vE6invokeERNS1_15function_bufferE
> @  0x1e9830b  boost::function0<>::operator()()
> @  0x23e2d38  impala::Thread::SuperviseThread()
> @  0x23eb0bc  boost::_bi::list5<>::operator()<>()
> @  0x23eafe0  boost::_bi::bind_t<>::operator()()
> @  0x23eafa3  boost::detail::thread_data<>::run()
> @  0x3bc1629  thread_proxy
> @ 0x7f920a3786b9  start_thread
> @ 0x7f9206b5741c  clone
> {code}
> Happened once while I was running a full table scan of 
> {{tpch_parquet.orders}} via JDBC (client was running on another EC2 machine). 
> This was running on top of IMPALA-8819 with a fetch size of 32768.
> Attached full logs and mini-dump stack.




[jira] [Commented] (IMPALA-8890) DCHECK(!page->attached_to_output_batch) in SpillableRowBatchQueue::AddBatch

2019-08-23 Thread Tim Armstrong (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-8890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914741#comment-16914741
 ] 

Tim Armstrong commented on IMPALA-8890:
---

[~stakiar] yeah I'm pretty sure this is a BTS bug that we haven't seen before 
because the usage patterns of other nodes are different.

The caller of GetNext() does the right thing by processing the returned batch 
and then resetting it before any other BTS methods are called. That would free 
any pages that were attached to the batch.

I think the cleanest way to fix it might be to advance the read page when you 
encounter this situation in UnpinStream(). It should be safe to do that since 
you'll be at the end of the current read page, and then buffer management is 
simplified because the first page in the stream is the one you need to keep 
pinned.
{code}
  if (pinned_) {
    CHECK_CONSISTENCY_FULL();
    if (read_page_ != pages_.end()
        && read_page_rows_returned_ == read_page_->num_rows) {
      RETURN_IF_ERROR(NextReadPage());
    }
  ..
{code}
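
The failure and the proposed fix can be reproduced in a toy model. The Python sketch below is not the BufferedTupleStream implementation: {{StreamModel}} and its methods are invented, reading a page is collapsed into a single call, and keeping only the first page pinned after unpinning is a simplification (the real stream also has a write page to consider).

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Page:
    num_rows: int
    attached_to_output_batch: bool = False
    pinned: bool = True


class StreamModel:
    """Toy pinned stream; the front of the deque is the read page."""

    def __init__(self, rows_per_page):
        self.pages = deque(Page(n) for n in rows_per_page)
        self.read_page_rows_returned = 0
        self.pinned = True

    def get_next(self):
        """Return every row of the read page and attach its buffer."""
        page = self.pages[0]
        self.read_page_rows_returned = page.num_rows
        page.attached_to_output_batch = True

    def next_read_page(self):
        self.pages.popleft()
        self.read_page_rows_returned = 0

    def unpin_stream(self, advance_exhausted_read_page: bool):
        # Proposed fix: step past a read page that was fully returned.
        if (advance_exhausted_read_page and self.pages
                and self.read_page_rows_returned == self.pages[0].num_rows):
            self.next_read_page()
        for i, page in enumerate(self.pages):
            # Stand-in for DCHECK(!page->attached_to_output_batch).
            if page.attached_to_output_batch:
                raise AssertionError("page %d is attached to output batch" % i)
            page.pinned = (i == 0)
        self.pinned = False


buggy = StreamModel([2, 2, 2])
buggy.get_next()
try:
    buggy.unpin_stream(advance_exhausted_read_page=False)
except AssertionError as err:
    print("check failed:", err)

fixed = StreamModel([2, 2, 2])
fixed.get_next()
fixed.unpin_stream(advance_exhausted_read_page=True)
print(len(fixed.pages), [p.pinned for p in fixed.pages])
```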

> DCHECK(!page->attached_to_output_batch) in SpillableRowBatchQueue::AddBatch 
> 
>
> Key: IMPALA-8890
> URL: https://issues.apache.org/jira/browse/IMPALA-8890
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Affects Versions: Impala 3.4.0
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Blocker
> Attachments: impalad.INFO, resolved.txt
>
>
> Full stack:
> {code}
> F0823 13:19:42.262142 60340 buffered-tuple-stream.cc:291] 6a4941285b46788d:68021ec6] Check failed: !page->attached_to_output_batch
> *** Check failure stack trace: ***
> @  0x4c987cc  google::LogMessage::Fail()
> @  0x4c9a071  google::LogMessage::SendToLog()
> @  0x4c981a6  google::LogMessage::Flush()
> @  0x4c9b76d  google::LogMessageFatal::~LogMessageFatal()
> @  0x2917f78  impala::BufferedTupleStream::ExpectedPinCount()
> @  0x29181ec  impala::BufferedTupleStream::UnpinPageIfNeeded()
> @  0x291b27b  impala::BufferedTupleStream::UnpinStream()
> @  0x297d429  impala::SpillableRowBatchQueue::AddBatch()
> @  0x25d5537  impala::BufferedPlanRootSink::Send()
> @  0x207e94c  impala::FragmentInstanceState::ExecInternal()
> @  0x207afac  impala::FragmentInstanceState::Exec()
> @  0x208e854  impala::QueryState::ExecFInstance()
> @  0x208cb21  _ZZN6impala10QueryState15StartFInstancesEvENKUlvE_clEv
> @  0x2090536  _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala10QueryState15StartFInstancesEvEUlvE_vE6invokeERNS1_15function_bufferE
> @  0x1e9830b  boost::function0<>::operator()()
> @  0x23e2d38  impala::Thread::SuperviseThread()
> @  0x23eb0bc  boost::_bi::list5<>::operator()<>()
> @  0x23eafe0  boost::_bi::bind_t<>::operator()()
> @  0x23eafa3  boost::detail::thread_data<>::run()
> @  0x3bc1629  thread_proxy
> @ 0x7f920a3786b9  start_thread
> @ 0x7f9206b5741c  clone
> {code}
> Happened once while I was running a full table scan of 
> {{tpch_parquet.orders}} via JDBC (client was running on another EC2 machine). 
> This was running on top of IMPALA-8819 with a fetch size of 32768.
> Attached full logs and mini-dump stack.




[jira] [Updated] (IMPALA-8891) concat_ws() null handling is non-standard

2019-08-23 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-8891:
--
Labels: newbie  (was: )

> concat_ws() null handling is non-standard
> -
>
> Key: IMPALA-8891
> URL: https://issues.apache.org/jira/browse/IMPALA-8891
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0, Impala 3.3.0
>Reporter: Tim Armstrong
>Priority: Major
>  Labels: newbie
>
> [~grahn] reports
> {quote}Looks like Impala’s CONCAT_WS() does not behave correctly if an 
> argument is NULL — it returns NULL and it should not.  Mismatch between 
> Hive/MySQL and Impala (and apologies for not filing a bug)
> {quote}




[jira] [Created] (IMPALA-8891) concat_ws() null handling is non-standard

2019-08-23 Thread Tim Armstrong (Jira)

Tim Armstrong created IMPALA-8891:
-

 Summary: concat_ws() null handling is non-standard
 Key: IMPALA-8891
 URL: https://issues.apache.org/jira/browse/IMPALA-8891
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 3.2.0, Impala 3.3.0
Reporter: Tim Armstrong


[~grahn] reports

{quote}Looks like Impala’s CONCAT_WS() does not behave correctly if an argument 
is NULL — it returns NULL and it should not.  Mismatch between Hive/MySQL and 
Impala (and apologies for not filing a bug)
{quote}




[jira] [Comment Edited] (IMPALA-8890) DCHECK(!page->attached_to_output_batch) in SpillableRowBatchQueue::AddBatch

2019-08-23 Thread Sahil Takiar (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-8890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914715#comment-16914715
 ] 

Sahil Takiar edited comment on IMPALA-8890 at 8/23/19 11:38 PM:


Can reproduce this pretty consistently now (same setup as above, but with some 
additional logging and the mini cluster changed to have 1 dedicated coordinator 
and 3 executors). Here is what I have found so far:
 * It doesn't *look* like a race condition
 * It seems to only happen when:
 ** Rows are being added to the {{BufferedTupleStream}} successfully, until: 
{{SpillableRowBatchQueue::GetBatch}} >>> {{BufferedTupleStream::GetNext}} >>> 
{{read_page_->AttachBufferToBatch}}
 ** Then, without any additional calls to {{BufferedTupleStream::GetNext}} (I 
think this part may be relevant because everything works if there are 
additional calls to {{GetNext}}), {{SpillableRowBatchQueue::AddBatch}} >>> 
{{BufferedTupleStream::AddRow}} is called repeatedly (for multiple 
{{RowBatch}}-es)
 ** This continues until eventually {{BufferedTupleStream::AddRow}} returns 
false (presumably because the reservation limits have been hit), and then 
{{BufferedTupleStream::UnpinStream}} is called, which eventually hits the 
DCHECK above
 * The DCHECK is hit because:
 ** Looking at the state of the {{Page}}-s in the {{BufferedTupleStream}} it 
looks like the last call to {{BufferedTupleStream::GetNext}} calls 
{{BufferedTupleStream::AttachBufferToBatch}} on the {{read_page_}} which sets 
{{attached_to_output_batch}} to true for the {{Page}}
 ** Then {{UnpinStream}} is called, iterates through all the {{pages_}} and 
sees that the {{read_page_}} has {{attached_to_output_batch}} set to true and 
then fails (I confirmed through logging that it fails specifically on the 
{{read_page_}} that had {{attached_to_output_batch}} set to true above)
 ** *If* there had been an additional call to {{GetNext}}, then 
{{NextReadPage()}} would have been called, which calls 
{{pages_.pop_front()}} and removes the {{read_page_}} with 
{{attached_to_output_batch}} set to true from the list of pages

So *I think* this is a bug in {{BufferedTupleStream}}, unless there is 
something off with the way {{SpillableRowBatchQueue}} is using 
{{BufferedTupleStream}}, wondering what [~tarmstr...@cloudera.com] thinks?
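
The sequence above can be reproduced in a toy model. This is only a sketch in Python under simplifying assumptions, not the real {{BufferedTupleStream}}: the class and method names are hypothetical, a page is attached once all of its rows have been returned, and the attached page is only popped from the list by the following read.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Page:
    """Toy stand-in for a stream page (hypothetical, not Impala's Page)."""
    num_rows: int
    attached_to_output_batch: bool = False


class ToyStream:
    """Simplified attach_on_read stream used only to illustrate the state machine."""

    def __init__(self):
        self.pages = deque()
        self.read_page = None
        self.rows_read = 0  # rows already returned from read_page

    def add_page(self, num_rows):
        self.pages.append(Page(num_rows))

    def get_next(self, max_rows):
        # A page attached by an earlier call is only removed here (NextReadPage).
        if self.read_page is not None and self.read_page.attached_to_output_batch:
            self.pages.popleft()
            self.read_page = None
        if self.read_page is None:
            if not self.pages:
                return 0
            self.read_page = self.pages[0]
            self.rows_read = 0
        n = min(max_rows, self.read_page.num_rows - self.rows_read)
        self.rows_read += n
        if self.rows_read == self.read_page.num_rows:
            # Page fully consumed: its buffer is handed to the output batch.
            self.read_page.attached_to_output_batch = True
        return n

    def unpin_stream(self):
        # Mirrors the failing check: no page still in the list may be attached.
        for page in self.pages:
            if page.attached_to_output_batch:
                raise AssertionError("Check failed: !page->attached_to_output_batch")


def hits_dcheck(extra_get_next):
    stream = ToyStream()
    stream.add_page(100)
    stream.add_page(100)
    stream.get_next(100)      # consumes page 0 entirely; it stays in the list
    if extra_get_next:
        stream.get_next(10)   # pops page 0 and reads part of page 1
    try:
        stream.unpin_stream()
    except AssertionError:
        return True
    return False
```

In this model `hits_dcheck(False)` trips the check and `hits_dcheck(True)` does not, matching the observation that an additional {{GetNext}} call avoids the failure.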

Here is a snippet of the modified logs that may show things more clearly:
{code:java}
I0823 16:08:32.386766 33770 buffered-plan-root-sink.cc:169] Getting Batch
I0823 16:08:32.388576 33770 buffered-plan-root-sink.cc:169] Getting Batch
...
I0823 16:08:32.394234 33770 buffered-tuple-stream.cc:804] Calling AttachBufferToBatch
I0823 16:08:32.394240 33770 buffered-tuple-stream.cc:204] Setting attached_to_output_batch to true for page 0x18cc8880
I0823 16:08:32.394279 33770 buffered-plan-root-sink.cc:209] Returning rows = 32768
I0823 16:08:32.394289 33770 impala-hs2-server.cc:842] FetchResults(): #results=0 has_more=true
I0823 16:08:32.394300 33781 buffered-plan-root-sink.cc:77] f348799ab855e68e:697ddff1] Adding Batch
I0823 16:08:32.395067 33781 buffered-plan-root-sink.cc:77] f348799ab855e68e:697ddff1] Adding Batch
...
I0823 16:08:32.431181 33781 spillable-row-batch-queue.cc:79] f348799ab855e68e:697ddff1] SpillableRowBatchQueue about to start spilling BufferedTupleStream num_rows=1152120 rows_returned=557901 pinned=1 attach_on_read=1 closed=0
 bytes_pinned=102760448 has_write_iterator=1 write_page=0x18dfd510 has_read_iterator=1 read_page=0x18cc8880
 read_page_reservation=0 write_page_reservation=0
 # pages=50 pages=[
{ 0x18cc8880 CLOSED num_rows=12123 retrived_buffer=true attached_to_output_batch=true},
{ 0x1ba6e100 client: 0x13616b88/0x13b921c0 page: { 0x15927f40 len: 2097152 pin_count: 1 buf: 0x15927fb8 client: 0x13616b88/0x13b921c0 data: 0x2740 len: 2097152} num_rows=12107 retrived_buffer=true attached_to_output_batch=false},
...
0823 16:08:32.431262 33781 buffered-tuple-stream.cc:292] f348799ab855e68e:697ddff1] Check failed: !page->attached_to_output_batch check failed for page = 0x18cc8880 CLOSED num_rows=12123 retrived_buffer=true attached_to_output_batch=true
{code}




[jira] [Commented] (IMPALA-8890) DCHECK(!page->attached_to_output_batch) in SpillableRowBatchQueue::AddBatch

2019-08-23 Thread Sahil Takiar (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-8890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914632#comment-16914632
 ] 

Sahil Takiar commented on IMPALA-8890:
--

Could be a bug in the implementation of IMPALA-8819, not sure yet.

> DCHECK(!page->attached_to_output_batch) in SpillableRowBatchQueue::AddBatch 
> 
>
> Key: IMPALA-8890
> URL: https://issues.apache.org/jira/browse/IMPALA-8890
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Affects Versions: Impala 3.4.0
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Blocker
> Attachments: impalad.INFO, resolved.txt
>
>
> Full stack:
> {code}
> F0823 13:19:42.262142 60340 buffered-tuple-stream.cc:291] 6a4941285b46788d:68021ec6] Check failed: !page->attached_to_output_batch
> *** Check failure stack trace: ***
> @  0x4c987cc  google::LogMessage::Fail()
> @  0x4c9a071  google::LogMessage::SendToLog()
> @  0x4c981a6  google::LogMessage::Flush()
> @  0x4c9b76d  google::LogMessageFatal::~LogMessageFatal()
> @  0x2917f78  impala::BufferedTupleStream::ExpectedPinCount()
> @  0x29181ec  impala::BufferedTupleStream::UnpinPageIfNeeded()
> @  0x291b27b  impala::BufferedTupleStream::UnpinStream()
> @  0x297d429  impala::SpillableRowBatchQueue::AddBatch()
> @  0x25d5537  impala::BufferedPlanRootSink::Send()
> @  0x207e94c  impala::FragmentInstanceState::ExecInternal()
> @  0x207afac  impala::FragmentInstanceState::Exec()
> @  0x208e854  impala::QueryState::ExecFInstance()
> @  0x208cb21  _ZZN6impala10QueryState15StartFInstancesEvENKUlvE_clEv
> @  0x2090536  _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala10QueryState15StartFInstancesEvEUlvE_vE6invokeERNS1_15function_bufferE
> @  0x1e9830b  boost::function0<>::operator()()
> @  0x23e2d38  impala::Thread::SuperviseThread()
> @  0x23eb0bc  boost::_bi::list5<>::operator()<>()
> @  0x23eafe0  boost::_bi::bind_t<>::operator()()
> @  0x23eafa3  boost::detail::thread_data<>::run()
> @  0x3bc1629  thread_proxy
> @ 0x7f920a3786b9  start_thread
> @ 0x7f9206b5741c  clone
> {code}
> Happened once while I was running a full table scan of 
> {{tpch_parquet.orders}} via JDBC (client was running on another EC2 machine). 
> This was running on top of IMPALA-8819 with a fetch size of 32768.
> Attached full logs and mini-dump stack.




[jira] [Created] (IMPALA-8890) DCHECK(!page->attached_to_output_batch) in SpillableRowBatchQueue::AddBatch

2019-08-23 Thread Sahil Takiar (Jira)

Sahil Takiar created IMPALA-8890:


 Summary: DCHECK(!page->attached_to_output_batch) in 
SpillableRowBatchQueue::AddBatch 
 Key: IMPALA-8890
 URL: https://issues.apache.org/jira/browse/IMPALA-8890
 Project: IMPALA
  Issue Type: Sub-task
  Components: Backend
Affects Versions: Impala 3.4.0
Reporter: Sahil Takiar
Assignee: Sahil Takiar
 Attachments: impalad.INFO, resolved.txt

Full stack:

{code}
F0823 13:19:42.262142 60340 buffered-tuple-stream.cc:291] 6a4941285b46788d:68021ec6] Check failed: !page->attached_to_output_batch
*** Check failure stack trace: ***
@  0x4c987cc  google::LogMessage::Fail()
@  0x4c9a071  google::LogMessage::SendToLog()
@  0x4c981a6  google::LogMessage::Flush()
@  0x4c9b76d  google::LogMessageFatal::~LogMessageFatal()
@  0x2917f78  impala::BufferedTupleStream::ExpectedPinCount()
@  0x29181ec  impala::BufferedTupleStream::UnpinPageIfNeeded()
@  0x291b27b  impala::BufferedTupleStream::UnpinStream()
@  0x297d429  impala::SpillableRowBatchQueue::AddBatch()
@  0x25d5537  impala::BufferedPlanRootSink::Send()
@  0x207e94c  impala::FragmentInstanceState::ExecInternal()
@  0x207afac  impala::FragmentInstanceState::Exec()
@  0x208e854  impala::QueryState::ExecFInstance()
@  0x208cb21  _ZZN6impala10QueryState15StartFInstancesEvENKUlvE_clEv
@  0x2090536  _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala10QueryState15StartFInstancesEvEUlvE_vE6invokeERNS1_15function_bufferE
@  0x1e9830b  boost::function0<>::operator()()
@  0x23e2d38  impala::Thread::SuperviseThread()
@  0x23eb0bc  boost::_bi::list5<>::operator()<>()
@  0x23eafe0  boost::_bi::bind_t<>::operator()()
@  0x23eafa3  boost::detail::thread_data<>::run()
@  0x3bc1629  thread_proxy
@ 0x7f920a3786b9  start_thread
@ 0x7f9206b5741c  clone
{code}

Happened once while I was running a full table scan of {{tpch_parquet.orders}} 
via JDBC (client was running on another EC2 machine). This was running on top 
of IMPALA-8819 with a fetch size of 32768.

Attached full logs and mini-dump stack.




[jira] [Created] (IMPALA-8889) Incorrect exception message when trying unsupported option for acid tables

2019-08-23 Thread Yongzhi Chen (Jira)

Yongzhi Chen created IMPALA-8889:


 Summary: Incorrect exception message when trying unsupported 
option for acid tables
 Key: IMPALA-8889
 URL: https://issues.apache.org/jira/browse/IMPALA-8889
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Affects Versions: Impala 3.3.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


When we try an unsupported operation (say, ALTER TABLE) on ACID tables, it throws 
an exception, which is expected, but the message is wrong: 
it says we only support read for insert-only tables, which is no longer true, 
since we now also support insert and drop (and soon truncate).




[jira] [Commented] (IMPALA-8691) Query hint for disabling data caching

2019-08-23 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-8691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914553#comment-16914553
 ] 

ASF subversion and git services commented on IMPALA-8691:
-

Commit 9874ce37a989240571e2473dce3153357a0e417f in impala's branch 
refs/heads/master from Michael Ho
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=9874ce3 ]

IMPALA-8691: Query option to disable data cache

This change adds a query option to disable the data cache for
a given session. By default, this option is set to false. When
it's set to true, all queries will by-pass the data cache. This
allows users to avoid polluting the cache for accesses to tables
which they don't want to cache. A follow-up change will add
a per-table query hint to allow caching disabled for a given
table only.

There is some small refactoring in the code to make it clearer
the type of caching being referred to in the code. As the code
stands now, we have both HDFS caching (for local reads) and the
data cache (for remote reads). BufferOpts has been extended to
allow users to explicitly state intention for using either/both
of the caches.

Change-Id: I39122ac38435cedf94b2b39145863764d0b5b6c8
Reviewed-on: http://gerrit.cloudera.org:8080/14015
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins 


> Query hint for disabling data caching
> -
>
> Key: IMPALA-8691
> URL: https://issues.apache.org/jira/browse/IMPALA-8691
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: Michael Ho
>Assignee: Michael Ho
>Priority: Major
>
> IMPALA-8690 tracks the effort for a better eviction algorithm for 
> Impala's data cache. As a short-term workaround, it would be nice to allow 
> users to explicitly set certain tables as not cacheable via query hints or 
> simply disable caching for a query via query options.




[jira] [Assigned] (IMPALA-7506) Support global INVALIDATE METADATA on fetch-on-demand impalad

2019-08-23 Thread Dinesh Garg (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-7506:
---

Assignee: Quanlong Huang

> Support global INVALIDATE METADATA on fetch-on-demand impalad
> -
>
> Key: IMPALA-7506
> URL: https://issues.apache.org/jira/browse/IMPALA-7506
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Todd Lipcon
>Assignee: Quanlong Huang
>Priority: Major
>  Labels: catalog-v2
>
> There is some complexity with how this is implemented in the original code: 
> it depends on maintaining the minimum version of any object in the impalad's 
> local cache. We can't determine that in an on-demand impalad, so INVALIDATE 
> METADATA is not supported currently on "fetch-on-demand".




[jira] [Commented] (IMPALA-7604) In AggregationNode.computeStats, handle cardinality overflow better

2019-08-23 Thread Tim Armstrong (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-7604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914550#comment-16914550
 ] 

Tim Armstrong commented on IMPALA-7604:
---

Looked again, this is kinda nasty - it can actually overflow and get set to 0 
in some cases.

> In AggregationNode.computeStats, handle cardinality overflow better
> ---
>
> Key: IMPALA-7604
> URL: https://issues.apache.org/jira/browse/IMPALA-7604
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.12.0
>Reporter: Paul Rogers
>Assignee: Tim Armstrong
>Priority: Major
>
> Consider the cardinality overflow logic in 
> [{{AggregationNode.computeStats()}}|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/AggregationNode.java].
>  Current code:
> {noformat}
> // if we ended up with an overflow, the estimate is certain to be wrong
> if (cardinality_ < 0) cardinality_ = -1;
> {noformat}
> This code has a number of issues.
> * The check is done after looping over all conjuncts. It could be that, as a 
> result, the number overflowed twice. The check should be done after each 
> multiplication.
> * Since we know that the number overflowed, a better estimate of the total 
> count is {{Long.MAX_VALUE}}.
> * The code later checks for the -1 value and, if found, uses the cardinality 
> of the first child. This is a worse estimate than using the max value, since 
> the first child might have a low cardinality (it could be the later children 
> that caused the overflow.)
> * If we really do expect overflow, then we are dealing with very large 
> numbers. Being accurate to the row is not needed. Better to use a {{double}} 
> which can handle the large values.
> Since overflow probably seldom occurs, this is not an urgent issue. Though, 
> if overflow does occur, the query is huge, and having at least some estimate 
> of the hugeness is better than none. Also, seems that this code probably 
> evolved; this newbie is looking at it fresh and seeing that the accumulated 
> fixes could be tidied up.
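
The first two points in the description can be illustrated with a short sketch. This is a minimal model in Python, not the planner code: Python integers do not overflow, so the hypothetical `wrap_long` helper emulates Java's signed 64-bit `long`, and the inputs are assumed to be non-negative cardinality factors.

```python
LONG_MAX = 2**63 - 1  # same value as Java's Long.MAX_VALUE


def wrap_long(x):
    """Wrap an integer to a signed 64-bit value, as Java long arithmetic does."""
    x &= (1 << 64) - 1
    return x - (1 << 64) if x >= (1 << 63) else x


def multiply_wrapping(values):
    """Multiply with silent wraparound and no check until the end."""
    result = 1
    for v in values:
        result = wrap_long(result * v)
    return result


def multiply_saturating(values):
    """Check after each multiplication and clamp to LONG_MAX on overflow."""
    result = 1
    for v in values:
        result = min(result * v, LONG_MAX)
    return result
```

For example, `multiply_wrapping([2**32, 2**32])` evaluates to 0, so a single `< 0` check after the loop would not notice the overflow, while `multiply_saturating` on the same input returns `LONG_MAX`.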




[jira] [Work started] (IMPALA-8885) Improve parquet version metadata error

2019-08-23 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8885 started by Tim Armstrong.
-
> Improve parquet version metadata error
> --
>
> Key: IMPALA-8885
> URL: https://issues.apache.org/jira/browse/IMPALA-8885
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: supportability
>
> The error looks like this now:
> {noformat}
> File 'hdfs://nn/books/books1G.csv' has an invalid version number: .99 This 
> could be due to stale metadata. Try running "refresh s3db.books_s3".
> {noformat}
> It seems to be reasonably common that this happens because a non-parquet file 
> is being queried in a parquet table.
> The error message should say something like "File 
> 'hdfs://nn/books/books1G.csv' has an invalid Parquet version number: .99 and 
> does not appear to be a valid Parquet file. This could be due to stale 
> metadata. Try running "refresh s3db.books_s3"."
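
One way such a check could look is sketched below. This is a rough Python illustration, not Impala's scanner code: the function name and parameters are hypothetical, and the only format fact relied on is that a Parquet file ends with the four magic bytes `PAR1`. Treating the trailing bytes as the reported "version number" is an assumption made for the sketch.

```python
PARQUET_MAGIC = b"PAR1"  # Parquet files start and end with these four bytes


def footer_error(path, file_bytes, table):
    """Return an error message if file_bytes does not end with the Parquet magic."""
    tail = file_bytes[-len(PARQUET_MAGIC):]
    if tail == PARQUET_MAGIC:
        return None
    # Show the offending bytes in printable form (assumption: ASCII is good enough).
    shown = tail.decode("ascii", errors="replace").strip()
    return (
        f"File '{path}' has an invalid Parquet version number: {shown} and does "
        f"not appear to be a valid Parquet file. This could be due to stale "
        f'metadata. Try running "refresh {table}".'
    )
```

Calling it on the bytes of a small CSV file that ends in `9.99` followed by a newline produces the proposed wording with `.99` as the reported value.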




[jira] [Resolved] (IMPALA-7027) Multiple Cast to Varchar with different limit fails with "AnalysisException: null CAUSED BY: IllegalArgumentException: "

2019-08-23 Thread Yongzhi Chen (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen resolved IMPALA-7027.
--
Fix Version/s: Impala 3.4.0
   Resolution: Fixed

> Multiple Cast to Varchar with different limit fails with "AnalysisException: 
> null CAUSED BY: IllegalArgumentException: "
> 
>
> Key: IMPALA-7027
> URL: https://issues.apache.org/jira/browse/IMPALA-7027
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, Impala 
> 2.11.0, Impala 3.0, Impala 2.12.0
>Reporter: Meenakshi
>Assignee: Yongzhi Chen
>Priority: Major
>  Labels: planner, regression
> Fix For: Impala 3.4.0
>
>
> If an Impala query with a DISTINCT contains multiple casts of '' to VARCHAR, 
> as below, the query fails when a cast's VARCHAR limit is lower than that of 
> the previous cast.
>  
> Query 1> Fails with "AnalysisException: null CAUSED BY: 
> IllegalArgumentException: targetType=VARCHAR(100) type=VARCHAR(101)"
> SELECT DISTINCT CAST('' as VARCHAR(101)) as CL_COMMENTS,CAST('' as 
> VARCHAR(100))  as CL_USER_ID FROM tablename limit 1
> Whereas the query below succeeds:
> Query 2> Success
>  SELECT DISTINCT CAST('' as VARCHAR(100)) as CL_COMMENTS,CAST('' as 
> VARCHAR(101))  as CL_USER_ID FROM  tablename limit 1
> *Workaround*
> SET ENABLE_EXPR_REWRITES=false;




[jira] [Commented] (IMPALA-8888) Profile fetch performance when result spooling is enabled

2019-08-23 Thread Sahil Takiar (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-8888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914506#comment-16914506
 ] 

Sahil Takiar commented on IMPALA-8888:
--

After talking with Tim offline, it seems that using a JDBC driver might be 
better than impala-shell (impala-shell is slow enough that server side perf 
improvements to this code probably don't affect latency). So will benchmark 
with JDBC instead.

> Profile fetch performance when result spooling is enabled
> -
>
> Key: IMPALA-8888
> URL: https://issues.apache.org/jira/browse/IMPALA-8888
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> Profile the performance of fetching rows when result spooling is enabled. 
> There are a few queries that can be used to benchmark the performance:
> {{time ./bin/impala-shell.sh -B -q "select l_orderkey from 
> tpch_parquet.lineitem" > /dev/null}}
> {{time ./bin/impala-shell.sh -B -q "select * from tpch_parquet.orders" > 
> /dev/null}}
> The first fetches one column and 6,001,215 rows; the second fetches 9 columns 
> and 1,500,000 rows, so a mix of rows fetched vs. columns fetched.
> The base line for the benchmark should be the commit prior to IMPALA-8780.
> The benchmark should check for both latency and CPU usage (to see if the copy 
> into {{BufferedTupleStream}} has a significant overhead).
> Various fetch sizes should be used in the benchmark as well to see if 
> increasing the fetch size for result spooling improves performance (ideally 
> it should) (it would be nice to run some fetches between machines as well as 
> that will better reflect network round trip latencies).
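
To get a feel for why the fetch size matters, the number of fetch requests per query can be estimated from the row counts in the description. This is a back-of-the-envelope sketch in Python: the helper name is hypothetical, and it assumes every fetch returns a full batch of `fetch_size` rows, which the server is not guaranteed to do.

```python
# Row counts quoted in the issue description.
ROW_COUNTS = {
    "select l_orderkey from tpch_parquet.lineitem": 6_001_215,
    "select * from tpch_parquet.orders": 1_500_000,
}


def fetch_round_trips(num_rows, fetch_size):
    """Minimum number of fetch requests needed to return num_rows rows."""
    return (num_rows + fetch_size - 1) // fetch_size  # integer ceiling division


def round_trip_table(fetch_sizes):
    """Map each (query, fetch_size) pair to its estimated request count."""
    return {
        (query, size): fetch_round_trips(rows, size)
        for query, rows in ROW_COUNTS.items()
        for size in fetch_sizes
    }
```

Under that assumption, the `lineitem` query needs at least 5,861 requests with a fetch size of 1024 (a value picked only for comparison) but 184 with a fetch size of 32768, so fetches between machines should show the effect of round-trip latency more clearly.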




[jira] [Resolved] (IMPALA-8887) Conflicting links to issue tracker

2019-08-23 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8887.
---
Resolution: Fixed

Thanks for catching this; we obviously missed it when migrating that page a 
couple of years ago.

> Conflicting links to issue tracker
> --
>
> Key: IMPALA-8887
> URL: https://issues.apache.org/jira/browse/IMPALA-8887
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Sebb
>Assignee: Tim Armstrong
>Priority: Major
>
> The following CWiki page has two links for reporting issues:
> https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala
> "please open a new JIRA ticket at the Impala JIRA tracker"
> and
> "Impala has a very active JIRA instance."
> The former links to https://issues.cloudera.org/projects/IMPALA/
> whereas the latter links to https://issues.apache.org/jira/projects/IMPALA/
> The cloudera tracker does not have any recent tickets so I assume it is no 
> longer used, and should not be referenced.
> Ideally flag the old tracker as obsolete with a link to the new tracker.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Assigned] (IMPALA-8887) Conflicting links to issue tracker

2019-08-23 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-8887:
-

Assignee: Tim Armstrong

> Conflicting links to issue tracker
> --
>
> Key: IMPALA-8887
> URL: https://issues.apache.org/jira/browse/IMPALA-8887
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Sebb
>Assignee: Tim Armstrong
>Priority: Major
>
> The following CWiki page has two links for reporting issues:
> https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala
> "please open a new JIRA ticket at the Impala JIRA tracker"
> and
> "Impala has a very active JIRA instance."
> The former links to https://issues.cloudera.org/projects/IMPALA/
> whereas the latter links to https://issues.apache.org/jira/projects/IMPALA/
> The cloudera tracker does not have any recent tickets so I assume it is no 
> longer used, and should not be referenced.
> Ideally flag the old tracker as obsolete with a link to the new tracker.




[jira] [Assigned] (IMPALA-8886) Please delete old releases from mirroring system

2019-08-23 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-8886:
-

Assignee: Quanlong Huang

> Please delete old releases from mirroring system
> 
>
> Key: IMPALA-8886
> URL: https://issues.apache.org/jira/browse/IMPALA-8886
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Sebb
>Assignee: Quanlong Huang
>Priority: Major
>
> To reduce the load on the ASF mirrors, projects are required to delete old 
> releases [1]
> Please can you remove all non-current releases?
> i.e. all but 3.3.0
> It's unfair to expect the 3rd party mirrors to carry old releases.
> However you can still link to the archives for historic releases.
> Please also update your release procedures (if relevant)
> Thanks!
> [1] [http://www.apache.org/dev/release.html#when-to-archive]




[jira] [Assigned] (IMPALA-7312) Non-blocking mode for Fetch() RPC

2019-08-23 Thread Tim Armstrong (Jira)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-7312:
-

Assignee: Sahil Takiar

> Non-blocking mode for Fetch() RPC
> -
>
> Key: IMPALA-7312
> URL: https://issues.apache.org/jira/browse/IMPALA-7312
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Clients
>Reporter: Tim Armstrong
>Assignee: Sahil Takiar
>Priority: Major
>  Labels: resource-management
>
> Currently Fetch() can block for an arbitrary amount of time until a batch of 
> rows is produced. It might be helpful to have a mode where it returns quickly 
> when there is no data available, so that threads and RPC slots are not tied 
> up.
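
The behaviour requested above can be illustrated with a minimal sketch. This is not Impala's Fetch() implementation or API; the class ResultQueue and its methods are hypothetical names used only to contrast a fetch that blocks until a batch arrives with one that returns promptly when no data is available.

```python
import queue


class ResultQueue:
    """Hypothetical stand-in for a server-side queue of row batches."""

    def __init__(self):
        self._batches = queue.Queue()

    def add_batch(self, rows):
        # Called by the producer when a batch of rows is ready.
        self._batches.put(rows)

    def fetch(self, timeout_s=None):
        """Return (rows, has_data).

        timeout_s=None blocks until a batch is produced (the current
        behaviour described in the issue). A numeric timeout returns
        ([], False) once the timeout expires, so the calling thread is
        not tied up while the producer is still working.
        """
        try:
            if timeout_s is None:
                return self._batches.get(), True
            return self._batches.get(timeout=timeout_s), True
        except queue.Empty:
            return [], False
```

A client of such a mode would be expected to retry the fetch later when has_data is False, instead of holding a thread or RPC slot while it waits.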




[jira] [Created] (IMPALA-8888) Profile fetch performance when result spooling is enabled

2019-08-23 Thread Sahil Takiar (Jira)

Sahil Takiar created IMPALA-:


 Summary: Profile fetch performance when result spooling is enabled
 Key: IMPALA-
 URL: https://issues.apache.org/jira/browse/IMPALA-
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Sahil Takiar
Assignee: Sahil Takiar


Profile the performance of fetching rows when result spooling is enabled. There 
are a few queries that can be used to benchmark the performance:

{{time ./bin/impala-shell.sh -B -q "select l_orderkey from 
tpch_parquet.lineitem" > /dev/null}}

{{time ./bin/impala-shell.sh -B -q "select * from tpch_parquet.orders" > 
/dev/null}}

The first fetches one column and 6,001,215 rows; the second fetches 9 columns 
and 1,500,000 rows, so together they cover a mix of rows fetched vs. columns 
fetched.

The baseline for the benchmark should be the commit prior to IMPALA-8780.

The benchmark should check for both latency and CPU usage (to see if the copy 
into {{BufferedTupleStream}} has a significant overhead).

Various fetch sizes should be used in the benchmark as well, to see whether 
increasing the fetch size for result spooling improves performance (ideally it 
should). It would also be nice to run some fetches between machines, as that 
will better reflect network round-trip latencies.
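
A rough sketch of a timing harness for the two commands above follows. It is an assumption-laden illustration, not part of the issue: the helper names time_command and run_benchmark are hypothetical, the CPU figure covers only the impala-shell client process (not the impalad daemons, which would need to be measured separately), and os.times() reports child CPU time only on Unix-like systems. Varying the fetch size is left out because the issue does not say how it would be set.

```python
import os
import subprocess
import time

# The two benchmark queries quoted in the issue description.
QUERIES = [
    "select l_orderkey from tpch_parquet.lineitem",
    "select * from tpch_parquet.orders",
]


def time_command(argv, runs=3):
    """Run argv `runs` times, discarding stdout.

    Returns one dict per run with wall-clock seconds and the CPU seconds
    (user + system) consumed by the child process.
    """
    results = []
    for _ in range(runs):
        cpu_before = os.times()
        wall_before = time.perf_counter()
        subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
        wall = time.perf_counter() - wall_before
        cpu_after = os.times()
        cpu = ((cpu_after.children_user - cpu_before.children_user)
               + (cpu_after.children_system - cpu_before.children_system))
        results.append({"wall_s": wall, "client_cpu_s": cpu})
    return results


def run_benchmark(shell="./bin/impala-shell.sh", runs=3):
    """Time each query through impala-shell in -B (delimited) mode."""
    for query in QUERIES:
        for result in time_command([shell, "-B", "-q", query], runs=runs):
            print(query, result)
```

Calling run_benchmark() once on the baseline commit and once on the commit under test, from the Impala source root, would give comparable numbers for the same queries.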




[jira] [Commented] (IMPALA-8754) S3 with S3Guard tests encounter "ResourceNotFoundException" from DynamoDB

2019-08-23 Thread Steve Loughran (Jira)



[ 
https://issues.apache.org/jira/browse/IMPALA-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914206#comment-16914206
 ] 

Steve Loughran commented on IMPALA-8754:


The DDB table wasn't found. Either:
# the table doesn't exist, or
# the table does exist, but it is in a different region.

S3Guard infers the region of the table to be that of the bucket; if you are 
reading data from buckets in other regions, the inference will be wrong.

There's an option to fix the table region: {{fs.s3a.s3guard.ddb.region}}. 
You need to set this to the region where the table is.
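
As a small illustration of the setting mentioned above, the following sketch renders a Hadoop-style property entry for it. The property name comes from the comment; the helper name s3guard_region_property and the region value "us-west-2" are placeholders chosen for the example, and the region must be replaced with wherever the DynamoDB table actually lives.

```python
import xml.etree.ElementTree as ET


def s3guard_region_property(region):
    """Build a <property> element pinning the S3Guard DynamoDB table region."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "fs.s3a.s3guard.ddb.region"
    ET.SubElement(prop, "value").text = region
    return ET.tostring(prop, encoding="unicode")


# "us-west-2" is only a placeholder region for the example.
print(s3guard_region_property("us-west-2"))
```

The resulting element would normally sit inside the configuration block of the Hadoop site file used by the test environment.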

> S3 with S3Guard tests encounter "ResourceNotFoundException" from DynamoDB
> -
>
> Key: IMPALA-8754
> URL: https://issues.apache.org/jira/browse/IMPALA-8754
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.3.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Critical
>  Labels: broken-build, flaky
>
> When running tests on s3 with s3guard, various tests can encounter the 
> following error coming from the DynamoDB:
> {noformat}
> EQuery aborted:Disk I/O error on 
> impala-ec2-centos74-m5-4xlarge-ondemand-02c8.vpc.cloudera.com:22002: Failed 
> to open HDFS file 
> s3a://impala-test-uswest2-1/test-warehouse/tpcds.store_sales_parquet/ss_sold_date_sk=2451718/6843d8a91fc5ae1d-88b2af4b0004_156969840_data.0.parq
> E   Error(2): No such file or directory
> E   Root cause: ResourceNotFoundException: Requested resource not found 
> (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ResourceNotFoundException; Request ID: 
> XXX){noformat}
> Tests that have seen this (this is flaky):
>  * TestTpcdsQuery.test_tpcds_count
>  * TestHdfsFdCaching.test_caching_disabled_by_param
>  * TestMtDop.test_compute_stats
>  * TestScanRangeLengths.test_scan_ranges




[jira] [Created] (IMPALA-8887) Conflicting links to issue tracker

2019-08-23 Thread Sebb (Jira)

Sebb created IMPALA-8887:


 Summary: Conflicting links to issue tracker
 Key: IMPALA-8887
 URL: https://issues.apache.org/jira/browse/IMPALA-8887
 Project: IMPALA
  Issue Type: Bug
Reporter: Sebb


The following CWiki page has two links for reporting issues:

https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala

"please open a new JIRA ticket at the Impala JIRA tracker"

and

"Impala has a very active JIRA instance."

The former links to https://issues.cloudera.org/projects/IMPALA/
whereas the latter links to https://issues.apache.org/jira/projects/IMPALA/

The cloudera tracker does not have any recent tickets so I assume it is no 
longer used, and should not be referenced.

Ideally flag the old tracker as obsolete with a link to the new tracker.




[jira] [Created] (IMPALA-8886) Please delete old releases from mirroring system

2019-08-23 Thread Sebb (Jira)

Sebb created IMPALA-8886:


 Summary: Please delete old releases from mirroring system
 Key: IMPALA-8886
 URL: https://issues.apache.org/jira/browse/IMPALA-8886
 Project: IMPALA
  Issue Type: Bug
Reporter: Sebb


To reduce the load on the ASF mirrors, projects are required to delete old 
releases [1]

Please can you remove all non-current releases?

i.e. all but 3.3.0

It's unfair to expect the 3rd party mirrors to carry old releases.

However you can still link to the archives for historic releases.

Please also update your release procedures (if relevant)

Thanks!

[1] [http://www.apache.org/dev/release.html#when-to-archive]




