[jira] [Closed] (IMPALA-8818) Replace deque queue with spillable queue in BufferedPlanRootSink
[ https://issues.apache.org/jira/browse/IMPALA-8818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar closed IMPALA-8818.
---
Fix Version/s: Impala 3.4.0
Resolution: Fixed

> Replace deque queue with spillable queue in BufferedPlanRootSink
>
> Key: IMPALA-8818
> URL: https://issues.apache.org/jira/browse/IMPALA-8818
> Project: IMPALA
> Issue Type: Sub-task
> Components: Backend
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
> Priority: Major
> Fix For: Impala 3.4.0
>
> Add a {{SpillableRowBatchQueue}} to replace the {{DequeRowBatchQueue}} in
> {{BufferedPlanRootSink}}. The {{SpillableRowBatchQueue}} will wrap a
> {{BufferedTupleStream}} and take in a {{TBackendResourceProfile}} created by
> {{PlanRootSink#computeResourceProfile}}.
>
> *BufferedTupleStream Usage*:
> The wrapped {{BufferedTupleStream}} should be created in 'attach_on_read'
> mode so that pages are attached to the output {{RowBatch}} in
> {{BufferedTupleStream::GetNext}}. The BTS should start off as pinned (i.e.
> all pages are pinned). If a call to {{BufferedTupleStream::AddRow}} returns
> false (it returns false if "the unused reservation was not sufficient to add
> a new page to the stream large enough to fit 'row' and the stream could not
> increase the reservation to get enough unused reservation"), it should unpin
> the stream ({{BufferedTupleStream::UnpinStream}}) and then add the row (if
> the row still could not be added, then an error must have occurred, perhaps
> an IO error, in which case return the error and fail the query).
>
> *Constraining Resources*:
> When result spooling is disabled, a user can run a {{select * from
> [massive-fact-table]}} and scroll through the results without affecting the
> health of the Impala cluster (assuming they close the query promptly).
> Impala will stream the results one batch at a time to the user.
> With result spooling, a naive implementation might try to buffer the entire
> fact table and end up spilling all the contents to disk, which can
> potentially take up a large amount of space. So there need to be
> restrictions on the memory and disk space used by the {{BufferedTupleStream}}
> in order to ensure a scan of a massive table does not consume all the memory
> or disk space of the Impala coordinator.
> This problem can be solved by placing a max size on the amount of unpinned
> memory (perhaps through a new config option
> {{MAX_PINNED_RESULT_SPOOLING_MEMORY}}, maybe set to a few GBs by default).
> The max amount of pinned memory should already be constrained by the
> reservation (see next paragraph). NUM_ROWS_PRODUCED_LIMIT already limits the
> number of rows returned by a query, and so it should limit the number of rows
> buffered by the BTS as well (although it is set to 0 by default).
> SCRATCH_LIMIT already limits the amount of disk space used for spilling
> (although it is set to -1 by default).
> The {{PlanRootSink}} should attempt to accurately estimate how much memory it
> needs to buffer all results in memory. This requires setting an accurate
> value of {{ResourceProfile#memEstimateBytes_}} in
> {{PlanRootSink#computeResourceProfile}}. If statistics are available, the
> estimate can be based on the number of estimated rows returned multiplied by
> the size of the rows returned. The min reservation should account for a read
> and write page for the {{BufferedTupleStream}}.

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
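The AddRow / UnpinStream protocol described in the ticket (stay pinned until the reservation runs out, then unpin once and retry, and fail the query only if the retry also fails) can be modeled with a small Python sketch. All names and the row-count "reservations" are illustrative stand-ins, not Impala's actual C++ API:

```python
class SpillableQueueModel:
    """Toy model of the unpin-on-failure protocol: stay pinned while the
    pinned reservation allows, unpin (enabling spill-to-disk) once it
    runs out, and fail only if the unpinned limit is also exhausted."""

    def __init__(self, pinned_capacity_rows, max_unpinned_rows):
        self.pinned = True
        self.rows = []
        self.pinned_capacity_rows = pinned_capacity_rows
        self.max_unpinned_rows = max_unpinned_rows

    def _try_add(self, row):
        # Pinned stream: fails once the pinned reservation is exhausted.
        if self.pinned and len(self.rows) >= self.pinned_capacity_rows:
            return False
        # Unpinned stream: bounded by the unpinned (spill) limit instead.
        if not self.pinned and len(self.rows) >= self.max_unpinned_rows:
            return False
        self.rows.append(row)
        return True

    def add_row(self, row):
        if self._try_add(row):
            return
        # First failure: unpin the stream and retry once, per the design.
        if self.pinned:
            self.pinned = False
            if self._try_add(row):
                return
        # Retry also failed: treat as a query-fatal error (e.g. an IO error).
        raise RuntimeError("out of spooling capacity")
```

With a pinned capacity of 2 rows and an unpinned limit of 5, the third `add_row` flips the model to unpinned (the "start spilling" moment), and the sixth fails the query.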
[jira] [Commented] (IMPALA-8891) concat_ws() null handling is non-standard
[ https://issues.apache.org/jira/browse/IMPALA-8891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914757#comment-16914757 ]

Greg Rahn commented on IMPALA-8891:
---

{noformat}
MariaDB [(none)]> select version();
+----------------+
| version()      |
+----------------+
| 10.4.6-MariaDB |
+----------------+

MariaDB [(none)]> select concat_ws('-','foo',null,'bar') as expr1;
+---------+
| expr1   |
+---------+
| foo-bar |
+---------+
{noformat}
{noformat}
impala> select version();
+-----------------------------------------------------------------------------------------+
| version()                                                                               |
+-----------------------------------------------------------------------------------------+
| impalad version 3.3.0-SNAPSHOT RELEASE (build df3e7c051e2641524fc53a0cd07c2a14decd55f7) |
| Built on Thu Aug 22 19:28:57 UTC 2019                                                   |
+-----------------------------------------------------------------------------------------+

impala> select concat_ws('-','foo',null,'bar') as expr1;
+-------+
| expr1 |
+-------+
| NULL  |
+-------+
{noformat}
{noformat}
hive> select version();
+----------------------------------------------------------------+
| _c0                                                            |
+----------------------------------------------------------------+
| 3.1.2000.7.0.0.0-463 r7db8023511683e2b30c31bcb6ad5b372b1876eab |
+----------------------------------------------------------------+

hive> select concat_ws('-','foo',null,'bar') as expr1;
+---------+
| expr1   |
+---------+
| foo-bar |
+---------+
{noformat}

> concat_ws() null handling is non-standard
>
> Key: IMPALA-8891
> URL: https://issues.apache.org/jira/browse/IMPALA-8891
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 3.2.0, Impala 3.3.0
> Reporter: Tim Armstrong
> Priority: Major
> Labels: newbie
>
> [~grahn] reports
> {quote}Looks like Impala’s CONCAT_WS() does not behave correctly if an
> argument is NULL — it returns NULL and it should not. Mismatch between
> Hive/MySQL and Impala (and apologies for not filing a bug)
> {quote}

--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
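For reference, the Hive/MySQL behavior shown above (NULL arguments are skipped rather than nulling out the whole result; only a NULL separator yields NULL) can be captured in a few lines of Python. This is a semantic sketch with None standing in for SQL NULL, not Impala's implementation:

```python
def concat_ws(sep, *args):
    """Hive/MySQL-style concat_ws: None (NULL) arguments are skipped;
    only a None separator makes the whole result None (NULL)."""
    if sep is None:
        return None
    return sep.join(a for a in args if a is not None)
```

Under these semantics `concat_ws('-', 'foo', None, 'bar')` returns `'foo-bar'`, matching the MariaDB and Hive output above, whereas Impala currently returns NULL.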
[jira] [Commented] (IMPALA-8885) Improve parquet version metadata error
[ https://issues.apache.org/jira/browse/IMPALA-8885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914750#comment-16914750 ]

Tim Armstrong commented on IMPALA-8885:
---

The new error template is:
{code}
  ("PARQUET_BAD_VERSION_NUMBER", 60, "File '$0' has an invalid Parquet version number: "
      "$1\\n. Please check that it is a valid Parquet file. "
      "This error can also occur due to stale metadata. "
      "If you believe this is a valid Parquet file, try running \\\"refresh $2\\\"."),
{code}

> Improve parquet version metadata error
>
> Key: IMPALA-8885
> URL: https://issues.apache.org/jira/browse/IMPALA-8885
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Reporter: Tim Armstrong
> Assignee: Tim Armstrong
> Priority: Major
> Labels: supportability
> Fix For: Impala 3.4.0
>
> The error looks like this now:
> {noformat}
> File 'hdfs://nn/books/books1G.csv' has an invalid version number: .99 This
> could be due to stale metadata. Try running "refresh s3db.books_s3".
> {noformat}
> It seems to be reasonably common that this happens because a non-parquet file
> is being queried in a parquet table.
> The error message should say something like "File
> 'hdfs://nn/books/books1G.csv' has an invalid Parquet version number: .99 and
> does not appear to be a valid Parquet file. This could be due to stale
> metadata. Try running "refresh s3db.books_s3"

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
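The $0/$1/$2 placeholders in the template are positional substitution arguments (file path, version number, table name). A minimal Python stand-in for the substitution (the helper name `render_error` and the simplified template text are illustrative, not Impala's actual code) shows how the final message is assembled:

```python
def render_error(template, *args):
    """Minimal stand-in for the $0/$1/... positional substitution used by
    Impala's error templates. Substitutes highest indices first so that
    "$1" is never matched inside a longer placeholder like "$10"."""
    out = template
    for i in sorted(range(len(args)), reverse=True):
        out = out.replace("$%d" % i, str(args[i]))
    return out

# Simplified rendering of the template above (escapes removed for clarity).
TEMPLATE = ("File '$0' has an invalid Parquet version number: $1.\n"
            "Please check that it is a valid Parquet file. "
            "This error can also occur due to stale metadata. "
            "If you believe this is a valid Parquet file, "
            "try running \"refresh $2\".")
```

Calling `render_error(TEMPLATE, "hdfs://nn/books/books1G.csv", ".99", "s3db.books_s3")` produces a message naming the file, the bad version string, and the refresh command, mirroring the example in the issue description.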
[jira] [Resolved] (IMPALA-8885) Improve parquet version metadata error
[ https://issues.apache.org/jira/browse/IMPALA-8885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong resolved IMPALA-8885.
---
Fix Version/s: Impala 3.4.0
Resolution: Fixed

> Improve parquet version metadata error
>
> Key: IMPALA-8885
> URL: https://issues.apache.org/jira/browse/IMPALA-8885
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Reporter: Tim Armstrong
> Assignee: Tim Armstrong
> Priority: Major
> Labels: supportability
> Fix For: Impala 3.4.0
>
> The error looks like this now:
> {noformat}
> File 'hdfs://nn/books/books1G.csv' has an invalid version number: .99 This
> could be due to stale metadata. Try running "refresh s3db.books_s3".
> {noformat}
> It seems to be reasonably common that this happens because a non-parquet file
> is being queried in a parquet table.
> The error message should say something like "File
> 'hdfs://nn/books/books1G.csv' has an invalid Parquet version number: .99 and
> does not appear to be a valid Parquet file. This could be due to stale
> metadata. Try running "refresh s3db.books_s3"

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
[jira] [Commented] (IMPALA-8885) Improve parquet version metadata error
[ https://issues.apache.org/jira/browse/IMPALA-8885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914747#comment-16914747 ]

ASF subversion and git services commented on IMPALA-8885:
---

Commit af0e04f33bbf2e93b7676ed7768c335c49b195f2 in impala's branch refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=af0e04f ]

IMPALA-8885: Improve Parquet version metadata error

Update the error message to make it more obvious that the error could
occur by trying to parse a non-Parquet file as Parquet.

Updated tests that depended on the error text.

Change-Id: I2b36586dba14a31a613d79a0e28efc9a5173e75d
Reviewed-on: http://gerrit.cloudera.org:8080/14126
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins

> Improve parquet version metadata error
>
> Key: IMPALA-8885
> URL: https://issues.apache.org/jira/browse/IMPALA-8885
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Reporter: Tim Armstrong
> Assignee: Tim Armstrong
> Priority: Major
> Labels: supportability
>
> The error looks like this now:
> {noformat}
> File 'hdfs://nn/books/books1G.csv' has an invalid version number: .99 This
> could be due to stale metadata. Try running "refresh s3db.books_s3".
> {noformat}
> It seems to be reasonably common that this happens because a non-parquet file
> is being queried in a parquet table.
> The error message should say something like "File
> 'hdfs://nn/books/books1G.csv' has an invalid Parquet version number: .99 and
> does not appear to be a valid Parquet file. This could be due to stale
> metadata. Try running "refresh s3db.books_s3"

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
[jira] [Commented] (IMPALA-8825) Add additional counters to PlanRootSink
[ https://issues.apache.org/jira/browse/IMPALA-8825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914749#comment-16914749 ]

ASF subversion and git services commented on IMPALA-8825:
---

Commit d037ac8304b43f6e4bb4c6ba2eb1910a9e921c24 in impala's branch refs/heads/master from Sahil Takiar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=d037ac8 ]

IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Replaces DequeRowBatchQueue with SpillableRowBatchQueue in
BufferedPlanRootSink. A few changes to BufferedPlanRootSink were
necessary for it to work with the spillable queue; however, all the
synchronization logic is the same.

SpillableRowBatchQueue is a wrapper around a BufferedTupleStream and a
ReservationManager. It takes in a TBackendResourceProfile that specifies
the max / min memory reservation the BufferedTupleStream can use to
buffer rows. The 'max_unpinned_bytes' parameter limits the max number of
bytes that can be unpinned in the BufferedTupleStream. The limit is a
'soft' limit because calls to AddBatch may push the amount of unpinned
memory over the limit.

The queue is non-blocking and not thread safe. It provides AddBatch and
GetBatch methods. Calls to AddBatch spill if the BufferedTupleStream
does not have enough reservation to fit the entire RowBatch.

Adds two new query options: 'MAX_PINNED_RESULT_SPOOLING_MEMORY' and
'MAX_UNPINNED_RESULT_SPOOLING_MEMORY', which bound the amount of pinned
and unpinned memory that a query can use for spooling, respectively.
MAX_PINNED_RESULT_SPOOLING_MEMORY must be <=
MAX_UNPINNED_RESULT_SPOOLING_MEMORY in order to allow all the pinned
data in the BufferedTupleStream to be unpinned. This is enforced in a
new method in QueryOptions called 'ValidateQueryOptions'.

Planner Changes:
PlanRootSink.java now computes a full ResourceProfile if result spooling
is enabled. The min mem reservation is bounded by the size of the read
and write pages used by the BufferedTupleStream. The max mem reservation
is bounded by 'MAX_PINNED_RESULT_SPOOLING_MEMORY'. The mem estimate is
computed by estimating the size of the result set using stats.

BufferedTupleStream Re-Factoring:
For the most part, using a BufferedTupleStream outside an ExecNode works
properly. However, some changes were necessary:
* The message for the MAX_ROW_SIZE error is ExecNode specific. In order
to fix this, this patch introduces the concept of an ExecNode 'label'
which is a more generic version of an ExecNode 'id'.
* The definition of TBackendResourceProfile lived in PlanNodes.thrift;
it was moved to its own file so it can be used by DataSinks.thrift.
* Modified BufferedTupleStream so it internally tracks how many bytes
are unpinned (necessary for 'MAX_UNPINNED_RESULT_SPOOLING_MEMORY').

Metrics:
* Added a few of the metrics mentioned in IMPALA-8825 to
BufferedPlanRootSink. Specifically, added timers to track how much time
is spent waiting in the BufferedPlanRootSink 'Send' and 'GetNext'
methods.
* The BufferedTupleStream in the SpillableRowBatchQueue exposes several
BufferPool metrics such as number of reserved and unpinned bytes.

Bug Fixes:
* Fixed a bug in BufferedPlanRootSink where the MemPool used by the
expression evaluators was not being cleared incrementally.
* Fixed a bug where the inactive timer was not being properly updated in
BufferedPlanRootSink.
* Fixed a bug where RowBatch memory was not freed if
BufferedPlanRootSink::GetNext terminated early because it could not
handle requests where num_results < BATCH_SIZE.

Testing:
* Added new tests to test_result_spooling.py.
* Updated errors thrown in spilling-large-rows.test.
* Ran exhaustive tests.

Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Reviewed-on: http://gerrit.cloudera.org:8080/14039
Reviewed-by: Sahil Takiar
Tested-by: Impala Public Jenkins

> Add additional counters to PlanRootSink
>
> Key: IMPALA-8825
> URL: https://issues.apache.org/jira/browse/IMPALA-8825
> Project: IMPALA
> Issue Type: Sub-task
> Components: Backend
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
> Priority: Major
>
> The current entry in the runtime profile for {{PLAN_ROOT_SINK}} does not
> contain much useful information:
> {code:java}
> PLAN_ROOT_SINK:(Total: 234.996ms, non-child: 234.996ms, % non-child: 100.00%)
>    - PeakMemoryUsage: 0{code}
> There are several additional counters we could add to the {{PlanRootSink}}
> (either the {{BufferedPlanRootSink}} or {{BlockingPlanRootSink}}):
> * Amount of time spent blocking inside the {{PlanRootSink}} - both the time
> spent by the client thread waiting for rows to become available and the time
> spent by the impala thread waiting for the client to consume rows
> ** So similar to the {{RowBatchQueueGetWaitTime}} and
> {{RowBatchQueuePutWaitTime}} inside the
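The commit requires MAX_PINNED_RESULT_SPOOLING_MEMORY <= MAX_UNPINNED_RESULT_SPOOLING_MEMORY so that everything pinned in the BufferedTupleStream can still be unpinned without breaching the unpinned cap. That validation can be sketched in a few lines of Python; the function name is a hypothetical stand-in for the check described in 'ValidateQueryOptions', not Impala's actual code:

```python
def validate_result_spooling_options(max_pinned_bytes, max_unpinned_bytes):
    """Model of the constraint from the commit message: the pinned limit
    must not exceed the unpinned limit, otherwise unpinning the whole
    stream could blow past the unpinned cap."""
    if max_pinned_bytes > max_unpinned_bytes:
        raise ValueError(
            "MAX_PINNED_RESULT_SPOOLING_MEMORY (%d) must be <= "
            "MAX_UNPINNED_RESULT_SPOOLING_MEMORY (%d)"
            % (max_pinned_bytes, max_unpinned_bytes))
```

For example, a 100 MB pinned limit with a 1 GB unpinned limit validates, while the reverse is rejected before the query runs.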
[jira] [Commented] (IMPALA-8818) Replace deque queue with spillable queue in BufferedPlanRootSink
[ https://issues.apache.org/jira/browse/IMPALA-8818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914748#comment-16914748 ]

ASF subversion and git services commented on IMPALA-8818:
---

Commit d037ac8304b43f6e4bb4c6ba2eb1910a9e921c24 in impala's branch refs/heads/master from Sahil Takiar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=d037ac8 ]

IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Replaces DequeRowBatchQueue with SpillableRowBatchQueue in
BufferedPlanRootSink. A few changes to BufferedPlanRootSink were
necessary for it to work with the spillable queue; however, all the
synchronization logic is the same.

SpillableRowBatchQueue is a wrapper around a BufferedTupleStream and a
ReservationManager. It takes in a TBackendResourceProfile that specifies
the max / min memory reservation the BufferedTupleStream can use to
buffer rows. The 'max_unpinned_bytes' parameter limits the max number of
bytes that can be unpinned in the BufferedTupleStream. The limit is a
'soft' limit because calls to AddBatch may push the amount of unpinned
memory over the limit.

The queue is non-blocking and not thread safe. It provides AddBatch and
GetBatch methods. Calls to AddBatch spill if the BufferedTupleStream
does not have enough reservation to fit the entire RowBatch.

Adds two new query options: 'MAX_PINNED_RESULT_SPOOLING_MEMORY' and
'MAX_UNPINNED_RESULT_SPOOLING_MEMORY', which bound the amount of pinned
and unpinned memory that a query can use for spooling, respectively.
MAX_PINNED_RESULT_SPOOLING_MEMORY must be <=
MAX_UNPINNED_RESULT_SPOOLING_MEMORY in order to allow all the pinned
data in the BufferedTupleStream to be unpinned. This is enforced in a
new method in QueryOptions called 'ValidateQueryOptions'.

Planner Changes:
PlanRootSink.java now computes a full ResourceProfile if result spooling
is enabled. The min mem reservation is bounded by the size of the read
and write pages used by the BufferedTupleStream. The max mem reservation
is bounded by 'MAX_PINNED_RESULT_SPOOLING_MEMORY'. The mem estimate is
computed by estimating the size of the result set using stats.

BufferedTupleStream Re-Factoring:
For the most part, using a BufferedTupleStream outside an ExecNode works
properly. However, some changes were necessary:
* The message for the MAX_ROW_SIZE error is ExecNode specific. In order
to fix this, this patch introduces the concept of an ExecNode 'label'
which is a more generic version of an ExecNode 'id'.
* The definition of TBackendResourceProfile lived in PlanNodes.thrift;
it was moved to its own file so it can be used by DataSinks.thrift.
* Modified BufferedTupleStream so it internally tracks how many bytes
are unpinned (necessary for 'MAX_UNPINNED_RESULT_SPOOLING_MEMORY').

Metrics:
* Added a few of the metrics mentioned in IMPALA-8825 to
BufferedPlanRootSink. Specifically, added timers to track how much time
is spent waiting in the BufferedPlanRootSink 'Send' and 'GetNext'
methods.
* The BufferedTupleStream in the SpillableRowBatchQueue exposes several
BufferPool metrics such as number of reserved and unpinned bytes.

Bug Fixes:
* Fixed a bug in BufferedPlanRootSink where the MemPool used by the
expression evaluators was not being cleared incrementally.
* Fixed a bug where the inactive timer was not being properly updated in
BufferedPlanRootSink.
* Fixed a bug where RowBatch memory was not freed if
BufferedPlanRootSink::GetNext terminated early because it could not
handle requests where num_results < BATCH_SIZE.

Testing:
* Added new tests to test_result_spooling.py.
* Updated errors thrown in spilling-large-rows.test.
* Ran exhaustive tests.

Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Reviewed-on: http://gerrit.cloudera.org:8080/14039
Reviewed-by: Sahil Takiar
Tested-by: Impala Public Jenkins

> Replace deque queue with spillable queue in BufferedPlanRootSink
>
> Key: IMPALA-8818
> URL: https://issues.apache.org/jira/browse/IMPALA-8818
> Project: IMPALA
> Issue Type: Sub-task
> Components: Backend
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
> Priority: Major
>
> Add a {{SpillableRowBatchQueue}} to replace the {{DequeRowBatchQueue}} in
> {{BufferedPlanRootSink}}. The {{SpillableRowBatchQueue}} will wrap a
> {{BufferedTupleStream}} and take in a {{TBackendResourceProfile}} created by
> {{PlanRootSink#computeResourceProfile}}.
> *BufferedTupleStream Usage*:
> The wrapped {{BufferedTupleStream}} should be created in 'attach_on_read'
> mode so that pages are attached to the output {{RowBatch}} in
> {{BufferedTupleStream::GetNext}}. The BTS should start off as pinned (e.g.
> all pages are pinned). If a call to {{BufferedTupleStream::AddRow}} returns
> false (it returns false if "the unused reservation was not
[jira] [Commented] (IMPALA-8890) DCHECK(!page->attached_to_output_batch) in SpillableRowBatchQueue::AddBatch
[ https://issues.apache.org/jira/browse/IMPALA-8890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914745#comment-16914745 ]

Tim Armstrong commented on IMPALA-8890:
---

Also, I guess the reason that this is all so complicated is the need to manage
the buffer reservation when iterating over a read/write stream, and handle the
various pinned and unpinned states. The cases when transitioning from having
read & write iterators pointing to the same page to different pages were
complicated because we had to keep extra reservation on hand. There were just
a lot of states and state transitions. The logic around ExpectedPinCount() was
intended to make this simpler in a way - instead of trying to handle each
state transition separately, it computes the expected state in the new state
and then pins or unpins things accordingly.

> DCHECK(!page->attached_to_output_batch) in SpillableRowBatchQueue::AddBatch
>
> Key: IMPALA-8890
> URL: https://issues.apache.org/jira/browse/IMPALA-8890
> Project: IMPALA
> Issue Type: Sub-task
> Components: Backend
> Affects Versions: Impala 3.4.0
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
> Priority: Blocker
> Attachments: impalad.INFO, resolved.txt
>
> Full stack:
> {code}
> F0823 13:19:42.262142 60340 buffered-tuple-stream.cc:291] 6a4941285b46788d:68021ec6] Check failed: !page->attached_to_output_batch
> *** Check failure stack trace: ***
>     @  0x4c987cc  google::LogMessage::Fail()
>     @  0x4c9a071  google::LogMessage::SendToLog()
>     @  0x4c981a6  google::LogMessage::Flush()
>     @  0x4c9b76d  google::LogMessageFatal::~LogMessageFatal()
>     @  0x2917f78  impala::BufferedTupleStream::ExpectedPinCount()
>     @  0x29181ec  impala::BufferedTupleStream::UnpinPageIfNeeded()
>     @  0x291b27b  impala::BufferedTupleStream::UnpinStream()
>     @  0x297d429  impala::SpillableRowBatchQueue::AddBatch()
>     @  0x25d5537  impala::BufferedPlanRootSink::Send()
>     @  0x207e94c  impala::FragmentInstanceState::ExecInternal()
>     @  0x207afac  impala::FragmentInstanceState::Exec()
>     @  0x208e854  impala::QueryState::ExecFInstance()
>     @  0x208cb21  _ZZN6impala10QueryState15StartFInstancesEvENKUlvE_clEv
>     @  0x2090536  _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala10QueryState15StartFInstancesEvEUlvE_vE6invokeERNS1_15function_bufferE
>     @  0x1e9830b  boost::function0<>::operator()()
>     @  0x23e2d38  impala::Thread::SuperviseThread()
>     @  0x23eb0bc  boost::_bi::list5<>::operator()<>()
>     @  0x23eafe0  boost::_bi::bind_t<>::operator()()
>     @  0x23eafa3  boost::detail::thread_data<>::run()
>     @  0x3bc1629  thread_proxy
>     @  0x7f920a3786b9  start_thread
>     @  0x7f9206b5741c  clone
> {code}
> Happened once while I was running a full table scan of
> {{tpch_parquet.orders}} via JDBC (client was running on another EC2 machine).
> This was running on top of IMPALA-8819 with a fetch size of 32768.
> Attached full logs and mini-dump stack.

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
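Tim's description of the ExpectedPinCount() design (compute the pin count a page *should* have in the new state, then pin or unpin to match, instead of enumerating every transition) can be illustrated with a toy Python model. This is a deliberate simplification; the real BufferedTupleStream has more cases (attached pages, read/write iterators sharing a page, and so on):

```python
def expected_pin_count(stream_pinned, is_read_page, is_write_page):
    """Declarative pin policy: in a fully pinned stream every page is
    pinned; in an unpinned stream only the pages under the read or write
    iterator stay pinned. (Simplified model of the real function.)"""
    if stream_pinned:
        return 1
    return 1 if (is_read_page or is_write_page) else 0

def reconcile(current_pin_count, expected):
    """Pin or unpin a page so its state matches the computed expectation,
    rather than handling each state transition separately."""
    if current_pin_count < expected:
        return "pin"
    if current_pin_count > expected:
        return "unpin"
    return "noop"
```

Unpinning a stream then reduces to recomputing `expected_pin_count` for each page in the new (unpinned) state and reconciling, which is where the DCHECK fires if a page is in an unexpected state such as already being attached to an output batch.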
[jira] [Commented] (IMPALA-8890) DCHECK(!page->attached_to_output_batch) in SpillableRowBatchQueue::AddBatch
[ https://issues.apache.org/jira/browse/IMPALA-8890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914741#comment-16914741 ]

Tim Armstrong commented on IMPALA-8890:
---

[~stakiar] yeah I'm pretty sure this is a BTS bug that we haven't seen before
because the usage patterns of other nodes are different. The caller of
GetNext() does the right thing by processing the returned batch and then
resetting it before any other BTS methods are called. That would free any
pages that were attached to the batch.

I think the cleanest way to fix it might be to advance the read page when you
encounter this situation in UnpinStream(). It should be safe to do that since
you'll be at the end of the current read page, and then buffer management is
simplified because the first page in the stream is the one you need to keep
pinned.
{code}
  if (pinned_) {
    CHECK_CONSISTENCY_FULL();
    if (read_page_ != pages_.end()
        && read_page_rows_returned_ == read_page_->num_rows) {
      RETURN_IF_ERROR(NextReadPage());
    }
    ..
{code}

> DCHECK(!page->attached_to_output_batch) in SpillableRowBatchQueue::AddBatch
>
> Key: IMPALA-8890
> URL: https://issues.apache.org/jira/browse/IMPALA-8890
> Project: IMPALA
> Issue Type: Sub-task
> Components: Backend
> Affects Versions: Impala 3.4.0
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
> Priority: Blocker
> Attachments: impalad.INFO, resolved.txt
>
> Full stack:
> {code}
> F0823 13:19:42.262142 60340 buffered-tuple-stream.cc:291] 6a4941285b46788d:68021ec6] Check failed: !page->attached_to_output_batch
> *** Check failure stack trace: ***
>     @  0x4c987cc  google::LogMessage::Fail()
>     @  0x4c9a071  google::LogMessage::SendToLog()
>     @  0x4c981a6  google::LogMessage::Flush()
>     @  0x4c9b76d  google::LogMessageFatal::~LogMessageFatal()
>     @  0x2917f78  impala::BufferedTupleStream::ExpectedPinCount()
>     @  0x29181ec  impala::BufferedTupleStream::UnpinPageIfNeeded()
>     @  0x291b27b  impala::BufferedTupleStream::UnpinStream()
>     @  0x297d429  impala::SpillableRowBatchQueue::AddBatch()
>     @  0x25d5537  impala::BufferedPlanRootSink::Send()
>     @  0x207e94c  impala::FragmentInstanceState::ExecInternal()
>     @  0x207afac  impala::FragmentInstanceState::Exec()
>     @  0x208e854  impala::QueryState::ExecFInstance()
>     @  0x208cb21  _ZZN6impala10QueryState15StartFInstancesEvENKUlvE_clEv
>     @  0x2090536  _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala10QueryState15StartFInstancesEvEUlvE_vE6invokeERNS1_15function_bufferE
>     @  0x1e9830b  boost::function0<>::operator()()
>     @  0x23e2d38  impala::Thread::SuperviseThread()
>     @  0x23eb0bc  boost::_bi::list5<>::operator()<>()
>     @  0x23eafe0  boost::_bi::bind_t<>::operator()()
>     @  0x23eafa3  boost::detail::thread_data<>::run()
>     @  0x3bc1629  thread_proxy
>     @  0x7f920a3786b9  start_thread
>     @  0x7f9206b5741c  clone
> {code}
> Happened once while I was running a full table scan of
> {{tpch_parquet.orders}} via JDBC (client was running on another EC2 machine).
> This was running on top of IMPALA-8819 with a fetch size of 32768.
> Attached full logs and mini-dump stack.

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
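The suggested fix can be modeled in a short Python sketch: if the read page is exhausted and its buffer was already attached to the caller's RowBatch, advance past it before checking that no remaining page is attached. The `assert` plays the role of the DCHECK from the bug report; the class and function names are illustrative, not Impala's actual code:

```python
class Page:
    """Toy stand-in for a BufferedTupleStream page."""
    def __init__(self, num_rows):
        self.num_rows = num_rows
        self.attached_to_output_batch = False

def unpin_stream(pages, rows_returned_from_read_page):
    """Model of the fix: drop an exhausted read page whose buffer was
    attached to an output batch (NextReadPage() in the real code) before
    asserting that no surviving page is attached (the DCHECK)."""
    read_page = pages[0] if pages else None
    if (read_page is not None
            and read_page.attached_to_output_batch
            and rows_returned_from_read_page == read_page.num_rows):
        pages.pop(0)  # stand-in for NextReadPage(): discard consumed page
    # The DCHECK that fired in IMPALA-8890.
    for page in pages:
        assert not page.attached_to_output_batch
    return pages
```

With the guard in place, the sequence from the repro (GetNext attaches the exhausted read page, then AddRow failures trigger UnpinStream without another GetNext) no longer trips the assertion, because the attached page is dropped first.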
[jira] [Updated] (IMPALA-8891) concat_ws() null handling is non-standard
[ https://issues.apache.org/jira/browse/IMPALA-8891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong updated IMPALA-8891:
---
Labels: newbie  (was: )

> concat_ws() null handling is non-standard
>
> Key: IMPALA-8891
> URL: https://issues.apache.org/jira/browse/IMPALA-8891
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 3.2.0, Impala 3.3.0
> Reporter: Tim Armstrong
> Priority: Major
> Labels: newbie
>
> [~grahn] reports
> {quote}Looks like Impala’s CONCAT_WS() does not behave correctly if an
> argument is NULL — it returns NULL and it should not. Mismatch between
> Hive/MySQL and Impala (and apologies for not filing a bug)
> {quote}

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
[jira] [Created] (IMPALA-8891) concat_ws() null handling is non-standard
Tim Armstrong created IMPALA-8891:
---
Summary: concat_ws() null handling is non-standard
Key: IMPALA-8891
URL: https://issues.apache.org/jira/browse/IMPALA-8891
Project: IMPALA
Issue Type: Bug
Components: Backend
Affects Versions: Impala 3.2.0, Impala 3.3.0
Reporter: Tim Armstrong

[~grahn] reports
{quote}Looks like Impala’s CONCAT_WS() does not behave correctly if an
argument is NULL — it returns NULL and it should not. Mismatch between
Hive/MySQL and Impala (and apologies for not filing a bug)
{quote}

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
[jira] [Comment Edited] (IMPALA-8890) DCHECK(!page->attached_to_output_batch) in SpillableRowBatchQueue::AddBatch
[ https://issues.apache.org/jira/browse/IMPALA-8890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914715#comment-16914715 ] Sahil Takiar edited comment on IMPALA-8890 at 8/23/19 11:38 PM:
Can reproduce this pretty consistently now (same setup as above, but with some additional logging added and the mini cluster changed to 1 dedicated coordinator and 3 executors). Here is what I have found so far:
* It doesn't *look* like a race condition
* It seems to only happen when:
** Rows are being added to the {{BufferedTupleStream}} successfully, until: {{SpillableRowBatchQueue::GetBatch}} --> {{BufferedTupleStream::GetNext}} --> {{read_page_->AttachBufferToBatch}}
** Then, without any additional calls to {{BufferedTupleStream::GetNext}} (I think this part may be relevant, because everything works if there are additional calls to {{GetNext}}), {{SpillableRowBatchQueue::AddBatch}} --> {{BufferedTupleStream::AddRow}} is called repeatedly (for multiple {{RowBatch}}-es)
** This continues until eventually {{BufferedTupleStream::AddRow}} returns false (presumably because the reservation limits have been hit), and then {{BufferedTupleStream::UnpinStream}} is called, which eventually hits the DCHECK above
* The DCHECK is hit because:
** Looking at the state of the {{Page}}-s in the {{BufferedTupleStream}}, it looks like the last call to {{BufferedTupleStream::GetNext}} calls {{BufferedTupleStream::AttachBufferToBatch}} on the {{read_page_}}, which sets {{attached_to_output_batch}} to true for that {{Page}}
** Then {{UnpinStream}} is called, iterates through all the {{pages_}}, sees that the {{read_page_}} has {{attached_to_output_batch}} set to true, and fails (I confirmed through logging that it fails specifically on the {{read_page_}} that had {{attached_to_output_batch}} set to true above)
** *If* there had been an additional call to {{GetNext}}, then {{NextReadPage()}} would have been called, which calls {{pages_.pop_front()}} and removes the {{read_page_}} with {{attached_to_output_batch}} set to true from the list of pages

So *I think* this is a bug in {{BufferedTupleStream}}, unless there is something off with the way {{SpillableRowBatchQueue}} is using {{BufferedTupleStream}}; wondering what [~tarmstr...@cloudera.com] thinks? Here is a snippet of the modified logs that may show things more clearly:
{code:java}
I0823 16:08:32.386766 33770 buffered-plan-root-sink.cc:169] Getting Batch
I0823 16:08:32.388576 33770 buffered-plan-root-sink.cc:169] Getting Batch
...
I0823 16:08:32.394234 33770 buffered-tuple-stream.cc:804] Calling AttachBufferToBatch
I0823 16:08:32.394240 33770 buffered-tuple-stream.cc:204] Setting attached_to_output_batch to true for page 0x18cc8880
I0823 16:08:32.394279 33770 buffered-plan-root-sink.cc:209] Returning rows = 32768
I0823 16:08:32.394289 33770 impala-hs2-server.cc:842] FetchResults(): #results=0 has_more=true
I0823 16:08:32.394300 33781 buffered-plan-root-sink.cc:77] f348799ab855e68e:697ddff1] Adding Batch
I0823 16:08:32.395067 33781 buffered-plan-root-sink.cc:77] f348799ab855e68e:697ddff1] Adding Batch
...
I0823 16:08:32.431181 33781 spillable-row-batch-queue.cc:79] f348799ab855e68e:697ddff1] SpillableRowBatchQueue about to start spilling BufferedTupleStream num_rows=1152120 rows_returned=557901 pinned=1 attach_on_read=1 closed=0 bytes_pinned=102760448 has_write_iterator=1 write_page=0x18dfd510 has_read_iterator=1 read_page=0x18cc8880 read_page_reservation=0 write_page_reservation=0 # pages=50 pages=[ { 0x18cc8880 CLOSED num_rows=12123 retrived_buffer=true attached_to_output_batch=true}, { 0x1ba6e100 client: 0x13616b88/0x13b921c0 page: { 0x15927f40 len: 2097152 pin_count: 1 buf: 0x15927fb8 client: 0x13616b88/0x13b921c0 data: 0x2740 len: 2097152} num_rows=12107 retrived_buffer=true attached_to_output_batch=false}, ...
I0823 16:08:32.431262 33781 buffered-tuple-stream.cc:292] f348799ab855e68e:697ddff1] Check failed: !page->attached_to_output_batch check failed for page = 0x18cc8880 CLOSED num_rows=12123 retrived_buffer=true attached_to_output_batch=true
{code}
[jira] [Commented] (IMPALA-8890) DCHECK(!page->attached_to_output_batch) in SpillableRowBatchQueue::AddBatch
[ https://issues.apache.org/jira/browse/IMPALA-8890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914632#comment-16914632 ] Sahil Takiar commented on IMPALA-8890: -- Could be a bug in the implementation of IMPALA-8819, not sure yet. > DCHECK(!page->attached_to_output_batch) in SpillableRowBatchQueue::AddBatch > > > Key: IMPALA-8890 > URL: https://issues.apache.org/jira/browse/IMPALA-8890 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Affects Versions: Impala 3.4.0 >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Blocker > Attachments: impalad.INFO, resolved.txt > > > Full stack: > {code} > F0823 13:19:42.262142 60340 buffered-tuple-stream.cc:291] > 6a4941285b46788d:68021ec6] Check failed: > !page->attached_to_output_batch > *** Check failure stack trace: *** > @ 0x4c987cc google::LogMessage::Fail() > @ 0x4c9a071 google::LogMessage::SendToLog() > @ 0x4c981a6 google::LogMessage::Flush() > @ 0x4c9b76d google::LogMessageFatal::~LogMessageFatal() > @ 0x2917f78 impala::BufferedTupleStream::ExpectedPinCount() > @ 0x29181ec impala::BufferedTupleStream::UnpinPageIfNeeded() > @ 0x291b27b impala::BufferedTupleStream::UnpinStream() > @ 0x297d429 impala::SpillableRowBatchQueue::AddBatch() > @ 0x25d5537 impala::BufferedPlanRootSink::Send() > @ 0x207e94c impala::FragmentInstanceState::ExecInternal() > @ 0x207afac impala::FragmentInstanceState::Exec() > @ 0x208e854 impala::QueryState::ExecFInstance() > @ 0x208cb21 > _ZZN6impala10QueryState15StartFInstancesEvENKUlvE_clEv > @ 0x2090536 > _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala10QueryState15StartFInstancesEvEUlvE_vE6invokeERNS1_15function_bufferE > @ 0x1e9830b boost::function0<>::operator()() > @ 0x23e2d38 impala::Thread::SuperviseThread() > @ 0x23eb0bc boost::_bi::list5<>::operator()<>() > @ 0x23eafe0 boost::_bi::bind_t<>::operator()() > @ 0x23eafa3 boost::detail::thread_data<>::run() > @ 0x3bc1629 thread_proxy > @ 0x7f920a3786b9 start_thread > @ 
0x7f9206b5741c clone > {code} > Happened once while I was running a full table scan of > {{tpch_parquet.orders}} via JDBC (client was running on another EC2 machine). > This was running on top of IMPALA-8819 with a fetch size of 32768. > Attached full logs and mini-dump stack. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-8890) DCHECK(!page->attached_to_output_batch) in SpillableRowBatchQueue::AddBatch
Sahil Takiar created IMPALA-8890: Summary: DCHECK(!page->attached_to_output_batch) in SpillableRowBatchQueue::AddBatch Key: IMPALA-8890 URL: https://issues.apache.org/jira/browse/IMPALA-8890 Project: IMPALA Issue Type: Sub-task Components: Backend Affects Versions: Impala 3.4.0 Reporter: Sahil Takiar Assignee: Sahil Takiar Attachments: impalad.INFO, resolved.txt Full stack: {code} F0823 13:19:42.262142 60340 buffered-tuple-stream.cc:291] 6a4941285b46788d:68021ec6] Check failed: !page->attached_to_output_batch *** Check failure stack trace: *** @ 0x4c987cc google::LogMessage::Fail() @ 0x4c9a071 google::LogMessage::SendToLog() @ 0x4c981a6 google::LogMessage::Flush() @ 0x4c9b76d google::LogMessageFatal::~LogMessageFatal() @ 0x2917f78 impala::BufferedTupleStream::ExpectedPinCount() @ 0x29181ec impala::BufferedTupleStream::UnpinPageIfNeeded() @ 0x291b27b impala::BufferedTupleStream::UnpinStream() @ 0x297d429 impala::SpillableRowBatchQueue::AddBatch() @ 0x25d5537 impala::BufferedPlanRootSink::Send() @ 0x207e94c impala::FragmentInstanceState::ExecInternal() @ 0x207afac impala::FragmentInstanceState::Exec() @ 0x208e854 impala::QueryState::ExecFInstance() @ 0x208cb21 _ZZN6impala10QueryState15StartFInstancesEvENKUlvE_clEv @ 0x2090536 _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala10QueryState15StartFInstancesEvEUlvE_vE6invokeERNS1_15function_bufferE @ 0x1e9830b boost::function0<>::operator()() @ 0x23e2d38 impala::Thread::SuperviseThread() @ 0x23eb0bc boost::_bi::list5<>::operator()<>() @ 0x23eafe0 boost::_bi::bind_t<>::operator()() @ 0x23eafa3 boost::detail::thread_data<>::run() @ 0x3bc1629 thread_proxy @ 0x7f920a3786b9 start_thread @ 0x7f9206b5741c clone {code} Happened once while I was running a full table scan of {{tpch_parquet.orders}} via JDBC (client was running on another EC2 machine). This was running on top of IMPALA-8819 with a fetch size of 32768. Attached full logs and mini-dump stack. 
[jira] [Created] (IMPALA-8889) Incorrect exception message when trying unsupported option for acid tables
Yongzhi Chen created IMPALA-8889: Summary: Incorrect exception message when trying unsupported option for acid tables Key: IMPALA-8889 URL: https://issues.apache.org/jira/browse/IMPALA-8889 Project: IMPALA Issue Type: Bug Components: Frontend Affects Versions: Impala 3.3.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen When we try an unsupported option, say ALTER TABLE, on ACID tables from , it throws an exception, which is expected, but it gives a wrong message: it says we only support Read for insert-only tables, which is not true anymore, since we now also support insert and drop (and soon truncate). -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (IMPALA-8691) Query hint for disabling data caching
[ https://issues.apache.org/jira/browse/IMPALA-8691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914553#comment-16914553 ] ASF subversion and git services commented on IMPALA-8691: - Commit 9874ce37a989240571e2473dce3153357a0e417f in impala's branch refs/heads/master from Michael Ho [ https://gitbox.apache.org/repos/asf?p=impala.git;h=9874ce3 ] IMPALA-8691: Query option to disable data cache This change adds a query option to disable the data cache for a given session. By default, this option is set to false. When it's set to true, all queries will by-pass the data cache. This allows users to avoid polluting the cache for accesses to tables which they don't want to cache. A follow-up change will add a per-table query hint to allow caching disabled for a given table only. There is some small refactoring in the code to make it clearer the type of caching being referred to in the code. As the code stands now, we have both HDFS caching (for local reads) and the data cache (for remote reads). BufferOpts has been extended to allow users to explicitly state intention for using either/both of the caches. Change-Id: I39122ac38435cedf94b2b39145863764d0b5b6c8 Reviewed-on: http://gerrit.cloudera.org:8080/14015 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins > Query hint for disabling data caching > - > > Key: IMPALA-8691 > URL: https://issues.apache.org/jira/browse/IMPALA-8691 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 3.3.0 >Reporter: Michael Ho >Assignee: Michael Ho >Priority: Major > > IMPALA-8690 tracks the effort for a better eviction algorithm for the > Impala's data cache. As a short term workaround, it would be nice to allow > users to explicitly set certain tables as not cacheable via query hints or > simply disable caching for a query via query options. 
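The cache-bypass behavior the commit message describes can be sketched roughly as follows (illustrative Python; the option name and the `BufferOpts` fields here are stand-ins for Impala's actual C++ API, which the commit only summarizes):

```python
# Hedged sketch of the behavior described in the commit: a per-session query
# option makes all reads bypass the remote data cache, so scans of tables the
# user does not want cached stop evicting hot cache entries. Field and option
# names are illustrative, not Impala's real identifiers.

from dataclasses import dataclass

@dataclass
class BufferOpts:
    use_hdfs_cache: bool = True   # HDFS caching, used for local reads
    use_data_cache: bool = True   # data cache, used for remote reads

def make_buffer_opts(session_options: dict) -> BufferOpts:
    # When the session disables the data cache, remote reads skip it entirely;
    # HDFS caching for local reads is unaffected.
    disabled = session_options.get("disable_data_cache", False)
    return BufferOpts(use_data_cache=not disabled)

print(make_buffer_opts({}).use_data_cache)                            # True
print(make_buffer_opts({"disable_data_cache": True}).use_data_cache)  # False
```

The follow-up per-table hint mentioned in the commit would then narrow this same decision from the whole session down to individual scans.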
[jira] [Assigned] (IMPALA-7506) Support global INVALIDATE METADATA on fetch-on-demand impalad
[ https://issues.apache.org/jira/browse/IMPALA-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinesh Garg reassigned IMPALA-7506: --- Assignee: Quanlong Huang > Support global INVALIDATE METADATA on fetch-on-demand impalad > - > > Key: IMPALA-7506 > URL: https://issues.apache.org/jira/browse/IMPALA-7506 > Project: IMPALA > Issue Type: Sub-task >Reporter: Todd Lipcon >Assignee: Quanlong Huang >Priority: Major > Labels: catalog-v2 > > There is some complexity with how this is implemented in the original code: > it depends on maintaining the minimum version of any object in the impalad's > local cache. We can't determine that in an on-demand impalad, so INVALIDATE > METADATA is not supported currently on "fetch-on-demand". -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (IMPALA-7604) In AggregationNode.computeStats, handle cardinality overflow better
[ https://issues.apache.org/jira/browse/IMPALA-7604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914550#comment-16914550 ] Tim Armstrong commented on IMPALA-7604: --- Looked again, this is kinda nasty - it can actually overflow and get set to 0 in some cases. > In AggregationNode.computeStats, handle cardinality overflow better > --- > > Key: IMPALA-7604 > URL: https://issues.apache.org/jira/browse/IMPALA-7604 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Affects Versions: Impala 2.12.0 >Reporter: Paul Rogers >Assignee: Tim Armstrong >Priority: Major > > Consider the cardinality overflow logic in > [{{AggregationNode.computeStats()}}|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/AggregationNode.java]. > Current code: > {noformat} > // if we ended up with an overflow, the estimate is certain to be wrong > if (cardinality_ < 0) cardinality_ = -1; > {noformat} > This code has a number of issues. > * The check is done after looping over all conjuncts. It could be that, as a > result, the number overflowed twice. The check should be done after each > multiplication. > * Since we know that the number overflowed, a better estimate of the total > count is {{Long.MAX_VALUE}}. > * The code later checks for the -1 value and, if found, uses the cardinality > of the first child. This is a worse estimate than using the max value, since > the first child might have a low cardinality (it could be the later children > that caused the overflow.) > * If we really do expect overflow, then we are dealing with very large > numbers. Being accurate to the row is not needed. Better to use a {{double}} > which can handle the large values. > Since overflow probably seldom occurs, this is not an urgent issue. Though, > if overflow does occur, the query is huge, and having at least some estimate > of the hugeness is better than none. 
Also, seems that this code probably > evolved; this newbie is looking at it fresh and seeing that the accumulated > fixes could be tidied up. -- This message was sent by Atlassian Jira (v8.3.2#803003)
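The per-multiplication overflow check the issue asks for can be sketched like this (a hypothetical Python model of the Java planner logic; `LONG_MAX` mirrors Java's `Long.MAX_VALUE`, and cardinalities are assumed known and non-negative):

```python
# Sketch of the overflow handling the issue advocates: check after *each*
# multiplication (so a double overflow cannot slip through) and saturate at
# the maximum value, which is a better estimate of a huge cardinality than -1
# or the first child's cardinality.

LONG_MAX = 2**63 - 1  # mirrors Java's Long.MAX_VALUE

def multiply_cardinalities(cardinalities):
    result = 1
    for c in cardinalities:
        result *= c
        if result > LONG_MAX:
            # Overflow: the true value is certainly huge, so saturate instead
            # of letting a wrapped (negative or zero) value propagate.
            return LONG_MAX
    return result

print(multiply_cardinalities([10, 20]))        # 200
print(multiply_cardinalities([2**40, 2**40]))  # saturates to LONG_MAX
```

Python's unbounded integers make the check trivial here; the Java version would need `Math.multiplyExact` (catching `ArithmeticException`) or a `double` accumulator, as the issue suggests, since `long` wraps silently.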
[jira] [Work started] (IMPALA-8885) Improve parquet version metadata error
[ https://issues.apache.org/jira/browse/IMPALA-8885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-8885 started by Tim Armstrong. - > Improve parquet version metadata error > -- > > Key: IMPALA-8885 > URL: https://issues.apache.org/jira/browse/IMPALA-8885 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > Labels: supportability > > The error looks like this now: > {noformat} > File 'hdfs://nn/books/books1G.csv' has an invalid version number: .99 This > could be due to stale metadata. Try running "refresh s3db.books_s3". > {noformat} > It seems to be reasonably common that this happens because a non-parquet file > is being queried in a parquet table. > The error message should say something like "File > 'hdfs://nn/books/books1G.csv' has an invalid Parquet version number: .99 and > does not appear to be a valid Parquet file. This could be due to stale > metadata. Try running "refresh s3db.books_s3" -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (IMPALA-7027) Multiple Cast to Varchar with different limit fails with "AnalysisException: null CAUSED BY: IllegalArgumentException: "
[ https://issues.apache.org/jira/browse/IMPALA-7027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongzhi Chen resolved IMPALA-7027. -- Fix Version/s: Impala 3.4.0 Resolution: Fixed > Multiple Cast to Varchar with different limit fails with "AnalysisException: > null CAUSED BY: IllegalArgumentException: " > > > Key: IMPALA-7027 > URL: https://issues.apache.org/jira/browse/IMPALA-7027 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Affects Versions: Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, Impala > 2.11.0, Impala 3.0, Impala 2.12.0 >Reporter: Meenakshi >Assignee: Yongzhi Chen >Priority: Major > Labels: planner, regression > Fix For: Impala 3.4.0 > > > If we have multiple cast of '' to varchar statements in a impala query which > has a distinct like below, the query breaks for scenario when the cast to > varchar limit in the SQL is lower than the previous cast. > > Query 1> Fails with " AnalysisException: null CAUSED BY: > IllegalArgumentException: targetType=VARCHAR(100) type=VARCHAR(101)" > SELECT DISTINCT CAST('' as VARCHAR(101)) as CL_COMMENTS,CAST('' as > VARCHAR(100)) as CL_USER_ID FROM tablename limit 1 > Where as the below query succeeds > Query 2> Success > SELECT DISTINCT CAST('' as VARCHAR(100)) as CL_COMMENTS,CAST('' as > VARCHAR(101)) as CL_USER_ID FROM tablename limit 1 > *Workaround* > SET ENABLE_EXPR_REWRITES=false; -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (IMPALA-8888) Profile fetch performance when result spooling is enabled
[ https://issues.apache.org/jira/browse/IMPALA-8888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914506#comment-16914506 ] Sahil Takiar commented on IMPALA-8888: -- After talking with Tim offline, it seems that using a JDBC driver might be better than impala-shell (impala-shell is slow enough that server side perf improvements to this code probably don't affect latency). So will benchmark with JDBC instead. > Profile fetch performance when result spooling is enabled > - > > Key: IMPALA-8888 > URL: https://issues.apache.org/jira/browse/IMPALA-8888 > Project: IMPALA > Issue Type: Sub-task >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > > Profile the performance of fetching rows when result spooling is enabled. > There are a few queries that can be used to benchmark the performance: > {{time ./bin/impala-shell.sh -B -q "select l_orderkey from > tpch_parquet.lineitem" > /dev/null}} > {{time ./bin/impala-shell.sh -B -q "select * from tpch_parquet.orders" > > /dev/null}} > The first fetches one column and 6,001,215 rows; the second fetches 9 columns and > 1,500,000 rows - so a mix of rows fetched vs. columns fetched. > The base line for the benchmark should be the commit prior to IMPALA-8780. > The benchmark should check for both latency and CPU usage (to see if the copy > into {{BufferedTupleStream}} has a significant overhead). > Various fetch sizes should be used in the benchmark as well to see if > increasing the fetch size for result spooling improves performance (ideally > it should) (it would be nice to run some fetches between machines as well as > that will better reflect network round trip latencies). -- This message was sent by Atlassian Jira (v8.3.2#803003)
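The benchmark loop the issue describes (full fetch of a result set, repeated at several fetch sizes) has a simple shape; a hedged sketch, with `FakeCursor` standing in for a real JDBC/HS2 driver cursor so the loop is self-contained and runnable:

```python
# Minimal shape of the fetch benchmark described above. FakeCursor is a
# hypothetical stand-in for a real driver cursor (e.g. one obtained over JDBC
# against an Impala coordinator); only the timing loop is the point.

import time

class FakeCursor:
    """Yields row batches of up to the requested size until rows run out."""
    def __init__(self, total_rows):
        self.total_rows = total_rows
        self.returned = 0

    def fetchmany(self, n):
        batch = min(n, self.total_rows - self.returned)
        self.returned += batch
        return [(i,) for i in range(batch)]

def time_fetch(cursor, fetch_size):
    # Drain the whole result set at one fetch size, timing the full drain.
    start = time.perf_counter()
    rows = 0
    while True:
        batch = cursor.fetchmany(fetch_size)
        if not batch:
            break
        rows += len(batch)
    return rows, time.perf_counter() - start

for fetch_size in (1024, 32768):
    rows, secs = time_fetch(FakeCursor(100_000), fetch_size)
    print(fetch_size, rows)
```

Against a real server, larger fetch sizes amortize the per-RPC round trip over more rows, which is exactly the effect the issue wants measured (ideally from a separate machine so network latency is included).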
[jira] [Resolved] (IMPALA-8887) Conflicting links to issue tracker
[ https://issues.apache.org/jira/browse/IMPALA-8887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-8887. --- Resolution: Fixed Thanks for catching this, we obviously missed this when migrating that page a couple of years ago. > Conflicting links to issue tracker > -- > > Key: IMPALA-8887 > URL: https://issues.apache.org/jira/browse/IMPALA-8887 > Project: IMPALA > Issue Type: Bug >Reporter: Sebb >Assignee: Tim Armstrong >Priority: Major > > The following CWiki page has two links for reporting issues: > https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala > "please open a new JIRA ticket at the Impala JIRA tracker" > and > "Impala has a very active JIRA instance." > The former links to https://issues.cloudera.org/projects/IMPALA/ > whereas the latter links to https://issues.apache.org/jira/projects/IMPALA/ > The cloudera tracker does not have any recent tickets so I assume it is no > longer used, and should not be referenced. > Ideally flag the old tracker as obsolete with a link to the new tracker. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Assigned] (IMPALA-8887) Conflicting links to issue tracker
[ https://issues.apache.org/jira/browse/IMPALA-8887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-8887: - Assignee: Tim Armstrong > Conflicting links to issue tracker > -- > > Key: IMPALA-8887 > URL: https://issues.apache.org/jira/browse/IMPALA-8887 > Project: IMPALA > Issue Type: Bug >Reporter: Sebb >Assignee: Tim Armstrong >Priority: Major > > The following CWiki page has two links for reporting issues: > https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala > "please open a new JIRA ticket at the Impala JIRA tracker" > and > "Impala has a very active JIRA instance." > The former links to https://issues.cloudera.org/projects/IMPALA/ > whereas the latter links to https://issues.apache.org/jira/projects/IMPALA/ > The cloudera tracker does not have any recent tickets so I assume it is no > longer used, and should not be referenced. > Ideally flag the old tracker as obsolete with a link to the new tracker. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Assigned] (IMPALA-8886) Please delete old releases from mirroring system
[ https://issues.apache.org/jira/browse/IMPALA-8886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-8886: - Assignee: Quanlong Huang > Please delete old releases from mirroring system > > > Key: IMPALA-8886 > URL: https://issues.apache.org/jira/browse/IMPALA-8886 > Project: IMPALA > Issue Type: Bug >Reporter: Sebb >Assignee: Quanlong Huang >Priority: Major > > To reduce the load on the ASF mirrors, projects are required to delete old > releases [1] > Please can you remove all non-current releases? > i.e. all but 3.3.0 > It's unfair to expect the 3rd party mirrors to carry old releases. > However you can still link to the archives for historic releases. > Please also update your release procedures (if relevant) > Thanks! > [1] [http://www.apache.org/dev/release.html#when-to-archive]
[jira] [Assigned] (IMPALA-7312) Non-blocking mode for Fetch() RPC
[ https://issues.apache.org/jira/browse/IMPALA-7312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-7312: - Assignee: Sahil Takiar > Non-blocking mode for Fetch() RPC > - > > Key: IMPALA-7312 > URL: https://issues.apache.org/jira/browse/IMPALA-7312 > Project: IMPALA > Issue Type: Sub-task > Components: Clients >Reporter: Tim Armstrong >Assignee: Sahil Takiar >Priority: Major > Labels: resource-management > > Currently Fetch() can block for an arbitrary amount of time until a batch of > rows is produced. It might be helpful to have a mode where it returns quickly > when there is no data available, so that threads and RPC slots are not tied > up.
[jira] [Created] (IMPALA-8888) Profile fetch performance when result spooling is enabled
Sahil Takiar created IMPALA-8888: Summary: Profile fetch performance when result spooling is enabled Key: IMPALA-8888 URL: https://issues.apache.org/jira/browse/IMPALA-8888 Project: IMPALA Issue Type: Sub-task Reporter: Sahil Takiar Assignee: Sahil Takiar Profile the performance of fetching rows when result spooling is enabled. There are a few queries that can be used to benchmark the performance: {{time ./bin/impala-shell.sh -B -q "select l_orderkey from tpch_parquet.lineitem" > /dev/null}} {{time ./bin/impala-shell.sh -B -q "select * from tpch_parquet.orders" > /dev/null}} The first fetches one column and 6,001,215 rows; the second fetches 9 columns and 1,500,000 rows - so a mix of rows fetched vs. columns fetched. The baseline for the benchmark should be the commit prior to IMPALA-8780. The benchmark should check both latency and CPU usage (to see if the copy into {{BufferedTupleStream}} has a significant overhead). Various fetch sizes should be used in the benchmark as well, to see if increasing the fetch size for result spooling improves performance (ideally it should). It would be nice to run some fetches between machines as well, as that will better reflect network round-trip latencies.
[jira] [Commented] (IMPALA-8754) S3 with S3Guard tests encounter "ResourceNotFoundException" from DynamoDB
[ https://issues.apache.org/jira/browse/IMPALA-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914206#comment-16914206 ] Steve Loughran commented on IMPALA-8754: The DDB table wasn't found, which means either (1) the table doesn't exist, or (2) the table does exist, but it is in a different region. S3Guard infers the region of the table to be that of the bucket; if you are reading data from buckets in other regions, the inference will be wrong. There's an option to fix the table region, {{fs.s3a.s3guard.ddb.region}}; you need to set this to the region where the table is. > S3 with S3Guard tests encounter "ResourceNotFoundException" from DynamoDB > - > > Key: IMPALA-8754 > URL: https://issues.apache.org/jira/browse/IMPALA-8754 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 3.3.0 >Reporter: Joe McDonnell >Assignee: Joe McDonnell >Priority: Critical > Labels: broken-build, flaky > > When running tests on s3 with s3guard, various tests can encounter the > following error coming from the DynamoDB: > {noformat} > EQuery aborted:Disk I/O error on > impala-ec2-centos74-m5-4xlarge-ondemand-02c8.vpc.cloudera.com:22002: Failed > to open HDFS file > s3a://impala-test-uswest2-1/test-warehouse/tpcds.store_sales_parquet/ss_sold_date_sk=2451718/6843d8a91fc5ae1d-88b2af4b0004_156969840_data.0.parq > E Error(2): No such file or directory > E Root cause: ResourceNotFoundException: Requested resource not found > (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: > ResourceNotFoundException; Request ID: > XXX){noformat} > Tests that have seen this (this is flaky): > * TestTpcdsQuery.test_tpcds_count > * TestHdfsFdCaching.test_caching_disabled_by_param > * TestMtDop.test_compute_stats > * TestScanRangeLengths.test_scan_ranges
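The fix Steve describes can be sketched as a Hadoop client configuration entry; the region value below is illustrative only (the error mentions an impala-test-uswest2-1 bucket, but use whichever region actually holds the S3Guard DynamoDB table):

```xml
<!-- core-site.xml (sketch): pin the S3Guard DynamoDB table region explicitly
     so S3Guard does not infer it from the bucket being read. -->
<property>
  <name>fs.s3a.s3guard.ddb.region</name>
  <!-- "us-west-2" is an assumed example; set to the table's real region -->
  <value>us-west-2</value>
</property>
```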
[jira] [Created] (IMPALA-8887) Conflicting links to issue tracker
Sebb created IMPALA-8887: Summary: Conflicting links to issue tracker Key: IMPALA-8887 URL: https://issues.apache.org/jira/browse/IMPALA-8887 Project: IMPALA Issue Type: Bug Reporter: Sebb The following CWiki page has two links for reporting issues: https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala "please open a new JIRA ticket at the Impala JIRA tracker" and "Impala has a very active JIRA instance." The former links to https://issues.cloudera.org/projects/IMPALA/ whereas the latter links to https://issues.apache.org/jira/projects/IMPALA/ The cloudera tracker does not have any recent tickets so I assume it is no longer used, and should not be referenced. Ideally flag the old tracker as obsolete with a link to the new tracker.
[jira] [Created] (IMPALA-8886) Please delete old releases from mirroring system
Sebb created IMPALA-8886: Summary: Please delete old releases from mirroring system Key: IMPALA-8886 URL: https://issues.apache.org/jira/browse/IMPALA-8886 Project: IMPALA Issue Type: Bug Reporter: Sebb To reduce the load on the ASF mirrors, projects are required to delete old releases [1] Please can you remove all non-current releases? i.e. all but 3.3.0 It's unfair to expect the 3rd party mirrors to carry old releases. However you can still link to the archives for historic releases. Please also update your release procedures (if relevant) Thanks! [1] [http://www.apache.org/dev/release.html#when-to-archive]