[jira] [Resolved] (IMPALA-9132) Explain statements should not cause NPE in LogLineageRecord()
[ https://issues.apache.org/jira/browse/IMPALA-9132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anurag Mantripragada resolved IMPALA-9132. -- Fix Version/s: Impala 3.4.0 Resolution: Fixed > Explain statements should not cause NPE in LogLineageRecord() > - > > Key: IMPALA-9132 > URL: https://issues.apache.org/jira/browse/IMPALA-9132 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Anurag Mantripragada >Assignee: Anurag Mantripragada >Priority: Blocker > Labels: crash > Fix For: Impala 3.4.0 > > > For DDLs, LogLineageRecord() adds certain fields to the lineageGraph in the > backend. However, explain statements do not have a catalogOpExecutor, causing > an NPE. > We should, in general, avoid creating lineage records for Explain as Atlas > currently does not use them. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-9132) Explain statements should not cause NPE in LogLineageRecord()
[ https://issues.apache.org/jira/browse/IMPALA-9132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968787#comment-16968787 ] ASF subversion and git services commented on IMPALA-9132: - Commit f49f8d8a32d12128eafb4c76632ca2908d22fa28 in impala's branch refs/heads/master from Anurag Mantripragada [ https://gitbox.apache.org/repos/asf?p=impala.git;h=f49f8d8 ] IMPALA-9132: Explain statements should not cause nullptr in LogLineageRecord() For DDLs LogLineageRecord() adds certain fields in the backend before flushing the lineage. It uses ddl_exec_response() to get these fields. However, explain is a special kind of DDL which does not have an associated catalog_op_executor_. This causes explain statements to throw NPE when ddl_exec_response() is called. Currently, tools like atlas do not track lineages for explain statements. This change skips lineage logging for explain statements. In general, adds a nullptr check for catalog_op_executor_. Testing: Added a test to verify lineage is not created for explain statements. Change-Id: Iccc20fd5a80841c820ebeb4edffccebea30df76e Reviewed-on: http://gerrit.cloudera.org:8080/14646 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins > Explain statements should not cause NPE in LogLineageRecord() > - > > Key: IMPALA-9132 > URL: https://issues.apache.org/jira/browse/IMPALA-9132 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Anurag Mantripragada >Assignee: Anurag Mantripragada >Priority: Blocker > Labels: crash > > For DDLs, LogLineageRecord() adds certain fields to the lineageGraph in the > backend. However, explain statements do not have a catalogOpExecutor causing > a NPE. > We should, in general, avoid creating lineage records for Explain as Atlas > currently does not use them. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
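The guard described in the commit above (skip lineage logging for EXPLAIN, and null-check the catalog op executor before using its DDL response) can be sketched as follows. This is a simplified, hypothetical model: the types and the function `BuildLineageRecord` are invented for illustration and are not Impala's actual API.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Stand-ins for the backend objects involved (illustrative only).
struct DdlExecResponse { std::string new_table_name; };

struct CatalogOpExecutor {
  DdlExecResponse response;
  const DdlExecResponse* ddl_exec_response() const { return &response; }
};

// Returns the lineage record to log, or std::nullopt when logging is skipped.
std::optional<std::string> BuildLineageRecord(
    bool is_explain, const CatalogOpExecutor* catalog_op_executor) {
  // EXPLAIN statements produce no catalog operation; skip lineage entirely,
  // since tools like Atlas do not consume lineage for EXPLAIN anyway.
  if (is_explain) return std::nullopt;
  // Defensive null check, mirroring the nullptr check the fix adds for
  // catalog_op_executor_: never dereference a missing executor.
  if (catalog_op_executor == nullptr) return std::nullopt;
  return "lineage for " + catalog_op_executor->ddl_exec_response()->new_table_name;
}
```

The point of the guard ordering is that the EXPLAIN check makes the skip explicit and intentional, while the null check catches any other code path that reaches lineage logging without a catalog operation.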
[jira] [Commented] (IMPALA-8778) Support read/write Apache Hudi tables
[ https://issues.apache.org/jira/browse/IMPALA-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968698#comment-16968698 ] Vinoth Chandar commented on IMPALA-8778: > I think you need logic in Impala that understands slices and only uses the > latest slice when querying a partition. +1. In Hive/Spark/Presto, we make the query call HoodieInputFormat to do this. > Support read/write Apache Hudi tables > - > > Key: IMPALA-8778 > URL: https://issues.apache.org/jira/browse/IMPALA-8778 > Project: IMPALA > Issue Type: New Feature >Reporter: Yuanbin Cheng >Assignee: Yanjia Gary Li >Priority: Major > > Apache Impala currently does not support Apache Hudi; it cannot even pull metadata > from Hive. > Related issue: > [https://github.com/apache/incubator-hudi/issues/179] > [https://issues.apache.org/jira/projects/HUDI/issues/HUDI-146|https://issues.apache.org/jira/projects/HUDI/issues/HUDI-146?filter=allopenissues]
[jira] [Commented] (IMPALA-9127) Clean up probe-side state machine in hash join
[ https://issues.apache.org/jira/browse/IMPALA-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968594#comment-16968594 ] Tim Armstrong commented on IMPALA-9127: --- Maybe the states would be something like: * PROBING_HASH_PARTITIONS_NO_BATCH -> hash_partitions are valid and we are processing probe batches. We do not have a current probe batch. * PROBING_HASH_PARTITIONS_IN_BATCH -> hash_partitions are valid and we are processing probe batches. We have a current probe batch. * OUTPUTTING_UNMATCHED_ROWS -> we have some output rows to emit * CURRENT_PROBE_COMPLETE -> finished processing the current input probe rows and any subsequent actions. There may be more spilled partitions. * OUTPUTTING_NULL_PROBE_ROWS -> processing null_probe_rows_ for output * OUTPUTTING_NULL_AWARE_PARTITION -> processing null_aware partition for output * END -> finished processing all probe input, spilled partitions, and any follow-up work. > Clean up probe-side state machine in hash join > -- > > Key: IMPALA-9127 > URL: https://issues.apache.org/jira/browse/IMPALA-9127 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > > There's an implicit state machine in the main loop in > PartitionedHashJoinNode::GetNext() > https://github.com/apache/impala/blob/eea617b/be/src/exec/partitioned-hash-join-node.cc#L510 > The state is implicitly defined based on the following conditions: > * !output_build_partitions_.empty() -> "outputting build rows after probing" > * builder_->null_aware_partition() == NULL -> "eos, because the > null-aware partition is processed after all other partitions" > * null_probe_output_idx_ >= 0 -> "null probe rows being processed" > * output_null_aware_probe_rows_running_ -> "null-aware partition being > processed" > * probe_batch_pos_ != -1 -> "processing probe batch" > * builder_->num_hash_partitions() != 0 -> "have active hash partitions that > are being probed" > * spilled_partitions_.empty() -> "no more spilled partitions" > I think this would be a lot easier to follow if the state machine were > explicit and documented, and would make separating out the build side of a > spilling hash join easier to get right.
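The proposal above can be sketched as an explicit enum plus a single transition function. This is illustrative only: the state names follow the comment, but the transition conditions (`ProbeFlags` and the rules in `NextState`) are invented to show the shape of the refactor, not Impala's actual logic.

```cpp
#include <cassert>

// Explicit probe-side states, per the list in the comment above.
enum class ProbeState {
  PROBING_HASH_PARTITIONS_NO_BATCH,
  PROBING_HASH_PARTITIONS_IN_BATCH,
  OUTPUTTING_UNMATCHED_ROWS,
  CURRENT_PROBE_COMPLETE,
  OUTPUTTING_NULL_PROBE_ROWS,
  OUTPUTTING_NULL_AWARE_PARTITION,
  END,
};

// Hypothetical snapshot of the conditions that today are scattered booleans.
struct ProbeFlags {
  bool have_probe_batch;
  bool have_unmatched_build_rows;
  bool have_null_probe_rows;
  bool have_null_aware_partition;
  bool have_spilled_partitions;
};

// One auditable place that encodes the legal transitions, instead of
// implicit checks spread through GetNext().
ProbeState NextState(ProbeState s, const ProbeFlags& f) {
  switch (s) {
    case ProbeState::PROBING_HASH_PARTITIONS_NO_BATCH:
      return f.have_probe_batch ? ProbeState::PROBING_HASH_PARTITIONS_IN_BATCH
                                : ProbeState::CURRENT_PROBE_COMPLETE;
    case ProbeState::PROBING_HASH_PARTITIONS_IN_BATCH:
      return f.have_unmatched_build_rows
                 ? ProbeState::OUTPUTTING_UNMATCHED_ROWS
                 : ProbeState::PROBING_HASH_PARTITIONS_NO_BATCH;
    case ProbeState::OUTPUTTING_UNMATCHED_ROWS:
      return ProbeState::PROBING_HASH_PARTITIONS_NO_BATCH;
    case ProbeState::CURRENT_PROBE_COMPLETE:
      if (f.have_null_probe_rows) return ProbeState::OUTPUTTING_NULL_PROBE_ROWS;
      if (f.have_null_aware_partition) {
        return ProbeState::OUTPUTTING_NULL_AWARE_PARTITION;
      }
      return f.have_spilled_partitions
                 ? ProbeState::PROBING_HASH_PARTITIONS_NO_BATCH
                 : ProbeState::END;
    case ProbeState::OUTPUTTING_NULL_PROBE_ROWS:
      return f.have_null_aware_partition
                 ? ProbeState::OUTPUTTING_NULL_AWARE_PARTITION
                 : ProbeState::END;
    case ProbeState::OUTPUTTING_NULL_AWARE_PARTITION:
      return ProbeState::END;
    case ProbeState::END:
      return ProbeState::END;
  }
  return ProbeState::END;  // unreachable, silences -Wreturn-type
}
```

With the machine made explicit like this, each case in GetNext() can assert which states it may be entered from, which is exactly the documentation the comment asks for.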
[jira] [Assigned] (IMPALA-9127) Clean up probe-side state machine in hash join
[ https://issues.apache.org/jira/browse/IMPALA-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-9127: - Assignee: Tim Armstrong > Clean up probe-side state machine in hash join > -- > > Key: IMPALA-9127 > URL: https://issues.apache.org/jira/browse/IMPALA-9127 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > > There's an implicit state machine in the main loop in > PartitionedHashJoinNode::GetNext() > https://github.com/apache/impala/blob/eea617b/be/src/exec/partitioned-hash-join-node.cc#L510 > The state is implicitly defined based on the following conditions: > * !output_build_partitions_.empty() -> "outputting build rows after probing" > * builder_->null_aware_partition() == NULL -> "eos, because this the > null-aware partition is processed after all other partitions" > * null_probe_output_idx_ >= 0 -> "null probe rows being processed" > * output_null_aware_probe_rows_running_ -> "null-aware partition being > processed" > * probe_batch_pos_ != -1 -> "processing probe batch" > * builder_->num_hash_partitions() != 0 -> "have active hash partitions that > are being probed" > * spilled_partitions_.empty() -> "no more spilled partitions" > I think this would be a lot easier to follow if the state machine was > explicit and documented, and would make separating out the build side of a > spilling hash join easier to get right. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-8778) Support read/write Apache Hudi tables
[ https://issues.apache.org/jira/browse/IMPALA-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968581#comment-16968581 ] Tim Armstrong commented on IMPALA-8778: --- I don't see how you could implement reading from a Hudi table without changing Impala (or Hive for that matter). With the original Hive table layout, the contents of a partition are determined by listing a directory, and it looks like if you list the directory of a Hudi partition, you will get back duplicated data from multiple slices. I.e. I think you need logic in Impala that understands slices and only uses the latest slice when querying a partition. The only way to add or remove an individual file in a classic Hive table (Impala/Hive tables are the same thing) is to add or remove it from the partition directory. > Support read/write Apache Hudi tables > - > > Key: IMPALA-8778 > URL: https://issues.apache.org/jira/browse/IMPALA-8778 > Project: IMPALA > Issue Type: New Feature >Reporter: Yuanbin Cheng >Assignee: Yanjia Gary Li >Priority: Major > > Apache Impala currently does not support Apache Hudi; it cannot even pull metadata > from Hive. > Related issue: > [https://github.com/apache/incubator-hudi/issues/179] > [https://issues.apache.org/jira/projects/HUDI/issues/HUDI-146|https://issues.apache.org/jira/projects/HUDI/issues/HUDI-146?filter=allopenissues]
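The "only use the latest slice" logic discussed in the two comments above can be sketched as a post-listing filter: a Hudi copy-on-write partition directory may contain several versioned files for the same file group, and a reader must keep only the newest commit per group. The struct fields and naming scheme here are simplified assumptions, not Hudi's or Impala's actual representation.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Simplified view of one data file in a Hudi partition listing.
struct HudiFile {
  std::string file_group_id;  // stable id of the file group (slice family)
  std::string commit_time;    // commit timestamp; lexicographic order assumed
  std::string path;
};

// Returns one path per file group: the file with the latest commit time.
// Without this filter, a plain directory listing would read duplicate data.
std::vector<std::string> LatestSlice(const std::vector<HudiFile>& listing) {
  std::map<std::string, HudiFile> latest;  // file_group_id -> newest file
  for (const auto& f : listing) {
    auto it = latest.find(f.file_group_id);
    if (it == latest.end() || it->second.commit_time < f.commit_time) {
      latest[f.file_group_id] = f;
    }
  }
  std::vector<std::string> paths;
  for (const auto& [id, f] : latest) paths.push_back(f.path);
  return paths;
}
```

This is essentially what delegating to HoodieInputFormat buys Hive/Spark/Presto; without an equivalent filter, Impala's listing-based partition model sees every slice.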
[jira] [Assigned] (IMPALA-8778) Support read/write Apache Hudi tables
[ https://issues.apache.org/jira/browse/IMPALA-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-8778: - Assignee: Yanjia Gary Li (was: Yuanbin Cheng) > Support read/write Apache Hudi tables > - > > Key: IMPALA-8778 > URL: https://issues.apache.org/jira/browse/IMPALA-8778 > Project: IMPALA > Issue Type: New Feature >Reporter: Yuanbin Cheng >Assignee: Yanjia Gary Li >Priority: Major > > Apache Impala currently not support Apache Hudi, cannot even pull metadata > from Hive. > Related issue: > [https://github.com/apache/incubator-hudi/issues/179] > [https://issues.apache.org/jira/projects/HUDI/issues/HUDI-146|https://issues.apache.org/jira/projects/HUDI/issues/HUDI-146?filter=allopenissues] > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-8561) ScanRanges with mtime=-1 can lead to inconsistent reads when using the file handle cache
[ https://issues.apache.org/jira/browse/IMPALA-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-8561: -- Description: The file handle cache relies on the mtime to distinguish between different versions of a file. For example, if file X exists with mtime=1, then it is overwritten and the metadata is updated so that now it is at mtime=2, the file handle cache treats them as completely different things and can never use a single file handle to serve both. However, some codepaths generate ScanRanges with an mtime of -1. This removes the ability to distinguish these two versions of a file and can lead to consistency problems. A specific example is the code that reads the parquet footer [HdfsParquetScanner::ProcessFooter()|https://github.com/apache/impala/blob/832c9de7810b47b5f782bccb761e07264e7548e5/be/src/exec/parquet/hdfs-parquet-scanner.cc#L1354]. We don't know ahead of time how big the Parquet footer is. So, we read 100KB (determined by [FOOTER_SIZE|https://github.com/apache/impala/blob/449fe73d2145bd22f0f857623c3652a097f06d73/be/src/exec/hdfs-scanner.h#L331]). If the footer size encoded in the last few bytes of the file indicates that the footer is larger than that [code here|https://github.com/apache/impala/blob/832c9de7810b47b5f782bccb761e07264e7548e5/be/src/exec/parquet/hdfs-parquet-scanner.cc#L1414], then we issue a separate read for the actual size of the footer. That separate read does not inherit the mtime of the original read and instead uses an mtime of -1. I verified this by adding tracing and issuing a select against functional_parquet.widetable_1000_cols. A failure scenario associated with this is that we read the last 100KB using a ScanRange with mtime=2, then we find that the footer is larger than 100KB and issue a ScanRange with mtime=-1. This uses a file handle that is from a previous version of the file equivalent to mtime=1. 
The data it is reading may not come from the end of the file, or it may be at the end of the file but the footer has a different length. (There is no validation on the new read to check the magic value or metadata size reported by the new buffer.) Either would result in a failure to deserialize the thrift for the footer. For example, a problem case could produce an error message like: {noformat} File hdfs://test-warehouse/example_file.parq of length 1048576 bytes has invalid file metadata at file offset 462017. Error = couldn't deserialize thrift msg: TProtocolException: Invalid data .{noformat} To fix this, we should examine all locations that can result in ScanRanges with mtime=-1 and eliminate any that we can. For example, the HdfsParquetScanner::ProcessFooter() code should create a ScanRange that inherits the mtime from the original footer ScanRange. Also, the file handle cache should refuse to cache file handles with mtime=-1. The code in HdfsParquetScanner::ProcessFooter() should add validation for the magic value and metadata size when reading a footer larger than 100KB to verify that we are reading something valid. The thrift deserialize failure gives some information, but catching this case more specifically would provide a better error message. h2. Workarounds * This is most often caused by overwriting files in-place (e.g. INSERT OVERWRITE from Hive) without refreshing the metadata. You can avoid the issue by avoiding these in-place rewrites or by consistently running REFRESH in Impala after the modifications. * Setting --max_cached_file_handles=0 in the impalad startup options can work around the issue, at the cost of performance.
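The two cache-side fixes proposed in the description (key handles by mtime so file versions never share a handle, and refuse to cache handles with mtime=-1) can be sketched as follows. This is a simplified model for illustration, not Impala's actual file handle cache; the class and member names are invented.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Simplified sketch: handles are keyed by (path, mtime), so a file
// overwritten in place (new mtime) never reuses a stale handle.
struct FileHandleCache {
  std::map<std::pair<std::string, int64_t>, int> handles;  // value = fake fd

  // Refuses the ambiguous mtime == -1: with no mtime, two versions of the
  // same path would be indistinguishable, which is the bug described above.
  bool Insert(const std::string& path, int64_t mtime, int fd) {
    if (mtime == -1) return false;
    handles[{path, mtime}] = fd;
    return true;
  }

  bool Lookup(const std::string& path, int64_t mtime) const {
    return handles.count({path, mtime}) > 0;
  }
};
```

Note this only hardens the cache; the scanner-side fix (the follow-up footer ScanRange inheriting the original mtime instead of -1) is still needed so that large-footer reads hit the correct file version in the first place.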
[jira] [Updated] (IMPALA-8561) ScanRanges with mtime=-1 can lead to inconsistent reads when using the file handle cache
[ https://issues.apache.org/jira/browse/IMPALA-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-8561: -- Description: The file handle cache relies on the mtime to distinguish between different versions of a file. For example, if file X exists with mtime=1, then it is overwritten and the metadata is updated so that now it is at mtime=2, the file handle cache treats them as completely different things and can never use a single file handle to serve both. However, some codepaths generate ScanRanges with an mtime of -1. This removes the ability to distinguish these two versions of a file and can lead to consistency problems. A specific example is the code that reads the parquet footer [HdfsParquetScanner::ProcessFooter()|https://github.com/apache/impala/blob/832c9de7810b47b5f782bccb761e07264e7548e5/be/src/exec/parquet/hdfs-parquet-scanner.cc#L1354]. We don't know ahead of time how big the Parquet footer is. So, we read 100KB (determined by [FOOTER_SIZE|https://github.com/apache/impala/blob/449fe73d2145bd22f0f857623c3652a097f06d73/be/src/exec/hdfs-scanner.h#L331]). If the footer size encoded in the last few bytes of the file indicates that the footer is larger than that [code here|https://github.com/apache/impala/blob/832c9de7810b47b5f782bccb761e07264e7548e5/be/src/exec/parquet/hdfs-parquet-scanner.cc#L1414], then we issue a separate read for the actual size of the footer. That separate read does not inherit the mtime of the original read and instead uses an mtime of -1. I verified this by adding tracing and issuing a select against functional_parquet.widetable_1000_cols. A failure scenario associated with this is that we read the last 100KB using a ScanRange with mtime=2, then we find that the footer is larger than 100KB and issue a ScanRange with mtime=-1. This uses a file handle that is from a previous version of the file equivalent to mtime=1. 
The data it is reading may not come from the end of the file, or it may be at the end of the file but the footer has a different length. (There is no validation on the new read to check the magic value or metadata size reported by the new buffer.) Either would result in a failure to deserialize the thrift for the footer. For example, a problem case could produce an error message like: {noformat} File hdfs://test-warehouse/example_file.parq of length 1048576 bytes has invalid file metadata at file offset 462017. Error = couldn't deserialize thrift msg: TProtocolException: Invalid data .{noformat} To fix this, we should examine all locations that can result in ScanRanges with mtime=-1 and eliminate any that we can. For example, the HdfsParquetScanner::ProcessFooter() code should create a ScanRange that inherits the mtime from the original footer ScanRange. Also, the file handle cache should refuse to cache file handles with mtime=-1. The code in HdfsParquetScanner::ProcessFooter() should add validation for the magic value and metadata size when reading a footer larger than 100KB to verify that we are reading something valid. The thrift deserialize failure gives some information, but catching this case more specifically would provide a better error message.
[jira] [Updated] (IMPALA-9132) Explain statements should not cause NPE in LogLineageRecord()
[ https://issues.apache.org/jira/browse/IMPALA-9132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-9132: -- Priority: Blocker (was: Critical) > Explain statements should not cause NPE in LogLineageRecord() > - > > Key: IMPALA-9132 > URL: https://issues.apache.org/jira/browse/IMPALA-9132 > Project: IMPALA > Issue Type: Bug >Reporter: Anurag Mantripragada >Assignee: Anurag Mantripragada >Priority: Blocker > > For DDLs, LogLineageRecord() adds certain fields to the lineageGraph in the > backend. However, explain statements do not have a catalogOpExecutor, causing > an NPE. > We should, in general, avoid creating lineage records for Explain as Atlas > currently does not use them.
[jira] [Updated] (IMPALA-9132) Explain statements should not cause NPE in LogLineageRecord()
[ https://issues.apache.org/jira/browse/IMPALA-9132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-9132: -- Labels: crash (was: ) > Explain statements should not cause NPE in LogLineageRecord() > - > > Key: IMPALA-9132 > URL: https://issues.apache.org/jira/browse/IMPALA-9132 > Project: IMPALA > Issue Type: Bug >Reporter: Anurag Mantripragada >Assignee: Anurag Mantripragada >Priority: Blocker > Labels: crash > > For DDLs, LogLineageRecord() adds certain fields to the lineageGraph in the > backend. However, explain statements do not have a catalogOpExecutor, causing > an NPE. > We should, in general, avoid creating lineage records for Explain as Atlas > currently does not use them.
[jira] [Updated] (IMPALA-9132) Explain statements should not cause NPE in LogLineageRecord()
[ https://issues.apache.org/jira/browse/IMPALA-9132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-9132: -- Component/s: Backend > Explain statements should not cause NPE in LogLineageRecord() > - > > Key: IMPALA-9132 > URL: https://issues.apache.org/jira/browse/IMPALA-9132 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Anurag Mantripragada >Assignee: Anurag Mantripragada >Priority: Blocker > Labels: crash > > For DDLs, LogLineageRecord() adds certain fields to the lineageGraph in the > backend. However, explain statements do not have a catalogOpExecutor, causing > an NPE. > We should, in general, avoid creating lineage records for Explain as Atlas > currently does not use them.
[jira] [Resolved] (IMPALA-7860) Tests use partition name that isn't supported on ABFS
[ https://issues.apache.org/jira/browse/IMPALA-7860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-7860. -- Fix Version/s: Impala 3.4.0 Resolution: Fixed Closing this as Fixed. HADOOP-15860 was done a while ago, and now Impala-on-ABFS cannot write files / directories that end with a period (which is expected). There was one bug that was introduced to Impala due to this change: IMPALA-8557 - but that has been fixed now as well. In IMPALA-9117 I created a new skip flag for ABFS tests for the "cannot write trailing periods" behavior, and added it to any affected tests. > Tests use partition name that isn't supported on ABFS > - > > Key: IMPALA-7860 > URL: https://issues.apache.org/jira/browse/IMPALA-7860 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Reporter: Sean Mackrory >Priority: Major > Fix For: Impala 3.4.0 > > > IMPALA-7681 introduced support for the ADLS Gen2 service / ABFS client. As > mentioned in the code review for that > (https://gerrit.cloudera.org/#/c/11630/) a couple of tests were failing > because they use a partition name that ends with a period. If the tests are > modified to end with anything other than a period, they work just fine. > In HADOOP-15860, that's sounding like it's just a known limitation of the > blob storage that shares infrastructure with ADLS Gen2 that won't be changing > any time soon. I propose we modify the tests to just use a slightly different > partition name.
[jira] [Resolved] (IMPALA-7726) Drop with purge tests fail against ABFS due to trash misbehavior
[ https://issues.apache.org/jira/browse/IMPALA-7726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-7726. -- Fix Version/s: Impala 3.4.0 Resolution: Fixed Closing as Fixed. I re-enabled the tests, looped them overnight, and didn't hit any failures. So it is likely whatever bug was causing these issues has been resolved. > Drop with purge tests fail against ABFS due to trash misbehavior > > > Key: IMPALA-7726 > URL: https://issues.apache.org/jira/browse/IMPALA-7726 > Project: IMPALA > Issue Type: Bug >Reporter: Sean Mackrory >Assignee: Sean Mackrory >Priority: Major > Labels: flaky > Fix For: Impala 3.4.0 > > > In testing IMPALA-7681, I've seen test_drop_partition_with_purge and > test_drop_table_with_purge fail because files were not found in the trash after a > drop without purge. I've traced that functionality through Hive, which uses > Hadoop's Trash API, and traced through a bunch of scenarios in that API with > ABFS and I can't see it misbehaving in any way. It also should be pretty > FS-agnostic. I also suspected a bug in abfs_utils.py's exists() function, but > have not been able to find one.
[jira] [Resolved] (IMPALA-8557) Impala on ABFS failed with error "IllegalArgumentException: ABFS does not allow files or directories to end with a dot."
[ https://issues.apache.org/jira/browse/IMPALA-8557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-8557. -- Fix Version/s: Impala 3.4.0 Resolution: Fixed > Impala on ABFS failed with error "IllegalArgumentException: ABFS does not > allow files or directories to end with a dot." > > > Key: IMPALA-8557 > URL: https://issues.apache.org/jira/browse/IMPALA-8557 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 3.2.0 >Reporter: Eric Lin >Assignee: Sahil Takiar >Priority: Major > Fix For: Impala 3.4.0 > > > Hadoop introduced the feature below to stop users from creating a file that ends > with "." on ABFS: > https://issues.apache.org/jira/browse/HADOOP-15860 > As a result of this change, writes to ABFS from Impala now fail with this error. > I can see that it generates temp files using this format "$0.$1.$2": > https://github.com/cloudera/Impala/blob/cdh6.2.0/be/src/exec/hdfs-table-sink.cc#L329 > $2 is the file extension and will be empty if it is TEXT file format: > https://github.com/cloudera/Impala/blob/cdh6.2.0/be/src/exec/hdfs-text-table-writer.cc#L65 > Since HADOOP-15860 was backported into CDH6.2, it currently only affects > 6.2 and works in older versions. > There is no way to override this empty file extension so no workaround is > possible, unless the user chooses another file format.
[jira] [Commented] (IMPALA-9117) test_lineage.py and test_mt_dop.py are failing on ABFS
[ https://issues.apache.org/jira/browse/IMPALA-9117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968499#comment-16968499 ] ASF subversion and git services commented on IMPALA-9117: - Commit e8fda1f224d3ad237183a53e238eee90188d82e2 in impala's branch refs/heads/master from Sahil Takiar [ https://gitbox.apache.org/repos/asf?p=impala.git;h=e8fda1f ] IMPALA-9117, IMPALA-7726: Fixed a few unit tests for ABFS This test makes the following changes / fixes when running Impala tests on ABFS: * Skips some tests in test_lineage.py that don't work on ABFS / ADLS (they were already skipped for S3) * Skips some tests in test_mt_dop.py; the test creates a directory that ends with a period (and ABFS does not support writing files or directories that end with a period) * Removes the ABFS skip flag SkipIfABFS.trash (IMPALA-7726: Drop with purge tests fail against ABFS due to trash misbehavior"); I removed these flags and looped the tests overnight with no failures, so it is likely whatever bug was causing this has now been fixed * Now that HADOOP-15860 has been resolved, and the agreed upon behavior for ABFS is that it will fail if a client tries to write a file / directory that ends with a period, I added a new entry to the SkipIfABFS class called file_or_folder_name_ends_with_period and applied it where necessary Testing: * Ran core tests on ABFS Change-Id: I18ae5b0f7de6aa7628a1efd780ff30a0cc3c5285 Reviewed-on: http://gerrit.cloudera.org:8080/14636 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > test_lineage.py and test_mt_dop.py are failing on ABFS > -- > > Key: IMPALA-9117 > URL: https://issues.apache.org/jira/browse/IMPALA-9117 > Project: IMPALA > Issue Type: Test >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > > Both failures are known issues. > {{TestLineage::test_lineage_output}} is failing because the test requires > HBase to run (this test is already disabled for S3). 
> {{TestMtDopFlags::test_mt_dop_all}} is failing because it runs > {{QueryTest/insert}} which includes a query that writes a folder that ends in > a dot. ABFS does not allow files or directories to end in a dot - IMPALA-7860 > / IMPALA-7681 / HADOOP-15860.
[jira] [Commented] (IMPALA-7726) Drop with purge tests fail against ABFS due to trash misbehavior
[ https://issues.apache.org/jira/browse/IMPALA-7726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968501#comment-16968501 ] ASF subversion and git services commented on IMPALA-7726: - Commit e8fda1f224d3ad237183a53e238eee90188d82e2 in impala's branch refs/heads/master from Sahil Takiar [ https://gitbox.apache.org/repos/asf?p=impala.git;h=e8fda1f ] IMPALA-9117, IMPALA-7726: Fixed a few unit tests for ABFS This test makes the following changes / fixes when running Impala tests on ABFS: * Skips some tests in test_lineage.py that don't work on ABFS / ADLS (they were already skipped for S3) * Skips some tests in test_mt_dop.py; the test creates a directory that ends with a period (and ABFS does not support writing files or directories that end with a period) * Removes the ABFS skip flag SkipIfABFS.trash (IMPALA-7726: Drop with purge tests fail against ABFS due to trash misbehavior"); I removed these flags and looped the tests overnight with no failures, so it is likely whatever bug was causing this has now been fixed * Now that HADOOP-15860 has been resolved, and the agreed upon behavior for ABFS is that it will fail if a client tries to write a file / directory that ends with a period, I added a new entry to the SkipIfABFS class called file_or_folder_name_ends_with_period and applied it where necessary Testing: * Ran core tests on ABFS Change-Id: I18ae5b0f7de6aa7628a1efd780ff30a0cc3c5285 Reviewed-on: http://gerrit.cloudera.org:8080/14636 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Drop with purge tests fail against ABFS due to trash misbehavior > > > Key: IMPALA-7726 > URL: https://issues.apache.org/jira/browse/IMPALA-7726 > Project: IMPALA > Issue Type: Bug >Reporter: Sean Mackrory >Assignee: Sean Mackrory >Priority: Major > Labels: flaky > > In testing IMPALA-7681, I've seen test_drop_partition_with_purge and > test_drop_table_with_purge fail because of files not found in the trash are a > drop without purge. 
> I've traced that functionality through Hive, which uses Hadoop's Trash API,
> and traced through a bunch of scenarios in that API with ABFS, and I can't
> see it misbehaving in any way. It also should be pretty FS-agnostic. I also
> suspected a bug in abfs_utils.py's exists() function, but have not been able
> to find one.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
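The trash semantics the tests above exercise can be sketched in a few lines: a drop without PURGE moves the table's files under the filesystem trash's `Current` checkpoint directory (this is what Hadoop's Trash API does), while a drop with PURGE deletes them outright. The sketch below is a minimal Python model of that contract, using temp directories to stand in for the filesystem; the function name and paths are hypothetical, not Impala or Hadoop code.

```python
import os
import shutil
import tempfile

def drop_table_files(table_dir, trash_dir, purge=False):
    """Model of drop semantics: without PURGE, data is moved into the
    trash's Current checkpoint (as Hadoop's Trash API does); with PURGE,
    it is deleted outright and never reaches the trash."""
    if purge:
        shutil.rmtree(table_dir)  # bypasses trash entirely
    else:
        # Trash preserves the original path layout under <trash>/Current/.
        dest = os.path.join(trash_dir, "Current", table_dir.lstrip("/"))
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.move(table_dir, dest)

# Tiny demonstration: a drop without purge must leave the file in the trash.
root = tempfile.mkdtemp()
table = os.path.join(root, "warehouse", "tbl")
trash = os.path.join(root, ".Trash")
os.makedirs(table)
open(os.path.join(table, "data.txt"), "w").close()

drop_table_files(table, trash)
in_trash = os.path.join(trash, "Current", table.lstrip("/"), "data.txt")
print(os.path.exists(in_trash))
```

A purge test asserts the opposite: after `drop_table_files(table, trash, purge=True)`, nothing for that table exists under the trash at all.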
[jira] [Commented] (IMPALA-8557) Impala on ABFS failed with error "IllegalArgumentException: ABFS does not allow files or directories to end with a dot."
[ https://issues.apache.org/jira/browse/IMPALA-8557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968502#comment-16968502 ]

ASF subversion and git services commented on IMPALA-8557:
---------------------------------------------------------

Commit 8b8a49e617818e9bcf99b784b63587c95cebd622 in impala's branch refs/heads/master from Sahil Takiar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=8b8a49e ]

IMPALA-8557: Add '.txt' to text files, remove '.' at end of filenames

Writes to text tables on ABFS are failing because HADOOP-15860 recently changed the ABFS behavior when writing files / folders that end with a '.'. ABFS explicitly does not allow files / folders that end with a dot. From the ABFS docs: "Avoid blob names that end with a dot (.), a forward slash (/), or a sequence or combination of the two."

The behavior prior to HADOOP-15860 was to simply drop any trailing dots when writing files or folders, but that can lead to various issues because clients may try to read back a file that should exist on ABFS, but doesn't. HADOOP-15860 changed the behavior so that any attempt to write a file or folder with a trailing dot fails on ABFS.

Impala writes all text files with a trailing dot due to some odd behavior in hdfs-table-sink.cc. The table sink writes files with a "file extension" that depends on the file type. For example, Parquet files have a file extension of ".parq". Text files had no file extension, so Impala would try to write text files of the following form: "244c5ee8ece6f759-8b1a1e3b_45513034_data.0.".

Several tables created during dataload, such as alltypes, already use the '.txt' extension for their files. These tables are not created via Impala's INSERT code path; they are copied into the table. However, there are several tables created during dataload, such as alltypesinsert, that are created via Impala. This patch will change the files in these tables so that they end in '.txt'.
This patch adds the ".txt" extension to all written text files and modifies hdfs-table-sink.cc so that it doesn't add a trailing dot to a filename when there is no file extension.

Testing:
* Ran core tests
* Re-ran affected ABFS tests
* Added test to validate that the correct file extension is used for Parquet and text tables
* Manually validated that without the addition of the '.txt' file extension, files are not written with a trailing dot

Change-Id: I2a9adacd45855cde86724e10f8a131e17ebf46f8
Reviewed-on: http://gerrit.cloudera.org:8080/14621
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins

> Impala on ABFS failed with error "IllegalArgumentException: ABFS does not
> allow files or directories to end with a dot."
>
> Key: IMPALA-8557
> URL: https://issues.apache.org/jira/browse/IMPALA-8557
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 3.2.0
> Reporter: Eric Lin
> Assignee: Sahil Takiar
> Priority: Major
>
> HDFS introduced the feature below to stop users from creating a file that
> ends with "." on ABFS:
> https://issues.apache.org/jira/browse/HADOOP-15860
> As a result of this change, Impala writes to ABFS now fail with this error.
> I can see that it generates the temp file using the format "$0.$1.$2":
> https://github.com/cloudera/Impala/blob/cdh6.2.0/be/src/exec/hdfs-table-sink.cc#L329
> $2 is the file extension and will be empty for the TEXT file format:
> https://github.com/cloudera/Impala/blob/cdh6.2.0/be/src/exec/hdfs-text-table-writer.cc#L65
> Since HADOOP-15860 was backported into CDH6.2, this currently only affects
> 6.2 and works in older versions.
> There is no way to override this empty file extension, so no workaround is
> possible unless the user chooses another file format.
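The "$0.$1.$2" pattern described above shows how an empty extension yields a trailing dot. Here is a minimal Python sketch of the fix's logic, under the assumption (stated in the commit) that the separating dot is only appended when an extension exists; the function name is hypothetical, not the actual hdfs-table-sink.cc code.

```python
def output_file_name(prefix, file_no, ext):
    """Build an output file name from a '$0.$1.$2'-style template, but only
    append the dot-separated extension when one is actually present."""
    base = "{0}.{1}".format(prefix, file_no)
    return "{0}.{1}".format(base, ext) if ext else base

# Before the fix, an empty text-file extension produced a trailing dot,
# which ABFS rejects after HADOOP-15860. After the fix:
print(output_file_name("244c5ee8ece6f759-8b1a1e3b_45513034_data", 0, ""))
print(output_file_name("244c5ee8ece6f759-8b1a1e3b_45513034_data", 0, "txt"))
```

The first call yields a name with no trailing dot; the second carries the new '.txt' extension, which is what the patch makes text tables do.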
[jira] [Created] (IMPALA-9132) Explain statements should not cause NPE in LogLineageRecord()
Anurag Mantripragada created IMPALA-9132:
----------------------------------------

Summary: Explain statements should not cause NPE in LogLineageRecord()
Key: IMPALA-9132
URL: https://issues.apache.org/jira/browse/IMPALA-9132
Project: IMPALA
Issue Type: Bug
Reporter: Anurag Mantripragada
Assignee: Anurag Mantripragada

For DDLs, LogLineageRecord() adds certain fields to the lineageGraph in the backend. However, explain statements do not have a catalogOpExecutor, causing an NPE. We should, in general, avoid creating lineage records for Explain, as Atlas currently does not use them.
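The shape of the fix (skip lineage for EXPLAIN, and null-check the executor rather than dereferencing it) can be sketched as below. This is a hedged Python model, not Impala's actual C++; `log_lineage_record`, the statement-type strings, and `FakeCatalogOpExecutor` are hypothetical stand-ins.

```python
def log_lineage_record(stmt_type, catalog_op_executor, lineage):
    """Skip lineage logging for EXPLAIN statements and guard against a
    missing catalog op executor instead of dereferencing it blindly."""
    if stmt_type == "EXPLAIN" or catalog_op_executor is None:
        return None  # Atlas does not consume lineage for EXPLAIN anyway
    # For DDLs, enrich the lineage record from the DDL execution response.
    lineage["table_name"] = catalog_op_executor.ddl_exec_response()["table_name"]
    return lineage

class FakeCatalogOpExecutor:
    """Hypothetical stand-in for the backend's catalog_op_executor_."""
    def ddl_exec_response(self):
        return {"table_name": "functional.alltypes_clone"}

print(log_lineage_record("EXPLAIN", None, {}))                          # skipped
print(log_lineage_record("CREATE_TABLE", FakeCatalogOpExecutor(), {}))  # logged
```

The key point is the order of the guard: EXPLAIN never reaches the `ddl_exec_response()` call, so the missing executor can no longer crash the coordinator.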
[jira] [Created] (IMPALA-9131) Use single quotes when printing out FORMAT clause within CAST.
Gabor Kaszab created IMPALA-9131:
--------------------------------

Summary: Use single quotes when printing out FORMAT clause within CAST.
Key: IMPALA-9131
URL: https://issues.apache.org/jira/browse/IMPALA-9131
Project: IMPALA
Issue Type: Bug
Components: Frontend
Affects Versions: Impala 3.3.0
Reporter: Gabor Kaszab
Assignee: Gabor Kaszab

Here the content of the FORMAT clause is surrounded by double quotes:
{code:java}
select cast('2016/10/10' as date format 'YYYY/MM/DD');
+------------------------------------------------+
| cast('2016/10/10' as date format "yyyy/mm/dd") |
+------------------------------------------------+
| 2016-10-10                                     |
+------------------------------------------------+
{code}
In order to follow SQL standards, this should be surrounded by single quotes regardless of how the user gave the FORMAT clause.
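The requested behavior amounts to normalizing the quoting when the clause is printed back out. A minimal sketch in Python, assuming standard SQL string-literal rules (single quotes, with embedded single quotes doubled); the function name is hypothetical and not Impala's frontend code:

```python
def format_clause_to_sql(fmt):
    """Print a FORMAT clause value with single quotes regardless of how the
    user quoted it, doubling embedded single quotes per standard SQL escaping."""
    return "'" + fmt.replace("'", "''") + "'"

print(format_clause_to_sql("yyyy/MM/DD"))      # 'yyyy/MM/DD'
print(format_clause_to_sql("hh12 o'clock"))    # 'hh12 o''clock'
```

With this, the echoed column header would read `cast('2016/10/10' as date format 'yyyy/mm/dd')`, matching the standard.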
[jira] [Commented] (IMPALA-9130) Upgrade external non-ACID table to ACID from Impala
[ https://issues.apache.org/jira/browse/IMPALA-9130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968324#comment-16968324 ]

Gabor Kaszab commented on IMPALA-9130:
--------------------------------------

[~csringhofer]

> Upgrade external non-ACID table to ACID from Impala
> ---------------------------------------------------
>
> Key: IMPALA-9130
> URL: https://issues.apache.org/jira/browse/IMPALA-9130
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog, Frontend
> Affects Versions: Impala 3.3.0
> Reporter: Gabor Kaszab
> Priority: Major
> Labels: impala-acid
>
> If you have an external, non-ACID table and try to upgrade it to become an
> ACID table, you get an error message that an external table is not allowed to
> be promoted to ACID. This is fine; however, if in the very same statement you
> set 'EXTERNAL' = 'FALSE' in the table properties, you still get the same
> error, while Hive is able to execute it.
> Steps to repro:
> 1) Create a non-ACID external table. (Or a single non-ACID table if you use a
> Hive that contains HIVE-22158.)
> 2) Upgrade the table:
> {code:java}
> alter table tbl set tblproperties ('transactional'='true',
> 'transactional_properties'='insert_only', 'EXTERNAL'='FALSE');
> {code}
> Step 2) fails in Impala but succeeds in Hive.
[jira] [Created] (IMPALA-9130) Upgrade external non-ACID table to ACID from Impala
Gabor Kaszab created IMPALA-9130:
--------------------------------

Summary: Upgrade external non-ACID table to ACID from Impala
Key: IMPALA-9130
URL: https://issues.apache.org/jira/browse/IMPALA-9130
Project: IMPALA
Issue Type: Bug
Components: Catalog, Frontend
Affects Versions: Impala 3.3.0
Reporter: Gabor Kaszab

If you have an external, non-ACID table and try to upgrade it to become an ACID table, you get an error message that an external table is not allowed to be promoted to ACID. This is fine; however, if in the very same statement you set 'EXTERNAL' = 'FALSE' in the table properties, you still get the same error, while Hive is able to execute it.

Steps to repro:
1) Create a non-ACID external table. (Or a single non-ACID table if you use a Hive that contains HIVE-22158.)
2) Upgrade the table:
{code:java}
alter table tbl set tblproperties ('transactional'='true', 'transactional_properties'='insert_only', 'EXTERNAL'='FALSE');
{code}
Step 2) fails in Impala but succeeds in Hive.
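The bug described above comes down to validating the upgrade against the table's current properties instead of the merged result of the ALTER. A minimal Python sketch of the intended check, under the assumption that Hive's behavior (validate the merged property map) is what Impala should match; the function and property-map shape are hypothetical, not Impala catalog code:

```python
def check_acid_upgrade(current_props, new_props):
    """Validate an ACID upgrade against the merged property map, so that
    'EXTERNAL'='FALSE' set in the same ALTER is taken into account
    before rejecting the statement."""
    merged = {**current_props, **new_props}  # new values win
    still_external = merged.get("EXTERNAL", "FALSE").upper() == "TRUE"
    if merged.get("transactional") == "true" and still_external:
        raise ValueError("External tables cannot be upgraded to ACID")
    return merged

# The repro from the issue: upgrade an external table while flipping
# EXTERNAL off in the same statement. This should be accepted.
result = check_acid_upgrade(
    {"EXTERNAL": "TRUE"},
    {"transactional": "true",
     "transactional_properties": "insert_only",
     "EXTERNAL": "FALSE"})
print(result["transactional"])
```

Checking only `current_props` (where EXTERNAL is still TRUE) reproduces Impala's erroneous rejection; checking the merged map matches Hive.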