[jira] [Commented] (IMPALA-5931) Don't synthesize block metadata in the catalog for S3/ADLS

ASF subversion and git services (Jira) Thu, 19 Sep 2019 10:32:16 -0700


    [ 
https://issues.apache.org/jira/browse/IMPALA-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933615#comment-16933615
 ]


ASF subversion and git services commented on IMPALA-5931:
---------------------------------------------------------

Commit feed25084a999fe0a4e7b58b5264fce5829c43e7 in impala's branch 
refs/heads/master from stakiar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=feed250 ]

IMPALA-8944: Update and re-enable S3PlannerTest

Addresses several test infra issues that were preventing the
S3PlannerTest from running successfully. Disables a few tests that are
no longer working, and removes some planner checks that are no longer
applicable when running on S3. Specifically, this patch removes the
checks in PlannerTestBase#checkScanRangeLocations when running against
S3, because the planner no longer generates scan ranges; generation is
deferred to the scheduler (IMPALA-5931).

Replaces the old logic of specifying S3-specific fe/ tests with a
combination of JUnit Categories and Maven Profiles. The previous method
was broken and assumed that all S3-specific fe/ tests started with S3*.
The new approach removes that restriction and only requires S3-specific
JUnit tests to be tagged with the Java annotation
'@Category(S3Tests.class)' (entire classes or individual tests can be
tagged with the annotation).

Testing:
* Ran fe/ tests with TARGET_FILESYSTEM=s3

Change-Id: I1690b6c5346376c1111fd4845c72062cc237e0f9
Reviewed-on: http://gerrit.cloudera.org:8080/14248
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> Don't synthesize block metadata in the catalog for S3/ADLS
> ----------------------------------------------------------
>
>                 Key: IMPALA-5931
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5931
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>            Reporter: Dan Hecht
>            Assignee: Vuk Ercegovac
>            Priority: Major
>             Fix For: Impala 2.13.0, Impala 3.1.0
>
>
> Today, the catalog synthesizes block metadata for S3/ADLS by just breaking up 
> splittable files into "blocks" with the FileSystem's default block size. 
> Rather than carrying these blocks around in the catalog and distributing them 
> to all impalads, we might as well generate the scan ranges on the fly during 
> planning. That would save the memory and network bandwidth of blocks.
> That does mean that the planner will have to instantiate and call the 
> filesystem to get the default block size, but for these FileSystems, that's 
> just a matter of reading the config.
> Perhaps the same can be done for HDFS erasure coding, though that depends on 
> what a block location actually means in that context and whether they contain 
> useful info.
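
To illustrate the idea in the description above, here is a minimal sketch of generating scan ranges on the fly by cutting a splittable file into block-sized pieces. It is not Impala's actual planner or scheduler code: the class `ScanRangeSketch`, the `ScanRange` type, and the `split` method are hypothetical names, and the 128 MB block size in `main` is an assumed placeholder for whatever default block size the filesystem configuration reports.

```java
import java.util.ArrayList;
import java.util.List;

public class ScanRangeSketch {
    /** Hypothetical value type: the byte range [offset, offset + length) of one file. */
    public static final class ScanRange {
        public final long offset;
        public final long length;

        public ScanRange(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /**
     * Cuts a splittable file of fileLength bytes into consecutive ranges of at
     * most blockSize bytes. Nothing is stored ahead of time; the ranges are
     * derived from the two numbers alone.
     */
    public static List<ScanRange> split(long fileLength, long blockSize) {
        if (fileLength < 0) {
            throw new IllegalArgumentException("fileLength must be non-negative");
        }
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<ScanRange> ranges = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += blockSize) {
            // The final range may be shorter than a full block.
            ranges.add(new ScanRange(offset, Math.min(blockSize, fileLength - offset)));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Assumed placeholder; a real caller would read this from the filesystem config.
        long blockSize = 128L * 1024 * 1024;
        for (ScanRange r : split(300L * 1024 * 1024, blockSize)) {
            System.out.println("offset=" + r.offset + " length=" + r.length);
        }
    }
}
```

Because the ranges depend only on the file length and the block size, they can be recomputed wherever they are needed instead of being carried in the catalog, which is the memory and bandwidth saving the description points to.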



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
