John Russell has posted comments on this change.

Change subject: IMPALA-5333: [DOCS] Document Impala ADLS support
......................................................................


Patch Set 1:

(12 comments)

Mostly finished. Need to consult with Sailesh on the environment variable 
business, and test all this out with a real cluster & credentials. (Might make 
some modifications to the example output after such hands-on testing.)

http://gerrit.cloudera.org:8080/#/c/7175/2/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

PS2, Line 1072: <p rev="2.9.0 IMPALA-5333" id="adls_dml_performance">
              :         Because of differences between ADLS and traditional filesystems, DML operations
              :         for ADLS tables can take longer than for tables on HDFS.
              :         <draft-comment>
              :           Is there anything to say on this subject, if ADLS doesn't have
              :           the same file-moving-to-the-trashcan performance overhead as S3?
              :         </draft-comment>
              :       </p>
> This isn't necessarily true for ADLS the way it's true for S3. In S3 we don
Done


PS2, Line 1098: Because data files written to ADLS do not have a default block size
> This should be more like:
Done


PS2, Line 1153:       <p rev="2.9.0 IMPALA-5333" id="adls_dml">
              :         In <keyword keyref="impala29_full"/> and higher, the Impala DML statements (<codeph>INSERT</codeph>, <codeph>LOAD DATA</codeph>,
              :         and <codeph>CREATE TABLE AS SELECT</codeph>) can write data into a table or partition that resides in the
              :         Azure Data Lake Store (ADLS).
              :         The syntax of the DML statements is the same as for any other tables, because the ADLS location for tables and
              :         partitions is specified by an <codeph>adl://</codeph> prefix in the
              :         <codeph>LOCATION</codeph> attribute of
              :         <codeph>CREATE TABLE</codeph> or <codeph>ALTER TABLE</codeph> statements.
              :         If you bring data into ADLS using the normal ADLS transfer mechanisms instead of Impala DML statements,
              :         issue a <codeph>REFRESH</codeph> statement for the table before using Impala to query the ADLS data.
              :       </p>
> This was added for S3 since INSERT and LOAD DATA were added in a later rele
Why don't I leave this in? I can reuse the same text underneath the
INSERT and LOAD DATA statements, where it will be a useful reminder /
reassurance.


http://gerrit.cloudera.org:8080/#/c/7175/1/docs/topics/impala_adls.xml
File docs/topics/impala_adls.xml:

PS1, Line 59: </conbody>
Add some prereq info here about basic ADLS setup, both on the Azure side and 
from Hadoop.


PS1, Line 209: or on earlier Impala releases without DML support for ADLS
> We don't have earlier releases with any support for ADLS, so I don't think 
Done


Line 219:           <li>
> Need to consult with Sailesh. I haven't had hands-on experience with ADLS y
Done


PS1, Line 261:         You point
> Almost. I'll word it as "To X, do Y". (Although now the wording on the S3 p
Done


PS1, Line 271: bucket
> We call them stores in ADLS. So probably just do a find replace of "bucket"
Done


PS1, Line 291: impala-demo
> The stores have the format "adl://<store>.azuredatalakestore.net/path/to/fi
Done


PS1, Line 313: !??? ls adl://impala-demo/dir1/dir2/dir3 --recursive;
> I've not used the ADLS command line tool. It seemed a little hard to setup.
Done. I'll use the same hadoop fs syntax. I agree someone would need a lot of 
ADLS / Azure experience to be proficient with the ADLS command-line tools.


PS1, Line 329: !??? ls adl://impala-demo/dir1/dir2/dir3 --recursive;
> ditto
Done


http://gerrit.cloudera.org:8080/#/c/7175/2/docs/topics/impala_adls.xml
File docs/topics/impala_adls.xml:

PS2, Line 617: IMPALA-5383
Take out this stray JIRA number. It's addressed by the 'adls_block_splitting'
item above, which mentions the PARQUET_FILE_SIZE query option.


-- 
To view, visit http://gerrit.cloudera.org:8080/7175
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id5a98217741e5d540d9874e9b30e36f01644ef14
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jruss...@cloudera.com>
Gerrit-Reviewer: David Knupp <dkn...@cloudera.com>
Gerrit-Reviewer: John Russell <jruss...@cloudera.com>
Gerrit-Reviewer: Laurel Hale <lau...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com>
Gerrit-HasComments: Yes
