John Russell has posted comments on this change.

Change subject: IMPALA-5333: [DOCS] Document Impala ADLS support
......................................................................

Patch Set 1:

(12 comments)

Mostly finished. Need to consult with Sailesh on the environment variable business, and test all this out with a real cluster & credentials. (Might make some modifications to the example output after such hands-on testing.)

http://gerrit.cloudera.org:8080/#/c/7175/2/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

PS2, Line 1072: <p rev="2.9.0 IMPALA-5333" id="adls_dml_performance">
: Because of differences between ADLS and traditional filesystems, DML operations
: for ADLS tables can take longer than for tables on HDFS.
: <draft-comment>
: Is there anything to say on this subject, if ADLS doesn't have
: the same file-moving-to-the-trashcan performance overhead as S3?
: </draft-comment>
: </p>
> This isn't necessarily true for ADLS the way it's true for S3. In S3 we don
Done

PS2, Line 1098: Because data files written to ADLS do not have a default block size
> This should be more like:
Done

PS2, Line 1153: <p rev="2.9.0 IMPALA-5333" id="adls_dml">
: In <keyword keyref="impala29_full"/> and higher, the Impala DML statements (<codeph>INSERT</codeph>, <codeph>LOAD DATA</codeph>,
: and <codeph>CREATE TABLE AS SELECT</codeph>) can write data into a table or partition that resides in the
: Azure Data Lake Store (ADLS).
: The syntax of the DML statements is the same as for any other tables, because the ADLS location for tables and
: partitions is specified by an <codeph>adl://</codeph> prefix in the
: <codeph>LOCATION</codeph> attribute of
: <codeph>CREATE TABLE</codeph> or <codeph>ALTER TABLE</codeph> statements.
: If you bring data into ADLS using the normal ADLS transfer mechanisms instead of Impala DML statements,
: issue a <codeph>REFRESH</codeph> statement for the table before using Impala to query the ADLS data.
: </p>
> This was added for S3 since INSERT and LOAD DATA were added in a later rele
Why don't I leave this in because I can reuse the same text underneath the INSERT and LOAD DATA statements, where it will be a useful reminder / reassurance.

http://gerrit.cloudera.org:8080/#/c/7175/1/docs/topics/impala_adls.xml
File docs/topics/impala_adls.xml:

PS1, Line 59: </conbody>
Add some prereq info here about basic ADLS setup, both on the Azure side and from Hadoop.

PS1, Line 209: or on earlier Impala releases without DML support for ADLS
> We don't have earlier releases with any support for ADLS, so I don't think
Done

Line 219: <li>
> Need to consult with Sailesh. I haven't had hands-on experience with ADLS y
Done

PS1, Line 261: You point
> Almost. I'll word it as "To X, do Y". (Although now the wording on the S3 p
Done

PS1, Line 271: bucket
> We call them stores in ADLS. So probably just do a find replace of "bucket"
Done

PS1, Line 291: impala-demo
> The stores have the format "adl://<store>.azuredatalakestore.net/path/to/fi
Done

PS1, Line 313: !??? ls adl://impala-demo/dir1/dir2/dir3 --recursive;
> I've not used the ADLS command line tool. It seemed a little hard to setup.
Done. I'll use the same hadoop fs syntax. I agree someone would need a lot of ADLS / Azure experience to be proficient with the ADLS command-line tools.

PS1, Line 329: !??? ls adl://impala-demo/dir1/dir2/dir3 --recursive;
> ditto
Done

http://gerrit.cloudera.org:8080/#/c/7175/2/docs/topics/impala_adls.xml
File docs/topics/impala_adls.xml:

PS2, Line 617: IMPALA-5383
Take out this stray JIRA number. It's addressed by the 'adls_block_splitting' item above which mentions the PARQUET_FILE_SIZE query option.

--
To view, visit http://gerrit.cloudera.org:8080/7175
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id5a98217741e5d540d9874e9b30e36f01644ef14
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jruss...@cloudera.com>
Gerrit-Reviewer: David Knupp <dkn...@cloudera.com>
Gerrit-Reviewer: John Russell <jruss...@cloudera.com>
Gerrit-Reviewer: Laurel Hale <lau...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com>
Gerrit-HasComments: Yes